Cyber Crime — Confusion Matrix

Tanushree jain
4 min read · Jun 5, 2021

Classification in ML

Classification is one of the supervised learning approaches in machine learning. It is the process of categorizing a given set of data into classes, and it can be performed on both structured and unstructured data. The process starts with predicting the class of the given data points. The classes are often referred to as targets, labels or categories.

Classification predictive modeling is the task of approximating a mapping function from input variables to discrete output variables. The main goal is to identify which class or category new data will fall into.

For example, consider whether John has heart disease or not. Only two cases are possible: either he has heart disease or he doesn't. Such detection tasks are classification problems, and this one is a binary classification problem because there are only two classes, such as true or false.

There are various classification algorithms. One of the most commonly used is Logistic Regression, which in its basic form is a binary classification algorithm.
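To make the idea of binary classification concrete, here is a minimal sketch of how a logistic model turns input features into one of two classes. The weights, bias, feature names (age, cholesterol) and the 0.5 threshold are all hypothetical values chosen for illustration, not learned from any real data; a real model would learn them during training.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(features, weights, bias, threshold=0.5):
    """Return (label, probability) for a single data point.

    label is 1 (positive class) when the probability reaches the threshold,
    otherwise 0 (negative class).
    """
    score = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(score)
    label = 1 if probability >= threshold else 0
    return label, probability


# Hypothetical parameters, made up for this example (not trained values).
weights = [0.04, 0.02]  # weight for age, weight for cholesterol
bias = -7.0

patient_a = [60, 250]  # age, cholesterol (illustrative numbers)
patient_b = [30, 150]

label_a, prob_a = predict_class(patient_a, weights, bias)
label_b, prob_b = predict_class(patient_b, weights, bias)

print("Patient A:", label_a, round(prob_a, 2))
print("Patient B:", label_b, round(prob_b, 2))
```

With these made-up parameters, the first data point lands in the positive class and the second in the negative class, which is all a binary classifier ever outputs: one of two labels.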


Confusion Matrix in Classification

A confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us an integrated view of how well our classification model is performing and what kinds of errors it is making. In binary classification we have a 2 x 2 matrix with four values.
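As a rough sketch of the N x N idea, the snippet below builds a confusion matrix for three classes in plain Python. The class names and the label lists are invented for illustration. It follows this article's layout (rows are predicted values, columns are actual values); note that some libraries use the opposite orientation.

```python
def confusion_matrix(actual, predicted, classes):
    """Build an N x N confusion matrix as a list of lists.

    Layout used here: matrix[row][column] where the row is the predicted
    class and the column is the actual class.
    """
    index = {c: i for i, c in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


# Hypothetical labels, made up for this example.
classes = ["benign", "phishing", "malware"]
actual = ["benign", "phishing", "malware", "benign", "malware", "phishing"]
predicted = ["benign", "malware", "malware", "benign", "benign", "phishing"]

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"predicted {name:<8}", row)

# Correct predictions sit on the diagonal.
correct = sum(matrix[i][i] for i in range(len(classes)))
print("correct predictions:", correct, "of", len(actual))
```

Every off-diagonal cell is a specific kind of mistake, for example a malware sample predicted as benign, which is the "what kinds of errors" view that a single accuracy number cannot give.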

Confusion Matrix for Binary Classification

Here,

· the target variable can have only two values, either positive or negative

· the columns represent the actual values of the target variable

· the rows represent the predicted values of the target variable

The most critical part of the matrix is understanding the terms TP, FP, FN and TN.

True Positive (TP) - The number of cases where the predicted value matches the actual value: the actual value was positive and the model predicted a positive value.

True Negative (TN) - The number of cases where the predicted value matches the actual value: the actual value was negative and the model predicted a negative value.

False Positive (FP) - The number of cases where the prediction was wrong: the actual value was negative but the model predicted a positive value. Also known as the Type 1 error.

False Negative (FN) - The number of cases where the prediction was wrong: the actual value was positive but the model predicted a negative value. Also known as the Type 2 error.

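Which error is more dangerous depends on what the positive class means. When "positive" means that a problem was detected, the Type 2 error (false negative) is usually the more dangerous one, because it gives the user false hope that there is no problem when one is actually present. A Type 1 error (false positive) in that setting is a false alarm: costly and distracting, but nothing is missed.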
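The four terms can be tallied directly from a list of actual labels and a list of predicted labels. The following is a minimal sketch in plain Python; the function name and the sample labels are made up for illustration, with 1 standing for the positive class and 0 for the negative class.

```python
def binary_confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN for a binary classification problem."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1  # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1  # predicted negative, actually negative
        elif p == positive and a != positive:
            fp += 1  # Type 1 error: predicted positive, actually negative
        else:
            fn += 1  # Type 2 error: predicted negative, actually positive
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels, made up for this example.
actual = [1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 0, 1]

counts = binary_confusion_counts(actual, predicted)
print(counts)
```

The four counts always add up to the number of data points, which is a quick sanity check when filling in a 2 x 2 matrix by hand.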

Cyber Crime

Cybercrime is any criminal activity that involves a computer, networked device or a network. Some cybercrimes are carried out against computers or devices directly to damage or disable them, while others use computers or networks to spread malware, illegal information, images or other materials. Some cybercrimes do both — i.e., target computers to infect them with a computer virus, which is then spread to other machines and, sometimes, entire networks.

Let's take an example: a company is using a machine learning model that predicts and detects cyber-attacks, using both machine-learning algorithms and data from previous cyber-crime cases. Suppose the model, evaluated on 165 cases, produced a confusion matrix with 150 correct predictions (TP + TN) and 10 false positives (FP).

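According to this confusion matrix, we have False Positive (FP) = 10. If "positive" means that an attack was detected, these are 10 cases where the model raised an alarm although no attack was present. The accuracy calculation below uses 150 correct predictions out of 165, which leaves FP + FN = 15 and therefore False Negative (FN) = 5: five cases where a security issue may be present but the model predicts no issue, which could be harmful for our company. That is why the confusion matrix plays a major role in cyber security. We also need the confusion matrix to find the accuracy of the model.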

Accuracy = ((TP + TN) / (TP + TN + FP + FN)) * 100% = (150 / 165) * 100% ≈ 91%

So, the accuracy of our model is about 91%.
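The arithmetic can be checked with a few lines of Python. This sketch uses only the numbers given in the example above (150 correct predictions out of 165, and FP = 10); the variable and function names are my own, and the split of the 150 correct predictions into TP and TN is not needed for accuracy, so it is left out.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage: correct predictions divided by all predictions."""
    return correct / total * 100


correct = 150          # TP + TN, from the example
total = 165            # TP + TN + FP + FN, from the example
false_positives = 10   # FP, from the example

# Whatever is neither correct nor a false positive must be a false negative.
false_negatives = total - correct - false_positives

acc = accuracy_percent(correct, total)
print(f"Accuracy: {acc:.1f}%")
print("False negatives:", false_negatives)
```

The exact value is a little under 91%, and the same numbers show that 5 of the 15 errors are false negatives.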

More effective and secure training and warning systems can be created for people with similar characteristics by evaluating the characteristics of attack victims. Crimes, criminal and victim profiles, and cyber-attacks can be predicted using deep learning algorithms, and the results can be compared. Intelligent criminal-victim detection systems that are useful to law enforcement agencies in the fight against crime and criminals can be created to reduce crime rates.

Thanks for reading!!
