Confusion Matrix

The confusion matrix is one of the most widely used tools for evaluating classification models in machine learning. A confusion matrix shows how your machine learning classifier has performed by setting correctly classified examples against misclassified examples, broken down by class.

Let’s look at how to interpret a confusion matrix and how one can be computed with scikit-learn in Python.

What Is a Confusion Matrix?

Perhaps you are wondering: What exactly is a “confusion matrix”?

Put simply, a confusion matrix is a tool for predictive analysis: a table that compares predicted values with actual values. In machine learning, it is used to quantify the performance of a classifier, and many standard performance metrics are calculated from it. A confusion matrix can be used whenever the classifier outputs two or more classes.
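As a minimal sketch, assuming the actual and predicted labels arrive as two equal-length Python lists, a confusion matrix can be built by counting each (actual, predicted) pair. The function name `confusion_matrix_table` and the animal labels are hypothetical. Rows are actual classes and columns are predicted classes, which is the same orientation that scikit-learn's `sklearn.metrics.confusion_matrix` uses.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def confusion_matrix_table(
    actual: Sequence[Hashable],
    predicted: Sequence[Hashable],
    labels: Sequence[Hashable],
) -> list[list[int]]:
    """Return a len(labels) x len(labels) table of counts.

    Row i counts examples whose actual class is labels[i];
    column j counts examples the classifier predicted as labels[j].
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Count how often each (actual, predicted) combination occurs.
    pair_counts = Counter(zip(actual, predicted))
    return [[pair_counts[(a, p)] for p in labels] for a in labels]


# Three-class example with made-up labels.
actual = ["cat", "dog", "cat", "bird", "dog", "cat"]
predicted = ["cat", "cat", "cat", "bird", "dog", "dog"]
matrix = confusion_matrix_table(actual, predicted, labels=["bird", "cat", "dog"])
for row in matrix:
    print(row)
# [1, 0, 0]
# [0, 2, 1]
# [0, 1, 1]
```

Correct predictions land on the diagonal; each off-diagonal cell counts one specific kind of mistake, such as a dog predicted as a cat (row "dog", column "cat").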

Confusion matrices are the basis for computing important metrics such as recall, specificity, accuracy, and precision. They are useful because they report the counts of True Positives, False Positives, True Negatives and False Negatives directly. In contrast, a single summary metric like accuracy gives less information: accuracy is simply the number of correct predictions divided by the total number of predictions, so it cannot tell you which kinds of errors the classifier makes.
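To see how a single accuracy number can hide what the confusion matrix reveals, consider a hypothetical dataset of 100 examples with only 5 positives, and a degenerate classifier that always predicts negative. Both the data and the always-negative classifier are made up purely for illustration.

```python
# Hypothetical, imbalanced labels: 95 negatives (0) and 5 positives (1).
actual = [0] * 95 + [1] * 5
# A degenerate classifier that predicts negative for every example.
predicted = [0] * len(actual)

# Accuracy: correct predictions divided by total predictions.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)

# The four confusion-matrix counts, treating 1 as the positive class.
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

print(f"accuracy = {accuracy:.2f}")      # accuracy = 0.95
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=0 TN=95 FP=0 FN=5
```

Accuracy looks excellent at 0.95, yet the counts show TP = 0: the classifier never identifies a single positive example.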

All the metrics derived from a confusion matrix are based on four basic counts: True Positives, False Positives, True Negatives and False Negatives. To understand what they are, let’s look at a binary classification problem.

In figure 1, we have a dataset whose examples carry pre-assigned labels, Positive (light green) and Negative (light red). Because these labels are the ground truth, we call the square containing them “The fact.” We then try to learn a classification model that predicts the label of each example in “The fact” from its features. Let’s define “The selection” as the set of examples we predict as positive, drawn as a circle inside the square. The area outside the circle holds the examples we predict as negative.

True/False describes whether a prediction agrees with “The fact.” If a prediction matches the label assigned in “The fact,” it is true; otherwise, it is false.

Let’s take a closer look at area (1) in figure 1. This area is positive in “The fact,” but we predicted it as negative, so the prediction is wrong. It is therefore a False Negative (False because we were incorrect, Negative because that is what we predicted).

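Similarly, area (2) is negative in “The fact” and also negative in our predictions, which makes it a True Negative. Likewise, area (3) is positive in both “The fact” and our predictions, making it a True Positive, and area (4) is negative in “The fact” but predicted positive, making it a False Positive.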
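The naming rule for the four areas can be written as a small function: the first word (True/False) records whether the prediction agrees with “The fact,” and the second word (Positive/Negative) is simply what the model predicted. The helper name `outcome` is hypothetical.

```python
def outcome(actual_positive: bool, predicted_positive: bool) -> str:
    """Name the confusion-matrix cell for a single prediction."""
    # True/False: did the prediction agree with the actual label?
    correctness = "True" if actual_positive == predicted_positive else "False"
    # Positive/Negative: what the model predicted, not what the label was.
    predicted = "Positive" if predicted_positive else "Negative"
    return f"{correctness} {predicted}"


# Area (1): actually positive, predicted negative.
print(outcome(actual_positive=True, predicted_positive=False))   # False Negative
# Area (2): actually negative, predicted negative.
print(outcome(actual_positive=False, predicted_positive=False))  # True Negative
# Area (3): actually positive, predicted positive.
print(outcome(actual_positive=True, predicted_positive=True))    # True Positive
# Area (4): actually negative, predicted positive.
print(outcome(actual_positive=False, predicted_positive=True))   # False Positive
```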

As a concrete example, suppose there are two possible predicted classes: "yes" and "no". If we were predicting the presence of a disease, "yes" would mean the patient has the disease, and "no" would mean the patient does not.

The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).

Out of those 165 cases, the classifier predicted "yes" 110 times, and "no" 55 times.

In reality, 105 patients in the sample have the disease, and 60 patients do not.
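These totals constrain, but do not fully determine, the four cells of the matrix: predicted "yes" = TP + FP, predicted "no" = TN + FN, actual "yes" = TP + FN, and actual "no" = TN + FP. The sketch below solves for the other cells once FP is chosen, then checks one hypothetical split (TP = 100, FP = 10, FN = 5, TN = 50). Those cell values are an assumption for illustration, not figures given in the text, and the helper names are hypothetical.

```python
# Totals stated in the text.
TOTAL = 165
PREDICTED_YES, PREDICTED_NO = 110, 55
ACTUAL_YES, ACTUAL_NO = 105, 60


def cells_from_fp(fp: int) -> dict[str, int]:
    """Solve for TP, FN and TN once a value for FP is chosen.

    Uses predicted yes = TP + FP, actual yes = TP + FN, actual no = TN + FP.
    """
    tp = PREDICTED_YES - fp
    fn = ACTUAL_YES - tp
    tn = ACTUAL_NO - fp
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


def consistent(c: dict[str, int]) -> bool:
    """Check that every cell is non-negative and all totals match."""
    return (
        min(c.values()) >= 0
        and c["TP"] + c["FP"] == PREDICTED_YES
        and c["TN"] + c["FN"] == PREDICTED_NO
        and c["TP"] + c["FN"] == ACTUAL_YES
        and c["TN"] + c["FP"] == ACTUAL_NO
        and sum(c.values()) == TOTAL
    )


# One hypothetical split that fits every total.
cells = cells_from_fp(10)
print(cells)  # {'TP': 100, 'FP': 10, 'FN': 5, 'TN': 50}

# Every FP value that yields a valid matrix for these totals.
valid_fp = [fp for fp in range(TOTAL + 1) if consistent(cells_from_fp(fp))]
print(valid_fp[0], valid_fp[-1])  # 5 60
```

Because the four totals leave one degree of freedom, you need the full matrix, not just the row and column totals, to know how many patients were misclassified.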

Let's now define the most basic terms, which are whole numbers (not rates):

true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.

true negatives (TN): We predicted no, and they don't have the disease.

false positives (FP): We predicted yes, but they don't actually have the disease. (Also known as a "Type I error.")

false negatives (FN): We predicted no, but they actually do have the disease. (Also known as a "Type II error.")
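Given per-patient "yes"/"no" labels, the four counts can be tallied directly. The six patients below are invented for illustration, and `count_outcomes` is a hypothetical helper. With scikit-learn, `confusion_matrix(y_true, y_pred, labels=["no", "yes"]).ravel()` returns the same counts in the order TN, FP, FN, TP.

```python
def count_outcomes(
    actual: list[str], predicted: list[str], positive: str = "yes"
) -> dict[str, int]:
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted yes: correct -> TP, wrong -> FP (Type I error).
            counts["TP" if a == positive else "FP"] += 1
        else:
            # Predicted no: correct -> TN, wrong -> FN (Type II error).
            counts["TN" if a != positive else "FN"] += 1
    return counts


# Invented labels for six patients.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
print(count_outcomes(actual, predicted))
# {'TP': 2, 'TN': 2, 'FP': 1, 'FN': 1}
```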

Precision

Precision is similar to recall in that both focus on the positive class. However, precision measures something a little different.

Precision compares the number of genuinely positive examples your model identified with the number of all the examples it labeled positive.

Mathematically, it is the number of true positives divided by the sum of true positives and false positives: Precision = TP / (TP + FP).
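A minimal sketch of the precision formula follows. Returning 0.0 when the model makes no positive predictions (so the denominator is zero) is a convention assumed for this sketch, and the counts used are hypothetical.

```python
def precision(tp: int, fp: int) -> float:
    """TP / (TP + FP): the share of positive predictions that are correct."""
    predicted_positive = tp + fp
    if predicted_positive == 0:
        # No positive predictions, so precision is undefined; returning 0.0
        # is a convention assumed for this sketch.
        return 0.0
    return tp / predicted_positive


# Hypothetical counts: 100 correct positive predictions, 10 wrong ones.
print(f"precision = {precision(tp=100, fp=10):.3f}")  # precision = 0.909
```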

If the distinction between recall and precision is still a little fuzzy to you, just think of it like this:

Precision answers this question: Of all the examples your model labeled positive, what percentage is genuinely positive?

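Recall answers this question: Of all the positive examples in your dataset, what percentage did your model identify? Mathematically, recall is the number of true positives divided by the sum of true positives and false negatives: Recall = TP / (TP + FN).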
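The difference between the two questions is easiest to see with two contrasting classifiers. Both models and their counts below are hypothetical: a "cautious" model that labels only a few examples positive, and an "eager" model that labels many.

```python
def precision(tp: int, fp: int) -> float:
    # Of everything labeled positive, the share that truly is positive.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    # Of everything truly positive, the share the model found.
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical cautious model: 10 positive predictions, all correct,
# but 90 of the 100 real positives are missed.
cautious = {"tp": 10, "fp": 0, "fn": 90}
# Hypothetical eager model: finds all 100 positives by also flagging 100 negatives.
eager = {"tp": 100, "fp": 100, "fn": 0}

for name, c in [("cautious", cautious), ("eager", eager)]:
    print(name, precision(c["tp"], c["fp"]), recall(c["tp"], c["fn"]))
# cautious 1.0 0.1
# eager 0.5 1.0
```

The cautious model is always right when it says "positive" (high precision) but finds few positives (low recall); the eager model finds every positive (high recall) at the cost of many false alarms (low precision).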

Specificity

If sensitivity (recall) tracks the true positive rate, specificity tracks the true negative rate.

Specificity is the number of genuinely negative examples your model identified divided by the total number of negative examples in the dataset. Mathematically, it is the number of true negatives divided by the sum of true negatives and false positives: Specificity = TN / (TN + FP).
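Sensitivity and specificity mirror each other, one over the actual positives and one over the actual negatives. A minimal sketch, with hypothetical counts; raising an error on an empty denominator is a design choice assumed here.

```python
def sensitivity(tp: int, fn: int) -> float:
    """TP / (TP + FN): true positive rate (recall)."""
    if tp + fn == 0:
        raise ValueError("no actual positives; sensitivity is undefined")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """TN / (TN + FP): true negative rate."""
    if tn + fp == 0:
        raise ValueError("no actual negatives; specificity is undefined")
    return tn / (tn + fp)


# Hypothetical counts, chosen only for illustration.
tp, fn, tn, fp = 100, 5, 50, 10
print(f"sensitivity = {sensitivity(tp, fn):.3f}")  # sensitivity = 0.952
print(f"specificity = {specificity(tn, fp):.3f}")  # specificity = 0.833

# A model that always predicts negative (60 actual negatives, 105 actual
# positives) is perfectly specific but finds no positives at all.
print(specificity(tn=60, fp=0), sensitivity(tp=0, fn=105))  # 1.0 0.0
```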

Accuracy

Accuracy is the simplest of these metrics: the fraction of all predictions that are correct. It is the sum of true positives and true negatives divided by the total of true positives, false positives, true negatives and false negatives: Accuracy = (TP + TN) / (TP + TN + FP + FN).
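A minimal sketch of the accuracy formula, with hypothetical counts. The cross-check rebuilds per-example labels from those counts to confirm that the formula equals the plain fraction of correct predictions.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """(TP + TN) / (TP + TN + FP + FN): share of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions; accuracy is undefined")
    return (tp + tn) / total


# Hypothetical counts for illustration.
tp, tn, fp, fn = 100, 50, 10, 5
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")  # accuracy = 0.909

# Cross-check: rebuild per-example labels (1 = positive) and count agreements.
actual = [1] * (tp + fn) + [0] * (tn + fp)
predicted = [1] * tp + [0] * fn + [0] * tn + [1] * fp
fraction_correct = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print(fraction_correct == accuracy(tp, tn, fp, fn))  # True
```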