Confusion Matrix in Machine Learning: TP, TN, FP, FN, Precision, and Recall

When working on a classification model, one overall score is usually not enough. A model that misses many real positives is very different from a model that raises too many false alarms, even if their total accuracy looks similar.

That is why the confusion matrix matters. Instead of giving only one final score, it breaks prediction results into a small set of basic outcomes that can then be used to calculate precision, recall, accuracy, and F1.

This article covers these parts:

What is a confusion matrix
Calculation of confusion matrix in binary classification problems

A quick way to remember the four terms

The easiest way to avoid confusion is to use this order:

First check whether the model predicted positive or negative
Then check whether that prediction was correct

That gives you:

Predicted positive and correct: True Positive (TP)
Predicted positive and wrong: False Positive (FP)
Predicted negative and correct: True Negative (TN)
Predicted negative and wrong: False Negative (FN)

What is a Confusion Matrix

In binary classification, the confusion matrix is built from four basic outcomes:

True Positive (TP): actually positive and predicted positive
False Negative (FN): actually positive but predicted negative
False Positive (FP): actually negative but predicted positive
True Negative (TN): actually negative and predicted negative

As shown below:

confusion_matrix

Here is a more concrete example using a pregnancy test scenario:

confusion_matrix_pre

True Positive:

The test result says “pregnant”, and the real result is also pregnant.

True Negative:

The test result says “not pregnant”, and the real result is also not pregnant.

False Positive (Type I error):

The test result says “pregnant”, but the actual result is not pregnant.

False Negative (Type II error):

The test result says “not pregnant”, but the actual result is actually pregnant.

In summary, first look at whether the prediction is positive or negative, then look at whether it is correct.

If predicted positive and prediction is correct, it is True Positive.
If predicted positive but prediction is wrong, it is False Positive.
If predicted negative and prediction is correct, it is True Negative.
If predicted negative but prediction is wrong, it is False Negative.

Calculation of Confusion Matrix in Binary Classification Problems

Once the four cells of the confusion matrix are clear, you can compute common metrics such as recall, precision, accuracy, and F1.

math

Recall Calculation

recall = TP / (TP + FN)

This measures how many of the truly positive samples were successfully found by the model.

If your task cares more about avoiding missed positives, recall becomes especially important.

In the example, samples 2, 4, 5, and 7 are truly positive. The model predicts only 2 and 4 as positive, so recall is 2/4 = 1/2.

Precision Calculation

precision = TP / (TP + FP)

This measures how many predicted positives are actually correct.

If your task cares more about avoiding false alarms, precision matters more.

In the example, three samples (2, 3, and 4) are predicted as positive, and samples 2 and 4 are truly positive, so precision is 2/3.

Accuracy Calculation

accuracy = (TP + TN) / (TP + FP + TN + FN)

This is the proportion of correctly classified samples among all samples.

In the example, there are 7 samples in total, and samples 1, 2, 4, and 6 are classified correctly, so accuracy is 4/7.

However, accuracy can be misleading when classes are highly imbalanced.

In practice, recall and precision often move in different directions. To compare models when both matter, people often use F-measure, usually the F1 score.

f-measure Calculation

The f-measure, or F1 score:

f_measure = 2 * recall * precision / (recall + precision)

F1 combines recall and precision into one value and is useful when both are important.

How to use the confusion matrix in practice

When reading a confusion matrix, do not focus on only one metric. The more useful questions are:

Is the business problem more sensitive to missed positives or false alarms?
Are the classes severely imbalanced?
Do you need to tune a threshold and trade precision against recall?

For example:

Medical screening often cares more about recall because missing a real positive case can be costly
Filtering or review pipelines may care more about precision because false alarms create unnecessary downstream work
In highly imbalanced data, accuracy alone is often not enough, so precision, recall, and F1 should also be checked

Summary

This article introduced one of the most important evaluation tools for classification problems: the confusion matrix. It also explained TP, TN, FP, and FN, and showed how they are used to calculate recall, precision, accuracy, and F1.

In real work, the goal is not just to memorize formulas, but to understand which kind of error matters more in the business context. Once that part is clear, confusion-matrix-based evaluation becomes much easier to interpret.