A confusion matrix is a simple and powerful tool to understand the effectiveness of a classification system. We see classification systems all around us. In the simplest (binary) case, these are systems that sort things or people into two categories. Health screenings for diseases, baggage X-ray systems at the airport and even our gut reaction to people we like or dislike are all classification systems.
The confusion matrix is a representation of the performance of our classification system. A binary classification system has four possible outcomes.
Let’s say our baggage screening system flags a bag. The bag itself can be one of two things: safe or dangerous. A “True Positive” is when the system correctly flags a dangerous bag. A “False Positive” is when the system flags a bag that is actually safe. False positives are called “Type I” errors.
Next, a “False Negative” is when the system doesn’t flag a dangerous bag. The terminology gets a bit confusing here. The simple translation of a false negative is that the system predicted a negative (safe) and that prediction was false. False negatives are called “Type II” errors. Finally, a “True Negative” is a safe bag that the system correctly leaves unflagged.
Source and thanks to: Codeproject.com
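To make the four outcomes concrete, here is a minimal Python sketch that tallies them for a batch of bags. The function name `confusion_counts` and the sample data are made up for illustration; `True` stands for "dangerous" in the actual list and "flagged" in the predicted list.

```python
def confusion_counts(actual, predicted):
    """Tally the four outcomes of a binary classifier.

    actual[i] is True when bag i really is dangerous.
    predicted[i] is True when the system flagged bag i.
    """
    tp = fp = fn = tn = 0
    for is_dangerous, was_flagged in zip(actual, predicted):
        if was_flagged and is_dangerous:
            tp += 1  # True Positive: dangerous bag, correctly flagged
        elif was_flagged and not is_dangerous:
            fp += 1  # False Positive (Type I error): safe bag, flagged
        elif not was_flagged and is_dangerous:
            fn += 1  # False Negative (Type II error): dangerous bag, missed
        else:
            tn += 1  # True Negative: safe bag, correctly left alone
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Hypothetical screening results for eight bags (illustrative data only).
actual = [True, False, False, True, False, False, True, False]
predicted = [True, True, False, False, False, False, True, False]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}
```

Every bag lands in exactly one of the four buckets, so the four counts always add up to the number of bags screened.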
Based on these four counts, we can now construct a good picture of the quality of the classification system. Some of the common metrics are:
1. Prevalence – how often in our sample do we find a yes? = (True Positives + False Negatives) / Total of all 4
2. Accuracy – how often is the classifier correct? = (True Positives + True Negatives) / Total of all 4
3. False positive rate – when it is actually no, how often does it predict yes? = False Positives / (False Positives + True Negatives)
4. True Positive rate or Recall – when it is actually yes, how often does it predict yes? = True Positives / (True Positives + False Negatives)
5. Precision – when it predicts yes, how often is it correct? = True Positives / (True Positives + False Positives)
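The five formulas above translate almost word for word into code. The sketch below is one way to write them in Python; the function name `classification_metrics` and the sample counts are invented for illustration, and it assumes every denominator is non-zero (at least one actual yes, one actual no and one predicted yes).

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the five metrics from the four confusion matrix counts.

    Assumes every denominator is non-zero.
    """
    total = tp + fp + fn + tn
    return {
        "prevalence": (tp + fn) / total,           # how often is it actually yes?
        "accuracy": (tp + tn) / total,             # how often is the classifier correct?
        "false_positive_rate": fp / (fp + tn),     # actual no, predicted yes
        "recall": tp / (tp + fn),                  # actual yes, predicted yes
        "precision": tp / (tp + fp),               # predicted yes, actually yes
    }


# Hypothetical counts from screening 1,000 cases (illustrative numbers only).
metrics = classification_metrics(tp=80, fp=50, fn=20, tn=850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, accuracy comes out at 0.93 while precision is only about 0.62, which is a reminder that a single number rarely tells the whole story.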
These metrics typically give us the health of a classification system. There isn’t one “standard” for what the health should be. Ideally, you want as small a false positive and false negative rate as possible. But building hyper-accurate systems is very expensive, especially in cases where the prevalence is low. So it comes down to which trade-offs we are willing to make. For example, a cancer test that has a high false positive rate is problematic since it unnecessarily jeopardizes the health and happiness of someone falsely flagged. A missed cancer is usually far worse, though, so cancer tests need to have very low false negative rates. This is why healthcare systems often start with cheaper, less accurate preliminary screenings and then move those who are flagged over to advanced, expensive screenings that have higher accuracy.
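The point about low prevalence is easier to see with numbers. The sketch below rewrites the precision formula in terms of rates instead of counts, then holds the test quality fixed and varies only the prevalence. The function name `expected_precision` and the chosen rates (90% recall, 5% false positive rate) are assumptions picked for illustration, not figures from any real test.

```python
def expected_precision(prevalence, recall, false_positive_rate):
    """Precision implied by a test's rates at a given prevalence.

    Same formula as TP / (TP + FP), expressed per person screened.
    """
    true_positives = prevalence * recall
    false_positives = (1 - prevalence) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Same hypothetical test (90% recall, 5% false positive rate) at three prevalences.
for prevalence in (0.5, 0.1, 0.01):
    precision = expected_precision(prevalence, recall=0.9, false_positive_rate=0.05)
    print(f"prevalence={prevalence:.2f} -> precision={precision:.2f}")

# prevalence=0.50 -> precision=0.95
# prevalence=0.10 -> precision=0.67
# prevalence=0.01 -> precision=0.15
```

Under these assumptions, when only 1 in 100 people has the condition, most positive results are false alarms even though the test itself has not changed. That is one reason a positive on a preliminary screening is usually followed by a second, more accurate test.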
If you’re ever wondering if a classifier is working well, just plot the confusion matrix. And, more importantly, when you take important medical tests, understand the False Positive and True Positive rates.
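"Plotting" the matrix does not need a charting library. A minimal Python sketch that lays the four counts out as a 2x2 text table is shown below; the function name `format_confusion_matrix`, the column widths and the sample counts are all illustrative choices.

```python
def format_confusion_matrix(tp, fp, fn, tn):
    """Lay out the four counts as a 2x2 text table.

    Rows are what is actually true; columns are what the system predicted.
    """
    rows = [
        ("", "Predicted yes", "Predicted no"),
        ("Actually yes", f"TP = {tp}", f"FN = {fn}"),
        ("Actually no", f"FP = {fp}", f"TN = {tn}"),
    ]
    # Left-align the first two columns to fixed widths so the table lines up.
    return "\n".join(f"{label:<14}{yes:<16}{no}" for label, yes, no in rows)


# Hypothetical counts (illustrative numbers only).
table = format_confusion_matrix(tp=80, fp=50, fn=20, tn=850)
print(table)
```

Reading across a row shows how the system handled one kind of case; reading down a column shows what actually sits behind one kind of prediction.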
PS: It takes a repeated, intentional study of the Confusion matrix to feel comfortable with it. It is called the “Confusion” matrix for a reason.