Recall and precision: The most important AI concepts for product managers
If you are a product manager, an entrepreneur or in any way responsible for the business side of an AI project, you need to get the concepts of recall and precision. Getting this straight will allow you to make better product decisions and at the same time have more meaningful conversations with the tech team. So do yourself a favor and read this carefully.
Outcomes of predictions
Before we begin we have to get some basics in order. When a machine learning model classifies the input it receives - say, is this a picture of a cat? - a number of different outcomes can occur:
True Positive (TP): The picture was of a cat and the ML-model classified it as such
False Positive (FP): The picture was not of a cat but the ML-model classified it as a cat anyway
True Negative (TN): The picture was not of a cat and the ML-model correctly classified it as not a cat
False Negative (FN): The picture was of a cat but the ML-model did not classify it as a cat
As you’ll see in a minute, this concept is extremely relevant when building real-life AI applications.
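To make the four outcomes concrete, here is a minimal Python sketch that counts them for a batch of cat/not-cat predictions. The labels and predictions are made up purely for illustration:

```python
# Made-up ground-truth labels and model predictions: 1 = cat, 0 = not a cat.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cat, guessed cat
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # no cat, but guessed cat
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # no cat, guessed no cat
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cat, but guessed no cat

print(tp, fp, tn, fn)  # 3 2 2 1
```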
Accuracy
First, let’s talk about accuracy. It often feels to me that this is the default understanding and the default way to measure the quality of applied AI. Sometimes that makes sense and sometimes it doesn’t. Accuracy is how often our model classifies its input correctly, out of all the input it gets. In math notation that looks like this:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Simple, right? Out of all cases, how often is the model right? That might seem like the only obvious way to measure, but let’s look at some other ways.
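As a quick sketch, this is what accuracy looks like as a function of the four counts, reusing the made-up counts from the earlier sketch:

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# With the made-up counts from the earlier sketch:
print(accuracy(tp=3, tn=2, fp=2, fn=1))  # 0.625 -> right in 5 of 8 cases
```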
Precision
Precision is conceptually a little harder to grasp, but once you get it, it makes a lot of sense. In a nutshell, precision is how often you are right when you guess “cat”: of all the times the model predicts a cat, how many of those pictures actually contained a cat. The math looks like this:
Precision = TP / (TP + FP)
Notice how the False Positives are in the denominator. This means that you get a lower score for guessing “cat” when there is no cat. You should also notice that False Negatives (FN) are not in the equation. That means missing an actual cat does not hurt your precision; you are only punished for guessing a cat when there is no cat.
To sum up precision: to achieve high precision, you should only guess “cat” when you are very sure that you see a cat.
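Translated into code, precision is just the formula above. A minimal sketch, again with the made-up counts from earlier:

```python
def precision(tp: int, fp: int) -> float:
    """Of all 'cat' guesses, the fraction that really were cats."""
    return tp / (tp + fp)

# With the made-up counts from the earlier sketch:
print(precision(tp=3, fp=2))  # 0.6 -> 3 of our 5 'cat' guesses were real cats
```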
Recall
Lastly we have recall, the least known of the three but equally important. Recall is how many of the True Positives (the actual cat pictures) your model manages to catch. It does not take into consideration how often you incorrectly guess “cat” where there is no cat. Recall is, in other words, the “better safe than sorry” score. The math looks like this:
Recall = TP / (TP + FN)
So if you want to have a high recall you should, contrary to precision, be optimistic and guess “cat” even if you are not sure.
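And recall in code, completing the trio with the same made-up counts:

```python
def recall(tp: int, fn: int) -> float:
    """Of all actual cats, the fraction the model caught."""
    return tp / (tp + fn)

# With the made-up counts from the earlier sketch:
print(recall(tp=3, fn=1))  # 0.75 -> we caught 3 of the 4 actual cats
```

If you use scikit-learn, you don’t have to write these yourself: accuracy_score, precision_score and recall_score in sklearn.metrics compute the same numbers directly from the raw labels and predictions.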
How do I use this information?
The idea here, and the reason this matters, is that different business goals require different measurements. Some will require high recall, some will require high precision, and so on. Some cases could even use a combination. For example, you can have a minimum requirement for recall and on top of that aim for high accuracy.
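As a sketch of what such a combination could look like in practice: many models output a score, and the decision threshold is where this trade-off is made. The sketch below sweeps thresholds, keeps only those that meet an assumed recall floor of 0.9, and picks the one with the highest accuracy. The scores, labels and the 0.9 floor are all made up for illustration:

```python
# Made-up labels (1 = positive) and model scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.4, 0.8, 0.6, 0.3, 0.65, 0.7, 0.2, 0.55, 0.45]

def metrics_at(threshold):
    """Recall and accuracy when predicting 'positive' for scores >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), (tp + tn) / len(y_true)

MIN_RECALL = 0.9  # assumed business requirement, not a universal rule
candidates = [(t / 100,) + metrics_at(t / 100) for t in range(101)]
feasible = [c for c in candidates if c[1] >= MIN_RECALL]
threshold, rec, acc = max(feasible, key=lambda c: c[2])
print(threshold, rec, acc)  # 0.46 1.0 0.9 with these made-up numbers
```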
My most important point, though, is that this is a business/product decision. It is not for the tech people alone to decide what is important here; it directly affects user experience and business goals.
I’ve tried to put recall and precision into real-life cases below to make this more concrete.
Credit card fraud - Please be precise
Let’s say you want to build a credit card fraud detector. So every time someone swipes their little piece of plastic through a machine, you would like to classify whether the transaction about to happen is fraud.
In this case, True Positives are the transactions that are fraud and that our detector classifies as fraud.
In my mind this calls for high precision. Only a very small share of transactions are actually fraud. At the same time, users who have their cards declined for no reason (the False Positives) will be extremely frustrated. So remember the False Positives being in the denominator? Avoiding False Positives will give a higher precision score, and that is what you want here if user satisfaction is your goal.
The autonomous car - High recall or hit the brakes
Now let’s build an autonomous car. Or at least a model that can predict whether an object ahead of the car is a human and should be avoided. In this model, a True Positive means “the object is human and we classified it as such“.
In this case it’s better to be safe than sorry. The car should rather brake when it is not sure than drive on.
We get this with high recall. Recall has the False Negatives in the denominator, meaning the score drops every time we miss a human.
You can put your money on accuracy
In some cases accuracy is all we need. Say we are building a model that predicts who is going to win a football tournament. In this case we don’t care about anything other than being as right as possible overall. A True Positive here would be correctly guessing the winner. It doesn’t matter how we are wrong, only how often, and that is exactly what the accuracy equation punishes.
Conclusion
I would actually advise everyone working with AI to use these concepts actively early in a project. Have a discussion about how you want to measure success. Is it accuracy, precision or recall? Or maybe a combination like “highest possible accuracy given a minimum recall”?
If you do not have this discussion, you either leave the decision to the tech people who, no matter how talented they are, do not have the same business insight and customer engagement as you. Or you default to accuracy without that being a conscious choice.