Frequentism vs Bayesianism - the paradigm war that will make you understand AI better
Frequentism and Bayesianism are two paradigms, or schools, of statistics. I'm by no means an expert in statistics, and you don't have to be either to read this post. It's actually pretty simple to understand without any prior knowledge of statistics.
So why read a blog post about two statistical paradigms? Because it will make you understand AI on a whole different level. I promise.
There are some extremely useful lessons in understanding this paradigm war. It will make you understand the gray areas of AI, by which I mean the reason why AI often does not deliver hard conclusions. Boiled down to its essence: the two paradigms offer completely different ways of predicting and understanding the world, often with completely different results, while still being equally correct. That's it. In statistics, which is very often the foundation of AI models, different results can be equally correct. How can that be possible? Let's dig deeper with an example.
I'm stealing this great example from the book The Signal and the Noise by Nate Silver, a book that will definitely make you think differently about the world. I really recommend it.
The example is simple. You come home early one day and find underwear in your bedroom that is definitely not yours. How do you explain this? Did your partner cheat on you, or is there another explanation? The frequentist will only look at the observed evidence, the underwear, which suggests a very high probability of adultery. The Bayesian would take prior beliefs into account, such as overall adultery rates and the trust between you and your partner. With high trust you might dismiss the evidence.
Both paradigms provide equally valid ways to view the world, but each comes with pros and cons. In this underwear example I at least don't see a right or wrong way to approach the problem, but I do see that the outcomes might be very different depending on which school you prefer in this scenario.
Now let’s learn a bit about each paradigm.
The frequentist paradigm
The core of frequentism is that only observed data is taken into account when calculating the probability of an event. So what does that mean? It means that when calculating the probability of an event, you take a sample of the domain you are trying to measure and observe the frequency of events. Say you want to measure the probability of landing tails when flipping a coin. The maths look like this:
P(A) = n/N
The probability of A is the number of times the event happened (landing tails) divided by the number of possible events (the number of coin flips). The frequentist reservation is that the world is assumed to be random and the predicted results only hold in the long run. It's also crucial that what we do can be replicated. In the coin toss problem we will see 50% tails over the long term, and that makes a lot of sense.
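To make the idea concrete, here is a minimal sketch in Python (my own illustration, not from Silver's book) that estimates the probability of tails purely as an observed frequency, exactly as P(A) = n/N says:

```python
import random

def estimate_tails_probability(n_flips: int) -> float:
    """Frequentist estimate: number of observed tails divided by total flips (P(A) = n/N)."""
    tails = sum(1 for _ in range(n_flips) if random.random() < 0.5)
    return tails / n_flips

# The estimate converges towards 0.5 as the number of flips grows.
for n in (10, 1_000, 100_000):
    print(n, estimate_tails_probability(n))
```

With only 10 flips the estimate can be way off; with 100,000 it sits very close to 0.5. That is exactly the "long run" the frequentist relies on.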
The Bayesian paradigm
Bayesianism offers a way to take our prior beliefs into account when calculating a probability. In other words, we are not only looking at the observed data but also including our own biases. So the Bayesian statistician starts with a prior belief, observes data, combines the data with the prior, and ends with a posterior belief. The posterior is the new probability. Bayesians split the problem into two cases: the probability of the evidence given that the event is happening, and the probability of the evidence given that the event is not happening. The maths looks like this:
P = xy / (xy + z(1-x))
x = your prior probability of the event
y = conditional probability of the evidence given that the event did occur
z = conditional probability of the evidence given that the event did not occur
It might look a little confusing, but the explanation is simple. Say you get a fever and you want to know the probability that you now have Covid-19. Let's try it out. I'm going to be loose with the numbers here, but the idea is the same. In Denmark, where I'm located, the Covid-19 virus is not very present. Only about 1,200 out of 5.8 million people are currently infected. That's 0.02%, and that's our x. Due to social distancing, other viral diseases are rare, and as a result fever is rare too. So I'm going to put the probability of having a fever without having Covid-19 low. Let's say 20%. That's our z. If you have Covid-19, then fever is very likely since it's a common symptom. Let's say 80%. That's our y. So the math now looks like this:
P = 0.0002 * 0.8 / (0.0002 * 0.8 + 0.2 * (1 - 0.0002)) ≈ 0.08%
So what just happened? Before you had a fever, your chance of having Covid-19 was 0.02%. After this new piece of evidence (the fever), your chance is still low but has risen to 0.08%. Interesting, right?
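If you prefer code over formulas, here is a small sketch of the same Bayesian update (my own illustration, using the rough numbers from above):

```python
def posterior(prior: float, p_evidence_given_event: float, p_evidence_given_no_event: float) -> float:
    """Bayes' theorem in the form used above: P = xy / (xy + z(1 - x))."""
    x, y, z = prior, p_evidence_given_event, p_evidence_given_no_event
    return (x * y) / (x * y + z * (1 - x))

# Covid-19 example: prior 0.02%, fever given Covid-19 80%, fever without Covid-19 20%.
print(posterior(0.0002, 0.8, 0.2))  # ~0.0008, i.e. roughly 0.08%
```

The prior goes in, the evidence reweights it, and the posterior comes out. That is the whole Bayesian loop in three lines.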
If you want to know more about Bayes and conditional probability, the Wikipedia article is really good: Bayes' theorem.
So what approach should you take?
Bayesian statistics takes your prior beliefs, or in other words your biases, into account. If you think your priors would not do objective harm, you might go for a Bayesian method. If you have an easily repeatable, low-bias situation, you might prefer a more frequentist approach. This is of course an oversimplification; it's never just that simple.
One thing I would add: when I hear people say things like "I'm really data driven in my decision making" or, even worse, "I only act on data, never on gut feeling", I see a warning sign. Believing that data should be blindly trusted is a naive world view. The fundamental error is that you never know what data you haven't seen, or whether the data is biased. One good example is the housing market crash in 2007. The logic was that the housing market had never crashed before, so by frequentist reasoning it could not crash. The result was that the market was pushed to extremes it had never been in before.
How does this relate to AI?
So, to make the connection to AI: a lot of AI is based on statistical methods. Several approaches can be taken to make AI understand the world, and they may be very different. There's no single correct way. You just have to be wary of whatever results you see, since the underlying mechanics are not black and white. Preparing to be wrong is my best advice.
I would like to end the post with a personal pet peeve. The media today seem to have a clear tendency to prefer frequentism regardless of the matter, and rarely offer the alternative view. To make it even worse, journalists often demand that politicians be totally sure their policies will have the desired effects, when being sure is simply not possible. That discourages testing new policies on a small scale before going all in. If a bit more Bayesianism were applied in journalism, giant political failures could be avoided.