Black Swans in Artificial Intelligence
This article is an excerpt from my forthcoming book that you can sign up for here: https://www.danrose.ai/book
A significant concept in understanding your data is that of Black Swans. The black swan theory was coined by Nassim Nicholas Taleb, statistician and author of Fooled by Randomness, a book I can only recommend.
For many years it was commonly believed that black swans did not exist. As black swans had never been observed, they did not exist in any data. Had you at that time been asked to bet on whether the next swan you saw would be black, you would probably have bet against it. It turned out that there were lots of black swans. They just had not been observed yet. They were first observed when Europeans reached Australia, which was full of black swans. In other words, the data only represented the known and observed world and not the actual world.
This is also an excellent time to mention that data is only historical. And as the saying goes in data science, historical data is quite bad, but it is the best we have.
Black swans are, on an individual level, very rare. The recent Covid pandemic is a testament to a rare black swan event.
One could also mention the financial crisis that began in 2007. In the historical data that many models relied on, the housing market had never crashed, so a model built on that data could not have predicted such an event.
But black swans are not rare on an aggregated level. They are more common than you would intuitively think. Pandemics, crashing housing markets or war in Europe seem like such unique events that they must be uncommon. But less media-attractive firsts and rare events happen all the time.
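To make the aggregation point concrete, here is a minimal sketch of the arithmetic. It assumes, purely for illustration, that there are many different kinds of rare events, that each has the same small yearly probability, and that they are independent of each other. Neither the numbers nor the function name prob_at_least_one come from real data; they are made up for this example.

```python
def prob_at_least_one(p_single: float, n_kinds: int) -> float:
    """Chance that at least one of n_kinds independent rare events occurs.

    Assumes every kind of event has the same probability p_single,
    which is a simplification made for illustration only.
    """
    # Probability that none of them happens, subtracted from 1.
    return 1.0 - (1.0 - p_single) ** n_kinds


if __name__ == "__main__":
    # Hypothetical numbers: each kind of event has a 1% chance per year.
    for n_kinds in (1, 10, 100):
        print(n_kinds, round(prob_at_least_one(0.01, n_kinds), 3))
```

Under these assumptions, a single 1% event is rare, but with 100 different kinds of such events the chance of seeing at least one in a year is roughly 63%.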
As such, you should expect black swan events. Whatever your data shows is just history, and relying on it should be done with the knowledge that the future can be vastly different. That also applies to the accuracy of models. AI models calculate their accuracy on a held-out part of the historical data, so the number reflects the past. As a result, you should always expect AI models to perform at least a bit worse in production than their reported accuracy suggests.
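The gap between measured accuracy and production accuracy can be shown with a toy simulation. This is only a sketch under stated assumptions: the data is synthetic, the "model" is a single threshold, and the change in the world is simulated by moving the true decision boundary from 0.0 to 1.0. All names (make_data, fit_threshold, run_experiment) are hypothetical and exist only for this example.

```python
import random


def make_data(n, boundary, rng):
    """Synthetic data: the label is True when x lies above the true boundary."""
    return [(x, x > boundary) for x in (rng.gauss(0.0, 1.0) for _ in range(n))]


def fit_threshold(train):
    """Toy 'model': the midpoint between the mean of each class."""
    pos = [x for x, y in train if y]
    neg = [x for x, y in train if not y]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(data, threshold):
    """Share of examples where the thresholded prediction matches the label."""
    return sum((x > threshold) == y for x, y in data) / len(data)


def run_experiment(seed=0):
    rng = random.Random(seed)
    # Historical world: the true boundary is 0.0.
    history = make_data(2000, 0.0, rng)
    train, holdout = history[:1500], history[1500:]
    threshold = fit_threshold(train)
    holdout_acc = accuracy(holdout, threshold)
    # Production world: the boundary has moved, and the model cannot know.
    production = make_data(2000, 1.0, rng)
    production_acc = accuracy(production, threshold)
    return holdout_acc, production_acc


if __name__ == "__main__":
    holdout_acc, production_acc = run_experiment()
    print("accuracy on held-out history:", holdout_acc)
    print("accuracy after the world changed:", production_acc)
```

The held-out score looks excellent because the held-out data comes from the same past as the training data. Once the underlying relationship changes, the same model scores clearly worse, and nothing in the historical accuracy number warned about it.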
You should also not try to predict black swans. They are, by nature, unpredictable. Instead, make sure your processes, especially the decision models and decision levels, take into account that a black swan can appear at any minute.
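One possible way to build this into a decision process is a simple guard in front of the model: if an input falls well outside anything seen in the historical data, the case is escalated to a person instead of being decided automatically. The sketch below is an assumption on my part and not a prescribed method. The names Decision and decide, the 10% margin, the 0.5 cut-off and the action labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str
    reason: str


def decide(prediction: float, feature: float,
           seen_min: float, seen_max: float, margin: float = 0.1) -> Decision:
    """Act on a model score only when the input resembles historical data.

    seen_min and seen_max describe the range of the feature in the
    training data. The margin is an arbitrary tolerance for illustration.
    """
    span = seen_max - seen_min
    too_low = feature < seen_min - margin * span
    too_high = feature > seen_max + margin * span
    if too_low or too_high:
        # The model has never seen anything like this, so do not trust it.
        return Decision("escalate_to_human", "input outside historical range")
    if prediction >= 0.5:
        return Decision("approve", "model score at or above cut-off")
    return Decision("reject", "model score below cut-off")


if __name__ == "__main__":
    print(decide(prediction=0.9, feature=5.0, seen_min=0.0, seen_max=10.0))
    print(decide(prediction=0.9, feature=50.0, seen_min=0.0, seen_max=10.0))
```

A guard like this does not predict a black swan. It only limits how much the process relies on the model when the world no longer looks like the data.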
For more tips, sign up for the book here: https://www.danrose.ai/book