Likelihood

July 31, 2019

Summary: Statistics < Data Science < ?

I could tell something must have gone wrong. The last stretches of the highway leading up to Tai Lam Tunnel — including the 3.8-kilometer tunnel itself — were brought to a complete standstill.

As it turns out, there was a crash involving two double-decker buses, injuring 77 people.

Before I ramble on, I'd like to wish everyone who was injured a speedy recovery.

I happened to be minutes behind the crash. While I was stuck in the heavy traffic with nothing better to do, I couldn't help but sit there and overthink the situation a bit.

Statistics

Back in the days of traditional statistics, the mathematical analysis would be something like this:

What are the odds of being involved in that accident, out of...

All the people
Who were headed that direction
Taking that particular means of transportation
At that specific time
(Plus a whole bunch of other factors)
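
To make that narrowing-down concrete, here is a minimal sketch in Python. Every number in it is a made-up placeholder rather than a real figure from the crash, and the `naive_odds` helper and the `groups` list are names I invented purely for illustration. All it does is divide the same count of people involved by a smaller and smaller group.

```python
from fractions import Fraction


def naive_odds(involved: int, exposed: int) -> Fraction:
    """Share of an exposed group that was involved in the event."""
    if exposed <= 0:
        raise ValueError("the exposed group must be non-empty")
    if not 0 <= involved <= exposed:
        raise ValueError("involved must be between 0 and exposed")
    return Fraction(involved, exposed)


# Hypothetical placeholder counts, NOT real data about the accident.
involved = 100
groups = [
    ("all the people", 7_000_000),
    ("headed that direction", 200_000),
    ("taking that means of transportation", 50_000),
    ("at that specific time", 5_000),
]

for label, exposed in groups:
    p = naive_odds(involved, exposed)
    # 1 / p is the "1 in N" form of the same ratio.
    print(f"{label:>36}: 1 in {round(1 / p):,}")
```

Each extra condition shrinks the denominator, so the odds look steeper with every step, even though nothing about the accident itself has changed.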

Don't worry, the result isn't alarming at all. But at the same time, the ballpark figure doesn't tell you much either.

Data Science

And then there is modern-day data science.

How likely is it for someone to be involved in that accident, given:

The number of past accidents
That occurred in the proximity of the crash site
Around the same time of the day
In a similar model of vehicle

Here's the thing, though. The prediction can range from wildly inaccurate to highly accurate, depending upon various conditions.

Has enough data been used? Is the data relevant?
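
The "enough data" question can be sketched with a Wilson score interval, one common way (my choice here, not the only one) to put error bars on an observed rate. The counts below are hypothetical placeholders and `wilson_interval` is a helper written just for this illustration. Both cases have the same observed rate of 2%, but very different amounts of data behind it.

```python
import math


def wilson_interval(events: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for an observed rate (z = 1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= events <= trials:
        raise ValueError("events must be between 0 and trials")
    p = events / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical counts: same observed rate, very different sample sizes.
for events, trials in [(1, 50), (1_000, 50_000)]:
    low, high = wilson_interval(events, trials)
    print(f"{events:>5} in {trials:>6}: rate {events / trials:.1%}, "
          f"plausible range {low:.2%} to {high:.2%}")
```

With only a handful of records the plausible range is so wide that the headline number is close to meaningless. With many records it tightens, though only if those records are actually relevant.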

Have enough elements been taken into account? Did we forget to consider things like the weather, the driver's physical state, the materials used, and so on?

And that is precisely the danger of relying too heavily on half-baked, "data-driven" "A.I." algorithms in any decision-making process.

I have shared similar thoughts on these risks in some of my past talks, except with different cases and scenarios.

(But now it's time for me to go to bed and stop overthinking random things for a few hours.)