The great data swindle

In which plenty of mistakes are made with bad data.


I made a comment today that the computing power in my scientific calculator was enough to send men to the moon and back, yet the computers at work often fall over because too many people are using them. In response someone mentioned that a space shuttle once crashed because of a misplaced full stop, and that it is all just human error. Perhaps this is the problem; the world has got too complicated. There is just too much data.

The BBC ran a good article yesterday about data overload, and I think it is pretty insightful. We have more information at our fingertips than ever before in human history, but is it just too much noise?

I’ve worked as a data analyst for a number of companies and I have often produced data for data’s sake. We have data on the gender mix for NHS spectacle wearers under 45? We must have a weekly report for that then, despite the fact that the numbers are so low that they are meaningless.
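
To put a rough number on "so low that they are meaningless", here is a minimal sketch. It assumes weekly counts behave like a Poisson process, which is my simplification and not something measured from any real report; the function name and the example counts are made up for illustration. Under that assumption the typical week-to-week swing is about one over the square root of the count.

```python
import math


def weekly_noise(expected_count):
    """Rough relative week-to-week noise for a count of this size.

    Assumes the count is Poisson distributed (an illustrative assumption),
    so the standard deviation is sqrt(count) and the relative swing is
    1 / sqrt(count).
    """
    return 1 / math.sqrt(expected_count)


# Hypothetical weekly counts: a tiny segment, a medium one, a large one.
for count in (16, 400, 10000):
    print(f"{count:>6} a week: swings of about {weekly_noise(count):.0%} are just noise")
```

With 16 people a week, a swing of a quarter in either direction is nothing to panic about; with 10,000 the same swing would be worth a meeting.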

What happens when there is a slight fluctuation in the number of people with a P in their surname having a sight test? Complete panic and an overhaul of the current direction and strategy, until the next week, when the blip disappears because… well, it was just a blip and blips happen.

When presenting I would often get stopped and asked whether I had the number for subset X of sub-subset Y of segment Z for people who do this on a Monday. Having large volumes of information seems to mean that people ask detailed, specific questions and expect an answer on the spot. The assumption seems to be that we have all the data in a brain-based pivot table, ready to reel off at the drop of a hat.

Data is used as a safety blanket; it does not seem to be used as a reason to do something but as a way of protecting ourselves when something goes wrong. When you look at what can be achieved with little (but good) data, it shows that knowing the minutiae of everything means you lose sight of the bigger picture.

In 1854 John Snow traced a cholera outbreak to its source using nothing more than a map plotted with dots, and it helped save numerous lives. Today he would be asked to identify whether or not women in the 45-55 age bracket with a history of smoking were more affected. Or whether there was a socio-economic reason for the outbreak: do C1s suffer more than E3s? We would be answering so many questions on the way to the answer that we would not realize that it was due to a bad water pump.

We spend billions on analysis to try to make a few more sales, whilst one man got rich simply by judging whether a business looked good. Warren Buffett did not make his billions on the back of a team of data analysts producing a myriad of reports.

The BBC article suggests that a better way to present data is through visualization, producing images that allow lay-data people to understand it. I agree with this philosophy, but it often creates issues. Lay-data people want to appear smart, so they ask questions to demonstrate how on the pulse they are. More than once I have been in a meeting where the focus has been on an obscure metric that they want to know, as opposed to the big fact that sales are 20% down and the business is losing money.

I realize that as a data analyst I am basically admitting that my profession is that of a glorified snake oil salesman selling an Excel Elixir, and to an extent that is true. Data should not be used as a tool to make a decision; it is a way of finding out whether you made the right or wrong choice. And if you were wrong I will find you the data to tell others you were right.

Author: Daddysaurus

Ah, so you worked out the riddle. You just needed to use dwarfish and the doors to Geek Ergo Sum opened. Or perhaps you just used Google. Either way you are here, on my little corner of the Internet.

