Ask an analyst their biggest pain and they’ll usually say data. Sourcing it is bad enough. Delivering it causes migraines. But, of all the problems that go along with data, there’s one that creates more ulcers than everything else combined: quality.
There’s nothing more frustrating than spending hours working with messy, unstructured, and just downright painful data. It gets so bad that rather than try to fix the problem, most people end up side-stepping it and going back to the source. Even though the data they need probably already exists, they’ll go and get it all over again!
About the only satisfying thing about having bad data is the schadenfreude that goes along with it. There’s cold solace in knowing that regardless of how poor your data is, everyone else’s is just as bad.
The thing is, poor quality data doesn’t just appear from the ether. It’s created. Leave the dirty dishes for long enough and you’ll end up with cockroaches and cholera. Ignore data quality and eventually you’ll have black holes of untrustworthy information.
Here’s the hard truth: we’re the reason bad data exists. It’s nice to shift the blame, but if we’re the people who need that data the most, we’re also probably the people with the greatest incentive to fix it.
No-one likes cleaning things. I can’t speak for anyone else, but I really dislike emptying the pantry to get rid of all those crackers that fell down the back. Even so, most of us will do a big spring clean once a year to get to all those spots we’ve missed.
It’s the same with data: most teams make an infrequent but large effort to put their ‘house’ in order.
The irony is that, as well-intentioned as they are, big clean-ups only treat the symptom, not the cause. In the long run, they just lead to inefficiency, cost, and even more frustration.
Some connections are obvious even before they’re made. It doesn’t take much of a leap to think about having snack foods at the movies, enjoying ice creams in summer, or savouring a hot drink in the snow.
Other connections seem obvious after the fact. Cheese and wine or peanut butter and chocolate may not be revolutionary but in their own small way, they make the world a better place.
The most interesting connections, however, are those that at first seem totally counter-intuitive. Who would have thought that the Tamagotchi, an electronic pet that dies if you don’t push a button regularly, would end up selling over 76 million units?
It’s intuitive and natural to think that data quality is a technological problem. It’s not; it’s a cultural problem. Putting everything into a common analytical datamart doesn’t help much if everyone names things differently, does inconsistent quality checks, and uses different normalisation structures.
The real answer is that you need to create a culture that values standardised, efficient, and repeatable information. Just like it’s always impossible to find a specific tool in someone else’s garage, it’s impossible to re-use information without an up-to-date map.
Re-use, efficiency, and, most importantly, quality come from fixing the problem at the root rather than simply treating the symptoms. Rather than trying to manage a shanty of half-baked source tables, effective teams put the effort into designing, maintaining, and documenting their data. Instead of being a one-off activity, it becomes part of business as usual, something that’s simply part of daily life.
The best teams have complete clarity on the mappings between their conceptual, logical, and physical data architectures. And more importantly, every change gets documented in an up-to-date data dictionary.
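To make that a little more concrete, here’s a rough sketch of what keeping a data dictionary honest might look like in Python. It isn’t a prescribed method, just one way to picture the idea: compare the columns a table actually has against the ones the dictionary says it should have. Everything in it is an assumption made up for illustration, including the `DictionaryEntry` class, the `audit_table` function, and the sample column names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """One documented column: its physical name, type, and business meaning."""
    column: str
    dtype: str
    description: str


def audit_table(physical_columns: dict[str, str],
                dictionary: list[DictionaryEntry]) -> dict[str, list[str]]:
    """Compare a table's actual columns against its data dictionary."""
    documented = {entry.column: entry.dtype for entry in dictionary}
    shared = set(physical_columns) & set(documented)
    return {
        # Columns that exist in the table but that nobody has documented.
        "undocumented": sorted(set(physical_columns) - set(documented)),
        # Columns the dictionary describes but the table no longer has.
        "stale": sorted(set(documented) - set(physical_columns)),
        # Columns present in both places whose types disagree.
        "type_mismatch": sorted(
            name for name in shared
            if physical_columns[name] != documented[name]
        ),
    }


# Hypothetical sample data, invented purely for illustration.
dictionary = [
    DictionaryEntry("customer_id", "int", "Unique customer key"),
    DictionaryEntry("signup_date", "date", "Date the account was opened"),
    DictionaryEntry("segment", "str", "Marketing segment code"),
]
physical = {"customer_id": "int", "signup_date": "str", "cust_region": "str"}

report = audit_table(physical, dictionary)
print(report)
```

Run as part of business as usual rather than once a year, a check like this would flag drift between the documentation and the physical tables while it’s still cheap to fix.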
When it comes to analytical data quality, technology is critical. So is culture. Having one without the other is a recipe for inefficiency and wasted effort.
Evan Stubbs is the author of The Value of Business Analytics, a book that explains why teams fail or succeed. His most recent book, Delivering Business Analytics, explains the link between business analytics and competitive advantage, outlines the Data Scientist’s Code (a series of management principles that move organisations towards best practice), and provides solutions to twenty-four common business analytics problems.