Are you late to the big data party?

You may have missed the important up-front work if you are

by Jill Dyché, Vice President, SAS Best Practices

I’m not gonna lie. I occasionally find myself begrudging what I call the data-come-latelies. They’re consultants, journalists, bloggers and vendors who have recently embraced all things data. Big data. Extreme data. Multistructured data. Data everywhere. It’s as if they’re the first at the cocktail reception to have read the new novel, and they want to explain the story to the rest of the partygoers.

But in reality they’re late to the party. My fellow revelers and I have been livin’ la data loca for a decade or longer. Over the years we’ve found ourselves normalizing, correcting, annotating, moving, verifying, denormalizing, loading, analyzing and defending data. Like most parties, sometimes it’s fun. Sometimes there are adult beverages. And sometimes you’d just as soon have stayed home.

Bringing data together from across heterogeneous data sources is hard work. Even the brightest data scientist would admit he spends most of his time preparing data to be analyzed, not analyzing it. Sure, data preparation is cumbersome. But that’s because it comes from so many different places. The data scientist has to know the data source. And then he has to know where other versions of that data are. He has to access those data sources. He has to reconcile the data. Many data scientists have enlisted the help of programmers so they can actually do their day jobs. It’s the old 80-20 rule, and right now 80 percent of the time is being spent just trying to make sense of it all.

And this isn’t just true of big data – it’s true of all data. One CIO who says his company isn’t quite ready for big data whispered, “We have multiple versions of the single version of the truth.” Big data or no, that’s the brutal reality at most companies.

What if we didn’t have to rummage around in all those varied, often-contradictory, deceptively complex, mixed-legacy source systems? What if we could instead navigate to the authoritative version of the data and provision it back to our operational systems and analytics applications? What if we had a tool to automate it all?

Westwood Vacations, the fictional hospitality company featured in a new e-book, has the tool. It’s called data virtualization, and – as you’ll read – it helped the company increase the velocity of high-value analytics by automating connections to the sources, abstracting the data layer and … On second thought, I’ll let you read how they did it. The result was getting reports into the hands of decision makers. Costs saved. Revenue generated. Customers contacted with relevant messaging. Mission accomplished.

Maybe the tale of Westwood Vacations and its pragmatic executive team will inspire you to add data virtualization to your analytics toolbox. Or maybe you’ll continue to rely on a merry band of programmers, data stewards, business analysts and data scientists who are still struggling to figure out where all the data really is. Sure, it’s keeping them busy. But it’s no party.

JILL DYCHÉ is an acknowledged speaker, author, and blogger on the topic of aligning IT with business solutions. As the Vice President of SAS Best Practices, she speaks, writes, and blogs about the business value of analytics and information.

Jill has counseled executive teams and boards of directors on the strategic importance of their information investments. Executives from companies including Charles Schwab, Verizon, and Microsoft have relied on Jill’s counsel for data strategy planning and execution.

