The seven steps of big data delivery
Everything you need to know to get started
Remember Tim Allen's character on the 1990s hit show, Home Improvement? When Allen – outfitted in his hard hat and tool belt – starts talking shop, he makes a simian grunt that represents the particular pleasure men take from high-performance machines: "Arghh." Belt sander? Arghh. Sliding miter saw? Arghh. Classic Ford truck with a blown and injected 426 Hemi? Arrrgggggghhhh!
The big data trend represents the evolving need to process large amounts of data with a new crop of technology solutions that aren't necessarily your father's database. So, what does a company need to consider when getting started with big data?
Before we go too far, here's my definition of big data: The emerging technologies and practices that enable the collection, processing, discovery and storage of large volumes of structured and unstructured data quickly and cost-effectively.
Big data – from financial trades to human genomes to telemetry sensors in cars to social media interactions to Web logs and beyond – is expensive to process and store in traditional databases. To solve that problem, new technologies use open source solutions and commodity hardware to store data efficiently, parallelize workloads and deliver screaming-fast processing power.
As more IT departments research big data alternatives, the discussion centers on stacks, processing speeds and platforms. Although IT departments are savvy enough to grasp the limitations of their incumbent technologies, many can't articulate the business value of these alternative solutions. Fewer still can explain how they will classify and prioritize the data once they identify it. Enter big data governance.
As companies develop their big data business cases, the platform and speed discussions are only part of the overall conversation about big data delivery. In reality, we're seeing seven steps necessary for realizing the full potential of big data:
By establishing processes and guiding principles, governance sanctions behaviors around data. Big data needs to be governed according to its intended consumption. Otherwise, the company risks alienating its constituents, not to mention overinvesting.
Most of the early adopters charged with researching and acquiring big data solutions focus on the Collect and Store steps at the expense of the others. The question is implicit: "How do we gather all these petabytes of data, and where do we put 'em all once we have 'em?"
But the processes for defining discrete business requirements for big data still elude many IT departments. Business people often see the big data trend as just another pretext for IT résumé building with no clear endgame. This environment of mutual cynicism is the single biggest reason big data efforts never get past the tire-kicking phase.
As Lorraine Lawson wrote in a recent IT Business Edge blog post, "The only way to ensure your analysis is sound is to ensure you have a governance program in place for big data."
Entrenching data governance processes on behalf of a big data effort ensures that:
In short, data governance ensures that big data is applied in ways that drive business results. It's an insurance policy that the right questions are being asked. That way, the immense power of new big data technologies is truly harnessed to make processing, storage and delivery faster, more cost-effective and more nimble than ever.
This story appears in the Fourth Quarter 2012 issue of