The Analytical Nerve Center

By Sheldon Shaw, Cyber Analytics, SAS

It has become apparent that Big Data analytics has moved beyond hype to the point where it’s a necessary conduit from massive captured data stores to actionable insight. Many organizations are in the position where they need “something” from their Big Data projects, but aren’t sure what that “something” is.

It’s a critical competitive differentiator: The gulf between enterprises that are scrambling to find value from Big Data projects and those that are moving forward with a clear program is widening daily. And that competitive gulf will further increase as newer implementations of the Hadoop Distributed File System (HDFS) speed analytical discovery.

We’ve seen an enormous uptake in Hadoop projects in the last two years. The open source community that supports the underlying infrastructure—built on the HDFS storage architecture and the MapReduce processing architecture—has produced an enormous amount of enterprise-quality code to run on this core. Much like the evolution of the open source operating system Linux into an enterprise-ready platform, Hadoop is no longer suspect. Enterprises know there is a support structure that will keep delivering new applications and resource management frameworks, allowing them to continue down the Big Data path and push its potential beyond our imaginations.
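For readers newer to that core, here is a minimal sketch of the MapReduce pattern behind it, written as Hadoop Streaming–style mapper and reducer scripts in Python. The word-count task, the single-file layout, and the role-selection flag are illustrative assumptions, not any particular vendor’s implementation.

```python
#!/usr/bin/env python
# Minimal illustration of the MapReduce pattern (word count) in the
# Hadoop Streaming style: the mapper and reducer each read stdin and
# write key/value pairs to stdout.
import sys

def mapper():
    # Emit (word, 1) for every word on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop Streaming delivers mapper output sorted by key, so counts
    # for the same word arrive contiguously and can be summed in one pass.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rsplit("\t", 1)
        if word != current_word:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

if __name__ == "__main__":
    # A single file serves both roles; pass "map" to run the mapper,
    # anything else to run the reducer (an illustrative convenience).
    mapper() if sys.argv[1:] == ["map"] else reducer()
```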

But there’s a trap for organizations looking for immediate results—their “hyperloop” moment—from their Big Data projects. Pressure to produce can outpace ingenuity. Enterprises can invest in technology and skilled employees without understanding organizational data flow, data models, and data architectures. The enterprise must understand what it needs to know, and how its data can deliver that knowledge.

In my daily discussions with clients, I’ve found one characteristic that distinguishes successful Hadoop implementations from the less successful ones. Clients who get the most out of their Hadoop installations have moved past the idea of storage. More precisely, they have moved beyond thinking of Hadoop as merely a storage point in their infrastructure. It’s what they envision their Hadoop clusters doing that’s important.

The most visionary clients that I meet see an analytical nerve center evolving, one that allows them to perform analytics not just at a single point of their operation, but throughout. It’s a central nervous system of the organization with access to multiple data flows.

In this sense, I don’t subscribe to the “data lake” view of the world. More accurately, we’re discussing a nervous system that has access, if it so chooses, to machine data, customer data, security data, and all forms of flow and log data. The opportunity for these clients lies not only in the Internet of Things, but in the analysis of things. These clients have the opportunity to apply an analytical process earlier, at the point where data is created, rather than waiting until later in the extract-transform-load (ETL) lifecycle of the transactional database.

The analytical nerve center applies analytical action to data even before sending it to the Hadoop environment. Many falsely believe that if we put all our data in Hadoop, we will eventually find something. In reality, a degree of analytical rigor must be applied before Hadoop can make sense of the data.
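To make that concrete, here is a minimal sketch of what applying analytical rigor ahead of Hadoop can look like: scoring events as they stream in and routing anything interesting for immediate action, while everything still lands in the cluster for deeper analysis. The field names, scoring rule, threshold, and file-based sinks are assumptions for illustration only.

```python
# Sketch: score each incoming event before it lands in Hadoop, and
# surface high-scoring events for immediate action.
import json
import sys

ALERT_THRESHOLD = 0.8  # assumed cut-off for "interesting" events

def score(event):
    # Placeholder analytic: a real deployment would apply a trained model
    # or statistical rule here, closer to where the data is created.
    return min(1.0, event.get("bytes_out", 0) / 1_000_000)

def route(line, raw_sink, alert_sink):
    event = json.loads(line)
    event["score"] = score(event)
    raw_sink.write(json.dumps(event) + "\n")        # everything still lands in storage
    if event["score"] >= ALERT_THRESHOLD:
        alert_sink.write(json.dumps(event) + "\n")  # but alerts are acted on now

if __name__ == "__main__":
    # Illustrative sinks; in practice these would be HDFS paths or queues.
    with open("raw_events.jsonl", "a") as raw, open("alerts.jsonl", "a") as alerts:
        for line in sys.stdin:
            route(line, raw, alerts)
```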

There will be many further developments within the Apache community that will allow us to interact with Hadoop in new ways. The analytical nerve center that I am describing won’t exist in a mutually exclusive environment; it and Hadoop both have a role to play.

What will drive ingenuity in the short term is the unrelenting need to deliver actual business results from Hadoop. The novelty of constructing large clusters is gone. Now it’s time for thought leaders to press forward with analytics, not just on top of Hadoop, but at the edge of the network.
