What kind of big data problem do you have?
Does it seem like almost everything is a "big data" problem right now? And nearly every vendor is offering big data or big analytics solutions? Is big analytics more important than big data? And what is the difference? I've encountered this confusion in the market a lot over the past year as I've traveled the globe talking to business and government leaders about big data.
In the process of explaining the market to others, I've come up with a clearer way to illustrate the landscape. This explanation has helped a lot of businesses understand what type of analytic problems they actually have, and sometimes it helps them see that their problems are more of the big analytics variety instead of the standard big data issue.
Sometimes, for example, you don't have that much data, but it's still taking you five hours to run a marketing optimization job because of the number of possible offers. There really aren't a lot of records, but you have to do multiple passes on the data, running complex algorithms with each step. That's a big analytics problem – not a big data problem.
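To put rough numbers on that example, here is a minimal back-of-the-envelope sketch in Python. The customer, offer and pass counts are hypothetical figures chosen only for illustration, and `scoring_operations` is a made-up helper. It also assumes, as a simplification, that every pass evaluates every record against every offer.

```python
def scoring_operations(records: int, offers: int = 1, passes: int = 1) -> int:
    """Rough count of record-level evaluations.

    Simplifying assumption: each pass scores every record against every offer.
    """
    return records * offers * passes


# Hypothetical figures, for illustration only.
customers = 100_000

# A plain report reads each record once.
report_work = scoring_operations(customers)

# An optimization job revisits the same records for many offers and many passes.
optimization_work = scoring_operations(customers, offers=500, passes=20)

print(f"single-pass report: {report_work:,} evaluations")
print(f"optimization job:   {optimization_work:,} evaluations")
print(f"ratio:              {optimization_work // report_work:,}x")
```

The record count never changes in this sketch. Only the number of offers and passes grows, which is why the job feels "big" even though the data is not.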
Let's dig into those differences a bit further.
Our first step is to revisit the distinction we've made over the years between reactive and proactive analytics. Standard business reports, ad hoc reports, OLAP, and even alerts and notifications based on analytics are in the reactive category. Now, reactive analytics can still be very useful. It's required for a lot of finance and regulatory reporting, and it helps business users perform ad hoc analysis every day, but it is ultimately informing you about the past.
Proactive analytics, such as optimization, predictive modeling, forecasting and statistical analysis, is forward-looking. It allows you to identify trends, spot weaknesses or determine conditions for making decisions about the future. Beyond optimizing complex problems with many dependencies, it covers regression analysis and other advanced methods for proactive decision making.
The next thing we need to define is big data. Put simply, when you have exceeded the capacity of conventional database systems, you're dealing with big data. Before that, it's what I like to call "growing data." It is still a large amount of data, but it hasn't hit the limitations seen with big data.
Today, we can store lots and lots of data, but processing times have become excessive because traditional storage environments are not conducive to proactive analytics. When you have reached a point where processing times become unacceptable, you may be dealing with big data sizes, but you may also be dealing with big analytics.
To better understand the difference, let's create a chart with reactive and proactive analytics on the Y axis and the size of the data, from growing data to big data, on the X axis, as in Figure 2:
Now we can see the four major types of software solutions available in the analytics market today. They are:
Business intelligence (BI). If you are dealing with a large amount of data and providing reporting capabilities for end users so they can gain access to information, summarize data and even drill down into that data themselves, you are dealing with business intelligence applications. These solutions provide a strong look at various aspects of the company's past performance. That is BI. That is the lower-left quadrant in Figure 2.
Big data BI. Now, when data gets bigger and you're dealing with outside data sources or, as more companies are starting to see, pulling in unstructured data, your problems also get bigger. It takes users too long to get the information they need, or it's difficult to combine data sources fast enough to provide reports the way you used to. You need technology that allows quick access to data, but you're still providing reactive analytics. This situation is the most common big data scenario in the market right now, and most businesses are trying to solve it with SQL-based solutions. That is big data BI. It is in the lower-right quadrant of Figure 2.
Big analytics. As I mentioned before, it takes a different kind of analytics to support forward-looking decisions. If you're looking at customer preferences, markdown optimizations or fraud predictions, you need a different type of architecture. These problems typically involve growing data sizes and proactive analytics. It's not the size of the data that's slowing you down. It's the fact that you're making multiple passes on the data, which may take hours and hours to produce results, and running advanced analytic calculations that take longer to process. For today's issues, you need those answers in seconds or minutes. This is big analytics. It is located in the upper-left quadrant of Figure 2.
Big data analytics. Now, what about organizations that have a whole lot of data and are dealing with proactive decision making? Here, we're talking about hundreds of millions of SKUs across multiple retail stores. We're looking at future sources of data, too, such as telematics data in the auto industry, which can be useful for manufacturers and insurers. These are the types of problems most businesses really haven't dealt with in the past. And these aren't small data problems. You don't want to summarize that information. Manufacturers want to predict safety problems before they affect customers, while insurance companies want to adjust rate plans for the best drivers, for example. This is big data analytics. You'll find it in the upper-right quadrant of Figure 2.
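The four quadrants can be sketched as a small Python lookup. This is only an illustration of the landscape described above, not a product feature: the `Quadrant` enum, the `classify` function and the method lists are hypothetical names I've introduced here, and whether your data has "exceeded the capacity of conventional database systems" is something you have to judge for your own environment.

```python
from enum import Enum


class Quadrant(Enum):
    BI = "business intelligence (BI)"            # reactive, growing data
    BIG_DATA_BI = "big data BI"                  # reactive, big data
    BIG_ANALYTICS = "big analytics"              # proactive, growing data
    BIG_DATA_ANALYTICS = "big data analytics"    # proactive, big data


# Example methods taken from the reactive/proactive distinction in this article.
REACTIVE_METHODS = {"standard report", "ad hoc report", "olap", "alert"}
PROACTIVE_METHODS = {"optimization", "predictive modeling",
                     "forecasting", "statistical analysis"}


def classify(method: str, exceeds_conventional_db: bool) -> Quadrant:
    """Place a workload in one of the four quadrants.

    method: the kind of analytics being run (Y axis).
    exceeds_conventional_db: True once the data has outgrown a conventional
        database system (X axis); False means it is still "growing data".
    """
    key = method.strip().lower()
    if key in PROACTIVE_METHODS:
        return (Quadrant.BIG_DATA_ANALYTICS if exceeds_conventional_db
                else Quadrant.BIG_ANALYTICS)
    if key in REACTIVE_METHODS:
        return (Quadrant.BIG_DATA_BI if exceeds_conventional_db
                else Quadrant.BI)
    raise ValueError(f"unrecognized analytics method: {method!r}")


# The marketing optimization job from earlier: modest data, proactive analytics.
print(classify("optimization", exceeds_conventional_db=False).value)
```

Keeping the two axes as separate inputs mirrors the chart: the kind of analytics and the size of the data are independent questions, and you need answers to both before choosing an architecture.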
My point here is not that one is better than another, but that each does different things and requires a different architecture. As you look at what's going on in the market and in your business, you must understand the differences among these four areas and how each type of problem can be solved.
Analytics continues to be a broad term in the market, but it's worthwhile to look at the problems you are trying to solve and then determine where you fall in this landscape. It will help you plan your next steps in your big data journey.