Have we reached a point in the “big data” hype cycle where the term is overused and undefined? Is there legitimate opportunity with today’s new data sources? Are your analysts and technology platforms equipped to make the most of the opportunity?
To find out how real businesses are approaching analytics in a world where Hadoop, Hive, Pig and data scientists are all the rage, SAS invited executives from three top partners to discuss the reality and the hype surrounding big data. The panel included:
- Chris Twogood, VP of Product Management, Teradata
- Tony Hamilton, Enterprise Marketing Manager, Intel
- Vince Dell’Anno, Senior Executive, Accenture
Scott VanValkenburgh, Senior Director of Alliances at SAS, moderated the panel in Las Vegas at Analytics 2012 by posing a series of true/false statements and asking whether each one is reality or hype. Keep reading to see if you agree with their assessments.
Reality or hype: Big data’s true value is only in large amounts of unstructured data, mainly social.
Chris Twogood: Social media is more hype than reality right now. It’s more where people want to go than where they are. A lot of businesses are evolving from looking at transactions to looking at observations and interactions. Instead of just looking at a purchase, for example, they look at what you did before the transaction. They are complementing their transaction data with other data sources, and social media data will grow and become more impactful in that capacity. But most of our customers today are looking more at location analytics, customer sequencing and other areas.
Vince Dell’Anno: We do see social media come into play with companies that are viewing feeds and facilitating a real-time recommendation in context with who that person is. We are working with clients to determine not just who that customer is, but what their clicks indicate and how you can optimize your site as a result to add value to the customer experience.
Reality or hype: Using big data always leads to better predictions.
Dell’Anno: We struggle with talking about this as an industry. The total cost of ownership on the technology side is clear. On the business side, it’s hard. How do I know in advance that using a Twitter feed will help improve this model? You don’t know. Plus, there are other layers of value to that data beyond just predictive analytics.
Tony Hamilton: You really have to consider the requirements under which you have your big data solution deployed. Understand in some level of detail what you’re going to do with it in terms of storage, analysis and transportation. The crux is about value proposition.
Twogood: Look at how we take pictures – how many more we snap and save now that we have digital cameras. That’s how I see big data today. You don’t know what you’re going to need to know. Capture it all. It doesn’t mean you apply analytics to all of it – but you capture it all. As a group, we’re spending a lot of time thinking about how to data model big data. There needs to be a hybrid approach between sampling and big data analytics.
Dell’Anno: The big data market is really confusing right now. It’s an alphabet soup of technologies. We are mere mortals looking at what these things really mean. We’ve started to go back to basics, asking: What are your business use patterns? What are your challenges?
Reality or hype: Big data requires large-scale data management. Small data requires less data management. (Hadoop, Pig, Hive and Python vs. SQL, SAS Data Step, etc.)
Dell’Anno: For raw storage, you can’t beat Hadoop. It goes back to how we look at the data usage patterns: Am I looking at low latency, high volume or high latency, low volume? It all depends on what you’re trying to accomplish.
Twogood: Hadoop and data management is an oxymoron. You use workload-specific platforms to meet the needs. Hadoop is excellent for raw storage and refining. For what it does, it is the best platform. You need a unified data platform. Tying that together with software makes it transparent.
Dell’Anno: It used to be that structuring, quality and an elaborate data management process were needed first. Now, I can be doing analysis on the data and looking at data quality issues at the same time. As I get comfortable working with it, I can think about pushing it to a production environment.
Hamilton: We are taking Hadoop very seriously. We are looking at how to optimize enterprise technology on Hadoop. We are looking at efficiency challenges and how to make them better. It’s something you have to embrace and understand.
Reality or hype: Big data requires small math and unsophisticated analytic procedures and techniques.
Twogood: What customers are doing today with big data is not necessarily what they will be doing tomorrow with big data. The tools are relatively simple right now. A lot of calculations are hand-coded in Java or MapReduce. As more capabilities are developed, businesses will be able to do even more with their big data.
Dell’Anno: You can have sensor data that’s relatively thin. You can do some basic analysis on that, but that’s not where it’s interesting. When you do combinatorial math, it becomes interesting.
Reality or hype: Data scientists are completely different from traditional analysts.
Dell’Anno: The hype for this title is driven by challenges around the new technologies. Data scientists are expected to do more than coding. There’s a little bit of hype but there is a big demand.
Hamilton: We have business analysts and IT architects at Intel. What I’ve witnessed in the last six to eight months is that business analysts are starting to become more inquisitive about what is behind the architectures. They are challenging IT and getting involved.
Dell’Anno: On my team, fundamentally, they are mathematicians. They just really have a deep interest in the data. They’re motivated by the challenge to find insights that nobody else could find.
Twogood: People are trying to wrap all these skills into somebody who can do everything from coding MapReduce to sharing results. It’s an unrealistic expectation for a single person to span all those different capabilities.
Hamilton: Data scientists are the ones asking the questions. Data modelers are using tools and techniques to answer those questions.