Big data: Reality or hype

Q&A to learn how real businesses are approaching big data with analytics

From L to R: Scott VanValkenburgh, Vince Dell’Anno, Tony Hamilton and Chris Twogood discuss the hype surrounding big data.

Have we reached a point in the “big data” hype cycle where the term is overused and undefined? Is there legitimate opportunity with today’s new data sources? Are your analysts and technology platforms equipped to make the most of the opportunity?

To find out how real businesses are approaching analytics in a world where Hadoop, Hive, Pig and data scientists are all the rage, SAS invited executives from three top partners to discuss the reality and the hype surrounding big data. The panel included:

  • Chris Twogood, VP of Product Management, Teradata
  • Tony Hamilton, Enterprise Marketing Manager, Intel
  • Vince Dell’Anno, Senior Executive, Accenture

Scott VanValkenburgh, Senior Director of Alliances at SAS, moderated the panel in Las Vegas at Analytics 2012 by posing a series of true/false-type statements and asking whether each statement is reality or hype. Keep reading to see if you agree with their assessments.

Reality or hype: Big data’s true value is only in large amounts of unstructured data, mainly social.

Chris Twogood: Social media is more hype than reality right now. It’s more where people want to go than where they are. A lot of businesses are evolving from looking at transactions to looking at observations and interactions. Instead of just looking at a purchase, for example, they look at what you did before the transaction. They are complementing their transaction data with other data sources, and social media data will grow and become more impactful in that capacity. But most of our customers today are looking more at location analytics, customer sequencing and other areas.

Vince Dell’Anno: We do see social media come into play by companies who are viewing feeds and facilitating a real-time recommendation in context with who that person is. We are working with clients to determine not just who that customer is, but what are the indications of their clicks to you and how do you optimize your site as a result to add value to the customer experience?

Reality or hype: Using big data always leads to better predictions.

Dell’Anno: We struggle with talking about this as an industry. The total cost of ownership on the technology side is clear. On the business side, it’s hard. How do I know in advance that using a Twitter feed will help improve this model? You don’t know. Plus, there are other layers of value to that data beyond just predictive analytics.

Tony Hamilton: You really have to consider the requirements under which you have your big data solution deployed. Understand at some level of detail what you’re going to do with it in terms of storage, analysis and transportation. The crux is about value proposition.

Twogood: Look at how we take pictures – how many more we snap and save now that we have digital cameras. That’s how I see big data today. You don’t know what you’re going to need to know. Capture it all. It doesn’t mean you apply analytics to all of it – but you capture it all. As a group, we’re spending a lot of time thinking about how to data model big data. There needs to be a hybrid approach between sampling and big data analytics.

Dell’Anno: The big data market is really confusing right now. It’s an alphabet soup of technologies. We are mere mortals looking at what these things really mean. We’ve started to go back to basics, asking: What are your business use patterns? What are your challenges?

Reality or hype: Big data requires large-scale data management. Small data requires less data management. (Hadoop, Pig, Hive and Python vs. SQL, SAS DATA step, etc.)

Dell’Anno: For raw storage, you can’t beat Hadoop. It goes back to how we look at the data usage patterns: Am I looking at low latency, high volume or high latency, low volume? It all depends on what you’re trying to accomplish.

Twogood: Hadoop and data management is an oxymoron. You use workload-specific platforms to meet the needs. Hadoop is excellent for raw storage and refining. For what it does, it is the best platform. You need a unified data platform. Tying that together with software makes it transparent.

Dell’Anno: It used to be that structuring, quality and an elaborate data management process were needed first. Now, I can be doing analysis on the data and looking at data quality issues at the same time. As I get comfortable working with it, I can think about pushing it to a production environment.

Hamilton: We are taking Hadoop very seriously. We are looking at how to optimize enterprise technology on Hadoop. We are looking at efficiency challenges and how to make them better. It’s something you have to embrace and understand.

Reality or hype: Big data requires small math and unsophisticated analytic procedures and techniques.

Twogood: What customers are doing today with big data is not necessarily what they will be doing tomorrow with big data. The tools are relatively simple right now. A lot of calculations are hand-coded in Java or MapReduce. As more capabilities are developed, businesses will be able to do even more with their big data.

Dell’Anno: You can have sensor data that’s relatively thin. You can do some basic analysis on that but that’s not where it’s interesting. When you do combinatorial math, it becomes interesting.

Reality or hype: Data scientists are completely different from traditional analysts.

Dell’Anno: The hype for this title is driven by challenges around the new technologies. Data scientists are expected to do more than coding. There’s a little bit of hype but there is a big demand.

Hamilton: We have business analysts and IT architects at Intel. What I’ve witnessed in the last six to eight months is that business analysts are starting to become more inquisitive about what is behind the architectures. They are challenging IT and getting involved.

Dell’Anno: On my team, fundamentally, they are mathematicians. They just really have a deep interest in the data. They’re motivated by the challenge to find insights that nobody else could find.

Twogood: People are trying to wrap all these skills into somebody who can do everything from coding MapReduce to sharing results. It’s an unrealistic expectation for a single person to span all those different capabilities.

Hamilton: Data scientists are the ones asking the questions. Data modelers are using tools and techniques to answer those questions.


6 Comments

  1. Abhijit Kulkarni
    Posted October 22, 2012 at 2:39 am | Permalink

    I think this interesting quote summarizes it:

    More data beats better algorithms, but asking smarter questions beats having more data!
    Gregory Piatetsky-Shapiro, at PAW Boston – 2012

    • Alison Bolen, Editor of blogs and social content at SAS
      Posted October 30, 2012 at 11:45 am | Permalink

      Great point, Abhijit! I think the panelists would probably agree too.

  2. Posted October 30, 2012 at 8:29 am | Permalink

    This was a fascinating dialogue! Whoever organized it did an excellent job, particularly in selecting three very different panelists and points of view.

    The comment posted earlier, a quote sourced from a Kaggle-affiliated person, is good, regarding big data. Regarding the role of data scientists (that is my own area of interest) and whether they are or aren’t completely different from traditional analysts, I found Mr. Hamilton’s and Mr. Twogood’s final observations to be an astute summary:

    “Data modelers are using tools and techniques to answer questions… we have business analysts and IT architects… People are trying to wrap all these skills into somebody who can do everything from coding MapReduce to sharing results. It’s an unrealistic expectation for a single person to span all those different capabilities.”

    This is the very first time that I have seen this acknowledged. Of course, it would be great to find people who could do all the coding, as well as have the subject matter expertise to analyze the results in context and then present it effectively, both internally and to customers. That’s always been desirable though, even 20 (40?) years ago, no?

    • Alison Bolen, Editor of blogs and social content at SAS
      Posted October 30, 2012 at 11:50 am | Permalink

      I agree, Ellie. Finding someone who can do it all has always been a goal, but it’s not always a realistic goal. I do think the “data scientist” movement is bringing more focus on the soft skills of analytics, but that still needs to be balanced and sometimes managed across a team of people instead of trying to wrap everything into one person.

  3. Posted November 24, 2012 at 11:44 pm | Permalink

    I am trying to get back to my roots in database applications programming (’80s SAS & SPSS) for statistical analysis. It is true the high-thinking Statistical Specialist communicated with the database programmer to get what they wanted. They were always two separate entities.

    Now that I have been an Accounting Consultant for multiple businesses, I have been on the other side. But my love of problem solving has brought me back to the algorithm side.

    Do you think I can combine both worlds? Maybe as a team leader – the glue?

    Is it worth updating my skills in both areas or should I concentrate on only one?

  4. Scott Van Valkenburg
    Posted November 27, 2012 at 10:12 pm | Permalink

    Hi Melissa, thanks for your thoughtful post. I really do think it’s possible to update your skills in both areas. I believe this area and intersection – what I call the “translator” – is a highly sought-after skill in today’s marketplace. If you can effectively communicate both the IT and analysts’ worlds with the business side’s needs and goals, they’ll want to clone you (or at least give you a nice raise). Lastly, you might even consider calling yourself a newly minted “data scientist”. Hope this helps!
