Cloud and Big Data: Your guide to a successful relationship
By Tamara Dull, SAS
At the March 2013 South by Southwest (SXSW) conference, Tim Beyers of The Motley Fool said, "… in many ways, [cloud and big data] are becoming one and the same – cloud resources are needed to support big data storage and projects, and big data is a huge business case for moving to cloud." When it comes to the marriage of big data and cloud technologies, it's a match made in heaven.
But as romantic as that sounds, we all know that a strong relationship doesn't happen overnight and often requires a lot of hard work. Cloud and big data are no exception.
What does big data bring to the table?
In this highly hyped relationship, big data – in addition to her sex appeal – is the breadwinner, bringing usable information to your organization, which is what big data is all about. But before adding big data to your corporate mix, you need to answer these questions:
- How much big data do you really have? Does the volume warrant extending your current infrastructure?
- What is the nature of your data – structured, semistructured and/or unstructured? Do you currently have the infrastructure and technology to support these different types of data?
- Where is your big data coming from – internal, external and/or open sources? Big data brings with it an abundance of data sources – some new and some old – and these sources are rapidly growing.
Most importantly: Before embarking on any big data initiative, identify the business issue being addressed and the expected value it will bring.
What does cloud bring to the table?
If big data is the sexy breadwinner, then cloud brings a reliable, stable foundation – i.e., the infrastructure – to the relationship. Cloud offers multiple infrastructure options:
- Internal private cloud: a virtualized, dedicated infrastructure inside your firewall.
- External private cloud: a shared but customized infrastructure hosted outside your firewall.
- Public cloud: a shared infrastructure hosted by a third party.
- Hybrid: a mixed environment of on-premises, private cloud (internal and external) and public cloud.
Cloud is also known for bringing faster innovation, agility, rapid scalability and a lower total cost of ownership (TCO) to its relationships.
Getting to your happily ever after
If you've answered all the questions I raised about big data, and understand what cloud can deliver, you're ready to take the plunge. But like any relationship, there are things you'll need to figure out along the way. Here are seven dynamics you'll need to consider:
Open source software:
At the heart of the big data hype is open source software, most notably Hadoop and its many related projects. The good news is that open source software is free, but it requires a solid understanding of the open source ecosystem, whether it's installed on-premises or in the cloud.
Data storage and processing:
Big data has many compelling use cases, including the staging, preprocessing, processing and storing of data, both short term and long term. Each use case may be best served by a different part of the infrastructure. For example, you can stage and preprocess data in an internal private cloud to keep it close to structured, on-premises data; process structured data in the private cloud; and/or store long-term data in the public cloud.
Skills:
The new big data technologies require skills you may not have in-house: open source (e.g., Hadoop), cloud integration, security and analysis tools, to name a few. Most important, however, are the business analysts and data scientists who must provide insight into this big data.
Infrastructure support:
With the additional hardware, software and skills required by big data, an organization needs to decide who is best suited to support this extensive infrastructure. If you're only interested in an internal private cloud, IT can manage this. But if you're looking to go outside your corporate firewall, you will need the support of a third party – such as a software vendor or cloud service provider – to host and help manage your corporate infrastructure.
Performance:
The closer you keep your data together, the better the performance will be. If your data is stored across the country or on another continent, you will need to consider the network traffic required to upload and access the data; the results could be brutal. The volume of big data alone could bring your infrastructure to its knees and frustrate your internal and external customers.
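To get a feel for the network cost of keeping data far away, a rough back-of-the-envelope estimate can help. The Python sketch below is illustrative only: the function name, the decimal-terabyte convention and the 80 percent link-efficiency figure are assumptions made for this example, not measurements or recommendations.

```python
def transfer_hours(data_terabytes: float,
                   link_megabits_per_sec: float,
                   efficiency: float = 0.8) -> float:
    """Estimate the hours needed to move a data set over a network link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable (protocol overhead, contention and so on).
    """
    if data_terabytes < 0 or link_megabits_per_sec <= 0 or not 0 < efficiency <= 1:
        raise ValueError("volume must be >= 0, bandwidth > 0, efficiency in (0, 1]")
    total_bits = data_terabytes * 1e12 * 8            # decimal terabytes -> bits
    usable_bits_per_sec = link_megabits_per_sec * 1e6 * efficiency
    return total_bits / usable_bits_per_sec / 3600    # seconds -> hours


if __name__ == "__main__":
    # Hypothetical scenario: 10 TB of data over a 1 Gbps and a 100 Mbps link.
    for mbps in (1000, 100):
        print(f"10 TB over {mbps} Mbps: about {transfer_hours(10, mbps):.1f} hours")
```

Under these assumptions, 10 TB takes a little over a day on a 1 Gbps link and more than 11 days on a 100 Mbps link, which is why data location deserves attention early on.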
Integration:
In the early stages of your big data journey, you will most likely work with the data in a standalone environment, whether it be on-premises or in the cloud. Long term, however, you will want to integrate big data with existing applications, systems and processes. The integration of big data across internal and external systems, in and outside the cloud, is forcing companies to reexamine their existing skills.
Privacy:
With big data, organizations can easily tap into new (and old) data sources – such as social, open and machine data – and combine them with existing operational and analytical data like never before. This can lead to fascinating and innovative insights about customers. But therein lies the challenge: These new insights may infringe on a customer's privacy rights. Take heed of this important topic as it evolves.
One final tip: Build a solid foundation by addressing each of these considerations, and you'll see that the relationship between cloud and big data is truly a match made in heaven.
Tamara Dull, Director of Emerging Technologies at SAS, has more than 25 years of technology services experience, with a strong foundation in data analysis, design, and development.