Hadoop in the corporate world

Five ways to get the baby elephant enterprise-ready

By Tony Hamilton

You may already know that Hadoop – with its cute baby elephant logo – is a popular, open-source framework for storing and analyzing big data in a distributed computing environment.

Tony Hamilton is the Global Product Marketing Principal for Data Management at SAS. He holds a BSc degree from the University of Strathclyde in Scotland. Hamilton recently served on the board of advisors for the International Institute for Analytics with Tom Davenport and Jack Phillips.

Why the elephant? Hadoop’s namesake is a stuffed animal – a yellow baby elephant – belonging to the child of Hadoop co-creator Doug Cutting. That baby elephant is more than a good name and memorable logo. It’s an apt metaphor for a young but powerful system.

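That powerful system has two core parts, each designed for a specific job:

1. Storing a lot of data easily, in the Hadoop Distributed File System (HDFS).
2. Processing that data quickly with MapReduce, which splits the work into many tasks that run simultaneously across the cluster.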

Oh, and it’s open source, which means it’s readily available for anyone to use, provided you have the hardware and the skills to make it work. But just as you can’t take a baby elephant out of the jungle and put him right to work without some planning and training, you can’t necessarily download Hadoop from the Web and start using it immediately for business decisions.

A lot of people think open source means plug and play: you download it from the website and run your business on it. That isn’t quite true. You need other features and functions working around it first, especially if you have to stay compliant and use it to make sound business decisions.

With that in mind, let’s review the five areas you need to work on to get the baby elephant enterprise-ready:

1. Data access.

Data in Hadoop should be accessible in the same way other data sources are accessed for analysis in the enterprise, because unified access is important for big data problems. Your access and connectivity tools should extend into the Hadoop framework the same way they already work with other business software, such as relational databases and CRM systems.

2. Security.

Make sure there are safety measures built around your Hadoop framework. If you’re going to run a business on this framework, you don’t want just anyone on it. You also want to make sure it doesn’t break down and lock you out of your data. One way to do this is with multiple data management layers and gatekeeping security features between them. Security will also help establish ground rules for how the Hadoop environment interoperates with your traditional environment.

3. Performance.

Yes, Hadoop is built for big data, but you still have to manage it for optimum performance. Consider how you’ll meet service-level agreements, and make sure you understand the capabilities of the environment and the software you’re using. Balancing workload capacity against expectations is still important with Hadoop. You wouldn’t let a baby elephant carry 50 logs; he’s not the same size as a mature elephant. You have to train Hadoop to carry the right-sized load, the same way you would train that elephant.

4. Integration.

Make sure there’s a solid understanding of how Hadoop connects to the other environments in your compute, storage and I/O infrastructure. It is important to understand how your hardware will keep pace as the workload grows. Make sure the system is performing at the expected levels, understand what information is coming in and be confident about what’s going out. The smartest businesses are blending big data insights with those from their traditional data sources. And just as elephants never forget, a good Hadoop implementation will take advantage of the in-memory options available in the industry. This will help your analysis become more granular and your decision makers more confident.

5. Real time.

In the Hadoop world, you’re moving from batch processing to real time and making results visible in a mobile environment. Everyone’s got a cellphone or tablet, and we expect instant results on these devices. A lot of preparation is required, from data management to model development and visualization, to make sure the outputs from your Hadoop environment are timely, current and mature.

This might be your first time raising a baby elephant, but it’s likely not your first time implementing a new technology. Consider the lessons you learned from the early days of ERP or Linux. Many of those projects began for specific purposes, and governance was applied only over time. Hadoop is in a similar position today. The more you can apply lessons learned from earlier implementations, and from the list above, the better prepared you will be to take advantage of Hadoop.

sascom magazine

Read more:

SAS, Cloudera team up on Hadoop

Joint engineering for Hadoop eases use, adds functionality

Organizations are looking to unleash insights from big data – integrating analytics, visualization and high-speed scoring methods with Hadoop will get them closer. That’s why SAS and Cloudera have teamed up to introduce SAS/ACCESS® Interface to Impala (Cloudera’s new production-ready, SQL-on-Hadoop solution).

“With the economic scalability to store and analyze data in Hadoop, SAS and Cloudera allow customers to reveal hidden insights that have loomed out of reach. By providing a more visual and interactive Hadoop experience, SAS makes it easier to discover significant trends and insights,” said Randy Guard, SAS Vice President of Product Management.

“Enterprises struggle to get more out of their most strategic asset – their data,” said Tim Stevens, Vice President of Business and Corporate Development at Cloudera. “As a leader in data management, Cloudera continues to strengthen joint big data and analytics offerings with SAS. We allow our enterprise customers to easily access and manipulate data in real time, ensuring quicker decision making and faster time to value.”
