How do you know if you’re ready for Hadoop?
Hear from an early adopter about training, use cases and analyzing consumer data in Hadoop
By Alison Bolen, SAS Insights Editor
Without a doubt, Bob Zurek meets the definition of an early adopter. The Amazon Echo in his living room can answer questions about his morning commute or rattle off player stats from yesterday’s Red Sox game. The Apple Watch™ on his wrist tracks the number of steps he takes each day and chimes to remind him about his evening activities. The Nest thermostat in his home adjusts automatically based on the time of day and the location of everyone in the house.
Thinking about these new devices, the data streaming from each one, and all the advancements that are yet to come, Zurek likes to tell his kids, “You are in for an amazing adventure.”
That philosophy and excitement for new technology carry over into his job, where Zurek oversees a large Hadoop implementation at Epsilon. As Vice President of Products, Zurek is responsible for Epsilon’s line of digital marketing and customer loyalty offerings.
One of Epsilon’s most popular products is a digital messaging platform called Agility Harmony that some of the world’s largest brands use to store and process customer data, so they can create meaningful connections with their customers via mobile devices and other online channels. Combining the power of Hadoop and SAS, Agility Harmony gives Epsilon’s customers a complete view of individual consumers for digital campaigns.
I spoke with Zurek recently about his Hadoop experience and his advice for organizations that are considering a move to Hadoop as a storage and analytics platform.
Let’s start with your overall outlook on Hadoop and analytics. How are things starting to change?
Bob Zurek: Hadoop is becoming pretty mainstream pretty fast, and people are getting skilled up quickly. One thing that’s great to see is the amount of innovation that data scientists and technology leaders are applying to help solve problems with Hadoop. For example, the security around Hadoop has continued to improve.
Early on, Cloudera and Hortonworks turned their attention to making sure that organizations could rely on Hadoop day in and day out, and feel a sense of security around the ecosystem. We need to continue to grow the security capabilities, since security is becoming an important part of the partnership ecosystem for the pioneers in Hadoop.
Just as SAS has focused on interoperability with Hadoop, many vendors are driving new technologies to embrace the capabilities of Hadoop. Whether it’s data integration solutions that move data in and out, or BI technologies that allow you to reach into Hadoop and do data discovery, visualization and analytics, the big technology ecosystem players have all embraced it well.
Hadoop has good use cases now in every industry. It’s moving away from being just an ETL alternative or an alternative to a data warehouse, and getting into supporting more interactive applications. It’s not just batch in nature. It can support interactive applications and pure analytic applications.
How are you using analytics with Hadoop?
Zurek: We use Hadoop to capture consumer profile data and segment customer data on behalf of the brands that we support. Brands are using Agility Harmony to email people who have opted in to get digital communications, and each brand collects information and preferences from opt-in customers who have a special interest in the brand.
Someone at the brand might say, “I want to send an email to females with a passion for soccer who live within a 150-mile radius around Boston.” We use Hadoop to power the segmentation that has to go through advanced querying of the system. The brand can then align the campaign and quickly target consumers with a special deal.
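As a rough illustration of the kind of segmentation rule described above, the following Python sketch filters a handful of in-memory profiles by gender, interest and distance from Boston. It is not Epsilon’s implementation: the field names, the sample profiles, the `segment` helper and the Boston coordinates are all assumptions made for the example, and a production system would run an equivalent query against data stored in Hadoop rather than a Python list.

```python
import math

# Approximate coordinates for downtown Boston (assumption, not from the interview).
BOSTON = (42.3601, -71.0589)
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def segment(profiles, gender, interest, center, radius_miles):
    """Return opted-in profiles that match gender, interest and distance from center."""
    return [
        p for p in profiles
        if p["opted_in"]
        and p["gender"] == gender
        and interest in p["interests"]
        and haversine_miles(center, (p["lat"], p["lon"])) <= radius_miles
    ]


# Made-up sample profiles, for illustration only.
profiles = [
    {"id": 1, "gender": "F", "interests": {"soccer"}, "lat": 42.3736, "lon": -71.1097, "opted_in": True},   # Cambridge, MA
    {"id": 2, "gender": "F", "interests": {"soccer"}, "lat": 34.0522, "lon": -118.2437, "opted_in": True},  # Los Angeles, CA
    {"id": 3, "gender": "M", "interests": {"soccer"}, "lat": 42.3601, "lon": -71.0589, "opted_in": True},   # Boston, MA
    {"id": 4, "gender": "F", "interests": {"golf"}, "lat": 41.8240, "lon": -71.4128, "opted_in": True},     # Providence, RI
    {"id": 5, "gender": "F", "interests": {"soccer"}, "lat": 41.8240, "lon": -71.4128, "opted_in": False},  # opted out
]

matches = segment(profiles, "F", "soccer", BOSTON, 150)
print([p["id"] for p in matches])
```

Only the first profile satisfies every condition. The others fail on distance, gender, interest or opt-in status, which is the same narrowing a brand would expect from the rule in the quote.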
In the meantime, we get a lot of event data coming back for each mailing, like what time it was opened, who clicked what, and from what device. That event data is sent back into the system, so you can tie those events back into the segmentation rules and, for example, look for consumers that have a propensity to transact for a specific type of campaign. So the next mailing might target female soccer players who are looking for a good deal on gear.
Our data science team is full of SAS users. It’s great for machine learning, advanced analytics and solving complex problems. And our customers ask very complex questions of the data. For example, they might decide they want to send a breakfast special to people who open emails on cell phones near a particular location at 7 a.m. And we can do that.
Hadoop is often synonymous with the term big data. How big is your data, and how important is scalability?
Zurek: When you make a platform like Agility Harmony available to the top 50 retailers, banks or quick-serve restaurants, there’s a lot going on. We work with a lot of brands, and a single brand might have 2 million consumers stored in Hadoop. This is a cloud-based system that’s managing 40 billion pieces of communication over the course of a month. It has to scale pretty quickly.
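To put the monthly figure in perspective, here is a back-of-the-envelope calculation in Python. It assumes a 30-day month and traffic spread evenly over time. Neither assumption comes from the interview, and real campaign traffic is likely to peak well above the average.

```python
MESSAGES_PER_MONTH = 40_000_000_000  # figure quoted in the interview
DAYS_PER_MONTH = 30                  # assumption: a 30-day month
SECONDS_PER_DAY = 24 * 60 * 60       # 86,400 seconds

# Average load if traffic were perfectly even (an idealization).
per_day = MESSAGES_PER_MONTH / DAYS_PER_MONTH
per_second = per_day / SECONDS_PER_DAY

print(f"{per_day:,.0f} messages per day")
print(f"{per_second:,.0f} messages per second")
```

Under those assumptions the average works out to roughly 1.3 billion communications a day, or about 15,000 every second.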
One benefit of Hadoop is its ability to scale at heights we’ve never seen before at a very economical price model. There’s this whole notion of performance and scale as we deal with more structured, unstructured and semistructured data.
As we gather and store more and more information, that information is going to be valuable to consumers in ways we’ve never seen before. As consumers store more and more information on mobile devices, they want that information to be easily accessible and consumable. You need the scalability and robustness of technologies like Hadoop to do that.
How did you know you were ready for Hadoop? How might other organizations know if they are ready for Hadoop?
Zurek: We have very smart IT operational experts globally. For them, it was like, “Hadoop, no problem. We’ll take it on and manage it.” The developers here got excited and began to embrace it immediately too. You can almost hear the buzz when they start applying their algorithms and techniques to the data in Hadoop.
Pick a reliable distribution partner. We selected Cloudera three years ago because, at the time, they were in the best position to support us. We’ve seen great success in working with them, but if you’re a huge Oracle, IBM or SAS shop and have investments in that technology, these vendors are in a position to help too. They all have teams of Hadoop experts, and a lot of technology services companies can help too.
It’s very important to identify core Hadoop use cases for your modernization project. Pick the easy ones. Take a baby step with Hadoop. Make sure you understand if you’ll use HBase or a SQL interface on top of Hadoop. Is it compatible with tools and existing coding models in your platform? If you’re looking at modernization, make sure you understand the use case and that Hadoop will support that use case.
How does Hadoop complement your existing technology structures?
Zurek: You can’t do everything with Hadoop. You can do quite a bit, but you have to complement it with other database technologies.
For example, the metadata that’s part of our application is stored in traditional database technology. We also have an event database on Cassandra that does high-speed event data management. We are seeing instances where we can port some of that over to Hadoop, but not all.
Our graph databases also are on a traditional RDBMS platform. We use Hadoop, Cassandra and a little MySQL in there also. And we have a polyglot platform that underpins our Agility Harmony platform too.
There’s so much excitement about the Internet of Things, and many organizations are counting on Hadoop to store all that data. How do you see the use of these technologies evolving?
Zurek: When you take a look at the Internet of Things, it’s amazing. If I look at my Apple Watch and think about all the events created by this thing, it’s pretty substantial. It has the ability to capture and share more and more information. We need robust data systems like Hadoop to support that expansive amount of information coming in from all our sensors and connected devices.
Or take the Amazon Echo. Think about the unstructured data that it has to capture. I can ask, “What’s my commute like today?” and get an answer. All that information has to be worked out on behalf of the consumer. And this takes us to the world of machine learning. It gets better and better the more data you feed it. Where are they going to get all of that data? Out of Hadoop.
The proliferation of these connected devices means more and more data in all kinds of formats. That really is why venture capitalists are investing in companies that are embracing Hadoop. They can see the multibillion-dollar opportunities are there.
And we’re just scratching the surface. Advances in speech recognition, advances in sensor technologies, and the instrumenting of lots of different products are pushing the boundaries of consumer convenience.
As a final thought, what one recommendation do you have for other business leaders as they pursue their own opportunities with analytics and Hadoop?
Zurek: Embrace change. Hadoop is a vehicle for change for your organization in a good way. We are seeing significant ROI from our investment in Hadoop, in the 10x range, from increased efficiencies and reduced costs. It’s a very good, strong economic return.