Data as a differentiator

By Mike Luke, National Practice Leader, Data Management, SAS Canada


MAP and PIG and HIVE, oh my!

The Big Data era has fostered the Hadoop ecosystem, which, in turn, has marketing professionals awash in a “Lake” of acronyms and buzzwords. Let me give you the Sqoop—Hadoop is simply an environment for storing and processing data. Huge volumes of data, to be sure. But it’s still a tool to be mastered, not to be ruled by.

Hadoop’s primary components are a file system (HDFS) that distributes petabytes of data (a petabyte is one quadrillion bytes) in large blocks over multiple computers in a data centre or cloud computing environment, and programming models (such as MapReduce and Spark) that move processing code to the machines that store the relevant data. Analytics can be performed on huge datasets in parallel on multiple machines.
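As a rough single-machine illustration of that pattern, here is a toy map/shuffle/reduce in Python. The store records are invented; Hadoop's real value is running the same steps in parallel across thousands of machines.

```python
from collections import defaultdict

# Toy records: (store, sale amount). In Hadoop, millions of these
# would sit in blocks spread across many machines.
records = [("Toronto", 120.0), ("Ottawa", 75.5),
           ("Toronto", 43.25), ("Ottawa", 10.0)]

# "Map" step: emit a (key, value) pair for each record.
mapped = [(store, amount) for store, amount in records]

# "Shuffle" step: group values by key.
groups = defaultdict(list)
for store, amount in mapped:
    groups[store].append(amount)

# "Reduce" step: aggregate each group. Hadoop runs reducers
# in parallel on the machines that hold the data.
totals = {store: sum(amounts) for store, amounts in groups.items()}
print(totals)  # {'Toronto': 163.25, 'Ottawa': 85.5}
```

The point of moving the code to the data, rather than the data to the code, is that only the small per-group totals ever need to travel over the network.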

That’s the upside: The ability to process vast quantities of data, and quickly. The downside is its relative naiveté with respect to the nature of the data. In a traditional database, records and fields identify the function of data. In a Hadoop environment, data must be prepped and cleansed and mastered to be useful. Seventy to eighty per cent of a data scientist’s time is spent making data usable, which isn’t the most effective use of high-end talent.
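The kind of lightweight prep work involved can be sketched in a few lines of Python; the records, province table and matching rule below are all invented for illustration.

```python
import re

# Invented raw customer records: inconsistent case, province names
# and phone formats, plus one duplicate in disguise.
raw = [
    {"name": "ACME Inc.", "province": "ON", "phone": "416-555-0100"},
    {"name": "acme inc", "province": "Ontario", "phone": "(416) 555 0100"},
    {"name": "Globex", "province": "QC", "phone": "514-555-0199"},
]

PROVINCES = {"ontario": "ON", "quebec": "QC"}  # tiny standardization table

def clean(record):
    """Standardize one record so duplicates become comparable."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower()).strip()
    province = PROVINCES.get(record["province"].lower(),
                             record["province"].upper())
    phone = re.sub(r"\D", "", record["phone"])  # keep digits only
    return {"name": name, "province": province, "phone": phone}

# De-duplicate on the standardized (name, phone) key.
seen, mastered = set(), []
for record in raw:
    c = clean(record)
    if (c["name"], c["phone"]) not in seen:
        seen.add((c["name"], c["phone"]))
        mastered.append(c)

print(mastered)  # the two ACME variants collapse into one record
```

Trivial at three records; the whole reason for tools like Data Loader is doing this reliably at petabyte scale, without a data scientist writing the rules by hand.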

Tools are emerging, such as our own SAS Data Loader for Hadoop, that move these data integration and data quality capabilities right into Hadoop and provide an interface that puts at least the lightweight data work firmly in the hands of the business analyst rather than the technology professional. Once the business analyst has that power, it opens up a new world of understanding customer behaviour and how to affect it.



Data is the lifeblood of marketing. And there’s an almost inconceivable amount of it being created every day. It’s estimated that we create 2.5 quintillion bytes of data every day. Ninety per cent of the data created in the history of the world was created in the last two years.

That volume of data will only increase as new sources begin generating data. We’re well on the road to the “Internet of Things,” where Internet-enabled sensors on everything from store shelves to parking spaces will generate data to be sliced, diced and analyzed. Mobile phones create GPS data; digital surveillance cameras can help map dwell times and traffic patterns in a retail environment. Harnessing that data is leverage to move a customer conversion.

It’s not simply about collecting data willy-nilly. It’s not all relevant. And context is key to delivering the right offer to the customer.

Consider the humble online shopping cart. Thanks to a transaction database, clickstream data and your search history, Amazon can recommend other titles you might like when you furtively order 50 Shades of Grey. Contextual information—what other people bought who bought this item, what you’ve been searching for, the other items you’ve viewed—drives the offer.
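The "people who bought this also bought" signal behind that offer can be sketched in a few lines of Python. The order history is invented, and a real recommender blends many more signals than raw co-purchase counts.

```python
from collections import Counter

# Invented order history: each order is the set of titles bought together.
orders = [
    {"50 Shades of Grey", "Wine Guide"},
    {"50 Shades of Grey", "Wine Guide", "Spa Playlist"},
    {"50 Shades of Grey", "Cookbook"},
]

def also_bought(title, orders):
    """Rank other titles by how often they co-occur with `title`."""
    counts = Counter()
    for order in orders:
        if title in order:
            counts.update(order - {title})
    return counts.most_common()

print(also_bought("50 Shades of Grey", orders))
# "Wine Guide" tops the list with two co-purchases.
```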

It’s a simple example, but extended into a more complex Big Data world, putting data into context creates an iterative loop: each new parcel of data improves the models, and the improved models deliver more relevant offers to the consumer. Sensor data can model a retail outlet’s parking patterns to guide decisions about managing vehicle traffic; pedestrian flows within a mall can guide placement of product collateral.


New and emerging technologies are changing the way offers can be presented to prospects. The ubiquity of the smart phone and the proliferation of apps for them are central to this new model. Phones provide location information; venue-branded apps can provide search and transaction history. Beaconing technology can transmit hyperlocal offers to passing smart phones. Imagine an environment in which a prospect can enter a venue, search for a location or item by smart phone or kiosk, receive real-time turn-by-turn directions optimized to avoid congestion or mobility obstacles, and receive a customized offer through a smart phone app en route. It’s the very definition of the right offer at the right time. And it’s not imagination. Shopping malls, hospital campuses and convention centres around the world are rolling out wayfinding platforms to build these applications. But the model falls apart without accurate and complete underlying Big Data.


Not all data is yes/no values, GPS co-ordinates, transactions and the like. We’re also creating more unstructured data—data that can’t be manipulated by formula. The most important source of unstructured data for marketers is social media.

Facebook posts and tweets on Twitter can have a serious impact—positive or negative—on your brand. In the past, a customer might send complaints or flattery by mail, or call to complain. Someone would have to scan and report on these interactions for quality assurance purposes.

With the immediacy of social media, the volume of messages for—and about—brands has increased exponentially. Monitoring those messages manually is becoming unmanageable (the well-worn tale of the Porter Airlines technician who immediately responded to reports of a broken cappuccino machine notwithstanding).

Fortunately, there are tools that can bring structure to the unstructured and bring it into the Big Data model to help drive marketing decisions. MapReduce, Hadoop’s built-in distributed processing model, gives it the horsepower necessary to preprocess text: assigning numeric values to the tone of the content (sentiment analysis), extracting keywords (region, location), and turning a tweet or Facebook post into data that can be processed by an application.
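A deliberately crude word-list scorer shows how free text becomes a number an application can act on. The word lists and posts below are invented, and real sentiment tools use far richer linguistic models than simple word matching.

```python
# Invented word lists; real sentiment models are far richer than this.
POSITIVE = {"love", "great", "fast", "friendly"}
NEGATIVE = {"broken", "slow", "rude", "terrible"}

def sentiment_score(text):
    """Return (#positive - #negative) words; above zero leans positive."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

posts = [
    "Love the friendly staff, great service",
    "The cappuccino machine is broken and the line is slow",
]
for post in posts:
    print(sentiment_score(post), post)  # 3, then -2
```

Run at Hadoop scale, the same idea turns millions of posts per day into a sentiment feed a marketing dashboard can track.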

A logical extension of this is to capture the ultimate in unstructured data—your outbound and incoming telephone calls. If this call is being recorded for training and quality assurance purposes (as we’ve all heard before), it’s data that’s available for analysis. MapReduce’s processing power could be applied to improve the accuracy of voice-to-text conversion; combined with tools that already exist to analyze tone of voice and detect levels of agitation in real time, plus the call centre operator’s log, this could create a very comprehensive data file suitable for analysis.


In summary, there are a lot of reasons that marketers should wrap their heads around Hadoop and its potential.

* It’s cost-effective. Or rather, the supporting infrastructure is less costly than legacy proprietary databases and hardware; HDFS and MapReduce, for example, run on commodity x86 hardware. In addition, it’s an open source platform: it can be downloaded and run for free under the Apache License. And there are commercial releases and support that wrap more features around the platform for a fee, making it viable to a broader audience.

* It allows the processing of vastly larger data sets than the traditional database model can handle. This means you can capture and use more data about your customers, allowing you to detect unexpected patterns and connections.

* It’s disruptive, and it’s challenging the traditional approach to computing. The combination of the most recent financial crisis and the overall growth of data has propelled organizations to leverage new approaches to gaining insight about their clients, to increase profit and lower costs. The ecosystem will continue to advance as technologies such as Hadoop become more mainstream. You can be sure that the “lake” of buzzwords and acronyms will only continue to grow at the same rate and pace as the available data.

To learn more about SAS solutions for Hadoop, visit:

As the National Practice Leader for Data Management at SAS Canada, Mike helps clients across Canada leverage one of their most important assets: their data. Mike has gained an extensive background in information technology working with financial institutions, retailers and telecommunications providers across Canada.