9 questions on in-memory analytics
Tapan Patel on speed, analytics and data discovery
In-memory analytics is a rather bland name, but it represents an important paradigm shift in how organizations use data to tackle a variety of business challenges.
With in-memory analytics, all the data used by an application is stored within the main memory of the computing environment. Rather than accessing the data on a disk, data remains suspended in the memory of a powerful set of computers. Multiple users can share this data across multiple applications rapidly, securely and concurrently.
Gartner says in-memory analytics use is growing rapidly, “driven by technology maturation and decreasing cost, digital businesses' avidity for scale and performance (and) users' desire for deeper and timelier analytics.”
Tapan Patel, Global Product Marketing Manager at SAS, explains why in-memory analytics is such a hot trend.
What’s driving the popularity of in-memory analytics?
Tapan Patel: With in-memory analytics, companies can gain the speed and deeper insights needed to increase revenue, manage risks and pursue new product or service innovation. In-memory processing has speed and scaling advantages over conventional architectures for handling analytics workloads, and it enables deeper, more granular insights. Meanwhile, falling main memory prices have made in-memory processing more achievable for analytical purposes on commodity hardware.
In layman’s terms, how different is in-memory processing from traditional analytics approaches?
Patel: The first difference is where the data is stored. Traditionally, the data is stored on a disk. With in-memory analytics, the persistent storage of the data is still on the disk, but the data is read into memory for analytical processing. Now, with commodity hardware that’s more powerful than ever, it’s effective to take advantage of in-memory processing power without constantly shuffling data to and from the disk.
And that means more speed?
Patel: Yes. Compared to traditional batch processing, where a lot of back and forth happens between the disk and job/step boundaries, keeping data in memory allows multiple users to conduct interactive processing without going back to disk. This allows end users to rapidly get answers without worrying about infrastructure constraints for analytical experiments. Data scientists are not restricted to a sample; they can apply as many analytical techniques and iterations as needed to find the best model.
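The contrast Patel describes can be illustrated with a minimal Python sketch. It is not how any particular in-memory analytics product works; the names `DiskTable`, `ANALYSES`, `disk_bound`, `in_memory` and `demo`, along with the sample data, are hypothetical and exist only for this illustration. The disk-bound path rereads the file for every analysis, while the in-memory path reads it once and then runs every analysis against the copy held in memory.

```python
import csv
import os
import statistics
import tempfile


def write_sample(path, rows=1000):
    """Write a small, made-up sales file that stands in for data on disk."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "sales"])
        for i in range(rows):
            writer.writerow([f"r{i % 4}", i % 97])


class DiskTable:
    """Hypothetical wrapper that counts how often the file is read."""

    def __init__(self, path):
        self.path = path
        self.reads = 0

    def load(self):
        self.reads += 1
        with open(self.path, newline="") as f:
            return [(r["region"], float(r["sales"])) for r in csv.DictReader(f)]


# Three illustrative analyses over the same rows.
ANALYSES = [
    lambda rows: statistics.mean(v for _, v in rows),
    lambda rows: max(v for _, v in rows),
    lambda rows: sum(v for region, v in rows if region == "r0"),
]


def disk_bound(table):
    """Go back to disk for every analysis, as in step-by-step batch jobs."""
    return [analysis(table.load()) for analysis in ANALYSES]


def in_memory(table):
    """Read the data once, then run every analysis against memory."""
    rows = table.load()
    return [analysis(rows) for analysis in ANALYSES]


def demo():
    """Return (disk-bound reads, in-memory reads, whether results match)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sales.csv")
        write_sample(path)
        batch_table, memory_table = DiskTable(path), DiskTable(path)
        batch_results = disk_bound(batch_table)
        memory_results = in_memory(memory_table)
        return batch_table.reads, memory_table.reads, batch_results == memory_results


if __name__ == "__main__":
    print(demo())
```

Both paths produce the same answers; the difference is that the disk-bound version performs one file read per analysis, while the in-memory version performs a single read no matter how many analyses or iterations follow.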
How does in-memory computing complement the presence of a data warehouse?
Patel: A data warehouse is an essential component of any analytics environment, especially since it contains a set of data that is relevant, cleansed and refined for several use cases that require structured data. To complement the warehouse, IT organizations can set up a Hadoop-based sandbox environment and use in-memory processing to quickly explore unknown data relationships, create analytical models and refine them continuously.
What do companies need to think about as they embark on an in-memory analytics path?
Patel: Organizations need to consider how they can modernize their technology on two fronts: analytics and infrastructure. On the analytics front, it’s important to shift from a traditional analytics mindset to a high-performance analytics mindset. On the infrastructure front, it’s important to examine how in-memory computing architecture can handle data scalability, user scalability and complex workloads. Ultimately, organizations are interested in removing latencies in the analytics lifecycle – whether it is related to data preparation, data discovery, model development or deployment.
What are some potential speed bumps in adopting in-memory analytics?
Patel: No matter how much an organization speeds up its data preparation and analytics life cycle steps, it must make sure that downstream business processes and decision makers can capitalize on the rapid insights generated. This is especially challenging in asset-intensive industries such as manufacturing, transportation, telecommunications and utilities, which makes collaboration between IT and the business even more critical for operationalizing analytics. Organizations will not be able to realize value from generating rapid insights if the supporting business processes are not taking advantage of them.
What role do skills play here?
Patel: It’s easy, and a mistake, to underestimate the skills required to build and maintain these advanced analytics applications, which use the latest machine learning techniques along with a Hadoop-based data infrastructure. A lot of focus has been on the role of the data scientist, but the IT skills required to manage and configure a big data infrastructure are equally important for meeting service level agreements.
What are some of the IT-related considerations as organizations evaluate in-memory processing architecture for analytics?
Patel: Including IT early in the evaluation and planning process is important for determining how in-memory analytics can meet your need for a flexible and scalable analytics platform. In-memory analytics allows for more self-service capabilities for end users because there will be less dependence on IT to create, maintain and administer aggregates, indexes and reports. However, IT has to be careful that it’s not creating yet another silo. In-memory analytics should be part of a comprehensive information architecture, not a stand-alone strategy.
Does data integration effort change under an in-memory analytics environment?
Patel: Typically, 60 to 70 percent of organizations’ effort in any analytics exercise is around data integration, including preparing data before building models and deploying model score codes into operational systems. As an enterprise integrates new, more diverse data types and volumes (such as event streams, sensor data, log data, free-form text and social media data) to support use cases enabled by in-memory analytics, data integration and data discovery will be even more critical for building analytical models downstream.