The journey to high-performance analytics
Organizations using a high-performance environment are seeing stunning results
Rome was not built in a day. Similarly, high-performance analytics is a product of many cumulative architectural, computational and analytical advances. Solving complex business problems by applying algorithms from multiple disciplines to increasingly large volumes of data of all types – both structured and unstructured – is no small task. It requires innovation at many levels and is the result of years of effort built on customer experience and a strong history of building robust enterprise-class products.
For SAS, working with data and exploiting algorithms is built into our DNA. For more than 35 years, we've worked on innovative problems, and the size of data or the complexity of the analysis has never been a barrier. Every few years, we have tackled new challenges. Significant milestones in this journey have included:
- Creating software that runs on multiple operating systems.
- Accessing data from multiple sources.
- Taking advantage of distributed computing environments.
- Creating targeted, analytically based vertical applications that take advantage of our analytical strengths.
- Creating analytical applications that facilitate collaboration across the enterprise.
In every one of these areas, the requirement to tackle larger problems and more complex scenarios continues to grow, which in turn requires our core analytical tools to grow, and even to be reinvented, to satisfy these demands. At every level, we take advantage of multiple processors on a single machine and can also run on multiple nodes in a distributed environment.
These years of experience have taught us that the big data/big analytics problem must be addressed at many different levels. SAS' vision for high-performance analytics includes all aspects of "big": volume, velocity, variety and complexity. More importantly, when you think of high-performance analytics, you need to ensure it goes hand-in-hand with master data management and data governance as well.
A fast analytical algorithm alone is not enough to solve an enterprise's business problem; the performance of data movement matters as well.
SAS® High-Performance Analytics offerings are designed to provide fast execution and minimize data movement for both model creation and deployment. For example, the SAS Scoring Accelerator and SAS Analytics Accelerator provide scoring and modeling inside the database. Catalina Marketing has seen reductions in model-scoring time from 4.5 hours to around 60 seconds with this technology.
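To make the idea of minimizing data movement concrete, here is a minimal sketch of scoring inside a database. It is an illustration only, not the SAS Scoring Accelerator: it uses Python's built-in sqlite3 module as a stand-in database, and the table names, columns and coefficients are all hypothetical. The scoring function is registered with the database so that each row is scored where it is stored, and only a two-number summary travels back to the caller.

```python
import math
import sqlite3

# Hypothetical coefficients; in practice they would come from a fitted model.
INTERCEPT = -1.0
COEF_AGE = 0.03
COEF_SPEND = 0.002


def logistic(z):
    """Standard logistic function, exposed to SQL below."""
    return 1.0 / (1.0 + math.exp(-z))


def build_demo_db():
    """Create a tiny in-memory table standing in for a large customer table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, age REAL, spend REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 25.0, 100.0), (2, 40.0, 500.0), (3, 60.0, 1500.0)],
    )
    return conn


def score_in_database(conn):
    """Score every row inside the database; return (rows scored, rows with p > 0.5)."""
    # Register the scoring function so SQL statements can call it.
    conn.create_function("logistic", 1, logistic)
    conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, p REAL)")
    # The scores are written next to the data; no rows are fetched by the caller.
    conn.execute(
        "INSERT INTO scores "
        "SELECT id, logistic(? + ? * age + ? * spend) FROM customers",
        (INTERCEPT, COEF_AGE, COEF_SPEND),
    )
    # Only the aggregate summary leaves the database.
    return conn.execute("SELECT COUNT(*), SUM(p > 0.5) FROM scores").fetchone()


if __name__ == "__main__":
    print(score_in_database(build_demo_db()))
```

The alternative, fetching every row into the application and scoring it there, produces the same numbers but moves the whole table across the connection first, which is the cost this section warns about.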
The next frontier
As SAS continues this journey, the next frontier is exploiting the massively parallel capabilities of the database. This will facilitate manipulating and loading large quantities of data while providing complex analytical algorithms that can be encapsulated as loadable extensions and run alongside the database. Taking advantage of the parallel capabilities of the database and moving the analytics to the data will unleash the power of the mathematics on massive amounts of data.
This approach allows us to provide a high-end, enterprise-class platform that combines in-memory analytics with a data platform that supports hardware failover and data replication, terabytes of storage, querying capabilities, ETL and many other features that are important to IT and the end user. Our architectural breakthroughs make it possible for analytical developers to restructure their algorithms to exploit hardware advances and run in multiple distributed modes. This, I believe, is a key milestone for our overall development paradigm, which immediately opens a vast array of possibilities for us.
Current work ranges from running single analytical routines such as logistic regression, to enabling the full data mining modeling process in a high-performance environment, to achieving order-of-magnitude performance improvements for targeted business applications such as markdown optimization and marketing optimization. Here are some of the performance results we can now realize:
- Fitting a logistic regression model on a billion observations in about 60 seconds.
- Solving a large variable selection problem with over 1,800 parameters in more than 100 model effects and 50 million observations in less than 60 seconds.
- Solving problems that were previously intractable. For example, a marketing optimization problem based on more than 25 million customers and nearly 1,000 offers – the solution time dropped from more than five and a half hours to less than six minutes.
- Exploiting the new architecture to increase performance for products that are already market leaders in performance, including SAS Marketing Optimization.
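One common way a routine like logistic regression can be restructured to run across processors or nodes is data-parallel gradient computation: each worker computes the gradient over its own partition of the data, and the partial gradients are summed. The sketch below illustrates that general pattern only; the article does not describe SAS's actual implementation. All function names are hypothetical, and Python threads stand in for separate processors or nodes.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def partial_gradient(rows, weights):
    """Gradient of the logistic log-loss over one data partition."""
    grad = [0.0] * len(weights)
    for features, label in rows:
        z = sum(w * x for w, x in zip(weights, features))
        error = 1.0 / (1.0 + math.exp(-z)) - label
        for j, x in enumerate(features):
            grad[j] += error * x
    return grad


def parallel_gradient(partitions, weights):
    """Compute one gradient per partition concurrently, then sum them."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(
            pool.map(lambda part: partial_gradient(part, weights), partitions)
        )
    # The full gradient is the element-wise sum of the partial gradients.
    return [sum(column) for column in zip(*partials)]


def fit(partitions, n_features, learning_rate=0.5, steps=200):
    """Plain gradient descent driven by the partitioned gradient."""
    n_rows = sum(len(part) for part in partitions)
    weights = [0.0] * n_features
    for _ in range(steps):
        grad = parallel_gradient(partitions, weights)
        weights = [w - learning_rate * g / n_rows for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Toy data: (intercept term, one feature) and a 0/1 label.
    rows = [([1.0, 0.0], 0), ([1.0, 1.0], 0), ([1.0, 2.0], 1), ([1.0, 3.0], 1)]
    print(fit([rows[:2], rows[2:]], n_features=2))
```

Because the loss is a sum over observations, the partitioned gradient matches the single-machine gradient up to floating-point rounding, so the data can be split without changing the model being fitted.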
What does this mean for your business?
High-performance analytics has the potential to introduce game-changing options. For instance, a leading US retailer working with SAS needs to determine optimal clearance prices for more than 273 million product-by-location combinations involving hundreds of millions of potential pricing decisions per week.
SAS Markdown Optimization analyzes three terabytes of historical sales data with multiple estimation and pricing algorithms targeted for this business problem. Using new SAS High-Performance Analytics technologies, the computation time was reduced from 30 hours to about two hours. This immense reduction in time allows the retailer to run more scenarios in the same window of time, providing the ability to look at alternate pricing strategies. Now, the retailer can provide the right prices to the right customers at the right time, in the end maximizing profit and clearing inventory.
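As a rough back-of-the-envelope check on what "more scenarios in the same window" means, the short sketch below works through the arithmetic using only the run times quoted above. It assumes each scenario takes the full two hours and that runs happen one after another; the helper names are hypothetical.

```python
HOUR = 3600  # seconds


def speedup(before_seconds, after_seconds):
    """How many times faster the new run is than the old one."""
    return before_seconds / after_seconds


def runs_per_window(window_seconds, run_seconds):
    """Whole sequential runs that fit in a fixed time window."""
    return int(window_seconds // run_seconds)


if __name__ == "__main__":
    # Markdown optimization: 30 hours reduced to about two hours.
    print(speedup(30 * HOUR, 2 * HOUR))
    print(runs_per_window(30 * HOUR, 2 * HOUR))
```

Under those assumptions the speedup is 15 times, so the window that once held a single run can hold about 15 alternative pricing scenarios.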
What does this mean for SAS® software?
SAS will continue to see tremendous growth in high-performance analytics. We will move an increasing number of our algorithms into the high-performance category to exploit hardware advances and harness the maximum performance gains. The infrastructure can run on a variety of hardware configurations, from commodity hardware to platforms where we can exploit the benefits provided by the hardware vendor. The overall effort across SAS enables us to handle many types of big problems: both big data problems and complex problems that have historically taken days to run, even on small data.
Bio: Radhika Kulkarni is Vice President of Advanced Analytics R&D at SAS.