Predictive analytics and machine learning
By Katrina Wakefield, Marketing, SAS UK
For many organisations, big data – incredible volumes of raw structured, semi-structured and unstructured data – is an untapped resource of intelligence that can support business decisions and enhance operations. As data continues to diversify and change, more and more organisations are embracing predictive analytics to tap into that resource and benefit from data at scale.
What is predictive analytics?
A common misconception is that predictive analytics and machine learning are the same thing. This is not the case. (Where the two do overlap, however, is predictive modelling – but more on that later.)
At its core, predictive analytics encompasses a variety of statistical techniques (including machine learning, predictive modelling and data mining) and uses data (both historical and current) to estimate, or ‘predict’, future outcomes. These outcomes might be behaviours a customer is likely to exhibit or possible changes in the market, for example. Predictive analytics helps us to understand possible future occurrences by analysing the past.
Machine learning, on the other hand, is a subfield of computer science that, as per Arthur Samuel’s definition from 1959, gives ‘computers the ability to learn without being explicitly programmed’. Machine learning evolved from the study of pattern recognition and explores the notion that algorithms can learn from and make predictions on data. As they learn from more data, these algorithms can go beyond static program instructions to make highly accurate, data-driven decisions.
How does predictive analytics work?
Predictive analytics is driven by predictive modelling. It’s more of an approach than a process. Predictive analytics and machine learning go hand-in-hand, as predictive models typically include a machine learning algorithm. These models can be trained over time to respond to new data or values, delivering the results the business needs. This is why predictive modelling largely overlaps with the field of machine learning.
There are two types of predictive models: classification models, which predict class membership, and regression models, which predict a number. These models are made up of algorithms, which perform the data mining and statistical analysis, determining trends and patterns in data. Predictive analytics software solutions will have built-in algorithms that can be used to make predictive models. Algorithms that identify which of a set of categories data belongs to are known as ‘classifiers’.
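The difference between the two model types can be illustrated with a minimal sketch. The data, the variable names and the choice of techniques (least-squares regression and a nearest-centroid classifier) are assumptions made for illustration, not methods prescribed by any particular product.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit for one input variable: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def fit_centroids(xs, labels):
    """Average input value per class, used by the nearest-centroid classifier."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: mean(values) for label, values in groups.items()}

def classify(x, centroids):
    """Classification: return the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Hypothetical data: monthly store visits per customer.
visits = [1, 2, 3, 4, 5]
spend = [12.0, 19.0, 31.0, 42.0, 48.0]             # numeric target
status = ["churn", "churn", "stay", "stay", "stay"]  # class target

slope, intercept = fit_line(visits, spend)

def predict_spend(v):
    """Regression: return a number."""
    return slope * v + intercept

centroids = fit_centroids(visits, status)

print(predict_spend(6))          # a predicted amount
print(classify(1.5, centroids))  # a predicted class label
```

The regression model answers "how much?", while the classifier answers "which group?", even though both are trained on the same input variable.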
The most widely used predictive models are:
- Decision trees: Decision trees are a simple but powerful form of multiple variable analysis. They are produced by algorithms that identify various ways of splitting data into branch-like segments. Decision trees partition data into subsets based on categories of input variables, helping you to understand someone’s path of decisions.
- Regression (linear and logistic): Regression is one of the most popular methods in statistics. Regression analysis estimates relationships among variables, finding key patterns in large and diverse data sets and how they relate to each other.
- Neural networks: Patterned after the operation of neurons in the human brain, neural networks (also called artificial neural networks) are the foundation of deep learning technologies. They’re typically used to solve complex pattern recognition problems and are incredibly useful for analysing large data sets. They are great at handling nonlinear relationships in data and work well when certain variables are unknown.
- Time Series Algorithms: Time series algorithms sequentially plot data and are useful for forecasting continuous values over time.
- Clustering Algorithms: Clustering algorithms organise data into groups whose members are similar.
- Outlier Detection Algorithms: Outlier detection algorithms focus on anomaly detection, identifying items, events or observations that do not conform to an expected pattern or standard within a data set.
- Ensemble Models: Ensemble models use multiple machine learning algorithms to obtain better predictive performance than what could be obtained from one algorithm alone.
- Factor Analysis: Factor analysis is a method used to describe variability among observed variables; it aims to find a smaller number of independent latent variables that explain that variability.
- Naïve Bayes: The Naïve Bayes classifier allows us to predict a class/category based on a given set of features, using probability.
- Support vector machines: Support vector machines are supervised machine learning models that analyse data and recognise patterns, and are used for both classification and regression.
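To make the decision tree entry above more concrete, here is a minimal sketch of the single step a tree-building algorithm repeats: finding the split on one input variable that best separates the classes. The Gini impurity measure and the sample data are assumptions chosen for illustration. Real decision tree software applies this step recursively across many variables.

```python
def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best `value <= threshold` split."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: no split at all
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical data: customer age and whether they bought a product.
ages = [22, 25, 31, 38, 45, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, impurity = best_split(ages, bought)
print(threshold, impurity)
```

Each branch of a tree is one such threshold, which is why the resulting model can be read as a path of decisions.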
Each classifier approaches data in a different way, so to get the results they need, organisations must choose the right classifiers and models.
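One common way to choose between candidate models is to score each of them on data held back from training. The sketch below assumes two deliberately simple candidates (a majority-class baseline and a nearest-neighbour classifier) and invented data; it illustrates the comparison, not a recommended pair of models.

```python
from collections import Counter

def majority_model(train):
    """Baseline: always predict the most frequent class in the training data."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common

def nearest_neighbour_model(train):
    """Predict the label of the closest training example."""
    return lambda x: min(train, key=lambda row: abs(row[0] - x))[1]

def accuracy(model, rows):
    """Share of rows for which the model predicts the recorded label."""
    return sum(model(x) == label for x, label in rows) / len(rows)

# Hypothetical (value, label) pairs, split into training and hold-out sets.
train = [(1, "low"), (2, "low"), (3, "low"), (7, "high"), (8, "high")]
holdout = [(1.5, "low"), (2.5, "low"), (7.5, "high"), (9, "high")]

candidates = {
    "majority": majority_model(train),
    "nearest_neighbour": nearest_neighbour_model(train),
}
scores = {name: accuracy(model, holdout) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Scoring on held-out data rather than training data gives a fairer picture of how each model is likely to behave on new records.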
Applications of predictive analytics and machine learning
For organisations overflowing with data but struggling to turn it into useful insights, predictive analytics and machine learning can provide the solution. No matter how much data an organisation has, if it can’t use that data to enhance internal and external processes and meet objectives, the data becomes a useless resource.
Predictive analytics is most commonly used for security, marketing, operations, risk and fraud detection. Here are just a few examples of how predictive analytics and machine learning are utilised in different industries:
- Banking and Financial Services
In the banking and financial services industry, predictive analytics and machine learning are used in conjunction to detect and reduce fraud, measure market risk, identify opportunities and much, much more.
- Security
With cybersecurity at the top of every business’s agenda in 2017, it should come as no surprise that predictive analytics and machine learning play a key part in security. Security institutions typically use predictive analytics to improve services and performance, but also to detect anomalies and fraud, understand consumer behaviour and enhance data security.
- Retail
Retailers are using predictive analytics and machine learning to better understand consumer behaviour: who buys what, and where? These questions can be readily answered with the right predictive models and data sets, helping retailers to plan ahead and stock items based on seasonality and consumer trends, which can improve ROI significantly.
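As an illustration of the anomaly and fraud detection use cases above, the following sketch flags unusually large or small transaction amounts. It assumes a modified z-score based on the median and the median absolute deviation, with the conventional constant 0.6745 and cutoff 3.5; the transaction amounts are invented. Production fraud systems use far richer features and models.

```python
from statistics import median

def flag_outliers(amounts, cutoff=3.5):
    """Return amounts whose modified z-score exceeds the cutoff."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > cutoff]

# Hypothetical card transactions for one customer, in pounds.
transactions = [20, 22, 19, 21, 23, 20, 950]

suspicious = flag_outliers(transactions)
print(suspicious)
```

The median is used instead of the mean so that a single extreme transaction does not distort the definition of "normal" it is being compared against.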
Developing the right environment
While machine learning and predictive analytics can be a boon for any organisation, implementing these solutions haphazardly, without considering how they will fit into everyday operations, will drastically hinder their ability to deliver the insights the organisation needs.
To get the most out of predictive analytics and machine learning, organisations need to ensure they have the architecture in place to support these solutions, as well as high-quality data to feed them and help them to learn. Data preparation and quality are key enablers of predictive analytics. Input data, which may span multiple platforms and contain multiple big data sources, must be centralised, unified and in a coherent format.
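What "centralised, unified and in a coherent format" can mean in practice is sketched below. The two source systems, their field names and their formats are hypothetical; the point is only that each source is mapped to one shared record layout before any modelling takes place.

```python
from datetime import date, datetime

def from_crm(row):
    """Map a row from a hypothetical CRM export to the shared layout."""
    return {
        "customer_id": str(row["CustomerID"]).strip(),
        "amount": float(row["Amount"]),                         # pounds, stored as text
        "date": datetime.strptime(row["Date"], "%d/%m/%Y").date(),
    }

def from_web(row):
    """Map a row from a hypothetical web-shop log to the shared layout."""
    return {
        "customer_id": str(row["cust"]).strip(),
        "amount": row["pence"] / 100,                           # pence to pounds
        "date": date.fromisoformat(row["ts"][:10]),             # drop the time part
    }

crm_rows = [{"CustomerID": " 42 ", "Amount": "19.99", "Date": "03/02/2017"}]
web_rows = [{"cust": 42, "pence": 1250, "ts": "2017-02-05T10:15:00"}]

unified = [from_crm(r) for r in crm_rows] + [from_web(r) for r in web_rows]
for record in unified:
    print(record)
```

Once every record shares the same identifiers, units and date type, the two sources can be analysed together as a single data set.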
To achieve this, organisations must first develop a sound data governance program to police the overall management of data and ensure only high-quality data is captured and recorded. Secondly, existing processes will need to be altered to include predictive analytics and machine learning, as this will enable organisations to drive efficiency at every point in the business. Lastly, organisations need to know what problems they are looking to solve, as this will help them to determine the best and most applicable model to use.
Understanding predictive models
Typically, an organisation’s data scientists and IT experts are tasked with choosing the right predictive models – or building their own to meet the organisation’s needs. Today, however, predictive analytics and machine learning are no longer just the domain of mathematicians, statisticians and data scientists, but also that of business analysts and consultants. More and more of a business’s employees are using them to develop insights and improve business operations – but problems arise when employees do not know what model to use or how to deploy it, or need information right away.
At SAS, we develop sophisticated software to support organisations with their data governance and analytics. Our data governance solutions help organisations to maintain high-quality data, as well as align operations across the business and pinpoint data problems within the same environment. Our predictive analytics solutions help organisations to turn their data into timely insights for better, faster decision making. These predictive analytics solutions are designed to meet the needs of all types of users and enable them to deploy predictive models rapidly.