Machine Learning

What it is and why it matters

Machine learning is a method of data analysis that automates analytical model building. It is a branch of artificial intelligence (AI) based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention.

Evolution of machine learning

Because of new computing technologies, machine learning today is not like machine learning of the past. It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks; researchers interested in artificial intelligence wanted to see if computers could learn from data. The iterative aspect of machine learning is important because as models are exposed to new data, they are able to independently adapt. They learn from previous computations to produce reliable, repeatable decisions and results. It’s a science that’s not new – but one that has gained fresh momentum.

While many machine learning algorithms have been around for a long time, the ability to automatically apply complex mathematical calculations to big data – over and over, faster and faster – is a recent development. Here are a few widely publicized examples of machine learning applications you may be familiar with:

The heavily hyped, self-driving Google car? The essence of machine learning.
Online recommendation offers such as those from Amazon and Netflix? Machine learning applications for everyday life.
Knowing what customers are saying about you on Twitter? Machine learning combined with linguistic rule creation.
Fraud detection? One of the more obvious, important uses in our world today.

Machine learning and artificial intelligence

While artificial intelligence (AI) is the broad science of mimicking human abilities, machine learning is a specific subset of AI that trains a machine how to learn.

Machine learning in today's world

By using algorithms to build models that uncover connections, organizations can make better decisions without human intervention. Learn more about the technologies that are shaping the world we live in.

All about machine learning algorithms

There are four types of machine learning algorithms: supervised, semisupervised, unsupervised and reinforcement. Learn about each type of algorithm and how it works. Then you'll be prepared to choose which one is best for addressing your business needs.

Building models with a developer workbench

A self-service, on-demand compute environment for data analysis and ML models increases productivity and performance while minimizing IT support and cost. In this Q&A, an expert explains why a developer workbench is an ideal environment for developers and modelers.

Recognition for leading technology

Analyst reports confirm: SAS is an industry leader in artificial intelligence, machine learning, data science, predictive analytics, risk management, multichannel marketing, customer analytics, fraud management, anti-money laundering, and more.

Adopt trustworthy AI

Consumers have more trust in organizations that demonstrate responsible and ethical use of AI, like machine learning and generative AI. Learn why it’s essential to embrace AI systems designed for human centricity, inclusivity and accountability.

Real-world applications of machine learning

CNG Holdings uses machine learning to enhance fraud detection and prevention while ensuring a smooth customer experience. By focusing on identity verification from the outset, they transitioned from reactive to proactive fraud prevention. Machine learning models help quickly validate identities, significantly reducing fraud instances and false positives. Real-time data access allows CNG to adjust strategies swiftly during fraud attempts, leading to reduced costs and more efficient investigations.

Why is machine learning important?

Resurging interest in machine learning is due to the same factors that have made data mining and Bayesian analysis more popular than ever: growing volumes and varieties of available data, cheaper and more powerful computational processing, and affordable data storage.

All of these things mean it's possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results – even on a very large scale. And by building precise models, an organization has a better chance of identifying profitable opportunities – or avoiding unknown risks.

What's required to create good machine learning systems?

Data preparation capabilities.
Algorithms – basic and advanced.
Automation and iterative processes.
Scalability.
Ensemble modeling.

Did you know?

In machine learning, a target is called a label.
In statistics, a target is called a dependent variable.
A variable in statistics is called a feature in machine learning.
A transformation in statistics is called feature creation in machine learning.

Who's using it?

Most industries working with large amounts of data have recognized the value of machine learning technology. By gleaning insights from this data – often in real time – organizations are able to work more efficiently or gain an advantage over competitors.

Financial services

Banks and others in the financial industry can use machine learning to improve accuracy and efficiency, identify important insights in data, detect and prevent fraud, and assist with anti-money laundering. Data mining, which draws on ML among other methods, can identify clients with high-risk profiles and incorporate cyber surveillance to pinpoint warning signs of fraud.

Health care

Machine learning is a fast-growing trend in the health care industry, thanks to the advent of wearable devices and sensors that can use data to assess a patient's health in real time. The technology can also help medical experts analyze data to identify trends or red flags that may lead to improved diagnoses and treatment.

Insurance

Machine learning is revolutionizing the insurance industry by enhancing risk assessment, underwriting decisions and fraud detection. It also helps improve customer experience and boost profitability. By analyzing vast amounts of data, ML algorithms can evaluate risks more accurately, so insurers can tailor policies and pricing to customers.

Life sciences

Machine learning and other AI and analytics techniques help accelerate research, improve diagnostics and personalize treatments for the life sciences industry. For example, researchers can analyze complex biological data, identify patterns and predict outcomes to speed drug discovery and development. For treatment, analyzing patient data allows therapies to be tailored to individual genetic profiles and health histories (for personalized medicine).

Public sector

Government agencies responsible for public safety and social services have a particular need for machine learning because they have multiple sources of data that can be mined for insights. Analyzing sensor data, for example, identifies ways to increase efficiency and save money. Machine learning can also help detect fraud and minimize identity theft.

Retail and consumer goods

Websites that recommend items you might like based on previous purchases use machine learning to analyze your buying history. Retailers rely on machine learning to capture data, analyze it and use it to personalize a shopping experience, implement a marketing campaign, optimize prices, plan merchandise and gain customer insights.

How it works

To get the most value from machine learning, you have to know how to pair the best algorithms with the right tools and processes. SAS combines a rich, sophisticated heritage in statistics and data mining with new architectural advances to ensure your models run as fast as possible – in huge enterprise environments or in a cloud computing environment.

Algorithms: SAS® graphical user interfaces help you build machine learning models and implement an iterative machine learning process. You don't have to be an advanced statistician. Our comprehensive selection of machine learning algorithms is included in many SAS products and can help you quickly get value from your big data – including data from the Internet of Things.

SAS machine learning algorithms include:

Neural networks.
Decision trees.
Random forests.
Associations and sequence discovery.

Gradient boosting and bagging.
Support vector machines.
Nearest-neighbor mapping.
K-means clustering.
Self-organizing maps.
Local search optimization techniques (e.g., genetic algorithms).
Expectation maximization.
Multivariate adaptive regression splines.
Bayesian networks.
Kernel density estimation.
Principal component analysis.
Singular value decomposition.
Gaussian mixture models.
Sequential covering rule building.

Tools and processes: As we know by now, it’s not just the algorithms. Ultimately, the secret to getting the most value from your big data lies in pairing the best algorithms for the task at hand with:

Comprehensive data quality and management
GUIs for building models and process flows
Interactive data exploration and visualization of model results
Comparisons of different machine learning models to quickly identify the best one

Do you need some basic guidance on which machine learning algorithm to use for what? This blog by Hui Li, a data scientist at SAS, provides a handy cheat sheet.

Boost your SAS® skills

Get in-depth instruction and free access to SAS software to build your machine learning skills. Courses include: 14 hours of course time, 90 days of free software access in the cloud and a flexible e-learning format, with no programming skills required.

What are some popular machine learning methods?

Two of the most widely adopted machine learning methods are supervised learning and unsupervised learning – but there are also other methods of machine learning. Here's an overview of the most popular types.

Supervised learning

Supervised learning algorithms are trained using labeled examples, such as an input where the desired output is known. For example, a piece of equipment could have data points labeled either “F” (failed) or “R” (runs). The learning algorithm receives a set of inputs along with the corresponding correct outputs, and the algorithm learns by comparing its actual output with correct outputs to find errors. It then modifies the model accordingly. Through methods like classification, regression, prediction and gradient boosting, supervised learning uses patterns to predict the values of the label on additional unlabeled data. Supervised learning is commonly used in applications where historical data predicts likely future events. For example, it can anticipate when credit card transactions are likely to be fraudulent or which insurance customer is likely to file a claim.
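
The compare-and-correct loop described above can be sketched with a perceptron, one of the simplest supervised learners. This is a minimal illustration only: the sensor features (temperature, vibration), their values and the function names are assumptions made up for the example, not anything taken from a SAS product.

```python
# Hypothetical sensor readings: (temperature, vibration) -> "F" (failed) or "R" (runs).
# All values are invented for illustration.
TRAINING_DATA = [
    ((0.9, 0.8), "F"),
    ((0.8, 0.9), "F"),
    ((0.7, 0.95), "F"),
    ((0.2, 0.1), "R"),
    ((0.3, 0.2), "R"),
    ((0.1, 0.3), "R"),
]


def predict(weights, bias, features):
    """Return "F" if the weighted sum is positive, otherwise "R"."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return "F" if score > 0 else "R"


def train_perceptron(data, epochs=20, learning_rate=0.1):
    """Perceptron rule: adjust the model only when a prediction is wrong."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for features, label in data:
            if predict(weights, bias, features) != label:
                # Compare actual output with the correct output, then correct.
                direction = 1.0 if label == "F" else -1.0
                weights = [w + learning_rate * direction * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * direction
                errors += 1
        if errors == 0:  # every training example is now classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(TRAINING_DATA)
print(predict(weights, bias, (0.85, 0.9)))  # a new, unlabeled reading
print(predict(weights, bias, (0.15, 0.2)))
```

On this toy data the two classes are cleanly separable, so the loop stops after a few passes. Real equipment data would rarely be this tidy and would usually call for one of the more robust methods listed earlier.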

Unsupervised learning

Unsupervised learning is used against data that has no historical labels. The system is not told the "right answer." The algorithm must figure out what is being shown. The goal is to explore the data and find some structure within. Unsupervised learning works well on transactional data. For example, it can identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns. Or it can find the main attributes that separate customer segments from each other. Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering and singular value decomposition. These algorithms are also used to segment text topics, recommend items and identify data outliers.
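
As a rough sketch of how k-means clustering finds customer segments without any labels, the example below groups a handful of invented customers. The feature names (monthly spend, visits per month), the data and the simplistic choice of starting centroids are all assumptions for illustration.

```python
import math

# Hypothetical customers described by (monthly_spend, visits_per_month).
CUSTOMERS = [
    (20.0, 1.0), (25.0, 2.0), (22.0, 1.0),      # occasional shoppers
    (200.0, 8.0), (220.0, 9.0), (210.0, 10.0),  # frequent shoppers
]


def nearest(point, centroids):
    """Index of the centroid closest to the point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def k_means(points, k, iterations=10):
    """Plain k-means: assign points to centroids, then recompute the centroids."""
    centroids = list(points[:k])  # simplistic initialization, an assumption
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster in place
        if new_centroids == centroids:  # assignments have stopped changing
            break
        centroids = new_centroids
    return centroids, [nearest(p, centroids) for p in points]


centroids, segments = k_means(CUSTOMERS, k=2)
print(segments)   # one segment index per customer
print(centroids)  # the "typical" customer of each segment
```

Nobody tells the algorithm which customers are occasional or frequent; the two groups emerge from the distances alone. In practice, features on very different scales are usually standardized first so that one of them does not dominate the distance.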

Semisupervised learning

Semisupervised learning is used for the same applications as supervised learning. But it uses both labeled and unlabeled data for training – typically a small amount of labeled data with a large amount of unlabeled data (because unlabeled data is less expensive and takes less effort to acquire). This type of learning can be used with methods such as classification, regression and prediction. Semisupervised learning is useful when the cost associated with labeling is too high to allow for a fully labeled training process. Early examples of this include identifying a person's face on a webcam.
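
One simple way to combine a little labeled data with a lot of unlabeled data is self-training: label the unlabeled example you are most confident about, add it to the labeled set and repeat. The sketch below uses nearest-neighbor distance as the confidence measure. The data points, the class names "A" and "B" and the function name are hypothetical, and this is only one of several semisupervised strategies.

```python
import math

# Hypothetical data: two labeled examples and several unlabeled ones.
LABELED = {(1.0, 1.0): "A", (9.0, 9.0): "B"}
UNLABELED = [(2.0, 1.5), (3.0, 2.5), (8.0, 8.5), (7.0, 7.5), (4.0, 3.5)]


def self_train(labeled, unlabeled):
    """Repeatedly pseudo-label the unlabeled point closest to a labeled one."""
    labeled = dict(labeled)      # copy so the caller's data is untouched
    remaining = list(unlabeled)
    while remaining:
        # The most "confident" guess is the smallest unlabeled-to-labeled distance.
        point, neighbor = min(
            ((u, l) for u in remaining for l in labeled),
            key=lambda pair: math.dist(pair[0], pair[1]),
        )
        labeled[point] = labeled[neighbor]  # adopt the neighbor's label
        remaining.remove(point)
    return labeled


result = self_train(LABELED, UNLABELED)
print(result[(4.0, 3.5)])
```

The point (4.0, 3.5) is far from both original labeled examples, but it ends up labeled through a chain of closer, previously pseudo-labeled neighbors. That chaining is what the unlabeled data contributes.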

Reinforcement learning

Reinforcement learning is often used for robotics, gaming and navigation. It's also used in conjunction with generative AI techniques, like large language models. With reinforcement learning, the algorithm discovers through trial and error which actions yield the greatest rewards. This type of learning has three primary components: the agent (the learner or decision maker), the environment (everything the agent interacts with) and actions (what the agent can do). The objective is for the agent to choose actions that maximize the expected reward over a given amount of time. The agent will reach the goal much faster by following a good policy. So the goal in reinforcement learning is to learn the best policy.
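
To make the agent, environment, actions and policy concrete, here is a minimal sketch of tabular Q-learning, one common reinforcement learning algorithm. The five-position corridor, the reward of 1 at the goal and all parameter values are assumptions chosen to keep the example small.

```python
import random

N_STATES = 5          # positions 0..4 in a corridor; position 4 is the goal
ACTIONS = (-1, +1)    # the agent can step left or step right
GOAL = N_STATES - 1


def step(state, action):
    """Environment: move the agent, stop at the walls, reward 1 at the goal."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:                             # exploit, breaking ties at random
                action = max(ACTIONS, key=lambda a: (q[(state, a)], rng.random()))
            next_state, reward = step(state, action)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the estimate toward the reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


q = q_learning()
# The learned policy: the best-valued action in each non-goal position.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)
```

After training, the policy should pick +1 (step right) in every position, which is the shortest route to the reward. The discount factor gamma is what makes reaching the goal sooner worth more than reaching it later.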

“Data management needs AI and machine learning, and just as important, AI/ML needs data management. As of now, the two are connected, with the path to successful AI intrinsically linked to modern data management practices.”
Dan Soceanu, Senior Product Manager for AI and Data Management, SAS

What are the differences between data mining, machine learning and deep learning?

Although all of these methods have the same goal – to extract insights, patterns and relationships that can be used to make decisions – they have different approaches and abilities.

Data mining can be considered a superset of many different methods to extract insights from data. It might involve traditional statistical methods and machine learning. Data mining applies methods from many different areas to identify previously unknown patterns from data. This can include statistical algorithms, machine learning, text analytics, time series analysis and other areas of analytics. Data mining also includes the study and practice of data storage and data manipulation.

Like statistical modeling, machine learning aims to understand the structure of the data. Statistical models do this by fitting well-understood theoretical distributions to the data: there is a mathematically proven theory behind the model, but it requires that the data meet certain strong assumptions. Machine learning has developed based on the ability to use computers to probe the data for structure, even if we don't have a theory of what that structure looks like. The test for a machine learning model is its validation error on new data, not a theoretical test of a null hypothesis. Because machine learning often uses an iterative approach to learn from data, the learning can be easily automated. Passes are run through the data until a robust pattern is found.
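
The two ideas in this paragraph, repeated passes over the data and judging a model by its error on new data, can be sketched in a few lines. The example fits a straight line by gradient descent and then measures mean squared error on points that were held out of training. The data values, learning rate and number of passes are assumptions chosen for illustration.

```python
# Hypothetical data, roughly y = 2x + 1 with small hand-made noise.
TRAIN = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.1)]
VALIDATION = [(0.5, 2.0), (1.5, 4.1), (2.5, 5.9), (3.5, 8.0)]  # held out of training


def mean_squared_error(slope, intercept, data):
    """Average squared gap between the line's predictions and the true values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in data) / len(data)


def fit_line(data, passes=2000, learning_rate=0.02):
    """Gradient descent: each pass over the data nudges the two parameters."""
    slope, intercept = 0.0, 0.0
    n = len(data)
    for _ in range(passes):
        grad_slope = sum(2 * (slope * x + intercept - y) * x for x, y in data) / n
        grad_intercept = sum(2 * (slope * x + intercept - y) for x, y in data) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


slope, intercept = fit_line(TRAIN)
print("training error:  ", mean_squared_error(slope, intercept, TRAIN))
print("validation error:", mean_squared_error(slope, intercept, VALIDATION))
```

A validation error close to the training error suggests the pattern generalizes. A much larger validation error would suggest the model has fitted noise in the training data rather than real structure.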

Deep learning combines advances in computing power and special types of neural networks to learn complicated patterns in large amounts of data. Deep learning techniques are currently state of the art for identifying objects in images and words in sounds. Researchers are now looking to apply these successes in pattern recognition to more complex tasks such as automatic language translation, medical diagnoses and numerous other important social and business problems.

Next steps

Enable everyone to work in the same integrated environment – from data management to model development and deployment.

Picking the perfect environment

Purpose-built for developers and modelers, SAS® Viya® Workbench is a self-service, on-demand compute environment for analytical development, including building AI and machine learning models for better data analysis.
