What it is and why it matters
What is machine learning?
Machine learning is a branch of artificial intelligence (AI) based on two things – mathematical algorithms and automation. The idea is to automate the building of analytic models that use algorithms to “learn” from data in an iterative fashion. The “machine” (it’s really an algorithm) learns from its mistakes in previous steps to derive the best results without human intervention. These models can then be used to produce reliable, repeatable decisions.
The iterative aspect of machine learning is important because your models aren’t going to get any smarter by themselves. They need to learn from previous computations to produce the best results.
It’s a science that’s not new – but one that’s gaining fresh momentum. The highly hyped, self-driving Google car? Essence of machine learning. Online recommendation offers? A machine learning application for everyday life. Fraud detection? One of the more obvious, important uses in our world today.
Why is it important?
For most organizations, the race is on to extract valuable information from growing volumes and varieties of data. Electronic (and other) data is increasing at rates never seen before. Storage options for big data are more affordable than ever – maybe even free! And computational processing power has never been cheaper or more powerful.
That means with the right data, the right technologies and the right analytics, it’s possible to quickly and automatically produce models that can analyze bigger, more complex data. And deliver faster, more accurate results without human intervention. Even on a very large scale. The result: high-value predictions that can guide better decisions and actions in real time.
One key to this is automated model building. Analytics thought leader Thomas H. Davenport wrote recently in The Wall Street Journal that with the rapidly changing, growing volumes of data, "fast-moving modeling streams were needed to keep up." And while "humans can typically create one or two models a week; machine learning can create thousands."
Ever wonder how an online retailer provides nearly instantaneous offers for other products you may be interested in? Or how lenders can provide near real-time answers to your loan requests? Many day-to-day activities are powered by machine learning algorithms.
- Fraud detection.
- Online recommendations.
- Real-time ad placements on Web pages and mobile devices.
- Text-based sentiment analysis.
- Credit scoring and next-best offers.
- Prediction of equipment failures.
- New pricing models.
- Network intrusion detection.
- Handwriting analyses.
- Email spam filtering.
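To make one item on that list concrete, email spam filtering can be framed as a simple text classifier. The sketch below uses scikit-learn (a library not mentioned in this article) with a handful of invented messages, purely for illustration:

```python
# Illustrative sketch of email spam filtering as text classification.
# Assumes scikit-learn; the example messages and labels are invented.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now", "limited offer click here",   # spam
    "meeting moved to 3pm", "lunch tomorrow at noon",      # legitimate
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each message into word counts, then learn which words signal spam
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(emails, labels)

print(clf.predict(["free prize offer", "see you at the meeting"]))
```

A real filter would be trained on millions of messages, but the shape of the problem is the same: labeled examples in, a reusable decision rule out.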
It’s not your daddy’s (or granddaddy’s) machine learning anymore
Machine learning today is not like machine learning before. While many of the mathematical algorithms have been around for a long time, the ability to apply complex mathematical calculations to huge quantities of data – over and over, faster and faster – is a recent development. Cheaper data storage, distributed processing, more powerful computers, and the analytical opportunities they provide are all responsible for the resurging interest in these systems.
Machine learning methods explained
Two of the most widely adopted machine learning methods are supervised and unsupervised.
- Supervised learning discovers patterns in data that relate attributes to labels (the historic outcomes the algorithm is learning to predict). For example, a piece of equipment could have data points labeled either “F” (failed) or “R” (runs). The algorithm uses the historical data to extract patterns of attributes that relate to outcomes labeled “F.” This is the learning phase. These patterns are then used to predict the labels of future data. Machine learning models are called classification models when the label takes distinct categorical values – for example “F” and “R,” or “Low,” “Medium” or “High.” They are called regression (or prediction) models when the label is a numeric value, such as a credit score or the amount of an insurance claim.
- Unsupervised learning is used on data that has no historic labels; the goal is to explore the data and find structure within it. The data is separated into groups so you can identify which observations are similar or dissimilar to one another. It’s widely used for tasks such as clustering and dimension reduction – for example, to identify segments of customers with similar attributes who can then be treated similarly in marketing campaigns, or to find the main attributes that separate customer segments from each other. Popular techniques include self-organizing maps, nearest-neighbor mapping and singular value decomposition. These algorithms are also used to segment text topics, recommend items and identify data outliers.
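The supervised equipment example above can be sketched in a few lines. This uses scikit-learn (an assumption – the article names no tools), and the sensor attributes and their values are invented for illustration:

```python
# Minimal supervised-learning sketch of the "F"/"R" equipment example.
# Assumes scikit-learn; features (temperature, vibration) are made up.
from sklearn.tree import DecisionTreeClassifier

# Historical records: [temperature, vibration], labeled "F" (failed) or "R" (runs)
X_train = [[95, 0.9], [40, 0.1], [88, 0.8], [35, 0.2], [90, 0.7], [42, 0.15]]
y_train = ["F", "R", "F", "R", "F", "R"]

# Learning phase: extract the patterns of attributes that relate to each label
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Prediction phase: apply the learned patterns to future data
print(model.predict([[92, 0.85], [38, 0.1]]))  # -> ['F' 'R']
```

Because the label here is categorical (“F” vs. “R”), this is a classification model; swapping in a numeric target such as a credit score would make it a regression model.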
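The unsupervised customer-segmentation example can be sketched the same way with k-means clustering. Again, scikit-learn and the customer attributes (age, annual spend) are assumptions for illustration only:

```python
# Minimal unsupervised-learning sketch: clustering unlabeled customer data.
# Assumes scikit-learn; the attributes and values are invented.
from sklearn.cluster import KMeans

# Unlabeled customer records: [age, annual spend in $1000s]
X = [[22, 5], [25, 7], [24, 6], [60, 40], [62, 45], [58, 42]]

# Find structure: separate customers into two segments with similar attributes
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print(kmeans.labels_)           # segment assignment for each customer
print(kmeans.cluster_centers_)  # the "typical" customer in each segment
```

Note that no labels were supplied: the algorithm itself discovers that the younger, lower-spend customers form one group and the older, higher-spend customers another.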
Another topic often discussed is the difference between machine learning and other statistical and mathematical approaches like data mining. A simple definition is that while machine learning uses many of the same algorithms as data mining, the main difference is what the two disciplines do with them. Data mining discovers previously unknown patterns and knowledge. Machine learning reproduces known patterns and knowledge, applies them automatically to new data, and then applies the results to decision making.
What you need to create good machine learning systems
- Data preparation capabilities
- Algorithms – basic and advanced
- Automation and iterative processes
- Ensemble modeling
How SAS can help
SAS graphical user interfaces help you build machine learning models and implement an iterative machine learning process. You don't have to be an advanced statistician. Our comprehensive selection of machine learning algorithms can help you quickly get value from your big data. They include:
- Neural networks.
- Decision trees.
- Random forests.
- Associations and sequence discovery.
- Gradient boosting and bagging.
- Support vector machines.
- Nearest-neighbor mapping.
- k-means clustering.
- Self-organizing maps.
- Local search optimization techniques such as genetic algorithms.
- Expectation maximization.
- Multivariate adaptive regression splines.
- Bayesian networks.
- Kernel density estimation.
- Principal components analysis.
- Singular value decomposition.
- Gaussian mixture models.
- Sequential covering rule building.
As we know by now, it’s not just the algorithms. Ultimately, the secret to getting the most value from your big data lies in pairing the best algorithms for the task at hand with:
- Comprehensive data quality and management.
- GUIs for building models and process flows.
- Comparisons of different machine learning models to quickly identify the best one.
- Interactive data exploration and visualization of model results.
- Automated ensemble model evaluation to identify the best performers.
- Easy model deployment so you can get repeatable, objective decisions quickly.
- An integrated end-to-end platform for the automation of the data-to-decision process.
At SAS, we are continuously searching for and evaluating new approaches. Ours is a long history of implementing the statistical methods best suited to solving the problems you face. We combine our rich, sophisticated heritage in statistics and data mining with new architectural advances to ensure your models run as fast as possible – even in huge enterprise environments.
We understand that quick time to value not only means fast, automated model performance but also time NOT spent moving data between platforms. High-performance, distributed analytical techniques take advantage of massively parallel processing integrated with Hadoop, as well as all major databases. You can cycle quickly through all steps of the modeling process – without moving data.
This high-performance big data analytics platform enables organizations to automate the entire analytics life cycle – not just the modeling process.