What it is and why it matters
Data mining defined
Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. Using a broad range of techniques, organizations can use this information to increase revenues, cut costs, improve customer relationships, reduce risks and more.
The importance of data mining
So why is data mining important? You’ve seen the staggering numbers – the volume of data produced is doubling every two years. Unstructured data alone makes up 90 percent of the digital universe. But more information does not mean more knowledge. Data mining allows us to sift through all the chaotic and repetitive noise, understand what is relevant and then make good use of it to assess likely outcomes.
Data mining history and current advances
Learning from data is extremely powerful, and its use is transforming business decision making in multiple industries at an accelerating pace, saving money and even lives. It's an exciting time to be a data miner! To be excellent at the work, we need to listen well to transform a real-world challenge into a close, but solvable, problem.
Founder and President, Elder Research Inc.
The process of digging through data to discover hidden connections and predict future trends has a long history. Sometimes referred to as "knowledge discovery in databases," the term "data mining" wasn’t coined until the 1990s. But its foundation comprises three intertwined scientific disciplines: statistics (the numeric study of data relationships), artificial intelligence (human-like intelligence displayed by software and/or machines) and machine learning (algorithms that can learn from data to make predictions). What was old is new again, as data mining technology evolves to keep pace with the limitless potential of big data and affordable computing power.
Over the last decade in particular, advances in processing power and speed have allowed businesses to move beyond manual, tedious and time-consuming practices to quick, easy and automated data analysis. The more complex the data sets collected, the more potential there is to uncover relevant insights and keep a competitive advantage. Retailers, banks, manufacturers, telecommunication providers and insurers, among others, are using data mining solutions to discover relationships among everything from pricing, promotions and demographics to how the economy, risk, competition and social media are affecting their business models, revenues, operations and customer relationships.
Data mining terminology, tools and techniques
As a composite discipline, data mining draws on a variety of methods and techniques. They serve different analytic capabilities, address a range of organizational needs, ask different types of questions and use varying levels of human input or rules to arrive at a decision.
Descriptive modeling uncovers shared similarities or groupings in historical data to determine reasons behind success or failure, such as categorizing customers by product preferences or sentiment. Sample techniques include:
- Clustering – grouping similar records together.
- Anomaly detection – identifying multidimensional outliers.
- Association rule learning – detecting relationships between records.
- Principal component analysis – detecting relationships between variables.
- Affinity grouping – grouping people who have common interests or similar goals (e.g., people who buy X often buy Y and possibly Z).
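As an illustration of the "people who buy X often buy Y" idea, here is a minimal sketch of association rule scoring over pairs of items. The baskets, the `pair_rules` function and the `min_support` cut-off are hypothetical and chosen only for demonstration; real projects typically use a dedicated algorithm such as Apriori on far larger data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; each set is one transaction.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
    {"bread", "butter", "milk"},
]

def pair_rules(transactions, min_support=0.4):
    """Score 'x -> y' rules for item pairs by support, confidence and lift."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # P(y | x)
            lift = confidence / (item_counts[y] / n)  # P(y | x) / P(y)
            rules.append((x, y, support, confidence, lift))
    # Highest-confidence rules first.
    return sorted(rules, key=lambda r: r[3], reverse=True)

rules = pair_rules(baskets)
for x, y, support, confidence, lift in rules:
    print(f"{x} -> {y}: support={support:.2f} "
          f"confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 suggests the two items appear together more often than their individual popularity would predict; a lift below 1 suggests the opposite.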
Predictive modeling goes deeper to classify future events or estimate unknown outcomes – for example, using credit scoring to assess the likelihood that a borrower will repay. Predictive modeling also helps uncover the drivers behind events of interest, such as customer churn, campaign responses or credit defaults. Sample techniques include:
- Regression – a measure of the strength of the relationship between one dependent variable and a series of independent variables.
- Neural networks – computer programs that detect patterns, make predictions and learn.
- Decision trees – a tree-shaped diagram in which each branch represents a probable occurrence.
- Support vector machines – supervised learning models that separate classes by finding the boundary with the widest margin between them.
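To make the regression entry more concrete, the following sketch fits a straight line to one independent variable by ordinary least squares and reports R-squared as a rough measure of how strong the relationship is. The data (ad spend versus units sold) and the function names are assumptions made up for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: monthly ad spend (thousands) and units sold (thousands).
ad_spend = [1, 2, 3, 4, 5]
units_sold = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, units_sold)
fit_quality = r_squared(ad_spend, units_sold, slope, intercept)
forecast = slope * 6 + intercept  # estimate for a spend level not yet observed

print(f"slope={slope:.2f} intercept={intercept:.2f} "
      f"R^2={fit_quality:.3f} forecast={forecast:.2f}")
```

With several independent variables the same idea applies, but the fit is usually delegated to a statistics package rather than written by hand.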
The most important part of any data mining project is defining the problem clearly. No single model tells the complete story. There is no rule that says when you've exhausted the data. [There are diminishing returns, so ask] How much value or money can I bring to the company if I continue?
President of Abbott Analytics
Prescriptive modeling looks at internal and external variables and constraints to recommend one or more courses of action – for example, prescribing the best marketing offer to send to each customer. Sample techniques include:
- Predictive analytics plus rules – finding if/then rules from patterns and predicting outcomes.
- Marketing optimization – simulating the media mix in real time to find the one with the highest ROI.
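The "best marketing offer for each customer" example can be sketched as a small expected-value calculation. Everything here is hypothetical: the offers, their margins and costs, the response probabilities (which would normally come from a predictive model) and the exclusion rule. It is a simplified sketch of combining predictions with rules, not a full optimization engine.

```python
# Hypothetical offers: margin earned if accepted, and cost of sending.
OFFERS = {
    "discount_10": {"margin": 20.0, "cost": 1.0},
    "free_shipping": {"margin": 30.0, "cost": 2.0},
    "loyalty_points": {"margin": 12.0, "cost": 0.5},
}

def best_offer(response_probs, excluded=()):
    """Return the offer with the highest expected profit.

    Returns (None, 0.0) when no permitted offer is expected to be profitable.
    """
    best_name, best_value = None, 0.0
    for name, prob in response_probs.items():
        if name in excluded:  # business rule: offer not allowed for this customer
            continue
        offer = OFFERS[name]
        expected = prob * offer["margin"] - offer["cost"]
        if expected > best_value:
            best_name, best_value = name, expected
    return best_name, best_value

# Hypothetical predicted response probabilities per customer.
customers = {
    "C001": {"probs": {"discount_10": 0.30, "free_shipping": 0.10,
                       "loyalty_points": 0.20}, "excluded": ()},
    "C002": {"probs": {"discount_10": 0.04, "free_shipping": 0.25,
                       "loyalty_points": 0.10}, "excluded": ("free_shipping",)},
    "C003": {"probs": {"discount_10": 0.02, "free_shipping": 0.03,
                       "loyalty_points": 0.01}, "excluded": ()},
}

recommendations = {
    cid: best_offer(c["probs"], c["excluded"]) for cid, c in customers.items()
}
for cid, (name, value) in recommendations.items():
    print(cid, name, round(value, 2))
```

The third customer receives no offer at all, which is itself a prescription: sending anything is expected to cost more than it returns.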
With the growth in unstructured data from the web, comment fields, books, email, PDFs, audio and other text sources, adoption of text mining as a related discipline to data mining has also grown significantly. Businesses need the ability to successfully parse, filter and transform unstructured data in order to include it in predictive models for improved prediction accuracy.
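A minimal sketch of the parse, filter and transform steps might look like the following: free-text comments are split into words, common stopwords are filtered out, and the result is turned into a table of term counts that a predictive model could consume. The comments, the tiny stopword list and the function names are assumptions for illustration only.

```python
import re
from collections import Counter

# A deliberately tiny, hypothetical stopword list.
STOPWORDS = {"the", "a", "is", "and", "was", "to", "it", "my", "i", "very"}

def tokenize(text):
    """Parse: lowercase and split into words. Filter: drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def bag_of_words(docs):
    """Transform: one row of term counts per document, one column per term."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows

# Hypothetical customer comments.
comments = [
    "The delivery was late and the box was damaged.",
    "Great service, very fast delivery!",
    "Damaged box. Late again.",
]

vocab, matrix = bag_of_words(comments)
print(vocab)
for row in matrix:
    print(row)
```

Each row can then sit alongside structured fields such as purchase history when a model is trained.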
In the end, data mining should not be looked at as a separate, standalone entity because the pre-processing (data preparation, data exploration) and post-processing (model validation, scoring, model performance monitoring) are equally essential.
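To show how these stages fit around the model itself, here is a small end-to-end sketch: missing values are filled in (data preparation), a very simple cut-off model is trained, and its accuracy is checked on records it has not seen (model validation). The churn data, the single-threshold model and all function names are hypothetical simplifications.

```python
def mean_of_known(values):
    """Mean of the values that are present (None marks a missing value)."""
    known = [v for v in values if v is not None]
    return sum(known) / len(known)

def fill_missing(values, fill):
    """Data preparation: replace missing values with a fill value."""
    return [fill if v is None else v for v in values]

def predictions(xs, cut):
    """Predict 1 (churn) when x is at or above the cut-off, else 0."""
    return [1 if x >= cut else 0 for x in xs]

def accuracy(xs, labels, cut):
    """Share of records the cut-off classifies correctly."""
    hits = sum(p == y for p, y in zip(predictions(xs, cut), labels))
    return hits / len(xs)

def fit_threshold(xs, labels):
    """Training: choose the cut-off that classifies the training rows best."""
    return max(sorted(set(xs)), key=lambda cut: accuracy(xs, labels, cut))

# Hypothetical data: months since last purchase, and 1 = customer churned.
months = [1, 2, None, 8, 9, 10, 2, 7, 3, 11]
churned = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]

# Split first, so the holdout rows play no part in preparation or training.
train_x, test_x = months[:6], months[6:]
train_y, test_y = churned[:6], churned[6:]

fill = mean_of_known(train_x)
train_x = fill_missing(train_x, fill)
test_x = fill_missing(test_x, fill)

cut = fit_threshold(train_x, train_y)
holdout_accuracy = accuracy(test_x, test_y, cut)
print(f"cut-off={cut} holdout accuracy={holdout_accuracy:.2f}")
```

The model is perfect on the training rows but misses one of the four holdout customers, which is the kind of gap that validation and ongoing performance monitoring are meant to expose.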
Data mining uses and real-world examples
Automated algorithms help banks get a better view of market risks, detect fraud faster, manage regulatory compliance obligations and get optimal returns on their marketing investments. HSBC has achieved significantly lower incidence of fraud across tens of millions of debit and credit card accounts.
Large customer databases hold hidden insights about how to improve customer relationships, optimize marketing campaigns and forecast sales. Staples runs nearly 1,500 multichannel campaigns annually based on 25 million customer records. Data analysis of that campaign generation showed a 137 percent rate of return.
Analytic know-how gives insurance companies the ability to solve complex problems concerning fraud, compliance, risk management and customer attrition. OneBeacon Insurance Group used SAS to help price products in its personal and commercial businesses. As a result, the company improved its loss ratio by two to four points and reduced the time spent building models.
Aligning supply plans with demand forecasts is essential, as is the early detection of problems, quality assurance and investment in brand equity. Volvo analyzes more than 100 parameters on its vehicles to predict wear, avoid unplanned customer downtime and anticipate potential incidents for quicker response time.
In an overloaded market where competition is tight, the answers are often within your consumer data. The multimedia company Sanoma uses analytic models to make sense of millions of transactions a week, predict customer behavior and offer highly targeted and relevant campaigns.
With big data analytics, health insurers can reduce fraud claims, hospital care providers can improve patient outcomes, and patients can receive safer, more affordable care. Blue Cross and Blue Shield of North Carolina used predictive models to determine the potential for at-risk patient readmission so it could engage more with patients before discharge. The model beat chance by 400 percent in correctly identifying those patients.
Educators can use unified, data-driven views of student progress to predict how well students will do before they set foot in the classroom, and to develop intervention strategies to keep them on course. More than 4,000 teachers and 350 administrators in the Plano Independent School District can quickly access student data and predict achievement. Many of the district's schools are in the 90th percentile for performance.
Armed with the right data, agencies can make faster decisions to keep citizens safe, reduce the burden that fraud is putting on government programs and tune in to public sentiment. The UK’s HM Revenue & Customs needed a data analytics solution to help identify significant tax evasion and fraud. Analytics helped the agency locate billions in additional tax revenue.
Predicting outages before they occur, managing pricing volatility and protecting market share are just some of the benefits of harnessing the power of big data. Automated marketing campaigns enabled EDP España to achieve a customer recovery rate of more than 80 percent, electricity customer loyalty of 95 percent and 80 percent loyalty among gas customers.
SAS has been developing machine learning techniques for data scientists since long before it was even a job title, let alone a sexy one. Because of the research and development commitment SAS has made over decades, and continues to make, we have the greatest breadth and depth of analytical algorithms in the industry, as well as the data processing tools and data manipulation techniques required to take a job from start to finish.
Senior Director, SAS Advanced Analytics R&D
Each year, HP conducts approximately 2.5 billion interactions via customer calls, website visits, emails and chat sessions, and has even more touchpoints through retail partners. The result is a 900TB data warehouse with 360 million customer records, growing by millions each month. HP’s goal was clear: find meaningful value in all that data, and achieve a 360-degree view of its customers to be more responsive and competitive.
Savings: With powerful data mining analytics, HP was able to accurately score more than 100 million customers in seconds to target its marketing and service efforts. As a result, HP has seen a 20 percent incremental ROI across campaigns. Orders shipped have increased by 50 percent in three years, and the overall operating profit of the HPDirect.com store has increased by more than 50 percent.