What it is and why it matters
Data mining defined
Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. With a broad range of techniques, you can apply this information to increase revenues, cut costs, improve customer relationships, reduce risks and more.
The importance of data mining
So why is data mining important? You’ve seen the staggering numbers – the volume of data produced is doubling every two years. Unstructured data alone makes up 90 percent of the digital universe. But more information does not necessarily mean more knowledge. Data mining allows us to sift through all the chaotic and repetitive noise, understand what is relevant and then make good use of that information to assess likely outcomes.
Data mining history and current advances
Learning from data is extremely powerful, and its use is transforming business decision making in multiple industries at an accelerating pace, saving money and even lives. It's an exciting time to be a data miner! To be excellent at the work, we need to listen well to transform a real-world challenge into a close, but solvable, problem.
Founder and President, Elder Research Inc.
The process of digging through data to discover hidden connections and predict future trends has a long history. The term "data mining," sometimes used interchangeably with "knowledge discovery in databases," wasn’t coined until the 1990s. But its foundation comprises three intertwined scientific disciplines: statistics (the numeric study of data relationships), artificial intelligence (human-like intelligence displayed by software and/or machines) and machine learning (algorithms that can learn from data to make predictions). What was old is new again, as data mining technology keeps evolving to keep pace with the limitless potential of big data and affordable computing power.
Over the last decade, advances in processing power and speed have enabled us to move beyond manual, tedious and time-consuming practices to quick, easy and automated data analysis. The more complex the data sets collected, the more potential there is to uncover relevant insights. Retailers, banks, manufacturers, telecommunications providers and insurers, among others, are using data mining to discover relationships among everything from pricing, promotions and demographics to how the economy, risk, competition and social media are affecting their business models, revenues, operations and customer relationships.
Data mining terminology, tools and techniques
As a composite discipline, data mining draws on a variety of methods that serve different analytic capabilities. These methods address a range of organizational needs, answer different types of questions and rely on varying levels of human input or rules to arrive at a decision.
Descriptive modeling uncovers shared similarities or groupings in historical data to determine reasons behind success or failure, such as categorizing customers by product preferences or sentiment. Sample techniques include:
- Clustering – grouping similar records together.
- Anomaly detection – identifying multidimensional outliers.
- Association rule learning – detecting relationships between records.
- Principal component analysis – detecting relationships between variables.
- Affinity grouping – grouping people with common interests or similar goals (e.g., people who buy X often buy Y and possibly Z).
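The "people who buy X often buy Y" idea behind association rules and affinity grouping can be sketched in a few lines. This is a minimal illustration, not a production algorithm: the baskets, the item names and the `pair_rules` helper are all made up for the example, and it only considers pairs of items. Support is the share of baskets that contain both items; confidence is the share of baskets with the first item that also contain the second.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets; the items are invented for illustration.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # too rare to be a useful pattern
        # Each frequent pair yields two directional rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Most reliable rules first.
    return sorted(rules, key=lambda rule: rule[3], reverse=True)

rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data, every basket that contains jam also contains bread, so the rule "jam -> bread" has a confidence of 1.0 even though only 40 percent of baskets contain both.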
Predictive modeling goes deeper to classify events in the future or estimate unknown outcomes – for example, using credit scoring to determine an individual's likelihood of repaying a loan. Predictive modeling also helps uncover insights for things like customer churn, campaign response or credit defaults. Sample techniques include:
- Regression – modeling the relationship between one dependent variable and a series of independent variables, including how strong that relationship is.
- Neural networks – computer programs that detect patterns, make predictions and learn.
- Decision trees – tree-shaped diagrams in which each branch represents a probable occurrence.
- Support vector machines – supervised learning models that classify records by finding the boundary that best separates the classes.
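To make the regression entry concrete, here is a minimal sketch of ordinary least squares with a single independent variable. That is a simplification, since real projects usually involve several. The `fit_line` helper and the spend and sales figures are hypothetical, chosen so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the shared 1/n factor cancels out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: ad spend (in thousands) and units sold.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(spend, units)
predicted = slope * 6.0 + intercept  # estimate for a spend level not yet observed
print(f"units = {slope:.1f} * spend + {intercept:.1f}; forecast at 6.0: {predicted:.1f}")
```

The fitted line is then used to estimate an unknown outcome, which is the "predictive" step. With real, noisy data you would also want to check how well the line fits before relying on it.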
The most important part of any data mining project is defining the problem clearly. No single model tells the complete story. There is no rule that says when you've exhausted the data. [There are diminishing returns, so ask] How much value or money can I bring to the company if I continue?
President of Abbott Analytics
Prescriptive modeling looks at internal and external variables and constraints to recommend one or more courses of action – for example, determining the best marketing offer to send to each customer. Sample techniques include:
- Predictive analytics plus rules – developing if/then rules from patterns and predicting outcomes.
- Marketing optimization – simulating the most advantageous media mix in real time for the highest possible ROI.
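A rough sketch of "predictive analytics plus rules" for the marketing-offer example: a predictive model supplies each customer's response probability per offer, and simple if/then rules turn those predictions into a recommended action. Everything here is an assumption for illustration, including the offer names, margins, costs, probabilities and the `best_offer` helper.

```python
# Hypothetical offers: margin earned if the customer responds, and cost to send.
OFFERS = {
    "discount_10": {"margin": 20.0, "cost": 1.0},
    "free_shipping": {"margin": 12.0, "cost": 0.5},
}

def best_offer(response_probs, opted_out=False):
    """Pick the offer with the highest expected value, or None if nothing pays off."""
    if opted_out:
        return None  # business rule: never contact customers who opted out
    best_name, best_value = None, 0.0
    for name, prob in response_probs.items():
        offer = OFFERS[name]
        expected = prob * offer["margin"] - offer["cost"]
        if expected > best_value:  # rule: only send offers with positive expected value
            best_name, best_value = name, expected
    return best_name

# In practice the probabilities would come from a predictive model; these are made up.
customers = {
    "alice": ({"discount_10": 0.10, "free_shipping": 0.25}, False),
    "bob": ({"discount_10": 0.30, "free_shipping": 0.10}, False),
    "carol": ({"discount_10": 0.02, "free_shipping": 0.03}, False),
    "dave": ({"discount_10": 0.50, "free_shipping": 0.50}, True),
}

plan = {name: best_offer(probs, out) for name, (probs, out) in customers.items()}
print(plan)
```

The constraints do as much work as the predictions: Carol gets no offer because neither is expected to pay for itself, and Dave gets none because he opted out, however likely he is to respond.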
With the growth in unstructured data from the web, comment fields, books, email, PDFs, audio and other sources, the adoption of text mining as a related discipline to data mining has also grown significantly. To include unstructured data in predictive models and improve their accuracy, you first need to parse, filter and transform it.
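As a small illustration of what "parse, filter and transform" can mean for text, the sketch below turns free-text comments into rows of word counts that a predictive model could consume. It is a bare-bones bag-of-words approach under simplifying assumptions: the stop-word list is deliberately tiny, and the `tokenize` and `to_features` helpers and the sample comments are hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "a", "and", "is", "was", "to", "it", "my", "i"}

def tokenize(text):
    """Parse: lowercase and split into word tokens. Filter: drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [word for word in words if word not in STOP_WORDS]

def to_features(comments):
    """Transform: one row of term counts per comment, over a shared vocabulary."""
    token_lists = [tokenize(comment) for comment in comments]
    vocabulary = sorted({word for tokens in token_lists for word in tokens})
    rows = []
    for tokens in token_lists:
        counts = Counter(tokens)
        rows.append([counts[word] for word in vocabulary])
    return vocabulary, rows

comments = [
    "The delivery was late and the box was damaged.",
    "Great service, great price!",
]
vocabulary, rows = to_features(comments)
print(vocabulary)
for row in rows:
    print(row)
```

Each comment becomes a numeric row with one column per vocabulary word, which is the form most modeling techniques expect.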
In the end, you should not look at data mining as a separate, standalone entity because pre-processing (data preparation, data exploration) and post-processing (model validation, scoring, model performance monitoring) are equally essential.
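The sketch below shows how the modeling step sits between pre-processing and post-processing. The "model" is deliberately trivial (a single threshold) so the surrounding steps stay visible. The records, the field meanings and all four helper functions are assumptions made up for the example.

```python
def prepare(records):
    """Pre-processing: drop records with a missing feature or label."""
    return [(x, y) for x, y in records if x is not None and y is not None]

def split(records, holdout_every=4):
    """Hold out every Nth record for validation; train on the rest."""
    train = [r for i, r in enumerate(records) if i % holdout_every != 0]
    holdout = [r for i, r in enumerate(records) if i % holdout_every == 0]
    return train, holdout

def fit_threshold(train):
    """A deliberately trivial 'model': the midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def accuracy(threshold, records):
    """Post-processing: score records and compare predictions with known labels."""
    hits = sum((x > threshold) == (y == 1) for x, y in records)
    return hits / len(records)

# Hypothetical (monthly_spend, responded_to_campaign) records, some incomplete.
raw = [(20, 0), (90, 1), (None, 1), (30, 0), (80, 1),
       (70, 1), (10, 0), (40, None), (100, 1), (25, 0)]

clean = prepare(raw)
train, holdout = split(clean)
threshold = fit_threshold(train)
holdout_accuracy = accuracy(threshold, holdout)
print(f"threshold={threshold:.2f}, holdout accuracy={holdout_accuracy:.2f}")
```

The key design choice is that the threshold is fitted on one set of records and judged on another. A two-record holdout is far too small to trust in practice; it is only here to show the shape of the workflow.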
Data mining uses and real-world examples
In an overloaded market where competition is tight, the answers are often within your consumer data. The multimedia company Sanoma uses analytic models to make sense of millions of transactions a week, predict customer behavior and offer highly targeted and relevant campaigns.
With analytic know-how, insurance companies can solve complex problems concerning fraud, compliance, risk management and customer attrition. By using SAS to price products in its personal and commercial businesses, OneBeacon Insurance Group improved its loss ratio by two to four points and reduced the time it took to build models.
With unified, data-driven views of student progress, educators can predict student performance before they set foot in the classroom – and develop intervention strategies to keep them on course. More than 4,000 teachers and 350 administrators in the Plano Independent School District can quickly access student data and predict achievement. Many of the district's schools are in the 90th percentile for performance.
Aligning supply plans with demand forecasts is essential, as is early detection of problems, quality assurance and investment in brand equity. Volvo analyzes more than 100 parameters on its vehicles to predict wear, avoid unplanned customer downtime and anticipate potential incidents to enable a faster response time.
Automated algorithms help banks get a better view of market risks, detect fraud faster, manage regulatory compliance obligations and get optimal returns on their marketing investments. HSBC has used data mining techniques to significantly lower the incidence of fraud across tens of millions of debit and credit card accounts.
Large customer databases hold hidden insights that can help you improve customer relationships, optimize marketing campaigns and forecast sales. Staples runs nearly 1,500 multichannel campaigns annually based on 25 million customer records. Analysis of those campaigns showed a 137 percent rate of return.
Armed with the right data, agencies can make faster decisions to keep citizens safe, reduce the burden that fraud is putting on government programs and tune in to public sentiment. The UK’s HM Revenue & Customs needed a data analytics solution to help identify significant tax evasion and fraud. Analytics helped the agency locate billions in additional tax revenue.
Predicting outages before they occur, managing pricing volatility and protecting market share are just some of the benefits of harnessing the power of big data. Automated marketing campaigns enabled EDP España to achieve a customer recovery rate of more than 80 percent, electricity customer loyalty of 95 percent and 80 percent loyalty among gas customers.
With big data analytics, health insurers can reduce fraudulent claims, hospital care providers can improve patient outcomes, and patients can receive safer, more affordable care. Blue Cross and Blue Shield of North Carolina used predictive models to determine the potential for at-risk patient readmission so it could engage more with patients before discharge. The model beat chance by 400 percent in identifying those patients.
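One common way to express how much a model "beats chance" is lift: the hit rate among the records the model flags, divided by the base rate in the whole population. The sketch below uses invented figures, not data from any company mentioned here, and it assumes that "beating chance by 400 percent" means a lift of 5. The source does not spell out how the figure was calculated.

```python
def lift(flagged_positives, flagged_total, all_positives, all_total):
    """Hit rate among flagged records divided by the overall base rate."""
    model_rate = flagged_positives / flagged_total
    base_rate = all_positives / all_total
    return model_rate / base_rate

# Hypothetical numbers: 1,000 patients, 100 of whom are readmitted (10 percent base rate).
# The model flags 100 patients, and 50 of those are readmitted (50 percent hit rate).
value = lift(flagged_positives=50, flagged_total=100, all_positives=100, all_total=1000)
improvement_pct = (value - 1) * 100  # improvement over picking patients at random

print(f"lift = {value:.1f}, improvement over chance = {improvement_pct:.0f} percent")
```

A lift of 1 means the model does no better than random selection; anything above 1 is the gain from using it.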
It has been stated before, but being a data miner today is exciting – and, without any doubt, it will remain exciting for many years to come. Being able to address some of the most difficult problems in your industry, and turn answers into financial returns, is certainly a very rewarding activity. Our job is to provide you with the means to do so. For almost 40 years, SAS has developed the greatest breadth and depth of analytical algorithms, data processing tools, and data manipulation techniques required to get your data mining job done, from start to finish.
Senior Director, SAS Advanced Analytics R&D
Each year, HP conducts approximately 2.5 billion interactions via customer calls, website visits, emails and chat sessions, and has even more touchpoints through retail partners. The result is a 900TB data warehouse with 360 million customer records, growing by millions each month. HP’s goal was clear: find meaningful value in all that data, and achieve a 360-degree view of its customers to be more responsive and competitive.
Savings: With powerful data mining analytics, HP was able to accurately score more than 100 million customers in seconds to target its marketing and service efforts. As a result, HP has seen a 20 percent incremental ROI across campaigns. Orders shipped have increased by 50 percent in three years, and the overall operating profit of the HPDirect.com store has increased by more than 50 percent.