Data Mining Techniques: Theory and Practice

Duration: 3.0 days

Explore the inner workings of data mining techniques and how to make them work for you. Students are taken through all the steps of a data mining project, beginning with problem definition and data selection, and continuing through data exploration, data transformation, sampling, partitioning, modeling, and assessment.

Learn how to

  • use a data mining methodology
  • build and use decision trees and neural networks for modeling and scoring
  • use survival analysis and create survival curves.

Who should attend: Business analysts, their managers, and statisticians

Prerequisites
No prior knowledge of statistical or data mining tools is required.

Course Contents
Introduction to Data Mining
  • what is data mining?
  • directed and undirected data mining
  • models
  • profiling and prediction
Data Mining Methodology
  • why have a methodology?
  • how data miners can inadvertently learn things that are not true
  • translating business problems into data mining problems
  • the importance of model stability
  • finding the right input variables
  • sampling to create balanced model sets
  • partitioning to create training, validation, and test sets
  • data preparation
  • model assessment
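
As an illustration of the partitioning step listed above, the following is a minimal Python sketch, not course material. The function name `partition`, the 60/20/20 split, and the fixed seed are all assumptions chosen for the example; the course itself uses SAS Enterprise Miner for this step.

```python
import random

def partition(rows, percents=(60, 20, 20), seed=0):
    """Shuffle rows, then split them into training, validation, and test sets.

    percents are whole-number percentages (an illustrative default, not a rule).
    """
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * percents[0] // 100   # integer arithmetic avoids float rounding
    n_valid = n * percents[1] // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains goes to the test set
    return train, valid, test

train, valid, test = partition(range(100))
print(len(train), len(valid), len(test))
```

The training set is used to build the model, the validation set to tune it, and the test set to estimate performance on unseen data, so no record should appear in more than one set.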
Data Exploration
  • developing intuition about data
  • data structure
  • data types
  • data values
  • exploring distributions
  • summary statistics
  • histograms
  • using SAS Enterprise Miner for data exploration
Statistics and Regression
  • the null hypothesis
  • statistical significance
  • confidence bounds
  • variance and standard deviation
  • standardized values
  • correlation
  • linear regression
  • logistic regression
  • using SAS Enterprise Miner to build regression models
Decision Trees
  • decision trees as data exploration and classification tools
  • decision trees for modeling and scoring
  • decision trees for variable selection
  • alternate representations of decision trees
  • algorithms used to build decision trees
  • splitting criteria
  • recognizing instability and overfitting in decision tree models
  • capturing interactions between variables
  • using SAS Enterprise Miner to build decision trees
Neural Networks
  • origins of neural networks
  • neural networks compared with regression
  • the algorithms used to train neural networks
  • data preparation requirements for neural networks
  • picking appropriate inputs for neural networks
  • creating neural network models using SAS Enterprise Miner
Memory-Based Reasoning (MBR)
  • similarity and distance
  • distance metrics appropriate for different kinds of data
  • the role of the training set in MBR
  • combining the votes of several neighbors
  • other K-nearest neighbor techniques
  • collaborative filtering
  • using the SAS Enterprise Miner MBR node
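
To make the idea of combining the votes of several neighbors concrete, here is a small Python sketch of k-nearest-neighbor classification. It is an assumption-laden illustration: Euclidean distance, a simple majority vote, and the toy records and labels ("stay", "leave") are all invented for the example and do not describe how the SAS Enterprise Miner MBR node works internally.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two numeric vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(training, query, k=3):
    """Label a query point by majority vote among its k nearest training records."""
    neighbors = sorted(training, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training set: (feature vector, known outcome)
training = [
    ((1.0, 1.0), "stay"),
    ((1.2, 0.8), "stay"),
    ((0.9, 1.1), "stay"),
    ((5.0, 5.0), "leave"),
    ((5.5, 4.5), "leave"),
]

print(knn_classify(training, (1.1, 1.0)))
print(knn_classify(training, (5.2, 4.9)))
```

In practice the inputs would usually be standardized first, since a variable measured on a large scale would otherwise dominate the distance.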
Clustering
  • more on similarity and distance
  • the K-means algorithm
  • divisive clustering
  • agglomerative clustering
  • data preparation for clustering
  • interpreting clusters
  • finding clusters with SAS Enterprise Miner
Survival Analysis
  • origins of survival analysis
  • how business data is different from clinical data
  • hazards and hazard charts
  • retention curves and survival curves
  • calculating survival from retention
  • calculating hazards empirically
  • parametric hazard models
  • censoring
  • competing risks
  • survival-based forecasting
  • using SAS code in SAS Enterprise Miner to create survival curves
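
The relationship between empirical hazards, censoring, and survival curves can be sketched in a few lines of Python. This is only an illustration under stated assumptions: tenure is measured in whole periods, each customer is a hypothetical `(tenure, stopped)` pair where `stopped=False` means the customer is still active (censored), and the function names are invented for the example. The course itself builds survival curves with SAS code.

```python
def empirical_hazards(customers, max_tenure):
    """Hazard at tenure t = stops at t / customers still at risk at t.

    Censored customers count in the at-risk population up to their
    observed tenure but never count as stops.
    """
    hazards = []
    for t in range(max_tenure + 1):
        at_risk = sum(1 for tenure, _ in customers if tenure >= t)
        stops = sum(1 for tenure, stopped in customers
                    if tenure == t and stopped)
        hazards.append(stops / at_risk if at_risk else 0.0)
    return hazards

def survival_curve(hazards):
    """Survival starts at 1.0 and is multiplied by (1 - hazard) each period."""
    survival = [1.0]
    for h in hazards:
        survival.append(survival[-1] * (1.0 - h))
    return survival

# Hypothetical data: (tenure in months, True if the customer stopped)
customers = [(1, True), (2, True), (2, False), (3, True), (3, False)]

hazards = empirical_hazards(customers, max_tenure=3)
survival = survival_curve(hazards)
print(hazards)
print(survival)
```

Because each hazard is computed only from customers who reached that tenure, censored customers contribute information without being treated as stops.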
Miscellaneous Techniques
  • link analysis
  • genetic algorithms
  • association rules
  • using SAS Enterprise Miner to discover associations in retail data
Putting Data Mining Techniques to Work
  • formulating the business problem as a data mining problem
  • finding the tool that fits the problem

Software Addressed
This course addresses the following software product(s): SAS Enterprise Miner.
