Data Mining Techniques: Theory and Practice
Duration: 3.0 days
Explore the inner workings of data mining techniques and how to make them work for you. Students are taken through all the steps of a data mining project, beginning with problem definition and data selection, and continuing through data exploration, data transformation, sampling, partitioning, modeling, and assessment.
Learn how to
- use a data mining methodology
- build and use decision trees and neural networks for modeling and scoring
- use survival analysis and create survival curves.
Who should attend: Business analysts, their managers, and statisticians
Prerequisites
No prior knowledge of statistical or data mining tools is required.
Course Contents
Introduction to Data Mining
- what is data mining?
- directed and undirected data mining
- models
- profiling and prediction
- why have a methodology?
- how data miners can inadvertently learn things that are not true
- translating business problems into data mining problems
- the importance of model stability
- finding the right input variables
- sampling to create balanced model sets
- partitioning to create training, validation, and test sets
- data preparation
- model assessment
- developing intuition about data
- data structure
- data types
- data values
- exploring distributions
- summary statistics
- histograms
- using SAS Enterprise Miner for data exploration
- the null hypothesis
- statistical significance
- confidence bounds
- variance and standard deviation
- standardized values
- correlation
- linear regression
- logistic regression
- using SAS Enterprise Miner to build regression models
- decision trees as data exploration and classification tools
- decision trees for modeling and scoring
- decision trees for variable selection
- alternate representations of decision trees
- algorithms used to build decision trees
- splitting criteria
- recognizing instability and overfitting in decision tree models
- capturing interactions between variables
- using SAS Enterprise Miner to build decision trees
- origins of neural networks
- neural networks compared with regression
- the algorithms used to train neural networks
- data preparation requirements for neural networks
- picking appropriate inputs for neural networks
- creating neural network models using SAS Enterprise Miner
- similarity and distance
- distance metrics appropriate for different kinds of data
- the role of the training set in MBR
- combining the votes of several neighbors
- other K-nearest neighbor techniques
- collaborative filtering
- using the SAS Enterprise Miner MBR node
- more on similarity and distance
- the K-means algorithm
- divisive clustering
- agglomerative clustering
- data preparation for clustering
- interpreting clusters
- finding clusters with SAS Enterprise Miner
- origins of survival analysis
- how business data is different from clinical data
- hazards and hazard charts
- retention curves and survival curves
- calculating survival from retention
- calculating hazards empirically
- parametric hazard models
- censoring
- competing risks
- survival-based forecasting
- using SAS code in SAS Enterprise Miner to create survival curves
- link analysis
- genetic algorithms
- association rules
- using SAS Enterprise Miner to discover associations in retail data
- formulating the business problem as a data mining problem
- finding the tool that fits the problem

