Abstracts

This page is updated weekly. Please check back frequently for the latest information.
Keynote Speakers
Frontiers in Data Mining: Emerging Trends, Challenges and Applications
Bart Baesens, Katholieke Universiteit Leuven (Belgium) & University of Southampton (United Kingdom)
Over the past few years, data mining has grown from a relatively unknown discipline into a widespread billion-dollar business. First adopted only in the retail and banking sectors, it has since spread to many application domains, for instance e-business, terrorism prevention, RFID, software engineering, pharmaceutics, and bio-informatics. In this presentation, we first briefly present a selection of exciting new data mining techniques (e.g. Bayesian networks, multirelational mining) that have substantial potential for improving (strategic) business processes. We then discuss some key challenges in implementing data mining models successfully in business, e.g. improving data quality, model interpretability, model backtesting and model stress testing. We conclude by covering some recent and visionary application domains and explain how data mining can contribute to increased efficiency in these fields.
Getting a Seat at the Boardroom Table
Dr. Niall M. Fraser, Open Options Corporation
In large organizations, the direct use of quantitative information is often limited to relatively routine operational decisions. High-level strategic issues, on the other hand, are commonly decided primarily by the informed experience and intuition of corporate leadership. There is clearly a hierarchy - data professionals are the suppliers of data to the people who actually make the decisions.

This can be a bad idea. Experience and intuition have limitations that data can supplement, and vice versa. If a deep understanding of data does not directly inform corporate strategic decisions, human error is more likely. Personality and politics can intervene, for example, or things can just be too complex for the mind to deal with effectively. When a firm's fundamental strategic issues are involved, such errors can be disastrous.

This presentation describes a novel technology used at Open Options Corporation to bring the analytic function to the boardroom in a very effective way. The particular approach described is specific to strategic issues where outside forces, such as competitors or regulators, are particularly important. However, it has been very successful at making data professionals a key part of high-level strategic decision making. Lessons learned from the use of this technique by Open Options provide a model for achieving similar benefits in other application areas.
Mining Industrial Data using Latent Variable Methods
John MacGregor, McMaster University, Canada
Latent Variable methods based on Principal Component Analysis (PCA) and Projection to Latent structures (PLS) have been the main approaches used for mining information from databases in the process industries. This talk will look at the theoretical justification for the use of these methods and present recent applications illustrating their use on a variety of problems from diverse industries. In particular, the talk will consider the use of plant data for process analysis, monitoring, and optimization, the extraction of information from digital images for on-line process and product quality control, and the use of diverse industrial databases for the rapid development of new products.
Managing Business Complexity with Agent-Based Modeling and Simulation
Michael J. North, Argonne National Laboratory
Agent-based modeling and simulation (ABMS) is a recent approach to modeling systems composed of interacting autonomous agents. ABMS is already having far-reaching effects on the way that governments and businesses use computers to support decision-making. Computational advances have made possible a growing number of agent-based applications in a variety of fields at ever-increasing scales. Applications range from modeling supply chains and logistics systems, to predicting the spread of epidemics and the diffusion of public information, to identifying factors in the fall of ancient civilizations and understanding contemporary urban conflict, to name a few. This tutorial, based on North and Macal's "Managing Business Complexity: Discovering Strategic Solutions with Agent-Based Modeling and Simulation" (Oxford 2007), describes the foundations of ABMS; identifies software toolkits, methods, and approaches for developing agent models, from spreadsheets to enterprise-scale computer systems; and discusses the relationship between ABMS and traditional modeling techniques, emphasizing the added value that ABMS provides, along with special challenges pertaining to data and model validation.
An Introduction to Case-Based Reasoning with Special Emphasis to Image Mining Tasks in Biomedical Applications
Petra Perner, Institute of Computer Vision and applied Computer Sciences, IBaI, Germany
Case-Based Reasoning (CBR) solves problems using already stored knowledge and captures new knowledge, making it immediately available for solving the next problem. Therefore, case-based reasoning can be seen as a method for problem solving and also as a method for capturing new experiences and making them immediately available for problem solving. It can be seen as a learning and knowledge-discovery approach.

CBR can collect samples in a defined way as well as learn more generalized knowledge in the form of higher-order constructs among the samples, prototypes and structures. As such, it fills the gap between generalizing data mining methods, such as decision trees, statistical models, and rule induction methods, and similarity-based data mining methods, such as nearest neighbour classifiers.

The success of CBR systems has been shown in many applications, among them signal/image processing and interpretation tasks, help-desk applications, medical applications and E-commerce product-selling systems. In this talk we will explain the case-based reasoning and case mining process. We will show what kinds of methods are necessary to provide all the functions for such a computer model. We will develop the bridge between CBR and Statistics. Examples will be given based on image mining tasks in biomedical applications such as high-content analysis of cellular assays, novelty detection, and meta-learning for parameter selection in signal processing.
Reinventing Customer Relationships: Using Analytics to Capitalize on Insight, Intimacy, and Loyalty to Drive Growth
Daniel Thorpe, Wachovia
Everyone talks about creating an appealing "customer experience," but what do banks have to do to go beyond buzzwords and effectively create an integrated customer experience that leads to retention, revenue growth and improved profitability? What roles do analytics and targeting play in generating customer insight, customer value and customer intimacy? Several examples and a framework using analytics to understand the links between customers and values will be discussed.
Session Speakers
Performance Analytics on the Maturity Curve
Michele Boulanger, JISC Consulting
Performance management systems have, until recently, focused on tracking and reporting selected key performance indices (KPIs), lead indicators, and metrics in such a manner as to allow improvement activities based on the data being captured. But the large number of metrics, and the irrelevant nature of some of them, have led to conflicts and distraction of efforts resulting in overall poor prediction capability. We'll explore the role that advanced analytics plays in resolving these issues as well as in driving accountability for performance. This in turn leads to a new set of challenges for performance management systems and analytics and the people who deploy and manage them.
A Field Guide to Text Mining: An overview of the people, tools, and research frontiers that are unlocking the predictive power of text
Zach Buckner, Elder Research, Inc.
Topics will include an overview of text mining, current and emerging software tools, application ideas, and references for learning more about text mining. The talk will highlight the theoretical concepts that underpin today's text mining algorithms, without focusing on equation specifics. The talk will also include case studies of several past and present text mining projects at Elder Research, Inc., and include vignettes of several commercial and in-house tools developed for text mining.
Panel discussion: Teaching Data Mining to Graduate Students and Professionals via online and offline delivery methods: Opportunities and Challenges
Dr. Goutam Chakraborty, Oklahoma State University; Dr. J. Michael Hardin, University of Alabama; Dr. Morgan C. Wang, University of Central Florida
Data mining skills to analyze and interpret large and complex data for making good decisions have always been an important consideration for business professionals. With the explosion of data availability and emergence of highly sophisticated analysis software and algorithms, this aspect has become even more important in the last decade. However, many challenges arise in successfully teaching today's graduate business students and business professionals such quantitative skills, whether the classes are taught in face-to-face sessions or via online delivery methods. This panel discussion session has three experts, each of whom has founded and directed a successful graduate data mining program at a large US university. In addition, these experts have been teaching data mining to graduate students (both in business and statistics) as well as industry professionals for many years. These experts will share their collective knowledge about how to teach data mining tools/techniques/concepts to full-time business graduate students who take courses via traditional delivery methods (such as labs and classrooms) as well as part-time students (business professionals) who take courses via online delivery methods. The audience will be encouraged to ask specific questions of panel members as well as share their own experiences.
Sales Forecasting Using Google Searches
Michael Cavaretta, Ford's Research and Innovation Center
Google dominates search on the internet, with estimates of up to 60% of all searches. Customers use internet search to find information on products, compare different products, and locate online or brick-and-mortar stores. This is particularly true for expensive and/or durable goods. As more and more people make the internet their primary source for product information, customers' online searches should correlate with their purchase behavior.
A Bag of Tricks for Your Balancing Act: How to Increase Predictive Accuracy on Imbalanced Datasets
Sven F. Crone, Lancaster University Management School, UK
Data Mining methods and procedures are routinely employed in business, but often neglect the specific properties of the dataset. For many corporate applications the actual class of interest, e.g. those responding to a direct mailing or defaulting on a loan, is often an underrepresented minority, which should be either targeted or avoided to ensure profitability. But how important is the data in the majority class of lesser interest? Is it required at all, or can we discard parts of it? And if so, is there some 'golden ratio' of negative to positive examples?

A variety of simple sampling strategies are now available to under- or over-sample the existing data. This presentation will demonstrate how different approaches to data sampling can enhance or impair predictive accuracy, using case studies of database marketing and direct mailing, customer credit scoring, and predicting internet shopping adoption to distinguish among online shoppers, browsers and offline shoppers.
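As a rough illustration of the under-sampling idea mentioned above, the following minimal Python sketch keeps every minority-class record and a random subset of the majority class. It is not the presenter's procedure; the function name `undersample`, the `ratio` parameter (negatives kept per positive), and the toy data are all assumptions made for this example.

```python
import random


def undersample(rows, label_key, positive_label, ratio, seed=0):
    """Keep all positives and a random subset of negatives.

    `ratio` is a hypothetical parameter: the number of negative
    examples to keep for each positive example.
    """
    rng = random.Random(seed)
    positives = [r for r in rows if r[label_key] == positive_label]
    negatives = [r for r in rows if r[label_key] != positive_label]
    # Never ask for more negatives than exist.
    n_keep = min(len(negatives), int(ratio * len(positives)))
    sampled = positives + rng.sample(negatives, n_keep)
    rng.shuffle(sampled)
    return sampled


# Made-up data: 5 responders among 100 mailed customers.
data = [{"id": i, "responded": 1 if i < 5 else 0} for i in range(100)]
balanced = undersample(data, "responded", 1, ratio=2)
print(len(balanced), sum(r["responded"] for r in balanced))
```

Varying `ratio` is one simple way to explore whether some proportion of negative to positive examples works better than others on a given dataset.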
Offset Techniques in Predictive Modeling for Insurance
Matthew Flynn, ISO and Jun Yan, Deloitte Consulting
In predictive modeling, "offset" is a technique frequently used in data architecture, modeling architecture and model setups for data mining. Intuitively, offset is a simple method used to run a model against the residual of a set of given factors. In this presentation, we will first display some generic applications of offset and corresponding SAS code using Proc GENMOD and Proc NLMIXED. The examples include modeling count instead of modeling ratio, and partially offsetting coefficients for certain variables. Next, we use personal auto pricing as an example to show several variations of "offset" in real modeling practice, where we will discuss how "offset" could be applied to exposure adjustment, sequential modeling and cross coverage tier construction. A GLM with Tweedie Compound Poisson distribution will be used in some of the examples, where we will show how to code a Tweedie model in SAS Proc NLMIXED.
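To make "modeling count instead of modeling ratio" concrete, here is a minimal Python sketch (not the presenters' SAS code) of the simplest possible case: an intercept-only Poisson model with a log link and log(exposure) as the offset. The data values and the function name are invented for illustration. Under these assumptions the fitted intercept is the log of total claims divided by total exposure, so the count model with an offset recovers the claim frequency.

```python
import math


def fit_poisson_intercept_with_offset(counts, exposures, iterations=25):
    """Newton-Raphson fit of log(mu_i) = b + log(exposure_i).

    The offset log(exposure_i) enters the linear predictor with its
    coefficient fixed at 1, so only the intercept b is estimated.
    """
    offsets = [math.log(e) for e in exposures]
    b = 0.0
    for _ in range(iterations):
        mus = [math.exp(b + off) for off in offsets]
        score = sum(y - mu for y, mu in zip(counts, mus))  # d logL / db
        information = sum(mus)                             # -d2 logL / db2
        b += score / information
    return b


# Made-up policies: claim counts and exposure in policy-years.
counts = [2, 0, 1, 3]
exposures = [1.0, 0.5, 0.25, 2.0]
intercept = fit_poisson_intercept_with_offset(counts, exposures)
print(math.exp(intercept), sum(counts) / sum(exposures))
```

Both printed values agree: the exponentiated intercept equals the exposure-weighted claim frequency.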
Modeling and Optimization of Marketing Campaigns
Viterbo H. Berberena González, Anáhuac University (Mexico City) and Jaime Paredes Sánchez, Santander Bank in Mexico
This paper describes a methodology for modeling and optimization of marketing campaigns in a very general form. The design of experiments plays a very important role as a quantitative research tool. Some quantitative techniques for marketing research, although not new, are used in a novel way in a modeling and optimization methodology which complements the procedures of marketing research widely required nowadays. This methodology is a process to understand the market and also to identify issues that may arise in the design of experiments, in a pilot campaign, and in the launch of "full blown" campaigns. The methodology is now broadly used at Santander Bank in Mexico. Research to improve this methodology is in progress, as part of a bigger effort in the analytical intelligence area of the bank.
Applying Seasonality to Evaluate Trends in Automotive Financing
Gene Grabowski, Ford Motor Credit Company
In the highly competitive automotive industry, monthly sales figures are scrutinized to evaluate market conditions and detect signals for future performance. To expand sales, automakers rely on financial subsidiaries to promote vehicle loans and leases. To maximize the benefit to the automaker, there is a growing desire to monitor and forecast key operational trends such as contract originations and collections. By better understanding these key trends, staffing levels can now be adjusted to meet changing market conditions. A critical first step of this process is the ability to gauge the impact of seasonal fluctuations.

This session will provide an overview of the analytical methods Ford Credit uses to evaluate seasonality in its contract originations and collections workload. By combining theoretical methods and actual SAS code, it will demonstrate how seasonal adjustments can be utilized to provide concrete business solutions for forecasting and planning. At each point, the discussion will emphasize how SAS plays a critical role in addressing these significant analytical challenges.
Empowering the Enterprise with Data Mining-Based Solutions
Richard Hale and Harry Seifert, IBM Corporation
Data mining has long been viewed as an ad hoc capability to apply highly complex techniques and methodologies to solve valuable business problems. IBM and SAS have teamed for more than 30 years in helping customers enhance their data mining environments by leveraging SAS applications and IBM technology infrastructure to extend their reach into the operational processes of the enterprise. This session will introduce concepts on how business analysts can selectively apply complex, predictive models created for them by statisticians and create simple, discovery-based models using their normal workbench interface. We will also discuss technologies and best practices to enhance SAS Enterprise Miner installations, automatically apply Enterprise Miner models in DB2 environments, and extend application of the models to business intelligence reporting tool interfaces.
Support Vector Machines: The New Kid on the Block
Elsa Jordaan, The Dow Chemical Company
Support Vector Machines (SVM) are among the latest nonlinear modelling techniques to come from the computational intelligence community. They are widely used for classification and text mining problems, but they can also be applied to regression problems. By design, SVM are able to model sparse data sets as well as rank-deficient data sets ("fat" data). This last aspect makes them particularly interesting for analyzing marketing and medical data, where the number of features is typically much higher than the number of observations. Another capability of SVM, which is of particular interest in the financial sector, is their ability to detect outliers and anomalies.
Two Case Studies in Fraud Detection
Jin-Whan Jung, Jay King, and Sanjay Arangala, SAS
Fraud is known as a very difficult problem to solve due to its rarity and frequent changes in methods. Though rare, damage from fraud, when committed, can be significant both financially and in terms of corporate reputation and customer relations. Thus, fraud detection and prevention is an increasingly important objective across all industries. In a predictive modeling setting, approaches based on historical behavior tend to remain stable until the fraudsters change their strategy. Because new types of fraudulent activity are constantly emerging, these models tend to decay rapidly and must also evolve. Catching these unusual and rapidly evolving behavior patterns has been especially challenging. Typical approaches with unknown historical information involve univariate and multivariate outlier detection techniques. Two examples using such techniques are introduced here with their business issues.
Data Mining at Chrysler
Thomas L. Kondrat, Chrysler LLC
Data mining technology has been used to solve a wide variety of business problems in many business domains at Chrysler. With the continuing expansion of brands and nameplates available to the automotive consumer, the automotive industry is extremely competitive and manufacturing and marketing processes have become increasingly complex. Consequently, we are relying more heavily on automated methods and analytics than ever before. Therefore, one critical key to corporate efficiency and effectiveness is the ability to both automate and optimize decision making (using analytics) to make the best use of the limited human resources available.
Building Symbolic Regression Models: An Industrial Experience
Arthur Kordon
Symbolic regression involves finding both the functional form and the numeric coefficients of a mathematical expression. It is one of the key application areas of Genetic Programming (GP) with great potential for effective empirical model development.

The presentation will focus on the industrial experience of applying symbolic regression for solving various real-world problems in the chemical industry. First, the competitive advantages of symbolic regression modeling, generated by GP, will be discussed. Second, an integrated methodology that explores the synergy between support vector machines, GP and statistics for effective symbolic regression modeling, will be presented. Third, the methodology will be illustrated with several industrial applications in The Dow Chemical Company.
Data Mining to Help Determine Which Orthodontic Patients are Appropriate to Treat and Which are Better to Refer to Specialists
Larry Lai & Eric Kuo, Align Technology, Inc.
The question of how to maximize treatment success for new customers using an innovative custom-manufactured dental product is a challenge for this medical device manufacturing company. By mining the wealth of digital orthodontics data about previously manufactured treatment devices, the company is better able to lay out effective product strategies that help guide the doctor in deciding, given his/her skill set, whether to treat a prospective patient using the device or refer the patient to a more experienced dental specialist. The same approach can also be used to provide information to the clinical education, product sales, and product support teams to improve product training for customers. This strategy can improve treatment outcome quality, boost the doctor's clinical confidence in the dental product, and potentially improve the new customer retention rate.
Net Lift Prediction Models: How to maximize marketing impact and what data miners can learn from presidential campaigns
Kim Larsen, Charles Schwab & Co.
The true effectiveness of a marketing campaign is measured by quantifying the incremental impact, that is, the additional revenue directly attributable to the campaign that would not have been earned otherwise.

Measuring incremental impact is typically done by holding out a random control group that will not receive the offer. If the clients contacted by the campaign (a.k.a. the "test group") have better post-campaign performance than the control group, the campaign has been effective and further rollouts can be considered.

In order to maximize the impact of the campaign, marketers will work with an analytical team to develop a set of targeting criteria (typically based on a predictive model) that are believed to generate the best conversion rate. Such targeting strategies typically yield impressive conversion rates for the test group, but too often the results are equally impressive for the control group. In other words, targeting criteria often find clients that are interested in the product but would have bought the product whether or not they received the promotion. In such cases, the incremental impact is insignificant and the marketing dollars could have been spent elsewhere.

The purpose of this talk is to demonstrate how to build Net Lift Models that optimize the incremental impact of marketing campaigns by maximizing the difference in conversion rates between the test and control groups. We will see how the strategy needed to solve this problem is very similar to the strategies employed by presidential candidates: Spend your valuable budget in the swing states where the undecided voters are, not on the people that already have made up their minds.

A real-life case study will be presented with examples on how to implement this in SAS.
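The quantity this abstract describes, the difference in conversion rates between test and control groups, can be illustrated with a short Python sketch. This is not the presenter's Net Lift Model, only the per-segment arithmetic it aims to maximize; the segment names, counts, and helper names are invented for the example.

```python
def net_lift_by_segment(records):
    """Return test-minus-control conversion rate for each segment.

    `records` is an iterable of (segment, group, converted) tuples,
    where group is "test" or "control".
    """
    counts = {}
    for segment, group, converted in records:
        seg = counts.setdefault(segment, {"test": [0, 0], "control": [0, 0]})
        seg[group][0] += 1               # clients in the group
        seg[group][1] += int(converted)  # conversions in the group
    lifts = {}
    for segment, groups in counts.items():
        (n_test, c_test), (n_ctrl, c_ctrl) = groups["test"], groups["control"]
        if n_test and n_ctrl:
            lifts[segment] = c_test / n_test - c_ctrl / n_ctrl
    return lifts


def make_group(segment, group, n, conversions):
    """Build n made-up client records, the first `conversions` converting."""
    return [(segment, group, i < conversions) for i in range(n)]


records = (
    make_group("loyal", "test", 100, 30) + make_group("loyal", "control", 100, 29)
    + make_group("swing", "test", 100, 15) + make_group("swing", "control", 100, 5)
)
lifts = net_lift_by_segment(records)
print(lifts)
```

In this made-up data the "loyal" segment converts at a high rate with or without the offer, so its net lift is about one percentage point, while the "swing" segment has a lower raw conversion rate but a net lift of about ten points.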
Using PSI to Monitor Predictive Model Stability in the Database Marketing Industry
Joe Laskos, Genworth Financial, & Shihong Li, ChoicePoint
Predictive models help streamline the decision-making process by improving the quality and consistency of decisions being made. In order to achieve maximum effectiveness and ensure maximum profitability for the client, front-end reports must be put in place to track model stability through the model's entire life cycle. Population Stability Index (PSI) applications can be developed to serve this important business need for database marketing customers. Population Stability Indices are calculated and monitored using a methodology known as "Entropy". The PSI application is a tool for creating front-end reports that track model stability. PSI is widely used as a stability metric and has proved to be very effective. Moreover, by incorporating model metadata management in database marketing environments, PSI provides great flexibility in creating time-series-based front-end reporting that can leverage dynamic model attribute metadata tables to simplify new model implementation and old model retirement. The PSI application helps proactively inform customers of changes in data that may affect the performance of their predictive models and allows them to make well-informed preemptive adjustments if needed. PSI reporting allows clients to dynamically adjust marketing strategies and quickly react to in-market changes, thus providing them with a valuable edge against competitors who have not implemented an effective application to measure model stability.

Benefits:
  1. Creates standardized front-end reports to monitor predictive model stability;
  2. Utilizes the PSI methodology which has been proved very effective in detecting data shifts;
  3. When utilizing a metadata model, PSI provides the ability for time series comparison reports and simplifies new model implementation and old model retirement;
  4. Updates regularly to provide dynamic monitoring;
  5. Maximizes model effectiveness and enables maximum client profitability;
  6. Adds a competitive edge over analytic modeling services alone.
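For readers unfamiliar with the metric, the Population Stability Index is commonly computed as the sum over score bins of (actual share - expected share) * ln(actual share / expected share). The Python sketch below shows that standard formula only. The bin counts are made up, the `floor` guard against empty bins is an assumption of this example, and nothing here reproduces the presenters' reporting application.

```python
import math


def population_stability_index(expected_counts, actual_counts, floor=1e-6):
    """PSI between a baseline and a current distribution over the same bins.

    `floor` is a hypothetical guard that keeps empty bins from
    producing log(0) or division by zero.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        e_share = max(expected / expected_total, floor)
        a_share = max(actual / actual_total, floor)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi


# Made-up score-band counts at model development time and today.
development = [100, 200, 300, 400]
unchanged = [10, 20, 30, 40]       # same shape, different volume
shifted = [400, 300, 200, 100]     # population has moved to low bands

print(population_stability_index(development, unchanged))
print(population_stability_index(development, shifted))
```

Identical bin shares give a PSI of zero regardless of volume, and the index grows as the current population drifts away from the development population.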
Behavior-Based Predictive Models – A New Framework of Predictive Models
Wensui Liu, Chase Credit Card Service, & Jimmy Cela, ChoicePoint Precision Marketing
Modelers have traditionally used logistic regression, decision trees, or other data mining techniques to develop discriminant marketing models which classify populations of interest into segments. The aim is to calculate the probability-based score for each individual and predict his/her likelihood to respond to a marketing campaign offer, file an insurance claim, or default on a credit payment. This type of modeling strategy is focused on the choice of an individual without consideration of the consequent behaviors.

However, in the real world, what financially impacts companies the most is not the choice to respond itself but the behaviors associated with that choice. Specifically, it is the frequency and the severity of such behaviors that result in either profits or losses to companies.

In this presentation, a new framework of predictive models will be introduced which simultaneously estimates both the probability of a specific choice (response, claim, or default) and the conditional probability of consequent behaviors (number of responses, claims, or defaults). Models to be discussed include Hurdle Model (Mullahy, 1986), Zero-Inflated Poisson Model (Lambert, 1992), and Latent Class Poisson Model (Wedel, 1993). Modeling strategy using SAS and various statistical tests for model selection will be illustrated to show the audience, from a marketing, insurance, and banking perspective, how to implement this new concept in data mining to predict customers' behaviors of interest.
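Two of the count models named above can be written down compactly. The Python sketch below gives the textbook probability mass functions of the zero-inflated Poisson and the Poisson hurdle model, with fixed, made-up parameter values rather than fitted ones. It is meant only to show how each model separates the "whether" from the "how many"; it is not the presenters' SAS implementation, and the function and parameter names are assumptions.

```python
import math


def poisson_pmf(k, lam):
    """Ordinary Poisson probability of observing k events."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, pi_zero, lam):
    """Zero-inflated Poisson: a point mass at zero mixed with a Poisson.

    With probability pi_zero the count is a structural zero; otherwise
    it is drawn from Poisson(lam), which can also yield zero.
    """
    base = (1.0 - pi_zero) * poisson_pmf(k, lam)
    return pi_zero + base if k == 0 else base


def hurdle_pmf(k, p_zero, lam):
    """Poisson hurdle: zero with probability p_zero; positive counts
    follow a Poisson(lam) truncated at zero."""
    if k == 0:
        return p_zero
    return (1.0 - p_zero) * poisson_pmf(k, lam) / (1.0 - math.exp(-lam))


# Made-up parameters: 30% non-responders, about 2 events per responder.
for k in range(4):
    print(k, round(zip_pmf(k, 0.3, 2.0), 4), round(hurdle_pmf(k, 0.3, 2.0), 4))
```

In the hurdle model every zero comes from the choice stage, whereas the zero-inflated model lets zeros arise from either stage, which is the main practical difference when interpreting the fitted probabilities.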
Beating the Spread: Predicting game outcomes with a new ranking model
Carl Meyer and Anjela Govan, North Carolina State University
The Offense-Defense Model (ODM) is an effective, iterative ranking model that is partly inspired by Jon Kleinberg's 1998 HITS algorithm. The original HITS method is used for ranking web pages and is a part of the Ask.com search engine. We use ODM to generate rating scores for competitive team sports, which in turn we can use to assign ranks to the teams. The sports team ratings generated by ODM can also be utilized as a basis for predicting the outcomes of upcoming games between the teams. We use SAS and IML code for calculating these predictions. Finally, we compare our results with predictions based on other popular sports ranking models.
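The abstract does not spell out the iteration, so the Python sketch below should be read as an assumption-laden illustration rather than the authors' SAS/IML code. It follows one common description of an offense-defense scheme: a team's offensive rating is the sum of points it scored, each weighted by the reciprocal of the opponent's defensive rating, and its defensive rating is the sum of points it conceded, each weighted by the reciprocal of the opponent's offensive rating. The matrix convention, the overall rating offense/defense, and the scores are all choices made for this example.

```python
def offense_defense_ratings(points_against, iterations=100):
    """Alternately update offensive and defensive ratings.

    points_against[i][j] = points team j scored against team i
    (an assumed convention; the diagonal is zero).
    A high offensive rating and a low defensive rating are both good.
    """
    n = len(points_against)
    offense = [1.0] * n
    defense = [1.0] * n
    for _ in range(iterations):
        # Points scored count for more against a stingy defense.
        offense = [
            sum(points_against[i][j] / defense[i] for i in range(n))
            for j in range(n)
        ]
        # Points conceded count for more against a weak offense.
        defense = [
            sum(points_against[i][j] / offense[j] for j in range(n))
            for i in range(n)
        ]
    return offense, defense


# Made-up results: team 0 beat team 1 30-10 and team 2 40-5;
# team 1 beat team 2 30-10.
scores = [
    [0, 10, 5],
    [30, 0, 10],
    [40, 30, 0],
]
offense, defense = offense_defense_ratings(scores)
ratings = [o / d for o, d in zip(offense, defense)]
print(ratings)
```

With these made-up scores the combined ratings order the teams 0, 1, 2, matching the head-to-head results. Rating differences between two teams could then serve as an input for predicting their next game.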
Managing 3rd party claims processing; Successfully utilizing analytics to optimize suspect claims detection to reduce warranty costs and improve customer service
Richard Miller, GE
The service delivery process is filled with a number of key business issues that are intermingled and that cross many functional areas. Some of the more complex processes involve the consumer experience of service delivery, commercial relationship management, and warranty administration. The ultimate goal of warranty and service contract claims administration is to minimize claims cost and maximize consumer and servicer retention.

We will explore the relationship between these processes and will take a deeper dive into how analytics can ultimately help improve all these things including the company bottom line.
Automating Human Decisions
Marc Schneiderman, Mobile Agent Technologies
As we move into the information age, most companies have already automated rote manual tasks, such as generating an invoice or posting a journal entry to the general ledger. The next great step will be the automation of human decisions. Although nature has imbued us with intuition and common sense, there are some innate limitations to our cognitive abilities which may prevent us from making good decisions. Business surveys have shown that executives regard poor decision making as a real concern that may affect bottom-line profitability. An automated decision making system can make more accurate, consistent, unbiased decisions in less time than a human. In addition, the decision will always follow long-term corporate strategy and policy, as well as government regulations. The technologies used to build automated decision making systems include a business rules engine and data mining toolkit. First generation systems have been reasonably successful at automating high volume "operational" decisions, but there are some limitations which can be addressed by the addition of facilities to support common sense reasoning.
A Portfolio Approach to Segmentation in the Automotive Industry
Will Neafsey, Ford Motor Company
In today's complex and competitive Automotive Industry, a single segmentation can no longer serve all the needs of product, marketing, and communications. Additionally, attempting to cluster consumers in a single model can often lead to vague results with segments that are difficult to interpret or take action upon. This session will discuss a variety of types of segmentation, from complex to very simple, that can be assembled to provide the business with clear and actionable results. Specific automotive examples will be discussed.
Data Mining Challenges in Health Information
Matthew D. Rotelli, Eli Lilly and Company
The rising cost of healthcare is a major political issue, and the quality of healthcare is a major social issue. Currently, large amounts of data are available from clinical trials and medical claims databases. In addition, recent advances in technology are enabling the emergence of electronic health records. This will further feed the explosion of data available to assess the effectiveness, cost, and outcomes associated with treatment paradigms. Regulatory agencies, pharmaceutical companies, payers, and providers are eager to leverage this information to improve patient outcomes, reduce costs, and reduce medical errors. This is creating many opportunities to apply data mining techniques. However, many challenges exist to ensure the use of appropriate methodologies and accurate interpretation of results. These will be highlighted to encourage further research into this rapidly growing and important arena for data mining applications.
Tailoring the Use of SAS Enterprise Miner™
Sascha Schubert, SAS
A growing number of SAS users with different goals and skill levels need access to data mining functionality. The new generation of SAS® Enterprise Miner™ 5 and the SAS stored process facility provide an easy way to tailor data mining functionality to the user's needs. The flexible architecture of SAS Enterprise Miner and the integration of Enterprise Miner into the SAS Enterprise Intelligence architecture allow the product users to create data mining projects for interactive or batch execution and share projects with other users. The software's Extension facility allows users to build specific functions that are fully integrated into the Enterprise Miner workbench. Based on the integrated batch processing capabilities, model training and model scoring code can be easily extracted from SAS Enterprise Miner and integrated into the SAS Enterprise Intelligence Platform using the SAS stored process facility. For users of the SAS Add-In for Microsoft Office, customized data mining interfaces can be integrated into their favorite Microsoft applications.
Solving Industrial Problems in the Chemical Industry Using Chemometrics
Mary Beth Seasholtz, Randy J. Pell, Pat Wiegand, Enric Comas, Leo Chiang, The Dow Chemical Company
Chemometrics is the application of mathematical tools for solving problems in research and manufacturing in the chemical industry. These methods include multivariate techniques such as principal component analysis and partial least squares. Chemometrics methods have been used at The Dow Chemical Company for more than 18 years. Improvements in the predictive performance of these methods are possible using objective sample and variable selection strategies. Presented in this talk are applications of robust statistical techniques for selecting a consistent set of samples as well as a genetic algorithm for selecting appropriate variables for multivariate models. A method for simultaneous sample/variable selection will also be discussed.
The Importance of Valid Models in Data Mining
Simon Sheather and Mike Speed, Texas A&M University
It has been stated that, because of vast amounts of data, the data miner does not have the luxury of verifying assumptions and hence checking for model validity. This paper uses the Dominick's Finer Foods database (James M. Kilts Center, GSB, University of Chicago) to investigate the implications of not verifying assumptions and hence the validity of models. This database consists of same-store sales of 58 stores over 268 weeks. This paper demonstrates that ensuring that a valid model has been fit improves predictions, sometimes dramatically. In the real example considered in this talk, a valid model, which takes account of autocorrelation in the data, reduces prediction error in the validation set by 54%. Thus, the data miner may want to consider that the checking of assumptions and validity is not a luxury, but in fact a necessity.
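As a rough illustration of the kind of assumption check the abstract argues for, the sketch below computes two common diagnostics for autocorrelation in model residuals: the lag-1 sample autocorrelation and the Durbin-Watson statistic. It is not taken from the talk; the function names and the toy residual series are hypothetical, and the speakers' own analysis of the Dominick's data may have used different diagnostics.

```python
def lag1_autocorrelation(residuals):
    """Lag-1 sample autocorrelation of a residual series."""
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # constant series: no variation to correlate
    numer = sum(centered[t] * centered[t - 1] for t in range(1, n))
    return numer / denom


def durbin_watson(residuals):
    """Durbin-Watson statistic: values near 2 suggest little lag-1
    autocorrelation, values toward 0 suggest positive autocorrelation,
    and values toward 4 suggest negative autocorrelation."""
    denom = sum(r * r for r in residuals)
    if denom == 0:
        raise ValueError("residuals are all zero")
    numer = sum((residuals[t] - residuals[t - 1]) ** 2
                for t in range(1, len(residuals)))
    return numer / denom


if __name__ == "__main__":
    # Hypothetical residuals that alternate in sign (negative autocorrelation).
    toy = [1.0, -1.0, 1.0, -1.0]
    print(lag1_autocorrelation(toy), durbin_watson(toy))
```

If either diagnostic is far from its no-autocorrelation value (0 and 2 respectively), a model that assumes independent errors is suspect for weekly sales series like the one described.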
Marketing Impact Optimization Using PROC OPTMODEL
Randy Sherrod, CISCO
Econometric models are often used to help inform the allocation of corporate marketing budgets. By quantifying the impact of past marketing efforts on market performance (e.g., revenue), models make it possible to optimize the impact of marketing investment. Typically, scenario planning and optimization alternatives are calculated using the mean (not variance) of the estimated relationship between marketing efforts and market performance. This paper uses PROC OPTMODEL to illustrate how uncertainty can impact the optimization of marketing budgets by treating the econometric estimates of the relationship between marketing efforts and market performance as a random variable.
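The effect of using the variance as well as the mean can be sketched with a toy two-channel example. This is written in Python rather than PROC OPTMODEL and is not the author's model: the square-root response curve, the independence of the two coefficients, the mean-minus-risk objective, and all names and numbers are assumptions made purely for illustration.

```python
import math


def risk_adjusted_revenue(spend_a, budget, mean_a, sd_a, mean_b, sd_b,
                          risk_aversion):
    """Expected revenue minus a penalty on its standard deviation.

    Assumed (hypothetical) model: revenue = b_a*sqrt(spend_a) + b_b*sqrt(spend_b),
    where b_a and b_b are independent random coefficients with the given
    means and standard deviations.
    """
    spend_b = budget - spend_a
    expected = mean_a * math.sqrt(spend_a) + mean_b * math.sqrt(spend_b)
    # Var(b * sqrt(x)) = sd^2 * x for a fixed spend x.
    variance = sd_a ** 2 * spend_a + sd_b ** 2 * spend_b
    return expected - risk_aversion * math.sqrt(variance)


def best_split(budget, mean_a, sd_a, mean_b, sd_b, risk_aversion):
    """Grid-search the whole-unit spend on channel A that maximizes the objective."""
    return max(
        range(budget + 1),
        key=lambda s: risk_adjusted_revenue(
            s, budget, mean_a, sd_a, mean_b, sd_b, risk_aversion),
    )


if __name__ == "__main__":
    # Same mean effect for both channels, but channel A's estimate is noisier.
    print(best_split(100, 1.0, 0.5, 1.0, 0.1, risk_aversion=0.0))
    print(best_split(100, 1.0, 0.5, 1.0, 0.1, risk_aversion=1.0))
```

With no risk penalty the budget is split evenly because only the means matter; once estimation uncertainty is penalized, spend shifts toward the channel whose effect is estimated more precisely.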
Developing Customer Insights
Allen J. Thompson & Richard V. Wherry, Bank of America
A major problem for businesses, not just financial services, is to determine where to spend money and which customers to spend the money on. In order to allow the bank to more efficiently deploy resources, the GWIM Analytics group began exploring hazard models as an approach to address several key problems. At the top of that list was lifetime value. The GWIM Analytics group has applied hazard modeling to the development of lifetime value of the client base.

While exploring the use of hazard models for lifetime value, the group also explored additional problems, and it was determined that hazard modeling techniques could be applied effectively to the management of the CD portfolio. Within the bank, there are groups that focus on forecasting rates that directly impact CD deposits. However, the Client Analytics group has the unique ability to leverage customer insights and a complete relationship view to enhance pricing decisions on CDs. We put this together and utilize hazard modeling to develop a forecast of the CD portfolio.
Mathematical Professional Science Masters (PSM) Degree Programs Are Excellent Sources for BI Staff Recruiting
Phil Tuchinsky, Tuchinsky BI, LLC and Senior Research Fellow, Central Michigan University Research Corporation
Would you like to add staff to your BI / analytics group that are A) seeking business / industrial careers in quantitative and data-driven problem solving; B) well-prepared academically as BI knowledge workers and C) proven project managers, business writers and presenters? Would these candidates be even more attractive if they come with D) experience on industrially sponsored analytics projects and E) business awareness coursework that prepares them to work with your executive project champions?

The graduates of more than a dozen professional science masters (PSM) degree programs at leading American universities have all these qualifications. They are self-starting mathematical knowledge workers and problem solvers, ready to grow into your BI corporate culture. Many will develop into leaders and managers. Starting salaries fresh from graduate school are near $60,000 a year (USD).

PSM programs are terminal degrees, usually involving two years of graduate training beyond a bachelor's degree. They are career-oriented, with established ties to business and industry. With Alfred P. Sloan Foundation support, more than 70 American universities created more than 120 of these programs since 1998. Their content ranges over all the mathematical, physical and life sciences; many are interdisciplinary.

The dozen+ mathematical PSM programs, variously named M.S. in Applied Mathematics, Financial Mathematics, Industrial Mathematics, Mathematical Entrepreneurship, etc., are exceptional training grounds for business intelligence knowledge workers. This talk presents the philosophy, origin and structure of these programs, their real-world roots and natural relationship to BI work, and how to contact them all.
Alternative Paths Towards Improved Predictive Analytics for Customer Intelligence
Dirk Van den Poel, Ghent University, Belgium
This talk focuses on alternative ways to improve on your current predictive-analytics models for customer intelligence. This presentation will mainly focus on two areas of application: customer churn and cross/up-sell (also known as NPTB models, i.e. Next Product to Buy). Basically, there are two ways to enhance the predictive performance of your models: 1. Include better variables, and 2. Employ better models. Firstly, this presentation demonstrates how sequential as well as textual information (using SAS Text Miner) will enhance the predictive capabilities over and above other variables. Secondly, we demonstrate how ensemble methods, composed of numerous 'simple' models, increase the predictive power. All these approaches were rigorously tested through the peer-review process of international journals.
The IDeAL System: A Utility-based Methodology for Mining Massive Databases
Herna L. Viktor, University of Ottawa
Massive databases, which are omnipresent in domains such as computational biology, law enforcement and environmental impact studies, amongst others, bring new challenges to the data mining community. There is an urgent need for novel algorithms and solutions to assist domain experts and data mining novices to understand these vast resources. Such users require direct, transparent access to their very large-scale relational databases. Furthermore, they require the data mining exercise and its results to be evaluated using measures that are of high economic utility, in order to make informed, thought-through decisions.

This talk describes the IDeAL data mining system, designed to mine very large-scale databases directly. A scalable approach that detects patterns between multiple seemingly unrelated characteristics is described, which enables us to capture previously unavailable semantically rich information. A utility-based evaluation method is followed, in order to give decision makers access to knowledge which may otherwise have remained hidden in their massive databases. The IDeAL system is illustrated by presenting the results when exploring an Anthropometric database, as seen from the Virtual Tailoring perspective.
Predicting Loss Given Default in Retail Portfolios using SAS Enterprise Miner
Hendrik Wagner, Independent Consultant (Risk Parameters)
Predicting percentage loss rates for non-defaulted loan exposures is a required component of risk estimation in the context of Basel 2 banking regulation. Over the course of the past few years many banks have gathered the necessary historical cash flow data that serve as the basis for LGD estimation in retail portfolios. Multivariate statistical estimation methods, however, have only recently started to be employed more widely. The presentation describes some tools in SAS Enterprise Miner that are particularly suited for addressing this problem, such as two-stage modelling, regression tree modelling and generalized additive neural networks.
Practical Applications of Decision Theory in Modeling Rare Events
Doug Wielenga, SAS
Modern data mining methods enable you to develop a large number of models in a short amount of time. Implicit in this development is a decision structure which can impact all phases of the process. Using the wrong decision structure can lead to inferior model development, incorrect model selection, and inadequate model deployment. By default, most modeling methods classify each observation for a class target into the level with the highest predicted probability. This poses a problem when modeling a rare event, where it may be impossible to find a set of predictors that identify any observations where the event of interest is most likely. It is common in these situations to oversample the rare event in the training data set or to change the threshold at which the rare event is selected. SAS Enterprise Miner supports both of these approaches through the use of the Target Profiler. This paper discusses several strategies for modeling rare events based on the nature and the size of the imbalance, and provides several examples of how to use the Target Profiler to obtain the correct probabilities and make the right decision. By using these strategies, you will be able to better identify the best variables, pick the best model, and improve deployment results.
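The two ideas mentioned above, correcting probabilities after oversampling and choosing a decision by expected profit rather than by highest probability, can be sketched generically as follows. This is a textbook-style illustration in Python, not the Target Profiler's implementation; the function names, the profit values, and the 1% event rate are hypothetical.

```python
def adjust_posterior(p_sample, true_prior, sample_prior):
    """Rescale an event probability estimated on an oversampled training set
    back to the population event rate.

    p_sample     : model probability of the event on the oversampled data
    true_prior   : event rate in the population
    sample_prior : event rate in the oversampled training data
    """
    event = p_sample * true_prior / sample_prior
    nonevent = (1 - p_sample) * (1 - true_prior) / (1 - sample_prior)
    return event / (event + nonevent)


def best_decision(p_event, profit):
    """Pick the decision with the highest expected profit.

    profit maps each decision name to a pair
    (profit if the event occurs, profit if it does not).
    """
    expected = {
        decision: p_event * if_event + (1 - p_event) * if_nonevent
        for decision, (if_event, if_nonevent) in profit.items()
    }
    return max(expected, key=expected.get)


if __name__ == "__main__":
    # Hypothetical setup: 1% of cases are events, training data was balanced 50/50.
    p = adjust_posterior(0.5, true_prior=0.01, sample_prior=0.5)
    # Acting on a true event earns 100; acting on a non-event costs 1.
    profit = {"act": (100.0, -1.0), "ignore": (0.0, 0.0)}
    print(p, best_decision(p, profit))
```

Under this hypothetical profit structure, acting is worthwhile even when the corrected event probability is far below 50%, which is why a highest-probability rule can miss every rare event.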