
Speaker Abstracts

Below is a partial list of abstracts. Remaining abstracts will be posted as they are finalized. Please visit again soon.


John Brocklebank, SAS
Analytic Applications for Web Mining: Case studies using an Application Service Provider (ASP) approach
The web offers a new world of opportunity for data analysts, but within this opportunity lie challenges that must be overcome before traditional statistical techniques can be applied. The transition from raw web logs to analytic-ready data sources is the most crucial and time-consuming aspect of the information discovery process.

The ASP model offers the opportunity to provide a suite of analytic applications for extracting relevant information out of daily web logs and merging them with complementary data sources to generate a complete and accurate representation of web site activity.

This presentation will showcase analytical methods relevant to retail, pharmaceutical, and financial companies in meeting these challenges. Examples will include data cleansing and pre-processing methods, business metric forecasting and goal seeking, site traffic visualization, exception process notification, cross-sell analysis, and direct-to-consumer campaign effectiveness monitoring.


Randy Collica, Compaq
Mining Textual Data from Call Center Notes
This presentation shows how one can greatly augment a business intelligence application by including textual data in data mining. At Compaq Computer Corp., knowledge of customer input is put to use by combining the notes that Call Center telesales representatives gather and enter into Siebel. These text notes include general comments from customers, email messages to and from our customers or prospects, etc. Together with additional data from our customer data warehouse, we show that distinct clusters or groups exist in the data, hidden until now, and that SAS Data Mining Text solutions aid the process of understanding these groups. These text notes can also be used to predict the lead rating, which is the likelihood to purchase as determined by the sales representative.


David Duling, SAS
Developers are often faced with the demand from marketing for data mining products to "automatically extract patterns from large databases," the famous Black Box. However, research and development usually leads to a collection of statistical tools for large, un-designed data sets. This leads us to question the myth of the Black Box. Case studies will demonstrate why original analysis is important. Meanwhile, market analyst reports predict that data mining will increasingly be packaged into solutions for the masses, paradoxically encompassing both more sophisticated algorithms and more widespread usage. This will require development of a new breed of analytical solutions.


Arnold Goodman, University of California, Irvine
Challenges and Checklists for Data Miners, Statisticians and the Clients Who Fund Them
Discovering new knowledge from massive amounts of complex data requires that any informative patterns mined from the data be generalized into useful models to predict situations, suggest new knowledge from the predictions, and then evaluate this knowledge in that world beyond the data.
A basic challenge is for the results to work almost all (not only some) of the time, account for uncertainty outside (not only inside) the data, and add valuable knowledge to a client’s world.

Knowledge discovery rests on the three balanced legs of computer science, statistics and client knowledge: it will not stand either on one leg or on two legs, or even on three unbalanced legs.
A maturity challenge is for data miners, statisticians and clients to recognize their dependence on each other and for all of them to widen their focus until true collaboration becomes reality.

Far more effort is spent on processing data beyond what the data really support, while far less is spent on planning the data selection and evaluating knowledge before actual acceptance.
An investment challenge is to balance effort spent on analysis inside the database with effort spent on analysis outside the database in the knowledge domain, difficult though it might be.

The Brain Makers and Mind Matters document artificial intelligence going from over-promising in the early 1960s to under-performing in the early 1970s, and expert systems going from over-promising in the early 1980s to under-performing in the early 1990s, despite vestiges remaining now that computer technology is finally able to deliver on promises made by computer science.
The critical challenge for us all is to view the challenges as opportunities for our joint success.

To assist us all in responding to these challenges, checklists are presented for the major critical success factors in knowledge discovery, professional collaboration and knowledge evaluation.

The critical success areas in knowledge discovery are data mining for interesting patterns, data prediction with process models and data knowledge through new understanding. Although the first area might be accomplished without statistical thinking, the other two definitely cannot be.

Professional collaboration involves developing analysis and software, solving the client’s problem, being a solution conscience, adding value to the client’s world and evaluating results in the client’s world. Designed samples, related information, contexts of data, errors in predictions and causal networks might help in the evaluation of new knowledge outside the database, in the knowledge domain itself.


Victor Lo, Fidelity Investments
A Novel Response Modeling Approach in Database Marketing
Data mining has been used extensively in database marketing in a variety of industries. In particular, predictive models are often developed to identify individuals who are most likely to respond to campaigns where response can be defined as becoming a customer, buying a product, deepening a relationship, increasing profitability, or a combination of the above.

In this presentation, we identify the appropriate business objective based on years of business experience and then derive the corresponding mathematical objective function. We then propose a novel approach to meeting the appropriate objective. We also point out that the current approach, which is widely published in the literature and commonly used in business, is not directly designed to meet the appropriate business objective. An example using simulated data illustrates the significant benefit of the proposed approach over the current approach in campaign targeting. The proposed approach is easy to implement and can be used in conjunction with common supervised learning algorithms such as logistic regression, decision trees, spline regression, and neural networks.


Bill MacReady, BiosGroup
Optimization: Foundations to Frontiers
Optimization is the process of discovering (usually by a computer) configurations which minimize some cost function. Applications range from scheduling (manufacturing, airline timetables), to network flow (logistics, electricity and gas), to supply chain optimization. Optimization applications began improving businesses dramatically in the 1960s with the advent of operations research (OR) departments in both academia and industry, and no quantitative field has had a larger impact on improving business performance.

Through a series of concrete examples, this talk will introduce optimization methods, starting from its OR foundations and continuing to the latest state-of-the-art results in constraint programming, naturally inspired techniques (e.g., genetic algorithms, ant algorithms), and distributed multi-agent optimization. This last topic will be discussed in some detail as it promises exciting new applications.


Edward C. Malthouse, Northwestern University
Valuing Individual Customers
It is important for companies and organizations to estimate the long-term value (LTV) of their customers. This talk focuses on three key issues in estimating LTV for individual customers:
  • How to evaluate an LTV model? We identify issues that make model evaluation difficult and discuss solutions.
  • How accurately can LTV be estimated and how does the accuracy depend on (1) the length of the period over which the forecast is made and (2) the amount of information available on a customer at the time of the forecast? We present results from multiple companies in several different industries to establish accuracy baselines. We compare the accuracy of forecasts from behavioral data versus only demographic overlays.
  • What are possible modeling approaches and what are their strengths and weaknesses? LTV can be modeled directly with some form of "regression" model. It could be modeled by first estimating retention probabilities and then estimating value conditional on being retained. It could be modeled via survival analysis models. We discuss how these models compare in terms of accuracy and marketing insights. We also discuss under what business circumstances a particular approach is preferred.
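
The second modeling approach in the list above (retention probabilities first, then value conditional on being retained) can be illustrated with a minimal sketch. This is not the speaker's model: the function name `expected_ltv`, its inputs, and the optional discount rate are all assumptions made for illustration, and the per-period retention probabilities and conditional values are taken as given rather than estimated.

```python
def expected_ltv(retention_probs, value_if_retained, discount_rate=0.0):
    """Illustrative LTV: sum over periods of P(still retained) * value.

    retention_probs[t]   -- assumed probability of staying through period t+1,
                            given the customer was retained through period t
    value_if_retained[t] -- assumed value earned in period t+1 if retained
    discount_rate        -- hypothetical per-period discount rate (assumption)
    """
    survival = 1.0  # probability the customer is still retained
    total = 0.0
    periods = zip(retention_probs, value_if_retained)
    for period, (p_retain, value) in enumerate(periods, start=1):
        survival *= p_retain
        total += survival * value / (1.0 + discount_rate) ** period
    return total


# Two periods, 50% retention each period, 100 units of value per retained period.
ltv = expected_ltv([0.5, 0.5], [100.0, 100.0])
```

With no discounting, the example customer contributes 50 in expectation in the first period and 25 in the second.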

Bruce Ratner, DM STAT-1 CONSULTING
A New Method for Maximizing Customer LifeTime Value
A new method for maximizing customer lifetime value (LTV) - the CPR Model - is introduced, which simultaneously addresses two important objectives facing database marketers: maximizing response, and maximizing profit. The CPR Model balances the two objectives, seeking a single score that identifies high-LTV responders. The CPR Model is theoretically optimal, and easy to build and validate.

The CPR Model can address many real-world problems. In the telecommunications industry, one often seeks to model customers' tenure in combination with usage - identifying people who have long tenure and high usage of services. In catalogue and retail sales, models identifying potential buyers who will not return purchased goods are useful; similarly, models that identify potential responders to mailings who are also likely to buy some specific product are often sought. In the financial services industry, models that identify customers who are likely to be approved for credit and who can also be expected not to make late payments or default on loans are valuable.

The current approach for identifying high-LTV responders consists of building a logistic regression model for identifying responsive customers, and an ordinary regression for identifying high-profit customers. Then, the two model scores are multiplied as a procedure for identifying individuals who are both most likely to respond and contribute large profit. This widely used approach produces suboptimal results, and is cumbersome to perform and validate.
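
For readers unfamiliar with the conventional approach described above, a minimal sketch of its final scoring step follows. It shows only the multiplication of two already-computed scores, not the CPR Model, and it does not fit either regression; the function name `rank_by_multiplied_scores` and the sample figures are hypothetical.

```python
def rank_by_multiplied_scores(customers):
    """Rank customers by response probability times predicted profit.

    customers -- list of (customer_id, p_response, predicted_profit), where
                 p_response is assumed to come from a logistic regression and
                 predicted_profit from an ordinary regression (neither is
                 fitted here).
    """
    scored = [(cid, p * profit) for cid, p, profit in customers]
    # Highest combined score first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores for three customers.
ranking = rank_by_multiplied_scores([
    ("a", 0.1, 500.0),
    ("b", 0.5, 80.0),
    ("c", 0.3, 200.0),
])
```

In this made-up example, customer "c" ranks first even though "b" is the most likely responder and "a" has the highest predicted profit.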

We discuss the CPR Model and demonstrate its strengths in identifying high-value responders. Then, we review the logistic and ordinary regression models and demonstrate the weakness of multiplying their scores. Two real case studies are discussed to highlight the new method.


Jaideep Srivastava, University of Minnesota
Rare Event Analysis & Its Applications
'Rare events' are events that occur very infrequently - and are thus very difficult to detect. However, when they do occur, their consequences can be quite dramatic - and often in a negative sense. Examples include network intrusions and security breaches, cardiac events, credit card and other types of financial fraud, telecom circuit overloads, traffic accidents, etc. Timely detection of rare events has been of interest for quite some time. However, techniques for it have so far been largely heuristic in nature.

Recent years have seen an explosive growth in the speed and capacity of data collection and storage devices, accompanied by a significant drop in price. This has had a multiplicative effect on our data collection ability. Most organizations today collect huge quantities of data about various processes in their computer systems, be they communication, information, process control, or any other type of systems. Data is collected at various levels of abstraction, from hardware and firmware, to operating system and communication events, to database query logs and application level events. These comprehensive event logs provide a wealth of data, analysis of which has the potential to identify the rare events described earlier.

Classical statistical techniques, which have focused on detecting the major model in a dataset, do not adequately address the problem of analyzing rare events. Often, events that represent significant deviations from the norm are thrown away as outliers. Sampling techniques often completely miss data representing rare events, precluding any analysis. Recent growth in data mining techniques, especially those geared towards large numbers of attributes and huge data volumes, provides new hope for analyzing rare events.

In this talk we will introduce the idea of Rare Event Analysis. Examples will be drawn from a number of domains to illustrate the challenges faced in such analysis. Existing and emerging techniques for addressing these problems will be presented, in both supervised and unsupervised settings. The talk will conclude by pointing out issues that remain to be addressed in this challenging field.

This work is being carried out under the sponsorship of the Army High Performance Computing Research Center, which is funded by the US Army.


Gary Saarenvirta, IBM
Data Mining in Real-Time
Data Mining has evolved and become much more mainstream. Where only financial institutions and direct marketers used data mining five years ago, today we find corporations of all sizes, in all industries, using data mining to solve a wide variety of business problems. Industry standards are being developed to permit data mining to be embedded into business applications by application programmers. Real-time deployment of data mining models has been enabled by the implementation of some of these industry standards. Data Mining is becoming a commodity and in the very near future will be a completely automated process.

This presentation explores the real-time deployment of data mining through examples and case studies and discusses future data mining developments being led by IBM research.


Andrew Storey, Scotiabank and Marc Cohen, SAS
Offer Optimization - Optimizing Cross-Sell and Up-Sell Opportunities in Banking
The banking industry regularly mounts campaigns to improve customer value by offering new products to existing customers. This approach gained momentum as a result of the increasing availability of customer data and improved analysis capabilities through data mining. Even with these improvements, efficiently using resources to maximize the return on marketing investment (ROMI) remains a challenge. This problem is compounded by the increased capability to send multiple campaigns through several distribution channels over multiple time periods. The combination of alternatives creates a complicated array of possible actions. This paper presents a software solution that focuses on answering the question of which products to offer to each customer in a way that maximizes the value of contact with the customer and the ROMI. The solution goes beyond the usual greedy approach of picking the customers that have the largest expected value for a particular product because it maximizes return while also accounting for limited resources and multiple sequential campaigns. Although a retail banking example is presented, the approach is transferable to numerous other industries. The developed solution uses the SAS/STAT®, SAS/OR® and base SAS® products and is operating system independent. This solution is intended for an audience with a medium skill level in SAS.

What participants say about the M-series:

"The educational content, exchange of ideas, and intellectual environment I found at the conference exceeded my expectations and confirmed SAS' place as the premier data mining conference in the world."

   Thad Perry, Ph.D.
   Senior Director
   Infomatics


~~~~~~~~~

"Right time. Right place. Right content."

   Thomas Brauch
   Vice President
   Consumer eCommerce


~~~~~~~~~

"This was a superb environment - one of the smartest conference venues I have experienced (and I have experienced a lot). The talks went into greater depth than the talks at many such meetings. Many of the talks were particularly valuable in shedding light on different application areas of data mining."

   David Hand
   Professor/Head of Statistics
   Imperial College, London