Speaker Abstracts
Below is a partial list of abstracts. Remaining abstracts will be posted
as they are finalized. Please visit again soon.
John Brocklebank, SAS
Analytic Applications for Web Mining: Case studies using an Application Service Provider (ASP) approach
The web offers a new world of opportunity for data analysts, but within
this opportunity lie challenges that must be overcome before traditional
statistical techniques can be applied. The transition from raw web logs to
an analytic-ready data source is the most crucial and time-consuming
aspect of the information discovery process.
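As a rough illustration of the first step from raw web logs toward an
analytic-ready record, the sketch below parses one log line into named
fields. It assumes the logs use the Common Log Format, which the abstract
does not state; the pattern, the function name parse_log_line, and the
sample line are hypothetical and shown only for illustration.

```python
import re

# Assumption: logs follow the Common Log Format
# (host ident authuser [time] "request" status size).
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)

def parse_log_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no bytes were returned.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record

# Hypothetical sample line, not taken from any real site.
sample = ('192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] '
          '"GET /index.html HTTP/1.0" 200 2326')
record = parse_log_line(sample)
print(record)
```

Lines that fail to match return None, so malformed entries can be counted
and set aside during cleansing rather than silently dropped.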
The ASP model offers the opportunity to provide a suite of analytic
applications for extracting relevant information out of daily web logs and
merging them with complementary data sources to generate a complete and
accurate representation of web site activity.
This presentation will showcase analytical methods relevant to retail,
pharmaceutical, and financial companies in meeting these challenges.
Examples will include data cleansing and pre-processing methods, business
metric forecasting and goal seeking, site traffic visualization, exception
process notification, cross-sell analysis, and direct-to-consumer campaign
effectiveness monitoring.
Randy Collica, Compaq
Mining Textual Data from Call Center Notes
This presentation shows how one can greatly augment a business intelligence
application by including textual data in the data mining process. At Compaq
Computer Corp., customer input is put to fuller use by combining the notes
that call center telesales representatives gather and enter into Siebel.
These text notes include general comments from customers, email messages to
and from customers or prospects, etc. Together with additional data from
our customer data warehouse, the notes reveal distinct clusters or groups
in the data that were hidden until now, and SAS Data Mining Text solutions
aid the process of understanding these groups. The text notes can also be
used to predict the lead rating, which is the likelihood to purchase as
determined by the sales representative.
David Duling, SAS
Developers are often faced with the demand from marketing for data mining
products to "automatically extract patterns from large databases," the
famous Black Box. However, research and development usually leads to a
collection of statistical tools for large, un-designed data sets. This
leads us to question the myth of the Black Box. Case studies will
demonstrate why original analysis is important. Meanwhile, market analyst
reports predict that data mining will increasingly be packaged into
solutions for the masses, paradoxically encompassing both more
sophisticated algorithms and more widespread usage. This will require
development of a
new breed of analytical solutions.
Arnold Goodman, University of California, Irvine
Challenges and Checklists for Data Miners, Statisticians and the Clients Who Fund Them
Discovering new knowledge from massive amounts of complex data requires
that any informative patterns mined from the data be generalized into
useful models to predict situations, suggest new knowledge from the
predictions, and then evaluate this knowledge in that world beyond the
data.
A basic challenge is for the results to work almost all (not only some)
of the time, account for uncertainty outside (not only inside) the data,
and add valuable knowledge to a client’s world.
Knowledge discovery rests on the three balanced legs of computer science,
statistics and client knowledge: it will not stand either on one leg or on
two legs, or even on three unbalanced legs.
A maturity challenge is for data miners, statisticians and clients to
recognize their dependence on each other and for all of them to widen their
focus until true collaboration becomes reality.
Far more effort is spent on processing data beyond what the data really
support than on planning the data selection and evaluating knowledge
before actual acceptance.
An investment challenge is to balance effort spent on analysis inside the
database with effort spent on analysis outside the database in the
knowledge domain, difficult though it might be.
The Brain Makers and Mind Matters document artificial intelligence going
from over-promising in the early 1960s to under-performing in the early
1970s, and expert systems going from over-promising in the early 1980s to
under-performing in the early 1990s, despite vestiges remaining now that
computer technology is finally able to deliver on promises made by computer
science.
The critical challenge for us all is to view the challenges as
opportunities for our joint success.
To assist us all in responding to these challenges, checklists are
presented for the major critical success factors in knowledge discovery,
professional collaboration and knowledge evaluation.
The critical success areas in knowledge discovery are data mining for
interesting patterns, data prediction with process models and data
knowledge through new understanding. Although the first area might be
accomplished without statistical thinking, the other two definitely cannot
be.
Professional collaboration involves developing analysis and software,
solving the client’s problem, being a solution conscience, adding value to
the client’s world and evaluating results in the client’s world. Designed
samples, related information, contexts of data, errors in predictions and
causal networks might help in the evaluation of new knowledge outside the
database, in the knowledge domain itself.
Victor Lo, Fidelity Investments
A Novel Response Modeling Approach in Database Marketing
Data mining has been used extensively in database marketing in a variety of
industries. In particular, predictive models are often developed to
identify individuals who are most likely to respond to campaigns, where
response can be defined as becoming a customer, buying a product, deepening
a relationship, increasing profitability, or a combination of the above.
In this presentation, we identify the appropriate business objective based
on years of business experience and then derive the corresponding
mathematical objective function. We then propose a novel approach to
meeting the appropriate objective. We also point out that the current
approach that is widely published in the literature and commonly used in
business is not directly designed to meet the appropriate business
objective. An example using simulated data is used to illustrate the
significant benefit of the proposed approach over the current approach in
campaign targeting. The proposed approach is easy to implement and can be
used in conjunction with common supervised learning algorithms such as
logistic regression, decision trees, spline regression, and neural networks.
Bill MacReady, BiosGroup
Optimization: Foundations to Frontiers
Optimization is the process of discovering (usually by a computer)
configurations which minimize some cost function. Applications range from
scheduling (manufacturing, airline timetables), to network flow
(logistics, electricity and gas), to supply chain optimization. Optimization
applications began improving businesses dramatically in the 1960s with the
advent of operations research (OR) departments in both academia and
industry, and no quantitative field has had a larger impact on improving
business performance.
Through a series of concrete examples, this talk will introduce optimization
methods, starting from their OR foundations and continuing to the latest
state-of-the-art results in constraint programming, naturally inspired
techniques (e.g. genetic algorithms, ant algorithms), and distributed
multi-agent optimization. This last topic will be discussed in some detail
as it promises exciting new applications.
Edward C. Malthouse, Northwestern University
Valuing Individual Customers
It is important for companies and organizations to estimate the long-term
value (LTV) of their customers. This talk focuses on three key issues in
estimating LTV for individual customers:
- How should an LTV model be evaluated? We identify issues that make model
evaluation difficult and discuss solutions.
- How accurately can LTV be estimated and how does the accuracy depend on
(1) the length of the period over which the forecast is made and (2) the
amount of information available on a customer at the time of the forecast?
We present results from multiple companies in several different industries
to establish accuracy baselines. We compare the accuracy of forecasts from
behavioral data versus only demographic overlays.
- What are possible modeling approaches and what are their strengths and
weaknesses? LTV can be modeled directly with some form of "regression"
model. It could be modeled by first estimating retention probabilities and
then estimating value conditional on being retained. It could be modeled
via survival analysis models. We discuss how these models compare in terms
of accuracy and marketing insights. We also discuss under what business
circumstances a particular approach is preferred.
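The two-stage approach in the last item above can be sketched as a
single calculation: a retention probability multiplied by the value
expected if the customer is retained. This is a minimal sketch, not the
speaker's model. It assumes a customer who is not retained contributes no
value, and the function name two_stage_ltv and the input numbers are
hypothetical.

```python
def two_stage_ltv(p_retained, value_if_retained):
    """Expected value = P(retained) * E[value | retained].

    Assumption: a customer who is not retained contributes zero value.
    """
    if not 0.0 <= p_retained <= 1.0:
        raise ValueError("p_retained must be a probability")
    return p_retained * value_if_retained

# Hypothetical inputs; in practice each would come from its own fitted model.
estimate = two_stage_ltv(p_retained=0.5, value_if_retained=120.0)
print(estimate)
```

Keeping the two stages separate lets each estimate be inspected on its own,
which is one way such a model can yield marketing insight as well as a
forecast.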
Bruce Ratner, DM STAT-1 CONSULTING
A New Method for Maximizing Customer LifeTime Value
A new method for maximizing customer lifetime value (LTV) - the CPR Model -
is introduced, which simultaneously addresses two important objectives
facing database marketers: maximizing response and maximizing profit. The
CPR Model balances the two objectives, seeking a single score that
identifies high-LTV responders. The CPR Model is theoretically optimal, and
easy to build and validate.
The CPR Model can address many real-world problems. In the
telecommunications industry, one often seeks to model customers' tenure in
combination with usage - identifying people who have long tenure and high
usage of services. In catalogue and retail sales, models identifying
potential buyers who will not return purchased goods are useful; similarly,
models that identify potential responders to mailings who are also likely
to buy some specific product are often sought. In the financial services
industry, models that identify customers who are likely to be approved for
credit and who can also be expected not to make late payments or default on
loans are valuable.
The current approach for identifying high-LTV responders consists of
building a logistic regression model for identifying responsive customers,
and an ordinary regression for identifying high-profit customers. Then, the
two model scores are multiplied as a procedure for identifying individuals
who are both most likely to respond and contribute large profit. This
widely-used approach produces suboptimal results, and is cumbersome to
perform and validate.
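The scoring step of the current approach described above can be sketched
as follows. This shows only the multiplication and ranking, not the CPR
Model, and it does not fit either regression. The customer records and
score values are hypothetical stand-ins for the outputs of a fitted
logistic regression (response) and ordinary regression (profit).

```python
# Hypothetical scores for illustration; in practice they would come from
# a fitted logistic regression (response) and ordinary regression (profit).
customers = [
    {"id": "A", "p_response": 0.10, "pred_profit": 200.0},
    {"id": "B", "p_response": 0.40, "pred_profit": 30.0},
    {"id": "C", "p_response": 0.25, "pred_profit": 90.0},
]

def product_score(customer):
    """Baseline score: response probability times predicted profit."""
    return customer["p_response"] * customer["pred_profit"]

# Rank customers from highest to lowest combined score.
ranked = sorted(customers, key=product_score, reverse=True)
ranking = [c["id"] for c in ranked]
print(ranking)
```

In this made-up data the customer with the highest combined score is
neither the most responsive nor the most profitable, which shows how the
product trades the two objectives against each other.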
I discuss the CPR Model and demonstrate its strengths in identifying
high-value responders. I then review the logistic and ordinary regression
models and demonstrate the weakness of multiplying their scores. Two real
case studies are discussed to highlight the new method.
Jaideep Srivastava, University of Minnesota
Rare Event Analysis & Its Applications
'Rare events' are events that occur very infrequently - and are thus very
difficult to detect. However, when they do occur, their consequences can be
quite dramatic - and often in a negative sense. Examples include network
intrusions and security breaches, cardiac events, credit card and other
types of financial fraud, telecom circuit overloads, traffic accidents,
etc. Timely detection of rare events has been of interest for quite some
time. However, techniques for it have so far been largely heuristic in
nature.
Recent years have seen an explosive growth in the speed and capacity of
data collection and storage devices, accompanied by a significant drop in
price. This has had a multiplicative effect on our data collection ability.
Most organizations today collect huge quantities of data about various
processes in their computer systems, be they communication, information,
process control, or any other type of systems. Data is collected at various
levels of abstraction, from hardware and firmware, to operating system and
communication events, to database query logs and application level events.
These comprehensive event logs provide a wealth of data, analysis of which
has the potential to identify the rare events described earlier.
Classical statistical techniques, which have focused on detecting the major
model in a dataset, do not adequately address the problem of analyzing rare
events. Often, events that represent significant deviations from the norm
are thrown away as outliers. Sampling techniques often miss the data
representing rare events entirely, precluding any analysis. Recent
growth in data mining techniques, especially those geared towards large
numbers of attributes and huge data volumes, provides new hope for
analyzing rare events.
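The remark above that sampling can miss rare events entirely can be made
concrete with a small calculation. The sketch assumes each sampled record
is an independent draw with the same event rate, which is a
simplification. The rate and sample size are hypothetical, and the
function name is introduced here for illustration only.

```python
def prob_sample_misses_all(event_rate, sample_size):
    """Probability that none of `sample_size` independent draws is a rare event."""
    if not 0.0 <= event_rate <= 1.0:
        raise ValueError("event_rate must be a probability")
    return (1.0 - event_rate) ** sample_size

# Hypothetical numbers: 1 event per 10,000 records, sample of 1,000 records.
miss_probability = prob_sample_misses_all(event_rate=0.0001, sample_size=1000)
print(f"{miss_probability:.3f}")
```

Under these assumed numbers the sample contains no rare event at all about
nine times out of ten.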
In this talk we will introduce the idea of Rare Event Analysis. Examples
will be drawn from a number of domains to illustrate the challenges faced
in such analysis. Existing and emerging techniques for addressing these
problems will be presented, in both supervised and unsupervised settings.
The talk will conclude by pointing out issues that remain to be addressed
in this challenging field.
This work is being carried out under the sponsorship of the
Army High Performance Computing Research
Center, which is funded by the US Army.
Gary Saarenvirta, IBM
Data Mining in Real-Time
Data mining has evolved and moved much further into the mainstream. Where only
financial institutions and direct marketers used data mining five years
ago, we find corporations of all sizes, in all industries using data mining
to solve a wide variety of business problems today. Industry standards are
being developed to permit data mining to be embedded into business
applications by application programmers. Real-time deployment of data
mining models has been enabled by the implementation of some of these
industry standards. Data Mining is becoming a commodity and in the very
near future will be a completely automated process.
This presentation explores the real-time deployment of data mining through
examples and case studies and discusses future data mining developments
being led by IBM research.
Andrew Storey, Scotiabank and Marc Cohen, SAS
Offer Optimization - Optimizing Cross-Sell and Up-Sell Opportunities in Banking
The banking industry regularly mounts campaigns to improve customer value
by offering new products to existing customers. This approach gained
momentum as a result of the increasing availability of customer data and
improved analysis capabilities through data mining. Even with these
improvements the problem of efficiently using resources to maximize the
return on marketing investment (ROMI) is a challenge. This problem is
compounded because of increased capability to send multiple campaigns
through several distribution channels over multiple time periods. The
combination of alternatives creates a complicated array of possible
actions. This paper presents a software solution that focuses on answering
the question of which products to offer to each customer in a way that
maximizes the value of contact with the customer and the ROMI. The solution
goes beyond the usual greedy approach of picking the customers that have
the largest expected value for a particular product because it maximizes
return while also accounting for limited resources and multiple sequential
campaigns. Although a retail banking example is presented, the approach is
transferable to numerous other industries. The developed solution uses the
SAS/STAT®, SAS/OR® and base SAS® products and is operating system
independent. This solution is intended for an audience with a medium skill
level in SAS.
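To illustrate why a greedy, product-by-product selection can fall short
once resources are limited, the toy sketch below compares it with an
exhaustive search over assignments. It is not the SAS-based solution
described above. The expected values, the constraint of one offer per
customer, and the limit of one offer per product are all hypothetical.

```python
from itertools import permutations

# Hypothetical expected values of offering each product to each customer.
expected_value = {
    "X": {"P1": 10.0, "P2": 9.0},
    "Y": {"P1": 8.0, "P2": 1.0},
}
products = ["P1", "P2"]           # assumed: each product can be offered once
customers = list(expected_value)  # assumed: each customer gets one offer

def greedy_by_product():
    """For each product in turn, pick the best still-uncontacted customer."""
    remaining = set(customers)
    total = 0.0
    for product in products:
        best = max(remaining, key=lambda c: expected_value[c][product])
        total += expected_value[best][product]
        remaining.remove(best)
    return total

def best_assignment():
    """Exhaustively try every one-to-one customer/product assignment."""
    return max(
        sum(expected_value[c][p] for c, p in zip(customers, order))
        for order in permutations(products)
    )

greedy_total = greedy_by_product()
optimal_total = best_assignment()
print(greedy_total, optimal_total)
```

Exhaustive search is practical only for toy cases like this one; it is used
here purely to show the gap that a constrained optimization is meant to
close.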
What participants say about the M-series:
"The educational content, exchange of ideas, and intellectual environment
I found at the conference exceeded my expectations and confirmed SAS'
place as the premier data mining conference in the world."
~~~~~~~~~
"Right time. Right place. Right content."
~~~~~~~~~
"This was a superb environment - one of the smartest conference venues I
have experienced (and I have experienced a lot). The talks went into
greater depth than the talks at many such meetings. Many of the talks were
particularly valuable in shedding light on different application areas of
data mining."