The following abstracts have been provided:
David Banks, Duke University
"Combinatorial Search in Fitting Complex Models"
Modern computational statistics often requires extensive
combinatorial search. But search is expensive, not just in terms of time,
but also in terms of the penalty one pays for multiple testing associated
with model selection. Therefore it is important for statistical procedures
to search in the most efficient ways possible. This paper describes several
methods for smart combinatorial search, such as Gray codes and fractional
factorial experiments, and shows how they can apply to problems in
multivariate regression, cluster analysis, and multidimensional scaling.
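As a rough illustration of why Gray codes help in this kind of search, the sketch below enumerates every subset of candidate variables in binary-reflected Gray code order, so that each subset differs from the previous one by exactly one variable. The function names and the variable names in the usage note are hypothetical and are not taken from the paper; the abstract does not say how the ordering is applied to model fitting.

```python
def gray_code_subsets(n):
    """Yield (mask, flipped_index) for all 2**n subsets of n items.

    Subsets are visited in binary-reflected Gray code order, so each
    mask differs from the previous one in exactly one bit. The first
    item has flipped_index None because nothing precedes it.
    """
    prev = 0
    yield prev, None
    for i in range(1, 2 ** n):
        mask = i ^ (i >> 1)                        # standard Gray code of i
        flipped = (mask ^ prev).bit_length() - 1   # index of the single changed bit
        yield mask, flipped
        prev = mask


def subset_from_mask(mask, names):
    """Translate a bit mask into the list of selected variable names."""
    return [name for k, name in enumerate(names) if (mask >> k) & 1]
```

For example, iterating over gray_code_subsets(3) with the hypothetical names ["x1", "x2", "x3"] visits each of the eight candidate models once, and a fitting routine would only need to add or drop the single variable at flipped_index between consecutive models.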
Joe Bartling - H&R Block
Brett Long (co-author)
Lifetime Value Methodology at H&R Block
We present and discuss the methodology developed at H&R
Block to measure the Lifetime Value of individual clients in the tax
preparation industry. One unique aspect of this industry is that the client
has one interaction with the firm per calendar year. We discuss in detail
the individual elements that go into calculating lifetime value: revenue,
cost and client longevity. In addition, we discuss the two approaches to
lifetime value measurement adopted at H&R Block. The two methods
involve partially and fully burdening clients with H&R Block costs. We
then discuss the appropriate uses for each.
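To make the three elements concrete, here is a minimal sketch of a generic lifetime value calculation with one interaction per year. It is a textbook-style formula chosen for illustration, not the H&R Block methodology, which the abstract does not spell out. All parameter names and figures are hypothetical, and treating "fully burdened" as direct cost plus an overhead allocation is an assumption.

```python
def lifetime_value(annual_revenue, annual_cost, retention_rate,
                   discount_rate, horizon_years):
    """Sum discounted yearly margin, weighted by the chance the client is still active.

    Assumes one interaction per year, a constant retention rate, and a
    constant discount rate (all simplifying assumptions).
    """
    total = 0.0
    for year in range(horizon_years):
        survival = retention_rate ** year          # probability the client returns in this year
        discount = (1 + discount_rate) ** year     # time value of money
        total += survival * (annual_revenue - annual_cost) / discount
    return total


# Hypothetical figures: direct cost only versus direct cost plus allocated overhead.
partially_burdened = lifetime_value(100.0, 40.0, 0.5, 0.0, 3)
fully_burdened = lifetime_value(100.0, 40.0 + 20.0, 0.5, 0.0, 3)
```

Under these assumptions the fully burdened figure is always the lower of the two, since the same revenue stream carries a larger cost in every year.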
Robert Berry, Central Michigan University
University and Corporate Business Intelligence Research Partnerships: Critical Success Factors and Lessons Learned
The Central Michigan University Research Corporation (CMURC) was
established just one year ago as a not-for-profit subsidiary of CMU. The
CMURC is a separate independent corporation whose mission is to create a
business and research environment where business, academia and technology
come together to solve real business problems.
The CMURC has assisted many companies, including The Dow Chemical Company,
Dow Corning, EDS, and International Paper in understanding how BI can add
value to them and how to get started through its business insight services.
The CMURC defines business intelligence broadly to include data mining,
text mining, analytics, spatial analysis, and predictive analytics. Many of
these techniques have been successfully utilized in the business
intelligence projects performed in the past year.
The CMURC has developed a business model and engagement methodology that
allows us to leverage our technology partners, IBM, SAS, and ESRI. CMURC
and its corporate partners have created an industry-driven, research
environment where the full intellectual potential of the university is
leveraged to find innovative business solutions. This approach has
strengthened the faculty research and teaching components of the university
and has strengthened corporate relationships.
The CMURC business model, engagement process, the critical success factors
and the lessons we have learned from working with our corporate and
academic partners will be presented.
Hamparsum Bozdogan, The University of Tennessee
Intelligent Statistical Data Mining with Information Complexity and Genetic Algorithms
In many real-life applications of strategic decision making, large
numbers of variables need to be simultaneously considered to build an
operating model for a given data set. In such cases, it is desirable to
determine which subset of variables has an effect on a particular business
problem under consideration. In this talk, we develop a computationally
feasible intelligent data mining and knowledge discovery technique that
addresses existing, potentially daunting statistical and combinatorial
problems in order to make just-in-time strategic decisions. The new
proposed approach integrates novel statistical modeling procedures based on
Bozdogan's information-theoretic measure of complexity (ICOMP) criterion
with the genetic algorithm (GA) to select the optimal subset of variables.
The data warehouse InfoCubes allow the aggregation of data with a minimum
loss of information to provide rapid availability of data. Within this
setting, ICOMP allows the identification of all the best fitting models
from a very large landscape of candidate models. This permits bypassing the
hypothesis testing needed in traditional statistical procedures. The
genetic algorithm enables the rapid computation of models that would
otherwise be impossible in a reasonable amount of time. As a result, it is
now feasible to automatically and dynamically develop best fitting models
with many different combinations of variables to respond to rapidly
changing different application environments. We will demonstrate the
approach using real data sets in:
- Multiple Regression
- Logistic Regression, and
- Ordinal Logistic Regression Models.
We further will discuss other potential applications of this new approach
to Bayesian highest predictive regression models, multivariate
kernel-mixture model cluster analysis, model selection under
misspecification, and in multivariate dynamic linear models known as vector
autoregressive (VAR) models, to mention a few.
The computation of choosing the best operating model will be demonstrated
live on a laptop computer.
Bruce Carroll, Acxiom
Mr. Carroll will explain how Acxiom used a combination of art, data and
statistics to create Personicx, a household level segmentation/cluster
system that has had considerable commercial success amongst banks,
insurance and automotive companies since its launch almost a year ago.
Mr. Carroll will explain how and why approximately 30% of households get a
new cluster assignment every year and why and how marketers are taking
advantage of these changes for real time marketing execution and
communication. As well as discussing some of the traditional applications
of segmentation in response optimization, cross-sell, upsell, etc, he will
also show how Personicx is helping traditional data mining technologies be
used for conceptualizing and visualizing a company's strategic business
imperatives.
Patricia Cerrito, University of Louisville
Data and Text Mining to Investigate Health Outcomes
Data mining can be used to examine the relationship between physician
practice and patient outcomes while accounting for variability in patient
risk and examining variability in physician practice. For example, changes
in physician prescribing patterns are easily visualized using data mining
tools. Associations can be used to target interactions and/or outcomes for
additional examinations. Patterns of prescribing can be tracked using
visualization techniques. Once the prescribing patterns are known,
educational materials can be developed to support necessary changes in the
prescribing patterns, change buying habits to reduce the cost of goods, and
provide the patient with the most effective, cost saving medications.
One of the more recent tools of data mining is that of text mining. It can
be used to examine unstructured notes that are often contained within
patient charts. It can also be used to examine diagnosis codes that are
used for billing purposes. Another use of the text mining is to compare
different sources of medical information to determine optimal sources to
examine so that physicians (and consumers) can keep current with practice
guidelines.
Various data mining tools will be illustrated with examples from medical
data. In particular, the relationship between practice and outcome will be
examined, so that best practices can be identified that will result in the
best outcomes. The use of text mining will be emphasized.
Randy Collica, Hewlett-Packard
Data Mining Galore: For Business Applications
This presentation shows how one can greatly augment a business intelligence
application by including textual data in data mining. HP makes extensive
use of customer input by drawing on notes gathered by Call Center
telesales representatives and entered into Siebel. These text
notes include general comments from the customers, email messages to and
from our customers or prospects, etc. Together with additional data from
our customer data warehouse, it is shown that distinct clusters or groups
of data exist that were hidden until now, and SAS Data Mining Text solutions
aid the process of understanding these groups. These text notes can also be
used to predict the lead rating, which is the likelihood to purchase as
determined by the sales representative. Other business applications, such as
part number re-classification and customer survey free-form text, will also
be covered.
James Cox, SAS
The Text Mining Challenge: Mining Ill-formed and Noisy Textual Documents
An ideal situation for text mining is to have a set of documents, each
split into separate files, that are between 100 and 2000 characters long,
mixed case, with well-formatted sentences that have been proofread for
misspellings and consistent verbiage. Alas, few document collections in
the "real" world meet that specification. What do you do when any (or all)
of these criteria aren't met? This presentation will focus on two such
collections. The first is an insurance data collection that uses mostly
short 10-150 character documents, all upper case, with terse
non-grammatical descriptions. The second uses the proceedings of the 2001
KDD conference, in which one PDF file contains all the papers
concatenated together. The papers vary in length from 10,000 to 100,000
characters. A variety of techniques are shown for
both sources to mine them effectively, using SAS' Text Miner product.
Richard Hales, IBM
The SAS and IBM Mining Relationship, Enabling Enterprise Solutions
SAS and IBM are collaborating in development of integrated data mining
products that allow models created using SAS Enterprise Miner to be applied
throughout the enterprise by IBM's DB2 database. This session will
overview the DB2 capabilities that integrate with SAS Enterprise Miner and
apply the models. It will also cover additional database mining services
that are made available due to the collaboration. How this technology
enables solutions and takes mining closer to the customer touchpoint will
be shown using "real world" examples.
Will Neafsey, Ford Motor Company
Part of an ongoing effort to best serve automotive consumers is discovering
new channels for "viscerally" understanding them. One of these
channels, originating from a group chartered with incubating new business
opportunities, is open-form communication ("text").
Open-form communication from consumers presents large challenges and larger
benefits. The most common occurrence of open-form communication is when it
is received as a response to a posed question or an invitation for comment.
In many cases, companies do not have the time or resources to read and
condense this material. In the instance where a question was asked, most
companies will use either a human or a technology-assisted keyword search
to quantify the responses. Using SAS technologies as well as a process
developed at Ford, we were able to glean a broader picture of the consumer
mindset than the simple literal response.
As a result of three years of work and experimentation side by side with
analytics professionals at SAS, we developed an expertise that allows us to
flow information from text into practical business insight to either
reinforce existing consumer opportunities or give directional guidance for
new opportunities.
Timothy Rey, Dow Chemical
Data Mining Education at Dow
Historically Dow has been quite involved in various "structured" modeling
technologies, including the development, teaching and use of such methods
as: traditional Operations Research in Supply Chain; Process Modeling and
Simulation in Manufacturing, including very early adoption of Neural
Networks; Nonlinear simulation, estimation and optimization approaches in
Reaction/Pharmacokinetics, 3D fluid dynamics, and traditional DOE and
response surface modeling in R&D; Hierarchical Path Modeling, Conjoint,
Time Series modeling, etc. in Marketing/Marketing Research and Sales; and
traditional ANOVA/Linear modeling in Agricultural Science as well
as Health and Environmental Sciences. Adding data mining technologies and
processes to this tool kit is extremely useful. In keeping with the
previous technologies, Dow has put together a 3-prong approach to help
establish data mining as yet another value added approach to solve business
problems. This talk will review the approach and share lessons learned.
Olivia Parr-Rud, OLIVIAGroup
Unleashing the Power of Lifetime Value to Boost ROI
Predictive models for response, risk and retention have created a clear
competitive advantage for many companies. However, using these models in
isolation may be suboptimal. The best approach is to combine these
components to go after the end goal, Lifetime Value. This presentation
details the steps for building an acquisition model that optimizes lifetime
value for direct mail insurance. By combining model scores and business
rules, the combined measure guarantees the highest long-term profitability.
Shashi Shekhar, University of Minnesota
Mining colocation patterns from spatial datasets
The importance of spatial data mining is growing with the increasing
incidence and importance of large geo-spatial datasets such as maps,
repositories of remote-sensing images, and the decennial census.
Applications include M(obile)-commerce industry (location-based services),
NASA (studying the climatological effects of El Nino, land-use
classification and global change using satellite imagery), National
Institutes of Health (predicting the spread of disease), National Imagery
and Mapping Agency (creating high resolution three-dimensional maps from
satellite imagery), National Institute of Justice (finding crime hot
spots), and transportation agencies (detecting local instability in
traffic). However, classical data mining techniques are often inadequate
for spatial data mining and different techniques need to be developed. The
talk illustrates this point in the context of co-location pattern mining over
spatial datasets.
Given a collection of boolean spatial features, the co-location rule
discovery process finds the subsets of features whose instances are
frequently located together in geographic space. For example, symbiotic
plant species and predator-prey animal species are likely co-locations in
Ecology datasets. The co-location rule discovery problem is different from
the association rule discovery problem. Even though the boolean spatial
features may be considered as item types, there is no natural notion of
transactions. Transactioning spatial datasets can lead to incorrect
estimation of the interest measures for many spatial co-location patterns
with instances near transaction boundaries. This makes it difficult to use
traditional interest measures, e.g. support, and traditional association
rule mining algorithms, which are based on ideas like support based pruning
and compression of transaction data.
The proposed approach formalizes the notion of co-locations using
user-specified spatial neighborhoods in place of transactions. It defines
new interest measures based on the neighborhoods along with a model for
interpreting the co-location rules. It provides a simple, correct, and
complete algorithm for mining co-location rules. In addition, it proposes
to advance the development of co-location mining by addressing three basic
issues, namely, scalability, ascertaining quality of inferred patterns, and
discovery of high confidence low support rules.
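The idea of replacing transactions with a user-specified neighborhood can be sketched as follows. The abstract does not name or define its interest measures, so the measure below (the fraction of each feature's instances that have an instance of the other feature within a given radius, and the smaller of the two fractions) is an assumed stand-in for illustration only. The function name, the tuple layout, and the Euclidean-distance neighborhood are likewise hypothetical.

```python
from math import hypot


def participation_ratios(points, feat_a, feat_b, radius):
    """Neighborhood-based co-location measure for a pair of features.

    points is a list of (feature, x, y) tuples. Returns (ratio_a, ratio_b,
    index), where ratio_a is the share of feat_a instances with at least
    one feat_b instance within `radius`, ratio_b is the converse, and
    index is the smaller of the two.
    """
    a = [(x, y) for f, x, y in points if f == feat_a]
    b = [(x, y) for f, x, y in points if f == feat_b]

    def near(p, q):
        # Neighborhood relation: Euclidean distance within the radius.
        return hypot(p[0] - q[0], p[1] - q[1]) <= radius

    a_hits = sum(any(near(p, q) for q in b) for p in a)
    b_hits = sum(any(near(p, q) for q in a) for p in b)
    ratio_a = a_hits / len(a) if a else 0.0
    ratio_b = b_hits / len(b) if b else 0.0
    return ratio_a, ratio_b, min(ratio_a, ratio_b)
```

Because no instance is assigned to an artificial transaction, pairs that sit on either side of a grid boundary are still counted as neighbors. This brute-force version compares every pair of instances, so it does not address the scalability issue the abstract raises.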
David Speights
Predictive Models in Mortgage Banking
Statistical models are widely used in mortgage banking for a variety of
applications. An introduction to mortgage banking is provided along with
an overview of several problems which can be addressed with predictive
models. Particular attention is given to the interaction of mortgage
banking with the economy and the various modeling techniques which can
address these issues. Some common simple modeling approaches are presented
along with some more advanced modeling techniques.
Andreas Weigend, Amazon.com
Analyzing Customer Behavior at Amazon.com
Amazon.com, perhaps the world's largest laboratory to study human behavior
and decision making, uses data, measurement and modeling as central
ingredients in its business decisions.
The first part of the talk gives an overview of the different kinds of data
available at Amazon.com, emphasizing that data mining needs to drive
actions such as emails, coupons, and recommendations of products, stores,
or site features. The scope of the actions ranges from the individual
customer, over pre-computed customer segments, to the entire customer base.
The second part presents joint work with Bruce D'Ambrosio (Cleverset, Inc.)
on relational probabilistic models for customer behavior, both for
discovering static customer attributes, and for dynamically predicting the
intention of the customer and the outcome of a session.
The third part outlines current research problems, such as modeling and
eventually influencing the long-term behavior of customers. In addition to
the importance of machine learning, it shows the central role that principles of
behavioral economics, judgment and decision making play in computational
marketing.