M2003



  The following abstracts have been provided:

David Banks, Duke University
"Combinatorial Search in Fitting Complex Models"
Modern computational statistics often requires extensive combinatorial search. But search is expensive, not just in terms of time, but also in terms of the penalty one pays for multiple testing associated with model selection. Therefore it is important for statistical procedures to search in the most efficient ways possible. This paper describes several methods for smart combinatorial search, such as Gray codes and fractional factorial experiments, and shows how they can apply to problems in multivariate regression, cluster analysis, and multidimensional scaling.
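As an illustration of the Gray-code idea mentioned above (a minimal sketch, not the speaker's method): visiting variable subsets in reflected Gray-code order means each candidate model differs from the previous one by exactly one variable, so a fit can in principle be updated incrementally instead of recomputed. The function names here are hypothetical.

```python
def gray_code_subsets(n_vars):
    """Yield (mask, flipped_index) for all 2**n_vars subsets in Gray-code order.

    mask is an integer whose bit j says whether variable j is in the subset.
    flipped_index is the variable added or dropped relative to the previous
    subset (None for the initial empty subset).
    """
    prev = 0
    yield 0, None
    for i in range(1, 1 << n_vars):
        gray = i ^ (i >> 1)          # standard reflected Gray code
        changed = gray ^ prev        # exactly one bit differs from the previous mask
        yield gray, changed.bit_length() - 1
        prev = gray


def subset_from_mask(mask, names):
    """Translate a bit mask into the list of selected variable names."""
    return [name for j, name in enumerate(names) if (mask >> j) & 1]


if __name__ == "__main__":
    names = ["x1", "x2", "x3"]
    for mask, flipped in gray_code_subsets(len(names)):
        print(subset_from_mask(mask, names), "changed:", flipped)
```

Each step of the loop toggles a single variable, which is the property that makes the ordering attractive for exhaustive model search.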


Joe Bartling - H&R Block
Brett Long (co-author)

Lifetime Value Methodology at H&R Block
We present and discuss the methodology developed at H&R Block to measure the lifetime value of individual clients in the tax preparation industry. One unique aspect of this industry is that the client has one interaction with the firm per calendar year. We discuss in detail the individual elements that go into calculating lifetime value: revenue, cost, and client longevity. In addition, we discuss the two approaches to lifetime value measurement adopted at H&R Block. The two methods involve partially and fully burdening clients with H&R Block costs. We then discuss the appropriate uses for each.
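To make the three elements concrete, here is a minimal sketch of a generic lifetime value calculation for a client who interacts once per year. It is not H&R Block's formula: the retention model, the discounting, the function name, and all figures are assumptions chosen for illustration only.

```python
def lifetime_value(annual_revenue, annual_cost, retention_rate,
                   discount_rate, horizon_years):
    """Expected discounted margin from one client over horizon_years visits.

    Assumes one interaction per year, a constant yearly retention
    probability, and a constant discount rate (all hypothetical).
    """
    margin = annual_revenue - annual_cost
    total = 0.0
    for year in range(horizon_years):
        survival = retention_rate ** year        # chance the client is still active
        discount = (1.0 + discount_rate) ** year  # present-value factor
        total += survival * margin / discount
    return total


if __name__ == "__main__":
    # Hypothetical figures: the two cost levels stand in for partially
    # and fully burdened costs.
    partial = lifetime_value(150.0, 60.0, 0.75, 0.08, 10)
    full = lifetime_value(150.0, 95.0, 0.75, 0.08, 10)
    print(f"partially burdened: {partial:.2f}")
    print(f"fully burdened:     {full:.2f}")
```

Running the same calculation with two cost figures shows how the choice of cost burdening changes the resulting value for an otherwise identical client.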


Robert Berry, Central Michigan University
University and Corporate Business Intelligence Research Partnerships: Critical Success Factors and Lessons Learned
The Central Michigan University Research Corporation (CMURC) was established just one year ago as a not-for-profit subsidiary of CMU. The CMURC is a separate, independent corporation whose mission is to create a business and research environment where business, academia, and technology come together to solve real business problems.

The CMURC has assisted many companies, including The Dow Chemical Company, Dow Corning, EDS, and International Paper, in understanding how BI can add value to them and how to get started through its business insight services.

The CMURC defines business intelligence broadly to include data mining, text mining, analytics, spatial analysis, and predictive analytics. Many of these techniques have been successfully utilized in the business intelligence projects performed in the past year.

The CMURC has developed a business model and engagement methodology that allows us to leverage our technology partners: IBM, SAS, and ESRI. CMURC and its corporate partners have created an industry-driven research environment where the full intellectual potential of the university is leveraged to find innovative business solutions. This approach has strengthened the faculty research and teaching components of the university and has strengthened corporate relationships.

The CMURC business model, engagement process, the critical success factors and the lessons we have learned from working with our corporate and academic partners will be presented.


Hamparsum Bozdogan, The University of Tennessee
Intelligent Statistical Data Mining with Information Complexity and Genetic Algorithms
In many real-life applications of strategic decision making, large numbers of variables need to be considered simultaneously to build an operating model for a given data set. In such cases, it is desirable to determine which subset of variables has an effect on the particular business problem under consideration. In this talk, we develop a computationally feasible intelligent data mining and knowledge discovery technique that addresses the potentially daunting statistical and combinatorial problems that currently stand in the way of just-in-time strategic decisions. The proposed approach integrates novel statistical modeling procedures based on Bozdogan's information-theoretic measure of complexity (ICOMP) criterion with the genetic algorithm (GA) to select the optimal subset of variables. The data warehouse InfoCubes allow the aggregation of data with a minimum loss of information to provide rapid availability of data. Within this setting, ICOMP allows the identification of the best fitting models from a very large portfolio of candidate models. This permits bypassing the hypothesis testing needed in traditional statistical procedures. The genetic algorithm enables the rapid computation of models that would otherwise be impossible in a reasonable amount of time. As a result, it is now feasible to automatically and dynamically develop best fitting models with many different combinations of variables to respond to rapidly changing application environments. We will demonstrate the approach using real data sets in:
  • Multiple Regression
  • Logistic Regression, and
  • Ordinal Logistic Regression Models.
We further will discuss other potential applications of this new approach to Bayesian highest predictive regression models, multivariate kernel-mixture model cluster analysis, model selection under misspecification, and in multivariate dynamic linear models known as vector autoregressive (VAR) models, to mention a few.

The computation of choosing the best operating model will be demonstrated live on a laptop computer.
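The general shape of criterion-driven subset selection with a genetic algorithm can be sketched as follows. This is an illustration under stated assumptions, not the speaker's implementation: AIC is used as a plainly named stand-in for ICOMP, the model is ordinary least squares, and the function names and GA settings (tournament selection, uniform crossover, bit-flip mutation, two elites) are hypothetical choices.

```python
import math
import random


def rss(X, y, cols):
    """Residual sum of squares of least squares on an intercept plus columns cols."""
    p = len(cols) + 1
    A = [[1.0] + [row[c] for c in cols] for row in X]
    # Augmented normal equations [A^T A | A^T y].
    M = [[sum(a[r] * a[c] for a in A) for c in range(p)]
         + [sum(a[r] * yi for a, yi in zip(A, y))] for r in range(p)]
    for k in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, p):
            f = M[r][k] / M[k][k]
            M[r] = [mr - f * mk for mr, mk in zip(M[r], M[k])]
    beta = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        tail = sum(M[k][c] * beta[c] for c in range(k + 1, p))
        beta[k] = (M[k][p] - tail) / M[k][k]
    return sum((yi - sum(b * v for b, v in zip(beta, a))) ** 2
               for a, yi in zip(A, y))


def aic(X, y, mask):
    """Gaussian AIC up to a constant; a stand-in criterion, NOT ICOMP."""
    cols = [j for j, bit in enumerate(mask) if bit]
    n = len(y)
    return n * math.log(rss(X, y, cols) / n) + 2 * (len(cols) + 2)


def ga_select(X, y, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Search variable subsets (0/1 tuples) for the lowest criterion value."""
    rng = random.Random(seed)
    n_vars = len(X[0])

    def score(mask):
        return aic(X, y, mask)

    pop = [tuple(rng.randint(0, 1) for _ in range(n_vars))
           for _ in range(pop_size)]
    for _ in range(generations):
        nxt = sorted(pop, key=score)[:2]              # elitism: keep the two best
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=score)   # tournament selection
            p2 = min(rng.sample(pop, 3), key=score)
            child = tuple(rng.choice(pair) for pair in zip(p1, p2))  # uniform crossover
            child = tuple(b ^ (rng.random() < mutation_rate) for b in child)  # bit flips
            nxt.append(child)
        pop = nxt
    best = min(pop, key=score)
    return best, score(best)


if __name__ == "__main__":
    data_rng = random.Random(1)
    X = [[data_rng.gauss(0, 1) for _ in range(6)] for _ in range(60)]
    # Synthetic response: only variables 0 and 3 matter.
    y = [3 * r[0] - 2 * r[3] + data_rng.gauss(0, 0.5) for r in X]
    print(ga_select(X, y))
```

Swapping in a different criterion only requires replacing the `aic` function; the search loop is independent of how a subset is scored.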


Bruce Carroll, Acxiom
Mr. Carroll will explain how Acxiom used a combination of art, data, and statistics to create Personicx, a household-level segmentation/cluster system that has had considerable commercial success among banks, insurance companies, and automotive companies since its launch almost a year ago.

Mr. Carroll will explain how and why approximately 30% of households get a new cluster assignment every year, and how and why marketers are taking advantage of these changes for real-time marketing execution and communication. As well as discussing some of the traditional applications of segmentation in response optimization, cross-sell, upsell, etc., he will also show how Personicx is helping traditional data mining technologies be used for conceptualizing and visualizing a company's strategic business imperatives.


Patricia Cerrito, University of Louisville
Data and Text Mining to Investigate Health Outcomes
Data mining can be used to examine the relationship between physician practice and patient outcomes while accounting for variability in patient risk and examining variability in physician practice. For example, changes in physician prescribing patterns are easily visualized using data mining tools. Associations can be used to target interactions and/or outcomes for additional examination. Patterns of prescribing can be tracked using visualization techniques. Once the prescribing patterns are known, educational materials can be developed to support necessary changes in the prescribing patterns, change buying habits to reduce the cost of goods, and provide the patient with the most effective, cost-saving medications.

One of the more recent tools of data mining is text mining. It can be used to examine unstructured notes that are often contained within patient charts. It can also be used to examine diagnosis codes that are used for billing purposes. Another use of text mining is to compare different sources of medical information to determine the optimal sources to examine so that physicians (and consumers) can keep current with practice guidelines.

Various data mining tools will be illustrated with examples from medical data. In particular, the relationship between practice and outcome will be examined, so that best practices can be identified that will result in the best outcomes. The use of text mining will be emphasized.


Randy Collica, Hewlett-Packard
Data Mining Galore: For Business Applications
This presentation shows how one can greatly augment a business intelligence application by including textual data in data mining. At HP, knowledge of customer inputs is put to use by combining notes gathered by Call Center telesales representatives and entered into Siebel. These text notes include general comments from customers, email messages to and from our customers or prospects, etc. Together with additional data from our customer data warehouse, it is shown that distinct clusters or groups exist in the data that were hidden until now, and that SAS Data Mining Text solutions aid the process of understanding these groups. These text notes can also be used to predict the lead rating, which is the likelihood to purchase as determined by the sales representative. Other business applications, such as part number reclassification and customer survey and free-form text, will also be covered.


James Cox, SAS
The Text Mining Challenge: Mining Ill-formed and Noisy Textual Documents
An ideal situation for text mining is to have a set of documents, each split into separate files, that are between 100 and 2000 characters long, mixed case, with well-formatted sentences that have been proofread for misspellings and consistent verbiage. Alas, few document collections in the "real" world meet that specification. What do you do when any (or all) of these criteria aren't met? This presentation will focus on two such collections. The first is an insurance data collection that uses mostly short documents of 10 to 150 characters, all upper case, with terse non-grammatical descriptions. The second uses conference proceedings from the 2001 KDD conference, in which there is one PDF file that contains all the papers concatenated together. The papers vary in length from 10,000 to 100,000 characters. A variety of techniques are shown for mining both sources effectively, using SAS' Text Miner product.


Richard Hales, IBM
The SAS and IBM Mining Relationship, Enabling Enterprise Solutions
SAS and IBM are collaborating on the development of integrated data mining products that allow models created using SAS Enterprise Miner to be applied throughout the enterprise by IBM's DB2 database. This session will give an overview of the DB2 capabilities that integrate with SAS Enterprise Miner and apply the models. It will also cover additional database mining services that are made available by the collaboration. How this technology enables solutions and takes mining closer to the customer touchpoint will be shown using "real world" examples.


Will Neafsey, Ford Motor Company
Part of an ongoing effort to best serve automotive consumers is discovering new channels for "viscerally" understanding consumers. One of these channels, originating from a group chartered with incubating new business opportunities, is open-form communication ("text").

Open-form communication from consumers presents large challenges and larger benefits. The most common occurrence of open-form communication is when it is received as a response to a posed question or an invitation for comment. In many cases, companies do not have the time or resources to read and condense this material. In the instance where a question was asked, most companies will use either a human or a technology-assisted keyword search to quantify the responses. Using SAS technologies as well as a process developed at Ford, we were able to glean a broader picture of the consumer mindset than the simple literal response.

As a result of three years of work and experimentation side by side with analytics professionals at SAS, we developed an expertise that allows us to flow information from text into practical business insight to either reinforce existing consumer opportunities or give directional guidance for new opportunities.


Timothy Rey, Dow Chemical
Data Mining Education at Dow
Historically, Dow has been quite involved in various "structured" modeling technologies, including the development, education, and use of such methods as: traditional Operations Research in Supply Chain; Process Modeling and Simulation in Manufacturing, including very early adoption of Neural Networks; Nonlinear simulation, estimation, and optimization approaches in Reaction/Pharmacokinetics, 3D fluid dynamics, and traditional DOE and response surface modeling in R&D; Hierarchical Path Modeling, Conjoint, Time Series modeling, etc. in Marketing/Marketing Research and Sales; and traditional ANOVA/Linear modeling in Agricultural Science as well as Health and Environmental Sciences. Adding data mining technologies and processes to this tool kit is extremely useful. In keeping with the previous technologies, Dow has put together a 3-prong approach to help establish data mining as yet another value-added approach to solving business problems. This talk will review the approach and share learnings.


Olivia Parr-Rud, OLIVIAGroup
Unleashing the Power of Lifetime Value to Boost ROI
Predictive models for response, risk and retention have created a clear competitive advantage for many companies. However, using these models in isolation may be suboptimal. The best approach is to combine these components to go after the end goal, Lifetime Value. This presentation details the steps for building an acquisition model that optimizes lifetime value for direct mail insurance. By combining model scores and business rules, the combined measure guarantees the highest long-term profitability.


Shashi Shekhar, University of Minnesota
Mining colocation patterns from spatial datasets
The importance of spatial data mining is growing with the increasing incidence and importance of large geo-spatial datasets such as maps, repositories of remote-sensing images, and the decennial census. Applications include the M(obile)-commerce industry (location-based services), NASA (studying the climatological effects of El Nino, land-use classification and global change using satellite imagery), the National Institutes of Health (predicting the spread of disease), the National Imagery and Mapping Agency (creating high-resolution three-dimensional maps from satellite imagery), the National Institute of Justice (finding crime hot spots), and transportation agencies (detecting local instability in traffic). However, classical data mining techniques are often inadequate for spatial data mining, and different techniques need to be developed. The talk illustrates this point in the context of co-location pattern mining over spatial datasets.

Given a collection of boolean spatial features, the co-location rule discovery process finds the subsets of features whose instances are frequently located together in geographic space. For example, symbiotic plant species and predator-prey animal species are likely co-locations in Ecology datasets. The co-location rule discovery problem is different from the association rule discovery problem. Even though the boolean spatial features may be considered as item types, there is no natural notion of transactions. Transactioning spatial datasets can lead to incorrect estimation of the interest measures for many spatial co-location patterns with instances near transaction boundaries. This makes it difficult to use traditional interest measures, e.g. support, and traditional association rule mining algorithms, which are based on ideas like support based pruning and compression of transaction data.

The proposed approach formalizes the notion of co-locations using user-specified spatial neighborhoods in place of transactions. It defines new interest measures based on the neighborhoods along with a model for interpreting the co-location rules. It provides a simple, correct, and complete algorithm for mining co-location rules. In addition, it proposes to advance the development of co-location mining by addressing three basic issues, namely, scalability, ascertaining the quality of inferred patterns, and discovery of high-confidence, low-support rules.
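A neighborhood-based interest measure of the kind described above can be sketched for a pair of features. The abstract does not name its measures, so this example assumes the participation index used in the co-location literature: for each feature, the fraction of its instances with at least one instance of the other feature inside the neighborhood, with the smaller fraction taken as the index. The function name, the Euclidean distance neighborhood, and the sample points are hypothetical.

```python
import math


def participation_index(points, feat_a, feat_b, radius):
    """Participation index of the feature pair {feat_a, feat_b}.

    points is a list of (feature, x, y) tuples. Two instances are
    neighbors when their Euclidean distance is at most radius.
    """
    a_pts = [(x, y) for f, x, y in points if f == feat_a]
    b_pts = [(x, y) for f, x, y in points if f == feat_b]
    if not a_pts or not b_pts:
        return 0.0

    def near(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1]) <= radius

    # Fraction of each feature's instances that take part in the pattern.
    a_ratio = sum(any(near(p, q) for q in b_pts) for p in a_pts) / len(a_pts)
    b_ratio = sum(any(near(q, p) for p in a_pts) for q in b_pts) / len(b_pts)
    return min(a_ratio, b_ratio)


if __name__ == "__main__":
    # Hypothetical ecology-style data: two plant sightings, two insect sightings.
    sightings = [
        ("plant", 0.0, 0.0),
        ("plant", 10.0, 10.0),
        ("insect", 0.0, 1.0),
        ("insect", 0.5, 0.0),
    ]
    print(participation_index(sightings, "plant", "insect", radius=1.5))
```

Because neighbors are found directly from distances, no instance is lost at an artificial transaction boundary, which is the problem with transactioning that the abstract points out.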


David Speights
Predictive Models in Mortgage Banking
Statistical models are widely used in mortgage banking for a variety of applications. An introduction to mortgage banking is provided along with an overview of several problems which can be addressed with predictive models. Particular attention is given to the interaction of mortgage banking with the economy and the various modeling techniques which can address these issues. Some common simple modeling approaches are presented along with some more advanced modeling techniques.


Andreas Weigend, Amazon.com
Analyzing Customer Behavior at Amazon.com
Amazon.com, perhaps the world's largest laboratory to study human behavior and decision making, uses data, measurement and modeling as central ingredients in its business decisions.

The first part of the talk gives an overview of the different kinds of data available at Amazon.com, emphasizing that data mining needs to drive actions such as emails, coupons, and recommendations of products, stores, or site features. The scope of the actions ranges from the individual customer, through pre-computed customer segments, to the entire customer base.

The second part presents joint work with Bruce D'Ambrosio (Cleverset, Inc.) on relational probabilistic models for customer behavior, both for discovering static customer attributes, and for dynamically predicting the intention of the customer and the outcome of a session.

The third part outlines current research problems, such as modeling and eventually influencing the long-term behavior of customers. In addition to the importance of machine learning, it shows the central role that principles of behavioral economics, judgment, and decision making play in computational marketing.




Copyright © 2003 SAS Institute Inc. All Rights Reserved.

 
