Abstracts
This page is updated weekly. Please check back frequently for the latest information. Abstracts are listed in alphabetical order by the speaker's last name.
New Applications of Data Mining in the Retail Industry
Subir Bandyopadhyay, Ranjan Kini, Indiana University Northwest
In the last decade, data mining has emerged as an important business intelligence tool in many facets of the retail industry, including customer segmentation, market basket analysis, customer churn, and trend analysis. There are, however, many other emerging applications of data mining. For example, retailers are trying to predict consumer trends and behaviors to gain competitive advantage. Similarly, many retailers are exploring how to mine for patterns of seemingly unrelated products that are frequently purchased together.
While the enabling technology may offer a plethora of opportunities, the retailers must decide what applications hold promise for increased profitability. Based on the feedback (obtained through one-on-one and focus group interviews) from a sample of retail managers, we will try to project future applications of data mining in the retail sector.
Successfully Implementing Predictive Analytics in Direct Marketing
John Blackwell, The Nature Conservancy (TNC)
An analysis of the benefits of modeling in direct marketing as well as a review of the dangers associated with common data mining mistakes. The presentation will review specific examples of how SAS data mining products have been successfully utilized at one of the world's largest non-profit organizations to dramatically improve ROI.
Specifically, this session will focus on
- measuring the true value of predictive models
- preparing data for modeling and avoiding pitfalls
- building models and selecting the best for deployment
- the importance of industry expertise and questioning what the data are "saying"
- assessing the readiness of your organization in building analytics capacity in-house.
Pattern Discovery - or - How to Find a Needle in a Haystack
Richard Bolton, KnowledgeBase Marketing
The collation of large electronic databases of scientific and commercial information has led to a dramatic growth of interest in methods for discovering structures in such databases. We can characterize two different kinds of structures sought in large data sets: models and patterns. The first of these, models, are high level, global descriptive summaries of data sets. Models are often used in marketing, for example, for predicting response or conversion for a population of interest.
In contrast, patterns are local descriptive structures, that is, 'local models,' which may involve just a few points or variables. Patterns are by nature hard to find, but it is these patterns that contain the nuggets of information that many owners of these large datasets are looking for today. 'Tell us something we don't know' is the cry. We describe some simple exploratory pattern discovery techniques with reference to marketing case studies.
To Collect or Not To Collect: How AT&T Determines Which Delinquent Accounts are Worth Pursuing and Which are Not
Andy Christian, AT&T
Every day, thousands of consumers in every industry are late in paying their bills. However, not everyone is a risk for non-payment, so what's the answer? With SAS's tools, AT&T Southeast has produced all of its analytical models within the Credit & Collections arena to determine this answer. Come see how AT&T uses Clustering, Decision Trees, Logistic Regression, and Linear Optimization to determine which of these delinquent accounts are risky and which are worth the company's resources to pursue.
Is Your Model Good Enough?
Michael Conerly, Winston Choi, & J. Michael Hardin, University of Alabama
The use of Goodness of Fit tests for Logistic Regression models based on Hosmer and Lemeshow's (2000) results is widespread. For data analysis purposes, running this test to check a model's adequacy is routine, and the test is widely available in software packages such as SAS, JMP, Minitab, and others. Extending this test procedure to other predictive models is straightforward. Applying this test to holdout data enables one to check the validity of a predictive model based on decision trees or neural networks or other analyses. The key to these tests is assigning the observations to groups. The authors will discuss different methods to accomplish this and show examples of the ensuing test results.
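As a rough illustration of the grouping idea, the following Python sketch computes a Hosmer-Lemeshow style statistic on holdout data. It assumes equal-size groups formed by sorting on the predicted probability, which is only one of several possible grouping methods and not necessarily the one the authors discuss. The function name and the sample data are hypothetical.

```python
def hosmer_lemeshow(y_true, p_pred, groups=10):
    """Return (statistic, table) for a Hosmer-Lemeshow style calibration check.

    Observations are sorted by predicted probability and cut into `groups`
    bins of (nearly) equal size; each table row is (n, observed, expected).
    """
    # Stable sort on the predicted probability only, so ties keep input order.
    pairs = sorted(zip(p_pred, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    table = []
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue  # fewer observations than groups
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)   # events actually seen in the group
        expected = sum(p for p, _ in chunk)   # events the model predicts
        mean_p = expected / n_g
        variance = n_g * mean_p * (1.0 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
        table.append((n_g, observed, expected))
    return statistic, table


# Hypothetical holdout outcomes and predicted probabilities from any model
# (logistic regression, decision tree, neural network, ...).
holdout_y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
holdout_p = [0.2] * 5 + [0.8] * 5
statistic, table = hosmer_lemeshow(holdout_y, holdout_p, groups=2)
```

The statistic is then compared with a chi-square reference distribution. The appropriate degrees of freedom depend on whether the data were also used to fit the model, so the sketch leaves that step out.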
Data Analytics based Business decisions in the software industry
Sambuddha Deb, Wipro
Data Analytics forms the backbone of scientific decision making. However, in the software industry, issues ranging from data roll up to dependencies on intangible or immeasurable parameters pose challenges to quantified decision making. Moreover, focus is frequently on process and product data analysis rather than on end to end models that can drive business decisions. Wipro has applied substantial analytical rigor to meaningfully interpret the data generated by Quality systems and associated business processes. It has developed various models linking customer satisfaction, delivery and operational excellence metrics, process compliance, team experience, growth rates and operating margins. This session will demonstrate the usage of these models for planning and decision making and the approach to keep them constantly validated using the organization's data.
Takeaways
- Demonstration of the interplay of software engineering and management processes through quantitative relationships
- Application of Analytical Techniques and their interpretation in the context of the business
- Process Improvement Strategy through performance management
- Identifying vital drivers for enhancing productivity and customer satisfaction
Cluster analysis for customer segmentations
Filip Deforce and Jörg Besier, Accenture
Cluster analysis is a technique that is used to group customers together in segments. In this presentation we will describe how cluster analysis can be used to find actionable results for your business.
Cluster analysis is a part of a process where we ultimately want to develop different value propositions for different segments in the customer base. This process starts with understanding the value drivers. A next step is to translate these value drivers into key performance indicators. Defining these key performance indicators will enable one to focus on dimensions that are important when segmenting the customer base.
In the cluster analysis algorithm that we will describe we will address different important issues that can occur when segmenting the customer base. A first problem that we will address is how we can make the segments distinct. That will enable you to develop distinct value propositions for different segments. A second issue that will be tackled is how we can make the segments big enough so that it makes sense from an economic point of view to develop different value propositions for every segment. A last point that we will address is how we build customer segmentations so that customers within a segment are similar enough. This enables the company to deliver a consistent message to their customers.
A last section will deal with the possibility to develop different segmentations for different teams in the company. These teams are responsible for executing (parts of) the value propositions that are developed for the different segments. More than one segmentation is sometimes important to deliver a consistent, unambiguous and understood message to the different stakeholders that will act upon the information.
Analysis of Research Literature with Text Mining
Dr. Dursun Delen, Oklahoma State University
Text mining is a semi-automated process of extracting knowledge from a large amount of unstructured data. Given that the amount of unstructured data being generated and stored is increasing rapidly, the need for automated means to process it is also increasing. In this presentation we explain and evaluate the process and the methods used to perform text mining on collections of unstructured information. A case study is presented using text mining to identify clusters and trends of related research topics from three major journals in the management information systems field. Based on the interesting findings of this case study, it is proposed that this type of semi-automated literature analysis could potentially be invaluable for researchers in any field.
Generalized Additive Neural Network Modeling
André de Waal, North-West University, Potchefstroom, South Africa
A generalized additive neural network is a special type of neural network that can be used to estimate a generalized additive model. The basic topology for a generalized additive neural network utilizes a separate multilayer perceptron for each input variable. This presentation will describe several approaches that can be followed to arrive at a workable implementation. In particular, more detail will be given of the approach that resulted in the AutoGANN modeling node in SAS Enterprise Miner. Another aspect that will be addressed is the use of generalized additive neural networks in credit scoring.
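The topology described above can be sketched in Python as a forward pass: one small perceptron per input variable, with the per-input outputs summed into a single additive predictor. The single hidden layer, the tanh activation, the logistic link, and all names and weights are illustrative assumptions. This is not the AutoGANN implementation.

```python
import math


def univariate_mlp(x, hidden_weights, hidden_biases, output_weights):
    """One-hidden-layer perceptron mapping a single input to its partial effect."""
    hidden = [math.tanh(w * x + b) for w, b in zip(hidden_weights, hidden_biases)]
    return sum(v * h for v, h in zip(output_weights, hidden))


def gann_predict(inputs, subnetworks, bias):
    """Sum the per-input partial effects, add a bias, and apply a logistic link.

    `subnetworks` holds one (hidden_weights, hidden_biases, output_weights)
    tuple per input variable; the logistic link is an assumption made here
    for a binary target.
    """
    eta = bias + sum(univariate_mlp(x, *net) for x, net in zip(inputs, subnetworks))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical weights for a model with two inputs and two hidden units each.
subnetworks = [
    ([1.0, -0.5], [0.0, 0.1], [0.8, 0.3]),
    ([0.7, 0.2], [-0.2, 0.0], [-0.4, 0.6]),
]
probability = gann_predict([0.5, -1.2], subnetworks, bias=0.1)
```

Because each input has its own subnetwork, the partial effect of a variable can be plotted on its own, which is what makes the fitted model additive and interpretable.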
Best Practices in Credit Risk Model Monitoring
Dina Duhon, Canadian Imperial Bank of Commerce and Wayne Thompson, SAS
In the highly regulated banking environment, credit risk model development and monitoring are at the forefront of risk management. Now more than ever, with the implementation of Basel II, models are being put under the microscope. Paragraph 417 of the Basel II Accord states that "The burden is on the bank to satisfy its supervisor that a model or procedure has good predictive power and that regulatory capital requirements will not be distorted as a result of its use. The bank must have a regular cycle of model validation that includes monitoring of model performance and stability; review of model relationships; and testing of model outputs against outcomes." The presentation will review specific examples of how SAS Enterprise Miner and SAS Model Manager have been successfully utilized at one of the largest banks in Canada to achieve transparency and auditability of credit risk models. This session will focus on best practices in Credit Scorecard Monitoring. Many of these best practices are also applicable to validating and monitoring predictive models in other industries.
Designing effective and efficient retention and acquisition strategies for retailers
Amit Ghosh, Cleveland State University
Faced with the problem of optimizing sales and prices, retailers are increasingly abandoning time-honored rules such as applying a fixed percentage markup onto their cost and adopting decision-support systems under the rubric of "merchandise price and promotion optimization software" to make prompt, effective and efficient decisions.
In this session, some potential drawbacks of these systems are discussed primarily from the perspective of designing successful acquisition and retention strategies. We explore whether data-mining techniques and Customer Relationship Management (CRM) principles can be used by retailers of staple merchandise (such as consumer packaged goods) to further enhance their financial performance and market domination. While CRM principles have been widely used in many other industries, they have seldom been utilized by retailers of Consumer Packaged Goods (CPG). We systematically lay out how consumer panel data (that is being increasingly collected by retailers) can be used to design efficient and effective retention as well as acquisition strategies. Scanner-based consumer panel data obtained from Information Resources Incorporated (one of the major providers of consumer panel data in the US) is used to empirically demonstrate the advantages of a CRM approach over a traditional approach. Purchase information gathered for several panel members over two years in one CPG product category -- laundry detergents -- is used in this empirical exercise. The strategic consequences of this approach and the likely impact of our approach on a firm's ROI are also discussed.
Applying Diverse Analytics to Improve Automotive Collections
Gene Grabowski, Ford Motor Credit Company
In the highly competitive automotive industry, monthly sales figures are scrutinized to evaluate market status and provide signals for future performance. To augment sales, automakers rely on financial subsidiaries to promote vehicle loans and leases. If a growing number of customers fail to meet their contractual obligations, the benefit of robust sales can be seriously eroded. Consequently, the need to limit delinquencies and contract defaults has grown substantially.
This session will provide an overview of the analytical methods Ford Credit uses to manage their Collection Operations. Moreover, it will demonstrate how diverse analytical methods can be integrated to provide concrete business solutions. Topics include use of Predictive Modeling, Seasonal Analysis, Time Series Forecasting and Optimization techniques. At each point, the discussion will emphasize how SAS plays a crucial role in incorporating disparate data sources and meeting significant analytical challenges.
Fraud!
David Hand, Imperial College London
Data mining provides a powerful set of tools for detecting and preventing fraud. I look at the application of such tools in a range of situations, including the personal banking sector.
Text Mining of Clinical Healthcare Records: The UAB/UA/SAS Partnership
J. Michael Hardin, Ph.D., Uzma Raja, Ph.D., Timothy Day, Ph.D. - University of Alabama; Jerry Oglesby, Ph.D. - SAS Institute
In August 2005, the University of Alabama and the University of Alabama at Birmingham's (UAB) Health System entered into a partnership with the SAS Institute to investigate the use of text mining of clinical data. The Health System Information Services department manages the Electronic Medical Record, which includes many examples of free text transcribed documents. This presentation will review the history of the partnership, why UAB Health System is a suitable partner for such an effort, the challenges encountered (some of which are unique to healthcare), and the results obtained to date. Suggestions on future directions and research will be discussed.
Analytics: The Next Competitive Advantage
Hari S. Hariharan, Accenture
In recent years there has been an exponential growth in the availability of information within organizations. One major reason for this growth has been the large investments that companies have made in infrastructure and tools. However, whether the increased availability of information has improved the quality of decision making within organizations is still debatable. The first part of my presentation will review some of the common reasons why this happens even in large organizations.
The second part of the presentation will review several case studies where insight driven initiatives have had a dramatic impact on corporate performance. These case studies will cover companies in Financial Services, Telecom and Retail and will also cover a range of functional areas like Marketing, Enterprise Performance and Strategy. The case studies will also illustrate that insight can be driven by simple as well as complex analytical methods.
The third and concluding part of the presentation will make a case as to why it is critical for companies in today's competitive environment to be insight driven. The presentation will review some of the key factors that organizations need to have in place to successfully foster insight driven decision making and also suggest a road map for organizations to use analytics as their competitive advantage of choice.
Data Mining - Find your data diamond and improve ROI
Maria Marsala Herlihy, KnowledgeBase Marketing, Inc.
This session examines several case studies where data mining techniques were used to uncover valuable information hidden in large volumes of data that resulted in increased return on investment. These case studies cover practical and common business applications in healthcare, call center, retail, catalog and telecommunications. The audience will walk away with a clear list of examples of how analytical mining efforts can and should always drive actionable findings that tie to sometimes known and sometimes unknown business issues.
Building a Direct Marketing Machine
Brad Jordan, Blue Cross Blue Shield-Florida
Blue Cross Blue Shield of Florida (BCBSFL) has a long history of providing health-related solutions to the people of Florida. They use database marketing, direct marketing and analytics to better manage their member acquisition, cross-sell and lifecycle communications. In this presentation, Mr. Jordan will discuss how BCBSFL has used a combination of predictive modeling and new campaign management tools to dramatically improve their direct marketing capabilities.
Using home price index to forecast default rates for real estate loans
Chifei Juang, HSBC
Between 1999 and 2004, high home price appreciation led to low default rates for real estate loans. This high appreciation trend, however, slowed down later in 2005. For some MSAs, the home price declined during 2006 and 2007. This presents a challenge to many real estate lenders that use historical data to set up underwriting policy. In this session, we will discuss how lenders can use a forward looking pricing index at the MSA level to enhance their forecast for the future default rate. Having an accurate forecast of the default rate will help lenders set appropriate loss reserves, manage underwriting criteria, and develop pricing strategies to manage profitability.
Consumer-centric Healthcare Informatics
David Kil, Accenture Technology Labs
Today's payers are too fixated on cost prediction. It is not too surprising to see member stratification criteria focused on predicted cost and ad hoc backward-looking clinical triggers. In this talk, we introduce the integrated health management framework with consumer experience optimization. The key step here is in understanding what works and what doesn't from consumer-engagement and outcomes perspectives. We explain several experiments to probe into how we can minimize consumer irritability while extracting their cooperation in improving their health and wellness. Furthermore, we will explore how we can insert contagious engagement into what has traditionally been dreary, boring, and no-pain-no-gain lifestyle management.
Data Mining Approaches to Stopping Warranty Fraud
Jay King, SAS
Over 400 years ago, Sir Edward Coke - Lord Chief Justice - complained about increasing levels of fraud. His comments are just as applicable today. The problem is that, while we have evolved better techniques, the fraudsters have been evolving as well. Today, manufacturers overpay approximately two billion dollars per year (6-15% of overall warranty costs) on questionable claims. A mere 1% reduction in costs through fraud detection and prevention could result in a 10% increase in profit margin. This presentation will discuss data mining approaches used to detect anomalous behavior of service providers used by a large US warranty provider.
Using Analytics to Manage Enrollment at Sinclair Community College
Karl Konsdorf, Sinclair Community College
Community colleges facing increased enrollment challenges are looking for ways to improve the overall educational experience for their students. This presentation will outline Sinclair's Strategic Enrollment Management (SEM) environment and how analytics are used to create a more positive experience, and the impacts of the analytics on the college. The people, processes, and technology used in building the SEM architecture will be discussed during this session.
Leveraging Superior Marketing Tools for Building a Forward-looking CRM Strategy
V. Kumar (VK), ING Center for Financial Services, School of Business, University of Connecticut
It is no secret that firms treat customers differentially. While one customer spends ten minutes on the phone navigating through automated menu options, only to find out that there are no seats available on that evening flight to Chicago, a "Gold" customer whose phone call gets picked up on the second ring by a friendly service representative is offered a window seat on that same flight. How do firms decide which customers should receive preferential treatment that clearly costs more money and resources, which customers they should interact with through inexpensive channels like the Internet or the touch-tone phone, and which customers to let go? How do firms decide the timing of an offering to a customer? What kind of sales and service resources should the firm allocate to conduct future business with that customer? What metric should a firm use as the basis for its acquisition and retention strategies? Should firms focus on investing in brands or investing in building customer relationships? Should firms encourage customers to shop in multiple channels or is there a fear of cannibalization? Finally, can a firm identify customers who are likely to quit buying from it and take preventive action to retain them?
To answer the above questions, firms develop a measure of what they consider to be the best indicator of the total profits that a customer is likely to provide the firm and use that indicator as the basis for their marketing decisions. This measure is called the "Customer Lifetime Value" (CLV). We present several case studies here and also look at the organizational and implementation challenges that surround the adoption of customer value management by both B2B and B2C firms. The successful implementation of a CLV-based approach in these firms has resulted in increased revenues, profit and ROI.
Analytics That Matter
Daymond Ling, Canadian Imperial Bank of Commerce
Business Intelligence is a hot topic today, covering a wide gamut of activities such as reporting, OLAP, analytics, dashboards and data mining. However, like all tools and technologies, merely using BI on existing processes may not deliver the home run that one is hoping for. Organizations that want to compete on analytics need to use the new capabilities wisely in order to create value and outcompete in their marketplace.
This presentation will touch on three subjects. First, how to identify the right set of problems to be solved, thus creating value on a larger scale for the business. Second, how to create value for oneself by acquiring the right skill sets to outcompete in the BI human capital marketplace. And lastly, the environment under which both can happen.
Association Rules Revisited, Or How I Learned to Love Them
Gordon Linoff, Data Miners, Inc.
Association rules, used for determining which products are purchased together, are an important part of understanding customer purchase patterns. Unfortunately, the traditional methods for calculating association rules often fail to produce useful, insightful rules. This talk presents some alternative methods for calculating the rules, going beyond the historical measures of lift, support, and confidence. In addition, it discusses additional applications of association rules, using heterogeneous associations, negative rules, and product properties.
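For readers unfamiliar with the three historical measures named above, here is a minimal Python sketch of how support, confidence, and lift are conventionally computed for a single rule. It shows only the traditional calculation and none of the alternative methods the talk covers. The function name and the basket data are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))          # baskets with antecedent
    n_c = sum(1 for t in transactions if c <= set(t))          # baskets with consequent
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))   # baskets with both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"bread", "milk"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
```

A lift below 1, as in this toy data, means the two products appear together less often than independence would predict.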
Keynote Panel Discussion: Making Modeling Relevant - Integrating Modeling Insights with Strategy and Management Decision Making
Moderator: Sharat Mathur, ACG Solutions
Panelists: Murli Buluswar, Farmers Insurance; Ramin Eivaz, Kimberly-Clark
Over the past decade, technology has driven significant innovations in data availability and, in turn, allowed companies to conduct more sophisticated data mining, modeling and other analytics. The primary focus of this and other academic/industry conferences is to discuss these advancements and share best practices and leading ideas with other colleagues. An equally important topic that often does not get as much attention is how to better integrate analytics and insights with management decision making. Many companies have recognized this and are working to make this happen. This roundtable will focus on these issues. A team of senior executives from across different industries, with a deep understanding of both analytics and strategy, will talk about lessons learned and best practices for "Making Modeling Relevant - Integrating Modeling Insights with Strategy and Management Decision Making."
The discussion will focus on four central themes:
- To what extent are major corporations effectively integrating advanced analytics with management decision making? Is this done on an ad hoc or systematic basis?
- What are the issues and challenges they face in doing so?
- What could they be doing differently? What are the critical success factors and best practices?
- How could analysts and statisticians be more proactive in terms of ensuring that their research/analytics has relevance?
Audio Analysis in Action
Manya Mayes, SAS
Do you voice record customer communications? What if you could capture, transcribe and analyze large amounts of audio data and pinpoint exactly what customers are telling you? Learn how text and data mining techniques such as topic detection, segmentation, profiling, and predictive modeling can be applied to audio signals to detect and predict churn. This presentation will walk you through the process, starting with voice-to-text technology via NICE Systems and analysis with SAS Text Miner. With these solutions, there's no reason to wonder what's on your customers' minds.
Collaborative Filtering Recommendation in E-business
Ting Millette, ETS
Collaborative filtering is valuable in e-commerce and for direct recommendations of music, movies, news, etc. Collaborative filtering systems are prediction algorithms over sparse data sets of user preferences. This paper describes a technique for making personalized recommendations from any type of database to a user based on similarities between the interest profile of that user and those of other users. In particular, we discuss the implementation of a collaborative filtering system, which makes personalized recommendations for restaurants in a city.
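The idea of recommending from the profiles of similar users can be sketched in Python as follows. This is a generic user-based scheme with cosine similarity, chosen here for illustration. The abstract does not say which similarity measure or weighting the author uses, and the restaurant names and ratings are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating profiles (dict: item -> rating)."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    own = ratings[user]
    weighted_sum, weight_total = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(own, their)
        if sim <= 0:
            continue  # ignore users with no overlap
        for item, rating in their.items():
            if item in own:
                continue  # only score unseen items
            weighted_sum[item] = weighted_sum.get(item, 0.0) + sim * rating
            weight_total[item] = weight_total.get(item, 0.0) + sim
    scores = {item: weighted_sum[item] / weight_total[item] for item in weighted_sum}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical restaurant ratings on a 1-5 scale.
ratings = {
    "ann": {"thai_palace": 5, "burger_barn": 1, "sushi_go": 4, "taco_stop": 2},
    "bob": {"thai_palace": 5, "burger_barn": 1, "sushi_go": 5, "noodle_house": 5},
    "cat": {"thai_palace": 1, "burger_barn": 5, "taco_stop": 5},
    "dan": {"thai_palace": 4, "sushi_go": 4},
}
suggestions = recommend("dan", ratings)
```

In this toy data "dan" resembles "ann" and "bob" far more than "cat", so restaurants liked by the first two dominate the ranking.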
Consumer Segmentation in the Automotive Industry
Will Neafsey, Ford Motor Company
Every day, today's automobile manufacturers strive to satisfy and delight new vehicle buyers. Thirty years ago, most manufacturers simply built a couple of styles of cars and trucks. Today, in the rapidly fragmenting market of cars, trucks, SUVs, crossovers and minivans, there are close to 400 distinct nameplates sold in the U.S. to roughly 14 million retail customers. The ongoing challenge for designers and marketers is to find a robust analytic process and structure to identify and organize the varied consumers into manageable, understandable groups. In today's market, failure to understand the nuances of different groups of consumers can be a recipe for disaster.
That's where consumer segmentation comes in. Segmentation is the first step in the process of targeting and positioning your products to achieve your corporate goals. Techniques range from basic approaches, like dividing the market by demographics, to complex latent class analysis (LCA) that rationalizes scores of variables that uniquely identify various groups of consumers who share common values, experiences, tastes, preferences and lifestyles. Whether you choose a simple scheme or a complex scheme, a good segmentation will consolidate and simplify the voice of the customer.
This session will focus on both the theory and practice of consumer segmentation in the automotive industry.
Predictive Modeling for Workers' Compensation: Right Claims, Right Resources
James J. Paugh, III, Deloitte Consulting LLP
Predictive modeling is a process that utilizes a number of mathematical techniques to analyze large quantities of internal and external data to unlock unknown and meaningful business relationships. For workers' compensation claims management, claim triage models can be developed that segment and identify prospective high cost claimants. The claim triage model can also be used for enhanced case reserving and fraud indications, and can improve the overall indemnity outcome when effectively implemented. Some examples are
- ensure claims management is focused on those cases that will have the biggest impact
- filter out small and catastrophic injuries; focus on moderate severity types of claims - back strains & sprains
- analyze historical claimant cost patterns as compared to industry patterns
- provide reason codes to explain scores - medically complex, fraudulent lifestyle, etc.
- enhance the claim business rules to focus more on future severity drivers, based upon the model's underlying assessment
- begin scoring at the initial point of contact, using the First Notice of Injury
- possible secondary use for litigation management.
Data Preparation for Data Mining in Health Care Using SAS
S. Greg Potts, Arkansas Foundation for Medical Care
Data mining practitioners are well aware that most of the total effort required to complete a data mining project is not spent in the "trendier" aspects of the project such as problem definition or algorithm/technique selection, application, and interpretation of the results. No, most time - up to 80% often cited - is spent "in the trenches" getting to know the data - summarizing data to the "unit of analysis" and creating derived variables to be used as targets and inputs in the modeling analysis.
This presentation will present two case studies in using SAS to extract and prepare data for data mining. The first case will explore how to prepare transactional (Medicaid claims) data for directed data mining, where the goal is to explain or predict the value(s) of a particular target variable. The second case will explore data preparation for undirected data mining (cluster analysis) using hospital-level data supplied to Medicare Quality Improvement Organizations (QIOs) in their and other collaborative organizations' efforts to reduce improper admissions and payments to inpatient acute-care hospitals under the Hospital Payment Monitoring Program (HPMP).
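The step of summarizing transactional records to the "unit of analysis" and creating derived variables can be illustrated with a short sketch. The presentation itself uses SAS; the Python below is only a stand-in, and the field names (`member_id`, `paid`) and the high-cost threshold are hypothetical.

```python
from collections import defaultdict


def roll_up_claims(claims, high_cost_threshold=1000.0):
    """Summarize claim-level rows to one row per member with derived variables."""
    grouped = defaultdict(list)
    for claim in claims:
        grouped[claim["member_id"]].append(claim["paid"])

    summary = {}
    for member_id, amounts in grouped.items():
        total = sum(amounts)
        summary[member_id] = {
            "n_claims": len(amounts),                   # derived input
            "total_paid": total,                        # derived input
            "mean_paid": total / len(amounts),          # derived input
            "max_paid": max(amounts),                   # derived input
            "high_cost": total > high_cost_threshold,   # candidate target flag
        }
    return summary


# Hypothetical transactional data: one row per claim.
claims = [
    {"member_id": "A", "paid": 100.0},
    {"member_id": "A", "paid": 1300.0},
    {"member_id": "B", "paid": 50.0},
]
members = roll_up_claims(claims)
```

After the roll-up, each member is a single observation, which is the shape most modeling procedures expect.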
An Overview of Reduced Error Logistic Regression
Daniel M. Rice, Rice Analytics
Reduced Error Logistic Regression is a new form of Logistic Regression that has been developed and implemented in SAS for data mining applications. With a large enough number of variables, Reduced Error Logistic Regression returns solutions with significantly less error than existing standard Logistic Regression approaches. For example, with upwards of 25 multicollinear independent variables, a Reduced Error Logistic Regression solution with a sample size of 100 can have the same or even better reliability and validity than a standard maximum likelihood Logistic Regression solution with a sample size of 10,000 or more. Unlike standard Logistic Regression or Hierarchical Bayes Logistic Regression, Reduced Error Logistic Regression is not significantly limited by the number, multicollinearity, and nonlinearity of independent variables and/or interactions. Because of this ability to handle the dimensionality problem, Reduced Error Logistic Regression would appear to have distinct advantages over existing Logistic Regression approaches. This talk will show examples of Reduced Error Logistic Regression data mining solutions. This talk will suggest that Reduced Error Logistic Regression could have widespread artificial intelligence applications that are well beyond the current applications of Logistic Regression.
Forecasting Box Office Success of Movies using Data Mining Techniques: Recent Results
Ramesh Sharda, Oklahoma State University
We have been experimenting with several data mining approaches including artificial neural networks and support vector machines to forecast the box office success of Hollywood movies. In our model, we convert the problem of predicting box office success into a multi-class classification problem. That is, we attempt to classify a movie project into one of nine classes ranging from "super flop" to "blockbuster". Our previous experiments (using data from 1998-2002) have shown that neural networks can be used to predict a movie's success classification within one category with roughly 70% accuracy. In this presentation, we present an update on the models' performance using more recent data and other classification algorithms such as C5, Support Vector Machines, Boosted Trees, and Random Forest. We also describe a Web-based Decision Support System that is now nearly operational to provide these predictions to users via a distributed architecture based on Web services.
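The "within one category" measure mentioned above can be made concrete with a small Python sketch. It assumes the nine classes are coded as ordered integers 1 ("super flop") through 9 ("blockbuster"), which is an assumption about the encoding. The sample labels are invented and unrelated to the authors' results.

```python
def exact_and_within_one_accuracy(actual, predicted):
    """Return (exact accuracy, within-one-category accuracy) for ordinal class labels."""
    n = len(actual)
    exact = sum(1 for a, p in zip(actual, predicted) if a == p) / n
    within_one = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= 1) / n
    return exact, within_one


# Hypothetical class labels on the 1-9 scale.
actual_classes = [1, 5, 9, 4]
predicted_classes = [1, 6, 7, 4]
exact, within_one = exact_and_within_one_accuracy(actual_classes, predicted_classes)
```

Within-one accuracy is always at least as high as exact accuracy, since it also credits predictions that miss by a single adjacent class.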
Model Validity Checks For Regression And Logistic Regression
Simon Sheather and Mike Speed, Texas A&M University
An important step in any model building exercise is checking the validity of the model under consideration. In this talk we will consider methods for checking the validity of regression and logistic regression models. Techniques to be discussed include marginal model plots, in which a smooth of the fitted values against each predictor is compared with a smooth of the outcome variable against each predictor. We will show that it is important to take into account whether the outcome variable is continuous or discrete, as is the case in logistic regression. Real data sets, analyzed using SAS data mining software, will be used to illustrate these methods.
Textual Analysis of Stock Market Prediction Using Financial News Articles
Dr. Robert P. Schumaker, Iona College
This paper examines the problem of discrete stock price prediction using a synthesis of linguistic, financial and statistical techniques to create the Arizona Financial Text System, AZFinText. We approach this line of research using textual representation and statistical machine learning methods on financial news articles that are partitioned by similar industry and sector groupings. Through our research, we discovered that stocks partitioned by Sectors were most predictable on three measures: Closeness (a Mean Squared Error (MSE) score of 0.1954), predicted Directional Accuracy (71.18%), and Simulated Trading return (8.50%, compared to 5.62% for the S&P 500 index). In direct comparisons to existing market experts and quantitative mutual funds, our system's trading return of 8.50% outperformed well-known trading experts. Our system also performed well against the top 10 quantitative mutual funds of 2005, where our system would have placed fifth. When comparing AZFinText against only those quantitative funds that monitor the same securities, AZFinText had a 2% higher return than the best performing quant fund.
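Two of the evaluation measures named above, Closeness (MSE) and Directional Accuracy, can be sketched as follows. The abstract does not give AZFinText's exact definitions, so this assumes the common ones: MSE between predicted and actual prices, and the share of cases where the predicted move from a starting price has the same sign as the actual move. All prices are made-up illustration values.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted prices."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def directional_accuracy(start, actual, predicted):
    """Share of cases where predicted and actual moves from the start price agree in sign."""
    def sign(x):
        return (x > 0) - (x < 0)

    hits = sum(
        1
        for s, a, p in zip(start, actual, predicted)
        if sign(a - s) == sign(p - s)
    )
    return hits / len(start)


# Hypothetical prices at article release (start) and at the prediction horizon
start = [10.0, 20.0, 30.0, 40.0]
actual = [11.0, 19.0, 31.0, 40.5]
predicted = [10.5, 20.5, 30.5, 41.0]

mse = mean_squared_error(actual, predicted)
da = directional_accuracy(start, actual, predicted)
print(mse, da)
```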
Building Successful Analytics, Step by Step
Drew Thoeni, Blue Cross Blue Shield-Florida
Few organizations can sustain the investments required to build a successful analytic program without producing almost immediate returns. Mr. Thoeni will give you real-world examples of how to diagnose where your organization is on the analytic continuum and how to move to the next levels of success by adding value in every step.
Utilizing Text Mining Techniques to Identify Fall Related Injuries
Dr. Monica Chiarini Tremblay, Florida International University
This study focuses on investigating the computerized medical record, including textual progress notes, using data and text mining techniques to examine patient fall-related injuries (FRIs) in the Veterans Administration (VA) ambulatory care setting. Traditionally, health services researchers and policy makers have relied on data documented in administrative databases to study these adverse events. Unfortunately, the structure of the data (typically designed for reimbursement purposes), the validity of the data (codes assigned correctly), and the reliability of the data (consistent application of coding practices) limit the use of these administrative data for research purposes. Recognizing patterns in electronic medical records can facilitate understanding the frequency and nature of fall related injuries for the implementation of prevention programs at the VA. Relevant data from administrative sources and the electronic medical records for all veterans treated for injuries is staged into a database.
Information is then grouped to approximate an "episode of care" in order to determine which records are to be combined to accurately illustrate a treatment cycle for a patient's injury. A GUI serves as a chart review tool to construct the "gold standard." Two approaches are taken: unsupervised learning and supervised learning. Unsupervised learning investigates clusters created from the terms (using an entropy weighting scheme) extracted from the electronic medical records to assess the potential predictive power of these textual descriptions in identifying FRIs. The supervised learning approach attempts to form improved clusters by using a dataset that includes class labels for FRI and non-FRI cases, based on an information gain weighting scheme.
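As an illustration of entropy term weighting, the sketch below computes the commonly used log-entropy global weight, which is 1 for a term concentrated in a single document and 0 for a term spread evenly across all documents. The abstract does not state which formula was used, so this variant is an assumption, and the terms and counts are hypothetical.

```python
import math


def entropy_weights(term_doc_counts, n_docs):
    """Global entropy weight per term: 1 + sum_j p_ij * log2(p_ij) / log2(n_docs)."""
    if n_docs < 2:
        raise ValueError("entropy weighting needs at least two documents")
    weights = {}
    for term, counts in term_doc_counts.items():
        total = sum(counts)
        if total == 0:
            weights[term] = 0.0  # term never occurs, so it carries no information
            continue
        entropy_sum = 0.0
        for f in counts:
            if f > 0:
                p = f / total  # share of the term's occurrences found in this document
                entropy_sum += p * math.log2(p)
        weights[term] = 1.0 + entropy_sum / math.log2(n_docs)
    return weights


# Hypothetical term frequencies across four progress notes
counts = {
    "fell": [3, 0, 0, 0],     # concentrated in one note
    "patient": [2, 2, 2, 2],  # spread evenly over all notes
    "hip": [1, 1, 0, 0],
}
weights = entropy_weights(counts, n_docs=4)
print(weights)
```

Terms with higher weights are the ones that discriminate between documents, which is why such a scheme is a natural input to clustering.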
Building Analytical Capability in a Global Environment
Sandeep Tyagi, Inductis, an EXL Company
The analytical needs of US firms are growing rapidly, but the supply of capable data analysts is not keeping pace. High-value-add business data analytics requires workers with a combination of business savvy, analytical skills, and knowledge of analytics technology. Increasingly, companies are looking to regions such as Eastern Europe, India, and China, which in addition to offering labor cost arbitrage are also producing engineering and math graduates at a rate comparable to that of the Western world.
In this talk we will discuss how companies are using global delivery models to create analytical capabilities today, as well as the models that are emerging for the future. Key elements of an analytics delivery center that will be covered today can be divided into the following main areas:
- People: What kind of talent can you hire for your analytics team? How will you find and attract the necessary talent?
- Privacy and security: How to deal with data security and customer privacy issues when your analytical team is global?
- Training and skill development: How does one build the capabilities in products and processes in a group of people on the other side of the world?
- Technology: How is technology deployed to make the team productive in a secure environment?
- Engagement model: How will your offshore analytics team interact with your onshore team?
- Commercial model: When to think of a captive center versus a partner?
Applying Mathematical Optimization to Marketing and Risk Strategies in Consumer Financial Services
Robin Way, SAS
Consumer-oriented financial services firms, such as credit card issuers and retail banks, are starting to deploy mathematical optimization techniques to further improve the performance of marketing and risk strategies. Results suggest these techniques can deliver a significant and measurable increase in value when compared with business-as-usual approaches. Mathematical optimization, which maximizes the expected utility of a marketing or risk strategy while satisfying constraints and contact policies, is a compelling next-step-forward for decision scientists who are already taking advantage of predictive model scores and test-and-learn contact designs as part of their decisioning process. We will present the results of several case studies, and best practices emerging from these studies, including predictive modeling and scoring for incremental response, experimental designs for promotional test cells, and strategies to balance short-run with medium-run return on marketing and risk strategies.
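The idea of maximizing expected value subject to constraints and contact policies can be shown with a toy sketch. It is not the method used in the case studies: it simply enumerates every assignment for three hypothetical customers and two hypothetical offers, assuming a contact policy of at most one offer per customer and a total contact budget. Real problems of this kind are normally handed to a linear or integer programming solver.

```python
from itertools import product

# Hypothetical expected value of each offer for each customer (e.g. from model scores)
expected_value = {
    "cust_a": {"card": 40.0, "loan": 55.0},
    "cust_b": {"card": 30.0, "loan": 20.0},
    "cust_c": {"card": 25.0, "loan": 60.0},
}
offer_cost = {"card": 5.0, "loan": 12.0}  # hypothetical cost per contact
budget = 20.0                             # hypothetical total contact budget


def best_assignment(expected_value, offer_cost, budget):
    """Brute-force search: at most one offer per customer, total cost within budget."""
    customers = sorted(expected_value)
    options = [None] + sorted(offer_cost)  # None means "do not contact"
    best_plan, best_total = None, float("-inf")
    for choice in product(options, repeat=len(customers)):
        cost = sum(offer_cost[o] for o in choice if o is not None)
        if cost > budget:
            continue  # violates the budget constraint
        total = sum(
            expected_value[c][o] for c, o in zip(customers, choice) if o is not None
        )
        if total > best_total:
            best_plan, best_total = dict(zip(customers, choice)), total
    return best_plan, best_total


plan, total = best_assignment(expected_value, offer_cost, budget)
print(plan, total)
```

Note that the best plan here leaves one customer uncontacted: sending every customer the offer with the highest individual score would exceed the budget.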
In Search of Stability in Neural Network Modeling
Jeff Zeanah, Z Solutions, Inc.
One of the concerns of modeling with Artificial Neural Networks is a lack of stability in the models. Most training algorithms start with random weight assignments. Therefore, changing the random seed can produce different results from the same training data. Extreme variation in the model results can lead to uncertainty in the models or, if the model is deployed, to substandard results.
This presentation evaluates the Neural Network procedures in the SAS Enterprise Miner product (with results transferable to other products) to develop recommendations on training techniques, options, and variable selection that decrease model instability and increase confidence in the use of the techniques.
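The seed sensitivity described above can be reproduced with a small sketch. This is a pure-Python toy network trained on the XOR problem, not the SAS Enterprise Miner procedure; the architecture (two hidden units), learning rate, and epoch count are arbitrary assumptions chosen only to keep the example short.

```python
import math
import random

# XOR training data: ((x1, x2), target)
XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(seed, hidden=2, epochs=2000, lr=0.5):
    """Train a tiny one-hidden-layer network from seed-dependent weights; return final MSE."""
    rng = random.Random(seed)
    # Each hidden unit holds [weight_x1, weight_x2, bias]; the output unit holds hidden weights + bias
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
        y = sigmoid(sum(w_o[j] * h[j] for j in range(hidden)) + w_o[hidden])
        return h, y

    for _ in range(epochs):
        for x, t in XOR:
            h, y = forward(x)
            d_out = (y - t) * y * (1 - y)  # squared-error gradient at the output pre-activation
            for j in range(hidden):
                # Backpropagate through the output weight before updating it
                d_h = d_out * w_o[j] * h[j] * (1 - h[j])
                w_o[j] -= lr * d_out * h[j]
                w_h[j][0] -= lr * d_h * x[0]
                w_h[j][1] -= lr * d_h * x[1]
                w_h[j][2] -= lr * d_h
            w_o[hidden] -= lr * d_out

    return sum((forward(x)[1] - t) ** 2 for x, t in XOR) / len(XOR)


# Same data and settings, different seeds: compare the spread of final errors
losses = {seed: train_xor(seed) for seed in range(5)}
spread = max(losses.values()) - min(losses.values())
print(losses, spread)
```

Fixing the seed makes a single run reproducible, but it does not remove the underlying sensitivity; comparing results across several seeds, as above, is one simple way to gauge it.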

