  • Twitter #analytics2014


Session Abstracts

Media and Communications

Weather Forecasts: Deterministic, Probabilistic or Both?

Gregory Fishel, Chief Meteorologist, WRAL-TV

The use of probabilities in weather forecasts has always been problematic, in that there are as many interpretations of how to use probabilistic forecasts as there are interpreters. However, deterministic forecasts carry their own set of baggage, in that they often overpromise and underdeliver when it comes to forecast information critical for planning by various users. In recent years, an ensemble approach to weather forecasting has been adopted by many in the field of meteorology. This presentation will explore the various ways in which this ensemble technique is being used, and discuss the pros and cons of an across-the-board implementation of this forecast philosophy with the general public.
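The following minimal Python sketch shows how one ensemble can yield both a deterministic and a probabilistic forecast. It assumes a hypothetical 10-member ensemble of 24-hour rainfall totals and a 0.01-inch threshold for "measurable rain". Operational ensembles are larger, and their probabilities are usually calibrated before release.

```python
from statistics import mean

def summarize_ensemble(members, threshold):
    """Collapse hypothetical ensemble member forecasts into two products.

    members: one forecast value per ensemble member (e.g., 24-hour rainfall, inches).
    threshold: event definition, e.g., measurable rain = more than 0.01 inch.
    """
    deterministic = mean(members)  # single-value forecast: the ensemble mean
    # Probabilistic forecast: fraction of members that exceed the threshold
    probability = sum(1 for m in members if m > threshold) / len(members)
    return deterministic, probability

# Hypothetical 10-member ensemble of 24-hour rainfall totals (inches)
members = [0.0, 0.0, 0.0, 0.02, 0.05, 0.0, 0.4, 0.0, 0.1, 0.0]
point, pop = summarize_ensemble(members, threshold=0.01)
print(f"Ensemble mean: {point:.3f} in; chance of measurable rain: {pop:.0%}")
```

Here the ensemble mean (about 0.06 inch) reads as a nearly dry deterministic forecast, while 4 of the 10 members produce measurable rain, a 40 percent chance. The choice between these two messages, or presenting both, is the kind of trade-off the session weighs.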
Level: Appropriate for all levels of knowledge and experience


The Growing Role of M-Learning in the Analytics Revolution: Application to Management Education

Owen Hall, Jr., Full Professor of Decision Sciences, Pepperdine University

Management education has come a long way since Sir Isaac Pitman initiated the first correspondence course in the early 1840s. Today the business school universe is under growing pressure to engage in significant reforms due to the impacts of globalization, new learning technologies, changing demographics, and unprecedented economic uncertainty. The increasing use of analytics in business and government to improve efficiency and performance suggests that similar possibilities exist for schools of business. The rapid growth of m-learning technologies has ushered in a new era in learning opportunities for management education. As a result of the m-learning paradigm, the three pillars of a traditional graduate management education – fixed time, fixed location and fixed learning pace – are being replaced with a more flexible and customized learning environment. This new instructional paradigm provides the vehicle for expanding the use of analytics throughout the community of practice. The purpose of this presentation is to highlight best practices and trends in the instruction and use of analytics in management education via m-learning.
Level: Intermediate

Financial Services, Banking

CRM: Balancing the Speed, Relevance, and Frequency of Your Customer Conversation

Landon Starr, Director of Analytics, American Express

CRM is a broad topic with a variety of perspectives. Almost all recognize the importance of big data as it relates to the topic. However, big data, fast execution, and communication volume aren't the only considerations when thinking about managing your relationship with your customer. In our presentation, we will talk about how analytics can play a role in balancing speed of execution with the importance of establishing a relevant and results-oriented customer conversation.
Level: Appropriate for all levels of knowledge and experience

Model Risk: Evaluating the Performance of GARCH-based VaR Models Through Backtesting

Kostas Kyriakoulis, Professor of Financial Analytics, Institute for Advanced Analytics

Banks and financial institutions utilize value at risk (VaR) models for both regulatory and internal capital calculations. VaR is the loss that we are fairly sure (e.g., 99 percent) will not be exceeded if the current portfolio is held over some period of time (e.g., one day). Regulatory supervisors allow the use of internal VaR models only if they provide satisfactory results in backtesting, a technique that measures the out-of-sample performance of the model. The purpose of backtesting is to address two main questions:
  1. Is the coverage of my model what it should be? In other words, if my VaR estimate is set at the 99 percent confidence level, do out-of-sample losses exceed it (exceedances) only about 1 percent of the time?
  2. Even if coverage is as expected, are the exceedances independent?
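The two questions correspond to the standard likelihood-ratio backtests: Kupiec's proportion-of-failures (unconditional coverage) test and Christoffersen's independence test. The Python sketch below assumes those two tests, since the abstract says only "likelihood-ratio type tests". It takes a hypothetical 0/1 series marking the days on which the loss exceeded the VaR. Each statistic is compared with a chi-square distribution with one degree of freedom (about 3.841 at the 5 percent significance level).

```python
import math

CHI2_1DF_5PCT = 3.841  # 5 percent critical value, chi-square with 1 degree of freedom

def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def kupiec_pof(hits, p):
    """Unconditional coverage LR statistic. hits: 0/1 exceedance flags; p: 1 - confidence."""
    t, x = len(hits), sum(hits)
    phat = x / t
    log_null = _xlogy(t - x, 1 - p) + _xlogy(x, p)
    log_alt = _xlogy(t - x, 1 - phat) + _xlogy(x, phat)
    return -2.0 * (log_null - log_alt)

def christoffersen_ind(hits):
    """Independence LR statistic from first-order transition counts of the hit series."""
    n = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for prev, cur in zip(hits, hits[1:]):
        n[(prev, cur)] += 1
    n00, n01, n10, n11 = n[(0, 0)], n[(0, 1)], n[(1, 0)], n[(1, 1)]
    pi01 = n01 / (n00 + n01) if n00 + n01 else 0.0  # P(hit | no hit yesterday)
    pi11 = n11 / (n10 + n11) if n10 + n11 else 0.0  # P(hit | hit yesterday)
    pi = (n01 + n11) / (n00 + n01 + n10 + n11)      # P(hit), ignoring yesterday
    log_null = _xlogy(n00 + n10, 1 - pi) + _xlogy(n01 + n11, pi)
    log_alt = (_xlogy(n00, 1 - pi01) + _xlogy(n01, pi01)
               + _xlogy(n10, 1 - pi11) + _xlogy(n11, pi11))
    return -2.0 * (log_null - log_alt)

# Hypothetical 1,000-day backtest of a 99 percent VaR, both with 10 exceedances
spread_out = ([0] * 99 + [1]) * 10
clustered = [0] * 990 + [1] * 10
for name, hits in (("spread out", spread_out), ("clustered", clustered)):
    uc, ind = kupiec_pof(hits, 0.01), christoffersen_ind(hits)
    print(f"{name}: LR_uc={uc:.2f}, LR_ind={ind:.2f}, "
          f"reject independence: {ind > CHI2_1DF_5PCT}")
```

Both hypothetical series have exactly 1 percent exceedances, so both pass the coverage test. Only the clustered series fails the independence test, which is why coverage alone is not enough.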
The purpose of this presentation is to show how an analyst can use SAS to backtest some of the most popular VaR models, such as the stationary and asymmetric GARCH, with Student's t or normally distributed residuals. The analysis is based on actual daily returns of the S&P 500 over the last 12 years. Backtesting consists of the following steps:
  • Estimation of more than 5,000 distinct models.
  • For each one of these 5,000 models, we simulate 10,000 returns for a one-day-ahead horizon.
  • Using the 10,000 simulated returns, the VaR is calculated at the 90, 95 and 99 percent confidence levels.
  • Coverage and independence tests of the estimated VaR values are performed using the appropriate statistical measures (likelihood-ratio-type tests).
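For a single model, the simulation and VaR steps can be sketched in Python as follows. This is not the SAS workflow the session demonstrates. The sketch assumes the GARCH parameters have already been estimated; the values below are hypothetical and were not fitted to S&P 500 data. It uses the GJR specification as one example of an asymmetric GARCH and reads each VaR off the empirical quantile of 10,000 simulated one-day returns.

```python
import math
import random

def gjr_garch_variance(omega, alpha, beta, gamma, last_return, last_variance):
    """One-step-ahead conditional variance.

    gamma = 0 gives the symmetric GARCH(1,1); gamma > 0 adds the GJR-style
    asymmetry in which negative returns raise next-day variance more.
    """
    leverage = gamma * last_return ** 2 if last_return < 0 else 0.0
    return omega + alpha * last_return ** 2 + leverage + beta * last_variance

def standardized_t(rng, nu):
    """Student's t draw rescaled to unit variance (requires nu > 2)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return z / math.sqrt(chi2 / nu) * math.sqrt((nu - 2.0) / nu)

def simulated_var(next_variance, n_sims=10_000, levels=(0.90, 0.95, 0.99),
                  mu=0.0, nu=None, seed=42):
    """VaR (reported as a positive loss) from simulated one-day-ahead returns."""
    rng = random.Random(seed)
    sigma = math.sqrt(next_variance)
    sims = sorted(
        mu + sigma * (standardized_t(rng, nu) if nu is not None else rng.gauss(0.0, 1.0))
        for _ in range(n_sims)
    )
    var = {}
    for c in levels:
        k = min(int(round((1.0 - c) * n_sims)), n_sims - 1)
        var[c] = -sims[k]  # loss at the (1 - c) lower quantile of returns
    return var

# Hypothetical parameters and state (not estimates from real data)
h = gjr_garch_variance(omega=1e-6, alpha=0.05, beta=0.90, gamma=0.08,
                       last_return=-0.015, last_variance=1.2e-4)
print(simulated_var(h))          # normal residuals
print(simulated_var(h, nu=6))    # Student's t residuals
```

For a one-day horizon, the quantile is also available in closed form. Simulation is shown because it mirrors the steps above and extends to longer horizons. Repeating the calculation for each estimated model and each out-of-sample day yields the VaR series that the coverage and independence tests evaluate.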
Level: Intermediate

The 'Supremacy' Econometrics Deliver When Modeling the Purest Form of Losses on Retail Credit Risk: Start to Finish

David Gumpert-Hersh, Vice President, Credit Risk and Econometrics, Wescom Credit Union

This interactive discussion will carry attendees from the initial formation of the performance drivers for a firm's specific portfolio to a monthly forecast five years out. By identifying the significant predictive factors in each economic index, both macro and micro, national and local, the analyst can derive an overall portfolio "expansion or recession" outlook. The audience will be walked through the steps used to model the purest retail credit risk question: Will the borrower pay as agreed? Building on this output, the next steps are to build an Allowance for Loan and Lease Losses (ALLL) model and a troubled debt restructuring (TDR) forecast. These models have since been leveraged to increase originations by over 50 percent year over year and to reduce losses by 85 percent since 2010, by pinpointing the firm's risk appetite and fully utilizing it.
Level: Intermediate

Using Big Data to Enhance Risk Measurement

Hans Helbekkmo, Expert Principal, Risk Practice, McKinsey & Company

Advances in modeling technology have made it possible to use "big data" to enhance risk measurement. Banks in particular are increasingly using different types of data to improve consumer credit scoring. These include data from alliances (such as supermarkets, cell phone providers and utility providers), transaction data and, in some cases, "unstructured" data, e.g., semantic analysis of news feeds for business customers. In this presentation we will share our perspectives and selected examples of recent advances in this area, including the use of supermarket purchase patterns and cell phone data to improve credit scoring for banks.
Level: Appropriate for all levels of knowledge and experience

Health Care

A Novel Model for Predicting Future Health Risk and Cost Stratification at a Member Level

Sandy Chiu, Program Manager, Clinical Analytics, Humana

Hamed Zahedi, Program Manager, Humana

More than one-sixth of the US economy is devoted to health care spending, which is estimated at $2.7 trillion annually, and continues to rise.* The Centers for Disease Control and Prevention (CDC) reports that each year, chronic diseases cause 7 in 10 deaths and account for about 75 percent of medical care costs**. This presentation will discuss a health severity score model used to predict a member's future health care costs. It allows for early identification of high-risk members and effective population stratification for personalized health care services. A customized model is built for each member segment, and the segmentation criteria can include line of business, member tenure, conditions and expected changes in utilization. Each customized model comprises a set of learning algorithms that find the most predictive linear and non-linear patterns between each member's risk factors and future health care costs. The model is currently used to refer high-risk members to clinical programs and historical results indicate positive ROI for these programs.

* America's Health Insurance Plans (AHIP)
** Centers for Disease Control and Prevention (CDC)

Level: Appropriate for all levels of knowledge and experience

A Novel Predictive Model for Identifying Members at High Risk of Falling

Harpreet Singh, Metrician, Humana

Falls are a major health risk for the elderly. One in three older adults falls at least once a year, and falls are the leading cause of both fatal and non-fatal injuries in this population. According to estimates by the Centers for Disease Control and Prevention (CDC), the direct and indirect cost of falls is expected to reach $54.9 billion by 2020. Both patients and health care payers, then, have an interest in reducing the incidence of falls. However, this is a difficult task. Unlike medical conditions such as hypertension, a fall is not a single condition with a set definition. Incomplete documentation and limited use of falls diagnostic codes in medical insurance claims make identifying and analyzing falls even more challenging. We developed a novel and comprehensive predictive model to identify people at risk of falling and to estimate their future likelihood of falling. The current falls predictive model has an ROC index of 0.763 with a 76 percent improvement in prediction rate for members in the top 1 percent highest-risk group, compared to a baseline rate of 20 percent.
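The "ROC index" is the area under the receiver operating characteristic curve, also called the c statistic. It is the probability that a randomly chosen member who fell receives a higher risk score than a randomly chosen member who did not. A value of 0.5 means the model does no better than chance. Below is a minimal Python sketch of that calculation, using hypothetical scores and outcomes. It is not the presenters' code.

```python
def roc_index(scores, outcomes):
    """Area under the ROC curve by comparing every (fell, did-not-fall) pair.

    scores: predicted risk scores; outcomes: 1 if the member fell, else 0.
    Ties count as half a correct ordering. This O(n^2) loop is for clarity;
    rank-based formulas give the same value faster on large data.
    """
    fell = [s for s, y in zip(scores, outcomes) if y == 1]
    no_fall = [s for s, y in zip(scores, outcomes) if y == 0]
    if not fell or not no_fall:
        raise ValueError("need at least one member in each outcome group")
    concordant = 0.0
    for f in fell:
        for n in no_fall:
            if f > n:
                concordant += 1.0
            elif f == n:
                concordant += 0.5
    return concordant / (len(fell) * len(no_fall))

# Hypothetical risk scores for six members and whether each later fell
scores = [0.91, 0.40, 0.75, 0.22, 0.63, 0.10]
fell_flags = [1, 0, 1, 0, 0, 1]
print(round(roc_index(scores, fell_flags), 3))
```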
Level: Appropriate for all levels of knowledge and experience

HIRA's Big Data Utilization and Disclosure

Logyoung Kim, Research Fellow, Health Insurance Review and Assessment Service, South Korea

The Health Insurance Review and Assessment Service (HIRA) supports the four pillars of the government's Government 3.0 initiative, "The New Government Operation Paradigm": opening, sharing, communication and collaboration. HIRA provides services, such as an information support center and IT infrastructure, through which big data will be used to boost the health care industry and create new jobs. Using SAS software, HIRA is developing new services that will inform policymaking. These services will forecast health care service costs for disease groups, provide information crucial for efficient health care management, offer information about pharmaceutical products and support a monitoring system for treatment trends. By making the most of health care big data, HIRA is building an ecosystem in which the private sector helps strengthen quality and manage efficiency within the health care system.
Level: Appropriate for all levels of knowledge and experience

Risk Adjustment Analytics: Measuring Illness Burden to Optimize Health Plan Payments

Richard Lieberman, Principal Data Scientist and Technical Architect, PeakAnalytics Software

The Affordable Care Act (ACA) requires risk adjustment in health insurance products purchased by working-age people. Risk adjustment dramatically alters commonly used risk segmentation strategies and will create strong incentives to identify sicker-than-average members. These members represent very high premium revenues for issuers. With appropriate case management, their health status may improve, and they may contribute significant surpluses to the issuers' bottom line. This presentation will focus on:
  • Understanding how risk adjustment models identify and quantify member illness burden.
  • How health insurance issuers need to incorporate risk adjustment into their care delivery systems.
  • Similarities and differences of risk adjustment approaches in different insurance programs.
  • Predicting how issuer and member behavior will change because of the ACA's reliance on risk adjustment.
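Risk adjustment models such as the HHS-HCC model used in the ACA markets broadly quantify illness burden additively. A member's risk score is a demographic factor plus a factor for each diagnosed condition category, and scores are then compared with the market average. The Python sketch below illustrates only that additive idea. Every factor value and category name in it is hypothetical and is not taken from any published model.

```python
# Hypothetical relative factors for illustration only; real models publish their own tables
DEMOGRAPHIC_FACTORS = {("F", "45-49"): 0.40, ("M", "45-49"): 0.35, ("F", "21-24"): 0.15}
CONDITION_FACTORS = {"diabetes": 0.35, "heart_failure": 1.10, "asthma": 0.30}

def risk_score(sex, age_band, conditions):
    """Additive score: demographic factor plus one factor per distinct condition category."""
    score = DEMOGRAPHIC_FACTORS[(sex, age_band)]
    score += sum(CONDITION_FACTORS[c] for c in set(conditions))
    return score

def relative_scores(members):
    """Scores divided by the group average, so 1.0 means average illness burden."""
    raw = [risk_score(*m) for m in members]
    average = sum(raw) / len(raw)
    return [r / average for r in raw]

members = [("F", "45-49", ["diabetes", "heart_failure"]),
           ("M", "45-49", ["asthma"]),
           ("F", "21-24", [])]
print([round(s, 2) for s in relative_scores(members)])
```

Because payments follow relative scores, identifying and documenting a member's conditions changes what an issuer receives. That is the incentive the abstract refers to.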
Level: Appropriate for all levels of knowledge and experience

Visual Analytics in Health Care to Promote Accountable Care

Jo Porter, Deputy Director, Institute for Health Policy and Practice

Health care in the United States is undergoing significant change. Ensuring that the change is appropriate requires data-driven decision making. Using All-Payer Claims Database data, the New Hampshire Accountable Care project provides regional and state-level reports on a variety of cost, utilization and quality measures. The reports include dynamic querying capability and visualization, which will be demonstrated during the presentation.
Level: Appropriate for all levels of knowledge and experience


Consumer Analytics Faster, Higher, Stronger: The Consumer DNA Factory

Raphael Cailloux, Head of Marketing Intelligence and Brand Consumer Analytics, Adidas

When working on consumer analytics applications such as predictive modeling, most analysts devote a lot of time to peripheral tasks such as data preparation (before the analysis) and insight dissemination (after the analysis). These tasks keep data scientists from focusing on actual value-adding insight creation. They also pose many challenges for the quality and timeliness of those insights, including knowledge management. Adidas has tackled this issue with a project called Consumer DNA. The project's goal is to provide the right information in the right context and the right format to various analytical applications, such as predictive modeling, campaign analysis or even targeting (selection) processes. The Consumer DNA (CDNA) Factory is now at the very core of Adidas marketing intelligence and dramatically enhances its capabilities. This presentation will reveal the magic behind the components, functions and (simple) technologies at work in the CDNA Factory. It will also connect the factory to other parts of the analytical framework and show how the framework can be systematically optimized as a whole.
Level: Intermediate


Just Say NO To OLS

Zubin Dowlaty, Head of Innovation and Development, Mu Sigma

Ordinary least squares (OLS) regression, together with its generalized linear model (GLM) extensions, is the default statistical method and, in many cases, the only method used by data scientists. Analysts spend considerable time creating a model-ready data set and then run their chosen advanced analytics technique(s). Newer recommendations stress exploring assumptions and performance by running a portfolio of techniques. Insights from this process can then be used to improve the base statistical model. Big data platforms provide a scalable computation environment, enabling a data scientist to run dozens of methods rapidly and benefit from the diversity of techniques. Ultimately, practitioners are seeking more robust business analytics models they can trust, which the ensemble approach affords. Statistics, machine learning and econometrics are the core academic disciplines in the decision scientist's toolbox. For applied business analytics, Dowlaty will share how these three disciplines can work in harmony to improve your core analytical model.
Level: Intermediate

Making the Most of Your Analytical Talent

Patrick Maher, Senior Manager, SAS Global Professional Services, SAS

We will discuss five strategic considerations for managing an analytics team. These include people, culture, communication, return on investment and best practices. Points covered will include, but are not limited to: hiring, analytical skills creation, communication strategies, value creation, project approach and lessons learned - all taken from tens of thousands of hours of project work.
Level: Appropriate for all levels of knowledge and experience

Text Analytics and Topic-Attribute Association

Nick Evangelopoulos, Associate Professor of Decision Sciences, University of North Texas

In big data analytics projects, analysts are frequently faced with the challenge of demonstrating the value of text analytics to senior executives. The unstructured nature of text data often obscures the valuable information that lies within. Text analytics software solutions such as SAS Text Miner include topic extraction in their toolset. But after topics are extracted, how can you bring your analysis to the next level? In this discussion, we present topic-attribute association analysis and visualize the results with topic-attribute maps. Treatment of text data with these tools takes advantage of explicit or implicit attributes that often remain under-exploited. Bringing the topic-attribute relationship to the fore can be critical in making the business case for text analytics. Which products or services are viewed by your customers as "great" and which ones as not so great? Which product or service failure categories are most prominent in one geographical region, and which ones in another region? In what ways are your services discussed differently in different social media? The presentation goes over a number of case studies that identify and visualize topic-attribute relationships.
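One simple way to quantify a topic-attribute association is to cross-tabulate documents by extracted topic and by an attribute, such as region. Each observed count is then compared with the count expected if topic and attribute were independent. Ratios above 1 mark over-represented combinations. The Python sketch below uses this generic measure as an assumption for illustration; it is not necessarily the method presented in the session, and the documents are hypothetical.

```python
from collections import Counter

def topic_attribute_ratios(docs):
    """Observed / expected document counts for each (topic, attribute) pair.

    docs: one (topic, attribute) pair per document, e.g., an extracted topic
    plus a metadata field such as region. Expected counts assume independence.
    """
    n = len(docs)
    joint = Counter(docs)
    topic_totals = Counter(topic for topic, _ in docs)
    attribute_totals = Counter(attr for _, attr in docs)
    return {
        (topic, attr): count / (topic_totals[topic] * attribute_totals[attr] / n)
        for (topic, attr), count in joint.items()
    }

# Hypothetical complaint documents tagged with an extracted topic and a region
docs = ([("billing", "West")] * 6 + [("billing", "East")] * 2
        + [("outage", "West")] * 2 + [("outage", "East")] * 6)
for pair, ratio in sorted(topic_attribute_ratios(docs).items()):
    print(pair, round(ratio, 2))
```

In this toy data, billing complaints are over-represented in the West (ratio 1.5) and outage complaints in the East. Pairs that never occur are absent from the result and can be treated as a ratio of 0.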
Level: Intermediate

Non-industry specific

Do You Know or Do You Think You Know? Building a Testing Culture at State Farm

Andy Pulkstenis, Director of Analytics, State Farm Insurance

Experimental design is becoming more common in business settings, in both strategic and tactical arenas. Many firms are recognizing its broad applications, including initial product design, marketing, logistics, customer retention, profitability analysis, strategy optimization and website design, to name just a few. Surprisingly, many still resist fully integrating designed experiments into their strategies. As with any analytic change, driving an organization to adopt strategic testing can be difficult. The presenter will share how State Farm has been moving from minimal business experimentation to a more ambitious culture of testing. He will also provide a step-by-step guide for changing, creating or improving a testing culture. He will share several State Farm examples, from simple to complex, that demonstrate the value of multivariate testing. He will also share progress on incorporating test results into strategy optimization, as well as the current state of State Farm's constantly evolving internal testing landscape. The presentation will provide tangible takeaways for business practitioners in the audience who want to improve their own strategies through designed experiments. It should benefit both experienced experimenters and newcomers who want to champion testing in their companies but are unsure how to get started.
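Multivariate tests like these are often run as factorial designs. Every combination of factor levels gets its own test cell, and each factor's main effect is the difference between its average response at the high and low levels. The Python sketch below assumes a two-level full factorial with hypothetical factors and response rates. It is a generic illustration, not State Farm's testing approach.

```python
from itertools import product

def full_factorial(factors):
    """Every combination of low (-1) and high (+1) levels: 2**k test cells."""
    return [dict(zip(factors, levels))
            for levels in product((-1, 1), repeat=len(factors))]

def main_effects(cells, responses):
    """Per factor: mean response at its high level minus mean at its low level."""
    effects = {}
    for factor in cells[0]:
        high = [r for cell, r in zip(cells, responses) if cell[factor] == 1]
        low = [r for cell, r in zip(cells, responses) if cell[factor] == -1]
        effects[factor] = sum(high) / len(high) - sum(low) / len(low)
    return effects

# Hypothetical mailing test: three two-level factors, one response rate per cell
cells = full_factorial(["offer", "channel", "message"])
responses = [0.031, 0.030, 0.036, 0.034, 0.045, 0.044, 0.052, 0.049]
print({f: round(e, 5) for f, e in main_effects(cells, responses).items()})
```

Because every cell is tested, each main effect is estimated from all eight cells at once. Interaction effects can be estimated the same way from products of the level codes.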
Level: Appropriate for all levels of knowledge and experience

Follow the Boolean Rule: How to Create Rules-Based Profiles

Julia Marshall, Lead Text Analytics Specialist, Bridgeborn Inc.

Machine-learned, statistical text mining methods are good at defining distinct, non-overlapping topics. But what if the differences between topics are more nuanced than a "bag of words" approach can decipher? SAS allows users to build rules using Boolean logic that capture the author's intended meaning. These rules add the contextual subtleties that are often necessary to categorize documents precisely, even in categories with significant overlap. This talk will present a real-world case study of how profiles for USAID's Development Experience Clearinghouse were developed using the power of Boolean rules. The presentation will cover how to analyze the language within documents and translate it into rules that mimic how a human would categorize content. For both novice and experienced analysts, this practical session will also illustrate how to build and test Boolean rule logic, as well as some basic but necessary procedures for maintaining rules.
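The idea behind a Boolean category rule can be shown with a tiny evaluator that tests a document against nested AND/OR/NOT conditions over terms. This Python sketch covers the logic only. It does not reproduce SAS rule syntax, and the category and terms are hypothetical.

```python
import re

def tokenize(text):
    """Lowercased set of alphabetic tokens in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(rule, tokens):
    """Evaluate a rule: a bare term, or ("AND" | "OR" | "NOT", sub-rules...)."""
    if isinstance(rule, str):
        return rule in tokens
    op, *args = rule
    if op == "AND":
        return all(matches(r, tokens) for r in args)
    if op == "OR":
        return any(matches(r, tokens) for r in args)
    if op == "NOT":
        (inner,) = args
        return not matches(inner, tokens)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical "water and sanitation" rule: a water term plus a sanitation term,
# excluding documents about flood relief
WATER_SANITATION = ("AND",
                    ("OR", "water", "wells"),
                    ("OR", "sanitation", "latrines", "hygiene"),
                    ("NOT", ("OR", "flood", "flooding")))

for doc in ["Community wells and latrines improved hygiene.",
            "Flood response delivered water and hygiene kits."]:
    print(matches(WATER_SANITATION, tokenize(doc)), doc)
```

The NOT clause is what a bag-of-words classifier has no direct way to express. The second document shares most of its vocabulary with the first but is excluded because of its context.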
Level: Appropriate for all levels of knowledge and experience

Modeling Donations, Costs and Counts Using Two-Part Models

Laura Ring Kapitula, Assistant Professor, Grand Valley State University

There are many situations where an outcome of interest has a large number of zeros and a group of non-zero outcomes that are discrete or highly skewed. One example is health care costs: in a given time period, many patients will have zero costs, and among those who incur costs, most costs will be relatively small but some will be extremely high. Another example is the prediction of charitable donations. Many potential donors may give nothing, and among those who give, donations are often very positively skewed, with a few very large donors. When modeling counts, there are also times when there are more zeros than standard methodology would predict, and a zero-inflated model may be appropriate. If data have such a structure and ordinary least squares methods are used, predictions and estimates may be very inaccurate. The two-part model is an analytics method for such situations. In a two-part model, we first model the probability of a non-zero outcome using a generalized linear model. Then, conditional on a positive outcome, the outcome is modeled using another generalized linear model with the same or different covariates. The presentation will demonstrate fitting two-part models for independent data using SAS PROC GENMOD and for dependent data using SAS PROC GLIMMIX. The fitted models will be used for prediction and estimation.
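The way the two parts combine into a prediction can be sketched in Python for the simplest case, a single categorical covariate. With one indicator per group, the maximum-likelihood fits of a logistic GLM (part 1) and a gamma GLM with log link (part 2) reduce to the group's proportion of positive outcomes and the group's mean positive outcome. The sketch uses that shortcut instead of a general GLM fitter. The groups and donation amounts are hypothetical.

```python
from collections import defaultdict

def fit_two_part(groups, outcomes):
    """Return {group: (P(y > 0), E[y | y > 0])} for a single categorical covariate."""
    n = defaultdict(int)
    n_pos = defaultdict(int)
    pos_sum = defaultdict(float)
    for g, y in zip(groups, outcomes):
        n[g] += 1
        if y > 0:
            n_pos[g] += 1
            pos_sum[g] += y
    model = {}
    for g in n:
        p = n_pos[g] / n[g]                                # part 1: probability of any outcome
        m = pos_sum[g] / n_pos[g] if n_pos[g] else 0.0     # part 2: mean of positive outcomes
        model[g] = (p, m)
    return model

def predict_mean(model, g):
    """Unconditional mean combines both parts: E[y] = P(y > 0) * E[y | y > 0]."""
    p, m = model[g]
    return p * m

# Hypothetical donation amounts for two donor segments
groups = ["A"] * 4 + ["B"] * 4
donations = [0, 0, 100, 300, 0, 50, 50, 200]
model = fit_two_part(groups, donations)
for g in ("A", "B"):
    p, m = model[g]
    print(g, f"P(give)={p:.2f}", f"mean gift if given={m:.0f}", f"expected={predict_mean(model, g):.0f}")
```

With a single categorical covariate, the combined prediction equals the raw group mean. The two-part structure pays off with continuous covariates. It also reports the two parts separately: the chance of any donation and the typical gift size.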
Level: Intermediate

Unstructured Data Analysis: Real-World Applications and Business Case Studies

Goutam Chakraborty, Director of the Graduate Certificate in Business Data Mining and Professor of Marketing, Oklahoma State University

Murali Pagolu, Business Analytics Consultant, SAS

In recent years, decision makers have shown great interest in incorporating unstructured data into their decision making. Insights derived from analyzing textual information can help business managers make better decisions. Call center logs, emails, documents, web articles, blog posts, tweets, Facebook posts, YouTube video comments and customer product reviews are prominent sources of unstructured data, both internal and external. It is impossible for anyone to manually sift through and analyze large volumes of text data, let alone use the results to make strategic decisions. Instead, you need sophisticated tools, grounded in scientific methodologies, that can help detect anomalies, capture trends and analyze patterns in the data. In this presentation, we discuss how to extract, standardize, organize and analyze textual data to derive insightful customer intelligence from various data sources. Using SAS Text Analytics tools, we showcase some successful implementations of SAS Text Analytics solutions, along with other application areas and case studies. While SAS products are used as tools for demonstration only, the topics and theories covered are generic (not tool-specific).
Level: Appropriate for all levels of knowledge and experience

This list was built using SAS software.