Twitter: #analytics2014


Session Abstracts

Business Analytics and Hadoop in the Enterprise

Shaun Connolly, Vice President for Corporate Strategy, Hortonworks

Helen Fowler, Senior Director, SAS and Teradata Center of Excellence (COE), Americas

Paul Kent, SAS

Bill Wheatley, JPMorgan Chase

Join this session to explore the trends and drivers for big data analytics and Hadoop. Our panelists will discuss their big data journey from proof of concept to production. They will explore and discuss the business impact of Hadoop, share the challenges they experienced, solutions they found and highlight key lessons you can use in your organization.
Level: Appropriate for all levels of knowledge and experience

Media and Communications

Network Analysis for Business Applications

Carlos Andre Reis Pinheiro, Data Scientist, EMC

Network analysis is a major discipline in analytics that helps you foresee business events and understand customer behavior, including how people relate to each other, create communities and behave within social structures. By mapping the relationships among customers through different mechanisms such as calls, texts, messages, claims and bank transactions, companies can recognize core nodes within their networks and establish the best business actions to take. Relationships among distinct types of entities can also highlight unexpected behavior in relation to risk or fraud. This presentation will explore some practical applications where network analysis might be used to improve sales, diffuse products, avoid churn and detect fraud.
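As a minimal illustration of the "core nodes" idea (using an invented toy interaction list, not the presenter's data), degree centrality can be computed simply by counting each customer's interactions:

```python
from collections import Counter

# Hypothetical (caller, callee) interaction pairs -- stand-ins for the
# calls, texts and transactions mentioned above.
interactions = [
    ("ana", "bo"), ("ana", "cy"), ("ana", "dee"),
    ("bo", "cy"), ("dee", "ana"), ("ed", "ana"),
]

# Degree centrality: how many interactions touch each customer.
degree = Counter()
for a, b in interactions:
    degree[a] += 1
    degree[b] += 1

print(degree.most_common(2))  # most connected ("core") customers first
```

Real network analysis adds weighted edges, community detection and path-based centralities, but ranking nodes by connectedness is the starting point for the business actions the abstract describes.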
Level: Appropriate for all levels of knowledge and experience

The Journey to Multimodel Environment: A Turkcell Story From Creation to Measurement

Tamer Cagatay, Business Analytics Lead, Turkcell

Turkcell is the leading GSM operator in Turkey, and the first and only Turkish company ever to be listed on the New York Stock Exchange. Turkcell has a vision of facilitating the lives of subscribers, and this vision requires a microsegmented point of view and a high variation of GSM products. Establishing an intelligent targeting environment for numerous products has been a challenge for Turkcell for years. The need for quickly designed, but acceptably accurate and sustainable, models is answered by SAS Rapid Predictive Modeler. Selected major products were modeled using this SAS solution, and the results are satisfactory considering the time to market and the integration with Turkcell's current SAS modeling environment.

The products in the first phase play a small but leading part in Turkcell's product portfolio. For the remaining products, Turkcell decided to generate a new way of modeling experimentally. Thus, considering the dimensions of major products, a hybrid model is designed and additional propensity scores are calculated, resulting in a dimensional modeling approach.

Adding more than 30 models to its modeling portfolio created another challenge for Turkcell: monitoring and measuring model performance regularly. Although data preparation for modeling is generally thought to be the most time-consuming part of the model development life cycle, performance measurement and taking fast action in case of a decrease in accuracy should not be underestimated. SAS Model Manager helps Turkcell to measure the performance of many models regularly by generating tracking reports and alarms, automating scoring tasks, and retraining the models when accuracy is below acceptable thresholds.

Level: Appropriate for all levels of knowledge and experience

Weather Forecasts: Deterministic, Probabilistic or Both?

Gregory Fishel, Chief Meteorologist, WRAL-TV

The use of probabilities in weather forecasts has always been problematic, in that there are as many interpretations of how to use probabilistic forecasts as there are interpreters. However, deterministic forecasts carry their own set of baggage, in that they often overpromise and underdeliver when it comes to forecast information critical for planning by various users. In recent years, an ensemble approach to weather forecasting has been adopted by many in the field of meteorology. This presentation will explore the various ways in which this ensemble technique is being used, and discuss the pros and cons of an across-the-board implementation of this forecast philosophy with the general public.
Level: Appropriate for all levels of knowledge and experience

What Is Relevant? The Key Question When Working With Big Data

Carlos Lara, Senior Solutions Architect, SAS

Collecting and storing data is easier and cheaper than ever before. Organizations are looking for ways to quickly find insights from the sea of data they have. The amount of data actually relevant to any organization's mission is tiny - sometimes less than 1 percent of what it collects. The only way one can leverage big data for valuable insights is by using analytics and embedding it at the point of decision and/or in the process itself to enable more real-time decision making.

This presentation uses a big data example to show how the new SAS technologies were used to expedite the analytics life cycle. It will show how SAS Visual Analytics, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop and SAS Enterprise Miner with high-performance nodes were used with digital data to predict demographic characteristics of site visitors by addressing commonly found challenges with big data.

Level: Appropriate for all levels of knowledge and experience

Energy and Utilities

Creating Analytics-Driven Strategies in Enerjisa

Ozcan Cavus, Enerjisa

Mehmet Firat, Head of Project Management, Enerjisa

Enerjisa (part of E.ON & Sabanci Group) is one of Turkey's largest utilities companies. The company employs more than 10,000 people at locations across the country. Enerjisa is one of the biggest suppliers of electricity by volume in the country, providing electricity to more than 9 million business and residential customers.

Since the utilities market in Turkey began opening up to competition in recent years, several private corporations have made investments in the utilities industry through privatization, transforming the market into a highly competitive environment. To meet these regulatory and competitive changes in the market, Enerjisa wanted to find new ways to engage more effectively with its customers and expedite decision processes.

To make this happen, Enerjisa invested in SAS Analytics and SAS Data Management solutions and created an analytical competency center, which has already started delivering projects in several areas, including customer data quality, customer segmentation, demand forecasting, pricing, revenue analytics and credit scoring.

In this talk, we will present how Enerjisa created an analytical competency center that is focused on information sharing and data quality, what types of challenges we have faced during this process and the projects, as well as our analytical road map.

Level: Appropriate for all levels of knowledge and experience


The Growing Role of M-Learning in the Analytics Revolution: Application to Management Education

Owen Hall, Jr., Full Professor of Decision Sciences, Pepperdine University

Management education has come a long way since Sir Isaac Pitman initiated the first correspondence course in the early 1840s. Today the business school universe is under growing pressure to engage in significant reforms due to the impacts of globalization, new learning technologies, changing demographics, and unprecedented economic uncertainty. The increasing use of analytics in business and government to improve efficiency and performance suggests that similar possibilities exist for schools of business. The rapid growth of m-learning technologies has ushered in a new era in learning opportunities for management education. As a result of the m-learning paradigm, the three pillars of a traditional graduate management education – fixed time, fixed location and fixed learning pace – are being replaced with a more flexible and customized learning environment. This new instructional paradigm provides the vehicle for expanding the use of analytics throughout the community of practice. The purpose of this presentation is to highlight best practices and trends in the instruction and use of analytics in management education via m-learning.
Level: Intermediate

Financial Services, Banking

ATM Cash Optimization Through an Inflexible Supply Chain

James Gearheart, Manager, Advanced Analytics Performance Consulting, JPMorgan Chase

Donovan Gibson, Vice President and Analytics Manager, JPMorgan Chase

In the financial services industry, there have been several solutions developed to optimize the vendor service schedules for ATMs. Although these solutions find the optimal schedule to service an ATM fleet, it is rare to find a large retail bank that owns and/or controls all the factors of the supply channel. In reality, banks need to account for a range of factors including flexible vendor schedules, a range of ATM deposit and withdrawal capacities, volatile demand patterns and a long list of business rules that account for process control. In order to account for these variations, a probabilistic model that balances the cost of service with the probability of a disruption of ATM functionality has been developed to provide schedule recommendations that optimize the customer impact across a range of probabilities.

JPMorgan Chase's vendor service optimization model consists of three main parts: 1) a unique ATM forecasting method that selects the best forecasting methodology for an individual ATM, 2) a Monte Carlo simulation algorithm that provides the fault probability for every possible service schedule, and 3) a business rule decision engine that assures the model output meets business needs. The presenters will discuss the development, testing and deployment of each part of the model.
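Part 2 of the model (the Monte Carlo step) can be sketched roughly as below. This is a hedged illustration only: the capacity and demand parameters are invented, and the actual model is far richer (forecast-driven demand, deposits as well as withdrawals, and business rules).

```python
import random

def fault_probability(capacity, days_between_service, mean_demand, sd_demand,
                      n_sims=10_000, seed=7):
    """Monte Carlo estimate of the chance an ATM runs out of cash
    before its next scheduled refill."""
    rng = random.Random(seed)
    faults = 0
    for _ in range(n_sims):
        cash = capacity
        for _ in range(days_between_service):
            # Daily withdrawal demand, truncated at zero.
            cash -= max(0.0, rng.gauss(mean_demand, sd_demand))
            if cash < 0:
                faults += 1
                break
    return faults / n_sims

# Longer service intervals raise the fault probability; a scheduler would
# trade this risk against the cost of more frequent vendor visits.
for days in (3, 5, 7):
    print(days, fault_probability(100_000, days, 18_000, 6_000))
```

Evaluating this probability for every candidate schedule, then filtering through a business-rule engine, mirrors the three-part structure the presenters describe.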

Level: Appropriate for all levels of knowledge and experience

CRM: Balancing the Speed, Relevance, and Frequency of Your Customer Conversation

Landon Starr, Director of Analytics, American Express

Pieter van Ispelen, Performance Analytics, American Express

CRM is a broad topic with a variety of perspectives. Almost all recognize the importance of big data as it relates to the topic. However, big data, fast execution, and communication volume aren't the only considerations when thinking about managing your relationship with your customer. In our presentation, we will talk about how analytics can play a role in balancing speed of execution with the importance of establishing a relevant and results-oriented customer conversation.
Level: Appropriate for all levels of knowledge and experience

Improving Behavior Scoring Models by Using Ensemble Methods

Lan Wang, Global Analytics, Ford Motor Credit Co.

Logistic regression is a common technique used to develop scorecards in the financial industry, where the predicted variable is binary. Decision trees and other methods can also be used to predict binary targets. In this talk, we will show how an ensemble model (a combination of base models to form a composite predictor) can improve concordance and accuracy for certain behavior scoring models. We will demonstrate how to interpret the model.
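As an illustrative sketch (not Ford's actual scorecard), the toy example below averages the scores of two hypothetical base models and computes concordance (the c-statistic), the metric the talk proposes to improve. All data and model scores here are synthetic.

```python
import random

def concordance(y, scores):
    """Fraction of (event, non-event) pairs the score ranks correctly."""
    pos = [s for yi, s in zip(y, scores) if yi == 1]
    neg = [s for yi, s in zip(y, scores) if yi == 0]
    pairs = conc = ties = 0
    for p in pos:
        for n in neg:
            pairs += 1
            if p > n:
                conc += 1
            elif p == n:
                ties += 1
    return (conc + 0.5 * ties) / pairs

random.seed(1)
y = [int(random.random() < 0.4) for _ in range(200)]  # synthetic binary target
# Two hypothetical base-model scores, each noisily related to the target.
m1 = [yi * 0.6 + random.random() * 0.8 for yi in y]
m2 = [yi * 0.5 + random.random() * 0.9 for yi in y]
# Simple ensemble: average the base-model scores.
ensemble = [(a + b) / 2 for a, b in zip(m1, m2)]
print(round(concordance(y, m1), 3), round(concordance(y, ensemble), 3))
```

In practice the base models would be a fitted logistic regression and a decision tree rather than synthetic scores, but the combination step is the same.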
Level: Appropriate for all levels of knowledge and experience

Model Risk: Evaluating the Performance of GARCH-based VaR Models Through Backtesting

Kostas Kyriakoulis, Professor of Financial Analytics, Institute for Advanced Analytics

Banks and financial institutions utilize value at risk (VaR) models for both regulatory and internal capital calculations. VaR is the loss that we are fairly sure (e.g., 99 percent) will not be exceeded if the current portfolio is held over some period of time (e.g., one day). Regulatory supervisors allow the use of internal VaR models only if they provide satisfactory results in backtesting, a technique that measures the out-of-sample performance of the model. The purpose of backtesting is to address two main questions:
  1. Is the coverage of my model the expected one? In other words, assuming that my VaR estimate is 99 percent accurate, am I getting more extreme losses (exceedances) only 1 percent of the time out of sample?
  2. Even if coverage is the expected one, are the exceedances independent?
The purpose of this presentation is to show how an analyst can utilize SAS to backtest some of the most popular VaR models, such as the stationary and asymmetric GARCH, with T or normally distributed residuals. The analysis is based on the actual daily returns of the S&P 500 for the last 12 years. Backtesting is based on the following steps:
  • Estimation of more than 5,000 distinct models.
  • For each one of these 5,000 models, we simulate 10,000 returns for a one-day-ahead horizon.
  • Using the 10,000 simulated returns, the VaR is calculated at the 90, 95 and 99 percent confidence levels.
  • Coverage and Independence tests of the estimated VaR values are performed using the appropriate statistical measures (Likelihood-Ratio type tests).
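The coverage question above can be sketched as a Kupiec-style likelihood-ratio test. The example below is a hedged illustration using simulated normal returns and a fixed normal-quantile VaR, not the actual S&P 500 data or the fitted GARCH models from the talk.

```python
import math
import random

def kupiec_lr(exceedances, n_obs, p):
    """Likelihood-ratio statistic for H0: exceedance probability equals p."""
    x = exceedances
    pi_hat = x / n_obs
    if pi_hat in (0.0, 1.0):
        return float("inf")
    log_l0 = x * math.log(p) + (n_obs - x) * math.log(1 - p)
    log_l1 = x * math.log(pi_hat) + (n_obs - x) * math.log(1 - pi_hat)
    return -2 * (log_l0 - log_l1)

random.seed(42)
returns = [random.gauss(0, 0.01) for _ in range(3000)]  # stand-in daily returns
p = 0.01                      # targeting 99 percent VaR coverage
var_99 = -2.326 * 0.01        # normal-quantile VaR for a 1 percent daily sd
exceed = sum(r < var_99 for r in returns)
lr = kupiec_lr(exceed, len(returns), p)
print(exceed, round(lr, 3))   # compare lr to the chi-square(1) cutoff, 3.84
```

A statistic above the chi-square critical value rejects correct coverage; testing independence of exceedances (the second question) requires an additional Christoffersen-type test on the sequence of hit indicators.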
Level: Intermediate

Modernizing Analytics to Enhance Business Efficiency

Sterling Green, Vice President of Business Intelligence, Delivery Services, Enterprise Data and Integration, SunTrust Bank

The journey of a thousand miles begins with the first step.

This presentation will examine the business issue drivers that informed the decision to migrate the SAS user community to an n-tiered thin client/server SAS grid-enabled analytics platform. A systematic review of the milestones, challenges, benefits received and lessons learned will be presented. In addition, a road map to future directions will be offered. The necessity to build a foundation for a successful outcome will be considered. Representative empirical performance metrics will be shared.

Level: Appropriate for all levels of knowledge and experience

SAS Visual Analytics: Supercharging Data Reporting and Analytics

Ryan Marcum, Vice President for Acquisitions Credit Risk, Wells Fargo Home Mortgage

Data visualization can be like a GPS directing us to where we should spend our analytical efforts. In today's big data world, many businesses are still challenged to quickly and accurately distill insights and solutions from ever-expanding information streams. As Wells Fargo CEO John Stumpf has noted, "We cannot wait for data to run and have insights overnight, we live in a real-time world and need to know and understand our customers in a real-time and secure way." The Wells Fargo Credit Risk department focuses on delivering timely, accurate and reliable information and analytics to help answer questions posed by internal and external stakeholders. The group measures, analyzes and provides proactive recommendations to support and direct credit policy and strategic business changes. This session will focus on how Wells Fargo evaluated potential solutions and created a new go-forward vision using a world-class visual analytics platform with strong data governance to replace manually intensive processes.
Level: Appropriate for all levels of knowledge and experience

The 'Supremacy' Econometrics Deliver When Modeling the Purest Form of Losses on Retail Credit Risk: Start to Finish

David Gumpert-Hersh, Vice President, Credit Risk and Econometrics, Wescom Credit Union

This interactive discussion will carry attendees from the initial formation of the performance drivers for a firm's specific portfolio to a monthly forecast for five years out. By deriving the significant predictive factors in each economic index, both macro and micro, national and local, an overall portfolio "expansion or recession" outlook can be derived. The audience will be walked through the steps used to model the purest retail credit risk question: Will the borrower pay as agreed? Building on this output, the next steps are to build an Allowance for Loan and Lease Losses (ALLL) model and a TDR forecast. Today, these models have been leveraged again to increase originations by over 50 percent year over year and reduce losses by 85 percent since 2010, by pinpointing the risk appetite and fully utilizing it.
Level: Intermediate

Using Big Data to Enhance Risk Measurement

Hans Helbekkmo, Expert Principal, Risk Practice, McKinsey & Company

Advances in modeling technology have permitted the use of "big data" to enhance risk measurement. Banks in particular are increasingly making use of different types of data to improve consumer credit scoring, including data from alliances (such as supermarkets, cell phone providers and utility providers), transaction data and in some cases "unstructured" data, e.g., supporting semantic analysis of news feeds for business customers. In this presentation we would like to provide our perspectives and selected examples of recent advances in this area, including the use of supermarket purchase patterns and cellphone data to improve credit scoring for banks.
Level: Appropriate for all levels of knowledge and experience

Health Care

A Novel Model for Predicting Future Health Risk and Cost Stratification at a Member Level

Sandy Chiu, Program Manager, Clinical Analytics, Humana

Hamed Zahedi, Program Manager, Humana

More than one-sixth of the US economy is devoted to health care spending, which is estimated at $2.7 trillion annually, and continues to rise.* The Centers for Disease Control and Prevention (CDC) reports that each year, chronic diseases cause 7 in 10 deaths and account for about 75 percent of medical care costs**. This presentation will discuss a health severity score model used to predict a member's future health care costs. It allows for early identification of high-risk members and effective population stratification for personalized health care services. A customized model is built for each member segment, and the segmentation criteria can include line of business, member tenure, conditions and expected changes in utilization. Each customized model comprises a set of learning algorithms that find the most predictive linear and non-linear patterns between each member's risk factors and future health care costs. The model is currently used to refer high-risk members to clinical programs and historical results indicate positive ROI for these programs.

* America's Health Insurance Plans (AHIP)
** Centers for Disease Control and Prevention (CDC)

Level: Appropriate for all levels of knowledge and experience

A Novel Predictive Model for Identifying Members at High Risk of Falling

Harpreet Singh, Metrician, Humana

Falls are a major health risk for the elderly. One in three has at least one fall annually, and falls are the leading cause of both fatal and non-fatal injuries in this population. According to estimates by the Centers for Disease Control and Prevention (CDC), the direct and indirect cost of falls is expected to reach $54.9 billion by 2020.

Both patients and health care payers, then, have an interest in reducing the incidence of falls. However, this is a difficult task. Unlike certain medical conditions such as hypertension, a fall is not a single medical condition with a set definition. Incomplete documentation and limited use of falls diagnostic codes in medical insurance claims make identifying and analyzing a fall even more challenging. We developed a novel and comprehensive predictive model to identify people at risk of falling, and estimated their future likelihood of falling.

The current falls predictive model has an ROC index of 0.763 with a 76 percent improvement in prediction rate for members in the top 1 percent highest-risk group, compared to a baseline rate of 20 percent.

Level: Appropriate for all levels of knowledge and experience

Application of Survival Model in the Health Insurance Sector to Predict Member Lapses

Praveen Saxena, Senior Consultant, WellPoint

Vivian Zhou, Director of Planning and Data Analysis, WellPoint Inc.

The goal of every company is to grow the business. While it is important to maximize sales, it is also important to keep attrition to a minimum. It costs significantly more to acquire a member than to retain a member. Also, with the implementation of the Affordable Care Act, there is a rising concern that member lapse would increase dramatically, so retaining profitable members has become increasingly important and challenging in the current market within the health insurance sector. To retain more members we need to accurately predict lapse and understand its drivers.

In order to accurately predict lapses and understand lapse behavior, we conducted a study that applies a survival model to estimate the probability that a member will lapse in each future month from the observed date, given various time-dependent covariates such as claims, premium and age. The probability estimates are used to identify the target population among our members and when to target them. The output from the model is further used to identify the top drivers and how they impact lapse behavior, which facilitates the development of effective retention programs. We have applied the individual-level monthly probability estimates of lapse in another study to estimate the lifetime value of prospective members and the residual value of current members. Thus, for marketing and retention purposes, we can target the most profitable customers with the highest persistency.
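A minimal sketch of the survival-model output described above: monthly lapse hazards h(t) translate into a retention (survival) curve S(t) as the running product of (1 - h). The hazard values below are purely illustrative, not WellPoint figures.

```python
# Illustrative monthly lapse probabilities (hazards) for one member segment.
hazards = [0.02, 0.025, 0.03, 0.05, 0.04, 0.03]

# S(t) = product over months m <= t of (1 - h_m): the probability the
# member is still enrolled after month t.
survival = []
s = 1.0
for h in hazards:
    s *= (1 - h)
    survival.append(s)

print([round(v, 4) for v in survival])
```

Months where the hazard spikes (here, month 4) are exactly where a retention program would target outreach; the fitted model supplies these hazards per member from time-dependent covariates.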

Level: Intermediate

HIRA's Big Data Utilization and Disclosure

Logyoung Kim, Research Fellow, Health Insurance Review and Assessment Service, South Korea

Health Insurance Review and Assessment Service (HIRA) supports the government's four plans for Government 3.0 - The New Government Operation Paradigm: Opening, Sharing, Communication and Collaboration. HIRA provides services such as the information support center and IT infrastructures in which big data will be utilized to boost the health care industry and create new jobs.

Using SAS software, HIRA is developing new services that will influence policymaking by forecasting health care service costs for disease groups; providing information crucial for efficient health care management; offering information about pharmaceutical products; and developing a monitoring system for treatment trends.

By maximizing the way it uses health care big data, HIRA is building an ecosystem where the private sector plays a part in strengthening quality and managing efficiency within the health care system.

Level: Appropriate for all levels of knowledge and experience

Improving HEDIS Scores, Reducing Cost, and Delivering More Effective Care

Craig Willis, Physicians Pharmacy Alliance

Today's health care system is incredibly complex and quickly evolving. Physicians Pharmacy Alliance (PPA) has partnered with SAS, using SAS Visual Analytics to deliver innovative, cost-effective solutions to managed Medicaid plans that help them control costs and improve patient outcomes. PPA Director of Analytics Craig Willis will discuss how SAS Visual Analytics has become a mission-critical component of his company's service offering.
Level: Appropriate for all levels of knowledge and experience

Risk Adjustment Analytics: Measuring Illness Burden to Optimize Health Plan Payments

Richard Lieberman, Principal Data Scientist and Technical Architect, PeakAnalytics Software

The Affordable Care Act requires risk adjustment in health insurance products purchased by working-aged people. Risk adjustment dramatically alters commonly used risk segmentation strategies and will result in strong incentives to identify sicker-than-average members. These members represent very high premium revenues for issuers and with appropriate case management, may improve member health status and contribute significant surpluses to the issuers' bottom line.

This presentation will focus on:

  • Understanding how risk adjustment models identify and quantify member illness burden.
  • How health insurance issuers need to incorporate risk adjustment into their care delivery systems.
  • Similarities and differences of risk adjustment approaches in different insurance programs.
  • Predicting how issuer and member behavior will change because of the ACA's reliance on risk adjustment.
Level: Appropriate for all levels of knowledge and experience

The Spatio-Temporal Impact of Urgent Care Centers on Physician and ER Use

Daryl Wansink, Director of Health Economics, Blue Cross Blue Shield of North Carolina

The unsustainable rise in health care costs has led to efforts to shift some health care services to less expensive sites of care. In North Carolina, the expansion of urgent care centers introduces the possibility that non-emergency and non-life-threatening conditions can be treated in a less intensive care setting. BCBSNC conducted a longitudinal study of the density of urgent care centers, primary care providers and emergency departments, and of differences in how members access care near those locations.

The talk will focus on several analytic techniques that were considered for the analysis. The model needed to account for the complex relationship between the changes in the population (including health conditions and health insurance benefits) and the changes in the types of services and supply of services offered by health care providers proximal to them. Results for the chosen methodology will be discussed.

Level: Appropriate for all levels of knowledge and experience

Visual Analytics in Health Care to Promote Accountable Care

Jo Porter, Deputy Director, Institute for Health Policy and Practice

Health care in the United States is undergoing significant change. Ensuring that the change is appropriate requires data-driven decision making. Using All-Payer Claims Database data, the New Hampshire Accountable Care project provides regional and state-level reports on a variety of cost, utilization and quality measures. The reports include dynamic querying capability and visualization, which will be demonstrated during the presentation.
Level: Appropriate for all levels of knowledge and experience


Consumer Analytics

Faster, Higher, Stronger: The Consumer DNA Factory

Raphael Cailloux, Head of Marketing Intelligence and Brand Consumer Analytics, Adidas

While working on consumer analytics applications like predictive modeling, most analysts devote a lot of time to peripheral tasks such as data preparation (before the analysis) and insights dissemination (after the analysis). Not only do these tasks deter data scientists from focusing on the actual value-adding insight creation, they also pose many challenges with respect to the quality and timeliness of those insights, including knowledge management.

Adidas has tackled this issue within a project called Consumer DNA. The project's goal is to provide the right information in the right context and the right format to various analytical applications, such as predictive modeling, campaign analysis or even targeting (selection) processes. The Consumer DNA (CDNA) Factory is now at the very core of Adidas marketing intelligence and dramatically enhances its capabilities.

This presentation will be about revealing the magic behind the components, functions and (simple) technologies at work in the CDNA Factory. It will also draw the line to other parts of the analytical framework and show how it can systematically be optimized as a whole.

Level: Intermediate


Mass-Scale Modeling: Model Your Entire Business with SAS

Jared Dean, Senior Director, Research and Development, SAS

Jonathan Wexler, Principal Product Manager, SAS

Over the last year, SAS Enterprise Miner R&D has added significant functionality to the suite, including 10 new nodes covering machine learning, predictive modeling, open-source integration and much more. In this session, you will learn about the exciting new additions to SAS Enterprise Miner while getting a sneak peek at the future. Learn about new features that enable you to automatically build and manage thousands of models using data mining and machine learning methods - across thousands of segments (e.g., product SKUs, devices, product lines, channels, campaigns) - allowing for "collaborative analytics at scale."
Level: Appropriate for all levels of knowledge and experience

Clustering Methods in Time Series Analysis

David Corliss, Predictive Analytics Consultant, Ford Motor Company

Cluster analysis or segmentation is used to divide a group with similar characteristics or behaviors into subsets. Two clustering methods have been developed expressly for time series data. Similarity analysis groups complete time series into clusters with similar patterns over time (e.g., to identify economic patterns and cycles). Phase analysis operates within a time series, dividing it into successive intervals with the behavior changing with each new stage. This recent development in time series analysis has been used to identify data-driven seasonal changes, as well as stages of development in evolving systems. This presentation gives an overview of both methods, with practical examples and a discussion of some of the issues that can arise in handling the data.
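A minimal sketch of similarity analysis, assuming z-normalized series (so clustering groups shapes rather than levels) and a small from-scratch k-means; the series and noise levels below are invented for illustration.

```python
import math
import random

def znorm(series):
    """Z-normalize a series so clusters reflect shape, not level."""
    mu = sum(series) / len(series)
    sd = math.sqrt(sum((x - mu) ** 2 for x in series) / len(series)) or 1.0
    return [(x - mu) / sd for x in series]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(series_list, k=2, iters=20):
    # Deterministic farthest-point initialization keeps the sketch reproducible.
    centers = [series_list[0]]
    while len(centers) < k:
        centers.append(max(series_list,
                           key=lambda s: min(dist(s, c) for c in centers)))
    labels = [0] * len(series_list)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist(s, centers[j]))
                  for s in series_list]
        for j in range(k):
            members = [s for s, l in zip(series_list, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(col) for col in zip(*members)]
    return labels

rng = random.Random(1)
rising = [znorm([t + rng.gauss(0, 0.3) for t in range(12)]) for _ in range(5)]
falling = [znorm([-t + rng.gauss(0, 0.3) for t in range(12)]) for _ in range(5)]
labels = kmeans(rising + falling, k=2)
print(labels)  # rising and falling series land in different clusters
```

Phase analysis, by contrast, would segment a single series along its time axis into regimes; the same distance machinery applies within one series rather than across many.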
Level: Intermediate

Just Say NO To OLS

Zubin Dowlaty, Head of Innovation and Development, Mu Sigma

OLS (ordinary least squares) regression, along with the GLM, is the default statistical method and, in many cases, the only method used by data scientists. Analysts spend considerable time creating a model-ready data set, followed by running their respective advanced analytics technique(s). New recommendations embrace the need to explore assumptions and performance by running a portfolio of techniques and deriving insights from this process in order to improve the base statistical model. Big data is providing a scalable computation environment, enabling a data scientist to run dozens of methods rapidly and benefit from the diversity of techniques. Ultimately, practitioners are seeking more robust and confident business analytics models, which the ensemble approach affords. Statistics, machine learning and econometrics are the core academic disciplines in the decision scientist's toolbox. For applied business analytics, Dowlaty will share how these three disciplines can work in harmony to improve your core analytical model.
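The "portfolio of techniques" idea can be sketched as follows: fit plain OLS and one simple alternative on the same (synthetic, deliberately curved) data, then average them. By convexity of squared error, the averaged ensemble's error never exceeds the worse base model's. All data and models below are illustrative, not Mu Sigma's methodology.

```python
import random

def ols_fit(xs, ys):
    """One-variable OLS via the closed-form slope/intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x

def stump_fit(xs, ys, split):
    """A one-split decision stump: two conditional means."""
    lo = [y for x, y in zip(xs, ys) if x < split]
    hi = [y for x, y in zip(xs, ys) if x >= split]
    ml, mh = sum(lo) / len(lo), sum(hi) / len(hi)
    return lambda x: ml if x < split else mh

def rmse(model, xs, ys):
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

random.seed(0)
xs = [random.uniform(0, 10) for _ in range(300)]
ys = [x ** 2 + random.gauss(0, 5) for x in xs]  # curved truth: OLS misspecified

ols = ols_fit(xs, ys)
stump = stump_fit(xs, ys, split=5.0)
ensemble = lambda x: (ols(x) + stump(x)) / 2   # average the two predictors
for name, m in (("ols", ols), ("stump", stump), ("ensemble", ensemble)):
    print(name, round(rmse(m, xs, ys), 2))
```

Running many such techniques and combining or contrasting them is exactly the diagnostic habit the talk recommends over defaulting to OLS alone.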
Level: Intermediate

Leveraging SAS Advanced Analytics For SAS Business Solutions

Manoj Chari, Senior Director, Advanced Analytics, SAS

Tugrul Sanli, Senior Director, Advanced Analytics, SAS

Customers routinely use SAS language and advanced analytics tools to build applications to solve business problems. This presentation will discuss instances where SAS has created analytical solutions that enable customers to address specific classes of business problems without needing to access SAS advanced analytics tools directly. After introducing the business context and motivation, we will discuss the analytical complexities that require the development of innovative techniques that go beyond a straightforward application of analytical functionality in the SAS advanced analytics portfolio.

We will begin with a discussion of SAS Revenue Management and Pricing Optimization Analytics. This is not an off-the-shelf solution, but an industry-neutral toolbox that allows for rapid development of pricing and revenue management solutions. These solutions are data-intensive (often transaction-based), while requiring advanced forecasting and econometric estimation techniques as well as optimization processes that cater to the specific needs of the customer and their system environment. Next, we will discuss analytical engines developed at SAS for two customer intelligence solutions - SAS Marketing Optimization and SAS Customer Link Analytics - in which the scale of the problems requires novel, large-scale implementations of certain optimization and network analysis methods. We will also use customer case studies to illustrate how these powerful analytical engines have produced concrete business value.

Level: Intermediate

Machine Learning at Scale

Xiangxiang Meng, Staff Scientist, SAS

Wayne Thompson, Chief Data Scientist, SAS

Machine learning helps develop deep insights from data assets faster and with greater precision, leading to an improved bottom line, reduced risk, better customer understanding and improved success metrics. Machine learning applications are predominately focused on segmentation, classification and prediction. SAS In-Memory Statistics for Hadoop and SAS Visual Statistics deliver interactive statistical modeling and machine learning. You can use these products to manage data, perform exploratory analysis, build models and score data with unlimited amounts of data stored in Hadoop or other distributed file systems. The in-memory architecture of these two products offers collocated analytics for unprecedented speed and multiuser concurrency.

This session provides an overview, with examples, of several machine learning methods delivered with SAS In-Memory Statistics for Hadoop and SAS Visual Statistics:

  1. Logistic regression and random bootstrap forest for predictions and classifications.
  2. Decision tree and k-means clustering for data segmentation.
Level: Intermediate
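As a concrete illustration of the segmentation item above, here is a minimal k-means sketch in plain NumPy (a generic toy implementation, not the SAS in-memory one): points are repeatedly assigned to their nearest centroid and centroids are recomputed as cluster means.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two synthetic customer segments in a 2-D feature space
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),
               rng.normal(5.0, 0.5, (50, 2))])

def kmeans(X, k, iters=20):
    # Simple deterministic initialization: first and last observation
    centroids = X[[0, -1]].copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid as the mean of its cluster
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

labels, centroids = kmeans(X, k=2)
```

On this well-separated toy data, Lloyd's algorithm converges almost immediately; the point of the in-memory products is to run the same logic against data volumes that do not fit on one machine.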

Making the Most of Your Analytical Talent

Patrick Maher, Senior Manager, SAS Global Professional Services, SAS

We will discuss five strategic considerations for managing an analytics team. These include people, culture, communication, return on investment and best practices. Points covered will include, but are not limited to: hiring, analytical skills creation, communication strategies, value creation, project approach and lessons learned - all taken from tens of thousands of hours of project work.
Level: Appropriate for all levels of knowledge and experience

New SAS Procedures for State Space-Based Modeling: Exponential Smoothing Models (ESMX) and General State Space Models (SSM)

Michele Trovero, Research Statistician Developer, SAS

State space models (SSMs) are a class of models that describe the probabilistic dependence between a latent state variable and the observed measurements. SSMs provide a general framework for analyzing systems that are measured or observed through a stochastic process. SAS/ETS introduces two new procedures for the analysis of time series data based on SSMs.

The ESMX procedure analyzes and forecasts univariate time series data by using exponential smoothing models (ESM) based on an innovation formulation of SSM. These models provide a statistical framework for the commonly used ESM methods, which enables statistical inference in a natural way. Moreover, unlike traditional ESMs, these models can be extended to include other components, such as events and input variables.

The SSM procedure is designed for modeling with general linear SSMs. It enables the analysis of a wider class of data settings (univariate and multivariate time series, panel of time series, and longitudinal data). The SSM procedure provides a state-of-the-art interface for linear state space modeling.

Level: Advanced
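For readers new to the notation, a standard local-level model (textbook forms, not taken from the abstract) illustrates the pair of equations behind every linear Gaussian SSM, and the innovation formulation that links ESMs to this framework:

```latex
% Local-level state space model: latent level \alpha_t, observation y_t
y_t = \alpha_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2)
\alpha_{t+1} = \alpha_t + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2)

% Innovation (single-source-of-error) form of simple exponential smoothing:
y_t = \ell_{t-1} + \varepsilon_t, \qquad
\ell_t = \ell_{t-1} + \alpha \varepsilon_t
```

In the innovation form, one error term drives both equations; this is what turns the classical smoothing recursion into a proper statistical model with likelihood-based inference.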

New Techniques for Doing Association Classification and a Demonstration of Their Usefulness for Mining Text

James Cox, Senior Manager, Advanced Analytics R&D, SAS

Saratendu Sethi, Senior Director, Advanced Analytics, SAS

The Text Analytics R&D team has developed new techniques that transform the world of association mining. Bool-yer, an algorithm for classification with a huge number of potential features, was recently introduced into SAS Text Miner and SAS Contextual Analysis and approved this summer by the US Patent Office. Bool-yer reliably and quickly produces a set of Boolean rules to predict a category, or RHS (right-hand side), with accuracy superior to other rule-based techniques. The ideas behind Bool-yer have now been extended into AssoCat, which presents many potential rules to a user in a dynamic network display. SAS Contextual Analysis users can interact with this network, using their domain knowledge to choose the rules of interest from the best candidate rules generated by the algorithm. This promises to transform the business of rule writing: having users choose rules from a proven set should let them leverage their expertise more effectively than writing rules from scratch. Finally, these innovations can be applied to mining any type of data that exists as "transactions" with a large set of attributes, such as whole-genome sequence analysis, web path mining and market basket analysis.
Level: Appropriate for all levels of knowledge and experience

Text Analytics and Topic-Attribute Association

Nick Evangelopoulos, Associate Professor of Decision Sciences, University of North Texas

In big data analytics projects, analysts are frequently faced with the challenge of demonstrating the value of text analytics to senior executives. The unstructured nature of text data often obscures the valuable information that lies within. Text analytics software solutions such as SAS Text Miner include topic extraction in their toolset. But after topics are extracted, how can you take your analysis to the next level? In this discussion, we present topic-attribute association analysis and visualize the results with topic-attribute maps. Treatment of text data with these tools takes advantage of explicit or implicit attributes that often remain under-exploited. Bringing the topic-attribute relationship to the fore can be critical in making the business case for text analytics. Which products or services are viewed by your customers as "great" and which ones as not so great? Which product or service failure categories are most prominent in one geographical region, and which ones in another region? In what ways are your services discussed differently in different social media? The presentation covers a number of case studies that identify and visualize topic-attribute relationships.
Level: Intermediate

Using the OPTMODEL Procedure in SAS/OR to Solve Complex Problems

Rob Pratt, Senior Manager, Operations Research, SAS

Mathematical optimization is a powerful paradigm for modeling and solving business problems that involve interrelated decisions about resource allocation, pricing, routing, scheduling and similar issues. The OPTMODEL procedure in SAS/OR software provides unified access to a wide range of optimization solvers and supports both standard and customized optimization algorithms. This presentation illustrates PROC OPTMODEL's power and versatility in building and solving optimization models and describes the significant improvements that result from PROC OPTMODEL's many new features. Highlights include the recently added support for the network solver, the constraint programming solver and the COFOR statement, which allows parallel execution of independent solver calls. Best practices for complex problems that require access to more than one solver are also demonstrated.
Level: Intermediate
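PROC OPTMODEL itself is SAS; as a language-neutral illustration of the kind of algebraic model it expresses, here is a small product-mix linear program solved with SciPy (hypothetical data; `scipy.optimize.linprog` is the assumed solver here, not a SAS interface):

```python
from scipy.optimize import linprog

# Maximize profit 3x + 5y subject to capacity constraints:
#   x <= 4,  2y <= 12,  3x + 2y <= 18,  x >= 0, y >= 0
# linprog minimizes, so negate the objective coefficients.
c = [-3.0, -5.0]
A_ub = [[1.0, 0.0],
        [0.0, 2.0],
        [3.0, 2.0]]
b_ub = [4.0, 12.0, 18.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
best_profit = -res.fun  # optimal plan: x = 2, y = 6, profit = 36
```

In OPTMODEL the same model would be written declaratively (decision variables, constraints, objective) and handed to whichever solver fits; features like COFOR additionally let independent solver calls run in parallel.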

What You Don't Know About Education and Data Science

Jennifer Priestley, Professor of Applied Statistics and Data Science, Kennesaw State University

The demand for deep analytical skills is far outpacing the supply. Finding data scientists on the open market is difficult and expensive - they are increasingly described as unicorns, because you just can't find them. Part of the reason is that we are experiencing the employment equivalent of a run on the bank, with all sectors of the economy trying to hire the same talent at the same time. Come find out what one university is doing to close the analytical talent gap and develop the next generation of data scientists - and how you can help.
Level: Appropriate for all levels of knowledge and experience

Why Econometrics Should Be in Your Analytics Toolkit: Applications of Causal Inference

Jan Chvosta, Senior Manager, Advanced Analytics R&D, SAS

Kenneth Sanford, Senior Research Statistician, Advanced Analytics, SAS

Econometrics is about using observational data (data collected outside a controlled environment) to make causal inference. At the heart of this problem is dealing with the nonexperimental nature of the data. In this talk we will focus on the methods used to make causal statements from data collected outside a laboratory environment. We will then compare and contrast the implications of ignoring these data complications with practical applications to price elasticity modeling and campaign evaluation.
Level: Appropriate for all levels of knowledge and experience
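As a toy illustration of the price-elasticity application (synthetic data, plain NumPy), the slope of a log-log regression estimates elasticity. Note the caveat the session is built around: this naive OLS estimate is only causal if price is exogenous, which is exactly the assumption econometric methods are designed to test or relax.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic demand data with a known price elasticity of -1.5
price = rng.uniform(1.0, 10.0, 500)
log_q = 3.0 - 1.5 * np.log(price) + rng.normal(0, 0.1, 500)

# Log-log OLS: the coefficient on log(price) is the elasticity estimate
X = np.column_stack([np.ones_like(price), np.log(price)])
beta, *_ = np.linalg.lstsq(X, log_q, rcond=None)
elasticity = float(beta[1])
```

With observational data, prices are typically set in response to demand, so this regression would be biased; instrumental-variable and related techniques discussed in the talk address that.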

Non-industry specific

An Overview of Machine Learning With SAS Enterprise Miner

Patrick Hall, Research Statistician Developer, SAS

SAS and SAS Enterprise Miner provide advanced, multithreaded and distributed machine learning capabilities. Three examples of machine learning tasks along with sample code and web resources will be presented. Each example uses machine learning algorithms from SAS Enterprise Miner and data from the Kaggle predictive modeling competitions: document classification with the EMC Israel Data Science Challenge data, estimating the number of clusters in the Claim Prediction Challenge data, and deep learning with the MNIST Digit Recognizer data. Examples will emphasize the analytical approach to each task, the ability to incorporate machine learning results into the broader SAS Enterprise Miner data mining platform, and integration with other machine learning technologies.
Level: Intermediate

Big Data and Hadoop: Moving Beyond the Hype to Realize Your Analytics Strategy With SAS

Mauro Cazzari

Kelly Hobson, Analytical Consultant, SAS

Wayne Thompson, Chief Data Scientist, SAS

Josh Wills, Senior Director of Data Science, Cloudera

Join us for this panel presentation featuring Cloudera and SAS experts and get answers to your most burning questions on Hadoop. How does Hadoop benefit my work as an analyst or data scientist? Can Hadoop help me fit and deploy my models faster before my business problem changes? Can machines really learn? What are the considerations for integrating Hadoop into our analytics strategy? What challenges and potential pitfalls can we avoid? How do I know if my organization is ready for Hadoop? Whether you are just beginning to explore Hadoop or have an existing Hadoop environment that you want to apply in your analytics, you'll benefit from this question-and-answer session and the straight talk on big data and Hadoop.
Level: Appropriate for all levels of knowledge and experience

Capture, Manage and Analyze Structured and Unstructured Data in a Single Big Data Platform

Helen Fowler, Senior Director, SAS and Teradata Center of Excellence (COE), Americas

Vic Hoffman, Director, SAS and Teradata Center of Excellence (COE), Retail Division

Understanding the SAS and Teradata Advanced Analytics Advantage Program for Hadoop

Looking for an integrated, end-to-end analytics and data management platform for big data that includes Hadoop? SAS and Teradata have partnered to build a high-performance architecture that allows you to capture more data from more sources and quickly apply advanced analytics to all of your data at once. This solution reduces the processing time for your data models from days to minutes.

Come to this presentation and learn more about the SAS and Teradata Advanced Analytics Advantage Program for Hadoop and how it can work for you.

Level: Appropriate for all levels of knowledge and experience

Do You Know or Do You Think You Know? Building a Testing Culture at State Farm

Andy Pulkstenis, Director of Analytics, State Farm Insurance

Experimental design is becoming more common in business settings, in both strategic and tactical arenas. Many firms are recognizing its broad applications, including initial product design, marketing, logistics, customer retention, profitability analysis, strategy optimization and website design, to name a few. Surprisingly, many still resist fully integrating designed experiments into their strategies.

As with any analytic change, driving an organization to adopt strategic testing can be difficult. The presenter will share how State Farm has been moving from minimal business experimentation to a more ambitious culture of testing, as well as provide a step-by-step guide for changing, creating or improving a testing culture. He will share several State Farm examples that demonstrate the value of multivariate testing, from simple to complex. He will also share progress on incorporating test results into strategy optimization, as well as the current state of State Farm's constantly evolving internal testing landscape.

The presentation will provide tangible takeaways for business practitioners who want to improve their own strategies through designed experiments, and should benefit both experienced experimenters and newcomers who want to champion testing in their companies but are unsure how to get started.

Level: Appropriate for all levels of knowledge and experience

Follow the Boolean Rule: How to Create Rules-Based Profiles

Julia Marshall, Lead Text Analytics Specialist, Bridgeborn Inc.

Machine-learned, statistical text mining methods are good at defining distinct, non-overlapping topics. But what if the differences between topics are more nuanced than a "bag of words" approach can decipher? SAS allows users to build rules using Boolean logic that can assess the intended meaning of the author by adding the contextual subtleties that are often necessary to precisely categorize documents - even categories with significant overlap. This talk will present a real-world case study of how profiles for USAID's Development Experience Clearinghouse were developed using the power of Boolean rules. The presentation will cover how to analyze the language within documents and how to translate that into rules that mimic how a human would categorize content. For both novice and experienced analysts, this practical session will also illustrate how to build and test Boolean-based rule logic, as well as some basic and necessary management procedures for maintaining rules.
Level: Appropriate for all levels of knowledge and experience
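The flavor of such rules can be sketched generically in Python (a hypothetical rule and documents, not SAS rule syntax or USAID data): a rule combines required terms, a set of alternatives of which at least one must appear, and excluded terms.

```python
import re

# A hypothetical Boolean rule:
#   (water OR sanitation) AND project AND NOT sports
rule = {"all": ["project"], "any": ["water", "sanitation"], "none": ["sports"]}

def tokens(doc):
    # Crude bag-of-words tokenizer for the sketch
    return set(re.findall(r"[a-z']+", doc.lower()))

def matches(doc, rule):
    t = tokens(doc)
    return (all(w in t for w in rule["all"])
            and any(w in t for w in rule["any"])
            and not any(w in t for w in rule["none"]))

docs = [
    "Final report on the rural water supply project.",
    "Sanitation training materials for schools.",
    "Sports project funding proposal.",
]
hits = [matches(d, rule) for d in docs]
```

Real rule languages add proximity, ordering and linguistic operators on top of this; the maintenance procedures the session covers exist precisely because hand-written rules accumulate and interact.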

How to Transform Your Organization Into An Analytical Enterprise

Aiman Zeid, Senior Manager, Management Consulting, SAS

As business insight becomes a cornerstone of successful business strategies, organizations are striving to transform their environments to use analytics to produce accurate and timely business insight. Various approaches, ranging from the adoption of powerful analytical technologies to re-engineering business processes, have been used with different levels of success and adoption. This presentation will analyze the technical and organizational challenges companies must address to evaluate their current analytical capabilities and maturity level. It will also present a comprehensive methodology to develop a strategy and a road map to promote the use of analytics across the enterprise to support the organization's business objectives.
Level: Appropriate for all levels of knowledge and experience

IT Skills Gap: Where Are All the Women?

Sue Talley, Dean of Technology, Capella University, School of Business and Technology

Tech icons like Google and Yahoo have been making big media headlines recently - but not for their latest tech innovations. It's for the lack of diversity in their IT workforce. According to CompTIA, only 24 percent of US IT professionals are female, a figure on a downward trend. To meet the diversity challenge and the emerging IT needs in our country, women will be essential agents of change in transforming the status quo of a male-dominated field.

Capella University Dean of Technology Sue Talley will discuss the lack of women in the IT field and the role of higher education in attracting and retaining women.

Level: Appropriate for all levels of knowledge and experience

Methods That Matter: New Directions in SAS/STAT Software

Bob Rodriguez, Senior Director, Research and Development, SAS

SAS/STAT software is expanding in response to emerging statistical needs in diverse areas such as business analytics, government statistics and clinical trials. This presentation introduces you to key enhancements in SAS/STAT 13.1 and 13.2, emphasizing the practical motivation for novel methods and models, the problems they solve and the benefits they offer. You will gain insights about new tools for building predictive models with generalized linear models, quantile regression, and multivariate adaptive regression splines; Bayesian discrete choice modeling; analysis of missing data; time-to-event analysis with missing data and competing risks; and item response models.
Level: Intermediate

Modeling Donations, Costs and Counts Using Two-Part Models

Laura Ring Kapitula, Assistant Professor, Grand Valley State University

There are many situations where an outcome of interest has a large number of zero outcomes and a group of non-zero outcomes that are discrete or highly skewed. For example, in modeling health care costs, some patients will have zero costs, and the distribution of positive costs will often be extremely right-skewed. In the analysis of count data, there are also times when there are more zeros than would be expected using standard methodology, or cases where the zeros might differ substantially from the non-zeros, such as the number of cavities for a dental patient or the number of children born to a mother. If data have such structure and ordinary least squares methods are used, then predictions and estimation may be very inaccurate. The two-part model gives us a flexible modeling framework, and it can be used for longitudinal data or for independent data. In a two-part model, the probability of a positive outcome is modeled, and another model is used for the positive outcomes. Methodology will be demonstrated using SAS PROC GENMOD and SAS PROC GLIMMIX, but the ideas are generalizable to other statistical software.
Level: Intermediate
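The two-part decomposition can be sketched with synthetic cost data in NumPy (a saturated toy with one binary covariate; the session itself fits regression versions with PROC GENMOD and PROC GLIMMIX): part one estimates P(cost > 0), part two estimates the mean of the positive costs, and their product is the expected cost.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
group = rng.integers(0, 2, n)                    # 0 = control, 1 = treated
# Part 1 truth: probability of incurring any cost differs by group
p_true = np.where(group == 1, 0.7, 0.3)
positive = rng.random(n) < p_true
# Part 2 truth: positive costs are right-skewed (lognormal), mean differs by group
mu_true = np.where(group == 1, 1.0, 0.5)
cost = np.where(positive, rng.lognormal(mu_true, 0.8), 0.0)

def two_part_fit(cost, group):
    est = {}
    for g in (0, 1):
        sub = cost[group == g]
        p_hat = float((sub > 0).mean())          # part 1: P(cost > 0)
        mu_hat = float(sub[sub > 0].mean())      # part 2: E[cost | cost > 0]
        est[g] = p_hat * mu_hat                  # E[cost] = p * conditional mean
    return est

est = two_part_fit(cost, group)
```

In this saturated one-covariate case the product exactly reproduces each group's mean cost; the payoff of the regression formulation is that each part can carry its own covariates and link function.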

Unstructured Data Analysis: Real-World Applications and Business Case Studies

Goutam Chakraborty, Director of the Graduate Certificate in Business Data Mining and Professor of Marketing, Oklahoma State University

Murali Pagolu, Business Analytics Consultant, SAS

In recent years, decision makers have shown great interest in incorporating unstructured data into their decision making. Insights derived from analyzing textual information can help business managers make better decisions. Call center logs, emails, documents, web articles, blog posts, tweets, Facebook posts, YouTube video comments, and customer product reviews are prominent sources of unstructured data encompassing both internal and external sources. It is impossible for anyone to manually sift through large volumes of text data and analyze them, let alone use the results to make strategic decisions. Needless to say, you need sophisticated tools developed on scientific methodologies that can help detect anomalies, capture trends and analyze patterns in the data. In this presentation, we discuss how to extract, standardize, organize and analyze textual data to derive insightful customer intelligence from various data sources. Using SAS Text Analytics tools, we showcase successful implementations of SAS Text Analytics solutions and other application areas, along with case studies. While SAS products are used as tools for demonstration only, the topics and theories covered are generic (not tool-specific).
Level: Appropriate for all levels of knowledge and experience

This list was built using SAS software.