SAS® GLOBAL FORUM 2019
SAS Global Forum 2018 offered an array of educational and inspiring sessions. The best part? You can still access the session proceedings here.
A 'Big Red Button' for SAS® Administrators: Myth or Reality?
Session 1828
Over the years, many attempts have been made to develop a solution that can make life easier for SAS® Administrators. Some were more successful than others, though none reached the point at which the process is fully automated, simple, efficient, fault-tolerant, and manageable. This paper explores the subject of the so-called Big Red Button: one solution that can make life for SAS Administrators as easy as clicking one button. Starting with software provisioning, through installation and configuration, to support and maintenance, this paper looks at every step of the SAS Platform lifecycle, highlighting challenges and trying to find optimal solutions that link them all together. With virtualisation, automation, and cloudification on the rise, creating such a solution has finally become possible in recent years. Automation frameworks allow for unattended, easily scalable installation processes; cloudification provides an expandable and manageable infrastructure; and virtualisation helps to deliver software to clients through various channels. All of these, glued together with orchestration tools, form the basis for a potential Big Red Button. So, is it still a myth, or can it be done now?
Sergey Iglov, SASIT Limited
A Basic Introduction to SASPy and Jupyter Notebooks
Session 2822
With the recent introduction of the official SASPy package, it is now trivial to incorporate SAS® into workflows that leverage the simple yet presentationally elegant Jupyter Notebook environment, along with the broader Python data science ecosystem that comes with it. This paper and presentation provide an overview of Jupyter Notebooks for the uninitiated, along with the initial steps and options for connecting to your SAS instance using SASPy. Included along the way are the general principles of passing data between Python DataFrames and SAS data sets, as well as the unique advantages SAS brings to the Notebook workspace. As an occasional contributor to the SASPy library, the presenter also briefly covers the nature of its open-source release and the simple steps you can take to help improve the project as it evolves.
Jason Phillips, The University of Alabama
A Case Study of Mining Social Media Data for Disaster Relief: Hurricane Irma
Session 2695
In the wake of two recent hurricanes, Harvey and Irma, local, state, and federal governments are trying to provide relief to the millions of affected people. With projected property damage in the hundreds of billions of dollars, these natural disasters will have long-lasting effects on their respective areas, and recovery in some areas hit by the storms could take several years. This paper uses social media data, specifically Twitter, to analyze how people in the affected areas reacted to these natural disasters in the days leading up to the storm, during the storm, and after the storm. The goal is to help first responders be better prepared when rescue efforts begin after a massive storm such as a hurricane. For the most recent hurricane, Irma, we collected tweets from specified locations in South Florida and searched for specific terms to identify the various needs of civilians in different cities. Data was collected from Thursday (9/7/2017) to Wednesday (9/13/2017), with the hurricane making landfall on Saturday night. We used SAS® Enterprise Miner™ to analyze the tweets. Techniques such as stemming and lemmatization were used to pre-process the text data. Topic modeling, text clustering, and time series analysis are combined to better understand people's reactions throughout a storm event. This analysis was performed at daily and hourly levels.
Bogdan Gadidov, Kennesaw State University
Linh Le, Kennesaw State University
A Cautionary Tale of Using Machine Learning within Credit Risk in the Utilities Industry
Session 2601
Machine learning is a constantly growing scientific area, enriched with ever more sophisticated and powerful algorithms. The question that arises is what is actually valuable in the business world, and which algorithms can be applied in a business environment such as Credit Risk, which is heavily regulated and for which interpretation of results is a top priority. We address these issues by showcasing a real-life business case and the algorithms that have been used to provide the most accurate results, along with their advantages and disadvantages. In addition to providing an overview of the most popular machine learning algorithms that we use in SAS® Enterprise Miner™, we compare the algorithms' results with the incumbent methodology used within Credit Risk, namely Weight of Evidence (WoE) logistic regression. We identify the benefits that complex algorithms can bring to businesses, as well as the business cases in which they can and should be implemented.
Spiros Potamitis, Centrica
Paul Malley, Centrica
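The incumbent Weight of Evidence approach first bins each predictor and replaces every bin with its WoE value before fitting the logistic regression. A minimal Python sketch of that transformation follows, assuming the common convention WoE = ln(share of goods / share of bads) and hypothetical bin counts; sign conventions differ between institutions, and this is not the authors' code.

```python
import math

def weight_of_evidence(bins):
    """Return {bin_label: WoE} for a dict mapping bin_label -> (goods, bads).

    Convention assumed: WoE = ln(share of all goods in the bin /
    share of all bads in the bin). Bins with zero goods or zero bads
    must be merged or smoothed first, or math.log/division will fail.
    """
    total_good = sum(goods for goods, _ in bins.values())
    total_bad = sum(bads for _, bads in bins.values())
    return {
        label: math.log((goods / total_good) / (bads / total_bad))
        for label, (goods, bads) in bins.items()
    }

def information_value(bins):
    """Information value: sum over bins of (%goods - %bads) * WoE."""
    total_good = sum(goods for goods, _ in bins.values())
    total_bad = sum(bads for _, bads in bins.values())
    woe = weight_of_evidence(bins)
    return sum(
        (goods / total_good - bads / total_bad) * woe[label]
        for label, (goods, bads) in bins.items()
    )

# Hypothetical counts of good and bad accounts per bin of one predictor.
income_bins = {"low": (100, 50), "mid": (200, 50), "high": (300, 20)}
income_woe = weight_of_evidence(income_bins)
```

Because each bin collapses to one monotone-friendly number, the resulting regression coefficients stay easy to explain to regulators, which is the interpretability trade-off the abstract contrasts with more complex algorithms.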
A Fraud Management Solution for Middle Market Banks and Ways to Reduce False Positives
Session 2525
Financial institutions generate enormous amounts of transaction data each day. The pressure on compliance and the need for quick detection of fraud continue to increase. At the same time, these institutions need to reduce losses from penalties and fraud. The challenge lies in how best to combine a select set of rules with modeling that uses data science and machine learning techniques. Suspicious transactions should be flagged with minimal false positives. The process should also maximize productivity and create a degree of seamlessness in both alert creation and investigations. Once compliance and fraud are both addressed, further analysis of customer and transaction data can be performed to gain insights into customer behavior. Such an approach can achieve the following goals: a) Reduce false positives and achieve cost benefits. This outcome also maintains customer satisfaction, because excessive false alerts cause customer attrition for banks, in addition to reputational damage; b) Create new rules and thus stay ahead of fraudsters. Rules can become outdated quickly, so tweaking thresholds and modifying rules is essential; c) Create an end-to-end process from alert generation to case management to reporting; and d) Create a closed-loop system so that data about true fraud can be fed back into the source data for corrective modeling.
Gourish Hosangady, Aithent
A General SAS® Macro to Implement Optimal N:1 Propensity Score Matching within a Maximum Radius
Session 1783
A propensity score is the probability that an individual will be assigned to a condition or group, given a set of baseline covariates at the time the assignment is made. For example, the type of drug treatment given to a patient in a real-world setting might be non-randomly based on the patient's age, gender, geographic location, and socioeconomic status when the drug is prescribed. Propensity scores are used in many types of observational studies to reduce selection bias. Subjects assigned to different groups are matched on these propensity score probabilities rather than on the values of individual covariates. Although the statistical theory behind propensity scores is complex, implementing propensity score matching with SAS® is relatively straightforward. An output data set containing each subject's propensity score can be generated with the LOGISTIC procedure, and a generalized SAS macro can then perform optimized N:1 propensity score matching of subjects assigned to different groups using the radius method. Matching can be optimized either for the number of matches within the maximum allowable radius or for the closeness of the matches within the radius. This presentation provides the general PROC LOGISTIC syntax for generating propensity scores, gives an overview of different propensity score matching techniques, and discusses how to use the SAS macro for optimized propensity score matching using the radius method.
Kathy Fraeman, Evidera
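To make the radius method concrete, here is a minimal Python sketch of N:1 matching within a maximum radius on precomputed propensity scores (for example, the predicted probabilities output by PROC LOGISTIC). It uses a plain greedy, nearest-first strategy without replacement, which is simpler than the optimized matching the macro performs; the function name and subject IDs are hypothetical.

```python
def greedy_radius_match(treated, controls, n=1, radius=0.05):
    """Greedy N:1 matching within a maximum radius on the propensity score.

    treated, controls: dicts of subject_id -> propensity score.
    Returns {treated_id: [control ids, closest first]}.
    Controls are matched without replacement; a treated subject may get
    fewer than n matches (or none) if too few controls lie within radius.
    """
    available = dict(controls)
    matches = {}
    # Visit treated subjects in a fixed (sorted) order so results are reproducible.
    for t_id in sorted(treated):
        t_score = treated[t_id]
        # All still-unmatched controls inside the radius, nearest first.
        candidates = sorted(
            (abs(c_score - t_score), c_id)
            for c_id, c_score in available.items()
            if abs(c_score - t_score) <= radius
        )
        chosen = [c_id for _, c_id in candidates[:n]]
        for c_id in chosen:
            del available[c_id]
        matches[t_id] = chosen
    return matches
```

For example, `greedy_radius_match({"T1": 0.30, "T2": 0.62}, {"C1": 0.28, "C2": 0.33, "C3": 0.60, "C4": 0.90, "C5": 0.31}, n=2, radius=0.05)` gives T1 the two closest controls and T2 the only control inside its radius. A greedy pass depends on processing order, which is exactly why an optimized approach, as in the paper, can find more or closer matches.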
A Macro for Ensuring Data Integrity When Converting SAS® Data Sets
Session 1778
This paper describes the %COPY_TO_NEW_ENCODING macro, which ensures that a SAS® data set does not experience data loss when it is copied to a new encoding. The focus is on the FORMAT procedure's CNTLOUT data sets and the problems that can occur when converting from a single-byte encoding such as Windows Latin1 to a multi-byte encoding such as UTF-8.
Richard Langston, SAS
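The core risk is that SAS character variable lengths are measured in bytes: a character that takes one byte in Windows Latin1 can take two or three bytes in UTF-8, so a value (for example, a LABEL in a CNTLOUT data set) that fit before conversion can be truncated after it. A minimal Python sketch of that check follows, using Python's `cp1252` codec as a stand-in for Windows Latin1; the function names are hypothetical and this is not the macro's logic.

```python
def values_at_risk(values, byte_length, target="utf-8"):
    """Return the values whose encoded size in `target` exceeds byte_length.

    byte_length plays the role of a SAS character variable's length,
    which is counted in bytes rather than characters.
    """
    return [v for v in values if len(v.encode(target)) > byte_length]

def required_length(values, target="utf-8"):
    """Smallest byte length that holds every value in the target encoding."""
    return max((len(v.encode(target)) for v in values), default=0)

# Hypothetical format labels that all fit in 10 bytes under Windows Latin1.
labels = ["Cafe", "Café", "Price in €"]
```

Here every label fits in 10 bytes in `cp1252`, but "Price in €" needs 12 bytes in UTF-8, so copying it into a 10-byte variable would silently lose data unless the length is increased first.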
A Macro for Last Observation Carried Forward
This paper describes an application of the last observation carried forward (LOCF) method [1,2,3]. When data is in a longitudinal structure, the last nonmissing observed value is used to fill in missing values at later time points. There are two simple loops inside this macro: the k-loop iterates over the parameter variables, and the i-loop iterates over the records that are consecutively missing for each parameter.
Jonson Jiang, inVentiv Health
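The LOCF rule itself is easy to state in code. The following minimal Python sketch assumes records already sorted by subject and time point, with `None` marking a missing value; the outer loop over parameters loosely mirrors the macro's k-loop and the inner loop over records its i-loop, but the function and field names are hypothetical.

```python
def locf(records, params, subject_key="subject"):
    """Fill missing values (None) with the last nonmissing value per subject.

    records: list of dicts sorted by subject and time point.
    params:  names of the parameter variables to fill.
    Returns a new list; the input records are not modified.
    """
    out = [dict(r) for r in records]
    for param in params:                      # one pass per parameter
        last_subject, last_value = None, None
        for row in out:                       # walk the records in time order
            if row[subject_key] != last_subject:
                # New subject: never carry a value across subjects.
                last_subject, last_value = row[subject_key], None
            if row[param] is None:
                row[param] = last_value       # stays None if nothing observed yet
            else:
                last_value = row[param]
    return out

# Hypothetical longitudinal data with gaps.
visits = [
    {"subject": 1, "visit": 1, "sbp": 120, "dbp": 80},
    {"subject": 1, "visit": 2, "sbp": None, "dbp": 82},
    {"subject": 1, "visit": 3, "sbp": None, "dbp": None},
    {"subject": 2, "visit": 1, "sbp": None, "dbp": 75},
    {"subject": 2, "visit": 2, "sbp": 130, "dbp": None},
]
```

Note that subject 2's first SBP stays missing: LOCF only carries values forward within a subject, never backward or across subjects.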
A Macro That Creates U.S. Census Tracts Keyhole Markup Language Files for Google Map Use
Session 2418
This paper introduces a macro that generates Keyhole Markup Language (KML) files for U.S. census tracts. The generated KML files can be used directly by Google Maps to add customized census tract layers with user-defined colors and transparencies. When someone clicks on a census tract layer in Google Maps, customized information is shown. To use the macro, the user needs only to prepare a simple SAS® input data set and download the related KML files from the U.S. Census Bureau. The paper includes all the SAS code for the macro and provides examples that show how to use the macro and how to display the KML files in Google Maps.
Ting Sa, Cincinnati Children's Hospital Medical Center
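For readers unfamiliar with KML, the sketch below shows the kind of structure such a file contains: a style whose `PolyStyle` color carries both color and transparency, and a placemark whose description is typically shown in the pop-up when the polygon is clicked. It is a minimal Python illustration with a hypothetical tract ID and coordinates, not the paper's SAS macro, which works from the Census Bureau's own KML files.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def kml_color(red, green, blue, alpha):
    """KML colors are 8 hex digits in aabbggrr order (alpha, blue, green, red)."""
    return f"{alpha:02x}{blue:02x}{green:02x}{red:02x}"

def tract_kml(tract_id, description, ring, rgba):
    """Return a KML document (as a string) with one styled polygon placemark.

    ring: list of (longitude, latitude) pairs; the ring is closed by
    repeating the first point if the caller did not already do so.
    rgba: (red, green, blue, alpha), each 0-255; alpha sets transparency.
    """
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]
    kml = ET.Element("kml", xmlns=KML_NS)
    doc = ET.SubElement(kml, "Document")
    style = ET.SubElement(doc, "Style", id="tractStyle")
    poly_style = ET.SubElement(style, "PolyStyle")
    ET.SubElement(poly_style, "color").text = kml_color(*rgba)
    placemark = ET.SubElement(doc, "Placemark")
    ET.SubElement(placemark, "name").text = tract_id
    # Text typically shown in the pop-up when the polygon is clicked.
    ET.SubElement(placemark, "description").text = description
    ET.SubElement(placemark, "styleUrl").text = "#tractStyle"
    polygon = ET.SubElement(placemark, "Polygon")
    boundary = ET.SubElement(polygon, "outerBoundaryIs")
    linear_ring = ET.SubElement(boundary, "LinearRing")
    ET.SubElement(linear_ring, "coordinates").text = " ".join(
        f"{lon},{lat}" for lon, lat in ring
    )
    return ET.tostring(kml, encoding="unicode")
```

A half-transparent red tract, for instance, uses `rgba=(255, 0, 0, 128)`, which `kml_color` turns into `800000ff`; the reversed byte order is a common source of "wrong color" surprises in KML.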
A Need For Speed: Loading Data via the Cloud
Session 1733
The value of using data effectively is universally accepted, and analytical methods such as machine learning make it possible to quickly and automatically produce models that can analyze bigger, more complex data and deliver faster, more accurate results. However, before any such value can be realized, the data must be collected, moved, cleansed, transformed, and stored as efficiently and quickly as possible. SAS® Viya® not only addresses complex analytical challenges but can also be used to speed up data management processes. In this paper, we look at cloud infrastructure, code enhancements, and storage technologies that help you achieve this goal.
Hadley Christoffels, Sopra Steria
A One-Sided Fisher's Exact Test: A Tail of Clinical Worsening
In many disease areas, therapeutic intervention is designed to prevent clinical worsening of the disease, thereby increasing the quality and length of life for patients with life-threatening diseases. While this research hypothesis might be simple to state in non-statistical terms, many endpoints can be used to provide insight into this question. For diseases that progress rapidly, a new therapy must delay clinical progression for the drug to be deemed efficacious. This suggests that the proportion of patients who experience clinical worsening while on an active study drug should be lower than the proportion experiencing clinical worsening in a control group. This presentation assumes the use of Fisher's Exact Test. The focus is on the structure of the input data set used with the FREQ procedure to obtain the inferential statistics. In a one-sided setting, the structure and order of the treatment and response variables are essential to understanding whether a right-tailed or a left-tailed p-value is appropriate, so that the p-value is consistent with the underlying research hypotheses. This presentation uses a placebo-controlled clinical study to show how the order of the treatment and response variables affects the statistical hypotheses associated with a one-sided Fisher's Exact Test. While the hypotheses themselves might differ, the research question remains the same: does the study drug reduce the incidence of clinical worsening?
Bill Coar, Axio Research
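Why row and column order matters can be seen directly from the hypergeometric distribution behind Fisher's Exact Test. In this minimal Python sketch, the one-sided p-values are defined on the (1,1) cell of the 2x2 table, which is, to our understanding, also how PROC FREQ labels its left-sided and right-sided results. The counts are hypothetical, and the table layout (drug row first, "worsened" column first) is an assumption.

```python
from math import comb

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-values for the 2x2 table [[a, b], [c, d]].

    Conditioning on the row and column totals, the (1,1) cell X follows a
    hypergeometric distribution. Returns (left, right) with
    left = P(X <= a) and right = P(X >= a).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    denom = comb(n, col1)

    def prob(x):
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    left = sum(prob(x) for x in range(lo, a + 1))
    right = sum(prob(x) for x in range(a, hi + 1))
    return left, right

# Hypothetical trial: rows = (drug, placebo), columns = (worsened, not worsened).
drug_first = fisher_one_sided(2, 18, 9, 11)
placebo_first = fisher_one_sided(9, 11, 2, 18)
```

With the drug row first, "the drug reduces worsening" predicts a small (1,1) count, so the left-tailed p-value (about 0.015 here) is the relevant one. Putting the placebo row first turns the same hypothesis into a right-tailed test with the same p-value, which is the ordering issue the presentation addresses.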
A Periodic Table of Introductory SAS® ODS Graphics Examples
This e-poster presents "A Periodic Table of Introductory SAS® ODS Graphics Examples," the author's tongue-in-cheek take on "Towards A Periodic Table of Visualization Methods for Management," the classic 2007 data visualization paper. The poster presents 200+ thumbnails of slides from the author's introductory SAS Output Delivery System (ODS) Graphics hands-on workshop. When you hold your pointer over a thumbnail, it expands to reveal more fully the SAS code snippets and output images for the programming exercises. Click an image, and it expands to fill the entire monitor. You might come to be amused, but don't be surprised if you leave with a new SAS ODS Graphics trick or two!
A Programming Approach to Implementing SAS® Metadata-Bound Libraries for SAS® Data Set Encryption
Session 2181
SAS® metadata-bound libraries provide a robust means of protecting SAS data without compromising the accessibility of the secured data. Although seamless access to metadata-bound data, controlled through metadata authorizations, is an advantage of using metadata-bound libraries, implementing and maintaining them can be cumbersome for administrators. Administrators face complexities in maintaining and managing metadata-bound libraries if multiple users create and own SAS data. Proper planning is required to handle the complexities around heterogeneous directory ownership, password and encryption key management, and monitoring of unencrypted SAS data sets. This paper explains those complexities and provides best-practice recommendations for both administrators and SAS programmers working with SAS data. The paper provides code that can be used to empower SAS programmers to define metadata-bound libraries through a controlled process. Code is also provided for administrators to monitor and report on unencrypted SAS data. In addition, the paper explains the nuances of identity resolution and effective authorizations when accessing metadata-bound data through SAS/SHARE® libraries. It then provides a step-by-step approach for creating and managing the binding of a SAS library that is accessed through a SAS/SHARE server.
Deepali Rai, SAS
A Quick Bite on SAS® Studio Custom Tasks
Session 2626
Custom tasks in SAS® Studio help users generate reports in a point-and-click user interface (UI). As a developer, you might need to create custom tasks for programmers who are unfamiliar with SAS® or new to it. After you create a task, you might want to share it with other users at your site. Tasks are saved as CTM files. You might also want to share CTK files, which are CTM files with some of the roles and options pre-selected. You can share a file through email or simply by placing it in a shared area. The user simply runs the CTM file, and it generates the UI. As the user navigates through the UI, the SAS code is generated simultaneously, which helps the user review the code as it relates to the UI. The task framework is flexible: all tasks use the same common task model and the Apache Velocity Template Language. A task consists of these elements: Registration, Metadata, UI, Dependencies, Requirements, and Code Template. We use all of these elements to build our example custom task. Once the example custom task is built, we can execute it and review the output (the user interface and the SAS code). The UI contains two tabs, Data and Options. On the Data tab, we can choose the variables and the sort variables. On the Options tab, we can limit the number of observations on which to base the output. After attending this presentation, a SAS developer can build and share their first custom task.
Michael Kola, SalientCRGT
A Quick Guide for Your Data Load Problems in SAS® Visual Analytics
Session 2796
SAS® Visual Analytics is a user-friendly, powerful, and intuitive visualization tool that you can use for data exploration and reporting. However, before you can work with any data, that data has to be made available to the SAS Visual Analytics environment. This paper introduces the high-level architecture of SAS Visual Analytics, the connection setup between the SAS Visual Analytics servers, the kinds of data sources that can be accessed, and the different ways of extracting data into SAS Visual Analytics. Most importantly, this presentation addresses the most common errors encountered when loading data into SAS Visual Analytics. The paper also discusses useful options for automating the start of SAS® LASR™ Analytic Servers, loading and refreshing SAS Visual Analytics tables, and directing signature files. Finally, the paper includes a checklist that can be used to ensure that the appropriate permissions are in place at the metadata and operating system levels to perform data load operations effortlessly. The techniques and troubleshooting methods discussed in this paper are based on a distributed SAS Visual Analytics installation on UNIX, but the methodologies discussed here can be useful for troubleshooting common SAS Visual Analytics issues in any environment.
Swetha Vuppalanchi, Iconsoft Inc.
A SAS® Macro Implementation of a Test for Co-directional Interactions
Session 2389
In two-factor factorial experiments, caution is often advised in interpreting the main effects when the interaction term is significant. However, when the interaction is co-directional relative to a particular factor, the main effect test for that factor can be considered. Gerard and Sharp (2011) developed a formal method for testing for co-directional interactions based on union-intersection and intersection-union techniques. A SAS® macro was developed to implement this method in SAS. This presentation uses examples to demonstrate the method and the macro. The macro can be called within or outside of the GLM procedure, and it has four required and two optional input parameters. Depending on the user's request, the macro provides either a brief ANOVA table that incorporates the test of the co-directional interaction for each factor or a detailed analysis, including estimates of simple effects, corresponding standard errors, test statistics, and p-values for each 2x2 factorial arrangement of the two factors.
Julia Sharp, Colorado State University
Patrick Gerard, Clemson University
Bereket Tesfaldet, FDA
A SAS® Macro That Uses PRX Functions to Verify Delimited Text File Formatting
Session 2605
Importing delimited text files can often seem like a black-box process. If the resulting data set contains errors, how do you know? Some errors are easy to spot, such as unspecified extra fields showing up in the data set. Other errors, like data ending up in the wrong field, can be more difficult to identify, especially when they occur well into the resulting data set. Fortunately, every correctly formatted delimited text file complies with one of two formats, each with its own text pattern. If every cell of the delimited text file matches the text pattern, then the file is said to be in that format. The text-qualified format is the more complex of the two and the one that this paper concentrates on. The non-text-qualified format is very simple and receives only a brief mention. This paper introduces a SAS® macro that uses SAS PRX functions to verify that a delimited text file is in text-qualified format. If the macro reports that it is, then the file can be imported without error.
Paul Genovesi, State of Washington/DSHS/RDA
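The idea of validating every line against a text pattern can be sketched with Python's `re` module, whose syntax is close to the Perl-style patterns used by SAS PRX functions. This minimal sketch assumes a comma delimiter, a double-quote text qualifier with embedded quotes doubled, and no quoted fields spanning multiple lines; the pattern and function are hypothetical, not the paper's macro.

```python
import re

# One field: either text-qualified (wrapped in double quotes, with any embedded
# quote doubled as "") or unqualified (no commas and no quotes, possibly empty).
FIELD = r'(?:"(?:[^"]|"")*"|[^",]*)'
RECORD = re.compile(rf"{FIELD}(?:,{FIELD})*")
QUOTED = re.compile(r'"(?:[^"]|"")*"')

def check_lines(lines, expected_fields):
    """Return the 1-based numbers of lines that are malformed or that have
    a field count different from expected_fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if not RECORD.fullmatch(line):
            bad.append(number)
            continue
        # Once the line is valid, removing the quoted fields leaves only
        # delimiter commas, so the field count is commas + 1.
        if QUOTED.sub("", line).count(",") + 1 != expected_fields:
            bad.append(number)
    return bad

# Hypothetical file contents: a header, one good record, three bad ones.
sample = [
    'id,name,comment',
    '1,"Smith, Jane","said ""hi"""',
    '2,"Doe, John,missing close quote',
    '3,Bad"quote,x',
    '4,only two',
]
```

Checking both the pattern and the field count catches the two failure modes the abstract describes: stray fields, and data shifted into the wrong column by an unbalanced quote.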
A SAS® Program for a Model-Based Stratification Method
Session 2574
Some important factors, such as baseline characteristics, are not always balanced between comparison groups in a clinical study. The imbalance often leads to biased estimates. This paper presents a SAS® program developed for a new approach that combines simple stratification with model-based covariate adjustment: when there is no interaction between the stratified factor and the treatment, the approach adjusts for unbalanced factors using propensity scores and stratifies the factor by subgroup information. The treatment effect coefficients estimated from covariate-adjusted regression models for each stratum are combined using a weighted method based on either the sample size of each treatment group in each stratum or the inverse square of the standard error of each stratum coefficient. When interaction is present, subgroup analysis results are reported. The analytical approach is illustrated using different statistical models, including analysis of covariance, survival analysis, and logistic regression, based on clinical data. Potential estimation bias and testing efficiency in the imbalance adjustment are evaluated for different types of clinical outcomes.
Xiaoli Lu, VA CSPCC
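Weighting each stratum coefficient by the inverse square of its standard error is standard inverse-variance pooling. The minimal Python sketch below shows that option alongside a sample-size-weighted alternative; the pooled standard errors assume independent strata, and reducing "the sample size of each treatment group" to one size per stratum is a simplifying assumption, not the paper's exact scheme.

```python
import math

def combine_inverse_variance(estimates, std_errors):
    """Pool stratum coefficients with weights w_i = 1 / SE_i**2.

    Returns (pooled_estimate, pooled_se), with pooled_se = sqrt(1 / sum(w_i)),
    assuming the strata are independent.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)

def combine_by_size(estimates, std_errors, sizes):
    """Pool with weights proportional to a per-stratum sample size.

    The standard error propagates the fixed weights: sqrt(sum((n_i/N)**2 * SE_i**2)).
    """
    total = sum(sizes)
    pooled = sum(n * b for n, b in zip(sizes, estimates)) / total
    se = math.sqrt(sum((n / total) ** 2 * s ** 2 for n, s in zip(sizes, std_errors)))
    return pooled, se
```

For two hypothetical strata with coefficients 0.5 and 0.3 and standard errors 0.1 and 0.2, inverse-variance weights of 100 and 25 pull the pooled estimate to 0.46, toward the more precise stratum.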
A SAS® Visual Analytics Solution for the Centers for Medicare and Medicaid Services
Session 2700
Government agencies have estimated that opioids now kill more Americans than car accidents. In this e-poster session, attendees learn how to use SAS® Studio, SAS® Visual Analytics, and SAS® Visual Statistics to quickly prototype SAS® solutions that help them better understand the opioid crisis in America as it relates to Medicare programs that provide prescription drugs. Using U.S. Federal Government Public Use Files (PUFs), attendees are led through the process of accessing PUF data using APIs, exploring the data, building clustering and machine learning models, and creating simple reports in order to gain insights into this pressing government challenge. Customers who use SAS for population health analytics will find this session particularly useful, because it makes extensive use of data from the Centers for Medicare and Medicaid Services (CMS), the Centers for Disease Control and Prevention (CDC), and the U.S. Census Bureau. Handouts with step-by-step instructions are provided so that attendees can reproduce the analysis with PUF data on their own and even incorporate it into their own work. As a result of attending this session, attendees will gain a better understanding of the opioid epidemic, as well as a clear sense of how prototypes built with SAS can improve the overall quality of a solution.
Raymond Mierwald, Visual Connections, LLC
Shanika Palm, Visual Connections, LLC
Jonathan Mccopp, SAS
Manuel Figallo, SAS
A Simple Approach to Text Analysis Using SAS® Functions
Session 2557
Analysts rely on unstructured text data for decision making more than ever before. Text data mining is the process of deriving actionable insights from a lake of text. It discovers unseen patterns of words in data, or known words or textual patterns in undetected records in databases. SAS® has its own dedicated text mining tools, such as SAS® Text Miner. However, cost and availability keep many general users from using them. We developed a simplified but robust approach to text analysis that combines three simple SAS string functions, namely INDEX, INDEXW, and SOUNDEX, in the Base SAS® macro environment. The application was further condensed to a few lines of SAS code and tested with several large-scale data sets from epidemiological studies to ascertain its applicability, reliability, and scalability. This paper explains our methodology and discusses its applicability in diagnosing symptom-based disease.
Wilson Suraweera, Centre for Global Health Research, LKSKI, St Michael's Hospital, University of Toronto
Jaya Weerasooriya, Ministry of Health and Long Term Care, Government of Ontario
Neil Fernando, Canadian Imperial Bank of Commerce
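The three building blocks can be illustrated with rough Python analogs: a substring search that returns a 1-based position or 0 (like INDEX), a whole-word search (like INDEXW with its default blank delimiters), and a phonetic code that tolerates misspellings. The sketch implements the classic four-character American Soundex; details of SAS's own SOUNDEX output (such as code length) may differ. The `mentions` symptom flag is hypothetical, not the authors' method.

```python
import re

# American Soundex digit groups; vowels, Y, H, and W have no code.
CODES = {letter: digit
         for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                                "4": "L", "5": "MN", "6": "R"}.items()
         for letter in letters}

def soundex(word):
    """Classic four-character American Soundex code (e.g. Robert -> R163)."""
    letters = [ch for ch in word.upper() if ch.isalpha()]
    if not letters:
        return ""
    result, prev = [letters[0]], CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "HW":      # H and W do not separate equal codes; vowels do
            prev = code
    return ("".join(result) + "000")[:4]

def sas_index(source, excerpt):
    """Like INDEX: 1-based position of excerpt in source, 0 if absent."""
    return source.find(excerpt) + 1

def sas_indexw(source, word):
    """Rough INDEXW analog: 1-based position of word delimited by blanks or string ends."""
    match = re.search(rf"(?<!\S){re.escape(word)}(?!\S)", source)
    return match.start() + 1 if match else 0

def mentions(note, keyword):
    """Hypothetical symptom flag: exact word match, else a Soundex match on any word."""
    if sas_indexw(note.lower(), keyword.lower()):
        return True
    target = soundex(keyword)
    return any(soundex(tok) == target for tok in re.findall(r"[A-Za-z]+", note))
```

With this flag, a note reading "Pt reports feaver and cough" is still matched to "fever", because both words encode to F160. The same property produces false positives (for example, "fiber" also encodes to F160), so a phonetic pass is best used alongside the exact-word search rather than instead of it.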
A Simple Methodology for Customer Classification in Two Dimensions
This paper shows a simple way to classify customers in two dimensions. Several variables were combined into only two major characteristics (customer attractiveness and profitability), which made it possible to identify potential customers to whom credit could be granted. The methodology mainly uses the REG, GPLOT, and GINSIDE procedures and some DATA steps, enabling the results to be visualized in a simple scatter plot.
Adriana Mara Guedes Barbosa, Caixa Economica Federal
Alan Ricardo Da Silva, University of Brasilia
A Study of Modeling Approaches for Predicting Dropout in a Business College
Graduation rates are of interest to stakeholders in higher education, including educational researchers and policy makers. It is widely acknowledged that retention rates drive graduation rates. This study explores the use of predictive analytics in an academic institution to improve retention rates, using data from the Louisiana State Business College as an example. The study uses SAS® Enterprise Miner™ to build predictive models that identify students at risk of dropping out at different stages of their program. Institutional administrators can use the results from the models to identify students who need advising and remedial action, in order to help retain students and lead them to graduation. Preliminary findings from the study show that an ensemble model performs best for predicting student dropout. In addition, the predictive models could be further improved by collecting additional information about behavioral issues and study habits during the first year. Beyond its practical implications, this study also shows the effectiveness of analytics tools in improving graduation rates.
Helmut Schneider, Louisiana State University
Xuan Wang, Louisiana State University
A Survey of Some of the Most Useful SAS® Functions
SAS® functions provide amazing power to your DATA step programming. Some of these functions are essential, and some of them save you from writing volumes of unnecessary code. This talk covers some of the most useful SAS functions. Some of these functions might be new to you, and they will change the way you program and approach common programming tasks. The majority of the functions described in this talk work with character data. Some functions search for strings, and others find and replace strings or join strings together. Still others measure the spelling distance between two strings (useful for fuzzy matching). Some of the newest and most amazing functions are not functions at all, but call routines. Did you know that you can sort values within an observation? Did you know that you can identify not only the largest or smallest value in a list of variables, but also the second, third, or nth largest or smallest value? Knowing the functions described here will make you a much better SAS programmer.
Ron Cody, Camp Verde Associates
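Two of the ideas above, a spelling distance for fuzzy matching and the nth largest value in a list, are easy to show in a few lines. The minimal Python sketch below uses Levenshtein edit distance (the measure behind SAS's COMPLEV function; SPEDIS and COMPGED use different costing) and mimics the spirit of the LARGEST function and the CALL SORTN routine; the Python names are hypothetical analogs, not SAS APIs.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def nth_largest(k, *values):
    """k-th largest non-missing value (None counts as missing), or None."""
    present = sorted((v for v in values if v is not None), reverse=True)
    return present[k - 1] if 1 <= k <= len(present) else None

def sort_row(row, names):
    """Sort the (numeric) values of the named fields of one record, in place."""
    ordered = sorted(row[n] for n in names)
    for n, v in zip(names, ordered):
        row[n] = v
```

For instance, `levenshtein("kitten", "sitting")` is 3, and `nth_largest(2, 5, None, 9, 7)` skips the missing value and returns 7, the same "ignore missing values" behavior that makes these functions convenient across a list of variables.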
A Transition from SAS® on a PC to SAS on Linux: Dealing with Microsoft Excel and Microsoft Access
Session 2753
Transitioning from SAS® installed on a PC running Microsoft Windows to a SAS® Grid environment running on Linux presents many challenges for analysts, programmers, and IT professionals alike. Aside from navigating the idiosyncrasies of each operating system, like case sensitivity and the direction of slashes in file pathnames, more significant issues arise, such as handling data from commonly used Microsoft Office products. This discussion focuses on the challenges of importing and exporting data files from Microsoft Excel and Microsoft Access between SAS software on Windows and SAS Grid software (specifically SAS® Enterprise Guide® and SAS® Studio) on Linux. The talk outlines the difficulties encountered and the solutions proposed, in order to provide tips for other SAS programmers who must make this type of transition between SAS environments.
Jesse Speer, RTI International
Accelerate IoT Insights at the Intelligent Edge
Session 1993
The Internet of Things (IoT) unlocks efficiencies and innovation from connected sensors, assets, and devices. Organizations now have access to both immediacy and depth of analysis by using edge analytics. Join Mark Barnum to see how Hewlett Packard Enterprise enables instantaneous, secure access to analytics for real-time decision-making, immediate action, and greater device control.
Mark Barnum, Hewlett Packard Enterprise
Accelerate Your Analytics with SAS® and Teradata Using Disparate Data Sources
Analytics today often involves working with multiple data types from multiple storage types, including traditional relational database management systems (RDBMSs) such as Teradata, Oracle, DB2, Microsoft SQL Server, and MySQL; file-system-type storage such as Apache Hadoop and Amazon Simple Storage Service (Amazon S3); and NoSQL sources such as MongoDB and Cassandra. Sourcing the data from a federated data layer brings its own share of issues, such as having to know all the details for every data platform (IP address, port numbers, logon details, data access mechanism, data query languages, and so on). Other drawbacks of a federated data space are that the data often needs to be replicated and stored (using up valuable disk space), and you might no longer be able to leverage processes that speed up your analytics (such as in-database or on-platform processing). In this presentation, we present a solution that addresses all these issues. Teradata QueryGrid combines the most comprehensive in-database solution from SAS with the Teradata RDBMS. With Teradata QueryGrid, you can access data from a wide variety of data sources using a common language (SQL), abstracting away the gritty connection details, all while using the tremendous performance of SAS® running inside the Teradata database.
Paul Segal, Teradata
Rosanne Sinatra, Teradata
Accelerate your Data Preparation with SAS® Code Accelerator
Session 2635
SAS® In-Database Code Accelerator enables DS2 code to execute inside the database without translation to another language (such as SQL). This dramatically accelerates your data preparation steps, because you can now make use of the multi-threading capabilities of a massively parallel platform (such as the Teradata relational database management system [RDBMS] or the Apache Hadoop platform). In this short presentation, we introduce DS2 and its new features to those of you who are unfamiliar with it, and we show how performant it can be with a live demonstration on the Teradata RDBMS.
Paul Segal, Teradata
Accelerate your End-to-End Enterprise Decisions with SAS® Viya®
Session 2298
Do you need to add speed to your enterprise decision systems? SAS® Viya® to the rescue. This paper guides you through the entire process of building and deploying decision systems that integrate advanced analytical models with precise business rules, using software available only on SAS Viya. This software includes SAS® Visual Data Mining and Machine Learning, SAS® Model Manager, and SAS® Decision Manager. Decision processes can be deployed to the SAS® Cloud Analytic Services server, to Teradata and Apache Hadoop databases, to SAS® Micro Analytic Service for on-demand processing, and to SAS® Event Stream Manager for integration into high-speed data streams.
David Duling, SAS
Steve Sparano, SAS
Accessibility and ODS Graphics: Seven Simple Steps to Section 508 Compliance Using SAS® 9.4M5
How do you create data visualizations that comply with the Section 508 amendment to the United States Workforce Rehabilitation Act, the Web Content Accessibility Guidelines (WCAG), and other accessibility standards? It's easy when you use the new Output Delivery System (ODS) Graphics accessibility features in SAS® 9.4M5. This paper defines seven simple steps to accessible data visualizations. The accessibility requirements satisfied by each step are explained, and additional references are provided. The paper includes sample code, tested by the SAS® accessibility team, for real-world examples. It also includes a handy one-page checklist that you can print separately for future reference.
Ed Summers, SAS
Julianna Langston, SAS
Dan Heath, SAS
Accessing Data from Microservices in SAS® 9.4 Using DS2 Built-In Packages
Session 2024: This short session shows how to use the DS2 built-in packages available as of SAS® 9.4M3 for HTTP GET and POST operations, JSON parsing, and hash mapping to access data from a microservice. The example code details how to get an HTTP response from a service, parse the returned JSON, normalize the values into data rows, and then use the SAS® DS2 hash package to write those rows into a data set for further processing.
James Kelley, SAS
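The workflow in this abstract (get a response, parse the JSON, normalize it into rows, and load the rows through a hash) can be sketched outside SAS. The minimal Python sketch below is an analogy, not the DS2 package API: it skips the HTTP call, uses a hypothetical `payload` string in place of a service response, and uses a plain `dict` in the role of the DS2 hash package. The field names (`orders`, `customer`, `total`) are invented for illustration.

```python
import json

# Hypothetical response body standing in for what a microservice might return.
payload = """
{"orders": [
  {"id": 1, "customer": {"name": "Ann"}, "total": 12.5},
  {"id": 2, "customer": {"name": "Bo"},  "total": 7.0}
]}
"""

def normalize(body):
    """Parse the JSON text and flatten each nested order into one flat row."""
    doc = json.loads(body)
    rows = []
    for order in doc["orders"]:
        rows.append({
            "id": order["id"],
            "customer_name": order["customer"]["name"],
            "total": order["total"],
        })
    return rows

def build_hash(rows, key):
    """Index rows by a key column, the role a hash object plays before output."""
    table = {}
    for row in rows:
        table[row[key]] = row
    return table

rows = normalize(payload)
by_id = build_hash(rows, "id")
```

In DS2 the same steps would go through the built-in HTTP, JSON, and hash packages; the session's own example code is the reference for those calls.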
Administering the Administrators: Who Watches the Watchers?
The SAS® Platform administrator's challenge: how best to manage the myriad ways in which SAS® users use a SAS Platform? If that challenge could be shared (with power users, project teams, business leads, and so on), then those users are empowered. But the challenge then becomes how to administer them to ensure that they can manage their own patch without adversely affecting the remainder of the SAS Platform. Other examples include implementing and managing a traditional Dev-Test-Prod lifecycle; managing different SAS solutions (SAS® Credit Risk for Banking, SAS® Fraud Detection, and so on); and granting different business teams self-management of their sandpit (SAS® Visual Analytics, SAS® LASR™ Analytic Servers, SASWORK space) while protecting critical processes. There are benefits and consequences of a centralized management strategy, as well as of a delegated management strategy. So, how best to set up the management of these platforms? This paper covers managing multiple environments, on both separate and shared infrastructures, a scenario familiar to most SAS Platform administrators. However, the focus of this paper is to demonstrate how to share management of a single SAS Platform, and the tools (SAS® Management Console, SAS® Visual Analytics Administration, and so on) and SAS concepts (Access Control Templates, and so on) that make this possible.
Andrew Howell, ANJ Solutions
Adopt A Pet (Elephant?): Are You Enjoying Your Apache Hadoop Investment?
Session 1684: Hundreds of companies have embraced Apache Hadoop as a component of their data storage and analytics strategy. But they have yet to realize the return on investment (ROI). Many have been focused on the savings from a move to cheaper hardware and software infrastructure. Others thought that having large volumes of unstructured distributed data would lend itself to new and innovative ways to analyze that data. Their hope is to drive out valuable business insights, resulting in better decisions for the company. Seriously, have companies really just resorted to an adopt-a-pet strategy to save a few bucks? What are you doing to achieve full adoption? Come and learn the path to success!
Rex Pruitt, SAS
Advanced Graphs Using Axis Tables
Session 2180: A key feature of the graphs used for data analysis or clinical research is the inclusion of textual data in the graph, usually aligned with the X or Y axis. The axis table statements available in the SGPLOT procedure make it easy to add such data to graphs. You can also use axis tables to create custom axes, multiple axes, and even multidimensional hierarchical axes. This presentation describes how to use axis tables to create complex graphs.
Sanjay Matange, SAS
Advanced Programming Techniques Using the DS2 Procedure
DS2 is a SAS® proprietary programming language appropriate for advanced data manipulation. In this paper, we explore the advantages of using PROC DS2 over DATA step programming in SAS. We explore various programming techniques within PROC DS2 for solving real-world SAS programming problems, such as using hash objects to improve performance, calling FCMP user-defined functions and subroutines from DS2, creating and parsing JSON text in DS2, and using DS2's powerful and flexible matrix programming capability. We explore executing queries in-database using PROC FEDSQL and using embedded FedSQL to generate queries at run time that exchange data interactively between DS2 and supported databases. This enables processing data from multiple tables in different databases within the same query, thereby drastically reducing processing times and improving performance. We explore the use of DS2 for creating, bulk loading, manipulating, and querying tables efficiently. We compare and contrast traditional SAS and DS2 programming techniques and methodologies, and show how certain programming tasks can be accomplished with PROC DS2 at the cost of slightly more complex code but with huge performance benefits. We also address when it makes the most sense to use DS2, and we benchmark traditional programming techniques against PROC DS2 for statistically intensive calculations in SAS.
Viraj Kumbhakarna, MUFG Union Bank
Algorithmic Marketing Attribution and Conversion Journey Analysis
Session 2111: Everyone has a marketing attribution problem, and all attribution measurement methods are wrong. We hear that all the time. Like all urban myths, they are founded on truth. Most organizations believe they can do better on attribution. They all perceive gaps, for example, missing touchpoint data, multiple identities across devices, arbitrary decisions on weightings for rules, and uncertainty on what actions to drive from results. Broadly speaking, the holy grail of media measurement is to analyze the impact and business value of all company-generated marketing interactions across the complex customer journey. Our goal is to take a transparent approach in discussing and demonstrating how SAS® is building data-driven marketing technology to help progress past typical attribution methods, and make the business case for customer journey optimization. Being SAS, we advocate an analytic approach to addressing the operational and process-related obstacles that we commonly hear from customers. We want to treat them as two sides of the same coin. The output of attribution analytics informs marketers about what touchpoints and sequence of activities drive conversion. This leads marketers to make strategic decisions about future investment levels, as well as more tactical decisions about what activities to run. In an ideal world, the results of subsequent actions are fed back into the attribution model to increase not only its explanatory power, but also its predictive abilities.
Suneel Grover, SAS
Malcolm Lightbody, SAS
Alternative Variance Parameterizations in Count Data Models
Session 2694: SAS/STAT® and SAS/ETS® software have several procedures for working with count data based on the Poisson or negative binomial distributions. In particular, the GENMOD and GLIMMIX procedures offer the most conventional approaches for estimating model coefficients and assessing goodness of fit, and also for working with correlated data. In addition, the COUNTREG procedure includes the Conway-Maxwell-Poisson distribution with a statement to model dispersion and the negative binomial distribution with two different variance functions. The FMM procedure includes the generalized Poisson distribution, which accounts for overdispersion in count data. However, the ability of SAS® to model count data with other distributions, and in particular to model the dispersion parameter (and, as a result, the variance function), can be enhanced with programming statements entered into the NLMIXED procedure.
Robin High, University of Nebraska Medical Center
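For reference, the variance functions most often contrasted in this setting can be written as below, with mean $\mu_i = \exp(x_i^\top\beta)$ and dispersion parameter $\alpha \ge 0$. These are the standard textbook forms; the NB1/NB2 labels are assumed here to correspond to the "two different variance functions" the abstract attributes to the COUNTREG procedure, and the paper may parameterize them differently.

```latex
\begin{aligned}
\text{Poisson:} \quad & \operatorname{Var}(Y_i \mid x_i) = \mu_i \\
\text{NB1:}     \quad & \operatorname{Var}(Y_i \mid x_i) = \mu_i + \alpha\,\mu_i = (1+\alpha)\,\mu_i \\
\text{NB2:}     \quad & \operatorname{Var}(Y_i \mid x_i) = \mu_i + \alpha\,\mu_i^{2}
\end{aligned}
```

Both negative binomial forms reduce to the Poisson variance as $\alpha \to 0$; letting $\alpha$ itself depend on covariates is one way the variance function can be extended with programming statements in PROC NLMIXED.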
An Easier and Faster Way to Untranspose a Wide File
Although the TRANSPOSE procedure is an extremely powerful tool for making long files wide and wide files less wide or long, getting it to do what you need often involves a lot of time, effort, and a substantial knowledge of SAS® functions and DATA step processing. This is especially true when you have to untranspose a wide file that contains both character and numeric variables. And, while the procedure usually handles variable types, lengths, and formats seamlessly, it doesn't always do so, and it simply creates a system variable (that is, _label_) to capture variable labels. This paper introduces a macro that simplifies the process, significantly reduces the amount of coding and programming skill needed (thus reducing the likelihood of producing the wrong result), runs up to 50 or more times faster than the multiple PROC TRANSPOSE and DATA steps that would otherwise be needed, and creates untransposed variables that inherit all of the original variables' characteristics.
Joe Matise, NORC at the University of Chicago
Arthur Tabachneck, Analyst Finder, Inc.
Matthew Kastin, NORC at the University of Chicago
Gerhard Svolba, SAS
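The reshaping that the macro automates can be illustrated with a small Python sketch. It is not the authors' macro: it assumes a hypothetical naming convention in which each wide column is `<stem>_<suffix>` (for example `score_2017`), and it returns one long row per ID and suffix while leaving each value's type (numeric or character) untouched.

```python
def untranspose(rows, id_var, stems, sep="_"):
    """Reshape wide rows (e.g. score_2017, score_2018) into long rows.

    Assumes every repeated column is named <stem><sep><suffix>; columns
    without a listed stem (such as the ID) are not treated as repeats.
    """
    long_rows = []
    for row in rows:
        # Collect the distinct suffixes (e.g. years) present for the listed stems.
        suffixes = sorted({
            col.split(sep, 1)[1]
            for col in row
            if sep in col and col.split(sep, 1)[0] in stems
        })
        for suffix in suffixes:
            out = {id_var: row[id_var], "suffix": suffix}
            for stem in stems:
                # Values keep their original Python type (int, str, ...).
                out[stem] = row.get(f"{stem}{sep}{suffix}")
            long_rows.append(out)
    return long_rows

wide = [{"id": 1, "score_2017": 88, "score_2018": 91,
         "grade_2017": "B", "grade_2018": "A"}]
long = untranspose(wide, "id", ["score", "grade"])
```

Here `long` holds two rows, one for 2017 and one for 2018, each carrying a numeric `score` and a character `grade`.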
An Ingenuity Journey to Set ROI and Benefits Tracking
It is becoming increasingly competitive within organizations to prioritize capital and operational funds for new projects. Retailers and wholesalers invest heavily in new technology, people, process, and data, with the goal of driving business value. As business owners and project sponsors, it is our responsibility to create the business case for project funding and, equally important, to update the organization on the ROI and value of the software investment along the way and on an ongoing cadence. For many, the challenges are: How do we get started? How do we measure effectively? How do we operationalize the process so that it is efficient and repeatable? In this presentation, we share the ingenuity we applied to measuring value and ROI results. It starts with creating a solid business justification case that aligns with the business opportunity, to present to the Executive Leadership Team for project prioritization and funding. Next, it's about identifying and aligning both the Business and Executive teams on the key metrics, time periods, and comparatives used to measure the value. Once the details are worked through, the next step is to operationalize the process and communicate the results on an established cadence. Join us to learn how we embarked on this essential journey to plan, analyze, and operationalize benefit tracking and ROI to gain company-wide adoption and yield game-changing results!
John Jarrett, Academy Sports
An Insider's Guide to SAS/ACCESS® Software on SAS® 9.4M5 in the Cloud
Session 1838: You might not have data in the cloud today, but chances are you will in the very near future. Are you ready? Do you have any idea where to start? If "no" is the answer to either of these questions, then this presentation is just what you are looking for. We answer the following questions (and more): What are the steps involved in creating a database in a cloud environment? What is Database as a Service (DBaaS)? How does SAS® define the terms "cloud variant" and "database variant"? How do I determine whether my cloud database is supported by SAS? The presentation would not be complete without discussing the major cloud players. We discuss Amazon Web Services, Microsoft Azure, Oracle Cloud, Teradata IntelliCloud, and Google Cloud Platform.
Jeff Bailey, SAS
An Introduction to Clustering Techniques
Session 2615: Cluster analysis has been used in a wide variety of fields, such as marketing, social science, biology, and pattern recognition. It is used to identify homogeneous groups of cases in order to better understand the characteristics of each group. There are two major types of clustering: supervised and unsupervised. Unlike supervised clustering, unsupervised clustering assigns data to segments without the clusters being known in advance. It partitions a set of objects into groups so that objects within a group are as similar as possible while objects in different groups are as dissimilar as possible. This paper provides a comprehensive overview of multiple techniques for unsupervised clustering analysis, including traditional data mining and machine learning approaches and statistical model-based approaches. Hierarchical clustering, k-means clustering, and hybrid clustering are three common data mining and machine learning methods used on big data sets, whereas latent cluster analysis is a statistical model-based approach that is becoming more and more popular. This paper also introduces other approaches: the nonparametric clustering method is suitable when the data has an irregular shape, and fuzzy clustering (Q-technique) can be applied to data with relatively few cases.
Xinghe Lu, Vanguard Group
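As a concrete reference point for one of the methods named above, here is a minimal k-means (Lloyd's algorithm) sketch in Python. It is illustrative only: the starting centroids are supplied explicitly so the result is deterministic, whereas production implementations (including SAS procedures) use their own seeding and convergence rules.

```python
import math

def kmeans(points, centroids, max_iter=10):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    centroids = [tuple(c) for c in centroids]

    def nearest(p):
        # Index of the centroid closest to p in Euclidean distance.
        return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        # Move each centroid to the mean of its members (keep it if empty).
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    labels = [nearest(p) for p in points]
    return centroids, labels

pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centers, labels = kmeans(pts, [(0, 0), (10, 10)])
```

With these two well-separated groups, the first three points end up in cluster 0 and the last three in cluster 1.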
An Introduction to Parallel Processing with the Fork Transformation in SAS® Data Integration Studio
Session 2733: A job in SAS® Data Integration Studio has historically been a sequential process. The user designs a series of extract, transform, and load steps that move data from source systems, apply derivation logic, and load target tables one step at a time. Even when no true dependencies exist, the job structure implies that each node cannot start until the previous step has completed. With the new fork transformation, users can now run independent streams of logic in parallel within a single job, creating opportunities for improved performance and shorter run times.
Jeff Dyson, The Financial Risk Group
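The idea behind the fork transformation (independent branches run at the same time, and a downstream step waits for all of them) can be sketched with Python's `concurrent.futures`. This is only an analogy: `load_customers` and `load_orders` are hypothetical stand-ins for independent ETL streams, and the sketch says nothing about the SAS code that SAS Data Integration Studio generates.

```python
from concurrent.futures import ThreadPoolExecutor

def load_customers():
    # Hypothetical independent stream #1 (e.g. extract a dimension table).
    return [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bo"}]

def load_orders():
    # Hypothetical independent stream #2 (e.g. extract a fact table).
    return [{"customer_id": 1, "total": 10.0},
            {"customer_id": 1, "total": 5.0},
            {"customer_id": 2, "total": 7.5}]

def run_forked(streams):
    """Start every stream at once and wait until all of them have finished."""
    with ThreadPoolExecutor(max_workers=len(streams)) as pool:
        futures = {name: pool.submit(fn) for name, fn in streams.items()}
        # .result() blocks, so this acts as the join point after the fork.
        return {name: f.result() for name, f in futures.items()}

results = run_forked({"customers": load_customers, "orders": load_orders})

# Downstream step that depends on both branches: total spend per customer.
totals = {}
for order in results["orders"]:
    cid = order["customer_id"]
    totals[cid] = totals.get(cid, 0.0) + order["total"]
```

Only the final aggregation has a real dependency; the two loads do not, which is exactly the situation where running them in parallel can shorten a job.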
An Investigation of the Factors Associated with Opioid Misuse
The drug epidemic in America is one of the most prevalent, pressing issues the country faces today. America's opioid crisis consumes the lives of over two million people, while costing the U.S. economy over $504 billion. Prescription opioids such as codeine, fentanyl, and oxycodone are among the drugs being prescribed by doctors at an alarming rate. For government officials to successfully dissuade citizens from misusing opioids, they must identify the factors associated with misuse behavior. The primary objective of this study is to examine correlations between predictors to identify variables closely associated with prescription opioid misuse. This study used the data from the 2016 National Survey on Drug Use and Health (NSDUH). The initial set of potential predictors was selected from those believed to be related to opioid misuse and includes variables from four categories: demographic, socioeconomic, psychological, and risk behaviors. The analysis tools include logistic regression, decision trees, and data visualization to illustrate the patterns of association. The data provides evidence that a person who exhibits certain factors is more likely to misuse prescription opioids than one who does not share the same characteristics. The factors that influence the propensity to misuse opioids include mental health, age, race, cigarette use, frequency of alcohol use, frequency of marijuana use, sedative use, and smokeless tobacco use.
Claire Masson, Louisiana State University
Lauren Agrigento, Louisiana State University
Shelby Katz, Louisiana State University
An unusual remedy using the usual "nbins" option to rectify anomalous histograms in SAS
Data visualization is a strong tool for understanding the nature and distribution of collected data. A histogram is one such data visualization tool, and it can be used to assess the normality of the data. Plotting correct histograms is therefore important, since the choice of further analytical methods (parametric or nonparametric) depends on whether the data follows a normal distribution. In SAS®, different procedures, such as SGPLOT, SGPANEL, or UNIVARIATE, can be used to generate histograms. However, histograms plotted in SAS using the SGPLOT or SGPANEL procedures show an anomaly when the largest value in a set of data coincides with one of the tick points on the X axis of the histogram. This paper discusses this anomaly and suggests a remedy for it. This paper also suggests that the UNIVARIATE procedure can be used to validate the histograms produced using the SGPLOT and SGPANEL procedures. Furthermore, it suggests an alternative method for calculating the value to specify for the BINSTART option in the SGPLOT and SGPANEL procedures that does not alter the histogram produced by the usual method. All of the procedures described above were run using SAS® 9.3.
Rachana Lele, University of Louisville
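The general edge case behind this kind of anomaly can be shown with a generic binning sketch. It assumes half-open bins [start + k*width, start + (k+1)*width), a common convention that is not claimed to be what SGPLOT does internally: a maximum that falls exactly on the last bin edge computes to bin index `nbins`, which is out of range, so one more bin (or a shifted start) is needed to display it. The data values are made up.

```python
import math

def bin_counts(values, start, width, nbins):
    """Count values in half-open bins [start + k*width, start + (k+1)*width).

    Returns the per-bin counts plus any values that fall outside all bins.
    """
    counts = [0] * nbins
    outside = []
    for x in values:
        k = math.floor((x - start) / width)
        if 0 <= k < nbins:
            counts[k] += 1
        else:
            outside.append(x)  # e.g. a maximum sitting exactly on the last edge
    return counts, outside

data = [1, 3, 4, 6, 7, 10]  # maximum 10 coincides with an edge (0, 2, ..., 10)
counts5, outside5 = bin_counts(data, start=0, width=2, nbins=5)
counts6, outside6 = bin_counts(data, start=0, width=2, nbins=6)
```

With five bins the value 10 is left outside every bin; adding a sixth bin captures it, which is one way to see why the number of bins (or the bin start) matters when the largest value lands on a tick point.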
Analysis of At-Risk Behaviors Compared to Public Health Funding in 2011-2015 in 18-24 Year Olds
The purpose of the study was to determine whether a correlation exists between increases in public health funding levels via the Affordable Care Act (ACA) and the percentage of casual and daily smokers, binge drinking habits, body mass index (BMI), and health care coverage of individuals 18 to 24 years of age in the Behavioral Risk Factor Surveillance System (BRFSS). In theory, increases in funding should decrease negative behaviors, and the ACA increased the amount of public health funding available to tackle negative behaviors, which were tracked via the BRFSS. We hypothesized that negative behaviors should have decreased as more funding was allotted to public health. We examined our hypothesis by analyzing changes in this age group's reported behaviors, as tracked in the BRFSS, during the years of ACA implementation, 2011-2015. Also, public health funding data from America's Health Rankings was merged with the 2015 BRFSS data by state, and a survey logistic regression model between the outcomes and public health funding was created to examine the association between funding levels and negative behaviors at the state level. Our trend results suggest that both smoking and binge drinking decreased significantly during the years when funding increased, that the number of people covered by health care plans increased, and that BMI showed no significant changes. In the logistic model, decreases in smoking and BMI were associated with increases in public funding.
Jacob Baggs, Kennesaw State University
Analysis of Nokia Customer Tweets with SAS® Enterprise Miner® and SAS® Sentiment Analysis Studio
The launch of new Nokia phones has produced significant, trending news around the globe. There has been a lot of hype and buzz around HMD Global's release of new Nokia phones at Mobile World Congress 2017. Social media provides a platform for millions of people to share or express their opinions, and there has been a large volume of social media responses since the launch of these Nokia phones. In this paper, I analyze the overall sentiment prevailing in these social media posts. I extracted real-time data from Twitter using the Google Twitter API over a period of time and studied people's responses. I used SAS® Enterprise Miner™ and SAS® Sentiment Analysis Studio to evaluate key questions regarding the launch of Nokia phones, such as the needs and expectations of customers and people's perceptions of the launch. In my results, the natural language processing (NLP) rule-based model outperformed the default statistical model in SAS Sentiment Analysis Studio for predicting sentiments in test data. The NLP rule-based model provided deeper insights than the statistical model into consumers' sentiments. I plan to improve the efficiency of predicting sentiment polarity by incorporating emoticons into sentiment mining, developing a macro that cleans the tweets by replacing emoticons with equivalent text.
Vaibhav Vanamala, Oklahoma State University
Analysis of Profitability Bank Systems in South Korea Using Base SAS®
Session 2655: Banks play the most important role in the financial market. The major role of banks is to link the funds of individuals and firms. In other words, banks are lubricants that enable the domestic economy to grow smoothly, in that they maximize the fluidity of finance for each person or company. However, customers who want to deposit their own money must choose a bank deliberately. Customers should be conservative in their choice, since the credibility of banks is directly connected to their assets. Customers should also consider the profitability of banks when saving their money. In this regard, our project's main focus was an index that can discriminate between banks' earnings, using Base SAS. Above all, our project examines whether you can approximately analyze the banking system with only Base SAS. Building a project on bank data is not necessarily easy, so you might expect to need more complicated or sophisticated SAS programs. However, we suggest a significant solution using just Base SAS.
Jinwoo Cho, Sung Kyun Kwan University
Dongwoo Kim, Sung Kyun Kwan University
Analysis of Unstructured Data: Topic Mining and Predictive Modeling Using Text
Eighty percent of the data generated in digital space is unstructured. While the amount of textual data is increasing rapidly, the ability to summarize and make sense of such data for making better business decisions remains challenging. This paper provides insights into how to analyze textual survey data for extracting public opinion from a huge collection of feedback forms. It also suggests how to formulate rules in predicting the opinion of a user. The data set that was analyzed was collected based on the Toronto Casino Feedback Form, which contains 17,000 records with information about open-ended questions such as "why do you not support the establishment of a casino," and closed-ended questions such as those regarding age group and gender. The primary objective was to understand and predict the opinion of a user toward the establishment of a casino by considering the survey filled in by the user with unstructured data. To identify public opinion, topics were extracted using the Text Topic node in SAS® Enterprise Miner™. The Text Rule Builder node was used to build rules that can differentiate the opinion of a user. From the analysis, we can identify that the majority of the public worried about gambling leading to addiction, an increase in crime rate, traffic congestion, and family and social relationships. People also opined positively about new jobs being created, tourism, additional revenue generated through taxes and tourism, and an increase in entertainment options.
Ravi Teja Allaparthi, Oklahoma State University
Analytics + IoT = Happy (Utility + Customer)
Session 2267: The Public Utility Regulatory Policies Act (PURPA) changed the utility world in 1978. Power generation was no longer the domain of electric utilities alone, but was opened to independent power producers. Bigger challenges might be imminent, with the falling price of batteries as the primary mover. Individual homes and businesses will add batteries to new and existing solar panels. The biggest mover, however, will be the acceleration of electric and hybrid vehicles. Large growth in electric vehicles (EVs) could affect local reliability and increase the cost of delivering electricity to customers. In many areas of the US and the world, one new EV charger can add as much demand as a new customer. Fast chargers that recharge EVs in just a few hours can add three times that demand. Analytics and the Internet of Things (IoT) must become partners to create systems that enable the customer and utility to continually communicate their status and options. This will give customers maximum flexibility to charge their EVs without increasing the utility's costs. With a two-way flow of information, the utility will be able to take advantage of excess energy from EVs to maintain grid stability and reduce costs. Additionally, a robust system could allow utilities and customers to coordinate demand response, with customers lowering demand to fit the utility's needs. Actions could be as simple as controlling a water heater or as complex as instructing a home thermostat to change its set point.
Bradley Lawson, SAS
Analytics Applications in Targeted Marketing and Forecasting Demand: A Two-Stage Model
Identifying target customers for a product or service is one of the most important steps in developing a marketing plan for any business. A large electric utility company in Oklahoma with a customer base of more than 850,000 has recently launched its solar power program. The objective of this paper is twofold. The first stage builds a propensity model to identify customers who have a higher propensity to enroll in the company's solar power program, so as to drive savings on promotional mailing. The second stage builds a forecasting model to predict the solar power capacity the company would need to fulfill the associated demand. The data has around 850,000 observations and more than 60 variables that provide information about customer demographics, customers' past interactions with the company, utility usage data, and so on. Initial analysis uses binary classification models to predict whether a customer will enroll in the program, and a multiple regression forecasting model (with time series terms to account for seasonality and cyclicity) to predict the demand.
Anurag Hardikar, Oklahoma State University
With the advent of big data and cloud technologies, the scale of models that are being used by business has increased significantly. The days of sampling the data and building models on smaller data sets to make inferences on larger populations are passing. Collaboration with IT becomes paramount in order to be able to operationalize the models so that they can be executed on large data sets in a sustainable manner and deliver business results to scale. This requires the models to be either developed differently or centralized. In this presentation, we discuss the key concepts of centralization and the features that can be incorporated to enable the models to be managed diligently, in order to achieve business outcomes to scale.
Raj Kannan, Tata Consultancy Services
Analytics in the Cloud, Cross Functional Teams, and Apache Hadoop is not a Thing
Ryan Packer, Bank of New Zealand
Analytics Like a Pro: The Differences between My Fantasy Team and Production Analytics in Sports
Session 1760: The world of sports has always been a numbers game. But it has never been like this. Thanks to recent advances in technology, managers, coaches, and trainers have access to more data than ever before. Information about everything from how quickly players move around the field to how well they sleep at night can now be tracked, measured, and quantified. Many sports organizations are left with one burning question: How can we unlock the potential of this vast treasure trove of data and transform it into a real competitive advantage? There have been many stories about teams that used data to improve performance. But is it really that easy to turn data into a winning formula? The short answer is no. Operationalizing data science is complex and requires some expertise, but organizations that follow a few important guidelines can be successful. In this presentation, I talk about my experience offering game-changing insights to sports organizations by aggregating multiple sources and millions of data points every day. I also discuss some of the major operational and organizational challenges, and I show you how getting the right data into the hands of the right people at the right time is the key to success.
Timothy Trussell, KINDUCT
Analytics of Things: New Analytical Models for Creating Business Value from IoT Data
Session 2321: The number of devices and equipment generating sensor data is rapidly increasing. To intelligently handle this data and create tangible business value requires new analytic techniques and new ways to apply them. In this session, we cover new models from SAS® related to the Analytics of Things. The session focuses on actions and procedures tailored to Internet of Things (IoT) use cases such as predictive maintenance, asset degradation, anomaly detection, and signal processing. We highlight what new models are available in SAS, their related use cases, and how they can be deployed in standard or real-time scenarios for IoT solutions that are applicable from manufacturing to health care.
Ryan Gillespie, SAS
Analyzing Text In-Stream and at the Edge
Session 1962: As companies increasingly use automation for operational intelligence, they are deploying machines to read, and interpret in real time, unstructured data such as news, emails, network logs, and so on. Real-time streaming analytics maximizes data value and enables organizations to act more quickly. For example, being able to analyze unstructured text in-stream and at the edge provides a competitive advantage to financial technology (fintech) companies, who use these analyses to drive algorithmic trading strategies. Companies are also applying streaming analytics to provide optimal customer service at the point of interaction, improve operational efficiencies, and analyze themes of chatter about their offerings. This paper explains how you can augment real-time text analytics (such as sentiment analysis, entity extraction, content categorization, and topic detection) with in-stream analytics to derive real-time answers for innovative applications such as quant solutions at capital markets, fake-news detection at online portals, and others.
Simran Bagga, SAS
Analyzing Theft Occurrences in Chicago Using SAS® Enterprise Miner® and SAS® Enterprise Guide®
In 2016, Chicago reported more thefts than any other crime, making theft one of the most prevalent types of crime in the city. This research explores patterns related to thefts in Chicago that can help minimize theft occurrences by answering questions such as these: What are the specific locations where most of the thefts are committed? Why might thefts be more frequent than other crimes in those locations? Crime data for the years 2012 through 2017, with more than a million observations, was obtained from Kaggle, a publicly available data source. Several predictive models, such as logistic regression, decision tree, neural network, and ensemble models, were built to predict the target. To create the binary target, FBI codes 06 (theft) and 07 (motor vehicle theft) were assigned a value of 1, while all other crimes were assigned a value of 0. The models were compared using the model comparison algorithm, and the ensemble model emerged as the best model, with the largest receiver operating characteristic (ROC) index. Time series data was also prepared by aggregating the total number of theft incidents for each month, to forecast the number of thefts likely to take place in the next 12 months. Of the exponential smoothing and stepwise autoregressive forecasting methods, the latter produced the better forecast, with a mean absolute percent error (MAPE) of 7.40.
Kunal Parekh, Verisk Analytics
Shikha Prasad, Oklahoma State University
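For readers comparing forecasts, the MAPE reported above is conventionally the mean of |actual - forecast| / |actual|, expressed as a percentage. The Python function below is a minimal sketch of that textbook definition, not the procedure the authors used; it assumes no actual value is zero (MAPE is undefined there), and the example numbers are invented.

```python
def mape(actual, forecast):
    """Mean absolute percent error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal in length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average the absolute errors relative to each actual value.
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)

# Hypothetical monthly theft counts and forecasts (not from the paper).
error = mape([100, 200], [90, 210])  # about 7.5
```

Because each error is scaled by its own actual value, MAPE weights misses in low-count months more heavily than the same absolute miss in high-count months.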
Animate your Data!
Session 1817: When reporting your safety data, do you ever feel sorry for the person who has to read all the laboratory listings and summaries? Or have you ever wondered if there is a better way to visualize safety data? Let's use animation to help the reviewer and to reveal patterns in our safety data, or in any data! This hands-on workshop demonstrates how you can use animation in SAS® 9.4 to report your safety data, using techniques such as visualizing a patient's laboratory results, vital sign results, and electrocardiogram results and seeing how those safety results change over time. In addition, you learn how to animate adverse events over time, and how to show the relationships between adverse events and laboratory results using animation. You also learn how to use the EXPAND procedure to ensure that your animations are smooth. Animating your data will bring your data to life and help improve lives!
Richann Watson, DataRich Consulting
Application of Propensity Score Models in Observational Studies
Treatment effects from observational studies might be biased because patients are not randomly allocated to a treatment group. Propensity score methods are increasingly being used to address this bias. After propensity score adjustment, the distribution of baseline covariates will be balanced between treated and untreated patients. This paper reviews variable selection, balancing the propensity score, sensitivity analyses, and presentation of results for five different propensity score methods: covariate adjustment, stratification, inverse probability of treatment weighting (IPTW), stabilized IPTW, and matching. The strengths and limitations of each method are illustrated by estimating the effect of anti-hypertension treatment on survival in patients with advanced-stage non-small cell lung cancer.
Nikki Carroll, Kaiser Permanente
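For two of the five methods reviewed, IPTW and stabilized IPTW, the weights follow directly from each patient's estimated propensity score e, the probability of treatment given baseline covariates: treated patients get 1/e, untreated patients get 1/(1 - e), and stabilization multiplies these by the marginal probability of the treatment actually received. The Python sketch below assumes the propensity scores have already been estimated (for example, by logistic regression); it is illustrative and not the author's SAS code.

```python
def iptw_weights(treated, propensity, stabilized=False):
    """Inverse probability of treatment weights.

    treated    -- sequence of 0/1 treatment indicators
    propensity -- estimated P(treated = 1 | covariates), strictly between 0 and 1
    stabilized -- if True, multiply by the marginal probability of the
                  treatment actually received
    """
    if not treated or len(treated) != len(propensity):
        raise ValueError("need non-empty sequences of equal length")
    p_treated = sum(treated) / len(treated)  # marginal P(treated = 1)
    weights = []
    for t, e in zip(treated, propensity):
        if not 0 < e < 1:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if t:
            numerator = p_treated if stabilized else 1.0
            weights.append(numerator / e)
        else:
            numerator = (1 - p_treated) if stabilized else 1.0
            weights.append(numerator / (1 - e))
    return weights


treated = [1, 0, 1, 0]
scores = [0.5, 0.25, 0.8, 0.5]
print(iptw_weights(treated, scores))                   # IPTW
print(iptw_weights(treated, scores, stabilized=True))  # stabilized IPTW
```

Stabilized weights typically average close to 1, so the weighted pseudo-population stays near the original sample size.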
Application of Support Vector Machine Modeling and Graph Theory Metrics for Disease Classification
Session 2030
Disease classification is a crucial element of biomedical research. Recent studies have demonstrated that machine learning techniques, such as support vector machine (SVM) modeling, produce similar or improved predictive capabilities compared with the traditional method of logistic regression. In addition, it has been found that social network metrics can provide useful predictive information for disease modeling. In this study, we combine simulated social network metrics with SVM to predict diabetes in a sample of data from the Behavioral Risk Factor Surveillance System using Base SAS® and SAS® Enterprise Miner(tm). In this data set, logistic regression outperformed SVM with a receiver operating characteristic (ROC) index of 81.8 and 81.7 for models with and without graph metrics, respectively. SVM with a polynomial kernel had an ROC index of 72.9 and 75.6 for models with and without graph metrics, respectively. Although SVM did not perform as well as logistic regression, the results are consistent with previous studies using SVM to classify diabetes.
Jessica Rudd, Kennesaw State University
Are New Modeling Techniques Worth It?
Session 1934
New modeling techniques are constantly being developed, and at an increasing pace. Keeping up with these techniques is possible if the technique is accompanied by a library or an example that implements the technique in one of the many analytic tools at our disposal. However, the main question will always be whether the new technique is adding value and actually addressing the business problem at hand. This presentation addresses the question of when to apply new modeling techniques, and more importantly, when not to.
Tom Zougas, TransUnion
Are Your Color Choices Ruining Your Reports?
The goal of visualizing data is to communicate information effectively, to provide decision makers a quick and easy way to analyze data, and to help your readers understand data. Doing this might seem as simple as putting data into a graph. However, there's more to it. Your color choices can make or break a visualization. Color is not just an aesthetic choice; it's a crucial tool to convey information. When used correctly, color sets the tone and helps to create visualizations that tell stories. Conversely, a badly chosen color palette obscures the information you are trying to portray and, in turn, makes the data visualization less effective. In this poster, we explore color choices using SAS® Visual Analytics 8.1 running on SAS® Viya®.
Jaime D'agord, Zencos
Association Rule Mining of Polypharmacy Patterns in Health Care Data Using SAS® Enterprise Miner®
Pediatric polypharmacy is prevalent in both outpatient and inpatient settings. It is associated with exposure to adverse drug events (ADEs), which is a worldwide drug problem. Currently, polypharmacy is defined in terms of concurrent medication count. Clinical administrative databases offer opportunities to speed up the understanding of drug utilization patterns. Association rule mining (ARM) is a well-established data mining technique that has been commonly used for mining commercial transactional databases. Link analysis (LA) is a popular network analysis technique that is used to discover and visualize associations between different items. We transformed administrative data to a transaction format suitable for mining rules and applied ARM and LA to analyze drug utilization and polypharmacy patterns in clinical administrative databases using SAS® Enterprise Miner(tm). Our results show that ARM can find associations among drugs, generic-drug-specific polypharmacy, and polypharmacy as associated with patient characteristics in clinical administrative data. The link graphs visualize drug utilization and the drug-drug combination patterns. ARM and LA provide a new way to analyze drug utilization and polypharmacy. We believe that this same approach can be used in mining other databases such as administrative claims data and electronic medical records.
Dingwei Dai, Children's Hospital of Philadelphia
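Association rule mining scores a rule such as "drug A implies drug B" by its support (the share of transactions containing both), its confidence (support of the pair divided by support of A), and its lift (confidence divided by support of B). The brute-force Python sketch below computes these measures for every two-drug rule; it is a much simpler stand-in for the Association node in SAS Enterprise Miner, not the authors' workflow, and the drug labels are hypothetical.

```python
from itertools import permutations


def pair_rules(transactions, min_support=0.0):
    """Return (lhs, rhs, support, confidence, lift) for every two-item rule.

    Each transaction is one patient's set of concurrently used drugs.
    """
    if not transactions:
        raise ValueError("need at least one transaction")
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in `itemset`.
        return sum(1 for basket in baskets if itemset <= basket) / n

    items = sorted(set().union(*baskets))
    rules = []
    for lhs, rhs in permutations(items, 2):
        s_pair = support({lhs, rhs})
        if s_pair == 0 or s_pair < min_support:
            continue
        confidence = s_pair / support({lhs})
        lift = confidence / support({rhs})
        rules.append((lhs, rhs, s_pair, confidence, lift))
    return rules


patients = [{"A", "B"}, {"A", "B", "C"}, {"A", "C"}, {"B"}]  # hypothetical drugs
for rule in pair_rules(patients, min_support=0.5):
    print(rule)
```

A lift above 1 means the two drugs co-occur more often than independent use would predict.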
At Your Service: Using SAS® Viya® and Python to Create Worker Programs for Real-Time Analytics
Session 2861
Being able to automatically analyze and score data on demand has always been a challenge when you are creating a service program. SAS® Viya® brings with it new and exciting options to help tackle this task. When combining SAS® programming with the Python language, it is possible to create worker programs that can be deployed to automatically process any incoming data. This paper provides ideas for how to create an intelligent worker program that wakes up when needed, analyzes data, and then rests until it is needed again. SAS Viya programming is essential for creating any on-demand analytics environment and can be scaled up to meet any business needs.
Scott Koval, Pinnacle Solutions, Inc
Jon Klopfer, Pinnacle Solutions, Inc
Auto Scaling SAS® Real-Time Decision Manager on Amazon Web Services
Session 2075
SAS® Real-Time Decision Manager combines SAS® Analytics with business logic and contact strategies to deliver enhanced real-time recommendations and decisions to interactive customer channels such as websites, call centers, point of sale (POS) locations, and automated teller machines (ATMs). SAS Real-Time Decision Manager helps you make smarter decisions by automating and applying analytics to the decision process during real-time customer interactions. Due to the interactive nature of these channels, the volume of requests can be subject to wide fluctuations, either hourly, daily, or seasonally. To enable our customers to accommodate the variance caused by a Black Friday or Cyber Monday event, this session presents a live demonstration of how elasticity can be brought to SAS Real-Time Decision Manager using Amazon Web Services (AWS), enabling the platform to dynamically grow or contract in response to the observed workloads. The session demonstrates the use of AWS Auto Scaling groups, Launch configurations, and Scaling plans to simply and seamlessly match application capacity to demand, enabling SAS customers to best meet the demands of their customers while minimizing their infrastructure costs.
Chris West, SAS
Auto-populating Microsoft PowerPoint Presentations So You Don't Have To
In the field of clinical research, Data Safety Monitoring Board (DSMB) members are increasingly requesting high-level overview presentations to assess the current status of a clinical trial in lieu of lengthy binders. When multiple clinical trials report to the same DSMB, a certain level of standardization across presentations is desirable so that DSMB members can quickly understand a familiar presentation format. Manually entering text and data into a presentation is tedious and subject to transcription errors. By using the ODS destination for PowerPoint, the ODS LAYOUT statements, and the ODSTEXT, ODSLIST, and a series of other familiar SAS® procedures, we have created a suite of SAS macros that automatically populate presentations, thereby eliminating the need to manually enter data into the presentation slides. These SAS macros enable you to create presentations that conform to a general template in a standardized manner that saves time, ensures data accuracy and integrity, and provides continuity in data presentation to DSMB members. We share the basics of how to create your own macros to auto-populate Microsoft PowerPoint presentations, as well as some tips and tricks we have learned along the way.
Kaitie Lawson, Rho, Inc.
Brett Jepson, Rho, Inc.
Automate the Tedious Stuff: Use Cases in Taking Your Time Back
With all of the buzz around big data, machine learning, and predictive analytics, it is easy to forget just how much of our day-to-day work is spent on the far less glamorous task of data preparation. Analysts, researchers, and scientists alike spend a lot of their time trying to get the data into a workable state before they can begin to understand the story and insights hidden within. This paper outlines three programs that were developed to handle common data wrangling tasks (for example, cleaning, filtering, transforming, and merging). These programs replaced 60 hours of manual effort with automated processes that run in less than 30 minutes.
Danni Bayn, FCB
Automated Assistants for Fraud Investigation Productivity
Session 2924
The year 2018 marks a turning point in machine learning and artificial intelligence. In this emerging technology session, Mike presents a novel application of using machine learning to improve investigator productivity. Using the machine learning capabilities of SAS® Viya®, he demonstrates how his team has developed an intelligent agent that can automate investigator and investigation tasks, ultimately reducing the cost and complexity of investigations.
Michael Ames, SAS
Automated Management of UNIX/Linux Multi-Tiered SAS® Services: The Enhanced SAS_lsm Utility
Session 1921
Have you tried to manually restart SAS servers to avoid loss of productivity and failed, causing delays? Have you encountered issues and struggled to analyze the appropriate SAS logs as part of the troubleshooting process? After experiencing these issues, you can see that a utility that automates management of a SAS multi-tiered deployment's services would be a big benefit. To address these issues, SAS Technical Support developed the SAS Local Services Management (SAS_lsm) utility. The SAS_lsm utility is now available via SAS® 9.4M5 and SAS Note 58231. It initially supported starting, stopping, and checking the status of a basic multi-tiered deployment's services, based on the sas.servers script. In version 3, SAS Technical Support has vastly improved SAS_lsm by adding the following support: metadata horizontal clustering services; massively parallel processing SAS® LASR(tm) services; user-defined services; deployment maintenance (target start from/stop at tier); centralized log collection, analysis, and potential fix suggestions; and streamlined input for Technical Support track creation. This paper illustrates the expanded benefits of SAS_lsm. This includes a discussion of the new tier descriptors supported, features, and a practical demonstration that highlights the utility's configuration, typical usage scenario, problem-fix recommendation process, and simplified track-creation process. Now, grab a donut to go with your coffee, and let's examine the enhanced SAS_lsm utility!
Clifford Meyers, SAS
Automating Code Generation to Reduce Errors and Effort
Session 2584
Very often SAS® code is used to produce a series of reports or outputs that use very similar code with minor variations. Editing that code by hand is time consuming and error prone. This paper presents a series of solutions to automate code generation. These solutions enable programmers to reduce effort and likelihood of errors by using some tools in SAS® specifically for that purpose. In simple terms, programmers can just write a macro and call the macro with variable parameters. This method still requires code editing. An alternative is to store the code variations in an external data set and have SAS read that data set and generate code automatically. This paper discusses techniques used to do this, including CALL EXECUTE and CALL SYMPUT, and creating macro variables with the INTO clause of PROC SQL. This technique has additional side benefits, including reduced source code change control, simpler tracking of changes, and even the ability for the code changes to be managed by non-programmers. This paper assumes reasonable knowledge of creating and using macro variables.
Steve Cavill, Infoclarity
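The data-driven approach described here, storing the code variations in a data set and generating code from it, can be illustrated outside SAS. The Python sketch below renders one SAS step per parameter row from a text template; within SAS itself, the comparable tool would typically be CALL EXECUTE in a DATA step. The template, data set name, and variable names are hypothetical, and the generated text is only printed, not submitted.

```python
# Hypothetical template: one PROC MEANS step per row of parameters.
TEMPLATE = (
    "proc means data={dataset} mean;\n"
    "  where region = '{region}';\n"
    "  var {var};\n"
    "run;\n"
)


def generate_code(rows):
    """Render one code block per parameter row, in row order."""
    return "".join(TEMPLATE.format(**row) for row in rows)


# Stand-in for the "external data set" that holds the code variations.
params = [
    {"dataset": "sales", "region": "East", "var": "revenue"},
    {"dataset": "sales", "region": "West", "var": "revenue"},
]
print(generate_code(params))
```

Adding a report then means adding a row of parameters rather than editing code, which is the change-control benefit the abstract describes.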
Automation of Clinical Trial Result Posting to ClinicalTrials.gov and EudraCT
Session 2766
Under the US Food and Drug Administration Amendments Act (FDAAA) and European Medicines Agency (EMA) regulations, companies are now required to disclose the aggregated results of their applicable clinical trials to the public websites of US ClinicalTrials.gov and the EMA European Clinical Trials Database (EudraCT): ClinicalTrials.gov (for US), and eudract.ema.europa.eu (for EMA). Currently in many pharmaceutical companies, the clinical trial aggregated results are prepared manually by a designated group based on the clinical study report. The manual process is time consuming and error-prone, with iterative back-and-forth steps and a lot of reviews among different groups. Therefore, an innovative and automated process is needed to proactively streamline the result posting process, improve the accuracy, efficiency, and consistency of the result posting, and reduce manual entry errors. At Johnson & Johnson, a biostatistics programming group used SAS® to develop such an automated process.
Sherry Meeh, Johnson & Johnson
Automation of Linux Multi-tiered SAS® and Load Sharing Facility (LSF) Services
Session 2757
Have you ever faced the situation where your SAS® Platform went down and you needed to start the entire platform as soon as possible? Have you ever missed starting a service and proceeded to the next service in a multi-node environment? Alternatively, have you forgotten to maintain the start-up/shut-down order in a multi-node and multi-cluster environment? These situations are common in a SAS Platform administrator's work. This paper explains how you can automate your platform services with no need to worry about sequence and order. Automation reduces the effort and time it takes to start or stop a Linux environment running SAS, without human error. Such automation can be particularly helpful for a SAS® Grid environment that is running in a Linux environment.
Piyush Singh, Tcs
Ghiyasuddin Mohammed Faraz Khan, Sapphire Software Solutions Inc.
Prasoon Sangwan, Tata Consultancy Services
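Starting services without worrying about sequence comes down to recording which services depend on which, starting them in dependency order, and stopping them in the reverse order. The Python sketch below uses the standard library's graphlib module (Python 3.9+) for the topological sort; the service names and dependencies are hypothetical, do not describe any particular SAS or LSF deployment, and the actual start and stop commands are left out.

```python
from graphlib import TopologicalSorter

# Hypothetical services; each maps to the services that must already be running.
DEPENDENCIES = {
    "lsf": set(),
    "metadata_server": set(),
    "object_spawner": {"metadata_server"},
    "grid_services": {"lsf", "metadata_server"},
    "web_app_server": {"metadata_server", "object_spawner"},
}


def start_order(deps):
    """Every service appears after all of its prerequisites.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def stop_order(start):
    """Shut down in the reverse of the start-up order."""
    return list(reversed(start))


order = start_order(DEPENDENCIES)
print("start:", order)
print("stop: ", stop_order(order))
```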
Back to Basics: Get Better Insights from Data
Get the most out of your data, big or small. There is more data than ever. But more data doesn't always mean better insights. If you're not careful, more data can sometimes lead you down the wrong path. The worst thing you can do is find patterns that aren't really there. SAS® Visual Analytics gives you the tools to avoid these kinds of problems. With simple drag-and-drop functionality, you can explore your data. You can see the shape of your data. You can see potential relationships in your data. You can get a grasp of the power and limits of your data. With that understanding, you can get better insights. It's time to move beyond just pretty visualizations, and this paper shows you how.
Atrin Assa, SAS
Base SAS® and SAS® Enterprise Guide®: Automate Your SAS® World with Dynamic Code
Communication is the basic foundation of all relationships, including our SAS® relationship with the server, PC, or mainframe. To communicate more efficiently, and to increasingly automate your SAS world, you will want to learn how to transform static code into dynamic code that automatically re-creates the static code, and then executes the re-created static code automatically. Our presentation highlights the powerful partnership that occurs when dynamic code is creatively combined with a dynamic FILENAME statement, macro variables, the INDSNAME= option of the SET statement, and the CALL EXECUTE routine within one SAS® Enterprise Guide® Program node. You have the exciting opportunity to learn how 1,469 time-consuming manual steps are amazingly replaced with only one time-saving dynamic automated step. We invite you to attend our session, in which we detail the UNIX and Microsoft Windows syntax for our project example and introduce you to your newest BFF (Best Friend Forever) in SAS. Please see the appendixes to review additional starting-point information about the syntax for Windows and IBM z/OS, and to review the source code that created the data sets for our project example.
Kent Phelps, Illuminator Coaching, Inc.
Ronda Phelps, Illuminator Coaching, Inc.
Bayesian Concepts: An Introduction
Session 1863
You employ Bayesian concepts to navigate your everyday life, perhaps without being aware that you are doing so. You rely on past experiences to assess risk, assign probable cause, navigate uncertainty, and predict the future. Yet, as a statistician, economist, epidemiologist, or data scientist, you hold tight to your frequentist methods. Why? This paper explores the philosophy of Bayesian reasoning, explains the advantages of applying Bayes' rule, and confronts the criticism of subjective Bayesian priors. Procedures in SAS/STAT® software and SAS® Enterprise Miner(tm) in which Bayesian methods can be applied are presented.
John Amrhein, McDougall Scientific
Fei Wang, McDougall Scientific
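Bayes' rule updates a prior probability with new evidence: P(H | E) = P(E | H) P(H) / P(E), where P(E) sums over both H and not-H. The Python sketch below works through a classic diagnostic-test example; the numbers (1% prevalence, 90% sensitivity, 5% false-positive rate) are illustrative assumptions, not figures from the paper.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) via Bayes' rule for a binary hypothesis H."""
    # Total probability of the evidence, P(E), over H and not-H.
    evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / evidence


p = posterior(prior=0.01, p_evidence_given_h=0.90, p_evidence_given_not_h=0.05)
print(f"P(disease | positive test) = {p:.3f}")  # 0.154
```

Despite the fairly accurate test, the low prior keeps the posterior near 15%, which is the everyday intuition about base rates that the abstract appeals to.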
Bayesian Networks for Causal Analysis
Bayesian networks (BN) are a type of graphical model that represents relationships between random variables. The networks can be very complex with many layers of interactions. Graphical models become BNs when the relationships are probabilistic and uni-directional. Building BNs for causal analyses is a natural and reliable way of expressing (and confirming or refuting) our belief and knowledge about causes and effects. In addition, BNs can be easily reconfigured with minor modifications to facilitate our understanding of probabilistic mechanisms. This paper describes the construction of BNs for causal analyses and how to infer causal structures from observational and interventional data. The paper includes applications of causal BNs for classification using the HP Bayesian Network Classifier node in SAS® Enterprise Miner(tm). Visualization, inferences, and scenario analyses for the examples are discussed.
John Amrhein, McDougall Scientific
Fei Wang, McDougall Scientific
Best Practices for Driving Business Value with CECL Compliance
Session 2044
The current expected credit loss (CECL) standard is a big change in credit loss provisioning and an even bigger opportunity to drive value through the Credit Risk function. At the very least, you will be able to rationalize your existing credit risk models and pools. Ideally, you should be able to create a risk feedback loop to your product pricing and also be in sync with your liquidity risk projections. Understanding CECL is key to being able to drive the full value of the new requirement. We start with a review of CECL background before digging into methodology considerations. When discussing methodology considerations, we specifically cover segmentation, loan life and loan average life, loss rate options, forecasting, and qualitative adjustments. As we delve into these considerations you will see where the opportunities are to rationalize your existing loan loss process, provide enhanced risk-based pricing information, and share information with the Liquidity Risk function. We also cover how to overcome some of the common challenges in realizing this value, including data quality issues and alignment of resources and processes.
Peter Baquero, SAS
Anthony Mancuso, SAS
Albert Hopping, SAS
Kevin Miller, SAS
BI Centralization: The Benefits of Implementing a Report Encyclopedia for Your Organization
Session 2818
In the world of business intelligence (BI) and reporting, the standardization, maintenance and upkeep, accuracy, and reuse of reports across an organization are all key aspects of a well-groomed BI solution, regardless of the technology. In a medium- to large-sized organization, it is highly likely that SAS® is not the only reporting tool available to BI developers. As the sources of reports grow in number, it can become increasingly difficult to keep track of all the reports and the metadata associated with them. One solution to help solve this problem is to create a centralized catalog, or encyclopedia, to house all of this data. The report encyclopedia should serve as a method to accomplish the key aspects of a BI solution previously mentioned. By having a method to centralize report metadata in an organized manner for developers, clients, and users, the tool helps to educate users on the extent of reports offered by your team and can promote the reuse or modification of reports. The tool can also alleviate much of the stress associated with requesting a report. This paper explores how a central, connected tool such as a report encyclopedia can help an organization with all of these issues, as well as how the metadata can be used to generate SAS report indexes after a report is created.
John Mattox, Magellan Health
Big Data Analytic Models: The Challenges of Philosophies as Covariates in Higher Education
Session 1682
Use of big data analytics in business is commonplace and is supporting the growth of industries throughout the world. A major source of big data is K-12 and higher education. Since 2004, billions in funding have been used to create big-data systems in education, but there has been limited success in using this data to improve educational outcomes of students in the K-12 and higher education systems. Why? A singular goal of business is to generate revenue, and evidence from big data facilitates these efforts. Too often in education, goals and outcomes are based on philosophies and are less driven by evidence from big data. The purpose of this session is to outline the challenges of big data in education and the impact of philosophies as covariates.
Sean Mulvenon, University of Nevada, Las Vegas (UNLV)
Biomedical Image Analytics Using SAS® Viya®
Biomedical imaging has become the largest driver of health care data growth, generating millions of terabytes of data annually in the US alone. With the release of SAS® Viya® 3.3, SAS has, for the first time, extended its powerful analytics environment to the processing and interpretation of biomedical image data. This new extension, available in SAS® Visual Data Mining and Machine Learning, enables customers to load, visualize, process, and save health care image data and associated metadata at scale. In particular, it accommodates both 2-D and 3-D images, and recognizes all commonly used medical image formats, including the widely used Digital Imaging and Communications in Medicine (DICOM) standard. The visualization functionality enables users to examine underlying anatomical structures in medical images via exquisite 3-D renderings. The new feature set, when combined with other data analytic capabilities available in SAS Viya, empowers customers to assemble end-to-end solutions to significant, image-based health care problems. This paper demonstrates these capabilities with an example problem: diagnostic classification of malignant and benign lung nodules that is based on raw computed tomography (CT) images and radiologist annotation of nodule locations.
Fijoy Vadakkumpadan, SAS
Saratendu Sethi, SAS
Bitcoin Price Forecasting Using Web Search and Social Media Data
As the world's first decentralized electronic currency system, Bitcoin has achieved great success and might represent a fundamental change in financial systems. A distinctive feature of cryptocurrencies is that their prices fluctuate largely with people's opinions rather than with institutionalized money regulation. Therefore, understanding the interplay between social media and the value of Bitcoin is crucial for Bitcoin price prediction. In this study, related comments posted on a Bitcoin forum were analyzed, and a conceptual link between extracted keywords of interest was developed. Five major clusters were obtained from text mining analysis to help gain insight into Bitcoin users' opinions. Furthermore, cross-correlations among Bitcoin price fluctuations, Google Trends data, and Twitter data were examined in this study. To support Bitcoin investors' future investment decisions, forecasting models were built, and the effectiveness of the proposed models was validated by comparing their Akaike information criterion (AIC) and mean absolute percentage error (MAPE) values. The forecasting model using both Bitcoin daily transaction data and Google Trends data was selected based on its better performance.
Travis Miller, Oklahoma State University
Rosie Nguyen, Oklahoma State University
Rishanki Jain, Oklahoma State University
Linyi Tang, Oklahoma State University
Bricolage: My Autobiography, with SAS® Procedures
Session 1724
This is a career presentation, with statistical methods. Bricolage is a construction that is made by using whatever is lying around. One can use SAS® to become a specialist in biostatistics, marketing research, survival analysis, or clinical trials. Alternatively, as a generalist, you can apply statistical methods to whatever problem pops up. This includes using Fisher's Exact Test to determine whether urban schools face significantly more administrative problems than rural ones, the NLEVELS option to easily find the number of unique users, the LAG function for spotting duplicate records, the MIXED procedure for analyzing pre- and post-test data to evaluate program effectiveness, and statistical graphics to communicate results to a non-technical audience.
Annmaria De Mars, 7 Generation Games
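Two of the Base SAS tricks listed, counting unique values (the NLEVELS option) and spotting duplicates by comparing each record with the one before it (the LAG function after sorting), translate into a short Python sketch. The record layout and field names are hypothetical, and this is an analogy to, not a reproduction of, the SAS code in the presentation.

```python
_MISSING = object()  # like LAG() on the first row: there is no previous value


def flag_duplicates(records, key):
    """Sort by key, then flag each record whose key equals the previous record's key."""
    flagged = []
    previous = _MISSING
    for record in sorted(records, key=key):
        current = key(record)
        flagged.append((record, current == previous))
        previous = current
    return flagged


def count_levels(records, key):
    """Number of distinct key values (the idea behind the NLEVELS option)."""
    return len({key(r) for r in records})


def user_day(record):
    return (record["user"], record["day"])


logins = [  # hypothetical login records
    {"user": "u2", "day": 1},
    {"user": "u1", "day": 1},
    {"user": "u2", "day": 1},
    {"user": "u3", "day": 2},
]
print(count_levels(logins, lambda r: r["user"]))  # 3
print(sum(dup for _, dup in flag_duplicates(logins, user_day)))  # 1
```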
Bridging the Skills Gap between Post-Secondary Education Outcomes and Employment Opportunities
Monitoring employment qualifications is a difficult task that must be done to ensure that post-secondary education supplies graduates with the essential skills required in the workplace. Because the skill sets in demand vary over time, it is imperative that education accommodate this variation. Predicting these changes is an increasingly difficult task but must be done in order to address the increasing skill gap. This problem can be solved only by an ongoing monitoring system that can associate program outcomes with job requirements. Through the application of SAS® Text Miner, this paper examines automated processes to crawl targeted websites for entry-level and mid-level positions in order to classify them into topic themes and to highlight the underlying skills required. Using a stratified sample of job postings, the development of an automated text profiler for ongoing performance monitoring is explored. With the use of text clustering and text topics, skill sets can be categorized to better match program outcomes, and a visualization of the relationship between post-secondary program outcomes and the underlying sought-after employment skills can be created. The data can then be used by post-secondary educators to better bridge the skills gap.
Hong-Yu Xavier Fu, George Brown College
Allan Esser, George Brown College
Bhagmattie Annissa Rodriguez-Ramdhanie, All Canadian Self-Storage
Building a Scalable Analytic Ecosystem Based on the Data System's Process Requirements
Session 2779
Organizations often struggle to build a successful artificial intelligence (AI) and big data ecosystem to solve their business problems. Starting with the business problems and then moving the program through increasing levels of maturity, the data requirements need to be explicitly captured to design the supporting hardware and software infrastructure. This can be a chicken-and-egg scenario: modeling to obtain reliable results cannot occur without the data and the supporting infrastructure, while this infrastructure needs to be based on the data requirements of collecting, processing, storing, querying, modeling, and visualizing the data. Some IT organizations lack the experience to effectively navigate this process, taking a trial-and-error approach, which can waste time and resources. This presentation discusses an effective process to capture the data requirements in order to identify the right supporting hardware infrastructure using SAS® Viya® to build a scalable analytic ecosystem.
Sarah Kalicin, Intel
Building Revenue Management Decision Support Platforms with SAS®
Session 1867
The revenue management function in travel and hospitality addresses a well-known economics problem: maximizing revenue. The main facets of the problem are segmenting demand, understanding the price-demand relationship, estimating future demand, and finding the price level that maximizes revenue. However, with 12 InterContinental Hotels Group (IHG) brands spanning 5,000 hotels and 775k+ rooms in nearly 100 territories and countries, that analytics challenge is no small task. The problem becomes even more complex when multiple guest segments, macro factors, seasonality, different lengths of stay, and local special events are added to the equation. Further, with decision updates needed daily for every booking date one year ahead, the speed requirement for the process necessitates an analytics platform. Eric Schmidt from InterContinental Hotels Group shares how one of the world's largest hotel companies leverages SAS® to help meet this revenue management analytics challenge.
Eric Schmidt, IHG
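At its simplest, "finding the price level that maximizes revenue" means evaluating revenue, price times expected demand at that price, across candidate prices and picking the maximum. The Python sketch below does this for a hypothetical linear price-demand curve; the production platform described here would also account for guest segments, length of stay, seasonality, events, and capacity, none of which is modeled.

```python
def best_price(demand, prices):
    """Return the (price, revenue) pair that maximizes price * demand(price)."""
    return max(((p, p * demand(p)) for p in prices), key=lambda pr: pr[1])


def linear_demand(price):
    """Hypothetical price-demand curve: rooms sold per night at a given rate."""
    return max(0, 200 - 2 * price)


print(best_price(linear_demand, range(0, 101)))  # (50, 5000)
```

For this linear curve the optimum also has a closed form: revenue 200p - 2p² peaks at p = 200 / (2 × 2) = 50. The grid search carries over to estimated demand curves that have no closed form.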
Business Customer Value Segmentation for Strategic Targeting in the Utilities Industry Using SAS®
Session 1772
Numerous papers have discussed the importance of businesses understanding the value of their customers, using different segmentation techniques in order to target customers more efficiently, enhance business processes, and improve the customer journey. Business users refer to segmentation as a combination of science and art that delivers meaningful and useful results for the business. Additionally, traditional approaches of dynamically calculating customer value are not applicable when the data required for the calculations is sparse or inaccessible. The aim of this paper is to provide a way of combining business knowledge with demographic, behavioral, and pricing data using SAS® Enterprise Guide® and SAS® Enterprise Miner(tm), with an emphasis on the elements that were used to deliver a simple but effective analytical solution and gain strong insight into customer value to drive appropriate strategies based on the findings.
Paul Malley, Centrica
Spiros Potamitis, Centrica
Capital Planning with SAS® Infrastructure for Risk Management and SAS® Risk and Finance Workbench
In this paper, we demonstrate how to integrate SAS® Infrastructure for Risk Management and SAS® Risk and Finance Workbench to perform efficient capital planning. On one hand, SAS Infrastructure for Risk Management serves as the execution engine to perform high-performance computation with great auditability and scalability. A script automatically turns the worksheet-style data and formulas into SAS® data sets and computational job flows, performing end-to-end calculation without intermediate manual data movements. On the other hand, SAS Risk and Finance Workbench provides a central hub to manage the various aspects of the process such as data preparation and validation, capital projection, stress testing, and regulatory reporting. The seamless integration of both software applications streamlines the capital planning process with ease of use, fast and scalable computation, and sound process control.
Minna Jin, SAS
Huina Chen, SAS
Juan Du, SAS
Catch at the Speed of Light: Analytics on Live-Streaming Data Using SAS® Event Stream Processing
Session 2269
Two of the most important dimensions of data, variety and volume, have challenged the world of data analytics for quite some time. While volume dictated how data needs to be stored and accessed, variety resulted in innovative techniques to mine non-traditional data types. The third but most important dimension of data, velocity, has significantly changed the paradigm of what constitutes big data. We are indeed living in interesting times from the perspective of analytics as we look to mine data on the verge of its creation. Performing analytics on samples of stored historical data has its due share of importance, but that is not enough now. Insights at the speed of light, about what is happening right now at this second, can be of tremendous value. The timelines of data to insights and insights to actions are shrinking due to rapidly changing dynamics in the way businesses operate now. SAS® Event Stream Processing perfectly caters to this need for capturing and capitalizing on streaming data. With the growing use of social media channels, consumers are increasingly sharing their activities, likes and dislikes, opinions, choices, and interests. Live-streaming data can be of significant value if rightly tapped using proper analytics. In this paper, we showcase the combined power of analytics with SAS Event Stream Processing to build a real-time processing engine that can generate insights on the fly on SAS® Viya®.
Murali Pagolu, SAS
Cause-Specific Analysis of Competing Risks Using the PHREG Procedure
Session 2159
Competing-risks analysis extends the capabilities of conventional survival analysis to deal with time-to-event data that have multiple causes of failure. Two regression modeling approaches can be used: one focuses on the cumulative incidence function (CIF) from a particular cause, and the other focuses on the cause-specific hazard function. These two quantities, unlike the hazard function and the survival function in conventional survival settings, are not connected through a simple one-to-one relationship. The Fine and Gray model extends the Cox model to analyze the cumulative incidence function but is often mistakenly assumed to be the only modeling technique available. The cause-specific approach that simultaneously models all the cause-specific hazard functions offers a more natural interpretation than the Fine and Gray model. SAS/STAT® 14.3 includes updates to the PHREG procedure to perform the cause-specific analysis of competing risks. This paper describes how cause-specific regression works and compares it to the Fine and Gray method. Examples illustrate how to interpret the models appropriately and how to obtain predicted cumulative incidence functions.
Changbin Guo, SAS
Ying So, SAS
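To see why the cumulative incidence function and a single cause-specific hazard are not connected one-to-one, consider a minimal Python sketch that assumes two causes with constant (exponential) cause-specific hazards h_0 and h_1. Under that assumption, CIF_0(t) = h_0 / (h_0 + h_1) x (1 - exp(-(h_0 + h_1) t)), so the incidence of one cause depends on every hazard, and treating the other cause as censoring overstates it. The rates are hypothetical, and this closed form is only an illustration, not the PHREG procedure's estimator.

```python
import math

def cif_constant_hazards(hazards, cause, t):
    """Cumulative incidence of `cause` by time t, assuming every
    cause-specific hazard is constant over time."""
    total = sum(hazards)  # all-cause hazard
    return hazards[cause] / total * (1.0 - math.exp(-total * t))

def naive_one_minus_survival(hazards, cause, t):
    """1 - S(t) computed from the single cause's hazard, i.e. what you get
    by wrongly treating the other causes as censoring."""
    return 1.0 - math.exp(-hazards[cause] * t)

hazards = [0.10, 0.05]  # hypothetical rates for cause 0 and cause 1
cif0 = cif_constant_hazards(hazards, 0, 5.0)
naive0 = naive_one_minus_survival(hazards, 0, 5.0)
print(round(cif0, 4), round(naive0, 4))  # 0.3518 0.3935
```

Raising the competing hazard for cause 1 lowers cif0 even though the hazard for cause 0 is unchanged, which is exactly why the two quantities cannot be mapped one-to-one.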
Certified Smart: Steps to Becoming SAS® Certified
Session 1692
Have you been wondering what all the buzz is about SAS® certification? Is that next big job promotion hinging on you becoming SAS certified? Do you want to make yourself more marketable to potential employers by becoming a SAS® Global Certified Professional? If you have been considering becoming SAS certified and are not sure just where to start, then this seminar is for you! This seminar is designed to help you get started on your journey to becoming SAS certified. In this session, we discuss the available SAS certifications in various technology areas, show you how to find more information about the certification process, and even show you the resources that are available to you as you begin your preparation to become a SAS Global Certified Professional.
Mark Craver, SAS
Becky Gray, SAS
Change is Good, or at Least Expected: Techniques for Visualizing Categorical Values Over Time
Many techniques exist for showing how numeric values change over time. Bar charts, line charts, plots, and many other graph types are all excellent ways to demonstrate how temperature, expenses, and other measures increase and decrease over minutes, months or decades. On the other hand, such graphs don't lend themselves to showing how and when categorical values such as grade, rating, score, and status change over time. A simple combination of data manipulation, file merging, custom formatting, and the Output Delivery System (ODS) can produce a wide range of useful, easy-to-interpret, and effective reports. By using color, fonts, custom messages, and other features to indicate a change in a data value, these reports make it easy to monitor progress or to detect when things are going in the wrong direction.
Lisa Horwitz, SAS
Claim Analytics: A Litigation Prediction Case Study
Session 2504
Claim analytics has been evolving in the insurance industry for the past two decades. This presentation provides an overview of claim analytics. Then, a common high-level technical process for claim analytics with large data sets is introduced. The steps of this process include data acquisition, data preparation, variable creation, variable selection, model building (also known as model fitting), model validation, model testing, and so on. Next, we describe a case study. Over the past couple of decades in the property and casualty insurance industry, around 20% of closed claims have settled with litigation, representing 70% to 80% of total dollars paid. Litigation is therefore one of the main drivers of claim severity. In this case study, we introduce the workers' compensation (WC) litigation propensity predictive model used at Gallagher Bassett Services, Inc., which is designed to score open WC claims and predict their future litigation propensity. Data from more than a few thousand clients in a WC book of business were explored and used to build the model. Multiple cutting-edge statistical and machine learning techniques (GLM logistic regression, decision tree, neural network, gradient boosting, random forest, and so on) and WC business knowledge were used to discover and derive complex trends and patterns across the WC book of business data to build the model.
Mei Najim, Gallagher Bassett
Claim Risk Scoring Using Survival Analysis Framework and Machine Learning with Random Forest
Session 2521
The Workplace Safety and Insurance Board of Ontario is an independent trust agency that administers compensation and no-fault insurance for Ontario workplaces. Claim risk scoring can identify the claims at greatest risk of prolonged duration. Identifying such claims early makes it possible to target them with interventions and customized claim management initiatives that improve duration and health outcomes. Claim risk scoring uses a discrete-time survival analysis framework. The hazards and the corresponding survival probabilities are estimated with a logistic regression (a very sophisticated conventional model) that uses a spline for time to better estimate the hazard function, together with interactions between a number of factors and the time spline to properly address the proportional hazards assumption. In recent years, machine learning methods, including random forests (RF), have gained popularity, especially when the emphasis of the modeling is accurate prediction. A comparison of the existing conventional model and an RF machine learning implementation is presented. The high-performance procedure HPFOREST in SAS® Enterprise Miner™ was used for RF. Tuning RF parameters using graphical analysis was explored. Time-specific percent response and lift charts, accuracy, and sensitivity statistics were used to evaluate the predictive power of the models. RF achieved better performance in the early stages of the claim life cycle and was implemented.
Yuriy Chechulin, Workplace Safety and Insurance Board of Ontario
Jina Qu, WSIB
Terrance D'souza, WSIB
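A discrete-time survival framework like the one described here rests on two mechanical steps: expanding each claim into one record per period at risk, and turning per-period hazards (for example, predictions from a logistic model) into survival probabilities via S(t) = (1 - h_1)(1 - h_2)...(1 - h_t). The minimal Python sketch below shows only those two steps under that assumption. The claim ID, duration, and hazard values are hypothetical, and it does not reproduce the spline terms or the WSIB model.

```python
def expand_person_period(claim_id, duration, closed):
    """One row per period the claim was at risk.
    The event flag is 1 only in the period the claim closed;
    claims still open at the data cut (closed=False) are censored."""
    return [(claim_id, t, int(closed and t == duration))
            for t in range(1, duration + 1)]

def survival_curve(period_hazards):
    """Discrete-time survival: S(t) = product over j <= t of (1 - h_j)."""
    surv, s = [], 1.0
    for h in period_hazards:
        s *= 1.0 - h
        surv.append(s)
    return surv

rows = expand_person_period("C-001", 3, closed=True)
# [('C-001', 1, 0), ('C-001', 2, 0), ('C-001', 3, 1)]
hazards = [0.30, 0.20, 0.10]  # hypothetical per-period closure hazards
still_open = survival_curve(hazards)
# probability the claim is still open after each period, about [0.7, 0.56, 0.504]
```

Here the "event" is claim closure, so a high survival probability late in the life cycle flags a claim at risk of prolonged duration.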
Classifying and Predicting Spam Messages Using Text Mining in SAS® Enterprise Miner®
In this technologically advanced digital world, identifying spam text messages is very important. Spam messages are not just unwanted; they can also trap recipients in scam subscriptions and might infect their devices with malicious software. Unlike with email, there are sometimes direct costs involved, because some recipients are charged a fee for every text message received. The data set used for this analysis is a collection of 6,927 messages: 5,574 English messages (747 spam, 4,827 non-spam) from the UC Irvine Machine Learning Repository and a corpus of 1,353 spam messages from the Dublin Institute of Technology (DIT). This paper focuses on identifying clusters of high-frequency spam and non-spam words. A classification model that classifies and predicts messages as spam or non-spam based on text rule builder rules is discussed. The predictive power of this model is assessed by the misclassification rate on the scored data (4%). This model can help carrier companies protect their customers from spammers. Companies can use the list of high-frequency spam words and take the necessary precautions to avoid them in their promotional offers. We plan to build five other predictive models using memory-based reasoning (MBR), logistic regression, decision tree, random forest, and neural network, and to compare their predictive power (accuracy and misclassification) with that of the text rule builder model in classifying a message as spam or non-spam.
Venkata Sai Mounika Kondamudi, Oklahoma State University
Code Like It Matters: Writing Code That's Readable and Shareable
Session 2520
Coming from a background in computer programming to the world of SAS® yields interesting insights and revelations. Many SAS programmers are consultants or work individually, sometimes as the sole maintainer of their code. Because SAS code is designed for tasks like data processing and analytics, SAS developers working on teams might use different strategies for collaboration than those used in traditional software engineering. Whether a programmer works individually, on a team, or on a project basis (delivering code and moving on to the next project), a number of best practices from traditional software engineering can be leveraged to improve SAS code. These practices make code easier to read, maintain, and understand, and they make it easier to remember why the code is written the way it is. This paper presents a number of best practices, with examples and suggestions for use in SAS. The reader is encouraged not to apply all the suggestions at once, but to consider each of them and how it can improve their work or the dynamic of their team.
Paul Kaefer, UnitedHealth Group
Come On, Baby, Light my SAS® Viya®: Programming for CAS
Session 2622
This paper is for anyone who writes SAS®9 programs and wants to learn how to take advantage of SAS® Cloud Analytic Services (CAS) in SAS® Viya®. By the end of this paper, SAS®9 programmers will be familiar with CAS sessions and libraries, and the scope and reuse of those sessions. A programming pattern for the CAS data life cycle is presented, and finally, a set of benchmark performance metrics is presented that illustrates the potential performance gained from CAS compared to SAS®9.
David Shannon, Amadeus Software Limited
Command-Line Administration in SAS® Viya®
The sas-admin command-line utility gives administrators more flexibility in managing the SAS® Viya® environment. You can manage custom identity groups, create folder structures, or update the configuration, all from the command line. It is well suited for scripting common changes in this complex environment. There are even capabilities that enable content promotion for building development and test environments. The command-line utility has the following benefits: 1) it can be included automatically in the deployment plan as part of the server installation process; 2) you can download it from the web for Red Hat Linux, Microsoft Windows, and even Apple Mac; 3) it is deployed as a stand-alone executable, so no run-time environment (as with Java) needs to be installed; and 4) it has profile support, so you can manage multiple environments from one location.
Danny Hamrick, SAS
Comparisons of Large Complex Data Sets to Catch Any Changes Using SAS® Enterprise Guide®
Session 2381
As very large data files keep changing, comparing large, complex data files becomes more and more challenging. It is not feasible to use the COMPARE procedure to catch the differences between the current and previous data files. So we split the current and previous data into three parts: 1) a part, consisting of the common dimensions with possibly modified values, that exists in both the current and previous data files; 2) an additional part that exists only in the current data file; and 3) another part that exists only in the previous data file. Then we compared the common dimensions, determined which ones were modified, and identified the additional parts in the current and previous data.
Kaiqing Fan, Mastech Digital Inc.
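The three-way split described in this abstract can be sketched in Python by keying each record on its dimension columns. This is a minimal illustration, assuming both files fit in memory as dictionaries that map a key tuple to a tuple of values. The keys and values are hypothetical, and the paper itself works with SAS data sets in SAS Enterprise Guide.

```python
def split_for_comparison(current, previous):
    """Split two keyed snapshots into three parts.

    current, previous: dict mapping a key (the dimension values) to a tuple
    of measure values. Returns (modified, only_current, only_previous).
    """
    common = current.keys() & previous.keys()
    # Part 1: common dimensions whose values changed, stored as (old, new)
    modified = {k: (previous[k], current[k])
                for k in common if current[k] != previous[k]}
    # Part 2: keys that exist only in the current file
    only_current = {k: current[k] for k in current.keys() - previous.keys()}
    # Part 3: keys that exist only in the previous file
    only_previous = {k: previous[k] for k in previous.keys() - current.keys()}
    return modified, only_current, only_previous

previous = {("East", "2017Q4"): (100, 5), ("West", "2017Q4"): (80, 2)}
current = {("East", "2017Q4"): (105, 5), ("North", "2018Q1"): (40, 1)}
modified, only_current, only_previous = split_for_comparison(current, previous)
```

Common keys whose values are unchanged appear in none of the three results, which keeps the output focused on what actually differs.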
Competing Risk Survival Analysis: A Novel Way to Look at Treatment Wait Times
Survival analysis is used to investigate time-to-event data and can provide the probability of experiencing an event of interest by a given time. However, some people might experience other events that alter the probability of experiencing the event of interest. This scenario is known as competing risks. This paper provides a practical overview of survival analysis with competing risks by considering a novel way to look at treatment wait times. Wait times in healthcare are typically reported as the percentage of patients who received the procedure of interest within the target time frame. This measure does not take into account patients who are still waiting for the procedure, or patients who spend time in the queue without ultimately receiving it. By using survival analysis with competing risks, we can include the experience of all patients and calculate the probability of any patient receiving a particular procedure by a given target time.
Colleen Mcgahan, BC Cancer Agency
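The probability of receiving a procedure by a target time, in the presence of competing events and censoring, is commonly estimated with the nonparametric cumulative incidence (Aalen-Johansen) estimator. The minimal Python sketch below assumes each patient is recorded as (time in queue, status), with status 0 for still waiting (censored), 1 for received the procedure, and 2 for a competing event such as leaving the queue. These codes and the sample records are hypothetical, and the paper's own analysis is done in SAS.

```python
def cumulative_incidence(records, cause, target):
    """Aalen-Johansen estimate of P(event of type `cause` by `target`).

    records: list of (time, status); status 0 = censored, any other
    value = an event of that type. Censored patients stay in the
    risk set at their censoring time.
    """
    event_times = sorted({t for t, s in records if s != 0 and t <= target})
    surv, cif = 1.0, 0.0  # surv = P(no event of any type yet)
    for t in event_times:
        at_risk = sum(1 for ti, _ in records if ti >= t)
        d_cause = sum(1 for ti, s in records if ti == t and s == cause)
        d_any = sum(1 for ti, s in records if ti == t and s != 0)
        cif += surv * d_cause / at_risk   # still waiting, then this cause
        surv *= 1.0 - d_any / at_risk     # survive all causes at time t
    return cif

# Hypothetical queue: (days waited, status)
waiting = [(10, 1), (20, 1), (25, 2), (30, 0), (40, 1), (45, 0), (60, 1)]
p_by_target = cumulative_incidence(waiting, cause=1, target=42)
```

Unlike a simple percentage of completed procedures, this estimate uses the time that censored patients spent waiting and does not count patients who left the queue as eventual recipients.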
Complicated Data? SAS® User Formats Can Simplify Your Code
Session 2682
The transition to ICD10 diagnosis codes by healthcare providers presents a challenge for administrative claims-based research studies that span the October 1, 2015 cutover date. Healthcare research studies that identify population cohorts using a mix of ICD9 and ICD10 codes become difficult to manage for two reasons: 1) the same diagnosis code can appear in ICD9 and ICD10 but have different meanings; and 2) SAS® user formats that exist for ICD9 to obtain descriptions no longer work. The solution to this problem is to use SAS user formats to remove the challenges presented in consuming administrative claims data with mixed diagnosis code versions in the same SAS data set. Researchers can generate SAS user formats that work with a mix of ICD9 and ICD10 codes to obtain correct diagnosis descriptions regardless of the code version. By incorporating a SAS user format solution, researchers can make processing claims that cross the October 1, 2015 transition date easier for the programmer and the consumer of this data. This e-poster does the following: 1) reviews a simple SAS user format that has worked historically; 2) highlights the challenges researchers face with longitudinal studies using a mix of codes; 3) presents an easy solution, a new SAS user format that removes the need for the consumer of the data to evaluate the code version in order to process the data correctly; and 4) demonstrates the use of SAS user formats in simplifying complex data interpretation.
Margaret Burgess, Optum
Nelson Stephanie, Optum
Catherine Olson, Optum
Compute Service: A RESTful Approach to the SAS® Programming Environment
Session 2083
SAS® Viya® provides an open architecture that makes it easy to include SAS® analytics in your application development via REST APIs. The Compute Service API provides direct access to the SAS language programming environment. Compute Service not only enables you to execute SAS code, but also provides access to SAS data sets, SAS files, Output Delivery System (ODS) results, and more. This paper provides an introduction to the Compute Service API, with examples of how best to use its many features. Learn how the Compute Service API can make a great addition to your application development pipeline.
Joseph Henry, SAS
Jason Spruill, SAS
Confirmatory Factor Analysis in SAS®: Application to Public Administration
When hypothesizing the factor structure of latent variables in a study, confirmatory factor analysis (CFA) is the appropriate method to confirm the factor structure of responses. Learning to build CFA models in any statistical package is beneficial because it enables researchers to find evidence for the validity of instruments. An instrument was developed by Wright (2016) to assess the ability of Community-Based Development Organizations (CBDOs) to build capacity. This study addresses the continued development of the instrument's structure using CFA in SAS® 9.4 and examines how it contributes to enhancing an organization's ability to create sustainable communities. Developing and evaluating the instrument is crucial because of the growing role of CBDOs in offering public services, yet there is a significant gap in the literature about organizational factors affecting CBDOs' ability to build capacity. The purpose of the data analysis is to investigate the factor structure of responses to this scale and to address the capacity of nonprofits within the United States by surveying 1,000 organizations. Different steps for performing CFA in SAS®, primarily with the CALIS procedure, are explored, and their strengths and limitations are specified. Results are compared to the same model built in Mplus, which is recommended for latent variable models. Suggestions for best practices are provided through the methodological comparison of results.
Niloofar Ramezani, George Mason University
Kevin Gittner, University of Northern Colorado
Considerations for Analysis of Healthcare Claims Data
Session 2630
Healthcare-related data is estimated to grow exponentially over the next few years, especially with the growing adoption of electronic medical records and billing software. The growth in available data will yield many opportunities for healthcare analytics as analysts, researchers, and administrators look to leverage this information. One needs to consider the operational constructs of the data in order to process and analyze it efficiently. Healthcare claims data is the transactional-level data at the core of most analyses. Each claim can contain hundreds of variables about the course of care. Claims include diagnosis and procedure data, as well as variables outlining information about the recipient of services (Enrollment), providers of services (Provider), and ancillary variables that outline a number of administrative fields. Key to understanding this data is the inherent structure of how it is stored and accessed. A number of considerations need to be addressed in order to effectively process and manage the data and get it into an analytical form. This session discusses the use of common SAS® procedures and DATA step functionality to address the various challenges specific to healthcare data. Topics include SQL, transposing data, efficient joins, data structures, and considerations stemming from the claims adjudication process.
Bradley Casselman, The University of Alabama
Construction of Forward-Looking Distributions by Using Historical Data and Scenario Assessments
Session 1743
Many banks use the loss distribution approach in their advanced measurement models to estimate regulatory or economic capital. This boils down to estimating the 99.9% VaR of the aggregate loss distribution, which is notoriously difficult to do accurately. It is also well known that the accuracy with which the tail of the loss severity distribution is estimated is the most important driver of a reasonable estimate of regulatory capital. To this end, banks use internal data and external data (jointly referred to as historical data), as well as scenario assessments, in their endeavor to improve the accuracy with which the severity distribution is estimated. In this paper, we propose a simple new method whereby the severity distribution can be estimated using historical data and experts' scenario assessments jointly. The way in which historical data and scenario assessments are integrated incorporates measures of agreement between these data sources, which can be used to evaluate the quality of both. In particular, we show that the procedure has definite advantages over traditional methods in which the severity distribution is modeled and fitted separately for the body and tail parts, with the body part based only on historical data and the tail part on scenario assessments.
Riaan De Jongh, Center for BMI, North-West University
Helgard Raubenheimer, Center for BMI, North-West University
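For readers unfamiliar with the loss distribution approach, the minimal Python sketch below shows what estimating the 99.9% VaR of the aggregate loss distribution means in practice: simulate many years, each with a Poisson number of losses whose sizes come from a severity distribution, and take the 99.9th percentile of the annual totals. The Poisson frequency, lognormal severity, and all parameter values are illustrative assumptions. This is not the paper's proposed method for combining historical data and scenario assessments.

```python
import math
import random

def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def simulate_annual_losses(rng, lam, mu, sigma, n_years):
    """Aggregate loss per simulated year: sum of N lognormal severities."""
    return [sum(rng.lognormvariate(mu, sigma) for _ in range(poisson(rng, lam)))
            for _ in range(n_years)]

def empirical_quantile(values, q):
    """Smallest value with at least a fraction q of values at or below it."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, max(0, math.ceil(q * len(ordered)) - 1))
    return ordered[idx]

rng = random.Random(2018)  # fixed seed so the run is reproducible
annual = simulate_annual_losses(rng, lam=20, mu=10.0, sigma=2.0, n_years=50_000)
var_999 = empirical_quantile(annual, 0.999)
print(f"Simulated 99.9% VaR: {var_999:,.0f}")
```

With 50,000 simulated years, the 99.9% quantile rests on only about 50 tail observations, and it is driven mostly by the largest severities, which illustrates why the tail of the severity distribution dominates the capital estimate.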
Control, Coordinate, and Create Content with SAS® Drive
Session 2255
Do you know how to quickly share a SAS® file with your colleagues? Would you like to browse through a report before opening it? Can you organize your SAS content all in one place? This paper shows you how to do these things and more with SAS® Drive. We are excited to announce SAS Drive, a new tool for SAS users. With SAS Drive, you can manage all your content in one convenient location. You can search for an object that was created in a SAS application. You can share, tag, and preview your SAS content. Sharing takes only one click on the object. You decide who you want to share it with and whether they can read, edit, or share it. Tagging content can help you organize your work. Previewing files can help you find the right one. In addition to controlling your SAS content, you can drag and drop most content types from your local machine, and SAS Drive will upload them for you. This simple upload feature enables you to add image files for use in reports, CSV files for importing data, and so on. Undo and redo are part of SAS Drive, helping you to instantly recover from any unintentional actions. You can delete objects when you no longer need them, and you can easily create new content. For quick access to what you care most about, SAS Drive includes a collapsible area where you can place any content you choose. This paper includes screenshots and illustrative examples showing you how to manage all your SAS content in one place.
Cheryl Coyle, SAS
Scott Leslie, SAS
Bharat Trivedi, SAS
Cool SQL Tricks
Session 1823
This hands-on workshop is a mixed bag of lesser-known tricks that you can use to make cool SQL queries. Tricks include the optimizer, indexes and the WHERE clause, recursive joins, fuzzy joins, coalesce tricks, how the pivot table makes your query faster, and more.
Russ Lavery, Independent Contractor
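As a small illustration of the kind of coalesce trick mentioned in this abstract, the Python sketch below uses the standard-library sqlite3 module (not PROC SQL) to fill a missing value from a fallback column and then from a literal default. The table, columns, and data are hypothetical; COALESCE is standard SQL and also available in PROC SQL, where it likewise returns the first nonmissing argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER, mobile TEXT, office TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [(1, "555-0101", "555-0199"), (2, None, "555-0202"), (3, None, None)],
)
# COALESCE returns the first non-null argument, so the literal acts as a default
rows = conn.execute(
    "SELECT id, COALESCE(mobile, office, 'no phone') AS phone "
    "FROM contacts ORDER BY id"
).fetchall()
print(rows)  # [(1, '555-0101'), (2, '555-0202'), (3, 'no phone')]
conn.close()
```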
Copy That! Using SAS to Create Directories and Duplicate Files
Whether it is part of a client's request or needed for organizational purposes, creating several directories is often required. Along the same lines, business requirements sometimes call for the same document or checklist to be filled out for each individual client or product that a company manages. Creating and naming folders one at a time, or using a Save As function to duplicate files, is a monotonous task that few people look forward to. With a little help from SAS®, you don't have to do it! In this paper, we explore different methods of using SAS system options, functions, and commands to create separate directories and duplicate files based on a given data set. First, we investigate multiple ways to create a single directory in SAS. Then, using the SAS data set Sashelp.Cars, we use these functions and commands, along with the help of macros, to create directories for each of the car manufacturers listed in the data set. Finally, we discuss how to use SAS to copy existing files and, again using the Sashelp.Cars data set as an example, generate a checklist for each car model in the data set.
Nicole Ciaccia, Educational Testing Service
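The same directory-and-copy pattern can be sketched outside SAS. The minimal Python sketch below, which is not the paper's SAS approach, assumes a list of (make, model) pairs like the MAKE and MODEL columns of Sashelp.Cars and a template checklist file. It creates one directory per manufacturer and copies the template once per model. The sample rows, file names, and naming pattern are hypothetical.

```python
import os
import shutil
import tempfile

def make_dirs_and_copies(rows, root, template):
    """For each (make, model) pair, create root/<make>/ if needed and copy
    the template file to root/<make>/<model>_checklist<ext>."""
    ext = os.path.splitext(template)[1]
    created = []
    for make, model in rows:
        folder = os.path.join(root, make.strip())
        os.makedirs(folder, exist_ok=True)  # no error if the folder exists
        safe_model = model.strip().replace("/", "-")  # keep names file-system safe
        target = os.path.join(folder, f"{safe_model}_checklist{ext}")
        shutil.copyfile(template, target)
        created.append(target)
    return created

# Hypothetical rows standing in for MAKE and MODEL values from Sashelp.Cars
rows = [("Acura", "MDX"), ("Acura", "TSX 4dr"), ("Audi", "A4 1.8T 4dr")]
with tempfile.TemporaryDirectory() as root:
    template = os.path.join(root, "checklist.txt")
    with open(template, "w") as fh:
        fh.write("Inspection checklist\n")
    paths = make_dirs_and_copies(rows, root, template)
    top_level = sorted(os.listdir(root))
    all_copied = all(os.path.isfile(p) for p in paths)
print(top_level)  # ['Acura', 'Audi', 'checklist.txt']
```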
Cost Analysis in Population Health Using SAS® Real World Evidence
Health-care organizations are faced with the challenge of analyzing high volumes of population-level health data in order to gain insights into ways to reduce cost while improving health-care quality and efficiency. This paper explains how to develop statistical models using SAS® Real World Evidence to analyze cost and understand the key factors associated with health-care costs. Using SAS Real World Evidence, a population cohort based on a research question is created. Statistical modeling is performed on this cohort using the powerful Add-in Builder available in SAS Real World Evidence. The results from statistical models are then analyzed in order to understand characteristics associated with cost.
Jay Paulson, SAS
Create Awesomeness: Build a Custom App to Extend SAS® Visual Analytics to Get the Results You Need
Robby Powell, SAS
Creating a Multi-tenant Environment Using SAS® Enterprise Guide® and a SAS® Grid
Session 2642
The Centers for Medicare and Medicaid Services (CMS) Chronic Conditions Data Warehouse (CCW) Virtual Research Data Center (VRDC) is a unique multi-tenant environment. It has nearly one thousand unique users running between three hundred and four hundred real-time sessions with SAS® Enterprise Guide® and a SAS® grid. It also has over a petabyte of SAS® data that is constantly growing. This paper discusses the challenges of running a system of this size, focusing on performance and the ability to scale.
Timothy Acton, General Dynamics Health Solutions
Creating a Spatial Weights Matrix for the SPATIALREG Procedure
This paper describes a macro for creating a spatial weights matrix for the SPATIALREG procedure. The new PROC SPATIALREG estimates spatial regression models by including in their structure the effect of spatial proximity, represented in a matrix known as the W matrix. However, the W matrix cannot be created directly in the procedure, so this paper shows some ways to create it by using hash objects and PROC IML.
Alan Ricardo Da Silva, University of Brasilia
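A spatial weights matrix W has one row and one column per location, with w_ij > 0 only when location j is treated as a neighbor of location i, and its rows are commonly standardized to sum to 1. The minimal Python sketch below builds a row-standardized k-nearest-neighbor W from point coordinates. The k-nearest-neighbor rule is just one common neighbor definition (contiguity and inverse distance are others), the coordinates are hypothetical, and this is not the hash-object or PROC IML macro described in the paper.

```python
import math

def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor spatial weights matrix.

    coords: list of (x, y) points. Returns an n x n list of lists where
    W[i][j] = 1/k if j is one of the k nearest neighbors of i, else 0.
    The diagonal is always 0 (a location is not its own neighbor).
    """
    n = len(coords)
    if not 0 < k < n:
        raise ValueError("k must be between 1 and n - 1")
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # distances from i to every other location, nearest first (ties by index)
        dists = sorted((math.hypot(xi - xj, yi - yj), j)
                       for j, (xj, yj) in enumerate(coords) if j != i)
        for _, j in dists[:k]:
            W[i][j] = 1.0 / k  # each row sums to 1
    return W

coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (5.0, 1.0)]  # hypothetical centroids
W = knn_weights(coords, k=1)
```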
Creating a Successful Data Science Program: A Joint Academic and Industry Perspective
Session 1794
There is an ongoing misperception about the definition and role of data science. While the field of statistics has traditionally been defined as a branch of mathematics, data science is often associated with the areas of data mining, knowledge discovery, and analytics. Furthermore, there is still insufficient collaboration between academia and industry on the role and potential of data science. At the same time, demand for data science graduates and professionals is growing. Against this backdrop, new data science academic programs are appearing across North America without diligent consideration of their content or of industry needs. In this paper, we discuss the key differences between the predictive and explanatory paradigms of data science and statistics. We suggest that the fields can be selectively combined to address the needs of developing a well-founded and practically oriented data scientist. Drawing on a unique experience of teaching data science and leadership in business analytics, we review examples of joint initiatives that might help bridge the gap between academia and industry. This discussion is illustrated with examples of successful data science programs such as Data Intelligence at Concordia University. We propose that members of the software industry, like SAS, can play a constructive role in facilitating collaboration between universities and industry to develop innovative programs and help solve real-life business problems.
Krzysztof Dzieciolowski, Concordia University
Credit Score: A Comparison of Gradient Boosting with Logistic Regression in Practical Cases
Session 1857
The use of boosting methods is increasing every day with the constant evolution of software and hardware infrastructure and their falling cost. However, we still encounter some problems and reliability issues, mainly with automatic processes, where in most cases you cannot interpret their parameters (the influence of each variable). This paper evaluates one of the main boosting methods, gradient boosting, and its use in credit score models. We propose a macro that prepares the database and reduces the complexity of the deployment process by selecting the best variables and fewer interactions, while preserving the same quality of fit. We also evaluate the main methodology used today for credit score models, logistic regression, in order to compare its results with those of the boosting process. We present five practical cases of models from the largest credit bureau in Brazil (Serasa Experian) and evaluate the main results, considering simulations of boosting and logistic regression. Finally, we analyze the results and indicate the pros and cons of using the new methodology.
Paulo Di Cellio Dias, Serasa Experian
Marc Witarsa, Serasa Experian
Melissa Forti, Bradesco
Customizing the Aggregation of Risk Data
You asked for real-time and on-the-fly queries of risk data, and SAS answered with SAS® High-Performance Risk. Now with new and changing regulations such as the FRTB (Fundamental Review of the Trading Book) and SIMM (Standard Initial Margin Model), you need customizable queries. Adding to its established evaluation-stage actions, SAS High-Performance Risk now offers you control over how your query aggregates data. You can customize exposure calculations, weight aggregations, perform correlated aggregations, and calculate other nonlinear measures.
Scott Gray, SAS
Stacey Christian, SAS
Data Integration Best Practices
The creation of and adherence to best practices and standards can be of great advantage in the development, maintenance, and monitoring of data integration processes and jobs. Developer creativity is always valued, but it is often helpful to channel good ideas through process templates to maintain standards and enhance productivity. Standard control tables are used to drive and record data integration activity. SAS® Data Integration Studio (or Base SAS®) and the judicious use of autocall utility macros facilitate data integration best practices and standards. This paper walks you through those best practices and standards.
Harry Droogendyk, Stratia Consulting Inc
Data Management in SAS® Viya®: A Deep Dive
Session 1670
This paper provides an in-depth look into the new SAS® data management capabilities in support of SAS® Viya® and SAS® Cloud Analytic Services. The paper includes an overview of SAS® Data Management in SAS Viya, and contains details about how the feature set integrates with SAS Cloud Analytic Services. Examples and usage scenarios of how to best leverage the technology are also included.
Wilbram Hazejager, SAS
Nancy Rausch, SAS
Data Management Meets Machine Learning
Session 1683
Machine learning, a branch of artificial intelligence, can be described simply as systems that learn from data in order to make predictions or to act (autonomously or semi-autonomously) in response to what they have learned. Unlike pre-programmed solutions or business rules engines, machine learning can eliminate the need for someone to continuously code or analyze data themselves to solve a problem. While there are a variety of applications of machine learning (and the more advanced deep learning), most have focused on training a computer to perform human-like tasks, such as recognizing speech, identifying images (or objects and events portrayed therein), and making predictions. In this presentation, we explore the use of machine learning as an approach to helping with upstream activities in data management, including classification and feature identification, and we discuss implications for data quality, data governance, and master data management.
Greg Nelson, Thotwave Technologies, LLC.
Data Quality Control for Big Data: Preventing Information Loss with High-Performance Binning
Session 2821
It is a well-known fact that the structure of real-world data is rarely complete and straightforward. Keeping this in mind, we must also note that the quality, assumptions, and base state of the data we are working with strongly influence the selection and structure of the statistical model chosen for analysis, data maintenance, or both. If the structure and assumptions of the raw data are altered too much, the integrity of the results as a whole is grossly compromised. This paper provides programmers with a simple technique that enables the aggregation of data without losing information. This technique also checks the quality of binned categories in order to improve the performance of statistical modeling techniques. This paper gives you a basic idea of the syntax of the Base SAS® high-performance procedure HPBIN, as well as various methods, tips, and details for binning variables into comprehensible categories. You will also learn how to check whether these categories are reliable and realistic by reviewing the weight of evidence (WOE) and information value (IV) for the binned variables. This paper is intended for SAS® users of any level who are interested in quality control or in Base SAS high-performance procedures.
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
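Weight of evidence and information value follow directly from the good and bad counts in each bin: WOE_i = ln(%good_i / %bad_i) and IV = sum over bins of (%good_i - %bad_i) x WOE_i, where %good_i is the bin's share of all goods. The minimal Python sketch below computes both for already-binned data. The bin labels and counts are hypothetical; some texts define WOE with the ratio inverted, which flips the sign of WOE but leaves IV unchanged. HPBIN's own output is not reproduced here.

```python
import math

def woe_iv(bins):
    """bins: list of (label, n_good, n_bad) for one binned variable.

    Returns (woe_by_label, information_value). Bins with zero goods or
    zero bads make WOE infinite, so they are rejected here; in practice
    such bins are merged with a neighbor or smoothed.
    """
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    woe, iv = {}, 0.0
    for label, good, bad in bins:
        if good == 0 or bad == 0:
            raise ValueError(f"bin {label!r} has no goods or no bads; merge it")
        pct_good = good / total_good
        pct_bad = bad / total_bad
        woe[label] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[label]
    return woe, iv

bins = [("age < 30", 300, 60), ("30-49", 500, 50), ("50+", 200, 10)]  # hypothetical
woe, iv = woe_iv(bins)
```

Because every term (pct_good - pct_bad) x WOE is non-negative, IV is zero only when each bin has the same share of goods and bads, which is a quick check that the binning carries no information.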
Data Science and SAS®: A Data Science Perspective
Two of the hottest buzzwords in IT over the past five years are big data and data science. Big data is today's catchphrase for the exponential growth of digital data from a multitude of sources, with the promise of software providing new insights and discoveries that enable data-driven decision-making. Data science has become the field in which new algorithms and methods are developed to take advantage of these huge volumes of data. This paper and presentation explore how SAS® provides solutions for every aspect of the big data life cycle and how a data scientist's use of SAS can realize the promise of new and additional insights and discoveries.
Richard La Valley, OGSystems
Data Visualization from SAS® to Google Maps on Microsoft SharePoint
Session 2604
Google Maps is a very popular web mapping service developed by Google. Microsoft SharePoint is a popular web application platform that companies and organizations use for content management. Connecting SAS® with Google Maps and SharePoint combines the power of all three. As a continuation of my SAS Global Forum Paper 1062-2017, "Data Visualization from SAS® to Microsoft SharePoint," this paper expands on how to implement geocoding and data visualization from SAS to Google Maps on Microsoft SharePoint. The paper shows users how to use SAS procedures to create XML data files and send them from SAS to a SharePoint document library. The XML data files serve as data feeds for Google Maps web pages on SharePoint, and SAS code examples are included. A couple of examples with different views on data visualization from SAS to Google Maps APIs on SharePoint are provided.
Xiaogang (Isaac) Tang, Wyndham Worldwide
Data-Driven Marketing Strategies for an Online Retailer
Session 2881
Online retailing has changed the overall buying and selling experience for both retailers and consumers. From handmade articles to high-end electric cars, almost anything can be purchased with the click of a button. Although online retailing has helped retailers save a considerable amount of money on infrastructure, challenges remain, such as maintaining an optimized supply chain, sales planning, and offering the right product mix to the right customers. This paper uses association analysis to identify combinations of highly associated products that customers frequently buy together. For example, the initial analysis showed that in France, Tin Spaceboy and Woodland Animals (toys) are highly associated, but fewer units of the former are sold than of the latter. Such a finding could be used to decide the best product mix, to plan sales, and to help build customized marketing campaigns. Furthermore, this paper attempts to forecast sales of highly associated products using time series forecasting. The data set contains more than 500,000 transactions made over 12 months in 38 countries. SAS® Enterprise Miner™ and SAS® Forecast Studio were used for the analysis.
Jaideep Muley, Oklahoma State University
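Association analysis of the kind described above typically ranks item pairs by support, confidence, and lift. As a hedged illustration in plain Python (not the SAS Enterprise Miner implementation), the hypothetical helper `pair_metrics` computes these three measures for a rule A -> B over a list of baskets; the item names in the usage line are placeholders, not data from the paper.

```python
from typing import Dict, List, Set


def pair_metrics(baskets: List[Set[str]], a: str, b: str) -> Dict[str, float]:
    """Support, confidence, and lift of the rule a -> b over a list of baskets."""
    n = len(baskets)
    if n == 0:
        raise ValueError("need at least one basket")
    n_a = sum(1 for t in baskets if a in t)
    n_b = sum(1 for t in baskets if b in t)
    n_ab = sum(1 for t in baskets if a in t and b in t)
    support = n_ab / n                               # P(a and b)
    confidence = n_ab / n_a if n_a else 0.0          # P(b | a)
    lift = confidence / (n_b / n) if n_b else 0.0    # P(b | a) / P(b); > 1 means positive association
    return {"support": support, "confidence": confidence, "lift": lift}
```

With `pair_metrics([{"A", "B"}, {"A", "B"}, {"C"}, {"C"}], "A", "B")`, the rule has support 0.5, confidence 1.0, and lift 2.0: buying A doubles the chance of buying B relative to the baseline.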
Data-Driven Programming Techniques Using SAS® Metadata
Session 1653
Data-driven programming, or data-oriented programming (DOP), is a specific programming paradigm in which the data (or data structures), not the program logic, control the flow of a program. Data-driven programming approaches are often applied in organizations with structured data for filtering, aggregating, transforming, and calling other programs. SAS® users can easily access metadata content to capture valuable information about the librefs that are currently assigned, the names of the tables available in a libref, whether a data set is empty, how many observations are in a data set, how many character versus numeric variables are in a data set, a variable's attributes, the names of variables associated with simple and composite indexes, and much more. The value of accessing the contents of these read-only SAS metadata data sets, called dictionary tables, or their counterparts, SASHELP views, is limitless. This paper explores how SAS metadata can be dynamically created using data-driven programming techniques.
Kirk Paul Lafler, Software Intelligence Corporation
Database-Ics II: Design, Standardize, and Control the Process for Better Data Management
Session 2471
Developmental design is an art, and too often we dive right into the coding and the results instead of taking the time to think through the process. This paper examines the entire development process and shows the audience different facets to consider during development. Areas covered are design, standardization, and control tables, as well as SAS® tools that can be used to make your process more efficient. This paper updates concepts presented in "Database-ics: A Primer for Creating a Usable Database in SAS®," which I presented in 2005 at the MidWest SAS® Users Group (MWSUG), Western Users of SAS® Software (WUSS), and SAS® Users Group International (SUGI), and adds new concepts.
Frank Ferriola, Charles Schwab & Co.
Democratizing Big Data: How to Turn Clickstream Data from Google Analytics into Something Useful
Session 2833
When one hears the phrase "big data," it is often in the context of financial data, genome mapping, or meteorological studies, but big data also exists in the webosphere. With more information about user behaviors, brands and advertisers can serve hyper-personalized content to every individual. With the sheer amount of data available from digital user behavior, many of us are discouraged from participating in this new frontier of marketing because we do not have the background necessary to collect, clean, and run the types of statistical analysis needed, much less the resources to do so (or so we think). After wrestling with this very problem, our team figured out a way to use Google Analytics to mine clickstream data for free, extract it using SAS®, and turn it into beautiful, dynamic visuals powered by open-source D3.js to bring clickstream analysis to the people.
Jennifer Seid, FCB
Demystify Income Models with Advanced Machine Learning Techniques for Ultimate Accuracy
Session 1652
In the age of big data, lenders and financial institutions are looking for accurately modeled income to enable robust prescreening and pre-qualification, and to effectively manage the customer portfolio in order to increase profits. More kinds of data are available than ever before, far more than current technologies can keep up with. While this mass of data creates the potential to learn more about the customer base, it can be hard to interpret and overwhelming to analyze. It is critical to extract insights and leverage best-in-class machine learning techniques. This session demonstrates how to revamp traditional generic income scores by engineering data sources that improve customer decisioning, generating insights across multiple account life cycle stages, and leveraging advanced machine learning techniques to achieve ultimate accuracy. This session shows you how to develop prototype models and configure them to business units for different use cases; determine the types of data that can be made available to income solutions in a Fair Credit Reporting Act (FCRA) environment; use Hadoop and the efficiency of SAS® LASR™ to build models and improve performance with big data; differentiate in the market by incorporating multisource premium data assets, including trended data attributes; identify the cutting-edge machine learning techniques that fit your business and business model; and innovate with industry-leading accuracy metrics for performance measurement.
Vickey Chang, Equifax
Demystifying Buzzwords: Using Data Science and Machine Learning on Unsupervised Big Data
Session 2781
Each month, it seems, a new technology is introduced that can transform your organization in unimaginable ways. Each technology arrives with its own set of industry buzzwords that make it difficult to understand how your organization would benefit. In this talk, we answer the questions we are asked most frequently about machine learning, data science, artificial intelligence, and advanced analytics. We review the relative strengths and weaknesses of the various tools, techniques, and technologies associated with these buzzwords. You will walk away a winner by knowing how to put each to its best use.
Ben Murphy, Zencos
Deploying and Maintaining Models in a Big Data Environment: An Intelligent SAS® Workflow
Creating predictive models is one part of data mining; implementation and maintenance are others. With the advent of the big data ecosystem, it is critical for organizations to create a coherent SAS® workflow that can work with different analytical platforms. Because different organizations use different platforms, everyone wants the diverse models to be independent, traceable, and reusable. We demonstrate how SAS can be leveraged to create an intelligent workflow that supports different ecosystems and technologies, including the interactions of Apache Hadoop (Hortonworks, Cloudera, and so on), Teradata, Microsoft SQL Server, and different analytical platforms, to create a seamless SAS workflow that is flexible, scalable, and extensible. In today's world, all customer activity is captured in real time. Growing data volumes, together with the ability to store data efficiently at reduced cost, gave rise to the big data environment. The predictive models that consume big data should be maintained in the same way. This efficient arrangement of predictive models leads us to consider the following questions: Do we need to deploy models independently? Does the process support daily or monthly run frequencies? Does the process indicate how healthy or rebuildable the models are? Which analytical platforms (open source or licensed) are compatible, and how well integrated are all the platforms? Do I get a monthly summary of active models? How are the logs maintained?
Anvesh Reddy Minukuri, Comcast
Ramcharan Kakarla, Comcast
Sundar Krishnan, Comcast
Deploying SAS® Grid Systems on VMware ESXi Virtually Provisioned Storage
Session 1931
Creating performant SAS® file systems that are compliant with VMware ESXi can be a challenge. There are several ways to implement virtual and software-defined storage resources within a system controlled by the VMware ESXi hypervisor. This paper covers several considerations for file system and logical unit number (LUN) construction from several perspectives: host logical volume, virtual machine file system (VMFS) and virtual machine disk (VMDK), and VMFS and raw device mapping (RDM) implementations. Non-LUN mounts to hyper-converged storage systems are also discussed. The focus is on tuning and performance of the method used, as well as the pros and cons of hypervisor-level versus back-end storage-level utility operations.
Tony Brown, SAS
Detecting Defective Equipment in the Healthcare Service Industry
Healthcare equipment service companies confront the challenge of identifying defective equipment before it imposes a large financial burden on the supplier, downtime on the customer, and significant delays in the treatment of patients. Current methodologies employ sensors to detect large variations and to alert the service provider before failure or breakdown. However, these sensors are not available in all machines, especially older ones, and alerts usually come in at the last minute. Since time is of the essence, engineers scramble to make a quick break-fix without looking at a holistic view of the problem and the history of the machine. As a result, defective equipment remains in the market for a significant period of time, causing excessive stress to both the service provider and the customer and harming the company's reputation. Using modern visualization tools like SAS® Visual Analytics and JMP®, we can apply meaningful visualizations to analyze large volumes of service event data. Doing so has enabled us to create a useful index that effectively measures customer experience over time and to apply regression techniques in tools like SAS® Enterprise Guide® to quickly identify defective equipment, thus saving many hours of downtime and unproductive labor.
Sean Cohen, Siemens Healthineers
Sarith Mohan, Siemens Healthineers
Determine Differences in Health Care Costs Before and After Alzheimer's Diagnoses Using SAS® Studio
Delayed or missed diagnoses of Alzheimer's disease might deprive afflicted individuals of treatments that improve their symptoms, slow disease progression, and help them maintain independence. Cost-based evidence to substantiate the importance of early diagnosis is lacking. The objective of this paper is to use SAS® Studio to analyze the administrative claims data of 9,879 Alzheimer's patients and identify cost trends before and after diagnosis. Significant differences in fall-related, injury-related, and poisoning-related costs (inpatient, outpatient, ER, and office visit costs) before and after Alzheimer's diagnoses are assessed, and the results are interpreted. Total treatment costs were lower (61 percent for poisoning, 13 percent for falls, 12 percent for injuries) in the first six months after diagnosis than in the six months before. Inpatient health care costs were comparably lower (60 percent for poisoning, 14 percent for falls, 17 percent for injuries). Outpatient costs were higher for falls (97 percent) and injuries (134 percent). The results suggest a trend toward less costly care immediately after an Alzheimer's diagnosis. The research method yielded tips for future SAS Studio users, including: 1) how to prepare sample data sets for use with the statistics tasks in SAS Studio; and 2) how to execute summary statistics and paired t-tests and interpret the results.
Catherine Olson, Optum
Thomas Iverson, Optum
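The paired t-test mentioned above compares each patient's cost after diagnosis with the same patient's cost before it. A minimal Python sketch of the test statistic follows: t = (mean of the differences) / (sample standard deviation of the differences / sqrt(n)), with n - 1 degrees of freedom. It is not the SAS Studio task; the function name `paired_t` is hypothetical, and the p-value lookup is omitted because the Python standard library has no t distribution.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(before: Sequence[float], after: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired samples (after - before)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1
```

A negative t means costs fell after diagnosis on average; the statistic is then compared with a t distribution with n - 1 degrees of freedom, which is the step a statistics package performs to report the p-value.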
Developing BI Best Practices: Texas Parks and Wildlife's ETL Evolution
The development of extract, transform, load (ETL) processes at Texas Parks and Wildlife has undergone an evolution since we first implemented a SAS® Business Intelligence (BI) system. We began constructing our data mart with subject matter experts who used SAS® Enterprise Guide® and who needed to share that data with less-experienced staff in our user community. This required us to develop best practices for assembling a data mart architecture that maximized ease of use for consumers while offering a secure reporting environment. We also developed best-practice guidelines for determining whether data was best served to and from the data mart via physical SAS® tables, information maps, OLAP cubes, or stored processes. Originally, we used Microsoft Task Scheduler to automate SAS Enterprise Guide processes and developed best practices in designing process flows to run on an automated schedule. Recently, we invested in SAS® Data Integration Studio and have developed best practices to transfer SAS Enterprise Guide process flows to that tool and schedule them within it. In this paper, we share the best practices we have developed in evolving ETL processes that we have found to be helpful in ensuring the success of our BI implementation.
Drew Turner, Texas Parks and Wildlife
John Taylor, Texas Parks and Wildlife
Alejandro Farias, Texas Parks and Wildlife
Developing Business Intelligence Best Practices: Lessons Learned from a Successful BI Implementation
When Texas Parks and Wildlife initiated the implementation of a SAS® Business Intelligence system, the effort was driven by the business units and was not fully supported by our Information Technology division. Although our implementation team, consisting primarily of staff with backgrounds in biology and accounting, had to learn on the job how to get a complex information technology project off the ground, the project has been successful and has radically transformed and improved the efficiency of many of our business processes. Our team learned many lessons along the way, and this paper highlights some of the best practices we developed, as well as several tips and tricks we discovered and would recommend to anyone attempting to implement a SAS BI installation. Topics covered include developing implementation and security plans using best practices, proven tactics for achieving user buy-in, tips for monitoring and ensuring high levels of server performance, and best-practice recommendations for providing training and user support.
Dan Strickland, SAS Consulting Services
Alejandro Farias, Texas Parks and Wildlife
Drew Turner, Texas Parks and Wildlife
Development of an Individual-Level Social Determinants of Health (SDoH) Classification Model
Growing evidence in the health services literature suggests that social determinants of health (SDoH) can affect a person's risk for adverse health events and increased cost of care. Individuals can be scored on those SDoH-associated factors that are individual in nature, and weights can be created to generate greater insight into risk, costs, and health care utilization patterns. To develop this prototype model, we used data elements consisting of Z-codes from the claims data of a large Midwest Medicaid payer. Z-codes were created as a new set of supplemental codes with the advent of ICD-10 and are physician-coded attributes at the person level. We examine a subset of Z-codes (Z55-Z65) that indicate "Persons with potential health hazards related to socioeconomic and psychosocial circumstances." These are set as binary indicators at the individual subject level. A subset of these Z-codes was selected for use in this analysis. SAS® 9.4 was used to research and develop this procedure. Initial research indicates that a latent class statistical model is the best option for creating the classification model. This model was developed and tested for optimization, feasibility, and accuracy. From this model, predicted probabilities were generated, which can then be used as weights in a generalized linear regression model to help define, adjust, and assess the outcomes of interest, and can be combined with 3M Clinical Risk Group (CRG) scores as a model covariate.
Ryan Butterfield, DrPH, 3M Health Information Systems, Inc.
Paul Labrec, 3M Health Information Systems, Inc.
Melissa Gottschalk, 3M Health Information Systems, Inc.
Diamonds in the Rough: New Discoveries with the SAS® Visual Investigator Text Analytics Workspace
Session 2015
Take your investigative process to a new dimension and incorporate the analysis of your unstructured data. Adding text analytics to your process gives you the potential to discover new relationships between existing or newly discovered entities. Discover new locations, organizations, or people that can further your investigative process. See patterns of common phrases that were previously undetected. Let the analytics reveal common themes that might cluster the unstructured data into meaningful results. The new text analytics workspace in SAS® Visual Investigator provides these features to enhance your investigation.
Danielle Davis, SAS
Discovering Insightful Relationships inside the Panama Papers Using SAS® Visual Analytics
Session 1786
Network analytics is a broad methodology that supports link analysis through visual tools such as SAS® Visual Analytics, SAS® Social Network Analysis, and SAS® Visual Investigator. Link analysis visually displays all possible relationships that exist between entities, based on available data, to provide insight into direct and indirect associations. It can be a very helpful tool in fraud or anti-money laundering investigations. Beneath the surface, data management techniques and advanced analytical routines are used to discover relationships, transform data, and build the appropriate data structures to support link analysis. Network statistics can describe networks more accurately by using quantitative data to define complexities and unusual connections between entities. This paper explores an approach to support network analytics and link analysis, using the Panama Papers as a real-world example. The Panama Papers leak is the largest leak of confidential data to date. The data contained within the Panama Papers provide a wealth of knowledge to financial investigation units because they expose previously unknown relationships between corporate entities and individuals.
Stephen Overton, Overton Technologies
Distribution Circuit Load Forecasting Using Advanced Metering Infrastructure Data
The ability to perform very short-term to very long-term forecasts of distribution circuit loads at intermediate distribution circuit locations between customer meters and substation feeder buses using advanced metering infrastructure (AMI) data provides significant advantages to distribution system planners and operators in a number of areas. Some of the important applications of these forecasts include anticipation of device overloads, facilitation of decision making for switching operations, and help with integrating distributed energy resources (DER) into system operations. This session presents forecasting results for distribution circuits using SAS® Energy Forecasting, which uses methods such as GLM to generate forecasts. Results for different circuit locations are derived from Ameren Illinois AMI and circuit taxonomy data. Included in the presentation are details of the forecasting methodology and a discussion of applications to distribution system operations.
Prasenjit Shil, Ameren
Tom Anderson, SAS
Do You Need to Put Your Data on a Diet?
You have amassed an amazing amount of data in your Apache Hadoop environment in a relatively short amount of time. Unfortunately, you realize that all of this data cannot be processed in the environment in which it is contained. You have come up with alternatives that cannot be presented to the executive sponsors of this data-rich environment. What's next? Many have traveled the path that you are now on. This presentation addresses the concept of how to do more with less by coming up with a diet plan for your data. Big data examples are discussed, along with analytical concepts that you can use to trim the fat.
Howard Plemmons, SAS
Priya Sharma, SAS
Docker Toolkit for Data Scientists
SAS® continues to grow its capabilities for running SAS in the cloud with containers! Learn what a container is and how it can be used to run SAS® Analytics for Containers and leverage the power of in-memory compute capabilities with SAS® Viya®. This paper discusses how SAS containers run on a variety of cloud platforms, including Amazon Web Services, Google Cloud Platform, and Microsoft Azure, as well as on a private OpenStack cloud using Red Hat OpenShift. Additional topics include provisioning web-browser-based clients via Jupyter Notebooks and SAS® Studio to empower data scientists with the tool of their choice, and how to take advantage of working with SAS® Analytics for Containers with Hadoop.
Donna De Capite, SAS
Doin' Data Quality in SAS® Viya®
Session 2156
SAS® Viya® introduces data quality capabilities for big data using SAS® Data Preparation and DATA step programming for SAS® Cloud Analytic Services (CAS). In this session, a Senior Software Development Manager at SAS shows how to configure SAS® Data Quality transformations in SAS® Data Studio and how to submit DATA step functions created in SAS® Data Quality for execution in CAS. We also cover management of the vital SAS® Quality Knowledge Base in SAS® Environment Manager.
Brian Rineer, SAS
Drug Abuse: Is the Number of Breweries Playing A Role?
Session 2841
The purpose of this paper is to help researchers predict patterns of drug-abuse behavior in the population as a basis for preventive treatment. In this study, we use SAS® to build a model to understand the effects of two unprecedented factors (political influence and the number of breweries) on human mental status and decision-making. The number of drug-induced deaths in our study is used to depict an overall picture of drug-abuse behavior across the nation. Crime rates among states and the corresponding number of alcohol-induced deaths are another aspect that we consider as consequences of drug abuse. The model provides a probability measure of the correlations among political influence, number of breweries, alcohol-induced deaths, and drug-induced deaths. The study treats political influence, alcohol-induced deaths, and number of breweries as independent variables and drug-induced deaths as the dependent variable. The model that is developed can help establish multiple criteria to target certain populations before drug-abuse behaviors are reported, and to make an early intervention where possible.
Su Li, Clark University
Nilam Adhikari, Clark University
Easing into Data Exploration, Reporting, and Analytics Using SAS® Enterprise Guide®
Session 1860
Whether you have been programming in SAS® for years, are new to it, or have dabbled with SAS® Enterprise Guide® before, this hands-on workshop sheds some light on the depth, breadth, and power of the SAS Enterprise Guide environment. With all the demands on your time, you need powerful tools that are easy to learn and that deliver end-to-end support for your data exploration, reporting, and analytics needs. This workshop uses the current production version of SAS Enterprise Guide, but the content is still useful to users of earlier versions. Topics include data exploration tools; formatting code (cleaning up after your coworkers); the enhanced programming environment (and how to calm it down); easily creating reports and graphics; producing the output formats you need (XLS, PDF, RTF, HTML); workspace layout; productivity tips; and additional tips and tricks.
Marje Fecht, Prowerk Consulting
Economic Capital Modeling with SAS® Econometrics
Session 2114
A statistical approach to developing an economic capital model requires estimation of the probability distribution model of the aggregate loss that an organization expects to see in a particular time period. A well-developed economic capital model not only helps your enterprise comply with industry regulations but also helps it assess and manage operational risks. A loss distribution approach decomposes the aggregate loss of each line of business into frequency and severity of individual loss events, builds separate distribution models for frequency and severity, and then combines the two distribution models to estimate the distribution of the aggregate loss for that business line. The final step estimates a copula-based model of dependencies among the business lines and combines simulations from the dependency model with the aggregate loss models of individual business lines. This process yields an empirical estimate of the enterprise-wide loss distribution, which helps you develop an economic capital model. The process is characterized by both big data and big computation problems. This paper describes how SAS® Econometrics software can help you take advantage of the distributed computing environment of SAS® Viya® to implement each step of the process efficiently.
Mahesh Joshi, SAS
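The loss distribution approach described above can be illustrated with a small Monte Carlo sketch in Python. Every modeling choice here is an assumption for illustration, not the models used in the paper or in SAS Econometrics: Poisson frequency, lognormal severity, two business lines, and a Gaussian copula with correlation `rho` that links the lines through their empirical aggregate-loss quantiles. All function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import List


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_aggregate(rng: random.Random, lam: float, mu: float, sigma: float,
                       n_sims: int) -> List[float]:
    """Sorted sample of per-period aggregate losses for one business line.

    Each period: N ~ Poisson(lam) loss events, each with a lognormal(mu, sigma) severity.
    """
    return sorted(
        sum(rng.lognormvariate(mu, sigma) for _ in range(poisson(rng, lam)))
        for _ in range(n_sims)
    )


def empirical_quantile(sorted_sample: List[float], u: float) -> float:
    """Value at probability level u (0 <= u <= 1) of a sorted sample."""
    return sorted_sample[min(int(u * len(sorted_sample)), len(sorted_sample) - 1)]


def combine_lines(rng: random.Random, line1: List[float], line2: List[float],
                  rho: float, n_sims: int) -> List[float]:
    """Sorted enterprise-wide loss sample for two lines linked by a Gaussian copula (|rho| <= 1)."""
    std_normal = NormalDist()
    totals = []
    for _ in range(n_sims):
        # Correlated standard normals via a 2x2 Cholesky factor
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        z1 = g1
        z2 = rho * g1 + math.sqrt(1.0 - rho * rho) * g2
        # Copula step: normals -> uniforms -> each line's empirical marginal
        x1 = empirical_quantile(line1, std_normal.cdf(z1))
        x2 = empirical_quantile(line2, std_normal.cdf(z2))
        totals.append(x1 + x2)
    return sorted(totals)
```

A typical use is `rng = random.Random(2018)`, then `a = simulate_aggregate(rng, 3.0, 8.0, 1.2, 10_000)`, `b = simulate_aggregate(rng, 1.5, 9.0, 1.0, 10_000)`, and `empirical_quantile(combine_lines(rng, a, b, 0.3, 10_000), 0.999)`, a high quantile of the kind an economic capital model reports. Simulating lines independently and joining them only at the copula step mirrors the two-stage structure in the abstract, and it is also what makes each stage easy to distribute.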
Effective Management of Non-performing Loans Using SAS® Credit Assessment Manager
High levels of non-performing loans (NPLs), or bad debt, are problematic both for the bank and for the economy. NPLs from the 2008 financial crisis have persisted, and there is evidence of steady growth. Banks are under pressure to devise strategies to manage their NPL portfolios. Financial regulations, such as IFRS 9 and the guidelines developed by the Basel Committee on Banking Supervision (BCBS), have proposed key guidance to identify, measure, and manage NPLs. Banks need a solution to meet the systemic requirements of an NPL strategy, specifically those of the individual assessment of significant non-performing exposures and high-risk performing loans (also called a watch list). However, given the high volume of loans, performing individual credit assessments without delaying expected credit loss (ECL) recognition is the challenge that banks now face. SAS® Credit Assessment Manager provides a framework for both the qualitative and quantitative assessment of individual NPLs. This framework, built on regulatory guidance, complements the comprehensive assessment of banks' assets. Banks can properly diagnose an NPL using cash flows and collateral allocation to determine the impairment amount. SAS Credit Assessment Manager is a workflow-based, stand-alone solution that seamlessly fits into any credit risk management infrastructure. This paper discusses best practices and mechanics of using SAS Credit Assessment Manager for the individual assessment of an NPL.
Satish Garla, SAS
Sumanta Boruah, SAS
Efficient Elimination of Duplicate Data Using the MODIFY Statement
Session 2426
Many an extract, transform, load (ETL) transformation phase starts with cleansing the extracted data of duplicates. For example, transactions with the same key but different dates might be deemed duplicates, and the ETL process needs to choose the latest transaction. Usually, this is done by sorting the extract file by the key and the date, and then selecting the most recent record. However, this approach might exact a very high processing cost if the non-key (or satellite) variables are numerous and long, and especially if the result needs to be re-sorted to restore the extract's original order. This paper shows that this kind of data cleansing can be accomplished with a fundamentally different algorithm based on merely marking the duplicate records for deletion in the original extract file. In real-life scenarios, particularly when duplicates are relatively rare, the proposed approach can cut processing time by more than an order of magnitude.
Paul Dorfman, Dorfman Consulting
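The mark-for-deletion idea above can be sketched in Python with an in-memory dict. This is an analogue of the approach, not the paper's MODIFY-based SAS code: it makes one pass over the records in their original order, remembers the index of the latest record seen for each key, and flags older records instead of sorting anything. The record layout (key, ISO date string, payload), the function name, and the tie rule (keep the first occurrence) are assumptions.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record layout: (key, ISO-8601 date string, satellite payload)
Record = Tuple[str, str, Any]


def mark_older_duplicates(records: List[Record]) -> List[bool]:
    """One pass, no sort: flag every record that is not the latest for its key."""
    kept_index: Dict[str, int] = {}   # key -> index of the record currently kept
    delete = [False] * len(records)
    for i, (key, date, _payload) in enumerate(records):
        j = kept_index.get(key)
        if j is None:
            kept_index[key] = i       # first record seen for this key
        elif date > records[j][1]:    # ISO dates compare correctly as strings
            delete[j] = True          # the previously kept record is older
            kept_index[key] = i
        else:
            delete[i] = True          # this record is older, or ties (keep first)
    return delete
```

Filtering with `[r for r, d in zip(records, flags) if not d]` keeps the survivors in their original order, so no re-sort is needed, and the satellite payloads are never moved during the scan.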
Efficient Use of Disk Space in SAS® Application Programs
Session 2362
This session is a tutorial on managing disk space for SAS® data sets and files created by or for the SAS system. Basic housekeeping is covered: keep files that are in use, and back up or discard files that are not. Backup methods are discussed, including the important question of whether the operating system that your SAS site runs on might change in the future, necessitating use of the special transport format for backup files. SAS procedures that are commonly used for disk file management are described: DELETE, DATASETS, and CATALOG. The DELETE statement in the SQL procedure and SAS DATA step functions for file management are also discussed. File compression is a very important tool for saving disk space, and the SAS options for it are described. Logical deletion of rows in a data set can waste disk space; prototype SAS code to detect files with this condition is supplied in an appendix. Multiple SAS programming techniques that promote efficient use of disk space are described, as well as suggestions for managing the SAS WORK library.
Thomas Billings, MUFG Union Bank
Efficiently Join a SAS® Data Set with External Database Tables
Joining a SAS® data set with an external database is a relatively common procedure in research, and there are several methods to improve its efficiency. The first method applies when only one external database table is involved and the join is an inner join: using the DBKEY= data set option should be quick and easy. The second method is to generate a macro list based on the SAS data set, which can then be used to abstract data. Because the macro list works in both implicit and explicit SQL pass-through connections to the external database, and the data abstraction procedure can be very complex, this method is very flexible. Third, an analyst or investigator who has permission could import the SAS data set into the database and use it like any other database table for data abstraction. However, in most cases, data analysts and investigators are not allowed to perform this kind of operation for security or other reasons, which limits its use in real work environments. The fourth method is to import the SAS data set as a temporary table, which in most cases does not need permission from the database administrator and is automatically dropped after the single PROC step. In real-world scenarios, generating a macro list or importing the SAS data set as a temporary table is, in most cases, easily implemented and makes joining a SAS data set with an external database flexible.
Dadong Li, New York University Medical Center
Michael Cantor, New York University Medical Center
Enable Personal Data Governance for Sustainable Compliance
Session 2052
In the context of the European Union's General Data Protection Regulation (GDPR), one of the challenges for data controllers and data stewards is to identify the categories of personal data in an application in a very short time, document them, and then keep that view up to date. We propose an approach that automates the governance effort and significantly reduces the time and effort needed to maintain the latest view of personal data, thereby serving customers better and answering the regulator. We use several processes developed in SAS® Data Management Studio to identify the personal data and update the governance view within SAS® Business Data Network and SAS® Lineage. We also demonstrate features in other products, such as the Personal Data Discovery Dashboard in SAS® Visual Analytics, the Personal Metadata Linker in SAS® Data Management, and SAS® Personal Data Compliance Manager as it applies to Records of Processing Activities and the Data Protection Impact Assessment.
Vincent Rejany, SAS
Bodan Teleuca, SAS
Enhancing Subscription-Based Business by Predicting Churn Likelihood
Customer retention is a challenge faced by most businesses in today's competitive market. Predicting customer churn can help a subscription business such as KKBOX make a substantial difference to its revenue stream. This paper describes work on predicting churn likelihood, using SAS® 9.4 and SAS® Enterprise Miner™ for data cleaning, preparation, and analysis. KKBOX provides streaming services to millions of users, with more than 30 million tracks. It offers both free and premium (paid) streaming services on various devices, including wearables; paid subscribers can also play music offline. The work described in this paper includes segmentation of the paid subscribers into meaningful categories based on both transactional and listening behaviors. Insights from segmentation can help in formulating customized strategies to enhance customer retention, loyalty, and profitability. The work also includes a summary of the predictive model built to identify customer churn for the KKBOX music subscription service. The model offers insights into potential patterns that distinguish churners from non-churners based on recent usage, usage rate, number of unique songs heard, and whether customers opted for auto renewal or other features. Because acquiring new customers is usually expensive in any business, subscription services like KKBOX can benefit financially from investing in the retention of their existing customers.
Smitha Etlapur, Oklahoma State University
Venkata Sai Mounika Kondamudi, Oklahoma State University
Sujal Reddy Alugubelli, Oklahoma State University
Varsha Reddy Akkaloori, Oklahoma State University
Enterprise, Prepare for Analysis! Using SAS® Data Management to Prepare Data for Analysis
Session 2693
Data analysis teams often have few resources for many needs. Using SAS® Data Management, we can integrate enterprise-level data from different sources, cleanse that data, and enrich it for analysis. These steps reduce the workload for the analysis team, enabling them to focus on the actual analytics. This paper highlights some uses of enterprise-level tools in SAS® Data Integration Studio and SAS® Data Quality in data preparation. Several of the techniques shown are also applicable to users of Base SAS® (with the SAS® macro facility).
Bob Janka, Experis
Estimation of Correlation Coefficient in Data with Repeated Measures
Session 2424
Repeated measurements are commonly collected in research settings. While the correlation coefficient is often used to characterize the relationship between two continuous variables, it can produce unreliable estimates in the repeated measures setting. Alternative correlation measures have been proposed, but a comprehensive evaluation of the estimators and confidence intervals is not available. We provide a comparison of correlation estimators for two continuous variables in repeated measures data. We consider five methods using SAS/STAT® software procedures: a naïve Pearson correlation coefficient (PROC CORR), correlation of subject means (PROC CORR), partial correlation adjusting for patient ID (PROC GLM), partial correlation coefficient (PROC MIXED), and a mixed model (PROC MIXED) approach. Confidence intervals were calculated using the normal approximation, cluster bootstrap, and multistage bootstrap. The performance of the five correlation methods and their confidence intervals was compared through the analysis of pharmacokinetics data collected on 18 subjects, measured over a total of 76 visits. Although the naïve estimate does not account for subject-level variability, it produced a point estimate similar to that of the mixed model approach under the conditions of this example (complete data). The mixed model approach and its corresponding confidence interval were the most appropriate measure of correlation, because the method fully specifies the correlation structure.
Katherine Irimata, Arizona State University
Paul Wakim, National Institutes of Health
Xiaobai Li, National Institutes of Health
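For reference, the first two estimators in the comparison above can be written out explicitly. This is a sketch in notation introduced here, not taken from the paper: subject i = 1, ..., m, visit j = 1, ..., n_i, and N = n_1 + ... + n_m total observations; the y quantities are defined analogously to the x quantities. The naïve estimator pools all N pairs as if they were independent, while the subject-means estimator reduces each subject to one point and discards within-subject variation.

```latex
\begin{aligned}
r_{\text{naive}} &= \frac{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})(y_{ij}-\bar{y})}
  {\sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n_i}(x_{ij}-\bar{x})^{2}\;\sum_{i=1}^{m}\sum_{j=1}^{n_i}(y_{ij}-\bar{y})^{2}}},
  \qquad \bar{x} = \frac{1}{N}\sum_{i=1}^{m}\sum_{j=1}^{n_i} x_{ij},\\[6pt]
r_{\text{means}} &= \frac{\sum_{i=1}^{m}(\bar{x}_{i}-\tilde{x})(\bar{y}_{i}-\tilde{y})}
  {\sqrt{\sum_{i=1}^{m}(\bar{x}_{i}-\tilde{x})^{2}\;\sum_{i=1}^{m}(\bar{y}_{i}-\tilde{y})^{2}}},
  \qquad \bar{x}_{i} = \frac{1}{n_i}\sum_{j=1}^{n_i} x_{ij},\quad \tilde{x} = \frac{1}{m}\sum_{i=1}^{m}\bar{x}_{i}.
\end{aligned}
```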
Evaluating the Accuracy of Clinical Prediction Models for Binary and Survival Outcomes
Session 2831
Clinical prediction models use regression-based methods to elucidate potential predictors of outcomes. For a binary outcome, the LOGISTIC and HPLOGISTIC procedures offer options for model development, testing, and validation. Several fit statistics can be used to gauge the predictive accuracy of a model and to compare competing models. In the appropriate context, these statistics include sensitivity, specificity, positive and negative predictive values, the receiver operating characteristic (ROC) curve, and concordance indices. Developing the same statistics for survival models faces many challenges, including time-dependent outcomes and censoring in accrued survival data. New options in the PHREG procedure permit calculation of some of the aforementioned fit statistics. We discuss their interpretation and illustrate their application with empirical data sets.
Joseph Gardiner, Michigan State University
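As a quick reference for the binary-outcome fit statistics listed above, here is a small Python sketch of how they are computed from predicted probabilities. It is not SAS code and not the paper's implementation; the function names and the choice to treat p >= cutoff as a predicted event are assumptions made here. The c statistic is the usual pairwise concordance (ties count one half), which equals the area under the empirical ROC curve.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    tp: int  # predicted event, observed event
    fp: int  # predicted event, observed non-event
    fn: int  # predicted non-event, observed event
    tn: int  # predicted non-event, observed non-event


def counts_at_cutoff(probs, outcomes, cutoff):
    """Cross-classify predictions at one cutoff; p >= cutoff counts as a predicted event."""
    tp = fp = fn = tn = 0
    for p, y in zip(probs, outcomes):
        predicted_event = p >= cutoff
        if predicted_event and y:
            tp += 1
        elif predicted_event:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


def fit_stats(c):
    """Sensitivity, specificity, and predictive values from a 2x2 table (no zero margins assumed)."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def c_statistic(probs, outcomes):
    """Share of (event, non-event) pairs ranked correctly; tied pairs count one half."""
    events = [p for p, y in zip(probs, outcomes) if y]
    nonevents = [p for p, y in zip(probs, outcomes) if not y]
    score = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                score += 1.0
            elif pe == pn:
                score += 0.5
    return score / (len(events) * len(nonevents))
```

Sweeping the cutoff over all observed probabilities and plotting sensitivity against 1 - specificity traces the ROC curve.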
Evaluation of Different Approaches to Reject Inference: A Case Study in Credit Risk
When creating a scorecard for adjudication, you are often faced with missing performance data for rejected applications in the development sample. It is widely believed in the industry that excluding the rejects, and thus building a model based only on known good and bad (KGB), introduces a bias when the resulting scorecard is intended for the through-the-door (TTD) population. Different methods have been proposed to deal with this situation. In this paper, we conduct an empirical study of the three methods most often encountered in the industry: re-classification, re-weighting, and parceling. These three methods, together with a control case of no reject inference, are tested on adjudication data from a financial institution.
Sergiu Luca, Desjardins
Examining the Drivers of Hospital Re-admission of Type 2 Diabetic Patients
Session 2819
Currently, one out of three people in the U.S. is expected to develop diabetes in their lifetime. As Type 1 diabetes is genetic and involves hereditary transmission, we concentrate on Type 2 diabetes in our analysis. Treatment regimens for Type 2 diabetes are complicated, involving lifestyle adaptations and social behavior. Around one-fifth of Medicare beneficiaries with Type 2 diabetes who are discharged from a hospital are re-admitted within 30 days. Predicting these potential re-admissions can help health care providers improve quality of care. The objective of this paper is to assess the drivers that influence re-admissions and to predict potential re-admission for a Type 2 diabetes patient. A data set of approximately 600,000 observations and 70 variables, containing quantitative and qualitative information about patients and provided by the Center for Health Systems Innovation at Oklahoma State University, was used for this analysis. Various factors were used to analyze the re-admissions, such as gender, race, diagnosis information, comorbidity scores, risk scores, cumulative hazard rate, and other patient- and hospital-level factors. We conclude from the resulting predictive model that patient prevention group, diagnosis category, cumulative hazard rate, and risk score are the most important variables for predicting re-admission. This analysis can be extended to planned and unplanned re-admissions so that separate treatment regimens can be developed for them.
Shashank Reddy Gudipati, Oklahoma State University
Excelling to Another Level with SAS®
Have you ever wished that with one click (or by submitting a single macro call) you could copy any SAS® data set, including or excluding either variable names or labels, and automagically create or modify Microsoft Excel workbooks, or paste tables into Microsoft Word files or Microsoft PowerPoint presentations? Or how about doing any of the above while also creating pivot tables, or basing new worksheets on existing Excel templates or pre-formatted workbooks so that you don't have to spend time duplicating such things as font usage, effects, formulas, highlighting, or graphs? You can, and with just Base SAS®. There are some little-known but easy-to-use methods available for automating many of your (or your users') common tasks.
Matthew Kastin, NORC at the University of Chicago
Arthur Tabachneck, Analyst Finder, Inc.
Tom Abernathy, Pfizer, Inc
Experiences and Pitfalls Establishing a Smart Data Lab and Transferring Prototypes into Production
Session 1908
Fraport AG is among the leading groups of companies in the international airport business. With Frankfurt Airport, the company operates one of the world's most important air transportation hubs, with more than 64 million passengers per year. After investing in SAS® High-Performance Analytics, Fraport decided to run a first data analysis project to gain value from its investment. Under the leadership of Group Strategy and the IT department, a first Smart Data Lab was convened to investigate the four top management concerns. The results were convincing, and the Smart Data Lab developed into a group-wide instrument for the data-driven solution of complex questions. Gain insight into the journey from the idea all the way to the practice and consolidation of a laboratory environment for data analysis, and into the pitfalls of bringing a prototype into operation.
Christian Wrobel, Fraport AG
Exploring Web Services with SAS®
Session 1937
Web services are the building blocks of the APIs that drive the modern web. From Amazon Web Services to Philips Hue light bulbs, web services enable almost limitless integration. In this paper, we explore what a web service is and how we can use web services from SAS®. We look at mechanisms for calling web services from SAS using the HTTP and SOAP procedures and for creating web services from SAS® Stored Processes using SAS® BI Web Services, and we consider the future of web services and SAS with SAS® Viya®, along with some practical examples.
Richard Carey, Demarq
Extending the Capability of SAS® Forecast Studio to Enable Custom Exception Handling
SAS® Forecast Studio enables you to generate a large number of forecasts in a fast and automated way. However, examining forecast exceptions can become very time-consuming. This paper demonstrates how to automate the process of selecting alternate models and adjusting forecasts for different types of exceptions. See how our method uses the model-related data sets and catalogs that are generated as part of a SAS® Forecast Studio project, based on data from a large fashion retailer. Alternate models are selected programmatically, and the related data sets are updated. The resulting alternate model selections can be viewed in SAS® Forecast Studio. This paper describes the exception types addressed, the overall process flow for selecting forecast exceptions, and the method used to select alternate models. In addition, out-of-sample accuracy results for the original and alternate models, along with a comparison to a naïve benchmark model, demonstrate how applying this approach improves forecast results.
Divya Guru Rajan, CoreCompete
External Databases: Tools for the SAS® Administrator
Session 2357
SAS® Management Console enables the configuration of several external data sources for end users. The literature often understates the complexity and importance of configuring external databases, which leaves this task somewhat ambiguous. Most administrators find themselves overwhelmed by the number of options available to them when configuring these sources. Although it is possible to fly through the configuration process by accepting the default values, the resulting database access can easily be flimsy or perform poorly. This paper provides administrators with a complete set of tools that help them define an external data source connection from scratch. It also helps in optimizing the available options, from the standard ones to the most advanced, in order to make the most of that connection.
Mathieu Gaouette, Prospective MG
Extracting Data from Zipped XML Files with the ZIP and XML FILENAME Options in SAS®
One of the challenges facing statistical agencies is leveraging large volumes of transactional data. A common means of transmitting this type of information is in zipped batches of XML files. This poster describes how the SAS® FILENAME ZIP access method can be used to first read the XML files from a zipped folder, and then read and parse the XML data using the FILENAME XML option. This poster also describes using the XML Mapper program to generate a schema that parses the data according to the programmer's specification. Finally, SAS macro language techniques are applied to further automate the process across all XML files in all zipped folders in a given directory.
Maura Bardos, Energy Information Administration
Andrew Thomson, Energy Information Administration
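For readers who want to see the moving parts of the two-stage idea described above (open the archive, then parse each XML member) outside SAS, here is an analogy in Python using only the standard zipfile and xml.etree modules. This is not the poster's SAS code: the function name, the record tag, and the flat child-element-to-column mapping are assumptions, and no XML Mapper-style schema is applied.

```python
import zipfile
import xml.etree.ElementTree as ET


def records_from_zipped_xml(zip_source, record_tag):
    """Yield one dict per <record_tag> element across every .xml member of a zip archive.

    zip_source may be a path or a binary file-like object. Each child element of a
    record becomes one column (tag -> stripped text); nested structure is ignored.
    """
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip non-XML members such as readme files
            with zf.open(name) as member:
                root = ET.parse(member).getroot()
            for elem in root.iter(record_tag):
                row = {child.tag: (child.text or "").strip() for child in elem}
                row["source_member"] = name  # keep track of which file the record came from
                yield row
```

Looping this function over every archive in a directory (for example with pathlib's glob) plays the role that the macro loop plays in the SAS version.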
Extracting the Embedded Git Repository from a Project in SAS® Enterprise Guide® 7.1
Session 2132
SAS® Enterprise Guide® 7.1 includes an easy-to-use change tracking capability that is based on the Git version control system. This feature enables the user to maintain and manage the history of changes to all scripts in SAS Enterprise Guide. Furthermore, the user has the ability to access and modify the existing history of an externally controlled script from within SAS Enterprise Guide. Although this is an extremely useful feature, it does not currently support extracting the embedded Git repository. In some cases, it might be necessary to extract the embedded history for use outside of SAS Enterprise Guide. For example, as the number of contributors to a project grows, it might be more efficient to manage the version control process using an external application that supports branch creation and merging. The main objective of this e-poster is to demonstrate a way to extract the embedded Git repository from a project in SAS Enterprise Guide.
Shahriar Khosravi, BMO Financial Group
Factors versus Clusters
Session 2868
Factor analysis is an exploratory statistical technique for investigating dimensions and the factor structure underlying a set of variables (items). Cluster analysis is an exploratory statistical technique for grouping observations (people, things, events, and so on) into clusters so that the degree of association is strong between members of the same cluster and weak between members of different clusters. Guidelines and SAS® code for factor and cluster analysis are discussed, and results from a sample data analysis are illustrated and discussed. The following SAS procedures are demonstrated: FACTOR, CORR, STANDARDIZE, CLUSTER, and FASTCLUS.
Diana Suhr, SIR Consulting
FCMP: A Powerful SAS® Procedure You Should Be Using
The FCMP procedure is the SAS® Function Compiler procedure. As the name suggests, it enables you to create and save functions (and subroutines) of varying size and complexity. The functions and subroutines can then be used alongside the built-in SAS functions in your SAS programs. This integration simplifies your SAS programming and facilitates code reuse, because the functions can be shared with your fellow SAS programmers. You should be using this procedure. The modularity of your code will increase. The readability of your code will increase. The maintainability of your code will increase. Your productivity will increase. Come learn how this powerful procedure will make your programming life simpler.
Bill Mcneill, SAS
Files Arriving at an Inconvenient Time? Let SAS® Process Your Files with FILEEXIST While You Sleep
Session 2562
The FILEEXIST and SLEEP functions can be paired to iteratively scan a location on a network drive for the arrival of a file. This paper provides a simple framework that controls the interval between these attempts to locate the file and the acceptable number of scans to perform before ceasing the operation. These controls are accomplished through parameters defined in %LET statements and executed through a DO UNTIL processing loop. The example provided can be a stepping stone for beginner or intermediate programmers to understand the basics of macro processing and DO loops, while also giving them a helpful tool to automate their file processing.
Matthew Shotts, Educational Testing Service
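The control flow described above (check for the file, stop when it appears or when the scan limit is reached, otherwise wait and retry) can be sketched compactly. The sketch below is a Python translation for illustration only, not the paper's SAS macro: os.path.exists stands in for FILEEXIST, time.sleep for SLEEP, and the function name, parameter names, and defaults are hypothetical stand-ins for the %LET parameters.

```python
import os
import time


def wait_for_file(path, interval_sec=300.0, max_scans=12):
    """Poll for `path`; return the scan number on which it was found, or None.

    The body runs at least once, mirroring DO UNTIL semantics, and no sleep
    happens after the final unsuccessful scan.
    """
    scan = 0
    while True:
        scan += 1
        if os.path.exists(path):  # analogous to FILEEXIST
            return scan
        if scan >= max_scans:  # scan limit reached: give up
            return None
        time.sleep(interval_sec)  # analogous to SLEEP between attempts
```

Returning the scan count rather than a bare flag makes it easy to log how long the job waited before the file arrived.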
Find Your SAS® Sensei 先生
In martial arts, a sensei (先生) is one with more experience who can guide you along the path. When practicing the art of SAS®, it is wise to pay heed to the SAS sensei, for many paths log to darkness. Fortunately, there are many communities to be found. The web is overflowing with gurus; you might even have read it somewhere. Professionals can be found so long as you are in the right links. Rookies are welcomed every year! Don't be a twit: as a last resort, you can even find Support. Experience new possibilities in your SAS, with the power of knowing how to find the right resources for your needs. Join us to level up your ranking, with not a training course or WebEx in sight! No matter what your experience level, you can always improve. Those who can, teach.
Allan Bowe, Boemska
Finding and Generalizing a 'Best Before' Date
Session 1861
Do you need a Best Before date that is 18 months from manufacture and that is represented as a month-end date? Do you manually provide dates as input to your processes? Do you struggle to get dates into the right format for database queries, or for your reports and dashboards? This presentation helps you find the right date, and then generalize the coding to avoid manual input, repetitive and messy coding, and frustration. Examples emphasize the easy manipulation of dates and focus on generalization to support flexible coding, including: dynamically identifying date ranges, such as reporting and analytics periods (current calendar year, most recent 6 months, past 90 days, current fiscal year, year over year); dynamically generating field names that represent date values or ranges; controlling the appearance of date values in reports; and generating date-time stamps for file names, without special symbols.
Marje Fecht, Prowerk Consulting
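In Base SAS, a shift such as "18 months from manufacture, as a month-end date" is typically done with the INTNX function and its 'END' alignment argument. The Python sketch below (standard library only; the function names are hypothetical and this is not the presentation's code) shows the same month arithmetic for readers who want to check the logic, plus a file-name stamp built only from digits and an underscore.

```python
import calendar
from datetime import date, datetime


def month_end_after(start: date, months: int) -> date:
    """Last day of the month that lies `months` months after start's month (negative goes back)."""
    total = start.year * 12 + (start.month - 1) + months  # months since year 0, zero-based
    year, month0 = divmod(total, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]  # handles leap-year Februaries
    return date(year, month0 + 1, last_day)


def file_stamp(moment: datetime) -> str:
    """Date-time stamp safe for file names: digits plus one underscore, no colons or slashes."""
    return moment.strftime("%Y%m%d_%H%M%S")
```

For example, a product manufactured on 9 April 2018 gets a Best Before date of 31 October 2019, and a manufacture date of 31 August 2018 rolls forward to 29 February 2020.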
Finding the Best Tools for SAS® Programs Change Management within a Regulated Environment
Change management within regulated environments, across a multi-functional team and over different projects, can pose many challenges. Recent enhancements to SAS® Enterprise Guide® have added new debugging features along with tools to monitor and maintain audit trails of SAS® programs. These features are easy to use and powerful. There are also tools outside SAS, such as GitHub, that can be used to manage changes to SAS programs. This paper evaluates all of these tools to find the best solution. The evaluation criteria include ease of use (how easy is it for users to gain program change management capabilities?); functionality in SAS environments (apart from a SAS Enterprise Guide project file, can the tool handle SAS files in batch mode or stored processes?); configuration and administration (is it easy to set up, and does it require administration effort beyond that of a SAS programmer?); and feature richness (how many useful functions does the tool offer for SAS programs and related outputs?). Data integrity is of paramount importance in a regulated environment and involves tracking changes and maintaining an audit trail. Each statistical computing environment is unique because of its system configuration and the end users' usage of the application. Thus, it is imperative to select and implement the optimal change management system, one that does not significantly impact users' productivity.
Sy Truong, Pharmacyclics
Pradheep Raman, Pharmacyclics
Finding the Treasure: Using Geospatial Data for Better Results with SAS® Visual Analytics
Traditional business intelligence systems have focused on answering the who, what, and when questions, but organizations need to know the where of data as well. SAS® Visual Analytics makes it easy to plot geospatial data, which can add a completely new element to your data visualizations and analysis. When you look at a tabular report, you see multiple columns that represent customers, competitors, and demographic information. But when you place the same geocoded data on a map, new insights jump off the page! Perhaps you see where the better customers are, where they are in relation to competitors, and which regions provide the most market potential based on underlying demographics. These findings are about the where of the data. It's like creating a treasure map. In this session, you learn how to use SAS Visual Analytics geospatial objects and how to create custom geographic data items to use with the maps, and you see many examples of how others have incorporated maps into snazzy data presentations.
Tricia Aanderud, Zencos
Fine-Tuning Topical Content in Written Expressions in Cloud-Based Environments Using SAS®
Session 2070
Our recent work examining the variability in topic content among different social groups in social media demonstrates our ability to automatically detect different words and expressions that imply the same or similar meaning. The approach first carries out a vector-based topic identification computed across the entire conversational corpus in the social media collection. In the next step, various social groupings or sub-nets are identified. Social network membership and role (leader/follower) are identified so as to weight potential conversational influence. Finally, word and phrase lists are generated for each of the topic scores in each of the social groupings. The word and phrase lists are identified by rule induction, using either machine learning or specialized procedures such as the Boolean Rule generator in SAS® Text Analytics. By examining the overlap of common terms for topics among the various sub-nets, we can identify which descriptors apply across all social groups and which specialized, idiosyncratic, and idiomatic descriptors emerge in various sub-networks. This general approach is illustrated with an examination of various social groupings identified using the collected transcripts of all SAS Global Forum papers from the early 70s to the most recent.
Gurpreet Bawa, Accenture
Fitting Compartment Models Using PROC NLMIXED
The CMPTMODEL statement is a new enhancement to the NLMIXED procedure in SAS/STAT® 14.3. This statement enables you to fit a large class of pharmacokinetics (PK) models, including one-, two-, and three-compartment models, with intravenous (bolus and infusion) and extravascular (oral) types of drug administration. The CMPTMODEL statement also supports multiple dosages and PK models that have various parameterizations. This paper introduces the new statement and illustrates its usage through examples. Related concepts are also discussed, such as the %PKCONVRT autocall macro (which converts PK data sets stored according to an industry standard into data sets that PROC NLMIXED can use directly), extension to Emax models, prediction, visualization, and fitting Bayesian PK models (by using the MCMC procedure).
Raghavendra Rao Kurada, SAS
Fang Chen, SAS
Five Approaches for High-Performance Data Loading to the SAS® Cloud Analytic Services Server
SAS® Viya® offers the SAS® Cloud Analytic Services (CAS) Server as the in-memory analytics engine to serve the demands of powerful analytics, the scope and volume of which are ever increasing. However, before that analysis can occur, large quantities of data must often be loaded into the CAS memory space. Therefore, the transfer of huge amounts of data from source systems into CAS is a major consideration when planning the architecture of your SAS® solution. High-performance data transfer is often provided by loading data over multiple parallel channels. This process uses many pathways simultaneously to transfer data at a rate several times faster than is possible over a single, serial path. This kind of high-performance data transfer is a natural complement to the speed and efficiency of the in-memory analytics provided by CAS. CAS now provides five major techniques for the parallel transfer of data from a wide range of potential data sources. This paper illustrates those techniques, explains the technology, and highlights the benefits of high-performance data transfer across multiple parallel channels.
Rob Collum, SAS
Flight 3411 and Its Aftermath: A Burlesque Analysis via Tweets
A viral outrage targeting the United Airlines brand emerged on April 9, 2017, after the airline forcibly removed a passenger from United Express Flight 3411. Millions of people watched videos of the incident, which spread over social media, and the brand received numerous critical mentions on Twitter. The aftermath of the incident was even more disastrous. From the airline referring to the passenger removal as "re-accommodation" to its plummeting stock price, it was a harsh learning experience for United Airlines. Some competitors took a jab at the airline by ridiculing and taunting the company on Twitter, and United Airlines was targeted in a burlesque way. In this paper, we analyze the overall sentiment surrounding the Flight 3411 incident and how other airlines gained a competitive advantage by parodying the brand. For this analysis, we extracted data from Twitter by accessing the REST API over time, and we used SAS® Enterprise Miner™ to understand how several brands used the incident as a catalyst and leveraged the opportunity to promote their own services. The Google Drive Python API was used to download the tweets, and Base SAS® was used to clean and analyze them. We used concept links to understand relationships between terms used in the tweets. Finally, we discuss avenues for future research in competitive marketing through spoof advertising.
Vinoth Kumar Raja, West Corporation
Padmashri Janarthanam, University of Nebraska Omaha
Fluid and Dynamic: Using Measurement-Based Workload Prediction to Dynamically Provision Your Cloud
As the capabilities of SAS® Viya® and SAS®9 are brought together, alongside integration with open-source technologies such as R and Python, it appears that a set of technologies with wide-ranging resource requirements will inevitably end up sharing the same infrastructure. It's more than likely that some traditional SAS® batch jobs will continue to run throughout the night, probably joined by some newly scheduled Python programs. Increasing amounts of data will be loaded into memory every day, but for many customers, month-end processing cycles will always be a thing. As organizations naturally look to progress toward the cloud, the resource requirements for such an environment will appear complex and might seem overwhelmingly expensive. Will high I/O instances be required, or is it better to opt for compute-optimized instances? Will there be a benefit from a Burstable Performance setup? While "all of the above" might be a common conclusion, once an environment is productionized and put through its paces, usage patterns should emerge that can be used both to optimize the use of the available resources and to scale the infrastructure to what is actually required. This paper discusses the possible use of granular performance measurements to gauge periodic workload requirements, in order both to plan and execute appropriate dynamic alterations to provisioned cloud infrastructure and to ensure that the infrastructure currently in place remains optimally utilized.
Nik Markovic, Boemska
Forecasting the Value of Fine Wines
Session 1829
Fine wines have gained attention globally as an investment opportunity, with possible diversification benefits relative to more traditional investments and potentially high rates of return. Using a database from auctionforecast.com with over two million prices from auctions globally, we analyzed the price dynamics of fine wines to quantify predictive factors for investors. Brand, vintage, ratings, auction houses, bottle size, age of the wine, market trends, and more are considered in this analysis. Although data mining techniques are common in other applications, long-range forecasting requires a careful separation between current discrimination factors and long-term drivers. In this talk, we demonstrate how to handle these issues in model development. The models used are a combination of Age-Period-Cohort models and traditional scoring techniques. This approach has direct applicability to consumer behavior in many industries and has been used with great success in retail lending.
Joseph Breeden, Prescient Models
Forecasting: Something Old, Something New
ARIMA (autoregressive integrated moving average) models for data taken over time were popularized in the 1970s by Box and Jenkins in their famous book. The SAS® procedures PROC ESM (exponential smoothing models) and PROC UCM (unobserved components models, which are a simple subset of state space models; see PROC SSM) have become available much more recently than PROC ARIMA. Not surprisingly, since ARIMA models are universal approximators for most reasonable time series, the models fit by these newer procedures are very closely related to ARIMA models. In this talk, some of these relationships are shown and several examples of the techniques are given. At the end, the attendee will find that there is something quite familiar about these seemingly new innovations in forecasting and will have more insights into how these methods work in practice. The talk serves as an introduction to the topics for anyone with some basic knowledge of ARIMA models, and the examples should be of interest to anyone planning to analyze data taken over time.
Dave Dickey, North Carolina State University
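A standard textbook instance of the kind of relationship described above (not necessarily one of the talk's own examples) is that simple exponential smoothing, one of the models PROC ESM fits, produces the same one-step forecasts as an ARIMA(0,1,1) model. In the notation assumed here, the level is written as ℓ_t, the smoothing weight as α, and the one-step forecast error as e_t:

```latex
\begin{aligned}
\ell_t &= \alpha\,y_t + (1-\alpha)\,\ell_{t-1}, \qquad \hat{y}_{t+1\mid t} = \ell_t,\\
e_t &= y_t - \ell_{t-1}
  \;\;\Longrightarrow\;\; \ell_t = \ell_{t-1} + \alpha\,e_t,
  \qquad y_t = \ell_{t-1} + e_t,\\
y_t - y_{t-1} &= (\ell_{t-1} - \ell_{t-2}) + e_t - e_{t-1}
  = \alpha\,e_{t-1} + e_t - e_{t-1}
  = e_t - (1-\alpha)\,e_{t-1}.
\end{aligned}
```

The last line is an ARIMA(0,1,1) model, (1 - B)y_t = (1 - θB)e_t, with moving-average parameter θ = 1 - α.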
Fraudsters Love Digital: An Emerging Threat Missed on Many Insurers' Digitization Roadmaps
In most financial services companies and at almost every conference, there is a real buzz around digital transformation programs and digital disruption initiatives that will significantly impact, if not reinvent, many insurers and their business role today. This is all seen as positive and good, helping to deliver new and improved service levels, broaden appeal, and potentially reduce servicing costs. So for the vast majority of customers this is good news. But it is also good news for the small minority (perhaps up to 10% of claimants) who are looking to actively defraud the insurer. In this presentation, we explore the need to build in suitable safeguards to ensure that the rise of the armchair fraudster doesn't go unchallenged. The presentation illustrates how insurers are embracing real-time fraud analytics to help them in this fight, but without compromising the good that such programs deliver for the majority of customers.
David Hartley, SAS
Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS®
There is confusion among many SAS® users about the similarities and differences in how SAS uses frequencies, sampling weights, and unequal variance weights when estimating parameters and their variances. This paper describes the calculation details for each and compares the results using several SAS procedures. The author also gives advice on which is appropriate in different situations.
Robert Lucas, R M Lucas Consulting
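To make the distinction concrete, here is a minimal Python sketch (our illustration, not the author's SAS code) of one place the interpretations diverge. The weighted mean is the same either way, but treating the weights as frequencies makes the effective sample size the sum of the weights, whereas treating them as unequal-variance weights keeps the sample size at the number of rows. The divisors are assumed to mirror the default behavior of the FREQ and WEIGHT statements in PROC MEANS (VARDEF=DF). Sampling weights call for design-based variance estimation, as in the SURVEY procedures, which this sketch does not attempt.

```python
# Illustrative sketch: same data, two interpretations of the weights.

def weighted_mean(x, w):
    """Weighted mean; identical under both interpretations."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)

def weighted_ss(x, w):
    """Weighted sum of squared deviations about the weighted mean."""
    m = weighted_mean(x, w)
    return sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w))

def variance_as_frequencies(x, f):
    # Each row stands for f_i identical observations: n = sum(f).
    return weighted_ss(x, f) / (sum(f) - 1)

def variance_as_weights(x, w):
    # Weights rescale each row's contribution; n stays the number of rows.
    return weighted_ss(x, w) / (len(x) - 1)

x = [1.0, 2.0, 3.0]
w = [1, 2, 1]
print(weighted_mean(x, w))            # 2.0
print(variance_as_frequencies(x, w))  # 2/3, about 0.667
print(variance_as_weights(x, w))      # 1.0
```

The weighted sum of squares is 2.0 in both cases; only the divisor (3 versus 2) changes, which is why frequency and variance weights can give identical point estimates but different standard errors.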
Frequently Asked Questions about Getting Started with SAS® Grid Manager
Scalability. Fault tolerance. Load balancing. High performance. High availability. These are all phrases that commonly come up in analytics infrastructure conversations. However, you might still be unsure how all this relates to SAS® Grid Manager and what it would mean for your organization. When considering SAS Grid Manager, many customers share the same points of confusion: How will their SAS® workflow change? How do they connect to the grid? How should they manage it, and what do they need to do to implement it successfully? In this paper, we cover the questions we are asked most often about the implementation, administration, and usage of SAS Grid Manager.
Nick Welke, Zencos
From Academy to Industry: An Experimental Student Journey
The choice of a course or school is a key moment in the life of any student. This paper presents an experimental approach to the student journey, which starts with admission to a university and ends on graduation day. During this journey, it is fundamental to give students several opportunities to interact with industry experts and experience real-life challenges. To build such a student journey, it is important to build an inside-out analytical culture: starting with the university and then passing it on to the students. This paper follows an experimental path starting at Universidade Lusófona de Humanidades e Tecnologias (Lusophone University of Humanities and Technologies, ULHT). ULHT is the largest Portuguese private university and presents its academic offerings through different schools. The School of Communication, Architecture, Arts and Information Technologies (ECATI) presents itself as an innovative university project in the Portuguese context and in the European Higher Education Area. It is aimed at the development of education at all levels of higher education.
Francesco Costigliola, University Lusófona of Humanities and Technology
From Idea to Implementation: How a SAS® Communities Thread Changed How Sleep Number Direct Markets
Session 2756
Born from a 2017 thread about who was traveling the farthest to SAS® Global Forum in Orlando, our odyssey began with getting acquainted with the ZIPCITYDISTANCE function. Calculating distance with SAS® was not new to us, but this function was. Previously, we had been calculating distance with the ZIP code centroid method, using code we found on the web! At SAS Global Forum, we learned of the GEODIST function. We soon found that by using this new calculation, we could determine our closest customers with greater accuracy, so we would no longer mistakenly send offers to people who might be farther away. Next, we wanted to know whether we could calculate the distance to the closest Sleep Number store for each of the more than 15 million people who have purchased or considered purchasing a Sleep Number bed. We determined that the increased precision of GEODIST could replace certain processes but would entail performing 8.8 billion calculations daily (never an issue for SAS). Through this change, we saw the accuracy of customers' closest-store assignments improve by over 35%, and we are measuring customer purchase tendencies resulting from these changes.
Aaron Carlson, Sleep Number Corporation
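The closest-store problem can be sketched as a brute-force minimum over store distances. The Python sketch below uses the haversine great-circle formula on a spherical Earth as a simple stand-in; GEODIST is documented as a geodetic (ellipsoidal) calculation, so its results will differ slightly. The function names, store IDs, and coordinates are hypothetical. Note that a brute-force search multiplies the customer count by the store count, which is how a daily run can reach billions of distance calculations.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; a spherical approximation

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def closest_store(customer, stores):
    """Return (store_id, distance) for the nearest store, checking every store."""
    lat, lon = customer
    return min(
        ((sid, haversine_miles(lat, lon, slat, slon)) for sid, (slat, slon) in stores.items()),
        key=lambda pair: pair[1],
    )

# Hypothetical stores (ID -> latitude, longitude).
stores = {"store_a": (44.98, -93.27), "store_b": (41.88, -87.63)}
print(closest_store((45.0, -93.0), stores))  # ('store_a', distance in miles)
```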
From Unstructured Text to the Data Warehouse: Customer Support at the University of North Texas
Session 1900
Traditional business intelligence uses a data warehouse to generate reports that increase organizational knowledge. In the current big data environment, organizational data include large collections of unstructured documents, especially in the domain of customer support. However, a formal process for expanding the traditional dimensional model to include elements derived from such collections is often missing. In this presentation, we provide a case study from the IT Shared Services (ITSS) division at the University of North Texas (UNT). As part of the UNT System's ServiceNow initiative, we examine a collection of ITSS service work-order data to take unstructured text to the data warehouse. Going beyond traditional reporting elements such as service requests by time or request category by department, we show how text analytics helps uncover the hidden dimension of service topic. Based on this dimension, derived facts, such as which service tickets address which topics, are added to the data warehouse. These uncovered elements represent a part of organizational knowledge that would otherwise remain undetected and that decision makers can use to improve customer support and address service issues.
Nick Evangelopoulos, University of North Texas
Fun with Address Matching: Use of the COMPGED Function and the SQL Procedure
Session 2487
Address matching is often a challenging task for a SAS® programmer. What seems like a relatively straightforward quest ends up amounting to hours of frustration and manual record review. Who knew there were more than five ways to spell the word North? Now multiply this by 20+ words that might have varying naming conventions (I want to scream just thinking about it). This paper discusses some common data cleaning techniques used in address matching, including the TRANWRD and COMPRESS functions. A roadmap for using the COMPGED function and the SQL procedure to identify matches is provided. The advantages and flexibility of this approach are sure to drive you on your way to address matching in no time.
Alexandra Varga, Kaiser Permanente Center for Health Research
Suzanne Salas, Kaiser Permanente
Elizabeth Shuster, Kaiser Permanente, Center for Health Research
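A minimal Python sketch of the two-stage idea (standardize, then score string similarity) follows. It is not the authors' SAS code: the word table is a small hypothetical stand-in for TRANWRD-style replacements, and plain Levenshtein distance (closer in spirit to COMPLEV) substitutes for COMPGED, whose generalized edit distance assigns different costs to different operations. The cutoff of 2 is arbitrary. In PROC SQL, the equivalent comparison would typically sit in the WHERE clause of a join between the two address tables.

```python
import re

# Hypothetical lookup table: spelling variants -> standard abbreviation.
# A real table would be much longer (the abstract mentions 20+ words).
STANDARD_WORDS = {
    "NORTH": "N", "NTH": "N", "NRTH": "N",
    "STREET": "ST", "STR": "ST",
    "AVENUE": "AVE", "AV": "AVE",
}

def normalize_address(addr):
    """Uppercase, strip punctuation, and standardize word variants."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", addr.upper())
    return " ".join(STANDARD_WORDS.get(tok, tok) for tok in cleaned.split())

def levenshtein(a, b):
    """Classic edit distance with unit cost for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def is_match(a, b, max_distance=2):
    """Treat two addresses as a match if their normalized forms are close."""
    return levenshtein(normalize_address(a), normalize_address(b)) <= max_distance

print(is_match("123 North Main Street", "123 N. Main St"))  # True
```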
Fuzzy Matching Programming Techniques Using SAS® Software
Data comes in all forms, shapes, sizes, and complexities. Whether it is stored in files or data sets, SAS® users across industries know all too well that data can be, and often is, problematic and plagued with a variety of issues. When unique and reliable identifiers are available, users can routinely match records from two or more data sets using merge, join, and/or hash programming techniques without a problem. But what happens when a unique identifier, referred to as the key, is not reliable or does not exist? These types of problems are common in files containing subscriber names, mailing addresses, or email addresses in which one or more characters are transposed, misspelled, or partially or incorrectly recorded. This presentation introduces the concept of fuzzy matching; a sampling of the data issues users have to deal with; popular data cleaning and user-defined validation techniques; the application of the CAT functions; the SOUNDEX algorithm (for phonetic matching); the SPEDIS, COMPGED, and COMPLEV functions; and an assortment of programming techniques to resolve key identifier issues and to successfully merge, join, and match less-than-perfect or messy data.
Stephen Sloan, Accenture
Kirk Paul Lafler, Software Intelligence Corporation
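As a sketch of phonetic matching, here is the standard American Soundex algorithm in Python. SAS's own SOUNDEX function may differ in details such as code length, so treat this as an illustration of the technique rather than a reproduction of the SAS function. Names that sound alike map to the same code, which can then serve as a blocking or join key.

```python
# American Soundex codes for consonants; vowels, Y, H, and W get no code.
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    code = first
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        # H and W do not separate letters with the same code; vowels do.
        if ch not in "HW":
            prev = digit
    return (code + "000")[:4]

print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```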
Get Better Weight of Evidence for Scorecards by Using a Genetic Algorithm
The scorecard is a very important risk management tool in credit card businesses. When a client applies for a credit card at a bank, if the applicant's score is lower than a cutoff score (for example, 520), the bank rejects the application. The weight of evidence (WOE) plays the most important role in building a better scorecard that distinguishes risky applicants from better ones. This paper tries to get a better WOE by solving an optimization problem via
Keshan Xia, 3GOLDEN Beijing Technologies Co. Ltd.
Peter Eberhardt, Fernwood Consulting Group Inc.
Kastin, NORC at the University of Chicago
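For reference, a minimal Python sketch of the standard WOE and information value (IV) computations for a fixed binning follows, assuming the common convention WOE = ln(%good / %bad); some texts use the reciprocal, which flips the sign. The paper's genetic-algorithm search is not reproduced here. A search over bin boundaries needs some objective to score candidate binnings, and IV is one plausible choice, but that is our assumption. The counts are hypothetical.

```python
import math

def woe_iv(goods, bads):
    """Weight of evidence per bin and the total information value.

    WOE_i = ln(%good_i / %bad_i), where %good_i is the bin's share of all goods.
    IV    = sum_i (%good_i - %bad_i) * WOE_i
    Assumes every bin has at least one good and one bad (no zero counts).
    """
    total_good, total_bad = sum(goods), sum(bads)
    woe, iv = [], 0.0
    for g, b in zip(goods, bads):
        pg, pb = g / total_good, b / total_bad
        w = math.log(pg / pb)
        woe.append(w)
        iv += (pg - pb) * w
    return woe, iv

# Hypothetical counts of good and bad accounts in three score bins.
woe, iv = woe_iv(goods=[100, 200, 300], bads=[50, 20, 10])
print([round(w, 3) for w in woe], round(iv, 3))
```

Bins with a higher share of goods than bads get positive WOE, and each bin's IV term is non-negative, so better-separated binnings score higher.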
Getting More Insight into Your Forecast Errors with the GLMSELECT and QUANTSELECT Procedures
Session 1673
Is it sufficient just to monitor the quality of your forecast models over time? Can data science methods identify the drivers of large forecast errors and provide more insights than descriptive statistics? Do demand planners really improve forecast accuracy with their manual overwrites? Using a real-life case study, this paper answers these questions. It shows how you can study the impact of factors like product group, forecast horizon, seasonality, or forecast model type on forecast accuracy and convert the findings into actionable results. You learn how univariate methods provide initial insights into the structure and relationships of your forecast data. You gain insight into how manual overwrites of the statistical forecast change forecast accuracy in both directions and how to use analytical and graphical methods to illustrate these findings. You see how multivariate analytical methods like linear and quantile regression provide additional relevant insight. You learn how to use the GLMSELECT, QUANTSELECT, and QUANTREG procedures to identify the most influential factors on the forecast error. You see how you can enhance and interpret the output of these procedures to quantify the effects of the influential factors. You learn how to convert the results from the SAS® procedures into actions to improve your forecasting process. The paper also outlines how to use the REGSELECT and QTRSELECT procedures to apply these methods in SAS® Viya®.
Gerhard Svolba, SAS
Getting Started with Bayesian Analysis
Session 2909
Getting Started with Bayesian Analysis will give a brief introduction to Bayesian statistics and its analysis within SAS. Participants will learn the difference between Bayesian and frequentist approaches to statistics and will be introduced to PROC MCMC.
Danny Modlin, SAS
Getting Started with Survival Analysis
Session 2910
Modeling time to an event poses particular challenges that are different from either logistic regression or linear regression modeling. This presentation gives you the essentials to start using this technique right away, to make sense of censored data (observations for which the time to the event is only partially determined), and to explain your findings in a meaningful way. You start by learning about the data set structures required for analyzing time-to-event data. Next, you find out how to plot the critical survival functions and perform non-parametric homogeneity tests. You will then learn how to incorporate categorical and continuous input variables, using the Cox proportional hazards model, to determine the factors that might affect the time to the event of interest.
Marc Huber, SAS
Getting Started with Time Series Models
Session 2908
Getting Started with Time Series Models will introduce the basic features of time series variation and the model components used to accommodate them. Participants will be introduced to three families of time series models. Comparisons and contrasts among these families will be discussed.
Danny Modlin, SAS
Getting to Know the No-Show: Predictive Modeling of Missing a Medical Appointment
Patients who miss appointments without canceling remain a major source of lost efficiency for health-care clinics and reduce patient face-time with health-care providers. This in turn leads to lost revenue for health-care clinics and to overbooking of patient appointments. Using a data set of 110,000 medical appointments collected over a three-month period, different health-care populations were identified based on age and frequency of health-care use. Once the health-care populations were selected, stepwise logistic regression was performed using the LOGISTIC procedure to identify factors within and across populations that are relevant indicators of no-show appointments. The data were split for cross validation, with approximately 80% allocated to build the fitted models and 20% reserved for validation. The areas under the ROC curves for the training and validation data sets were very similar and ranged from 0.68 to 0.75, showing good diagnostic capability for their respective models. The accuracy of the prediction model was also evaluated on the validation data set, and the model correctly categorized no-show appointments 76% to 82% of the time. Whether the appointment was scheduled on the same day it took place, receipt of a reminder text, the patient's age, adulthood status, and the presence of a chronic condition all contributed to the likelihood of an appointment being a no-show.
Joe Lorenz, Grand Valley State University
Kayla Hawkins, Grand Valley State University
Robert Downer, Grand Valley State University
Harmonize Your SAS® Environments with Hot Fix Versions
Session 2572
Organizations that use SAS tend to have more than one SAS environment, with at least one acting as a development or test environment and another as the production environment. Depending on their software development life cycle approach, many also have sandbox and user acceptance testing (UAT) environments in addition to development, test, and production. At times, it becomes difficult for SAS administrators or SAS installation representatives to keep the hot-fix versions of SAS products consistent across these environments. This paper provides an approach for avoiding hot-fix version inconsistencies across your SAS environments.
Jitendra Pandey, Electrolux Home Products Inc.
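The abstract does not spell out the approach, so the following is only a hypothetical Python sketch of one check an administrator might automate: given the hot-fix level recorded for each product in each environment (however those are collected), report every product whose level is not the same everywhere. The function name, environment names, product names, and hot-fix labels are invented.

```python
def hotfix_mismatches(environments):
    """Return {product: {env: level}} for products whose level differs.

    `environments` maps an environment name to {product: hot-fix level}.
    A product missing from an environment shows up as None there.
    """
    products = sorted({p for installed in environments.values() for p in installed})
    report = {}
    for product in products:
        by_env = {env: installed.get(product) for env, installed in environments.items()}
        if len(set(by_env.values())) > 1:
            report[product] = by_env
    return report

# Hypothetical inventory: names and hot-fix labels are invented.
envs = {
    "dev":  {"base": "hf3", "stat": "hf2"},
    "test": {"base": "hf3", "stat": "hf1"},
    "prod": {"base": "hf3"},
}
for product, levels in hotfix_mismatches(envs).items():
    print(product, levels)
```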
Harvesting Unstructured Data to Reduce Anti-Money Laundering (AML) Compliance Risk
As an anti-money laundering (AML) analyst, you face a never-ending job of staying one step ahead of nefarious actors (for example, terrorist organizations, drug cartels, and other money launderers). The financial services industry has called into question whether traditional methods of combating money laundering and terrorism financing are effective and sustainable. Heightened regulatory expectations, emphasis on 100% coverage, identification of emerging risks, and rising staffing costs are driving institutions to modernize their systems. One area gaining traction in the industry is leveraging the vast amounts of unstructured data to gain deeper insights. From suspicious activity reports (SARs) to case notes and wire messages, most financial institutions have yet to apply analytics to this data to uncover new patterns and trends that might not surface in traditional structured data. This paper explores the potential use cases for text analytics in AML and provides examples of entity and fact extraction and document categorization of unstructured data using SAS® Visual Text Analytics.
Austin Cook, SAS
Beth Herron, SAS
High-Availability Services in an Enterprise Environment with SAS® Grid Manager
Session 1726
Many organizations rely on services that are critical to day-to-day operations and that must be constantly available and accessible, even if infrastructure becomes unavailable. Out of the box, SAS Grid Manager provides a high-availability platform based on a multi-machine architecture with Load Sharing Facility (LSF) and services failover. In enterprise reality, the base installation and configuration of LSF should be amended to provide high availability according to company requirements (for example, server topology, resource management, and multi-tenancy). This paper contains tips, tricks, and configuration examples for all layers (SAS, LSF, Enterprise Grid Orchestrator [EGO], and more) to provide better, more reliable high availability of services in your organization.
Andrey Turlov, Allianz Technology SE
House Prices Segmentation
Session 2873
If you tried to buy a 1,500 sq. ft. house in San Francisco, California, and then looked for a similar house in Stillwater, Oklahoma, you would see a stark difference in price. Obviously, location is a huge factor in real estate prices. But if you limit the location to a single city, would two 1,500 sq. ft. houses in San Francisco or Chicago cost the same? It depends. Many factors go into the final sale price of a house, such as its condition, proximity to schools and parks, proximity to public transport, and so on. This paper tries to understand the underlying factors that determine the price of each house. The goal of this paper is to build a segmentation model that identifies housing submarkets based on different factors. In the segmentation approach, the idea is to uncover the factors that act as the most relevant partitioning criteria.
Mettila Kaimathuruth, Oklahoma State University
How a Code-Checking Algorithm Can Prevent Errors
Session 2798
When a company uses an automated production system for reporting, there is always a risk of recurring errors caused by reports being submitted incorrectly. One way to reduce these errors is to use a code-checking program that assesses several aspects of a program before it is scheduled, including its compatibility with the production environment, its inclusion of comments, and the flagging of security risks. In this paper, I discuss some of the checks that can be included in a code-checking program, as well as some methods for implementing them. The first and most important is simulating a run in an automated production environment. We then look at analyzing the volume and completeness of comments in the code being tested. We also review methods for handling warnings and other non-critical issues that could be identified. Finally, we look at methods for checking whether risky fields are being used, including personal or financial information, which needs to have a limited distribution.
Thomas Hirsch, Magellan Health
How Effective is Change of Bowling Pace in Cricket?
Session 2598
Cricket, similar to baseball in that it uses a bat and ball, is one of the most popular sports in the world, especially in India. With a whopping 411 million unique TV viewers in 2017 (Ahluwalia 2017), the Indian Premier League (IPL) cricket tournament has a follower base 25% larger than the entire population of the United States. T20 cricket, the format of the IPL, is dominated by batsmen, meaning bowlers (pitchers) struggle to keep swashbuckling pinch-hitters from smashing them out of the park. In the past few years, however, many bowlers have developed a strategy of delivering a ball much slower than their normal pace in an effort to deceive the batsman. But no public data about bowling speeds exist. So, I recorded data from approximately 1,000 balls from last year's IPL to analyze the effectiveness of these slower balls. It turns out that slower balls are effective: they get batsmen out a statistically significantly higher proportion of the time than normal-pace balls. But there are many more interesting questions that can be asked: Who has the best slower ball? Is there such a thing as too slow for a slower ball? When are bowlers most likely to bowl a slower ball? All of these questions and more can be answered by my data set, and the answers can be used by cricket teams around the world to improve their standing in the extremely competitive industry of sport.
Deven Chakraborty, Imperial College London
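The abstract does not say which significance test was used. As one common option, a pooled two-proportion z-test compares the wicket rate of slower balls with that of normal-pace balls. The function name is ours, and the counts in the example are hypothetical, not the author's data.

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """One-sided pooled z-test of H1: p1 > p2.

    x1 of n1 slower balls and x2 of n2 normal-pace balls took a wicket.
    Returns (z statistic, one-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p_value

z, p = two_proportion_z_test(20, 100, 10, 100)  # hypothetical counts
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

The normal approximation behind this test needs reasonably large counts in each group; with few wickets, an exact test would be safer.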
How Good is That Forecast? The Nuances of Prediction Evaluation Across Time
Session 1862
When predicting across time, typical methods of prediction evaluation no longer hold. It is not practical to take a random holdout sample of observations from the data set or even to use a typical k-fold cross-validation structure. Even newer methods of prediction evaluation for cross-sectional data, like target shuffling, should not simply be applied to data with an inherent temporal structure. How, then, can we determine whether we have a good forecast or have reached our conclusions by random chance? This talk highlights the advantages and disadvantages of techniques for evaluating predictions when forecasting future observations. It also discusses possible biases arising from the time structure of data that should be considered.
Aric Labarr, Elder Research Inc.
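One widely used alternative to a random holdout, offered here as a general illustration rather than a summary of the talk, is rolling-origin (expanding-window) evaluation: each split trains only on observations before a forecast origin and tests on the next few. A minimal Python sketch that generates the index splits (the function name and parameters are ours):

```python
def rolling_origin_splits(n_obs, initial, horizon, step=1):
    """Yield (train_indices, test_indices) for expanding-window evaluation.

    Each split trains on all observations before the forecast origin and
    tests on the next `horizon` observations, so test data always lie in
    the future relative to training data (unlike a random holdout).
    """
    origin = initial
    while origin + horizon <= n_obs:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step

for train, test in rolling_origin_splits(10, initial=6, horizon=2):
    print(f"train on 0..{train[-1]}, test on {test}")
```

Averaging the forecast error over all origins gives an accuracy estimate that respects the time ordering, at the cost of refitting the model once per origin.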
How SAS® Helps to Price Auto Insurance in Brazil
Session 1912
In an increasingly dynamic and complex market such as auto insurance, it is absolutely mandatory to construct advanced pricing methodologies with strong predictive power in order to ensure profitability. Because our business is in a country with continental dimensions, finding the right price in the face of great risk and customer behavior disparity is a major challenge. Therefore, we developed methodologies for risk discrimination, georeferencing, and forecasting using SAS®. SAS tools have enabled us to build a more holistic view of our business, and to integrate pricing intelligence across risk and demand elasticity profiles.
Anna Mattos, SulAmerica Seguros
Edgar Meireles, SulAmerica Seguros
Karla Lopes, SulAmerica Seguros
How Statistically Analyzing System Behaviors with SAS® Visual Analytics Revealed Unknown Data Issues
Automatic loading, tracking, and analysis of data readiness in SAS® Visual Analytics is easy when you combine SAS® Data Integration Studio with the DATASET and LASR procedures. This paper is a follow-up to a previous paper on the methodology that we use at the University of North Carolina at Chapel Hill to track our data preparation and readiness by using SAS Visual Analytics reporting. This paper covers real-world examples of how our analysis and visualization methods surfaced unknown data integrity issues brought about by anomalous system behaviors. It also covers how we recognized the issues, and how creating these SAS Visual Analytics visualizations can help any SAS® customer quickly identify potential data integrity issues that originate from system behaviors.
Jessica Fraley, University of North Carolina at Chapel Hill
How to Build a Recommendation Engine Using SAS® Viya®
Session 2095
Helping users find items of interest is useful and positive in nearly all situations. It increases employee productivity, product sales, customer loyalty, and so on. This capability is available and easy to use for SAS® Viya® customers. This paper describes each step of the process: 1) loading data into SAS Viya; 2) building a collaborative filtering recommendation model using factorization machines; 3) deploying the model for production use; and 4) integrating the model so that users can get on-demand results through a REST web service call. These steps are illustrated using the SAS Research and Development Library as an example. The library recommends titles to patrons using implicit feedback from their check-out history.
Jared Dean, SAS
How to Load Relational Data into SAS® Cloud Analytic Services Using Java
Session 1906
Java is one of the most popular programming languages available today. Did you know that you can easily use Java with SAS® Cloud Analytic Services (CAS)? This presentation shows you how to do this and more. During this presentation, we cover the following tasks: how to load the Java libraries required to invoke CAS actions within your Java program; how to invoke CAS actions from Java; and how to load data from a relational database into CAS. You will walk away from this presentation confident that you can read and write data between CAS and a relational database using a Java program.
Salman Maher, SAS
How to Prevent Turnover of the Best and Most-Experienced Employees with SAS® Enterprise Miner®
Attrition is a common issue every company faces. Many companies invest in their employees and are therefore interested in employee satisfaction and in why some employees leave. In fact, companies often incorporate surveys into their annual review process. A data set was simulated by user ludoben (on the Kaggle website) using variables that any typical human resources department would know about its employees. Our task was to predict which employees might leave, using the ten available features. We found that the employees most at risk of leaving were in the most extreme regions of each feature.
Liyuan Liu, Kennesaw State University
Lauren Staples, Kennesaw State University
How to Use Streaming Analytics to Create a Real-Time Digital Twin
Session 2004
With the Internet of Things (IoT), devices are frequently in remote places, operating in different physical environments. The devices communicate with control systems and also with each other. One challenge is to know and understand the environment a device is operating in, and whether the device is operating properly and efficiently. To meet this challenge, you create a digital twin of the device: a virtual representation of the device in real time. The benefit of a digital twin is that you can know how the device is operating wherever it is physically located. IoT devices have a number of sensors installed on them, as well as sensors for the environment around them. Analytics is needed to bring this sensor data together and create a true real-time digital twin. This presentation shows how streaming analytics in SAS® Event Stream Processing is used to create a real-time model of the remote device. Analytics enables the digital twin to do the following: fill in gaps where sensor data is not available; provide notification when a device is not operating efficiently; provide advance notice when a device is failing; detect when devices are not interacting properly; and forecast future operating conditions.
Brad Klenz, SAS
Human Resources Analytics: Why Do the Best Employees Leave?
Session 2729
Over the last few years, companies have become better at using their human resources (HR) data. Traditionally, companies gathered HR data to track employees' efficiency or manage payroll. These days, HR analytics is used for recruitment, retention, training, and performance, and to gauge an organization's overall effectiveness. This paper focuses on using HR data to predict employee attrition and help organizations know beforehand which employees could be leaving. Knowing this, companies could take preemptive action to increase retention of valuable employees. We use the incremental response model technique in this paper and build a recommendation engine for an HR department that provides suggestions based on employees' previous patterns and similarities with other employees. Initial analysis showed that Accounting, Tech, and Support have some of the hardest-working people, with generally higher performance evaluations, yet they are among the least satisfied and least promoted. The data is taken from Kaggle and has 15,000 observations. SAS® Enterprise Guide® and SAS® Enterprise Miner(tm) were used to build and compare the models. This paper was presented at Analytics Experience 2017. We have performed descriptive analysis of the data and built predictive models. This paper includes cluster analysis, an incremental response model, and a recommendation engine.
Ankita Khurana, Oklahoma State University
Identifying Behaviors and Irregularities in Public Procurement in the Brazilian Federal Government
In the last 2.5 years, several suppliers were identified in more than 9,000 procurement processes with behavioral deviations corresponding to US$1.29 billion in resources from the Federal Government of Brazil. These behaviors indicate suspicious or undue actions in public procurement systems. These deviations were detectable only by using the antifraud methodologies in SAS® tools. The Ministry of Planning, Development and Management is the body of the Federal Government of Brazil responsible for standardizing and monitoring processes, in addition to managing public procurement systems. To expand the level of monitoring and optimize the regulation of these acquisitions, a number of data analysis and exploration projects have been developed, with proper integration of several governmental databases, to accomplish the following: identify evidence of irregularities in public procurements; identify suppliers with inadequate and systematic behavior in procurement processes; identify evidence of collusion between companies participating in bidding processes; recover public funds diverted by fraudsters; acquire knowledge and improve methodologies, information systems, procedures, controls, monitoring of public procurements, and regulations for all institutions of the Federal Government; and provide public managers with historical information on supplier behavior for decision-making. This paper presents the methodologies, techniques, and procedures applied.
Andre Castro, Ministry of Planning, Development and Management
Rodrigo Marquez, Ministry of Planning, Development and Management
Wesley Lira, Ministry of Planning, Development and Management
Daniel Rogério, Ministry of Planning, Development and Management
Identifying Duplicate Variables in a SAS® Data Set
Session 1654
In the big data era, removing duplicate data from a data set can reduce disk storage use and improve processing time. Many papers have discussed removal of duplicate observations, but it is also useful to identify duplicate variables for possible removal. One way to identify duplicate variables is with the COMPARE procedure. PROC COMPARE is commonly used to compare two data sets, but it can also compare variables in the same data set. It can accept a list of variable pairs to compare and determine which variable pairs are identical. This paper shows how to obtain a summary report of identical variables for all numeric or character variables in a data set, using the following steps: (1) dynamically build a list of all possible numeric or character variable pairs for PROC COMPARE to analyze; (2) convert the PROC COMPARE pairwise results (for example, N1 is identical to N3, N3 is identical to N5, N1 is identical to N5, and so on) into a summary report that groups all identical variables (for example, N1, N3, and N5 are identical). For very large data sets, the paper shows how to substantially improve performance by first executing PROC COMPARE one or more times on a small number of observations to reduce the number of extraneous comparisons.
Bruce Gilsen, Federal Reserve Board
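For comparison, the grouping step can be sketched in Python by hashing each variable's full column of values, so that identical variables fall into the same bucket without an explicit pairwise comparison. This is a different technique from the paper's PROC COMPARE approach and assumes the whole data set fits in memory; missing values must also share a consistent representation (for example, None rather than NaN) for columns to compare equal. The function name is ours.

```python
from collections import defaultdict

def identical_variable_groups(columns):
    """Group variable names whose values match in every observation.

    `columns` maps a variable name to its list of values (one per row).
    Returns only groups with two or more members, e.g. [["N1", "N3", "N5"]].
    """
    groups = defaultdict(list)
    for name, values in columns.items():
        groups[tuple(values)].append(name)
    return [names for names in groups.values() if len(names) > 1]

data = {
    "N1": [1, 2, 3], "N2": [1, 2, 4], "N3": [1, 2, 3],
    "N4": [0, 0, 0], "N5": [1, 2, 3],
}
print(identical_variable_groups(data))  # [['N1', 'N3', 'N5']]
```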
Identifying Semantically Equivalent Questions Using Singular Value Decomposition
Session 2647
Over the past few decades, more and more inquisitive people have been visiting question-and-answer sites such as Quora to get access to expert answers. With over 100 million monthly visitors, it's not surprising that many people ask similarly worded questions that can be answered with exactly the same content. This situation causes site visitors to spend more time finding the best response to their question. It also frustrates authors, who feel they need to answer multiple versions of the same question. This paper aims to solve a challenge released by Quora to improve the experience of its authors and site visitors by grouping queries with similar intent. A public data set released by Quora that contains over 400,000 question pairs was used as the data source. The PyDictionary module in Python was used to extract synonyms for the most frequently occurring terms. With the help of SAS® Enterprise Miner(tm), singular value decomposition (SVD) was used to reduce the dimensions of the term-by-document matrix. Euclidean distance was used to compute the distance between sentences projected into the SVD space. The accuracy of classifying question pairs as duplicates or non-duplicates was 62.4%. In addition, classification could also be performed by checking whether a question is a duplicate of any other question in the corpus of questions (approximately 800,000). In this way, duplication could be avoided at the corpus level.
Varsha Reddy Akkaloori, Oklahoma State University
If You Need These OBS and These VARS, Then Drop IF and Keep WHERE
Session 2417
Reading data effectively in the DATA step requires knowing the implications of various methods and of DATA step mechanics: the observation loop and the Program Data Vector (PDV). The impact is especially pronounced when working with large data sets. Individual techniques for subsetting data have varying levels of efficiency and implications for input/output time. Using the WHERE statement or option to subset observations consumes fewer resources than the subsetting IF statement. Also, using DROP and KEEP to select variables to include or exclude can be efficient, depending on how they are used.
Jay Iyengar, Data Systems Consultants
Image Classification Using SAS® Enterprise Miner 14.1
Session 2832
Character classification, or image classification, plays a vital role in many computer vision problems (for example, optical character recognition (OCR) and license plate recognition) and therefore could be used to solve many business problems. The task is challenging because, in addition to dealing with the large number of levels necessary to classify each image, it requires extensive data preparation. We worked with a new, publicly available data set, the Devanagari Handwritten Character Dataset from the UCI Machine Learning Repository website, which contains 92,000 images of 46 Devanagari characters. Our goal is to develop an image recognition system for Devanagari script. Devanagari is an Indic script and forms the basis of over 100 languages spoken in India and Nepal, including Hindi, Marathi, Sanskrit, and Maithili, to name a few. Unlike English, it has no capitalization. The learning model was trained on the 92,000 images (32 × 32 pixels each), each of which is classified as one of 46 characters: 36 consonants and 10 digits. We used SAS® Enterprise Miner 14.1 for modeling and variable reduction. For initial modeling, a neural network was used, which resulted in a 40% misclassification rate. This rate could be reduced significantly with the use of other modeling techniques.
Akshay Arora, Oklahoma State University
Image Processing: Seeing the World through the Eyes of SAS® Viya®
Session 2759
Ever wonder how the algorithms used by Facebook and Google detect your friends in your photos? Image recognition and classification algorithms, such as deep neural networks, can extract important information from photos and classify them almost instantly after you post a picture. Not only is this process useful on social media, but there are numerous applications of image classification algorithms in healthcare, manufacturing, and security screenings. Using the latest machine learning capabilities available in SAS® Viya® for text and image processing, organizations can leverage in-memory processing with SAS® Cloud Analytic Services (CAS) and enhanced parameter tuning to develop more sophisticated deep learning models. In this paper, we discuss key components of building an image classification algorithm.
Ivan Gomez, Zencos
Implementing Analytics: Perspectives from the Client Side
Most organizations are in the midst of transforming themselves from an intuition- or judgement-based culture into a culture where fact-based decision-making is the norm. This means using data and analytics and embedding them into the organization's decision-making at all levels. This transformation is both an analytical initiative and a change-management effort. I discuss my experiences heading Analytics groups in large client organizations in the Consumer Packaged Goods and Pharmaceutical sectors, focusing on the challenges involved in implementing analytics and transforming companies to build a fact-based analytics culture. I first describe how data and analytics are used to formulate marketing strategy and present several real-life use cases to illustrate this point. The presentation then discusses in detail the types of analytic work or projects undertaken, both continuously and episodically, and addresses questions such as: Who initiates the analytics work? What is the analytics process? How is analytics activation done? Four analytics use cases are covered, related to marketing mix analysis (resource allocation), price optimization, forecasting, and assortment optimization. I also discuss learnings and best practices in analytics implementation and activation, such as the importance of getting both top-down and bottom-up buy-in, the right communication strategy with internal clients, securing quick wins, choosing the right analytics partner or vendor, and so on.
Suresh Divakar, Independent Consultant
Implementing External File Processing with No Record Delimiter via a Metadata-Driven Approach
Session 2643
Most of the time, we process external files that have fixed-width columns or that contain data separated by a common delimiter such as a comma, tab, pipe, and so on. However, there are times when we have to deal with unusual or inconsistent file structure and format. This paper describes an innovative approach to reading external files that have no available delimiter with which to parse the data. Furthermore, the structure and type of the data can change from one record to another. Each record might have a unique format, sometimes requiring looping constructs to properly set the required data. This case is derived from a real-world example in the automobile insurance sector. The solution uses external metadata to define the required characteristics of the raw file, along with hash object lookup and advanced INFILE statement processing to achieve the required business outcome.
Princewill Benga, F&P Consulting
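The minimal Python sketch below illustrates metadata-driven parsing of a stream that has no record delimiter. It is not the paper's SAS implementation, which uses a hash object and INFILE processing. The record types, field names, and widths in `LAYOUTS` are invented for illustration. A dictionary lookup on a leading record-type code selects each record's layout. A field spec with a fourth element names an earlier field that holds a repeat count, which stands in for the looping constructs the abstract mentions.

```python
# Hypothetical layout metadata: record type code -> list of field specs.
# A spec is (name, width, converter) or (name, width, converter, count_field),
# where count_field names an earlier field holding how many times to repeat.
LAYOUTS = {
    "PH": [("policy_id", 6, str), ("premium", 7, int)],
    "CL": [("n_claims", 1, int), ("claim_amt", 5, int, "n_claims")],
    "VH": [("vin", 5, str), ("year", 4, int)],
}
TYPE_WIDTH = 2  # every record starts with a 2-character type code


def parse_stream(data, layouts=LAYOUTS, type_width=TYPE_WIDTH):
    """Split an undelimited string into records, driven entirely by `layouts`."""
    pos = 0

    def take(width):
        nonlocal pos
        raw = data[pos:pos + width]
        if len(raw) < width:
            raise ValueError(f"truncated input at offset {pos}")
        pos += width
        return raw.strip()

    records = []
    while pos < len(data):
        start = pos
        rtype = take(type_width)
        layout = layouts.get(rtype)  # dict lookup plays the role of a hash-object lookup
        if layout is None:
            raise ValueError(f"unknown record type {rtype!r} at offset {start}")
        rec = {"type": rtype}
        for name, width, conv, *count_field in layout:
            if count_field:
                # Repeating group: the count comes from a field parsed earlier
                rec[name] = [conv(take(width)) for _ in range(rec[count_field[0]])]
            else:
                rec[name] = conv(take(width))
        records.append(rec)
    return records


for rec in parse_stream("PH0012340001500CL20010000250VHABCDE2017"):
    print(rec)
```

The layouts live in data rather than in code. Supporting a new record type therefore means adding a dictionary entry, not another branch of parsing logic.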
Implementing Privacy Protection-Compliant SAS® Aggregate Reports
Session 2022
Personal data protection in published reports is an important human right and is required by US federal law, the European Union General Data Protection Regulation (GDPR), and the laws of many other jurisdictions. This paper highlights the importance of producing privacy protection-compliant reports, discusses aspects of potential privacy breaches, and suggests a robust algorithm for producing well-protected aggregate reports. It walks through the complex logic of an enhanced complementary suppression process and demonstrates SAS® coding techniques to implement that process and automatically generate aggregate tabular reports compliant with privacy protection law. The result is a set of SAS macros ready for use in any reporting organization that is responsible for compliance with privacy protection.
Leonid Batkhan, SAS
Important Performance Considerations When Moving SAS® to a Public Cloud
Any architecture that a SAS® customer chooses to run their SAS applications has the following requirements: a good understanding of all layers and components of the SAS infrastructure; an administrator to configure and manage the infrastructure; and the ability to meet SAS requirements, not just to run the software but also to enable it to perform well. This paper discusses important performance considerations for SAS®9 (both SAS® Foundation and SAS® Grid) and SAS® Viya®. We give guidance on how to configure the cloud infrastructure to get the best performance with SAS.
Margaret Crevar, SAS
Improved Campaigning at Africa's Largest Pay TV Service Provider with SAS® Marketing Optimization
Session 2762
Traditionally, the entertainment industry incurs significant costs in engaging a diverse customer base with appropriate messaging, and slow marketing turnaround leads to reduced customer satisfaction levels and customer retention rates. MultiChoice South Africa, Africa's largest digital satellite television service provider, leverages a rich set of historical and personal customer information, in both structured and unstructured formats, to understand and predict customer behavior and propensity to respond to marketing campaigns. These predictive models are used to improve campaign targeting, generating significant incremental revenue while reducing communication costs. This paper describes MultiChoice's data science journey using SAS® Marketing Optimization to reach the right customers at the right time through the right communication channel, so that the company maximizes profit, improves retention rates, and improves overall customer satisfaction.
Jean De Villiers, Multichoice SA
Improving Financial Reporting Accuracy Using Smart Meter Data
Session 2734
Utility companies sell energy throughout the month but typically bill their customers based on usage during each customer's billing cycle, whose start and stop dates might not coincide with those of a calendar month. Therefore, to close the accounting books at the end of the calendar month, utilities must estimate customer usage and the corresponding revenue for the portion of the calendar month that has not yet been billed. Traditionally, utilities have relied on either a regression model-based approach or a "Prior Unbilled" method to estimate the current month's unbilled usage and revenue, which at times can yield financial results outside reasonable limits. However, with smart meter data available on a daily or hourly basis, utilities should be able to calculate unbilled usage accurately rather than estimate it. The problem is that the daily or hourly meter readings might include erroneous and missing data that must be corrected and validated before being used in the financial entries. This paper proposes an analytical framework to correct and validate the daily or hourly meter readings using meter taxonomy data, meter operation data, billing cycle information, and tariff class information. Finally, unbilled energy usage and the corresponding revenue are calculated and presented using SAS® Visual Analytics.
Prasenjit Shil, Ameren
Tom Anderson, SAS
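Once readings are validated, the unbilled quantity is the usage between an account's last billed read date and the end of the calendar month. The minimal Python sketch below assumes daily kWh readings keyed by date. Its gap-filling rule is an illustrative assumption: missing or negative days are replaced with the mean of the valid days in the window. That rule is far simpler than the paper's validation framework, which also draws on meter taxonomy, meter operation, billing cycle, and tariff data.

```python
from datetime import date, timedelta


def is_valid(reading):
    """Treat missing (None) and negative readings as invalid (illustrative rule)."""
    return reading is not None and reading >= 0


def unbilled_usage(daily_kwh, last_billed, month_end):
    """Sum daily usage for the days after `last_billed` through `month_end`.

    daily_kwh: dict mapping datetime.date -> kWh for that day.
    Invalid or missing days are filled with the mean of the valid days in the window.
    """
    n_days = (month_end - last_billed).days
    window = [last_billed + timedelta(days=i) for i in range(1, n_days + 1)]
    valid = [daily_kwh[d] for d in window if is_valid(daily_kwh.get(d))]
    if not valid:
        raise ValueError("no valid readings in the unbilled window")
    fill = sum(valid) / len(valid)
    return sum(daily_kwh[d] if is_valid(daily_kwh.get(d)) else fill for d in window)


readings = {
    date(2018, 3, 28): 10.0,
    date(2018, 3, 29): 12.0,
    date(2018, 3, 30): None,  # missing reading
    date(2018, 3, 31): 14.0,
}
print(unbilled_usage(readings, last_billed=date(2018, 3, 27), month_end=date(2018, 3, 31)))
# 48.0
```

Multiplying the result by the applicable tariff rate would give the corresponding unbilled revenue.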
Improving Game Strategies with Data Analytics: The Case of Euroleague Professional Basketball
We analyzed 10 years of Euroleague games, comprising approximately 238,000 shot attempts. We looked at factors such as shooting coordinates, players' age and height, home versus away games, player positions, and so on, to develop a predictive model for the success probability of a shot. The results can be used to guide recruitment strategies that maximize a team's success probability.
Raha Akhavan-Tabatabaei, Sabanci University
Cem Yigman, Sabanci University
Improving Scheduling and Strategic Planning at J. P. Morgan Chase's Card Production Center
Session 2621
JPMorgan Chase's manufacturing-based card production center (CPC) is beginning to leverage its vast array of production data and take advantage of the benefits of analytics using end-to-end solutions based on SAS®. Due to the complex nature of its operation, CPC wanted a more efficient process to support strategic planning and scheduling, strategy related to unit cost and revenue generation, and resource allocation optimization. By using the CPM procedure in SAS/OR®, we built a model that enables the business to schedule tasks and test various strategic scenarios subject to priority, resource, and time constraints, across a variety of products within its desired planning horizon. To operationalize the solution and demonstrate business value before investing in costly IT resources, we developed a user interface that integrates SAS® Stored Processes, the SAS® Add-In for Microsoft Office, and customized Visual Basic for Applications (VBA) code. The user interface enables our non-technical users to harness the power of SAS modeling, submit customized scenarios, run the model, and receive customized output in an on-demand, user-friendly environment. The project demonstrates the importance of starting small and focused when introducing analytics into a new environment. The success and business buy-in from this initial project are leading to additional refinements and an expansion of the model's capabilities to address new use cases.
Jay Carini, JPMorgan Chase
Indication of Irregularities in the Accumulation of Public Sector Jobs in the State of Ceará, Brazil
The state of Ceará, Brazil, is saving about eight million dollars per month in financial resources. These savings are due to a data analysis and analytical intelligence project that uses SAS® Fraud Framework for Government to identify irregularities (fraud) in the accumulation of public sector jobs by public servants. The Court of Audit of the State of Ceará (TCE-CE) is the public institution responsible for the control, supervision, and judgment of the application of public resources for the State of Ceará. It promotes ethics in public management to guarantee the exercise of full citizenship for all citizens of Ceará. In general, a public servant in the state of Ceará may hold only one public sector job. Exceptions exist, such as the constitutional provision that allows teachers and health professionals to hold up to two public positions, provided they demonstrate that their positions are compatible with the allowable workload (a maximum of 60 hours per week). However, it has been verified that many public servants are violating the law by working in several institutions, public or private, with a workload higher than the law allows, damaging the quality of public services. This work presents the methodology and procedures used in implementing SAS® software solutions, which enabled the public coffers of the State of Ceará to save more than eight million dollars per month.
Jose Silva, Tribunal De Contas Do Estado Do Ceara
Raimundo Filho, Tribunal De Contas Do Estado Do Ceara
Innovate with Data: Use Cases of SAS® in a Modern Hadoop, AI, Machine Learning, and IoT World
The buzz around machine learning (ML), artificial intelligence (AI), and the Internet of Things (IoT) has reached fever pitch! This session focuses on business outcomes and innovation being realized by early adopters of these transformational technologies. It presents research and case studies on key trends in the ML, AI, and IoT analytics space worldwide, including results of exclusive primary research. It is aimed at business and senior technology leaders looking for guidance in the rapidly changing landscape of AI, ML, and IoT analytics. The following topics are covered: key business drivers and outcomes achieved; using cloud services and APIs to accelerate innovation; architecture patterns to integrate with in-house SAS® solutions; scaling and operationalizing insights across the enterprise; using open-source software (for example, Hadoop, Spark, Python, and R) to complement proprietary tools; a comparison of the SAS user community with other user communities; and a roadmap and methodology for agile implementation.
Raj Dalal, BigInsights
Innovating Government Decision-Making through Analytics
Session 1952
Everything that a government agency does revolves around decision-making. While the types of decisions can be highly varied, from strategic decisions about agency budget priorities to split-second operational decisions on the battlefield, decisions are at the center of what governments do. Understanding how we make decisions, and even more importantly how to make them better, is critical for the successful execution of government agency missions. The challenge is that science over the past 50 years has shown that humans are terrible at making rational decisions. From how we understand uncertainty, to how we intuitively understand numbers, to inherent bias, we make decisions that, much of the time, are not rational. In this session, Dr. Bennett motivates the need for innovation in decision-making, drawing on historical examples and his time in government. He reviews aspects of cognitive decision science in an interactive way, highlighting our decision-making irrationality, and then illustrates the impacts and challenges of irrational decision-making, using real-world examples from health care, geopolitics, and other domains. Finally, analytics is discussed as an important source of evidence for innovation in the way governments make decisions.
Steve Bennett, SAS
Insights from a SAS Technical Support Guy: A Deep Dive into the SAS® ODS Excel Destination
Session 2174
SAS is a world leader in the area of data analytics, while Microsoft Excel, with over 30 million active users, is a leader among spreadsheet packages. Excel spreadsheets are heavily used for calculations, information organization, statistical analysis, and graphics. SAS® can leverage its world-class analytics and reporting capabilities to produce well-styled and highly functional Excel spreadsheets by using the Output Delivery System (ODS) Excel destination. This paper, relevant to anyone who uses Microsoft Excel, offers insights into the depths of the ODS Excel destination by illustrating how you can customize styles in Microsoft Excel worksheets, and it discusses common layout and reporting questions (including limitations). In addition, the discussion covers useful applications for automating and executing Excel worksheets. After this deep dive into the ODS Excel destination, you should understand the behavior and capabilities of the destination well enough to create aesthetic and effective Excel worksheets.
Chevell Parker, SAS
Insights into Using the GLIMMIX Procedure to Model Categorical Outcomes with Random Effects
Session 2179
Modeling categorical outcomes with random effects is a major use of the GLIMMIX procedure. Building, evaluating, and using the resulting model for inference, prediction, or both requires many considerations. This paper, written for expert users of SAS® statistical procedures, illustrates the nuances of the process with two examples: modeling a binary response using random effects and correlated errors, and modeling a multinomial response with random effects. In addition, the paper provides answers to common questions that are received by SAS Technical Support concerning these analyses with PROC GLIMMIX. These questions cover working with events and trials data, handling bias issues in a logistic model, and overcoming convergence problems.
Kathleen Kiernan, SAS
Integrating SAS® Analytics into Your Web Page
Session 2145
SAS® Viya® adds enhancements to the SAS® Platform that include the ability to access SAS® services from other applications. Whether your application is in Python, Java, Lua, or R, you can now access SAS analytics and integrate them directly in your application. You can even use REST APIs. In this session, we look at using the REST APIs in SAS Viya to execute SAS® Cloud Analytic Services (CAS) actions and embed them in an application, which in this case is a web page. Examples include uploading a table, performing a SAS analytic procedure and displaying the output, and publishing a report. This method provides you with much greater flexibility and customization for building dashboards and reporting sites. Through this session, you will gain an understanding of how you can embed analytics from SAS Viya into your very own application.
David Hare, SAS
Integrating SAS® and Data Vault
Session 1898
In this paper, we look at the popular modeling technique called Data Vault, and its integration with SAS®. The technique was conceived by Dan Linstedt in 1990 and presents a practical and easy-to-understand method for modeling a business-into-data-architecture solution. We introduce the various artifacts of Data Vault, how to put them together, how to automate data integration into it, and finally how it fits into your SAS analytical solution.
Patrick Cuba, Cuba BI Consulting
Integrating SAS® and Elasticsearch: Performing Text Indexing and Search
Session 2900
Integrating Elasticsearch document indexing and text search components expands the power of performing textual analysis with SAS® solutions. Information technology, digitization, social connection, modern data storage, and big data accelerate the production of unstructured text data. Processing textual data and extracting the underlying information provides valuable insights and a competitive edge, as seen in e-commerce, internet companies, communications media, marketing, health care, and many other industrial sectors. This paper covers the benefits of architecting and implementing Elasticsearch applications alongside SAS solutions. The first section presents an overview of Elasticsearch and common use cases. The paper then demonstrates indexing SAS data sets into an Elasticsearch NoSQL index, writing SAS code to call Elasticsearch REST APIs, and storing search query results. The final section demonstrates the use of Elasticsearch Kibana to complement the data visualization and business intelligence reporting capabilities of SAS analytics.
Edmond Cheng, Booz Allen Hamilton, Inc.
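For the indexing step, Elasticsearch's `_bulk` API accepts newline-delimited JSON in which each document is preceded by an action line. The minimal Python sketch below builds such a body from rows, for example observations exported from a SAS data set. The index name and fields are hypothetical, and the paper itself issues the REST calls from SAS code. The resulting text would be sent in a POST to the `/_bulk` endpoint with the `Content-Type: application/x-ndjson` header.

```python
import json


def bulk_body(rows, index, id_field=None):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    rows: iterable of dicts, one per document.
    id_field: optional key whose value becomes the document _id.
    """
    lines = []
    for row in rows:
        action = {"index": {"_index": index}}
        if id_field is not None:
            action["index"]["_id"] = str(row[id_field])
        lines.append(json.dumps(action))  # action/metadata line
        lines.append(json.dumps(row))     # document source line
    return "\n".join(lines) + "\n"        # the bulk API requires a final newline


notes = [{"id": 1, "text": "claim filed late"}, {"id": 2, "text": "address change"}]
print(bulk_body(notes, index="notes", id_field="id"), end="")
```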
International Dates and Times: Around the World with SAS®
Session 2764
Data is global. Language, characters, customs, and conventions change across borders, virtual and physical. SAS® has met this 21st-century challenge with National Language Support (NLS). The LOCALE and DFLANG options interact with formats and informats to meet storage and reporting needs without extensive programming. This paper provides an overview of the standard and the NLS capabilities within SAS to help users with their global data.
Derek Morgan, PAREXEL International
Interpreting Black-Box Machine Learning Models Using Partial Dependence and Individual Conditional Expectation Plots
Session 1950
One of the key questions a data scientist asks when interpreting a predictive model is "How do the model inputs work?" Variable importance rankings are helpful for identifying the strongest drivers, but these rankings provide no insight into the functional relationship between the drivers and the model's predictions. Partial dependence (PD) and individual conditional expectation (ICE) plots are visual, model-agnostic techniques that depict the functional relationships between one or more input variables and the predictions of a black-box model. For example, a PD plot can show whether estimated car price increases linearly with horsepower or whether the relationship takes another form, such as a step function or a curve. ICE plots enable data scientists to drill much deeper to explore individual differences and identify subgroups and interactions between model inputs. This paper shows how PD and ICE plots can be used to gain insight from and compare machine learning models, particularly so-called black-box algorithms such as random forest, neural network, and gradient boosting. It also discusses limitations of PD plots and offers recommendations about how to generate scalable plots for big data. The paper includes SAS® code for both types of plots.
Ray Wright, SAS
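The computation behind both plots is short. An ICE curve holds one observation's other inputs fixed and sweeps a single input over a grid. The PD curve is the pointwise average of the ICE curves. The minimal, model-agnostic Python sketch below shows both. The `toy_price` model and its `hp`/`turbo` inputs are hypothetical, and the paper's own SAS code is not reproduced here.

```python
def ice_curves(model, rows, feature, grid):
    """One curve per observation: predictions as `feature` takes each grid value."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            x = dict(row)       # copy so the original observation is untouched
            x[feature] = value  # override only the feature of interest
            curve.append(model(x))
        curves.append(curve)
    return curves


def partial_dependence(model, rows, feature, grid):
    """PD curve: the average of the ICE curves at each grid value."""
    curves = ice_curves(model, rows, feature, grid)
    return [sum(c[i] for c in curves) / len(curves) for i in range(len(grid))]


def toy_price(x):
    """Hypothetical black box: price rises faster with horsepower when turbo=1."""
    return (200 if x["turbo"] else 100) * x["hp"]


cars = [{"hp": 100, "turbo": 1}, {"hp": 150, "turbo": 0}]
grid = [100, 200]
print(ice_curves(toy_price, cars, "hp", grid))          # [[20000, 40000], [10000, 20000]]
print(partial_dependence(toy_price, cars, "hp", grid))  # [15000.0, 30000.0]
```

The two ICE curves have different slopes, which exposes the horsepower-by-turbo interaction. The single averaged PD curve hides that interaction.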
Introducing SAS® Decision Manager 5.1 with Business Rules for SAS® Viya®
SAS® Business Rules Manager 1.2 was introduced in 2012 for customers who need software for building and managing their collections of valuable conditions and actions that drive business decision-making. These rules are combined with advanced analytical models and deployed into batch and real-time processing systems. SAS® Decision Manager 5.1 brings full decision processing to SAS® Viya®. This paper tours those new features, including brand new editors for interactively building business rules, graphical process flow decisions, and reference data management; integration with SAS® Visual Analytics and SAS® Model Manager; computations based on the SAS® Cloud Analytic Services Server; model deployment to multiple operational servers; and compatibility with open-source environments such as Python. SAS® Decision Manager is now cloud-ready, multi-tenant-capable, and provides a full REST API for custom application integration.
Chris Upton, SAS
Steve Sparano, SAS
David Duling, SAS
Introducing SAS® Model Manager 15.1 for SAS® Viya®
Session 2284
SAS® Model Manager has been a popular product since 2006 for customers who need software for managing their collections of valuable analytical models. SAS® Model Manager includes functions for organization, search, editing, testing, deployment, and monitoring performance. Some customers have thousands of models. SAS® Viya® brings a new version of SAS® Model Manager with a modern set of features and capabilities. This paper tours those new features, including a new web application for managing models and projects; integration with SAS® Visual Statistics, SAS® Model Studio, SAS® Decision Manager, and SAS® Event Stream Manager; computations based on the SAS® Cloud Analytic Services (CAS) Server; model deployment to multiple operational servers; and compatibility with open-source environments such as Python. SAS® Model Manager is now cloud-ready, multi-tenant-capable, and provides a full REST API for custom application integration.
Glenn Clingroth, SAS
Chengwen Chu, SAS
Steve Sparano, SAS
David Duling, SAS
Introduction to ETL with SAS®
Ever wondered what extract, transform, load (ETL) actually means? This e-poster covers a general introduction to ETL and the key subsystems of each stage. Also provided are implementation examples in SAS®, as well as real-life examples of using ETL.
Vasilij Nevlev, Analytium Ltd
Invoiced: Using SAS® Text Analytics to Calculate Final Weighted Average Price
Session 2257
SAS® Contextual Analysis brings advantages to the analysis of the millions of electronic tax notes issued in the industry and improves the validation of the taxes applied. Tax calculation is one of the analytical challenges facing government finance secretaries. This paper highlights two items of interest in the public sector: tax collection efficiency and the calculation of the final weighted average consumer price. SAS® Contextual Analysis enables the implementation of a tax taxonomy that analyzes the contents of invoices, automatically categorizes products, and calculates a reference value for the prices charged in the market. The first use case is an analysis of compliance between the official tax rate (as specified by the Mercosul Common Nomenclature, or NCM) and the description on the electronic invoice. (The NCM code was adopted in January 1995 by Argentina, Brazil, Paraguay, and Uruguay for product classification.) The second use case is the calculation of the final weighted average consumer price (known as the PMPF). Generally, this calculation is done through sampling performed by public agencies. The benefits of a solution such as SAS Contextual Analysis are automatic categorization of all notes and NCM code validation. The text analysis and the generated results contribute to tax collection efficiency and result in a more adequate reference value for use in calculating taxes on the circulation of goods and services.
Alexandre Carvalho, SAS
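Once each invoice line has been categorized, the second use case reduces to a weighted average price per product category. The minimal Python sketch below assumes the weights are the quantities sold on each invoice line; the abstract does not spell out the official PMPF weighting. The categories and prices are made up for illustration.

```python
from collections import defaultdict


def weighted_average_price(lines):
    """Quantity-weighted average unit price per category.

    lines: iterable of (category, unit_price, quantity), one per invoice line.
    Returns a dict: category -> sum(price * quantity) / sum(quantity).
    """
    value = defaultdict(float)
    qty = defaultdict(float)
    for category, price, quantity in lines:
        value[category] += price * quantity
        qty[category] += quantity
    return {c: value[c] / qty[c] for c in value if qty[c] > 0}


invoice_lines = [("beer", 4.0, 10), ("beer", 5.0, 30), ("soda", 2.0, 5)]
print(weighted_average_price(invoice_lines))  # {'beer': 4.75, 'soda': 2.0}
```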
Is Your Data Viable? Preparing Your Data for SAS® Visual Analytics 8.2
Session 1826
We all know that data preparation is crucial before you can derive any value from data through visualization and analytics. SAS® Visual Analytics on SAS® Viya® comes with a rich new HTML5 interface on top of a scalable compute engine that fosters new ways of preparing your data up front. SAS® Data Preparation, which comes with SAS Visual Analytics, brings new capabilities such as profiling, transposing or joining tables, creating new calculated columns, and scheduling and monitoring jobs. This paper guides you through the enhancements in data preparation with SAS Visual Analytics 8.2 and demonstrates valuable tips for reducing the run times of your data preparation tasks.
Gregor Herrmann, SAS
Just Enough SAS® Cloud Analytic Services: CAS Actions for SAS® Visual Analytics Report Developers
Session 2000
SAS® Visual Analytics includes all of the point-and-click functionality required to load data, manage data, and perform other back-end work necessary to make visualizations efficient and effective. But there are a handful of critical tasks that are sometimes just easier to do with a few lines of code, especially if your end goal is to automate that process. While there is a substantial and growing codebase for SAS® Cloud Analytic Services (CAS) actions and SAS® Viya® procedures, SAS Visual Analytics report developers need to focus on the creation and implementation of reports. What we need is just enough CAS code to simplify the back-end work. This paper presents some of the most common and most useful CAS actions that can be run from any code window in SAS Viya, including SAS® Studio. These bare-bones code examples include loading a data set to a CAS server, using CAS actions to transform data from a wide to a tall structure, and other formatting operations that make data reporting-ready. Using minimal code, the paper demonstrates how the scored output of a model built in SAS Viya can be lifted into CAS and made ready for SAS Visual Analytics reporting. Finally, the paper describes exactly how coded back-end actions affect SAS Visual Analytics, including how to verify that back-end processes are working and how to ensure that SAS Visual Analytics settings for refresh and automated data reload are set to take advantage of automated back-end processes implemented in code.
Michael Drutar, SAS
Key-Independent Uniform Segmentation of Arbitrary Input Using a Hash Function
Session 1755
Aggregating or combining large data volumes can challenge computing resources. For example, the process can be hindered by system limits on utility space or memory and, as a result, either fail or run too long to be useful. It is a natural inclination to try to solve the problem by segregating the input records into a number of smaller segments, processing them independently, and combining the results. However, for such a divide-and-conquer tactic to work, two seemingly contradictory criteria must be met: first, to aggregate or combine the data correctly, no segment can share its key-values with the rest; and second, the segments must be more or less equal in size. In this paper, we show how a hash function can be used to achieve aggregation for arbitrary input with no prior knowledge of the distribution of the key-values among its records. Effectively, the method makes any task of aggregating or combining data of any size doable by splitting its input into a large enough number of segments. The trade-off is the need to partially re-read the data. However, that is a rather small price to pay for making a failing or endlessly running task finish on time.
Paul Dorfman, Dorfman Consulting
Don Henderson, Henderson Consulting Services, LLC
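The minimal Python sketch below illustrates the idea using MD5 from `hashlib` as the hash function. This is an assumption, and the paper's own choice of function may differ. Python's built-in `hash()` is salted per process for strings, so a stable digest is used instead. Because the segment number depends only on the key, every record with a given key lands in the same segment. A well-mixed hash also spreads distinct keys roughly evenly, with no knowledge of their distribution. The in-memory lists stand in for segment files.

```python
import hashlib
from collections import defaultdict


def segment_of(key, n_segments):
    """Map a key to a segment in [0, n_segments) using a stable hash of its text."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_segments


def segmented_sum(records, n_segments):
    """Sum values by key, one segment at a time.

    records: iterable of (key, value) pairs.
    """
    segments = [[] for _ in range(n_segments)]
    for key, value in records:                 # one pass to split the input
        segments[segment_of(key, n_segments)].append((key, value))

    totals = {}
    for segment in segments:                   # re-read each segment separately
        partial = defaultdict(int)             # only this segment's keys are in memory
        for key, value in segment:
            partial[key] += value
        totals.update(partial)                 # segments share no keys, so no collisions
    return totals


records = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
print(segmented_sum(records, n_segments=3))
```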
Latest and Greatest: Best Practices for Migrating to SAS® 9.4
Session 2117
SAS customers benefit greatly when they are using the functionality, performance, and stability available in the latest version of SAS®. However, the task of moving all SAS collateral such as programs, data, catalogs, metadata (stored processes, maps, queries, reports, and so on), and content to SAS® 9.4 can seem daunting. This paper provides an overview of the steps required to move SAS collateral from systems based on SAS® 9.2 and SAS® 9.3 to the current release of SAS® 9.4.
Alec Fernandez, SAS
Leigh Fernandez, SAS
Lazy Programmers Write Self-Modifying Code OR Dealing with XML File Ordinals
Session 2454
The XML engine within SAS® is very powerful, but it converts every object into a SAS data set with generated keys that implement the parent/child relationships between these objects. Those keys (ordinals, in SAS terms) are guaranteed to be unique within a specific XML file. However, they restart at 1 with each file, so when the individual tables are concatenated, the keys are no longer unique. We received an XML file with over 110 objects, resulting in over 110 SAS data sets, which our internal customer wanted concatenated for multiple days. Rather than copying and pasting the code to handle this process 110+ times, knowing that I would make mistakes along the way, and knowing that the objects would also change along the way, I created SAS code to create the SAS code to handle the XML. I consider myself a "Lazy Programmer." As the classic "Real Programmers" sheet tells us, "Real Programmers are Lazy." This session reviews XML (briefly), SAS® XML Mapper, the XML engine, and techniques for handling the ordinals over multiple days, and finally discusses a technique for using SAS code to generate SAS code.
David Horvath, PNC
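One way to keep the ordinals from the abstract above unique across files is to qualify each one with the file it came from. The minimal Python sketch below replaces every ordinal with a `(file_id, ordinal)` pair; it is not the author's generated SAS code, and the row layout and the column names `ordinal` and `parent_ordinal` are hypothetical. Because a row's own key and its parent key are rewritten with the same file identifier, parent/child joins still match within a file and can no longer collide across files.

```python
def concatenate_tables(per_file_rows, key_columns):
    """Concatenate one XML-derived table across several files.

    per_file_rows: list of (file_id, rows) pairs, where rows is a list of dicts.
    key_columns:   names of the ordinal columns (the row's own key and any
                   parent-key columns) that restart at 1 in every file.
    Each ordinal is replaced by a (file_id, ordinal) pair so keys stay unique.
    """
    combined = []
    for file_id, rows in per_file_rows:
        for row in rows:
            new_row = dict(row)
            for col in key_columns:
                new_row[col] = (file_id, row[col])
            combined.append(new_row)
    return combined


# Hypothetical parent and child tables from two daily files.
parents = concatenate_tables(
    [("day1", [{"ordinal": 1, "name": "a"}]),
     ("day2", [{"ordinal": 1, "name": "b"}])],
    key_columns=["ordinal"],
)
children = concatenate_tables(
    [("day1", [{"ordinal": 1, "parent_ordinal": 1},
               {"ordinal": 2, "parent_ordinal": 1}]),
     ("day2", [{"ordinal": 1, "parent_ordinal": 1}])],
    key_columns=["ordinal", "parent_ordinal"],
)

# Join children to parents on the file-qualified key.
parent_by_key = {p["ordinal"]: p for p in parents}
for child in children:
    print(child["ordinal"], "->", parent_by_key[child["parent_ordinal"]]["name"])
# ('day1', 1) -> a
# ('day1', 2) -> a
# ('day2', 1) -> b
```

Without the file identifier, the day-2 child (ordinal 1, parent 1) would match both parents. Adding a per-file numeric offset larger than any ordinal is an alternative when the keys must stay numeric.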
Let's Get FREQy with Our Statistics: Data-Driven Approach to Determining Appropriate Test Statistic
As programmers, we are often asked to program statistical analysis procedures to run against the data. Sometimes the specifications we are given by the statisticians outline which statistical procedures to run. But at other times, the choice of statistical procedure needs to be data dependent. Running these procedures based on the output of previous procedures requires a little more preplanning and programming. We present a macro that dynamically determines which statistical procedure to run based on previous procedure output. The user can specify parameters (for example, fshchi, plttwo, catrnd, bimain, and bicomp), and the macro returns counts, percentages, and the appropriate p-value for the chi-square versus Fisher's exact test, as well as the p-value for the trend test and the binomial CI, if applicable.
Lynn Mullins, PPD
Richann Watson, DataRich Consulting
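The abstract above does not state the rule the macro uses to choose between the chi-square and Fisher's exact tests. A common guideline is to switch to Fisher's exact test when more than 20% of the expected cell counts are below 5; the Python sketch below assumes that rule and is limited to 2×2 tables. It computes the Pearson chi-square p-value for 1 degree of freedom as `erfc(sqrt(x / 2))` and the two-sided Fisher p-value by summing the hypergeometric probabilities of all tables with the observed margins that are no more likely than the observed table. The function names are hypothetical.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(table):
    """Two-sided Fisher's exact p-value for [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum every table that is no more likely than the observed one
    # (a small relative tolerance guards against floating-point ties).
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


def choose_test(table):
    """Assumed rule: Fisher if more than 20% of expected counts are below 5."""
    cells = [e for row in expected_counts(table) for e in row]
    if sum(e < 5 for e in cells) / len(cells) > 0.2:
        return "fisher", fisher_exact_2x2(table)
    return "chisq", chi_square_2x2(table)[1]


print(choose_test([[3, 1], [1, 3]]))      # small counts -> Fisher's exact test
print(choose_test([[30, 10], [10, 30]]))  # large counts -> chi-square test
```

The first table has all expected counts equal to 2, so Fisher's exact test is chosen (p of about 0.486); the second has expected counts of 20, so the chi-square test is used and gives a p-value well below 0.001.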
Leverage Custom Geographical Polygons in SAS® Visual Analytics
Discover how you can explore geographical maps using your own custom map regions. SAS® Visual Analytics supports a number of predefined geocodes, including various country and subdivision lookups. However, your own custom polygons or shapefiles often draw the exact boundaries for the regional overlay you are trying to explore. From generic sales regions to floor plans or even pipelines, there are many use cases for custom polygons in visual data analysis. Using custom regions is now easier than ever, with user-interface-driven support for importing and registering these custom providers. This paper not only demonstrates the different types of custom providers that are supported, but also shows how to leverage custom polygons within SAS Visual Analytics by showcasing industry examples.
Murali Nori, SAS
Falko Schulz, SAS
Leveraging Multivariate Testing for Digital Marketing Using SAS® Enterprise Guide®
Session 2746
The most popular method for in-market tests is the A/B test, but the method that has the power to drive much stronger insights is the multivariate test. This paper explains the advantages of adopting multivariate testing over traditional methods in a direct mail campaign for digital marketing, showing how multivariate testing uncovers hidden insights in the data that go unnoticed because traditional methods cannot test multiple factors in aggregate. The paper lays out different approaches to supporting strategic business decisions by taking a narrow approach to campaign data. Consider a direct mail campaign in which targeted content is sent to customers. The response rate of these customers is analyzed by testing it against the control group with a simple A/B test to determine whether the targeted content causes the response rate to increase. It is observed that customers receiving targeted content are more likely to respond than customers receiving standard content. If this test is rerun using multivariate testing that controls for the best-customer effect, it is observed that the significant difference between the groups is due to the best-customer effect and not to the targeted content. Multivariate testing has the power to control multiple factors concurrently and to measure the performance of a campaign accurately, in less time, and with minimum effort.
Vanadana Reddy, AAA Auto Club Group
Jui Salunkhe, Oklahoma State University
Leveraging SAS® In-Database Technology to Improve the Efficiency of Data Analytics
Data volume and data sources are growing at an unprecedented pace in the health care industry, and working with big data is often time-consuming and challenging. To gain efficiency, we run and optimize key SAS® procedures and analytic processes within the Teradata database. This paper explores the advantages of SAS® In-Database processing, the most frequently used SAS In-Database analytical and reporting procedures (which execute directly inside the Teradata database), and a comparison of efficiency between direct SAS analytic processing and SAS In-Database processing with Teradata. Finally, sample programs are provided for each SAS In-Database analytical procedure that we explore.
Michael Santema, Kaiser Permanente
Qing Yuan, Kaiser Permanente
Don Mccarthy, Kaiser Permanente
LinAcc: A SAS® Macro for Assay Linearity and Accuracy Determination
Session 2563
Linearity and accuracy are essential metrics for a linear response assay. Linearity can be defined in various ways, while accuracy is defined as the log observed difference from the perfect accuracy line. The LinAcc SAS® macro packages both linearity and accuracy in one easy-to-use implementation in SAS.
Jesse Canchola, Roche Molecular Systems Inc.
Pari Hemyari, Roche Molecular Systems Inc.
Location Matters: Evidence from Spatial Econometric Analysis of Opioid Prescription Rates
A substantial number of Medicare patients are at risk of opioid abuse. A Centers for Medicare and Medicaid Services (CMS) analysis identified approximately 225,000 beneficiaries who received potentially unsafe opioid dosing. Among the beneficiaries with the highest number of opioid prescriptions filled in 2012, 23 opioid prescriptions were filled at an average cost of $3,500 per beneficiary. A key to understanding beneficiary data effectively is to identify high-risk geographic clusters by applying location analytics based on historical data, demographic data, and health trends. Location analytics blends health data and socio-economic data with geographic data to reveal the location of opioid abuse among the Medicare population. Specifically, location analytics that includes spatial econometric modeling (such as the SPATIALREG procedure) combined with SAS® Visual Analytics on SAS® Viya® is a powerful and easy-to-use solution for identifying high-risk clusters. It can better equip government agencies to allocate resources effectively where they are needed most, protecting beneficiaries and boosting the integrity of the government programs intended to help them. This paper shows how to apply location analytics to find improper prescriptions in Medicare. It uses SAS® data management, data exploration, modeling, and reporting capabilities to identify patterns and relationships in data that address risks in Medicare and ultimately ensure a timely and adequate response.
Jonathan Mccopp, SAS
Manuel Figallo, SAS
Guohui Wu, SAS
Looking Beyond the Model with SAS® Simulation Studio: Data Input, Collection, and Analysis
Discrete-event simulation as a methodology is often inextricably intertwined with many other forms of analytics. Source data often must be repaired or processed before being used (indirectly or directly) to characterize variation in a simulation model. Collection of simulated data needs to coordinate with and support the evaluation of performance metrics in the model. Or it might be necessary to integrate other analytics into a simulation model to capture specific complexities in the real-world system that you are modeling. SAS® Simulation Studio is a component of SAS/OR® software that provides an interactive, graphical environment for building, running, and analyzing discrete-event simulation models. In a broader sense, it is also an integral part of the SAS® analytic platform. This paper illustrates how SAS Simulation Studio enables you to tackle each of these discrete-event simulation challenges. You have full control over the use of input data and the creation of simulated data. Strong experimental design capabilities mean you can simulate for all needed scenarios. In addition, you can embed any SAS analytic program (optimization, data mining, or otherwise) directly into the execution of your simulation model.
Edward Hughes, SAS
Emily Lada, SAS
Machine Learning Bots for Fraud, Waste, and Abuse
Session 2007
One of the most exciting developments has been the advent of machine learning bots for combating fraud, waste, and abuse of data. In this presentation, Mike presents a machine learning surveillance SAS® bot solution. SAS bots have been deployed to identify suspicious, anomalous, and fraudulent data across a number of domains. Through continuous monitoring and learning, bots can be deployed to augment existing surveillance strategies, identify areas that have been under-surveilled, and uncover new and emerging trends.
Michael Ames, SAS
Macro Method to Use Google Maps and SAS® to Geocode a Location by Name or Address
Google Maps is a very useful tool for finding driving directions between addresses and for defining a geographic representation of an address, which can be used to find straight-line distances between two addresses. Geocoding a location is often useful in research. Using macro and SAS® DATA step code developed with SAS® 9.3 on Microsoft Windows, you can find the latitude and longitude of a location by searching the HTML code returned by a Google Maps search.
Laurie Smith, Cincinnati Children's Hospital Medical Center
Macro that Can Get Geo Coding Information from the Google Maps API
This paper introduces a macro that automatically gets geocoding information from the Google Maps API for the user. The macro can get the longitude, latitude, standard address, and address components such as street number, street name, county or city name, state name, and ZIP code. To use the macro, the user needs to provide only simple SAS® input data. The macro then automatically retrieves the data and saves it to a SAS data set. This paper includes all the SAS code for the macro and provides an example of the input data to show you how to use the macro.
Ting Sa, Cincinnati Children's Hospital Medical Center
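The request-and-parse pattern behind the macro in the abstract above can be sketched in Python against the JSON endpoint of the Google Geocoding API. This is an illustrative stand-in, not the author's SAS macro: the function names are hypothetical, a valid API key is required, and mapping `administrative_area_level_2` to county is an assumption that fits US addresses. The parser reads the first result's `geometry.location` and indexes each address component's `long_name` by its `types`.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Build a Geocoding API request for one address (an API key is required)."""
    return GEOCODE_URL + "?" + urlencode({"address": address, "key": api_key})


def parse_geocode(response):
    """Pull coordinates and address components from the first result."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    first = response["results"][0]
    location = first["geometry"]["location"]
    # Index each component's long_name by every type it carries.
    parts = {}
    for comp in first.get("address_components", []):
        for comp_type in comp.get("types", []):
            parts.setdefault(comp_type, comp["long_name"])
    return {
        "latitude": location["lat"],
        "longitude": location["lng"],
        "standard_address": first.get("formatted_address"),
        "street_number": parts.get("street_number"),
        "street_name": parts.get("route"),
        "city": parts.get("locality"),
        "county": parts.get("administrative_area_level_2"),  # US mapping assumed
        "state": parts.get("administrative_area_level_1"),
        "zip": parts.get("postal_code"),
    }


def geocode(address, api_key):
    """Fetch and parse one address (needs network access and a valid key)."""
    with urlopen(build_geocode_url(address, api_key)) as resp:
        return parse_geocode(json.load(resp))
```

Looping `geocode` over the addresses in an input file and writing each returned dictionary out as a row mirrors what the macro does with a SAS data set.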
Macros I Use Every Day (And You Can, Too!)
Session 2500
SAS® macros are powerful tools that can be used in all stages of SAS program development. Like most programmers, I have collected a set of macros that are so vital to my program development that I find that I use them just about every day. In this presentation, I demonstrate the SAS macros that I developed and use myself. I hope they will serve as an inspiration for SAS programmers to develop their own set of must-have macros that they can use every day.
Joe Deshon, Boehringer Ingelheim Animal Health
Make SAS® Enterprise Miner® Play Nicely with SAS® Viya®
With a few simple tricks, your SAS® Enterprise Miner™ on SAS®9 will run seamlessly with your new SAS® Viya® installation. You can enjoy the perks of your old, familiar interface and your existing SAS® Enterprise Miner™ projects, PLUS the advantage of faster, distributed, multi-threaded processing on SAS Viya. We walk you through all the steps needed to make this work effortlessly, and we provide the code that will help you. Why choose, when you can have the advantages of both SAS Enterprise Miner on SAS®9 AND SAS Viya?
Beth Ebersole, SAS
Making Data More Familiar: Region and Language Settings with SAS® Visual Analytics
Session 2085
Do you like using SAS® with English text in the user interface, but miss the date, time, datetime, and numeric formats that are familiar to you? SAS® Visual Analytics has a solution for you! Beginning in SAS Visual Analytics 8.2, you can separate the language in which your UI is displayed from the regional preferences used to format your data. By giving you more options to control your environment, SAS is giving you more power to know around the globe!
Scott Dobner, SAS
Management and Usage of User-Defined Formats in the SAS® Cloud Analytic Services Server
Session 2032
Since the dawn of SAS®, user-defined formats have been created and applied to data values. Formats are used to control the written appearance of data values or, in some cases, to group data values together for analysis. SAS® Cloud Analytic Services (CAS) server actions support the creation and management of user-defined formats. Other CAS server actions use the formats to manipulate data and apply formats to generate tables or reports. Learn the basic concepts you need to create and manage format libraries. Get a deeper understanding of how a library can be global or local to a CAS session. Recognize the role of the format search list when applying formats.
Denise Poll, SAS
Managing Complex SAS® Metadata Security Using Nested Groups to Organize Logical Roles
Session 1789
SAS® metadata security can be complicated to set up and cumbersome to maintain over time. Often, organizations need to manage many itemized groups and roles across multiple logical roles, especially with some industry-specific SAS solutions. In some cases, user groups and roles have no standards or framework to follow, which forces SAS administrators to guess in order to get an initial security model and process in place. To make matters even more complicated, permissions could be mutually inclusive across logical roles. This paper presents a scalable methodology to manage any number of itemized groups and roles within SAS metadata by providing a standardized template and approach using nested groups that roll up into customizable logical roles. This approach can be scaled for larger organizations by layering more nested groups as the number of users and groups grows. Once the model is deployed, the SAS administrator needs only to place SAS user accounts in top-level groups based on logical roles, rather than managing many groups.
Stephen Overton, Overton Technologies
Managing the Capital Adequacy Process (CAP) Using SAS®
Session 2193
The Dodd-Frank Wall Street Reform and Consumer Protection Act requires the 30+ domestically significant banks to conduct mid-year and annual stress testing to assess their capital adequacy under different scenarios. This exercise consists of various stress testing processes, which are rectified through the banks' process narratives, supervisory expectations, and internal controls related to the processes. Process flows choreograph the actual capital adequacy process (CAP) operationalization cycle. Compliance with the supervisory expectations related to capital adequacy, which are set out in SR 15-18 and SR 15-19, is carried out through a workflow-managed CAP. SAS offers a solution, SAS® Qualitative Assessment Manager, that lets banks manage and keep track of these expectations, processes, related control tests, and subsequent findings.
Sukhbir Dhillon, SAS
Managing the Expense of Hyperparameter Autotuning
Session 1941
Determining the best values of machine learning algorithm hyperparameters for a specific data set can be a difficult and computationally expensive challenge. The recently released AUTOTUNE statement and autotune action set in SAS® Visual Data Mining and Machine Learning automatically tune hyperparameters of modeling algorithms by using a parallel local search optimization framework to ease the challenges and expense of hyperparameter optimization. This implementation allows multiple hyperparameter configurations to be evaluated concurrently, even when data and model training must be distributed across computing resources because of the size of the data set. With the ability to both distribute the training process and parallelize the tuning process, one challenge then becomes how to allocate the computing resources for the most efficient autotuning process. The best number of worker nodes for training a single model might not lead to the best resource usage for autotuning. To further reduce autotuning expense, early stopping of long-running hyperparameter configurations that have stagnated can free up resources for additional configurations. For big data, when the model training process is especially expensive, subsampling the data for training and validation can also reduce the tuning expense. This paper discusses the trade-offs that are associated with each of these performance-enhancing measures and demonstrates tuning results and efficiency gains for each.
Patrick Koch, SAS
Brett Wujek, SAS
Oleg Golovidov, SAS
Manipulating Statistical and Other Procedure Output to Get the Results That You Need
Many scientific and academic journals require that statistical tables be in a specific format (for example, the American Psychological Association [APA] style). This paper shows you how to change the output of any SAS® procedure to conform to any journal or corporate style. You'll learn how to save data from any SAS procedure, change the data format using the DATA step, and dynamically create a format based on your data. You'll also use the SAS Output Delivery System (ODS) inline formatting functions and style overrides, and produce several output formats, including HTML, RTF (for Microsoft Word), PDF, and Microsoft Excel files. Finally, you'll learn how to create and deliver the output on demand using SAS server technology. This paper is appropriate for all SAS skill levels.
Vince Delgobbo, SAS
Mapping Roanoke Island Revisited: An OpenStreetMap (OSM) Solution
In a previous presentation, SAS® was used to illustrate the difficulty of, and solutions for, mapping small pieces of coastal land, which are often removed from map boundary files to smooth boundaries. Roanoke Island, one of the first areas of the current United States to be mapped (1585), was used as an example because it is smoothed out of many current maps. While these examples isolated Roanoke Island, they didn't provide detail beyond city names on the map. Background maps with street and other detail information, originally limited to SAS® Visual Analytics, are now available for SAS/GRAPH® software using open-source map data from OpenStreetMap (OSM). This paper reviews the previous solutions and then looks at how to map Roanoke Island using SAS/GRAPH and OSM.
Barbara Okerson, Anthem
Maximizing Metadata Management
Session 2610
Delving into the world of metadata can initially seem daunting, but a nearly unlimited amount of value can be derived from it. Do you want to validate that each physical table is associated with only one metadata table? Do you want to make sure that every job in a flow is connected and that there is only one start job and stop job per flow? Do you want to toggle an option for all transpose transformations? The answers to all these questions and more lie within the metadata if you just know where to look. This presentation first covers the key terminology and concepts involved in how metadata is stored in the metadata tree. It then describes the basic building blocks required to harvest metadata and uses them to validate whether metadata objects conform to specific rules. Finally, it gives an overview of how this can be integrated into an overall metadata reporting solution.
Lewis Mitchell, Barclays
Amit Patel, Barclays
Measuring Response to the Ultimate Driving Machine: Consumer Sentiments and Brand Advertising
Increasingly, customers use social media and other Internet-based applications (for example, review sites) to voice their opinions and to express their sentiments about brands. These reviews profoundly influence brand performance, either directly (by affecting consumer behavior) or indirectly (by generating positive or negative word-of-mouth through online social networks). We present a methodology that can be used to collect data from popular brand review sites and discussion boards and then analyze customer feedback. Strategic implications of brand sentiments are discussed as we explore the influence of actual product experience and brand expectations (which are shaped by promotional strategy) on brand sentiments. To demonstrate the utility of our methodology, we first collected data using customized Python code from Edmunds.com, a popular review site where car owners post detailed evaluations of the cars they have purchased. To get a long-term perspective of brand strengths and weaknesses, data was collected for car models released from 2012 to 2017 from several brands. These detailed, unstructured, and textual reviews were then analyzed using the principles of text mining and sentiment analysis. After identifying the strengths and weaknesses of each brand, we explored the relationship between a brand's marketing campaign and its sentiments.
Amit Ghosh, Cleveland State University
Goutam Chakraborty, Oklahoma State University
Praveen Kumar Kotekal, Oklahoma State University
Medical Appointments: Show/No-Show
Session 2878
Sometimes people make a doctor's appointment and then do not show up. Past studies show that some 30% of people do not show up for their appointments. This is a huge loss for doctors, because they lose their earnings for those missed appointments. On the other hand, patients who really wanted an appointment as soon as possible are not able to get one. So there are two losses: the financial loss for the doctor and the loss of an appointment for the person in need. This paper can help clinics and hospitals understand what attributes are associated with individuals who do not show up for their appointments. Using SAS® Enterprise Miner™, we try to predict what factors are responsible for a no-show. The paper also examines ways to reduce no-shows. The data was obtained from the Kaggle website and contains 300,000 medical appointments and 15 variables.
Shubham Panat, Oklahoma State University
Merge with Caution: How to Avoid Common Problems When Combining SAS® Data Sets
Session 1746
Although merging is one of the most frequently performed operations when manipulating SAS® data sets, there are many problems that can occur, some of which can be rather subtle. This paper examines several common issues, provides examples to illustrate what can go wrong and why, and discusses best practices for avoiding unintended consequences when merging.
Joshua Horstman, Nested Loop Consulting
Mile-High Visual Analytics: Ways to Enhance Reports and Dashboards
The recent versions of SAS® Visual Analytics include enhancements to several useful features that can elevate your reports and dashboards from good to great. This presentation describes workable solutions to common obstacles faced by report developers and data scientists. Topics include the use of parameters, when and how to calculate items, adding dynamic chart titles, and creating hierarchies to add drill-down functionality. This presentation is suitable for users of all experience levels and demonstrates how to optimize SAS Visual Analytics to elevate your reports and dashboards to the next level.
Scott Leslie, MedImpact
Tricia Aanderud, Zencos
Mine Mass Fragments Using SAS® and Python for Metabolite Identification of Antibody-Drug Conjugates
Metabolite identification of antibody-drug conjugate (ADC) and peptide drugs using mass spectrometry is challenging due to the complexity of their metabolism and catabolism reactions and the lack of computing tools for mining complicated fragment patterns from high-resolution mass spectra. Mass fragmentation data are enormous, and thus manual interpretation methods routinely used for small molecules are time-consuming and inefficient. Previously, we reported on an application based on SAS®, AIR Binder, for dynamic visualization, data analysis, and reporting of preclinical and clinical drug metabolism assays (PharmaSUG 2017, MWSUG 2017). We further expanded the application for mining raw accurate mass spectra data. Python scripts were developed for fragment searching and iteration. SAS was then used as a platform to integrate searching results and generate visualization solutions for molecular structure elucidation via SAS macros and various Output Delivery System (ODS) graphics. Searching algorithms for the prediction of accurate mass were based on metabolite anchor regions such as warhead and linker of ADCs, and growing regions that are adjacent amino acids, conjugates with catabolism products, and variations produced by Phase I and II metabolic reactions. Overall, this novel SAS solution in combination with Python scripts was a success for large mass spectra fragment mining of ADC and peptide metabolites with significantly improved productivity and efficiency.
Hao Sun, Independent Consultant
Kristen Cardinal, Independent Consultant
Minimum Information for Training a Classifier
Session 2515
Classifier accuracy is extremely important and can be improved by increasing the size of the training data set. However, in experimental studies it might be very costly to survey cases; therefore, limiting sample size to a minimum is essential. Sometimes very large data sets might not contain enough information, and additional computer resources do not improve accuracy. Stopping at the optimal iteration saves computer time and sampling costs. For this reason, a sequential method of training classifiers can be useful. This paper proposes a sequential method that seeks to sample the minimum number of observations necessary to train a classifier to estimate the feasible minimum rate of misclassification, the Bayes error. Implemented in SAS/IML® Studio, this method of classifier training proves ideal because it gives the researcher more control over the process by specifying when the sequential procedure should be stopped. It is not restricted to any single method of classification, and it never seeks to obtain an unfeasibly low misclassification rate.
Catherine Halsey, University of Pretoria
Frans Kanfer, University of Pretoria
Salomon Millard, University of Pretoria
Mitigating the Effects of Class Imbalance Using SMOTE and Tomek Link Undersampling in SAS®
Many standard learning algorithms have trouble adequately learning the underrepresented class in imbalanced data sets. Altering the training data with sampling methods can make it easier for classifiers to learn the class of interest. Two such methods are SMOTE, which generates synthetic minority class examples, and Tomek link undersampling, which clears majority class examples from class boundaries. Both methods were implemented in SAS® along with a combination of SMOTE followed by Tomek link undersampling (SMOTE+Tomek). Using a data set of credit card fraud transactions where the class of interest was either fraud (minority class) or not fraud (majority class), the efficacy of these techniques was tested by training four classifiers (a random forest, a neural network, a support vector machine, and a rule induction classifier) on training data sets that were processed using each method, and testing them on a validation set. The performance of the classifiers on the validation set was assessed using the ROC index, precision, recall, and the ratio of false negatives to false positives (FN/FP). SMOTE and SMOTE+Tomek were the most effective preprocessing methods for improving the detection of fraudulent transactions in the credit card data set. Both methods improved recall and lowered the FN/FP for every classifier, indicating improved sensitivity to fraud. At the same time, SMOTE and SMOTE+Tomek improved the ROC index, indicating an improved ability to distinguish fraud from non-fraud.
Kyle Biron, Kennesaw State University
Jonathan Boardman, Kennesaw State University
Ryan Rimbey, Kennesaw State University
Xuelei Ni, Kennesaw State University
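The two resampling steps in the abstract above can be sketched in plain Python. This is a minimal illustration, not the authors' SAS implementation: it uses Euclidean distance on numeric tuples, brute-force neighbor search, and hypothetical function names. `smote` creates each synthetic point by picking a minority example, picking one of its `k` nearest minority neighbors, and interpolating a random fraction of the way between them. `tomek_undersample` finds pairs of opposite-class points that are each other's nearest neighbor (Tomek links) and drops the majority-class member of each pair.

```python
import math
import random


def nearest_neighbor(points, i):
    """Index of the point closest to points[i] (brute force, Euclidean)."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority points by interpolating toward neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append(tuple(xi + gap * (ni - xi) for xi, ni in zip(x, nb)))
    return synthetic


def tomek_links(points, labels):
    """Pairs (i, j) of opposite-class points that are each other's nearest neighbor."""
    nn = [nearest_neighbor(points, i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


def tomek_undersample(points, labels, majority):
    """Drop the majority-class member of every Tomek link."""
    drop = {i if labels[i] == majority else j for i, j in tomek_links(points, labels)}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


pts = [(0, 0), (1, 0), (5, 5), (5.5, 5), (10, 10)]
labs = [0, 1, 0, 0, 0]                  # class 1 is the minority class
print(tomek_links(pts, labs))           # [(0, 1)]
print(tomek_undersample(pts, labs, 0))  # ([(1, 0), (5, 5), (5.5, 5), (10, 10)], [1, 0, 0, 0])
```

For SMOTE+Tomek as described in the abstract, `smote` would be applied to the minority class first, and `tomek_undersample` would then be run on the enlarged training set.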
Model Life Cycle Automation Using REST APIs
SAS has created a new release of its model life cycle management offerings. This release provides an open, RESTful API that enables easier and more customizable integrations. Applications, including third-party applications, can communicate with the model life cycle management system through the RESTful web service APIs. RESTful integration has become the new standard for web service integration. SAS® APIs conform to the OpenAPI specification, so they are well documented and easy to understand. This article walks readers through the major steps of the model management life cycle: build, register, compare, test, approve, publish, monitor, and retrain. Models that can participate in the model management life cycle are created in SAS® Enterprise Miner™, SAS® Visual Data Mining and Machine Learning, or open-source modeling tools such as the Python scikit-learn toolkit or Google TensorFlow. Using the latest releases of SAS® Model Manager, SAS® Workflow Manager, and SAS® Workflow Services, this article shows how to create custom model life cycles that integrate with various SAS and open-source services. After reading this article, readers will have a clear understanding of model life cycle management and of how to start creating their own integrated custom model life cycles by calling SAS REST APIs.
Glenn Clingroth, SAS
Wenjie Bao, SAS
Chuyang Wan, SAS
Bevan Li, SAS
Chengwen Chu, SAS
Model Selection Using Information Criteria (Made Easy in SAS®)
Today's statistical modeler has an unprecedented number of tools available to select the best model from a set of candidate models. Based on a focused search of SAS/STAT® procedure documentation, at least 15 procedures include one or more information criteria as part of their default output. It is, however, unusual for applied statistics courses to discuss information criteria beyond "smaller is better." The focus of this breakout session is twofold. First, I provide an accessible conceptual overview of what information criteria are and why they are useful for model selection. Second, I demonstrate how to use Base SAS® functions to pull information criteria into a convenient and easy-to-read format.
Wendy Christensen, University of California
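The criteria discussed in the abstract above are simple functions of the maximized log-likelihood. The Python sketch below uses the textbook definitions, AIC = -2 log L + 2k, AICc = AIC + 2k(k+1)/(n-k-1), and BIC = -2 log L + k log n, where k is the number of estimated parameters and n is the sample size. Individual SAS/STAT procedures can differ in how they count k and n (for example, under REML), so treat this as an illustration rather than a reproduction of any procedure's output. The log-likelihoods in the usage example are hypothetical, and `akaike_weights` is a commonly used extra summary that the abstract does not mention.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Textbook AIC, AICc and BIC from a maximized log-likelihood.

    k: number of estimated parameters; n: number of observations.
    """
    neg2ll = -2.0 * log_likelihood
    aic = neg2ll + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # small-sample correction
    bic = neg2ll + k * math.log(n)
    return {"AIC": aic, "AICC": aicc, "BIC": bic}


def akaike_weights(aic_values):
    """Relative support for each model: exp(-delta/2), normalized to sum to 1."""
    best = min(aic_values)
    rel = [math.exp(-(a - best) / 2) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]


# Hypothetical candidate models fit to the same n = 100 observations.
models = {"A": (-120.0, 3), "B": (-118.5, 5)}
table = {name: information_criteria(ll, k, 100) for name, (ll, k) in models.items()}
for name, ic in table.items():
    print(name, {key: round(val, 2) for key, val in ic.items()})
print(akaike_weights([table["A"]["AIC"], table["B"]["AIC"]]))
```

Model B has the higher log-likelihood, but its two extra parameters cost more than they gain, so model A has the smaller AIC (246 versus 247) and the smaller BIC; "smaller is better" is then a statement about this trade-off between fit and complexity.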
Model Selection with Higher-Order Interactions in SAS® MIXED and GLIMMIX Procedures
It is common to model a longitudinal outcome using a linear mixed effects model or a generalized linear mixed effects model. For example, the effect of traumatic brain injury (TBI) on behavioral outcomes over time might be moderated by genetics and family environment, resulting in a four-way interaction of TBI (versus no TBI), gene, family environment, and time since injury. It is tedious to manually select variables involving high-order interactions because of the number of terms and the hierarchical structure of the terms in the model. A user-friendly SAS® macro, INTERACTION_SELECT, performs backward selection of fixed effects, including high-order interactions, with user-specified random effects using the SAS® 9.4 procedures MIXED and GLIMMIX. This macro supports a user-specified initial model structure, including continuous, categorical, and user-forced predictors and their two-way or higher interactions. At each step, type III tests of fixed effects that are not involved in higher-order terms are used as criteria to eliminate predictors. After model selection, significant (p < 0.05) predictors that are not elements of any higher-order interaction, together with all of their lower-order terms, are included in the optimal model. Model selection fit statistics (AIC, AICC, and BIC for PROC MIXED; -2 Res Log Pseudo-Likelihood and Generalized Chi-Square for PROC GLIMMIX) are summarized graphically. A likelihood ratio test is performed on the full model versus the final model.
Yin Zhang, Cincinnati Children's Hospital Medical Center
Nanhua Zhang, Cincinnati Children's Hospital Medical Center
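The hierarchy rule in the abstract above (only terms that are not contained in a higher-order term are candidates for removal) can be sketched in Python. This is not the INTERACTION_SELECT macro: terms are represented as frozensets of factor names, and `pvalue_fn` is a hypothetical callback standing in for refitting the model with PROC MIXED or PROC GLIMMIX and reading the type III p-values. The loop removes the least significant eligible term with p >= 0.05 until every eligible term is significant.

```python
def eligible_terms(model):
    """Terms not contained in any higher-order term still in the model."""
    return [t for t in model if not any(t < other for other in model)]


def backward_select(model, pvalue_fn, alpha=0.05):
    """Hierarchical backward elimination.

    model:     iterable of terms, each a frozenset of factor names
               (frozenset({"TBI", "time"}) stands for TBI*time).
    pvalue_fn: hypothetical callback that refits the model and returns a
               dict mapping each term to its type III p-value.
    """
    model = set(model)
    while True:
        pvals = pvalue_fn(model)
        candidates = [t for t in eligible_terms(model) if pvals[t] >= alpha]
        if not candidates:
            return model
        # Remove the least significant eligible term (ties broken by name).
        worst = max(candidates, key=lambda t: (pvals[t], sorted(t)))
        model.remove(worst)


T = frozenset
full = [T({"TBI"}), T({"time"}), T({"gene"}), T({"TBI", "time"}),
        T({"TBI", "gene"}), T({"TBI", "time", "gene"})]
# Fixed, made-up p-values; a real refit would change them at every step.
fake_p = {T({"TBI"}): 0.20, T({"time"}): 0.001, T({"gene"}): 0.60,
          T({"TBI", "time"}): 0.01, T({"TBI", "gene"}): 0.30,
          T({"TBI", "time", "gene"}): 0.40}
final = backward_select(full, lambda m: {t: fake_p[t] for t in m})
print(sorted(sorted(t) for t in final))  # [['TBI'], ['TBI', 'time'], ['time']]
```

TBI is kept even though its own p-value is 0.20, because it is contained in the significant TBI*time interaction; this is the hierarchy principle that makes manual selection tedious.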
Model-Based Fiber Network Expansion Using SAS® Enterprise Miner® and SAS® Visual Analytics
Charter Communications is the second-largest cable service provider in the nation and has started making huge investments to expand its fiber network in order to maintain a competitive edge in the industry. Thus, it is very important that this investment is managed in the most efficient way by using a model-driven strategy to accomplish the following: identify the best go-to markets for prospects; determine the cost of expanding the fiber network to each business location; and get the maximum ROI for the company. Spectrum Enterprise uses SAS® Enterprise Guide®, SAS® Enterprise Miner™, and SAS® Visual Analytics to create and deploy logistic and linear regression models to help achieve these goals. The models help create a focused segmentation strategy for marketing campaigns, resource allocation for the telesales group, and an expansion strategy for the network construction team nationwide. Our SAS® platform provided an integrated approach for delivery of model output, along with the technical help provided by SAS consultants to fine-tune models. The marketing team has been able to improve campaign performance by over 30%, and the network construction team has been able to identify thousands of new buildings with the best ROI (with only a 5% rejection rate), a result that was previously considered unattainable.
Nishant Sharma, Spectrum Enterprise
Monte Carlo K-Means Clustering
Session 1689
One of the most difficult issues with K-means clustering is knowing whether the algorithm has found the true clusters in the data. The difficulty arises because the clusters found depend heavily on the initial seeds (starting points) of the clusters. This workshop shows why clusters returned from a K-means clustering algorithm might not be optimal. A demonstration then shows how to increase the likelihood of getting good clusters from the data. The demonstration uses a SAS® macro inside SAS® Enterprise Miner™. Data is clustered using the clustering node. The macro collects and saves the number of clusters and the cluster centers. The process is run repeatedly until a sufficiently large data set is obtained. From this data set, the optimal number of clusters is determined. Then the cluster centers collected by the SAS macro are themselves clustered. These cluster centers will be closer to the true or optimal clusters in the data.
Donald Wedding, Sprint Corporation
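The Monte Carlo idea in this abstract can be sketched in pure Python. K-means is rerun from many random seeds, the resulting centers are pooled, and the pooled centers are then clustered. This is a simplified sketch, not the Enterprise Miner macro, and it rests on several assumptions:

- The number of clusters `k` is fixed rather than chosen from the collected runs.
- Seeds are random data points.
- Each run uses plain Lloyd iterations.
- The function names are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def kmeans(points, k, rng, max_iter=100):
    """One K-means (Lloyd) run started from k randomly chosen data points."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [centroid(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers

def monte_carlo_kmeans(points, k, runs=30, seed=0):
    """Pool centers from many randomly seeded runs, then cluster the pool."""
    rng = random.Random(seed)
    pooled = []
    for _ in range(runs):
        pooled.extend(kmeans(points, k, rng))
    return kmeans(pooled, k, rng)

if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: two well-separated blobs around (0, 0) and (10, 10).
    data = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
    data += [(rng.gauss(10, 0.5), rng.gauss(10, 0.5)) for _ in range(50)]
    print(sorted(monte_carlo_kmeans(data, k=2)))
```

Clustering the pooled centers smooths over individual runs that start from poor seeds. To choose `k` the way the abstract describes, you would also record each run's cluster count and examine that distribution before the final step.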
Multi-Factor Authentication with SAS® and Symantec VIP
Organizations with strict security requirements might require that SAS® integrate with a multi-factor authentication (MFA) solution such as Symantec Validation and ID Protection (VIP). Although most organizations already secure VPN access with MFA, the need for MFA in a SAS environment might stem from the sensitive nature of the data that SAS accesses or from compliance requirements. MFA requires an authenticating user to provide not only a valid user name and password but also a one-time password in the form of a security code, or to approve a push notification sent to a registered mobile application. This paper shows how components in both SAS® 9.4 and SAS® Viya® can be configured to integrate with Symantec VIP, so that any authentication attempt into SAS requires a user to enter a valid user name and password and also to respond successfully to the other configured authentication factors.
Jody Steadman, SAS
Michael Roda, SAS
Multiple Regression Diagnostics in SAS®: Predicting House Sale Prices
Being able to predict a variable of interest through multiple regression is a powerful tool, but how can you tell whether the model you have chosen is actually useful? And if there is a potentially better model, how do you know where to begin making adjustments? In this exploration, we look at the different analyses available for judging a regression model's overall effectiveness, using data on house sale prices in King County, Washington. The data contain 21,613 observations, with twelve potential independent variables and house sale price as the prediction target. Using various SAS® procedures, we examine the significance of a preliminary full linear model, obtain a reduced model by eliminating insignificant predictors, and inspect diagnostic plots and specific output values for variable collinearity, sample normality, and outliers. All analysis was completed in SAS® Studio. Our technical aim was to predict house sale prices accurately using a multiple regression model with various characteristics of the residential property as predictor variables. The broader goal of this study was to understand the specific methods used to verify the assumptions made in regression analysis. These involve assessing correlation among predictors (collinearity), how closely the sample resembles a normal distribution, and the impact of outliers, in order to validate model adequacy and improve the prediction tool.
Karen Bui, University of Central Florida
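The collinearity diagnostics mentioned in the house-price abstract are commonly summarized by the variance inflation factor shown below. PROC REG reports it through the VIF option in the MODEL statement. Here $R_j^2$ is the coefficient of determination from regressing predictor $x_j$ on all of the other predictors. Treating values above about 10 as a warning sign is a widely cited rule of thumb, not a threshold taken from the paper.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \text{tolerance}_j = 1 - R_j^2
```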
My Experiences in Adopting SAS® Cloud Analytic Services into Base SAS® Processes
Session 1710
The SAS® Platform is greatly enhanced by SAS® Cloud Analytic Services (CAS) and SAS® Viya®. As a Base SAS® programmer, I want to share my experiences in adopting CAS into Base SAS processes that prepare data for modeling, reporting, and visualizations that are enabled for CAS.
Steven Sober, SAS
My Top 10 Ways to Use SAS® Stored Processes
SAS® Stored Processes are a powerful facility within SAS®. Having recently written a book about SAS Stored Processes, I present my top 10 ways to use them, chosen to illustrate their different capabilities and how to get the most out of them. I explain how to run almost any code from your web browser, how to use SAS Stored Processes to export data from SAS to systems like Tableau, how to optimize their use for thousands of users, how to build mobile applications, how to build visualizations like those seen in SAS® Visual Analytics, how to create web services that integrate with other clients, and much more. These techniques are not widely known, although they are not complex. If you don't already know them, adding them to your skill set will enable you to achieve much more.
Philip Mason, Wood Street Consultants Ltd.
Mylan's EpiPen Controversy: Leveraging Text Analytics During a PR Crisis
Session 2659
The impact of social media on businesses is huge, as the number of people sharing their unbiased opinions on platforms like Twitter has increased drastically over the past few years. Both good and bad publicity can spread to the masses in a matter of seconds. Mylan N.V., an American global pharmaceutical company that acquired the right to market the EpiPen, is one such company that has been appearing on social media over the past year. People tweeted their concerns when the price of EpiPens increased drastically in 2016, when more than 80,000 EpiPens were recalled across multiple countries in March 2017, and in April 2017 when Sanofi filed an antitrust lawsuit against Mylan over EpiPens. The primary objective of this research is to determine people's attitudes toward the company and the industry, and whether the company's prior reputation (after the price hike) influenced how people reacted to the EpiPen product recall and lawsuit. To assess people's opinions of the company over time, I collected data from Twitter using the Google Twitter API. I used SAS® Enterprise Miner™ and SAS® Sentiment Analysis Studio to determine the overall sentiment and the topics mentioned in the tweets. This analysis can help the company stay ahead of the competition and retain its customer base, especially when generic versions of an EpiPen competitor are available at lower prices.
Nikhila Kambalapalli, Oklahoma State University
Natural Disaster Planning and Recovery Using SAS® Visual Analytics on SAS® Viya®
Planning for natural disasters, evacuation processes, and recovery are always top priorities for local governments and relief organizations. They must mobilize in an instant to set up shelters and provide supplies for victims. Then, after a disaster, the often lengthy recovery phase begins. They must work to help people return to their homes by ensuring that areas are safe, while also working to restore power and communications. Using SAS® Visual Analytics on SAS® Viya® and integrating with Esri tools, local governments and relief organizations can use location analytics to analyze storm predictions, search for shelter locations in order to plan evacuation routes, and enrich their existing data with Esri demographic data to stay informed and quickly make mission-critical decisions.
Mary Osborne, SAS
Adam Maness, SAS
Navigating the Analytics Life Cycle with SAS® Visual Data Mining and Machine Learning
Extracting knowledge from data to enable better business decisions is not a single step. It is an iterative life cycle that incorporates data ingestion and preparation, interactive exploration, application of algorithms and techniques for gaining insight and building predictive models, and deployment of models for assessing new observations. The latest release of SAS® Visual Data Mining and Machine Learning accommodates each of these phases in a coordinated fashion with seamless transitions and common data usage. An intelligent process flow (pipeline) experience is provided to automatically chain together powerful machine learning methods for common tasks such as feature engineering, model training, ensembling, and model assessment and comparison. Ultimate flexibility is offered through incorporation of SAS® code into the pipeline, and collaboration with colleagues is accomplished using reusable nodes and pipelines. This paper provides an in-depth look at all that this solution has to offer.
Brett Wujek, SAS
Susan Haller, SAS
Jonathan Wexler, SAS
Navigating through Code with the DATA Step Debugger
How would you know whether there is a logic error in your program? Have you ever run an intricate DATA step and gotten results you did not expect? Viewing the SAS® log does not help you debug the program, because the data is valid and no errors appear in the log. The DATA Step Debugger in SAS® Enterprise Guide® provides a convenient, interactive way to watch what is happening in the DATA step and to quickly identify data and logic errors. It enables you to control the execution of a DATA step program, step through your program line by line, or suspend execution of your program at selected statements. It not only lets you watch code execute but also lets you change values in the program data vector manually while your program is running. The DATA Step Debugger is a useful tool for all SAS users, from beginners to advanced programmers. This hands-on presentation demonstrates how to use the new DATA Step Debugger in SAS Enterprise Guide to identify logic errors in even the most complex DATA steps. Workshop attendees have the opportunity to gain hands-on experience debugging SAS DATA steps with this interactive tool.
Mina Chen, Roche (China) Holding Ltd.
New Frontiers in Pricing Analytics
Pricing and promotion decisions in today's marketplace are becoming increasingly dynamic. Consumers shop in an information-rich environment where buying decisions are made across multiple channels and at an extremely rapid pace. Manufacturers and retailers therefore need to react with agility to changes in consumer patterns and competition. Traditional regression-based causal models need to evolve into trend-spotting algorithms and rapid-recommendation engines that respond to these changing trends. In this session, we provide insight into the state of play in pricing and promotion analytics, with a specific focus on the Consumer Packaged Goods industry. We focus the session on three areas:
(a) providing an overview of the latest developments in artificial intelligence and machine learning in the price-promo space;
(b) illustrating recent use cases where some of these innovations have been implemented; and
(c) in an open discussion forum, providing a perspective on where this area is headed over the next 3 to 5 years.
Sharat Mathur, IRI
New Location Analysis and Demographic Data Integration with SAS® Visual Analytics and Esri
Session 1801
Location is an important part of business data, because business always happens somewhere. With smart devices like phones, wearables, and Internet of Things (IoT) devices all producing location information, the ability to analyze where things happen helps create a better understanding of business. SAS continues to expand the location analytics capabilities offered in SAS® Visual Analytics on SAS® Viya®. This paper presents new capabilities that can help you use the location information in business data. Location data is vital in a variety of situations. Business data often contains a large number of data points, sometimes on the order of 100,000 or a million. The preferred method for visualizing that many points on a geo map is geo-clustering. Another situation is analyzing customers in a given location, where adding demographic data improves insights. If you are performing a task such as analyzing crimes around a location in a city, travel-time and travel-distance analysis can provide valuable insights. If you need to represent various points of interest on geo maps, using symbols or icons makes the map clearer. This paper demonstrates how to use these capabilities in SAS Visual Analytics to solve real use cases for demographic data integration with Esri, geo-clustering, symbols, travel-time analysis, and custom regions.
Murali Nori, SAS
Falko Schulz, SAS
Himesh Patel, SAS
New Ways to Incorporate Continuous Predictors in Generalized Linear Models
Generalized linear models (GLMs) are used in many fields because they accommodate responses with a variety of distributions and predictors that can be either continuous or categorical. In the insurance industry, GLMs are often used for ratemaking to estimate the relative risk between policyholders. Traditionally, we specify linear or polynomial effects for continuous predictors. However, alternative approaches are available for incorporating continuous predictors when the relationship between the predictor and the response is complex and unknown. One approach is to bin the predictor, apply a weight of evidence (WOE) transformation, and specify the binned predictor as a linear effect. Another approach is to specify spline effects for continuous predictors, which provides greater flexibility in relating the predictor to the response. Spline effects are supported by the GENSELECT and GAMMOD procedures in SAS® Viya®, which fit GLMs and generalized additive models, respectively. This paper compares these approaches using zero-inflated insurance data and shows when each approach is appropriate.
Angela Wu, State Farm Insurance Company
Anthony Salis, State Farm Insurance Company
Gordon Johnston, SAS
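The weight of evidence (WOE) approach described in the GLM abstract can be illustrated with a short Python sketch. It uses one common convention: WOE = ln(share of non-events in the bin / share of events in the bin). Some references invert the ratio, which only flips the sign. The cut points, the `eps` smoothing for empty cells, and the function names are assumptions for illustration, not the paper's procedure.

```python
import bisect
import math

def woe_by_bin(xs, ys, cuts, eps=0.5):
    """Weight of evidence for each bin defined by sorted cut points.

    ys holds 1 for an event and 0 for a non-event. eps is added to every
    cell count (and to the totals accordingly) so that empty cells do not
    produce log(0); set eps=0 for the unsmoothed definition.
    """
    n_bins = len(cuts) + 1
    events = [0] * n_bins
    nonevents = [0] * n_bins
    for x, y in zip(xs, ys):
        b = bisect.bisect_right(cuts, x)  # index of the bin that contains x
        if y:
            events[b] += 1
        else:
            nonevents[b] += 1
    total_events = sum(events) + eps * n_bins
    total_nonevents = sum(nonevents) + eps * n_bins
    return [
        math.log(((n + eps) / total_nonevents) / ((e + eps) / total_events))
        for e, n in zip(events, nonevents)
    ]

def woe_transform(xs, cuts, woe):
    """Replace each raw value by the WOE of the bin it falls into."""
    return [woe[bisect.bisect_right(cuts, x)] for x in xs]

if __name__ == "__main__":
    # Hypothetical data: one cut point at 3 gives two bins.
    xs = [1] * 100 + [5] * 100
    ys = [1] * 10 + [0] * 90 + [1] * 20 + [0] * 80
    woe = woe_by_bin(xs, ys, cuts=[3], eps=0)
    print(woe, woe_transform([1, 5], [3], woe))
```

The transformed column would then enter the GLM as a single linear effect. This is the binned-WOE approach that the abstract contrasts with spline effects.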
Non-metadata Methods to Keep Passwords and Sensitive Strings out of SAS® Source Code and Logs
Session 1736
SAS® provides metadata-based methods to keep most passwords and encryption keys out of source code and logs. In this session, we give a brief overview of these methods and then describe alternative methods that do not require or depend on metadata features provided by SAS. The first method is the simplest: function-style SAS macros that supply sensitive and confidential strings. For security, these macros must be written in a certain way, and the executable file for the compiled macro must be vetted to ensure that no sensitive data is visible. This method is relatively secure when used in open code, but it might not be secure if the compiled macro is invoked inside a wrapper macro with MPRINT enabled. The second method extends work by Sherman and Carpenter (2009): a secure, compiled, encrypted macro that assigns librefs to relational databases and Open Database Connectivity (ODBC) data sources. The macro uses an encrypted SAS data set that contains the login parameters. The user of the macro does not know, and cannot acquire, the user ID and password for the target database. This is a very complex macro with extensive error checking. A table that summarizes the various metadata and non-metadata approaches is also provided.
Thomas Billings, MUFG Union Bank
ODS PDF Accessibility in SAS® 9.4M5: Going Beyond the Basics to Create Advanced Accessible Reports
Session 2124
SAS® 9.4M5 covers the basics of PDF accessibility, and many simple reports might be accessible without any additional work. However, if you produce reports that use advanced reporting features and accessibility is a requirement, you might need to provide additional manual accessibility remediation beyond the automatic accessibility features provided in SAS 9.4M5. This paper identifies advanced features that might require remediation and teaches you how to address those gaps using Adobe Acrobat Pro.
Greg Kraus, SAS
ODS PDF Accessibility: How SAS® 9.4M5 Enables Automatic Production of Accessible PDF files
Session 2129
No longer do you need to begin the accessibility challenge with a raw, inaccessible PDF. SAS® 9.4M5 now provides a framework for creating accessible PDF files. This paper not only explains the automatic accessibility features provided by SAS 9.4M5 but also shows you how to use SAS® programming changes to improve the accessibility of the generated PDF files.
Daniel Oconnor, SAS
Woody Middleton, SAS
OpenID Connect Opens the Door to SAS® Viya® APIs
Session 1737
As part of the strategy to be open and cloud-ready, SAS® Viya® services leverage OAuth and OpenID Connect tokens for authentication. OpenID Connect and its base technology OAuth 2.0 provide a simple security framework built on the HTTP protocol and are quickly becoming the de facto standard for public APIs. This paper describes how SAS Viya uses these standards and demonstrates how developers and administrators can use simple commands such as curl to authenticate to SAS Viya APIs.
Michael Roda, SAS
Optimization Modeling with Python and SAS® Viya®
Python has become a popular programming language for both data analytics and mathematical optimization. With SAS® Viya® and its Python interface, Python programmers can use the state-of-the-art optimization solvers that SAS® provides. This paper demonstrates an approach for Python programmers to naturally model their optimization problems, solve them by using SAS® Optimization solver actions, and view and interact with the results. The common tools for using the optimization solvers in SAS for these purposes are the OPTMODEL and IML procedures, but programmers more familiar with Python might find this alternative approach easier to grasp.
Sertalp Cay, SAS
Jared Erickson, SAS
Optimizing Inventory of Slow-Moving Products Using SAS® Optimization
Session 2794
Aiming to use data and science to set service-level goals, Advance Auto Parts engaged CoreCompete to deliver a fully integrated service-level (inventory) optimization system using SAS® Inventory Optimization and SAS/OR® software. The system can run inventory simulations and execute large-scale optimization of service-level goals, leveraging batch services on Amazon Web Services. A very large mixed integer optimization problem for inventory cost reduction is solved using the OPTMODEL procedure. The solution can recommend optimized service-level goals at the SKU/location level. The system design integrates a simple Microsoft Excel user interface, data processing in Apache Hadoop, optimization in SAS® in the cloud, and dashboards in SAS® Visual Analytics for reviewing results. This paper discusses the end-to-end process flow for implementing simulations and optimization at large scale.
Lokendra Devangan, CoreCompete
Malleswara Sastry Kanduri, CoreCompete
Richard Dansoh, CoreCompete
Amy Mcarthur, Advance Auto Parts
Optimizing Red Hat Global File System 2 on SAS® Grid Manager
Session 1929
Global File System 2 (GFS2), developed by Red Hat, Inc., is one of the most popular shared file systems for use with SAS® Grid Manager. This paper serves as a one-stop shop for understanding GFS2 installation precepts as they relate to SAS® grid operations and performance. It also covers GFS2 cluster limitations, recommended LUN and file system construction and sizes, underlying hardware bandwidth and connection requirements, software placement on SAS grid nodes, and GFS2 tuning for the Red Hat Enterprise Linux operating system, versions 6 and 7.
Tony Brown, SAS
Organizational Contracts in Analytics of Things Ecosystems
The impending convergence of big data, analytics, and the Internet of Things (IoT) has been referred to as an era of the Analytics of Things (AoT). Companies working in AoT ecosystems like smart cities and inter-organizational collaborations will build business relationships through contracts related to the sharing (or not) of data and analytics resources. Data and analytics ownership issues are already challenging the legal system, and with the proliferation of AoT, these challenges will only become more pervasive. This research addresses inter-organizational data and analytics ownership contracts. Findings provide guidance for both ecosystem owners and renters of multi-tenant AoT ecosystem resources. While recent research has reported that data and analytics sharing provide a significant innovation advantage, what we know and understand about ownership contracts lags far behind these current findings.
Michael Goul, Arizona State University, W. P. Carey School of Business
Oscars 2017: Text Mining and Sentiment Analysis
Session 2846
It is fascinating how the number of awards shows keeps increasing year after year, and it is the enormously positive response of audiences that keeps these shows coming. People's sentiments play a crucial role in deciding the prospects of a particular event. This paper provides insights into how people's sentiments can determine the success or failure of a show. It describes text mining of people's reactions to the 2017 Oscars in general and a sentiment analysis of the Best Picture mix-up using SAS® Sentiment Analysis Studio. The paper aims to determine the success of an awards show based on individual sentiments before, during, and after the show. This information can provide a better picture of how to handle unwanted circumstances during an event. From the 2017 Oscars, we can conclude that people's sentiments were predominantly positive or neutral, indicating that the excitement surrounding the show can overshadow the effects of unwanted events. I also compare a statistical model with a rule-based model to gauge the performance and efficiency of each.
Karthik Sripathi, Oklahoma State University
Outputting Your Data to Microsoft Excel Is Inevitable. So Is the SAS® ODS Excel Destination
Microsoft Excel is one of the most widely used tools wherever data is used, stored, or analyzed. Many SAS® users resort to producing Excel output and working outside the SAS environment to finesse their deliverables...until now. With the SAS® Output Delivery System (ODS) Excel destination, data manipulation and visualization are now possible. Native Excel files and graphs can now be created and customized. It was just a matter of time before this magnificent tool became a reality. ODS Excel is here to stay! Everyone from novice programmers with little or no ODS experience to experienced professionals will see its benefits immediately. This e-poster demonstrates, with easy-to-follow steps, how to deliver your data from SAS to Excel. Users will save time by predefining their preferences and avoiding manual, repetitive tasks such as creating multiple sheets and adding colors, titles, graphs, headers, footers, and so on.
Claudia Hauck, Educational Testing Service
Hezekiah Bunde, Educational Testing Service
Paper SAS1991:2018 Causal Mediation Analysis with the CAUSALMED Procedure
Important health care and policy decisions often depend on understanding the direct and indirect (mediated) effects of a treatment on an outcome. For example, does a youth program directly reduce juvenile delinquent behavior, or does it indirectly reduce delinquent behavior by changing the moral and social values of teenagers? Or, for example, is a particular gene directly responsible for causing lung cancer, or does it have an indirect (mediated) effect through its influence on smoking behavior? Causal mediation analysis deals with the mechanisms of causal treatment effects, and it estimates direct and indirect effects. A treatment variable is assumed to have causal effects on an outcome variable through two pathways: a direct pathway and a mediated (indirect) pathway through a mediator variable. This paper introduces the CAUSALMED procedure, new in SAS/STAT® 14.3, for estimating various causal mediation effects from observational data in a counterfactual framework. The paper also defines these causal mediation and related effects in terms of counterfactual outcomes and describes the assumptions that are required for unbiased estimation. Examples illustrate the ideas behind causal mediation analysis and the applications of the CAUSALMED procedure.
Michael Lamm, SAS
Yiu-Fai Yung, SAS
Wei Zhang, SAS
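The counterfactual framework in the CAUSALMED abstract can be summarized with the standard definitions below, which show how a total effect splits into direct and indirect parts. In this notation, $Y(t, m)$ is the potential outcome under treatment $t$ and mediator value $m$, and $M(t)$ is the potential mediator value under treatment $t$, with a binary treatment coded 0/1. This notation follows the usual mediation literature and is an assumption here rather than a quotation from the paper. The paper also covers further effects and the identifying assumptions.

```latex
\begin{aligned}
\text{TE}  &= E[Y(1, M(1))] - E[Y(0, M(0))] \\
\text{NDE} &= E[Y(1, M(0))] - E[Y(0, M(0))] \\
\text{NIE} &= E[Y(1, M(1))] - E[Y(1, M(0))] \\
\text{TE}  &= \text{NDE} + \text{NIE}
\end{aligned}
```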
Parallel Programming with the DATA Step: Next Steps
Session 2184
The DATA step has been at the core of SAS® applications and solutions since the inception of SAS. The DATA step continues its legacy, offering its capabilities in SAS® Cloud Analytic Services (CAS) in SAS® Viya®. CAS provides parallel processing using multiple threads across multiple machines. The DATA step leverages this parallel processing by dividing data between threads and executing simultaneously on the data assigned to each thread. The LAG function, the DATA step's RETAIN statement, and the automatic variable _N_ require special attention when running DATA step programs in CAS. This paper describes each of these features, discusses how you can continue to use them with CAS, and compares performance with that in SAS® 9.4.
David Bultman, SAS
Jason Secosky, SAS
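A toy Python simulation can illustrate why LAG, RETAIN, and _N_ need attention in CAS. Rows are split among threads, and each thread runs the DATA step on its own rows. As a result, _N_ restarts for each thread, LAG returns a missing value at the start of each thread's rows, and a retained running total accumulates per thread rather than over the whole table. This is a conceptual sketch, not CAS itself. The contiguous-chunk partitioning and the function name `run_data_step_in_threads` are assumptions, and actual CAS row distribution differs.

```python
def run_data_step_in_threads(values, n_threads):
    """Simulate a DATA step that computes _N_, LAG(x), and a retained sum per thread.

    Rows are split into contiguous chunks (an assumption), and each chunk is
    processed independently, as if by a separate thread with its own PDV.
    """
    if not values:
        return []
    chunk_size = -(-len(values) // n_threads)  # ceiling division
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    output = []
    for thread_id, chunk in enumerate(chunks):
        lag_x = None        # the LAG queue is empty at the start of each thread
        running_total = 0   # a retained variable starts over on each thread
        for n, x in enumerate(chunk, start=1):  # _N_ counts iterations on this thread
            running_total += x
            output.append({"thread": thread_id, "_N_": n, "x": x,
                           "lag_x": lag_x, "running_total": running_total})
            lag_x = x
    return output

if __name__ == "__main__":
    for row in run_data_step_in_threads([1, 2, 3, 4, 5, 6], n_threads=2):
        print(row)
```

With one thread, the results match a single-threaded SAS 9.4 run. With two threads, the fourth row shows `_N_` equal to 1 and a missing lag, which is the kind of difference the paper says needs special attention.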
Penalized Variable Selection and Quantile Regression in SAS®: An Overview
Session 1822
This friendly, picture-oriented paper is about some new procedures for modeling with penalized variable selection and some procedures for building models that give a richer description of your data than ordinary least squares (OLS) can. The four procedures that we cover are REG, GLMSELECT, QUANTREG, and QUANTSELECT. The paper explains the theory and gives examples of SAS® code and output for all four procedures. The new penalized methods discussed in this paper help, but only help, find a parsimonious model. They help by using algorithms that are more stable, continuous, and computationally efficient than stepwise methods.
Russ Lavery, Independent Contractor
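Two of the ideas in this overview can be written compactly. The first line below is the lasso criterion, one of the penalized selection methods that GLMSELECT supports (SELECTION=LASSO). The second is the quantile regression check loss that QUANTREG minimizes, where τ = 0.5 gives median regression. Scaling conventions for the lasso term vary between references, so the 1/(2n) factor is an assumption, and the intercept is normally left unpenalized.

```latex
\begin{aligned}
\hat\beta^{\text{lasso}} &= \arg\min_{\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - x_i^{\top}\beta\bigr)^2 + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert \\
\hat\beta(\tau) &= \arg\min_{\beta}\ \sum_{i=1}^{n}\rho_\tau\bigl(y_i - x_i^{\top}\beta\bigr), \qquad \rho_\tau(u) = u\,\bigl(\tau - I(u < 0)\bigr)
\end{aligned}
```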
Performing Machine Learning Techniques in a Contextual Marketing Scenario
Although information for identifying high-potential customers is a key piece of a Customer Value Management strategy, it is rarely available in real-world databases. This paper describes an analytical framework developed to find the next best offer for each of the more than five million customers in the mobile prepaid business of Telefonica Chile. The framework is based on the analytical life cycle methodology proposed by SAS® best practices, which promotes business understanding as a crucial control point in the use of advanced analytics in the real world. A supervised learning approach using artificial neural networks in SAS® Enterprise Miner™ predicts purchase behavior within a time window based on the state of a set of variables. The best offer for each customer is then assigned based on the scored propensity for every single product. The customer-product data set is used as input for SAS® Real-Time Decision Manager to accelerate consumption of the customer's available balance every time a customer tops up or uses their handset. As the next step in this process, predictive models for other business objectives are being planned. By taking advantage of SAS® Viya® through SAS® Visual Statistics and SAS® Visual Data Mining and Machine Learning, we can improve the user experience in a production environment, with more flexibility and higher analytical performance.
Francisco Capetillo, Telefónica Chile
Alvaro Velasquez, Telefonica
Personal Lending: What is Customer Price/Credit Optimization? Is Experimental Design Inevitable?
Session 1774
What are customer price and credit optimization? The author has faced this challenging question numerous times with various analytical and business professionals. On the surface, the question can appear straightforward to analytics and business managers facing complex decisions around customer value management. In reality, however, understanding of the customer price/credit optimization concept varies widely, which has led to a high degree of confusion about the concept within the financial industry. The main objective of this presentation is to define customer pricing/credit optimization properly. Experimental design plays the key role in defining the optimal solution, and there is no way around it: any candidate for an optimal pricing/credit solution must eventually be validated through a proper experimental design. We illustrate cases with examples of experimentally designed test campaigns on real lending portfolios, and we discuss examples of pricing and credit optimization solutions in detail. This presentation is a follow-up to the author's previous presentation at SAS Global Forum 2016 about customer credit and pricing optimization. The critical tool for the author's optimization solutions is effect (uplift) modeling. The effect modeling tool, as well as the other tools needed for optimal solutions, has been developed using SAS/STAT® software.
Yuri Medvedev, Bank of Montreal
Persuading with Data Stories: Is Jaws Just Misunderstood?
Session 2607
Let's face it: the media has given sharks a bad rap, portraying them as villains and creating a culture that fears the fin. Sources like the International Shark Attack File, the Global Shark Attack File, and OCEARCH provide us with staggering amounts of data every day. This data enables us to map shark attack occurrences all over the world and to determine the activity that brought on each attack, whether the attack was provoked, and, ultimately, whether it was fatal. We can use the data to tell data stories about this animal species that fascinates so many of us. So the prevailing question is: can we reduce fear of shark attacks using data? We explore this possibility using SAS® Visual Analytics 8.1 on SAS® Viya®.
Jaime D'agord, Zencos
Phonetic Search Helps To Fight Risks
Session 2490
Phonetic search can be used in various areas when businesses want to detect potential risks to their customers based on incidents, suspicious events, or negative news hits in the media. However, the information received might not be exact or complete, so searching with the proper spelling and getting a correct result is a challenge. We implemented a process that searches based on pronunciation and spelling distance rather than exact spelling. The process is based on two algorithms in SAS®: a phonetic search with a customized Soundex algorithm, and a fuzzy match using the Levenshtein algorithm, with associated scores to improve accuracy. The solution makes searching more efficient and faster because the complexity is encapsulated in an abstraction layer in the program. As a result, users need to follow only the simplest of steps to obtain the end result.
Yaobin Chen, HSBC
Flyman Wu, HSBC
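The two building blocks named in the phonetic search abstract can be sketched in Python:

- `soundex` implements the standard American Soundex, not the paper's customized variant.
- `levenshtein` computes the classic edit distance.
- `match_score` is a hypothetical scoring rule that averages a phonetic hit (0 or 1) with a normalized edit similarity. It stands in for the authors' own scores, which the abstract does not specify.

In SAS itself, the SOUNDEX and COMPLEV functions provide related building blocks, although their exact rules might differ from this sketch.

```python
# Digit groups of the standard American Soundex code.
_SOUNDEX_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit

def soundex(name):
    """Standard American Soundex: first letter plus three digits."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = _SOUNDEX_CODES.get(ch, "")  # vowels and y map to ""
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return result.ljust(4, "0")

def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def match_score(query, candidate):
    """Hypothetical score in [0, 1]: mean of phonetic hit and edit similarity."""
    q, c = query.lower(), candidate.lower()
    phonetic = 1.0 if soundex(q) == soundex(c) else 0.0
    longest = max(len(q), len(c)) or 1
    similarity = 1.0 - levenshtein(q, c) / longest
    return (phonetic + similarity) / 2

if __name__ == "__main__":
    for name in ("Rupert", "Robertson", "Smith"):
        print(name, soundex(name), round(match_score("Robert", name), 3))
```

Combining the two signals lets a misspelled name that sounds the same still rank highly. The edit-distance term separates candidates that share a Soundex code.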
Planning for Migration from SAS® 9.4 to SAS® Viya®
Session 2391
SAS has delivered SAS® Viya®, the new open architecture powered by Cloud Analytic Services. It fundamentally changes the methodology for installing SAS®. This presentation discusses the move away from SAS depots, the installation processes, and the migration of user metadata, as well as maintaining dual environments for SAS Viya and SAS® 9.4. It also discusses SAS Viya installations and migrations, from small to large and from on-premises to the cloud.
Don Hayes, DLL Consulting Inc.
Spencer Hayes, Cached Consulting LLC
Michael Shealy, Cached Consulting LLC
Plot Your Custom Regions on SAS® Visual Analytics Geo Maps
Session 2885
SAS® Visual Analytics geo maps support country and state geographies out of the box, and they can be programmed to plot counties, cities, and ZIP code tabulation areas as well. However, we are often asked whether we can plot an organization's custom regions on the map. These might be sales territories or any nonstandard geographic regions of interest, defined in terms of standard geographic areas: ZIP code, county, or state. This paper outlines how you can plot such custom regions on a SAS Visual Analytics geo map.
Jitendra Pandey, Electrolux Home Products Inc.
Power Analysis for Generalized Linear Models Using the New CUSTOM Statement in PROC POWER
Session 1983
The CUSTOM statement that was added to the POWER procedure in SAS/STAT® 14.2 extends the scope of supported data analyses to include generalized linear models and other extensions of existing capabilities. It works in concert with an exemplary data set and the SAS/STAT procedure that you plan to use for the eventual data analysis. This paper explains the method and demonstrates it for a variety of data analyses, including Poisson regression, logistic regression, and zero-inflated models. It also discusses how you can use CUSTOM statement options to refine the method for sample size inflation/deflation or for extra covariates.
John Castelloe, SAS
Predicting Crime Incidents in Los Angeles for the Year 2017
Session 2741
Crime poses a grave threat to the serenity of a city, and this threat becomes more acute as a city grows in population and size. The core objective of this paper is to predict the potential number of crime incidents for a city by using historical crime data and to provide insights for predictive first-responder planning and law enforcement. This investigation examines the historical crime data for the city of Los Angeles, California, for the time period from January 1, 2010, to July 21, 2017, and ongoing. The data set that was used has been made publicly available by the Los Angeles Police Department. It consists of over 1.4 million observations and 26 variables. Preliminary exploratory data analysis provided basic insights such as the district where crime is most prevalent (District 363), the most commonly used weapon (strong arm - hand fist), and the most common crime occurrence time (noon). Trend analysis revealed that the crime rate had significantly increased after 2014 and that Battery - Simple Assault was the most common crime. SAS® Enterprise Guide® and R were used for data cleaning, exploration, and data modeling. Work is ongoing to develop statistically valid models to predict crimes for Battery (Simple Assault) using various models such as ARIMA, ETS, and Prophet. In addition, forecasting methodologies such as simple exponential smoothing models, Holt's smoothing model, additive and multiplicative models, and the Winters model will also be explored for accuracy.
Zaid Shaikh, Oklahoma State University
Predicting Major League Baseball Game Outcomes
Session 2875
The game of baseball can be modeled as a Markov process, because each state depends only on the state immediately prior. In this research, probabilities are calculated from historical data on every state transition in Major League Baseball games from 2011 to 2016, and are then grouped to account for home-field advantage and offensive player ability. Simulation is applied to mimic individual games or entire seasons. For a specific game, the results give each team's probability of a win and expected runs. For a season, the results of the simulation give a team's overall winning percentage. Managers, players, and fans can use the predictions for a variety of purposes.
Justin Long, Slippery Rock University
Prediction and Interpretation for Machine Learning Regression Methods
Session 1967
The last 30 years have seen extraordinary development of new tools for predicting numerical and binary responses. Examples include the LASSO method and the elastic net for regularization in regression and variable selection, quantile regression for heteroscedastic data, and machine learning predictive methods such as classification and regression trees (CART), multivariate adaptive regression splines (MARS), random forests, gradient boosting machines (GBM), and support vector machines (SVM). All these methods are implemented in SAS®, giving the user an amazing toolkit of predictive methods. In fact, the set of available methods is so rich that it raises the question, "When should I use one or a subset of these methods instead of the others?" In this talk, I hope to provide a partial answer to this question by applying several of these methods to several real data sets with numerical and binary response variables.
Richard Cutler, Utah State University
Preparing Students to Take the Base SAS® Certification Exam, Training Students to Think Like SAS®
Session 1889
Preparing students to take the Base SAS® Certification exam is challenging in a formal classroom setting. The diverse backgrounds of the students, combined with the unique features of SAS®, require careful planning and insightful examples. The course is part of the required core of the Master of Science in Applied Statistics (MSAS) program at Kennesaw State University. The execution of the course has changed over time in response to the experiences of the students and feedback from industry. The topics discussed in this presentation include an overview of the population that takes the class at Kennesaw State University, the technology available to the students, the current organization of the course, the materials provided by KSU, the materials and support provided by SAS, and the pitfalls encountered in the past that informed some of the current design decisions. Examples of the materials are incorporated into the presentation, as are challenges faced by the students. The exam can be proctored as part of the class, which also presents an interesting dynamic. In general, the experience is very similar to a coach preparing an athlete to compete. Much of the mental preparation is up to the student; we can only create an environment that exposes the individual to the challenges that they will face.
Herman Ray, Kennesaw State University
Prescriptive Analytics: Using Optimization with Predictive Models to Find the Best Action
Predictive models are now commonplace, with many applications in the areas of credit risk, customer relationship management (CRM), and marketing. They are used to find the most likely outcome resulting from specific actions taken by organizations in their interaction with their clients. These models have proven to be effective in improving the efficiency and performance of businesses over the last 20 years. The challenge these organizations face now is how to optimize the deployment of predictions made by these models when there is more than one possible model to follow. This presentation reviews the use of mathematical optimization techniques and how they can be used as a second layer on top of predictive models in order to find the best action. In the presentation, the basics of mathematical optimization are reviewed using simple examples, and these simple cases are extended to cover real problems encountered in marketing and credit risk. Then, a case study is presented to show how optimization was used by a North American bank to find the best collections channel. For this specific problem, several predictive models were developed to predict the likelihood of successful collections using a specific collections channel (such as telephone, SMS text message, email, letter, and so on). The predictions from these models were then used in a mathematical optimization formulation to find the best channel per account.
Mamdouh Refaat, Angoss
PROC SORT (then and) NOW
Session 2773
The SORT procedure has been an integral part of SAS® since its creation. The sort-in-place paradigm made the most of the limited resources at the time, and almost every SAS program had at least one PROC SORT in it. The biggest options at the time were to use something other than the IBM procedure SYNCSORT as the sorting algorithm, or whether you were sorting ASCII data versus EBCDIC data. These days, PROC SORT has fallen out of favor; after all, PROC SQL enables merging without using PROC SORT first, while the performance advantages of HASH sorting cannot be overstated. This leads to the question: Is the SORT procedure still relevant to any other than the SAS novice or the terminally stubborn who refuse to HASH? The answer is a surprisingly clear yes. PROC SORT has been enhanced to accommodate twenty-first century needs, and this paper discusses those enhancements.
Derek Morgan, PAREXEL International
Producing a Format Library and Test Data for Case Report Forms Using a Data Definition Table
In clinical trials, a data definition table (DDT) is a document that lists a variety of information about each case report form (CRF), such as form number, variable name, variable label or description, data type, length, and codes. When imported into SAS®, it can be used to accomplish a variety of tasks. This paper provides a workflow for creating a format library and test data for CRFs by using SAS and the DDT in Microsoft Excel. Both are needed at the beginning of a study: the format library eliminates the need to re-create formats in each of your SAS programs (reducing code replication), and the test data lets you start programming before real study data is available (shortening delivery and reporting time). Other uses for the DDT include creating the SAS code for the codebook, annotated CRFs, and analysis file documentation.
Ellen Dematt, Department of Veterans Affairs
Suad El Burai Felix, Department of Veterans Affairs
Rebecca Horney, Department of Veterans Affairs
Ranking Between the Lines: A Macro for Interpolated Medians
Session 2508
This presentation goes through the process of creating a macro to find interpolated medians in Base SAS®. Medians are a great way to summarize the center of skewed data, but they are not ideal when the data come from an ordinal scale or are drawn from a very small range of possible values. Two medians can be exactly the same but come from data that is weighted very differently, so the median does not always represent the center and shape of the data as accurately as it could. Interpolated medians, on the other hand, can better represent whether the weight of the data lies above or below the median. Therefore, they not only tell you something about the center of the data, but also describe its shape. This presentation explains the origins, calculations, and uses of interpolated medians. I then show an example where the interpolated median succeeds and the regular median falls short, using two sets of data with the same possible range of values but very different samples. I show that they have the same median and how the interpolated median is much more descriptive and a better measure in this situation. Then I go through the code I used to develop the %MACRO to calculate interpolated medians, along with the tests I ran to validate it against known data sets and known interpolated medians to ensure that the %MACRO functions correctly and is an easy-to-use way to calculate interpolated medians in SAS®.
Joe Lorenz, Grand Valley State University
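For data on an equally spaced discrete scale, the interpolated median is commonly computed as L + w(N/2 - F)/f, where L is the lower boundary of the median category, w is the category width, N is the sample size, F is the number of observations below the median category, and f is the number in it. The Python function below is a minimal sketch of that common definition, not the author's %MACRO. Returning the ordinary median when an even-sized sample's median falls between two categories is an assumption made here for simplicity.

```python
from collections import Counter
from statistics import median


def interpolated_median(values, width=1.0):
    """Interpolated median for values on an equally spaced discrete scale
    (for example, 1-5 Likert responses) whose categories are `width` apart."""
    if not values:
        raise ValueError("values must be non-empty")
    m = median(values)
    counts = Counter(values)
    n = len(values)
    below = sum(c for v, c in counts.items() if v < m)  # F: count below the median category
    at = counts.get(m, 0)                               # f: count in the median category
    if at == 0:
        # Median falls between two categories (even n); fall back to the ordinary median.
        return m
    lower = m - width / 2                               # L: lower boundary of the category
    return lower + width * (n / 2 - below) / at
```

Both `[3, 3, 3, 4, 4]` and `[2, 2, 3, 3, 3]` have an ordinary median of 3, but their interpolated medians are about 3.33 and 2.67, which shows on which side of the median category the weight of the data lies.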
Reading, Wrangling, Visualizing, and Modeling the Surface Temperature of the Great Lakes
Session 2870
The Great Lakes are the largest group of freshwater lakes on the planet and contain 21% of the world's fresh water. Data on the daily temperatures of the individual Great Lakes is stored on the internet in a collection of text files that is maintained and updated daily by the National Oceanic and Atmospheric Administration within the U.S. Department of Commerce. SAS® was used to read this data from the internet and create attractive, reproducible, and informative reports. In this paper, we illustrate how SAS can be used to read data from the internet, how to wrangle and combine the data to get it ready for analysis, how to work with Julian dates, and how to use the Output Delivery System to make reproducible reports. We also explore how to make attractive visualizations using the SGPLOT procedure, and how to use polygon files to make an animated map using the GMAP procedure. Furthermore, we show how SAS/STAT® software can be used to explore relationships among lake temperatures, land temperatures, and snowfall amounts in a city to the west of Lake Michigan, and to build predictive models from them. These data and examples work very well for illustrating statistical computing techniques to students in a classroom or as tools for self-study.
Laura Kapitula, Grand Valley State University
Real-Time Image Processing and Analytics Using SAS® Event Stream Processing
Session 2103
Image processing is not new. But now, with the advent of the Internet of Things, it has become one of the most promising and exciting applications of real-time streaming analytics technologies. Historically focused on character or face recognition, the image-processing world has evolved to include enhanced predictive and prescriptive analytics applications that use real-time streams of rich image and video content. From predictive maintenance in manufacturing to health care and security, there are countless examples of image analytics applications. In this session, we explore how to implement image analysis with SAS® Event Stream Processing and how to deploy image-processing analytic models created with SAS® Viya®.
Frederic Combaneyre, SAS
Recent SAS® 9.4 Middle Tier Platform Updates for Fun and Profit
The SAS® middle tier includes middle-tier infrastructure software such as SAS® Web Server, SAS® Web Application Server, and the Java Message Service broker, as well as system management software such as SAS® Environment Manager. All of these are critical to every SAS product with a web front end and to all SAS middle-tier services. Since the last SAS® Global Forum, we have made many updates and significant improvements to the middle tier, driven by customer requirements and input from consultants and SAS Technical Support. We believe it is extremely important to provide a detailed review of these updates so that SAS administrators can better manage their SAS middle-tier environment and satisfy their corporate IT requirements with improved confidence and fun. This paper first reviews the key components of the SAS middle tier, and then discusses updates made in recent SAS® 9.4 releases (mainly SAS® 9.4M4 and SAS 9.4M5). We specifically discuss the following topics: changes to SAS Web Server, SAS Web Application Server, and the Java Message Service broker; changes to the SAS® Private Java Runtime Environment; a new process for updating your horizontal cluster nodes the right way; SSL library updates that address concerns about OpenSSL vulnerabilities; how to preserve your manual SSL configurations; improvements for hardening your environment to enforce the TLSv1.2 protocol; and changes to system management and SAS® Environment Manager.
Zhiyong Li, SAS
Qing Gong, SAS
Reducing Customer Attrition with Machine Learning for Financial Institutions
Session 1769
As financial institutions market themselves to increase their market share against their competitors, they understandably focus on gaining new customers. However, they must also retain (and further engage) their existing customers. Otherwise, the new customers they gain can easily be offset by existing customers who leave. Happily, predictive analytics can be used to forecast which customers have the highest chance of leaving so that we can effectively target our retention marketing toward those customers. This can make it much easier and less expensive to keep (and further cultivate) existing customers than to enlist new ones. A previous paper showed a simple but fairly comprehensive approach to forecasting customer retention. However, this approach fails in many situations with more complicated data. This paper shows a much more robust approach that uses elements of machine learning. We provide a detailed overview of this approach, compare it with the simpler (parametric) approach, and then look at the results of the two approaches. With this technique, we can identify the customers who have the highest chance of leaving and the highest lifetime value. We then make suggestions for improving the model's accuracy. Finally, we provide suggestions for extending this approach to cultivate existing customers and thus increase their lifetime value. The code snippets shown work with any version of SAS® but require SAS/STAT® software.
Nate Derby, Stakana Analytics
Reducing Traveling Times for the Cobb County Fire Department
Session 2439
The Cobb County Fire Department's (CCFD) 8-minute emergency response time is double the National Fire Protection Association's (NFPA) 4-minute standard, measured at the 90th percentile of all emergencies. This project aims to reduce CCFD's response time by focusing on the travel times of its various emergency vehicles. Currently, there are 29 fire stations and 272 fire zones within Cobb County, and each fire station is responsible for a predefined set of fire zones. We investigate whether fire zones and stations can be realigned to reduce travel times by analyzing historical response time data from September 2015 to August 2016. CCFD historical data reveal which fire station actually responded to each incident, as well as the related travel time. Google Maps is then used to check the travel times from neighboring fire stations to determine whether a different fire station could have responded more quickly to the same incident. The comparison between historical and Google travel times reveals the location and frequency of disagreement between the historical and Google-recommended fire stations. Fire zones can then be reassigned to different fire stations to reduce future travel times. Results vary for each fire station, but they show that there is room for improvement in the way that CCFD currently responds to emergencies. Python is used to connect to the Google Maps Distance Matrix API, and SAS® is used for the resulting analyses.
Bogdan Gadidov, Kennesaw State University
Yiyun Zhou, Kennesaw State University
Regime-Switching Models: Capturing Structural Changes in Time Series
Session 1879
Stock market conditions, government policy changes, or even weather patterns can be regarded as stochastic processes that are driven by unobserved regimes. A powerful tool to explore these behavioral patterns is the regime-switching model (RSM) that is offered in the HMM procedure and the associated action in SAS® Econometrics software. This model, which is widely used in finance, economics, science, and engineering, has two characteristics: it allows different parameter values for different regimes, and it models the transition probabilities between regimes. These characteristics enable it to fully capture the structural changes in the time series. This paper uses two examples to illustrate how you can use RSMs to better understand the regime patterns in your data and improve your economic analysis. The first example demonstrates how regime-switching autoregression (RS-AR) models help you characterize the volatility and dynamics of stock returns. The second example examines the relationship and movement between the Japanese yen and the Thai baht by using regime-switching regression (RS-REG) models.
Xilong Chen, SAS
Ji Shen, SAS
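For reference, a two-regime switching autoregression of the kind the first example describes is usually written as follows. This is the generic textbook RS-AR(1) form under a first-order Markov chain, offered as a sketch; the exact parameterization used by PROC HMM is not given in the abstract. Here s_t is the unobserved regime at time t and p_ij is the probability of moving from regime i to regime j.

```latex
y_t = \mu_{s_t} + \phi_{s_t}\, y_{t-1} + \sigma_{s_t}\, \varepsilon_t,
\qquad \varepsilon_t \sim \mathcal{N}(0, 1), \qquad s_t \in \{1, 2\}

P(s_t = j \mid s_{t-1} = i) = p_{ij}, \qquad p_{i1} + p_{i2} = 1 \quad (i = 1, 2)
```

Because the mean, autoregressive coefficient, and volatility all carry the subscript s_t, each regime has its own level, persistence, and volatility, and the transition probabilities govern how often the series switches between them.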
Regression Model Building for Large, Complex Data with SAS® Viya™ Procedures
Analysts who do statistical modeling, data mining, and machine learning often ask the question, "I have hundreds of variables, even thousands. Which should I include in my regression model?" This paper describes SAS® Viya™ procedures for building linear and logistic regression models, generalized linear models, quantile regression models, generalized additive models, and proportional hazards regression models. The paper explains how these procedures capitalize on the in-memory environment of SAS Viya, and it compares their syntax, features, and output with those of the high-performance regression modeling procedures in SAS/STAT® software.
Weijie Cai, SAS
Robert Rodriguez, SAS
Remodeling Your Office: A New Look for the SAS® Add-In for Microsoft Office
Session 1864
Millions of people spend their weekdays in an office. Occasionally they hang a new picture, or add a new piece of furniture. Eventually their office gets so full they need to reorganize it to find items quicker and work more efficiently. After 15 years, that time has come for the SAS® Add-In for Microsoft Office. This paper reviews the new user interface for the SAS Add-In for Microsoft Office. A redesigned ribbon enables quicker access to the most common operations. Where there used to be several task panes that each served a specific purpose, the task panes are now blended together into one home page that enables users to easily discover, execute, and manage SAS® content. Come see these enhancements and many more in the new version of the SAS Add-In for Microsoft Office.
Tim Beese, SAS
Deva Kumar, SAS
Results of the Application of SAS® for Fraud Identification in the Public Administration of Brazil
Maxtera, a member of the CDS group, received the SAS® Channel Partner award in 2016. With its innovative initiatives in data analysis and analytical intelligence, Maxtera provides its clients with information that generates knowledge, which contributes to better decisions and new actions in their business areas. Several public institutions in Brazil at the federal, state, and municipal levels, in partnership with CDS/Maxtera, use SAS® tools to identify signs of irregularities (fraud) in areas such as revenue recovery, benefit payments, government procurement, education, and public safety, among many others. In less than two years, more than 100 million dollars have already been recovered or prevented from being diverted to various individuals and criminal organizations. This experience has aroused the interest of many public administrations, and CDS/Maxtera has used SAS solutions and tools to contribute to the success of these projects, applying methodologies and procedures for data analysis and analytical intelligence. This e-poster presents the financial results achieved in the projects, as well as the methodologies and procedures used by CDS/Maxtera clients in applying SAS tools. Our goal is to share our experience and lessons learned with other SAS partners present at SAS® Global Forum 2018.
Leonardo Aguirre, Maxtera
Sergio Cortes, Maxtera
Aline Emidio, Maxtera
Leda Salles, Maxtera
Regina Oliveira, CDS/Maxtera
Retail Product Bundling: A New Approach
Session 1728
In retail and e-commerce applications, affinity analysis is referred to as market basket analysis. It determines how often items are purchased together and the best possible groupings of products that customers regularly buy, in order to understand their behavior and preferences. This analysis enables you to design an effective marketing campaign and is appropriate for several kinds of decisions, such as cross-selling, product bundling, and product placement, and it can provide guidance for website and loyalty program design. A product bundling strategy can help you either to sell two or more products or services together, or to optimize marketing content such as product catalogs and social media and internet advertising. The Market Basket node in SAS® Enterprise Miner™ generates vast numbers of rules that enable us to analyze discovered patterns, which are often used as business guidelines for marketing strategies. However, this is not enough to identify the exact decreasing sequential co-occurrence of the set of items that embodies the bundle. To overcome this limitation, a new approach for identifying a bundle is proposed and implemented as a new SAS Enterprise Miner node. This approach simplifies the selection of items and adds richer information to marketing initiatives.
Bruno Nogueira, Youman Mind Over Data
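Market basket rules are conventionally ranked by three measures: support, P(A and B); confidence, P(B | A); and lift, confidence divided by P(B). The Python sketch below computes these measures for every ordered pair of items in a small list of baskets. It illustrates only the standard association-rule measures, not the new bundling node the paper proposes, and the function name `pair_rules` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets):
    """Return {(lhs, rhs): (support, confidence, lift)} for every item pair."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)                      # ignore repeated items within a basket
        item_counts.update(items)
        pair_counts.update(combinations(sorted(items), 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n                   # P(A and B)
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]        # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)      # P(rhs | lhs) / P(rhs)
            rules[(lhs, rhs)] = (support, confidence, lift)
    return rules


baskets = [["bread", "milk"], ["bread", "butter"],
           ["bread", "milk", "butter"], ["milk"]]
rules = pair_rules(baskets)
```

With these four baskets, the rule butter => bread has confidence 1.0 and lift of about 1.33: every basket that contains butter also contains bread, while bread appears in 3 of the 4 baskets overall. A lift above 1 suggests that the pair co-occurs more often than chance would predict.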
Ridge Regression and Multicollinearity: An In-Depth Review
Session 2825
Multicollinearity is the phenomenon in which two or more identified predictor variables in a multiple regression model are highly correlated. The presence of this phenomenon can have a negative impact on the analysis as a whole and can severely limit the conclusions of the research study. This paper reviews and provides examples of the different ways in which multicollinearity can affect a research project, how to detect multicollinearity, and how one can reduce it through ridge regression applications. This paper is intended for any level of SAS® user.
Deanna Schreiber-Gregory, Henry M Jackson Foundation for the Advancement of Military Medicine
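Two standard formulas underlie the detection and remedy the abstract mentions. The variance inflation factor flags predictor j as collinear when R_j^2, the R-square from regressing x_j on the other predictors, approaches 1. The ridge estimator adds a penalty k to the diagonal of X'X, trading a small bias for lower variance. These are the textbook forms, shown as a sketch; predictors are usually standardized first, and how the paper chooses k is not stated in the abstract.

```latex
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}

\hat{\beta}^{\,\mathrm{ridge}}(k) = \left(X^\top X + k I\right)^{-1} X^\top y, \qquad k \ge 0
```

Setting k = 0 recovers ordinary least squares. As k grows, the coefficients shrink toward zero and become more stable when the columns of X are nearly collinear, because X'X + kI stays well conditioned even when X'X is close to singular.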
Risk Pathways: Using Machine Learning Techniques of SAS® Viya® to Understand Customer Risk Drivers
Session 2675
With the new SAS® Visual Analytics and SAS® Visual Data Mining and Machine Learning capabilities on SAS® Viya®, institutions can more quickly and easily understand drivers of customer risk in a visually appealing way. By understanding drivers of risk and detecting new risk factors as they emerge over time, analysts can give stakeholders new opportunities to identify potential gains or losses. Using a combination of supervised and unsupervised machine learning methods, businesses can further understand and improve their own organizational definition of customer risk. In this talk, Zencos shares its expertise in identifying high-risk attributes, understanding the relative importance of each driver, and recognizing key combinations of factors associated with risky behavior through an anti-money laundering case study.
Leigh Ann Herhold, Zencos
Robust Principal Component Analysis: Two Analyses in One
Home telematics is a new and growing field in the insurance industry. Whereas traditional rating variables, such as amount of insurance and type of home construction, describe a home by its structure, telematics variables describe how the home is used. Telematics variables are derived from sensors placed in the home, such as thermometers, motion detectors, and smoke detectors. The data from these sensors are summarized in many different ways in order to provide potential model variables. For insurance companies, the ultimate goal is to use these data to predict future insurance losses for a group of homeowners policies. However, such high-frequency, high-dimensional data are usually full of outliers, so the data require preprocessing. Two objectives of this preprocessing are to determine which homes behave as outliers and to reduce the dimensionality of the predictor variables. The RPCA procedure in SAS® Visual Data Mining and Machine Learning software uses robust principal component analysis to accomplish these objectives. Principal component analysis (PCA) is a common approach to reducing the dimensionality of a matrix. The robust part of this analysis involves splitting the original data matrix into a low-rank matrix and a sparse matrix before performing PCA. The low-rank matrix contains the normal portion of the variables, and the sparse matrix contains the outlier portion of the variables. You can use both these outputs from PROC RPCA to get a better understanding of the data and to prepare the data for modeling.
Anthony Salis, State Farm Insurance Company
Zohreh Asgharzadeh, SAS
Kyungduck Cha, SAS
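The low-rank plus sparse split described above is most often formulated as principal component pursuit, shown below as a sketch. Whether PROC RPCA solves exactly this objective, and with what default for the weight lambda, is not stated in the abstract; lambda = 1/sqrt(max(m, n)) for an m x n matrix is a commonly used choice.

```latex
\min_{L,\,S} \; \lVert L \rVert_{*} + \lambda \lVert S \rVert_{1}
\quad \text{subject to} \quad L + S = M
```

Here M is the original data matrix, the nuclear norm of L (the sum of its singular values) encourages low rank, and the entrywise l1 norm of S (the sum of the absolute values of its entries) encourages sparsity. Ordinary PCA can then be applied to L, while large entries in a row of S suggest that the corresponding home behaves as an outlier.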
Robust Tuning for Machine Learning
Tuning of models has become a very important topic. Machine learning models, especially neural nets, have many hyperparameters that might dramatically affect the outcome of the modeling process. Today, some machine learning products, such as SAS® Visual Data Mining and Machine Learning, include autotuning features. This paper discusses (1) different tuning criteria for a robust model, (2) possible approaches toward optimal tuning, such as full factorial, screening, response surface methodology, Taguchi L designs, and stochastic design of experiments, and (3) their practical implementation in SAS® Enterprise Miner™. Similar to the Nelder-Mead technique, optimal tuning can include an iterative design of experiments that extends the levels of factors that reach their limits at interim optimal points. Applying the Taguchi inner array for model hyperparameters and the outer array for validation partition setups and randomizations makes it possible to use dual response surface methodology to find a robust optimal solution that acknowledges variance across outer arrays. Robust experimental design could allow substantial subsampling of the training and testing data sets to accelerate the tuning process while controlling both signal and noise aspects. In addition to achieving model optimization, robust tuning provides insights into how model structure and hyperparameters influence model performance. The latter enables learning by tuning.
Alex Glushkovsky, BMO Financial Group
Role and Value of Visual Analytical Insights for Decision-Making throughout the Global Supply Chain
Session 2790
As the retail world has shifted to an omnichannel environment with ever-growing competition, have you ever found yourself overwhelmed by how best to maximize revenue while at the same time improving operational efficiency? In this session, you learn how Levi Strauss & Co. is modernizing and complementing the Supply and Sourcing Management reporting process by incorporating predictive analytics and decision-making insights that inform decisions and drive accurate actions throughout its global supply chain. The vision is to expand this work in the future into other areas, including the Levi Strauss & Co. Global End to End Planning Process, and to supplier partners.
Shantanu Samanta, Levi Strauss & Co
Paul Reynolds, Levi Strauss & Co
Role of Preventive Care and Lifestyle Changes in Chronic Diseases
Chronic diseases such as diabetes and hypertension are highly prevalent worldwide. They are responsible for a significant number of deaths each year, and their treatment accounts for high health care costs. Research has shown that these diseases can be proactively managed and prevented while lowering health care costs. Using SAS® Analytics solutions, we mined a data sample of close to one million patient profiles to identify who can benefit from preventive care and lifestyle changes. Later in the analysis, we associate those patient profiles with their socioeconomic data to shed light on a prevention management strategy.
Asish Satpathy, University of California
Satyajit Behari, Fermi National Laboratory
Run-Time Validation of SAS® Jobs: A Threesome of Error-Throwing Macros
Session 1793
SAS® programming can feel fraught with danger. Each time a program is executed, there is a risk that an error in the source data or code will lead to undetected erroneous results. Run-time validation is a defensive programming technique designed to decrease that risk by adding code that detects and reports errors as a program runs. This paper presents three utility macros for run-time validation of SAS jobs: %DupCk detects duplicate key values in a data set; %Assert detects values in a data set that are invalid or unexpected; and %CheckRecordCounts detects the accidental deletion of records. By increasing the detection rates of common errors, these utilities increase the programmer's confidence in the results. The paper also discusses principles of run-time validation, as well as principles of macro design encountered during the development of the macros.
Quentin McMullen, Siemens Healthineers
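The idea behind %DupCk, %Assert, and %CheckRecordCounts can be illustrated outside SAS. The Python sketch below is a hypothetical analogue, not the author's macros: `dup_check`, `assert_values`, and `check_record_counts` are illustrative names, rows are modeled as a list of dictionaries, and each check raises an exception so that the job stops instead of silently producing erroneous results.

```python
from collections import Counter


class ValidationError(Exception):
    """Raised when a run-time validation check fails."""


def find_duplicate_keys(rows, keys):
    """Return key tuples that occur more than once, in order of first appearance."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def dup_check(rows, keys):
    """Duplicate-key check: fail if any combination of key values repeats."""
    dups = find_duplicate_keys(rows, keys)
    if dups:
        raise ValidationError(f"Duplicate values of {keys}: {dups}")


def assert_values(rows, predicate, description):
    """Assertion check: fail if any row violates the predicate."""
    bad = [i for i, row in enumerate(rows) if not predicate(row)]
    if bad:
        raise ValidationError(f"Assertion '{description}' failed for rows {bad}")


def check_record_counts(n_before, n_after):
    """Record-count check: fail if a step that should keep every record lost some."""
    if n_after < n_before:
        raise ValidationError(f"{n_before - n_after} record(s) were deleted")
```

Raising an error, rather than only writing a note, is the design choice that makes these checks useful at run time: a failed check stops the job at the step where the problem was introduced.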
Running Parts of a SAS® Program While Preserving the Entire Program
The Challenge: We have long programs that accomplish a number of different objectives. We often want to run only parts of the programs while preserving the entire programs for documentation or future use. Some of the reasons for selectively running parts of a program are: 1) Part of it has run already, and the program timed out or encountered an unexpected error. It takes a long time to run, so we don't want to re-run the parts that ran successfully; 2) We don't want to re-create data sets that were already created. This can take a considerable amount of time and resources, and can also occupy additional space while the data sets are being created; 3) We need only some of the results from the program currently, but we want to preserve the entire program; 4) We want to test new scenarios that require only subsets of the program.
Stephen Sloan, Accenture
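The general idea of running only some parts of a long program while keeping all of it can be sketched outside SAS by treating the program as an ordered list of named steps and deciding at run time which ones execute. This Python sketch is not the author's SAS technique. It assumes that each step writes a single output file, and the names run_program and writer are hypothetical. A step is skipped either because it was not selected or because its output already exists, yet every step remains in the program.

```python
import os
import tempfile


def writer(text):
    """Hypothetical step action: write `text` to the step's output file."""
    def action(path):
        with open(path, "w") as f:
            f.write(text)
    return action


def run_program(steps, selected=None):
    """Run the steps of a long program in order, skipping some without deleting them.

    steps    -- list of (name, output_path, action) tuples
    selected -- optional set of step names to run; None means run every step
    """
    log = []
    for name, output_path, action in steps:
        if selected is not None and name not in selected:
            log.append(f"{name}: not selected")           # kept in the program, not run
        elif os.path.exists(output_path):
            log.append(f"{name}: output exists, skipped")  # don't re-create existing data
        else:
            action(output_path)
            log.append(f"{name}: ran")
    return log


workdir = tempfile.mkdtemp()
steps = [
    ("extract", os.path.join(workdir, "extract.txt"), writer("raw data")),
    ("report", os.path.join(workdir, "report.txt"), writer("summary")),
]
print(run_program(steps, selected={"extract"}))  # ['extract: ran', 'report: not selected']
print(run_program(steps))  # ['extract: output exists, skipped', 'report: ran']
```

The second call shows reasons 1) and 2) from the list above. The step that already finished is not repeated, and only the remaining step runs.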
Running SAS® Viya® on Oracle Cloud without Sacrificing Performance
SAS® customers now have the opportunity to move workloads to the cloud, benefiting from lower operating expenses (OPEX) and maintenance costs, reduced risk, and, most importantly, keeping analytics close to the data. This can be accomplished without sacrificing performance, predictability, security, and the overall user experience. After extensive joint development and testing with SAS, Oracle can provide a seamless cloud platform for SAS® Viya® and SAS® 9.4 leveraging Oracle Cloud Infrastructure. This cloud platform delivers scalable analytics performance, delivering speeds equal to or better than an on-premises solution. In this session, we discuss the parameters of the joint testing, why Oracle is able to provide a superior cloud platform for analytics, and how a SAS customer is leveraging the Oracle cloud platform and the benefits that they are experiencing.
Dan Grant, Oracle
RWI, Not REI: A Robust Report Writing Tool for Scaling Your Toughest Mountaineering Challenges
The degree of customization required for different kinds of reports and analyses used in presentations, documents, and spreadsheets varies from one organization or department to another. Often the results needed can be achieved by using a point-and-click tool. If that is not possible, then a coded approach is required. The amount of syntax required increases depending on the SAS® procedure that you choose. The PRINT, TABULATE, and REPORT procedures offer successively greater customization capabilities. However, to achieve complete flexibility, a new tool from SAS called the Report Writing Interface (RWI) is required. In the Output Delivery System RWI, which is new in SAS® 9.4, you write code in a DATA step, and then combine it with any other kind of DATA step statement to construct complex report structures currently not achievable with other tools. In addition, RWI enables you to incorporate style customization anywhere. The syntax uses a common dot notation that programmers from multiple languages are familiar with. This seminar examines capabilities, examples, advantages, and disadvantages of this latest reporting methodology.
Robert Durie, SAS
SAS in Style: Customizing Solutions and Reports with SAS® Theme Designer
SAS® Theme Designer enables customers to create their own custom application and report themes and tailor the visual look of the applications they run and the reports they generate. Users can specify colors, fonts, and images, and the visual changes are presented within an embedded preview application or within an actual application (when editing a theme within application context). Users can customize their applications and reports to create a unique organizational theme, or they can customize them based on an individual business requirement. This visual customization addresses customers' desire for custom colors, general branding, and logo integration. This paper explores the process of using SAS Theme Designer in conjunction with other applications such as SAS® Visual Analytics. This paper also highlights the process of creating and modifying application themes, and previewing application color, font, and image changes within applications such as SAS Visual Analytics.
Ronald Page, SAS
Sherry Parisi, SAS
SAS® 9.4 on Microsoft Windows: Unleashing Kerberos on Apache Hadoop
Session 1878
Do you maintain a Microsoft Windows server providing your organization's SAS® 9.4 environment? Are you struggling to get Kerberos authentication to Apache Hadoop working correctly for your SAS® Enterprise Guide® or SAS® Studio users? This paper outlines the key steps for enabling Kerberos authentication between your users, your SAS 9.4 deployment on Microsoft Windows server, and your Hadoop data sources. Learn about the configuration changes you need to make, and also learn about some effective troubleshooting techniques to get your environment working and your users happy.
Stuart Rogers, SAS
SAS® Analytics Innovation Lab at La Trobe University: Taking Machine Learning Innovation to Industry
SAS® Analytics Innovation Lab was established at La Trobe University Business School in 2015 as part of the SAS® academic program in Australia and New Zealand. The research lab has developed new cutting-edge technology in machine learning and text analytics, with applications in healthcare, social media analytics, security, federated search, Internet of Things (IoT) stream analytics, and video captioning. The SAS Analytics Innovation Lab is part of the new Research Centre for Data Analytics and Cognition, where the key focus is to carry out academic research targeting innovative solutions to business and real-world problems. The collaboration with SAS and use of SAS tools have provided the researchers with an industry-known platform and technologies that can be used as a Trojan horse to introduce new research to industry. Several examples of how SAS technologies and new research from the SAS Analytics Innovation Lab have provided innovative practical solutions are presented.
Damminda Alahakoon, La Trobe University
SAS® Analytics Pro Running in a Docker Container in a Cloud Environment
This e-poster discusses why you should use a Docker container to run SAS® Analytics (or any other software application). Topics include a business case, a value proposition, an architectural review, and building and deploying the container. There is also a demo of using SAS Analytics in the container.
Brian Bream, Collier IT
Frank Daniels, Collier IT
SAS® and Python: The Perfect Partners in Crime
Session 2597
Python is often one of the first languages that any programmer studies. In 2014, Python was named the most popular introductory teaching language for US universities, and in 2017 it was named third in the list of popular programming languages taught within the UK. In April 2017, SAS introduced the SASPy project, a module that can be installed on top of Jupyter Notebook to enable it to connect to SAS® 9.4 or SAS® Viya® workspace sessions. This SAS® workspace session connectivity enables any authenticated user to use many of the procedures available within the SAS environment, the majority of which can also be called from the Python programming language. Users therefore have the freedom to code in the language of their choice with a view to achieving the same results. SASPy also includes a method that displays the SAS code equivalent of the Python code submitted through SASPy. This method enables SAS coders and Python coders to work side by side, in addition to helping Python coders learn the SAS programming language.
Carrie Foreman, Amadeus Software Limited
SAS® Arrays and Macros Make Processing Claims with Multiple Conditions Easier
Session 2683
The ability to translate medical claims data into actionable insights is an important step in deriving knowledge useful for making inferences about the healthcare system. A single patient can have numerous diagnosis codes per encounter. Developing methods to efficiently process this information is necessary in order to provide meaningful health statistics that help providers better understand the needs of their populations with multiple conditions. SAS® array statements and the macro facility help to streamline this process. This paper discusses, through examples, how Base SAS® techniques such as RETAIN, FIRST., LAST., DO loops, and the LAG function can work together with SAS arrays and macros to transform a claims data file into population-level summaries.
Shavonne Standifer, Truman Medical Center
Stephanie Thompson, Datamum
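The way arrays, BY-group processing with FIRST. and LAST., and RETAIN work together on multi-code claims can be mirrored in Python to show the logic. This sketch is not the authors' SAS code. The field names (patient_id, dx), the condition definitions, and the function names are hypothetical. The ICD-10 prefixes E11 and I10 are used only as illustrative codes for type 2 diabetes and essential hypertension. Input rows must be sorted by patient, just as a BY statement requires sorted data.

```python
from itertools import groupby

# Hypothetical condition definitions: ICD-10 code prefixes per condition.
CONDITIONS = {"diabetes": ("E11",), "hypertension": ("I10",)}


def patient_flags(claims):
    """Collapse claim rows, sorted by patient_id, into one set of conditions per patient."""
    result = {}
    for pid, rows in groupby(claims, key=lambda r: r["patient_id"]):
        flags = set()                 # reset on the patient's first row (like FIRST.)
        for row in rows:
            for code in row["dx"]:    # loop over every diagnosis code (like an array DO loop)
                for condition, prefixes in CONDITIONS.items():
                    if code.startswith(prefixes):
                        flags.add(condition)  # accumulated across rows (like RETAIN)
        result[pid] = flags           # output after the patient's last row (like LAST.)
    return result


def population_summary(flags_by_patient):
    """Count patients by the number of distinct conditions they have."""
    summary = {}
    for flags in flags_by_patient.values():
        summary[len(flags)] = summary.get(len(flags), 0) + 1
    return summary


claims = [
    {"patient_id": 1, "dx": ["E11.9", "Z00.00"]},
    {"patient_id": 1, "dx": ["I10"]},
    {"patient_id": 2, "dx": ["J45.909"]},
]
flags = patient_flags(claims)
print({pid: sorted(f) for pid, f in flags.items()})  # {1: ['diabetes', 'hypertension'], 2: []}
print(population_summary(flags))                      # {2: 1, 0: 1}
```

The result is one row per patient rather than one row per claim, which is the shape needed for population-level summaries of patients with multiple conditions.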
SAS® Cloud Analytic Services Actions: A Holistic View
Session 1981
SAS® Cloud Analytic Services (CAS) provides the analytics that power SAS® Viya®. Just as SAS® provides a wide range of capabilities through its extensive array of procedures, so CAS services are supplied by a wide (and growing) variety of action sets. And just as PROCs each define their own grammar and output tables, so CAS actions define their own programming interface in the form of input parameters and results. But the differences between conventional PROCs and CAS actions are also illuminating. Action execution occurs in parallel on the CAS cluster with reference to a server-side environment (a session, caslibs, tables). And when writing scripts or applications to invoke CAS actions, SAS and other popular industry programming languages are provided an equal footing, with each programming language client customized for its own ecosystem. This paper is a complete introduction to CAS actions: it explains how actions execute, introduces the programming concepts common to all actions and to all client languages, and provides a brief tour of the programming interfaces for SAS, Java, Python, R, and REST. A complete understanding of CAS actions will benefit IT as they seek to implement, manage, and operate SAS Viya. And it is essential for SAS Viya programmers, such as data scientists, who want to solve specific problems by invoking CAS analytics directly.
Mark Gass, SAS
SAS® Configuration Management with Ansible
A SAS® environment can be very complex under the covers, with many configuration options that can affect everything from performance through to security. Having automations in place that ensure that only authorized settings are in effect (for both SAS and the operating system) can lighten the load of a SAS administrator as well as that of IT. Ansible can take care of ensuring that your SAS prerequisites are met and continue to be met, while also making sure that only tested and authorized configuration changes are implemented.
Michael Dixon, Selerity
Cameron Lawson, Selerity
SAS® Credit Scoring for Banking: An Integrated Solution from Data Capture to Insight
Session 2751
The banking sector faces increased demands related to risk assessment because of the Basel Capital Requirements. Credit modeling and scoring is an important component of estimating the capital requirement, and banks face various challenges and needs related to this modeling. SAS® Credit Scoring for Banking is an integrated solution that enables detailed analysis and improved prediction of credit risk, with these challenges and needs in mind. This session is based on experiences gained from implementing SAS Credit Scoring for Banking for a series of banks. We discuss the solution architecture in the context of challenges and needs related to credit modeling. The architecture overview highlights the SAS® software included in SAS Credit Scoring for Banking and the stages at which the different software comes into play. We consider the benefits of SAS Credit Scoring for Banking and see how the solution can be flexibly modified to include additional data sources and existing models. The session ends with examples of model monitoring and reporting in the solution.
Ewa Nybakk, Capgemini
SAS® Enterprise Application Consolidation
SAS® applications are used across a wide range of business lines in enterprise organizations today. SAS architecture provides interfaces that enable their use in a range of different use cases. In the past, these applications were built on different SAS components, with various user interfaces, on varying underlying operating systems, and with different operational procedures. With the trend toward IT consolidation, SAS components and SAS applications that were developed over recent decades are within the scope of ongoing consolidation processes around the globe. Some of these applications were developed with technologies that would be called legacy today. Nevertheless, they still play vital roles in the business processes of the organizations that use them. These applications need an application infrastructure that both fits the application and is sustainable for the future in terms of regulation and security. This paper gives you insights into the technical and governance challenges that occur during a consolidation process. It focuses on building up a platform and dealing with change management, and it describes the steps you must take in order to govern application consolidation.
Jan Bigalke, Allianz Technology SE
Anne Van Den Berge, Allianz Technology SE
SAS® Environment Manager: A SAS® Viya® Administrator's Swiss Army Knife
The latest version of SAS® Viya® brings with it a wealth of new capabilities, and administrators are by no means left out of the party. The version of SAS® Environment Manager that accompanies SAS Viya 3.3 significantly ratchets up what a SAS® administrator can accomplish with this next-generation HTML5-based web application. See new features in its top-level dashboard, including an at-a-glance widget that shows the health of all SAS Viya machines, services, and service instances in a hierarchical heat map. Witness the new Logged Issues widget, which highlights recent errors with details to help diagnose and head off escalating problems. Learn about all the new SAS Environment Manager menu options, including viewers for the following: Licensed Products, showing expiration dates and grace periods for all deployed products; Logs, consolidating logging from SAS services with advanced filtering and summarization; Machines, graphing CPU and memory consumption, and listing specific metric checks and supported services; and Scheduling, monitoring and editing scheduled jobs. Finally, discover additional features like an advanced data explorer for importing data sources, as well as the ability to create and manage user-defined formats. In general, gain an appreciation for all the new blades SAS has added to its multi-purpose Swiss Army knife for administration. Gain an understanding of advanced features and sharpen your skills in monitoring and managing a SAS Viya 3.3 environment.
Trevor Nightingale, SAS
Michelle Ryals, SAS Institute Inc.
SAS® Fraud Framework and MCMC in Government Estimation of Improper Payments of Social Benefits
Brazil's Social Security monthly budget is about US$12 billion, paying benefits to 34 million people. In Brazil, the Federal Court of Accounts is responsible for auditing these expenses. In the last few years, the Tribunal de Contas da União (TCU) has found several improper payments and has acted alongside the Social Security Agency to reduce such losses. In this paper, we describe how SAS® Fraud Framework was used to estimate and monitor the evolution of the percentage of improper payments of social benefits granted. The TCU team conducted 685 interviews with experts from different areas (auditing or control). Based on the analysis of these data, they generated a triangular probability distribution for each respondent, applied Markov chain Monte Carlo (MCMC) simulations, and fitted probability distributions (kernel and beta), aiming to estimate the incidence of irregularities. The reliability level of the instrument was verified by cluster analysis and by the use of retest items in the interviews. The results showed that 20% to 38% is the expected percentage of improper payments in Brazilian Social Security, according to the experts' perception of the incidence of all possible types of irregularities. By type of benefit and by region, the Northern Region and the benefits to the poor elderly showed the highest averages (29% and 27%, respectively). These procedures resulted in an estimate of approximately $18 million in improper monthly payments.
Rodrigo Hildebrand, Tribunal de Contas da União
Aloisio Dourado Neto, Federal Court of Accounts (TCU - Brazil)
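The idea of turning each expert's opinion into a triangular distribution and pooling the results can be sketched as follows. This is a plain Monte Carlo simulation in Python. It is not the MCMC procedure or the kernel and beta fitting used in the paper, and the three (low, most likely, high) opinions below are made-up values for illustration only. The sketch samples each respondent's triangular distribution, pools the draws, and reports the mean together with a 5th to 95th percentile interval.

```python
import random
import statistics


def pooled_triangular_draws(opinions, draws_per_expert=10_000, seed=0):
    """Sample each expert's (low, mode, high) triangular distribution and pool the draws."""
    rng = random.Random(seed)
    pooled = []
    for low, mode, high in opinions:
        # random.triangular takes its arguments in the order (low, high, mode).
        pooled.extend(rng.triangular(low, high, mode) for _ in range(draws_per_expert))
    return pooled


def summarize(draws, lower_q=0.05, upper_q=0.95):
    """Return the mean and an empirical percentile interval of the pooled draws."""
    ordered = sorted(draws)
    lo = ordered[int(lower_q * (len(ordered) - 1))]
    hi = ordered[int(upper_q * (len(ordered) - 1))]
    return statistics.mean(draws), lo, hi


# Hypothetical opinions: share of improper payments as (low, most likely, high).
opinions = [(0.10, 0.20, 0.35), (0.15, 0.30, 0.45), (0.05, 0.25, 0.40)]
mean, lo, hi = summarize(pooled_triangular_draws(opinions))
print(f"mean {mean:.3f}, 90% interval {lo:.3f} to {hi:.3f}")
```

Because every expert contributes the same number of draws, each opinion carries equal weight in the pooled estimate. With these illustrative inputs, the pooled mean is close to 0.25, which is the average of the three triangular means (low + mode + high) / 3.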
SAS® Grid Infrastructure on Amazon Web Services
SAS® Grid Computing is a shared, centrally managed analytics computing environment that features workload balancing and management, high availability, and fast processing. A SAS® grid environment helps you incrementally scale your computing infrastructure over time as the number of users and the size of data grow. It also provides rolling maintenance and upgrades without any disruption to your users. We have built a quickstart with Amazon Web Services (AWS) and SAS, which launches the infrastructure for the SAS grid. This quickstart is for IT infrastructure architects, administrators, and DevOps professionals who are planning to implement or extend their SAS workloads on the AWS Cloud. It deploys the infrastructure for implementing SAS Grid Computing and related SAS components on Amazon Elastic Compute Cloud (Amazon EC2) instances and uses security groups, a virtual private cloud (VPC), subnets, and AWS Elastic Load Balancing to provide security and availability. A SAS grid environment in the cloud provides the elasticity and agility to scale your resources as needed. The quickstart automatically builds and configures the required infrastructure for SAS Grid Computing application installation, thereby reducing the dependency on your IT team. The effort required to plan, design, and implement the infrastructure is eliminated.
Siddharthkumar Prasad, CoreCompete
SAS® Grid Manager for High Availability: Implementation Best Practices
Session 2155
SAS customers appreciate SAS® Grid Manager for the many benefits it provides to their SAS® environment. Enhancing the availability of their critical services is certainly one of the most popular benefits. This paper presents the best practices for implementing highly available services and leveraging the new failover orchestration capabilities provided by the latest release of SAS Grid Manager.
Edoardo Riva, SAS
SAS® Macros for Computing Causal Mediated Effects in Two- and Three-Wave Longitudinal Models
Mediation analysis is a statistical technique for investigating the extent to which a mediating variable transmits the effect of an independent variable to a dependent variable. Because it is used in many fields, there have been rapid developments in statistical mediation. The most cutting-edge statistical mediation analysis focuses on the causal interpretation of mediated effects. Causal inference is particularly challenging in mediation analysis because of the difficulty of randomizing subjects to levels of the mediator. The focus of this paper is on updating three existing SAS® macros (%TWOWAVEMED, %TWOWAVEMONTECARLO, and %TWOWAVEPOSTPOWER, presented at SAS® Global Forum 2017) in two significant ways. First, the macros are updated to incorporate new cutting-edge methods for estimating longitudinal mediated effects from the Potential Outcomes Framework for causal inference. The two new methods are inverse-propensity weighting, an application of propensity scores, and sequential G-estimation. The causal inference methods are revolutionary because they frame the estimation of mediated effects in terms of differences in potential outcomes, which align more naturally with how researchers think about causal inference. Second, the macros are updated to estimate mediated effects across three waves of data. The combination of these new causal inference methods and three waves of data enables researchers to test how causal mediated effects develop and maintain over time.
David MacKinnon, Arizona State University
Matthew Valente, Arizona State University
SAS® Middle-Tier Performance Optimization
SAS® middle-tier performance needs to be optimal to provide users with the best possible experience and to support growth in the number of users. Optimal performance requires tuning. However, because of differences in the SAS web applications that are deployed, as well as differences in the mix of applications in use, the tuning that is optimal for one deployment might not be optimal for another. This session covers the tools that you can use to understand what's going on in a SAS middle tier, as well as the techniques you can use to tune the SAS middle tier once you understand what's going on.
Glenn Horton, SAS
SAS® ODS EXCEL Destination: Using the START_AT Suboption to Place Your Data Where You Want It
Session 2582
The SAS® ODS Excel destination statement option OPTIONS has many suboptions. One of those suboptions is START_AT, which enables you to place your output anywhere on the open output Microsoft Excel worksheet. I will show you how to put your starting output data element into any row and column in the open worksheet.
William Benjamin Jr, Owl Computer Consultancy LLC
SAS® Studio: A New Way to Program in SAS®
SAS® Studio is an important new interface for SAS®, designed for both traditional SAS programmers and for point-and-click users. For SAS programmers, SAS Studio offers many useful features not found in the traditional Display Manager, including integrated syntax help, one-touch code formatting, and the ability to drag-and-drop variable and data set names into your code. SAS Studio runs in a web browser. You write programs in SAS Studio, submit the programs to a SAS server, and the results are returned to your SAS Studio session. SAS Studio is included in the license for Base SAS®, is the interface for SAS® University Edition, and is the default interface for SAS® OnDemand for Academics. Both SAS University Edition and SAS OnDemand for Academics are free-of-charge for non-commercial use. With SAS Studio becoming so widely available, this is a good time to learn about it.
SAS® Visual Analytics 8.2: What's New in Reporting?
SAS® Visual Analytics 8.2 extends the unified SAS® user experience to data exploration and reporting, adds new features to report content, and introduces exciting new report objects. The user experience design presents a familiar and consistent user interface for navigation at the report, page, and object levels. New features enrich report content such as improvements in handling data, enhancements in graphs and tables, and more geographic capabilities. New report objects in the release include the Key Value object to provide infographic-like treatment, the parallel coordinates plot for more analytical visualization, and Data-Driven Content to manipulate externally created content within the point-and-click interface of the report.
Rajiv Ramarajan, SAS
Riley Benson, SAS
SAS® Visual Analytics: Text Analytics Using Word Clouds
Session 1687
Employees have a limited capacity to efficiently analyze large volumes of unstructured text data in a way that provides actionable business insight. SAS® provides several tools for exploring text: SAS® Text Miner, SAS® Sentiment Analysis, and SAS® Visual Analytics. SAS® Visual Analytics word clouds expand analytical capacity by making text analytics and sentiment analysis easier to use, thereby putting these powerful tools in the hands of a much broader audience. Using a point-and-click interface, SAS® Visual Analytics word clouds can explore larger volumes of unstructured data, identify patterns, and create usable reports that help define policies and take actions that improve business operations.
Jenine Milum, Citi
SAS® Visual Forecasting: A Cloud-Based Time Series Analysis and Forecasting System
Session 2154
SAS® Visual Forecasting, based on SAS® Viya®, is the next-generation SAS® product for forecasting. It provides a new resilient, distributed, scripting environment for cloud computing that provides time series analysis, automatic forecast model generation, automatic variable and event selection, and automatic model selection. SAS Visual Forecasting features a new graphical interface that is centered on the use of pipelines, a new microservices-based architecture, and a new fast, scalable, and elastic in-memory server environment based on SAS® Cloud Analytic Services (CAS). It provides end-to-end capabilities to explore and prepare data, apply various modeling strategies, compare forecasts, override statistical forecasts, and visualize results. The workflow framework for model generation and forecasting is shared with SAS® Visual Data Mining and Machine Learning and SAS® Visual Text Analytics. Forecast analysts and data scientists can also access the power of SAS Visual Forecasting through a flexible and powerful programming environment.
Jerzy Brzezicki, SAS
Joe Katz, SAS
SAS® Visual Investigator and SAS® Visual Analytics: Bridging the Gap between Subject Matter Experts
Session 2107
Health care payers constantly work in the world of big data. Because of the insights that big data can provide, payers are increasingly required to use it to improve outcomes for their patients, lower costs, and prevent fraud, waste, and abuse (FWA) in their networks. This enormous task requires contributors from many areas of the organization, such as strategically minded executives, clinicians, auditors, and data scientists. Together, SAS® Visual Investigator and SAS® Visual Analytics provide a toolset that enables users to bridge the gaps between the individuals in these different roles. This paper examines how advanced analytics, coupled with visualizations, make complex analytics readily available to subject matter experts across the health care industry.
Emily Chapman-McQuiston, SAS
SAS® Viya® Service Layer Architecture Overview
Session 2272
The SAS® service layer has been completely re-architected from the ground up in SAS® Viya®. The monolithic web application server used in SAS®9 has been replaced by modular microservices, which act as building blocks to help form a loosely coupled, scalable, and secure platform. This presentation focuses on the various technologies used within SAS Viya, from service registration to an event delivery architecture model that enables services to communicate with one another. Learn about a few of the key microservices included within SAS Viya and the benefits they provide.
Eric Bourn, SAS
SAS® Viya®: Architect for High Availability Now and Users Will Thank You Later
You have SAS® Viya® installed and running for your business. Everyone loves it so much, they are using it more than you initially anticipated. Fortunately, you had the forethought to architect for high availability. Now the process of adding additional nodes is relatively simple, and you don't have to lose sleep as you keep your growing user community happy. This paper guides you through the process of creating high availability for the SAS Viya services, microservices, and web applications that support it.
Jerry Read, SAS
SAS® Viya®: The Beauty of REST in Action
The introduction of SAS® Viya® opened SAS® and its industry-leading machine learning algorithms to the open-source community. With the ability to connect open-source interfaces such as Lua, Python, and R, as well as REST APIs to SAS® Cloud Analytic Services (CAS), this new engine becomes a powerful tool to include in any analytical toolbox. Using CAS actions, users can easily surface data and analytical model results to a WebApp through REST API connectivity. This capability is one key aspect that SAS Viya offers to provide real-time analytics. REST APIs provide a way to access SAS Viya over HTTP, and CAS capabilities are more efficient in communicating between front-end and back-end applications. Zencos will showcase leveraging the capabilities of the CAS Server and connecting to REST APIs to surface data for real-time decision making using a case study.
Sean Ankenbruck, Zencos
Grace Heyne Lybrand, Zencos
SAS/STAT® 14.3 Round-Up: Modern Methods for the Modern Statistician
Session 1844
The latest release of SAS/STAT® software has something for everyone. The new CAUSALMED procedure performs causal mediation analysis for observational data, enabling you to obtain unbiased estimates of the direct causal effect. You can now fit the compartment models of pharmacokinetic analysis with the NLMIXED and MCMC procedures. In addition, variance estimation by the bootstrap method is available in the survey data analysis procedures, and the PHREG procedure provides cause-specific proportional hazards analysis for competing-risks data. Several other procedures have been enhanced as well. Learn about the latest methods available in SAS/STAT software that can modernize your statistical practice.
Maura Stokes, SAS
SASLint: A SAS® Program Checker
Session 2543
Linters are programs that carry out static source code analysis, detecting certain bugs, coding rule deviations, and other issues. Linters also detect redundant or error-prone constructs that are nevertheless, strictly speaking, legal. Linters can also check for potential performance optimizations. This paper covers creating a modular linter for the SAS® language consisting of a parser module, an analysis module that includes a list of rules, and a reporting module that displays the issues found. It is possible to include and exclude rules, as well as develop your own rules, all of which makes the linter very flexible for any team with its own requirements regarding source code and programming standards. The parser for the SAS language grammar is built with ANTLR, a Java-based parser generator. The tool is written in Java and SAS, which is why it can be integrated into any SAS environment.
Igor Khorlo, Syneos Health
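The modular design described above (rules that can be included or excluded, an analysis step that runs them, and a reporter) can be sketched in Python. This sketch is not SASLint. Instead of an ANTLR-generated parser, it uses simple line-by-line regular expressions as a stand-in. The three rules (tabs, line-length, select-star), their messages, and the names lint and report are hypothetical examples of team coding standards.

```python
import re
from dataclasses import dataclass


@dataclass
class Issue:
    line: int
    rule: str
    message: str


def rule_tabs(lines):
    for n, text in enumerate(lines, start=1):
        if "\t" in text:
            yield Issue(n, "tabs", "tab character found")


def rule_line_length(lines, limit=100):
    for n, text in enumerate(lines, start=1):
        if len(text) > limit:
            yield Issue(n, "line-length", f"line longer than {limit} characters")


SELECT_STAR = re.compile(r"\bselect\s+\*", re.IGNORECASE)


def rule_select_star(lines):
    for n, text in enumerate(lines, start=1):
        if SELECT_STAR.search(text):
            yield Issue(n, "select-star", "SELECT * in PROC SQL; list columns explicitly")


# Rule registry: teams can add their own rule functions here.
RULES = {"tabs": rule_tabs, "line-length": rule_line_length, "select-star": rule_select_star}


def lint(source, include=None, exclude=()):
    """Analysis module: run the selected rules over the source text."""
    lines = source.splitlines()
    active = [name for name in RULES
              if (include is None or name in include) and name not in exclude]
    issues = [issue for name in active for issue in RULES[name](lines)]
    return sorted(issues, key=lambda i: (i.line, i.rule))


def report(issues):
    """Reporting module: one line per issue."""
    return "\n".join(f"{i.line}: [{i.rule}] {i.message}" for i in issues)


code = "proc sql;\n\tcreate table t as select * from sashelp.class;\nquit;\n"
print(report(lint(code)))
# 2: [select-star] SELECT * in PROC SQL; list columns explicitly
# 2: [tabs] tab character found
print(report(lint(code, exclude={"tabs"})))
# 2: [select-star] SELECT * in PROC SQL; list columns explicitly
```

A regex-based approach like this cannot tell code from comments or quoted strings, which is one reason a grammar-based parser, such as the ANTLR one the paper describes, gives more reliable results.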
Scalable Cloud-Based Time Series Analysis and Forecasting
Session 2027
Many organizations need to process large numbers of time series for analysis, decomposition, forecasting, monitoring, and data mining. The TSMODEL procedure provides a resilient, distributed, optimized generic time series analysis scripting environment for cloud computing. It comes equipped with capabilities such as automatic forecast model generation, automatic variable and event selection, and automatic model selection. It also provides advanced support for time series analysis (in the time domain or in the frequency domain), time series decomposition, time series modeling, signal analysis and anomaly detection (for IoT), and temporal data mining. This paper describes the scripting language that supports cloud-based time series analysis. Examples that use SAS® Visual Forecasting software demonstrate the use of this scripting language.
Michael Leonard, SAS
Thiago Quirino, SAS
Seeing the Forest for the Trees: Part Deux of Defensive Coding by Example
For statisticians and programmers like us, SAS® is part of daily life. Whether assessing patterns and data quality, programming data sets and analysis displays, or developing simulations, we need to determine the best ways to conduct our daily work, enabling us to see the forest for the trees. This paper provides guidance on quality defensive programming, efficient coding, and good programming concepts. Programming no-nos are also discussed. The concepts discussed will enable us to navigate through the trees; that is, seeing the trees for the forest. We might have been programming in SAS for weeks, months, years, or decades. Regardless, we should continue to expand our skills and continue learning and updating our techniques. With this paper, we provide reminders for paths lost in the past, as well as new tips to help us clear the brush from the trail. This paper is part deux of Defensive Coding by Example (2015), quenching our thirst for adventure in the great SAS hinterland.
Nancy Brucken, Syneos Health
Donna Levy, Syneos Health
Sentiment Analysis of Netflix and Competitor Tweets to Classify Customer Opinions
Session 2708
With more than 310 million users worldwide, Twitter is an important data source for social media analytics. Each day, Netflix and its competitors in the video on-demand space are the subject of thousands of tweets in which users share their opinions. By analyzing the content of the tweets, companies can learn more about their customers and their likes and dislikes. This analysis can help make a business more profitable by tracking how well social media campaigns are paying off and how many leads are converted into customers. Analyzing and making sense of this vast amount of unstructured data will help Netflix make better-informed decisions to maintain its competitive edge. In this study, we analyze tweets for Netflix and its competitors such as Amazon Prime Video, Hulu, and HBO Now. We demonstrate the use of multiple SAS® tools to analyze large numbers of tweets, generate quick summaries, identify different categories of tweets, and classify reviews. Over 32,000 tweets were captured over five days, and SAS® Enterprise Miner™ was used to identify commonly used terms and to categorize similar tweets into groups. SAS Enterprise Miner was also used to analyze customer sentiment by classifying reviews as positive, neutral, or negative, based on their content. The sentiment analysis feature in SAS® Visual Analytics gives a quick overview through word clouds for each text topic, helping us to understand customer opinions and use the information to improve the business.
Rucha Jadhavar, Oklahoma State University
Agastya Komarraju, Sam's Club
Sentiment Analysis on YouTube Movie Trailer Comments to Determine the Impact on Box-Office Earnings
Session 2719
The video-sharing website YouTube encourages interaction among its users by providing a user comments facility. This facility was originally envisaged as a way for viewers to give feedback about videos, but it is now used for other communicative purposes, such as sharing ideas, paying tributes, social networking, and answering queries. This study examines and categorizes the types of comments that YouTube users make on popular Hollywood movie trailers, to understand how these users' sentiments can affect first-day revenue. The examination can also show how box-office earnings trend with sentiment during the week after the movie is released. We used the SAS® Enterprise Miner™ rule builder model, which gave us an accuracy of 88%, to score the YouTube data for sentiment. In addition, SAS® Text Miner was applied to generate topic clusters and concept links for the comments, giving us an idea of what users liked or disliked about the movie. The results can help distributors and moviemakers determine the response rate for a movie by understanding the comments on its trailers. Once the movie is released, next-day earnings can be predicted from the present day's sentiments.
Rishanki Jain, Oklahoma State University
Seven Agile Methods that Help Deliver Visualizations Agilely (and without Resorting to Being AdHoc!)
Using Agile methods to deliver applications is a commonplace approach these days. But when you try to apply Agile techniques to delivering data, analytics, and visualizations, a whole set of new challenges arises that affects whether you can deliver a production-ready solution every 2 to 4 weeks. This session takes you through seven repeatable Agile techniques that deliver completed visualizations every three weeks. Shane covers the WHY, the HOW, and the WHAT for each of these steps. At the end of the session, you will have a set of artifacts that you can start to leverage in your next project. Shane has discovered, defined, and refined these steps over the last four years, based on a number of AgileBI customer projects in New Zealand (the land of hobbits and kiwifruit). The seven Agile methods covered in this session are: defining information products to set the scope; modelstorming business events to identify the data requirements; applying Agile data modeling techniques to structure data quickly; wireframing to gather visualization requirements; using Acceptance Test-Driven Development to ensure that you build it right; delivering three visualization iterations in three weeks; and developing the T-shaped skills required to build an AgileBI team.
Shane Gibson, OptimalBI
Shorter Waiting Time, Better Emergency Healthcare: Forecasting Stockholm's Emergency Room Visits
Stockholm County Council (SCC) was into nudging long before nudging won Professor Richard Thaler a Nobel Prize. SCC needed to find out whether it was possible to nudge its population to visit the emergency department (ED) at a less variable rate, as this would improve SCC's resource planning, shorten ED queues exponentially, and improve health outcomes at a lower cost for the entire population. Paradoxically, the demand for emergency healthcare does not vary randomly over time, even though accidents and diseases (to a large extent) do. This is an effect of the individual and social traits of the potential ED patients, that is, the population of a region. In the EDs, Mondays and days after holidays are usually busy, while Christmas Eve and Friday lunchtime are not. While this pattern is stable across weeks, months, and years, the extent of busyness and calmness is often not known. If the public received information about the estimated patient load at the EDs across the county, people would have the opportunity to attend an ED with a lower load, which is naturally in their interest. Using SAS®, we apply an ensemble of a wide array of models to forecast expected ED visits for the next 72 hours at several hospitals in Stockholm, Sweden. The models are analyzed, evaluated, and compared. The forecasts are useful for resource allocation in the EDs and for nudging future ED visitors to go to the right ED at the right time.
Martin Nordberg, Stockholm County Council / Södersjukhuset Hospital
Oskar Eriksson, SAS
Shredding Your Data with the New DS2 RegEx Packages
Session 2249
DS2's latest packages, PCRXFIND and PCRXREPLACE, wrap the functionality of earlier regular expression functions into sleek new packages. These just-in-time compiled regular expressions (RegEx) can be used in multi-threaded environments to maximize throughput, while the object-oriented APIs simplify your connection to the powerful world of RegEx. Practical examples show how to use the new packages to execute RegEx on large data sets and include tips and techniques for getting the most out of RegEx in distributed environments. We look at analyzing and filtering data sets with the new packages, as well as using the fundamentals of text analytics to make large jobs process faster. To top things off, we do the work with whichever smiling emoji best suits you, since the new packages are ready to handle all of your social media and international text-handling needs.
Will Eason, SAS
Simple Methods for Repeatability and Comparability: Bland-Altman Plots, Bias, and Measurement Error
Session 1815
While a Pearson correlation coefficient can be a quick and easy way to compare two measurement methods or examine repeatability, it is not the most appropriate measure, nor does it give you insight into bias. Performing linear regression or testing for differences between means is also not the best approach for determining whether two methods are comparable or a measurement is repeatable. Examining the difference between the measurements might not offer insight into the accuracy of the methods, and a Pearson correlation coefficient is a measure of association, not of agreement. Altman and Bland (1983) suggested a graphical method and two statistical tests to examine the repeatability of a measurement or whether two measurement methods produce similar results. The graphical method, called a Bland-Altman plot, plots the difference between two measures against their average, with y-axis reference lines at two-SD or three-SD (standard deviation) limits of the difference. A Bland-Altman plot allows for assessment of the magnitude of disagreement, both error and bias. The statistical tests suggested by Altman and Bland are a test for zero bias and a test of independence of the bias (difference between the methods) and the magnitude (average of the methods) of the measure. A systolic blood pressure example is used to show how to perform the statistical tests and create the Bland-Altman plot using SAS/STAT® PROC TTEST, Base SAS® PROC CORR, and the SGPLOT procedure in ODS Statistical Graphics.
Maribeth Johnson, Augusta University
Jennifer Waller, Augusta University
So Many Date Formats: Which Should You Use?
Nearly all data is associated with some date information. In many industries, a SAS® programmer comes across various date formats in data that can be of numeric or character type. Industries such as pharmaceuticals require dates to be shown in ISO 8601 formats because of industry standards. Many SAS programmers commonly use a limited number of date formats (such as YYMMDD10. or DATE9.), with workarounds that use SAS functions such as SCAN or SUBSTR to process date-related data. Another challenge for a programmer can be source data that includes either date/time or date-only information. How should a programmer convert these dates to the target format? There are many existing date formats (such as the E8601 series, the B8601 series, and so on) and informats available to convert the source date to the required date efficiently. In addition, there are some useful functions that we can explore for deriving timing-related variables. To use these methods effectively, one can rely on simple tricks for remembering those formats. This poster is targeted to those who have a basic understanding of SAS dates.
Kamleshkumar Patel, Rang Technologies
Jigar Patel, Rang Technologies
Dilip Patel, Rang Technologies
Vaishali Patel, Rang Technologies
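As a language-neutral illustration of the conversion problem (not the SAS E8601 or B8601 formats themselves), the Python sketch below normalizes mixed date and date/time strings to ISO 8601 text, keeping a time part only when the source has one. The list of accepted source layouts is an assumption chosen for illustration; a SAS program would instead read the values with an informat and write them with a format such as E8601DA. or E8601DT.

```python
from datetime import datetime

# Hypothetical source layouts: SAS-style DATETIME/DATE9 text, ISO, compact, US
DATETIME_PATTERNS = ("%d%b%Y:%H:%M:%S", "%Y-%m-%dT%H:%M:%S")
DATE_PATTERNS = ("%d%b%Y", "%Y-%m-%d", "%Y%m%d", "%m/%d/%Y")


def to_iso8601(raw):
    """Return an ISO 8601 string, keeping the time part only if the source had one."""
    text = raw.strip()
    # Try date/time layouts first so a time part is never silently dropped
    for pattern in DATETIME_PATTERNS:
        try:
            return datetime.strptime(text, pattern).isoformat()
        except ValueError:
            continue
    for pattern in DATE_PATTERNS:
        try:
            return datetime.strptime(text, pattern).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date value: {raw!r}")


for value in ["08APR2018:10:30:00", "08APR2018", "20180408", "04/08/2018"]:
    print(value, "->", to_iso8601(value))
```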
Some Tricks When Plotting Graphic Images Using PROC TEMPLATE in SAS® Enterprise Guide®, Part III
Session 2545
To preserve the completeness of Part I and Part II, I split the tricks out into Part III, which presents and explains issues that might confuse you when you use the information from the previous parts in your SAS® code. These tricks cover options in the LAYOUT OVERLAY and LAYOUT LATTICE statements of the TEMPLATE procedure and the Y-axis tick value calculator. Without a clear explanation of them, developers might repeat the same mistakes when solving similar issues.
Kaiqing Fan, Mastech Digital Inc.
Square Peg, Square Hole — Getting Tables to Fit on Slides in the ODS Destination for PowerPoint
An output table is a square. A slide in Microsoft PowerPoint is a square. The table, being the smaller square, should fit in the bigger square slide. Right? Well, not always. Despite the programmer's expectations, some tables will not fit on the slide created by the ODS destination for PowerPoint. It depends on the table. For instance, tables with, say, more than 10 rows or more than 6 columns might end up spanning multiple slides. But, just as with the popular children's toy, by twisting, turning, or approaching the hole from a different angle, you can get the peg in the hole. This paper discusses three programming strategies for getting your tables to fit on slides: changing style attributes to decrease the amount of space needed for the table, strategically dividing one table into multiple tables, and using ODS output data sets for greater control over the structure of the tables. Throughout this paper, you will see examples that demonstrate how to apply these strategies using the popular procedures TABULATE, REPORT, FREQ, and GLM.
Jane Eslinger, SAS
Streaming ETL of High-Velocity Big Data Using SAS® Event Stream Processing and SAS® Viya®
Session 1679
A typical ETL process runs once a day. How can you handle use cases in which ETL needs to happen within minutes? This paper presents a design that uses SAS® Viya® and SAS® Event Stream Processing to fetch large volumes of data from a data source, process the data in the flow itself, and load it into Hadoop files (sashdat format) as often as every 15 minutes. The data source used here is Amazon Elastic Compute Cloud (Amazon EC2) performance data fetched hourly and Amazon EC2 metadata fetched every 15 minutes. The amount of data generated every hour by a medium to large Amazon Web Services (AWS) consumer environment can push the traditional ETL process to its limit. In this design, data is fetched by a distributed data collector running on SAS Viya microservices. The streaming data is transformed using SAS Event Stream Processing and is written temporarily to SAS® Cloud Analytic Services (CAS) tables using the SAS Event Stream Processing CAS adapter. These tables are then integrated with existing Hadoop files using CAS actions. The design handles scaling and failover safety so that data collection can be a long-running, continuous process. The architecture is generic enough to collect not only the instance data of other cloud vendors but also data from any other similar distributed source.
Joydeep Bhattacharya, SAS
Manish Jhunjhunwala, SAS
Supercharge Your Dashboards with Infographic Concepts Using SAS® Visual Analytics
Session 2069
A human's attention span is shorter than that of a goldfish: about eight seconds is all you have to capture viewers' attention and give them a reason to stay on your dashboard. Therefore, a dashboard's visual appeal is more important today than ever before, and this is where infographic concepts make a difference. Infographics deliver information with clarity and simplicity. Data is everywhere, and more report designers are using infographic elements to better communicate insight from the data. The boardroom can now benefit from what has become mainstream on popular news sites and social networks online. This paper shows you how to create infographic-inspired dashboards and reports that can be shared and dynamically explored by your teams using SAS® Visual Analytics on SAS® Viya®. Supercharge your existing dashboards and reports with easy drag-and-drop wizards, while still providing the performance, repeatability, and scalability on massive data that your enterprise demands. This session looks at how the latest enhancements in SAS Visual Analytics enable users to design and create infographic-style dashboards and reports like never before. You will learn tips and techniques for getting the most from your SAS Visual Analytics software that you can apply back at the office. You will leave this session with the perfect balance of creative ideas and practical examples to better engage your entire organization with high-impact data visualizations.
Travis Murphy, SAS
Falko Schulz, SAS
Supercharging Data Subsets in SAS® Viya® 3.3
Session 1680
SAS® Viya® 3.3 provides a powerful new table indexing capability that has the potential to significantly improve performance for data handling and analytics actions submitted to SAS® Cloud Analytic Services (CAS). Effective use of table indexes can simplify big data scenarios, improve usability, and reduce computing resource requirements. Programming examples that use the CAS procedure are presented, using SAS Viya capabilities integrated into the latest release of SAS®9. This information is beneficial to SAS programmers, data scientists, data engineers, and others involved in configuring and deploying big data applications within SAS®9 and SAS Viya. The following examples are included: understanding CAS table indexes; determining which table variables to index; indexing while loading data into CAS; indexing variables in CAS action output tables; using the new CAS Index action; data subset operations using indexes; table size and computing resource considerations; and performance comparisons.
Brian Bowman, SAS
Swimming Lessons for the Data Lake: Becoming a Marketing Analyst for the Next Ten Years!
It isn't just hurricanes and global warming that are causing rising waters. Data is flooding into our businesses, and analysts are struggling to stay afloat. For many companies, the Data Lake is a reservoir to contain this new influx; it offers an untamed location for new data to flow. Many analysts today are standing on the edge of a lake...unsure of whether to dive in or to stay safe on the sides, continuing to work in the structured data sets they know so well. They know the lake might contain new discoveries, but they are not sure their swimming skills are sufficient. For each of the past two years, Emma Warrillow of Data Insight Group Inc. has provided SAS® Global Forum attendees with entertaining advice about becoming a better analyst. This session continues that theme and looks to the future for analysts. What skills do you need now and in the future? While the focus is on the marketing analyst, all analysts will benefit from the practical down-to-earth advice Emma shares from her nearly 30 years of experience in data analytics. She also taps into the thoughts of industry leaders and many of her peers to help predict the future and how to best prepare for it.
Emma Warrillow, Data Insight Group Inc. (DiG)
Table Look-Up Techniques: Is the FORMAT Procedure the Best Tool for the Job?
SAS® programmers have used user-written formats via the FORMAT procedure to perform table look-ups for as long as PROC FORMAT has been available. There is no question that it is a viable technique, but is it the best way to attack the problem? This paper and the associated presentation look at how PROC FORMAT can be used to facilitate table look-ups. The computer resources necessary to execute this approach are examined and contrasted with alternative approaches, such as the DATA step MERGE statement and SQL JOIN.
Andrew Kuligowski, HSN
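The two access patterns being compared can be illustrated outside SAS. In the Python sketch below, a dict stands in for the in-memory table that a user-written format provides, and the second function mimics a sorted, one-pass match-merge. This is an analogy for the access patterns only, not a description of how SAS implements PROC FORMAT or the MERGE statement; the function names and the "UNKNOWN" default (similar in spirit to an OTHER= label) are assumptions.

```python
def lookup_with_dict(keys, table, default="UNKNOWN"):
    """Format-style look-up: build the table once, then probe it for each key."""
    return [table.get(key, default) for key in keys]


def lookup_with_merge(keys, table, default="UNKNOWN"):
    """MERGE-style look-up: sort both inputs by key, then walk them in one pass."""
    # Remember each key's original position so results come back in input order
    ordered = sorted(enumerate(keys), key=lambda item: item[1])
    reference = sorted(table.items())
    labels = [default] * len(keys)
    j = 0
    for position, key in ordered:
        # Advance the reference pointer past smaller keys; it never moves back
        while j < len(reference) and reference[j][0] < key:
            j += 1
        if j < len(reference) and reference[j][0] == key:
            labels[position] = reference[j][1]
    return labels


codes = {"A": "Active", "C": "Closed", "P": "Pending"}
accounts = ["P", "A", "X", "C", "A"]
print(lookup_with_dict(accounts, codes))
print(lookup_with_merge(accounts, codes))
```

Both functions return the same labels; the merge version pays for sorting both inputs first, which is the kind of resource cost the paper sets against the format approach.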
Take a Dive into HTML5
The SAS® Output Delivery System (ODS) provides several options to save and display SAS graphs and tables. You are likely already familiar with some of these ODS statements, such as ODS RTF for rich text format, ODS PDF for portable document format, and ODS HTML for hypertext markup language. SAS® 9.4 gives users several additional ODS statements to choose from, among them ODS HTML5. The ODS HTML5 statement differs from ODS HTML in that it uses HTML version 5 instead of HTML version 4. As a result, users can take advantage of several new features introduced to the language. ODS HTML5 also uses a different default graphics output device, trading the old standby portable network graphics (PNG) for the up-and-coming scalable vector graphics (SVG) format. Throughout the paper, we examine some of the differences between ODS HTML and ODS HTML5, with an emphasis on computer graphics. The goal is to convince you to switch to ODS HTML5 and start using SVG as your graphical output of choice.
Brandon George, Spectrum Health
Nathan Bernicchi, Spectrum Health
Paul Egeler, Spectrum Health
Taming Change: Bulk Upgrading SAS® 9.4 Environments to a New Maintenance Release
Session 1825
Upgrading a SAS® 9.4 installation to a newer maintenance release is often a small project of its own. Mass upgrading hundreds of servers across several dozen environments, with demanding users and within narrow maintenance windows, requires a streamlined process and a high level of automation. This paper discusses upgrade strategies, outlines a bulk upgrade process, and shares experiences from the field. It is also a logical continuation of the SAS® Global Forum 2017 paper 814-2017, "Platform à la Carte: An Assembly Line to Create SAS® Enterprise BI Server Instances with Ansible," in which we discussed bulk installations of SAS 9.4.
Javor Evstatiev, EVS+C
Test-Driven Data Science: Writing Unit Tests for SASPy Python Data Processes
Code needs tested == True. We've all been there: tweak this parameter, comment another line out. But how do we maintain the expected function of our programs as they change? Unit tests! SASPy is a Python interface module to SAS®. It enables developers and data scientists alike to leverage SAS and Python by using both APIs to build robust systems. Oh yeah, we can submit SAS code too! This talk walks through a test-driven development (TDD) approach to a data pipeline powered by SAS and Python. The pytest framework and the unittest library are covered.
Stephen Siegert, SAS
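A minimal sketch of the test-first style the talk describes, using only the standard-library unittest module (pytest can also collect and run unittest.TestCase classes). The pipeline step `flag_high_values` and its tests are hypothetical. In a real SASPy pipeline, the input might come from a SAS data set (for example, through SASPy's `sd2df` method), but plain Python dicts are used here so that the tests run without a SAS session.

```python
import unittest


def flag_high_values(rows, threshold):
    """Hypothetical pipeline step: return copies of rows with a 'high' flag added."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    return [{**row, "high": row["value"] > threshold} for row in rows]


class FlagHighValuesTest(unittest.TestCase):
    def test_flags_rows_above_threshold(self):
        rows = [{"id": 1, "value": 5}, {"id": 2, "value": 15}]
        result = flag_high_values(rows, threshold=10)
        self.assertEqual([r["high"] for r in result], [False, True])

    def test_does_not_mutate_input(self):
        rows = [{"id": 1, "value": 5}]
        flag_high_values(rows, threshold=10)
        self.assertNotIn("high", rows[0])

    def test_rejects_negative_threshold(self):
        with self.assertRaises(ValueError):
            flag_high_values([], threshold=-1)
```

Saved as a module, the tests run with `python -m unittest <module>` or `pytest <module>.py`; in TDD, each test is written first and fails until the step is implemented.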
Text Analysis Accuracy and Ease in SAS® Text Miner versus the Python NLTK Sentiment Analysis Package
Session 2519
With machine learning conquering many facets of data analysis, jobs that used to be time-consuming are now streamlined into simple tasks. One such area is text analytics. With many companies receiving thousands of open-response complaints daily, text analytics helps companies know exactly what their customers need and how best to address those needs without spending hours reading through each individual response. In higher education, institutions constantly collect data about their students' experiences at the university. Much of this data is in the form of free-form text responses. With text analytics, institutions can cut the time spent analyzing this data by more than half and still achieve the same accuracy they would if they read and coded it manually. In this paper, we discuss the process of using text analytics to code and analyze open-response survey data in both SAS® Text Miner and the Natural Language Toolkit (NLTK) in Python, and we compare the two methods with regard to accuracy and user-friendliness. We also discuss other applications and benefits of text analytics for institutions that need to access large amounts of information stored as qualitative data, both effectively and efficiently.
Jacob Braswell, BYU Institutional Planning and Assessment
Text Analysis and Cluster Analysis of Airplane Crashes from 1908 to 2009
Session 2789
Generally speaking, flying on commercial airlines is considered very safe. But there is some risk in flying. Although plane crashes are rare, they are mostly fatal. According to The Telegraph (a United Kingdom newspaper), the odds of death per total number of passengers flown are 1 in 6 million. Although 2015 is considered the safest year in aviation history, there were 16 fatal crashes, leading to the deaths of 560 passengers. In 2016, there were 19 fatal crashes, leading to the deaths of 325 passengers. Even though the aviation industry has adopted many precautions to minimize such tragedies, crashes continue to happen. This paper clusters fatalities into several segments based on an analysis of the text summary for each crash. The text summary is released by the government after a crash is reported. Finding the major reasons associated with these casualties based on this text summary is the primary objective of this paper. I segmented the fatalities by phase of flight: Take Off (16%), Initial Climb (14%), Climb (13%), Cruise (16%), Initial Approach (12%), Final Approach (13%), and Landing (12%). I also segmented the causes of fatal airplane crashes into clusters: total pilot error (53%), other human error (6%), weather (12%), mechanical failures (20%), sabotage (9%), and other causes (1%). An open data set from Kaggle containing 5,268 airplane crashes with a total of 105,000 fatalities was used.
Ritesh Kumar Vangapalli, Oklahoma State University
Text Analytics Lessons Learned: When MAUDE Doesn't Talk to You
Data scientists are trained in analyzing data, including unstructured text data. Often, they need to mine insights from unstructured text data in unfamiliar industries. This paper articulates the lessons we learned as we analyzed data and successfully partnered with subject matter experts to inform and guide our dive into the medical device industry. Using SAS® Viya®, we text mined the narratives submitted with medical device failure reports to the US Food and Drug Administration (FDA). These reports are available in the Manufacturer and User Facility Device Experience (MAUDE) database. As you embark on your next text analytics project, use our lessons learned to strengthen your partnerships and insights.
Grace Heyne Lybrand, Zencos
Reid Baughman, Zencos Consulting, LLC
Ingrid Lundberg, Boston Scientific
Michael Swanson, Boston Scientific
The Anatomy of Clinical Trials Data: A Beginner's Guide
An audience that wants to learn more about clinical trials will appreciate this comprehensive introduction to the role of a SAS® programmer in Clinical Trials Phases I-III. The presentation covers the role of the Clinical Data Interchange Standards Consortium (CDISC) and the implications for capturing and reporting clinical trials data. Those who have just begun their careers in the pharmaceutical industry will also benefit from attending this presentation. If you have been working in the pharma industry for some time, this will be a refresher and could lead you to discover hidden gems. The presentation begins with an introduction to human clinical trials. A short history of the evolution of standards in clinical trials is provided. The construction of data sets based on current CDISC data standards is the next step. The talk is a microcosm of a clinical trial study. It covers the study protocol, electronic case report forms (eCRFs, used to capture data), and statistical analysis plans (SAPs), but focuses more on the Study Data Tabulation Model (SDTM) and Analysis Data Model (ADaM). The result is an appreciation of how such standardization leads to a more streamlined production of study-related tables, listings, and graphs.
Venky Chakravarthy, BioPharma Data Services
The Art of Accurate Reports (with Examples from SAS® Enterprise Guide®)
Session 1799
Many times, we find ourselves with an overwhelming amount of data at our fingertips. This paper can help users create accurate reports in SAS® Enterprise Guide® with a simple top-down approach that uses four steps: Envisioning, Planning, Creating, and finally, Testing. For this paper, we work with two data sets from the Florida Fish and Wildlife Commission.
Victoria Garcia, Florida Fish and Wildlife
The Art of Defensive Programming
Session 1791
This paper discusses how to cope with the following data scenario: the input data set is defined insofar as the variable names and lengths are fixed, but the content of each variable is in some way uncertain. How do you write a SAS® program that copes appropriately with data uncertainty?
Philip Holland, Holland Numerics
The Baker Street Irregulars Investigate: Discoveries Using Perl Regular Expressions and SAS®
The Baker Street Irregulars were a rag-tag group of street urchins who would gather intelligence from the noisy streets of London to help Sherlock Holmes in his quests. In this workshop, although we do not have the expert guidance of Holmes or the help of Wiggins, we work through examples of using SAS® DATA step Perl regular expressions (commonly called PRX functions) to see how we can gather intelligence from otherwise noisy text data. Attendees should have a good basic understanding of the DATA step; no other specific experience is needed.
Lisa Mendez, IQVIA Government Solutions
Peter Eberhardt, Fernwood Consulting Group Inc.
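Python's re module offers a close analog of the PRX functions: a pattern is compiled once (as PRXPARSE does) and then reused to find matches or substitute text (roughly the roles of PRXMATCH, a PRXNEXT loop with PRXPOSN, and PRXCHANGE). The sketch below pulls US-style phone numbers out of noisy text; the pattern, function names, and placeholder replacement are illustrative assumptions, not examples from the workshop.

```python
import re

# Compiled once, like PRXPARSE, then reused for every record (hypothetical pattern)
PHONE = re.compile(r"\(?(\d{3})\)?[\s.-]*(\d{3})[\s.-]*(\d{4})")


def find_phones(text):
    """Return every US-style phone number in text, normalized to ddd-ddd-dddd."""
    # findall returns the three capture groups of each match as a tuple
    return ["-".join(groups) for groups in PHONE.findall(text)]


def mask_phones(text):
    """Replace each phone number with a placeholder, like PRXCHANGE with s/.../.../."""
    return PHONE.sub("XXX-XXX-XXXX", text)


note = "Call (504) 555-1234 or 504.555.9876 after 5pm!!"
print(find_phones(note))
print(mask_phones(note))
```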
The Cox Hazard Model for Claims Data
Session 2445
The central piece of claims management is claims modeling. Insurers commonly use two strategies to analyze claims: the two-part approach, which decomposes claims cost into frequency and severity components, and the pure premium approach, which uses the Tweedie distribution. In this paper, we evaluate an additional approach to claims analysis: time-to-event modeling. We provide a general framework for modeling claims with the Cox hazard model. This model is a standard tool in survival analysis for studying the dependence of a hazard rate on covariates and time. Although the Cox hazard model is very popular in statistics, in practice the data to be analyzed often fail to satisfy the assumptions underlying this model. This paper proposes an approach to overcoming assumption violations. It is also a case study intended to indicate a possible application of the Cox hazard model to workers' compensation insurance, particularly the occurrence of claims (disregarding claim size).
Tanya Kolosova, InProfix Inc
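To make the model concrete: with a single covariate and no tied event times, the Cox partial log-likelihood is the sum, over observed events, of beta times that subject's covariate minus the log of the summed exp(beta x) over everyone still at risk at that time. The Python sketch below evaluates it. The claims interpretation (time until first claim, with policies that never claim treated as censored), the variable names, and the toy data are assumptions; the paper's approach to assumption violations is not shown. In SAS, PROC PHREG fits this model.

```python
import math


def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood for one covariate, assuming no tied event times.

    times  -- follow-up time for each policy (for example, days until first claim)
    events -- 1 if a claim occurred at that time, 0 if the policy was censored
    x      -- covariate value for each policy
    """
    if not (len(times) == len(events) == len(x)):
        raise ValueError("times, events, and x must have the same length")
    loglik = 0.0
    for i, t_i in enumerate(times):
        if not events[i]:
            continue  # censored policies contribute only through risk sets
        # Risk set: every policy still under observation at time t_i
        at_risk = sum(math.exp(beta * x[j]) for j, t_j in enumerate(times) if t_j >= t_i)
        loglik += beta * x[i] - math.log(at_risk)
    return loglik


# Evaluate the curve at a few beta values for toy data
times, events, x = [5, 8, 12, 20], [1, 0, 1, 1], [1, 0, 1, 0]
for beta in (-1.0, 0.0, 1.0):
    print(beta, round(cox_partial_loglik(beta, times, events, x), 4))
```

The estimated coefficient is the beta that maximizes this function; fitting software does that with iterative numerical optimization rather than by evaluating a handful of points.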
The DO's of Fantasy Football
Session 1874
Should I change my starting lineup? This is the dreaded question that all 40 million fantasy football participants have asked at one time or another. In particular, a player's injury report can make or break a participant's lineup for the week. The unlikely combination of fantasy football with DO loops can help alleviate this concern. Often overlooked because of their simplicity, DO loops are a vital part of your SAS® programming. When used correctly, they are a main driver of data and code productivity. In this paper, we discuss the use of DO loops with SAS features such as macro variables and text notifications. Each of these features plays a role in fulfilling your business needs faster and more efficiently.
Alexandria Mccall, SAS
The Ensemble of Neural Network and Gradient Boosting for the Prediction of Customer Profitability
Session 2618
This paper illustrates a two-stage approach for predicting customer profitability. The first stage builds a dichotomous model to predict the customer's likelihood of future purchase. The second stage builds a model with a continuous target variable to predict the conditional future profit generated by the customer, given that the customer makes a purchase. Both stages use the gradient boosting and neural network data mining techniques. In each stage, various ensemble combinations are tried, and the one with the lowest validation average squared error is chosen as the stage winner. The two winning models are then used jointly to predict future profit. In this analysis, Base SAS® is used for data manipulation and SAS® Enterprise Miner™ 13.2 is used for predictive modeling. It is evident that this two-stage modeling approach is robust in predicting customer profitability. Managerial implications are highlighted.
Sunny Lam, ANN Inc.
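The two-stage logic can be sketched as follows, under two assumptions the abstract does not spell out: each ensemble simply averages its members' predictions, and the stages are combined as expected profit = P(purchase) × E[profit | purchase]. The function names, model names, and toy numbers are hypothetical; this is not the SAS Enterprise Miner workflow used in the paper.

```python
from itertools import combinations


def average_squared_error(predictions, actual):
    return sum((p - a) ** 2 for p, a in zip(predictions, actual)) / len(actual)


def best_ensemble(candidates, actual):
    """Score every non-empty subset of models (averaging their predictions) on
    validation data and return (model_names, ase) for the lowest ASE."""
    best = None
    names = sorted(candidates)
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            averaged = [sum(candidates[m][i] for m in combo) / size for i in range(len(actual))]
            score = average_squared_error(averaged, actual)
            if best is None or score < best[1]:
                best = (combo, score)
    return best


def expected_profit(p_purchase, profit_if_purchase):
    """Combine the stage winners: E[profit] = P(purchase) * E[profit | purchase]."""
    return [p * m for p, m in zip(p_purchase, profit_if_purchase)]


# Stage 1 validation data (hypothetical): purchase flags and model probabilities
stage1 = {"gradient_boosting": [0.7, 0.1, 0.9], "neural_network": [0.9, 0.2, 0.7]}
print(best_ensemble(stage1, [1, 0, 1]))
print(expected_profit([0.5, 0.1], [100.0, 40.0]))
```

The same `best_ensemble` search would be run separately on the stage 2 profit models, scored only on customers who purchased.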
The Function Selection Procedure
Session 2390
The function selection procedure (FSP) finds a very good transformation of a continuous predictor to use in binary logistic regression. FSP was developed in the 1990s for applications in biostatistics. The methodology is fully presented in the book by Royston and Sauerbrei (2008). In connection with their book, Royston and Sauerbrei provided a SAS® macro to implement FSP. This SAS macro has many advanced features but is designed for the analysis of one variable at a time. A more efficient approach is needed for large-scale applications in marketing and credit risk. This paper presents an alternative macro, %FSP_8LR, which efficiently processes multiple predictor variables (for example, 50) with minimal passes through the data. Additionally, the methodology of FSP is extended in %FSP_8LR to the cumulative logit model. This paper includes a simulation study that explores and amplifies the significance testing component of FSP, which was developed in the 1990s for the binary logistic regression case.
Bruce Lund, Magnify Analytic Solutions
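For readers new to FSP, the transformations it searches over are fractional polynomials (FP). The Python sketch below uses the conventional power set from Royston and Sauerbrei, S = {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, with power 0 meaning log(x), builds FP1 and FP2 terms, and enumerates the 8 FP1 and 36 FP2 candidates. It does not fit the logistic models or run the sequence of significance tests that FSP uses to choose among them, and it is not the %FSP_8LR macro; the function names are hypothetical.

```python
import math

# Conventional fractional-polynomial powers; power 0 stands for log(x)
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, p):
    """One fractional-polynomial term for a positive predictor value x."""
    if x <= 0:
        raise ValueError("fractional polynomials require x > 0 (shift x first)")
    return math.log(x) if p == 0 else x ** p


def fp2_terms(x, p1, p2):
    """FP2 terms; a repeated power p gives x**p and x**p * log(x)."""
    if p1 == p2:
        return fp_term(x, p1), fp_term(x, p1) * math.log(x)
    return fp_term(x, p1), fp_term(x, p2)


def fp_candidates():
    """All FP1 powers and all FP2 power pairs (repeats allowed, order ignored)."""
    fp1 = [(p,) for p in FP_POWERS]
    fp2 = [(p1, p2) for i, p1 in enumerate(FP_POWERS) for p2 in FP_POWERS[i:]]
    return fp1, fp2


fp1, fp2 = fp_candidates()
print(len(fp1), "FP1 models and", len(fp2), "FP2 models")
print(fp2_terms(2.0, 0, 0))  # log(2) and log(2) squared
```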
The Game Is Played Away from the Ball
This presentation is designed to enhance awareness of the principles and techniques of establishing and maintaining excellent service quality in the technology domain. The discussion takes a unique approach, using basketball terminology, to motivate and remind employees and managers in the technology area about the importance of taking care of the little things of service.
Lee Manzer, Oklahoma State University
The Hunt for Gravitational Waves and How to Visualize What No Person has Seen Before
In 1915, Dr. Albert Einstein published his general theory of relativity, which predicted that cataclysmic events create ripples in the fabric of spacetime, called gravitational waves. One hundred years later, on September 14, 2015, scientists of the Laser Interferometer Gravitational-Wave Observatory observed gravitational waves for the first time, confirming a major prediction of Einstein's general theory of relativity. The signal was named GW150914. The first part of this talk focuses on the experience of a computer scientist surrounded by astrophysicists during the detection and announcement of gravitational waves. It covers the detectors at the Laser Interferometer Gravitational-Wave Observatory, an engineering marvel, and the activities following the detection up to the press conference on February 11, 2016. The second part is about the visualization of scientific data, which can help analyze and explore the data in ways that cannot be achieved with analytical methods. The talk explores the design of visualization systems and how to visualize data for expert and general audiences.
Hans-Peter Bischof, Rochester Institute of Technology
The Impact of Positive School Experiences and School SES on Depressive Symptoms in Chinese Children
Session 2551
This paper investigates the effects of teacher support, school connectedness, and school socioeconomic status (SES) on youth depressive symptoms. Data was collected from a sample of 881 Grade 6 students in 10 primary schools in Northwest China. Hierarchical linear modeling indicated that higher levels of teacher support, school connectedness, and school SES were significantly associated with fewer depressive symptoms. Further, the relationship between school-level SES and youth depressive symptoms varied by participants' perceived level of teacher support and perceived level of school connectedness. These findings underscore the importance of positive school experiences for child psychological outcomes. Implications for future research on Chinese youth are discussed.
Yang Yue, University of South Carolina
The Ins and Outs of Internal and External Host Names with SAS® Grid Manager
Session 1775
Network topologies commonly introduce an external host name that is known only outside the firewall, while a different host name is used only internally, behind the firewall. Third-party network products that require internal and external URLs, such as load balancers and firewalls, are often added after SAS® is deployed. The SAS web applications and SAS® Environment Manager components that are needed for SAS grid administration require additional configuration when different internal and external host names are used. It is important to understand the components of SAS grid monitoring and management in SAS Environment Manager and where each host name should be used. This paper clarifies the configuration that the components of a SAS grid need in order to communicate successfully in SAS Environment Manager when external URLs are in use.
Paula Kavanagh, SAS
The Life Expectancy of Phone Numbers in Escort Ads
Most people would be surprised by the extent to which the oldest profession depends on the newest technology. Websites that sell classified ads for escort services enable human traffickers to reach customers while shielding the traffickers from law enforcement. The phone numbers that connect potential customers to the victims of human trafficking are an important clue for finding ties to the traffickers and pimps who profit from these activities. Our paper describes how we used SAS® to take the first steps toward using phone numbers to uncover the traffickers. For example, we examined how long phone numbers appeared in ads, whether one phone number was used in different locations at the same time or in different locations at different times, and whether ad categories are associated with the length of time a phone number remains active. We used a custom web-scraping program to capture the text of nearly 700,000 escort ads from backpage.com. Our analysis focused on ads posted in eight major cities and numerous smaller towns in Louisiana between February 21, 2016, and January 26, 2017. Initial (simulated) results show that about 65% of phone numbers were still active after about 3 months, about 30% after 8 months, and 20% to 25% after 10 months.
James Van Scotter, Louisiana State University
Denise Mcmanus, The University of Alabama
Miriam Mcgaugh, Oklahoma State University
Joni Shreve, Louisiana State University
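Because some phone numbers were still appearing in ads when data collection ended, "still active after N months" figures are usually computed with a survival estimator that accounts for right-censoring. The abstract does not say which method the authors used in SAS; the Python sketch below shows the Kaplan-Meier product-limit estimator, one common choice, applied to hypothetical durations. The function names and data are illustrative assumptions, not the authors' code or results.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve as a list of (time, S(time)) pairs.

    durations: days between a phone number's first and last appearance in ads
    observed:  True if the number stopped appearing before data collection
               ended (an "event"); False if it was still active (censored)
    """
    events = Counter(d for d, o in zip(durations, observed) if o)
    leaving = Counter(durations)  # numbers leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        if events[t]:
            # Product-limit step: multiply by the fraction that survived time t
            survival *= 1 - events[t] / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def survival_at(curve, t):
    """Estimated share of numbers still active after t days."""
    value = 1.0
    for time, s in curve:
        if time > t:
            break
        value = s
    return value


# Hypothetical data: five phone numbers, two still active when collection ended
durations = [30, 60, 60, 90, 120]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
print(survival_at(curve, 75))  # approximately 0.6
```

Censored numbers stay in the risk set until their last sighting but never count as events, which keeps recently posted numbers from biasing the estimate downward.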
The New Era of Credit Risk Modeling and Validation
The core business of banking is lending. To make better credit decisions, banks build models that predict the likelihood that their borrowers will default (known as the probability of default). Technology developments bring new opportunities to build better credit risk models in two respects. First, new types of models and algorithms (machine learning or artificial intelligence) can be used in place of traditional logistic regression. These models are updated automatically as new data arrive, which challenges the way automated models are validated. Second, new sources of information and real-time updates have an important impact on the borrower's current risk profile and enable a move to real-time credit scoring. In my presentation, I walk you through these two aspects of credit risk modeling by comparing the performance of traditional models with that of the new automated ones. This is just the beginning of a new era of credit risk modeling. Today, many banks are driving innovation to enhance their credit risk management. We are there. Are you?
Boaz Galinson, Bank Leumi
The Power of Testing! An Automated SAS® Enterprise Guide® System for Testing
Over the years at Statistics Canada, testing tasks have been distributed among employees: unit tests are done by IT, calculations are certified by methodologists, and subject-matter analysts approve the process flow and data quality. Most of those tasks can, and should, be automated, especially those related to unit tests and regression tests. With the help of SAS® Grid Manager and SAS® Enterprise Guide® process flows, this paper presents an innovative way to automate testing. The main goal is to use the power of SAS® to avoid manual steps, maximize test coverage, and minimize the amount of testing. The paper discusses a naïve heuristic-based methodology for automatically generating test cases for an application that uses XML files as input metadata. Because it is impossible to cover every path in a large application, the objective is to concentrate tests on critical paths and to give some ideas for automating tests of non-critical paths. Secondly, the paper focuses on how to do quality assurance testing from test case definitions written in Microsoft Excel. The paper contains an in-depth explanation of how to architect the different types of tests: unit tests, test cases, test scenarios, and regression tests. Process flows, code examples, and quick tips are given. The limitations of the methodology are also presented and discussed. The final emphasis is on future research on SAS testing.
Karine Désilets, Statistics Canada
The Ultimate Data-Driven Guide to U.S. Canned Craft Beers Using SAS® Enterprise Miner® 14.2
The explosive growth in the number of online reviews has provided important guidance for shoppers who are considering the purchase of a product. However, the number of reviews and product choices can be overwhelming. To alleviate information overload, the ability to filter, emphasize, and efficiently deliver relevant information to the customer becomes crucial. Furthermore, product rating prediction based on reviews can help online shopping portals shape their recommendation systems and help marketers develop marketing strategies. Beer is one of the most popular drinks worldwide, and with the success of microbreweries in recent years, the range of available beers has become enormous. In this study, we provide a data-driven guide to U.S. canned craft beers and predict ratings based on online beer reviews. The Text Rule Builder was used to extract keywords of interest. Similarities between users were explored by using a collaborative filter. Decision tree, linear regression, and k-means clustering were used and evaluated for rating prediction. A linear regression model was selected because it had the lowest mean squared error.
Linyi Tang, Oklahoma State University
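The abstract mentions a collaborative filter for user similarity and model selection by mean squared error. The Python sketch below illustrates both ideas on hypothetical ratings: cosine similarity computed over the beers two users have both rated, and picking the candidate model whose validation predictions have the lowest MSE. It is not the SAS® Enterprise Miner® implementation; the function names, model labels, and data are assumptions.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity of two users over the beers both have rated.

    ratings_a, ratings_b: dicts mapping beer name -> rating
    """
    common = ratings_a.keys() & ratings_b.keys()
    if not common:
        return 0.0  # no overlap, so no evidence of similarity
    dot = sum(ratings_a[b] * ratings_b[b] for b in common)
    norm_a = math.sqrt(sum(ratings_a[b] ** 2 for b in common))
    norm_b = math.sqrt(sum(ratings_b[b] ** 2 for b in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted ratings."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_model(actual, candidates):
    """Name of the candidate whose predictions have the lowest MSE."""
    return min(candidates, key=lambda name: mean_squared_error(actual, candidates[name]))


# Hypothetical validation ratings and predictions from two candidate models
actual = [3, 4, 5]
candidates = {"decision_tree": [3, 3, 3], "linear_regression": [3, 4, 4]}
print(select_model(actual, candidates))  # linear_regression
```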
There's No Reason to Hide! Discovering the Benefits of Using Hidden Columns in SAS® Visual Analytics
SAS® Visual Analytics 8.2 includes many features that report designers can leverage to create dynamic, interactive reports. One of the most powerful new features is the hidden data role, which is available for many reporting objects. While the concept of hidden columns or data might initially seem a bit foreign, just a little instruction can empower report designers to leverage it and create their most dynamic SAS Visual Analytics reports yet. This paper demonstrates the various ways in which hidden data can be used in report design and the benefits each method provides to report consumers. Examples include robust mapping between multiple data sources, improved color-mapped display rules, and, most importantly, dynamic linking to external web resources based on the values in one or more hidden columns in a list table. Once armed with the knowledge of how to use the hidden data role in SAS Visual Analytics, report designers will be ready to create their most interactive, dynamic, and useful reports yet.
Michael Drutar, SAS
Time Series Analysis of Hate Speech in Social Media
Session 2852
Mining and analysis of social media text is a powerful tool for analyzing the thoughts, preferences, and actions of individuals and populations. While these methods are commonly used today in marketing and other business applications, Data for Good researchers have begun to apply them to the analysis of hate speech in social media. Different channels, including Twitter, Facebook, and Google searches, have been found to have distinctive characteristics that affect the types of models and analyses each can support. This paper provides a step-by-step description of how to create and deploy a Twitter API application and mine the data to extract tweets that match user-selected search terms, such as keywords and the user name of the person or organization sending the tweet. This methodology is used to investigate hate speech, modeling time series patterns with the aim of estimating the risk of subsequent acts of violence against persons targeted by the speech. These Data for Good analyses were performed using SAS® University Edition, a free version of SAS® available to students, professors, and non-profit researchers.
David Corliss, Peace-Work
Time Series Analysis: Did Dropping the Atomic Bomb Save Lives?
Session 2758
This paper aims to show how statistical analysis can be used in the field of history. Its primary focus is to show how SAS® can be used to perform a time series analysis of data pertaining to World War II. The purpose of this analysis is to test whether Truman's justification for the use of atomic weapons was valid. Truman believed that by using atomic weapons, he would prevent the unacceptable levels of U.S. casualties that a conventional invasion of the Japanese home islands would incur.
Rachael Becker, University of Central Florida
Time Series Feature Extraction
Feature extraction is the practice of enhancing machine learning by finding characteristics in the data that help solve a particular problem. For time series data, feature extraction can be performed using various time series analysis and decomposition techniques. In addition, features can be obtained by sequence comparison techniques such as dynamic time warping and by subsequence discovery techniques such as motif analysis. This paper surveys some of the time series feature extraction methods and demonstrates them through examples that use SAS/ETS® and SAS® Visual Forecasting software.
Michele Trovero, SAS
Michael Leonard, SAS
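Dynamic time warping, one of the sequence comparison techniques mentioned above, measures how similar two series are even when one is shifted or stretched in time relative to the other. The Python sketch below is the textbook dynamic-programming formulation with absolute difference as the local cost; it illustrates the idea and is not the SAS/ETS® or SAS® Visual Forecasting implementation. The function name is hypothetical.

```python
def dtw_distance(x, y):
    """Dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment of x[:i] with y[:j]; each step may
    advance in x, in y, or in both, which lets the alignment stretch time.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat y[j-1] against the next x value
                cost[i][j - 1],      # repeat x[i-1] against the next y value
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


# A delayed copy of the same shape aligns perfectly under warping
print(dtw_distance([0, 1, 2], [0, 0, 1, 2]))  # prints 0.0
```

A distance like this can serve as a feature, for example the distance from each series to a reference pattern, before the series is passed to a machine learning model.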
Tips and Techniques for Designing the Perfect Layout with SAS® Visual Analytics
Do you want to create better reports but find it challenging to design the best layout to get your idea across to your consumers? Building the perfect layout does not have to be a rocky experience. SAS® Visual Analytics provides a rich set of containers, layout types, size enhancements, and options that enable you to quickly and easily build beautiful reports. Furthermore, you can design reports that work across different device sizes or that are specific to a particular device size. This paper explores how to use the layout system and demonstrates what you can accomplish.
Ryan Norris, SAS
Brian Young, SAS
Tips and Techniques for Using the Random-Number Generators in SAS®
SAS® 9.4M5 introduces new random-number generators (RNGs) and new subroutines that enable you to initialize, rewind, and use multiple random-number streams. This paper describes the new RNGs and provides tips and techniques for using random numbers effectively and efficiently in SAS. Applications of these techniques include statistical sampling, data simulation, Monte Carlo estimation, and random numbers for parallel computation.
Rick Wicklin, SAS
Warren Sarle, SAS
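The ideas of initializing, rewinding, and keeping separate random-number streams can be illustrated outside SAS. The Python sketch below is an analog built on the standard `random.Random` class, not the SAS subroutines the paper describes; the `RandomStream` class and its (seed, key) scheme are hypothetical. Deriving streams from keyed seeds is a convenience for reproducibility and is not a substitute for generators designed to provide independent streams.

```python
import random


class RandomStream:
    """A reproducible stream of uniform random numbers that can be rewound.

    Each (seed, key) pair gets its own generator, so parallel workers can use
    separate streams and still reproduce their results.
    """

    def __init__(self, seed, key=0):
        # random.Random accepts a string seed and converts it to an integer seed
        self._rng = random.Random(f"{seed}:{key}")
        self._start_state = self._rng.getstate()

    def uniform(self):
        """Next value in [0, 1) from this stream."""
        return self._rng.random()

    def rewind(self):
        """Restart the stream from its first value."""
        self._rng.setstate(self._start_state)


stream = RandomStream(seed=2018, key=1)
first = [stream.uniform() for _ in range(3)]
stream.rewind()
again = [stream.uniform() for _ in range(3)]
print(first == again)  # True
```

In a parallel Monte Carlo run, worker k would use `RandomStream(seed, key=k)`, so each worker's results are reproducible regardless of how the work is scheduled.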
Tips and Tricks Using the SAS® Windowing Environment
If you work with SAS®, you probably find yourself repeating steps and tasks when developing a program. The SAS® windowing environment provides a customizable toolbar and other options that help reduce the time required to perform repetitive tasks. In this presentation, we discuss how to use the SAS windowing environment to put a recurring task a click or a word away, and we provide a few tips and tricks to help increase productivity. This presentation is targeted at new SAS users who might not be aware of all the wonderful options that are available in the SAS windowing environment to help increase productivity.
Raviteja Mudunuru Venkata, The Emmes Corporation
To Catch a Thief: The Use of Analytics for Employee Fraud Detection
New and inventive schemes to defraud companies of cash seem to appear daily. As a result, advanced fraud detection and prevention solutions have recently garnered heightened interest. Base SAS® provides the means for capable SAS® programmers to develop sufficiently sophisticated processes for detecting fraud and preventing the activities of individuals or groups intent on theft. Regrettably, some of the simplest yet most effective fraud schemes are perpetrated by trusted employees. This paper presents an actual case of employee-led fraud, as captured on video. Using just the procedures available in Base SAS, this paper describes the technique developed for detecting similar schemes by employees to defraud their company. The pros and cons of the analytics underlying the fraud detection process are explored.
Bruce Bedford, Oberweis Dairy, Inc.
To Show or Not to Show? Using SAS® Solutions to Maximize Patient Scheduling In Medical Clinics
Session 2685
At busy medical clinics, it is important to maximize the number of patients seen each day. Scheduled patients who fail to show are a common problem in medical clinics, causing lost revenue and disrupting daily operations. To combat this issue at several ambulatory clinics in the Southwest, machine learning algorithms were used to help the staff make informed decisions when scheduling patients. Using historical data, predictive models were built to identify patients who are unlikely to show up for an appointment. By identifying these patients with high confidence, the staff can quickly fill the open slots and help the clinics serve more people at an efficient pace. A powerful end-to-end SAS® solution was implemented: SAS® Office Analytics was used to prepare the historical data for modeling and to score new data based on the predictive models, SAS® Enterprise Miner™ was used to build, modify, and validate the machine learning algorithms, and SAS® Visual Analytics provided the mechanism to automatically load scored data into memory and populate informative dashboards. Medical staff can review the reports while scheduling patients and track the overall performance of the models. By using this solution, these clinics are able to serve more patients and produce additional revenue that would otherwise be lost. Because of the success of this project, the solution is being implemented at other specialty outpatient clinics across the region.
Mia Lyst, Pinnacle Solutions, Inc
Yijie Li, Pinnacle Solutions, Inc
Tools Designed to Aid in Quality Control of Administrative Data Research Projects
Session 2824
Using administrative healthcare data for research, including health services, outcomes, and comparative effectiveness research, is challenging because the data sources are not designed for research purposes. These data take different forms (for example, hospital discharge data from individual hospitals and insurance claims data from health insurers) and are often reformatted by a secondary entity, such as an individual state, the government, or a commercial data warehouse. Together, these elements create substantial variation in the structure and content of the data sets. Therefore, when administrative health databases are used for research, quality control becomes an important focus for data scientists. This paper discusses examples of quality control measures used by programmers at our organization. Specifically, we present two macros used at the start of a project by the lead programmer and two macros used by a second, quality control programmer after the analytic data set has been created.
Matthew Keller, Washington University in St Louis School of Medicine
Katelin Nickel, Center for Administrative Data Research, Washington University School of Medicine
Top 10 Tips for SAS® Enterprise Miner® Based on 20 Years' Experience
Over the past 20 years that I have been using SAS® Enterprise Miner™ and helping analysts with it, I have learned and developed many tips and tricks for ease of use, productivity, and just plain clever implementation. In this presentation, I cover the evolution of SAS Enterprise Miner from the original SAS/AF® software application to the current version, which integrates with both open-source software and SAS® Viya®. I share my top 10 tips for getting the most out of SAS Enterprise Miner, including my favorite node that no one seems to know about and how to implement more complex modeling techniques.
Melodie Rush, SAS
Tracking Down the Culprit of a SAS® Workspace Server Initialization Delay
Session 2003
When a SAS® Workspace Server session is launched, many components are activated and many actions are set in motion. Any number of components, for any number of reasons, can cause a delay in the workspace server initialization step. Identifying the problem can be time-consuming. Meanwhile, users who are affected by the delays are left waiting. When users first report the warning signs of a sneaky culprit that is causing the delay, their administrators must then be swift and agile in their efforts to eliminate it. Tracking down the cause and making the necessary changes to resolve the delay is an important mission that can be extremely challenging. This paper is intended for SAS administrators, system administrators, and network administrators, and the goal is to help you work together to investigate the warning signs in order to determine whether there is an initialization delay and, if so, what is causing it. The discussion includes steps to show you how to run network diagnostic tools that can reveal the typical causes of workspace initialization delays. The tools are applicable in both Linux and Windows operating environments. In addition, the paper describes how to enlist the help of SAS Technical Support and what to expect from that process.
Jessica Franklin, SAS
Session 2674
The most common sources of troublemaker records are outliers, missing values, and poor data quality. Troublemaker records often make business users doubt, or even lose confidence in, the system's results, which are produced by passing large volumes of data (hundreds of millions of records) through a SAS® engine. In this presentation, approaches for identifying troublemakers are discussed through real cases of credit loss estimation with credit stress testing models, where such records affect the stressed credit loss results at the bank level. This presentation is useful both to SAS users and to business analysts. SAS® Enterprise Guide® 5.1 and Base SAS® 9.3 were used in this project. The approach to identifying bad records in the given business case involved four steps. The conclusion is that Base SAS is an efficient tool for identifying troublemaker records when processing large volumes of data, and it supports high-quality loss estimation in stressed macroeconomic scenarios.
Flora Fang Liu, Bank of Montreal
Troubleshooting Your SAS Grid Environment
Session 2620
A SAS® Grid environment provides a highly available and resilient environment for your business. The challenge is that the more complex these environments become, the harder it can be to troubleshoot incidents and proactively optimize the configuration. Efficient troubleshooting builds user confidence and enables organizations to get the most value out of their investment. This paper focuses on strategies for identifying and resolving issues within a SAS Grid environment. It provides a best-practice approach to modifying the default configuration files to reveal which parts of the configuration are in effect. The resulting output can then be interpreted to optimize or troubleshoot the environment. This paper is for SAS administrators who wish to learn more about troubleshooting a SAS Grid environment. It covers both Microsoft Windows and UNIX topologies, using both shared and individual configuration directories.
Jason Hawkins, Amadeus Software Limited
TV versus Internet: How Do Americans Consume News?
Session 3606
Twenty years ago, only 12% of Americans consumed their news through the internet. Since then, innovations in information and communication technologies have made news accessible in different formats to millions of people all over the world. Businesses involved in content creation and dissemination, such as media organizations and marketing firms, need to be aware of constantly changing consumer preferences to remain competitive. To this end, our goal was to examine how demographic variables such as age, race, income, gender, and education affect the way Americans consume their news, specifically through the internet or television. We performed a logistic regression analysis on a data set from the General Social Survey. Our analysis indicates that, over the last decade, age has become less important in predicting Americans' source of news. Additionally, we found that those who get their news through television tend to be older, non-white, lower-income, and less educated, whereas those who get their news through the internet tend to be younger, white, more educated, and higher-income. We argue that organizations will find these results useful in their efforts to create relevant content and ads that appeal to specific target demographics.
Ethan Stager, California State University, Long Beach
Andrew Martinez, California State University, Long Beach
Jorge Flores, California State University, Long Beach
Jorge Nuno, California State University, Long Beach
Tweaking Your Tables: Suppressing Superfluous Subtotals in the TABULATE Procedure
Session 2585
The TABULATE procedure is a great tool for generating cross tab-style reports. It is very flexible but has a few limitations. One is that it offers no direct way to suppress superfluous subtotals. The ALL keyword creates a total or subtotal for the categories in one dimension. However, if there is only one category in the dimension, the subtotal is still shown, which merely repeats the detail line. This can look confusing in the final output. This talk demonstrates a method to suppress those superfluous totals by saving the output from PROC TABULATE with the OUT= option. That data set is then reprocessed to remove the unwanted totals by using the _TYPE_ variable, which identifies the total rows. PROC TABULATE is then run again against the reprocessed data set to create the final table. This technique highlights the flexibility of the SAS® programming language to get exactly the output you want.
Steve Cavill, Infoclarity
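The reprocessing step between the two PROC TABULATE runs is essentially a filter. The Python sketch below shows that logic on rows shaped like an OUT= data set with two class variables, assuming _TYPE_ is a string of 0/1 flags, one per class variable ("11" for detail rows, "10" for subtotals over the second variable, "00" for the grand total). The column names and data are hypothetical, and in the described technique the final table would still be produced by rerunning PROC TABULATE on the filtered data.

```python
from collections import Counter


def drop_superfluous_subtotals(rows, group_var):
    """Remove subtotal rows whose group contains only one detail row.

    rows: dicts with a "_TYPE_" key; "11" marks detail rows and "10" marks
    subtotals for each value of group_var. Other rows (e.g. "00") are kept.
    """
    detail_counts = Counter(r[group_var] for r in rows if r["_TYPE_"] == "11")
    return [
        r for r in rows
        if not (r["_TYPE_"] == "10" and detail_counts[r[group_var]] == 1)
    ]


rows = [
    {"region": "East", "product": "A", "_TYPE_": "11", "sales": 10},
    {"region": "East", "product": "B", "_TYPE_": "11", "sales": 5},
    {"region": "East", "product": None, "_TYPE_": "10", "sales": 15},
    {"region": "West", "product": "A", "_TYPE_": "11", "sales": 7},
    {"region": "West", "product": None, "_TYPE_": "10", "sales": 7},  # repeats detail
    {"region": None, "product": None, "_TYPE_": "00", "sales": 22},
]
kept = drop_superfluous_subtotals(rows, "region")
print(len(kept))  # 5
```

Only West's subtotal is removed, because West has a single product; East's subtotal and the grand total remain.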
Uncovering Big Opportunities in Big Data
Measurable customer experience (CX) is an integral part of an enterprise CX strategy. Metrics such as sales, retention, net promoter scores (NPS), and customer sentiment are industry standards for measuring CX. However, most of these fall into the "after the fact" or "after the experience" category. While the digital industry has been successful in measuring customer behavior through web analytics to assess CX and, more importantly, to modify its applications in real time, companies that are just undergoing digital transformation lag behind in understanding customer behavior because it occurs asynchronously with any digital experience. West Corporation, an Omaha-based provider of technology-enabled communication solutions, helps organizations bridge this gap by creating context-aware voice and digital applications and by providing predictive and prescriptive solutions to optimize customer experience. West Corp generates data, to the tune of 10 TB per day across hundreds of applications, when consumers and businesses interact with our platforms. In some cases, data must be stored for up to 12 years in a searchable format. This places West Corp in the realm of big data. The West Corp Center for Data Science manages and mines this data for insights and real-time application decisions. This paper is a survey of big data analytics applications geared toward enriching customer experiences while driving efficiencies and automation in the customer care space.
Dmitriy Khots, West Corporation
Jeremy Wortz, West Corporation
Understanding SAS® K Functions
Session 1902
SAS® offers a variety of string functions that help you get the most out of your character data. However, the traditional string functions, such as SUBSTR and INDEX, assume that every character in a SAS character column occupies one byte. The K functions bring the power of SAS string handling to all of your character data, no matter how many bytes a character occupies. This paper demonstrates the intrinsic characteristics of K functions by comparing them with the traditional SAS string functions. Your understanding of K functions will be further deepened by looking at them from different perspectives. Based on a full exploration of the SAS K functions, you will be able to write SAS programs that can run in any SAS environment using data from any language. Write it once, run it anywhere!
Leo (Jiangtao) Liu, SAS
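The difference between byte-based and character-based string handling is easy to see in any language that separates bytes from characters. In the Python sketch below, `byte_substr` mimics a function that assumes one byte per character, while `char_substr` counts characters, which is the behavior K functions such as KSUBSTR provide for multibyte data. Both function names are hypothetical, and UTF-8 is assumed as the encoding.

```python
def byte_substr(text, start, length, encoding="utf-8"):
    """Substring by byte positions (1-based), as if every character were one byte."""
    raw = text.encode(encoding)
    piece = raw[start - 1:start - 1 + length]
    # Cutting inside a multibyte character leaves invalid bytes; show them as U+FFFD
    return piece.decode(encoding, errors="replace")


def char_substr(text, start, length):
    """Substring by character positions (1-based), independent of byte width."""
    return text[start - 1:start - 1 + length]


word = "日本語"  # three characters, nine bytes in UTF-8
print(len(word), len(word.encode("utf-8")))  # 3 9
print(char_substr(word, 2, 1))  # 本
print(byte_substr(word, 4, 3))  # 本 (the second character starts at byte 4)
```

For single-byte text such as "SAS", both functions return the same result, which is why byte-based code often appears correct until it meets multibyte data.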
Understanding Security for SAS® Visual Analytics 8.2 on SAS® Viya®
Session 2130
Have you been using SAS® for more than a decade? Did you start with Base SAS® and then progress to more advanced tools like SAS® Enterprise Guide® and SAS® Data Integration Studio? Well, get ready for SAS® Viya®, the new cloud-enabled, in-memory analytics engine developed by SAS. SAS Viya is the new engine through which SAS products are being delivered, with SAS® Visual Analytics being one of many exciting new offerings. This paper broadly discusses several security topics for SAS Visual Analytics and SAS Viya, with emphasis on application, data, and content authorization. The target audience is SAS administrators who will be implementing SAS Viya security, but anyone looking to understand these concepts will benefit.
Antonio Gianni, SAS
Faisal Qamar, SAS
Understanding the Factors that Affect Customers' Choice of Sunscreen
Recently, we noticed that sunscreen products on the US market usually offer only UV and broad-spectrum protection. So we wondered: what are the main concerns that lead US consumers to purchase a specific product? The purpose of this study is to analyze the crucial factors that affect consumers' preferences for sun care products. We identified several factors, collected the relevant historical data about sunscreens, and studied the relationship between these factors and sun care product sales in each US state. We used SAS® Enterprise Guide® to perform the analysis, drawing on data from websites and integrating the information into the form we needed. We believe that this project can help sunscreen manufacturers develop more accurate manufacturing strategies and marketing plans. Based on the data analyzed, we offer manufacturers three suggestions: first, focus on the Southwestern region of the US and explore more types of demand for sunscreen products in this region; second, focus more on daily sunscreen products rather than other specialized types of sun care products; third, pay attention to and place more products in three major conduits in order to improve their marketing strategy.
Zihan Fan, Clark University
Xiaojing Zhao, Clark University
Understanding the Influence of Day of the Week on Written Reviews
Session 2654
Understanding customer needs is the most critical aspect of every business. Today, most companies use customer feedback to understand customer desires, so analyzing which factors affect customer feedback is important. This paper analyzes the effect that the day of the week has on the way users react. SAS® Enterprise Miner™ and SAS® Sentiment Analysis Studio are used to analyze reviews written on weekdays and weekends. By understanding the differences between the opinions expressed on weekdays and those expressed on weekends, a company can better understand its customer reviews. In this paper, nearly 10,000 reviews written on Amazon.com for two products in an electronics product category are explored. The rating given, the date on which a review was written, and the text of the review are the key fields on which the analysis is carried out. The paper systematically analyzes the customer reviews and shows a few scenarios in which terms that are considered negative on weekdays are considered positive on weekends. In one such scenario, "earbuds" seemed to be an issue on weekdays for one of the products analyzed, whereas the earbuds were received positively on weekends for the same product. More generally, careful analysis of weekend and weekday reviews clarifies the impact of the day of the week on negative reviews. This analysis can help a company provide synthesized services.
Sujal Reddy Alugubelli, Oklahoma State University
University CSU: A Web-Based Space Utilization Tool for Investigating Trends in University Room Use
Session 3605
Understanding how existing space is used across a university campus has applications for both immediate and long-term needs. This knowledge can be used to support plans for upgrades and expansions of the institution's physical plant. Historical data can help reveal trends in space usage, provided the data can be collected and organized uniformly. Additionally, the fundamental characteristics that define the potential use of each space must be included in order to understand trends in the use of specialized spaces within the physical plant. With such information in hand, an interactive, web-based dashboard for assessing campus space utilization (CSU), called University CSU, was developed as a tool for visualizing and optimizing room use across a university.
Hilary Melroy, University of North Carolina Wilmington
Michelle Page, University of North Carolina Wilmington
Brittany Palmer, University of North Carolina Wilmington
Gregory Terlecky, University of North Carolina Wilmington
Unleash the Power of the REPORT Procedure with the ODS EXCEL Destination
Session 2479
A new Output Delivery System (ODS) destination for creating Microsoft Excel workbooks is available with SAS® 9.4M3. This destination is an extremely easy and handy tool for producing ad hoc as well as production Excel reports. The ODS EXCEL destination has several advantages over the ODS ExcelXP tagset. With the ODS EXCEL destination, you can bring all those powerful features available with the REPORT procedure, such as predefined styles, trafficlighting, custom formatting, and compute block flexibility, straight into your Excel reports. Once you start using the ODS EXCEL destination, you will quickly realize that the PRINT procedure is not sufficient to meet all the formatting demands for your Excel reports. This paper covers various techniques that you can use with PROC REPORT and the ODS EXCEL destination, to make your Excel reports pretty and publication-ready!
Devi Sekar, RTI International
Unleashing the Power of SAS® Visual Analytics: Warranty Analytics in the Automobile Industry
Session 2905
Performing warranty analytics in the automobile industry can be cumbersome because it requires combining warranty claims data with production and billing data to understand the rates at which warranty claims are received or payments are made toward those claims. However, the data needed for this type of analysis typically comes from different transaction systems: one system records the production and sales of the vehicles, and another contains information about failures and warranty claims for those vehicles. By using a combination of SAS® products, we integrated the data from these two systems, enabling our client to perform the desired warranty analytics. We used various unconventional programming techniques to combine and manipulate the data into the desired form for use with SAS® Visual Analytics. This paper showcases innovative approaches our team deployed to treat, manipulate, and harmonize loosely related data and to facilitate different types of analyses on this data by using SAS Visual Analytics. A distinctive programming methodology was used in both data manipulation (with Base SAS® and advanced SAS® coding) and report building (using parameters and calculated items). Developers and consultants can adopt these approaches to create functionality that isn't directly available in SAS® Visual Analytics 7.3 or earlier versions.
Prateek Singh, Bristlecone India Ltd
Paresh Rodrigues, Bristlecone India Ltd
Saket Shubham, Bristlecone India Ltd
Shiv Kumar, Mahindra and Mahindra
Using Arrays to Quickly Perform Fuzzy Merge Look-Ups: Case Studies in Efficiency
Session 2396
Merging two data sets when a primary key is not available can be difficult. The MERGE statement cannot be used when BY values do not align, and data set expansion to force BY value alignment can be resource intensive. The use of DATA step arrays, as well as other techniques such as hash tables, can greatly simplify the code, reduce or eliminate the need to sort the data, and significantly improve performance. This paper walks through two different types of examples where these techniques were successfully employed. The advantages are discussed, as are the syntax and techniques that were applied. The discussion will enable readers to extrapolate further to other applications.
Art Carpenter, California Occidental Consultants
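The abstract does not reproduce the paper's DATA step code. The Python sketch below illustrates the key-indexed array idea it describes. It is an analogy, not SAS and not the paper's implementation. A hypothetical table of non-overlapping day ranges is loaded once into an array indexed directly by day number, so each claim is matched with a single lookup, with no sorting and no BY-value alignment. The `periods` table, the integer day encoding, and the function names are all invented for illustration.

```python
# Hypothetical lookup table: non-overlapping coverage periods given as
# (start_day, end_day, plan_id), with days as small non-negative integers.
periods = [(0, 99, "A"), (100, 199, "B"), (250, 365, "C")]

def build_day_array(periods, max_day):
    """Expand the ranges once into a list indexed directly by day number,
    similar in spirit to loading a key-indexed array a single time."""
    day_to_plan = [None] * (max_day + 1)
    for start, end, plan in periods:
        for day in range(start, end + 1):
            day_to_plan[day] = plan
    return day_to_plan

def lookup(day, day_to_plan):
    """Direct-address lookup: no sorting and no BY-value alignment needed.
    Days outside the array, or in gaps between periods, return None."""
    if 0 <= day < len(day_to_plan):
        return day_to_plan[day]
    return None

day_to_plan = build_day_array(periods, max_day=365)
claim_days = [5, 150, 220, 300]
print([lookup(d, day_to_plan) for d in claim_days])  # ['A', 'B', None, 'C']
```

The trade-off is memory proportional to the key range. When keys are sparse or non-integer, a hash table (a dict in Python, a hash object in SAS) is the usual alternative, which is the other technique the abstract mentions.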
Using Cluster Analysis to Maximize Workplace Design Effectiveness
Session 2436
While it is generally accepted in the design industry that workspaces should be thoughtfully planned around the workers who occupy them and the types of work they do, space, standards, and budget limit the number of unique workspace options that are feasible. By applying a cluster analysis method to survey data related to the type of work individuals do, job roles are categorized into work style groups with distinct workplace needs and characteristics. These work style profiles are used to inform and develop design strategies to best support the various types of workers in an organization, within spatial and budgetary parameters. This presentation outlines the considerations for selecting and combining variables for analysis, exploration, and creation of the clustering pattern, investigation of unique cluster characteristics, as well as visualization techniques related to this method. The SAS® procedures used include the FASTCLUS, CANDISC, SGPLOT, and GLM procedures.
Renae Rich, HDR
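PROC FASTCLUS performs k-means style (nearest centroid) clustering. For readers new to the method, the Python sketch below shows plain k-means (Lloyd's algorithm) on hypothetical two-dimensional work-style scores. It is not FASTCLUS itself, which adds its own seed selection and options. The data, starting centroids, and function name are invented for illustration.

```python
import math

def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm): assign each point to its nearest
    centroid, then move each centroid to the mean of its assigned points."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties out
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical survey scores: (share of focused work, share of collaboration).
points = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.2, 0.8), (0.1, 0.9), (0.15, 0.85)]
labels, centroids = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the survey variables would normally be standardized first (for example, with PROC STDIZE in SAS) so that no single variable dominates the distance.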
Using Cox Proportional Hazard Model to Predict Failure: Practical Applications in Multiple Scenarios
Session 2859
This presentation highlights practical applications of Cox proportional hazards modeling in multiple scenarios. Survival analysis helps in estimating time to event for a group of individuals or between two or more groups, and it helps to assess the relationship of covariates to time to event. Survival analysis has an advantage over t tests or regression analyses when comparing time to event because it does not ignore censoring. It also has an advantage over logistic regression: while still comparing the proportion of events, it does not ignore time as a factor. In this presentation, the PHREG procedure (which uses Cox's partial likelihood method to estimate regression models with censored data) is used to develop a generalized model that can be used and interpreted in multiple practical business scenarios. These include predicting customer churn, predicting patient longevity on a drug or treatment, and predicting project outcomes (breakthrough or incremental innovations). The model results can be used in these scenarios because we model the effect of predictors and covariates on the hazard rate while leaving the baseline hazard rate unspecified. No assumption about or knowledge of absolute risk is required, and the model can be used on the total study population. This also makes it easier to compare different groups. Sample model result interpretations and applications are discussed in detail.
Sudeep Kunhikrishnan, EXL Service
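The abstract relies on one property: covariates act multiplicatively on the hazard, while the baseline hazard is left unspecified. That property can be seen in the standard Cox model and its partial likelihood, stated below in the usual textbook notation and assuming no tied event times. The symbols are ours, not the presenter's.

```latex
\begin{aligned}
h(t \mid \mathbf{x}_i) &= h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right) \\
L(\boldsymbol{\beta}) &= \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_j\right)}
\end{aligned}
```

Here δ_i = 1 marks an observed event (0 for a censored observation), and R(t_i) is the set of subjects still at risk at time t_i. Because h_0(t) cancels from every ratio, β can be estimated without specifying the baseline hazard. exp(β_k) is then the hazard ratio for a one-unit increase in x_k.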
Using Customer Lifetime Value Models in SAS®
In this session, I discuss how to use customer lifetime value (CLV) models to make better business decisions. I give an overview of the different approaches to modeling CLV and discuss when each is appropriate. I cover the simple retention, general retention, and migration models. The models are illustrated with SAS® code using data sets from real companies.
Edward Malthouse, Northwestern University
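The abstract does not give formulas. The simple retention model below follows one common textbook form, assumed here rather than taken from the session. A customer yields a constant margin m per period, is retained each period with constant probability r, and is discounted at rate d, with the first margin received now. That gives CLV = sum over t of m r^t / (1 + d)^t, which for an infinite horizon sums to m(1 + d) / (1 + d - r). The function name and inputs are hypothetical.

```python
def clv_simple_retention(margin, retention, discount, periods=None):
    """Simple retention CLV under an assumed textbook form: margin m is earned
    in each period t = 0, 1, ... the customer is still active, weighted by the
    retention probability r**t and discounted by (1 + d)**t.
    periods=None uses the closed form of the infinite series."""
    if not 0.0 <= retention <= 1.0:
        raise ValueError("retention must be a probability")
    if periods is None:
        # sum_t (r / (1 + d))**t = (1 + d) / (1 + d - r), valid when r < 1 + d.
        return margin * (1 + discount) / (1 + discount - retention)
    return sum(margin * retention**t / (1 + discount)**t for t in range(periods))

# Hypothetical inputs: $100 margin per period, 80% retention, 10% discount rate.
print(round(clv_simple_retention(100, 0.8, 0.1), 2))             # 366.67
print(round(clv_simple_retention(100, 0.8, 0.1, periods=3), 2))  # 225.62
```

The general retention and migration models named in the abstract relax the constant-retention assumption, so this sketch covers only the simplest case.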
Using Graph Template Language and R for High-Quality Publication Plots
The Graph Template Language (GTL) is a powerful SAS® tool to create sophisticated plots. There are many features in GTL that one can use to build plots with high-quality visual effects. Besides SAS, R is also a frequently used tool. This paper explores some GTL techniques for generating a publication-quality graph by creating and combining a pie chart and a bar chart, fine-tuning axis and plot position, and embedding text for clarification. Step-by-step instructions for making this graph are shown in both GTL and R to demonstrate how certain graphics elements and effects can be accomplished using either. There are numerous software applications for plotting scientific graphs. Some people use SAS to prepare the data set and rely on other software for plotting the graph. This approach involves converting the SAS data set to other data formats to facilitate use with different software. Companies sometimes contract outside vendors for plotting scientific graphs. However, by taking advantage of the capabilities of SAS and R for generating high-quality publication plots, many of these tasks can be done in-house, which makes a good business case for time and cost savings, and for data protection.
Huei-Ling Chen, Merck
Jeff Cheng, Merck
Using Hash Tables to Manage Your Macro Language Control Files
Session 2399
When dynamically controlling an application or process by using SAS® macros, it is often advantageous to use a combination of metadata control files and list processing techniques. Typically, the control file is read once into a macro variable list. But what if your information source is a large table and, instead of accessing it sequentially, you need to first select specific information based on some supplied criteria? It would be very useful to be able to load the table into memory once, then access only the portions needed at a given point in time, and to do this without creating any macro variable lists. Learn how this can be done using a combination of hash tables and FCMP functions and routines, along with the macro language, to create memory-resident lookup tables. The use of control files is introduced in Rosenbloom and Carpenter (2015), and list processing is further discussed in Fehd and Carpenter (2007).
Art Carpenter, California Occidental Consultants
Using Information Value, Information Gain, and Gain Ratio for Detecting Two-Way Interaction Effect
Session 2528
An interaction effect occurs when the impact of one attribute on a dependent variable depends on the value of another attribute. The presence of an interaction effect between two attributes often weakens, dissolves, or even distorts the predictive power of either or both attributes if used alone in predicting the outcome. Consequently, some valuable information is discarded or ignored. Examining a limited number of attributes for interaction effects can be done manually, but the same task involving numerous attributes can be challenging. Drawing on information theory, this paper suggests the combined use of Information Value, Information Gain, and Gain Ratio to detect two-way interaction effects at the preliminary stage of variable screening and selection. We also extend the methodology to continuous dependent variables. A SAS® process is introduced that automatically screens all pairs of attributes with minimal manual handling by users. The SAS output ranks all pairs of attributes by the magnitude of their interaction effect and offers suggestions on variable treatment for downstream analysis or modeling.
Alec Zhixiao Lin, loanDepot
Alec Lin, Loan Depot
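The abstract names the three measures without giving formulas. The Python sketch below uses their standard definitions for a binary target:

- Information Value is computed from the weight of evidence.
- Information Gain is the reduction in entropy.
- Gain Ratio is Information Gain divided by the split information.

It illustrates the interaction idea by scoring a crossed attribute (x1, x2) alongside each attribute on its own. The XOR toy data and the crossing step are assumptions for illustration. The paper's actual screening and ranking rules may differ.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def information_gain(x, y):
    """H(y) minus the weighted entropy of y within each level of x."""
    n = len(y)
    conditional = 0.0
    for level in set(x):
        subset = [yi for xi, yi in zip(x, y) if xi == level]
        conditional += len(subset) / n * entropy(subset)
    return entropy(y) - conditional

def gain_ratio(x, y):
    """Information gain normalized by the split information H(x)."""
    split_info = entropy(x)
    return information_gain(x, y) / split_info if split_info > 0 else 0.0

def information_value(x, y):
    """IV for a binary target (1 = bad, 0 = good). Assumes every level of x
    contains both outcomes; otherwise the log term is undefined."""
    bads = sum(y)
    goods = len(y) - bads
    iv = 0.0
    for level in set(x):
        b = sum(1 for xi, yi in zip(x, y) if xi == level and yi == 1)
        g = sum(1 for xi, yi in zip(x, y) if xi == level and yi == 0)
        pg, pb = g / goods, b / bads
        iv += (pg - pb) * math.log(pg / pb)  # (%good - %bad) * WoE
    return iv

# Toy XOR data: neither attribute alone predicts y, but the pair does.
x1 = [0, 0, 1, 1] * 5
x2 = [0, 1, 0, 1] * 5
y = [a ^ b for a, b in zip(x1, x2)]
pair = list(zip(x1, x2))
print(information_gain(x1, y), information_gain(x2, y))  # 0.0 0.0
print(information_gain(pair, y), gain_ratio(pair, y))    # 1.0 0.5
```

On this toy data the crossed attribute has pure cells, so its Information Value is undefined. Real screening code typically bins or smooths counts before taking the log.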
Using Maps with the JSON LIBNAME Engine in SAS®
Session 1734
This paper serves as an introduction to reading JSON data via the JSON LIBNAME engine in SAS®. The engine includes an automap feature that dynamically reads the data into a generic data set. This paper goes into detail about specifying and creating your own map, which is effectively how to build custom data sets from the JSON data. It covers the input options that can be specified in the JSON map, as well as the effects those changes produce on the generated SAS data set.
Andrew Gannon, The Financial Risk Group
Using Market Basket Analysis in SAS® Enterprise Miner® to Make Course Enrollment Recommendations
Market basket analysis, an example of association rule mining or affinity analysis, is used most widely in marketing to target customers by identifying the products they purchase in combination. Discovery of existing purchase patterns allows for better product placement, targeted marketing, and improved product recommendations. This data mining technique has also been applied to the analysis of credit card and other service purchases in fraud detection, medical insurance claims, and event promotion. In the fall of 2013, the University of Oklahoma implemented flat-rate tuition, in which students enrolled in 12 or more credit hours pay a flat rate equal to 15 credit hours of tuition and mandatory fees. Students who pay the flat rate but enroll in fewer than 15 credit hours can bank those unused hours for enrollment in the following summer term. Here we consider the application of market basket analysis to student course enrollment. Market basket analysis via the Association node in SAS® Enterprise Miner™ enables us to identify and capitalize on existing enrollment patterns. We apply the resulting association rules to current fall and/or spring enrollment to generate course enrollment recommendations for the coming summer term. Encouraging students to continue their studies through the academic year leads to increased retention and higher graduation rates, and it helps students use banked credit hours in our flat-rate tuition system that they might otherwise forgo.
Shawn Hall, University of Oklahoma
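Association rule output is usually summarized by support, confidence, and lift. The Python sketch below shows what those numbers mean for enrollment data by computing them for one rule over a handful of hypothetical student course sets. The course codes and function name are invented, and this is not the Association node's implementation.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence, and lift of the rule antecedent -> consequent,
    where each basket is the set of courses one student took."""
    a, c = set(antecedent), set(consequent)
    n = len(baskets)
    n_a = sum(1 for b in baskets if a <= b)         # baskets containing A
    n_c = sum(1 for b in baskets if c <= b)         # baskets containing C
    n_ac = sum(1 for b in baskets if (a | c) <= b)  # baskets containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares confidence with the baseline rate of C; > 1 means A and C
    # co-occur more often than they would if they were independent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical enrollments, one set of course codes per student.
baskets = [
    {"CALC1", "CHEM1", "ENGL1"},
    {"CALC1", "CHEM1"},
    {"CALC1", "PHYS1"},
    {"CHEM1", "ENGL1"},
    {"CALC1", "CHEM1", "PHYS1"},
]
print(rule_metrics(baskets, {"PHYS1"}, {"CALC1"}))  # (0.4, 1.0, 1.25)
```

A rule with lift above 1, such as PHYS1 -> CALC1 here, is the kind of pattern that could drive a summer-term recommendation. The minimum support and confidence thresholds are a modeling choice.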
Using ODS EXCEL to Integrate Tables, Graphics, and Text into Multi-tabbed Microsoft Excel Reports
Session 2765
Do you have a complex report involving multiple tables, text items, and graphics that could best be displayed in a multi-tabbed spreadsheet format? The Output Delivery System (ODS) EXCEL destination, introduced in SAS® 9.4, enables you to create Microsoft Excel workbooks that easily integrate graphics, text, and tables, including column labels, filters, and formatted data values. In this paper, we examine the syntax used to generate a multi-tabbed Excel report that incorporates output from the REPORT, PRINT, SGPLOT, and SGPANEL procedures.
Caroline Walker, Warren Rogers Associates
Using ODS TAGSETS.RTF to Fit Text and Multiple Graphs on One Page
A common challenge is to fit a lot of information on a single page of an RTF document. When the usual solutions with the Output Delivery System (ODS) RTF statement don't work, the ODS TAGSETS.RTF statement offers an alternative solution for managing space on the page efficiently. For example, it is possible to fit two graphs on one page, along with a title and subtitle for the page and captions for each graph. This can be done by using the options that come with ODS TAGSETS.RTF and by creating a style template with the TEMPLATE procedure to decrease the space between elements on the page.
Teresa Wilson, The Emmes Corporation
Using Predictive Analytics to Improve a Student Advising System
For decades, student success has been a focus in higher education. In this paper, institutional researchers discuss a predictive modeling process that could identify students at risk of failing a major STEM course at a top public university. SAS® Enterprise Miner™ and SAS® Visual Analytics were applied to predict and visualize student academic performance. This study demonstrates the possibility of using predictive analytics tools as a Student Early Alert System for decision-making support.
Thanuja Sakruti, University of Connecticut
Youyou Zheng, University of Connecticut
Using SAS® 9.4M5 and the Varchar Data Type to Manage Text Strings Exceeding 32 kb
Session 2690
Database systems support text fields much longer than the 32 KB (32,767-byte) limit traditionally supported by SAS®. These fields can be captured as substrings using SAS and explicit pass-through logic. However, managing the resulting substrings is often challenging. This paper shows how the varchar data type, introduced in SAS 9.4M5, can provide improved functionality. The paper addresses common management and storage issues associated with the extraction of longer strings, using the varchar field type and the V9 engine.
John Schmitz, Luminare Data LLC
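Before varchar, a long database field had to be carried as several CHAR substrings, each within SAS's 32,767-byte CHAR limit. The Python sketch below shows the bookkeeping that approach implies. It is not SAS and not the paper's pass-through code. It splits text into pieces whose encoded size fits the limit without cutting a multibyte UTF-8 character, then reassembles them. UTF-8 as the encoding and the function name are assumptions.

```python
MAX_CHAR_BYTES = 32767  # maximum length of a SAS CHAR variable, in bytes

def split_utf8(text, max_bytes=MAX_CHAR_BYTES):
    """Split text into pieces whose UTF-8 encoding is at most max_bytes,
    never cutting a multibyte character in half (assumes max_bytes >= 4)."""
    pieces, current, size = [], [], 0
    for ch in text:
        width = len(ch.encode("utf-8"))
        if size + width > max_bytes:
            pieces.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += width
    if current:
        pieces.append("".join(current))
    return pieces

# Hypothetical long field: 2-byte characters followed by ASCII text.
long_text = "\u00e9" * 20000 + "a" * 30000
parts = split_utf8(long_text)
print(len(parts), [len(p.encode("utf-8")) for p in parts])  # 3 [32766, 32767, 4467]
assert "".join(parts) == long_text  # reassembly is lossless
```

Note that the first piece stops at 32,766 bytes because the next 2-byte character would overflow the limit. This kind of boundary handling is what a single varchar column avoids.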
Using SAS® Cloud Analytic Services with Raspberry Pi Time-Lapse Photography
Session 2340
While time-lapse image collection is a trivial problem to solve, it presents us with yet another Internet of Things (IoT) data stream that can be easily integrated and built out using the SAS® Cloud Analytic Services (CAS) Image action set. Since the release of the first wave of Raspberry Pi computers in 2012 (even earlier if we include the Arduino), the popularity of low-cost, single-board computers has surged at a breakneck pace. While initially intended for K-12 computer science education, the Raspberry Pi has been featured prominently as the core component of a massive number of electronics projects. SAS Pittsburgh, currently located on the 29th floor of PPG Place, provides an excellent vantage point for time-lapse photography of the confluence of the Allegheny and Monongahela Rivers. In this session, we walk through how to set up and integrate these two technologies.
Matthew Parmelee, SAS
Using SAS® Enterprise Guide® Projects in SAS® Studio
You've started learning the SAS® Studio interface because it seems to be a good addition to how you already work with SAS®: being able to access your stuff via your browser from other machines is useful. Opening your SAS program files (.SAS) is easy, but did you know you can also open your existing SAS® Enterprise Guide® projects (.EGP files) in SAS Studio? When you open a SAS Enterprise Guide project in SAS Studio, the process flows in the project are extracted and converted to process flows in SAS Studio. Any elements from the SAS Enterprise Guide project that are not supported by SAS Studio are either converted to different node types or omitted from the process flow, and you can see the status in the conversion report that is created. This feature was experimental in SAS Studio 3.6 and is production in SAS Studio 3.7.
Amy Peters, SAS
Jennifer Jeffreys-Chen, SAS
Marie Dexter, SAS
Jennifer Tamburro, SAS
Using SAS® Enterprise Miner® for Categorization of Customer Comments to Improve Services at USPS
Delivering high-quality service and providing excellent customer experiences are performance outcome goals the U.S. Postal Service has established to measure corporate strategy success and continuous improvement efforts. Social media has opened the door for customer engagement and decision-making. With the help of Twitter, Facebook, and Yelp, government agencies are better informed about how customers feel about their service and experience. Using Yelp data, we text mined comments about U.S. Postal Service customer service, retail service, mail delivery, and facility services using SAS® Text Miner in SAS® Enterprise Miner™ 7.1. The aim of this paper is to provide ways to categorize consumer comments about U.S. Postal Service services in order to improve customer experiences at stations.
Olayemi Olatunji, U.S. Postal Service Office of Inspector General
Using SAS® for Multiple Imputation and Analysis of Longitudinal Data
Session 1738
This session presents the use of SAS® to address missing data issues and to analyze longitudinal data. Appropriate multiple imputation and analytic methods are evaluated and demonstrated through an analysis application that uses longitudinal survey data with missing data issues. The analysis application demonstrates the detailed data management steps required for imputation and analysis, multiple imputation of missing data values, subsequent analysis of the imputed data, and, finally, interpretation of the longitudinal data analysis results. Key SAS tools are highlighted, including DATA step operations to produce the needed data structures and the MI, MIANALYZE, MIXED, and SGPLOT procedures.
Patricia Berglund, University of Michigan Institute for Social Research
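PROC MIANALYZE combines per-imputation results using Rubin's rules. The Python sketch below applies that combination to a single scalar estimate with hypothetical inputs. It omits the degrees-of-freedom calculation that MIANALYZE also reports, and it requires at least two imputations.

```python
import statistics

def combine_rubin(estimates, variances):
    """Pool m completed-data estimates and their squared standard errors
    with Rubin's rules; returns (pooled estimate, total variance)."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical results from m = 3 imputed data sets.
est, var = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(var, 4))  # 2.0 1.8333
```

The (1 + 1/m) factor inflates the between-imputation variance to account for using a finite number of imputations. This is why the pooled standard error exceeds the standard error from any single imputed data set.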
Using SAS® Fraud Framework for Government to Identify Fraud in Brazil's Federal Capital
Session 2723
Avoiding fraud in bids, identifying links between people and companies, and inhibiting the illegal accumulation of public offices are the main issues to which SAS® Fraud Framework for Government is applied. The Tribunal de Contas do Distrito Federal is the public institution responsible for overseeing the public assets and resources of the Federal Capital, promoting ethics in public management in order to guarantee the full exercise of citizenship. It has the constitutional competence to supervise and judge the proper and regular application of public resources by administrators and other officials, assisting the Legislative Chamber of the Federal District in the exercise of external control. Identifying improper actions by individuals and companies in public procurement requires data organization and the application of analytical intelligence. Fraud can be discovered by identifying links between people and companies, and in the supply of products and services between groups of companies. In general, a public official of the Federal District cannot hold more than one public office. However, statistical analysis revealed that several public servants are circumventing this rule and working in other institutions. This work presents the methodology and procedures used in implementing a SAS® software solution to identify irregularities in the management of the Federal District.
Romulo Alvim, Tribunal de Contas do Distrito Federal
Rafaella Cunha, Maxtera
Vilcemar Maia Filho, Court of Accounts of Federal District
Using SAS® OnDemand for Academics: Ten Tips for Success
Session 1947
SAS® OnDemand for Academics is a free resource offered to students, professors and teachers, and independent learners. This online delivery model provides the opportunity to learn valuable SAS programming skills as well as tools for developing analytics knowledge. Because SAS OnDemand for Academics is cloud-hosted, little setup is required. This paper shares tips to maximize the chance of success with SAS OnDemand for Academics and to minimize inconveniences such as problems logging on, running programs that use Java, and handling issues with instructor data. Drawn from years of experience helping customers, the topics include account setup, dealing with data, and differences among the products that are offered. The discussion also covers what you need to know about how SAS OnDemand for Academics is distributed geographically. These tips let teachers, professors, students, and independent learners benefit from the experience of others.
Randy Mullis, SAS
Using SAS® Text Analytics to Assess International Human Trafficking Patterns
The US Department of State (DOS) and other humanitarian agencies have a vested interest in assessing and preventing human trafficking in its many forms. A subdivision within the DOS annually releases public-facing Trafficking in Persons (TIP) reports for more than 200 countries. These reports are entirely freeform text, though a rich structure is hidden within the text. How can decision-makers quickly tap this information for patterns in international human trafficking? This paper showcases a strategy of applying SAS® Text Analytics to explore the TIP reports and apply new layers of structured information. Specifically, we identify common themes across the reports, use topic analysis to identify structural similarities across reports, identify source and destination countries involved in trafficking, and use a rule-building approach to extract these relationships from freeform text. We subsequently depict these trafficking relationships across multiple countries in SAS® Visual Analytics, using a geographic network diagram that covers the types of trafficking as well as whether the countries involved are invested in addressing the problem. This ultimately provides decision-makers with big-picture information about how best to combat human trafficking internationally.
Tom Sabo, SAS
Adam Pilz, SAS
Using SAS® to Estimate Lagged Coefficients with the %partitionedGMM Macro
Session 2661
Longitudinal data often includes time-dependent covariates, which must be accounted for with an appropriate model. Although a number of models have been proposed to analyze time-dependent covariates, most approaches constrain the effect of each covariate on the outcome to be constant across time. Irimata, Broatch, and Wilson (2017) introduced a partitioned generalized method of moments (GMM) model that uses only valid moment conditions to estimate the differing relationships within longitudinal data. This model provides insight into potential lagged effects of a given covariate on the response in a later time period. Each regression coefficient is estimated using moment conditions corresponding to the respective time period. Irimata and Wilson (2017) presented a SAS® macro for fitting this partitioned GMM model for binary outcomes using SAS/IML® software. We extended the %partitionedGMM macro to allow for either continuous or binary outcomes. In this paper, we also extend this macro to fit time-independent covariates. The performance and use of this macro are demonstrated through the analysis of two examples: one with a continuous outcome and one with a binary outcome.
Kyle Irimata, Arizona State University
Jeffrey Wilson, Arizona State University
Using SAS® to Identify Cancer Therapy Treatments
When conducting comparative effectiveness research on cancer treatment options used in community-based oncology practices, researchers need generalizable and accurate data about the entire treatment history of patients diagnosed with cancer. To obtain the entire treatment history, programmers often need to combine and summarize data from disparate systems. We constructed an algorithm in SAS® that combines multiple data sources and characterizes the entire course of systemic therapy treatment in patients diagnosed with breast, colorectal, or lung cancer. The algorithm identifies receipt of systemic therapy, specifies the drugs in the first course of therapy, and determines any additional courses of therapy or switches in therapy after cancer diagnosis. We present validation results from the algorithm as well as the specific SAS functions that enabled us to create this unique algorithm.
Nikki Carroll, Kaiser Permanente
Using SAS® to Teach Credit Risk: From Undergraduate Studies to Professional Education
In this talk, I describe how to design tailor-made experiences for teaching SAS® at different levels, and I describe best practices for creating and teaching modules aimed at giving participants the skills needed to excel at delivering analytics solutions in modern financial organizations. These experiences come from credit risk courses at the BSc, MSc, and professional education levels, both at universities and as part of the Knowledge Management Series modules in Credit Risk. They focus on the process of understanding the needs of financial institutions and how those needs match the abilities that students at each level possess. We answer questions such as how the students differ and what characteristics they have in common, and we discuss which experiences have worked best when delivering these courses. Some practicalities are also discussed, such as the focus on GUI solutions (such as SAS® Enterprise Miner™) versus teaching code for more personalized solutions, and how to respond to the new challenges that students face while keeping to a carefully designed curriculum.
Cristian Bravo, Southampton Business School, University of Southampton
Using SAS® Visual Analytics to Explore the Western Kentucky University Twittersphere
Session 2486
The purpose of this project was to analyze publicly accessible Twitter data related to Western Kentucky University (WKU). This process involved weekly use of SAS® Visual Analytics to scrape 140-character tweets related to a given search term. Once the tweets were pulled into SAS Visual Analytics, their sentiment and text body were analyzed using the text analysis features of SAS Visual Analytics, and the results were recorded. Because this project was conducted weekly, the average sentiment results provided an interesting time-series perspective on the positive and negative sentiments surrounding WKU in the digital world.
Taylor Blaetz, Western Kentucky University
Using SAS® Visual Analytics to Solve Problems with Reporting Based in Microsoft Excel
Session 2480
The use of Microsoft Excel as an analytics tool is pervasive and nearly inescapable in many corporate environments. While it can be an ideal tool for many basic ad hoc analyses, it can easily morph into an unscalable, error-prone, and undocumented reporting solution with significant limitations on end-user functionality. When this occurs, SAS® Visual Analytics can provide a new and improved reporting solution that addresses many of the pitfalls you encounter with Excel. This paper uses a real-world example to illustrate the advantages of moving reporting based in Excel to SAS Visual Analytics. Kaiser Permanente's Data and Information Management Enhancement (DIME) team will soon replace a highly manual, single-threaded reporting solution based in Excel with a SAS Visual Analytics solution. The current solution supports the needs of our regional call center business partners. Several key advantages of the SAS Visual Analytics solution are discussed, including consolidating a plethora of Excel files into a small list of dynamic reports; greatly reducing the risk of human error in reports; reducing the number of resource hours needed to maintain and enhance reports; and providing users with the advanced functionality needed for effective decision-making.
Amanda Pasch, Kaiser Permanente
Using SAS® Visual Investigator to Enforce Model Tuning Best Practices in a Regulatory Environment
Session 1777
A common process in model tuning in the Financial Crimes space is sampling data to be submitted to subject matter experts for dispositioning, to determine whether activities are suspicious. Establishing these target values is a key component of tuning models and of producing more efficient and effective models for catching misuse of the US financial system, as well as for identifying terrorist financing activity. In addition, providing traceability of the process used to establish thresholds and weights for models is critical for the regulated aspects of detecting suspicious activity. SAS® Visual Investigator not only provides a toolkit for data exploration and analysis, but is also a great tool for implementing analytics with an attached workflow process. This paper describes an approach for using SAS Visual Investigator as an interface for end-to-end model building that incorporates a sampling and disposition process while enforcing a workflow and highlighting the auditability of the process.
Scott Wood, SAS
Josh Lincoln, SAS
Edwin Rivera, SAS
Using SAS/OR® to Optimize Scheduling and Routing of Service Vehicles
Session 1758
An oil company has a set of wells and a set of well operators. Each well has an established amount of time required for servicing. For a given planning horizon, the company wants to determine which operator should perform service on which wells, on which days, and in which order, with the goal of minimizing service time plus travel time. A frequency constraint for each well restricts the number of days between visits. The solution approach that is presented in this paper uses several features in the OPTMODEL procedure in SAS/OR® software. A simple idea and a small change in the code reduced the run time from one hour to one minute.
Rob Pratt, SAS
Using SAS® Customer Intelligence 360 to Improve and Optimize the SAS® Web Experience
Discover how SAS uses SAS® Customer Intelligence 360 to improve and optimize the SAS web experience for prospects and customers through data collection and analysis, content targeting and personalization, and A/B testing. These activities are designed to help strategically optimize the SAS web experience for end users by delivering more relevant, targeted experiences that result in significant business for SAS.
Scott Calderwood, SAS
Mark Korey, SAS
Laura Maroglou, SAS
Using SAS® to Fit AmeriFlux Data to Ecosystem Seasonality Models
In ecosystem science research, we have several models that define seasonal transitions in gross ecosystem productivity (GEP) and ecosystem respiration (ER). These models were built with ecosystem data collected years ago. Thanks to the AmeriFlux ecosystem data community, we now have access to ecosystem data from more than 110 sites located across the Americas, compared with 15 sites in 1997. The purpose of this project is to evaluate our existing models by fitting them to the large volume of data that was not available to previous research. We used the NLIN procedure for model fitting, with each fit covering one variable for one year at one specific flux data source. For each model fitting process, we used SAS macros to perform Grubbs' test for outlier detection and removal, and the GPLOT procedure for data visualization. SAS® macros were written to automate the processing of all input files, variables, and data years. Data from 132 input files, with an average of 4,000 observations and 70 variables each, were processed, and two models were evaluated in this project.
Tracy Song-Brink, North Carolina State University
Using the CALIS Procedure in SAS® to Confirm Factors Load for Bullying Scale for LGBTQ Youth in SC
Confirmatory factor analysis (CFA) is a statistical method used to verify the latent factor structure of a set of observed variables. Lesbian, gay, bisexual, transgender, or questioning (LGBTQ) youth are at high risk for bullying in the school environment; students who identify as LGBTQ are at greater risk for bullying than heterosexual youth. Data on LGBTQ youth in South Carolina are used in this analysis. Confirmatory factor analyses were used to examine and confirm two factors for bullying, and several statistical tests were used to examine the fit of the model. Goodness-of-fit indices include the chi-square test, Normed Fit Index (NFI), Non-Normed Fit Index (NNFI), Comparative Fit Index (CFI), and root mean square error of approximation (RMSEA). The SAS® FACTOR and CALIS (covariance analysis of linear structural equations) procedures support exploratory and confirmatory analysis. Our results indicated that the model did not fit completely and could be improved. However, all items loaded correctly onto two latent factors. Coefficient alpha was calculated to assess scale reliability. Alpha reliabilities were .88, .87, and .84 for total bullying, hearing, and experience, respectively.
Laura Hein, University of South Carolina/ College of Nursing
Abbas Tavakoli, University of South Carolina
Mary Cox, University of South Carolina/ College of Nursing
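Coefficient (Cronbach's) alpha is the reliability measure reported above. It equals k/(k - 1) times (1 minus the sum of the item variances divided by the variance of the total score), where k is the number of items. In SAS it is usually requested with the ALPHA option of PROC CORR. The Python sketch below only illustrates the formula on made-up scores and does not reproduce the study's values. The function name and data are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k item-score lists, each holding
    one score per respondent (same respondent order in every list)."""
    k = len(items)
    item_var_sum = sum(statistics.variance(scores) for scores in items)
    totals = [sum(resp) for resp in zip(*items)]  # total score per respondent
    return k / (k - 1) * (1 - item_var_sum / statistics.variance(totals))

# Hypothetical 3-item scale answered by 5 respondents.
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 4],
]
print(round(cronbach_alpha(items), 3))
```

Alpha rises toward 1 as items covary more strongly, because the variance of the total score then grows faster than the sum of the item variances.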
Using the COPULA Procedure to Simulate Multivariate Data
Session 2625
Simulating data is a common task for data scientists. In our scenario, our client wanted to simulate data for purposes of resampling. Specifically, they wanted to simulate a large number of observations of multivariate data based on a small number of real observations. When faced with such a task, an analyst typically takes a univariate approach. This might mean using the UNIVARIATE procedure to produce a histogram of the data to determine a candidate distribution, as well as to obtain the best available estimates of the underlying parameters of the distribution. The analyst then uses one of the many distribution-specific random number generators to simulate more rows. Data generated in this way should reflect each original distribution well. But there is a serious shortcoming with this univariate approach: it ignores any correlations that existed between the columns. A better method of generating such data is to use the COPULA procedure. When data is generated with PROC COPULA, not only does each column keep the parameters of its own distribution, but the relationships between columns are preserved as well. This paper uses SAS® code to demonstrate the need for PROC COPULA, as well as how to use PROC COPULA. It also discusses how PROC COPULA was used to better address our client's needs.
Bill Qualls, First Analytics
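The copula idea in the abstract above can be illustrated without SAS: draw correlated normals, map them to uniforms with the normal CDF, then apply each column's own inverse CDF. The Python sketch below assumes a Gaussian copula with two exponential marginals; the correlation and rate values are hypothetical, and this is not how PROC COPULA is implemented internally.

```python
# Gaussian-copula sketch: two correlated columns with exponential marginals.
import math
import random
from statistics import NormalDist


def simulate_gaussian_copula(n, rho, rate_x, rate_y, seed=1):
    """Return n (x, y) pairs joined by a Gaussian copula with parameter rho."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    eps = 1e-12  # keep uniforms strictly inside (0, 1) before taking logs
    rows = []
    for _ in range(n):
        # 1. Correlated standard normals via the 2x2 Cholesky factor.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # 2. Normal CDF turns them into dependent uniforms (the copula step).
        u1 = min(max(std_normal.cdf(z1), eps), 1.0 - eps)
        u2 = min(max(std_normal.cdf(z2), eps), 1.0 - eps)
        # 3. Inverse CDF of each marginal (exponential with the given rate).
        x = -math.log(1.0 - u1) / rate_x
        y = -math.log(1.0 - u2) / rate_y
        rows.append((x, y))
    return rows


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


sample = simulate_gaussian_copula(5000, rho=0.8, rate_x=0.5, rate_y=2.0)
xs, ys = zip(*sample)
print(round(pearson(xs, ys), 2), round(sum(xs) / len(xs), 2))
```

Simulating each column independently (rho=0) would reproduce the marginals but lose the correlation, which is the shortcoming of the univariate approach the abstract describes.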
Using the FCMP Procedure to the Fullest: Getting Started and Doing More
Session 2403
The FCMP procedure is used to create user-defined functions. Many users have yet to tackle this fairly new procedure, while others have used only its simplest options. As with many SAS® tools, the true value of this procedure is fully appreciated only after the user has started to learn and use it. The basics can be mastered quickly, which enables users to move on to some of the more interesting and powerful aspects of PROC FCMP. Starting with the basics of PROC FCMP, this paper discusses how to store, retrieve, and use user-defined compiled functions, including using these functions with the macro language and with user-defined formats. The use of PROC FCMP should not be limited to advanced SAS users; even those fairly new to SAS should be able to appreciate the value of user-defined functions.
Art Carpenter, California Occidental Consultants
Using the IML Procedure to Examine the Efficacy of a New Control Charting Technique
Session 2894
For many years, control charts have been used to monitor processes, improve quality, and increase profitability. However, the body of literature tilts overwhelmingly toward charts that monitor normally distributed processes. In practice, a process might not follow a normal distribution, and in that case many of those techniques might not be the most effective. Mukherjee, McCracken, and Chakraborti (2015) proposed three control charts for simultaneously monitoring the location and scale parameters of processes that follow the shifted exponential distribution. This study examines their Shifted Exponential Maximum Likelihood Estimator-Max chart (SEMLE-max) and suggests using penalized maximum likelihood estimators (MLEs) instead of traditional MLEs, because the penalized estimators are unbiased and have minimum variance among unbiased estimators. The new chart, the Penalized SEMLE-max chart, is constructed using similar methodology, and simulated data are used to compare the average run lengths of the proposed chart with those of the original chart. Results obtained using similar methodology in R are also presented to compare the conclusions drawn from both software applications.
Austin Brown, University of Northern Colorado
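The control-chart abstract above contrasts ordinary MLEs with unbiased, minimum-variance estimators but does not give formulas. For reference, the Python sketch below computes the ordinary MLEs of the shifted exponential's location (mu) and scale (theta) alongside the standard unbiased (UMVU) estimators. We assume, without confirmation from the abstract, that these correspond to the "penalized" estimators it describes; the sketch does not build the SEMLE-max chart, and the sample values are made up.

```python
# Shifted exponential with location mu and scale theta.
# MLEs:      mu_hat = min(x),  theta_hat = mean(x) - min(x)   (both biased)
# Unbiased:  theta_u = n * (mean - min) / (n - 1),  mu_u = min - theta_u / n


def shifted_exp_mle(xs):
    x_min = min(xs)
    mean = sum(xs) / len(xs)
    return x_min, mean - x_min  # (mu_hat, theta_hat)


def shifted_exp_unbiased(xs):
    n = len(xs)
    x_min = min(xs)
    mean = sum(xs) / n
    # E[mean - min] = (n - 1) * theta / n, so rescale to remove the bias.
    theta_u = n * (mean - x_min) / (n - 1)
    # E[min] = mu + theta / n, so subtract the estimated excess.
    mu_u = x_min - theta_u / n
    return mu_u, theta_u


sample = [2.0, 2.5, 3.0, 4.5]
print(shifted_exp_mle(sample), shifted_exp_unbiased(sample))
```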
Using the LIFETEST Procedure to Calculate an Alternative to Strike Rate in Limited Overs Cricket
Session 1788
In limited overs cricket, batsmen are measured by their ability to score quickly. The most common statistic is the strike rate, which is the number of runs scored per 100 balls faced. One drawback to the strike rate is that two batsmen with similar values might differ in their ability to sustain power. For example, a batsman who scored 25 runs on 20 balls and one who scored 5 runs on 4 balls both have a strike rate of 125. Strike rates also do not distinguish batsmen by their ability to avoid getting out. For example, two batsmen who score 7 runs on 7 balls both have a strike rate of 100, even though one batsman was out on the seventh ball, whereas the other's innings ended after the seventh ball without a dismissal. This paper proposes an alternative measure, namely expected runs at x balls, where x is a positive integer. The ability of different batsmen to sustain power is compared by plotting their expected run values against the corresponding ball count. Calculations are performed using the Kaplan-Meier product-limit estimator in the LIFETEST procedure in SAS/STAT® software, which allows run totals in matches where the batsman was not out to be treated as censored observations. Results from the 2016-2017 season of the KFC Big Bash League (BBL), an Australia-based Twenty20 league, are analyzed.
Keith Curtis, USAA
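The strike-rate arithmetic and the censoring idea in the cricket abstract above can be checked with a small Python sketch. It computes strike rates and a plain Kaplan-Meier product-limit curve in which run totals serve as the time axis and not-out innings are right-censored. Using runs as the time axis is an assumption, the innings data are hypothetical, and the sketch does not compute the paper's "expected runs at x balls" measure.

```python
# Strike rate, plus a Kaplan-Meier product-limit estimate over run totals
# in which "not out" innings are treated as right-censored observations.


def strike_rate(runs, balls):
    """Runs scored per 100 balls faced."""
    return 100.0 * runs / balls


def kaplan_meier(innings):
    """innings: list of (runs, was_out). Returns [(runs, S(runs))] at each
    run total where at least one dismissal occurred."""
    event_times = sorted({r for r, out in innings if out})
    surv, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for r, _ in innings if r >= t)  # still batting at t
        dismissed = sum(1 for r, out in innings if out and r == t)
        surv *= 1.0 - dismissed / at_risk
        curve.append((t, surv))
    return curve


print(strike_rate(25, 20), strike_rate(5, 4), strike_rate(7, 7))
# Hypothetical innings as (runs, out?); the 10-run innings is not out.
print(kaplan_meier([(5, True), (10, False), (12, True), (20, True)]))
```

The not-out innings of 10 still counts in the risk set at 5 runs but drops out before 12, which is exactly how censoring keeps incomplete innings from being treated as dismissals.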
Using the ODS EXCEL Destination with SAS® University Edition to Send Graphs to Microsoft Excel
Session 2583
Students now have access to a SAS® learning tool called SAS® University Edition. This online tool is freely available for academic, non-commercial use; in effect, it is a free version of SAS that can be used to teach yourself or someone else how to use SAS. Because much of my writing has focused on moving data between SAS and Microsoft Excel, I thought I would take some time to highlight the tasks that move data between SAS and Excel using SAS University Edition. This paper focuses on sending graphs to Excel using the new ODS EXCEL destination.
William Benjamin Jr, Owl Computer Consultancy LLC
Using UNIX Shell Scripting to Enhance Your SAS® Programming Experience
Session 2412
This series addresses three different approaches to combining UNIX shell scripting and SAS® programming to dramatically increase programmer productivity by automating repetitive, time-consuming tasks. Part One embeds an entire SAS program inside a UNIX shell script, feeds it an input file of source and target data locations, and then writes the SAS copy program to each protocol's directory with dynamically updated LIBNAME statements. This approach turned almost a week's worth of work into mere minutes. Part Two reviews another deploy shell script that creates its own input file and determines whether a stand-alone SAS program should be deployed to each directory. This approach turned a full day's worth of work into just a couple of minutes. Part Three consists of a smaller shell script that dynamically creates SAS code depending on the contents of the directory in which the program is executed. By the end of these three segments, you will have a better understanding of how to dramatically increase your productivity by integrating UNIX shell scripting into your SAS programming to automate repetitive, time-consuming tasks. None of these programs requires any user input once started, so the only effort required is the few minutes it takes to set them up.
James Curley, Eliassen Group | Biometrics and Data Solutions
V is for Venn Diagrams
Session 1965
Would you like to produce Venn diagrams easily? This paper shows how you can produce stunning two-, three-, and four-way Venn diagrams by using the Graph Template Language, in particular the DRAWOVAL and DRAWTEXT statements. In my experience, Venn diagrams in the pharmaceutical industry have typically been created using Microsoft Excel and Microsoft PowerPoint: Excel is first used to count the numbers in each group, and PowerPoint is then used to draw the two- or three-way Venn diagram. The four-way Venn diagram is largely unheard of. When someone is brave enough to tackle it manually, working out the numbers that should go in each of the 16 groups and entering the right number into the right group is usually done nervously!
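The error-prone step the Venn abstract above describes is counting subjects into each of the 16 in/out membership combinations of a four-way diagram (15 overlap regions plus the region outside all four sets). The Python sketch below shows only that counting step, with hypothetical subjects and group names; drawing the diagram is what the paper does with the GTL DRAWOVAL and DRAWTEXT statements.

```python
# Count subjects per region of a four-way Venn diagram. Each region is keyed
# by a tuple of booleans (in group A?, in B?, in C?, in D?): 2**4 = 16 keys.
from itertools import product


def venn_counts(memberships, groups):
    """memberships: dict subject -> set of group names it belongs to."""
    counts = {combo: 0 for combo in product((False, True), repeat=len(groups))}
    for subject_groups in memberships.values():
        key = tuple(g in subject_groups for g in groups)
        counts[key] += 1
    return counts


groups = ["A", "B", "C", "D"]
subjects = {
    1: {"A"}, 2: {"A", "B"}, 3: {"A", "B", "C", "D"},
    4: set(), 5: {"C", "D"}, 6: {"A", "B"},
}
counts = venn_counts(subjects, groups)
print(len(counts), counts[(True, True, False, False)])
```

Because every subject lands in exactly one key, the 16 counts always sum to the number of subjects, which is a cheap consistency check before the numbers are placed on the diagram.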
Validating User-Submitted Data Files with Base SAS®
Session 1662
SAS® programming professionals are often asked to receive data files from external sources and analyze them using SAS. Such data files could come from a different group within one's own organization, from a client firm, from a federal agency, or from some other collaborating establishment. Whatever the source, there is likely to be an agreement on the number of variables, variable types, variable lengths, ranges of values, and variable formats in the incoming data file. Information technology best practices require that the receiving party perform quality checks (QCs) on the data to verify that they conform to the agreed-upon standards before the data are used in analysis. This paper presents a rigorous methodology for validating user-submitted data sets using Base SAS®. Readers can use this methodology and the SAS code examples to set up their own data QC regimen.
Michael Raithel, Westat
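The kind of check the data-validation abstract above describes, comparing each incoming record with an agreed-upon specification of variables, types, lengths, and ranges, can be sketched in a few lines. The Python below is an illustration only, not the paper's Base SAS methodology; the specification and the records are hypothetical.

```python
# Validate incoming records against a hypothetical agreed-upon specification.
SPEC = {
    "id":    {"type": str, "max_len": 8},
    "age":   {"type": int, "min": 0, "max": 120},
    "state": {"type": str, "max_len": 2},
}


def validate(records, spec=SPEC):
    """Return a list of (row_number, variable, problem) tuples."""
    problems = []
    for i, rec in enumerate(records, start=1):
        # Structural checks: every agreed variable present, nothing extra.
        for var in sorted(set(spec) - set(rec)):
            problems.append((i, var, "missing variable"))
        for var in sorted(set(rec) - set(spec)):
            problems.append((i, var, "unexpected variable"))
        # Value checks: type first, then length and range where specified.
        for var, rule in spec.items():
            if var not in rec:
                continue
            value = rec[var]
            if not isinstance(value, rule["type"]):
                problems.append((i, var, "wrong type"))
                continue
            if "max_len" in rule and len(value) > rule["max_len"]:
                problems.append((i, var, "too long"))
            if "min" in rule and value < rule["min"]:
                problems.append((i, var, "below range"))
            if "max" in rule and value > rule["max"]:
                problems.append((i, var, "above range"))
    return problems


submitted = [
    {"id": "A001", "age": 34, "state": "MD"},
    {"id": "A002", "age": 150, "state": "Maryland"},
    {"id": "A003", "age": "n/a"},
]
for problem in validate(submitted):
    print(problem)
```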
Weather Data Cleansing for Energy Forecasting
Session 1664
Energy forecasting has become widely used in the utility industry for system planning and operations. One of the main drivers of electricity demand is weather, and software such as SAS® Energy Forecasting uses hourly temperature readings as an input to its forecasting models. A lack of quality temperature data can contribute to less accurate forecasts. Despite observable daily and yearly seasonal patterns, temperature can be very volatile, making it difficult to fill in missing data accurately. To monitor and improve the quality of temperature data, we sought to create an algorithm in SAS® that automates identifying missing and bad data and imputing cleansed values into a temperature series. Using five years of historical data, we examined different methods for filling in short and long runs of missing values. For long runs, we could fill in data from nearby stations (in distance and elevation) or from previous time periods of the same temperature series. We also looked at what might qualify as bad temperature data. The weather data-cleansing algorithm we built runs relatively quickly with little input from the user. The process is friendly to analysts and business users alike because it is easy to implement regardless of skill level. We think that the forecasting process can be simplified and improved by using our algorithm.
Aubrey Condor, University of Central Florida
Rachael Becker, University of Central Florida
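The weather-cleansing abstract above distinguishes short runs of missing values from long ones. As one possible short-gap method, the Python sketch below fills interior gaps of up to max_gap hours by linear interpolation and leaves longer gaps untouched for another method, such as the nearby-station or earlier-period fills the abstract mentions. Linear interpolation and the three-hour threshold are assumptions, not the authors' chosen method.

```python
# Fill short interior gaps in an hourly series by linear interpolation;
# gaps longer than max_gap (and gaps at either end) are left as None.


def fill_short_gaps(series, max_gap=3):
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i  # first missing index in this run
        while i < n and filled[i] is None:
            i += 1
        end = i  # first non-missing index after the run (or n)
        gap = end - start
        if start > 0 and end < n and gap <= max_gap:
            left, right = filled[start - 1], filled[end]
            step = (right - left) / (gap + 1)
            for k in range(gap):
                filled[start + k] = left + step * (k + 1)
    return filled


temps = [50.0, None, 54.0, None, None, None, None, 60.0]
print(fill_short_gaps(temps, max_gap=3))
```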
Web Metrics at Scale: Using Base SAS® to Access Google Analytics APIs
Session 2120
With SAS® 9.4M4 and later, it's finally easy (relatively speaking) to connect to complicated APIs like those supported by Google, and to gather information with an unattended batch process. This is made possible by recent enhancements to the HTTP procedure and the new JSON library engine in SAS®. The PROC HTTP enhancements make it easier to negotiate multi-step authentication schemes like OAuth 2, and the JSON engine makes it easier to parse JSON results into SAS data sets. Previous approaches relied on complicated techniques, such as the GROOVY procedure or other tricks that call outside of SAS, to drive these OAuth 2 APIs and parse JSON. The recent enhancements make such tricks unnecessary and thus provide a cleaner approach with fewer moving parts. In this paper, I describe the four main steps needed to use Google APIs to gather web metrics data. The paper includes SAS code that you can adapt and use immediately with your own Google Analytics environment. I also present techniques that you can generalize to access any REST API that relies on OAuth 2 as its authentication mechanism.
Chris Hemedinger, SAS
What's New at SAS for Health Care
This paper highlights the SAS® health care solutions SAS® Real World Evidence and SAS® Episode Analytics. Seamlessly integrated with the SAS® Platform and SAS® Visual Analytics, these solutions are designed to help a wide spectrum of health care organizations (providers, payers and insurance companies, pharmaceutical companies, and government agencies) easily derive analytics and insights from administrative claims data and patients' electronic medical records. For SAS Real World Evidence, this paper covers built-in longitudinal patient views; the interactive cohort discovery tool, which can identify cohorts of patients with complex sequences of events and encounters (diagnoses, medical procedures, visits, lab test results, and drug exposures) from a pool of millions of patients with query response times of seconds; predefined analytical models such as Readmission Analysis, Length of Stay Analysis, Risk Scores and Comorbidity Report, and Incidence and Prevalence Analysis; and how to use your own analytical models via the Add-in Builder. For SAS Episode Analytics, this paper covers chronic condition episodes, reporting per member per month (PM/PM) costs, and running prospective costing models that help payers and states predict future costs related to payment contracts and budgets. The paper also describes the utilization and treatment pathways needed for population health management, quality metrics, and health outcomes.
Lina Clover, SAS
David Olaleye, SAS
Laurie Rose, SAS
What's New in SAS® Data Connectors for SAS® Viya®
The latest release of SAS® Viya® provides an expanded set of features for accessing your data. This presentation provides an overview of the latest features available with SAS® data connectors. During the presentation, we discuss and provide examples of the following new product capabilities: expanded support for more data sources; pushing SQL queries from SAS® Cloud Analytic Services (CAS) to databases; and saving CAS tables to third-party data sources. The examples are drawn from a wide variety of relational databases including, but not limited to, Microsoft SQL Server and IBM Db2.
Salman Maher, SAS
Chris Dehart, SAS
What's New in SAS® Data Management
The latest release of SAS® Data Management provides a comprehensive and integrated set of capabilities for collecting, transforming, and managing your data. The latest features include capabilities for working with data from a wide variety of environments, including Apache Hadoop, the cloud, relational database management systems (RDBMSs), unstructured data, streaming, Apache Spark, and the SAS® Cloud Analytic Services server. New in this release is support for, and integration with, SAS® Viya®. This paper provides an overview of the latest features of SAS Data Management and includes use cases and examples for leveraging the latest product capabilities.
Nancy Rausch, SAS
Which Smart Phone to Choose?
Session 2836
Android smartphones have over 80% of the market share in the smartphone industry. With so many new phones launched at similar price points and with similar features, it is almost impossible for customers to make a choice they will be satisfied with. One effective approach to this problem is to draw insights from the experiences of people who have used these products. These experiences are best captured in the form of feedback or reviews, and one of the best sources of such reviews is the largest online marketplace, Amazon. Reviews are individual perspectives, which are very diverse and cover both the positive and negative emotions of customers with regard to a product. Analyzing the details of these reviews could provide more information than the plain specifications of smartphones. Information about the performance of the touchscreen, the actual abilities of the camera, music and audio experiences, and other important factors could provide insights for phone buyers as well as for phone manufacturers. A buyer can narrow their choice down to a single product based on reviews of the aspect of a phone they care about most, whereas manufacturers can understand the limitations of their current version and come up with a better product to succeed in this competitive and growing marketplace. In this paper, I analyze the overall sentiment of reviews of smartphones that fall into similar price points.
Mohana Krishna Chaitanya Korlepara, Oklahoma State University
Who Is Likely to Succeed: Predictive Modeling of the Journey from H-1B to Permanent US Work Visa
Session 2876
The purpose of this paper is to help US employers and legislators predict which employees are most likely to succeed in the US job market, and thereby ensure that time and money are spent on the most eligible candidates. A permanent labor certification issued by the Department of Labor (DOL) allows an employer to hire a foreign worker to work permanently in the United States. We build several predictive models to identify which applicants awarded an H-1B visa, a temporary work permit, are likely to sustain employment and eventually succeed. We then estimate a probability for each H-1B visa holder, which we later use as a valuation factor. The models can help legislators and employers in the US job market pass legislation and conduct hiring processes targeted toward the workers who are most likely to make the most of the process.
Shibbir Khan, Clark University
Who's Your Daddy: Managing Parent-Child Hierarchies in SAS®
Session 1865
Reporting hierarchies underpin most, if not all, business reporting solutions. Accurate hierarchies enable reporting lines to quickly drill into and deal with issues, address key performance indicator (KPI) fluctuations, and, of course, receive credit for effort expended. Without an accurate hierarchy, KPI reporting in large organizations would quickly become mistrusted, suffer the dreaded loss of adoption, and potentially become extinct. This paper shows you how to easily manage large, multi-level hierarchies, complete with change history, through a simple HTML5 interface built on top of the SAS® Stored Process web application. Using the Boemska HTML5 Data Adapter for SAS® (H54S) and the built-in hierarchical query functions of major database management systems (DBMSs) such as Oracle and PostgreSQL, you can manage your business reporting hierarchy with just a few lines of Base SAS® code. This solution has been proven in production with over 5,000 leaf members across 10 levels, supporting approximately 500 reporting line changes per month. Maintaining full change history provides a complete audit trail and historical roll-up functionality, and the solution also supports future-dated changes so that line managers can plan departmental changes well in advance.
Richard Collins, RWC Technical Solutions Ltd
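The hierarchy abstract above relies on the built-in hierarchical queries of Oracle and PostgreSQL. The same parent-child walk can be written as a standard recursive common table expression; the Python sketch below runs one in SQLite (WITH RECURSIVE) as a stand-in, and the table, column, and node names are hypothetical.

```python
# Walk a parent-child reporting hierarchy with a recursive CTE in SQLite.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hierarchy (node TEXT PRIMARY KEY, parent TEXT)")
conn.executemany(
    "INSERT INTO hierarchy VALUES (?, ?)",
    [("CEO", None), ("Sales", "CEO"), ("Ops", "CEO"),
     ("Sales-East", "Sales"), ("Sales-West", "Sales"), ("Depot-1", "Ops")],
)

# Anchor on the root (no parent), then repeatedly join children to the
# rows found so far, carrying each node's level and full path.
tree_rows = conn.execute(
    """
    WITH RECURSIVE tree(node, level, path) AS (
        SELECT node, 1, node FROM hierarchy WHERE parent IS NULL
        UNION ALL
        SELECT h.node, t.level + 1, t.path || ' > ' || h.node
        FROM hierarchy AS h JOIN tree AS t ON h.parent = t.node
    )
    SELECT node, level, path FROM tree ORDER BY path
    """
).fetchall()

for node, level, path in tree_rows:
    print(level, path)
```

The path column is what makes roll-up reporting straightforward: any node's descendants are the rows whose path starts with that node's path.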
Working Together for Public Health: Using SAS®, Esri, and Tableau to Enhance Data Presentation
Session 1949
Numerous clients receive clinical services at local public health departments every day. In the current climate of providing necessary services with fewer resources, data-driven decision-making is essential in public health. Many local public health data systems are antiquated and provide limited reports to public health employees. Innovative means of assessing and visualizing data, which enable public health employees to quickly access timely information specific to their clients and community, can enhance efforts at the local level. This presentation demonstrates how large volumes of county-level surveillance data and client data can be analyzed using SAS® and meaningfully displayed in a dashboard using Esri ArcGIS maps and Tableau software.
Jennifer Han, Oklahoma State Department of Health
Working with Big Data in SAS®
Session 2160
This paper demonstrates challenges and solutions when using SAS® to process large data sets. Learn how to perform the following tasks: use SAS system options to evaluate query efficiency; tune SAS programs to improve big-data performance; modify SQL queries to maximize implicit pass-through; re-architect processes to improve performance; identify situations where data precision could be degraded; leverage FedSQL and DS2 for parallel processing on the SAS® Platform for improved performance, full-precision calculations with ANSI data types, and in-database processing with the SAS® In-Database Code Accelerator; and take advantage of SAS® Viya® and SAS® Cloud Analytic Services (CAS) for fast distributed processing.
Mark Jordan, SAS
Wow! You Did That Map With SAS®?! Round II
Session 2346
This paper explores the creation of complex maps with SAS® software. It covers the wide range of possibilities provided by SAS/GRAPH® software and polygon plots using the ODS Statistical Graphics (SG) procedures, as well as replays and overlays, annotations including sparklines, animations, ZIP Code-level processing, and more. The more recent GfK maps now provided by SAS, which underlie newer SAS products such as SAS® Visual Analytics as well as traditional SAS products, are discussed. The paper also explores the differences and similarities in how SAS/GRAPH and the SG procedures summarize data that are not already pre-digested for display on maps. The new SGMAP procedure is also discussed.
Louise Hadden, Abt Associates
Writing Code with Your Data: The Basics of Data-Driven Programming Techniques
In this paper, which is aimed at SAS® programmers who have limited experience with DATA step programming, we discuss the basics of data-driven programming. First, we define data-driven programming, and then we show several easy-to-learn techniques to get a novice or intermediate programmer started using data-driven programming in their own work. We discuss using the SQL procedure with the SELECT statement and the INTO clause to push information into macro variables; using the CONTENTS procedure and the dictionary tables to query metadata; using an external file to drive logic; and generating and applying formats and labels automatically. Before reading this paper, programmers should be familiar with the basics of the DATA step, be able to import data from external files, have a basic understanding of formats and variable labels, and be aware of both what a macro variable is and what a macro is. Knowledge of macro programming is not a prerequisite for understanding this paper's concepts.
Joe Matise, NORC at the University of Chicago
Your Data Visualization Game Is Strong - Take It to Level 8.2
Your organization already uses SAS® Visual Analytics, and you have designed reports that show compelling data stories. The newest version of SAS Visual Analytics can give those stories a facelift through its clean, modern HTML5 interface and exciting new visualization features. Learn how to make the transition seamless while also using the move as an opportunity to focus on the most compelling reports. We walk through the methodology and the automation techniques that we used when we moved our own internal SAS Visual Analytics environment from 7.3 to 8.2.
Brandon Kirk, SAS
Jason Shoffner, SAS
Zillow's Home Value Prediction Using Data Mining
Session 2871
Zillow's Zestimate home valuation has shaken up the U.S. real estate industry since it was first released 11 years ago. A home is often the largest and most expensive purchase a person makes in his or her lifetime, so ensuring that homeowners have a trusted way to monitor this asset is incredibly important. The Zestimate was created to give consumers as much information as possible about homes and the housing market, marking the first time consumers had access to this type of home value information at no cost. Zestimates are estimated home values based on 7.5 million statistical and machine learning models that analyze hundreds of data points on each property. By continually improving the median margin of error (from 14% at the onset to 5% today), Zillow has become established as one of the largest, most trusted marketplaces for real estate information in the U.S. and a leading example of impactful machine learning. The goal of this paper is to predict house prices for real estate customers with respect to their budgets and priorities. Future prices can be predicted by analyzing previous market trends, price ranges, and upcoming developments. The paper describes a website that accepts customers' specifications and then applies multiple linear regression algorithms from data mining. This application can help customers invest in a property without approaching an agent, and it can also reduce the risk involved in the transaction.
Vivek Singh, Oklahoma State University
ZIPpy Safe Harbor De-identification Macros
The U.S. Health Insurance Portability and Accountability Act of 1996 (HIPAA) Privacy Rule regulates how individually identifiable protected health information (PHI) may be used and disclosed to researchers. Researchers need to abstract data from patient charts for statistical purposes but often do not need PHI; unfortunately, accessing these data exposes researchers to PHI. One way to protect patient data while ensuring that researchers comply with the HIPAA Privacy Rule is to use de-identified data provided by an honest broker. Under the law, de-identification can be achieved either (a) by an expert using statistical and scientific principles and methods or, more commonly, (b) through the Safe Harbor method. The Safe Harbor method requires the removal of 18 types of identifiable information, such as names, dates, addresses, phone numbers, and account numbers. This paper focuses on the creation and use of Safe Harbor macros to create a de-identified data set by removing or modifying geographic information, identifiable dates, and medical information as required.
Abigail Chatfield, Spectrum Health
Jessica Parker, Spectrum Health
Paul Egeler, Spectrum Health
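Three of the Safe Harbor transformations mentioned in the de-identification abstract above are mechanical: ZIP codes are reduced to their first three digits (set to "000" when the combined three-digit area has 20,000 or fewer people), dates are reduced to the year, and ages over 89 are aggregated into a single 90-or-older category. The Python sketch below illustrates those rules only; it is not the authors' SAS macros, and the restricted-prefix set is a hypothetical placeholder (the real list must be derived from current Census data).

```python
# Hypothetical sketch of three Safe Harbor transformations.
from datetime import date


def deidentify_zip(zip_code, restricted_zip3):
    """Keep the first three digits unless that area is restricted."""
    zip3 = zip_code[:3]
    return "000" if zip3 in restricted_zip3 else zip3


def deidentify_date(d):
    """Keep only the year of a date related to the individual."""
    return d.year


def deidentify_age(age):
    """Aggregate ages over 89 into a single category."""
    return "90+" if age > 89 else age


# Hypothetical placeholder; the real set comes from current Census data.
RESTRICTED_ZIP3 = {"123"}

record = {"zip": "49503", "admit": date(2018, 7, 14), "age": 93}
clean = {
    "zip3": deidentify_zip(record["zip"], RESTRICTED_ZIP3),
    "admit_year": deidentify_date(record["admit"]),
    "age": deidentify_age(record["age"]),
}
print(clean)
```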
%Q-Index: A SAS® Macro for a Conditional Item-Fit Index for the Rasch Model
The use of Rasch analysis has increased in recent decades. Rasch analysis is commonly used in educational and psychological testing, and the method is also popular for measuring health status and evaluating outcomes (Christensen, 2013). Fit statistics are used to screen for misfitting items, an important step in Rasch analysis for evaluating items consistently. If fit statistics are incorrect, a misfitting item might not be detected, or a good item might be incorrectly identified as misfitting. More importantly, the properties and benefits of using a particular model, in this case the Rasch model, hold only if the data fit the model. The Q-Index has desirable characteristics that could provide a solution for applied researchers concerned with the limitations of current fit indices. However, little research has examined the robustness of the Q-Index (Ostini and Nering, 2006), perhaps because it is not available in popular Rasch software such as WINSTEPS. In fact, the Q-Index is available only in the Rasch software WINMIRA, which has not been updated since 2001 (von Davier, 2001). In this paper, the researchers use SAS to estimate the Rasch model with the LOGISTIC procedure (Pan, 2011) and compute the Q-Index. Finally, this paper describes a SAS macro that fits the Rasch model for dichotomous and polytomous data.
Samantha Estrada, University of Northern Colorado
Rafael Perez Abreu, Centro de Investigación en Matemáticas A.C.
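The Rasch abstract above estimates the model with PROC LOGISTIC, which works because the dichotomous Rasch model is a logistic model in ability minus difficulty: P(X = 1) = 1 / (1 + exp(-(theta - b))). The Python sketch below shows that response probability and the joint log-likelihood that such a fit maximizes. The function names and data are hypothetical, and the sketch neither computes the Q-Index nor reproduces the authors' macro.

```python
# Dichotomous Rasch model: the probability that person n answers item i
# correctly depends only on ability theta_n minus item difficulty b_i.
import math


def rasch_prob(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_likelihood(responses, thetas, bs):
    """responses[n][i] is 0 or 1; returns the joint log-likelihood."""
    ll = 0.0
    for n, row in enumerate(responses):
        for i, x in enumerate(row):
            p = rasch_prob(thetas[n], bs[i])
            ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll


print(rasch_prob(0.0, 0.0), round(rasch_prob(1.0, 0.0), 4))
```

When ability equals difficulty the success probability is exactly 0.5, and each unit of ability above the item's difficulty raises the log-odds of success by one.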