SAS® High-Performance Analytics
Generate accurate, timely insights and solve complex problems using huge volumes of structured and unstructured data
SAS High-Performance Analytics includes domain-specific offerings for statistics, data mining, text mining, forecasting, optimization, and econometrics – all available for execution in a highly scalable, distributed in-memory processing architecture.
With models that can run in minutes or seconds, you can perform more frequent modeling iterations and use sophisticated analytics to get answers to questions you never thought to ask or never had time to ask. This solution is available on EMC Greenplum or Teradata appliances, as well as on commodity hardware using Apache Hadoop or Cloudera.
" SAS High-Performance Analytics can turn any data, including big data, assets into quicker, better business decisions and ultimately competitive advantage."
— Dan Vesset
Program Vice President, IDC's Business Analytics research
How SAS® Is Different
- Provides the only in-memory analytics offering that processes high-end analytics and big data to produce time-sensitive insights. SAS High-Performance Analytics is truly about applying sophisticated analytical techniques to solve complex business problems – not just about using query, reporting and descriptive statistics within an in-memory environment.
- Addresses the entire analytical life cycle. Unlike other offerings, SAS High-Performance Analytics can perform analyses on structured and unstructured data that range from data summarization and exploration to model building and scoring of new data at breakthrough speeds. The timely results enable you to extract more value from your data and stay ahead of the competition.
- More than 36 years of proven technology – faster. SAS customers at more than 60,000 sites around the world can now take advantage of running SAS analytic procedures in a distributed, in-memory environment. With this solution, our customers can quickly extract value from big data to transform their businesses, opening up a vast array of possibilities never before imagined.
- Structured and unstructured data can be combined and mined to improve predictive modeling power. SAS High-Performance Text Mining makes fast, in-depth analysis of integrated structured and unstructured data possible. The descriptive knowledge contained in text data can improve your predictive modeling power.
Benefits
- Quickly and confidently seize new opportunities, detect unknown risks and make the right choices. Finer and more accurate results allow organizations to realize significant value, drive new revenue opportunities and increase bottom-line savings.
- Use all data (including unstructured) with advanced modeling techniques and perform more model iterations to get answers to your difficult questions. By applying sophisticated analytics against all of your data instead of just subsets or aggregates, you can improve accuracy for more targeted, high-impact decisions. Combining structured data with text data uncovers relationships that were previously undetected and adds more predictive power to models.
- Derive insights at breakthrough speeds for high-value and time-sensitive decision making. Shrink analytical model processing time and derive rapid insights so you can make well-informed decisions. SAS High-Performance Analytics delivers blazing-fast performance to evaluate alternative scenarios, quickly detect changes in volatile markets, and make timely, optimal recommendations.
- Take advantage of a highly scalable and reliable analytics infrastructure to test more ideas and multiple scenarios with all your data. With SAS High-Performance Analytics, analytical professionals can take full advantage of the in-memory infrastructure to solve the most complex questions without constraints.
Features
- SAS® High-Performance Statistics
High-performance linear regression
- Supports GLM and reference parameterizations for classification effects.
- Supports partitioning of data into training, validation and testing roles.
- Supports a FREQ statement for grouped analysis and a WEIGHT statement for weighted analysis.
- Provides multiple effect-selection methods.
High-performance nonlinear regression
- Computes analytical derivatives of user-provided expressions, which makes parameter estimation both more robust and faster.
- Evaluates user-provided expressions and their confidence limits with the ESTIMATE and PREDICT statements.
- Estimates parameters by least squares or by maximum likelihood.
High-performance logistic regression
- Predicts binary, binomial and multinomial outcomes.
- Provides model-building syntax with the CLASS and effect-based MODEL statements that is familiar from SAS/STAT procedures (in particular, the GLM, LOGISTIC, GLIMMIX and MIXED procedures).
- Provides cumulative link models for ordinal data and generalized logit models for unordered multinomial data.
- Enables model building (variable selection) through the SELECTION statement.
- Provides a WEIGHT statement for weighted analysis and a FREQ statement for grouped analysis.
- Provides an OUTPUT statement to produce a data set with predicted probabilities and other observation-wise statistics.
High-performance mixed linear models
- Fits a variety of mixed linear models to data and enables you to use these fitted models to make statistical inferences about the data.
- Supports multiple covariance structures.
- Provides appropriate standard errors for all specified, estimable linear combinations of fixed and random effects, and corresponding t-tests and F-tests.
- Implements REML and ML estimation with a variety of optimization algorithms.
- Provides special dense and sparse algorithms that take advantage of distributed and multiple-core computing environments.
- SAS® High-Performance Data Mining
High-performance neural networks
- Provides automatic standardization of input and target variables.
- Provides intelligent defaults for most neural network parameters such as activation and error functions.
- Provides automatic selection and use of a validation data subset.
- Provides automatic termination of training when the validation error stops improving.
- Provides the ability to weight individual observations.
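Two of the behaviors listed above, automatic standardization of variables and stopping training once the validation error stops improving, can be sketched in plain Python. This is a minimal illustration, not SAS's implementation; the function names and the `patience` parameter are hypothetical, and a list of per-epoch validation errors stands in for a real training loop.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a numeric column to mean 0 and standard deviation 1."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:                      # constant column: nothing to scale
        return [0.0] * len(column)
    return [(x - mu) / sigma for x in column]


def early_stopping_epoch(validation_errors, patience=3):
    """Return the epoch whose weights should be kept.

    Training stops once the validation error has not improved for
    `patience` consecutive epochs; the best epoch seen so far wins.
    """
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err
        elif epoch - best_epoch >= patience:
            break                       # later epochs are never examined
    return best_epoch


print(standardize([2.0, 4.0, 6.0]))
print(early_stopping_epoch([0.9, 0.7, 0.6, 0.62, 0.61, 0.65, 0.5]))
```

In the second call, training stops at epoch 5, so the lower error at epoch 6 is never seen and epoch 2's weights are kept.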
High-performance forests
- Creates an ensemble of hundreds of decision trees to predict a single target.
- Trains the decision trees independently and in parallel on different grid nodes.
- Randomly selects the input variables considered for splitting a node from all available inputs.
- Splits each node on the single candidate variable that is most associated with the target.
High-performance 4SCORE
- The HP4SCORE procedure scores data by using a forest model previously trained by the HPFOREST procedure.
High-performance data mining nodes
- Includes the following high-performance-enabled SAS Enterprise Miner nodes:
- HP Data Partition.
- HP Explore.
- HP Transform.
- HP Variable Selection.
- HP Regression.
- HP Neural.
- HP Forest.
- HP Impute.
High-performance decide
- Creates optimal decisions based on a decision matrix that you specify, on prior probabilities, and on output from a modeling procedure (either posterior probabilities for a categorical target variable or predicted values for an interval target variable).
- The decision matrix contains columns (decision variables) that correspond to each decision and rows (observations) that correspond to target values. The values of the decision variables represent target-specific consequences, which might be profit, loss, or revenue.
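To make the decision matrix concrete, here is a minimal sketch that picks, for one observation, the decision with the highest expected consequence given the model's predicted probabilities. The target values, decisions and profit figures are invented for illustration, and the adjustment for prior probabilities is left out.

```python
def best_decision(posteriors, decision_matrix, decisions):
    """Return (decision, expected consequence) for one observation.

    posteriors:      dict target value -> predicted probability
    decision_matrix: dict target value -> list of consequences, one per decision
                     (rows are target values, columns are decisions)
    decisions:       decision names, in the same order as the matrix columns
    """
    expected = [
        sum(posteriors[t] * decision_matrix[t][j] for t in posteriors)
        for j in range(len(decisions))
    ]
    j_best = max(range(len(decisions)), key=lambda j: expected[j])
    return decisions[j_best], expected[j_best]


# Hypothetical profit matrix for a direct-mail offer.
matrix = {"respond": [10.0, 0.0], "no_respond": [-1.0, 0.0]}
decisions = ["mail", "skip"]
print(best_decision({"respond": 0.2, "no_respond": 0.8}, matrix, decisions))
print(best_decision({"respond": 0.05, "no_respond": 0.95}, matrix, decisions))
```

With this matrix, mailing pays off only when the predicted response probability p satisfies 10p - (1 - p) > 0, that is, p > 1/11.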
High-performance variable reduction
- Uses the HPREDUCE procedure to reduce the dimensionality of structured inputs by selecting a subset of the original variables (variable selection), which preserves model interpretability.
- Performs unsupervised variable selection by identifying a set of variables that jointly explain the maximum amount of data variance (covariance analysis).
- Provides distributed computation and output of the CORR, COV or SSCP matrix.
- Uses the CLASS statement to support categorical inputs.
- Supports main and interaction effects with the VAR statement.
- Outputs statistics and matrix information for exploratory data analysis that can also be used as direct input for statistical procedures, thus saving time by eliminating redundant matrix aggregations.
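Distributed computation of the SSCP, COV or CORR matrix works because these statistics are built from sums: each node can summarize its own rows, and the summaries simply add up, so only small matrices travel between nodes. A minimal pure-Python sketch under that assumption (the function names are hypothetical):

```python
import math


def partial_sscp(rows):
    """Summarize one node's rows as (n, column sums, SSCP matrix).

    Assumes `rows` is a non-empty list of equal-length numeric rows.
    """
    p = len(rows[0])
    sums = [0.0] * p
    sscp = [[0.0] * p for _ in range(p)]
    for row in rows:
        for i in range(p):
            sums[i] += row[i]
            for j in range(p):
                sscp[i][j] += row[i] * row[j]
    return len(rows), sums, sscp


def combine(parts):
    """Add the per-node summaries element by element."""
    n = sum(part[0] for part in parts)
    p = len(parts[0][1])
    sums = [sum(part[1][i] for part in parts) for i in range(p)]
    sscp = [[sum(part[2][i][j] for part in parts) for j in range(p)] for i in range(p)]
    return n, sums, sscp


def covariance(n, sums, sscp):
    """Sample covariance: (SSCP[i][j] - sum_i * sum_j / n) / (n - 1).

    This textbook formula is compact but can lose precision when the
    data have a large mean relative to their spread.
    """
    p = len(sums)
    return [[(sscp[i][j] - sums[i] * sums[j] / n) / (n - 1) for j in range(p)]
            for i in range(p)]


def correlation_matrix(cov):
    """Scale a covariance matrix to a correlation matrix."""
    p = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]


node1 = [[1, 2], [2, 4]]
node2 = [[3, 7], [4, 8]]
cov = covariance(*combine([partial_sscp(node1), partial_sscp(node2)]))
print(cov)
print(correlation_matrix(cov))
```

The combined result matches what a single pass over all rows would produce, which is what lets the matrix also serve as direct input to downstream statistical procedures without re-aggregating the data.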
- SAS® High-Performance Econometrics
High-performance count regression
- Fits regression models where the dependent variable represents counts (e.g., the number of events recorded in some period or for some subject).
- Supports Poisson and negative binomial models.
- Supports zero-inflated Poisson and negative binomial models, and can use a separate set of regressors for the zero-inflation process.
- Estimates parameters by using the maximum likelihood method.
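A zero-inflated Poisson model mixes a point mass at zero with an ordinary Poisson count. The sketch below writes out its probability function and the log-likelihood that maximum likelihood estimation maximizes. The log link for the Poisson mean and the logistic link for the zero-inflation probability are assumptions made for illustration, the function names are hypothetical, and no optimizer is included.

```python
import math


def zip_params(x, beta, z, gamma):
    """Map regressors to zero-inflated Poisson parameters.

    lam: Poisson mean, exp(x . beta)                  (assumed log link)
    pi:  zero-inflation probability, logistic(z . gamma) (assumed logistic link)
    The zero-inflation part uses its own regressors z, separate from x.
    """
    lam = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    pi = 1.0 / (1.0 + math.exp(-sum(zi * gi for zi, gi in zip(z, gamma))))
    return lam, pi


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson model."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # A zero can come from the inflation process or from the Poisson part.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson


def zip_log_likelihood(counts, lam, pi):
    """Log-likelihood of observed counts; ML estimation maximizes this."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


print(zip_pmf(0, 2.0, 0.3), zip_pmf(3, 2.0, 0.3))
print(zip_log_likelihood([0, 0, 1, 2, 5], 2.0, 0.3))
```

Setting `pi` to 0 recovers the plain Poisson model, which is a quick sanity check when comparing the two fits.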
High-performance severity models
- Fits probability distributions for the severity (magnitude) of random events, both events with negative effects (e.g., damages caused by natural disasters, losses claimed under insurance policies, or the severity of disease outbreaks) and events with positive effects (e.g., the intermittent demand for certain products).
- Fits regression models for the scale of the severity distribution.
- Provides nine different probability distributions, including the Tweedie distribution, and can automatically select the best-fitting distribution.
- Allows users to define their own probability distributions.
- Can model data truncation and data censoring.
High-performance qualitative and limited dependent variable models
- Fits linear, censored and truncated regression models (including models with heteroscedasticity), as well as stochastic frontier production and cost models.
- SAS® High-Performance Text Mining
- Text parsing with properties that include:
- Detection of different parts of speech, stemming and synonyms.
- Frequency and term weighting.
- Transformation via singular value decomposition (SVD) that:
- Reduces the term-by-document matrix generated by the parsing process to a compact numeric, structured representation of the document collection.
- Enables transformation output to be used as input into high-performance structured data mining procedures.
- Graph and table outputs provide detailed information about the terms and their distributions within the collection.
- Includes high-performance text mining scoring of large-scale textual data.
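Parsing produces a term-by-document matrix of frequencies, which is then weighted before the SVD step. The sketch below uses a naive whitespace tokenizer and log-scaled tf-idf as a stand-in weighting scheme; stemming, part-of-speech tagging and synonym handling are omitted, SAS's own weighting options may differ, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive parser: lowercase and split on whitespace (no stemming)."""
    return text.lower().split()


def weighted_term_document(documents):
    """Return {(term, doc_index): weight} for a list of raw text documents.

    Weight = (1 + ln tf) * ln(N / df): repeated terms within a document
    count more, and terms that appear in every document get weight 0.
    """
    counts = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(documents)
    doc_freq = Counter(term for c in counts for term in c)
    weights = {}
    for d, c in enumerate(counts):
        for term, tf in c.items():
            weights[(term, d)] = (1 + math.log(tf)) * math.log(n_docs / doc_freq[term])
    return weights


docs = ["big data big insight", "data mining"]
for key, w in sorted(weighted_term_document(docs).items()):
    print(key, round(w, 3))
```

The resulting sparse (term, document) weights are the kind of matrix an SVD would then compress into a small number of numeric columns per document.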
- SAS® High-Performance Optimization
High-performance local search optimization
- Ability to optimize a user-defined objective subject to nonlinear constraints.
- Allows continuous and integer variables.
- Ability to embed the genetic algorithm solver (PROC GA) from SAS/OR, as well as other local search techniques.
- Available only for Teradata and Greenplum at this time.
High-performance optimization
- Provides a decomposition algorithm that works well for certain classes of large structured linear and mixed integer optimization problems. The algorithm decomposes the overall problem into a set of component problems that can be solved quickly.
- Provides a multistart capability that increases the likelihood of identifying a globally optimal solution. It selects multiple starting points and begins optimization from each one; the best solution found among all starting points is reported.
- The parallelized tuner in PROC OPTMILP helps identify good option settings for specific problems. It solves the problems under many different settings, using a targeted search of the option combinations, and reports the best set of option values.
- Available only for Teradata and Greenplum at this time.
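The multistart idea does not depend on the underlying solver. This sketch uses a simple coordinate pattern search as a stand-in local optimizer (not SAS/OR's algorithm) and keeps the best result over random starting points; all function names and the test objective are hypothetical.

```python
import random


def pattern_search(f, x0, step=1.0, tol=1e-6):
    """Minimize f from x0 by trying +/- step moves along each coordinate.

    The step is halved whenever no move improves f; stops below tol.
    """
    x, fx = list(x0), f(x0)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step /= 2
    return x, fx


def multistart(f, bounds, n_starts, seed=0):
    """Run the local search from n_starts random points; keep the best."""
    rng = random.Random(seed)
    best_x, best_f = None, float("inf")
    for _ in range(n_starts):
        x0 = [rng.uniform(lo, hi) for lo, hi in bounds]
        x, fx = pattern_search(f, x0)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x, best_f


def two_wells(x):
    """Two local minima; the global one is near x = -1.04."""
    return (x[0] ** 2 - 1) ** 2 + 0.3 * x[0]


print(pattern_search(two_wells, [2.0]))             # stuck in the right-hand well
print(multistart(two_wells, [(-2.0, 2.0)], n_starts=20))
```

A single start at x = 2 settles in the shallower well near x = 0.96, while the random restarts find the deeper one.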
- SAS® High-Performance Forecasting
- Designed to efficiently process large-scale, hierarchically structured, time-stamped data sets. Processes multiple hierarchy levels in a single pass of the data set.
- Automatically generates time series from time-stamped data and produces forecasts for them in one step.
- For typical time series, the HPFORECAST procedure automatically chooses the best-performing model from among the following exponential smoothing models: simple, linear, damped trend, seasonal (additive and multiplicative) and Winters method (additive and multiplicative).
- Transformed versions of these models are also available, using log, square root, logistic or Box-Cox transformations.
- Can forecast time series data, whose observations are equally spaced by a specific time interval (for example, monthly, weekly), and also transactional data, whose observations are not spaced with respect to any particular time interval.
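Automatic model selection can be illustrated with two of the listed models, simple and linear (Holt) exponential smoothing: fit each over a small grid of smoothing weights and keep whichever has the smallest in-sample one-step-ahead squared error. The grid search and the error criterion are simplifying assumptions, not HPFORECAST's actual selection method, and the function names are hypothetical.

```python
def simple_smoothing(y, alpha):
    """Return (SSE of one-step-ahead errors, next forecast)."""
    level, sse = y[0], 0.0
    for obs in y[1:]:
        err = obs - level            # forecast for this period is the level
        sse += err * err
        level += alpha * err
    return sse, level


def linear_smoothing(y, alpha, beta):
    """Holt's linear method in error-correction form; needs len(y) >= 2."""
    level, trend, sse = y[0], y[1] - y[0], 0.0
    for obs in y[1:]:
        forecast = level + trend
        err = obs - forecast
        sse += err * err
        level = forecast + alpha * err
        trend += alpha * beta * err
    return sse, level + trend


def select_model(y):
    """Pick the model and weights with the smallest SSE; return its forecast."""
    grid = [i / 10 for i in range(1, 10)]
    fits = []
    for a in grid:
        sse, forecast = simple_smoothing(y, a)
        fits.append((sse, "simple", forecast))
        for b in grid:
            sse, forecast = linear_smoothing(y, a, b)
            fits.append((sse, "linear", forecast))
    sse, name, forecast = min(fits)
    return name, forecast


print(select_model([10, 12, 14, 16, 18, 20]))
```

On a steadily trending series the linear model fits exactly, so it wins and extrapolates the trend; simple smoothing would keep lagging behind it.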
- Common set of high-performance procedures
The following procedures are available in all six of the SAS High-Performance Analytics products listed above.
High-performance data summarization
- Enables large-scale data exploration and summarization through a series of parallelized procedures.
- Generates descriptive statistics on a large scale, very quickly.
- Computes the mean, minimum, maximum, range, and other measures of spread and centrality, along with summary information on the cardinality and levels of variables.
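Summaries like these parallelize well because partial results from each node can be merged exactly. A minimal sketch of the general technique, using Welford's update within a partition and the pairwise (Chan et al.) merge formula across partitions; this is an illustration, not SAS internals, and the names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Summary:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0          # sum of squared deviations from the mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def std(self):
        """Sample standard deviation (needs n >= 2)."""
        return math.sqrt(self.m2 / (self.n - 1))


def summarize(values):
    """One pass over a single node's values (Welford's update)."""
    s = Summary()
    for x in values:
        s.n += 1
        delta = x - s.mean
        s.mean += delta / s.n
        s.m2 += delta * (x - s.mean)
        s.minimum = min(s.minimum, x)
        s.maximum = max(s.maximum, x)
    return s


def merge(a, b):
    """Combine two partial summaries exactly (pairwise merge formula)."""
    n = a.n + b.n
    if n == 0:
        return Summary()
    delta = b.mean - a.mean
    return Summary(
        n=n,
        mean=a.mean + delta * b.n / n,
        m2=a.m2 + b.m2 + delta * delta * a.n * b.n / n,
        minimum=min(a.minimum, b.minimum),
        maximum=max(a.maximum, b.maximum),
    )


total = merge(summarize([1, 2, 3]), summarize([4, 5, 6]))
print(total, "range:", total.maximum - total.minimum, "std:", total.std())
```

Because `merge` is associative, partial summaries can be combined in any tree shape across grid nodes.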
High-performance data mining database
- Creates summary statistics of key input data sources, including:
- Number of observations.
- Number of observations that contain a missing value.
- Minimum observed value.
- Maximum observed value.
- Mean of observed values.
- Standard deviation.
- Measure of asymmetry (skewness).
- Measure of the "heaviness of the tails" (kurtosis).
- Sum of all non-missing observations.
- Corrected sum of squares.
- Sum of squares.
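The statistics listed above can all be computed with a few passes over a column. The skewness and kurtosis below use common bias-adjusted sample formulas; check the procedure documentation for the exact definitions and divisors it uses. Missing values are represented as `None`, and the dictionary keys borrow SAS statistic keyword names purely as labels.

```python
import math


def describe(values):
    """Summary statistics for a numeric column; None marks a missing value.

    Needs at least four non-missing values that are not all equal.
    """
    xs = [v for v in values if v is not None]
    n = len(xs)
    total = sum(xs)
    mean = total / n
    css = sum((x - mean) ** 2 for x in xs)      # corrected sum of squares
    std = math.sqrt(css / (n - 1))
    z3 = sum(((x - mean) / std) ** 3 for x in xs)
    z4 = sum(((x - mean) / std) ** 4 for x in xs)
    return {
        "N": n,
        "NMISS": len(values) - n,
        "MIN": min(xs),
        "MAX": max(xs),
        "MEAN": mean,
        "STD": std,
        "SKEWNESS": n / ((n - 1) * (n - 2)) * z3,
        "KURTOSIS": n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
                    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)),
        "SUM": total,
        "CSS": css,
        "USS": sum(x * x for x in xs),            # uncorrected sum of squares
    }


print(describe([1.0, 2.0, 3.0, 4.0, 5.0, None]))
```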
High-performance correlation
- Enables you to compute correlations for big data sets that have large quantities of rows and columns.
High-performance sampling
- Performs either high-performance simple random sampling or stratified sampling.
- Creates one output data set, which contains the sample data set; one performance table, which contains performance information; and one frequency table, which contains the frequency information for the population and sample.
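A minimal sketch of stratified sampling: draw the same fraction without replacement from each stratum, and record population and sample counts per stratum, mirroring the frequency table described above. The function name and the rounding rule for per-stratum sample sizes are assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(rows, stratum_of, fraction, seed=0):
    """Return (sample, freq); freq maps stratum -> (population count, sample count)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)
    sample, freq = [], {}
    for key in sorted(strata):                 # fixed order for reproducibility
        members = strata[key]
        k = round(len(members) * fraction)     # per-stratum sample size
        sample.extend(rng.sample(members, k))  # simple random sample, no replacement
        freq[key] = (len(members), k)
    return sample, freq


rows = [("east", i) for i in range(10)] + [("west", i) for i in range(20)]
sample, freq = stratified_sample(rows, stratum_of=lambda r: r[0], fraction=0.1)
print(freq)
```

Sampling within each stratum guarantees that small groups are represented in proportion, which a single simple random sample does not.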
High-performance binning
- Bucket (equal-length) binning method.
- Winsorized binning method and Winsorized statistics.
- Pseudo-quantile binning method, which is similar to quantile binning.
- Provides a mapping table for the selected binning method.
- Provides a basic statistical table that contains the minimum, maximum, mean, pseudo-median, and so on.
- Provides a histogram table that shows the output mapping statistics.
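Bucket binning splits the observed range into equal-width intervals; Winsorized binning first clips extreme values so that outliers do not stretch the bins. The sketch below takes the clipping bounds as inputs (in practice they might come from percentiles, which is an assumption here) and returns the bin assignments plus a mapping table of bin boundaries. The function names are hypothetical.

```python
def bucket_binning(values, n_bins):
    """Equal-width bins over [min, max]; returns (bin index per value, mapping table)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    mapping = [(lo + i * width, lo + (i + 1) * width) for i in range(n_bins)]
    if width == 0:                      # all values equal: one shared bin
        return [0] * len(values), mapping
    # The maximum value lands exactly on the upper edge, so clamp it into the last bin.
    bins = [min(int((x - lo) / width), n_bins - 1) for x in values]
    return bins, mapping


def winsorize(values, lower, upper):
    """Clip values into [lower, upper]."""
    return [min(max(x, lower), upper) for x in values]


def winsorized_binning(values, n_bins, lower, upper):
    """Winsorize first, then apply bucket binning to the clipped values."""
    return bucket_binning(winsorize(values, lower, upper), n_bins)


data = [0, 1, 2, 3, 4, 10]
print(bucket_binning(data, 5)[0])           # the outlier 10 stretches the bins
print(winsorized_binning(data, 5, 0, 5)[0])
```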
High-performance imputation
- Replaces missing values of numeric variables with a specified value.
- Can also replace numeric missing values with the mean, the pseudo-median, or some random value between the minimum value and the maximum value of the non-missing values.
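The imputation options above can be sketched as follows, with `None` marking a missing value. The exact median stands in for the pseudo-median (an approximation of the median), and the function name and method names are hypothetical.

```python
import random
from statistics import mean, median


def impute(values, method="mean", value=None, seed=0):
    """Replace None entries in a numeric list.

    method: "value"  -> the supplied `value`
            "mean"   -> mean of the non-missing values
            "median" -> median (stand-in for a pseudo-median)
            "random" -> uniform draw between the non-missing min and max
    """
    observed = [v for v in values if v is not None]
    if method == "value":
        constant = value
    elif method == "mean":
        constant = mean(observed)
    elif method == "median":
        constant = median(observed)
    elif method == "random":
        rng = random.Random(seed)
        lo, hi = min(observed), max(observed)
        return [rng.uniform(lo, hi) if v is None else v for v in values]
    else:
        raise ValueError(f"unknown method: {method}")
    return [constant if v is None else v for v in values]


print(impute([1.0, None, 3.0]))
print(impute([1.0, None, 2.0, 10.0], "median"))
print(impute([1.0, None, 3.0], "random"))
```

The median is less sensitive than the mean to outliers such as the 10.0 above, which is one reason to offer both.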
Screenshots
High-performance data mining enables modelers to produce more training runs for significantly more lift and incremental predictive power.

High-performance data mining enables modelers to explore, model and score using complete data – not just a subset – to get accurate and timely insights.
System Requirements
Server tier
- Linux x64 (64-bit): Novell SuSE 10 and 11; RHEL 5 and 6; Oracle Linux 5.5 and 6
Client tier (SAS® software)
- Microsoft Windows x64 (64-bit): Windows XP Professional for x64, Windows Vista* for x64, Windows 7** for x64
- Linux x64 (64-bit): Novell SuSE 10 and 11; RHEL 5 and 6; Oracle Linux 5.5 and 6
Required SAS® software
- Base SAS
- SAS/ACCESS® to EMC Greenplum or SAS/ACCESS to Teradata (if running on Teradata or Greenplum appliances)
- SAS/STAT (for SAS High-Performance Statistics)
- SAS Enterprise Miner (for SAS High-Performance Data Mining)
- SAS Text Miner (for SAS High-Performance Text Mining)
- SAS/ETS (for SAS High-Performance Econometrics)
- SAS/OR (for SAS High-Performance Optimization)
- SAS Forecast Server (for SAS High-Performance Forecasting)
Required hardware
- Teradata database appliance with Teradata 13.10, or
- EMC Greenplum database appliance with EMC Greenplum 4.2, or
- Commodity chassis hardware provided by IBM, HP and Dell running either the Apache Hadoop 0.23 release or Cloudera CDH4
* NOTE: Windows Vista supported editions are: Enterprise, Ultimate and Business.
** NOTE: Windows 7 supported editions are: Enterprise, Ultimate and Professional.
Ready to learn more?
Call us at 1-800-727-0025 (US and Canada) or request more information.