SAS Visual Machine Learning Features List

Interactive programming in a web-based development environment

  • Visual interface for the entire analytical life cycle process.
  • Drag-and-drop interactive interface requires no coding, though coding is an option.
  • Supports automated code creation at each node in the pipeline.
  • Choose best practice templates (basic, intermediate or advanced) to get started quickly with machine learning tasks or take advantage of our automated modeling process.
  • Interpretability reports such as PD, LIME, ICE and Kernel SHAP.
  • Share modeling insights via a PDF report.
  • Explore data from within Model Studio and launch directly into SAS Visual Analytics.
  • Edit models imported from SAS Visual Analytics in Model Studio.
  • View data within each node in Model Studio.
  • Run SAS® Enterprise Miner 14.3 batch code within Model Studio.
  • Provides a collaborative environment for easy sharing of data, code snippets, annotations and best practices among different personas.
  • Create, manage and share content and administer content permissions via SAS Drive.
  • The SAS lineage viewer visually displays the relationships between models, data and decisions.

Intelligent automation with human oversight

  • Public API to automate many of the manual, complex modeling steps to build machine learning models – from data wrangling to feature engineering to algorithm selection to deployment.
  • Automatic Feature Engineering node for automatically cleansing, transforming and selecting features for models.
  • Automatic Modeling node for automatically selecting the best model using a set of optimization and auto-tuning routines across multiple techniques.
  • Interactively adjust the pruning and splitting of decision tree nodes.
  • Automated data prep suggestions from meta learning.
  • Automated pipeline generation with complete customization capability.

Natural language generation

  • View results in simple language to facilitate understanding of reports, including model assessment and interpretability.

Embedded support for Python & R languages

  • Embed open source code within an analysis and call open source algorithms within Model Studio.
  • The Open Source Code node in Model Studio is agnostic to Python or R versions.
  • Manage Python models in a common repository within Model Studio.

Deep learning with Python (DLPy)

  • Build deep learning models for image, text, audio and time-series data using Jupyter Notebook.
  • High-level APIs are available on GitHub for:
    • Deep neural networks for tabular data.
    • Image classification and regression.
    • Object detection.
    • RNN-based tasks – text classification, text generation and sequence labeling.
    • RNN-based time-series processing and modeling.
  • Support for predefined network architectures, such as LeNet, VGG, ResNet, DenseNet, Darknet, Inception, ShuffleNet, MobileNet, YOLO, Tiny YOLO, Faster R-CNN and U-Net.
  • Import and export deep learning models in the ONNX format.
  • Use ONNX models to score new data sets in a variety of environments by taking advantage of Analytic Store (ASTORE).

SAS procedures (PROCs) & CAS actions

  • A programming interface (SAS Studio) lets IT staff and developers access a CAS server and load and save data directly from it, and supports local and remote processing on a CAS server.
  • Python, Java, R, Lua and Scala programmers or IT staff can access data and perform basic data manipulation against a CAS server, or execute CAS actions using PROC CAS.
  • CAS actions support for interpretability, feature engineering and modeling.
  • Integrate and add the power of SAS to other applications using REST APIs.

Highly scalable, distributed in-memory analytical processing

  • Distributed, in-memory processing of complex analytical calculations on large data sets provides low-latency answers.
  • Analytical tasks are chained together as a single, in-memory job without having to reload the data or write out intermediate results to disks.
  • Concurrent access to the same data in memory by many users improves efficiency.
  • Data and intermediate results are held in memory as long as required, reducing latency.
  • Built-in workload management ensures efficient use of compute resources.
  • Built-in failover management guarantees submitted jobs always finish.
  • Automated I/O disk spillover for improved memory management.

Model development with modern machine learning algorithms

  • Reinforcement learning:
    • Techniques include Fitted Q-Network (FQN) and Deep Q-Network (DQN).
    • FQN can train a model over precollected data points without the need to communicate with the environment.
    • Uses replay memory and target network techniques to decorrelate the non-IID data points and stabilize the training process.
    • Ability to specify a custom environment for state-action pairs and rewards.
  • Decision forests:
    • Automated ensemble of decision trees to predict a single target.
    • Automated distribution of independent training runs.
    • Supports intelligent auto-tuning of model parameters.
    • Automated generation of SAS code for production scoring.
  • Gradient boosting:
    • Automated iterative search for the optimal partition of the data in relation to the selected label variable.
    • Automated resampling of input data several times with adjusted weights based on residuals.
    • Automated generation of weighted average for final supervised model.
    • Supports binary, nominal and interval labels.
    • Ability to customize tree training with a variety of options for the number of trees to grow, splitting criteria to apply, depth of subtrees and compute resources.
    • Automated stopping criteria based on validation data scoring to avoid overfitting.
    • Automated generation of SAS code for production scoring.
    • Access LightGBM, a popular open source modeling package.
  • Neural networks:
    • Automated intelligent tuning of parameter set to identify optimal model.
    • Supports modeling of count data.
    • Intelligent defaults for most neural network parameters.
    • Ability to customize neural network architecture and weights.
    • Techniques include deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs) and autoencoders.
    • Ability to use an arbitrary number of hidden layers to support deep learning.
    • Support for different types of layers, such as convolution and pooling.
    • Automatic standardization of input and target variables.
    • Automatic selection and use of a validation data subset.
    • Automatic out-of-bag validation for early stopping to avoid overfitting.
    • Supports intelligent auto-tuning of model parameters.
    • Automated generation of SAS code for production scoring.
  • Support vector machines:
    • Models binary target labels.
    • Supports linear and polynomial kernels for model training.
    • Ability to include continuous and categorical in/out features.
    • Automated scaling of input features.
    • Ability to apply the interior-point method and the active-set method.
    • Supports data partition for model validation.
    • Supports cross-validation for penalty selection.
    • Automated generation of SAS code for production scoring.
  • Factorization machines:
    • Supports the development of recommender systems based on sparse matrices of user IDs and item ratings.
    • Ability to apply full pairwise-interaction tensor factorization.
    • Includes additional categorical and numerical input features for more accurate models.
    • Supercharge models with time stamps, demographic data and context information.
    • Supports warm restart (update models with new transactions without full retraining).
    • Automated generation of SAS score code for production scoring.
  • Bayesian networks:
    • Learns different Bayesian network structures, including naive, tree-augmented naive (TAN), Bayesian network-augmented naive (BAN), parent-child Bayesian networks and Markov blanket.
    • Performs efficient variable selection through independence tests.
    • Selects the best model automatically from specified parameters.
    • Generates SAS code or an analytics store to score data.
    • Loads data from multiple nodes and performs computations in parallel.
  • Dirichlet Gaussian mixture models (GMM):
    • Can execute clustering in parallel and is highly multithreaded.
    • Performs soft clustering, which provides not only the predicted cluster score but also the probability distribution over the clusters for each observation.
    • Learns the best number of clusters during the clustering process, which is supported by the Dirichlet process.
    • Uses a parallel variational Bayes (VB) method as the model inference method. This method approximates the (intractable) posterior distribution and then iteratively updates the model parameters until it reaches convergence.
  • Semisupervised learning algorithm:
    • Highly distributed and multithreaded.
    • Returns the predicted labels for both the unlabeled data table and the labeled data table.
  • T-distributed stochastic neighbor embedding (t-SNE):
    • Highly distributed and multithreaded.
    • Returns low-dimensional embeddings that are based on a parallel implementation of the t-SNE algorithm.
  • Generative adversarial networks (GANs):
    • Techniques include StyleGANs for image data and GANs for tabular data.
    • Generate synthetic data for deep learning models.
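
As one illustration of the gradient boosting items above (fitting trees to residuals and stopping on validation data), here is a minimal sketch in plain Python. It is not the SAS implementation: it assumes a single numeric input, an interval target with squared-error loss, one-split trees (stumps), and no resampling or observation weights. The names `fit_stump`, `predict`, `mse`, `boost`, `rate` and `patience` are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimizes squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # needs at least two distinct x values
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_val, right_val = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_val) ** 2 for r in left)
               + sum((r - right_val) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_val, right_val)
    return best[1:]


def predict(stumps, base, rate, x):
    """Base value plus the shrunken sum of every stump's contribution."""
    return base + rate * sum(lv if x <= t else rv for t, lv, rv in stumps)


def mse(stumps, base, rate, xs, ys):
    """Mean squared error of the current ensemble."""
    return sum((predict(stumps, base, rate, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def boost(train, valid, max_trees=50, rate=0.5, patience=3):
    """Add stumps fitted to residuals; stop when validation error stalls."""
    (xs, ys), (vx, vy) = train, valid
    base = sum(ys) / len(ys)
    stumps, best_n = [], 0
    best_err = mse([], base, rate, vx, vy)
    for _ in range(max_trees):
        residuals = [y - predict(stumps, base, rate, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
        err = mse(stumps, base, rate, vx, vy)
        if err < best_err:
            best_err, best_n = err, len(stumps)
        elif len(stumps) - best_n >= patience:
            break  # no validation improvement for `patience` trees
    return base, stumps[:best_n], best_err


# Toy step-shaped data, made up for illustration.
train = ([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 3, 3, 3, 3])
valid = ([2, 6], [1, 3])
base, stumps, err = boost(train, valid, rate=0.5)
```

The returned ensemble is truncated to the number of trees that gave the lowest validation error, which is the usual way an early-stopping rule guards against overfitting.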

Analytical data preparation

  • Feature engineering best practice pipeline includes best transformations.
  • Distributed data management routines provided via a visual front end.
  • Large-scale data exploration and summarization.
  • Cardinality profiling:
    • Large-scale data profiling of input data sources.
    • Intelligent recommendation for variable measurement and role.
  • Sampling:
    • Supports random and stratified sampling, oversampling for rare events and indicator variables for sampled records.

Data exploration, feature engineering & dimension reduction

  • T-distributed stochastic neighbor embedding (t-SNE).
  • Feature binning.
  • High-performance imputation of missing values in features using a user-specified value, the mean, the pseudo median or a random value drawn from the nonmissing values.
  • Feature dimension reduction.
  • Large-scale principal components analysis (PCA), including moving windows and robust PCA.
  • Unsupervised learning with cluster analysis and mixed variable clustering.
  • Segment profiles for clustering.
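
To make two of the items above concrete, the following sketch shows mean imputation and equal-width feature binning in plain Python. It is a simplified illustration, not the SAS routines: missing values are assumed to be `None`, equal-width bins are only one possible binning scheme, and `impute_mean` and `equal_width_bins` are hypothetical names.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the nonmissing values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def equal_width_bins(values, n_bins):
    """Assign each value a bin index from 0 to n_bins - 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for constant columns
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [20, None, 40, 60, None, 80]
filled = impute_mean(ages)
bins = equal_width_bins(filled, 3)
```

Here the two missing ages are filled with 50, the mean of the four observed values, before the column is cut into three bins of width 20.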

Integrated text analytics

  • Supports 33 native languages out of the box:
    • English
    • Arabic
    • Chinese
    • Croatian
    • Czech
    • Danish
    • Dutch
    • Farsi
    • Finnish
    • French
    • German
    • Greek
    • Hebrew
    • Hindi
    • Hungarian
    • Indonesian
    • Italian
    • Japanese
    • Kazakh
    • Korean
    • Norwegian
    • Polish
    • Portuguese
    • Romanian
    • Russian
    • Slovak
    • Slovenian
    • Spanish
    • Swedish
    • Tagalog
    • Turkish
    • Thai
    • Vietnamese
  • Stop lists are automatically included and applied for all languages.
  • Automated parsing, tokenization, part-of-speech tagging and lemmatization.
  • Predefined concepts extract common entities such as names, dates, currency values, measurements, people, places and more.
  • Automated feature extraction with machine-generated topics (singular value decomposition and latent Dirichlet allocation).
  • Supports machine learning and rules-based approaches within a single project.
  • Automatic rule generation with BoolRule.
  • Classify documents more accurately with deep learning (recurrent neural networks).

Model assessment

  • Automatically calculates supervised learning model performance statistics.
  • Produces output statistics for interval and categorical targets.
  • Creates a lift table for interval and categorical targets.
  • Creates a ROC table for categorical targets.
  • Creates Event Classification and Nominal Classification charts for supervised learning models with a class target.

Model scoring

  • Automatically generates SAS DATA step code for model scoring.
  • Applies scoring logic to training, holdout and new data.

SAS® Viya® in-memory engine

  • CAS (SAS Cloud Analytic Services) performs processing in memory and distributes processing across nodes in a cluster.
  • User requests (expressed in a procedural language) are translated into actions with the parameters needed to process in a distributed environment. The result set and messages are passed back to the procedure for further action by the user.
  • Data is managed in blocks and can be loaded into memory on demand.
  • If tables exceed memory capacity, the server caches the blocks on disk. Data and intermediate results are held in memory as long as required, across jobs and user boundaries.
  • Includes highly efficient node-to-node communication. An algorithm determines the optimal number of nodes for a given job.
  • Communication layer supports fault tolerance and lets you remove or add nodes from a server while it is running. All components can be replicated for high availability.
  • Support for legacy SAS code and direct interoperability with SAS 9.4M6 clients.
  • Supports multitenancy deployment, allowing for a shared software stack to support isolated tenants in a secure manner.

SAS Viya Copilot

  • Code assistance:
    • Generate SAS code based on user input, ensuring accuracy and consistency.
    • Explain existing SAS code in clear, understandable terms, which makes complex scripts easier to follow and eases code maintenance and handover.
    • Generate meaningful comments within the code, improving readability and documentation, which is particularly valuable for maintaining legacy code written by others.
  • AI-powered model pipeline development:
    • Clearly explains model outputs at each step in the pipeline, empowering users to make better-informed decisions.
    • Suggests and adds nodes to the pipeline based on the existing state of the pipeline and data.
    • Answers questions by drawing on the user documentation, giving users an easy way to navigate and understand technical details and speeding up model development tasks.

Synthetic data generation

  • Multitable relational data generation means users can handle complex data models, such as generating a synthetic financial data set in which customer, account and transaction tables remain consistent with each other for realistic end-to-end data simulations.
  • Time series data generation enables synthetic sequential data (like sensor readings over time, stock prices or patient vitals trends).
  • A low-/no-code interface provides a friendly code-free experience for synthetic data generation.
  • Privacy-preserving governance and evaluation tools mean teams can innovate using realistic data with zero privacy risk to actual individuals. SAS Data Maker has built-in governance and auditing features to ensure confidence in the synthetic data.
  • Process transparency and control mean complete visibility and control over the synthetic data generation process – from how data is profiled and modeled to how synthetic data is generated, validated and deployed.
  • Accelerated data for innovation gives users the freedom to flexibly develop and test models with a dramatically shortened provisioning cycle.
  • Enterprise scalability and performance mean you can scale to generate millions of records across multiple tables efficiently.
  • Integration with SAS Viya means synthetic data generation can be part of the same platform where data is prepared, models are built and decisions are made.  
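
The consistency requirement in the multitable item above can be shown with a toy generator. This sketch does not model real data and says nothing about how SAS Data Maker works; it only demonstrates, under made-up table layouts and value ranges, what it means for customer, account and transaction tables to stay consistent with each other. `generate_tables` and all column names are hypothetical.

```python
import random


def generate_tables(n_customers, seed=0):
    """Build three toy tables whose foreign keys always resolve."""
    rng = random.Random(seed)
    customers = [{"customer_id": c} for c in range(n_customers)]
    accounts, transactions = [], []
    for cust in customers:
        for _ in range(rng.randint(1, 3)):  # each customer gets 1 to 3 accounts
            account_id = len(accounts)
            accounts.append({"account_id": account_id,
                             "customer_id": cust["customer_id"]})
            for _ in range(rng.randint(0, 4)):  # each account gets 0 to 4 transactions
                transactions.append({
                    "transaction_id": len(transactions),
                    "account_id": account_id,
                    "amount": round(rng.uniform(-500, 500), 2),
                })
    return customers, accounts, transactions


customers, accounts, transactions = generate_tables(10)
customer_ids = {c["customer_id"] for c in customers}
account_ids = {a["account_id"] for a in accounts}
# Referential consistency: every child row points at an existing parent row.
consistent = (all(a["customer_id"] in customer_ids for a in accounts)
              and all(t["account_id"] in account_ids for t in transactions))
```

Generating child rows from their parent rows, rather than generating each table independently, is what keeps the keys aligned in this sketch.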

AutoML

  • Repository of best practice pipelines to easily set up models.
  • Prebuilt models by SAS or by users can help you quickly stand up model pipelines.
  • Dynamically profiles data.
  • Fixes data quality issues with machine learning automatically.
  • Performs data transformations automatically.
  • Recommends and builds models best suited to fit your projects.
  • Optimizes performance and speed across models.
  • Fully editable and transparent – no black box.