SAS® Unified Insights MM Features

Market-leading data mining & machine learning

  • Provides GUI-based data mining and machine learning via a single, collaborative and highly scalable environment.
  • Provides open source integration with R, Python, Java and Lua models.
  • Lets you use model competition to identify and deploy the most effective model.

View more market-leading data mining & machine learning features

Interactive programming in a web-based development environment

  • Visual interface for the entire analytical life cycle process.
  • Drag-and-drop interactive interface requires no coding, though coding is an option.
  • Supports automated code creation at each node in the pipeline.
  • Choose best practice templates (basic, intermediate or advanced) to get started quickly with machine learning tasks or take advantage of our automated modeling process.
  • Interpretability reports such as PD, LIME, ICE and Kernel SHAP.
  • Explore data from within Model Studio and launch directly into SAS Visual Analytics.
  • Edit models imported from SAS Visual Analytics in Model Studio.
  • View data within each node in Model Studio.
  • Run SAS® Enterprise Miner 14.3 batch code within Model Studio.
  • Provides a collaborative environment for easy sharing of data, code snippets, annotations and best practices among different personas.
  • Create, manage and share content and administer content permissions via SAS Drive.
  • The SAS lineage viewer visually displays the relationships between data, models and decisions.
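
Of the interpretability reports named above, partial dependence (PD) is the simplest to illustrate: a PD curve is just the model's average prediction as one feature is swept across a grid. The sketch below is a toy pure-Python version with an invented model and data, not a SAS API:

```python
# Minimal sketch of a partial dependence (PD) computation.
# The model and rows below are toy stand-ins, not SAS objects.

def partial_dependence(model, rows, feature, grid):
    """Average model prediction with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            probe = dict(row)
            probe[feature] = value          # clamp the feature of interest
            preds.append(model(probe))
        curve.append(sum(preds) / len(preds))
    return curve

# Toy model: prediction depends on x1 and x2.
model = lambda r: 2.0 * r["x1"] + 0.5 * r["x2"]
rows = [{"x1": 1.0, "x2": 0.0}, {"x1": 3.0, "x2": 4.0}]

pd_curve = partial_dependence(model, rows, "x1", [0.0, 1.0, 2.0])
```

The curve shows how the average prediction moves as `x1` alone changes, which is exactly what a PD report plots.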

Intelligent automation

  • Public API to automate many of the manual, complex modeling steps of building machine learning models – from data wrangling, to feature engineering, to algorithm selection, to deployment.
  • Automatic Feature Engineering node for automatically cleansing, transforming, and selecting features for models.
  • Automatic Modeling node for automatically selecting the best model using a set of optimization and autotuning routines across multiple techniques.

Natural language generation

  • View results in simple language to facilitate understanding of reports, including model assessment and interpretability.

Embedded support for Python and R languages 

  • Embed open source code within an analysis, and call open source algorithms within Model Studio.
  • The Open Source Code node in Model Studio is agnostic to Python or R versions.

Deep learning with Python (DLPy)

  • Build deep learning models for image, text, audio and time-series data using Jupyter Notebook.
  • High-level APIs are available on GitHub for:
    • Deep neural networks for tabular data.
    • Image classification and regression.
    • Object detection.
    • RNN-based tasks – text classification, text generation and sequence labeling.
    • RNN-based time-series processing and modeling.
  • Support for predefined network architectures, such as LeNet, VGG, ResNet, DenseNet, Darknet, Inception, ShuffleNet, MobileNet, YOLO, Tiny YOLO, Faster R-CNN and U-Net.
  • Import and export deep learning models in the ONNX format.

SAS® procedures (PROCs) and CAS actions

  • A programming interface (SAS Studio) allows IT or developers to access a CAS server, load and save data directly from a CAS server, and support local and remote processing on a CAS server.
  • Python, Java, R, Lua and Scala programmers or IT staff can access data and perform basic data manipulation against a CAS server, or execute CAS actions using PROC CAS.
  • CAS actions support for interpretability, feature engineering and modeling.
  • Integrate and add the power of SAS to other applications using REST APIs.
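
As a sketch of the REST integration mentioned above, the snippet below assembles (but does not send) a JSON scoring request. The endpoint path and the module name `mymodel` are illustrative assumptions, not the documented API; consult the SAS Viya REST reference for exact routes:

```python
# Sketch of calling a SAS Viya REST endpoint from Python.
# Route and field names are illustrative assumptions.
import json
from urllib import request

def build_score_request(base_url, token, rows):
    """Assemble (but do not send) a JSON scoring request."""
    body = json.dumps({"inputs": rows}).encode("utf-8")
    req = request.Request(
        base_url + "/microanalyticScore/modules/mymodel/steps/score",  # hypothetical route
        data=body,
        headers={"Authorization": "Bearer " + token,
                 "Content-Type": "application/json"},
        method="POST",
    )
    return req

req = build_score_request("https://viya.example.com", "TOKEN", [{"x1": 1.2}])
```

Sending it would be a single `urllib.request.urlopen(req)` call once a real host and OAuth token are in hand.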

Highly scalable, distributed in-memory analytical processing

  • Distributed, in-memory processing of complex analytical calculations on large data sets provides low-latency answers.
  • Analytical tasks are chained together as a single, in-memory job without having to reload the data or write out intermediate results to disks.
  • Concurrent access to the same data in memory by many users improves efficiency.
  • Data and intermediate results are held in memory as long as required, reducing latency.
  • Built-in workload management ensures efficient use of compute resources.
  • Built-in failover management guarantees submitted jobs always finish.
  • Automated I/O disk spillover for improved memory management.

Model development with modern machine learning algorithms

  • Decision forests:
    • Automated ensemble of decision trees to predict a single target.
    • Automated distribution of independent training runs.
    • Supports intelligent autotuning of model parameters.
    • Automated generation of SAS code for production scoring. 
  • Gradient boosting:
    • Automated iterative search for optimal partition of the data in relation to selected label variable.
    • Automated resampling of input data several times with adjusted weights based on residuals.
    • Automated generation of weighted average for final supervised model.
    • Supports binary, nominal and interval labels.
    • Ability to customize tree training with variety of options for numbers of trees to grow, splitting criteria to apply, depth of subtrees and compute resources. 
    • Automated stopping criteria based on validation data scoring to avoid overfitting.
    • Automated generation of SAS code for production scoring.
  • Neural networks:
    • Automated intelligent tuning of parameter set to identify optimal model.
    • Supports modeling of count data.
    • Intelligent defaults for most neural network parameters.
    • Ability to customize neural networks architecture and weights.
    • Techniques include deep forward neural network (DNN), convolutional neural networks (CNNs), recurrent neural networks (RNNs) and autoencoders.
    • Ability to use an arbitrary number of hidden layers to support deep learning.
    • Support for different types of layers, such as convolution and pooling.
    • Automatic standardization of input and target variables.
    • Automatic selection and use of a validation data subset.
    • Automatic out-of-bag validation for early stopping to avoid overfitting.
    • Supports intelligent autotuning of model parameters.
    • Automated generation of SAS code for production scoring.
  • Support vector machines:
    • Models binary target labels.
    • Supports linear and polynomial kernels for model training.
    • Ability to include continuous and categorical input features.
    • Automated scaling of input features.
    • Ability to apply the interior-point method and the active-set method.
    • Supports data partition for model validation.
    • Supports cross-validation for penalty selection.
    • Automated generation of SAS code for production scoring.
  • Factorization machines:
    • Supports the development of recommender systems based on sparse matrices of user IDs and item ratings.
    • Ability to apply full pairwise-interaction tensor factorization.
    • Includes additional categorical and numerical input features for more accurate models.
    • Supercharge models with timestamps, demographic data and context information.
    • Supports warm restart (update models with new transactions without full retraining).
    • Automated generation of SAS score code for production scoring.
  • Bayesian networks:
    • Learns different Bayesian network structures, including naive, tree-augmented naive (TAN), Bayesian network-augmented naive (BAN), parent-child Bayesian networks and Markov blanket.
    • Performs efficient variable selection through independence tests.
    • Selects the best model automatically from specified parameters.
    • Generates SAS code or an analytics store to score data.
    • Loads data from multiple nodes and performs computations in parallel.
  • Dirichlet Gaussian mixture models (GMM):
    • Can execute clustering in parallel and is highly multithreaded.
    • Performs soft clustering, which provides not only the predicted cluster score but also the probability distribution over the clusters for each observation.
    • Learns the best number of clusters during the clustering process, which is supported by the Dirichlet process.
    • Uses a parallel variational Bayes (VB) method as the model inference method. This method approximates the (intractable) posterior distribution and then iteratively updates the model parameters until it reaches convergence.
  • Semisupervised learning algorithm:
    • Highly distributed and multithreaded.
    • Returns the predicted labels for both the unlabeled data table and the labeled data table.
  • T-distributed stochastic neighbor embedding (t-SNE):
    • Highly distributed and multithreaded.
    • Returns low-dimensional embeddings that are based on a parallel implementation of the t-SNE algorithm.
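
The gradient boosting bullets above describe an iterative scheme: fit a weak learner to the residuals of the current ensemble, add it in with a shrinkage weight, repeat. A from-scratch toy sketch of that loop on a 1-D regression problem (not SAS code, and far simpler than the production algorithm):

```python
# Toy illustration of gradient boosting: each round fits a one-split
# "stump" to the residuals of the current additive model.

def fit_stump(xs, ys):
    """Best single-threshold split minimizing squared error on 1-D data."""
    best = None
    for t in xs:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (lm if x <= t else rm)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def boost(xs, ys, rounds=20, lr=0.5):
    """Additive model: start at the mean, repeatedly fit stumps to residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    for _ in range(rounds):
        preds = [base + lr * sum(s(x) for s in stumps) for x in xs]
        residuals = [y - p for y, p in zip(ys, preds)]
        stumps.append(fit_stump(xs, residuals))
    return lambda x: base + lr * sum(s(x) for s in stumps)

model = boost([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
```

The learning-rate shrinkage (`lr`) is what makes the automated stopping criteria above matter: smaller steps fit more slowly but overfit less.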

Analytical data preparation

  • Feature engineering best practice pipeline includes best transformations.
  • Distributed data management routines provided via a visual front end.
  • Large-scale data exploration and summarization.
  • Cardinality profiling:
    • Large-scale data profiling of input data sources.
    • Intelligent recommendation for variable measurement and role.
  • Sampling: 
    • Supports random and stratified sampling, oversampling for rare events and indicator variables for sampled records.
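
The sampling bullet above (stratified sampling, oversampling of rare events, indicator variables for sampled records) can be sketched in a few lines. The function below is a toy, and the 50% target rare-event rate is an arbitrary default, not a SAS setting:

```python
# Toy sketch of oversampling a rare event with a sampled-record indicator.
import random

def stratified_sample(rows, label, rare_value, rare_rate=0.5, seed=7):
    """Keep all rare-event rows; downsample the common class so the
    sample is `rare_rate` rare. Adds a _sampled indicator to every row."""
    rng = random.Random(seed)
    rare = [r for r in rows if r[label] == rare_value]
    common = [r for r in rows if r[label] != rare_value]
    n_common = int(len(rare) * (1 - rare_rate) / rare_rate)
    picked = rng.sample(common, min(n_common, len(common)))
    return [dict(r, _sampled=1) for r in rare + picked]

rows = [{"y": 1}] * 5 + [{"y": 0}] * 95
sample = stratified_sample(rows, "y", 1)
```

Starting from a 5% event rate, the sample comes back balanced at the requested rate, with every record flagged.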

Data exploration, feature engineering and dimension reduction

  • T-distributed stochastic neighbor embedding (t-SNE).
  • Feature binning.
  • High-performance imputation of missing values in features with user-specified values, mean, pseudo median and random value of nonmissing values.
  • Feature dimension reduction.
  • Large-scale principal components analysis (PCA), including moving windows and robust PCA.
  • Unsupervised learning with cluster analysis and mixed variable clustering.
  • Segment profiles for clustering.
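
As a concrete illustration of the imputation strategies listed above (user-specified value, mean, pseudo median, random nonmissing value), here is a toy pure-Python version; the real implementation is distributed and high-performance:

```python
# Toy sketch of missing-value imputation strategies.
import random
import statistics

def impute(values, method="mean", fill=None, seed=1):
    present = [v for v in values if v is not None]
    if method == "mean":
        fill_value = sum(present) / len(present)
    elif method == "median":
        fill_value = statistics.median(present)
    elif method == "value":
        fill_value = fill
    elif method == "random":
        rng = random.Random(seed)
        return [v if v is not None else rng.choice(present) for v in values]
    return [v if v is not None else fill_value for v in values]

col = [1.0, None, 3.0, None, 5.0]
```

Each strategy fills the `None` holes differently; "random" draws from the observed values so the imputed column keeps roughly the original distribution.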

Integrated text analytics

  • Supports 33 native languages out of the box:
    • English
    • Arabic
    • Chinese
    • Croatian
    • Czech
    • Danish
    • Dutch
    • Farsi
    • Finnish
    • French
    • German
    • Greek
    • Hebrew
    • Hindi
    • Hungarian
    • Indonesian
    • Italian
    • Japanese
    • Kazakh
    • Korean
    • Norwegian
    • Polish
    • Portuguese
    • Romanian
    • Russian
    • Slovak
    • Slovenian
    • Spanish
    • Swedish
    • Tagalog
    • Turkish
    • Thai
    • Vietnamese
  • Stop lists are automatically included and applied for all languages.
  • Automated parsing, tokenization, part-of-speech tagging and lemmatization.
  • Predefined concepts extract common entities such as names, dates, currency values, measurements, people, places and more.
  • Automated feature extraction with machine-generated topics (singular value decomposition and latent Dirichlet allocation).
  • Supports machine learning and rules-based approaches within a single project.
  • Automatic rule generation with the BoolRule algorithm.
  • Classify documents more accurately with deep learning (recurrent neural networks).
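
A toy sketch of the front of this pipeline (tokenization plus a stop list feeding term extraction); the real SAS parsing adds part-of-speech tagging, lemmatization and the 33-language support listed above:

```python
# Toy text-parsing pipeline: tokenize, drop stop words, count terms.
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "is"}  # tiny illustrative stop list

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def top_terms(docs, k=3):
    counts = Counter(t for d in docs for t in tokenize(d) if t not in STOP_WORDS)
    return [term for term, _ in counts.most_common(k)]

docs = ["The model scores the data.", "Scores of the model improve."]
```

The term counts produced this way are the raw material for the machine-generated topics (SVD, LDA) described above.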

Model assessment

  • Automatically calculates supervised learning model performance statistics.
  • Produces output statistics for interval and categorical targets.
  • Creates lift table for interval and categorical target.
  • Creates ROC table for categorical target.
  • Creates Event Classification and Nominal Classification charts for supervised learning models with a class target.
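
The lift and ROC statistics above are simple to define from scores and binary labels; a toy computation (not the SAS implementation) makes the definitions concrete:

```python
# Toy model-assessment statistics: cumulative lift and ROC AUC.

def lift_at(scores, labels, depth=0.2):
    """Lift in the top `depth` fraction ranked by score."""
    ranked = [l for _, l in sorted(zip(scores, labels), reverse=True)]
    n = max(1, int(len(ranked) * depth))
    top_rate = sum(ranked[:n]) / n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

def auc(scores, labels):
    """Probability a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.65, 0.1, 0.5, 0.05]
labels = [1,   1,   0,   1,   0,   0,   1,    0,   0,   0]
```

A lift of 2.5 at 20% depth means the top-scored fifth of records contains 2.5× the base event rate; the pairwise AUC definition matches the area under the ROC curve.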

Model scoring

  • Automatically generates SAS DATA step code for model scoring.
  • Applies scoring logic to training, holdout data and new data.

SAS® Viya® in-memory engine

  • CAS (SAS Cloud Analytic Services) performs processing in memory and distributes processing across nodes in a cluster.
  • User requests (expressed in a procedural language) are translated into actions with the parameters needed to process in a distributed environment. The result set and messages are passed back to the procedure for further action by the user.
  • Data is managed in blocks and can be loaded into memory on demand.
  • If tables exceed memory capacity, the server caches the blocks on disk. Data and intermediate results are held in memory as long as required, across jobs and user boundaries.
  • Includes highly efficient node-to-node communication. An algorithm determines the optimal number of nodes for a given job.
  • Communication layer supports fault tolerance and lets you remove or add nodes from a server while it is running. All components can be replicated for high availability.
  • Support for legacy SAS code and direct interoperability with SAS 9.4M6 clients.
  • Supports multitenancy deployment, allowing for a shared software stack to support isolated tenants in a secure manner.

Deployment options

  • On-site deployments:
    • Single-machine server to support the needs of small to midsize organizations.
    • Distributed server to meet growing data, increasing workloads and scalability requirements.
  • Cloud deployments:
    • Enterprise hosting.
    • Private or public cloud (e.g., BYOL in Amazon) infrastructure.
    • SAS managed software as a service (SaaS).
    • Cloud Foundry platform as a service (PaaS) to support multiple cloud providers.

Streamlined model deployment

  • Streamlines the process of creating, managing, administering, deploying and monitoring your analytical models.
  • Provides a framework for model registration, validation, monitoring and retraining.
  • Enables you to assess candidate models to identify and publish the champion model.
  • Ensures complete auditability and regulatory compliance.

View more streamlined model deployment features

Model registration

  • Provides secure, reliable, versioned storage for all types of models, as well as access administration, including backup and restore capabilities, overwrite protection and event logging.
  • Once registered, models can be searched, queried, sorted and filtered by the attributes used to store them – type of asset, algorithm, input or target variables, model ID, etc. – as well as user-defined properties and editable keywords.
  • Add general properties as columns to the listing for models and projects, such as model name, role, type of algorithm, date modified, modified by, repository location, description, version and keywords (tags).
  • Access models and model-score artifacts using open REST APIs.
  • Directly supports Python models for scoring and publishing. Convert PMML and ONNX (using DLPy) to standard SAS model types. Manage and version R code like other types of code.
  • Provides accounting and auditability, including event logging of major actions – e.g., model creation, project creation and publishing.
  • Export models as .ZIP files, including all model file contents, for movement across environments.
  • Easily copy models from one project to another, simplifying model movement within the repository. 

Analytical workflow management

  • Create custom processes for each model using SAS Workflow Studio:
    • The workflow manager is fully integrated with SAS Model Manager so you can manage workflows and track workflow tasks within the same user interface.
    • Import, update and export generic models at the folder level – and duplicate or move to another folder.
  • Facilitates collaboration across teams with automated notifications.
  • Perform common model management tasks, such as importing, viewing and attaching supporting documentation; setting a project champion model and flagging challenger models; publishing models for scoring purposes; and viewing dashboard reports.

Model scoring

  • Place a combination of Python, SAS or other open source models in the same project for users to compare and assess using different model fit statistics.
  • Set up, maintain and manage separate versions for models:
    • The champion model is automatically defined as a new version when the model is set as champion, updated or published in a project.
    • Choose challenger models to compare against the project champion model.
    • Monitor and publish challenger and champion models.
  • Define test and production score jobs for SAS and Python models using required inputs and outputs.
  • Create and execute scoring tasks, and specify where to save the output and job history.

Model deployment

  • Depending on the use case, you can publish models to batch/operational systems – e.g., SAS server, in-database, in-Hadoop/Spark, SAS Cloud Analytic Services (CAS) server – or to on-demand systems using the Micro Analytic Score (MAS) service.
  • Publish Python and SAS models to runtime containers with embedded binaries and score code files. Promote runtime containers to local Docker, AWS Docker and Amazon EKS (Elastic Kubernetes Service) environments.

Model monitoring

  • Monitor the performance of models with any type of score code. Performance reports produced for champion and challenger R, Python and SAS models include variable distribution plots, lift charts, stability charts, ROC, K-S and Gini reports with SAS Visual Analytics using performance-reporting output result sets.
  • Built-in reports display the measures for input and output data and fit statistics for classification and regression models to evaluate whether to retrain, retire or create new models. Performance reports for champion and challenger analytical models involving Python, SAS, R, etc., with different accuracy statistics are available.
  • Monitor performance of champion models for all projects using performance report definition and execution.
  • Schedule recurring and future jobs for performance monitoring.
  • Specify multiple data sources and time-collection periods when defining performance-monitoring tasks.

Self-service data preparation

  • Provides an interactive, self-service environment for data access, blending, shaping and cleansing to prepare data for analytics and reporting.
  • Fully integrates with your analytics pipeline.
  • Includes data lineage and automation.

View more self-service data preparation features

Data & metadata access

  • Use any authorized internal source, accessible external data sources and data held in-memory in SAS Viya.
    • View a sample of a table or file loaded in the in-memory engine of SAS Viya, or from data sources registered with SAS/ACCESS, to visualize the data you want to work with.
    • Quickly create connections to and between external data sources.
    • Access physical metadata information like column names, data types, encoding, column count and row count to gain further insight into the data.
  • Data sources and types include:
    • Amazon S3.
    • Amazon Redshift.
    • DNFS, HDFS, PATH-based files (CSV, SAS, Excel, delimited).
    • DB2.
    • Hive.
    • Impala.
    • SAS® LASR.
    • ODBC.
    • Oracle.
    • Postgres.
    • Teradata.
    • Feeds from Twitter, YouTube, Facebook, Google Analytics, Google Drive, Esri and local files.
    • SAS® Cloud Analytic Services (CAS).

Data provisioning

  • Load data in parallel from desired data sources into memory simply by selecting them – no need to write code or have experience with an ETL tool. (Data cannot be sent back to the following data sources: Twitter, YouTube, Facebook, Google Analytics, Esri; it can only be sourced from these sites.)
    • Reduce the amount of data being copied by performing row filtering or column filtering before the data is provisioned.
    • Retain big data in situ, and push processing to the source system by including SAS In-Database optional add-ons.

Guided, interactive data preparation

  • Transform, blend, shape, cleanse and standardize data in an interactive, visual environment that guides you through data preparation processes.
  • Easily understand how a transformation affected results, getting visual feedback in near-real time through the distributed, in-memory processing of SAS Viya.

Machine learning & AI suggestions

  • Take advantage of AI and machine learning to scan data and make intelligent transformation suggestions.
  • Accept suggestions and complete transformations at the click of a button. No advanced or complex coding required.
  • Automated suggestions include:
    • Casing.
    • Gender analysis.
    • Match code.
    • Parse.
    • Standardization.
    • Missing value imputation for numeric variables.
    • One-hot encoding.
    • Remove column.
    • Whitespace trimming.
    • Convert column data type.
    • Center and scale.
    • Dedupe.
    • Unique ID creation.
    • Column removal for sparse data.

Column-based transformations

  • Use column-based transformations to standardize, remediate and shape data without any configuration. You can:
    • Change case.
    • Convert column.
    • Rename.
    • Remove.
    • Split.
    • Trim whitespace.
    • Custom calculation.
  • Support for wide tables lets you save data plans for quick data preparation jobs.

Row-based transformations

  • Use row-based transformations to filter and shape data.
  • Create analytical-based tables using the transpose transformation to prepare the data for analytics and reporting tasks.
  • Create simple or complex filters to remove unnecessary data.

Code-based transformations

  • Write custom code to transform, shape, blend, remediate and standardize data.
  • Write simple expressions to create calculated columns, write advanced code or reuse code snippets for greater transformational flexibility.
  • Import custom code defined by others, sharing best practices and collaborative productivity.

Multiple-input-based transformations

  • Use multiple-input-based transformations to blend and shape data.
  • Blend or shape one or more sets of data together using the guided interface – there's no requirement to know SQL or SAS. You can:
    • Append data.
    • Join data.
    • Transpose data.

Data profiling

  • Profile data to generate column-based and table-based basic and advanced profile metrics.
  • Use the table-level profile metrics to uncover data quality issues and get further insight into the data itself.
  • Drill into each column for column-level profile metrics and to see visual graphs of pattern distribution and frequency distribution results that help uncover hidden insights.
  • Use a variety of data types/sources (listed previously). To profile data from Twitter, Facebook, Google Analytics or YouTube, you must first explicitly import the data into the SAS Viya in-memory environment.

Data quality processing

(SAS® Data Quality in SAS® Viya® is included in SAS Data Preparation)

Data cleansing

  • Use locale- and context-specific parsing and field extraction definitions to reshape data and uncover additional insights.
  • Use the extraction transformation to identify and extract contact information (e.g., name, gender, field, pattern, identity, email and phone number) in a specified column.
  • Use parsing when data in a specified column needs to be tokenized into substrings (e.g., a full name tokenized into prefix, given name, middle name and family name).
  • Derive unique identifiers from match codes that link disparate data sources.
  • Standardize data with locale- and context-specific definitions to transform data into a common format, like casing.

Identity definition

  • Analyze column data using locale-specific rules to determine gender or context.
  • Use identification analysis to analyze the data and determine its context, which is particularly valuable if the data or source of data is unfamiliar.
  • Use gender analysis to determine the gender of a name using locale-specific rules so the data can be easily filtered or segmented.
  • Create a unique ID for each row with the unique ID generator.
  • Identify the subject data in each column with identification analysis.
  • Identify, find and sort data by tagging columns and tables.

Data matching

  • Determine matching records based upon locale- and context-specific definitions.
  • Easily identify matching records using more than 25 context-specific rules, such as date, address, name, email, etc.
  • Use the results of the match code transformation to remove duplicates, or perform a fuzzy search or fuzzy join.
  • Find similar records and group them together logically.
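
The match-code idea above can be sketched as: collapse each record to a normalized key so that near-duplicates collide. The normalization rule here (drop non-letters, then drop vowels after the first character) is an invented toy, not a SAS locale-specific definition:

```python
# Toy match-code generation and fuzzy dedupe.
import re

def match_code(name):
    """Uppercase, strip non-letters, drop vowels (and Y) after the first char."""
    s = re.sub(r"[^A-Z]", "", name.upper())
    return s[:1] + re.sub(r"[AEIOUY]", "", s[1:])

def dedupe(names):
    seen, keep = set(), []
    for n in names:
        code = match_code(n)
        if code not in seen:
            seen.add(code)
            keep.append(n)
    return keep

names = ["Smith, John", "SMITH JOHN", "Smyth John", "Jones, Ann"]
unique = dedupe(names)
```

"Smith, John", "SMITH JOHN" and "Smyth John" all reduce to the same code, so only one survives the dedupe pass; a fuzzy join works the same way, joining on codes instead of raw values.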

System & job monitoring

  • Use integrated monitoring capabilities for system- and job-level processes.
  • Gain insight into how many processes are running, how long they're taking and who is running them.
  • Easily filter all system jobs by job status (running, successful, failed, pending and cancelled).
  • Access job error logs to help with root-cause analysis and troubleshooting. (Note: Monitoring is available using SAS Environment Manager and the job monitor application.)

Data import & data preparation job scheduling

  • Create a data import job from automatically generated code to perform a data refresh using the integrated scheduler.
  • Schedule data explorer imports as jobs so they become an automatic, repeatable process.
  • Specify a time, date, frequency and/or interval for the jobs.

Data lineage

  • Explore relationships between accessible data sources, data objects and jobs.
  • Use the relationship graph to visually show the relationships that exist between objects, making it easier to understand the origin of data and trace its processing.
  • Create multiple views with different tabs, and save the organization of those views.

Plan templates & project collaboration

  • Use data preparation plans (templates), which consist of a set of transformation rules applied to one or more sources of data, to improve productivity and spend less time preparing data.
  • Reuse templates by applying them to different sets of data to ensure that data is transformed consistently and adheres to enterprise data standards and policies.
  • Rely on team-based collaboration through a project hub used with SAS Viya projects. The project's activity feed shows who did what and when, and can be used to communicate with other team members.

Batch text analysis

  • Quickly extract the contents of documents, and perform text identification and extraction.

Cloud data exchange

  • Securely copy data from on-site repositories to a cloud-based SAS Viya instance running in a private or public cloud for use in SAS Viya applications – and send data back to on-site locations.
  • Preprocess data locally to reduce the amount of data that needs to be moved to remote locations.
  • Use a command-line interface (CLI) for administration and control.
  • Securely traverses your on-site firewall.

Visual data exploration & insights development

  • Provides bimodal support for both governed and self-service exploration and visualization.
  • Enables self-service discovery, reporting and analysis.
  • Provides access to easy-to-use predictive analytics with "smart algorithms."
  • Enables report sharing via email, web browser, MS Office or mobile devices.
  • Provides centralized, web-based administration, monitoring and governance of the platform.

View more visual data exploration & insights development features

Data

  • Import data from a variety of sources: databases, Hadoop, Excel spreadsheets, social media and more.
  • Drag Excel files, CSV files or SAS data sets into the workspace and quickly start building reports or dashboards.
  • Use standard data quality functions, such as changing case; converting, renaming, removing and splitting columns; and creating calculated columns and transformations with custom code.
  • Prepare data with append, join, filter and transpose functions.
  • Reuse, schedule and monitor jobs.
  • View lineage with a network diagram.
  • Quickly view descriptive statistics for measures to help you understand the characteristics of your data.
  • Create calculated, aggregated or derived data items.
  • Create drillable hierarchies in a self-service manner, without the need to predefine user paths.

Discovery

  • Interactive data exploration lets business users and analysts easily identify relationships, trends, outliers and more.
  • Precise responsive layout gives you flexible layout and design options. You can stack or group items, and more.
  • Includes a variety of graph objects/charts:
    • Bar charts.
    • Pie charts.
    • Donut charts.
    • Line charts.
    • Scatter plots.
    • Heat maps.
    • Bubble plots.
    • Animated bubble plots.
    • Treemaps.
    • Dot plots.
    • Needle plots.
    • Numeric series.
    • Schedule charts.
    • Vector plots.
    • Key value infographics.
    • And more, with flexible graph-building capability.
  • Add content from the web (e.g., YouTube videos, web apps) and images (e.g., logos) to your reports.
  • Custom sort lets you rank category data items in lists or graphs by characteristics (e.g., product, customer), so the order most important to your organization is shown first.
  • One-click filters (e.g., one-way, two-way) and linked selections let you spend less time manually linking content (e.g., visualizations, reports).
  • Text objects include date-driven or system-generated contextual text.
  • Synchronize selections and filters across visualizations in a report or dashboard.
  • Link different reports (e.g., link a sales report to an inventory report).
  • Report consumers can change calculation parameters and display rules using controls, filters and more to view the information most relevant to them.
  • Report consumers can switch measures on the fly and change chart types and formats, enabling immediate business decisions.
  • Set the refresh rate for individual objects, pages or the entire report.
  • Analytical visualizations include:
    • Box plots.
    • Heat maps.
    • Animated bubble plots.
    • Network diagrams.
    • Correlation matrices.
    • Forecast charts.
    • Parallel coordinates plots.
    • Decision trees.
    • And more, with flexible graph-building capability.
  • Geographical map views provide quick understanding of geospatial data, including travel time and travel distance, plus rich demographic data through Esri integration.
  • Network diagrams let you display networks across a map.
  • Bring your custom interactive visualizations (e.g., D3.js graphs, C3 visualizations or Google charts) into SAS Visual Analytics so they are all driven by the same data.
  • Key value visualizations display important metrics (numeric or category values) in an infographic style for quick reference.
  • Perform path analysis (Sankey diagrams) to visualize relationships between distinct sequences of events.
  • Add cell visualizations, such as bar charts and heat maps, to tables to quickly identify problem points and see data trends.
  • Generate forecasts on the fly, with forecast confidence intervals included.
  • The most appropriate forecasting model is automatically selected after running multiple models against the data.
  • Scenario analysis lets you see how changes in different variables would affect forecasts.
  • Goal seeking lets you specify a target value for a forecast, then determine the values of the underlying factors required to achieve that target.
  • Decision trees graphically depict likely outcomes.
  • Custom binning moves continuous data into a small number of groups for better interpretation and display of results.
  • Text analysis capabilities automatically find topics and understand sentiment from a variety of text sources, including Facebook, Twitter, Google Analytics, YouTube comments and more.
  • Recover the report you are editing if a session ends unexpectedly. Reports are autosaved every five seconds during editing.
  • Pick up where you left off in a previous session, from any device.

Augmented analytics

  • Automated charting automatically selects the graph best suited to display the selected data.
  • Automated explanation determines which variables contribute to an outcome and provides simple, easy-to-understand natural language explanations.
  • Use automated explanation to quickly detect and highlight patterns and outliers in the data.
  • Automated explanation identifies key differences between the top and bottom cases in the data. For example, which data best distinguishes the lowest-risk cases from the highest-risk cases?
  • The steps taken to automatically explain the data are displayed, providing greater clarity.
  • Use automated explanation to define and describe groups of interest in the data based on a variable you select.
  • Automatically create a publishable, interactive analytical story from all of your data.
  • Suggested insights, derived automatically from your data, let you quickly build content-rich reports and dashboards.
  • Related measures are highlighted in the measure list so users can quickly identify potential interactions.

Sharing & collaboration

  • Reuse and share report modifications, such as filters, calculations, hierarchies and report element formatting.
  • Collaborate across mobile devices and the web by adding comments to reports.
  • Create alerts for report objects so subscribers are notified via email or text message when threshold conditions are met.
  • Distribute reports securely as PDFs or emails, once or at recurring intervals (e.g., daily, weekly or monthly).
  • Playable dashboards let you put reports into slideshow mode.
  • Administrators can configure whether guest access is supported for viewing reports or visualizations.
  • Guest users can view publicly available insights.
  • Users can view, organize and collaborate on their work using SAS Drive:
    • Users can favorite, share, preview and tag their content from a single place.
    • Create projects that share data, content and other resources with project members.

SAS® Visual Analytics App

  • Available for free from:
    • App Store (iOS iPhones and iPads).
    • Google Play (Android phones and tablets).
    • Microsoft Store (Windows 10 devices).
  • The app lets you view and interact with SAS Visual Analytics reports and dashboards using familiar swipe gestures.
  • Interact with SAS Visual Analytics on iOS using voice commands.
  • Reports created in SAS Visual Analytics can be viewed anywhere.
  • Securely access content on mobile devices, both online and offline.
  • Annotate, comment on, share and email reports to others for increased collaboration.
  • Capture screenshots and share comments with others.
  • Notifications alert business users when a report is updated, data is changed or the app is updated.

Embeddable business intelligence

  • Use the SAS SDK for iOS and SAS SDK for Android to create your own mobile apps with embedded business intelligence:
    • Personalize mobile apps with embedded SAS Visual Analytics content and your corporate branding and naming.
    • Preconfigure your mobile apps to connect to a SAS server and subscribe to specified reports.
    • Develop completely customized mobile apps that embed SAS Visual Analytics content (e.g., GatherIQ).
    • Manage and secure your mobile apps and data with integration to mobile device management (MDM) services via new APIs.
  • Use the SAS Visual Analytics SDK to embed full reports or individual objects in websites and web apps:
    • Consolidate insights from multiple reports in a single location.
    • User selections on embedded SAS Visual Analytics objects can drive other elements anywhere on the web page.

Location analytics

  • Geographical maps enabled via Esri ArcGIS Online or OpenStreetMap.
  • Select data points on geographical maps to choose specific data for further analysis.
  • Geographical maps make it simple to visualize measure variation across a geographic area.
  • Access to all Esri street maps and geosearch information is available for free through Esri ArcGIS Online.
  • Custom polygons (e.g., sales territories, voting districts, floor plans, seating charts) let you see the world the way your business demands. These polygons can be animated to show how key metrics change over time.
  • Geographic point clustering makes it easier to read large volumes of location data and flags areas of interest. Get more or less detail at different zoom levels.
  • Add map pins to mark points of interest and insights on the map.
  • With an Esri ArcGIS Online license, you can enrich your data with Esri demographic data:
    • Start with a pin, then select drivable areas based on drive distance or drive time.
    • Create travel routes between points.
    • Understand how location affects outcomes by geocoding your data – adding latitude and longitude columns based on the location information in the data (country/region, state/province, postal code, city, street).

Security & administration

  • SAS Environment Manager provides easy, web-based, centralized management and monitoring of your BI and analytics environment, including users, data, content, servers, services and security.
  • User authentication and content authorization support governance.
  • Object-level security (folders, reports, etc.) and data security (table and row level) support governance.
  • Seamless integration with corporate identity directories, such as LDAP.
  • Rules mapping application capabilities to users and groups support governance.
  • Whitelist or blacklist mobile devices to determine authorization for the SAS Visual Analytics App.
  • Near-real-time dashboards monitor system health and key activities.
  • Add and remove distributed processing nodes.
  • Administration APIs can be executed in batch, including management of security, libraries, user groups and configuration.
  • Customizable monitoring and performance reports.
  • Environment-wide log exploration, job scheduling and monitoring.

SAS® Viya® in-memory engine

  • CAS (SAS Cloud Analytic Services) performs processing in memory and distributes processing across nodes in a cluster.
  • User requests (expressed in a procedural language) are translated into actions with the parameters needed to process in a distributed environment. The result set and messages are passed back to the procedure for further action by the user.
  • Data is managed in blocks and can be loaded into memory on demand.
  • If tables exceed memory capacity, the server caches the blocks on disk. Data and intermediate results are held in memory as long as required, across jobs and user boundaries.
  • Includes highly efficient node-to-node communication. An algorithm determines the optimal number of nodes for a given job.
  • The communication layer supports fault tolerance and lets you remove or add nodes from a server while it is running. All components can be replicated for high availability.
  • Support for legacy SAS code and direct interoperability with SAS 9.4M5 clients.
  • Supports multitenancy deployment, allowing a shared software stack to support isolated tenants in a secure manner.

    Deployment flexibility

    • On-site deployments:
      • Single-machine server to support the needs of small to midsize organizations.
      • Distributed server to meet growing data, workload and scalability requirements.
    • Cloud deployments:
      • Enterprise hosting.
      • Private or public cloud (e.g., BYOL in Amazon) infrastructure.
      • SAS managed software as a service (SaaS).
      • Cloud Foundry platform as a service (PaaS) to support multiple cloud providers.

    Descriptive & predictive modeling

    • Explore and evaluate segments for further analysis using k-means clustering, scatter plots and detailed summary statistics.
    • Use machine learning techniques to build predictive models from a visual or programming interface.

    View more descriptive & predictive modeling features

    Visual data exploration & discovery (available through SAS® Visual Analytics) 

    • Quickly interpret complex relationships or key variables that influence modeling outcomes within large data sets.
    • Filter observations and understand a variable’s level of influence on overall model lift. 
    • Detect outliers and/or influence points to help you determine, capture and remove them from downstream analysis (e.g., models). 
    • Explore data using bar charts, histograms, box plots, heat maps, bubble plots, geographic maps and more. 
    • Derive predictive outputs or segmentations that can be used directly in other modeling or visualization tasks. Outputs can be saved and passed to those without model-building roles and capabilities.
    • Automatically convert measure variables with two levels to category variables when a data set is first opened.

    Visual interface access to analytical techniques

    • Clustering:
      • K-means, k-modes or k-prototypes clustering.
      • Parallel coordinate plots to interactively evaluate cluster membership.
      • Scatter plots of inputs with cluster profiles overlaid for small data sets and heat maps with cluster profiles overlaid for large data sets.
      • Detailed summary statistics (means of each cluster, number of observations in each cluster, etc.).
      • Generate on-demand cluster ID as a new column.
      • Supports holdout data (training and validation) for model assessment. 
    • Decision trees: 
      • Supports classification and regression trees. 
      • Based on a modified C4.5 algorithm or cost-complexity pruning. 
      • Interactively grow and prune a tree. Interactively train a subtree. 
      • Set tree depth, max branch, leaf size, aggressiveness of tree pruning and more. 
      • Use tree map displays to interactively navigate the tree structure. 
      • Generate on-demand leaf ID, predicted values and residuals as new columns. 
      • Supports holdout data (training and validation) for model assessment.
      • Supports pruning with holdout data.
      • Supports autotuning with options for leaf size.
    • Linear regression: 
      • Influence statistics.
      • Supports forward, backward, stepwise and lasso variable selection.
      • Iteration plot for variable selection.
      • Frequency and weight variables.
      • Residual diagnostics.
      • Summary table includes overall ANOVA, model dimensions, fit statistics, model ANOVA, Type III test and parameter estimates.
      • Generate on-demand predicted values and residuals as new columns.
      • Supports holdout data (training and validation) for model assessment.
    • Logistic regression:
      • Models for binary data with logit and probit link functions.
      • Influence statistics.
      • Supports forward, backward, stepwise and lasso variable selection.
      • Iteration plot for variable selection.
      • Frequency and weight variables.
      • Residual diagnostics.
      • Summary table includes model dimensions, iteration history, fit statistics, convergence status, Type III tests, parameter estimates and response profile.
      • Generate on-demand predicted labels and predicted event probabilities as new columns. Adjust the prediction cutoff to label an observation as event or non-event.
      • Supports holdout data (training and validation) for model assessment.
    • Generalized linear models:
      • Distributions supported include beta, normal, binary, exponential, gamma, geometric, Poisson, Tweedie, inverse Gaussian and negative binomial.
      • Supports forward, backward, stepwise and lasso variable selection.
      • Offset variable support.
      • Frequency and weight variables.
      • Residual diagnostics.
      • Summary table includes model summary, iteration history, fit statistics, Type III test table and parameter estimates.
      • Informative missing option for treatment of missing values on the predictor variable.
      • Generate on-demand predicted values and residuals as new columns.
      • Supports holdout data (training and validation) for model assessment.
    • Generalized additive models:
      • Distributions supported include normal, binary, gamma, Poisson, Tweedie, inverse Gaussian and negative binomial.
      • Supports one- and two-dimensional spline effects.
      • GCV, GACV and UBRE methods for selecting the smoothing effects.
      • Offset variable support.
      • Frequency and weight variables.
      • Residual diagnostics.
      • Summary table includes model summary, iteration history, fit statistics and parameter estimates.
      • Supports holdout data (training and validation) for model assessment.
    • Nonparametric logistic regression:
      • Models for binary data with logit, probit, log-log and c-log-log link functions.
      • Supports one- and two-dimensional spline effects.
      • GCV, GACV and UBRE methods for selecting the smoothing effects.
      • Offset variable support.
      • Frequency and weight variables.
      • Residual diagnostics.
      • Summary table includes model summary, iteration history, fit statistics and parameter estimates.
      • Supports holdout data (training and validation) for model assessment.
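    The k-means procedure behind the clustering node can be sketched in plain Python. This is an illustrative toy implementation (the sample data and function name are invented for the example), not the SAS algorithm itself:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initial centroids
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:                        # keep old centroid if empty
                centroids[i] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return centroids, clusters

# Two well-separated groups; k-means recovers one cluster per group.
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
       (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
cents, clus = kmeans(pts, 2)
```

    The production implementations add distance-measure options, the aligned box criterion for choosing k, and holdout assessment; the loop above only shows the core assign-and-update iteration.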

    Programming access to analytical techniques

    • Programmers and data scientists can access SAS Viya (CAS server) from SAS Studio using SAS procedures (PROCs) and other tasks.
    • Programmers can execute CAS actions using PROC CAS or use different programming environments like Python, R, Lua and Java.
    • Users can also access SAS Viya (CAS server) from their own applications using public REST APIs.
    • Provides native integration to Python Pandas DataFrames. Python programmers can upload DataFrames to CAS and fetch results from CAS as DataFrames to interact with other Python packages, such as Pandas, matplotlib, Plotly, Bokeh, etc.
    • Includes SAS/STAT® and SAS/GRAPH® software.
    • Principal component analysis (PCA):
      • Performs dimension reduction by computing principal components.
      • Provides the eigenvalue decomposition, NIPALS and ITERGS algorithms.
      • Outputs principal component scores across observations.
      • Creates scree plots and pattern profile plots.
    • Decision trees:
      • Supports classification trees and regression trees.
      • Supports categorical and numerical features.
      • Provides criteria for splitting nodes based on measures of impurity and statistical tests.
      • Provides the cost-complexity and reduced-error methods of pruning trees.
      • Supports partitioning of data into training, validation and testing roles.
      • Supports the use of validation data for selecting the best subtree.
      • Supports the use of test data for assessment of final tree model.
      • Provides various methods of handling missing values, including surrogate rules.
      • Creates tree diagrams.
      • Provides statistics for assessing model fit, including model-based (resubstitution) statistics.
      • Computes measures of variable importance.
      • Outputs leaf assignments and predicted values for observations.
    • Clustering:
      • Provides the k-means algorithm for clustering continuous (interval) variables.
      • Provides the k-modes algorithm for clustering nominal variables.
      • Provides various distance measures for similarity.
      • Provides the aligned box criterion method for estimating the number of clusters.
      • Outputs cluster membership and distance measures across observations.
    • Linear regression:
      • Supports linear models with continuous and classification variables.
      • Supports various parameterizations for classification effects.
      • Supports any degree of interaction and nested effects.
      • Supports polynomial and spline effects.
      • Supports forward, backward, stepwise, least angle regression and lasso selection methods.
      • Supports information criteria and validation methods for controlling model selection.
      • Offers selection of individual levels of classification effects.
      • Preserves hierarchy among effects.
      • Supports partitioning of data into training, validation and testing roles.
      • Provides a variety of diagnostic statistics.
      • Generates SAS code for production scoring.
    • Logistic regression:
      • Supports binary and binomial responses.
      • Supports various parameterizations for classification effects.
      • Supports any degree of interaction and nested effects.
      • Supports polynomial and spline effects.
      • Supports forward, backward, fast backward and lasso selection methods.
      • Supports information criteria and validation methods for controlling model selection.
      • Offers selection of individual levels of classification effects.
      • Preserves hierarchy among effects.
      • Supports partitioning of data into training, validation and testing roles.
      • Provides a variety of statistics for model assessment.
      • Provides a variety of optimization methods for maximum likelihood estimation.
    • Generalized linear models:
      • Supports responses with a variety of distributions, including binary, normal, Poisson and gamma.
      • Supports various parameterizations for classification effects.
      • Supports any degree of interaction and nested effects.
      • Supports polynomial and spline effects.
      • Supports forward, backward, fast backward, stepwise and group lasso selection methods.
      • Supports information criteria and validation methods for controlling model selection.
      • Offers selection of individual levels of classification effects.
      • Preserves hierarchy among effects.
      • Supports partitioning of data into training, validation and testing roles.
      • Provides a variety of statistics for model assessment.
      • Provides a variety of optimization methods for maximum likelihood estimation.
    • Nonlinear regression models:
      • Fits nonlinear regression models with standard or general distributions.
      • Computes analytical derivatives of user-provided expressions for more robust parameter estimations.
      • Evaluates user-provided expressions using the ESTIMATE and PREDICT statements (procedure only).
      • Requires a data table that contains the CMP item store if not using PROC NLMOD.
      • Estimates parameters using the least squares method.
      • Estimates parameters using the maximum likelihood method.
    • Quantile regression models:
      • Supports quantile regression for single or multiple quantile levels.
      • Supports multiple parameterizations for classification effects.
      • Supports any degree of interactions (crossed effects) and nested effects.
      • Supports hierarchical model selection strategy among effects.
      • Provides multiple effect-selection methods.
      • Provides effect selection based on a variety of selection criteria.
      • Supports stopping and selection rules.
    • Predictive partial least squares models:
      • Provides programming syntax with classification variables, continuous variables, interactions and nestings.
      • Provides effect-construction syntax for polynomial and spline effects.
      • Supports partitioning of data into training and testing roles.
      • Provides test set validation to choose the number of extracted factors.
      • Implements the following methods: principal component regression, reduced rank regression and partial least squares regression.
    • Generalized additive models:
      • Fits generalized additive models based on low-rank regression splines.
      • Estimates the regression parameters by using penalized likelihood estimation.
      • Estimates the smoothing parameters by using either the performance iteration method or the outer iteration method.
      • Estimates the regression parameters by using maximum likelihood techniques.
      • Tests the total contribution of each spline term based on the Wald statistic.
      • Provides model-building syntax that can include classification variables, continuous variables, interactions and nestings.
      • Enables you to construct a spline term by using multiple variables.
    • Proportional hazard regression:
      • Fits the Cox proportional hazards regression model to survival data and performs variable selection.
      • Provides model-building syntax with classification variables, continuous variables, interactions and nestings.
      • Provides effect-construction syntax for polynomial and spline effects.
      • Performs maximum partial likelihood estimation, stratified analysis and variable selection.
      • Partitions data into training, validation and testing roles.
      • Provides weighted analysis and grouped analysis.
    • Statistical process control: 
      • Perform Shewhart control chart analysis.
      • Analyze multiple process variables to identify processes that are out of statistical control. 
      • Adjust control limits to compensate for unequal subgroup sizes.
      • Estimate control limits from the data, compute control limits from specified values for population parameters (known standards) or read limits from an input data table.
      • Perform tests for special causes based on runs patterns (Western Electric rules).
      • Estimate the process standard deviation using various methods (variable charts only).
      • Save chart statistics and control limits in output data tables.
    • Independent component analysis:
      • Extracts independent components (factors) from multivariate data.
      • Maximizes non-Gaussianity of the estimated components.
      • Supports whitening and dimension reduction.
      • Produces an output data table that contains independent components and whitened variables.
      • Implements symmetric decorrelation, which calculates all the independent components simultaneously.
      • Implements deflationary decorrelation, which extracts the independent components successively.
    • Linear mixed models:
      • Supports many covariance structures, including variance components, compound symmetry, unstructured, AR(1), Toeplitz, factor analytic, etc.
      • Provides specialized dense and sparse matrix algorithms.
      • Supports REML and ML estimation methods, which are implemented with a variety of optimization algorithms.
      • Provides inference features, including standard errors and t tests for fixed and random effects.
      • Supports repeated measures data.
    • Model-based clustering:
      • Models the observations by using a mixture of multivariate Gaussian distributions.
      • Allows for a noise component and automatic model selection.
      • Provides posterior scoring and graphical interpretation of results.
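    As one hedged illustration of this programming access, the sketch below shows how a Python programmer might upload a Pandas DataFrame to CAS and run the simple.summary action via the SWAT package. It assumes SWAT is installed and a CAS server is reachable at the given host and port (both site-specific), so the function is defined here but not executed:

```python
def summarize_on_cas(host, port, frame):
    """Upload a Pandas DataFrame to CAS and run the simple.summary
    action on it. Sketch only: assumes the SWAT package is installed
    and a CAS server is reachable at host:port (site-specific
    authentication may also be required)."""
    import swat                        # SAS Scripting Wrapper for Analytics Transfer
    conn = swat.CAS(host, port)        # connect to the CAS server
    try:
        tbl = conn.upload_frame(frame)     # DataFrame -> in-memory CAS table
        out = tbl.summary()                # invoke the simple.summary action
        return out.Summary                 # result set returned as a DataFrame
    finally:
        conn.terminate()                   # release the CAS session
```

    The same round trip is available from R, Lua and Java SWAT clients, or from any language via the public REST APIs.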

    Descriptive statistics

    • Distinct counts to understand cardinality.
    • Box plots to evaluate centrality and spread, including outliers for one or more variables.
    • Correlations to measure the Pearson’s correlation coefficient for a set of variables. Supports grouped and weighted analysis.
    • Cross-tabulations, including support for weights.
    • Contingency tables, including measures of associations.
    • Histograms with options to control binning values, maximum value thresholds, outliers and more.
    • Multidimensional summaries in a single pass of the data.
    • Percentiles for one or more variables.
    • Summary statistics, such as number of observations, number of missing values, sum of nonmissing values, mean, standard deviation, standard errors, corrected and uncorrected sums of squares, min and max, and the coefficient of variation.
    • Kernel density estimates using normal, tri-cube and quadratic kernel functions.
    • Constructs one-way to n-way frequency and cross-tabulation tables.
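    Several of these summaries (cardinality, percentiles, mean, standard deviation, coefficient of variation, Pearson correlation) are standard calculations; a plain-Python sketch of the arithmetic, using invented sample data:

```python
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

distinct = len(set(values))                      # cardinality
mean = statistics.fmean(values)
stdev = statistics.stdev(values)                 # sample standard deviation
cv = stdev / mean                                # coefficient of variation
quartiles = statistics.quantiles(values, n=4)    # 25th/50th/75th percentiles

# Pearson correlation of two perfectly linear series is 1.0.
xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
mx, my = statistics.fmean(xs), statistics.fmean(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
r = cov / (sum((x - mx) ** 2 for x in xs) ** 0.5
           * sum((y - my) ** 2 for y in ys) ** 0.5)
```

    The in-memory engine computes these same quantities in a single distributed pass over the data rather than per-variable loops.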

    Group-by processing

    • Build models, compute and process results on the fly for each group or segment without having to sort or index the data each time.
    • Build segment-based models instantly (i.e., stratified modeling) from a decision tree or clustering analysis.
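    The idea of computing per-group results in a single pass, with no pre-sorting or indexing, can be illustrated in plain Python (the segments and the per-segment mean "model" are invented for the example):

```python
from collections import defaultdict

# (segment, outcome) rows arriving in arbitrary, unsorted order.
rows = [("A", 10.0), ("B", 4.0), ("A", 14.0), ("B", 6.0), ("A", 12.0)]

acc = defaultdict(lambda: [0.0, 0])   # segment -> [running sum, count]
for seg, y in rows:                   # one pass; no sort or index needed
    acc[seg][0] += y
    acc[seg][1] += 1

# Stratified baseline "model": one per-segment mean predictor.
models = {seg: total / n for seg, (total, n) in acc.items()}
```

    Stratified modeling generalizes this pattern: the same accumulator idea holds per-segment model state, so each segment gets its own fitted model in one scan of the data.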

    Model comparison, assessment & scoring

    • Generate model comparison summaries, such as lift charts, ROC charts, concordance statistics and misclassification tables for one or more models.
    • Interactively slide the prediction cutoff for automatic updating of assessment statistics and classification tables.
    • Interactively evaluate lift at different percentiles.
    • Export models as SAS DATA step code to integrate models with other applications. Score code is automatically concatenated if a model uses derived outputs from other models (leaf ID, cluster ID, etc.).
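    The effect of sliding the prediction cutoff can be sketched in plain Python: each cutoff yields one misclassification table and one point on the ROC curve (the scores and labels below are invented sample data):

```python
def confusion_at_cutoff(scores, labels, cutoff):
    """Classify each score against a cutoff and tally the 2x2 table."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = s >= cutoff
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

tp, fp, tn, fn = confusion_at_cutoff(scores, labels, 0.5)
tpr = tp / (tp + fn)    # sensitivity: one ROC point at this cutoff
fpr = fp / (fp + tn)
```

    Sweeping the cutoff from 1 down to 0 traces the full ROC curve, which is what the interactive slider recomputes as it moves.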

    Model scoring

    • Export models as SAS DATA step code to integrate models with other applications. Score code is automatically concatenated if a model uses derived outputs from other models (leaf ID, cluster ID, etc.).

    SAS® Viya® in-memory runtime engine

    • SAS Cloud Analytic Services (CAS) performs processing in memory and distributes processing across nodes in a cluster.
    • User requests (expressed in a procedural language) are translated into actions with necessary parameters to process in a distributed environment. The result set and messages are passed back to the procedure for further action by the user.
    • Data is managed in blocks and can be loaded in memory on demand. If tables exceed the memory capacity, the server caches the blocks on disk. Data and intermediate results are held in memory as long as required, across jobs and user boundaries.
    • An algorithm determines the optimal number of nodes for a given job.
    • A communication layer supports fault tolerance and lets you remove or add nodes from a server while it is running. All components in the architecture can be replicated for high availability. 
    • Products can be deployed in multitenant mode, allowing for a shared software stack to support securely isolated tenants.

    Deployment options

    • On-site deployments:
      • Single-machine mode to support the needs of small to midsize organizations.
      • Distributed mode to meet growing data, workload and scalability requirements.
    • Cloud deployments:
      • Enterprise hosting.
      • Private or public cloud (e.g., BYOL in Amazon) infrastructure.
      • Cloud Foundry platform as a service (PaaS) to support multiple cloud providers. 
