SAS Data Preparation Features List

Data & metadata access

  • Use any authorized internal data source, accessible external data sources and data held in memory in SAS Viya.
    • View a sample of a table or file loaded in the in-memory engine of SAS Viya, or from data sources registered with SAS/ACCESS, to visualize the data you want to work with.
    • Quickly create connections to and between external data sources.
    • Access physical metadata such as column names, data types, encoding, column count and row count to gain further insight into the data (see the sketch after this list).
  • Data sources and types include:
    • Amazon S3.
    • Amazon Redshift.
    • DNFS, HDFS, PATH-based files (CSV, SAS, Excel, delimited).
    • DB2.
    • Hive.
    • Impala.
    • SAS LASR.
    • ODBC.
    • Oracle.
    • Postgres.
    • Teradata.
    • Feeds from Twitter, YouTube, Facebook, Google Analytics, Google Drive, Esri and local files.
    • SAS Cloud Analytic Services (CAS).
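
For illustration, here is a minimal sketch of this kind of access from the Python swat package (the open source client for SAS Cloud Analytic Services). The host, credentials and file name below are placeholders, not part of the product documentation:

    import swat

    # Connect to the CAS (SAS Cloud Analytic Services) server;
    # host, port and credentials are placeholders
    conn = swat.CAS('cas-server.example.com', 5570, 'viyauser', 'password')

    # Read a local CSV file and load it into CAS memory as a table
    tbl = conn.read_csv('customers.csv')

    # Physical metadata: table-level and column-level information
    print(conn.tableinfo(caslib='casuser'))   # in-memory tables, row counts, encoding
    print(tbl.columninfo())                   # column names, types and lengths

    # View a small sample of the in-memory table
    print(tbl.fetch(to=5))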

Data provisioning

  • Load data in parallel from desired data sources into memory simply by selecting them – no need to write code or have experience with an ETL tool. (Data cannot be sent back to the following data sources: Twitter, YouTube, Facebook, Google Analytics, Esri; it can only be sourced from these sites.)
    • Reduce the amount of data being copied by performing row or column filtering before the data is provisioned (see the sketch after this list).
    • Retain big data in situ, and push processing to the source system with the optional SAS In-Database add-ons.
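
As a rough sketch of the same idea in code, the table.loadTable CAS action can apply column and row filters while loading, so only the needed data is copied into memory. The caslib, path and column names here are assumptions:

    import swat

    conn = swat.CAS('cas-server.example.com', 5570)   # placeholder connection details

    # Load only the required columns and rows into memory
    result = conn.loadtable(
        path='orders.sashdat', caslib='public',
        vars=['order_id', 'region', 'amount'],          # column filtering
        where="region = 'EMEA'",                        # row filtering
        casout={'name': 'orders_emea', 'replace': True})

    orders = result.casTable
    print(orders.shape)   # rows and columns actually provisioned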

Guided, interactive data preparation

  • Transform, blend, shape, cleanse and standardize data in an interactive, visual environment that guides you through data preparation processes.
  • Easily understand how a transformation affected results, getting visual feedback in near-real-time through the distributed, in-memory processing of SAS Viya.

Machine learning & AI suggestions

  • Take advantage of AI and machine learning to scan data and make intelligent transformation suggestions.
  • Accept suggestions and complete transformations at the click of a button. No advanced or complex coding required.
  • Automated suggestions include (one is sketched in code after this list):
    • Casing.
    • Gender analysis.
    • Match code.
    • Parse.
    • Standardization.
    • Missing value imputation for numeric variables.
    • One-hot encoding.
    • Remove column.
    • Whitespace trimming.
    • Convert column data type.
    • Center and scale.
    • Dedupe.
    • Unique ID creation.
    • Column removal for sparse data.
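
For instance, one of the suggested transformations, missing value imputation, can also be performed programmatically with the dataPreprocess.impute CAS action. A minimal sketch, assuming a numeric column named amount in an in-memory table named orders:

    import swat

    conn = swat.CAS('cas-server.example.com', 5570)   # placeholder connection details
    conn.loadactionset('dataPreprocess')

    # Replace missing values in a numeric column with the column mean,
    # copying all other variables to the output table
    conn.dataPreprocess.impute(
        table={'name': 'orders', 'caslib': 'casuser'},
        inputs=['amount'],
        methodContinuous='MEAN',
        copyAllVars=True,
        casOut={'name': 'orders_imputed', 'caslib': 'casuser', 'replace': True})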

Column-based transformations

  • Use column-based transformations to standardize, remediate and shape data without additional configuration. You can:
    • Change case.
    • Convert column data type.
    • Rename.
    • Remove.
    • Split.
    • Trim whitespace.
    • Add a custom calculation (see the sketch after this list).
  • Support for wide tables lets you save data plans for quick data preparation jobs.
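
As an example of a custom calculation, computed columns can be defined on an in-memory table through the swat client; the table and column names here are placeholders:

    import swat

    conn = swat.CAS('cas-server.example.com', 5570)   # placeholder connection details

    # Define two calculated columns on an in-memory table:
    # an upper-cased name and a whitespace-trimmed city
    customers = conn.CASTable(
        'customers',
        computedvars=['name_uc', 'city_trim'],
        computedvarsprogram="name_uc = upcase(name); city_trim = strip(city);")

    print(customers[['name_uc', 'city_trim']].head())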

Row-based transformations

  • Use row-based transformations to filter and shape data (see the sketch after this list).
  • Create analytics-ready tables using the transpose transformation to prepare the data for analytics and reporting tasks.
  • Create simple or complex filters to remove unnecessary data.
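
A rough sketch of a row filter with the swat client, assuming an in-memory table named sales with amount and region columns:

    import swat

    conn = swat.CAS('cas-server.example.com', 5570)   # placeholder connection details

    # Apply a WHERE clause so only matching rows flow into later steps
    sales = conn.CASTable('sales', where="amount > 1000 and region = 'EMEA'")

    print(sales.shape)      # row and column count after filtering
    print(sales.head(10))   # sample of the filtered rows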

Code-based transformations

  • Write custom code to transform, shape, blend, remediate and standardize data (see the sketch after this list).
  • Write simple expressions to create calculated columns, write advanced code or reuse code snippets for greater transformational flexibility.
  • Import custom code defined by others, sharing best practices and improving collaborative productivity.
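
For example, custom DATA step code can be run against in-memory tables through the dataStep.runCode action; a minimal sketch in which the table and column names are placeholders:

    import swat

    conn = swat.CAS('cas-server.example.com', 5570)   # placeholder connection details

    # Run custom DATA step code in CAS to derive a calculated column
    # and standardize a text column
    conn.datastep.runcode(code="""
        data casuser.orders_clean;
            set casuser.orders;
            discount_amt = amount * discount_pct;   /* simple expression     */
            region = upcase(strip(region));         /* cleanse a text column */
        run;
    """)

    print(conn.CASTable('orders_clean', caslib='casuser').head())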

Multiple-input-based transformations

  • Use multiple-input-based transformations to blend and shape data.
  • Blend or shape one or more sets of data together using the guided interface – there’s no requirement to know SQL or SAS. You can (a join is sketched in code after this list):
    • Append data.
    • Join data.
    • Transpose data.
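
As one way to blend two in-memory tables programmatically, a join can be expressed through the fedSQL action set; the table and column names below are assumptions:

    import swat

    conn = swat.CAS('cas-server.example.com', 5570)   # placeholder connection details
    conn.loadactionset('fedsql')

    # Join two in-memory tables and materialize the blended result in CAS
    conn.fedsql.execdirect(query="""
        create table casuser.orders_enriched {options replace=true} as
        select o.order_id, o.amount, c.customer_name, c.segment
        from casuser.orders o
        inner join casuser.customers c
          on o.customer_id = c.customer_id
    """)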

Data profiling

  • Profile data to generate basic and advanced profile metrics at both the column and table level (see the sketch after this list).
  • Use the table-level profile metrics to uncover data quality issues and get further insight into the data itself.
  • Drill into each column for column-level profile metrics and to see visual graphs of pattern distribution and frequency distribution results that help uncover hidden insights.
  • Use a variety of data types/sources (listed previously). To profile data from Twitter, Facebook, Google Analytics or YouTube, you must first explicitly import the data into the SAS Viya in-memory environment.
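
A rough sketch of column-level profiling with the simple action set (distinct and missing counts, summary statistics and frequency distributions); the table and column names are placeholders:

    import swat

    conn = swat.CAS('cas-server.example.com', 5570)   # placeholder connection details
    orders = conn.CASTable('orders', caslib='casuser')

    print(orders.distinct())                 # distinct and missing counts per column
    print(orders.summary())                  # min, max, mean, std dev for numeric columns
    print(orders.freq(inputs=['region']))    # frequency distribution for one column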

Data quality processing

SAS Data Quality in SAS Viya is included in SAS Data Preparation.

Data cleansing

  • Use locale- and context-specific parsing and field extraction definitions to reshape data and uncover additional insights.
  • Use the extraction transformation to identify and extract contact information (e.g., name, gender, field, pattern, identify, email and phone number) in a specified column.
  • Use parsing when data in a specified column needs to be tokenized into substrings (e.g., a full name tokenized into prefix, given name, middle name and family name).
  • Derive unique identifiers from match codes that link disparate data sources.
  • Standardize data with locale- and context-specific definitions to transform data into a common format, such as consistent casing (see the sketch after this list).
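
Assuming SAS Data Quality and a Quality Knowledge Base locale are available on the CAS server, the dqStandardize, dqParse and dqParseTokenGet DATA step functions give a feel for locale-specific cleansing; the table, column and definition names below are placeholders:

    import swat

    conn = swat.CAS('cas-server.example.com', 5570)   # placeholder connection details

    # Locale-specific standardization and name parsing with SAS Data Quality
    # functions (requires SAS Data Quality and a loaded QKB)
    conn.datastep.runcode(code="""
        data casuser.contacts_clean;
            set casuser.contacts;
            name_std = dqStandardize(full_name, 'Name', 'ENUSA');
            parsed   = dqParse(full_name, 'Name', 'ENUSA');
            given    = dqParseTokenGet(parsed, 'Given Name', 'Name', 'ENUSA');
            family   = dqParseTokenGet(parsed, 'Family Name', 'Name', 'ENUSA');
        run;
    """)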

Identity definition

  • Analyze column data using locale-specific rules to determine gender or context.
    • Use identification analysis to analyze the data and determine its context, which is particularly valuable if the data or source of data is unfamiliar.
    • Use gender analysis to determine the gender of a name using locale-specific rules so the data can be easily filtered or segmented.
    • Create a unique ID for each row with the unique ID generator.
    • Identify the subject data in each column with identification analysis.
    • Identify, find and sort data by tagging columns and tables.

Data matching

  • Determine matching records based upon locale- and context-specific definitions.
  • Easily identify matching records using more than 25 context-specific rules such as date, address, name and email.
  • Use the results of the match code transformation to remove duplicates, perform a fuzzy search or a fuzzy join (see the sketch after this list).
  • Find similar records and logically group them together.
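
A minimal sketch of generating match codes for fuzzy deduplication with the dqMatch function, again assuming SAS Data Quality and a QKB locale; the sensitivity value and column names are placeholders:

    import swat

    conn = swat.CAS('cas-server.example.com', 5570)   # placeholder connection details

    # Rows that share a match code are candidate fuzzy duplicates
    conn.datastep.runcode(code="""
        data casuser.contacts_mc;
            set casuser.contacts;
            name_mc = dqMatch(full_name, 'Name', 85, 'ENUSA');
        run;
    """)

    # Inspect clusters of potential duplicates by match code
    contacts_mc = conn.CASTable('contacts_mc', caslib='casuser')
    print(contacts_mc.freq(inputs=['name_mc']))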

System & job monitoring

  • Use integrated monitoring capabilities for system- and job-level processes.
  • Gain insight into how many processes are running, how long they’re taking and who is running them.
  • Easily filter through all system jobs based on job status (running, successful, failed, pending and canceled).
  • Access job error logs to help with root-cause analysis and troubleshooting. (Note: Monitoring is available using SAS Environment Manager and the job monitor application.)

Data import & data preparation job scheduling

  • Create a data import job from automatically generated code to perform a data refresh using the integrated scheduler.
  • Schedule data explorer imports as jobs so they become an automatic, repeatable process.
  • Specify a time, date, frequency and/or interval for the jobs.

Data lineage

  • Explore relationships between accessible data sources, data objects and jobs.
  • Use the relationship graph to visually show the relationships that exist between objects, making it easier to understand the origin of data and trace its processing.
  • Create multiple views with different tabs, and save the organization of those views.

Plan templates & project collaboration

  • Use data preparation plans (templates), which consist of a set of transformation rules applied to one or more sources of data, to improve productivity and spend less time preparing data.
  • Reuse the templates by applying them to different sets of data to ensure that data is transformed consistently and adheres to enterprise data standards and policies.
  • Rely on team-based collaboration through a project hub used with SAS Viya projects. The project’s activity feed shows who did what and when, and can be used to communicate with other team members.

Batch text analysis

  • Quickly extract contents of documents, and perform text identification and extraction.