SAS® Data Preparation Features
Data & metadata access
- Use any authorized internal source, accessible external data sources and data held in memory in SAS Viya.
- View a sample of a table or file loaded in the in-memory engine of SAS Viya, or from data sources registered with SAS/ACCESS, to visualize the data you want to work with.
- Quickly create connections to and between external data sources.
- Access physical metadata information like column names, data types, encoding, column count and row count to gain further insight into the data.
- Data sources and types include:
  - Amazon S3.
  - Amazon Redshift.
  - DNFS, HDFS, PATH-based files (CSV, SAS, Excel, delimited).
  - SAS® LASR™.
  - Feeds from Twitter, YouTube, Facebook, Google Analytics, Google Drive, Esri and local files.
  - SAS® Cloud Analytic Services (CAS).
- Load data in parallel from desired data sources into memory simply by selecting them, with no need to write code or have experience with an ETL tool. (Data cannot be sent back to the following data sources: Twitter, YouTube, Facebook, Google Analytics and Esri; it can only be sourced from these sites.)
- Reduce the amount of data being copied by performing row filtering or column filtering before the data is provisioned.
- Retain big data in situ, and push processing to the source system by adding the optional SAS In-Database add-ons.
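
The row and column filtering described above can be pictured with a minimal Python sketch. It is not SAS code and does not call any SAS API; the function name `filtered_rows`, the sample columns and the filter condition are all hypothetical. It only illustrates the idea of dropping unwanted rows and columns while the source is being read, so that less data is copied.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List


def filtered_rows(
    lines: Iterable[str],
    keep_columns: List[str],
    row_filter: Callable[[Dict[str, str]], bool],
) -> Iterator[Dict[str, str]]:
    """Stream a CSV source, yielding only the wanted rows and columns."""
    reader = csv.DictReader(lines)
    for row in reader:
        # Row filtering: skip records that fail the predicate.
        if row_filter(row):
            # Column filtering: copy only the requested fields.
            yield {name: row[name] for name in keep_columns}


# Hypothetical sample data standing in for an external source.
SAMPLE = io.StringIO(
    "id,region,amount,notes\n"
    "1,EMEA,120,ok\n"
    "2,APAC,80,check\n"
    "3,EMEA,45,ok\n"
)

loaded = list(
    filtered_rows(SAMPLE, ["id", "amount"], lambda r: r["region"] == "EMEA")
)
print(loaded)
```

Because the function is a generator, rows that fail the filter are never held in memory, which mirrors the goal of reducing the data that is provisioned.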
Guided, interactive data preparation
- Transform, blend, shape, cleanse and standardize data in an interactive, visual environment that guides you through data preparation processes.
- Easily understand how a transformation affects results, with visual feedback in near real time through the distributed, in-memory processing of SAS Viya.
Machine learning & AI suggestions
- Take advantage of AI and machine learning to scan data and make intelligent transformation suggestions.
- Accept suggestions and complete transformations at the click of a button. No advanced or complex coding required.
- Automated suggestions include:
  - Gender analysis.
  - Match code.
  - Missing value imputation for numeric variables.
  - One hot encoding.
  - Remove column.
  - Whitespace trimming.
  - Convert column data type.
  - Center and scale.
  - Unique ID creation.
  - Column removal for sparse data.
- Use column-based transformations to standardize, remediate and shape data without additional configuration. You can:
  - Change case.
  - Convert column.
  - Trim whitespace.
  - Create custom calculations.
- Wide tables are supported, and data plans can be saved for quick data preparation jobs.
- Use row-based transformations to filter and shape data.
- Create analytical-based tables using the transpose transformation to prepare the data for analytics and reporting tasks.
- Create simple or complex filters to remove unnecessary data.
- Write custom code to transform, shape, blend, remediate and standardize data.
- Write simple expressions to create calculated columns, write advanced code or reuse code snippets for greater transformational flexibility.
- Import custom code defined by others to share best practices and improve collaborative productivity.
- Use multiple-input-based transformations to blend and shape data.
- Blend or shape one or more sets of data using the guided interface; there is no requirement to know SQL or SAS. You can:
  - Append data.
  - Join data.
  - Transpose data.
- Profile data to generate column-based and table-based basic and advanced profile metrics.
- Use the table-level profile metrics to uncover data quality issues and get further insight into the data itself.
- Drill into each column for column-level profile metrics and to see visual graphs of pattern distribution and frequency distribution results that help uncover hidden insights.
- Use a variety of data types/sources (listed previously). To profile data from Twitter, Facebook, Google Analytics or YouTube, you must first explicitly import the data into the SAS Viya in-memory environment.
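
As an illustration of the column-level profile metrics mentioned above, here is a minimal Python sketch. It is not how SAS computes profiles; the function names, the sample phone numbers and the pattern convention (uppercase letter to `A`, lowercase letter to `a`, digit to `9`) are assumptions chosen for the example.

```python
from collections import Counter
from typing import List, Optional


def char_pattern(value: str) -> str:
    """Map a value to a pattern: uppercase -> A, lowercase -> a, digit -> 9."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def profile_column(values: List[Optional[str]]) -> dict:
    """Compute a few basic column-level metrics."""
    present = [v for v in values if v not in (None, "")]
    return {
        "row_count": len(values),
        "missing_count": len(values) - len(present),
        "unique_count": len(set(present)),
        "frequency": Counter(present),  # frequency distribution
        "patterns": Counter(char_pattern(v) for v in present),  # pattern distribution
    }


# Hypothetical sample column.
phones = ["919-555-0100", "919-555-0101", "9195550102", None, "919-555-0100"]
metrics = profile_column(phones)
print(metrics["missing_count"], metrics["unique_count"])
print(metrics["patterns"].most_common())
```

In this sample, the pattern distribution exposes one value that lacks the dashes used by the others, which is the kind of data quality issue a profile is meant to surface.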
Data quality processing
(SAS® Data Quality in SAS® Viya® is included in SAS Data Preparation)
- Use locale- and context-specific parsing and field extraction definitions to reshape data and uncover additional insights.
- Use the extraction transformation to identify and extract contact information (e.g., name, gender, field, pattern, identify, email and phone number) in a specified column.
- Use parsing when data in a specified column needs to be tokenized into substrings (e.g., a full name tokenized into prefix, given name, middle name and family name).
- Derive unique identifiers from match codes that link disparate data sources.
- Standardize data with locale- and context-specific definitions to transform data into a common format, like casing.
- Analyze column data using locale-specific rules to determine gender or context.
- Use identification analysis to analyze the data and determine its context, which is particularly valuable if the data or source of data is unfamiliar.
- Use gender analysis to determine the gender of a name using locale-specific rules so the data can be easily filtered or segmented.
- Create a unique ID for each row with the unique ID generator.
- Identify the subject data in each column with identification analysis.
- Identify, find and sort data by tagging data with columns and tables.
- Determine matching records based upon locale- and context-specific definitions.
- Easily identify matching records using more than 25 context-specific rules, such as date, address, name and email.
- Use the results of the match code transformation to remove duplicates or perform a fuzzy search or fuzzy join.
- Find like records and logically group them together.
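
The idea of using a match code to group like records and remove duplicates can be sketched in Python as follows. This is a deliberately simplified stand-in, not the SAS match code algorithm, which relies on locale- and context-specific definitions; the function `toy_match_key` and the sample names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List


def toy_match_key(name: str) -> str:
    """Hypothetical stand-in for a match code: a normalized, order-free key."""
    # Lowercase, drop punctuation and extra spaces, then sort the tokens.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(tokens))


def cluster(records: List[str]) -> List[List[str]]:
    """Group records whose keys are equal, in first-seen order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[toy_match_key(rec)].append(rec)
    return list(groups.values())


names = ["Smith, John", "john smith", "JOHN  SMITH", "Jane Doe"]
clusters = cluster(names)
print(clusters)

# Removing duplicates: keep the first record of each cluster.
deduped = [group[0] for group in clusters]
print(deduped)
```

Records that share a key land in the same cluster, so deduplication and fuzzy joins reduce to exact comparisons on the key rather than on the raw text.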
System & job monitoring
- Use integrated monitoring capabilities for system- and job-level processes.
- Gain insight into how many processes are running, how long they’re taking and who is running them.
- Easily filter through all system jobs based on job status (running, successful, failed, pending and cancelled).
- Access job error logs to help with root-cause analysis and troubleshooting. (Note: Monitoring is available using SAS Environment Manager and the job monitor application.)
Data import & data preparation job scheduling
- Create a data import job from automatically generated code to perform a data refresh using the integrated scheduler.
- Schedule data explorer imports as jobs so that they become an automatic, repeatable process.
- Specify a time, date, frequency and/or interval for the jobs.
- Explore relationships between accessible data sources, data objects and jobs.
- Use the relationship graph to visually show the relationships that exist between objects, making it easier to understand the origin of data and trace its processing.
- Create multiple views with different tabs, and save the organization of those views.
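
Tracing the origin of data through a relationship graph, as described above, amounts to walking the graph upstream. The Python sketch below is a rough illustration under stated assumptions: the object names and the `EDGES` mapping are hypothetical and do not reflect how SAS stores relationships.

```python
from collections import deque
from typing import Dict, List

# Hypothetical edges: each object maps to the objects derived from it.
EDGES: Dict[str, List[str]] = {
    "crm_source": ["customers_import_job"],
    "customers_import_job": ["customers_table"],
    "orders_source": ["orders_table"],
    "customers_table": ["customer_orders_plan"],
    "orders_table": ["customer_orders_plan"],
    "customer_orders_plan": ["customer_orders_table"],
}


def upstream(target: str, edges: Dict[str, List[str]]) -> List[str]:
    """Return every object that feeds `target`, nearest first (breadth-first)."""
    # Invert the edges so each object points to its direct inputs.
    parents: Dict[str, List[str]] = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    seen, order, queue = {target}, [], deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


lineage = upstream("customer_orders_table", EDGES)
print(lineage)
```

The same inverted mapping could be walked in the other direction to list everything affected by a change to a given source.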
Plan templates & project collaboration
- Use data preparation plans (templates), which consist of a set of transformation rules that are applied to one or more sources of data, to improve productivity and spend less time preparing data.
- Reuse the templates by applying them to different sets of data to ensure that data is transformed consistently to adhere to enterprise data standards and policies.
- Rely on team-based collaboration through a project hub used with SAS Viya projects. The project’s activity feed shows who did what and when, and can be used to communicate with other team members.
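
A plan, understood as an ordered set of transformation rules that can be reapplied to different data, can be pictured with the following Python sketch. It is an illustration only and not the SAS plan format; the step functions, `apply_plan` and the sample tables are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Step = Callable[[Row], Row]


def trim_whitespace(column: str) -> Step:
    """Build a step that strips leading and trailing spaces in one column."""
    return lambda row: {**row, column: row[column].strip()}


def change_case(column: str, mode: str = "upper") -> Step:
    """Build a step that upper- or lowercases one column."""
    convert = str.upper if mode == "upper" else str.lower
    return lambda row: {**row, column: convert(row[column])}


def apply_plan(plan: List[Step], table: List[Row]) -> List[Row]:
    """Run every step of the plan, in order, on each row of the table."""
    result = []
    for row in table:
        for step in plan:
            row = step(row)  # each step returns a new row
        result.append(row)
    return result


# Hypothetical reusable plan: the same rules applied to two tables.
standardize_country = [trim_whitespace("country"), change_case("country")]

customers = [{"name": "Ana", "country": " us "}]
suppliers = [{"name": "Acme", "country": "de"}]

clean_customers = apply_plan(standardize_country, customers)
clean_suppliers = apply_plan(standardize_country, suppliers)
print(clean_customers, clean_suppliers)
```

Keeping the plan separate from the data is what allows the same rules to be reused, so that different tables are transformed consistently.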
Batch text analysis
- Quickly extract contents of documents, and perform text identification and extraction.