SAS® Data Management Features
One data integration development environment
- An easy-to-use, point-and-click, role-based GUI with an intuitive set of configurable windows for managing authorized processes. Drag-and-drop functionality eliminates the need for programming.
- Wizards for accessing source systems, creating target structures, importing and exporting metadata, and building and executing ETL and ELT process flows.
- Customizable metadata tree views let you display, visualize and understand metadata.
- Dedicated GUI for profiling data makes it easy to repair source system issues while retaining the business rules for use in other data management processes.
- Interactive debugging and testing of jobs during development and full access to logs are supported.
- Audit history and check-in/check-out allow designers to see which jobs or tables were changed, when and by whom.
- Ability to distribute data integration tasks across any platform and to connect virtually any source or target data store.
- Integration with the third-party version control systems Subversion and CVS provides enhanced version and source control features such as archiving, differencing and rollback.
- Enhanced SAS code import capabilities give current SAS users an easy way to import their SAS jobs and code.
- Command-line job deployment options for deploying single and multiple jobs.
Integrated process designer
- Build and edit data management processes with a visual, end-to-end event designer.
- Control the execution of data integration, SAS Stored Processes and data quality jobs.
- Conditionally execute jobs based on IF THEN logic and parameterization.
- Fork jobs and processes to execute in parallel.
- Publish job inputs and outputs for parameterized jobs.
- Listen for internal and external events as well as conditionally raise events.
- Execute external OS-level commands, such as calls to shell scripts.
- Call REST and SOAP web services.
- List and open old versions of jobs (in read-only mode) and make historic versions current with built-in versioning.
- Provide full support for promotion/migration of jobs in support of DEV/TEST/PROD.
- Use common scripting languages to deploy data integration batch jobs in an automated manner with automated job deployment.
- Run decision flows created in SAS Decision Manager from a node in SAS Data Integration Studio.
- Push data into SAS® LASR™ from nodes available in SAS Data Integration Studio to prepare data for visual analytics.
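The conditional execution and parallel fork described in this list can be illustrated with a minimal Python sketch. This is not SAS Data Integration Studio code: the job functions (`extract_orders`, `cleanse_customers`, `load_warehouse`) and the `enabled` and `batch_size` parameters are hypothetical stand-ins chosen only to show the control flow.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_orders(params):
    # Hypothetical stand-in for a data integration job.
    return {"job": "extract_orders", "rows": params["batch_size"]}


def cleanse_customers(params):
    # Hypothetical stand-in for a data quality job.
    return {"job": "cleanse_customers", "rows": params["batch_size"] // 2}


def load_warehouse(results):
    # Runs only after both forked jobs have finished (the join step).
    return sum(r["rows"] for r in results)


def run_flow(params):
    # IF-THEN logic: skip the whole flow when the parameter disables it.
    if not params.get("enabled", True):
        return None
    # Fork: run the two independent jobs in parallel, then wait for both.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(job, params)
                   for job in (extract_orders, cleanse_customers)]
        results = [f.result() for f in futures]
    return load_warehouse(results)


if __name__ == "__main__":
    print(run_flow({"enabled": True, "batch_size": 100}))
```

The same parameter dictionary is passed to every job, which loosely mirrors publishing inputs for parameterized jobs; a real process flow would also handle job failures and events.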
Superior connectivity & data access
- Provides connectivity in batch or in real time to more data sources on more platforms than most other solutions.
- Data access engines are available for enterprise applications, non-relational databases, RDBMSs, data warehouse appliances, PC file formats and more.
- Specialized table loaders provide optimized bulk loading of Oracle, Teradata and DB2.
- File reader/writer for the Hadoop Distributed File System (HDFS), plus support for Hadoop's MapReduce, Pig and Hive within flows, as well as for Hortonworks.
- Cloudera Impala Source Designer allows you to view tables when accessing Hadoop via the Cloudera Impala interface.
- A complete and shared metadata environment provides consistent data definition across all data sources.
- Native access methods deliver top performance, reduce data movement and reduce the need for custom coding.
- Support for message-oriented middleware, including WebSphere MQ from IBM, MSMQ from Microsoft, Java Message Service (JMS) and TIBCO Rendezvous.
- Support for unstructured and semistructured data to parse and process files.
- Access to static and streaming data for sending and receiving via web services.
- Expanded support for MPP databases: Aster Data nCluster, Pivotal Greenplum and Sybase IQ, enabling more ELT pushdown and support for bulk-load utilities.
- Native support for SQL-based processing.
- Enhanced connectivity to Aster Data, Pivotal Greenplum, Hadoop and Sybase IQ databases with the ability to push down more processing to the databases.
Consistent metadata management
- Metadata is captured and documented throughout transformations and data integration processes, and is available for immediate reuse.
- Sophisticated metadata mapping technologies quickly propagate column definitions from sources to targets, and create automated, intelligent table joins.
- Metadata search enables quick location of desired components.
- Impact analysis for assessing the scope and impact of making changes to existing objects such as columns, tables and process jobs before they occur.
- Ability to determine the path, processes and transformations taken to produce the resulting information.
- Data lineage (reverse impact analysis), critical for validating dependencies, helps build user confidence in data.
- Batch updates are made to the object metadata repository via a relationship service for integration with SAS Lineage.
- Change analysis for metadata change discovery, comparison, analysis and selective propagation.
- Multiple-user collaboration support includes object check-in and check-out.
- Promotion and replication of metadata across development, test and production environments.
- Wizard-driven metadata import and export as well as column standardization.
- Metadata-driven deployment flexibility so process jobs can be deployed for batch execution, as reusable stored processes or as web services.
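Impact analysis and data lineage, as described in this list, can be viewed as two traversals of the same dependency graph: downstream from an object for impact, upstream for lineage. The Python sketch below assumes a hypothetical edge list (`EDGES`) with invented table and job names; it does not use or represent any SAS metadata API.

```python
from collections import deque

# Hypothetical metadata: each edge points from an input object
# to the object that is derived from it.
EDGES = [
    ("crm.customers", "job_cleanse"),
    ("job_cleanse", "stage.customers"),
    ("erp.orders", "job_join"),
    ("stage.customers", "job_join"),
    ("job_join", "dw.customer_orders"),
]


def _neighbors(edges, reverse=False):
    # Build an adjacency map, optionally with every edge flipped.
    graph = {}
    for src, dst in edges:
        a, b = (dst, src) if reverse else (src, dst)
        graph.setdefault(a, []).append(b)
    return graph


def _reachable(graph, start):
    # Breadth-first search; returns every node reachable from start.
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact(edges, obj):
    # Everything downstream that a change to obj could affect.
    return _reachable(_neighbors(edges), obj)


def lineage(edges, obj):
    # Everything upstream that contributed to obj (reverse impact analysis).
    return _reachable(_neighbors(edges, reverse=True), obj)


if __name__ == "__main__":
    print(sorted(impact(EDGES, "crm.customers")))
    print(sorted(lineage(EDGES, "dw.customer_orders")))
```

Keeping one edge list and flipping its direction means impact and lineage answers cannot drift apart, which is one reason a shared metadata repository is useful for both questions.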
Foundation of data quality
- Data quality is embedded into batch, near-time and real-time processes.
- Data cleansing is provided in native languages with specific language awareness and localizations for more than 38 regions worldwide.
- Data quality functions are available in both operational and reporting (transaction and batch) environments.
- An interactive GUI enables you to profile operational data to identify incomplete, inaccurate or ambiguous data.
- Customizable and reusable data quality business rules can be accessed directly within process job flows.
- Out-of-the-box standardization rules conform data to corporate standards, or you can build customized rules for special situations.
- Metadata built and shared across the entire process provides an accurate trail of actions applied to the cleansed data.
- Value can be added to existing data by generating and appending postal addresses, geocoding, demographic data or facts from other sources of information.
- Data stewards can profile operational data and monitor ongoing data activities with an interactive GUI designed specifically for their needs.
- Simple process for institutionalizing data quality business rules. Apply basic or complex rules to validate data according to the specific business requirements of a particular process, project or organization. Rules may be applied in batch mode or as a real-time transaction cleansing process.
- Data quality monitoring enables you to continuously examine data in real time and over time to discover when quality falls below acceptable limits.
- Alerts can be issued when there is a need for corrective action.
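As a rough illustration of the standardization rules, validation rules and threshold-based alerts described in this list, the Python sketch below applies a lookup-based standardization and a simple e-mail check to a batch of records. The rule table (`STATE_STANDARD`), the e-mail pattern and the 0.9 threshold are assumptions made for the example, not rules shipped with any product.

```python
import re

# Hypothetical standardization rule: map common variants to one standard code.
STATE_STANDARD = {"n.c.": "NC", "north carolina": "NC", "nc": "NC"}


def standardize_state(value):
    # Conform a free-text state value to the standard, else upper-case it.
    key = value.strip().lower()
    return STATE_STANDARD.get(key, value.strip().upper())


def is_valid_email(value):
    # Deliberately simple business rule: something@something.something
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value or "") is not None


def quality_score(records):
    # Fraction of records that pass the e-mail rule.
    if not records:
        return 1.0
    passed = sum(1 for r in records if is_valid_email(r.get("email")))
    return passed / len(records)


def check_batch(records, threshold=0.9):
    # Return the score and whether an alert should be raised.
    score = quality_score(records)
    return score, score < threshold


if __name__ == "__main__":
    batch = [{"email": "a@b.com"}, {"email": "bad"}]
    print(standardize_state(" North Carolina "))
    print(check_batch(batch))
```

The same `check_batch` function could be called per transaction or per batch; only the size of `records` changes.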
Extract, transform, load (ETL) & extract, load, transform (ELT)
- A powerful, easy-to-use transformation user interface that supports collaboration, reuse of processes and common metadata.
- Out-of-the-box SQL-based transforms deliver ELT capabilities, including create tables, join, insert rows, delete rows, update rows, merge, SQL set, extract and SQL execute.
- Single or multiple-source data acquisition, transformation, cleansing and loading enable the easy creation of data warehouses, data marts, or BI and analytic data stores.
- Metadata is captured and documented throughout the data integration and transformation processes and is available for immediate reuse.
- Transformations can run on any platform with any data source.
- More than 300 predefined table and column-level transformations.
- Ready-to-use analytical transformations, including correlations and frequencies, distribution analysis and summary statistics.
- Transformation wizard or Java plug-in design templates let you easily generate reusable and repeatable transformations that are tracked and registered in metadata.
- Transformation processes, callable through custom exits, message queues and web services, are reusable in different projects and environments.
- Transformations can be executed interactively and scheduled to run in batch at set times or based on events that trigger execution.
- Framework environment for publishing information to archives, a publishing channel, email or various message-queuing middleware.
- Easily refresh, append and update during loading.
- Optimize loading techniques with user-selectable options.
- Database-aware loading techniques include bulk-load facilities, index and key creation, and dropping and truncating of tables.
- Transformations automatically generate high-performance SAS code that is designed for rapid and efficient processing.
- Transformations include Type 1 slowly changing dimension (SCD) support for merge and hash techniques, table differencing and enhancements for Type 2 SCD loaders.
- The Compare Tables transformation compares two data sources and detects changes in data.
- Provides the ability to call REST or SOAP web services.
- Virtual access to database structures, enterprise applications, mainframe legacy files, text, XML, message queues and a host of other sources.
- Ability to join data across data sources for real-time access and analysis.
- Instant access to a real-time view of the data using the built-in data viewer.
- Query optimization is provided both automatically as part of DBMS requests, and manually within the advanced SQL editor, and can be used for both homogeneous and heterogeneous data sources.
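Table differencing with a hash technique, mentioned in this list, can be sketched in a few lines of Python. This is an illustration of the general idea only, not the Compare Tables transformation itself: the function names, the `id` and `city` columns and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib


def row_hash(row, columns):
    # Hash the compared columns so changed rows can be detected cheaply.
    # Note: a plain "|" join is fine for a sketch but can collide if
    # values themselves contain "|".
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def compare_tables(old_rows, new_rows, key, columns):
    # Index both tables by business key, storing one hash per row.
    old = {r[key]: row_hash(r, columns) for r in old_rows}
    new = {r[key]: row_hash(r, columns) for r in new_rows}
    return {
        "new": sorted(k for k in new if k not in old),
        "changed": sorted(k for k in new if k in old and new[k] != old[k]),
        "missing": sorted(k for k in old if k not in new),
    }


if __name__ == "__main__":
    old = [{"id": 1, "city": "Cary"}, {"id": 2, "city": "Raleigh"},
           {"id": 3, "city": "Durham"}]
    new = [{"id": 1, "city": "Cary"}, {"id": 2, "city": "Apex"},
           {"id": 4, "city": "Wake"}]
    print(compare_tables(old, new, key="id", columns=["city"]))
```

In a slowly changing dimension load, the "changed" keys would typically be overwritten in place (Type 1) or closed out and reinserted as new versions (Type 2).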
Master data management
- Enhanced metadata search features enable you to search by type, name, date or other keywords, subset by folders or other options, and save searches for future use.
- Support for semantic data descriptions of input and output data sources that uniquely identify each instance of a business element (customer, product, account, etc.).
- Powerful transformation tools and embedded data quality processes improve master data quality.
- Sophisticated fuzzy-matching technology and clustering methodologies enable you to validate and consolidate master records into identifiable data groups.
- Real-time data monitoring, dashboards and scorecards let you check and control data integrity over time.
- Can be used as a basis for transitioning to a full-fledged master data management offering.
- Data feeds can arrive in a single transaction or in hundreds of transactions at the same time.
- Data sets can be processed in a single pass of the source data.
- Enhanced, web-based reference data management and business data environment to ease governance and semantic reference, respectively.
- Integrated business data glossary allows business terms to be organized hierarchically and related to term owners as well as technical metadata such as tables and data management processes.
- Extensive data stewardship capabilities, including web-based dashboarding and business rule exception monitoring for reporting and remediation.
- Metadata changes are published to the SAS relationship service for viewing in SAS Lineage, allowing business users to visualize relationships or perform impact analysis.
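The fuzzy-matching and clustering idea in this list can be illustrated with Python's standard library. The sketch below is a swapped-in, generic technique (character-sequence similarity from `difflib` plus union-find grouping), not the matching technology of any SAS product; the 0.85 threshold and the sample names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    # Case-insensitive similarity ratio between 0.0 and 1.0.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def cluster(names, threshold=0.85):
    # Union-find: each record starts in its own group.
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair that is similar enough.
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                parent[find(j)] = find(i)

    # Collect records by group root, preserving input order.
    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


if __name__ == "__main__":
    print(cluster(["Jon Smith", "John Smith", "Acme Corp", "ACME Corp"]))
```

Comparing every pair is quadratic in the number of records, so a production matcher would normally restrict comparisons to candidate blocks first; each resulting group is a candidate for consolidation into one master record.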
Migration & synchronization
- Ability to migrate or synchronize data between database structures, enterprise applications, mainframe legacy files, text, XML, message queues and a host of other sources.
- Metadata-driven access to sources and targets.
- Extensive library of predefined transformations can be extended and shared with other integration processes.
- Embedded, reusable data quality business rules clean data as it is moved, synchronized or replicated.
- Recognizes changes to key fields and replicates or synchronizes changes across multiple databases.
- Optional, integrated scheduler allows changes made in one or more systems to be propagated to other systems on a scheduled basis.
- Delivers real-time data services for synchronization and migration projects.
- Integration of asynchronous business processes via message-based connectivity.
- Interfaces to the leading message-queuing products, including Microsoft MSMQ, IBM WebSphere MQ, TIBCO Rendezvous and Java Message Service (JMS).
- Guaranteed message/transaction delivery reduces the cost of disruptions.
- Optimized access for each message-queue manager that is designed for minimal administrative effort.
- Event-based application integration so activities in one application automatically trigger actions in other applications.
- Dynamic, event-driven run streams and alerts.
- Ability to send and receive messages between distributed and disparate systems.
Partitioning & parallel processing
- Parallel write to Hadoop.
- SAS PROC DS2 and SAS FedSQL capabilities for SAS Scalable Performance Data Server.
- Ability to push and pull data from an Amazon environment.
Source designer for CAS
- From within SAS Data Integration Studio, you can configure and connect to SAS® Viya™ CAS.
- Enables customers to continue taking advantage of their existing SAS 9.4 platform while beginning to use features of SAS Viya.
SAS® Metadata Bridge for Tableau
- Supports Tableau (File) and Tableau Server (Repository).
Enhanced administration & monitoring
- Job status and performance reports and trending information let you track metrics such as CPU use, memory and I/O, and show how recent process runs perform relative to previous runs.
- Enables users to manage and monitor their complete integration environments, including the following types of jobs and activities:
  - Data integration jobs.
  - Data quality jobs.
  - Federation cache jobs – scheduled queries to update the federation cache.
  - Process flows.
  - SAS® Stored Processes.
- Access log files from a central, web-based panel for faster, easier troubleshooting.