SAS® Data Loader for Hadoop Features
Transform and transpose data on Hadoop
- Copy relational databases, SAS/ACCESS libraries and SAS data sets to and from Hadoop via parallel bulk data movement.
- Import data from CSV and other delimited files into Hadoop, and delete rows from Hadoop tables.
- Transform data by filtering rows, managing columns and summarizing rows.
- Transpose and group selected columns.
- Access dozens of cloud, big data and relational data sources through SAS/ACCESS libraries, including Amazon Redshift, Apache Hadoop, SAP HANA, Oracle, IBM DB2 and Teradata.
Secure and governed big data access
- Provides secure access to Kerberos-enabled Hadoop clusters.
- Supports Active Directory and LDAP-based user authentication.
- Lets you share and secure saved directives using SAS folders.
- Enables you to track metadata relationships when used in conjunction with SAS Lineage and SAS Data Integration Studio.
Cleanse data in Hadoop
- Standardize, deduplicate, match and parse data in Hadoop.
- Intelligent filtering allows import of values from Profile into Filter and Transform directives.
- Query, sort or deduplicate the data in an existing Hadoop table.
- Speed data exploration by determining the type of data within a column based on its values.
- Apply casing, determine gender, conduct pattern analysis and extract tokens from unstructured text fields using other data quality functions.
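The idea of determining a column's data type from its values can be illustrated with a minimal Python sketch. This is not SAS Data Loader code: the function name `infer_column_type`, the type labels and the single `YYYY-MM-DD` date format are assumptions chosen for illustration only.

```python
from datetime import datetime


def infer_column_type(values):
    """Guess a column type from its string values (hypothetical helper)."""
    # Ignore missing and blank entries; they say nothing about the type.
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "unknown"

    def all_parse(parser):
        # True only if every remaining value is accepted by the parser.
        for v in non_empty:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    # Try the narrowest type first and fall back to wider ones.
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    # Assumption: dates are written as YYYY-MM-DD.
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "string"
```

A column is reported as a given type only when every non-empty value parses as that type, so a single stray text value makes the whole column a string.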
Query or join data in Hadoop
- Query a table or join multiple tables without knowing SQL.
- Run aggregations on selected columns and filter source data.
- Power users can generate and edit a HiveQL query, or paste an existing HiveQL query.
Speed data management processes with Spark
- Data quality functions run in memory on Spark for improved performance.
- Spark matching and best record creation enable master data management for big data.
- Read and write to Spark data sets as needed.
Raise your data professionals’ productivity to a new level
- Speeds creation of Impala queries – a faster way to access data on Hadoop.
- Lets you chain multiple directives together and run them as a group.
- Permits external job scheduling with an exposed public API.
- Lets you call SAS Data Loader directives and view profiles from SAS Data Integration Studio.
Manage data where it lives
- Hadoop support is included for Pivotal HD and IBM BigInsights as well as Hortonworks, Cloudera and MapR.
- Allows in-database merging of multiple data sources using the match and merge directive.
- Improves performance by pushing processing down to the cluster using the embedded SAS Code Accelerator for Hadoop.
Profile data and save profile reports
- Enables you to determine uniqueness, incompleteness and patterns by selecting source columns from one or more tables.
- Lets you list and open reports generated by the profile data directive.
- Lets you create and save notes.
- Runs profiling in parallel on the Hadoop cluster for improved performance.
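As a rough illustration of what uniqueness, incompleteness and pattern metrics mean for a single column, here is a minimal Python sketch. It is not the profile data directive itself: the names `char_pattern` and `profile_column`, the `9`/`A` pattern symbols and the sample values are all assumptions made for this example.

```python
from collections import Counter


def char_pattern(value):
    """Map digits to '9' and letters to 'A'; keep other characters as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Compute simple profile metrics for one column (hypothetical helper)."""
    total = len(values)
    # Treat None and empty strings as missing values.
    present = [v for v in values if v is not None and v != ""]
    patterns = Counter(char_pattern(v) for v in present)
    return {
        "count": total,
        # Incompleteness: share of rows with no value.
        "missing_pct": 100.0 * (total - len(present)) / total if total else 0.0,
        # Uniqueness: share of distinct values among the rows that have one.
        "unique_pct": 100.0 * len(set(present)) / len(present) if present else 0.0,
        # Patterns, most frequent first.
        "patterns": patterns.most_common(),
    }


# Made-up sample data for illustration.
report = profile_column(
    ["919-555-0100", "919-555-0101", "", None, "n/a", "919-555-0100"]
)
```

In this sample, the dominant pattern `999-999-9999` makes the outlier `n/a` (pattern `A/A`) easy to spot.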
Manage and reuse directives via wizard-driven user interface
- Lets you view the list and status of directives and job logs.
- Enables you to stop and start directives, and open their logs and generated code files.
- Lets you run, view or edit saved directives for reuse.
Share and manage assets from a central location
- Enables all users to work from a single, centrally deployed web application for easier, more secure data governance.
- Organizes the directives created by SAS Data Loader for Hadoop users in the standard SAS folder structure.
- Provides access to SAS Data Loader directives inside SAS Data Integration Studio and lets you integrate them with other data flows.
- Lets you visualize metadata relationships that span traditional relational sources and Apache Hadoop when used with SAS Lineage and SAS Data Integration Studio.
Leverage more data sources with enhanced data connectivity
- Connects to commonly used enterprise data sources by moving data to and from JDBC sources using Oozie/Sqoop.
- Moves data between SAS libraries and Hadoop using SAS/ACCESS libraries.
Identify where data originates and how it's used
- Lets you view the lineage of SAS Data Loader directives when used within SAS Data Integration Studio jobs.
- Uses impact analysis to identify potential consequences of a change to the lineage of an object.
Integrate with an existing SAS® environment
- Leverages the SAS midtier for authentication, object management, failover, administration and security.
- Grants administrative capabilities to a set of users for the SAS Data Loader for Hadoop web application.
- Assigns permissions on a directive-by-directive basis.
Load data to SAS® LASR™ Analytic Server
- Loads specified Hadoop columns into memory on the SAS LASR Analytic Server for analysis using SAS Visual Analytics or SAS Visual Statistics (licensed separately).
Run a SAS® program
- Runs SAS programs that use the DS2 language on Hadoop using the SAS Embedded Process, a lightweight SAS execution engine.