Load your data into or out of Hadoop. Prep it so it's ready for reports, visualizations or advanced analytics. And do it all yourself, quickly and easily. SAS Data Loader for Hadoop empowers you to manage your own data without writing code.
Manage data – no special skills needed.
Perform data integration, data quality and data preparation tasks yourself, without having to write complex MapReduce code or ask for outside help. SAS Data Loader for Hadoop bridges the skills gap, giving all users access to their data regardless of technical ability.
Boost scalability and performance.
Business users find it easy to use. Data scientists and SAS coders like its speed, efficiency and agility. A code accelerator pushes processing down to the Hadoop cluster, and data quality functions run in memory on Spark for better performance. And by minimizing data movement, you increase your data's security.
Free IT for more technical tasks.
When your data scientists are weighed down with basic data management duties, their advanced skills go underused – and business takes a hit. SAS Data Loader for Hadoop frees IT to focus on making your systems better, faster and more powerful.
Get more from your big data.
It's easy to move data between Hadoop and relational data sources or SAS data sets. You can profile, cleanse, join and transform your big data – and tap into the transformative power of advanced analytics.
SAS data expert Matt Magne explains how SAS Data Loader for Hadoop tackles the Hadoop skills shortage and empowers you to prepare, integrate and cleanse big data.
- Intuitive user interface. Easily access, transform and manage data stored in Hadoop with a web-based interface that reduces training requirements.
- Purpose-built to load data to and from Hadoop. Built from the ground up to manage big data on Hadoop, not repurposed from existing IT-focused tools.
- Big data quality and profiling. Built-in directives include casing, gender and pattern analysis, field extraction, match-merge and cluster-survive. Profiling runs in parallel on the Hadoop cluster for better performance.
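To illustrate what casing and pattern-analysis directives do, here is a minimal sketch in plain Python. This is not SAS code – the function names and the toy phone-number data are invented for illustration; the product runs the equivalent logic in parallel on the cluster.

```python
import re
from collections import Counter

def proper_case(value: str) -> str:
    """Casing directive, in miniature: standardize a field to proper case."""
    return value.title()

def pattern_of(value: str) -> str:
    """Pattern analysis: replace digits with '9' and letters with 'A'/'a'
    so that values with the same format collapse to one pattern."""
    value = re.sub(r"\d", "9", value)
    value = re.sub(r"[a-z]", "a", value)
    return re.sub(r"[A-Z]", "A", value)

def profile_patterns(values):
    """Count distinct formats in a column, as a profiling run would."""
    return Counter(pattern_of(v) for v in values)

# Toy column: three formats hiding in four phone numbers.
phones = ["555-0142", "555-0199", "(555) 0187", "5550171"]
report = profile_patterns(phones)
```

A profile like this quickly shows which rows deviate from the expected format and need cleansing.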
- Chaining and scheduling directives. Group multiple directives to run simultaneously or one after the other. And use the Public API to schedule and automate directives.
- Data management with Spark. Data quality functions run in memory on Spark. Spark matching and best record creation enables master data management for big data, and you can read and write to Spark data sets as needed.
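Matching and best-record creation can be pictured with a small pure-Python sketch. The matching rule, field names and records below are invented for illustration; in the product this runs in memory on Spark against real clusters of duplicate records.

```python
from collections import defaultdict

def match_key(rec):
    """Toy matching rule: normalized name plus postal code."""
    return (rec["name"].strip().lower(), rec["zip"])

def best_record(cluster):
    """Survivorship: later updates win, but empty values never
    overwrite populated ones, yielding one 'golden' record."""
    merged = {}
    for rec in sorted(cluster, key=lambda r: r["updated"]):
        for field, value in rec.items():
            if value:
                merged[field] = value
    return merged

# Two records for the same customer, with different gaps.
records = [
    {"name": "Pat Lee", "zip": "27513", "phone": "", "updated": 1},
    {"name": "pat lee ", "zip": "27513", "phone": "555-0142", "updated": 2},
]

clusters = defaultdict(list)
for rec in records:
    clusters[match_key(rec)].append(rec)

golden = [best_record(cluster) for cluster in clusters.values()]
```

The two input records collapse into a single surviving record that keeps the phone number only one of them carried – the essence of master data management for big data.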
- Big data integration. Import data from CSV and other delimited files into Hadoop. Run HiveQL commands and delete rows from Hadoop tables.
- In-memory analytics server. Load data in memory to prepare it for high-performance reporting, visualization or analytics.
- In-cluster code and data quality execution. Execute analytics and data quality processing within Hadoop for fast, budget-friendly results. Minimize data movement for increased scalability, governance and performance.