Big Data Certification

Become a SAS® Certified Big Data Professional

Thinking about getting your big data certification? It's a worthwhile step, as the ability to keep up with constantly changing technologies can make or break your career.

The SAS Certified Big Data Professional program – offered by the SAS Academy for Data Science – can give you the extra edge you're looking for.

SAS® Certified Big Data Professional

Format: Instructor-led classroom
Duration: 6 weeks
Location: Cary, NC
Questions? Call 1-800-727-0025 (US & Canada) to speak with a program adviser.


About the Big Data Certification Program

Is this program right for me?

The big data certification program is ideal for individuals ready to build on their basic programming knowledge by learning how to gather and analyze big data in SAS.

Prerequisites

To enroll in the SAS Certified Big Data Professional program, you need at least six months of programming experience in SAS or another programming language. Need to brush up on your programming skills? We recommend that you first complete the SAS Programming 1: Essentials course, which is available as an instructor-led or free online e-learning course. You might also find the SAS Programming 2: Data Manipulation Techniques course helpful.


What you will learn

The academy's big data certification program focuses on these areas:

  • Critical SAS programming skills.
  • Accessing, transforming and manipulating data.
  • Improving data quality for reporting and analytics.
  • Fundamentals of statistics and analytics.
  • Working with Hadoop, Hive, Pig and SAS.
  • Exploring and visualizing data.

Curriculum

The six-week big data certification program curriculum includes the following courses:

Big Data Challenges and Analysis-Driven Data

This course provides an overview of the challenges associated with big data and analysis-driven data.

Topics Covered

  • Reading external data files.
  • Storing and processing data.
  • Combining Hadoop and SAS.
  • Recognizing and overcoming big data challenges.

SAS Fundamentals: Programming, SQL and Macro Language

This course focuses on data manipulation techniques using the DATA step and SQL procedure to access, transform, join and summarize SAS data sets. You'll learn how to use components of the SAS macro facility to make text substitutions in SAS code and to write simple macro programs.

Topics Covered

  • Summarizing and presenting data.
  • Querying and subsetting data.
  • Transforming character, numeric and date variables.
  • Combining SAS data sets, including complex joins and merges.
  • Performing DO loop and SAS array processing.
  • Restructuring or transposing SAS data sets.
  • Performing text substitution in SAS code.
  • Using macro variables.
  • Creating simple macro definitions.

Exploring Data With SAS Visual Analytics

In this course, you'll learn how to use SAS Visual Analytics Explorer to explore in-memory tables from the SAS® LASR™ Analytic Server and perform advanced data analyses.

Topics Covered

  • Finding previously unknown relationships and spotting trends in your data.
  • Visualizing data using charts, plots and tables.
  • Using the autocharting function to visualize data in the best possible way.
  • Using advanced graphs, such as network diagrams, Sankey diagrams and word clouds.
  • Easily adding analytics to your graphs, and including descriptions of the analytics results.
  • Navigating through your data using on-the-fly hierarchies.

Statistics 1: Introduction to ANOVA, Regression and Logistic Regression

This introductory SAS/STAT® course focuses on t-tests, ANOVA and linear regression, and includes a brief introduction to logistic regression.

Topics Covered

  • Generating descriptive statistics and exploring data with graphs.
  • Performing analysis of variance and applying multiple comparison techniques.
  • Performing linear regression and assessing the assumptions.
  • Using regression model selection techniques to aid in the choice of predictor variables in multiple regression.
  • Using diagnostic statistics to assess statistical assumptions and identify potential outliers in multiple regression.
  • Using chi-square statistics to detect associations among categorical variables.
  • Fitting a multiple logistic regression model.
  • Scoring new data using developed models.

Preparing Data for Analysis and Reporting

In this course, you'll learn how to perform data management tasks, such as data quality improvement, entity resolution and data monitoring.

Topics Covered

  • Creating and reviewing data explorations.
  • Creating and reviewing data profiles.
  • Creating data jobs for data improvement.
  • Establishing monitoring aspects for your data.
  • Understanding the QKB components.
  • Using the component editors.
  • Understanding various definition types.
  • Building a new data type (optional).

Introduction to SAS and Hadoop: Essentials

This course teaches you how to use SAS programming methods to read, write and manipulate Hadoop data. You'll learn how to use Base SAS methods to read and write raw data with the DATA step, manage the Hadoop Distributed File System (HDFS) and execute MapReduce and Pig code from SAS via the HADOOP procedure. You'll also learn how to use SAS/ACCESS® Interface to Hadoop methods that allow LIBNAME access and SQL pass-through techniques to read and write Hive or Impala table structures.

Topics Covered

  • Accessing Hadoop distributions using the LIBNAME statement and the SQL pass-through facility.
  • Creating and using SQL procedure pass-through queries.
  • Using options and efficiency techniques for optimizing data access performance.
  • Joining data using the SQL procedure and the DATA step.
  • Reading and writing Hadoop files with the FILENAME statement.
  • Executing and using Hadoop commands with PROC HADOOP.
  • Using Base SAS procedures with Hadoop.

DS2 Programming Essentials With Hadoop

This course focuses on DS2, a fourth-generation SAS proprietary language for advanced data manipulation, which enables parallel processing and storage of large data with reusable methods and packages.

Topics Covered

  • Identifying the similarities and differences between the SAS DATA step and the DS2 DATA step.
  • Converting a Base SAS DATA step to DS2.
  • Creating DS2 variable declarations, expressions and methods for data conversion, manipulation and conditional processing.
  • Creating user-defined and predefined packages to store, share and execute DS2 methods.
  • Creating and executing DS2 threads for parallel processing.
  • Leveraging the SAS In-Database Code Accelerator to execute DS2 code outside of a SAS session.
  • Executing DS2 code in the SAS High-Performance Analytics grid using the HPDS2 procedure.

Big Data Analysis With Hive and Pig

In this hands-on course, you'll use processing and analysis to find insights in structured and unstructured big data. You'll learn how to organize structured data in tabular format using Apache Hive and how to analyze the data using the Hive query language (HiveQL). You'll use the Apache Pig scripting language to perform batch processing tasks, such as extract, transform, load (ETL), data preparation and analytics.

Topics Covered

  • Moving data into the Hadoop ecosystem.
  • Using Hive to design a data warehouse in Hadoop.
  • Performing data analysis using HiveQL.
  • Joining data sources.
  • Performing ETL.
  • Organizing data in Hadoop by usage.
  • Performing analysis on unstructured data using Pig.
  • Joining massive data sets using Pig.
  • Using user-defined functions (UDFs).
  • Analyzing big data in Hadoop using Hive and Pig.

Getting Started With SAS In-Memory Statistics

This course focuses on accessing data on the SAS LASR Analytic Server and performing exploratory analysis and preparation. Topics include starting the server, loading data and manipulating data on the SAS LASR Analytic Server using the IMSTAT procedure. IMSTAT topics include deriving new temporary and permanent tables and columns, calculating summary statistics (e.g., mean, frequency and percentile), and creating filters and joins on in-memory data.

Topics Covered

  • Starting up a SAS LASR Analytic Server.
  • Loading tables into memory on the SAS LASR Analytic Server.
  • Processing in-memory tables with PROC LASR and PROC IMSTAT.
  • Accessing data more efficiently via intelligent partitioning.
  • Deriving new temporary and permanent tables and variables.
  • Creating filters and joins on in-memory data.
  • Exporting ODS result tables for client-side graphic development.
  • Producing descriptive statistics including counts, percentiles and means.
  • Creating multidimensional summaries including cross-tabulations and contingency tables.
  • Deriving kernel density estimates using normal functions.

Working With SAS Data Loader for Hadoop

This course focuses on profiling, integrating, cleansing and moving big data in a Hadoop environment – without having to write code – using an intuitive, web-based interface.

Topics Covered

  • Moving data in and out of Hadoop.
  • Interrogating and profiling data for quality issues.
  • Transforming, transposing and joining data so that it is fit for purpose.
  • Cleansing and integrating data so that it is suitable for analysis and reporting.
  • Loading data into the SAS In-Memory Analytics server for analytics and exploration.
  • Executing custom SAS and HiveQL code inside the Hadoop cluster.


In the big data certification program courses, you will learn to use the following SAS software:

  • Base SAS®
  • SAS® Data Loader for Hadoop
  • SAS® Enterprise Guide®
  • SAS® Enterprise Miner™
  • SAS® In-Memory Statistics
  • SAS® Studio
  • SAS/STAT®
  • SAS® Visual Analytics
  • DataFlux® Data Management Server
  • DataFlux® Data Management Studio

Certification exams

The big data certification program includes two certification exams. To earn the SAS Certified Big Data Professional credential, you must pass both exams.

Already have the skills covered in the big data certification program courses? If so, you may take the exams without completing the coursework. Get details – including test dates, locations and fees – using the links above.


Upcoming Sessions

2016 Dates                    Programs                                       Fee                            Locations  Registration
11JAN – 18FEB                 SAS Certified Big Data Professional            US$9,000                       Cary, NC   Closed
29FEB – 07APR                 SAS Certified Advanced Analytics Professional  US$9,000                       Cary, NC   Register now
11JAN – 18FEB, 29FEB – 07APR  SAS Certified Data Scientist                   US$16,000 (a $2,000 savings!)  Cary, NC   Closed
06JUN – 15JUL                 SAS Certified Big Data Professional            US$9,000                       Cary, NC   Register now
25JUL – 01SEP                 SAS Certified Advanced Analytics Professional  US$9,000                       Cary, NC   Register now
06JUN – 15JUL, 25JUL – 01SEP  SAS Certified Data Scientist                   US$16,000 (a $2,000 savings!)  Cary, NC   Register now

About SAS® Certification

The SAS Global Certification program launched in 1999 to validate the skills and knowledge of SAS users and partners. Participants can currently earn credentials in SAS Programming, Predictive Modeling, Administration, Data Management, Business Intelligence, Big Data, Advanced Analytics and Data Science. Since the program's inception, the number of SAS certifications awarded each year has seen double-digit growth. To date, nearly 90,000 SAS credentials have been awarded to individuals in 77 countries.