Clickstream Transformations

SAS Data Surveyor for Clickstream Data adds several transformations to the Transformations tree in SAS Data Integration Studio. Most of these transformations are added to the Clickstream Transformations subfolder. The Directory Contents transformation is added to the Access subfolder.
The main clickstream transformations are Clickstream Log, Clickstream Parse, and Clickstream Sessionize. These transformations are used in every job, and they are responsible for the bulk of the ETL performed on clickstream data. For more detail about the functions of these transformations, see Log Transformation, Parse Transformation, and About the Clickstream Sessionize Transformation.
The main transformations are described in the following table:
Clickstream Transformations

| Name | Description | Inputs from and Outputs to |
| --- | --- | --- |
| Clickstream Log transformation | Reads data from a clickstream log. Identifies the type of log to be processed. Maps input fields from the log to the clickstream parse input columns as described in Clickstream Parse Input Columns. Loads an output table with data from the log. For more information, see Log Transformation. By default, this transformation will decode both URL encoded data and character encoded data. | From: Web log; To: Log Output table |
| Clickstream Parse transformation | Reads the output from the Log transformation. Maps the clickstream parse input columns to output columns in a target table for continued processing. Filters unwanted data records from the target table, according to user-defined rules. Enables the definition of a cookie, a query string, or a referrer parameter to be parsed and stored as new data items in the target table. If possible, uniquely identifies the visitor who is associated with each data record and adds the visitor ID as a new data item in the target table. For more information, see Parse Transformation. | From: Log Output table; To: Parse Output table |
| Clickstream Sessionize transformation | Reads the output from the Parse transformation. Identifies user sessions. Performs additional visitor ID analysis. Identifies and manages non-human visitors (such as spiders). Manages sessions that span Web logs. For more information, see Sessionize Transformation. | From: Parse Output table; To: Sessionize Output table |

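
The flow through the three main transformations (log records in, parsed and filtered records next, sessions last) can be illustrated outside SAS with a small Python sketch. This is not how the SAS transformations are implemented. The record layout, the filter rule (dropping requests for images and style sheets), the use of the client IP address as the visitor ID, and the 30-minute inactivity timeout are all assumptions chosen for illustration, and every function name is hypothetical.

```python
from datetime import datetime, timedelta
from urllib.parse import parse_qs, unquote, urlsplit

# Hypothetical, simplified log records: (client IP, ISO timestamp, requested URL).
RAW_LOG = [
    ("10.0.0.1", "2024-01-01T10:00:00", "/search?q=red%20shoes"),
    ("10.0.0.1", "2024-01-01T10:05:00", "/img/logo.png"),
    ("10.0.0.1", "2024-01-01T10:10:00", "/cart"),
    ("10.0.0.2", "2024-01-01T10:11:00", "/home"),
    ("10.0.0.1", "2024-01-01T11:30:00", "/home"),
]

SESSION_TIMEOUT = timedelta(minutes=30)            # assumed inactivity timeout
IGNORED_SUFFIXES = (".png", ".gif", ".jpg", ".css")  # assumed filter rule


def log_stage(raw_records):
    """Read raw records into rows and decode URL-encoded data."""
    return [
        {"ip": ip, "time": datetime.fromisoformat(stamp), "url": unquote(url)}
        for ip, stamp, url in raw_records
    ]


def parse_stage(rows):
    """Filter unwanted rows, split out query parameters, assign a visitor ID."""
    parsed = []
    for row in rows:
        parts = urlsplit(row["url"])
        if parts.path.lower().endswith(IGNORED_SUFFIXES):
            continue  # drop rows that match the (assumed) filter rule
        parsed.append({
            "visitor_id": row["ip"],  # assumption: the IP address identifies the visitor
            "time": row["time"],
            "path": parts.path,
            "params": parse_qs(parts.query),
        })
    return parsed


def sessionize_stage(rows):
    """Group each visitor's rows into sessions separated by long idle gaps."""
    sessions = []
    open_sessions = {}  # visitor_id -> that visitor's most recent session
    for row in sorted(rows, key=lambda r: r["time"]):
        current = open_sessions.get(row["visitor_id"])
        if current is None or row["time"] - current["end"] > SESSION_TIMEOUT:
            current = {"visitor_id": row["visitor_id"], "start": row["time"],
                       "end": row["time"], "paths": []}
            sessions.append(current)
            open_sessions[row["visitor_id"]] = current
        current["end"] = row["time"]
        current["paths"].append(row["path"])
    return sessions


sessions = sessionize_stage(parse_stage(log_stage(RAW_LOG)))
for session in sessions:
    print(session["visitor_id"], session["start"], session["paths"])
```

Each stage consumes only the output of the stage before it, which mirrors the From and To columns in the table above. The sketch leaves out log type detection, non-human visitor handling, and sessions that span Web logs.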
The following table describes the more specialized clickstream transformations in the order in which they are commonly used. Each of these transformations supports a special task in the template jobs that are installed with the SAS Data Surveyor for Clickstream Data.
Specialized Clickstream Transformations

| Name | Description |
| --- | --- |
| Clickstream Setup transformation | Generates the folder structure on the file system to hold the SAS logs and any generated data files. It also generates configuration data if necessary and test Web log data for the template jobs. Used in clickstream setup jobs. |
| Directory Contents transformation | Generates a SAS data table that contains a numerical listing of the files found in a path or list of paths, and if selected, their subfolders. It is used in the Standard Web Logs Basic Template job as described in Prepare Data and Parameter Values to Pass to Loop 1. |
| Clickstream Create Groups transformation | Combines the grouped output from several calls to the Clickstream Parse transformation into a set of output views, one per group. It is used in the Standard Web Logs Basic Template job as described in Combine Groups. |
| Clickstream Create Detail transformation | Combines the output from multiple Clickstream Sessionize transformations and creates a single data table. It is used in the Standard Web Logs Basic Template job as described in Create Detail and Generate Output. |

For more detail about the functions of these transformations, see Specialized Transformations.
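
As a rough illustration of the kind of table that the Directory Contents transformation produces, the following Python sketch builds a numbered listing of the files found in one path or a list of paths, optionally including subfolders. It is a sketch under stated assumptions, not the SAS implementation: the function name, the `include_subfolders` parameter, and the `file_number` and `file_path` column names are hypothetical, and "numerical listing" is read here as "one sequence number per file".

```python
import os


def directory_contents(paths, include_subfolders=False):
    """Return a numbered listing of the files under one or more paths."""
    if isinstance(paths, (str, os.PathLike)):
        paths = [paths]  # accept a single path as well as a list of paths

    found = []
    for root in paths:
        if include_subfolders:
            for folder, subdirs, names in os.walk(root):
                subdirs.sort()  # visit subfolders in a stable order
                found.extend(os.path.join(folder, name) for name in sorted(names))
        else:
            with os.scandir(root) as entries:
                found.extend(sorted(e.path for e in entries if e.is_file()))

    # Hypothetical column names; one row per file, numbered from 1.
    return [
        {"file_number": number, "file_path": path}
        for number, path in enumerate(found, start=1)
    ]
```

A job that loops over Web logs could iterate over these rows and pass each `file_path` to the next step, which is similar in spirit to how the template job passes parameter values to a loop.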