Running a Basic Template Job

Overview

Sometimes you want to process the data from more than one clickstream log in a single job, or you want to improve performance by using parallel processing. In either case, you can process the data in the appropriate standard Web log template job. Unless you need to process SAS page tags or customer intelligence data, use the standard Web logs basic template.
If you have not done so already, run a copy of the setup job for the standard Web logs basic template, which is named clk_0010_weblog_basic_setup. To process the data, run a copy of the standard Web logs job, which is named clk_0200_weblog_basic_load_weblog_detail. Running a copy protects the original template. For information about creating the copy, see Copying the Folder Structure of a Clickstream Job.
Running the template involves two tasks: reviewing and preparing the job, and then running the job and examining its output.

Review and Prepare the Job

You can examine the standard Web logs basic template job on the Diagram tab of the SAS Data Integration Studio Job Editor before you run it. You can also configure the job to change the list of logs that you process and set the number of groups that are used in the sessionizing loop. Finally, you can specify parallel and multiple processing options.
Perform the following steps to make these adjustments:
  1. Open the renamed standard Web logs basic template job.
  2. Scroll through the job on the Diagram tab.
    Note the following components:
    • the two loops and the connections between them
    • the transformations that prepare the clickstream logs and groups for loop processing
    • the output table that collects the results from the job
    For an overview of how the job is processed, see Stages in Template Jobs.
  3. Right-click the Log_Paths table and select Open from the pop-up menu. Review the list of log paths contained in the table. If you need to modify this list, you can click Switch to edit mode in the toolbar and make any needed changes.
  4. Open the Loop Options tabs in the properties windows for the two Loop transformations and make sure that the appropriate parallel processing settings are specified. Be particularly careful to ensure that the path specified in the Location on host for log and output files field is correct.
    For information about the prerequisites for parallel processing, see the “About Parallel Processing” topic in the Working with Iterative Jobs and Parallel Processing chapter in the SAS Data Integration Studio: User's Guide. The job fails if parallel processing is enabled but these prerequisites have not been satisfied.
  5. Open the Parameters tab in the properties window for the template job and review these two parameters: Number of Distinct Clickstream Parse Output Paths and Number of Groups into which data should be divided. To view the value of a parameter, select the parameter and click Edit to open the Edit Prompt window. Then, click Prompt Type and Values to review the value specified in the Default value field. Click OK until you return to the Diagram tab.
    Note: The values of these parameters must match the values entered for the setup job. The setup job values are entered on the Options tab in the properties window for the Setup transformation in the setup job. If you change either of these values in the template job, you need to rerun the setup job to make sure that the settings match and that the supporting file system structure is generated.
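The log paths reviewed in step 3 can also be checked outside of SAS before the job is run. The following Python sketch is one possible way to do that. It assumes that the paths have already been copied out of the Log_Paths table into a list of strings and that the script runs on a machine that can see the same file system as the SAS server. The function name unreadable_paths is hypothetical and is not part of any SAS product.

```python
import os


def unreadable_paths(paths):
    """Return the entries that do not exist or cannot be read.

    `paths` is assumed to be a list of strings copied from the
    Log_Paths table; this helper is illustrative only.
    """
    return [
        p for p in paths
        if not (os.path.exists(p) and os.access(p, os.R_OK))
    ]
```

An empty result suggests that every listed path is reachable from the machine where the check is run. Any path that is returned is worth correcting in the Log_Paths table before the job is started.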

Run the Job and Examine the Output

Perform the following steps to run a basic job and examine its output:
  1. Run the job.
    The following display shows the output of a completed job:
    Completed Basic Job
  2. If the job completes without error, right-click the WEBLOG_DETAIL table at the end of the job and select Open from the pop-up menu.
    The View Data window appears, as shown in the following display.
    Basic Job Output

Troubleshooting

If the job does not complete successfully, examine the logs for each loop in the job. Failures in the main flow of the job are often caused by failures in one or more of the loop SAS sessions. Use the Status tab to determine where the error occurred, and then refer to the log for that part of the job. A SAS log is saved for each pass through the loops in the template job. These logs are placed in a folder called Process Logs under the Loop1 and Loop2 folders in the structure that is created by the template setup job.
To find the file that you need, you should understand the naming convention for these log files. The files in the ProcessLogs folder are named Lnn_x.log, where nn is a unique number for this particular Loop transformation and x is a number that represents the iteration of the current loop. For example, if you process 200 Web logs, then the ProcessLogs folder for Loop1 (Clickstream Log transformation and Clickstream Parse transformation) contains 200 logs named Lnn_1.log to Lnn_200.log (where nn is some constant number).
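The naming convention can be used to check whether every iteration of a loop produced a log. The following Python sketch is a minimal illustration, assuming that the file names follow the Lnn_x.log pattern exactly as described. The function names parse_log_name and missing_iterations are hypothetical helpers, not part of any SAS product.

```python
import re

# Pattern assumed from the convention described above: L<nn>_<x>.log
LOG_NAME = re.compile(r"^L(\d+)_(\d+)\.log$")


def parse_log_name(name):
    """Return (loop_number, iteration) for a name such as 'L12_7.log'.

    Returns None when the name does not follow the convention.
    """
    match = LOG_NAME.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def missing_iterations(names, expected):
    """Return the iterations in 1..expected that have no log in `names`."""
    seen = set()
    for name in names:
        parsed = parse_log_name(name)
        if parsed is not None:
            seen.add(parsed[1])
    return [i for i in range(1, expected + 1) if i not in seen]
```

For example, passing the file names in the Loop1 log folder together with the number of Web logs that were processed returns the iterations for which no log was written. Those iterations are likely places to start looking for a failed loop session.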
The ProcessLogs folder for Loop2 (Clickstream Sessionize transformation) has the same naming convention. However, the log folder for Loop2 contains one log for each group. For example, if the Clickstream Parse transformation in the first loop generated five groups, then the logs are named Lnn_1.log to Lnn_5.log (where nn is a constant number).
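When a loop produces many logs, it can help to scan them for error messages instead of opening each one. The following Python sketch assumes that error messages in a SAS log appear on lines that begin with the word ERROR, which is the usual SAS log convention. The function name logs_with_errors and the folder argument are hypothetical, and the folder must be the actual log folder in your own structure.

```python
from pathlib import Path


def logs_with_errors(folder):
    """Map each *.log file name in `folder` to its lines starting with 'ERROR'.

    Files without such lines are left out of the result.
    """
    hits = {}
    for path in sorted(Path(folder).glob("*.log")):
        # errors="replace" keeps the scan going if a log has odd bytes
        with open(path, encoding="utf-8", errors="replace") as handle:
            errors = [
                line.rstrip("\n") for line in handle
                if line.startswith("ERROR")
            ]
        if errors:
            hits[path.name] = errors
    return hits
```

The keys of the returned dictionary identify the loop iterations or groups that failed, which you can then match against the Status tab.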
Note: The log files in the Process Logs folder are not overwritten during subsequent runs of this job. Consider clearing the Process Logs folder between runs during job development to avoid accumulating a large number of logs.
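Clearing the folder between development runs can be scripted. The following Python sketch is one cautious way to do it, assuming that only files with the .log extension should be removed. The function name clear_process_logs is hypothetical. By default the function only reports what it would delete.

```python
from pathlib import Path


def clear_process_logs(folder, dry_run=True):
    """Delete the *.log files in `folder` and return their names.

    With dry_run=True (the default) nothing is deleted; the names
    of the files that would be removed are returned for review.
    """
    targets = sorted(Path(folder).glob("*.log"))
    if not dry_run:
        for path in targets:
            path.unlink()
    return [path.name for path in targets]
```

Running the function once with the default setting and reviewing the returned names before calling it again with dry_run=False reduces the risk of deleting logs that you still need.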