Compressing Data

Compression is a process that reduces the number of bytes that are required to represent each table row. In a compressed file, each row is a variable-length record. In an uncompressed file, each row is a fixed-length record. Compressed tables contain an internal index that maps each row number to a disk address so that the application can access data by row number. This internal index is transparent to the user. Compressed tables have the same access capabilities as uncompressed tables.
Here are some advantages of compressing a file:
  • reduced storage requirements for the file
  • fewer I/O operations necessary to read from or write to the data during processing
Here are some disadvantages of compressing a file:
  • More CPU resources are required to read a compressed file because of the overhead of uncompressing each observation.
  • The resulting file can be larger than the uncompressed file. This can happen when rows are short or contain little repeated data, because each compressed row carries some overhead.
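The internal row-number index described earlier can be pictured with a small sketch. The following Python code is an illustration only and does not reflect the actual SAS file format: it packs variable-length records back to back and keeps a list of byte offsets so that a row can still be fetched by number. The names build_store and read_row are hypothetical.

```python
def build_store(rows):
    """Pack variable-length byte records and remember where each one starts."""
    data = bytearray()
    offsets = []
    for record in rows:
        offsets.append(len(data))   # start address of this row
        data += record
    offsets.append(len(data))       # sentinel: end of the last row
    return bytes(data), offsets


def read_row(data, offsets, row_number):
    """Return the record for a 1-based row number by using the offset index."""
    start = offsets[row_number - 1]
    end = offsets[row_number]
    return data[start:end]


data, offsets = build_store([b"ab", b"cdefg", b"h"])
print(read_row(data, offsets, 2))   # b'cdefg'
```

Because the lookup goes through the offset list, the caller asks for row 2 in the same way whether the records are fixed length or variable length.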
These are the types of compression that you can specify:
  • CHAR to use the RLE (Run Length Encoding) compression algorithm, which works best for character data.
  • BINARY to use the RDC (Ross Data Compression) algorithm, which is highly effective for compressing medium to large (several hundred bytes or larger) blocks of binary data.
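To show why run-length encoding suits character data with repeated bytes (such as blank padding), and why a compressed file can sometimes grow, here is a minimal sketch of a generic run-length encoder in Python. It is an assumption-laden illustration, not the algorithm that SAS implements: the (count, byte) pair layout and the function names rle_encode and rle_decode are hypothetical.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, byte) pairs; each run is capped at 255 bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Expand (count, byte) pairs back into the original bytes."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)


padded = b"SMITH" + b" " * 35   # 40-byte character field, mostly blanks
varied = b"A1B2C3D4E5"          # 10 bytes with no repeated neighbors

print(len(padded), len(rle_encode(padded)))   # 40 12
print(len(varied), len(rle_encode(varied)))   # 10 20
```

The blank-padded value shrinks from 40 bytes to 12, while the value with no repeated bytes doubles in size, which mirrors the disadvantage noted above.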
You can compress these types of tables:
  • all tables that are created during a SAS session. Besides specifying SAS system options on the command line or inside a SAS program with the OPTIONS statement, you can use SAS Data Integration Studio to set system options. For example, you can use the System Options field to set the COMPRESS= system option on a table loader transformation. (A table loader transformation generates or retrieves code that puts data into a specified target table.)
    The Options Tab in a Table Loader Properties Dialog Box in SAS Data Integration Studio
  • all tables for a particular library. For example, when you register a Base SAS engine library in the metadata, you can specify the COMPRESS= option in the Other options to be appended field on the Options for any host tab (see Setting LIBNAME Options That Affect Performance of SAS Tables). For third-party relational database tables, you can use the Options to be appended field on the Other Options tab (see Setting LIBNAME Options That Affect Performance of SAS/ACCESS Databases).
    Note: You cannot specify compression for an SPD Engine data library.
  • an individual table. In SAS Data Integration Studio, SAS tables have a Compressed option that is available from the table properties dialog box. To use CHAR compression, you select YES. To use BINARY compression, you select Binary.
    The Table Options Dialog Box in SAS Data Integration Studio
    For SPD Engine tables and third-party relational database tables, you can use the Table Options field in the table properties dialog box to specify the COMPRESS= option.
Note: The SPD Engine compresses the data component (DPF) file by blocks as the engine is creating the file. (The data component file stores partitions for an SPD Engine table.) To specify the number of observations that you want to store in a compressed block, you use the IOBLOCKSIZE= table option in addition to the COMPRESS= table option. For example, in the Table Options field in the table properties dialog box, you might enter COMPRESS=YES IOBLOCKSIZE=10000. The default block size is 4096 (4K) observations.
When you create a compressed table, SAS records in the log the percentage of reduction that is obtained by compressing the file. SAS obtains the compression percentage by comparing the size of the compressed file with the size of an uncompressed file of the same page size and record count. After a file is compressed, the setting is a permanent attribute of the file, which means that to change the setting, you must re-create the file. For example, to uncompress a file in SAS Data Integration Studio, select Default (NO) for the Compressed option in the table properties dialog box for a SAS table, and then re-create the table so that the new setting takes effect.
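The percentage comparison described above can be sketched as simple arithmetic. The Python function below is an assumption about the shape of the calculation, not the exact formula that SAS uses. The name size_change_percent and the page counts are hypothetical, and the sizes can be in any consistent unit.

```python
def size_change_percent(uncompressed_pages: int, compressed_pages: int) -> float:
    """Percentage by which compression reduced the size (negative if it grew)."""
    return 100 * (uncompressed_pages - compressed_pages) / uncompressed_pages


print(size_change_percent(20, 13))   # 35.0
print(size_change_percent(20, 22))   # -10.0
```

A negative result corresponds to the case in which compressing the file increases its size.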
For more information about compression, see SAS Data Set Options: Reference.