


Maureen Chew, Sun Microsystems
Leigh Ihnen, SAS Institute
Tom Keefer, Sun Microsystems


Peace between SAS Users & Solaris/Unix System Administrators:
Finding a Middle Ground

Abstract:
SAS User: "I don't want to be a Unix guru to get good performance."
Unix Sys Admin: "I don't want to be a SAS guru to provide good performance."

There are a number of issues, from the perspective of both the SAS user and the Solaris/Unix systems administrator, to consider when trying to create predictable performance in a multi-user or multi-load environment. Often the result is unproductive finger pointing between the two groups, with each group communicating in a technical jargon that the other does not understand.

Here in the SAS Institute Customer Technology Center, we work closely with a number of customers in a wide variety of situations, including inquiries regarding performance and optimization. The intent of this paper is to summarize a number of commonly discussed scenarios. Additionally, we hope to establish a middle ground for both systems administrators and SAS users by providing enough information, in relevant terminology, that both groups can work together when addressing capacity planning, performance questions and/or problem resolution.




What makes it difficult?

Here at the Customer Technology Center (CTC) at the SAS Institute, we often receive calls regarding performance, especially performance under load. We often hear "SAS applications require a tremendous amount of memory. If it doesn't, why do you recommend configurations with so much?" Many, if not most, of the examples discussed have come from customer questions or inquiries.

If a talented person is locked in a room with access to an unlimited supply of building materials and tools and asked to build something "spectacular", you can imagine the range of possible results. For example, you could end up with a house, a boat or a weapon capable of destruction. The SAS software platform includes a 4GL programming language along with an array of sophisticated tools and applications which give SAS programmers the capability and flexibility to create an infinite range of applications. Coupled with the fact that there are SAS/ACCESS interfaces which can connect to some 50 types of databases and application suites, the possibilities for great results are limited only by the creativity and talent of the users and application developers.

SAS in and of itself does not use a lot of memory. However, like a car where you have control over how fast you go, you can tell certain SAS applications to use more memory. Sometimes doing this will allow your job to run faster, sometimes it will make no difference and sometimes it can hurt. Sometimes you can drive faster only to reach the traffic jam sooner, and you arrive at your destination no earlier than if you had driven at the posted speed. The same is true for certain SAS applications. Similarly, "large" problems require "large" amounts of resources to handle. The point is that most SAS applications don't require much memory while some do require a large amount. Thus it is untrue and unfair to say that SAS applications require a lot of memory. It would probably be a fair statement, though, that in most cases where performance is an issue, memory is usually the culprit (as opposed to CPU, I/O or network bound applications).

The users and administrators we speak to seem to have the most difficulty characterizing the concurrent peak SAS workload. Unfortunately, this takes knowledge from both the SAS users and the systems administrators. Individual SAS users can tell you about their particular types of jobs but don't have knowledge about the hardware and OS configurations. Administrators may know the configurations but don't understand the application resource requirements. Additionally, it is extremely difficult to gather complete and accurate information from all SAS users, especially if there are hundreds of them. It is this collective and cooperative requirement that makes sizing a system for SAS applications, or analyzing performance issues, much more difficult.

Lastly, the SAS software platform is just that: a collection of fully programmable and configurable technologies. The power and flexibility available to application developers is not quantifiable. As such, it is impossible to make "one size fits all" rules and recommendations. Thus, a disclaimer applies to all guidelines and suggestions enclosed herein: "Your mileage may vary", and you may experience effects which are completely opposite to what is presented.


Looking into What's Happening
  • Turn on the "fullstimer" option
  • System and Test Configurations
  • Conventions

Turn on the "fullstimer" option

This option is key to gathering performance data from the SAS application perspective. When enabled, the SAS log will contain detailed timing and memory usage on a PROC by PROC (as well as data step) basis. For applications run in batch or background mode, the timing totals will be printed in summarized form. An example for the PHREG proc:

NOTE: PROCEDURE PHREG used:
      time:                           memory:
         real        3:06.380            page faults   0
         user cpu    2:34.728            page reclaims 0
         system cpu  27.287 seconds      usage         30.99 M
      block I/O operations:           context switches:
         input       1                   voluntary     521
         output      75                  involuntary   2036

This option can either be invoked from within a SAS program:

         options fullstimer;

or it can be added on the command line during program invocation:

         $ <SAS_INSTALL_DIR>/sas -fullstimer myprog.sas 

The items of most interest are the times and the amount of memory used. This information is obtained from the library call getrusage(3C). The man page (man getrusage) can provide all the gory details of what the fields exactly mean.  A brief explanation of real, user and system times:

  • Real time represents wall clock time.
  • User time is the CPU time spent executing user or application code.
  • System time is the CPU time spent in the kernel performing system functions on behalf of the user application.

Differences in time between real time and user+system can be attributed to any one or all of:

  • CPU contention - time spent waiting for a CPU time slice
  • Paging or swapping
  • I/O contention
  • Network
  • Waiting for a lock
  • Other running processes

Applications which have very close real and user times are typically CPU intensive. In these cases, performance can be increased by using faster CPUs. Many times, optimizing the application can realize performance gains as well.

If you are seeing a large differential between the wall clock time and user+system, there is potential for performance improvement.  The critical next step would be to identify which factors are the root cause or causes for the difference.  These factors are often a combination of a hardware configuration limitation and/or application inefficiency.
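
As a rough illustration of the comparison described above, the following Python sketch converts the FULLSTIMER times from the PHREG example into seconds and computes the gap between real time and user+system time. The helper name to_seconds is hypothetical, and the parsing assumes only the two formats that appear in the sample log (m:ss.fff and "ss.fff seconds").

```python
def to_seconds(text):
    """Convert a FULLSTIMER time such as '3:06.380' or '27.287 seconds' to seconds."""
    value = text.replace("seconds", "").strip()
    seconds = 0.0
    for part in value.split(":"):      # handles ss, m:ss and h:mm:ss
        seconds = seconds * 60 + float(part)
    return seconds

# Values copied from the PHREG example log
real = to_seconds("3:06.380")
user = to_seconds("2:34.728")
system = to_seconds("27.287 seconds")

gap = real - (user + system)           # elapsed time not spent executing on a CPU
print(f"real={real:.3f}s cpu={user + system:.3f}s gap={gap:.3f}s ({gap / real:.1%} of real)")
```

For this step the gap is only a few seconds out of roughly three minutes, which is what one would expect from a CPU intensive procedure. A gap that is a large fraction of real time is the signal to start looking at the factors listed above.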

FULLSTIMER will also report the voluntary and involuntary context switches. These fields are documented in the ru_nvcsw and ru_nivcsw section of the getrusage(3C) man page. Voluntary switches usually represent wait states on a resource, so high numbers in this field are not necessarily bad. However, a high number of involuntary context switches usually indicates a resource constraint.

Conventions

Shell commands that can be run by any user are prefaced with the character '$'. Commands which must be run by root or the super-user are prefaced with '#'.

System and Test Configurations

Testing was done on a pair of Sun E4000s with 1.5 GB RAM and model 100 Sun Storage Arrays running Solaris 2.6 and the SAS System, Release 6.12. We also re-ran some tests with Solaris 7 and Version 8 of the SAS System. Although all testing was done on Solaris, a majority of the information should be applicable to all the major Unix platforms. The portions that may not apply are certain command line options to the tools and the availability of certain tools (bundled or public domain) referenced.

Much of the testing was done with a household census data set which had 115 variables, record length 200 bytes, 5.5M observations, for a total data set size of 1.1 GB.

When discussing performance issues, it is essential to provide basic system configuration information such as that in the table below. The commands used to gather this information are also discussed.

Sample System Information

Item                Example
------------------  ------------------------------------
Hardware Platform   Sun E4500
# CPUs              8
Clock Speed         336 MHz
Memory or RAM       1.5 GB
Storage Platform    Sun A5000
Storage Layout      Data area - 30 GB RAID 0+1
                    WORK area - 30 GB RAID 0
                    VERITAS Volume Manager
                    VERITAS File System
SWAP Configuration  Total 8 GB striped across 2 spindles
OS                  Solaris 2.6
SAS Version         6.12 TS050


As is the case with Unix, there are usually several ways to accomplish the same task.

Let's start with the software first.

The version of the SAS System is placed in the SAS LOG window if running in display mode. You can also get this information from a shell or command mode:

    $ <SAS_INSTALL_DIR>/sas -nodms
will print:

    $ /u1/sas612/sas -nodms
    NOTE: Copyright (c) 1989-1996 by SAS Institute Inc., Cary, North Carolina, USA.
    NOTE: SAS (r) Proprietary Software Release 6.12 TS050
    <cntrl-D> to exit.

To determine the version of Solaris:

    $ cat /etc/release
                 Solaris 2.6 s297s_smccServer_37cshwp SPARC
        Copyright 1996 Sun Microsystems, Inc.  All Rights Reserved.
                   Manufactured in the USA 18 July 1997

Alternatively, uname -a works but requires a translation from the version of the kernel/base OS (SunOS 5.6) to the complete Solaris umbrella name (Solaris 2.6).

To list the swap configuration:

   $ /usr/sbin/swap -l
   swapfile             dev  swaplo blocks   free
   /dev/dsk/c1t10d0s1  32,73     16 1048784 1002016

Blocks are 512 bytes, so there is ~500 MB of SWAP configured above.
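
The conversion from the block counts reported by swap -l to megabytes can be sketched as follows. This is a minimal Python illustration using the figures from the output above; the function name blocks_to_mb is hypothetical, and it assumes the 512-byte block unit that swap -l uses on Solaris.

```python
BLOCK_BYTES = 512                      # swap -l counts 512-byte blocks

def blocks_to_mb(blocks):
    """Convert a swap -l block count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return blocks * BLOCK_BYTES / (1024 * 1024)

configured_mb = blocks_to_mb(1048784)  # 'blocks' column from the output above
free_mb = blocks_to_mb(1002016)        # 'free' column from the output above
print(f"configured: {configured_mb:.0f} MB, free: {free_mb:.0f} MB")
```

The configured figure comes out just over 512 MB and the free figure just under 490 MB, which is where the "~500 MB" estimate comes from.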

There is no easy way for users to determine the storage platform and layout from command line options so this information must be provided by the systems administrator. The command:

$ df -k

will show the mount points; this can be used to verify that WORK or data areas are physically located on the system and are not NFS storage areas.

The hardware configuration can be determined in several ways, either with a combination of prtconf(1M) and dmesg(1M) or with prtdiag.

   $ /usr/sbin/prtconf -v | more
   System Configuration:  Sun Microsystems  sun4u
   Memory size: 1536 Megabytes
   System Peripherals (Software Nodes):

   SUNW,Ultra-Enterprise
   .....

Some prefer using the more obscure prtdiag(1M) command, as it gives a detailed system configuration which includes # CPUs, clock speed, e-cache, adaptor cards, memory interleave, board temperature, etc. This command is located in the SUNWkvm package and may not be available for non-UltraSPARC based systems. prtdiag will return all the necessary hardware configuration information with the exception of the storage platform.


/*"SUNW,Ultra-Enterprise" below should be replaced with the
output of `uname -m` or particular architecture of your system. */

$ /usr/platform/SUNW,Ultra-Enterprise/sbin/prtdiag

System Configuration:  Sun Microsystems  sun4u 8-slot Sun Enterprise 4000/5000
System clock frequency: 82 MHz
Memory size: 1536MB

========================= CPUs =========================

                    Run   Ecache   CPU    CPU
Brd  CPU   Module   MHz     MB    Impl.   Mask
---  ---  -------  -----  ------  ------  ----
 0     0     0      248     4.0   US-II    1.1
 0     1     1      248     4.0   US-II    1.1
 2     4     0      248     4.0   US-II    1.1
 2     5     1      248     4.0   US-II    1.1
 4     8     0      248     4.0   US-II    1.1
 4     9     1      248     4.0   US-II    1.1
 6    12     0      248     4.0   US-II    1.1
 6    13     1      248     4.0   US-II    1.1


========================= Memory =========================

                                              Intrlv.  Intrlv.
Brd   Bank   MB    Status   Condition  Speed   Factor   With
 0     0     256   Active      OK       60ns    4-way     A
 0     1     256   Active      OK       60ns    2-way     B
 2     0     256   Active      OK       60ns    4-way     A
 4     0     256   Active      OK       60ns    4-way     A
 6     0     256   Active      OK       60ns    4-way     A
 6     1     256   Active      OK       60ns    2-way     B

========================= IO Cards =========================

     Bus   Freq
Brd  Type  MHz   Slot  Name                              Model
---  ----  ----  ----  --------------------------------  ----------------------
 1   SBus   25     1   cgsix                             SUNW,501-1717
 1   SBus   25     3   SUNW,hme
 1   SBus   25     3   SUNW,fas/sd (block)
 1   SBus   25    13   SUNW,soc/SUNW,pln                 501-2069
 3   SBus   25     2   SUNW,soc/SUNW,pln                 501-2069
 3   SBus   25     3   SUNW,hme
 3   SBus   25     3   SUNW,fas/sd (block)
 3   SBus   25    13   SUNW,soc/SUNW,pln                 501-2069
........


Lastly, truss(1M) is another often used tool; it traces and displays all the system calls made by a process. There have been countless times where we used truss to debug generic problems. For example, a process may fail because the user doesn't have write permission in a temp directory. In this case, the open system call would be shown and would have failed with the resulting errno(3C). The features of truss(1M) in Solaris 7 have been expanded to include library calls, so you can follow the calling sequence through libc, libthread, etc.


I/O..., I/O..., It's Off to the Disk We Go...
  • SAS System I/O Requirements
  • General Guidelines
  • RAID issues - RAID 5, Striping/Mirroring, Stripe interlace
  • SWAP Configuration
  • SAS BUFSIZE/BUFNO / Solaris Buffer Cache
  • Data Set Compression
  • File System
    • UFS Tuning
    • Direct I/O
    • Other Kernel Parameters
    • VERITAS VxFS vs UFS

SAS System I/O Requirements

In the area of SAS applications and I/O, there are only a few very simple concepts to keep in mind. Conceptually, there are 2 distinct I/O areas of concern to SAS applications. In practice, this can fan out to potentially many different directories.

There are SAS data areas and WORK or scratch areas. Data areas are specified programmatically by the SAS System libname directive. Thus, they could be any writable directory on Unix platforms, including NFS (but hopefully not). Any data in the WORK area is removed when the SAS application terminates properly. Note that jobs that terminate abnormally could leave behind large, unusable work areas.

It should be obvious that I/O configurations are highly dependent on site specific, user specific and application specific factors. Thus, there is no one set of I/O configuration guidelines that fits all (or even most) situations. However, here are a few general considerations:

General Guidelines

  • Spread the load out over as many spindles, controllers  and paths back to the host as possible.
  • Separate your data and WORK areas. The default WORK area is /usr/tmp or /tmp and should be modified to point to a separate logical volume which has been configured for heavy, write intensive activity. If all system users are using the same WORK area, this could easily become a system bottleneck. While it is conceptually very easy to set up separate WORK areas which may correspond to different I/O channels, educating users to take advantage of them or change their configurations may prove more difficult than one might think. If the plan is to use 1 large WORK area for all users, maximize the I/O paths. This may involve striping across different storage cabinets if necessary.
    Since the WORK area is temporary and typically doesn't need to be backed up, configure it accordingly. When sizing the WORK volumes, there are no set rules. Size is specifically application dependent. Additionally, users can code their application to make more (or less) use of the SAS WORK area. Many applications (e.g., SORT) will need to make complete copies of their input data sets before deleting the original. This temporary copy is usually made in the "working" area and not necessarily in the SAS WORK directory.
  • Since NFS mount points are transparent, users may not realize that access of large data files is occurring at network speeds causing serious application performance degradation as well as severe network congestion. Avoid NFS access of large data sets if performance is a primary concern.
  • Don't overlook the network. Numerous SAS/CONNECT sessions requiring large data transfers can cause network and I/O bottlenecks system wide. Look at the jobs and the amount of data being transferred. Can the data be subset before being brought over the wire? Do more network interfaces need to be added to the system to allow for more network bandwidth? If using 100 Mb Ethernet, a switched hub is essential.

RAID issues - RAID 5, Striping/Mirroring, Stripe interlace

Areas that are write intensive (typically WORK) should avoid software RAID 5 (parity) and should instead be configured with RAID 0+1 (striped mirrors) or RAID 0 (striped). Due to the nature of RAID 5, a logical write requires on the order of 4-5 I/Os. Additionally, the parity calculations require a non-trivial amount of CPU time. As a test, using a simple SAS data step, we copied our household data set (~1 GB) to a striped partition (RAID 0) and then to a RAID 5 partition (using software RAID).

RAID 0 vs. RAID 5 data set copy

          Real      User    System
------   -------   ------   -------
RAID 0    2:33.7     57.1    1:02.6
RAID 5   11:54.3     56.4    1:47.9


Hardware RAID 5 platforms such as the Sun StorEdge A3500 and the A1000 can provide RAID 5 performance comparable to that of RAID 0. RAID 0 offers no redundancy and thus should obviously not be used in 24x7 mission critical environments.

Logical Volume Stripe Unit - Most workgroup and enterprise server configurations use a logical volume manager such as the VERITAS Volume Manager to configure and manage the storage platforms. In a striped configuration, the stripe unit size (also known as the interlace, chunk size or segment size) times the number of columns or disks equals the stripe width. What stripe unit should be chosen? In the absence of knowing anything about the application, we simply suggest choosing 64K. With large blocked sequential I/O, it's best to choose a moderate size stripe unit. This allows more I/Os to be spread out across all the spindles of the stripe and the read-ahead buffers to be handled more efficiently.

From observations using truss(1M), you might notice that a SAS application will typically issue relatively small read/write requests, usually on the order of 8K or 16K. Thus, 64K or 128K should be a good target stripe unit. Classical SAS applications often depend on sequential I/O, i.e., performing multiple iterations, record by record, over an entire data set. However, applications that make extensive use of indexed data sets will usually not be sequential in nature at all. Even in cases where random I/O may dominate (e.g., using indexes with high cardinality in the data which return small result sets), a stripe unit of 64K is still generally a reasonable choice.
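
The relationship between stripe unit, stripe width and request size can be worked through with a short Python sketch. The number of columns used here (6) is purely a hypothetical value for illustration, as are the function names; the 8K and 16K request sizes and the 64K and 128K stripe units come from the discussion above.

```python
def stripe_width_kb(stripe_unit_kb, columns):
    """Stripe width = stripe unit x number of columns (disks)."""
    return stripe_unit_kb * columns

def requests_per_unit(stripe_unit_kb, request_kb):
    """How many back-to-back sequential requests fit in one stripe unit."""
    return stripe_unit_kb // request_kb

columns = 6                            # hypothetical number of disks in the stripe
for unit_kb in (64, 128):
    width = stripe_width_kb(unit_kb, columns)
    print(f"unit={unit_kb}K width={width}K "
          f"8K requests per unit={requests_per_unit(unit_kb, 8)} "
          f"16K requests per unit={requests_per_unit(unit_kb, 16)}")
```

With a 64K unit, eight consecutive 8K requests (or four 16K requests) land on one disk before the stream moves to the next column, so a sequential scan rotates through every spindle once per stripe width of data read.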

SWAP Configuration

What about SWAPFS as a WORK area? There are times when this can be a performance win but in general, this should be avoided if working with large data sets. SWAPFS is a memory based file system which is backed by the SWAP partition. If you write large data files to this partition, you could easily induce paging.

Configure plenty of SWAP if using PROCs which have large virtual memory requirements. Many SAS procedures make extensive use of the mmap(2) system call. For every mmap'ed memory segment, a corresponding amount of SWAP must be reserved even if it is not used; the pages are not allocated until needed. A number of PROCs will return "insufficient memory" errors if the SWAP reservation cannot be made. Thus you could be the only user on a quiet system configured with 8 GB of RAM and still get an insufficient memory error if you do not have a large enough SWAP area.

Some PROCs may produce unexpected results. For instance, in experimenting with different SORTSIZE values, we discovered that certain runs were using a variable amount of memory even though we knew exactly how much should have been used. We realized that our SWAP area was much too small to back the mmap requests, so the outcome depended on what else was going on system wide. After allocating more SWAP area, the programs utilized the expected amount of memory.

You can check the amount of SWAP configured with the swap(1M) command (swap -l) and the amount reserved with swap -s. Note that the free space on the SWAP device does not equal the free SWAP available. This is because of the swapfs file system, /tmp, which is a combination of physical SWAP space and free memory. In the example below, swap -l reports 1002016 free 512-byte blocks (roughly 490 MB) on the SWAP device, while swap -s reports 1.7 GB available. swap -l gives the actual amount available on disk.

     $ swap -l
      swapfile             dev  swaplo blocks   free 
      /dev/dsk/c1t10d0s1  32,73     16 1048784 1002016

     $ swap -s
     total: 33800k bytes allocated + 5488k reserved = 39288k used, 1731016k available

If users specify large MEMSIZE requests, you must have enough SWAP area set aside to cover the cumulative total. If you are in the unfortunate situation where you do have paging to your SWAP area, configure your SWAP devices similarly to your data areas and optimize the I/O: rather than 3 or 4 single SWAP devices, consider a RAID 0 (striping) configuration. If you have a 24x7 environment, then RAID 0 will obviously not satisfy availability requirements. You can dynamically create and add SWAP areas with the mkfile(1M) and swap -a commands. Obviously, you want to avoid doing this on a disk partition that is under heavy load. As discussed in the section on memory, you can monitor the SWAP device as a simple way to check for paging.
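
The cumulative MEMSIZE check can be sketched in a few lines of Python. This is only an illustration: the per-job MEMSIZE (256 MB) and the number of concurrent jobs (8) are hypothetical values, the helper name available_swap_kb is made up for this example, and the calculation takes the worst case in which every job reserves its full MEMSIZE at the same time. The swap -s line is the one shown above.

```python
import re

def available_swap_kb(swap_s_line):
    """Extract the 'available' figure (in KB) from a swap -s summary line."""
    match = re.search(r"(\d+)k\s+available", swap_s_line)
    if match is None:
        raise ValueError("unrecognized swap -s output")
    return int(match.group(1))

line = ("total: 33800k bytes allocated + 5488k reserved = 39288k used, "
        "1731016k available")
available_kb = available_swap_kb(line)

memsize_mb = 256                       # hypothetical per-job MEMSIZE
concurrent_jobs = 8                    # hypothetical number of simultaneous jobs
needed_kb = memsize_mb * 1024 * concurrent_jobs   # worst-case total reservation

print(f"available={available_kb}k needed={needed_kb}k fits={needed_kb <= available_kb}")
```

Under these assumed values the worst-case reservation (2 GB) exceeds the 1.7 GB that swap -s reports as available, which is the kind of situation in which "insufficient memory" errors can appear on an otherwise quiet system.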

SAS BUFSIZE/BUFNO Options / Solaris Buffer Cache

If you truss(1M) your SAS processes, you may find that most of the read/write(2) system calls use a fairly small buffer size, probably 8k or 16k.  You may find that performance of certain jobs is increased if you increase the BUFSIZE option in the SAS System for a specific data set. Note, this BUFSIZE  setting is specific to a SAS data set and its value is stored in the metadata header. Thus, the only way to change the buffer size would be to use a data step to copy the data set and specify the new BUFSIZE:

                data new (BUFSIZE=64k);
                        set libname.old;
                run;

You can query the BUFSIZE of the data set by issuing a PROC CONTENTS:

                proc contents data=libname.dataset;     
                run;

In our experimentation, we found only a small performance benefit from increasing BUFSIZE. However, one of our large enterprise customers in the medical insurance industry found a significant improvement in performance when they aligned the SAS BUFSIZE with the data volume stripe unit. We need to mention that this testing and running of jobs was done in a very controlled environment. If you have a well characterized job or set of jobs and you have the ability to regulate the runs, you might find that changing BUFSIZE produces good performance gains.


The Solaris file buffer cache lessens the need to modify BUFSIZE. One area where changing BUFSIZE could potentially be of benefit is if truss(1M) were to show that a file was opened synchronously (O_SYNC) and sync(2) was being issued to flush the contents of the output buffers. Note that truss(1M) with Solaris 7 can show library calls, so this could show up as an fsync(3C). Be careful in experimenting with this parameter, as a number of PROCs make algorithmic decisions based on the value of BUFSIZE, and a change could adversely affect performance. Although intuitively you may want the SAS System to take advantage of the larger I/O buffer capabilities that are more typical of today's enterprise configurations, it is probably, on average, not wise to change BUFSIZE as a blanket policy. For large data sets, it is a fairly expensive experiment since multiple copies of the data set are required during the testing phase.

To show the effects of reading a data set from the Solaris file buffer cache we show timings from 2 consecutive runs where we read our household data set (but do no writes):

                data _null_;   
                        set gold.hrecs(keep=state);   
                        /* no state codes of '00' */
                        where state = '00';    
                run;

        
Comparison of data set read from disk and buffer cache

            Real       User    System
--------   -------   ------   -------
1st time   2:05.83     5.02     21.98
2nd time     19.26     4.88     10.98


Obviously, there is a big advantage to being able to access files from the Solaris buffer cache. We can use the Memtool command prtmem to show the distribution of memory. Before the data set read above, prtmem would show something similar to:

$ prtmem

Total memory:            1468 Megabytes
Kernel Memory:            125 Megabytes
Application memory:        13 Megabytes
Executable  memory:        32 Megabytes
Buffercache memory:       184 Megabytes
Free memory:             1112 Megabytes

Note that 180+ MB of memory is being used for the buffer cache and 1.1 GB is free. After the run, prtmem shows:

$ prtmem
Total memory:            1468 Megabytes
Kernel Memory:            125 Megabytes
Application memory:        13 Megabytes
Executable  memory:        32 Megabytes
Buffercache memory:      1272 Megabytes
Free memory:               24 Megabytes

After the data set read, free memory went to 24 MB and buffer cache memory was 1.2 GB. This is why the free memory shown in vmstat(1) can sometimes be misleading. The same is true for the page-in and page-out columns. There is plenty of memory available, and as long as no one is requesting it, memory will be used for the buffer cache. This is directly relevant to the memory discussions as well as the priority paging section.
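
The two prtmem snapshots above can be compared with a few lines of Python. This is a small sketch using only the figures shown; treating buffer cache plus free memory as the amount an application could obtain is a simplifying assumption made here to illustrate why the free figure alone is misleading.

```python
before = {"buffercache": 184, "free": 1112}   # MB, from the first prtmem run
after = {"buffercache": 1272, "free": 24}     # MB, from the second prtmem run

cache_growth = after["buffercache"] - before["buffercache"]
free_drop = before["free"] - after["free"]

# Simplifying assumption: cached file pages can be given back on demand
obtainable = after["buffercache"] + after["free"]

print(f"cache grew {cache_growth} MB, free fell {free_drop} MB, "
      f"cache + free = {obtainable} MB")
```

The buffer cache grows by exactly the amount that free memory falls, and that amount is close to the 1.1 GB size of the household data set: the memory has not been consumed by an application, it is simply holding the file.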

If the size of the data sets is larger than the file cache, it is possible that the buffer cache could get in the way. See the section on Direct I/O below.

In our experience, modifying the SAS System BUFNO option made no difference in performance, nor did it seem to make a difference in the amount of memory used as reported by fullstimer. BUFSIZE and BUFNO make more of a difference in legacy mainframe environments.

Data Set Compression
The SAS System has an option to compress data sets at creation, either by specifying (COMPRESS=yes) in the data step or by using OPTIONS COMPRESS=YES;. Compression can typically reduce the space that a data set occupies by a factor of about 2. However, there is no such thing as a free lunch. Saving disk space comes at the expense of increased CPU time to compress the data as it is written and/or decompress the data as it is read. If your system is CPU bound, then compressed data sets should probably not be used. However, if your applications have a dominant I/O component, you may want to consider the compress option. We have a documented case of a ~15% performance increase using compressed data sets versus uncompressed data sets. This was with ~2 GB data sets on a system with 336 MHz CPUs. An indication that an application has a large I/O component is either a large system time as reported by FULLSTIMER or a large differential between Real time and (User + System) time. However, the latter could also be attributed to excessive paging.

File System

  • UFS tuning
    In general, there is no tuning to be done. If a large data volume is created to hold large files (as opposed to many small ones), consider using the option to decrease the number of inodes. The default is to create 1 inode per 2K of file space. Changing this to 1 inode per 32K not only provides more space for the file area but also decreased the fsck(1M) time from ~1:25 to ~33 sec for a 20 GB file system. Also, for large volumes, you might consider decreasing the free space percentage from 10% to perhaps 2-3%. An example of a newfs(1M) command that applies these 2 suggestions:
         # /usr/sbin/newfs -v -i 32k -m 3 <device name>
    
    
  • Direct I/O
    When direct I/O is enabled via a mount(1M) option, data is transferred directly into the user buffer space and the transfer to the kernel buffer cache is avoided. If doing large amounts of sequential I/O, this could be helpful. However, in our experimentation, we saw the opposite effect. Doing a simple data set copy, we saw the copy go from ~35 seconds to over 3 minutes when the file system was mounted with the forcedirectio option (mount -o forcedirectio). Presumably, the Solaris buffer cache buffered data and was able to service the read system calls more efficiently. In our case, the system was not under the kind of heavy load where direct I/O could make a difference. Thus, there are scenarios where direct I/O would result in a large performance gain, but don't try it except in a careful and controlled setting. The VERITAS file system (VxFS) has a feature called discovered direct I/O. VERITAS systematically analyzes the I/O patterns and will automatically turn on direct I/O for large sequential reads. By default, this option probably won't get invoked since, as mentioned, the SAS System typically issues relatively small read/write requests of 8K or 16K.

  • Other kernel parameters
    In general, there is no Solaris kernel tuning that needs to be done. Unlike DBMS configurations or other applications, where you might need to increase shared memory parameters or other kernel parameters, this is not needed in typical SAS applications. Historically, many people thought it necessary to tune the Solaris buffer cache. On other Unix systems, this was done by changing a kernel parameter, often bufhwm. In Solaris, this parameter is used exclusively for UFS file system metadata and has little to no effect on the memory/file I/O page cache.

    For the most part, the Solaris buffer cache is self-tuning.

  • VERITAS File System (VxFS)
    The VERITAS file system, an unbundled product from VERITAS Corporation, is highly recommended in large scale enterprise application environments. A few of the features it offers:
    • File system creation time: For a 20 GB file system, the time to create a UFS file system was on the order of 43 minutes using the default parameters. For a VxFS file system of the same size, it took about 10 seconds. If 24x7 availability is not a requirement, it may be possible to use a simple RAID 0 (striped) configuration for the SAS WORK area. Should a disk go bad, the time to remake the file system is minimal. Of course, the jobs in progress at the time would have to be re-started.
    • Extent based allocation: In an environment where there is mostly sequential processing of a relatively small number of files, extent based allocation can be a big performance win. This feature allocates space for the file in large sequential blocks. Thus, in an optimally striped volume configuration, sequential file I/O can be serviced in a more parallel (by virtue of multiple spindles servicing the I/O) and efficient manner.
    • Ability to tune and dynamically re-configure file system parameters: VxFS has options to do more aggressive read-ahead buffering. If you have large amounts of sequential processing over a number of different files, changing this parameter could help. To do this:
          # vxtunefs -o read_pref_io=256k <mount point>
          # vxtunefs -o read_nstream=4 <mount point>
    • VxFS used in conjunction with the VERITAS volume manager can query the underlying volume settings and determine optimal file system parameters such as {read/write}_pref_io and {read/write}_nstream.
    • Performance: In general, VxFS will usually outperform UFS. In simple data set copies, performance was about the same, but in a set of 5 I/O intensive applications doing various tasks (sorting, index creation, MDDBs, etc), VxFS outperformed UFS by anywhere from 10% to 40%. However, the improvement in real time did come with a price in that more user CPU was accumulated during the VxFS tests. VERITAS publishes an excellent white paper detailing the conditions where VxFS is a real win and where it might not be so clear.

Summary

The classical SAS program typically utilizes sequential processing on large data sets. Thus, configure your data areas for blocked sequential I/O and your WORK area for write intensive applications.  Since the WORK area can be configured as a system wide resource, ensure that the I/O channels back to the host are sufficiently wide.  Sometimes adding CPU resources could cause an overall system performance degradation in that they create an increased I/O burden that pushes the system over a previously optimal I/O configuration.  When adding CPUs, ensure that the I/O subsystem can handle the increased requirements.

The data volume layout can make a substantial difference in performance. In our tests, one storage platform had 4 times the bandwidth to the host and disks with almost 2 times the rotational speed, yet I/O intensive tests took longer on that faster platform when its data was not striped. In other words, a significantly slower storage platform outperformed a faster storage platform when the underlying volume on the slower storage was configured more optimally.

Although the SAS System tends to issue relatively small read/write requests, choose a moderate size for the stripe unit. Using 64K is probably a good general rule of thumb. Don't change the SAS System BUFSIZE and BUFNO options unless you can do so in a controlled environment. Consider compressing SAS data sets if applications are dominated by a large I/O component. Note that compression comes at the cost of additional CPU resources. I/O performance can potentially be gained by using the VERITAS File System (VxFS). VxFS provides other features that make it very suitable for large scale enterprise applications.


CPU - the Scarecrow Needed a Brain, Does your System need one?
  • Does clock speed make a difference?
  • CPU Contention

A CPU is one of potentially many "brains" of the system. Similar to the scarecrow feeling the need for a "brain" in The Wizard of Oz, we look at how systems could be constrained at the CPU level.

Does clock speed make a difference?

Here we look at a very commonly used SAS procedure which is particularly CPU intensive. Logistic regression is used to find patterns among the data and is often used in data mining and decision support applications.

A forward, stepwise and backward logistic regression was run on a data set which had 200,000 observations, ~500 variables, and a record length of about 1500 bytes. The total size of the data set was ~320 MB.

We ran the tests on different Ultra Enterprise Servers at clock speeds of 167 MHz, 250 MHz, 300 MHz, 336 MHz. The external cache varied on these systems from .5 MB to 4 MB which might slightly affect the results but was probably insignificant relative to the total processing time.

Regression Test Results Using Different Processor Speeds
(Hours:Minutes:Seconds)

            167 MHz    250 MHz    300 MHz    336 MHz
forward     9:43:32    7:12:24    6:28:33    5:24:37
stepwise   10:03:59    7:28:22    7:08:08    5:43:15
backward   40:54:58   30:31:33   26:50:03   22:57:47


Note the near linear improvement in run times: as the clock speed doubled (167 MHz to 336 MHz), the jobs ran roughly 1.8 times faster. The forward regression went from 9 hours, 43 minutes to 5 hours, 24 minutes. This speaks extremely well for the UltraSPARC processors. At the time of this writing, the fastest processors in Sun systems clock in at 400 MHz.
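
As a quick check on the regression table, the following minimal sketch compares the observed speedup between the 167 MHz and 336 MHz runs with the ratio of the clock speeds. It is written in Python purely for illustration; the timings are copied from the table, and the helper names to_seconds and speedup are our own, not part of any SAS or Solaris tool.

```python
# Timings (H:MM:SS) copied from the regression table: (167 MHz, 336 MHz).
TIMES = {
    "forward": ("9:43:32", "5:24:37"),
    "stepwise": ("10:03:59", "5:43:15"),
    "backward": ("40:54:58", "22:57:47"),
}


def to_seconds(hms):
    """Convert an H:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def speedup(slow, fast):
    """Return how many times faster the 'fast' run was than the 'slow' run."""
    return to_seconds(slow) / to_seconds(fast)


clock_ratio = 336 / 167  # ratio of the two clock speeds
for name, (slow, fast) in TIMES.items():
    print(f"{name:9s} {speedup(slow, fast):.2f}x faster for a {clock_ratio:.2f}x clock")
```

Each of the three runs comes out between about 1.76 and 1.80 times faster for a clock ratio of about 2.01, which is what "near linear" means here.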

CPU contention

Let's examine the timesharing effects on CPU bound applications. How are application times affected when there are multiple jobs contending for a single CPU?

We used PROC GLM (general linear model), which took about 1 hour to run as a single job and used about 270 MB of memory.

         real        user       system
        1:07:01     1:03:50       29.184 seconds


We ran 4 of these jobs with 4 CPUs enabled. Note that since each job required ~270 MB of RAM and the system had 1.5 GB RAM, the sum total memory required easily fit within the physical RAM configuration.

Since the system had 8 processors, 4 of them were turned off for the test. Not that administrators would want to turn off processors. Perhaps it would be a good April Fools joke for an administrator to turn off 63 processors of a fully configured Sun Enterprise 10000 (aka Starfire).
To turn off processors, use the psradm(1M) command; this command must be run as root. Use the -f option to turn off the processors and the -n option to turn them back on.

# /usr/sbin/psradm -f 8 9 12 13
# /usr/sbin/psrinfo

0       on-line   since 01/22/99 10:01:50
1       on-line   since 01/22/99 10:02:31
4       on-line   since 01/26/99 09:05:20
5       on-line   since 01/26/99 09:05:20
8       off-line  since 01/25/99 17:00:26
9       off-line  since 01/25/99 17:00:26
12      off-line  since 01/25/99 17:00:26
13      off-line  since 01/25/99 17:00:26

We started the 4 jobs simultaneously and got the following results:

4 Jobs on 4 CPUs

        Job 1     Job 2     Job 3     Job 4
Real    1:22:37   1:22:34   1:22:43   1:22:34
User    1:20:15   1:20:11   1:20:12   1:20:06

Note that the jobs all ran equally well, with little performance degradation compared to when only 1 job was run. There are 4 jobs running and 4 CPUs enabled. Don't forget that the Solaris kernel itself needs CPU resources, not to mention the many other system applications that are running. Thus with no memory contention and no CPU contention, we get comparable performance.

To have been a fairer test, we should have allowed Solaris to have its own processor and used a feature of Solaris 2.6 called processor sets to create dedicated 4 CPU and 2 CPU sets. This is done by using /usr/sbin/psrset(1M) and /usr/sbin/pbind(1M) to remove the effects of unrelated activity.

Now, we turn off 2 of the CPUs so that only 2 CPUs are active:

# /usr/sbin/psradm -f 4 5


To confirm that only 2 are active:
# /usr/sbin/psrinfo
0       on-line   since 01/22/99 10:01:50
1       on-line   since 01/22/99 10:02:31
4       off-line  since 01/25/99 22:26:17
5       off-line  since 01/25/99 22:26:17
8       off-line  since 01/25/99 17:00:26
9       off-line  since 01/25/99 17:00:26
12      off-line  since 01/25/99 17:00:26
13      off-line  since 01/25/99 17:00:26

As expected, the job times roughly doubled. If 1 job/CPU ran in ~1.3 hours, it would be expected that 2 jobs/CPU would take ~2.5 hours.

4 Jobs on 2 CPUs

        Job 1     Job 2     Job 3     Job 4
Real    2:33:04   2:42:39   2:37:05   2:42:58
User    1:20:15   1:20:11   1:20:12   1:20:06

Note that user times are nearly identical to the single job/CPU times but that the wall clock time roughly doubled. This difference in time is attributable to the fact that the user process was idle waiting for CPU time slices.
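
To put a number on the contention effect, here is a small illustrative sketch in Python. The real (wall clock) times are copied from the two job tables; the helper names are hypothetical and exist only for this example.

```python
# Real (wall clock) times copied from the "4 Jobs on 4 CPUs" and
# "4 Jobs on 2 CPUs" tables.
REAL_4_CPUS = ["1:22:37", "1:22:34", "1:22:43", "1:22:34"]
REAL_2_CPUS = ["2:33:04", "2:42:39", "2:37:05", "2:42:58"]


def to_seconds(hms):
    """Convert an H:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def mean_seconds(times):
    """Average run time, in seconds, over a list of H:MM:SS strings."""
    return sum(to_seconds(t) for t in times) / len(times)


def slowdown(contended, baseline):
    """Ratio of the mean contended run time to the mean baseline run time."""
    return mean_seconds(contended) / mean_seconds(baseline)


print(f"2 jobs per CPU ran {slowdown(REAL_2_CPUS, REAL_4_CPUS):.2f}x longer")
```

With these figures the mean wall clock time grows by a factor of about 1.92 when the CPU count is halved, close to the factor of 2 that simple timesharing would predict.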

Both examples speak well for the linearity and predictability of CPU intensive jobs on the UltraSPARC architecture.

If there is a valid reason to give priority to a group of "power users" or to end of quarter/month/year processes, consider using processor sets. Use /usr/sbin/psrset(1M) and psradm(1M) to create and administer these.

Summary

We have shown the effects of raw CPU performance and of simple CPU contention. One area that is often overlooked when creating budgets is the cost of the users' time spent waiting around for their applications to run. It is always extremely difficult to quantify the cost or value of employees' time and productivity; however, this should not be overlooked. Another area of consideration is production jobs which may run overnight or toward scheduled peak times, where the window of time for completion gets smaller and smaller as data set sizes grow.

If systems are upgraded by adding more CPUs, ensure that there are sufficient I/O channels to handle the potential increase in data traffic.


Cached Out at the Memory Bank?
  • Baseline Memory Recommendations
  • Are we paging yet?
  • Swap - A Necessary Evil, Gotta' have it, even if you don't use it
  • Increasing MEMSIZE/SORTSIZE - when does it help, when does it not?
  • Priority Paging

Baseline Memory Recommendations

From the hardware/OS point of view, the area of memory is probably the single most important area responsible for performance increases or decreases as well as for predictable performance under load.

You can think of memory as money in the bank. Consider a family of 5, all of whom make deposits as well as withdrawals. Given that overall monthly cash in versus cash out is positive, there could still be many times where there might not be enough cash at any given moment to cover requested withdrawals. The issue of memory is the same. Continuing the bank account analogy, where there may be automatic monthly debits to the account, administrators and users must also take into account other system requirements when gathering memory usage requirements. You can have sufficient CPU cycles and an optimally configured I/O subsystem, but without sufficient memory resources, performance will be dismal.

Assuming sufficient CPU, I/O and network resources, the bottom line is that you want the sum total of requested memory plus system requirements to fit within the confines of your physical RAM configuration. This is absolutely crucial for predictable performance under load.

As a ballpark starting point, we generally recommend .5 GB of memory per CPU. On a more granular level, allocate 32-64 MB for base Solaris, 32 MB for the windowing system and 32 MB per SAS session. We will discuss how any individual SAS session could very easily need to be increased to .5 GB or even over 1 GB. To accumulate SAS session memory requirements on a system level, we must know:

  • How many users, or more specifically SAS processes, are running concurrently?
  • For each SAS process, what is the memory requirement as reported by FULLSTIMER?

This sounds as if it should be fairly simple, but in practice, it is practically impossible to find this number.
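
The bookkeeping described above can be sketched in a few lines of Python. This is only a rough illustration: the base figures are the ballpark numbers quoted earlier (we assume the upper end of the 32-64 MB range for Solaris), the function names are hypothetical, and the per-session values would in practice come from FULLSTIMER output.

```python
BASE_SOLARIS_MB = 64      # upper end of the 32-64 MB ballpark for base Solaris
WINDOWING_MB = 32         # windowing system
DEFAULT_SESSION_MB = 32   # ballpark for a default SAS session


def total_required_mb(session_mb):
    """Base system memory plus the memory of every concurrent SAS session."""
    return BASE_SOLARIS_MB + WINDOWING_MB + sum(session_mb)


def fits_in_ram(session_mb, ram_mb):
    """True when the summed requirement fits in physical RAM."""
    return total_required_mb(session_mb) <= ram_mb


# Hypothetical workload: 10 default sessions plus one 1200 MB job
# on a system with 1.5 GB (1536 MB) of RAM.
sessions = [DEFAULT_SESSION_MB] * 10 + [1200]
print(total_required_mb(sessions), "MB needed; fits:", fits_in_ram(sessions, 1536))
```

In this hypothetical case the total is 1616 MB, so a single large job is enough to push an otherwise lightly loaded 1.5 GB system into paging.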

Are we paging yet?

The Solaris memory system provides virtual memory management on a demand paged basis. The swap device is used as a physical device for backing store for this virtual address space.

There is an excellent white paper detailing the Solaris Memory System written by Richard McDougall and available from ftp:playground.sun.com in the pub/memtool directory.

In many cases, the SAS software platform does virtual memory management on behalf of the SAS process. Memory is typically mmap(2)ed up to the limit of MEMSIZE if the specific application requires it or believes it can take advantage of it.

While Solaris does an excellent job of paging, allowing the SAS virtual memory management to page on top of Solaris is completely detrimental to application performance.

The key to predictable performance is to allow each SAS user a portion of memory such that the memory requirements for all concurrent SAS users can be resident in memory.

Let's examine paging from both the system and the SAS user's perspective.

From a system perspective: Here are 2 "quick & dirty" ways to determine if your system is experiencing excessive paging using tools bundled with Solaris:

  • vmstat - consistently large values in the "sr" (scan rate) column could indicate a memory shortage; specifically, scan rates of more than 200-300 pages per second for extended periods of time.
  • look for activity to the swap device
    # simple shell script to run iostat on the swap device
    $ cat checkswap
    echo " extended device statistics"
    echo " r/s w/s  kr/s kw/s wait actv wsvc_t asvc_t %w %b device"
    iostat -nxc 3 7 | grep c0t5d4

    Note: the disk name substring (c0t5d4) is hardcoded above. It also assumes a single swap device in a complete partition.
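
The scan rate check can also be automated. The sketch below, in Python, is a minimal illustration that assumes the vmstat column layout shown in this paper (where "sr" is the 12th field); the threshold is the lower end of the 200-300 pages per second guideline, the sample lines are taken from the vmstat output captured during the memory-constrained test described later, and the function names are our own.

```python
SCAN_RATE_LIMIT = 200  # pages/second; lower end of the 200-300 guideline

# Sample vmstat lines (header plus three data lines) from a system under
# memory pressure.
SAMPLE = [
    " r b w   swap  free  re  mf pi   po     fr de sr s6 sd sd sd in  sy  cs us sy id",
    " 0 0 17 14696  7504   0  25 353  41    469 0   99 0  0 0  0 140 163 114 2  0 98",
    " 0 1 49 398280 23816  9 168 1438 3224 3465 0 4116 0  0 0  0 384 642 410 2  2 96",
    " 0 1 49 398192 24272  0 156 1286 2896 3112 0 4081 0  0 0  0 374 539 406 0  2 98",
]


def scan_rates(vmstat_lines):
    """Extract the 'sr' column from vmstat data lines, skipping headers."""
    rates = []
    for line in vmstat_lines:
        fields = line.split()
        # Data lines are all numeric; 'sr' is assumed to be the 12th field.
        if len(fields) >= 12 and all(f.isdigit() for f in fields):
            rates.append(int(fields[11]))
    return rates


def sustained_scanning(rates, limit=SCAN_RATE_LIMIT):
    """True when every sample after the first exceeds the limit."""
    samples = rates[1:]  # the first vmstat line is a since-boot summary
    return bool(samples) and all(rate > limit for rate in samples)


print(scan_rates(SAMPLE), sustained_scanning(scan_rates(SAMPLE)))
```

A real monitor would want many more samples than this before concluding that the scanning is sustained, and the field position should be verified against the vmstat header on the system in question.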

Users can check the memory usage of their own applications with a new utility in Solaris 2.6, pmap(1). An excerpt of the output shows:

$ ps -e | grep sas
29859 pts/0 0:01 sas
$ /usr/proc/bin/pmap -x 29859

29859:  /u1/sas612/sas
Address   Kbytes Resident Shared Private Permissions       Mapped File
00002000       8       8       -       8 read               [ anon ]
00010000    2048    1792       8    1784 read/exec         sas
0021E000      56      56       -      56 read/write/exec   sas
0022C000    1272    1256       -    1256 read/write/exec    [ heap ]
EE210000     256      32       -      32 read/write/exec    [ anon ]
EE280000    2144    1200       8    1192 read/exec         sasxxxpm
EE4A6000     336     336       -     336 read/write/exec   sasxxxpm
EE500000     432     408     400       8 read/exec         libX11.so.4
EE57A000      24      24       8      16 read/write/exec   libX11.so.4
EE600000    2912    2304       8    2296 read/exec         sasmotif
......
EF7D0000     112     112     112       - read/exec         ld.so.1
EF7FA000       8       8       8       - read/write/exec   ld.so.1
EFFF6000      40      40       -      40 read/write/exec    [ stack ]
--------  ------  ------  ------  ------
total Kb   13192   10184    1960    8224

This is a SAS session sitting in an idle display mode. Our resident size for this process is 10+MB and this is about the overhead for each SAS session.

Administrators can use tools bundled with Solaris such as sar(1), vmstat(1M), or even ps(1), which give you a snapshot of collective memory requirements. There are also freely available tools such as top, proctool and Memtool (references in the Appendix). Memtool is actually a collection of various tools described in the Solaris Memory white paper above. Solaris 7 also includes sdtprocess(1), which is similar to top.

Let's take a look at some of the tools.  Below we see a snapshot of proctool on a relatively quiet system.

[Image: proctool snapshot on a relatively quiet system]

Of particular interest, look at the SIZE and RSS columns. SIZE represents the amount of virtual address space and RSS represents the amount of memory which is actually resident. A simple way to think about whether you have enough memory is to sum the SIZE values, or virtual address space (the amount all processes want), and the RSS values (the amount that all processes have). If the 2 values are roughly equal (requested = actual), then you have enough memory. However, if the actual (RSS) is much less than the requested (SIZE), then paging is likely an issue. The sum of the RSS column should roughly equal the amount of physical RAM in your system.

The (Memtool) memps -s command can give you a more finely grained breakdown of the memory allocations. RSS, or resident memory, is actually composed of shared and private memory segments. Thus, a large shared value among like processes would not be consuming unique amounts of memory. For example, most code segments for a specific binary are loaded in a read-only shared mode, so multiple copies of that binary would only consume the "shared" amount of memory once. However, SAS processes which have large memory footprints typically consume mostly private data. Thus, in general, when monitoring SAS processes, resident size values should be fairly close to the size of the private segment. We show an excerpt from memps -s below:

$ memps -s
SunOS ctcsun2 5.6 Generic_105181-11 sun4u    02/05/99

14:56:48 WaitCPU WaitIO AvgLoad Users CUsers
14:56:48    0.00   0.00    0.02     6      6
14:56:48
   PID      Size Resident   Shared  Private  Process

  28161    5912k    4720k    1304k    3416k  ./proctool
  11517    5904k    3752k    1344k    2408k  dtgreet -display :0
   1123    5728k    3840k    1440k    2400k  /usr/dt/bin/dtlogin -daemon
  11503    5752k    3840k    1440k    2400k  /usr/dt/bin/dtlogin -daemon
  28162    2848k    2056k     728k    1328k  ./pmon
   1106    3016k    2256k    1072k    1184k  /usr/sbin/nsr/nsrindexd
    932    2792k    2224k    1192k    1032k  /usr/lib/autofs/automountd
   1023    2416k    2088k    1072k    1016k  /usr/sbin/nsr/nsrd
     11    2744k     776k      24k     752k  vxconfigd -m boot
    951    2248k    1912k    1216k     696k  /usr/sbin/nscd
    936    4128k    1864k    1176k     688k  /usr/sbin/syslogd
    994    2168k    1712k    1080k     632k  /usr/lib/sendmail -bd -q15m
   1107    2520k    1912k    1280k     632k  /usr/lib/nfs/mountd
  11502   16208k    1672k    1080k     592k  /usr/openwin/bin/Xsun :0 -nobanner
....

    

Although the listing above is not complete, the Size (Requested) column is fairly close to the RSS (Actual) and the totals fit well within our 1.5 GB RAM configuration.

The (Memtool) prtmem command is also very useful in showing the breakdown between application memory and Solaris buffer cache memory.

$ prtmem
Total memory:            1468 Megabytes
Kernel Memory:            117 Megabytes
Application memory:        14 Megabytes
Executable  memory:        41 Megabytes
Buffercache memory:         7 Megabytes
Free memory:             1287 Megabytes

Note- Buffer cache memory is low, free memory is high.


$ vmstat 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi  po fr de sr s6 sd sd sd   in   sy   cs us sy id
 0 0 17 13960  7344   0  25 351 40 468 0 98  0  0  0  0  140  162  114  2  0 98
 0 0 28 2718608 1318792 0 0  0  0  0   0  0  0  0  0  0  113   43   76  0  0 100
 0 0 28 2718608 1318792 0 0  0  0  0   0  0  0  0  0  0  115   45   77  0  0 100

vmstat(1M) pretty much agrees with prtmem.

There is no activity on the swap device:

$ ./checkswap
                       extended device statistics
  r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  0.2  0.1    1.5    3.4  0.0  0.0    1.2   18.9   0   0 c0t5d4
  0.0  0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t5d4
  0.0  0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t5d4
  0.0  0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t5d4

How Bad Does it Hurt?

We saw a quiet system. Now, we start 2 SAS jobs which each will eventually request 1.2 GB of memory. These 2 jobs by themselves will collectively request 2.4 GB of memory, and recall that we only have 1.5 GB of RAM. This particular proc (MDDB) requires a minimum MEMSIZE such that the N-way cube being loaded fits in virtual memory (1.2 GB in this case). If our system only had .5 GB of memory and were just running 1 job, you would have no choice but to allow Solaris to page on behalf of the SAS System.

[Image: proctool snapshot as the two SAS jobs start]

Here we see a snapshot from proctool as the processes are getting started. Each process is asking for ~612+ MB (SIZE column) but has been allocated only ~599 MB (RSS column).

$ prtmem

Total memory:            1468 Megabytes
Kernel Memory:            116 Megabytes
Application memory:      1268 Megabytes
Executable  memory:         5 Megabytes
Buffercache memory:        55 Megabytes
Free memory:               22 Megabytes

Now, prtmem is showing most available memory allocated to application memory. High allocation to buffer cache memory is not necessarily bad (and is in fact usually a good thing). However, note that the "free" figure in prtmem as well as in vmstat may mislead one into thinking that there is a memory shortage when in fact the memory is just allocated to the buffer cache. This is where prtmem is helpful, in that it can separate the memory allocated to the buffer cache from all other application or kernel usage.


As the job progresses, proctool below shows that each has requested ~1 GB but still has ~600 MB.

[Image: proctool snapshot as the jobs progress]

Another view from memps -m is showing basically the same thing:

$ memps -m
SunOS ctcsun2 5.6 Generic_105181-11 sun4u    01/29/99

15:51:54 WaitCPU WaitIO AvgLoad Users CUsers
15:51:54    0.00   2.00    1.04     6      6
15:51:54 
   PID      Size Resident   Shared  Private  Process

  20777 1161656k  677376k    1400k  675976k  /u1/sas612/sas -memsize 1200m -auto
  20781 1161664k  676776k    1392k  675384k  /u1/sas612/sas -memsize 1200m -auto
  20813   11240k    5640k    1400k    4240k  /u1/sas612/sas -config /u1/sas612/a
  20815    5912k    3008k    1232k    1776k  ./proctool
  11517    5904k    2152k    1224k     928k  dtgreet -display :0
  20816    2848k    1592k     688k     904k  ./pmon
.....
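
Following the requested-versus-resident rule of thumb described earlier, the gap can be expressed as a simple ratio. The Python sketch below is illustrative only: the Size and Resident values are copied from the memps -m listing, and resident_fraction is a hypothetical helper name, not a Memtool feature.

```python
def resident_fraction(size_kb, resident_kb):
    """Fraction of a process's requested address space that is resident."""
    return resident_kb / size_kb


# (Size, Resident) in KB for the two large SAS jobs, from the memps -m listing.
SAS_JOBS = {20777: (1161656, 677376), 20781: (1161664, 676776)}

for pid, (size_kb, resident_kb) in SAS_JOBS.items():
    print(f"pid {pid}: {resident_fraction(size_kb, resident_kb):.0%} resident")
```

Both jobs have only about 58% of their requested memory resident. A low ratio on its own is not proof of paging, since a process may not have touched all of its address space yet, but combined with swap device activity and high scan rates it points to a memory shortage.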

Our swap device is showing activity:

$ ./checkswap
                       extended device statistics
  r/s  w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  0.2  0.1    1.5    3.5  0.0  0.0    1.2   18.9   0   0 c0t5d4
  8.3  0.0   66.4    0.0  0.0  0.1    0.0   13.8   0  11 c0t5d4
 10.0  0.0   80.0    0.0  0.0  0.1    0.0   12.8   0  13 c0t5d4
  6.0  0.0   48.0    0.0  0.0  0.1    0.0   11.9   0   7 c0t5d4
 10.7  0.0   85.3    0.0  0.0  0.1    0.0   13.6   0  14 c0t5d4
 11.0  0.0   88.0    0.0  0.0  0.1    0.0   12.9   0  14 c0t5d4
  9.0  0.0   72.0    0.0  0.0  0.1    0.0   12.5   0  11 c0t5d4


And vmstat is also showing a memory shortage with very high scan rates:

$ vmstat 5
 procs     memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi   po     fr de sr s6 sd sd sd in  sy  cs us sy id
 0 0 17 14696  7504   0  25 353  41    469 0   99 0  0 0  0 140 163 114 2  0 98
 0 1 49 398280 23816  9 168 1438 3224 3465 0 4116 0  0 0  0 384 642 410 2  2 96
 0 1 49 398192 24272  0 156 1286 2896 3112 0 4081 0  0 0  0 374 539 406 0  2 98
 0 1 49 398200 24216  3 162 1283 3404 3563 0 4117 0  0 0  0 433 694 426 1  2 97
 0 1 49 398208 24088 44 157 1406 3644 3734 0 4056 0  0 0  0 407 601 418 1  3 96

We looked at proctool and the memtools during this process.   For completeness, we wanted to show similar output from top. An advantage of top is that it does not require the installation of a kernel driver;  however, it is usually installed as a setuid root program.  Thus, it can be run by users without requiring the root password. Top output as the same two previous MDDB memory intensive jobs get started:

$ top
last pid:  4862;  load averages:  1.84,  0.88,  0.36                   13:31:03
62 processes:  56 sleeping, 2 zombie, 1 stopped, 3 on cpu
CPU states: 75.3% idle, 20.3% user,  3.7% kernel,  0.7% iowait,  0.0% swap
Memory: 1536M real, 30M free, 1241M swap in use, 1258M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 4858 root       4   0    0  515M  502M cpu9    2:47 11.76% sas
 4859 root       3   0    0  515M  502M cpu4    2:46 11.75% sas
 4862 root       1   0    0 2240K 1680K cpu0    0:04  0.31% top
 1023 root       1  58    0 2416K 1144K sleep   3:49  0.00% nsrd
  989 root       1  58    0  904K  656K sleep   0:04  0.00% utmpd
  932 root       6  58    0 2848K 1936K sleep  20:17  0.00% automountd
  880 root       1  58    0 2072K  912K sleep   1:13  0.00% rpcbind
  951 root      11  50    0 2296K 1632K sleep   0:12  0.00% nscd
   11 root       4  58    0 2744K  504K sleep   0:12  0.00% vxconfigd
    1 root       1  58    0  824K  112K sleep   0:07  0.00% init
 1106 root       1  58    0 3016K 1056K sleep   0:04  0.00% nsrindexd
  945 root       1  48    0 1688K  440K sleep   0:03  0.00% cron
....

Similar to the 2nd proctool illustration above, the top command excerpt below demonstrates the same differential between SIZE (requested) and RES (actual or RSS), and thus we know that paging will be a performance factor. Both processes are asking for 1+ GB but are able to secure only ~600 MB.

last pid:  4863;  load averages:  1.03,  1.11,  0.64                   13:36:22
63 processes:  57 sleeping, 2 zombie, 1 stopped, 3 on cpu
CPU states: 19.7% idle,  9.5% user,  3.0% kernel, 67.8% iowait,  0.0% swap
Memory: 1536M real, 22M free, 2114M swap in use, 386M swap free

  PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND
 4859 root       3  20    0 1069M  619M cpu13   6:18  6.75% sas
 4858 root       4  41    0 1000M  551M cpu4    5:50  4.63% sas
 4862 root       1   0    0 2240K 1432K cpu5    0:12  0.31% top
  936 root      20  52    0 4128K 1496K sleep   0:02  0.00% syslogd
  932 root       6  58    0 2848K 1792K sleep  20:17  0.00% automountd
 1023 root       1  58    0 2416K  784K sleep   3:49  0.00% nsrd
  880 root       1  58    0 2072K  680K sleep   1:13  0.00% rpcbind
  951 root      11  50    0 2296K 1552K sleep   0:12  0.00% nscd
   11 root       4  58    0 2744K  152K sleep   0:12  0.00% vxconfigd
    1 root       1  58    0  824K  120K sleep   0:07  0.00% init
 1106 root       1  58    0 3016K  696K sleep   0:04  0.00% nsrindexd
  989 root       1  58    0  904K  632K sleep   0:04  0.00% utmpd
  945 root       1  48    0 1688K  360K sleep   0:03  0.00% cron



Look at how this memory "shortage" affects performance. When 1 job is run, this proc runs in about 12 minutes:
NOTE: PROCEDURE MDDB used:
      time:                           memory:
         real        12:22.520           page faults   3882
         user cpu    7:26.896            page reclaims 0
         system cpu  2:28.122            usage         1128.56 M

When 2 jobs are run, there should be plenty of CPU bandwidth since this is an 8 way system.  However, due to the memory contention, the time to run goes from approximately 12 minutes to over 1 hour. All this extra time was spent paging. Recall that we had 1.5 GB RAM and the 2 jobs by themselves wanted 2.4 GB.


Job times when memory constrained

        Time
job 1   1:05:16
job 2   1:14:33


Let's look at this another way. We'll show how a faster system can produce slower results if there is a memory shortage. We have 2 different MDDB procs which require 547 MB and 1.1 GB of memory respectively. Our "slower" system has 250 MHz processors and 1.5 GB RAM and our "faster" system has 300 MHz processors but only 1 GB RAM.

How faster CPUs can be affected by memory shortages

                               MDDB proc 1   MDDB proc 2
                               (547 MB)      (1120 MB)
"Slower" system, 1.5 GB RAM    7:22.0        15:09.9
"Faster" system, 1 GB RAM      6:36.7        35:52.4


From the CPU discussion above, we saw a near linear increase in performance for CPU bound applications. Thus we expect the "faster" system to outperform the "slower" system. The results show that this is true when the problem "fit" in memory, but when it didn't, the time more than doubled and the "faster" system actually ran much slower.
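
The comparison can be reduced to two ratios. The following Python sketch is for illustration only; the run times are copied from the table of MDDB results, and the helper names are hypothetical.

```python
def to_seconds(mss):
    """Convert an M:SS.s string to seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + float(seconds)


def relative_time(slower_system, faster_system):
    """Run time on the 'faster' system divided by run time on the 'slower' one."""
    return to_seconds(faster_system) / to_seconds(slower_system)


# (time on 250 MHz / 1.5 GB system, time on 300 MHz / 1 GB system)
RUNS = {
    "MDDB proc 1 (547 MB)": ("7:22.0", "6:36.7"),
    "MDDB proc 2 (1120 MB)": ("15:09.9", "35:52.4"),
}

for name, (slower, faster) in RUNS.items():
    print(f"{name}: faster system took {relative_time(slower, faster):.2f}x the time")
```

A value below 1 means the "faster" system won. Here the job that fits in memory takes about 0.90 times as long on the faster system, while the job that does not fit takes about 2.37 times as long.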

We don't intend to single out proc MDDB in our examples.  There are other SAS procs which can potentially require a large minimum amount of memory in order to run.  Some examples include IML, GLM, and certain data mining procs.

SWAP - A Necessary Evil, Gotta' have it, even if you don't use it

As mentioned in the I/O Section earlier, plenty of SWAP space must be allocated. How much can only be determined collectively by the users and system administrators. There must be enough to back all mmap(2)ed requests or jobs could fail with insufficient memory. Thus, even if you have enough memory for jobs to be resident in memory, you could still get an insufficient memory error condition if there is not enough swap to service the reservation. Given that any user can ask for an arbitrarily large memory footprint by specifying MEMSIZE=<BIG VALUE> or MEMSIZE=0, it is not immediately obvious to systems administrators how much SWAP to allocate.

Also, don't confuse virtual memory with real memory. A physical memory access is several orders of magnitude faster than a disk access. The main goal is to eliminate or minimize the activity to the SWAP device.

Increasing MEMSIZE/SORTSIZE - when does it help, when does it not?

There are 2 options when running SAS applications which can control memory usage: MEMSIZE and SORTSIZE. They are set at 32 MB and 16 MB respectively in the default config file (config.sas612 or sasv7.cfg depending on the version of the SAS software).

MEMSIZE is the total amount of memory that a SAS application could allocate on behalf of a SAS process. The SORT procedure would use up to SORTSIZE amount of memory so as a general rule of thumb, MEMSIZE should be at least (SORTSIZE + ~4 MB) just to ensure that there is enough memory to meet the SAS requirements. Let's take a closer look at MEMSIZE. MEMSIZE is an upper limit. Consider 3 categories of SAS procs when looking at individual MEMSIZE settings:

  • i) Uses a small fixed amount  of memory regardless of the value of MEMSIZE
  • ii) Uses more memory as MEMSIZE increases
  • iii) Requires a *minimum* amount of memory based on data set size or other programmatic requirement

i) Uses a small fixed amount of memory  regardless of the value of MEMSIZE
If you have MEMSIZE set to a large value and the fullstimer option reports some small value of memory used, then increasing MEMSIZE won't help. For instance

           memory:
                        usage         57 K

Many data steps and procs (freq, tabulate, etc) use only a small amount of memory. However, if the amount of memory reportedly used is close to MEMSIZE, increasing MEMSIZE may help.

ii) Uses more memory as MEMSIZE increases
The SORT procedure is a good example.  It will use as much memory as specified in SORTSIZE. However, in general, unless the entire data set can fit in memory, performance will remain flat despite the fact that more memory is used. Thus, you could hurt overall system performance by consuming more memory even though your application experiences no benefit. Below are the results of SORT on our 1 GB household data set.

Effects of Changing SORTSIZE

SORTSIZE   Memory Used   Time
   16m       16.49 MB    8:33.45
   32m       32.82 MB    8:23.43
   64m       65.44 MB    8:26.98
  128m      130.72 MB    8:33.64
  256m      261.19 MB    8:16.93
  512m      522.36 MB    8:15.90
  768m      783.65 MB    9:14.33
 1024m     1045.03 MB    8:52.91
 1300m     1141.12 MB    3:30.72

As you can see, the times were flat around 8-9 minutes until the data set fit in memory and then it went down to 3+ minutes. Unless you are sure it will fit in memory AND you won't contend with other user requirements, don't change SORTSIZE. There is an undocumented SORT parameter, UBUFSIZE; however, in our testing scenario we did not see any performance benefit from changing it from its default value of 8K.
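
To quantify the step in the SORTSIZE results, the Python sketch below averages the run times for the settings where the data set did not fit in memory and compares that with the final in-memory run. It is a minimal illustration: the values are copied from the SORTSIZE table, the helper names are our own, and it assumes (as the text states) that only the last setting allowed a complete in-memory sort.

```python
# (SORTSIZE, memory used in MB, run time M:SS.ss) from the SORTSIZE table.
SORT_RUNS = [
    ("16m", 16.49, "8:33.45"), ("32m", 32.82, "8:23.43"),
    ("64m", 65.44, "8:26.98"), ("128m", 130.72, "8:33.64"),
    ("256m", 261.19, "8:16.93"), ("512m", 522.36, "8:15.90"),
    ("768m", 783.65, "9:14.33"), ("1024m", 1045.03, "8:52.91"),
    ("1300m", 1141.12, "3:30.72"),
]


def to_seconds(mss):
    """Convert an M:SS.ss string to seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + float(seconds)


def in_memory_speedup(runs):
    """Mean time of the runs that did not fit, divided by the final run's time."""
    *partial, complete = runs
    mean_partial = sum(to_seconds(t) for _, _, t in partial) / len(partial)
    return mean_partial / to_seconds(complete[2])


print(f"in-memory sort ran {in_memory_speedup(SORT_RUNS):.2f}x faster")
```

The first eight settings raise memory use from about 16 MB to over 1 GB with no consistent gain, while the final setting runs about 2.44 times faster than their average.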

The good news is that with SAS Version 8, additional memory is not consumed unless SORTSIZE is large enough for the entire data set. However, recall that SORTSIZE could still exceed the amount of physical RAM or the amount of available memory at a given time. In this case the SAS System would be relying on the virtual memory paging facility of Solaris, and performance could end up much worse than with a small or default SORTSIZE of 16 MB.

If SORT is not depending on a complete in-memory sort, the resulting runtime will typically be dominated by the I/O component. This can be seen as a proportionally large system time in the -FULLSTIMER output. Alternatively, a result where the REAL time component is larger than the sum of the USER + SYSTEM components demonstrates the same effect. In this case, the I/O configuration will be critical to maximizing performance.
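
The REAL versus USER + SYSTEM comparison is easy to compute from FULLSTIMER output. The Python sketch below is illustrative only; it uses the times from the single-job PROC MDDB log shown earlier, and the function names are hypothetical.

```python
def to_seconds(stamp):
    """Convert a FULLSTIMER-style [H:]MM:SS.sss string to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def waiting_fraction(real, user, system):
    """Share of wall clock time not accounted for by user + system CPU time."""
    elapsed = to_seconds(real)
    return (elapsed - to_seconds(user) - to_seconds(system)) / elapsed


# real, user cpu and system cpu from the single-job PROC MDDB log.
fraction = waiting_fraction("12:22.520", "7:26.896", "2:28.122")
print(f"{fraction:.0%} of real time was spent waiting rather than on a CPU")
```

For that run, roughly 20% of the elapsed time is not CPU time. The figure does not say whether the wait was for I/O, paging or a CPU time slice, so it is best read alongside iostat and vmstat output.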

Similar effects can be seen with the LOGISTIC procedure. In this case, the threshold was basically the size of the data set. When MEMSIZE was set to something less than the size of the data set, under 2 MB of memory was used. The performance was only marginally slower when under 2 MB of memory was used than when the whole data set fit in memory. Thus it is arguable whether the benefit gained is worth the cost of memory consumption.

Changing MEMSIZE in LOGISTIC

MEMSIZE Setting   Memory Used   Time
   16m              1.3 MB      7:01:12
   32m              1.3 MB      7:11:51
   64m              1.3 MB      7:14:42
  128m              1.3 MB      7:07:31
  256m              1.3 MB      7:08:01
  512m              416 MB      6:28:32

In this case, 30 minutes out of 7 hours was saved by fitting the entire data set in memory. The cost for this savings was basically 500 MB of memory, about half the memory available to all users of the system.

Similar to the SORT procedure, the PHREG procedure (used for survival analysis studies) shows the same behavior, though in a more stairstep fashion. Although the data set fit in memory at the upper MEMSIZE values, we never saw any real improvement in performance. Thus, increasing MEMSIZE in this case only hurt, because it consumed memory and left less available for other system processes.

Changing MEMSIZE in PHREG

MEMSIZE Setting    Memory Used    Time
16m                15.0 MB        3:07
32m                30.1 MB        3:04
64m                40.4 MB        3:01
128m               40.4 MB        3:02
256m               183.3 MB       3:09
512m               183.3 MB       3:09
768m               183.3 MB       3:09


Without revisiting the paging issues discussed above, here is an example that demonstrates the value of finding the "sweet spot" for the MEMSIZE setting. We worked with a large healthcare drug safety application that used the IML procedure. The system had 1.5 GB of RAM. In initial testing, MEMSIZE values of 1 GB and 2 GB were used. With 1 GB, the job did not run due to insufficient memory. Because they weren't sure exactly how much memory PROC IML required in their situation, a MEMSIZE of 2 GB was used. At 2 GB (0.5 GB over the physical RAM configuration), the job ran in 23 hours, with user + system time coming in around 13 hours. It was suggested that they try to find a MEMSIZE value below their physical RAM size that would still allow the procedure to run. If that wasn't possible, they would have to rely on the virtual memory system. The good news was that with a MEMSIZE setting of 1.2 GB, the real time was brought down to 13 hours, reducing their job time by 10 hours. It was definitely worth their trial-and-error effort to discover the "minimal maxima" memory requirement.
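When each trial run takes hours, it helps to be systematic about the trial and error. The sketch below is one possible approach, not the method the customer used: it bisects between a MEMSIZE known to fail and one known to work. It assumes that if a job runs at some MEMSIZE it also runs at any larger value, and `job_succeeds` is a hypothetical callable standing in for an actual SAS run.

```python
def find_min_memsize(job_succeeds, fail_mb, ok_mb, resolution_mb=64):
    """Bisect for the smallest MEMSIZE (in MB) at which the job runs.

    fail_mb is a setting known to fail, ok_mb one known to succeed.
    Assumes success is monotonic in MEMSIZE. job_succeeds(mb) is a
    hypothetical callable that runs the job and returns True or False.
    """
    while ok_mb - fail_mb > resolution_mb:
        middle = (fail_mb + ok_mb) // 2
        if job_succeeds(middle):
            ok_mb = middle
        else:
            fail_mb = middle
    return ok_mb


if __name__ == "__main__":
    PHYSICAL_RAM_MB = 1536  # 1.5 GB, as in the example above

    # Stand-in for a real run: pretend the job needs 1150 MB (made-up figure).
    def fake_job(memsize_mb):
        return memsize_mb >= 1150

    best = find_min_memsize(fake_job, fail_mb=1024, ok_mb=2048)
    print(best, "MB; fits in physical RAM:", best < PHYSICAL_RAM_MB)
```

With a 64 MB resolution, the 1 GB to 2 GB range is narrowed down in four trial runs rather than a dozen or more.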

Thus, you can increase MEMSIZE (or SORTSIZE) with the invocation:

       <SAS_INSTALL_DIR>/sas -memsize 1024m myprog.sas 

This kind of testing is made more difficult if other people or processes are contending for system resources at the time of your tests. If you have a performance baseline, you can still do valid comparisons. The user and system times should remain about the same between runs regardless of CPU and memory contention. If the real or wall clock time wildly differs between runs, then it is likely there is contention for CPU, I/O or memory resources.
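The comparison against a baseline can be sketched as follows. The class and function names and the two tolerance values are assumptions chosen for illustration; the times would come from the -FULLSTIMER output, converted to seconds.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    """Real, user and system times for one run, in seconds."""
    real: float
    user: float
    system: float

    @property
    def cpu(self):
        return self.user + self.system


def compare_to_baseline(baseline, run, cpu_tolerance=0.10, real_tolerance=0.25):
    """Classify a run against a baseline (tolerances are arbitrary choices)."""
    if abs(run.cpu - baseline.cpu) > cpu_tolerance * baseline.cpu:
        # user + system moved, so the job itself did different work.
        return "workload changed"
    if run.real > baseline.real * (1 + real_tolerance):
        # Same CPU work, much longer wall clock: something else was competing.
        return "likely contention"
    return "comparable"


if __name__ == "__main__":
    baseline = Timing(real=600, user=400, system=100)  # made-up figures
    busy_run = Timing(real=1500, user=410, system=105)
    print(compare_to_baseline(baseline, busy_run))
```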

Priority Paging

In cases where there is a fair amount of memory paging going on, we have seen general application performance degrade unusually severely. For instance, an ls(1) of a directory doesn't come back, or a simple windowing system event (a mouse click) doesn't get serviced. In this scenario, it is possible that the application pages are being paged out due to the large amount of paging and/or file accesses.

With priority paging turned on, the algorithm allows the system to place a boundary around the file cache so that file system I/O does not cause paging of applications.

To use this, you need Solaris 7 or Solaris 2.6 with kernel patch 105181 (rev 09 or higher). Set the following in /etc/system and reboot:

set priority_paging=1

You can discover more information on this via:
http://www.sun.com/sun-on-net/performance/priority_paging.html

As discussed in the Solaris Buffer Cache section and in the memory section where the prtmem command was shown, don't confuse the memory used for the buffer cache with application memory. Priority paging is targeted at situations where heavy file system I/O causes the application pages to become second-class citizens.

One critical note: if you enable priority paging, make sure that your large data files do not have execute permission on them. The criterion for distinguishing an application page from a pure file system I/O page is whether the file has execute permission.
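A check for such files is easy to script. The sketch below walks a directory tree and reports large regular files that have any execute bit set; the function name and the 100 MB default threshold are our own choices, and the path in the usage example is hypothetical.

```python
import os
import stat

# Any of the owner, group or other execute bits.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def executable_data_files(root, min_bytes=100 * 1024 * 1024):
    """Return large regular files under root that have an execute bit set."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, devices and so on
            if info.st_size >= min_bytes and info.st_mode & EXEC_BITS:
                hits.append(path)
    return sorted(hits)


if __name__ == "__main__":
    # "/sasdata" is a hypothetical location for SAS data libraries.
    for path in executable_data_files("/sasdata"):
        print("execute bit set on large file:", path)
```

Any file it reports can be corrected with chmod(1) by removing the execute bits.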

Summary

In general, if an application is not particularly I/O intensive and your wall clock time is 2 or more times the combined user + system time, look closely for CPU and memory/paging constraints. Also, as noted in the I/O section, be sure that you have enough backing store (swap) or you may see unpredictable results.

Only the user and system administrators together can determine the cost/benefit ratio of increasing memory resources.

Increase MEMSIZE carefully and judiciously. If you know that you can benefit from an increase in memory, evaluate its effects on a system-wide basis. More often than not, using a smaller MEMSIZE will gain you more predictable performance when the system is under load. Think of it as a large sandbox with a lot of toys. The more kids in there, the fewer toys there are for each to play with. It's a lot more pleasant for all the kids if they share rather than fight to get toys exclusively for themselves.

In certain cases, it will be quite possible that every page the SAS application asks for causes a page fault, and performance will be dismal. If the application can't be recoded (fewer iterations or fewer variables used), the only way to improve performance would be to add memory to the system. If these jobs are critical for your job function, then it should be justifiable that more memory is needed.

We have hopefully demonstrated the effects of paging. When asked "Do SAS applications require a lot of memory?", the answer is, "In general, no, SAS applications do not require a lot of memory". For most procedures, the SAS System works extremely well in low memory configurations. However, specific applications can have large memory requirements, either by user directive or because the problems to be solved require large memory configurations.


Usually, the Biggest Payoff

  • Application Coding...

In a reasonably configured system, the greatest performance gain can usually be realized by tuning the application. The manual, SAS Programming Tips: A Guide to Efficient SAS Programming, outlines basic principles for writing efficient SAS code, namely:

  • Read and write data selectively.  I/O is often the largest single component of program execution time.
    Samples:
    • Use DROP/KEEP to eliminate unnecessary fields in data sets.
    • Use SHAREBUFFERS on infile statements to eliminate the need for separate input and output buffers.
    • Create indexes when appropriate:
      • the data set is relatively large
      • data set not frequently updated
      • data frequently subset by values of the indexed variable
      • data is uniformly distributed
      • result sets are usually less than 30% of entire data set
  • Execute only the statements you need, in the order you need them. Samples:
    • Use mutually exclusive conditions - IF-THEN/ELSE instead of IF-THEN
    • Write conditions in order of descending probability
      /* China is most likely */
      select(country);
         when('China') output china1;
         when('India') output india1;
         ....
    • Use IN operators rather than logical OR operators
          if street in ('Maple', 'Elm', 'Willow') then
      instead of
          if street='Maple' or street='Elm' or street='Willow' then
    • Use _TEMPORARY_ arrays rather than variables you DROP, as they can reduce storage as well as CPU time because they occupy contiguous memory locations.
    • Set lower bound of arrays to 0.
  • Take advantage of SAS procedures. Samples:
    • Use PROCs if possible rather than creating your own data step.
    • Use PROC DATASETS to copy datasets with indexes.
    • Use WHERE conditions in procedures.
    • Use SQL procedures to simplify code.
  • Know SAS System defaults. Samples:
    • Reduce the storage space for variables.  The SAS System uses 8 bytes for numeric variables.  Storage can be saved if you were to shorten variables such as house number or building number.
    • Use character rather than numeric variables. GENDER  stored as a numeric 1 or 0 would be 8 bytes compared to a 1 byte character 'F' or 'M'.
    • Avoid default type conversions - Use the PUT function to perform numeric-to-character conversions and INPUT function for character-to-numeric conversions.
  • Control sorting. Samples:
    • Sort only when necessary.
    • Sort as few observations and variables as possible.
    • Use NOEQUALS if it is not necessary for observations within BY groups to keep the same relative order. In the example below, we do not care about the ordering within each CITY group, so we specify NOEQUALS.
          proc SORT data=big noequals;
             by city;
          run;
  • Know your data and test programs. Samples:
    • Examine raw data before reading them.
    • Label variables and data sets.
  • Code clearly

This manual does a nice job of cross referencing the tips and suggestions  by resources saved: CPU, I/O, memory and should be a must-read for SAS application developers.  The suggestions above are only a small sample of what is covered.  

Again, you may find better results which are contrary to these suggestions. A user from a large consumer credit organization mentioned that they found that using the OR operator as opposed to IN provided better results.

Another reference highly recommended by SAS users, Efficiency: Improving the Performance of Your SAS(R) Applications, by Robert Virgile (order GO55960 or ISBN: 1-58025-228-1), can be ordered through SAS Publications.

We discussed the efficiency of creating indexes. The recommended decision criteria as to whether or not to index a data set is roughly based on the expected size of the returned sample. On the average, indexing should be used if the typical returned sample is ~30% or less of the entire data set. As the returned subset gets larger, the cost of creating, maintaining and reading the index is one of diminishing returns.

Using our household data set, we perform a PROC FREQ of the state variable to understand the distribution of the data by state, pull a 6% sample, a 40-50% sample and then a 70-80% sample over the vanilla dataset, a sorted version and then an indexed version.

How Coding can Affect Performance

                 Regular    SORTed     INDEXed
SORT Time                   5:36.5
INDEX Time                             2:02.7
proc FREQ        58.3       1:34.1     1:24.7
6% sample        51.8       51.1       6.9
40-50% sample    1:17.1     1:10.6     56.7
70-80% sample    1:23.4     1:22.6     1:23.3


The cost of sorting and indexing is not trivial in terms of CPU cycles and disk space required. This exercise was meant to demonstrate (albeit simplistically) how application coding decisions can widely affect the time it takes to get the same answer.
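One way to read the table is to ask how many subsetting queries it takes before the one-time cost of building the index is repaid. The sketch below works this out from the Regular and INDEXed columns. It assumes the times are minutes:seconds (values without a colon being seconds), and it ignores index maintenance and the slower PROC FREQ on the indexed data set, so treat the result as a rough lower bound.

```python
import math


def to_seconds(text):
    """Convert 'm:ss.s' (or plain seconds) to a float number of seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


INDEX_BUILD = to_seconds("2:02.7")  # INDEX Time from the table

# (Regular, INDEXed) subset times from the table.
SUBSETS = {
    "6% sample": ("51.8", "6.9"),
    "40-50% sample": ("1:17.1", "56.7"),
    "70-80% sample": ("1:23.4", "1:23.3"),
}


def break_even_queries(regular, indexed, build=INDEX_BUILD):
    """Number of queries needed to repay the index build, or None if never."""
    saving = to_seconds(regular) - to_seconds(indexed)
    if saving <= 0:
        return None
    return math.ceil(build / saving)


if __name__ == "__main__":
    for label, (regular, indexed) in SUBSETS.items():
        print(label, "->", break_even_queries(regular, indexed), "queries")
```

For the 6% sample the index pays for itself within a handful of queries; for the 70-80% sample it effectively never does, which is consistent with the ~30% guideline.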

Another example could be deciding between PROC SUMMARY or PROC SQL which could be used interchangeably. PROC SQL sorts the input data set while PROC SUMMARY sorts the output data set. If the input data set is large and the output data set small, then proc SUMMARY is probably more appropriate. If the reverse is true (output dataset > input dataset), then choosing PROC SQL would be more beneficial.

In another example, we were running PROC SUMMARY in Version 7. While it was running, we ran truss(1) on the process and noticed that the process was endlessly showing:

# ps -e | grep sas
 20926 pts/0    0:45 sas
# /usr/bin/truss -p 20926
...
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
....

It turned out that this was the result of the floating point exception checks. This code was needed to prevent PROC SUMMARY from terminating in the case of missing values. Knowing this was not an issue for our data, we supplied the "notrap" option to PROC SUMMARY. The results were significant: real time was cut by more than half.

SUMMARY notrap option effects

                   default     with notrap
Real time          40:56.29    17:54.96
User cpu time      25:08.09    17:03.29
System cpu time    14:38.10    25.09 seconds

While it may not have been obvious from the truss output that you needed to turn on "notrap", this information was enough to get a quick reply from SAS engineering.
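To see where the improvement came from, the ratios can be computed from the timings in the notrap table. The helper names are ours; times are taken as minutes:seconds.

```python
def to_seconds(text):
    """Convert 'mm:ss.ss' (or plain seconds) to a float number of seconds."""
    total = 0.0
    for part in text.split(":"):
        total = total * 60 + float(part)
    return total


# (default, with notrap) timings from the table above.
TIMES = {
    "real": ("40:56.29", "17:54.96"),
    "user": ("25:08.09", "17:03.29"),
    "system": ("14:38.10", "25.09"),
}


def speedup(default, notrap):
    """How many times faster the notrap run was for one time component."""
    return to_seconds(default) / to_seconds(notrap)


if __name__ == "__main__":
    for name, (default, notrap) in TIMES.items():
        print(f"{name:6s} {speedup(default, notrap):6.2f}x")
```

Real time improves by a factor of roughly 2.3 and user time by about 1.5, while system time drops by a factor of about 35. That fits the truss output: the repeated getcontext() system calls were where the system time was going.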

Version 8 data access
The SAS System, Version 8 provides many new features, some of which include more transparent access to different data set formats. Data sets can be accessed across different hardware platforms without converting to transport format via PROC CPORT/CIMPORT. For example, a SAS program running on Solaris would be able to read a data set from a different hardware platform that uses a different byte ordering. This is a very nice feature, but it does come at a performance cost because the SAS System has to do the conversion internally. If large data sets are used frequently, it could be well worth the conversion time and disk space to convert them to native format.


Interesting and Important But...

A key area of the system that was not discussed is the network configuration. This could probably warrant a paper by itself. A real strength of the SAS System is the suite of SAS/CONNECT and SAS/ACCESS products, which respectively offer the ability to connect remotely and/or distribute portions of the SAS application to remote servers, and the ability to seamlessly access many different types of data stores and application suites. Obviously, a key component for successful implementation would be the network architecture.

Technologies and products from both SAS Institute and third parties allow parallelization of typical data warehousing SAS processes and can thus exploit the multiprocessor environment of the Sun Enterprise series. The Scalable Performance Data Server (SPDS) from SAS Institute allows parallelization in areas such as fast parallel loads, batch updates, fast reads, bitmap and b-tree indexes, parallel sort for ORDER BY processing, and WHERE clause evaluation. Third-party products include The SAS Analyzer from Ab Initio and Orchestrator for the SAS System from Torrent Systems. All of these are available and shipping today.


Executive Summary

System performance must be approached holistically with cooperation between the users and system administrators by examining an application within the context of the system configuration. A good configuration must balance the CPU, I/O, and network system resources against the collective application requests. Increasing resources in one area without regard to the other areas could actually hurt overall performance.  Lastly, don't overlook optimizations at the application code level as this is where the biggest payoff is most likely to be realized.

Classical SAS applications require relatively little memory; however, some applications can require a large amount. Getting a handle on the collective memory requirements is a key factor for predictable performance under load. Having a surplus of system memory is the cheapest "insurance" that one can buy when sizing a system for future growth and unexpected or unpredictable peak load times. As a very general target, look to see if your system has either 0.5 GB of memory configured per CPU or 32 MB per user. Adjustments up or down should be made from those starting points.
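The two starting points are simple to compute for a given machine. In this small sketch the function name is ours and 1 GB is taken as 1024 MB; how to reconcile the two figures when they disagree is left to the reader, since the text offers them only as starting points.

```python
MB_PER_CPU = 512   # 0.5 GB per CPU, taking 1 GB = 1024 MB
MB_PER_USER = 32   # 32 MB per user


def memory_starting_points_mb(cpus, users):
    """Return the (per-CPU, per-user) memory starting points in MB."""
    return cpus * MB_PER_CPU, users * MB_PER_USER


if __name__ == "__main__":
    # Hypothetical 8-way server with 50 SAS users.
    by_cpu, by_user = memory_starting_points_mb(cpus=8, users=50)
    print(f"per-CPU rule: {by_cpu} MB, per-user rule: {by_user} MB")
```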

There is relatively little tuning outside of "common sense" that should be done to the system (i.e., tuning of kernel parameters, adjusting OS I/O buffer sizes, implementing more or less aggressive file caching and look-ahead, etc.). Rather, the biggest wins can usually be gained at the application level. With the exception of occasionally changing the SAS MEMSIZE/SORTSIZE parameters, it's probably best to leave all other SAS parameters, as well as Solaris parameters, alone. The other exception is if you have consistent production-level jobs that are run on a regular schedule. An example might be a weekly or monthly refresh of the data warehouse or data mart. Only in these cases, where you have a well-characterized job, does it usually make sense to start looking at changing some of the other SAS System or file system parameters.

In general, tune the application and not the hardware or OS or SAS platform.

Hopefully,  we have demonstrated the effects of contention for resources on the I/O, CPU and memory level and when it might make sense to investigate system upgrades and/or expansions. While the costs of reduced or limited productivity are difficult to quantify, this factor should not be overlooked if many analysts and users experience idle time waiting for their jobs and applications to complete.


Appendix

A) How to Find the Tools

  • top
    http://sunfreeware.com
  • proctool
    http://sunfreeware.com
  • Memtool tools:
    ftp://playground.sun.com/pub/memtool
    Note: Versions prior  to (and including) 3.7.3 are incompatible with Solaris patches 105181-14 (Solaris 2.6) and 106541-04 (Solaris 7) and higher.
    • Memtool tar package- RMCmem3.7.3.tar.Z (5.4 MB)
    • Virtual Memory paper -vmsizing.ps - (864 KB)
    • Memtools documentation - memtool.ps - (325 KB)
  • hstat
    Internal Sun tool to gather UltraSPARC performance counter statistics. Can be used to get hardware execution profiles. Some of the information it can return: LWP stats, TLB misses, i-cache references, d-cache read/write references, e-cache stats, accumulated cycles, number of instructions completed, etc. This is not publicly available. If you are working closely with a Sun systems engineer, he or she may be able to make it available. The location- hstat.ireland.
  • Tools bundled with Solaris - vmstat, ps, sdtprocess (new with Solaris 7), /usr/proc/bin/pmap (new with 2.6),
    sar, /etc/prtconf, /usr/platform/sun4u/sbin/prtdiag, truss, iostat (the -n option used above is new to Solaris 2.6), uname
  • Solaris Performance Monitoring at a Glance- Notes from a SUG meeting by Brian Wong

Note: top, proctool, Memtool and hstat are not officially supported. Additionally, all except top require installation of kernel drivers. The tools which are bundled with Solaris are fully supported.

B) Detailed system configurations

Most testing was done on a pair of 8 way Sun Enterprise 4000's. Each had 1.5 GB RAM and utilized Sun Model 100 Storage Arrays configured with the VERITAS Volume Manager version 2.5. One system had 167 MHz processors, while the other had 250 MHz processors. Both were running Solaris 2.6. For the VERITAS File system tests, we were using version 3.2.5 of VxFS. On one system we were running the SAS System, Release 6.12 TS045 and TS050 on the other.

For the other clock speed tests, we used a 2 way Sun Enterprise 450 with 300MHz processors, 1 GB RAM, and internal UltraSCSI storage. The last system we used was an 8 way Enterprise 4500 with 336 MHz processors, 4 GB RAM, and Sun A5000 storage. Both systems had similar software configurations to the base test platforms.

C) References

D) About the Authors

  • Maureen Chew, Sr. Member of Technical Staff, has been with Sun Microsystems for over 10 years.  She is a resident of Chapel Hill, NC and can be reached at maureen.chew@east.sun.com
  • Leigh Ihnen, Manager of Numerical Architecture and Performance, has been with the SAS Institute for over 16 years.  He can be reached at leigh.ihnen@sas.com
  • Tom Keefer, Systems Engineer for the Sun ISV East Region, assigned to SAS Institute. Tom, an employee of Sun Microsystems, has worked with Sun and UNIX for over ten years. He can be reached at thomas.keefer@east.sun.com

E) Acknowledgments

We would like to thank the following people for their contributions of expertise and time:  

  • SAS Institute
    A special thanks to Margaret Crevar whose support and expertise was invaluable.
    Dan Lucas, Clarke Thacher, Robert Ray, Rajen Doshi
  • Sun Microsystems
    Jim Nissen, Alberto Bullani, Morgan Herrington, Kelly Hemphill, Mark Mulligan, Bill Stroud
  • Wellmark Blue Cross
    Jason Veatch, Kris Hoffmeyer, Jason Stann
  • Household International/Beneficial
    Susan Riley
  • Winterthur
    Dr. Eugenio Rossi
  • Banca Commerciale Italiana
    Flavio Addolorato, Angela Ancona, Emiliano Laruccia
    Economic Research Department, Strategic Decision Support Systems

If you have any comments, suggestions or would like to share any application experiences, email one of the authors.

09June1999
Version 1.2
http://www.sas.com/partners/directory/sun/performance/
