| Item | Example |
| --- | --- |
| Hardware Platform | Sun E4500 |
| # CPUs | 8 |
| Clock Speed | 336 MHz |
| Memory or RAM | 1.5 GB |
| Storage Platform | Sun A5000 |
| Storage Layout | Data area: 30 GB RAID 0+1; WORK area: 30 GB RAID 0; VERITAS Volume Manager; VERITAS File System |
| SWAP Configuration | Total 8 GB striped across 2 spindles |
| OS | Solaris 2.6 |
| SAS Version | 6.12 TS050 |
As is the case with Unix, there are usually several ways to accomplish the
same task.
Let's start with the software first.
The version of the SAS System is placed in the SAS LOG window if running in display mode. You can also get this information from a shell or command mode:
$ <SAS_INSTALL_DIR>/sas -nodms
will print:
$ /u1/sas612/sas -nodms
NOTE: Copyright (c) 1989-1996 by SAS Institute Inc., Cary, North Carolina, USA.
NOTE: SAS (r) Proprietary Software Release 6.12 TS050
<cntrl-D> to exit.
To determine the version of Solaris:
$ cat /etc/release
Solaris 2.6 s297s_smccServer_37cshwp SPARC
Copyright 1996 Sun Microsystems, Inc. All Rights Reserved.
Manufactured in the USA 18 July 1997
Alternatively, uname -a works but requires a translation from the version
of the kernel/base OS (SunOS 5.6) to the complete Solaris umbrella name (Solaris
2.6). To list the swap configuration:
$ /usr/sbin/swap -l
swapfile dev swaplo blocks free
/dev/dsk/c1t10d0s1 32,73 16 1048784 1002016
Blocks are 512 bytes, so there is ~500 MB of SWAP configured above.
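As a quick check of that arithmetic, the sketch below converts the `blocks` and `free` columns of the `swap -l` line above from 512-byte blocks to megabytes. The function name is illustrative only and is not part of any Solaris tool.

```python
BLOCK_BYTES = 512  # swap -l reports sizes in 512-byte blocks


def blocks_to_mb(blocks: int) -> float:
    """Convert a count of 512-byte blocks to megabytes (2**20 bytes)."""
    return blocks * BLOCK_BYTES / 2**20


# Values copied from the swap -l output above.
total_mb = blocks_to_mb(1048784)
free_mb = blocks_to_mb(1002016)
print(f"configured: {total_mb:.0f} MB, free: {free_mb:.0f} MB")
```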
There is no easy way for users to determine the storage platform and layout from command line options so this information must be provided by the systems administrator. The command:
$ df -k
will show the mount points; this can be used to verify that WORK or data areas are physically located on the system and are not NFS storage areas.
The hardware configuration can be determined in several ways, either with a combination of prtconf(1M) and dmesg(1M) or with prtdiag.
$ /usr/sbin/prtconf -v | more
System Configuration: Sun Microsystems sun4u
Memory size: 1536 Megabytes
System Peripherals (Software Nodes):
SUNW,Ultra-Enterprise
.....
Some prefer the more obscure prtdiag(1M) command because it gives a detailed system configuration, including the number of CPUs, clock speed, e-cache, adaptor cards, memory interleave, board temperature, etc. This command is located in the SUNWkvm package and may not be available on non-UltraSPARC systems. prtdiag returns all the necessary hardware configuration information with the exception of the storage platform.
/* "SUNW,Ultra-Enterprise" below should be replaced with the
output of `uname -i` (the platform name of your system). */
$ /usr/platform/SUNW,Ultra-Enterprise/sbin/prtdiag
System Configuration: Sun Microsystems sun4u 8-slot Sun Enterprise 4000/5000
System clock frequency: 82 MHz
Memory size: 1536MB
========================= CPUs =========================
Run Ecache CPU CPU
Brd CPU Module MHz MB Impl. Mask
--- --- ------- ----- ------ ------ ----
0 0 0 248 4.0 US-II 1.1
0 1 1 248 4.0 US-II 1.1
2 4 0 248 4.0 US-II 1.1
2 5 1 248 4.0 US-II 1.1
4 8 0 248 4.0 US-II 1.1
4 9 1 248 4.0 US-II 1.1
6 12 0 248 4.0 US-II 1.1
6 13 1 248 4.0 US-II 1.1
========================= Memory =========================
Intrlv. Intrlv.
Brd Bank MB Status Condition Speed Factor With
0 0 256 Active OK 60ns 4-way A
0 1 256 Active OK 60ns 2-way B
2 0 256 Active OK 60ns 4-way A
4 0 256 Active OK 60ns 4-way A
6 0 256 Active OK 60ns 4-way A
6 1 256 Active OK 60ns 2-way B
========================= IO Cards =========================
Bus Freq
Brd Type MHz Slot Name Model
--- ---- ---- ---- -------------------------------- ----------------------
1 SBus 25 1 cgsix SUNW,501-1717
1 SBus 25 3 SUNW,hme
1 SBus 25 3 SUNW,fas/sd (block)
1 SBus 25 13 SUNW,soc/SUNW,pln 501-2069
3 SBus 25 2 SUNW,soc/SUNW,pln 501-2069
3 SBus 25 3 SUNW,hme
3 SBus 25 3 SUNW,fas/sd (block)
3 SBus 25 13 SUNW,soc/SUNW,pln 501-2069
........
Lastly, truss(1) is another often-used tool; it traces a process and displays every system call it makes. There have been countless times when we used truss to debug generic problems. For example, a process may fail because the user doesn't have write permission in a temp directory; in this case, truss would show the open system call failing, along with the resulting errno(3C). In Solaris 7, truss has been expanded to trace library calls as well, so you can follow the calling sequence through libc, libthread, etc.
In the area of SAS applications and I/O, there are only a few very simple concepts to keep in mind. Conceptually, there are 2 distinct I/O areas of concern to SAS applications. In practice, this can fan out to potentially many different directories.
There are SAS data areas and WORK or scratch areas. Data areas are specified
programmatically by the SAS System libname directive. Thus, they could
be any writable directory on Unix platforms including NFS (but hopefully
not). Any data in the WORK area is removed when the SAS application
terminates properly. Note, jobs that terminate abnormally could possibly
leave large unusable work areas.
It should be obvious that I/O configurations are highly dependent on site-specific,
user-specific and application-specific factors. Thus, there is
no one set of I/O configuration guidelines that fits all (or even most)
situations. However, here are a few general considerations:
RAID issues - RAID 5, Striping/Mirroring, Stripe interlace
Areas that are write intensive (typically WORK) should avoid software
RAID 5 (parity) and be configured with RAID 0+1 (striped mirrors) or RAID 0 (striped).
Due to the nature of RAID 5, a logical write requires on the order of 4-5
I/Os. Additionally, the parity calculations require a non-trivial amount
of CPU time. As a test, using a simple SAS data step, we copied our household
data set (~1 GB) to a striped partition (RAID 0) and then to a RAID 5 partition
(using software RAID).
| | Real | User | System |
| --- | --- | --- | --- |
| RAID 0 | 2:33.7 | 57.1 | 1:02.6 |
| RAID 5 | 11:54.3 | 56.4 | 1:47.9 |
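To put the timings above in proportion, the following sketch converts them to seconds and computes the RAID 5 to RAID 0 ratio for each column. The helper name `to_seconds` is our own; the figures are copied from the table.

```python
def to_seconds(stamp: str) -> float:
    """Parse 'ss.s', 'm:ss.s' or 'h:mm:ss' into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Timings copied from the table above.
raid0 = {"real": "2:33.7", "user": "57.1", "system": "1:02.6"}
raid5 = {"real": "11:54.3", "user": "56.4", "system": "1:47.9"}

ratios = {k: to_seconds(raid5[k]) / to_seconds(raid0[k]) for k in raid0}
for name, ratio in ratios.items():
    print(f"{name:>6}: RAID 5 / RAID 0 = {ratio:.2f}")
```

The wall-clock ratio comes out between 4 and 5, in line with the 4-5 I/Os per logical write mentioned above, while user time is essentially unchanged.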
Hardware RAID 5 platforms such as the Sun StorEdge A3500 and the A1000 can
provide RAID 5 performance comparable to that of RAID 0. RAID 0 offers no
redundancy and thus should obviously not be used in 24x7 mission critical
environments.
Logical Volume Stripe Unit - Most workgroup and enterprise server configurations use a logical volume manager such as the VERITAS Volume Manager to configure and manage the storage platforms. In a striped configuration, the stripe unit size (also known as the interlace, chunk size, or segment size) times the number of columns or disks equals the stripe width. What stripe unit should be chosen? In the absence of knowing anything about the application, we simply suggest choosing 64K.
With large blocked sequential I/O, it's best to choose a moderate size stripe unit. This allows more I/Os to be spread out across all the spindles of the stripe and lets the read-ahead buffers be handled more efficiently. Using truss(1), you might notice that a SAS application typically issues relatively small read/write requests, usually on the order of 8K or 16K. Thus, 64K or 128K should be a good target stripe unit.
Classical SAS applications often depend on sequential I/O, i.e., performing multiple iterations, record by record, over an entire data set. Applications that make extensive use of indexed data sets, however, will usually not be sequential in nature. Even in cases where random I/O may dominate (e.g., using indexes with high cardinality in the data which return small result sets), a stripe unit of 64K is still generally a reasonable choice.
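The relationship between stripe unit, columns and stripe width can be sketched in a few lines. This is only an illustration: the 64K unit comes from the text, while the 4-column layout and the function names are assumptions made up for the example.

```python
def stripe_width(unit_kb: int, columns: int) -> int:
    """Stripe width = stripe unit size times the number of columns (disks)."""
    return unit_kb * columns


def columns_touched(offset_kb: int, size_kb: int, unit_kb: int, columns: int) -> int:
    """Count how many distinct columns one contiguous request touches."""
    first_unit = offset_kb // unit_kb
    last_unit = (offset_kb + size_kb - 1) // unit_kb
    return min(last_unit - first_unit + 1, columns)


UNIT_KB, COLUMNS = 64, 4  # 64K unit from the text; 4 columns is hypothetical

print(stripe_width(UNIT_KB, COLUMNS))              # full stripe width in KB
print(columns_touched(0, 8, UNIT_KB, COLUMNS))     # a small 8K request
print(columns_touched(0, 256, UNIT_KB, COLUMNS))   # a full-stripe read-ahead
```

Under these assumptions a single 8K request stays on one disk, while a read-ahead of a full stripe width engages every column.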
What about SWAPFS as a WORK area? There are times when this can be a performance win but in general, this should be avoided if working with large data sets. SWAPFS is a memory based file system which is backed by the SWAP partition. If you write large data files to this partition, you could easily induce paging.
Configure plenty of SWAP if using PROCs which have large virtual memory requirements. Many SAS procedures make extensive use of the mmap(2) system call. For every mmap'ed memory segment, a corresponding amount of SWAP must be reserved even if it is not used. The pages are not allocated until needed. A number of PROCs will return "insufficient memory" errors if the SWAP reservation cannot be made. Thus you could be the only user on a quiet system configured with 8 GB of RAM and still get an insufficient memory error if you do not have a large enough SWAP area.
Some PROCs may produce unexpected results. For instance, in experimenting with different SORTSIZE values, we discovered that certain runs were using a variable amount of memory despite the fact that we knew exactly how much should have been used. We realized that our SWAP area was much too small to back the mmap requests, depending on what else was going on system wide. After allocating more SWAP area, the programs utilized the expected amount of memory.
You can check the amount of SWAP configured with the swap(1M) command (swap -l) and the amount reserved with swap -s. Note that the free space on the SWAP device does not equal the free SWAP available. This is because of the swapfs file system, /tmp. This partition is a combination of physical SWAP space and free memory. In the example below, the SWAP device reports 1002016 free blocks (about 0.5 GB at 512 bytes per block) while swap -s reports about 1.7 GB available; swap -l gives the actual amount available on disk.
$ swap -l
swapfile dev swaplo blocks free
/dev/dsk/c1t10d0s1 32,73 16 1048784 1002016
$ swap -s
total: 33800k bytes allocated + 5488k reserved = 39288k used, 1731016k available
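For scripted monitoring, the `swap -s` summary line can be parsed directly. The sketch below is a minimal example that assumes the output format shown above; the function name and dictionary keys are our own.

```python
import re


def parse_swap_s(line: str) -> dict:
    """Extract the kilobyte figures from one line of `swap -s` output."""
    pattern = (r"total:\s*(\d+)k bytes allocated \+ (\d+)k reserved"
               r" = (\d+)k used,\s*(\d+)k available")
    match = re.search(pattern, line)
    if match is None:
        raise ValueError("unrecognized swap -s output")
    keys = ("allocated", "reserved", "used", "available")
    return dict(zip(keys, map(int, match.groups())))


# Line copied from the swap -s output above.
sample = ("total: 33800k bytes allocated + 5488k reserved = 39288k used, "
          "1731016k available")
swap = parse_swap_s(sample)
print(f"used {swap['used'] / 1024:.0f} MB, "
      f"available {swap['available'] / 1024:.0f} MB")
```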
If users specify large MEMSIZE requests, you must have a correspondingly large cumulative amount of SWAP area set aside. In the unfortunate case that you do have paging to your SWAP area, configure your SWAP devices similarly to your data areas and optimize their I/O: rather than 3 or 4 single SWAP devices, consider a RAID 0 (striping) configuration. If you have a 24x7 environment, then RAID 0 will obviously not satisfy availability requirements. You can dynamically create and add SWAP areas with the mkfile(1M) and swap -a commands. Obviously, you want to avoid doing this on a disk partition that is under heavy load. As discussed in the section on memory, you can monitor the SWAP device as a simple way to check for paging.
SAS BUFSIZE/BUFNO Options / Solaris Buffer Cache
If you truss(1) your SAS processes, you may find that most of the read(2)/write(2) system calls use a fairly small buffer size, probably 8K or 16K. You may find that the performance of certain jobs improves if you increase the SAS System BUFSIZE option for a specific data set. Note that the BUFSIZE setting is specific to a SAS data set and its value is stored in the metadata header. Thus, the only way to change the buffer size is to use a data step to copy the data set and specify the new BUFSIZE:
/* copy libname.old into a new data set with a 64K buffer size */
data new (BUFSIZE=64k);
set libname.old;
run;
You can query the BUFSIZE of the data set by issuing a PROC CONTENTS:
proc contents data=libname.dataset;
run;
In our experimentation, we found only a small, hardly appreciable performance benefit from increasing BUFSIZE. However, one of our large enterprise customers in the medical insurance industry found a significant improvement in performance when they aligned the SAS BUFSIZE with the data volume stripe unit. We need to mention that this testing and running of jobs was done in a very controlled environment. If you have a well characterized job or set of jobs and you have the ability to regulate the runs, you might find that changing BUFSIZE produces good performance gains.
The Solaris file buffer cache lessens the need to modify BUFSIZE. One area
where changing BUFSIZE could potentially help is if truss(1)
were to show that a file was opened synchronously (O_SYNC) and sync(2) was
being issued to flush the contents of the output buffers. Note that
truss(1) in Solaris 7 can show library calls, so this could show
up as an fsync(3C). Be careful in experimenting with this parameter, as a number
of PROCs make algorithmic decisions based on the value of BUFSIZE, which could
adversely affect performance. Although intuitively you may want the SAS
System to take advantage of the larger I/O buffer capabilities that are more
typical of today's enterprise configurations, it is probably, on
average, not wise to change BUFSIZE as a blanket policy. For large data
sets, it is a fairly expensive experiment since multiple copies of the data
set are required during the testing phase.
To show the effects of reading a data set from the Solaris file buffer cache we show timings from 2 consecutive runs where we read our household data set (but do no writes):
data _null_;
set gold.hrecs(keep=state);
/* no state codes of '00' */
where state = '00';
run;
| | Real | User | System |
| --- | --- | --- | --- |
| 1st time | 2:05.83 | 5.02 | 21.98 |
| 2nd time | 19.26 | 4.88 | 10.98 |
Obviously, there is a big advantage to being able to access files from the
Solaris buffer cache. We can use the Memtool command prtmem to show
the distribution of memory. Before the data set was read above, prtmem would
show something similar to:
$ prtmem
Total memory: 1468 Megabytes
Kernel Memory: 125 Megabytes
Application memory: 13 Megabytes
Executable memory: 32 Megabytes
Buffercache memory: 184 Megabytes
Free memory: 1112 Megabytes
Note that 180+ MB of memory is being used for the buffer cache and 1.1 GB is free. After the run:
$ prtmem
Total memory: 1468 Megabytes
Kernel Memory: 125 Megabytes
Application memory: 13 Megabytes
Executable memory: 32 Megabytes
Buffercache memory: 1272 Megabytes
Free memory: 24 Megabytes
After the data set was read, free memory went to 24 MB and buffer cache memory was 1.2 GB. This is why the free memory shown in vmstat(1M) can sometimes be misleading. The same is true for the page-in and page-out columns. There is plenty of memory available and, as long as no one is requesting it, memory will be used for the buffer cache. This is directly relevant to the memory discussions as well as the priority paging section.
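The point about misleading "free" figures can be made concrete by parsing the prtmem listing. The sketch below is illustrative only: the parser is our own, and treating "free plus buffer cache" as the memory applications could still obtain is a simplifying assumption based on the discussion above.

```python
def parse_prtmem(text: str) -> dict:
    """Map each 'Label: N Megabytes' line of prtmem output to an integer."""
    values = {}
    for line in text.strip().splitlines():
        label, _, rest = line.partition(":")
        values[label.strip()] = int(rest.split()[0])
    return values


# Second prtmem listing from the text, one field per line.
after_run = """
Total memory: 1468 Megabytes
Kernel Memory: 125 Megabytes
Application memory: 13 Megabytes
Executable memory: 32 Megabytes
Buffercache memory: 1272 Megabytes
Free memory: 24 Megabytes
"""

mem = parse_prtmem(after_run)
# Assumption: buffer cache pages are given up when applications ask for memory,
# so free + buffer cache approximates what is really obtainable.
reclaimable = mem["Free memory"] + mem["Buffercache memory"]
print(f"free: {mem['Free memory']} MB, potentially available: {reclaimable} MB")
```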
If the size of the data sets is larger than the file cache, it is possible that the buffer cache could get in the way. See the section on Direct I/O below.
In our experience, modifying the SAS System BUFNO option made no difference in performance, nor did it seem to make a difference in the amount of memory used as reported by FULLSTIMER. BUFSIZE and BUFNO make more of a difference in legacy mainframe environments.
Data Set Compression
The SAS System has an option to compress data sets at creation, either by
specifying (COMPRESS=YES) in the data step or by using
OPTIONS COMPRESS=YES;. Compression can typically reduce the space that a
data set occupies by about a factor of 2. However, there
is no such thing as a free lunch. Saving disk space comes at the expense
of increased CPU time to compress the data as it is written and/or decompress
the data as it is read. If your system is CPU bound, then compressed
data sets should probably not be used. However, if your applications
have a dominant I/O component, you may want to consider the compress option.
We have a documented case of a ~15% performance increase using
compressed data sets versus uncompressed data sets. This was working
with ~2 GB data sets on a system with 336 MHz CPUs. An indication that
an application has a large I/O component is either a large system time as
reported by FULLSTIMER or a large differential between Real time and (User
+ System) time. However, the latter could also be attributed to excessive
paging.
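The "differential between Real and (User + System)" heuristic is easy to compute from FULLSTIMER figures. The sketch below applies it to the two buffer cache runs timed earlier in this section; the function names are our own, and the result indicates time spent off the CPU in general (I/O, paging, or contention), not I/O alone.

```python
def to_seconds(stamp: str) -> float:
    """Parse 'ss.s', 'm:ss.s' or 'h:mm:ss' into seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def off_cpu_fraction(real: str, user: str, system: str) -> float:
    """Fraction of wall-clock time not accounted for by user + system CPU time."""
    real_s = to_seconds(real)
    return (real_s - to_seconds(user) - to_seconds(system)) / real_s


# Real, User and System times from the two consecutive read-only runs.
first_run = off_cpu_fraction("2:05.83", "5.02", "21.98")
second_run = off_cpu_fraction("19.26", "4.88", "10.98")
print(f"1st run: {first_run:.0%} of real time off-CPU")
print(f"2nd run: {second_run:.0%} of real time off-CPU")
```

A job like the first run, with most of its elapsed time unaccounted for by CPU time, is the kind of candidate where compression may be worth testing.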
# /usr/sbin/newfs -v -i 32k -m 3 <device name>
For the most part, the Solaris buffer cache is self-tuning.
# vxtunefs -o read_pref_io=256k <mount point>
# vxtunefs -o read_nstream=4 <mount point>
Summary
The classical SAS program typically utilizes sequential processing on large data sets. Thus, configure your data areas for blocked sequential I/O and your WORK area for write intensive applications. Since the WORK area can be configured as a system wide resource, ensure that the I/O channels back to the host are sufficiently wide. Sometimes adding CPU resources can degrade overall system performance, because the extra CPUs create an increased I/O burden that pushes the system beyond a previously optimal I/O configuration. When adding CPUs, ensure that the I/O subsystem can handle the increased requirements.
The data volume layout can make a substantial difference in performance. In our tests, one storage platform had 4 times the bandwidth to the host and disks with almost 2 times the rotational speed, yet I/O intensive tests took longer on it when its data was not striped. In other words, a significantly slower storage platform outperformed a faster storage platform when the underlying volume on the slower storage was configured more optimally.
Although the SAS System tends to issue relatively small read/write requests, choose a moderate size for the stripe unit. Using 64K is probably a good general rule of thumb. Don't change the SAS System BUFSIZE and BUFNO options unless you can do so in a controlled environment. Consider compressing SAS data sets if applications are dominated by a large I/O component. Note that compression has a cost: it requires additional CPU resources. I/O performance can potentially be gained by using the VERITAS File System (VxFS). VxFS provides other features that make it well suited to large scale enterprise applications.
A CPU is one of potentially many "brains" of the system. Just as the Scarecrow in The Wizard of Oz felt the need for a brain, here we look at how systems can be constrained at the CPU level.
Does clock speed make a difference?
Here we look at a very commonly used SAS procedure which is particularly CPU intensive. Logistic regression is used to find patterns among the data and is often used in data mining and decision support applications.
A forward, stepwise and backward logistic regression was run on a data set which had 200,000 observations, ~500 variables, and a record length of about 1500 bytes. The total size of the data set was ~320 MB.
We ran the tests on different Ultra Enterprise Servers at clock speeds of
167 MHz, 250 MHz, 300 MHz, 336 MHz. The external cache varied on these systems
from .5 MB to 4 MB which might slightly affect the results but was probably
insignificant relative to the total processing time.
| | 167 MHz | 250 MHz | 300 MHz | 336 MHz |
| --- | --- | --- | --- | --- |
| forward | 9:43:32 | 7:12:24 | 6:28:33 | 5:24:37 |
| stepwise | 10:03:59 | 7:28:22 | 7:08:08 | 5:43:15 |
| backward | 40:54:58 | 30:31:33 | 26:50:03 | 22:57:47 |


Note the near-linear improvement in run times: as the clock speed doubled (167 MHz to 336 MHz), the jobs ran about 1.8 times faster. The forward regression went from 9 hours, 43 minutes to 5 hours, 24 minutes. This speaks extremely well for the UltraSPARC processors. At the time of this writing, the faster processors in Sun systems clock in at 400 MHz.
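To see how close to linear the scaling is, the sketch below compares the run-time speedup between the 167 MHz and 336 MHz columns with the ratio of the clock speeds. The figures are copied from the table; the helper and variable names are ours.

```python
def to_seconds(stamp: str) -> int:
    """Parse 'h:mm:ss' into seconds."""
    hours, minutes, seconds = (int(p) for p in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Run times at the slowest and fastest clock speeds, from the table above.
times = {
    "forward": ("9:43:32", "5:24:37"),
    "stepwise": ("10:03:59", "5:43:15"),
    "backward": ("40:54:58", "22:57:47"),
}
CLOCK_RATIO = 336 / 167  # roughly 2.01

speedups = {name: to_seconds(slow) / to_seconds(fast)
            for name, (slow, fast) in times.items()}
for name, speedup in speedups.items():
    print(f"{name:>8}: {speedup:.2f}x faster, "
          f"{speedup / CLOCK_RATIO:.0%} of the clock ratio")
```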
Let's examine the timesharing effects on CPU bound applications. How are application times affected when there are multiple jobs contending for a single CPU?
We used PROC GLM (general linear model); a single job took about 1 hour to run and used about 270 MB of memory.
real user system
1:07:01 1:03:50 29.184 seconds
We ran 4 of these jobs with 4 CPUs enabled. Note that since each job required ~270 MB of RAM and the system had 1.5 GB of RAM, the total memory required easily fit within the physical RAM configuration.
Since the system had 8 processors, 4 of them were turned off for the test.
Not that administrators would want to turn off processors. Perhaps
it would be a good April Fools' joke for an administrator to turn off
63 processors of a fully configured Sun Enterprise 10000 (aka Starfire).
To turn off processors, use the psradm(1M) command; this command must
be run as root. Use the -f option to turn off the processors and the
-n option to turn them back on.
# /usr/sbin/psradm -f 8 9 12 13
# /usr/sbin/psrinfo
0 on-line since 01/22/99 10:01:50
1 on-line since 01/22/99 10:02:31
4 on-line since 01/26/99 09:05:20
5 on-line since 01/26/99 09:05:20
8 off-line since 01/25/99 17:00:26
9 off-line since 01/25/99 17:00:26
12 off-line since 01/25/99 17:00:26
13 off-line since 01/25/99 17:00:26
We started the 4 jobs simultaneously and got the following results:
| | Job 1 | Job 2 | Job 3 | Job 4 |
| --- | --- | --- | --- | --- |
| Real | 1:22:37 | 1:22:34 | 1:22:43 | 1:22:34 |
| User | 1:20:15 | 1:20:11 | 1:20:12 | 1:20:06 |
Note that the jobs all ran equally well, with relatively little performance degradation compared to when only 1 job was run (roughly 1:22 versus 1:07 of real time). There are 4 jobs running and 4 CPUs enabled. Don't forget that the Solaris kernel itself needs CPU resources, not to mention the many other system applications that are running. Thus with no memory contention and no CPU contention, we get comparable performance.
For a fairer test, we should have allowed Solaris to have its own processor and used a feature of Solaris 2.6 called processor sets to create dedicated 4 CPU and 2 CPU sets. This is done by using /usr/sbin/psrset(1M) and /usr/sbin/pbind(1M) to remove the effects of unrelated activity.
Now, we turn off 2 of the CPUs so that only 2 CPUs are active:
# /usr/sbin/psradm -f 4 5
To confirm that only 2 are active:
# /usr/sbin/psrinfo
0 on-line since 01/22/99 10:01:50
1 on-line since 01/22/99 10:02:31
4 off-line since 01/25/99 22:26:17
5 off-line since 01/25/99 22:26:17
8 off-line since 01/25/99 17:00:26
9 off-line since 01/25/99 17:00:26
12 off-line since 01/25/99 17:00:26
13 off-line since 01/25/99 17:00:26
As expected, the job times roughly doubled. If 1 job/CPU ran in ~1.3 hours,
it would be expected that 2 jobs/CPU would take ~2.5 hours.
| | Job 1 | Job 2 | Job 3 | Job 4 |
| --- | --- | --- | --- | --- |
| Real | 2:33:04 | 2:42:39 | 2:37:05 | 2:42:58 |
| User | 1:20:15 | 1:20:11 | 1:20:12 | 1:20:06 |
Note that user times are nearly identical to the single job/CPU times but that
the wall clock time was just about exactly double. This difference in time is
attributable to the fact that the user process was idle waiting for CPU time slices.
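The "wall clock doubles while user time stays put" observation can be checked with a few lines of arithmetic. The sketch below uses the Real and User figures from the 2-CPU table; the helper name is our own.

```python
def to_seconds(stamp: str) -> int:
    """Parse 'h:mm:ss' into seconds."""
    hours, minutes, seconds = (int(p) for p in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Real and User times for the 4 jobs sharing 2 CPUs, from the table above.
real = ["2:33:04", "2:42:39", "2:37:05", "2:42:58"]
user = ["1:20:15", "1:20:11", "1:20:12", "1:20:06"]

# With 2 jobs per CPU, each job should be on a CPU about half the time,
# so real / user is expected to be close to 2.
ratios = [to_seconds(r) / to_seconds(u) for r, u in zip(real, user)]
for job, ratio in enumerate(ratios, start=1):
    print(f"Job {job}: real/user = {ratio:.2f}")
print(f"mean: {sum(ratios) / len(ratios):.2f}")
```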
Both examples speak well for the linearity and predictability of CPU intensive jobs on the UltraSPARC architecture.
If there is a valid reason to give priority to a group of "power users" or to end of quarter/month/year processes, consider using processor sets. Use /usr/sbin/psrset(1M) and psradm(1M) to create and administer these.
Summary
We have shown the effects of raw CPU performance and of simple CPU
contention. One area that is often overlooked when creating
budgets is the cost of the users' time spent waiting around for their applications
to run. It is always extremely difficult to quantify the cost or value
of employees' time and productivity; however, this should not be overlooked.
Another consideration is production jobs which may run
overnight or toward scheduled peak times: the window of time for completion
gets smaller and smaller as data set sizes grow.
If systems are upgraded by adding more CPUs, ensure that there are sufficient
I/O channels to handle the potential increase in data traffic.
Baseline Memory Recommendations
From the hardware/OS point of view, the area of memory is probably the single most important area responsible for performance increases or decreases as well as for predictable performance under load.
You can think of memory as money in the bank. Consider a family of 5, all of whom make deposits as well as withdrawals. Given that overall monthly cash in versus cash out is positive, there could still be many times where there might not be enough cash at any given moment to cover requested withdrawals. The issue of memory is the same. Continuing the bank account analogy, where there may be automatic monthly debits to the account, administrators and users must also take into account other system requirements when gathering memory usage requirements. You can have sufficient CPU cycles and an optimally configured I/O subsystem, but without sufficient memory resources, performance will be dismal.
Assuming sufficient CPU, I/O and network resources, the bottom line is that you want the sum total of requested memory plus system requirements to fit within the confines of your physical RAM configuration. This is absolutely crucial for predictable performance under load.
As a ballpark starting point, we generally recommend 0.5 GB of memory per CPU. On a more granular level, allocate 32-64 MB for base Solaris, 32 MB for the windowing system and 32 MB per SAS session. We will discuss how any individual SAS session could very easily need to be increased to 0.5 GB or even over 1 GB. To accumulate SAS session memory requirements on a system level, we must know:
This sounds as if it should be fairly simple, but in practice, it is practically impossible to find this number.
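As a rough illustration of the ballpark figures above, the sketch below computes both the per-CPU rule of thumb and the itemized baseline. The function is hypothetical, the session count in the example is invented, and a real estimate must use each session's actual MEMSIZE needs rather than the 32 MB minimum.

```python
def baseline_memory_mb(cpus: int, sas_sessions: int,
                       solaris_mb: int = 64, window_mb: int = 32,
                       session_mb: int = 32) -> dict:
    """Return the two ballpark estimates from the text, in megabytes."""
    per_cpu_rule = cpus * 512  # 0.5 GB of memory per CPU
    itemized = solaris_mb + window_mb + sas_sessions * session_mb
    return {"per_cpu_rule": per_cpu_rule, "itemized": itemized}


# 8 CPUs as in the example system; 20 concurrent sessions is a made-up figure.
estimate = baseline_memory_mb(cpus=8, sas_sessions=20)
print(estimate)
```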
The Solaris memory system provides virtual memory management on a demand paged basis. The swap device is used as a physical device for backing store for this virtual address space.
There is an excellent white paper detailing the Solaris Memory System written by Richard McDougall and available from ftp:playground.sun.com in the pub/memtool directory.
In many cases, the SAS software platform does virtual memory management on behalf of the SAS process. Memory is typically mmap(2)ed up to the limit of MEMSIZE if the specific application requires it or believes it can take advantage of it.
While Solaris does an excellent job of paging, allowing the SAS virtual memory management to page on top of Solaris is completely detrimental to application performance.
The key to predictable performance is to allow each SAS user a portion of memory such that the memory requirements for all concurrent SAS users can be resident in memory.
Let's examine paging from both the system's and the SAS user's perspective.
From a system perspective: Here are 2 "quick & dirty" ways to determine if your system is experiencing excessive paging using tools bundled with Solaris:
$ cat checkswap
echo " extended device statistics"
echo " r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device"
iostat -nxc 3 7 | grep c0t5d4
Note: the disk name substring (c0t5d4) is hardcoded above. It also assumes a single swap device in a complete partition.
Users can check the memory usage of their own applications with a new utility in Solaris 2.6, pmap(1). An excerpt of the output shows:
$ ps -e | grep sas
29859 pts/0 0:01 sas
$ /usr/proc/bin/pmap -x 29859
29859: /u1/sas612/sas
Address Kbytes Resident Shared Private Permissions Mapped File
00002000 8 8 - 8 read [ anon ]
00010000 2048 1792 8 1784 read/exec sas
0021E000 56 56 - 56 read/write/exec sas
0022C000 1272 1256 - 1256 read/write/exec [ heap ]
EE210000 256 32 - 32 read/write/exec [ anon ]
EE280000 2144 1200 8 1192 read/exec sasxxxpm
EE4A6000 336 336 - 336 read/write/exec sasxxxpm
EE500000 432 408 400 8 read/exec libX11.so.4
EE57A000 24 24 8 16 read/write/exec libX11.so.4
EE600000 2912 2304 8 2296 read/exec sasmotif
......
EF7D0000 112 112 112 - read/exec ld.so.1
EF7FA000 8 8 8 - read/write/exec ld.so.1
EFFF6000 40 40 - 40 read/write/exec [ stack ]
-------- ------ ------ ------ ------
total Kb 13192 10184 1960 8224
This is a SAS session sitting in an idle display mode. Our resident size for this process is 10+MB and this is about the overhead for each SAS session.
Administrators can use tools bundled with Solaris such as sar(1), vmstat(1M), or even ps(1), which give a snapshot of collective memory requirements. There are also freely available tools such as top, proctool and Memtool (references in the Appendix). Memtool is actually a collection of various tools described in the Solaris Memory white paper above. Solaris 7 also includes sdtprocess(1), which is similar to top.
Let's take a look at some of the tools. Below we see a snapshot of proctool on a relatively quiet system.

Of particular interest, look at the SIZE and RSS columns. SIZE
represents the amount of virtual address space and RSS represents the amount
of memory which is actually resident. A simple way to think about whether
you have enough memory is to sum the SIZE values or virtual address space
(the amount all processes want) and the RSS values (the amount that all processes
have). If the 2 values are roughly equal (Requested = Actual Size), then
you have enough memory. However, if the actual (RSS) is much less than
requested (SIZE), then paging is likely an issue. The sum of the RSS
column should equal roughly the amount of physical RAM in your system. The
(Memtool) memps -s command can give you a more finely grained breakdown
on the memory allocations. RSS or resident memory is actually composed
of shared and private memory segments. Thus, a large shared value among
like processes would not be consuming unique amounts of memory. For
example, most code segments for a specific binary are loaded in a read-only
shared mode. Thus multiple copies of that binary would only consume
the "shared" amount of memory. However, typically, SAS processes which
have large memory footprints consume mostly private data. Thus, in general,
when monitoring SAS processes, resident size values should be fairly close
to the size of the private segment. We show an excerpt from memps -s
below:
$ memps -s
SunOS ctcsun2 5.6 Generic_105181-11 sun4u 02/05/99
14:56:48 WaitCPU WaitIO AvgLoad Users CUsers
14:56:48 0.00 0.00 0.02 6 6
14:56:48
PID Size Resident Shared Private Process
28161 5912k 4720k 1304k 3416k ./proctool
11517 5904k 3752k 1344k 2408k dtgreet -display :0
1123 5728k 3840k 1440k 2400k /usr/dt/bin/dtlogin -daemon
11503 5752k 3840k 1440k 2400k /usr/dt/bin/dtlogin -daemon
28162 2848k 2056k 728k 1328k ./pmon
1106 3016k 2256k 1072k 1184k /usr/sbin/nsr/nsrindexd
932 2792k 2224k 1192k 1032k /usr/lib/autofs/automountd
1023 2416k 2088k 1072k 1016k /usr/sbin/nsr/nsrd
11 2744k 776k 24k 752k vxconfigd -m boot
951 2248k 1912k 1216k 696k /usr/sbin/nscd
936 4128k 1864k 1176k 688k /usr/sbin/syslogd
994 2168k 1712k 1080k 632k /usr/lib/sendmail -bd -q15m
1107 2520k 1912k 1280k 632k /usr/lib/nfs/mountd
11502 16208k 1672k 1080k 592k /usr/openwin/bin/Xsun :0 -nobanner
....
Although the listing above is not complete, the Size (Requested) column is fairly close to the RSS (Actual) and the totals fit well within our 1.5 GB RAM configuration.
The (Memtool) prtmem command is also very useful in showing the break down of application memory and Solaris buffer cache memory.
$ prtmem
Total memory: 1468 Megabytes
Kernel Memory: 117 Megabytes
Application memory: 14 Megabytes
Executable memory: 41 Megabytes
Buffercache memory: 7 Megabytes
Free memory: 1287 Megabytes
Note- Buffer cache memory is low, free memory is high.
$ vmstat 5
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s6 sd sd sd in sy cs us sy id
0 0 17 13960 7344 0 25 351 40 468 0 98 0 0 0 0 140 162 114 2 0 98
0 0 28 2718608 1318792 0 0 0 0 0 0 0 0 0 0 0 113 43 76 0 0 100
0 0 28 2718608 1318792 0 0 0 0 0 0 0 0 0 0 0 115 45 77 0 0 100
Vmstat(1) pretty much agrees with prtmem.
No activity on the swap device.
$ ./checkswap
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.2 0.1 1.5 3.4 0.0 0.0 1.2 18.9 0 0 c0t5d4
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t5d4
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t5d4
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t5d4
We saw a quiet system. Now, we start 2 SAS jobs, each of which will eventually request 1.2 GB of memory. These 2 jobs by themselves will collectively request 2.4 GB of memory; recall that we only have 1.5 GB of RAM. This particular proc (MDDB) requires a minimum MEMSIZE such that the N-way cube being loaded fits in virtual memory (1.2 GB in this case). If our system had only 0.5 GB of memory and were running just 1 job, you would have no choice but to allow Solaris to page on behalf of the SAS System.

Here we see a snapshot from proctool as the processes are getting started.
Each process is asking for ~612+ MB (Size column) but has been allocated only
~599 MB (RSS column).
$ prtmem
Total memory: 1468 Megabytes
Kernel Memory: 116 Megabytes
Application memory: 1268 Megabytes
Executable memory: 5 Megabytes
Buffercache memory: 55 Megabytes
Free memory: 22 Megabytes
Now, prtmem is showing most available memory allocated to application memory. High allocation to buffer cache memory is not necessarily bad (and is in fact usually a good thing). However, note that the "free" column in prtmem as well as vmstat may mislead one into thinking that there is a memory shortage when in fact the memory is just allocated to the buffer cache. This is where prtmem is helpful, in that it can break down the memory allocations between buffer cache and all other application or kernel usage.
As the job progresses, proctool below shows that each process has requested
~1 GB but still has ~600 MB.
Another view from memps -m is showing basically the same thing:
$ memps -m
SunOS ctcsun2 5.6 Generic_105181-11 sun4u 01/29/99
15:51:54 WaitCPU WaitIO AvgLoad Users CUsers
15:51:54 0.00 2.00 1.04 6 6
15:51:54 PID Size Resident Shared Private Process
20777 1161656k 677376k 1400k 675976k /u1/sas612/sas -memsize 1200m -auto
20781 1161664k 676776k 1392k 675384k /u1/sas612/sas -memsize 1200m -auto
20813 11240k 5640k 1400k 4240k /u1/sas612/sas -config /u1/sas612/a
20815 5912k 3008k 1232k 1776k ./proctool
11517 5904k 2152k 1224k 928k dtgreet -display :0
20816 2848k 1592k 688k 904k ./pmon
.....
Our swap device is showing activity:
$ ./checkswap
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.2 0.1 1.5 3.5 0.0 0.0 1.2 18.9 0 0 c0t5d4
8.3 0.0 66.4 0.0 0.0 0.1 0.0 13.8 0 11 c0t5d4
10.0 0.0 80.0 0.0 0.0 0.1 0.0 12.8 0 13 c0t5d4
6.0 0.0 48.0 0.0 0.0 0.1 0.0 11.9 0 7 c0t5d4
10.7 0.0 85.3 0.0 0.0 0.1 0.0 13.6 0 14 c0t5d4
11.0 0.0 88.0 0.0 0.0 0.1 0.0 12.9 0 14 c0t5d4
9.0 0.0 72.0 0.0 0.0 0.1 0.0 12.5 0 11 c0t5d4
And vmstat is also showing a memory shortage with very high scan rates:
$ vmstat 5
procs memory page disk faults cpu
r b w swap free re mf pi po fr de sr s6 sd sd sd in sy cs us sy id
0 0 17 14696 7504 0 25 353 41 469 0 99 0 0 0 0 140 163 114 2 0 98
0 1 49 398280 23816 9 168 1438 3224 3465 0 4116 0 0 0 0 384 642 410 2 2 96
0 1 49 398192 24272 0 156 1286 2896 3112 0 4081 0 0 0 0 374 539 406 0 2 98
0 1 49 398200 24216 3 162 1283 3404 3563 0 4117 0 0 0 0 433 694 426 1 2 97
0 1 49 398208 24088 44 157 1406 3644 3734 0 4056 0 0 0 0 407 601 418 1 3 96
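Watching the scan rate by eye gets tedious on a long run, so here is a minimal Python sketch that pulls the sr column out of a vmstat data row. The column names follow the header shown above, with the duplicate names renamed so they can serve as keys. The threshold of 200 is an illustrative assumption, not a value taken from this paper.

```python
# Column names follow the vmstat header shown above. The three "sd" disk
# columns and the second "sy" are renamed here (an assumption made only so
# that every key is unique).
VMSTAT_COLUMNS = (
    "r b w swap free re mf pi po fr de sr "
    "s6 sd_1 sd_2 sd_3 in sy cs us sy_cpu id"
).split()


def parse_vmstat_row(line):
    """Turn one data row of vmstat output into a dict keyed by column name."""
    values = [int(field) for field in line.split()]
    if len(values) != len(VMSTAT_COLUMNS):
        raise ValueError(
            "expected %d fields, got %d" % (len(VMSTAT_COLUMNS), len(values))
        )
    return dict(zip(VMSTAT_COLUMNS, values))


def scan_rate_is_high(row, threshold=200):
    """Flag a row whose page scan rate (sr) meets an illustrative threshold."""
    return row["sr"] >= threshold


# Second row of the vmstat listing taken while both MDDB jobs were running.
busy = parse_vmstat_row(
    "0 1 49 398280 23816 9 168 1438 3224 3465 0 4116 0 0 0 0 384 642 410 2 2 96"
)
print(busy["sr"], scan_rate_is_high(busy))  # 4116 True
```

Keep in mind that the first row vmstat prints is an average since boot, so it is usually skipped when looking for current activity.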
We looked at proctool and the memtools during this process.
For completeness, we wanted to show similar output from
top. An advantage of top is that it does not require the
installation of a kernel driver; however, it is usually installed as
a setuid root program. Thus, it can be run by users without requiring
the root password. Top output as the same two previous MDDB memory
intensive jobs get started:
$ top
last pid: 4862; load averages: 1.84, 0.88, 0.36 13:31:03
62 processes: 56 sleeping, 2 zombie, 1 stopped, 3 on cpu
CPU states: 75.3% idle, 20.3% user, 3.7% kernel, 0.7% iowait, 0.0% swap
Memory: 1536M real, 30M free, 1241M swap in use, 1258M swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
4858 root 4 0 0 515M 502M cpu9 2:47 11.76% sas
4859 root 3 0 0 515M 502M cpu4 2:46 11.75% sas
4862 root 1 0 0 2240K 1680K cpu0 0:04 0.31% top
1023 root 1 58 0 2416K 1144K sleep 3:49 0.00% nsrd
989 root 1 58 0 904K 656K sleep 0:04 0.00% utmpd
932 root 6 58 0 2848K 1936K sleep 20:17 0.00% automountd
880 root 1 58 0 2072K 912K sleep 1:13 0.00% rpcbind
951 root 11 50 0 2296K 1632K sleep 0:12 0.00% nscd
11 root 4 58 0 2744K 504K sleep 0:12 0.00% vxconfigd
1 root 1 58 0 824K 112K sleep 0:07 0.00% init
1106 root 1 58 0 3016K 1056K sleep 0:04 0.00% nsrindexd
945 root 1 48 0 1688K 440K sleep 0:03 0.00% cron
....
Similar to the 2nd proctool illustration above, the top command excerpt below demonstrates the same differential between SIZE (requested) and RES (actual or RSS) and thus we know that paging will be a performance factor. Both processes are asking for 1+ GB but able to secure only ~600 MB.
last pid: 4863; load averages: 1.03, 1.11, 0.64 13:36:22
63 processes: 57 sleeping, 2 zombie, 1 stopped, 3 on cpu
CPU states: 19.7% idle, 9.5% user, 3.0% kernel, 67.8% iowait, 0.0% swap
Memory: 1536M real, 22M free, 2114M swap in use, 386M swap free
PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
4859 root 3 20 0 1069M 619M cpu13 6:18 6.75% sas
4858 root 4 41 0 1000M 551M cpu4 5:50 4.63% sas
4862 root 1 0 0 2240K 1432K cpu5 0:12 0.31% top
936 root 20 52 0 4128K 1496K sleep 0:02 0.00% syslogd
932 root 6 58 0 2848K 1792K sleep 20:17 0.00% automountd
1023 root 1 58 0 2416K 784K sleep 3:49 0.00% nsrd
880 root 1 58 0 2072K 680K sleep 1:13 0.00% rpcbind
951 root 11 50 0 2296K 1552K sleep 0:12 0.00% nscd
11 root 4 58 0 2744K 152K sleep 0:12 0.00% vxconfigd
1 root 1 58 0 824K 120K sleep 0:07 0.00% init
1106 root 1 58 0 3016K 696K sleep 0:04 0.00% nsrindexd
989 root 1 58 0 904K 632K sleep 0:04 0.00% utmpd
945 root 1 48 0 1688K 360K sleep 0:03 0.00% cron
NOTE: PROCEDURE MDDB used:
time: memory:
real 12:22.520 page faults 3882
user cpu 7:26.896 page reclaims 0
system cpu 2:28.122 usage 1128.56 M
When 2 jobs are run, there should be plenty of CPU bandwidth since this is an 8 way system. However, due to the memory contention, the time to run goes from approximately 12 minutes to over 1 hour. All this extra time was spent paging. Recall that we had 1.5 GB RAM and the 2 jobs by themselves wanted 2.4 GB.
| | Time |
| job 1 | 1:05:16 |
| job 2 | 1:14:33 |
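To put a number on the slowdown, the elapsed times can be converted to seconds and compared with the 12:22.520 real time reported for the single MDDB run. This is a minimal sketch; the helper name to_seconds is hypothetical.

```python
def to_seconds(timestamp):
    """Convert 'H:MM:SS', 'M:SS.fff' or plain seconds into seconds."""
    seconds = 0.0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


single_run = to_seconds("12:22.520")  # real time from the FULLSTIMER note above
for name, elapsed in [("job 1", "1:05:16"), ("job 2", "1:14:33")]:
    factor = to_seconds(elapsed) / single_run
    print("%s took %.1f times as long as the single run" % (name, factor))
```

Both factors come out well above 2, which is what two jobs sharing 8 CPUs would show at worst if CPU were the only constraint.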
Let's look at this another way. We'll show how a faster system can produce
slower results if there is a memory shortage. We have 2 different MDDB procs
which respectively require 547 MB and 1.1 GB of memory. Our "slower" system
has 250 MHz processors and 1.5 GB RAM and our "faster" system has 300 MHz
processors but with only 1 GB RAM.
| | MDDB proc 1 (547 MB) | MDDB proc 2 (1120 MB) |
| "Slower" system, 1.5 GB RAM | 7:22.0 | 15:09.9 |
| "Faster" system, 1 GB RAM | 6:36.7 | 35:52.4 |

From the CPU discussion above, we saw a near linear increase in performance for CPU bound applications. Thus we expect the "faster" system to outperform the "slower" system. The results show that this is true when the problem fit in memory, but when it didn't, the run time more than doubled and the "faster" system actually ran much slower.
We don't intend to single out proc MDDB in our examples. There are other SAS procs which can potentially require a large minimum amount of memory in order to run. Some examples include IML, GLM, and certain data mining procs.
SWAP - A Necessary Evil, Gotta' have it, even if you don't use it
As mentioned in the I/O Section earlier, plenty of SWAP space must be allocated. How much can only be determined collectively by the users and system administrators. There must be enough to back all mmap(2)ed requests or jobs could fail with insufficient memory. Thus, even if you have enough memory for jobs to be resident in memory, you could still get an insufficient memory error condition if there is not enough swap to service the reservation. Given that any user can ask for an arbitrarily large memory footprint by specifying MEMSIZE=<BIG VALUE> or MEMSIZE=0, it is not immediately obvious to systems administrators how much SWAP to allocate.
Also, don't confuse virtual memory with real memory. A physical memory access is several orders of magnitude faster than a disk access. The main goal is to eliminate or minimize the activity to the SWAP device.
Increasing MEMSIZE/SORTSIZE - when does it help, when does it not?
There are 2 options which control memory usage when running SAS applications: MEMSIZE and SORTSIZE. They are set at 32 MB and 16 MB respectively in the default config file (config.sas612 or sasv7.cfg depending on the version of the SAS software).
MEMSIZE is the total amount of memory that a SAS application could allocate on behalf of a SAS process. The SORT procedure would use up to SORTSIZE amount of memory so as a general rule of thumb, MEMSIZE should be at least (SORTSIZE + ~4 MB) just to ensure that there is enough memory to meet the SAS requirements. Let's take a closer look at MEMSIZE. MEMSIZE is an upper limit. Consider 3 categories of SAS procs when looking at individual MEMSIZE settings:
i) Uses a small fixed amount of memory regardless of the value of
MEMSIZE
If you have MEMSIZE set to a large value and the fullstimer option reports
some small value of memory used, then increasing MEMSIZE won't help. For
instance
memory:
usage 57 K
Many data steps and procs (freq, tabulate, etc) use only a small amount of memory. However, if the amount of memory reportedly used is close to MEMSIZE, increasing MEMSIZE may help.
ii) Uses more memory as MEMSIZE increases
The SORT procedure is a good example. It will use as much memory as
specified in SORTSIZE. However, in general, unless the entire data set can
fit in memory, performance will remain flat despite the fact that more memory
is used. Thus, you could hurt overall system performance by consuming
more memory even though your application experiences no benefit. Below
are the results of SORT on our 1 GB household data set.
| SORTSIZE | Memory | Time |
| 16m | 16.49 MB | 8:33.45 |
| 32m | 32.82 MB | 8:23.43 |
| 64m | 65.44 MB | 8:26.98 |
| 128m | 130.72 MB | 8:33.64 |
| 256m | 261.19 MB | 8:16.93 |
| 512m | 522.36 MB | 8:15.90 |
| 768m | 783.65 MB | 9:14.33 |
| 1024m | 1045.03 MB | 8:52.91 |
| 1300m | 1141.12 MB | 3:30.72 |

As you can see, the times were flat around 8-9 minutes until the data set
fit in memory and then it went down to 3+ minutes. Unless you are sure it
will fit in memory AND you won't contend with other user requirements,
don't change SORTSIZE. There is an undocumented SORT parameter, UBUFSIZE;
however, in our testing scenario we did not see any performance benefit from
changing it from its default value of 8K.
The good news is that with SAS Version 8, additional memory is not consumed unless SORTSIZE is large enough for the entire data set. However, recall that SORTSIZE could still exceed the amount of physical RAM or the amount of available memory at a given time. In this case the SAS System would be relying on the virtual memory paging facility of Solaris, and performance could end up being much worse than with a small or default SORTSIZE of 16 MB.
If SORT is not doing a complete in-memory sort, the resulting runtime will typically be dominated by the I/O component. This can be seen as a proportionally large system time in the -FULLSTIMER output. Alternatively, a REAL time component that is larger than the sum of the USER + SYSTEM components demonstrates the same effect. In this case, the I/O configuration will be critical to maximizing performance.
Similar effects can be seen with the LOGISTIC procedure. In this case, the threshold was basically the size of the data set. When MEMSIZE was set to something less than the size of the data set, only 2 MB memory was used. The performance was only marginally slower when 2 MB memory was used than when the whole data set fit in memory. Thus it is arguable whether the benefit gained is worth the cost of memory consumption.
| MEMSIZE | Memory | Time |
| 16m | 1.3 MB | 7:01:12 |
| 32m | 1.3 MB | 7:11:51 |
| 64m | 1.3 MB | 7:14:42 |
| 128m | 1.3 MB | 7:07:31 |
| 256m | 1.3 MB | 7:08:01 |
| 512m | 416 MB | 6:28:32 |
In this case, 30 minutes out of 7 hours was saved by fitting the entire data set in memory. The cost for this savings was basically 500 MB of memory, about half the memory available to all users of the system.
Similar to the SORT procedure, the PHREG procedure (used for survival analysis studies) shows the same behavior, though in a more stairstep fashion. Although the data set fit in memory at the upper MEMSIZE values, we never saw any real increase in performance. Thus, increasing MEMSIZE in this case only hurt, because it consumed memory resources and made less available for other system processes.
| MEMSIZE | Memory | Time |
| 16m | 15.0 MB | 3:07 |
| 32m | 30.1 MB | 3:04 |
| 64m | 40.4 MB | 3:01 |
| 128m | 40.4 MB | 3:02 |
| 256m | 183.3 MB | 3:09 |
| 512m | 183.3 MB | 3:09 |
| 768m | 183.3 MB | 3:09 |
Without revisiting the paging issues discussed above, here is an example that shows
the value of finding the "sweet spot" for the MEMSIZE setting. We had a large
healthcare drug safety application using the IML procedure. Their system
had 1.5GB RAM. In initial testing, MEMSIZE values of 1 GB and 2 GB were used.
When 1 GB was used, the job did not run due to insufficient memory.
Because they weren't sure exactly how much memory proc IML required
in their situation, MEMSIZE of 2 GB was used. When set at 2 GB (.5 GB over
their physical RAM configuration), the job ran in 23 hours with user+system
time coming in around 13 hours. It was suggested that they try to discover
a MEMSIZE value which would allow the proc to run and was under the boundary
of their physical RAM configuration. If that wasn't possible, they
would have to rely on the virtual memory system. However, there was
good news in that using a MEMSIZE setting of 1.2 GB, the real time was brought
down to 13 hours thus reducing their job time by 10 hours. It was
definitely worth their trial and error effort to discover the "minimal maxima"
memory requirement.
Thus, you can increase MEMSIZE (or SORTSIZE) with the invocation:
<SAS_INSTALL_DIR>/sas -memsize 1024m myprog.sas
This kind of testing is made more difficult if other people or processes are contending for system resources at the time of your tests. If you have a performance baseline, you can still do valid comparisons. The user and system times should remain about the same between runs regardless of CPU and memory contention. If the real or wall clock time wildly differs between runs, then it is likely there is contention for CPU, I/O or memory resources.
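Before launching such a test run, the rule of thumb given earlier (MEMSIZE should be at least SORTSIZE plus about 4 MB) can be checked mechanically. The sketch below is only an illustration; the function name and the way the values are passed in are assumptions, not part of the SAS System.

```python
def memsize_covers_sortsize(memsize_mb, sortsize_mb, overhead_mb=4):
    """Apply the rule of thumb: MEMSIZE >= SORTSIZE + ~4 MB of overhead."""
    return memsize_mb >= sortsize_mb + overhead_mb


# The defaults quoted above (MEMSIZE 32 MB, SORTSIZE 16 MB) satisfy the rule.
print(memsize_covers_sortsize(32, 16))      # True
# Raising SORTSIZE to 1300 MB while MEMSIZE stays at 1024 MB does not.
print(memsize_covers_sortsize(1024, 1300))  # False
```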
In cases where there is a fair amount of memory paging going on, we have seen general application performance degrade unusually severely. For instance, an "ls"(1) of a directory doesn't come back or a simple windowing system event (such as a mouse click) doesn't get serviced. In this scenario, it is possible that the application pages are being paged out due to the large amount of paging and/or file accesses.
With priority paging turned on, the algorithm allows the system to place a boundary around the file cache so that file system I/O does not cause paging of applications.
To use this, you need Solaris 7 or Solaris 2.6 with kernel patch 105181 (rev 09 or higher). Set the following in /etc/system and reboot:
set priority_paging=1
You can discover more information on this via:
http://www.sun.com/sun-on-net/performance/priority_paging.html
As discussed in the Solaris Buffer Cache section and in the memory section
where the prtmem command was shown, don't confuse the distinction
between the memory used for the buffer cache and application memory.
Priority paging is targeted for situations where heavy file system
I/O induces a situation where the application pages become second class citizens.
One critical note: if you enable priority paging, make sure that your large data files do not have execute permission on them. The criterion for distinguishing an application page from a pure file system I/O page is whether the file has execute permission.
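One way to audit a data area for this is to walk the directory tree and report large regular files that have any execute bit set. The following Python sketch uses only the standard library; the 100 MB size cutoff and the function name are assumptions chosen for illustration.

```python
import os
import stat


def executable_data_files(root, min_size_bytes=100 * 1024 * 1024):
    """Yield (path, size) for large regular files under root with an execute bit.

    The size cutoff is an arbitrary assumption; pick one that matches what
    counts as a "large data file" on your system.
    """
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, devices, and so on
            if info.st_size >= min_size_bytes and info.st_mode & exec_bits:
                yield path, info.st_size
```

Looping over executable_data_files() with the path of your own data directory prints the candidates; clearing the execute bits on them (for example with chmod a-x) is then a decision for the file owner or administrator.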
Summary
In general, if an application is not particularly I/O intensive and your wall clock time is 2 or more times the combined user + system time, look closely for CPU and memory/paging constraints. Also, as noted in the I/O section, be sure that you have enough backing store (SWAP) or you may see unpredictable results.
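This check is easy to automate once the FULLSTIMER times are in seconds. A minimal sketch, with hypothetical function names, using the MDDB timings reported earlier (real 12:22.520, user 7:26.896, system 2:28.122):

```python
def wall_to_cpu_ratio(real_s, user_s, system_s):
    """Ratio of wall clock time to combined user + system CPU time."""
    return real_s / (user_s + system_s)


def worth_investigating(ratio, threshold=2.0):
    """Apply the "2 or more times" guideline from the text."""
    return ratio >= threshold


# MDDB single-job run: 742.520 s real, 446.896 s user, 148.122 s system.
ratio = wall_to_cpu_ratio(742.520, 446.896, 148.122)
print(round(ratio, 2), worth_investigating(ratio))  # 1.25 False
```

The single-job run stays well under the guideline, which matches the observation that it ran without paging.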
Only the user and system administrators together can determine the cost/benefit ratio of increasing memory resources.
Increase MEMSIZE carefully and judiciously. If you know that you can benefit from an increase in memory, evaluate its effects on a system wide basis. More often than not, in this case, using a smaller MEMSIZE will gain you more predictable performance when the system is under load. Think of it as a large sandbox with a lot of toys. The more kids in there, the fewer toys there will be for each to play with. It's a lot more pleasant for all the kids if they share rather than fight to get toys exclusively for themselves.
In certain cases, it will be quite possible that every page the SAS application asks for causes a page fault, and performance will be dismal. If the application can't be recoded (fewer iterations or fewer variables used), the only way to improve performance would be to add memory to the system. If these jobs are critical for your job function, then it should be justifiable that more memory is needed.
We have hopefully demonstrated the effects of paging. When asked "Do SAS
applications require a lot of memory?", the answer is, "In general, no, SAS
applications do not require a lot of memory". For "most" procedures,
the SAS System works extremely well in low memory configurations. However,
specific applications can have large memory requirements either by
user directive or if there are problems to solve which require large memory
configurations.
In a reasonably configured system, the most performance gain can usually be realized by tuning the application. The manual, SAS Programming Tips: A Guide to Efficient SAS Programming outlines basic principles for writing efficient SAS code, namely:
This manual does a nice job of cross referencing the tips and suggestions by resources saved: CPU, I/O, memory and should be a must-read for SAS application developers. The suggestions above are only a small sample of what is covered.
Again, you may find better results which are contrary to these suggestions. A user from a large consumer credit organization mentioned that they found that using the OR operator as opposed to IN provided better results.
Another reference highly recommended by SAS users, Efficiency: Improving the Performance of Your SAS(R) Applications, by Robert Virgile (order GO55960 or ISBN: 1-58025-228-1) can be ordered through SAS publications.
We discussed the efficiency of creating indexes. The recommended decision criteria as to whether or not to index a data set is roughly based on the expected size of the returned sample. On the average, indexing should be used if the typical returned sample is ~30% or less of the entire data set. As the returned subset gets larger, the cost of creating, maintaining and reading the index is one of diminishing returns.
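The decision criterion above can be written down as a one-line check. This is only a sketch of the rule of thumb; the function name is hypothetical and the 30% cutoff is the approximate figure quoted in the text, not a hard limit.

```python
def index_recommended(rows_returned, rows_total, cutoff=0.30):
    """Suggest an index when the typical subset is ~30% or less of the data set."""
    if rows_total <= 0:
        raise ValueError("rows_total must be positive")
    return rows_returned / rows_total <= cutoff


print(index_recommended(6, 100))   # True: a 6% sample
print(index_recommended(75, 100))  # False: a 70-80% sample
```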
Using our household data set, we perform a PROC FREQ of the state variable to understand the distribution of the data by state, pull a 6% sample, a 40-50% sample and then a 70-80% sample over the vanilla dataset, a sorted version and then an indexed version.
| | Regular | SORTed | INDEXed |
| SORT Time | | 5:36.5 | |
| INDEX Time | | | 2:02.7 |
| proc FREQ | 58.3 | 1:34.1 | 1:24.7 |
| 6% sample | 51.8 | 51.1 | 6.9 |
| 40-50% sample | 1:17.1 | 1:10.6 | 56.7 |
| 70-80% sample | 1:23.4 | 1:22.6 | 1:23.3 |
The cost of sorting and indexing is not trivial in terms of CPU cycles and
disk space required. This exercise was to demonstrate (albeit simplistically)
how application coding decisions can widely affect the results to get the
same answer.
Another example could be deciding between PROC SUMMARY or PROC SQL which could be used interchangeably. PROC SQL sorts the input data set while PROC SUMMARY sorts the output data set. If the input data set is large and the output data set small, then proc SUMMARY is probably more appropriate. If the reverse is true (output dataset > input dataset), then choosing PROC SQL would be more beneficial.
In another example, we were running PROC SUMMARY in Version 7. While running, we ran truss(1M) on the process and noticed that the process was endlessly showing:
# ps -e | grep sas
20926 pts/0 0:45 sas
# /usr/bin/truss -p 20926
...
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
getcontext(0xEE48D568)
....
It turned out that this was the result of the floating point exception checks. This code was needed to prevent PROC SUMMARY from terminating in the case of missing values. Knowing this was not an issue, we supplied the "notrap" option to PROC SUMMARY. The results were significant in that performance more than doubled:
| | default | with notrap |
| Real time | 40:56.29 | 17:54.96 |
| User cpu time | 25:08.09 | 17:03.29 |
| System cpu time | 14:38.10 | 25.09 seconds |

While it may not have been obvious from the truss output that you needed to turn on "notrap", this information was enough to get a quick reply from SAS engineering.
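When a trace is long, tallying the system call names makes a pattern like this stand out immediately. The sketch below counts calls in saved truss output; it assumes each line begins with the call name followed by an opening parenthesis, as in the excerpt above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches a leading system call name such as "getcontext(".
SYSCALL_RE = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_]*)\(")


def syscall_counts(truss_lines):
    """Count how often each system call name appears at the start of a line."""
    counts = Counter()
    for line in truss_lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The twelve getcontext lines from the excerpt, plus its "..." filler lines.
sample = ["..."] + ["getcontext(0xEE48D568)"] * 12 + ["...."]
print(syscall_counts(sample).most_common(1))  # [('getcontext', 12)]
```

A single call dominating the tally, together with a large system CPU time, is the kind of evidence that is worth passing along to support.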
Version 8 data access
The SAS System, Version 8 provides many new features some of which include
more transparent access to different data set formats. Data sets can be accessed
across different hardware platforms without converting to transport format
via PROC CPORT/CIMPORT. For example, a SAS program running on Solaris
would be able to read a data set on a different hardware platform which had
a different byte ordering format. This is a very nice feature but it
does come at a performance cost in terms of the SAS system having to internally
do the conversion. If frequently used, large data sets are being utilized,
it could be well worth the conversion time and disk space to convert them
to native formats.
Interesting and Important But...
A key area of the system that was not discussed was that of the network configuration. This could probably warrant a paper by itself. A real strength of the SAS System is in the suite of SAS/CONNECT and SAS/ACCESS products which respectively offer you the ability to remote connect and/or distribute portions of the SAS application to remote servers as well as the ability to seamlessly access many different types of data stores/application suites. Obviously, a key component for successful implementation would be the network architecture.
There are technologies and products from both SAS Institute and third parties which allow parallelization of typical data warehousing SAS processes and thus can exploit the multiprocessor environment of the Sun Enterprise series. Products such as the SPDS (Scalable Performance Data Server) from the SAS Institute allow parallelization in areas such as fast parallel loads, batch updates, fast reads, bitmap and b-tree indexes, parallel sort for ORDER BY processing, and WHERE clause evaluation. Third-party products include The SAS Analyzer from Ab Initio and Orchestrator for the SAS System from Torrent Systems. All of these are available and shipping today.
Executive Summary
System performance must be approached holistically with cooperation between the users and system administrators by examining an application within the context of the system configuration. A good configuration must balance the CPU, I/O, and network system resources against the collective application requests. Increasing resources in one area without regard to the other areas could actually hurt overall performance. Lastly, don't overlook optimizations at the application code level as this is where the biggest payoff is most likely to be realized.
Classical SAS applications require relatively little memory; however, some
applications can require a large amount. Getting a handle on the collective
memory requirements is a key factor for predictable performance
under load. Having a surplus of system memory is the cheapest "insurance"
that one can buy when sizing a system for future growth and unexpected or
unpredictable peak load times. As a very general target, look to see if your
system has either .5 GB of memory configured per CPU or 32 MB per user.
Adjustments up or down should be made from those starting points.
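The two starting points can be compared for a given configuration with a small calculation. This is only a sketch of the guideline; the function name is hypothetical, and the user count in the example is an invented illustration rather than a measured value.

```python
def memory_starting_points_mb(cpus, users, mb_per_cpu=512, mb_per_user=32):
    """Return the two general targets: .5 GB per CPU and 32 MB per user."""
    return {
        "per_cpu_rule_mb": cpus * mb_per_cpu,
        "per_user_rule_mb": users * mb_per_user,
    }


# An 8-way system with a hypothetical population of 50 users.
print(memory_starting_points_mb(8, 50))
# {'per_cpu_rule_mb': 4096, 'per_user_rule_mb': 1600}
```

Either figure is a starting point to adjust from, as the text says, not a sizing guarantee.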
There is relatively little tuning outside of "common sense" that should be
done to the system (i.e.: tuning of kernel parameters, adjusting OS I/O buffer
sizes, implementing more/less aggressive file caching and look ahead, etc).
Rather, the biggest wins can usually be gained at the application level.
With the exception of occasionally changing the SAS MEMSIZE/SORTSIZE
parameters, it's probably best to generally leave all other SAS parameters
as well as Solaris parameters alone. The one other exception is if you have
consistent production level jobs which are periodically run on a regular
schedule. An example might be a weekly or monthly refresh of the data warehouse
or data mart. Only in these cases, where you have a well characterized job,
does it usually make sense to start looking at changing some of the other
SAS System or file system parameters.
In general, tune the application and not the hardware or OS or SAS
platform.
Hopefully, we have demonstrated the effects of contention for resources
on the I/O, CPU and memory level and when it might make sense to investigate
system upgrades and/or expansions. While the costs of reduced or limited
productivity are difficult to quantify, this factor should not be overlooked
if many analysts and users experience idle time waiting for their jobs and
applications to complete.
Note: top, proctool, Memtool, and hstat are not officially supported. Additionally, all except top require installation of kernel drivers. The tools which are bundled with Solaris are fully supported.
B) Detailed system configurations
Most testing was done on a pair of 8 way Sun Enterprise 4000's. Each had 1.5 GB RAM and utilized Sun Model 100 Storage Arrays configured with the VERITAS Volume Manager version 2.5. One system had 167 MHz processors, while the other had 250 MHz processors. Both were running Solaris 2.6. For the VERITAS File system tests, we were using version 3.2.5 of VxFS. On one system we were running the SAS System, Release 6.12 TS045 and TS050 on the other.
For the other clock speed tests, we used a 2 way Sun Enterprise 450 with 300MHz processors, 1 GB RAM, and internal UltraSCSI storage. The last system we used was an 8 way Enterprise 4500 with 336 MHz processors, 4 GB RAM, and Sun A5000 storage. Both systems had similar software configurations to the base test platforms.
We would like to thank the following people for their contributions of expertise and time:
If you have any comments, suggestions or would like to share any application
experiences, email one of the authors.
09June1999
Version 1.2
http://www.sas.com/partners/directory/sun/performance/