www.sas.com > SAS UK > In the Know Homepage Search | Contact Us     
SAS UK Newsletter Banner SAS - The power to know(tm)  

New Datastep Functions in SAS®9

Previously in version 8 we may have used multiple functions and/or logic within the datastep to achieve the desired results but now in version 9 in order to make this process easier we have introduced many new SAS functions. This will save time, effort and system resources when coding using the datastep.

  • New CHARACTER functions
  • New FIND and FINDC functions
  • New CAT functions
  • New COMP functions
  • New COUNT functions
  • New LENGTH functions
  • New Mathematical & Statistical functions
  • New VARIABLE functions
New CHARACTER Functions There are 13 of these character functions and they come in pairs. The 'ANY' group of functions search for the first matching character in the target string. The 'NOT' group of functions searches for the first character that does not belong to the target string.

Here's an example using the ANYLOWER function. An optional second parameter allows you to specify the starting position for the scan and the direction of the scan. In this example we are searching for a lower case character in a string starting at position 5 but moving from right to left.

 
     p=anylower('ALIcE', -5);   p=4
	
	
Note: The result is still given in the forward direction from left to right.

Here is a table summarising the new character functions.

"ANY" function "NOT" function Characters Considered
ANYALNUM NOTALNUM 0-9 a-z A-Z
ANYALPHA NOTALPHA a-z A-Z
ANYCNTRL NOTCNTRL Control characters
ANYDIGIT NOTDIGIT 0-9
ANYFIRST NOTFIRST Characters valid as first character in a SAS variable name under VALIDVARNAME=V7, i.e. a-z A-Z and underscore
ANYGRAPH NOTGRAPH Printable characters other than white space. (Definition varies depending on TRANTAB, ENCODING and LOCALE system options.)
ANYLOWER NOTLOWER a-z
ANYNAME NOTNAME Characters valid in a SAS variable name under VALIDVARNAME=V7 i.e. a-z A-Z 0-9 and underscore
ANYPRINT NOTPRINT Printable characters (including white space). (Definition varies, as for ANYGRAPH).
ANYPUNCT NOTPUNCT Punctuation (Definition varies, as for ANYGRAPH).
ANYSPACE NOTSPACE White space (including TAB, VT, CR, LF, FF)
ANYUPPER NOTUPPER A-Z
ANYXDIGIT NOTXDIGIT 0-9 a-f A-F

New FIND and FINDC Functions

The FIND and FINDC functions are enhanced versions of the INDEX and INDEXC functions respectively.

There are several optional parameters that can be added to the FIND or FINDC functions such as:

"i" = case insensitive
"t" = trim both string and substring
"v" = NOT

Here are some examples using FIND and FINDC:

 
  	a=find('ThetestString','test');  	   	*a=4 (t);
	b=find('thetestString','St',-13); 	   	*b=8 (S);	 
	c=find('Caravan ','car ','it'); 	  	*c=1 (c);
	
	a=findc('Alpha Test','abc ',-10);     	*a=6 (space);
	b=findc('Alpha Test','abc ',-10,'t');  	*b=5 (a);
	c=findc('Alpha Test','abc ','iv');     	*c=2 (l);
	
	
New CAT Functions

There are four new concatenation functions introduced in version 9. Here are examples of the new functions along with the equivalent code used in version 8.

CAT(OF X1-X4) concatenates "as is" .
Version 8 syntax: X1||X2||X3||X4

CATT(OF X1-X4) removes the trailing blanks.
Version 8 syntax: TRIM(X1)||TRIM(X2)||TRIM(X3)||TRIM(X4)

CATS(OF X1-X4) removes the leading and trailing blanks.
Version 8 syntax: TRIM(LEFT(X1))||TRIM(LEFT(X2))||TRIM(LEFT(X3))||TRIM(LEFT(X4))

CATX(SP, OF X1-X4) removes the leading and trailing blanks and inserts separators.
Version 8 syntax: TRIM(LEFT(X1))||SP||TRIM(LEFT(X2))||SP||TRIM(LEFT(X3))||SP||TRIM(LEFT(X4))

New COMP Functions

There are three new functions for comparing pairs of strings.

The COMPARE function returns the first character position at which the two strings differ. The sign is used to tell us which of the two comes first in the sorting sequence.

The COMPGED function calculates the "Generalized Edit Distance" between two strings. The GED value returned is a measure of how likely one is to be a misprint for the other as a minimum-cost sequence.

The COMPLEV function calculates the "Levenshtein Edit Distance" between two strings. This is a special case of the General Edit Distance.

There are additional optional parameters that can be added to these functions:

"i" = case insensitive
"l" = ignore leading blanks

 	   	
	d=compare('Stationary',' stationery','il');  *d= -8  (because a comes before e);
	GED=compged('baXboon', 'baboon');     *GED= 100 (deleting X);
	

New COUNT Functions

The COUNT function counts the number of times that a specific substring of characters appears within a character string that you specify.

 
	c=count("This is a thistle? Yes, this is a thistle.",'this','i');   *c=4;
	
The COUNTC function counts the number of specific characters that either appear or do not appear within a character string that you specify.
 
	c=countc('3.14159','12345','vo');   *c=2   (i.e. point and 9)	;
	
New LENGTH Functions

In version 8 the LENGTH function returns the length of a non-blank character string, excluding trailing blanks, and returns 1 for a blank character string.

Three new LENGTH functions have been introduced in version 9.

LENGTHC returns the length of a non-blank character string, including trailing blanks.

 
	len=lengthc("The string has trailing spaces    ");    *len=35;
	
LENGTHN returns the length of a non-blank character string excluding trailing blanks, and returns 0 for a blank character string.
  
	len=lengthn(" ");  *len=0;
	
LENGTHM returns the amount of memory (in bytes) allocated to the string. Here four spaces within a string reserves four bytes for the string.
  
	len=lengthm("   ");  *len=4;
	
New Mathematical & Statistical Functions

The MEDIAN function returns the median of all the non-missing arguments.

 
	y=median(5,8,0,3,4);                   *y=4;
	
The MAD function returns the Median Absolute Deviation from the median.
 
	mad=mad(2,4,1,3,5,999999);      *mad=1.5;
	
The IQR function returns the inter-quartile range of a set a values.
 
	iqr=iqr(2,4,1,3,999999);             * iqr=2 ;
	
The PCTL function calculates percentiles. In this example we calculate the 25th percentile of the other 9 parameters.
 
	n=pctl(25,1,2,3,4,5,6,7,8,9);        *n=3 ;
	
The LARGEST and SMALLEST functions select values from the desired position in a list. In this example it will return the third-largest of the other 9 parameters.
 
	n=largest(3,1,2,3,4,5,6,7,8,9);     *n=7;
	
New VARIABLE Functions The VVALUE function returns the formatted value that is associated with the variable that you specify. Within version 8 you would use the PUT function with the variable's own default format. The argument to VVALUE must be a variable.
 
	y=9999; 
	format y comma10.2; 
	v=vvalue(y); 
	v=9,999.00
	
The VVALUEX function returns the formatted value of an expression. The expression specifies any SAS character expression that evaluates to a variable name. The value of the specified expression cannot denote an array reference.
 
	date1='02feb06'd; 
	date2='date1'; 
	format date1 date9.; 
	datevalue=vvaluex(date2); 
	datevalue=02FEB2006	
	

The Power to Know
   Contact Us      Worldwide Sites     Search     Site Map     RSS Feeds     Terms of Use    Privacy Statement   Copyright © 2008 SAS Institute Inc. All Rights Reserved