A Technical Look at What's New in SAS®9 - Base SAS Language

New and enhanced features in Base SAS save you time, effort, and system resources by providing faster processing and easier data access and management, more robust analysis, and improved data presentation.

In addition to the existing SAS regular expressions (RX), SAS 9 introduces Perl regular expression (PRX) functions and CALL routines for manipulating character values.

Perl is a scripting language that is similar to C in syntax and includes a number of popular Unix functions. It is most commonly used for developing CGI programs because of its excellent text manipulation facilities.

Pattern matching lets you search for and extract multiple patterns from a character string, and make several substitutions in a string in one step.

SAS uses a modified version of Perl as a pattern matching language to parse character strings. With the new Perl functions you can:

  • search for a pattern of characters within a string
  • extract a substring from a string
  • search and replace text with other text
  • parse large amounts of text, such as Web logs or other text data, more quickly than with existing SAS regular expressions.

Here is an example that uses the PRXPARSE and PRXMATCH functions to pattern-match character values and select valid Reading-based postcodes.


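DATA Postcode_screening (DROP=RE POSITION);
   /* Compile the pattern once, on the first iteration only */
   IF _N_ = 1 THEN DO;
      /* RG in any case, a 1-2 digit district, an optional space, */
      /* then the inward code: one digit followed by two letters. */
      /* This accepts any RG district; narrow \d{1,2} if only     */
      /* particular districts should pass.                        */
      RE = PRXPARSE("/^RG\d{1,2} ?\d[a-z]{2}/i");
      IF MISSING(RE) THEN DO;
         PUT "ERROR IN COMPILING REGULAR EXPRESSION";
         STOP;
      END;
   END;
   RETAIN RE;
   INPUT POSTCODE $CHAR8.;
   /* PRXMATCH returns the position of the match, or 0 if none */
   POSITION = PRXMATCH(RE,POSTCODE);
   IF POSITION GT 0 THEN OUTPUT;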
DATALINES;
RG30 3xy
Rg10 7uj
tg19 078
Rg15np
DG13 2wd
;
RUN;
 
PROC PRINT DATA=Postcode_screening NOOBS;
   TITLE "List of Reading based postcodes only";
RUN;
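
The same approach extends to the search-and-replace capability listed earlier. The following is a minimal sketch rather than part of the tested example above: it assumes the CALL PRXCHANGE routine with a Perl-style s/// pattern, and the data set name Postcode_spaced, the pattern and the two sample values are hypothetical. It inserts the missing space into a postcode that was keyed without one.

```sas
DATA Postcode_spaced (DROP=RE);
   IF _N_ = 1 THEN DO;
      /* Hypothetical pattern: two letters and a 1-2 digit district  */
      /* followed directly by the inward code (digit + two letters). */
      /* The replacement puts one space between the two captures.    */
      RE = PRXPARSE('s/^([a-z]{2}\d{1,2})(\d[a-z]{2})\s*$/$1 $2/i');
      IF MISSING(RE) THEN DO;
         PUT "ERROR IN COMPILING REGULAR EXPRESSION";
         STOP;
      END;
   END;
   RETAIN RE;
   INPUT POSTCODE $CHAR8.;
   /* Change at most one match; POSTCODE is updated in place */
   CALL PRXCHANGE(RE, 1, POSTCODE);
DATALINES;
RG30 3xy
Rg15np
;
RUN;

PROC PRINT DATA=Postcode_spaced NOOBS;
   TITLE "Postcodes with the space restored";
RUN;
```

With these two rows, the first value already contains a space and should be left unchanged, while the second should come out as Rg1 5np.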

Here is a list of the new PRX functions and CALL routines:

PRXMATCH        Searches for a pattern match and returns the position at which the pattern is found.
PRXPAREN        Returns the last bracket match for which there is a match in a pattern.
PRXPARSE        Compiles a Perl regular expression that can be used for pattern-matching a character value.
CALL PRXCHANGE  Performs a pattern-matching substitution.
CALL PRXDEBUG   Enables Perl regular expressions in a DATA step to send debug output to the SAS log.
CALL PRXFREE    Frees unneeded memory that was allocated for a Perl regular expression.
CALL PRXNEXT    Returns the position and length of a substring that matches a pattern and iterates over multiple matches within one string.
CALL PRXPOSN    Returns the start position and length for a capture buffer.
CALL PRXSUBSTR  Returns the position and length of a substring that matches a pattern.
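
To show how the CALL routines in this list fit together, here is a short sketch that uses CALL PRXNEXT to step through every match in one string and CALL PRXFREE to release the compiled pattern afterwards. The text value, the pattern and the variable names (START_POS, STOP_POS, MATCH_LEN, FOUND) are assumptions made for illustration only.

```sas
DATA _NULL_;
   /* Hypothetical free text containing several postcodes */
   TEXT = "Deliver to RG1 5np, then RG30 3xy, then DG13 2wd";
   RE = PRXPARSE("/RG\d{1,2} ?\d[a-z]{2}/i");
   IF MISSING(RE) THEN DO;
      PUT "ERROR IN COMPILING REGULAR EXPRESSION";
      STOP;
   END;
   START_POS = 1;              /* where the search begins       */
   STOP_POS  = LENGTH(TEXT);   /* last position to be searched  */
   /* Each call returns the next match and advances START_POS */
   CALL PRXNEXT(RE, START_POS, STOP_POS, TEXT, POSITION, MATCH_LEN);
   DO WHILE (POSITION > 0);
      FOUND = SUBSTR(TEXT, POSITION, MATCH_LEN);
      PUT FOUND= POSITION= MATCH_LEN=;
      CALL PRXNEXT(RE, START_POS, STOP_POS, TEXT, POSITION, MATCH_LEN);
   END;
   /* Release the memory held by the compiled expression */
   CALL PRXFREE(RE);
RUN;
```

Run against the sample text, the loop should write one line to the SAS log for each of the two RG postcodes and then end when POSITION comes back as 0.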

Further documentation for SAS 9 can be found on the SAS Products and Solutions website at:
http://support.sas.com/documentation/onlinedoc/index.html

For more comprehensive examples of using Perl regular expressions, search the SAS OnlineDoc for:

SAS 9 Language Reference: Concepts, "Pattern Matching Using SAS Regular Expressions (RX) and Perl Regular Expressions (PRX)".

Note: The above code has been tested using SAS 9.1.2 on the Windows operating system.