SAS Institute. The Power to Know

Learning Center

Question from the Field

Question: Why do I get unexpected results when I use the following program to read embedded data following the CARDS (or DATALINES) statement:
data retirees2;
   infile cards dsd dlm=' ';
   input empid $ contrib @@;
cards;
E00973 1400 E00192  E00543 1500
E00123 4500 
E00444 123
;
proc print data=retirees2;
run;
(Note the extra blank delimiter in the first record between the values E00192 and E00543. It is meant to mark a missing CONTRIB value for E00192.)

Answer: Embedded data following a CARDS statement (or DATALINES statement) do not have end-of-line characters. A data step that reads embedded data uses an input buffer with a length of 80 bytes (80 characters), and when a data line is loaded into the input buffer, it is padded with blanks to the full 80 characters. The DLM=' ' option on the INFILE statement specifies the blank as the delimiter, and the DSD option specifies that two consecutive delimiters enclose a missing value. This combination reads the data lines correctly up to the point where the padding begins. Because the padding consists of blanks and the blank is the specified delimiter, the data step treats each pair of consecutive blanks in the padding as enclosing a missing value. This produces unwanted observations with missing values in the output.

Note that the double trailing @ (@@) on the INPUT statement holds a data line in the input buffer across iterations of the data step until the end of the record, at 80 bytes, is reached. At that point the data step loads the next record and repeats the process.
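
To watch the padding being read, a diagnostic sketch like the following may help. It reuses the original data lines and adds the INFILE option COLUMN=, which stores the current column of the input pointer in a variable (named PTR here, an arbitrary choice). The data set name PADDING_CHECK and the variable ENDCOL are hypothetical. If the explanation above holds, the log and the printout should show observations with missing values whose pointer column lies beyond the typed data, out toward column 80.

```sas
/* Diagnostic sketch: record where the input pointer sits after each  */
/* observation. COLUMN= assigns the pointer column to PTR; like an    */
/* automatic variable, PTR is not written to the data set.            */
data padding_check;
   infile cards dsd dlm=' ' column=ptr;
   input empid $ contrib @@;
   /* Copy the pointer column into a variable that is kept */
   endcol = ptr;
   putlog _n_= empid= contrib= endcol=;
cards;
E00973 1400 E00192  E00543 1500
E00123 4500 
E00444 123
;
proc print data=padding_check;
run;
```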

Possible solutions:
  1. Use a different delimiter that does not appear in the padding, for example the comma. Two consecutive commas then mark the missing value.
    data retirees2;
       infile cards dsd dlm=',';
       input empid $ contrib @@;
    cards;
    E00973,1400,E00192,,E00543,1500
    E00123,4500 
    E00444,123
    ;
    proc print data=retirees2;
    run;
    
  2. Read the data from an external file. On Windows and Unix, SAS reads text files with a variable record format (recfm=v) by default, in which an end-of-line marker ends each data line. Each record is loaded into the input buffer with its actual length and, unless the PAD option is specified, is not padded with blanks. When the data step reaches the end of the record, it loads the next data line.
    data retirees2;
       /* retirees.dat holds the same three data lines shown in the question */
       infile 'retirees.dat' dsd dlm=' ';
       input empid $ contrib @@;
    run;
    proc print data=retirees2;
    run;
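
Solution 2 can be tried end to end with a sketch like the following. A DATA _NULL_ step first writes the three sample records to retirees.dat with FILE and PUT statements (the relative path is an assumption; point it at a writable location on your system), and the reading step is the one from solution 2. Because each record in the file ends at its last typed character, the @@ pointer should move to the next record right after the last value instead of reading blank padding.

```sas
/* Step 1: write the sample records to an external file.              */
/* The path 'retirees.dat' is an assumption; use a writable location. */
data _null_;
   file 'retirees.dat';
   put 'E00973 1400 E00192  E00543 1500';
   put 'E00123 4500';
   put 'E00444 123';
run;

/* Step 2: read the file back with the same INFILE options as before. */
data retirees2;
   infile 'retirees.dat' dsd dlm=' ';
   input empid $ contrib @@;
run;

proc print data=retirees2;
run;
```

With these three records the result is expected to hold five observations: four complete ones plus E00192 with a missing CONTRIB value.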
    



    These sample files and code examples are provided by SAS Institute Inc. "as is" without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Recipients acknowledge and agree that SAS Institute shall not be liable for any damages whatsoever arising out of their use of this material. In addition, SAS Institute will provide no support for the materials contained herein.