SAS Support
Self Service Support
Assisted Support
Premium Support
Infrastructure Support
Logging a Call
Service Levels
Hints and Tips
 

IMPORT CSV FILES AND CREATE SAS DATA SETS ON THE FLY

(Written by Mohammad Abbas)

SAS users and programmers are familiar with the traditional way in which SAS reads external files such as CSV or TXT files using Filename and Infile statement. For example:

filename oscar "H:\ACCOUNTS\People\Mohammad\stat_data\Barley.csv";


data Barley;

infile oscar;

input yield variety $ year site $;

run;

As can be seen this method of reading external files works well, however on an industrial scale, and in high pressure working environments where there is a need for speed, the traditional approach becomes less manageable when handling large number of CSV files that are not of fixed data structure. For example

1. If the user wants to read multiple CSV files and to create multiple SAS data sets simultaneously.
2. If the CSV files to be read are regularly changeable both in variable names, formats and data values.

The Macro, I am submitting, works well both in terms practicality and selectivity. It offers the user the chance to select which data sets to be imported. It also offers the user the chance to generate summary statistics by customizing the macro to include one or more of SAS Procedures.

Here, I describe the practical aspects of using the Macro.

1. Create "what I called Datalog" which is essentially an excel sheet saved as in CSV format containing the Name and Path of the CSV files to be imported into SAS.

Datalog.csv

Name Path
Barley
Bloodpressure
C:\stat_data\Barley.csv
C:\stat_data\Bloodpressure.csv

2. The body of the macro is divided into 3 key sections

Section 1:

Defines macro variables for Datalog, file path and libref. In this case I created 2 file paths to add flexibility to using the macro for example at home or at work.


options symbolgen mlogic nomprint;  

/*Leave options on as SAS will issue a message when Do While Loop is false*/

%let dsname = Datalog;                                                   

%let path1 = C:\stat_data\&&dsname..csv;       

%let path2 = H:\accounts\people\mohammad\stat_data\&&dsname..csv;   

%let libto = Stat;

Section 2:

In this section, 3 positional parameters are defined (n, x, index) n: is a counting macro variable representing the lower limit of the number of CSV files to be read into SAS.
x: is a macro variable to switch between path1 or path2 (e.g. C:\ or H:\ ).
index: is a variable that can be thought of as observation number( row number) representing the upper limit of the number of CSV files to be read into SAS.


%Macro get_data(n,x,index);       

/*Check if datalog exists in dir, if it exists then import it from Excel and create SAS data set called Datalog*/

%IF &n <= &index %then %Do;

%let foundit = %sysfunc(exist(&&path&x..&dsname));



proc import datafile= "&&path&x"

     out=&libto..&dsname

     dbms=dlm

     replace;

     delimiter= ",";              

     getnames=yes;

run;



Title "Comma Delimited Files Listing In &dsname as of &sysdate";



proc print data=&libto..&dsname;

run;



Title " ";

Section 3:

Here, the macro uses the Do While loop to iterate through the observations of datalog. As long as &n is less than or equal to &index, (i.e. the Do While condition is true) the macro will extract the name, path and index of each CSV dataset within the bounds of the positional parameters &n and &index in the macro call %get_data(n,x,index).

This part of the macro used DO While to loop through the name, path of each CSV file and extracts the data sets specified by the upper limit index and the lower limit n.


%DO %WHILE (&n <=&index);



data _NULL_;

set &libto..&dsname end=last;

if _N_ =&n then Do;

call symput('path',Path);                /*  Get file path from Datalog*/

call symput('dsn',Name);	   /* Get dataset name from Datalog*/						

if not last  then index+1;                 /* Create macro variable Index */

else if last then call symput ('index',index);  

End;

run;



proc import datafile= "&path"             

     out= &libto..&dsn

     dbms=dlm

     replace;

     delimiter=","  ;            

     getnames=yes;

run;



Title "Data set &dsn";



proc print data=&libto..&dsn;

run;



Title " ";



%let n=%eval(&n+1);             

%let log1 = C:\log\statlog.txt;                 /* output the log to log1*/

%let log2 = H:\..\..\..\statlog.txt;              /* output the log to log2*/



proc printto  log="&&log&x";

run;



%End;

%End;

Two additional points to mention here:

1. As you can see the details of the log is written to a local folder. This is for clarity and ease of viewing of Note, Warnings and Error Messages. I would like to add that if SAS stops executing the macro for whatever reason, the error messages will be written to the external log txt file.

2. To view the CSV files in HTML Format as they are uploaded into SAS, Go to Tools-Options -Preferences-Select the Results Tab
Under Listing Checkbox Deselect Create Listing
Under HTML checkbox select Create HTML
Point 2 can also be achieved using ODS HTML to output HTML viewer to a local folder or to a web browser.


ODS LISTING CLOSE;

ODS HTML FILE = "C:\sas_output\&dsn.html";



proc print data=&libto..&dsn;

run;

Section 4:

In this section, the data step here is for checking if &n gt &index. If it is then an Error Message is written to the log to notify the user that the macro has stopped executing.


Data _NULL_;

set &libto..&dsname;

%IF &n gt &Index %then %Put ERROR Ending execution: Check values of n and Index (n must be less than or equal to index);

run;



%Mend get_data;



/* Macro Call*/

%get_data(1,2,2);

To Conclude:

I believe this macro is simple, efficient and it offers the following advantages

  • Speeds up input of TXT or CSV files
  • Central repository for all CSV files. It requires very little effort in maintaining the listing of CSV file names and paths. It also prevents the datasets from being spread in various folders and hence minimizing clutter and loss of data files.
  • Although this macro is ran locally, it non-the-less, remained very efficient when reading CSV files to a SAS library on the UNIX SERVER. It took approximately 4.5 minutes to read 67 data sets ranging from 10 observations to 32000 when the server was relatively busy.

Did you find this page useful?

If you have any comments or questions, feel free to contact us.




0845 402 9907