IMPORT CSV FILES AND CREATE SAS DATA SETS ON THE FLY
(Written by Mohammad Abbas)
SAS users and programmers are familiar with the traditional way in which SAS reads external files such as CSV or TXT files using Filename and Infile statement. For example:
filename oscar "H:\ACCOUNTS\People\Mohammad\stat_data\Barley.csv";
data Barley; infile oscar; input yield variety $ year site $; run;
As can be seen this method of reading external files works well, however on an industrial scale, and in high pressure working environments where there is a need for speed, the traditional approach becomes less manageable when handling large number of CSV files that are not of fixed data structure. For example
1. If the user wants to read multiple CSV files and to create multiple SAS data sets simultaneously.
The Macro, I am submitting, works well both in terms practicality and selectivity. It offers the user the chance to select which data sets to be imported. It also offers the user the chance to generate summary statistics by customizing the macro to include one or more of SAS Procedures.
Here, I describe the practical aspects of using the Macro.
1. Create "what I called Datalog" which is essentially an excel sheet saved as in CSV format containing
the Name and Path of the CSV files to be imported into SAS.
2. The body of the macro is divided into 3 key sections
Defines macro variables for Datalog, file path and libref. In this case I created 2 file paths to add flexibility to using the macro for example at home or at work.
options symbolgen mlogic nomprint; /*Leave options on as SAS will issue a message when Do While Loop is false*/ %let dsname = Datalog; %let path1 = C:\stat_data\&&dsname..csv; %let path2 = H:\accounts\people\mohammad\stat_data\&&dsname..csv; %let libto = Stat;
In this section, 3 positional parameters are defined (n, x, index)
n: is a counting macro variable representing the lower limit of the number of CSV
files to be read into SAS.
%Macro get_data(n,x,index); /*Check if datalog exists in dir, if it exists then import it from Excel and create SAS data set called Datalog*/ %IF &n <= &index %then %Do; %let foundit = %sysfunc(exist(&&path&x..&dsname)); proc import datafile= "&&path&x" out=&libto..&dsname dbms=dlm replace; delimiter= ","; getnames=yes; run; Title "Comma Delimited Files Listing In &dsname as of &sysdate"; proc print data=&libto..&dsname; run; Title " ";
Here, the macro uses the Do While loop to iterate through the observations of datalog. As long as &n is less than or equal to &index, (i.e. the Do While condition is true) the macro will extract the name, path and index of each CSV dataset within the bounds of the positional parameters &n and &index in the macro call %get_data(n,x,index).
This part of the macro used DO While to loop through the name, path of each CSV file and extracts the data sets specified by the upper limit index and the lower limit n.
%DO %WHILE (&n <=&index); data _NULL_; set &libto..&dsname end=last; if _N_ =&n then Do; call symput('path',Path); /* Get file path from Datalog*/ call symput('dsn',Name); /* Get dataset name from Datalog*/ if not last then index+1; /* Create macro variable Index */ else if last then call symput ('index',index); End; run; proc import datafile= "&path" out= &libto..&dsn dbms=dlm replace; delimiter="," ; getnames=yes; run; Title "Data set &dsn"; proc print data=&libto..&dsn; run; Title " "; %let n=%eval(&n+1); %let log1 = C:\log\statlog.txt; /* output the log to log1*/ %let log2 = H:\..\..\..\statlog.txt; /* output the log to log2*/ proc printto log="&&log&x"; run; %End; %End;
Two additional points to mention here:
1. As you can see the details of the log is written to a local folder. This is for clarity and ease of viewing of Note, Warnings and Error Messages. I would like to add that if SAS stops executing the macro for whatever reason, the error messages will be written to the external log txt file.
2. To view the CSV files in HTML Format as they are uploaded into SAS, Go to Tools-Options
-Preferences-Select the Results Tab
ODS LISTING CLOSE; ODS HTML FILE = "C:\sas_output\&dsn.html"; proc print data=&libto..&dsn; run;
In this section, the data step here is for checking if &n gt &index. If it is then an Error Message is written to the log to notify the user that the macro has stopped executing.
Data _NULL_; set &libto..&dsname; %IF &n gt &Index %then %Put ERROR Ending execution: Check values of n and Index (n must be less than or equal to index); run; %Mend get_data; /* Macro Call*/ %get_data(1,2,2);
I believe this macro is simple, efficient and it offers the following advantages
Did you find this page useful?
If you have any comments or questions, feel free to contact us.