www.sas.com > SAS UK > In the Know Homepage Search | Contact Us    
SAS UK Newsletter Banner SAS - The power to know(tm)  

Creating a SAS report from information on the web


There is lots of useful information on the web, but how easy is it to get hold of? Maybe there is a web page with some useful information which you would like in a report.

You can use the FILENAME, URL Access Method to read data directly from a web page. Then use data step programming to extract the information you need.
NB: The coding can get very complex, depending on the structure of the underlying HTML.

For example, SAS Institute has a web page with recent hot fixes listed on it, and you would like to get a simple report of what those are. Luckily for us, the web page concerned is well structured with html tables and tag attributes.
http://ftp.sas.com/techsup/download/hotfix/82_SBCS_whats_new.html.

We would like a 3 column report with the Date the hot fix was released, the Hot Fix number, and the product that the hot fix relates to.

Set up a fileref that points to the html page. We need to use the URL device type

filename hotfixes URL "http://ftp.sas.com/techsup/download/hotfix/82_SBCS_whats_new.html" ;
data work.recent_hotfixes ;
   infile hotfixes ;
   input ;

This creates an automatic variable called _INFILE_, that we can use without having to worry about the length of the current line of HTML

   length Date $ 20
      Hot_Fix $ 6
      Product $ 50 ;

The information we require is in HTML lines that have been given HTML TAG Attributes of CLASS=, so check to see if that HTML line contains the CLASS attributes that we require, and then extract the information we need.

HTML TAGS are usually defined within < and >, and text on the web page is not, so we can use scan and delimiters of < and > to get the text

   select ;
     when(index(_infile_,'class=ITPbluesep')) Date = scan(scan(_infile_,4,'<>'),2,':') ;
     when(index(_infile_,'class=ITPhfsep')) do ;

The hot fix number and product are always in the same columns, so can use substr to get the parts of the input record we need

              Hot_Fix = substr(scan(_infile_,3,'<>'),1,6) ;
              Product = substr(scan(_infile_,3,'<>'),12) ;
              output ;
             end ;
      otherwise delete ;
    end ;
    retain date ;
run ;
ods html file='d:\temp\Hot Fix Report.HTML' style=barrettsblue ;
ods listing close ;
proc print data=work.recent_hotfixes noobs label ;
   label hot_fix = "Hot Fix" ;
run ;
ods html close ;

NB : This code was written for SAS 8.2 under Windows, but can be amended to work in earlier releases of SAS and under other operating systems

Back