FDA Standards for Electronic Submissions
|Q.||There are two SAS transport file formats. Which one is the FDA prepared to use?|
|A.||FDA can accept data in the SAS XPORT Transport Format that is processed by the XPORT engine in Version 6 of SAS software and later, and by PROC XCOPY in Version 5.|
|Q.||What are the two SAS transport formats?|
|A.||The XPORT Transport Format selected by the FDA, and the CPORT Transport Format. Both XPORT and CPORT are established mechanisms for data exchange that are well tested and well documented. They are not new or at-risk technology. The XPORT Transport Format is supported on all platforms and releases of the SAS System (it is machine and release independent) from Version 5 on. The CPORT Transport Format was invented in Version 6 and is supported from Version 6 on.|
|Q.||Why did FDA choose the XPORT Transport Format over CPORT Transport Format?|
|A.||XPORT is an open format, while CPORT is a proprietary format.|
|Q.||What do you mean, the XPORT format is "open"?|
|A.||Specifications for the XPORT transport format are in the public domain. Data can be translated between the XPORT transport format and other commonly used formats without the use of programs from SAS Institute or any specific vendor.|
|Q.||Where are specifications for the XPORT transport format published?|
|A.||At support.sas.com, under Technical Support, Technical Document TS-140.|
|Q.||Why does FDA want an open format?|
|A.||By US law, the FDA must remain "vendor neutral." The FDA cannot endorse or require use of any specific vendor's product.|
|Q.||What is the XPORT transport format, generally?|
|A.||It is a sequential file of fixed-length 80-byte records. It looks and feels so much like a text file that it is a good idea to avoid using ".txt" as a file name extension, so that the operating system won't treat it as a text file.|
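As a rough illustration of the 80-byte record structure, the sketch below checks whether a byte string has the shape of an XPORT file. It assumes the library header record shown in the published specification (TS-140) and assumes the file length is a whole number of 80-byte records. The names `looks_like_xport` and `records` are hypothetical helpers, not part of any SAS product.

```python
RECORD_LEN = 80

# First record of a transport library, as laid out in TS-140 (assumed here):
# "HEADER RECORD", 7 asterisks, "LIBRARY HEADER RECORD", 7 exclamation marks,
# 30 zeros, and 2 blanks, for a total of 80 bytes.
LIBRARY_HEADER = (
    b"HEADER RECORD" + b"*" * 7
    + b"LIBRARY HEADER RECORD" + b"!" * 7
    + b"0" * 30 + b" " * 2
)


def looks_like_xport(data: bytes) -> bool:
    """Return True if data is a whole number of 80-byte records
    and begins with the library header record."""
    if len(data) < RECORD_LEN or len(data) % RECORD_LEN != 0:
        return False
    return data[:RECORD_LEN] == LIBRARY_HEADER


def records(data: bytes):
    """Yield the file contents one 80-byte record at a time."""
    for offset in range(0, len(data), RECORD_LEN):
        yield data[offset:offset + RECORD_LEN]
```

A check like this looks at the content rather than the file name, so it works whatever extension the file carries.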
|Q.||Are there any naming conventions for transport files?|
|A.||Beginning with Version 6, the process of installing SAS on a PC automatically registers two filename extensions with MS Windows. This means PCs on which SAS software has been installed will recognize a file named *.stx as a SAS System Xport Transport File, and a file named *.stc as a SAS System Cport Transport File.|
|Q.||Does ".xpt" work for an XPORT filename extension?|
|A.||Yes. ".xpt" is a popular extension that works fine with the SAS System. It is supported by JMP and the SAS System Viewer, and will soon be supported by the UODBC driver.|
|Q.||How does the XPORT transport format store numeric data?|
|A.||The canonical transport format for floating point numbers in XPORT format is the IBM mainframe machine double precision format.|
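For readers without SAS software, the following is a minimal sketch of decoding one 8-byte IBM double precision value into a native floating-point number. It assumes the usual IBM layout (1 sign bit, a 7-bit base 16 exponent with a bias of 64, and a 56-bit fraction) and the missing-value markers described in TS-140. The function name `ibm_double_to_float` is hypothetical.

```python
import math
import struct

# First-byte markers for missing values ('.', '_', 'A'..'Z'), as described
# in TS-140 (assumed here); the remaining seven bytes are zero.
MISSING_MARKERS = {0x2E, 0x5F} | set(range(0x41, 0x5B))


def ibm_double_to_float(raw: bytes) -> float:
    """Decode 8 big-endian bytes in IBM double precision format."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    (bits,) = struct.unpack(">Q", raw)
    fraction = bits & 0x00FFFFFFFFFFFFFF      # low 56 bits
    if fraction == 0:
        # A zero fraction is either a missing value or a true zero.
        return math.nan if raw[0] in MISSING_MARKERS else 0.0
    sign = -1.0 if bits >> 63 else 1.0
    exponent = (bits >> 56) & 0x7F             # excess-64, base 16
    # value = 0.fraction * 16**(exponent - 64)
    #       = fraction * 2**(4 * (exponent - 64) - 56)
    return sign * math.ldexp(float(fraction), 4 * (exponent - 64) - 56)
```

For example, the bytes `41 10 00 00 00 00 00 00` decode to 1.0, because the fraction is 1/16 and the exponent is 16 to the power 1. Converting the 56-bit fraction to a native float can round away its lowest bits.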
|Q.||Isn't the IBM mainframe format antiquated? Why would FDA want to use an old format?|
|A.||The numeric representation of data in a data interchange format is the "lowest common denominator" for transferring data across different hardware and operating systems. For many years, people have been using the "lowest common denominator" provided by the XPORT transport format because it provides a reliable method for data interchange.|
|Q.||Do variable lengths remain stable in the XPORT transport format?|
|A.||Not always. Variables that are two bytes long are converted to three bytes on operating systems that have a minimum variable length of three bytes (OpenVMS and some others). This does not affect browsing the files, but it can cause a SAS program error if program logic depends on a two-byte length. That matters only if a sponsor submits SAS programs with the data, which is likely. We suggest using three bytes as the minimum variable length, which assures portability across operating systems.|
|Q.||CPORT has been updated in Version 7 to take advantage of the new V7 data set features. Will FDA adopt the CPORT transport format as a new standard for data archival in order to take advantage of the new features?|
|A.||No, because the CPORT format will not be a public format. There are security issues with password protected data sets. It is not clear how to maintain the security that passwords provide and still document the format of the file.|
|Q.||Does Version 7 SAS software maintain compatibility with the XPORT transport format?|
|A.||Yes, SAS software maintains compatibility with the XPORT transport format in Version 7.|
|Q.||SAS users may want to use Version 7 features like long variable names. As computer systems evolve, new standards for numeric representation will continue to emerge. Will the FDA update their standard for data archival as technology advances?|
|A.||If the FDA would like to create an updated transport format, SAS Institute will be willing to work with the FDA to develop a new standard. Updating the transport format could allow for new functionality such as long variable names. The new standard could involve a different format for numeric representation (or "lowest common denominator") such as IEEE numeric representation.|
|Q.||Does the XPORT transport format have any year 2000 problems?|
As far as the data itself is concerned, there are no year 2000 problems. The header record contains only 2 character positions for the year, and that can be a cosmetic problem.
The layout of the header record for a transport library dates back to Version 5. The Version 5 format has 2 date fields in it (date time created and date time modified). The date time modified is ignored by the system (since you cannot update a transport library). The date time created is displayed by utilities (like PROC CONTENTS).
Since no transport file could have been created before 1980, Version 7 will assume that any 2 digit year between 80 and 99 has a century of 19 and any 2 digit year between 00 and 79 has a century of 20. This assures that Version 7 will always display the correct century for the date time created on a transport file.
Version 6 does have a cosmetic year 2000 problem reading transport files. In Version 6, the value of the YEARCUTOFF option determines how the system displays the date time created from a transport file. This bug will not prevent the Version 6 system from processing the file. It could mean, however, that reports (like PROC CONTENTS) show that the file was created in the wrong century.
With the Version 7 rule, the file format does not have a century ambiguity problem until the year 2080.
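The Version 7 century rule described above can be sketched in a few lines. This is only an illustration of the windowing logic, not SAS code; the function name `expand_two_digit_year` and the `pivot` parameter are hypothetical.

```python
def expand_two_digit_year(yy: int, pivot: int = 80) -> int:
    """Map a 2 digit header year to a 4 digit year.

    Years from pivot through 99 are taken as 19yy and years below
    the pivot as 20yy, following the rule described for Version 7.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a 2 digit year between 0 and 99")
    return 1900 + yy if yy >= pivot else 2000 + yy
```

With a pivot of 80 the window covers 1980 through 2079, which is why the ambiguity does not return until 2080.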
|Q.||Does the XPORT transport format retain variable labels?|
|A.||Variable labels are retained in the XPORT transport format.|
|Q.||Does the XPORT transport format retain variable formats?|
|A.||Formats of individual variables are retained in the XPORT transport format.|
|Q.||What about custom formats that are not provided with SAS software?|
Assigning a format to a variable associates text or numbers with variable values. As an example, the variable GENDER could have two possible values, 1 or 2. A format named GENDRFMT can be associated with GENDER so that the text "Male" appears on the computer screen or on paper when GENDER = 1 and "Female" appears when GENDER=2. Custom formats like GENDRFMT can be created and stored in a format catalog.
It is important to be able to archive custom variable formats along with the data. It is possible to do this with the XPORT transport format. The format catalog must be saved as a SAS data set, and the resulting data set can be saved in XPORT transport format along with corresponding data sets. When data sets are extracted from transport format, the data set containing the custom formats can be saved as a format catalog. Thus, variable formats can be archived and retrieved along with the corresponding data.
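The idea of storing a custom format as ordinary rows of data, then rebuilding it later, can be sketched outside of SAS as follows. In SAS this is typically done with the CNTLOUT= and CNTLIN= options of PROC FORMAT; the Python below is only an analogy. The column names FMTNAME, START, and LABEL loosely mirror the data set that PROC FORMAT writes, and all function names are hypothetical.

```python
def format_to_rows(fmtname: str, mapping: dict) -> list:
    """Flatten one custom format into plain rows, one row per value."""
    return [
        {"FMTNAME": fmtname, "START": str(value), "LABEL": label}
        for value, label in mapping.items()
    ]


def rows_to_formats(rows: list) -> dict:
    """Rebuild a lookup of formats from rows read back from storage."""
    formats = {}
    for row in rows:
        formats.setdefault(row["FMTNAME"], {})[row["START"]] = row["LABEL"]
    return formats


def apply_format(formats: dict, fmtname: str, value) -> str:
    """Return the label for a value, or the value itself if unmapped."""
    return formats[fmtname].get(str(value), str(value))
```

Because the rows are ordinary character data, they can travel in the same transport file as the data sets that use the format.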
|Q.||Is it difficult to detect the end of file when reading a data set stored in XPORT transport format?|
Under certain circumstances it is difficult to detect the end of file when reading a data set in XPORT transport format. Below is an explanation and a suggested fix that will alleviate the problem.
The problem occurs when the observation length of a data set is less than 80 bytes. The transport format writes 80 byte records, and the last observation may not fill up an 80 byte record. When this happens the SAS System pads the record with blanks. When reading a transport file, the SAS System treats trailing blanks as insignificant. If the observations are shorter than 80 bytes and the last observation written contains nothing but blank characters, that observation cannot be told apart from padding and the file comes up one observation short.
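The effect can be reproduced with a simplified model of the data portion of a transport file. The sketch below is not how SAS software is implemented; it only assumes that observations are written back to back, that the stream is blank-padded to an 80-byte boundary, and that a reader discards trailing all-blank observations as padding. Both function names are hypothetical.

```python
RECORD_LEN = 80


def write_observations(observations: list) -> bytes:
    """Concatenate fixed-length observations and blank-pad the
    result to a multiple of 80 bytes."""
    stream = b"".join(observations)
    padding = -len(stream) % RECORD_LEN
    return stream + b" " * padding


def read_observations(stream: bytes, obs_len: int) -> list:
    """Split the stream into observations, treating trailing
    all-blank observations as padding."""
    obs = [
        stream[i:i + obs_len]
        for i in range(0, len(stream) - obs_len + 1, obs_len)
    ]
    while obs and obs[-1].strip(b" ") == b"":
        obs.pop()
    return obs
```

With 16-byte observations, three non-blank observations survive the round trip, but if the third is entirely blank the reader returns only two.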
The key to avoiding this potential problem is to take the preventative measures described below.
One preventative measure is to make sure that the circumstances do not arise. Avoid creating a data set where the observation length is less than 80 bytes and the last observation contains nothing but blank characters.
Another preventative measure could be to save original data in transport format, extract new data from the transport file, then run PROC COMPARE to compare the original data to the new data. Output from PROC COMPARE can validate that the new data extracted from the transport file is identical to the original data.
|Q.||Are there rounding problems when using the XPORT transport format to move data across hardware platforms or operating systems?|
Numeric representation is an issue in any computer application because of hardware limitations. Any particular hardware configuration can represent only a finite set of numbers. The real number system is infinite, and there is no way to represent every real number uniquely with currently available hardware.
It is obvious that extremely large and extremely small numbers cannot be represented with a finite number of digits. It is less obvious why representing fractions can be a problem. To illustrate, consider the fraction 2/3. It cannot be represented in our base 10 number system as a decimal fraction with a finite number of digits. We write 2/3 as 0.666..., where "..." represents an infinite series of sixes after the decimal point. To represent 2/3 as a decimal fraction with a finite number of digits, rounding or truncation must occur. The same problem occurs when representing fractions on computers.
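The same rounding can be observed directly on any machine that stores numbers as 8-byte binary floating-point values. The sketch below uses exact rational arithmetic to measure how far the stored value of 2/3 is from the true fraction; the variable names are arbitrary.

```python
from fractions import Fraction

exact = Fraction(2, 3)        # the true value, held as a ratio of integers
stored = Fraction(2 / 3)      # the exact value of the nearest 8-byte float
error = abs(stored - exact)   # rounding error introduced by the hardware


def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two."""
    return n > 0 and n & (n - 1) == 0
```

The stored value is a ratio whose denominator is a power of two, so it can approach 2/3 closely but never equal it.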
The SAS System and most computers use floating-point representation to store numeric values. All platforms on which the SAS System runs use 8 bytes for floating-point numbers.
Floating-point representation is an implementation of what is generally known as scientific notation, in which values are represented as numbers between 0 and 1 times a power of 10. Using scientific notation, the basic parts of a floating-point number can be identified as the mantissa or fraction portion, the base, and the exponent.
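To make the mantissa, base, and exponent concrete, the sketch below splits a number into a mantissa between 0 and 1 and an exponent, first in base 2 and then in base 16 (the base used by the IBM mainframe format). The function names are hypothetical, and the base 16 version is derived here from the base 2 result rather than taken from any SAS routine.

```python
import math


def decompose_base2(x: float):
    """Return (mantissa, exponent) with x == mantissa * 2**exponent
    and 0.5 <= abs(mantissa) < 1 for nonzero x."""
    return math.frexp(x)


def decompose_base16(x: float):
    """Return (mantissa, exponent) with x == mantissa * 16**exponent
    and 1/16 <= abs(mantissa) < 1 for nonzero x."""
    _, exp2 = math.frexp(x)
    exp16 = -(-exp2 // 4)                  # ceiling of exp2 / 4
    return math.ldexp(x, -4 * exp16), exp16
```

For 118.625 this gives a mantissa of 0.9267578125 with exponent 7 in base 2, and a mantissa of 0.46337890625 with exponent 2 in base 16.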
Different computers can have different specifications for floating-point representation. Table 1 summarizes various representations of floating-point numbers that are stored in 8 bytes. The canonical transport format for floating point numbers in SAS transport format is the IBM mainframe machine double precision format.
Differences in specifications for floating-point representation may cause a loss of precision or magnitude/range when data are moved from one platform to another. The more bits that are reserved for the mantissa, the more precise the number, and the more bits that are reserved for the exponent, the more magnitude the number can have.
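As an example of differences in precision, the IBM representation reserves 56 bits for the mantissa, while the IEEE representation reserves 52. This gives the IBM format somewhat more precision than the IEEE format. The IEEE format, in turn, reserves more bits for the exponent (11 versus 7) and therefore covers a greater range of magnitudes.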
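The trade-off between precision and magnitude can be worked out from the bit allocations. The sketch below assumes the usual IBM double precision layout (56-bit fraction, 7-bit base 16 exponent with a bias of 64) and compares its approximate limits with the limits of the native IEEE format reported by Python. The constant and function names are hypothetical.

```python
import sys

IBM_FRACTION_BITS = 56
IEEE_FRACTION_BITS = sys.float_info.mant_dig - 1   # 52 stored bits

# Largest IBM magnitude is just under 16**63 (fraction near 1, exponent 63).
IBM_LARGEST = 16.0 ** 63
# Smallest normalized IBM magnitude: fraction 1/16, exponent -64.
IBM_SMALLEST = 16.0 ** -65


def fits_in_ibm_double(x: float) -> bool:
    """Rough check that x lies within the IBM double precision range."""
    return x == 0.0 or IBM_SMALLEST <= abs(x) <= IBM_LARGEST
```

Under these assumptions the IBM range runs from roughly 5.4e-79 to 7.2e75, so a value such as 1e100, which IEEE handles easily, would not fit in the transport format.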
Differences in floating-point representation on different platforms may be an issue, but should not be a problem when the differences are known and accounted for in software applications. The XPORT transport format is published, and the file specifications are readily available on SAS Institute's external web site. Knowing the format for floating point numbers in SAS transport format makes it possible to allow for and minimize the effect of differences in floating-point representations.