Lesson 3

Creating and reading SAS system files

SAS data sets are named in two parts separated by a period, for example work.file1. The first part of the name is the SAS data library reference (libref for short). The second part is the file name that uniquely identifies the file within the library. When you give only a one-part name, SAS stores the file in the temporary library work, so its full name is work.filename. For example, the following DATA statement creates the file work.file1:
DATA   file1;
Previous lessons have used only temporary files, which are deleted when the SAS session ends; these temporary files have been noted in the Log window when you ran programs. In this lesson you will create and use permanent SAS data sets. The advantage of a permanent SAS data set is that you do not have to name or label the variables or read the raw data each time you want to run a procedure. To create a permanent SAS data set you must first create a library reference.
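
As a small illustration of the two-part naming, the sketch below creates a temporary data set with a one-part name and then refers to it by its full two-part name. The variable names x and y and the data values are made up for this example and are not part of the pulse data.

```sas
/* One-part name: SAS stores this data set in the temporary WORK library */
DATA file1;
        INPUT x y;          /* x and y are hypothetical variables */
        DATALINES;
1 2
3 4
;
RUN;

/* The same data set, referred to by its two-part name */
PROC PRINT DATA=work.file1;
RUN;
```

The Log window should report the data set as WORK.FILE1 even though the DATA statement used only the name file1.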

The data set pulsedat.txt will be used for this assignment and for several future assignments. Click the above link to this data set. Use EDIT, SELECT ALL, and COPY to copy the data set into your word processor, then save it as pulsedat.txt (please use this name) on a diskette (drive a:) in TEXT ONLY (or DOS TEXT) format for future use. If you placed this file in a directory on the diskette, please note the name of the directory where you stored the file.

The pulsedat.txt data set contains 88 observations on 8 variables, which are described below:
 

Variable  Description
1         First pulse rate
2         Second pulse rate
3         1 = subject ran in place; 2 = subject did not run
4         1 = subject is a smoker; 2 = subject is not a smoker
5         1 = male; 2 = female
6         Height in inches
7         Weight in pounds
8         Physical activity level: 1 = slight or none; 2 = moderate; 3 = lots of activity

We will start by making a simple permanent SAS data set from the pulse data, then refine it. The initial permanent SAS data set, named LIBRARY.p1, is created and stored on the floppy disk in drive a: by the following program:

DM 'CLEAR LOG';
DM 'CLEAR OUTPUT';
OPTIONS LINESIZE=72 NODATE NONUMBER;
LIBNAME LIBRARY 'a:\';
DATA    LIBRARY.p1;
        INFILE 'a:\pulsedat.txt';
        INPUT pulse1 pulse2 ran smoker gender
              height weight activity;
RUN;
Click the link save1.txt, then use EDIT, SELECT ALL, and COPY, and paste this program into the SAS Program window.

Run the program and read the Log window. Look at the Output window and notice that nothing is output. A SAS procedure, e.g., PROC TTEST, could have been included in this program and would have generated output, but this is not required to create a permanent SAS data set. In your word processor, use FILE and OPEN with ALL FILE TYPES to examine the contents of the location where you stored the library (a:\). You should see a file called p1 (with a SAS data set extension that depends on your version of SAS). If you open this file in your word processor it will look like a jumble of special characters. Do not make any changes to this file. SAS permanent data sets are stored in a machine-readable form for fast processing.
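
A safer way to look inside the permanent data set than opening it in a word processor is to ask SAS to describe it. The following is a minimal sketch, assuming the library is still on a:\ as in the program above; it lists the variables with PROC CONTENTS and prints the first five observations.

```sas
LIBNAME LIBRARY 'a:\';              /* same location used when p1 was saved */

/* List the variables and other information stored with the data set */
PROC CONTENTS DATA=LIBRARY.p1;
RUN;

/* Print only the first five observations as a quick check */
PROC PRINT DATA=LIBRARY.p1 (OBS=5);
RUN;
```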

The following program retrieves and uses the permanent SAS data set  LIBRARY.p1:

DM 'CLEAR LOG';
DM 'CLEAR OUTPUT';
OPTIONS LINESIZE=72 NODATE NONUMBER;
LIBNAME LIBRARY 'a:\';
PROC TABULATE DATA=LIBRARY.p1;
     CLASS gender ran;
     TABLE gender*ran;
RUN;
Click retrieve1.txt, then use EDIT, SELECT ALL, and COPY. Paste this program into the SAS Program window.

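Run the program and check the Log and Output windows. PROC TABULATE creates a table of counts showing the number of runners and nonrunners for each gender. Notice that the program contains no INFILE or INPUT statements, because the data are already stored in LIBRARY.p1.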
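
The layout of the table is controlled by the TABLE statement. In gender*ran the asterisk nests ran within gender, and because there is no comma the whole table is laid out across the columns. As a variation (a sketch, not part of the assignment), a comma separates the row dimension from the column dimension, so the following puts gender in the rows and the counts for ran in the columns:

```sas
LIBNAME LIBRARY 'a:\';

PROC TABULATE DATA=LIBRARY.p1;
     CLASS gender ran;
     /* rows: gender; columns: ran, with the count (N) in each cell */
     TABLE gender, ran*N;
RUN;
```

The counts should be the same as in the original table; only the arrangement changes.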

A more refined (more informative and useful) version of a permanent SAS data set for the pulse data can be created by adding variable labels and value labels. The following program reads the original pulsedat.txt file and provides a variable label for pulse1 and value labels for the variables ran and gender. The value labels are defined with PROC FORMAT and saved in the library LIBRARY so that SAS can find them again in later sessions. The program also creates and stores a new permanent SAS data set called LIBRARY.p2.

DM 'CLEAR LOG';
DM 'CLEAR OUTPUT';
OPTIONS LINESIZE=72 NODATE NONUMBER;
LIBNAME LIBRARY 'a:\';
PROC FORMAT LIB=LIBRARY;
            VALUE SEXFORM 1='Male'
                          2='Female';
            VALUE RANFORM 1='Ran in place'
                          2='Did not run';
DATA    LIBRARY.p2;
        INFILE 'a:\pulsedat.txt';
        INPUT pulse1 pulse2 ran smoker gender
              height weight activity;
        LABEL  pulse1 = 'Resting pulse rate';
        FORMAT ran ranform. gender sexform.;
RUN;
Click save2.txt, then use EDIT, SELECT ALL, and COPY, and paste this program into the SAS Program window. Run the program and check the Log window. The Output window will be empty.

Now click retrieve2.txt, then use EDIT, SELECT ALL, and COPY, and paste this program into the SAS Program window. Run the program. This program retrieves the new LIBRARY.p2 file and runs the same PROC TABULATE step as before. Examine the output. You should notice some important changes: the codes 1 and 2 are replaced by the value labels defined in PROC FORMAT. The LABEL statement has no effect on this output, because pulse1 does not appear in the table, but it will be used in the homework.
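
To see a variable label and a value label working together, here is a small self-contained sketch that uses a temporary data set. The names demo, answer, and ynfmt and the data values are hypothetical and chosen only for this illustration. The LABEL option on PROC PRINT asks the procedure to use variable labels as column headings.

```sas
/* Define a value format (stored in the temporary WORK library) */
PROC FORMAT;
        VALUE ynfmt 1='Yes'
                    2='No';
RUN;

DATA demo;
        INPUT id answer;
        LABEL  answer = 'Survey answer';   /* variable label */
        FORMAT answer ynfmt.;              /* value labels   */
        DATALINES;
1 1
2 2
3 1
;
RUN;

/* LABEL option: print variable labels instead of variable names */
PROC PRINT DATA=demo LABEL;
RUN;
```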

HOMEWORK #3

Read about PROC FORMAT in section 5.7 of the text.  You may also want to review sections 3.11, 3.12, 3.13, on listing the contents of a SAS data set, temporary versus permanent SAS data sets and using the LIBNAME statement.

1. Create a complete permanent SAS data set called LIBRARY.pulse which

a) provides value labels for gender, smoker, ran, and activity,
b) provides variable labels for pulse1, pulse2, height, and weight, and
c) creates a new variable bulk=weight/height.

2. Retrieve LIBRARY.pulse and run the following two procedures:
 
PROC CONTENTS DATA=LIBRARY.pulse;
RUN;
PROC TABULATE DATA=LIBRARY.pulse;
            CLASS smoker;
            VAR pulse1;
            TABLE smoker*pulse1*(MEAN N);
RUN;

Explain what the PROC TABULATE step did.

If part 1 above was done correctly, the results of PROC CONTENTS should look like the screen below. Your variable descriptions may be different, but you should have formatting codes like ACTFORM, RUNFORM, SMOFORM, SEXFORM for all four variables activity, ran, smoker, and gender.