Location 1 |
Location 2 |
||||||
Variables | Variables | ||||||
w |
x |
y |
z |
w |
x |
y |
z |
|
3 | m | 5.6 | 2 | 4 | f | 10.9 |
1 | 4 | m | 4.5 | 2 | 6 | f | 12.3 |
1 | 5 | f | 5.7 | 2 | 8 | m | 13.4 |
The program below illustrates the use of the SET command to stack these two data sets which are in files stacdat1.txt and stacdat2.txt (on floppy disk), construct a plot, and write out some of the variables to a new file (newdat.txt; on floppy):
Copy stacdat1.txt and stacdat2.txt onto your floppy drive by clicking the links here, EDIT, SELECT ALL, COPY, paste into your word processor and use FILE, SAVE AS with TEXT ONLY (or ASCII DOS TEXT). To run the program on the data, click stack1.txt, use EDIT, SELECT ALL, COPY, paste this program into the SAS Program window and run it. Examine the Log window and graph window to see what happened.DM 'CLEAR LOG'; DM 'CLEAR OUTPUT'; OPTIONS LINESIZE=72 NODATE NONUMBER; DATA file1; /*read in first data set*/ INFILE 'a:\stacdat1.txt'; INPUT w x y $ z; DATA file2; /*read in second data set*/ INFILE 'a:\stacdat2.txt'; INPUT w x y $ z; DATA file3; /combine data into one*/ SET file1 file2; j= w/x; /*calculate a new variable*/ PROC PLOT DATA=file3; PLOT j*z; /*construct a plot for all data*/ DATA _NULL_; SET file3; /write out a new data set to floppy */ FILE 'a:\newdat'; PUT X 1-5 Y $ 8 Z 10 - 15 .2 J 17-25 .2; RUN;
You can exercise some control in how the data are brought together using PROC SORT and a BY statement in the DATA step which creates the combined file3. The program below illustrates what happens with and without a BY statement in the SET command. Recall that a BY statement requires that the data are sorted.
Click stack2.txt , EDIT, SELECT ALL, COPY, paste this program into the SAS Program window, and run it. Examine the Log window and the output to see what happened. Using LOCAL and RECALL TEXT, edit the program in the Program window reversing the order of file1 and file2 in the SET command to see if it makes a difference. That is,DM 'CLEAR LOG'; DM 'CLEAR OUTPUT'; OPTIONS LINESIZE=72 NODATE NONUMBER; DATA file1; INFILE 'a:\stacdat1.txt'; INPUT w x y $ z; PROC SORT DATA=file1; BY y; DATA file2; INFILE 'a:\stacdat2.txt'; INPUT w x y $ z; PROC SORT DATA=file2; BY y; DATA file3; SET file1 file2; BY y; PROC PRINT DATA=file3; TITLE 'Result of SET with a BY statement'; DATA file4; SET file1 file2; PROC PRINT DATA=file4; TITLE 'Result of SET without a BY statement'; RUN;
change SET file1 file2; to SET file2 file1;Rerun the program. Examine the Log and Output window to determine if there was any effect of reordering the files in the SET command.
Data set 1
|
Data set 2
|
Suppose that plot id and sample number uniquely identify plot and time of sampling so the data must be merged using these two variables. SAS requires that the two data sets be sorted on the merging variables so each data set must be sorted. The variable x has been measured twice on the same plot and sample, e.g., perhaps subsampling occurred, so we really have x1 in data set 1 and x2 in data set 2. The variable x must be relabeled as the two data sets are combined. The variable y only occurs in data set 2.
The following program reads the data from the two files (mergdat1.txt and mergdat2.txt) on floppy disk, sorts each, merges the files into a single file, writes out a new file which includes all the original data and the mean and standard deviation of the two values of x (x1 and x2) for each plot and sample.
Copy mergdat1.txt and mergdat2.txt onto your floppy drive by clicking the links here, EDIT, SELECT ALL, COPY, paste into your word processor and use FILE, SAVE AS with TEXT ONLY (or ASCII DOS TEXT). To run the program on the data, click merge.txt, use EDIT, SELECT ALL, COPY, paste this program into the SAS Program window and run it. Examine the Log window and 3 pages of Output window to see what happened.DM 'CLEAR LOG'; DM 'CLEAR OUTPUT'; OPTIONS LINESIZE=72 NODATE NONUMBER; TITLE1 'Merging data sets'; DATA file1; INFILE 'a:\mergdat1.txt'; INPUT x plot sample; PROC PRINT DATA = file1; Title2 'mergdat1 data'; PROC SORT DATA=file1; BY plot sample; DATA file2; INFILE 'a:\mergdat2.txt'; INPUT y plot sample x; PROC PRINT DATA = file2; Title2 'mergdat2 data'; PROC SORT DATA=file2; BY plot sample; DATA file3; FILE 'a:\newfile.dat'; MERGE file1 (RENAME=(x=x1)) file2 (RENAME=(x=x2)); BY plot sample; MEANX=MEAN(x1,x2); STDX=STD(x1,x2); PUT plot sample x1 x2 meanx stdx y; PROC PRINT DATA =file3; TITLE2 'complete merged data'; RUN;
1. The file sub1.txt contains the pulse data, variables gender, pulse1, pulse2, and weight for male non smokers. The file sub2.txt contains the pulse data variables gender, pulse1 pulse2, and weight for female nonsmokers. Pulse1 has been set to missing if pulse1 is less than 60 in these two data sets. Copy these two data sets onto your floppy disk, examine them, and write a SAS program to do the following:
a. Read in the two data sets, combine them, and order them with respect to pulse1 (see section 6.3 of your text). Use PROC PRINT to get a listing of the combined file.In your homework solution include a copy of the two original data sets, the documented program, the PROC print output of the combined data, and a copy of the combined file which was written out to floppy disk.
b. Using the combined data set, construct a single plot of pulse1 (vertical axis) versus weight (horizontal axis) which is labeled by gender (see section 5.12 in your text).
c. Write out a new file to floppy disk which contains the combined data set.
2. The two files earlyhw.txt and latehw.txt describe homework scores for a collection of students before (3 homework scores) and after the midterm (4 homework scores) in a statistics course. Students are identified by the last four digits of their social security number in each file. Copy these two data sets onto your floppy disk, examine them, and write a SAS program to do the following:
a. Read in the two data sets, merge them based on the last four digits of the social security number,In your homework solution include a copy of the two original data sets, the documented program, the PROC print output of the combined data, and a copy of the combined file which was written out to floppy disk.
b. Determine the total homework score for each student using SUM(x1, x2, ...) as MEAN(x1,x2) was used in merge.txt to determine the average in this lesson.
c. Use PROC PRINT to output the combined file with the total for each student
d. Use PROC MEANS to output the average score for each of the 7 homework assignments, and
e. Write out the combined data set, including the total score for each student, to a new file on floppy.