Lesson 9

Analysis of Variance

SAS procedures useful for experimental design and analysis of variance (anova) include the following:

PROC ANOVA (balanced data for two-way or higher)
PROC GLM (general)
PROC NESTED (hierarchical, random effects models)
PROC NPAR1WAY (Kruskal-Wallis nonparametric method)
PROC PLAN (randomizing for simple and complex experiments)
PROC VARCOMP (variance component models and estimation)

Consistent with the prerequisites for this course, only one-way analysis of variance will be addressed in the homework. However, this lesson also includes examples of analysis of variance for a variety of experimental designs for your future reference.

One-way Analysis of Variance

PROC ANOVA can be used for balanced or unbalanced one-way analysis of variance. Data are balanced when each group has the same number of observations. The program below performs a one-way analysis of variance for three groups with five observations each and prints the sample size, mean, and standard deviation for each group.

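DM 'CLEAR LOG';                          /* clear the Log window    */
DM 'CLEAR OUTPUT';                       /* clear the Output window */
OPTIONS LINESIZE=72 NODATE NONUMBER;
DATA file1;
     INPUT group $ y @@;   /* @@ reads several (group, y) pairs per line */
     CARDS;
A  70  A  75  A  73  A  64  A  81  B  47  B  45
B  50  B  56  B  58  C  51  C  56  C  47  C  61  C  62
;
PROC ANOVA DATA=file1;
     TITLE1 'One-way ANOVA for balanced data with 3 groups';
     TITLE2 '5 observations per group';
     CLASS group;          /* group is the classification variable */
     MODEL y = group;      /* F-test of equal group means          */
     MEANS group;          /* N, mean, and std dev for each group  */
RUN;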

Click anova1.txt, then EDIT, SELECT ALL, and COPY; paste the program into the SAS Program window and run it. Examine the Log and Output windows. From the Output you should be able to determine whether the group means differ significantly at the 5% level and find the p-value of the F-test.
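As a cross-check on the F-test that PROC ANOVA reports, the sketch below recomputes the one-way analysis of variance by hand in Python (standard library only; Python is used here purely for illustration and is not part of the SAS workflow). It uses the same 15 observations as the program above. The p-value line relies on a closed form that holds only when there are exactly three groups (numerator df = 2); for other designs, rely on SAS or a statistics library.

```python
from statistics import mean

# Observations from the one-way program above, keyed by group.
data = {
    "A": [70, 75, 73, 64, 81],
    "B": [47, 45, 50, 56, 58],
    "C": [51, 56, 47, 61, 62],
}


def one_way_anova(groups):
    """Return SS between, SS within, their df, and the F statistic."""
    all_obs = [y for ys in groups.values() for y in ys]
    grand = mean(all_obs)
    ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
    ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


ssb, ssw, df1, df2, f = one_way_anova(data)

# Closed-form upper-tail probability, valid ONLY when df1 == 2 (three groups):
#   P(F > f) = (1 + 2 f / df2) ** (-df2 / 2)
assert df1 == 2, "closed form below needs exactly three groups"
p = (1 + 2 * f / df2) ** (-df2 / 2)

for name, ys in data.items():
    print(f"group {name}: n = {len(ys)}, mean = {mean(ys):.1f}")
print(f"SS between = {ssb:.2f}, SS within = {ssw:.2f}")
print(f"F({df1}, {df2}) = {f:.2f}, p = {p:.4f}")
```

For these data the sketch gives F(2, 12) of about 17.17 with p of about 0.0003, so the group means differ significantly at the 5% level; the ANOVA table in the SAS Output window should agree up to rounding. The `one_way_anova` function also accepts groups of unequal size.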

Multiple comparison tests, which identify the pairs of groups that differ significantly, can be requested in PROC ANOVA. The Bonferroni and Tukey procedures are two of the many options available in SAS. To run the Tukey (honestly significant difference) procedure, replace the MEANS group; line above with the following line, which also sets the significance level for the multiple comparisons to 5%:

MEANS group / TUKEY ALPHA=.05;

In the SAS Program window, use LOCAL and RECALL TEXT to reload the previous program. Change the MEANS statement to request the Tukey multiple comparison procedure, rerun the program, and examine the Log and Output windows.
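The Tukey procedure declares two group means different when their difference exceeds the honestly significant difference HSD = q * sqrt(MSE / n), where q is the upper 5% point of the studentized range for k groups and the error degrees of freedom, and n is the common group size. The Python sketch below applies this rule to the balanced data above. The value q = 3.77 (3 groups, 12 error df) is taken from a standard studentized range table and is an assumption of the sketch, since the Python standard library cannot compute it.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

data = {
    "A": [70, 75, 73, 64, 81],
    "B": [47, 45, 50, 56, 58],
    "C": [51, 56, 47, 61, 62],
}
n = 5                                   # observations per group (balanced)
k = len(data)                           # number of groups
df_error = k * n - k                    # 12 error degrees of freedom
mse = sum((y - mean(ys)) ** 2 for ys in data.values() for y in ys) / df_error

# ASSUMPTION: q(0.05; 3 groups, 12 df) = 3.77 from a studentized range table.
q_crit = 3.77
hsd = q_crit * sqrt(mse / n)            # honestly significant difference

means = {g: mean(ys) for g, ys in data.items()}
significant = []
for g1, g2 in combinations(sorted(means), 2):
    diff = abs(means[g1] - means[g2])
    if diff > hsd:
        significant.append((g1, g2))
    verdict = "differ" if diff > hsd else "do not differ"
    print(f"{g1} vs {g2}: |difference| = {diff:.1f} -> {verdict}")
print(f"HSD = {hsd:.2f}")
```

With an HSD of about 10.3, group A differs from both B and C, while B and C do not differ significantly; the Tukey output from SAS should lead to the same grouping.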

To try PROC ANOVA on unbalanced data in the one-way setting, edit the data lines to read as follows:

A  70  A  75  A  73  B  47  B  45
B  50  B  56  C  51  C  56  C  47  C  61 C  62

This deletes the last two group A observations and the last group B observation, and leaves the group C observations unchanged. If you want to keep a copy of this run for your portfolio, also edit the TITLE1 and TITLE2 lines to indicate that the data are unbalanced (TITLE2 would no longer be accurate). Rerun the program and examine the results.

Using SAS to randomize in experiments

PROC PLAN is useful for randomly assigning experimental units (e.g., plots, animals, or human subjects) to treatments. The examples below illustrate how to randomize for a completely randomized design and for a randomized block design.

Completely randomized design - Randomly assigning 12 subjects to 3 treatments (4 replications each):

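PROC PLAN;                       /* add SEED=n for a reproducible plan */
     FACTORS UNIT=12;            /* random ordering of the 12 units    */
     TREATMENTS trmt=12 CYCLIC (1 1 1 1 2 2 2 2 3 3 3 3);
     OUTPUT OUT=b;               /* save the plan as data set b        */
PROC SORT DATA=b;
     BY UNIT;
PROC PRINT DATA=b;
RUN;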

The output includes the following assignment of treatments (TRMT) to experimental units (UNIT):

                          OBS    UNIT    TRMT
                            1      1       1
                            2      2       1
                            3      3       3
                            4      4       3
                            5      5       1
                            6      6       2
                            7      7       1
                            8      8       2
                            9      9       3
                           10     10       2
                           11     11       3
                           12     12       2
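The completely randomized plan above amounts to listing the treatment labels (four 1s, four 2s, four 3s) and putting them in a random order against units 1 to 12. The Python sketch below does the same with `random.shuffle`; the helper name `crd_assignment` and its `seed` parameter are hypothetical, chosen only for illustration. Fixing the seed makes the plan reproducible, much as the SEED= option does in PROC PLAN.

```python
import random


def crd_assignment(n_units, treatments, reps, seed=None):
    """Hypothetical helper: randomly assign units to equally replicated treatments."""
    if n_units != len(treatments) * reps:
        raise ValueError("n_units must equal len(treatments) * reps")
    labels = [t for t in treatments for _ in range(reps)]  # 1,1,1,1,2,2,2,2,3,3,3,3
    random.Random(seed).shuffle(labels)                    # random order of labels
    return dict(enumerate(labels, start=1))                # unit -> treatment


plan = crd_assignment(12, [1, 2, 3], reps=4, seed=2024)
print("UNIT  TRMT")
for unit, trmt in plan.items():
    print(f"{unit:4d}  {trmt:4d}")
```

Each run without a seed produces a different but equally valid assignment, so the table will not match the SAS output shown above.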

Randomized Block Design - If you have not studied experimental design or have never heard of a randomized block design, you may skip ahead to the homework. The following statements randomly assign six treatments to the six plots within each of four blocks.

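PROC PLAN;
     FACTORS block=4 ORDERED plots=6 ORDERED;  /* blocks and plots in order    */
     TREATMENTS trmt=6;          /* random order of the 6 treatments per block */
     OUTPUT OUT=c;               /* save the plan as data set c               */
RUN;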

The output includes the following table showing the random assignment:

   BLOCK [ PLOTS TRMT ]
-------- -----+-----+-----+-----+-----+-----+
       1  [1 3] [2 2] [3 4] [4 1] [5 5] [6 6]
       2  [1 1] [2 2] [3 5] [4 3] [5 6] [6 4]
       3  [1 2] [2 6] [3 3] [4 5] [5 4] [6 1]
       4  [1 1] [2 4] [3 2] [4 3] [5 5] [6 6]
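In a randomized block design the randomization is done separately within each block: each block gets its own independent random ordering of treatments 1 to 6 over plots 1 to 6, as in the table above. The Python sketch below mimics that; the helper name `rbd_assignment` and its `seed` parameter are hypothetical, for illustration only.

```python
import random


def rbd_assignment(n_blocks, n_trmts, seed=None):
    """Hypothetical helper: randomize treatments to plots separately in each block."""
    rng = random.Random(seed)
    plan = {}
    for block in range(1, n_blocks + 1):
        trmts = list(range(1, n_trmts + 1))
        rng.shuffle(trmts)            # independent random order within this block
        plan[block] = trmts           # plan[block][plot - 1] is the treatment
    return plan


plan = rbd_assignment(4, 6, seed=7)
print("BLOCK  [PLOTS TRMT]")
for block, trmts in plan.items():
    cells = " ".join(f"[{plot} {t}]" for plot, t in enumerate(trmts, start=1))
    print(f"{block:5d}  {cells}")
```

Every treatment appears exactly once in every block, which is the defining feature of the design.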

Two-way Analysis of Variance

In the two-way analysis of variance setting, PROC ANOVA is intended for balanced data and should not be used for unbalanced data. PROC GLM handles both balanced and unbalanced data in the two-way setting.

Two data sets for a two-way ANOVA setting are shown below: one balanced (L9dat1.txt) and one unbalanced (L9dat2.txt).

Balanced data (L9dat1.txt)

                  Treatment Group
Gender      A             B             C
Male        70, 85, 64    48, 50, 58    50, 55, 61
Female      90, 88, 78    59, 54, 57    52, 66, 64

Unbalanced data (L9dat2.txt)

                  Treatment Group
Gender      A                 B                 C
Male        70, 85            48, 50, 58, 56    50, 55, 61
Female      90, 88, 78, 84    59, 54, 57        52, 64

For the analyses illustrated below, the factors Gender and Treatment Group are considered fixed (not random). SAS can also handle random- and mixed-effects analysis of variance, but that is not illustrated in this example.

The program below completes a two-way analysis of variance for the balanced data using PROC ANOVA:

DM 'CLEAR LOG';
DM 'CLEAR OUTPUT';
OPTIONS LINESIZE=72 NODATE NONUMBER;
DATA file1;
     INFILE 'a:\L9dat1.txt';
     INPUT gender $ group $ y;
PROC ANOVA DATA=file1;
     TITLE 'Two-way ANOVA (2 x 3) for balanced data';
     CLASS gender group;
     MODEL y = gender | group; /* the | symbol causes SAS to include */
     MEANS gender | group;     /* the interaction term gender*group  */
RUN;
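For balanced data, the sums of squares for gender, treatment group, their interaction, and error, each computed from marginal and cell means, add up exactly to the total sum of squares; PROC ANOVA relies on this clean partition. The Python sketch below checks the partition on the balanced data in L9dat1.txt, using the cell values as read from the balanced table above. With unequal cell counts these mean-based formulas generally no longer add up to the total, which is one reason (not the whole story) that PROC GLM is preferred for unbalanced data.

```python
from statistics import mean

# Balanced 2 x 3 data (L9dat1.txt): (gender, group) -> observations.
cells = {
    ("Male", "A"): [70, 85, 64],
    ("Male", "B"): [48, 50, 58],
    ("Male", "C"): [50, 55, 61],
    ("Female", "A"): [90, 88, 78],
    ("Female", "B"): [59, 54, 57],
    ("Female", "C"): [52, 66, 64],
}
genders = ["Male", "Female"]
groups = ["A", "B", "C"]
n = 3                                    # observations per cell

all_obs = [y for ys in cells.values() for y in ys]
grand = mean(all_obs)
g_mean = {g: mean([y for (gg, _), ys in cells.items() if gg == g for y in ys]) for g in genders}
t_mean = {t: mean([y for (_, tt), ys in cells.items() if tt == t for y in ys]) for t in groups}
c_mean = {key: mean(ys) for key, ys in cells.items()}

ss_gender = len(groups) * n * sum((g_mean[g] - grand) ** 2 for g in genders)
ss_group = len(genders) * n * sum((t_mean[t] - grand) ** 2 for t in groups)
ss_inter = n * sum(
    (c_mean[(g, t)] - g_mean[g] - t_mean[t] + grand) ** 2 for g in genders for t in groups
)
ss_error = sum((y - c_mean[key]) ** 2 for key, ys in cells.items() for y in ys)
ss_total = sum((y - grand) ** 2 for y in all_obs)

df = {"gender": 1, "group": 2, "gender*group": 2, "error": len(all_obs) - len(cells)}
mse = ss_error / df["error"]
for name, ss in [("gender", ss_gender), ("group", ss_group), ("gender*group", ss_inter)]:
    print(f"{name:13s} SS = {ss:8.2f}  F = {(ss / df[name]) / mse:6.2f}")
print(f"error         SS = {ss_error:8.2f}  (df = {df['error']})")
print(f"total         SS = {ss_total:8.2f}")
```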

If you do not have the background for two-way (or higher) analysis of variance, you can skip ahead to the homework. If you do, copy L9dat1.txt and L9dat2.txt into your word processor and save both files on your floppy disk as TEXT ONLY or ASCII (DOS) TEXT. Click two-way.txt, then EDIT, SELECT ALL, and COPY; paste the program into the SAS Program window and run it. To analyze the unbalanced data, change the INFILE statement to read L9dat2.txt. If you run the unbalanced data through PROC ANOVA, you will get output along with the following warning in the SAS Log window:

WARNING: PROC ANOVA has determined that the number of observations in each cell is not equal. PROC GLM may be more appropriate.

PROC GLM is generally the better choice in such cases; take STAT 401 or consult a linear models text to see why. To run PROC GLM on the unbalanced data set (L9dat2.txt), use LOCAL and RECALL TEXT in the SAS Program window, change PROC ANOVA to PROC GLM, and rerun the program. The output differs from the PROC ANOVA results.

Examples of Other Analysis of Variance Settings

This section is for your reference. If you do not have the background for this material, please feel free to skip to the homework section below.

Randomized Block Design

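PROC GLM DATA=file1;
     CLASS trmt block;
     MODEL response = block trmt;
     LSMEANS trmt / STDERR;                  /* least-squares means with standard errors */
     MEANS trmt / DUNNETT ('4') ALPHA=.05;   /* compare each treatment with control '4'  */
RUN;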

Latin Square Design -

PROC GLM DATA=file1;
     CLASS trmt row column;
     MODEL response = row column trmt;
     LSMEANS trmt / STDERR;     /* treatment least-squares means with standard errors */
RUN;

Nested ANOVA & Variance Component Estimation - The program below estimates variance components and conducts F tests for their contributions, as in the tobacco budworm example on pages 149-151 of Robert Kuehl's text "Statistical Principles of Research Design and Analysis", Duxbury Press, Belmont, California, 1994, 686 pp. Here y = weight, A = strain, and B = parent (nested within strain).

PROC VARCOMP DATA=file1 METHOD=TYPE1;  /* estimates components      */
     CLASS A B;
     MODEL y = A B(A);        /* B(A) denotes B nested within A    */
PROC GLM DATA=file1;          /* PROC GLM tests components         */
     CLASS A B;
     MODEL y = A B(A) / E1;   /* E1: Type I estimable functions    */
     RANDOM A B(A) / TEST;    /* F tests from expected mean squares */
RUN;
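To see what the variance component estimates mean, the Python sketch below solves the expected-mean-square equations by hand for a balanced two-stage nested design. For balanced data we understand METHOD=TYPE1 to reduce to this method-of-moments calculation; unbalanced data lead to more complicated coefficients, which PROC VARCOMP works out for you. The function name `nested_variance_components` and the mean squares in the usage line are hypothetical, not values from Kuehl's example.

```python
def nested_variance_components(ms_a, ms_b_in_a, ms_error, b, n):
    """Method-of-moments estimates for a BALANCED two-stage nested design.

    Assumes b levels of B within each level of A and n observations per
    B level, with expected mean squares
        E[MS_A]    = s2 + n * s2_B(A) + b * n * s2_A
        E[MS_B(A)] = s2 + n * s2_B(A)
        E[MS_E]    = s2
    Setting each mean square equal to its expectation and solving gives
    the estimates below (they can come out negative).
    """
    s2_error = ms_error
    s2_b_in_a = (ms_b_in_a - ms_error) / n
    s2_a = (ms_a - ms_b_in_a) / (b * n)
    return s2_a, s2_b_in_a, s2_error


# Hypothetical mean squares, for illustration only (not Kuehl's data).
s2_a, s2_b, s2_e = nested_variance_components(100.0, 20.0, 4.0, b=2, n=3)
print(f"s2_A = {s2_a:.3f}, s2_B(A) = {s2_b:.3f}, s2_error = {s2_e:.3f}")
```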

Split-plot ANOVA - The statements below can be used for a split-plot design with whole plots arranged in randomized blocks, where A is the whole-plot factor and B is the split-plot factor, as in Example 14.1 of Robert Kuehl's text "Statistical Principles of Research Design and Analysis", Duxbury Press, Belmont, California, 1994, 686 pp.

PROC GLM DATA=file1;
     CLASS block A B;
     MODEL y = block A|B block*A;  /* A|B is equivalent to A B A*B     */
     RANDOM block*A / TEST;        /* block*A is the whole-plot error */
RUN;

Homework #9

Read 7.5 and 7.6 in your text.

Copy the data set L9hwdat.txt into your word processor and save it to your floppy drive as TEXT ONLY or ASCII (DOS) TEXT. This data set contains four observations of gas mileage for each of five gasoline brands (A, B, C, D, E), i.e., a total of 20 observations.

Do the following:

1. Use PROC PLAN to randomly assign 16 experimental units to 4 treatments in a completely randomized design.

2. a. Conduct a one-way analysis of variance on brand for these data. Interpret the F-test and give the p-value for the test.

b. Using the MEANS statement, give the mean and standard deviation of gas mileage for each brand and conduct a Tukey multiple comparison test. Describe which brands differ significantly from one another at the 5% level.