LAB SESSION 15

ANALYSIS OF VARIANCE

INTRODUCTION:  In earlier sessions you examined and compared the means of two samples.  We will now practice a technique that tests hypotheses about several means.  While we could compare the means in pairs as we have done before, the number of pairwise comparisons grows quickly and the process soon becomes unwieldy.  Analysis of variance (ANOVA) allows us to test all the means at the same time to see whether there is any significant difference among them.

The Logic Underlying the ANOVA Technique

We will compare two estimates of the population variance: one based on the variation within each set of data and the other based on the variation between the sets of data (that is, between the sample means).  We will use the F distribution for this comparison.  If there is relatively little variation within each group and a large difference between the sample means, we will reject the null hypothesis.  (Remember that we always word the null hypothesis to say “there is no difference...”.)  If there is a lot of variation within the groups and little between them, we cannot conclude that the population means are different.  The test also assumes that the populations under investigation are approximately normally distributed and that the samples are independent.
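In symbols, the comparison described above can be sketched as follows.  The notation is introduced here for illustration and is an assumption, not taken from the text: k is the number of groups, n_i, \bar{x}_i and s_i^2 are the size, mean and sample variance of group i, \bar{x} is the mean of all the data, and N is the total number of data values.

```latex
\[
MS_{\text{between}} = \frac{\sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2}{k - 1},
\qquad
MS_{\text{within}} = \frac{\sum_{i=1}^{k} (n_i - 1)\,s_i^2}{N - k},
\qquad
F^{*} = \frac{MS_{\text{between}}}{MS_{\text{within}}}
\]
```

A large F* means the sample means are spread out much more than the variation inside the groups would explain, which is the situation in which we reject the null hypothesis.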

ANOVA results are presented as a table, and we need to define our terms in order to understand what the table is telling us.  The Factor is the variable whose means we are interested in studying.  When we set up our data in Excel, each column will represent a different Level of the Factor we are examining.  Each row will hold a data value from repeated samplings, called a Replicate.  The ANOVA output first gives a summary of the data, with the different levels of the Factor in the first column, followed by each level's count, sum, average, and variance in the subsequent columns.  It then gives a table describing the sources of variation, both Between Groups and Within Groups.

PERFORMING AN ANOVA ANALYSIS

This sample problem will take you through the steps of entering the data and generating the ANOVA table for Illustration 12-1 in your textbook.  The FACTOR we are looking at is temperature, and the question is whether it has any effect on production.  We will examine production at three different temperature levels: 68°, 72°, and 76°.  These levels form our columns.  The production amounts are the replicates and form the rows of the data table.  You can name the columns and enter the data directly into the worksheet.

A box plot of the three columns shows something interesting.  Note that the points within each level are fairly close together, but the three levels hardly overlap at all.
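If you want a quick numeric check of what the box plot shows, the following is a minimal Python sketch.  It is not part of the Excel procedure; the dictionary keys and the function name level_ranges are made up for illustration, and the values are the production amounts listed in the worksheet for Illustration 12-1.

```python
from statistics import median

# Production values per temperature level, as listed in the
# Illustration 12-1 worksheet in this session.
production = {
    "68": [10, 12, 10, 9],
    "72": [7, 6, 7, 8, 7],
    "76": [3, 3, 5, 4],
}

def level_ranges(data):
    """Return {level: (min, median, max)} for each column of data."""
    return {level: (min(v), median(v), max(v)) for level, v in data.items()}

ranges = level_ranges(production)
for level, (low, mid, high) in ranges.items():
    print(f"{level} degrees: min={low}, median={mid}, max={high}")
```

For these thirteen values the ranges are 9 to 12, 6 to 8, and 3 to 5, so each level sits in its own band.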

The commands for generating the ANOVA table are as follows:

Choose:            Tools > Data Analysis > ANOVA: Single Factor > OK

Enter:               Input Range: A3:C8

Select:              Labels in First Row          (row 3 holds the column names)

Select:              Output Range:

Enter:               A10                             (or the upper left corner where you want it)

The worksheet will look like this:

Illustration 12.1

Sample from 68°    Sample from 72°    Sample from 76°
      10                  7                  3
      12                  6                  3
      10                  7                  5
       9                  8                  4
                          7

Anova: Single Factor

SUMMARY
Groups             Count    Sum    Average    Variance
Sample from 68°        4     41      10.25    1.583333
Sample from 72°        5     35       7       0.5
Sample from 76°        4     15       3.75    0.916667

ANOVA
Source of Variation    SS      df    MS       F           P-value     F crit
Between Groups         84.5     2    42.25    44.47368    1.05E-05    4.102816
Within Groups           9.5    10     0.95
Total                  94      12

Compare the output to the calculations in Illustration 12-1 in the text.
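The SUMMARY and ANOVA values can also be recomputed by hand as a check on the Excel output.  The following is a minimal Python sketch using only the standard library; the function name one_way_anova and the dictionary keys are hypothetical names chosen for illustration, and the data are the values from the worksheet above.

```python
from statistics import mean, variance

# Production values per temperature level, copied from the worksheet above.
samples = {
    "Sample from 68": [10, 12, 10, 9],
    "Sample from 72": [7, 6, 7, 8, 7],
    "Sample from 76": [3, 3, 5, 4],
}

def one_way_anova(groups):
    """Return SS, df, MS and F for a single-factor ANOVA on lists of values."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)

    # Between groups: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: spread of each value around its own group mean.
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "f_star": ms_between / ms_within,
    }

# SUMMARY block: count, sum, average, variance for each level.
for name, values in samples.items():
    print(name, len(values), sum(values), mean(values), round(variance(values), 6))

table = one_way_anova(list(samples.values()))
for key, value in table.items():
    print(key, round(value, 5))
```

The sketch does not compute the P-value or F crit columns; those come from the F distribution.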

Note in particular that the calculated value is F* = 44.47.  To make our decision, we need to compare this to the critical value F(2, 10, .05) = 4.10.  Since 44.47 is much larger than 4.10, we reject the null hypothesis and conclude that at least one of the temperatures has an effect on the production level.  The p-value given in the table can also be used to reach the conclusion.  How would you interpret it?
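As a check on the last two columns of the ANOVA table, the sketch below reproduces the p-value and F crit from F* and the degrees of freedom.  It relies on a shortcut that holds only when the numerator has 2 degrees of freedom, as it does here: in that case P(F > f) = (1 + 2f/d)^(-d/2), where d is the denominator degrees of freedom.  The function names are made up for illustration; for any other numerator df, rely on the Excel output or a statistics library instead.

```python
def p_value_df1_is_2(f_star, df_within):
    """Right-tail area P(F > f_star) for an F(2, df_within) distribution.

    Valid ONLY when the numerator degrees of freedom equal 2.
    """
    return (1 + 2 * f_star / df_within) ** (-df_within / 2)

def f_crit_df1_is_2(alpha, df_within):
    """Critical value with right-tail area alpha, again only for F(2, df_within)."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)

f_star = 42.25 / 0.95          # MS between / MS within from the ANOVA table
p_value = p_value_df1_is_2(f_star, 10)
f_crit = f_crit_df1_is_2(0.05, 10)

print(f"F* = {f_star:.5f}")
print(f"p-value = {p_value:.3g}")
print(f"F crit = {f_crit:.6f}")

# Two equivalent decision rules at alpha = .05.
print("Reject H0 by critical value:", f_star > f_crit)
print("Reject H0 by p-value:", p_value < 0.05)
```

Both rules lead to the same decision: the p-value is the area to the right of F*, so it is smaller than α exactly when F* is larger than the critical value.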

Exercise 12.32 in the chapter compares the stopping distances for four brands of tires.  Using the data given there, is there sufficient evidence to conclude that there is a difference in the mean stopping distances at the α = .05 level?

This data may be found on the Student Suite CD as ex12-28

a) State your null and alternative hypotheses.

b) Find your critical region and value for F.

c) 1) Enter your data in columns 1 - 4, naming them A, B, C, D respectively.

   2) Do a box plot to get a feel for how the four brands compare.

   3) Perform an ANOVA to calculate F*.  What does the p-value tell you?  Explain.

d) Draw your conclusion about the null hypothesis and explain what it means to you.  How would your conclusion change if α changed?

ASSIGNMENT:  Do Exercises 12.17, 12.18, and 12.22 in your text.  The data for these exercises may be found on the Student Suite CD.