LAB SESSION 14

ANALYZING ENUMERATIVE DATA

INTRODUCTION: The data used in this lab is enumerative -- that is, the data is placed in categories and counted.  The observed frequencies list exactly what happened in the sample.  The expected frequencies represent the theoretical expected outcomes (what is expected to happen “on the average”).  These expected values must always add up to n.

When we perform a hypothesis test on these two sets of values, we are really asking, “How different are they?”  If the difference is small, we may attribute it to chance variation in the samples.  However, if the difference is large, there may be a difference in the proportions in the population, and we may reject the null hypothesis.  We can use the χ² (chi-square) distribution in our test.  We will first make inferences concerning multinomial experiments and then extend that to contingency tables.

MULTINOMIAL EXPERIMENTS

A multinomial experiment consists of n independent trials, each of whose outcomes fits into exactly one of k possible cells.  The probability for each cell remains constant from trial to trial, and the sum of all the probabilities = 1.  For multinomial experiments, we will always use a right-tail critical region of the χ² distribution.  The expected frequency for each cell is obtained by multiplying the probability for that cell by the total number of trials, n.
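The expected-frequency rule above can be sketched in a few lines of Python.  This is only an illustration: the function name expected_frequencies and the die example are hypothetical and do not come from the text.

```python
def expected_frequencies(n, probabilities):
    """Return E_i = n * p_i for each cell of a multinomial experiment."""
    total = sum(probabilities)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    return [n * p for p in probabilities]


# Hypothetical example: 60 rolls of a fair six-sided die (k = 6 cells).
die_expected = expected_frequencies(60, [1 / 6] * 6)
print(die_expected)
print(sum(die_expected))  # the expected values add up to n
```

Because the probabilities sum to 1, the expected frequencies always add up to n, which is the same check the worksheet makes with its column sums.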

We can use Excel to calculate the Chi-Square statistic by entering the data and the probability for each cell, then calculating the expected value and the Chi-Square contribution for each cell.  We then need to sum each of these columns.  Let us do Illustration 11-1 from the text, using Excel to do the calculations.

Enter the column headings in Row 1

Enter the seven observed values into column B.

Since there are seven sections, and each is assumed equally likely to be chosen, the probability for each cell is 1/7.  The expected value for each cell is therefore 1/7 of the 119 students, or 17, so enter 17 in each of the seven rows of column C.  These are the Expected values.

Next, calculate the sums of columns B and C by using the Σ (AutoSum) button on the toolbar.

In Cell D2 enter the formula  =B2-C2, and copy it down the column.

In Cell E2 enter the formula  =D2*D2, and copy it down the column.

In Cell F2 enter the formula  =E2/C2, and copy it down the column.

Calculate the sums of columns D and F.  The sum in column D should be 0, which gives you a check on your data.  The sum in column F is the value of χ² (the Chi-Square statistic, not to be confused with cell C2).
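As a cross-check on the spreadsheet, the same column arithmetic can be sketched in Python.  The observed counts are the seven values from the completed worksheet shown in this lab; the variable names are illustrative only.

```python
# Observed counts for the seven sections (column B of the worksheet).
observed = [18, 12, 25, 23, 8, 19, 14]

n = sum(observed)                                # 119 students
expected = [n / len(observed)] * len(observed)   # column C: 17 in every row

differences = [o - e for o, e in zip(observed, expected)]    # column D: O - E
squares = [d * d for d in differences]                       # column E: (O - E)^2
contributions = [s / e for s, e in zip(squares, expected)]   # column F: (O - E)^2 / E

chi_square = sum(contributions)   # sum of column F

print(sum(differences))           # should be 0, the same check as in Excel
print(round(chi_square, 4))
```

Each list corresponds to one worksheet column, so a mismatch with Excel can be traced to a specific column.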

Compare your results to the text.

Let’s enter some data to make our chart complete.

In Cell A11 enter   α = 0.05

In Cell A12 enter   df = 6

In Cell A13 enter   χ² = 12.94

In Cell A14 enter   p-value =

To calculate your p-value click on Cell B14.

Click: Insert > fx > Statistical > CHITEST

Enter: Actual Range:  B2:B8

Expected Range: C2:C8

OK

This is what you should see:

Number   Observed Values (O)   Expected Values (E)   O-E   (O-E)²   (O-E)²/E
1        18                    17                      1      1     0.058823529
2        12                    17                     -5     25     1.470588235
3        25                    17                      8     64     3.764705882
4        23                    17                      6     36     2.117647059
5         8                    17                     -9     81     4.764705882
6        19                    17                      2      4     0.235294118
7        14                    17                     -3      9     0.529411765
Sums    119                   119                      0            12.94117647

alpha =     0.05
df =        6
χ² =        12.94
P-value =   0.04397964
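The P-value in the worksheet above can be checked without Excel.  The sketch below is not how CHITEST is implemented; it uses a closed-form expression for the right-tail area of the chi-square distribution that holds only when df is even, which is the case here (df = 6).  The function name is hypothetical.

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """Right-tail area of the chi-square distribution (even df only).

    For df = 2m the tail area is exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


chi_square = 12.94117647      # sum of column F in the worksheet
alpha = 0.05

p_value = chi_square_p_value_even_df(chi_square, 6)
print(round(p_value, 6))
print(p_value <= alpha)       # True means the result falls in the critical region
```

For odd df this shortcut does not apply, and a statistics library or a chi-square table would be needed instead.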

You now have to finish the test and state your conclusion.

ASSIGNMENT: Do Exercises 11.11, 11.14, and 11.16 in your text.

Contingency tables arrange data into a two-way classification.  They involve two variables, and the first question we need to ask is whether the variables are independent or dependent.  The two tests that use contingency tables are the Test of Independence and the Test for Homogeneity.

In a new sheet, enter the data from Illustration 11-4, including appropriate titles.

Illustration 11-4

Type of Residence   Favor   Oppose   Total
Urban                 143       57     200
Suburban               98      102     200
Rural                  13       87     100
Total                 254      246     500

Click on an empty cell.

Choose: Tools > Data Analysis Plus > Contingency Table > OK

Enter: Input range:  B5:C7  > OK

Select: Labels (if necessary)

Enter: Alpha  α  (0.05)

A new sheet will be created (note the tab name) that will contain the following:

Contingency Table
            Favor   Oppose   Total   TOTAL
              143       57     200     400
Urban          98      102     200     400
Suburban       13       87     100     200
Rural         254      246     500    1000
TOTAL         508      492    1000    2000

chi-squared Stat        91.7155
df                      6
p-value                 0
chi-squared Critical    12.5916

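You now have to complete the test and state your conclusion.  Note that the degrees of freedom for this table of 3 rows and 2 columns are df = (3 - 1)(2 - 1) = 2, not the 6 reported in the output, so the critical value at α = 0.05 is 5.99 rather than 12.5916.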
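The expected counts, the test statistic, and the correct degrees of freedom for Illustration 11-4 can also be worked out by hand.  The following is a minimal sketch in Python, assuming the usual rule that each expected count is (row total × column total) / grand total; it is not the method Data Analysis Plus uses internally, and the variable names are illustrative.

```python
# Observed counts from Illustration 11-4: [Favor, Oppose] for each residence type.
observed = {
    "Urban": [143, 57],
    "Suburban": [98, 102],
    "Rural": [13, 87],
}

rows = list(observed.values())
row_totals = [sum(r) for r in rows]            # 200, 200, 100
col_totals = [sum(c) for c in zip(*rows)]      # 254, 246
grand_total = sum(row_totals)                  # 500

# Expected count for a cell = (row total * column total) / grand total.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (number of rows - 1) * (number of columns - 1).
df = (len(rows) - 1) * (len(col_totals) - 1)

print(round(chi_square, 4), df)
```

Only the six data cells go into the calculation; the Total row and column are derived from them and are not part of the table being tested.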

Let us perform the procedure using the data from Exercise 11.31.  First, label your columns and rows.  Enter your data.

Exercise 11.31

Day of the Week   Mon   Tues   Wed   Thurs   Fri
Nondefective       85     90    95      95    90
Defective          15     10     5       5    10

Click on an empty cell.

Choose: Tools > Data Analysis Plus > Contingency Table > OK

Enter: Input range:  the range of cells that holds your data for this exercise (five columns by two rows)

Select: Labels (if necessary)

Enter: Alpha  α  (0.05)

A new sheet will be created (note the tab name) that will contain the following:

Contingency Table
               Mon   Tues   Wed   Thurs   Fri   TOTAL
Nondefective    85     90    95      95    90     455
Defective       15     10     5       5    10      45
TOTAL          100    100   100     100   100     500

chi-squared Stat        8.547
df                      4
p-value                 0.0735
chi-squared Critical    9.4877

You will still need to state the null and alternative hypotheses, set the test criteria, and then, using the above results, draw your conclusion.
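The figures in the output above can be reproduced with a short Python sketch.  It assumes the standard expected-count rule for two-way tables, and it uses a closed form for the right-tail area that is valid only for df = 4; the function name chi_square_independence is hypothetical.

```python
import math


def chi_square_independence(table):
    """Return (statistic, df) for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for row, rt in zip(table, row_totals):
        for o, ct in zip(row, col_totals):
            e = rt * ct / grand_total          # expected count for this cell
            statistic += (o - e) ** 2 / e
    df = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, df


table = [
    [85, 90, 95, 95, 90],   # Nondefective, Mon through Fri
    [15, 10, 5, 5, 10],     # Defective, Mon through Fri
]
statistic, df = chi_square_independence(table)

# For df = 4 only: right-tail area = exp(-x/2) * (1 + x/2).
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

critical_value = 9.4877     # chi-squared Critical from the output above
alpha = 0.05

print(round(statistic, 3), df, round(p_value, 4))
print(statistic > critical_value, p_value <= alpha)
```

The two comparisons on the last line are the classical and p-value forms of the same decision rule, so they should always agree.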

ASSIGNMENT: Do Exercises 11.33, 11.34, 11.49, and 11.58 in your text.