LAB SESSION 14

 ANALYZING ENUMERATIVE DATA

 

INTRODUCTION: The data used in this lab is enumerative -- that is, the data is placed in categories and counted.  The observed frequencies list exactly what happened in the sample.  The expected frequencies represent the theoretical expected outcomes (what is expected to happen “on the average”).  These expected values must always add up to n, the total number of observations.

 

When we perform a hypothesis test on these two sets of values, we are really asking, “How different are they?”  If the difference is small, we may attribute it to chance variation in the sample.  However, if the difference is large, the population proportions may differ from the hypothesized ones, and we may reject the null hypothesis.  We can use the χ² (chi-square) distribution in our test.  We will first make inferences concerning multinomial experiments and then extend that to contingency tables.

 

MULTINOMIAL EXPERIMENTS

A multinomial experiment consists of n independent trials, each of whose outcomes falls into exactly one of k possible cells.  The probability for each cell remains constant from trial to trial, and the probabilities sum to 1.  For multinomial experiments, we will always use a right-tail critical region of the χ² distribution.  The expected frequency for each cell is obtained by multiplying the probability for that cell by the total number of trials, n.

 

We can use Excel to calculate the chi-square statistic: enter the observed data and the probability for each cell, calculate the expected value for each cell, and then calculate each cell's contribution to chi-square.  We then sum the columns.  Let us work through Illustration 11-1 from the text, using Excel to do the calculations.

           

            Enter the column headings in Row 1

Enter the seven observed values into column B.

Since there are seven sections, we assume under the null hypothesis that each is equally likely to be chosen, so the probability for each section is 1/7.  The expected value for each section is therefore 119 × 1/7 = 17.  Enter 17 in seven rows of column C.  These are the Expected values.


 

Next, calculate the sum of each column by using the Σ (AutoSum) button on the toolbar.

 

            In Cell D2 enter the formula  = B2 - C2, and copy it down the column.

            In Cell E2 enter the formula  = D2*D2, and copy it down the column.

            In Cell F2 enter the formula  = E2/ C2, and copy it down the column.

 

Calculate the sums of columns D and F.  The sum in column D should be 0, which gives you a check on your data.  The sum in column F is the value of χ².

Compare your results to the text.
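The spreadsheet columns above can also be mirrored outside Excel as a cross-check. The following is a minimal sketch in Python (not part of this lab's Excel workflow); the variable names are chosen for illustration, and the observed counts are the seven values from Illustration 11-1.

```python
# Observed counts for the seven sections (Illustration 11-1, column B).
observed = [18, 12, 25, 23, 8, 19, 14]

n = sum(observed)            # total number of trials
k = len(observed)            # number of cells
expected = [n / k] * k       # equal probabilities: E = n * (1/k) for every cell

# Columns D, E and F of the worksheet.
diffs = [o - e for o, e in zip(observed, expected)]            # O - E
squares = [d * d for d in diffs]                               # (O - E)^2
contributions = [s / e for s, e in zip(squares, expected)]     # (O - E)^2 / E

chi_square = sum(contributions)

print("sum of O - E:", sum(diffs))          # should be 0 as a data check
print("chi-square  :", round(chi_square, 4))
```

The sum of the O - E column and the chi-square total should agree with the sums you obtained in columns D and F.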

 

Let’s enter some data to make our chart complete.

In Cell A11 enter   α = 0.05

In Cell A12 enter   df = 6

In Cell A13 enter   χ² = 12.94

In Cell A14 enter   p-value =

 

To calculate your p-value click on Cell B14.

Click: Insert > fx > Statistical > CHITEST

Enter: Actual Range:  B2:B8

            Expected Range: C2:C8

            OK

 

This is what you should see:

 

Number   Observed Values (O)   Expected Values (E)   O-E   (O-E)²   (O-E)²/E
1        18                    17                     1     1       0.058823529
2        12                    17                    -5    25       1.470588235
3        25                    17                     8    64       3.764705882
4        23                    17                     6    36       2.117647059
5         8                    17                    -9    81       4.764705882
6        19                    17                     2     4       0.235294118
7        14                    17                    -3     9       0.529411765
Sums     119                   119                    0             12.94117647

alpha = 0.05
df = 6
χ² = 12.94
P-value =   0.04397964

You now have to finish the test and state your conclusion.
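As a cross-check on the CHITEST result, the right-tail area can also be computed outside Excel. The sketch below is in Python, which is not part of this lab, and the function name chi_square_p_value_even_df is made up for illustration. It relies on a closed-form expression that holds only when df is a positive even integer (true here, since df = 6); it is an assumption of this sketch, not a description of how Excel computes the value.

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """Right-tail area P(X > statistic) for a chi-square variable with even df.

    Uses exp(-x/2) * sum over i < df/2 of (x/2)**i / i!, which is valid only
    when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = statistic / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)


# Observed and expected counts from Illustration 11-1.
observed = [18, 12, 25, 23, 8, 19, 14]
expected = [17] * 7
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# For a single column of k cells, df = k - 1.
p_value = chi_square_p_value_even_df(statistic, df=len(observed) - 1)
print(round(p_value, 4))
```

The printed value should agree with the p-value in cell B14 to the displayed precision.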

 

ASSIGNMENT: Do Exercises 11.11, 11.14, and 11.16 in your text.

 

INFERENCES ABOUT CONTINGENCY TABLES

Contingency tables arrange data into a two-way classification.  A contingency table involves two variables, and the first question we need to ask is whether they are independent or dependent.  The two tests that use contingency tables are the Test of Independence and the Test for Homogeneity.

 

In a new sheet, enter the data from Illustration 11-4, including appropriate titles.

 

Illustration 11-4

Type of Residence   Favor   Oppose   Total
Urban               143     57       200
Suburban            98      102      200
Rural               13      87       100
Total               254     246      500

 

Click on an empty cell.

            Choose: Tools > Data Analysis Plus > Contingency Table > OK

            Enter: Input range:  B5:C7  > OK

            Select: Labels (if necessary)

            Enter: Alpha  α  (.05)

A new sheet will be created (note the tab name) that will contain the following:

 

Contingency Table

            Favor   Oppose   Total   TOTAL
            143     57       200     400
Urban       98      102      200     400
Suburban    13      87       100     200
Rural       254     246      500     1000
TOTAL       508     492      1000    2000

chi-squared Stat       91.7155
df                     6
p-value                0
chi-squared Critical   12.5916

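You now have to complete the test and state your conclusion.  Note that the output above reports df = 6, which corresponds to a 4 x 3 table because the Total row and column were counted as data; the row labels also appear shifted down by one row.  The correct value for the 3 x 2 table of counts is df = (3 - 1)(2 - 1) = 2, so look up the critical value for df = 2 rather than using the one in the output.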
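To see where the statistic and the degrees of freedom come from, the calculation can be sketched by hand. The Python below is an illustration only (the function name contingency_chi_square is hypothetical, and Python is not part of this lab). It takes only the cell counts from Illustration 11-4, without the totals, computes each expected count as row total × column total / grand total, and sets df = (rows - 1)(columns - 1).

```python
def contingency_chi_square(table):
    """Return (statistic, df, expected) for a table of observed counts.

    `table` must hold only the cell counts: no totals row or column.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    # Expected count for each cell: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, expected


# Illustration 11-4: rows are Urban, Suburban, Rural; columns are Favor, Oppose.
observed = [[143, 57], [98, 102], [13, 87]]
statistic, df, expected = contingency_chi_square(observed)

print("chi-square:", round(statistic, 4))
print("df        :", df)
```

Because the totals are left out of the input, the degrees of freedom come out as 2, while the statistic should match the chi-squared Stat in the Excel output.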

 

 

Let us perform the procedure using the data from Exercise 11.31.  First, label your columns and rows.  Enter your data.

 

Exercise 11.31

Day of the Week   Mon   Tues   Wed   Thurs   Fri
Nondefective      85    90     95    95      90
Defective         15    10     5     5       10

Click on an empty cell.

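            Choose: Tools > Data Analysis Plus > Contingency Table > OK

            Enter: Input range:  the cells holding your data (B5:C7 applies to the 3 x 2 table in Illustration 11-4; select the 2 x 5 block of counts here)

            Select: Labels (if necessary)

            Enter: Alpha  α  (.05)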

 

A new sheet will be created (note the tab name) that will contain the following:

 

Contingency Table

               Mon   Tues   Wed   Thurs   Fri   TOTAL
Nondefective   85    90     95    95      90    455
Defective      15    10     5     5       10    45
TOTAL          100   100    100   100     100   500

chi-squared Stat       8.547
df                     4
p-value                0.0735
chi-squared Critical   9.4877

You will still need to state the null and alternative hypotheses, set the test criteria, and then, using the above results, draw your conclusion.
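The "chi-squared Critical" entry in the output is the value that cuts off a right-tail area of α. As a rough cross-check, it can be approximated by bisection on the right-tail area. The Python sketch below is illustrative only: both function names are hypothetical, Python is not part of this lab, and the closed-form tail area it uses is valid only for even df (df = 4 here).

```python
import math


def right_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with positive even df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only handles positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


def critical_value_even_df(alpha, df, tol=1e-9):
    """Find x whose right-tail area equals alpha, by bisection (even df only)."""
    low, high = 0.0, 1.0
    # Widen the bracket until the tail area at `high` is no larger than alpha.
    while right_tail_even_df(high, df) > alpha:
        high *= 2.0
    # The tail area decreases as x grows, so bisection narrows in on the cutoff.
    while high - low > tol:
        mid = (low + high) / 2.0
        if right_tail_even_df(mid, df) > alpha:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


critical = critical_value_even_df(alpha=0.05, df=4)
p_value = right_tail_even_df(8.547, df=4)   # statistic taken from the output above

print("critical value:", round(critical, 4))
print("p-value       :", round(p_value, 4))
```

Both numbers should agree with the Excel output to the displayed precision; comparing the statistic with the critical value, or the p-value with α, leads to the same decision.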

 

 

 

ASSIGNMENT: Do Exercises 11.33, 11.34, 11.49, and 11.58 in your text.