LAB SESSION 14
ANALYZING ENUMERATIVE DATA
INTRODUCTION: The data used in this lab are enumerative; that is, the data are placed in categories and counted. The observed frequencies list exactly what happened in the sample. The expected frequencies represent the theoretical outcomes (what is expected to happen “on the average”). These expected values must always add up to n.
When we perform a hypothesis test on these two sets of values, we are really asking, “How different are they?” If the difference is small, we may attribute it to chance variation in the samples. However, if the difference is large, there may be a difference in the proportions in the population and we may reject the null hypothesis. We can use the χ² distribution in our test.
We will first make inferences concerning multinomial experiments and
then extend that to contingency tables.
MULTINOMIAL EXPERIMENTS
A multinomial experiment consists of n independent trials, each of whose outcomes fits into exactly one of k possible cells. The probability of each cell remains constant from trial to trial, and the probabilities of all the cells sum to 1. For multinomial experiments, we will always use a right-tail critical region of the χ² distribution. The expected frequency for each cell is obtained by multiplying the probability for that cell by the total number of trials, n.
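The expected-frequency rule can also be sketched outside Excel. The short Python example below is only an illustration: the function name `expected_frequencies` and the three-cell probabilities are hypothetical and do not come from the text.

```python
def expected_frequencies(n, probabilities):
    """Return the expected count n * p for each cell of a multinomial experiment."""
    # The cell probabilities must sum to 1 (allowing for rounding error).
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    return [n * p for p in probabilities]


# Hypothetical example (not from the text): 200 trials spread over three cells.
expected = expected_frequencies(200, [0.5, 0.3, 0.2])
print(expected)
print(sum(expected))  # the expected values add up to n, apart from rounding
```

Whatever probabilities are used, the expected values should add up to n, which is a quick check on the setup.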
We can use Excel to calculate the chi-square statistic by entering the data and the probability for each cell, calculating the expected value and the chi-square contribution for each cell, and then summing the columns. Let us do Illustration 11-1 from the text, using Excel to do the calculations.
Enter the column headings in Row 1.
Enter the seven observed values into column B (cells B2:B8).
Since there are seven sections, we can assume the probability of a student choosing any one of them is 1/7, so the expected frequency for each section is 1/7 of the 119 students, or 17. Therefore, we will enter 17 in each of the seven rows of column C (cells C2:C8). These are the Expected values.
Next, calculate the sum of each column by using the Σ (AutoSum) button on the toolbar.
In Cell D2 enter the formula =B2-C2, and copy it down the column.
In Cell E2 enter the formula =D2*D2, and copy it down the column.
In Cell F2 enter the formula =E2/C2, and copy it down the column.
Calculate the sums of columns D and F. The sum in column D should be 0, which gives you a check on your data. The sum in column F is the value of χ². Compare your results to the text.
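As a cross-check on the spreadsheet, the same column-by-column calculation can be sketched in Python. The variable names are illustrative only; the observed counts are the seven values from Illustration 11-1 used in this lab.

```python
observed = [18, 12, 25, 23, 8, 19, 14]            # column B
n = sum(observed)                                 # total number of students
expected = [n / len(observed)] * len(observed)    # column C: equal share per section

diffs = [o - e for o, e in zip(observed, expected)]          # column D: O - E
squares = [d * d for d in diffs]                             # column E: (O - E) squared
contributions = [s / e for s, e in zip(squares, expected)]   # column F: (O - E) squared / E

chi_square = sum(contributions)   # sum of column F
print(sum(diffs))                 # check on the data: should be 0
print(round(chi_square, 4))
```

Each list plays the role of one spreadsheet column, so a mismatch with Excel can be traced to a specific column.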
Let’s enter some data to make our chart complete.
In Cell A11 enter α = 0.05
In Cell A12 enter df = 6
In Cell A13 enter χ² = 12.94
In Cell A14 enter p-value =
To calculate your p-value, click on Cell B14.
Click: Insert > fx > Statistical > CHITEST
Enter: Actual Range: B2:B8
Expected Range: C2:C8
Click: OK
This is what you should see:
Number | Observed Values (O) | Expected Values (E) | O-E | (O-E)² | (O-E)²/E
1 | 18 | 17 | 1 | 1 | 0.058823529
2 | 12 | 17 | -5 | 25 | 1.470588235
3 | 25 | 17 | 8 | 64 | 3.764705882
4 | 23 | 17 | 6 | 36 | 2.117647059
5 | 8 | 17 | -9 | 81 | 4.764705882
6 | 19 | 17 | 2 | 4 | 0.235294118
7 | 14 | 17 | -3 | 9 | 0.529411765
Sums | 119 | 119 | 0 | | 12.94117647
alpha = 0.05
df = 6
χ² = 12.94
P-value = | 0.04397964
You now have to finish the test and state
your conclusion.
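The p-value reported by CHITEST can be cross-checked in Python. The sketch below uses a closed form for the right-tail area that holds only when df is even, which is the case here (df = 6). The function name `chi2_p_value_even_df` is hypothetical. If SciPy is installed, `scipy.stats.chi2.sf` covers any df.

```python
import math


def chi2_p_value_even_df(x, df):
    """Right-tail chi-square probability P(X > x); valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum over i = 0 .. df/2 - 1 of (x/2)^i / i!
    total = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * total


observed = [18, 12, 25, 23, 8, 19, 14]
expected = [17] * 7
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

p_value = chi2_p_value_even_df(chi_square, df=6)
print(round(p_value, 4))
```

The result should agree with the CHITEST value in the sheet to several decimal places.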
ASSIGNMENT: Do Exercises 11.11, 11.14, and 11.16 in your text.
INFERENCES ABOUT CONTINGENCY TABLES
Contingency tables arrange data into a two-way classification. They involve two variables, and the first question we need to ask is whether the variables are independent or dependent. The two tests that use contingency tables are the Test of Independence and the Test for Homogeneity.
In a new sheet, enter the data from
Illustration 11-4, including appropriate titles.
Illustration 11-4
Type of Residence | Favor | Oppose | Total
Urban | 143 | 57 | 200
Suburban | 98 | 102 | 200
Rural | 13 | 87 | 100
Total | 254 | 246 | 500
Click on an empty cell.
Choose: Tools > Data Analysis Plus > Contingency Table > OK
Enter: Input range: B5:C7 (the six observed counts, without the totals) > OK
Select: Labels (if necessary)
Enter: Alpha α (.05)
A new sheet will be created (note the tab
name) that will contain the following:
Contingency Table
 | Favor | Oppose | Total | TOTAL
 | 143 | 57 | 200 | 400
Urban | 98 | 102 | 200 | 400
Suburban | 13 | 87 | 100 | 200
Rural | 254 | 246 | 500 | 1000
TOTAL | 508 | 492 | 1000 | 2000
chi-squared Stat | 91.7155
df | 6
p-value | 0
chi-squared Critical | 12.5916
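You now have to complete the test and state your conclusion. Note that the correct df is 2, that is (3 - 1)(2 - 1), not the 6 shown in the output: the output above treated the Total row and column as data, which is why its totals are doubled. The χ² statistic is unaffected, but the critical value must be taken for df = 2 (5.9915 at α = 0.05).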
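To see where the statistic comes from, the contingency-table calculation can be sketched in Python. The handout leaves this step to Data Analysis Plus, so the code is only an illustration. It assumes the standard rule that each expected count is (row total × column total) / grand total, and it uses the Illustration 11-4 counts without the totals.

```python
import math

# Observed counts from Illustration 11-4: columns are Favor, Oppose.
table = [
    [143, 57],   # Urban
    [98, 102],   # Suburban
    [13, 87],    # Rural
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected count under independence.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1) * (columns - 1)

# Right-tail area; this exponential shortcut holds only when df = 2.
p_value = math.exp(-chi_square / 2)

print(round(chi_square, 4), df, p_value)
```

Because only the six observed counts are supplied, df comes out as (3 - 1)(2 - 1) = 2.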
Let us perform the procedure using the
data from Exercise 11.31. First, label
your columns and rows. Enter your data.
Exercise 11.31
Day of the Week | Mon | Tues | Wed | Thurs | Fri
Nondefective | 85 | 90 | 95 | 95 | 90
Defective | 15 | 10 | 5 | 5 | 10
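Click on an empty cell.
Choose: Tools > Data Analysis Plus > Contingency Table > OK
Enter: Input range: the ten observed counts, two rows by five columns (B5:C7 was the range for Illustration 11-4 and does not fit this table)
Select: Labels (if necessary)
Enter: Alpha α (.05)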
A new sheet will be created (note the tab
name) that will contain the following:
Contingency Table
 | Mon | Tues | Wed | Thurs | Fri | TOTAL
Nondefective | 85 | 90 | 95 | 95 | 90 | 455
Defective | 15 | 10 | 5 | 5 | 10 | 45
TOTAL | 100 | 100 | 100 | 100 | 100 | 500
chi-squared Stat | 8.547
df | 4
p-value | 0.0735
chi-squared Critical | 9.4877
You will still need to frame the null and alternative hypotheses, set the criteria, and then, using the above results, draw your conclusion.
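The last step, comparing the results against the criteria, can be sketched as a small Python helper. The function name `decide` is hypothetical, and the numbers passed in are the ones reported in the Exercise 11.31 output. The sketch shows both the classical (critical value) approach and the p-value approach, which should agree.

```python
def decide(chi_square, critical_value, p_value, alpha):
    """Return the decision under the classical and the p-value approach."""
    # Classical approach: right-tail test, so reject when the statistic
    # falls beyond the critical value.
    classical = "reject H0" if chi_square > critical_value else "fail to reject H0"
    # p-value approach: reject when the p-value is no larger than alpha.
    by_p_value = "reject H0" if p_value <= alpha else "fail to reject H0"
    return classical, by_p_value


# Values reported in the Exercise 11.31 output.
print(decide(chi_square=8.547, critical_value=9.4877, p_value=0.0735, alpha=0.05))
```

The helper only compares numbers; the hypotheses and the conclusion in words still have to be written out.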
ASSIGNMENT: Do Exercises 11.33, 11.34, 11.49, and 11.58 in your text.