LAB SESSION 13
INFERENCES INVOLVING TWO POPULATIONS
INTRODUCTION: When comparing two populations we need two samples, one from each population. Two kinds of samples can be used, dependent or independent, determined by the source of the data. The methods of comparison are quite different for the two kinds.
CASE 1. DEPENDENT SAMPLES (PAIRED DATA): The two data values, one from each set, that come from the same source are called paired data. They are compared by using the difference in their values, called the paired difference, d.
Because the paired differences, d = x1 - x2, will be approximately normally distributed when the paired observations are randomly selected from normal populations, we will use the t-test. We wish to make inferences about µd, where the random variable involved (d) has an approximately normal distribution with an unknown standard deviation (σd).
Confidence Interval
Consider the data presented in Exercise 10.20 of your text. Use Excel to generate the 95% confidence interval for the mean improvement in memory resulting from taking the memory course (d = after - before).
Retrieve the data file for ex10-020 from the Student Suite CD, or enter it yourself in columns A and B.
Form the paired differences and put them in column C. Note that the worksheet below forms the difference as Before - After, so an improvement shows up as a negative value.
Before | After | Difference
93 | 98 | -5
86 | 92 | -6
72 | 80 | -8
54 | 62 | -8
92 | 91 | 1
65 | 78 | -13
80 | 89 | -9
81 | 78 | 3
62 | 71 | -9
73 | 80 | -7
To generate the interval:
Click: into any empty cell.
Choose: Tools > Data Analysis Plus > t-Estimate: Mean
Enter: Input range: C2:C11
Select: Labels (if necessary)
Enter: Alpha: α (here 0.05)
The output follows in a separate sheet:
t-Estimate: Mean
 | Difference
Mean | -6.1
Standard Deviation | 4.7947
LCL | -9.52989
UCL | -2.67011
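As a cross-check on the Excel output, the same interval can be sketched in Python using only the standard library. This is an illustration, not part of the Excel procedure: the variable names are made up for this sketch, and the critical value t* is not computed here but copied from the "t Critical two-tail" entry (df = 9) of the paired t-test output later in this session.

```python
import math
import statistics

# Paired differences from column C (Before - After, as in the worksheet)
diffs = [-5, -6, -8, -8, 1, -13, -9, 3, -9, -7]

n = len(diffs)
mean_d = statistics.mean(diffs)   # sample mean of the differences
sd_d = statistics.stdev(diffs)    # sample standard deviation (n - 1 divisor)

# Assumption: t* for df = 9 and 95% confidence, copied from Excel's
# "t Critical two-tail" value rather than computed here.
t_star = 2.262158887

margin = t_star * sd_d / math.sqrt(n)   # margin of error
lcl = mean_d - margin                   # lower confidence limit
ucl = mean_d + margin                   # upper confidence limit

print(f"mean = {mean_d:.1f}, sd = {sd_d:.4f}")
print(f"95% CI: ({lcl:.5f}, {ucl:.5f})")
```

The limits should agree with the LCL and UCL above to about four decimal places; any small discrepancy comes from the rounded critical value.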
Hypothesis Testing
To demonstrate the procedure for a hypothesis test on the mean difference we will do Exercise 10.28.
Enter the data for Before in column A and for After in column B, or retrieve it from the Student Suite CD (ex10-028), and calculate the paired differences.
Exercise 10.28
Before | After | Paired Diff
29 | 30 | 1
22 | 26 | 4
25 | 25 | 0
29 | 35 | 6
26 | 33 | 7
24 | 36 | 12
31 | 32 | 1
46 | 54 | 8
34 | 50 | 16
28 | 43 | 15
Then perform a t-test on the paired differences (After - Before).
Choose: Tools > Data Analysis > t-Test: Paired Two Sample for Means
Enter: Variable 1 Range: B4:B14
Enter: Variable 2 Range: A4:A14
Select: Labels
Enter: Alpha: α (for example, 0.05)
Select: Output Range
Enter: A15 (or any empty cell)
Click: OK
The results you get look like this:
t-Test: Paired Two Sample for Means
 | After | Before
Mean | 36.4 | 29.4
Variance | 94.48888889 | 46.26666667
Observations | 10 | 10
Pearson Correlation | 0.810662928
Hypothesized Mean Difference | 0
df | 9
t Stat | 3.821341258
P(T<=t) one-tail | 0.002040758
t Critical one-tail | 1.833113856
P(T<=t) two-tail | 0.004081516
t Critical two-tail | 2.262158887
Note: the t statistic = 3.82 and the two-tail p-value = 0.0041.
How would you interpret these results?
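To see where the t statistic comes from, the paired test can be sketched by hand in Python. This is a minimal illustration with made-up variable names: it forms d = After - Before, divides the mean difference by its standard error, and compares the result with the two-tail critical value, which is copied from the Excel output rather than computed.

```python
import math
import statistics

before = [29, 22, 25, 29, 26, 24, 31, 46, 34, 28]
after = [30, 26, 25, 35, 33, 36, 32, 54, 50, 43]

# Paired differences, d = After - Before
diffs = [a - b for a, b in zip(after, before)]

n = len(diffs)
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference

# Test statistic for H0: mu_d = 0, with n - 1 degrees of freedom
t_stat = (mean_d - 0) / se_d
df = n - 1

# Assumption: critical value copied from Excel's "t Critical two-tail"
# entry (alpha = 0.05, df = 9), not computed here.
t_crit_two_tail = 2.262158887
reject_h0 = abs(t_stat) > t_crit_two_tail

print(f"t = {t_stat:.4f}, df = {df}, reject H0: {reject_h0}")
```

Comparing |t| with the critical value is the same decision rule as comparing the two-tail p-value with α.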
ASSIGNMENT: Do Exercises 10.21, 10.23, 10.24, 10.30 in your text.
CASE 2. INDEPENDENT SAMPLES: If two samples are selected, one from each of the populations, the two samples are independent if the selection of objects from one population is unrelated to the selection of objects from the other population. Since the samples provide the information for determining the standard error, the test statistic will follow a t distribution, and the degrees of freedom will be calculated by Excel.
a) Complete the hypothesis test presented in Exercise 10.61 of your text.
Retrieve the data from the Student Suite CD and note that the data for Diet A is in column A and the data for Diet B is in column B.
DietA | DietB
5 | 5
14 | 21
7 | 16
9 | 23
11 | 4
7 | 16
13 | 13
14 | 19
12 | 9
8 | 21
Perform a t-test as follows:
Choose: Tools > Data Analysis > t-Test: Two-Sample Assuming Unequal Variances
Enter: Variable 1 Range: A1:A11
Enter: Variable 2 Range: B1:B11
Hypothesized Difference: 0.0
Select: Labels
Enter: Alpha: α (for example, 0.05)
Select: Output Range
Enter: A15 (or any empty cell)
Click: OK
We then get the following output:
t-Test: Two-Sample Assuming Unequal Variances
 | DietA | DietB
Mean | 10 | 14.7
Variance | 10.4444444 | 46.0111111
Observations | 10 | 10
Hypothesized Mean Difference | 0
df | 13
t Stat | -1.978083
P(T<=t) one-tail | 0.03475571
t Critical one-tail | 1.7709317
P(T<=t) two-tail | 0.06951142
t Critical two-tail | 2.16036824
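The t statistic and degrees of freedom in this output can be reproduced with a short Python sketch. It assumes that Excel's unequal-variances test uses the Welch-Satterthwaite approximation for the degrees of freedom and reports the result rounded to a whole number; the rounded value computed below agrees with the df = 13 shown above. The function name welch_t is hypothetical.

```python
import math
import statistics

diet_a = [5, 14, 7, 9, 11, 7, 13, 14, 12, 8]
diet_b = [5, 21, 16, 23, 4, 16, 13, 19, 9, 21]

def welch_t(x, y):
    """Return (t statistic, approximate df) for two independent samples."""
    vx = statistics.variance(x) / len(x)   # s1^2 / n1
    vy = statistics.variance(y) / len(y)   # s2^2 / n2
    t_stat = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Assumption: Welch-Satterthwaite approximation for the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_stat, df

t_stat, df = welch_t(diet_a, diet_b)
print(f"t = {t_stat:.6f}, unrounded df = {df:.2f}, rounded df = {round(df)}")
```

The sign of t depends on the order of the samples: with Diet A first, a negative t indicates that the Diet B mean is larger.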
Do the data justify the conclusion that the mean weight gained on diet B was greater than the mean weight gained on diet A, at the α = 0.05 level of significance?
Now that we have concluded that there is a difference, let us consider giving a 90% confidence interval estimate for this difference. The ToolPak does not print a confidence interval directly, but the output from the t-test provides us with the information to construct one. To complete the interval you must evaluate the confidence interval formula yourself, using the "t Critical one-tail" value from the output as t* (1.7709 cuts off 5% in each tail, as a 90% interval requires). You can do this directly in the worksheet as follows:
Difference of the Means (Diet A - Diet B) | -4.7
SE = SQRT(E9/E10 + F9/F10) | 2.376037785
t* | 1.770931704
ME = (t*)(SE) | 4.207800642
lower = Mean Diff - ME | -8.907800642
upper = Mean Diff + ME | -0.492199358
So the 90% interval for the difference of means is: (-8.91, -0.49)
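The worksheet arithmetic above can also be sketched in Python from the summary values alone. The means, variances, and sample sizes are copied from the t-test output, and t* is Excel's "t Critical one-tail" value, taken as given rather than computed; the variable names are illustrative.

```python
import math

# Summary values copied from the t-test output
mean_a, var_a, n_a = 10.0, 10.4444444, 10
mean_b, var_b, n_b = 14.7, 46.0111111, 10

# Assumption: t* is Excel's "t Critical one-tail" value (df = 13),
# which leaves 5% in each tail and so gives a 90% interval.
t_star = 1.7709317

diff = mean_a - mean_b                        # Diet A - Diet B
se = math.sqrt(var_a / n_a + var_b / n_b)     # standard error of the difference
margin = t_star * se                          # margin of error
lower, upper = diff - margin, diff + margin

print(f"SE = {se:.6f}, ME = {margin:.6f}")
print(f"90% CI for (Diet A - Diet B): ({lower:.2f}, {upper:.2f})")
```

Working from summary values makes it easy to try a different confidence level: only t_star needs to change.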
b) Consider Exercise 10.45 in your text. Retrieve the data from the Student Suite CD: the data for the males is in column A and the data for the females is in column B.
Example 10.45
males | females | diffs
76 | 76 | 0
76 | 70 | 6
74 | 82 | -8
70 | 90 | -20
80 | 68 | 12
68 | 60 | 8
90 | 62 | 28
70 | 68 | 2
90 | 80 | 10
72 | 74 | -2
76 | 60 | 16
80 | 62 | 18
68 | 72 | -4
72 | | 72
96 | | 96
80 | | 80
Doing a t-test as above gives the following output:
t-Test: Two-Sample Assuming Unequal Variances
 | males | females
Mean | 77.375 | 71.07692308
Variance | 69.71666667 | 85.07692308
Observations | 16 | 13
Hypothesized Mean Difference | 0
df | 25
t Stat | 1.907486345
P(T<=t) one-tail | 0.034004546
t Critical one-tail | 2.485103323
P(T<=t) two-tail | 0.068009092
t Critical two-tail | 2.787437552
The interval is then computed in the worksheet as before:
Difference of the Means (males - females) | 6.298076923
SE = SQRT(E9/E10 + F9/F10) | 3.301767764
t* | 2.485103323
ME = (t*)(SE) | 8.205234042
lower = Mean Diff - ME | -1.90715712
upper = Mean Diff + ME | 14.50331096
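So the interval is (-1.91, 14.50). Because t* = 2.485 is the one-tail critical value for α = 0.01 with df = 25, this is a 98% interval. What does this imply? (Note that the interval includes 0.)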
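One way to think about the question is to compare the interval with the test statistic. The sketch below recomputes both from the raw data, with unequal sample sizes, and checks that the interval contains 0 exactly when |t| does not exceed t*. The value of t* is copied from the Excel output, and the variable names are illustrative.

```python
import math
import statistics

males = [76, 76, 74, 70, 80, 68, 90, 70, 90, 72, 76, 80, 68, 72, 96, 80]
females = [76, 70, 82, 90, 68, 60, 62, 68, 80, 74, 60, 62, 72]

diff = statistics.mean(males) - statistics.mean(females)
se = math.sqrt(statistics.variance(males) / len(males)
               + statistics.variance(females) / len(females))

# Assumption: t* copied from Excel's "t Critical one-tail" value (df = 25)
t_star = 2.485103323

lower, upper = diff - t_star * se, diff + t_star * se
t_stat = diff / se

# The interval contains 0 exactly when |t| does not exceed t*
contains_zero = lower <= 0 <= upper
print(f"interval = ({lower:.2f}, {upper:.2f}), t = {t_stat:.4f}")
print(f"contains 0: {contains_zero}; |t| <= t*: {abs(t_stat) <= t_star}")
```

An interval that contains 0 means that, at this confidence level, a difference of zero between the two population means cannot be ruled out.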
ASSIGNMENT: Do Exercises 10.60 and 10.62 in your text. Both sets of data are found on the Student Suite CD.
Enrichment Assignment: Do Exercise 10.64 or 10.65. Turn in a typed paper detailing your procedures and results. Include the session commands you used and a printed copy of your output to substantiate your conclusions.