LAB SESSION 13

INFERENCES INVOLVING TWO POPULATIONS

INTRODUCTION: When comparing two populations we need two samples, one from each population.  Two kinds of samples can be used: dependent or independent, determined by the source of the data.  The methods of comparison are quite different.

CASE 1. DEPENDENT SAMPLES (PAIRED DATA): Two data values, one from each set, that come from the same source are called paired data.  They are compared by using the difference in their values, called the paired difference, d = x1 - x2.  Because the paired differences will be approximately normally distributed when the paired observations are randomly selected from normal populations, we will use the t-test.  We wish to make inferences about µd, where the random variable d has an approximately normal distribution with an unknown standard deviation (σd).

Confidence Interval

Consider the data presented in Exercise 10.20 of your text.  Use Excel to generate the 95% confidence interval for the mean improvement in memory resulting from taking the memory course.  (The Difference column below is computed as d = before - after, so negative values indicate improvement.)

Retrieve the data file for ex10-020 from the Student Suite CD or enter it yourself in columns A and B.

Form the paired difference and put it in Column C.

Before   After   Difference
93       98      -5
86       92      -6
72       80      -8
54       62      -8
92       91      1
65       78      -13
80       89      -9
81       78      3
62       71      -9
73       80      -7

To generate the interval:

Click into any empty cell.

Choose: Tools > Data Analysis Plus > t-Estimate: Mean

Enter: Input range: C2:C11

Select: Labels (if necessary)

Enter: Alpha: α (0.05 for a 95% interval)

The output follows in a separate sheet:

t-Estimate: Mean
                      Difference
Mean                  -6.1
Standard Deviation    4.7947
LCL                   -9.52989
UCL                   -2.67011
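The interval above can be checked outside Excel. The following is a minimal Python sketch, not part of the Excel procedure, that uses only the standard library. It assumes the differences are taken as Before - After (as in the Difference column), and the critical value T_STAR is hard-coded from a t table (df = 9, two-tailed α = 0.05) rather than computed.

```python
import math
import statistics

# Data from the Before and After columns of the table above.
before = [93, 86, 72, 54, 92, 65, 80, 81, 62, 73]
after = [98, 92, 80, 62, 91, 78, 89, 78, 71, 80]

# Paired differences as tabulated in the Difference column (Before - After).
d = [b - a for b, a in zip(before, after)]

n = len(d)
mean_d = statistics.mean(d)
sd_d = statistics.stdev(d)       # sample standard deviation (n - 1 divisor)
se = sd_d / math.sqrt(n)         # standard error of the mean difference

# Assumption: critical value from a t table, df = n - 1 = 9, two-tailed alpha = 0.05.
T_STAR = 2.262158887

lcl = mean_d - T_STAR * se       # lower confidence limit
ucl = mean_d + T_STAR * se       # upper confidence limit
print(f"mean = {mean_d:.1f}, sd = {sd_d:.4f}, 95% CI = ({lcl:.2f}, {ucl:.2f})")
```

The limits should agree with LCL and UCL in the Excel output to about four decimal places; small discrepancies come from rounding of the critical value.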

Hypothesis Testing

To demonstrate the procedure for a hypothesis test on mean difference we will do Exercise 10.28.

Enter the data for Before in column A and for After in column B, or retrieve it from the Student Suite CD (ex10-028), and calculate the paired differences.

Exercise 10.28

Before   After   Paired Diff
29       30      1
22       26      4
25       25      0
29       35      6
26       33      7
24       36      12
31       32      1
46       54      8
34       50      16
28       43      15

Then perform a t-test on the paired differences  (After – Before).

Choose: Tools > Data Analysis > t-Test: Paired Two Sample for Means

Enter: Variable 1 Range: B4:B14

Enter: Variable 2 Range: A4:A14

Select: Labels

Enter: Alpha: α (for example, 0.05)

Select: Output Range

Enter:   A15 (or any empty cell)

Click:  OK

The results you get look like this:

t-Test: Paired Two Sample for Means
                                After          Before
Mean                            36.4           29.4
Variance                        94.48888889    46.26666667
Observations                    10             10
Pearson Correlation             0.810662928
Hypothesized Mean Difference    0
df                              9
t Stat                          3.821341258
P(T<=t) one-tail                0.002040758
t Critical one-tail             1.833113856
P(T<=t) two-tail                0.004081516
t Critical two-tail             2.262158887

Note: t statistic = 3.82 and the p-value = 0.0041.  How would you interpret these results?
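As a cross-check on the t Stat line of the output, the paired t statistic can be recomputed by hand. This is a minimal Python sketch using only the standard library; it assumes the differences are After - Before and a hypothesized mean difference of 0, matching the settings above. It does not compute the p-value.

```python
import math
import statistics

# Exercise 10.28 data as entered in columns A (Before) and B (After).
before = [29, 22, 25, 29, 26, 24, 31, 46, 34, 28]
after = [30, 26, 25, 35, 33, 36, 32, 54, 50, 43]

# Paired differences, After - Before.
d = [a - b for a, b in zip(after, before)]

n = len(d)
mean_d = statistics.mean(d)
se = statistics.stdev(d) / math.sqrt(n)   # standard error of the mean difference

HYPOTHESIZED_DIFF = 0                     # same value as in the Excel dialog
t_stat = (mean_d - HYPOTHESIZED_DIFF) / se
df = n - 1

print(f"mean difference = {mean_d}, t = {t_stat:.4f}, df = {df}")
```

The computed t statistic should match the t Stat value reported by Excel.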

ASSIGNMENT: Do Exercises 10.21, 10.23, 10.24, and 10.30 in your text.

CASE 2. INDEPENDENT SAMPLES:  If two samples are selected, one from each of the populations, the two samples are independent if the selection of objects from one population is unrelated to the selection of objects from the other population.  Because the standard error is estimated from the sample standard deviations, the test statistic follows a t distribution, and the degrees of freedom will be calculated by Excel.

a) Complete the hypothesis test presented in Exercise 10.61 of your text.

Retrieve the data from the Student Suite CD and note that the data for Diet A is in Column A and Diet B is in Column B.

DietA   DietB
5       5
14      21
7       16
9       23
11      4
7       16
13      13
14      19
12      9
8       21

Perform a t-test as follows:

Choose: Tools > Data Analysis > t-Test: Two-Sample Assuming Unequal Variances

Enter: Variable 1 Range: A1:A11

Enter: Variable 2 Range: B1:B11

Hypothesized Difference: 0.0

Select: Labels

Enter: Alpha: α (0.05 in this example)

Select: Output Range

Enter:   A15 (or any empty cell)

Click:  OK

We then get the following output:

t-Test: Two-Sample Assuming Unequal Variances
                                DietA         DietB
Mean                            10            14.7
Variance                        10.4444444    46.0111111
Observations                    10            10
Hypothesized Mean Difference    0
df                              13
t Stat                          -1.978083
P(T<=t) one-tail                0.03475571
t Critical one-tail             1.7709317
P(T<=t) two-tail                0.06951142
t Critical two-tail             2.16036824
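To see where the t Stat and df lines come from, the unequal-variances (Welch) calculation can be sketched in Python with the standard library. The helper name welch is hypothetical, introduced only for this illustration. The difference is taken as Diet A - Diet B, matching the column order of the output.

```python
import math
import statistics

diet_a = [5, 14, 7, 9, 11, 7, 13, 14, 12, 8]
diet_b = [5, 21, 16, 23, 4, 16, 13, 19, 9, 21]


def welch(x, y):
    """Return (t statistic, unrounded Welch df) for mean(x) - mean(y)."""
    vx = statistics.variance(x) / len(x)   # s_x^2 / n_x
    vy = statistics.variance(y) / len(y)   # s_y^2 / n_y
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, df = welch(diet_a, diet_b)
print(f"t = {t_stat:.6f}, df = {df:.3f} (rounded: {round(df)})")
```

The approximate df is not a whole number; rounding it to the nearest integer gives the df value shown in the Excel output.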

Do the data justify the conclusion that the mean weight gained on diet B was greater than the mean weight gained on diet A, at the α = 0.05 level of significance?

Now that we have concluded that there is a difference, let us construct a 90% confidence interval estimate for this difference.  The ToolPak does not print a confidence interval directly, but the output from the t-test provides the information needed to construct one.  For a 90% interval, t* is the "t Critical one-tail" value from a test run at α = 0.05.  You can compute the interval directly in the worksheet as follows:

Difference of the Means (Diet A - Diet B)    -4.7
SE = SQRT(E9/E10 + F9/F10)                   2.376037785
t*                                           1.770931704
ME = (t*)(SE)                                4.207800642
lower = Mean Diff - ME                       -8.907800642
upper = Mean Diff + ME                       -0.492199358

So the 90% interval for the difference of means is: (-8.91, -0.49)
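The same arithmetic can be sketched in Python from the summary values in the t-test output. This is only an illustration of the worksheet formulas; the means, variances, sample sizes, and t* are copied from the Excel output rather than recomputed.

```python
import math

# Summary values copied from the Excel t-test output.
mean_a, mean_b = 10.0, 14.7
var_a, var_b = 10.4444444, 46.0111111
n_a, n_b = 10, 10
T_STAR = 1.770931704          # "t Critical one-tail" at alpha = 0.05, df = 13

diff = mean_a - mean_b                      # Diet A - Diet B
se = math.sqrt(var_a / n_a + var_b / n_b)   # standard error of the difference
me = T_STAR * se                            # margin of error
lower, upper = diff - me, diff + me

print(f"SE = {se:.4f}, ME = {me:.4f}, 90% CI = ({lower:.2f}, {upper:.2f})")
```

Using the one-tail critical value for α = 0.05 leaves 5% in each tail, which is what a 90% two-sided interval requires.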

b) Consider Exercise 10.45 in your text.  Retrieve the data from the Student Suite CD: the data for the males are in Column A and the data for the females are in Column B.

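Example 10.45

males   females   diffs
76      76        0
76      70        6
74      82        -8
70      90        -20
80      68        12
68      60        8
90      62        28
70      68        2
90      80        10
72      74        -2
76      60        16
80      62        18
68      72        -4
72                72
96                96
80                80

Note: There are 16 males and 13 females, so the last three rows have no female value.  Because the samples are independent, the diffs column is not used in the two-sample t-test.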

Doing a t-test as above gives the following output:

Example 10.45

t-Test: Two-Sample Assuming Unequal Variances
                                males          females
Mean                            77.375         71.07692308
Variance                        69.71666667    85.07692308
Observations                    16             13
Hypothesized Mean Difference    0
df                              25
t Stat                          1.907486345
P(T<=t) one-tail                0.034004546
t Critical one-tail             2.485103323
P(T<=t) two-tail                0.068009092
t Critical two-tail             2.787437552

Difference of the Means (males - females)    6.298076923
SE = SQRT(E9/E10 + F9/F10)                   3.301767764
t*                                           2.485103323
ME = (t*)(SE)                                8.205234042
lower = Mean Diff - ME                       -1.90715712
upper = Mean Diff + ME                       14.50331096

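So the interval is (-1.91, 14.50).  Because t* = 2.485 is the one-tail critical value for α = 0.01 with df = 25, this is a 98% confidence interval.  What does this imply?  (Note that the interval includes 0.)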

ASSIGNMENT: Do Exercises 10.60 and 10.62 in your text.  Both sets of data are found on the Student Suite CD.

Enrichment Assignment: Do Exercise 10.64 or 10.65.  Turn in a typed paper detailing your procedures and results.  Include the session commands you used and a printed copy of your output to substantiate your conclusions.