LAB SESSION 16

LINEAR REGRESSION ANALYSIS

INTRODUCTION: In an earlier lab, we looked at bivariate data and used the linear correlation coefficient to see whether there was a relationship between the two variables. We also looked at a method of developing a line of best fit. In this lab we will look at a method of deciding whether the equation of that line is of any use to us in making point predictions and developing confidence intervals.

Before beginning this lab, you should review the commands for performing a regression analysis in Lab 5.  Use the data in Exercise 13.33 just to refresh your memory.

Enter x values in column A  (independent variable)

Enter y values in column B (dependent variable)

Choose: Tools>Data Analysis>Regression>OK

Enter: Input Y range: B1:B11 (or select cells)

Input X range: A1:A11 (or select cells)

Select: Labels  (if you labeled your columns)

Confidence level:   95% (or desired level)

Select: Output Range: C3 (or the upper left corner of where you want the output to appear)

Check the following boxes, depending on desired output:

Residuals: to obtain predicted values   and their residuals

Residual Plots: scatterplot of residuals against their x values

Line Fit Plots: scatterplot of y against x

Note: You can make the output nicer looking if you choose Format>Column>Autofit

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.973362
R Square             0.947433
Adjusted R Square    0.940863
Standard Error       10.17383
Observations         10

ANOVA
              df    SS          MS          F           Significance F
Regression    1     14924.45    14924.45    144.1882    2.13E-06
Residual      8     828.0539    103.5067
Total         9     15752.5

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept     -13.4135        7.167862          -1.87134    0.098204    -29.9426     3.115613     -29.9426       3.115613
x             2.3028          0.191775          12.00784    2.13E-06    1.860566     2.745034     1.860566       2.745034

RESIDUAL OUTPUT

Observation    Predicted y    Residuals
1              14.22008       0.779918
2              18.82568       6.174318
3              23.43128       6.568718
4              32.64248       -2.64248
5              39.55088       -9.55088
6              92.51528       -12.5153
7              101.7265       -11.7265
8              97.12088       -2.12088
9              101.7265       8.273522
10             113.2405       16.75952
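The entries in the SUMMARY OUTPUT are tied together by a few simple formulas, and it can help to see them recomputed. The sketch below is written in Python rather than Excel, which is our own choice for illustration and not part of the lab. The numbers are copied from the practice output above, and the variable names are made up for this example.

```python
import math

# Values copied from the practice SUMMARY OUTPUT above.
n = 10
ss_regression, ss_residual, ss_total = 14924.45, 828.0539, 15752.5
df_regression, df_residual = 1, 8
slope_t_stat = 12.00784

# Each mean square is a sum of squares divided by its degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

r_square = ss_regression / ss_total                      # R Square
multiple_r = math.sqrt(r_square)                         # Multiple R
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual  # Adjusted R Square
std_error = math.sqrt(ms_residual)                       # Standard Error
f_stat = ms_regression / ms_residual                     # F

print("R Square:         ", round(r_square, 6))
print("Multiple R:       ", round(multiple_r, 6))
print("Adjusted R Square:", round(adj_r_square, 6))
print("Standard Error:   ", round(std_error, 5))
print("F:                ", round(f_stat, 4))
# With one x variable, the square of the slope's t Stat equals F.
print("t Stat squared:   ", round(slope_t_stat ** 2, 4))
```

The printed values should agree with the output above up to rounding in the copied figures.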

There are several steps in doing a linear regression analysis. First, we obtain a least squares estimate for the model equation y = β0 + β1x + ε, using the REGRESSION command as practiced above. Next, we need to check our assumptions about the random error component, ε: the mean value of the experimental error is zero, the distribution of the y's is approximately normal, and the variance σ² of the distribution of random errors is constant. Lastly, we can construct a confidence interval for our predictions.
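As a rough numerical companion to the Residual Plots option, the sketch below looks at the residuals from the practice output above. It is written in Python, which is our own choice for illustration, and the variable names are made up. It is an informal look only, not a formal test. Note also that a least squares line with an intercept always makes the residuals sum to zero, so the first check confirms the arithmetic rather than the assumption itself.

```python
# Predicted values and residuals copied from the practice RESIDUAL OUTPUT above.
predicted = [14.22008, 18.82568, 23.43128, 32.64248, 39.55088,
             92.51528, 101.7265, 97.12088, 101.7265, 113.2405]
residuals = [0.779918, 6.174318, 6.568718, -2.64248, -9.55088,
             -12.5153, -11.7265, -2.12088, 8.273522, 16.75952]

# The average residual should be zero apart from rounding.
mean_residual = sum(residuals) / len(residuals)

# Order the residuals by predicted value, then compare their typical size
# in the lower and upper halves. Very different sizes would be a hint
# that the spread of the errors may not be constant.
ordered = [r for _, r in sorted(zip(predicted, residuals))]
half = len(ordered) // 2
lower_spread = sum(abs(r) for r in ordered[:half]) / half
upper_spread = sum(abs(r) for r in ordered[half:]) / (len(ordered) - half)

print("Mean residual:                ", round(mean_residual, 6))
print("Mean |residual|, lower half:  ", round(lower_spread, 3))
print("Mean |residual|, upper half:  ", round(upper_spread, 3))
```

For the practice data the upper half comes out at roughly twice the lower half. With only ten points this is a reason to look at the residual plot, not a conclusion.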

We will use Exercise 13.59 to demonstrate the procedure.  Enter the data into columns B and C.

Median Income    % Immunized
17723            43
27005            51
33424            62
43337            66
19226            46
29775            59
40607            65
45496            62

Correlation Coefficient: 0.930250135
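For readers who want to see where this number comes from, here is a minimal sketch of the linear correlation coefficient computed directly from the data. It is written in Python rather than Excel, which is our own choice for illustration. The function name pearson_r is made up for this example.

```python
import math

# Data from the table above.
income = [17723, 27005, 33424, 43337, 19226, 29775, 40607, 45496]
immunized = [43, 51, 62, 66, 46, 59, 65, 62]

def pearson_r(x, y):
    """Linear correlation coefficient r = SS(xy) / sqrt(SS(x) * SS(y))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    ss_yy = sum((b - mean_y) ** 2 for b in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

r = pearson_r(income, immunized)
print("Correlation coefficient:", round(r, 6))
```

The result should agree with the value reported above.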

Line of Best Fit:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.930250135
R Square             0.865365314
Adjusted R Square    0.842926199
Standard Error       3.517836207
Observations         8

ANOVA
              df    SS            MS             F           Significance F
Regression    1     477.248971    477.2489705    38.56504    0.000805
Residual      6     74.2510295    12.37517158
Total         7     551.5

                 Coefficients    Standard Error    t Stat         P-value     Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
Intercept        31.719566       4.21814734        7.519786161    0.000286    21.39812     42.04101     21.39812       42.04101
Median Income    0.000780393     0.00012567        6.210075577    0.000805    0.000473     0.001088     0.000473       0.001088

RESIDUAL OUTPUT

Observation    Predicted % Immunized    Residuals
1              45.55047742              -2.5504774
2              52.79408854              -1.7940885
3              57.80343348              4.19656652
4              65.5394728               0.4605272
5              46.72340863              -0.7234086
6              54.95577813              4.04422187
7              63.40899894              1.59100106
8              67.22434205              -5.2243421
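The sketch below recomputes the main pieces of this output directly from the data, to show what the REGRESSION command is doing. It is written in Python, which is our own choice for illustration, and the function and variable names are made up. It uses the usual least squares formulas for a single x variable.

```python
import math

# Data for Exercise 13.59 as entered above.
income = [17723, 27005, 33424, 43337, 19226, 29775, 40607, 45496]
immunized = [43, 51, 62, 66, 46, 59, 65, 62]

def least_squares(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    ss_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_xx = sum((a - mean_x) ** 2 for a in x)
    slope = ss_xy / ss_xx
    return mean_y - slope * mean_x, slope

b0, b1 = least_squares(income, immunized)

# Predicted values and residuals (observed minus predicted).
predicted = [b0 + b1 * x for x in income]
residuals = [y - p for y, p in zip(immunized, predicted)]

# ANOVA quantities.
n = len(income)
mean_y = sum(immunized) / n
ss_total = sum((y - mean_y) ** 2 for y in immunized)
ss_residual = sum(e ** 2 for e in residuals)
ss_regression = ss_total - ss_residual

r_square = ss_regression / ss_total
std_error = math.sqrt(ss_residual / (n - 2))        # n - 2 degrees of freedom
f_stat = ss_regression / (ss_residual / (n - 2))

print("Intercept:     ", round(b0, 6))
print("Slope:         ", round(b1, 9))
print("R Square:      ", round(r_square, 6))
print("Standard Error:", round(std_error, 6))
print("F:             ", round(f_stat, 4))
```

The printed values should agree with the SUMMARY OUTPUT above, and the residuals list should match the RESIDUAL OUTPUT.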

Note the coefficients: the intercept is 31.72 and the slope is 0.00078, so we have the following equation:

Percent = 31.72 + 0.00078(Median Income)
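As a small illustration of a point prediction with this equation, the Python sketch below plugs in a median income of 30000. That value is hypothetical, chosen only because it lies inside the range of incomes in the data (17723 to 45496). The function name is made up for this example.

```python
def predict_percent_immunized(median_income, intercept=31.72, slope=0.00078):
    """Point prediction from the rounded line of best fit above."""
    return intercept + slope * median_income

# Hypothetical median income, inside the observed range of the data.
example = predict_percent_immunized(30000)
print("Predicted % immunized:", round(example, 2))
```

This gives about 55.12 percent. Because the coefficients are rounded, the result differs slightly from what the unrounded coefficients in the output would give. Predictions for incomes well outside the observed range should be treated with caution.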

We used the default of 95% for the confidence level. Note that the lower and upper 95% confidence limits for the intercept and the slope appear in the Lower 95% and Upper 95% columns of the output above.
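These limits can be reproduced by hand as estimate ± t × (standard error), with n - 2 = 6 degrees of freedom. The Python sketch below does this. It assumes the critical value t = 2.447 read from a standard t-table (6 degrees of freedom, 0.025 in each tail), and the function name is made up for this example.

```python
def confidence_interval(estimate, std_error, t_critical):
    """Return (lower, upper) limits: estimate +/- t_critical * std_error."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin

# Assumed critical value from a t-table: df = 8 - 2 = 6, 95% confidence.
T_CRITICAL = 2.447

# Coefficients and standard errors copied from the output above.
slope_low, slope_high = confidence_interval(0.000780393, 0.00012567, T_CRITICAL)
int_low, int_high = confidence_interval(31.719566, 4.21814734, T_CRITICAL)

print("Slope:    ", round(slope_low, 6), "to", round(slope_high, 6))
print("Intercept:", round(int_low, 2), "to", round(int_high, 2))
```

Because the table value of t is rounded to three decimals, the results agree with the Excel limits only to about three significant figures.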

ASSIGNMENT: Do Exercises 13.58, 13.60, 13.75, 13.76 in your text.