LAB SESSION 16
LINEAR REGRESSION ANALYSIS
INTRODUCTION: In an earlier lab, we looked at bivariate data and used the linear correlation coefficient to see whether there was a relationship between the two variables. We also looked at a method of developing a line of best fit. In this lab we will look at a method of deciding whether the equation of that line is of any use to us in making point predictions and developing confidence intervals.
Before beginning this lab, you should review the commands for performing a regression analysis in Lab 5. Use the data in Exercise 13.33 just to refresh your memory.
Enter x values in column A (independent variable)
Enter y values in column B (dependent variable)
Choose: Tools>Data Analysis>Regression>OK
Enter: Input Y Range: B1:B11 (or select cells)
       Input X Range: A1:A11 (or select cells)
Select: Labels (if you labeled your columns)
        Confidence Level: 95% (or desired level)
Select: Output Range: C3 (or upper left corner of where you want output to appear)
Check the following boxes, depending on desired output:
       Residuals: to obtain predicted values and their residuals
       Residual Plots: scatterplot of residuals against their x values
       Line Fit Plots: scatterplot of y against x
Note: You can make the output look nicer if you choose Format>Column>Autofit

SUMMARY OUTPUT

Regression Statistics
Multiple R        | 0.973362
R Square          | 0.947433
Adjusted R Square | 0.940863
Standard Error    | 10.17383
Observations      | 10

ANOVA
           | df | SS       | MS       | F        | Significance F
Regression | 1  | 14924.45 | 14924.45 | 144.1882 | 2.13E-06
Residual   | 8  | 828.0539 | 103.5067 |          |
Total      | 9  | 15752.5  |          |          |

          | Coefficients | Standard Error | t Stat   | P-value  | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept | -13.4135     | 7.167862       | -1.87134 | 0.098204 | -29.9426  | 3.115613  | -29.9426    | 3.115613
x         | 2.3028       | 0.191775       | 12.00784 | 2.13E-06 | 1.860566  | 2.745034  | 1.860566    | 2.745034

RESIDUAL OUTPUT

Observation | Predicted y | Residuals
1           | 14.22008    | 0.779918
2           | 18.82568    | 6.174318
3           | 23.43128    | 6.568718
4           | 32.64248    | -2.64248
5           | 39.55088    | -9.55088
6           | 92.51528    | -12.5153
7           | 101.7265    | -11.7265
8           | 97.12088    | -2.12088
9           | 101.7265    | 8.273522
10          | 113.2405    | 16.75952

For additional review, do Exercise 3.38 in your text.
There are several steps in doing a linear regression analysis. First, we obtain a least squares estimate for the model equation y = β0 + β1x + ε, using the REGRESSION command as practiced above. Next, we need to check our assumptions about the random error component, ε: the mean value of the experimental error is zero, the distribution of the y's is approximately normal, and the variance σ² of the distribution of random errors is constant. Lastly, we can construct a confidence interval for our predictions.
We will use Exercise 13.59 to demonstrate the procedure. Enter the data into columns B and C.

Median Income | % Immunized
17723         | 43
27005         | 51
33424         | 62
43337         | 66
19226         | 46
29775         | 59
40607         | 65
45496         | 62

Correlation Coefficient: 0.930250135

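For readers who want to check the correlation coefficient outside Excel, the following is a minimal Python sketch, assuming the eight data pairs are typed in exactly as in the table above. The function name pearson_r is an illustrative choice, not part of the lab.

```python
import math

# Data from the Exercise 13.59 table above
income = [17723, 27005, 33424, 43337, 19226, 29775, 40607, 45496]
immunized = [43, 51, 62, 66, 46, 59, 65, 62]


def pearson_r(xs, ys):
    """Linear correlation coefficient r = SS(xy) / sqrt(SS(x) * SS(y))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


r = pearson_r(income, immunized)
print(f"r = {r:.6f}")
```

The printed value should agree with the Excel result shown above to the displayed precision.
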
Line of Best Fit:

SUMMARY OUTPUT

Regression Statistics
Multiple R        | 0.930250135
R Square          | 0.865365314
Adjusted R Square | 0.842926199
Standard Error    | 3.517836207
Observations      | 8

ANOVA
           | df | SS         | MS          | F        | Significance F
Regression | 1  | 477.248971 | 477.2489705 | 38.56504 | 0.000805
Residual   | 6  | 74.2510295 | 12.37517158 |          |
Total      | 7  | 551.5      |             |          |

              | Coefficients | Standard Error | t Stat      | P-value  | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0%
Intercept     | 31.719566    | 4.21814734     | 7.519786161 | 0.000286 | 21.39812  | 42.04101  | 21.39812    | 42.04101
Median Income | 0.000780393  | 0.00012567     | 6.210075577 | 0.000805 | 0.000473  | 0.001088  | 0.000473    | 0.001088

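The entries in the SUMMARY OUTPUT are related to one another, and it can help to see them rebuilt from the raw data. The sketch below is one way to do that in Python, assuming the standard least squares formulas; it is not what Excel runs internally, and all variable names are illustrative.

```python
# Data from the Exercise 13.59 table above
income = [17723, 27005, 33424, 43337, 19226, 29775, 40607, 45496]
immunized = [43, 51, 62, 66, 46, 59, 65, 62]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(immunized) / n

# Least squares estimates: slope = SS(xy) / SS(x), intercept = y_bar - slope * x_bar
ss_x = sum((x - x_bar) ** 2 for x in income)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, immunized))
slope = ss_xy / ss_x
intercept = y_bar - slope * x_bar

# ANOVA decomposition: SS Total = SS Regression + SS Residual
predicted = [intercept + slope * x for x in income]
ss_total = sum((y - y_bar) ** 2 for y in immunized)
ss_resid = sum((y - p) ** 2 for y, p in zip(immunized, predicted))
ss_regr = ss_total - ss_resid

ms_regr = ss_regr / 1          # regression df = 1 (one predictor)
ms_resid = ss_resid / (n - 2)  # residual df = n - 2
f_stat = ms_regr / ms_resid
r_square = ss_regr / ss_total

print(f"slope = {slope:.9f}, intercept = {intercept:.6f}")
print(f"SS regression = {ss_regr:.4f}, SS residual = {ss_resid:.4f}")
print(f"F = {f_stat:.5f}, R Square = {r_square:.6f}")
```

With a single predictor, F equals the square of the slope's t Stat (6.210075577 squared is about 38.565), which is why Significance F and the slope's P-value are the same in the output above.
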
RESIDUAL OUTPUT

Observation | Predicted % Immunized | Residuals
1           | 45.55047742           | -2.5504774
2           | 52.79408854           | -1.7940885
3           | 57.80343348           | 4.19656652
4           | 65.5394728            | 0.4605272
5           | 46.72340863           | -0.7234086
6           | 54.95577813           | 4.04422187
7           | 63.40899894           | 1.59100106
8           | 67.22434205           | -5.2243421

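Two quick numerical checks on the residuals can be sketched in Python. This example assumes the residuals are copied from the RESIDUAL OUTPUT above; it confirms only that they average to zero and that they reproduce the Standard Error in the summary. It does not test normality or constant variance, which are usually judged from the residual plot.

```python
import math

# Residuals copied from the RESIDUAL OUTPUT table above
residuals = [-2.5504774, -1.7940885, 4.19656652, 0.4605272,
             -0.7234086, 4.04422187, 1.59100106, -5.2243421]

n = len(residuals)

# Least squares residuals should average to zero (up to rounding)
mean_residual = sum(residuals) / n

# Standard error of the estimate: sqrt(SS Residual / (n - 2))
ss_resid = sum(e ** 2 for e in residuals)
std_error = math.sqrt(ss_resid / (n - 2))

print(f"mean residual = {mean_residual:.8f}")
print(f"SS residual = {ss_resid:.4f}, standard error = {std_error:.6f}")
```
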
Note the coefficients: the intercept is 31.72 and the slope is 0.00078, so we have the following equation:
Percent = 31.72 + 0.00078(Median Income)
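As an illustration of a point prediction, the rounded equation above can be wrapped in a small Python function. The name predict_percent and the income of 30000 are hypothetical choices for this sketch; 30000 was picked because it lies inside the range of incomes in the data (17723 to 45496).

```python
def predict_percent(median_income):
    """Point prediction from the rounded line of best fit above."""
    return 31.72 + 0.00078 * median_income


# Hypothetical median income, chosen to fall within the observed data range
example_income = 30000
estimate = predict_percent(example_income)
print(f"Predicted % immunized at income {example_income}: {estimate:.2f}")
```

The result is about 55.1 percent. Predictions for incomes far outside the observed range should be treated with caution, since the fitted line describes only the data it was built from.
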
We used the default of 95% for the confidence level. Note that the lower and upper 95% confidence limits appear in the Lower 95% and Upper 95% columns of the coefficients table above (shown in blue in the original output).
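The 95% limits for the slope can be rebuilt by hand as slope ± t × (standard error of the slope). The Python sketch below assumes the critical value t(6, 0.025) = 2.447, read from a standard t table rather than computed, and also shows the usual interval for the mean percent immunized at a hypothetical income of 30000. Variable names are illustrative.

```python
import math

# Data from the Exercise 13.59 table above
income = [17723, 27005, 33424, 43337, 19226, 29775, 40607, 45496]
immunized = [43, 51, 62, 66, 46, 59, 65, 62]

T_CRIT = 2.447  # assumed: t(df = 6, 0.025) from a standard t table

n = len(income)
x_bar = sum(income) / n
y_bar = sum(immunized) / n
ss_x = sum((x - x_bar) ** 2 for x in income)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, immunized))
slope = ss_xy / ss_x
intercept = y_bar - slope * x_bar

# Standard error of the estimate, with n - 2 degrees of freedom
ss_resid = sum((y - (intercept + slope * x)) ** 2
               for x, y in zip(income, immunized))
std_error = math.sqrt(ss_resid / (n - 2))

# 95% confidence interval for the slope
slope_se = std_error / math.sqrt(ss_x)
slope_low = slope - T_CRIT * slope_se
slope_high = slope + T_CRIT * slope_se

# 95% confidence interval for the mean of y at a hypothetical x0
x0 = 30000
y_hat = intercept + slope * x0
margin = T_CRIT * std_error * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ss_x)
mean_low, mean_high = y_hat - margin, y_hat + margin

print(f"slope interval: {slope_low:.6f} to {slope_high:.6f}")
print(f"mean % immunized at {x0}: {mean_low:.2f} to {mean_high:.2f}")
```

The slope interval should match the Lower 95% and Upper 95% entries for Median Income in the output above to the displayed precision; small differences in later digits come from the rounded t value.
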
ASSIGNMENT: Do Exercises 13.58, 13.60, 13.75, and 13.76 in your text.