LAB SESSION 7

             NORMAL APPROXIMATION OF THE BINOMIAL

 

INTRODUCTION:   The normal distribution is one of the most important distributions in statistics.  We will now see how binomial probabilities can be reasonably estimated by using the normal probability distribution.  Later we will need to determine whether normality is a reasonable assumption.  We will start our investigation with a few specific binomial distributions.

 

Step 1: Entering the data.   For this demonstration we will use columns A, D, and G to hold a series of numbers.  The corresponding probabilities will be placed into columns B, E, and H.   Enter the numbers 0, 1, 2, 3, and 4 into column A (cells A1:A5).  Similarly, enter 0, 1, ..., 8 into column D (cells D1:D9) and 0, 1, 2, ..., 24 into column G (cells G1:G25).  These three columns will be used for three specific situations: n = 4, n = 8, and n = 24.

 

Step 2: Calculating and Storing the Probabilities.   We will now place the binomial probabilities for the values in column A into column B using BINOMDIST with n = 4 and p = .5.

 

Reminder on how to do this from Lab #6:  Activate cell B1 and continue with:

    Choose:  Insert function, fx > Statistical > BINOMDIST > OK

     Enter:     Number_s:       A1 (or click cell A1)

                   Trials:              4

                   Probability_s:  .5

                   Cumulative:    false > OK

    Drag:      fill handle down to B5 to give the other probabilities
 

Place the binomial probabilities for the values in column D into column E, and for column G into column H, being sure to use n = 8 and n = 24, respectively (keep p = .5).
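For readers who want to check the spreadsheet outside Excel, here is a minimal Python sketch of Steps 1 and 2. It assumes that BINOMDIST(x, n, p, FALSE) is the ordinary binomial probability C(n, x) p^x (1 - p)^(n - x); the names binom_pmf and tables are illustrative and are not part of the lab.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Binomial(n, p), the quantity BINOMDIST(x, n, p, FALSE) returns."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Columns A/B, D/E and G/H from Steps 1 and 2: (x, P(X = x)) pairs for n = 4, 8, 24.
tables = {n: [(x, binom_pmf(x, n, 0.5)) for x in range(n + 1)] for n in (4, 8, 24)}

# Print the n = 4 table (columns A and B).
for x, prob in tables[4]:
    print(x, round(prob, 6))
```

The n = 4 table should match column B: .0625, .25, .375, .25, .0625.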

 


Step 3: Plotting the Probabilities.   Now we will plot the probabilities of x = 0, 1, ..., n, starting with n = 4, using the Chart Wizard and the same procedure as for earlier charts:

            Select:  the probabilities to be charted (B1:B5)

            Choose:  Chart Wizard > 1st picture > Next

            Select:   the Series tab

            Activate:  Category (x) axis labels:  select appropriate cells (A1:A5) > Next

            Enter:  appropriate titles > Next > Finish

            The chart can then be modified to remove the gaps between the bars (for example, by setting the gap width to 0 under Format Data Series).
 

Repeat this procedure for plotting E versus D and H versus G.  What can we say about the distribution as n becomes larger?


 

Step 4: Interpreting the results.

Let's see how the normal distribution approximates a binomial with p = .5 and n = 8.  The approximating normal distribution has the same mean and standard deviation as the binomial: mu = np = 8(.5) = 4 and sigma = sqrt(np(1 - p)) = sqrt((8)(.5)(.5)) = 1.414.

 

First, we need to place the height of the normal curve (the normal density) at each x in column D into another column, say column F.

Activate cell F1 and enter: =NORMDIST(D1, 4, SQRT(8*.5*.5), FALSE)

Click and drag:  fill handle down to F9 to generate the normal curve height for each x
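The same columns D, E, and F for n = 8 can be sketched in Python as a cross-check. This assumes that NORMDIST(x, mean, sd, FALSE) returns the height of the normal density curve, which is what Python's statistics.NormalDist(...).pdf computes; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 8, 0.5
mu = n * p                      # mean = 4
sigma = sqrt(n * p * (1 - p))   # standard deviation, about 1.414
normal = NormalDist(mu, sigma)

# One row per x = 0..8: (x, binomial probability, normal curve height),
# mirroring columns D, E and F of the spreadsheet.
rows = []
for x in range(n + 1):
    binom = comb(n, x) * p**x * (1 - p) ** (n - x)   # like BINOMDIST(x, 8, .5, FALSE)
    height = normal.pdf(x)                            # like NORMDIST(x, 4, SQRT(8*.5*.5), FALSE)
    rows.append((x, binom, height))
    print(f"{x}  {binom:.6f}  {height:.6f}")
```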

 

To draw the normal curve together with the binomial probabilities, select cells D1 through F9, then continue with:

            Choose:  Chart Wizard > XY (Scatter) > 2nd picture > Next

            Select:  the Series tab and enter binomial for the name of series 1, then click on series 2 and enter the name normal > Next

            Select:  the Titles tab and enter appropriate titles > Next > Finish
 

This chart shows the binomial probabilities and the normal approximation on the same axes.  It will help us see why we can approximate a binomial by a normal, and how to do the appropriate calculations.
 

You should visualize the histogram corresponding to the binomial probabilities.  The height of a bar is the probability the binomial variable is equal to the corresponding value.  For example, the height of the bar centered at 5 is the probability the binomial is equal to 5.  The base of a bar is 1 unit wide.  Therefore, the area of a bar is equal to its height, and is thus equal to the corresponding probability.

 

Also visualize the normal curve drawn over these bars.

 

Here are some calculations that will help the explanation.  Suppose we want the probability that the binomial variable has a value from 5 to 7.  This probability is the sum of the probabilities at 5, 6, and 7.  (Look in rows 6, 7, and 8 of column E; the sum is 0.359375.)  The area under the normal curve from 4.5 to 7.5 approximates the total area of these three binomial bars.  How could we determine this area?
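One way to find this area is with the cumulative normal distribution: the area from 4.5 to 7.5 is the area to the left of 7.5 minus the area to the left of 4.5. In Excel that would be NORMDIST(7.5, 4, SQRT(2), TRUE) - NORMDIST(4.5, 4, SQRT(2), TRUE). The Python sketch below is an illustration, not part of the lab's Excel steps; it computes that area alongside the exact binomial sum and the sum of the normal heights at 5, 6, and 7. The area comes out near 0.355, slightly different from the sum of heights (0.353205), because a height only approximates the area of its 1-unit strip.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 8, 0.5
normal = NormalDist(n * p, sqrt(n * p * (1 - p)))   # mean 4, standard deviation sqrt(2)

# Exact binomial probability: add the bars at 5, 6 and 7 (cells E6:E8).
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in (5, 6, 7))

# Sum of the normal curve heights at 5, 6 and 7 (cells F6:F8).
heights = sum(normal.pdf(k) for k in (5, 6, 7))

# Area under the normal curve from 4.5 to 7.5, the outer edges of the three bars.
area = normal.cdf(7.5) - normal.cdf(4.5)

print(f"exact binomial:      {exact:.6f}")
print(f"sum of heights:      {heights:.6f}")
print(f"normal area 4.5-7.5: {area:.6f}")
```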

 

 

The probability that the binomial variable has a value from 5 to 7 is .359375.  Adding the normal curve heights in cells F6:F8 (at x = 5, 6, and 7) gives .353205; because each bar is 1 unit wide, this sum approximates the normal area from 4.5 to 7.5, and it is very close to the true probability.  If we were to use a normal approximation for a binomial with p = .5 and n = 24 (as in columns G and H), the approximation would be even better.  In the exercises, we'll look at other values of p.
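To back up the claim that the fit improves as n grows, the following Python sketch measures, for p = .5 and n = 4, 8, and 24, the largest gap between a binomial probability and the normal curve height at the same x. The helper max_gap is hypothetical, and the largest gap is only one of several reasonable ways to measure the fit.

```python
from math import comb, sqrt
from statistics import NormalDist

def max_gap(n: int, p: float = 0.5) -> float:
    """Largest |binomial probability - normal height| over x = 0..n (illustrative helper)."""
    normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return max(
        abs(comb(n, x) * p**x * (1 - p) ** (n - x) - normal.pdf(x))
        for x in range(n + 1)
    )

# Compare the three cases used in the lab (columns A/B, D/E, G/H).
gaps = {n: max_gap(n) for n in (4, 8, 24)}
for n, gap in gaps.items():
    print(f"n = {n:2d}: largest gap = {gap:.5f}")
```

The gaps shrink from roughly .024 at n = 4 to under .002 at n = 24.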

 

 

 

ASSIGNMENT:

1.   (a)  Make plots as in the first part of the lab, but use p = .4 instead of p = .5.  Use n = 4, 8, and 24.

      (b)  Repeat part (a) using p = .2.

      (c)  What can you say about the normal approximation to the binomial?  For what values of n and p does it seem to work best?

 

2.   Suppose X has a binomial distribution with p = .8 and n = 25. Use Excel to calculate each of the probabilities below exactly.  Also compute the normal approximation to these probabilities.  Compare the binomial results with the normal approximations.

      (a)  P(X = 21)       

      (b)  P(X < 21)

      (c)  P(X > 24)

      (d)  P(21 < X < 24)

 

3. Do Exercises 6.83 and 6.85 in your text.
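A reusable Python cross-check for exercises like 2 can be sketched as follows. It assumes the same convention as the lab: the bar for value k runs from k - 0.5 to k + 0.5 (often called the continuity correction), so strict inequalities must first be rewritten with whole-number bounds. The function names are illustrative, and the example uses the lab's n = 8, p = .5 case rather than the exercise values.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_between(n: int, p: float, lo: int, hi: int) -> float:
    """Exact P(lo <= X <= hi) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(lo, hi + 1))

def normal_between(n: int, p: float, lo: int, hi: int) -> float:
    """Normal approximation to P(lo <= X <= hi), using the bar edges lo - 0.5 and hi + 0.5."""
    normal = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return normal.cdf(hi + 0.5) - normal.cdf(lo - 0.5)

# Rewrite strict inequalities with whole-number bounds before calling the helpers:
#   P(X = c)     = P(c <= X <= c)
#   P(X < c)     = P(0 <= X <= c - 1)
#   P(X > c)     = P(c + 1 <= X <= n)
#   P(a < X < b) = P(a + 1 <= X <= b - 1)
# Example with the lab's n = 8, p = .5 case, P(5 <= X <= 7):
print(binom_between(8, 0.5, 5, 7), normal_between(8, 0.5, 5, 7))
```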