LAB SESSION 3

NUMERICAL PRESENTATION OF UNIVARIATE DATA

INTRODUCTION:  The basic idea of descriptive statistics is to describe a set of data in a variety of abbreviated ways.  In this lab you will investigate measures of central tendency and dispersion.  The box-and-whiskers display, a graphical display of the 5-number summary of a set of data, will also be introduced.

MEASURES OF CENTRAL TENDENCY AND DISPERSION

Measures of central tendency and variation are the foundation of descriptive statistics but most of these formulas are quite tedious to compute, even with a calculator.  Fortunately, we can find a number of commonly used descriptive statistics using just a single command.  Enter the data in Exercise 2.66 into column A.

Get a histogram of your data and visually approximate the "center". Calculate the mean (and median) using the following commands.

Activate a cell for the answer, then continue with:

Choose:  Insert function, fx > Statistical > AVERAGE  (or MEDIAN)> OK

Enter:     Number 1:  (A2:A16 or select cells)

 mean median 6.933333 7

We can also compute the midrange by using the statistical functions MAX and

MIN as follows:

select a cell to hold the result, then click in the formula box and type (selecting

the appropriate statistical function - shown in bold)

.5*(MAX(A1:A15)+MIN(A1:A15))

 midrange 7.5

Visually locate the three calculated centers on the histogram.  Notice the three measures of central tendency are approximately the same.  How well did you visually approximate the center?

Now, place the values of hours of sleep (column A) plus 4 into column B, do a histogram, visually locate the 'center', then determine the mean, median and midrange.

 mean 10.93333 median 11 midrange 11.5 How did the three measures of central tendency (mean, median, and midrange) change?

Next, place the values of column A times 3 into column C, and follow the procedure above.

 mean 20.8 median 21 midrange 22.5 Compare the three measures of central tendency for the columns of data A, B and C.  How and why did a change in the measures occur?  If a different transformation was performed (such as dividing each entry in A by 2) could you make an educated guess about the effect on these three measures?

Consider Exercise 2.61 in the text.  Retrieve the data (App02-02), do a histogram and calculate the mean, median and midrange.  What is there about the distribution of these ten data values that causes these three averages to be so different?

 mean 35400 median 33375 midrange 39750 mode 31500 Compare the standard deviations for each of the previous four examples, along with how similar or how different the three measures of central tendency were.  Can we use the standard deviation to predict whether we expect these three measures of central tendency to be quite similar or quite different?

FREQUENCY DISTRIBUTIONS

When the sample data are in the form of a frequency distribution, we can still use Excel to describe the distribution.  The class marks need to be listed in one column with the corresponding frequencies in another.  Start a new Excel workbook.  (Choose: File > New > Workbook), and enter the following information, where X represents the number of radios in a household and Frequency  is the number of households having X radios:

X             Freq

1              20

2              35

3              100

4              90

5              65

6              40

7              5

Name column A as Radios, and B as Frequency.  Create column C to be x*f and

D to be x2*f as follows:

Activate C2

Enter:  =A2*B2

Drag:    Bottom right corner of C2 down to give other products

Activate D2 and repeat above commands replacing the formula with =A2*C2

Activate the data in columns B, C and D.

Choose:  AutoSum

To find mean, activate E2, then continue with:

Enter:  =(C9/B9)

To find the variance, activate E3, then continue with:

Enter:  =(D9-(C9^2/B9))/(B9-1)

To find the standard deviation, activate E4, the continue with:

Enter:  =SQRT(E3)

*Reminder: in the case of a grouped frequency distribution enter the class marks in one column and the corresponding frequencies in another.

BOX-AND-WHISKER DISPLAY

The boxplot (Excel's name for the box-and-whisker display) is a simple graph that gives a graphic 5-number summary.  Information about the center, dispersion, and skewness of a data set will be illustrated.  Retrieve the data for Exercise 2.18 (EX02-018) and construct a boxplot for each of columns A, B and C (Aaron, Ruth and Maris).

Choose:  Tools > Data Analysis Plus > BoxPlot > OK

Enter:     A2:A24 or select cells > OK

Aaron Smallest = 10

Q1 = 26

Median = 34

Q3 = 44

Largest = 47

IQR = 18

Outliers:

Ruth Smallest = 11

Q1 = 31.5

Median = 46

Q3 = 51.5

Largest = 60

IQR = 20

Outliers:

Maris Smallest = 5

Q1 = 13

Median = 23

Q3 = 33

Largest = 61

IQR = 20

Outliers:

A rectangle is constructed between the two quartiles, with a line across the box indicating the location of the median.  The box encloses the middle half of the data.  The whiskers extend in either direction to indicate the maximum and minimum values.

Although “side-by-side” BoxPlots cannot be constructed in Excel, we can generate BoxPlots with the same scale for better comparison of the distributions.  Activate all of columns A, B and C, then

Choose:  Tools > Data Analysis Plus > Box Plot > OK > OK   Consider again the salary data presented in Application 2 - 2.  Retrieve the data from the Student Suite CD and perform a BoxPlot of the data in column A. The red oval in the boxplot indicates an outlier- a data value that is far removed from the rest of the data.

ASSIGNMENT: