LAB SESSION 17

ELEMENTS OF NON-PARAMETRIC STATISTICS

INTRODUCTION: All the methods we have studied so far are parametric: they assume that the population has a particular distribution, and they can be applied only when specific criteria are met. Non-parametric statistical methods can be applied when these criteria cannot be met and no assumptions about the parent population (such as normality) can be made, since these techniques do not rely on the distribution of the parent population. Unfortunately, non-parametric methods tend to waste information and are less sensitive than their parametric counterparts. This can usually be compensated for by increasing the sample size. Non-parametric techniques are generally easier to apply and are only slightly less efficient than parametric techniques.

THE SIGN TEST

The Sign test is one of the easiest tests to use, since it reduces the data to plus and minus signs. It can be used in a hypothesis test for a single median or for two dependent samples using paired differences. The basic concept is that because the median is the middle piece of data, with 50% of the data above it (represented by +) and 50% below it (represented by -), P(+) = .5 and P(-) = .5. The method is fairly simple: each data value is replaced by the sign of its difference from the hypothesized median, all zeroes are discarded, and the remaining plus and minus signs are counted. The test statistic is the number of the less frequent sign. This is a binomial random variable (each outcome is either + or -) with a probability of 1/2.

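When the normal approximation is used, z is calculated by the formula

z = (x' - n/2) / [(1/2)√n]

where n is the number of signs counted and x' is the number of the less frequent sign, adjusted by a continuity correction.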
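The formula can be checked outside Excel with a short Python sketch. It uses the counts from the temperature example in this section (3 of the less frequent sign out of 19 nonzero differences). The function name is hypothetical, and the continuity correction x' = x + 0.5 is an assumption about how x' is defined when x is the less frequent count.

```python
import math

def sign_test_z(x, n):
    """Normal-approximation z for the sign test.

    x: number of the less frequent sign
    n: number of nonzero differences (zeroes already discarded)
    Assumption: x' = x + 0.5, a continuity correction toward n/2.
    """
    x_adj = x + 0.5
    return (x_adj - n / 2) / (0.5 * math.sqrt(n))

# Counts from the temperature example: 3 plus signs out of 19 signs.
z = sign_test_z(3, 19)
print(round(z, 2))  # -2.75
```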

We will use the data from Exercise 14.3 as a sample for using Excel to perform a sign test of whether the median high temperature is 48.

a) State the hypotheses:

H0: The median high temperature = 48

Ha: The median high temperature ≠ 48

b) Set test criteria: The following steps compute the sign of the difference between each data value and the hypothesized median, then sort the results so that the number of plusses and minuses can be counted.

First open a new worksheet and place the data in column A (20 entries).

Click on cell B1.

Use the menu to choose the following commands:

Click on the Insert drop-down menu and choose fx > ALL > SIGN > OK

Enter: A1-48 (48 is the hypothesized median temperature)

Drag: The lower right corner of B1 down to B20 to copy the function.

Select columns A and B from row 1 through row 20.

Choose: DATA > SORT

Enter: Sort by: Column B

Select: Ascending

We do not count those values that equal the median, so we have a sample size of 19 and α = .05. From Table 12, we find that the critical value for the less frequent sign is 4. If the number of the less frequent sign is ≤ 4, we must reject H0.

c) Notice we have only 3 temperatures above the stated median and 16 below.

The actual median of the sample is 45.5.

d) Since 3 is less than the critical value of 4, we reject H0 in favor of Ha.
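The decision can be cross-checked with an exact binomial calculation. The sketch below is in Python rather than Excel and uses only the counts reported above (3 plus signs, 16 minus signs), since the raw temperatures are not reproduced here. The function names are hypothetical, and the critical value is taken to be the largest count whose two-tailed probability does not exceed α, which is an assumption about how Table 12 is built.

```python
from math import comb

def sign_test_p_value(x, n):
    """Two-tailed exact p-value: 2 * P(X <= x) for X ~ Binomial(n, 1/2)."""
    tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return min(1.0, 2 * tail)

def critical_value(n, alpha):
    """Largest count c whose two-tailed p-value is <= alpha (None if no such c)."""
    best = None
    for c in range(n // 2 + 1):
        if sign_test_p_value(c, n) <= alpha:
            best = c
    return best

n_plus, n_minus = 3, 16          # counts from step c)
n = n_plus + n_minus             # zeroes are already excluded
x = min(n_plus, n_minus)         # number of the less frequent sign
p = sign_test_p_value(x, n)
crit = critical_value(n, 0.05)
print(x, crit, round(p, 4))  # 3 4 0.0044
```

Under these assumptions the computed critical value agrees with the value 4 read from Table 12, and the p-value is well below .05.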

ASSIGNMENT: Do Exercise 14.10 in your text.

The Sign test can also be used for paired differences with two dependent samples.

Do Exercise 14.11 in the Practice Test in your text.

THE MANN-WHITNEY TEST

This is an alternative to the t-test for two independent random samples in which the random variable is continuous (it is also called the Mann-Whitney-Wilcoxon test or the Wilcoxon rank sum test).

By default, a two-sided test is performed. To do a one-sided test, select the test you want from the Alternative dialogue box.

The test is carried out as follows: First, the two samples are ranked together, with the smallest observation given rank 1, the next smallest given rank 2, and so on. Tied observations each receive the average of the ranks they occupy. Then the sum of the ranks of the first sample is calculated. If the sum is small, it indicates that the observations from the first sample tend to be smaller than those from the second sample; if it is large, they tend to be larger.

The attained significance level of the test is calculated using a normal approximation (with a continuity correction factor).

The following problem demonstrates Illustration 14-6 in your text:

First name column A and enter the grades from test A and test B in it, creating one data list.

Input the source of each data value (A or B) into column B.

Choose: Tools > Data Analysis Plus > Wilcoxon Rank Sum Test > OK

Enter: Variable 1 range: A1:A20

       Variable 2 range: B1:B20

Select: Labels (if necessary)

Choose: Alpha: α (.05)

Ranked Data   Source   Rank
49            A        1
52            A        2
56            A        3
62            B        4
64            A        5
65            A        6
71            B        7
72            B        8
74            B        9
78            A        10.5
78            A        10.5
80            B        12
81            B        13
86            A        14
88            B        15
90            A        17
90            A        17
90            B        17
91            B        19
98            B        20

Sum of A ranks = 86
Sum of B ranks = 124
Total sum of ranks = 210

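Note: the two occurrences of 78 share ranks 10 and 11, so each receives rank 10.5. Likewise, the three occurrences of 90 share ranks 16, 17, and 18, so each receives rank 17.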
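The ranking can be reproduced with a short Python sketch, offered only as a cross-check on the Excel output. The helper name average_ranks is hypothetical; the grades are those listed in the ranked table, split by source.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the end of the current block of tied values.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

test_a = [49, 52, 56, 64, 65, 78, 78, 86, 90, 90]
test_b = [62, 71, 72, 74, 80, 81, 88, 90, 91, 98]

ranks = average_ranks(test_a + test_b)   # rank both samples together
sum_a = sum(ranks[:len(test_a)])
sum_b = sum(ranks[len(test_a):])
print(sum_a, sum_b)  # 86.0 124.0
```

The two sums should always add up to n(n + 1)/2 for n combined observations, here (20)(21)/2 = 210, which is a quick check that no rank was lost.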

Running the test results in an additional sheet containing the following:

                RankSum   SampSize
49        1     210       20
52        1
56        1
62        1
64        1
65        1
71        1
72        1
74        1
78        1
78        1
80        1
81        1
86        1
88        1
90        1
90        1
90        1
91        1
98        1

Do the following calculations in empty cells, where Ra = 86 and Rb = 124 are the rank sums for samples A and B:

Ua = na*nb + (nb)(nb + 1)/2 - Rb = (10)(10) + (10)(10 + 1)/2 - 124 = 31

Ub = na*nb + (na)(na + 1)/2 - Ra = (10)(10) + (10)(10 + 1)/2 - 86 = 69

So U* = 31, the lower value.

The critical region is two-tailed, so we use Table 13A (for α = .05).

In Table 13A, at the intersection of n1 = 10 and n2 = 10, you can see the value 23. The critical region is thus U ≤ 23. Since U* = 31 is not in the critical region, we fail to reject the null hypothesis.
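The U calculation and the decision can be sketched in Python as a check on the cell arithmetic. The function name is hypothetical, the critical value 23 is simply the table value quoted above, and the rule of rejecting when U* is at or below the table value is an assumption about how the table is read.

```python
def mann_whitney_u(n_a, n_b, r_a, r_b):
    """Return (Ua, Ub) from the sample sizes and the rank sums Ra, Rb."""
    u_a = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
    u_b = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
    return u_a, u_b

n_a, n_b = 10, 10
u_a, u_b = mann_whitney_u(n_a, n_b, r_a=86, r_b=124)
u_star = min(u_a, u_b)       # test statistic: the smaller U

critical = 23                # table value for n1 = n2 = 10, two-tailed, alpha = .05
reject = u_star <= critical  # assumed decision rule
print(u_a, u_b, u_star, reject)  # 31.0 69.0 31.0 False
```

Because Ua + Ub always equals na*nb (here 100), a mismatch in that total points to an arithmetic slip in one of the rank sums.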

ASSIGNMENT: Do Exercises 14.23, 14.24, and 14.25 in your text. Be sure to clearly state the hypotheses and test criteria. Ex 14-024 is on the Student Suite CD.

RUNS TEST FOR RANDOMNESS

How do we really know when a set of outcomes is truly random? Counting the number of each outcome is not enough; we must also look at the order in which those outcomes arise, that is, their arrangement. A run is a sequence of consecutive outcomes that have a common property. When that property changes, the current run ends and a new one begins. The random variable to be considered is V, the number of runs. Its critical values are found in Table 14.

Illustration 14-8 is used to demonstrate the EXCEL technique:

a) State your hypotheses:

H0: The numbers are random

Ha: The numbers are not random

b) State criteria: A two-tailed test with α = .05 and critical values 2 and 10 from Table 14.

c) Perform the test: Note: Excel will only compute the differences between the data values and the median. To complete the test, you will need to count the number of runs created by the + and - signs.

First enter the data in column A.

Click on B1.

Enter: =A1-MEDIAN($A$1:$A$10)   (the $ signs keep the median range fixed when the formula is copied)

Drag the lower right corner of B1 down to B10 to copy the formula for each data entry.

To display the signs as well, the SIGN function can be applied to column B in column C.

Count the number of runs. You should get V* = 4 (4 observed runs).

Data   A1-Median(A1:A10)   Sign
2      -1.5                -1
3      -0.5                -1
1      -2.5                -1
1      -2.5                -1
4       0.5                 1
2      -1.5                -1
6       2.5                 1
6       2.5                 1
6       2.5                 1
7       3.5                 1

d) Calculate the P-value by using Table 14 and determine whether it is smaller than α.

e) Conclusion: State your results (reject or fail to reject H0).
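Counting runs by eye is easy to get wrong, so the count can be verified with a short Python sketch using the ten data values from the table in step c). The function name count_runs is hypothetical, and dropping any value equal to the median is an assumption (no such value occurs in this data set).

```python
from statistics import median

def count_runs(data):
    """Return (signs, V): signs relative to the median and the number of runs."""
    m = median(data)
    # +1 above the median, -1 below; values equal to the median are dropped.
    signs = [1 if x > m else -1 for x in data if x != m]
    runs = 1 if signs else 0
    for prev, cur in zip(signs, signs[1:]):
        if cur != prev:      # a sign change ends one run and starts the next
            runs += 1
    return signs, runs

data = [2, 3, 1, 1, 4, 2, 6, 6, 6, 7]
signs, v = count_runs(data)
print(median(data))  # 3.5
print(signs)         # [-1, -1, -1, -1, 1, -1, 1, 1, 1, 1]
print(v)             # 4
```

The four runs are (- - - -), (+), (-), and (+ + + +), matching V* = 4.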

ASSIGNMENT: Do Exercises 14.37-14.39, and 14.41 in your text.

(Exercises 14.38, 14.39, and 14.41 are on the Student Suite CD)