LAB SESSION 17
ELEMENTS OF NON-PARAMETRIC STATISTICS
INTRODUCTION: All the methods we have studied so far are parametric statistics: they are based on a population that has a certain distribution and can be applied only when specific criteria are met. Non-parametric statistical methods can be applied when these criteria cannot be met and no assumptions about the parent population (such as normality) can be made, because these techniques do not rely on the distribution of the parent population. Unfortunately, non-parametric methods tend to waste information and are less sensitive than their parametric counterparts. This can, however, be compensated for by increasing the sample size. Non-parametric techniques are generally easier to apply and are only slightly less efficient than parametric techniques.
THE SIGN TEST
The sign test is one of the easiest tests to use, since it reduces the data to plus and minus signs. It can be used in a hypothesis test for a single median, or for two dependent samples using paired differences. The basic idea is that the median is the middle value of the data, with 50% of the data above it (represented by +) and 50% below it (represented by -), so P(+) = .5 and P(-) = .5.
The method is fairly simple: all zeros (values equal to the hypothesized median) are discarded, and each remaining value is assigned a plus or minus sign. The test statistic is the number of the less frequent sign. This is a binomial random variable (each outcome is either + or -) with probability p = 1/2.
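The counting step can be sketched in a few lines of Python. This is a minimal sketch, assuming the data are in a plain list and the hypothesized median is known; the function name `sign_test` is hypothetical, not part of Excel or any library. Besides the counts, it computes an exact two-tailed p-value from the binomial distribution with p = 1/2, which the lab itself replaces with a table lookup.

```python
from math import comb


def sign_test(data, median):
    """One-sample sign test. Returns (n_plus, n_minus, n, x, two-tailed p-value)."""
    # Values equal to the hypothesized median ("zeros") are discarded.
    n_plus = sum(1 for v in data if v > median)
    n_minus = sum(1 for v in data if v < median)
    n = n_plus + n_minus
    # Test statistic: the count of the less frequent sign.
    x = min(n_plus, n_minus)
    # Under H0 each sign count is binomial(n, 1/2): P(X <= x) = sum C(n, k) / 2^n.
    tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return n_plus, n_minus, n, x, min(1.0, 2 * tail)


# Hypothetical data, for illustration only.
print(sign_test([50, 41, 48, 44, 39], 48))  # (1, 3, 4, 1, 0.625)
```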
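z is calculated by the formula

$$z = \frac{x' - \frac{n}{2}}{\frac{1}{2}\sqrt{n}}$$

where n is the number of nonzero signs and x' is the number of the less frequent sign moved 0.5 toward n/2 (a continuity correction).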
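The z formula translates directly into code. This sketch assumes x' means the count adjusted by 0.5 toward n/2, as described above; `sign_test_z` and `normal_cdf` are hypothetical helper names, and the normal CDF is built from the standard library's error function.

```python
from math import erf, sqrt


def sign_test_z(x, n):
    """Normal-approximation z for the sign test, with a 0.5 continuity correction."""
    # x' is x moved 0.5 toward n/2 (assumed meaning of x' in the formula above).
    if x < n / 2:
        x_adj = x + 0.5
    elif x > n / 2:
        x_adj = x - 0.5
    else:
        x_adj = x
    return (x_adj - n / 2) / (0.5 * sqrt(n))


def normal_cdf(z):
    """Standard normal cumulative probability via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


z = sign_test_z(3, 19)   # counts from this lab: x = 3, n = 19
print(round(z, 2))       # -2.75
p = 2 * normal_cdf(z)    # two-tailed p-value from the approximation
```

The lab itself uses Table 12 for n = 19; the approximation is shown only to make the formula concrete.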
We will use the data from Exercise 14.3 to show how Excel can be used to perform a sign test of whether the median high temperature is 48.
a) State the hypotheses:
H0: The median high temperature = 48
Ha: The median high temperature ≠ 48
b) Set the test criteria: The following steps compute the differences between the data values and the hypothesized median and sort the results, so that the numbers of pluses and minuses can be counted.
First, open a new worksheet and enter the data in column A (20 entries).
Click cell B1.
From the Insert menu, choose fx > ALL > SIGN, then click OK.
Enter: A1-48 (48 is the hypothesized median temperature), so that B1 contains =SIGN(A1-48).
Drag the lower right corner of B1 down to B20 to copy the function.
Select columns A and B, rows 1 through 20.
Choose: Data > Sort
Enter: Sort by: Column B
Select: Ascending
We do not count values that equal the hypothesized median, so we have a sample size of n = 19, and α = .05. From Table 12, the critical value for the less frequent sign is 4. If the number of the less frequent sign is 4 or less, we reject H0.
c) Notice that only 3 temperatures are above the stated median and 16 are below, so the number of the less frequent sign is x* = 3. The actual median of the sample is 45.5.
d) Since x* = 3 is in the critical region, we reject H0 in favor of Ha.
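The Table 12 value used in step b) can be reproduced from the binomial distribution. This sketch assumes the table lists the largest count k whose two-tailed probability 2P(X ≤ k) does not exceed α; the printed table's exact convention may differ, so treat the table in your text as authoritative. `sign_test_critical_value` is a hypothetical name.

```python
from math import comb


def sign_test_critical_value(n, alpha=0.05):
    """Largest k with 2 * P(X <= k) <= alpha, X ~ binomial(n, 1/2); None if none."""
    critical = None
    cumulative = 0  # number of equally likely outcomes with X <= k
    for k in range(n // 2 + 1):
        cumulative += comb(n, k)
        # Compare 2 * cumulative / 2^n with alpha without dividing.
        if 2 * cumulative > alpha * 2 ** n:
            break
        critical = k
    return critical


print(sign_test_critical_value(19))  # 4
```

For n = 19 and α = .05 this returns 4, the same value read from Table 12 above; for very small n (for example n = 5) no count is significant and the function returns None.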
ASSIGNMENT: Do Exercise 14.10 in your text.
The sign test can also be used for paired differences with two dependent samples. Do Exercise 14.11 in the Practice Test in your text.
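For two dependent samples, the signs come from the paired differences instead of from comparisons with a median. A minimal sketch, assuming the differences are taken as second minus first; the function name and the data in the usage line are hypothetical.

```python
def paired_sign_counts(first, second):
    """Sign test on paired data: counts of + and - differences (second - first).

    Pairs with a zero difference are discarded. Returns (n_plus, n_minus, x).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n_plus = sum(1 for d in diffs if d > 0)
    n_minus = sum(1 for d in diffs if d < 0)
    return n_plus, n_minus, min(n_plus, n_minus)


# Hypothetical before/after scores, for illustration only.
print(paired_sign_counts([10, 12, 9, 15], [12, 12, 7, 18]))  # (2, 1, 1)
```

The resulting x is then compared with the critical value for n = n_plus + n_minus, exactly as in the one-sample test.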
THE MANN-WHITNEY TEST
This is a non-parametric alternative to the t-test for two independent random samples in which the random variable is continuous. It is also called the Mann-Whitney-Wilcoxon test. By default, a two-sided test is performed. To do a one-sided test, select the test you want from the Alternative dialog box.
The test is carried out as follows. First, the two samples are ranked together, with the smallest observation given rank 1, the next smallest given rank 2, and so on. Then the sum of the ranks of the first sample is calculated. A small rank sum indicates that the observations from the first sample tend to be smaller than those from the second sample; a large rank sum indicates that they tend to be larger.
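The joint ranking can be sketched in Python, giving tied values the average of the ranks they occupy. The function name `rank_sums` is hypothetical; the grades are the Illustration 14-6 values as they appear in the ranked table of this section.

```python
def rank_sums(sample_a, sample_b):
    """Rank two samples together (ties share the average rank); return (R_a, R_b)."""
    pooled = sorted([(v, "A") for v in sample_a] + [(v, "B") for v in sample_b])
    sums = {"A": 0.0, "B": 0.0}
    i = 0
    while i < len(pooled):
        # Extend j over the block of values tied with pooled[i].
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        # Sorted positions i..j would receive ranks i+1..j+1; each gets their mean.
        avg_rank = (i + j + 2) / 2
        for k in range(i, j + 1):
            sums[pooled[k][1]] += avg_rank
        i = j + 1
    return sums["A"], sums["B"]


# Grades from Illustration 14-6, as listed in the ranked table of this section.
test_a = [49, 52, 56, 64, 65, 78, 78, 86, 90, 90]
test_b = [62, 71, 72, 74, 80, 81, 88, 90, 91, 98]
print(rank_sums(test_a, test_b))  # (86.0, 124.0)
```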
The attained significance level (p-value) of the test is calculated using a normal approximation with a continuity correction factor.
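The exact formula used by the software is not given here, so the following is only a sketch of one common form of that approximation: U has mean n_a·n_b/2 and standard deviation √(n_a·n_b(n_a + n_b + 1)/12) under H0, and |U − mean| is reduced by 0.5 as the continuity correction. It applies no tie correction to the variance. `mann_whitney_normal` is a hypothetical name.

```python
from math import erf, sqrt


def mann_whitney_normal(u, n_a, n_b):
    """Large-sample z and two-tailed p-value for U, with a 0.5 continuity correction.

    Assumes no tie correction to the variance.
    """
    mean = n_a * n_b / 2
    sd = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    # Move |U - mean| 0.5 toward zero (continuity correction), never below 0.
    z = max(0.0, abs(u - mean) - 0.5) / sd
    phi = 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z
    return z, 2 * (1 - phi)


z, p = mann_whitney_normal(31, 10, 10)
print(round(z, 3), round(p, 2))  # 1.398 0.16
```

With the illustration's U* = 31 and n_a = n_b = 10, the approximate p-value is roughly 0.16, well above α = .05.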
The following problem works through Illustration 14-6 in your text.
First, label and enter the grades from test A and test B in column A, creating one data list. In column B, enter the source of each value (A or B).
Choose: Tools > Data Analysis Plus > Wilcoxon Rank Sum Test > OK
Enter: Variable 1 Range: A1:A20
Variable 2 Range: B1:B20
Select: Labels (if necessary)
Choose: Alpha (α): .05
Ranked Data | Source | Rank
49 | A | 1
52 | A | 2
56 | A | 3
62 | B | 4
64 | A | 5
65 | A | 6
71 | B | 7
72 | B | 8
74 | B | 9
78 | A | 10.5
78 | A | 10.5
80 | B | 12
81 | B | 13
86 | A | 14
88 | B | 15
90 | A | 17
90 | A | 17
90 | B | 17
91 | B | 19
98 | B | 20

Sum of A ranks = 86
Sum of B ranks = 124
Total sum of ranks = 210
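Note: the two occurrences of 78 would occupy ranks 10 and 11, so each receives their average rank, 10.5. Similarly, the three occurrences of 90 occupy ranks 16, 17, and 18, so each receives rank 17.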
Running the test produces an additional sheet containing the following:
 | | RankSum | SampSize
49 | 1 | 210 | 20
52 | 1
56 | 1
62 | 1
64 | 1
65 | 1
71 | 1
72 | 1
74 | 1
78 | 1
78 | 1
80 | 1
81 | 1
86 | 1
88 | 1
90 | 1
90 | 1
90 | 1
91 | 1
98 | 1
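Do the following calculations in an empty cell:

$$U_a = n_a n_b + \frac{(n_b)(n_b + 1)}{2} - R_b = (10)(10) + \frac{(10)(10 + 1)}{2} - 124 = 31$$

$$U_b = n_a n_b + \frac{(n_a)(n_a + 1)}{2} - R_a = (10)(10) + \frac{(10)(10 + 1)}{2} - 86 = 69$$

where R_a = 86 and R_b = 124 are the rank sums of samples A and B.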
So U* = 31, the smaller of the two values.
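In a worksheet cell, the first calculation can be typed as =10*10+10*(10+1)/2-124. The same arithmetic in Python is sketched below; `u_statistics` is a hypothetical name, and the consistency check relies on the identity U_a + U_b = n_a·n_b, which holds whenever the two rank sums come from a correct joint ranking.

```python
from math import isclose


def u_statistics(n_a, n_b, r_a, r_b):
    """Return (U_a, U_b, U*) from the sample sizes and rank sums."""
    u_a = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
    u_b = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
    # U_a + U_b always equals n_a * n_b; a mismatch means a bad rank sum.
    if not isclose(u_a + u_b, n_a * n_b):
        raise ValueError("rank sums are inconsistent with the sample sizes")
    return u_a, u_b, min(u_a, u_b)


print(u_statistics(10, 10, 86, 124))  # (31.0, 69.0, 31.0)
```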
The critical region is two-tailed, so we use Table 13A (for α = .05). In Table 13A, at the intersection of n1 = 10 and n2 = 10, you can see the value 23. The critical region is thus U ≤ 23. Since U* = 31 is not in the critical region, we fail to reject H0.
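Table 13A values can be derived from the exact distribution of U when there are no ties. This sketch counts, for every possible U, how many orderings of the combined sample produce it (all orderings are equally likely under H0), then finds the largest c with 2P(U ≤ c) ≤ α. The rejection convention (U ≤ c) is an assumption about how the table is built, and `u_counts` and `u_critical_value` are hypothetical names; use the table in your text for the assignment.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def u_counts(m, n):
    """counts[u] = number of orderings of m A's and n B's whose U equals u.

    Here U counts the (A, B) pairs in which the A observation is larger.
    """
    if m == 0 or n == 0:
        return (1,)
    counts = [0] * (m * n + 1)
    # The largest observation is an A: it exceeds all n B's, adding n to U.
    for u, c in enumerate(u_counts(m - 1, n)):
        counts[u + n] += c
    # The largest observation is a B: it adds nothing to U.
    for u, c in enumerate(u_counts(m, n - 1)):
        counts[u] += c
    return tuple(counts)


def u_critical_value(m, n, alpha=0.05):
    """Largest c with 2 * P(U <= c) <= alpha under H0 (no ties), or None."""
    total = comb(m + n, m)
    critical = None
    cumulative = 0
    for u, c in enumerate(u_counts(m, n)):
        cumulative += c
        if 2 * cumulative > alpha * total:
            break
        critical = u
    return critical


print(u_critical_value(10, 10))  # 23
```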
ASSIGNMENT: Do Exercises 14.23, 14.24, and 14.25 in your text. Be sure to clearly state the hypotheses and test criteria. Ex 14-024 is on the Student Suite CD.
RUNS TEST FOR RANDOMNESS
How do we know whether a set of outcomes is truly random? The answer lies not only in counting the outcomes but also in examining the order in which they arise, that is, their arrangement. A run is a sequence of consecutive outcomes that share a common property. When that property changes, the current run ends and a new run begins with the new property. The random variable to be considered is V, the number of runs. Its critical values are found in Table 14.
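Counting runs is a one-liner in Python once each outcome is recorded as a symbol: a new run starts whenever the symbol changes. A minimal sketch, using `itertools.groupby` from the standard library; `count_runs` is a hypothetical name and the coin-toss string is made up for illustration.

```python
from itertools import groupby


def count_runs(outcomes):
    """Number of runs: maximal blocks of consecutive identical outcomes."""
    # groupby yields one group per run of equal consecutive items.
    return sum(1 for _ in groupby(outcomes))


print(count_runs("HHTHTTT"))  # 4: HH | T | H | TTT
```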
Illustration 14-8 is used to demonstrate the Excel technique:
a) State your hypotheses:
H0: The numbers are random.
Ha: The numbers are not random.
b) State the criteria: A two-tailed test with α = .05 and critical values 2 and 10 from Table 14; reject H0 if V* ≤ 2 or V* ≥ 10.
c) Perform the test. Note: Excel will only compute the differences between the data values and the median. To complete the test, you will need to count the number of runs created by the + and - signs.
First enter the data in column A.
Click cell B1.
Enter: =A1-MEDIAN($A$1:$A$10)
Drag the lower right corner of B1 down to B10 to copy the formula for each data entry. (The dollar signs keep the median range fixed while the formula is copied.)
Count the number of runs. You should get V* = 4 (4 observed runs).
Data | A1-Median(A1:A10) | Sign
2 | -1.5 | -1
3 | -0.5 | -1
1 | -2.5 | -1
1 | -2.5 | -1
4 | 0.5 | 1
2 | -1.5 | -1
6 | 2.5 | 1
6 | 2.5 | 1
6 | 2.5 | 1
7 | 3.5 | 1
d) Use Table 14 to find the p-value and determine whether it is smaller than α.
e) Conclusion: State your result (reject H0 or fail to reject H0).
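Steps b) through d) can be checked in Python. This is a minimal sketch under stated assumptions: values equal to the median are dropped (as in the sign test), Table 14 is assumed to come from the exact null distribution of V with the rule "reject if V ≤ lower or V ≥ upper", and the p-value is taken as twice the smaller tail probability. All function names are hypothetical, and the printed table in your text remains authoritative. Fractions keep the probabilities exact, so boundary comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import groupby
from math import comb
from statistics import median


def runs_about_median(data):
    """Signs of (value - median), median values dropped. Returns (n_minus, n_plus, V)."""
    m = median(data)
    signs = [1 if v > m else -1 for v in data if v != m]
    runs = sum(1 for _ in groupby(signs))  # one group per run of equal signs
    return signs.count(-1), signs.count(1), runs


def runs_pmf(n1, n2):
    """Exact null distribution {r: P(V = r)} of the number of runs (n1, n2 >= 1)."""
    total = comb(n1 + n2, n1)
    pmf = {}
    for r in range(2, n1 + n2 + 1):
        k, odd = divmod(r, 2)
        if odd:  # r = 2k + 1: k + 1 runs of one symbol, k runs of the other
            ways = (comb(n1 - 1, k) * comb(n2 - 1, k - 1)
                    + comb(n1 - 1, k - 1) * comb(n2 - 1, k))
        else:  # r = 2k: k runs of each symbol
            ways = 2 * comb(n1 - 1, k - 1) * comb(n2 - 1, k - 1)
        if ways:
            pmf[r] = Fraction(ways, total)
    return pmf


def runs_critical_values(n1, n2, alpha=0.05):
    """Two-tailed critical values: reject H0 if V <= lower or V >= upper."""
    pmf = runs_pmf(n1, n2)
    lower = upper = None
    tail = 0
    for r in sorted(pmf):  # accumulate the lower tail
        tail += pmf[r]
        if tail > alpha / 2:
            break
        lower = r
    tail = 0
    for r in sorted(pmf, reverse=True):  # accumulate the upper tail
        tail += pmf[r]
        if tail > alpha / 2:
            break
        upper = r
    return lower, upper


def runs_p_value(v, n1, n2):
    """Two-tailed p-value: twice the smaller tail probability, capped at 1."""
    pmf = runs_pmf(n1, n2)
    low = sum(p for r, p in pmf.items() if r <= v)
    high = sum(p for r, p in pmf.items() if r >= v)
    return min(Fraction(1), 2 * min(low, high))


data = [2, 3, 1, 1, 4, 2, 6, 6, 6, 7]  # Illustration 14-8 values from the table
n_minus, n_plus, v = runs_about_median(data)
print(n_minus, n_plus, v)                     # 5 5 4
print(runs_critical_values(n_minus, n_plus))  # (2, 10)
print(float(runs_p_value(v, n_minus, n_plus)))  # about 0.333
```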
ASSIGNMENT: Do Exercises 14.37-14.39 and 14.41 in your text.
(Exercises 14.38, 14.39, and 14.41 are on the Student Suite CD.)