Nonparametric Statistical Methods

Definition: When the data are generated by a process (model) that is known except for a finite number of unknown parameters, the model is called a parametric model. Otherwise, the model is called a non-parametric model. Statistical techniques that assume a non-parametric model are called non-parametric.

For example: If you assume that your data come from a normal distribution with mean μ and standard deviation σ (both unknown), then the data are generated by a process (model) that is known except for two parameters (μ and σ). The model is therefore a parametric model. Models that do not assume normality (or some other distribution with a finite number of parameters) are non-parametric.

We will consider two nonparametric tests:
• The sign test
• Wilcoxon’s signed rank test
These are tests for the central location of a population. They are alternatives to the z-test and the t-test for the mean of a normal population.

Single-sample nonparametric tests for central location
• The sign test
• Wilcoxon’s signed rank test
These are tests for the central location of a population. They are alternatives to the z-test and the t-test for the mean of a normal population.

Both the z-test and the t-test assume that the data come from a normal population. If the data do not come from a normal population, properties of the z-test and the t-test that require this assumption will no longer hold. In particular, the probability of a type I error may differ from the desired value (0.05 or 0.01).

Single-sample nonparametric tests: If the data do not come from a normal population, we should use one of the two nonparametric tests:
• The sign test
• Wilcoxon’s signed rank test
These tests do not assume that the data come from a normal population.

The sign test A nonparametric test for the central location of a distribution

We want to test H0: median = m0 against HA: median ≠ m0 (or against a one-sided alternative).

The Sign test:
• The test statistic: S = the number of observations that exceed m0
Comment: If H0: median = m0 is true, we would expect 50% of the observations to be above m0 and 50% of the observations to be below m0.

If H0 is true then S will have a binomial distribution with p = 0.50 and n = sample size.
[Figure: a distribution with median = m0; 50% of the probability lies on each side of m0.]

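If H0 is not true then S will still have a binomial distribution. However, p will not be equal to 0.50, where p = the probability that an observation is greater than m0.
• If m0 > median then p < 0.50.
• If m0 < median then p > 0.50.
[Figures: distributions showing p as the area to the right of m0, once with m0 above the median and once with m0 below the median.]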

Summarizing: If H0 is true then S will have a binomial distribution with p = 0.50 and n = sample size.
[Figure: the binomial distribution with p = 0.50 for n = 10.]
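The null distribution of S can be tabulated directly. The sketch below is an illustration, not part of the original slides: the helper name binomial_pmf is hypothetical, and it uses only the Python standard library to list P[S = s] for n = 10 and p = 0.50.

```python
from math import comb

def binomial_pmf(n, p=0.5):
    """Return [P(S = 0), ..., P(S = n)] for S ~ Binomial(n, p)."""
    return [comb(n, s) * p**s * (1 - p)**(n - s) for s in range(n + 1)]

# Null distribution of the sign-test statistic for n = 10 observations.
pmf = binomial_pmf(10)
for s, prob in enumerate(pmf):
    print(f"P[S = {s:2d}] = {prob:.4f}")
```

The distribution is symmetric about n/2 = 5, which is why two-sided critical regions are built from matching values in both tails.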

The critical and acceptance region (n = 10): Choose the critical region so that α is close to 0.05 or 0.01.
e.g. If the critical region is {0, 1, 9, 10} then α = .0010 + .0098 + .0098 + .0010 = .0216

e.g. If the critical region is {0, 1, 2, 8, 9, 10} then, again with n = 10, α = .0010 + .0098 + .0439 + .0439 + .0098 + .0010 = .1094
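The size of any candidate critical region can be checked by summing binomial probabilities. The following is a minimal sketch (the function name region_size is hypothetical, not from the slides). It works with exact counts, so the first value prints as 0.0215 rather than the .0216 obtained by adding table entries that were already rounded.

```python
from math import comb

def region_size(region, n):
    """alpha = P[S in region] when S ~ Binomial(n, 1/2)."""
    return sum(comb(n, s) for s in region) / 2**n

alpha_small = region_size({0, 1, 9, 10}, 10)
alpha_large = region_size({0, 1, 2, 8, 9, 10}, 10)
print(f"{alpha_small:.4f}")  # 0.0215
print(f"{alpha_large:.4f}")  # 0.1094
```

Because S is discrete, only a few values of α are attainable; with n = 10 no two-sided region has α exactly equal to 0.05.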

Example Suppose that we are interested in determining if a new drug is effective in reducing cholesterol. Hence we administer the drug to n = 10 patients with high cholesterol and measure the reduction.

The data

Suppose we want to test H0: the drug is not effective (median reduction ≤ 0) against HA: the drug is effective (median reduction > 0). The Sign test statistic is S = the number of positive observations.

The Sign test: The test statistic is S = the number of positive observations = 8.
We will use the p-value approach: p-value = P[S ≥ 8] = 0.0439 + 0.0098 + 0.0010 = 0.0547.
Since p-value > 0.05, we cannot reject H0.
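The arithmetic for this p-value can be reproduced in a few lines. This is only a check of the calculation above, assuming S = 8 positive observations out of n = 10; the variable names are illustrative.

```python
from math import comb

n, s_observed = 10, 8

# Upper-tail probability P[S >= 8] for S ~ Binomial(10, 1/2).
p_value = sum(comb(n, s) for s in range(s_observed, n + 1)) / 2**n
print(f"p-value = {p_value:.4f}")  # p-value = 0.0547
```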

Summarizing: To carry out the Sign Test we
• Compute S = the number of observations greater than m0.
• Let sobserved = the observed value of S.
• Compute the p-value as the tail probability in the direction of HA: P[S ≥ sobserved] for an upper-tailed alternative, P[S ≤ sobserved] for a lower-tailed alternative, and twice the smaller of these two for a two-tailed test. Use the table for the binomial distribution (p = ½, n = sample size).
• Conclude HA (reject H0) if the p-value is less than 0.05 (or 0.01).
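The steps above can be sketched as a single function. This is an illustration under stated assumptions, not the slides' own code: the name sign_test and its alternative argument are hypothetical, observations equal to m0 are simply counted as not exceeding m0, and the data values are made up for the example (they are not the study data).

```python
from math import comb

def sign_test(data, m0, alternative="two-sided"):
    """Sign test of H0: median = m0. Returns (s_observed, p_value)."""
    n = len(data)
    s_obs = sum(1 for x in data if x > m0)  # observations exceeding m0
    upper = sum(comb(n, s) for s in range(s_obs, n + 1)) / 2**n  # P[S >= s_obs]
    lower = sum(comb(n, s) for s in range(0, s_obs + 1)) / 2**n  # P[S <= s_obs]
    if alternative == "greater":
        p_value = upper
    elif alternative == "less":
        p_value = lower
    elif alternative == "two-sided":
        p_value = min(1.0, 2 * min(lower, upper))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return s_obs, p_value

# Hypothetical reductions in cholesterol for 10 patients (illustrative only).
reductions = [12, 5, -3, 8, 15, 2, -1, 9, 4, 7]
s_obs, p_one_sided = sign_test(reductions, 0, alternative="greater")
_, p_two_sided = sign_test(reductions, 0)
print(s_obs, round(p_one_sided, 4), round(p_two_sided, 4))
```

The two-sided p-value is capped at 1 because doubling the smaller tail can exceed 1 when sobserved is near n/2.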

Sign Test for Large Samples

If n is large we can use the Normal approximation to the Binomial. Namely, S has a Binomial distribution with p = ½ and n = sample size. Hence for large n, S has approximately a Normal distribution with mean = np = n/2 and standard deviation = √(np(1 − p)) = √n / 2.

Hence for large n, use as the test statistic (in place of S)
z = (S − n/2) / (√n / 2)
Choose the critical region for z from the Standard Normal distribution, i.e. reject H0 if z < −zα/2 or z > zα/2 for a two-tailed test (a one-tailed test can also be set up).
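A minimal sketch of the large-sample version follows. It standardizes S by its null mean n/2 and standard deviation √n/2 and converts z to a two-tailed p-value with the standard library's erfc. The function name and the numbers (S = 60 out of n = 100) are hypothetical, and no continuity correction is applied.

```python
from math import erfc, sqrt

def sign_test_z(s_observed, n):
    """Large-sample sign test: returns (z, two-tailed p-value)."""
    z = (s_observed - n / 2) / (sqrt(n) / 2)
    p_two_tailed = erfc(abs(z) / sqrt(2))  # equals 2 * P[Z > |z|]
    return z, p_two_tailed

# Hypothetical example: 60 of 100 observations exceed m0.
z, p_two_tailed = sign_test_z(60, 100)
print(f"z = {z:.2f}, p-value = {p_two_tailed:.4f}")
```

With these numbers z = 2.00, which exceeds z0.025 = 1.96, so H0 would be rejected at the 0.05 level in a two-tailed test.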

Nonparametric Confidence Intervals

Assume that the data x1, x2, x3, …, xn are a sample from an unknown distribution. Now arrange the data in increasing order:
x(1) < x(2) < x(3) < … < x(n)
Hence
x(1) = the smallest observation
x(2) = the 2nd smallest observation
x(n) = the largest observation

Consider the kth smallest observation, x(k), and the kth largest observation, x(n – k + 1), in the data x1, x2, x3, …, xn.
If at least k observations lie below the median then x(k) < median.
If at least k observations lie above the median then median < x(n – k + 1).
Hence P[x(k) < median < x(n – k + 1)] = P[at least k observations lie below the median and at least k observations lie above the median].

Thus P[x(k) < median < x(n – k + 1)]
= P[at least k observations lie below the median and at least k observations lie above the median]
= P[the number of observations below the median is at least k and at most n – k]
= P[k ≤ S ≤ n – k]
where S = the number of observations below the median. S has a binomial distribution with n = the sample size and p = 1/2.

Hence P[x(k) < median < x(n – k + 1)] = P[k ≤ S ≤ n – k] = p(k) + p(k + 1) + … + p(n – k) = P, where the p(i)’s are binomial probabilities with n = the sample size and p = 1/2. This means that x(k) to x(n – k + 1) is a 100P% confidence interval for the median.

Summarizing: x(k) to x(n – k + 1) is a 100P% confidence interval for the median, where P = p(k) + p(k + 1) + … + p(n – k) and the p(i)’s are binomial probabilities with n = the sample size and p = 1/2.

Example: n = 10 and k = 2. From the binomial probabilities, P = p(2) + p(3) + p(4) + p(5) + p(6) + p(7) + p(8) = .9784. Hence x(2) to x(9) is a 97.84% confidence interval for the median.
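The confidence level P can be computed for any n and k by summing the central binomial probabilities. A short sketch, with the hypothetical helper name coverage:

```python
from math import comb

def coverage(n, k):
    """P = P[k <= S <= n - k] for S ~ Binomial(n, 1/2)."""
    return sum(comb(n, s) for s in range(k, n - k + 1)) / 2**n

level = coverage(10, 2)
print(f"{level:.4f}")  # 0.9785
```

The exact value is 1 − 22/1024 ≈ 0.9785; the .9784 quoted above comes from adding probabilities that were rounded to four decimals first.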

Example Suppose that we are interested in determining if a new drug is effective in reducing cholesterol. Hence we administer the drug to n = 10 patients with high cholesterol and measure the reduction.

The data

With the data arranged in increasing order, x(2) = -3 and x(9) = 15. Hence -3 to 15 is a 97.84% confidence interval for the median.

Example: Suppose that, in the previous example, we repeat the study with n = 20 patients with high cholesterol.

The data

The binomial distribution with n = 20, p = 0.5.
Note: p(6) + p(7) + p(8) + p(9) + p(10) + p(11) + p(12) + p(13) + p(14) = 0.037 + 0.0739 + 0.1201 + 0.1602 + 0.1762 + 0.1602 + 0.1201 + 0.0739 + 0.037 = 0.9586
Hence x(6) to x(15) is a 95.86% confidence interval for the median reduction in cholesterol.
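One way to see why k = 6 is used for n = 20 is to search for the largest k whose interval still has confidence level at least 95%. The sketch below is an illustration, not the slides' procedure; the names coverage and largest_k are hypothetical.

```python
from math import comb

def coverage(n, k):
    """P[k <= S <= n - k] for S ~ Binomial(n, 1/2)."""
    return sum(comb(n, s) for s in range(k, n - k + 1)) / 2**n

def largest_k(n, level=0.95):
    """Largest k with coverage(n, k) >= level, or None if no k qualifies."""
    best = None
    for k in range(1, n // 2 + 1):
        if coverage(n, k) >= level:
            best = k  # coverage shrinks as k grows, so keep the last success
        else:
            break
    return best

k = largest_k(20)
print(k, round(coverage(20, k), 4))  # 6 0.9586
```

Moving to k = 7 would give a shorter interval, x(7) to x(14), but its confidence level drops to about 88%.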

With the data arranged in increasing order, x(6) = -1 and x(15) = 9. Hence -1 to 9 is a 95.86% confidence interval for the median.

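For large values of n one can use the normal approximation to the Binomial to find the value of k so that x(k) to x(n – k + 1) is a 95% confidence interval for the median, i.e. we want to find k so that P[k ≤ S ≤ n – k] ≈ 0.95, where S is approximately Normal with mean n/2 and standard deviation √n / 2.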
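A sketch of this calculation is given below. It solves P[k ≤ S ≤ n – k] ≈ level for k using the Normal approximation with mean n/2 and standard deviation √n/2. Two choices here are assumptions rather than something stated in the slides: a continuity correction of 0.5 is included, and k is rounded down so that the interval errs on the wide side. The function name approx_k is hypothetical.

```python
from math import floor, sqrt
from statistics import NormalDist

def approx_k(n, level=0.95):
    """Approximate k so that x(k) to x(n - k + 1) covers the median with prob. ~level."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # e.g. about 1.96 for 95%
    # Continuity-corrected lower limit: (k - 0.5 - n/2) / (sqrt(n)/2) = -z
    return floor(n / 2 + 0.5 - z * sqrt(n) / 2)

k = approx_k(100)
print(k, 100 - k + 1)  # 40 61
```

For n = 100 this suggests the interval x(40) to x(61). Applied to n = 20, the same formula gives k = 6, in agreement with the exact binomial calculation.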

Next we will consider the Wilcoxon signed rank test. The Wilcoxon signed rank test is an alternative to the Sign test, a test for the central location of a single population.

The sign test A nonparametric test for the central location of a distribution

We want to test H0: median = m0 against HA: median ≠ m0 (or against a one-sided alternative).

The Sign test:
• The test statistic: S = the number of observations that exceed m0
• Comment: If H0: median = m0 is true then the distribution of S is binomial with n = sample size and p = 0.50.

To carry out the Sign test:
• Compute the test statistic: S = the number of observations that exceed m0 = sobserved
• Compute the p-value of the test statistic sobserved: p-value = P[S ≥ sobserved] for an upper-tailed alternative (= 2 P[S ≥ sobserved] for a 2-tailed test when sobserved is above n/2), where S is binomial, n = sample size, p = 0.50
• Reject H0 if the p-value is low (< 0.05)

Non-parametric confidence intervals for the median of a population: x(k) to x(n – k + 1) is a (1 – α)100% = 100P% confidence interval for the median, where x(k) = the kth smallest xi, x(n – k + 1) = the kth largest xi, P = p(k) + p(k + 1) + … + p(n – k), and the p(i)’s are binomial probabilities with n = the sample size and p = 1/2.
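Putting the pieces together, the interval can be read off the sorted data. The sketch below is illustrative only: median_ci is a hypothetical helper name and the data values are made up for the example (they are not the study data).

```python
from math import comb

def median_ci(data, k):
    """Return (x(k), x(n - k + 1), P) where P = P[k <= S <= n - k], S ~ Binomial(n, 1/2)."""
    xs = sorted(data)
    n = len(xs)
    if not 1 <= k <= n // 2:
        raise ValueError("k must satisfy 1 <= k <= n/2")
    level = sum(comb(n, s) for s in range(k, n - k + 1)) / 2**n
    # Python lists are 0-indexed: x(k) is xs[k - 1], x(n - k + 1) is xs[n - k].
    return xs[k - 1], xs[n - k], level

# Hypothetical sample of n = 10 observations (illustrative only).
sample = [12, 5, -3, 8, 15, 2, -1, 9, 4, 7]
lower, upper, level = median_ci(sample, k=2)
print(lower, upper, round(level, 4))  # -1 12 0.9785
```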

The Wilcoxon Signed Rank Test An Alternative to the sign test

Situation
• We have a sample of size n, (x1, x2, …, xn), from an unknown distribution, and we want to test H0: the centre of the distribution, m = m0, against HA: m ≠ m0.

For the sign test we would count S, the number of positive values of (x1 – m0 , x2 – m0 , … , xn – m0). • We would reject H0 if S was not close to n/2
