Non-parametric Statistics: An Overview of Median Tests
Definition of Non-Parametric Statistics Non-parametric statistics is a branch of statistics that is applied when populations are not normal, or when the data are severely skewed.
Titles of Non-parametric Tests • One Sample Median Test • Two Sample Location Test • Two Sample Dispersion Test • One-Way Layout • Independence Test
Focus: Median tests This presentation will cover: • What median tests are • Why they are used • When they are used • How they are used
What are median tests? • They are tests similar to the mean tests covered in a college introductory statistics course. • They include confidence intervals and significance tests.
When to use a median test (as opposed to a mean test): • When the data or population do not fulfill the conditions for mean tests. • The ONLY condition is a simple random sample!
Remember these conditions? • 15 < n < 30 with slight skewness • n > 30 • Or population is normal They are NOT necessary!
Why do we use median tests? Because they are more robust!
Medians are more robust than means • The mean of these salaries is $109,000 • The median of these salaries is clearly between #7 and #8, or $32,500 • Just from looking at the list of salaries, the median seems to describe the middle of the distribution much more accurately, since salary #14 pulls the mean so far up
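A small sketch of the same effect. The salary list from the slide is not reproduced here, so the figures below are hypothetical and chosen only for illustration: one very large value drags the mean far upward, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical salaries for illustration only; NOT the data from the slide.
salaries = [20_000, 25_000, 30_000, 35_000, 40_000]
with_outlier = salaries + [1_000_000]   # add one extreme salary

# Without the outlier, mean and median agree.
print(mean(salaries), median(salaries))

# With the outlier, the mean jumps while the median shifts only slightly.
print(mean(with_outlier), median(with_outlier))
```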
More robustness The rest of the median-test procedure is also more robust than procedures based on the t-distribution. This combination of a robust statistic and a robust procedure allows for statistical inference on very skewed data.
Confidence Intervals for Medians The two main types: • Exact: needs tables and/or computer software • Approximate: simpler tables, appropriate for larger samples We will concentrate on the approximations
Introduction to the Confidence Intervals It is necessary to understand “rank” The rank of a value in a distribution is simply its numbered place in the list of ordered values Example: in the distribution of letters {a, b, c, d, e, f} “b” has a rank of 2 from the left, and a rank of 5 from the right.
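The rank definition can be sketched in a few lines. The helper name `ranks` is hypothetical, introduced here only to mirror the letter example.

```python
def ranks(ordered, value):
    """Return (rank from the left, rank from the right), both counted from 1."""
    position = ordered.index(value)            # 0-based position in the ordered list
    return position + 1, len(ordered) - position

letters = ["a", "b", "c", "d", "e", "f"]
print(ranks(letters, "b"))  # (2, 5)
```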
Steps for Approximate Confidence Intervals 1. Order the distribution from smallest to largest values. 2. Find the median of the distribution. 3. Find the rank* of each limit, based on the sample size, from a table like the one shown on the next slide. 4. Take the rank number and count in that many data points from each end of the ordered data. * Note that these ranks are computed by complicated formulas, then put neatly into a table for users, and used in the same way as the ranks defined before.
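The steps can be sketched as below. The rank table itself is not reproduced in this transcript, so the formula in `approx_rank` is an assumption: a common normal approximation to the binomial, rounded down. It does give the two ranks quoted in this presentation (3 for n = 14 and 9 for n = 28), but it may not match the presenter's table for every sample size. The function names are hypothetical.

```python
import math
from statistics import NormalDist, median

def approx_rank(n, confidence=0.95):
    """Rank of each confidence limit (ASSUMED normal-approximation formula)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)      # about 1.96 for 95%
    return max(1, math.floor((n + 1) / 2 - z * math.sqrt(n) / 2))

def median_ci(data, confidence=0.95):
    """Return (lower limit, sample median, upper limit) by rank counting."""
    ordered = sorted(data)                              # order smallest to largest
    r = approx_rank(len(ordered), confidence)           # rank of each limit
    # Count in r data points from each end of the ordered data.
    return ordered[r - 1], median(ordered), ordered[-r]

print(approx_rank(14), approx_rank(28))
print(median_ci(range(1, 15)))   # the integers 1..14 as stand-in data
```

Counting in from both ends is what makes the interval insensitive to extreme values: the largest and smallest few observations never become limits.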
Example: Using the same salary data from before, with sample size 14 and rank 3, count in three values from each end of the ordered list. The third value from the bottom is the lower confidence limit of the interval, and the third value from the top is the upper confidence limit. So, the 95% confidence interval is ($23,000, $60,000).
Significance test for medians Remember duality? “What is not contained in the confidence interval is significant at the same alpha-level.” This property of confidence intervals can be used to test for significance.
Steps for Significance Test at alpha=.05 • Create a confidence interval at this alpha-level. • Check whether the accepted population value is included in the interval. • Draw a conclusion: • If the value IS included, the sample is NOT significant • If the value is NOT included, the sample IS significant
Sample Significance Test Assume that the commonly accepted median of salaries at company A is $53,000, and that the sample shown before was drawn.
Test hypotheses • H0: M=$53,000, or that the true median of salaries in company A is $53,000. • Ha: M≠$53,000, or that the true median of salaries in company A is NOT $53,000.
Our previous 95% confidence interval was ($23,000, $60,000), so: • The accepted median, $53,000, is within the interval. • The outcome is not significant. • We do not reject the accepted median.
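The duality check amounts to one comparison. In this minimal sketch the function name `median_test` is hypothetical, and treating the endpoints as inside the interval is an assumed convention.

```python
def median_test(interval, hypothesized):
    """Two-sided test by duality: is the hypothesized median inside the CI?"""
    lower, upper = interval
    inside = lower <= hypothesized <= upper    # endpoints count as inside (assumption)
    return "not significant" if inside else "significant"

# The 95% interval and the accepted median from the salary example.
print(median_test((23_000, 60_000), 53_000))  # not significant
```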
Mean Tests VS. Median Tests • Consider a population of children, with a distribution of the number of toys each one has. • True mean Mu of 7.3 toys per child • True median M of 7 toys per child
Two SRSs from the Population of Children Both samples look very similar. The only difference is that one bar has been moved far out to become an outlier. [Two histograms, one per sample; axes: # of toys, # of children]
Sample 1: 95% Mean Confidence Interval Sample 1, with no outlier • Sample mean x-bar=7.1 toys • Sample standard deviation=1.9877 • Sample size n=28 • Standard error of x-bar=1.9877/√28=0.3756 • Critical value z*=1.95996 • CI: 7.1 ± (1.95996*0.3756): (6.358, 7.842) (use calculator 1-var stats)
Sample 1: 95% Median Confidence Interval Sample 1, with no outlier • Sample median=7 toys • Sample size n=28 • Rank (see table) =9 • Lower confidence limit=6 • Upper confidence limit=7 • CI: (6, 7)
Sample 2: 95% Mean Confidence Interval Sample 2, with outlier • Sample mean x-bar=8.4 toys • Sample standard deviation=4.8722 • Sample size n=28 • Standard error of x-bar=4.8722/√28=0.9208 • Critical value z*=1.95996 • CI: 8.4 ± (1.95996*0.9208): (6.595, 10.205) (use calculator 1-var stats)
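The mean intervals for both samples can be recomputed from the summary statistics on the slides. This is a sketch using the rounded values as reported; the function name `mean_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def mean_ci(xbar, s, n, confidence=0.95):
    """z-based confidence interval for a mean from summary statistics."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # critical value z*, about 1.95996
    margin = z * s / math.sqrt(n)                    # z* times the standard error
    return xbar - margin, xbar + margin

# Summary statistics as reported on the slides (n = 28 for both samples).
for label, xbar, s in [("Sample 1", 7.1, 1.9877), ("Sample 2", 8.4, 4.8722)]:
    low, high = mean_ci(xbar, s, 28)
    print(f"{label}: ({low:.3f}, {high:.3f})")
```

With these rounded inputs the Sample 2 interval agrees with the slide to three decimals. The Sample 1 result lands within about 0.01 of the slide's figures, a gap that may come from rounding in the reported summary statistics.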
Sample 2: 95% Median Confidence Interval Sample 2, with outlier • Sample median=7 toys • Sample size n=28 • Rank (see table) =9 • Lower confidence limit=6 • Upper confidence limit=7 • CI: (6, 7) These statistics match up EXACTLY with the median CI for the first sample. The outlier did not affect the outcome, demonstrating the test’s robustness.
Comparison of different intervals • Sample 1: Median CI (6, 7); Mean CI (6.358, 7.842) • Sample 2: Median CI (6, 7); Mean CI (6.595, 10.205)
Discussion of differences • The outlier made the mean confidence interval much wider and shifted it upward, making it less useful • The median interval stayed the same, and captures the true median very closely (the true median, 7, lies within the interval from 6 to 7)
Conclusion When data is skewed, a median test can be much more useful than a mean test in estimating the true parameter.