Announcements • Quiz 8 today • HW 7 due 5:00 PM • Bonus E due 5:00 PM • Projected schedule: Today: Normal Quantile & Rank Sum; R Oct. 26: Intro to F, One Way ANOVA & Tukey Procedure; T Oct. 31: Blocked ANOVA & Two Way ANOVA; R Nov. 2: Review day; T Nov. 7: EXAM II • Exam: all material since EXAM I; pencil, Scantron, reference page
Two Oversights • We have shown that for large samples (n > 30) the sampling distribution of the sample mean is approximately normal, and that substituting the sample SD for the population SD leads to the t-distribution • The same result holds for samples of any size from a normal population • What about small samples from non-normal populations? • How do we tell if a population is normal?
Is it normal? • A normal quantile plot can help answer whether or not a population is normal. • (x,y) points are plotted where x is the “expected value” from a normal population and y is the observed value • If the population is normal, the plot for our sample should be approximately linear!
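The (x, y) construction described above can be sketched in a few lines. This is a minimal illustration, not StataQuest's method: the plotting position (i - 0.5)/n is an assumption (other conventions exist), the function name is hypothetical, and x is expressed in standard normal (z-score) units, which does not change whether the plot looks linear. The sample is the old-method film times used in the rank-sum example later in this lecture.

```python
from statistics import NormalDist

def normal_quantile_points(sample):
    """Return (expected, observed) pairs for a normal quantile plot."""
    ys = sorted(sample)                      # observed values, smallest first
    n = len(ys)
    std_normal = NormalDist()                # mean 0, SD 1
    # Assumed plotting position: (i - 0.5) / n for the i-th smallest value
    xs = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(xs, ys))

old_times = [8.6, 5.1, 4.5, 5.4, 6.3, 6.6, 5.7, 8.5]
points = normal_quantile_points(old_times)
for x, y in points:
    print(f"{x:6.2f}  {y:4.1f}")
```

Plotting these pairs and judging how close they fall to a straight line is the visual check the slide describes.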
Is it normal? • Recall - our plot is based on a sample - so it won’t be exactly linear even if it is normal • However, obvious departure from normality is noticed in curvature of the plot • For example, this is right-skewed data (heavy right tail)
More Examples • This is a left-skewed data set - it has a long left tail • The curvature is obvious & the points are below the line on both ends
More Examples • This is symmetric with heavier tails than the normal • The points at the left are below the line and the points to the right are above it
One More Case • This data set came from a symmetric population with tails that are lighter than the normal curve • Points at the left are above the line and points at the right are below it
Tire Example • Using the tire data from HW 7 • Symmetric, no outliers • Light tailed • Our method is conservative • We’re OK
When t-based procedures are OK • The t-based procedures will work for large samples (n > 30) from ANY population. • The t-based procedures will work for small samples from normal populations. • The t-based procedures are conservative (the Type I error probability is actually lower than α) for symmetric light-tailed populations. This is OK as well. • The t-based procedures fail for small samples taken from skewed or heavy-tailed populations.
Rank-Sum Procedure • The rank-sum procedure was developed as an alternative to the two-sample t test & CI • The rank-sum procedure works for any type of population as long as the two population histograms have the same shape • Works for any sample size, but the two-sample t procedures should be used when both samples are large
Rank Sum Example • A new method is proposed to develop instant film (Polaroid); it is supposedly faster than the presently used method • We wish to compare the two methods • H0: γ_new − γ_old ≥ 0, α = 0.05 (γ = median) • Take 8 pictures using the old method. Times (sec): 8.6, 5.1, 4.5, 5.4, 6.3, 6.6, 5.7, 8.5 • Take 8 pictures using the new method. Times (sec): 5.5, 4.0, 3.8, 6.0, 5.8, 4.9, 7.0, 5.7
Rank Sum Example • Think of the two samples as one sample and rank the values • Lowest observation gets rank 1 • Ties get the average of what would have been their ranks • Add up the ranks for each group • This is the rank sum for the group • A low rank-sum for the new process supports HA in this case • Is it low enough to reject the null hypothesis? • There are tables for comparison
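The ranking steps above can be sketched as follows. This is a minimal illustration using the film times from the example; the helper name average_ranks is hypothetical, not part of any package.

```python
def average_ranks(values):
    """Rank values from 1 (lowest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg = (start + 1 + end + 1) / 2      # mean of the 1-based ranks in the run
        for k in range(start, end + 1):
            ranks[order[k]] = avg
        start = end + 1
    return ranks

old = [8.6, 5.1, 4.5, 5.4, 6.3, 6.6, 5.7, 8.5]
new = [5.5, 4.0, 3.8, 6.0, 5.8, 4.9, 7.0, 5.7]

ranks = average_ranks(old + new)             # treat the two samples as one
old_rank_sum = sum(ranks[:len(old)])
new_rank_sum = sum(ranks[len(old):])
print(old_rank_sum, new_rank_sum)            # 78.5 57.5
```

The two 5.7 values would have held ranks 8 and 9, so each gets 8.5. The two rank sums always add up to N(N + 1)/2, here 16 × 17 / 2 = 136.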
Rank Sum Example • StataQuest (SQ) can compute a p-value • Enter the data with one column holding the group number and one column holding the times • Go to Statistics: Nonparametric Tests: Mann-Whitney • Choose the group and data variables appropriately
     process |      obs    rank sum    expected
    ---------+---------------------------------
           0 |        8        78.5          68
           1 |        8        57.5          68
    ---------+---------------------------------
    combined |       16         136         136

    unadjusted variance       90.67
    adjustment for ties       -0.13
                         ----------
    adjusted variance         90.53

    Ho: median time(process==0) = median time(process==1)
                 z =   1.104
        Prob > |z| =   0.2698
• SQ gives a two-tailed p-value; we want a one-tailed p-value, so we divide SQ’s p-value by two. This is valid here because the new process has the lower rank sum, which is the direction HA predicts • Our p-value = 0.2698 / 2 = 0.1349 • Since 0.1349 > 0.05, we do not reject the null hypothesis
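The numbers in the output can be reproduced by hand with the usual normal approximation for the rank sum, including a correction for ties. The sketch below assumes that this standard formula is what SQ applies (it matches the printed values, but SQ's internals are not shown in these slides); the function name rank_sum_z is hypothetical.

```python
import math
from collections import Counter

def rank_sum_z(rank_sum_1, n1, n2, combined_values):
    """Normal approximation for the rank sum of group 1, with tie correction."""
    n = n1 + n2
    expected = n1 * (n + 1) / 2                       # 8 * 17 / 2 = 68
    unadjusted = n1 * n2 * (n + 1) / 12               # variance ignoring ties
    # Each group of t tied values contributes t^3 - t to the correction
    tie_term = sum(t**3 - t for t in Counter(combined_values).values())
    adjustment = n1 * n2 * tie_term / (12 * n * (n - 1))
    z = (rank_sum_1 - expected) / math.sqrt(unadjusted - adjustment)
    return expected, unadjusted, adjustment, z

old = [8.6, 5.1, 4.5, 5.4, 6.3, 6.6, 5.7, 8.5]
new = [5.5, 4.0, 3.8, 6.0, 5.8, 4.9, 7.0, 5.7]

# 78.5 is the rank sum for process 0 (old method) from the output above
expected, unadjusted, adjustment, z = rank_sum_z(78.5, len(old), len(new), old + new)

p_two_tailed = math.erfc(abs(z) / math.sqrt(2))       # P(|Z| > |z|)
p_one_tailed = p_two_tailed / 2
print(round(z, 3), round(p_two_tailed, 4), round(p_one_tailed, 4))
```

With only 8 observations per group, the normal approximation is rough; the rank-sum tables mentioned earlier give exact critical values for small samples.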
Review and Preview • We developed t-based procedures for any sample size from a normal population and for large samples from any type of population • We needed to determine whether a population is normal based on a sample • This led to the normal quantile plot • We needed procedures for small samples from non-normal populations • This led to the rank-sum procedure
Review and Preview • Next time, we discuss comparing means from more than two populations • This will involve finding sums of squares and the F-distribution
Normal Quantiles in SQ • After entering (or opening) the data, go to Graphs: One Variable: Normal Quantile Plot • Choose the variable of interest and click OK