1. Stat 401 Lab 5 James D. Abbey
Iowa State University
2. Contact Information James Abbey
Email: jdabbey@iastate.edu
Website: www.public.iastate.edu/~jdabbey
3. Website Address: www.public.iastate.edu/~jdabbey
JMP Tutorials and Data
Large number of high resolution videos for JMP operations
Data set in JMP format for HW
New Site for the On-Campus Lab
Lecture notes and other useful handouts
Check for material to print before coming to Lab
4. Homework Schedule for today
Homework 4 topics and examples
Homework 3 review
Homework 2 review
Questions? Ask away!
5. Homework 4 Topics Know the difference between observational studies and (randomized) experiments
Assignment of units to groups/treatments makes an experiment
Inference of causality only when randomized
Selection from a population (note that we are observing traits of the population)
Inference back to the population of interest only when randomly sampled
See text pg. 9 for the graphic of randomization types
6. Homework 4 Summary Transformations
Why transform? To meet assumptions of normality and equal variance. See pages 57-74 for an in-depth discussion.
Note that the log(mean) is not the mean of the log(values)
Beware of transforming data that is already normal and has equal variance!
7. Homework 4 Summary Take the data sets 1, 2, 3, 50 and 20, 25, 35, 500
Mean1: 14
Mean2: 145
Log values
Data set 1: 0, 0.69314, 1.0986, 3.91202
Mean: 1.4259 vs. Log(14) = 2.63905
Data set 2: 2.9957, 3.2188, 3.5553, 6.2146
Mean: 3.9961 vs. Log (145) = 4.9767
Thus, we see that the log(mean) is not the same as the mean of the log(values)
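The comparison above can be checked with a short script. This is a minimal sketch, assuming natural logs (which is what the listed log values correspond to); the helper name `compare` is made up for illustration.

```python
import math
from statistics import mean

# The two data sets from the slide.
data1 = [1, 2, 3, 50]
data2 = [20, 25, 35, 500]


def compare(values):
    """Return (log of the mean, mean of the logs) using natural logs."""
    log_of_mean = math.log(mean(values))
    mean_of_logs = mean([math.log(v) for v in values])
    return log_of_mean, mean_of_logs


for name, values in (("Data set 1", data1), ("Data set 2", data2)):
    log_of_mean, mean_of_logs = compare(values)
    print(f"{name}: log(mean) = {log_of_mean:.4f}, mean(log) = {mean_of_logs:.4f}")
```

In both data sets the mean of the logs comes out smaller than the log of the mean, because the large value (50 or 500) pulls the ordinary mean up far more than it pulls the mean of the logs.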
8. Why transform?
[Figure: original data, panels Data 1 and Data 2]
Too much skew!
9. Transformed Data
[Figure: panels Data 1 and Data 2]
Not ideal, but closer to equal variance and normality. This data may have needed a stronger transformation.
10. Homework 4 Summary So, how do we interpret the log values?
Since the log(mean) is not equal to the mean of log(values), we cannot simply back-transform to our original units
However, we can still get a useful result
As the book states on pages 68-73, we actually have a median ratio estimator when comparing groups
11. Homework 4 Summary Results on this data set from JMP
To get a useful result, we exponentiate the values (e^value, or exp(value)).
12. Homework 4 Summary Take log data set 2 – log data set 1
Summary Numbers
Mean Difference: 2.57016
Sp = 1.6113
SE of the difference = 1.139
t-value: 2.447 for a 95% CI (6 df at 0.975 quantile)
See pages 38-41 for formulas
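The summary numbers above can be recomputed from the raw data. This is a sketch under stated assumptions: natural logs, and the usual equal-variance pooled standard deviation for two independent samples. The variable names are illustrative, not from JMP.

```python
import math
from statistics import mean, variance

# Log-transform both data sets (natural log assumed).
log1 = [math.log(v) for v in [1, 2, 3, 50]]
log2 = [math.log(v) for v in [20, 25, 35, 500]]
n1, n2 = len(log1), len(log2)

# Difference of the mean logs (data set 2 minus data set 1).
mean_diff = mean(log2) - mean(log1)

# Pooled standard deviation: weight each sample variance by its df.
df = n1 + n2 - 2
pooled_var = ((n1 - 1) * variance(log1) + (n2 - 1) * variance(log2)) / df
sp = math.sqrt(pooled_var)

# Standard error of the difference between the two means.
se_diff = sp * math.sqrt(1 / n1 + 1 / n2)

print(f"mean difference = {mean_diff:.4f}")
print(f"Sp = {sp:.4f}, SE = {se_diff:.4f}, df = {df}")
```

Computed at full precision, the results should agree with the summary numbers above to about three decimal places.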
13. Homework 4 Summary Finally, we get our estimates
Mean Difference: 2.57016
95% CI: 2.57016 +/- (2.447 * 1.139)
(-0.216973, 5.357293)
Back-transforming → Median Ratio
Exp(2.57016) = 13.0679 (estimate of the ratio)
Exp(-0.216973) = 0.8049517
Exp(5.357293) = 212.149
So, we are 95% confident that the median of data set 2 is between 0.805 and 212.149 times as great as the median of data set 1
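The interval and its back-transformation can be sketched in a few lines. The inputs are the rounded summary numbers from the slides (mean difference, SE, and the t multiplier for 6 df), so this shows the arithmetic only, not a full analysis.

```python
import math

# Rounded summary numbers taken from the slides (log scale).
mean_diff = 2.57016
se_diff = 1.139
t_multiplier = 2.447  # 0.975 quantile of t with 6 df, as given on the slide

# 95% confidence interval on the log scale.
margin = t_multiplier * se_diff
lower = mean_diff - margin
upper = mean_diff + margin

# Exponentiate to get the estimated ratio of medians and its interval.
ratio = math.exp(mean_diff)
ratio_lower = math.exp(lower)
ratio_upper = math.exp(upper)

print(f"log scale: {mean_diff:.5f} +/- {margin:.6f} -> ({lower:.6f}, {upper:.6f})")
print(f"ratio of medians: {ratio:.4f}, 95% CI ({ratio_lower:.4f}, {ratio_upper:.3f})")
```

Because the back-transformed interval contains 1, these data do not rule out equal medians.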
14. Homework 4 Summary Text References
Sp, SE and CI on pages 38-41
Discussion of back transformation on pages 68-73. Pay close attention to display 3.9 on page 71.
15. Homework 3 Summary Hypothesis testing
The p-value. See pages 46-47.
A small p-value indicates that data as extreme as ours would be unlikely if Ho, our default reality, were true. Hence, if the p-value is small enough, we reject Ho.
A large p-value means that the data are consistent with Ho, at least statistically.
Possible Results:
Fail to reject Ho (we do NOT accept Ho)
Reject Ho in favor of Ha or find strong evidence against Ho
16. Hypothesis Tests General Notes
17. Homework 3 Topics Randomization Distributions
We have two samples
After treatment A, we observed values 1, 2, 3
After treatment B, we observed values 4, 5, 6
So, the mean difference is (1+2+3)/3 – (4+5+6)/3 = 2 – 5 = -3
Is this a common value? How many ways could these samples have appeared if there is no effect due to a treatment?
18. Homework 3 Topics Randomization Continued
We now pool all the values in a jar. From this jar, we draw samples assuming that any value could have come from either treatment (i.e., the treatment does not affect the outcome).
Draw 1, 5, 6 for treatment A, which leaves 2, 3, 4 for treatment B. New difference is (1+5+6)/3 – (2+3+4)/3 = 4 – 3 = 1. Repeat this until you exhaust all the possibilities. In this example, we have only 20 total ways to draw the samples.
How likely is a value as or more extreme than the one we observed? In other words, how many samples have values as or more extreme than -3 or 3? The p-value is (# as or more extreme) / (total possible).
See pages 11-14, 44-46 and 95-98.
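The jar procedure above is small enough to enumerate completely. The following is a minimal sketch using the six observed values and a two-sided definition of "as or more extreme"; the function and variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

treatment_a = [1, 2, 3]
treatment_b = [4, 5, 6]
pooled = treatment_a + treatment_b  # all values go into the "jar"
n_a = len(treatment_a)


def mean_difference(group_a, group_b):
    """Mean of group A minus mean of group B, kept exact with fractions."""
    return Fraction(sum(group_a), len(group_a)) - Fraction(sum(group_b), len(group_b))


observed = mean_difference(treatment_a, treatment_b)

# Every way of choosing which three values are labeled treatment A.
differences = []
for chosen in combinations(range(len(pooled)), n_a):
    group_a = [pooled[i] for i in chosen]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    differences.append(mean_difference(group_a, group_b))

# Two-sided: count differences at least as far from 0 as the observed one.
as_extreme = sum(1 for d in differences if abs(d) >= abs(observed))
p_value = Fraction(as_extreme, len(differences))

print(f"observed difference: {observed}")
print(f"{as_extreme} of {len(differences)} regroupings are as or more extreme")
print(f"p-value = {float(p_value)}")
```

Only the observed grouping and its mirror image (4, 5, 6 for treatment A) reach a difference of -3 or 3, so the two-sided p-value here is 2/20.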
19. Homework 2 Summary Standard deviation and standard error
Know the distinction between a sample distribution and a sampling distribution (see pages 29-40 for an extensive discussion)
Sample distribution comes from a sample
Associated with a standard deviation
Sampling distribution is a theoretical device
The standard error is the measure of spread for the sampling distribution. It measures the spread of estimates such as the sample mean y-bar.
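A small simulation can make the distinction concrete. This is only a sketch: the population (normal, mean 50, SD 10), the sample size of 25, and the seed are all made-up assumptions for illustration.

```python
import math
import random
from statistics import mean, stdev

random.seed(401)  # arbitrary seed so the run is repeatable

# Hypothetical population, invented for this illustration.
population = [random.gauss(50, 10) for _ in range(10_000)]
n = 25

# One sample: its standard deviation describes spread of individual values.
sample = random.sample(population, n)
sd = stdev(sample)

# Standard error of the mean, estimated from that single sample.
se = sd / math.sqrt(n)

# Approximate the sampling distribution of y-bar by drawing many samples.
sample_means = [mean(random.sample(population, n)) for _ in range(2000)]
spread_of_means = stdev(sample_means)

print(f"sample SD (spread of values):         {sd:.2f}")
print(f"standard error from one sample:       {se:.2f}")
print(f"SD of 2000 simulated sample means:    {spread_of_means:.2f}")
```

The standard error from one sample and the spread of the simulated sample means should land close to each other, and both are much smaller than the sample standard deviation.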
20. Homework 2 Summary The five number Summary
Want 5 numbers with mean and median 9 and 0 standard deviation? 9,9,9,9,9
Extreme Values
The mean is heavily impacted by outliers/large values. The median is resistant.
Backing out information from a CI:
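One way to do this, assuming the interval is symmetric and was built as estimate +/- (t multiplier * SE), is sketched below. The function name is hypothetical, and the inputs are the log-scale interval and t multiplier from the Homework 4 summary.

```python
def back_out_from_ci(lower, upper, t_multiplier):
    """Recover estimate, margin of error, and SE from a symmetric interval.

    Assumes the interval has the form estimate +/- t_multiplier * SE.
    """
    estimate = (lower + upper) / 2          # midpoint of the interval
    margin = (upper - lower) / 2            # half-width of the interval
    standard_error = margin / t_multiplier  # undo the t multiplier
    return estimate, margin, standard_error


# Log-scale 95% CI and t multiplier (6 df) from the Homework 4 summary.
estimate, margin, standard_error = back_out_from_ci(-0.216973, 5.357293, 2.447)

print(f"estimate = {estimate:.5f}")
print(f"margin of error = {margin:.6f}")
print(f"standard error = {standard_error:.3f}")
```

This only works on the scale where the interval is symmetric; the back-transformed ratio interval is not symmetric around its estimate.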
21. Homework 2 Summary See the above p-value discussion
Remember, a small p-value → evidence against Ho
Also, see the above null and alternative hypothesis discussion
Do we ever “accept” Ho?
Finally, you need to understand experimental vs. observational studies. Review the slide above if necessary.