HRP 223 - 2008 Topic 8 – Analysis of Means
Multiple Categorical Predictors • Unpaired samples • ANOVA • Paired samples • Mixed effects models (see the sketch below) • If the data are not normally distributed • There are specialized statistics (Friedman's test for 2 predictors). • Try to transform the data so they are approximately normally distributed.
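A minimal sketch of the mixed-effects approach to paired or repeated measurements. The data set work.scores and the variables subject, time, and score are hypothetical placeholders, not names from the slides; a random intercept per subject accounts for the pairing.

/* Hypothetical long-format data: one row per subject per time point. */
proc mixed data=work.scores;
  class subject time;
  model score = time / solution;
  random intercept / subject=subject;
run;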
Mean vs. Expected BMI It would be nice to see the actual Excel file.
One import option adds a link to the source file; another adds a link to the source and runs the import wizard. Linking gives instant access to the current state of the spreadsheet, but it is buggy if you mix character and numeric data.
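As a code alternative to the point-and-click import, here is a minimal sketch that reads the spreadsheet with PROC IMPORT and then compares the observed mean BMI to an expected value. The file path, sheet name, variable name (bmi), and the expected value of 25 are all assumptions for illustration, not taken from the slides.

/* Assumed file path, sheet name, and variable name. */
proc import datafile="C:\data\bmi.xlsx" out=work.bmi dbms=xlsx replace;
  sheet="Sheet1";
run;

/* One-sample t-test of the observed mean against an assumed expected BMI of 25. */
/* The h0= option sets the null value being tested.                              */
proc ttest data=work.bmi h0=25;
  var bmi;
run;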
Prior to analysis, do all three plots: a histogram, a box plot, and a normal probability (Q-Q) plot. Histograms and box plots show outliers and bimodal data but are not ideal for assessing normality. The formal tests for normality are not great: they will not find problems in small samples and will flag problems in large samples.
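A minimal sketch of producing those plots and the formal normality tests in SAS, assuming the work.bmi data set and bmi variable from the import sketch above.

/* Histogram, normal probability (Q-Q) plot, and formal normality tests */
/* (Shapiro-Wilk and others) requested with the NORMAL option.          */
proc univariate data=work.bmi normal;
  var bmi;
  histogram bmi / normal;
  qqplot bmi / normal(mu=est sigma=est);
run;

/* Box plot of the same variable. */
proc sgplot data=work.bmi;
  vbox bmi;
run;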
[Figure, five panels of normal probability plots: 1. normal distribution, 2. skewed-to-the-right distribution, 3. skewed-to-the-left distribution, 4. heavy-tailed distribution, 5. light-tailed distribution. Image from: Statistics I: Introduction to ANOVA, Regression, and Logistic Regression: Course Notes. SAS Press 2008.]
Inference 101. You only have one sample, but you want to make inferences about the population. Given what you see in this sample, you can estimate what the distribution of sample means would look like under the null hypothesis.
If the population you are sampling from has a mean of 4, your sample mean will almost never be exactly 4. How do you compare this sample against another sample with a mean of 5? Make a histogram of the means from many samples.
Distributions of the Means. The spread of the sample means is the standard error, the population standard deviation divided by sqrt(n). With a standard deviation of .75: .75/sqrt(1) = .75, .75/sqrt(5) ≈ .34, .75/sqrt(25) = .15.
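A minimal sketch of the idea in SAS, assuming a hypothetical population with mean 4 and standard deviation .75 to match the numbers on these slides: draw many samples, compute each sample's mean, and make a histogram of those means.

/* Simulate 1000 samples of size 25 from a N(4, 0.75) population,    */
/* compute each sample mean, and plot the distribution of the means. */
data sample_means;
  call streaminit(223);
  do sample = 1 to 1000;
    total = 0;
    do i = 1 to 25;
      total = total + rand('normal', 4, 0.75);
    end;
    mean = total / 25;
    output;
  end;
  keep sample mean;
run;

proc sgplot data=sample_means;
  histogram mean;
run;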
Precision • Think of the "+/- something" imprecision in the estimates from political polls. • You typically end up saying you are 95% sure you chose an interval that has the true value inside the range bracketed by the confidence limits (CLs). Either the population value is or is not in the interval between the lower and upper confidence limits; if you repeated the process on many samples, 95% of such intervals would include the population value. • The 99% CI is wider (more likely to cover the true value) than the narrower, more precise 95% CI.
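The slides do not give the formula explicitly, so the standard t-based confidence interval for a mean is added here for reference:

\bar{x} \pm t_{1-\alpha/2,\; n-1} \cdot \frac{s}{\sqrt{n}}

where \bar{x} is the sample mean, s the sample standard deviation, n the sample size, and s/\sqrt{n} the standard error from the previous slides. A 99% interval uses a larger t critical value than a 95% interval, which is why it is wider.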
Confidence Intervals from 10 Samples. [Figure: confidence intervals from 10 samples plotted along an axis in the units of your outcome, with a line marking the unobservable truth.] You want to set the width of the interval so that in 95% of the experiments the confidence interval includes the true value. In theory, you tweak the interval, increasing or decreasing its width.
Benefits of CLs • You have information about the estimate's precision. • The width of the CI tells you about the amount of random error in the estimate. • Wide intervals indicate poor precision: plausible values span a broad range.
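A minimal sketch of requesting confidence limits in SAS, again assuming the work.bmi data set and bmi variable used above; ALPHA=0.01 gives the wider 99% limits.

/* 95% confidence limits for the mean (CLM), plus n, mean, and std error. */
proc means data=work.bmi n mean stderr clm alpha=0.05;
  var bmi;
run;

/* The same request with alpha=0.01 gives the wider 99% limits. */
proc means data=work.bmi n mean stderr clm alpha=0.01;
  var bmi;
run;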
Estimation vs. Hypothesis Testing • P-value < .05 corresponds to a 95% CL that does not include the null hypothesis value. • CLs show uncertainty, or lack of precision, in the estimate of interest and thus convey more useful information than the p-value.
CLs vs. p-values. [Figure: two confidence intervals drawn against the null value, i.e., a difference of 0 between groups or an odds ratio of 1. In the first, P > .05 and the null value is inside the confidence limits (between the lower and upper CL); in the second, P < .05 and the null value is not inside the confidence limits.]
[Figure: confidence intervals plotted against the null value and a zone of clinical indifference, illustrating four cases: not statistically significant and not clinically interesting; not statistically significant but possibly clinically interesting; statistically significant but not clinically interesting; statistically significant and clinically interesting.]
Compare Two Teachers • Import the data • Describe the data • Assign the method as a classification variable • Do an unpaired t-test • Do a one-way ANOVA with the predictor having only two levels (a sketch of these last two steps follows below).
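A minimal sketch of those last two steps, assuming a hypothetical data set work.teachers with a two-level classification variable teacher and an outcome variable score; the actual variable names are not given on the slides.

/* Unpaired (two-sample) t-test: CLASS names the grouping variable. */
proc ttest data=work.teachers;
  class teacher;
  var score;
run;

/* One-way ANOVA with the same two-level predictor; with only two   */
/* groups the F test matches the equal-variance t-test.             */
proc anova data=work.teachers;
  class teacher;
  model score = teacher;
run;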
SS total is the sum of the squared distances between each point and the overall mean line. SS error is the sum of the squared distances between each point and its group's mean line.
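In symbols, using standard one-way ANOVA notation (added for reference; y_{ij} is observation j in group i, \bar{y}_i the group mean, and \bar{y} the overall mean):

SS_{\text{total}} = \sum_{i}\sum_{j} (y_{ij} - \bar{y})^2 \qquad SS_{\text{error}} = \sum_{i}\sum_{j} (y_{ij} - \bar{y}_i)^2 \qquad SS_{\text{model}} = SS_{\text{total}} - SS_{\text{error}}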
Psoriasis scores are arbitrary numbers: 0 = <0% response, 5 = 26-50% response, etc.