Exam • Exam starts two weeks from today
Amusing Statistics • Use what you know about normal distributions to evaluate this finding: The study, published in Pediatrics, the journal of the American Academy of Pediatrics, found that among the 4,508 students in Grades 5-8 who participated, 36 per cent reported excellent school performance, 38 per cent reported good performance, 20 per cent said they were average performers, and 7 per cent said they performed below average.
Review • The Z-test is used to compare the mean of a sample to the mean of a population with a known standard deviation: Z = (X̄ - μ) / σX̄, where σX̄ = σ/√n is the standard error of the mean
Review • The Z-score is normally distributed • Thus the probability of obtaining any given Z-score by random sampling is given by the Z table
Review • We can likewise determine critical values for Z such that we would reject the null hypothesis if our computed Z-score exceeds these values • For alpha = .05: • Zcrit (one-tailed) = 1.64 • Zcrit (two-tailed) = 1.96
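The slides give no code, but the review above maps onto a few lines of Python; here is a minimal sketch using scipy, with invented sample numbers for illustration:

```python
# Z-test sketch: compare a sample mean to a population mean
# when the population standard deviation is known.
# All numbers here are hypothetical, for illustration only.
from math import sqrt
from scipy.stats import norm

sample_mean = 103.0  # hypothetical sample mean
pop_mean = 100.0     # hypothetical population mean
pop_sd = 15.0        # hypothetical population standard deviation
n = 100              # hypothetical sample size

# Z = (sample mean - population mean) / standard error of the mean
z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))  # 2.0

# Critical values for alpha = .05, recovered from the normal distribution
z_crit_one = norm.ppf(1 - 0.05)      # ~1.64, one-tailed
z_crit_two = norm.ppf(1 - 0.05 / 2)  # ~1.96, two-tailed

# Two-tailed p-value: the probability of a |Z| this large by random sampling
p = 2 * (1 - norm.cdf(abs(z)))  # ~0.046, so reject the null at alpha = .05
print(z, z_crit_one, z_crit_two, p)
```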
Confidence Intervals • A related question you might ask: • Suppose you’ve measured a mean and computed a standard error of that mean • What is the range of values such that there is a 95% chance of the population mean falling within that range?
Confidence Intervals • There is a 2.5% chance that the population mean is actually more than 1.96 standard errors above the observed mean [Figure: normal curve with 95% of the area below +1.96 standard errors, 2.5% in the upper tail, labelled “True mean?”]
Confidence Intervals • There is a 2.5% chance that the population mean is actually more than 1.96 standard errors below the observed mean [Figure: normal curve with 95% of the area above -1.96 standard errors, 2.5% in the lower tail, labelled “True mean?”]
Confidence Intervals • Thus there is a 95% chance that the true population mean falls within +/- 1.96 standard errors of a sample mean • Likewise, there is a 95% chance that the true population mean falls within +/- 1.96 standard deviations of a single measurement
Confidence Intervals • This is called the 95% confidence interval…and it is very useful • It works like significance bounds…if the 95% C.I. doesn’t include the mean of a population you’re comparing your sample to, then your sample is significantly different from that population
Confidence Intervals • Consider an example: • You measure the concentration of mercury in your backyard to be .009 mg/kg • The concentration of mercury in the Earth’s crust is .007 mg/kg. Let’s pretend that, when measured at many sites around the globe, the standard deviation is known to be .002 mg/kg
Confidence Intervals • The 95% confidence interval for this single mercury measurement is: .009 +/- (1.96 x .002) = .00508 to .01292 mg/kg
Confidence Intervals • This interval includes .007 mg/kg which, it turns out, is the mean concentration found in the Earth’s crust in general • Thus you would conclude that your backyard isn’t artificially contaminated by mercury
Confidence Intervals • Imagine you take 25 samples from around Alberta and you find a mean of .009 mg/kg • The standard error is .002/√25 = .0004, so the 95% confidence interval is: • .009 +/- (1.96 x .0004) = .008216 to .009784 • This interval doesn’t include the .007 mg/kg value for the Earth’s crust, so you would conclude that Alberta has an artificially elevated amount of mercury in the soil
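Both mercury intervals can be checked in a few lines of Python; this sketch uses only the numbers given in the example:

```python
# 95% confidence intervals for the mercury example (all values in mg/kg).
from math import sqrt

crust_mean = 0.007   # mean concentration in the Earth's crust
sd = 0.002           # known standard deviation across sites worldwide
measurement = 0.009  # backyard measurement

# Single measurement: the interval uses the standard deviation itself
lo, hi = measurement - 1.96 * sd, measurement + 1.96 * sd
print(lo, hi, lo <= crust_mean <= hi)  # .00508 to .01292 -> includes .007

# Mean of 25 samples: the interval uses the standard error, sd / sqrt(n)
n = 25
se = sd / sqrt(n)  # .002 / 5 = .0004
lo, hi = measurement - 1.96 * se, measurement + 1.96 * se
print(lo, hi, lo <= crust_mean <= hi)  # .008216 to .009784 -> excludes .007
```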
Power • We perform a Z-test and determine that the difference between the mean of our sample and the mean of the population is not due to chance, with p < .05 • We say that we have a significant result… • But what if p is > .05?
Power • What are the two reasons why p comes out greater than .05? • Your experiment lacked Statistical Power and you made a Type II Error • The null hypothesis really is true
Power • Two approaches: • The Hopelessly Jaded Grad Student Solution • The Wise and Well Adjusted Professor Procedure
Power 1. Hopelessly Jaded Grad Student Solution - conclude that your hypothesis was wrong and go directly to the grad student pub
Power - This is not the recommended course of action
Power 2. The Wise Professor Procedure - consider the several reasons why you might not have detected a significant effect
Power - recommended by wise professors the world over
Power • Why might p be greater than .05? • Recall that: Z = (X̄ - μ) / σX̄ and σX̄ = σ/√n
Power • Why might p be greater than .05? 1. Small effect size: X̄ is quite close to the mean of the population • The effect doesn’t stand out from the variability in the data • You might be able to increase your effect size (e.g. with a larger dose or treatment)
Power • Why might p be greater than .05? 2. Noisy Data: σ is quite large, and therefore σX̄ is quite large • A large denominator will swamp a small effect • Take greater care to reduce measurement errors
Power • Why might p be greater than .05? 3. Sample Size is Too Small: σX̄ is quite large because n is small • A large denominator will swamp a small effect • Run more subjects
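All three reasons act through the same formula; a small Python sketch with invented baseline numbers shows how each change shrinks Z:

```python
# How effect size, noise, and sample size each shrink the Z-score.
# Baseline numbers are invented; only the direction of each change matters.
from math import sqrt

def z_score(sample_mean, pop_mean, sd, n):
    # Z = (sample mean - population mean) / (sd / sqrt(n))
    return (sample_mean - pop_mean) / (sd / sqrt(n))

print(z_score(103, 100, 15, 100))  # baseline:           Z = 2.00
print(z_score(101, 100, 15, 100))  # 1. smaller effect:  Z = 0.67
print(z_score(103, 100, 30, 100))  # 2. noisier data:    Z = 1.00
print(z_score(103, 100, 15, 25))   # 3. fewer subjects:  Z = 1.00
```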
Power • The solution in each case is more power: • Power is like sensitivity - the ability to detect small effects in noisy data • It is the complement of the Type II error rate: power = 1 - β • So that you know: there are equations for computing statistical power (one is sketched below)
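One such equation, for a one-tailed Z-test, is power = 1 - Φ(Zcrit - effect/σX̄); here is a sketch with hypothetical numbers:

```python
# Power of a one-tailed Z-test: the probability of rejecting the null
# hypothesis when the true mean really differs by `effect`.
from math import sqrt
from scipy.stats import norm

def z_test_power(effect, sd, n, alpha=0.05):
    z_crit = norm.ppf(1 - alpha)  # 1.64 for one-tailed alpha = .05
    se = sd / sqrt(n)             # standard error of the mean
    return 1 - norm.cdf(z_crit - effect / se)

# Hypothetical numbers: a 3-unit effect, sd = 15, 100 subjects
power = z_test_power(effect=3, sd=15, n=100)  # ~0.64
beta = 1 - power                              # Type II error rate, ~0.36
print(power, beta)
```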
Power • An important point about power and the null hypothesis: • Failing to reject the null hypothesis DOES NOT PROVE it to be true!!!
Power • Consider an example: • How to prove that smoking does not cause cancer: • enroll 2 people who smoke infrequently and use an antique X-Ray camera to look for cancer • Compare the mean cancer rate in your group (which will probably be zero) to the cancer rate in the population (which won’t be) with a Z-test
Power • Consider an example: • If p came out greater than .05, you still wouldn’t believe that smoking doesn’t cause cancer • You will, however, often encounter statements such as “The study failed to find…” misinterpreted as “The study proved no effect of…”
Experimental Design • We’ve been using examples in which a single sample is compared to a population • Often we employ more sophisticated designs • What are some different ways you could run an experiment?
Experimental Design • Compare one mean to some value • Often that value is zero • Compare two means to each other
Experimental Design • There are two general categories of comparing two (or more) means with each other
Experimental Design 1. Repeated-Measures Design - also called “within-subjects” comparison • The same subjects are given pre- and post-measurements • e.g. before and after taking a drug to lower blood pressure • Powerful because variability between subjects is factored out • Note that pre- and post-scores are linked - we say that they are dependent • Note also that you could have multiple tests
Experimental Design • Problems with Repeated-Measures design: • Practice effect - subjects improve with repeated exposure to the procedure • Temporal effect - subjects get better or worse over time • The act of measuring might preclude further measurement - e.g. measuring brain size via surgery
Experimental Design 2. Between-Subjects Design • Subjects are randomly assigned to treatment groups - e.g. drug and placebo • Measurements are assumed to be statistically independent
Experimental Design • Problems with Between-Subjects design: • Can be less powerful because variability between two groups of different subjects can look like a treatment effect • Often needs more subjects
Experimental Design • We’ll need some statistical tests that can compare: • One sample mean to a fixed value • Two dependent sample means to each other (within-subject) • Two independent sample means to each other (between-subject)
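In practice these three comparisons map onto scipy’s standard tests (t-tests rather than Z-tests, since the standard deviation is usually estimated from the sample); a sketch with invented blood-pressure data:

```python
# The three designs above, run with scipy's t-tests on made-up data.
from scipy import stats

pre   = [120, 135, 128, 141, 133, 125, 138, 130]  # same subjects, before drug
post  = [115, 130, 126, 137, 129, 121, 133, 126]  # same subjects, after drug
other = [118, 129, 124, 139, 131, 120, 135, 127]  # independent placebo group

# 1. One sample mean vs. a fixed value (often zero)
print(stats.ttest_1samp(pre, popmean=130))

# 2. Two dependent means (within-subjects / repeated measures)
print(stats.ttest_rel(pre, post))

# 3. Two independent means (between-subjects)
print(stats.ttest_ind(post, other))
```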