Introduction to Hypothesis Testing Raymond J. Carroll Department of Statistics Faculty of Nutrition Texas A&M University http://stat.tamu.edu/~carroll
Outline • Series of Examples • Data Collection for Examples
Example #1 • My Hypothesis: Texas A&M students simply guess when they are asked whether they are drinking diet Pepsi or diet Coke • The Experiment: Blind taste test. You are asked which of the cups you drink from contains diet Coke • Our Goal: Test this hypothesis, using statistical principles and probability statements
A Warning • Yes or No: No statistician will ever answer a question “yes” or “no” • Probabilities: We always say things like “the chance is less than 5% that your hypothesis is correct”
Example #1 • Data Model: The data model is • Normal? • Gamma? • Binomial? • Poisson?
Example #1 • Data Model: The data model is • Normal? • Gamma? • Binomial? Because each outcome is yes or no, success or failure • Poisson?
Example #1 • My Hypothesis in terms of population parameters: I have claimed that you can do no better than guess • Each of you is a Binomial(1,π) or Binomial(1,p) • When I say you are guessing, what am I saying about the population?
Example #1 • My Hypothesis in terms of population parameters: I have claimed that you can do no better than guess • Each of you is a Binomial(1,π) or Binomial(1,p) • When I say you are guessing, what am I saying about the population? • That the proportion of successes is π = p = ½
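The claim p = ½ can be checked with an exact binomial calculation. The sketch below is only an illustration: the class size and number of correct answers are hypothetical, not data from the lecture, and `binomial_two_sided_p` is a helper name introduced here.

```python
from math import comb


def binomial_two_sided_p(successes: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided p-value for H0: p = p0.

    Adds up the probability of every outcome that is no more likely
    under H0 than the outcome actually observed.
    """
    pmf = [comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(n + 1)]
    observed = pmf[successes]
    # The small tolerance guards against floating-point ties.
    total = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical taste-test result: 27 of 40 students name the right cup.
n_students = 40
n_correct = 27
p_value = binomial_two_sided_p(n_correct, n_students)
print(f"Two-sided p-value under pure guessing: {p_value:.4f}")
```

A small p-value says that a result this far from 50% correct would be unusual if everyone were guessing.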
Example #2 • My Hypotheses: Keebler used to advertise • 17 chocolate chips per cookie • More chocolate chips than another brand • The Experiment: Get a cookie of each type, count the number of chips, criticize the experiment • Our Goal: Test these hypotheses, using statistical principles and probability statements
Example #2 • Data Model: The data model is • Normal? • Gamma? • Binomial? • Poisson?
Example #2 • Data Model: The data model is • Normal? • Gamma? • Binomial? • Poisson? • It could be Poisson or normal. Poisson is the better choice, because the data are counts • We’ll use the central limit theorem to make inferences
Example #2 • My Hypothesis in terms of population parameters: Keebler has claimed that it gives you 17 chips per cookie, on average • Each of you is a Poisson with mean λ • When I say Keebler is correct, what am I saying about the population?
Example #2 • My Hypothesis in terms of population parameters: Keebler has claimed that it gives you 17 chips per cookie, on average • Each of you is a Poisson with mean λ • When I say Keebler is correct, what am I saying about the population? • That the population mean number of chips is 17
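One way to turn "the population mean is 17" into a probability statement is a central-limit-theorem z-test. The sketch below assumes the Poisson model, so the variance under the hypothesis equals the hypothesized mean. The chip counts are made up for illustration, and `poisson_mean_z_test` is a hypothetical helper, not something from the lecture.

```python
from math import erfc, sqrt


def poisson_mean_z_test(counts, lam0):
    """CLT-based z-test of H0: Poisson mean = lam0.

    Under H0 a Poisson count has variance lam0, so the sample mean
    has standard error sqrt(lam0 / n).
    """
    n = len(counts)
    sample_mean = sum(counts) / n
    z = (sample_mean - lam0) / sqrt(lam0 / n)
    # Two-sided normal tail probability: P(|Z| >= |z|).
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical chip counts from 12 cookies (not real data).
chip_counts = [15, 18, 14, 17, 16, 13, 19, 15, 16, 14, 17, 15]
z_stat, p_val = poisson_mean_z_test(chip_counts, lam0=17)
print(f"z = {z_stat:.2f}, two-sided p-value = {p_val:.3f}")
```

With only a handful of cookies the normal approximation is rough, which is part of what "criticize the experiment" invites.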
Example #3 • My Hypothesis: The percentage of regular M&M’s that are green is the same as the percentage of peanut M&M’s that are green • The Experiment: Compute the percentage of green M&M’s in each bag • Our Goal: Test this hypothesis, using statistical principles and probability statements
Example #3 • Data Model: The data model is • Normal? • Gamma? • Binomial? • Poisson?
Example #3 • Data Model: The data model is • Normal? • Gamma? • Binomial? • Poisson? • Roughly normal, since each data point is a percentage • We’ll use the central limit theorem to make inferences
Example #3 • My Hypothesis in terms of population parameters: The %-green M&M’s does not depend on the type of M&M’s • What am I saying about the two populations?
Example #3 • My Hypothesis in terms of population parameters: The %-green M&M’s does not depend on the type of M&M’s • What am I saying about the two populations? • That they have the same population mean.
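"Same population mean" suggests comparing the two sample means relative to their combined standard error. The sketch below uses a normal approximation with unpooled variances. The bag percentages are invented for illustration and `two_sample_z` is a hypothetical helper. With this few bags a t-based test would usually be preferred.

```python
from math import erfc, sqrt
from statistics import mean, variance


def two_sample_z(sample_a, sample_b):
    """Approximate z-test of H0: the two population means are equal."""
    # Standard error of the difference in means, variances not pooled.
    se = sqrt(variance(sample_a) / len(sample_a)
              + variance(sample_b) / len(sample_b))
    z = (mean(sample_a) - mean(sample_b)) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical %-green per bag (made-up numbers).
regular_pct = [16.1, 18.4, 14.9, 17.2, 15.5, 19.0]
peanut_pct = [21.3, 18.8, 22.5, 20.1, 19.7, 23.0]
z_stat, p_val = two_sample_z(regular_pct, peanut_pct)
print(f"z = {z_stat:.2f}, two-sided p-value = {p_val:.4f}")
```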
Example #4 • My Hypotheses: Women who keep track of their diet by diaries or PDA do not lower their caloric intake in a 6-day period • The Experiment: The WISH Study at the National Cancer Institute, with 400 women • The data appear to contradict my hypothesis
Typical (Median) Values of Reported Caloric Intake Over 6 Diary Days: WISH Study • [Figure not reproduced] • A major point of STAT211 is to prepare you to judge whether these data, which look convincing, really are convincing in terms of probability statements.
Example #4 • Data Model: The data model is • Normal? • Gamma? • Binomial? • Poisson?
Example #4 • Data Model: The data model is • Normal? • Gamma? • Binomial? • Poisson? • Lognormal, so most people take logarithms of caloric intake and analyze them as normal
Example #4 • Data Model: The data that we use are the differences, for each woman, between Day 1 and Day 6, i.e., Day 1 – Day 6
Example #4 • My Hypothesis in terms of population parameters: • What am I saying about the population, when I claim that writing down diets will not lead to a change in reported caloric intake?
Example #4 • My Hypothesis in terms of population parameters: • What am I saying about the population, when I claim that writing down diets will not lead to a change in reported caloric intake? • That the population mean difference between Day 1 and Day 6 = 0
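The "mean difference = 0" hypothesis is a one-sample question about the Day 1 – Day 6 differences. The sketch below shows the calculation on a few invented intake values. These are not WISH data, and `paired_mean_test` is a hypothetical helper. It uses a normal approximation, which is reasonable for a study of about 400 women but not for the tiny illustrative sample.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def paired_mean_test(day1, day6):
    """Test H0: population mean of (Day 1 - Day 6) equals 0."""
    diffs = [a - b for a, b in zip(day1, day6)]
    se = stdev(diffs) / sqrt(len(diffs))  # standard error of the mean difference
    z = mean(diffs) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail
    return mean(diffs), z, p_value


# Hypothetical reported calories for five women (illustration only).
day1_kcal = [2100, 1850, 2400, 1950, 2250]
day6_kcal = [1900, 1800, 2050, 1975, 2000]
mean_diff, z_stat, p_val = paired_mean_test(day1_kcal, day6_kcal)
print(f"mean difference = {mean_diff:.0f} kcal, z = {z_stat:.2f}, p = {p_val:.3f}")
```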
Some Final Comments • Formulating statistical hypothesis tests is really intuitive • Don’t let the formulae obscure the fact that all we are doing is • Asking questions about population parameters • Constructing confidence intervals for population parameters • Using these confidence intervals to answer the question
The WISH Data • I computed a 99% confidence interval for the population mean change in the WISH data. • This interval was entirely above 0, and ranged roughly from 75 to 375 • In other words, with 99% confidence, mean reported intake on Day 1 was between 75 and 375 calories higher than on Day 6. • Is the hypothesis true?
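One way to approach that closing question is to check whether the hypothesized value, 0, lies inside the interval. The sketch below starts from the rough endpoints quoted above and assumes the interval is a symmetric normal-theory interval (estimate ± critical value × standard error). That assumption is made here for illustration and is not stated in the slides, so the implied z-statistic is only approximate.

```python
from math import erfc, sqrt
from statistics import NormalDist

# Rough 99% interval for the mean Day 1 - Day 6 change, from the slide.
lower, upper = 75.0, 375.0
confidence = 0.99
hypothesized_mean = 0.0

# A value outside the 99% interval is rejected at the 1% level.
contains_zero = lower <= hypothesized_mean <= upper

# Assuming a symmetric normal-theory interval, back out the pieces.
center = (lower + upper) / 2                     # point estimate
half_width = (upper - lower) / 2
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
std_error = half_width / z_crit                  # implied standard error
z_stat = (center - hypothesized_mean) / std_error
p_value = erfc(abs(z_stat) / sqrt(2))            # two-sided normal tail

print(f"0 inside the interval? {contains_zero}")
print(f"implied estimate = {center:.0f}, z = {z_stat:.2f}, p = {p_value:.5f}")
```

Because 0 falls outside the interval, data like these would be unlikely if the population mean change really were 0.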