Inference Bootstrapping for comparisons
Outcomes • Understand the bootstrapping process for construction of a formal confidence interval for a comparison situation. • Understand the requirements of an investigative question, confidence interval interpretation, and conclusion at Level 8.
People with Higher IQ are Shunning Internet Explorer • Read the article and highlight anything you might find suspicious.
Anatomy of a hoax • Now read this article
How could you conduct an experiment that would determine whether or not there was any truth in the first article?
Comparison situation 1 • The Doozer Construction Council (DCC) is looking at adding a height restriction for its workers to improve doozer safety on construction sites. Being a fair-minded council, the DCC wants to ensure that the height restriction will not disadvantage female doozers in any way. The DCC has employed you to analyse a random sample of doozer heights to address these concerns.
Comparison situation 2 • Recently, Sprocket has taken to chasing doozers. Doc is quite concerned by this sudden change in Sprocket’s behaviour and wonders whether the height of the doozers has anything to do with why Sprocket chases them. Doc has collected information on a random sample of doozers, including their height and whether they had been chased by Sprocket in the last month. Doc has asked you to look at this data and let him know whether it highlights any concerns.
Key Components of an Investigative Question – a good investigative question should include the following: • VARIABLE being examined • GROUPS being compared • POPULATION inferences are being made about • STATISTIC being estimated
VARIABLE being examined • Comparison situation 1 • Heights of Doozers in Fraggle Rock • Comparison situation 2 • Heights of Doozers in Fraggle Rock
GROUPS being compared • Comparison situation 1 • Heights of male and female Doozers in Fraggle Rock • Comparison situation 2 • Heights of Doozers in Fraggle Rock that have been chased by Sprocket in the last month and heights of Doozers that have not been chased in the last month
POPULATION inferences are being made about • Comparison situation 1 • Difference in mean heights of male and female Doozers in Fraggle Rock • Comparison situation 2 • Difference in mean heights of Doozers in Fraggle Rock that have been chased by Sprocket in the last month and Doozers that have not been chased in the last month
STATISTIC being estimated • Comparison situation 1 • Mean/median heights of male and female Doozers in Fraggle Rock • Comparison situation 2 • Mean/median heights of Doozers in Fraggle Rock that have been chased by Sprocket in the last month and of Doozers that have not been chased in the last month
I wonder what the difference is between the mean/median height of population B and the mean/median height of population A. I think the mean height of population B will be bigger because…..
Key Points • The distribution of re-sample means (the bootstrap distribution) is similar to the distribution of means from repeated random sampling. • Therefore we can use the bootstrap distribution to model the sampling variability in our data, and base our confidence interval on this bootstrap distribution.
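To see this key point in action, here is a small Python simulation sketch. It uses a made-up population of heights (hypothetical numbers, not doozer data) and compares the spread of means from many fresh random samples with the spread of re-sample means taken from just one sample. The helper name `sampling_vs_bootstrap` and all numbers are illustrative assumptions.

```python
import random
import statistics


def sampling_vs_bootstrap(population, n, reps, seed=0):
    """Return (SD of means from repeated samples, SD of bootstrap re-sample means)."""
    rng = random.Random(seed)
    # Repeated random sampling: draw many fresh samples of size n from the population.
    sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(reps)]
    # Bootstrapping: take ONE sample, then re-sample it with replacement many times.
    one_sample = rng.sample(population, n)
    resample_means = [statistics.mean(rng.choices(one_sample, k=n)) for _ in range(reps)]
    return statistics.stdev(sample_means), statistics.stdev(resample_means)


# Hypothetical population of 5000 heights in mm, invented for illustration only.
pop_rng = random.Random(42)
population = [pop_rng.gauss(50, 5) for _ in range(5000)]

sampling_sd, bootstrap_sd = sampling_vs_bootstrap(population, n=50, reps=1000)
print(f"SD of sample means:    {sampling_sd:.3f}")
print(f"SD of re-sample means: {bootstrap_sd:.3f}")
```

The two standard deviations typically come out close to each other, which is why the bootstrap distribution can stand in for the (usually unavailable) distribution of means from repeated sampling.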
We want to create a bootstrap confidence interval for the difference between mean heights of female doozers and mean heights of male doozers. • We want to create a bootstrap confidence interval for the difference in mean heights between the doozers chased by Sprocket and the doozers not chased by Sprocket
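One common way to build such an interval is a percentile bootstrap: re-sample each group with replacement, record the difference in means, repeat many times, and keep the central 95% of those differences. The Python sketch below assumes that approach; the heights are invented for illustration only, and `bootstrap_diff_ci` is a hypothetical helper, not a function from any particular teaching tool. Passing `stat=statistics.median` gives the median version used later in this section.

```python
import random
import statistics


def bootstrap_diff_ci(group_a, group_b, reps=1000, seed=0, stat=statistics.mean):
    """Central 95% percentile bootstrap interval for stat(group_a) - stat(group_b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Re-sample each group separately, with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(stat(resample_a) - stat(resample_b))
    # n=40 gives 39 cut points; the first is the 2.5th and the last the 97.5th percentile.
    cuts = statistics.quantiles(diffs, n=40, method="inclusive")
    return cuts[0], cuts[-1]


# Hypothetical heights in mm (made up, NOT the DCC sample).
female = [46.1, 48.3, 50.2, 47.5, 49.0, 51.4, 45.8, 48.9, 50.7, 47.2]
male = [47.0, 49.5, 48.1, 51.2, 46.9, 50.3, 49.8, 48.4, 52.0, 47.7]

lower, upper = bootstrap_diff_ci(male, female)
print(f"95% bootstrap CI for mean height (male minus female): ({lower:.2f}, {upper:.2f}) mm")
```

Re-sampling each group separately mirrors the way the original sample contained a fixed number of each group.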
Key Components of any Confidence Interval Interpretation • Strong link between the sample and the population • Level of uncertainty evident, e.g. pretty sure • Population parameter identified • Context clearly identified • “I’m pretty sure that the mean height difference between male and female doozers (male minus female) at Fraggle Rock is somewhere between -0.59 mm and 2.18 mm.”
Key Components of any Confidence Interval Interpretation – Justification • “I’m pretty sure that the mean height of male doozers at Fraggle Rock is somewhere between 0.59 mm shorter and 2.18 mm taller than the mean height of female doozers, because the bootstrap interval has both negative and positive values.”
“It is very likely that the mean height of the doozers not chased by Sprocket is somewhere between 1.05 mm and 4.28 mm more than the mean height of the doozers chased by Sprocket at Fraggle Rock.”
I can make the call that there is a difference between the median height of population A and the median height of population B if all values in the confidence interval are positive OR if all values in the confidence interval are negative. In addition, we can tell the direction of the difference by whether the confidence interval is completely positive or completely negative.
I cannot make the call that there is a difference between the median heights of population A and population B if the confidence interval contains both negative and positive values. • Refer to positive and negative values and NOT that zero is in the interval.
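The two rules above amount to a check on the signs of the interval's endpoints. A minimal Python sketch, assuming the interval is for the median of B minus the median of A and that `lower <= upper`; the function name `make_the_call` is hypothetical, and treating an endpoint of exactly zero as "no call" is an assumption (zero is neither positive nor negative).

```python
def make_the_call(lower, upper):
    """Apply the all-positive / all-negative rule to a CI for (B minus A)."""
    # Since lower <= upper, a positive lower endpoint means every value is positive.
    if lower > 0:
        return "B is bigger than A"
    # Likewise, a negative upper endpoint means every value is negative.
    if upper < 0:
        return "B is smaller than A"
    # Otherwise the interval is not all positive or all negative, so no call is made.
    return "no call: the values in the interval are not all positive or all negative"


print(make_the_call(0.86, 1.96))   # B is bigger than A
print(make_the_call(-0.19, 0.86))  # no call: ...
```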
I am pretty sure that the median height of population B lies somewhere between 0.86 and 1.96 units more than the median height of population A.
My call is… the median height of population B is bigger than the median height of population A. I can make this call because the sample median height for population B is (outside the overlap) greater than the sample upper quartile height for population A. I can also make this call because all the values in the bootstrap confidence interval (0.86, 1.96) are positive. I’m pretty sure the median height of population B is somewhere between 0.86 and 1.96 cm bigger than the median height of population A.
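The informal check used in this call (B's sample median lies above A's sample upper quartile) can be sketched in Python as below. Quartiles come from `statistics.quantiles` with its default method, which may differ slightly from the quartiles a given graphing tool draws, so treat the exact cut-off as an assumption. The helper name and data are hypothetical.

```python
import statistics


def median_b_above_upper_quartile_a(group_a, group_b):
    """True if the sample median of B is greater than the sample upper quartile of A."""
    # quantiles(..., n=4) returns [Q1, median, Q3]; only Q3 of A is needed here.
    _, _, q3_a = statistics.quantiles(group_a, n=4)
    return statistics.median(group_b) > q3_a


# Hypothetical samples, made up for illustration.
pop_a = list(range(10, 18))  # 10, 11, ..., 17
pop_b = list(range(15, 21))  # 15, 16, ..., 20

print(median_b_above_upper_quartile_a(pop_a, pop_b))  # True
```

This is only a quick visual check; the call itself still rests on the bootstrap confidence interval.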
It is likely that the median height of population B lies between 0.4 and 1.57 units more than the median height of population A.
My call is… the median height of population B is bigger than the median height of population A. I can make this call because the distance between the sample median heights is greater than about 1/3 of the overall visible spread. I can also make this call because all the values in the bootstrap confidence interval (0.40, 1.57) are positive. I’m pretty sure the median height of population B is somewhere between 0.40 and 1.57 cm bigger than the median height of population A.
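The "distance between medians versus overall visible spread" check can also be written as a short Python sketch. Here overall visible spread is assumed to run from the lower of the two lower quartiles to the higher of the two upper quartiles (the span covered by both boxes); if your course defines it differently, change that line. The 1/3 threshold is the "about 1/3" from the text, and the function names and data are hypothetical.

```python
import statistics


def dbm_over_ovs(group_a, group_b):
    """Distance between sample medians as a fraction of the overall visible spread."""
    q1_a, _, q3_a = statistics.quantiles(group_a, n=4)
    q1_b, _, q3_b = statistics.quantiles(group_b, n=4)
    dbm = abs(statistics.median(group_b) - statistics.median(group_a))
    # Assumed definition: span of both boxes, lowest Q1 to highest Q3.
    ovs = max(q3_a, q3_b) - min(q1_a, q1_b)
    return dbm / ovs


def can_make_call(group_a, group_b, threshold=1 / 3):
    """Informal rule: the call can be made if DBM/OVS exceeds the threshold."""
    return dbm_over_ovs(group_a, group_b) > threshold


# Hypothetical samples, made up for illustration.
pop_a = list(range(10, 18))  # 10, 11, ..., 17
pop_b = list(range(15, 21))  # 15, 16, ..., 20

print(dbm_over_ovs(pop_a, pop_b))  # 0.5
print(can_make_call(pop_a, pop_b))  # True
```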
My call is… the median height of population B is bigger than the median height of population A. I can make this call because all the values in the bootstrap confidence interval (0.13, 1.19) are positive. I’m pretty sure the median height of population B is somewhere between 0.13 and 1.19 cm bigger than the population median for A.
I am pretty sure that the median height of population B is somewhere between 0.19 units less and 0.86 units more than the median height of population A.
I am not prepared to make a call as to whether population B’s median height is bigger because there are both negative and positive values in the confidence interval (-0.19, 0.86). The median height of population B could be bigger than, smaller than, or even the same as the median height of population A.
I am not prepared to make a call about the difference between the median heights of populations A and B because there are both negative and positive values in the confidence interval (-0.52, 0.54). The median height of population B could be bigger than, smaller than, or even the same as the median height of population A.