Stats Questions We Are Often Asked
Stats questions we are often asked
• When can I use r and R²?
• When can I make a ‘causal-type’ claim?
• Why should I be careful with a media-reported margin of error?
• When can I say a confidence interval gives support to a claim?
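r – little r – what is it?
• r is the correlation coefficient between y and x
• r measures the strength of a linear relationship
• r is a multiple of the least-squares slope: $r = b_1 \, s_x / s_y$, where $b_1$ is the slope of the fitted line and $s_x$, $s_y$ are the standard deviations of x and y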
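As a quick check of the statement that r is a multiple of the slope, the sketch below computes r from its definition and compares it with b₁·s_x/s_y on a small made-up dataset. The helper names `correlation` and `ls_slope` are our own, chosen for illustration; only NumPy's standard functions are used.

```python
import numpy as np

def correlation(x, y):
    """Pearson correlation coefficient r between y and x, from its definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

def ls_slope(x, y):
    """Least-squares slope b1 of the fitted line y-hat = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    return np.sum(dx * (y - y.mean())) / np.sum(dx**2)

# Made-up illustrative data (not from the slides).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

r = correlation(x, y)
b1 = ls_slope(x, y)
sx, sy = np.std(x, ddof=1), np.std(y, ddof=1)   # sample standard deviations
print(r, b1 * sx / sy)                          # the two numbers agree
```

The agreement is exact (up to rounding) because both quantities reduce to the same ratio of sums of products; the choice of `ddof` cancels in s_x/s_y.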
r – when can it be used?
• Only use r if the scatter plot is linear
• Don’t use r if the scatter plot is non-linear!
[Figure: scatter plot of y versus x, labelled r = 0.99]
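Why look at the scatter plot before quoting r? The sketch below uses two made-up curved relationships: a quadratic over x = 1, …, 10 gives an r close to 1 even though the pattern is clearly curved, while a perfect U-shape over x = −5, …, 5 gives r = 0 even though y is completely determined by x. In neither case does r describe the relationship well.

```python
import numpy as np

# Curved (quadratic) relationships, made up for illustration.
x = np.arange(1, 11, dtype=float)      # x = 1, 2, ..., 10
y_curved = x**2                         # clearly non-linear, yet r is close to 1
r_curved = np.corrcoef(x, y_curved)[0, 1]

x_sym = np.arange(-5, 6, dtype=float)  # x = -5, ..., 5, symmetric about 0
y_u = x_sym**2                          # a perfect U-shape, yet r is 0
r_u = np.corrcoef(x_sym, y_u)[0, 1]

print(f"r for y = x^2 on 1..10:  {r_curved:.3f}")
print(f"r for y = x^2 on -5..5: {r_u:.3f}")
```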
r – what does it tell you?
• How close the points in the scatter plot come to lying on the line
[Figure: two scatter plots of y versus x, labelled r = 0.57 and r = 0.99]
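A small simulation makes the same point: two made-up datasets share the straight-line trend y = 2x, but one has little scatter about the line and the other a lot. The data, seed and noise levels are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)           # fixed seed so the run is repeatable
x = rng.uniform(0, 10, size=50)

# Same straight-line trend, different amounts of scatter about it (made-up data).
y_tight = 2 * x + rng.normal(scale=0.5, size=x.size)
y_loose = 2 * x + rng.normal(scale=10.0, size=x.size)

r_tight = np.corrcoef(x, y_tight)[0, 1]
r_loose = np.corrcoef(x, y_loose)[0, 1]
print(f"little scatter:  r = {r_tight:.2f}")   # points hug the line, r near 1
print(f"lots of scatter: r = {r_loose:.2f}")   # points spread out, r much smaller
```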
R² – big R² – what is it?
• R² is the coefficient of determination
• R² measures how close the points in the scatter plot come to lying on the fitted line or curve
[Figure: two scatter plots of y versus x with fitted relationships]
R² – big R² – when can it be used?
• When the scatter plot of y versus x is linear or non-linear
[Figure: two scatter plots of y versus x]
R² – what does it tell you?
[Figure: scatter plot with fitted line; a dotplot of the y’s shows the variation in the y’s, and a dotplot of the fitted values ŷ shows the variation in the ŷ’s]
R² – what does it tell you?
• The variation in the fitted values ŷ is the amount of variation that can be explained by the model:
$R^2 = \frac{\text{variation in fitted values}}{\text{variation in } y \text{ values}} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}$
• We see some additional variation in the y’s. The excess is not explained by the model.
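The ratio above can be computed directly. The sketch below fits a least-squares straight line to made-up data with `np.polyfit`, measures the variation in the fitted values and in the y values as sums of squares about ȳ, and checks that the ratio equals 1 − SSE/SST, the "excess" form, which holds for a least-squares fit that includes an intercept.

```python
import numpy as np

# Made-up data for illustration only.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([3.0, 4.8, 7.1, 8.9, 11.5, 12.6, 15.2, 16.8])

# Least-squares straight line: np.polyfit returns [slope, intercept] for deg=1.
b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean())**2)        # variation in the y values
ssr = np.sum((y_hat - y.mean())**2)    # variation in the fitted values
sse = np.sum((y - y_hat)**2)           # excess variation the model does not explain

r_squared = ssr / sst
print(f"R^2 = {r_squared:.3f}  ({100 * r_squared:.1f}% of the variation in y explained)")
print(f"1 - SSE/SST = {1 - sse / sst:.3f}")  # same value for a least-squares fit
```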
R² – what does it tell you?
• When expressed as a percentage, R² is the percentage of the variation in Y that our regression model can explain
• R² near 100%: model fits well
• R² near 0%: model doesn’t fit well
R² – what does it tell you?
• 90% of the variation in Y is explained by our regression model
[Figure: scatter plot of y versus x with fitted line, R² = 90%]
R² – pearls of wisdom!
• R² and r² have the same value ONLY when using a linear model
• DON’T use R² to pick your model
• Use your eyes!
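Both pearls can be checked numerically. In the sketch below the simulated data have a straight-line trend (an assumption of the example). For the straight-line fit R² equals r²; for least-squares polynomial fits, adding terms can never lower R², so a wiggly degree-5 curve scores at least as well as the straight line. That is why R² on its own cannot pick the model. The helper `r_squared` is our own name.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST for a least-squares fit that includes an intercept."""
    y = np.asarray(y, dtype=float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y = 1.5 * x + rng.normal(scale=2.0, size=x.size)   # truly a straight-line trend

r = np.corrcoef(x, y)[0, 1]
r2 = {}
for degree in (1, 2, 5):
    fitted = np.polyval(np.polyfit(x, y, deg=degree), x)
    r2[degree] = r_squared(y, fitted)
    print(f"degree {degree}: R^2 = {r2[degree]:.4f}")

print(f"r^2 = {r**2:.4f}")   # matches R^2 for the degree-1 (straight-line) fit only
```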
NZ Herald (04/10/2005): “Damaged for life by too much TV”
Damaged for life by too much TV: causal relationship?
[Figure: scatter plot of Health Score versus TV watching, r = −0.93]
Causal relationships
• Two general types of studies: experiments and observational studies
• In an experiment, the experimenter determines which experimental units receive which treatments.
• In an observational study, we simply compare units that happen to have received different levels of the factor of interest.
Causal relationships
• Only well-designed and carefully executed experiments can reliably demonstrate causation.
• An observational study is often useful for identifying possible causes of effects, but it cannot reliably establish causation.
Causal relationships – Summary
• In observational studies, strong relationships are not necessarily causal relationships.
• Correlation does not imply causation.
• Be aware of the possibility of lurking variables.
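A simulation shows how a lurking variable can produce a strong correlation with no causal link. In the sketch below a hypothetical lurking variable z drives both x and y, and y is generated with no dependence on x at all, yet r between x and y comes out strongly negative. Removing the part of each variable explained by z (a least-squares fit on z) leaves essentially no correlation. This is a generic illustration; it makes no claim about what lies behind the TV and health data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Hypothetical lurking variable z.
z = rng.normal(size=n)

# x and y each depend on z, but y is generated with NO dependence on x.
x = z + rng.normal(scale=0.5, size=n)
y = -z + rng.normal(scale=0.5, size=n)

r = np.corrcoef(x, y)[0, 1]
print(f"r between x and y: {r:.2f}")        # strongly negative, yet x has no effect on y

# Remove the part of x and y explained by z, then correlate what is left.
x_res = x - np.polyval(np.polyfit(z, x, 1), z)
y_res = y - np.polyval(np.polyfit(z, y, 1), z)
r_given_z = np.corrcoef(x_res, y_res)[0, 1]
print(f"r after allowing for z: {r_given_z:.2f}")   # close to 0
```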
Margin of Error
• Sunday Star Times (n = 540): National 44%, Labour 37.2%, NZ First 4.7%; margin of error 4.4%
• Herald on Sunday (n = 400): Labour 42%, National 38.5%, NZ First 5.5%; margin of error 4.9%
Margin of Error
• Confidence interval: estimate ± margin of error
• Herald on Sunday (n = 400): Labour 42%, National 38.5%, NZ First 5.5%; margin of error 4.9%
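Applying "estimate ± margin of error" to the Herald on Sunday figures, taking the reported 4.9% at face value, gives the intervals below. The arithmetic is simple, but spelling it out shows the range of believable support for each party.

```python
# Reported figures from the Herald on Sunday poll (percentages).
estimates = {"Labour": 42.0, "National": 38.5}
margin_of_error = 4.9  # reported margin of error, in percentage points

intervals = {}
for party, est in estimates.items():
    intervals[party] = (est - margin_of_error, est + margin_of_error)
    lower, upper = intervals[party]
    print(f"{party}: {lower:.1f}% to {upper:.1f}%")
```

This prints 37.1% to 46.9% for Labour and 33.6% to 43.4% for National.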
Margin of Error
Survey errors come in two kinds: sampling error and nonsampling errors.
Sampling error
• caused by the act of sampling
• has the potential to be bigger in smaller samples
• we can determine how large it can be: the margin of error
• unavoidable (the price of sampling)
Nonsampling errors
• e.g., nonresponse bias, behavioural, . . .
• can be much larger than sampling errors
• impossible to correct for after completion of the survey
• impossible to determine how badly they affect results
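Margin of Error
• Approx. 95% confidence interval for p: $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$
• Margin of error (single proportion): $1.96\sqrt{\hat{p}(1-\hat{p})/n}$, which is largest when $\hat{p} = 0.5$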
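The sketch below assumes the usual normal-approximation margin of error for a single proportion, 1.96·√(p̂(1 − p̂)/n). At p̂ = 0.5 and n = 400 it gives exactly the Herald on Sunday’s 4.9%. For n = 540 it gives about 4.2%, a little below the 4.4% the Sunday Star Times reported; the slides do not say how that figure was obtained. The margin also shrinks as n grows, matching "bigger in smaller samples", and is much smaller for a small party’s share than the headline figure suggests. The function name `margin_of_error` is our own.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a single sample proportion."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# The largest margin occurs at p_hat = 0.5; polls usually quote this value.
print(f"n = 400, p = 0.5:   {100 * margin_of_error(0.5, 400):.1f}%")
print(f"n = 540, p = 0.5:   {100 * margin_of_error(0.5, 540):.1f}%")
# A small party's share (NZ First, 5.5%) has a much smaller margin.
print(f"n = 400, p = 0.055: {100 * margin_of_error(0.055, 400):.1f}%")
```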
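Bank Dissatisfaction Scores – 95% CIs
• μC – μA: 0.5 to 20.7
• μA – μW: –9.8 to 6.6
(μC, μA and μW are the mean dissatisfaction scores for Canterbury, Auckland and Wellington customers.)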
μC – μA: With 95% confidence, the mean dissatisfaction score for Canterbury customers is somewhere between 0.5 and 20.7 larger than the mean dissatisfaction score for Auckland customers.
μA – μW: With 95% confidence, the mean dissatisfaction score for Auckland customers is somewhere between 9.8 less than and 6.6 greater than the mean dissatisfaction score for Wellington customers.
μA – μW (–9.8 to 6.6): Does this confidence interval support the proposition that there is a difference between the two population means (μA – μW ≠ 0)?
• No, it doesn’t support the proposition. Since 0 is in the confidence interval, 0 is a believable value for the difference. There could be no difference between the two means.
μA – μW (–9.8 to 6.6): Does this confidence interval support the proposition that there is NO difference between the two population means (μA – μW = 0)?
• No, it doesn’t support the proposition. Since there are non-zero numbers in the interval, μA – μW could be non-zero: there could be a difference.
μC – μA (0.5 to 20.7): Does this confidence interval support the proposition that there is a difference between the two population means (μC – μA ≠ 0)?
• Yes, it does support the proposition. Since zero is not in the interval, it is not believable that the difference is zero: no difference between the means is not believable.
μC – μA (0.5 to 20.7): Does this confidence interval support the proposition that there is NO difference between the two population means (μC – μA = 0)?
• No, it doesn’t support the proposition. In fact, it provides evidence against it: 0 is not in the interval, so no difference between the means is not believable.
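The reasoning on these slides reduces to one check: is 0 inside the interval for the difference? The hypothetical helper below encodes it. An interval can support "there is a difference" (0 outside it), but it never supports "there is no difference", because any interval that contains 0 also contains non-zero values.

```python
def interpret_difference_ci(lower, upper):
    """Describe what a CI for a difference of means (mu1 - mu2) supports.

    A CI can support "there is a difference" (when 0 is outside it),
    but never "there is no difference".
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 0 or upper < 0:
        return "0 is not in the interval: supports a difference"
    return "0 is in the interval: a zero difference is believable, so no claim of a difference"

# The two bank intervals from the slides.
print("mu_C - mu_A:", interpret_difference_ci(0.5, 20.7))
print("mu_A - mu_W:", interpret_difference_ci(-9.8, 6.6))
```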