AP Statistics 4.2 Cautions about Correlation and Regression
Learning Objective: • Understand Causation • Differentiate between causation, common response, and confounding variables
Correlation and regression describe the relationship between two variables, but they have limitations: • Correlation and regression describe only linear relationships. • The correlation r and the least-squares regression line are not resistant: one influential observation or incorrectly entered data point can greatly change these measures.
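Both limitations can be checked numerically. The `pearson_r` helper and the small data sets below are illustrative assumptions, not part of the original slides. A perfect but curved relationship (y = x²) gives r = 0, and a single mis-entered point drags r for a perfectly linear data set from 1 down to a negative value.

```python
def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# 1) Only linear: y = x**2 is a perfect (curved) relationship, yet r = 0.
xs_curved = [-2, -1, 0, 1, 2]
ys_curved = [x ** 2 for x in xs_curved]
r_curved = pearson_r(xs_curved, ys_curved)

# 2) Not resistant: points exactly on y = 2x, then one incorrectly entered point.
xs_clean = [1, 2, 3, 4, 5]
ys_clean = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs_clean, ys_clean)
r_outlier = pearson_r(xs_clean + [6], ys_clean + [-20])  # hypothetical typo: -20 instead of 12

print(f"curved: r = {r_curved:.3f}")
print(f"clean: r = {r_clean:.3f}; with one bad point: r = {r_outlier:.3f}")
```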
Other things to keep in mind: • Extrapolation: using the regression line to make predictions for x values outside the range of the data used to fit it. • Lurking variables: variables not included in the study that affect the relationship between the two variables being studied.
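One way to make the extrapolation warning concrete is to refuse predictions outside the x values the line was fit on. This is a minimal sketch assuming a simple least-squares fit; `fit_lsrl`, `predict`, and the toy data are hypothetical. Raising an error is a deliberately strict design choice, and a real analysis might only issue a warning.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return a, b


def predict(a, b, x, x_min, x_max):
    """Predict y-hat at x, but refuse to extrapolate outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the observed range [{x_min}, {x_max}]: extrapolation"
        )
    return a + b * x


# Hypothetical toy data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_lsrl(xs, ys)
print(predict(a, b, 3, min(xs), max(xs)))  # inside the data range
try:
    predict(a, b, 40, min(xs), max(xs))    # far outside the data range
except ValueError as err:
    print("refused:", err)
```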
Ex 1: Studies show that men who complain of chest pain are more likely to get detailed tests and aggressive treatment such as bypass surgery than are women with similar complaints. Is this association between gender and treatment due to discrimination?
Ex 2: A study of housing conditions in the city measured a large number of variables for each of the wards in the city. Two of the variables were a measure x of overcrowding and a measure y of the lack of indoor toilets. Because x and y are both measures of inadequate housing, we expect a high correlation. In fact, the correlation was only r = 0.08. How can this be?
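Ex 2 can be explored with a toy model. One possible explanation, assumed here rather than taken from the slides, is a lurking variable such as the type of housing in each ward: if one type of ward tends to be more overcrowded while the other tends to lack more indoor toilets, pooling the wards can hide a strong relationship within each type. The ward values below are invented for illustration.

```python
def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical wards of two housing types (the assumed lurking variable).
# Within each type, more overcrowding (x) goes with more missing toilets (y).
type_a_x, type_a_y = [4, 5, 6, 7], [1, 2, 3, 4]  # more crowded, fewer missing toilets
type_b_x, type_b_y = [2, 3, 4, 5], [4, 5, 6, 7]  # less crowded, more missing toilets

r_a = pearson_r(type_a_x, type_a_y)
r_b = pearson_r(type_b_x, type_b_y)
r_all = pearson_r(type_a_x + type_b_x, type_a_y + type_b_y)  # pooled, type ignored

print(f"type A: r = {r_a:.2f}, type B: r = {r_b:.2f}, all wards: r = {r_all:.2f}")
```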
Ex 3: The math department of a university must plan the number of sections of elementary courses. The department wants to see whether it can predict this from the number of first-year students, which is already known.
Year  1993  1994  1995  1996  1997  1998  1999  2000
x     4595  4827  4427  4258  3995  4330  4265  4351
y     7364  7547  7099  6894  6572  7156  7232  7450
• (x = the number of first-year students; y = the number of students who enroll in elementary courses)
• Why would we have reservations about using these data to make predictions?
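The calculations behind this example can be sketched in plain Python using the values from the table. The code computes r and the least-squares line by hand, reports the observed range of x (a prediction for an enrollment outside it would be extrapolation), and prints the residuals in year order, since these data were collected over time and year could act as a lurking variable. Variable names are illustrative choices.

```python
years = list(range(1993, 2001))
x = [4595, 4827, 4427, 4258, 3995, 4330, 4265, 4351]  # first-year students
y = [7364, 7547, 7099, 6894, 6572, 7156, 7232, 7450]  # elementary-course enrollment

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

r = sxy / (sxx * syy) ** 0.5  # correlation coefficient
b = sxy / sxx                 # slope of the least-squares line
a = mean_y - b * mean_x       # intercept: the line passes through (mean_x, mean_y)

print(f"r = {r:.3f}")
print(f"y-hat = {a:.1f} + {b:.4f}x")
print(f"observed x range: {min(x)} to {max(x)}")

# Residuals in time order; a steady drift with year would suggest
# that time is acting as a lurking variable.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
for year, res in zip(years, residuals):
    print(year, round(res, 1))
```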
Using averaged data • Many regression or correlation studies work with averages or other measures that combine information from many individuals. • ***Do not apply the results of such studies to individuals. • Ex: Relationship between outside temperature and natural gas consumption.
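A small simulation suggests why results based on averages should not be applied to individuals: averaging removes much of the individual-to-individual scatter, so the correlation between group averages is typically much stronger than the correlation among the individuals. The model (10 groups, 50 individuals each, y = x plus normal noise) is a hypothetical assumption, not the temperature and gas data mentioned on the slide.

```python
import random


def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical model: groups x = 0..9, 50 individuals per group,
# each individual's y = x + normal noise with standard deviation 5.
xs_ind, ys_ind, group_means = [], [], []
for x in range(10):
    ys = [x + random.gauss(0, 5) for _ in range(50)]
    xs_ind.extend([x] * len(ys))
    ys_ind.extend(ys)
    group_means.append(sum(ys) / len(ys))

r_individual = pearson_r(xs_ind, ys_ind)
r_averaged = pearson_r(list(range(10)), group_means)
print(f"individuals: r = {r_individual:.2f}; group averages: r = {r_averaged:.2f}")
```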
Ex: A study shows a positive correlation between the size of a hospital (measured by its number of beds, x) and the median number of days, y, that a patient remains in the hospital. Does this mean that you can shorten a hospital stay by choosing a small hospital?
The Question of Causation • In studies of the relationship between two variables, the goal is to show that changes in the response variable are caused by changes in the explanatory variable. • Even when there is a strong association, the conclusion that this is due to a causal link between the two variables is often elusive.
1- Common response: an outside variable z affects both x and y. • 2- Confounding: z is a confounding variable. We don’t know whether the change in y was due to x or to z.
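Common response can be simulated directly. In the hypothetical model below (an assumption for illustration), a lurking variable z drives both x and y, and x has no effect on y at all; x and y are still strongly correlated. Subtracting z leaves only independent noise, whose correlation is near zero. That subtraction works only because the model was built as z plus noise; with real data the effect of z is not known exactly.

```python
import random


def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


random.seed(2)  # fixed seed so the run is reproducible
n = 1000
z = [random.gauss(0, 1) for _ in range(n)]     # lurking variable
x = [zi + random.gauss(0, 0.5) for zi in z]    # x depends only on z
y = [zi + random.gauss(0, 0.5) for zi in z]    # y depends only on z, not on x

r_xy = pearson_r(x, y)
# Remove z's contribution (possible here only because we built the model).
r_without_z = pearson_r([xi - zi for xi, zi in zip(x, z)],
                        [yi - zi for yi, zi in zip(y, z)])

print(f"r(x, y) = {r_xy:.2f}; after removing z: r = {r_without_z:.2f}")
```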
Explaining association: causation • A change in x causes a change in y.
How do we explain this? • The number of times you brush your teeth is a confounding variable. We don’t know whether the number of cavities you have is due to eating a lot of apples or to brushing your teeth a lot.
Establishing causation • How can a direct causal link between x and y be established? Do an experiment. Experiments control the effects of lurking variables.
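Random assignment is one standard way an experiment keeps lurking variables from lining up with the treatment: on average, the groups are alike in every respect except the treatment they receive. A minimal sketch of assigning subjects to two groups, where `randomly_assign` and the subject labels are hypothetical names:

```python
import random


def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into (treatment, control) of nearly equal size."""
    rng = random.Random(seed)  # separate generator so the seed affects only this call
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical subject labels
treatment, control = randomly_assign(subjects, seed=42)
print("treatment:", treatment)
print("control:  ", control)
```

Passing a seed makes the assignment reproducible; omitting it gives a fresh randomization each run.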