Chapter 4: More on Two Variable Data. Sec. 4.2 – Cautions about Correlation and Regression. Recall from chapter 3: that correlation and regression describe only linear relationships, and that correlation and the LSRL are not resistant.
Chapter 4: More on Two Variable Data Sec. 4.2 – Cautions about Correlation and Regression
Cautions about Correlation and Regression • Recall from chapter 3: • That correlation and regression describe only linear relationships • That correlation and the LSRL are not resistant • One influential point or incorrectly entered data point can completely change the correlation and the regression line. • Always plot your data before interpreting regression or correlation
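To make the "not resistant" caution concrete, here is a minimal sketch in plain Python. The data values are invented for illustration and are not from the textbook; the last point imitates a data-entry error (1.4 typed where 14.0 was meant). The helpers `correlation` and `lsrl_slope` are hypothetical names written out for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def lsrl_slope(xs, ys):
    """Slope of the least-squares regression line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data with a nearly perfect linear pattern.
x_clean = [1, 2, 3, 4, 5, 6]
y_clean = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

# Same data plus one mis-entered point (assumed typo: 1.4 instead of 14.0).
x_bad = x_clean + [7]
y_bad = y_clean + [1.4]

r_clean = correlation(x_clean, y_clean)
r_bad = correlation(x_bad, y_bad)
slope_clean = lsrl_slope(x_clean, y_clean)
slope_bad = lsrl_slope(x_bad, y_bad)

print(f"clean data:     r = {r_clean:.3f}, slope = {slope_clean:.3f}")
print(f"with bad point: r = {r_bad:.3f}, slope = {slope_bad:.3f}")
```

A single bad point out of seven is enough to pull both r and the slope far from their original values, which is why a scatterplot should come before any interpretation.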
Extrapolation • Extrapolation is the use of a regression line or curve to predict far outside the range of values of the explanatory variable x that you used to obtain it. • Such predictions are often not accurate. • Example • Suppose that you have data on a child’s growth between the ages of 3 and 8. You find a strong linear relationship between age x and height y. If you fit a regression line to these data and use it to predict the child’s height at age 25, you would predict a height of about 8 feet. • Don’t stray far from the range of x that actually appears in your data
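The growth example can be sketched numerically. The heights below are invented for illustration (the slides give no data), and `fit_line` is a hypothetical helper; the point is only that a line fitted on ages 3 to 8 gives an absurd answer at age 25.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical heights (inches) of one child at ages 3 through 8.
ages = [3, 4, 5, 6, 7, 8]
heights_in = [37.4, 40.1, 42.4, 45.2, 47.3, 50.1]

intercept, slope = fit_line(ages, heights_in)

# Prediction inside the observed age range: reasonable.
predicted_at_6 = intercept + slope * 6

# Prediction far outside the observed age range: extrapolation.
predicted_at_25 = intercept + slope * 25
predicted_feet = predicted_at_25 / 12

print(f"growth rate: {slope:.2f} inches per year")
print(f"predicted height at age 6:  {predicted_at_6:.1f} in")
print(f"predicted height at age 25: {predicted_feet:.1f} ft")
```

The line describes ages 3 to 8 well, but it assumes the same growth rate continues for another 17 years, which is where the prediction fails.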
Lurking Variables • Sometimes the relationship between two variables is influenced by other variables that we did not measure or even think about. • A lurking variable is a variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables. • A lurking variable can falsely suggest a strong relationship between x and y, or it can hide a relationship that is really there.
Lurking Variables • Because lurking variables are often unrecognized and unmeasured, detecting their effect is a challenge. • Many lurking variables change systematically over time. • One method of detecting whether time has an influence is to plot both the residuals and the response variable against the time order of the observations, if it is available.
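A rough numeric version of the residuals-against-time check is sketched below. The data are constructed for illustration: the response is built as 2x plus a steady drift of 0.5 per time step, standing in for a lurking variable that changes over time. In practice you would plot the residuals against time order; here the correlation between residuals and time serves as a stand-in for the plot. All names are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(xs, ys):
    """Residuals (observed minus predicted) from the LSRL of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Observations listed in the order they were collected.
time_order = list(range(10))
x = [5, 1, 4, 2, 3, 5, 1, 4, 2, 3]

# Assumed mechanism: y depends on x AND drifts upward over time.
y = [2 * xi + 0.5 * t for xi, t in zip(x, time_order)]

resid = residuals(x, y)
r_time = correlation(time_order, resid)

print(f"correlation of residuals with time order: {r_time:.3f}")
```

Residuals from a well-specified regression should show no pattern over time; a strong trend like this one suggests that something changing over time is influencing the response.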
The Question of Causation • In many studies of the relationship between two variables, the goal is to establish that changes in the explanatory variable cause changes in the response variable. • Even when a strong association is present, the conclusion that this association is due to a causal link between the variables is often elusive.
Explaining Association • Strong associations can generally be explained by one of three relationships. • 1. Causation • 2. Common Response • 3. Confounding • Variables x and y show a strong association (dashed line). This association may be the result of any of several causal relationships (solid arrow).
Explaining Association • Causation: • x causes y • Common Response: • x and y are reacting to a lurking variable z • Confounding: • x may cause y, but y may instead be caused by a confounding variable z
Causation • Causation is not easily established. • The best evidence for causation comes from experiments that change x while holding all other factors fixed. • Even a very strong association between two variables is not by itself good evidence that there is a cause-and-effect link between the variables.
Examples of Direct Causation • The following relationships are examples of direct causation, but “causation” is not a simple idea. • Refer to p. 233 for explanations • 1. x = mother’s BMI, y = daughter’s BMI • 2. x = amount of saccharin in a rat’s diet, y = count of tumors in the rat’s bladder
Common Response • Beware of lurking variables when thinking about an association between two variables. • The observed association between the variables x and y is explained by a lurking variable z. Both x and y change in response to changes in z. • This common response creates an association even though there may be no direct causal link between x and y.
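Common response can be illustrated with constructed numbers. In the sketch below, x and y are each built from z plus a small unrelated wiggle, with no link between x and y themselves. The data and helper names are hypothetical. The check removes the linear effect of z from both variables and then correlates what is left over.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation r of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(xs, ys):
    """Residuals from the least-squares line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Lurking variable z (assumed to be measured here for illustration only).
z = [1, 2, 3, 4, 5, 6, 7, 8]

# Small wiggles that are unrelated to each other.
wiggle_x = [1, -1, 1, -1, 1, -1, 1, -1]
wiggle_y = [1, 1, -1, -1, 1, 1, -1, -1]

# Both x and y respond to z; neither affects the other.
x = [2 * zi + w for zi, w in zip(z, wiggle_x)]
y = [3 * zi + w for zi, w in zip(z, wiggle_y)]

r_xy = correlation(x, y)

# Remove the linear effect of z from x and from y, then correlate the rest.
r_xy_given_z = correlation(residuals(z, x), residuals(z, y))

print(f"r between x and y:              {r_xy:.3f}")
print(f"r after removing the effect of z: {r_xy_given_z:.3f}")
```

The strong association between x and y nearly disappears once z is accounted for, which is the signature of common response. In real studies the lurking variable is usually unmeasured, so this adjustment is often not possible.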
Examples of Common Response • The following relationships are examples of how common response can create an association. • Refer to p. 233 for explanations • 3. x = a high school senior’s SAT score, y = the student’s first-year college GPA • 4. x = monthly flow of money into stock mutual funds, y = monthly rate of return for the stock market
Confounding • Two variables are confounded when their effects on a response variable cannot be distinguished from each other. The confounded variables may be either explanatory variables or lurking variables. • Confounding of several variables often prevents us from drawing conclusions about causation.
Examples of Confounding • The following relationships are examples of confounding. • Refer to p. 234 for explanations • 5. x = whether a person regularly attends religious services, y = how long the person lives • 6. x = the number of years of education a worker has, y = the worker’s income