Section 3.4 (Agresti & Franklin)

yuval

Presentation Transcript


  1. Section 3.4 (Agresti & Franklin) What are some cautions in Analyzing Associations?

  2. Extrapolation is dangerous Example: Mean temperature in the United States, 1895 - 2003

  3. Extrapolation is Dangerous • Extrapolation refers to using a regression line to predict y-values for x-values outside the observed range of data. • The further we move from the observed data, the riskier extrapolation becomes.

  4. Mean U.S. temperatures 1895-2002

  5. With trendline

  6. Regression Equation
     Regression Analysis: Temperature versus Year
     The regression equation is Temperature = 33.3 + 0.0100 Year

     Predictor  Coef      SE Coef   T     P
     Constant   33.257    4.645     7.16  0.000
     Year       0.010024  0.002383  4.21  0.000

     S = 0.772194   R-Sq = 14.3%   R-Sq(adj) = 13.5%
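The fitted line in the regression output above can be turned into a small prediction function. The sketch below is only an illustration: the coefficients are copied from the output, while the function name `predict_temperature` and the range check are assumptions added here, and the observed range is taken to be 1895-2002 as on slide 4.

```python
# Coefficients copied from the regression output (Temperature versus Year).
INTERCEPT = 33.257
SLOPE = 0.010024  # degrees per year

# Observed range of x, assumed from the slide titles (1895-2002).
FIRST_YEAR, LAST_YEAR = 1895, 2002


def predict_temperature(year):
    """Return (predicted temperature, True if the year is an extrapolation)."""
    prediction = INTERCEPT + SLOPE * year
    is_extrapolation = not (FIRST_YEAR <= year <= LAST_YEAR)
    return prediction, is_extrapolation


for year in (1950, 2002, 2100, 3000):
    temp, extrapolated = predict_temperature(year)
    label = "extrapolation" if extrapolated else "within observed range"
    print(f"{year}: {temp:.2f} degrees ({label})")
```

The arithmetic is identical inside and outside the observed range; the flag is the only reminder that the data say nothing about whether the line still holds in 2100 or 3000.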

  7. All models are wrong, but some models are useful. • George Box, born in England, Professor Emeritus, University of Wisconsin (wife is daughter of Sir Ronald Fisher, preeminent statistician of the 20th century.) • At one time was visiting professor at UNC-Chapel Hill

  8. Temp graph: Broken into 3 pieces • 1895 – 1929 • Increasing at about the mean rate (0.01 degree per year), though the trend is arguably indistinguishable from no trend (P = 0.386) • 1930 – 1976 • Decreasing at about double the mean rate (0.02 degree per year) • 1977 – 2002 • Increasing at about six times the mean rate (0.06 degree per year)

  9. 1895 - 1929

  10. 1895 – 1929
      Regression Analysis: temp4 versus year4
      The regression equation is temp4 = 31.5 + 0.0109 year4

      Predictor  Coef     SE Coef  T     P
      Constant   31.50    23.67    1.33  0.192
      year4      0.01088  0.01238  0.88  0.386

      S = 0.739710   R-Sq = 2.3%   R-Sq(adj) = 0.0%

  11. 1930 - 1976

  12. 1930 - 1976
      Regression Analysis: temp2 versus year2
      The regression equation is temp2 = 98.4 - 0.0233 year2

      Predictor  Coef       SE Coef   T      P
      Constant   98.43      12.57     7.83   0.000
      year2      -0.023314  0.006436  -3.62  0.001

      S = 0.598467   R-Sq = 22.6%   R-Sq(adj) = 20.9%

  13. 1977 - 2002

  14. 1977 - 2002
      Regression Analysis: temp3 versus year3
      The regression equation is temp3 = - 66.9 + 0.0604 year3

      Predictor  Coef     SE Coef  T      P
      Constant   -66.89   41.12    -1.63  0.117
      year3      0.06038  0.02067  2.92   0.007

      S = 0.790427   R-Sq = 26.2%   R-Sq(adj) = 23.2%
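In each regression output, the T column is the coefficient divided by its standard error. The sketch below recomputes that ratio for the four slopes reported in this deck; the numbers are copied from the outputs, and the helper name `t_ratio` is an assumption added for illustration.

```python
# (period, slope, standard error, T as reported) copied from the outputs above.
FITS = [
    ("1895-2002", 0.010024, 0.002383, 4.21),
    ("1895-1929", 0.01088, 0.01238, 0.88),
    ("1930-1976", -0.023314, 0.006436, -3.62),
    ("1977-2002", 0.06038, 0.02067, 2.92),
]


def t_ratio(coef, se):
    """T statistic for testing whether a coefficient differs from zero."""
    return coef / se


for period, coef, se, reported in FITS:
    t = t_ratio(coef, se)
    print(f"{period}: slope={coef:+.5f}  T={t:+.2f}  (reported {reported:+.2f})")
```

Read side by side, the slopes differ in sign and size from one period to the next, and the 1895-1929 slope is small relative to its standard error.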

  15. Extrapolation is dangerous • Some periods arguably have no trend, even though the average is changing • Some periods go down • Some periods go up • The periods chosen are up to the analyst • We never know when these trends change • Our models must be viewed with a question mark, though they give us food for thought. What if we really are rising at 0.06 degree per year? 100 years = 6 degrees, 1000 years = 60 degrees!!
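The "100 years = 6 degrees" arithmetic can be repeated with each period's slope to see how much the projection depends on which period the analyst picks. This is a minimal sketch: the slopes are copied from the regression outputs in this deck, and `projected_change` is a hypothetical helper that simply assumes the slope holds unchanged.

```python
# Slopes (degrees per year) copied from the regression outputs in this deck.
SLOPES = {
    "1895-2002 (all data)": 0.010024,
    "1895-1929": 0.01088,
    "1930-1976": -0.023314,
    "1977-2002": 0.06038,
}


def projected_change(slope, years):
    """Change in mean temperature if the slope held for `years` more years."""
    return slope * years


for period, slope in SLOPES.items():
    change_100 = projected_change(slope, 100)
    change_1000 = projected_change(slope, 1000)
    print(f"{period}: {change_100:+.1f} degrees in 100 years, "
          f"{change_1000:+.1f} in 1000 years")
```

The projections range from a fall of about 2 degrees to a rise of about 6 degrees per century, which is the point of the slide: the answer depends on a choice the data cannot make for us.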

  16. Influential Outliers

  17. Be cautious of influential outliers • An observation is influential when it has a large effect on the results of a linear regression, such as the slope of the fitted line • Two conditions must both hold for an observation to be influential: • Its x-value is relatively low or high compared with the rest of the data • The observation is a regression outlier, falling far from the trend that the rest of the data follow
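The two conditions can be checked numerically. The sketch below uses made-up data (not from the slides) lying close to y = x, then adds one outlier near the middle of the x range and, separately, one outlier with an extreme x-value. The function `least_squares` is a plain textbook implementation written for this illustration.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data close to the line y = x.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

_, base_slope = least_squares(xs, ys)

# Outlier in y, but with an x-value at the centre of the data.
_, mid_slope = least_squares(xs + [3.5], ys + [10.0])

# Outlier with an x-value far above the rest of the data.
_, far_slope = least_squares(xs + [20], ys + [0.0])

print(f"no outlier:          slope = {base_slope:.3f}")
print(f"outlier at x = 3.5:  slope = {mid_slope:.3f}")
print(f"outlier at x = 20:   slope = {far_slope:.3f}")
```

The outlier at the centre of the x range shifts the line upward but leaves the slope unchanged, while the outlier at x = 20 turns a clearly positive slope negative.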

  18. Example: • Regression by eye applet – watch the effect when the influential point is moved about. • Contrast with other outliers that are not influential.

  19. Correlation does not imply causation And types of causes

  20. Causes are of different types • Necessary cause: a condition that must be present for an event to take place (the key must be in the ignition for the car to start) • Sufficient cause: a condition that is enough by itself to cause an event to take place (if you get sore every time you lift weights, then lifting is a sufficient cause for getting sore, but it’s not a necessary cause because racquetball can also cause you to get sore).

  21. Cause types (cont) • Necessary and sufficient (these are rare). • If the cause is present, the event happens, and if it is not present, the event does not happen. • Contributory cause • If a cause is a contributory cause of an event, then the event is more likely to occur when the contributory cause is present. • These are the causes we most often see in human activity

  22. Frequent errors • Inferring that a cause is the single, simple cause of an effect when in fact it is only a contributory cause • Overlooking confounding variables: two explanatory variables are confounded when both are associated with the response and also with each other, so their separate effects cannot be told apart • Overlooking lurking variables. Example: hormone replacement therapy (HRT) and heart disease.

  23. Lurking variables • A lurking variable is a variable, known or unknown, that is left out of the analysis but is correlated with the variables in question. • When both the response and the explanatory variable correlate with a lurking variable, the response may correlate with the explanatory variable even if neither affects the other. • Take away: correlation between two variables DOES NOT mean those variables are directly connected, and certainly does not imply causation.

  24. Do We Really Know What Makes Us Healthy? • Example: New York Times article on epidemiology and hormone replacement therapy (HRT) for women • 1985: the Nurses’ Health Study, run out of the Harvard Medical School and the Harvard School of Public Health, reported that women taking estrogen had only a third as many heart attacks as women who had never taken the drug. • Inference drawn: women are protected from heart attacks until they pass through menopause, because estrogen bestows the protection • This became the basis of the therapeutic wisdom for the next 17 years.

  25. New York Times article on HRT (cont) • The Women’s Health Initiative concluded in 2002 that H.R.T. caused far more harm than good • Why? Healthy-user bias (a lurking variable) • http://www.nytimes.com/2007/09/16/magazine/16epidemiology-t.html

  26. NYT (cont): Healthy User Bias • the problem is that people who faithfully engage in activities that are good for them — taking a drug as prescribed, for instance, or eating what they believe is a healthy diet — are fundamentally different from those who don’t. One thing epidemiologists have established with certainty, for example, is that women who take H.R.T. differ from those who don’t in many ways, virtually all of which associate with lower heart-disease risk: they’re thinner; they have fewer risk factors for heart disease to begin with; they tend to be more educated and wealthier; to exercise more; and to be generally more health conscious.

  27. Discovery of the lurking variable(s) • In 1987, Diana Petitti, an epidemiologist now at the University of Southern California, reported that she, too, had detected a reduced risk of heart-disease deaths among women taking H.R.T. in the Walnut Creek Study, a population of 16,500 women. When Petitti looked at all the data, however, she “found an even more dramatic reduction in death from homicide, suicide and accidents.” With little reason to believe that estrogen would ward off homicides or accidents, Petitti concluded that something else appeared to be “confounding” the association she had observed. “The same thing causing this obvious spurious association might also be contributing to the lower risk of coronary heart disease,” Petitti says today.

  28. Frequent errors (cont) • Inferring that a correlation of event A with event B means that event A CAUSES event B • Another missing (lurking) variable that correlates with both variables may be the culprit • One variable may be the cause and the other the effect, but sometimes the wrong variable is chosen for the cause.

  29. Which is the cause? • In 1985, women taking HRT were reported to have lower heart attack rates • 17 years of HRT followed, arguably causing tens of thousands of deaths among American women • Subsequent research found that HRT increased the risk of heart disease • Lurking variables were found • The most sophisticated statistics could not make up for the lack of judgement by researchers (who were notably qualified) • Clinical trials (2002) discredited HRT as a way to reduce heart attacks in women

  30. Another (short) example: • A recent article: Many agree that the decline of religion may be a cause of the decline of the family. But what if it’s the other way around? Mary Eberstadt speculates... (http://www.hoover.org/publications/policyreview/7827212.html)

  31. Correlation does not imply causation • Let x be the number of TV sets per person and y the average life expectancy for the world’s nations. There is a high positive correlation between x and y. • Does this mean that we can improve the life expectancy of people in Rwanda by shipping them TV sets? • No – rich nations have longer life expectancy because they have better nutrition, clean water, and better health care (lurking variables) • There is no cause-and-effect relationship between TV sets and life expectancy.

  32. Take away (you too, Harvard researchers) • Correlation does not imply causation. • N.B.: these are not mathematical errors, but errors in applying the mathematics. This is why the authors use the term “art” in their definition of statistics. It’s more than a science. Done well, it’s an art.
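A small simulation shows how a lurking variable can produce a high correlation with no cause and effect. Everything in the sketch below is invented for illustration: the data are simulated, and the variable names (`wealth`, `tv_sets`, `life_exp`) and the coefficients are assumptions. Wealth drives both of the other variables, and TV sets have no effect on life expectancy by construction.

```python
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(ys, zs):
    """What is left of y after removing its least-squares fit on z."""
    n = len(zs)
    mean_z, mean_y = sum(zs) / n, sum(ys) / n
    slope = (sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
             / sum((z - mean_z) ** 2 for z in zs))
    return [y - (mean_y + slope * (z - mean_z)) for z, y in zip(zs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000
wealth = [rng.uniform(0, 10) for _ in range(n)]            # lurking variable
tv_sets = [w + rng.gauss(0, 1) for w in wealth]            # depends on wealth only
life_exp = [50 + 3 * w + rng.gauss(0, 3) for w in wealth]  # depends on wealth only

raw = correlation(tv_sets, life_exp)
adjusted = correlation(residuals(tv_sets, wealth), residuals(life_exp, wealth))

print(f"correlation of TV sets with life expectancy: {raw:.2f}")
print(f"same correlation after adjusting for wealth: {adjusted:.2f}")
```

The raw correlation is high, but once the part of each variable explained by wealth is removed, the remaining correlation is close to zero. In real data the lurking variable is often unmeasured, so this adjustment is not always available.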
