Section 4.4: Simpson’s Paradox
Section 4.5: Linearizing an Association Between Two Variables by Performing a Mathematical Transformation
Simpson’s Paradox
Consider the following study of accident rates in California. A study showed that male teenagers have about twice the accident rate of female teenagers.

                                                 Male     Female
Proportion of accidents                          0.162    0.075

The study did not take into account a confounding variable: the number of miles driven per year.

                                                 Male     Female
Accident rate                                    0.162    0.075
Average number of miles per person               9,557    4,643
Average number of accidents per 100,000 miles    1.78     1.77

This more accurate comparison shows essentially no difference. The higher proportion of accidents for male teenagers is explained by the fact that male teenagers typically drive more miles. This is an example of Simpson’s paradox.
Another example: Medical study of a treatment: http://qjmed.oxfordjournals.org/cgi/content/full/95/4/247
Conclusion: “Thus, if the patient's serum X level is unknown, treatment A seems to be better, but if serum X is known, treatment B is preferable (and one can better predict the response rate of a patient). This phenomenon is a result of the aggregation of two (or more) subgroups.[1] The numbers of the example are kept simple to demonstrate this phenomenon of severe confounding, but there are a number of real examples in the literature, including the medical literature.[2–4] This aggregation effect can occur in the case of an uneven distribution of a ‘latent variable’ (in this case the serum X level) among the groups studied.” http://qjmed.oxfordjournals.org/cgi/content/full/95/4/247
Simpson’s Paradox describes a situation in which an association between two variables reverses or disappears when:
• data are collapsed across a sub-classification (in the previous example, across different serum X levels), so that the overall result may not represent what is really happening within each subgroup;
• a lurking variable is present and data from groups of unequal size are combined into a single data set. The unequal group sizes, in the presence of a lurking variable, can weight the results incorrectly.
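The reversal described above can be reproduced with a small numeric sketch. The counts below are hypothetical (they are not the figures from the cited medical study) and are chosen so that treatment B has the higher response rate within each serum X subgroup, while treatment A has the higher rate once the subgroups are pooled.

```python
# Hypothetical (responders, patients) counts per treatment and serum X subgroup.
# These are illustrative numbers only, NOT data from the cited study.
counts = {
    "A": {"low serum X": (234, 270), "high serum X": (55, 80)},
    "B": {"low serum X": (81, 87), "high serum X": (192, 263)},
}


def overall_rate(groups):
    """Response rate after collapsing (pooling) all subgroups."""
    responders = sum(r for r, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return responders / patients


# Response rate inside each subgroup, per treatment.
subgroup_rates = {
    treatment: {name: r / n for name, (r, n) in groups.items()}
    for treatment, groups in counts.items()
}

# Response rate per treatment when serum X is ignored.
overall_rates = {t: overall_rate(groups) for t, groups in counts.items()}

for name in counts["A"]:
    print(f"{name}: A = {subgroup_rates['A'][name]:.3f}, "
          f"B = {subgroup_rates['B'][name]:.3f}")
print(f"pooled: A = {overall_rates['A']:.3f}, B = {overall_rates['B']:.3f}")
```

In this sketch most of treatment A's patients fall in the subgroup that responds well regardless of treatment, so the unequal group sizes pull A's pooled rate above B's even though B does better in both subgroups.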
Nonlinear Regression: Exponential relationship
Linearization
• Apply a logarithmic transformation to re-express the preceding exponential or power functions as linear functions.
• Use the properties of logarithms:
  • log_a(MN) = log_a M + log_a N
  • log_a(M^r) = r log_a M
  (M, N, and a are positive real numbers, a ≠ 1, and r is any real number.)
Exponential Model: y = a b^x
log y = log(a b^x)            Take the common logarithm of both sides
log y = log a + log(b^x)
log y = log a + x log b
Y = A + B x                   where Y = log y, A = log a, B = log b
so that a = 10^A and b = 10^B
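The exponential linearization can be checked numerically with a minimal sketch. It assumes hypothetical, noise-free data generated from y = 3 · 1.5^x (values chosen only for illustration) and a small helper, least_squares, written here for the example. Only the response is transformed; x is left as is.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data from y = a * b**x with a = 3 and b = 1.5 (illustration only).
true_a, true_b = 3.0, 1.5
xs = [0, 1, 2, 3, 4, 5]
ys = [true_a * true_b ** x for x in xs]

# Linearize: Y = log10(y), so Y = A + B*x with A = log10(a), B = log10(b).
Y = [math.log10(y) for y in ys]
A, B = least_squares(xs, Y)

# Back-transform the fitted line to the exponential model.
a_hat = 10 ** A
b_hat = 10 ** B
print(f"Y = {A:.4f} + {B:.4f} x")
print(f"y = {a_hat:.4f} * {b_hat:.4f}^x")
```

Because the data here contain no noise, the back-transformed coefficients recover the values used to generate them; with real measurements the fit would only approximate them.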
Power Model: y = a x^b
log y = log(a x^b)            Take the common logarithm of both sides
log y = log a + log(x^b)
log y = log a + b log x
Y = A + b X                   where Y = log y, X = log x, A = log a
so that a = 10^A
Example: The statistics of poverty and inequality. Data from the U.N.E.S.C.O. 1990 Demographic Year Book. For 97 countries in the world, data are given for birth rates and for an index of the Gross National Product (GNP). The relation appears to be exponential.
Linearization using the log function: The previous plot shows a nonlinear association. We can make it linear by transforming GNP with the natural logarithm. [Plot: birth rate vs. log GNP]
EXAMPLE: Finding the Curve of Best Fit to a Power Model
Cathy wishes to measure the relation between a light bulb’s intensity and the distance from the light source. She measures a 40-watt light bulb’s intensity 1 meter from the bulb and at 0.1-meter intervals up to 2 meters from the bulb and obtains the following data.

Distance (meters)    Intensity
1.0                  0.0972
1.1                  0.0804
1.2                  0.0674
1.3                  0.0572
1.4                  0.0495
1.5                  0.0433
1.6                  0.0384
1.7                  0.0339
1.8                  0.0294
1.9                  0.0268
2.0                  0.0224
(a) Draw a scatter diagram of the data, treating the distance, x, as the predictor variable.
(b) Determine X = log x and Y = log y and draw a scatter diagram, treating X = log x as the predictor variable and Y = log y as the response variable. Comment on the shape of the scatter diagram.
(c) Find the least-squares regression line of the transformed data.
(d) Determine the power equation of best fit and graph it on the scatter diagram obtained in part (a).
(e) Use the power equation of best fit to predict the intensity of the light if you stand 2.3 meters away from the bulb.
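Parts (b) through (e) can be sketched in a few lines of Python using the distance and intensity data from the example. This is one possible way to carry out the calculation, not necessarily the tool the course uses; the helper names least_squares and predict_intensity are introduced here for illustration, and the scatter diagrams of parts (a), (b), and (d) are left out.

```python
import math

# Data from the example: distance in meters and measured intensity.
distance = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0]
intensity = [0.0972, 0.0804, 0.0674, 0.0572, 0.0495, 0.0433,
             0.0384, 0.0339, 0.0294, 0.0268, 0.0224]


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# (b) Transform both variables: X = log10(x), Y = log10(y).
X = [math.log10(x) for x in distance]
Y = [math.log10(y) for y in intensity]

# (c) Least-squares line of the transformed data: Y = A + b*X.
A, b = least_squares(X, Y)

# (d) Back-transform to the power model y = a * x**b, with a = 10**A.
a = 10 ** A


def predict_intensity(x):
    """Predicted intensity at distance x (meters) from the fitted power model."""
    return a * x ** b


print(f"(c) Y = {A:.4f} + ({b:.4f}) X")
print(f"(d) y = {a:.4f} x^({b:.4f})")
# (e) Prediction at 2.3 meters.
print(f"(e) predicted intensity at 2.3 m: {predict_intensity(2.3):.4f}")
```

Note that 2.3 meters lies outside the measured range of 1 to 2 meters, so the prediction in part (e) is an extrapolation and should be read with that in mind.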