Bivariate Cautions Presentation 2-12
Bivariate Cautions • Many times, bivariate data is not as it seems. • Advertisers and marketers often take advantage of this to increase sales. • Politicians and lobbyists may take advantage of it to get more votes. • Incorrect analysis may also be done unintentionally, out of ignorance.
Bivariate Cautions • Reasons for incorrect analysis include: • Simpson’s Paradox • Lurking or confounding variables • Extrapolation • All of the above are required terminology
Simpson’s Paradox • Simpson’s Paradox may take on different forms. • In general, it refers to the reversal of the direction of a comparison or an association when data from several groups are combined to form a single group • Let’s look first at a qualitative example followed by a quantitative example.
Simpson’s Paradox: Qualitative • From the introduction regarding Alex Rodriguez and Ichiro. • The issue was: • Over the first half of the season, Alex Rodriguez’s batting average was better than Ichiro’s. Over the second half of the season, Alex Rodriguez’s batting average was again better than Ichiro’s. So Alex Rodriguez’s batting average for the whole season must be better than Ichiro’s. Right?!
Simpson’s Paradox: Qualitative • Here is the data it was based upon: [Table: batting data for Ichiro and Rodriguez] • Who was better over the entire season?
Simpson’s Paradox: Qualitative • Welcome to the wonderful world of percentages and misconceptions! • Because the two players had different numbers of at-bats in each half, Ichiro may in fact have the higher average over the entire season. This is a great example of how the direction of a comparison (whose average is higher) may change when the groups (1st and 2nd half) are combined (total season).
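The reversal is easy to reproduce with a little arithmetic. The sketch below uses made-up hit and at-bat counts for two hypothetical players, A and B. These are not the actual figures for Rodriguez and Ichiro; they were chosen only so that A leads in each half while B leads over the full season.

```python
# Hypothetical (hits, at_bats) per half-season; NOT real player data.
player_a = {"first_half": (90, 300), "second_half": (30, 75)}
player_b = {"first_half": (20, 70), "second_half": (115, 300)}


def average(hits, at_bats):
    """Batting average = hits / at-bats."""
    return hits / at_bats


def season_average(halves):
    """Pool hits and at-bats across both halves before dividing."""
    total_hits = sum(hits for hits, _ in halves.values())
    total_at_bats = sum(at_bats for _, at_bats in halves.values())
    return average(total_hits, total_at_bats)


for half in ("first_half", "second_half"):
    a = average(*player_a[half])
    b = average(*player_b[half])
    print(f"{half}: A = {a:.3f}, B = {b:.3f}")

print(f"season: A = {season_average(player_a):.3f}, "
      f"B = {season_average(player_b):.3f}")
```

In this sketch A has most of its at-bats in its weaker half, while B has most of its at-bats in its stronger half. The season average weights each half by its at-bats, which is what flips the comparison.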
Simpson’s Paradox: Quantitative • From the lesson description regarding years of education and salary. • The issue was: • I thought you said the more education you get, the more money you make. I'm a teacher and it's not quite working out.
Simpson’s Paradox: Quantitative (Education Profession) • Let’s look at the scatterplot relating years of education and salary among those in the education profession. • Certainly, the more degrees and credits you have, the higher the pay. [Scatterplot: Salary vs. Years of education. A clear positive association.]
Simpson’s Paradox: Quantitative (Business) • Let’s look at the scatterplot relating years of education and salary among those in business. • Certainly, the more degrees (like an MBA) and credits you have, the higher the pay. [Scatterplot: Salary vs. Years of education. A clear positive association.]
Simpson’s Paradox: Quantitative • Let’s look at the scatterplot combining the two sets of data. • All of a sudden, the more education you have, the less money you make. • The reasons are: • People in business make more money than people in education. • People in education tend to have many, many years of education (usually Master’s degrees, often doctorates). • People in business typically stop after a Bachelor’s degree. [Scatterplot: Salary vs. Years of education. Now, suddenly, a negative association.]
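The same reversal can be checked numerically. The sketch below fits a least-squares slope to two small, invented data sets (salary in thousands of dollars against years of education) and then to the two combined. All values are assumptions made up for illustration, not the data behind the scatterplots.

```python
def slope(xs, ys):
    """Least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Invented data: (years of education, salary in $1000s).
business_years = [14, 15, 16, 17, 18]
business_salary = [70, 73, 76, 79, 82]
education_years = [18, 19, 20, 21, 22]
education_salary = [40, 42, 44, 46, 48]

# Combining the groups means pooling the raw points, not the slopes.
all_years = business_years + education_years
all_salary = business_salary + education_salary

print("business slope: ", slope(business_years, business_salary))
print("education slope:", slope(education_years, education_salary))
print("combined slope: ", slope(all_years, all_salary))
```

Each group has a positive slope on its own, but the combined slope is negative because the group with more years of education sits at a lower salary level overall.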
Simpson’s Paradox • For additional information about Simpson’s Paradox, go to the link below: • http://exploringdata.cqu.edu.au/sim_par.htm
Lurking and Confounding Variables • This also relates to correlation and causation. • REMEMBER: a correlation does NOT indicate causation! • Here’s an example: • A correlation (in this case, a positive association) is found between the number of police officers in a city and the number of reported crimes in that city. • That is, as a city increases its number of police officers, the number of reported crimes increases.
Lurking and Confounding Variables • The mayor, upon learning this, may conclude that: • Crime is on the rise…or… • With the additional officers, a higher proportion of crimes is being caught, which increases the number reported. • The 2nd scenario is more likely, with the proportion of crimes caught being the lurking variable.
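The second scenario can be sketched as a toy model. The code below holds the true number of crimes fixed and lets only the proportion caught rise with the number of officers. The constant, the detection rule, and the officer counts are all hypothetical values invented for illustration.

```python
TRUE_CRIMES = 10_000  # assumed constant: crime itself is not rising


def proportion_reported(officers):
    """Hypothetical rule: 20% baseline, plus 5 points per 100 officers, capped at 100%."""
    return min(1.0, 0.20 + 0.05 * officers / 100)


def reported_crimes(officers):
    """Reported crimes = true crimes x proportion caught and reported."""
    return round(TRUE_CRIMES * proportion_reported(officers))


for officers in (100, 200, 400, 800):
    print(f"{officers} officers -> {reported_crimes(officers)} reported crimes")
```

Reported crimes climb with the number of officers even though the true number of crimes never changes, so the positive association says nothing about whether crime is rising.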
Lurking and Confounding Variables • Follow the links below to read some additional and sometimes humorous examples. • http://score.kings.k12.ca.us/lessons/wwwstats/confounding.variables.html • http://score.kings.k12.ca.us/lessons/wwwstats/lurking.variables.html
Extrapolation • Extrapolation is using a regression model to make a prediction outside the range of the data. • Outside the range of the data means: • If your data ran from 1920 to 2004, extrapolation would be making a prediction for a year before 1920 or after 2004. • Extrapolation is done all the time, because otherwise what purpose would a regression model have? • Extrapolation should only be done relatively close to the data to avoid a large risk of error.
Extrapolation • Consider the example of world record times in the men’s 800 meter race. • This is certainly a negative association. • The regression line is: Time = 114.3 - 0.14 * (years past 1900), with R-squared = 0.9387
Extrapolation • The regression line is: Time = 114.3 - 0.14 * (years past 1900), with R-squared = 0.9387 • Try to predict the record time in the year 2717: Time = 114.3 - 0.14 * (817) = -0.08 seconds • That’s quite impressive: in the year 2717, they will finish the 800m race before they even start! • That’s extreme extrapolation!
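The calculation above can be reproduced with a short sketch that evaluates the regression line at any year and finds where it crosses zero. The intercept and slope come from the presentation; the function names are hypothetical, and 2004 is used only as an illustrative nearby year, since the range of the record data is not stated.

```python
INTERCEPT = 114.3  # seconds, from the regression line above
SLOPE = -0.14      # seconds per year past 1900


def predicted_time(year):
    """Predicted 800 m record time in seconds from the fitted line."""
    return INTERCEPT + SLOPE * (year - 1900)


def zero_crossing_year():
    """Year at which the line predicts a record time of 0 seconds."""
    return 1900 + INTERCEPT / -SLOPE


for year in (2004, 2717):
    print(f"{year}: {predicted_time(year):.2f} seconds")

print(f"line reaches 0 seconds around {zero_crossing_year():.1f}")
```

A straight line has no floor, so any prediction far enough from the data eventually becomes physically impossible. That is why staying close to the data matters.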
Extrapolation • The rules of thumb for extrapolation: • Use common sense • Stay close (a deliberately vague term) to the data • If you are straying away from the data, communicate that you are doing so
Bivariate Cautions • This concludes the presentation.