
Unveiling Statistical Fallacies: Simpson’s Paradox Explained

Learn about Simpson’s Paradox and the pitfalls of combining data sets, with real-world examples from baseball statistics and college grades. Discover how seemingly contradictory conclusions can arise and the importance of cautious data aggregation. Unravel the hidden variables that can skew results and the impact of sample sizes on overall findings. Understand the complexities of interpreting data accurately.

dho


Presentation Transcript


  1. LSP 121 Statistics That Deceive

  2. Simpson’s Paradox
     • It is commonly assumed that the larger the data set, the more reliable the results.
     • Simpson’s Paradox shows that great care is needed when combining smaller data sets into a larger one.
     • Sometimes the conclusion drawn from the combined data set is the opposite of the conclusion drawn from each of the smaller data sets.

  3. Example: Simpson’s Paradox
     Baseball batting statistics for two players:
     [Batting-average table not preserved in this transcript.]
     How could Player A beat Player B in each half of the season individually, yet finish with a lower batting average for the whole season?

  4. Example continued
     We weren’t told how many at-bats each player had in each half. Player A’s dismal second half and Player B’s great first half carried more weight than the other two values, because they covered more at-bats.
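
The at-bat counts behind this example are not given here, so the following Python sketch uses invented numbers purely to show how the reversal can happen. The `stats` dictionary and the helper names `batting_average` and `season_average` are assumptions made for illustration, not values or names from the presentation.

```python
# HYPOTHETICAL data: (hits, at_bats) per half. These numbers are invented
# for illustration and are not the figures from the original slide.
stats = {
    "Player A": {"first half": (4, 10), "second half": (25, 100)},
    "Player B": {"first half": (35, 100), "second half": (2, 10)},
}


def batting_average(hits, at_bats):
    """Batting average = hits divided by at-bats."""
    return hits / at_bats


def season_average(halves):
    """Pool hits and at-bats across both halves, then divide once."""
    total_hits = sum(hits for hits, _ in halves.values())
    total_at_bats = sum(at_bats for _, at_bats in halves.values())
    return batting_average(total_hits, total_at_bats)


for player, halves in stats.items():
    per_half = ", ".join(
        f"{half}: {batting_average(*record):.3f}" for half, record in halves.items()
    )
    print(f"{player}: {per_half}, season: {season_average(halves):.3f}")
```

With these made-up counts, Player A leads in both halves (.400 vs .350 and .250 vs .200) but trails over the season (.264 vs .336), because most of Player A's at-bats fall in the weaker half and most of Player B's fall in the stronger one.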

  5. Another Example
     Average college physics grades for students in an engineering program:

                             Taken HS physics   No HS physics
       Number of students           50                 5
       Average grade                80                70

     Average college physics grades for students in a liberal arts program:

                             Taken HS physics   No HS physics
       Number of students            5                50
       Average grade                95                85

     It appears that in both programs, taking high school physics improves your college physics grade by 10 points.

  6. Example continued
     To get better results, let’s combine our data sets. Specifically, combine the engineering students who took high school physics with the liberal arts students who took high school physics. Likewise, combine the engineering students who did not take high school physics with the liberal arts students who did not. But be careful! You can’t simply average the two averages, because each group has a different number of students.

  7. Example continued
     Average college physics grades for students who took high school physics:

                      # Students   Avg Grade   Weighted Grade
       Engineering        50          80       50/55 * 80 = 72.7
       Lib Arts            5          95        5/55 * 95 =  8.6
       Total              55                   Average: 72.73 + 8.64 = 81.36, about 81.4

     Average college physics grades for students who did not take high school physics:

                      # Students   Avg Grade   Weighted Grade
       Engineering         5          70        5/55 * 70 =  6.4
       Lib Arts           50          85       50/55 * 85 = 77.3
       Total              55                   Average: 6.36 + 77.27 = 83.64, about 83.6

     Did the students who did not take high school physics actually do better?
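
The weighted-average arithmetic on this slide can be checked with a short Python sketch. The student counts and average grades come from the slides; the function names `weighted_average` and `naive_average` are illustrative choices, not part of the presentation.

```python
def weighted_average(groups):
    """Mean of pooled groups; each group is (number_of_students, average_grade)."""
    total_students = sum(n for n, _ in groups)
    return sum(n * avg for n, avg in groups) / total_students


def naive_average(groups):
    """Unweighted mean of the group averages (the mistake to avoid)."""
    return sum(avg for _, avg in groups) / len(groups)


# (number of students, average grade): engineering first, then liberal arts
took_hs_physics = [(50, 80), (5, 95)]
no_hs_physics = [(5, 70), (50, 85)]

for label, groups in [
    ("took HS physics", took_hs_physics),
    ("no HS physics", no_hs_physics),
]:
    print(
        f"{label}: weighted {weighted_average(groups):.2f}, "
        f"naive {naive_average(groups):.2f}"
    )
```

The weighted results are 81.36 and 83.64, which agree with the slide's figures up to rounding. Averaging the two averages instead gives 87.50 and 77.50, so which method is used decides which group appears to do better.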

  8. The Problem
     • There are two problems with combining the data:
       • Each combined table was dominated by one type of student (engineering students in one, liberal arts students in the other).
       • The engineering students had a more rigorous physics class than the liberal arts students, so there is a hidden variable.
     • So be very careful when you combine data into a larger set.
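
Both problems can be made visible before pooling by looking at each program separately and at the makeup of each pooled group. This is a minimal Python sketch using the counts and averages from the slides; the `grades` layout and the helper names `within_program_gap` and `engineering_share` are assumptions chosen for illustration.

```python
# Data from the slides:
# program -> {HS physics status: (number of students, average grade)}
grades = {
    "engineering": {"took": (50, 80), "none": (5, 70)},
    "liberal arts": {"took": (5, 95), "none": (50, 85)},
}


def within_program_gap(program):
    """Average-grade advantage of HS physics takers inside one program."""
    return grades[program]["took"][1] - grades[program]["none"][1]


def engineering_share(status):
    """Fraction of a pooled group (took / none) made up of engineering students."""
    engineering_count = grades["engineering"][status][0]
    total_count = sum(grades[program][status][0] for program in grades)
    return engineering_count / total_count


for program in grades:
    print(f"{program}: HS physics advantage = {within_program_gap(program)} points")

for status in ("took", "none"):
    print(f"{status}: {engineering_share(status):.1%} engineering students")
```

Within each program the advantage is 10 points, yet about 91% of the pooled "took" group are engineering students, compared with about 9% of the pooled "none" group. The pooled comparison therefore mostly compares the two programs rather than the effect of high school physics.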
