Beware, Statistics!

Beware, Statistics! Brani Vidakovic, ISyE & BME, GaTech. They said… "There are lies, damned lies, and statistics." (attributed by Mark Twain to Benjamin Disraeli). "In earlier times, they had no statistics, and so they had to fall back on lies." (Stephen Leacock)

Presentation Transcript


  1. Beware, Statistics! Brani Vidakovic, ISyE & BME, GaTech

  2. They said… • There are lies, damned lies, and statistics. -- Attributed by Mark Twain to Benjamin Disraeli • In earlier times, they had no statistics, and so they had to fall back on lies. – Stephen Leacock • Numbers are like people; torture them enough and they'll tell you anything.

  3. Intentional Statistical Inaccuracies • Level of sophistication: very low to very high • It is often hard to distinguish incompetence from intention • Donoho D – Reproducible Research • Baggerly K – Forensic Statistics (given the data and results, infer the methods used) • Gelman A, Fienberg S

  4. ASA Guidelines To help statistical practitioners make and communicate ethical decisions. Committee on Professional Ethics A. Professionalism B. Responsibilities to Funders, Clients, and Employers C. Responsibilities in Publications and Testimony D. Responsibilities to Research Subjects F. Responsibilities to Other Statistical Practitioners G. Responsibilities Regarding Allegations of Misconduct

  5. Location Measures • Perils of "On average, …" • The average Australian has fewer than two legs. True! • Small company salaries: 4 employees at 20K, 3 employees at 30K, vice-president 200K, president 400K. Average salary? Mean = 85.6K, GeoMean = 41.2K, Median = 30K, HarMean = 29.3K, Mode = 20K.
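A quick way to check the five "averages" on this slide is a short sketch using Python's standard `statistics` module, with salaries in thousands. `geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Salaries in thousands of dollars, as listed on the slide:
# 4 employees at 20K, 3 at 30K, vice-president 200K, president 400K.
salaries = [20] * 4 + [30] * 3 + [200, 400]

summary = {
    "mean": statistics.mean(salaries),
    "geometric mean": statistics.geometric_mean(salaries),  # Python 3.8+
    "median": statistics.median(salaries),
    "harmonic mean": statistics.harmonic_mean(salaries),
    "mode": statistics.mode(salaries),
}

for name, value in summary.items():
    print(f"{name:>14}: {value:6.1f}K")
```

Every one of these is a legitimate "average salary." They range from 20K to about 86K, so a single "on average" figure says little unless you know which one it is.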

  6. Some violations • Cherry Picking of Data/Studies • Fallacy of Incomplete Evidence • Discarding Influential Data and Outliers • Confirmation Bias ("myside" bias) • Anecdotal Evidence • Hyperbolic Discounting (1000 now or 3000 next year?) • Bandwagon Fallacy • False Dichotomy ("Will that be cash or charge?") • "Golden Sample" • Attrition Bias • Publication Bias (File Drawer Problem); Funnel Plots

  7. Even More… • Loaded questions: "Have you stopped smoking?" a. Should people have the right to smoke? b. Since cigarettes are dangerous and have deadly side effects such as cancer, don't you agree that smoking should be controlled? • Anchoring phenomenon: think about the last 4 digits of your Social Security number, then estimate the number of physicians in Atlanta.

  8. Kahneman & Tversky • Estimate 1 x 2 x 3 x … x 7 x 8, or 8 x 7 x 6 x … x 2 x 1. • The anchor was the number shown first in the sequence, either 1 or 8. • When 1 was the anchor, the median estimate was 512; when 8 was the anchor, the median estimate was 2,250. • The correct answer is 40,320.
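Tversky and Kahneman explained the result this way: people compute the first few steps, then extrapolate and adjust too little. The Python sketch below shows why this produces different anchors. It prints the running products for both orderings and the exact value.

```python
import math
import operator
from itertools import accumulate

ascending = list(range(1, 9))       # 1 x 2 x ... x 8
descending = list(range(8, 0, -1))  # 8 x 7 x ... x 1

# Running products: a quick estimator sees only the first few of these.
asc_partial = list(accumulate(ascending, operator.mul))
desc_partial = list(accumulate(descending, operator.mul))

print("exact value:", math.factorial(8))
print("ascending, first 4 steps: ", asc_partial[:4])
print("descending, first 4 steps:", desc_partial[:4])
```

After four steps the ascending order has reached only 24, while the descending order has reached 1,680. Both orders end at 40,320, which is well above either median estimate.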

  9. Geometric misdeeds

  10. From one dollar to 44 cents

  11. Truncated Graphs

  12. Correlations Galore… • A correlated with B (but because of C!!): the number of people who buy ice cream at the beach is correlated with the number of people who drown at the beach (but both are driven by the number of people at the beach!) • Correlation is not the same as dependence! E.g., points (xi, yi), i = 1,…,n, on a circle are dependent yet can be uncorrelated.
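Both points on this slide can be sketched in Python. The beach data below are invented: crowd size is uniform, and ice-cream sales and drownings are each linear in crowd size plus independent Gaussian noise. Partial correlation, which adjusts both variables for crowd size, is a standard technique that does not appear on the slide. The circle uses evenly spaced angles.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly adjusting both for c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))


# 1) Confounding (hypothetical data): crowd size C drives both A and B.
rng = random.Random(42)
crowd = [rng.uniform(100, 1000) for _ in range(1000)]
ice_cream = [0.3 * c + rng.gauss(0, 10) for c in crowd]
drownings = [0.01 * c + rng.gauss(0, 1) for c in crowd]

r_ab = pearson(ice_cream, drownings)
r_ab_given_c = partial_corr(ice_cream, drownings, crowd)

# 2) Dependence without correlation: points on the unit circle.
n = 360
angles = [2 * math.pi * k / n for k in range(n)]
xs = [math.cos(t) for t in angles]
ys = [math.sin(t) for t in angles]
r_circle = pearson(xs, ys)

print(f"corr(ice cream, drownings)    = {r_ab:.2f}")
print(f"partial corr given crowd size = {r_ab_given_c:.2f}")
print(f"corr(x, y) on the circle      = {r_circle:.1e}")
```

The first correlation is strong but mostly disappears once crowd size is accounted for. On the circle, y is determined by x up to its sign, yet the correlation is essentially zero.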

  13. Perils of Aggregation

  14. Voodoo Correlations

  15. Data Dredging • Data dredging is an abuse of data mining: large compilations of data are searched for relationships without any pre-defined choice of a hypothesis to be tested (such as the pre-specified endpoints of a clinical trial). • A clear distinction must be made between confirmatory and exploratory analyses. • Formal statistical inference is appropriate only for confirmatory analyses.
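Here is a minimal Python sketch of why dredging "finds" relationships. The first part is an exact calculation: the chance of at least one false positive among m independent tests of true null hypotheses at level 0.05. The second part correlates 100 pure-noise "predictors" with a pure-noise outcome, using a large-sample cutoff of |r| > 1.96/√n. The simulation design is an assumption made for illustration.

```python
import math
import random


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def prob_at_least_one_false_positive(m, alpha=0.05):
    """P(at least one rejection) among m independent tests of true nulls."""
    return 1 - (1 - alpha) ** m


def dredge(n_obs=50, n_candidates=100, seed=1):
    """Count noise predictors that look 'significantly' correlated with noise."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    cutoff = 1.96 / math.sqrt(n_obs)  # normal approximation, 5% level
    hits = 0
    for _ in range(n_candidates):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        if abs(pearson(predictor, outcome)) > cutoff:
            hits += 1
    return hits


print(f"P(>=1 false positive in 20 tests) = {prob_at_least_one_false_positive(20):.3f}")
print("'significant' noise predictors out of 100:", dredge())
```

With 20 tests, a spurious "finding" is more likely than not. Among 100 noise variables, about 5 are expected to pass the cutoff.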

  16. Perils of Aggregation: Simpson's Paradox • Hospitals A and B • Measure of quality: proportion of SAT

  17. Death rates in Sweden and Panama
     % population by age group: 0-29, 30-59, 60+
     populationS = [3145000 3057000 1294000]';
     populationP = [ 714000  275000   59000]';
     %
     % deaths per year, 1962
     deathsS = [3523 10928 57104]';
     deathsP = [3904  1421  2756]';
     mortalityS = deathsS./populationS
     mortalityP = deathsP./populationP
     % mortalityS = 0.0011 0.0036 0.0441
     % mortalityP = 0.0055 0.0052 0.0467
     totmortalityS = sum(deathsS)/sum(populationS)
     totmortalityP = sum(deathsP)/sum(populationP)
     % totmortalityS = 0.0095
     % totmortalityP = 0.0077
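The mechanism behind these numbers can be seen by repeating the computation in Python and adding the age-group population shares. The crude rate is a population-weighted average of the age-specific rates, and Panama's population is much younger. The last line applies direct age standardization, a standard technique that is not on the slide: it applies Panama's age-specific rates to Sweden's age structure.

```python
# Sweden (S) and Panama (P), 1962: population and deaths by age group
# (0-29, 30-59, 60+), copied from the MATLAB snippet on the slide.
population = {"S": [3145000, 3057000, 1294000], "P": [714000, 275000, 59000]}
deaths = {"S": [3523, 10928, 57104], "P": [3904, 1421, 2756]}

# Age-specific mortality rates.
rates = {c: [d / p for d, p in zip(deaths[c], population[c])] for c in "SP"}

# Crude (overall) rate: equals the population-weighted average of the rates.
crude = {c: sum(deaths[c]) / sum(population[c]) for c in "SP"}

# Share of each country's population in each age group (the weights).
shares = {c: [p / sum(population[c]) for p in population[c]] for c in "SP"}

# Direct standardization: Panama's rates on Sweden's age structure.
panama_std = sum(r * w for r, w in zip(rates["P"], shares["S"]))

print("age-specific S:", [round(r, 4) for r in rates["S"]])
print("age-specific P:", [round(r, 4) for r in rates["P"]])
print("crude S, P:    ", round(crude["S"], 4), round(crude["P"], 4))
print("age shares S:  ", [round(w, 2) for w in shares["S"]])
print("age shares P:  ", [round(w, 2) for w in shares["P"]])
print("Panama rates on Sweden's age structure:", round(panama_std, 4))
```

Panama's mortality is higher in every age group, yet its crude rate is lower. About 68% of Panamanians fall in the low-mortality 0-29 group, against about 42% of Swedes. Once both countries are placed on the same age structure, Panama's rate (about 0.0125) exceeds Sweden's 0.0095.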

  18. Cohen and Nagel (1934) • Simpson (1951) • For events A, B, C it is possible that $P(A \mid B \cap C) > P(A \mid B^c \cap C)$ and $P(A \mid B \cap C^c) > P(A \mid B^c \cap C^c)$, and yet $P(A \mid B) < P(A \mid B^c)$. • Kotz S and Stroup D (1998). Educated Guessing, Marcel Dekker.

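  19. Testing • Any fixed nonzero correlation coefficient, however small, becomes statistically significant if the sample size is large enough, since t = r·sqrt(n-2)/sqrt(1-r²) grows like C·sqrt(n). • In classical hypothesis testing, ANY precise H0 that is not exactly true will be rejected if the sample size is large enough.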
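To make the √n growth concrete, here is a Python sketch that fixes a negligible correlation, r = 0.01, and increases n. For simplicity it uses the normal approximation to the t distribution, which is an assumption but accurate at these sample sizes.

```python
import math


def corr_t_stat(r, n):
    """t statistic for H0: rho = 0, given sample correlation r and size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def two_sided_p_normal(t):
    """Two-sided p-value using the normal approximation to the t distribution."""
    return math.erfc(abs(t) / math.sqrt(2))


r = 0.01  # a practically negligible correlation
for n in (100, 10_000, 1_000_000):
    t = corr_t_stat(r, n)
    print(f"n = {n:>9,}: t = {t:6.2f}, p = {two_sided_p_normal(t):.2g}")
```

The same r = 0.01 goes from p ≈ 0.92 at n = 100 to an astronomically small p at n = 1,000,000. Statistical significance alone says nothing about the size of an effect.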

  20. Lindley's Paradox • In a certain city, 49,581 boys and 48,870 girls were born last year. • phat = 49,581/98,451 ≈ 0.5036. • H0: p = 0.5 vs. H1: p ≠ 0.5 • Frequentist: normal-approximation p-value = 2.35%, so H0 is rejected at the 5% level. • Bayes: P(H0) = P(H1) = 1/2 a priori, with a uniform prior on p under H1. • Then P(H0 | data) ≈ 0.95. • Frequentist: H0 fits poorly. Bayes: H0 fits poorly, but H1 fits even worse.
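Both numbers on this slide can be reproduced in Python. The p-value uses the normal approximation to Binomial(n, 0.5). The Bayes factor uses the exact binomial likelihood under H0, and under H1 it uses the fact that a uniform prior on p gives every count k the marginal probability 1/(n+1).

```python
import math

boys, girls = 49_581, 48_870
n, k = boys + girls, boys

# Frequentist: normal approximation to Binomial(n, 0.5) under H0.
z = (k - n / 2) / math.sqrt(n / 4)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided

# Bayesian: P(H0) = P(H1) = 1/2; p ~ Uniform(0, 1) under H1.
# Marginal likelihood: C(n, k) 0.5^n under H0, 1/(n + 1) under H1.
log_m0 = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
          + n * math.log(0.5))
log_m1 = -math.log(n + 1)
bayes_factor_01 = math.exp(log_m0 - log_m1)
post_h0 = bayes_factor_01 / (1 + bayes_factor_01)

print(f"p-hat = {k / n:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
print(f"Bayes factor B01 = {bayes_factor_01:.1f}, P(H0 | data) = {post_h0:.3f}")
```

The Bayes factor comes out at roughly 19 in favor of H0. The uniform prior spreads H1's probability over values of p that fit these data far worse than p = 0.5 does.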

  21. Need for Equivalence Tests • Hypothesis testing can be compared to the judicial process, where the accused is considered innocent (H0) until proven guilty (H1) beyond a reasonable doubt (alpha). Key word: CONSIDERED! • A suspect found "not guilty" has not been found innocent. • If H0 is not rejected, it has not been proven!
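One common equivalence procedure is the "two one-sided tests" (TOST) approach. It is not named on the slide, so treat the following as a sketch under assumptions. TOST reverses the roles of the hypotheses: H0 says the difference is at least a margin δ in absolute value, so rejecting H0 supports equivalence. The code uses a normal approximation, and the margin and the two (estimate, standard error) pairs are hypothetical.

```python
import math


def normal_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2))


def usual_test_p(est, se):
    """Two-sided p-value for H0: difference = 0 (normal approximation)."""
    return 2 * (1 - normal_cdf(abs(est / se)))


def tost_p(est, se, margin):
    """Two one-sided tests of H0: |difference| >= margin.

    A small p-value supports equivalence, i.e. the difference lies
    inside (-margin, margin).
    """
    p_lower = 1 - normal_cdf((est + margin) / se)  # H0: diff <= -margin
    p_upper = normal_cdf((est - margin) / se)      # H0: diff >= +margin
    return max(p_lower, p_upper)


margin = 0.5  # hypothetical equivalence margin
for est, se in [(0.1, 0.2), (0.1, 1.0)]:
    print(f"est={est}, se={se}: usual p = {usual_test_p(est, se):.3f}, "
          f"TOST p = {tost_p(est, se, margin):.3f}")
```

Neither case rejects "no difference." Only the precise estimate (se = 0.2) demonstrates equivalence, while the noisy one (se = 1.0) establishes nothing: not rejected is not proven.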

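  22. Biased Sampling • Sampling in which the chance of selection depends on the size of the observation (inspection paradox). Example: tourists in Morocco, a 1966 study of mean sojourn times: 17.8 days when tourists were sampled at hotels; 9.0 days when sampled at frontier stations. Long stays are over-represented among tourists found in hotels.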
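The effect is length-biased sampling, sketched below in Python with an invented population rather than the 1966 data. Counting tourists at the frontier weights everyone equally. Finding them in a hotel on a random day weights each tourist by the length of stay, so the sampled mean becomes E[X²]/E[X].

```python
from fractions import Fraction

# Hypothetical population of tourists (not the 1966 data):
# half stay 3 days, half stay 15 days.
stays = [3] * 50 + [15] * 50

# Sampling at the frontier: every tourist is counted once.
mean_frontier = Fraction(sum(stays), len(stays))

# Sampling at hotels on a random day: a tourist is found with probability
# proportional to the length of stay, so the sampled mean is E[X^2] / E[X].
mean_hotel = Fraction(sum(x * x for x in stays), sum(stays))

print("frontier mean:", mean_frontier)  # 9
print("hotel mean:   ", mean_hotel)     # 13
```

Both samples are honest, but they answer different questions. The hotel mean overstates the typical visit because long-staying tourists are more likely to be present on any given day.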

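  23. Biased Sampling • Waiting times at a bus stop. Example: times between two successive buses are Exponential(lambda), so the expected time between buses is 1/lambda. A passenger who arrives at the stop at a random moment might expect to wait half of that, but his expected waiting time is 1/lambda! (He is more likely to arrive during a long gap, and by memorylessness the remaining wait is still Exponential(lambda).) • Source of many wrong models.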
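A quick Monte Carlo check in Python, assuming Poisson bus arrivals with rate λ = 1 and passengers arriving uniformly over the observation window. It reports three averages: the gap between buses, the passenger's wait, and the length of the gap the passenger lands in.

```python
import bisect
import random


def simulate_bus_stop(rate=1.0, n_buses=100_000, n_passengers=100_000, seed=7):
    """Poisson bus arrivals; passengers arrive at uniformly random times.

    Returns (mean gap between buses, mean wait, mean length of the gap
    a passenger lands in).
    """
    rng = random.Random(seed)
    buses, t = [], 0.0
    for _ in range(n_buses):
        t += rng.expovariate(rate)  # Exponential(rate) gap between buses
        buses.append(t)

    total_wait = total_gap = 0.0
    for _ in range(n_passengers):
        s = rng.random() * buses[-1]           # passenger arrival time
        i = bisect.bisect_left(buses, s)       # first bus at or after s
        prev = buses[i - 1] if i > 0 else 0.0  # previous bus (or time 0)
        total_wait += buses[i] - s
        total_gap += buses[i] - prev
    return buses[-1] / n_buses, total_wait / n_passengers, total_gap / n_passengers


mean_gap, mean_wait, mean_covering_gap = simulate_bus_stop()
print(f"mean gap {mean_gap:.2f}, mean wait {mean_wait:.2f}, "
      f"gap containing the passenger {mean_covering_gap:.2f}")
```

All three averages come out near 1, 1 and 2. The gap a random passenger falls into is, on average, twice as long as a typical gap, which is exactly why the wait is not half a gap.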

  24. Prosecutor's Fallacy • Confusing P(A|B) with P(B|A). • P(match | innocent) = 0.000001, therefore P(innocent | match) = 0.000001? Wrong! • In a community of 5 million people, the expected number of matches is about 5. • So P(innocent | match) ≈ 4/5 (given no other evidence).
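Here is the same argument written out with Bayes' rule in Python. It assumes that, before the DNA evidence, every resident is equally likely to be the culprit and that the culprit always matches. Exact arithmetic via `Fraction` avoids rounding issues.

```python
from fractions import Fraction

population = 5_000_000
p_match_if_innocent = Fraction(1, 1_000_000)

# Assumption: uniform prior over residents; the culprit always matches.
prior_guilty = Fraction(1, population)
p_match = prior_guilty * 1 + (1 - prior_guilty) * p_match_if_innocent
p_innocent_given_match = (1 - prior_guilty) * p_match_if_innocent / p_match

print("expected matches among innocents:", float((population - 1) * p_match_if_innocent))
print("P(innocent | match) =", float(p_innocent_given_match))
```

Under these assumptions the answer is about 5/6 ≈ 0.83. The culprit matches in addition to about 5 innocent people, whereas the slide's rougher count of 5 matches in total gives 4/5. Either way, the result is nowhere near 0.000001.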

  25. Sensitivity/Specificity/PPV • Casscells et al. (1978): 60 students and staff at an elite medical school on the East Coast were asked: "If a test for a disease with a prevalence of 1/1000 has a false positive rate of 5%, what is the probability that a person who tests positive has the disease?" Given the disease, the test is always positive. • 18% gave the correct answer (approx. 2%); the most common answer was 95%.

  26. Sensitivity/Specificity Interpretation • Sensitivity vs. PPV • Disease D has prevalence 2/10000. Test: P(+|D) = 0.999, P(-|ND) = 0.99. • A subject tests +, with no other symptoms. It is tempting to say P(D|+) = 0.999, but • P(D|+) = P(+|D)P(D)/P(+) = 0.999*0.0002/(0.999*0.0002 + 0.01*0.9998) ≈ 0.0196, i.e., less than 2%.
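The two calculations on slides 25 and 26 are instances of one formula: positive predictive value from prevalence, sensitivity and specificity. A minimal Python sketch follows. For Casscells et al., sensitivity is 1 and specificity is 1 − 0.05.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value P(D | +) via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Casscells et al. (1978): prevalence 1/1000, false positive rate 5%,
# test always positive given disease.
print(f"Casscells: {ppv(0.001, 1.0, 0.95):.4f}")
# Slide 26: prevalence 2/10000, P(+|D) = 0.999, P(-|ND) = 0.99.
print(f"Slide 26:  {ppv(0.0002, 0.999, 0.99):.4f}")
```

Both come out at about 0.0196. When a disease is rare, false positives from the large healthy group swamp the true positives, even with an excellent test.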

  27. Cryptographic Surveys • With the boss present, 100 workers are to be asked: "Do you like your boss?" The boss is interested only in the proportion of YES answers. • Cryptographic solution: each worker flips a coin twice. • If the 1st flip is H: answer the question "Is the 2nd flip H?" • If the 1st flip is T: answer the question "Do you like your boss?" • Solution: ½ p + ½ x ½ = observed proportion of YES • p ≈ 2 x (observed proportion of YES) - ½
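Here is a Python sketch of the two-coin protocol and its estimator. The true proportion (0.3) and the number of workers are hypothetical choices for the simulation. Only each worker's YES/NO answer is recorded, so no individual answer reveals how that worker feels about the boss.

```python
import random


def estimate_p(n_yes, n_total):
    """Invert P(YES) = p/2 + 1/4 to estimate the true proportion p."""
    return 2 * n_yes / n_total - 0.5


def run_survey(true_p, n_workers, seed=3):
    """Simulate the two-coin protocol; return the number of YES answers."""
    rng = random.Random(seed)
    n_yes = 0
    for _ in range(n_workers):
        likes_boss = rng.random() < true_p
        first_heads = rng.random() < 0.5
        second_heads = rng.random() < 0.5
        answer = second_heads if first_heads else likes_boss
        n_yes += answer
    return n_yes


n_yes = run_survey(true_p=0.3, n_workers=100_000)
print("estimated p:", round(estimate_p(n_yes, 100_000), 3))
```

Privacy has a price in precision. With only 100 workers and p = 0.3, the estimator's standard error is about 2·sqrt(0.4·0.6/100) ≈ 0.1, double that of the observed proportion.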

  28. Rational Decisions: South Dakota Lottery • Data for the 4th quarter of 1987 • Total revenue $11,812,905 • Prize payments $5,322,975 • Joe Sixpack knows that each $1 he invests returns about $0.45, and he still plays. Why? Is he irrational? • No. The value of money is not linear in dollars.
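As an illustration of "value is not linear in dollars," the Python sketch below computes the slide's payout ratio. It then compares buying one ticket under a linear and under a convex utility. The one-prize lottery, the prize size, the player's wealth and the utility u(x) = x² are all hypothetical; only the payout ratio comes from the slide.

```python
# Return per dollar wagered, South Dakota lottery, 4th quarter 1987 (slide data).
revenue, prizes = 11_812_905, 5_322_975
payout_ratio = prizes / revenue

# Hypothetical single-prize lottery with the same expected payout:
# pay $1, win W with probability q, where q * W = payout_ratio.
W = 1_000_000
q = payout_ratio / W
wealth = 100.0  # hypothetical current wealth in dollars


def expected_utility(u):
    """Expected utility of buying one ticket, under utility function u."""
    return (1 - q) * u(wealth - 1) + q * u(wealth - 1 + W)


def linear(x):
    return x  # value linear in dollars (risk-neutral)


def convex(x):
    return x * x  # hypothetical convex utility over this range


print(f"payout per $1: {payout_ratio:.4f}")
print("linear utility: buy?", expected_utility(linear) > linear(wealth))
print("convex utility: buy?", expected_utility(convex) > convex(wealth))
```

With linear utility the ticket is a losing bet (expected wealth of about 99.45 versus 100). With a utility that rises steeply for large sums, the same ticket is worth buying.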

  29. More reading … • Hooke, R., 1983, How to Tell the Liars from the Statisticians, Marcel Dekker, Inc., New York, NY • Jaffe, A.J. and Spirer, H.F., 1987, Misused Statistics, Marcel Dekker, Inc., New York, NY • Campbell, S.K., 1974, Flaws and Fallacies in Statistical Thinking, Prentice Hall, Inc., Englewood Cliffs, NJ • Hollander, M. and Proschan, F., 1984, The Statistical Exorcist, Marcel Dekker, Inc., New York, NY • Goldacre, B., 2009, Bad Science, Fourth Estate, London
