
Statistics of Illumination

Beth Chance and Roxy Peck, Cal Poly, San Luis Obispo


Presentation Transcript


  1. Statistics of Illumination Beth Chance Roxy Peck Cal Poly, San Luis Obispo

  2. STATISTICS SAY… • Increasingly, daily life involves statistical information: • interpretations of graphical and numerical summaries • comparisons of groups • poll results from random samples • conclusions from randomized experiments • predictions of future outcomes

  3. Most people use statistics as a drunkard uses a lamppost: more for support than for illumination.

  4. Predicting Variable Behavior

  5. Predicting Variable Behavior (a) Heights of students in this class (b) Students’ preference for Coca-Cola vs. Pepsi-Cola (c) Number of siblings of individuals (d) Amount paid for last haircut (e) Gender breakdown (f) Students’ guesses of my age

  6. Matching Variables to Graphs

  7. Matching Variables to Graphs • Think about context! • Anticipate patterns and variations • variable intuition • graph-sense

  8. STATISTICS SAY… • Students’ heights would show more variability than guesses of my age • KDC Pursues High-Return, Low-Risk Strategy

  9. What is Variability?

  10. What is Variability?

  11. Describing Variability • The “bumpiness” of a histogram does not determine the variability of the observations • The number of distinct values the variable takes does not determine the variability of the observations

  12. STATISTICS SAY… • 5236 drivers age 65 and over were involved in fatal accidents, compared to only 2900 drivers aged 16 and 17, so young people are safer drivers... • 65% of motorcycle fatalities occurred in states with mandatory helmet laws...

  13. Counts Versus Ratios • Simple counts are often not a good basis for comparison of two or more groups. • Group size isn’t always obvious—two groups of 25 U.S. states may have very different sizes even though both include the same number of states. • Deciding on a sensible basis for comparison requires thought!
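To make the counts-versus-ratios point concrete, here is a minimal Python sketch that turns the fatal-accident counts from the previous slide into rates. The group sizes (numbers of licensed drivers) are hypothetical placeholders, not figures from the presentation; they only illustrate how a much smaller group can have the lower count and still the higher rate.

```python
# Fatal-accident counts quoted on the "STATISTICS SAY" slide.
fatal_counts = {"65 and over": 5236, "16 and 17": 2900}

# HYPOTHETICAL numbers of licensed drivers per group (not from the source).
licensed_drivers = {"65 and over": 18_000_000, "16 and 17": 3_000_000}


def rate_per_100k(count, group_size):
    """Return the number of events per 100,000 members of the group."""
    return 100_000 * count / group_size


rates = {
    group: rate_per_100k(fatal_counts[group], licensed_drivers[group])
    for group in fatal_counts
}

for group, rate in rates.items():
    print(f"{group}: {fatal_counts[group]} fatal accidents, "
          f"{rate:.1f} per 100,000 licensed drivers")
```

Under these assumed group sizes, the ordering of the two groups reverses once the counts are divided by the number of drivers. Other denominators (miles driven, for instance) might be an even more sensible basis for comparison.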

  14. STATISTICS SAY… • 85% of software developers predicted that Microsoft's integration of Internet functions into Windows would help their company

  15. Some Simple Questions • Question 1 • Lost ticket: Yes 6, No 9 • Lost $20: Yes 8, No 6

  16. Some Simple Questions • People are more likely to say “yes” when they have lost a $20 bill • People tend to answer “not surprising” to both expressions • People are more likely to choose program A with the “save” version and program B with the “die” version

  17. Some Simple Questions • Be careful when wording survey questions – ask to see the phrasing! • Bill Gates: “It would help me immensely to have a survey showing that 90% of developers believe putting the browser into the operating system is a good idea…” • Browser vs. “browser technologies”

  18. STATISTICS SAY … • Researchers in Philadelphia investigated whether pamphlets containing information for cancer patients are written at a level that the cancer patients can comprehend • Median reading levels are equal

  19. Readability of Cancer Pamphlets

  20. Readability of Cancer Pamphlets • Graphs can illuminate • Look at the data! • Think about the question

  21. STATISTICS SAY… • American men were randomly selected for the 1970 draft • Draft numbers (1-366) were assigned to birthdates

  22. Draft Lottery • Calculate the median draft number for each month • 31 days: 16th value • 30 days: average 15th and 16th values • 29 days: 15th value
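The position rule on this slide is the usual definition of the median of sorted values. A minimal Python sketch follows; the function name `median_by_position` and the placeholder data are illustrative assumptions, and no actual 1970 draft numbers are reproduced here.

```python
import statistics


def median_by_position(values):
    """Median by position: the middle value for an odd count,
    the average of the two middle values for an even count."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n = 31 -> 16th value, n = 29 -> 15th value (1-based positions)
        return ordered[mid]
    # n = 30 -> average of the 15th and 16th values
    return (ordered[mid - 1] + ordered[mid]) / 2


# PLACEHOLDER data only: 31 made-up values standing in for one month's draft numbers.
placeholder_month = list(range(10, 320, 10))

print(median_by_position(placeholder_month))
print(statistics.median(placeholder_month))  # the standard library gives the same result
```

Applying such a function to each month's draft numbers would yield the monthly medians that the next slide tabulates.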

  23. Draft Lottery: median draft number by month • January 211.0 • February 210.0 • March 256.0 • April 225.0 • May 226.0 • June 207.5 • July 188.0 • August 145.0 • September 168.0 • October 201.0 • November 131.5 • December 100.0

  24. Draft Lottery

  25. Draft Lottery

  26. Draft Lottery • Statistics matter • Summaries can illuminate • Randomization can be difficult

  27. STATISTICS SAY… • The average time between eruptions of the Old Faithful Geyser is 71 minutes • August, 1985

  28. Geyser Eruptions

  29. Geyser Eruptions • Looks can be deceiving! • Use the graph that summarizes without losing important details

  30. STATISTICS SAY… • The average major league baseball salary in the United States is about $1.5 million

  31. Rowers’ Weights • 2000 Men’s Olympic Rowing Team

  32. Rowers’ Weights

  33. Rowers’ Weights • Full data set: mean 197.29, median 207.5 • Without coxswain: mean 200.11, median 210.00 • Without coxswain or lightweight rowers: mean 210.57, median 210.00 • With heaviest at 320: mean 215.33, median 210.00 • Resistance....

  34. Rowers’ Weights • Know what your numerical summary is measuring • Investigate causes for unusual observations • Baseball: median salary ~ $500,000
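The behavior summarized on these slides (the mean moves with unusual observations while the median barely changes) can be reproduced with a small Python sketch. The weights below are hypothetical and are not the 2000 Olympic team data; the light first value merely plays the role of a coxswain.

```python
import statistics

# HYPOTHETICAL weights in pounds; 120 stands in for a much lighter coxswain.
weights = [120, 185, 190, 200, 205, 210, 215, 220]

without_coxswain = [w for w in weights if w != 120]

# Replace the heaviest rower's weight with 320 to mimic an extreme observation.
with_heaviest_at_320 = sorted(without_coxswain)[:-1] + [320]

for label, data in [
    ("Full data set", weights),
    ("Without coxswain", without_coxswain),
    ("With heaviest at 320", with_heaviest_at_320),
]:
    print(f"{label}: mean {statistics.mean(data):.2f}, "
          f"median {statistics.median(data):.2f}")
```

In this sketch, inflating the heaviest weight raises the mean but leaves the median where it was, which is what it means for the median to be a resistant summary.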

  35. STATISTICS SAY… People live longer in countries with more televisions

  36. Televisions and Life Expectancy • Buy another television? • Association is not causation

  37. STATISTICS SAY… • Overall survival rates: • A: 80% B: 90% • Fair condition: • A: 98.3% B: 96.7% • Poor condition: • A: 52.5% B: 30.0%

  38. Hospital Recovery Rates • “Simpson’s Paradox” • Hospital A gets most of the poor condition cases • Patients in poor condition are less likely to survive • Thus: hospital A has the lower survival rate despite being the better choice for either condition • Beware of lurking variables
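The reversal can be checked with a few lines of Python. The patient counts below are assumptions chosen so that they reproduce the percentages quoted on the previous slide; the slides themselves give only the percentages.

```python
# ASSUMED counts as (survived, total), chosen to match the slide's percentages.
hospitals = {
    "A": {"fair": (590, 600), "poor": (210, 400)},
    "B": {"fair": (870, 900), "poor": (30, 100)},
}


def survival_rate(survived, total):
    """Proportion of patients who survived."""
    return survived / total


def overall_rate(by_condition):
    """Pool all conditions, then compute the survival proportion."""
    survived = sum(s for s, _ in by_condition.values())
    total = sum(t for _, t in by_condition.values())
    return survived / total


for name, by_condition in hospitals.items():
    parts = ", ".join(
        f"{condition} {survival_rate(*counts):.1%}"
        for condition, counts in by_condition.items()
    )
    print(f"Hospital {name}: overall {overall_rate(by_condition):.1%} ({parts})")
```

With these counts, Hospital A treats 400 of its 1000 patients in poor condition while Hospital B treats only 100 of 1000, so A's pooled rate is pulled down even though A does better within each condition.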

  39. Hospital Recovery Rates (cont.) [Figure: % survive, 0% to 100%, for Hospital A and Hospital B; fair condition]

  40. Hospital Recovery Rates (cont.) [Figure: % survive, 0% to 100%, for Hospital A and Hospital B; fair and poor condition]

  41. Hospital Recovery Rates (cont.) [Figure: % survive, 0% to 100%, for Hospital A and Hospital B; fair and poor condition]

  42. STATISTICS SAY… • Taking an aspirin each day reduces the risk of heart attack for men, but less so for women

  43. How Experiments Take Variability Into Account • Direct control • Blocking • Randomization

  44. Randomization

  45. Blocking Scheme A

  46. Blocking Scheme B

  47. Results from 100 Trials [Figure panels: First Blocking Scheme, Completely Randomized, Second Blocking Scheme]

  48. Controlling for Variability • Blocking reduces variability in the estimated mean difference • Homogeneous blocks are desirable • Randomization evens out the effects of extraneous variables
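A small simulation can illustrate the first two bullets. This is only a sketch under stated assumptions: the units, their baseline responses, the pairing into blocks, and the treatment effect of 5 are all hypothetical, and the design is not necessarily the one used for the slides' 100 trials.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# HYPOTHETICAL units: 10 homogeneous blocks of 2 units with similar baselines.
blocks = [(10 * b, 10 * b + 1) for b in range(1, 11)]
units = [u for block in blocks for u in block]
TREATMENT_EFFECT = 5  # hypothetical true effect added to every treated unit


def estimate(treated, control):
    """Estimated mean difference: treated mean minus control mean."""
    treated_responses = [t + TREATMENT_EFFECT for t in treated]
    return statistics.mean(treated_responses) - statistics.mean(control)


def completely_randomized():
    """Assign half of all units to treatment, ignoring blocks."""
    shuffled = units[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return estimate(shuffled[:half], shuffled[half:])


def blocked():
    """Within each block, randomly pick one unit for treatment."""
    treated, control = [], []
    for pair in blocks:
        t, c = rng.sample(pair, 2)  # random order within the block
        treated.append(t)
        control.append(c)
    return estimate(treated, control)


TRIALS = 100
randomized_estimates = [completely_randomized() for _ in range(TRIALS)]
blocked_estimates = [blocked() for _ in range(TRIALS)]
sd_randomized = statistics.stdev(randomized_estimates)
sd_blocked = statistics.stdev(blocked_estimates)

print(f"SD of estimates, completely randomized: {sd_randomized:.2f}")
print(f"SD of estimates, blocked:               {sd_blocked:.2f}")
```

Because the two units in each hypothetical block are nearly identical, the blocked estimates stay close to the true effect, while the completely randomized estimates also absorb the large differences between blocks.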

  49. STATISTICS SAY… • A log was selected at random…

  50. Sampling Logs • Does choosing times at random result in a random sample of logs?
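One way to think about this question, sketched in Python below, is to assume that each log occupies a stretch of time and that a randomly chosen time selects whichever log is present at that moment. The slide does not describe the setup, so this reading and the durations used are hypothetical.

```python
from fractions import Fraction

# HYPOTHETICAL: minutes that each of four logs occupies.
log_durations = [1, 1, 1, 7]
total_time = sum(log_durations)

# Choosing a time uniformly at random selects a log with probability
# proportional to the time it occupies.
p_random_time = [Fraction(d, total_time) for d in log_durations]

# A simple random sample gives every log the same chance.
p_simple_random = [Fraction(1, len(log_durations))] * len(log_durations)

for duration, p_time, p_srs in zip(log_durations, p_random_time, p_simple_random):
    print(f"log lasting {duration} min: "
          f"random time {p_time}, simple random sample {p_srs}")
```

Under this assumption, longer-lasting logs are over-represented, so choosing times at random would not give a random sample of logs.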
