Statistics of Illumination Beth Chance Roxy Peck Cal Poly, San Luis Obispo
STATISTICS SAY… • Increasingly daily life involves statistical information • interpretations of graphical and numerical summaries • comparisons of groups • poll results from random samples • conclusions from randomized experiments • predictions of future outcomes
Most people use statistics as a drunkard uses a lamppost: more for support than for illumination.
Predicting Variable Behavior (a) Height of students in this class (b) Students’ preference for Coca-Cola vs. Pepsi-Cola (c) Number of siblings of individuals (d) Amount paid for last haircut (e) Gender breakdown (f) Students’ guesses of my age
Matching Variables to Graphs • Think about context! • Anticipate patterns and variations • variable intuition • graph-sense
STATISTICS SAY… • Students’ heights would show more variability than guesses of my age • KDC Pursues High-Return, Low-Risk Strategy
Describing Variability • The “bumpiness” of a histogram does not determine the variability of the observations • The number of distinct values the variable takes does not determine the variability of the observations
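A small sketch of the second point, using two made-up data sets (hypothetical values, not the class data): one takes many distinct values but is tightly clustered, the other takes only two values that sit far apart. The names `many_values`, `two_values`, and `describe` are illustrative only.

```python
from statistics import pstdev

# HYPOTHETICAL illustration data, not taken from the slides.
many_values = [68, 69, 70, 71, 72, 73, 74]   # 7 distinct values, tightly clustered
two_values = [40, 40, 40, 60, 60, 60]        # 2 distinct values, far apart


def describe(data):
    """Return (number of distinct values, population standard deviation)."""
    return len(set(data)), pstdev(data)


distinct_many, sd_many = describe(many_values)
distinct_two, sd_two = describe(two_values)

print(f"many_values: {distinct_many} distinct values, SD = {sd_many:.2f}")
print(f"two_values:  {distinct_two} distinct values, SD = {sd_two:.2f}")
```

The set with fewer distinct values has the larger standard deviation, so counting distinct values says little about spread.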
STATISTICS SAY… • 5236 drivers age 65 and over were involved in fatal accidents, compared to only 2900 drivers aged 16 and 17, so young people are safer drivers... • 65% of motorcycle fatalities occurred in states with mandatory helmet laws...
Counts Versus Ratios • Simple counts are often not a good basis for comparison of two or more groups. • Group size isn’t always obvious: two groups of 25 U.S. states may have very different sizes even though both include the same number of states. • Deciding on a sensible basis for comparison requires thought!
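As a rough sketch of why group size matters, the driver counts quoted earlier can be turned into rates. The slides do not give the number of drivers in each age group, so the group sizes below are invented round numbers used purely for illustration; only the two fatal-accident counts come from the slides.

```python
def rate_per_100k(count, group_size):
    """Events per 100,000 members of the group."""
    return 100_000 * count / group_size


# Counts quoted on the slide.
fatal_65_plus = 5236
fatal_16_17 = 2900

# HYPOTHETICAL group sizes, invented for illustration (not real licensing data).
drivers_65_plus = 20_000_000
drivers_16_17 = 2_000_000

rate_old = rate_per_100k(fatal_65_plus, drivers_65_plus)
rate_young = rate_per_100k(fatal_16_17, drivers_16_17)

print(f"65 and over: {fatal_65_plus} drivers, {rate_old:.1f} per 100,000")
print(f"16 and 17:   {fatal_16_17} drivers, {rate_young:.1f} per 100,000")
```

Under these assumed sizes the group with the larger count has the smaller rate, which is the kind of reversal the slide warns about. With different denominators the comparison could come out differently.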
STATISTICS SAY… • 85% of software developers predicted that Microsoft's integration of Internet functions into Windows would help their company
Some Simple Questions • Question 1 • Lost ticket: Yes 6, No 9 • Lost $20: Yes 8, No 6
Some Simple Questions • People are more likely to say “yes” when they have lost a $20 bill • People tend to answer “not surprising” to both expressions • People are more likely to choose program A with the “save” version and program B with the “die” version
Some Simple Questions • Be careful when wording survey questions – ask to see the phrasing! • Bill Gates: It would help me IMMENSELY to have a survey showing that 90% of developers believe putting the browser into the operating system is a good idea… • Browser vs. “browser technologies”
STATISTICS SAY … • Researchers in Philadelphia investigated whether pamphlets containing information for cancer patients are written at a level that the cancer patients can comprehend • Median reading levels are equal
Readability of Cancer Pamphlets • Graphs can illuminate Look at the data! • Think about the question
STATISTICS SAY… • American men were randomly selected for the 1970 draft • Draft numbers (1-366) were assigned to birthdates
Draft Lottery • Calculate the median draft number for each month • 31 days: 16th value • 30 days: average 15th and 16th values • 29 days: 15th value
Draft Lottery
Month       Median
January     211.0
February    210.0
March       256.0
April       225.0
May         226.0
June        207.5
July        188.0
August      145.0
September   168.0
October     201.0
November    131.5
December    100.0
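The position rule for the monthly medians (16th value for 31 days, average of the 15th and 16th for 30 days, 15th value for 29 days) can be sketched as below. The helper name `median_by_position` is illustrative, and the values 1..n are stand-ins rather than real draft numbers.

```python
from statistics import median


def median_by_position(values):
    """Median via the position rule: the middle value for odd n,
    the average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # n = 31 -> 16th value, n = 29 -> 15th
    return (ordered[mid - 1] + ordered[mid]) / 2  # n = 30 -> average of 15th and 16th


for days in (31, 30, 29):
    # Stand-in values 1..days, NOT real draft numbers: the k-th smallest value is k.
    placeholder = list(range(1, days + 1))
    # Cross-check against the standard library's median.
    assert median_by_position(placeholder) == median(placeholder)
    print(days, "days -> median position", median_by_position(placeholder))
```

Because the stand-in values are 1..n, the printed result is the position of the median in the sorted list, with 15.5 meaning "halfway between the 15th and 16th values".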
Draft Lottery • Statistics matter • Summaries can illuminate • Randomization can be difficult
STATISTICS SAY… • The average time between eruptions of the Old Faithful Geyser is 71 minutes • August, 1985
Geyser Eruptions • Looks can be deceiving! • Use the graph that summarizes without losing important details
STATISTICS SAY… • The average major league baseball salary in the United States is about $1.5 million
Rowers’ Weights • 2000 Men’s Olympic Rowing Team
Rowers’ Weights
Data set                                  Mean     Median
Full data set                             197.29   207.5
Without coxswain                          200.11   210.00
Without coxswain or lightweight rowers    210.57   210.00
With heaviest at 320                      215.33   210.00
Resistance....
Rowers’ Weights • Know what your numerical summary is measuring • Investigate causes for unusual observations • Baseball: median salary ~ $500,000
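The resistance of the median can be sketched with a small made-up list of weights. These are hypothetical values in pounds, not the Olympic team's data; the point is only how the two summaries react when an unusually light value is dropped or the heaviest value is inflated to 320.

```python
from statistics import mean, median

# HYPOTHETICAL weights in pounds, invented for illustration (not the team's data).
weights = [120, 185, 205, 210, 210, 215, 220]

without_lightest = weights[1:]          # drop the unusually light value
with_outlier = weights[:-1] + [320]     # replace the heaviest value with 320

for label, data in [
    ("Full list", weights),
    ("Without lightest", without_lightest),
    ("Heaviest set to 320", with_outlier),
]:
    print(f"{label:20s} mean = {mean(data):6.2f}   median = {median(data):6.2f}")
```

In this sketch the mean moves with every change to the extremes while the median stays put, so it helps to know which of the two a reported "average" refers to.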
STATISTICS SAY… People live longer in countries with more televisions
Televisions and Life Expectancy • Buy another television? • Association is not causation
STATISTICS SAY… • Overall survival rates: • A: 80% B: 90% • Fair condition: • A: 98.3% B: 96.7% • Poor condition: • A: 52.5% B: 30.0%
Hospital Recovery Rates • “Simpson’s Paradox” • Hospital A gets most of the poor condition cases • Patients in poor condition are less likely to survive • Thus: hospital A has the lower survival rate despite being the better choice for either condition • Beware of lurking variables
Hospital Recovery Rates (cont.) [Figure: % survive (0% to 100%) at Hospital A and Hospital B, shown for fair-condition and poor-condition patients]
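The reversal in the hospital rates can be checked with a short calculation. The slides give only percentages, so the patient counts below are an assumption: they were chosen because they reproduce every quoted rate (80% and 90% overall, 98.3% and 96.7% fair, 52.5% and 30.0% poor) and may not be the counts behind the original example.

```python
# (survived, total) per hospital and condition.
# ASSUMED counts, chosen only because they reproduce the percentages on the slide.
patients = {
    "A": {"fair": (590, 600), "poor": (210, 400)},
    "B": {"fair": (870, 900), "poor": (30, 100)},
}


def rate(survived, total):
    """Survival rate as a percentage."""
    return 100 * survived / total


def overall(hospital):
    """Overall survival rate for one hospital, pooling both conditions."""
    groups = patients[hospital].values()
    return rate(sum(s for s, _ in groups), sum(t for _, t in groups))


for hospital, groups in patients.items():
    by_condition = ", ".join(
        f"{cond} {rate(s, t):.1f}%" for cond, (s, t) in groups.items()
    )
    print(f"Hospital {hospital}: overall {overall(hospital):.1f}% ({by_condition})")
```

With these counts Hospital A treats 400 of the 500 poor-condition patients, which drags its pooled rate below Hospital B's even though A does better within each condition.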
STATISTICS SAY… • Taking an aspirin each day reduces the risk of heart attack for men, but less so for women
How Experiments Take Variability Into Account • Direct control • Blocking • Randomization
Results from 100 Trials [Figure panels: First Blocking Scheme; Completely Randomized; Second Blocking Scheme]
Controlling for Variability • Blocking reduces variability in the estimated mean difference • Homogeneous blocks are desirable • Randomization evens out the effects of extraneous variables
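A minimal simulation sketch of the first two points, under stated assumptions: the experimental units, their baseline values, and the pairing into blocks are all hypothetical, and the true treatment effect is set to zero so that any spread in the estimated difference comes from the assignment alone. This is not the activity behind the slides, only an illustration of the same idea.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

# HYPOTHETICAL units: 10 homogeneous blocks of 2 units each. Units within a
# block have nearly equal baseline responses; blocks differ a lot from each other.
blocks = [(10 * i, 10 * i + 1) for i in range(10)]
units = [u for block in blocks for u in block]


def completely_randomized():
    """Assign half of all units to treatment at random, ignoring blocks."""
    shuffled = random.sample(units, len(units))
    half = len(units) // 2
    return mean(shuffled[:half]) - mean(shuffled[half:])


def blocked():
    """Within each block, assign one unit to treatment at random."""
    diffs = []
    for a, b in blocks:
        treated, control = (a, b) if random.random() < 0.5 else (b, a)
        diffs.append(treated - control)
    return mean(diffs)


trials = 100
sd_cr = pstdev([completely_randomized() for _ in range(trials)])
sd_blocked = pstdev([blocked() for _ in range(trials)])

print(f"Completely randomized: SD of estimated difference = {sd_cr:.2f}")
print(f"Blocked:               SD of estimated difference = {sd_blocked:.2f}")
```

Because the blocks here are very homogeneous, the blocked estimates vary far less than the completely randomized ones. A blocking scheme built on a variable unrelated to the response would not be expected to show the same gain.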
STATISTICS SAY… • A log was selected at random…
Sampling Logs • Does choosing times at random result in a random sample of logs?