Chapter 23

Use and Abuse of Statistical Inference

Thought Question 1
When presenting the results of a study, would it be sufficient to report only the P-value? Why would it be a good idea to also give a confidence interval based on the results?

Thought Question 2
Suppose a new study found that there was no difference in lung function, measured by average volume of air expired, for smokers and nonsmokers. What may have led to this finding? Do you think the lung function was exactly the same for both groups in the study?

Thought Question 3
The results of a CNN/USA Today/Gallup public opinion poll in August of 2005 showed that a majority of Americans were pro-choice on the abortion issue. Would it be fair to claim that “significantly more than 50% of Americans were pro-choice”? Explain.

Thought Question 3: Answer
• n = 1003
• 542 stated that they were pro-choice
• 95% C.I.: 0.509 to 0.571
• The entire interval lies above 0.50, so the claim is fair in the statistical sense.
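The interval on this slide can be checked with a short calculation. The sketch below assumes the usual large-sample (normal approximation) interval for a proportion with a multiplier of 1.96; the slide does not say which method was used, so small rounding differences from the reported endpoints are expected.

```python
import math

n = 1003   # respondents (from the slide)
x = 542    # respondents who stated they were pro-choice (from the slide)

p_hat = x / n                             # sample proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion
z = 1.96                                  # assumed multiplier for about 95% confidence

lower = p_hat - z * se
upper = p_hat + z * se

print(f"sample proportion: {p_hat:.4f}")
print(f"approximate 95% C.I.: {lower:.4f} to {upper:.4f}")
print("entire interval above 0.50:", lower > 0.50)
```

Because the lower endpoint stays above 0.50, the data support "more than 50%" at roughly the 5% significance level, which is the sense in which "significantly" is used in the question.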

Warnings about Reports on Hypothesis Tests: Data Origins
For any statistical analysis to be valid, the data must come from proper samples. Complex formulas and techniques cannot fix bad (biased) data. In addition, be sure to use an analysis that is appropriate for the type of data collected.

Warnings about Reports on Hypothesis Tests: P-value or C.I.?
P-values provide information as to whether findings are more than just good luck, but P-values alone may be misleading or leave out valuable information (as seen later in this chapter). Confidence intervals provide both the estimated values of important parameters and how uncertain the estimates are.

Warnings about Reports on Hypothesis Tests: Significance
If the word significant is used to try to convince you that there is an important effect or relationship, determine if the word is being used in the usual sense or in the statistical sense only.

Case Study: Patient Satisfaction
“Women Doctors Fare Better in Patient Survey,” reported in the Sacramento Bee, April 26, 1995.
Bertakis, Klea D., et al., “The influence of gender on physician practice style,” Medical Care, Vol. 33, No. 4, 1995, pp. 407-416.

Case Study: Patient Satisfaction
• Alternative (Research) Hypothesis: The mean satisfaction rating by patients who first saw a female physician is different from the mean satisfaction rating by patients who first saw a male physician.
• Null Hypothesis: There is no difference in the mean satisfaction rating by patients who first saw a female physician and the mean satisfaction rating by patients who first saw a male physician.

Case Study: Patient Satisfaction
• The alternative hypothesis is two-sided.
• The study was double-blinded (neither patients nor physicians were told the purpose of the survey).
• The survey was completed by 250 patients at the University of California at Davis Medical Center, who rated medical residents on a scale of 1 to 5 (very dissatisfied to very satisfied).

Case Study: Patient Satisfaction
• Bee: “The female physicians received an average score of 4.27. The men – a respectable, yet significantly lower score of 4.05.”
• The average difference was 0.22.
• Medical Care: the difference was “small but statistically significant (P-value = 0.02).”
• Medical Care: “This difference is both statistically and clinically significant.”

Warnings about Reports on Hypothesis Tests: Large Sample
If a study is based on a very large sample size, relationships found to be statistically significant may not have much practical importance.
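To see how sample size alone can produce statistical significance, the sketch below runs a two-sided, pooled two-proportion z-test on the same one-percentage-point difference at two sample sizes. The counts are hypothetical and chosen only for illustration; they are not taken from any study in this chapter, and the function name is likewise an illustrative choice.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for equal proportions, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value

# Hypothetical counts: 51% versus 50% in both scenarios, only n changes.
scenarios = {
    "n = 500 per group": (255, 500, 250, 500),
    "n = 50,000 per group": (25_500, 50_000, 25_000, 50_000),
}

for label, counts in scenarios.items():
    diff, z, p = two_proportion_z_test(*counts)
    print(f"{label}: difference = {diff:.3f}, z = {z:.2f}, P-value = {p:.4f}")
```

The observed difference is identical in both scenarios; only the larger sample makes it statistically significant. Whether a one-point difference matters in practice is a separate question that the P-value cannot answer.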

Case Study: Drug Use in American High Schools
Alcohol Use
Bogert, Carroll. “Good news on drugs from the inner city,” Newsweek, Feb. 1995, pp. 28-29.

Case Study: Drug Use in American High Schools
• Alternative Hypothesis: The percentage of high school students who used alcohol in 1993 is less than the percentage who used alcohol in 1992.
• Null Hypothesis: There is no difference in the percentage of high school students who used alcohol in 1993 and in 1992.

Case Study: Drug Use in American High Schools
The 1993 survey was based on 17,000 seniors, 15,500 10th graders, and 18,500 8th graders.

Case Study: Drug Use in American High Schools
• The article suggests that the survey reveals “good news” since the differences are all negative.
• The differences are significant.
  • statistically?
  • practically?

Warnings about Reports on Hypothesis Tests: Small Sample
If you read that “no difference” or “no relationship” has been found in a study, try to determine the sample size used. Unless the sample size was large, remember that it could be that there is indeed an important relationship in the population, but that not enough data were collected to detect it. In other words, the test could have had very low power.

Case Study: Memory Loss
Memory Loss in American Hearing, American Deaf and Chinese Adults
Levy, B. and E. Langer. “Aging free from negative stereotypes: Successful memory in China and among the American deaf,” Journal of Personality and Social Psychology, Vol. 66, pp. 989-997.

Case Study: Memory Loss
• Average Memory Test Scores (higher is better)
• 30 subjects were sampled from each population

Case Study: Memory Loss
• Young Americans (hearing and deaf) have significantly higher mean scores.
• Science News (July 2, 1994, p. 13): “Surprisingly, ...memory scores for older and younger Chinese did not statistically differ.”

Case Study: Memory Loss
• Since the sample sizes are very small, there is an increased chance that the test will result in a Type II error if indeed there is a difference between young and old subjects’ mean memory scores.
• The “surprising” result may just be a Type II error.
• The test could have very low power.
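The link between sample size and power can be made concrete with a rough calculation. The sketch below uses a normal approximation for a two-sided, two-sample test with equal group sizes and a known common standard deviation. The effect size (a true difference of 0.3 standard deviations) is a hypothetical value, not a figure from the memory study, and the function name is an illustrative choice.

```python
import math
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with equal group sizes."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)                     # two-sided critical value
    shift = delta / (sigma * math.sqrt(2 / n_per_group))   # true difference in standard-error units
    # Probability that the test statistic lands in either rejection region
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)

# Hypothetical true difference of 0.3 standard deviations
for n in (30, 200):
    power = approx_power(delta=0.3, sigma=1.0, n_per_group=n)
    print(f"n = {n} per group: power = {power:.2f}, "
          f"chance of a Type II error = {1 - power:.2f}")
```

Under these assumptions, a study with 30 subjects per group would miss a real difference of this size most of the time, so a finding of "no difference" from such a study is weak evidence that no difference exists.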

Warnings about Reports on Hypothesis Tests: 1 or 2 Sided
Try to determine whether the test was one-sided or two-sided. If a test is one-sided, and details are not reported, you could be misled into thinking there was no difference, when in fact there was one in the direction opposite to that hypothesized.
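A small numerical sketch shows how a one-sided test can hide a difference in the unexpected direction. The z statistic below is hypothetical: it stands for a study whose alternative hypothesis predicted a positive difference but whose data came out clearly negative. The function name is an illustrative choice.

```python
from statistics import NormalDist

def p_values(z):
    """Return (upper-tailed one-sided, two-sided) P-values for a z statistic."""
    nd = NormalDist()
    one_sided_upper = 1 - nd.cdf(z)          # alternative: difference > 0
    two_sided = 2 * (1 - nd.cdf(abs(z)))     # alternative: difference != 0
    return one_sided_upper, two_sided

z = -2.5  # hypothetical result, opposite in sign to the hypothesized direction
one_sided, two_sided = p_values(z)

print(f"one-sided P-value (hypothesized direction): {one_sided:.4f}")
print(f"two-sided P-value: {two_sided:.4f}")
```

The one-sided test reports nothing close to significance, and a headline might read "no difference found," yet the same data would be statistically significant under a two-sided test.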

Case Study: Seen a UFO?
Seen a UFO? You May Be Healthier Than Your Friends
Roper Organization. Unusual Personal Experiences: An Analysis of the Data from Three National Surveys, Las Vegas: Bigelow Holding Corp., 1992.

Case Study: Seen a UFO?
• Research Hypothesis (Alternative): People who claim to have seen a UFO are on average more psychologically disturbed than those who make no such claim.
• Null Hypothesis: People who claim to have seen a UFO are on average no more or less psychologically disturbed than those who make no such claim.

Case Study: Seen a UFO?
• 49 subjects were recruited through a newspaper.
  • 18 were UFO nonintense
  • 31 were UFO intense (could explain details of encounter)
• 127 control subjects were recruited.
  • 74 students of a psychology class (receiving credit for participation)
  • 53 community members recruited through a newspaper

Case Study: Seen a UFO?
• New York Times (1993): “Study Finds No Abnormality in Those Reporting UFOs.”
• Results: UFO groups actually scored significantly better (statistically) on many of the psychological measures.
• The stated one-sided alternative hypothesis was not supported. Does this mean the null hypothesis is true?

Warnings about Reports on Hypothesis Tests: Only Significant are Reported?
Sometimes researchers will perform a multitude of tests, and the reports will focus on those that achieved statistical significance. Remember that if nothing interesting is happening and all of the null hypotheses tested are true, then about 1 in 20 (0.05) of the tests should achieve statistical significance just by chance. Beware of reports where it is evident that many tests were conducted, but where results of only one or two are presented as “significant.”
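The "1 in 20" figure can be extended to show how quickly chance findings accumulate. The sketch below assumes every null hypothesis is true, each test uses a 0.05 significance level, and the tests are independent. Independence is a simplifying assumption that real studies often do not meet, and the function name is an illustrative choice.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability of at least one significant result when all nulls are true.

    Assumes the tests are independent, which is a simplification.
    """
    return 1 - (1 - alpha) ** num_tests

for k in (1, 5, 20, 100):
    expected = 0.05 * k   # expected number of chance "significant" results
    prob = chance_of_false_positive(k)
    print(f"{k:>3} tests: expect {expected:.2f} by chance, "
          f"P(at least one) = {prob:.2f}")
```

With 20 such tests, at least one "significant" result is more likely than not even when nothing is going on, which is why a report highlighting one or two significant findings out of many tests deserves skepticism.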

Case Study: Spinach is Good?
So You Thought Spinach Was Good for You?
Nowak, R. “Beta-carotene: Helpful or harmful?” Science, Vol. 264, April 22, 1994, pp. 500-501.

Case Study: Spinach is Good?
• Startling finding: Supplements of the antioxidant beta-carotene markedly increased the incidence of lung cancer among heavy smokers in Finland.
• This is the result of a large, randomized clinical trial: 29,000 cases.
• But…there were multiple tests conducted.

Key Concepts
• Difference between a statistically significant effect and a practically important one
• Large Samples and Statistical Significance
• Small Samples and Statistical Significance
• Multiple Tests and Statistical Significance
