
Biostatistics in practice Session 3

Youngju Pak, Ph.D. UCLA Clinical and Translational Science Institute LA BioMed/Harbor-UCLA Medical Center School of Medicine http://research.LABioMed.org/Biostat E-mail: ypak@labiomed.org.



  1. Biostatistics in practice Session 3. Youngju Pak, Ph.D. UCLA Clinical and Translational Science Institute LA BioMed/Harbor-UCLA Medical Center School of Medicine http://research.LABioMed.org/Biostat E-mail: ypak@labiomed.org

  2. Table of Contents • Analogy of hypothesis testing • How to compute a P-value and interpret it • Understanding the sampling distribution and confidence intervals (CIs) • How to interpret a CI • The relationship between a P-value and a CI

  3. The procedure of statistical inference: Population (population parameter) → sampling mechanism: random sample or convenience sample → Sample → sample estimate of the population parameter / descriptive statistics → CIs or P-values for the population parameter

  4. Analogy for hypothesis testing. Example: a bet between two friends • Suppose you and a friend are playing a "fun" gambling game. Your friend has a coin, which you flip: • if "tails", your friend pays you $1; • if "heads", you pay your friend $1. • After 10 plays, you got 9 heads. Do you trust your friend? Is this a fair coin? What is your argument?

  5. Statistical Hypothesis Testing • H0: fair coin (null) vs. Ha: unfair coin (alternative) • Assume the coin is "fair" (assume H0 is true). • You and your friend have to agree in advance on a threshold for what counts as "RARE": if Prob(# of H = 9 or more | 10 trials) is less than a certain value, say α, then 9 or more heads in 10 trials would be considered very unlikely to happen if the coin were fair. • The rule is: if Prob(# of H = 9 or more | 10 trials) < 0.05 (α, the Type I error rate, also called the level of significance), your friend would agree to conclude that the coin is not fair, i.e., reject H0 in favor of Ha.

  6. Statistical Hypothesis Testing (continued) • Collect data and compute the "evidence" assuming H0 (fair coin) is true: P(# of H = 9 or more | 10 trials) = 11/1024 ≈ 0.011 (1.1%). • Make a decision: P(# of H = 9 or more | 10 trials) ≈ 1.1% < 5%. • Thus, this result is VERY unlikely to happen if the coin is fair. • We found significant evidence against H0 in favor of Ha. • Therefore, conclude that the coin is UNFAIR (thus, the bet is invalid).
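The tail probability in this decision can be checked exactly with the binomial distribution: under H0 each of the 2^10 = 1024 head/tail sequences is equally likely, and 10 + 1 = 11 of them have 9 or more heads. A minimal Python sketch, assuming independent flips with P(heads) = 0.5 under H0; the helper name `upper_tail_binomial` is hypothetical:

```python
from math import comb

def upper_tail_binomial(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): sum the exact probabilities of k, k+1, ..., n successes."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

alpha = 0.05                          # threshold for "RARE", agreed on before flipping
p_value = upper_tail_binomial(9, 10)  # P(9 or 10 heads | fair coin) = 11/1024
print(f"P(>= 9 heads in 10 flips | fair coin) = {p_value:.4f}")
print("Reject H0 (coin looks unfair)" if p_value < alpha else "Fail to reject H0")
```

Summing "9 or more" rather than "exactly 9" is what makes this a P-value: it counts every outcome at least as extreme as the one observed.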

  7. How to interpret P-value = 1.1%, in general? • A P-value is the probability of a result at least as extreme as the one observed, calculated on the assumption that H0 is true. • A P-value is NOT the probability that the alternative is correct. • A P-value should be used as evidence to DISPROVE H0, not to prove Ha.

  8. How to interpret P-values: Example. Acute secondary Adrenal Insufficiency (AI) after Traumatic Brain Injury (TBI): a prospective study. Objective: To determine the prevalence, clinical characteristics, and effect of AI on TBI patients. Procedure: 80 TBI and 41 non-TBI patients were followed during hospitalization for up to 9 days; blood samples were taken every 8 hours and vital signs recorded every hour. A subject is classified as AI if 2 successive serum cortisol levels are low.

  9. Goal: Do groups differ by more than is expected by chance? • First, we need to: • Specify the experimental units (persons? blood draws?). • Specify a single outcome for each unit (e.g., yes/no (binary) or continuous?). • Examine the raw data, e.g., a histogram, to check whether the test assumptions are met. • Specify the group summary measure to be used (e.g., %, mean, or median over units): descriptive statistics. • Choose a particular statistical test for the outcome and make inferences with inferential statistics (CI, P-value).

  10. Outcome type → statistical test (Cohan (2005) Crit Care Med;33:2358-66) • Means → t-test • Medians → Wilcoxon test • %s → chi-square test

  11. t-Test for Minimal Mean Arterial Pressure (MAP): Step 1 • Calculate a standardized quantity for the particular test, a "test statistic". • Non-AI: N = 38, Mean = 63.4122807, Std Dev = 8.7141575, SE(Mean) = 8.71/√38 ≈ 1.41 • AI: N = 42, Mean = 56.1666667, Std Dev = 10.7824634, SE(Mean) = 10.78/√42 ≈ 1.66 • Difference in group means = 63.4 - 56.2 = 7.2 ("signal") • $SE(\text{Diff}) \approx \sqrt{SEM_1^2 + SEM_2^2} = \sqrt{1.66^2 + 1.41^2} \approx 2.2$ ("noise" due to random sampling) • Signal-to-noise ratio → test statistic $t = (7.2 - 0)/2.2 \approx 3.28$
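Here is a Python sketch of Step 1 using the summary statistics from the slide. Besides the slide's unpooled SE, sqrt(SEM1^2 + SEM2^2), it also computes the pooled (equal-variance) SE of the classic two-sample t-test; the pooled version is an addition for comparison, not something shown on the slide. With the unrounded inputs, the unpooled SE gives t ≈ 3.32 and the pooled SE gives t ≈ 3.28, so the reported 3.28 appears to come from the pooled test; either way the conclusion is the same.

```python
from math import sqrt

# Summary statistics as reported on the slide (minimal MAP)
n1, mean1, sd1 = 38, 63.4122807, 8.7141575   # non-AI
n2, mean2, sd2 = 42, 56.1666667, 10.7824634  # AI

diff = mean1 - mean2  # "signal"

# Unpooled SE, as written on the slide: sqrt(SEM1^2 + SEM2^2)
se1, se2 = sd1 / sqrt(n1), sd2 / sqrt(n2)
se_unpooled = sqrt(se1**2 + se2**2)

# Pooled (equal-variance) SE of the classic two-sample t-test
sp2 = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
se_pooled = sqrt(sp2 * (1 / n1 + 1 / n2))

# Test statistic = (signal - value under H0) / noise
t_unpooled = (diff - 0) / se_unpooled
t_pooled = (diff - 0) / se_pooled
print(f"diff = {diff:.2f}; SE = {se_unpooled:.2f} (unpooled), {se_pooled:.2f} (pooled)")
print(f"t = {t_unpooled:.2f} (unpooled), {t_pooled:.2f} (pooled)")
```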

  12. t-Test for Minimal MAP: Step 2 • Compare the test statistic to what it is expected to be if the groups (or the populations they represent) do not differ (H0). • Often, t is approximately normal (a bell curve). • Does a t statistic of 3.28 seem "RARE" to you? Why? [Figure: bell curve of t expected under H0; about 95% of the area lies between -2 and 2, with 2.5% in each tail; the area between -2 and -1 is 0.14; the observed t = 3.28 is marked.]
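The areas on this slide come from the standard normal curve, which the t distribution approaches for large samples. A short Python sketch using only the standard library, assuming the normal approximation is adequate here; `normal_cdf` is a hypothetical helper name:

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), written with the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

area_minus2_to_minus1 = normal_cdf(-1) - normal_cdf(-2)  # area between -2 and -1
central_95 = normal_cdf(1.96) - normal_cdf(-1.96)        # "roughly between -2 and 2"
beyond_observed = 1 - normal_cdf(3.28)                   # area above the observed t
print(f"{area_minus2_to_minus1:.2f} {central_95:.3f} {beyond_observed:.5f}")
```

The normal tail above 3.28 (about 0.0005) is a little smaller than the t-based value of about 0.0007, because the t distribution has slightly heavier tails than the normal.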

  13. t-Test for Minimal MAP: P-Value • Declare the groups to differ if the test statistic is RARE when H0 is true. [How rare?] • P-value = Prob(T > 3.28) = 0.0007 (one-sided). • In practice, a two-sided P-value is usually used: two-sided P-value = 2 × one-sided P-value = 2 × 0.0007 = 0.0014 < 0.05. • Conclusion: the groups differ, since a value as extreme as 3.28 has < 5% chance of occurring if there is no difference in the entire population. • Smaller P-values ↔ more evidence of group differences. [Figure: curve expected when H0 is true, with the central 95% marked; area = 0.0007 in each tail beyond ±3.28; observed = 3.28.]
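Python's standard library has no t-distribution CDF, so this sketch integrates the t density numerically with Simpson's rule. It assumes 38 + 42 - 2 = 78 degrees of freedom (the pooled two-sample t-test), and the helper names are hypothetical. In real analyses a statistics package would normally be used instead (for example SciPy's `scipy.stats.t.sf`).

```python
from math import exp, lgamma, log, pi

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))

def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """P(T > t) for t >= 0, as 0.5 minus the area from 0 to t (Simpson's rule, even steps)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)  # Simpson weights 4, 2, 4, ...
    return 0.5 - total * h / 3

df = 38 + 42 - 2           # degrees of freedom of the pooled two-sample t-test
one_sided = t_upper_tail(3.28, df)
two_sided = 2 * one_sided  # the t distribution is symmetric about 0
print(f"one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

Doubling the one-sided value works because the t distribution is symmetric, so the area below -3.28 equals the area above +3.28.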

  14. One-sided or two-sided P-values? • There are two types of P-values for a t-test: • A two-sided P-value assumes that differences (between groups or from pre to post) are possible in both directions, e.g., an increase or a decrease. • A one-sided P-value assumes that differences can go in only one direction (only an increase, or only a decrease), or that one group can only have higher (or only lower) responses than the other group. This situation is very rare, so one-sided P-values are generally not acceptable.

  15. Tests on Percentages • Is 26.3% vs. 61.9% (% of patients with MAP < 60, non-AI vs. AI) statistically significant (p < 0.05), i.e., is the difference so large that it would have < 5% chance of occurring by chance if the groups do not really differ? • Solution: same theme as for means. Find a test statistic and compare it to its expected values if the groups do not differ. See next slide.

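  16. Tests on Percentages • A t-test cannot be used for comparing lab data from multiple blood draws per subject. • Chi-square distribution: here, the signal in the test statistic is a squared quantity, expected to be about 1 if the groups do not differ. • Test statistic = 10.2 >> 3.84 (the 5% cutoff for 1 degree of freedom), so p < 0.05. In fact, p = 0.002. [Figure: chi-square curve with expected value 1 and 95% of the area below the 5% cutoff; area = 0.002 beyond the observed value of 10.2.]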

  17. Tests on Percentages: Chi-Square • The chi-square test statistic (10.2 in the example) is found by first calculating the expected number of AI patients with MAP < 60, and the same for non-AI patients, if AI and non-AI patients really do not differ in this respect. • Then chi-square is the sum of the standardized squared differences, $\chi^2 = \sum (\text{Observed} - \text{Expected})^2 / \text{Expected}$. • This should be close to 1, as in the graph on the previous slide, if the groups do not differ. • The value 10.2 seems too big (extreme) to have happened by chance (probability = 0.002), i.e., if there is no difference among "all" TBI subjects (H0).
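A Python sketch of the chi-square calculation for a 2 × 2 table. The counts are an assumption, not stated on the slides: 10 of 38 non-AI and 26 of 42 AI patients with MAP < 60, which reproduce the 26.3% and 61.9% and give χ² ≈ 10.2. For 1 degree of freedom the tail area is P(χ² > x) = erfc(sqrt(x/2)). With these counts that area is about 0.0014, the same order as the reported p = 0.002; the reported value may come from a different variant of the test.

```python
from math import erfc, sqrt

# Assumed counts, inferred from 26.3% of 38 non-AI and 61.9% of 42 AI patients
observed = {
    "non-AI": {"MAP<60": 10, "MAP>=60": 28},
    "AI": {"MAP<60": 26, "MAP>=60": 16},
}
groups = list(observed)
cols = ["MAP<60", "MAP>=60"]
row_tot = {g: sum(observed[g].values()) for g in groups}
col_tot = {c: sum(observed[g][c] for g in groups) for c in cols}
n = sum(row_tot.values())

chi2 = 0.0
for g in groups:
    for c in cols:
        expected = row_tot[g] * col_tot[c] / n  # count expected if the groups do not differ
        chi2 += (observed[g][c] - expected) ** 2 / expected

# A 2 x 2 table has 1 degree of freedom: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = erfc(sqrt(chi2 / 2))
print(f"chi-square = {chi2:.1f}, p = {p_value:.4f}")
```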

  18. How RARE is "RARE"? • Convention: "too deviant" means < 5% chance → |t| > about 2. • Why not choose, say, |t| > 3, so that our chance of being wrong is even smaller, < 1%? [Figure: < 0.5% in each tail beyond ±3, > 99% chance in between; observed = 3.28.] • Answer: then the chance of missing a real difference increases, which is the converse wrong conclusion. This is analogous to setting the threshold for a diagnostic test of disease.
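The trade-off can be made concrete with the normal approximation. This sketch compares the cutoffs |z| > 1.96 and |z| > 3 for a hypothetical true effect of 3 standard errors (an assumed value, not one from the study): the stricter cutoff lowers the false-alarm rate but raises the chance of missing the real effect.

```python
from math import erf, sqrt

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

delta = 3.0  # hypothetical true effect, in SE units (not a value from the study)
results = {}
for cutoff in (1.96, 3.0):
    alpha = 2 * (1 - normal_cdf(cutoff))  # chance of |z| > cutoff when H0 is true
    # Chance of landing beyond +cutoff when the true effect is delta
    # (the far opposite tail is negligible here and is ignored)
    power = 1 - normal_cdf(cutoff - delta)
    results[cutoff] = (alpha, 1 - power)  # (false-alarm rate, miss rate)
    print(f"|z| > {cutoff}: alpha = {alpha:.4f}, chance of missing the effect = {1 - power:.2f}")
```

Like moving a diagnostic cutoff, the two error rates cannot both be reduced without more information (here, a larger sample, which shrinks the SE).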

  19. A statistically significant result --- • is not necessarily an important or even interesting result • may not be scientifically interesting or clinically significant. • With large sample sizes, very small differences may turn out to be statistically significant. In such a case, practical implications of any findings must be judged on other than statistical grounds. • Statistical significance does not imply practical significance

  20. How to interpret non-significant P-values • Possible explanations: 1. There is no difference (H0 is true). 2. There is a real difference (Ha is true), but we failed to detect it because the sample size was too small: a Type II error. • There is no way to determine whether a non-significant difference is the result of a small sample size or of the null hypothesis being correct. • Thus, non-significant P-values should almost always be regarded as INCONCLUSIVE rather than as an indication of no effect (we fail to reject the null). • A non-significant P-value does NOT prove H0.

  21. Back to Paper: Normal Range (non-AI: SD = 8.7, N = 38; AI: SD = 10.8, N = 42) What is the "normal" range for lowest MAP in AI patients, i.e., approximately what range contains 95% of subjects?

  22. Back to Paper: Normal Range (non-AI: SD = 8.7, N = 38; AI: SD = 10.8, N = 42) What is the "normal" range for lowest MAP in AI patients, i.e., approximately what range contains 95% of subjects? Answer: 56.2 ± 2(10.8) ≈ 35 to 78

  23. Back to Paper: Confidence Intervals (non-AI: SD = 8.7, N = 38, SE = 1.41; AI: SD = 10.8, N = 42, SE = 1.66) • $SE(\text{Diff of Means}) \approx \sqrt{SEM_1^2 + SEM_2^2} \approx 2.2$ • Δ = 63.4 - 56.2 = 7.2 is the best guess for the difference in mean MAP between "all" non-AI and "all" AI patients. • We are 95% confident that this difference lies within ≈ 7.2 ± 2 SE(Diff) = 7.2 ± 2(2.2) = 2.8 to 11.6.
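The "normal range" and the confidence interval differ only in which spread is used: the SD describes individual patients, while the SE describes the uncertainty of a mean (or a difference of means). A Python sketch using the summary statistics from slide 11 and the "±2" rule from these slides; with unrounded inputs the CI is about 2.9 to 11.6 (rounding 7.2 and 2.2 first gives 2.8).

```python
from math import sqrt

n_non, mean_non, sd_non = 38, 63.4122807, 8.7141575  # non-AI
n_ai, mean_ai, sd_ai = 42, 56.1666667, 10.7824634    # AI

# "Normal range": where about 95% of individual AI patients fall, mean +/- 2 SD
range_lo, range_hi = mean_ai - 2 * sd_ai, mean_ai + 2 * sd_ai

# 95% CI for the difference in population means: estimate +/- 2 SE(Diff)
diff = mean_non - mean_ai
se_diff = sqrt(sd_non**2 / n_non + sd_ai**2 / n_ai)
ci_lo, ci_hi = diff - 2 * se_diff, diff + 2 * se_diff
print(f"normal range {range_lo:.0f} to {range_hi:.0f}; 95% CI {ci_lo:.1f} to {ci_hi:.1f}")
```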

  24. Sampling distribution and CI • Sampling distribution: the distribution of a statistic (such as a sample mean or a t-test statistic) under repeated sampling from a target population. • We can calculate a statistic from one random sample and use it as a point estimate of the population parameter. • But how precise that statistic is depends on its sampling distribution. • Since the sample mean is the most commonly used statistic, the sampling distribution of the mean is the one most commonly used. • For a simulation of the sampling distribution of the sample mean, or of confidence intervals for it, go to http://onlinestatbook.com/stat_sim/index.html
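A small simulation in the spirit of the online simulator: draw many samples from one population, record each sample mean, and compare the spread of those means with σ/√n. The population (normal, mean 60, SD 10) and the sample size of 40 are hypothetical choices for illustration.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the run is reproducible
mu, sigma, n = 60.0, 10.0, 40  # hypothetical normal population and sample size
reps = 5000                    # number of repeated samples

# Draw many samples of size n and record each sample's mean
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]

center = statistics.fmean(sample_means)
spread = statistics.stdev(sample_means)  # empirical SE of the sample mean
print(f"mean of sample means = {center:.2f} (population mean = {mu})")
print(f"SD of sample means = {spread:.3f} (sigma/sqrt(n) = {sigma / sqrt(n):.3f})")
```

The SD of the simulated sample means is what the SE estimates from a single sample.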

  25. Confidence Interval • When your study is underpowered (e.g., pilot data) or overpowered (e.g., national surveys), the confidence interval provides a range of plausible values for the true effect (a population parameter). • How well does your sample mean (m) reflect the true mean? • Generic form of a 95% CI for the mean (proportion): Lower limit: sample mean (proportion) - 1.96 × SE; Upper limit: sample mean (proportion) + 1.96 × SE. 1.96 × SE is usually called "the margin of error". • SE measures the variability in the sampling distribution of the sample mean (or proportion) under repeated sampling.
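What "95%" refers to can be checked by simulation: build a mean ± 1.96 × SE interval from each of many samples and count how often it contains the true mean. A sketch with a hypothetical normal population (mean 60, SD 10) and samples of 100; the coverage should come out close to, and slightly below, 95%, because 1.96 ignores the small extra uncertainty from estimating the SD.

```python
import random
import statistics
from math import sqrt

random.seed(2)  # fixed seed so the run is reproducible
mu, sigma, n, reps = 60.0, 10.0, 100, 2000  # hypothetical population, sample size, repetitions

covered = 0
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.fmean(sample)
    se = statistics.stdev(sample) / sqrt(n)  # estimated SE of the sample mean
    margin = 1.96 * se                       # "margin of error"
    if m - margin <= mu <= m + margin:       # does this CI contain the true mean?
        covered += 1

coverage = covered / reps
print(f"{coverage:.1%} of the 95% CIs contained the true mean")
```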

  26. Revisiting the food additives study • 2. Look at the left side of the bottom panel of Figure 3 and recall what we have said about confidence intervals. Would you conclude that there is a change in hyperactivity under Mix A? • 3. Repeat question 2 for placebo.

  27. Revisiting the food additive study cont.

  28. Revisiting the food additive study cont. Possible values for real effect. Zero is “ruled out”.

  29. Revisiting the food additive study cont. 4. Do you think that the positive conclusion for question #3 has been "proven"? Yes, with 95% confidence. 5. Do you think that the negative conclusion for question #2 has been "proven"? No, since more subjects would give a narrower confidence interval. Hypothesis testing reaches a yes-or-no conclusion about whether there is an effect and quantifies the chances of a correct conclusion either way. Confidence intervals give possible magnitudes of effects.

  30. Confidence Intervals ↔ Hypothesis tests: the food additives study [Figure: confidence intervals labeled p > 0.05, p ≈ 0.05, and p < 0.05.]
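A sketch of the correspondence behind this figure, assuming a z-based (Wald) 95% CI and the matching two-sided test of "true effect = 0"; the (estimate, SE) pairs are hypothetical. A CI that includes 0 goes with p > 0.05, a CI whose limit touches 0 goes with p ≈ 0.05, and a CI that excludes 0 goes with p < 0.05.

```python
from math import erfc, sqrt

def z_test_and_ci(estimate: float, se: float, z_crit: float = 1.96):
    """Two-sided z-test of 'true effect = 0' and the matching 95% CI (Wald form)."""
    z = estimate / se
    p = erfc(abs(z) / sqrt(2))  # two-sided P(|Z| >= |z|) under H0
    return p, (estimate - z_crit * se, estimate + z_crit * se)

# Hypothetical (estimate, SE) pairs, not the study's data
for est, se in [(1.0, 1.0), (1.96, 1.0), (3.0, 1.0)]:
    p, (lo, hi) = z_test_and_ci(est, se)
    print(f"estimate = {est}: 95% CI ({lo:.2f}, {hi:.2f}), p = {p:.3f}, "
          f"CI excludes 0: {lo > 0 or hi < 0}")
```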

  31. Confidence Intervals ↔ Hypothesis tests: the AI study [Figure: 95% confidence intervals for the two groups.] Non-overlapping 95% confidence intervals, as here, are sufficient for a significant (p < 0.05) group difference. However, non-overlap is not necessary: the intervals can overlap and the groups can still differ significantly.
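A sketch of both cases, using z-based 95% CIs. The AI-study group means and SEs are taken from the earlier slides; the second pair of groups (means 0 and 3, each with SE 1) is hypothetical. Their CIs overlap, yet the test on the difference, which uses SE(Diff) = sqrt(1 + 1) ≈ 1.41 rather than the sum of the two SEs, still gives p < 0.05.

```python
from math import erfc, sqrt

def ci95(mean: float, se: float) -> tuple:
    """z-based 95% CI for a single group mean."""
    return mean - 1.96 * se, mean + 1.96 * se

def overlap(a: tuple, b: tuple) -> bool:
    """True if two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def two_sided_p(diff: float, se_diff: float) -> float:
    """Two-sided z-test P-value for 'difference = 0'."""
    return erfc(abs(diff / se_diff) / sqrt(2))

# AI study: per-group means and SEs from the slides
non_ai, ai = ci95(63.41, 1.41), ci95(56.17, 1.66)
print("AI study CIs overlap:", overlap(non_ai, ai))

# Hypothetical groups: means 0 and 3, each with SE = 1
g1, g2 = ci95(0.0, 1.0), ci95(3.0, 1.0)
p = two_sided_p(3.0 - 0.0, sqrt(1.0**2 + 1.0**2))
print("hypothetical CIs overlap:", overlap(g1, g2), f"p = {p:.3f}")
```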

  32. Power of a Study Statistical power is the sensitivity of a study to detect real effects, if they exist. It needs to be balanced with the likelihood of wrongly declaring effects when they are non-existent. Today, we have been keeping that error at < 5%. Power is the topic of the next session (Session 4).
