OUTLINE
Study design issues
Constructing hypotheses
Power and significance level
Sample size determination
Descriptive statistics
Averages
Variability
1. TOPICS IN BIOSTATISTICS: PART 1 Susan S. Ellenberg, Ph.D.
Center for Clinical Epidemiology and Biostatistics
U Penn School of Medicine
3. GENERAL APPROACH Concepts, not equations
Goal is to increase awareness of statistical considerations
Statistical software widely available to do basic calculations
Good basic reference: Altman DG, Practical Statistics for Medical Research, Chapman & Hall/CRC
4. ESTIMATION VS TESTING Sometimes primary goal is to describe data. Then we are interested in estimation. We estimate parameters such as
Means
Variances
Correlations
When primary goal is to draw a conclusion about a state of nature or the result of an experiment, we are interested in statistical testing
5. EXPLORATORY VS CONFIRMATORY STUDIES Exploring patterns in data can be very useful, even if specific hypotheses have not been set up beforehand
Such analyses can generate interesting new hypotheses; can’t generate final conclusions
If you want to be able to make a definitive statement, important to specify hypothesis in very specific terms, in advance of experiment
6. TYPES OF DATA Independent: each observation from a different subject
Paired: two observations (eg, before and after some intervention, left and right eyes) in same subject, or in closely related subjects (eg, siblings for genetics studies)
Clustered: multiple observations on each subject
When designing study and conducting analyses, need to use methods appropriate to data type
7. DESIGNING A (CONFIRMATORY) STUDY First requirement: a specific hypothesis to be tested
What will be measured
Criteria for “success”
Usual convention: establish a “null hypothesis,” then attempt to disprove
Need to be specific—should be no ambiguity about primary hypothesis
Possible ambiguities not always obvious
8. EXAMPLE: STUDYING “LIQUID STITCHES” New material to apply to wound, stop bleeding
Need to study how quickly and effectively bleeding is stopped
Possible outcomes of interest:
9. OTHER AMBIGUITIES Given a specific hypothesis, what statistical test will be used to evaluate that hypothesis?
What potentially confounding variables will be accounted for in the analysis?
size of subject
size of wound
Other demographics: age, gender, …
How will missing data be handled?
12. COMMON PHRASES RELATED TO “MULTIPLE TESTING” Testing to a foregone conclusion
Data dredging
Torturing the data until they confess
13. SIGNIFICANCE LEVEL The significance level is also commonly referred to as
The alpha level
The false positive rate
The Type I error rate
It is often loosely called the p-value, but strictly the p-value is calculated from the observed data and then compared with the significance level
It is defined as the probability of declaring an effect when the result arose just by chance, that is, if there really were no effect at all (the “null hypothesis”)
We must specify a significance level when we design our experiment
14. Bell-Shaped (Normal) Curve
15. ONE-SIDED OR TWO-SIDED A one-sided (or one-tailed) test is one that looks for effects in only one direction
“side” or “tail” refers to extreme end of normal (bell-shaped) curve
If all of the false positive error is put into one of the tails, requirement for significance is less stringent
No wide consensus among statisticians as to when one-sided tests are OK or when two-sided tests are needed
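As a rough illustration of why a one-sided test is less stringent, the sketch below looks up the cutoff on the standard normal curve for a 0.05 false positive rate, first split across both tails and then placed entirely in one tail. The function name `critical_z` is hypothetical and the normal curve is assumed as the reference distribution.

```python
from statistics import NormalDist

def critical_z(alpha, two_sided=True):
    """Cutoff on the standard normal curve for a given false positive rate."""
    # Two-sided: alpha is split evenly between the two tails.
    # One-sided: all of alpha sits in a single tail.
    tail_area = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail_area)

two_sided_cutoff = critical_z(0.05, two_sided=True)
one_sided_cutoff = critical_z(0.05, two_sided=False)
print(round(two_sided_cutoff, 2), round(one_sided_cutoff, 2))  # 1.96 1.64
```

The one-sided cutoff is smaller, so a less extreme result is enough to be declared significant.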
16. VIEWS ON ONE-SIDED TESTING One view: one-sided tests OK when an effect is possible in one direction only
Example: a treatment to increase height
Another view: one-sided tests OK anytime an effect is only of interest in one direction
Example: when evaluating a new drug for regulatory approval, action is taken only if there is a positive effect: negative effects are possible but treated like zero effects
17. POWER The power of a study is the probability that it will yield a statistically significant result if there truly is an effect
1 minus power is the Type II error rate, also called the false negative rate or beta error
Power depends on
The size of the effect
The size of the study
The false positive rate you can live with (if we declare all experiments a success, we will have 100% power but a very high false positive rate)
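The three dependencies listed above can be sketched numerically. This is a minimal sketch, assuming a two-sided comparison of two group means, a normal approximation, and an effect size expressed in SD units; the function name `approx_power` and the example numbers are illustrative, not taken from the slides.

```python
from statistics import NormalDist

def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power for a two-sided comparison of two group means.

    effect_size is the difference in means divided by the SD (assumed known).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # How far the true effect shifts the test statistic, in standard errors
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Chance of exceeding the cutoff in the direction of the true effect;
    # the negligible chance of significance in the wrong direction is ignored
    return z.cdf(shift - z_crit)

print(round(approx_power(0.5, 64), 2))              # larger study
print(round(approx_power(0.5, 20), 2))              # smaller study, lower power
print(round(approx_power(0.2, 64), 2))              # smaller effect, lower power
print(round(approx_power(0.5, 64, alpha=0.01), 2))  # stricter alpha, lower power
```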
18. FACTS ABOUT POWER There is always some effect for which power is high, even with a small sample size
For a given sample size, the power to detect an effect is higher when the effect is measured by a continuous variable (eg, lab value) than by a yes-no variable (eg, mortality at day 10)
One typically wants a hypothesis-testing study to have power of 80-90%
19. DETERMINING SAMPLE SIZE A study should be large enough so that if there is an effect of a size worth knowing about, the study will demonstrate the effect
To calculate the sample size, need
Effect size of interest
Error rates we will tolerate
Variability of outcome measure
20. COMPARISON OF CONTINUOUS OUTCOMES “Standardize” the effect size by dividing the difference you want to be able to confirm by the expected SD
For given power and significance level, sample size increases rapidly as the desired effect size gets smaller
21. SAMPLE SIZE BY EFFECT SIZE (two-sided 0.05 significance level) [Table: sample size for standardized effect sizes 0.2, 0.4, 0.6, 0.8; values not preserved in this transcript]
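To show how quickly sample size grows as the standardized effect size shrinks, here is a minimal sketch using the common normal-approximation formula n per group = 2(z_alpha/2 + z_power)^2 / d^2. The slides do not state which formula was used, so both the formula and the function name `n_per_group_continuous` are assumptions, and the results are illustrative rather than a reconstruction of the slide's table.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_continuous(effect_size, power=0.80, alpha=0.05):
    """Approximate subjects per group for a two-sided comparison of two means.

    effect_size is the difference of interest divided by the expected SD.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)

for d in (0.2, 0.4, 0.6, 0.8):
    print(d, n_per_group_continuous(d))
```

Halving the effect size roughly quadruples the required sample size, because the effect size enters the formula squared.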
22. COMPARISON OF RATES/PROPORTIONS Need larger sample sizes when trying to detect differences between proportions
Reason: 0-1 data are less informative than continuous data
Use binomial distribution rather than normal distribution
For calculation need to specify difference of interest, expected proportion in control group, and error probabilities
23. SAMPLE SIZE BY POWER AND RATES OF INTEREST (two-sided 0.05 significance level)
Event/success rates    Power = 0.80    Power = 0.90
0.20 vs 0.40           182             236
0.40 vs 0.60           214             278
0.10 vs 0.20           438             572
0.20 vs 0.30           626             824
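A calculation of this kind can be sketched as follows. The slides do not say how the table was produced; this sketch assumes a normal approximation to the binomial with a continuity correction, and the function name `n_per_group_proportions` is hypothetical. Under those assumptions, doubling the per-group result (to get a total across both groups) comes out close to the tabulated figures, which suggests the table reports total sample size.

```python
from math import sqrt
from statistics import NormalDist

def n_per_group_proportions(p1, p2, power=0.80, alpha=0.05, continuity=True):
    """Approximate subjects per group to compare two proportions (two-sided)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    # Uncorrected normal-approximation sample size
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_power * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    if continuity:
        # Continuity correction (an assumption about the method, not from the slides)
        n = n / 4 * (1 + sqrt(1 + 4 / (n * diff))) ** 2
    return n

for p1, p2 in ((0.20, 0.40), (0.40, 0.60), (0.10, 0.20), (0.20, 0.30)):
    total_80 = 2 * n_per_group_proportions(p1, p2, power=0.80)
    total_90 = 2 * n_per_group_proportions(p1, p2, power=0.90)
    print(p1, "vs", p2, round(total_80, 1), round(total_90, 1))
```

Note how the same absolute difference of 0.10 needs more subjects at 0.20 vs 0.30 than at 0.10 vs 0.20, since the expected control-group proportion also matters.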
24. DESCRIBING DATA Two basic aspects of data
Centrality
Variability
Different measures for each
Optimal measure depends on type of data being described
25. CENTRALITY Mean
Sum of observed values divided by number of observations
Most common measure of centrality
Most informative when data follow normal distribution (bell-shaped curve)
Median
“middle” value: half of all observed values are smaller, half are larger
Best centrality measure when data are skewed
Mode
Most frequently observed value
26. MEAN CAN MISLEAD Group 1 data: 1,1,1,2,3,3,5,8,20
Mean: 4.9 Median: 3 Mode: 1
Group 2 data: 1,1,1,2,3,3,5,8,10
Mean: 3.8 Median: 3 Mode: 1
When data sets are small, a single extreme observation will have great influence on mean, little or no influence on median
In such cases, median is usually a more informative measure of centrality
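The figures on this slide can be checked with Python's standard `statistics` module. The two lists are the Group 1 and Group 2 data given above; the variable names are illustrative.

```python
from statistics import mean, median, mode

group1 = [1, 1, 1, 2, 3, 3, 5, 8, 20]
group2 = [1, 1, 1, 2, 3, 3, 5, 8, 10]  # same data, but the largest value is 10

for name, data in (("Group 1", group1), ("Group 2", group2)):
    # Only the mean moves when the single extreme value changes
    print(name, "mean:", round(mean(data), 1),
          "median:", median(data), "mode:", mode(data))
# Group 1 mean: 4.9 median: 3 mode: 1
# Group 2 mean: 3.8 median: 3 mode: 1
```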
27. TIME-TO-EVENT DATA In many experiments, outcome of interest is time to some event
Death
Resolution of disease/symptom
First symptom manifestation
Such data are typically not normally distributed; tend to follow an exponential distribution
Data may be truncated (eg, all animals sacrificed at day X, so X is longest observable time)
Medians typically used for such data
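One reason medians suit such data can be sketched with simulated values. Everything here is hypothetical: the event times are drawn from an exponential distribution with a mean of 10 days, and observation stops at day 20. As long as more than half the events occur before the cutoff, the median is unaffected by truncation, while the mean is pulled down.

```python
import random
from statistics import mean, median

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical event times in days, exponential with mean 10
true_times = [random.expovariate(1 / 10) for _ in range(1001)]

cutoff = 20  # hypothetical day on which observation ends
observed_times = [min(t, cutoff) for t in true_times]

print("mean, full vs truncated:", mean(true_times), mean(observed_times))
print("median, full vs truncated:", median(true_times), median(observed_times))
```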
28. VARIABILITY Most commonly used measure to describe variability is standard deviation (SD)
SD is a function of the squared differences of each observation from the mean
If the mean is influenced by a single extreme observation, the SD will overstate the actual variability
29. ALTERNATIVE TO SD When using median as centrality measure, can describe variability by providing range (min, max) and interquartile range (25th and 75th percentiles)
Graphical presentation often provides best sense of variability
30. EXAMPLE Group 1 data: 1,1,1,2,3,3,5,8,20
Mean: 4.9 Median: 3
Group 2 data: 1,1,1,2,3,3,5,8,10
Mean: 3.8 Median: 3
SDs: group 1: 6.1; group 2: 3.3
Interquartile range (both groups): 1 to 5
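These variability measures can be computed with the standard `statistics` module. The sketch assumes the same two data sets as on the MEAN CAN MISLEAD slide, the sample (n minus 1) form of the SD, and the "inclusive" quartile method; other quartile conventions can give slightly different 25th and 75th percentiles on a data set this small.

```python
from statistics import quantiles, stdev

group1 = [1, 1, 1, 2, 3, 3, 5, 8, 20]
group2 = [1, 1, 1, 2, 3, 3, 5, 8, 10]

def describe(data):
    """Return (SD, range, interquartile range) for a list of numbers."""
    q25, _, q75 = quantiles(data, n=4, method="inclusive")
    return stdev(data), (min(data), max(data)), (q25, q75)

sd1, range1, iqr1 = describe(group1)
sd2, range2, iqr2 = describe(group2)
print("Group 1:", round(sd1, 2), range1, iqr1)
print("Group 2:", round(sd2, 2), range2, iqr2)
```

The single extreme value nearly doubles the SD of Group 1, while the interquartile range is the same for both groups.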
31. CONFIDENCE INTERVALS A confidence interval is intended to provide a sense of the variability of an estimated mean
Can be defined as the set of possible values that includes, with specified probability, the true mean
Confidence intervals can be constructed for any type of variable, but here we consider the most common case of a normally distributed variable
32. CONSTRUCTING A CONFIDENCE INTERVAL First, determine what level of probability should define the interval
Second, find the normal value (or z-value) that corresponds to that probability
99%: 2.58
95%: 1.96
90%: 1.64
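Third, multiply the z-value by the standard error of the mean (the SD divided by the square root of the number of observations); the interval runs from the mean minus this product to the mean plus this product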
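The three steps can be sketched as a small function. The inputs (a mean of 100, an SD of 15, and 25 observations) are made-up numbers for illustration, and the function name `confidence_interval` is hypothetical; the sketch assumes a normally distributed variable, as the slides do.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean, sd, n, level=0.95):
    """Normal-based confidence interval for a mean."""
    # Step 2: z-value for the chosen level (about 1.96 for 95%)
    z_value = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Step 3: multiply by the standard error of the mean
    margin = z_value * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Step 1: choose the level, then build the interval
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(100, 15, 25, level)
    print(level, round(low, 2), round(high, 2))
```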
33. Bell-Shaped (Normal) Curve
34. FACTS ABOUT CONFIDENCE INTERVALS The more sure you want to be that the true value is included in your interval, the wider the interval will be
A 99% confidence interval will be wider than a 95% confidence interval
Most common size confidence interval is 95%, but 90% and even 80% confidence intervals are sometimes used
35. VALUE OF CONFIDENCE INTERVALS Two data sets may have the same mean; but if one data set has 5 observations and the second has 500 observations, the two means convey very different amounts of information
Confidence intervals remind us how uncertain our estimate really is
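To put numbers on the 5 versus 500 comparison, the sketch below computes the half-width of a 95% interval for each sample size, assuming a hypothetical SD of 10 in both data sets. The z-value is kept to match the slides, although with only 5 observations a t-value would normally be used and would widen the interval further.

```python
from math import sqrt
from statistics import NormalDist

def half_width(sd, n, level=0.95):
    """Half-width of a normal-based confidence interval for a mean."""
    z_value = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z_value * sd / sqrt(n)

small_study = half_width(sd=10, n=5)
large_study = half_width(sd=10, n=500)
print(round(small_study, 2), round(large_study, 2))
```

With 100 times as many observations, the interval is 10 times narrower, since the width shrinks with the square root of the sample size.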
36. FINAL COMMENTS Statistics are only helpful if the approach taken is appropriate to the problem at hand
Most statistical procedures are based on some assumptions about the characteristics of the data—these need to be checked
Remember GIGO (garbage in, garbage out)