How to Learn Everything You Ever Wanted to Know About Biostatistics. Daniel W. Byrne, Director of Biostatistics and Study Design, General Clinical Research Center, Vanderbilt University Medical Center. The presenter has no financial interests in the products mentioned in this talk.
Objective of This Workshop • To provide a 1-hour overview of the important practical information that a clinical investigator needs to know about biostatistics to be successful.
Install a powerful, yet easy-to-use, statistical software package on your computer. • I recommend SPSS for Windows. • Bring an 1180 for $80 to Karen Montefiori in 143 Hill Student Center (3-1630). • She will lend you the SPSS CD for the day, and you can install the software easily.
SPSS is the 2nd most popular package. It is much easier to use than SAS and Stata.
Install additional software for statistical “odds and ends” • Instat by GraphPad – graphpad.com • for summary data analysis - $100 • True Epistat by Epistat Services – true-epistat.com - $395 • for random number table, etc. • CIA (Confidence Interval Analysis) – bmj.com • for confidence intervals - $35.95 with book • “Statistics with Confidence” D. Altman
Install a sample size program. • If you can afford to spend $400, buy nQuery Advisor – Statistical Solutions – www.statsol.com • If you can afford to spend $0, download PS from the Vanderbilt web site – • http://www.mc.vanderbilt.edu/prevmed/ps/index.htm • Both packages are on the CRC’s statistical workstation in room A-3101. VUMC investigators are welcome to use this workstation.
Use the scientific method to keep your project focused. • State the problem • Formulate the null hypothesis • Design the study • Collect the data • Interpret the data • Draw conclusions
State the Problem • Among patients hospitalized for a hip fracture who develop pneumonia during their stay in the hospital, the mortality rate is 2.3 times higher at non-trauma centers than at trauma centers • (48.7% vs. 21.1%, P=0.043). • It is not clear if, or how, those who will develop pneumonia could be identified on admission.
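The "2.3 times higher" figure is the ratio of the two mortality percentages. A minimal Python check of that arithmetic, using only the percentages on the slide (the variable names are illustrative), also shows the absolute difference, which is often reported alongside the ratio:

```python
# Mortality percentages from the slide (hip-fracture patients who developed pneumonia)
non_trauma = 48.7  # % who died at non-trauma centers
trauma = 21.1      # % who died at trauma centers

rate_ratio = non_trauma / trauma       # relative comparison: "X times higher"
risk_difference = non_trauma - trauma  # absolute comparison, in percentage points

print(f"rate ratio {rate_ratio:.1f}, difference {risk_difference:.1f} percentage points")
```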
Formulate the Null Hypothesis • Among patients hospitalized for treatment of a hip fracture, there are no factors known upon admission that are statistically different between those who develop pneumonia during their stay and those who do not.
Why bother with a null hypothesis? • For the same reason that we assume that a person is innocent until proven guilty. • The burden of proof is on the prosecutor to present enough evidence to convince the members of a jury that the charges are true and to change their minds. • Example: Outcome after treatment with Drug A will not be significantly different from placebo.
Design the Study • Data on 933 patients with a hip fracture from a New York trauma registry will be analyzed. • The 58 patients with pneumonia will be compared with the 875 without pneumonia.
Example of Recall Bias • A control group is asked, • “Two weeks ago today, did you eat X for breakfast?” • Two weeks after their MI, patients are asked, • “Did you eat X for breakfast on the day of your heart attack?” • You can prove any food causes an MI using this method (X=bacon, X=Flintstone vitamins, etc.)
John Bailar’s Quote: • “Study design and bias are much more important than complex statistical methods.” • Devote more time to improving the study design, and minimizing and measuring bias. • Become an expert at study design issues and biases in your area of research.
What is the statistical power of the study? • Power • Beta • Alpha • Sample size • Ratio of treated to control group • Measure of outcome
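To see how power, beta, alpha, and sample size trade off, here is a minimal Python sketch of the common normal-approximation formula for comparing two proportions. It assumes equal group sizes (a 1:1 ratio of treated to control) and a yes/no outcome; the planning values are hypothetical, and Table 9-1, nQuery Advisor, or PS may use other formulas or corrections and give slightly different numbers.

```python
import math
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate number of subjects needed in EACH of two equal-sized groups
    to detect a difference between proportions p1 and p2 with a two-sided test
    (normal approximation, no continuity correction)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta; about 0.84 when power = 0.80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)  # round up to whole subjects


# Hypothetical planning values: 50% vs. 30% of patients with the outcome
print(n_per_group(0.50, 0.30))              # alpha = 0.05, power = 0.80
print(n_per_group(0.50, 0.30, power=0.90))  # more power needs more subjects
```

Raising the power (lowering beta), lowering alpha, or shrinking the expected difference all increase the required sample size.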
Sample Size Table • See Table 9-1 in the handout • “Sample Size Requirements for Each of Two Groups”.
Collect the Data • See the handouts for: • I TEC Trauma Systems Study
Enter your data with statistical analysis in mind. • For small projects, enter data into Microsoft Excel or directly into SPSS. • For large projects, create a database with Microsoft Access. • Keep variable names in the first row, with <=8 characters and no internal spaces. • Enter as little text as possible and use codes for categories, such as 1=male, 2=female.
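The naming and coding rules above can be checked automatically before importing. This is a minimal Python sketch; the header row, the codebook, and the helper `check_name` are hypothetical, and the 8-character limit simply follows the slide's advice.

```python
import re

# Hypothetical first row of a spreadsheet: the variable names
header = ["age", "sex", "pneumonia", "los days", "admission_sbp"]

# Hypothetical codebook: numeric codes go in the data, labels live here
SEX_LABELS = {1: "male", 2: "female"}


def check_name(name, max_len=8):
    """Return a list of problems with one variable name (an empty list means OK)."""
    problems = []
    if len(name) > max_len:
        problems.append(f"longer than {max_len} characters")
    if re.search(r"\s", name):
        problems.append("contains a space")
    return problems


bad_names = {name: check_name(name) for name in header if check_name(name)}
for name, problems in bad_names.items():
    print(f"{name!r}: {', '.join(problems)}")

# Decode a coded column back to labels only when a report needs them
sex_column = [1, 2, 2, 1]
print([SEX_LABELS[code] for code in sex_column])
```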
Descriptive vs. Inferential • Descriptive statistics summarize your group. • average age 78.5, 89.3% white. • Inferential statistics use the theory of probability to make inferences about larger populations from your sample. • White patients were significantly older than black and Hispanic patients, P<0.001.
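A minimal Python sketch of the descriptive side: summarizing a group with an average and a percentage. The five patients below are hypothetical, not the study data. An inferential question, such as whether white patients are older than others, would go further and require a test and a P value.

```python
from statistics import mean

# Hypothetical sample: age in years; race coded 1=white, 2=black, 3=Hispanic
ages = [82, 75, 91, 68, 79]
race = [1, 1, 2, 1, 3]

average_age = mean(ages)
pct_white = 100 * race.count(1) / len(race)
print(f"average age {average_age:.1f}, {pct_white:.1f}% white")
```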
Import your data into a statistical program for screening and analysis.
Screen your data thoroughly for errors and inconsistencies before doing ANY analyses. • Check the lowest and highest value for each variable. • For example, age 1-777. • Look at histograms to detect typos. • Cross-check variables to detect impossible combinations. • For example, pregnant males, survivors discharged to the morgue, patients in the ICU for 25 days with no complications.
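The screening steps above (lowest and highest values, then cross-checks for impossible combinations) can be sketched in Python. The records are hypothetical, and the 0 to 110 plausible-age limit is an assumption chosen for illustration; a flagged value still needs to be checked against the source, not deleted.

```python
# Hypothetical records (not study data); sex: 1 = male, 2 = female
records = [
    {"id": 1, "age": 78, "sex": 2, "pregnant": 0, "died": 0, "discharge": "home"},
    {"id": 2, "age": 777, "sex": 1, "pregnant": 0, "died": 0, "discharge": "home"},
    {"id": 3, "age": 64, "sex": 1, "pregnant": 1, "died": 0, "discharge": "home"},
    {"id": 4, "age": 85, "sex": 2, "pregnant": 0, "died": 0, "discharge": "morgue"},
]

# Check the lowest and highest value of a variable
ages = [r["age"] for r in records]
print("age range:", min(ages), "to", max(ages))

# Flag values outside a plausible range (the 0-110 limit is an assumption)
out_of_range = [r["id"] for r in records if not 0 <= r["age"] <= 110]

# Cross-check variables for impossible combinations
rules = {
    "pregnant male": lambda r: r["sex"] == 1 and r["pregnant"] == 1,
    "survivor discharged to morgue": lambda r: r["died"] == 0 and r["discharge"] == "morgue",
}
problems = [(r["id"], label) for r in records for label, rule in rules.items() if rule(r)]

print("out of range:", out_of_range)
for patient_id, label in problems:
    print(f"patient {patient_id}: {label}")
```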
In SPSS: Analyze > Descriptive Statistics > Frequencies, then select the variable.
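Outside SPSS, a rough analogue of a frequency table takes a few lines of Python. This is not the SPSS procedure, just a sketch on hypothetical ages that include a typo; sorting the values puts extremes such as 777 at the ends of the table, where they are easy to spot.

```python
from collections import Counter

# Hypothetical ages, including a data-entry typo (777)
ages = [78, 82, 78, 91, 777, 85, 82, 78]

freq = Counter(ages)
n = len(ages)
print(f"{'value':>6} {'count':>6} {'percent':>8}")
for value in sorted(freq):
    print(f"{value:>6} {freq[value]:>6} {100 * freq[value] / n:>7.1f}%")
```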
Correct the data in the original database or spreadsheet and import a revised version into the statistical package. • The age of 777 should be checked and changed to the correct age. • Suspicious values, such as an age of 106, should be checked. In this case, it is correct.
P Value • A P value estimates the probability of obtaining results at least as extreme as yours by chance alone if there were truly no difference or association. • P < 0.05 = 5% chance, 1 in 20. • P < 0.01 = 1% chance, 1 in 100. • Alpha is the threshold. If P is below this threshold, you consider the result statistically significant.
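A minimal Python sketch of comparing P with alpha. It assumes the test statistic is a z statistic with a standard normal distribution under the null hypothesis (other tests use other reference distributions); the z values are arbitrary examples.

```python
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided P value for a z statistic, assuming a standard normal null distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # threshold chosen before looking at the data
for z in (1.0, 2.0, 3.0):
    p = two_sided_p(z)
    verdict = "statistically significant" if p < alpha else "not statistically significant"
    print(f"z = {z}: P = {p:.4f}, {verdict} at alpha = {alpha}")
```

A z of about 1.96 corresponds to P = 0.05, which is why 1.96 appears so often in two-sided tests.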
Basic formula for inferential tests • Based on the total number of observations and the size of the test statistic, one can determine the P value.
How many noise units? • Test statistic & sample size (degrees of freedom) convert to a probability or P Value.
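One common instance of "how many noise units" is the two-proportion z test: the difference between groups (the signal) divided by its standard error (the noise). This Python sketch uses hypothetical counts. Unlike t tests, this z test uses the normal distribution, so no degrees of freedom are involved; sample size enters through the standard error instead.

```python
import math
from statistics import NormalDist


def two_proportion_z(events1, n1, events2, n2):
    """z statistic: difference in proportions (signal) divided by its standard error (noise)."""
    p1, p2 = events1 / n1, events2 / n2
    pooled = (events1 + events2) / (n1 + n2)                   # proportion if the null is true
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # noise
    return (p1 - p2) / se                                       # number of noise units


# Hypothetical counts: 20 of 40 vs. 10 of 40 with the outcome
z = two_proportion_z(20, 40, 10, 40)
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, P = {p:.3f}")

# Same proportions, twice the sample size: less noise, more noise units
z_bigger = two_proportion_z(40, 80, 20, 80)
print(f"z = {z_bigger:.2f}")
```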
Use inferential statistics to test for differences and associations. • There are hundreds of statistical tests. • A clinical researcher does not need to know them all. • Learn how to perform the most common tests in SPSS. • Learn how to use the statistical flowchart to determine which test to use.
VI. You Will Need to Understand the Statistical Terminology Required to Select the Proper Inferential Test
Univariate vs. Multivariate • Univariate analysis usually refers to one predictor variable and one outcome variable • Is gender a predictor of pneumonia? • Multivariate analysis usually refers to more than one predictor variable or more than one outcome variable being evaluated simultaneously. • After adjusting for age, is gender a predictor of pneumonia?
Difference vs. Association • Some tests are designed to assess whether there are statistically significant differences between groups. • Is there a statistically significant difference between the age of patients with and without pneumonia? • Some tests are designed to assess whether there are statistically significant associations between variables. • Is the age of the patient associated with the number of days in the hospital?
Unmatched vs. Matched • Some statistical tests are designed to assess groups that are unmatched or independent. • Is the admission systolic blood pressure different between men and women? • Some statistical tests are designed to assess groups that are matched or data that are paired. • Is the systolic blood pressure different between admission and discharge?
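Why matching matters can be seen by analyzing the same hypothetical blood pressures both ways in Python. The paired analysis works on each patient's own change; the unmatched version (Welch's form of the standard error) treats admission and discharge values as if they came from different people. Only t statistics are computed here, since converting them to P values needs the t distribution (for example, SPSS's paired-samples and independent-samples t tests).

```python
import math
from statistics import mean, stdev

# Hypothetical systolic BP for the same 5 patients
admission = [150, 142, 160, 138, 155]
discharge = [140, 135, 152, 130, 147]

# Matched (paired): analyze each patient's own change
diffs = [a - d for a, d in zip(admission, discharge)]
t_paired = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Unmatched (independent): ignores that both values come from the same patient
se_unpaired = math.sqrt(stdev(admission) ** 2 / len(admission)
                        + stdev(discharge) ** 2 / len(discharge))
t_unpaired = (mean(admission) - mean(discharge)) / se_unpaired

print(f"paired t = {t_paired:.1f}, unpaired t = {t_unpaired:.1f}")
```

The mean change is identical in both analyses; the paired analysis removes the large between-patient variation from the noise, so its test statistic is much larger.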
Level of Measurement • Categorical vs. continuous variables • If you take the average of a continuous variable, it has meaning. • Average age, blood pressure, days in the hospital. • If you take the average of a categorical variable, it has no meaning. • Average gender, race, smoker.
Level of Measurement • Nominal - categorical • gender, race, hypertensive • Ordinal - categories that can be ranked • none, light, moderate, heavy smoker • Interval - continuous • blood pressure, age, days in the hospital
Horse race example • Nominal • Did this horse come in first place? • 0=no, 1=yes • Ordinal • In what position did this horse finish? • 1=first, 2=second, 3=third, etc. • Interval (scale) • How long did it take for this horse to finish? • 60 seconds, etc.
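The horse race example can be expressed in a few lines of Python, using hypothetical finishing times. The sketch shows that the ordinal and nominal variables can be derived from the interval measurement, but not the reverse: knowing only who won tells you nothing about the times.

```python
# Hypothetical finishing times in seconds (interval/scale)
times = {"Horse A": 61.2, "Horse B": 59.8, "Horse C": 60.5}

# Ordinal: finishing position, obtained by ranking the times
order = sorted(times, key=times.get)
position = {horse: rank for rank, horse in enumerate(order, start=1)}

# Nominal: did the horse come in first place? 0 = no, 1 = yes
won = {horse: int(position[horse] == 1) for horse in times}

print(position)
print(won)
```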
Normal vs. Skewed Distributions • Parametric statistical tests can be used to assess variables that have a “normal” or symmetrical bell-shaped distribution curve for a histogram. • Nonparametric statistical tests can be used to assess variables that are skewed or nonnormal. • Look at a histogram to decide.
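As a rough supplement to eyeballing a histogram, this Python sketch prints a crude text histogram and the sample skewness (adjusted Fisher-Pearson coefficient), which is near 0 for symmetric data and positive for a long right tail. The lengths of stay are hypothetical, and there is no universal skewness cutoff for choosing a nonparametric test; the histogram and clinical judgment still matter.

```python
from statistics import mean, median, stdev


def skewness(values):
    """Sample skewness (adjusted Fisher-Pearson); near 0 for symmetric data."""
    n = len(values)
    m, s = mean(values), stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)


def text_histogram(values, width=5):
    """Crude text histogram: one row of asterisks per bin of the given width."""
    start = min(values) // width * width
    while start <= max(values):
        count = sum(start <= x < start + width for x in values)
        print(f"{start:>4}-{start + width - 1:<4} {'*' * count}")
        start += width


# Hypothetical days in the hospital: most stays short, a few very long
los = [3, 4, 4, 5, 5, 5, 6, 6, 7, 9, 14, 28]
text_histogram(los)
print("mean", mean(los), "median", median(los), "skewness", round(skewness(los), 2))
```

A mean well above the median, as here, is another quick sign of right skew.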
Flowchart of common inferential statistics • See the handout, Figure 16-1, pages 78-79.