Data Analysis Workshop Chuck Spiekerman (cspieker@u) Karl Kaiyala (kkaiyala@u)
Course Outline • February 20 • How to describe your study • Choosing an Analysis method • March 13 • Student presentations of study designs and data-analysis plans • March 20 • Student presentations of data analyses
Describing your study • Next session (3/13) we are asking you to present a description of your planned study • The next few slides give an outline of suggested components of this description • Attention to all these components should help you (and/or a consultant) decide on appropriate methods of statistical analysis
Study Design Description • Specific Aims (what?) • Background (why?) • Previous work (who?)* • Study methods (how?) • several components *optional for student presentations
Specific Aims • Describe the scientific question(s) • Be specific and precise • Stick to the study at hand
Background and Motivation • Relevance of this research • Existing knowledge • Identify gap this research will fill • Relate to specific aims • If part of a larger study, where does this study fit?
Study Methods Components • Primary outcomes • Study population • Methods and procedures * • Data analysis plan *optional for student presentations
Primary Outcomes • Precise definition of key measurement (individual data item) of interest • Justify why this outcome and not something else. • Relate to specific aim • Details of collection can be left to methods and procedures section
Study population • How were the subjects selected? • Exclusion and inclusion criteria • Group classification? • Matching? • Randomization?
Data analysis plan • Outline data analysis for each specific aim • Make clear which procedures are being used toward which aim • Usually some simple tables and plots should be sufficient • Keep it simple
Forming an analysis plan Two important questions • What do you want to do/show? • What kind of data … • …will answer your question best? • … can you get? • … do you have?
Types of data • Continuous • Differences between values have meaning, and are interpretable independent of the values themselves • E.g. difference between 8 and 9 basically the same as difference between 1 and 2. • Ordinal • Values have an order, but differences are not easily interpretable (e.g. good, fair, poor)
Types of data (cont.) • Categorical • Values are descriptive but do not have any obvious ordering. E.g. tx A, tx B, tx C. • Binary, Dichotomous • Fancy names for categorical variables with only two possible values.
Types of data (sampling) • one-sample • Refers to situation when values of interest all come from one group and will be compared to a known quantity (e.g. “change greater than zero”) • two-sample • When data are divided/sampled in two groups and observed values compared between groups.
What do you want to do? • Show evidence of differences • Estimate population parameters • Demonstrate equivalence • Show evidence of association • Create/validate a predictive model • Assess agreement or reliability • Other?
Showing evidence of differences • Standard hypothesis testing procedures, usually comparing means or proportions • Which test will depend on type of data. Usual suspects (YMMV) • T-test or ANOVA for Continuous data • Chi-square test for Categorical data • Rank-based tests (e.g. Wilcoxon) for Ordinal data • Use Rosner flowchart for guidance • Supplement p-value with estimate of difference (with confidence interval)
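As a rough illustration of supplementing a p-value with an estimate of the difference and its confidence interval, here is a minimal Python sketch for two independent groups of continuous data. The function name `mean_difference_summary` is hypothetical, and the normal quantile is a large-sample approximation assumed here for simplicity; with small samples a t quantile (as in a proper t-test) would be more appropriate.

```python
import math
from statistics import NormalDist, mean, variance


def mean_difference_summary(group_a, group_b, confidence=0.95):
    """Difference in means with an approximate CI and two-sided p-value."""
    diff = mean(group_a) - mean(group_b)
    # Standard error without assuming equal variances (Welch form).
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    # ASSUMPTION: normal approximation; small samples call for a t quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(diff / se)))
    return {
        "difference": diff,
        "ci": (diff - z * se, diff + z * se),
        "p_value": p_value,
    }


# Made-up measurements, for illustration only.
summary = mean_difference_summary([2, 4, 6], [1, 2, 3])
print(summary)
```

Reporting the interval alongside the p-value shows how large or small the true difference could plausibly be, which the p-value alone does not.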
Estimate Population Parameters • P-values and hypothesis tests aren’t always necessary • Sometimes you don’t really want to compare things but only estimate values • Estimate parameters of interest and supplement with confidence intervals (IMPORTANT!)
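To make "estimate and supplement with a confidence interval" concrete, the sketch below estimates a proportion and attaches a Wilson score interval. The choice of the Wilson interval and the function name `wilson_interval` are assumptions made for this example, not something the slides prescribe.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Point estimate and Wilson score interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)
    ) / denom
    return p_hat, (centre - half_width, centre + half_width)


# Hypothetical data: 12 of 40 subjects show the outcome.
estimate, (lower, upper) = wilson_interval(12, 40)
print(f"estimate={estimate:.2f}, 95% CI=({lower:.3f}, {upper:.3f})")
```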
Demonstrate equivalence • In some instances the goal is to show equivalence of, say, two treatments. • Failing to show a difference using a standard hypothesis test is usually not sufficient evidence of equivalence • Two strategies • Estimate difference and show ‘worst cases’ with confidence interval • Compute a standard hypothesis test with very good power (> 95%)
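The first strategy (show the "worst cases" with a confidence interval) can be sketched as follows: compute a confidence interval for the difference and check whether it lies entirely inside a pre-specified equivalence margin. The `margin` parameter, the function name `equivalence_check`, and the normal approximation are assumptions for illustration; the margin in particular has to come from subject-matter judgement.

```python
import math
from statistics import NormalDist, mean, variance


def equivalence_check(group_a, group_b, margin, confidence=0.95):
    """Is the whole CI for the mean difference inside (-margin, +margin)?"""
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    # ASSUMPTION: large-sample normal quantile rather than a t quantile.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower, upper = diff - z * se, diff + z * se
    return {"ci": (lower, upper),
            "equivalent": -margin < lower and upper < margin}


# Made-up measurements under two treatments.
tx_a = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.1, 9.9]
tx_b = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 9.9, 10.1]
print(equivalence_check(tx_a, tx_b, margin=0.5))
```

The same data can fail the check with a tighter margin, which is why the margin should be fixed before looking at the results.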
Prediction • Dichotomous outcome • Logistic regression* • Sensitivities, specificities† • ROC curves† (continuous predictor) • Continuous outcome • Linear regression* • “Leave one out” statistics or cross validation† * Predictive model building † assessing predictive model
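For the "assessing" side with a dichotomous outcome, a minimal sketch of sensitivity, specificity, and the area under the ROC curve is given below. The function names and the 0/1 coding of the outcome are assumptions for this example; the AUC is computed as the proportion of (positive, negative) pairs in which the positive case has the higher score, counting ties as one half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity and specificity for 0/1 outcomes and 0/1 predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up outcomes and predictor scores; 0.45 is an arbitrary cut-off.
y = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
predicted = [1 if s > 0.45 else 0 for s in scores]
print(sensitivity_specificity(y, predicted), roc_auc(y, scores))
```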
Reliability/Agreement • Kappa statistic is commonly used for categorical data and two raters. • Intra-class correlation coefficient for multiple raters • If you have a ‘gold standard’ it makes the most sense to tabulate percent correct or average distance from correct.
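For two raters and categorical ratings, Cohen's kappa compares the observed agreement with the agreement expected by chance from each rater's marginal frequencies. The sketch below is a minimal version; the function name `cohen_kappa` and the example ratings are made up, and the function does not handle the degenerate case where chance agreement equals 1.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the product of the raters' marginal proportions.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no ratings of ten items by two raters.
a = ["y", "y", "n", "n", "y", "n", "y", "n", "y", "y"]
b = ["y", "n", "n", "n", "y", "n", "y", "y", "y", "y"]
print(round(cohen_kappa(a, b), 3))
```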
more Reliability/Agreement • If trying to demonstrate agreement between two continuous measures the correlation coefficient is tangential at best • Better to tabulate statistics related to mean pairwise differences between judges • See • Bland JM, Altman DG. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, i, 307-310. • Available at http://www-users.york.ac.uk/~mb55/meas//ba.htm
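In the spirit of the Bland and Altman reference, agreement between two continuous measures can be summarised by the mean pairwise difference (bias) and the limits of agreement, taken here as the bias plus or minus 1.96 standard deviations of the differences. The function name `bland_altman` and the data are assumptions for illustration.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Bias and approximate 95% limits of agreement for paired measures."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return {"bias": bias, "limits": (bias - 1.96 * sd, bias + 1.96 * sd)}


# Made-up paired readings from two measurement methods.
print(bland_altman([10, 12, 14, 16, 18], [9, 12, 13, 17, 17]))

# Two methods that differ by a constant offset are perfectly correlated,
# yet the bias shows that they do not agree.
print(bland_altman([1, 2, 3], [6, 7, 8]))
```

The second call shows why the correlation coefficient is tangential: the readings move together perfectly but are always 5 units apart.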
Other? • Time-to-event data • Kaplan-Meier survival estimate • Cox regression • Other other?
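For time-to-event data, the Kaplan-Meier estimate multiplies, at each observed event time, the fraction of subjects still at risk who survive that time. The sketch below is a minimal version without confidence bands; the function name `kaplan_meier` and the coding of `events` (1 = event observed, 0 = censored) are assumptions for this example.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    survival = 1.0
    curve = []
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1 - failures / at_risk
        curve.append((t, survival))
    return curve


# Made-up follow-up times; the third and fifth subjects are censored.
for t, s in kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0]):
    print(t, round(s, 3))
```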
Correlated Data Issues • Data consist of “clusters” of correlated observations. This is common in dental studies (many teeth from same mouth) • Common Solutions? • Collapse data to independent units (patient-level averages) • Adjust for correlation using generalized estimating equations (GEE) or mixed model regression approaches
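The first solution, collapsing to independent units, can be sketched as averaging all measurements within each patient so that each patient contributes one value to the analysis. The record layout `(patient_id, measurement)` and the function name `patient_level_means` are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def patient_level_means(records):
    """Collapse (patient_id, measurement) records to one mean per patient."""
    by_patient = defaultdict(list)
    for patient_id, value in records:
        by_patient[patient_id].append(value)
    return {pid: mean(values) for pid, values in by_patient.items()}


# Made-up tooth-level measurements from three patients.
teeth = [("p1", 2.0), ("p1", 4.0),
         ("p2", 3.0), ("p2", 3.0), ("p2", 6.0),
         ("p3", 5.0)]
print(patient_level_means(teeth))
```

The resulting patient-level values can then go into a standard analysis that assumes independent observations, at the cost of discarding within-patient detail that GEE or mixed models would use.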
Homework for March 13 • Following the guidelines presented in class today, present a concise description of your study and planned data analysis to the class. • Plan to keep your talk under ____ minutes • Limited office hours will be available with Dr. Kaiyala and me to help. Call or email us for appointments.