Examination of Analysis Methods for Positive Continuous Dependent Variables: Model Fit and Cost Saving Implications. Brian P Smith, Biostatistics Director; Maria De Yoreo, Department of Applied Mathematics, UC Santa Cruz. May 22, 2013. Midwest Biostatistics Workshop; Muncie, IN.
Personal Motivation • Compositional Data Analysis Using Liouville Distributions … - Forgettable Ph.D. Dissertation by BP Smith • Compositional Data: Multivariate Data That Sum to 1 • Clay: 0.2, Silt: 0.53, Sand: 0.27 • John Aitchison, The Statistical Analysis of Compositional Data • ln odds: ln(x1/x3), ln(x2/x3) treated as Bivariate Normal
Basic principle • The underlying distribution should match the sample space of the data • If using the multivariate normal, the compositional data must first be transformed from the Simplex to the Multivariate Reals • Could instead use the Dirichlet or Liouville
How to follow the principle with positive-valued data? • Log transformation: positive reals to reals • Yet colleagues were using the natural scale or percent change from baseline • Why? • That was what had always been done • Central limit theorem protection for type 1 error • Easy to show by simulation: if the true distribution is log-normal and a normal distribution is used for the analysis, there is a power loss
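The simulation argument in the last bullet can be sketched as follows. This is a minimal illustration, not the authors' simulation: the sample size, log-scale shift, log-scale standard deviation, number of replicates, and the critical value of 2.0 (an approximation to the two-sided 5% t critical value at roughly 58 degrees of freedom) are all assumptions chosen for the example.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances allowed)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.fmean(b) - statistics.fmean(a)) / se


def simulate_power(n=30, shift=0.5, sigma=1.0, n_sim=2000, crit=2.0, seed=1):
    """Estimate power on the natural and log scales for log-normal data.

    All defaults are hypothetical. `crit` approximates the two-sided 5%
    critical value and is applied to both analyses for simplicity.
    """
    rng = random.Random(seed)
    hits_raw = hits_log = 0
    for _ in range(n_sim):
        # Control and treated groups differ by `shift` on the log scale.
        ctrl = [rng.lognormvariate(0.0, sigma) for _ in range(n)]
        trt = [rng.lognormvariate(shift, sigma) for _ in range(n)]
        # Analysis 1: t-test on the natural scale.
        if abs(welch_t(ctrl, trt)) > crit:
            hits_raw += 1
        # Analysis 2: same test after log transformation.
        log_ctrl = [math.log(x) for x in ctrl]
        log_trt = [math.log(x) for x in trt]
        if abs(welch_t(log_ctrl, log_trt)) > crit:
            hits_log += 1
    return hits_raw / n_sim, hits_log / n_sim


power_raw, power_log = simulate_power()
print(f"natural scale power: {power_raw:.2f}, log scale power: {power_log:.2f}")
```

Under these assumed settings the log-scale analysis should reject more often than the natural-scale analysis; the size of the gap depends on how skewed the data are (here, on `sigma`).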
What do the critics think? • Real data is neither log-normal nor normal • The "so what" factor • Making a theoretical argument about a real-world problem
Personal Motivation Part 2 • It is generally accepted among statisticians that in clinical trials the simple use of baseline as a covariate provides more power • More than once with scientists: “What is this analysis of covariance? We should just do percent change from baseline.” • “That is the analysis Jennings did in their paper...” or “This is what Goodguy Pharmaceuticals did in their NDA” • Me: “But you will lose power”, but I have already lost this argument • There appears to me to be a higher appreciation that good design can affect power than that good analysis can.
What Do I (and Maybe Some of You, If You Are Like-Minded) Need? • Research that not only suggests that log-transformation is better for positive data • But also quantifies how much better • Research that not only suggests analysis of covariance is better • But also quantifies how much better • This should exist, right? • Not that I can find
What Did We Do? • 70 Continuous Endpoints Analyzed • 10 Endpoints Analyzed from Each of • 4 Phase 1 Studies • 1 Phase 2 Study • 1 Phase 3 Study • 10 Endpoints Chosen from 3 Preclinical Studies
What Did We Do? (cont) • Chose primary or secondary endpoints if continuous (1-3 per study) • Remaining 7-9 randomly selected from • ECGs • Vitals • Laboratory Measurements • Result: a variety of endpoints from a range of studies, chosen in a non-subjective manner
The Analyses • All endpoints had repeated observations over time • Used a Mixed Effects Model • Random subject effect • Fixed Effects • Treatment • Time • Treatment by Time Interaction • For cross-over studies, additional random effects were added • 8 models examined for each endpoint
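One way to write down the base model described on this slide is sketched below. The notation is an assumption introduced for illustration (the slides do not give symbols): subject i, treatment j, time point k, with a normally distributed random subject effect. The response y may be on the natural scale, the log scale, or a change-from-baseline scale depending on which of the 8 models is being fit; the slides do not list those models, so they are not enumerated here.

```latex
% Hypothetical notation; a sketch of the model described above, not the authors' exact specification.
y_{ijk} = \mu + \tau_j + \gamma_k + (\tau\gamma)_{jk} + s_i + \varepsilon_{ijk},
\qquad s_i \sim N(0, \sigma_s^2), \quad \varepsilon_{ijk} \sim N(0, \sigma^2)
% ANCOVA variants would add a baseline covariate term \beta x_i to the right-hand side.
```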
Three Means of Comparison • For ANCOVA Only • P-value of Covariate • For Log Scale • Compare Likelihoods • For All Analyses • Compare Costs
How to Compare Costs? • Compare Standard Errors of Estimates for Treatment Effect • Determine the change in sample size that would be needed under one model to obtain a standard error equivalent to that of another model • Scaling Issue due to log-transformation • If there is no scaling issue and there are two models • (se1/se2)^2 is how many fold more subjects analysis 1 would need to have the same standard error as analysis 2
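The no-scaling case can be sketched in a few lines. The reasoning is that a standard error shrinks like 1/sqrt(n), so matching a smaller standard error requires scaling n by the squared ratio. The function name and the numbers below are hypothetical, chosen only to illustrate the arithmetic.

```python
import math


def fold_increase(se1: float, se2: float) -> float:
    """Fold change in subjects analysis 1 needs to match analysis 2's SE.

    Assumes both standard errors are on the same scale and that each
    shrinks proportionally to 1 / sqrt(n).
    """
    return (se1 / se2) ** 2


# Hypothetical values: analysis 1 has a 20% larger standard error.
fold = fold_increase(1.2, 1.0)

# If analysis 2 used 50 subjects (hypothetical), analysis 1 would need:
n_needed = math.ceil(50 * fold)
print(f"fold increase: {fold:.2f}, subjects needed: {n_needed}")
```

A fold value above 1 means analysis 1 is the more expensive choice; a value below 1 means it is cheaper.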
Dealing with the Scaling Issue • Natural Scale • Log Scale – Consider • If start with log scale and work towards natural scale
Which to use? • If data is skewed right then Geometric Mean < Mean • Use of the mean favors the natural scale (most conservative) • Use of the geometric mean is more consistent with the data • We do both but prefer the Geometric Mean
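A small numerical sketch of why the choice matters. It assumes the scaling works by dividing a natural-scale standard error by a measure of centre to make it comparable with a log-scale standard error; that reading is an assumption, since the slide's formulas did not survive extraction. The sample values are hypothetical.

```python
import math
import statistics

# Hypothetical right-skewed sample of a positive endpoint.
values = [1.0, 2.0, 4.0, 8.0, 100.0]

arith_mean = statistics.fmean(values)
geo_mean = statistics.geometric_mean(values)

# Standard error of the mean on the natural scale.
se_natural = statistics.stdev(values) / math.sqrt(len(values))

# Relative standard errors under the two candidate scalings (assumed form).
rel_se_using_mean = se_natural / arith_mean
rel_se_using_geo = se_natural / geo_mean

print(f"mean: {arith_mean:.2f}, geometric mean: {geo_mean:.2f}")
print(f"relative SE using mean: {rel_se_using_mean:.3f}")
print(f"relative SE using geometric mean: {rel_se_using_geo:.3f}")
```

Because the geometric mean is below the mean for right-skewed data, dividing by the mean gives the smaller relative standard error, which is why using the mean is the choice that favors the natural scale.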
Back to comparing cost • The scaled ratio of squared standard errors is the fold increase in subjects needed for the natural scale to be equivalent to the log scale • A similar scaling argument applies to percent change from baseline
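The scaled comparison might look like the sketch below. The exact formula on the slide was not recoverable, so this assumes a delta-method style scaling in which a log-scale standard error is multiplied by a centre (mean or geometric mean) to put it on the natural scale. The function name and all numbers are hypothetical.

```python
def scaled_fold_increase(se_natural: float, se_log: float, center: float) -> float:
    """Fold increase in subjects for the natural scale to match the log scale.

    Assumed scaling (not confirmed by the source): se_log * center
    approximates the log-scale standard error expressed in natural units.
    """
    return (se_natural / (center * se_log)) ** 2


# Hypothetical treatment-effect standard errors and centres.
se_natural = 2.0   # natural-scale SE
se_log = 0.15      # log-scale SE
mean = 12.0        # arithmetic mean of the endpoint
geo_mean = 10.0    # geometric mean of the endpoint

fold_with_mean = scaled_fold_increase(se_natural, se_log, mean)
fold_with_geo = scaled_fold_increase(se_natural, se_log, geo_mean)

print(f"fold increase (mean scaling): {fold_with_mean:.2f}")
print(f"fold increase (geometric mean scaling): {fold_with_geo:.2f}")
```

With these numbers the mean-based scaling yields the smaller fold increase, matching the earlier remark that use of the mean is the more conservative choice for the natural scale.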
The Case for Log Ratio over Percent Change from Baseline (Cont)
Conclusions • Don’t just trust us, do it yourself • If these results continue to replicate, we can conclude • If a baseline is available, it should always be used as a covariate • Although we recommend exploration of data from previous studies, percent change from baseline analyses should not be undertaken unless there is strong empirical evidence that it is preferred for that endpoint • Again with the caveat that nothing replaces exploration of data from previous studies, log-transformation ought to be the default analysis of positive data unless exploration of previous data provides convincing evidence that the natural scale is preferred.