480 likes | 594 Views
Comparing methods for addressing limits of detection in environmental epidemiology. Roni Kobrosly, PhD, MPH Department of Preventive Medicine Icahn School of Medicine at Mount Sinai. A familiar diagram…. Biomarker of Exposure. Biologically Effective Dose. Altered Structure/
E N D
Comparing methods for addressing limits of detection in environmental epidemiology Roni Kobrosly, PhD, MPH Department of Preventive Medicine Icahn School of Medicine at Mount Sinai
A familiar diagram… Biomarker of Exposure Biologically Effective Dose Altered Structure/ Function Environmental Exposure Internal Dose Clinical Disease DeCaprio, 1997
It is difficult to quantify the concentration because it is so low LOD Higher concentration
Handling LODs in analysis • Easiest approach: simply delete these observations • Problems with this: • However, values < LOD are informative: analyte may have a concentration between 0 and LOD • Studies are expensive and you lose covariate data! • Excluding observations from analyses *may* substantially bias results Chen et al. 2011
Handling LODs in analysis • Hornung & Reed describe approach that involves substituting a single value for each observation <LOD • Three suggested substitutions: LOD/2, LOD/√2, or just LOD • Problem: Replacing a sizable portion of the data with a single value increases the likelihood of bias and reduces power! Helsel, 2005; Hughes 2000; Hornung & Reed, 1990
Citations in Google Scholar Hornung & Reed, 1990
Comparing LOD methods • While there are many studies testing individual methods, relatively little work comparing performance of several methods • Even fewer studies have compared methods in context of multivariable data • Comparative studies that do exist provide contradictory recommendations. No consensus!
Simulation Study Objectives • Compare performance of LOD methods when independent variable is subject to limit of detection in multiple regression • Compare performance across a range of “experimental” conditions • Create flowchart to aid researchers in their analysis decision making
Statistical Bias Nat’l Library of Med definition: “Any deviation of results or inferences from the truth” Unbiased Biased
Variable Definitions • Four continuous variables: • Y: Dependent variable (outcome) • X: Independent variable (exposure, subject to LOD) • C1, C2: Independent variables (covariates)
6 “Experimental Conditions” 1) Dataset sample size: n = {100, 500}
2) % of exposure variable with values in LOD region: LOD% = {0.05, 0.25}
3) Distribution of Exposure Variable: Normal versus Skewed
4) R2 of full model: R2 = {0.10, 0.20}
5) Strength & direction of exposure-outcome association: Beta = {-10, 0, 10}
6) Direction of confounding: Strong Positive, versus Strong Negative, versus None + -
LOD methods considered • Deletion of subjects with LOD values • Substitution with LOD/√(2) • Substitution with LOD/2 • Substitution with just LOD value • Multiple imputation (King’s Amelia II) • MLE-imputation method (Helsel & Krishnamoorthy)
Method 2: Sub with LOD/√(2) • LODX = 9.0 • 9.0/√2 = 6.4
Method 3: Sub with LOD/(2) • LODX = 9.0 • 9.0/2 = 4.5
Method 4: Sub with just LOD • LODX = 9.0 • 9.0
Method 5: Multiple Imputation • “Amelia II” by Dr. Gary King • Assumes pattern of observations below LOD only depends on observed data (not unobserved data) • Lets you constrain imputed values (very helpful when working with LODs!)
Method 5: Multiple Imputation • M = 5
Method 5: Multiple Imputation β2 = 9.5 β3 = 8.3 β4 = 12.1 β1 = 10.1 β5 = 10.4 = 10.01
Method 6: MLE-Imputation Assume normal distribution, estimate and Sx
Method 6: MLE-Imputation Use estimated LOD value, , and Sx to randomly generate observations below LOD
Two-step Data Generation Process • 1st Step: Select “true” regression parameters for following two models: • 2nd Step: Use “true” parameters to guide the drawing of random numbers
X = 1.3 - 6(C1) + 1.5(C2) “TRUTH” Y = 2.8 + 2(X) + 4.5(C1) + 6(C2) SIMULATED DATASETS Dataset1.1 Dataset1.2 Dataset1.3
Create a set of “true” parameters Y = 2.8 + 2(X) + 4.5(C1) + 6(C2) Create 1500 simulated datasets for set of “true” parameters, using specific set of experimental conditions Dataset1.1 Dataset1.3 Dataset1.1000 Dataset1.2 Apply a LOD correction method and run regression for each dataset Take difference of estimated coefficient and “true” parameter. Produce 1000 bias estimates with 95% CI’s Bias = 2.2 – 2 = 0.2
Help from Minerva Minerva runtime ~ 5 minutes
n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Negative X-Y Association, Negative confounding Deletion LOD/sqrt(2) LOD/2 LOD Multi Impu 8.0 MLE Impu 7.0 Mean Bias (with 95% CI) 6.0 5.0 4.0 3.0 2.0 1.0 0 -1.0 -2.0
n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Positive X-Y Association, Negative confounding Deletion LOD/sqrt(2) LOD/2 LOD 2.0 Multi Impu 1.0 MLE Impu 0 Mean Bias (with 95% CI) -1.0 -2.0 -3.0 -4.0 -5.0 -6.0 -7.0 -8.0
n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Negative X-Y Association, No confounding Deletion LOD/sqrt(2) LOD/2 LOD 1.0 Multi Impu 0.8 MLE Impu 0.6 Mean Bias (with 95% CI) 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 -1.0
n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Positive X-Y Association, No confounding Deletion LOD/sqrt(2) LOD/2 LOD 1.0 Multi Impu 0.8 MLE Impu 0.6 Mean Bias (with 95% CI) 0.4 0.2 0 -0.2 -0.4 -0.6 -0.8 -1.0
n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Negative X-Y Association, Positive confounding Deletion LOD/sqrt(2) LOD/2 LOD Multi Impu 8.0 MLE Impu 7.0 Mean Bias (with 95% CI) 6.0 5.0 4.0 3.0 2.0 1.0 0 -1.0 -2.0
n = 100, 25% LOD, Skewed Dist, R2 = 0.20, Positive X-Y Association, Positive confounding Deletion LOD/sqrt(2) LOD/2 LOD 2.0 Multi Impu 1.0 MLE Impu 0 Mean Bias (with 95% CI) -1.0 -2.0 -3.0 -4.0 -5.0 -6.0 -7.0 -8.0
An overview of results • Relative bias of methods is highly dependent on experimental conditions (i.e. no simple answers) • Covariates and confounding matters! Simulations that only consider bivariate, X-Y relationships with LODs are limited
Deletion method results • Surprisingly… provides unbiased estimates across all conditions! • If sample size is large and LOD% is small, this may be a good option. As LOD% becomes larger, deletion is more costly • Important caveat: deletion method works well if true associations are linear
Deletion method with linear effects Bottom 8% of X variable deleted
Substitution method results • Not surprisingly… these methods are generally terrible! • Just LOD substitution is worst type • In most scenarios, these will bias associations towards the null • … but, works reasonably well when distribution is highly skewed, no confounding, and LOD% is low
Multiple Imputation results • Amelia II performs relatively well! Particularly when R2 is higher • Does well even when LOD% is high • Problematic when there is no confounding (reason: this indicates there are no/weak associations between variables)
MLE Imputation results • Associated with severe bias in most cases • Highly reliant on parametric assumptions and the code is daunting: recommend avoiding this method • However, performed reasonably well when exposure is normally distributed, no confounding, and LOD% is low
Sarah’s SFF Analysis • Study for Future Families (SFF): a multicenter pregnancy cohort study that recruited mothers from 1999-2005 • Sarah Evans’ analysis: prenatal exposure to Bisphenol A (BPA) and neurobehavioral scores in 153 children at ages 6-10 • 28 (18%) children have BPA levels below the LOD
Sarah’s SFF Analysis • Maternal urinary BPA collected during late pregnancy • Neurobehavioral scores obtained through School-age Child Behavior Checklist (CBCL). • Used multiple regression adjusting for child age at CBCL assessment, mother’s education level, family stress, urinary creatinine
LOD/sqrt(2) Deletion -0.6 -0.4 -0.2 0 0.2 0.4 0.6 0.8 1.0