Missing Data in the Infectious Diseases Institute Clinic Database
East African Regional Consortium
Agnes N Kiragga
East Africa IeDEA investigators' meeting, 4-5 May 2010
Objectives
• Describe the level of missing data for key variables
• Identify factors associated with missing data among patients on antiretroviral therapy (ART)
• Assess missing data assumptions in observational databases
Assumptions of missing data
• "Missing completely at random" [MCAR]: missingness does not depend on anything important
  - Example: a blood sample is lost, or not taken in error
• "Missing at random" [MAR]: missingness depends only on other measured factors, not on the missing (unobserved) value
  - Example: the study specifies blood pressure below a threshold, so after registering a high value the patient is withdrawn (missing value: blood pressure at this visit)
• "Missing not at random" [MNAR]: missingness is related to the missing outcome itself
  - Example: a patient withdrew from the study because they "didn't feel well"
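The three mechanisms above can be illustrated with a small simulation. This is only a sketch on synthetic data: the variable names (`weight`, `cd4`), the distributions, and the missingness probabilities are all hypothetical and are not taken from the clinic database. It compares the mean of the values that remain observed with the mean of all values under each mechanism.

```python
import random
import statistics


def simulate_observed_mean(mechanism, n=20000, seed=0):
    """Return (mean of all values, mean of observed values) for one mechanism.

    All numbers here are synthetic assumptions chosen for illustration only.
    """
    rng = random.Random(seed)
    all_values = []
    observed = []
    for _ in range(n):
        weight = rng.gauss(60, 10)                      # always observed
        cd4 = rng.gauss(200, 50) + 2 * (weight - 60)    # may go missing
        if mechanism == "MCAR":
            p_missing = 0.3                             # unrelated to anything
        elif mechanism == "MAR":
            p_missing = 0.6 if weight < 60 else 0.1     # depends on observed weight
        elif mechanism == "MNAR":
            p_missing = 0.6 if cd4 < 200 else 0.1       # depends on cd4 itself
        else:
            raise ValueError("mechanism must be MCAR, MAR or MNAR")
        all_values.append(cd4)
        if rng.random() >= p_missing:
            observed.append(cd4)
    return statistics.mean(all_values), statistics.mean(observed)


if __name__ == "__main__":
    for mechanism in ("MCAR", "MAR", "MNAR"):
        true_mean, obs_mean = simulate_observed_mean(mechanism)
        print(mechanism, round(true_mean, 1), round(obs_mean, 1))
```

In this setup the observed mean stays close to the full mean under MCAR and drifts upward under MAR and MNAR. Under MAR the drift could in principle be corrected by conditioning on the observed covariate; under MNAR it could not be corrected from the observed data alone.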
Study population, 04/2000 – 04/2010
• Registered: 23,121 (active: 15,070)
• Non-ART: 13,310
• ART: 9,811
  - DART: 300
  - Remaining: 9,511
    - Started ART before 2005: 1,043
    - Started ART after 2005: 8,468
Source of CD4 data
• Electronic download: 86,146 (95%)
• Recorded: 3,085 (5%)
Missing baseline variables
[Figure not recovered]
Note: 1 = 3 months pre-ART, 2 = 6 months pre-ART, 3 = 12 months pre-ART
Number of missing baseline variables
[Figure not recovered]
Note: (a) variables include weight, height, and CD4 count
Factors associated with missing baseline CD4 count
[Table not recovered]
• No association with gender, age, or weight
CD4 counts at follow-up visits
• CD4 tested 6-monthly (± 2 months)
• Baseline CD4 counts excluded
• Complete CD4 data: no. of CD4 tests expected given duration on ART >= total no. of CD4 counts observed
• Missing CD4 data: no. of CD4 tests expected given duration on ART ≠ total no. of CD4 counts observed
• 1423 (15%): insufficient follow-up
• 8088 (85%): assessed for missing CD4
Categorization of follow-up CD4 data (N = 8088)

                        Categorization |      Freq.     Percent
---------------------------------------+------------------------
complete baseline + complete follow-up |      2,878       35.58
 complete baseline + missing follow-up |      2,529       31.27
 missing baseline + complete follow-up |      1,315       16.26
  missing baseline + missing follow-up |      1,366       16.89
---------------------------------------+------------------------
                                 Total |      8,088      100.00

• Complete baseline + complete follow-up + CD4 testing + timely CD4 tests = 864 (10.7%)
• Included all nested research cohort patients
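A cross-classification like the one in this table can be produced from two per-patient flags. The following is a minimal sketch, assuming each patient record has already been reduced to a pair of booleans (baseline complete, follow-up complete). The function name and input layout are hypothetical.

```python
from collections import Counter


def tabulate_cd4_categories(records):
    """Return {category: (frequency, percent)} for (baseline_ok, follow_up_ok) pairs."""
    counts = Counter()
    for baseline_ok, follow_up_ok in records:
        label = "{} baseline + {} follow-up".format(
            "complete" if baseline_ok else "missing",
            "complete" if follow_up_ok else "missing",
        )
        counts[label] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Percentages are rounded to two decimals, as in the table.
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}


if __name__ == "__main__":
    # Rebuild the table from the frequencies reported on the slide.
    records = (
        [(True, True)] * 2878
        + [(True, False)] * 2529
        + [(False, True)] * 1315
        + [(False, False)] * 1366
    )
    for label, (freq, pct) in tabulate_cd4_categories(records).items():
        print(label, freq, pct)
```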
Categorization of follow-up CD4 data by year of ART initiation, for patients with at least 6 months of follow-up
[Figure not recovered; group sizes as labelled: n=995, n=2487, n=1174, n=1555, n=960, n=917]
Validation of incident post-ART tuberculosis cases
• Tuberculosis was the most common opportunistic infection in the first 24 months after ART initiation (rate 2.79; 95% CI 2.45-3.16)
• Merged flagged TB cases with the TB drug database
• Identified patients on TB treatment
• 334 incident post-ART cases
Probability of development of tuberculosis (TB) by baseline CD4 data
[Figure: probability of TB (0.00 to 0.30) against analysis time (0 to 2.5); curves: complete baseline CD4 data, missing baseline CD4 data]
• Log rank P<0.435
• Assumption 1: baseline CD4 data missing completely at random
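For readers unfamiliar with the log-rank comparison quoted on this slide, the following is a minimal sketch of a two-group log-rank test in plain Python. It is a textbook version and not necessarily the procedure or software used for the slide. The function name and input layout are assumptions: each group is a list of follow-up times plus a list of event indicators (1 = TB diagnosed, 0 = censored).

```python
import math


def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for u, _, g in data if u >= t and g == 0)
        at_risk_b = sum(1 for u, _, g in data if u >= t and g == 1)
        n = at_risk_a + at_risk_b
        d = sum(1 for u, e, _ in data if u == t and e)          # events at t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Tiny made-up example: identical groups give no evidence of a difference.
    print(log_rank_test([1, 2, 3], [1, 1, 0], [1, 2, 3], [1, 1, 0]))
```

A large p-value from such a test means no difference in TB incidence was detected between the groups. That is consistent with, but does not by itself prove, the MCAR assumption.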
Probability of development of tuberculosis (TB) by follow-up CD4 data
[Figure: probability of TB (0.00 to 0.30) against analysis time (0 to 2.5); curves: missing follow-up CD4 data, complete follow-up CD4 data]
• Assumption 2: follow-up CD4 data missing at random
Preliminary insights from analysis
• Reconcile local and IeDEA-wide analyses
• Baseline CD4 data missing completely at random (MCAR)
• Follow-up CD4 data missing at random (MAR)
• Ignoring the missing data will lead to biased estimates of ART outcomes
• Strategies are needed to identify patterns and mechanisms of missing data in observational data prior to analysis
Planned analyses
• Missing data and other HIV outcomes, e.g.
  - immune response
  - incidence of other opportunistic infections
  - toxicity
  - treatment changes/switches
• Strength of the nested research cohort can be used to validate imputed data in the large database
• CD4 trajectories versus mortality: estimate the distribution of CD4 marker trajectories and the distribution of log survival time using mixed-effects models, measuring time from the first pre-HAART CD4