
Multiple Imputation and Missing Race in the Pre-Invasive Cervical Cancer Study among Three States

This study uses multiple imputation to address missing race data in a pre-invasive cervical cancer study across three states. It compares the results with the complete case method and examines the association between race and adenocarcinoma in situ (AIS).

Presentation Transcript


  1. Multiple Imputation and Missing Race in the Pre-Invasive Cervical Cancer Study among Three States • 2010 NAACCR Conference, Quebec City, June 22, 2010 • Bin Huang, Kentucky Cancer Registry, University of Kentucky

  2. The Pre-invasive Cervical Cancer Study • HPV vaccine • Quadrivalent vaccine licensed for females in June 2006 • ACS developed the guideline for HPV vaccine use in June 2007 • Anticipated reductions in cervical cancers and other anogenital cancers • Need for surveillance systems • Collection of population data for pre-invasive cervical cancer cases • Monitoring effectiveness and efficacy • CDC-funded study • Includes three cancer registries – Michigan, Kentucky, Louisiana • Pre-pilot period (Sept-Dec 2008) • Data collection Jan 2009-Dec 2009

  3. Missing Data in the Study • Missing data issue • Race: 30% missing • Cases with complete data overall: 68.7% • Missing data can cause bias or lead to inefficient analyses.

  4. Missing Data Mechanism • Missing completely at random (MCAR). • The missingness is independent of both the missing response and the observed response. • Missing at random (MAR). • The missingness is independent of the missing response given the observed values. • Not missing at random (NMAR). • The missingness depends on the missing response itself, even after conditioning on the observed values.
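The three mechanisms on slide 4 can be made concrete with a small simulation. This is a minimal sketch on synthetic data, not study data: the variables `x` (always observed) and `y` (subject to missingness), the 30% MCAR rate, and the logistic missingness probabilities are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, used here as a missingness probability."""
    return 1.0 / (1.0 + math.exp(-z))


def complete_case_means(n=20000, seed=2010):
    """Mean of y among retained cases under three missingness mechanisms."""
    rng = random.Random(seed)
    kept = {"MCAR": [], "MAR": [], "NMAR": []}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)       # always observed
        y = x + rng.gauss(0.0, 1.0)   # may go missing; its true mean is 0
        # MCAR: a fixed 30% chance of being missing, unrelated to x or y.
        if rng.random() >= 0.30:
            kept["MCAR"].append(y)
        # MAR: the chance of being missing depends only on the observed x.
        if rng.random() >= sigmoid(x):
            kept["MAR"].append(y)
        # NMAR: the chance of being missing depends on y itself.
        if rng.random() >= sigmoid(y):
            kept["NMAR"].append(y)
    return {name: sum(vals) / len(vals) for name, vals in kept.items()}


means = complete_case_means()
for name, value in means.items():
    print(f"{name}: complete-case mean of y = {value:.3f}")
```

In this setup the complete-case mean stays near the true value of 0 only under MCAR. Under MAR it is also pulled away from 0, but that bias can be removed by conditioning on `x`, which is what an imputation model attempts to do; under NMAR the observed data alone cannot correct it.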

  5. Methods to Treat Missing Data • Available case methods: complete case method (listwise deletion); pairwise deletion • Single imputation methods: mean substitution; hot deck imputation; regression substitution • Modern approaches: maximum likelihood (ML) method; Bayesian method; multiple imputation (MI)
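To show how the complete case method differs from using all available values, here is a small sketch. The records and the field names `age` and `pct_white` are synthetic and hypothetical; the per-variable mean is shown only as the simplest analogue of pairwise deletion.

```python
def listwise_delete(rows):
    """Keep only rows in which every field is observed (complete cases)."""
    return [row for row in rows if all(v is not None for v in row.values())]


def available_case_mean(rows, field):
    """Mean of one field using every row where that field is observed."""
    values = [row[field] for row in rows if row[field] is not None]
    return sum(values) / len(values)


# Synthetic toy records, not study data. None marks a missing value.
rows = [
    {"age": 30, "pct_white": 80.0},
    {"age": None, "pct_white": 60.0},
    {"age": 50, "pct_white": None},
    {"age": 40, "pct_white": 70.0},
]

complete = listwise_delete(rows)
mean_age_complete = available_case_mean(complete, "age")   # complete cases only
mean_age_available = available_case_mean(rows, "age")      # all observed ages
print(len(complete), mean_age_complete, mean_age_available)
```

Listwise deletion discards the third record even though its age is known, so the two means differ. That loss of information is one reason the complete case method can be inefficient.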

  6. Multiple Imputation (MI) MI is a three-step approach to estimation for incomplete data, first proposed by Rubin in 1977. MI assumes missing data are MAR. • Imputation - the missing data are filled in m times to generate m complete data sets. The imputation model should preserve the distributional relationship between the missing values and the observed values. • Analysis - the m complete data sets are analyzed separately using standard statistical analyses. • Combination - the results from the m complete data sets are combined to produce a single set of inferential results.
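The combination step is usually done with Rubin's rules: the pooled estimate is the average of the m estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The sketch below is one way to write that. The function name `pool_rubin` and the input numbers are illustrative, not taken from the study.

```python
import math


def pool_rubin(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules.

    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates, each with a variance")
    q_bar = sum(estimates) / m                                   # pooled estimate
    within = sum(variances) / m                                  # W
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # B
    total = within + (1.0 + 1.0 / m) * between                   # T = W + (1 + 1/m) B
    if between == 0:
        df = math.inf  # identical estimates: no between-imputation uncertainty
    else:
        df = (m - 1) * (1.0 + within / ((1.0 + 1.0 / m) * between)) ** 2
    return q_bar, total, df


# Illustrative inputs: three log-odds estimates with equal variances.
q, t, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, t, df)
```

The square root of the total variance is the pooled standard error, and the degrees of freedom are used for a t-based confidence interval.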

  7. Software Available • SAS • PROC MI; PROC MIANALYZE. • MCMC option - assumption of multivariate normality. • SOLAS (Statistical Solutions Inc) • Same assumption as SAS Proc MI. • S-Plus: NORM • IVEware: SAS callable • PROC IMPUTE; PROC DESCRIBE; PROC REGRESS • Does not assume multivariate normality.

  8. Aim of the Study • To impute the missing race with MI • To examine the differences in estimates between the complete case method and the MI method • Percentage of each race • The association between AIS status and race.

  9. Data – Pre-Cervical Cancer Cases • Three states – Kentucky, Louisiana and Michigan • Total – 3843 • Kentucky: 953 (24.8%), Louisiana: 653 (17.0%), Michigan: 2237 (58.2%) • Variables (17) • Demographics: race, address, age, ethnicity • Data sources: reporting facility, facility type, time at diagnosis • Disease data: site, histology code, histology terminology code, sequence code • Added variables (2) – from the 2000 US Census • % of Whites at county level • % of Blacks at county level

  10. Data Collection Process

  11. Descriptive Analysis

  12. Descriptive Analysis (cont.)

  13. Comparison Among The Three States

  14. Missing Cases – Race, State at Diagnosis, County at Diagnosis

  15. Comparison Between Known and Unknown Races

  16. MI Methods • IVEware and SAS PROC MI • Used both methods • Only results from IVEware are presented • IVEware: http://www.isr.umich.edu/src/smp/ive/

  17. Missing Pattern – All States

  18. Associations Multivariate logistic regression showed: • Race is significantly associated with ethnicity, histological terminology type, age, and state. • Most notably, the percent of each race at county level is the most dominant variable predicting race.

  19. Imputation Model • Variables include race, registry, age, ethnicity, facility type, site, histology terminology code, sequence code, and percentages of races at county level • 10 imputation sets
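As a rough illustration of generating 10 imputation sets, the sketch below fills in each missing race by a random draw, repeated m = 10 times. It is not IVEware's sequential regression procedure. It assumes, purely for illustration, a two-category race variable and uses the county-level percent White directly as the probability of drawing "White", where a real imputation model would use all the predictors listed on slide 19 and also reflect uncertainty in the model parameters. The function names and the records are hypothetical and synthetic.

```python
import random


def impute_race(records, m=10, seed=0):
    """Return m completed copies of `records`.

    Each record is (race, pct_white_county); race is "White", "Black",
    or None when missing. A missing race is drawn as "White" with
    probability pct_white_county / 100 (a toy stand-in for a fitted model).
    """
    rng = random.Random(seed)
    completed = []
    for _ in range(m):
        filled = []
        for race, pct_white in records:
            if race is None:
                race = "White" if rng.random() < pct_white / 100.0 else "Black"
            filled.append((race, pct_white))
        completed.append(filled)
    return completed


def share_white(dataset):
    """Proportion of records coded White in one completed data set."""
    return sum(1 for race, _ in dataset if race == "White") / len(dataset)


# Synthetic toy records, not study data.
records = [
    ("White", 90.0), ("Black", 40.0), (None, 85.0),
    (None, 30.0), ("White", 70.0), (None, 95.0),
]

completed = impute_race(records, m=10, seed=2010)
per_set = [share_white(d) for d in completed]
pooled_share = sum(per_set) / len(per_set)   # average over the 10 sets
print(per_set, pooled_share)
```

Observed races are never overwritten; only the missing entries vary from one completed data set to the next, and that variation is what carries the imputation uncertainty into the pooled result.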

  20. Frequency of Race

  21. Logistic Regression Analysis with AIS Status as the Dependent Variable

  22. Summary • The high percentage of cases with missing race likely introduced bias into the estimated race distribution, mainly in the data from Michigan. • The results show that Whites have a much higher risk of AIS than Blacks. • Quantitative differences in estimates between the two methods (complete case and MI) were found in the logistic model. • MI is relatively easy to implement and is appropriate for a wide range of datasets.

  23. Acknowledgements • CDC – Deblina Datta and staff • Kentucky Cancer Registry: Thomas Tucker, Mary Jane Byrne, Brent Shelton • Michigan Cancer Registry: Glenn Copland, Won Silva and staff • Louisiana Cancer Registry: Vivien Chen and staff • Macro International - Benita O’Colma

  24. Words to Share John Wooden - “Be quick, but don’t hurry” “If you don’t have time to do it right, how will you find time to do it again?”

  25. Questions? Bin Huang bhuang@kcr.uky.edu 859-219-0773 x 280 Thank You ! Merci !
