
Update from the I4C International Data Coordinating Centre

Gabriella Tikellis (PhD), Murdoch Childrens Research Institute. 5th International I4C Meeting, IARC, 12-13th November, 2012. I4C International Data Coordinating Centre (IDCC): Location, Members, Role, Update on progress.


Presentation Transcript


  1. Update from the I4C International Data Coordinating Centre Gabriella Tikellis (PhD) Murdoch Childrens Research Institute 5th International I4C Meeting, IARC 12-13th November, 2012

  2. I4C International Data Coordinating Centre (IDCC) • Location • Members • Role • Update on progress • Cancer case ascertainment • Cohort data transfer • Pooled data analysis: cleaning and harmonization • Database management • Next steps

  3. I4C member cohorts • MoBa - Norway 109,981 • DNBC - Denmark 101,042 • GEHBC - Germany 200,000 • ALSPAC - UK 14,042 • ELFE - France 20,000 • BCS - UK 100,000 • CFCS - China 300,000 • CPP - USA 60,000 • Bradford - UK 10,000 • JECS - Japan 100,000 • NINFEA - Italy 7,500 • JPS - Israel 92,408 • Wuhan - China 120,000 • BDSS - China 247,831 • NCS - USA 100,000 • CIHS - Brazil 100,000 • Murdoch Childrens Research Institute, I4C International Data Coordinating Centre • MCRI - VIC 100,000 • TIHS 10,627

  4. Role of the IDCC

  5. • Coordinate the transfer of cohort data to a central location (i.e. MCRI) • Maintain the data and ensure it is kept secure • In collaboration with cohort representatives, work on the cleaning, validation and harmonization of variables from each dataset • Develop pooled datasets for analysis • Provide statistical support where required • Work with cohorts, Working Groups and other members to assist and facilitate the various research and day-to-day activities • Provide scientific input into the development of research proposals • I4C draft policies_Aug2012

  6. Progress on activities at the IDCC

  7. Childhood cancers • Classification based on* • International Classification of Diseases for Oncology (ICD-O), 3rd Edition • Leukemia • ICD-O-3 topography (site of origin of a neoplasm) code = C42.1 • ALL • ICD-O-3 morphology (type of cell) code = 9835/3 (4 digits for cell type, i.e. histology; 1 digit for behavior, e.g. 3 = malignant, primary site) • Classification of cancer based on primary diagnosis • *For 5 of the 6 cohorts that linked to cancer registries
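
The structure of the morphology code described on this slide (four histology digits, a slash, one behavior digit) can be sketched in a few lines of Python. This is an illustration only, not the IDCC's actual classification code: the function names are hypothetical, and `classify_case` uses only the two codes quoted on the slide, which is a deliberate simplification.

```python
import re
from typing import NamedTuple

# ICD-O-3 behavior digit (the digit after the slash in a morphology code).
BEHAVIOR_LABELS = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic",
}


class Morphology(NamedTuple):
    histology: str  # 4-digit cell type
    behavior: str   # 1-digit behavior code


def parse_morphology(code: str) -> Morphology:
    """Split a code such as '9835/3' into histology and behavior parts."""
    match = re.fullmatch(r"(\d{4})/(\d)", code.strip())
    if match is None:
        raise ValueError(f"not an ICD-O-3 morphology code: {code!r}")
    return Morphology(match.group(1), match.group(2))


def classify_case(topography: str, morphology: str) -> str:
    """Hypothetical, simplified rule using only the codes quoted on the slide."""
    if parse_morphology(morphology) == Morphology("9835", "3"):
        return "ALL"
    if topography.strip().upper() == "C42.1":
        return "leukemia"
    return "other"
```

For example, `parse_morphology("9835/3")` yields histology `"9835"` and behavior `"3"`, which `BEHAVIOR_LABELS` maps to "malignant, primary site".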

  8. Transfer of data on childhood cancers to the IDCC Wang, Ning

  9. Total number of cancer cases at I4C IDCC, Nov 2012, based on all live births

  10. All cancers, leukemia and ALL

  11. Other cancer types

  12. Sources of heterogeneity • Case ascertainment: variety of sources • National and population registries, hospital records • Follow-up for cancer cases varied across cohorts • CPP followed to 7 years • DNBC and MoBa have not completed follow-up to 15 years

  13. Next steps in cancer ascertainment • Complete classification of all cancer types from cohorts currently contributing data • DNBC, ALSPAC • Work with ongoing cohorts who have the potential to add to the existing pool of cancer cases (short term) • CIP-China, JECS, Wuhan, ELFE etc • Communicate with cohorts in development regarding the importance of detailed information for cases of childhood cancer • Germany, China-CFCS, UK, Brazil, Victorian Birth cohort

  14. Transfer of questionnaire data

  15. 6 cohorts contributing data to the pooled dataset • ALSPAC, UK • CPP, USA • DNBC, Denmark • JPS, Israel • MoBa, Norway • TIHS, Australia

  16. Current hypotheses under examination • 1. Birth weight and childhood cancer • Environmental Birth weight WG: led by Ora Paltiel, Hadassah Medical Organization, Israel • Preliminary draft of paper, to be presented by Ora • Genetic/Epigenetics Working Group: led by Zdenko Herceg, Hector Hernandez-Vargas, IARC • Working on blood spots from TIHS and NCS • 2. Pesticide exposure and childhood cancer • Environmental Pesticide WGs • Occupation: Ann Olsson, Joachim Schüz (IARC) • Examining occupation data from the 6 cohorts to standardize according to ISCO-88 • Residential proximity: Mary Ward, Leslie Stayner (NCI) • Ben Booth (doctoral student): examine land cover maps and work on occupation

  17. 3. Maternal prenatal folic acid supplementation and the risk of childhood cancer • Led by Terry Dwyer, Murdoch Childrens Research Institute/IARC • Harmonized data from TIHS, ALSPAC • Working on data from DNBC and MoBa • Folic acid data available from 4/6 cohorts • 4. Paternal age and childhood cancer • Led by Jorn Olsen, UCLA

  18. Tally of available data at IDCC • 380,427 mothers and babies* • *Includes 10% subsample from DNBC and MoBa

  19. Cleaning, harmonization and data pooling

  20. Cleaning • Run range checks on all variables • Determine valid ranges, e.g. birth weight, placental weight, maternal height • Rename and label variables so they are consistent across cohorts • Report inconsistencies or questionable values back to the respective cohorts to seek verification • Prepare summary descriptives for each variable, including proportion of missing data
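
The range-check and missing-data steps listed on this slide can be sketched as follows. This is a minimal illustration, not the IDCC's pipeline: the variable names and the valid ranges in `VALID_RANGES` are hypothetical placeholders, and real limits would be agreed with each cohort.

```python
# Hypothetical valid ranges (lower, upper); not the values used by the IDCC.
VALID_RANGES = {
    "birth_weight_g": (300, 7000),
    "placental_weight_g": (100, 1500),
    "maternal_height_cm": (120, 210),
}


def range_check(records, valid_ranges=VALID_RANGES):
    """Return (record id, variable, value) for every out-of-range value.

    Missing values (None) are not flagged here; they are counted separately.
    """
    flagged = []
    for rec in records:
        for var, (low, high) in valid_ranges.items():
            value = rec.get(var)
            if value is None:
                continue
            if not low <= value <= high:
                flagged.append((rec["id"], var, value))
    return flagged


def missing_proportion(records, variable):
    """Proportion of records in which the variable is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for rec in records if rec.get(variable) is None)
    return missing / len(records)


# Made-up example records.
records = [
    {"id": 1, "birth_weight_g": 3400, "maternal_height_cm": 165},
    {"id": 2, "birth_weight_g": 34000, "maternal_height_cm": None},
    {"id": 3, "birth_weight_g": None, "maternal_height_cm": 16.2},
    {"id": 4, "birth_weight_g": 2950, "maternal_height_cm": 171},
]
questionable = range_check(records)  # values to report back to the cohort
```

The flagged list is what would be sent back to a cohort for verification; a value such as a height of 16.2 often points to a unit problem (metres or decimetres instead of centimetres) rather than a true outlier.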

  21. Harmonization • Use the same units and variable names, e.g. age at cancer diagnosis, mat_smk (maternal smoking) • Ensure coding of variables is consistent across cohorts, e.g. male=1, female=2 • Heterogeneous variables need to be harmonized, e.g. Education: convert to years of education • Categorical variables use the same grouping, e.g. Paternal age: provided by one cohort in groups • Cleaning and harmonization of variables creates individual datasets for pooling and analysis
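
A minimal sketch of the recoding described on this slide, assuming two made-up cohorts. The cohort names, source codes and the education-to-years mapping are all hypothetical; only the target coding (male=1, female=2, education in years) comes from the slide.

```python
# Hypothetical source codings, mapped to the shared target: male=1, female=2.
SEX_CODES = {
    "cohort_a": {"M": 1, "F": 2},
    "cohort_b": {1: 2, 2: 1},  # assumed: cohort_b stores female=1, male=2
}

# Hypothetical conversion of education categories to years of education.
# cohort_b is assumed to report years directly, so it needs no mapping.
EDUCATION_YEARS = {
    "cohort_a": {"primary": 6, "secondary": 12, "tertiary": 16},
}


def harmonize(record, cohort):
    """Return a record using the shared variable names and codes.

    Values that cannot be mapped become None so they count as missing.
    """
    sex = SEX_CODES[cohort].get(record.get("sex"))
    education = record.get("education")
    if cohort in EDUCATION_YEARS:
        education_years = EDUCATION_YEARS[cohort].get(education)
    else:
        education_years = education
    return {"cohort": cohort, "sex": sex, "education_years": education_years}
```

Mapping unknown codes to `None` rather than guessing keeps the "proportion of missing data" summaries honest after pooling.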

  22. Harmonization: key issues (1) • Number of cohorts with data on specific variables, e.g. x-ray exposure during pregnancy: 4 out of the 6 cohorts collected data • Loss of power: will be reliant on additional data from new cohorts • Proportion of missing data for variables, e.g. data on mother's education missing for 28% in one cohort • Loss of power: will be reliant on additional data from new cohorts • Cohort-specific diversity, e.g. education levels, occupation classifications • Creating a standardized classification for occupations: currently being undertaken by Working Group • Definitions of exposures, e.g. Passive smoking: lives with other people who smoke OR hours spent in a room exposed to smoke (at home, at work?) • Harmonized variable becomes very general, i.e. any exposure to passive smoking?
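
The passive-smoking example on this slide can be made concrete: two differently defined source variables collapse into one general yes/no variable. The sketch below assumes one cohort records whether the mother lives with a smoker and another records hours spent in a smoky room; the function and parameter names are hypothetical.

```python
from typing import Optional


def any_passive_smoking(
    lives_with_smoker: Optional[bool] = None,
    hours_in_smoky_room: Optional[float] = None,
) -> Optional[bool]:
    """Collapse two cohort-specific definitions into 'any exposure'.

    Returns None when neither source variable was collected or answered.
    """
    if lives_with_smoker is None and hours_in_smoky_room is None:
        return None
    return bool(lives_with_smoker) or (hours_in_smoky_room or 0) > 0
```

The information lost is visible in the signature: dose (hours) and setting (home or work) go in, and only a single flag comes out.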

  23. Pooled dataset: Birth weight (X = data not collected; ? = to be verified)

  24. Birth-related variables (X = data not collected; ? = to be verified)

  25. Future work: focus on additional variables (X = data not collected; ? = to be verified)

  26. Pooled variables provide a 'core' dataset which we can build on for examining other hypotheses

  27. Example: Previous fetal loss and childhood cancers New exposure data

  28. Next steps • Increase power to examine various exposures by incorporating data from additional cohorts • Clean and harmonize data relating to current hypotheses: • Folate and vitamin supplementation • Previous fetal loss • Incorporate data on standardized occupation classification based on outcome from WG • Identify what data are available on exposures relating to new areas of interest such as infections

  29. Progress in data management

  30. Web-based Data Pooling Application at IDCC (developed by Luke Stevens)

  31. Data Pooling Application • MCRI’s secure e-Research portal • Restricted access • I4C team only • Can restrict user access at dataset level • Ongoing development

  32. Data Pooling Application • Run a query • Select variables to download • Select from any dataset • Database joins the datasets returning a combined data file

  33. Data Pooling Application • Select variables to download • Select from any dataset • Database joins the datasets returning a combined data file

  34. Data Pooling Application • Edit or Save your query • Download • Raw data file

  35. In development • Define "pools" that combine • Recordset: selection criteria, e.g. live, singleton births, no Down Syndrome • Fieldset: sets of variables from each cohort relevant to a hypothesis, e.g. birth measures, cancer outcomes • Download pool data • All records in the pool's recordset • All variables in the pool's fieldset or select a sub-set • Search for variables using keywords • Source file version control
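
The "pool" idea on this slide (a recordset of selection criteria combined with a fieldset of variables, applied to joined datasets) can be sketched in plain Python. This is not the Data Pooling Application's code: the function names, the `child_id` key and the example variables are all hypothetical, and the join shown is a simple in-memory outer join.

```python
def join_on_key(datasets, key):
    """Outer-join several lists of row dicts on a shared identifier."""
    merged = {}
    for rows in datasets:
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


def download_pool(datasets, key, recordset, fieldset):
    """Return the fieldset variables for every row that meets the recordset."""
    rows = join_on_key(datasets, key)
    return [
        {name: row.get(name) for name in [key, *fieldset]}
        for row in rows
        if recordset(row)
    ]


def eligible(row):
    """Recordset from the slide: live, singleton births, no Down Syndrome."""
    return bool(
        row.get("live_birth")
        and row.get("singleton")
        and row.get("down_syndrome") is False
    )


# Made-up example data: one birth dataset and one cancer-outcome dataset.
births = [
    {"child_id": 1, "live_birth": True, "singleton": True,
     "down_syndrome": False, "birth_weight_g": 3500},
    {"child_id": 2, "live_birth": True, "singleton": False,
     "down_syndrome": False, "birth_weight_g": 2400},
    {"child_id": 3, "live_birth": True, "singleton": True,
     "down_syndrome": False, "birth_weight_g": 3900},
]
cancers = [{"child_id": 1, "cancer_type": "ALL"}]

pool = download_pool(
    [births, cancers], "child_id", eligible, ["birth_weight_g", "cancer_type"]
)
```

Keeping the recordset as a function and the fieldset as a list of names means the same eligibility rule can be reused across hypotheses while the variables requested change.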

  36. Information on I4C • NIH/NCI- I4C portal https://communities.nci.nih.gov/i4c/default.aspx • MCRI- website http://www.mcri.edu.au/research/international-partnerships-collaborations/i4c • National Children’s Study website http://www.nationalchildrensstudy.gov/research/internationalinvolvement/pages/default.aspx

  37. Thanks to ... • DNBC • Sjurdur Olsen • Jorn Olsen • Marin Ström • Charlotta Granström • JPS • Ora Paltiel • Elena Polanker • Ronit Calderon-Margalit • MoBa • Camilla Stoltenberg • Siri Eldevik Håberg • Therese Bakke • MCRI • Terry Dwyer • Luke Stevens • Karen Lamb • ALSPAC • Jean Golding • Kate Northstone • CPP • Mark Klebanoff • Logan Spector • NIH/NCI • Martha Linet • Somdat Mahabir
