Update from the I4C International Data Coordinating Centre. Gabriella Tikellis (PhD), Murdoch Childrens Research Institute. 5th International I4C Meeting, IARC, 12-13th November, 2012. I4C International Data Coordinating Centre (IDCC): Location, Members, Role, Update on progress.
Update from the I4C International Data Coordinating Centre Gabriella Tikellis (PhD) Murdoch Childrens Research Institute 5th International I4C Meeting, IARC 12-13th November, 2012
I4C International Data Coordinating Centre (IDCC) • Location • Members • Role • Update on progress • Cancer case ascertainment • Cohort data transfer • Pooled data analysis: cleaning and harmonization • Database management • Next steps
I4C member cohorts
• MoBa - Norway: 109,981
• DNBC - Denmark: 101,042
• GEHBC - Germany: 200,000
• ALSPAC - UK: 14,042
• ELFE - France: 20,000
• BCS - UK: 100,000
• CFCS - China: 300,000
• CPP - USA: 60,000
• Bradford - UK: 10,000
• JECS - Japan: 100,000
• NINFEA - Italy: 7,500
• JPS - Israel: 92,408
• Wuhan - China: 120,000
• BDSS - China: 247,831
• NCS - USA: 100,000
• CIHS - Brazil: 100,000
• MCRI - VIC: 100,000
• TIHS: 10,627
I4C International Data Coordinating Centre: Murdoch Childrens Research Institute
• Coordinate the transfer of cohort data to a central location (i.e. MCRI)
• Maintain the data and ensure it is kept secure
• In collaboration with cohort representatives, work on the cleaning, validation and harmonization of variables from each dataset
• Develop pooled datasets for analysis
• Provide statistical support where required
• Work with cohorts, Working Groups and other members to assist and facilitate the various research and day-to-day activities
• Provide scientific input into the development of research proposals
(I4C draft policies_Aug2012)
Childhood cancers • Classification based on* the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3) • Leukemia • ICD-O-3 topography (site of origin of a neoplasm) code = C42.1 • ALL • ICD-O-3 morphology (type of cell) code = 9835/3 (4 digits for cell type, i.e. histology; 1 digit for behavior, e.g. 3 = malignant, primary site) • Classification of cancer based on primary diagnosis *For 5 of the 6 cohorts that linked to cancer registries
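The structure of the morphology code described above (four histology digits, a slash, one behavior digit) can be illustrated with a small parser. This is a minimal sketch, not the IDCC's actual procedure: the function names `parse_morphology` and `is_all_case` are hypothetical, and the ALL check uses only the two codes named on the slide, whereas a real case definition would probably cover more codes.

```python
from typing import NamedTuple

# ICD-O-3 behavior digit (the digit after the slash in a morphology code).
BEHAVIOR = {
    "0": "benign",
    "1": "uncertain whether benign or malignant",
    "2": "in situ",
    "3": "malignant, primary site",
    "6": "malignant, metastatic site",
    "9": "malignant, uncertain whether primary or metastatic",
}


class Morphology(NamedTuple):
    histology: str       # 4-digit cell type
    behavior: str        # 1-digit behavior code
    behavior_label: str  # human-readable meaning of the behavior digit


def parse_morphology(code: str) -> Morphology:
    """Split a morphology code such as '9835/3' into histology and behavior."""
    histology, sep, behavior = code.strip().partition("/")
    if sep != "/" or len(histology) != 4 or not histology.isdigit():
        raise ValueError(f"not a morphology code: {code!r}")
    if behavior not in BEHAVIOR:
        raise ValueError(f"unknown behavior digit in {code!r}")
    return Morphology(histology, behavior, BEHAVIOR[behavior])


def is_all_case(topography: str, morphology: str) -> bool:
    """Hypothetical check using only the codes quoted on the slide."""
    m = parse_morphology(morphology)
    return topography == "C42.1" and m.histology == "9835" and m.behavior == "3"
```

For example, `parse_morphology("9835/3").behavior_label` gives the label for a malignant primary tumour, and `is_all_case("C42.1", "9835/3")` is true.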
Transfer of data on childhood cancers to the IDCC Wang, Ning
Total number of cancer cases at the I4C IDCC, Nov 2012 (based on all live births)
Sources of heterogeneity • Case ascertainment: variety of sources • National and population registries, and hospital records • Follow-up for cancer cases varied across cohorts • CPP followed to 7 years • DNBC and MoBa have not yet completed follow-up to 15 years
Next steps in cancer ascertainment • Complete classification of all cancer types from cohorts currently contributing data • DNBC, ALSPAC • Work with ongoing cohorts who have the potential to add to the existing pool of cancer cases (short term) • CIP-China, JECS, Wuhan, ELFE etc • Communicate with cohorts in development regarding the importance of detailed information for cases of childhood cancer • Germany, China-CFCS, UK, Brazil, Victorian Birth cohort
6 cohorts contributing data to the pooled dataset • ALSPAC, UK • CPP, USA • DNBC, Denmark • JPS, Israel • MoBa, Norway • TIHS, Australia
Current hypotheses under examination
1. Birth weight and childhood cancer • Environmental Birth weight WG: led by Ora Paltiel, Hadassah Medical Organization, Israel • Preliminary draft of paper, to be presented by Ora • Genetic/Epigenetics Working Group: led by Zdenko Herceg and Hector Hernandez-Vargas, IARC • Working on blood spots from TIHS and NCS
2. Pesticide exposure and childhood cancer • Environmental Pesticide WGs • Occupation: Ann Olsson, Joachim Schüz (IARC) • Examining occupation data from the 6 cohorts to standardize according to ISCO-88 • Residential proximity: Mary Ward, Leslie Stayner (NCI) • Ben Booth (doctoral student) to examine land cover maps and work on occupation
3. Maternal prenatal folic acid supplementation and the risk of childhood cancer • Led by Terry Dwyer, Murdoch Childrens Research Institute/IARC • Harmonized data from TIHS, ALSPAC • Working on data from DNBC and MoBa • Folic acid data available from 4 of the 6 cohorts
4. Paternal age and childhood cancer • Led by Jorn Olsen, UCLA
Tally of available data at IDCC: 380,427 mothers and babies* *Includes a 10% subsample from DNBC and MoBa
Cleaning • Run range checks on all variables • Determine valid ranges, e.g. for birth weight, placental weight, maternal height • Rename and label variables so they are consistent across cohorts • Report inconsistencies or questionable values back to the respective cohorts for verification • Prepare summary descriptives for each variable, including the proportion of missing data
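The range checks and missing-data summaries in the cleaning steps above could look something like the following. This is a minimal sketch under stated assumptions: the variable names and the valid ranges in `VALID_RANGES` are hypothetical placeholders, not the ranges agreed by the IDCC.

```python
# Hypothetical valid ranges; real limits would be agreed with each cohort.
VALID_RANGES = {
    "birth_weight_g": (300, 6500),
    "placental_weight_g": (100, 1500),
    "maternal_height_cm": (120, 210),
}


def range_check(records, ranges=VALID_RANGES):
    """Return (record id, variable, value) for every out-of-range value.

    Missing values (None) are skipped here and counted separately.
    """
    flagged = []
    for rec in records:
        for var, (low, high) in ranges.items():
            value = rec.get(var)
            if value is not None and not (low <= value <= high):
                flagged.append((rec["id"], var, value))
    return flagged


def proportion_missing(records, variable):
    """Share of records in which the variable is absent or None."""
    if not records:
        return 0.0
    missing = sum(1 for rec in records if rec.get(variable) is None)
    return missing / len(records)


# Invented example records, for illustration only.
sample = [
    {"id": "A1", "birth_weight_g": 3400, "placental_weight_g": 600, "maternal_height_cm": 165},
    {"id": "A2", "birth_weight_g": 34000, "placental_weight_g": None, "maternal_height_cm": 158},
    {"id": "A3", "birth_weight_g": None, "placental_weight_g": 550, "maternal_height_cm": 16.2},
]
flagged = range_check(sample)
```

The `flagged` list is what would be reported back to the cohort for verification; here it picks up a birth weight that looks like a unit or keying error and a maternal height that looks like a misplaced decimal point.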
Harmonization • Use the same units and variable names, e.g. age at cancer diagnosis, mat_smk (maternal smoking) • Ensure the coding of each variable is consistent across cohorts, e.g. male=1, female=2 • Heterogeneous variables need to be harmonized, e.g. Education: convert to years of education • Categorical variables use the same grouping, e.g. Paternal age: provided by one cohort in groups • Cleaning and harmonization of variables creates individual datasets for pooling and analysis
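The harmonization rules above (shared variable names, consistent coding, education converted to years) can be sketched as a per-cohort recoding step. Everything specific here is an assumption for illustration: the cohort labels `cohort_a` and `cohort_b`, their source codings, and the education-to-years mapping are hypothetical, while `mat_smk` and male=1, female=2 come from the slide.

```python
# Hypothetical source codings for sex in two cohorts, mapped to male=1, female=2.
SEX_RECODE = {
    "cohort_a": {"M": 1, "F": 2},
    "cohort_b": {1: 1, 0: 2},  # cohort_b stores 1 = male, 0 = female
}

# Hypothetical conversion from education level to years of education.
EDUCATION_YEARS = {"primary": 6, "secondary": 12, "tertiary": 16}


def harmonize(record, cohort):
    """Return a record using the pooled variable names and codings."""
    out = {"cohort": cohort}
    out["sex"] = SEX_RECODE[cohort].get(record.get("sex"))

    education = record.get("education")
    if isinstance(education, (int, float)):
        out["education_years"] = education  # already reported in years
    else:
        out["education_years"] = EDUCATION_YEARS.get(education)

    # Accept either the pooled name or a cohort-specific name.
    out["mat_smk"] = record.get("mat_smk", record.get("maternal_smoking"))
    return out
```

Values that cannot be mapped come back as `None`, so they show up in the missing-data summaries rather than being silently guessed.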
Harmonization: key issues (1) • Number of cohorts with data on specific variables, e.g. x-ray exposure during pregnancy: 4 out of the 6 cohorts collected data • Loss of power: will be reliant on additional data from new cohorts • Proportion of missing data for variables, e.g. data on mother’s education missing for 28% in one cohort • Loss of power: will be reliant on additional data from new cohorts • Cohort-specific diversity, e.g. education levels, occupation classifications • Creating a standardized classification for occupations: currently being undertaken by Working Group • Definitions of exposures, e.g. Passive smoking: living with other people who smoke OR hours spent in a room exposed to smoke (at home, at work?) • Harmonized variable becomes very general, i.e. any exposure to passive smoking?
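The passive smoking example shows how two different exposure definitions collapse into one very general variable. A minimal sketch of that collapse follows, assuming one cohort records whether the mother lives with a smoker and another records hours spent in a smoky room; the function and parameter names are hypothetical.

```python
from typing import Optional


def any_passive_smoking(
    lives_with_smoker: Optional[bool] = None,
    hours_in_smoky_room: Optional[float] = None,
) -> Optional[int]:
    """Collapse two cohort-specific measures into 'any exposure'.

    Returns 1 if exposed, 0 if not exposed, and None when neither
    measure was collected for this record.
    """
    if lives_with_smoker is None and hours_in_smoky_room is None:
        return None
    if lives_with_smoker:
        return 1
    if hours_in_smoky_room is not None and hours_in_smoky_room > 0:
        return 1
    return 0
```

The detail in the original measures (who smokes, how many hours, at home or at work) is lost, which is the trade-off the slide points out.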
Pooled dataset: Birth weight (X = data not collected; ? = to be verified)
Birth-related variables (X = data not collected; ? = to be verified)
Future work: focus on additional variables (X = data not collected; ? = to be verified)
Pooled variables provide a ‘core’ dataset which we can build on for examining other hypotheses
Example: Previous fetal loss and childhood cancers New exposure data
Next steps • Increase power to examine various exposures by incorporating data from additional cohorts • Clean and harmonize data relating to current hypotheses: • Folate and vitamin supplementation • Previous fetal loss • Incorporate data on standardized occupation classification based on outcome from WG • Identify what data are available on exposures relating to new areas of interest such as infections
Web-based Data Pooling Application at IDCC (developed by Luke Stevens)
Data Pooling Application • MCRI’s secure e-Research portal • Restricted access • I4C team only • Can restrict user access at dataset level • Ongoing development
Data Pooling Application • Run a query • Select variables to download • Select from any dataset • Database joins the datasets returning a combined data file
Data Pooling Application • Edit or Save your query • Download • Raw data file
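The query-and-download flow described in these slides (select variables from any dataset, get back one combined raw data file) can be approximated as follows. This is only a sketch of the idea, not the application's implementation: the slides do not say how the database joins the datasets, so this version simply stacks the records of each cohort and tags them with a `cohort` column. The names `run_query` and `to_csv` are hypothetical.

```python
import csv
import io


def run_query(datasets, variables):
    """Stack records from every cohort, keeping only the selected variables.

    `datasets` maps a cohort name to a list of record dicts. A variable a
    cohort did not collect is returned as None for that cohort's records.
    """
    rows = []
    for cohort, records in datasets.items():
        for rec in records:
            row = {"cohort": cohort}
            for var in variables:
                row[var] = rec.get(var)
            rows.append(row)
    return rows


def to_csv(rows, variables):
    """Render the combined rows as CSV text (the 'raw data file')."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["cohort", *variables], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

With two small datasets, `to_csv(run_query(datasets, ["birth_weight_g", "sex"]), ["birth_weight_g", "sex"])` yields one header line followed by one line per record, with an empty field wherever a cohort has no value for a variable.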
In development • Define “pools” that combine • Recordset: selection criteria, e.g. live, singleton births, no Down Syndrome • Fieldset: sets of variables from each cohort relevant to a hypothesis, e.g. birth measures, cancer outcomes • Download pool data • All records in the pool’s recordset • All variables in the pool’s fieldset or select a sub-set • Search for variables using keywords • Source file version control
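The "pool" concept above pairs a recordset (which records qualify) with a fieldset (which variables are returned). One way to picture it is the sketch below. The `Pool` class, its `download` method, and the variable names such as `live_birth`, `plurality` and `down_syndrome` are all hypothetical; the planned application may model this quite differently.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


@dataclass
class Pool:
    name: str
    recordset: Callable[[Record], bool]  # selection criteria for records
    fieldset: List[str]                  # variables relevant to a hypothesis

    def download(self, records: List[Record],
                 subset: Optional[List[str]] = None) -> List[Record]:
        """Return qualifying records, limited to the fieldset or a sub-set of it."""
        if subset is None:
            fields = self.fieldset
        else:
            fields = [f for f in self.fieldset if f in subset]
        return [
            {f: rec.get(f) for f in fields}
            for rec in records
            if self.recordset(rec)
        ]


def live_singleton_no_down(rec: Record) -> bool:
    """Example criteria from the slide: live, singleton births, no Down Syndrome."""
    return (
        rec.get("live_birth") == 1
        and rec.get("plurality") == 1
        and rec.get("down_syndrome") == 0
    )


birth_weight_pool = Pool(
    name="birth weight",
    recordset=live_singleton_no_down,
    fieldset=["birth_weight_g", "gestational_age_wk", "cancer_dx"],
)
```

Calling `birth_weight_pool.download(records)` returns every variable in the fieldset for qualifying records, while passing `subset=["birth_weight_g"]` narrows the download to a selected sub-set.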
Information on I4C • NIH/NCI- I4C portal https://communities.nci.nih.gov/i4c/default.aspx • MCRI- website http://www.mcri.edu.au/research/international-partnerships-collaborations/i4c • National Children’s Study website http://www.nationalchildrensstudy.gov/research/internationalinvolvement/pages/default.aspx
Thanks to ...
• DNBC: Sjurdur Olsen, Jorn Olsen, Marin Ström, Charlotta Granström
• JPS: Ora Paltiel, Elena Polanker, Ronit Calderon-Margalit
• MoBa: Camilla Stoltenberg, Siri Eldevik Håberg, Therese Bakke
• MCRI: Terry Dwyer, Luke Stevens, Karen Lamb
• ALSPAC: Jean Golding, Kate Northstone
• CPP: Mark Klebanoff, Logan Spector
• NIH/NCI: Martha Linet, Somdat Mahabir