Data harmonisation and the value of cross-cohort analysis: an overview of the new CLOSER programme • Jane Elliott, Director of the Centre for Longitudinal Studies and Director of CLOSER • J.Elliott@ioe.ac.uk
Summary • A brief overview of CLOSER • Early progress on harmonisation work packages • Biological structure • Socioeconomic status and qualifications • Uniform Search Platform • Contextual database • Benefits of cross-cohort analysis
Cohorts and Longitudinal Studies Enhancement Resources = CLOSER • Nine Longitudinal Studies • Hertfordshire Cohort Study • 1946 British Birth Cohort • 1958 British Birth Cohort • 1970 British Birth Cohort • ALSPAC – Avon Longitudinal Study of Parents and Children • Millennium Cohort Study • Southampton Women’s Study • Life Study • Understanding Society • Funded by ESRC and MRC
Objectives & timetable • Maximise the use, value and impact of data collected through a portfolio of key UK longitudinal studies • Stimulate interdisciplinary research across major longitudinal studies • Provide common resources for research • Assist with training and development • Share information and expertise between study teams • 1st October 2012 – 30th September 2017
Work streams • 4 work packages on data harmonisation • 3 work packages on data linkage • Core work on • Impact – Led by the British Library • Training and Capacity Building • Uniform Search Platform • Leadership team contributing to strategic planning, sharing of best practice, funders’ strategies • See our website: www.CLOSER.ac.uk for further information • Twitter: @CLOSER_UK
Leadership team [organisation diagram] • Work packages: • WP1: Harmonisation of biological structure and function • WP2: Harmonisation socio-economic resources • WP3: Harmonisation analysis of biological samples • WP4: Harmonisation measures of vision • WP5: Data linkage administrative data • WP6: Data linkage - geography • WP7: Data linkage – health data • Studies: 1946 cohort, 1958 cohort, 1970 cohort, ALSPAC, MCS, Understanding Society, SWS, HCS, Life Study • Core: Impact, Metadata, Training and capacity building, Uniform Search Platform
Vision for the USP • Portal to discovery of hundreds of thousands of variables, questions and data collection instruments across the nine longitudinal studies: • covering survey and biomedical data collection • promoting CLOSER harmonisation work • state-of-the-art searching tool • focus on improving visibility of associations between (currently) disparate metadata items • shared subject/topic classification • We should remember that this is massively ambitious: something that matches or surpasses the best multi-study metadata repository out there: • RAND Survey Meta Data Repository covering the HRS family of studies: https://mmicdata.rand.org/megametadata/
Why do it? • Benefits to users: • single resource discovery portal – replacing a fractured resource discovery landscape • lowers barriers to conducting cross-cohort analysis • increased visibility of cohort data and resources • Benefits to data managers: • standardised metadata management workflows – currently curated in isolation • workflows in place for future ‘joiners’ • Benefits to Principal Investigators/survey commissioners: • make prospective harmonisation easier • promotion and re-use of tested questions and instruments
Assumptions, constraints • Not a data repository • Not a major software development project: major £££ is for metadata creation/enhancement • DDI-L agreed as standard for metadata exchange: • covers subject areas (bio and soc science) and data collection methods (‘hard’ instrument and survey) • designed for marking-up longitudinal/repeated metadata items • Colectica Designer selected as preferred metadata ingest/editing software
Challenges • Legacy metadata: • elderly and decrepit! • not always designed for equivalence within a study, much less across studies • differing or non-existent naming conventions • substantial (manual) effort required to establish equivalences and level of equivalence • Metadata managed by five or six different units: different formats, workflows, vocabularies • Relative lack of familiarity with DDI-L: • uneven knowledge across study units
Metadata: State of play • >200k variables • c.150 data collections: • CAI, PAPI, nurse visit, clinic-based protocol, biosamples, etc. • c.85 validated survey instruments • GHQ, AUDIT, Malaise Inventory, etc. • c.10 instruments used in >1 study • c.20 validated clinical measures • blood pressure, bone density, lung function, etc. • range of instruments used • c.15 cognitive or physical tests
How to do it? • USP will be a web interface that sits on top of a central repository fed by metadata created and delivered both by the individual study units and the CLOSER core • Study units continue to curate metadata as they see fit, but not in conflict with the proposed USP metadata profile • Substantial metadata creation and enhancement to be undertaken by the study units: inputting historical questionnaires; mapping between data items and data collection • CLOSER core responsible for identifying common (cross-study) variable and question schemes, allowing studies to reference these and also any agreed controlled vocabularies (concept, life stage etc.)
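The idea of studies referencing a shared controlled vocabulary can be sketched in a few lines of Python. This is only an illustration of the principle, not the USP design or the DDI-L data model: the `VariableRecord` class, the variable names and the concept terms are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class VariableRecord:
    """One study-level metadata item (all example values are made up)."""
    study: str    # study unit that curates the item
    name: str     # the study's own variable name
    concept: str  # term taken from a shared controlled vocabulary


def build_concept_index(records):
    """Group study variables under each shared concept term."""
    index = defaultdict(list)
    for rec in records:
        index[rec.concept].append((rec.study, rec.name))
    return dict(index)


def cross_study_concepts(index):
    """Return, in sorted order, concepts referenced by more than one study."""
    return sorted(
        concept
        for concept, items in index.items()
        if len({study for study, _ in items}) > 1
    )


# Hypothetical records: two studies tag different variable names
# with the same concept, a third variable stands alone.
records = [
    VariableRecord("NCDS", "bp_sys_a", "systolic blood pressure"),
    VariableRecord("BCS70", "sbp1", "systolic blood pressure"),
    VariableRecord("MCS", "ghq_total", "GHQ score"),
]

index = build_concept_index(records)
print(cross_study_concepts(index))
```

Because each study only attaches a concept term to its own items, study units can keep their local naming conventions while a central index still surfaces candidates for cross-study comparison.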
Contextual database - rationale • Life course approach stresses the importance of the connection between individuals and the historical and socioeconomic context in which these individuals lived • But some research based on cohort studies pays little attention to the social, economic or historical context that helps shape the lives of individuals • Some data on social change and social context will come from the studies themselves (e.g. breastfeeding) • Aim of the contextual database is to provide a central source of key indicators over time likely to be of direct relevance to cohort research
[Table not reproduced] Source: Changing Britain, Changing Lives: Three generations at the turn of the century, Table 8.3 (Wadsworth et al.)
[Figure not reproduced] Proportion of women in paid employment, by age and cohort • Source: Jenny Neuburger, paper presented at CLS, June 2008
Contextual database - elements • Also want to include policy narratives and a bibliography
Work package 1: Biological structure and function • Two years: March 2013 - February 2015 • William Johnson & Rebecca Hardy, MRC Unit for Lifelong Health and Ageing • Body size and composition • Cognitive performance • Blood pressure • Physical capability
Research priority • Body size, because of the obesity epidemic and the long-term consequences of adiposity for health & well-being • Need for harmonisation:
First papers • Compare body size distributions and mean trajectories, across different phases of the life course, between cohorts • Investigate how SEP inequalities in body size trajectories, across different phases of the life course, differ between cohorts • Li L et al. Am J Epidemiol. 2008 • Howe LD et al. JECH. 2012
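Studies [figure not reproduced: ages in years at measurement, one series per study] • 0, 1, 3, 5, 7 • 0, 7, 8, 9, 10, 11, 12, 13, 15, 18 • 0, 5, 10, 16, 26, 30, 34 • 0, 7, 11, 16, 23, 33, 42, 44, 50 • 0, 2, 4, 6, 7, 11, 15, 20, 26, 36, 43, 53, 60-64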
Challenges • Between studies: • Data covering different age ranges • Data increasingly positively skewed in more recent studies • Within individuals: • Different number of observations at different exact ages • Different precision of data • Within and between individuals: • Both measured and self-report data
What we are aiming to achieve: 1) Demonstration research project focussing on socioeconomic differences in growth and obesity across cohorts 2) A harmonised dataset, with accompanying documentation for other users
Socio-economic data harmonisation work package • Claire Crawford, Brian Dodgeon, Tim Morris, Sam Parsons, Anna Vignoles (lead) • Two years: April 2013 - March 2015
What measures? • Measures to be harmonised are: • parental education level • cohort member level of education • socio-economic (occupation) status • household equivalised income • home ownership • Cohorts: NSHD; NCDS; BCS; ALSPAC; MCS
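As an illustration of what "household equivalised income" involves, the sketch below divides household income by an equivalence factor. It assumes the modified OECD scale (1.0 for the first adult, 0.5 for each further person aged 14 or over, 0.3 for each child under 14); the slides do not say which scale the work package uses, and the function names are hypothetical.

```python
def modified_oecd_factor(persons_14_plus, children_under_14):
    """Equivalence factor under the modified OECD scale (assumed here):
    1.0 for the first adult, 0.5 for each further person aged 14+,
    0.3 for each child under 14."""
    if persons_14_plus < 1:
        raise ValueError("a household needs at least one person aged 14+")
    return 1.0 + 0.5 * (persons_14_plus - 1) + 0.3 * children_under_14


def equivalised_income(household_income, persons_14_plus, children_under_14):
    """Household income adjusted for household size and composition."""
    return household_income / modified_oecd_factor(
        persons_14_plus, children_under_14
    )


# Two adults and two young children: factor = 1.0 + 0.5 + 0.6 = 2.1
print(round(equivalised_income(42000, 2, 2), 2))
```

Applying one scale to every cohort is what makes incomes comparable across households of different sizes and across studies that recorded household composition differently.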
Priority Measures agreed • Highest qualification (vocational/academic separately) held at every age • Age left full time education • Whether the person went past compulsory schooling • Average GCSE score or equivalent • GCSE Grades in mathematics and English (not for all cohorts) • For cohort member parents - age left full time education and highest qualification at birth of CM • Grandparents’ age left school
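A measure such as "whether the person went past compulsory schooling" has to be derived against a cohort-specific threshold, because the minimum school-leaving age changed between cohorts. The sketch below is a minimal illustration covering three cohorts only; the lookup table reflects the British minimum leaving age (15 from 1947, 16 from 1972) and the function name is hypothetical, not the work package's derivation.

```python
# Assumed minimum school-leaving age for each cohort (illustrative only):
# the leaving age was 15 from 1947 and 16 from 1972.
LEAVING_AGE = {
    "NSHD": 15,  # born 1946
    "NCDS": 16,  # born 1958
    "BCS": 16,   # born 1970
}


def stayed_past_compulsory(cohort, age_left_education):
    """Return True if the person left education after the minimum
    leaving age for their cohort, False if not, None if age is missing."""
    if age_left_education is None:
        return None
    return age_left_education > LEAVING_AGE[cohort]


# The same leaving age gives different answers in different cohorts.
print(stayed_past_compulsory("NSHD", 16))  # left a year after the minimum
print(stayed_past_compulsory("NCDS", 16))  # left at the minimum
```

Deriving the flag from age left full-time education, instead of comparing raw ages across cohorts, keeps the harmonised variable meaningful even though the policy context differs.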
The value of cross-cohort analysis • A meta-narrative of societal change over time • Creating a synthetic life course – understanding life time trajectories • Investigate cohort effects - examining the impact of different social and policy contexts • Replication of results – checking the robustness of models • Larger N and greater power • Decompose age and period effects
[Figure not reproduced] Lifetime systolic blood pressure trajectories and velocities (predicted means) • Panels: Men, Women • Source: Wills et al. PLOS Med, 2011