Experiences of managing Birth Cohort Data at CLS. Jon Johnson (Senior Database Manager). CLS is an ESRC Resource Centre based at the Institute of Education.
Contents • Introduction • (Pre) History • Centralised Computing • Semi-centralised computing • Personal Computing • Consequences • Survey Data ‘production line’ • Requirements • Potential Database strategies • Staffing and skills
Introduction • CLS has been an ESRC Resource Centre since 2005. We are responsible for three of the four British Birth Cohort studies • NCDS (1958) • BCS70 (1970) • MCS (2000) • NSHD (1946) is funded by MRC at UCL. www.cls.ioe.ac.uk
(Pre) History • NCDS has its origins in the Perinatal Mortality Survey. Sponsored by the National Birthday Trust Fund, this was designed to examine the social and obstetric factors associated with stillbirth and death in early infancy among the children born in Great Britain in one week in March 1958. This was a ‘follow-up’ to the 1946 study with a similar scope. • BCS70 began as the British Births Survey (BBS), sponsored by the National Birthday Trust Fund in association with the Royal College of Obstetricians and Gynaecologists to follow up the 1958 study. • MCS was specifically designed as a longitudinal survey to follow up on the three previous birth surveys.
Centralised Computing • “If one had coded and tried to use all the information received from the 68 questions it is calculated that the results could have been expressed in a vast number of permutations probably in the region of 10 to the 480th power” Perinatal Mortality (1963) • The tabulations were eventually finalised four years after the data collection. • Things got faster ... “The first batch of coded forms were sent for punching in October 1970 ... 113,994 punch cards there being a minimum of 6 cards per case. The punching was completed in November 1971” • Researchers were reliant on the DP and computer professionals to generate tabulations.
Semi-Centralised Computing • In the mid-1970s, as first SPSS and then other statistical packages became available, researchers had the opportunity to analyse the data themselves on the central computer, using data prepared and marshalled by the DP and computer scientists. • Most users still relied on computer professionals to retrieve and tabulate data.
Personal Computing (c1984) • With a powerful 386 computer on their desk and a copy of SPSS, researchers could take the raw data and manipulate it for their own purposes. • By the mid-1990s this process had accelerated to the point where all the data from a survey could easily be handled on a single machine, and the need for database professionals could be circumvented.
Consequences • Each study became a series of snapshots, one per survey, making it cumbersome and inefficient to manage as a longitudinal resource • Data fragmentation as derivations became disconnected from the original data • Longitudinal linkage discrepancies, e.g. in partnership and fertility histories • Coding frame discrepancies • Responsibility for data security moved from IT to individuals • Metadata was viewed as being separate from data • With the introduction of dependent interviewing these problems would be compounded.
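To make the "coding frame discrepancies" point concrete, here is a minimal sketch of how the same code can drift to different meanings between sweeps. The sweep names, codes and labels are invented for illustration and are not taken from any CLS dataset.

```python
def coding_frame_discrepancies(frames):
    """Return codes whose label differs between sweeps.

    frames: {sweep_name: {code: label}} (hypothetical structure).
    Result: {code: {label: [sweeps using that label]}} for conflicting codes only.
    """
    seen = {}
    for sweep, frame in frames.items():
        for code, label in frame.items():
            seen.setdefault(code, {}).setdefault(label, []).append(sweep)
    # A code is discrepant when it has been given more than one label.
    return {code: labels for code, labels in seen.items() if len(labels) > 1}


# Invented example: a marital status variable coded differently in two sweeps.
frames = {
    "sweep_1": {1: "Single", 2: "Married"},
    "sweep_2": {1: "Married", 2: "Single", 3: "Divorced"},
}
conflicts = coding_frame_discrepancies(frames)
for code in sorted(conflicts):
    print(code, conflicts[code])
```

Codes 1 and 2 are flagged because their labels were swapped between sweeps; code 3 is not, because it appears with only one label.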
Survey Data ‘production line’
Requirements • Migrate and restructure the data back into a database to restore integrity and clean discrepancies • Re-derive variables • Integrate metadata with the data • Create longitudinal checking algorithms • Ability to manipulate data in situ • Log of changes and version control
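As an illustration of two of these requirements, a longitudinal checking algorithm and a log of changes, the sketch below flags cases where a cumulative count falls between sweeps and records every correction. It is not the CLS implementation: the variable (number of children ever born), the case IDs, the sweep numbers and all function names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChangeLog:
    """Append-only record of every correction made to the data."""
    entries: list = field(default_factory=list)

    def record(self, case_id, variable, old, new, reason):
        self.entries.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "case_id": case_id, "variable": variable,
            "old": old, "new": new, "reason": reason,
        })


def check_never_decreases(data):
    """Longitudinal check: a cumulative count must not fall between sweeps.

    data: {case_id: [(sweep_number, value), ...]} (hypothetical layout).
    Returns a list of (case_id, earlier_sweep, earlier_value, later_sweep, later_value).
    """
    problems = []
    for case_id, records in data.items():
        ordered = sorted(records)
        for (s1, v1), (s2, v2) in zip(ordered, ordered[1:]):
            if v2 < v1:
                problems.append((case_id, s1, v1, s2, v2))
    return problems


def apply_correction(data, log, case_id, sweep, new_value, reason):
    """Correct one value in place and log the old and new values."""
    records = data[case_id]
    for i, (s, v) in enumerate(records):
        if s == sweep:
            log.record(case_id, f"children_ever_born_sweep_{sweep}", v, new_value, reason)
            records[i] = (s, new_value)
            return
    raise KeyError(f"no record for case {case_id} at sweep {sweep}")


# Invented data: case A001 reports fewer children ever born at sweep 3 than at sweep 2.
data = {
    "A001": [(1, 0), (2, 2), (3, 1)],
    "A002": [(1, 1), (2, 1), (3, 3)],
}
log = ChangeLog()
problems = check_never_decreases(data)
apply_correction(data, log, "A001", 3, 2, "carried forward from sweep 2 after review")
remaining = check_never_decreases(data)
```

Keeping the check, the correction and the log next to the data is the point of the "in situ" requirement: the correction is made once, centrally, and anyone can see what changed and why.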
Potential database strategies
Staffing and Skills • At CLS we chose to use SIR as our main database and SQL for holding metadata (DDI 2.0 model) • Existing SIR experience • Easy to cross-train from SPSS • Migration of data from SPSS is straightforward • Security is very configurable • Version control and change log are easy to implement • Derivations and manipulations are done in one place • 3 FTE (mix of skills: data management, DBA)
Any questions? Institute of Education University of London 20 Bedford Way London WC1H 0AL Tel +44 (0)20 7612 6000 Fax +44 (0)20 7612 6126 Email info@ioe.ac.uk Web www.ioe.ac.uk