Matching PLASC and ALSPAC • PLASC/NPD User Group Workshop, 13th September 2006 • Andy Boyd (a.w.boyd@bristol.ac.uk) • David Herrick (david.herrick@bristol.ac.uk)
What is ALSPAC? • “Avon Longitudinal Study of Parents and Children” • Cohort study of children and their parents, based in south-west England • Designed to determine ways in which the individual’s genotype combines with environmental pressures to influence health and development
Study design • Eligibility criteria: Mothers had to be resident in Avon and have an expected date of delivery between 1st April 1991 and 31st December 1992 • Avon was broadly representative of the UK as a whole and has a relatively stable population • Enrolled sample of 14,541 pregnancies resulting in 14,062 live born children
Data • Self Completion Questionnaires • Hands on Measurements • Biological Samples • Health Records • Education Records • Direct School Contact
Educational Data - Primary • Contact with ~350 primary schools in the four local LEAs: • Bristol • South Gloucestershire • North Somerset • Bath and North East Somerset • Private & special schools included • Parental contact for out of area cases
Educational Data - Primary • Questionnaires in Year 3 & Year 6: • School (Head teacher) • Class (Class teacher) • Child (Class teacher) • Year 4 test: Maths • Year 6 tests: Maths, Spelling, Science
Educational Data - Secondary • Questionnaire for maths teachers in 2002/3 (Year 7) & 2004/5 (Years 7, 8 & 9) and associated class lists • Year 6 maths test repeated in Year 8 • Moving away from direct school contact
Educational Data - SATs • Entry Assessment & KS1 data on eligible children at local schools acquired directly from the LEAs • Linkage to NPD: • Increased coverage • Easier linking (UPN) • PLASC as well
Study Approval & Cohort Matching • Ethics & study approval • The Fischer Trust • Validating the cohort match • Anonymizing the data set • Issues encountered
Ethics & Study Approval • ALSPAC Ethics & Law committee • LREC (NHS research ethics committee) • ‘Eligible’ vs. ‘Enrolled’ cohort • Final research file to be anonymous • DfES commissioned a third party, The Fischer Trust, to conduct the cohort/data match
The Fischer Trust • An intermediary between ALSPAC and the DfES • FT received both ALSPAC and NPD datasets and conducted the cohort match. • FT created its own ID (however, we were also provided with UPN)
Cohort match variables • Details for 20,551 children provided: • Child Surname • Child Forename • Child Date of Birth • Home Postcode • School Indicator (name & address) from ALSPAC schools data collection
Validating the cohort match • To satisfy our methodology and study requirements, we wanted to reverse-check the match • FT matched 86% of the cases provided (17,671 cases) • Very few errors found (<0.5%)
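The slides do not say how the reverse check was carried out. As a minimal sketch of one way to do it, assuming the matched file comes back keyed on a case ID and still carries the match variables, the code below compares each returned record with what was sent and computes the match rate. The field names (`surname`, `dob`) and the helper names are hypothetical.

```python
from datetime import date


def _norm(value):
    """Normalise strings for comparison; leave other types unchanged."""
    return value.strip().casefold() if isinstance(value, str) else value


def reverse_check(sent, returned, fields=("surname", "dob")):
    """Return the case IDs whose returned record disagrees with what was sent."""
    mismatches = []
    for case_id, rec in returned.items():
        original = sent[case_id]
        if any(_norm(original[f]) != _norm(rec[f]) for f in fields):
            mismatches.append(case_id)
    return mismatches


def match_rate(matched, provided):
    """Percentage of provided cases that were matched."""
    if provided <= 0:
        raise ValueError("provided must be positive")
    return 100.0 * matched / provided


# Toy data, invented for illustration only.
sent = {
    "c1": {"surname": "Smith", "dob": date(1991, 6, 15)},
    "c2": {"surname": "Jones", "dob": date(1992, 2, 1)},
}
returned = {
    "c1": {"surname": "SMITH ", "dob": date(1991, 6, 15)},
    "c2": {"surname": "Jones", "dob": date(1992, 2, 11)},  # DoB disagrees
}

suspect = reverse_check(sent, returned)
rate = match_rate(17671, 20551)  # figures quoted on the slide
```

Here `suspect` lists the cases to inspect by hand, and `rate` rounds to the 86% quoted on the slide.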
Problems with the match variables • Child Surname (change over time) • Child Forename (familiar names) • Child Date of Birth • Home Postcode (out of date and lost cases) • School Indicator (name & address) from ALSPAC schools data collection (depended on school participation and out of date information)
Anonymizing the data set • UPN transferred to new internal ID and then to new collaborator ID • Personal variables dropped (DoB, names, postcode, age at census) • Identifying variables dropped (care authority) • Variables recoded (ethnicity, SEN) • LEA & Estab IDs recoded into our own unique ALSPSCHL_ID
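The two-stage ID transfer and the dropping of personal variables could look roughly like the sketch below. It is an illustration only, not the procedure ALSPAC used: the field names, the `collab_id` key and the use of random hex tokens are all assumptions, and the recoding of ethnicity, SEN and school IDs is left out.

```python
import secrets

# Hypothetical field names for the personal and identifying variables.
DROPPED_FIELDS = {"upn", "dob", "surname", "forename", "postcode",
                  "age_at_census", "care_authority"}


def build_id_map(keys):
    """Give each distinct key a random identifier that cannot be derived from it."""
    id_map, used = {}, set()
    for key in sorted(set(keys)):
        new_id = secrets.token_hex(4)
        while new_id in used:  # re-draw on the rare collision
            new_id = secrets.token_hex(4)
        used.add(new_id)
        id_map[key] = new_id
    return id_map


def anonymise(records):
    """UPN -> internal ID -> collaborator ID, then drop personal variables."""
    upn_to_internal = build_id_map(r["upn"] for r in records)
    internal_to_collab = build_id_map(upn_to_internal.values())
    cleaned = []
    for r in records:
        out = {k: v for k, v in r.items() if k not in DROPPED_FIELDS}
        out["collab_id"] = internal_to_collab[upn_to_internal[r["upn"]]]
        cleaned.append(out)
    return cleaned


# Toy records, invented for illustration only.
records = [
    {"upn": "A1", "year": 2002, "dob": "1991-06-15", "postcode": "BS1", "sen": "N"},
    {"upn": "A1", "year": 2003, "dob": "1991-06-15", "postcode": "BS1", "sen": "N"},
    {"upn": "B2", "year": 2002, "dob": "1992-02-01", "postcode": "BS2", "sen": "S"},
]
released = anonymise(records)
```

Keeping two separate lookup tables means a collaborator ID can be re-issued for a new data release without touching the internal ID.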
Issues encountered • Cases not covered by NPD • REE – not including old schools • Primary to junior succession • Children who resit years or are in a non natural school year • Historical records of school movement
Issues - UPN • We discovered that the U in UPN isn’t that unique! • 215 ALSPAC cases have multiple UPNs (with no clear pattern as to why) • PLASC 2004 has two ALSPAC children with the same UPN
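Both kinds of UPN problem can be found with a simple grouping pass. The sketch below assumes the linked data can be reduced to (case ID, census year, UPN) rows; the row layout and the function name are hypothetical.

```python
from collections import defaultdict


def upn_anomalies(rows):
    """Find cases with several UPNs, and UPNs shared by several cases in one year.

    rows: iterable of (case_id, year, upn) tuples.
    """
    upns_by_case = defaultdict(set)
    cases_by_year_upn = defaultdict(set)
    for case_id, year, upn in rows:
        upns_by_case[case_id].add(upn)
        cases_by_year_upn[(year, upn)].add(case_id)
    multi_upn_cases = {c: u for c, u in upns_by_case.items() if len(u) > 1}
    shared_upns = {k: c for k, c in cases_by_year_upn.items() if len(c) > 1}
    return multi_upn_cases, shared_upns


# Toy rows, invented for illustration only.
rows = [
    ("c1", 2002, "U100"),
    ("c1", 2003, "U101"),  # same child, different UPN
    ("c2", 2004, "U200"),
    ("c3", 2004, "U200"),  # two children, same UPN in one census
    ("c4", 2004, "U300"),
]
multi_upn_cases, shared_upns = upn_anomalies(rows)
```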
Sample • At least 1 PLASC return identified for 11,997 (85%) of the 14,062 enrolled live births: • 2002 - 11,850 (84%) • 2003 - 11,731 (83%) • 2004 - 11,473 (82%) • Balance: • Private schools • Home educated • Outside England • Not identified
Editing (1) • Convert string variables to numeric, label and sort missing values and write documentation. • Calculate age at census. • From date of entry derive age on starting at current school and length of time at current school. • Derive expected NCYG (National Curriculum Year Group).
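The age and year-group derivations can be sketched as follows. The rule used for the expected NCYG is an assumption stated here rather than taken from the slides: the school year starts on 1 September, and a child who is 5 on the preceding 31 August is in Year 1. The census date in the example is also illustrative.

```python
from datetime import date


def age_in_years(dob, on):
    """Completed years of age on the given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def expected_ncyg(dob, census):
    """Expected National Curriculum Year Group on the census date.

    Assumes the school year starts on 1 September and that a child aged 5
    on the preceding 31 August is in Year 1.
    """
    start_year = census.year if census.month >= 9 else census.year - 1
    return age_in_years(dob, date(start_year, 8, 31)) - 4


census = date(2003, 1, 16)  # illustrative census date
dob = date(1991, 6, 15)
entry = date(2002, 9, 4)    # illustrative date of entry to current school

age_at_census = age_in_years(dob, census)
age_on_entry = age_in_years(dob, entry)
ncyg = expected_ncyg(dob, census)
```

With these inputs the child is 11 at the census and expected to be in Year 7. A child born on 1 September 1991 would be expected in Year 6 at the same census.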
Editing (2) • Ethnicity: 39 cases had new ethnicity codes in 2002 – these were mapped back to old codes and an equivalent to main category derived. Also derive white/non-white indicators. • Care: In 2003, 17 of the 34 cases marked as currently in care were marked as N for ever in care. This did not occur in 2004.
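The care inconsistency is a cross-field check: a child currently in care should not be marked as never having been in care. A minimal sketch, assuming hypothetical field names `in_care` and `ever_in_care` and a 'Y' code for currently in care (the slide only confirms the 'N' code):

```python
def inconsistent_care(records):
    """Return records marked as currently in care but 'N' for ever in care."""
    return [r for r in records
            if r.get("in_care") == "Y" and r.get("ever_in_care") == "N"]


# Toy records, invented for illustration only.
records = [
    {"id": "c1", "in_care": "Y", "ever_in_care": "Y"},
    {"id": "c2", "in_care": "Y", "ever_in_care": "N"},  # inconsistent
    {"id": "c3", "in_care": "N", "ever_in_care": "N"},
]
flagged = [r["id"] for r in inconsistent_care(records)]
```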
Unanswered questions • 6.6% of children were not in the expected NCYG in 2002 compared with 0.7% in 2003 and 2004. • Large increase in use of code T for ethnicity source between 2003 & 2004, even if restricted to Year 7 only.
Illegal Values (1) • Numeric codes in Boarder field (should be only ‘B’ or ‘N’) – 2 cases in 2002, 7 in 2003 and 13 in 2004. • Code ‘1’ for NCYG in 2003 for a child in secondary school who was expected to be in Year 7 and who was recorded as in Year 6 in 2002 and Year 8 in 2004.
Illegal Values (2) • X in NCYG in 2004 – 2 cases. • A small number of cases are missing important fields like date of entry, NCYG. • 3 cases had the same code for primary and secondary SEN types.
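Checks of this kind can be collected into a single per-record validator. The sketch below covers three of the problems listed: an illegal Boarder code, missing key fields, and identical primary and secondary SEN types. Field names are hypothetical, and the only legal-value set taken from the slides is ‘B’/‘N’ for Boarder.

```python
LEGAL_BOARDER = {"B", "N"}                 # from the slide
REQUIRED_FIELDS = ("date_of_entry", "ncyg")  # hypothetical field names


def validate(record):
    """Return a list of problems found in one PLASC-style record."""
    problems = []
    if record.get("boarder") not in LEGAL_BOARDER:
        problems.append("illegal boarder code")
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    primary = record.get("sen_primary")
    if primary and primary == record.get("sen_secondary"):
        problems.append("duplicate SEN type")
    return problems


# Toy records, invented for illustration only.
good = {"boarder": "N", "date_of_entry": "2002-09-04", "ncyg": "7",
        "sen_primary": "ASD", "sen_secondary": "SLCN"}
bad = {"boarder": "1", "date_of_entry": "", "ncyg": "7",
       "sen_primary": "ASD", "sen_secondary": "ASD"}

good_problems = validate(good)
bad_problems = validate(bad)
```

Implausible but legal codes, such as Year 1 recorded between Year 6 and Year 8, need a comparison across census years and are not caught by a single-record check like this one.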
Uses • Identifying Developmental Impairments: • Investigating the use of early life parental questionnaires to predict later problems. • SEN types used to identify autism, speech/language problems and possible learning difficulties. • Twin approach with medical database searches. • Autism project. • Ethnicity.
Wish List • Detailed documentation describing how different fields relate (especially for SATs). • Numeric fields supplied as numeric rather than string.