
Chris Dibben University of Edinburgh

Linking historical administrative data. Chris Dibben University of Edinburgh. Context. History of very important contributions: Dutch Famine Birth Cohort Study – epigenetics, thrifty phenotype Överkalix study – epigenetics, sex differences UK Longitudinal Study – health inequalities.


Presentation Transcript


  1. Linking historical administrative data • Chris Dibben, University of Edinburgh

  2. Context • History of very important contributions: • Dutch Famine Birth Cohort Study – epigenetics, thrifty phenotype • Överkalix study – epigenetics, sex differences • UK Longitudinal Study – health inequalities

  3. Two new developmental projects • Scottish Mental Surveys 1932 and 1947 • Scottish civil registration data • New cohorts for people now in old age

  4. The ‘Scottish Mental Survey’

  5. [Diagram: data sources linked for people born in 1936] • Birth 1936 • 1939 register: ED code, address, household members (marital status, occupation) • 1947 Scottish Mental Survey • Education • Employment • The Scottish Longitudinal Study • Scottish morbidity records • Deaths: the 1939 books recorded the date of death (up to 1980); linkage to the death database (1974 onwards)

  6. [Timeline for people born in 1936] • Age: 0, 11, 34, 55, 65, 75 • Year: 1936 (birth), 1947, 1970, 1991, 2001, 2011 • Information along the timeline: early life environment; detailed household/individual information; mental ability; school achievement (time estimated); occupation (estimated); hospitalisation; mortality

  7. Background – Scottish vital events • Civil registration of births, deaths and marriages in Scotland began on 1 January 1855 • All historical vital events records have been converted into digital image format with a supporting index • Modern vital events data (from 1974 onwards) are available electronically

  8. Digitising Scotland • Approximately 50 million occupation strings, 8 million causes of death • Classify occupations to the Historical International Standard Classification of Occupations (HISCO) • Classify causes of death to a modified ICD-10 • Each with a location

  9. Historical Geocoding • [Diagram: geometry features for 1710, 1810, 1910 and 2010 feed a geocoding tool; problem cases shown: postcode change, address without postcode, interpretation error] • Change of road networks (new roads replace old) over time • Change of road names over time • Interpretation errors from the address digitisation
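
One way to picture the per-period geometry features on this slide is a lookup that picks the snapshot closest in time before the record, so that a street name that later changed is still resolved against the names in use at the time. The sketch below is only an illustration under that assumption: the snapshot years, street names, coordinates and the `geocode` function are all hypothetical and are not taken from the project's geocoding tool.

```python
from bisect import bisect_right

# Hypothetical gazetteer snapshots keyed by year: street name -> (x, y).
# All names and coordinates are invented for illustration.
SNAPSHOTS = {
    1810: {"mill wynd": (100.0, 200.0)},
    1910: {"mill street": (100.0, 200.0), "station road": (150.0, 220.0)},
    2010: {"mill street": (100.0, 200.0), "station road": (155.0, 225.0)},
}
YEARS = sorted(SNAPSHOTS)


def normalise(address):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(address.lower().split())


def geocode(address, record_year):
    """Look the address up in the latest snapshot dated on or before the record.

    Returns (snapshot_year, (x, y)), or None when no snapshot applies
    or the street name is not present in that snapshot.
    """
    idx = bisect_right(YEARS, record_year) - 1
    if idx < 0:
        return None  # the record predates every snapshot
    snapshot_year = YEARS[idx]
    coords = SNAPSHOTS[snapshot_year].get(normalise(address))
    if coords is None:
        return None
    return snapshot_year, coords
```

A record from 1861 giving "Mill Wynd" is resolved against the 1810 snapshot, while the later name "Mill Street" is not found for that year. Digitisation errors in the address would need fuzzy matching on top of this, which the sketch leaves out.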

  10. Challenges • Significant methodological issues: • How can we consistently code occupational data so that researchers can explore changing patterns and trends? • How can we automate this process so that the majority of records do not need to be manually coded? digitisingscotland@lscs.ac.uk
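
Whether "the majority of records" can skip manual coding can be checked on a sample that experts have already coded. The sketch below is a minimal illustration, assuming an automatic coder that returns `None` when it abstains; the function name and the two measures (coverage and accuracy on the automatically coded records) are illustrative choices, not the project's stated evaluation method.

```python
def coverage_and_accuracy(predictions, gold):
    """Summarise an automatic coder against expert codes.

    predictions: list of predicted codes, with None where the coder abstained
                 (those records would go to manual coding).
    gold:        list of expert-assigned codes, same length and order.
    Returns (coverage, accuracy), where coverage is the share of records
    coded automatically and accuracy is measured on those records only.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    auto = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    coverage = len(auto) / len(gold) if gold else 0.0
    accuracy = sum(p == g for p, g in auto) / len(auto) if auto else 0.0
    return coverage, accuracy
```

Reporting the two numbers together matters: a coder can reach high accuracy simply by abstaining often, which would leave most records for manual coding.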

  11. Digitising Scotland • Records of births, marriages and deaths registered in Scotland from 1855 to the present day.

  12. Experimental Dataset • Use a dataset with similar content for experiments • 60,000 records from the Cambridge Family History Study (records from 1800-1990) • Occupation descriptions and associated HISCO codes • HISCO coding done by historians • Dataset contains 330 different HISCO codes

  13. HISCO Hierarchy Example
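
The slide image is not preserved in the transcript. As a rough illustration of what a code hierarchy allows, the sketch below assumes that a five-digit HISCO code nests in the ISCO-68 style, with the leading one, two and three digits identifying progressively narrower groups. The function name and the level labels are illustrative assumptions.

```python
def hisco_levels(code):
    """Split a five-digit occupation code into nested group prefixes.

    Assumes ISCO-68-style nesting: 1 digit = major group, 2 digits = minor
    group, 3 digits = unit group, 5 digits = occupation. Codes stored as
    integers are zero-padded on the left.
    """
    code = str(code).zfill(5)
    if len(code) != 5 or not code.isdigit():
        raise ValueError(f"expected a five-digit code, got {code!r}")
    return {
        "major": code[0],
        "minor": code[:2],
        "unit": code[:3],
        "occupation": code,
    }
```

Under this assumption, trends can be compared at a coarser level (for example by `minor` group) even when the exact five-digit code is uncertain.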

  14. Classification Example

  15. Classification Example

  16. Approach • Text analysis • Supervised machine learning • Apache Mahout framework • Combination of these techniques
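
A combination of techniques could look like the sketch below: clean the occupation string, try an exact lookup against already-coded descriptions, and only then hand over to a trained model or to manual coding. This is an assumption about how the pieces might fit together, written in plain Python rather than Apache Mahout; `normalise`, `build_lookup`, `classify` and the `fallback` parameter are hypothetical names.

```python
import re


def normalise(text):
    """Lower-case, replace anything that is not a letter or space, collapse spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return " ".join(text.split())


def build_lookup(coded_examples):
    """Map each normalised description to the code an expert assigned."""
    return {normalise(desc): code for desc, code in coded_examples}


def classify(description, lookup, fallback=None):
    """Return (code, route): route is 'exact', 'model' or 'manual'.

    fallback is an optional callable (for example a trained classifier)
    that takes the normalised string and returns a code or None.
    """
    key = normalise(description)
    if key in lookup:
        return lookup[key], "exact"
    if fallback is not None:
        code = fallback(key)
        if code is not None:
            return code, "model"
    return None, "manual"
```

With a lookup built from coded pairs such as ("Shoe maker", "80110"), a variant like "SHOE-maker." is resolved by the exact route, and a description that neither route can code is flagged for manual coding.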

  17. Supervised Machine Learning • Training: Training Data → Machine Learning → Prediction Model • Prediction: Unseen Data → Prediction Model → Predicted Classification

  18. Supervised Machine Learning • Training: Training Data → Machine Learning → Prediction Model • Training data examples: Farm horseman 62460; Shoe maker 80110; Fireman 58100; Stationer 41000 • Prediction: Unseen Data → Prediction Model → Predicted Classification

  19. Supervised Machine Learning • Training: Training Data → Machine Learning → Prediction Model • Training data examples: Farm horseman 62460; Shoe maker 80110; Fireman 58100; Stationer 41000 • Prediction: Unseen Data → Prediction Model → Predicted Classification • Unseen data examples: Farm horseman; Boot maker; Fireman; Painter

  20. Supervised Machine Learning • Training: Training Data → Machine Learning → Prediction Model • Training data examples: Farm horseman 62460; Shoe maker 80110; Fireman 58100; Stationer 41000 • Prediction: Unseen Data → Prediction Model → Predicted Classification • Unseen data examples: Farm horseman; Boot maker; Fireman; Painter • Predicted classification: ?
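
The training and prediction flow on these slides can be sketched with a very small classifier. The slides do not say which algorithm was used, so the example below swaps in a word-level multinomial naive Bayes with add-one smoothing, written in plain Python rather than Apache Mahout. The class name `TinyNaiveBayes` is hypothetical; the four training pairs are the ones shown on the slide.

```python
import math
from collections import Counter, defaultdict

# Training pairs as shown on the slide: (occupation description, HISCO code).
TRAINING = [
    ("Farm horseman", "62460"),
    ("Shoe maker", "80110"),
    ("Fireman", "58100"),
    ("Stationer", "41000"),
]


class TinyNaiveBayes:
    """Word-level multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.token_counts = defaultdict(Counter)  # label -> word counts
        self.class_counts = Counter()             # label -> number of examples
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the most probable label, or None if no word was seen in training."""
        tokens = [t for t in text.lower().split() if t in self.vocab]
        if not tokens:
            return None  # nothing to go on: leave for manual coding
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_examples in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_examples / total)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = TinyNaiveBayes().fit(TRAINING)
```

With this toy model, "Boot maker" is assigned 80110 because it shares the word "maker" with "Shoe maker", while "Painter" shares no word with the training data and is returned as `None`, which corresponds to the "?" on the slide. A real system would need far more training data and a measure of confidence before accepting a prediction.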

  21. [Figure residue: cause-of-death terms around "asthma"] • Labels: asthma 100%; Miners asthma 100% • Other terms: Asthma, bronchial, dropsy, miner's, spasmodic, miners, collier's

  22. Creation of a fully-linked vital events database for the whole of Scotland back to 1855 • Vital events: 24 million births, deaths and marriages • 1855 to 1974: Digital Images + Index → Vital Events Database • 1974 to present: Vital Events Database • Combined: Fully-linked Vital Events Database
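
The slides do not describe the linkage method. As a minimal sketch of what linking one vital event to another can involve, the example below links a death record to a candidate birth record by first restricting to births near the implied birth year (a simple form of blocking) and then comparing names with a string similarity from Python's standard `difflib`. The record layout, field names, thresholds and the `link_death_to_birth` function are all hypothetical.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Similarity in [0, 1] between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_death_to_birth(death, births, year_tolerance=1, min_similarity=0.85):
    """Return the id of the best-matching birth record, or None.

    death:  dict with 'name', 'year' (of death) and 'age' (at death).
    births: list of dicts with 'id', 'name' and 'year' (of birth).
    Only births within year_tolerance of the implied birth year are compared.
    """
    implied_birth_year = death["year"] - death["age"]
    best, best_score = None, min_similarity
    for birth in births:
        if abs(birth["year"] - implied_birth_year) > year_tolerance:
            continue  # blocked out: birth year is implausible
        score = name_similarity(birth["name"], death["name"])
        if score >= best_score:
            best, best_score = birth, score
    return best["id"] if best is not None else None
```

Restricting candidates by year keeps the number of name comparisons manageable at the scale of millions of records, and the similarity threshold tolerates spelling variants such as "Margret Macdonald" for "Margaret McDonald". Real linkage would normally use more fields (parents' names, places) and a probabilistic score.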

  23. Large-scale family reconstruction studies and pedigrees

  24. Gottfredsson, Magnús, et al. "Lessons from the past: familial aggregation analysis of fatal pandemic influenza (Spanish flu) in Iceland in 1918." Proceedings of the National Academy of Sciences 105.4 (2008): 1303-1308.

  25. Acknowledgments • The Digitising Scotland project is funded by the ESRC • Support from National Records of Scotland is also gratefully acknowledged
