Proposal of a collaboration to improve the ethnicity classification of patient registers
Pablo Mateos, UCL - CASA
25th May 2005
Contents
• Aims of proposal
• Mutual Benefits & Justification
• Members
• Data Sharing
• Data Protection
• Intellectual Property
• Project Name
1- Aims
The purposes of the group are threefold:
• To facilitate access to the Names-to-CEL directory developed by CASA
• To develop and share knowledge relating to:
  - Effective use of the Names-to-CEL directory in public health
  - Data mining of birthplace information in the ‘Exeter’ register
• To improve the quality and accuracy of the directory by contributing anonymised data from operational files
2- Mutual Benefits
Benefits for the model
• Wider population base per surname
• More ethnic groups better represented
• Better forename or surname matches
• More extensive birthplace name alias tables
Benefits for the PCTs
• Birthplace information correctly classified
• Ethnic group classification provided
• Richer ethnic classification:
  - Beyond 16+
  - At individual level
• Know-how already built
London ‘non-16+ ethnic groups’ (1.2 million people stated ‘other’ ethnic identities in London 2001 Census) (.../...) Source: 2001 Census GLA commissioned tables
3- Members
• Primarily aimed at PCTs and health institutions working on improving ethnicity classification
• Open to any institution interested in benefiting from and contributing to the ethnicity classification model
• Pre-existing ‘operational names data’ at individual level must exist within each member
4- Data Sharing
• UCL will distribute to members each update of the Name-to-CEL directories:
  - Surname-to-CEL
  - Forename-to-CEL
• Members will provide 2 separate files:
  - Surname-Birthplace aggregation
  - Forename-Birthplace aggregation
• There will be no way to link these two files together
• Only 1 common version of the Name-to-CEL directory will be maintained
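A minimal sketch of how a member might build the two separate files from individual-level records. The field names (`surname`, `forename`, `birthplace`), the upper-casing step, and the sample records are assumptions made for illustration, not part of the proposal. Each output holds only (name, birthplace) counts, so the two files share no key that could link them back together.

```python
from collections import Counter


def aggregate_name_birthplace(records, name_field):
    """Count persons per (name, birthplace) pair, dropping every other field."""
    counts = Counter()
    for rec in records:
        # Normalising case and whitespace is an assumption, not a stated rule.
        name = rec[name_field].strip().upper()
        birthplace = rec["birthplace"].strip().upper()
        counts[(name, birthplace)] += 1
    return counts


# Hypothetical operational records, invented for illustration only.
records = [
    {"surname": "Patel", "forename": "Anita", "birthplace": "India"},
    {"surname": "Patel", "forename": "Ravi", "birthplace": "India"},
    {"surname": "Nowak", "forename": "Anita", "birthplace": "Poland"},
]

# Two independent aggregations: neither contains the other name part.
surname_file = aggregate_name_birthplace(records, "surname")
forename_file = aggregate_name_birthplace(records, "forename")
```

Because each aggregation is computed separately and keeps only one name part, a surname row cannot be matched to a forename row from the files alone.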
Proposed Data Flow (1)
Example provided here for surnames. An exact parallel process applies for first names.
Input & Processing Module (highly restricted access):
• PCTs
• BirthPlace Geocoder
• Surnames
• Records aggregated by surname
• UCL Surname-to-CEL
• Check > threshold:
  - N: leave surname until more records arrive
  - Y: output surname (to the Output Module)
Current threshold = over 10 persons / surname
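The threshold check in this flow can be sketched as follows. This is an illustrative sketch only: the class name `SurnameGate`, its method, and the sample counts are hypothetical. The strict comparison follows the "over 10 persons" wording; if the threshold is instead read as a minimum of 10, the comparison would be `>=`.

```python
from collections import Counter

THRESHOLD = 10  # from the slide: "over 10 persons / surname"


class SurnameGate:
    """Hold surnames back until enough persons have been accumulated."""

    def __init__(self, threshold=THRESHOLD):
        self.threshold = threshold
        self.totals = Counter()  # persons seen so far per surname

    def add_batch(self, surname_counts):
        """Add one batch of aggregated counts; return surnames now releasable."""
        self.totals.update(surname_counts)
        # Assumption: "over 10" is read as strictly greater than the threshold.
        return sorted(s for s, n in self.totals.items() if n > self.threshold)


gate = SurnameGate()
first = gate.add_batch({"PATEL": 8, "NOWAK": 3})  # nothing exceeds 10 yet
second = gate.add_batch({"PATEL": 5})             # PATEL now totals 13
```

Surnames below the threshold stay in `totals` and are re-checked whenever a later batch arrives, which mirrors the "leave surname until more records arrive" branch.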
Proposed Data Flow (2)
[Flow diagram: Input & Processing Module to Output Module. Decision nodes: CEL=COB and CEL=Group of COBs (branches labelled Y / N). Other nodes: Visual Inspection, Manual Review, Surname-to-CEL Assigned, Updates Distributed, Surname-to-CEL Directory.]
5- Data Protection
• There will be no way to link the 2 files together (surname or forename)
• Records in the files will identify aggregations of either a surname or a forename, not individuals
• A minimum threshold of 10 persons per name will be applied to process & release the name to the output module
• A detailed data sharing framework document is being developed to be signed by members
6- Intellectual Property
• Intellectual Property of the Names-to-CEL directory is held by University College London
• Access to this directory, and to the methods and tools developed in the project, will be granted free of charge to contributing members
• A fee will be charged to non-contributing members, as per future arrangements
• Contributing members are those who provide data to improve the Name-to-CEL allocation
7- Project Name
GEONom: Geographic & Ethnic Origin of Names
www.casa.ucl.ac.uk/geonom
8- Open Discussion
8.1. Data Sharing and Data Protection
8.2. Methodology
8.3. Applications