Using Intelligent Systems to infer ethnicity from names Implications for ethnic analysis of research surveys and administrative datasets Richard Webber Visiting Professor, Department of Geography, UCL / OriginsInfo Oxford, 18 July 2006
Name and address Daniele Ceccomori 16a Broadlands Road London N6 4AN
Intelligent Inferences
Daniele Ceccomori 16a Broadlands Road London N6 4AN
• Italy
• Central Italy
• Young
• Male
• 1880 - 1939
• Subdivided house
• Global Connections
Monica / Stage Age Daniele Ceccomori 16a Broadlands Road London N6 4AN
Mosaic / Acorn Daniele Ceccomori 16a Broadlands Road London N6 4AN Global Connections
Nam Pehchan / Sangra / Origins Italy Central Italy Daniele Ceccomori 16a Broadlands Road London N6 4AN
‘Ethnic’ coding systems
• Typically based on the surname element of the name
• Typically targeting a single or regional group
  • South Asians (UK)
  • Hispanics (US)
  • East Asians (US)
• Calibrated against a reference database
• Optimised for a specific geographical area
• May be delivered using a bespoke software application
‘Onomastic’ coding systems
• Categories based on common naming practices
• Based on interconnection between personal and family names
• May or may not correspond to ethnic, linguistic, religious or cultural categorisations
• No external validation
• Global coverage and territorial independence
Example of Onomastic analysis : Computer Studies graduates, City University, 2006
‘Origins’
• Onomastic coding system
• Base files
  • UK / Ireland / Australia
  • France
  • Spain
  • Italy
  • Netherlands
  • Romania
  • Norway
• Number of unique names
  • Family names 620,000
  • Personal names 200,000
• Number of categories
  • 195 onomastic types
  • 13 onomastic groups
Key Origins Groups
European
• English
• Celtic
• Western European
• Hispanic
• Nordic
• Eastern European
• Jewish and Armenian
Rest of the World
• African
• Muslim
• South Asian
• Sikh
• East Asian
• Japanese
General Strategy • Access universal file • Personal name • Family name • Geodemographic cluster • Regional division • Rank names by frequency • Names of known origin (eg Peter, John, Patel, Rees) • Names of unknown or unclear origin
General Strategy • Access universal file • Personal name • Family name • Geodemographic cluster • Regional division • Rank names by frequency • Names of known origin (eg Peter, John, Smith, Rees) • Names of unknown or unclear origins • Unknown or unclear names • Triage • Text mining • Geodemographic profiling • Regional analysis
Triage example : Ourania
• UK total : 88
  • 11.4% have British family name
  • 27.3% have Greek Orthodox family name
  • 61.4% have family names that have not been classified
• Greek Orthodox total : 24
  • 62.5% have Greek family names
  • 37.5% have Greek Cypriot family names
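The triage idea in the Ourania example can be sketched in a few lines of Python. This is a minimal illustration only, not the Origins implementation: the function name `triage_personal_name`, the 50% threshold, and the rule of picking the most common classified family-name type are all assumptions. The counts 10, 24 and 54 are back-calculated from the slide's percentages of 88 bearers.

```python
from collections import Counter

def triage_personal_name(family_types, min_share=0.5):
    """Suggest a type for a personal name from its bearers' family-name types.

    family_types: one entry per bearer; None means the family name is unclassified.
    min_share: hypothetical threshold applied to classified bearers only.
    Returns (suggested_type_or_None, share_of_each_type_among_all_bearers).
    """
    counts = Counter(family_types)
    total = sum(counts.values())
    shares = {t: n / total for t, n in counts.items()}

    # Only bearers with an already classified family name can vote.
    classified = {t: n for t, n in counts.items() if t is not None}
    if not classified:
        return None, shares

    best_type, best_n = max(classified.items(), key=lambda kv: kv[1])
    if best_n / sum(classified.values()) >= min_share:
        return best_type, shares
    return None, shares

# Bearers of "Ourania" in the UK file (counts reconstructed from the slide).
bearers = ["British"] * 10 + ["Greek Orthodox"] * 24 + [None] * 54
suggested, shares = triage_personal_name(bearers)

for family_type, share in shares.items():
    print(family_type, f"{share:.1%}")
print("Suggested type:", suggested)
```

Under these assumptions, 24 of the 34 bearers with a classified family name fall in the Greek Orthodox type, so the sketch suggests that type for the personal name. A real system would presumably weigh further evidence before accepting it.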
Netherlands : names exclusive to multi-cultural geodemographic clusters
General Strategy
• Access universal file
  • Personal name
  • Family name
  • Geodemographic cluster
  • Regional division
• Rank names by frequency
  • Names of known origin (e.g. Peter, John, Smith, Rees)
  • Names of unknown or unclear origin
• Unknown or unclear names
  • Triage
  • Text mining
  • Geodemographic profiling
  • Regional analysis
• Extend to other countries
  • Compare frequencies between countries
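Two of the steps in this strategy, ranking names by frequency and comparing frequencies between countries, can be illustrated with a short Python sketch. Everything here is hypothetical: the helper names `rank_names` and `frequency_ratio`, the tiny name lists, and the per-country counts are toy values chosen for illustration, not figures from the Origins reference files.

```python
from collections import Counter

def rank_names(names, known_names):
    """Rank names by frequency, then split them into known and unknown origin."""
    ranked = Counter(names).most_common()  # most frequent first
    known = [(n, c) for n, c in ranked if n in known_names]
    unknown = [(n, c) for n, c in ranked if n not in known_names]
    return known, unknown

def frequency_ratio(name, counts_a, counts_b):
    """Share of `name` in country A divided by its share in country B."""
    share_a = counts_a.get(name, 0) / sum(counts_a.values())
    share_b = counts_b.get(name, 0) / sum(counts_b.values())
    return float("inf") if share_b == 0 else share_a / share_b

# Toy family-name file; "Smith" and "Rees" are the slide's known-origin examples.
file_names = ["Smith"] * 3 + ["Ceccomori"] * 2 + ["Rees"]
known, unknown = rank_names(file_names, known_names={"Smith", "Rees"})
print("Known origin:", known)
print("Needs triage:", unknown)

# Toy counts for two countries (made-up numbers).
country_a = {"Ceccomori": 1, "Smith": 3}
country_b = {"Ceccomori": 1, "Smith": 1}
print("Relative frequency A/B:", frequency_ratio("Ceccomori", country_a, country_b))
```

A ratio well below 1 would indicate that a name is relatively more common in country B than in country A, which is one simple way to read a cross-country frequency comparison as a hint about a name's origin.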
Section of current family name reference file • Current family name total 562,558
Validation : Comparing census with name based classifications
Research Questions • The size of different minority groups • Their regional dispersion • Their degree of residential integration • Their success in improving their social position • The growth or decline in their numbers
US : Destinations of migrants from Cornwall (above) and Devon (below)
West London postcodes where Hindus and Sikhs are the majority community