Challenges and strategies when exploiting data on ethnicity from social survey datasets. Paul Lambert, University of Stirling
Challenges and strategies when exploiting data on ethnicity from social survey datasets Paul Lambert, University of Stirling Talk presented to the NCRM seminar ‘What is ethnicity? What methods best capture it?’, part of the NCRM series ‘Promoting methodological innovation and capacity building in research on ethnicity’, University of Essex, 14th May 2010. This work draws upon materials from the GEMDE project, a component of DAMES (www.dames.org.uk), an ESRC funded research Node working on ‘Data Management through e-Social Science’
Summary of claims • Well known challenges exploiting survey measures of ethnicity ..our response is usually too conservative.. • Better ‘data management’ could/should allow us to get much more from data • Take account of more precise ethnic differences • Longitudinal/cross-national comparisons • Complex multivariate models, interaction effects • We have something to offer here: ‘GEMDE’
…why is working with ethnicity data in surveys so hard…? - It’s sparse - It’s collinear (e.g. to age) - It’s dynamic (cf. comparative research)
Data includes: • Generic & specialist studies collecting ethnic ‘referents’ ‘ethnic identity’; nationality, parents’ nationality; country of birth; language spoken; religion; ‘race’ • National research: • Most countries have evolving standard definitions of ethnic groups, though not all surveys follow them • Some surveys cover large numbers from many/all groups • Most surveys only have sparse representation of most groups • Comparative research (international/longitudinal) : • Seen as highly problematic in many fields except immigration studies • Lambert, P.S. (2005). Ethnicity and the Comparative Analysis of Contemporary Survey Data. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 259-277). Mannheim: ZUMA-Nachrichten Spezial 11.
He said that ‘our response is usually too conservative’? I’m not conservative! Social theory is dynamic, fluid, ‘intersectional’, but representative empirical analysis struggles to engage with its terms Empirical studies are bivariate; descriptive; use low numbers of groups & normalising assumptions This is ‘conservative’ because.. • Administrative pressure to reify descriptive groups • Analyses simplify, or ignore, rather than incorporate, extra information on ethnic locations (e.g. language, religion) • Analytical results tend to be easily anticipated (basic descriptions, ignoring complex collinear contexts)
2) Data management for categorical data • Principal social survey datum • Basis of most social research reports/analyses/comparisons • It’s rich and complex • We’re often interested in very fine levels of detail / difference • We usually recode categories in some way for analysis • …how categorical data is managed is of great consequence to the results of analysis… Choices about recoding, boundaries, contrasts made [e.g. RAE analysis: Lambert & Gayle 2009] Management itself influences analytical approaches
UK EFFNATIS survey (1999) [Heckmann et al 2001]; [Penn & Lambert 2009]
A ‘data management’ contribution? • Preserve information on what was done with categorical data • Communicate information on what should/could be done
Standardizing categorical data • ‘Standardization’ refers to treating variables for the purposes of analysis, in order to aid comparison between variables • {In the terminology of survey research analysts} 1. Arithmetic standardization to re-scale metric values [$z_i = (x_i - \bar{x}) / sd$] 2. Ex-ante or Ex-post harmonisation [during data production, or adaptation after the event] 3. Measurement or Meaning/Functional equivalence [Much comparative research flounders on the apparent impossibility of measurement equivalence and lack of options for functional equivalence, e.g. Van Deth, 2003] ‘One size doesn’t fit all so we can’t go on’
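A minimal sketch of option 1 (arithmetic standardization), added for illustration. It assumes the sample standard deviation is the intended `sd`; the function name `standardize` and the two small data lists are hypothetical and do not come from the talk.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores (x_i - mean) / sd, using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample (n - 1) standard deviation
    return [(x - m) / s for x in values]


# Hypothetical metric values recorded on different scales in two contexts.
context_a = [10.0, 20.0, 30.0]
context_b = [100.0, 200.0, 300.0]

# After standardization each value expresses relative position within its own context.
z_a = standardize(context_a)
z_b = standardize(context_b)
```

Both lists give the same z-scores, which is the sense in which standardized values indicate relative position within a context rather than an absolute level.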
Meaning equivalence • For categorical data, equivalence for comparisons is often best approached in terms of meaning equivalence (because of non-linear relations between categories and shifting underlying distributions) (even if measurement equivalence seems possible) • Arithmetic standardisation offers a convenient form of meaning equivalence by indicating relative position within the structure defined by the current context • For categorical data, this can be achieved/approximated by scaling categories in one or more dimensions of difference
‘Effect proportional scaling’ using parents’ occupational advantage
What was that then? • We can represent categories through positions on a scale • In turn, we can use position in the dimension as a category score which then plugs into a further analysis (e.g. regression main and interaction effects) ..Some options for data on ethnicity.. • Stereotyped Ordered Logistic Regression (SOR) models, summarize dimensions of difference according to regression predictor values [e.g. Lambert and Penn, 2001] • Geometric data analysis for distances between people, or things [cf. Prandy, 1979; Bennett et al., 2009] • Assign category scores by hand (a priori or by selected average)
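As an illustration of the last option (category scores assigned by a selected average), the sketch below scores each category by the mean of a criterion variable among its members. This is one common reading of effect proportional scaling, not necessarily the exact procedure used in the talk. The category labels, the `advantage` values and the function name are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def effect_proportional_scores(categories, criterion):
    """Score each category by the mean of `criterion` among its members."""
    groups = defaultdict(list)
    for cat, y in zip(categories, criterion):
        groups[cat].append(y)
    return {cat: mean(ys) for cat, ys in groups.items()}


# Hypothetical records: a category label and an illustrative advantage score.
group = ["A", "A", "B", "B", "C"]
advantage = [40.0, 50.0, 30.0, 34.0, 60.0]

scores = effect_proportional_scores(group, advantage)

# The category score replaces the categorical variable as a single metric
# predictor that can be plugged into a further analysis.
scaled = [scores[g] for g in group]
```

With sparse categories a mean based on one or two cases is unstable, so in practice the scores would usually be estimated on a larger or pooled dataset.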
Is scaling useful? ..sometimes.. • Intrinsically revealing as an exploratory exercise • Parsimonious functional form in explanatory modelling • Esp. if ethnicity is a control variable • If interaction effects are considered • If a story of a linear functional form is persuasive (e.g. exponential increase)
What we do and what we ought to do Research applications tend to select a single simplifying collinear categorisation of a concept • Due to coordinated instructions [e.g. Blossfeld et al. 2006] • Due to perceived lack of available alternatives • Due to perceived convenience To make statistical analyses more robust we should… • Operationalise and deploy various scalings and arithmetic measures • Try out various categorisations and explore their distributional properties • … and keep a replicable trail of all these activities..
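The recommendation to try out various categorisations and explore their distributional properties can be sketched as follows. The detailed codes, the two alternative groupings and the helper names are hypothetical; the smallest cell count is used here only as one simple indicator of sparsity.

```python
from collections import Counter

# Hypothetical detailed codes for eight respondents.
detailed = ["g1", "g2", "g2", "g3", "g4", "g4", "g4", "g5"]

# Two alternative (illustrative) groupings of the same detailed codes.
recodes = {
    "two_way": {"g1": "X", "g2": "X", "g3": "Y", "g4": "Y", "g5": "Y"},
    "three_way": {"g1": "X", "g2": "X", "g3": "Y", "g4": "Z", "g5": "Y"},
}


def apply_recode(values, mapping):
    """Map each detailed code to its group under one categorisation."""
    return [mapping[v] for v in values]


def summarise(values, mapping):
    """Tabulate a categorisation and report its smallest cell size."""
    counts = Counter(apply_recode(values, mapping))
    return {"counts": dict(counts), "smallest_cell": min(counts.values())}


# One summary per categorisation, kept together so choices can be compared.
report = {name: summarise(detailed, m) for name, m in recodes.items()}
```

Keeping every candidate mapping in one named structure also leaves a trail of which categorisations were tried.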
3) Some contributions from DAMES • 3 themes in DAMES ought, in our perspective, to help here • Replicability / transparency • Plurality of approaches • Ease of access (to off-putting operations)
Replicability / transparency • Document your own recodes • Access somebody else’s recodes • Identify commonly used recodes (& use them..!)
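One way to document a recode so that somebody else can reuse it is to store the mapping as data rather than burying it in analysis syntax. The sketch below is an assumption about how that might look; the field names and the JSON layout are hypothetical and are not the format used by GEMDE or DAMES.

```python
import json

# Hypothetical record describing one recode; all names are illustrative.
recode_record = {
    "source_variable": "ethnic_detailed",
    "target_variable": "ethnic_3cat",
    "mapping": {"g1": "X", "g2": "X", "g3": "Y", "g4": "Z", "g5": "Y"},
    "note": "Illustrative grouping only",
}

REQUIRED_FIELDS = {"source_variable", "target_variable", "mapping"}


def save_recode(record):
    """Serialise a recode record to a shareable JSON string."""
    return json.dumps(record, indent=2, sort_keys=True)


def load_recode(text):
    """Parse a recode record and check that the required fields are present."""
    record = json.loads(text)
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        raise ValueError(f"recode record is missing fields: {sorted(missing)}")
    return record


shared = save_recode(recode_record)
restored = load_recode(shared)
```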
Plurality of approaches • Diminishing excuses for not trying out multiple operationalisations…
Making complex things easier • Organising complex categorical data • Labelling, recoding, etc • Effect proportional scaling • Standardisation • Interaction terms
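A short sketch of why a scaled category score simplifies interaction terms: with k categories, dummy coding needs k - 1 product terms for an interaction with a covariate, whereas a single category score needs one. The values below are hypothetical, and mean-centring both predictors before multiplying is an assumption made here for interpretability, not a step stated in the talk.

```python
def centred_interaction(score, covariate):
    """Return the product of the mean-centred score and covariate."""
    mean_s = sum(score) / len(score)
    mean_c = sum(covariate) / len(covariate)
    return [(s - mean_s) * (c - mean_c) for s, c in zip(score, covariate)]


# Hypothetical category scores (one per respondent) and ages.
score = [45.0, 45.0, 32.0, 60.0]
age = [20.0, 40.0, 30.0, 30.0]

# A single extra column carries the interaction between ethnicity and age.
score_by_age = centred_interaction(score, age)
```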
GESDE: Grid Enabled Specialist Data Environments • Facilities for collecting together, and distributing, specialist data resources • Occupations: GEODE project began 2005 • Education and Ethnicity: GEEDE and GEMDE began Feb. 2008 • Capacity building aims: improving use of measures of these concepts by • improving access to relevant information • providing training / advice on good practice
GEODE: Organising and distributing specialist data resources (on occupations)
The GEODE model for GEMDE? • Occupational Information Resources • Occupational Unit Groups
Our approach to GEMDE • ….A service for MUGs and MIRs… • Define/register ‘Minority Unit Groups’ • Define/register ‘Minority Information Resources’ • Explore data resources and obtain help in approaching analysis of complex, sparse data
What's a MIR? • 'Minority Information Resource'. • This is our own terminology. By a MIR, we mean any piece of information which supplies systematic data on a minority unit group (MUG) classification. We've used this term to be deliberately similar to the phrase 'Occupational Information Resources' that we used on GEODE • E.g. summary statistical data about the categories from and documentation or information • E.g. recodings which have been used in a particular study • Social scientists are not in general aware of the existence of MIRs (cf. wide use of popular Occupational Information Resources). In GEMDE we seek to publicise little-known resources and promote their uptake: We argue that better communication and dissemination of MIRs is in fact an important step towards better scientific practice of replication and standardisation of research. • In our terms, every MIR necessarily links to a MUG (but not every MUG has a MIR).
The GEMDE prototype ‘Liferay portal’ with access to MUGs and MIRs • Current facilities • Shibboleth access • Deposit MUGs/MIRs • Search/browse deposited resources • Feedback on resources (user ratings) • Still to come • Additional guest access • Review live data (e.g. pooled LFS records) • Expert and user quality ratings => …development over 2010...
Data used • Department for Education and Employment. (1997). Family and Working Lives Survey, 1994-1995 [computer file]. Colchester, Essex: UK Data Archive [distributor], SN: 3704. • Heckmann, F., Penn, R. D., & Schnapper, D. (Eds.). (2001). Effectiveness of National Integration Strategies Towards Second Generation Migrant Youth in a Comparative Perspective - EFFNATIS. Bamberg: European Forum for Migration Studies, University of Bamberg. • Inglehart, R. (2000). World Values Surveys and European Values Surveys 1981-4, 1990-3, 1995-7 [Computer file] (Vol. 2000). Ann Arbor, MI: Institute for Social Research [Producer]; Inter-university Consortium for Political and Social Research [Distributor]. • Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester, Essex: UK Data Archive [distributor], SN: 5666. • University of Essex, & Institute for Social and Economic Research. (2009). British Household Panel Survey: Waves 1-17, 1991-2008 [computer file], 5th Edition. Colchester, Essex: UK Data Archive [distributor], March 2009, SN 5151.
References • Blossfeld, H. P., Mills, M., & Bernardi, F. (Eds.). (2006). Globalization, Uncertainty and Men's Careers: An International Comparison. Cheltenham: Edward Elgar. • Bennett, T., Savage, M., Silva, E. B., Warde, A., Gayo-Cal, M., Wright, D., et al. (2009). Culture, Class, Distinction. London: Routledge. • Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008. Stirling: University of Stirling, Technical paper 2008-3 of the Data Management through e-Social Science research Node (www.dames.org.uk) • Lambert, P. S., & Penn, R. D. (2001). SOR models and Ethnicity data in LIS and LES: Country by Country Report. Syracuse University, Syracuse, New York 13244-1020: Luxembourg Income Study Paper No. 260, Maxwell School of Citizenship and Public Affairs. • Penn, R. D., & Lambert, P. S. (2009). Children of International Migrants in Europe: Comparative Perspectives. Basingstoke: Palgrave. • Prandy, K. (1979). Ethnic discrimination in employment and housing. Ethnic and Racial Studies, 2(1), 66-79. • Simpson, L., & Akinwale, B. (2006). Quantifying Stability and Change in Ethnic Group. Manchester: University of Manchester, CCSR Working Paper 2006-05. • van Deth, J. W. (2003). Using Published Survey Data. In J. A. Harkness, F. J. R. van de Vijver & P. P. Mohler (Eds.), Cross-Cultural Survey Methods (pp. 329-346). New York: Wiley.