Handling data on occupations, educational qualifications, and ethnicity
Paul Lambert & Vernon Gayle, Univ. Stirling
Talk to the workshop ‘Resources for Data Management and Handling Social Science Data’, ESRC Research Methods Festival, Oxford, 1 July 2008
NCRM, Session 27, 1 July 2008
Handling variables
• DAMES project (www.dames.org.uk): specialist data services on three major social science topics (occupations, education, ethnicity)
• ‘GE*DE’: ‘Grid Enabled Specialist Data Environments’
• From: www.geode.stir.ac.uk
Handling social science variables: general themes
• Common vs. best practice
• Recording the derivation/variable construction process
• Reviewing alternative measures
• Comparability (between contexts: countries, times)
• Input or output harmonisation?
• Measurement or functional equivalence?
• See esp. ‘Variable constructions in longitudinal research’, http://www.longitudinal.stir.ac.uk/variables/
• Existing standards of National Statistics Institutes and international bodies (during data collection)
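Recording the derivation process can be as simple as logging every recode alongside the data. The sketch below uses a hypothetical helper (`RecodeLog`) and hypothetical variable names (`educ_raw`, `edu3`). It is one possible way of keeping such a record, not a DAMES or GEODE tool. It also deliberately turns unmapped codes into missing values so that any gaps in the recode stay visible.

```python
from dataclasses import dataclass, field


@dataclass
class RecodeLog:
    """Hypothetical helper that keeps a plain-text record of each derivation step."""
    entries: list = field(default_factory=list)

    def recode(self, values, mapping, source, target, note=""):
        # Codes absent from the mapping become None instead of being kept
        # silently, so gaps in the recode stay visible.
        out = [mapping.get(v) for v in values]
        unmapped = sorted({v for v in values if v not in mapping})
        self.entries.append(
            f"{target} <- {source}: {len(mapping)} codes mapped; "
            f"unmapped={unmapped}; {note}"
        )
        return out


log = RecodeLog()
edu3 = log.recode(
    [1, 2, 2, 5, 9],
    {1: "low", 2: "low", 3: "mid", 4: "mid", 5: "high"},
    source="educ_raw", target="edu3", note="9 = refused",
)
print(edu3)            # ['low', 'low', 'low', 'high', None]
print(log.entries[0])  # edu3 <- educ_raw: 5 codes mapped; unmapped=[9]; 9 = refused
```

The log entries can be written out with the derived file, so a later reader can see how each variable was constructed.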
Handling variables: general themes, ctd.
• The unit of analysis
• Individual, spouse, household, etc.
• Current time; career summary, etc.
• Concepts and measures
• Variety of academic preferences
• NSI standard measures
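Changing the unit of analysis from the individual to the household requires an explicit aggregation rule. The sketch below makes the following assumptions: ranks are coded so that a lower number means a more advantaged position, and each household takes its most advantaged member's value. That rule is only one of several possible conventions. The records are hypothetical.

```python
def household_measure(records):
    """Give each household the rank of its most advantaged member.

    Lower rank = more advantaged (an assumption of this sketch). Members with
    a missing rank (e.g. not in work) do not set the household value; a
    household with no ranked member is absent from the result.
    """
    best = {}
    for household, _person, rank in records:
        if rank is None:
            continue
        if household not in best or rank < best[household]:
            best[household] = rank
    return best


# Hypothetical individual-level records: (household id, person id, rank)
people = [
    ("h1", "p1", 3), ("h1", "p2", 1),
    ("h2", "p3", 4),
    ("h3", "p4", None), ("h3", "p5", 2),
]
print(household_measure(people))  # {'h1': 1, 'h2': 4, 'h3': 2}
```

Whichever rule is chosen, it should be recorded, because a different rule (e.g. using the designated household reference person) can give different results.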
Key variables: concepts and measures
Key variables: comments & speculation (from www.longitudinal.stir.ac.uk/variables/Coefficients.html)
a) Data manipulation skills and inertia
• I would speculate that around 80% of applications using key variables don’t consult the literature or evaluate alternative measures, but choose the first convenient and/or accessible variable in the dataset
• Data supply decisions (‘what is on the archive version’) are critical
• Much of the explanation lies in a lack of confidence in data manipulation / linking data
• Too many resources are under-used: cf. www.esds.ac.uk
b) Software and key variables: a personal view
• Stata is the superior package for secondary survey data analysis:
• Advanced data management and data analysis functionality
• Supports easy evaluation of alternative measures (e.g. est store)
• Culture of transparency of programming/data manipulation
• Problems with Stata
• Not available to all users
• (Slow estimation times)
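In Stata, `estimates store` (abbreviated `est store`) saves each fitted model so that alternative measures can be compared side by side. The code below is a rough standard-library Python analogue, not Stata. A dict stores the R² of two models that use alternative measures of the same hypothetical occupational position on simulated data: a continuous scale, and a cruder 3-category version of that scale entered as dummies.

```python
import random
from statistics import mean


def r2_scale(y, x):
    """R-squared from a simple linear regression of y on a continuous x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def r2_categories(y, groups):
    """R-squared from regressing y on a full set of category dummies,
    which equals the share of variance lying between category means."""
    my = mean(y)
    by_group = {}
    for g, b in zip(groups, y):
        by_group.setdefault(g, []).append(b)
    between = sum(len(v) * (mean(v) - my) ** 2 for v in by_group.values())
    total = sum((b - my) ** 2 for b in y)
    return between / total


random.seed(1)
score = [random.gauss(0, 1) for _ in range(500)]           # hypothetical occupational scale
classes = [0 if s < -0.5 else 1 if s < 0.5 else 2 for s in score]  # cruder 3-class version
y = [2 * s + random.gauss(0, 1) for s in score]             # simulated outcome

stored = {"scale": r2_scale(y, score), "classes": r2_categories(y, classes)}
for name, r2 in stored.items():
    print(f"{name:8s} R2 = {r2:.3f}")
```

Because the outcome is simulated from the continuous score, the coarse classes lose information by construction. With real data, the comparison is an empirical question.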
c) Endogeneity and key variables
• ‘Everything depends on everything else’ [Crouchley and Fligelstone 2004]
• We know a lot about the simple properties of key variables
• Key variables often change the main effects of other variables
• Simple decisions about contrast categories can influence interpretations
• Interaction terms are often significant and influential
• We have only scratched the surface of understanding key variables in multivariate contexts and interpretation
• Key variables are often endogenous (because they are ‘key’!)
• Work on standards / techniques for multi-process systems and/or comparing structural breaks involving key variables is attractive
d) Social science variables and functional form
Functional form = the way in which measures are arithmetically incorporated in quantitative analysis
• With occupations, education, ethnicity, and elsewhere, we tend to be too willing to make simplifying categorisations
• An alternative, scaling and relative positions, is better suited to complex analytical procedures
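One simple way to turn categories into a scale is "criterion scaling": each category is replaced by its members' mean on some criterion. This is a generic technique used here for illustration. Established occupational scales such as CAMSIS are derived differently (from patterns of social interaction). The data are hypothetical. The resulting score enters a model as a single column rather than as one dummy per category.

```python
from collections import defaultdict


def criterion_scale(categories, criterion):
    """Replace each category with the mean criterion value of its members."""
    totals, counts = defaultdict(float), defaultdict(int)
    for cat, value in zip(categories, criterion):
        totals[cat] += value
        counts[cat] += 1
    means = {cat: totals[cat] / counts[cat] for cat in totals}
    return [means[cat] for cat in categories], means


# Hypothetical data: education categories and a criterion such as hourly pay
educ = ["none", "school", "degree", "school", "degree", "none"]
pay = [8.0, 10.0, 16.0, 12.0, 18.0, 9.0]
scores, means = criterion_scale(educ, pay)
print(means)  # {'none': 8.5, 'school': 11.0, 'degree': 17.0}
```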
1. Data and research on occupations
• In the social sciences, occupation is seen as one of the most important things to know about a person
• Direct indicator of economic circumstances
• Proxy indicator of ‘social class’ or ‘stratification’
• GEODE: how social scientists use data on occupations
• DAMES: extending GEODE resources
• Expanding range
• Improving usability
www.geode.stir.ac.uk/ougs.html
Occupations: we agree on what we should do:
• Preserve two levels of data
• Source data: occupational unit groups, employment status
• Social classifications and other outputs
• Use transparent (published) methods [i.e. Occupational Information Resources, OIRs]
• for classifying index units
• for translating index units into social classifications
For instance:
• Bechhofer, F. 1969. 'Occupations' in Stacey, M. (ed.) Comparability in Social Research. London: Heinemann.
• Jacoby, A. 1986. 'The Measurement of Social Class' Proceedings from the Social Research Association seminar on "Measuring Employment Status and Social Class". London: Social Research Association.
• Lambert, P.S. 2002. 'Handling Occupational Information'. Building Research Capacity 4: 9-12.
• Rose, D. and Pevalin, D.J. 2003. 'A Researcher's Guide to the National Statistics Socio-economic Classification'. London: Sage.
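Preserving two levels of data means the derived classification is added next to the source fields rather than replacing them. In the sketch below, a hypothetical lookup table keyed on the index unit (unit group code, employment status) stands in for a published OIR. All codes and class labels are invented for illustration.

```python
# Hypothetical OIR: (occupational unit group, employment status) -> class.
# Codes and class labels are invented for illustration only.
OIR = {
    ("101", "employee"): "1",
    ("101", "self-employed"): "1",
    ("202", "employee"): "5",
    ("202", "self-employed"): "4",
}


def add_classification(records, oir, name="class_derived"):
    """Return new records that keep the source fields and add the derived one."""
    out = []
    for rec in records:
        key = (rec["oug"], rec["empstat"])
        out.append({**rec, name: oir.get(key)})  # None flags an unmatched index unit
    return out


survey = [
    {"id": 1, "oug": "202", "empstat": "self-employed"},
    {"id": 2, "oug": "101", "empstat": "employee"},
]
for row in add_classification(survey, OIR):
    print(row)
```

Because the unit group and employment status are kept, a different classification can later be derived from the same records by swapping in another lookup table.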
…in practice we don’t keep to this...
Inconsistent preservation of source data
• Alternative OUG schemes
• SOC-90; SOC-2000; ISCO; SOC-90 (my special version)
• Inconsistencies in other index factors
• ‘employment status’; supervisory status; number of employees
• Individual or household; current job or career
Inconsistent exploitation of occupational information
• Numerous alternative occupational information files
• (time; country; format)
• Substantive choices over social classifications
• Inconsistent translations to social classifications: ‘by file or by fiat’
• Dynamic updates to occupational information resources
• Strict security constraints on users’ micro-social survey data
• Low uptake of existing occupational information resources
GEODE provides services to help social scientists deal with occupational information resources:
• disseminate, and access other, Occupational Information Resources
• link together their (secure) micro-data with OIRs
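Linking micro-data with an OIR is a left join on the unit group code, and it is worth reporting how many records failed to match. Stata users see this in the `_merge` variable after `merge`. The sketch below is a standard-library Python version that runs wherever the micro-data are held. Codes and skill values are invented and are not taken from ISCO-88 or any real OIR.

```python
from collections import Counter


def link_with_oir(micro, oir, key="oug"):
    """Left-join survey records to an OIR lookup and report match quality."""
    linked, unmatched = [], Counter()
    for rec in micro:
        info = oir.get(rec[key])
        if info is None:
            unmatched[rec[key]] += 1
        linked.append({**rec, **(info or {})})  # unmatched records are kept, not dropped
    rate = 1 - sum(unmatched.values()) / len(micro)
    return linked, rate, unmatched


# Hypothetical OIR: unit group code -> occupational information (invented values)
oir = {"101": {"skill_level": 4}, "202": {"skill_level": 2}}
micro = [{"id": 1, "oug": "101"}, {"id": 2, "oug": "999"}, {"id": 3, "oug": "202"}]
linked, rate, unmatched = link_with_oir(micro, oir)
print(f"match rate {rate:.0%}; unmatched codes {dict(unmatched)}")
# match rate 67%; unmatched codes {'999': 1}
```

Listing the unmatched codes usually shows whether the problem is a different OUG scheme or simply invalid codes.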
Occupational information resources: small electronic files about OUGs…
For example: ISCO-88 skill levels classification
and: UK 1980 CAMSIS scales and CAMCON classes
Summary on occupations and data management
• Extensive debate about occupation-based social classifications
• Document your procedures...
• ...as you may be asked to do something different
• If you need to choose between occupation-based measures...
• They all measure, mostly, the same things
• Don’t assume that measures capture the concepts they are named after
• Lambert, P. S., & Bihagen, E. (2007). Concepts and Measures: Empirical evidence on the interpretation of ESeC and other occupation-based social classifications. Paper presented at the ISA RC28 conference, Montreal (14-17 August), www.camsis.stir.ac.uk/stratif/archive/lambert_bihagen_2007_version1.pdf.
July 2008: Existing resources on occupations
Popular websites:
• http://www2.warwick.ac.uk/fac/soc/ier/publications/software/cascot/
• http://home.fsw.vu.nl/~ganzeboom/pisa/
• www.iser.essex.ac.uk/esec/
• www.camsis.stir.ac.uk/occunits/distribution.html
Emerging resource: http://www.geode.stir.ac.uk/
Some papers:
• Chan, T. W., & Goldthorpe, J. H. (2007). Class and Status: The Conceptual Distinction and its Empirical Relevance. American Sociological Review, 72, 512-532.
• Rose, D., & Harrison, E. (2007). The European Socio-economic Classification: A New Social Class Scheme for Comparative European Research. European Societies, 9(3), 459-490.
• Lambert, P. S., Tan, K. L. L., Gayle, V., Prandy, K., & Bergman, M. M. (2008). The importance of specificity in occupation-based social classifications. International Journal of Sociology and Social Policy, 28(5/6), 179-192.
Using data on occupations: further speculation
• Growing interest in longitudinal analysis and use of longitudinal summary data on occupations
• Intuitive measures (e.g. ever in Class I)
• Lampard, R. (2007). Is Social Mobility an Echo of Educational Mobility? Sociological Research Online, 12(5).
• Empirical career trajectories / sequences
• Halpin, B., & Chan, T. W. (1998). Class Careers as Sequences. European Sociological Review, 14(2), 111-130.
• Growing cross-national comparisons
• Ganzeboom, H. B. G. (2005). On the Cost of Being Crude: A Comparison of Detailed and Coarse Occupational Coding. In J. H. P. Hoffmeyer-Zlotnick & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 241-257). Mannheim: ZUMA, Nachrichten Spezial.
• Treatment of the non-working populations
• Seldom adequate to treat the non-working as a category
• ‘Selection modelling’ approaches expanding
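An intuitive longitudinal summary such as "ever in Class I" collapses person-wave records into one value per person. The sketch below assumes a simple hypothetical layout (person id, wave, class). It treats waves with a missing class as "not in Class I". That is a choice worth recording, since it matters for people with gaps in their records.

```python
def ever_in_class(waves, target="I"):
    """Person-level summary: 1 if any observed wave has the target class, else 0."""
    ever = {}
    for person, _wave, cls in waves:
        # Once a person is flagged 1 they stay 1; missing classes count as 0.
        ever[person] = ever.get(person, 0) or int(cls == target)
    return ever


# Hypothetical person-wave records: (person id, wave, class)
waves = [("a", 1, "III"), ("a", 2, "I"), ("a", 3, "II"),
         ("b", 1, "IV"), ("b", 2, None)]
print(ever_in_class(waves))  # {'a': 1, 'b': 0}
```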
2. Data and research on education
• Despite attempts at standardisation, data on an individual’s level of education are notoriously difficult to collect and compare between studies:
• Between countries
• Between regions
• Between time periods
• Even between short time periods (example of the UK Youth Cohort Study)
In international research... there are two leading standards:
• ISCED www.unesco.org/education/information/nfsunesco/doc/isced_1997.htm
• CASMIN education http://www.equalsoc.org/publications/show/40
• But not all researchers adopt them, or are satisfied with them when they do
In UK research...
• There are some recommended standard data collection schemes...
• Simplified measure (‘other primary standard’) at: www.statistics.gov.uk/about/data/harmonisation/
• ...but many studies build up unstandardised data on highest levels of qualifications
• Often hundreds of unique qualification titles
• Little standardisation on relative levels
• Many surveys collect multiple-response data (multiple qualifications held by an individual)
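With multiple-response data, a "highest qualification" variable needs a ranking of qualification titles. That ranking is the unstandardised step described here, so the values in the sketch below are illustrative assumptions, not an official scale. Unrecognised titles are returned for review rather than silently dropped.

```python
# Illustrative ranking of qualification titles (higher = higher level).
# These relative levels are assumptions made for this sketch only.
LEVEL = {"none": 0, "GCSE": 1, "A level": 2, "HND": 3, "degree": 4, "higher degree": 5}


def highest_qualification(held):
    """Return the highest-ranked title an individual reports, plus unknown titles."""
    ranked = [q for q in held if q in LEVEL]
    unknown = [q for q in held if q not in LEVEL]
    best = max(ranked, key=LEVEL.__getitem__) if ranked else None
    return best, unknown


print(highest_qualification(["GCSE", "A level", "HND"]))       # ('HND', [])
print(highest_qualification(["GCSE", "City & Guilds part 2"]))  # ('GCSE', ['City & Guilds part 2'])
```

The list of unknown titles across a whole survey is a useful by-product: it shows which titles still need a decision about their level.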
BHPS example
Family and Working Lives Survey (54 vars per educ record)
Data on education levels cf. occupations
Underlying qualification units
• There are few obvious ‘educational unit groups’
• There are many publicly defined alternative schemes
Manipulation of educational data
• Few published ‘educational information resources’
• Many open-access sources of data about educational qualifications
• e.g. national statistics website reports
• There has been less recognition of the value of standardisation
• Though this is emerging in comparative research
• Educational data are dynamic and rapidly expanding
Educational data and cohort change
• A critical consideration is cohort change in educational qualifications and their distributions
• Appreciating the relative value of an education level given its context
• Multivariate analytical procedures
• Mean benefit of education within cohort?
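One simple way to express the relative value of an education level within its context is to convert each person's level into their position within their own birth cohort. The sketch below uses a mid-rank: the share of the cohort with a lower level plus half the share with the same level. This is a generic relative-position measure chosen for illustration, and the data are hypothetical.

```python
from collections import defaultdict


def within_cohort_rank(cohorts, levels):
    """Relative position (0 to 1) of each person's education level within their cohort."""
    by_cohort = defaultdict(list)
    for c, lev in zip(cohorts, levels):
        by_cohort[c].append(lev)
    out = []
    for c, lev in zip(cohorts, levels):
        group = by_cohort[c]
        below = sum(x < lev for x in group)   # cohort members with a lower level
        same = sum(x == lev for x in group)   # ties, counted at half weight
        out.append((below + 0.5 * same) / len(group))
    return out


# Hypothetical data: the same level (2) ranks higher in the older cohort
cohorts = ["1940s"] * 4 + ["1980s"] * 4
levels = [0, 0, 1, 2, 1, 2, 3, 3]
print(within_cohort_rank(cohorts, levels))
# [0.25, 0.25, 0.625, 0.875, 0.125, 0.375, 0.75, 0.75]
```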
Summary on education and data management
• We should document measures because...
• We are some way from agreeing on preferred measures
• Educational distributions are dynamic
• There is debate between categorisers and scorers...
• Some useful resources:
• Schneider, Silke L. (ed.) (2008), The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES. ISBN 978-3-00-024388-2
• ISMF educational databases and recodes: http://home.fsw.vu.nl/hbg.ganzeboom/ISMF/ismf.htm
3. Data and research on ethnicity
• Rapid growth in social science interest in, and data on, ‘ethnic minority groups’, ‘immigration’, ‘immigrants’
• Data include:
• Generic & specialist studies collecting ethnic ‘referents’
• ‘ethnic identity’; ‘nationality’; parents’ nationality; country of birth; language spoken; religion; ‘race’
• National research and data management:
• Most countries have evolving standard definitions of ethnic groups
• International research and data management:
• Seen as highly problematic in many fields except immigration data
• Lambert, P.S. (2005). Ethnicity and the Comparative Analysis of Contemporary Survey Data. In J. H. P. Hoffmeyer-Zlotnick & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 259-277). Mannheim: ZUMA-Nachrichten Spezial 11.
UK: ONS & ESDS data guides
• Input harmonisation within decades
• Output harmonisation between decades
• Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics.
• Academic strategies: ad hoc ‘black’ group, etc.
• Addition of extra categories over time
• Mixed ethnicities, marriages...
• UK focus on ‘ethnic identity’; lack of attention to alternative referents
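Output harmonisation between decades means mapping each source scheme onto a shared, usually coarser, target scheme. The mappings below are simplified assumptions written for this sketch, in the spirit of the ad hoc "black" group mentioned here. They are not the ONS comparison guidance. Sending every unmapped category to "Other" is itself a substantive choice that should be recorded.

```python
# Illustrative mappings to a shared target scheme (simplified assumptions,
# not official ONS guidance on comparing 1991 and 2001 ethnic group data).
TARGET_1991 = {"White": "White", "Indian": "Indian",
               "Black-Caribbean": "Black", "Black-African": "Black",
               "Black-Other": "Black"}
TARGET_2001 = {"White British": "White", "White Irish": "White",
               "Other White": "White", "Indian": "Indian",
               "Caribbean": "Black", "African": "Black", "Other Black": "Black"}


def harmonise(values, mapping, residual="Other"):
    """Map source categories to the target scheme; unmapped ones go to the residual."""
    return [mapping.get(v, residual) for v in values]


print(harmonise(["Black-African", "Chinese"], TARGET_1991))          # ['Black', 'Other']
print(harmonise(["White Irish", "White and Asian"], TARGET_2001))    # ['White', 'Other']
```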
Comparative research solutions?
• Measurement equivalence might be achieved by:
• Survey data collection
• Connecting related groups
• Longitudinal linkage
• Functional equivalence for categories:
• Simplified categorical distinctions
• Immigrant cohorts
• Scaling ethnic categories
Ethnicity and the DAMES project
• A hard subject on which to collate information
• Few recognisable ‘ethnic unit groups’
• Limited previous ‘data management’ reflection
• Very few published databases on ethnicity
• Important question of sparse distributions
• Dynamic & rapidly expanding
• Likely role is to give new guidance on emerging strategies for analysing and exploiting data
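Sparse distributions are often handled by merging small categories into a residual group. That is one common pragmatic option, not a DAMES recommendation, and it discards distinctions that may matter substantively. The sketch below returns the list of merged categories so that the recode can be documented. The threshold of 30 cases is an arbitrary assumption.

```python
from collections import Counter


def collapse_sparse(values, min_count=30, residual="Other"):
    """Merge categories with fewer than min_count cases into one residual group."""
    counts = Counter(values)
    merged = sorted(c for c, n in counts.items() if n < min_count)
    to_merge = set(merged)
    recoded = [residual if v in to_merge else v for v in values]
    return recoded, merged


sample = ["A"] * 40 + ["B"] * 5 + ["C"] * 3   # hypothetical category codes
recoded, merged = collapse_sparse(sample)
print(Counter(recoded), merged)  # Counter({'A': 40, 'Other': 8}) ['B', 'C']
```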
Concluding summary: Handling data on occupations, educational qualifications and ethnicity
Principles for data management:
• Keep clear records
• Recodes and transformations
• Use existing standards
• Do something, not nothing
• Distributional differences by cohorts
• Learn how to match files
• Exploit wider resources / other research