
Handling data on occupations, educational qualifications, and ethnicity

Paul Lambert & Vernon Gayle, Univ. Stirling. Talk to the workshop ‘Resources for Data Management and Handling Social Science Data’, ESRC Research Methods Festival, Oxford, 1 July 2008.


Presentation Transcript


  1. Handling data on occupations, educational qualifications, and ethnicity Paul Lambert & Vernon Gayle, Univ. Stirling Talk to the workshop ‘Resources for Data Management and Handling Social Science Data’ ESRC Research Methods Festival, Oxford, 1 July 2008 NCRM, Session 27, 1 July 2008

  2. Handling variables • DAMES project (www.dames.org.uk) - specialist data services on three major social science topics (occupations, education, ethnicity) • ‘GE*DE’ – ‘Grid Enabled Specialist Data Environments’ • From: www.geode.stir.ac.uk

  3. Handling social science variables – general themes • Common vs. best practice • Recording the derivation/variable construction process • Reviewing alternative measures • Comparability (between contexts - countries, times) • Input or output harmonisation? • Measurement or functional equivalence? • See esp. ‘Variable constructions in longitudinal research’, http://www.longitudinal.stir.ac.uk/variables/ • Existing standards of National Statistics Institutes and international bodies (during data collection)

  4. Handling variables – general themes, ctd. • The unit of analysis • Individual, spouse, household, etc. • Current time; career summary, etc. • Concept and measures • Variety of academic preferences • NSI standard measures

  5. Key variables: concepts and measures

  6. Key variables: comments & speculation (from www.longitudinal.stir.ac.uk/variables/Coefficients.html ) a) Data manipulation skills and inertia • I would speculate that around 80% of applications using key variables don’t consult literature and evaluate alternative measures, but choose the first convenient and/or accessible variable in the dataset • Data supply decisions (‘what is on the archive version’) are critical • Much of the explanation lies with lack of confidence in data manipulation / linking data • Too many under-used resources – cf. www.esds.ac.uk

  7. b) Software and key variables – a personal view • Stata is the superior package for secondary survey data analysis: • Advanced data management and data analysis functionality • Supports easy evaluation of alternative measures (e.g. est store) • Culture of transparency of programming/data manipulation • Problems with Stata • Not available to all users • {Slow estimation times}

  8. c) Endogeneity and key variables • ‘everything depends on everything else’ [Crouchley and Fligelstone 2004] • We know a lot about simple properties of key variables • Key variables often change the main effects of other variables • Simple decisions about contrast categories can influence interpretations • Interaction terms are often significant and influential • We have only scratched the surface of understanding key variables in multivariate context and interpretation • Key variables are often endogenous (because they are ‘key’!) • Work on standards / techniques for multi-process systems and/or comparing structural breaks involving key variables is attractive

  9. d) Social science variables and functional form Functional form = the way in which measures are arithmetically incorporated in quantitative analysis • With occupations, education, ethnicity, and elsewhere, we tend to be too willing to make simplifying categorisations • An alternative - scaling and relative positions – is better suited for complex analytical procedures
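
The contrast between a categorical and a scaled functional form can be sketched in a few lines. The example below is only an illustration under stated assumptions: the records, the class labels, and the scale scores are all invented, and the helper names (`fit_scale`, `fit_categories`, `r_squared`) are hypothetical. It enters the same occupational information once as a continuous scale and once as category means, then compares the share of variance each form accounts for.

```python
from statistics import mean

# Invented toy records: (occupational scale score, class category, outcome).
# None of these values come from a real survey.
records = [
    (20.0, "routine", 10.0),
    (30.0, "routine", 14.0),
    (40.0, "intermediate", 15.0),
    (50.0, "intermediate", 21.0),
    (60.0, "professional", 24.0),
    (70.0, "professional", 30.0),
]


def r_squared(observed, predicted):
    """Share of variance in `observed` accounted for by `predicted`."""
    grand_mean = mean(observed)
    ss_total = sum((y - grand_mean) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_resid / ss_total


def fit_scale(xs, ys):
    """Scaled functional form: least squares with one continuous predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread = sum((x - x_bar) ** 2 for x in xs)
    slope = cross / spread
    intercept = y_bar - slope * x_bar
    return [intercept + slope * x for x in xs]


def fit_categories(cats, ys):
    """Categorical functional form: predict each case by its category mean."""
    groups = {}
    for cat, y in zip(cats, ys):
        groups.setdefault(cat, []).append(y)
    cat_means = {cat: mean(values) for cat, values in groups.items()}
    return [cat_means[cat] for cat in cats]


scores = [r[0] for r in records]
classes = [r[1] for r in records]
outcome = [r[2] for r in records]

r2_scale = r_squared(outcome, fit_scale(scores, outcome))
r2_class = r_squared(outcome, fit_categories(classes, outcome))
print(f"scale: {r2_scale:.3f}  categories: {r2_class:.3f}")
```

With these made-up numbers the scale retains within-category variation that the three categories discard; that is a property of the toy data, not evidence about any real classification.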

  10. 1. Data and research on occupations • In the social sciences, occupation is seen as one of the most important things to know about a person • Direct indicator of economic circumstances • Proxy Indicator of ‘social class’ or ‘stratification’ • GEODE – how social scientists use data on occupations • DAMES – extending GEODE resources • Expanding range • Improving usability

  11. Stage 1 - Collecting Occupational Data (and making a mess)

  12. www.geode.stir.ac.uk/ougs.html

  13. Occupations: we agree on what we should do: • Preserve two levels of data • Source data: Occupational unit groups, employment status • Social classifications and other outputs • Use transparent (published) methods [i.e. occupational information resources, OIRs] • for classifying index units • for translating index units into social classifications For instance: • Bechhofer, F. 1969. 'Occupations' in Stacey, M. (ed.) Comparability in Social Research. London: Heinemann. • Jacoby, A. 1986. 'The Measurement of Social Class' Proceedings from the Social Research Association seminar on "Measuring Employment Status and Social Class". London: Social Research Association. • Lambert, P.S. 2002. 'Handling Occupational Information'. Building Research Capacity 4: 9-12. • Rose, D. and Pevalin, D.J. 2003. 'A Researcher's Guide to the National Statistics Socio-economic Classification'. London: Sage.

  14. …in practice we don’t keep to this... Inconsistent preservation of source data • Alternative OUG schemes • SOC-90; SOC-2000; ISCO; SOC-90 (my special version) • Inconsistencies in other index factors • ‘employment status’; supervisory status; number of employees • Individual or household; current job or career Inconsistent exploitation of Occupational Information • Numerous alternative occupational information files • (time; country; format) • Substantive choices over social classifications • Inconsistent translations to social classifications – ‘by file or by fiat’ • Dynamic updates to occupational information resources • Strict security constraints on users’ micro-social survey data • Low uptake of existing occupational information resources

  15. GEODE provides services to help social scientists deal with occupational information resources • Disseminate, and access other, Occupational Information Resources (OIRs) • Link together their (secure) micro-data with OIRs
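
Linking micro-data to an occupational information resource amounts to a keyed merge on the occupational unit group (OUG) code. The sketch below is a generic illustration, not the GEODE service itself: the OUG codes, the scale scores, the column names, and the helper functions `load_oir` and `link` are all assumptions made up for the example. It keeps the source code on every record and flags cases the resource cannot match.

```python
import csv
import io

# Hypothetical occupational information resource (OIR): one row per
# occupational unit group (OUG). Codes and scores are invented.
OIR_CSV = """oug,scale_score
1110,72.5
2310,65.0
9130,28.4
"""

# Hypothetical micro-data that stays with the researcher.
MICRODATA = [
    {"person_id": 1, "oug": "2310"},
    {"person_id": 2, "oug": "9130"},
    {"person_id": 3, "oug": "5555"},  # code absent from the OIR
]


def load_oir(text):
    """Read the OIR into a dict keyed by OUG code (codes kept as strings)."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["oug"]: float(row["scale_score"]) for row in reader}


def link(microdata, oir):
    """Left-join the OIR onto the micro-data.

    The source OUG code is preserved, and unmatched cases are flagged
    rather than dropped.
    """
    linked = []
    for row in microdata:
        score = oir.get(row["oug"])
        linked.append({**row, "scale_score": score, "matched": score is not None})
    return linked


linked = link(MICRODATA, load_oir(OIR_CSV))
unmatched = [row["person_id"] for row in linked if not row["matched"]]
print(linked)
print("unmatched person ids:", unmatched)
```

Keeping codes as strings avoids losing leading zeros, and reporting unmatched cases makes it easier to spot a mismatch between the OUG scheme in the data and the one the resource was built for.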

  16. Occupational information resources: small electronic files about OUGs…

  17. For example: ISCO-88 Skill levels classification

  18. and: UK 1980 CAMSIS scales and CAMCON classes

  19. Summary on occupations and data management • Extensive debate about occupation-based social classifications • Document your procedures.. • ..as you may be asked to do something different.. • If you need to choose between occupation-based measures… • They all measure, mostly, the same things • Don’t assume concepts measure measures • Lambert, P. S., & Bihagen, E. (2007). Concepts and Measures: Empirical evidence on the interpretation of ESeC and other occupation-based social classifications. Paper presented at the ISA RC28 conference, Montreal (14-17 August), www.camsis.stir.ac.uk/stratif/archive/lambert_bihagen_2007_version1.pdf .


  22. July 2008: Existing resources on occupations Popular websites: • http://www2.warwick.ac.uk/fac/soc/ier/publications/software/cascot/ • http://home.fsw.vu.nl/~ganzeboom/pisa/ • www.iser.essex.ac.uk/esec/ • www.camsis.stir.ac.uk/occunits/distribution.html Emerging resource: http://www.geode.stir.ac.uk/ Some papers: • Chan, T. W., & Goldthorpe, J. H. (2007). Class and Status: The Conceptual Distinction and its Empirical Relevance. American Sociological Review, 72, 512-532. • Rose, D., & Harrison, E. (2007). The European Socio-economic Classification: A New Social Class Scheme for Comparative European Research. European Societies, 9(3), 459-490. • Lambert, P. S., Tan, K. L. L., Gayle, V., Prandy, K., & Bergman, M. M. (2008). The importance of specificity in occupation-based social classifications. International Journal of Sociology and Social Policy, 28(5/6), 179-192.

  23. Using data on occupations – further speculation • Growing interest in longitudinal analysis and use of longitudinal summary data on occupations • Intuitive measures (e.g. ever in Class I) • Lampard, R. (2007). Is Social Mobility an Echo of Educational Mobility? Sociological Research Online, 12(5). • Empirical career trajectories / sequences • Halpin, B., & Chan, T. W. (1998). Class Careers as Sequences. European Sociological Review, 14(2), 111-130. • Growing cross-national comparisons • Ganzeboom, H. B. G. (2005). On the Cost of Being Crude: A Comparison of Detailed and Coarse Occupational Coding. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 241-257). Mannheim: ZUMA-Nachrichten Spezial 11. • Treatment of the non-working populations • Seldom adequate to treat non-working as a category • ‘Selection modelling’ approaches expanding
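
An "intuitive" longitudinal summary such as "ever in Class I" can be derived directly from a person's sequence of class positions. The sketch below assumes an invented data layout (one time-ordered list of class codes per person, with `None` for waves spent out of work) and hypothetical helper names; it is one possible reading of such a measure, not a published definition.

```python
# Hypothetical class careers: one time-ordered list of class codes per
# person. None marks a wave in which the person was not in work.
careers = {
    "p1": ["III", "II", "I"],
    "p2": ["VII", "VI", "VI"],
    "p3": ["I", "II", None],
}


def ever_in(career, target="I"):
    """True if the person held the target class at any observed wave."""
    return any(code == target for code in career)


def share_of_waves_in(career, target="I"):
    """Proportion of working waves spent in the target class.

    Non-working waves are left out of the denominator instead of being
    treated as a class of their own; returns None if no wave was worked.
    """
    worked = [code for code in career if code is not None]
    if not worked:
        return None
    return sum(code == target for code in worked) / len(worked)


summary = {
    pid: {"ever_I": ever_in(seq), "share_I": share_of_waves_in(seq)}
    for pid, seq in careers.items()
}
print(summary)
```

How non-working waves are handled is a substantive choice; excluding them from the denominator is only one option, and a selection-modelling approach would treat them quite differently.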

  24. 2. Data and research on education • Although there have been standardisation attempts, data on an individual’s level of education is notoriously difficult to collect and compare between studies • Between countries • Between regions • Between time periods • Even between short time periods (Example of the UK Youth Cohort Study)

  25. In international research.. There are two leading standards • ISCED www.unesco.org/education/information/nfsunesco/doc/isced_1997.htm • CASMIN education http://www.equalsoc.org/publications/show/40 • But not all researchers adopt them, or are satisfied with them when they do

  26. In UK research.. • There are some recommended standard data collection schemes… • Simplified measure (‘other primary standard’) at: www.statistics.gov.uk/about/data/harmonisation/ • ..but many studies build up unstandardised data on highest levels of qualifications • Often hundreds of unique qualification titles • Little standardisation on relative levels • Many surveys collect multiple response data (multiple qualifications held by an individual)
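
Reducing multiple-response qualification data to a single "highest qualification" requires an explicit ranking of titles onto levels. The sketch below assumes a made-up ranking (`LEVELS`) and a hypothetical helper `highest_level`; since there is little agreement on relative levels, the ranking is exactly the part a real analysis would need to justify and document.

```python
# Hypothetical ranking of qualification titles onto ordered levels.
# Both the titles and the level numbers are assumptions for illustration.
LEVELS = {"none": 0, "gcse": 2, "a_level": 3, "hnd": 4, "degree": 6}


def highest_level(qualifications, levels=LEVELS):
    """Return (highest known level, list of unrecognised titles).

    Titles missing from the ranking are reported back so that they are
    not silently ignored.
    """
    known = [levels[q] for q in qualifications if q in levels]
    unknown = [q for q in qualifications if q not in levels]
    return (max(known) if known else None, unknown)


responses = {
    "p1": ["gcse", "a_level", "degree"],
    "p2": ["gcse", "city_and_guilds"],
    "p3": ["overseas_diploma"],
}

derived = {pid: highest_level(quals) for pid, quals in responses.items()}
print(derived)
```

Returning the unrecognised titles alongside the derived level keeps the source information visible, which matters when a survey holds hundreds of distinct qualification titles.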

  27. BHPS example

  28. Family and Working Lives Survey (54 vars per educ record)

  29. Data on education levels cf. occupations Underlying qualification units • There are few obvious ‘educational unit groups’ • There are many publicly defined alternative schemes Manipulation of educational data • Few published ‘educational information resources’ • Many open-access sources of data about educational qualifications • e.g. national statistics website reports • There has been less previous recognition of value of standardisation • Though this is emerging in comparative research • Educational data is dynamic and rapidly expanding

  30. Educational data and cohort change • A critical consideration concerns cohort change in educational qualifications and distributions • Appreciating relative value of education level given context • Multivariate analytical procedures • Mean benefit of education within cohort?
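
One simple way to express the relative value of an education level given its context is to standardise it within birth cohort. The sketch below assumes invented records, a years-of-education measure, and a hypothetical helper `standardise_within_cohort`; z-scores are only one of several possible relative measures (percentile ranks would be another).

```python
from collections import defaultdict
from statistics import mean, pstdev

# Invented records: the same number of years of education can sit at a
# different position in the distribution of an earlier and a later cohort.
people = [
    {"id": 1, "cohort": 1940, "years_educ": 9},
    {"id": 2, "cohort": 1940, "years_educ": 11},
    {"id": 3, "cohort": 1940, "years_educ": 13},
    {"id": 4, "cohort": 1970, "years_educ": 11},
    {"id": 5, "cohort": 1970, "years_educ": 13},
    {"id": 6, "cohort": 1970, "years_educ": 15},
]


def standardise_within_cohort(rows, key="years_educ"):
    """Add a z-score of `key` computed separately within each cohort."""
    by_cohort = defaultdict(list)
    for row in rows:
        by_cohort[row["cohort"]].append(row[key])

    out = []
    for row in rows:
        values = by_cohort[row["cohort"]]
        sd = pstdev(values)
        # A cohort with no variation gets a z-score of 0 for everyone.
        z = (row[key] - mean(values)) / sd if sd > 0 else 0.0
        out.append({**row, "educ_z": z})
    return out


relative = {row["id"]: row["educ_z"] for row in standardise_within_cohort(people)}
print(relative)
```

In this toy data, 13 years of education is well above average in the 1940 cohort but exactly average in the 1970 cohort, which is the kind of shift an absolute measure would hide.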

  31. Summary on education and data management • We should document measures because.. • Some way away from agreeing on preferred measures • Dynamic nature of educational distributions • Debate between categorisers and scorers… • Some useful resources: • Schneider, Silke L. (ed.) (2008), The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES. ISBN 978-3-00-024388-2 • ISMF educational databases and recodes: http://home.fsw.vu.nl/hbg.ganzeboom/ISMF/ismf.htm

  32. 3. Data and research on ethnicity • Rapid growth in social science interest, and data, on ‘ethnic minority groups’, ‘immigration’, ‘immigrants’ • Data includes: • Generic & specialist studies collecting ethnic ‘referents’ • ‘ethnic identity’; ‘nationality’, parents’ nationality; country of birth; language spoken; religion; ‘race’ • National research and data management: • Most countries have evolving standard definitions of ethnic groups • International research and data management • Seen as highly problematic in many fields except immigration data • Lambert, P.S. (2005). Ethnicity and the Comparative Analysis of Contemporary Survey Data. In J. H. P. Hoffmeyer-Zlotnik & J. Harkness (Eds.), Methodological Aspects in Cross-National Research (pp. 259-277). Mannheim: ZUMA-Nachrichten Spezial 11.


  35. UK: ONS & ESDS data guides • Input harmonisation within decades • Output harmonisation between decades • Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics. • Academic strategies – ad hoc ‘black’ group, etc • Addition of extra categories over time • Mixed ethnicities, marriages… • UK focus on ‘ethnic identity’, lack of attention to alternative referents
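
Output harmonisation between data collections can be sketched as a pair of published lookup tables, one per source scheme, that map onto a common output scheme. The example below is deliberately crude and rests on loud assumptions: the category labels are only loosely modelled on census-style wording, the collapsed three-group scheme is invented for illustration, and none of it reproduces the ONS guidance cited above. The names `TO_COMMON` and `harmonise` are hypothetical.

```python
# Illustrative lookups only. The labels and the collapsed output scheme
# are assumptions for this sketch, NOT the published ONS comparison.
# None marks a category with no counterpart in the other scheme.
TO_COMMON = {
    1991: {
        "White": "White",
        "Black-Caribbean": "Black",
        "Black-African": "Black",
        "Indian": "Asian",
        "Pakistani": "Asian",
    },
    2001: {
        "White British": "White",
        "White Irish": "White",
        "Black Caribbean": "Black",
        "Black African": "Black",
        "Indian": "Asian",
        "Pakistani": "Asian",
        "Mixed: White and Black Caribbean": None,
    },
}


def harmonise(record):
    """Add a harmonised category while preserving the source category.

    An unlisted source category raises KeyError, so a gap in the lookup
    is noticed instead of being recoded silently.
    """
    lookup = TO_COMMON[record["year"]]
    source = record["ethnic_group"]
    if source not in lookup:
        raise KeyError(f"no mapping for {source!r} in {record['year']}")
    return {**record, "ethnic_common": lookup[source]}


sample = [
    {"id": 1, "year": 1991, "ethnic_group": "Black-African"},
    {"id": 2, "year": 2001, "ethnic_group": "White Irish"},
    {"id": 3, "year": 2001, "ethnic_group": "Mixed: White and Black Caribbean"},
]
harmonised = [harmonise(rec) for rec in sample]
print(harmonised)
```

Categories added over time, such as the mixed groups, have no clean earlier equivalent; the sketch leaves them unharmonised instead of forcing them into a group, which is itself a decision worth recording.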

  36. Comparative research solutions? • Measurement equivalence might be achieved by: • Survey data collection • Connecting related groups • Longitudinal linkage • Functional equivalence for categories: • Simplified categorical distinctions • Immigrant cohorts • Scaling ethnic categories

  37. Ethnicity and the DAMES project • Hard subject to collate information on • Few recognisable ‘ethnic unit groups’ • Limited previous ‘data management’ reflection • Very few published databases on ethnicity • Important question of sparse distributions • Dynamic, & rapidly expanding • Likely role is to give new guidance on emerging strategies for analysing and exploiting data

  38. Concluding summary: Handling data on occupations, educational qualifications and ethnicity Principles for data management: • Keep clear records • Recodes and transformations • Use existing standards • Do something, not nothing • Distributional differences by cohorts • Learn how to match files • Exploiting wider resources / other research
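
Keeping clear records of recodes can be as simple as making every derivation step write its own description to a log. The sketch below assumes invented data, an invented three-level recode, and a hypothetical helper `recode`; it leaves the source variable untouched, adds the derived one, and stores the mapping used so that the step can be reported or repeated later.

```python
import json


def recode(rows, source, target, mapping, log, note=""):
    """Derive `target` from `source` using `mapping`.

    The source variable is kept as it is, values missing from the
    mapping become None, and a description of the step is appended
    to `log`.
    """
    out = [{**row, target: mapping.get(row[source])} for row in rows]
    log.append(
        {
            "source": source,
            "target": target,
            "mapping": mapping,
            "note": note,
            "n_unmapped": sum(row[target] is None for row in out),
        }
    )
    return out


derivation_log = []
data = [
    {"id": 1, "qual": "degree"},
    {"id": 2, "qual": "gcse"},
    {"id": 3, "qual": "diploma"},  # not covered by the mapping below
]

# Hypothetical three-level recode, invented for this example.
data = recode(
    data,
    source="qual",
    target="qual3",
    mapping={"degree": "high", "a_level": "mid", "gcse": "low"},
    log=derivation_log,
    note="illustrative three-level education recode",
)

# The log can be saved next to the data as a plain-text record.
log_text = json.dumps(derivation_log, indent=2)
print(log_text)
```

Because the mapping travels with the log, switching to an alternative measure later means changing one dictionary and rerunning the step, with both versions on record.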
