Dealing with key variables in social research: New options. Paul Lambert, University of Stirling. Presented to the ESRC DAMES node workshop on ‘Operationalising social science variables and using the GEODE, GEMDE and GEEDE services’, University of Stirling, 30th January 2012.
Dealing with key variables in social research: New options • The idea of ‘key variables’ • Contemporary challenges in analysing key variables • Three new contributions from e-Social Science: • GEODE (www.geode.stir.ac.uk): data on occupations • GEEDE (www.dames.org.uk/geede): data on education • GEMDE (www.dames.org.uk/gemde): data on ethnicity
1) The idea of key variables • Recognition that certain factors are measured and analysed time after time in social surveys, and are routinely of relevance to statistical analyses [e.g. Burgess 1986] • By implication, there’s value in methodological reflection and consistency/re-use of measures between studies • A small range of academic reviews: • Stacey (1969): Education; Family status; Income; Occupations • Burgess (1986): Age, gender, ethnicity, health, education, occupational class, employment status, leisure, politics, voluntary associations • Hoffmeyer-Zlotnik and Wolf (2003): Occupation, education, age, religion, ethnicity, household, family (cross-nationally)
Many resources on ‘key variables’ are by-products of data preparation or harmonisation
A daunting volume of resources... • Data providers’ standards and documentation… • Standards adopted in particular surveys: UKDA – www.data-archive.ac.uk ; CESSDA - www.cessda.org/ • Cross-national harmonisation: IPUMS www.ipums.org ; ISSP www.issp.org/ ; WVS www.worldvaluessurvey.org/ ; LIS www.lisproject.org ; ESS www.europeansocialsurvey.org • Publications from ‘Small N’ as well as ‘Large N’ studies • [e.g. Charles & Grusky, 2004 on gender and occs; Wright 1997 on social class] • Resource providers’ recommendations/standards • ESDS – www.esds.ac.uk • Survey Network / Question Bank – Topics (http://surveynet.ac.uk/sqb/topics/introduction.asp) • OECD - http://www.oecd.org/statsportal/ • Edacwowe - http://www.edacwowe.eu/en/ • Harry Ganzeboom’s ISMF - www.harryganzeboom.nl/ismf/ismf.htm
Constructions of key variables in survey research • Are important… • Major part of the hands-on work of survey analysis • Central to many critiques of research/outputs • Existing reflections and resources • Methodological comments [e.g. Stacey 1969; Burgess 1986] • Validity and reliability; harmonisation and standardisation efforts • Cross-nationally comparative research into ‘equivalence’ • …But… • Attention to variables is marginalised in methodological reviews, which focus on data and/or techniques [cf. Raftery 2001] • Reviews/resources on variables often don’t give good advice to those conducting complex statistical models of social processes
“Reviews/resources on variables often don’t give good advice to those conducting complex statistical models of social processes”? • Univariate perspective and evaluations • Inconvenient (categorical) functional forms • Large #’s of categories (e.g. 8 NS-SEC classes) • Reliance on detailed source data (e.g. 351 SOCs + Emp. Stat) • Ill-suited to arithmetic standardisation, modelling interaction effects, and temporal/x-national comparisons
“Reviews/resources on variables often don’t give good advice to those conducting complex statistical models of social processes”? Issues of interpretation: • (1) Asymmetric validity evaluations • e.g. occupation-based measures [Lambert/Bihagen 2012] • (2) Unrealistic forms of equivalence • Assertion of ‘measurement equivalence’ is rarely convincing
2) Contemporary challenges in analysing key variables …over the last 20 yrs… • Vastly increasing volumes of microdata (esp. comparative) • Huge social surveys; routine data; x-natl./temporal comparisons • Increasing computational power/statistical options • Multivariate statistical models; non-linear outcome models; mixed and multi-process models • Increase in volume of academic studies • Difficulty of reviewing previous approaches • Proliferation, in practice, of variable operationalisations • Options for internet dissemination • E.g. Journals’ ‘Electronic Supplementary Materials’
Parlous shortcuts: ‘Don’t get it right, get it published’? • Many of us navigate through the abundance of options by being (arbitrarily) selective, using convenient variables, and disregarding more intricate variable construction literatures • Though understandable, this isn’t good science! • Lack of documentation for replication [cf. Dale 2006] • Keep making new measures / re-inventing the wheel • Use unwieldy categorical measures (restricting models) • Misinterpret/erroneously estimate the effects of concepts Claim: In our field, good science involves (a) sensitivity analysis of measures and (b) building models conceptually (rather than according to functional form of measures)
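As a rough illustration of claim (a), the sketch below compares several candidate operationalisations of the same concept against one outcome and reports how much variance each explains. The data, the measure names (`two_class`, `three_class`, `continuous_scale`) and the choice of a simple bivariate R² are all assumptions made for illustration; a real sensitivity analysis would use the full model of interest.

```python
from statistics import mean


def r_squared(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy ** 2) / (sxx * syy)


# Hypothetical outcome and three hypothetical codings of the same concept.
outcome = [10, 12, 15, 18, 21, 25]
measures = {
    "two_class": [0, 0, 0, 1, 1, 1],
    "three_class": [1, 1, 2, 2, 3, 3],
    "continuous_scale": [20, 28, 35, 47, 52, 66],
}

# Fit the same (very simple) model once per operationalisation.
results = {name: r_squared(x, outcome) for name, x in measures.items()}

# Report the comparison so it can be documented alongside the analysis.
for name, r2 in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} R^2 = {r2:.3f}")
```

Publishing a table like this alongside the substantive results is one simple way of documenting how far conclusions depend on the chosen measure.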
Focus on three especially important categorical variables: Education; Occupation; Ethnicity • Problems for analysts include: • Many alternative/competing measurement options (e.g. class schemes) • Changing structural contexts (distributions; correlations; sparsity) • Changing measurement practices (e.g. decennial revisions; admin data) • Different needs in different studies (e.g. ISCO88-SOC; graduate vs school qualifications) * Figure note: 2008 BHPS 20+yrs; variables qfedhi, jbsoc and ‘xeth’ (derived using race{l}), with a 0.01 down-weight for White
Measurement equivalence for comparisons – coding to the lowest common denominator? • Are the compatible categories equivalent? • Who does the work (data distributors and/or analysts)…? • …& who records it (e.g. Mohler et al., 2008)?
Meaning equivalence • For categorical data, equivalence for comparisons is often best approached in terms of meaning equivalence, even if measurement equivalence seems possible (because of non-linear relations between categories and shifting underlying distributions) • Arithmetic standardisation offers a convenient form of meaning equivalence by indicating relative position within the structure defined by the current context • For categorical data, this can be achieved/approximated by scaling categories in one or more dimensions of difference
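A minimal sketch of arithmetic standardisation, assuming it is implemented as a z-score computed separately within each context (for example a cohort or country). The cohort labels and scores are hypothetical; the point is that the same raw value can indicate a different relative position in different contexts.

```python
from statistics import mean, pstdev


def standardise_within(values_by_context):
    """Convert raw scores to z-scores, separately within each context."""
    standardised = {}
    for context, values in values_by_context.items():
        centre = mean(values)
        spread = pstdev(values)  # population SD of this context's scores
        standardised[context] = [(v - centre) / spread for v in values]
    return standardised


# Hypothetical scores (e.g. years of education) in two hypothetical cohorts.
scores = {
    "cohort_1950": [9, 10, 10, 11, 15],
    "cohort_1980": [11, 13, 13, 14, 19],
}
z = standardise_within(scores)

# A raw score of 11 sits at the mean of the earlier cohort
# but below the mean of the later cohort.
print(z["cohort_1950"][3], z["cohort_1980"][0])
```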
‘Effect proportional scaling’ using parents’ occupational advantage
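The idea can be sketched as follows, assuming effect proportional scaling means giving each category a score equal to the mean of a criterion variable among its members (here a hypothetical parental occupational advantage score). The category labels and values are invented for illustration and are not taken from the analysis shown on the slide.

```python
from collections import defaultdict
from statistics import mean


def effect_proportional_scores(categories, criterion):
    """Score each category by the mean criterion value of its members."""
    grouped = defaultdict(list)
    for category, value in zip(categories, criterion):
        grouped[category].append(value)
    return {category: mean(values) for category, values in grouped.items()}


# Hypothetical respondents: own qualification and parents' occupational advantage.
qualification = ["none", "none", "school", "school", "degree", "degree"]
parent_advantage = [30.0, 34.0, 40.0, 44.0, 58.0, 62.0]

category_scores = effect_proportional_scores(qualification, parent_advantage)

# Replace the categorical variable with its metric scores for use in a model.
scaled_qualification = [category_scores[q] for q in qualification]
print(category_scores)
```

The resulting metric variable can enter a model as a single linear term, which is what makes interactions and cross-context comparisons easier than with a many-category factor.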
Asserting that the ideal research using models involving key variables… • …derives and compares many plausible candidate variable operationalisations, and documents/publishes the sensitivity analysis • …thoroughly tests whether linear functional forms can be used, to facilitate complex modelling • Overheads include: • substantial work in deriving and documenting measures • against own interests to disseminate documentation • hard to communicate linear effects to social scientists ..can e-Science come to the rescue..?
3) Three new contributions from e-Social Science See e.g. Digital Social Research, www.digitalsocialresearch.net (e-Science broadly involves using emergent computer technologies with enhanced capacities for communication/collaboration & data processing) • DAMES node – objective to support ‘data management’ in social research (e.g. organising/ manipulating/enhancing data; pre-analysis tasks; documentation/replication) • Three new ‘GESDE’ services (from www.dames.org.uk) • Provide a dynamic, user-contributed library-style service for access to specialist information resources concerning occupations, educational qualifications, ethnicity • Encourage researchers to use the three ‘portals’ to find and exploit suitable data, and contribute their own files
DAMES provides online services for data coordination/organisation Tools for handling variables in social science data Recoding measures; standardisation / harmonisation; Linking; Curating
GESDE – Search and browse supplementary data on occupations; educational qualifications; ethnicity
The data curation tool The curation tool obtains metadata and supports the storage and organisation of data resources in a more generic way (‘DDI’ format metadata) It includes an ‘IRODS’ file storage system allowing users to upload files and access their own and others’ files
(a) Using GEODE for data on occupations (i) Usually start with information about detailed ‘occupational unit group’ (ii) then find ways of attaching summary information about occupations to occupational unit groups
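Step (ii) is essentially a lookup or merge. The sketch below is not the GEODE interface; it only illustrates the linking step in plain Python, with made-up unit group codes, made-up scores, and hypothetical field names (`oug`, `occ_score`).

```python
# Hypothetical occupational information resource: unit group code -> summary score.
oug_to_score = {"1111": 70.2, "2222": 55.1, "3333": 38.4}

# Hypothetical survey records holding a detailed occupational unit group code.
respondents = [
    {"id": 1, "oug": "1111"},
    {"id": 2, "oug": "3333"},
    {"id": 3, "oug": "9999"},  # code absent from the lookup
]


def attach_scores(records, lookup, key="oug", new_field="occ_score"):
    """Return copies of the records with the looked-up score attached."""
    linked = []
    for record in records:
        row = dict(record)  # copy, so the source records stay unchanged
        row[new_field] = lookup.get(record[key])  # None when no match exists
        linked.append(row)
    return linked


linked = attach_scores(respondents, oug_to_score)

# Unmatched codes should be listed and documented rather than silently dropped.
unmatched = [row["id"] for row in linked if row["occ_score"] is None]
print(linked, unmatched)
```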
Model of parental occupational advantage predicted by own occupation and by gender and age, Britain in 1991 (from Lambert & Bihagen 2012) • Becomes easier to derive and compare multiple variable operationalisations, and for researchers to deposit new tools on occupations, educational qualifications and ethnicity
(b) Using GEEDE for data on educational qualifications ‘Educational unit groups’ are qualification listings in UK/beyond • British Qualifications (Kogan Page, 2010) • Qualifications categories of major surveys • BHPS, LFS, Census, etc • LFS time series standard measure • UCAS degree codes • ISCED: International Standard Classification of Education • (cf. Schneider, 2008) • IPUMS: Census measures over 100 years and 65 countries • LIS: LFS measures over 50 years and 30 countries
Project specific documentation is often well distributed e.g. www.lisproject.org
Example of a new measure: A CAMSIS model for educational qualifications • A common way of scaling occupational data is to analyse social interaction patterns between the incumbents of occupations and depict the dimension of social interaction distance as an indicator of stratification • CAMSIS approach (www.camsis.stir.ac.uk) • Neutral empirical approach, independent of occupational units, comparable across contexts (e.g. Prandy and Jones, 2001) • Same analysis could be applied to qualifications data (see Lambert 2012)
(c) Using GEMDE for data on ethnicity Working with ethnicity data in surveys is hard… - sparse - collinear (e.g. to age, location) - dynamic (cf. comparative research)
GEMDE supports replicability/transparency by promoting data on MUGs and MIRs… Information about ‘Minority Unit Groups’ and ‘Minority Information Resources’ • Document your own recodes, notes, etc • Access somebody else’s recodes/notes/metadata • Identify commonly used recodes (& use them..!)
What's a MIR? • 'Minority Information Resource'. • This is our own terminology. By a MIR, we mean any piece of information which supplies systematic data on a minority unit group (MUG) classification. We've used this term to be deliberately similar to the phrase 'Occupational Information Resources' that we used on GEODE • E.g. summary statistical data about the categories from and documentation or information • E.g. recodings which have been used in a particular study • Social scientists are not in general aware of the existence of MIRs (cf. wide use of popular Occupational Information Resources). In GEMDE we seek to publicise little-known resources and promote their uptake: we argue that better communication and dissemination of MIRs is an important step towards better scientific practice of replication and standardisation of research. • In our terms, every MIR necessarily links to a MUG (but not every MUG has a MIR).
As well as file/info. retrieval and depositing functions, GEMDE also permits some bespoke data analysis
Summary: Dealing with key variables in social research • Plurality of variable options makes a scientific case for derivation, comparison & documentation + Relevance of scaling categorical data • DAMES’ GESDE services for storing metadata and facilitating variable derivation, documentation Send your metadata into GESDE..! See the practical session… • Various other projects also exist supporting variable analysis • MethodBox (www.methodbox.org) and ADLS (www.adls.ac.uk) • e-Stat project’s e-Books (www.bristol.ac.uk/cmm/research/estat/)
References • Bosveld, K., Connolly, H., Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics. • Burgess, R. G. (Ed.). (1986). Key Variables in Social Investigation. London: Routledge. • Charles, M., & Grusky, D. B. (2004). Occupational Ghettos: The Worldwide Segregation of Women and Men. Stanford: Stanford University Press. • Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158. • Hoffmeyer-Zlotnik, J. H. P., & Wolf, C. (Eds.). (2003). Advances in Cross-national Comparison: A European Working Book for Demographic and Socio-economic Variables. Berlin: Kluwer Academic / Plenum Publishers. • Kogan Page Editorial Staff. (2010). British Qualifications 2010: A Complete Guide to Professional, Vocational and Academic Qualifications in the UK. London: Kogan Page. • Lambert, P. S., & Bihagen, E. (2012, under review). Concepts and Measures in Occupation-based Social Classifications. • Lambert, P. S. (2012). Comparative scaling of educational categories by homogamy – Analysis of UK data from the BHPS. Stirling: University of Stirling, Technical paper 2012-1 of the DAMES Node, Data Management through e-Social Science, www.dames.org.uk/publications.html. • Li, Y., & Heath, A. F. (2008). Socio-Economic Position and Political Support of Black and Ethnic Minority Groups in the United Kingdom, 1972-2005 [computer file]. 2nd Edition. Colchester: UK Data Archive [distributor], SN: 5666. • Mohler, P. P., Pennell, B.-E., & Hubbard, F. (2008). Survey Documentation: Toward professional knowledge management in sample surveys. In E. De Leeuw, J. Hox & D. A. Dillman (Eds.), International Handbook of Survey Methodology (pp. 403-420). Hove: Psychology Press. • Prandy, K., & Jones, F. L. (2001). An international comparative analysis of marriage patterns and social stratification. International Journal of Sociology and Social Policy, 21, 165-183.
• Raftery, A. E. (2001). Statistics in Sociology, 1950-2000: A selective review. Sociological Methodology, 31, 1-46. • Rose, D., & Harrison, E. (Eds.). (2010). Social Class in Europe: An Introduction to the European Socio-economic Classification. London: Routledge. • Schneider, S. L. (2010). Nominal comparability is not enough: (In-)Equivalence of construct validity of cross-national measures of educational attainment in the European Social Survey. Research in Social Stratification and Mobility. • Simpson, L., & Akinwale, B. (2006). Quantifying Stability and Change in Ethnic Group. Manchester: University of Manchester, CCSR Working Paper 2006-05. • Stacey, M. (Ed.). (1969). Comparability in Social Research. London: Heinemann (on behalf of the British Sociological Association). • Treiman, D. J. (2007). The Legacy of Apartheid: Racial Inequalities in the New South Africa. In A. F. Heath & S. Y. Cheung (Eds.), Unequal Chances: Ethnic Minorities in Western Labour Markets. Oxford: Oxford University Press, for the British Academy. • Wright, E. O. (1997). Class Counts: Comparative Studies in Class Analysis. Cambridge: Cambridge University Press.