Cross-national data in DAMES and GE*DE. Paul Lambert, University of Stirling. Prepared for the Workshop on Cross-Nationally comparative social survey research, Fifth International Conference on e-Social Science, Cologne, 24th June 2009
This talk presents materials from the DAMES Node, an ESRC-funded research Node of the National Centre for e-Social Science: www.dames.org.uk
Today’s workshop: ‘Where next?’ • Problems / challenges with cross-national survey analysis • Quantity of data (and metadata) • Debates on harmonisation, equivalence, data quality • Access to data • The contribution of e-social science
Why is e-Science relevant? • e-Science models cover distributed computing & enabling of collaborations [e.g. Foster et al., 2001] • e-Social Science directed to research infrastructures for collaboration, and for supporting the lifecycle of data oriented research [e.g. Halfpenny & Procter, 2009] • Cross-national survey projects include complex distributed data & a clear need for collaborations… • Hitherto, cross-national survey projects have not generally made use of e-science initiatives
Part 1: What is e-Social Science doing for cross-national survey research? • Projects on the research lifecycle • data collection • data management [DAMES] • data analysis • Projects on a national scale • Projects on data, but not necessarily survey data [e.g. digital records; aggregate data; metadata]
‘Data management’ means… • ‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’ […DAMES Node..] • Usually performed by social scientists themselves • Most overt in quantitative survey data analysis • Preparing or ‘enabling’ survey analysis • Usually a substantial component of the work process • But not explicitly rewarded (and sometimes penalised) • Here we differentiate from archiving / controlling data itself
‘The significance of data management for social survey research’ (see http://www.esds.ac.uk/news/eventdetail.asp?id=2151) • The data management studied across the DAMES Node is a major component of the social survey research workload • Pre-release manipulations performed by distributors / archivists • Coding measures into standard categories • Dealing with missing records • Post-release manipulations performed by researchers • Re-coding measures into simple categories • We do have existing tools, facilities and expert experience to help us… but we don’t use them efficiently or consistently • So the ‘significance’ of DM is about how much better research might be if we did things more effectively…
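As an illustration of the post-release manipulations listed on this slide, the following is a minimal Python sketch of re-coding a detailed measure into simple categories while treating missing records explicitly. The category labels, the missing-value codes and the function name `recode` are all hypothetical and chosen only for illustration; they are not taken from any DAMES or GE*DE resource.

```python
# Hypothetical recode table: detailed category -> simple category.
RECODE = {
    "degree": "high",
    "a_level": "medium",
    "gcse": "low",
    "none": "low",
}

# Hypothetical codes that a data distributor might use for missing records.
MISSING_CODES = {"", "-9", "refused"}


def recode(value, table=RECODE, missing=MISSING_CODES):
    """Return the simple category, None for missing values, and
    raise KeyError for any category the table does not cover."""
    if value is None:
        return None
    key = value.strip().lower()
    if key in missing:
        return None
    if key not in table:
        # Failing loudly keeps unmapped codes from being silently dropped.
        raise KeyError(f"unmapped category: {value!r}")
    return table[key]


records = ["Degree", "gcse", "-9", "A_level"]
recoded = [recode(v) for v in records]
```

Keeping the recode table as data, rather than burying it in a chain of conditionals, is one way of making the same manipulation easy to reuse and to document.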
In GE*DE, we’re developing • Services for accessing and depositing specialist data • Occupations, educational qualifications, ethnicity • UK Administrative data (with ADLS) • Materials specifically oriented to comparative analytical approaches • Data resources often from major cross-national studies • Producing new cross-national data resources • (see also talk on standardization of categorical data in session 4a)
GEODE v1: Organising and distributing specialist data resources (on occupations)
Cross-national data in DAMES and GE*DE • New specialist data on occupations, education and ethnicity • Curation and re-release of existing data • Generation of new data (and/or metadata), with focus on standardisation/ harmonisation • Conduit to existing resources • Generic resources for workflow documentation and replication
Part 2: The contribution of e-Science The contribution should concern: • Navigating complex data • Security • Workflows • Compare with current issues for cross-national surveys: • Quantity of data (and metadata) • Debates on harmonisation, equivalence, data quality • Access to data
(a) Quantity of data (& metadata) … current trends • Moving beyond macro-data analysis (i.e. country-level analysis, e.g. Fuchs, 2009) to exploiting large-scale micro-data • Interest in / access to secure micro-data • Exploitation of complex micro-data • Longitudinal data and the life-course [Mayer, 2005] • Micro-data and links with macro-data • Metadata about the quality of the micro-data
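The link between micro-data and macro-data mentioned on this slide can be sketched as attaching country-level indicators to individual-level records. This is a minimal sketch only: the country codes `AA`, `BB`, `CC`, the indicator name `indicator_x`, its values and the function `attach_macro` are placeholders invented for the example, not real data or an existing service.

```python
# Placeholder country-level (macro) indicators keyed by country code.
macro = {
    "AA": {"indicator_x": 1.0},
    "BB": {"indicator_x": 2.0},
}

# Placeholder individual-level (micro) records.
micro = [
    {"id": 1, "country": "AA", "income": 2100},
    {"id": 2, "country": "BB", "income": 1900},
    {"id": 3, "country": "CC", "income": 2000},
]


def attach_macro(rows, indicators, fields=("indicator_x",)):
    """Add the named country-level fields to each individual record.

    Records whose country has no macro entry receive None for each
    field, and their ids are returned so the gap can be reported.
    """
    linked, unmatched = [], []
    for row in rows:
        context = indicators.get(row["country"])
        if context is None:
            unmatched.append(row["id"])
            context = {name: None for name in fields}
        linked.append({**row, **{name: context.get(name) for name in fields}})
    return linked, unmatched


linked, unmatched = attach_macro(micro, macro)
```

Returning the list of unmatched records alongside the linked data is a simple form of the quality metadata that the slide calls for.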
(a) … can be helped by… • Interest in / access to secure micro-data: e-Science projects building portals for secure access to data (e.g. Sinnott 2008) • Exploitation of complex micro-data: services for organising complex data (e.g. GE*DE); metadata provision on data resources (e.g. PolicyGrid); comparative standardisations (e.g. GE*DE); tools for complex analysis (e.g. e-Stat); tools for simulation (e.g. NeISS); tools for visualisation of complex data (e.g. Maptube); tools for workflow records for the research lifecycle (cf. MyExperiment)
(b) Harmonisation, equivalence and data quality • Variable manipulations require standardisation through measurement or meaning equivalence, and adequate documentation / justification for those manipulations • E-Science resources support • Documenting / replicating ex post harmonisations, e.g. syntax databases at GE*DE • Furnishing new scaling tools (meaning equivalence), e.g. scales of educational qualifications at GE*DE • Facilitating manipulations and standardisations, e.g. user-friendly services on variables at GE*DE to enable plurality of alternative measures • Pluralistic / open source vs. quality control
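A minimal Python sketch of what documenting and replicating an ex post harmonisation might involve is given below. It assumes a hypothetical mapping from country-specific qualification codes to a common scheme; the country codes, qualification labels, target categories and function names are invented for illustration and do not reproduce the GE*DE syntax databases or any real classification.

```python
import json

# Hypothetical harmonisation rules:
# (country, national qualification code) -> common category.
HARMONISATION = {
    ("AA", "qual_a"): "common_low",
    ("AA", "qual_b"): "common_high",
    ("BB", "cert_1"): "common_low",
    ("BB", "cert_2"): "common_high",
}


def harmonise(rows, mapping):
    """Add a 'harmonised' field to each record; unmapped codes become None."""
    out = []
    for row in rows:
        key = (row["country"], row["qualification"])
        out.append({**row, "harmonised": mapping.get(key)})
    return out


def mapping_as_document(mapping, note):
    """Serialise the rules with a free-text justification so that the
    manipulation can be archived, shared and re-applied by others."""
    rules = [
        {"country": country, "source": source, "target": target}
        for (country, source), target in sorted(mapping.items())
    ]
    return json.dumps({"note": note, "rules": rules}, indent=2)


rows = [
    {"id": 1, "country": "AA", "qualification": "qual_b"},
    {"id": 2, "country": "BB", "qualification": "cert_1"},
    {"id": 3, "country": "BB", "qualification": "cert_9"},
]
harmonised = harmonise(rows, HARMONISATION)
document = mapping_as_document(HARMONISATION, note="illustrative rules only")
```

Because the rules live in a plain data structure, several alternative mappings can be stored side by side and applied in turn, which is one way to support the plurality of measures discussed above.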
More on GE*DE and issues of data quality • GE*DE covers Occupations; Educational qualifications; Ethnicity and migration • These are ‘key variables’ in social science research • Regularly measured • Link to concepts of central interest • Multivariate context (Critical relations with gender, age cohort, etc)
(c) Access to data … need for • Facilities for granting access to data, including new [potentially secure] data • Distribution of suitably detailed metadata [cf. highly selective approach of existing projects, and benefits of pre-harmonisation accordingly] • e-Social Science contributions • Security infrastructures (e.g. portal frameworks) offer much stronger models for secure access to data • Services for organising / distributing metadata
The contribution of e-Science - reflections The contribution should concern: • Navigating complex data • Security • Workflows • But, generally, it isn’t taken up (cf. existing networks, e.g. LIS, IPUMS, ESS, etc)
Possible explanations • e-Science tools and services too heavyweight compared to ad hoc sharing solutions • Overheads in adopting e-Science tools (cf. existing working models) • e-Science tools are unduly generic (cf. ongoing focussed projects and related resources) • Working habits: experts and software • Major cross-national projects pre-date e-Science initiatives • Key role of project-specific experts • Many projects are ‘small N’ and don’t seem to require heavyweight inputs • Survey researchers collaborate through proprietary software (e.g. Stata, SPSS)
Conclusions – will things change? • Overheads of e-Science engagement might decline • GE*DE aims: user friendly services, service delivery emphasis, training workshops, mainstream software • Existing ad hoc practices could become insufficient • Data of greater scale and complexity • Data with security limits • Need for integrated access and complex analysis • Need for plurality in analyses of multiple measures (even in ‘Small N’ comparisons) • Need for documentation for replication
References cited • Abbott, A. (2006). Mobility: What? When? How? In S. L. Morgan, D. B. Grusky & G. S. Fields (Eds.), Mobility and Inequality. Stanford: Stanford University Press. • Atkinson, A. B. (1996). Seeking to explain the distribution of income. In J. Hills (Ed.), New Inequalities: The changing distribution of income and wealth in the United Kingdom. Cambridge: Cambridge University Press. • Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics. • Foster, I., Kesselman, C., & Tuecke, S. (2001). The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 15(3), 200-222. • Fuchs, C. (2009). The Role of Income Inequality in a Multivariate Cross-National Analysis of the Digital Divide. Social Science Computer Review, 27(1), 41-58. • Halfpenny, P., & Procter, R. (2009). Guest editorial: Special issue on e-Social Science. Social Science Computer Review, 27(4). • Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press. • Mayer, K. U. (2005). Life courses and life chances in a comparative perspective. In S. Svallfors (Ed.), Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press. • Minnesota Population Center. (2009). Integrated Public Use Microdata Series - International: Version 5.0. Minneapolis: University of Minnesota. • Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES. • Sinnott, R. O. (2008). Grid Security. In L. Wang, W. Jie & J. Chen (Eds.), Grid Computing: Technology, Service and Applications. London: CRC Press. • Stewart, K., Sefton, T., & Hills, J. (2009). Introduction. In J. Hills, T. Sefton & K. Stewart (Eds.), Towards a more equal society? Poverty, inequality and policy since 1997. Bristol: The Policy Press. • Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.