
Cross-national data in DAMES and GE*DE

Paul Lambert, University of Stirling. Prepared for the Workshop on Cross-Nationally comparative social survey research, Fifth International Conference on e-Social Science, Cologne, 24th June 2009.


Presentation Transcript


  1. Cross-national data in DAMES and GE*DE. Paul Lambert, University of Stirling. Prepared for the Workshop on Cross-Nationally comparative social survey research, Fifth International Conference on e-Social Science, Cologne, 24th June 2009. This talk presents materials from the DAMES Node, an ESRC-funded research Node of the National Centre for e-Social Science: www.dames.org.uk

  2. Some recent history – Atkinson (1996: 47)

  3. Stewart et al. (2009: 5)

  4. Today’s workshop: ‘Where next?’ • Problems / challenges with cross-national survey analysis • Quantity of data (and metadata) • Debates on harmonisation, equivalence, data quality • Access to data • The contribution of e-social science

  5. Why is e-Science relevant? • e-Science models cover distributed computing & enabling of collaborations [e.g. Foster et al., 2001] • e-Social Science directed to research infrastructures for collaboration, and for supporting the lifecycle of data oriented research [e.g. Halfpenny & Procter, 2009] • Cross-national survey projects include complex distributed data & a clear need for collaborations… • Hitherto, cross-national survey projects have not generally made use of e-science initiatives

  6. Part 1: What is e-Social Science doing for cross-national survey research? • Projects on the research lifecycle • data collection • data management [DAMES] • data analysis • Projects on a national scale • Projects on data, but not necessarily survey data [e.g. digital records; aggregate data; metadata]

  7. The example of DAMES and GE*DE www.dames.org.uk

  8. ‘Data management’ means… • ‘the tasks associated with linking related data resources, with coding and re-coding data in a consistent manner, and with accessing related data resources and combining them within the process of analysis’ […DAMES Node..] • Usually performed by social scientists themselves • Most overt in quantitative survey data analysis • Preparing or ‘enabling’ survey analysis • Usually a substantial component of the work process • But not explicitly rewarded (and sometimes penalised) • Here we differentiate from archiving / controlling data itself
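The 'linking related data resources' task in the definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the occupation codes, the scores and the helper `link_scores` are all hypothetical and are not taken from DAMES or GE*DE.

```python
# Hypothetical lookup table: occupation code -> illustrative score.
# The codes and values are made up for this sketch.
OCC_SCORES = {"1000": 70.0, "2000": 65.5, "9000": 30.0}

# Hypothetical survey micro-data, one dict per respondent.
survey = [
    {"id": 1, "country": "UK", "occ": "1000"},
    {"id": 2, "country": "DE", "occ": "9000"},
    {"id": 3, "country": "UK", "occ": "5555"},  # code absent from the lookup
]


def link_scores(records, lookup):
    """Return copies of the records with a 'score' field added.

    Unmatched occupation codes get score None, so that they can be
    reported rather than silently dropped.
    """
    linked = []
    for rec in records:
        row = dict(rec)  # copy, so the source data stay untouched
        row["score"] = lookup.get(rec["occ"])
        linked.append(row)
    return linked


linked = link_scores(survey, OCC_SCORES)
unmatched = [row["id"] for row in linked if row["score"] is None]
print("Respondents with no matching occupation score:", unmatched)
```

Keeping unmatched cases visible, instead of dropping them during the merge, is one simple way of making the linkage step reviewable by others.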

  9. ‘The significance of data management for social survey research’ (see http://www.esds.ac.uk/news/eventdetail.asp?id=2151) • The data management studied across the DAMES Node is a major component of the social survey research workload • Pre-release manipulations performed by distributors / archivists • Coding measures into standard categories • Dealing with missing records • Post-release manipulations performed by researchers • Re-coding measures into simple categories • We do have existing tools, facilities and expert experience to help us… but we don’t make a good job of using them efficiently or consistently • So the ‘significance’ of DM is about how much better research might be if we did things more effectively…
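Two of the manipulations listed above, dealing with missing records and re-coding measures into simple categories, might look like the following minimal sketch. The sentinel values in `MISSING_CODES` and the five-to-three category mapping in `RECODE` are assumptions chosen for illustration, not a coding scheme from any particular survey.

```python
# Assumed sentinel values that a distributor might use for missing data.
MISSING_CODES = {-9, -8, -1}

# Hypothetical recode of a detailed 5-point measure into 3 simple categories.
RECODE = {1: "low", 2: "low", 3: "medium", 4: "high", 5: "high"}


def recode(value):
    """Map a raw code to a simple category; missing values become None."""
    if value is None or value in MISSING_CODES:
        return None
    if value not in RECODE:
        # Fail loudly on undocumented codes instead of guessing a category.
        raise ValueError(f"Unexpected code: {value!r}")
    return RECODE[value]


raw = [1, 3, -9, 5, None, 2]
simple = [recode(v) for v in raw]
n_missing = sum(1 for v in simple if v is None)
print(simple, "missing:", n_missing)
```

Holding the mapping in one named table, rather than scattering conditions through the analysis, is what allows the same recode to be applied consistently across datasets.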

  10. In GE*DE, we’re developing • Services for accessing and depositing specialist data • Occupations, educational qualifications, ethnicity • UK Administrative data (with ADLS) • Materials specifically oriented to comparative analytical approaches • Data resources often from major cross-national studies • Producing new cross-national data resources • (see also talk on standardization of categorical data in session 4a)

  11. GEODE v1: Organising and distributing specialist data resources (on occupations)

  12. Cross-national data in DAMES and GE*DE • New specialist data on occupations, education and ethnicity • Curation and re-release of existing data • Generation of new data (and/or metadata), with focus on standardisation/ harmonisation • Conduit to existing resources • Generic resources for workflow documentation and replication

  13. E.g. (1a) Occupations [cf. Leiulfsrud et al. 2005]

  14. E.g. (1b) Ethnicity / Migration

  15. E.g. (2): Occupations

  16. E.g. (3): Workflow documentation

  17. Part 2: The contribution of e-Science The contribution should concern: • Navigating complex data • Security • Workflows • Compare with current issues for cross-national surveys: • Quantity of data (and metadata) • Debates on harmonisation, equivalence, data quality • Access to data

  18. (a) Quantity of data (& metadata) … current trends • Moving beyond macro-data analysis (country-level analysis, e.g. Fuchs, 2009) to exploiting large-scale micro-data • Interest in / access to secure micro-data • Exploitation of complex micro-data • Longitudinal data and the life-course [Mayer, 2005] • Micro-data and links with macro-data • Metadata about the quality of the micro-data

  19. (a) … can be helped by… • Interest in / access to secure micro-data: e-Science projects building portals for secure access to data (e.g. Sinnott 2008) • Exploitation of complex micro-data: services for organising complex data (e.g. GE*DE); metadata provision on data resources (e.g. PolicyGrid); comparative standardisations (e.g. GE*DE); tools for complex analysis (e.g. e-Stat); tools for simulation (e.g. NeISS); tools for visualisation of complex data (e.g. Maptube); tools for workflow records for the research lifecycle (cf. MyExperiment)

  20. (b) Harmonisation, equivalence and data quality • Variable manipulations require standardization through measurement or meaning equivalence, and adequate documentation / justification for those manipulations • E-Science resources support • Documenting / replicating ex post harmonisations, e.g. syntax databases at GE*DE • Furnishing new scaling tools (meaning equivalence), e.g. scales of educational qualifications at GE*DE • Facilitating manipulations and standardizations, e.g. user-friendly services on variables at GE*DE to enable plurality of alternative measures • Pluralistic / open source vs. quality control
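One way to picture 'documenting / replicating ex post harmonisations' is to hold the harmonisation as data that can be checked, saved and re-loaded, instead of as one-off commands. The sketch below assumes two hypothetical countries with made-up qualification codes and a made-up three-category target scheme; it is not the format of the GE*DE syntax databases.

```python
import json

# Hypothetical harmonisation specification. Country names, source codes
# and target categories are all illustrative assumptions.
HARMONISATION = {
    "version": "illustrative-0.1",
    "target": ["lower", "intermediate", "higher"],
    "maps": {
        "country_A": {"A1": "lower", "A2": "intermediate", "A3": "higher"},
        "country_B": {"B1": "lower", "B2": "lower",
                      "B3": "intermediate", "B4": "higher"},
    },
}


def harmonise(country, code, spec=HARMONISATION):
    """Translate a national qualification code into the common scheme."""
    return spec["maps"][country][code]


def is_consistent(spec):
    """Check that every source code maps to a declared target category."""
    targets = set(spec["target"])
    return all(
        category in targets
        for mapping in spec["maps"].values()
        for category in mapping.values()
    )


# Serialise the specification so that others can inspect and replicate it.
spec_text = json.dumps(HARMONISATION, sort_keys=True, indent=2)
restored = json.loads(spec_text)

print(harmonise("country_B", "B2"), is_consistent(restored))
```

Because the specification is plain data, an alternative mapping can be stored alongside it under a different version label, which fits the emphasis above on a plurality of alternative measures.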

  21. More on GE*DE and issues of data quality • GE*DE covers Occupations; Educational qualifications; Ethnicity and migration • These are ‘key variables’ in social science research • Regularly measured • Link to concepts of central interest • Multivariate context (Critical relations with gender, age cohort, etc)

  22. Key variables: concepts and measures

  23. (c) Access to data … need for • Facilities for granting access to data, including new [potentially secure] data • Distribution of suitably detailed metadata [cf. highly selective approach of existing projects, and benefits of pre-harmonisation accordingly] • E-Social Science contributions • Security infrastructures (e.g. portal frameworks) offer much stronger models for secure access to data • Services for organising / distributing metadata

  24. The contribution of e-Science - reflections The contribution should concern: • Navigating complex data • Security • Workflows • But, generally, it isn’t taken up (cf. existing networks, e.g. LIS, IPUMS, ESS, etc)

  25. Possible explanations • E-science tools and services too heavyweight compared to ad hoc sharing solutions • Overheads in adopting e-Science tools (cf. existing working models) • E-science tools are unduly generic (cf. ongoing focussed projects and related resources) • Working habits: Experts and software • Major cross-national projects pre-date e-Science initiatives • Key role of project-specific experts • Many projects are ‘small N’ and don’t seem to require heavyweight inputs • Survey researchers collaborate through proprietary software (e.g. Stata, SPSS)

  26. Conclusions – will things change? • Overheads of e-Science engagement might decline • GE*DE aims: user friendly services, service delivery emphasis, training workshops, mainstream software • Existing ad hoc practices could become insufficient • Data of greater scale and complexity • Data with security limits • Need for integrated access and complex analysis • Need for plurality in analyses of multiple measures (even in ‘Small N’ comparisons) • Need for documentation for replication

  27. References cited • Abbott, A. (2006). Mobility: What? When? How? In S. L. Morgan, D. B. Grusky & G. S. Fields (Eds.), Mobility and Inequality. Stanford: Stanford University Press. • Atkinson, A. B. (1996). Seeking to explain the distribution of income. In J. Hills (Ed.), New Inequalities: The changing distribution of income and wealth in the United Kingdom. Cambridge: Cambridge University Press. • Bosveld, K., Connolly, H., & Rendall, M. S. (2006). A guide to comparing 1991 and 2001 Census ethnic group data. London: Office for National Statistics. • Foster, I., Kesselman, C., & Tuecke, S. (2001). The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, 15(3), 200-222. • Fuchs, C. (2009). The Role of Income Inequality in a Multivariate Cross-National Analysis of the Digital Divide. Social Science Computer Review, 27(1), 41-58. • Halfpenny, P., & Procter, R. (2009). Guest editorial: Special issue on e-Social Science. Social Science Computer Review, 27(4). • Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press. • Mayer, K. U. (2005). Life courses and life chances in a comparative perspective. In S. Svallfors (Ed.), Analyzing Inequality: Life Chances and Social Mobility in Comparative Perspective. Stanford: Stanford University Press. • Minnesota Population Center. (2009). Integrated Public Use Microdata Series - International: Version 5.0. Minneapolis: University of Minnesota. • Schneider, S. L. (2008). The International Standard Classification of Education (ISCED-97). An Evaluation of Content and Criterion Validity for 15 European Countries. Mannheim: MZES. • Sinnott, R. O. (2008). Grid Security. In L. Wang, W. Jie & J. Chen (Eds.), Grid Computing: Technology, Service and Applications. London: CRC Press. • Stewart, K., Sefton, T., & Hills, J. (2009). Introduction. In J. Hills, T. Sefton & K. Stewart (Eds.), Towards a more equal society? Poverty, inequality and policy since 1997. Bristol: The Policy Press. • Treiman, D. J. (2009). Quantitative Data Analysis: Doing Social Research to Test Ideas. New York: Jossey Bass.
