
Paul Lambert University of Stirling Vernon Gayle University of Stirling & ISER

E-science resources for handling data on occupations, educational qualifications and ethnicity – the DAMES and GEODE projects


Presentation Transcript


  1. E-science resources for handling data on occupations, educational qualifications and ethnicity – the DAMES and GEODE projects Paul Lambert University of Stirling Vernon Gayle University of Stirling & ISER

  2. Structure of this talk • What is e-Science • What is the Grid • What can e-Social Science do for survey research? • Grid Enabled Specialist Data Environments • Specialist files and resources • A little on occupations • Something on education • Almost nothing on ethnicity (it is only 30 mins)

  3. What is e-Science • Originally experiments to connect together a few powerful computers • The ability to connect high powered computers to undertake enormous calculations often on huge datasets • “The Grid” = the co-ordination of geographically dispersed computing and data resources

  4. What is e-Science • “What is exciting about the Grid is the combination of extensive connectivity, massive computer power and vast quantities of digitised data – all three of which are still rapidly expanding – making possible new applications that are orders of magnitude more potent than even a few years ago” • “The term 'e-research' is sometimes used instead of 'e-science', with the advantage that gives more emphasis to the end result of better, richer, faster or new research results, rather than the technologies used to get them” (http://www.ncess.ac.uk/)

  5. The Grid • Grid computing (or the use of a computational grid) is the application of several computers to a single problem at the same time • usually to a scientific or technical problem that requires a great number of computer processing cycles or access to large amounts of data • According to John Patrick, IBM’s vice president for Internet strategies, “the next big thing will be grid computing”

  6. E-Social Science in the UK • ‘e-Science’ is nowadays used as a broader term, covering technologies associated with the Grid and other collaborations between computing and software resources • NCeSS: UK programme of projects looking at e-Science applications in social science (e.g. distributed computing; access and analysis of complex data; secure access to sensitive data)

  7. Less obviously, there are other activities where e-science could help the research process: Data Preparation & Management • Manipulating data • Recoding categories / ‘operationalising’ variables. These are the focus of the DAMES research Node, www.dames.org.uk

  8. The Orthodox Survey Research Process • Data Collection (survey data agencies, academics etc.) • Data Storage & Curation (data archives etc.) • Data Management and Analysis – “Lone Researcher”: stand-alone computer (usually a PC), statistical software (e.g. SPSS)

  9. Statistical Analysis Process • Awesome increases in desktop computing power (and storage capacity) • Almost instant data download (from archives etc.) • The time ratio of data preparation to statistical modelling is probably about 10:1

  10. e-Social Science Possibilities Software (e.g. Sabre R) Data Linking computing networks secure access Harmonisation

  11. e-Social Science Possibilities Software (e.g. Sabre R) Specialist Files & Resources Data Linking computing networks secure access Harmonisation

  12. Grid Enabled Specialist Data Environments (‘GE*DE’) • Programme of activities within the DAMES research Node • Coordinating access to and exploitation of specialist information resources in the fields of • Occupations • Educational qualifications • Ethnicity and migration

  13. An Example: Specialist Files & Resources • A researcher has a survey with occupational information and wants to construct an occupation based social class measure

  14. An Example: Specialist Files & Resources • Historically, the ‘information’ needed to construct the measure has come in the following forms • books (or paper files) • www files (e.g. from national statistical agencies) • computer files (e.g. Stata .do files)

  15. An Example: Specialist Files & Resources. Clear problems... • Access to the information • are the files publicly (or easily) available? • e.g. a “.do file” on a single researcher’s hard drive • Unnecessary re-working • certain sources (e.g. paper) need a lot of work to produce properly coded survey data

  16. Replicability • Is there clear information that allows a secondary researcher to use the resource? • e.g. clear documentation • what the information science community calls metadata, i.e. “data about other data” • See Dale (2006) for a discussion
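
To make the idea of metadata as “data about other data” concrete, here is a minimal sketch of a documentation check on a recoding resource. It is illustrative only: the field names, the example records and the `missing_fields` helper are assumptions made for this example, not a schema used by DAMES or GEODE.

```python
# Hypothetical list of fields a secondary researcher would need
# in order to reuse a recoding resource.
REQUIRED_FIELDS = (
    "title",
    "creator",
    "source_classification",
    "target_measure",
    "file_format",
    "documentation",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# A ".do file" sitting on one researcher's hard drive, barely described.
undocumented = {
    "title": "SOC 90 to social class recode",
    "file_format": "Stata .do file",
}

# The same resource with enough metadata for someone else to reuse it.
documented = {
    "title": "SOC 90 to social class recode",
    "creator": "A. Researcher (hypothetical)",
    "source_classification": "SOC 90",
    "target_measure": "occupation-based social class",
    "file_format": "Stata .do file",
    "documentation": "Describes each recoding step and known limitations",
}

print(missing_fields(undocumented))
print(missing_fields(documented))
```

A check like this does not make a resource replicable by itself, but it flags the gaps that stop a secondary researcher from using it.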

  17. Motivation for our projects? • We currently observe inadequate practices in survey data analysis • Substandard data management practices are observed in the following areas • Not keeping adequate records • Not linking relevant data • Not trying out relevant variable operationalisations

  18. Key Variables • We concentrate (so far) on “key” variables. These are variables that are central to, and commonly found in, survey data analysis. They include occupation, education, ethnicity, gender, age and income (some survey variables are easier to deal with than others).

  19. An Example: Occupational Social Class • As far back as the late 1960s, Frank Bechhofer recommended that researchers use established (and therefore replicable) social class schemes • Return to the researcher who has a survey with occupational information (e.g. SOC 90) and employment status information, and who wants to construct a social class measure

  20. GEODE • Portal to log into • Searchable – can find resources • e.g. an SPSS file that allows researchers to link their survey data to an occupation-based social class scheme • Further examples are in our working paper (2008-1)
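
The linkage step described here, attaching an occupation-based social class to survey records, can be sketched as a lookup keyed on occupational unit group and employment status. This is a minimal sketch under stated assumptions: the codes, the class labels, the field names and both helper functions are invented for illustration and do not reproduce any real SOC 90 coding or GEODE file.

```python
# Hypothetical lookup table: (SOC 90 unit group, employment status) -> class.
# Codes and class labels are placeholders, not a real scheme.
CLASS_LOOKUP = {
    ("100", "employee"): "Class A",
    ("100", "self-employed"): "Class B",
    ("200", "employee"): "Class C",
}


def link_social_class(records, lookup):
    """Return copies of the survey records with a 'social_class' field added.

    Records whose (occupation, status) pair is not in the lookup get None,
    so that failed matches stay visible instead of being silently dropped.
    """
    linked = []
    for rec in records:
        key = (rec["soc90"], rec["emp_status"])
        linked.append({**rec, "social_class": lookup.get(key)})
    return linked


def unmatched_ids(linked):
    """List the ids of records that could not be assigned a class."""
    return [rec["id"] for rec in linked if rec["social_class"] is None]


# Invented survey records.
survey = [
    {"id": 1, "soc90": "100", "emp_status": "employee"},
    {"id": 2, "soc90": "100", "emp_status": "self-employed"},
    {"id": 3, "soc90": "999", "emp_status": "employee"},
]

linked = link_social_class(survey, CLASS_LOOKUP)
print([rec["social_class"] for rec in linked])
print(unmatched_ids(linked))
```

Keeping the lookup as a separate, shareable file and reporting unmatched cases is what lets a second researcher repeat the derivation.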

  21. Education: • Education is a key social science measure that is included in an extremely wide variety of substantive analyses • Education as an explanatory (X) variable: education is frequently used in statistical analyses as a key explanatory variable (usually alongside a number of other explanatory variables). This is usual in areas such as sociology, social policy and economics.

  22. Education: • Education as an outcome (Y) variable: in more specialist studies an education measure is itself of interest as an outcome (for example, gaining a specific qualification or level of attainment). This is common in educational studies and within the sociology of education.

  23. Education: “the question of how to measure education and qualifications – or indeed what ‘measure’ means – raises interesting issues…Since there is no agreed standard way of categorising educational qualifications” (Prandy, Unt & Lambert 2004)

  24. Comparing Education with Occupational information • Survey starts with textual description • Translated into Occupational Unit Group (OUG) • Agreed standards of data collection & classification (OUG scheme; industrial sector; employment status) • No similar consensus with educational data

  25. Obvious issues with Educational variables • Many measures (not just qualifications) • Organisation and structure changes • Changes in distributions over time • We can learn from international comparisons

  26. Many Measures: Some Examples of the 41 Categories of Highest Qualification (General Household Survey 2003)

      highest qualification                   |      Freq.
      ----------------------------------------+------------
      1. higher degree                        |        669
      2. nvq level 5                          |         20
      3. first degree                         |      1,416
      4. other degree                         |        278
      5. nvq level 4                          |         71
      6. diploma in higher education          |        282
      7. hnc/hnd btec higher etc              |        551
      9. teaching - secondary education       |         55
      10. teaching - primary education        |         69
      12. nursing etc                         |        267
      14. other higher education below degree |        151
      21. scotish 6th year certificate/csys   |         24
      28. city & guilds craft/part 2          |        306
      29. btec/scotvec first or gen diploma e |         42
      30. o level, gcse grase a*-c or equival |      2,460
      31. nvq level 1 or equivalent           |        102
      33. gse below grade 1, gcse below grade |        693
      41. dont know                           |         79
      ----------------------------------------+------------
      Total                                   |     24,489

  27. Many Measures: Highest Academic Qualification (British Household Panel Survey 1991 – Wave A)

      highest academic qualification |      Freq.     Percent        Cum.
      -------------------------------+-----------------------------------
      -9. missing                    |         19        0.19        0.19
      -7. proxy respondent           |        352        3.43        3.61
      1. higher degree               |        122        1.19        4.80
      2. 1st degree                  |        598        5.83       10.63
      3. hnd,hnc,teaching            |        496        4.83       15.46
      4. a level                     |      1,362       13.27       28.73
      5. o level                     |      2,510       24.45       53.19
      6. cse                         |        529        5.15       58.34
      7. none of these               |      4,276       41.66      100.00
      -------------------------------+-----------------------------------
      Total                          |     10,264      100.00
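
Operationalising a variable like this usually means recoding it into fewer categories and recording how the missing codes were handled. The sketch below uses the BHPS Wave A frequencies from slide 27. The three-category grouping in `RECODE` is a hypothetical choice made only for illustration, not a recommended or standard scheme, and the function names are likewise assumptions.

```python
# Frequencies as reported for BHPS 1991 (Wave A) in slide 27.
BHPS_FREQ = {
    -9: 19,     # missing
    -7: 352,    # proxy respondent
    1: 122,     # higher degree
    2: 598,     # 1st degree
    3: 496,     # hnd, hnc, teaching
    4: 1362,    # a level
    5: 2510,    # o level
    6: 529,     # cse
    7: 4276,    # none of these
}

# Hypothetical three-category recode (an assumption for this example).
RECODE = {
    1: "higher", 2: "higher", 3: "higher",
    4: "intermediate", 5: "intermediate", 6: "intermediate",
    7: "none",
}


def recode_frequencies(freq, recode, missing_label="missing"):
    """Aggregate frequencies under a recode; unmapped codes count as missing."""
    out = {}
    for code, n in freq.items():
        label = recode.get(code, missing_label)
        out[label] = out.get(label, 0) + n
    return out


def percentages(freq):
    """Convert a frequency table into percentages rounded to two decimals."""
    total = sum(freq.values())
    return {key: round(100 * n / total, 2) for key, n in freq.items()}


recoded = recode_frequencies(BHPS_FREQ, RECODE)
print(recoded)
print(percentages(recoded))
```

Because the mapping is an explicit object and not a series of ad hoc edits, a different grouping can be tried by changing `RECODE` alone, and the mapping itself can be archived alongside the analysis.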

  28. Organisational Changes: Type of School Attended by Birth Cohorts, British Household Panel Survey 1991 – Wave A (extract, column percentages)

      type of school        |             cohorts             |
      attended              |    Prewar   1944 Act  Crossland |     Total
      ----------------------+---------------------------------+----------
      comprehensive sch     |         -      10.47      53.25 |     25.92
      grammar not fee pa    |      9.58      19.14       8.06 |     12.10
      grammar fee-paying    |      4.55       1.93       0.97 |      2.25
      public & private      |      5.52       5.63       4.68 |      5.22
      elementary            |     35.20       2.45          - |     10.35
      secondary modern      |         -      52.11      24.01 |     33.64
      technical             |         -       3.49       0.80 |      2.15
      ----------------------+---------------------------------+----------
      1. Suspect errors – potentially misleading measure

  29. Changes in Qualification (titles & levels) • GHS 1983: O Levels • GHS 2003: GCSE

  30. Changes in Distributions: British Household Panel Survey (Wave M), Respondent’s Education Level and Father’s Education Level

  31. We can learn from international comparisons • CASMIN • Brynin (2003): example of BHPS & GSOEP

  32. Can e-Social science help us? • Data discipline • Data matching / merging • Data access (confidential records) (future changes in access agreements)

  33. What should we do in DAMES? • Database of typologies of qualifications linking to broader educational measures • Listings / taxonomies of educational titles • e.g. based on what major social surveys have used • Enhanced access to specialist data on educational qualifications • Same model as GEODE? • User friendly prescriptions for best practice in using educational data • User friendly support for distributing data (and metadata) on education

  34. Ethnicity and the DAMES project • Tricky topic to collate information on • Few recognisable ‘ethnic unit groups’ • Limited previous ‘data management’ reflection • Very few published databases on ethnicity • Important question of sparse distributions • Dynamic, and rapidly expanding (or contracting) • Likely role is to give guidance on existing data / taxonomies and routines to allow their analysis • category recodings / scaling of categories • support for analysis in context of age / gender / region • {GEODE model with far fewer ‘Ethnicity unit groups’}
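
One common response to the sparse distributions mentioned here is a category recoding that pools small groups. The sketch below is a generic illustration, not a DAMES routine: the `collapse_sparse` function, the threshold and the category labels and counts are all invented, and whether pooling is substantively defensible depends on the analysis.

```python
from collections import Counter


def collapse_sparse(values, min_count, other_label="other"):
    """Recode categories with fewer than min_count cases into a pooled label."""
    counts = Counter(values)
    keep = {category for category, n in counts.items() if n >= min_count}
    return [v if v in keep else other_label for v in values]


# Invented category labels and counts, for illustration only.
responses = (
    ["group A"] * 50
    + ["group B"] * 12
    + ["group C"] * 3
    + ["group D"] * 2
)

collapsed = collapse_sparse(responses, min_count=10)
print(Counter(collapsed))
```

Recording the threshold and the list of pooled categories alongside the output keeps the recode replicable.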

  35. Conclusions • e-Social Science resources can help improve survey research • assist with access to disparate resources • help with data management (especially key variables) • help with data standards and best practice • help with replicability (and improve incremental science)

  36. Brynin, M. (2003). Using CASMIN: the effect of education on wages in Britain and Germany, in Hoffmeyer-Zlotnik, J. and Wolf, C., Advances in Cross-National Comparison: A European Working Book for Demographic and Socio-Economic Variables, Kluwer: Amsterdam, 327-44.
      Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
      Lambert, P. S., Tan, K. L. L., Turner, K. J., Gayle, V., Prandy, K., & Sinnott, R. O. (2007). Data Curation Standards and Social Science Occupational Information Resources. International Journal of Digital Curation, 2(1), 73-91.
      Lambert, P. S., Gayle, V., Tan, L., Blum, J., Bowes, A., Jones, S., Turner, K., Warner, G., Sinnott, R., & Bihagen, E. (2008). Grid Enabled Specialist Data Environments: Forward Planning for GE*DE Services for Specialist Data Occupations, Educational Qualifications, and Ethnicity, DAMES Project Technical Paper 2008-1.
      Prandy, K., Unt, M., & Lambert, P. S. (2004). Not by degrees: Education and social reproduction in twentieth-century Britain. Paper presented at the ISA RC28 Research Committee on Social Stratification and Mobility.
