e-Science, Data Management and Frontiers in Survey Research
Paul Lambert, 24-25 November 2010
Talk to the ‘Documentation and Workflows for Social Survey Research’ training workshop, part of the Data Management through e-Social Science (DAMES) ESRC research Node, www.dames.org.uk
Part 1: E-Social Science / Digital Social Research
ESRC & JISC initiatives: a major UK investment in ‘e-social science’ technology (see www.digitalsocialresearch.net)
• Handling and displaying large volumes of complex data
  - E.g. GeoVue; LifeGuide; DReSS; Obesity e-lab
• Resources for computationally demanding analyses
  - CQeSS; MoSeS; eStat; NeISS
• Standards setting in collaboration, data preparation, data and research support
  - DRS; MeRC; OeSS; DAMES
• US: ‘Cyberinfrastructures’; EU: ‘EUGrid’
Example: Data on occupations, educational qualifications and ethnicity (www.dames.org.uk)
• Linking complex data
• Metadata
• Security
• Workflows
Example: Understanding New Forms of Digital Records (DReSS)
• transcribed talk
• audio
• video
• digital records
• system logs
• location
[Figure: screenshot with panels for video, code tree, transcript and system log]
..more examples..
National e-Infrastructure for Social Simulation (NeISS), www.neiss.org.uk
• Expert led simulation demonstrations
• Combining data resources
• Workflows for the simulation analysis
• Modify and re-specify existing simulation templates
E-Stat, www.cmm.bristol.ac.uk/research/NCESS-EStat/
• Design a tool to specify complex statistical models in generic / visual terms
• Multilevel models
• Multiple data permutations and analytical alternatives
• Ready access to a suite of complex modelling tools
Other selected e-Science projects (concerned with accessing/handling complex data)
E-Science as ‘dealing with data’
• Collect/access vast quantities of data
• Complex surveys & comparisons
• Admin. & other data resources
• Guarding secure microdata (e.g. health records; interview transcripts)
• Data management and Data Analysis
• Standards setting
• Exciting new facilities/analyses
• {Global} communication and collaboration
…amongst researchers and data resources…
The relevance of e-Science to data management
• ‘Data management through e-Social Science’
• ‘E-Science’ refers to adopting a number of particular approaches and standards from computing science in applied research areas
• These approaches include ‘the Grid’; distributed computing; data and computing standardisation; metadata; security; research infrastructures
• UK investment in capitalising on these developments
• DAMES (2008-11): developing services / resources using e-Science approaches which will help social scientists in undertaking data management tasks
E-Science and Data Management
E-Science isn’t essential to good data management (DM), but it has the capacity to improve and support the conduct of DM…
• Concern with standards setting in communication and enhancement of data
• Linking distributed/heterogeneous/dynamic data
  - Coordinating disparate resources; interrogating live resources
• Contribution of metadata tools/standards for variable harmonisation and standardisation
• Linking data subject to different security levels
• The workflow nature of many DM tasks
GESDE: Search and browse supplementary data on occupations; educational qualifications; ethnicity
The contribution of DAMES: 8 project themes
Tapping in: Portals & e-Infrastructural overviews
[Diagram: social scientists connect through a simple interface with single sign on, via Grid middleware, to the e-Infrastructure (data, storage, HPC, computing, analysis, experiment). Caption: Seamless integration of data, analytic tools and compute resources]
Slide from Peter Halfpenny (2009), see www.merc.ac.uk
Tapping into the e-Infrastructure
Long, arduous road from innovation to seamless service delivery..
• ‘Working together’: computer- & social-science collaborations
• The ‘social shaping’ of e-Science: www.oii.ox.ac.uk/microsites/oess/
• Teamwork, ‘divisions of knowledge’, separation of data and analysis, are all routine [cf. Mauthner & Doucet, 2008]
• Ability to engage with advanced information
• e.g. social simulation; network locations [cf. Prior 2008]
• classic sociology: class, ethnicity, social structures
• new technological opportunities, e.g. public health projects
• Requirements of existing tools and services
  - …advanced quantitative methods [cf. Williams et al 2008; 2004]
  - …patience, & some O.S. facility..!
  - …selective access to technologies [by researchers, cf. Murthy, 2008]
The researched: Ethics, security, anxiety
• Fair ethical scrutiny of e-Science research
  - e.g. secure access to health data
  - Oxford e-Social Science Node on e-research ethics
• Residual anxieties
  - Is e-Science data effectively covert?
• Informed consent & overt/covert continuum [cf. Calvey, 2008]
  - The voice of the researched?
  - E-Infrastructure overheads as gatekeepers {at present}
  - Managing mass engagement, e.g. of LifeGuide as prescriptive?
Part 2: Frontiers in social survey research?
The changing terrain of social survey research and four exciting developments/frontiers:
1) Data access
2) Data management
3) Data analysis
4) Log books
1) Access to data..
Example: Accessing surveys via UK Data Archive
• Shibboleth authentication
• Download and analyse in Stata, SPSS, etc
Complex data example: British Household Panel Survey dataset [SN 5151] • This example shows BHPS being analysed in Stata. BHPS re-contacts subjects annually (since 1991) • 4294 interviewed as adults every year for 17 years. • Analysis methods, and measurement issues over time, are challenging.
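As an illustration of the ‘interviewed every year’ group mentioned above, the following is a minimal sketch of picking out a balanced panel from long-format records. The field names (pid, wave, interviewed) and the tiny three-wave dataset are hypothetical and are not BHPS variable names.

```python
from collections import defaultdict


def balanced_panel_ids(records, n_waves):
    """Return the ids of respondents interviewed in every wave 1..n_waves."""
    waves_seen = defaultdict(set)
    for rec in records:
        if rec["interviewed"]:
            waves_seen[rec["pid"]].add(rec["wave"])
    required = set(range(1, n_waves + 1))
    # A respondent is in the balanced panel only if no required wave is missing.
    return sorted(pid for pid, waves in waves_seen.items() if required <= waves)


# Hypothetical example: person 2 misses wave 2, person 3 joins at wave 2.
records = [
    {"pid": 1, "wave": 1, "interviewed": True},
    {"pid": 1, "wave": 2, "interviewed": True},
    {"pid": 1, "wave": 3, "interviewed": True},
    {"pid": 2, "wave": 1, "interviewed": True},
    {"pid": 2, "wave": 2, "interviewed": False},
    {"pid": 2, "wave": 3, "interviewed": True},
    {"pid": 3, "wave": 2, "interviewed": True},
    {"pid": 3, "wave": 3, "interviewed": True},
]

print(balanced_panel_ids(records, n_waves=3))
```

The same check scales to 17 waves by changing n_waves; in practice it would more likely be run inside Stata or a data-frame library.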
Large and complex social surveys
• several thousand variables
• tens of thousands of cases (micro-data)
• additional complex survey data features (e.g. household clustering)
Supplementary (digital) data
• E.g. ‘Occupational information resources’ (OIRs) = data files with information on occupations, which can be usefully linked to micro-data about occupations
  - e.g. GEODE acts as a library of OIRs, www.geode.stir.ac.uk
Such resources are often not widely known about, but have the ability to enhance analysis
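Linking an occupational information resource to micro-data amounts to a keyed lookup on the occupation code. The sketch below is illustrative only: the codes (A1, B2, Z9), the occ_score field and the function name link_oir are all hypothetical, and no real classification or scale is implied.

```python
def link_oir(microdata, oir, key="occ_code"):
    """Attach the OIR score to each case; return (linked cases, unmatched codes)."""
    lookup = {row[key]: row for row in oir}
    linked, unmatched = [], set()
    for case in microdata:
        info = lookup.get(case[key])
        if info is None:
            # Keep the case, but record that its code had no match in the OIR.
            unmatched.add(case[key])
            linked.append({**case, "occ_score": None})
        else:
            linked.append({**case, "occ_score": info["occ_score"]})
    return linked, sorted(unmatched)


# Hypothetical OIR (one row per occupation code) and survey micro-data.
oir = [
    {"occ_code": "A1", "occ_score": 62.5},
    {"occ_code": "B2", "occ_score": 41.0},
]
microdata = [
    {"pid": 1, "occ_code": "A1"},
    {"pid": 2, "occ_code": "B2"},
    {"pid": 3, "occ_code": "Z9"},
]

linked, unmatched = link_oir(microdata, oir)
print(linked)
print("Unmatched codes:", unmatched)
```

Reporting unmatched codes, rather than silently dropping cases, keeps the linkage step open to checking and documentation.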
Steady accumulation of options / permutations / approaches in…
• Data Management
  - Pre-analysis (and re-analysis) routines
  - Sensitivity analysis
  - Standardisation, harmonisation
• Data Analysis
  - Descriptive tools
  - Ongoing development of complex analytical models
  - GLMMs for structural data features, multi-process systems, etc
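One common form of standardisation is to rescale a measure within each survey year, so that values are comparable across years. The following is a minimal sketch under that assumption; the field names (year, income) and the data are hypothetical, and z-scores are only one of several standardisation options.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardise_within(cases, value_key, group_key):
    """Add a z-score of value_key, computed separately within each group."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[group_key]].append(case[value_key])
    stats = {g: (mean(vals), pstdev(vals)) for g, vals in groups.items()}
    out = []
    for case in cases:
        m, sd = stats[case[group_key]]
        # Guard against groups with no variation.
        z = (case[value_key] - m) / sd if sd > 0 else 0.0
        out.append({**case, "z_" + value_key: z})
    return out


# Hypothetical incomes on very different scales in two survey years.
cases = [
    {"year": 1991, "income": 10},
    {"year": 1991, "income": 20},
    {"year": 1991, "income": 30},
    {"year": 2008, "income": 100},
    {"year": 2008, "income": 200},
    {"year": 2008, "income": 300},
]

for row in standardise_within(cases, "income", "year"):
    print(row["year"], row["income"], round(row["z_income"], 3))
```

After standardisation the lowest earner in each year receives the same score, which is the point of the exercise: relative position is preserved while the change of scale is removed.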
4) Log books
• Software tools for logging work are increasingly well developed
  - See our ‘software session 1’ description
• Other initiatives in sharing records of work
  - E-Stat: Electronic workbooks for the data and model building process
  - MyExperiment: Depository for project files
These haven’t yet been extensively exploited in survey research, but they should be!
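The idea of a log book can be sketched in a few lines: each data management step is run through a wrapper that records what was done and how the number of cases changed. This is an illustration only, not a description of the tools named above; the function name logged_step and the example data are hypothetical.

```python
import datetime


def logged_step(log, description, func, cases):
    """Apply func to cases and append a record of the step to log."""
    result = func(cases)
    log.append({
        "when": datetime.datetime.now().isoformat(timespec="seconds"),
        "step": description,
        "n_before": len(cases),
        "n_after": len(result),
    })
    return result


# Hypothetical example: restrict the sample to adults and log the step.
log = []
cases = [{"pid": 1, "age": 15}, {"pid": 2, "age": 34}, {"pid": 3, "age": 52}]
adults = logged_step(
    log,
    "keep adults (age >= 16)",
    lambda rows: [r for r in rows if r["age"] >= 16],
    cases,
)

for entry in log:
    print(entry["step"], entry["n_before"], "->", entry["n_after"])
```

A record of this kind makes it possible to replicate a derived dataset later, or to explain to a colleague why the analytical sample is smaller than the original file.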
Well-known challenges in survey research
• We’re data rich, but analyst poor
• UK Data Forum (2007); Wiles et al (2009)
• Under-use of suitably complex statistical models
• Coordination and communication on data processing
• Recodes / Standardisation / harmonisation / documentation
• Lack of generic/accessible representation of tasks
• Limited disciplinary/project/researcher cross-over when dealing with data
• Specific software orientations
These are not generally problems of scale, but of organisation
‘Managed’ solutions?
• Data handling/analysis capacity-building
  - ESRC programmes (NCRM, RDI, RMP); training workshops/materials; P/G funds; strategic research grant investment
• Documentation/replication policies
  - Dale (2006)
• Software for data access and analysis
  - NESSTAR: UK Data Archive data/metadata browser
  - Long (2009) on the Stata software
  - Remote access to data (e.g. SDS)
..train and/or constrain the analysts.. Train them ->
..constrain the analysis..
Summary
• E-Science is often seen as being about enabling effective research in conditions of abundant resources
• In practical terms, for survey researchers, this means navigating through the vast array of data and analytical resources, and undertaking defensible and replicable work..
A preposterous conclusion… e-Science adoption and the Industrial revolution…?
• Landes (1969) The Unbound Prometheus
• Knowledge-based revolution
• Importance of standardising technology for cooperation (not just creating it)
• Importance of having access to underlying materials: coal, cotton, etc.
• Uneven development (nationally)
Landes, D. S. (1969). The Unbound Prometheus: Technological Change and Industrial Development in Western Europe from 1750 to the Present. Cambridge: Cambridge University Press.
Cardiff’s two transformations
[Images: Cardiff docks c1850; Cardiff docks c2005. Images from: www.lovemywales.com/history.php]
References
Acknowledgements: The ESRC has funded research into e-Social Science via the NCeSS, www.ncess.ac.uk, and Digital Social Research, http://www.digitalsocialresearch.net/, groups and their related Nodes and grant projects.
• Calvey, D. (2008). The Art and Politics of covert research: Doing 'situated ethics' in the field. Sociology, 42(5), 905-918.
• Dale, A. (2006). Quality Issues with Survey Research. International Journal of Social Research Methodology, 9(2), 143-158.
• Freese, J. (2007). Replication Standards for Quantitative Social Science: Why Not Sociology? Sociological Methods and Research, 36(2), 153-171.
• Halfpenny, P. (2008, 30 June - 3 July). What is.. e-Social Science. Paper presented at the ESRC NCRM Research Methods Festival, St Catherine's College, University of Oxford.
• Lambert, P. S., & Gayle, V. (2009). Data management and standardisation: A methodological comment on using results from the UK Research Assessment Exercise 2008. Stirling: University of Stirling, Technical paper 2008-3 of the Data Management through e-Social Science research Node (www.dames.org.uk).
• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
• Mauthner, N. S., & Doucet, A. (2008). 'Knowledge once divided can be hard to put together again'. Sociology, 42(5), 971-985.
• Murthy, D. (2008). Digital Ethnography: An examination of the use of new technologies in social research. Sociology, 42(5), 837-855.
• Prior, L. (2008). Repositioning documents in social research. Sociology, 42(5), 821-836.
• Savage, M., & Burrows, R. (2007). The coming crisis of empirical sociology. Sociology, 41(5), 885-899.
• UK Data Forum. (2007). The National Strategy for Data Resources for Research in the Social Sciences. Warwick: University of Warwick, http://www2.warwick.ac.uk/fac/soc/nds/ (Accessed 18 June 2007).
• Wiles, R., Bardsley, N., & Powell, J. L. (2009). Consultation on research needs in research methods in the UK social sciences. Southampton: University of Southampton / ESRC National Centre for Research Methods, and http://eprints.ncrm.ac.uk/810/
• Williams, M., Collett, T., & Rice, R. (2004). Baseline Study of Quantitative Methods in British Sociology. University of Plymouth: C-SAP Project report to the British Sociological Association.
• Williams, M., Payne, G., Hodgkinson, L., & Poade, D. (2008). Does British Sociology Count. Sociology, 42(5), 1003-1021.