Paul Lambert, 31st January 2012. Talk given at the seminar ‘Data management in the social sciences and the contribution of the DAMES Node’, a session organised as part of the Data Management through e-Social Science ESRC research Node, www.dames.org.uk.
Opportunities and prospects in social research
DAMES, 31/JAN/2012, T6
Start by thinking big…
• Landes’ (1969) analysis:
  • Knowledge-based revolutions
  • Importance of standardising technology for cooperation (not just creating it)
  • Importance of access to underlying materials, e.g. coal and cotton
  • Uneven development (nationally)
• Emergent uses of computing and the internet, such as in ‘e-Science’ traditions, arguably share similar characteristics:
  • Standardisation, communication, vast volumes of resources
• Social research data, e.g. large-scale surveys and other large quantitative resources, exemplify these opportunities
Landes, D.S. (1969). The Unbound Prometheus: Technological Change and Industrial Development in Western Europe from 1750 to the Present. Cambridge: Cambridge University Press.
E-Social Science / Digital Social Research
ESRC and JISC initiatives represent a major UK investment in ‘e-social science’ technology (see www.digitalsocialresearch.net).
e-Science broadly involves using emergent computer technologies with enhanced capacities for communication/collaboration and data processing:
• Handling and displaying large volumes of complex data
  e.g. GeoVue; LifeGuide; DReSS; Obesity e-lab
• Resources for computationally demanding analyses
  e.g. CQeSS; MoSeS; eStat; NeISS
• Standards setting in collaboration, data preparation, and data and research support
  e.g. DRS; MeRC; OeSS; DAMES
Example: Understanding New Forms of Digital Records (DReSS)
• Transcribed talk
• Audio
• Video
• Digital records
  • System logs
  • Location
[Figure: panels labelled location, video, code tree, transcript, system log]
…More examples (strategies for social scientists to tap into the e-Infrastructure)…
National e-Infrastructure for Social Simulation (NeISS), www.neiss.org.uk
• Expert-led simulation demonstrations
• Combining data resources
• Workflows for the simulation analysis
• Modifying and re-specifying existing simulation templates
E-Stat, www.bristol.ac.uk/cmm/research/estat/
• ‘StatJR’, a tool to specify complex statistical models in generic/visual terms
• Multilevel models
• Multiple data permutations and analytical alternatives
• Ready access to a suite of complex modelling tools
e-Science, data management, and research revolutions (!)
• ‘Data management through e-Social Science’
• DAMES (2008-11): developing services/resources using e-Science approaches which will help social scientists undertake data management tasks
  • Information/data retrieval (e.g. GESDE systems)
  • Storage and processing of data and metadata (e.g. secure portals and ‘curation’ and ‘fusion’ tools)
• ‘Data management’ is at the centre of transformations in the exploitation of information resources:
  • Collaboration/standardisation in constructing empirical results
  • Facility to host and distribute new forms of data
  • Facility to discriminate between the masses of data
Prospects in social research
• The changing terrain of social research, and three exciting developments/frontiers:
  1) Data access
  2) Data management and analysis
  3) Log books
• Some thoughts on the trajectory of social research developments
1) Access to data
Example: accessing surveys via the UK Data Archive
• Shibboleth authentication
• Download and analyse in Stata, SPSS, etc.
Supplementary (digital) data
• E.g. ‘Occupational information resources’ (OIRs): data files containing information on occupations, which can usefully be linked to micro-data that record occupations
• e.g. GEODE acts as a library of OIRs, www.geode.stir.ac.uk
• Such resources are often not widely known about, but have the ability to enhance analysis
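Linking an OIR to survey micro-data is, in essence, a many-to-one match on occupation code. The following is a minimal Python sketch, assuming the OIR is available as a simple table keyed by occupation code; the codes, the `occ_score` variable and the `link_oir` helper are all hypothetical and are not taken from GEODE or any real OIR.

```python
# Hypothetical survey micro-data: one record per respondent, each with an
# occupation code (codes and values are invented for illustration).
survey = [
    {"id": 1, "occ_code": "2311"},
    {"id": 2, "occ_code": "5434"},
    {"id": 3, "occ_code": "9999"},  # a code with no match in the OIR
]

# Hypothetical occupational information resource (OIR): one row per
# occupation code, carrying occupation-level variables.
oir = {
    "2311": {"occ_score": 72.0},
    "5434": {"occ_score": 41.5},
}


def link_oir(records, lookup, key="occ_code"):
    """Many-to-one left join of occupation-level variables onto records.

    Every record is kept; records whose code is absent from the lookup get
    None for each OIR variable, so unmatched codes remain visible.
    """
    fields = sorted({f for row in lookup.values() for f in row})
    linked = []
    for rec in records:
        extra = lookup.get(rec[key], {})
        merged = dict(rec)
        for f in fields:
            merged[f] = extra.get(f)
        linked.append(merged)
    return linked


linked = link_oir(survey, oir)
for row in linked:
    print(row)
```

In Stata or SPSS the same step would typically be a many-to-one merge; keeping unmatched records rather than dropping them makes gaps in the occupational coding easy to spot.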
Steady accumulation of options/permutations/approaches in…
2a) Data management
• Pre-analysis (and re-analysis) routines
• Sensitivity analysis
• Standardisation, harmonisation
2b) Data analysis
• Descriptive tools
• Ongoing development of complex analytical models
• GLLMMs for structural data features, multi-process systems, etc.
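Harmonisation and sensitivity analysis often go together: recoding source-specific categories onto a common scheme involves judgement calls, and re-running the same summary under each defensible choice shows how much a result depends on them. A minimal Python sketch; the two surveys, their codes, the target categories and the helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical education codes from two surveys with different coding frames.
survey_a = ["degree", "a_level", "gcse", "none", "degree"]
survey_b = [1, 2, 2, 3, 4]  # 1=higher, 2=secondary, 3=vocational, 4=none

# Harmonisation: map each source scheme onto a common target scheme.
map_a = {"degree": "high", "a_level": "medium", "gcse": "medium", "none": "low"}

# Two defensible mappings for survey B, differing only in where the
# ambiguous 'vocational' code (3) is placed.
maps_b = {
    "vocational_as_medium": {1: "high", 2: "medium", 3: "medium", 4: "low"},
    "vocational_as_low": {1: "high", 2: "medium", 3: "low", 4: "low"},
}


def harmonise(values, mapping):
    """Recode values to the target scheme; unmapped codes raise KeyError."""
    return [mapping[v] for v in values]


def shares(values, categories=("high", "medium", "low")):
    """Proportion of cases in each target category."""
    counts = Counter(values)
    return {c: counts[c] / len(values) for c in categories}


# Sensitivity analysis: repeat the same summary under each alternative recode.
results = {}
for label, map_b in maps_b.items():
    pooled = harmonise(survey_a, map_a) + harmonise(survey_b, map_b)
    results[label] = shares(pooled)
    print(label, results[label])
```

Here the share in the ‘medium’ category changes between the two recodes while the ‘high’ share does not, which is the kind of dependence a sensitivity check is meant to expose.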
[Figure: E-Stat ebooks (image from document in preparation, Browne et al. 2011), with links to output from StatJR]
3) Log books
• Software tools for logging work are increasingly well developed
  • See our workshops on documentation/replication
• Other initiatives in sharing records of work:
  • E-Stat: electronic workbooks for the data and model-building process
  • MyExperiment: repository for project files
These haven’t yet been extensively exploited in survey research, but they should be!
The idea of workflows
• Workflow modelling has an exciting future:
  • Workflow documentation, e.g. MyExperiment [http://www.myexperiment.org/]
  • Social survey analysis, e.g. Long, J.S. (2009) The Workflow of Data Analysis Using Stata. CRC Press
• At present…
  • Tool development is in progress
  • Depositing workflows might impose constraints/burdens
DAMES, 31/JAN/2012, T1
Example of using MS Excel for workflow documentation in survey research
Who will take the initiative?
• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. Boca Raton: CRC Press.
  Chapters 1-5: Programming in Stata; 6: Cleaning your data; 7: Analysing data and presenting results; 8: Protecting your work
• Bespoke solutions, or the generic/dynamic approaches of e-Science?
• “Because claims in published papers that additional materials are ‘available from author’ usually prove false, at least after a few months, the California Center for Population Research at UCLA recently implemented a mechanism by which additional materials, for example, -do- and -log- files, can be attached to papers posted in its Population Working Paper archive. Other research centers are to be encouraged to do the same”
  (p. 404 of Treiman (2009) Quantitative Data Analysis. New York: Jossey-Bass)
Well-known challenges in survey research
• We’re data rich, but analyst poor
  • UK Data Forum (2007); Wiles et al. (2009) (http://eprints.ncrm.ac.uk/810/)
• Under-use of suitably complex statistical models
• Coordination and communication on data processing
  • Recodes/standardisation/harmonisation/documentation
• Lack of generic/accessible representation of tasks
• Limited disciplinary/project/researcher cross-over when dealing with data
• Specific software orientations
These are not generally problems of scale, but of organisation.
‘Managed’ solutions?
• Data handling/analysis capacity-building: ESRC programmes (NCRM, RDI, RMP); training workshops/materials; P/G funds; strategic research grant investment
• Documentation/replication policies
• Software for data access and analysis:
  • NESSTAR: UK Data Archive data/metadata browser
  • Long (2009) on the Stata software
  • Remote access to data (e.g. SDS)
…train and/or constrain the analysts…
• Train them
…constrain the analysis…
‘Social’ solutions?
• Tools and infrastructure for better standards are built up from within (aided by collaborative technologies)
  • e.g. GESDE, P-ADLS, MethodBox (www.methodbox.org)
Summary
• e-Science is often seen as being about enabling effective research in conditions of abundant resources
• In practical terms, for social researchers, this means navigating the vast array of data and analytical resources, and undertaking robust and replicable work
• Likely continuation of a mix of generic and specific, managed and social, approaches