Big Data: Bridging the Qualitative/Quantitative Divide? 7th ESRC Research Methods Festival, 5-7 July 2016 Professor Jane Elliott, CEO @JaneElliott66
Overview • What is Big Data? • ESRC investments in data infrastructure • The variety of textual data • Machine learning and new approaches to analysis • Bridging the qualitative – quantitative divide • Challenges and Conclusions
Volume of Big Data • 2.5 quintillion bytes of data created every day (1 quintillion = 10 to the power 18, or 1,000,000,000,000,000,000; 10 to the power 18 bytes = 1 exabyte) • 90% of the data in the world today has been created in the last two years
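As a quick check on the units, the daily figure can be converted into exabytes and gigabytes. This is a minimal sketch that assumes decimal (SI) prefixes and the short-scale quintillion; the yearly total simply multiplies the daily figure by 365 and is an illustration, not a figure from the slides.

```python
# Unit conversion for the daily data-volume figure (decimal SI prefixes assumed).
QUINTILLION = 10**18         # short-scale quintillion
BYTES_PER_EXABYTE = 10**18   # 1 exabyte (EB) = 10^18 bytes
BYTES_PER_GIGABYTE = 10**9   # 1 gigabyte (GB) = 10^9 bytes

# 2.5 quintillion bytes, kept as an exact integer.
daily_bytes = 25 * QUINTILLION // 10

daily_exabytes = daily_bytes / BYTES_PER_EXABYTE
daily_gigabytes = daily_bytes // BYTES_PER_GIGABYTE

# Illustrative extrapolation only: assumes the same volume every day of the year.
yearly_exabytes = daily_exabytes * 365

print(f"{daily_exabytes} EB per day")
print(f"{daily_gigabytes:,} GB per day")
print(f"{yearly_exabytes} EB per year")
```

So a quintillion is a number, an exabyte is a quantity of data, and 2.5 quintillion bytes is 2.5 exabytes.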
Opening of the Archive, 1967: Prof A M Potter, first director of the SSRC Data Bank
The Queen being shown the Archive by director Howard Newby, during her visit in 1985.
The UK Data Archive: a brief history • 1967: The ‘Data Bank’ is opened at the University of Essex • 1972: renamed the ‘Survey Archive’ • 1982: renamed the ‘SSRC Data Archive’ (ESRC from 1984) • 1992: specialist History Data Service set up • 1996: renamed ‘The Data Archive’ • 2001: merger with the Qualidata archive and renamed ‘The UK Data Archive’ • … now with over 7,000 data collections
"in order to fully answer many critical social scientific questions we will, in future, need an integrated and coordinated approach that links social science data with, […] administrative and ecological data...[and] to take this forward we are working with other funders to commence the development of a national data strategy". (ESRC annual Report 2003/4) Professor Sir Ian Diamond
ESRC investment in Big Data • Huge new investment in 2012 for RCUK – continuing priority • ESRC interest in ‘data not originally collected for research purposes’ • Huge potential – as well as complexity and issues • Three phases of investment: administrative data; business data; social media / civil society data • Facilitating access to novel forms of data for research purposes • Investments: UK Data Service; Consumer Data Research Centre; Urban Big Data Centre; Business and Local Government Data Research Centres; Administrative Data Research Network
Business and Local Government Data Research Centres – Phase 2 • Facilitate access to data held by private sector and local government organisations • Cross-cutting themes: infrastructure, research, training, methods • Three Centres funded: Urban Big Data Centre – Glasgow; Consumer Data Research Centre – Leeds/UCL; BLG Data Research Centre – Essex • Started February 2014
Diagram: textual data arranged on two axes (originally written vs originally spoken; independent of research process vs elicited for research) • Originally written: open-ended free text on surveys; children’s essays; Mass Observation; newspaper articles; diaries; letters; political manifestos etc.; tweets, blogs, comments • Originally spoken: broadcast media; overheard conversation; interview transcripts; focus group transcripts; recorded conversations
NCDS essays at age 11 • In 1969, at age 11, NCDS cohort members completed a short questionnaire (at school) about leisure interests, preferred school subjects and expectations on leaving school • They were also asked to write an essay on the following topic: ‘Imagine you are now 25 years old. Write about the life you are leading, your interests, your home life and your work at the age of 25. (You have 30 minutes to do this).’ • 13,669 essays completed, mean length 204 words • Copies of the original essays (in children’s handwriting) are available on microfiche at CLS and have been digitised.
Mohr (2015) approaches to text analysis • Thin readings • Content analysis • Word counts • Topics • Thick readings • Poetic analysis • Genre • Rhetoric • Rhythm • Hermeneutics
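A word count is perhaps the simplest of the "thin" readings listed above. The sketch below is illustrative only: the two essay snippets are invented, the stop-word list is a small hand-picked assumption rather than a standard one, and the helper names `tokenize` and `word_counts` are hypothetical.

```python
import re
from collections import Counter

# Invented snippets for illustration; these are not real NCDS essays.
essays = [
    "I am a nurse and I live in a small house with my husband.",
    "I work as a footballer and I have a big house and a car.",
]

# Small hand-picked stop-word list (an assumption, not a standard list).
STOP_WORDS = {"i", "a", "an", "and", "the", "in", "as", "with", "my", "am", "have"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(texts):
    """Count content words (non-stop-words) across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(t for t in tokenize(text) if t not in STOP_WORDS)
    return counts


counts = word_counts(essays)

# Mean essay length in words, counting every token including stop words.
mean_length = sum(len(tokenize(e)) for e in essays) / len(essays)

print(counts.most_common(3))
print(mean_length)
```

Counts like these discard word order and context, which is exactly what makes the reading "thin"; the thicker readings in the list attend to what such counts leave out.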
Machine learning • Supervised models: classification system already established • Unsupervised models: classification system may be developed by the algorithm
Supervised model: e.g. distinguishing pictures of cats and dogs
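The contrast between the two kinds of model can be sketched on toy data. Everything here is an assumption made for illustration: the single numeric feature, the labels "home" and "work", and the choice of a nearest-centroid classifier and a two-cluster k-means as stand-ins for supervised and unsupervised learning respectively.

```python
from statistics import mean

# Toy one-dimensional feature (hypothetical), e.g. the share of
# work-related words in a text. Labels exist only for the first set.
labelled = [(0.10, "home"), (0.15, "home"), (0.20, "home"),
            (0.80, "work"), (0.85, "work"), (0.90, "work")]
unlabelled = [0.12, 0.18, 0.82, 0.88]


# Supervised: the categories are fixed in advance by the researcher.
def fit_centroids(examples):
    """Learn one centroid (mean feature value) per known label."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Unsupervised: no labels; the algorithm proposes the grouping itself.
def two_means(values, iterations=10):
    """Simple k-means with k=2, started from the smallest and largest value."""
    centres = [min(values), max(values)]
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centres[0]) <= abs(v - centres[1]) else 1
            clusters[nearest].append(v)
        centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
    return clusters


centroids = fit_centroids(labelled)
predictions = [predict(centroids, x) for x in unlabelled]
clusters = two_means(unlabelled)

print(predictions)
print(clusters)
```

The supervised model returns names the researcher already chose; the unsupervised model returns unnamed groups that still have to be interpreted by a human reader.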
4 principles of quantitative text analysis (Grimmer and Stewart 2013) • All quantitative models of language are wrong (but some are useful) • Quantitative methods for text amplify and augment humans • There is no globally best method for automated text analysis • Validate, validate, validate
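One common way to act on "validate, validate, validate" is to compare automated labels with labels assigned by human coders. The sketch below uses raw agreement and Cohen's kappa as the measures; that choice is an assumption for illustration, not something the slide attributes to Grimmer and Stewart, and the label lists are invented.

```python
from collections import Counter

# Invented labels for eight documents: human coding vs automated coding.
human = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos"]
machine = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]


def accuracy(a, b):
    """Share of items on which the two codings agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(a)
    observed = accuracy(a, b)
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in set(a) | set(b)) / (n * n)
    if expected == 1:
        # Both codings use a single identical category; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


agreement = accuracy(human, machine)
kappa = cohens_kappa(human, machine)

print(agreement)
print(kappa)
```

Kappa is lower than raw agreement here because half of the matches would be expected by chance when both codings split evenly between two categories.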
Source: Benoit et al. (2016) Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review.
‘Instead of quantitative researchers trying to build fully automated methods and qualitative researchers trying to make do with traditional human-only methods, now both are heading toward using or developing computer-assisted methods that empower both groups. This development has the potential to end the divide, to get us working together to solve common problems, and to greatly strengthen the research output of social science as a whole’ (King 2014, p3)
Conclusions: three main challenges • Developing tools • Developing substantive research questions • Ethical research