Big Data: Bridging the Qualitative/Quantitative Divide?

7th ESRC Research Methods Festival, 5-7 July 2016. Professor Jane Elliott, CEO, @JaneElliott66.

Presentation Transcript


  1. Big Data: Bridging the Qualitative/Quantitative Divide? 7th ESRC Research Methods Festival, 5-7 July 2016 Professor Jane Elliott, CEO @JaneElliott66

  2. Overview • What is Big Data? • ESRC investments in data infrastructure • The variety of textual data • Machine learning and new approaches to analysis • Bridging the qualitative – quantitative divide • Challenges and Conclusions

  3. Volume of Big Data • 2.5 quintillion bytes of data created every day (1 quintillion = 10 to the power 18, or 1,000,000,000,000,000,000; 10 to the power 18 bytes is one exabyte) • 90% of the data in the world today has been created in the last two years

  4. Opening of the Archive, 1967: Prof A M Potter, first director of the SSRC Data Bank

  5. The Queen being shown the Archive by director Howard Newby, during her visit in 1985.

  6. The UK Data Archive: a brief history • 1967: The ‘Data Bank’ is opened at the University of Essex • 1972: renamed the ‘Survey Archive’ • 1982: renamed the ‘SSRC Data Archive’ (ESRC from 1984) • 1992: specialist History Data Service set up • 1996: renamed ‘The Data Archive’ • 2001: merger with the Qualidata archive and renamed ‘The UK Data Archive’ • … now with over 7000 data collections

  7. "in order to fully answer many critical social scientific questions we will, in future, need an integrated and coordinated approach that links social science data with, […] administrative and ecological data...[and] to take this forward we are working with other funders to commence the development of a national data strategy". (Professor Sir Ian Diamond, ESRC Annual Report 2003/4)

  8. ADRN: state-of-the-art safe facilities

  9. ESRC investment in Big Data • Huge new investment in 2012 for RCUK – continuing priority • ESRC interest in ‘data not originally collected for research purposes’ • Huge potential – as well as complexity and issues • Three phases of investment: administrative data; business data; social media/civil society data • Facilitating access to novel forms of data for research purposes • UK Data Service • Consumer Data Research Centre • Urban Big Data Centre • Business and Local Government Data Research Centres • Administrative Data Research Network

  10. Business and Local Government Data Research Centres – Phase 2 • Facilitate access to data held by private-sector and local government organisations • Cross-cutting themes: infrastructure, research, training, methods • Three Centres funded: Urban Big Data Centre (Glasgow); Consumer Data Research Centre (Leeds/UCL); BLG Data Research Centre (Essex) • Started February 2014

  11. www.turing.ac.uk

  12. The variety of textual data, arranged on two axes: originally written vs originally spoken, and independent of the research process vs elicited for research • Originally written: open-ended free text on surveys; children’s essays; Mass Observation; newspaper articles; diaries; letters; political manifestos etc.; tweets, blogs, comments • Originally spoken: broadcast media; overheard conversation; interview transcripts; focus group transcripts; recorded conversations

  14. NCDS 11-year-old essays • At age 11, in 1969, NCDS cohort members completed a short questionnaire (at school) about leisure interests, preferred school subjects and expectations on leaving school • They were also asked to write an essay on the following topic: ‘Imagine you are now 25 years old. Write about the life you are leading, your interests, your home life and your work at the age of 25. (You have 30 minutes to do this).’ • 13,669 essays completed, mean length 204 words • Copies of the original essays (in children’s handwriting) are available on microfiche at CLS and have been digitised.

  15. Themes in boys’ and girls’ essays

  16. Mohr (2015) approaches to text analysis • Thin readings: content analysis; word counts; topics • Thick readings: poetic analysis; genre; rhetoric; rhythm; hermeneutics
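
As a rough illustration of the "word counts" item on this slide, the sketch below counts content words across a couple of short texts. The sample sentences and the stop-word list are invented for illustration and are not taken from the NCDS essays or from Mohr; a real project would likely use a fuller stop-word list and a more careful tokenizer.

```python
import re
from collections import Counter

# Hypothetical sample texts, invented for illustration (not real essay data).
texts = [
    "I am a nurse and I live in a house with my dog.",
    "I play football and I work in a garage near my house.",
]

# A small, assumed stop-word list; real projects would use a fuller one.
STOP_WORDS = {"i", "a", "am", "and", "in", "my", "with", "the"}

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(documents):
    """Count content words (non stop-words) across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(t for t in tokenize(doc) if t not in STOP_WORDS)
    return counts

counts = word_counts(texts)
print(counts.most_common(3))
```

A count like this is a "thin" reading in the slide's sense: it says which words are frequent, but nothing about genre, rhetoric or meaning in context.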

  17. Machine learning • Supervised models: classification system already established • Unsupervised models: classification system may be developed by the algorithm
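
The contrast on this slide can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not a method from the talk: the supervised part is a crude word-overlap scorer standing in for a real classifier, the unsupervised part is simple single-pass "leader" clustering on Jaccard similarity, and all texts, labels and the 0.25 threshold are invented for illustration.

```python
import re
from collections import Counter

def tokens(text):
    """Return the lowercase word tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())

# --- Supervised: the coding frame (the labels) exists before training. ---
# Hypothetical hand-coded examples, invented for illustration.
labelled = [
    ("my job is in a hospital as a nurse", "work"),
    ("i work in an office every day", "work"),
    ("i play football at the weekend", "leisure"),
    ("i go swimming and play tennis", "leisure"),
]

def train(examples):
    """Build one word-frequency profile per pre-defined label."""
    profiles = {}
    for text, label in examples:
        profiles.setdefault(label, Counter()).update(tokens(text))
    return profiles

def classify(text, profiles):
    """Assign the label whose profile overlaps most with the text."""
    scores = {label: sum(profile[t] for t in tokens(text))
              for label, profile in profiles.items()}
    return max(scores, key=scores.get)

# --- Unsupervised: no labels; groups emerge from the texts themselves. ---
def jaccard(a, b):
    """Share of distinct words that two token sets have in common."""
    return len(a & b) / len(a | b) if a | b else 0.0

def leader_cluster(texts, threshold=0.25):
    """Single-pass clustering: join the first similar cluster, else start one."""
    clusters = []  # each cluster is a list of texts; item 0 is its leader
    for text in texts:
        words = set(tokens(text))
        for cluster in clusters:
            if jaccard(words, set(tokens(cluster[0]))) >= threshold:
                cluster.append(text)
                break
        else:
            clusters.append([text])
    return clusters

unlabelled = [
    "i play football",
    "i work in a shop",
    "we play football on saturday",
    "my work is in a shop",
]

profiles = train(labelled)
print(classify("i play tennis", profiles))
print(leader_cluster(unlabelled))
```

In the supervised half the categories "work" and "leisure" are fixed by the researcher in advance; in the unsupervised half the algorithm returns groups with no names, and interpreting them is left to the analyst.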

  18. Supervised model: e.g. distinguishing pictures of cats and dogs

  19. Be more dog….

  20. 4 principles of quantitative text analysis (Grimmer and Stewart 2013) • All quantitative models of language are wrong (but some are useful) • Quantitative methods for text amplify and augment humans • There is no globally best method for automated text analysis • Validate, validate, validate
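
One possible way to act on the fourth principle is to compare automated codes against human codes on the same texts. The sketch below computes raw agreement and Cohen's kappa (agreement corrected for chance); the choice of kappa is an assumption made here for illustration rather than something the slide prescribes, and both label lists are invented.

```python
from collections import Counter

def accuracy(human, machine):
    """Proportion of items on which the two codings agree."""
    return sum(h == m for h, m in zip(human, machine)) / len(human)

def cohens_kappa(human, machine):
    """Agreement between two codings, corrected for chance agreement."""
    n = len(human)
    observed = accuracy(human, machine)
    h_counts, m_counts = Counter(human), Counter(machine)
    # Chance agreement: probability both coders pick the same category
    # if each assigned categories independently at their observed rates.
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if expected == 1.0:  # both codings use a single identical category
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical codes for eight texts, invented for illustration.
human = ["work", "work", "leisure", "leisure",
         "work", "leisure", "work", "leisure"]
machine = ["work", "leisure", "leisure", "leisure",
           "work", "work", "work", "leisure"]

print(accuracy(human, machine))      # 0.75
print(cohens_kappa(human, machine))  # 0.5
```

Here the codings agree on 6 of 8 texts, but half of that agreement would be expected by chance, which is why kappa comes out lower than raw accuracy.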

  21. Source: Benoit et al. (2016), Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review.

  22. Source: Benoit et al. (2016), Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review.

  23. ‘Instead of quantitative researchers trying to build fully automated methods and qualitative researchers trying to make do with traditional human-only methods, now both are heading toward using or developing computer-assisted methods that empower both groups. This development has the potential to end the divide, to get us working together to solve common problems, and to greatly strengthen the research output of social science as a whole’ (King 2014, p3)

  24. Conclusions: three main challenges • Developing tools • Developing substantive research questions • Ethical research
