Digital.Humanities@Oxford Summer School, 3 July 2012 Humanities Research Data – Rate me! Wolfram Horstmann
The Research Data Question http://www.flickr.com/photos/desconciertos/160752180/ Data-driven research is called the 4th Paradigm in the sciences. Where are the humanities in the current discussion about research data?
Ratings, Skepticism & Anxiety http://www.flickr.com/photos/komoda/7187391601/ The Research Excellence Framework is a reality. But it is objected that: “Humanities research threatened by demands for 'economic impact'” (Guardian, 13 October 2009)
Outline The current awareness of the importance of research data provides opportunities for the humanities to show their value. ~ The challenge is to communicate what research data means for the humanities. ~ The proposal is to state the obvious more clearly: text and images as research data of the humanities and libraries as humanities research facilities.
Texts and Images as Data • http://www.flickr.com/photos/gorgmorg/9944210/ Humanities work with texts and images as other subject areas work with matter, wetware, hardware or numbers.
Libraries as Research Facilities http://vi.sualize.us/carl_spitzweg_bucherworm_1850_books_library_ladder_reading_picture_2Qp9.html The humanities institutionalized their research facilities centuries ago; other subject areas did so much later, with labs and centers like CERN or EMBL.
The Advent of the Digital http://www.flickr.com/photos/flex/27334821/ http://tei.oucs.ox.ac.uk/Talks/2008-08-kazan/exercise-2.xml http://www.bodley.ox.ac.uk/librarian/rpc/manchesterpres/slide15.jpg Transforming the physical research facilities into digital ones is a laborious and expensive exercise – and the potential of the digital is not yet exploited.
Digital Humanities & Libraries http://adamcrymble.blogspot.com.es/2012/01/is-old-bailey-online-film-or-science.html World Data Centers or the EBI are centralized – can Humanities Data Centers be at each institution?
Digital Resources in the Bodleian • approaching petabyte scale of highly structured storage for texts and images • 2.000.000 digitized images, another million to come in the next 3 years, plus 350.000 Google Books • 100 virtual machines REFERENCE MISSING … and the great majority of these are humanities resources.
Cultures of Knowledge http://www.history.ox.ac.uk/cofk/ An example of highly structured, intellectually curated data: more than 12.000 unique people and 3500 locations identified in 60.000 letters with 25.000 annotations.
What’s the Score? http://www.whats-the-score.org/ In only a few months over 10.000 scores have been described by the public.
Broadside Ballads http://ballads.bodley.ox.ac.uk Collaborative research introduces novel qualities into humanities research data management.
Google Books at the Bodleian Approaching one download a minute: 350.000 Google books with an estimated 10.000.000 pages and 25.000.000.000 words
Size matters! http://randommization.com/2011/03/08/library-has-giant-books-for-facade/ Even though humanities often use qualitative and hermeneutic methodology – rather than quantitative – the size of data is significant.
Structure matters! http://cacm.acm.org/magazines/2010/4/81499-the-data-structure-canon/fulltext Large numbers alone will not give a thorough idea of digital humanities data – structure is equally important. This can only be understood by example.
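As a minimal sketch of what "structure" can mean for such data, the following models a few letter records with identified people and places, loosely in the spirit of the Cultures of Knowledge example. All class names, field names and sample values here are hypothetical illustrations, not the schema or content of any actual project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Person:
    name: str  # hypothetical field; a real project would carry identifiers too


@dataclass(frozen=True)
class Place:
    name: str


@dataclass
class Letter:
    author: Person
    recipient: Person
    origin: Place
    year: int
    annotations: List[str] = field(default_factory=list)


def unique_people(letters):
    """Return the set of distinct people appearing as author or recipient."""
    people = set()
    for letter in letters:
        people.add(letter.author)
        people.add(letter.recipient)
    return people


# Invented sample records, for illustration only.
a = Person("Correspondent A")
b = Person("Correspondent B")
c = Person("Correspondent C")
letters = [
    Letter(a, b, Place("Oxford"), 1650, ["mentions a book loan"]),
    Letter(b, a, Place("London"), 1651),
    Letter(c, a, Place("Oxford"), 1652),
]

people = unique_people(letters)
places = {letter.origin for letter in letters}
print(len(letters), "letters,", len(people), "people,", len(places), "places")
```

Because people and places are modelled as entities of their own rather than as strings inside the text, counts such as "unique people identified in the letters" fall out of the structure directly.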
Collaboration matters! http://www.flickr.com/photos/ludovicmauduit/2646525907 Involvement of colleagues in collaborative research and the public in crowdsourcing makes a difference.
1st Challenge: Diversity http://www.ucl.ac.uk/archaeology/studying/undergraduate/courses/ARCL2037 Humanities have a varied typology of research data, often requiring idiographic approaches. Thus, standardization is difficult (cf. citation), and so is finding computational skills.
2nd Challenge: Openness http://www.flickr.com/photos/uncene/364730693/ As with all researchers, competition, privacy and exploitation are impediments to data sharing. Do the humanities, more than others, keep an “ivory tower” attitude?
Accessibility of Humanities Texts Waltinger, U., Mehler, A., Lösch, M., & Horstmann, W. (2011). Hierarchical Classification of OAI Metadata Using the DDC Taxonomy. In Chambers et al (Eds.), Advanced Language Technologies for Digital Libraries (Vol. 6699, pp. 29 - 40). Berlin / Heidelberg: Springer. Lösch, M., Waltinger, U., Horstmann, W., & Mehler, A. (2011). Building a DDC-annotated Corpus from OAI Metadata. Journal of Digital Information, 12(2) From some 30.000.000 bibliographic records it is hard to fill the humanities corpus. This might constrain discoverability of Humanities resources.
3rd Challenge: Inherent Obstacles Humanities research data show some peculiarities. An extreme example is the closure of archaeological data to protect sites against tomb raiders. Hogenaar, A., Tjalsma, H., & Priddy, M. (2011). Research in the Humanities and Social Sciences. http://dx.doi.org/10.2390/PUB-2011-7
4th Challenge: Implementing Policy Deposit of resources or datasets Grant Holders in all areas must make any significant electronic resources or datasets created as a result of research funded by the Council available in an accessible and appropriate depository for at least three years after the end of their grant. The choice of depository should be appropriate to the nature of the project and accessible to the targeted audiences for the material produced. http://www.ahrc.ac.uk/FundingOpportunities/Documents/Research%20Funding%20Guide.pdf Funders' policies are one approach to opening up data – but the humanities produce much data outside of the regular project life cycle.
1st Opportunity: Public Understanding http://www.queenvictoriasjournals.org/home.do Humanities research data are often more easily understood by the public than science data. The “Impact Regime” may even be an advantage for the humanities.
2nd Opportunity: Cultural Heritage http://www.europeana.eu/portal/ Humanities research data are more likely to be accessed and preserved than research data in other subject areas.
3rd Opportunity: Infrastructure National Library of China The requirements of infrastructure for many humanities research data resemble those of digital libraries. No new research facilities have to be built.
4th Opportunity: New Metrics • http://newsinfo.iu.edu/pub/libs/images/usr/9584_h.jpg It is likely that humanities research data have a web impact advantage. High societal interest could result in higher webometric and usage statistics ratings.
Another mindset? …to see text & images as humanities research data. ~ …to see the humanities as data intensive. ~ …to see a web impact advantage for the humanities. ~ …to see libraries as humanities research facilities.
Recommendations Exploit the good accessibility of humanities research themes through newspapers, exhibitions, crowdsourcing and citizen science. ~ Make as many research outputs web accessible as possible. ~ Invest in and support new metrics such as usage statistics and web impact. ~ Strengthen partnerships between the humanities, other disciplines and libraries.
Suggestion Rate your data!