digiTAAL: Some exciting examples. Ineke Schuurman, coordinator CLARIN-Vlaanderen
Digital Humanities • Language as object of research • Language as means for research • Modern languages • Old languages • Written, audio, video (collections of) documents
Treebanks • Available for most ‘modern’ languages • But also possible for ‘dead’ languages like Latin and Ancient Greek: http://nlp.perseus.tufts.edu/syntax/treebank/getinvolved.html • Index Thomisticus Treebank, Milano: http://itreebank.marginalia.it/ • Full query language needed
More treebanks • Medieval Portuguese treebank • Under construction • In the near future: INPOLDER (CLARIN NL) A parser, not yet a corpus, BUT: through web interface raw older Dutch text can be entered, and parsed text (syntactically analysed) will be returned • Uncorrected, but manual correction is possible
Visualization • Gabmap: doing dialect analysis on the web • ADEPT project (CLARIN-NL) • Dialects (examples: Netherlands/Flanders + USA) • www.gabmap.nl, including tutorial, manual, video, FAQ, …
Pronunciation distance (Gabmap)
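A common way to quantify pronunciation distance between two dialect forms is the edit (Levenshtein) distance between their transcriptions. The sketch below assumes plain unit costs and length normalisation; the function names and the example word forms are illustrative, and the measures Gabmap actually offers may be more refined.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, seg_a in enumerate(a, start=1):
        current = [i]
        for j, seg_b in enumerate(b, start=1):
            cost = 0 if seg_a == seg_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Edit distance divided by the length of the longer form (0.0 for two empty forms)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


# Hypothetical variants of the same word at two sites (assumed, not Gabmap data).
print(normalized_distance("melk", "molke"))
```

Averaging such distances over many words gives one distance per pair of sites, which is the kind of matrix that maps and dendrograms are drawn from.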
Dendrogram (Gabmap)
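A dendrogram is the tree produced by hierarchical clustering: the two closest sites (or groups of sites) are merged repeatedly until one group remains. The following is a minimal sketch using average linkage, chosen here only for illustration; the site labels and distances are invented, and the clustering options in Gabmap itself are not reproduced.

```python
def average_linkage(distances, labels):
    """Agglomerative clustering; returns the merge tree as nested tuples.

    distances maps frozenset({a, b}) to the distance between sites a and b.
    """
    # Each cluster is a pair: (tree built so far, list of member sites).
    clusters = [(label, [label]) for label in labels]

    def cluster_distance(c1, c2):
        pairs = [(x, y) for x in c1[1] for y in c2[1]]
        return sum(distances[frozenset((x, y))] for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Invented distances between four hypothetical sites A-D.
example = {
    frozenset(("A", "B")): 0.1, frozenset(("C", "D")): 0.2,
    frozenset(("A", "C")): 0.8, frozenset(("A", "D")): 0.9,
    frozenset(("B", "C")): 0.7, frozenset(("B", "D")): 0.8,
}
print(average_linkage(example, ["A", "B", "C", "D"]))
```

The nested tuples mirror the branches of the dendrogram: sites that merge early sit deepest in the tree.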
Audio • CLARIN pilot (NL/FL): TTNWW, audio part • TAAL2SPRAAK (CLARIN-Vlaanderen) • Audio as a means to make large collections of data (tapes) more accessible • A transcription, even if not 100% correct, is very helpful in finding what you are looking for, especially if it is synchronized with the recording's timeline (useful for psychology, sociology, history)
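To illustrate why a time-synchronized transcript helps even when it contains recognition errors, here is a small sketch of keyword search over time-stamped segments. The `Segment` structure, the function names and the sample data are assumptions made for this example; they do not describe the TTNWW or TAAL2SPRAAK output format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    text: str     # automatic transcript, possibly with errors


def find_mentions(segments, keyword):
    """Return (start, end) times of every segment whose text contains the keyword."""
    needle = keyword.lower()
    return [(s.start, s.end) for s in segments if needle in s.text.lower()]


def as_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS for jumping to a spot on a tape."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Invented transcript fragments for illustration only.
tape = [
    Segment(0.0, 12.5, "we arrived in the village in the winter"),
    Segment(12.5, 31.0, "the school was closed for weeks"),
    Segment(3725.4, 3740.0, "after the war the school reopened"),
]
for start, end in find_mentions(tape, "school"):
    print(as_timestamp(start), "-", as_timestamp(end))
```

Even if some words are misrecognised, a hit on a neighbouring segment usually brings the listener close enough to the passage sought.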
Audio and older texts • Digitization of old texts is still problematic (cf. DigiHIST) • Experiment: read a medieval text aloud and have it transcribed automatically (recognizer not trained on medieval text; a modern language model was used)
Audio: Leuvense Schepenbank • http://www.ccl.kuleuven.be/CLARIN/SAL8130_0093_inge_moris.hardsubs.mp4 • http://www.ccl.kuleuven.be/CLARIN/SAL8130_0093_inge_moris_4gr.pdf • Raw material!
Written part TTNWW • Relate documents and make texts more accessible by making explicit information that is not expressed as such • Example: “Paris formulated objections, London/John didn’t”: what is a name, and what kind of name is it? • Analysis of names in fiction • Sagalassos project (archaeology): temporal and geospatial analysis • Web service: end of 2012
Some more examples • When is ‘now’? And where?
Stylometry • Stylene (CLARIN-Vlaanderen) • UAntwerpen/Univ.College Gent • Is the text as a whole written by the same person? • Show the development in style of a specific author • Is a text clear? Is it really understandable by, say, children aged 10-12? • Web service (autumn 2012)
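Questions like these are often approached by turning a text into simple numeric features and comparing them. The sketch below assumes two such features, function-word frequencies (for authorship) and mean sentence length (as a crude readability signal). The word list and function names are illustrative choices and say nothing about how Stylene itself works.

```python
import re
from collections import Counter

# Illustrative English function words; a real study would use a longer, language-specific list.
FUNCTION_WORDS = ("the", "of", "and", "a", "in", "to", "is", "that")


def profile(text, vocabulary=FUNCTION_WORDS):
    """Relative frequency of each vocabulary word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in vocabulary]


def profile_distance(p, q):
    """Mean absolute difference between two profiles; smaller means more similar style."""
    return sum(abs(x - y) for x, y in zip(p, q)) / len(p)


def mean_sentence_length(text):
    """Average number of words per sentence, a rough indicator of reading difficulty."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return len(words) / len(sentences) if sentences else 0.0


print(profile("The cat and the dog"))
print(mean_sentence_length("One two three. Four five."))
```

Comparing the profile of each chapter against the rest of a text is one way to ask whether the whole was written by the same person; any threshold for "different enough" would have to be calibrated on known material.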
‘Stylometry’ as a means • Is thesis X written by the student or by ‘Wikipedia’? • Reliability • Could text X have been written by a 10-year-old girl? (paedophilia cases)
Reusability of data • For the same kind of research • For a completely different kind of research • Both should be encouraged: saves time and money • To be taken into account: IPR!
Veterans project • Interviews with veterans of Dutch military actions (1940-2010) • 1000 interviews (2.5 h), semi-structured • Original audience: social and military historians • Who else can use this archive? • At first: reluctance
Veterans 2 • People from diverse disciplines were invited to write a paper (theology, psychology, discourse analysis, anthropology, sociology, …) • Turned out to be a very valuable corpus! • Digital Humanities aspect: several tools were made available to facilitate research in different disciplines, including tools that give access to spoken content
“Circulation of Knowledge” • “Geleerdenbrievenproject” (Letters of scientists) • 17th century: Grotius (Hugo de Groot), Constantijn Huygens, Christiaan Huygens, Descartes, … • 20,000 letters, mainly Dutch, French, Latin • Intended for “history of science”, but of course also relevant for other disciplines
Polish example: Sejm • Polish parliament, 1918 – now • Texts, records, video • Goal: all kinds of linguistic research • But of course: a wealth of information for other disciplines as well
Conclusions • Several ‘easy-to-use’ research possibilities are (or will soon be) available • Others are still more complex, but do offer possibilities for new kinds of projects (or easier ways of doing research) • Lots of material could be used by third parties as well: do not keep stuff “in your drawer” • Students and (young) researchers should be made aware of new possibilities