eTRACES-subproject Text re-use in Literature. Christian Kötteritzsch / Gerhard Lauer / Annette Geßner. Team eTRACES Göttingen. Christian Kötteritzsch Gerhard Lauer Annette Geßner GCDH & Uni Leipzig GCDH & Uni Göttingen GCDH
eTRACES-subproject: Text re-use in Literature
Christian Kötteritzsch / Gerhard Lauer / Annette Geßner
Team eTRACES Göttingen
• Christian Kötteritzsch: GCDH & Uni Leipzig, ASV
• Gerhard Lauer: GCDH & Uni Göttingen, German studies
• Annette Geßner: GCDH, Classics
Central question
Analysis of text re-use in German literature
- to better understand how literature makes use of other texts
- to better understand the specific re-use of given texts in a large corpus of literature
- to better understand specific types of intertextuality
- to facilitate the identification of (indirect) quotations for editorial purposes
Corpus
- zeno.org-corpus (http://www.textgrid.de/en/digitale-bibliothek.html)
- includes fictional texts from Luther to Kafka
- preprocessing of the XML files through a toolchain that extracts and formats XML-based corpora, thanks to Frederik Baumgardt (3 person-months)
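The extraction step of such a preprocessing toolchain might look roughly like the sketch below. It is not the project's toolchain: the function name `extract_plain_text`, the element names in the sample, and the sample string itself are assumptions made for illustration, and the actual zeno.org XML schema may differ.

```python
import re
import xml.etree.ElementTree as ET


def extract_plain_text(xml_string: str) -> str:
    """Return the text content of an XML document with whitespace collapsed.

    Hypothetical helper for illustration; it ignores all markup and
    keeps only character data in document order.
    """
    root = ET.fromstring(xml_string)
    # itertext() yields the text and tail strings of all descendants
    # in document order.
    raw = " ".join(root.itertext())
    # Collapse line breaks and repeated spaces into single spaces.
    return re.sub(r"\s+", " ", raw).strip()


# Invented sample; element names are assumptions, not the zeno.org schema.
sample = "<text><head>Die Räuber</head><p>Erster  Akt.\n<hi>Franken.</hi></p></text>"
print(extract_plain_text(sample))
```

A real pipeline would also need to handle metadata, namespaces, and file encodings, which this sketch leaves out.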
Text Re-use
• A first idea: a long-term analysis of the emerging autonomous aesthetic in German literature, especially novels
Text Re-use
But:
- text mining depends on genre and on text re-use styles
- looking for text re-use only within a German corpus would miss the many foreign quotations
- looking for a simpler starting point: one book in thousands
Objectives
• Test case: re-use of the Bible in German literature
- find biblical quotations and allusions
- offer a web-based text re-use tool
- provide an online working environment for creating a digital edition
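One simple way to illustrate the idea of finding quotations is to look for word n-grams shared between a source text and a target text. The sketch below is a generic illustration, not the method used by the project's software; the function names and both sample strings are assumptions chosen for the example. Exact n-gram matching only catches near-literal quotations, so allusions would need fuzzier matching.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set:
    """Return the set of lower-cased word n-grams in a text."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def shared_ngrams(source: str, target: str, n: int = 3) -> set:
    """Return the word n-grams that occur in both texts."""
    return word_ngrams(source, n) & word_ngrams(target, n)


# Illustrative strings only; the second line is invented for this example.
bible_verse = "Der Herr ist mein Hirte, mir wird nichts mangeln."
novel_line = "Sie flüsterte: der Herr ist mein Hirte, und schwieg."

hits = shared_ngrams(bible_verse, novel_line)
for gram in sorted(hits):
    print(" ".join(gram))
```

The n-gram length trades recall for precision: shorter n-grams find more candidates but also more coincidental overlaps from common phrases.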
Re-use Style
• Identify types of biblical re-use by hand
• Design a table of quotation styles
• Categorize types of "Re-use Style"
• Schiller's "Die Räuber" (77 entries)
• Fontane's "Effi Briest" (11 entries)
• ...
Natural language processing
+ Analysis: Tracer software (ASV, Leipzig)
+ Server: virtual machine (Gesellschaft für Wissenschaftliche Datenverarbeitung Göttingen [GWDG])
+ Frontend: Google Web Toolkit framework
Possible extensions
+ more texts (and Bibles)
+ more own texts
+ more features
? more crowd editing or more personal edition
? more distant reading: text statistics
--> Any suggestions?
Next milestones
May and June 2012:
- collect different Bible versions (Zürcher, Allioli, Keppler etc., revisions)
- integrate them into the text re-use tool
- clarify server issues with GWDG
- determine re-use styles by analysing more genres, historical and other specific re-use styles
Next milestones
Summer 2012:
- first run on the folder 'Romane' of the zeno.org-corpus
- evaluation and rerun
Until end of 2012:
- develop a statistics sub-tool
- version 1.0 of the front end online
Next milestones
Beginning of 2013:
- internal evaluation and optimization
April to July 2013:
- teach a seminar on text re-use (and let students evaluate the tool)
- invite editors for test cases
Next milestones
Until end of 2013:
- optimization and bug fixing, finish the tool
- enlarge the corpus of literature and of Bibles
- carry out statistical research cases
- community workshop with editors and text analysts
Until end of project in 2014:
- write the final report