From Papyrus to Digital: UCI’s Thesaurus Linguae Graecae Project Maria Pantelia July 2006
Thesaurus Linguae Graecae® (TLG®) Latin for ‘Treasury of the Greek Language’ 3450 Berkeley Place, UC Irvine • Special Research Project • Comprehensive digital library of Greek literature from antiquity to the present era • Preservation and access
New Testament Aristotle Homer Aeschylus, Oresteia
Audience Researchers in Classics, Byzantine Studies, Ancient History and Philosophy, Lexicography, Religious Studies, Linguistics, etc.
Classics and Technology • Fragmentary texts (papyri, inscriptions) • Dating of materials • Reconstruction of antiquity (virtual)
Classics Databases • Perseus Project (Tufts University): texts, large collection of images, lexicographical tools • Packard Humanities Institute (inscriptions, documentary papyri, Latin texts) • Database of Classical Bibliography (L'Année philologique) • Classical Atlas (Ancient World Mapping Center)
History of the Project • UCI (1972) • Dr. Marianne McDonald (UCSD) • Collection of texts in digital form • Mirror image of printed critical editions • International collaboration
The Ibycus system • David Packard (PHI) • Ibycus Computer • Beta code • Magnetic tapes (1976) • CD-ROM (1985)
From Ibycus to…the modern era Stephanus Ibycus
Current status of the collection • Homer to A.D. 400 (complete) • Byzantine period, 4th-15th centuries A.D. (in progress) • Expansion to medieval and modern works to follow • 5-6 million new words added annually • Contents: 3,800 authors; 15,000 works; 95 million words; 1.365 million distinct forms
Use and advantages • Preservation • Access to rare texts and editions • Portability • Access from any place • Browsing (Full-text) • Ability to search the corpus for particular words or phrases • Research and pedagogy
Creating a Digital Collection • Digitization • Data Management • Dissemination
Dissemination: TLG CD ROMs • 1985 TLG A (27 million words) • 1988 TLG C (42 million words) • 1992 TLG D (57 million words) • 2000 TLG E (76 million words) • 2001 Online TLG
The Online TLG • TLG-developed search engine • Quarterly updates • Bibliographies and demo version open to the public • Full-text browsing and searching • Search the full corpus or a selection of authors • Fonts (input and display of Greek characters) • Unicode Project (http://repositories.cdlib.org/tlg/unicode/)
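A minimal sketch of what "search the full corpus or a selection of authors" could look like, assuming a simple in-memory inverted index. This is not the TLG search engine; the names `CORPUS`, `build_index`, and `search` are hypothetical, and the three-line corpus is toy data used only for illustration.

```python
import unicodedata
from collections import defaultdict


def nfc(text: str) -> str:
    """Normalize to NFC so visually identical Greek strings compare equal."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy corpus: (author, work, text). Not TLG data structures.
CORPUS = [
    ("Homer", "Iliad", "μῆνιν ἄειδε θεά"),
    ("Homer", "Odyssey", "ἄνδρα μοι ἔννεπε μοῦσα"),
    ("Aristotle", "Metaphysics", "πάντες ἄνθρωποι τοῦ εἰδέναι ὀρέγονται φύσει"),
]


def build_index(corpus):
    """Map each word form to the set of passage ids that contain it."""
    index = defaultdict(set)
    for doc_id, (_author, _work, text) in enumerate(corpus):
        for word in nfc(text).split():
            index[word].add(doc_id)
    return index


INDEX = build_index(CORPUS)


def search(word, corpus=CORPUS, index=INDEX, authors=None):
    """Return (author, work) pairs containing `word`.

    authors=None searches the full corpus; a set of names restricts the search.
    """
    hits = []
    for doc_id in sorted(index.get(nfc(word), ())):
        author, work, _text = corpus[doc_id]
        if authors is None or author in authors:
            hits.append((author, work))
    return hits
```

Calling `search("θεά")` looks across every passage, while `search("θεά", authors={"Aristotle"})` restricts the same lookup to one author. The index is built once, so each query is a dictionary lookup followed by a filter.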
Distribution • 58 countries • TLG E (CD ROM): 1,100 institutions, 1,500 individuals • Online TLG: 250 institutions, 50,000 users, 5 million hits in 2005
Canon of Greek authors and works • 15,000 entries (including information such as dates, genre, origin, etc.)
Digitization 1. Selection of text editions 2. Text markup (beta code) 3. Data entry 4. Correction in-house • Importance of ‘Verification and Correction’ (…where Google has a long way to go…)
Digitization • The Critical Edition • Homer, Odyssey 16.180-193
Challenges • Dealing with a large corpus developed over a period of 30+ years • Editorial choices and markup • Corpus retrofitting • Accuracy • Non-Roman script • Conversion to standard encoding (Unicode, TEI/XML)
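To make the Beta Code to Unicode conversion concrete, here is a rough sketch that handles only the basic alphabet, the common diacritics, capitals marked with `*`, and word-final sigma. It is an illustration under those assumptions, not the TLG's converter: the function name `beta_to_unicode` is hypothetical, and the many Beta Code escapes for punctuation, brackets, and special symbols are passed through unchanged.

```python
import unicodedata

# Basic Beta Code letters mapped to the lowercase Greek alphabet.
LETTERS = dict(zip("ABGDEZHQIKLMNCOPRSTUFXYW", "αβγδεζηθικλμνξοπρστυφχψω"))

# Beta Code diacritics mapped to Unicode combining marks.
MARKS = {
    ")": "\u0313",   # smooth breathing
    "(": "\u0314",   # rough breathing
    "/": "\u0301",   # acute
    "\\": "\u0300",  # grave
    "=": "\u0342",   # circumflex (perispomeni)
    "+": "\u0308",   # diaeresis
    "|": "\u0345",   # iota subscript
}


def beta_to_unicode(beta: str) -> str:
    """Convert a simplified subset of Beta Code to precomposed (NFC) Unicode."""
    out = []
    i, n = 0, len(beta)
    while i < n:
        ch = beta[i]
        if ch == "*":
            # Capitals: diacritics sit between the asterisk and the letter.
            i += 1
            marks = ""
            while i < n and beta[i] in MARKS:
                marks += MARKS[beta[i]]
                i += 1
            if i < n and beta[i].upper() in LETTERS:
                out.append(LETTERS[beta[i].upper()].upper() + marks)
                i += 1
            else:
                out.append("*" + marks)  # stray asterisk: leave it visible
            continue
        key = ch.upper()
        if key in LETTERS:
            nxt = beta[i + 1].upper() if i + 1 < n else ""
            # Simplification: a sigma not followed by a letter is word-final.
            if key == "S" and nxt not in LETTERS:
                out.append("ς")
            else:
                out.append(LETTERS[key])
        elif ch in MARKS:
            out.append(MARKS[ch])  # combining mark attaches to previous letter
        else:
            out.append(ch)  # spaces, punctuation, unsupported escapes
        i += 1
    # NFC folds letter + combining marks into precomposed code points.
    return unicodedata.normalize("NFC", "".join(out))
```

For example, `beta_to_unicode("MH=NIN A)/EIDE QEA/")` yields the opening words of the Iliad in polytonic Greek. Emitting combining marks and letting NFC normalization compose them avoids a hand-written table of every precomposed letter and accent combination.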
Lexical Database • Used for fast data retrieval • Goal: full corpus lemmatization • Morpheus (Perseus) • 1,365,000 unique forms (approx. 250,000 lemmata) • Morphological recognition for a highly inflected language
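The idea behind a lexical database of this kind can be sketched as a two-way lookup between inflected forms and lemmata. The example below is an assumption-laden illustration: it does not call Morpheus, the `ANALYSES` pairs are hand-written toy data standing in for analyzer output, and all function names are hypothetical.

```python
import unicodedata
from collections import defaultdict


def nfc(text: str) -> str:
    """Normalize to NFC so the same Greek word always has the same key."""
    return unicodedata.normalize("NFC", text)


# Hypothetical (form, lemma) pairs; a real system would obtain these
# from a morphological analyzer rather than a hard-coded list.
ANALYSES = [
    ("λόγος", "λόγος"),
    ("λόγου", "λόγος"),
    ("λόγον", "λόγος"),
    ("λέγει", "λέγω"),
    ("ἔλεγον", "λέγω"),
]


def build_lexicon(analyses):
    """Build form -> lemmata and lemma -> forms lookup tables."""
    form_to_lemmata = defaultdict(set)
    lemma_to_forms = defaultdict(set)
    for form, lemma in analyses:
        form, lemma = nfc(form), nfc(lemma)
        form_to_lemmata[form].add(lemma)  # a form may belong to several lemmata
        lemma_to_forms[lemma].add(form)
    return form_to_lemmata, lemma_to_forms


FORM_TO_LEMMATA, LEMMA_TO_FORMS = build_lexicon(ANALYSES)


def lemmata_for(form: str) -> set:
    """All lemmata an inflected form could belong to (empty if unknown)."""
    return set(FORM_TO_LEMMATA.get(nfc(form), ()))


def forms_of(lemma: str) -> set:
    """All attested forms of a lemma, e.g. to expand a lemma-based search."""
    return set(LEMMA_TO_FORMS.get(nfc(lemma), ()))
```

With the figures on the slide, 1,365,000 unique forms over roughly 250,000 lemmata works out to about 5.5 forms per lemma on average, which is why a search by lemma has to expand into many surface forms before it touches the text.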