CMD and TEI. CMDI interoperability workshop, 2013-06-04, Utrecht. Matej Ďurčo, ICLTT, Vienna.
TEI at ICLTT
• AAC – Austrian Academy Corpus
  • diachronic corpus, ~ 500 million tokens
  • being converted into TEI
• C4 – distributed corpus of 20th-century German
  • Basel, Berlin, Bozen, Wien
  • harmonized format (TEI/teiHeader)
• Dict-Gate
  • TEI-encoded multilingual lexicons (Persian, Arabic, German, English)
  • however, described with LexicalResourceProfile
• Abacus – Austrian Baroque Corpus
  • 3 (5) historical texts encoded in TEI
  • elaborate teiHeader
TEI (and friends?) in CMD
• overview of currently existing TEI-ish CMD profiles
teiHeader (ICLTT): size = reuse in other profiles
teiHeader (DTA): size = count of elements in instance data
TEI and ISOcat
• a special DCS: TEI Header (2.1.0)
• Windhouwer, 2012
• a data category for every element of the teiHeader (135 data categories)
• based on an ODD file (ODD2DCIF.xsl and DCIF2ODD.xsl available)
• due to CLARIN-NL projects using the TEI header
• an enriched schema was generated, i.e. annotated with these new data categories (dcr:datcat attribute), and put in SCHEMAcat: http://lux13.mpi.nl/schemacat/schema/teiHeader
• define relations between TEI and other data categories in RELcat (the relation registry)
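To make the "enriched schema" idea concrete, here is a minimal sketch of how the dcr:datcat annotations could be read back out of such a schema. It uses an inline toy schema, not the real SCHEMAcat file. The namespace URI bound to the dcr prefix is an assumption, and the data category URIs and the helper name collect_datcats are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# Assumption: namespace bound to the "dcr" prefix in the enriched schema.
DCR = "http://www.isocat.org/ns/dcr"

# Toy schema for illustration only; the datcat URIs are placeholders,
# not real ISOcat identifiers.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dcr="http://www.isocat.org/ns/dcr">
  <xs:element name="titleStmt" dcr:datcat="http://example.org/datcat/DC-0001"/>
  <xs:element name="title" dcr:datcat="http://example.org/datcat/DC-0002"/>
  <xs:element name="note"/>
</xs:schema>"""


def collect_datcats(xsd_text):
    """Map element names to the data category they are annotated with."""
    root = ET.fromstring(xsd_text)
    result = {}
    for el in root.iter(f"{{{XS}}}element"):
        name = el.get("name")
        datcat = el.get(f"{{{DCR}}}datcat")
        # Elements without an annotation are skipped.
        if name and datcat:
            result[name] = datcat
    return result


datcats = collect_datcats(SAMPLE_XSD)
for name, uri in sorted(datcats.items()):
    print(name, uri)
```

A mapping like this is one possible starting point for checking which teiHeader elements already carry a data category.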
Next Step(s)?
• create (or adapt existing) teiHeader profile
  • as a union of the existing profiles?
  • based on the enriched schema
  • i.e. linking to the new TEI data categories
• define a relation set in RELcat between TEI and ISOcat (and dublincore) data categories
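As a rough illustration of what such a relation set could enable, the sketch below stores relations as plain triples and collects everything reachable from one data category. It does not use the RELcat API or its actual relation vocabulary: the identifiers, the "sameAs" label and the function name equivalents are all hypothetical. Treating "sameAs" as symmetric and transitive is a design choice of this sketch.

```python
from collections import defaultdict, deque

# Hypothetical relation set: (source, relation type, target).
RELATIONS = [
    ("tei:title", "sameAs", "dc:title"),
    ("tei:author", "sameAs", "dc:creator"),
    ("dc:title", "sameAs", "isocat:resourceTitle"),
]


def equivalents(start, relations):
    """Return all data categories linked to `start` via 'sameAs' chains."""
    graph = defaultdict(set)
    for source, rel, target in relations:
        if rel == "sameAs":
            # Symmetric: store the link in both directions.
            graph[source].add(target)
            graph[target].add(source)
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first traversal gives the transitive closure
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


print(sorted(equivalents("tei:title", RELATIONS)))
```

With such a lookup, a search on a TEI data category could be expanded to the related dublincore and ISOcat categories.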
profile: data (LINDAT) dublincore + metashare
profile: data (LINDAT) resourceInfo-component
dublincore I
• 2 profiles with dc-terms (55 data categories)
• 2 profiles with dc-elements (called „dc-terms“)
• as of 2013-01
dublincore II: currently (2013-06) 4 DCMI-terms profiles
dublincore III: (almost) all data categories shared by all
dublincore IV: 1 profile has an extra component: DANS-DC-metadata (example: language)
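The comparison behind the dublincore slides (which data categories are shared by all profiles, and which profile carries extras) can be sketched with plain set operations. The profile names and data category labels below are made-up placeholders, not the actual contents of the four profiles.

```python
# Hypothetical profiles, each reduced to the set of data categories it uses.
PROFILES = {
    "profile_a": {"dc:title", "dc:creator", "dc:language"},
    "profile_b": {"dc:title", "dc:creator", "dc:language"},
    "profile_c": {"dc:title", "dc:creator", "dc:language", "x:extra"},
}

# Data categories present in every profile.
shared = set.intersection(*PROFILES.values())

# Per profile, whatever goes beyond the shared core (empty results are dropped).
extras = {
    name: cats - shared
    for name, cats in PROFILES.items()
    if cats - shared
}

print(sorted(shared))
print(extras)
```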