CHLT Integration
Integration in two directions
• Interoperability with indexing structures of Perseus Digital Library
• Integration of parsers into indexing module of search and visualization tool
Integration with Structure of Perseus Digital Library
• Perseus text display system transforms XML and legacy SGML files tagged according to an arbitrary DTD and creates a consistent set of core data files that can be read by any application:
• Sentences
• Chunks
• Lemmatized
• Inflected
• Catalog of works (PTEXT DB)
• Morphological Databases
• Short Definitions
File Locations
• The surrogate files are written to a location that is associated with the unique ID assigned to the document in the PDL
• Each chunk or sentence also has a unique identifier
• These two pieces of information can be used:
• To generate URLs to access full text in DL
• To generate human readable citations of the sentences according to scholarly conventions
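
The two uses of the identifiers can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the base URL, the query parameter names, the ID formats, and the `CATALOG` lookup are hypothetical and do not reflect the actual PDL layout.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the real PDL address scheme is not given in the slides.
BASE_URL = "https://example.org/text"

# Hypothetical catalog excerpt: document ID -> (author, work title).
CATALOG = {"doc-0001": ("Caesar", "Gallic War")}


def full_text_url(doc_id, sentence_id):
    """Build a link to the full text from the document ID and sentence ID."""
    return f"{BASE_URL}?{urlencode({'doc': doc_id, 'sent': sentence_id})}"


def human_citation(doc_id, sentence_id):
    """Build a readable citation; assumes the sentence ID encodes book.chapter.section."""
    author, work = CATALOG[doc_id]
    return f"{author}, {work} {sentence_id}"


print(full_text_url("doc-0001", "1.1.1"))
print(human_citation("doc-0001", "1.1.1"))
```

Keeping both functions keyed on the same pair of identifiers means any tool that reads the surrogate files can produce links and citations without touching the source XML.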
WP2 Integration: Word Profile Tool
• Word Profile tool reads lemmatized files to acquire a complete list of words in IGL corpus
• All frequency counts, display sentences, human readable citations, and links to full text are based on surrogate files generated by PDL
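
As a rough sketch of how a word list and frequency counts could be derived from a lemmatized file: the tab-separated layout below (sentence ID, inflected form, lemma) is an assumed format for illustration only, not the documented PDL surrogate format.

```python
import csv
import io
from collections import Counter


def lemma_frequencies(lines):
    """Count lemmas and remember the first sentence in which each one occurs."""
    counts = Counter()
    first_seen = {}
    for sentence_id, _form, lemma in csv.reader(lines, delimiter="\t"):
        counts[lemma] += 1
        first_seen.setdefault(lemma, sentence_id)
    return counts, first_seen


# Tiny in-memory stand-in for a lemmatized surrogate file (assumed layout).
sample = io.StringIO(
    "s1\tamat\tamo\n"
    "s1\tpuellam\tpuella\n"
    "s2\tamant\tamo\n"
)
counts, first_seen = lemma_frequencies(sample)
print(sorted(counts))       # complete word list
print(counts["amo"])        # frequency of one lemma
print(first_seen["amo"])    # a sentence ID usable for display and citation
```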
WP2 Integration: Multi-Lingual IR Tool
• Author and language selection routines in the MLIR tool are dynamically generated from the PDL metadata catalog
• Database of translation equivalents is created directly from SGML/XML and saved as a core data file that is available to other applications in the system
• Translation Equivalence Program works with any TEI-conformant dictionary; the dictionary selection screen updates dynamically
• Translated query is handed off to the current PDL search engine and the visualization tool via documented APIs
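
A minimal sketch of building translation equivalents from a TEI-style dictionary follows. It assumes entries are encoded as `<entryFree key="...">` with glosses in `<tr>` elements; that markup, the sample entries, and the exact-match lookup are illustrative assumptions, and the actual Translation Equivalence Program may work quite differently.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample in an assumed TEI-style dictionary encoding.
SAMPLE = """<TEI.2><text><body>
<entryFree key="amo"><orth>amo</orth> <tr>to love</tr>, <tr>to like</tr></entryFree>
<entryFree key="puella"><orth>puella</orth> <tr>girl</tr></entryFree>
</body></text></TEI.2>"""


def build_equivalents(xml_text):
    """Map each English gloss to the set of headwords it translates."""
    index = defaultdict(set)
    root = ET.fromstring(xml_text)
    for entry in root.iter("entryFree"):
        headword = entry.get("key")
        for tr in entry.iter("tr"):
            gloss = "".join(tr.itertext()).strip().lower()
            if gloss:
                index[gloss].add(headword)
    return index


def translate_query(index, english_term):
    """Return source-language headwords for an English query term (exact match only)."""
    return sorted(index.get(english_term.strip().lower(), set()))


equivalents = build_equivalents(SAMPLE)
print(translate_query(equivalents, "to love"))
print(translate_query(equivalents, "girl"))
```

The resulting headwords are what would be handed to a search engine as the translated query.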
WP4 Integration: Old Norse Text and Parser
• Middleware translates Old Norse Parser output to format used by PDL
• ISO Language tags in texts tell system to use Old Norse morphology and link to Old Norse lexicon
• PDL short definition program automatically extracts information from Zoega
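
The middleware idea can be sketched as a converter registry keyed by ISO language code. The code `non` is the ISO 639-2 code for Old Norse; the pipe-delimited parser output and the target record layout are hypothetical stand-ins, since the slides do not specify either format.

```python
def convert_old_norse(raw_line):
    """Convert one line of (assumed) parser output, 'form|lemma|tag+tag', to a record."""
    form, lemma, tags = raw_line.strip().split("|")
    return {"form": form, "lemma": lemma, "features": tags.replace("+", " ")}


# Registry of converters keyed by the ISO language tag found in the text.
CONVERTERS = {"non": convert_old_norse}


def analyses_for(lang, raw_lines):
    """Pick the converter for a language tag and normalise the parser output."""
    try:
        converter = CONVERTERS[lang]
    except KeyError:
        raise ValueError(f"no morphology registered for language {lang!r}")
    return [converter(line) for line in raw_lines if line.strip()]


records = analyses_for("non", ["hestr|hestr|noun+masc+nom+sg\n"])
print(records)
```

Under this design, supporting another parser would only require registering one more converter under its language code.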
WP4 & 6: Corpus Integration
• TEI makes corpus integration easy
• Old Norse texts and lexica and Neo-Latin texts are tagged according to TEI standards
• Documentation of tagging conventions
Parser Integration with WP1
• Similar middleware can link LemLat to PDL
• WP1 Visualization Tool also includes a parsing/stemming step
• This program is designed to work generally with many systems, not only those created by PDL
• Source code for LemLat and the Old Norse parser is used so that the search/visualization tool can search Old Norse and Latin texts that are not part of PDL
Next Steps
• Implementation of parser integration with WP1
• Seamless integration of MLIR tool and production deployment
• Improved documentation of tags required for OAI linking