Some Thoughts on HPC in Natural Language Engineering • Steven Bird • University of Melbourne & University of Pennsylvania
Sponsorship • Natural Language Engineering: Integrating Parallel and Parametric Processing • Victorian Partnership for Advanced Computing Expertise Grant EPPNME092.2003
NLE Application Areas • Spoken dialogue systems • Cross-language information retrieval • Word-sense disambiguation • Multi-document summarisation • Natural language database interfaces • Information Extraction • Information Retrieval • Authoring Tools • Language Analysis • Language Understanding • Knowledge Representation • Knowledge Discovery • Spoken Language Input • Written Language Input • Natural Language Generation • Spoken Output • Multilinguality • Multimodality • Discourse and Dialogue
Some NLE Applications in detail • Information extraction from broadcast news • Tokenization, alignment, entity detection, coreference resolution, semantic mapping • Spoken language dialogue systems (SLDS) • Speech recognition, parsing, user modelling, discourse management, generation, synthesis • Language analysis • Interlinear text annotation, lexicon development, morphosyntactic grammar development
Meta Activities • Discovery • What tools work with data in format X? • What lexical resources exist for language Y? • Reuse • Diverse implementation frameworks • Component integration, wrapping, etc. • Training and evaluation • Parametric and parallel processing • Comparing systems running on the same data • Gold standard vs. theory comparison • Analyzing interaction logs
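The two discovery questions above can be answered mechanically once every component and resource carries a little metadata. What follows is a minimal sketch that makes that assumption. The record types, field names and catalogue entries are hypothetical and simplified. They are not Dublin Core element names, and they do not describe real tools or resources.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ComponentRecord:
    """Hypothetical metadata for a software component."""
    name: str
    task: str
    input_format: str
    output_format: str


@dataclass(frozen=True)
class ResourceRecord:
    """Hypothetical metadata for a language resource."""
    name: str
    language: str  # e.g. an ISO 639 code
    kind: str      # e.g. "lexicon", "corpus"


def tools_for_format(components: List[ComponentRecord], fmt: str) -> List[str]:
    """Answer 'What tools work with data in format X?'."""
    return [c.name for c in components if c.input_format == fmt]


def resources_for_language(resources: List[ResourceRecord], language: str,
                           kind: Optional[str] = None) -> List[str]:
    """Answer 'What lexical resources exist for language Y?'."""
    return [r.name for r in resources
            if r.language == language and (kind is None or r.kind == kind)]


# Invented catalogue entries, for illustration only.
COMPONENTS = [
    ComponentRecord("tokenizer", "tokenization", "text/plain", "tokens"),
    ComponentRecord("tagger", "pos-tagging", "tokens", "tagged-tokens"),
    ComponentRecord("aligner", "alignment", "text/plain", "annotation-graph"),
]
RESOURCES = [
    ResourceRecord("demo-lexicon-fr", "fr", "lexicon"),
    ResourceRecord("demo-corpus-fr", "fr", "corpus"),
    ResourceRecord("demo-lexicon-de", "de", "lexicon"),
]
```

For example, `tools_for_format(COMPONENTS, "text/plain")` returns `["tokenizer", "aligner"]`. In a grid setting, the lists would be replaced by queries against a shared metadata repository.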
Learn about NLE • This department hosts a mirror of the ACL Anthology, the digital archive of the Association for Computational Linguistics • 50k pages, 40 years • http://www.cs.mu.oz.au/acl/
Observations • Common components, different arrangements • Multiple components for doing the same task • Most NLE components convert between information types • Parser: from strings to trees • ASR: from speech to text • Summariser: from text to selected text • But: • Many processes benefit from other information sources (e.g. exploiting intonation in input) • Input and output can be aligned • Solution: multilayer annotations
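The idea that most components convert between information types can be made concrete with a small sketch. It assumes that each component declares exactly one input type and one output type, which is a simplification. Under that assumption, building an application amounts to checking that adjacent types match. The names `Component` and `compose` and the toy components are hypothetical.

```python
from typing import Any, Callable, NamedTuple


class Component(NamedTuple):
    """A processing step that converts one information type into another."""
    name: str
    input_type: str
    output_type: str
    run: Callable[[Any], Any]


def compose(*components: Component) -> Callable[[Any], Any]:
    """Chain components, refusing arrangements whose types do not line up."""
    for left, right in zip(components, components[1:]):
        if left.output_type != right.input_type:
            raise TypeError(
                f"{left.name} produces {left.output_type!r}, "
                f"but {right.name} expects {right.input_type!r}"
            )

    def pipeline(data: Any) -> Any:
        for component in components:
            data = component.run(data)
        return data

    return pipeline


# Toy stand-ins for real components (hypothetical, for illustration only).
tokenizer = Component("tokenizer", "text", "tokens", lambda s: s.split())
flat_parser = Component(   # "parser": from a string to a one-level tree
    "flat_parser", "text", "tree", lambda s: ("S", [("W", t) for t in s.split()])
)
summariser = Component(    # "summariser": keeps only the first sentence
    "summariser", "text", "text", lambda s: s.split(". ")[0]
)
```

The same summariser can appear in different arrangements. `compose(summariser, tokenizer)` and `compose(summariser, flat_parser)` are both valid, while `compose(flat_parser, summariser)` raises `TypeError`. This strict one-in, one-out shape is exactly what the "But:" points challenge. A component that also wants intonation, or an output that must stay aligned with its input, does not fit it.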
Annotation Graphs • Labelled digraphs with timestamped nodes
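What follows is a minimal in-memory sketch of such a graph, given here as an illustration of the data structure. It assumes that nodes carry an optional time offset in seconds and that each arc carries a layer name and a label. It is not the AGTK API. The class, its methods and the example times and labels are all invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class AnnotationGraph:
    """Minimal annotation graph: labelled arcs between optionally timed nodes."""
    nodes: Dict[str, Optional[float]] = field(default_factory=dict)      # id -> seconds
    arcs: List[Tuple[str, str, str, str]] = field(default_factory=list)  # (src, dst, layer, label)

    def add_node(self, node_id: str, time: Optional[float] = None) -> None:
        self.nodes[node_id] = time

    def add_arc(self, src: str, dst: str, layer: str, label: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must already be nodes")
        self.arcs.append((src, dst, layer, label))

    def overlapping(self, layer: str, start: float, end: float) -> List[str]:
        """Labels on `layer` whose timed span overlaps the interval [start, end)."""
        found = []
        for src, dst, arc_layer, label in self.arcs:
            t0, t1 = self.nodes[src], self.nodes[dst]
            if arc_layer != layer or t0 is None or t1 is None:
                continue  # wrong layer, or an untimed endpoint
            if t0 < end and t1 > start:
                found.append(label)
        return found


# Two layers over the same speech: words and a made-up pitch-accent tier.
example = AnnotationGraph()
for node_id, t in [("n0", 0.0), ("n1", 0.35), ("n2", 0.80)]:
    example.add_node(node_id, t)
example.add_arc("n0", "n1", "word", "she")
example.add_arc("n1", "n2", "word", "left")
example.add_arc("n1", "n2", "tone", "H*")  # shares nodes with "left", so it is aligned
```

Here `example.overlapping("word", 0.3, 0.5)` returns `["she", "left"]`. The tone arc reuses the word arc's nodes, so alignment between layers follows from the shared timestamps rather than from a separate mapping.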
Annotation Graphs: complex example • AGTK: Annotation Graph Toolkit • library, applications • agtk.sourceforge.net
NLE and Grids • NLE Applications • typically constructed out of numerous components • each component responsible for a specialised task • executed against large data sets • To use grids in NLE: • subscribe to a model which allows automated discovery of data and components • flexible design of applications, coordination of execution, storage of results • Ideally: • view grid as a commodity, hidden from application developers
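Treating the grid as a hidden commodity can be illustrated with Python's standard `concurrent.futures.Executor` interface. In this sketch, a local thread pool stands in for grid resources. The function names are hypothetical. The suggestion that a grid-backed `Executor` could be dropped in is an assumption about how such a service might be wrapped, not a description of an existing one.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Callable, Dict, Optional


def run_component(component: Callable[[str], object],
                  documents: Dict[str, str],
                  executor: Optional[Executor] = None) -> Dict[str, object]:
    """Apply one component to every document and keep results keyed by document id.

    Only the executor knows where the work runs, so the caller is unchanged
    if the local thread pool is swapped for another Executor implementation.
    """
    doc_ids = list(documents)

    def execute(pool: Executor) -> Dict[str, object]:
        # Executor.map yields results in input order, so they line up with doc_ids.
        outputs = pool.map(component, (documents[d] for d in doc_ids))
        return dict(zip(doc_ids, outputs))

    if executor is None:
        with ThreadPoolExecutor(max_workers=4) as pool:
            return execute(pool)
    return execute(executor)


def count_tokens(text: str) -> int:
    """Toy component standing in for a real analyser."""
    return len(text.split())
```

For example, `run_component(count_tokens, {"d1": "a b c", "d2": "d e"})` gives `{"d1": 3, "d2": 2}`. Threads keep the sketch portable. CPU-bound components would more plausibly use a `ProcessPoolExecutor`, which requires the component to be a picklable top-level function.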
Architectural Components • Data • Language resources for analysis • E.g. Switchboard, 2400 annotated telephone conversations (26 CDs) • Software Components • minimal individual functional units • e.g. Annotation Server, Alignment, ASR, Data Source Packaging, Format Conversion, Text Annotation, Lexicon Server, Semantic Mapping • common interface specification • Metadata Repositories • Dublin Core Application Profile for NLE resources • Application • data + components + processing instructions • declarative specification in XML • Grid Service • computational and storage resources for application execution
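An application defined as "data + components + processing instructions" in XML might be interpreted as shown in the sketch below. The XML vocabulary (`application`, `data`, `document`, `pipeline`, `step`) and the component registry are invented for illustration and do not follow any published schema. The parsing uses the standard-library `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical declarative application spec: data plus an ordered list of steps.
SPEC = """
<application name="demo">
  <data>
    <document id="d1">The cat sat.</document>
    <document id="d2">Dogs bark loudly.</document>
  </data>
  <pipeline>
    <step component="lowercase"/>
    <step component="tokenize"/>
  </pipeline>
</application>
"""

# Registry mapping component names to callables (stand-ins for real services).
REGISTRY = {
    "lowercase": str.lower,
    "tokenize": lambda text: text.rstrip(".").split(),
}


def run_application(spec_xml: str, registry: dict) -> dict:
    """Parse the spec, resolve each step by name, and run it over every document."""
    root = ET.fromstring(spec_xml.strip())
    docs = {d.get("id"): d.text for d in root.iter("document")}
    # An unknown component name raises KeyError here, before any processing starts.
    steps = [registry[s.get("component")] for s in root.find("pipeline").iter("step")]
    results = {}
    for doc_id, text in docs.items():
        value = text
        for step in steps:
            value = step(value)
        results[doc_id] = value
    return results
```

Running `run_application(SPEC, REGISTRY)` yields `{"d1": ["the", "cat", "sat"], "d2": ["dogs", "bark", "loudly"]}`. Because the spec only names components, the same file could in principle be handed to a grid service that resolves those names to remote implementations.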
Conclusion • Natural Language Engineering • interesting test case for grid services • many mature component technologies • applications that are both data and processor intensive • applications for building the multilingual information society of the future...