A demonstration and overview of the ongoing progress in the development of the LAMP Lab Natural Language Machine Translation (Chin-MT) project at the University of Maryland. Includes a technical presentation and discussion on future directions.
In-Progress Review
LAMP Lab Chin-MT Project
University of Maryland
February 18, 1999
I: NSA - FEBRUARY IPR FOR LAMP LAB NATURAL LANGUAGE - MT EFFORT
a. Demonstration and overview:
9:30-9:45 Introduction to project, B. Onyshkevych
9:45-9:55 Rationale and Overview of Progress in Development of System Components, A. Weinberg
9:55-10:00 Overview of Demonstration, P. Resnik and W. Shen
10:00-10:30 Demonstration and Questions, P. Resnik and W. Shen
I: NSA - FEBRUARY IPR FOR LAMP LAB NATURAL LANGUAGE - MT EFFORT Cont’d
b. Technical Presentation/Future Directions
10:45- Laboratory Management Issues
20 Min. Parsing - Construction Covered to Date, New Directions, A. Weinberg, P. Resnik
20 Min. Lexicon - Scalability of Current Components, Creation of Grids, Automatic Acquisition, Mining, B. Dorr
20 Min. Generation - Discussion of Current Algorithm, Future Directions, D. Traum
THE LAMP LABORATORY - MT PROJECT
Faculty: Dr. Bonnie Dorr, CS, UMIACS; Dr. Philip Resnik, Linguistics, UMIACS; Dr. Amy Weinberg, Linguistics, UMIACS
Postdoctoral Researchers: Dr. Gina Levow, UMIACS*; Dr. Mari Olsen, UMIACS; Dr. David Traum, UMIACS
Graduate Students: Joseph Garman, Linguistics; Scott Thomas, CS; Nazer Habash, CS*; Jin Tong, CS; Wade Shen, CS
THE LAMP LABORATORY - MT PROJECT Cont’d
NSA Visiting Scholars: Ron Dolan, Library of Congress; John Kovarik, DoD; MaryEllen Okurowski, DoD
Visiting Scholars: Dekang Lin, 01/99-08/99
OUR GOAL
Automatically created, high-quality, broad-coverage machine translation.
Example of Word to Word: <Ask David/Phil to provide>
1. Examples where generation output is:
- perfect
- slightly degraded
- generation degraded by CLCS
- gloss ok
OUR GOAL Example of CLCS output: Example of generated string:
WORK ON CHIN - MT
Work on Chin-MT began in Oct. 1997.
1st Phase: Development of a small-scale end-to-end system on a representative (159-sentence) corpus of Chinese newspaper (Tsin hua) articles.
WORK ON CHIN - MT Cont’d Development of Broad Scale Static Resources: Lexicon: Optilex 250 entries augmented with appropriate argument structure (thematic role) grids and Lexical conceptual structures. <Bonnie: current coverage of English lexicons - Chinese lexicons>
WORK ON CHIN - MT Cont’d Parser: small scale; 217 grammar rules Multipath REAP Generation: Add <David Traum>
WORK ON CHIN - MT Cont’d Integration with Currently Existing or Simultaneously Built Resources from Other Institutions - NMSU/Mikrokosmos interface - ISI/Nitrogen
SYSTEM COMPONENTS AND COVERAGE
Pipeline outputs (final output first, input last):
- Output: English translated string
- Shared ONI & ISI (Nitrogen)
- Output: Composed LCS (CLCS) transformed to AMR (<David - Abstract meaning representation>)
- Output: argument structure augmented syntactic string
- Output: parsed corpus with appropriate argument structure features for Lexical Conceptual Structure (LCS) composition
- Output: segmented string with complex names identified as single smts.
- Input: unsegmented Chinese string
Components and status:
- Syntactic recoding and Realization: translate LCS-based features to Nitrogen features. Feb: algorithm implemented
- NMSU Semantic ontologies
- F(unctional) structure transducer: input to NMSU semantics (90 f-structures to NMSU for evaluation, Dec. 1998)
- English lexical selection. Feb: algorithm implemented <David - coverage>
- Lexical Conceptual Structure (LCS) composition. June: inefficiency composed LCS for -------- sentences. Feb: ------- handled by LCS composition
- Parser. June: 404 fragments, 352 legal parse, 269 correct parse. Feb: 100 out of 150 full sentences with correct parse
- Segmenter/name tagger. June: hand segmentation, hand tagging, 150 sentences
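The parser figures above can be turned into rates with simple arithmetic. This is a minimal sketch: the counts come from the slide, while the helper name `rate` and the one-decimal rounding are assumptions made for illustration. The June figures count fragments and the Feb figures count full sentences, so the two rates are not directly comparable.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

# Counts taken from the slide.
june_fragments, june_legal, june_correct = 404, 352, 269
feb_sentences, feb_correct = 150, 100

# June: share of fragments that received a legal parse / a correct parse.
june_legal_rate = rate(june_legal, june_fragments)
june_correct_rate = rate(june_correct, june_fragments)

# Feb: share of full sentences that received a correct parse.
feb_correct_rate = rate(feb_correct, feb_sentences)

print(june_legal_rate, june_correct_rate, feb_correct_rate)
```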
Slide 4:
• Intermediate Milestones/Next Steps:
• Full end-to-end integration with NMSU:
• a. f-structure to TMR integration.
• b. f-structure to AMR-based generation.
• Evaluation of LCS as fail-soft mechanism. Comparison of translations produced by LCS/Nitrogen.
• Improvement of Coverage/Move towards Broad Scale Coverage of all components:
• Parsing:
- design/experimentation with
- extension to Minipar (in cooperation with Dekang Lin, Univ. of Manitoba)
• Lexicon
- Broad coverage for adjectives and nouns, the latter of which will be automatically subdivided into simple and event-based nominals. Corresponding English refinements. Finish broad coverage for prepositions.
- Finish English verb grid refinement and Chinese grid generation and checking. Speed up by dividing remaining verbs into Levin classes.
- Port verb grids and refine the composition algorithm for event-based nominals; entries include features from WordNet and will be assigned atomic LCSs. Event-based noun entries in Optilex will be automatically associated with LCSs from their verbal counterparts (abduction derived from abduct).
- Broad coverage and representation refinement for functional elements (numbers, numerals, classifiers). These are given LCSs by hand in the current iteration.
- Port verb-based LCS entries into the noun lexicon for English and Chinese.
• Discourse
- Sept 1999:
- Additional testing and improvement of LCS path. More debugging and testing as the CLCSs become available.
- Addition of NMSU path, then converting NMSU f-structures to English. The plan for that is to convert either to Nitrogen lattices or perhaps AMRs, depending on what these f-structures actually look like.
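The plan to associate event-based nouns with the LCSs of their verbal counterparts can be pictured as a table lookup. This is only a sketch under stated assumptions: the dictionaries `VERB_LCS` and `NOUN_TO_VERB`, the function `noun_lcs`, and the placeholder LCS string are all hypothetical, since the slides show neither the Optilex entry format nor the LCS notation.

```python
from typing import Dict, Optional

# Hypothetical toy data; the placeholder string stands in for a real LCS.
VERB_LCS: Dict[str, str] = {"abduct": "<LCS of 'abduct'>"}

# Hypothetical mapping from an event-based noun to its verbal counterpart.
NOUN_TO_VERB: Dict[str, str] = {"abduction": "abduct"}

def noun_lcs(noun: str) -> Optional[str]:
    """Return the LCS inherited from the noun's verbal counterpart, if any."""
    verb = NOUN_TO_VERB.get(noun)
    if verb is None:
        # Not listed as an event-based nominal in this toy table.
        return None
    return VERB_LCS.get(verb)

print(noun_lcs("abduction"))
print(noun_lcs("table"))
```

A noun with no verbal counterpart returns `None` here; how the actual system handles such nouns is not specified in enough detail to sketch.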
Laboratory management
Problems:
1. Version control: too many copies of software; code runs on one copy but not the other. We need to roll back to a previous version of some piece of software, but it is not around unless someone has saved it.
Solution: Installation of Concurrent Versions System (CVS) check-in/check-out software. Static resources and running programs are checked in and become the “official version”. Automatic consistency checking at check-in time: if a file differs from the previous version, permission from the previous check-in is needed to check in the new version or merge.
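The check-in rule described above can be illustrated with a toy model. This is not how CVS is implemented and it uses no CVS commands; the class `Repository`, its methods, and the `approved` flag are hypothetical names chosen for the sketch. The idea shown is only that a check-in based on an out-of-date revision is refused unless it has been approved or merged.

```python
from typing import Dict, List, Tuple

class Repository:
    """Toy check-in/check-out store; each file keeps its full history."""

    def __init__(self) -> None:
        self.history: Dict[str, List[str]] = {}

    def checkout(self, name: str) -> Tuple[int, str]:
        """Return (revision number, content) of the official version."""
        versions = self.history.get(name, [])
        return len(versions), (versions[-1] if versions else "")

    def checkin(self, name: str, base_revision: int, content: str,
                approved: bool = False) -> bool:
        """Accept the check-in only if it is based on the latest revision,
        or if it has been approved (for example after a merge)."""
        versions = self.history.setdefault(name, [])
        if base_revision != len(versions) and not approved:
            return False  # someone else checked in since this checkout
        versions.append(content)
        return True

repo = Repository()
rev, _ = repo.checkout("lexicon")
first = repo.checkin("lexicon", rev, "v1")                  # based on latest
stale = repo.checkin("lexicon", rev, "v2")                  # base is outdated
merged = repo.checkin("lexicon", rev, "v2", approved=True)  # approved merge
```

Because every accepted version stays in the history, rolling back to an earlier version is a lookup rather than a search for someone's saved copy.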
State of implementation: Chinese/English lexicons are under CVS. Next: LCS programs (convert to shorthand/longhand); then parser, f-structure, and generation programs. Complete by June.
Problem 2: Operating and file system problems: a program works on machine A but not on machine B.
Solution: All machines switched to Solaris 2.6, and AFS (a new networked file system manager) installed. AFS provides better management for large programs: shared file speedup, local caching, local control of protections and permissions. The improved environment will allow us to discourage work from home. Lower bandwidth for improved communication between members of the team.