Parsing into the Interlingua Using Phrase-Level Grammars and Trainable Classifiers
Alon Lavie, Chad Langley, Lori Levin, Dorcas Wallace, Donna Gates and Kay Peterson
AFRL Visit, March 28, 2003
Our Parsing and Analysis Approach
• Goal: a portable and robust analyzer for task-oriented human-to-human speech, parsing utterances into interlingua representations
• Our earlier systems used full semantic grammars to parse complete domain actions (DAs)
  • Useful for parsing spoken language in restricted domains
  • Difficult to port to new domains
• Current focus is on improving portability to new domains (and new languages)
• Approach: continue to use semantic grammars to parse domain-independent phrase-level arguments, and train classifiers to identify DAs
Interchange Format
• Interchange Format (IF) is a shallow semantic interlingua for task-oriented domains
• Utterances are represented as sequences of semantic dialog units (SDUs)
• An IF representation consists of four parts:
  • Speaker
  • Speech act
  • Concepts
  • Arguments

    speaker : speech-act +concept* +argument*

  (the speech act and concept sequence together form the domain action)
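To make the four-part structure concrete, here is a minimal sketch that splits an IF string (like the example two slides below) into speaker, speech act, concepts, and arguments. It is a toy splitter for illustration only, not the project's IF machinery, and it keeps the argument list as a raw string rather than parsing nested values.

    def split_if(if_string):
        """Split 'speaker:speech-act+concept* (argument*)' into four parts."""
        speaker, rest = if_string.split(":", 1)
        if "(" in rest:
            da, _, args = rest.partition("(")
            args = args[:-1] if args.endswith(")") else args  # drop outer ')'
        else:
            da, args = rest, ""
        # The domain action is the speech act plus the '+'-joined concepts.
        speech_act, *concepts = da.strip().split("+")
        return {"speaker": speaker.strip(), "speech_act": speech_act,
                "concepts": concepts, "arguments": args}

    print(split_if("c:give-information+disposition+trip"
                   " (disposition=(who=i, desire))"))
    # -> {'speaker': 'c', 'speech_act': 'give-information',
    #     'concepts': ['disposition', 'trip'],
    #     'arguments': 'disposition=(who=i, desire)'}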
Hybrid Analysis Approach
• Use a combination of grammar-based phrase-level parsing and machine learning to produce interlingua (IF) representations
Hybrid Analysis Approach: Example
Input: "Hello. I would like to take a vacation in Val di Fiemme."
Output:
  c:greeting (greeting=hello)
  c:give-information+disposition+trip (disposition=(who=i, desire), visit-spec=(identifiability=no, vacation), location=(place-name=val_di_fiemme))
Argument Parsing
• Parse utterances using phrase-level grammars
• SOUP parser (Gavaldà, 2000): a stochastic, chart-based, top-down robust parser designed for real-time analysis of spoken language
• Grammars are kept separate according to the type of phrases each is intended to cover
Domain Action Classification
• Identify the DA for each SDU using trainable classifiers
• Two TiMBL (k-NN) classifiers (Daelemans et al., 2000):
  • Speech act
  • Concept sequence
• Binary features indicate the presence or absence of arguments and pseudo-arguments
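As a hedged illustration of this setup, the sketch below trains two nearest-neighbor classifiers over binary argument-presence features, using scikit-learn's KNeighborsClassifier as a stand-in for TiMBL. The feature inventory and training pairs are invented for the example; only the overall scheme (two classifiers, binary features) comes from the slide.

    from sklearn.neighbors import KNeighborsClassifier

    # Binary features: does the parsed SDU contain this argument or
    # pseudo-argument label? (Feature inventory invented for illustration.)
    FEATURES = ["greeting", "disposition", "visit-spec", "location", "=arrival="]

    def featurize(found):
        return [1 if f in found else 0 for f in FEATURES]

    # Toy training data: argument sets paired with the two target labels.
    X = [featurize({"greeting"}),
         featurize({"disposition", "visit-spec", "location"})]
    speech_acts = ["greeting", "give-information"]
    concept_seqs = ["", "disposition+trip"]

    sa_clf = KNeighborsClassifier(n_neighbors=1).fit(X, speech_acts)
    cs_clf = KNeighborsClassifier(n_neighbors=1).fit(X, concept_seqs)

    sdu = featurize({"disposition", "location"})
    print(sa_clf.predict([sdu])[0], cs_clf.predict([sdu])[0])
    # -> give-information disposition+trip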
Using the IF Specification
• Use knowledge of the IF specification during DA classification
  • Ensure that only legal DAs are produced
  • Guarantee that the DA and arguments combine to form a valid IF representation
• Strategy: find the best DA that licenses the most arguments
• Trust the parser to reliably label arguments
  • Retaining detailed argument information is important for translation
Evaluation: Classification Accuracy
• 20-fold cross-validation using the NESPOLE! travel-domain database
[Table: database statistics and most-frequent-class baseline]
Evaluation: Classification Accuracy
[Chart: classification performance (accuracy)]
Evaluation: End-to-End Translation
• English-to-English and English-to-Italian
• Training set: ~8000 SDUs from NESPOLE!
• Test set: 2 dialogs, client utterances only
• Uses the IF specification fallback strategy
• Three graders, bilingual English/Italian speakers
• Each SDU graded as perfect, ok, bad, or very bad
  • Acceptable translation = perfect + ok
• Majority scores reported
Evaluation: End-to-End Translation
[Table: end-to-end translation results]
Evaluation: Data Ablation Experiment
[Chart: classification accuracy as a function of training-set size]
Domain Portability
• Experimented with porting to a medical assistance domain in NESPOLE!
• Initial medical-domain system up and running, with reasonable coverage of flu-like symptoms and chest pain
• Porting the interlingua, grammars, and modules for English, German, and Italian required about 6 person-months in total:
  • Interlingua development: ~180 hours
  • Interlingua annotation: ~200 hours
  • Analysis grammars and training: ~250 hours
  • Generation development: ~250 hours
New Development Tools
Questions?
Grammars
• Argument grammar
  • Identifies arguments defined in the IF:
      s[arg:activity-spec=] (*[object-ref=any] *[modifier=good] [biking])
  • Covers "any good biking", "any biking", "good biking", and "biking", plus synonyms for all three words (see the sketch after this slide)
• Pseudo-argument grammar
  • Groups common phrases with similar meanings into classes:
      s[=arrival=] (*is *usually arriving)
  • Covers "arriving", "is arriving", "usually arriving", and "is usually arriving", plus synonyms
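In SOUP rule notation, a leading "*" marks a constituent as optional. To see why the activity-spec rule above covers exactly the four phrases listed, here is a throwaway regex with the same optionality structure; it is only an illustration (SOUP is a chart parser, not a regex engine, and synonym expansion is omitted).

    import re

    # '(any )?' and '(good )?' mirror the optional *[object-ref=any] and
    # *[modifier=good] constituents; 'biking' is the required head.
    rule = re.compile(r"^(any )?(good )?biking$")

    for phrase in ["any good biking", "any biking", "good biking", "biking"]:
        assert rule.match(phrase), phrase
    print("all four phrases covered")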
Grammars
• Cross-domain grammar
  • Identifies simple domain-independent DAs:
      s[greeting] ([greeting=first_meeting] *[greet:to-whom=])
  • Covers "nice to meet you", "nice to meet you donna", and "nice to meet you sir", plus synonyms
• Shared grammar
  • Contains low-level rules accessible by all other grammars
Segmentation
• Identify SDU boundaries between argument parse trees
• Insert a boundary if either parse tree comes from the cross-domain grammar
• Otherwise, use a simple statistical model (see the sketch below)
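A minimal sketch of that decision rule follows. "ParseTree" and "boundary_prob" are invented names: the parse tree here is a bare stand-in for the parser's output, and the slides' simple statistical model is abstracted to a callable returning a boundary probability.

    from dataclasses import dataclass

    @dataclass
    class ParseTree:      # hypothetical stand-in for a parser output tree
        grammar: str      # "argument", "pseudo-argument", or "cross-domain"
        label: str

    def insert_boundary(left, right, boundary_prob, threshold=0.5):
        # A cross-domain parse on either side always forces an SDU boundary.
        if "cross-domain" in (left.grammar, right.grammar):
            return True
        # Otherwise defer to the statistical model, abstracted here as a
        # callable returning P(boundary | left, right).
        return boundary_prob(left, right) > threshold

    greet = ParseTree("cross-domain", "greeting")
    arg = ParseTree("argument", "disposition")
    print(insert_boundary(greet, arg, lambda a, b: 0.1))  # True: forced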
Using the IF Specification
• Check whether the best speech act and concept sequence form a legal IF
• If not, test alternative combinations of speech acts and concept sequences from the ranked set of possibilities
• Select the best combination that licenses the most arguments
• Drop any arguments not licensed by the best DA (sketched below)
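A sketch of that fallback search, with the IF specification abstracted behind two assumed predicates; "is_legal" and "licensed" are invented names, and the toy specification covers just two domain actions.

    def best_domain_action(speech_acts, concept_seqs, arguments,
                           is_legal, licensed):
        """speech_acts and concept_seqs are ranked by classifier confidence."""
        best, best_covered = None, -1
        for sa in speech_acts:
            for cs in concept_seqs:
                if not is_legal(sa, cs):         # skip DAs the spec forbids
                    continue
                covered = [a for a in arguments if licensed(sa, cs, a)]
                if len(covered) > best_covered:  # prefer more licensed args;
                    best = (sa, cs, covered)     # ties go to higher rank
                    best_covered = len(covered)
        return best  # arguments outside 'covered' are dropped

    spec = {("give-information", "disposition+trip"): {"disposition", "location"},
            ("greeting", ""): {"greeting"}}
    print(best_domain_action(
        ["give-information", "greeting"], ["disposition+trip", ""],
        ["disposition", "location"],
        is_legal=lambda sa, cs: (sa, cs) in spec,
        licensed=lambda sa, cs, a: a in spec[(sa, cs)]))
    # -> ('give-information', 'disposition+trip', ['disposition', 'location'])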
Grammar Development and Classifier Training
• Four steps:
  1. Write argument grammars
  2. Parse the training data
  3. Obtain segmentation counts
  4. Train the DA classifiers
• Steps 2-4 are automated to simplify testing new grammars
• Translation servers include a development mode for testing new grammars
Evaluation: IF Specification Fallback
• 182 SDUs required classification
• 4% had illegal DAs
• 29% had illegal IFs
• Mean arguments per SDU: 1.47
Evaluation: Data Ablation Experiment
• 16-fold cross-validation setup
• Test-set size: 400 SDUs
• Training-set sizes: 500, 1000, 2000, 3000, 4000, 5000, and 6009 (all data) SDUs
• Data from the previous C-STAR system
• No use of the IF specification
Future Work
• Alternative segmentation models, feature sets, and classification methods
• Multiple argument parses
• Evaluate portability and robustness:
  • Collect dialogues in a new domain
  • Create argument and full DA grammars for a small development set of dialogues
  • Assess portability by comparing grammar development times and examining grammar reusability
  • Assess robustness by comparing performance on unseen data
References
• Cattoni, R., M. Federico, and A. Lavie. 2001. Robust Analysis of Spoken Input Combining Statistical and Knowledge-Based Information Sources. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Trento, Italy.
• Daelemans, W., J. Zavrel, K. van der Sloot, and A. van den Bosch. 2000. TiMBL: Tilburg Memory Based Learner, version 3.0, Reference Guide. ILK Technical Report 00-01. http://ilk.kub.nl/~ilk/papers/ilk0001.ps.gz
• Gavaldà, M. 2000. SOUP: A Parser for Real-World Spontaneous Speech. In Proceedings of IWPT-2000, Trento, Italy.
• Gotoh, Y. and S. Renals. 2000. Sentence Boundary Detection in Broadcast Speech Transcripts. In Proceedings of the International Speech Communication Association Workshop: Automatic Speech Recognition: Challenges for the New Millennium, Paris.
• Lavie, A., F. Metze, F. Pianesi, et al. 2002. Enhancing the Usability and Performance of NESPOLE! – a Real-World Speech-to-Speech Translation System. In Proceedings of HLT-2002, San Diego, CA.
References
• Lavie, A., C. Langley, A. Waibel, et al. 2001. Architecture and Design Considerations in NESPOLE!: a Speech Translation System for E-commerce Applications. In Proceedings of HLT-2001, San Diego, CA.
• Lavie, A., D. Gates, N. Coccaro, and L. Levin. 1997. Input Segmentation of Spontaneous Speech in JANUS: a Speech-to-Speech Translation System. In Dialogue Processing in Spoken Language Systems: Revised Papers from the ECAI-96 Workshop, E. Maier, M. Mast, and S. LuperFoy (eds.), LNCS series, Springer Verlag.
• Lavie, A. 1996. GLR*: A Robust Grammar-Focused Parser for Spontaneously Spoken Language. PhD dissertation, Technical Report CMU-CS-96-126, Carnegie Mellon University, Pittsburgh, PA.
• Munk, M. 1999. Shallow Statistical Parsing for Machine Translation. Diploma thesis, Karlsruhe University.
• Stevenson, M. and R. Gaizauskas. 2000. Experiments on Sentence Boundary Detection. In Proceedings of ANLP-NAACL 2000, Seattle.