HLT Development NESPOLE! Pittsburgh Meeting December 4, 2000
Session Agenda
• HLT Server demo
• Partner updates on HLT module development: SR, Analysis, Generation
• Status of HLT servers and architecture: functionality, coverage
• Status of data collection, transcription and annotation
• Planning of D6: annotated data for 1st showcase
• Development timelines
Timeline for Integration and HLT Development
• Mar 1, 2001: Demonstration at EC
• Feb 23, 2001: Complete system fully tested
• Feb 12, 2001: Start intensive tests between E/G/F clients and I agent at APT - initial user studies!
• Jan 29, 2001: System integration complete, begin technical tests
• Jan 15, 2001: Each site completes integration with Aethra mediator, starts tests of integration
D6: Annotated Data for SC-1
• Description of scenario development
• Description of data collection procedures
• Summary of data collected
• Annotated data for the four languages? (at least samples) [Fabio - check with PO]
Discussion Issues for Tuesday
• Prioritize scenarios, focus on ONE?
• Functionality of mediator: audio transmission to both sides
• Status of mediator-HLT integration (timestamps etc.)
• Lessons learned from data collection - Celine and Susi
Nespole! HLT Objectives
• Scalability - expansion of existing domain:
  • expanding coverage of IF to broader Travel Domain as required for first showcase
  • development of analysis and generation approaches that support easy expansion
  • new broad and general IF representation and appropriate analysis and generation approaches
Nespole! HLT Objectives
• Portability - easy expansion into new domains:
  • extending existing IF with Domain Actions for other domains (Help Desk for 2nd showcase)
  • new broad IF representation
  • new analysis and generation approaches that are appropriate for the new broad IF
Nespole! HLT Objectives
• Robustness - ability to handle more corrupt input and graceful degradation of performance:
  • multiple alternative analysis/translation approaches
  • better identification of out-of-domain utterances and confidence measures
HLT Server Components
• Each HLT Server consists of an Analysis Chain and a Generation Chain
• Analysis Chain:
  • Speech Recognition + analysis into IF
• Generation Chain:
  • Generation from IF + Speech Synthesis
• Each site free to develop its own analysis and generation technology
• Communication between modules is primarily via IF, using the CommSwitch server and protocol
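The two chains can be read as function composition around the IF. The sketch below is only an illustration under stated assumptions: the class `HLTServer`, the helpers `translate` and `make_toy_server`, the stub engines and the `TOY-IF:` strings are all hypothetical, and the CommSwitch transport is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HLTServer:
    """Hypothetical model of one site's server: four pluggable engines."""
    language: str
    recognize: Callable[[bytes], str]    # speech -> text
    analyze: Callable[[str], str]        # text -> IF
    generate: Callable[[str], str]       # IF -> text
    synthesize: Callable[[str], bytes]   # text -> speech

    def analysis_chain(self, audio: bytes) -> str:
        # Speech Recognition + analysis into IF
        return self.analyze(self.recognize(audio))

    def generation_chain(self, interchange: str) -> bytes:
        # Generation from IF + Speech Synthesis
        return self.synthesize(self.generate(interchange))


def translate(source: HLTServer, target: HLTServer, audio: bytes) -> bytes:
    # Only the IF crosses between sites, so each site can pick its own engines.
    return target.generation_chain(source.analysis_chain(audio))


def make_toy_server(language: str, greeting: str) -> HLTServer:
    # Toy stand-ins: "audio" is UTF-8 text and the IF is a made-up label.
    return HLTServer(
        language=language,
        recognize=lambda audio: audio.decode("utf-8"),
        analyze=lambda text: "TOY-IF:greeting" if greeting in text.lower() else "TOY-IF:unknown",
        generate=lambda interchange: greeting if interchange == "TOY-IF:greeting" else "?",
        synthesize=lambda text: text.encode("utf-8"),
    )


english = make_toy_server("en", "hello")
italian = make_toy_server("it", "ciao")
print(translate(english, italian, b"Hello there"))
```

Because the servers only share the IF, swapping one site's analyzer or generator does not affect the other site, which is the point of the "site technology freedom" requirement.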
Main Constraints and Requirements
• Maintain site technology freedom and distributed HLT development as much as possible
• Leverage off existing C-STAR technology
  • start with existing analysis and generation engines
  • use (extend) C-STAR CommSwitch protocol
• New server architecture allows:
  • constant availability for testing and development
  • plug-and-play of new modules
  • separation of external API issues from required HLT communication
CMU/UKA Approach
• New analysis approach for domain-specific task-oriented language combines rule-based and statistical/trainable methods
• New analysis engine for new style IF, using chunk parser followed by new combiner and mapper
• Possible addition of MEMT direct translation approach for coverage and robustness
• Effective combination and disambiguation of all above approaches
• New generation from IF using GenKit
New Approach: SALT
SALT - Statistical Analyzer for Lang. Translation
• Combines ML trainable and rule-based analysis methods for robustness and portability
• Rule-based parsing restricted to well-defined set of argument-level phrases and fragments
• Trainable classifiers (NN, Decision Trees, etc.) used to derive the DA (speech-act and concepts) from the sequence of argument concepts
• Phrase-level grammars are more robust and portable to new domains
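The split between rule-based argument parsing and a trained DA classifier can be sketched as follows. This is a minimal illustration, not the SALT implementation: the regular expressions stand in for phrase-level grammars, a simple frequency table stands in for the NN or decision-tree classifier, and all concept names and DA labels are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical phrase-level "grammar": one pattern per argument concept.
ARGUMENT_PATTERNS = {
    "price": re.compile(r"\b(how much|price|cost)\b"),
    "room": re.compile(r"\b(single|double) room\b|\broom\b"),
    "time": re.compile(r"\b(today|tomorrow|tonight)\b"),
}


def parse_arguments(utterance):
    """Rule-based step: return the argument concepts in order of appearance."""
    text = utterance.lower()
    hits = []
    for concept, pattern in ARGUMENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            hits.append((match.start(), concept))
    return tuple(concept for _, concept in sorted(hits))


class CountClassifier:
    """Trainable step (stand-in): most frequent DA seen for a concept sequence."""

    def __init__(self):
        self.counts = defaultdict(Counter)
        self.overall = Counter()

    def train(self, examples):
        for concepts, dialogue_act in examples:
            self.counts[concepts][dialogue_act] += 1
            self.overall[dialogue_act] += 1

    def predict(self, concepts):
        # Back off to the globally most frequent DA for unseen sequences.
        table = self.counts.get(concepts) or self.overall
        return table.most_common(1)[0][0]


# Hypothetical annotated training pairs: (argument concepts, DA label).
TRAINING = [
    (("price", "room"), "request-information+price+room"),
    (("price", "room"), "request-information+price+room"),
    (("room", "time"), "request-action+reservation+room"),
    (("time",), "give-information+time"),
]

classifier = CountClassifier()
classifier.train(TRAINING)

concepts = parse_arguments("How much is a double room?")
print(concepts, classifier.predict(concepts))
```

Only the small pattern table is domain-specific rule work; moving to a new domain would mostly mean retraining the classifier on newly annotated data, which is the portability argument the slide makes.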
Alternative Approach: MEMT
Multi-Engine Machine Translation
• Translates directly into target language (no IF)
• Based on Pangloss/Diplomat translation system developed at CMU
• Uses a combination of EBMT, phrase glossaries and a bilingual dictionary
• English/German system operational
• Good fall-back for uncovered utterances
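A rough picture of how three knowledge sources can back each other up is sketched below. It is a simplified priority cascade, not the actual MEMT combination method: the function `translate_direct` and the tiny English-to-German tables are hypothetical, and the exact-match lookup is only a stand-in for example-based translation.

```python
# Hypothetical toy resources (English -> German).
EXAMPLES = {"good morning": "guten Morgen"}                        # stand-in for EBMT
GLOSSARY = {"thank you": "danke", "double room": "Doppelzimmer"}   # phrase glossary
DICTIONARY = {"hotel": "Hotel", "and": "und"}                      # bilingual dictionary


def translate_direct(sentence):
    """Translate without an IF: whole example, then phrases, then single words."""
    text = sentence.lower().strip(" .?!")
    if text in EXAMPLES:
        return EXAMPLES[text]

    words = text.split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest glossary phrase (two or more words) starting at i.
        for n in range(len(words) - i, 1, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in GLOSSARY:
                output.append(GLOSSARY[phrase])
                i += n
                break
        else:
            # No phrase matched: fall back to the dictionary, or pass the word through.
            output.append(DICTIONARY.get(words[i], words[i]))
            i += 1
    return " ".join(output)


print(translate_direct("Good morning!"))
print(translate_direct("hotel and double room"))
```

Unknown words pass through unchanged, so the cascade always returns something, which is what makes a direct approach usable as a fall-back when IF analysis does not cover an utterance.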
Data Collection for First Showcase
• Data collection with APT agent:
  • real dialogues between users and APT agents
  • monolingual dialogues
  • 28+8 English dialogues collected in 4 sessions
  • 28 dialogues transcribed
  • none annotated with IF (yet)
• Lessons and Comments:
  • realistic scenario
  • uneven dialogues: agent dominates conversation
  • problems with recording/collection setup
Data Transcription and Annotation
• May-00 Goals and Timeline:
  • 50 dialogues per language, 4 dialogues per hour
  • data collection by end of August
  • transcription by end of September
  • annotation with IF by end of October
• Revised schedule...
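The May-00 figures imply a rough effort estimate. The calculation below is a sketch under two assumptions that the slide does not state: that "4 dialogues per hour" is the throughput of a single activity, and that the target applies to the four project languages.

```python
DIALOGUES_PER_LANGUAGE = 50
DIALOGUES_PER_HOUR = 4
LANGUAGES = 4  # assumption: the four project languages

# Hours needed for one pass over one language's dialogues at the stated rate.
hours_per_language = DIALOGUES_PER_LANGUAGE / DIALOGUES_PER_HOUR
total_hours = hours_per_language * LANGUAGES

print(f"{hours_per_language} hours per language, {total_hours} hours in total")
```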
Points for Discussion
• Definition of the Scenario for SC-1
• Timeline for data annotation
• Timeline for HLT module development
• Planning D6
Definition of Scenario (May-00)
• Analysis of APT email data (Paolo)
  • 9 main categories
  • developed ~20 specific scenarios
• APT will look at scenarios and prioritize them, and prioritize web pages (for translation to French) within 10 days
• We will use existing web pages for APT (in I,G,E), and some translated into French
• Goal is to focus on up to 10 scenarios