
HLT Specifications

HLT Specifications. Translation Components Nespole! Trento Meeting May 26, 2000. Nespole! HLT Objectives. State of the art (C-STAR II): Broad but limited domain (Travel Planning) Spontaneous spoken language (disfluencies, incomplete and non grammatical sentences)

lucine




Presentation Transcript


  1. HLT Specifications: Translation Components. Nespole! Trento Meeting, May 26, 2000

  2. Nespole! HLT Objectives
     • State of the art (C-STAR II):
       • Broad but limited domain (Travel Planning)
       • Spontaneous spoken language (disfluencies, incomplete and ungrammatical sentences)
       • Task-oriented dialogue (non-descriptive)
       • Incomplete coverage (semi-scripted demo)

  3. Main Analysis and Generation Approaches
     • Robust parsing using domain-specific semantic grammars and simple mappers (CMU/UKA)
     • Phrase analysis grammars and IF classification trees/mappers (IRST)
     • Syntactic and semantic analysis with mappers to and from IF (CLIPS)
     • Direct translation using a multi-engine architecture (EBMT, glossaries, dictionaries) (CMU/UKA would like to investigate)

  4. Nespole! HLT Objectives
     • Scalability - expansion of existing domain:
       • expanding coverage of IF to broader Travel Domain as required for APT showcase
       • development of analysis and generation approaches that support easy expansion
       • new broad and general IF representation and appropriate analysis and generation approaches

  5. Nespole! HLT Objectives
     • Portability - easy expansion into new domains:
       • extending existing IF with Domain Actions for other domains (Help Desk for 2nd showcase)
       • new broad IF representation
       • new analysis and generation approaches that are appropriate for the new broad IF

  6. Nespole! HLT Objectives
     • Robustness - ability to handle more corrupted input, with graceful degradation of performance:
       • multiple alternative analysis/translation approaches
       • better identification of out-of-domain utterances and confidence measures

  7. CMU/UKA Planned Approaches
     • New analysis approach for domain-specific task-oriented language that combines rule-based and statistical/trainable methods
     • New analysis engine for new-style IF, using a chunk parser followed by a new combiner and mapper
     • Possible addition of the MEMT direct translation approach for coverage and robustness
     • Effective combination and disambiguation of all the above approaches
     • New generation from IF using GenKit

  8. New Approach: SALT
     SALT - Statistical Analyzer for Language Translation
     • Combines ML trainable and rule-based analysis methods for robustness and portability
     • Rule-based parsing restricted to a well-defined set of argument-level phrases and fragments
     • Trainable classifiers (NN, Decision Trees, etc.) used to derive the DA (speech-act and concepts) from the sequence of argument concepts
     • Phrase-level grammars are more robust and portable to new domains
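
The two-stage idea on this slide can be illustrated with a minimal sketch. It is not the SALT implementation: the regex "grammar", the concept names, the DA labels, and the counting classifier (standing in for the NN or decision-tree classifiers named on the slide) are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical phrase-level "grammar": regex patterns mapped to argument concepts.
ARGUMENT_PATTERNS = [
    (re.compile(r"\b(single|double) room\b"), "room-type"),
    (re.compile(r"\b(monday|tuesday|friday)\b"), "time"),
    (re.compile(r"\b(how much|price|cost)\b"), "price"),
    (re.compile(r"\b(i would like|i want)\b"), "desire"),
]


def parse_arguments(utterance):
    """Rule-based step: return argument concepts in order of appearance."""
    text = utterance.lower()
    hits = []
    for pattern, concept in ARGUMENT_PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), concept))
    return [concept for _, concept in sorted(hits)]


class ConceptCountClassifier:
    """Trainable step (a simple stand-in for NN / decision trees).

    Scores each DA by the relative frequency of the observed concepts
    in that DA's training examples and returns the best-scoring DA.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)

    def train(self, examples):
        for concepts, dialogue_act in examples:
            self.counts[dialogue_act].update(concepts)

    def predict(self, concepts):
        if not concepts or not self.counts:
            return None  # nothing recognized: candidate for a fall-back approach

        def score(dialogue_act):
            total = sum(self.counts[dialogue_act].values())
            return sum(self.counts[dialogue_act][c] / total for c in concepts)

        return max(sorted(self.counts), key=score)


# Hypothetical training data: (argument concept sequence, DA label).
classifier = ConceptCountClassifier()
classifier.train([
    (["desire", "room-type", "time"], "request-action+reservation+room"),
    (["price", "room-type"], "request-information+price+room"),
])

concepts = parse_arguments("I would like a double room on Friday")
dialogue_act = classifier.predict(concepts)
```

Only the small set of phrase patterns is hand-written here; moving to a new domain in this sketch would mean retraining the classifier on new (concepts, DA) pairs, which is the portability argument the slide makes.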

  9. Alternative Approach: MEMT
     Multi-Engine Machine Translation
     • Translates directly into target language (no IF)
     • Based on Pangloss/Diplomat translation system developed at CMU
     • Uses a combination of EBMT, phrase glossaries and a bilingual dictionary
     • English/German system operational
     • Good fall-back for uncovered utterances
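
As a rough illustration of combining several engines without an IF, the sketch below does a greedy longest-match over three toy English-to-German lookup tables. This is a simplification chosen for brevity, not how Pangloss/Diplomat combines engine output; the tables, the engine priority order, and the `translate` function are all hypothetical.

```python
# Toy resources standing in for the three engine types named on the slide.
EXAMPLE_BASE = {"good morning": "guten Morgen"}  # stands in for EBMT
PHRASE_GLOSSARY = {"double room": "Doppelzimmer", "thank you": "danke"}
DICTIONARY = {"a": "ein", "please": "bitte", "room": "Zimmer"}

# Assumed priority order when several engines cover the same span.
ENGINES = [
    ("ebmt", EXAMPLE_BASE),
    ("glossary", PHRASE_GLOSSARY),
    ("dictionary", DICTIONARY),
]


def translate(sentence):
    """Greedy left-to-right cover: at each position take the longest span
    that any engine can translate; pass unknown words through unchanged."""
    words = sentence.lower().split()
    output, used = [], []
    i = 0
    while i < len(words):
        match = None
        for length in range(len(words) - i, 0, -1):
            span = " ".join(words[i:i + length])
            for name, table in ENGINES:
                if span in table:
                    match = (length, name, table[span])
                    break
            if match:
                break
        if match is None:
            output.append(words[i])
            used.append("untranslated")
            i += 1
        else:
            length, name, target = match
            output.append(target)
            used.append(name)
            i += length
    return " ".join(output), used


text, engines_used = translate("Good morning a double room please")
```

Because every word is either translated by some engine or passed through, the function always returns something, which is the property that makes a direct approach useful as a fall-back for utterances the IF-based analysis does not cover.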

  10. HLT Server Components
      • Each HLT Server consists of an Analysis Chain and a Generation Chain
      • Analysis Chain:
        • Speech Recognition + analysis into IF
      • Generation Chain:
        • Generation from IF + Speech Synthesis
      • Each site is free to develop its own analysis and generation technology
      • Communication between modules is primarily via IF, using the CommSwitch server and protocol
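
The server structure described on this slide can be sketched as two function chains that meet at the IF. Everything below is hypothetical: the `HLTServer` class, the stub recognizers and synthesizers (which just treat bytes as text), and the one-word IF string are illustrative and say nothing about the actual CommSwitch protocol.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HLTServer:
    """One site's server: each stage is pluggable, so a site can swap in
    its own technology as long as the chains still meet at the IF."""
    language: str
    recognize: Callable[[bytes], str]    # speech -> text
    analyze: Callable[[str], str]        # text -> IF
    generate: Callable[[str], str]       # IF -> text
    synthesize: Callable[[str], bytes]   # text -> speech

    def analysis_chain(self, audio: bytes) -> str:
        return self.analyze(self.recognize(audio))

    def generation_chain(self, interchange: str) -> bytes:
        return self.synthesize(self.generate(interchange))


def fake_recognizer(audio: bytes) -> str:
    # Stub: pretend the audio bytes are already the transcript.
    return audio.decode("utf-8")


def fake_synthesizer(text: str) -> bytes:
    # Stub: pretend the text bytes are the synthesized audio.
    return text.encode("utf-8")


english = HLTServer(
    language="en",
    recognize=fake_recognizer,
    analyze=lambda text: {"hello": "greeting"}.get(text, "unknown"),
    generate=lambda interchange: {"greeting": "hello"}.get(interchange, "?"),
    synthesize=fake_synthesizer,
)

italian = HLTServer(
    language="it",
    recognize=fake_recognizer,
    analyze=lambda text: {"ciao": "greeting"}.get(text, "unknown"),
    generate=lambda interchange: {"greeting": "ciao"}.get(interchange, "?"),
    synthesize=fake_synthesizer,
)

# English speech in, IF across the wire, Italian speech out.
interchange = english.analysis_chain(b"hello")
italian_audio = italian.generation_chain(interchange)
```

Only the IF string crosses between the two servers in this sketch, which is why neither site needs to know how the other implements recognition, analysis, generation, or synthesis.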

  11. Main Constraints and Requirements
      • Maintain site technology freedom and distributed HLT development as much as possible
      • Leverage existing C-STAR technology
        • start with existing analysis and generation engines
        • use (extend) C-STAR CommSwitch protocol
      • New server architecture allows:
        • constant availability for testing and development
        • plug-and-play of new modules
        • separation of external API issues from required HLT communication

  12. Data Collection for Translation Component Development
      • Analysis of extended domain for first showcase
        • CLIPS and APT data (also translated into English)
      • Preparations for data collection with APT
        • real dialogues between users and APT agents
        • monolingual dialogues
        • schedule? amount of data to be collected?
      • Annotation of collected data?

  13. Points for Discussion
      • Definition of the Scenario for SC-1
      • Data Collection with APT
      • Overview of Approaches
      • HLT Servers

  14. Definition of Scenario
      • Analysis of APT email data (Paolo)
        • 9 main categories
        • developed ~20 specific scenarios
      • Within 10 days, APT will review and prioritize the scenarios, and prioritize web pages for translation into French
      • We will use existing APT web pages (in Italian, German, English), with some translated into French
      • Goal is to focus on up to 10 scenarios

  15. Data Collection with APT
      • Logistics:
        • dedicated line (to be determined)
        • recording done centrally at the APT side by IRST, with data provided via the web site
      • Time-line:
        • start time to be determined (end of June?)
        • 50 dialogues per language, 4 dialogues per hour
        • data collection by end of August
        • transcription by end of September
        • annotation with IF by end of October
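
The recording effort implied by these figures is simple arithmetic, sketched below. The per-language numbers come from the slide; the number of languages is not stated there, so it is left as a parameter, and the value 4 used in the example call is an assumption.

```python
DIALOGUES_PER_LANGUAGE = 50  # from the slide
DIALOGUES_PER_HOUR = 4       # from the slide


def recording_hours(num_languages):
    """Return (hours per language, total hours across all languages)."""
    per_language = DIALOGUES_PER_LANGUAGE / DIALOGUES_PER_HOUR
    return per_language, per_language * num_languages


# Assumed: four languages; adjust to the actual number collected.
hours_each, hours_total = recording_hours(4)
```

At the stated rate, each language needs 12.5 hours of recording time.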

  16. HLT Servers
      • Modify existing C-STAR II components into a server module
        • initial server version ready by ~end of June
      • Comm Server between HLT modules will be updated by CMU and sent to the Nespole! web site

  17. Overview of Approaches
      • IRST: emphasis on statistical approaches to analysis and classification into IF; generation using a rule-based system
      • CLIPS: new IF-to-French generator; analysis approach will initially stay similar
