A Thai Speech Translation System for Medical Dialogs

Tanja Schultz, Dorcas Alexander, Alan W Black, Kay Peterson, Sinaporn Suebvisai, Alex Waibel

[Figure: system diagram. Labels: Recognition/Analysis; Thai / English medical; SR+Parsing (CFG-Grammar); Source Lang Speech; SR+LM; Stat. Analysis; SOUP; IF; English / Thai medical; Direct SMT; Symbolic Generation; GenKit; Target Lang Speech; Target Language Text; Synthesis; Cepstral; Statistical Generation; IF2NL]

Speech Recognition

• Rapid development in new languages
• Limited training data (6 hours) provided by NECTEC from 34 speakers, plus 8 speakers for development and test
• Romanization of Thai script in order to:
  • allow non-Thai researchers to work with the Roman representation, for example in grammar development
  • provide the pronunciation directly in the romanized output, which makes things easier for the speech synthesis component
• Current dictionary covers the given 6-hour database (734 words)
• Rapid bootstrapping of acoustic models using a 7-lingual GlobalPhone model set (Chinese, Croatian, French, German, Japanese, Spanish, Turkish)
• Results on ASR indicate that rapid bootstrapping can be done successfully for a limited domain (see table)

Word accuracy [%] in Thai on the evaluation set:

  Acoustic model    Word accuracy
  CI-AM             83.63%
  CD-AM (500)       84.44%
  CD-AM (1000)      82.71%

System Architecture

• Tcl/Tk-based Communication Server
• Runs on Windows and Linux platforms
• Integrates several languages: Thai, English, Spanish, Chinese, ...
• Integrates different speech recognition approaches
  • Decoding along n-grams versus Context Free Grammars
• Integrates different translation approaches
  • IF-based translation versus statistical MT
• Integrates two approaches to natural language generation from IF
  • knowledge-based generation with pseudo-unification
  • statistical generation
• Allows transmission of IF across devices for (wireless) multi-party translation (see demo: laptop and PDA)
• Interface:
  • Hypothesis
  • Thai + Roman script
  • Parse tree (CFG)
  • Translation
  • IF representation

Translation

• Interlingua-based machine translation component: Interchange Format (IF)
  • abstracts from variation in syntax across languages
  • allows monolingual development for analysis and generation
  • provides paraphrase back into the source language
  • can be easily extended to new languages due to STAR structure
• Some extensions due to Thai characteristics:
  • The use of a term to indicate the gender of the person:
    • Thai: zookhee kha1 - Eng: okay (ending)
    • s[acknowledge] (zookhee *[speaker=])
  • An affirmation that means more than simply "yes":
    • Thai: saap khrap - Eng: know (ending)
    • s[affirm+knowledge](saap *[speaker=])
  • Verb separation of terms for feasibility and other modalities

Speech Synthesis

• First Thai voice built in the Festival Speech Synthesis System
• Limited domain targeting the Hotel Reservation domain
• 235 sentences that covered the main aspects of immediate interest
• Recorded, auto-labeled, and built a synthetic voice using FestVox tools
• Converted to a small-footprint portable version using Cepstral's Theta engine
• Rapid synthesis development in new languages:
  • Phoneme set shared with Speech Recognition
  • Lexicon with a 522-word vocabulary constructed by hand
  • Statistically trained letter-to-sound rules to bootstrap the required word coverage
  • Unit selection concatenative synthesis
  • Phones tagged with syllable and tone information for more fluent results
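
The word accuracy figures reported for Speech Recognition are not accompanied by a scoring formula on the poster. The sketch below assumes the usual definition, word accuracy = (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions and insertions in a minimum edit-distance alignment. The function name `word_accuracy` and the sample strings are illustrative only and are not taken from the original system or its data.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy in percent, assuming WA = (N - S - D - I) / N.

    Both arguments are whitespace-separated word sequences, e.g.
    romanized Thai transcripts. The result can be negative when the
    hypothesis contains many insertions.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one row at a time.
    # prev[j] = minimum edits to turn the first i-1 reference words
    # into the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    errors = prev[-1]  # S + D + I for the best alignment
    return 100.0 * (len(ref) - errors) / len(ref)


if __name__ == "__main__":
    # Made-up example: one substitution and one deletion in four words.
    print(word_accuracy("a b c d", "a x c"))
```

Under this definition, word accuracy is 100% minus the word error rate, so insertions are penalized as well as substitutions and deletions.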