Mark Seligman, CEO. Speech-to-Speech Translation: A New Direction for the Speech Industry. SpeechTEK West February 21-23, 2007.
Converser for Healthcare is the world’s first commercially available speech-to-speech translation system for wide-ranging conversations. (Input via handwriting, touchscreen, and keyboard is also enabled.) Converser for Healthcare is an affordable, reliable, portable translation system that can improve communication 24/7 between healthcare workers and patients with limited English proficiency.
Overview • Automatic Spoken Language Translation (SLT) • an age-old dream • Practical SLT systems are now coming into use • … but users must cooperate and compromise • History: three classes of SLT systems • categorized by degree of user cooperation and linguistic or topical coverage • Demo • Market • Commercial and research activity
Star Trek? Not! • The goal: speak as usual • freely shift topics • full range of vocabulary, idioms, structures • spontaneous language: fragments, false starts, hesitations • mumble • converse in noisy environments • ignore the translation program • For now: some cooperation, compromise
The scientific problem: component integration • Component technologies (SR, MT, TTS) • imperfect, hard to integrate • Each is usable, but combination may fall below usefulness threshold • error rates combine, compound
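A rough way to see why error rates compound: if each component's errors were independent, the accuracy of the whole pipeline would be the product of the component accuracies. The sketch below assumes that independence, and the three accuracy figures are hypothetical values chosen only for illustration; they are not measurements from any system named in this talk.

```python
def pipeline_accuracy(component_accuracies):
    """Multiply per-component accuracies, assuming errors are independent."""
    result = 1.0
    for accuracy in component_accuracies:
        result *= accuracy
    return result


# Hypothetical accuracies for SR, MT, and TTS (illustrative only).
sr, mt, tts = 0.90, 0.85, 0.98
combined = pipeline_accuracy([sr, mt, tts])

print(f"SR={sr:.2f}  MT={mt:.2f}  TTS={tts:.2f}  combined={combined:.2f}")
```

Under these assumptions the combined figure is lower than the weakest single component, which is the sense in which a chain of individually usable parts may fall below the usefulness threshold.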
Class One • Class One: voice-driven phrase book (example: Phraselator by VoxTec) • linguistic coverage: narrow • topical coverage: narrow • cooperation required: low • Fixed expressions or templates only • “I’d like a bottle of [beer, wine, soda], please.” • “I’d like a bottle of [BEVERAGE], please.” • Advantages for user • no need to carry a book -- use telephone • selection of phrase by voice rather than finger • translation output pronounced by a native speaker • Technology • Speech recognition: IVR • MT: flat lookup, template or example-based • Engineering exercise: low risk
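The template idea on this slide can be sketched as a flat lookup table: each template is expanded once with every allowed slot filler, and translation is then a single dictionary lookup. This is a minimal illustration, not how Phraselator or any other product is implemented; the names `TEMPLATES`, `SLOT_FILLERS`, `build_phrase_book`, and `translate` are hypothetical, the French target text is an assumption, and only one slot per template is handled.

```python
# Hypothetical phrase book: (English template, French template) pairs.
TEMPLATES = [
    ("I'd like a bottle of [BEVERAGE], please.",
     "Je voudrais une bouteille de [BEVERAGE], s'il vous plaît."),
]

# Hypothetical slot fillers: English word -> French word.
SLOT_FILLERS = {
    "BEVERAGE": {"beer": "bière", "wine": "vin", "soda": "soda"},
}


def build_phrase_book(templates, fillers):
    """Expand every template into a flat source -> target lookup table."""
    book = {}
    for source, target in templates:
        for slot, words in fillers.items():
            token = f"[{slot}]"
            if token not in source:
                continue
            for source_word, target_word in words.items():
                key = source.replace(token, source_word).lower()
                book[key] = target.replace(token, target_word)
    return book


PHRASE_BOOK = build_phrase_book(TEMPLATES, SLOT_FILLERS)


def translate(utterance):
    """Return the stored translation, or None if the phrase is not covered."""
    return PHRASE_BOOK.get(utterance.strip().lower())


print(translate("I'd like a bottle of wine, please."))
print(translate("Could I have some tea?"))  # outside the phrase book
```

The second call returns `None`, which shows the narrow coverage of this class: anything outside the fixed expressions is simply not translated.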
Other Class One • Sony TalkMan • Pending entries • Sharp • NEC • Future IVR systems?
Class Two • Class Two: robust speech translation in narrow domains • linguistic coverage: broad • topical coverage: narrow • cooperation required: medium • Examples • Uh, could I reserve a double room for next Tuesday, please? • I need to, um, I need a double room please. That’s for next Tuesday. • Hello, I’m calling about reserving a room. I’d be arriving next week on Tuesday. • Advantages • Lots of experience • Can optimize SR, MT: special grammars (patterns) • Interlingua possible for MT • Challenges • Robust parsing still imperfect, so MT input is dirty • Some user frustration inevitable, but balanced by freedom • Risk: medium
Class Two: Worldwide Research • CMU/Univ Karlsruhe (USA/Germany) • ATR (Japan) • IRST (Italy) • ETRI (Korea) • GETA-CLIPS (France) • CAS-NLPR (China) • IBM (USA)
Class Three • Class Three: highly interactive speech translation with broad linguistic and topical coverage • linguistic coverage: broad • topical coverage: broad • cooperation required: extensive • User achieves broad coverage by supervising • SR: need dictation for broad coverage • MT: need broad coverage, good quality • Must be modifiable to enable interactive correction
In the beginning … • French: Qu’est-ce que vous étudiez? • (What do you study?) • English: Computer science. • (L’informatique.) • French: Qu'est-ce que vous faites plus tard? • (What are you doing later?) • English: I'm going skiing. • (Je vais faire du ski.) • French: Vous n'avez pas besoin de travailler? • (You don't need to work?) • English: I'll take my computer with me. • (Je prendrai mon ordinateur avec moi.) • French: Où est-ce que vous mettrez l'ordinateur pendant que vous skiez? • (Where will you put the computer while you ski?) • English: In my pocket. • (Dans ma poche.)
Market: U.S. Healthcare • 200,000 potential customers • Healthcare venues • 6,003 hospitals (2003 www.USNews.com) • 836,156 physicians (2001 www.ama.com) • 15-20 minutes/meeting • $45-$150/hour for human interpreter
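The two figures on this slide (15-20 minutes per meeting, $45-$150 per hour for a human interpreter) imply a per-meeting cost range. The sketch below simply multiplies them out; it assumes the interpreter is billed only for the meeting itself, with no minimum charge, travel, or waiting time, none of which the slide specifies.

```python
def interpreter_cost(minutes, hourly_rate):
    """Cost of one meeting, billed pro rata at the given hourly rate."""
    return minutes * hourly_rate / 60


# Low end: shortest meeting at the lowest rate; high end: the reverse.
low = interpreter_cost(15, 45)
high = interpreter_cost(20, 150)

print(f"Human interpreter cost per meeting: ${low:.2f} to ${high:.2f}")
```

On these assumptions a single meeting costs roughly $11 to $50 in interpreter fees.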
Value Proposition • Operational • significant ROI • 24/7 access to interpreting • reduced patient waiting time • more efficient use of employees (keep staff in their positions) • patient SAFETY (real and perceived) • reduced liability: bilingual transcripts of interaction with patients • compliance • Communication benefits • privacy • greater verifiability and consistency than with a human interpreter • Informed consent
Worldwide Market • IDC • Cross-language software: $67 billion (2000) to $237 billion (2005) • Worldwide e-business globalization support: > $540 billion • Multilingual communications, collaboration tools: $5 billion (by 2008) • Allied Business Intelligence, Inc. • Worldwide human translation: $5.7 billion (in 2006) • Global Reach • 70%+ of online population not native English
Markets • Defense and Security • services, intelligence, allies • law enforcement • Travel and Tourism • Language Instruction/Education • Government Service • immigration • welfare, food stamps, etc. • Business • B2C: customer service • B2B: multinational firms, global partners/operations • Consumer • online affinity/personal portals (e.g. online dating)
Some Current Research/Commercial Activity • Spoken Translation, Inc. (Converser) • IBM (Mastor) • Sehda (S-Minds) • SpeechGear (Compadre Interpreter) • VoxTec (Phraselator) • Sony/Sharp/NEC (tourist) • Ectaco (Dictionary +) • MIT (flight domain) • CMU (Arabic for military) • BBN (Arabic for military)
Thank you! To view the demo, visit: www.ConverserforHealthcare.com