Linguistic Resources needed by Nuance

Linguistic Resources needed by Nuance • Jan Odijk • 060528 • Cocosda/Write Workshop

Overview • Nuance History • Nuance Technologies • Nuance Language Coverage • Which Languages are needed • Which data are needed • Advantages

Nuance History • ScanSoft (Digital Imaging) acquired: • Lernout & Hauspie speech divisions (2001) • Philips Speech Processing embedded and network divisions (2002) • Telelogue (2003) • LocusDialog (2003) • SpeechWorks (2004) • Talks (2004) • ART (2005) • Phonetic Systems (2005) • Rhetorical (2005) • MedRemote (2005) • Nuance (2005; company renamed Nuance) • Dictaphone (2006)

Nuance Technologies • Digital Imaging • Speech Technologies • Text-to-Speech (TTS) • Automatic Speech Recognition (ASR) • Dictation • Speaker Verification • Audiomining • Speech Applications/Solutions • Automated Attendant Systems • Directory Assistance Systems • Dictation end-user application • Multimodal applications

Nuance Technologies • Platforms • Server • DeskTop • Embedded • Automotive • Mobile Phones • Domains • Horizontal • Vertical • Medical • Legal • Navigation • ....

Nuance Language Coverage • Broad language coverage • OCR supports 114 languages • DeskTop Dictation in 8 languages • TTS > 23 languages • Telephony ASR > 40 languages • Embedded ASR > 11 languages • Broad language coverage necessary • Most business customers operate internationally • They want a single provider of language and speech technologies

Nuance Language Coverage • Language Coverage must be further broadened! • Data are needed for that, but ... • Costs are high • No single company can afford the investments

Which Languages? • Priority 1 • Arabic, Chinese (Mandarin, Cantonese), Danish, Dutch, English (UK), English (US), Farsi, Finnish, French, French (Canadian), German, Hindi, Indonesian, Italian, Malaysian, Pilipino (Tagalog), Polish, Portuguese, Portuguese (Brazil), Russian, Spanish, Spanish (American), Swedish, Thai, Turkish, Vietnamese, ... • Priority 2 • Bulgarian, Croatian, Czech, Estonian, Greek, Gujarati, Hebrew, Hungarian, Icelandic, Japanese, Kannada, Kazakh, Khmer, Latvian, Lithuanian, Macedonian, Malayalam, Marathi, Norwegian, Punjabi, Romanian, Serbian, Sesotho, Sinhalese, Slovak, Slovenian, Swahili, Tamil, Telugu, Ukrainian, Urdu, Uzbek, Xhosa, Zulu, ...

Which Data? • There’s no Data like More Data • but... • Given time and cost constraints, a minimal set of data is needed to develop technologies/applications for new languages

Which Data? • Network ASR: SpeechDat family • SpeechDat-II, OrienTel, SALA (I and II), LILA • Embedded ASR • Automotive: SpeechDat-Car • Consumer Apps: SPEECON • Pronunciation and Grammatical Lexicons: LC-STAR • TTS synthesis: TC-STAR • see • http://www.speechdat.org • http://www.tc-star.org • http://www.lc-star.com

Which Data? • Desktop Office data • Large Text Corpora (>300 million tokens plain text) • news • business / finance • traffic messages, weather messages • e-mail • SMS • ...

Advantages • Research can be done in your own language • Part of the costs can be recovered by licensing data via ELRA to companies • Companies can develop technologies/applications for your languages • Contributes to securing the position of your language in the Internet era • Ask your government for funding and support • Some good examples: • STEVIN Programme Netherlands/Flanders • UPC databases for Catalan (Asunción Moreno)
