Semi-Supervised Learning Processes in Speech Recognition Systems
Atzmon Ghilai
AVIOS, September 2005
ScanSoft’s Heritage
• May 2005 – Announced acquisition of Nuance
• Nov 2004 – Acquired ART, Phonetic, and Rhetorical
• Aug 2003 – Acquired LocusDialog
• Aug 2003 – Acquired SpeechWorks
• Jan 2003 – Acquired Philips Speech Processing
• Dec 2001 – Acquired L&H Speech Assets
• Mar 1999 – Merged with Visioneer, renamed ScanSoft and listed on Nasdaq: SSFT
• 1993 – Established as an independent Xerox business unit
What Does ScanSoft Provide?
Network-based Speech
• Automated Speech Recognition (ASR)
• Text-to-Speech (TTS)
• Speaker Verification
• Auto-Attendant
• Directory Assistance
• Packaged Solutions
• Professional Services
Embedded Speech
• Automated Speech Recognition (ASR)
• Text-to-Speech (TTS)
• Speaker Verification
• Command and Control
• Automotive Solutions
• Cell Phone Solutions
• Professional Services
Desktop Speech
• Dictation and Transcription
• Document Conversion/OCR
• Document Scanning & Management
• Electronic Forms
• PDF Solutions
• Network SDK Services (e.g., Medical Transcription)
A Typical Automated Directory Assistance Application
• Think about a 144 application with no operators…
Automated DA – A Huge Technological Challenge
• Huge vocabulary: millions of listings
• Open to the public via land-line and cellular phones
• People do not know exactly how listings appear in the directory
• People are not aware of the caption hierarchy
When the System Is Deployed Out of the Box…
It has level-zero performance
• Typically only 15-20% of the calls are actually automated
• All the rest continue on to live operators
How Can System Performance Improve?
• To improve performance, we implemented a Semi-Supervised Learning Process
• We took advantage of two facts:
  • Our systems process 220,000,000 calls a year!
  • Non-automated calls are handled by an operator
What Are the Areas of Improvement?
• Acoustic Model retraining using live calls
• Adding Rules to DB Expert
• Improving Phonetic Transcriptions
• Decorations
• Accumulating statistics on a priori probabilities
• Defining Expanded Locality Boundaries
• Improving Telephony User Interface
1) Acoustic Model Retraining
• Acoustic Model retraining using live calls
[Figure: phoneme “cloud” and phoneme decoding]
2) Aliasing Rules
“New England Hospital”
“Hospital on Washington Avenue”
“Children Clinic in New England Hospital”
3) Phonetic Transcriptions
• Improving Phonetic Transcriptions
• Manual transcriptions may be added to Phonetic DB
[Diagram: Home Depot → Text to Phoneme (Phonetic DB, Transformation Rules) → HOwM DIyPOT]
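The lookup order suggested by this slide can be sketched as follows. This is only an illustration under stated assumptions: the dictionary `MANUAL_PHONETIC_DB`, the function names, and the stand-in rule engine (which merely uppercases the text) are hypothetical and are not part of the ScanSoft system. The one transcription shown is the slide's own example.

```python
# Hypothetical phonetic DB holding manually added transcriptions.
# The single entry is the example from the slide.
MANUAL_PHONETIC_DB = {
    "home depot": ["HOwM DIyPOT"],
}


def rule_based_transcription(text):
    """Stand-in for a real text-to-phoneme engine (assumption).

    A real engine would apply transformation rules; here we only
    uppercase the text so the sketch stays runnable.
    """
    return text.upper()


def transcriptions_for(listing):
    """Return candidate transcriptions, manual entries first."""
    key = " ".join(listing.lower().split())  # normalize case and spacing
    manual = MANUAL_PHONETIC_DB.get(key, [])
    automatic = rule_based_transcription(key)
    # Keep manual transcriptions in front; append the automatic one
    # only if it is not already present.
    return manual + [t for t in [automatic] if t not in manual]
```

Listings without a manual entry fall back to the rule output alone, which is the case the learning process would try to improve.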
4) Decorations
• Decorations
An example from our Call Center Solutions:
• “I would like My Account Balance Please”
[Figure: “please I would like my account balance”]
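A minimal text-level sketch of the decoration idea, assuming decorations are filler phrases around the core request. The phrase lists and the function name are hypothetical. In a deployed recognizer the decorations would more likely be modeled in the grammar itself than stripped afterwards.

```python
import re

# Hypothetical decoration phrases (assumption, not the product's lists).
LEADING = ["please", "i would like", "i want", "my", "the"]
TRAILING = ["please", "thank you"]


def strip_decorations(utterance):
    """Remove leading and trailing filler phrases from an utterance."""
    words = re.sub(r"[^a-z' ]", " ", utterance.lower()).split()
    text = " ".join(words)
    changed = True
    while changed:  # repeat until no decoration can be removed
        changed = False
        for phrase in LEADING:
            if text == phrase or text.startswith(phrase + " "):
                text = text[len(phrase):].strip()
                changed = True
        for phrase in TRAILING:
            if text == phrase or text.endswith(" " + phrase):
                text = text[:len(text) - len(phrase)].strip()
                changed = True
    return text
```

Both word orders from the slide reduce to the same core request, "account balance".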
5) A Priori Probabilities
• Accumulating statistics on a priori probabilities
• The probability that a caller will request “Account Balance” is much higher than “Card Balance”
[Figure: Card Balance vs. Account Balance]
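One common way to use such statistics is to estimate a prior from request counts and add its logarithm to the recognizer's score. The sketch below assumes that approach; the slide does not say how the system combines the two. The function names, the add-one smoothing, and any counts or scores passed in are illustrative assumptions.

```python
import math
from collections import Counter


def estimate_priors(requests, smoothing=1.0):
    """Estimate a priori probabilities from observed requests.

    Uses additive smoothing (an assumption) so that rare requests
    never get probability zero.
    """
    counts = Counter(requests)
    total = sum(counts.values()) + smoothing * len(counts)
    return {item: (n + smoothing) / total for item, n in counts.items()}


def rescore(acoustic_log_scores, priors):
    """Pick the hypothesis with the best acoustic log score plus log prior."""
    return max(
        acoustic_log_scores,
        key=lambda item: acoustic_log_scores[item] + math.log(priors[item]),
    )
```

With made-up data in which "account balance" is requested nine times as often as "card balance", a slightly better acoustic score for "card balance" is outweighed by the prior.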
6) Expanded Localities (Halos)
• Definitions of expanded locality boundaries in DA
• If boundaries are too narrow…
  • Probable user errors will not be corrected
• If boundaries are too wide…
  • Excessive search
  • Confusions
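The narrow-versus-wide trade-off can be illustrated with a toy halo defined as every locality within a number of hops on a neighbor graph. This is a sketch under assumptions: the slide does not say how halos are actually defined, and the `NEIGHBORS` table, the hop-based definition, and the function name are hypothetical.

```python
from collections import deque

# Made-up adjacency table for illustration only (assumption).
NEIGHBORS = {
    "Boston": ["Cambridge", "Brookline"],
    "Cambridge": ["Boston", "Somerville"],
    "Brookline": ["Boston"],
    "Somerville": ["Cambridge", "Medford"],
    "Medford": ["Somerville"],
}


def halo(locality, max_hops):
    """Return the set of localities within max_hops of the requested one."""
    distance = {locality: 0}
    queue = deque([locality])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # boundary reached, do not expand further
        for neighbor in NEIGHBORS.get(current, []):
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return set(distance)
```

A value of 0 searches only the locality the caller named, so a caller who names the wrong town is never corrected. Each extra hop enlarges the search set and with it the search cost and the chance of confusions.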
7) VUI (Voice User Interface)
• Telephony User Interface improvements
• We analyzed thousands of operator-assisted calls
  • What makes the user experience pleasant?
  • What drives the user crazy?
System: What city and state?
Caller: Boston Mass.
System: What listing?
Caller: Office Depot
System: What was the listing again?
Caller: Oh, Jesus… Office Depot.
System: I found… Jesus, is that OK?
How Is the Learning Process Done?
[Flow diagram with these components: Live calls; Call collection & delivery; Session log; Phone DB; Released #; Automated Analysis; Manual Annotation; Filtering/Prioritization; Manual DB Search; Learning Process; Batch run generation; Batch run execution; Performance benchmarks]
Automated Analysis
• Correlate caller’s utterance with DB listings
• Good correlation: send to automatic Learning process
• Otherwise: send to manual analysis process
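The routing decision on this slide can be sketched as a threshold test. The sketch assumes a text hypothesis of the caller's utterance is available and uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure. The real correlation measure, the 0.85 threshold, and the function names are assumptions.

```python
from difflib import SequenceMatcher


def best_match(text, listings):
    """Return (score, listing) for the DB listing most similar to text."""
    scored = [
        (SequenceMatcher(None, text.lower(), listing.lower()).ratio(), listing)
        for listing in listings
    ]
    return max(scored)


def route(text, listings, threshold=0.85):
    """Send well-correlated calls to automatic learning, the rest to analysts."""
    score, listing = best_match(text, listings)
    if score >= threshold:
        return ("automatic_learning", listing)
    return ("manual_analysis", None)
```

Calls routed to `manual_analysis` are the ones that continue to the manual annotation step.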
Manual Annotation
• Analysts listen to each call and transcribe what the caller said
• Analysts are born and raised in the target country
Prioritization/Filtering
• Max number of identical contents to be searched (at least 1…)
• Text matching of cleaned contents to DB
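One possible reading of these two bullets, sketched under assumptions: transcribed contents are cleaned, contents that already match a DB listing are filtered out, and the remaining identical contents are capped and ordered by frequency. The cleaning rule, the ordering, and the function names are hypothetical.

```python
import re
from collections import Counter


def clean(content):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", content.lower()).split())


def select_for_manual_search(contents, db_listings, max_identical=1):
    """Build the manual search queue from transcribed contents."""
    if max_identical < 1:
        raise ValueError("max_identical must be at least 1")
    known = {clean(listing) for listing in db_listings}
    counts = Counter(clean(content) for content in contents)
    queue = []
    # Most frequent contents first, so analysts see common requests early.
    for content, n in counts.most_common():
        if content in known:
            continue  # already matches a DB listing, no manual search needed
        queue.extend([content] * min(n, max_identical))
    return queue
```

Capping identical contents keeps analysts from searching the same request repeatedly while still guaranteeing each distinct request is looked at once.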
Manual DB Search
[Screenshot: search for Content: Birth Certificates, Any City]
Batch Run
• Instead of taking live calls…
• We send pre-recorded calls via a simulator…
[Diagram: Batch Simulator with recordings, connected over IP and PSTN to the IVR Platform and Phonetic Servers]
Batch Run Results
• We can quantify the expected improvement
• We can make sure there is no adverse effect
• We can test stability, density
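A sketch of how two batch runs over the same recordings might be compared, assuming each run yields a per-call flag saying whether the call was automated correctly. The data layout, the call IDs, and the function names are hypothetical.

```python
def automation_rate(results):
    """Fraction of calls in a batch run that were automated."""
    return sum(results.values()) / len(results)


def compare_runs(baseline, candidate):
    """Quantify improvement and list calls that got worse."""
    regressions = sorted(
        call_id
        for call_id, automated in baseline.items()
        if automated and not candidate.get(call_id, False)
    )
    return {
        "baseline_rate": automation_rate(baseline),
        "candidate_rate": automation_rate(candidate),
        "regressions": regressions,
    }
```

The rate difference quantifies the expected improvement, and a non-empty `regressions` list flags an adverse effect to investigate before deployment.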
Thank You!
Atzmon Ghilai
Atzmon.Ghilai@ScanSoft.com
Tel: +972-3-9292602