Semi-Supervised Learning Processes in Speech Recognition Systems

Semi-Supervised Learning Processes in Speech Recognition Systems. Atzmon Ghilai, AVIOS, September 2005. May 2005 – Announced acquisition of Nuance. ScanSoft’s Heritage. Nov 2004 – Acquired ART, Phonetic, and Rhetorical. Aug 2003 – Acquired LocusDialog. Aug 2003 – Acquired SpeechWorks.

Presentation Transcript


  1. Semi-Supervised Learning Processes in Speech Recognition Systems
     Atzmon Ghilai, AVIOS, September 2005

  2. ScanSoft’s Heritage
     • May 2005 – Announced acquisition of Nuance
     • Nov 2004 – Acquired ART, Phonetic, and Rhetorical
     • Aug 2003 – Acquired LocusDialog
     • Aug 2003 – Acquired SpeechWorks
     • Jan 2003 – Acquired Philips Speech Processing
     • Dec 2001 – Acquired L&H Speech Assets
     • Mar 1999 – Merged with Visioneer, renamed ScanSoft and listed on Nasdaq: SSFT
     • 1993 – Established as an independent Xerox business unit

  3. What Does ScanSoft Provide?
     Network-based Speech
       • Automated Speech Recognition (ASR)
       • Text-to-Speech (TTS)
       • Speaker Verification
       • Auto-Attendant
       • Directory Assistance
       • Packaged Solutions
       • Professional Services
     Embedded Speech
       • Automated Speech Recognition (ASR)
       • Text-to-Speech (TTS)
       • Speaker Verification
       • Command and Control
       • Automotive Solutions
       • Cell Phone Solutions
       • Professional Services
     Desktop Speech
       • Dictation and Transcription
       • Document Conversion/OCR
       • Document Scanning & Management
       • Electronic Forms
       • PDF Solutions
       • Network SDK Services (e.g., Medical Transcription)

  4. A Typical Automated Directory Assistance Application
     • Think about a 144 application with no operators…

  5. Automated DA – A Huge Technological Challenge
     • Huge vocabulary: millions of listings
     • Open to the public via land-line and cellular phones
     • People do not know exactly how listings appear in the directory
     • People are not aware of the caption hierarchy

  6. When the System is Deployed Out of the Box…
     It has level zero performance
     • Typically only 15-20% of the calls are actually automated
     • All the rest continue on to live operators

  7. How Can System Performance Improve?
     • To improve performance, we implemented a Semi-Supervised Learning Process
     • We took advantage of 2 facts:
       • Our systems process 220,000,000 calls a year!
       • Non-automated calls are handled by an operator

  8. What Are the Areas of Improvement?
     • Acoustic Model retraining using live calls
     • Adding Rules to DB Expert
     • Improving Phonetic Transcriptions
     • Decorations
     • Accumulating statistics on a priori probabilities
     • Defining Expanded Locality Boundaries
     • Improving Telephony User Interface

  9. 1) Acoustic Model Retraining
     • Acoustic Model retraining using live calls
     [Figure: phoneme “cloud” of scattered letters (R, A, Z, L, A, N, E, B, O, S, M), labeled “Phoneme Decoding”]

  10. 2) Aliasing Rules
      • “New England Hospital”
      • “Hospital on Washington Avenue”
      • “Children Clinic in New England Hospital”
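
One way to picture aliasing rules is as a table of alternative phrasings that resolve to a single directory listing. The sketch below is only an illustration: the slide lists the three phrases without saying which maps to which, so the `ALIASES` table, the choice of "New England Hospital" as the canonical listing, and the function name `resolve_listing` are all assumptions.

```python
# Hypothetical alias table. The mapping direction is assumed for illustration;
# the slide only lists the phrases.
ALIASES = {
    "hospital on washington avenue": "New England Hospital",
    "children clinic in new england hospital": "New England Hospital",
}


def resolve_listing(utterance: str) -> str:
    """Return the canonical listing for a phrase, or the phrase itself if no alias applies."""
    # Normalize case and whitespace so lookups are not sensitive to either.
    key = " ".join(utterance.lower().split())
    return ALIASES.get(key, utterance)


print(resolve_listing("Hospital on Washington Avenue"))
```

Phrases with no alias entry pass through unchanged, so the lookup can sit in front of an ordinary listing search.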

  11. 3) Phonetic Transcriptions
      • Improving Phonetic Transcriptions
      • Manual transcriptions may be added to Phonetic DB
      [Figure: “Home Depot” passes through Text to Phoneme (Phonetic DB, Transformation Rules) and comes out as “HOwM DIyPOT”]
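
The interplay between the Phonetic DB and the transformation rules could look roughly like the sketch below, assuming that a manual entry in the DB takes precedence over the rule-based result. That precedence, the function names, and the stand-in `rule_based_g2p` (which just uppercases the text and is not a real text-to-phoneme converter) are assumptions; only the "Home Depot" transcription comes from the slide.

```python
def rule_based_g2p(text: str) -> str:
    """Placeholder for real transformation rules; it only uppercases the text."""
    return text.upper()


def transcribe(text: str, phonetic_db: dict) -> str:
    """Prefer a manually added transcription; otherwise fall back to the rules."""
    key = " ".join(text.lower().split())
    manual = phonetic_db.get(key)
    return manual if manual is not None else rule_based_g2p(text)


# The single entry below is the example shown on the slide.
phonetic_db = {"home depot": "HOwM DIyPOT"}
print(transcribe("Home Depot", phonetic_db))
```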

  12. 4) Decorations
      • An example from our Call Center Solutions:
      • “I would like My Account Balance Please”
      [Figure: “please I would like my account balance”]
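
Decorations are the filler words callers wrap around the content they actually want. The slide does not say how the system handles them, so the sketch below simply strips a small, assumed list of decoration phrases from a recognized utterance; the list `DECORATIONS` and the function `strip_decorations` are hypothetical.

```python
import re

# Assumed decoration phrases; a real system would collect these from live calls.
DECORATIONS = ["i would like", "please", "my"]


def strip_decorations(utterance: str) -> str:
    """Remove known decoration phrases and return the remaining core content."""
    text = " ".join(utterance.lower().split())
    for phrase in DECORATIONS:
        # Word boundaries keep "my" from matching inside longer words.
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text)
    return " ".join(text.split())


print(strip_decorations("I would like My Account Balance Please"))
```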

  13. 5) A Priori Probabilities
      • Accumulating statistics on a priori probabilities
      • The probability that a caller will request “Account Balance” is much higher than “Card Balance”
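
To make the effect of a priori probabilities concrete, here is a minimal sketch that estimates priors from request counts and uses them to rescore recognition candidates by multiplying each candidate's likelihood by its prior. The request history, the likelihood values, and both function names are invented for illustration; the slides do not describe how the statistics are applied.

```python
from collections import Counter


def estimate_priors(requests):
    """Turn a history of requests into relative frequencies."""
    counts = Counter(requests)
    total = sum(counts.values())
    return {item: n / total for item, n in counts.items()}


def rescore(candidates, priors, floor=1e-6):
    """Pick the candidate with the highest likelihood * prior.

    candidates maps each phrase to a recognizer likelihood; phrases never
    seen in the history get a small floor prior instead of zero.
    """
    return max(candidates, key=lambda p: candidates[p] * priors.get(p, floor))


# Made-up history: nine "account balance" requests for each "card balance".
history = ["account balance"] * 9 + ["card balance"]
priors = estimate_priors(history)
best = rescore({"account balance": 0.4, "card balance": 0.6}, priors)
print(best)
```

In this made-up case the acoustically weaker candidate wins because it is requested far more often.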

  14. 6) Expanded Localities (Halos)
      • Definitions of expanded locality boundaries in DA
      • If boundaries are too narrow…
        • Probable user errors will not be corrected
      • If boundaries are too wide…
        • Excessive search
        • Confusions
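
The trade-off in choosing a halo can be sketched with a simple radius search: a small radius returns only the requested locality, while a large one pulls in many more localities to search and confuse. The locality names, planar coordinates, and radius values below are invented; the slides do not say how boundaries are actually defined.

```python
import math

# Hypothetical localities with planar coordinates in km (illustrative only).
LOCALITIES = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (30.0, 40.0)}


def halo(locality: str, radius_km: float) -> list:
    """Return all localities within radius_km of the requested one, sorted by name."""
    x0, y0 = LOCALITIES[locality]
    return sorted(
        name
        for name, (x, y) in LOCALITIES.items()
        if math.hypot(x - x0, y - y0) <= radius_km
    )


print(halo("A", 1))    # narrow halo: only the requested locality
print(halo("A", 5))    # wider halo: includes the nearby locality
print(halo("A", 100))  # very wide halo: everything is searched
```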

  15. 7) VUI (Voice User Interface)
      • Telephony User Interface improvements
      • We analyzed thousands of operator-assisted calls
        • What makes the user experience pleasant?
        • What drives the user crazy?
      • System: What city and state?
      • Caller: Boston Mass.
      • System: What listing?
      • Caller: Office Depot
      • System: What was the listing again?
      • Caller: Oh, Jesus… Office Depot.
      • System: I found… Jesus, is that OK?

  16. How is the Learning Process Done?
      [Diagram: process flow with boxes labeled Live calls, Call collection & delivery, Session log, Phone DB, Released #, Automated Analysis, Manual Annotation, Filtering/Prioritization, Manual DB Search, Learning Process, Batch run generation, Batch run execution, Performance benchmarks]

  17. Automated Analysis
      • Correlate caller’s utterance with DB listings
      • Good correlation: send to automatic learning process
      • Otherwise: send to manual analysis process
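
The routing decision in automated analysis can be sketched as a similarity check against the directory followed by a threshold. This is an illustration under stated assumptions: the slides do not say how correlation is measured, so string similarity from Python's `difflib`, the 0.8 threshold, the function name `route_call`, and the reading that a good correlation goes to the automatic learning process are all assumed.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def route_call(utterance_text: str, listings, threshold: float = 0.8):
    """Send well-correlated calls to automatic learning, the rest to manual analysis."""
    best = max(listings, key=lambda listing: similarity(utterance_text, listing))
    if similarity(utterance_text, best) >= threshold:
        return ("automatic", best)
    return ("manual", None)


listings = ["Office Depot", "Home Depot"]
print(route_call("office depot", listings))
print(route_call("birth certificates", listings))
```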

  18. Manual Annotation
      • Analysts listen and transcribe what the caller said
      • Analysts are born and raised in the target country

  19. Prioritization/Filtering
      • Max number of identical contents to be searched (at least 1…)
      • Text matching of cleaned contents to DB
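
A possible reading of these two bullets: annotated contents are cleaned, those whose text already matches a DB listing are filtered out, and the rest are queued for manual search with a cap on how many identical copies are kept. The sketch below follows that reading; the cleaning rule, the queue order (most frequent first), and the names `clean` and `prioritize` are assumptions.

```python
from collections import Counter


def clean(text: str) -> str:
    """Normalize case and whitespace before comparing contents."""
    return " ".join(text.lower().split())


def prioritize(contents, db_listings, max_identical: int = 1):
    """Build the manual-search queue from annotated call contents."""
    known = {clean(listing) for listing in db_listings}
    counts = Counter(clean(content) for content in contents)
    queue = []
    for content, n in counts.most_common():
        if content in known:
            continue  # text match against the DB, no manual search needed
        # Keep at most max_identical copies of the same content.
        queue.extend([content] * min(n, max_identical))
    return queue


calls = ["Birth Certificates", "birth  certificates", "Office Depot", "town hall"]
print(prioritize(calls, ["Office Depot"]))
```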

  20. Manual DB Search
      [Screenshot: search form with Content: “Birth Certificates” and locality “Any City”]

  21. Batch Run
      • Instead of taking live calls…
      • We send pre-recorded calls via a simulator…
      [Diagram labels: Batch Simulator, recordings, IVR Platform, Phonetic Servers, IP, PSTN]

  22. Batch Run Results
      • We can quantify the expected improvement
      • We can make sure there is no adverse effect
      • We can test stability, density
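
Quantifying improvement and checking for adverse effects can be sketched as a comparison of two batch runs over the same recorded calls. The per-call result format (call ID mapped to whether the call was automated), the sample data, and the function names are assumptions for illustration, not the presenter's benchmark format.

```python
def automation_rate(results: dict) -> float:
    """Fraction of calls in a batch run that were fully automated."""
    return sum(results.values()) / len(results)


def compare_runs(baseline: dict, candidate: dict) -> dict:
    """Compare two runs over the same calls and list calls that got worse."""
    regressions = sorted(
        call_id
        for call_id, automated in baseline.items()
        if automated and not candidate.get(call_id, False)
    )
    return {
        "baseline_rate": automation_rate(baseline),
        "candidate_rate": automation_rate(candidate),
        "regressions": regressions,
    }


# Made-up results: True means the call was automated in that run.
baseline = {"c1": True, "c2": False, "c3": False, "c4": False}
candidate = {"c1": True, "c2": True, "c3": False, "c4": False}
print(compare_runs(baseline, candidate))
```

An empty `regressions` list is the "no adverse effect" check: every call automated before the change is still automated after it.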

  23. It Really Works!!!

  24. Thank You! Atzmon Ghilai Atzmon.Ghilai@ScanSoft.com Tel: +972-3-9292602
