
Wake-Up-Word Speech Recognition: A Missing Link to Natural Language Understanding

Dr. Veton Këpuska, ECE Department, vkepuska@fit.edu


Presentation Transcript


  1. Wake-Up-Word Speech Recognition: A Missing Link to Natural Language Understanding. Dr. Veton Këpuska, ECE Department, vkepuska@fit.edu

  2. What is Wake-Up-Word Recognition?
     • Wake-Up-Word (WUW) Speech/Voice Recognition (SR):
     • An automatic speech recognition task: identifying a single word/phrase in continuous, free speech.
       – Correct Recognition (e.g.):
         • <HAL>: Arthur C. Clarke's “2001: A Space Odyssey”,
         • <Computer>: Capt. Picard's computer on the starship “Enterprise” in Star Trek, or
         • <Operator>: Capt. Këpuska's WUW-SR system
       – And, more importantly, automatically recognizing that any other noise/sound/word/phrase, etc. is NOT the WUW: Correct Rejection.

  3. WUW-SR
     • WUW-SR requires continuous monitoring of speech.
     • WUW can be used to:
       – Get attention,
       – Provide/change context,
       – Resynchronize communication, and
       – Mimic human-to-human interaction and communication in a way that is currently not possible.
     • WUW provides a significantly more efficient solution (memory and CPU) than any natural language understanding system.
     • It is a mode of communication that would enable more natural interaction between humans and machines.

  4. Natural Language Understanding (NLU) Task
     • The mission statement of the Massachusetts Institute of Technology's (MIT's) Spoken Language Systems Laboratory states:
       – “Our goal is both simple and ambitious – create technology that makes it possible for everyone in the world to interact with computers via natural spoken language. Conversational interfaces will enable us to converse with machines in much the same way that we communicate with one another and will play a fundamental role in facilitating our move toward an information-based society”.
     • To achieve this goal, the SR and NLU communities implicitly position the solution to the WUW problem within the overall natural language understanding problem:
       – Once a system that can understand the whole language is developed, the WUW problem will be solved.

  5. Natural Language Understanding Task: The Problem
     • There are two major problems with an approach that solves the WUW problem within a general speech and natural language understanding framework:
       – It is an expensive solution (CPU, memory, etc.).
       – It does not exist yet, because it is very difficult to achieve.
     • Even if NLU systems close to human capabilities can be developed, WUW is still needed (see slide 3).

  6. WUW-SR Acoustic-Linguistic Context
     • The current WUW implementation responds to the word only in the context in which a person would intuitively use a proper name to get attention:
       – It does not respond to other contexts in which the same word (e.g., “OPERATOR”) is used for other purposes.
     • What are other WUW contexts?

  7. Wizard of Oz Experiment (NSF 05-551 Proposal)
     • Study possible uses of WUW in human-to-human communication.
     • Collaboration with:
       – Dr. Deborah Carstens, Human-Machine Interface Specialist (FIT, Management Information Systems)
       – Dr. Ron Wallace, Bio-Behavioral Anthropology and English Language (UCF)
       – Department of Psychology, Behavior Analysis Laboratory

  8. History of Wake-Up-Word Speech Recognition
     • Wildfire of Waltham, Massachusetts:
       – Introduced a rudimentary Wake-Up-Word (WUW) recognition capability through a personal assistant application in the mid-1990s.
       – At the time, the problem was neither recognized nor developed as a WUW-SR problem.
       – The application was restricted to a specific word: “Wildfire”.
       – This custom solution did not perform sufficiently well; thus, Wildfire no longer exists.

  9. History of Wake-Up-Word Speech Recognition (cont.)
     • Këpuska generalized WUW recognition and introduced a novel way of performing it while at ThinkEngine Networks, Marlborough, MA (2001-2003).
       – The recognition performance of the patented solution allows practical application of WUW to any suitable word (e.g., Verizon's “IOBI” project).
       – The demonstration uses a fixed-point DSP implementation simulated on the Windows platform.
     • A new generation of the WUW-SR system, using a floating-point C++ implementation, is almost ready for prime time.
       – Simulations of the floating-point system indicate significant improvement over the fixed-point implementation.

  10. Wake-Up-Word Speech Recognition Technology
      • ~26,000 lines of fixed-point C code and model data.
      • Uses the Dynamic Time Warping (DTW) algorithm for pattern matching.
      • Features are based on Mel-Frequency Cepstral Coefficients (MFCCs) plus first- and second-order deltas.
      • Uses a single speaker-independent model.
      • Achieves high channel density (many simultaneous channels) on the DSP.
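Slide 10 names Dynamic Time Warping as the pattern matcher. The following minimal Python sketch shows textbook DTW between a stored template and an incoming utterance, each given as a list of feature vectors (for example, MFCCs plus deltas). It is not the patented matching listed on slide 12: the Euclidean local distance, the symmetric step pattern, and the normalization by n + m are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Local distance between two feature vectors (e.g. MFCC + deltas)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template: Sequence[Frame], utterance: Sequence[Frame]) -> float:
    """Normalized cost of the best monotonic alignment of template to utterance."""
    n, m = len(template), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("template and utterance must be non-empty")
    inf = float("inf")
    # cost[i][j]: best cumulative distance aligning template[:i] with utterance[:j]
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(template[i - 1], utterance[j - 1])
            # symmetric steps: diagonal match, repeat a template frame, or repeat an utterance frame
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i][j - 1], cost[i - 1][j])
    # dividing by (n + m) keeps scores comparable across different lengths
    return cost[n][m] / (n + m)


if __name__ == "__main__":
    template = [[0.0], [1.0], [2.0]]
    slower = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # same shape, spoken more slowly
    other = [[5.0], [5.0], [5.0]]
    print(dtw_distance(template, slower))  # 0.0
    print(dtw_distance(template, other))   # 2.0
```

A lower score means a closer match: the slowed-down copy aligns perfectly despite its different length, which is the property that makes DTW useful for matching one spoken word against a template.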

  11. WUW-SR System: Initial Development
      • ThinkEngine Networks, Marlborough, MA
      • 84 simultaneous channels of WUW recognition on each fixed-point TI TMS320C205 DSP:
        – 200 MHz
        – Memory space: 64 KB program, 64 KB data, 2 MB external data
      • Total of 672 channels with a farm of 8 DSPs
      • Recognition rate >95% with ~0% false acceptance
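As a rough check of the figures on slide 11, the channel total and an approximate per-channel cycle budget work out as follows. The budget symbols are introduced here for illustration, and the calculation assumes the 200 MHz clock is shared evenly across the 84 channels with no overhead and a 10 ms frame shift (100 frames/s); neither assumption comes from the slide.

```latex
\begin{aligned}
N_{\text{channels}} &= 8~\text{DSPs} \times 84~\text{channels/DSP} = 672~\text{channels} \\
B_{\text{channel}} &\approx \frac{200 \times 10^{6}~\text{cycles/s}}{84~\text{channels}} \approx 2.38 \times 10^{6}~\text{cycles/s per channel} \\
B_{\text{frame}} &\approx \frac{2.38 \times 10^{6}~\text{cycles/s}}{100~\text{frames/s}} \approx 23{,}800~\text{cycles per frame per channel}
\end{aligned}
```

Under these assumptions, feature extraction, VAD, and matching for each channel must all fit in roughly 24 thousand cycles per frame, which is why a compact fixed-point implementation matters here.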

  12. Solution: 3 Patented Inventions
      • Fundamental contribution to pattern recognition:
        Patent Application 13323-009001 - 10/152,095: “Dynamic Time Warping (DTW) Matching”
      • Extended DTW matching:
        Patent Application 13323-010001 - 10/152,447: “Rescoring using Distribution Distortion Measurements of Dynamic Time Warping Match”
      • Feature-based Voice Activity Detector (VAD):
        Patent Application 13323-011001 - 10/144,248: “Voice Activity Detection Based on Cepstral Features”
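The third patent concerns voice activity detection driven by cepstral features. Its method is not described in these slides, so the following Python sketch is only a generic stand-in, not the patented VAD. It flags speech when the zeroth cepstral coefficient c0 (which tracks log energy) rises a fixed margin above an adaptively tracked noise floor, and it uses a hangover so word endings are not clipped. The function name `c0_vad` and all parameter values are hypothetical.

```python
from typing import List, Sequence


def c0_vad(c0: Sequence[float], margin: float = 3.0, alpha: float = 0.95,
           hangover: int = 5) -> List[bool]:
    """Hypothetical c0-threshold VAD (a generic stand-in, not the patented method).

    c0:       per-frame zeroth cepstral coefficient (tracks log energy)
    margin:   how far above the noise floor a frame must be to count as speech
    alpha:    smoothing factor for the noise-floor estimate
    hangover: frames still flagged as speech after the level drops
    """
    if not c0:
        return []
    floor = c0[0]  # assume the stream starts with background noise
    remaining = 0
    decisions: List[bool] = []
    for value in c0:
        if value > floor + margin:
            remaining = hangover  # speech: re-arm the hangover counter
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1  # trailing frames of a word
            decisions.append(True)
        else:
            decisions.append(False)
            # update the noise floor only on frames judged to be non-speech
            floor = alpha * floor + (1.0 - alpha) * value
    return decisions


if __name__ == "__main__":
    frames = [0.0, 0.0, 0.0, 10.0, 10.0, 0.0, 0.0, 0.0, 0.0]
    print(c0_vad(frames, hangover=2))
    # [False, False, False, True, True, True, True, False, False]
```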

  13. WUW Fixed-Point System Performance
      [Figure: distribution plot of confidence scores for WUW “Operator”. x-axis: confidence score (0-100%); y-axis: 0 to 1.0. Curves: INV, INV-cumulative, OOV, OOV-cumulative. Annotations: equal error rate, operating threshold.]
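The plot on slide 13 marks the equal error rate (EER): the threshold at which the fraction of rejected in-vocabulary (INV) samples equals the fraction of accepted out-of-vocabulary (OOV) samples. The Python sketch below reads an EER and its threshold off two lists of confidence scores. It assumes higher scores mean "more WUW-like" and sweeps only the observed score values, so with small data sets the two rates may not meet exactly; in that case it reports their mean at the closest point. Function names and sample scores are illustrative.

```python
from typing import Sequence, Tuple


def error_rates(inv: Sequence[float], oov: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """(false rejection rate, false acceptance rate) when score >= threshold accepts."""
    frr = sum(1 for s in inv if s < threshold) / len(inv)   # WUW samples rejected
    far = sum(1 for s in oov if s >= threshold) / len(oov)  # non-WUW samples accepted
    return frr, far


def equal_error_rate(inv: Sequence[float], oov: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, EER), sweeping the observed scores as candidate thresholds."""
    if not inv or not oov:
        raise ValueError("need at least one INV and one OOV score")
    best_gap, best_t, best_eer = float("inf"), 0.0, 0.0
    for t in sorted(set(inv) | set(oov)):
        frr, far = error_rates(inv, oov, t)
        if abs(frr - far) < best_gap:
            # closest crossing so far; report the mean of the two rates there
            best_gap, best_t, best_eer = abs(frr - far), t, (frr + far) / 2.0
    return best_t, best_eer


if __name__ == "__main__":
    inv_scores = [70.0, 80.0, 90.0, 95.0]  # true "Operator" utterances
    oov_scores = [10.0, 20.0, 30.0, 75.0]  # everything else
    print(equal_error_rate(inv_scores, oov_scores))  # (75.0, 0.25)
```

The operating threshold on the plot need not be the EER point: a WUW system that must almost never respond falsely would typically sit at a higher threshold, trading some false rejections for fewer false acceptances.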

  14. WUW-SR Development Status
      • Implemented a C++ ETSI-MFCC front end:
        – Extraction of Mel-Frequency Cepstral Coefficients
        – Standard processing technique, to be used as a baseline
        – The C++ framework and its implementation emphasize modularity to facilitate research
      • Implemented Dynamic Time Warping (DTW) as the back end of the recognition system.
      • Integrated Perl scripts to automate model building and accuracy testing procedures:
        – Includes automatic graph generation
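Slide 10 lists first- and second-order deltas alongside the MFCCs produced by the front end described on slide 14. A common way to compute them is the regression formula d_t = Σ_{k=1..N} k (c_{t+k} − c_{t−k}) / (2 Σ_{k=1..N} k²), applied once to the cepstra and again to the deltas. The Python sketch below uses N = 2 and clamps frame indices at the utterance edges; both choices, and the function names, are assumptions, since the slides do not state how the deltas are computed.

```python
from typing import List, Sequence


def deltas(frames: Sequence[Sequence[float]], n: int = 2) -> List[List[float]]:
    """Regression deltas: d_t = sum_k k*(c[t+k] - c[t-k]) / (2*sum_k k^2), k = 1..n."""
    if not frames:
        return []
    last = len(frames) - 1
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out: List[List[float]] = []
    for t in range(len(frames)):
        acc = [0.0] * len(frames[t])
        for k in range(1, n + 1):
            ahead = frames[min(t + k, last)]  # clamp at the end of the utterance
            behind = frames[max(t - k, 0)]    # clamp at the start
            for i in range(len(acc)):
                acc[i] += k * (ahead[i] - behind[i])
        out.append([a / denom for a in acc])
    return out


def with_deltas(cepstra: Sequence[Sequence[float]]) -> List[List[float]]:
    """Append first- and second-order deltas to each cepstral frame."""
    d1 = deltas(cepstra)
    d2 = deltas(d1)  # second-order deltas: deltas of the deltas
    return [list(c) + a + b for c, a, b in zip(cepstra, d1, d2)]


if __name__ == "__main__":
    ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # one coefficient rising linearly
    print(deltas(ramp))          # [[0.5], [0.8], [1.0], [0.8], [0.5]]
    print(with_deltas(ramp)[2])  # [2.0, 1.0, 0.0]
```

On a linear ramp the delta recovers the slope (1.0) away from the edges and the second-order delta is zero there, as expected; the smaller values at the ends come from the edge clamping.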

  15. Current Architecture of WUW-SR System

  16. Performance of WUW-SR Floating Point System

  17. WUW-SR System Performance
      • How is it possible to achieve this performance, considering:
        – a single speaker-independent model for the WUW, and
        – no additional modeling of other acoustic events (noise/tone/sound/word/phrase)?
      • Clever use of two-pass scoring.

  18. Usual Recognition Scoring: First Score
      • Standard “first” recognition score performance
      [Figure annotation: lowest score of an OOV sample]

  19. The “Second” Score is NOT Independent of the “First” Score
      • Distribution of the second score as a function of the first score
      [Figure annotation: lowest score of an OOV sample]
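Slides 17 to 19 state only that a second score, correlated with the first, is used in a two-pass scheme; the actual rescoring (the second patent on slide 12) is not disclosed. Purely to illustrate why the correlation matters, the hypothetical Python rule below lets the second-pass threshold depend linearly on the first score, and it calls the (presumably more expensive) second scorer only for candidates that survive the first pass. The class name, the linear boundary, and the score orientation (higher means more WUW-like) are all assumptions, not the author's method.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TwoPassRule:
    """Hypothetical two-pass WUW decision (illustration only, not the patented rescoring).

    Scores are assumed to be oriented so that higher means "more WUW-like".
    """
    first_threshold: float  # pass 1: cheap gate on the first score
    slope: float            # pass 2: the threshold depends linearly on the first score,
    intercept: float        # as slope * first + intercept

    def accept(self, first: float, second_score: Callable[[], float]) -> bool:
        if first < self.first_threshold:
            return False  # rejected early; second_score is never called
        second = second_score()  # run the second pass only for survivors
        return second >= self.slope * first + self.intercept


if __name__ == "__main__":
    rule = TwoPassRule(first_threshold=50.0, slope=0.5, intercept=20.0)
    print(rule.accept(40.0, lambda: 100.0))  # False: fails pass 1
    print(rule.accept(60.0, lambda: 55.0))   # True: 55 >= 0.5 * 60 + 20 = 50
    print(rule.accept(60.0, lambda: 45.0))   # False: 45 < 50
```

Because the two scores are not independent, a fixed second-pass threshold would ignore information the first score already carries; a boundary that moves with the first score is one simple way to use it.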

  20. How to Obtain the “Second” Score?
      • All modern speech recognition systems use multiple scoring techniques:
        – Re-scoring N-best hypotheses to improve Correct Recognition, based on:
          • A more elaborate recognition algorithm:
            – Baum-Welch forward-backward HMM scoring vs.
            – Viterbi scoring
          • Different features:
            – MFCC (Mel-Frequency Cepstral Coefficients)
            – RASTA-PLP (Relative Spectral Transform - Perceptual Linear Prediction)
            – Other proprietary front ends
        – Re-scoring using additional models (of non-WUWs) to improve Correct Rejection (“garbage models”)
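Slide 20 contrasts Baum-Welch forward-backward HMM scoring with Viterbi scoring. The difference shows up on a tiny discrete HMM: the forward pass (the first half of forward-backward) sums the probability over all state paths, while Viterbi keeps only the single best path, so the forward log-likelihood is never smaller than the Viterbi score. The sketch below works in the log domain; the function names and the toy model are illustrative.

```python
import math
from typing import Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_and_viterbi(log_init: Sequence[float],
                        log_trans: Sequence[Sequence[float]],
                        log_emit: Sequence[Sequence[float]],
                        obs: Sequence[int]) -> Tuple[float, float]:
    """Return (forward log-likelihood over all paths, Viterbi log-score of the best path).

    log_trans[p][s]: log P(state s at t+1 | state p at t)
    log_emit[s][o]:  log P(symbol o | state s)
    """
    if not obs:
        raise ValueError("need at least one observation")
    states = range(len(log_init))
    alpha = [log_init[s] + log_emit[s][obs[0]] for s in states]
    delta = list(alpha)
    for o in obs[1:]:
        # forward: sum over predecessor states; Viterbi: keep only the best one
        alpha = [logsumexp([alpha[p] + log_trans[p][s] for p in states]) + log_emit[s][o]
                 for s in states]
        delta = [max(delta[p] + log_trans[p][s] for p in states) + log_emit[s][o]
                 for s in states]
    return logsumexp(alpha), max(delta)


if __name__ == "__main__":
    half = math.log(0.5)
    # toy 2-state, 2-symbol HMM in which every probability is 0.5
    fwd, vit = forward_and_viterbi([half, half], [[half, half], [half, half]],
                                   [[half, half], [half, half]], [0, 1])
    print(math.exp(fwd), math.exp(vit))  # about 0.25 (4 paths summed) and 0.0625 (one path)
```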

  21. WUW-SR System
      • Uses a proprietary solution that:
        – Does not require additional “garbage models” to increase robustness and the Correct Rejection rate,
        – Is model independent, and even
        – Is matching-algorithm independent (DTW, HMM, graphical modeling, or any other paradigm).

  22. What Next?
      • WUW-SR is a useful technology for numerous applications:
        – “Voice Activated” car navigation systems: current solutions use mixed interfaces, in which the driver must press a button while speaking to the system.
        – Dictation systems: these require launching the application and “informing” the system when dictation is “on” and when it is “off”.
        – PDAs: removing the stylus as a necessary interface tool.
        – Keyboard-less laptop computers.
        – “Smart Rooms”

  23. Smart Room Application

  24. Microphone Arrays • Applied Perception Laboratory CE313

  25. Noise Removal
      • First place at the UML-ADI competition, June 2005.
      • Developed Wiener-filter noise removal and implemented it on the Analog Devices “SHARC” DSP:
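Slide 25 mentions Wiener-filter noise removal without giving its details. In the frequency domain, a Wiener filter scales each bin by H = SNR / (1 + SNR). The Python sketch below estimates the per-bin SNR by power subtraction from a noise power estimate (for example, one averaged over frames a VAD marks as non-speech) and applies a gain floor to limit over-suppression. The function name, the SNR estimator, and the floor value are assumptions, not the competition DSP implementation.

```python
from typing import List, Sequence


def wiener_gains(noisy_power: Sequence[float], noise_power: Sequence[float],
                 gain_floor: float = 0.1) -> List[float]:
    """Per-bin Wiener gains H = SNR / (1 + SNR), with SNR estimated by power subtraction.

    noisy_power: |Y(k)|^2 for one frame of the noisy signal
    noise_power: estimated noise power per bin (e.g. averaged over non-speech frames)
    gain_floor:  minimum gain, limiting over-suppression and "musical noise"
    """
    gains: List[float] = []
    for y2, n2 in zip(noisy_power, noise_power):
        if n2 <= 0.0:
            gains.append(1.0)  # no noise estimate: pass the bin through
            continue
        snr = max(y2 / n2 - 1.0, 0.0)  # estimated clean-speech SNR in this bin
        gains.append(max(snr / (1.0 + snr), gain_floor))
    return gains


if __name__ == "__main__":
    noisy = [4.0, 1.0, 0.5]  # bin 0 well above the noise, bins 1-2 at or below it
    noise = [1.0, 1.0, 1.0]
    print(wiener_gains(noisy, noise))  # [0.75, 0.1, 0.1]
```

In a complete de-noiser, each complex FFT bin of the frame would be multiplied by its gain before the inverse FFT and overlap-add; those framing steps are omitted here.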

  26. Speech Processing and Recognition System Architecture
      • 48 kHz to 8 kHz down-sampling with a 70-tap FIR filter
      • Wiener-filter-based noise removal:
        – Switch-controlled activation of the de-noising algorithm
      • Automatic gain control:
        – Switch-controlled activation of the algorithm
      • LEDs indicate the processing state of the system
      • Wake-Up-Word speech recognition software:
        – ~26,000 lines of speech recognition engine code and model data in C
        – ~5,000 lines of embedded C code
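Going from 48 kHz to 8 kHz is decimation by 6, which requires low-pass filtering below the new 4 kHz Nyquist frequency before samples are discarded. The Python sketch below designs a 70-tap windowed-sinc FIR and computes only every sixth output. The 3.6 kHz cutoff, the Hamming window, and the function names are assumptions; the slide gives only the tap count and the two sample rates.

```python
import math
from typing import List, Sequence


def lowpass_fir(num_taps: int = 70, cutoff_hz: float = 3600.0,
                fs_hz: float = 48000.0) -> List[float]:
    """Windowed-sinc low-pass FIR (Hamming window), normalized to unity DC gain."""
    fc = cutoff_hz / fs_hz  # normalized cutoff in cycles/sample
    mid = (num_taps - 1) / 2.0  # 34.5 for 70 taps: symmetric, linear phase
    taps: List[float] = []
    for n in range(num_taps):
        x = n - mid
        ideal = 2.0 * fc if x == 0 else math.sin(2.0 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def decimate(signal: Sequence[float], taps: Sequence[float], factor: int = 6) -> List[float]:
    """Low-pass filter and keep every `factor`-th sample (48 kHz / 6 = 8 kHz).

    Only the outputs that are kept are computed, which is the usual saving
    of a decimating FIR over filtering at the full rate and then discarding.
    """
    out: List[float] = []
    for i in range(0, len(signal), factor):
        acc = 0.0
        for k, h in enumerate(taps):
            if i - k < 0:
                break  # samples before the start are treated as zero
            acc += h * signal[i - k]
        out.append(acc)
    return out


if __name__ == "__main__":
    h = lowpass_fir()
    fs = 48000.0
    tone_1k = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(4800)]
    tone_12k = [math.sin(2 * math.pi * 12000 * n / fs) for n in range(4800)]
    # skip the first 12 outputs (72 input samples) while the 70-tap filter fills up
    print(max(abs(v) for v in decimate(tone_1k, h)[12:]))   # near 1: passband
    print(max(abs(v) for v in decimate(tone_12k, h)[12:]))  # near 0: stopband
```

With 70 taps and a Hamming window, the transition band is roughly 2 kHz wide, so in this sketch some energy just above 4 kHz can still alias into the 8 kHz output; a real design would be checked against its measured frequency response.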

  27. Experimental Results Windows PC Noisy test file: After de-noise:

  28. Experimental Results Windows PC Footloose: Not Footloose:

  29. Results: why didn’t this work? Hair dryer: Still there?!?!:

  30. Experimental Results Windows PC Hair dryer: Gone:

  31. Experimental Results on DSP • Brown Noise Example:

  32. Experimental Results on DSP • Drill Test

  33. Experimental Results on DSP • Closer Drill Noise

  34. Experimental Results on DSP • Brown Noise + Drill

  35. Research: Tools Development • MATLAB (NSF EMD-MLR), Perl, gnuplot

  36. What is missing?
      • More highly motivated students are needed.
        – No news there!
      • Business opportunities and ventures need to be considered.
      • Help, advice, … welcome.
