
Zheng-Hua Tan, Paul Dalsgaard and Børge Lindberg, Aalborg University, Denmark

Adaptive Multi-Frame-Rate Scheme for Distributed Speech Recognition Based on a Half Frame-Rate Front-End. Zheng-Hua Tan, Paul Dalsgaard and Børge Lindberg, Aalborg University, Denmark. Outline: background and motivation; half frame-rate front-end; experimental evaluation.


Presentation Transcript


  1. Adaptive Multi-Frame-Rate Scheme for Distributed Speech Recognition Based on a Half Frame-Rate Front-End Zheng-Hua Tan, Paul Dalsgaard and Børge Lindberg Aalborg University, Denmark IEEE MMSP 2005, Shanghai, China

  2. Outline
     • Background and motivation
     • Half frame-rate front-end
     • Experimental evaluation
     • Adaptive multi-frame-rate DSR scheme
     • Experimental evaluation
     • Conclusions

  3. Background and motivation
     • Distributed speech recognition (DSR): automatic speech recognition (ASR) over mobile networks
     • Challenges introduced by networking:
       • Bandwidth limitations
       • Transmission errors
     [Figure: DSR block diagram from speech input to recognised words. Block labels: Feature extraction, Source & channel coding, Network constraints, Source & channel decoding, ASR decoding.]

  4. Background and motivation
     • Existing solutions:
       • Source coding to compress speech features, e.g. split vector quantization, discrete cosine transform
       • Channel coding and error concealment to protect and recover speech features
     • Our alternative: work in the front-end feature extraction stage, exploiting the redundancies known to exist in full frame-rate (FFR) features:
       • Half frame-rate (HFR) front-end
       • Adaptive multi-frame-rate scheme

  5. Full frame-rate front-end
     • Temporal correlation between speech features is caused by:
       • Vocal tract inertia
       • Overlapping frames in the feature extraction procedure
     [Figure: frame timeline in ms showing 25 ms frame length, 10 ms frame shift, 15 ms overlap.]

  6. Half frame-rate front-end
     • 25 ms frame length & 20 ms frame shift → 5 ms overlap
     • But why is the FFR front-end prevalent in ASR systems?
     • And why is the HFR front-end promising in DSR?
     [Figure: frame timeline in ms showing 25 ms frame length, 20 ms frame shift, 5 ms overlap.]
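The frame arithmetic on slides 5 and 6 can be checked with a short sketch. This is only an illustration, assuming frames start at 0 ms and that only full-length frames are kept; the function names and the 1000 ms signal length are hypothetical and do not come from the presentation.

```python
def frame_starts(signal_ms, frame_len_ms=25, shift_ms=10):
    """Start times (ms) of every full-length frame that fits in the signal."""
    if signal_ms < frame_len_ms:
        return []
    return list(range(0, signal_ms - frame_len_ms + 1, shift_ms))


def overlap_ms(frame_len_ms, shift_ms):
    """Overlap between consecutive frames; zero if the shift exceeds the length."""
    return max(frame_len_ms - shift_ms, 0)


# Hypothetical 1-second utterance.
ffr = frame_starts(1000, frame_len_ms=25, shift_ms=10)  # full frame-rate
hfr = frame_starts(1000, frame_len_ms=25, shift_ms=20)  # half frame-rate

print(len(ffr), overlap_ms(25, 10))  # FFR frame count and overlap
print(len(hfr), overlap_ms(25, 20))  # HFR frame count and overlap
```

With these assumptions the HFR frames coincide with every second FFR frame, so halving the frame rate halves the number of feature vectors to compute and transmit.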

  7. HFR front-end in DSR
     • Observation: DSR performance degrades only marginally when packet loss occurs in short bursts, provided that a proper error concealment technique is applied.
     • So why not deliberately drop some packets (speech frames)?
     • HFR + repetition 'error concealment': prior to server-side recognition, each HFR feature vector is repeated once to construct the equivalent FFR vector sequence.
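The repetition step on slide 7 can be sketched as follows. This is a minimal illustration, assuming feature vectors are plain Python lists; the function name `repeat_hfr` and the optional `n_ffr_frames` trimming argument are hypothetical and not taken from the presentation.

```python
def repeat_hfr(hfr_vectors, n_ffr_frames=None):
    """Repeat each HFR feature vector once to build an FFR-length sequence.

    n_ffr_frames (an assumption, not from the slides) optionally trims the
    result, e.g. when the original FFR sequence had an odd number of frames.
    """
    ffr_like = [list(vec) for vec in hfr_vectors for _ in range(2)]
    if n_ffr_frames is not None:
        ffr_like = ffr_like[:n_ffr_frames]
    return ffr_like


# Two toy 2-dimensional feature vectors extracted at half frame rate.
hfr = [[1.0, 2.0], [3.0, 4.0]]
print(repeat_hfr(hfr))
```

The server-side recogniser then sees a sequence at the frame rate its FFR-trained models expect, which is why no retraining is implied by this step.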

  8. Experiments
     • Recognition accuracy (%) across the front-ends for three databases using FFR models
     • Repetition of each HFR feature vector is critical!
     [Table: recognition accuracy (%) by front-end and database; values not preserved in this transcript.]

  9. Derived DSR schemes
     • The FFR-based ETSI-DSR standard
     • The HFR front-end: half the bit rate
     • FFR-based one-frame coding
     • FFR-based interleaving24: no delay when there are no transmission errors, as opposed to regular interleaving!
     • FFR-based multiple description coding (MDC): odd-numbered & even-numbered feature vectors
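The odd/even split behind the MDC scheme on slide 9 can be sketched as below. The slide only states that the two descriptions carry odd-numbered and even-numbered feature vectors; the recovery rule used here when one description is lost (repeating each surviving vector, as in the HFR front-end) is an assumption for illustration, and all function names are hypothetical.

```python
def split_descriptions(vectors):
    """Split a feature sequence into odd-numbered (1st, 3rd, ...) and
    even-numbered (2nd, 4th, ...) descriptions."""
    return vectors[0::2], vectors[1::2]


def merge_descriptions(odd, even, n_frames):
    """Rebuild an n_frames-long sequence from the received descriptions.

    Pass None for a lost description. Assumed recovery rule: repeat each
    surviving vector once, padding with the last vector if needed.
    """
    if odd is not None and even is not None:
        merged = [None] * n_frames
        merged[0::2] = odd
        merged[1::2] = even
        return merged
    survivor = odd if odd is not None else even
    if not survivor:
        raise ValueError("no usable description received")
    out = [vec for vec in survivor for _ in range(2)]
    while len(out) < n_frames:
        out.append(out[-1])
    return out[:n_frames]


frames = [[0], [1], [2], [3], [4]]
odd, even = split_descriptions(frames)
print(merge_descriptions(odd, even, len(frames)))  # both descriptions arrive
print(merge_descriptions(odd, None, len(frames)))  # even description lost
```

When both descriptions arrive the full sequence is restored exactly; when one is lost the receiver still has a half frame-rate stream to work from.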

  10. Comparison of DSR schemes
     • Robustness against transmission errors (word error rate, WER, in %)
     • Aurora 2 database corrupted by GSM error pattern 3 (4 dB C/I ratio)
     • Which is the best?
     [Figure: WER chart comparing Error-free, MDC, Interleaving24, Half frame-rate – Repetition, ETSI-DSR Standard, and No CRC.]

  11. Adaptive multi-frame-rate scheme
     [Figure: system block diagram. Client front-end: Speech input, FFR Front-End, HFR Front-End, Split VQ Coder, Channel Encoder, with a Network Context input. Error-Prone Channel. Server back-end: Channel Decoder incl. EC, Split VQ Decoder, Recogniser, Words output.]

  12. Conclusions
     • Half frame-rate front-end for DSR:
       • Half frame-rate, half bit-rate, half client-side computation
       • Comparable performance, but repetition of HFR features is critical
     • Adaptive multi-frame-rate DSR scheme:
       • HFR
       • One-frame coding
       • Interleaving: no transmission errors, no delay
       • MDC: performance close to that of an error-free channel
