Adaptive Multi-Frame-Rate Scheme for Distributed Speech Recognition Based on a Half Frame-Rate Front-End
Zheng-Hua Tan, Paul Dalsgaard and Børge Lindberg
Aalborg University, Denmark
IEEE MMSP 2005, Shanghai, China

Outline
• Background and motivation
• Half frame-rate front-end
• Experimental evaluation
• Adaptive multi-frame-rate DSR scheme
• Experimental evaluation
• Conclusions

Background and motivation
• Distributed speech recognition (DSR) – automatic speech recognition (ASR) over mobile networks
• Challenges introduced by the network:
  • Bandwidth limitations
  • Transmission errors
[Figure: DSR block diagram with the blocks Speech, Feature extraction, Source & channel coding, Network constraints, Source & channel decoding, ASR decoding, Words]

Background and motivation
• Existing solutions:
  • Source coding to compress speech features, e.g. split vector quantization, discrete cosine transform
  • Channel coding and error concealment to protect and recover speech features
• Our alternative solutions, applied in the front-end feature extraction stage and based on the redundancies known to exist in full frame-rate (FFR) features:
  • Half frame-rate (HFR) front-end
  • Adaptive multi-frame-rate scheme

Full frame-rate front-end
• Temporal correlation between speech features is caused by:
  • Vocal tract inertia
  • Overlapping in the feature extraction procedure
[Figure: framing timeline in ms: 25 ms frame length, 10 ms frame shift, 15 ms overlap]

Half frame-rate front-end
• 25 ms frame length and 20 ms frame shift, giving a 5 ms overlap
• But why is the FFR front-end prevalent in ASR systems?
• And why is the HFR front-end promising in DSR?
[Figure: framing timeline in ms: 25 ms frame length, 20 ms frame shift, 5 ms overlap]

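
The framing arithmetic on the FFR and HFR slides can be checked with a short sketch. The function names and the 1000 ms signal length are illustrative assumptions, not taken from the slides; only the 25 ms frame length and the 10 ms and 20 ms frame shifts come from the source.

```python
def frame_starts(signal_ms, frame_len_ms=25, frame_shift_ms=10):
    """Start times (ms) of every full frame that fits in the signal."""
    last_start = signal_ms - frame_len_ms
    return list(range(0, last_start + 1, frame_shift_ms))


def overlap_ms(frame_len_ms, frame_shift_ms):
    """Overlap between two consecutive frames, in ms."""
    return max(frame_len_ms - frame_shift_ms, 0)


# Hypothetical 1-second utterance, framed at full and half frame rate.
ffr_frames = frame_starts(1000, frame_len_ms=25, frame_shift_ms=10)
hfr_frames = frame_starts(1000, frame_len_ms=25, frame_shift_ms=20)

print(overlap_ms(25, 10), len(ffr_frames))  # FFR: overlap and frame count
print(overlap_ms(25, 20), len(hfr_frames))  # HFR: overlap and frame count
```

Doubling the frame shift halves the number of feature vectors that have to be computed and transmitted, which is where the bit-rate and computation savings come from.
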
HFR front-end in DSR
• Observation: the performance degradation of DSR is marginal when packet loss occurs in short bursts, provided that a proper error concealment technique is applied.
• So why not deliberately drop some packets (speech frames)?
• HFR + repetition 'error concealment': prior to server-side recognition, each HFR feature vector is repeated once to construct the FFR-equivalent vector sequence.

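
The repetition step described above can be sketched as follows. This is a minimal illustration, assuming feature vectors are plain lists of floats; the function name `repeat_to_full_rate` is hypothetical and not from the slides.

```python
def repeat_to_full_rate(hfr_vectors):
    """Repeat each HFR feature vector once to get an FFR-length sequence."""
    ffr_vectors = []
    for vec in hfr_vectors:
        ffr_vectors.append(list(vec))  # original vector
        ffr_vectors.append(list(vec))  # its repetition (copy, not alias)
    return ffr_vectors


# Two toy 2-dimensional feature vectors received from the client.
received = [[1.0, 2.0], [3.0, 4.0]]
restored = repeat_to_full_rate(received)
print(restored)
```

The server can then decode the restored sequence with models trained on FFR features, which is the setting the slides evaluate.
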
Experiments
• Recognition accuracy (%) across the front-ends for three databases using FFR models
• Repetition of each HFR feature vector is critical!

Derived DSR schemes
• The FFR-based ETSI-DSR standard
• The HFR front-end – half the bit rate
• FFR-based one-frame coding
• FFR-based interleaving24: no delay when there are no transmission errors, as opposed to regular interleaving!
• FFR-based multiple description coding (MDC): odd-numbered & even-numbered feature vectors

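
The odd/even split behind MDC can be sketched as below. The slides only state that the two descriptions hold the odd-numbered and even-numbered feature vectors; recovering a lost description by repeating the surviving one is an assumption here, borrowed from the HFR repetition idea, and the function names are hypothetical.

```python
def split_descriptions(vectors):
    """Split a sequence into odd- and even-numbered vectors (1-based)."""
    odd = vectors[0::2]   # 1st, 3rd, 5th, ...
    even = vectors[1::2]  # 2nd, 4th, 6th, ...
    return odd, even


def merge_descriptions(odd, even):
    """Rebuild the sequence; pass None for a description that was lost."""
    if odd is not None and even is not None:
        merged = []
        for i, vec in enumerate(odd):
            merged.append(vec)
            if i < len(even):
                merged.append(even[i])
        return merged
    survivor = odd if odd is not None else even
    if survivor is None:
        return []  # both descriptions lost
    # Assumption: fall back to repeating each surviving vector once.
    return [vec for vec in survivor for _ in range(2)]


frames = ["f1", "f2", "f3", "f4", "f5"]
odd, even = split_descriptions(frames)
print(merge_descriptions(odd, even))   # both descriptions arrived
print(merge_descriptions(odd, None))   # even description lost
```

When only one description arrives, the result has the same form as the output of an HFR front-end followed by repetition.
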
Comparison of DSR schemes
• Robustness against transmission errors (word error rate, WER, in %)
• Aurora 2 database corrupted by GSM error pattern 3 (4 dB C/I ratio)
• Which is the best?
[Figure: WER chart comparing Error-free, MDC, Interleaving24, Half frame-rate – Repetition, ETSI-DSR Standard, No CRC]

Adaptive multi-frame-rate scheme
[Figure: block diagram. Client front-end: Speech, FFR Front-End, HFR Front-End, Network Context, Split VQ Coder, Channel Encoder. Error-Prone Channel. Server back-end: Channel Decoder incl. EC, Split VQ Decoder, Recogniser, Words.]

Conclusions
• Half frame-rate front-end for DSR:
  • Half frame-rate, half bit-rate, half client-side computation.
  • Comparable performance, but repetition of HFR features is critical.
• Adaptive multi-frame-rate DSR scheme:
  • HFR
  • One-frame coding
  • Interleaving: no transmission errors, no delay
  • MDC: a performance close to that of an error-free channel