
Histogram-based Quantization for Distributed / Robust Speech Recognition

Histogram-based Quantization for Distributed / Robust Speech Recognition. Chia-yu Wan, Lin-shan Lee, College of EECS, National Taiwan University, R.O.C., 2007/08/16. Outline: Introduction; Histogram-based Quantization (HQ); Joint Uncertainty Decoding (JUD); Three-stage Error Concealment (EC).



Presentation Transcript


  1. Histogram-based Quantization for Distributed / Robust Speech Recognition • Chia-yu Wan, Lin-shan Lee • College of EECS, National Taiwan University, R.O.C. • 2007/08/16

  2. Outline • Introduction • Histogram-based Quantization (HQ) • Joint Uncertainty Decoding (JUD) • Three-stage Error Concealment (EC) • Conclusion

  3. Problems of Distance-based VQ • Conventional distance-based VQ (e.g. SVQ) has been widely used in DSR • Dynamic environmental noise and codebook mismatch jointly degrade the performance of SVQ • Mismatch between the fixed VQ codebook and the test data increases distortion • Noise moves clean speech to another partition cell (X to Y) • Quantization increases the difference between clean and noisy features • Histogram-based Quantization (HQ) is proposed to solve these problems

  4. Histogram-based Quantization (HQ) • Decision boundaries yi {i=1,…,N} are dynamically defined by C(y). • Representative values zi {i=1,…,N} are fixed, transformed by a standard Gaussian.

  5. Histogram-based Quantization (HQ) • The actual decision boundaries (horizontal scale) for xt are dynamically defined by the inverse transformation of C(y).

  6. Histogram-based Quantization (HQ) • With a different histogram C’(y’), the decision boundaries change automatically. • Decision boundaries are adjusted according to local statistics, so there is no codebook mismatch problem.

  7. Histogram-based Quantization (HQ) • Based on the CDF (vertical scale) and the histogram, HQ is less sensitive to noise on the horizontal scale • Disturbances are automatically absorbed into the HQ block • Dynamic nature of HQ • hidden codebook on the vertical scale • transformed by the dynamic C(y) • {yi} are dynamic on the horizontal scale
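
The HQ idea on slides 4 to 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `hq_quantize`, the equal-probability cells on the CDF axis, and the choice of representative values as standard-Gaussian quantiles at the cell midpoints are all hypothetical choices made for the example.

```python
from statistics import NormalDist


def hq_quantize(block, n_levels=4):
    """Quantize every value in `block` using the block's own empirical CDF.

    Returns (cell indices, representative values), one entry per input value.
    """
    std_normal = NormalDist()  # standard Gaussian, mean 0 and sigma 1
    # Fixed representative values z_i on the vertical (CDF) scale.
    # ASSUMPTION: Gaussian quantiles at the midpoints of equal-probability cells.
    reps = [std_normal.inv_cdf((i + 0.5) / n_levels) for i in range(n_levels)]

    n = len(block)
    # Rank of each value inside the block gives the empirical CDF C(y).
    order = sorted(range(n), key=lambda t: block[t])
    indices = [0] * n
    for rank, t in enumerate(order):
        cdf = (rank + 0.5) / n  # empirical CDF value of block[t]
        # The cell is chosen on the CDF axis, so the boundaries on the
        # feature axis move with the block statistics.
        indices[t] = min(int(cdf * n_levels), n_levels - 1)
    return indices, [reps[i] for i in indices]
```

Because the cell index in this sketch depends only on the rank of a value within its block, any monotonically increasing distortion of the whole block (for example a gain and an offset) leaves the indices unchanged, which is one way to read "disturbances are automatically absorbed into the HQ block".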

  8. Histogram-based Vector Quantization (HVQ)

  9. Discussions about robustness of Histogram-based Quantization (HQ) • Distributed speech recognition: SVQ vs. HQ • Robust speech recognition: HEQ vs. HQ

  10. Comparison of Distance-based VQ and Histogram-based Quantization (HQ) • Distance-based VQ • the fixed codebook cannot represent the noisy speech well • quantization increases the difference between clean and noisy speech • HQ • dynamically adjusted to local statistics, no codebook mismatch • inherently robust, noise disturbances are automatically absorbed by C(y) • HQ solves the major problems of conventional distance-based VQ

  11. HEQ (Histogram Equalization) vs. HQ (Histogram-based Quantization) • HEQ performs a point-to-point transformation • point-based order statistics are more disturbed • HQ performs a block-based transformation • disturbances within a block are automatically absorbed • with a proper choice of block size, block uncertainty can be compensated by GMM and uncertainty decoding • Averaged normalized distance between clean and corrupted speech features, based on the AURORA 2 database

  12. HEQ (Histogram Equalization) vs. HQ (Histogram-based Quantization) • HEQ performs a point-to-point transformation • point-based order statistics are more disturbed • HQ performs a block-based transformation • disturbances within a block are automatically absorbed • with a proper choice of block size, block uncertainty can be compensated by GMM and uncertainty decoding • HQ gives a smaller distance d for all SNR conditions • it is less influenced by the noise disturbance

  13. HQ as a feature transformation method

  14. HQ as a feature quantization method

  15. HQ as a feature quantization method

  16. HQ as a feature quantization method

  17. HQ as a feature quantization method

  18. HQ as a feature quantization method

  19. HQ as a feature quantization method

  20. Further analysis: bit rates vs. SNR • Clean-condition training • Multi-condition training

  21. HQ-JUD • For robust and/or distributed speech recognition • For robust speech recognition • HQ is used as the front-end feature transformation • JUD is the enhancement approach at the back-end recognizer • For Distributed Speech Recognition (DSR) • HQ is applied at the client for data compression • JUD is applied at the server

  22. Joint Uncertainty Decoding (1/4) - Uncertainty Observation Decoding • w: observation, o: uncorrupted features • The HMM should be less discriminative on features with higher uncertainty • A larger variance is used for more uncertain features
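
One common way to realise "a larger variance for more uncertain features" is to add an uncertainty variance to each HMM Gaussian's variance before scoring. This follows if the observation w is modelled as the uncorrupted feature o plus zero-mean Gaussian error. The slide's own assumption is not recoverable from this transcript, so the Gaussian-error model and the names `log_gauss` and `ud_log_likelihood` below are assumptions made for illustration.

```python
import math


def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def ud_log_likelihood(w, mean, var, uncertainty_var):
    """Score observation w against one HMM Gaussian with inflated variance.

    ASSUMPTION: w = o + e with e ~ N(0, uncertainty_var), so that
    p(w | state) = N(w; mean, var + uncertainty_var).
    """
    return log_gauss(w, mean, var + uncertainty_var)


def state_gap(w, uncertainty_var):
    """Log-likelihood gap between two hypothetical competing states."""
    near = ud_log_likelihood(w, mean=0.0, var=1.0, uncertainty_var=uncertainty_var)
    far = ud_log_likelihood(w, mean=2.0, var=1.0, uncertainty_var=uncertainty_var)
    return near - far
```

With these two hypothetical states, the gap between the better and the worse state shrinks as `uncertainty_var` grows, which is the sense in which the HMM becomes less discriminative on uncertain features.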

  23. Joint Uncertainty Decoding (2/4) - Uncertainty for quantization errors • Codeword is the observation w • Samples in the partition cell are the uncorrupted features o • p(o) is the pdf of the samples within the partition cell • Variance of samples within partition cell

  24. Joint Uncertainty Decoding (2/4) - Uncertainty for quantization errors • Codeword is the observation w • Samples in the partition cell are the possible values of o • p(o) is the pdf of the samples within the partition cell • Variance of samples within partition cell • Increase the variances for the loosely quantized cells (the more uncertain regions)
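
The per-cell variance described above can be estimated from samples that have already been assigned to partition cells. The sketch below is illustrative only: `cell_variances` is a hypothetical helper, and using the population variance of the samples in each cell is an assumption about how the uncertainty would be measured.

```python
from collections import defaultdict
from statistics import pvariance


def cell_variances(samples, cell_indices):
    """Return {cell index: variance of the samples that fell in that cell}."""
    cells = defaultdict(list)
    for x, i in zip(samples, cell_indices):
        cells[i].append(x)
    # A loosely quantized (wide) cell yields a larger variance, and therefore
    # a larger uncertainty for its codeword.
    return {i: pvariance(xs) for i, xs in cells.items()}
```

For example, a cell holding the samples 1.0 and 3.0 gets a larger variance than a cell holding 0.0, 0.1 and 0.2, so its codeword would be treated as more uncertain.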

  25. Joint Uncertainty Decoding (3/4) - Uncertainty for environmental noise • Histogram shift • Increase the variances for HQ features with a larger histogram shift

  26. Joint Uncertainty Decoding (4/4) • Jointly consider the uncertainty caused by both the environmental noise and the quantization errors. • One of the two dominates • Quantization errors (high SNR) • disturbance is absorbed into the HQ block • Environmental noise (low SNR) • noisy features are moved to other partition cells

  27. HQ-JUD for robust speech recognition

  28. HQ-JUD for distributed speech recognition • Different types of noise, averaged over all SNR values • Compared systems: client HEQ-SVQ with server UD; client HQ with server JUD; client HEQ-SVQ; client HQ

  29. HQ-JUD for distributed speech recognition • Different types of noise, averaged over all SNR values • Compared systems: client HEQ-SVQ with server UD; client HEQ-SVQ • HEQSVQ-UD was slightly worse than HEQ for set C

  30. HQ-JUD for distributed speech recognition • Different types of noise, averaged over all SNR values • Compared systems: client HQ with server JUD; client HQ • HEQSVQ-UD was slightly worse than HEQ for set C • HQ-JUD consistently improved the performance of HQ

  31. HQ-JUD for distributed speech recognition • Different types of noise, averaged over all SNR values • Compared systems: client HEQ-SVQ; client HQ • HQ performed better than HEQ-SVQ for all types of noise

  32. HQ-JUD for distributed speech recognition • Different types of noise, averaged over all SNR values • Compared systems: client HQ with server JUD; client HEQ-SVQ with server UD • HQ performed better than HEQ-SVQ for all types of noise • HQ-JUD consistently performed better than HEQSVQ-UD

  33. HQ-JUD for distributed speech recognition • Different SNR conditions, averaged over all noise types • Compared systems: client SVQ with server UD; client HEQ-SVQ with server UD; client HQ with server JUD • HQ-JUD significantly improved on the performance of SVQ-UD • HQ-JUD consistently performed better than HEQSVQ-UD

  34. Three-stage error concealment (EC)

  35. Stage 1: error detection • Frame-level error detection • The received frame-pairs are first checked with CRC • Subvector-level error detection • The erroneous frame-pairs are then checked by the HQ consistency check • The quantized codewords for HQ represent the order-statistics information of the original parameters • The quantization process does not change the order statistics • When HQ is re-performed on a received subvector codeword, it should fall in the same partition cell

  36. Stage 1: error detection • Noise seriously affects SVQ with the data consistency check - precision degrades (from 66% for clean speech down to 12% at 0 dB) • The HQ-based consistency check is much more stable at all SNR values - both recall and precision rates are higher.

  37. Stage 2: reconstruction • Based on the maximum a posteriori (MAP) criterion - Considers the probability of every possible codeword St(i) at time t, given the current and previous received subvector codewords, Rt and Rt-1 - Prior speech source statistics: HQ codeword bigram model - Channel transition probability: based on the BER estimated in stage 1 - Reliability of the received subvectors: considers the relative reliability of the prior speech source and the wireless channel

  38. Stage 2: reconstruction • Channel transition probability P(Rt | St(i)) - Significantly differentiated (for different codewords i, with different d) when Rt is more reliable (BER is smaller) - More emphasis is put on the prior speech source when Rt is less reliable - The estimated BER is the number of inconsistent subvectors in the present frame divided by the total number of bits in the frame

  39. Stage 2: reconstruction • Prior source information P(St(i) | Rt-1) - Based on the codeword bigram trained from the clean training data in AURORA 2 - HQ can estimate the lost subvectors more precisely than SVQ - The conditional entropy measure
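
The MAP reconstruction of stage 2 combines the channel term P(Rt | St(i)) with the bigram prior P(St(i) | Rt-1). The sketch below shows that combination under loudly stated assumptions: the channel is treated as a memoryless binary symmetric channel, d is taken to be the Hamming distance between codeword indices, and the reliability weighting between source and channel mentioned on the slides is not modelled. The names `estimate_ber`, `channel_prob` and `map_reconstruct` are hypothetical.

```python
def estimate_ber(n_inconsistent_subvectors, n_bits_in_frame):
    """BER estimate as described on the slides: inconsistent subvectors / bits."""
    return n_inconsistent_subvectors / n_bits_in_frame


def channel_prob(sent, received, n_bits, ber):
    """P(received | sent), ASSUMING independent bit flips with probability ber."""
    d = bin(sent ^ received).count("1")  # Hamming distance between indices
    return (ber ** d) * ((1 - ber) ** (n_bits - d))


def map_reconstruct(received, prev_received, bigram, n_bits, ber):
    """Return (MAP codeword index, posterior over all codewords).

    bigram[j][i] is the prior P(S_t = i | R_{t-1} = j).
    Requires 0 < ber < 1 so that the normalizer is non-zero.
    """
    n_codewords = 1 << n_bits
    scores = [
        channel_prob(i, received, n_bits, ber) * bigram[prev_received][i]
        for i in range(n_codewords)
    ]
    total = sum(scores)
    posterior = [s / total for s in scores]
    best = max(range(n_codewords), key=posterior.__getitem__)
    return best, posterior
```

With a small estimated BER the channel term dominates and the received codeword is kept; with a large BER the bigram prior dominates, matching the remark that more emphasis is put on the prior speech source when Rt is less reliable.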

  40. Stage 3: Compensation in Viterbi decoding • The distribution P(St(i) | Rt, Rt-1) characterizes the uncertainty of the estimated features • Assuming this distribution is Gaussian, its variance is used in Uncertainty Decoding • This makes the HMMs less discriminative for the estimated subvectors with higher uncertainty
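
As a rough illustration of stage 3, the variance of the posterior over codewords can be computed from the posterior probabilities and the codeword representative values, then passed to the decoder as the uncertainty. The helper name `posterior_moments` and the use of scalar representative values are assumptions for this sketch.

```python
def posterior_moments(posterior, reps):
    """Mean and variance of the representative values under the posterior.

    posterior[i] stands for P(S_t(i) | R_t, R_{t-1}); reps[i] is the (scalar)
    representative value of codeword i.
    """
    mean = sum(p * z for p, z in zip(posterior, reps))
    var = sum(p * (z - mean) ** 2 for p, z in zip(posterior, reps))
    return mean, var
```

A posterior concentrated on a single codeword gives zero variance (a confident estimate), while a flat posterior gives a large variance, so the corresponding subvector would be decoded with less discriminative HMM scores.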

  41. HQ-based DSR system with transmission errors • Features corrupted by noise are more susceptible to transmission errors • For SVQ, 98% to 87% (clean), 60% to 36% (10 dB SNR)

  42. HQ-based DSR system with transmission errors • The improvements that HQ offered over HEQ-SVQ when transmission errors were present are consistent and significant at all SNR values • HQ is robust against both environmental noise and transmission errors

  43. Analysis of the degradation of recognition accuracy caused by transmission errors • Comparison of SVQ, HEQ-SVQ and HQ in terms of the percentage of words that were correctly recognized without transmission errors but incorrectly recognized after transmission.

  44. HQ-Based DSR with Wireless Channels and Error Concealment • Legend: g: GPRS, r: ETSI repetition, c: three-stage EC • The ETSI repetition technique actually degraded the performance of HEQ-SVQg • whole feature vectors, including the correct subvectors, are replaced by inaccurate estimates

  45. HQ-Based DSR with Wireless Channels and Error Concealment • Legend: g: GPRS, r: ETSI repetition, c: three-stage EC • Three-stage EC improved the performance significantly in all cases. • Robust not only against transmission errors, but against environmental noise as well.

  46. HQ-Based DSR with Wireless Channels and Error Concealment

  47. Different client traveling speed (1/3)

  48. Different client traveling speed (2/3)

  49. Different client traveling speed (3/3)

  50. Conclusions • Histogram-based Quantization (HQ) is proposed • a novel approach for robust and/or distributed speech recognition (DSR) • robust against environmental noise (for all types of noise and all SNR conditions) and transmission errors • For future personalized and context aware DSR environments • HQ can be adapted to network and terminal capabilities • with recognition performance optimized based on environmental conditions
