1 / 184

Single and Multi Channel Feature Enhancement for Distant Speech Recognition

TUTORIAL. Single and Multi Channel Feature Enhancement for Distant Speech Recognition. John McDonough (1) , Matthias Wölfel (2) , Friedrich Faubel (3). (1). (2). (3). Saarland University. Spoken Language Systems. TUTORIAL. Overview. I - Applications of DSR

sharla
Download Presentation

Single and Multi Channel Feature Enhancement for Distant Speech Recognition

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. TUTORIAL Single and Multi Channel Feature Enhancement for Distant Speech Recognition John McDonough(1) , Matthias Wölfel(2), Friedrich Faubel(3) (1) (2) (3) Saarland University Spoken Language Systems

  2. TUTORIAL Overview I - Applications of DSR II - Characteristics of Human Speech III - The Acoustic Environment IV - Speech feature enhancement • V - Speaker Tracking • VI - Digital Filter Banks VII - Beamforming

  3. TUTORIAL Part III The Acoustic Environment

  4. Sound PropagationSound Waves • Sound waves are disturbances of air molecules, which propagate as wave fronts of compressed air, followed by decompressed air. sound source speed of sound attenuation wave front

  5. Sound PropagationSuperposition • Sound waves follow the law of superposition, i.e. individual waves from different sources add up at each point in the room. source 1 source 3 source 2

  6. Sound PropagationReflection • Sound waves are reflected by obstacles, such as walls, columns, chairs and tables.

  7. Sound PropagationReflection • Sound waves are reflected by obstacles, such as walls, columns, chairs and tables.

  8. Sound PropagationReflection • Sound waves are reflected by obstacles, such as walls, columns, chairs and tables.

  9. Sound PropagationReflection • Sound waves are reflected by obstacles, such as walls, columns, chairs and tables.

  10. Sound PropagationReflection • Sound waves are reflected by obstacles, such as walls, columns, chairs and tables.

  11. Sound PropagationAbsorption • Obstacles do not only reflect sound. They also absorb some portions in a frequency dependant manner. frequency- dependant absorption

  12. Sound PropagationColoration • Constructive and destructive interference of reflections can cause comb filter like “coloration” of the sound.

  13. Sound PropagationColoration • Constructive and destructive interference of reflections can cause comb filter like “coloration” of the sound.

  14. Sound PropagationColoration • Constructive and destructive interference of reflections can cause comb filter like “coloration” of the sound.

  15. Sound PropagationColoration • Constructive and destructive interference of reflections can cause comb filter like “coloration” of the sound. amplification =

  16. Sound PropagationColoration • Constructive and destructive interference of reflections can cause comb filter like “coloration” of the sound. attenuation =

  17. Sound PropagationColoration • Constructive and destructive interference of reflections can cause comb filter like “coloration” of the sound. amplitude frequency

  18. Sound PropagationDiffraction • Sound waves spread out behind small openings and bend around corners.

  19. Sound PropagationOther Effects • Refraction: wave passes from one medium to another at an angle • Scattering:diffuse reflections of waves • Surface waves: longitudinal and transverse motion along surfaces; lower attenuation than normal sound waves

  20. Room Impulse ResponseIntroduction • Model for a single sound source with one reflection: direct path reflection listener source wall

  21. Room Impulse ResponseIntroduction • Model for a single sound source with one reflection:

  22. Room Impulse ResponseIntroduction • Model for a single sound source with one reflection: • Generalization:

  23. Room Impulse ResponseIntroduction • Model for a single sound source with one reflection: • Generalization: • Impulse response model:

  24. Room Impulse ResponseIntroduction • Model for a single sound source with one reflection: • Generalization: • Impulse response model: with impulse response

  25. Room Impulse ResponseIntroduction • Model for a single sound source with one reflection: • Generalization: • Impulse response model: intensity time

  26. Room Impulse ResponseReinforcement & Reverberation • Early reflections: semi-distinct reflections from various reflective surfaces with a delay of up to 50ms; reinforce the sound (<35ms). early reflections intensity time

  27. Room Impulse ResponseReinforcement & Reverberation • Early reflections: semi-distinct reflections from various reflective surfaces with a delay of up to 50ms; reinforce the sound (<35ms). • Late reflections: reflections with lower amplitude that are closely spaced in time; responsible for “reverberant” sound. early reflections late reflections intensity time

  28. Room Impulse ResponseEcho • Echo: distinct, strong late reflection with a delay of more than 100ms. early reflections late reflections intensity time

  29. Close Talking versus Distant Speech • Reverberation smears spectra in time • Noise fills spectral valleys close talking . distant .

  30. Effects of Noise • Shown in the figure is a simplified plot of relative sound pressure vs. time for an utterance of the word “cat” in additive noise. • Note that the final /t/ is obscured by the noise floor. • Hence the word is indistinguishable from “cab” or “cap”.

  31. Effects of Reverberation • Shown in the figure is a simplified plot of relative sound pressure vs. time for an utterance of the word “cat” in the presence of reverberation. • Note that the final /t/ is still obscured, but this time by the ring down of the preceding phones. • Hence the word is still indistinguishable from “cab” or “cap”.

  32. A Model of the Acoustic Environment • Model for one desired source with background noise: noise listener source wall

  33. Central Limit Theorem • Plot of the Gaussian pdf and the pdf obtained by summing together N Laplacian random variables for several values of N. • As predicted by the central limit theorem, the sum becomes ever more Gaussian with increasing N.

  34. Effects of Reverberation and Noise • Both noise and reverberation have the effect of making speech subband samples more nearly Gaussian. • This begs the question: Can speech be enhanced by restoring its original super-Gaussian characteristics?

  35. TUTORIAL Part IV Speech Feature Enhancement

  36. TUTORIAL Part IV-1 Speech Feature Enhancement Motivation

  37. Speech Recognition SystemOverview • A speech recognition system converts speech to text. Speech Recognition System Speech Text

  38. Speech Recognition SystemOverview • A speech recognition system converts speech to text. • It basically consists of two components: Front End Decoder Speech Text

  39. Speech Recognition SystemOverview • A speech recognition system converts speech to text. • It basically consists of two components: • Front End: extracts speech features from the audio signal Front End Decoder Speech Text

  40. Speech Recognition SystemOverview • A speech recognition system converts speech to text. • It basically consists of two components: • Front End: extracts speech features from the audio signal • Decoder: finds that sentence (sequence of acoustical states), which is the most likely explanation for the observed sequence of speech features Front End Decoder Speech Text

  41. Speech Feature ExtractionWindowing

  42. Speech Feature ExtractionWindowing

  43. Speech Feature ExtractionWindowing

  44. Speech Feature ExtractionWindowing

  45. Speech Feature ExtractionTime Frequency Analysis • Performing spectral analysis separately for each frame yields a time-frequency representation

  46. Speech Feature ExtractionTime Frequency Analysis • Performing spectral analysis separately for each frame yields a time-frequency representation

  47. Speech Feature ExtractionPerceptual Representation • Emulation of the logarithmic frequency and intensity perception of the human auditory system

  48. Background Noise • Background noise distorts speech features • Result: features do not match the features used during training • Consequence: severely degraded recognition performance

  49. Speech Feature EnhancementMotivation • Idea: • train speech recognition system on clean speech • try to map distorted features to clean speech features

  50. TUTORIAL Part II-2 Speech Feature Enhancement The Interaction Function

More Related