SCALE Workshop, January 2010. A Tutorial on Bayesian Speech Feature Enhancement. Friedrich Faubel.
I Motivation
Speech Recognition System: Overview • A speech recognition system converts speech to text. It consists of two main components: • Front end: extracts speech features from the audio signal • Decoder: finds the sentence (sequence of acoustic states) that is the most likely explanation for the observed sequence of speech features [Diagram: Speech → Front End → Decoder → Text]
Speech Feature Extraction: Time-Frequency Analysis • Performing spectral analysis separately for each frame yields a time-frequency representation
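The frame-wise analysis can be sketched as follows. This is a minimal illustration, not the front end used in the tutorial: the frame length, hop size, Hamming window and the naive DFT are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math

def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames (a trailing partial frame is dropped)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n (an assumed choice of window)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def power_spectrum(frame):
    """Naive DFT of one windowed frame; returns the power in bins 0..N/2."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def spectrogram(signal, frame_len=64, hop=32):
    """Time-frequency representation: one power spectrum per frame."""
    return [power_spectrum(f) for f in frames(signal, frame_len, hop)]

# Toy input: a sinusoid that falls exactly on bin 8 of a 64-point DFT.
tone_bin = 8
signal = [math.sin(2 * math.pi * tone_bin * t / 64) for t in range(256)]
spec = spectrogram(signal)
peak_bins = [max(range(len(row)), key=row.__getitem__) for row in spec]
```

Each row of `spec` is one frame (time) and each column one frequency bin; for the toy sinusoid every frame peaks in the same bin.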
Speech Feature Extraction: Perceptual Representation • Emulation of the logarithmic frequency and intensity perception of the human auditory system
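One common way to emulate both effects is a mel-style frequency warping followed by a logarithm of the power. The sketch below assumes the widely used `2595 * log10(1 + f / 700)` mel formula and a hypothetical 10-filter layout; the slide does not say which warping the tutorial uses.

```python
import math

def hz_to_mel(f_hz):
    """Map frequency in Hz to the mel scale (assumed common formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters, f_low, f_high):
    """Filter center frequencies in Hz, equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

def log_compress(power, floor=1e-10):
    """Logarithmic intensity compression; the floor avoids log(0)."""
    return math.log(max(power, floor))

# Hypothetical layout: 10 filters between 0 and 4000 Hz.
centers = mel_centers(10, 0.0, 4000.0)
spacings = [b - a for a, b in zip(centers, centers[1:])]
```

The center frequencies are equally spaced in mel but increasingly far apart in Hz, which gives finer resolution at low frequencies.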
Background Noise • Background noise distorts speech features • Result: features don’t match the features used during training • Consequence: severely degraded recognition performance
Overview of the Tutorial • I - Motivation • II - The effect of noise on speech features • III - Transforming probabilities • IV - The MMSE solution to speech feature enhancement • V - Model-based speech feature enhancement • VI - Experimental results • VII - Extensions
II Interaction Function: The Effect of Noise
Interaction Function • Principle of superposition: signals are additive, so noisy speech = clean speech + noise
Interaction Function
• In the signal domain we have the following relationship, with $y(t)$ the noisy speech, $x(t)$ the clean speech and $n(t)$ the noise:
$$y(t) = x(t) + n(t)$$
• After Fourier transformation, this becomes:
$$Y(\omega) = X(\omega) + N(\omega)$$
• Taking the magnitude square on both sides, we get:
$$|Y|^2 = |X + N|^2 = (X + N)(X + N)^* = |X|^2 + |N|^2 + X N^* + X^* N = |X|^2 + |N|^2 + 2\,\mathrm{Re}(X N^*)$$
Interaction Function
• Taking the magnitude square on both sides, we get:
$$|Y|^2 = |X + N|^2$$
• Hence, in the power spectral domain we have:
$$|Y|^2 = |X|^2 + |N|^2 + 2\,|X|\,|N|\cos\theta$$
where $2\,|X|\,|N|\cos\theta$ is the phase term and $\theta$ is the relative phase between the clean speech spectrum $X$ and the noise spectrum $N$.
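The power-domain relationship can be checked numerically for a single frequency bin. In this sketch the values of `X` and `N` are arbitrary illustrative choices, and the cross term is written as $2|X||N|\cos\theta$ with $\theta$ the difference of the two phases.

```python
import cmath
import math

def power_direct(X, N):
    """|X + N|^2 computed directly from the complex spectra."""
    return abs(X + N) ** 2

def power_expanded(X, N):
    """|X|^2 + |N|^2 + 2|X||N|cos(theta), with theta the relative phase."""
    theta = cmath.phase(X) - cmath.phase(N)
    return abs(X) ** 2 + abs(N) ** 2 + 2 * abs(X) * abs(N) * math.cos(theta)

# Arbitrary example bin: magnitudes 2.0 and 1.5, phases 0.3 and 2.1 rad.
X = cmath.rect(2.0, 0.3)
N = cmath.rect(1.5, 2.1)

direct = power_direct(X, N)
expanded = power_expanded(X, N)
phase_term = expanded - abs(X) ** 2 - abs(N) ** 2
```

Both functions agree up to rounding. For this particular pair the phase term is negative, so the noisy power is smaller than the sum of the clean speech and noise powers.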
Interaction Function • The relative phase between two waves describes their relative offset in time (delay) [Figure: two waves plotted over time, with the relative phase marked]
Interaction Function • When two sound sources are present, the following can happen: amplification, attenuation, or cancellation [Figure: four example sums of two waves, labeled amplification, amplification, cancellation and attenuation]
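These cases can be reproduced by summing two sinusoids of the same frequency and varying only their relative phase. The amplitudes, phases and sampling grid below are illustrative assumptions.

```python
import math

def sum_peak(a, b, phase, n=1000):
    """Peak of |a*sin(wt) + b*sin(wt + phase)| sampled over one period."""
    return max(
        abs(a * math.sin(2 * math.pi * t / n)
            + b * math.sin(2 * math.pi * t / n + phase))
        for t in range(n)
    )

def predicted_peak(a, b, phase):
    """Closed-form amplitude of the sum: sqrt(a^2 + b^2 + 2ab*cos(phase))."""
    return math.sqrt(max(a * a + b * b + 2 * a * b * math.cos(phase), 0.0))

in_phase = sum_peak(1.0, 1.0, 0.0)             # amplification
partly_out = sum_peak(1.0, 1.0, 0.9 * math.pi)  # attenuation
opposite = sum_peak(1.0, 1.0, math.pi)          # cancellation
```

The closed form in `predicted_peak` has the same structure as the power-domain equation: two squared magnitudes plus a cross term that depends on the cosine of the relative phase.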
Interaction Function
• In the power spectral domain we have:
$$|Y|^2 = |X|^2 + |N|^2 + 2\,|X|\,|N|\cos\theta$$
• The phase term $2\,|X|\,|N|\cos\theta$ is zero on average, which leaves:
$$|Y|^2 \approx |X|^2 + |N|^2$$
• In the log power spectral domain, with $x = \log|X|^2$, $n = \log|N|^2$ and $y = \log|Y|^2$, that becomes (Acero, 1990):
$$y = \log\left(e^{x} + e^{n}\right) = x + \log\left(1 + e^{n - x}\right)$$
• But is that really right?
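The log-domain relationship with the phase term dropped, $y = x + \log(1 + e^{n-x})$, can be evaluated as in the sketch below. The function name `interaction_log` is hypothetical, and the rearrangement around the larger of the two arguments is only a numerical precaution against overflow.

```python
import math

def interaction_log(x, n):
    """y = x + log(1 + exp(n - x)) = log(exp(x) + exp(n)), computed stably.

    x and n are log power spectral values of clean speech and noise.
    """
    larger = max(x, n)
    return larger + math.log1p(math.exp(-abs(x - n)))

# Equal speech and noise power: the noisy log power rises by log(2).
equal_case = interaction_log(0.0, 0.0)

# Speech far above the noise: y is close to x.
speech_dominant = interaction_log(10.0, -10.0)

# Noise far above the speech: y is close to n.
noise_dominant = interaction_log(-10.0, 10.0)
```

Under this approximation the noisy feature always lies at or above both the clean speech and the noise feature, and it approaches whichever of the two dominates.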
Interaction Function • The mean of a nonlinearly transformed random variable is not necessarily equal to the nonlinear transform of the random variable’s mean. [Figure: nonlinear transform of a random variable]
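A small numerical experiment makes the gap concrete for the logarithm. It assumes that the relative phase is uniformly distributed and averages over an evenly spaced grid of phases; the magnitudes `a` and `b` are arbitrary. It only illustrates that the two orders of operation differ and does not reproduce the tutorial's phase-averaged model.

```python
import math

def power_given_phase(a, b, theta):
    """Noisy power |Y|^2 for magnitudes a, b and relative phase theta."""
    return a * a + b * b + 2 * a * b * math.cos(theta)

def phase_average(f, k=10000):
    """Average f(theta) over k evenly spaced phases in [0, 2*pi)."""
    return sum(f(2 * math.pi * (i + 0.5) / k) for i in range(k)) / k

a, b = 2.0, 1.0  # arbitrary clean speech and noise magnitudes

# Average first, then take the log: the phase term averages out.
mean_power = phase_average(lambda th: power_given_phase(a, b, th))
log_of_mean = math.log(mean_power)

# Take the log first, then average over the phase.
mean_of_log = phase_average(lambda th: math.log(power_given_phase(a, b, th)))
```

For these magnitudes the mean power is $a^2 + b^2 = 5$, so the log of the mean is $\log 5$, while the mean of the log comes out close to $\log 4$. Dropping the phase term before taking the logarithm therefore overestimates the average log power in this example.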
Interaction Function • Phase-averaged relationship between clean and noisy speech:
III Transforming Probabilities