Listening to Normalized Speech: Mimicking the Normalization Processes of Automatic Speech Recognition
Dirk Van Compernolle, Kris Demuynck, Oscar Garcia
compi@esat.kuleuven.be
Katholieke Universiteit Leuven – Dept. ESAT
Kasteelpark Arenberg 10, 3001 Heverlee, Belgium
www.esat.kuleuven.be/~spch
ASR Preprocessing
signal → Fourier Transform → Magnitude (Spectrogram) + Phase Spectrum
Magnitude (Spectrogram) → pitch removal → Envelope (cepstra) + Excitation (pitch)
Envelope (cepstra) + Excitation (pitch) → speaker normalization → normalized cepstra + normalized pitch → to ASR
Normalized Speech
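The slide does not say how the pitch is removed from the magnitude spectrum. One common way to split a frame into a smooth envelope (cepstra) and an excitation (pitch) part is to lifter the real cepstrum, keeping only the low-quefrency coefficients. The sketch below assumes that technique, a Hamming window, a 512-point FFT and a lifter length of 30; the function name `split_envelope_excitation` and all parameter values are illustrative, not taken from the authors' system.

```python
import numpy as np


def split_envelope_excitation(frame, n_fft=512, n_lifter=30):
    """Split one speech frame into a smooth log-spectral envelope and a residual
    excitation (pitch) part by liftering the real cepstrum (assumed technique).

    Returns (cepstra, log_envelope, log_excitation):
      cepstra        -- the first n_lifter real-cepstrum coefficients
      log_envelope   -- smoothed log magnitude spectrum (n_fft // 2 + 1 bins)
      log_excitation -- log magnitude minus envelope (harmonic fine structure)
    """
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.rfft(windowed, n=n_fft)) + 1e-10)

    # The real cepstrum is the inverse FFT of the log magnitude; it is real
    # and symmetric, c[k] == c[n_fft - k].
    cepstrum = np.fft.irfft(log_mag, n=n_fft)

    # Low-quefrency lifter: keep c[0..n_lifter-1] and their mirror images.
    lifter = np.zeros(n_fft)
    lifter[:n_lifter] = 1.0
    lifter[n_fft - n_lifter + 1:] = 1.0

    log_envelope = np.fft.rfft(cepstrum * lifter).real
    log_excitation = log_mag - log_envelope
    return cepstrum[:n_lifter], log_envelope, log_excitation
```

For example, a synthetic 150 Hz harmonic frame at 16 kHz can be split with `split_envelope_excitation(frame, n_lifter=13)`. The first 13 coefficients play the role of the "cepstra" sent to the recognizer. The harmonic ripple left in `log_excitation` is the pitch information that the preprocessing separates out.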
Speech Normalization
original signal → Magnitude Spectrum + Phase Spectrum
Magnitude Spectrum → enhanced spectrum → Envelope (spectrum) + Excitation (pitch)
Envelope (spectrum) → normalized spectrum; Excitation (pitch) → normalized excitation
normalized spectrum + normalized excitation → Magnitude Spectrum; Phase Spectrum re-estimated iteratively (Griffin & Lim, 1984) → normalized signal
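The last step of this chain turns a magnitude spectrum without usable phase back into a waveform. The sketch below shows the textbook Griffin & Lim iteration with a least-squares overlap-add inverse STFT. The Hann window, 512-sample frames, 128-sample hop and random initial phase are assumptions, and the helper names (`stft`, `istft`, `griffin_lim`) are hypothetical. The authors' exact settings are not given on the slides.

```python
import numpy as np

N_FFT = 512   # frame length in samples (assumed)
HOP = 128     # frame shift: 75 % overlap, so the STFT is over-sampled (assumed)
WINDOW = np.hanning(N_FFT)


def stft(x):
    """Short-time Fourier transform: one rfft per windowed frame (rows = frames)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec):
    """Least-squares inverse STFT (weighted overlap-add, as in Griffin & Lim)."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1)
    length = N_FFT + HOP * (len(frames) - 1)
    x = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        x[i * HOP:i * HOP + N_FFT] += frame * WINDOW
        norm[i * HOP:i * HOP + N_FFT] += WINDOW ** 2
    return x / np.maximum(norm, 1e-10)


def griffin_lim(magnitude, n_iter=100, seed=0):
    """Estimate a signal whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))  # random start
    x = istft(magnitude * phase)
    for _ in range(n_iter):
        # Keep the phase of the current estimate, impose the target magnitude,
        # and map back to the closest (least-squares) time signal.
        phase = np.exp(1j * np.angle(stft(x)))
        x = istft(magnitude * phase)
    return x
```

In the normalization chain, `magnitude` would be the normalized spectrum rather than the spectrum of the original signal. The overlap between frames provides the redundancy that lets the iteration recover a consistent phase.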
Speech Normalization - Ingredients
• Spectral normalization
  • concept: remove the effect of vocal tract length
  • method: utterance-based VTLN (vocal tract length normalization) by linear frequency warping
• Pitch normalization
  • concept: remove the effect of the speaker's pitch
  • method: scale the utterance-level pitch mean and variance to the global cross-speaker mean and variance
• Phase resynthesis
  • concept: exploit the redundancy in the over-sampled short-time spectrum
  • method: iterative algorithm (Griffin & Lim, 1984)
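The first two ingredients can be sketched in a few lines. The slides do not specify several details, so the following are assumptions. The warp factor `alpha` is taken as given (it is typically chosen per utterance, for example by a likelihood-based search). Output bin k is read from input frequency `alpha * k`. Pitch statistics are matched in the log-F0 domain over voiced frames only. The function names and the `global_mean` / `global_std` parameters are hypothetical.

```python
import numpy as np


def warp_spectrum(mag_frame, alpha):
    """Linear frequency warping of one magnitude-spectrum frame (assumed convention).

    Output bin k takes the linearly interpolated value of input frequency
    alpha * k, so alpha > 1 shifts spectral peaks (formants) down and
    alpha < 1 shifts them up. Bins that map beyond the Nyquist frequency
    repeat the last input value (np.interp's default behaviour).
    """
    bins = np.arange(len(mag_frame), dtype=float)
    return np.interp(alpha * bins, bins, mag_frame)


def normalize_pitch(f0, global_mean, global_std):
    """Map the utterance's voiced log-F0 mean and standard deviation onto
    global (cross-speaker) values; unvoiced frames (f0 == 0) stay 0."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    log_f0 = np.log(f0[voiced])
    std = log_f0.std() or 1.0           # avoid dividing by zero on flat pitch
    z = (log_f0 - log_f0.mean()) / std  # per-utterance standardization
    out = np.zeros_like(f0)
    out[voiced] = np.exp(global_mean + global_std * z)
    return out
```

With `alpha = 2.0`, a spectral peak at bin 100 moves to bin 50. After `normalize_pitch(f0, np.log(150.0), 0.1)`, the voiced frames have a log-F0 mean of log(150) and a standard deviation of 0.1, whatever the speaker's original range.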
[Audio examples: original vs. normalized]
[Audio examples: original vs. normalized]
[Audio examples: original vs. inverted]