Speech Signal Processing I

Speech Signal Processing I • By Edmilson Morais and Prof. Greg. Dogil • Second Lecture • Stuttgart, October 25, 2001

The Speech Signal • Non-stationary signal • Voiced – almost periodic (concept of pitch) • Unvoiced (random, noise-like) • Transitions (bursts, ...) • Range of the pitch • Male : • Female :

Sampling Theory (figure: low-pass filter, sample and hold, low-pass filter) • The signal to be sampled, x(t), has to be band-limited • The sampling frequency has to be higher than 2 times the maximum frequency in x(t)
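The sampling condition above can be checked numerically. The sketch below is only an illustration: the 8000 Hz sampling rate, the tone frequencies, and the helper names `sample_tone` and `alias_frequency` are assumptions, not part of the lecture. It shows that a tone above half the sampling frequency produces exactly the same samples as a lower-frequency tone (aliasing).

```python
import math

def sample_tone(freq_hz, fs_hz, n_samples):
    """Return n_samples of cos(2*pi*f*t) taken at the sampling rate fs_hz."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(n_samples)]

def alias_frequency(freq_hz, fs_hz):
    """Apparent frequency, in [0, fs/2], of a real tone after sampling."""
    f = freq_hz % fs_hz
    return min(f, fs_hz - f)

fs = 8000.0                              # assumed sampling rate
tone_ok = sample_tone(3000.0, fs, 16)    # 3000 Hz < fs/2: represented correctly
tone_bad = sample_tone(5000.0, fs, 16)   # 5000 Hz > fs/2: folds down to 3000 Hz

# The two sample sequences are indistinguishable after sampling.
max_diff = max(abs(a - b) for a, b in zip(tone_ok, tone_bad))
print(alias_frequency(5000.0, fs), max_diff)
```

This is why the low-pass filter is applied before sampling: once the samples are taken, the 5000 Hz component can no longer be told apart from a 3000 Hz one.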

Linear Filters Finite impulse response filters
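A finite impulse response filter computes each output sample as a weighted sum of the current and past input samples, y(n) = sum over k of h(k) x(n-k). The sketch below is a direct, unoptimized version of that sum; the function name `fir_filter` and the two-tap averaging coefficients are illustrative assumptions.

```python
def fir_filter(h, x):
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k].

    Samples before the start of x are taken to be zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y

h = [0.5, 0.5]                                # assumed two-tap moving average
print(fir_filter(h, [1.0, 1.0, 1.0]))         # smoothed step
print(fir_filter(h, [1.0, 0.0, 0.0, 0.0]))    # impulse response equals h, then zeros
```

Feeding a unit impulse returns the coefficients themselves, which is where the name "finite impulse response" comes from.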

Mean squared error E as a function of the weight • Matlab : Graphical visualization – Optimization on a quadratic (bowl-shaped) error surface
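For a linear filter with a single weight w, the mean squared error E(w) is a quadratic function of w, so it has one minimum that can be reached by following the negative gradient. The sketch below assumes steepest descent as the optimization method and uses made-up data (x, d), a step size mu, and the names `mse` and `steepest_descent`; the lecture itself only refers to a Matlab visualization.

```python
def mse(w, x, d):
    """Mean squared error E(w) between desired output d and w * x."""
    return sum((di - w * xi) ** 2 for xi, di in zip(x, d)) / len(x)

def steepest_descent(x, d, w0=0.0, mu=0.05, n_iter=50):
    """Minimize E(w) by repeatedly stepping against its gradient."""
    w = w0
    for _ in range(n_iter):
        # dE/dw = -2 * mean((d - w*x) * x)
        grad = -2.0 * sum((di - w * xi) * xi for xi, di in zip(x, d)) / len(x)
        w -= mu * grad
    return w

x = [1.0, 2.0, 3.0]     # assumed input samples
d = [2.0, 4.0, 6.0]     # assumed desired output (d = 2 * x)
w_opt = steepest_descent(x, d)
print(w_opt, mse(w_opt, x, d))
```

Evaluating `mse` over a range of weights and plotting the result gives the bowl-shaped curve; with two weights it becomes a surface.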

SDSP : Looking through time • Speech signal : analog and digital • Amplitude quantization • Sampling rate (time)
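Amplitude quantization maps each sample to the nearest of a finite set of levels. The sketch below assumes a uniform quantizer with 2^n_bits levels over the range [-x_max, x_max); the function name `quantize` and the 3-bit setting are illustrative choices only.

```python
import math

def quantize(x, n_bits, x_max=1.0):
    """Uniform quantizer: round x to the nearest multiple of the step size,
    then clip to the 2**n_bits levels available in [-x_max, x_max)."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    index = math.floor(x / step + 0.5)                    # nearest level index
    index = max(-levels // 2, min(levels // 2 - 1, index))  # clip to valid range
    return index * step

# With 3 bits the step is 0.25, so the error for in-range samples is at most 0.125.
for sample in (0.3, -0.6, 1.0, -1.0):
    print(sample, quantize(sample, n_bits=3))
```

Each extra bit halves the step size, which halves the maximum quantization error for samples inside the range.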

SDSP : Transformation and Digital filters • Transformations • Z-Transforms, Fourier transforms • Digital filters • FIR, IIR

SDSP – Frame based analysis • Hanning window : w • Waveform multiplied by the Hanning window : xw • Magnitude of the spectrum of xw • Frequency response of the LP filter
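The frame-based steps listed above (window w, windowed waveform xw, magnitude spectrum of xw) can be sketched as follows. The frame length of 64 samples, the 8000 Hz sampling rate, the 1000 Hz test tone, and the function names are assumptions made for the example; a plain DFT is used instead of an FFT to keep the code short.

```python
import cmath
import math

def hann(n_points):
    """Symmetric Hanning (Hann) window of length n_points."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def dft_magnitude(frame):
    """Magnitude of the DFT of a real frame, for bins 0 .. N/2."""
    n_points = len(frame)
    return [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_points)
                    for n in range(n_points)))
            for k in range(n_points // 2 + 1)]

N, fs, f0 = 64, 8000.0, 1000.0                        # assumed frame and tone
x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(N)]
w = hann(N)
xw = [xi * wi for xi, wi in zip(x, w)]                # windowed frame
mag = dft_magnitude(xw)
peak_bin = max(range(len(mag)), key=lambda k: mag[k])
print(peak_bin, peak_bin * fs / N)                    # bin index and its frequency in Hz
```

The window tapers the frame edges to zero, which reduces the spectral leakage that an abrupt cut of the waveform would cause.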

SDSP - Looking at frequency components through time (figure: previous and current frames, before smoothing and after smoothing)

SDSP : Vector quantization • Voronoi space : centroid and distortion measure
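In vector quantization each input vector is replaced by the nearest codebook entry (its Voronoi cell), and each centroid is the mean of the vectors in its cell. The sketch below assumes squared Euclidean distance as the distortion measure and shows one centroid update in the style of the Lloyd (k-means) iteration; the function names and the toy data are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance, used here as the distortion measure."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def encode(v, codebook):
    """Index of the nearest centroid, i.e. the Voronoi cell containing v."""
    return min(range(len(codebook)), key=lambda i: sq_dist(v, codebook[i]))

def mean_distortion(vectors, codebook):
    """Average distortion between each vector and its nearest centroid."""
    return sum(sq_dist(v, codebook[encode(v, codebook)]) for v in vectors) / len(vectors)

def update_centroids(vectors, codebook):
    """Move each centroid to the mean of the vectors assigned to it."""
    dim = len(codebook[0])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    for v in vectors:
        i = encode(v, codebook)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += v[d]
    # An empty cell keeps its old centroid.
    return [tuple(s / counts[i] for s in sums[i]) if counts[i] else tuple(codebook[i])
            for i in range(len(codebook))]

vectors = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]   # toy data
codebook = [(0.0, 0.0), (10.0, 10.0)]                            # initial centroids
new_codebook = update_centroids(vectors, codebook)
print(mean_distortion(vectors, codebook), mean_distortion(vectors, new_codebook))
```

Repeating the assignment and update steps until the distortion stops decreasing is one common way to train a codebook.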

TTS - Waveform generation for TTS • Analysis and Resynthesis – Coding and Decoding • Parametrization : Mapping the waveform into a set of parameters • Reconstruction : Synthesis of the waveform from the set of parameters • Prosody : F0, Duration, Amplitude • Symbols : x – Speech signal, A – LP coefficients, e – LP residue, En – Prototypes, Fo – Fundamental frequency, U/UV – Voiced / Unvoiced transitions • (figure: block diagram. Coding blocks: Original Speech Signal, LP Analysis, Inverse Filter A(z), Pitch Marks, Prototypes Sampling, Storage Environment. Decoding blocks: TFI Residue, Prosodic Information, Synthesis Filter 1/A(z), Synthesized Speech Signal)

TTS - Waveform generation for TTS • Speech coding • Parametric coders, Waveform coders, Hybrid coders • TTS – Concatenative approach • Time scale and Frequency scale modifications • Spectral smoothing • Unit selection • (examples: Original, TTS, Original, Resynthesized, Modified : sin(x+))

ASR - Automatic Speech Recognition • Front-End Signal Processing • Feature extraction • Perceptual domain, Articulatory domain • Acoustic modeling • HMM : Hidden Markov Model • ANN/HMM : Hybrid models - Artificial Neural Network and HMM • Statistical Language Modeling • N-grams, smoothing techniques • Search : Decoding • Viterbi, Stack decoding, ...
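As an illustration of statistical language modeling with smoothing, the sketch below estimates bigram probabilities with add-one (Laplace) smoothing. The choice of add-one smoothing, the two-sentence corpus, the sentence markers `<s>` and `</s>`, and the function names are assumptions for the example; the lecture does not say which smoothing technique is used.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their histories over whitespace-tokenized sentences."""
    history, bigrams, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            history[prev] += 1
            bigrams[(prev, word)] += 1
    return history, bigrams, vocab

def bigram_prob(word, prev, history, bigrams, vocab):
    """P(word | prev) with add-one smoothing, so unseen bigrams get non-zero mass."""
    return (bigrams[(prev, word)] + 1) / (history[prev] + len(vocab))

corpus = ["the cat sat", "the dog sat"]               # toy corpus
history, bigrams, vocab = train_bigram(corpus)
p_seen = bigram_prob("cat", "the", history, bigrams, vocab)
p_unseen = bigram_prob("sat", "the", history, bigrams, vocab)
print(p_seen, p_unseen)
```

Without smoothing, any word sequence containing an unseen bigram would receive probability zero and could never be recognized.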

ASR – HMM - Topology • Ergodic model • Left-right model
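The two topologies differ only in which transitions are allowed, which shows up as the pattern of zeros in the transition matrix. The sketch below builds one example of each; the uniform ergodic probabilities, the self-loop probability `p_stay`, and the function names are assumptions chosen for illustration.

```python
def ergodic_transitions(n_states):
    """Ergodic model: every state can be reached from every state.
    Uniform probabilities are used here only as a placeholder."""
    p = 1.0 / n_states
    return [[p] * n_states for _ in range(n_states)]

def left_right_transitions(n_states, p_stay=0.6):
    """Left-right model: a state either repeats or moves on to the next state,
    so there are no transitions back to earlier states."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = p_stay
        A[i][i + 1] = 1.0 - p_stay
    A[-1][-1] = 1.0          # the last state only loops on itself
    return A

for row in left_right_transitions(3):
    print(row)
```

The left-right structure matches the fact that speech unfolds in time and does not return to earlier sounds within a unit, which is why it is commonly used for acoustic models.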

ASR – HMM – Basic principle (figure: state diagram with transition probabilities a)

ASR – HMM - Viterbi alignment (figure: four panels (a), (b), (c), (d), axes from 0 to 200)
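Viterbi alignment finds the single most likely state sequence for a sequence of observations. The sketch below is a minimal version for a discrete-observation HMM; the two-state model parameters (pi, A, B) and the observation sequence are toy values chosen for the example, not taken from the lecture.

```python
def viterbi(pi, A, B, obs):
    """Most likely state path and its probability.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability from state i to state j
    B[i][o] : probability of observing symbol o in state i
    """
    n_states = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n_states)]
    backptr = []
    for o in obs[1:]:
        step_ptr, new_delta = [], []
        for j in range(n_states):
            # Best predecessor of state j at this time step.
            best_i = max(range(n_states), key=lambda i: delta[i] * A[i][j])
            step_ptr.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        backptr.append(step_ptr)
        delta = new_delta
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda i: delta[i])
    best_prob = delta[state]
    path = [state]
    for step_ptr in reversed(backptr):
        state = step_ptr[state]
        path.append(state)
    path.reverse()
    return path, best_prob

pi = [0.6, 0.4]                               # toy model
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
path, prob = viterbi(pi, A, B, [0, 1, 2])
print(path, prob)
```

For long utterances the products of probabilities become very small, so practical implementations usually work with log probabilities and replace the products by sums.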

ASR – HMM – Forward-Backward
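The forward-backward procedure sums over all state sequences instead of keeping only the best one. The sketch below computes the forward variables alpha, the backward variables beta, and from them the observation likelihood and the per-frame state posteriors. The two-state discrete HMM (pi, A, B) and the observation sequence are toy values assumed for the example, and no scaling is applied, so it is only suitable for short sequences.

```python
def forward(pi, A, B, obs):
    """alpha[t][j] = P(o_1..o_t, state at t = j)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(A, B, obs):
    """beta[t][i] = P(o_{t+1}..o_T | state at t = i)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def state_posteriors(pi, A, B, obs):
    """gamma[t][i] = P(state at t = i | all observations), plus the likelihood."""
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    gamma = [[a * b / likelihood for a, b in zip(a_t, b_t)]
             for a_t, b_t in zip(alpha, beta)]
    return gamma, likelihood

pi = [0.6, 0.4]                               # toy model
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
gamma, likelihood = state_posteriors(pi, A, B, [0, 1, 2])
print(likelihood)
for row in gamma:
    print(row)
```

The posteriors gamma are the quantities that Baum-Welch training accumulates when re-estimating the model parameters.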

ASR – ANN/HMM

Evaluation : Exercises and Simulations • List of Exercises • SDSP, TTS, ASR • Simulations • SDSP • Vector quantization • TTS • Waveform Interpolation • ASR • Acoustic modeling using : HMM and ANN+HMM • Language modeling • Decoding

Evaluation : Report • Reports • Write the analysis and results of the simulation in the format of a paper • 4 pages, two columns • Sections • Abstract • Introduction • Brief theoretical description of the method • Methodology used to perform the experiment • Results • Conclusions and suggestions for further work • Bibliography

Days of classes
