Implementation of a speech Analysis-Synthesis Toolbox using Harmonic plus Noise Model

Didier Cadic (1), engineering student, supervised by Olivier Cappé (1), Maurice Charbit (1), Gérard Chollet (1), Eric Moulines (1) (presented here by Guido Aversano (1,2)) (1) Département TSI, ENST, Paris, France (2) IIASS, Vietri sul Mare (SA), Italy

Plan of the presentation • Text-to-speech: classic methods • HNM model • Analysis • Synthesis • Analysis-Synthesis examples • Conclusions

Text-To-Speech by concatenation Examples realized on the AT&T web site: English, male English, female (vocal server example) English, female (another vocal server example) German, male French, female

Text-To-Speech by concatenation 2 major challenges : • smooth connection between acoustic units • flexible prosody

TD-PSOLA method Analysis: • Pitch estimation • Pitch-synchronous windowing Synthesis: • Rearrangement of frames
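The windowing and frame-rearrangement steps can be illustrated with a minimal sketch. It is not the toolbox's code: it assumes a constant, already known pitch period (in samples) and regularly spaced pitch marks, and the function name `psola_time_scale` and the convention "rate 0.5 = twice as slow" are assumptions made for this illustration.

```python
import numpy as np

def psola_time_scale(x, period, rate):
    """Time-scale x by `rate` (0.5 = twice as slow), keeping the pitch.

    Assumes a constant pitch period (in samples) and pitch marks placed
    every `period` samples; a real system would estimate both.
    """
    # Analysis pitch marks, kept away from the signal borders.
    marks = np.arange(period, len(x) - period, period)
    # Two-period Hann window centred on each mark; with a hop of one
    # period, overlapping windows sum to one.
    window = np.hanning(2 * period + 1)
    out_len = int(round(len(x) / rate))
    y = np.zeros(out_len)
    # Synthesis marks keep the original spacing, so the pitch is unchanged.
    for s in range(period, out_len - period, period):
        # Pick the analysis frame closest to the corresponding original time.
        a = marks[np.argmin(np.abs(marks - s * rate))]
        y[s - period:s + period + 1] += window * x[a - period:a + period + 1]
    return y
```

With rate below 1 some analysis frames are repeated, and with rate above 1 some are skipped, which is the "rearrangement of frames" named on the slide.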

TD-PSOLA method Some very good-quality results: • Time-scaling Singing, original Singing, modified • Pitch-shifting Cello, original Cello, modified

TD-PSOLA method Artifacts appearing in non-voiced sounds: "ss", original; "ss", slowed down (classic method); "ss", slowed down (improved); "rain", original; "rain", 0.5 rate

Phase Vocoder method Intuitive description: Compression/stretching of the (narrow-band) spectrogram's time-frequency scales… • time-scaling • pitch-shifting

Phase Vocoder method Main problem: • phase coherence is lost in the synthesized signal Examples: "rain", male voice, slow-motion by Vocoder (PSOLA version for comparison); "The quick fox …", female voice, slow-motion by Vocoder

We need a parametric model • TD-PSOLA and Vocoder allow basic prosodic modifications. • The problem of unit concatenation for TTS is not solved. • Other kinds of modifications (timbre, denoising, …) should be considered.

Harmonic plus Noise Model (HNM) Main assumption: • stationary segments of a speech signal can always be seen as the superposition of a periodic part and a noisy part

HNM Model Modelling: $S(t) = H(t) + B(t)$, where $H(t) = \sum_k A_k \cos(2\pi k f_0 t + \varphi_k)$ and $B(t)$ = white noise passed through an AR filter
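As an illustration of the model, the following sketch builds one stationary frame as a sum of harmonics plus AR-filtered Gaussian noise. The function name `hnm_frame`, its parameters and the explicit filter loop are assumptions made for this example, not the toolbox's interface.

```python
import numpy as np

def hnm_frame(f0, amps, phases, ar_coeffs, noise_gain, n_samples, fs, seed=0):
    """Return the (harmonic, noise) parts of one stationary HNM frame."""
    t = np.arange(n_samples) / fs
    k = np.arange(1, len(amps) + 1)[:, None]        # harmonic indices 1..K
    A = np.asarray(amps, dtype=float)[:, None]
    phi = np.asarray(phases, dtype=float)[:, None]
    # H(t) = sum_k A_k cos(2 pi k f0 t + phi_k)
    harmonic = np.sum(A * np.cos(2 * np.pi * k * f0 * t + phi), axis=0)

    # B(t): white Gaussian noise through the all-pole filter
    # 1 / (a0 + a1 z^-1 + ... + aN z^-N); a0 is assumed non-zero.
    rng = np.random.default_rng(seed)
    e = noise_gain * rng.standard_normal(n_samples)
    a = np.asarray(ar_coeffs, dtype=float)
    noise = np.zeros(n_samples)
    for n in range(n_samples):
        acc = e[n]
        for i in range(1, len(a)):
            if n - i >= 0:
                acc -= a[i] * noise[n - i]
        noise[n] = acc / a[0]
    return harmonic, noise
```

The modelled frame is then simply `harmonic + noise`.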

HNM analysis of a frame • Pitch estimation → Spectral comb method

HNM analysis of a frame • Pitch estimation "aka…aga" • Good results are obtained • In some cases the method erroneously returns f0/2 • Possibility of tracking…
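The slides do not give the exact comb criterion, so the sketch below is only one plausible variant: for each candidate f0 it averages the magnitude spectrum on the comb teeth (multiples of f0) and subtracts the average halfway between the teeth. The subtraction and the averaging are assumptions introduced here; they penalise the candidates f0/2 and 2·f0 on clean synthetic signals, but do not guarantee the absence of octave errors on real speech.

```python
import numpy as np

def comb_pitch(frame, fs, f_min=60.0, f_max=400.0, step=1.0, n_fft=8192):
    """Estimate f0 (Hz) of one frame with a simple spectral comb score."""
    windowed = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed, n_fft))
    bin_hz = fs / n_fft
    candidates = np.arange(f_min, f_max + step, step)
    scores = np.empty(len(candidates))
    for i, f in enumerate(candidates):
        n_teeth = int((fs / 2) // f)                 # teeth up to Nyquist
        k = np.arange(1, n_teeth + 1)
        teeth = np.round(k * f / bin_hz).astype(int)
        gaps = np.round((k - 0.5) * f / bin_hz).astype(int)
        # High energy on the teeth, low energy between them.
        scores[i] = spectrum[teeth].mean() - spectrum[gaps].mean()
    return candidates[np.argmax(scores)]
```

Tracking the estimate from frame to frame, as suggested on the slide, would be a further safeguard; it is not shown here.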

min s(t) – H(t) 2 ak, bk HNM analysis of a frame • Harmonic part: extraction of amplitudes  Least squares method H(t) = akcos ( 2k f0 t ) + bksin ( 2k f0 t )

HNM analysis of a frame • Extraction of amplitudes Problem: the noisy part gives a non-null contribution to the spectral power • Gain correction for the harmonics (using a heuristic formula g(DV), where DV is the estimated voicing degree)

HNM analysis of a frame • Extraction of amplitudes → Residual: $R(t) = s(t) - H(t)$

HNM analysis of a frame • Extraction of amplitudes → Possibility of improving harmonic estimation

HNM analysis of a frame AR filter estimation for the residual: $R(t) = B_g * F(t)$ (white noise filtered by $F$), where $B_g$ = Gaussian white noise and $F$ = AR filter, $F(z) = \frac{1}{a_0 + a_1 z^{-1} + \dots + a_N z^{-N}}$ → Linear prediction method
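Linear prediction can be sketched with the autocorrelation method, which solves the normal (Yule-Walker) equations for the filter coefficients. The slides do not say which variant was used, so this is an assumption; the sketch also normalises a0 to 1 and returns the excitation gain separately.

```python
import numpy as np

def lpc_autocorrelation(residual, order):
    """Estimate A(z) = 1 + a1 z^-1 + ... + aN z^-N and the excitation gain."""
    x = np.asarray(residual, dtype=float)
    # Autocorrelation at lags 0..order (unnormalised sums).
    r = np.array([np.dot(x[:len(x) - lag], x[lag:]) for lag in range(order + 1)])
    # Toeplitz system R @ alpha = -r[1:] (normal equations).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    alpha = np.linalg.solve(R, -r[1:])
    a = np.concatenate(([1.0], alpha))
    # Prediction-error energy per sample gives the white-noise standard deviation.
    gain = np.sqrt(np.dot(a, r) / len(x))
    return a, gain
```

Filtering Gaussian white noise of standard deviation `gain` through 1/A(z) then yields a signal with approximately the spectral envelope of the residual.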

HNM Synthesis • Interpolation for each harmonic between two successive frames: $H(t) = \sum_k a_k(t)\cos(2\pi k f_0(t)\, t) + b_k(t)\sin(2\pi k f_0(t)\, t) = \sum_k A_k(t)\cos\varphi_k(t)$. $A_k(t_a)$ and $\varphi_k(t_a)$ are known at analysis instants $t_a$; $\dot{\varphi}_k(t_a) = 2\pi k f_0(t_a)$ is known by pitch analysis

HNM Synthesis Erroneous pitch (usually f0/2) • The harmonic correspondence problem is solved by introducing fictitious harmonics

HNM Synthesis $A_k(t)\cos\varphi_k(t)$: linear interpolation for $A_k(t)$; unwrapping + cubic interpolation for $\varphi_k(t)$
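The slides name the technique but not the formulas. The sketch below uses the well-known cubic phase interpolation of McAulay and Quatieri as a stand-in: the phase polynomial matches phase and frequency at both frame boundaries, and the unwrapping integer M is chosen to give the smoothest trajectory. Whether the toolbox uses exactly this rule is an assumption; the function name and argument names are hypothetical.

```python
import numpy as np

def interpolate_harmonic(A0, A1, phi0, phi1, w0, w1, T, n_samples):
    """Interpolate one harmonic between two frames separated by T seconds.

    A0, A1: amplitudes; phi0, phi1: phases (rad, known modulo 2 pi);
    w0, w1: angular frequencies (rad/s), e.g. 2 pi k f0 at each frame.
    """
    t = np.linspace(0.0, T, n_samples)       # includes both frame instants
    amp = A0 + (A1 - A0) * t / T             # linear amplitude interpolation
    # Unwrapping: number of 2 pi turns giving the smoothest phase path.
    M = np.round(((phi0 + w0 * T - phi1) + (w1 - w0) * T / 2) / (2 * np.pi))
    D = phi1 + 2 * np.pi * M - phi0 - w0 * T
    alpha = 3 * D / T**2 - (w1 - w0) / T
    beta = -2 * D / T**3 + (w1 - w0) / T**2
    # Cubic phase: matches phi and its derivative at t = 0 and t = T.
    phase = phi0 + w0 * t + alpha * t**2 + beta * t**3
    return amp * np.cos(phase), amp, phase
```

When chaining frames, the last sample of each segment coincides with the first sample of the next and should be written only once.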

HNM Synthesis Noisy part • Generation of normally distributed random numbers • AR filtering (abrupt changes of coefficients between 2 windows have no noticeable effect…)

HNM Synthesis Results Audio examples (original and synthesized): "wazi", a-e-i-o-u, Tuba, "Carottes", singing, "Lawyer"

HNM Synthesis Results Audio examples (original and synthesized): "coiffe", Discours, "aka aga", Andie; Dussolier (original, synthesized, noisy part)

Synthesis with time-stretching Synthesis instants ($t_s$) ≠ Analysis instants ($t_a$) The following parameters remain unchanged: • Noisy part parameters • The pitch • The amplitudes $A_k$ of the harmonics

Synthesis with time-stretching Phase adaptation • Simple resampling of phase trajectories or • "harmonic" rephasing Examples (a-e-i-o-u): original; slow-motion with phase "stretching"; slow-motion with "harmonic" rephasing
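The sketch below illustrates why the phases need adapting. It maps analysis instants to synthesis instants for a given rate, then rebuilds the phase of the fundamental by integrating the unchanged pitch along the new time axis. The slides do not define either phase-adaptation option precisely, so this integration is an assumption and may not correspond to "harmonic" rephasing as implemented; the convention "rate 0.5 = twice as slow" is also assumed.

```python
import numpy as np

def stretch_frames(t_a, f0, rate):
    """Map analysis instants to synthesis instants and rebuild the f0 phase.

    t_a: analysis instants (s); f0: pitch at those instants (Hz);
    rate: playback rate (0.5 = twice as slow).
    Returns (t_s, phase1), phase1 being the phase of the fundamental at t_s.
    """
    t_a = np.asarray(t_a, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    # Pitch and amplitudes are simply carried over to the new instants.
    t_s = t_a[0] + (t_a - t_a[0]) / rate
    # Trapezoidal integration of 2 pi f0 over the stretched time axis.
    steps = 0.5 * (f0[1:] + f0[:-1]) * np.diff(t_s)
    phase1 = 2 * np.pi * np.concatenate(([0.0], np.cumsum(steps)))
    return t_s, phase1
```

Keeping the analysed phases unchanged at the new instants would be inconsistent with the pitch, since more (or fewer) periods now fit between two frames.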

Final results Audio examples, original and synthesized with rates 1, 0.4, 0.5, 0.6, 0.7, 0.8, 1.2, 1.5 and 2: "carottes", "lawyer", tuba, "wazi", singing, "a-e-i-o-u", Dussolier, Discours, Andie, "aka aga", "coiffe"

Conclusions • Good results, showing the method's potential for different applications, including TTS • Future work will include other kinds of modifications (pitch shifting, timbre, etc.)
