5- Speech Synthesis

5-Speech Synthesis • Speech Synthesis Concept • Phone Units • Phone Sequence To Speech • Speech Naturalness • Concatenative Approaches • Rule-Based Approaches

Speech Synthesis Concept (Figure: Text → Text to Phone Sequence [Natural Language Processing (NLP)] → Phone Sequence to Speech [Speech Processing] → Speech)
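The two-stage structure above can be sketched as a pipeline of two functions. This is a minimal illustration, not a real synthesizer: `LEXICON` is a hypothetical toy dictionary of romanized Farsi words, and the second stage is a placeholder that emits one sine burst per phoneme where a real back end would use a concatenative or rule-based method.

```python
import math

# Hypothetical toy lexicon (romanized Farsi words); a real system would use a
# full pronunciation dictionary plus letter-to-sound rules.
LEXICON = {
    "salam": ["s", "a", "l", "a", "m"],
    "ketab": ["k", "e", "t", "a", "b"],
}


def text_to_phones(text: str) -> list[str]:
    """Stage 1 (NLP): normalize the text and map each word to phonemes."""
    phones: list[str] = []
    for word in text.lower().split():
        word = word.strip(".,!?")  # crude punctuation removal
        phones.extend(LEXICON[word])
    return phones


def phones_to_speech(phones: list[str], sample_rate: int = 8000,
                     phone_ms: int = 80) -> list[float]:
    """Stage 2 (speech processing): placeholder that emits one fixed-length
    sine burst per phoneme instead of real speech."""
    n = sample_rate * phone_ms // 1000  # samples per phoneme
    samples: list[float] = []
    for i, _ in enumerate(phones):
        freq = 100.0 + 20.0 * i  # arbitrary placeholder frequency
        samples.extend(math.sin(2 * math.pi * freq * t / sample_rate)
                       for t in range(n))
    return samples


def synthesize(text: str) -> list[float]:
    """Full pipeline: text -> phone sequence -> samples."""
    return phones_to_speech(text_to_phones(text))
```

For example, `text_to_phones("Salam ketab.")` yields the ten phonemes of both words, and `synthesize("salam")` returns 5 × 640 samples. Keeping the stages as separate functions mirrors the NLP / speech-processing split, so either side can be replaced independently.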

Phone Units • Paragraph • Sentence • Word (depends on the language; usually more than 100,000) • Syllable • Diphone & Triphone • Phoneme (between 10 and 100)

Phone Units (Cont’d) • Diphone: models the transition between two adjacent phonemes (Figure: a phoneme sequence p1 p2 p3 p4 p5, with diphones marked across the boundaries between neighbouring phonemes)
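Cutting a phoneme sequence into diphones, and into triphones for comparison, can be sketched as below. The silence symbol `sil` used to pad both ends and the `left-centre+right` triphone label format are assumptions for illustration, not conventions stated in the slides.

```python
def to_diphones(phonemes: list[str], silence: str = "sil") -> list[str]:
    """Return the diphone (transition) units of a phoneme sequence.
    The sequence is padded with a silence symbol so that the transitions
    into the first phoneme and out of the last one are also covered."""
    padded = [silence, *phonemes, silence]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def to_triphones(phonemes: list[str], silence: str = "sil") -> list[str]:
    """Return context-dependent triphone labels written as left-centre+right."""
    padded = [silence, *phonemes, silence]
    return [f"{a}-{b}+{c}" for a, b, c in zip(padded, padded[1:], padded[2:])]
```

For the five phonemes p1 to p5 of the figure, `to_diphones` produces six units, from `sil-p1` to `p5-sil`: a sequence of N phonemes always yields N + 1 padded diphones.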

Phone Units (Cont’d) • Farsi has 30 phonemes, so in theory it has 30 × 30 = 900 diphones. • In practice, the only diphone that does not occur in Farsi is /zho/. • In theory there are 30³ = 27,000 triphones, but in practice Farsi has about 15,000.
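The gap between theoretical and practical unit counts can be checked mechanically: the theoretical count is every ordered pair or triple of phonemes, while the practical count is the set of units that actually occur in a corpus. The function names and the one-word corpus below are hypothetical; the 15,000 figure is the slide's estimate.

```python
def theoretical_counts(n_phonemes: int) -> tuple[int, int]:
    """Upper bounds: every ordered pair / triple of phonemes."""
    return n_phonemes ** 2, n_phonemes ** 3


def observed_units(corpus: list[list[str]], k: int) -> set[tuple[str, ...]]:
    """Distinct k-phoneme units (k=2: diphones, k=3: triphones) that actually
    occur inside the phoneme sequences of a corpus."""
    seen: set[tuple[str, ...]] = set()
    for seq in corpus:
        for i in range(len(seq) - k + 1):
            seen.add(tuple(seq[i:i + k]))
    return seen


diphones, triphones = theoretical_counts(30)  # 900 and 27000 for Farsi
triphone_coverage = 15000 / triphones         # about 0.56, using the slide's estimate
```

Running `observed_units` over a large phonetically transcribed corpus is one way to estimate how many units a concatenative inventory actually has to record.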

Phone Units (Cont’d) • Syllable = Onset (consonant) + Rhyme • A syllable is a group of phonemes containing exactly one vowel • Syllable structures in Farsi: CV, CVC, CVCC • Farsi has about 4,000 syllables • Syllable structures in English: V, CV, CVC, CCVC, CCVCC, CCCVC, CCCVCC, ... • The number of syllables in English is very large
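Because every Farsi syllable has the form CV, CVC or CVCC, each vowel is preceded by exactly one onset consonant, which makes syllabification deterministic. The sketch below assumes a hypothetical romanized six-vowel set and that the input is a well-formed Farsi phoneme sequence; it is an illustration of the constraint, not a complete syllabifier.

```python
# Hypothetical romanized Farsi vowel set (six vowels); adjust to your phone set.
VOWELS = {"a", "e", "o", "â", "i", "u"}
FARSI_PATTERNS = {"CV", "CVC", "CVCC"}


def cv_template(syllable: list[str]) -> str:
    """Map each phoneme to C (consonant) or V (vowel)."""
    return "".join("V" if p in VOWELS else "C" for p in syllable)


def is_valid_syllable(syllable: list[str], patterns=FARSI_PATTERNS) -> bool:
    """A syllable must contain exactly one vowel and match an allowed template."""
    t = cv_template(syllable)
    return t.count("V") == 1 and t in patterns


def syllabify_farsi(phonemes: list[str]) -> list[list[str]]:
    """Split a well-formed Farsi phoneme sequence into syllables: each
    syllable starts at the single onset consonant just before its vowel."""
    vowel_idx = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not vowel_idx:
        raise ValueError("no vowel: not a valid Farsi word")
    starts = [i - 1 for i in vowel_idx]
    if starts[0] != 0:
        raise ValueError("word must start with exactly one onset consonant")
    bounds = starts + [len(phonemes)]
    return [phonemes[b:e] for b, e in zip(bounds, bounds[1:])]
```

For example, `syllabify_farsi(list("dorost"))` gives `do` + `rost`, with templates CV and CVCC. English needs a real syllabifier instead, since its longer onset and coda clusters make the boundaries ambiguous.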

Phone Sequence To Speech • Concatenative approaches: a trade-off between naturalness on one side and memory usage and computational load on the other • Rule-based approaches: the most important rule-based approach is the Klatt method

Phone Sequence To Speech (Cont’d) (Figure: Text → Text to Phone Sequence [NLP] → Phone Sequence to Primitive Utterance → Primitive Utterance to Natural Speech [Speech Processing] → Speech)

Speech Naturalness • Elimination of undesirable noise, distortion, and discontinuities from the speech • Prosody generation • Speech energy • Duration • Intonation • Stress

Speech Naturalness (Cont’d) • Intonation and stress strongly affect speech naturalness • Intonation: variation of the pitch frequency over the course of an utterance • Stress: an increase in pitch frequency at a specific point in time
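The two definitions above can be turned into a simple pitch (F0) contour generator: intonation as a slow variation of pitch across the utterance, and stress as a local pitch rise. The linear declination, the Gaussian shape of the stress rise, and all default values (130 Hz to 100 Hz, a 30 Hz boost, a 50 ms width) are illustrative assumptions, not a model taken from the slides.

```python
import math


def f0_contour(duration_ms: int, stress_ms: list[int], frame_ms: int = 10,
               f0_start: float = 130.0, f0_end: float = 100.0,
               stress_boost: float = 30.0,
               stress_width_ms: float = 50.0) -> list[float]:
    """Return one F0 value (Hz) per frame.
    Intonation: a linear fall (declination) from f0_start to f0_end.
    Stress: a Gaussian-shaped pitch rise centred on each stressed instant."""
    contour = []
    for i in range(duration_ms // frame_ms):
        t = i * frame_ms
        base = f0_start + (f0_end - f0_start) * t / duration_ms
        bump = sum(stress_boost * math.exp(-0.5 * ((t - s) / stress_width_ms) ** 2)
                   for s in stress_ms)
        contour.append(base + bump)
    return contour
```

With `f0_contour(1000, [500])`, the contour starts near 130 Hz, peaks around 145 Hz at the stressed frame (500 ms), and falls towards 100 Hz by the end of the utterance.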

Concatenative Approaches • In these approaches, units of natural speech are stored and then used to reconstruct the desired speech • The phone unit best suited to the synthesis task can be selected • Compressed parameters can be stored instead of the original waveform

Concatenative Approaches (Cont’d) • Benefits of storing compressed parameters instead of the original waveform • Less memory use • A general representation instead of one specific stored utterance • Easier prosody generation
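The slides do not say which coding is used. Linear predictive coding (LPC) is one common choice, so the sketch below swaps it in to show what "storing compressed parameters" can look like: each frame of samples is replaced by a few predictor coefficients computed with the Levinson-Durbin recursion. The storage comparison assumes 16 kHz, 16-bit audio and 10 ms frames of 14 four-byte parameters (for example 12 coefficients plus gain and pitch); these figures are hypothetical.

```python
def autocorrelation(frame: list[float], order: int) -> list[float]:
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Solve for predictor coefficients a[1..p] with x[n] ~ sum_k a[k] x[n-k].
    Returns (coefficients, final prediction-error energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent frame: nothing to predict
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def encode(signal: list[float], frame_len: int = 160, order: int = 12):
    """Replace each frame of samples by (LPC coefficients, residual energy)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        r = autocorrelation(signal[start:start + frame_len], order)
        frames.append(levinson_durbin(r, order))
    return frames


def storage_bytes(duration_ms: int, sample_rate: int = 16000,
                  bytes_per_sample: int = 2, frame_ms: int = 10,
                  params_per_frame: int = 14, bytes_per_param: int = 4):
    """Bytes needed for the raw waveform vs. the coded parameters."""
    raw = duration_ms * sample_rate // 1000 * bytes_per_sample
    coded = duration_ms // frame_ms * params_per_frame * bytes_per_param
    return raw, coded
```

Under these assumptions, a 100 ms unit takes 3,200 bytes as a raw waveform but 560 bytes as parameters. Because pitch and gain become separate parameters, they can also be changed freely at synthesis time, which is where the "easier prosody generation" benefit comes from.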

Concatenative Approaches (Cont’d) Type of storage for each phone unit: • Paragraph: original waveform • Sentence: original waveform • Word: original waveform • Syllable: coded or original waveform • Diphone: coded waveform • Phoneme: coded waveform

Concatenative Approaches (Cont’d) • Pitch-Synchronous Overlap-Add (PSOLA) is a well-known method for smoothing the transitions between phonemes • Overlap-add is a standard DSP method • PSOLA is a basic operation in voice conversion • In the analysis stage of this method, frames are selected synchronously with the pitch marks
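A minimal time-domain PSOLA pitch modification can be sketched as follows. It assumes the analysis pitch marks (one per period) are already known, keeps the duration unchanged, and uses two-period Hann windows. Real implementations also handle unvoiced regions and time-scale changes, which are omitted here.

```python
import math


def hann(n: int) -> list[float]:
    """Symmetric Hann window of n points (zero at both ends)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def td_psola(x: list[float], marks: list[int], pitch_factor: float) -> list[float]:
    """Pitch-scale x by pitch_factor (> 1 raises pitch), keeping its duration.
    marks: analysis pitch marks (sample indices, at least two), assumed given."""
    y = [0.0] * len(x)
    # local pitch period at each analysis mark
    periods = [marks[1] - marks[0]] + [marks[i] - marks[i - 1]
                                       for i in range(1, len(marks))]
    t = float(marks[0])  # current synthesis mark
    while t <= marks[-1]:
        # analysis frame: the pitch mark nearest to the synthesis instant
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t))
        c, p = marks[k], periods[k]
        win = hann(2 * p + 1)  # two-period window centred on the mark
        s = int(round(t))
        for j in range(-p, p + 1):  # overlap-add the windowed frame at t
            src, dst = c + j, s + j
            if 0 <= src < len(x) and 0 <= dst < len(y):
                y[dst] += x[src] * win[j + p]
        t += p / pitch_factor  # spacing of synthesis marks sets the new pitch
    return y
```

With `pitch_factor=1.0` the Hann windows overlap by one period and sum to one, so the signal between the first and last marks is reconstructed unchanged. With `pitch_factor=2.0` the frames are re-placed twice as often, doubling F0.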

Rule-Based Approach Stages • Determine the speech model and its parameters • Determine the type of phone unit • Determine the parameter values for each phone unit • Replace the sequence of phone units with the equivalent parameter sequence • Feed the parameter sequence into the speech model
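The stages above can be sketched for a toy formant model. The assumptions are that the model is driven by per-frame F1/F2 values, that the phone unit is the phoneme, and that `PARAMS` holds rough, illustrative vowel formant targets and durations (hypothetical values, not measured Farsi data). Each phone's target sits at its centre, and values between centres are interpolated linearly.

```python
# Stage 1 (assumption): the speech model is a formant synthesizer driven by F1/F2.
# Stage 2 (assumption): the phone unit is the phoneme.
# Stage 3: hypothetical parameter table, one entry per phone unit.
PARAMS = {
    "a": {"F1": 730, "F2": 1090, "dur_ms": 120},
    "i": {"F1": 270, "F2": 2290, "dur_ms": 100},
    "u": {"F1": 300, "F2": 870, "dur_ms": 100},
}


def interp(x: float, xs: list[float], ys: list[float]) -> float:
    """Piecewise-linear interpolation, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])
    return ys[-1]


def phones_to_targets(phones: list[str]) -> list[dict]:
    """Stage 4a: replace each phone unit by its parameter set."""
    return [PARAMS[p] for p in phones]


def targets_to_tracks(targets: list[dict], frame_ms: int = 10) -> dict:
    """Stage 4b: expand targets into per-frame F1/F2 tracks by placing each
    target at the centre of its phone and interpolating in between."""
    centres, t = [], 0
    for tg in targets:
        centres.append(t + tg["dur_ms"] / 2)
        t += tg["dur_ms"]
    tracks: dict = {"F1": [], "F2": []}
    for f in range(t // frame_ms):
        time = f * frame_ms + frame_ms / 2  # frame centre in ms
        for name in tracks:
            tracks[name].append(interp(time, centres, [tg[name] for tg in targets]))
    return tracks
```

For the sequence /a i/, the 220 ms yield 22 frames. Halfway between the two phone centres (frame 11), F1 is 500 Hz. Stage 5 would pass these tracks to a formant synthesizer such as the Klatt model.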

KLATT 80 Model

KLATT 88 Model

THE KLSYN88 CASCADE/PARALLEL FORMANT SYNTHESIZER (Figure: block diagram. Glottal and laryngeal sound sources, controlled by parameters such as F0, AV, TL and AH, feed a cascade vocal tract model made of nasal and tracheal pole/zero pairs (FNP, FNZ, FTP, FTZ with bandwidths BNP, BNZ, BTP, BTZ) and formant resonators F1 to F5 with bandwidths B1 to B5, where DF1 and DB1 also act on the first formant. Frication sound sources (AF) feed a parallel vocal tract model with formant amplitudes A2F to A6F, bandwidths B2F to B6F, formant F6 and a bypass path (AB). A parallel path for the laryngeal sound sources, with amplitudes such as A1V to A4V and ANV, is normally not used.)
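The building block of both vocal tract branches is a second-order digital resonator. With sampling period T, formant frequency F and bandwidth BW, the Klatt (1980) resonator uses C = -exp(-2π·BW·T), B = 2·exp(-π·BW·T)·cos(2π·F·T), A = 1 - B - C, and y[n] = A·x[n] + B·y[n-1] + C·y[n-2]. The sketch below implements this resonator and a cascade of resonators only. It omits the antiresonators, the parallel branch, and the source models, and the formant values used in the tests are illustrative assumptions.

```python
import cmath
import math


def resonator_coeffs(f_hz: float, bw_hz: float, fs: float) -> tuple[float, float, float]:
    """Klatt-style resonator coefficients (A, B, C); A normalizes DC gain to 1."""
    T = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw_hz * T)
    b = 2 * math.exp(-math.pi * bw_hz * T) * math.cos(2 * math.pi * f_hz * T)
    a = 1 - b - c
    return a, b, c


def resonate(x: list[float], f_hz: float, bw_hz: float, fs: float) -> list[float]:
    """Filter x through one resonator: y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(x: list[float], formants: list[tuple[float, float]], fs: float) -> list[float]:
    """Cascade branch: the output of each (F, BW) resonator feeds the next."""
    for f_hz, bw_hz in formants:
        x = resonate(x, f_hz, bw_hz, fs)
    return x


def resonator_gain(freq: float, f_hz: float, bw_hz: float, fs: float) -> float:
    """Magnitude response |A / (1 - B z^-1 - C z^-2)| at a given frequency."""
    a, b, c = resonator_coeffs(f_hz, bw_hz, fs)
    z1 = cmath.exp(-2j * math.pi * freq / fs)
    return abs(a / (1 - b * z1 - c * z1 * z1))
```

Because A = 1 - B - C, each resonator has unit gain at 0 Hz. In a cascade, the formant amplitudes therefore follow automatically from the frequencies and bandwidths, which is one reason the cascade branch is used for vowels. Feeding an impulse train at F0 through `cascade` with a few formant pairs gives a crude vowel-like signal.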

Three Voicing Source Models in KLATT 88 • The old KLSYN impulsive source • The KLGLOTT88 model • The modified LF model
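The first two sources can be contrasted in a few lines: an impulse train (one unit pulse per pitch period) versus a KLGLOTT88-style glottal flow. The flow has the polynomial form U(t) = a·t² - b·t³ during the open phase (a fraction OQ of the period) and is zero during the closed phase. This is a simplified sketch. Choosing b = a/Te makes the flow return to zero at the end of the open phase Te, and the scaling of a so that the peak equals `av` is an assumption made here for readability. The spectral-tilt filter and the LF model are not shown.

```python
def impulse_source(f0: float, fs: float, n: int) -> list[float]:
    """Old KLSYN-style impulsive source (simplified): one unit impulse per period."""
    period = fs / f0
    out = [0.0] * n
    t = 0.0
    while int(round(t)) < n:
        out[int(round(t))] = 1.0
        t += period
    return out


def klglott88_flow(f0: float, fs: float, n: int, oq: float = 0.5,
                   av: float = 1.0) -> list[float]:
    """KLGLOTT88-style flow U(t) = a t^2 - b t^3 for 0 <= t < OQ*T0, else 0."""
    t0 = 1.0 / f0
    te = oq * t0                      # duration of the open phase
    a = av * 27.0 / (4.0 * te * te)   # peak at t = 2te/3 equals 4*a*te^2/27 = av
    b = a / te                        # flow returns to zero at t = te
    out = []
    for i in range(n):
        t = (i / fs) % t0             # time within the current period
        out.append(a * t * t - b * t ** 3 if t < te else 0.0)
    return out
```

The impulse source has a flat spectrum, while the smooth polynomial pulse has energy that falls off with frequency, which is closer to a natural glottal waveform.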
