5-Speech Synthesis • Speech Synthesis Concept • Phone Units • Phone Sequence To Speech • Speech Naturalness • Concatenative Approaches • Rule-Based Approaches
Speech Synthesis Concept • Text → Text to Phone Sequence (Natural Language Processing, NLP) → Phone Sequence to Speech (Speech Processing) → Speech
Phone Units • Paragraph ( ) • Sentence ( ) • Word (depends on the language; usually more than 100,000) • Syllable • Diphone & Triphone • Phoneme (between 10 and 100)
Phone Units (Cont’d) • Diphone: models the transition between two phonemes • [Figure: a phoneme sequence p1 p2 p3 p4 p5, with the phoneme and diphone units marked along it]
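As a minimal sketch of the idea, the diphone units that cover a phoneme sequence can be listed by pairing each phoneme with its successor. The function name `diphones` and the labels `p1` to `p5` are illustrative only, and the sketch ignores the leading and trailing silence units that a real system would usually add.

```python
def diphones(phonemes):
    """Return the diphone units covering a phoneme sequence.

    Each diphone pairs one phoneme with the next one, so it spans
    the transition between the two.
    """
    return list(zip(phonemes, phonemes[1:]))


sequence = ["p1", "p2", "p3", "p4", "p5"]
units = diphones(sequence)
print(units)
```

A sequence of N phonemes yields N - 1 diphones under this simplification.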
Phone Units (Cont’d) • Farsi has 30 phonemes, so in theory there are 30 × 30 = 900 diphones. • In practice, the only diphone that does not occur in Farsi is /zho/. • In theory there are 30 × 30 × 30 = 27,000 triphones, but in practice Farsi has about 15,000.
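The counts above follow directly from the phoneme inventory size. The short calculation below reproduces them; the variable names are illustrative, and the practical triphone figure is the approximate value quoted on the slide.

```python
NUM_PHONEMES = 30  # Farsi phoneme count quoted on the slide

theoretical_diphones = NUM_PHONEMES ** 2   # every ordered pair of phonemes
theoretical_triphones = NUM_PHONEMES ** 3  # every ordered triple of phonemes

# Approximate number of triphones that occur in practice (from the slide).
practical_triphones = 15000
triphone_coverage = practical_triphones / theoretical_triphones

print(theoretical_diphones)        # 900
print(theoretical_triphones)       # 27000
print(f"{triphone_coverage:.0%}")  # 56%
```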
Phone Units (Cont’d) • Syllable = Onset (Consonant) + Rhyme • A syllable is a set of phonemes that contains exactly one vowel • Syllables in Farsi: CV, CVC, CVCC • Farsi has about 4000 syllables • Syllables in English: V, CV, CVC, CCVC, CCVCC, CCCVC, CCCVCC, . . . • The number of syllables in English is very large
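A rough sketch of how the consonant/vowel (CV) patterns above could be checked in code follows. The vowel set and the phoneme spellings are hypothetical placeholders, not a real Farsi transcription scheme; only the three Farsi patterns come from the slide.

```python
FARSI_PATTERNS = {"CV", "CVC", "CVCC"}  # syllable patterns listed on the slide

# Hypothetical vowel symbols, chosen only for illustration.
VOWELS = {"a", "e", "o", "i", "u", "A"}


def cv_pattern(phonemes, vowels=VOWELS):
    """Map a phoneme sequence to its consonant/vowel pattern, e.g. 'CVC'."""
    return "".join("V" if p in vowels else "C" for p in phonemes)


def is_farsi_syllable(phonemes, vowels=VOWELS):
    """Check for exactly one vowel and one of the listed Farsi patterns."""
    pattern = cv_pattern(phonemes, vowels)
    return pattern.count("V") == 1 and pattern in FARSI_PATTERNS


print(cv_pattern(["d", "a", "s", "t"]))         # CVCC
print(is_farsi_syllable(["d", "a", "s", "t"]))  # True
print(is_farsi_syllable(["s", "t", "o", "p"]))  # False (CCVC is not in the list)
```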
Phone Sequence To Speech • Concatenative approaches: a trade-off between naturalness on the one hand and memory usage and amount of computation on the other • Rule-based approaches: the most important rule-based approach is the Klatt method
Phone Sequence To Speech (Cont’d) • Text → Text to Phone Sequence (NLP) → Phone Sequence to Primitive Utterance → Primitive Utterance to Natural Speech (Speech Processing) → Speech
Speech Naturalness • Removal of undesirable noise, distortion, and discontinuities from the speech • Prosody generation • Speech energy • Duration • Intonation • Stress
Speech Naturalness (Cont’d) • Intonation and stress have a strong effect on speech naturalness • Intonation: the variation of pitch frequency over the course of an utterance • Stress: an increase in pitch frequency at a specific time
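To make the two definitions concrete, the sketch below builds a toy pitch (F0) contour: a linear fall over the utterance stands in for intonation, and a local bump stands in for stress. The function name, the linear and Gaussian shapes, and all numeric defaults are assumptions chosen for illustration, not values from the slides.

```python
import math


def f0_contour(duration_s, frame_s=0.01, start_hz=130.0, end_hz=100.0,
               stress_time_s=None, stress_rise_hz=30.0, stress_width_s=0.05):
    """Return one F0 value (in Hz) per frame.

    Intonation is modelled as a linear change from start_hz to end_hz.
    Stress, if requested, adds a Gaussian-shaped rise around stress_time_s.
    """
    n_frames = int(round(duration_s / frame_s)) + 1
    contour = []
    for i in range(n_frames):
        t = i * frame_s
        f0 = start_hz + (end_hz - start_hz) * (t / duration_s)
        if stress_time_s is not None:
            z = (t - stress_time_s) / stress_width_s
            f0 += stress_rise_hz * math.exp(-0.5 * z * z)
        contour.append(f0)
    return contour


plain = f0_contour(1.0)
stressed = f0_contour(1.0, stress_time_s=0.5)
print(round(plain[50], 1), round(stressed[50], 1))
```

In a synthesizer, a contour like this would drive the pitch of the voiced source frame by frame.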
Concatenative Approaches • In these approaches, units of natural speech are stored and used to reconstruct the desired speech • The appropriate phone unit can be selected for speech synthesis • Compressed parameters can be stored instead of the original waveform
Concatenative Approaches (Cont’d) • Benefits of storing compressed parameters instead of the original waveform • Less memory use • A general representation instead of a specific stored utterance • Easier prosody generation
Concatenative Approaches (Cont’d)
Phone Unit      Type of Storing
Paragraph       Main waveform
Sentence        Main waveform
Word            Main waveform
Syllable        Coded/main waveform
Diphone         Coded waveform
Phoneme         Coded waveform
Concatenative Approaches (Cont’d) • Pitch-Synchronous Overlap-Add (PSOLA) is a well-known method for smoothing the transitions between phonemes • Overlap-add is a standard DSP method • PSOLA is a basic operation in voice conversion • In the analysis stage of this method, frames are selected synchronously with the pitch marks
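The following is a simplified sketch of the overlap-add step, assuming a constant pitch period and pitch marks that are already known. The function names are hypothetical, and a real PSOLA implementation would estimate the marks from the signal, adapt the frame length to the local period, and repeat or drop frames to change duration.

```python
import math


def hann(length):
    """Periodic Hann window of the given length."""
    return [0.5 * (1.0 - math.cos(2.0 * math.pi * k / length))
            for k in range(length)]


def extract_frames(signal, marks, period):
    """Analysis: one Hann-windowed, two-period frame centred on each pitch mark."""
    window = hann(2 * period)
    frames = []
    for m in marks:
        frame = []
        for k in range(2 * period):
            idx = m - period + k
            sample = signal[idx] if 0 <= idx < len(signal) else 0.0
            frame.append(sample * window[k])
        frames.append(frame)
    return frames


def overlap_add(frames, new_marks, period, out_len):
    """Synthesis: re-centre each frame on its new mark and sum the overlaps."""
    out = [0.0] * out_len
    for frame, m in zip(frames, new_marks):
        for k, value in enumerate(frame):
            idx = m - period + k
            if 0 <= idx < out_len:
                out[idx] += value
    return out


period = 40
signal = [math.sin(2.0 * math.pi * n / period) for n in range(400)]
marks = list(range(period, 400, period))
frames = extract_frames(signal, marks, period)

# Same marks: the overlapping windows sum to one, so the interior is unchanged.
rebuilt = overlap_add(frames, marks, period, len(signal))

# Marks moved closer together (32 instead of 40 samples): the pitch is raised.
closer_marks = [period + 32 * i for i in range(len(marks))]
raised = overlap_add(frames, closer_marks, period, len(signal))
```

Because neighbouring Hann windows overlap by one period and sum to one, placing the frames back on their original marks reproduces the signal between the first and last mark; changing the mark spacing changes the pitch.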
Rule-Based Approach Stages • Determine the speech model and the model parameters • Determine the type of phone units • Determine the parameter values for each phone unit • Substitute the sequence of phone units with its equivalent parameter sequence • Feed the parameter sequence into the speech model
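The substitution stage can be sketched as a table lookup that expands each phone unit into a run of parameter frames. The table `PHONE_PARAMS`, its parameter names, and every number in it are illustrative placeholders rather than measured values, and the function name is hypothetical.

```python
# Hypothetical per-phone parameter table: two formant frequencies (Hz)
# and a duration (ms). The values are rough placeholders for illustration.
PHONE_PARAMS = {
    "a": {"F1": 700.0, "F2": 1200.0, "dur_ms": 120},
    "i": {"F1": 300.0, "F2": 2300.0, "dur_ms": 100},
    "s": {"F1": 0.0, "F2": 0.0, "dur_ms": 80},
}


def phones_to_parameters(phones, table=PHONE_PARAMS, frame_ms=10):
    """Replace each phone unit with its equivalent sequence of parameter frames."""
    track = []
    for phone in phones:
        params = table[phone]
        n_frames = params["dur_ms"] // frame_ms
        for _ in range(n_frames):
            track.append({"phone": phone, "F1": params["F1"], "F2": params["F2"]})
    return track


track = phones_to_parameters(["s", "a", "i"])
print(len(track))  # 30 frames: 8 + 12 + 10
# In the final stage, this frame sequence would be fed to the chosen speech model.
```

A practical system would also smooth the parameters across phone boundaries instead of holding them constant within each unit.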
The KLSYN88 Cascade/Parallel Formant Synthesizer • [Figure: block diagram. Blocks: glottal sound sources; cascade vocal tract model (laryngeal sound sources); parallel vocal tract model, laryngeal sound sources (normally not used); parallel vocal tract model, frication sound sources; bypass path. Control parameters labelled in the diagram include F0, AV, AH, AF, AB, F1 to F6 with B1 to B5, FNP/FNZ, FTP/FTZ, A2F to A6F, and B2F to B6F.]
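As a hedged illustration of the cascade branch, the sketch below chains second-order digital resonators of the kind used in Klatt-style formant synthesizers, each set by a formant frequency (such as F1) and bandwidth (such as B1). The function names, the 10 kHz sample rate, and the formant values are assumptions for the example; this is not the full KLSYN88 model, which also has sources, anti-resonators, and a parallel branch.

```python
import math


def resonator_coefficients(freq_hz, bw_hz, sample_rate_hz):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] with unit gain at 0 Hz."""
    t = 1.0 / sample_rate_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c


def resonate(signal, freq_hz, bw_hz, sample_rate_hz):
    """Filter a signal through one formant resonator."""
    a, b, c = resonator_coefficients(freq_hz, bw_hz, sample_rate_hz)
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(signal, formants, sample_rate_hz):
    """Pass the signal through the resonators one after another."""
    for freq_hz, bw_hz in formants:
        signal = resonate(signal, freq_hz, bw_hz, sample_rate_hz)
    return signal


SAMPLE_RATE = 10000  # assumed sample rate in Hz
FORMANTS = [(700.0, 60.0), (1200.0, 90.0), (2600.0, 150.0)]  # illustrative (F, B) pairs

impulse = [1.0] + [0.0] * 499
response = cascade(impulse, FORMANTS, SAMPLE_RATE)
```

In the cascade arrangement the relative formant amplitudes follow from the frequencies and bandwidths alone, which is why the cascade branch needs no per-formant amplitude controls.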
Three Voicing Source Models in Klatt 88 • The old KLSYN impulsive source • The KLGLOTT88 model • The modified LF model