SPEECH SYNTHESIS
PC Pandey, EE Dept, IIT Bombay, March ‘03
Speech units
• Sentences & phrases
• Words
• Syllables
• Phonemes
• Subphonemic acoustic segments

Speech features
Prosodic (suprasegmental) features
• Intensity variation
• Pitch variation
Phonemic features
• Articulatory
• Acoustic
• Perceptual
Classification of phonemes
Vowels
• Pure vowels
• Diphthongs
Consonants
• Semivowels
• Whisper
• Stops
• Nasals
• Fricatives
• Affricates
Speech synthesis
Generation of speech by a machine
Applications
• Voice response systems (limited vocabulary)
• Text-to-speech synthesis (unlimited vocabulary)
• Analysis-by-synthesis (speech research)
• Generation of speech-like test signals
• Analysis-synthesis systems
  * channel capacity reduction
  * secure communication
  * speech enhancement
  * voice transformation
  * processing for hearing aids
Development of speech synthesizers
• Mechanical / electro-mechanical (1760-1930)
• Electronic analog with keyboard input (1930s)
• Electronic analog analysis-synthesis systems (1930-50)
• Digital synthesizers (1950 onwards)
  * software based
  * hardware based
Mechanical synthesizers
• Von Kempelen, 1780
• Wheatstone’s speaking machine
Dudley, 1930s: Voder
Electronic analog synthesizer with mechanical keyboard
Modern synthesis approaches
Waveform based
• high quality natural output
• limited vocabulary
• large storage requirement
Speech model based
• unlimited speech synthesis with small storage
• difficulty in parameter generation & concatenation
Text-to-speech synthesis
• Text pre-processing & phonetic transcription
• Parsing for syntactic & semantic structure
• Prosodic information & sound units
• Speech waveform generation
Speech model based approaches
• Articulatory
• Source-filter
  * channel vocoder
  * LPC vocoder
  * homomorphic vocoder
  * formant-based synthesizer
• Acoustic
  * phase vocoder
  * sinusoidal model
  * harmonic plus noise model (HNM)
HARMONIC PLUS NOISE MODEL (Stylianou, 1995; 2001)
Speech signal divided into:
• harmonic part
• noise part
(Slide panels: Harmonic part, Noise part)
Parameters:
• Harmonic amplitudes and phases
• Maximum voiced frequency
• Voiced/unvoiced (V/UV) decision & pitch
• Noise parameters
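A minimal sketch of how the listed parameters could drive synthesis of one frame. It is an illustration under stated assumptions, not the implementation behind these slides: the function name `synthesize_hnm_frame`, the 8 kHz sampling rate, and the frame length are hypothetical, and the noise part is simplified to gain-scaled white noise (the spectral shaping and time envelope of a full HNM noise part are omitted).

```python
import math
import random


def synthesize_hnm_frame(f0, amplitudes, phases, max_voiced_freq, noise_gain,
                         voiced=True, sample_rate=8000, num_samples=80, seed=0):
    """Return one frame of samples: harmonic part plus (simplified) noise part.

    Assumptions for illustration only:
    - parameters are held constant over the frame;
    - the noise part is plain white Gaussian noise scaled by noise_gain,
      with no spectral shaping or time-domain envelope.
    """
    rng = random.Random(seed)
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        harmonic = 0.0
        if voiced:
            # Sum harmonics k*f0 that lie at or below the maximum voiced frequency.
            for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
                if k * f0 > max_voiced_freq:
                    break
                harmonic += amp * math.cos(2 * math.pi * k * f0 * t + phase)
        noise = noise_gain * rng.gauss(0.0, 1.0)
        frame.append(harmonic + noise)
    return frame


if __name__ == "__main__":
    frame = synthesize_hnm_frame(
        f0=100.0, amplitudes=[1.0, 0.5, 0.25], phases=[0.0, 0.0, 0.0],
        max_voiced_freq=250.0, noise_gain=0.05)
    print(len(frame), frame[:3])
```

With `voiced=False` the harmonic sum is skipped and the frame contains only the noise part, which mirrors the role of the V/UV decision in the parameter list.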
SEGMENT CONCATENATION
For generation of longer units from smaller ones.
Steps:
1) Parsing of the phonetic transcript
2) Fetching the parameters of the required units
3) Pitch and intensity modifications for prosody
4) Smoothing of the parameter tracks at unit boundaries
5) Interpolation of the parameters over the frame length from end-point values
6) Synthesis
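Steps 4 and 5 can be sketched as follows. This is a simplified illustration, not the method used in the work reported here: the helper names `smooth_join` and `interpolate_linear` are hypothetical, smoothing is done by replacing a few frames around the join with a straight line, and interpolation within a frame is assumed to be linear.

```python
def smooth_join(left, right, width):
    """Concatenate two per-frame parameter tracks and smooth the boundary.

    The `width` frames on each side of the join are replaced by a straight
    line between the first and last values of that region (an assumed,
    deliberately simple smoothing rule).
    """
    joined = list(left) + list(right)
    if not joined:
        return joined
    join = len(left)
    lo = max(join - width, 0)
    hi = min(join + width - 1, len(joined) - 1)
    span = hi - lo
    if span > 0:
        start_val, end_val = joined[lo], joined[hi]
        for i in range(lo, hi + 1):
            joined[i] = start_val + (end_val - start_val) * (i - lo) / span
    return joined


def interpolate_linear(start, end, num_samples):
    """Per-sample parameter values across one frame, from `start`
    (inclusive) toward `end` (exclusive), assuming linear interpolation."""
    return [start + (end - start) * n / num_samples for n in range(num_samples)]


if __name__ == "__main__":
    # Hypothetical pitch tracks (Hz) for two units being concatenated.
    track = smooth_join([100.0, 100.0, 100.0, 100.0],
                        [120.0, 120.0, 120.0, 120.0], width=2)
    print(track)
    # Per-sample pitch between the end-point values of two adjacent frames.
    print(interpolate_linear(track[3], track[4], 4))
```

The same two helpers can be applied to each parameter track separately, for example pitch, intensity, and individual harmonic amplitudes.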
RESULTS
• All VCV syllables and vowels are natural and intelligible when synthesized from the harmonic part alone, except /a∫a/ and /asa/
• HNM preserves speaking styles (anger, high articulatory rate)
Examples: synthesized /a∫a/, synthesized /asa/
RESULTS (continued)
Glottal closure instants (GCIs) obtained from the glottal signal give better synthesis.
Pitch contours for "/ap kΛhœn ja rΛhE hœn/":
• From glottal signal
• From speech (Childers and Hu, 1994)
RESULTS (continued)
Larger units constructed from the parameters of smaller units have good quality.
• Recorded /ΛbhImani/
• Synthesized from /ΛbhI/, /Ima/, /ani/
Further developments
• High quality multilingual / multi-dialect text-to-speech synthesis
• Voice transformations
• Processing for aids for the hearing impaired