SPEECH SYNTHESIS PC Pandey EE Dept IIT Bombay March ‘03

Presentation Transcript


  1. SPEECH SYNTHESIS PC Pandey EE Dept IIT Bombay March ‘03

  2. Speech units
     • Sentences & phrases
     • Words
     • Syllables
     • Phonemes
     • Subphonemic acoustic segments
     Speech features
     Prosodic (suprasegmental) features
     • Intensity variation
     • Pitch variation
     Phonemic features
     • Articulatory
     • Acoustic
     • Perceptual

  3. Classification of phonemes
     Vowels
     • Pure vowels
     • Diphthongs
     Consonants
     • Semivowels
     • Whisper
     • Stops
     • Nasals
     • Fricatives
     • Affricates

  4. Speech production system

  5. Schematic of speech production

  6. Vowel spectrum

  7. Speech synthesis: generation of speech by a machine
     Applications
     • Voice response systems (limited vocabulary)
     • Text-to-speech synthesis (unlimited vocabulary)
     • Analysis-by-synthesis (speech research)
     • Generation of speech-like test signals
     • Analysis-synthesis systems
       * channel capacity reduction
       * secure communication
       * speech enhancement
       * voice transformation
       * processing for hearing aids

  8. Development of speech synthesizers
     • Mechanical / electro-mechanical (1760-1930)
     • Electronic analog with keyboard input (1930s)
     • Electronic analog analysis-synthesis systems (1930-50)
     • Digital synthesizer (1950 ..)
       * software based
       * hardware based

  9. Mechanical synthesizers: Von Kempelen, 1780; Wheatstone’s speaking machine

  10. Riesz, 1930’s: Speaking machine

  11. Dudley, 1930s: Voder (electronic analog synthesizer with mechanical keyboard)

  12. Fant, 1950s: OVE

  13. Holmes, 1960s: Parallel formant synth.

  14. Klatt, 1970s: Cascade/parallel formant synth.

  15. Modern synthesis approaches
      Waveform based
      • high quality natural output
      • limited vocabulary
      • large storage requirement
      Speech model based
      • unlimited speech synthesis with small storage
      • difficulty in parameter generation & concatenation
      Text-to-speech synthesis
      • Text pre-processing & phonetic transcription
      • Parsing for syntactic & semantic structure
      • Prosodic information & sound units
      • Speech waveform generation

  16. Speech model based approaches
      • Articulatory
      • Source-filter
        * channel vocoder
        * LPC vocoder
        * homomorphic vocoder
        * formant-based synthesizer
      • Acoustic
        * phase vocoder
        * sinusoidal model
        * harmonic plus noise model (HNM)
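The source-filter idea behind a formant-based synthesizer can be illustrated with a small sketch. It is not taken from the slides: it assumes the standard second-order digital resonator commonly used in Klatt-style formant synthesizers, an impulse train as a crude voiced source, and a cascade of resonators as the vocal-tract filter. The function names and the formant/bandwidth values in the usage note are illustrative assumptions.

```python
import math

def resonator_coefficients(formant_hz, bandwidth_hz, sample_rate):
    """Coefficients of y[n] = a*x[n] + b*y[n-1] + c*y[n-2] for one formant."""
    period = 1.0 / sample_rate
    c = -math.exp(-2.0 * math.pi * bandwidth_hz * period)
    b = (2.0 * math.exp(-math.pi * bandwidth_hz * period)
         * math.cos(2.0 * math.pi * formant_hz * period))
    a = 1.0 - b - c  # normalizes the gain at 0 Hz to one
    return a, b, c

def impulse_train(f0_hz, num_samples, sample_rate):
    """Crude voiced excitation: one unit impulse per pitch period."""
    pitch_period = max(1, round(sample_rate / f0_hz))
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]

def apply_resonator(source, a, b, c):
    """Run the source through one second-order resonator."""
    y1 = y2 = 0.0
    output = []
    for x in source:
        y = a * x + b * y1 + c * y2
        output.append(y)
        y2, y1 = y1, y
    return output

def synthesize_vowel(f0_hz, formants, num_samples, sample_rate=10000):
    """Cascade one resonator per (formant_hz, bandwidth_hz) pair over the source."""
    signal = impulse_train(f0_hz, num_samples, sample_rate)
    for formant_hz, bandwidth_hz in formants:
        a, b, c = resonator_coefficients(formant_hz, bandwidth_hz, sample_rate)
        signal = apply_resonator(signal, a, b, c)
    return signal
```

For example, `synthesize_vowel(120, [(700, 60), (1200, 90), (2600, 120)], 2000)` would produce 0.2 s of a roughly open-vowel-like buzz at a 10 kHz sampling rate; the formant values are placeholders rather than measured data.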

  17. HARMONIC PLUS NOISE MODEL (Stylianou, 1995; 2001)
      Speech signal divided into:
      • harmonic part
      • noise part
      Parameters:
      • Harmonic amplitudes and phases
      • Maximum voiced frequency
      • Voiced/unvoiced (V/UV) decision & pitch
      • Noise parameters
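To make the role of these parameters concrete, here is a minimal sketch of how one frame might be rebuilt from them. It is an illustration under simplifying assumptions, not the implementation from the slides: the harmonic part is a sum of cosines at multiples of the pitch, kept only up to the maximum voiced frequency, and the noise part is plain white Gaussian noise scaled by a single gain over the full band. A real HNM noise part would need spectral and temporal shaping, which is omitted here. All function and parameter names are hypothetical.

```python
import math
import random

def synthesize_hnm_frame(f0_hz, amplitudes, phases, max_voiced_hz,
                         noise_gain, num_samples, sample_rate=16000, seed=0):
    """Rebuild one frame as (harmonic part) + (noise part).

    amplitudes[k-1] and phases[k-1] belong to the k-th harmonic of f0_hz.
    Harmonics above max_voiced_hz are left out of the harmonic part.
    The noise part is unshaped white noise (a simplifying assumption).
    """
    rng = random.Random(seed)
    frame = []
    for n in range(num_samples):
        t = n / sample_rate
        harmonic = 0.0
        for k, (amp, phase) in enumerate(zip(amplitudes, phases), start=1):
            if k * f0_hz > max_voiced_hz:
                break  # remaining harmonics lie above the voiced band
            harmonic += amp * math.cos(2.0 * math.pi * k * f0_hz * t + phase)
        noise = noise_gain * rng.gauss(0.0, 1.0)
        frame.append(harmonic + noise)
    return frame
```

Setting `noise_gain` to zero gives a harmonic-only frame, and setting `amplitudes` to an empty list gives a noise-only (unvoiced) frame.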

  18. IMPLEMENTATION OF HNM

  19. ANALYSIS

  20. SYNTHESIS

  21. SEGMENT CONCATENATION
      For generation of longer units from smaller ones. Steps:
      1) Parsing of the phonetic transcript
      2) Fetching the parameters of the required units
      3) Pitch and intensity modifications for prosody
      4) Smoothing of the parameter tracks at unit boundaries
      5) Interpolation of the parameters over the frame length from end point values
      6) Synthesis
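Steps 4 and 5 can be sketched for a single parameter track (for example, pitch per frame). The slides do not specify the smoothing rule, so this sketch assumes a simple one: both units are pulled toward the average of their boundary values, with the correction fading linearly over a few frames, and values inside a frame are then linearly interpolated between the frame's end points. The function names and the `span` parameter are hypothetical.

```python
def smooth_boundary(left_track, right_track, span=2):
    """Join two per-frame parameter tracks so they meet at a common value.

    The last `span` frames of the left unit and the first `span` frames of
    the right unit are shifted toward the mean of the two boundary values,
    with the shift fading linearly away from the join.
    """
    target = 0.5 * (left_track[-1] + right_track[0])
    left_offset = target - left_track[-1]
    right_offset = target - right_track[0]
    left = list(left_track)
    right = list(right_track)
    for i in range(1, min(span, len(left)) + 1):
        weight = (span - i + 1) / span  # 1.0 at the boundary frame
        left[-i] += weight * left_offset
    for i in range(min(span, len(right))):
        weight = (span - i) / span  # 1.0 at the boundary frame
        right[i] += weight * right_offset
    return left + right

def interpolate_over_frame(start_value, end_value, num_samples):
    """Linearly interpolate a parameter across one frame from its end points."""
    if num_samples == 1:
        return [start_value]
    step = (end_value - start_value) / (num_samples - 1)
    return [start_value + step * n for n in range(num_samples)]
```

With a left unit at a constant 100 and a right unit at a constant 120, `smooth_boundary([100.0] * 3, [120.0] * 3)` removes the 20-unit jump at the join by ramping both sides to 110.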

  22. RESULTS
      • All VCV syllables and vowels are natural & intelligible when synthesized using the harmonic part only, except /a∫a/ and /asa/
      • HNM preserves speaking styles (anger, high articulatory rate)
      Examples: synthesized /a∫a/, synthesized /asa/

  23. RESULTS (continued)
      Glottal closure instants (GCIs) obtained from the glottal signal give better synthesis.
      Pitch contours for "/ap kΛhœn ja rΛhE hœn/": from the glottal signal; from speech (Childers and Hu, 1994)

  24. RESULTS (continued)
      Good quality of the larger units constructed from parameters of the smaller units.
      Recorded /ΛbhImani/; synthesized from /ΛbhI/, /Ima/, /ani/

  25. DEMONSTRATIONS

  26. Further developments
      • High quality multilingual / multi-dialect text-to-speech synthesis
      • Voice transformations
      • Processing for aids for the hearing impaired

  27. THANKS
