
Synthetic Speech and the Human Behavior it Models

H. Timothy Bunnell, Ph.D., Nemours Biomedical Research, Center for Pediatric Auditory and Speech Sciences

Presentation Transcript


  1. Synthetic Speech and the Human Behavior it Models. H. Timothy Bunnell, Ph.D., Nemours Biomedical Research, Center for Pediatric Auditory and Speech Sciences

  2. Overview • Acoustics • Speech Production & Acoustic Phonetics • Speech Synthesis • Articulatory Rule-based • Acoustic Rule-based • Data-based • Concatenative • Parametric Bunnell JHU

  3. Physical Acoustics: Time-domain • Simple waveforms • Amplitude, Frequency, and Phase • Physical versus perceptual characteristics • Complex waveforms • Periodic and aperiodic waveforms

  4. Simple Waveform For all integer i: a = amplitude, θ = phase, f = frequency, fs = sampling rate
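The slide's equation image did not survive extraction. A minimal Python sketch of a sampled sinusoid using the slide's symbols, assuming the common sine form x[i] = a·sin(2πfi/fs + θ) (the original may use cosine or another phase convention); the function name `sample_sinusoid` is hypothetical:

```python
import math


def sample_sinusoid(a, f, theta, fs, n):
    """Return samples x[0..n-1] of a * sin(2*pi*f*i/fs + theta).

    Assumption: sine form; the slide's own equation was not preserved.
    """
    return [a * math.sin(2 * math.pi * f * i / fs + theta) for i in range(n)]


# A 100 Hz tone sampled at 8000 Hz repeats every fs / f = 80 samples.
x = sample_sinusoid(a=1.0, f=100.0, theta=0.0, fs=8000.0, n=160)
```

Sample 20 is a quarter period into the cycle, so it holds the peak value a.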

  5. Digitized Waveform

  6. Aliasing
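Aliasing occurs when a tone above half the sampling rate (the Nyquist frequency, fs/2) is sampled: its samples are indistinguishable from those of a lower-frequency tone. A hedged sketch, with the hypothetical helper `alias_frequency`, that computes the apparent frequency of a real tone and checks that a 7000 Hz tone sampled at 8000 Hz yields the same samples as a sign-inverted 1000 Hz tone:

```python
import math


def alias_frequency(f, fs):
    """Apparent frequency (0 to fs/2) of a real tone at f Hz sampled at fs Hz."""
    r = f % fs             # sampling cannot distinguish f from f - m*fs
    return min(r, fs - r)  # nor, for a real tone, from -f: fold about fs/2


fs = 8000.0
print(alias_frequency(3000.0, fs))  # 3000.0 (below Nyquist, unchanged)
print(alias_frequency(7000.0, fs))  # 1000.0 (folded back)

# The samples of a 7000 Hz tone equal those of a 1000 Hz tone with inverted sign.
high = [math.sin(2 * math.pi * 7000.0 * i / fs) for i in range(16)]
low = [math.sin(2 * math.pi * 1000.0 * i / fs) for i in range(16)]
print(max(abs(h + l) for h, l in zip(high, low)))  # ~0, up to rounding error
```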

  7. Sample resolution
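Sample resolution (bit depth) sets the size of the quantization step, and each extra bit adds roughly 6 dB of signal-to-quantization-noise ratio (about 6.02·B + 1.76 dB for a full-scale sinusoid quantized with B bits). A sketch assuming a uniform mid-tread quantizer on the range [-1, 1); `quantize` and `snr_db` are illustrative names, not from the slides:

```python
import math


def quantize(x, bits):
    """Round samples in [-1, 1) to the nearest of 2**bits uniformly spaced levels."""
    step = 2.0 / 2 ** bits
    top = 1.0 - step  # levels are k*step for k = -2**(bits-1) .. 2**(bits-1) - 1
    return [min(top, max(-1.0, round(v / step) * step)) for v in x]


def snr_db(clean, quantized):
    """Signal-to-quantization-noise ratio in dB."""
    signal = sum(v * v for v in clean)
    noise = sum((c - q) ** 2 for c, q in zip(clean, quantized))
    return 10 * math.log10(signal / noise)


# A sinusoid at 90% of full scale, long enough for the error to behave like noise.
x = [0.9 * math.sin(0.1 * i + 0.3) for i in range(4000)]
for bits in (8, 12, 16):
    print(bits, round(snr_db(x, quantize(x, bits)), 1))
```

Because the test tone peaks at 0.9 of full scale, the measured ratios fall slightly below the full-scale rule of thumb, but the spacing of about 24 dB per 4 bits is preserved.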

  8. Simple Waveforms (figure: sinusoids at 175, 225, 275, 325, 375, 425, and 475 Hz)

  9. Complex waveforms For all integer i: For all integer k | (0 < k < K): a_k = amplitude of kth component, θ_k = phase of kth component, f_k = frequency of kth component, fs = sampling rate
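A complex waveform is a sum of sinusoidal components, each with its own a_k, θ_k, and f_k. A minimal sketch, assuming the same sine form as for a simple waveform; the component values are illustrative. Harmonics of 100 Hz repeat every fs/100 samples, while adding a 150 Hz component lowers the fundamental to 50 Hz (the greatest common divisor of the component frequencies) and doubles the period:

```python
import math


def complex_waveform(components, fs, n):
    """Sum of sinusoids; components is a list of (a_k, f_k, theta_k) tuples."""
    return [
        sum(a * math.sin(2 * math.pi * f * i / fs + th) for a, f, th in components)
        for i in range(n)
    ]


fs = 8000.0
# Harmonics of 100 Hz with amplitudes 1, 1/2, 1/3: the sum repeats every 80 samples.
harmonic = [(1.0, 100.0, 0.0), (0.5, 200.0, 0.0), (1 / 3, 300.0, 0.0)]
x = complex_waveform(harmonic, fs, 240)

# Adding 150 Hz drops the fundamental to 50 Hz, so the period grows to 160 samples.
y = complex_waveform(harmonic + [(0.5, 150.0, 0.0)], fs, 320)
```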

  10. Complex Waveforms

  11. Aperiodic Waveforms • Impulse: non-zero for only one instant. • Random: amplitude variations have no temporal structure.

  12. Physical Acoustics: Frequency-domain • Line spectra • Represent periodic signals • One or more sinusoids that are harmonically related • Harmonics can appear only at frequencies that are integer multiples of the fundamental (usually lowest) frequency. • Continuous spectra • Represent aperiodic signals • Energy present (potentially) at any frequency, not just harmonically related frequencies.
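A line spectrum can be checked numerically with a discrete Fourier transform (DFT). This sketch uses a direct O(N²) DFT to stay dependency-free (real analysis would use an FFT library); the signal is an illustrative sum of the first three harmonics of 100 Hz, and the window holds exactly ten periods so each harmonic falls on a single DFT bin:

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitudes of DFT bins 0..N/2 (direct sum; fine for short examples)."""
    n = len(x)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)))
        for k in range(n // 2 + 1)
    ]


fs, n, f0 = 8000.0, 800, 100.0  # 800 samples = 0.1 s, so bins are fs / n = 10 Hz apart
x = [sum(math.sin(2 * math.pi * h * f0 * i / fs) / h for h in (1, 2, 3)) for i in range(n)]
mag = dft_magnitude(x)
peak = max(mag)
lines = [k * fs / n for k, m in enumerate(mag) if m > 1e-6 * peak]
print(lines)  # [100.0, 200.0, 300.0]: energy only at integer multiples of f0
```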

  13. Line Spectra

  14. Continuous Spectra • The spectrum of an impulse contains all frequencies at equal amplitude. • The spectrum of a random signal contains all frequencies at random amplitudes.
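The two continuous-spectrum cases can be compared with a direct DFT. In this sketch (illustrative length of 64 samples; a fixed random seed so the run is repeatable), the impulse gives an identical magnitude in every bin, while Gaussian noise gives magnitudes that vary irregularly from bin to bin:

```python
import cmath
import math
import random


def dft_magnitude(x):
    """Magnitudes of DFT bins 0..N/2 (direct sum; fine for short examples)."""
    n = len(x)
    return [
        abs(sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)))
        for k in range(n // 2 + 1)
    ]


n = 64
impulse = [1.0] + [0.0] * (n - 1)  # non-zero at a single instant
rng = random.Random(0)             # fixed seed so the example is repeatable
noise = [rng.gauss(0.0, 1.0) for _ in range(n)]

imp_mag = dft_magnitude(impulse)
noise_mag = dft_magnitude(noise)
print(min(imp_mag), max(imp_mag))      # every bin has the same magnitude
print(min(noise_mag), max(noise_mag))  # magnitudes scatter from bin to bin
```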

  15. Quasiperiodicity (figure panels: log magnitude of a square wave; log magnitude of a jittery square wave)

  16. Physical Acoustics: Sound shaping • Sources • Generators of sound energy • Periodic or aperiodic • possibly complex temporal/spectral structure • Filters • Low pass • High pass • Band pass • Resonance • Pendulums • Tubes • Frequency domain properties

  17. Filters (figure: low-pass, high-pass, and band-pass responses; amplitude in dB versus frequency, with the 0 dB and -3 dB levels and the bandwidth marked)
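The -3 dB level marks the half-power point: 20·log10(1/√2) ≈ -3.01 dB. As a sketch, assuming ideal analog first-order (single-pole) responses rather than the specific filters pictured, the low-pass and high-pass gains at their cutoff frequency fc both equal -3.01 dB; a band-pass filter's bandwidth is conventionally the distance between its two -3 dB points. The function names are illustrative:

```python
import math


def lowpass_gain_db(f, fc):
    """Gain in dB of an ideal first-order low-pass: |H| = 1 / sqrt(1 + (f/fc)**2)."""
    return -10 * math.log10(1 + (f / fc) ** 2)


def highpass_gain_db(f, fc):
    """Gain in dB (f > 0) of an ideal first-order high-pass: |H| = (f/fc) / sqrt(1 + (f/fc)**2)."""
    r = f / fc
    return 20 * math.log10(r) - 10 * math.log10(1 + r ** 2)


fc = 1000.0
print(round(lowpass_gain_db(fc, fc), 2))       # -3.01 (half power at the cutoff)
print(round(highpass_gain_db(fc, fc), 2))      # -3.01
print(round(lowpass_gain_db(10 * fc, fc), 1))  # -20.0, one decade above the cutoff
```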

  18. Speech Acoustics • Vocal Tract Structure • Source-Filter Theory • Speech Source characteristics • Frication • Phonation • Other • Speech Filter characteristics • Formants • Vocal tract area functions

  19. Anatomy

  20. Source-Filter Theory • The first complete treatment was Fant (1960). • Speech production is modeled as source signals exciting a resonating tube of complex shape. • Sources are periodic pulse trains or random noise. • The tube is modeled as a cascade of resonators, with the output of each serving as the input to the next.
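A cascade of resonators driven by a pulse train can be sketched with second-order digital resonators of the form used in Klatt's (1980) cascade/parallel formant synthesizer. The formant frequencies and bandwidths below (500, 1500, and 2500 Hz) are illustrative assumptions, and the function names are hypothetical:

```python
import math


def resonator(x, f, bw, fs):
    """Second-order digital resonator with centre frequency f and bandwidth bw (Hz).

    y[n] = A*x[n] + B*y[n-1] + C*y[n-2]; A is chosen for unity gain at 0 Hz.
    """
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * f * t)
    a = 1 - b - c
    y1 = y2 = 0.0
    out = []
    for v in x:
        y = a * v + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def cascade(x, formants, fs):
    """Feed the signal through each (frequency, bandwidth) resonator in turn."""
    for f, bw in formants:
        x = resonator(x, f, bw, fs)
    return x


fs = 10000
period = fs // 100  # 100 Hz voicing: one pulse every 100 samples
source = [1.0 if i % period == 0 else 0.0 for i in range(2000)]
vowel = cascade(source, [(500, 60), (1500, 90), (2500, 120)], fs)
```

Because each resonator's output feeds the next, their frequency responses multiply (add in dB), which is the cascade behavior described on the slide.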

  21. Glottal source spectrum • 100 Hz voicing • 6 dB/octave roll-off • Vocal tract response • Peaks are resonances • Shape due to location and bandwidth of resonances • Radiated speech spectrum • Source harmonics • Source slope • VT resonances • Peaks are formants
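The radiated spectrum can be approximated, at each harmonic, as the source level plus the vocal tract gain in dB. A sketch assuming a -6 dB/octave source slope referenced to the first harmonic, 100 Hz voicing, and three illustrative digital resonators at 500, 1500, and 2500 Hz (hypothetical values, not read from the slide's figure); the strongest harmonic lands at the first formant:

```python
import cmath
import math


def resonator_gain_db(f, formant, bw, fs):
    """Gain in dB at f Hz of a second-order digital resonator normalized to 0 dB at 0 Hz."""
    t = 1.0 / fs
    c = -math.exp(-2 * math.pi * bw * t)
    b = 2 * math.exp(-math.pi * bw * t) * math.cos(2 * math.pi * formant * t)
    a = 1 - b - c
    z1 = cmath.exp(-2j * math.pi * f * t)  # z**-1 on the unit circle
    return 20 * math.log10(abs(a / (1 - b * z1 - c * z1 * z1)))


def harmonic_levels(f0, count, formants, fs, slope_db_per_octave=-6.0):
    """(frequency, level in dB) for each harmonic: source slope plus vocal tract gain."""
    levels = []
    for k in range(1, count + 1):
        source = slope_db_per_octave * math.log2(k)  # octaves above the first harmonic
        tract = sum(resonator_gain_db(k * f0, fm, bw, fs) for fm, bw in formants)
        levels.append((k * f0, source + tract))
    return levels


levels = harmonic_levels(100.0, 40, [(500, 60), (1500, 90), (2500, 120)], 10000)
peak_f, peak_db = max(levels, key=lambda p: p[1])
print(peak_f)  # 500.0, the harmonic at the first formant
```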

  22. Frication Source • Generated by a jet of air striking a baffle • Location is typically in the oral cavity • Spectrum of the source is broadband and random • Frication versus Aspiration • Frication is due to a constriction above the glottis • Aspiration is due to turbulence at the glottis

  23. Voicing Source • Vibration of Vocal Folds • Muscle tension holds vocal folds closed • Pressure difference between air in lungs and air in vocal tract drives vibration: • pressure difference overcomes muscle tension & folds open • pressure difference drops so muscle tension overcomes pressure and folds close.

  24. Voicing waveform (figure: volume velocity and pressure versus time, 10 ms time scale)
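The volume-velocity figure for this slide was not preserved. As a hedged illustration of a single volume-velocity pulse, this sketch implements Rosenberg's trigonometric glottal pulse model, a common voicing source in formant synthesis; it is not necessarily the waveform pictured, and the opening (40%) and closing (16%) fractions of the period are assumptions:

```python
import math


def rosenberg_pulse(period, open_frac=0.40, close_frac=0.16):
    """One period of a Rosenberg-style glottal flow (volume velocity) pulse, peak 1.0.

    open_frac and close_frac (fractions of the period spent opening and closing)
    are illustrative assumptions; both must give at least one sample.
    """
    n1 = round(open_frac * period)   # samples in the opening phase
    n2 = round(close_frac * period)  # samples in the closing phase
    pulse = []
    for n in range(period):
        if n <= n1:
            pulse.append(0.5 * (1 - math.cos(math.pi * n / n1)))  # smooth rise
        elif n <= n1 + n2:
            pulse.append(math.cos(math.pi * (n - n1) / (2 * n2)))  # quicker fall
        else:
            pulse.append(0.0)  # closed phase: no flow
    return pulse


fs = 10000
pulse = rosenberg_pulse(fs // 100)  # one 10 ms period of 100 Hz voicing at 10 kHz
voicing = pulse * 5                 # five identical periods of a steady source
```

The opening phase is longer than the closing phase, giving the asymmetric, skewed flow pulse typical of models of vocal fold vibration.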

  25. Other Sources • Mixed • Voiced fricatives • Aspiration + voicing • Aperiodic “voicing” • Clicks

  26. Uniform Tube Resonances
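For a uniform tube closed at one end (the glottis) and open at the other (the lips), resonances fall at odd multiples of c/(4L), where c is the speed of sound and L is the tube length. A sketch assuming common textbook values (L = 17.5 cm, c = 35,000 cm/s) rather than anything shown on the slide; the function name is illustrative:

```python
def uniform_tube_resonances(length_cm=17.5, c_cm_per_s=35000.0, count=3):
    """Quarter-wavelength resonances (Hz) of a tube closed at one end, open at the other."""
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm) for n in range(1, count + 1)]


print(uniform_tube_resonances())      # [500.0, 1500.0, 2500.0]
print(uniform_tube_resonances(14.0))  # shorter tube: every resonance rises
```

Because every resonance is inversely proportional to L, a 14 cm tube moves the first resonance to 625 Hz.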

  27. Tube Constriction Effects

  28. Tube Constriction Effects

  29. Tube Constriction Effects

  30. Tube Constriction Effects

  31. High Front Constriction

  32. Low Back Constriction

  33. High Back + Lip Rounding

  34. Acoustic Phonetics • Display formats for acoustic phonetic information • Vowel classification and features • Consonant classification and features • Coarticulation • Suprasegmentals

  35. Representation of Acoustic Phonetic Information • Waveform and Spectrogram displays • Auxiliary signals and information

  36. Waveform of /ɔ/ with EGG (electroglottograph) signal

  37. /ɔ/ Waveform, EGG, and Spectrogram

  38. Waveform & Wideband Spectrogram

  39. Waveform and Narrowband Spectrogram

  40. Voiced Waveform Detail

  41. Unvoiced Waveform Detail

  42. Cross Section (unvoiced)

  43. Cross Section (voiced)

  44. Vowel Classification • High - Low dimension • Approximately how constricted the VT is. • Corresponds (inversely) to frequency of F1 • Front - Back dimension • Whether the constriction is toward the front or back of the oral cavity • Corresponds to frequency of F2 • Rhotic • Indicates /r/ coloring of vowel • Corresponds to frequency of F3
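The F1/F2/F3 correspondences can be turned into a rough classifier. In this sketch every threshold is a hypothetical placeholder (roughly in the range of adult male voices), since real boundaries depend on the speaker and the vowel inventory; it only encodes the directions stated on the slide: a lower F1 means a higher vowel, a higher F2 means a fronter vowel, and a low F3 signals r-coloring. The input formant values are illustrative:

```python
def describe_vowel(f1, f2, f3=None,
                   high_below=400, low_above=650,      # hypothetical F1 limits (Hz)
                   front_above=1700, back_below=1200,  # hypothetical F2 limits (Hz)
                   rhotic_below=1900):                 # hypothetical F3 limit (Hz)
    """Rough height/backness/rhoticity labels from formant frequencies in Hz."""
    height = "high" if f1 < high_below else "low" if f1 > low_above else "mid"
    backness = "front" if f2 > front_above else "back" if f2 < back_below else "central"
    rhotic = f3 is not None and f3 < rhotic_below
    return height, backness, rhotic


print(describe_vowel(270, 2290, 3010))  # ('high', 'front', False)
print(describe_vowel(730, 1090, 2440))  # ('low', 'back', False)
print(describe_vowel(490, 1350, 1690))  # ('mid', 'central', True)
```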

  45. The Vowel Space

  46. Consonant Classification • Voicing distinction • Voiced versus voiceless • Adducted versus Abducted vocal folds • Manner distinction • Degree and type of constriction • Place distinction • Location of constriction
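The three distinctions act as independent features, so any two consonants can be compared feature by feature. A sketch with a small, hand-entered table of standard classifications for a few English consonants; `FEATURES` and `contrasts` are illustrative names:

```python
# Voicing, place, and manner for a few English consonants.
FEATURES = {
    "p": ("voiceless", "bilabial", "stop"),
    "b": ("voiced", "bilabial", "stop"),
    "m": ("voiced", "bilabial", "nasal"),
    "t": ("voiceless", "alveolar", "stop"),
    "d": ("voiced", "alveolar", "stop"),
    "n": ("voiced", "alveolar", "nasal"),
    "s": ("voiceless", "alveolar", "fricative"),
    "z": ("voiced", "alveolar", "fricative"),
    "k": ("voiceless", "velar", "stop"),
    "g": ("voiced", "velar", "stop"),
}
NAMES = ("voicing", "place", "manner")


def contrasts(a, b):
    """Return the names of the features that distinguish consonants a and b."""
    return [name for name, x, y in zip(NAMES, FEATURES[a], FEATURES[b]) if x != y]


print(contrasts("p", "b"))  # ['voicing']
print(contrasts("p", "t"))  # ['place']
print(contrasts("t", "s"))  # ['manner']
print(contrasts("p", "z"))  # ['voicing', 'place', 'manner']
```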

  47. Consonant Manner by Place

  48. Additional manners & places • Glottal - stop /ʔ/ or approximant /h/ • In all, Ladefoged (1993) identifies 10 manners and 11 places of articulation needed to account for all languages.

  49. Phonemes & Allophones • A phoneme may vary in structure from one context to another. • /t/ in syllable-initial position is aspirated [tʰ] • /t/ in some syllable-final contexts is voiced [t̬] • /t/ preceding some segments is dental [t̪] • /t/ before front vowels has a different burst spectrum than before back rounded vowels
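The context rules for /t/ can be sketched as an ordered rule list, as a rule-based synthesizer might apply them. This is a toy under stated assumptions: dental [t̪] is triggered only by a following /θ/ or /ð/, voiced [t̬] is limited to an intervocalic position before an unstressed vowel (as in American English "water"), /t/ after /s/ (unaspirated, as in "stop") is not handled, and the burst-spectrum difference before front versus back rounded vowels is not modeled. All names are hypothetical:

```python
ASPIRATED = "t\u02b0"  # [tʰ]
VOICED = "t\u032c"     # [t̬]
DENTAL = "t\u032a"     # [t̪]

# Assumption: only the dental fricatives /θ/ and /ð/ trigger a dental /t/.
DENTAL_TRIGGERS = {"\u03b8", "\u00f0"}


def t_allophone(syllable_initial, next_segment=None, intervocalic_unstressed=False):
    """Pick a surface form of /t/ from a few simplified, ordered context rules."""
    if next_segment in DENTAL_TRIGGERS:
        return DENTAL
    if syllable_initial:
        return ASPIRATED
    if intervocalic_unstressed:
        return VOICED
    return "t"


print(t_allophone(True, "ɑ"))                                # as in "top"
print(t_allophone(False, "θ"))                               # as in "eighth"
print(t_allophone(False, "ɚ", intervocalic_unstressed=True)) # as in "water"
```

Rule order matters: the dental rule is checked first so that a following dental fricative wins over the other contexts.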

  50. Aspirated [p]
