Synthetic Speech and the Human Behavior it Models
H. Timothy Bunnell, Ph.D., Nemours Biomedical Research, Center for Pediatric Auditory and Speech Sciences
Overview • Acoustics • Speech Production & Acoustic Phonetics • Speech Synthesis • Articulatory Rule-based • Acoustic Rule-based • Data-based • Concatenative • Parametric Bunnell JHU
Physical Acoustics: Time-domain • Simple waveforms • Amplitude, Frequency, and Phase • Physical versus perceptual characteristics • Complex waveforms • Periodic and aperiodic waveforms
Simple Waveform • Defined for all integer sample indices i, where a = amplitude, θ = phase, f = frequency, and fs = sampling rate
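The slide's formula did not survive extraction. Assuming it is the usual sampled sinusoid x[i] = a·sin(2π·f·i/fs + θ) (the choice of sine rather than cosine is an assumption), a minimal sketch of generating one looks like this; the function name `simple_waveform` and the parameter values are hypothetical.

```python
import numpy as np

def simple_waveform(a, f, theta, fs, n_samples):
    """Return x[i] = a * sin(2*pi*f*i/fs + theta) for i = 0 .. n_samples - 1.

    The sine convention is an assumption; the slide's own equation was not recoverable.
    """
    i = np.arange(n_samples)                      # integer sample indices
    return a * np.sin(2 * np.pi * f * i / fs + theta)

# 10 ms of a 175 Hz tone at a 10 kHz sampling rate (values chosen for illustration)
x = simple_waveform(a=1.0, f=175.0, theta=0.0, fs=10000, n_samples=100)
```

Amplitude scales the waveform, frequency sets how many cycles fall in each second of samples, and phase shifts where in the cycle sample 0 lands.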
Simple Waveforms: sinusoids at 175, 225, 275, 325, 375, 425, and 475 Hz
Complex waveforms • Defined for all integer i and for all integer k with 0 < k < K, where a_k = amplitude of the kth component, θ_k = phase of the kth component, f_k = frequency of the kth component, and fs = sampling rate
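Assuming the lost formula is the sum of simple waveforms, x[i] = Σ_k a_k·sin(2π·f_k·i/fs + θ_k), a complex waveform can be sketched as below. The helper name `complex_waveform` is hypothetical; the example uses three harmonically related components, so the sum repeats every fs/100 samples.

```python
import numpy as np

def complex_waveform(amps, freqs, phases, fs, n_samples):
    """Sum of sinusoidal components: x[i] = sum_k a_k * sin(2*pi*f_k*i/fs + theta_k)."""
    i = np.arange(n_samples)
    x = np.zeros(n_samples)
    for a_k, f_k, theta_k in zip(amps, freqs, phases):
        x += a_k * np.sin(2 * np.pi * f_k * i / fs + theta_k)   # add the kth component
    return x

fs = 10000
# Harmonics of 100 Hz: the summed waveform is periodic with period fs / 100 = 100 samples.
x = complex_waveform([1.0, 0.5, 0.25], [100.0, 200.0, 300.0], [0.0, 0.0, 0.0], fs, 300)
```

If the component frequencies were not integer multiples of a common fundamental, the same sum would no longer repeat exactly.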
Complex Waveforms
Aperiodic Waveforms • Impulse: non-zero at only one instant. • Random: amplitude variations have no temporal structure.
Physical Acoustics: Frequency-domain • Line spectra • Represent periodic signals • One or more sinusoids that are harmonically related • Harmonics can appear only at frequencies that are integer multiples of the fundamental (usually lowest) frequency. • Continuous spectra • Represent aperiodic signals • Energy present (potentially) at any frequency, not just harmonically related frequencies.
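To see a line spectrum numerically, one can take the DFT of a harmonic complex analyzed over a whole number of periods. In this sketch (sampling rate, duration, and amplitudes are illustrative choices), energy appears only at integer multiples of the 100 Hz fundamental.

```python
import numpy as np

fs = 10000                      # sampling rate (Hz), chosen for illustration
n = 1000                        # 0.1 s, exactly 10 periods of a 100 Hz fundamental
f0 = 100.0
i = np.arange(n)
# Harmonic complex: components only at 1, 2 and 3 times f0
x = sum(a * np.sin(2 * np.pi * k * f0 * i / fs) for k, a in [(1, 1.0), (2, 0.5), (3, 0.25)])

mag = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(n, d=1.0 / fs)
lines = freqs[mag > 1e-6 * mag.max()]   # frequencies that carry non-negligible energy
print(lines)                            # expected: 100, 200 and 300 Hz only
```

Analyzing exactly 10 periods keeps each harmonic in a single DFT bin; with a non-integer number of periods, leakage would spread some energy into neighboring bins.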
Line Spectra
Continuous Spectra • The spectrum of an impulse contains all frequencies at equal amplitude. • The spectrum of a random signal contains all frequencies at random amplitudes.
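Both statements about continuous spectra can be checked with a DFT. In this minimal sketch (signal length and random seed are arbitrary), the impulse yields the same magnitude in every bin, while Gaussian noise yields magnitudes that vary irregularly from bin to bin.

```python
import numpy as np

n = 512
impulse = np.zeros(n)
impulse[0] = 1.0                        # non-zero at a single instant
rng = np.random.default_rng(0)          # seeded so the example is repeatable
noise = rng.standard_normal(n)          # amplitude values with no temporal structure

imp_mag = np.abs(np.fft.rfft(impulse))
noise_mag = np.abs(np.fft.rfft(noise))

print(imp_mag.min(), imp_mag.max())          # flat spectrum: every bin has magnitude 1
print(noise_mag.std() / noise_mag.mean())    # large relative spread across bins
```

Energy is present across the whole frequency range in both cases; only the regularity of the magnitudes differs.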
Quasiperiodicity: log magnitude spectrum of a square wave and of a jittery square wave
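One way to illustrate quasiperiodicity is to compare how much spectral energy stays at the harmonic frequencies for a strictly periodic square wave and for one whose cycle length jitters. The jitter scheme here (alternating long and short cycles so that both signals have the same length) and the helper names are assumptions made for this sketch.

```python
import numpy as np

def square_wave(period_lengths):
    """Concatenate square-wave cycles: +1 for the first half of each cycle, -1 for the rest."""
    cycles = [np.concatenate([np.ones(p // 2), -np.ones(p - p // 2)]) for p in period_lengths]
    return np.concatenate(cycles)

def harmonic_energy_fraction(x, nominal_period):
    """Share of spectral energy in the DFT bins at multiples of the nominal fundamental."""
    power = np.abs(np.fft.rfft(x)) ** 2
    step = len(x) // nominal_period        # bins between harmonics (x spans whole nominal periods)
    return power[::step].sum() / power.sum()

rng = np.random.default_rng(1)             # seeded so the example is repeatable
d = rng.integers(1, 6, size=10)            # jitter of 1..5 samples per pair of cycles
clean = square_wave([100] * 20)            # strictly periodic: 20 cycles of 100 samples
# Alternating long/short cycles keep the total length at 2000 samples
jittery = square_wave(np.column_stack([100 + d, 100 - d]).ravel())

frac_clean = harmonic_energy_fraction(clean, 100)
frac_jittery = harmonic_energy_fraction(jittery, 100)
```

For the periodic wave essentially all energy falls on the harmonic bins; with jitter, some of it spreads into the bins between harmonics, which is what fills in the valleys of a quasiperiodic spectrum.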
Physical Acoustics: Sound shaping • Sources • Generators of sound energy • Periodic or aperiodic • Possibly complex temporal/spectral structure • Filters • Low pass • High pass • Band pass • Resonance • Pendulums • Tubes • Frequency-domain properties
Filters: low-pass, high-pass, and band-pass responses (amplitude in dB vs. frequency, with the -3 dB level and the bandwidth marked)
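The -3 dB level marks a filter's cutoff. As a sketch, assuming ideal first-order filters (a choice made here, not taken from the slide), the gain at the cutoff frequency works out to 1/√2, or about -3.01 dB, for both the low-pass and high-pass cases.

```python
import numpy as np

def lowpass_db(f, fc):
    """Gain (dB) of a first-order low-pass filter with cutoff fc: |H| = 1/sqrt(1 + (f/fc)^2)."""
    return 20 * np.log10(1 / np.sqrt(1 + (f / fc) ** 2))

def highpass_db(f, fc):
    """Gain (dB) of a first-order high-pass filter with cutoff fc: |H| = (f/fc)/sqrt(1 + (f/fc)^2)."""
    r = f / fc
    return 20 * np.log10(r / np.sqrt(1 + r ** 2))

fc = 1000.0  # cutoff in Hz, chosen for illustration
print(round(lowpass_db(fc, fc), 2), round(highpass_db(fc, fc), 2))  # -3.01 -3.01
```

For a band-pass filter, the bandwidth is the distance between its lower and upper -3 dB points.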
Speech Acoustics • Vocal Tract Structure • Source-Filter Theory • Speech Source characteristics • Frication • Phonation • Other • Speech Filter characteristics • Formants • Vocal tract area functions
Anatomy
Source-Filter Theory • The first complete treatment was Fant (1960). • Models speech production as source signals exciting a resonating tube of complex shape. • Sources are periodic pulse trains or random noise. • The tube is modeled as a cascade of resonators: the output of each is the input to the next.
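A minimal source-filter sketch: a periodic impulse train (the voicing source) is passed through a cascade of two-pole digital resonators, each resonator's output feeding the next. The resonator form follows the one used in Klatt-style formant synthesizers; that choice, the helper names, and the formant frequencies and bandwidths are assumptions for illustration.

```python
import numpy as np

def resonator(x, freq, bw, fs):
    """Two-pole digital resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2], unity gain at 0 Hz."""
    T = 1.0 / fs
    C = -np.exp(-2 * np.pi * bw * T)
    B = 2 * np.exp(-np.pi * bw * T) * np.cos(2 * np.pi * freq * T)
    A = 1.0 - B - C
    y = np.zeros(len(x))
    y1 = y2 = 0.0                          # previous two outputs
    for n, xn in enumerate(x):
        yn = A * xn + B * y1 + C * y2
        y[n] = yn
        y2, y1 = y1, yn
    return y

def source_filter(f0, formants, fs, dur):
    """Excite a cascade of resonators with a periodic impulse train."""
    n = int(round(fs * dur))
    source = np.zeros(n)
    source[::int(fs // f0)] = 1.0          # one pulse per pitch period
    out = source
    for freq, bw in formants:              # cascade: each resonator's output feeds the next
        out = resonator(out, freq, bw, fs)
    return out

# Illustrative values: 100 Hz voicing and formants near those of a uniform 17.5 cm tube
speech = source_filter(100, [(500, 60), (1500, 90), (2500, 150)], fs=10000, dur=0.1)
```

The output spectrum keeps the source's harmonics at multiples of 100 Hz, with the harmonics nearest each resonator frequency boosted, which is how formant peaks emerge.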
Glottal source spectrum • 100 Hz voicing • 6 dB/octave roll-off • Vocal tract response • Peaks are resonances • Shape due to location and bandwidth of resonances • Radiated speech spectrum • Source harmonics • Source slope • VT resonances • Peaks are formants
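A 6 dB/octave roll-off means amplitude halves each time frequency doubles, since 20·log10(2) ≈ 6.02 dB. Assuming an idealized source whose harmonic amplitudes fall exactly as 1/k (an assumption for this sketch), the levels of the 100 Hz voicing harmonics at successive octaves are:

```python
import math

def harmonic_level_db(k):
    """Level of harmonic k relative to the fundamental, assuming amplitude falls as 1/k."""
    return 20 * math.log10(1.0 / k)

f0 = 100.0                    # voicing fundamental in Hz, as on the slide
for k in (1, 2, 4, 8, 16):    # each step is one octave
    print(f"{k * f0:6.0f} Hz {harmonic_level_db(k):6.1f} dB")
```

Each octave step lowers the level by about 6 dB, so the harmonic at 1600 Hz is roughly 24 dB below the fundamental.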
Frication Source • Generated by a jet of air striking a baffle (an obstacle) • Location is typically in the oral cavity • Source spectrum is broadband and random • Frication versus Aspiration • Frication is due to a constriction above the glottis • Aspiration is due to turbulence at the glottis
Voicing Source • Vibration of the vocal folds • Muscle tension holds the vocal folds closed • The pressure difference between air in the lungs and air in the vocal tract drives vibration: • the pressure difference overcomes muscle tension and the folds open • the pressure difference then drops, so muscle tension overcomes it and the folds close.
Voicing waveform: volume velocity and pressure vs. time (10 ms scale)
Other Sources • Mixed • Voiced fricatives • Aspiration + voicing • Aperiodic “voicing” • Clicks
Uniform Tube Resonances
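A uniform tube closed at one end (the glottis) and open at the other (the lips) resonates at odd multiples of a quarter wavelength: F_n = (2n - 1)·c/(4L). This sketch assumes a speed of sound of 35,000 cm/s, a common round value for warm, moist air; the function name is hypothetical.

```python
def uniform_tube_resonances(length_cm, n_formants=3, c_cm_per_s=35000.0):
    """Quarter-wavelength resonances of a uniform tube closed at one end:
    F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c_cm_per_s / (4 * length_cm) for n in range(1, n_formants + 1)]

print(uniform_tube_resonances(17.5))  # [500.0, 1500.0, 2500.0]
```

Shortening the tube raises all resonances in proportion; a 14 cm tube, for example, has its first resonance at 625 Hz.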
Tube Constriction Effects
High Front Constriction
Low Back Constriction
High Back + Lip Rounding
Acoustic Phonetics • Display formats for acoustic phonetic information • Vowel classification and features • Consonant classification and features • Coarticulation • Suprasegmentals
Representation of Acoustic Phonetic Information • Waveform and Spectrogram displays • Auxiliary signals and information
Waveform of /ɔ/ with EGG signal
/ɔ/ Waveform, EGG, and Spectrogram
Waveform & Wideband Spectrogram
Waveform and Narrowband Spectrogram
Voiced Waveform Detail
Unvoiced Waveform Detail
Cross Section (unvoiced)
Cross Section (voiced)
Vowel Classification • High-Low dimension • Roughly, how constricted the vocal tract (VT) is • Corresponds (inversely) to the frequency of F1 • Front-Back dimension • Whether the constriction is toward the front or back of the oral cavity • Corresponds to the frequency of F2 • Rhotic • Indicates /r/ coloring of the vowel • Corresponds to a lowered frequency of F3
The Vowel Space
Consonant Classification • Voicing distinction • Voiced versus voiceless • Adducted versus abducted vocal folds • Manner distinction • Degree and type of constriction • Place distinction • Location of constriction
Consonant Manner by Place
Additional manners & places • Glottal: stop /ʔ/ or approximant /h/ • In all, Ladefoged (1993) identifies 10 manners and 11 places of articulation needed to account for all languages.
Phonemes & Allophones • A phoneme may vary in form from one context to another. • /t/ in syllable-initial position is aspirated [tʰ] • /t/ in some syllable-final contexts is voiced [t̬] • /t/ preceding some segments is dental [t̪] • /t/ before front vowels has a different burst spectrum than before back rounded vowels
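In rule-based synthesis, context-dependent variation like this is often handled by rules that choose an allophone from the surrounding segments. The sketch below is hypothetical and deliberately simplified: the contexts chosen for "some syllable-final contexts" (between vowels) and "some segments" (the dental fricatives /θ ð/), the vowel set, and the function name are all assumptions, and the burst-spectrum difference before front versus back vowels is not modeled.

```python
# Hypothetical, simplified context rules for realizing /t/; illustrative only.
VOWELS = set("aeiouæɑɔəɛɪʊʌ")   # simplified single-character vowel symbols (assumption)
DENTALS = {"θ", "ð"}              # segments assumed to trigger a dental /t/

def t_allophone(syllable_initial, prev_seg=None, next_seg=None):
    """Return a surface form for /t/ given a simplified description of its context."""
    if next_seg in DENTALS:
        return "t\u032a"          # t̪: dental, e.g. before /θ/ in "eighth"
    if syllable_initial:
        return "t\u02b0"          # tʰ: aspirated, e.g. "top"
    if prev_seg in VOWELS and next_seg in VOWELS:
        return "t\u032c"          # t̬: voiced between vowels, e.g. American English "water"
    return "t"                    # default: plain [t]

print(t_allophone(True, next_seg="ɑ"), t_allophone(False, "ɔ", "ə"), t_allophone(False, "s", None))
```

Rule order matters here: the dental rule is checked first, so a /t/ before /θ/ is dental even where another rule would also match.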
Aspirated [p]