Sonorant Grab Bag

Sonorant Grab Bag March 27, 2014

Speech Synthesis: A Basic Overview • Speech synthesis is the generation of speech by machine. • The reasons for studying synthetic speech have evolved over the years: • Novelty • To control acoustic cues in perceptual studies • To understand the human articulatory system • “Analysis by Synthesis” • Practical applications • Reading machines for the blind, navigation systems

Speech Synthesis: A Basic Overview • There are four basic types of synthetic speech: • Mechanical synthesis • Formant synthesis • Based on Source/Filter theory • Concatenative synthesis • = stringing bits and pieces of natural speech together • Articulatory synthesis • = generating speech from a model of the vocal tract.

1. Mechanical Synthesis • The very first attempts to produce synthetic speech were made without electricity. • = mechanical synthesis • In the late 1700s, models were produced which used: • reeds as a voicing source • differently shaped tubes for different vowels

Mechanical Synthesis, part II • Later, Wolfgang von Kempelen and Charles Wheatstone created a more sophisticated mechanical speech device… • with independently manipulable source and filter mechanisms.

Mechanical Synthesis, part III • An interesting historical footnote: • Alexander Graham Bell and his “questionable” experiments with his dog. • Mechanical synthesis has largely gone out of style ever since. • …but check out Mike Brady’s talking robot.

The Voder • The next big step in speech synthesis was to generate speech electronically. • This was most famously demonstrated at the New York World’s Fair in 1939 with the Voder. • The Voder was a manually controlled speech synthesizer. • (operated by highly trained young women)

Voder Principles • The Voder basically operated like a vocoder. • Voicing and fricative source sounds were filtered by 10 different resonators… • each controlled by an individual finger! • Only about 1 in 10 people had the ability to learn how to play the Voder.

Overtone Singing • F0 stays the same (on a “drone”), while the singer shapes the vocal tract so that individual harmonics (“overtones”) resonate. • What kind of voice quality would be conducive to this?

Vowels and Sonorants • So far, we’ve talked a lot about the acoustics of vowels: • Source: periodic openings and closings of the vocal folds. • Filter: characteristic resonant frequencies of the vocal tract (above the glottis) • Today, we’ll talk about the acoustics of sonorants: • Nasals • Laterals • Approximants • The source/filter characteristics of sonorants are similar to vowels… with a few interesting complications.

Damping • One interesting acoustic property exhibited by (some) sonorants is damping. • Recall that resonance occurs when: • a sound wave travels through an object • that sound wave is reflected... • ...and reinforced, on a periodic basis • The periodic reinforcement sets up alternating patterns of high and low air pressure • = a standing wave

Resonance in a closed tube • [Figure: pressure pattern in the tube, shown over time]

Damping, schematized • In a closed tube: • With only one pressure pulse from the loudspeaker, the wave will eventually dampen and die out. • Why? • The walls of the tube absorb some of the acoustic energy, with each reflection of the standing wave.

Damping Comparison • A heavily damped wave will die out more quickly... • ...than a lightly damped wave:
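
The comparison can be sketched numerically by modeling a damped wave as a sinusoid multiplied by a decaying exponential. This is only an illustration: the exponential envelope, the decay rates (20 and 200 per second), and the function names are assumptions, not values from the lecture.

```python
import math

def damped_wave(freq_hz, decay_per_sec, duration_s=0.1, sample_rate=8000):
    """Samples of exp(-decay * t) * sin(2 * pi * f * t)."""
    n_samples = int(duration_s * sample_rate)
    return [
        math.exp(-decay_per_sec * i / sample_rate)
        * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]

def time_to_fall_below(decay_per_sec, fraction=0.01):
    """Seconds until the envelope exp(-decay * t) shrinks to `fraction` of its start."""
    return math.log(1 / fraction) / decay_per_sec

# Illustrative decay rates (assumed): light vs. heavy damping.
light_damping_time = time_to_fall_below(decay_per_sec=20.0)
heavy_damping_time = time_to_fall_below(decay_per_sec=200.0)

print(f"light damping: {light_damping_time:.3f} s to reach 1% of starting amplitude")
print(f"heavy damping: {heavy_damping_time:.3f} s to reach 1% of starting amplitude")
```

With these assumed rates, the heavily damped wave reaches 1% of its starting amplitude ten times sooner than the lightly damped one.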

Damping Factors • The amount of damping in a tube is a function of: • The volume of the tube • The surface area of the tube • The material of which the tube is made • More volume, more surface area = more damping • Think about the resonant characteristics of: • a Home Depot • a post-modern restaurant • a movie theater • an anechoic chamber

An Anechoic Chamber

Resonance and Recording • Remember: any room will reverberate at its characteristic resonant frequencies • Hence: high quality sound recordings need to be made in specially designed rooms which damp any reverberation • Examples: • Classroom recording (29 dB signal-to-noise ratio) • “Soundproof” booth (44 dB SNR) • Anechoic chamber (90 dB SNR)
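
To get a feel for what these dB figures mean, they can be converted back into plain ratios. The sketch below assumes the usual decibel definitions (20·log10 of an amplitude ratio, 10·log10 of a power ratio); the function names are made up for illustration.

```python
def db_to_amplitude_ratio(db):
    """Amplitude ratio corresponding to a level difference in dB (20 * log10 convention)."""
    return 10 ** (db / 20)

def db_to_power_ratio(db):
    """Power ratio corresponding to a level difference in dB (10 * log10 convention)."""
    return 10 ** (db / 10)

# SNR values taken from the slide.
recordings = [("classroom", 29), ("soundproof booth", 44), ("anechoic chamber", 90)]

for label, snr_db in recordings:
    ratio = db_to_amplitude_ratio(snr_db)
    print(f"{label}: {snr_db} dB SNR, signal amplitude is about {ratio:.0f} times the noise")
```

Because the scale is logarithmic, the 15 dB step from the classroom to the booth already corresponds to a signal-to-noise amplitude ratio more than five times larger.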

Spectrograms • [Figure: spectrograms of the classroom and “soundproof” booth recordings]

Spectrograms • [Figure: spectrogram of the anechoic chamber recording]

Inside Your Nose • In nasals, air flows through the nasal cavities. • The resonating “filter” of nasal sounds therefore has: • increased volume • increased surface area • → increased damping • Note: • the exact size and shape of the nasal cavities varies wildly from speaker to speaker.

Nasal Variability • Measurements based on MRI data (Dang et al., 1994)

Damping Effects, part 1 • Damping by the nasal cavities decreases the overall amplitude of the sound coming out through the nose. • [Figure: two panels, each labeled [m]]

Damping Effects, part 2 • How might the power spectrum of an undamped wave: • Compare to that of a damped wave? • A: Undamped waves have only one component; • Damped waves have a broader range of components.

Here’s Why • 100 Hz sinewave + 90 Hz sinewave + 110 Hz sinewave

The Result • 90 Hz + 100 Hz + 110 Hz • If the 90 Hz and 110 Hz components have less amplitude than the 100 Hz wave, there will be less damping:
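
The effect of adding neighboring components can be checked with a few lines of arithmetic. The sketch below sums cosines at 90, 100, and 110 Hz (all starting in phase, which is an assumption) and reads off the waveform at the start of each 100 Hz cycle. The side-component amplitudes 1.0 and 0.25 are illustrative choices.

```python
import math

def three_component_wave(t, side_amplitude):
    """100 Hz cosine plus 90 Hz and 110 Hz cosines, each scaled by `side_amplitude`."""
    return (
        math.cos(2 * math.pi * 100 * t)
        + side_amplitude * math.cos(2 * math.pi * 90 * t)
        + side_amplitude * math.cos(2 * math.pi * 110 * t)
    )

def cycle_peaks(side_amplitude, n_cycles=4):
    """Waveform value at the start of each 100 Hz cycle (t = 0, 10, 20, ... ms)."""
    return [three_component_wave(k / 100, side_amplitude) for k in range(n_cycles)]

def decay_ratio(peaks):
    """Last peak relative to the first: smaller means the wave dies away faster."""
    return peaks[-1] / peaks[0]

strong_sides = cycle_peaks(side_amplitude=1.0)   # side components as strong as 100 Hz
weak_sides = cycle_peaks(side_amplitude=0.25)    # side components much weaker

print("strong side components:", [round(p, 3) for p in strong_sides])
print("weak side components:  ", [round(p, 3) for p in weak_sides])
```

Over the first few cycles the peaks shrink in both cases, but much more steeply when the 90 Hz and 110 Hz components are strong. With only three components the pattern swells again later (it repeats every 100 ms), so this only mimics the onset of a damped wave; a wave that truly dies out needs a continuous range of components around the center frequency.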

Damping Spectra • [Figure: power spectra for light and medium damping]

Damping Spectra • [Figure: power spectrum for heavy damping] • Damping increases the bandwidth of the resonating filter. • Bandwidth = the range of frequencies over which a filter responds at 0.707 or more of its maximum output. • → Nasal formants will have a larger bandwidth than vowel formants.
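
The link between damping and bandwidth can be sketched with a simplified model of a single resonance. The response shape used here (the standard curve for one exponentially decaying resonance), the center frequency of 500 Hz, and the decay rates are all assumptions for illustration; this is not a model of the nasal tract.

```python
import math

def resonance_response(freq_hz, center_hz, decay_per_sec):
    """Approximate amplitude response near one resonance whose ringing decays
    as exp(-decay * t). Assumed simplification: a single isolated resonance."""
    detune = 2 * math.pi * (freq_hz - center_hz)
    return 1.0 / math.sqrt(decay_per_sec ** 2 + detune ** 2)

def bandwidth_hz(center_hz, decay_per_sec, step_hz=0.1, span_hz=1000.0):
    """Width of the frequency range where the response is at least 0.707 of its peak."""
    peak = resonance_response(center_hz, center_hz, decay_per_sec)
    threshold = peak / math.sqrt(2)  # 0.707 of the maximum output
    n_steps = round(span_hz / step_hz)
    start_hz = center_hz - span_hz / 2
    passing = []
    for i in range(n_steps + 1):
        freq = start_hz + i * step_hz
        if resonance_response(freq, center_hz, decay_per_sec) >= threshold:
            passing.append(freq)
    return passing[-1] - passing[0]

light = bandwidth_hz(center_hz=500.0, decay_per_sec=100.0)  # lightly damped
heavy = bandwidth_hz(center_hz=500.0, decay_per_sec=400.0)  # heavily damped

print(f"light damping: bandwidth of roughly {light:.1f} Hz")
print(f"heavy damping: bandwidth of roughly {heavy:.1f} Hz")
```

In this model the bandwidth works out to the decay rate divided by π, so quadrupling the damping quadruples the bandwidth.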

Bandwidth in Spectrograms • [Figure: spectrogram comparing F3 of a vowel with F3 of [m]] • The formants in nasals have increased bandwidth, in comparison to the formants in vowels.

Nasal Formants • The values of formant frequencies for nasal stops can be calculated according to the same formula that we used to calculate formant frequencies for a tube open at one end. • $f_n = \frac{(2n - 1)\,c}{4L}$ • The simplest case: uvular nasal [ɴ]. • The length of the tube is a combination of: • distance from glottis to uvula (9 cm) • distance from uvula to nares (12.5 cm) • An average tube length (for adult males): 21.5 cm

The Math • [Figure: vocal tract tube, 9 cm from glottis to uvula plus 12.5 cm from uvula to nares] • $f_n = \frac{(2n - 1)\,c}{4L}$ • L = 21.5 cm • c = 35000 cm/sec • F1 = 35000 / 86 = 407 Hz • F2 = 1221 Hz • F3 = 2035 Hz
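
The same arithmetic can be written as a small function, which makes it easy to try other tube lengths. This is a minimal sketch of the quarter-wavelength formula above; the function name and parameter names are illustrative.

```python
def tube_resonances(length_cm, n_resonances=3, speed_cm_per_s=35000):
    """Resonances of a tube closed at one end: f_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_cm_per_s / (4 * length_cm)
        for n in range(1, n_resonances + 1)
    ]

# Glottis to uvula (9 cm) plus uvula to nares (12.5 cm), as on the slide.
uvular_nasal = tube_resonances(9 + 12.5)
print([round(f) for f in uvular_nasal])  # → [407, 1221, 2035]
```

Because the formula depends only on L and c, a shorter tube raises every formant by the same factor.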

The Real Thing • Check out Peter’s production of a uvular nasal in Praat. • And also Dustin’s neutral vowel! • Note: the higher formants are low in amplitude • Some reasons why: • Overall damping • “Nostril-rounding” reduces intensity • Resonance is lost in the side passages of the sinuses. • Nasal stops with fronter places of articulation also have anti-formants.

Anti-Formants • For nasal stops, the occlusion in the mouth creates a side cavity. • This side cavity resonates at particular frequencies. • These resonances absorb acoustic energy in the system. • They form anti-formants

Anti-Formant Math • Anti-formant resonances are based on the length of the side cavity (the mouth, closed off at the occlusion). • For [m], this length is about 8 cm. • $f_n = \frac{(2n - 1)\,c}{4L}$ • L = 8 cm • AF1 = 35000 / (4 × 8) = 1094 Hz • AF2 = 3281 Hz • etc.

Spectral Signatures • In a spectrogram, acoustic energy is reduced, or drops out completely, at the anti-formant frequencies. • [Figure: spectrogram with the anti-formants labeled]

Nasal Place Cues • At more posterior places of articulation, the “anti-resonating” tube is shorter. • → anti-formant frequencies will be higher. • for [n], L = 5.5 cm • AF1 = 1600 Hz • AF2 = 4800 Hz • for [ɲ], L = 3.3 cm • AF1 = 2650 Hz • for [ŋ], L = 2.3 cm • AF1 = 3800 Hz
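
The inverse relation between side-cavity length and anti-formant frequency can be tabulated directly. The sketch below applies the quarter-wavelength formula with c = 35000 cm/sec to the cavity lengths quoted on the slides; the function name is illustrative, and the printed values are unrounded, so they differ slightly from the rounded slide figures.

```python
def side_cavity_antiformants(cavity_length_cm, count=2, speed_cm_per_s=35000):
    """Anti-formants of a side cavity of length L: AF_n = (2n - 1) * c / (4 * L)."""
    return [
        (2 * n - 1) * speed_cm_per_s / (4 * cavity_length_cm)
        for n in range(1, count + 1)
    ]

# Side-cavity lengths in cm quoted on the slides, from longest to shortest.
cavity_lengths_cm = [8.0, 5.5, 3.3, 2.3]

first_antiformants = []
for length in cavity_lengths_cm:
    af1, af2 = side_cavity_antiformants(length)
    first_antiformants.append(af1)
    print(f"L = {length} cm: AF1 = {af1:.0f} Hz, AF2 = {af2:.0f} Hz")
```

Each step toward a shorter cavity pushes AF1 higher, which is why the position of the first anti-formant is a potential cue to nasal place.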

[m] vs. [n] • [Figure: spectrogram of [meno], segments labeled [m] [e] [n] [o], with AF1 (m) and AF1 (n) marked] • Production of [meno], by a speaker of Tsonga • Tsonga is spoken in South Africa and Mozambique

Nasal Stop Acoustics: Summary • Here’s the general pattern of what to look for in a spectrogram for nasals: • Periodic voicing. • Overall amplitude lower than in vowels. • Formants (resonance). • Formants have broad bandwidths. • Low frequency first formant. • Less space between formants. • Higher formants have low amplitude.

Perceiving Nasal Place • Nasal “murmurs” do not provide particularly strong cues to place of articulation. • Can you identify the following as [m], [n] or [ŋ]? • Repp (1986) found that listeners can only distinguish between [n] and [m] 72% of the time. • Transitions provide important place cues for nasals. • Repp (1986): 95% of nasals identified correctly when presented with the first 10 msec of the following vowel. • Can you identify these nasal + transition combos?
