Obstruent Acoustics Bonus Learning Fun!
Motor Theory, in a nutshell • The big idea: • We perceive speech as abstract “gestures”, not sounds. • Evidence: • The perceptual interpretation of speech differs radically from the acoustic organization of speech sounds • Speech perception is multi-modal • Direct (visual, tactile) information about gestures can influence/override indirect (acoustic) speech cues • Limited top-down access to the primary, acoustic elements of speech
Moving On… • One important lesson to take from the motor theory perspective is: • The dynamics of speech are generally more important to perception than static acoustic cues. • Note: visual chimerism and March Madness.
Auditory Chimeras • Speech waveform + music spectrum: frequency bands 1 2 4 8 16 32 • Music waveform + speech spectrum: frequency bands 1 2 4 8 16 32 Originals: Source: http://research.meei.harvard.edu/chimera/chimera_demos.html
Auditory Chimeras • Speech1 waveform + speech2 spectrum: frequency bands 1 2 4 6 8 16 • Speech2 waveform + speech1 spectrum: frequency bands 1 2 4 6 8 16 Originals:
Closure Voicing • The low frequency information that passes through the stop “filter” appears as a “voicing bar” in a spectrogram. • This acoustic information provides hardly any cues for place of articulation. Armenian: [bag]
Stop Transition Cues (again) • With the transition between stop closure and vowel, the perceptual task becomes much easier: • Try the same with Peter’s productions: • stop closures: • with transitions: • The moral of the story (again): • Dynamic changes provide stronger perceptual cues to place than static acoustic information.
Release Bursts • Note: along with transitions, stops have another cue for place at their disposal: release bursts. • (Nasals do not have these.) • Here’s a waveform of a [p] release burst (duration: 5 msec): • What do you think the [p] burst spectrum will look like?
Burst Spectrum • [p] bursts tend to have very diffuse spectra, with energy spread across a wide range of frequencies. • Also: [p] bursts are very weak in intensity. • The extremely short duration of bursts requires lots of damping in the waveform, which yields a broader frequency range.
Release Bursts • In a spectrogram: • bilabial release bursts have a very diffuse spectrum, weakly spread across all frequencies. [p] burst • [p] bursts are relatively close to pure transient sounds.
Transients • A transient is: • “a sudden pressure fluctuation that is not sustained or repeated over time.” • An ideal transient waveform:
A Transient Spectrum • An ideal transient spectrum is perfectly flat:
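A minimal sketch of why an ideal transient has a flat spectrum, not taken from the original slides. It assumes a hand-rolled discrete Fourier transform (the hypothetical helper `dft_magnitudes`) and a 16-sample signal chosen only for illustration. A single non-zero sample gives the same magnitude in every frequency bin, while a sustained sine wave concentrates its energy in one pair of bins.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return the magnitude of every DFT bin of a real-valued signal."""
    n = len(samples)
    magnitudes = []
    for k in range(n):
        total = sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(samples))
        magnitudes.append(abs(total))
    return magnitudes


N = 16  # illustrative signal length (assumption)

# Idealized transient: one pressure spike, silence everywhere else.
impulse = [0.0] * N
impulse[3] = 1.0

# Sustained, periodic signal for contrast: 2 full cycles in N samples.
sine = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]

impulse_spectrum = dft_magnitudes(impulse)
sine_spectrum = dft_magnitudes(sine)

print("transient:", [round(m, 3) for m in impulse_spectrum])
print("sine:     ", [round(m, 3) for m in sine_spectrum])
```

The transient line prints the same value in every bin; the sine line is zero (up to rounding) everywhere except bins 2 and 14.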
Burst Filtering • The spectra of more posterior release bursts may be filtered by the cavity in front of the burst. • Ex: [t] bursts tend to lack energy at the lowest end of the frequency scale. • And higher frequency components are somewhat more intense. [t] burst
Release Bursts: [k] • Velar release bursts are relatively intense. • They also often have a strong concentration of energy in the 1500-2000 Hz range (F2/F3). • There can often be multiple [k] release bursts. [k] burst
Another Look • [k] bursts tend to be intense right where F2 and F3 meet in the velar pinch: Armenian: [bag]
Finally, Fricatives • The last type of sound we need to consider in speech is an aperiodic, continuous noise. • (Transients are aperiodic but not continuous.) • Ideally: • Q: What would the spectrum of this waveform look like?
White Noise Spectrum • Technical term: White noise • has an unlimited range of frequency components • Analogy: white light is what you get when you combine all visible frequencies of the electromagnetic spectrum
Turbulence • We can create aperiodic noise in speech by taking advantage of the phenomenon of turbulence. • Some handy technical terms: • laminar flow: a fluid flowing in parallel layers, with no disruption between the layers. • turbulent flow: a fluid flowing with chaotic property changes, including rapid variation in pressure and velocity in both space and time • Whether or not airflow is turbulent depends on: • the volume velocity of the fluid • the area of the channel through which it flows
Turbulence • Turbulence is more likely with: • a higher volume velocity • less channel area • All fricatives therefore require: • a narrow constriction • high airflow
Fricative Specs • Fricatives require great articulatory precision. • Some data for [s] (Subtelny et al., 1972): • alveolar constriction 1 mm • incisor constriction 2-3 mm • Larger constrictions result in [ʃ]-like sounds. • Generally, fricatives have a cross-sectional area between 6 and 12 mm². • Cross-sectional areas greater than 20 mm² result in laminar flow. • Airflow = 330 cm³/sec for voiceless fricatives • …and 240 cm³/sec for voiced fricatives
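A rough worked example, not part of the original slides: dividing the airflow figures above by a constriction area gives the mean speed of the air passing through the constriction. The function name and the particular areas tabulated (6, 12 and 20 mm²) are illustrative choices; the sketch says nothing about the exact point at which flow becomes turbulent.

```python
def mean_air_velocity_m_per_s(volume_velocity_cm3_per_s, area_mm2):
    """Mean air speed through a constriction: volume velocity / area."""
    area_cm2 = area_mm2 / 100.0                       # 1 cm^2 = 100 mm^2
    velocity_cm_per_s = volume_velocity_cm3_per_s / area_cm2
    return velocity_cm_per_s / 100.0                  # cm/s -> m/s


# Airflow values quoted on the slide; areas span the quoted ranges.
for label, flow in (("voiceless", 330.0), ("voiced", 240.0)):
    for area in (6.0, 12.0, 20.0):
        speed = mean_air_velocity_m_per_s(flow, area)
        print(f"{label:9s} {flow:5.0f} cm^3/s through {area:4.1f} mm^2"
              f" -> {speed:5.1f} m/s")
```

For the same airflow, halving the channel area doubles the air speed, which fits the point that turbulence is more likely with a narrower constriction.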
Turbulence Sources • For fricatives, turbulence is generated by forcing a stream of air at high velocity through either a narrow channel in the vocal tract or against an obstacle in the vocal tract. • Channel turbulence • produced when airflow escapes from a narrow channel and hits inert outside air • Obstacle turbulence • produced when airflow hits an obstacle in its path
Channel vs. Obstacle • Almost all fricatives involve an obstacle of some sort. • General rule of thumb: obstacle turbulence is much noisier than channel turbulence • [f] vs. [ɸ] • Also: the more perpendicular the obstacle is to the airflow, the louder the obstacle turbulence • [s] vs. [x] • [x] is a “wall fricative”
Sibilants • Alveolar, dental and post-alveolar fricatives form a special class (the sibilants) because their obstacle is the back of the upper teeth. • This yields high intensity turbulence at high frequencies.
[ʃ] “shy” vs. [θ] “thigh”
Fricative Noise • Fricative noise has some inherent spectral shaping • …like “spectral tilt” • Note: this is a source characteristic • This resembles what is known as pink noise: • Compare with white noise:
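A small numerical sketch of the spectral tilt contrast, not from the original slides. It assumes the textbook idealizations: white noise has the same power density at every frequency, and pink noise has power density proportional to 1/f. The slide only says fricative noise resembles pink noise, so the numbers describe the idealized noise, not measured fricatives. The function names and the 1000 Hz reference are illustrative.

```python
import math


def level_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)


def white_power(freq_hz):
    """Ideal white noise: equal power density at every frequency."""
    return 1.0


def pink_power(freq_hz, ref_hz=1000.0):
    """Ideal pink noise: power density proportional to 1/f (relative to ref_hz)."""
    return ref_hz / freq_hz


for freq in (1000.0, 2000.0, 4000.0, 8000.0):
    print(f"{freq:6.0f} Hz  white {level_db(white_power(freq)):6.2f} dB"
          f"  pink {level_db(pink_power(freq)):6.2f} dB")
```

Each doubling of frequency leaves the white noise level unchanged and lowers the ideal pink noise level by about 3 dB.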
Fricative Shaping • The turbulence spectrum may be filtered by the resonating tube in front of the fricative. • (Due to narrowness of constriction, back cavity resonances don’t really show up.) • As usual, resonance is determined by length of the tube in front of the constriction. • The longer the tube, the lower the “cut-off” frequency. • A basic example: • [s] vs. [ʃ]
[s] “sigh” vs. [ʃ] “shy”
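A minimal sketch of the "longer tube, lower frequency" relationship, not from the original slides. It assumes the front cavity behaves like a uniform tube closed at the constriction and open at the lips, so its lowest resonance is the quarter-wavelength frequency c / (4L). The speed of sound (35000 cm/s) is an approximate value, and the two cavity lengths are hypothetical numbers chosen only to show the direction of the effect, not measurements of real [s] or [ʃ].

```python
SPEED_OF_SOUND_CM_PER_S = 35000.0  # approximate value (assumption)


def front_cavity_resonance_hz(length_cm):
    """Lowest resonance of a tube closed at one end and open at the other."""
    return SPEED_OF_SOUND_CM_PER_S / (4.0 * length_cm)


# Hypothetical front-cavity lengths, for illustration only.
for label, length_cm in (("shorter front cavity ([s]-like)", 1.5),
                         ("longer front cavity ([ʃ]-like)", 3.0)):
    freq = front_cavity_resonance_hz(length_cm)
    print(f"{label}: {length_cm} cm -> about {freq:.0f} Hz")
```

Doubling the assumed length halves the resonance frequency, which is the pattern the spectrograms for “sigh” and “shy” illustrate.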
Sampling Rates Revisited • Remember: Digital representations of speech can only capture frequency components up to half the sampling rate • the Nyquist frequency • Speech should be sampled at a rate of at least 44100 Hz • (although there is little frequency information in speech above 10,000 Hz) • [s] has high acoustic energy from about 3500 to 10000 Hz • Note: telephones sample at 8000 Hz • 44100 Hz • 8000 Hz
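A short sketch of the arithmetic behind this slide, not from the original deck. It computes the Nyquist frequency for the two sampling rates mentioned and the share of the quoted [s] energy band (3500 to 10000 Hz) that falls below it. The function names are illustrative, and treating the band as uniformly important is a simplifying assumption.

```python
def nyquist_hz(sampling_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sampling_rate_hz / 2.0


def fraction_of_band_captured(band_low_hz, band_high_hz, sampling_rate_hz):
    """Share of a frequency band that lies below the Nyquist frequency."""
    limit = nyquist_hz(sampling_rate_hz)
    captured = max(0.0, min(band_high_hz, limit) - band_low_hz)
    return captured / (band_high_hz - band_low_hz)


S_BAND_HZ = (3500.0, 10000.0)  # [s] energy range quoted on the slide

for rate in (44100, 8000):
    share = fraction_of_band_captured(S_BAND_HZ[0], S_BAND_HZ[1], rate)
    print(f"{rate} Hz sampling: Nyquist {nyquist_hz(rate):.0f} Hz,"
          f" captures {share:.0%} of the [s] band")
```

At 8000 Hz the Nyquist frequency is 4000 Hz, so only the lowest 500 Hz of the quoted [s] band survives, which is why [s] is hard to identify over the telephone.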
Further Back • In more posterior fricatives, turbulence noise is generally shaped like a vowel made at the same place of articulation. [xoma] palatal vs. velar
Even Further Back • Examples from Hebrew:
At the Tail End • [h] exhibits a lot of coarticulation • [h] is not really a “fricative”; • it’s more like a whispered or breathy voiced vowel. “heed” “had”
Aspirated Fricatives • Like stops, fricatives can be aspirated. • [h] follows the supraglottal frication in the vocal tract. • Examples from Chinese: [tsa] [tsʰa]
Back at the Ranch • There is not much of a resonating filter in front of labial fricatives… • so their spectrum is flat and diffuse • (like bilabial stop release bursts) • Note: labio-dentals are more intense than bilabial fricatives • (obstacle vs. channel turbulence)
Fricative Internal Cues • The articulatory precision required by fricatives means that they are less affected by context than stops. • It’s easy for listeners to distinguish between the various fricative places on the basis of the frication noise alone. • Result of both filter and source differences. • Examples: • There is, however, one exception to the rule…
Huh? • The two most confusable consonants in the English language are [f] and [θ]. • (Interdentals also lack a resonating filter)