More Perception + Fricative Acoustics March 31, 2011
To Begin With… • Today: • Some more thoughts on perception • And then a brief review of obstruent acoustics • On Tuesday, we’ll be doing: • A brief description of vocal tract musculature. • Static palatography demo! • You’re welcome to bring in a camera, if you so desire. • Also, a link: • http://sakurakoshimizu.blogspot.com/
Where were we? • In Categorical Perception: • All stimuli within a category boundary should be labeled the same.
Discrimination • Original task: ABX discrimination • Stimuli across category boundaries should be 100% discriminable. • (= different labels) • Stimuli within category boundaries should not be discriminable at all. • (= same labels) • In practice, categorical perception means: • the discrimination function can be determined from the identification function.
Discrimination • In this discrimination graph: • Solid line is the observed data • Dashed line is the predicted data (on the basis of the identification scores) • Note: the actual listeners did a little bit better than the predictions.
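The idea that the discrimination function can be determined from the identification function can be sketched numerically. The example below assumes the two-category prediction formula usually associated with the early categorical perception studies, P(correct) = 0.5 + 0.5 (p1 - p2)^2, where p1 and p2 are the proportions of one label given to the two stimuli in an ABX pair. The slides do not state which formula was used for the dashed line, and the identification scores here are made up for illustration.

```python
def predicted_abx(p_a, p_b):
    """Predicted proportion correct in ABX for one stimulus pair.

    p_a, p_b: proportion of identification trials on which each
    stimulus received the same label (e.g. "b").
    Assumed formula: 0.5 + 0.5 * (p_a - p_b) ** 2.
    """
    return 0.5 + 0.5 * (p_a - p_b) ** 2


# Hypothetical identification function along a 7-step continuum
# (proportion of "b" responses at each step); illustrative only.
ident = [1.0, 1.0, 0.95, 0.5, 0.05, 0.0, 0.0]

# Predict discrimination for each pair of stimuli two steps apart.
predictions = [predicted_abx(ident[i], ident[i + 2])
               for i in range(len(ident) - 2)]

for i, p in enumerate(predictions):
    print(f"steps {i + 1} vs {i + 3}: predicted {p:.2f} correct")
```

Pairs that get the same labels are predicted at chance (0.5), and the prediction peaks for the pair that straddles the category boundary.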
Categorical, Continued • Categorical Perception was also found for VOT distinctions. • And for stop/glide/vowel distinctions: • 10 ms transitions: [b] percept • 60 ms transitions: [w] percept • 200 ms transitions: [u] percept
Interpretation • Main idea: in categorical perception, the mind translates an acoustic stimulus into a phonemic label. (category) • The acoustic details of the stimulus are discarded in favor of an abstract representation. • A continuous acoustic signal: • Is thus transformed into a series of linguistic units:
The Next Level • Interestingly, categorical perception is not found for non-speech stimuli. • Miyawaki et al.: tested perception of an F3 continuum between /r/ and /l/.
The Next Level • They also tested perception of the F3 transitions in isolation. • Listeners did not perceive these transitions categorically.
The Implications • Interpretation: we do not perceive speech in the same way we perceive other sounds. • “Speech is special”… • and the perception of speech is modular. • A module is a special processor in our minds/brains devoted to interpreting a particular kind of environmental stimulus.
Module Characteristics • You can think of a module as a “mental reflex”. • A module of the mind is defined as having the following characteristics: • Domain-specific • Automatic • Fast • Hard-wired in brain • Limited top-down access (you can’t “unperceive”) • Example: the sense of vision operates modularly.
A Modular Mind Model • central processes: judgment, imagination, memory, attention • modules: vision, hearing, touch, speech • transducers: eyes, ears, skin, etc. • external, physical reality
More Evidence for Modularity • It has also been observed that speech is perceived multi-modally. • i.e.: we can perceive it through vision, as well as hearing (or some combination of the two). • We’re perceiving “gestures” • …and the gestures are abstract. • Interesting evidence: McGurk Effect
McGurk Effect, revealed • Audio “ba” + Visual “ga”: perceived “da” • Audio “ga” + Visual “ba”: perceived “ba”, “bga”, “gba” • Some interesting facts: • The McGurk Effect is exceedingly robust. • Adults show the McGurk Effect more than children. • Americans show the McGurk Effect more than Japanese.
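Original McGurk Data • Stimulus: auditory ba-ba + visual ga-ga • Response types: • Auditory: ba-ba • Visual: ga-ga • Fused: da-da • Combo: gabga, bagba • Percent of responses by age group: • Ages 3-5: Auditory 19%, Visual 0%, Fused 81%, Combo 0% • Ages 7-8: Auditory 36%, Visual 0%, Fused 64%, Combo 0% • Ages 18-40: Auditory 2%, Visual 0%, Fused 98%, Combo 0%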
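Original McGurk Data • Stimulus: auditory ga-ga + visual ba-ba • Response types: • Auditory: ga-ga • Visual: ba-ba • Fused: da-da • Combo: gabga, bagba • Percent of responses by age group: • Ages 3-5: Auditory 57%, Visual 10%, Fused 0%, Combo 19% • Ages 7-8: Auditory 36%, Visual 21%, Fused 11%, Combo 32% • Ages 18-40: Auditory 11%, Visual 31%, Fused 0%, Combo 54%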
Audio-Visual Sidebar • Visual cues affect the perception of speech in non-mismatched conditions, as well. • Scientific studies of lipreading date back to the early twentieth century • The original goal: improve the speech perception skills of the hearing-impaired • Note: visual speech cues often complement audio speech cues • In particular: place of articulation • However, training people to become better lipreaders has proven difficult… • Some people get it; some people don’t.
Sumby & Pollack (1954) • First investigated the influence of visual information on the perception of speech by normal-hearing listeners. • Method: • Presented individual word tokens to listeners in noise, with simultaneous visual cues. • Task: identify spoken word • Clear: • +10 dB SNR: • + 5 dB SNR: • 0 dB SNR:
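Presenting words "in noise" at a given signal-to-noise ratio amounts to scaling the noise relative to the signal before adding the two together. The sketch below shows one common way to do that, using RMS amplitude and the relation SNR (dB) = 20 log10(signal RMS / noise RMS). The sine-wave "signal", the sample rate, and the function names are stand-ins chosen for illustration; they are not taken from Sumby & Pollack's procedure.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(signal, noise, snr_db):
    """Scale `noise` so that signal RMS / noise RMS matches `snr_db`,
    then add it to `signal` sample by sample.
    Returns (mixed, scaled_noise)."""
    target_noise_rms = rms(signal) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(signal, scaled)]
    return mixed, scaled


rng = random.Random(0)
sr = 8000  # assumed sample rate in Hz
# Stand-in "speech": one second of a 200 Hz sine wave.
signal = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sr)]

for snr_db in (10, 5, 0):
    _, scaled = mix_at_snr(signal, noise, snr_db)
    achieved = 20 * math.log10(rms(signal) / rms(scaled))
    print(f"target {snr_db:+d} dB SNR, achieved {achieved:+.1f} dB")
```

At 0 dB SNR the noise has the same RMS amplitude as the signal; each 10 dB step up makes the noise about 3.2 times smaller in amplitude.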
Sumby & Pollack data • [Figure panels: Auditory-Only; Audio-Visual] • Visual cues provide an intelligibility boost equivalent to a 12 dB increase in signal-to-noise ratio.
Tadoma Method • Some deaf-blind people learn to perceive speech through the tactile modality, by using the Tadoma method.
Audio-Tactile Perception • Fowler & Dekle: tested ability of (naive) college students to perceive speech through the Tadoma method. • Presented synthetic stops auditorily • Combined with mismatched tactile information: • Ex: audio /ga/ + tactile /ba/ • Also combined with mismatched orthographic information: • Ex: audio /ga/ + orthographic /ba/ • Task: listeners reported what they “heard” • Tactile condition biased listeners more towards “ba” responses
Fowler & Dekle data • [Figure panels: orthographic mismatch condition (read “ba”); tactile mismatch condition (felt “ba”)]
Another Piece of the Puzzle • Another interesting finding which has been used to argue for the “speech is special” theory is duplex perception. • Take an isolated F3 transition: and present it to one ear…
Do the Edges First! • While presenting this spectral frame to the other ear:
Two Birds with One Spectrogram • The resulting combo is perceived in duplex fashion: • One ear hears the F3 “chirp”; • The other ear hears the combined stimulus as “da”.
Duplex Interpretation • Check out the spectrograms in Praat. • Mann and Liberman (1983) found: • Discrimination of the F3 chirps is gradient when they’re in isolation… • but categorical when combined with the spectral frame. • (Compare with the F3 discrimination experiment with Japanese and American listeners) • Interpretation: the “special” speech processor puts the two pieces of the spectrogram together.
fMRI data • Benson et al. (2001) • Non-Speech stimuli = notes, chords, and chord progressions on a piano
fMRI data • Benson et al. (2001) • Difference in activation for natural speech stimuli versus activation for sinewave speech stimuli
Mirror Neurons • In the 1990s, researchers in Italy discovered what they called mirror neurons in the brains of macaques. • Macaques had been trained to make grasping motions with their hands. • Researchers recorded the activity of single neurons while the monkeys were making these motions. • Serendipity: • the same neurons fired when the monkeys saw the researchers making grasping motions. • a neurological link between perception and action. • Motor theory claim: same links exist in the human brain, for the perception of speech gestures
Motor Theory, in a nutshell • The big idea: • We perceive speech as abstract “gestures”, not sounds. • Evidence: • The perceptual interpretation of speech differs radically from the acoustic organization of speech sounds • Speech perception is multi-modal • Direct (visual, tactile) information about gestures can influence/override indirect (acoustic) speech cues • Limited top-down access to the primary, acoustic elements of speech
Moving On… • One important lesson to take from the motor theory perspective is: • The dynamics of speech are generally more important to perception than static acoustic cues. • Note: visual chimerism and March Madness.
Auditory Chimeras • Speech waveform + music spectrum: frequency bands 1, 2, 4, 8, 16, 32 • Music waveform + speech spectrum: frequency bands 1, 2, 4, 8, 16, 32 • Originals: • Source: http://research.meei.harvard.edu/chimera/chimera_demos.html
Auditory Chimeras • Speech1 waveform + speech2 spectrum: frequency bands 1, 2, 4, 6, 8, 16 • Speech2 waveform + speech1 spectrum: frequency bands 1, 2, 4, 6, 8, 16 • Originals:
Finally, Fricatives • The last type of sound we need to consider in speech acoustics is an aperiodic, continuous noise. • Ideally: • Q: What would the spectrum of this waveform look like?
White Noise Spectrum • Technical term: White noise • has an unlimited range of frequency components • Analogy: white light is what you get when you combine all visible frequencies of the electromagnetic spectrum
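One way to see what a white noise spectrum looks like is to generate random samples and check that, on average, every frequency band carries about the same power. The sketch below does this with a deliberately simple (and slow) discrete Fourier transform so that it needs nothing beyond the Python standard library. The sample count, the random seed, and the four-band split are arbitrary choices for illustration.

```python
import cmath
import math
import random


def power_spectrum(samples):
    """Power at each DFT bin from 0 up to (but not including) Nyquist.
    Naive O(N^2) DFT: fine for a short demo, slow for real audio."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(512)]
spec = power_spectrum(noise)

# Average power in four equal-width frequency bands (skipping the DC bin).
bins = spec[1:]
width = len(bins) // 4
band_power = [sum(bins[i * width:(i + 1) * width]) / width for i in range(4)]
for i, p in enumerate(band_power):
    print(f"band {i + 1}: mean power {p:.2f}")
```

Individual bins fluctuate a lot, but the band averages should all come out in the same neighborhood: the spectrum is flat apart from random variation.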
Turbulence • We can create aperiodic noise in speech by taking advantage of the phenomenon of turbulence. • Some handy technical terms: • laminar flow: a fluid flowing in parallel layers, with no disruption between the layers. • turbulent flow: a fluid flowing with chaotic property changes, including rapid variation in pressure and velocity in both space and time • Whether or not airflow is turbulent depends on: • the volume velocity of the fluid • the area of the channel through which it flows
Turbulence • Turbulence is more likely with: • a higher volume velocity • less channel area • All fricatives therefore require: • a narrow constriction • high airflow
Fricative Specs • Fricatives require great articulatory precision. • Some data for [s] (Subtelny et al., 1972): • alveolar constriction 1 mm • incisor constriction 2-3 mm • Larger constrictions result in -like sounds. • Generally, fricatives have a cross-sectional area between 6 and 12 mm². • Cross-sectional areas greater than 20 mm² result in laminar flow. • Airflow = 330 cm³/sec for voiceless fricatives • …and 240 cm³/sec for voiced fricatives
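The airflow and area figures above can be combined to get a feel for how fast the air is moving through a fricative constriction. The sketch below uses the simple relation velocity = volume velocity / cross-sectional area, which assumes the flow is uniform across the channel. The function name is made up for illustration; the numbers are the ones quoted on the slide.

```python
def particle_velocity_m_per_s(volume_velocity_cm3_s, area_mm2):
    """Mean air velocity through a constriction: v = U / A.

    volume_velocity_cm3_s: airflow in cm^3/sec
    area_mm2: cross-sectional area of the constriction in mm^2
    """
    area_cm2 = area_mm2 / 100.0          # 1 cm^2 = 100 mm^2
    velocity_cm_s = volume_velocity_cm3_s / area_cm2
    return velocity_cm_s / 100.0         # cm/s -> m/s


for label, flow in (("voiceless", 330), ("voiced", 240)):
    for area in (6, 12, 20):
        v = particle_velocity_m_per_s(flow, area)
        print(f"{label:9s} {flow} cm^3/s through {area:2d} mm^2: {v:5.1f} m/s")
```

For the same airflow, halving the channel area doubles the air velocity, which is why a narrow constriction makes turbulence more likely.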
Turbulence Sources • For fricatives, turbulence is generated by forcing a stream of air at high velocity through either a narrow channel in the vocal tract or against an obstacle in the vocal tract. • Channel turbulence • produced when airflow escapes from a narrow channel and hits inert outside air • Obstacle turbulence • produced when airflow hits an obstacle in its path
Channel vs. Obstacle • Almost all fricatives involve an obstacle of some sort. • General rule of thumb: obstacle turbulence is much noisier than channel turbulence • [f] vs. • Also: obstacle turbulence is louder, the more perpendicular the obstacle is to the airflow • [s] vs. [x] • [x] is a “wall fricative”
Sibilants • Alveolar, dental and post-alveolar fricatives form a special class (the sibilants) because their obstacle is the back of the upper teeth. • This yields high intensity turbulence at high frequencies.
[Figure: “shy” vs. “thigh”]
Fricative Noise • Fricative noise has some inherent spectral shaping • …like “spectral tilt” • Note: this is a source characteristic • This resembles what is known as pink noise: • Compare with white noise:
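The difference between the two noise types can be put in numbers. Idealized white noise has the same power at every frequency, while idealized pink noise has power proportional to 1/f, so its level falls by about 3 dB for every doubling of frequency. The sketch below tabulates both; the 1000 Hz reference frequency and the function names are arbitrary choices for illustration, and real fricative source spectra will not follow either curve exactly.

```python
import math


def white_level_db(freq_hz):
    """Idealized white noise: same power at every frequency (0 dB reference)."""
    return 0.0


def pink_level_db(freq_hz, ref_hz=1000.0):
    """Idealized pink noise: power proportional to 1/f, in dB re: ref_hz."""
    return 10 * math.log10(ref_hz / freq_hz)


for f in (500, 1000, 2000, 4000, 8000):
    print(f"{f:5d} Hz: white {white_level_db(f):+5.1f} dB, "
          f"pink {pink_level_db(f):+5.1f} dB")
```

The steady downward slope in the pink column is the "spectral tilt"; the white column stays flat.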
Fricative Shaping • The turbulence spectrum may be filtered by the resonating tube in front of the fricative. • (Due to narrowness of constriction, back cavity resonances don’t really show up.) • As usual, resonance is determined by length of the tube in front of the constriction. • The longer the tube, the lower the “cut-off” frequency. • A basic example: • [s] vs.
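The link between front-cavity length and resonance frequency can be sketched with a standard simplification: treat the cavity in front of the constriction as a uniform tube that is closed at the constriction end and open at the lips, so that its lowest resonance is F = c / (4L). This tube model, the speed of sound value, and the cavity lengths below are assumptions made for illustration; the slide itself only states the direction of the effect.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, moist air


def front_cavity_resonance_hz(length_cm):
    """Lowest resonance of a tube closed at one end (the constriction)
    and open at the other (the lips): F = c / (4 * L)."""
    return SPEED_OF_SOUND_CM_S / (4.0 * length_cm)


# Hypothetical front-cavity lengths, chosen only to show the trend.
for length_cm in (1.0, 2.0, 3.0, 4.0):
    print(f"{length_cm:.1f} cm front cavity: "
          f"{front_cavity_resonance_hz(length_cm):.0f} Hz")
```

Under these assumptions, a constriction further back in the mouth leaves a longer front cavity and therefore pulls the lowest noise resonance down in frequency.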