
More Perception + Fricative Acoustics

March 31, 2011. Today: some more thoughts on perception, then a brief review of obstruent acoustics. On Tuesday: a brief description of vocal tract musculature and a static palatography demo.

megan-lucas

Presentation Transcript


  1. More Perception + Fricative Acoustics March 31, 2011

  2. To Begin With… • Today: • Some more thoughts on perception • And then a brief review of obstruent acoustics • On Tuesday, we’ll be doing: • A brief description of vocal tract musculature. • Static palatography demo! • You’re welcome to bring in a camera, if you so desire. • Also, a link: • http://sakurakoshimizu.blogspot.com/

  3. Where were we? • In Categorical Perception: • All stimuli within a category boundary should be labeled the same.

  4. Discrimination • Original task: ABX discrimination • Stimuli across category boundaries should be 100% discriminable. • (= different labels) • Stimuli within category boundaries should not be discriminable at all. • (= same labels) • In practice, categorical perception means: • the discrimination function can be determined from the identification function.
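A minimal sketch of how a discrimination function can be predicted from an identification function. It assumes the two-category ABX formula from the early categorical perception literature, P(correct) = 0.5 + 0.5 (pA - pB)^2, where pA and pB are the proportions of trials on which each stimulus receives the first label. The function name and the identification scores are made up for illustration; they are not data from these slides.

```python
def predicted_abx(p_a, p_b):
    """Predicted proportion correct in an ABX trial, given the proportion of
    times stimulus A and stimulus B are each given the first category label."""
    return 0.5 + 0.5 * (p_a - p_b) ** 2


# Hypothetical identification scores (proportion labeled as category 1)
# along a 7-step continuum. Illustrative numbers only.
identification = [1.00, 0.98, 0.95, 0.50, 0.05, 0.02, 0.00]

# Predicted discrimination for every pair of stimuli two steps apart.
predicted = [
    predicted_abx(identification[i], identification[i + 2])
    for i in range(len(identification) - 2)
]

for i, p in enumerate(predicted):
    print(f"steps {i + 1} vs. {i + 3}: predicted {p:.2f} correct")
```

Pairs that straddle the category boundary come out near 100% correct, while pairs inside a category stay near chance (50%), which is the peak-shaped curve the prediction is meant to capture.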

  5. Discrimination • In this discrimination graph: • Solid line is the observed data • Dashed line is the predicted data • (on the basis of the identification scores) • Note: the actual listeners did a little better than predicted.

  6. Categorical, Continued • Categorical Perception was also found for VOT distinctions. • And for stop/glide/vowel distinctions: • 10 ms transitions: [b] percept • 60 ms transitions: [w] percept • 200 ms transitions: [u] percept

  7. Interpretation • Main idea: in categorical perception, the mind translates an acoustic stimulus into a phonemic label (category). • The acoustic details of the stimulus are discarded in favor of an abstract representation. • A continuous acoustic signal is thus transformed into a series of linguistic units.

  8. The Next Level • Interestingly, categorical perception is not found for non-speech stimuli. • Miyawaki et al: tested perception of an F3 continuum between /r/ and /l/.

  9. The Next Level • They also tested perception of the F3 transitions in isolation. • Listeners did not perceive these transitions categorically.

  10. The Implications • Interpretation: we do not perceive speech in the same way we perceive other sounds. • “Speech is special”… • and the perception of speech is modular. • A module is a special processor in our minds/brains devoted to interpreting a particular kind of environmental stimuli.

  11. Module Characteristics • You can think of a module as a “mental reflex”. • A module of the mind is defined as having the following characteristics: • Domain-specific • Automatic • Fast • Hard-wired in brain • Limited top-down access (you can’t “unperceive”) • Example: the sense of vision operates modularly.

  12. A Modular Mind Model (layers, top to bottom) • central processes: judgment, imagination, memory, attention • modules: vision, hearing, touch, speech • transducers: eyes, ears, skin, etc. • external, physical reality

  13. More Evidence for Modularity • It has also been observed that speech is perceived multi-modally. • i.e.: we can perceive it through vision, as well as hearing (or some combination of the two). • → We’re perceiving “gestures” • …and the gestures are abstract. • Interesting evidence: McGurk Effect

  14. McGurk Effect, revealed • Audio + Visual → Perceived • ba + ga → da • ga + ba → ba, bga, gba • Some interesting facts: • The McGurk Effect is exceedingly robust. • Adults show the McGurk Effect more than children. • Americans show the McGurk Effect more than Japanese.

  15. Original McGurk Data • Stimulus: auditory ba-ba + visual ga-ga • Response types: Auditory = ba-ba; Visual = ga-ga; Fused = da-da; Combo = gabga, bagba • Responses by age group (%):
      Age     Auditory  Visual  Fused  Combo
      3-5     19        36      81     0
      7-8     36        0       64     0
      18-40   2         0       98     0

  16. Original McGurk Data • Stimulus: auditory ga-ga + visual ba-ba • Response types: Auditory = ga-ga; Visual = ba-ba; Fused = da-da; Combo = gabga, bagba • Responses by age group (%):
      Age     Auditory  Visual  Fused  Combo
      3-5     57        10      0      19
      7-8     36        21      11     32
      18-40   11        31      0      54

  17. Audio-Visual Sidebar • Visual cues affect the perception of speech in non-mismatched conditions, as well. • Scientific studies of lipreading date back to the early twentieth century. • The original goal: improve the speech perception skills of the hearing-impaired • Note: visual speech cues often complement audio speech cues • In particular: place of articulation • However, training people to become better lipreaders has proven difficult… • Some people get it; some people don’t.

  18. Sumby & Pollack (1954) • The first to investigate the influence of visual information on the perception of speech by normal-hearing listeners. • Method: • Presented individual word tokens to listeners in noise, with simultaneous visual cues. • Task: identify spoken word • Listening conditions: clear, +10 dB SNR, +5 dB SNR, 0 dB SNR
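To make the listening conditions concrete, here is a rough sketch of how a noise track can be scaled so that a signal-plus-noise mixture hits a target SNR. The sine wave standing in for speech, the sampling rate, and all function names are assumptions for illustration; this is not the procedure Sumby & Pollack used.

```python
import math
import random


def mean_power(samples):
    """Average power (mean squared amplitude) of a sample sequence."""
    return sum(x * x for x in samples) / len(samples)


def scale_noise_to_snr(signal, noise, target_db):
    """Return a copy of `noise` scaled so the signal-to-noise ratio is target_db."""
    target_noise_power = mean_power(signal) / (10 ** (target_db / 10))
    gain = math.sqrt(target_noise_power / mean_power(noise))
    return [gain * x for x in noise]


def measured_snr_db(signal, noise):
    """SNR in dB: 10 * log10 of the power ratio."""
    return 10 * math.log10(mean_power(signal) / mean_power(noise))


random.seed(0)
# Stand-in "speech": a 200 Hz sine sampled at 8 kHz (hypothetical signal).
signal = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

for target in (10, 5, 0):
    scaled = scale_noise_to_snr(signal, noise, target)
    mixture = [s + n for s, n in zip(signal, scaled)]  # what a listener would hear
    print(f"target {target:+d} dB SNR, measured {measured_snr_db(signal, scaled):+.2f} dB")
```

At 0 dB SNR the noise carries as much power as the signal; each 10 dB step up cuts the noise power by a factor of ten.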

  19. Sumby & Pollack data • [Figure: word identification scores, Auditory-Only vs. Audio-Visual] • Visual cues provide an intelligibility boost equivalent to a 12 dB increase in signal-to-noise ratio.
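For a sense of scale, a 12 dB change can be converted into plain ratios using the standard decibel definitions. The helper names below are made up for this sketch.

```python
def db_to_power_ratio(db):
    """Decibels to power ratio: 10 ** (dB / 10)."""
    return 10 ** (db / 10)


def db_to_amplitude_ratio(db):
    """Decibels to amplitude (pressure) ratio: 10 ** (dB / 20)."""
    return 10 ** (db / 20)


boost_db = 12  # the visual benefit quoted on the slide
print(f"{boost_db} dB = power ratio {db_to_power_ratio(boost_db):.2f}")          # 15.85
print(f"{boost_db} dB = amplitude ratio {db_to_amplitude_ratio(boost_db):.2f}")  # 3.98
```

In other words, seeing the talker is worth about as much as cutting the noise power to roughly one sixteenth.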

  20. Tadoma Method • Some deaf-blind people learn to perceive speech through the tactile modality, by using the Tadoma method.

  21. Audio-Tactile Perception • Fowler & Dekle: tested ability of (naive) college students to perceive speech through the Tadoma method. • Presented synthetic stops auditorily • Combined with mismatched tactile information: • Ex: audio /ga/ + tactile /ba/ • Also combined with mismatched orthographic information: • Ex: audio /ga/ + orthographic /ba/ • Task: listeners reported what they “heard” • Tactile condition biased listeners more towards “ba” responses

  22. Fowler & Dekle data • [Figure: two panels, orthographic mismatch condition (read “ba”) and tactile mismatch condition (felt “ba”)]

  23. Another Piece of the Puzzle • Another interesting finding which has been used to argue for the “speech is special” theory is duplex perception. • Take an isolated F3 transition and present it to one ear…

  24. Do the Edges First! • While presenting this spectral frame to the other ear:

  25. Two Birds with One Spectrogram • The resulting combo is perceived in duplex fashion: • One ear hears the F3 “chirp”; • The other ear hears the combined stimulus as “da”.

  26. Duplex Interpretation • Check out the spectrograms in Praat. • Mann and Liberman (1983) found: • Discrimination of the F3 chirps is gradient when they’re in isolation… • but categorical when combined with the spectral frame. • (Compare with the F3 discrimination experiment with Japanese and American listeners) • Interpretation: the “special” speech processor puts the two pieces of the spectrogram together.

  27. fMRI data • Benson et al. (2001) • Non-Speech stimuli = notes, chords, and chord progressions on a piano

  28. fMRI data • Benson et al. (2001) • Difference in activation for natural speech stimuli versus activation for sinewave speech stimuli

  29. Mirror Neurons • In the 1990s, researchers in Italy discovered what they called mirror neurons in the brains of macaques. • Macaques had been trained to make grasping motions with their hands. • Researchers recorded the activity of single neurons while the monkeys were making these motions. • Serendipity: • the same neurons fired when the monkeys saw the researchers making grasping motions. • → a neurological link between perception and action. • Motor theory claim: same links exist in the human brain, for the perception of speech gestures

  30. Motor Theory, in a nutshell • The big idea: • We perceive speech as abstract “gestures”, not sounds. • Evidence: • The perceptual interpretation of speech differs radically from the acoustic organization of speech sounds • Speech perception is multi-modal • Direct (visual, tactile) information about gestures can influence/override indirect (acoustic) speech cues • Limited top-down access to the primary, acoustic elements of speech

  31. Moving On… • One important lesson to take from the motor theory perspective is: • The dynamics of speech are generally more important to perception than static acoustic cues. • Note: visual chimerism and March Madness.

  32. Auditory Chimeras • Speech waveform + music spectrum, with 1, 2, 4, 8, 16, or 32 frequency bands • Music waveform + speech spectrum, with 1, 2, 4, 8, 16, or 32 frequency bands • Source: http://research.meei.harvard.edu/chimera/chimera_demos.html

  33. Auditory Chimeras • Speech1 waveform + speech2 spectrum, with 1, 2, 4, 6, 8, or 16 frequency bands • Speech2 waveform + speech1 spectrum, with 1, 2, 4, 6, 8, or 16 frequency bands

  34. Finally, Fricatives • The last type of sound we need to consider in speech acoustics is an aperiodic, continuous noise. • Ideally: • Q: What would the spectrum of this waveform look like?

  35. White Noise Spectrum • Technical term: White noise • has an unlimited range of frequency components • Analogy: white light is what you get when you combine all visible frequencies of the electromagnetic spectrum
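A small numerical check of the "flat spectrum" idea, as a sketch only. It generates Gaussian white noise, averages a naive DFT power spectrum over several trials, and compares the average power in the lower and upper halves of the analyzable band. With sampled noise, "unlimited range" in practice means "everything up to half the sampling rate". The sample count, trial count, and function name are arbitrary choices for this example.

```python
import cmath
import math
import random


def power_spectrum(samples):
    """Naive DFT power spectrum for bins 1 .. N/2 - 1 (DC and Nyquist left out)."""
    n = len(samples)
    spectrum = []
    for k in range(1, n // 2):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


random.seed(1)
n_samples, n_trials = 128, 40
average = [0.0] * (n_samples // 2 - 1)

# Average the spectra of many independent noise bursts to smooth out
# the bin-to-bin randomness of any single burst.
for _ in range(n_trials):
    noise = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
    for k, p in enumerate(power_spectrum(noise)):
        average[k] += p / n_trials

half = len(average) // 2
low_band = sum(average[:half]) / half
high_band = sum(average[half:]) / (len(average) - half)
print(f"mean power, low band:  {low_band:.2f}")
print(f"mean power, high band: {high_band:.2f}")
```

The two band averages should come out close to each other (and close to the noise variance of 1), which is what "equal energy at all frequencies" looks like in practice.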

  36. Turbulence • We can create aperiodic noise in speech by taking advantage of the phenomenon of turbulence. • Some handy technical terms: • laminar flow: a fluid flowing in parallel layers, with no disruption between the layers. • turbulent flow: a fluid flowing with chaotic property changes, including rapid variation in pressure and velocity in both space and time • Whether or not airflow is turbulent depends on: • the volume velocity of the fluid • the area of the channel through which it flows

  37. Turbulence • Turbulence is more likely with: • a higher volume velocity • less channel area • All fricatives therefore require: • a narrow constriction • high airflow

  38. Fricative Specs • Fricatives require great articulatory precision. • Some data for [s] (Subtelny et al., 1972): • alveolar constriction ≈ 1 mm • incisor constriction ≈ 2-3 mm • Larger constrictions result in -like sounds. • Generally, fricatives have a cross-sectional area between 6 and 12 mm². • Cross-sectional areas greater than 20 mm² result in laminar flow. • Airflow = 330 cm³/sec for voiceless fricatives • …and 240 cm³/sec for voiced fricatives
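The dependence on volume velocity and channel area can be made concrete with a back-of-the-envelope calculation. The sketch below computes the particle velocity through the constriction (volume velocity divided by area) and a Reynolds number, the standard fluid-dynamics index of how likely flow is to become turbulent. The viscosity value, the choice of the square root of the area as the characteristic dimension, and the function names are all simplifying assumptions of this example, not figures from the slides.

```python
import math

# Approximate kinematic viscosity of air at room temperature, in cm^2/s.
# Assumed value for this sketch.
KINEMATIC_VISCOSITY_AIR = 0.15


def particle_velocity(volume_velocity_cm3_s, area_mm2):
    """Air speed through the constriction in cm/s: volume velocity / area."""
    area_cm2 = area_mm2 / 100.0
    return volume_velocity_cm3_s / area_cm2


def reynolds_number(volume_velocity_cm3_s, area_mm2):
    """Re = v * d / nu, taking d = sqrt(area) as a rough characteristic dimension."""
    d_cm = math.sqrt(area_mm2 / 100.0)
    return particle_velocity(volume_velocity_cm3_s, area_mm2) * d_cm / KINEMATIC_VISCOSITY_AIR


# Airflow values from the slide (cm^3/s) and a range of constriction areas (mm^2).
for label, airflow in (("voiceless", 330), ("voiced", 240)):
    for area in (6, 12, 20):
        v = particle_velocity(airflow, area)
        re = reynolds_number(airflow, area)
        print(f"{label:9s} area {area:2d} mm^2: velocity {v:6.0f} cm/s, Re {re:6.0f}")
```

Under these assumptions the Reynolds number rises with airflow and falls as the constriction widens, matching the two conditions for turbulence listed above. The absolute values depend heavily on the assumed characteristic dimension, so only the trend should be read from this sketch.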

  39. Turbulence Sources • For fricatives, turbulence is generated by forcing a stream of air at high velocity through either a narrow channel in the vocal tract or against an obstacle in the vocal tract. • Channel turbulence • produced when airflow escapes from a narrow channel and hits inert outside air • Obstacle turbulence • produced when airflow hits an obstacle in its path

  40. Channel vs. Obstacle • Almost all fricatives involve an obstacle of some sort. • General rule of thumb: obstacle turbulence is much noisier than channel turbulence • [f] vs. • Also: obstacle turbulence is louder, the more perpendicular the obstacle is to the airflow • [s] vs. [x] • [x] is a “wall fricative”

  41. Sibilants • Alveolar, dental and post-alveolar fricatives form a special class (the sibilants) because their obstacle is the back of the upper teeth. • This yields high intensity turbulence at high frequencies.

  42. [ʃ] vs. [θ] • “shy” vs. “thigh”

  43. Fricative Noise • Fricative noise has some inherent spectral shaping • …like “spectral tilt” • Note: this is a source characteristic • This resembles what is known as pink noise: • Compare with white noise:
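As a reference point for the comparison, ideal pink noise has power proportional to 1/f, while ideal white noise has the same power at every frequency. The sketch below expresses both as relative levels in dB. The 1 kHz reference and the function names are arbitrary choices, and real fricative sources only approximate this idealized tilt.

```python
import math


def pink_level_db(freq_hz, ref_hz=1000.0):
    """Relative power level of ideal 1/f (pink) noise, in dB re: the level at ref_hz."""
    return 10 * math.log10(ref_hz / freq_hz)


def white_level_db(freq_hz, ref_hz=1000.0):
    """Ideal white noise has the same power at every frequency: always 0 dB."""
    return 0.0


for freq in (1000, 2000, 4000, 8000):
    print(
        f"{freq:5d} Hz: pink {pink_level_db(freq):6.2f} dB, "
        f"white {white_level_db(freq):5.2f} dB"
    )
```

Each doubling of frequency (one octave) lowers the pink noise level by about 3 dB, which is the downward "spectral tilt"; the white noise level does not change.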

  44. Fricative Shaping • The turbulence spectrum may be filtered by the resonating tube in front of the fricative. • (Due to narrowness of constriction, back cavity resonances don’t really show up.) • As usual, resonance is determined by length of the tube in front of the constriction. • → The longer the tube, the lower the “cut-off” frequency. • A basic example: • [s] vs.
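The relation between front-tube length and resonance can be sketched with the quarter-wavelength formula for a tube that is effectively closed at the constriction and open at the lips: F = c / (4L). The speed of sound and the two cavity lengths below are illustrative assumptions, not measurements from these slides.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air


def front_cavity_resonance(length_cm):
    """Lowest resonance (Hz) of a tube closed at one end and open at the other."""
    return SPEED_OF_SOUND_CM_S / (4 * length_cm)


# Hypothetical front cavity lengths: a short one (constriction close to the
# teeth) and a longer one (constriction further back, or rounded lips).
for length in (1.5, 3.0):
    print(f"front cavity {length:.1f} cm: lowest resonance {front_cavity_resonance(length):.0f} Hz")
```

Doubling the length of the front cavity halves its lowest resonance, so fricatives made further back in the mouth concentrate their noise at lower frequencies.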
