Transitions + Perception March 26, 2009
Remainders • Singer’s formant spectrum. • Damping and spectra.
Laterals • Laterals are produced by constricting the sides of the tongue towards the center of the mouth. • Air may pass through the mouth either on both sides of the tongue… • or on just one side of the tongue.
Lateral Acoustics • The central constriction traps the flow of air in a “side branch” of the vocal tract. • This side branch makes the acoustics of laterals similar to the acoustics of nasals. • In particular: acoustic energy trapped in the side branch sets up “anti-formants” • Also: some damping • …but not as much as in nasals.
Tube model: 17.5 cm vocal tract with a 4 cm side branch • Primary resonances of lateral approximants are the same as those for a vocal tract length of 17.5 cm • 500 Hz, 1500 Hz, 2500 Hz... • However, F1 is consistently low (300 - 400 Hz) • The anti-formant arises from a side tube of length 4 cm • AF1 = 2125 Hz
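To make the arithmetic behind these numbers explicit, here is a minimal sketch (not from the original slides) of the quarter-wavelength formula Fn = (2n - 1) * c / (4L) for a tube closed at one end, assuming a speed of sound of about 34,000 cm/s, which is the value that reproduces the figures above.

C = 34_000  # assumed speed of sound in the vocal tract, cm/s

def quarter_wave_resonances(length_cm, n_resonances=3):
    # First few resonant frequencies (Hz) of a uniform tube closed at one end
    return [(2 * n - 1) * C / (4 * length_cm) for n in range(1, n_resonances + 1)]

print(quarter_wave_resonances(17.5))     # 17.5 cm tract: ~486, ~1457, ~2429 Hz (roughly 500, 1500, 2500)
print(quarter_wave_resonances(4.0, 1))   # 4 cm side branch: 2125 Hz, the first anti-formant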
Laterals in Reality • Check out the Mid-Waghi and Zulu laterals in Praat. • Mid-Waghi: [alala]
Velarization of [l] • [l] often has low F2 in English because it is velarized • = produced with the back of the tongue raised • = “dark” [l] • symbolized [ɫ] • Perturbation Theory flashback: • There is an anti-node for F2 in the velar region • constrictions there lower F2 • Check out the video evidence.
Dark vs. Clear /l/ • /l/ often has low F2 in English because it is velarized. [alala]
[l] vs. [n] • Laterals are usually more intense than nasals • less volume, less surface area = less damping • break between vowels and laterals is less clear • Spectrograms: [ l ] [ n ]
[l] vs. [ɹ] • [l] and [ɹ] are primarily distinguished by F3 • much lower in [ɹ] • Also: [l] usually has lower F2 in English • Spectrograms: [ l ] [ ɹ ]
Glides • Glides are vowel-like sonorants which are produced… • with slightly more constriction than a vowel at the same place of articulation. • Each glide corresponds to a different high vowel. • Vowel Glide Place • [i] [j] palatal (front, unrounded) • [u] [w] labio-velar (back, rounded) • [y] [ɥ] labial-palatal (front, rounded) • [ɯ] [ɰ] velar (back, unrounded) • Each glide’s acoustics will be similar to those of the vowel it corresponds to.
Glide Acoustics • Glides look like high vowels, but… • are shorter than vowels • They also tend to lack “steady states” • and exhibit rapid transitions into (or from) vowels • hence: “glides” • Also: lower in intensity • especially in the higher formants
Vowel-Glide-Vowel [iji] [uwu]
More Glides [wi:] [ju:]
Transitions • When stops are released, there is a transition phase between the stop and the following vowel. • From stop to vowel: • Stop closure • Release burst • (glide-like) transition • “steady-state” vowel • Vowel-to-stop works the same way, in reverse, except: • The release burst (if any) comes after the stop closure.
Stop Components • From Armenian: [bag] • Spectrogram labels: vowel, closure voicing, formant transitions, stop release burst, another closure.
Confusions • When the spectrogram was first invented… • phoneticians figured out quite quickly how to identify vowels from their spectral characteristics… • but they had a much harder time learning how to identify stops by their place of articulation. • Eventually they realized: • the formant transitions between vowels and stops provided a reliable cue to place of articulation. • Why?
Formant Transitions • A: the resonant frequencies of the vocal tract change as stop gestures enter or exit the closure phase. • Simplest case: formant frequencies usually decrease near bilabial stops
Stops vs. Glides • Note: formant transitions are more rapid for stops than they are for glides. • Spectrograms: “baby” vs. “wave”
Formant Transitions: alveolars • For other places of articulation, the formant transition that appears is more complex. • From front vowels into alveolars, F2 tends to slope downward. • From back vowels into alveolars, F2 tends to slope upwards.
[hid] [hæd]
Formant Locus • Whether in a front vowel or back vowel context... • The formant transitions for alveolars tend to point to the same frequency value (1650-1700 Hz). • This (apparent) frequency value is known as the locus of the formant transition. • In the ‘50s, researchers theorized: • the locus frequency can be used by listeners to reliably identify place of articulation. • However, velars posed a problem…
Velar Transitions • Velar formant transitions do not always have a reliable locus frequency for F2. • Velars exhibit a lot of coarticulation with neighboring vowels. • Fronter (more palatal) next to front vowels • Locus is high: 1950-2000 Hz • Backer (more velar) next to back vowels • Locus is lower: < 1500 Hz • F2 and F3 often come together in velar transitions • “Velar Pinch”
The Velar Pinch [bag] [bak]
Testing the Theory • The earliest experiments on place perception were conducted in the 1950s, using a speech synthesizer known as the pattern playback.
Haskins Formant Transitions • Testing the perception of two-formant stimuli with varying F2 transitions led to the discovery of a phenomenon known as categorical perception.
Categorical Perception • Categorical perception = • continuous physical distinctions are perceived in discrete categories. • In the in-class experiment from last time: • There were 11 different syllable stimuli • They only differed in the locus of their F2 transition • F2 Locus range = 726 - 2217 Hz • Source: http://www.ling.gu.se/~anders/KatPer/Applet/index.eng.html
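As a rough illustration (assuming the eleven loci were evenly spaced, which the slide does not state explicitly), the stimulus continuum can be generated like this:

lo_hz, hi_hz, n_steps = 726.0, 2217.0, 11  # endpoints from the slide; even spacing is an assumption

f2_loci = [lo_hz + i * (hi_hz - lo_hz) / (n_steps - 1) for i in range(n_steps)]
print([round(f, 1) for f in f2_loci])  # eleven F2 loci from 726 Hz to 2217 Hz, about 149 Hz apart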
Example stimuli from the in-class experiment: Stimulus #1, Stimulus #6, Stimulus #11.
Identification • In Categorical Perception: • All stimuli on the same side of a category boundary should be labeled the same.
Discrimination • Original task: ABX discrimination • Stimuli across a category boundary should be 100% discriminable. • Stimuli within a category should not be discriminable at all. • In practice, categorical perception means: the discrimination function can be predicted from the identification function.
Identification Discrimination • Let’s consider a case where the two sounds in a discrimination pair are the same. • Example: the pair is stimulus 3 followed by stimulus 3 • Identification data: Stimulus 3 is identified as: • [b] 95% of the time • [d] 5% of the time • The discrimination pair will be perceived as: • [b]-[b]: .95 * .95 = .9025 • [d]-[d]: .05 * .05 = .0025 • Probability of a “same” response is predicted to be: • (.9025 + .0025) = .905 = 90.5%
Identification Discrimination • Let’s consider a case where the two sounds in a discrimination pair are different. • Example: the pair is stimulus 9 followed by stimulus 11 • Identification data: • Stimulus 9: [d] 80% of the time, [g] 20% of the time • Stimulus 11: [d] 5% of the time, [g] 95% of the time • The discrimination pair will be perceived as: • [d]-[d]: .80 * .05 = .04 • [g]-[g]: .20 * .95 = .19 • Probability of a “same” response is predicted to be: • (.04 + .19) = .23 = 23%
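A compact way to state the prediction worked through on these two slides: assuming listeners label each member of the pair independently, the predicted probability of a “same” response is the sum, over labels, of the product of the two stimuli’s identification probabilities. A minimal sketch in Python, using only the percentages from the slides (the function is a generic restatement, not the original analysis code):

def predicted_same(ident_a, ident_b):
    # ident_a, ident_b: dicts mapping category labels to identification probabilities
    # Predicted P("same") = sum over labels of P(label | A) * P(label | B)
    labels = set(ident_a) | set(ident_b)
    return sum(ident_a.get(lab, 0.0) * ident_b.get(lab, 0.0) for lab in labels)

# Same pair (stimulus 3, stimulus 3): .95*.95 + .05*.05
print(predicted_same({"b": 0.95, "d": 0.05}, {"b": 0.95, "d": 0.05}))  # ~0.905

# Different pair (stimulus 9, stimulus 11): .80*.05 + .20*.95
print(predicted_same({"d": 0.80, "g": 0.20}, {"d": 0.05, "g": 0.95}))  # ~0.23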
Discrimination • In this discrimination graph: • Solid line: the observed data • Dashed line: the predicted data • (on the basis of the identification scores) • Note: the actual listeners did a little bit better than the predictions.
Categorical, Continued • Categorical Perception was also found for VOT distinctions. • And for stop/glide/vowel distinctions: • 10 ms transitions: [b] percept • 60 ms transitions: [w] percept • 200 ms transitions: [u] percept
Interpretation • Main idea: in categorical perception, the mind translates an acoustic stimulus into a phonemic label (a category). • The acoustic details of the stimulus are discarded in favor of an abstract representation. • A continuous acoustic signal is thus transformed into a series of linguistic units.
The Next Level • Interestingly, categorical perception is not found for non-speech stimuli. • Miyawaki et al. tested perception of an F3 continuum between /r/ and /l/.
The Next Level • They also tested perception of the F3 transitions in isolation. • Listeners did not perceive these transitions categorically.
The Implications • Interpretation: we do not perceive speech in the same way we perceive other sounds. • “Speech is special”… • and the perception of speech is modular. • A module is a special processor in our minds/brains devoted to interpreting a particular kind of environmental stimulus.
Module Characteristics • You can think of a module as a “mental reflex”. • A module of the mind is defined as having the following characteristics: • Domain-specific • Automatic • Fast • Hard-wired in brain • Limited top-down access (you can’t “unperceive”) • Example: the sense of vision operates modularly.
A Modular Mind Model • Diagram: external, physical reality → transducers (eyes, ears, skin, etc.) → modules (vision, hearing, touch, speech) → central processes (judgment, imagination, memory, attention).
Remember this stuff? • Speech is a “special” kind of sound because it exhibits spectral change over time. • It’s processed by the speech module, not by the auditory module.
SWS Findings • The uninitiated either hear sinewave speech as speech or as “whistles”, “chirps”, etc. • Claim: once you hear it as speech, you can’t go back. • The speech module takes precedence • (Limited top-down access) • Analogy: it’s impossible to not perceive real speech as speech. • We can’t hear the individual formants as whistles, chirps, etc. • Motor theory says: we don’t perceive the “sounds”, we perceive the gestures which shape the spectrum.