Transitions + Perception March 26, 2009
Remainders • Singer’s formant spectrum. • Damping and spectra.
Laterals • Laterals are produced by constricting the sides of the tongue towards the center of the mouth. • Air may pass through the mouth on both sides of the tongue… • or on just one side of the tongue.
Lateral Acoustics • The central constriction traps the flow of air in a “side branch” of the vocal tract. • This side branch makes the acoustics of laterals similar to the acoustics of nasals. • In particular: acoustic energy trapped in the side branch sets up “anti-formants” • Also: some damping • …but not as much as in nasals.
[Figure: vocal tract tube of 17.5 cm with a side branch of 4 cm] • Primary resonances of lateral approximants are the same as those for a vocal tract length of 17.5 cm • 500 Hz, 1500 Hz, 2500 Hz... • However, F1 is consistently low (300 - 400 Hz) • Anti-formant arises from a side tube of length 4 cm • AF1 = 2125 Hz
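The numbers on this slide can be checked with the quarter-wavelength formula for a tube closed at one end, F_n = (2n - 1) * c / (4 * L). The sketch below is only an illustration: the function name and the speed-of-sound values are assumptions, since the slide does not state which value of c it uses. The 500/1500/2500 Hz series matches c = 350 m/s, while AF1 = 2125 Hz matches c = 340 m/s (c = 350 m/s would give 2187.5 Hz).

```python
def quarter_wave_resonances(length_cm, speed_cm_per_s=35000.0, count=3):
    """Resonances of a tube closed at one end: F_n = (2n - 1) * c / (4 * L).

    speed_cm_per_s is an assumed speed of sound (35000 cm/s = 350 m/s).
    """
    return [
        (2 * n - 1) * speed_cm_per_s / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Main vocal tract, 17.5 cm long, with c assumed to be 350 m/s.
main_tract = quarter_wave_resonances(17.5)

# First anti-formant of the 4 cm side branch under two assumed values of c.
side_branch_350 = quarter_wave_resonances(4.0, count=1)[0]
side_branch_340 = quarter_wave_resonances(4.0, speed_cm_per_s=34000.0, count=1)[0]

print(main_tract)       # [500.0, 1500.0, 2500.0]
print(side_branch_350)  # 2187.5
print(side_branch_340)  # 2125.0
```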
Laterals in Reality • Check out the Mid-Waghi and Zulu laterals in Praat. • Mid-Waghi: [alala]
Velarization of [l] • [l] often has low F2 in English because it is velarized • = produced with the back of the tongue raised • = “dark” [l] • symbolized [ɫ] • Perturbation Theory flashback: • There is an anti-node for F2 in the velar region • constrictions there lower F2 • Check out the video evidence.
Dark vs. Clear /l/ • /l/ often has low F2 in English because it is velarized. [alala]
[l] vs. [n] • Laterals are usually more intense than nasals • less volume, less surface area = less damping • break between vowels and laterals is less clear [ ] [ n ]
[l] vs. [ɹ] • [l] and [ɹ] are primarily distinguished by F3 • much lower in [ɹ] • Also: [l] usually has lower F2 in English
Glides • Glides are vowel-like sonorants which are produced… • with slightly more constriction than a vowel at the same place of articulation. • Each glide corresponds to a different high vowel. • Vowel Glide Place • [i] [j] palatal (front, unrounded) • [u] [w] labio-velar (back, rounded) • [y] [ɥ] labial-palatal (front, rounded) • [ɯ] [ɰ] velar (back, unrounded) • Each glide’s acoustics will be similar to those of the vowel it corresponds to.
Glide Acoustics • Glides look like high vowels, but… • are shorter than vowels • They also tend to lack “steady states” • and exhibit rapid transitions into (or from) vowels • hence: “glides” • Also: lower in intensity • especially in the higher formants
Vowel-Glide-Vowel [iji] [uwu]
More Glides [wi:] [ju:]
Transitions • When stops are released, they go through a transition phase in between the stop and the vowel. • From stop to vowel: • Stop closure • Release burst • (glide-like) transition • “steady-state” vowel • Vowel-to-stop works the same way, in reverse, except: • Release burst (if any) comes after the stop closure.
Stop Components • From Armenian: [bag] • [Figure: annotated spectrogram; labels: vowel, closure voicing, formant transitions, stop release burst, another closure]
Confusions • When the spectrogram was first invented… • phoneticians figured out quite quickly how to identify vowels from their spectral characteristics… • but they had a much harder time learning how to identify stops by their place of articulation. • Eventually they realized: • the formant transitions between vowels and stops provided a reliable cue to place of articulation. • Why?
Formant Transitions • A: the resonant frequencies of the vocal tract change as stop gestures enter or exit the closure phase. • Simplest case: formant frequencies usually decrease near bilabial stops
Stops vs. Glides • Note: formant transitions are more rapid for stops than they are for glides. • Spectrogram examples: “baby” and “wave”
Formant Transitions: alveolars • For other places of articulation, the formant transition that appears is more complex. • From front vowels into alveolars, F2 tends to slope downward. • From back vowels into alveolars, F2 tends to slope upwards.
[hid] [hæd]
Formant Locus • Whether in a front vowel or back vowel context... • The formant transitions for alveolars tend to point to the same frequency value. ( 1650-1700 Hz) • This (apparent) frequency value is known as the locus of the formant transition. • In the ‘50s, researchers theorized: • the locus frequency can be used by listeners to reliably identify place of articulation. • However, velars posed a problem…
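The locus idea for alveolars can be sketched as a simple comparison: F2 moves from the vowel's steady-state value toward the locus as the stop closure approaches. This is a minimal illustration only. The function name is hypothetical, the locus is taken as 1700 Hz (the upper end of the range on the slide), and the two vowel F2 values are illustrative assumptions rather than measurements from the lecture.

```python
ALVEOLAR_F2_LOCUS_HZ = 1700.0  # upper end of the 1650-1700 Hz range on the slide


def f2_direction_into_stop(vowel_f2_hz, locus_hz=ALVEOLAR_F2_LOCUS_HZ):
    """Direction F2 moves from the vowel's steady state toward the stop closure."""
    if vowel_f2_hz > locus_hz:
        return "falling"
    if vowel_f2_hz < locus_hz:
        return "rising"
    return "flat"


# Illustrative (assumed) steady-state F2 values for a front and a back vowel.
front_vowel = f2_direction_into_stop(2300.0)
back_vowel = f2_direction_into_stop(900.0)

print(front_vowel)  # falling
print(back_vowel)   # rising
```

Both transitions point at the same frequency even though they slope in opposite directions, which is what makes the locus a candidate cue for place of articulation.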
Velar Transitions • Velar formant transitions do not always have a reliable locus frequency for F2. • Velars exhibit a lot of coarticulation with neighboring vowels. • Fronter (more palatal) next to front vowels • Locus is high: 1950-2000 Hz • Backer (more velar) next to back vowels • Locus is lower: < 1500 Hz • F2 and F3 often come together in velar transitions • “Velar Pinch”
The Velar Pinch [bag] [bak]
Testing the Theory • The earliest experiments on place perception were conducted in the 1950s, using a speech synthesizer known as the pattern playback.
Haskins Formant Transitions • Testing the perception of two-formant stimuli, with varying F2 transitions, led to a phenomenon known as categorical perception.
Categorical Perception • Categorical perception = • continuous physical distinctions are perceived in discrete categories. • In the in-class experiment from last time: • There were 11 different syllable stimuli • They only differed in the locus of their F2 transition • F2 Locus range = 726 - 2217 Hz • Source: http://www.ling.gu.se/~anders/KatPer/Applet/index.eng.html
Example stimuli from the in-class experiment: Stimulus #1, Stimulus #6, Stimulus #11.
Identification • In Categorical Perception: • All stimuli within a category boundary should be labeled the same.
Discrimination • Original task: ABX discrimination • Stimuli across category boundaries should be 100% discriminable. • Stimuli within category boundaries should not be discriminable at all. In practice, categorical perception means: the discrimination function can be determined from the identification function.
Identification Discrimination • Let’s consider a case where the two sounds in a discrimination pair are the same. • Example: the pair is stimulus 3 followed by stimulus 3 • Identification data: Stimulus 3 is identified as: • [b] 95% of the time • [d] 5% of the time • The discrimination pair will be perceived as: • [b] - [b]: .95 * .95 = .9025 • [d] - [d]: .05 * .05 = .0025 • Probability of same response is predicted to be: • (.9025 + .0025) = .905 = 90.5%
Identification Discrimination • Let’s consider a case where the two sounds in a discrimination pair are different. • Example: the pair is stimulus 9 followed by stimulus 11 • Identification data: • Stimulus 9: [d] 80% of the time, [g] 20% of the time • Stimulus 11: [d] 5% of the time, [g] 95% of the time • The discrimination pair will be perceived as: • [d] - [d]: .80 * .05 = .04 • [g] - [g]: .20 * .95 = .19 • Probability of same response is predicted to be: • (.04 + .19) = .23 = 23%
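The two worked examples follow the same rule: for each label, multiply the probability that the first stimulus gets that label by the probability that the second stimulus gets it, then sum over labels. A minimal sketch of that calculation is below, assuming the two stimuli are labeled independently. The function and variable names are hypothetical; the identification percentages are the ones given on the slides.

```python
def predicted_same(ident_a, ident_b):
    """Predicted probability that two stimuli receive the same label.

    ident_a and ident_b map each label to its identification probability.
    Assumes the two stimuli are labeled independently of each other.
    """
    labels = set(ident_a) | set(ident_b)
    return sum(ident_a.get(label, 0.0) * ident_b.get(label, 0.0) for label in labels)


# Identification data from the slides.
stim3 = {"b": 0.95, "d": 0.05}
stim9 = {"d": 0.80, "g": 0.20}
stim11 = {"d": 0.05, "g": 0.95}

same_3_3 = predicted_same(stim3, stim3)
same_9_11 = predicted_same(stim9, stim11)

print(round(same_3_3, 4))   # 0.905
print(round(same_9_11, 4))  # 0.23
```

The predicted probability of a "different" response is one minus each of these values.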
Discrimination • In this discrimination graph: • Solid line is the observed data • Dashed line is the predicted data • (on the basis of the identification scores) • Note: the actual listeners did a little bit better than the predictions.
Categorical, Continued • Categorical Perception was also found for VOT distinctions. • And for stop/glide/vowel distinctions: 10 ms transitions: [b] percept 60 ms transitions: [w] percept 200 ms transitions: [u] percept
Interpretation • Main idea: in categorical perception, the mind translates an acoustic stimulus into a phonemic label (category). • The acoustic details of the stimulus are discarded in favor of an abstract representation. • A continuous acoustic signal: • Is thus transformed into a series of linguistic units:
The Next Level • Interestingly, categorical perception is not found for non-speech stimuli. • Miyawaki et al. tested perception of an F3 continuum between /r/ and /l/.
The Next Level • They also tested perception of the F3 transitions in isolation. • Listeners did not perceive these transitions categorically.
The Implications • Interpretation: we do not perceive speech in the same way we perceive other sounds. • “Speech is special”… • and the perception of speech is modular. • A module is a special processor in our minds/brains devoted to interpreting a particular kind of environmental stimulus.
Module Characteristics • You can think of a module as a “mental reflex”. • A module of the mind is defined as having the following characteristics: • Domain-specific • Automatic • Fast • Hard-wired in brain • Limited top-down access (you can’t “unperceive”) • Example: the sense of vision operates modularly.
A Modular Mind Model • central processes: judgment, imagination, memory, attention • modules: vision, hearing, touch, speech • transducers: eyes, ears, skin, etc. • external, physical reality
Remember this stuff? • Speech is a “special” kind of sound because it exhibits spectral change over time. • it’s processed by the speech module, not by the auditory module.
SWS Findings • The uninitiated hear sinewave speech (SWS) either as speech or as “whistles”, “chirps”, etc. • Claim: once you hear it as speech, you can’t go back. • The speech module takes precedence • (Limited top-down access) • Analogy: it’s impossible to not perceive real speech as speech. • We can’t hear the individual formants as whistles, chirps, etc. • Motor theory says: we don’t perceive the “sounds”, we perceive the gestures which shape the spectrum.