The Perception of Speech
Speech • Speech is for rapid communication • Speech is composed of units of sound called phonemes • Examples of phonemes: /b/ in bat, /p/ in pat
Seeing Sound with Spectrograms • A spectrogram is a 3D plot of sound: frequency is plotted against time, and intensity is often coded by colour
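To make the three dimensions concrete, here is a minimal sketch of how a spectrogram can be computed: the sound is cut into short overlapping frames (time), each frame is broken into frequency bins with a discrete Fourier transform (frequency), and the magnitude in each bin is the intensity. The sample rate, frame size, hop size, and the two-tone test signal are illustrative assumptions, not values from the slides.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)
FRAME_SIZE = 64     # samples per analysis frame (assumed)
HOP = 32            # samples between frame starts (assumed)


def spectrogram(samples, frame_size=FRAME_SIZE, hop=HOP):
    """Return a list of frames; each frame lists one magnitude per frequency bin."""
    # A Hann window tapers each frame so its edges do not smear the spectrum.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        # Plain discrete Fourier transform, keeping non-negative frequencies only.
        for k in range(frame_size // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames


def tone(freq_hz, n_samples):
    """Pure sine tone, standing in for a real speech recording."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]


def peak_hz(frame):
    """Frequency (Hz) of the most intense bin in one frame."""
    k = max(range(len(frame)), key=frame.__getitem__)
    return k * SAMPLE_RATE / FRAME_SIZE


# Toy "utterance": a 1000 Hz tone followed by a 2000 Hz tone.
signal = tone(1000, 512) + tone(2000, 512)
spec = spectrogram(signal)
print(peak_hz(spec[0]), peak_hz(spec[-1]))
```

Plotting `spec` with time on one axis, bin frequency on the other, and magnitude as colour gives the picture described on the slide. Real speech would show several bands of energy per frame rather than a single peak.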
Acoustic Properties of Speech • Speech can be characterized by a spectrogram • The spectrogram reveals differences between phonemes • The differences are in the formants (bands of concentrated energy at the resonant frequencies of the vocal tract) and the formant transitions
Perceiving Speech • So perceiving (interpreting) speech sounds is simply a matter of matching the spectrotemporal properties (the shape of the spectrogram) of the incoming sound waves to the appropriate phoneme, right? • If so, specific phonemes must correspond to specific spectrograms, a property called acoustic-phonetic invariance
Perceiving Speech • Acoustic-phonetic invariance says that each phoneme should match one and only one pattern in the spectrogram • This is not the case! For example, the spectrogram of /d/ changes depending on the vowel that follows it • Clearly, perceiving and understanding speech sounds is more elaborate than simply interpreting an internal spectrogram
Perceiving Speech • The phrase “Peter buttered the burnt toast” has five /t/ phonemes, but the spectrogram does not show five identical sweeps
Perceiving Speech • The Segmentation Problem • Segmentation is the perception of silence between words • That silence is often illusory
Perceiving Speech • The phrase “I owe you a Yo-Yo” has no silence in it!
Spoken Input • The Segmentation Problem: • The stream of acoustic input is not physically segmented into discrete phonemes, words, phrases, etc. • Silent gaps don’t always indicate (aren’t always perceived as) interruptions in speech • A continuous speech stream is sometimes perceived as having gaps
Perceiving Speech • So how do you perceive speech? Some of the “strategies”: 1. reduce the data 2. use context clues 3. use vision
Categorical Perception • Categorical perception is a phenomenon in which the brain assigns a stimulus to one category or another but never to an intermediate category
Categorical Perception • For example, /ba/ and /pa/ differ in when voicing begins • /ba/ is formed by stopping the flow of air from the lungs and then releasing it; the vocal folds start vibrating about 10 milliseconds after the release (this delay is called voice onset time) • /pa/ is similar except that voice onset time is about 50 ms
Categorical Perception • Voice onset time can range from zero to more than 50 ms. For example, you could synthesize a sound with a voice onset time of 30 ms, but... • English speakers will hear either /ba/ or /pa/ but never something in between
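One common way to picture categorical perception is as a very steep identification curve laid over the continuous voice onset time axis. The sketch below is only an illustration: the 30 ms boundary is assumed here as the midpoint of the 10 ms and 50 ms values above, and the slope is an arbitrary choice, so neither is a measured value.

```python
import math

BOUNDARY_MS = 30.0  # assumed category boundary (midpoint of 10 ms and 50 ms)
SLOPE_PER_MS = 0.5  # assumed steepness of the identification curve


def prob_pa(vot_ms):
    """Modelled probability that a listener labels the sound /pa/."""
    return 1.0 / (1.0 + math.exp(-SLOPE_PER_MS * (vot_ms - BOUNDARY_MS)))


def perceive(vot_ms):
    """Return the category label; there is no 'in between' label."""
    return "/pa/" if prob_pa(vot_ms) >= 0.5 else "/ba/"


# The stimulus changes smoothly, but the label flips abruptly near the boundary.
for vot in range(0, 61, 10):
    print(f"VOT {vot:2d} ms -> {perceive(vot)}  (p(/pa/) = {prob_pa(vot):.3f})")
```

The point of the design is that the input is continuous while the output has only two possible values, which is what "never something in between" means in practice.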
Categorical Perception is Part of Learning a Language • Babies can discriminate /ba/ from /pa/ and can discriminate these from phonemes with intermediate voice onset times! • By 10 to 12 months, babies (learning English) stop discriminating intermediate voice onset times
Categorical Perception is Part of Learning a Language • Once category boundaries are learned it is impossible to unlearn them • Non-native speakers of any language often cannot hear certain phonemes the way native speakers do • As a consequence, they will always have at least a slight accent
Perception (of all types) Makes Use of Context • The stream of information contained in speech is usually ambiguous and incomplete • Your brain makes a “best guess” based on the circumstances
Perception (of all types) Makes Use of Context • Consider the following example, in which a cough replaces the first sound of one word: “The __eel fell off the shoe”. “The __eel fell off the car”. • Listeners report hearing the “appropriate” phoneme during the cough (for example, “heel” before “shoe” and “wheel” before “car”)
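The “best guess” idea can be sketched as picking whichever candidate word fits the context best. Everything in this sketch is hypothetical: the candidate words, the assumption that listeners restore “heel” before “shoe” and “wheel” before “car”, and the fit scores are made up purely for illustration.

```python
# Hypothetical scores for how well each "_eel" candidate fits the final word.
CONTEXT_FIT = {
    "shoe": {"heel": 0.90, "wheel": 0.05, "peel": 0.05},
    "car": {"heel": 0.05, "wheel": 0.90, "peel": 0.05},
}


def restore(final_word):
    """Pick the candidate that best fits the context word."""
    candidates = CONTEXT_FIT[final_word]
    return max(candidates, key=candidates.get)


for word in ("shoe", "car"):
    print(f"The __eel fell off the {word} -> heard as '{restore(word)}'")
```

The acoustic input is identical in both sentences, so only the context term decides the outcome.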
Much of Speech Perception isn’t Auditory! • Why rely on only one sensory system when there is information in two? • The brain seamlessly integrates any information it is given; this is called cross-modal integration
Cross-modal Integration • Speech perception involves the synthesis of vision and hearing • The McGurk effect demonstrates the critical role of vision in speech perception
Cross-modal Integration • The McGurk effect suggests that visual and auditory information are combined to enhance speech perception under normal circumstances • When visual and auditory information are incongruous, the resulting perception is unpredictable and often wrong