Auditory Perception April 9, 2009
Auditory vs. Acoustic • So far, we’ve seen two different auditory measures: • Mels (unit of perceived pitch) • Auditory correlate of Hertz (frequency) • Sones (unit of perceived loudness) • Auditory correlate of decibels (intensity) • Both were derived from pitch and loudness estimation experiments…
Masking • Another scale for measuring auditory frequency emerged in the 1960s. • This scale was inspired by the phenomenon of auditory masking. • One sound can “mask”, or obscure, the perception of another. • Unmasked: • Masked: • Q: How narrow can we make the bandwidth of the masking noise before the sinewave becomes perceptible? • A: The masking bandwidth is narrower at lower frequencies.
Critical Bands • Using this methodology, researchers eventually determined that there are 24 critical bands of hearing. • The auditory system integrates all acoustic energy within each band. • Two tones within the same critical band of frequencies sound like one tone. • Ex: critical band #9 ranges from 920-1080 Hz • F1 and F2 for some vowels might merge together. • Each critical band spans about 0.9 mm on the basilar membrane. • The auditory system thus acts like a bank of 24 band-pass filters. • Each filter corresponds to one unit on the Bark scale.
Bark Scale of Frequency • The Bark scale converts acoustic frequency (in Hz) into a critical-band number, from 1 to 24.
Bark Table (all values in Hz)

Band   Center   Band edges      Band   Center   Band edges
  1      50       20-100         13     1850     1720-2000
  2     150      100-200         14     2150     2000-2320
  3     250      200-300         15     2500     2320-2700
  4     350      300-400         16     2900     2700-3150
  5     450      400-510         17     3400     3150-3700
  6     570      510-630         18     4000     3700-4400
  7     700      630-770         19     4800     4400-5300
  8     840      770-920         20     5800     5300-6400
  9    1000      920-1080        21     7000     6400-7700
 10    1170     1080-1270        22     8500     7700-9500
 11    1370     1270-1480        23    10500     9500-12000
 12    1600     1480-1720        24    13500    12000-15500
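In code, the table can be approximated with a closed-form Hz-to-Bark conversion. A minimal sketch in Python, using the analytic approximation of Zwicker & Terhardt (1980) — one common choice among several published approximations:

```python
import math

def hz_to_bark(f):
    """Critical-band rate (Bark) for a frequency f in Hz, using the
    Zwicker & Terhardt (1980) analytic approximation."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

# Sanity check against the table: each band's center should fall roughly
# halfway through its Bark band, i.e. hz_to_bark(center) ~ band - 0.5.
for band, center in [(1, 50), (9, 1000), (17, 3400), (24, 13500)]:
    print(f"band {band:2d}: {center:5d} Hz -> {hz_to_bark(center):4.1f} Bark")
```

Each integer step on the resulting scale corresponds to one critical band in the table above.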
Your Grandma’s Spectrograph • Originally, spectrographic analyzing filters were constructed to have either wide or narrow bandwidths (hence wideband vs. narrowband spectrograms).
Spectral Differences • Acoustic vs. auditory spectra of F1 and F2
Cochleagrams • Cochleagrams are spectrogram-like representations which incorporate auditory transformations for both pitch and loudness perception • Acoustic spectrogram vs. auditory cochleagram representation of Cantonese word • Check out Peter’s vowels in Praat.
Cochlear Implants • Cochlear implants transmit sound directly to the cochlea through a series of band-pass filters… • like the critical bands in our native auditory system. • These devices can benefit profoundly deaf listeners with nerve deafness • i.e., loss of working hair cells in the inner ear. • Contrast with: a hearing aid, which is simply an amplifier. • Old style: amplifies all frequencies • New style: amplifies specific frequencies, based on a listener’s particular hearing capabilities.
Cochlear Implants A Cochlear Implant artificially stimulates the nerves which are connected to the cochlea.
Nuts and Bolts • The cochlear implant chain of events: • Microphone • Speech processor • Electrical stimulation • What the CI user hears is entirely determined by the code in the speech processor • The number of electrodes stimulating the cochlea ranges from 8 to 22 • → poor frequency resolution • Also: cochlear implants cannot stimulate the low-frequency regions of the auditory nerve
Noise Vocoding • The speech processor operates like a series of critical bands. • It divides the frequency scale into 8 (or up to 22) bands and stimulates each electrode according to the average intensity in its band. • This results in what sounds (to us) like a highly degraded version of natural speech. • A rough software sketch of the idea appears below.
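A minimal noise-vocoding sketch in Python (assuming NumPy and SciPy; this illustrates the idea, not any manufacturer’s actual processing strategy): band-pass the signal into a few channels, track each channel’s amplitude envelope, and use the envelopes to modulate band-limited noise.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt, sosfiltfilt

def noise_vocode(signal, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
    """Split the signal into n_channels log-spaced band-pass channels
    (standing in for critical bands), extract each channel's amplitude
    envelope, and use it to modulate band-limited noise. Channel count,
    band edges, and filter orders are illustrative choices."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)      # band edges in Hz
    noise = np.random.default_rng(0).standard_normal(len(signal))
    env_sos = butter(2, 50.0, btype="low", fs=fs, output="sos")
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfilt(band_sos, signal)
        envelope = np.abs(hilbert(band))                  # amplitude envelope
        envelope = np.maximum(sosfiltfilt(env_sos, envelope), 0.0)  # smooth to ~50 Hz
        out += envelope * sosfilt(band_sos, noise)        # envelope-modulated noise
    return out
```

Summing the envelope-modulated noise bands yields the degraded-but-intelligible speech quality described above.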
What CIs Sound Like • Check out some nursery rhymes which have been processed through a CI simulator:
CI Perception • One thing that is missing from vocoded speech is F0… • it only encodes spectral change. • Last year, Aaron Byrnes put together an experiment testing intonation perception in CI-simulated speech for his honors thesis. • Tested: discrimination of questions vs. statements • and identification of the most prominent word in a sentence. • 8 channels: • 22 channels:
The Findings • CI User: • Excellent identification of the most prominent word. • At chance (50%) when distinguishing between statements and questions. • Normal-hearing listeners (hearing simulated speech): • Good (90-95%) identification of the prominent word. • Not too shabby (75%) at distinguishing statements and questions. • Conclusion 1: F0 information doesn’t get through the CI. • Conclusion 2: Noise-vocoded speech might not be a completely accurate CI simulation.
Mitigating Factors • Success with cochlear implants is highly variable. • They work best for those who had hearing before they became deaf. • The earlier a person receives an implant, the better they can function with it later in life. • They work best for (in order): • Environmental sounds • Speech • Speaking on the telephone (bad) • Music (really bad)
Practical Considerations • It is largely unknown, before they receive it, how well anyone will perform with a cochlear implant. • Possible predictors: • Lipreading ability • (rapid acoustic cues for place of articulation are largely obscured by the noise-vocoding process, so visual information can compensate) • fMRI scans of brain activity during presentation of auditory stimuli.
Infrared Implants? • Some very recent research has shown that cells in the inner ear can be activated through stimulation by infrared light. • This may enable the eventual development of cochlear implants with very precise frequency and intensity tuning. • Another research strategy is that of trying to regrow hair cells in the inner ear.
One Last Auditory Thought • Frequency coding of sound is found all the way up in the auditory cortex. • Also: some neurons only fire when sounds change.
A Philosophical Interlude • Q: What’s a category? • A classical answer: • A category is defined by properties. • All members of the category exhibit the same properties. • No non-members of the category exhibit all of those properties. • The properties of any member of the category may be split into: • Definitive properties • Incidental properties
Classical Example • A rectangle (in Euclidean geometry) may be defined as having the following properties: • 1. Four-sided, two-dimensional figure (quadrilateral) • 2. Four right angles • This is a rectangle.
Classical Example • Adding a third property gives the figure a different category classification: • 1. Four-sided, two-dimensional figure (quadrilateral) • 2. Four right angles • 3. Four equally long sides • This is a square.
Classical Example • Altering other properties does not change the category classification: • 1. Four-sided, two-dimensional figure (quadrilateral) • 2. Four right angles • 3. Four equally long sides (definitive properties) • A. Is red (incidental property) • This is still a square.
Classical Linguistic Categories • Formal phonology traditionally defined all possible speech sounds in terms of a limited number of properties, known as “distinctive features”. (Chomsky & Halle, 1968) • [d] = [CORONAL, +voice, -continuant, -nasal, etc.] • [n] = [CORONAL, +voice, -continuant, +nasal, etc.] • … • Similar approaches have been applied in syntactic analysis. (Chomsky, 1974) • Adjectives = [+N, +V] • Prepositions = [-N, -V] • A toy rendering of feature-defined categories is sketched below.
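As a toy illustration in Python (the feature set follows the slide; the helper function is made up for this example, not taken from any phonological toolkit), classical feature-defined categories can be modeled as feature bundles plus a membership check on definitive properties:

```python
# Each segment is a bundle of feature values; a classical category is the
# set of definitive feature values that all members must share.
SEGMENTS = {
    "d": {"place": "CORONAL", "voice": True,  "continuant": False, "nasal": False},
    "n": {"place": "CORONAL", "voice": True,  "continuant": False, "nasal": True},
    "t": {"place": "CORONAL", "voice": False, "continuant": False, "nasal": False},
}

def members(definitive):
    """All segments exhibiting every definitive property of the category."""
    return [seg for seg, feats in SEGMENTS.items()
            if all(feats.get(k) == v for k, v in definitive.items())]

# "Voiced coronal [-continuant]" as a classical category, defined purely by
# properties; nasality is incidental to this particular category.
print(members({"place": "CORONAL", "voice": True, "continuant": False}))
# -> ['d', 'n']
```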
Prototypes • The psychological reality of classical categories was called into question by a series of studies conducted by Eleanor Rosch in the 1970s. • Rosch claimed that categories were organized around privileged category members, known as prototypes. • (instead of being defined by properties) • Evidence for this theory initially came from linguistic tasks: • Semantic verification (Rosch, 1975) • Is a robin a bird? • Is a penguin a bird? • Category member naming.
Exemplar Categories • Cognitive psychologists in the late ‘70s (e.g., Medin & Schaffer, 1978) questioned the need for prototypes. • Phenomena explained by prototype theory could be explained without recourse to a category prototype. • The basic idea: • Categories are defined by extension. • Neither prototypes nor properties are necessary. • Categorization works by comparing new tokens to all exemplars in memory. • Generalization happens on the fly.
A Category, Exemplar-style “square”
Back to Perception • When people used to talk about categorical perception, they meant perception of classical categories. • A stop is either a [b] or a [g] • (no in between) • Remember: in classical categories, there are: • definitive properties • incidental properties • Q: What are the properties that define a stop category? • The definitive properties must be invariant. • (shared by all category members) • So…what are the invariant properties of stop categories?
The Acoustic Hypothesis • People have looked long and hard for invariant acoustic properties of stops, with little success. • (and some people are still looking) • Classic demonstration: the frequency values of compact (synthetic) bursts that cue a given place of articulation vary with the vowel context. • (Liberman et al., 1952)
Theoretical Revision • Since invariant acoustic properties could not be found (especially for velars)… • It was assumed that listeners perceived (articulatory) gestures, not (acoustic) sounds. • Q: What invariant articulatory properties define stop categories? • A: If they exist, they’re hard to find. • Motor Theory Revision #2: Listeners perceive “intended” gestures. • Note: “intentions” are kind of impossible to observe. • But they must be invariant…right?
Another Brick in the Wall • Another problem for motor theory: • Perception of speech sounds isn’t always categorical. • In particular: vowels are perceived in a more gradient fashion than stops. • However, vowel perception becomes more categorical when the vowels are extremely short.
It’s also hard to identify any invariant acoustic properties for vowels. • Variation is rampant across: • tokens • speakers • genders • dialects • age groups, etc. • Variability = a huge problem for speech perception.
More Problems • Also: infants exhibit categorical perception, too… • Even though they don’t know category labels. • Chinchillas can do it, too!
An Alternative • It has been proposed that phoneme categories are defined by prototypes… • which we use to identify vowels in speech. • One relevant finding: the perceptual magnet effect. • Part 1: play listeners a continuum of synthetic vowels in the neighborhood of [i]. • Task: judge how much each one sounds like [i]. • Some are better = prototypical • Others are worse = non-prototypes
Perceptual Magnets • Part 2: define either a prototype or a non-prototype as a category center. • Task: determine whether other vowels on the continuum are the “same” as or “different” from the category center. • Result: more “same” responses when the category center is a prototype. • Prototype = a “perceptual magnet”
Prototypes, continued • The perceptual-magnet prototypes are usually located at a listener’s average F1 and F2 values for [i]. • 4-month-olds exhibit the perceptual magnet effect… • but monkeys do not. • Note: the prototype is the only thing that has to be “invariant” about the category • particular properties aren’t important. • Testing a prototype model on the Peterson & Barney data yielded 51% correct classification (a toy version is sketched below). • (Human listeners got 94% correct.) • Variability is still hard to deal with.
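A nearest-prototype classifier of the sort tested on the Peterson & Barney data is easy to sketch. In this hedged Python version, each category is reduced to its mean (F1, F2); the formant values are invented for illustration, not taken from the actual Peterson & Barney measurements:

```python
import numpy as np

# Each vowel category is represented only by its prototype: the mean
# (F1, F2) of its training tokens. Values in Hz, made up for illustration.
training = {
    "i": np.array([[270, 2290], [300, 2240], [310, 2190]], dtype=float),
    "a": np.array([[710, 1060], [730, 1090], [750, 1120]], dtype=float),
    "u": np.array([[290,  840], [300,  870], [320,  900]], dtype=float),
}
prototypes = {v: tokens.mean(axis=0) for v, tokens in training.items()}

def classify(token):
    """Label an (F1, F2) token with the nearest category prototype."""
    return min(prototypes, key=lambda v: np.linalg.norm(token - prototypes[v]))

print(classify(np.array([290.0, 2200.0])))   # -> 'i'
```

Note the design choice: all within-category variability is thrown away once the prototype is computed, which is exactly why such models struggle with variable data.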
Flipping the Script • Another approach to speech perception is to preserve all the variability that we hear… • rather than boiling it down to properties or prototypes. • In this model, speech categories are defined by extension • = they consist of exemplars • So, your mental representation of /b/ consists of every token of /b/ you’ve ever heard in your life… • rather than any particular acoustic or articulatory properties. • Analogy: phonetics field project notes • (your mind is a pack rat)
Exemplar Categorization • Stored memories of speech experiences are known as traces. • Each trace is linked to a category label. • Incoming speech tokens are known as probes. • A probe activates the traces it is similar to. • Note: amount of activation is proportional to similarity between trace and probe. • Traces that closely match a probe are activated a lot; • Traces that have no similarity to a probe are not activated much at all.
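One standard way to cash out “activation proportional to similarity” is exponential decay with distance, as in exemplar models such as Nosofsky’s Generalized Context Model. A minimal sketch, where the decay form and the sensitivity parameter c are illustrative choices:

```python
import numpy as np

def activation(probe, trace, c=0.01):
    """Activation of a stored trace by a probe: maximal for a perfect
    match, falling off exponentially with distance."""
    return np.exp(-c * np.linalg.norm(probe - trace))

probe = np.array([300.0, 2250.0])   # incoming (F1, F2) token
near  = np.array([310.0, 2240.0])   # a closely matching stored trace
far   = np.array([700.0, 1100.0])   # a dissimilar stored trace
print(activation(probe, near))      # ~0.87: activated a lot
print(activation(probe, far))       # ~0.000005: barely activated
```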
A (pretend) example: traces = vowels from the Peterson & Barney data set; the probe (*) is an incoming token. • Activation of each trace falls off with its distance (in vowel space) from the probe: traces near the probe are highly activated; distant traces receive low activation.
Echoes from the Past • The activations of all the exemplars in memory are combined to create an echo in the perceptual system • = an activation-weighted average of the stored traces • This echo has more general features than either the individual traces or the probe. • Inspiration: Francis Galton’s composite photographs
Exemplar Categorization II • For each category label… • The activations of the traces linked to it are summed up. • The category with the most total activation wins. • Note: we use all exemplars in memory to help us categorize new tokens. • Also: any single trace can be linked to different kinds of category labels. • Test: Peterson & Barney vowel data • Exemplar model classified 81% of vowels correctly.
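Putting the last few slides together, a minimal end-to-end sketch (all data values illustrative): activate every trace, sum activations per label, let the biggest total win, and compute the echo as the activation-weighted average of the traces:

```python
import numpy as np

def exemplar_classify(probe, traces, labels, c=0.01):
    """Activate every stored trace, sum activations per category label,
    and let the label with the most total activation win. The
    activation-weighted average of the traces is the "echo" of the
    previous slide. The decay form, c, and the data are illustrative."""
    acts = np.exp(-c * np.linalg.norm(traces - probe, axis=1))
    totals = {}
    for act, label in zip(acts, labels):
        totals[label] = totals.get(label, 0.0) + act
    echo = (acts[:, None] * traces).sum(axis=0) / acts.sum()
    return max(totals, key=totals.get), totals, echo

traces = np.array([[270, 2290], [310, 2190], [730, 1090], [300, 870]], float)
labels = ["i", "i", "a", "u"]
winner, totals, echo = exemplar_classify(np.array([290.0, 2250.0]), traces, labels)
print(winner)   # -> 'i' (its two nearby traces dominate the total activation)
```

Because every trace contributes, properties that classical theories would call “incidental” (voice, room, speaker) can sway categorization, which is exactly the prediction on the next slide.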
Exemplar Predictions • Point: all properties of all exemplars play a role in categorization… • Not just the “definitive” ones. • Prediction: non-invariant properties of speech categories should have an effect on speech perception. • E.g., the voice in which a [b] is spoken. • Or even the room in which a [b] is spoken. • Is this true? • Let’s find out…
Another Experiment! • Circle whether each word is a new or an old word in the list. • (answer sheet numbered 1-40)
Continuous Word Recognition • In a “continuous word recognition” task, listeners hear a long sequence of words… • some of which are new words in the list, and some of which are repeats. • Task: decide whether each word is new or a repeat. • Twist: some repeats are presented in a new voice; • others are presented in the old (same) voice. • Finding: repetitions are identified more quickly and more accurately when they’re presented in the old voice. (Palmeri et al., 1993) • Implication: we store voice + word info together in memory.