Roy Patterson Centre for the Neural Basis of Hearing

Part II: Lent Term 2014: (2 of 4) Central Auditory Processing
Roy Patterson
Centre for the Neural Basis of Hearing
Department of Physiology, Development and Neuroscience
University of Cambridge
email: rdp1@cam.ac.uk
Lecture slides on CamTools: https://camtools.cam.ac.uk/portal.html
Lecture slides, sounds and background papers: http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/

The Overture
Act I: the information in communication sounds (animal calls, speech, musical notes)
Act II: the perception of communication sounds and the robustness of perception to changes in acoustic scale
Act III: the processing of communication sounds in the auditory system (signal processing)
Act IV: the processing of communication sounds (anatomy, physiology, brain imaging)

Auditory perception is robust to changes in Ss and Sf
[Figure axes: decreasing VTL (1/Sf); increasing GPR (Ss)]
Kawahara and Irino (2004). Principles of speech manipulation system STRAIGHT. In Speech separation by humans and machines, P. Divenyi (Ed.), Kluwer Academic, 167-179.

Rana catesbeiana
[Figure axes: decreasing VTL (1/Sf); increasing GPR (Ss)]

[Figure: two panels on a time axis: pitch (Ss), low to high; VTL (Sf), long to short]
[Patterson, Smith, van Dinther and Walters (2008)]

Spectra on a linear frequency axis
[Figure: pitch (Ss), low to high; VTL (Sf), long to short]

Recognition of Scaled Vowels
[Figure panels: /a/, /e/, /i/, /o/, /u/, mean]
Smith, Patterson, Turner, Kawahara and Irino, JASA (2005)
pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPTKIjasa05.pdf

Waveform and spectrum of a child’s /a/
[Figure: Sf and Ss marked on the spectrum; frequency on a logarithmic axis (octaves)]
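
One way to see why the spectrum is plotted on a logarithmic frequency axis: multiplying every frequency by a common scale factor moves every component the same distance in octaves, so a change in scale shifts the pattern without changing its shape. The sketch below illustrates this; the formant values and the scale factor are illustrative assumptions, not data from the lecture.

```python
import math

# Hypothetical formant frequencies (Hz) for an /a/-like vowel; illustrative only.
formants_hz = [700.0, 1200.0, 2600.0]

# Hypothetical scale factor applied to every frequency (assumed value).
scale = 1.5

def shift_in_octaves(f_hz, factor):
    """Distance moved on a log2 (octave) axis when f_hz is multiplied by factor."""
    return math.log2(f_hz * factor) - math.log2(f_hz)

# On a log axis the shift is log2(scale) for every component.
octave_shifts = [shift_in_octaves(f, scale) for f in formants_hz]

# On a linear axis the shift depends on the starting frequency.
linear_shifts = [f * scale - f for f in formants_hz]

print(octave_shifts)
print(linear_shifts)
```

The octave shifts are all equal to log2(scale), whereas the linear shifts differ from one formant to the next.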

The Perception of Communication Sounds: Summary
Psychophysical experiments confirm:
I: Humans can extract the content of the communication without being confused by size differences
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPTKIjasa05.pdf

Speaker size estimates for vowels varying in GPR and VTL
[Figure axes: log(GPR), log(VTL), Size]
Smith and Patterson (2005) JASA
pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPjasa05.pdf

The Perception of Communication Sounds: Summary
Psychophysical experiments also confirm:
I: Humans can extract the content of the communication without being confused by size differences
II: Humans can extract the size information without being confused by differences in the content
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPTKIjasa05.pdf
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPjasa05.pdf

Discrimination of Ss and Sf
Ss: the semitones on the keyboard differ by 5.9%. Experiments with vowels show that people can reliably discriminate a 2% difference in Ss.
Sf: the discrimination of Sf is more tricky. Present two vowels and ask: Which vowel came from the larger speaker?
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/ISPjasa05.pdf
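
The 5.9% figure follows from equal temperament, where twelve equal semitone steps make up one octave (a frequency ratio of 2). A minimal sketch of that arithmetic, which also expresses the 2% threshold as a musical interval; the function names are illustrative.

```python
import math

def semitone_ratio():
    """Frequency ratio between adjacent semitones in twelve-tone equal temperament."""
    return 2.0 ** (1.0 / 12.0)

def percent_to_semitones(percent):
    """Convert a percentage difference in rate to an interval in semitones."""
    return 12.0 * math.log2(1.0 + percent / 100.0)

semitone_percent = (semitone_ratio() - 1.0) * 100.0
print(round(semitone_percent, 1))           # 5.9
print(round(percent_to_semitones(2.0), 2))  # 0.34
```

On this scale, a 2% difference is about a third of a semitone.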

Syllable database
            Sonorants            Stops                Fricatives
CVs         ma na la ra wa ya    ba da ga pa ta ka    sa fa va za xa ha
            me ne le re we ye    be de ge pe te ke    se fe ve ze xe he
            mi ni li ri wi yi    bi di gi pi ti ki    si fi vi zi xi hi
            mo no lo ro wo yo    bo do go po to ko    so fo vo zo xo ho
            mu nu lu ru wu yu    bu du gu pu tu ku    su fu vu zu xu hu
VCs         am an al ar aw ay    ab ad ag ap at ak    as af av az ax ah
            em en el er ew ey    eb ed eg ep et ek    es ef ev ez ex eh
            im in il ir iw iy    ib id ig ip it ik    is if iv iz ix ih
            om on ol or ow oy    ob od og op ot ok    os of ov oz ox oh
            um un ul ur uw uy    ub ud ug up ut uk    us uf uv uz ux uh
Vowels      aa ee ii oo uu
Examples: mi en ka it so us; large (voiced), small (voiced)
Ives, Smith and Patterson (2005) JASA
Kawahara and Irino (2004). The vocoder STRAIGHT. Kluwer Academic

Speaker-size discrimination with syllables
Present two intervals of syllables and ask: Which is the larger speaker?
The syllables are randomly chosen for the intervals.
The overall level is varied randomly between the intervals.
The pitch contours are different.
The only consistent cue is a difference in VTL (Sf).
[Figure: two intervals of four syllables each (/se/ /wa/ /ma/ /et/ and /ku/ /om/ /te/ /am/), plotted as pitch; one interval has VTL = x, the other VTL = x + Δx]
Ives, Smith and Patterson (2005) JASA
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/ISPjasa05.pdf
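
The logic of the two-interval task can be sketched as follows. This is a schematic of the trial structure described on the slide, not the authors' code: the syllable subset, the number of syllables per interval, the size of the level rove and the VTL values are all assumptions, and the differing pitch contours are left out.

```python
import random

# A small subset of the syllable database, for illustration only.
SYLLABLES = ["ma", "se", "wa", "et", "ku", "om", "te", "am"]

def make_trial(rng, vtl, delta_vtl, n_syllables=4, rove_db=6.0):
    """Build one two-interval trial.

    Returns (intervals, index of the interval with the longer VTL).
    n_syllables and rove_db are assumed values, not taken from the study.
    """
    vtls = [vtl, vtl + delta_vtl]
    rng.shuffle(vtls)  # the longer VTL is equally likely in either interval
    intervals = []
    for interval_vtl in vtls:
        intervals.append({
            # syllables are chosen at random, independently for each interval
            "syllables": [rng.choice(SYLLABLES) for _ in range(n_syllables)],
            # overall level is roved at random between intervals
            "level_offset_db": rng.uniform(-rove_db, rove_db),
            # VTL is the only cue that consistently differs
            "vtl": interval_vtl,
        })
    larger = 0 if intervals[0]["vtl"] > intervals[1]["vtl"] else 1
    return intervals, larger

rng = random.Random(0)
intervals, larger = make_trial(rng, vtl=14.0, delta_vtl=0.7)
print(intervals, larger)
```

Because syllable identity and level vary at random, a listener can only answer correctly on average by using the VTL difference.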

Experiment: Sf discrimination thresholds for five different people
[Figure: DWARF, SMALL CHILD, SMALL MALE, LARGE MALE and CASTRATO plotted by glottal pulse rate (Ss; axis values 80, 160, 320 Hz) and Sf (axis values VTL 10, 14, 19 cm; SER 1.65, 1.22, 0.92)]
Ives, Smith and Patterson, JASA (2005)

Results: all subjects, all syllables
[Figure: one panel per speaker type (DWARF, SMALL CHILD, SMALL MALE, CASTRATO, LARGE MALE); each panel labelled "% reported larger" and "Trials test as smaller"]
Ives, Smith and Patterson (2005) JASA

Results: all subjects, all syllables
[Figure panels: DWARF, SMALL CHILD, SMALL MALE, CASTRATO, LARGE MALE]
Average JND across syllable category for each speaker type.
The grand average JND for the experiment is about 5%, independent of acoustic scale.
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/ISPjasa05.pdf
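
A JND that is a constant percentage, independent of acoustic scale, means that the smallest detectable absolute difference grows in proportion to the reference value. A minimal sketch of that arithmetic, using the VTL axis values from the experiment slide (10, 14 and 19 cm) as reference lengths and taking the JND as exactly 5% for illustration:

```python
JND_FRACTION = 0.05  # grand average JND, taken as exactly 5% for illustration

def just_noticeable_vtl_change(vtl_cm, jnd_fraction=JND_FRACTION):
    """Absolute VTL difference (cm) that corresponds to a fixed relative JND."""
    return vtl_cm * jnd_fraction

for vtl_cm in (10.0, 14.0, 19.0):
    delta = just_noticeable_vtl_change(vtl_cm)
    print(f"VTL {vtl_cm:.0f} cm: about {delta:.2f} cm")
```

Under these assumptions the just-noticeable change is about 0.5 cm at 10 cm, 0.7 cm at 14 cm and 0.95 cm at 19 cm.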

The Perception of Communication Sounds: Summary
Psychophysical experiments confirm:
I: Humans can extract the content of the communication without being confused by the size information
II: Humans can extract the size information without being confused by the content of the communication
III: Auditory perception is amazingly robust to changes in acoustic scale (Ss and/or Sf) in communication sounds
The acoustic scale values in communication sounds tell us which individual, within a population, is speaking or which instrument, within a family, is playing

End of Act II
Thank you
Smith, D. R. R., Patterson, R. D., Turner, R., Kawahara, H., and Irino, T. (2005). "The processing and perception of size information in speech sounds," J. Acoust. Soc. Am. 117, 305-318. http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPTKIjasa05.pdf
Smith, D. R. R. and Patterson, R. D. (2005). "The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex and age," J. Acoust. Soc. Am. 118, 3177-3186. http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPjasa05.pdf
Ives, D. T., Smith, D. R. R. and Patterson, R. D. (2005). "Discrimination of speaker size from syllable phrases," J. Acoust. Soc. Am. 118(6), 3816-3822. http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/ISPjasa05.pdf

Concurrent Speech and the cocktail party Colin Cherry (1952)

Syllable database
            Sonorants            Stops                Fricatives
CVs         ma na la ra wa ya    ba da ga pa ta ka    sa fa va za xa ha
            me ne le re we ye    be de ge pe te ke    se fe ve ze xe he
            mi ni li ri wi yi    bi di gi pi ti ki    si fi vi zi xi hi
            mo no lo ro wo yo    bo do go po to ko    so fo vo zo xo ho
            mu nu lu ru wu yu    bu du gu pu tu ku    su fu vu zu xu hu
VCs         am an al ar aw ay    ab ad ag ap at ak    as af av az ax ah
            em en el er ew ey    eb ed eg ep et ek    es ef ev ez ex eh
            im in il ir iw iy    ib id ig ip it ik    is if iv iz ix ih
            om on ol or ow oy    ob od og op ot ok    os of ov oz ox oh
            um un ul ur uw uy    ub ud ug up ut uk    us uf uv uz ux uh
Vowels      aa ee ii oo uu

Concurrent speech: experimental paradigm
Identify the syllable in the interval that stays lit
[Figure: example syllables wu, osh]
Vestergaard et al (2009) JASA

Robustness of speech perception

[Figure: target and distracter syllables in three panels; time axes from 0 to 600 ms]

Concurrent-speech paradigm
[Figure: target and distracter syllables (de mi ki lu ez osh) in three panels: sonorants (semivowels), stops (plosives), fricatives; time axes from 0 to 600 ms]
• Target triplet: de mi osh
• Masker triplet: ki lu ez
• Concurrently at 0 dB SNR
• Pre-cursor, 0 dB SNR
Vestergaard et al (2009) JASA
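
Presenting target and masker "at 0 dB SNR" can be sketched as below. The sketch assumes SNR is defined as the ratio of RMS amplitudes in dB, which may not be how the study set levels, and it uses sinusoids as stand-ins for the syllable waveforms; `mix_at_snr` is a hypothetical helper, not part of any published toolbox.

```python
import math

def rms(signal):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def mix_at_snr(target, masker, snr_db):
    """Scale the masker so that target RMS / masker RMS equals snr_db, then add."""
    gain = rms(target) / (rms(masker) * 10.0 ** (snr_db / 20.0))
    scaled_masker = [gain * m for m in masker]
    mixture = [t + m for t, m in zip(target, scaled_masker)]
    return mixture, scaled_masker

# Stand-in signals (sinusoids), not real syllables; all values are assumptions.
fs = 16000
target = [0.5 * math.sin(2 * math.pi * 220 * n / fs) for n in range(fs // 10)]
masker = [0.1 * math.sin(2 * math.pi * 330 * n / fs) for n in range(fs // 10)]

mixture, scaled_masker = mix_at_snr(target, masker, snr_db=0.0)
achieved_snr_db = 20.0 * math.log10(rms(target) / rms(scaled_masker))
print(achieved_snr_db)
```

At 0 dB SNR the masker is scaled to the same RMS level as the target, so level alone cannot be used to pick out the target.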

[Figure axes: Ss, Sf]
Vestergaard, Fyson and Patterson, JASA, 2009
