The Relation Between Speech Intelligibility and the Complex Modulation Spectrum
Steven Greenberg
International Computer Science Institute, 1947 Center Street, Berkeley, CA 94704, USA
http://www.icsi.berkeley.edu/~steveng
steveng@icsi.berkeley.edu
Takayuki Arai
Department of Electrical and Electronics Engineering, Sophia University, 7-1 Kioi-cho, Chiyoda-Ku, Tokyo, Japan
http://www.splab.ee.sophia.ac.jp/arai
arai@sophia.ac.jp
Acknowledgements and Thanks
Technical Assistance: Joy Hollenback, Shino Sakaguchi and Rosaria Silipo
Research Funding: U.S. National Science Foundation
Germane Publications: PERCEPTUAL BASES OF SPEECH INTELLIGIBILITY
Arai, T. and Greenberg, S. (1998) Speech intelligibility in the presence of cross-channel spectral asynchrony. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Seattle, pp. 933-936.
Greenberg, S. and Arai, T. (1998) Speech intelligibility is highly tolerant of cross-channel spectral asynchrony. Proceedings of the Joint Meeting of the Acoustical Society of America and the International Congress on Acoustics, Seattle, pp. 2677-2678.
Greenberg, S. and Arai, T. (2001) The relation between speech intelligibility and the complex modulation spectrum. Proceedings of the 7th European Conference on Speech Communication and Technology (Eurospeech-2001).
Greenberg, S., Arai, T. and Silipo, R. (1998) Speech intelligibility derived from exceedingly sparse spectral information. Proceedings of the International Conference on Spoken Language Processing, Sydney, pp. 74-77.
Greenberg, S. (1996) Understanding speech understanding: Towards a unified theory of speech perception. Proceedings of the ESCA Tutorial and Advanced Research Workshop on the Auditory Basis of Speech Perception, Keele, England, pp. 1-8.
Silipo, R., Greenberg, S. and Arai, T. (1999) Temporal constraints on speech intelligibility as deduced from exceedingly sparse spectral representations. Proceedings of the 6th European Conference on Speech Communication and Technology (Eurospeech-99).
http://www.icsi.berkeley.edu/~steveng
What is the Complex Modulation Spectrum?
The complex modulation spectrum combines both the magnitude and the phase of the modulation pattern distributed across the (tonotopically organized) spectrum.
This representation predicts the intelligibility of (locally) time-reversed speech, a manipulation that dissociates the phase and magnitude components of the modulation spectrum.
It thereby demonstrates the importance of modulation phase (across the frequency spectrum) for understanding spoken language.
(We’ll discuss this slide in greater detail shortly.)
Modulation Phase Across the Spectrum
The modulation phase pattern distributed across the (tonotopically organized) frequency spectrum is most easily visualized as follows:
The signal spectrum is partitioned into 15 contiguous 1/3-octave channels.
Only 4 of the channels are retained; the remaining 11 are discarded.
The upper edge of each retained channel lies one octave below the lower edge of the next-higher retained channel.
The modulation pattern in the waveform emanating from each retained channel is shown.
Note that the timing of the peaks and valleys (i.e., the phase) of the modulation pattern varies across the spectrum.
An earlier study (Greenberg et al., 1998), using spectrally sparse speech signals, suggested that the modulation phase pattern across the frequency spectrum could be important for intelligibility.
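The channel layout above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the lowest channel edge (300 Hz) and the choice of retaining every fourth channel (indices 0, 4, 8, 12) are hypothetical values, picked only so that 4 of 15 channels survive with a one-octave gap between neighbours. The slides do not give the actual channel frequencies, and the function names are hypothetical.

```python
def third_octave_edges(f_low, n_channels):
    """Return (lower, upper) edges in Hz for contiguous 1/3-octave channels
    starting at f_low. 15 channels therefore span 5 octaves."""
    step = 2.0 ** (1.0 / 3.0)  # frequency ratio covered by one 1/3-octave channel
    return [(f_low * step ** i, f_low * step ** (i + 1)) for i in range(n_channels)]


def retained_channels(edges, first=0, stride=4):
    """Keep every `stride`-th channel. With stride 4, three discarded
    1/3-octave channels (= one octave) separate neighbouring retained channels."""
    return edges[first::stride]


# Hypothetical values: 300 Hz lowest edge, channels 0, 4, 8 and 12 retained
edges = third_octave_edges(300.0, 15)
kept = retained_channels(edges)
for lo, hi in kept:
    print(f"{lo:7.1f} - {hi:7.1f} Hz")
```

Because each channel spans a frequency ratio of 2^(1/3), skipping three channels leaves a ratio of 2^(3/3) = 2, i.e. exactly one octave, between the upper edge of one retained channel and the lower edge of the next.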
What is (Locally) Time-Reversed Speech?
The speech signal is divided into consecutive segments, and each segment is reversed in time (flipped left to right, so that it plays backwards).
The length of the reversed segment is the primary experimental parameter.
This manipulation dissociates the phase and magnitude components of the modulation spectrum.
What impact does this manipulation have on intelligibility?
Stimulus paradigm based on K. Saberi and D. Perrott (1999) “Cognitive restoration of reversed speech,” Nature 398: 760.
The experimental paradigm and acoustic analysis bear virtually no relation to those described in the Saberi and Perrott study.
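A minimal sketch of local time reversal, assuming the signal is a plain sequence of samples and that segments are simply cut and reversed with no windowing or cross-fading at their boundaries. That boundary treatment is an assumption, since the slides do not describe it, and the function name `locally_time_reverse` is hypothetical.

```python
def locally_time_reverse(signal, fs, segment_ms):
    """Return a copy of `signal` in which every consecutive segment of
    `segment_ms` milliseconds is reversed in time. The order of the segments
    is unchanged; a final partial segment (if any) is also reversed (assumption)."""
    seg_len = max(1, round(fs * segment_ms / 1000.0))  # segment length in samples
    out = []
    for start in range(0, len(signal), seg_len):
        # Reverse the samples within this segment only
        out.extend(reversed(signal[start:start + seg_len]))
    return out


# Toy example: 1 kHz "sampling rate" and 3 ms segments give 3-sample segments
print(locally_time_reverse([1, 2, 3, 4, 5, 6, 7], fs=1000, segment_ms=3))
# -> [3, 2, 1, 6, 5, 4, 7]
```

For a TIMIT sentence (sampled at 16 kHz), a 50 ms segment corresponds to 800 samples. Note that the manipulation only rearranges samples within each segment: the segment sequence, and hence the coarse temporal order of the utterance, is preserved.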
Intelligibility of (Locally) Time-Reversed Speech
What impact does local time reversal have on intelligibility?
Intelligibility declines progressively as the length of the reversed segment increases.
When the segment exceeds 40 ms, intelligibility is very poor.
What acoustic properties are correlated with this decline in intelligibility?
Stimuli were sentences from the TIMIT corpus.
Sample sentence: “She washed his dark suit in greasy wash water all year”
80 different sentences, each spoken by a different speaker
Intelligibility Does NOT Depend Solely on the Magnitude Component of the Modulation Spectrum
[Figure panels: intelligibility as a function of reversed-segment length; modulation spectrum (magnitude component only)]
Saberi and Perrott had conjectured that the results of their experiment could be explained on the basis of the magnitude component of the modulation spectrum.
Brain – 1, (Cognitive) Scientists – 0
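Why a magnitude-only analysis can miss what matters is easiest to see in an idealized case (an illustration, not the study's analysis): reversing an entire sequence leaves the magnitude of every DFT coefficient unchanged and alters only the phase. The 8-sample "envelope" below is made up, and local reversal, as used in the experiment, does not satisfy this identity exactly.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


envelope = [0.1, 0.9, 0.4, 0.7, 0.2, 0.5, 0.8, 0.3]  # made-up example values
reversed_env = envelope[::-1]                        # whole-sequence time reversal

X, Y = dft(envelope), dft(reversed_env)
for k, (a, b) in enumerate(zip(X, Y)):
    print(f"k={k}: |X|={abs(a):.4f} |Y|={abs(b):.4f} "
          f"phase X={cmath.phase(a):+.3f} phase Y={cmath.phase(b):+.3f}")
```

The magnitudes match bin for bin because, for a real sequence of length N, reversal yields coefficients conj(X[k]) · exp(-j2πk(N-1)/N), which have the same modulus as X[k]. Only the phases differ, and that is precisely the information a magnitude-only modulation spectrum discards.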
Increasing Modulation Phase Dispersion as a Function of Increasing Reversed-Segment Length
Let’s examine the relation between modulation phase and intelligibility.
[Figure panels: phase dispersion (relative to the original signal) across 40 sentences as a function of reversed-segment length (original, 20, 40, 60, 80 and 100 ms), shown for the 750-1500 Hz sub-band at a modulation frequency of 4.5 Hz; intelligibility as a function of reversed-segment length]
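One way to obtain a modulation phase like the one plotted here is to correlate a channel's amplitude envelope with a complex exponential at the modulation frequency of interest (4.5 Hz) and take the angle of the result. The sketch below assumes the envelope has already been extracted and sampled at `fs_env` Hz; the function names and the synthetic envelopes are hypothetical, and the slides do not specify the authors' exact procedure.

```python
import cmath
import math


def modulation_phase(envelope, fs_env, mod_hz=4.5):
    """Phase (radians) of the envelope component at `mod_hz`, computed as a
    single-frequency DFT: sum of envelope[t] * exp(-j*2*pi*mod_hz*t/fs_env)."""
    acc = sum(v * cmath.exp(-2j * math.pi * mod_hz * t / fs_env)
              for t, v in enumerate(envelope))
    return cmath.phase(acc)


def wrapped_difference(a, b):
    """Phase difference a - b wrapped into the interval (-pi, pi]."""
    d = (a - b) % (2 * math.pi)  # now in [0, 2*pi)
    return d - 2 * math.pi if d > math.pi else d


# Synthetic envelopes: 2 s sampled at 100 Hz, i.e. exactly 9 cycles of 4.5 Hz
fs_env = 100
orig = [1 + math.cos(2 * math.pi * 4.5 * t / fs_env + 0.7) for t in range(200)]
proc = [1 + math.cos(2 * math.pi * 4.5 * t / fs_env + 1.9) for t in range(200)]

deviation = wrapped_difference(modulation_phase(proc, fs_env),
                               modulation_phase(orig, fs_env))
print(f"phase deviation at 4.5 Hz: {deviation:.3f} rad")
```

Repeating this for the processed version of each sentence yields a set of phase deviations for every reversed-segment length; the spread of that set is one possible measure of the dispersion shown on this slide.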
Increasing Modulation Phase Dispersion Across Frequency as a Function of Increasing Reversed-Segment Length
Let’s examine the relation between modulation phase and intelligibility from a slightly different perspective.
[Figure: phase dispersion across the frequency spectrum for a single sentence at a modulation frequency of 4.5 Hz]
For reversed-segment lengths greater than 40 ms there is significant phase dispersion (relative to the original), which becomes severe for segments longer than 80 ms.
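To summarize how widely modulation phases are scattered across frequency channels, one standard option from circular statistics is the circular variance, 1 - R, where R is the length of the mean resultant vector of the phase angles. This statistic is swapped in for illustration; the slides do not say which dispersion measure was used, and the per-channel phase values below are hypothetical.

```python
import cmath
import math


def circular_variance(phases):
    """1 - R, where R is the length of the mean of the unit vectors exp(j*phase).
    0 means all phases coincide; values near 1 mean they are widely spread."""
    if not phases:
        raise ValueError("need at least one phase")
    r = abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)
    return 1.0 - r


# Hypothetical per-channel phase deviations (radians) from the original signal
aligned = [0.10, 0.05, -0.08, 0.02]
scattered = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
print(circular_variance(aligned), circular_variance(scattered))
```

A circular statistic is used rather than an ordinary variance because phases wrap around: -3.1 rad and +3.1 rad are nearly identical angles, which an arithmetic variance would treat as far apart.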
Computing the Complex Modulation Spectrum
It is important to compute the phase dispersion across the spectrum precisely and to ascertain its impact on the global modulation spectral representation (shown on the following slide).
Complex Modulation Spectrum = Magnitude x Phase
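The slide summarizes the representation as "magnitude x phase" without giving a formula. The sketch below is one hypothetical reading, not the authors' computation: at a given modulation frequency, each channel's modulation magnitude is multiplied by the cosine of its phase deviation from the original signal (equivalently, the processed complex coefficient is projected onto the original's phase direction), and the result is averaged over channels. A magnitude-only measure would simply drop the cosine factor. The function name and the numbers are hypothetical.

```python
import math


def phase_weighted_magnitude(magnitudes, phase_deviations):
    """Hypothetical 'magnitude x phase' score at one modulation frequency.
    magnitudes[c]       : modulation magnitude in channel c
    phase_deviations[c] : phase of channel c minus the original's phase (radians)
    Returns the channel average of |M_c| * cos(dphi_c); phase-shifted channels
    contribute less, and a shift beyond pi/2 contributes negatively."""
    if not magnitudes or len(magnitudes) != len(phase_deviations):
        raise ValueError("need one phase deviation per channel")
    total = sum(m * math.cos(d) for m, d in zip(magnitudes, phase_deviations))
    return total / len(magnitudes)


mags = [1.0, 0.8, 0.6, 0.4]                                  # made-up magnitudes
intact = phase_weighted_magnitude(mags, [0.0] * 4)            # phases preserved
dispersed = phase_weighted_magnitude(mags, [0.2, 1.0, 1.6, 2.5])
print(intact, dispersed)
```

With identical magnitudes in both calls, a magnitude-only score would rate the two conditions equally; the phase-weighted score drops sharply once the channel phases disperse, which is the qualitative behaviour the slides attribute to the complex modulation spectrum.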
Intelligibility is Based on BOTH the Magnitude and Phase Components of the Modulation Spectrum
[Figure panels: intelligibility as a function of reversed-segment length; complex modulation spectrum (both magnitude and phase), computed for all 80 sentences]
The Relation between Intelligibility and the Complex Modulation Spectrum isn’t Bad!
Complex Modulation Spectrum - Summary
Locally time-reversed speech provides a convenient means to dissociate the magnitude and phase components of the modulation spectrum.
The intelligibility of time-reversed speech decreases as the segment length increases up to ca. 100 ms.
Speech intelligibility is NOT correlated with the magnitude component of the low-frequency modulation spectrum.
Speech intelligibility IS CORRELATED with the COMPLEX modulation spectrum (magnitude x phase).
Thus, the phase of the modulation pattern distributed across the frequency spectrum appears to play an important role in understanding spoken language.
That’s All, Folks Many Thanks for Your Time and Attention