Tonal Speech without Pitch Jerry Zhu zhuxj@cs.cmu.edu 2003/7/3
What’s in your mouth Tony Robinson, http://mi.eng.cam.ac.uk/~ajr/SA95/node15.html
MFCC (mel-frequency cepstral coefficients) Tony Robinson, http://mi.eng.cam.ac.uk/~ajr/SA95/node15.html • Focus on vocal tract shape (e.g. different vowels) • No pitch
Tonal languages • Tone: variation in pitch. e.g. Mandarin, Thai http://kca.org/education/ImageView.asp?ImageID=179
MFCC disastrous for tones? • MFCC should have no pitch info. • Bad for Mandarin speech recognition? Not really. Why?
Hypothesis 1 • Language context helps a lot? • e.g. singing overrides pitch • people *do* understand the lyrics (sort of)
Hypothesis 2 • MFCC retains some pitch? • by imperfection • residual pitch info used by speech recognizers • Test: convert MFCC to speech, listen for tones. (TBD)
Hypothesis 3 • Do we really need pitch to perceive tones? • Test: whispered speech • Can native speakers perceive tones in whispered speech? Tony Robinson, http://mi.eng.cam.ac.uk/~ajr/SA95/node15.html
Minimum pairs • A minimum pair: two 2-char words that differ in only 1 tone. • Why not use • one-char words? They invite over-articulation. • multi-char words? Minimum pairs are hard to find.
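The minimum-pair definition above can be sketched as a small check. This is only an illustration: it assumes words are written as tone-numbered pinyin syllables separated by a space (e.g. "tong2 zhi4"), and the function names `split_tone` and `is_minimum_pair` are hypothetical, not part of the original study. The example words are illustrative and are not taken from the experiment's word list.

```python
def split_tone(syllable):
    """Split a tone-numbered syllable such as 'tong2' into ('tong', 2).

    Assumption: every syllable ends in a single tone digit.
    """
    return syllable[:-1], int(syllable[-1])


def is_minimum_pair(word_a, word_b):
    """Return True if both words have 2 syllables, identical segments,
    and exactly one syllable whose tone differs."""
    syllables_a = word_a.split()
    syllables_b = word_b.split()
    if len(syllables_a) != 2 or len(syllables_b) != 2:
        return False
    tone_differences = 0
    for a, b in zip(syllables_a, syllables_b):
        base_a, tone_a = split_tone(a)
        base_b, tone_b = split_tone(b)
        if base_a != base_b:
            return False  # segments differ, so this is not a purely tonal contrast
        if tone_a != tone_b:
            tone_differences += 1
    return tone_differences == 1


# Illustrative usage (hypothetical examples, not the study's items)
print(is_minimum_pair("tong2 zhi4", "tong3 zhi4"))   # one tone differs
print(is_minimum_pair("shui4 jiao4", "shui3 jiao3")) # two tones differ
```

Requiring exactly one tonal difference keeps each trial a test of a single tone contrast, which matches the definition on the slide.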
Listener listens for the ORDER within each minimum pair Whisperer file Listener file
Experiment setup • Each whisperer/listener group works on about 100 different minimum pairs. • In a quiet room, 1 meter apart. Each pair whispered once. • Native speakers. (Liu J., Yu H., Zhang Y., Zhu X.)
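A minimal sketch of how the order task could be prepared and scored, under stated assumptions: the slides do not describe the actual layout of the whisperer and listener files, so the data structure, the random order assignment, and the names `make_trials` and `score` are all hypothetical. The whisperer sees each pair in a randomized order; the listener sees a fixed listing and reports whether the order heard was swapped.

```python
import random


def make_trials(pairs, seed=0):
    """Build one trial per minimum pair with a randomized spoken order.

    Assumption: 'listed' is what the listener's sheet shows,
    'spoken' is what the whisperer's sheet shows.
    """
    rng = random.Random(seed)
    trials = []
    for first, second in pairs:
        swapped = rng.random() < 0.5
        spoken = (second, first) if swapped else (first, second)
        trials.append({"listed": (first, second),
                       "spoken": spoken,
                       "swapped": swapped})
    return trials


def score(trials, answers):
    """Count how many listener answers (True = 'order was swapped') are correct."""
    correct = sum(1 for trial, answer in zip(trials, answers)
                  if answer == trial["swapped"])
    return correct, len(trials)


# Illustrative usage with a hypothetical pair
trials = make_trials([("tong2 zhi4", "tong3 zhi4")], seed=1)
print(trials[0]["spoken"])
```

Randomizing the order per pair means a listener who hears no tonal cue can only guess, which is what makes 50% the chance baseline.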
What to expect • If there is no tonal info in whisper, listeners would guess the order with 50% accuracy.
Result significant? • Flip a coin 3 times, 2 heads 1 tail. A biased coin? • Chi-square test • Accuracy significantly better than random at p < 0.0001 (that’s *really* significant).
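The chi-square comparison against 50% guessing can be sketched as follows. This is an illustration, not the analysis actually run for the slides: the function name `chi_square_vs_chance` is hypothetical, and the 80-out-of-100 count is made up for the example and is not the study's data. With two outcome categories the test has 1 degree of freedom, for which the upper-tail probability equals erfc(sqrt(chi2 / 2)).

```python
import math


def chi_square_vs_chance(correct, total):
    """Chi-square goodness-of-fit test against 50% guessing (1 degree of freedom).

    Returns (chi2 statistic, p-value).
    """
    wrong = total - correct
    expected = total / 2.0  # under pure guessing, half correct and half wrong
    chi2 = ((correct - expected) ** 2 + (wrong - expected) ** 2) / expected
    # Closed-form upper tail of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# The coin example from the slide: 2 heads, 1 tail
print(chi_square_vs_chance(2, 3))

# Hypothetical count, NOT the study's data: 80 correct out of 100 pairs
print(chi_square_vs_chance(80, 100))
```

For very small samples such as three coin flips, the chi-square approximation is rough and an exact binomial test would be the safer choice; the coin case is included only to mirror the slide's intuition that 2 out of 3 is no evidence of bias.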
Accuracy breakdown • [Table: correct/total]
Accuracy breakdown • [Table: accuracy %, significant at p < 0.002]
Summary • People do perceive tonal differences without pitch. • How? • Strength (power)? • Duration? • Subtle vocal tract shape difference?
While we are whispering... • Tonal difference (we’ve seen that) • Voiced / unvoiced consonant? time vs. dime • voice onset time http://www.indiana.edu/~hlw/PhonUnits/consonants2.html
Voiced/unvoiced consonant • [p,b], [t,d], [k,g] • Mandarin speakers 94% accuracy • Aspiration
Other languages? • Thai • Is tonal too; 5 tones. • Has [ph], [p], [b]: would be interesting!