Detecting misrecognitions: Predicting with prosody
Misrecognitions - papers • “Predicting automatic speech recognition performance using prosodic cues” (TooT) • “Generalizing prosodic prediction of speech recognition errors” (W99)
Misrecognitions - generalities • What are they? • WER - word error rate • CA - concept accuracy • Why is it important to detect them? • Users have difficulty correcting system misunderstandings • Users are frustrated by unnecessary confirmations or rejections
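The slides name WER without defining it. As a minimal sketch, assuming the usual definition (word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words), it could be computed as below. The function name and the example sentences are illustrative and do not come from the papers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
print(word_error_rate("leave from boston at nine", "leave from austin at nine"))
```

Note that under this definition WER can exceed 1.0 when the hypothesis contains many inserted words, which is one reason a slide might ask whether it is an adequate measure.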
Prosody to the rescue!!! • Prosodic features used: • Fundamental frequency (f0) • Energy (rms) • Duration of speaker turn (dur) • Pause preceding turn (ppau) • Speaking rate (tempo) • Silence in speaker turn (zeros)
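To make the feature list concrete, here is a rough sketch of how some of these per-turn values might be derived from raw audio samples. It is not the extraction procedure used in the papers: the frame length, the silence threshold, the syllable count input, and all names are assumptions made for illustration. f0 is left out because it requires a pitch tracker.

```python
import math
from dataclasses import dataclass


@dataclass
class TurnFeatures:
    rms: float    # root-mean-square energy over the whole turn
    dur: float    # turn duration in seconds
    ppau: float   # pause preceding the turn in seconds
    tempo: float  # syllables per second
    zeros: float  # fraction of frames treated as silent


def frame_rms(frame):
    """RMS energy of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def turn_features(samples, sample_rate, prev_turn_end, turn_start,
                  n_syllables, frame_len=160, silence_rms=0.01):
    """Compute hypothetical per-turn prosodic features.

    frame_len and silence_rms are illustrative defaults, not values
    taken from the papers.
    """
    if not samples:
        raise ValueError("turn must contain at least one sample")
    dur = len(samples) / sample_rate
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples), frame_len)]
    silent = sum(1 for f in frames if frame_rms(f) < silence_rms)
    return TurnFeatures(
        rms=frame_rms(samples),
        dur=dur,
        ppau=max(0.0, turn_start - prev_turn_end),
        tempo=n_syllables / dur,
        zeros=silent / len(frames),
    )


# Synthetic turn: 0.1 s of constant signal followed by 0.1 s of silence.
feats = turn_features([0.5] * 800 + [0.0] * 800, sample_rate=8000,
                      prev_turn_end=3.0, turn_start=3.5, n_syllables=4)
print(feats)
```

Each turn then becomes one feature vector that a learner can label as correctly recognized or misrecognized.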
Predicting Misrecognitions - results • Rule-based learner (RIPPER) • Characteristics of misrecognitions: • Higher in pitch • Louder, longer • Less internal silence • Improved prediction with prosody • TooT - 6.53% vs 22.23% • W99 - 22.77% vs 26.14%
Predicting Misrecognitions - comments • Is WER an adequate measure? • Do we model the ASR capabilities or its training set? • Is it fair to compare against learning from ASR confidence scores?
Detecting user corrections: Predicting with prosody
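The two pairs of figures are easier to compare as relative reductions. The short calculation below assumes that both numbers in each pair are classification error rates and that the first is the prosody-based result; the slide does not state what the second figure was obtained with.

```python
def relative_reduction(with_prosody: float, comparison: float) -> float:
    """Fraction of the comparison error removed by the prosody-based result."""
    return (comparison - with_prosody) / comparison


# Error rates in percent, as listed on the slide.
results = [("TooT", 6.53, 22.23), ("W99", 22.77, 26.14)]
for corpus, with_prosody, comparison in results:
    reduction = relative_reduction(with_prosody, comparison)
    print(f"{corpus}: {reduction:.1%} relative error reduction")
# TooT: 70.6% relative error reduction
# W99: 12.9% relative error reduction
```

Under that reading, the gain on W99 is much smaller than on TooT.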
User corrections - papers • “Corrections in spoken dialog systems” • “Identifying user corrections automatically in spoken dialog systems”
User corrections - generalities • What are they? • Why is it important to detect them? • Recognized much more poorly • Tuning dialog strategies • ASR for hyperarticulated speech • Change of initiative and confirmation strategy
User corrections - insights • Types: • REP – repetition • PAR – paraphrase • ADD – content added • OMIT – content omitted • ADD/OMIT • Characterized by prosodic features associated with hyperarticulation – but not the same
Predicting user corrections • Rule-based learner on TooT corpus • Features: PROS, ASR, SYS, POS, DIA • 15.72% error rate on Raw+ASR+SYS+POS+PreTurn