
Detecting misrecognitions


saman

Presentation Transcript


  1. Detecting misrecognitions: Predicting with prosody

  2. Misrecognitions - papers • “Predicting automatic speech recognition performance using prosodic cues” - TooT • “Generalizing prosodic prediction of speech recognition errors” - W99

  3. Misrecognitions - generalities • What are they? • WER – word error rate • CA – concept accuracy • Why is it important to detect them? • Users have difficulty correcting system misunderstandings • User frustration caused by unnecessary confirmations or rejections
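
As a rough illustration of the WER measure named on this slide, here is a minimal sketch using the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for this example; the slides do not say how the papers computed WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition a turn counts as misrecognized whenever its WER is above zero, although a system could also choose a higher threshold.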

  4. Prosody to the rescue!!! • Prosodic features used: • Fundamental frequency (f0) • Energy (rms) • Duration of speaker turn (dur) • Pause preceding turn (ppau) • Speaking rate (tempo) • Silence in speaker turn (zeros)
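
To make three of these features concrete, the sketch below computes `rms`, `dur` and a `zeros`-like silence fraction from the raw samples of one speaker turn. Everything here is an assumption for illustration: the function names, the fixed frame length, and the energy threshold used to call a frame silent are hypothetical, and the papers may have derived `zeros` differently. The `f0`, `ppau` and `tempo` features need a pitch tracker, turn timestamps and a transcript, so they are left out.

```python
import math
from typing import Dict, Sequence


def rms_energy(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def turn_features(
    samples: Sequence[float],
    sample_rate: int,
    frame_len: int = 160,             # hypothetical: 10 ms at 16 kHz
    silence_threshold: float = 0.01,  # hypothetical energy cutoff
) -> Dict[str, float]:
    """Return rms, dur (seconds) and zeros (fraction of silent frames)."""
    if not samples:
        raise ValueError("samples must not be empty")

    # Cut the turn into consecutive frames; the last one may be shorter.
    frames = [
        samples[start:start + frame_len]
        for start in range(0, len(samples), frame_len)
    ]
    silent_frames = sum(
        1 for frame in frames if rms_energy(frame) < silence_threshold
    )
    return {
        "rms": rms_energy(samples),
        "dur": len(samples) / sample_rate,
        "zeros": silent_frames / len(frames),
    }
```

A feature dictionary like this, one per turn, is the kind of input a rule learner could be trained on.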

  5. Predicting misrecognitions - results • Rule-based learner (RIPPER) • Characteristics of misrecognitions: • Higher in pitch • Louder, longer • Less internal silence • Improved prediction with prosody • TooT – 6.53% vs 22.23% • W99 – 22.77% vs 26.14%
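
The two result pairs are easier to compare as relative error reductions. The sketch below assumes that the first figure in each pair is the prediction error with prosodic features and the second is the comparison figure without them; the slide does not state this explicitly, so treat the ordering as an assumption. The helper name `relative_reduction` is also hypothetical.

```python
def relative_reduction(baseline_error: float, improved_error: float) -> float:
    """Fraction of the baseline error removed by the improved predictor."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - improved_error) / baseline_error


# Figures from the slide, in percent (ordering is an assumption, see above).
toot_reduction = relative_reduction(22.23, 6.53)
w99_reduction = relative_reduction(26.14, 22.77)

print(f"TooT: {toot_reduction:.1%} relative reduction")
print(f"W99:  {w99_reduction:.1%} relative reduction")
```

Read this way, the gain on TooT is roughly 71% relative, while on W99 it is roughly 13%, which is relevant to the generalization question raised in the W99 paper title.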

  6. Predicting misrecognitions - comments • Is WER an adequate measure? • Do we model the ASR capabilities or its training set? • Is it fair to compare against learning from ASR confidence scores?

  7. Detecting user corrections: Predicting with prosody

  8. User corrections - papers • “Corrections in spoken dialog systems” • “Identifying user corrections automatically in spoken dialog systems”

  9. User corrections - generalities • What are they? • Why is it important to detect them? • Recognized much more poorly • Tuning dialog strategies • ASR for hyperarticulated speech • Change of initiative and confirmation strategy

  10. User corrections - insights • Types: • REP – repetition • PAR – paraphrase • ADD – content added • OMIT – content omitted • ADD/OMIT • Characterized by prosodic features associated with hyperarticulation – but not the same

  11. Predicting user corrections • Rule-based learner on TooT corpus • Features: PROS, ASR, SYS, POS, DIA • 15.72% error rate on Raw+ASR+SYS+POS+PreTurn
