
Feb.21.2006 Rohit Kumar Affective Dialog Systems

Papers discussed: "Compensating for Hyperarticulation by Modeling Articulatory Properties" (Hagen Soltau, Florian Metze, Alex Waibel) and "Interactions between Speech Recognition Problems and User Emotions" (Mihai Rotaru, Diane J. Litman, Kate Forbes-Riley).


Presentation Transcript


  1. Compensating for Hyperarticulation by Modeling Articulatory Properties (Hagen Soltau, Florian Metze, Alex Waibel) / Interactions between Speech Recognition Problems and User Emotions (Mihai Rotaru, Diane J. Litman, Kate Forbes-Riley). Feb.21.2006 Rohit Kumar Affective Dialog Systems

  2. (Affective Computing) User Centered Computing, Audience Centered Presentation

  3. Queries & Concerns
     • What are Articulatory Features?
     • Large conflicts in enumeration of these features
     • Use of Articulatory Features to detect Emotions
     • Training data for Hyperarticulation models
     • Use of Isolated words
     • No Annotation of Hyperarticulation
     • Methodology of data collection
     • Task Specific, …

  4. Queries & Concerns
     • Humans use Hyperarticulation to recover from errors in HH (human-human) interaction, while Hyperarticulation is a source of error in HC (human-computer) interaction. Why?
     • Lots of big questions
     • Should we make human-like ASRs?
     • Can we?
     • What is different?
     • Gaussian Mixture Models (GMM)
     • No significance numbers for WERs

  5. Queries & Concerns
     • Applicability test of Chi-Square
     • Hypothesis to explain lack of dependencies where they are expected
     • Users more forgiving in Tutorial Dialog (higher tolerance to error)
     • May be due to conflation of emotions
     • Separate out +ves and -ves
     • Due to YES/NO turns after semantic misrecognition
     • Difficult to capture emotion in Yes/No
     • Better recognition to not reject

  6. But before we turn into “Self” Centered Maniacs, let’s look at what Soltau and Rotaru have to say

  7. What are these papers about?
     Both these papers are about:
     • Automatic (& Human) Speech Recognition
     • Error Handling Strategies in Spoken Dialog
     • Interaction between Affect and Misrecognitions by ASR

  8. Soltau et al.
     • Suggest that Articulatory Features be used to improve ASR performance on Hyperarticulated speech
     • Assumption: People don’t substitute a whole phone to contrast a previous recognition error
     • Basically, more precise modeling of what’s being hyperarticulated
     • How did they do it?
     • Besides what HMM-based ASRs usually do
     • Trained additional GMMs for Articulatory Features (and also anti-models)
     • Get probability scores (from the GMMs) for the Articulatory Features
     • Linearly combine (with different weights) the scores from all the models
     • Get better hypothesis (just like “get more minutes”)
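The "linearly combine the scores" step above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Soltau et al.'s implementation: the names `combine_scores` and `best_hypothesis`, the stream names, the weights, and every log score are made up for the example.

```python
def combine_scores(log_scores, weights):
    """Weighted linear combination of per-model log scores.

    log_scores and weights are dicts keyed by model name
    (hypothetical names such as "hmm", "voiced", "nasal").
    """
    if log_scores.keys() != weights.keys():
        raise ValueError("every model needs exactly one weight")
    return sum(weights[name] * log_scores[name] for name in log_scores)


def best_hypothesis(candidates, weights):
    """Pick the candidate whose combined score is highest."""
    return max(candidates, key=lambda hyp: combine_scores(candidates[hyp], weights))


# Invented weights: the standard acoustic model dominates, the
# articulatory-feature (AF) models contribute smaller corrections.
weights = {"hmm": 0.7, "voiced": 0.2, "nasal": 0.1}

# Invented log scores for two competing hypotheses. The HMM alone
# slightly prefers "pat"; the voicing model strongly prefers "bat".
candidates = {
    "bat": {"hmm": -10.0, "voiced": -2.0, "nasal": -8.0},
    "pat": {"hmm": -9.5, "voiced": -6.0, "nasal": -8.0},
}

winner = best_hypothesis(candidates, weights)
```

With these made-up numbers the feature stream overturns the acoustic model's preference, which is the kind of correction the slide describes for hyperarticulated speech.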

  9. Soltau et al. (continued) (Add in if I am missing something)
     • Methodology
     • Acoustic Models
     • Feature Extraction (MFCC + context, reduced to 40 features by an LDA transform)
     • Other front-end processing
     • AF Models
     • Same front end
     • GMMs (48 per feature) trained on middle-state time alignments
     • Data collection for Hyperarticulated speech
     • 2 Sessions: Normal / Induced Hyperarticulated
     • Simulated Recognition Errors
     • Subjects: 45

  10. Soltau et al. (continued)
      • Various Experiments
      • Classification of Articulatory Features
      • Decoding with Adapted Acoustic Models + AF
      • Decoding with Specialized models + AF

  11. Rotaru et al.
      • Domain: Spoken Tutorial Dialog
      • Chaining effect of misrecognition across turns
      • Recognition Problems & Emotions in student turns

  12. Rotaru et al. (continued)
      • Methodology
      • ITSPOKE Corpus + Emotion Annotation
      • Student Utterances annotated by
      • ASR Misrecognitions
      • Rejections
      • Semantic Misrecognition
      • Student Emotion
      • Emotion Source

  13. Rotaru et al. (continued)
      • Chi-Square Analysis
      • Rejection in previous turn vs. Rejection in current turn
      • ASR Mis. in previous turn vs. ASR Mis. in current turn
      • ASR Mis. in previous turn vs. Rejection in current turn
      • Rejection in previous turn vs. Emotion in current turn
      • Rejection in previous turn vs. Emotion Src. in current turn
      • Sem. Mis. in previous turn vs. Emotion in current turn
      • Emotion in previous turn vs. (ASR) Mis. in current turn
      • Emotion in current turn vs. (ASR) Mis. in current turn
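Each of the pairings above is a test of independence on a contingency table. A minimal sketch of that computation, assuming a plain Pearson chi-square statistic, is shown below. The function name `chi_square` and the counts are invented for illustration and are not taken from Rotaru et al.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom
    for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts (NOT from the paper):
# rows = rejection in previous turn (yes / no)
# cols = rejection in current turn (yes / no)
counts = [[20, 30],
          [30, 120]]

stat, dof = chi_square(counts)
# For 1 degree of freedom the 0.05 critical value is 3.841, so a
# statistic above that would suggest the two turns are not independent.
dependent = stat > 3.841
```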

  14. Articulatory Features • Speech Production Mechanism

  15. Articulatory Features
      • Vowels
      • Vowel Height: High, Mid, Low
      • Vowel Backness: Front, Mid, Back
      • Long / Short Vowel
      • Diphthong
      • Schwa
      • Lip Rounding (+/-)
      • Voicing !
      • Oral / Nasal

  16. Articulatory Features
      • Consonant
      • Place of Articulation: Labial, Alveolar, Palatal, Labio-Dental, Dental, Velar, Glottal, {Retroflex}
      • Manner of Articulation: Stop, Fricative, Affricate, Nasal, Lateral, Approximant, {Liquids, Semivowels}
      • Voicing (+/-)
      Sources: Rohit Kumar, Amit Kataria, Sanjeev Sofat, "Building Non-Native Pronunciation Lexicon for English using a Rule based Approach," International Conference on Natural Language Processing (ICON) 2003, Mysore, India; http://en.wikipedia.org/wiki/Articulatory_phonetics
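To make the consonant features concrete, here is a small sketch of a phone-to-feature table using the place / manner / voicing categories from the slide. The table covers only a handful of English consonants, and the names `CONSONANTS` and `differing_features` are illustrative assumptions, not part of either paper.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    place: str
    manner: str
    voiced: bool


# A few English consonants described with the slide's categories.
CONSONANTS = {
    "p": Consonant("labial", "stop", False),
    "b": Consonant("labial", "stop", True),
    "m": Consonant("labial", "nasal", True),
    "s": Consonant("alveolar", "fricative", False),
    "z": Consonant("alveolar", "fricative", True),
    "k": Consonant("velar", "stop", False),
}


def differing_features(a, b):
    """Return the names of the features on which two phones differ."""
    first, second = CONSONANTS[a], CONSONANTS[b]
    return {
        name
        for name in Consonant._fields
        if getattr(first, name) != getattr(second, name)
    }


# /p/ and /b/ share place and manner and differ only in voicing.
p_vs_b = differing_features("p", "b")
```

A speaker contrasting "pat" with "bat" only needs to stress that single feature, which fits the assumption on the Soltau et al. slide that people do not substitute a whole phone.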

  17. Use of Articulatory Features to detect Emotions

  18. Training data for Hyperarticulation models
      • Use of Isolated words
      • No Annotation of Hyperarticulation
      • Methodology of data collection
      • Task Specific, …

  19. Humans use Hyperarticulation to recover from errors in HH interaction, while Hyperarticulation is a source of error in HC interaction. Why?
      • Lots of big questions
      • Should we make human-like ASRs?
      • Could we? Would we?
      • What is different?

  20. Gaussian Mixture Models
      • Andrew Moore’s lecture slides, pp. 7-10 and 20-24: http://www.autonlab.org/tutorials/gmm.html
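As a quick reminder of what a GMM computes, the sketch below evaluates a one-dimensional mixture density as a weighted sum of Gaussians. It is a simplification: ASR systems use multivariate GMMs over feature vectors, and the function names and component values here are made up for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def gmm_pdf(x, components):
    """Mixture density: sum of weight * Gaussian density.

    components is a list of (weight, mean, variance) tuples
    whose weights are assumed to sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


# Hypothetical two-component mixture.
mixture = [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)]

density_near_first_mode = gmm_pdf(0.0, mixture)
density_far_away = gmm_pdf(10.0, mixture)
```

A model and its anti-model can each be scored this way, and the resulting log densities are what a linear score combination would consume.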

  21. No Significance Numbers of WERs

  22. Applicability Test of Chi-Square (χ²)
      The following minimum frequency thresholds should be obeyed:
      • for a 1 × 2 or 2 × 2 table, expected frequencies in each cell should be at least 5
      • for a 2 × 3 table, expected frequencies should be at least 2
      • for a 2 × 4 or 3 × 3 or larger table, if all expected frequencies but one are at least 5 and if the one small cell is at least 1, chi-square is still a good approximation
      In general, the greater the degrees of freedom (i.e., the more values/categories on the independent and dependent variables), the more lenient the minimum expected frequencies threshold.
      Source: http://www.georgetown.edu/faculty/ballc/webtools/web_chi_tut.html
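The thresholds above can be turned into a small check on a table of expected frequencies. This is a sketch of the slide's rules only. The function name `chi_square_applicable` is hypothetical, and two choices are assumptions beyond the slide: a table and its transpose are treated alike, and any shape not listed falls through to the "larger table" rule.

```python
def chi_square_applicable(expected):
    """Apply the slide's minimum expected-frequency rules.

    expected is a list of rows of expected (not observed) frequencies.
    """
    rows, cols = len(expected), len(expected[0])
    shape = tuple(sorted((rows, cols)))  # assumption: 3 x 2 is treated as 2 x 3
    cells = [value for row in expected for value in row]

    if shape in {(1, 2), (2, 2)}:
        return min(cells) >= 5
    if shape == (2, 3):
        return min(cells) >= 2
    # 2 x 4, 3 x 3 or larger (assumption: also any shape not listed above):
    # at most one cell below 5, and that cell must be at least 1.
    small_cells = [value for value in cells if value < 5]
    return len(small_cells) <= 1 and min(cells) >= 1


# Invented expected-frequency tables for illustration.
ok_2x2 = chi_square_applicable([[12.5, 37.5], [37.5, 112.5]])
sparse_2x2 = chi_square_applicable([[4.0, 6.0], [6.0, 9.0]])
one_small_3x3 = chi_square_applicable([[1.5, 6, 7], [8, 9, 10], [11, 12, 13]])
two_small_3x3 = chi_square_applicable([[1.5, 4, 7], [8, 9, 10], [11, 12, 13]])
```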

  23. Hypothesis to explain lack of dependencies where they are expected
      • Users more forgiving in Tutorial Dialog (higher tolerance to error)
      • May be due to conflation of emotions
      • Separate out +ves and -ves
      • Due to YES/NO turns after semantic misrecognition
      • Difficult to capture emotion in Yes/No
      • Better recognition to not reject

  24. That’s all, folks! Unless you have something to say?!
