Text independent speaker identification in multilingual environments

I. Luengo, E. Navas, I. Sainz, I. Saratxaga, J. Sanchez, I. Odriozola and I. Hernaez

Presentation Transcript


  1. Text independent speaker identification in multilingual environments
     I. Luengo, E. Navas, I. Sainz, I. Saratxaga, J. Sanchez, I. Odriozola and I. Hernaez

  2. Contents
     • Introduction
     • SR in language mismatched conditions
     • Existent solutions
     • Proposed solution
     • Working database
     • Variability measures
     • Experimental results
     • Conclusions

  3. Speaker Recognition System
     [Diagram: TRAIN path (feature extraction, train, speaker model M) and TEST path (feature extraction, score, decision). Annotation: Language mismatch? Accuracy decreases.]
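The train / score / decision flow on this slide can be sketched roughly as follows. The slide does not say which speaker model is used, so a single diagonal Gaussian per speaker stands in here purely as an assumption; the function names and the toy feature vectors are hypothetical.

```python
import math
from statistics import fmean, pvariance

def train(frames):
    """Fit a diagonal Gaussian (mean, variance per dimension) to training frames.
    Assumption: a stand-in for whatever speaker model the system really uses."""
    dims = list(zip(*frames))  # one tuple per feature dimension
    return [(fmean(d), pvariance(d) or 1e-6) for d in dims]

def score(model, frames):
    """Average log-likelihood of the test frames under one speaker model."""
    total = 0.0
    for frame in frames:
        for x, (mu, var) in zip(frame, model):
            total += -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
    return total / len(frames)

def identify(models, frames):
    """Decision step: return the speaker whose model scores highest."""
    return max(models, key=lambda spk: score(models[spk], frames))

# Hypothetical toy data: 2-dimensional feature vectors for two speakers.
models = {
    "spk1": train([[0.0, 1.0], [0.2, 1.1], [-0.1, 0.9]]),
    "spk2": train([[3.0, -1.0], [3.1, -0.8], [2.9, -1.2]]),
}
decision = identify(models, [[0.1, 1.0], [0.0, 1.05]])
```

In this picture, a language mismatch means the test frames are drawn from a different distribution than the training frames, which lowers the score of the true speaker's model.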

  4. Existent solutions
     • Multi-language training
         • One model trained with various languages (per speaker)
         • Model learns characteristics of different languages
     • Multi-model training
         • One model for each language (per speaker)
         • Language detector

  5. Existent solutions: Drawbacks
     • Possible languages must be known in advance for each speaker
     • Not generalizable for languages not seen during training
     • More recording sessions needed for training
         • + Time → + Money
     • Desired solution: Language independent
         • Suitable for languages not seen during training
         • Capable of single-language training

  6. Proposed solution
     • Language-independent features
         • Normalization?
         • New features?
     • Short-term intonation and energy values
         • High speaker discrimination capability
         • Global distribution may change little with language
         • Combinable with MFCC
         • Only in voiced frames (intonation)
         • High session variability
             • MVN for inter-session normalization
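The MVN (mean and variance normalization) step mentioned above can be sketched as below, applied per session so that each feature dimension ends up with zero mean and unit variance. This is a minimal illustration, not the authors' implementation; the function name `mvn` and the sample values (log-F0 and log-energy per voiced frame) are assumptions.

```python
from statistics import fmean, pstdev

def mvn(frames):
    """Mean and variance normalization over the frames of one session.

    frames: list of equal-length feature vectors (lists of floats).
    Returns new vectors in which every dimension has zero mean and
    unit variance across the session.
    """
    dims = list(zip(*frames))                 # one tuple per feature dimension
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]   # avoid dividing by zero on constant dimensions
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]

# Hypothetical session: [log-F0, log-energy] for three voiced frames.
session = [[4.8, 1.0], [5.0, 2.0], [5.2, 3.0]]
normalized = mvn(session)
```

Normalizing each session separately removes session-level offsets and scale differences, which is the role the slide assigns to MVN.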

  7. Database
     • Bilingual Spanish-Basque speech database
     • 22 speakers (11 Male, 11 Female)
     • 4 sessions (inter-session variability)
     • 7 numeric sequences (8 digits) per session and language

  8. Variability measures
     • Adding new features ALWAYS increases separability/variability
         • + Speaker separability → + discrimination
         • + Language variability → + model/test mismatch
         • + Session variability → + model/test mismatch
     • Key issue: Does speaker separability increase more than language/session variability?

  9. Variability measures
     • Kullback-Leibler divergence for variability estimation
     • Interesting measures:
         (inter-speaker variability) / (inter-session variability)
         (inter-speaker variability) / (inter-language variability)
     • Good if new features increase these ratios
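One way to picture the ratios on this slide is the sketch below. The slide only states that Kullback-Leibler divergence is used; fitting a univariate Gaussian to each sample and symmetrizing the divergence are assumptions made here for illustration, and the feature values are invented toy data.

```python
import math
from statistics import fmean, pvariance

def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two univariate Gaussians (closed form)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)

def symmetric_kl(a, b):
    """Symmetrized KL between Gaussians fitted to two samples of one feature.
    Assumption: both the Gaussian fit and the symmetrization are illustrative choices."""
    mu_a, var_a = fmean(a), pvariance(a)
    mu_b, var_b = fmean(b), pvariance(b)
    return gaussian_kl(mu_a, var_a, mu_b, var_b) + gaussian_kl(mu_b, var_b, mu_a, var_a)

# Hypothetical values of one feature.
spk_a_session1 = [1.0, 2.0, 3.0]
spk_a_session2 = [1.5, 2.5, 3.5]   # same speaker, different session
spk_b_session1 = [6.0, 7.0, 8.0]   # different speaker

inter_session = symmetric_kl(spk_a_session1, spk_a_session2)
inter_speaker = symmetric_kl(spk_a_session1, spk_b_session1)
ratio = inter_speaker / inter_session  # larger is better for speaker discrimination
```

The inter-language ratio would be computed the same way, comparing one speaker's data in two languages instead of two sessions.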

  10. Variability measures

  11. Experimental results
     • X-Y → Training in X, testing in Y

  12. Conclusions
     • Short-term intonation and energy values increase language robustness
         • Little accuracy drop on language-matched conditions
         • Very useful if test language is unpredictable
     • Variability measures predict results reasonably
         • Allows easy selection of features prior to experiments

  13. Text independent speaker identification in multilingual environments
     I. Luengo, E. Navas, I. Sainz, I. Saratxaga, J. Sanchez, I. Odriozola and I. Hernaez