I. Luengo, E. Navas, I. Sainz, I. Saratxaga, J. Sanchez, I. Odriozola and I. Hernaez
Text independent speaker identification in multilingual environments
Contents
• Introduction
• Speaker recognition (SR) in language-mismatched conditions
• Existing solutions
• Proposed solution
• Working database
• Variability measures
• Experimental results
• Conclusions
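Speaker Recognition System
[Diagram: in training, feature extraction feeds the training stage, which produces model M. In testing, feature extraction feeds scoring against M, followed by the decision. Language mismatch? Accuracy decreases.]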
Existing solutions
• Multi-language training
  • One model trained with various languages (per speaker)
  • Model learns characteristics of different languages
• Multi-model training
  • One model for each language (per speaker)
  • Language detector
Existing solutions: Drawbacks
• Possible languages must be known in advance for each speaker
• Not generalizable to languages not seen during training
• More recording sessions needed for training
  • More time, more money
• Desired solution: language independent
  • Suitable for languages not seen during training
  • Capable of single-language training
Proposed solution
• Language-independent features
  • Normalization?
  • New features?
• Short-term intonation and energy values
  • High speaker discrimination capability
  • Global distribution may change little with language
  • Combinable with MFCC
  • Only in voiced frames (intonation)
• High session variability
  • MVN (mean and variance normalization) for inter-session normalization
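The normalization step named on this slide can be illustrated with a minimal sketch. It assumes one F0 value and one energy value per frame plus a voicing flag. The function names (`voiced_prosody`, `mvn`), the per-session scope of the statistics, and the toy data are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def voiced_prosody(f0, energy, voiced):
    """Stack per-frame F0 and energy, keeping voiced frames only.

    Intonation is undefined in unvoiced frames, so those frames are dropped.
    """
    voiced = np.asarray(voiced, dtype=bool)
    feats = np.column_stack([f0, energy]).astype(float)
    return feats[voiced]

def mvn(features, eps=1e-8):
    """Mean and variance normalization over the frames of one session.

    features: array of shape (n_frames, n_dims).
    Returns features with (approximately) zero mean and unit variance per dimension.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)

# Toy session (synthetic values, not data from the presentation).
rng = np.random.default_rng(0)
n_frames = 200
voiced = rng.random(n_frames) > 0.4
f0 = np.where(voiced, rng.normal(120.0, 15.0, n_frames), 0.0)
energy = rng.normal(60.0, 5.0, n_frames)

session = mvn(voiced_prosody(f0, energy, voiced))
```

Normalizing each session separately removes session-level offsets and scale differences, which is the stated purpose of MVN here. Whether the statistics should be computed per session or per utterance is not specified in the slides.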
Database
• Bilingual Spanish-Basque speech database
• 22 speakers (11 male, 11 female)
• 4 sessions (inter-session variability)
• 7 numeric sequences (8 digits) per session and language
Variability measures
• Adding new features ALWAYS increases separability/variability
  • More speaker separability: more discrimination
  • More language variability: more model/test mismatch
  • More session variability: more model/test mismatch
• Key issue: Does speaker separability increase more than language/session variability?
Variability measures
• Kullback-Leibler divergence for variability estimation
• Interesting measures:
  • Inter-speaker variability / Inter-session variability
  • Inter-speaker variability / Inter-language variability
• Good if new features increase these ratios
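One way such a ratio could be estimated is sketched below. The slides do not say which distributions the Kullback-Leibler divergence is computed between, so this example assumes a single diagonal-covariance Gaussian per condition and a symmetrized divergence. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_gaussian(frames):
    """Fit a diagonal Gaussian (mean, variance) to frames of shape (n, d)."""
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=0), frames.var(axis=0) + 1e-8

def kl_diag(p, q):
    """KL(p || q) for two diagonal Gaussians given as (mean, variance)."""
    mu_p, var_p = p
    mu_q, var_q = q
    return 0.5 * float(np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    ))

def sym_kl(p, q):
    """Symmetrized divergence, so the order of the two conditions does not matter."""
    return kl_diag(p, q) + kl_diag(q, p)

def mean_pairwise(models):
    """Average symmetrized divergence over all distinct pairs of models."""
    n = len(models)
    values = [sym_kl(models[i], models[j]) for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(values))

# Synthetic toy data: 3 speakers x 2 languages, 2-dimensional features.
rng = np.random.default_rng(0)
data = [[rng.normal(3.0 * s + 0.3 * lang, 1.0, size=(500, 2)) for lang in range(2)]
        for s in range(3)]

# Inter-speaker: divergence between speakers, pooling each speaker's languages.
inter_speaker = mean_pairwise([fit_gaussian(np.vstack(data[s])) for s in range(3)])

# Inter-language: divergence between the two languages of the same speaker.
inter_language = float(np.mean(
    [sym_kl(fit_gaussian(data[s][0]), fit_gaussian(data[s][1])) for s in range(3)]
))

ratio = inter_speaker / inter_language
```

Under this sketch, a candidate feature set would be preferred when it raises `ratio` compared with the baseline features. The inter-session ratio can be computed the same way by grouping frames by session instead of by language.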
Experimental results
• Notation: X-Y means training in language X and testing in language Y
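The X-Y protocol can be sketched as a loop over every training/test language pair. This is only an illustration of the notation: it assumes one diagonal Gaussian per speaker as a stand-in model, and it runs on synthetic data in which the test language adds a small offset. None of the names or numbers come from the presentation.

```python
import numpy as np

def train(frames):
    """Train a stand-in speaker model: one diagonal Gaussian (mean, variance)."""
    frames = np.asarray(frames, dtype=float)
    return frames.mean(axis=0), frames.var(axis=0) + 1e-8

def score(model, frames):
    """Average per-frame log-likelihood of the frames under the model."""
    mu, var = model
    ll = -0.5 * (np.log(2 * np.pi * var) + (frames - mu) ** 2 / var)
    return float(ll.sum(axis=1).mean())

def identify(models, frames):
    """Decision: pick the speaker whose model scores highest."""
    return max(models, key=lambda spk: score(models[spk], frames))

def accuracy(train_set, test_set):
    """Train one model per speaker, then identify each test recording."""
    models = {spk: train(frames) for spk, frames in train_set.items()}
    hits = [identify(models, frames) == spk for spk, frames in test_set.items()]
    return sum(hits) / len(hits)

# Synthetic toy data: 4 speakers; the second language shifts every feature by 0.5.
rng = np.random.default_rng(0)

def toy(lang_shift):
    return {spk: rng.normal(2.0 * spk + lang_shift, 1.0, size=(300, 2))
            for spk in range(4)}

data = {"X": (toy(0.0), toy(0.0)), "Y": (toy(0.5), toy(0.5))}  # (train, test) per language

# One accuracy value per condition: "X-X", "X-Y", "Y-X", "Y-Y".
results = {f"{a}-{b}": accuracy(data[a][0], data[b][1]) for a in data for b in data}
```

The matched conditions (X-X, Y-Y) and mismatched conditions (X-Y, Y-X) can then be compared directly for each feature set.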
Conclusions
• Short-term intonation and energy values increase language robustness
  • Little accuracy drop in language-matched conditions
  • Very useful if the test language is unpredictable
• Variability measures predict results reasonably well
  • Allows easy selection of features prior to experiments