This research explores text-independent speaker identification in multilingual environments, focusing on language-independent solutions. It examines existing approaches, proposes a new language-independent system, uses a bilingual database, and analyzes variability measures to enhance speaker discrimination. Experimental results demonstrate the effectiveness of the proposed system in adapting to various languages and session conditions.
I. Luengo, E. Navas, I. Sainz, I. Saratxaga, J. Sanchez, I. Odriozola and I. Hernaez Text independent speaker identification in multilingual environments
Contents • Introduction • SR in language-mismatched conditions • Existing solutions • Proposed solution • Working database • Variability measures • Experimental results • Conclusions
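Speaker Recognition System • [Diagram: TRAIN: feature extraction, then training of the speaker model M; TEST: feature extraction, then scoring against M, then decision] • Language mismatch between training and test? Accuracy decreases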
Existing solutions • Multi-language training • One model trained with various languages (per speaker) • Model learns characteristics of different languages • Multi-model training • One model for each language (per speaker) • Language detector
Existing solutions: Drawbacks • Possible languages must be known in advance for each speaker • Not generalizable to languages not seen during training • More recording sessions needed for training • + Time + Money • Desired solution: Language independent • Suitable for languages not seen during training • Capable of single-language training
Proposed solution • Language-independent features • Normalization? • New features? • Short-term intonation and energy values • High speaker discrimination capability • Global distribution may change little with language • Combinable with MFCC • Only in voiced frames (intonation) • High session variability • MVN for inter-session normalization
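The MVN step named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes MVN means per-session mean and variance normalization of each feature dimension, and the frame values and the `[log-F0, log-energy]` layout are hypothetical.

```python
from statistics import fmean, pstdev

def mvn(frames, eps=1e-8):
    """Mean and variance normalization over the frames of one session.

    frames: list of equal-length feature vectors, one per (voiced) frame.
    Returns a new list in which every dimension has approximately
    zero mean and unit variance across the session.
    """
    dims = list(zip(*frames))              # transpose: one tuple per dimension
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) for d in dims]       # population standard deviation
    return [
        [(x - m) / (s + eps) for x, m, s in zip(frame, means, stds)]
        for frame in frames
    ]

# Hypothetical voiced frames: [log-F0, log-energy] (made-up numbers)
session = [[4.8, 2.1], [4.9, 2.4], [5.1, 2.0], [5.0, 2.3]]
normalized = mvn(session)
```

Normalizing each session separately removes session-level offsets and scale differences (for example a different microphone gain) before the features are combined with MFCC.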
Database • Bilingual Spanish-Basque speech database • 22 speakers (11 Male, 11 Female) • 4 sessions (inter-session variability) • 7 numeric sequences (8 digits) per session and language
Variability measures • Adding new features ALWAYS increases separability/variability • + Speaker separability → + discrimination • + Language variability → + model/test mismatch • + Session variability → + model/test mismatch • Key issue: Does speaker separability increase more than language/session variability?
Variability measures • Kullback-Leibler divergence for variability estimation • Interesting measures: (inter-speaker variability) / (inter-session variability) and (inter-speaker variability) / (inter-language variability) • Good if new features increase these ratios
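A rough sketch of how such ratios could be computed is shown below. The slides do not say how the feature distributions are modeled, so this example assumes each one is summarized by a single univariate Gaussian and uses the symmetric form of the Kullback-Leibler divergence; the function names and all numbers are hypothetical.

```python
import math

def kl_gauss(m0, s0, m1, s1):
    """KL divergence KL(N(m0, s0^2) || N(m1, s1^2)) for univariate Gaussians."""
    return math.log(s1 / s0) + (s0**2 + (m0 - m1)**2) / (2 * s1**2) - 0.5

def sym_kl(p, q):
    """Symmetric KL divergence between two (mean, std) pairs."""
    return kl_gauss(*p, *q) + kl_gauss(*q, *p)

# Hypothetical (mean, std) summaries of one feature (made-up numbers)
spk_a_ref      = (0.0, 1.0)   # speaker A, reference session and language
spk_b_ref      = (2.0, 1.0)   # speaker B, same session and language
spk_a_session2 = (0.5, 1.0)   # speaker A, different session
spk_a_language = (0.3, 1.0)   # speaker A, different language

inter_speaker  = sym_kl(spk_a_ref, spk_b_ref)
inter_session  = sym_kl(spk_a_ref, spk_a_session2)
inter_language = sym_kl(spk_a_ref, spk_a_language)

# Higher ratios suggest the feature separates speakers more than it
# is disturbed by session or language changes.
speaker_vs_session  = inter_speaker / inter_session
speaker_vs_language = inter_speaker / inter_language
```

In practice the divergences would be averaged over many speaker, session and language pairs, and the ratios compared with and without the candidate features.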
Experimental results • X-Y: training in language X, testing in language Y
Conclusions • Short-term intonation and energy values increase language robustness • Little accuracy drop in language-matched conditions • Very useful if test language is unpredictable • Variability measures predict results reasonably well • Allows easy selection of features prior to experiments