ELISA / IRISA 2002 SYSTEM DESCRIPTION
NIST Speaker Recognition Workshop, May 21-22, 2002
1sp DETECTION, Cellular Data
Mathieu BEN & Raphael Blouet & Frédéric BIMBOT for the ELISA consortium
Outline
• Generalities
• Development results
• 2002 Evaluation
• Conclusion
Generalities (1)
• Acoustic analysis
  • pre-emphasis (0.95)
  • 20 ms window every 10 ms
  • 16 filter-bank cepstral coefficients (over 300-3400 Hz)
  • delta cepstrum included (calculated over 5 vectors)
  • frame removal (bi-Gaussian modeling of the energy)
  • coefficients are centered and reduced (zero mean, unit variance)
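The first and last steps of the front end above can be sketched as follows. This is only an illustration under stated assumptions: the 8 kHz sampling rate is inferred from the 300-3400 Hz band and is not stated on the slide, the function names are hypothetical, and the filter-bank cepstra, delta computation and energy-based frame removal are left out.

```python
import math

def pre_emphasize(signal, coeff=0.95):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through unchanged."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, sample_rate=8000, win_ms=20, hop_ms=10):
    """Cut the signal into 20 ms windows taken every 10 ms.

    sample_rate=8000 is an assumption; an incomplete trailing window is dropped.
    """
    win = sample_rate * win_ms // 1000
    hop = sample_rate * hop_ms // 1000
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, hop)]

def center_and_reduce(vectors):
    """Normalize every coefficient to zero mean and unit variance over the utterance."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # A constant coefficient has zero deviation; divide by 1.0 in that case.
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n) or 1.0
            for d in range(dim)]
    return [[(v[d] - means[d]) / stds[d] for d in range(dim)] for v in vectors]
```

With these defaults each frame holds 160 samples and consecutive frames overlap by half.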
Generalities (2)
• Speaker models
  • GMMs with 128 components and diagonal covariance matrices
  • MAP estimation of the mixture parameters (mean-only adaptation)
  • MAP parameters are derived from gender-dependent world models
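Mean-only MAP adaptation from a world model can be sketched as below. The slide does not give the update rule, so this uses the common relevance-factor form as an assumption; the `relevance=16.0` default and all function names are hypothetical rather than taken from the ELISA system.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Return speaker means adapted from the world model; weights and variances stay fixed."""
    k, dim = len(means), len(means[0])
    counts = [0.0] * k
    sums = [[0.0] * dim for _ in range(k)]
    for x in frames:
        for j, g in enumerate(posteriors(x, weights, means, variances)):
            counts[j] += g
            for d in range(dim):
                sums[j][d] += g * x[d]
    adapted = []
    for j in range(k):
        # alpha moves from 0 (no data, keep the world mean) towards 1 (much data).
        alpha = counts[j] / (counts[j] + relevance)
        data_mean = [s / counts[j] for s in sums[j]] if counts[j] > 0 else means[j]
        adapted.append([alpha * dm + (1 - alpha) * m
                        for dm, m in zip(data_mean, means[j])])
    return adapted
```

Components that see little training data stay close to the world model, which is the usual motivation for adapting the means only.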
Generalities (3)
• Scoring (1)
  • The signal is split into temporal segments of 0.3 s
  • segmental score: mean over all segment frames [equation not preserved in this transcript]
  • segmental score normalization:
    • D-Norm: new score normalization
    • T-Norm
    • DT-Norm: T-Norm associated with D-Norm
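As a reminder of what T-Norm does, here is a minimal sketch: the score of a test segment against the claimed speaker is normalized by the mean and standard deviation of the scores the same segment obtains against a cohort of impostor models. The function name and the use of the population standard deviation are assumptions, not details from the slides.

```python
import statistics

def t_norm(score, impostor_scores):
    """Normalize a score with the statistics of the same test segment's impostor scores."""
    mu = statistics.mean(impostor_scores)
    sigma = statistics.pstdev(impostor_scores)
    # If every impostor score is identical, only remove the mean.
    return (score - mu) / sigma if sigma > 0 else score - mu
```

Because the statistics are computed on the test segment itself, T-Norm requires scoring each segment against every cohort model at test time.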
Generalities (3)
• Scoring (2)
  • Decision
  • Utterance score: the mean of the segmental scores is compared to a gender-dependent threshold
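The segmental scoring and decision steps can be sketched as follows. The sketch assumes that each frame already has a score (for instance a log-likelihood ratio between speaker and world model, which the transcript does not confirm), that 0.3 s corresponds to 30 frames at the 10 ms frame rate, and that a shorter final segment is kept. Function names, the `normalize` hook and the threshold value are hypothetical.

```python
def segment_scores(frame_scores, frames_per_segment=30):
    """Mean frame score of each consecutive 0.3 s segment (30 frames at a 10 ms hop)."""
    if not frame_scores:
        raise ValueError("no frames to score")
    segments = [frame_scores[i:i + frames_per_segment]
                for i in range(0, len(frame_scores), frames_per_segment)]
    return [sum(seg) / len(seg) for seg in segments]

def utterance_decision(frame_scores, threshold, normalize=lambda s: s,
                       frames_per_segment=30):
    """Average the (optionally normalized) segmental scores and compare to a threshold.

    `threshold` stands for the gender-dependent threshold of the claimed speaker.
    """
    normalized = [normalize(s)
                  for s in segment_scores(frame_scores, frames_per_segment)]
    score = sum(normalized) / len(normalized)
    return score, score >= threshold
```

For example, `utterance_decision(scores, threshold=0.0, normalize=lambda s: s / 2.0)` applies a per-segment scaling before averaging, which is where a segmental normalization would plug in.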
Generalities (4)
• D-Norm: “Distance Normalization”
  • use of Kullback-Leibler (KL) distances between the speaker models and the world models
  • the KL distances are estimated with a Monte Carlo method
  • D-Norm does not need additional speech data
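A Monte Carlo estimate of the KL divergence between two diagonal-covariance GMMs can be sketched as below: samples are drawn from one model and the log-density difference is averaged. The symmetric sum of both directions is an assumption about what "distance" means here, and the function names, sample count and seed are hypothetical. Each model is a `(weights, means, variances)` tuple.

```python
import math
import random

def log_gmm(x, weights, means, variances):
    """Log density of a diagonal-covariance GMM at x (log-sum-exp over components)."""
    logs = []
    for w, mean, var in zip(weights, means, variances):
        quad = sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                   for xi, m, v in zip(x, mean, var))
        logs.append(math.log(w) - 0.5 * quad)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def sample_gmm(rng, weights, means, variances):
    """Draw one vector: pick a component by weight, then sample each dimension."""
    j = rng.choices(range(len(weights)), weights=weights)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(means[j], variances[j])]

def mc_kl(p, q, n_samples=2000, seed=0):
    """Estimate KL(p || q) as the mean of log p(x) - log q(x) over samples x drawn from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_gmm(rng, *p)
        total += log_gmm(x, *p) - log_gmm(x, *q)
    return total / n_samples

def mc_symmetric_kl(p, q, n_samples=2000, seed=0):
    """Symmetric variant (an assumption): KL(p || q) + KL(q || p)."""
    return mc_kl(p, q, n_samples, seed) + mc_kl(q, p, n_samples, seed)
```

Only the two models are needed to draw the samples, which matches the remark that D-Norm requires no additional speech data.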
Generalities (5)
• D-Norm (2)
  • Experiments on the NIST’00 database:
    • the KL distances are strongly correlated with the impostor mean scores
  • Principles of the D-Norm:
    • score S with claimed identity Xl [equation not preserved in this transcript]
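Since the equation itself is missing, the following is only one form consistent with the correlation noted above, not necessarily the one shown on the slide. It assumes a test utterance $Y$, a world model $W$ and a symmetric KL distance $KL_2$; none of these symbols appear in the transcript. If the impostor mean score for model $X_l$ grows with the distance between $X_l$ and $W$, dividing by that distance puts the scores of different speakers on a comparable scale:

```latex
\tilde{S}(Y \mid X_l) = \frac{S(Y \mid X_l)}{KL_2(X_l, W)},
\qquad
KL_2(X_l, W) = KL(X_l \,\|\, W) + KL(W \,\|\, X_l)
```

The denominator depends only on the two models, so it can be computed once per speaker at enrollment time.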
Development: results (1)
• D-Norm is compared to No norm and Z-Norm
• DT-Norm is compared to D-Norm and T-Norm
• no development on cellular data
• development on the NIST’00 database (landline)
Development: results (2)
• [Figure not preserved; curves: No norm, D-Norm, Z-Norm]
Development: results (3)
• [Figure not preserved; curves: D-Norm, T-Norm, DT-Norm]
2002 Evaluation (1)
• 4 systems submitted
  • IRI_1: D-Norm, 1st type world models
  • IRI_2: T-Norm, 1st type world models
  • IRI_3: DT-Norm, 1st type world models (primary system)
  • IRI_4: DT-Norm, 2nd type world models
• impostor population for T-Norm: cellular data from the 2001 eval. (74 male speakers and 100 female speakers)
2002 Evaluation (2)
• world models:
  • 128-component GMMs (diag. cov. mat.), gender dependent
  • training data: cellular data from the NIST’01 eval. (74 male test segments, 100 female test segments)
  • IRI_1, IRI_2, IRI_3 (1st type world models): world models are initialised with the gender-dependent world models used in the 2001 eval. (landline) and trained with an EM algorithm and an ML criterion
  • IRI_4 (2nd type world models): world models are adapted from the gender-dependent world models used in the 2001 eval. (landline) with an EM algorithm and a MAP criterion
Conclusion
• validation of D-Norm and DT-Norm
• systems with the 2 types of world models perform comparably
• work needed on cellular data