ELISA / IRISA 2002 SYSTEM DESCRIPTION

NIST Speaker Recognition Workshop, May 21-22, 2002. 1sp detection, cellular data. Mathieu BEN & Raphael Blouet & Frédéric BIMBOT for the ELISA consortium.

Presentation Transcript


  1. ELISA / IRISA 2002 SYSTEM DESCRIPTION NIST Speaker Recognition Workshop, May 21-22, 2002 1sp DETECTION Cellular Data Mathieu BEN & Raphael Blouet & Frédéric BIMBOT for the ELISA consortium

  2. Outline • Generalities • Development results • 2002 Evaluation • Conclusion

  3. Generalities (1) • Acoustic analysis: pre-emphasis (0.95); 20 ms window every 10 ms; 16 filter-bank cepstral coefficients (over 300-3400 Hz); delta cepstrum included (calculated over 5 vectors); frame removal (bi-Gaussian modeling of the energy) • Coefficients are centered and reduced
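The front-end steps listed on this slide can be sketched in a few lines. This is a minimal illustration, not the ELISA code: the 8 kHz sampling rate is an assumption (the slide only gives the 300-3400 Hz band), the function names are hypothetical, and the filter-bank cepstral analysis itself is left out.

```python
import math

def pre_emphasize(signal, coeff=0.95):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through unchanged."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, sample_rate=8000, window_ms=20, shift_ms=10):
    """Cut the signal into 20 ms windows taken every 10 ms (sample_rate is assumed)."""
    win = sample_rate * window_ms // 1000
    hop = sample_rate * shift_ms // 1000
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, hop)]

def center_and_reduce(vectors):
    """Give each coefficient zero mean and unit variance over the utterance."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    # A constant coefficient has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v[d] - means[d]) ** 2 for v in vectors) / n) or 1.0
            for d in range(dim)]
    return [[(v[d] - means[d]) / stds[d] for d in range(dim)] for v in vectors]
```

With these assumed settings, one second of signal (8000 samples) yields 99 frames of 160 samples each.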

  4. Generalities (2) • Speaker models • GMMs with 128 Gaussian components and diagonal covariance matrices • MAP estimation of the mixture parameters (mean-only adaptation) • MAP parameters are derived from gender-dependent world models.
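Mean-only MAP adaptation can be illustrated as follows. The slide does not give the update rule, so this sketch assumes the common relevance-factor formulation, in which each adapted mean is a count-weighted mix of the data mean and the world-model mean; the value `relevance=16.0` and all function names are assumptions.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Adapt only the means of the world model; weights and variances are kept."""
    k, dim = len(means), len(means[0])
    occ = [0.0] * k                       # soft frame counts per component
    acc = [[0.0] * dim for _ in range(k)]  # posterior-weighted sums of frames
    for x in frames:
        for i, g in enumerate(posteriors(x, weights, means, variances)):
            occ[i] += g
            for d in range(dim):
                acc[i][d] += g * x[d]
    # Equivalent to alpha * data_mean + (1 - alpha) * prior_mean,
    # with alpha = occ / (occ + relevance).
    return [[(acc[i][d] + relevance * means[i][d]) / (occ[i] + relevance)
             for d in range(dim)] for i in range(k)]
```

Components that see little training data stay close to the world model, which is the usual motivation for adapting from a world model instead of training each speaker model from scratch.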

  5. Generalities (3) • Scoring (1) • The signal is split into temporal segments of 0.3 s • Segmental score: mean over all segment frames (the frame-score formula is not preserved in this transcript) • Segmental score normalization: D-Norm (new score normalization); T-Norm; DT-Norm (T-Norm associated with D-Norm)
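A small sketch of segmental scoring with T-Norm, under stated assumptions: frame-level scores are taken as given (their formula is missing from the transcript), a 0.3 s segment is taken to be 30 frames at the 10 ms frame shift, a trailing shorter segment is kept, and T-Norm is written in its usual form (subtract the mean and divide by the standard deviation of the scores obtained against a cohort of impostor models). Function names are hypothetical.

```python
import statistics

def segment_scores(frame_scores, frames_per_segment=30):
    """Mean frame score per segment; 30 frames = 0.3 s at a 10 ms shift.

    A shorter trailing segment is kept (an assumption, not stated on the slide).
    """
    return [statistics.fmean(frame_scores[s:s + frames_per_segment])
            for s in range(0, len(frame_scores), frames_per_segment)]

def t_norm(score, cohort_scores):
    """Normalize a score by the mean and standard deviation of the scores
    that the same speech obtains against a cohort of impostor models."""
    mu = statistics.fmean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)
    # Degenerate cohort (all scores equal): only remove the offset.
    return (score - mu) / sigma if sigma > 0 else score - mu
```

For example, `segment_scores([1.0] * 30 + [3.0] * 30 + [5.0] * 10)` gives three segment scores, the last one computed over only 10 frames.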

  6. Generalities (3) • Scoring (2): decision • Utterance score: mean of the segmental scores, compared to a gender-dependent threshold
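The decision rule on this slide amounts to the following sketch. The threshold values in the usage example are made up for illustration, and accepting on "greater than or equal" is an assumption; the function name is hypothetical.

```python
def decide(seg_scores, gender, thresholds):
    """Average the (normalized) segmental scores and compare the result
    to the threshold of the claimed speaker's gender."""
    utterance_score = sum(seg_scores) / len(seg_scores)
    return utterance_score, utterance_score >= thresholds[gender]

# Illustrative thresholds only; real values would be tuned on development data.
score, accepted = decide([0.5, 1.5], "female", {"male": 1.25, "female": 0.75})
```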

  7. Generalities (4) • D-Norm : “Distance Normalization” • use of Kullback-Leibler distances between the speaker models and the world models • the KL distances are estimated with a Monte Carlo method • D-Norm does not need additional speech data
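There is no closed form for the KL divergence between two GMMs, which is why a Monte Carlo estimate is used. A minimal sketch of such an estimate: draw samples from the first model and average the log-likelihood difference. The GMM representation (a list of `(weight, mean, variance)` tuples with diagonal covariances), the sample count and the function names are assumptions; the slides do not say whether the directed or a symmetrized distance is used.

```python
import math
import random

def log_gmm(x, gmm):
    """Log density of x under a diagonal-covariance GMM."""
    logs = []
    for w, mean, var in gmm:
        ll = -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                        for xi, m, v in zip(x, mean, var))
        logs.append(math.log(w) + ll)
    top = max(logs)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs))

def sample_gmm(gmm, rng):
    """Pick a component according to the weights, then sample from it."""
    weights = [w for w, _, _ in gmm]
    _, mean, var = rng.choices(gmm, weights=weights, k=1)[0]
    return [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]

def kl_monte_carlo(p, q, n_samples=10000, seed=0):
    """Estimate KL(p || q) as the mean of log p(x) - log q(x) with x drawn from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = sample_gmm(p, rng)
        total += log_gmm(x, p) - log_gmm(x, q)
    return total / n_samples
```

A symmetrized distance, if that is what is wanted, can be obtained as `kl_monte_carlo(p, q) + kl_monte_carlo(q, p)`. Since the samples are generated from the models themselves, no extra speech data is needed, which matches the last bullet of the slide.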

  8. Generalities (5) • D-Norm (2) • Experiments on NIST’00 database : • the KL distances are strongly correlated with the impostor mean scores • Principle of the D-Norm : • score S with claimed identity X_l (the normalization formula is not preserved in this transcript)

  9. Development: results (1) • D-Norm is compared to No norm and Z-Norm • DT-Norm is compared to D-Norm and T-Norm • no development on cellular data • development on the NIST’00 database (landline)

  10. Development: results (2) • No norm • D-Norm • Z-Norm

  11. Development: results (3) • D-Norm • T-Norm • DT-Norm

  12. 2002 Evaluation (1) • 4 systems submitted • IRI_1 : D-Norm, 1st type world models • IRI_2 : T-Norm, 1st type world models • IRI_3 : DT-Norm, 1st type world models (primary system) • IRI_4 : DT-Norm, 2nd type world models • impostor population for T-Norm : cellular data from 2001 eval. (74 male speakers and 100 female speakers)

  13. 2002 Evaluation (2) • world models : • GMMs with 128 components (diag. cov. mat.), gender dependent • training data : cellular data from NIST’01 eval. (74 male test segments, 100 female test segments) • IRI_1, IRI_2, IRI_3 : 1st type world models, initialised with the gender-dependent world models used in the 2001 eval. (landline) and trained with an EM algorithm and an ML criterion • IRI_4 : 2nd type world models, adapted from the gender-dependent world models used in the 2001 eval. (landline) with an EM algorithm and a MAP criterion

  14. 2002 Evaluation (3)

  15. Conclusion • validation of D-Norm and DT-Norm • systems with the 2 types of world models perform comparably • work needed on cellular data
