
ALISP based improvement of GMM’s for Text-independent Speaker Verification

Dijana Petrovska-Delacrétaz (1), Asmaa el Hannani (1), Gérard Chollet (2). 1: DIVA Group, University of Fribourg. 2: GET-ENST, CNRS-LTCI, Paris. 3-4 December 2003, Biometrics Tutorials, Uni. Fribourg.


Presentation Transcript


  1. ALISP based improvement of GMM’s for Text-independent Speaker Verification
     Dijana Petrovska-Delacrétaz (1), Asmaa el Hannani (1), Gérard Chollet (2)
     1: DIVA Group, University of Fribourg
     2: GET-ENST, CNRS-LTCI, Paris
     3-4 December 2003, Biometrics Tutorials, Uni. Fribourg

  2. Overview
     1. Why segmental speaker verification systems?
     2. Speech segmentation problems
     3. Proposed segmental system based on DTW distance measure
     4. Experimental setup
     5. Results
     6. Conclusions and perspectives

  3. 1 Why segmental speaker verification systems?
     • Current reference speaker verification systems are based on Gaussian Mixture Models (each speech frame is treated independently)
     • Speech is composed of different sounds
     • Phonemes have different discriminant characteristics for speaker verification (see Eatock et al. ’94, J. Olsen ’97, Petrovska et al. ’98, 2000…)
       • nasals and vowels convey more speaker characteristics than other speech classes
       • we would like to exploit this fact
     • We need an automatic speech segmentation tool!
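To make the remark "each speech frame is treated independently" concrete, here is a minimal sketch of GMM scoring. It assumes diagonal covariances and an average per-frame log-likelihood as the utterance score; the function names and the toy parameters are illustrative and are not taken from the slides.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one frame under a GMM (log-sum-exp over components)."""
    terms = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def gmm_utterance_score(frames, weights, means, variances):
    """Average per-frame log-likelihood (assumed scoring rule).

    Each frame contributes on its own, so the order of the frames,
    and hence any segmental structure, has no effect on the score.
    """
    total = sum(gmm_frame_loglik(x, weights, means, variances) for x in frames)
    return total / len(frames)

# Toy 2-component model on 2-dimensional frames (hypothetical values).
weights = [0.4, 0.6]
means = [[0.0, 0.0], [1.0, 1.0]]
variances = [[1.0, 1.0], [2.0, 0.5]]
frames = [[0.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
print(gmm_utterance_score(frames, weights, means, variances))
```

Reversing or shuffling `frames` leaves the score unchanged up to rounding, which is exactly the information a segmental system tries to recover.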

  4. 1.1 Advantages and disadvantages of speech segmentation
     • Problems:
       • Need for a speech segmentation tool
       • Speaker modeling per speech class => more data needed
       • More complicated systems
     • Advantages:
       • Possibility to use it in combination with dialogue-based systems, for which a speech segmentation is already done
       • Possibility to introduce text-prompted speaker verification, designed to include a maximum number of speaker-specific units

  5. 2 Speech Segmentation
     • Large Vocabulary Continuous Speech Recognition (LVCSR) system
       • good results for a small set of languages
       • needs a huge amount of annotated speech data
       • language (and task) dependent
       • we do not have such a system for American English

  6. 2.1 ALISP Speech Segmentation
     • Data-driven speech segmentation
       • not yet usable for speech recognition purposes
       • no annotated databases needed
       • language and task independent
       • we could use it to segment the speech data for a text-independent speaker verification task
     • We will use the data-driven speech segmentation method ALISP (Automatic Language Independent Speech Processing)

  7. 2.2 ALISP principles

  8. 3 Proposed speaker verification system: ALISP segments and DTW
     3.1 Segmentation problem
     • Segmentation of the speech data with N ALISP HMM models
       • N = 64 speech classes
     • Need for (untranscribed) speech data to train the 64 ALISP HMM models
     • With so many speech classes there is not enough data for GMM adaptation, so the speaker modeling method has to change
       => use of Dynamic Time Warping (DTW)

  9. 3.2 DTW distance measure for speaker verification
     • Dynamic Time Warping (DTW) has already been used for speaker verification in text-dependent mode (Rosenberg ’76, Rabiner and Schafer ’76, Furui ’81, Pandit and Kittler ’98…)
     • The DTW distance measure between two speech segments conveys speaker-specific characteristics
     • Originality: DTW used in text-independent mode
       • We first segment the speech data into ALISP classes
       • Then we measure the “distance” between speaker and non-speaker segments
     • Client speaker: speaker-specific information is extracted from the ALISP-based speech segments => Client Dictionary
     • Non-speakers (world speakers): ALISP-based speech segments => World Dictionary
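As a reminder of what a DTW distance between two speech segments looks like, here is a minimal sketch. The slides do not state the local distance, the step pattern or the normalization, so the choices below (Euclidean local distance, the three basic steps, division by the summed lengths) are assumptions made for illustration only.

```python
import math

def dtw_distance(seq_a, seq_b):
    """DTW cost between two sequences of feature vectors.

    Assumptions (not from the slides): Euclidean local distance,
    steps (i-1, j), (i, j-1), (i-1, j-1), and normalization by
    len(seq_a) + len(seq_b).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # seq_a advances alone
                cost[i][j - 1],      # seq_b advances alone
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m] / (n + m)

# A segment and a time-stretched copy of it align at zero cost.
short = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
print(dtw_distance(short, stretched))
```

The time-stretched example shows why DTW suits segments of unequal duration: differences in speaking rate are absorbed by the warping path, leaving the spectral mismatch as the cost.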

  10. 3.3 Searching in the client and world speech dictionaries for speaker verification purposes
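The figure for this slide did not survive in the transcript, so the exact search and scoring rule is unknown. One plausible reading, assumed here, is that each ALISP segment of the test utterance is compared with the segments of the same class in the client and world dictionaries, and the nearest-neighbour DTW distances are compared. All names and the scoring formula below are hypothetical.

```python
import math

def dtw(a, b):
    """Length-normalized DTW cost with Euclidean local distance (assumed)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = math.dist(a[i - 1], b[j - 1]) + step
    return cost[n][m] / (n + m)

def nearest(segment, candidates):
    """Smallest DTW distance from a segment to any dictionary entry."""
    return min(dtw(segment, c) for c in candidates)

def segmental_score(test_segments, client_dict, world_dict):
    """Hypothetical score: mean of (d_world - d_client) over test segments.

    test_segments: list of (alisp_label, frames)
    client_dict, world_dict: alisp_label -> list of frame sequences
    A positive value means the test segments lie closer to the client.
    """
    diffs = []
    for label, frames in test_segments:
        if label not in client_dict or label not in world_dict:
            continue  # no reference segments for this ALISP class
        d_client = nearest(frames, client_dict[label])
        d_world = nearest(frames, world_dict[label])
        diffs.append(d_world - d_client)
    return sum(diffs) / len(diffs) if diffs else 0.0

# Toy dictionaries with one ALISP class and one-dimensional frames.
client = {"Ha": [[[0.0], [1.0]]]}
world = {"Ha": [[[5.0], [6.0]]]}
test = [("Ha", [[0.0], [1.0]])]
print(segmental_score(test, client, world))
```

Restricting the search to entries with the same ALISP label keeps the comparison between acoustically similar units, which is the point of segmenting first.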

  11. 4 Evaluation of the proposed system: experimental setup
      • Development data: one subset from NIST 2002 cellular data (American English)
        • world speakers (60 female + 59 male):
          • used to train the ALISP speech segmenter
          • and to model the non-speakers (world speakers)
      • Evaluated on
        • another subset from NIST 2002 (111 + 79 male speakers)

  12. 4.1 Speech segmentation example
      • Two other occurrences of the English phone “ay”
      • the corresponding ALISP sequences: HX - Hf and (HM) - Hf - Ha-
      • previous slide: (Hf) - Ha or (HM) - HZ - Ha

  13. 4.2 Results: GMM, ALISP-DTW systems and their fusion

  14. 4.3 Results: EER comparison
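The equal error rate (EER) is the operating point where the false acceptance rate equals the false rejection rate. The sketch below estimates it from two lists of scores by sweeping the decision threshold; the scores are made-up toy values, not results from the presentation, and higher scores are assumed to favour the client.

```python
def equal_error_rate(client_scores, impostor_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= threshold. With finite
    data FAR and FRR rarely match exactly, so the threshold with the
    smallest |FAR - FRR| is used and their mean is returned.
    """
    thresholds = sorted(set(client_scores) | set(impostor_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False rejection: client trials scoring below the threshold.
        frr = sum(s < t for s in client_scores) / len(client_scores)
        # False acceptance: impostor trials scoring at or above it.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2.0
    return best_eer

# Toy scores (hypothetical): one client and one impostor trial overlap.
clients = [0.9, 0.8, 0.3]
impostors = [0.1, 0.2, 0.7]
print(equal_error_rate(clients, impostors))
```

With these toy scores the threshold 0.7 rejects one of three client trials and accepts one of three impostor trials, so both error rates are one third.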

  15. 4.4 Importance of fusion (33% improvement)
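The transcript does not say how the GMM and ALISP-DTW scores were fused. A common simple choice, assumed here purely for illustration, is to normalize each system's scores and take a weighted sum. The helper `relative_improvement` shows how a figure such as "33% improvement" is usually computed as a relative EER reduction; the numbers passed to it are placeholders, not results from the slides.

```python
import statistics

def z_norm(scores):
    """Shift and scale scores to zero mean and unit standard deviation."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return [(s - mu) / sigma if sigma > 0 else 0.0 for s in scores]

def fuse(gmm_scores, dtw_scores, weight=0.5):
    """Weighted sum of normalized scores, one pair per trial (assumed rule).

    In practice the normalization statistics and the weight would be
    estimated on development data rather than on the test trials.
    """
    if len(gmm_scores) != len(dtw_scores):
        raise ValueError("both systems must score the same trials")
    g, d = z_norm(gmm_scores), z_norm(dtw_scores)
    return [weight * a + (1.0 - weight) * b for a, b in zip(g, d)]

def relative_improvement(eer_before, eer_after):
    """Relative EER reduction, e.g. 0.33 for a 33% improvement."""
    return (eer_before - eer_after) / eer_before

# Placeholder scores on very different scales for three trials.
fused = fuse([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
print(fused)
# Placeholder EER values chosen only to illustrate the formula.
print(relative_improvement(12.0, 8.0))
```

Normalizing before summing matters because the two systems produce scores on unrelated scales (log-likelihoods versus distances), so an unnormalized sum would be dominated by one of them.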

  16. 4.5 Using only GMM scores on the segments => segmental GMM system

  17. 5. Conclusions
      • State-of-the-art NIST 2002 results for EER: best 8% to worst 28%
      • Fusion of a classical system with a segmental system: big improvements
      • Why: the higher-level information present in the segmental system usefully complements the short-term frequency information present in the GMM system
