ALISP based improvement of GMM’s for Text-independent Speaker Verification

Dijana Petrovska-Delacrétaz (1), Asmaa el Hannani (1), Gérard Chollet (2)
1: DIVA Group, University of Fribourg
2: GET-ENST, CNRS-LTCI, Paris
3-4 December 2003, Biometrics Tutorials, Uni. Fribourg

Overview
1. Why segmental speaker verification systems?
2. Speech segmentation problems
3. Proposed segmental system based on DTW distance measure
4. Experimental setup
5. Results
6. Conclusions and perspectives

1 Why segmental speaker verification systems?
• Current reference speaker verification systems are based on Gaussian Mixture Models (GMMs), which treat each speech frame independently
• Speech is composed of different sounds
• Phonemes have different discriminant characteristics for speaker verification (see Eatock et al. ’94, J. Olsen ’97, Petrovska et al. ’98, 2000…)
  • Nasals and vowels convey more speaker characteristics than other speech classes
  • We would like to exploit this fact
• We need an automatic speech segmentation tool!
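To make the "each frame is treated independently" point concrete, here is a minimal sketch of how a diagonal-covariance GMM could score an utterance. The function names and the plain-list data layout are assumptions for illustration, not the reference system used in the talk. The utterance score is simply the average of per-frame log-likelihoods, so the order of the frames has no influence.

```python
import math

def log_gaussian_diag(frame, mean, var):
    """Log density of one frame under a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
               for x, m, v in zip(frame, mean, var))

def gmm_avg_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of `frames` under a GMM.

    frames: non-empty list of feature vectors (lists of floats).
    weights, means, variances: one entry per mixture component.
    Frames are scored one by one; their order does not matter.
    """
    total = 0.0
    for frame in frames:
        comp = [math.log(w) + log_gaussian_diag(frame, m, v)
                for w, m, v in zip(weights, means, variances)]
        peak = max(comp)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)
```

Because the score is a plain average over frames, any temporal structure inside a phone-like unit is ignored, which is the limitation the segmental approach tries to address.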

1.1 Advantages and disadvantages of speech segmentation
• Problems:
  • A speech segmentation tool is needed
  • Speaker modeling per speech class => more data needed
  • More complicated systems
• Advantages:
  • Can be combined with dialogue-based systems, for which speech segmentation is already done
  • Makes it possible to introduce text-prompted speaker verification, designed to include a maximum number of speaker-specific units

2 Speech Segmentation
• Large Vocabulary Continuous Speech Recognition (LVCSR) system
  • Good results for a small set of languages
  • Needs a huge amount of annotated speech data
  • Language (and task) dependent
  • We do not have such a system for American English

2.1 ALISP Speech Segmentation
• Data-driven speech segmentation
  • Not yet usable for speech recognition purposes
  • No annotated databases needed
  • Language and task independent
  • We could use it to segment the speech data for a text-independent speaker verification task
• We will use the data-driven speech segmentation method ALISP (Automatic Language Independent Speech Processing)

2.2 ALISP principles

3 Proposed speaker verification system: ALISP segments and DTW
3.1 Segmentation problem
• Segmentation of the speech data with N ALISP HMM models
  • N = 64 speech classes
  • (Untranscribed) speech data is needed to train the 64 ALISP HMM models
• With so many speech classes we should change the speaker modeling method: there is not enough data for GMM adaptation
  => Use of Dynamic Time Warping (DTW)

3.2 DTW distance measure for speaker verification
• Dynamic Time Warping (DTW) was already used for speaker verification in text-dependent mode (Rosenberg ’76, Rabiner Schafer ’76, Furui ’81, Pandit and Kittler ’98…)
• The DTW distance measure between two speech segments conveys speaker-specific characteristics
• Originality: DTW is used in text-independent mode
• We first segment the speech data into ALISP classes
• We then measure the “distance” between speaker and non-speaker segments
• Speaker-specific information is extracted from the client’s ALISP-based speech segments => Client Dictionary
• Non-speakers (world speakers): ALISP-based speech segments => World Dictionary
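As an illustration of the distance measure named on this slide, the following is a minimal textbook DTW sketch. The slides do not specify the local frame distance, the step pattern, or any normalization, so the Euclidean frame distance, the symmetric three-way step, and the division by the summed segment lengths are all assumptions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seg_a, seg_b, frame_dist=euclidean):
    """Length-normalized DTW distance between two speech segments.

    Each segment is a non-empty list of frames (lists of floats).
    """
    n, m = len(seg_a), len(seg_b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning seg_a[:i] with seg_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(seg_a[i - 1], seg_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in seg_a only
                                 cost[i][j - 1],      # advance in seg_b only
                                 cost[i - 1][j - 1])  # advance in both
    # Normalization by n + m is an assumption, not taken from the slides.
    return cost[n][m] / (n + m)
```

With this sketch, a segment and a time-stretched copy of it (for example `[[0.0], [1.0], [2.0]]` and `[[0.0], [1.0], [1.0], [2.0]]`) are at distance 0, which is the property that makes DTW suitable for comparing segments of different durations.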

3.3 Searching in the client and world speech dictionaries for speaker verification purposes
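The search itself is shown only as a figure in the original slide, so what follows is a hedged sketch of one plausible scoring rule rather than the authors' procedure. For each test segment, the closest entry of the same ALISP class is looked up in the client and in the world dictionary, and the differences of the two nearest distances are averaged. The function names, the dictionary layout, and the scoring rule are assumptions. The `distance` argument is meant to be a DTW distance between segments.

```python
def nearest_distance(segment, candidates, distance):
    """Smallest distance from `segment` to any candidate segment."""
    return min(distance(segment, c) for c in candidates)

def verification_score(test_segments, client_dict, world_dict, distance):
    """Average (world - client) nearest-neighbour distance.

    test_segments: list of (alisp_label, segment) pairs.
    client_dict, world_dict: dict mapping ALISP label -> list of segments.
    A positive score means the test speech lies closer to the client
    dictionary than to the world dictionary. Segments whose class is
    missing from either dictionary are skipped.
    """
    diffs = []
    for label, seg in test_segments:
        client_cands = client_dict.get(label)
        world_cands = world_dict.get(label)
        if not client_cands or not world_cands:
            continue
        d_client = nearest_distance(seg, client_cands, distance)
        d_world = nearest_distance(seg, world_cands, distance)
        diffs.append(d_world - d_client)
    if not diffs:
        raise ValueError("no test segment has a class in both dictionaries")
    return sum(diffs) / len(diffs)
```

A toy call, using single numbers as stand-ins for segments and an absolute difference as a stand-in for DTW:

```python
toy_distance = lambda a, b: abs(a - b)
client = {"Ha": [1.0, 2.0], "Hf": [5.0]}
world = {"Ha": [4.0], "Hf": [9.0]}
score = verification_score([("Ha", 1.5), ("Hf", 6.0)], client, world, toy_distance)
```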

4 Evaluation of the proposed system: experimental setup
• Development data: one subset from NIST 2002 cellular data (American English)
  • World speakers (60 female + 59 male):
    • used to train the ALISP speech segmenter
    • and to model the non-speakers (world speakers)
• Evaluated on another subset from NIST 2002 (111 + 79 male speakers)

4.1 Speech segmentation example
• 2 other occurrences of the English phone: ay
• The corresponding ALISP sequences: HX - Hf and (HM) - Hf - Ha-
• Previous slide: (Hf) - Ha or (HM) - HZ - Ha

4.2 Results: GMM, ALISP-DTW systems and their fusion

4.3 Results: EER comparison
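For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it can be estimated from two lists of trial scores is given below. The function name and the simple threshold sweep are assumptions, and the official NIST scoring tools may interpolate differently.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER from target and impostor trial scores.

    Higher scores are assumed to mean "more likely the claimed speaker".
    Every observed score is tried as a threshold; the EER is taken at the
    threshold where false acceptance and false rejection are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for thr in thresholds:
        # Impostors accepted at this threshold
        far = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        # True speakers rejected at this threshold
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

On a finite trial list the two error rates rarely cross exactly, so the sketch reports the midpoint at the closest threshold.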

4.4 Importance of fusion (33% improvement)
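The slides do not say how the GMM and ALISP-DTW scores were fused. One common choice, shown here purely as an assumed sketch, is a weighted sum of the two per-trial scores. The helper `relative_improvement` assumes that the 33% figure refers to a relative reduction in EER.

```python
def fuse_scores(gmm_scores, dtw_scores, weight=0.5):
    """Weighted-sum fusion of two per-trial score lists.

    `weight` is applied to the GMM score and (1 - weight) to the DTW score.
    Both lists are assumed to be on comparable scales; in practice the
    scores would usually be normalized before being combined.
    """
    if len(gmm_scores) != len(dtw_scores):
        raise ValueError("score lists must have the same length")
    return [weight * g + (1.0 - weight) * d
            for g, d in zip(gmm_scores, dtw_scores)]

def relative_improvement(baseline_eer, fused_eer):
    """Relative error reduction, e.g. about 0.33 for a 33% improvement."""
    return (baseline_eer - fused_eer) / baseline_eer
```

As a purely hypothetical illustration (these numbers are not taken from the slides), `relative_improvement(0.12, 0.08)` is about 0.33, i.e. one third of the baseline errors removed.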

4.5 Using only GMM scores on the segments => segmental GMM system

5 Conclusions
• State-of-the-art NIST 2002 results for EER: (best 8% to worst 28%)
• Fusion of a classical system with a segmental system: big improvements
• Why: the higher-level information present in the segmental system usefully complements the short-term frequency information present in the GMM system
