Speaker Identification and Verification

Dan Burnett, Nuance
58th IETF

Terminology
• Speaker identification -- using utterances from a speaker, determine who the caller is out of a set of known speakers
• Speaker verification -- using utterances from a speaker, determine whether the caller is who he/she claims to be (requires an identity claim)
• Training -- using utterances from a speaker to train a unique voiceprint that can later be used to identify/verify a speaker. Applies to both SI/SV.
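The three terms above can be contrasted with a minimal sketch. It assumes, purely for illustration, that each utterance has already been reduced to a small feature vector, that a voiceprint is the average of the training vectors, and that matching uses cosine similarity with an arbitrary threshold of 0.8. None of these choices comes from the draft, and the function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def train(utterances):
    """Training: build a voiceprint (here, just the mean feature vector)."""
    count = len(utterances)
    return [sum(column) / count for column in zip(*utterances)]

def identify(utterance, voiceprints):
    """Identification: pick the best match out of a set of known speakers."""
    return max(voiceprints, key=lambda name: cosine(utterance, voiceprints[name]))

def verify(utterance, claimed_identity, voiceprints, threshold=0.8):
    """Verification: accept or reject an identity claim (threshold is assumed)."""
    return cosine(utterance, voiceprints[claimed_identity]) >= threshold

# Hypothetical two-dimensional features for two enrolled speakers.
voiceprints = {
    "alice": train([[1.0, 0.0], [0.9, 0.1]]),
    "bob": train([[0.0, 1.0], [0.1, 0.9]]),
}
```

Identification needs no identity claim and always returns one of the known speakers, while verification takes the claim as input and returns only accept or reject.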

draft-burnett-mrcpext-00.txt
• Created by Nuance and Intervoice
• Proposes extensions to MRCP v1 (draft-shanmugham-mrcp-04.txt)
• Based originally on Nuance functionality, modified to be more general
• Starting point for MRCP v2 functionality discussions
• Also extensions for speaker-enrolled grammars, hotword recognition, and to the recognition resource

Proposed SI/SV process (simplified, see section 6.7)

VER-START-SESSION
VER-BUFFERING-START
VER-SET-VOICEPRINT
VER-BUFFERING-CONTROL
VER-FROM-BUFFER*
VER-BUFFERING-STOP
VER-END-SESSION

VER-DELETE-VOICEPRINT
VER-ROLLBACK
GET-PARAMS
SET-PARAMS
VERIFY
VER-FROM-BUFFER*

* Requires active buffering and ver/id sessions.
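The footnote's precondition on VER-FROM-BUFFER can be sketched as a small state check. This is only an illustration under stated assumptions: the class name, the "ok"/"rejected" return values, and the idea that the start/stop methods simply toggle two independent flags are hypothetical and are not taken from the draft.

```python
class VerificationState:
    """Tracks the two conditions the slide footnote names (hypothetical model)."""

    def __init__(self):
        self.session_active = False
        self.buffering_active = False

    def handle(self, method):
        """Apply a method name and report whether it would be allowed."""
        if method == "VER-START-SESSION":
            self.session_active = True
        elif method == "VER-END-SESSION":
            self.session_active = False
        elif method == "VER-BUFFERING-START":
            self.buffering_active = True
        elif method == "VER-BUFFERING-STOP":
            self.buffering_active = False
        elif method == "VER-FROM-BUFFER":
            # Footnote: requires active buffering and ver/id sessions.
            if not (self.session_active and self.buffering_active):
                return "rejected"
        return "ok"

state = VerificationState()
early = state.handle("VER-FROM-BUFFER")        # nothing active yet
state.handle("VER-START-SESSION")
state.handle("VER-BUFFERING-START")
in_session = state.handle("VER-FROM-BUFFER")   # both conditions hold
```

The other methods on the slide are left unconstrained here because the slide states no preconditions for them.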

Discussion points
• Why buffering?
• Registry for return info
• Anything else before I convert to MRCPv2?

Voice/Text Grammar Enrollment (simplified, see section 5.5)
• Extension to existing recognition resource
• Creates speaker-produced grammar entries
• E.g., voice-enrolled entries for voice dialing
• Both speech and text can be used to create grammar entries

START-ENROLLMENT-SESSION
PAUSE/RESUME-ENROLLMENT-SESSION
ENROLLMENT-ROLLBACK
RECOGNIZE/STOP*
ADD/DELETE/MODIFY-PHRASE
END/ABORT-ENROLLMENT-SESSION

* These methods already exist in the recognizer resource

Hotword (see section 7)
• New recognition resource
• Instead of listening for a set time period, listens continuously until it matches a grammar
• Non-matching speech is ignored and does not affect the state of the recognizer
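The hotword behaviour described above can be illustrated with a short sketch. It assumes, for simplicity, that speech has already been turned into text strings and that a "grammar" is just a set of accepted phrases. Both are stand-ins chosen for the example, and the function name is hypothetical.

```python
def hotword_listen(utterances, grammar):
    """Consume utterances until one matches the grammar; skip everything else.

    Non-matching input is simply ignored: no state is kept between
    utterances, so it cannot influence a later match.
    """
    for text in utterances:
        if text.strip().lower() in grammar:
            return text.strip().lower()
    return None  # the input ended without a match

grammar = {"operator", "help"}
heard = ["um", "hello there", "Operator", "never reached"]
match = hotword_listen(heard, grammar)
```

Unlike a recognizer that listens for a set time period, the loop has no timeout: it ends only on a match or when the input runs out.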

Other Extensions
• Record method (sec. 4.4)
  • Allows end-pointed recording of an audio stream
• Interpret method (sec. 4.5)
  • Behaves like a recognition, except that text input is given instead of an audio stream. It returns a standard recognition result minus any audio-specific values.
