Speaker Identification and Verification
Dan Burnett, Nuance
58th IETF
Terminology
• Speaker identification: using utterances from a speaker, determine who the caller is out of a set of known speakers
• Speaker verification: using utterances from a speaker, determine whether the caller is who he/she claims to be (requires an identity claim)
• Training: using utterances from a speaker to train a unique voiceprint that can later be used to identify or verify that speaker. Applies to both SI and SV.
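The three terms above can be contrasted in a minimal sketch. Everything here is an illustrative assumption and not part of the draft: voiceprints are modelled as plain feature vectors, similarity is cosine similarity, and the names `train`, `identify`, `verify` and the 0.8 threshold are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train(samples):
    """Training: average several utterance vectors into one voiceprint (toy model)."""
    return [sum(column) / len(samples) for column in zip(*samples)]

def identify(utterance, voiceprints):
    """Identification: no identity claim; pick the closest of the known speakers."""
    return max(voiceprints, key=lambda name: cosine(utterance, voiceprints[name]))

def verify(utterance, claimed, voiceprints, threshold=0.8):
    """Verification: accept or reject a claimed identity (threshold is an assumption)."""
    return cosine(utterance, voiceprints[claimed]) >= threshold

voiceprints = {
    "alice": train([[1.0, 0.0], [0.8, 0.2]]),
    "bob": train([[0.0, 1.0], [0.2, 0.8]]),
}
utterance = [0.9, 0.2]
print(identify(utterance, voiceprints))         # closest known speaker
print(verify(utterance, "bob", voiceprints))    # is the caller really "bob"?
```

Identification searches the whole set of known speakers, while verification compares against a single claimed voiceprint, which is why only verification needs an identity claim.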
draft-burnett-mrcpext-00.txt
• Created by Nuance and Intervoice
• Proposes extensions to MRCP v1 (draft-shanmugham-mrcp-04.txt)
• Based originally on Nuance functionality, modified to be more general
• Starting point for MRCP v2 functionality discussions
• Also includes extensions for speaker-enrolled grammars, hotword recognition, and the recognition resource
Proposed SI/SV process (simplified, see section 6.7)
Methods shown in the diagram:
• VER-START-SESSION
• VER-SET-VOICEPRINT
• VER-BUFFERING-START
• VER-BUFFERING-CONTROL
• VER-BUFFERING-STOP
• VERIFY
• VER-FROM-BUFFER*
• VER-ROLLBACK
• VER-DELETE-VOICEPRINT
• GET-PARAMS
• SET-PARAMS
• VER-END-SESSION
* Requires active buffering and ver/id sessions.
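The footnote's precondition on VER-FROM-BUFFER can be sketched as a small state tracker. This is only an illustration under assumptions: the method names come from the slide, but the `VerifierState` class, the idea that the START/STOP and START/END pairs toggle two independent flags, and the boolean return value are all hypothetical and are not taken from the draft.

```python
class VerifierState:
    """Toy tracker for the two conditions the slide's footnote mentions."""

    def __init__(self):
        self.session_active = False     # ver/id session (assumed flag)
        self.buffering_active = False   # buffering (assumed flag)

    def handle(self, method):
        """Apply a method name; return False if its precondition is not met."""
        if method == "VER-START-SESSION":
            self.session_active = True
        elif method == "VER-END-SESSION":
            self.session_active = False
        elif method == "VER-BUFFERING-START":
            self.buffering_active = True
        elif method == "VER-BUFFERING-STOP":
            self.buffering_active = False
        elif method == "VER-FROM-BUFFER":
            # Footnote: requires active buffering and ver/id sessions.
            return self.session_active and self.buffering_active
        return True

state = VerifierState()
print(state.handle("VER-FROM-BUFFER"))      # rejected: nothing is active yet
state.handle("VER-START-SESSION")
state.handle("VER-BUFFERING-START")
print(state.handle("VER-FROM-BUFFER"))      # allowed: both are active
```

All other methods are accepted unconditionally here only because the slide states no preconditions for them.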
Discussion points
• Why buffering?
• Registry for return info
• Anything else before I convert to MRCPv2?
Voice/Text Grammar Enrollment (simplified, see section 5.5)
• Extension to existing recognition resource
• Creates speaker-produced grammar entries
• E.g., voice-enrolled entries for voice dialing
• Both speech and text can be used to create grammar entries
Methods shown in the diagram:
• START-ENROLLMENT-SESSION
• PAUSE/RESUME-ENROLLMENT-SESSION
• ENROLLMENT-ROLLBACK
• RECOGNIZE/STOP*
• ADD/DELETE/MODIFY-PHRASE
• END/ABORT-ENROLLMENT-SESSION
* These methods already exist in the recognizer resource
Hotword (see section 7)
• New recognition resource
• Instead of listening for a set time period, listens continuously until it matches a grammar
• Non-matching speech is ignored and does not affect the state of the recognizer
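The hotword behaviour described above can be sketched as a loop over incoming utterances. This is a simplified illustration, not the draft's mechanism: utterances are assumed to arrive already transcribed as text, the grammar is modelled as a set of phrases, and `hotword_listen` is a hypothetical name.

```python
def hotword_listen(utterances, grammar):
    """Consume utterances until one matches the grammar; return the match.

    Non-matching utterances are skipped and leave no trace in any state.
    Returns None only if the stream ends without a match.
    """
    for phrase in utterances:
        normalized = phrase.strip().lower()
        if normalized in grammar:
            return normalized
    return None

grammar = {"call home", "hang up"}
stream = iter(["um", "what time is it", "Call Home", "hang up"])
print(hotword_listen(stream, grammar))   # first phrase that is in the grammar
```

Because the loop has no timeout, it mirrors the "listens continuously" point: only a grammar match (or the end of the stream, in this toy version) stops it.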
Other Extensions
• Record method (sec. 4.4)
  • Allows end-pointed recording of an audio stream
• Interpret method (sec. 4.5)
  • Behaves like recognition, except that text input is given instead of an audio stream. It returns a standard recognition result minus any audio-specific values.
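The relationship between recognition and the Interpret method can be sketched as two functions sharing one grammar lookup. All names and fields below are hypothetical stand-ins: the draft's actual result format is not shown in these slides, and `duration_ms` and `acoustic_confidence` are only assumed examples of audio-specific values.

```python
# Toy grammar mapping phrases to semantic interpretations (assumed structure).
GRAMMAR = {"call home": {"action": "dial", "target": "home"}}

def match(text):
    """Look up the interpretation for a phrase, or None if it is not in the grammar."""
    return GRAMMAR.get(text.strip().lower())

def recognize(transcript, duration_ms, acoustic_confidence):
    """Stand-in for audio recognition; the transcript is assumed to come from audio."""
    return {
        "input": transcript,
        "interpretation": match(transcript),
        "duration_ms": duration_ms,                  # audio-specific (assumed)
        "acoustic_confidence": acoustic_confidence,  # audio-specific (assumed)
    }

def interpret(text):
    """Same grammar matching on text input, with no audio-specific values."""
    return {"input": text, "interpretation": match(text)}

from_audio = recognize("Call Home", duration_ms=850, acoustic_confidence=0.93)
from_text = interpret("Call Home")
print(sorted(set(from_audio) - set(from_text)))   # keys present only for audio
```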