This presentation covers proposed extensions to MRCP v1 for speaker identification and verification based on voiceprints. The proposal adds verification sessions with buffering control and voiceprint management, voice and text grammar enrollment, and hotword recognition. It also introduces methods for end-pointed recording and text interpretation within the recognizer resource.
Speaker Identification and Verification Dan Burnett, Nuance 58th IETF
Terminology • Speaker identification -- using utterances from a speaker, determine who the caller is out of a set of known speakers • Speaker verification -- using utterances from a speaker, determine whether the caller is who he/she claims to be (requires an identity claim) • Training -- using utterances from a speaker to train a unique voiceprint that can later be used to identify or verify that speaker. Applies to both SI and SV.
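The difference between the three terms can be illustrated with a toy sketch. Everything here is an assumption for illustration only: voiceprints are modeled as short feature vectors, training is a plain average, matching uses cosine similarity, and the 0.8 threshold is arbitrary. None of this comes from the draft or from any real SI/SV engine.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def train(utterance_features):
    """Training: average several utterance vectors into one voiceprint (toy model)."""
    n = len(utterance_features)
    return [sum(column) / n for column in zip(*utterance_features)]

def identify(features, voiceprints):
    """Identification: choose the best match out of a set of known speakers."""
    return max(voiceprints, key=lambda name: cosine(features, voiceprints[name]))

def verify(features, claimed_identity, voiceprints, threshold=0.8):
    """Verification: accept or reject an explicit identity claim."""
    return cosine(features, voiceprints[claimed_identity]) >= threshold

# Hypothetical speakers and feature values.
voiceprints = {
    "alice": train([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]),
    "bob": train([[0.0, 1.0, 0.2], [0.1, 0.9, 0.3]]),
}
caller = [0.95, 0.15, 0.05]
```

Note that `identify` needs no claim and always returns one of the known speakers, while `verify` requires a claimed identity and returns only accept or reject.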
draft-burnett-mrcpext-00.txt • Created by Nuance and Intervoice • Proposes extensions to MRCP v1 (draft-shanmugham-mrcp-04.txt) • Based originally on Nuance functionality, modified to be more general • Starting point for MRCP v2 functionality discussions • Also includes extensions for speaker-enrolled grammars, hotword recognition, and the recognition resource
Proposed SI/SV process (simplified, see section 6.7) • VER-START-SESSION, VER-BUFFERING-START, VER-SET-VOICEPRINT, VER-BUFFERING-CONTROL, VER-FROM-BUFFER*, VER-BUFFERING-STOP, VER-END-SESSION • VER-DELETE-VOICEPRINT, VER-ROLLBACK, GET-PARAMS, SET-PARAMS, VERIFY, VER-FROM-BUFFER* • * Requires active buffering and ver/id sessions.
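The footnote on this slide (VER-FROM-BUFFER requires both active buffering and an active ver/id session) can be sketched as a small state check. This is a minimal illustration, not the draft's state machine: the class name, the "ok"/"rejected" return values, and the choice to track the session and buffering flags independently are all assumptions. Only the method names are taken from the slide.

```python
class VerifierSketch:
    """Toy model of the precondition stated for VER-FROM-BUFFER."""

    def __init__(self):
        self.session_active = False    # set by VER-START-SESSION
        self.buffering_active = False  # set by VER-BUFFERING-START

    def handle(self, method):
        """Return "ok" or "rejected" (hypothetical values, not MRCP status codes)."""
        if method == "VER-START-SESSION":
            self.session_active = True
        elif method == "VER-END-SESSION":
            self.session_active = False
        elif method == "VER-BUFFERING-START":
            self.buffering_active = True
        elif method == "VER-BUFFERING-STOP":
            self.buffering_active = False
        elif method == "VER-FROM-BUFFER":
            # The slide's footnote: needs active buffering AND an active session.
            if not (self.session_active and self.buffering_active):
                return "rejected"
        return "ok"
```

For example, a client that sends VER-FROM-BUFFER before VER-BUFFERING-START would be rejected in this sketch, even with a session open.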
Discussion points • Why buffering? • Registry for return info • Anything else before I convert to MRCPv2?
Voice/Text Grammar Enrollment (simplified, see section 5.5) • Extension to existing recognition resource • Creates speaker-produced grammar entries • E.g., voice-enrolled entries for voice dialing • Both speech and text can be used to create grammar entries • Methods: START-ENROLLMENT-SESSION, PAUSE/RESUME-ENROLLMENT-SESSION, ENROLLMENT-ROLLBACK, RECOGNIZE/STOP*, ADD/DELETE/MODIFY-PHRASE, END/ABORT-ENROLLMENT-SESSION • * These methods already exist in the recognizer resource
Hotword (see section 7) • New recognition resource • Instead of listening for a set time period, listens continuously until it matches a grammar • Non-matching speech is ignored and does not affect the state of the recognizer
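The hotword behavior described on this slide can be sketched as a loop that keeps consuming input until something matches. This is only an illustration under simplifying assumptions: already-transcribed strings stand in for speech segments, the grammar is a plain set of phrases, and the function name is hypothetical.

```python
def hotword_listen(utterances, grammar):
    """Consume utterances until one matches the grammar; return it, or None if the input ends.

    Non-matching input is skipped and leaves no trace in the listener's state.
    """
    for text in utterances:
        normalized = " ".join(text.lower().split())
        if normalized in grammar:
            return normalized
        # No match: ignore and keep listening (no timeout, no error result).
    return None

# Hypothetical grammar and input stream.
grammar = {"operator", "main menu"}
stream = ["um", "what time is it", "  Operator ", "main menu"]
```

Here the first two utterances are silently dropped and the listener stops at the first match, which contrasts with a regular recognition that listens for a set time period.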
Other Extensions • Record method (sec. 4.4) • Allows end-pointed recording of an audio stream • Interpret method (sec. 4.5) • Behaves like a recognition request, except that text input is given instead of an audio stream. It returns a standard recognition result minus any audio-specific values.