Biometrics: An Appreciation
Professor P.V.S. Rao
Senior Professor and Head (Retd.), Computer Systems and Communications Group
Tata Institute of Fundamental Research, Mumbai
2010

Introduction
• Life … simplest single-cell organisms (protozoa)
• Evolution: an amazing variety of organisms
• Diversification … not unstructured
• Organized structure … species
• Each species … characterising features
• At the level of the individual … a fair degree of diversity, which enables identification (e.g. spots in giraffes, stripes in tigers and zebras, and facial features and fingerprints in humans)

Prevalent Means of Identification
• (mutually agreed) information: password, identification number
• physical object: token or identity card
• some physiological aspect (facial features, hand geometry, fingerprints, iris or retinal patterns, odour, DNA structure)
• some behavioural aspect (signature, voice, speech, laughter, language style, gait, keyboard operation style)

Features chosen should be
• adequately unique
• easily and accurately measurable
• amenable to extraction of distinctive features
• replicable, stable or time-invariant
• universal … (exceptions being rare), e.g. the blind (iris), the mute (voice), those without hands
• non-invasive, convenient and acceptable to the user

Voice Biometrics
Advantages
• acquisition is easy, non-invasive and convenient
• equipment is simple
• remote operation is possible
• transmission of the signal is easy
• transmission infrastructure is ubiquitous
Problems
• parameters are not directly measurable
• rapidly changing
• intrinsic plus behavioural (pros and cons)
• sensitive to channel and microphone characteristics

Applications
• access to premises
• access to information
• authorizing transactions
• identifying the speaker in
  • threat calls
  • crime-related conversations
• checking whether the 'person' is at the site
• weeding out prank calls on emergency lines
• improving speech recognition systems

Speech signal incorporates characteristics of the individual
Generic (vocal tract size and shape related)
Behavioural
(a) vocal tract dynamics related (for all speakers)
• co-articulation
• context effects from adjacent phonemes
• anticipatory articulation
(b) idiosyncratic, for each speaker: the way a person speaks or writes the same word differently at different times
(c) inter-speaker differences: vocal tract dynamics (some speakers even use wrong phones and phonemes)

Text Independent
• VQ: clusters of points
• GMM: mixtures of Gaussian distributions
• SVM: discriminative systems, binary classification
Text Dependent
• Template: directly compare
• DTW: warp to account for speaking rates (both are only signal based, not generative models)
• HMM: sequence of states, transition probabilities, emission probabilities; states hidden from the observer
Intra-speaker variability
• behavioural inconsistencies
Inter-speaker differences
• generic (vocal tract size, shape)
• behavioural (articulatory dynamics)

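The DTW idea listed above (warp the time axis so that utterances spoken at different rates can be compared) can be sketched as follows. This is a minimal illustration, not the system described in the talk: the function name `dtw_distance` and the toy one-dimensional "feature" sequences are assumptions made for the example, and real systems would compare multi-dimensional spectral feature vectors.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    cost[i][j] holds the cheapest cumulative distance for aligning the first
    i frames of `a` with the first j frames of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Allow a frame to be repeated in either sequence, or a 1:1 step.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical toy data: the same contour spoken at two rates, plus another one.
template = [(0.0,), (1.0,), (2.0,), (1.0,), (0.0,)]
slow_repeat = [(0.0,), (0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (1.0,), (0.0,)]
different = [(2.0,), (2.0,), (0.0,), (2.0,), (2.0,)]

same_word = dtw_distance(template, slow_repeat)
other_word = dtw_distance(template, different)
print(same_word, other_word)
```

The slower repetition aligns with the template at no extra cost because warping absorbs the repeated frames, whereas the different contour cannot be warped onto the template.
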
Humans use short-term information and long-term features (LTFs). Using LTFs in speaker recognition means having a wide window and low rates, hence much larger volumes of speech data.

Long term acoustic characteristics
• Pronunciation
  • characterises the speaker; an important cue for humans
  • use phone sequences or n-grams
  • use speaker-independent speech recognition for obtaining phone information
  • Problem: speech recognition accuracy. Speaker-independent systems are robust but insensitive to subtle features. Also, the grammar used in recognition may paper over idiosyncrasies.
  • So do not use the final output of the recognizer; use the lattice of choices prior to the application of linguistic constraints.
• Accent and intonation
  • dynamics of phonation
  • pitch and energy contours
  • phone and pause durations
• Language related features
  • vocabulary
  • speaking style (word and phrase patterns)
• Vocabulary and style
  • word frequencies
  • word usage patterns (word n-gram frequencies) …

Perceived performance
• System performance: estimated in the lab using proper numbers of genuine speakers and imposters.
• In actual use, FRR = FAR.
• Perceived performance depends on the actual numbers of imposters and genuine speakers.
• False acceptances are usually not noticed; rejections attract attention.
• Actual rejections = Ni × (1 - Fa) + Ng × Fr ≈ Ni + Ng × Fr (for small Fa), where Ni and Ng are the numbers of imposter and genuine attempts, and Fa and Fr are the false acceptance and false rejection rates.
• Extreme case: if there are no imposters at all, all rejections would be false and the system would seem to be useless.

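The rejection count above can be worked through numerically. The sketch below only evaluates the slide's expression Ni × (1 - Fa) + Ng × Fr; the function names and the example numbers (1000 genuine attempts, 10 imposter attempts, 2% error rates) are assumptions chosen for illustration, not figures from the talk.

```python
def expected_rejections(n_genuine, n_imposters, far, frr):
    """Return (correct_rejections, false_rejections) as expected counts.

    correct rejections = Ni * (1 - Fa)   (imposters turned away)
    false rejections   = Ng * Fr         (genuine speakers turned away)
    """
    correct = n_imposters * (1.0 - far)
    false = n_genuine * frr
    return correct, false

def false_share_of_rejections(n_genuine, n_imposters, far, frr):
    """Fraction of all observed rejections that hit genuine speakers."""
    correct, false = expected_rejections(n_genuine, n_imposters, far, frr)
    total = correct + false
    return false / total if total > 0 else 0.0

# Hypothetical deployment: many genuine users, very few imposters.
correct, false = expected_rejections(n_genuine=1000, n_imposters=10,
                                     far=0.02, frr=0.02)
share = false_share_of_rejections(1000, 10, 0.02, 0.02)
print(correct, false, share)

# The extreme case from the slide: no imposters at all.
share_no_imposters = false_share_of_rejections(1000, 0, 0.02, 0.02)
print(share_no_imposters)
```

With these assumed numbers, most of the rejections that users see are false ones even though the error rates are low, which is the point the slide makes about perceived performance.
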
Performance Limitations
• Uncertainty in measurement and computation
  • parameters are NOT physical attributes
  • evolution of speech towards robust intelligibility, i.e. insensitivity to sloppy articulation and inter-speaker differences?
• Probabilistic models: fuzzy decisions
• Variability: intra-speaker vs inter-speaker
• Uniqueness not guaranteed in Nature

Variability: intra-speaker vs inter-speaker
Wide-enough margins are needed to accommodate intra-speaker variability
• behavioural inconsistencies
• long-term behavioural changes
(Will this cause overlaps between speakers?)
Any well-chosen biometric is OK for small groups.
Confusions may increase with group size.

Intuitive Illustration
The state of the articulatory system is represented as a point in a parameter space. As one speaks, this point traces a trajectory along the time axis in a multidimensional parameter space-time manifold.
• Use time warping to align trajectories for varying rates of speaking.
• Different utterances: trajectories are similar but not identical.
• Construct a tube to contain all (say 95%) of these trajectories.
• One tube for each speaker.
Confusion between two speakers will depend on the degree of overlap between the corresponding tubes.
Super tube to contain all trajectories for all speakers
• size will depend on inter-speaker variabilities
• shape will be broadly the same as individual speaker tubes
• will envelop the tubes for all speakers
The super tube is contained within bounds determined by perceptual limits
• if any trajectory goes outside these bounds, the message would be heard wrong
Overlaps increase as the space gets crowded. There is a limit to the number of individual (non-overlapping) tubes, i.e. the number of speakers.

Text independent speaker recognition
• Project individual trajectories onto the parameter space hyperplane.
• Model using GMMs.
Overlap between individual speakers' GMMs increases with the number of speakers.
Due to degeneracy, confusion is worse than in text dependent recognition.

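A minimal sketch of GMM-based, text independent identification: each speaker is modelled by a Gaussian mixture over feature frames, and a test utterance is attributed to the model with the highest average log-likelihood. Everything concrete here is an assumption for illustration: the diagonal covariances, the two-dimensional features, the hand-set (untrained) model parameters and the speaker names. Training the mixtures (e.g. by EM) is not shown.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_avg_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gauss_diag(x, mu, var) for w, mu, var in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)

def identify(frames, models):
    """Return the name of the speaker model that best explains the frames."""
    return max(models, key=lambda name: gmm_avg_log_likelihood(frames, models[name]))

# Hypothetical, hand-set speaker models (two mixture components each).
models = {
    "speaker_a": [(0.5, (0.0, 0.0), (0.5, 0.5)), (0.5, (1.0, 1.0), (0.5, 0.5))],
    "speaker_b": [(0.5, (3.0, 3.0), (0.5, 0.5)), (0.5, (4.0, 4.0), (0.5, 0.5))],
}
test_frames = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.3)]
best = identify(test_frames, models)
print(best)
```

Because frame order is ignored, two speakers whose frames occupy similar regions of the parameter space become hard to separate as more speaker models are added, which is the overlap problem noted above.
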
Uniqueness not guaranteed in Nature
Evolution and uniqueness
Biometric systems: the implied assumption is that each individual is unique, i.e. more like himself at any time than like any other individual, i.e.
• diversity of individuals is exclusive
• all individuals are necessarily unique
This assumption is open to question.
Diversity is necessary in evolution, not exclusive uniqueness.
Diversity facilitates survival by trial-and-error adaptation to the environment (try very many alternatives).
Absolute uniqueness: no advantage
• duplication (look-alikes) is not a disadvantage …
• hence it cannot be ruled out.

Lack of Uniqueness in Nature: consequences
• sheep: default (dissimilar individuals)
  • diagonal elements: close to 1
  • non-diagonal elements: small positive values
• goats: difficult to recognize
  • contribute disproportionately to false rejections
  • diagonal elements: significantly lower than 1
• lambs: easy to imitate
  • victims of false acceptances
  • non-diagonal elements: significantly higher than zero
• wolves: good imitators
  • easily mistaken to be others
  • boost false acceptances; non-diagonal elements in the same row: high values
Goats, wolves and lambs in a herd of sheep are a fact of life: easy to spot, impossible to weed out.

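The categories above can be read off a speaker-by-model score matrix. The sketch below assumes that row i holds the scores of speaker i's speech against every speaker's model, so that the diagonal is the self-match score. Under that assumption a wolf shows high off-diagonal values in its row (as the slide states) and a lamb shows them in its column (an interpretation; the slide does not say which). The thresholds `low_self` and `high_cross` and the example matrix are hypothetical.

```python
def classify_speakers(scores, low_self=0.8, high_cross=0.3):
    """Label each speaker as sheep, goat, lamb and/or wolf from a score matrix.

    scores[i][j]: score of speaker i's speech against speaker j's model.
    """
    n = len(scores)
    labels = {}
    for i in range(n):
        tags = []
        if scores[i][i] < low_self:
            tags.append("goat")   # poorly matched even to own model
        if any(scores[i][j] > high_cross for j in range(n) if j != i):
            tags.append("wolf")   # own speech matches others' models
        if any(scores[j][i] > high_cross for j in range(n) if j != i):
            tags.append("lamb")   # others' speech matches own model
        labels[i] = tags or ["sheep"]
    return labels

# Hypothetical 4-speaker score matrix.
scores = [
    [0.95, 0.05, 0.04, 0.03],
    [0.05, 0.60, 0.06, 0.04],
    [0.04, 0.05, 0.93, 0.45],
    [0.03, 0.04, 0.06, 0.94],
]
labels = classify_speakers(scores)
print(labels)
```

Spotting these speakers is straightforward once the matrix is available; as the slide notes, removing them from the population is not an option.
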
Performance Improvement
• Combine systems using complementary parameters (increases the dimensionality of the parameter space)
• Combine different types of systems (e.g. generative and discriminative systems: combine GMM-SVM techniques, GSV)
• Use both higher and lower level information
  • pitch, intonation and accent
  • grammar and style: vocabulary, word choice, word frequencies, part-of-speech frequencies, sentence structures
• Implications
  • longer window, much slower rates, much more training data
  • also more test data
  • works only for spontaneous speech
  • requires speech recognition, which has its own errors

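One simple way to combine systems is score-level fusion: normalise each system's scores to a common scale and take a weighted sum. This is a generic sketch and not the GSV technique named above (which feeds GMM-derived supervectors to an SVM). The function names, the equal weights and the example scores are assumptions for illustration.

```python
def znorm(scores):
    """Shift and scale a list of scores to zero mean and unit variance."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    std = var ** 0.5 or 1.0  # avoid dividing by zero for constant scores
    return [(s - mean) / std for s in scores]

def fuse(system_scores, weights):
    """Weighted sum of z-normalised scores, one fused score per trial."""
    normalised = [znorm(s) for s in system_scores]
    n_trials = len(system_scores[0])
    return [sum(w * sys_scores[t] for w, sys_scores in zip(weights, normalised))
            for t in range(n_trials)]

# Hypothetical scores for the same four trials from two different systems.
gmm_scores = [-42.0, -55.0, -41.0, -60.0]   # e.g. average log-likelihoods
svm_scores = [0.9, -0.2, 0.1, -1.1]         # e.g. signed margins
fused = fuse([gmm_scores, svm_scores], weights=[0.5, 0.5])
print(fused)
```

Normalising first matters because the two systems report scores on unrelated scales; without it the system with the larger numeric range would dominate the sum.
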
Performance Issues and Limits
• Probabilistic, hence not ultimate
• Physiological and behavioural traits
• Variability and inconsistency vs margin of uniqueness
• Uniqueness to what extent?
• Evolutionary perspective
• Sheep, goats, lambs and wolves
• Performance improvement … by increasing dimensionality/degrees of freedom
Last word … ?