Signature with Text-Dependent and Text-Independent Speech for Robust Identity Verification B. Ly-Van*, R. Blouet**, S. Renouard**, S. Garcia-Salicetti*, B. Dorizzi*, G. Chollet** * INT, dept EPH, 9 rue Charles Fourier, 91011 EVRY, France; ** ENST, Lab. CNRS-LTCI, 46 rue Barrault, 75634 Paris, France Emails: {Bao.Ly_van, Sonia.Salicetti, Bernadette.dorizzi}@int-evry.fr; {Blouet, Renouard, Chollet}@tsi.enst.fr
Overview • Introduction: Why Speech and Signature? • BIOMET database: brief description • Signature data • Speech data • Writer verification • Speaker verification systems • Fusion systems • Results and Conclusions
The BIOMET Database • 5 modalities: hand shape, fingerprints, on-line signatures, talking faces (face and voice) • 131 people: 50% male, 50% female • Data from 68 people used for fusion • Time variability: two sessions spaced 5 months apart • S. Garcia-Salicetti, C. Beumier, G. Chollet, B. Dorizzi, J. Leroux-Les Jardins, J. Lunter, Y. Ni, D. Petrovska-Delacretaz, "BIOMET: a Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities", 4th International Conference on Audio- and Video-Based Biometric Person Authentication, 2003.
Signature capture • Captured on a WACOM Intuos2 A6 digitizer at 200 Hz • 5 parameters: pen coordinates (x, y), axial pressure, azimuth (0°–359°) and altitude (0°–90°) • 15 genuine signatures per person • 12 forgeries per person
Signature modeling • Preprocessing (filtering) • Feature extraction: 12 parameters • Signature model: continuous HMM • 2 states, 3 Gaussians per state • Bagging: 10 models combined into an "aggregated" model (average score) • Training: 10 signatures from one session • Normalized score: |S_i(O) − S_i*|, the distance between the score S_i(O) of the test signature O and a client-dependent reference score S_i*
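A minimal sketch of the bagged HMM scoring described above, under several assumptions: the hmmlearn library stands in for the authors' HMM implementation, bootstrap resampling stands in for their exact bagging procedure, and the reference score S_i* is assumed to be precomputed (e.g., from the training signatures).

```python
# Sketch of bagged continuous-HMM signature scoring (2 states, 3 Gaussians per state).
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_bagged_hmms(train_feats, n_models=10, seed=0):
    """train_feats: list of (T_j, 12) feature arrays, one per training signature."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap resample of the 10 training signatures (stand-in for the bagging step).
        picked = [train_feats[i] for i in rng.integers(0, len(train_feats), len(train_feats))]
        X = np.vstack(picked)
        lengths = [len(f) for f in picked]
        hmm = GMMHMM(n_components=2, n_mix=3, covariance_type="diag", n_iter=20)
        hmm.fit(X, lengths)
        models.append(hmm)
    return models

def aggregated_score(models, feat):
    """Average log-likelihood of one test signature over the bagged models."""
    return float(np.mean([m.score(feat) for m in models]))

def normalized_score(models, feat, ref_score):
    """|S_i(O) - S_i*|: ref_score stands in for the client reference S_i* (assumed precomputed)."""
    return abs(aggregated_score(models, feat) - ref_score)
```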
Speech • Two verification systems: • Data: voluntarily degraded • Text-dependent: only 4-digit sequences taken from the 10 digits (5 templates per speaker) • Text-independent: sentences extracted from the original data: • client model: trained on digits (15 seconds) and tested on sentences • world model: trained on data from the remaining 63 people (131 − 68) • Methods: • Text-dependent: DTW (Dynamic Time Warping) • Text-independent: GMM (Gaussian Mixture Model)
Text-dependent (DTW) • Block diagram: a template speech signal and a sample speech signal are aligned by DTW, which outputs a DTW score • DTW computes the spectral distance between the stored template pattern and the sample pattern
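A minimal sketch of the DTW alignment between a template and a sample feature sequence; the Euclidean local distance, the length normalization, and the absence of path constraints are illustrative assumptions rather than the exact configuration used here.

```python
# Sketch of Dynamic Time Warping between a template and a sample feature sequence.
# Both inputs are (T, D) arrays of spectral features (e.g., MFCC frames).
import numpy as np

def dtw_distance(template, sample):
    n, m = len(template), len(sample)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - sample[j - 1])  # local spectral distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m] / (n + m)  # length-normalized DTW score
```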
Text-independent (GMM) • Block diagram: a front-end extracts features from the world data, from which the world GMM model is trained (GMM modeling); the target GMM model is obtained by adaptation of the world GMM model to the target speaker's data (same front-end)
Baseline GMM method • Block diagram: the test speech goes through the front-end and is scored against the hypothesized target GMM model and the world GMM model • LLR score: LLR(X) = log p(X | λ_target) − log p(X | λ_world)
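A minimal sketch of the LLR scoring, with scikit-learn GaussianMixture models standing in for the world and target models; training the target model directly (instead of MAP-adapting it from the world model as in the diagram) and the component count are simplifying assumptions.

```python
# Sketch of GMM-based LLR scoring: mean per-frame log-likelihood under the target
# model minus the same quantity under the world model.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm(feats, n_components=64):
    """Fit a diagonal-covariance GMM on (N, D) front-end features (component count illustrative)."""
    return GaussianMixture(n_components=n_components, covariance_type="diag").fit(np.asarray(feats))

def llr_score(test_feats, target_gmm, world_gmm):
    """GaussianMixture.score() returns the mean log-likelihood per frame,
    so the difference is the average log-likelihood ratio."""
    return target_gmm.score(test_feats) - world_gmm.score(test_feats)
```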
Fusion systems • Additive Tree Classifier (ATC) • Boosting techniques on Binary Trees • CART algorithm • Support Vector Machine (SVM) • Linear kernel • Input: • Normalized signature score • Text-dependent LLR score • Text-independent LLR score
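A minimal sketch of the SVM fusion on the three input scores, assuming a scikit-learn linear-kernel SVC; the function names and array layout are illustrative, not the authors' implementation.

```python
# Sketch of score-level fusion with a linear-kernel SVM.
import numpy as np
from sklearn.svm import SVC

def train_fusion_svm(scores, labels):
    """scores: (N, 3) [normalized signature, TD speech, TI speech]; labels: 1 client, 0 impostor."""
    return SVC(kernel="linear").fit(np.asarray(scores), np.asarray(labels))

def accept(svm, score_triplet):
    """Accept (client) if the fused decision for one [sig, td, ti] triplet is the client class."""
    return bool(svm.predict(np.asarray(score_triplet, dtype=float).reshape(1, -1))[0] == 1)
```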
Tree-based approach for score fusion • Goal: find an optimal partition R = {R_k}, 1 ≤ k ≤ K, of the score space S = (s1, s2, s3) according to an Information Theory criterion • A sub-optimal solution, based on CART: • Best partition: R* = arg min_R C(R) • Score estimation based on P(client | R_k) and P(world | R_k) at each node of a given tree • Real AdaBoost is used to build 50 trees per client and to obtain a robust estimation of P(client | R_k) and P(world | R_k)
Verification based on ATC • A score vector S = (s1, s2, s3) is presented to the system composed of 50 trees: • each tree outputs a score based on the region R_k to which S is assigned • the LLR score is computed from P(client | R_k) and P(world | R_k) • the 50 tree scores are then averaged (see the sketch below)
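A rough sketch of this ATC scoring, with scikit-learn decision trees trained on bootstrap resamples standing in for the CART / Real AdaBoost construction; leaf class frequencies approximate P(client|R_k) and P(world|R_k), and the per-tree LLRs are averaged as described above.

```python
# Sketch of ATC-style fusion: 50 trees partition the 3-D score space; each leaf R_k
# yields P(client|R_k) and P(world|R_k), an LLR is computed per tree, and the 50
# LLRs are averaged. Bootstrap resampling replaces the Real AdaBoost weight updates.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_trees(scores, labels, n_trees=50, seed=0):
    """scores: (N, 3) fusion inputs; labels: 1 = client, 0 = world (impostor)."""
    rng = np.random.default_rng(seed)
    scores, labels = np.asarray(scores), np.asarray(labels)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(scores), len(scores))   # bootstrap resample
        tree = DecisionTreeClassifier(max_leaf_nodes=8)    # CART-style partition of the score space
        tree.fit(scores[idx], labels[idx])
        trees.append(tree)
    return trees

def atc_score(trees, s, eps=1e-6):
    """Average over trees of log P(client|R_k) - log P(world|R_k) for the leaf R_k containing s."""
    s = np.asarray(s, dtype=float).reshape(1, -1)
    llrs = []
    for tree in trees:
        proba = tree.predict_proba(s)[0]
        classes = list(tree.classes_)
        p_client = proba[classes.index(1)] if 1 in classes else 0.0
        p_world = proba[classes.index(0)] if 0 in classes else 0.0
        llrs.append(np.log(p_client + eps) - np.log(p_world + eps))
    return float(np.mean(llrs))
```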
SVM principles • Figure: the input X is mapped from the input space to a feature space, where separating hyperplanes H exist; the optimal hyperplane H0 determines Class(X) from the decision function y(X)
Fusion experiments • The 68-person database is split into 2 equal parts: • 34 people: Fusion Learning Base (also used for threshold estimation of the unimodal systems with the min TE criterion) • 34 people: Fusion Test Base (also used to test the unimodal systems) • Per person: • 5 genuine bimodal score vectors • 12 impostor bimodal score vectors
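A minimal sketch of the min TE threshold estimation on the Fusion Learning Base; here TE is assumed to be the mean of the FA and FR rates, which may differ from the exact criterion used in the paper.

```python
# Sketch of threshold estimation by minimizing the total error TE on the learning base.
# TE is assumed here to be 0.5 * (FA + FR); scores >= threshold are accepted.
import numpy as np

def min_te_threshold(genuine_scores, impostor_scores):
    """Return (threshold, FA, FR, TE) for the threshold minimizing TE."""
    genuine = np.asarray(genuine_scores)
    impostor = np.asarray(impostor_scores)
    best = None
    for t in np.unique(np.concatenate([genuine, impostor])):
        fa = np.mean(impostor >= t)   # impostors wrongly accepted
        fr = np.mean(genuine < t)     # clients wrongly rejected
        te = 0.5 * (fa + fr)
        if best is None or te < best[3]:
            best = (float(t), float(fa), float(fr), float(te))
    return best
```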
Fusion Performances

Condition              | Model     | TE (%)      | FA (%)     | FR (%)
                       | Signature | 11.9 [±2.7] | 8.9 [±2.9] | 20.1 [±6.0]
Speech without noise   | TI Speech | 6.3 [±2.0]  | 2.0 [±1.4] | 16.0 [±5.5]
                       | TD Speech | 10.3 [±2.6] | 7.6 [±2.7] | 17.0 [±5.7]
                       | ATC       | 2.8 [±1.4]  | 1.7 [±1.3] | 5.2 [±3.3]
                       | SVM       | 2.7 [±1.4]  | 1.3 [±1.1] | 5.9 [±3.6]
Speech, –10 dB noise   | TI Speech | 8.0 [±2.3]  | 2.0 [±1.4] | 23.2 [±6.4]
                       | TD Speech | 11.9 [±2.7] | 7.8 [±2.7] | 22.1 [±6.3]
                       | ATC       | 2.9 [±1.4]  | 2.5 [±1.6] | 3.9 [±2.9]
                       | SVM       | 2.9 [±1.4]  | 1.9 [±1.4] | 5.3 [±3.4]
Speech, 0 dB noise     | TI Speech | 17.0 [±3.1] | 6.0 [±2.4] | 45.0 [±7.5]
                       | TD Speech | 16.5 [±3.1] | 6.3 [±2.4] | 42.0 [±7.4]
                       | ATC       | 6.7 [±2.1]  | 4.7 [±2.1] | 11.2 [±4.8]
                       | SVM       | 5.8 [±2.0]  | 2.4 [±1.5] | 13.6 [±5.2]
Conclusions • ATC and SVM give equivalent results: • role of Boosting (ATC) • Fusion improves performance by a factor of 2 relative to the best unimodal system (in clean or noisy environments) • Other ways of creating noisy environments should be tested (real noise rather than Gaussian white noise!) • Fusion performance should also be studied on the two speech verification systems alone, since no noise was introduced in the signature modality