SecurePhone: A Multi-Modal Biometric Verifier for Constrained Devices
Sabah Jassim, University of Buckingham, UK

Outline
• The SecurePhone project
• Fusion approaches to biometric-based identification
• SecurePhone multi-modal biometric verifier
• PDA implementation constraints
• Modalities
• Fusion strategy
• Performance: Match on Host (MoH) & Match on Card (MoC)
• Challenges and potential solutions
• Conclusion
BioSecure + COST 2101 - March 2007

The SecurePhone Project
[Figure: the SP consortium]
• Aims to produce a prototype of a new mobile communication system that enables biometrically authenticated users to conclude legally binding m-contracts during a mobile phone call, in an easy yet highly dependable and secure way, using a biometric recogniser that fuses face, voice and handwritten signature.

SecurePhone aim 1: secure exchange
• Secure PKI (Public Key Infrastructure)
• Conclude secure m-contracts during a mobile phone call:
  • secure: private key stored on SIM card
  • user-friendly: intuitive, non-intrusive
  • flexible: legally binding text/audio transactions
  • dynamic: mobile e-signing "on the fly"

Project aim 2: biometric verification
[Diagram: face, voice and signature each pass through preprocessing and modelling; the results are fused using client and impostor joint-score models; the outcome is either "accept user, release private key" or "reject user"]
• Zero-Knowledge Authentication.

Implementation constraints
• The PDA's main processor is much slower than a PC's, so verification must be very efficient even when run on the PDA.
• The audio-visual sample rates available through the device's own applications are inadequate (only 8 kHz audio and 10 fps video). This was improved: current SP sampling and real-time pre-processing run at 22 kHz audio and 20 fps video.
• Only data on the SIM is secure, so the biometric models/templates must be stored and processed on the SIM. Yet the SIM has very limited computational resources and processing support.
• SIM model storage is limited to 40 K, which restricts the system to text-dependent prompts.
  Note: text-independent prompts or varied text-dependent prompts are more secure, but would require 200-400 K.
• Enrolment should be based on a short session (acceptability).

Voice verification (SU / GET ENST)
• Fixed 5-digit prompt: conceptually neutral, easily extendable, requires few Gaussians
• 22 kHz sampling
• Online energy-based non-speech frame removal
• MFCCs with online CMS (cepstral mean subtraction) and first time-difference features: slow to compute, but fixed point is faster than floating point
• Features modelled by a 100-Gaussian GMM pdf, with a UBM for model initialisation and score normalisation
• Training on data from 2 indoor and 2 outdoor recordings from one session; testing on similar data from another session
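
The front-end steps listed on this slide (energy-based frame removal, online CMS, first time differences) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SecurePhone code: MFCC extraction itself is omitted, and the function names, the `margin` rule for discarding quiet frames and the smoothing factor `alpha` are hypothetical choices made for the example.

```python
import math

def frame_energy(frame):
    """Log energy of one frame of audio samples."""
    return math.log(sum(s * s for s in frame) + 1e-10)

def drop_low_energy_frames(frames, margin=3.0):
    """Keep frames whose log energy lies within `margin` of the loudest frame.
    The margin rule is an assumption made for this sketch."""
    energies = [frame_energy(f) for f in frames]
    threshold = max(energies) - margin
    return [f for f, e in zip(frames, energies) if e >= threshold]

def online_cms(features, alpha=0.95):
    """Subtract a running (exponentially weighted) mean from each cepstral vector.
    `alpha` is a hypothetical smoothing factor."""
    mean = list(features[0])
    out = []
    for vec in features:
        mean = [alpha * m + (1 - alpha) * v for m, v in zip(mean, vec)]
        out.append([v - m for v, m in zip(vec, mean)])
    return out

def add_deltas(features):
    """Append first time differences (delta features) to each vector."""
    out = []
    prev = features[0]
    for vec in features:
        out.append(list(vec) + [v - p for v, p in zip(vec, prev)])
        prev = vec
    return out
```

All three functions process the frames in a single forward pass, which is what makes an online (streaming) implementation on the PDA plausible.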

Signature verification (GET INT)
• 2D coordinates (100 Hz) augmented by time-difference features, curvature, etc.: 19 features in total
  Note: no pressure or angles are available, since the signature is obtained from the PDA's touch screen, not from a writing pad
• Shift normalisation, but no rotation or scaling
• Features modelled by a 100-Gaussian GMM pdf, with a UBM for model initialisation and score normalisation
• Fast to compute
• Training and testing on data from one session
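
As a rough illustration of two of the steps named above, the sketch below applies shift normalisation (subtracting the centroid) and appends first time differences to the pen coordinates. It is an assumption-laden example: the slide does not list the 19 features, so only velocity is shown, and the function names are hypothetical.

```python
def shift_normalise(points):
    """Translate the trajectory so that its centroid is at the origin.
    No rotation or scaling is applied, as on the slide."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return [(x - cx, y - cy) for x, y in points]

def add_velocity(points, dt=0.01):
    """Append first time differences (vx, vy) to each sample.
    dt = 0.01 s corresponds to the 100 Hz sampling rate."""
    out = []
    for i, (x, y) in enumerate(points):
        px, py = points[i - 1] if i > 0 else (x, y)
        out.append((x, y, (x - px) / dt, (y - py) / dt))
    return out
```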

Face Wavelet Feature Representation (BU)
• The Discrete Wavelet Transform (DWT) decomposes an image into a set of frequency subbands with different resolutions.
• At a resolution depth of k, the pyramidal scheme decomposes an image I into 3k + 1 subbands: (LLk, HLk, LHk, HHk, ..., HL1, LH1, HH1). The lowest-pass subband LLk represents the k-level resolution approximation of the image I. The subbands HL1, LH1 and HH1 contain the finest-scale wavelet coefficients; the coefficients get coarser as k increases, with LLk being the coarsest.
• Each subband of a DWT-decomposed face image represents the person's face at a different frequency range and scale, i.e. each is a distinct stream for face recognition with its own accuracy rate, and the streams can be fused for improved accuracy.
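
The pyramidal scheme can be illustrated with a small pure-Python Haar decomposition. This is a minimal sketch, assuming an image stored as a list of rows whose dimensions are divisible by 2^depth; the function names are hypothetical, and the HL/LH naming follows one common convention (others swap the two).

```python
def haar_step(img):
    """One level of the 2D Haar transform on a list-of-rows image.
    Returns the four quarter-size subbands (LL, HL, LH, HH)."""
    ll, hl, lh, hh = [], [], [], []
    for r in range(0, len(img), 2):
        ll_row, hl_row, lh_row, hh_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            ll_row.append((a + b + d + e) / 2.0)  # low-pass in both directions
            hl_row.append((a - b + d - e) / 2.0)  # high-pass horizontally
            lh_row.append((a + b - d - e) / 2.0)  # high-pass vertically
            hh_row.append((a - b - d + e) / 2.0)  # high-pass in both directions
        ll.append(ll_row)
        hl.append(hl_row)
        lh.append(lh_row)
        hh.append(hh_row)
    return ll, hl, lh, hh

def haar_pyramid(img, depth):
    """Decompose `img` to `depth` levels, giving 3 * depth + 1 subbands."""
    subbands = {}
    current = img
    for level in range(1, depth + 1):
        current, hl, lh, hh = haar_step(current)  # recurse on the LL band only
        subbands[f"HL{level}"] = hl
        subbands[f"LH{level}"] = lh
        subbands[f"HH{level}"] = hh
    subbands[f"LL{depth}"] = current
    return subbands
```

For example, `haar_pyramid(img, 2)` on a 4x4 image returns seven subbands (HL1, LH1, HH1, HL2, LH2, HH2, LL2), matching the 3k + 1 count for k = 2.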

Face verification (BU)
• Static face recognition: 10 grey-scale images selected at random from a video, face area 160x192 pixels
• Histogram equalisation and z-score standardisation of features are applied as a simple, fast lighting normalisation
• Haar wavelet low-low-4 (or low-high) subband coefficients used as feature vectors
• Other wavelet filters were tested, but Haar is the fastest to compute
• Features modelled by a GMM pdf with only 4 Gaussians, with a UBM for model initialisation and score normalisation
• Training on data from 2 indoor and 2 outdoor recordings from one session; testing on similar data from another session
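
The two normalisation steps and the size of the resulting feature vector can be sketched as below. This is an illustrative example only: the function names are hypothetical, the image is assumed to be a flat list of integer grey levels, and the equalisation uses the textbook CDF mapping rather than whatever variant the project actually used.

```python
import math

def equalise_histogram(pixels, levels=256):
    """Histogram equalisation of a flat list of integer grey levels."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:  # flat image: nothing to equalise
        return list(pixels)
    return [round((cdf[p] - cdf_min) / (n - cdf_min) * (levels - 1)) for p in pixels]

def z_score(features):
    """Standardise a feature vector to zero mean and unit variance."""
    n = len(features)
    mean = sum(features) / n
    std = math.sqrt(sum((f - mean) ** 2 for f in features) / n) or 1.0
    return [(f - mean) / std for f in features]

def subband_shape(width, height, depth):
    """Size of a depth-level subband: each level halves both dimensions."""
    return width // 2 ** depth, height // 2 ** depth
```

With the 160x192 face area quoted on the slide, `subband_shape(160, 192, 4)` gives a 10x12 subband, i.e. 120 coefficients per image.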

Fusion (GET INT)
• For each modality i: S(i) = log p(Xi|C) - log p(Xi|I), where C is the client model and I the impostor model
• Score fusion was tested by:
  • Optimal linear weighted sum: Fused-score = Σ w(i) * S(i), where the sum is taken over the 3 modalities
  • GMM score modelling, i.e. modelling both the client and impostor joint-score pdfs by diagonal-covariance GMMs: Fused-score = log p(S|C) - log p(S|I)
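
Both fusion rules rest on a log-likelihood ratio between two diagonal-covariance GMMs. The sketch below shows one way to compute it, assuming a GMM is represented as a list of (weight, mean, variance) tuples; that representation and the function names are hypothetical. The same `llr` function can be applied to a feature vector Xi (per-modality score) or to the score vector S (GMM score fusion).

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def gmm_loglik(x, gmm):
    """log p(x | gmm), where gmm is a list of (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def llr(x, client, impostor):
    """Score log p(x|C) - log p(x|I)."""
    return gmm_loglik(x, client) - gmm_loglik(x, impostor)

def weighted_sum(scores, weights):
    """Linear fusion: sum of w(i) * S(i) over the modalities."""
    return sum(w * s for w, s in zip(weights, scores))
```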

User verification system
[Figure: PDA screen showing the prompt "7 9 8 5 1" and a "Press to start/stop speaking" button]
• User requests PDA to verify their identity
• PDA requests user to
  • read prompt (face in box)
  • sign signature
• Feature processing applied to each modality [silence removal, histogram equalisation, MFCC or Haar wavelets, online CMS, delta features, etc.]
• For each modality: S(i) = log p(Xi|C) - log p(Xi|I)
• If S(i) < θ(i) for any i: please repeat
• Else: fused-score = log p(S|C) - log p(S|I)
  • If fused-score > φ: user accepted
  • Else: user rejected
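
The decision logic on this slide can be written as a short function. This is a sketch under assumptions: the function name, the string return values and the `fuse` callback (standing in for either fusion rule) are all hypothetical.

```python
def verify(scores, thresholds, fuse, phi):
    """Two-stage decision: per-modality check, then fused-score check.

    scores     -- per-modality scores S(i)
    thresholds -- per-modality thresholds theta(i)
    fuse       -- callable mapping the score list to a fused score
    phi        -- accept/reject threshold on the fused score
    """
    for s, theta in zip(scores, thresholds):
        if s < theta:
            return "repeat"  # ask the user to repeat the capture
    return "accept" if fuse(scores) > phi else "reject"
```

For instance, `verify([1, 1, 1], [0, 0, 0], sum, 2)` accepts, while any single score below its threshold triggers a repeat before fusion is attempted.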

Speaking face & Forgery (GET ENST)
• Investigated possible attacks and forgery scenarios:
  • Using synthesised voice and face: difficult to create because of synchronisation problems
  • Replay attacks: devised a successful attack that uses the client's voice and face images, but not taken from the same video
• Using a coupled HMM for voice and face greatly reduced the effect of this attack.

PDA Database (PDAtabase)
• After initial development with many databases [TIMIT(V), CSLU(V), BANCA(V,F), ORL(F), BIOMET(V,F,S), NIST(V)], a CSLU/BANCA-like database was recorded on a Qtek2020 PDA for realistic conditions (sensors, environment)
• 60 English subjects: 24 for UBM, 18 for g1, 18 for g2. The accept/reject threshold is optimised on g1 and evaluated on g2, and vice versa
• Video (voice + face): 18 prompts (5-digit, 10-digit and phrase); 3 sessions, with 2 inside and 2 outside recordings per session
• Signatures in one session, with 20 expert impostorisations for each
• Virtual couplings of audio-visual with signature data (independent)
• An automatic test script allows many possible configurations to be tested
• The user just provides executables for feature modelling, score generation and score fusion
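
The g1/g2 protocol (tune the threshold on one group, report errors on the other) can be sketched as follows. This is an assumption-based example: the function names are hypothetical, the threshold is chosen where false acceptance and false rejection are closest (an equal-error-rate criterion, which the slide does not specify), and the scores in the usage note are made up.

```python
def error_rates(client_scores, impostor_scores, threshold):
    """Return (FAR, FRR) when a score above `threshold` means accept."""
    frr = sum(s <= threshold for s in client_scores) / len(client_scores)
    far = sum(s > threshold for s in impostor_scores) / len(impostor_scores)
    return far, frr

def best_threshold(client_scores, impostor_scores):
    """Pick the observed score at which FAR and FRR are closest."""
    candidates = sorted(set(client_scores) | set(impostor_scores))

    def gap(t):
        far, frr = error_rates(client_scores, impostor_scores, t)
        return abs(far - frr)

    return min(candidates, key=gap)
```

Usage with made-up scores: compute `t = best_threshold(g1_clients, g1_impostors)`, then report `error_rates(g2_clients, g2_impostors, t)`; swapping the roles of g1 and g2 gives the second half of the protocol.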

Match on Host (MoH): complementarity of modalities
• Results are for the LL subband; improved results have already been obtained for the LH subband.
[Table: results, with improved results for 5-digit and 10-digit prompts in PDAtabase (SPIE 2006)]

Match on Card (MoC)
• Implementation of the MoH system on the SIM card (MoC)
• No problem in terms of storage
• But not feasible because of verification time (matching plus host/SIM communication = one hour)
• A reduction of the verification time can be attained by:
  • reducing the vector size
  • reducing the frame rate
  • reducing the number of Gaussians of the client and background models
• Even so, matching time was still not acceptable

MoC bottleneck
• Not in preprocessing, since this is still all done on the PDA, as in the MoH system.
• Not in face:
  • Although the feature vectors are large,
  • only a few (10) of them are used in testing,
  • and only 4 Gaussians are needed (client model and UBM).
• The bottleneck is caused by voice and signature data:
  • vectors are relatively small, but
  • there is a large number of frames,
  • and a large number of Gaussians.
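
A back-of-the-envelope count makes the contrast concrete. The sketch below assumes the matching cost is dominated by one term per frame, per Gaussian, per dimension, for both the client model and the UBM; that cost model and the function name are assumptions. In the usage note, the face figures (10 vectors, 4 Gaussians, 120 coefficients for a depth-4 subband of a 160x192 image) follow from the slides, while the voice frame count and vector size are purely hypothetical placeholders.

```python
def gmm_cost(frames, dims, gaussians, models=2):
    """Rough count of per-dimension terms evaluated during matching.
    models=2 covers the client model and the UBM."""
    return frames * dims * gaussians * models
```

Under these assumptions, face costs `gmm_cost(10, 120, 4)` = 9,600 terms, whereas a voice sample with, say, 300 frames of 30 coefficients and 100 Gaussians costs `gmm_cost(300, 30, 100)` = 1,800,000 terms, which illustrates why small vectors do not help when frames and Gaussians are numerous.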

MoC solution
• Only a drastic measure can solve the problem: globalised features
  • Features that represent the whole signature: a single vector of 41 parameters representing correlation and variation in x-y coordinates, velocity and acceleration parameters
  • Idea generalised to voice: use of means (cf. Long-Term Average Spectrum) and standard deviations per vector parameter across all frames
  • Works well for signature
• Improvement: use up to four equal subparts of the signature/voice signal
• Implementation: 2 equal subparts
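
The voice variant of globalisation (means and standard deviations per parameter, computed over equal subparts) can be sketched as follows. This is a minimal illustration with hypothetical function names; it assumes there are at least as many frames as subparts, and it does not reproduce the 41-parameter signature vector, whose exact contents the slide does not give.

```python
import math

def global_stats(frames):
    """Per-dimension means followed by per-dimension standard deviations."""
    n, dims = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds

def globalise(frames, parts=2):
    """Collapse a frame sequence into one vector: the statistics of each
    of `parts` equal subparts, concatenated in time order."""
    size = len(frames) // parts
    vector = []
    for p in range(parts):
        start = p * size
        end = len(frames) if p == parts - 1 else start + size
        vector.extend(global_stats(frames[start:end]))
    return vector
```

The result is a single fixed-length vector per sample (2 x parts x dims values), so the card scores one vector instead of hundreds of frames.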

MoC-emulated results
[Table: EER (percent) for globalised means (columns 2-5) and means plus standard deviations (columns 6-9) for voice and signature divided into two equal subparts]

Solving the capacity problem
• Possible options for improving the performance of the SecurePhone:
  • Use match-on-server (MoS): raises security and privacy concerns.
  • Implement the biometric recogniser and encryption on a chip (more costly than the current solution).
  • Build a secure PDA with sufficient storage and processing power (a dedicated device that would be more costly and less ubiquitous).
  • Split matching (hybrid MoC/MoH): considered but not implemented. Initial work is being done and results are encouraging, with promising implications for the security and privacy of biometric data (templates/models) without cryptography.

Conclusion and Future Work
• Natural, non-intrusive biometrics guarantee high user acceptance
• Biometric data never leave the SIM card: high security
• Fusion of multiple streams of a single trait can lead to improved performance (a pilot for face was tested but not implemented in SP)
• MoH is efficient with high accuracy, but vulnerable
• MoC is secure, but efficiency and high accuracy cannot be achieved together
• Future work includes:
  • Designing hybrid mixed client-server matching
  • Investigating the privacy and security of biometric data using cancellable biometrics, especially for "Match on Server"
  • Improving the performance of single modalities through multi-classifier and multi-stream strategies, e.g. face by mixing a larger number of subbands at different depths

Acknowledgement
Thanks to the EU for funding this research through the SecurePhone (IST-2002-506883) project.
