United Arab Emirates University, College of Engineering
Design of Automatic Face/Voice Recognition Systems for Personal Identification
Supervisor: Dr. Farhad Kissain
Mariam Al Dhuhoori 970724502
Fatema Mohammed 199902260
Laila AL Shehhi 199902258
Mona Atti AL-Rashdi 199904062
Overview
• Introduction
• Main principles of speaker recognition
• Selected method
• Speaker recognition models
• Feature extraction
• Feature extraction implementation
• Conclusion
Objectives of our Project
• Design and implement a simple face recognition system.
• Design and implement an automatic voice recognition system.
• MATLAB is used to implement the project.
Speaker Recognition Methods
• Text dependent: the speaker's identity is verified based on his/her speaking one or more specific phrases.
• Text independent: speaker models capture characteristics of somebody's speech that show up irrespective of what one is saying.
Selected Method
• Text independent: identify the person who speaks, regardless of what is being said.
Speech Feature Extraction
• Feature extraction derives a small amount of data from the voice signal that can later be used to represent each speaker.
• MFCC is based on the known variation of the human ear's critical bandwidths with frequency: filters are spaced linearly at low frequencies and logarithmically at high frequencies.
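The linear-below/logarithmic-above behaviour comes from the mel scale. The project was implemented in MATLAB; as an illustrative sketch, the standard Hz-to-mel conversion formula can be written in Python as:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

# The scale is roughly linear below 1 kHz and logarithmic above:
# hz_to_mel(1000.0) is approximately 1000 mel.
```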
Input Speech Signals
• There is silence at the beginning and at the end of the signals.
• The word consists of two syllables, 'ze' and 'ro'.
MFCC processing pipeline: continuous speech → frame blocking → windowing → FFT → mel-frequency wrapping → cepstrum.

Frame Blocking
The continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N).
• Frame 1 consists of the first N samples.
• Frame 2 begins M samples after the first frame and overlaps it by N - M samples.
• Frame 3 begins 2M samples after the first frame and overlaps it by N - 2M samples.
After Frame Blocking • The speech signals were blocked into frames of N samples with overlap.
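The frame-blocking step above can be sketched in NumPy (the project itself used MATLAB). N = 256 and M = 100 are illustrative values, not necessarily the project's exact settings:

```python
import numpy as np

def frame_blocking(signal, N=256, M=100):
    """Split a 1-D signal into overlapping frames of N samples.
    Consecutive frames start M samples apart, so adjacent frames
    overlap by N - M samples. N and M here are illustrative."""
    num_frames = 1 + (len(signal) - N) // M
    return np.stack([signal[i * M : i * M + N] for i in range(num_frames)])

frames = frame_blocking(np.arange(1000, dtype=float))
# frame k starts at sample 100*k, so frames[1] begins at sample 100
```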
Windowing
• Each individual frame is windowed to minimize signal discontinuities at the frame edges.
• A Hamming window is used in this project.
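Applying a Hamming window simply multiplies each frame, sample by sample, by the window function. A minimal NumPy sketch (the frame array here is a placeholder, not project data):

```python
import numpy as np

N = 256
window = np.hamming(N)           # w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))
frames = np.random.randn(8, N)   # placeholder frames for illustration
windowed = frames * window       # broadcasts the window across every frame

# The window tapers each frame toward 0.08 at the edges and ~1 at the center.
```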
FFT
• Convert each frame of N samples from the time domain into the frequency domain.
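As a sketch of this step, the magnitude spectrum of one windowed frame can be computed with a real FFT. The 8 kHz sampling rate and the 1 kHz test tone are assumptions for illustration only:

```python
import numpy as np

N = 256
fs = 8000                                   # assumed sampling rate (Hz)
t = np.arange(N) / fs
frame = np.hamming(N) * np.sin(2 * np.pi * 1000 * t)   # 1 kHz test tone

spectrum = np.abs(np.fft.rfft(frame))       # N//2 + 1 non-redundant bins
peak_bin = int(np.argmax(spectrum))
peak_hz = peak_bin * fs / N                 # strongest bin should sit near 1 kHz
```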
Mel-Frequency Wrapping
• The mel frequency scale has linear frequency spacing below 1 kHz and logarithmic spacing above 1 kHz.
• A filter bank is applied to the spectrum.
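A typical mel filter bank uses triangular filters whose center frequencies are spaced uniformly on the mel scale, which yields the linear/logarithmic spacing described above. A NumPy sketch with illustrative parameters (20 filters, 256-point FFT, 8 kHz sampling rate — not necessarily the project's values):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(num_filters=20, n_fft=256, fs=8000):
    """Triangular filters with centers spaced uniformly in mel.
    All parameter values are illustrative."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    bank = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising edge of triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling edge of triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

bank = mel_filter_bank()
# Multiplying the magnitude spectrum by bank.T gives the mel-filter energies.
```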
Cepstrum
• Convert the log mel spectrum back to the time domain.
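The conversion back is conventionally done with a discrete cosine transform of the log mel-filter energies; the resulting coefficients are the MFCCs. A self-contained sketch with a hand-rolled DCT-II (the log-energy vector is a placeholder, and keeping 13 coefficients is a common choice, not confirmed by the source):

```python
import numpy as np

def dct2(x):
    """DCT-II of a 1-D array, written out directly for clarity."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])

# Placeholder log mel-filter energies (one windowed frame's worth).
log_mel_energies = np.log(np.arange(1, 21, dtype=float))
mfcc = dct2(log_mel_energies)[:13]   # keep the first 13 cepstral coefficients
```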
Feature Matching Methods
• Dynamic Time Warping (DTW)
• Hidden Markov Modeling (HMM)
• Vector Quantization (VQ)
Data points of all sounds after passing them through the LBG algorithm (first set).
Vector Quantization (VQ) Source Modeling
• VQ
• Cluster
• Codeword
• Codebook
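These terms fit together as follows: the LBG algorithm clusters the training feature vectors, each cluster's centroid becomes a codeword, and the set of codewords is the speaker's codebook. A minimal NumPy sketch of LBG training (the split factor and iteration count are illustrative, and the toy 1-D data below is not project data):

```python
import numpy as np

def lbg(data, codebook_size=4, eps=0.01, iters=20):
    """LBG codebook training sketch: start from the global mean,
    repeatedly split each codeword into a perturbed pair, then refine
    with k-means-style nearest-neighbour reassignment."""
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < codebook_size:
        # split every codeword into two slightly perturbed copies
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # assign each training vector to its nearest codeword
            d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # move each codeword to the centroid of its cluster
            for j in range(len(codebook)):
                if np.any(labels == j):
                    codebook[j] = data[labels == j].mean(axis=0)
    return codebook

# Deterministic toy data: four distinct values, ten copies each.
data = np.repeat(np.array([[1.0], [2.0], [4.0], [8.0]]), 10, axis=0)
codebook = lbg(data, codebook_size=4)
```

At matching time, the unknown speaker's features are compared against each stored codebook, and the codebook with the smallest total distortion identifies the speaker.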
VQ Advantages
• The model is trained much faster than other methods such as back-propagation.
• It is able to reduce large datasets to a smaller number of codebook vectors.
• It can handle data with missing values.
• The generated model can be updated incrementally.
• It is not limited in the number of dimensions of the codebook vectors, unlike nearest-neighbour techniques.
• It is easy to implement and more accurate.
Performance Rate

Sounds (word: "twenty"):

Sound | Laila | Mona | Mariam | Fatema
S1    | S1    | S1   | S1     | S1
S2    | S2    | S4   | S2     | S3
S3    | S3    | S3   | S3     | S2
S4    | S4    | S2   | S1     | S4

Success rate in recognition: 100% | 50% | 75% | 50%
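Reading the performance data as "row = true sound, cell = sound the system recognised for each speaker" (my interpretation of the flattened slide), the per-speaker success rates can be recomputed directly:

```python
# Recompute each speaker's recognition rate from the performance data.
expected = ["S1", "S2", "S3", "S4"]      # true identity of each test sound
results = {                              # recognised identity per speaker
    "Laila":  ["S1", "S2", "S3", "S4"],
    "Mona":   ["S1", "S4", "S3", "S2"],
    "Mariam": ["S1", "S2", "S3", "S1"],
    "Fatema": ["S1", "S3", "S2", "S4"],
}
rates = {name: 100 * sum(r == e for r, e in zip(res, expected)) // len(expected)
         for name, res in results.items()}
# rates == {"Laila": 100, "Mona": 50, "Mariam": 75, "Fatema": 50}
```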
Results for Test 1 (when the speaker said "twenty"):
• Speaker 1 matches speaker 1
• Speaker 2: no match
• Speaker 3 matches speaker 3
• Speaker 4 matches speaker 4
• Speaker 5: no match
• Speaker 6: no match