United Arab Emirates University, College of Engineering
Design of Automatic Face/Voice Recognition Systems for Personal Identification
Supervisor: Dr. Farhad Kissain
Mariam Al Dhuhoori 970724502
Fatema Mohammed 199902260
Laila AL Shehhi 199902258
Mona Atti AL-Rashdi 199904062
Overview
• Introduction
• Main principles of speaker recognition
• Selected method
• Speaker recognition models
• Feature extraction
• Feature extraction implementation
• Conclusion
Objectives of our Project
• Design and implement a simple face recognition system.
• Design and implement an automatic voice recognition system.
• MATLAB is used to implement the project.
Speaker Recognition Methods
• Text dependent: the speaker's identity is verified based on his/her speaking one or more specific phrases.
• Text independent: speaker models capture characteristics of somebody's speech that show up irrespective of what one is saying.
Selected Method
• Text independent: identify the person who speaks, regardless of what is being said.
Speech Feature Extraction
• Feature extraction derives a small amount of data from the voice signal that can later be used to represent each speaker.
• MFCC is based on the known variation of the human ear's critical bandwidths with frequency: filters are spaced linearly at low frequencies and logarithmically at high frequencies.
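The linear-below/logarithmic-above behaviour comes from the mel scale. The project was implemented in MATLAB; as an illustrative sketch, the standard Hz-to-mel conversion formula can be written in Python as:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (common 2595*log10 form)."""
    return 2595.0 * np.log10(1.0 + f_hz / 700.0)

# The scale is roughly linear below 1 kHz and logarithmic above:
# hz_to_mel(1000.0) is approximately 1000 mel.
```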
Input Speech Signals
• There is silence at the beginning and at the end of the signals.
• The word consists of two syllables, 'ze' and 'ro'.
MFCC processing pipeline: continuous speech → frame blocking → windowing → FFT → mel-frequency wrapping → cepstrum.

Frame Blocking
The continuous speech signal is blocked into frames of N samples, with adjacent frames separated by M samples (M < N).
• Frame 1 consists of the first N samples.
• Frame 2 begins M samples after the first frame and overlaps it by N - M samples.
• Frame 3 begins 2M samples after the first frame and overlaps it by N - 2M samples.
After Frame Blocking • The speech signals were blocked into frames of N samples with overlap.
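The frame-blocking step above can be sketched in NumPy (the project itself used MATLAB). N = 256 and M = 100 are illustrative values, not necessarily the project's exact settings:

```python
import numpy as np

def frame_blocking(signal, N=256, M=100):
    """Split a 1-D signal into overlapping frames of N samples.
    Consecutive frames start M samples apart, so adjacent frames
    overlap by N - M samples. N and M here are illustrative."""
    num_frames = 1 + (len(signal) - N) // M
    return np.stack([signal[i * M : i * M + N] for i in range(num_frames)])

frames = frame_blocking(np.arange(1000, dtype=float))
# frame k starts at sample 100*k, so frames[1] begins at sample 100
```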
Windowing
• Each individual frame is windowed to minimize signal discontinuities at the frame edges.
• A Hamming window is used in this project.
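Applying a Hamming window simply multiplies each frame, sample by sample, by the window function. A minimal NumPy sketch (the frame array here is a placeholder, not project data):

```python
import numpy as np

N = 256
window = np.hamming(N)           # w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1))
frames = np.random.randn(8, N)   # placeholder frames for illustration
windowed = frames * window       # broadcasts the window across every frame

# The window tapers each frame toward 0.08 at the edges and ~1 at the center.
```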
FFT
• Convert each frame of N samples from the time domain into the frequency domain.
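As a sketch of this step, the magnitude spectrum of one windowed frame can be computed with a real FFT. The 8 kHz sampling rate and the 1 kHz test tone are assumptions for illustration only:

```python
import numpy as np

N = 256
fs = 8000                                   # assumed sampling rate (Hz)
t = np.arange(N) / fs
frame = np.hamming(N) * np.sin(2 * np.pi * 1000 * t)   # 1 kHz test tone

spectrum = np.abs(np.fft.rfft(frame))       # N//2 + 1 non-redundant bins
peak_bin = int(np.argmax(spectrum))
peak_hz = peak_bin * fs / N                 # strongest bin should sit near 1 kHz
```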
Mel-Frequency Wrapping
• The mel frequency scale has linear frequency spacing below 1 kHz and logarithmic spacing above 1 kHz.
• A filter bank is applied to the spectrum.
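A typical mel filter bank uses triangular filters whose center frequencies are spaced uniformly on the mel scale, which yields the linear/logarithmic spacing described above. A NumPy sketch with illustrative parameters (20 filters, 256-point FFT, 8 kHz sampling rate — not necessarily the project's values):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(num_filters=20, n_fft=256, fs=8000):
    """Triangular filters with centers spaced uniformly in mel.
    All parameter values are illustrative."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2), num_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    bank = np.zeros((num_filters, n_fft // 2 + 1))
    for i in range(1, num_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising edge of triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling edge of triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

bank = mel_filter_bank()
# Multiplying the magnitude spectrum by bank.T gives the mel-filter energies.
```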
Cepstrum
• Convert the log mel spectrum back to the time domain.
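The conversion back is conventionally done with a discrete cosine transform of the log mel-filter energies; the resulting coefficients are the MFCCs. A self-contained sketch with a hand-rolled DCT-II (the log-energy vector is a placeholder, and keeping 13 coefficients is a common choice, not confirmed by the source):

```python
import numpy as np

def dct2(x):
    """DCT-II of a 1-D array, written out directly for clarity."""
    N = len(x)
    n = np.arange(N)
    return np.array([np.sum(x * np.cos(np.pi * k * (2 * n + 1) / (2 * N)))
                     for k in range(N)])

# Placeholder log mel-filter energies (one windowed frame's worth).
log_mel_energies = np.log(np.arange(1, 21, dtype=float))
mfcc = dct2(log_mel_energies)[:13]   # keep the first 13 cepstral coefficients
```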
Feature Matching Methods
• Dynamic Time Warping (DTW)
• Hidden Markov Modeling (HMM)
• Vector Quantization (VQ)
Data points of all sounds after passing them through the LBG algorithm (first set).
Vector Quantization (VQ) Source Modeling
• VQ
• Cluster
• Codeword
• Codebook
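These terms fit together as follows: the LBG algorithm clusters the training feature vectors, each cluster's centroid becomes a codeword, and the set of codewords is the speaker's codebook. A minimal NumPy sketch of LBG training (the split factor and iteration count are illustrative, and the toy 1-D data below is not project data):

```python
import numpy as np

def lbg(data, codebook_size=4, eps=0.01, iters=20):
    """LBG codebook training sketch: start from the global mean,
    repeatedly split each codeword into a perturbed pair, then refine
    with k-means-style nearest-neighbour reassignment."""
    codebook = data.mean(axis=0, keepdims=True)
    while len(codebook) < codebook_size:
        # split every codeword into two slightly perturbed copies
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            # assign each training vector to its nearest codeword
            d = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
            labels = d.argmin(axis=1)
            # move each codeword to the centroid of its cluster
            for j in range(len(codebook)):
                if np.any(labels == j):
                    codebook[j] = data[labels == j].mean(axis=0)
    return codebook

# Deterministic toy data: four distinct values, ten copies each.
data = np.repeat(np.array([[1.0], [2.0], [4.0], [8.0]]), 10, axis=0)
codebook = lbg(data, codebook_size=4)
```

At matching time, the unknown speaker's features are compared against each stored codebook, and the codebook with the smallest total distortion identifies the speaker.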
VQ Advantages
• The model is trained much faster than other methods such as back-propagation.
• It is able to reduce large datasets to a smaller number of codebook vectors.
• It can handle data with missing values.
• The generated model can be updated incrementally.
• It is not limited in the number of dimensions of the codebook vectors, unlike nearest-neighbour techniques.
• It is easy to implement and more accurate.
Performance Rate

Sounds (word: "twenty"):

Sound | Laila | Mona | Mariam | Fatema
S1    | S1    | S1   | S1     | S1
S2    | S2    | S4   | S2     | S3
S3    | S3    | S3   | S3     | S2
S4    | S4    | S2   | S1     | S4

Success rate in recognition: 100% | 50% | 75% | 50%
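Reading the performance data as "row = true sound, cell = sound the system recognised for each speaker" (my interpretation of the flattened slide), the per-speaker success rates can be recomputed directly:

```python
# Recompute each speaker's recognition rate from the performance data.
expected = ["S1", "S2", "S3", "S4"]      # true identity of each test sound
results = {                              # recognised identity per speaker
    "Laila":  ["S1", "S2", "S3", "S4"],
    "Mona":   ["S1", "S4", "S3", "S2"],
    "Mariam": ["S1", "S2", "S3", "S1"],
    "Fatema": ["S1", "S3", "S2", "S4"],
}
rates = {name: 100 * sum(r == e for r, e in zip(res, expected)) // len(expected)
         for name, res in results.items()}
# rates == {"Laila": 100, "Mona": 50, "Mariam": 75, "Fatema": 50}
```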
Results for Test 1 (when the speaker said "twenty"):
• Speaker 1 matches speaker 1
• Speaker 2: no match
• Speaker 3 matches speaker 3
• Speaker 4 matches speaker 4
• Speaker 5: no match
• Speaker 6: no match