Speech Recognition

Presenter: 林瑋勝

Outline • Introduction • System • Method and technique • Conclusion

Introduction • What is speech recognition? • The computer takes in human speech and compares it with sample recordings to identify what was said. • Speech recognition is used in a wide range of devices and software. • It can be implemented in hardware or in software.

Applications of Speech Recognition • http://www.wretch.cc/video/cheesebabe&func=single&vid=3751453&p=0 • http://tw.youtube.com/watch?v=nFsNOKyEs64

Method of Speech Recognition • Pipeline: Speech data → Produce the value of energy → Use energy to do end-point detection → Get MFCC → Discrete Fourier transform (DFT) → Fetch low-frequency information → Store into database → Comparison • Speech data • Use recording software to record the voice.

Method of Speech Recognition • Produce the value of energy • When speech is input, we use the energy values it produces. • Take the last 50% of the energy values as the lower limit.
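
The energy step can be illustrated with a minimal sketch that computes short-time energy as the sum of squared samples per frame. The frame length, hop size, sampling rate, and the synthetic test signal are assumptions made for illustration. The slide does not define exactly how the "last 50%" lower limit is derived, so that rule is not reproduced here.

```python
import math

def frame_energies(samples, frame_len=256, hop=128):
    """Return the sum of squared samples for each (overlapping) frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

# Synthetic input (assumed): silence, a 440 Hz tone, then silence, at 8 kHz.
rate = 8000
silence = [0.0] * 1024
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(2048)]
signal = silence + tone + silence

energies = frame_energies(signal)
```

Frames that fall inside the tone have clearly higher energy than the silent frames, which is what the end-point search relies on.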

Method of Speech Recognition • Use energy to do end-point detection • Search for the start point • Search for the end point
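
A minimal sketch of the start-point and end-point search, assuming the frame energies and a lower-limit threshold are already available. The function name and the example values are hypothetical; the slide does not specify how the search is implemented.

```python
def detect_endpoints(energies, threshold):
    """Return (start, end) indices of the first and last frame whose energy
    exceeds the threshold, or None if no frame does."""
    start = None
    for i, e in enumerate(energies):  # forward search for the start point
        if e > threshold:
            start = i
            break
    if start is None:
        return None
    end = start
    for i in range(len(energies) - 1, start - 1, -1):  # backward search
        if energies[i] > threshold:
            end = i
            break
    return start, end

energies = [0.0, 0.1, 0.2, 5.0, 9.0, 7.5, 0.3, 0.1]
print(detect_endpoints(energies, threshold=1.0))  # (3, 5)
```

Only the frames between the two indices would be passed on to the MFCC step.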

Method of Speech Recognition • Get MFCC • Use MFCC to get the feature vector of the speech. • MFCC (Mel-frequency Cepstral Coefficients) • Pre-emphasis • Split each syllable into several frames • Multiply by a Hamming window • Fast Fourier transform
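
The first four MFCC steps can be sketched as follows. The pre-emphasis coefficient 0.97, the frame length, the hop size, and the test tone are assumed values, not taken from the slides. The FFT is a plain radix-2 version, so the frame length must be a power of two.

```python
import cmath
import math

def pre_emphasis(samples, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts the high frequencies."""
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]

def split_frames(samples, frame_len, hop):
    """Cut the signal into overlapping frames of frame_len samples."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]

def hamming(n_points):
    """Symmetric Hamming window."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def power_spectra(samples, frame_len=64, hop=32):
    """Pre-emphasize, frame, window, and return each frame's power spectrum."""
    window = hamming(frame_len)
    spectra = []
    for frame in split_frames(pre_emphasis(samples), frame_len, hop):
        spectrum = fft([w * x for w, x in zip(window, frame)])
        spectra.append([abs(c) ** 2 for c in spectrum[:frame_len // 2 + 1]])
    return spectra

# Assumed test input: a 1000 Hz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(256)]
spectra = power_spectra(tone)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
```

With a 64-point frame at 8 kHz each bin is 125 Hz wide, so the 1000 Hz tone should peak at bin 8.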

Method of Speech Recognition • MFCC • Pre-emphasis • Split each syllable into several frames • Multiply by a Hamming window • Fast Fourier transform • Triangular band-pass filter • Discrete cosine transform
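
The last two MFCC steps, the triangular band-pass filters and the discrete cosine transform, can be sketched as below. The mel-scale formula is the commonly used one; the filter count (20), the number of coefficients kept (12, including the zeroth), and the flat test spectrum are assumptions for illustration only.

```python
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_bins, rate):
    """Triangular filters whose centers are equally spaced on the mel scale.
    n_bins is the number of spectrum bins covering 0 .. rate/2."""
    top = hz_to_mel(rate / 2.0)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * (rate / 2.0) / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, n_out):
    """Unnormalized DCT-II, keeping the first n_out coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_out)]

def mfcc_from_power(power, rate, n_filters=20, n_coeffs=12):
    """Filter one power spectrum, take log energies, then apply the DCT."""
    bank = mel_filterbank(n_filters, len(power), rate)
    log_energies = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                    for row in bank]
    return dct2(log_energies, n_coeffs)

flat_power = [1.0] * 129  # assumed input: flat spectrum from a 256-point FFT
coeffs = mfcc_from_power(flat_power, rate=8000)
```

Running this for every frame and stacking the results gives the two-dimensional MFCC matrix used in the following steps.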

Method of Speech Recognition • Discrete Fourier transform • Apply the DFT to each column of the two-dimensional MFCC matrix.
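
A minimal sketch of the column-wise DFT, assuming the matrix holds one frame per row and one MFCC coefficient per column, and that only the magnitude of each DFT value is kept. Both the layout and the use of magnitudes are assumptions; the slide does not state them.

```python
import cmath
import math

def dft(values):
    """Direct O(n^2) discrete Fourier transform of a real sequence."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * math.pi * k * t / n)
                for t, v in enumerate(values))
            for k in range(n)]

def dft_columns(matrix):
    """Apply the DFT to each column; return magnitudes in the same shape."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        spectrum = dft([matrix[r][c] for r in range(n_rows)])
        for r in range(n_rows):
            out[r][c] = abs(spectrum[r])
    return out

# Toy matrix: column 0 is constant, column 1 is one cycle of a sine.
mfcc = [[1.0, 0.0],
        [1.0, 1.0],
        [1.0, 0.0],
        [1.0, -1.0]]
spectrum = dft_columns(mfcc)
```

In the toy matrix the constant column puts all of its energy in row 0, while the sine column shows up in rows 1 and 3.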

Method of Speech Recognition • Fetch low-frequency information • The information in speech is concentrated in the low frequencies. • Take the first 50 rows as the feature vector and normalize them. • Finally, store the result in another two-dimensional matrix.
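
A minimal sketch of keeping the low-frequency rows and normalizing them. The slide does not say which normalization is used; scaling the kept block to unit Euclidean norm is an assumption here, and the function name and input matrix are hypothetical.

```python
import math

def low_frequency_features(spectrum, n_keep=50):
    """Keep the first n_keep rows and scale them so the block has unit norm
    (assumed normalization)."""
    kept = [row[:] for row in spectrum[:n_keep]]
    norm = math.sqrt(sum(v * v for row in kept for v in row))
    if norm == 0.0:
        return kept  # all-zero input: nothing to scale
    return [[v / norm for v in row] for row in kept]

# Made-up 60 x 3 matrix standing in for the column-wise DFT output.
spectrum = [[float(r + c) for c in range(3)] for r in range(60)]
features = low_frequency_features(spectrum)
```

The result is a new matrix, so the input is left unchanged and the features can be stored in the database as they are.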

Method of Speech Recognition • Compare • M: feature vector of the test speech • T: feature vector of the reference speech • i: column index of the 2-dimensional matrix • j: row index of the 2-dimensional matrix • Take the minimum distance as the result.
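
A minimal sketch of the comparison step. The distance formula itself is not given in the slide text, so the Euclidean distance over all entries M[j][i] and T[j][i] is an assumption; the labels and matrices below are made-up examples.

```python
import math

def distance(m, t):
    """Euclidean distance between two equally sized feature matrices
    (assumed metric), with j as the row index and i as the column index."""
    return math.sqrt(sum((m[j][i] - t[j][i]) ** 2
                         for j in range(len(m))
                         for i in range(len(m[0]))))

def recognize(test, references):
    """Return the label of the reference with the minimum distance.
    references maps a label to its stored feature matrix."""
    return min(references, key=lambda label: distance(test, references[label]))

references = {
    "yes": [[1.0, 0.0], [0.0, 1.0]],
    "no": [[0.0, 1.0], [1.0, 0.0]],
}
test = [[0.9, 0.1], [0.2, 0.8]]
print(recognize(test, references))  # yes
```

Each reference in the database is compared with the test features, and the label with the smallest distance is reported as the recognition result.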

Conclusion • Speech recognition makes searching for data quick and easy. • Devices equipped with it make daily life more interesting.

Reference • [1] 王小川 (ed.), 2005, "語音訊號處理", 第三波出版. • [2] http://neural.cs.nthu.edu.tw/jang/ • [3] http://www.csie.chu.edu.tw/pj_92/
