Word Recognition Device
C.K. Liang & Oliver Tsai
Why is speech recognition important? • It has many real-world applications. • Dictation devices and software, e.g. Dragon NaturallySpeaking. • Voice-activated devices can dial telephone numbers, change preset buttons on a car audio system, change TV channels, and more.
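How is this possible? • Linear Predictive Coding (LPC) • LPC models the speech waveform as the output of an Infinite Impulse Response (IIR) filter. • An IIR filter uses feedback: each output depends on current and past inputs and on past outputs, so past samples can be used to predict the next one.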
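The prediction idea fits in a few lines. This is a minimal illustration that assumes the sign convention of the all-pole model H(z) = G/(1 + A1 z^-1 + ... + A10 z^-10) given with the autocorrelation formula. Under that convention, the prediction of x(n) is minus the weighted sum of the past samples. The function name `predict_next` and the 440 Hz test tone are hypothetical. A pure tone is used because an order-2 predictor reproduces it exactly.

```python
import math

def predict_next(history, coeffs):
    """Predict the next sample from past samples, assuming the convention
    H(z) = G / (1 + A1 z^-1 + ... + Ap z^-p), i.e.
    x_hat(n) = -(A1*x(n-1) + A2*x(n-2) + ... + Ap*x(n-p))."""
    # history[-k] is x(n-k) when history holds x(0)..x(n-1).
    return -sum(a * history[-k] for k, a in enumerate(coeffs, start=1))

# A pure tone obeys x(n) = 2cos(w) x(n-1) - x(n-2), so an order-2 predictor
# with A1 = -2cos(w) and A2 = 1 predicts it exactly.
w = 2 * math.pi * 440 / 8000          # hypothetical 440 Hz tone at 8000 samples/sec
x = [math.sin(w * n) for n in range(50)]
coeffs = [-2 * math.cos(w), 1.0]

errors = [x[n] - predict_next(x[:n], coeffs) for n in range(2, len(x))]
print(max(abs(e) for e in errors))    # close to 0 (floating-point rounding only)
```

Real speech is not perfectly predictable. In LPC the coefficients are chosen to make the prediction error as small as possible over each frame.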
IIR Filter
$$a(1)\,y(n) = b(1)\,x(n) + b(2)\,x(n-1) + \cdots + b(n_b+1)\,x(n-n_b) - a(2)\,y(n-1) - \cdots - a(n_a+1)\,y(n-n_a)$$
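The difference equation translates directly into a loop. The 1-based indexing a(1), b(1) matches the equation in MATLAB's `filter` documentation. Python lists are 0-based, so in this sketch `b[0]` plays the role of b(1). The function name `iir_filter` is illustrative. The code is a plain direct-form implementation, not the DSP56303 code.

```python
def iir_filter(b, a, x):
    """Direct-form IIR filter implementing
    a[0]*y(n) = b[0]*x(n) + ... + b[nb]*x(n-nb) - a[1]*y(n-1) - ... - a[na]*y(n-na).
    Lists are 0-indexed, so b[0] corresponds to b(1), a[0] to a(1), and so on."""
    if a[0] == 0:
        raise ValueError("a[0] must be nonzero")
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: current and past inputs.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback part: past outputs.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y

# Impulse response of the one-pole filter y(n) = x(n) + 0.5*y(n-1).
print(iir_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]
```

The feedback term makes the impulse response decay forever instead of stopping after a fixed number of samples. That is what "infinite impulse response" refers to.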
How do we use LPC for speech recognition? • Record human speech • Pre-emphasis • Convolve the pre-emphasis filter with the waveform
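Pre-emphasis is a short FIR filter convolved with the recorded waveform. The slides do not give the filter coefficient. This sketch assumes the common first-order form H(z) = 1 - ALPHA z^-1 with a hypothetical ALPHA = 0.95. The names `convolve` and `pre_emphasize` are illustrative.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

ALPHA = 0.95                   # assumed pre-emphasis coefficient (not given in the slides)
PRE_EMPHASIS = [1.0, -ALPHA]   # impulse response of H(z) = 1 - ALPHA z^-1

def pre_emphasize(samples):
    # Keep the first len(samples) outputs: y(n) = x(n) - ALPHA*x(n-1).
    return convolve(samples, PRE_EMPHASIS)[:len(samples)]

print(pre_emphasize([1.0, 1.0, 1.0, 1.0]))  # first sample passes; the rest are about 0.05
```

A constant (low-frequency) input is almost cancelled, while rapid sample-to-sample changes pass through. The filter therefore boosts high frequencies relative to low ones.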
Hamming Window • Multiply each 240-sample frame point by point by the Hamming window • This tapers the amplitude toward both ends of the frame
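The windowing step is a point-by-point product. This sketch assumes the standard symmetric Hamming window w(n) = 0.54 - 0.46 cos(2πn/(N-1)). The slides do not say which variant the DSP used. `hamming` and `apply_window` are hypothetical helper names.

```python
import math

FRAME_LEN = 240  # 30 ms at 8000 samples/sec

def hamming(n_points):
    """Symmetric Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def apply_window(frame, window):
    # Point-by-point multiplication of the frame by the window.
    if len(frame) != len(window):
        raise ValueError("frame and window lengths differ")
    return [s * w for s, w in zip(frame, window)]

w = hamming(FRAME_LEN)
print(round(w[0], 4), round(w[FRAME_LEN // 2], 4))  # ends taper to 0.08, middle near 1.0
```

The window falls to 0.08 at both edges. This reduces the discontinuities that abrupt frame boundaries would otherwise introduce into the autocorrelation.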
[Figure: Sound analysis summary; panels: Variance, LPC Coefficients]
General Block Diagram • A/D converter (8000 samples/sec) → Pre-emphasis filter → Frame blocking (30 ms window framing) → Hamming window → Auto-correlation → Levinson-Durbin algorithm → SSD comparison → Output (4 digital bits)
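The frame-blocking stage follows from the diagram's numbers: 30 ms at 8000 samples/sec is 240 samples per frame. This sketch uses the 80-sample frame shift listed for the recognition device. Dropping a trailing partial frame is an assumption, and `frame_blocks` is a hypothetical name.

```python
SAMPLE_RATE = 8000                           # samples/sec, from the A/D converter stage
FRAME_MS = 30                                # window length from the frame-blocking stage
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000   # = 240 samples
HOP = 80                                     # frame shift used by the recognition device

def frame_blocks(samples, frame_len=FRAME_LEN, hop=HOP):
    """Split samples into overlapping frames of frame_len, advancing by hop.
    A trailing partial frame is dropped (an assumption of this sketch)."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

frames = frame_blocks(list(range(2000)))     # 2000 input samples, as on the slides
print(FRAME_LEN, len(frames))                # 240 23
```

With a 240-sample window and an 80-sample shift, consecutive frames overlap by 160 samples. A 2000-sample utterance therefore yields 23 full frames.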
Implementation on Motorola DSP56303 • Training device: builds the vowel sound template • Recognition device: identifies vowels
Training for the sound template • Detect the beginning of speech • Pre-emphasize 2000 input samples • Apply a Hamming window to a 240-sample frame • Calculate 10 LPC coefficients • Repeat 10 times and store the 10 sets of LPC coefficients
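The slides do not say how the beginning of speech is detected. One common approach is an amplitude threshold, sketched here as an assumption. The block size, the threshold, and the names `detect_speech_start` and `capture_utterance` are all hypothetical. Only the 2000-sample capture length comes from the slides.

```python
def detect_speech_start(samples, threshold, block=80):
    """Return the index of the first block whose mean absolute amplitude exceeds
    threshold, or None if no block does. The block size and threshold are
    hypothetical tuning values, not taken from the original design."""
    for start in range(0, len(samples) - block + 1, block):
        chunk = samples[start:start + block]
        if sum(abs(s) for s in chunk) / block > threshold:
            return start
    return None

def capture_utterance(samples, threshold, length=2000):
    """Return the `length` samples starting at the detected speech onset."""
    start = detect_speech_start(samples, threshold)
    if start is None or start + length > len(samples):
        return None
    return samples[start:start + length]

# Silence followed by a louder signal.
signal = [0.0] * 400 + [0.5, -0.5] * 1500
print(detect_speech_start(signal, threshold=0.1))   # 400
```

Averaging over a block rather than testing single samples makes the detector less sensitive to isolated noise spikes.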
Recognition Device • Detect the beginning of speech • Pre-emphasize 2000 input samples • Create window frames by shifting the window 80 samples at a time • Apply the Hamming window to each frame • Find 10 LPC coefficients for each frame • Compute the sum of squared differences (SSD) between these coefficients and those in the template
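The SSD comparison amounts to a nearest-template search. This sketch classifies a single frame's coefficient vector. How the original combines per-frame results into one decision is not stated in the slides. `ssd`, `classify`, the labels, and the 2-coefficient toy vectors are hypothetical; the real system uses 10 coefficients per set.

```python
def ssd(u, v):
    """Sum of squared differences between two equal-length coefficient vectors."""
    if len(u) != len(v):
        raise ValueError("length mismatch")
    return sum((a - b) ** 2 for a, b in zip(u, v))

def classify(frame_coeffs, templates):
    """Return the label whose stored coefficient set is closest (smallest SSD)
    to frame_coeffs. templates maps a label to a list of stored coefficient sets."""
    best_label, best_score = None, float("inf")
    for label, sets in templates.items():
        for stored in sets:
            score = ssd(frame_coeffs, stored)
            if score < best_score:
                best_label, best_score = label, score
    return best_label

# Toy templates with hypothetical labels and 2 coefficients instead of 10.
templates = {"a": [[0.1, 0.2], [0.12, 0.18]], "i": [[0.9, -0.3]]}
print(classify([0.11, 0.19], templates))   # a
```

Comparing against every stored set, rather than an average, keeps all 10 training repetitions available as separate reference points.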
Output Hardware • NAND chips map the 4 output bits from the DSP board to 10 corresponding vowel LEDs plus 1 volume-indicator LED
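Decoding 4 bits into individual LED lines with NAND gates can be modeled in software. This sketch builds a generic 4-to-16 decoder from NAND operations alone. The slides do not say which code lights which LED, so the code-to-output assignment here is hypothetical. Only 11 of the 16 possible codes would be needed for 10 vowel LEDs plus the volume LED.

```python
def nand(*inputs):
    """NAND gate: output 0 only when every input is 1."""
    return 0 if all(inputs) else 1

def decode_4bit(b3, b2, b1, b0):
    """4-to-16 line decoder built only from NAND gates.
    Returns 16 active-low outputs: output k is 0 exactly when the input code is k."""
    bits = (b3, b2, b1, b0)
    inverted = tuple(nand(b, b) for b in bits)   # NAND with both inputs tied acts as NOT
    outputs = []
    for k in range(16):
        # Pick each bit or its complement so this 4-input NAND goes low only for code k.
        selected = [bits[i] if (k >> (3 - i)) & 1 else inverted[i] for i in range(4)]
        outputs.append(nand(*selected))
    return outputs

outs = decode_4bit(0, 1, 0, 1)                     # code 5
print([k for k, v in enumerate(outs) if v == 0])   # [5]
```

Because a NAND output goes low when selected, each decoder line is active-low.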
Difficulties encountered • Insufficient data memory • Indirect connection between the microphone and the DSP board • Incompatible I/O core302 assembly file • Low volume of the sound input
Further Expansion • Speech compression • Large-vocabulary continuous speech recognition with Hidden Markov Models
$$H(z) = \frac{G}{1 + A_1 z^{-1} + A_2 z^{-2} + \cdots + A_{10} z^{-10}}$$
Autocorrelation
$$R_i = \sum_{n=i}^{239} x(n)\,x(n-i), \qquad i = 0, 1, \ldots, 10$$
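The autocorrelation sum translates term for term into code. For a 240-sample frame the upper limit N-1 is 239, as in the formula. The function name `autocorrelation` is illustrative, and R_0 is included because the Levinson-Durbin step needs it.

```python
ORDER = 10  # number of LPC coefficients

def autocorrelation(frame, order=ORDER):
    """R_i = sum_{n=i}^{N-1} x(n) x(n-i) for i = 0..order, where N = len(frame)."""
    n_samples = len(frame)
    return [sum(frame[n] * frame[n - i] for n in range(i, n_samples))
            for i in range(order + 1)]

print(autocorrelation([1.0, 2.0, 3.0], order=2))   # [14.0, 8.0, 3.0]
```

R_0 is the frame energy. The higher lags measure how similar the frame is to delayed copies of itself.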
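Levinson-Durbin Algorithm
$$\begin{bmatrix} R_0 & R_1 & R_2 & \cdots & R_9 \\ R_1 & R_0 & R_1 & \cdots & R_8 \\ R_2 & R_1 & R_0 & \cdots & R_7 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ R_9 & R_8 & R_7 & \cdots & R_0 \end{bmatrix} \begin{bmatrix} A_1 \\ A_2 \\ A_3 \\ \vdots \\ A_{10} \end{bmatrix} = -\begin{bmatrix} R_1 \\ R_2 \\ R_3 \\ \vdots \\ R_{10} \end{bmatrix}$$
Recursion for n = 1 to 10, starting from E_0 = R_0 and A_{n-1}(0) = 1:
$$K_n = -\frac{1}{E_{n-1}} \sum_{i=0}^{n-1} A_{n-1}(i)\, R_{n-i}$$
$$A_n(i) = A_{n-1}(i) + K_n\, A_{n-1}(n-i), \quad i = 1, \ldots, n-1; \qquad A_n(n) = K_n$$
$$E_n = E_{n-1}\,(1 - K_n^2)$$
The final coefficients are A_i = A_{10}(i).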
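The recursion solves the Toeplitz system in O(p²) operations instead of inverting the matrix. This sketch follows the K_n, A_n(i), and E_n updates in the same sign convention as H(z). The function name `levinson_durbin` is illustrative, not the DSP56303 routine.

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for the all-pole model
    H(z) = G / (1 + A1 z^-1 + ... + Ap z^-p).
    r holds R0..Rp. Returns (A, E): A = [A1..Ap] and E = final error energy E_p."""
    if r[0] <= 0:
        raise ValueError("R0 must be positive")
    a = [1.0] + [0.0] * order        # a[0] = A_n(0) = 1 by convention
    err = r[0]                       # E_0 = R0
    for n in range(1, order + 1):
        # K_n = -(1/E_{n-1}) * sum_{i=0}^{n-1} A_{n-1}(i) R_{n-i}
        k = -sum(a[i] * r[n - i] for i in range(n)) / err
        prev = a[:]                  # snapshot of A_{n-1}
        for i in range(1, n):
            a[i] = prev[i] + k * prev[n - i]
        a[n] = k                     # A_n(n) = K_n
        err *= 1.0 - k * k           # E_n = E_{n-1} (1 - K_n^2)
    return a[1:], err

# Autocorrelation of a first-order process: expect A1 close to -0.5, A2 close to 0.
coeffs, energy = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For the device, passing the 11 values R_0..R_10 of a windowed frame with `order=10` yields the 10 LPC coefficients. Each K_n stays between -1 and 1 for a valid autocorrelation sequence, so E_n never goes negative.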