Word Recognition Device

Word Recognition Device C.K. Liang & Oliver Tsai

Why is speech recognition important?
• It has several real-world applications.
• Dictation devices and software, e.g. Dragon NaturallySpeaking.
• Voice-activated devices can dial telephone numbers, change preset buttons on a car stereo, change TV stations, and more.

How is this possible?
• Linear Predictive Coding (LPC)
• LPC models the waveform as the output of an Infinite Impulse Response (IIR) filter.
• The filter uses feedback from past outputs, together with current and past inputs, to predict future outputs.

IIR Filter
a(1)*y(n) = b(1)*x(n) + b(2)*x(n-1) + ... + b(nb+1)*x(n-nb) - a(2)*y(n-1) - ... - a(na+1)*y(n-na)
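The difference equation above can be evaluated directly, one output sample at a time. The following is a minimal sketch in plain Python, not the DSP56303 implementation; the function name `iir_filter` is hypothetical, and the slide's 1-based indices (`a(1)`, `b(1)`) become 0-based list indices (`a[0]`, `b[0]`).

```python
def iir_filter(b, a, x):
    """Evaluate a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k].

    Samples before the start of the signal are treated as zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: current and past inputs.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback part: past outputs.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y


# Impulse response of y(n) = x(n) + 0.5*y(n-1)
impulse_response = iir_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0])
```

Because of the feedback term, the impulse response never reaches exactly zero, which is what makes the filter "infinite impulse response".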

How do we use LPC for speech recognition?
• Record human speech
• Pre-emphasis
• Convolve the pre-emphasis filter with the waveform

Pre-emphasis Filter
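A common form of pre-emphasis is the first-order filter y(n) = x(n) - alpha*x(n-1), which is the convolution of the waveform with the two-tap filter [1, -alpha]. The sketch below assumes that form; the slides do not state the filter taps, so the value alpha = 0.95 and the name `pre_emphasize` are assumptions made only for illustration.

```python
def pre_emphasize(samples, alpha=0.95):
    """Convolve the waveform with [1, -alpha], keeping the input length.

    alpha = 0.95 is an assumed value, not one taken from the slides.
    """
    out = []
    prev = 0.0  # x(-1) is treated as zero
    for s in samples:
        out.append(s - alpha * prev)
        prev = s
    return out


# A constant (low-frequency) signal is strongly attenuated after the first sample.
flat = pre_emphasize([1.0, 1.0, 1.0], alpha=0.5)
```

In the pipeline described here, this step would be applied to the 2000 recorded input samples before framing.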

Why are vowel sounds used?

Hamming Window
• Multiply the 240 samples of each frame point by point with a Hamming window
• This reduces the amplitude at both ends of the window frame
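The windowing step can be sketched as follows, using the standard Hamming definition w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)) with N = 240. The function names are hypothetical and chosen only for this illustration.

```python
import math


def hamming_window(n_points=240):
    """Standard Hamming window: 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame, window):
    """Multiply a frame by the window, point by point."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [s * w for s, w in zip(frame, window)]


window = hamming_window(240)
windowed = apply_window([1.0] * 240, window)
```

The end points of the window are about 0.08 while the middle is close to 1, so samples near the frame edges are scaled down and the discontinuity at the frame boundaries is reduced.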

Waveform of a consonant sound

Sound analysis summary: variance and LPC coefficients

General Block Diagram
A/D converter (8000 samples/sec) → Pre-emphasis filter → Frame blocking (30 ms window framing) → Hamming window → Auto-correlation → Levinson-Durbin algorithm → SSD comparison → Output (4 digital bits)

Implementation on Motorola DSP56303
• Training device for the vowel sound template
• Recognition device for vowels

Training for sound template
• Detect the beginning of speech
• Pre-emphasize 2000 input samples
• Apply a Hamming window to a 240-sample frame
• Calculate 10 LPC coefficients
• Repeat 10 times and store 10 sets of LPC coefficients

Recognition Device
• Detect the beginning of speech
• Pre-emphasize 2000 input samples
• Create window frames by shifting 80 samples at a time
• Apply a Hamming window to each frame
• Find 10 LPC coefficients for each frame
• Compute the SSD (sum of squared differences) between the coefficients and those in the template
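The framing and comparison steps above can be sketched as below. With 2000 samples, 240-sample frames and an 80-sample shift, the frame start positions are 0, 80, ..., 1760. The sketch assumes the template holds one coefficient set per vowel label and classifies a single frame; how the device combines decisions across frames is not stated in the slides, and all names here are hypothetical.

```python
def frame_starts(total_samples=2000, frame_len=240, shift=80):
    """Start index of every full frame that fits in the recording."""
    return list(range(0, total_samples - frame_len + 1, shift))


def ssd(u, v):
    """Sum of squared differences between two coefficient vectors."""
    if len(u) != len(v):
        raise ValueError("coefficient vectors must have the same length")
    return sum((p - q) ** 2 for p, q in zip(u, v))


def best_match(coeffs, templates):
    """Return the template label with the smallest SSD to coeffs.

    templates: dict mapping a label to a stored coefficient set
    (an assumed layout for the trained template).
    """
    return min(templates, key=lambda label: ssd(coeffs, templates[label]))


starts = frame_starts()
label = best_match([0.9, 1.2], {"a": [0.0, 0.0], "e": [1.0, 1.0]})
```

Each start index selects the slice `samples[start:start + 240]`, which is then windowed and passed to the LPC analysis before the SSD comparison.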

Output Hardware
• Map the 4 output bits from the DSP board to 10 corresponding vowel LEDs plus 1 volume indicator LED, using NAND chips

Difficulties encountered
• Insufficient data memory
• Indirect connection between the microphone and the DSP board
• Incompatible I/O core302 assembly file
• Low volume of the sound input

Further Expansion
• Speech compression
• Large-vocabulary continuous speech recognition with Hidden Markov Models

$$H(z) = \frac{G}{1 + A_1 z^{-1} + A_2 z^{-2} + \dots + A_{10} z^{-10}}$$

Autocorrelation

$$R_i = \sum_{n=i}^{239} x(n)\, x(n-i), \qquad i = 1, \dots, 10$$
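The autocorrelation sum can be computed directly from a windowed frame. The sketch below also returns lag 0, since R_0 (the same sum with i = 0) is needed when solving for the LPC coefficients; the function name is hypothetical.

```python
def autocorrelation(frame, max_lag=10):
    """Return [R_0, R_1, ..., R_max_lag] with R_i = sum_{n=i}^{N-1} x(n)*x(n-i)."""
    n_samples = len(frame)
    return [
        sum(frame[n] * frame[n - i] for n in range(i, n_samples))
        for i in range(max_lag + 1)
    ]


# Tiny worked example: frame = [1, 2, 3]
# R_0 = 1 + 4 + 9, R_1 = 2*1 + 3*2, R_2 = 3*1
r = autocorrelation([1.0, 2.0, 3.0], max_lag=2)
```

For a 240-sample frame the upper limit of the sum is n = 239, matching the formula above.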

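Levinson-Durbin Algorithm

$$
\begin{bmatrix}
R_0 & R_1 & R_2 & \cdots & R_9 \\
R_1 & R_0 & R_1 & \cdots & R_8 \\
R_2 & R_1 & R_0 & \cdots & R_7 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
R_9 & R_8 & R_7 & \cdots & R_0
\end{bmatrix}
\begin{bmatrix} A_1 \\ A_2 \\ A_3 \\ \vdots \\ A_{10} \end{bmatrix}
= -
\begin{bmatrix} R_1 \\ R_2 \\ R_3 \\ \vdots \\ R_{10} \end{bmatrix}
$$

$$A_n(i) = A_{n-1}(i) + K_n\, A_{n-1}(n-i)$$

$$K_n = -\frac{1}{E_{n-1}} \sum_{i=0}^{n-1} A_{n-1}(i)\, R_{n-i}$$

$$E_n = E_{n-1}\,\left(1 - K_n^2\right)$$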
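The recursion can be sketched in a few lines. The slide does not show the starting values, so this sketch assumes the usual conventions E_0 = R_0, A_n(0) = 1 and A_{n-1}(n) = 0, under which the update for i = n gives A_n(n) = K_n. The function name and the error handling are illustrative only.

```python
def levinson_durbin(r, order=10):
    """Solve the Toeplitz system for A_1..A_order given r = [R_0, ..., R_order].

    Returns (coefficients, final prediction error E_order).
    """
    if len(r) < order + 1:
        raise ValueError("need autocorrelation lags 0..order")
    if r[0] == 0:
        raise ValueError("R_0 is zero (silent frame); no LPC solution")
    a = [1.0] + [0.0] * order   # a[0] plays the role of A_n(0) = 1
    e = r[0]                    # assumed initial condition E_0 = R_0
    for n in range(1, order + 1):
        # K_n = -(1/E_{n-1}) * sum_{i=0}^{n-1} A_{n-1}(i) * R_{n-i}
        k = -sum(a[i] * r[n - i] for i in range(n)) / e
        prev = a[:]
        # A_n(i) = A_{n-1}(i) + K_n * A_{n-1}(n-i), for i = 1..n
        for i in range(1, n + 1):
            a[i] = prev[i] + k * prev[n - i]
        # E_n = E_{n-1} * (1 - K_n^2)
        e *= 1.0 - k * k
    return a[1:], e


# Autocorrelation of a first-order process: R_i = 0.5**i
coeffs, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

In this small example the second coefficient comes out as zero, because a first-order predictor already explains the given autocorrelation; the result satisfies the matrix equation above with the same sign convention as H(z).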
