Auditory Speech Recognition Model Comparison: Seneff's Model in Depth

Seneff’s Auditory Model. Miriam Cordero Ruiz (SONY Advanced Technology Center Stuttgart), Leuven, July 2002

Which is the best speech recognizer?

Introduction • Auditory System • Seneff’s Model • Stage I • Stage II • Conclusions

Human Auditory System: Basilar Membrane (4 kHz)

Human Auditory System: Basilar Membrane (400 Hz)

Human Auditory System: Basilar Membrane, Critical Bands (Zwicker)

Human Auditory System: Inner Hair Cells. Panels: Mechanical, Neural (time axis t)

Structure of the model: Stage I: Critical Band Filter Bank → Stage II: Hair Cell Synapse Model → Stage III: Envelope Detector (mean rate spectrum) and Synchrony Detector (synchrony spectrum)

Stage I: Auditory Filter Bank • 40 channels (20 - 6700 Hz) • Bandwidth of one channel = 0.5 Bark
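The channel layout above can be sketched numerically. This is a minimal illustration, assuming the Zwicker and Terhardt approximation of the Bark scale and equal Bark spacing between 20 Hz and 6700 Hz; the slides do not list the actual center frequencies, so `hz_to_bark`, `bark_to_hz`, and `channel_centers` are hypothetical helper names, not part of Seneff's model.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the critical-band rate (Bark)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def bark_to_hz(z, lo=0.0, hi=20000.0, iters=60):
    """Invert hz_to_bark by bisection (the mapping is monotonic in f)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if hz_to_bark(mid) < z:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def channel_centers(f_lo=20.0, f_hi=6700.0, n_channels=40):
    """Split [f_lo, f_hi] into n_channels bands of equal Bark width.

    Assumption: equal Bark spacing; the real design may differ.
    Returns (band width in Bark, list of center frequencies in Hz).
    """
    z_lo, z_hi = hz_to_bark(f_lo), hz_to_bark(f_hi)
    width = (z_hi - z_lo) / n_channels
    centers = [bark_to_hz(z_lo + (k + 0.5) * width) for k in range(n_channels)]
    return width, centers

width, centers = channel_centers()
print(f"band width: {width:.3f} Bark, first/last center: "
      f"{centers[0]:.0f} Hz / {centers[-1]:.0f} Hz")
```

Under these assumptions the band width comes out close to 0.5 Bark, which is consistent with the figure quoted on the slide.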

Design of the Auditory Filter Bank: Initial Complex Zeroes → cascade of zero sections (Zero of Cascade, one per channel); each cascade node feeds a Resonator → Channel 1, Channel 2, …, Channel 40. Frequency axis: f (Hz)
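The cascade-with-taps structure of this design can be sketched as follows. This is a structural illustration only: the pole and zero radii and all frequencies are placeholder assumptions, and the function names are hypothetical. The published filter coefficients are not given in the slides.

```python
import math

def zero_pair(x, freq_hz, fs, radius=0.95):
    """FIR section with a complex-conjugate zero pair at +/- freq_hz."""
    theta = 2.0 * math.pi * freq_hz / fs
    b1, b2 = -2.0 * radius * math.cos(theta), radius * radius
    y = []
    for n, xn in enumerate(x):
        x1 = x[n - 1] if n >= 1 else 0.0
        x2 = x[n - 2] if n >= 2 else 0.0
        y.append(xn + b1 * x1 + b2 * x2)
    return y

def resonator(x, freq_hz, fs, radius=0.95):
    """Two-pole section with a complex-conjugate pole pair at +/- freq_hz."""
    theta = 2.0 * math.pi * freq_hz / fs
    a1, a2 = 2.0 * radius * math.cos(theta), -radius * radius
    y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y

def cascade_filter_bank(x, initial_zero_freqs, zero_freqs, resonator_freqs, fs):
    """Return one output signal per channel.

    The input first passes the initial zeros, then a chain of zero
    sections; after each section the running signal is tapped and sent
    through that channel's resonator.
    """
    node = x
    for fz in initial_zero_freqs:
        node = zero_pair(node, fz, fs)
    outputs = []
    for fz, fr in zip(zero_freqs, resonator_freqs):
        node = zero_pair(node, fz, fs)           # next zero of the cascade
        outputs.append(resonator(node, fr, fs))  # tap for this channel
    return outputs
```

Because every channel reuses the zeros accumulated earlier in the chain, later taps inherit the attenuation of all preceding sections, which is the point of sharing one cascade instead of building 40 independent filters.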

Stage II: Model and Physiological Data
Model: Half-Wave Rectification • Short-Term Adaptation • LP Filter • Automatic Gain Control
Physiological data: Harmonics • Firing prob. of nerve fiber • Synchrony reduction • Smooths saturated stimuli • Synchrony • Refractory effect • < 1 kHz
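To illustrate how a low-pass filter can reduce synchrony at higher frequencies, here is a small sketch of the magnitude response of a one-pole low-pass filter. The one-pole form, the 1 kHz cutoff, and the 16 kHz sampling rate are assumptions chosen for illustration, not the filter used in the model.

```python
import math

def one_pole_gain(f_hz, cutoff_hz=1000.0, fs=16000.0):
    """Magnitude response of y[n] = (1 - a) * x[n] + a * y[n - 1] at f_hz."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)  # common pole placement
    w = 2.0 * math.pi * f_hz / fs                  # normalized frequency
    return (1.0 - a) / math.sqrt(1.0 - 2.0 * a * math.cos(w) + a * a)

for f in (200.0, 1000.0, 4000.0):
    print(f"{f:6.0f} Hz -> gain {one_pole_gain(f):.2f}")
```

Fine time structure well below the cutoff passes almost unchanged, while components at several kHz are strongly attenuated, so the output follows the waveform less closely as frequency rises.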

Stages I+II: Signal → Stage I: Critical Band Filter Bank → Stage II: Half-Wave Rectification → Short-Term Adaptation → Low-Pass Filter → Rapid AGC → Model Output
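The Stage II chain can be sketched for a single filter-bank channel as below. Each step is a deliberately simplified stand-in (a plain rectifier, a depleting-reservoir adaptation, a one-pole low-pass filter, and a divisive gain control); the constants and function names are assumptions for illustration and are not Seneff's published equations.

```python
import math

def half_wave_rectify(x):
    """Keep positive half-waves only (the real model may use a saturating curve)."""
    return [max(0.0, v) for v in x]

def short_term_adaptation(x, depletion=0.05, recovery=0.01):
    """Reservoir stand-in: output drains a store that slowly refills toward 1.

    Assumes non-negative input with amplitude of about 1 or less.
    """
    q, y = 1.0, []
    for v in x:
        out = v * q
        y.append(out)
        q += recovery * (1.0 - q) - depletion * out
        q = min(1.0, max(0.0, q))
    return y

def low_pass(x, cutoff_hz, fs):
    """One-pole low-pass filter: y[n] = (1 - a) * x[n] + a * y[n - 1]."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y, prev = [], 0.0
    for v in x:
        prev = (1.0 - a) * v + a * prev
        y.append(prev)
    return y

def rapid_agc(x, k=2.0, avg_cutoff_hz=50.0, fs=16000.0):
    """Divide each sample by (1 + k * running average of the signal)."""
    avg = low_pass(x, avg_cutoff_hz, fs)
    return [v / (1.0 + k * m) for v, m in zip(x, avg)]

def stage_two(channel_signal, fs=16000.0):
    """Rectify, adapt, smooth, then normalize one channel's signal."""
    y = half_wave_rectify(channel_signal)
    y = short_term_adaptation(y)
    y = low_pass(y, 1000.0, fs)
    return rapid_agc(y, fs=fs)
```

Feeding `stage_two` a steady tone gives a strong response at onset that settles to a lower level, which is the qualitative behaviour the adaptation and gain-control steps are meant to capture.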

Results

Other Peripheral Models • Patterson-Meddis • Gammatone Filterbank • Lyon’s Cochlear Model • Gammatone Filterbank • Adaptation Stage

Conclusions • Based on biological data • Front-End for Speech Processing • Speech Recognition, Speaker ID, Localization… • Better performance
