Neuro-IT Roadmap: Successful in the Physical World • Robust perception • Image processing • Speech recognition • Multimodal human machine interaction • System integration • Scene analysis and representation
Automotive: Overtake-Checker and Door-Opener Assistant Dr. Axel Techmer Infineon Technologies
Security: Face Detection & Recognition • Leading-edge approach to face detection (University of Bochum) • Detection of face regions (a) • Pre-selection of frontal faces (b) • Face recognition (c, d) • Elastic graph matching • Gabor wavelet transform Ruhr University Bochum
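The slide names the Gabor wavelet transform as a building block of the recognition stage but does not give the filter parameters. As a rough illustration only, the sketch below builds a generic real-valued 2D Gabor kernel (Gaussian envelope times a cosine carrier). The function name `gabor_kernel` and all parameter values are assumptions chosen for illustration; this is not the Bochum implementation.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Return a size x size real Gabor kernel as a list of rows.

    size must be odd so that the kernel has a well-defined center pixel.
    theta is the carrier orientation in radians, sigma the envelope width,
    gamma the aspect ratio and psi the phase offset (all illustrative).
    """
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates into the orientation of the carrier wave.
            xr = x * cos_t + y * sin_t
            yr = -x * sin_t + y * cos_t
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

# Hypothetical parameters: 9x9 kernel, 4-pixel wavelength, horizontal carrier.
kernel = gabor_kernel(size=9, wavelength=4.0, theta=0.0, sigma=2.0)
```

Convolving an image with a bank of such kernels at several orientations and wavelengths yields the local feature responses that graph-matching approaches typically compare.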
Vision Instruction Processor (VIP) Infineon Technologies, Corporate Research, Systems Technology
Vision Instruction Processor (VIP): 16 parallel processing elements
Prototype available since May 2001:
• SIMD architecture
• 204 instructions
• 10 million logic transistors
• On-chip memory: 37 KB
• Technology: 0.35 µm
• Clock: 100 MHz
• Power consumption: 100 µW/MOPS
• Die size: 22 mm x 23 mm
• Peak performance: 53 GOPS
• PCI board with VIP and camera submodules
• Software tools for VIP: compiler, debugger, profiler
• Software tools on host: MS Visual C++ with VPL++ library
• Application demonstrators: car vision, face recognition, MPEG2, graphics
In 0.13 µm CMOS technology:
• Clock: 200 MHz
• Peak performance: 106 GOPS
• Die size: 70 mm²
• Power consumption: 700 mW
Infineon Technologies, Corporate Research, Systems Technology
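The quoted figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers on the slide; the helper names are made up for illustration, and attributing the 16 processing elements to the 100 MHz prototype is an assumption based on the slide layout.

```python
# Figures as quoted on the slide.
PROTOTYPE = {"clock_mhz": 100, "peak_gops": 53, "processing_elements": 16}
SHRINK_013UM = {"clock_mhz": 200, "peak_gops": 106, "power_mw": 700}

def ops_per_cycle(peak_gops, clock_mhz):
    """Peak operations per clock cycle for the whole chip."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)

def microwatt_per_mops(power_mw, peak_gops):
    """Power efficiency at peak throughput, in µW per MOPS."""
    return (power_mw * 1e3) / (peak_gops * 1e3)

proto_ops = ops_per_cycle(PROTOTYPE["peak_gops"], PROTOTYPE["clock_mhz"])
shrink_ops = ops_per_cycle(SHRINK_013UM["peak_gops"], SHRINK_013UM["clock_mhz"])
ops_per_pe = proto_ops / PROTOTYPE["processing_elements"]
shrink_efficiency = microwatt_per_mops(SHRINK_013UM["power_mw"], SHRINK_013UM["peak_gops"])

print(f"prototype: {proto_ops:.0f} ops/cycle, {ops_per_pe:.1f} per processing element")
print(f"0.13 µm:   {shrink_ops:.0f} ops/cycle, {shrink_efficiency:.1f} µW/MOPS at peak")
```

Both versions come out at the same number of operations per cycle, which suggests the doubling of peak performance in 0.13 µm follows from the doubled clock alone.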
Car Vision Components - Hardware [block diagram: other sensors, vehicle control, CPU] Dr. Axel Techmer, Infineon Technologies
Neuro-IT Roadmap: Successful in the Physical World • Robust perception • Image processing • Speech recognition • Multimodal human machine interaction • System integration • Scene analysis and representation
|FFT| over a 20 ms window resolves neither frequency nor temporal structure • Frequency resolution: 50 Hz • Temporal resolution: 20 ms
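The two resolution figures are tied together by the window length. For an analysis window of duration $T$, the spacing between adjacent FFT bins is the reciprocal of $T$ (the symbols $T$ and $\Delta f$ are introduced here for illustration):

```latex
\Delta f = \frac{1}{T} = \frac{1}{20\,\mathrm{ms}} = \frac{1}{0.02\,\mathrm{s}} = 50\,\mathrm{Hz}
```

Shortening the window improves temporal resolution but coarsens the frequency grid by the same factor, and vice versa.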
Classical Sound Processing for Speech Recognition • The time structure of the speech signal (<20 ms) is lost in the magnitude spectrum (|FFT|) • Humans extract both temporal and spectral information for robust speech recognition
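The loss of time structure inside a frame can be seen in a small experiment. The sketch below is a minimal illustration, assuming an 8 kHz sample rate (so a 20 ms frame holds 160 samples) and using a naive DFT instead of an optimized FFT. It places a single click early or late in the frame and compares the magnitude spectra.

```python
import cmath

SAMPLE_RATE_HZ = 8000   # assumed sample rate, not stated on the slide
FRAME_LEN = 160         # 20 ms at 8 kHz

def dft_magnitudes(frame):
    """Magnitude spectrum |X[k]| of one frame via a direct O(n^2) DFT."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(frame)))
        for k in range(n)
    ]

def click_frame(position):
    """A silent frame with a single unit click at the given sample index."""
    frame = [0.0] * FRAME_LEN
    frame[position] = 1.0
    return frame

early = dft_magnitudes(click_frame(16))    # click 2 ms into the frame
late = dft_magnitudes(click_frame(144))    # click 18 ms into the frame

bin_spacing_hz = SAMPLE_RATE_HZ / FRAME_LEN
max_diff = max(abs(a - b) for a, b in zip(early, late))
print(f"bin spacing: {bin_spacing_hz} Hz, largest magnitude difference: {max_diff:.2e}")
```

The two magnitude spectra agree to within rounding error even though the clicks are 16 ms apart. The timing information survives only in the phase, which the magnitude spectrum discards.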
Auditory Sound Processing: sound signal → ear canal → middle ear
Auditory Sound Processing: sound signal → ear canal → middle ear → inner ear hydrodynamics [scale bar: 100 µm]
Dynamic Compression in the Inner Ear: inner ear model responses to 1 kHz tones [plot labels: BW, speech range, rate threshold, apical, basal]
Auditory Sound Processing: sound signal → ear canal → middle ear → inner ear hydrodynamics → sensory cell → synaptic mechanisms
Coding of Sound into Action Potentials • Regular firing pattern (Δt = 10 ms, f0 = 100 Hz) [figure labels: frequency axis from low to high, F0]
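The relation between the firing interval and the fundamental frequency on this slide is simply f0 = 1/Δt. As a minimal sketch, the hypothetical helper `estimate_f0_hz` below recovers f0 from a list of spike times by averaging the inter-spike intervals. The spike train is synthetic and perfectly regular, which real auditory nerve responses are not.

```python
def estimate_f0_hz(spike_times_s):
    """Estimate the fundamental frequency from the mean inter-spike interval."""
    intervals = [later - earlier for earlier, later in zip(spike_times_s, spike_times_s[1:])]
    if not intervals:
        raise ValueError("need at least two spikes")
    mean_interval_s = sum(intervals) / len(intervals)
    return 1.0 / mean_interval_s

# Synthetic spike train: one action potential every 10 ms.
spike_times_s = [i * 0.010 for i in range(11)]
f0_hz = estimate_f0_hz(spike_times_s)
print(f"estimated f0: {f0_hz:.1f} Hz")
```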
Spectral and Temporal Sound Processing in the Auditory Pathway
Neuro-IT Roadmap: Successful in the Physical World • Robust perception • Image processing • Speech recognition • Multimodal human machine interaction • System integration • Scene analysis and representation
Audio-Visual Speech Recognition Tracking of lip motion with sub-pixel precision
Audio-Visual Speech Recognition Tracking of lip motion with sub-pixel precision “two - one - seven - three - five - nine - eight - zero - four - six” Hidden Markov Speech Recognizer
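The slide does not say how sub-pixel precision is obtained. One common technique, shown here purely as an assumption, is to fit a parabola through the best-matching position and its two neighbours in a matching-score profile and take the vertex as the refined location. The function name `subpixel_peak` and the example scores are hypothetical.

```python
def subpixel_peak(scores):
    """Refine the location of the maximum of a 1D score profile.

    Fits a parabola through the maximum sample and its two neighbours and
    returns the vertex position in (fractional) sample units.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    if best == 0 or best == len(scores) - 1:
        return float(best)  # no neighbour on one side, keep the integer peak
    left, center, right = scores[best - 1], scores[best], scores[best + 1]
    curvature = left - 2.0 * center + right
    if curvature == 0.0:
        return float(best)  # flat profile, nothing to refine
    return best + 0.5 * (left - right) / curvature

# Hypothetical matching scores sampled from a parabola peaking at x = 3.3.
scores = [-(x - 3.3) ** 2 for x in range(7)]
peak = subpixel_peak(scores)
print(f"refined peak position: {peak:.2f} pixels")
```

For a two-dimensional tracker the same refinement is typically applied separately along the horizontal and vertical axes.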
Multi-modal: Pointing, gaze, gestures, facial expressions, … Dr. Axel Steinhage, Infineon Technologies AG
Neuro-IT Roadmap: Successful in the Physical World • Robust perception • Image processing • Speech recognition • Audio-visual speech recognition • Multimodal human machine interaction • System integration • Scene analysis and representation
Man-Machine Interaction based on natural communication channels Dr. Axel Steinhage, Infineon Technologies • Virtual Personal Assistant (VPA) • Items presented by VPA • Cheap sensors (webcam, microphone) • Interactive communication between user and VPA • Natural channels: speech, lip motion, gestures, ...
Man-Machine Interaction based on natural communication channels Dr. Axel Steinhage, Infineon Technologies • Virtual Personal Assistant (VPA) • Items presented by VPA • Human expert via Advanced Videophone (HHI) • Cheap sensors (webcam, microphone) • Interactive communication between user and VPA • Natural channels: speech, lip motion, gestures, ...
What do we gain from Neuro-IT? • Sensitive sensors • Robust perception • Image processing • Speech recognition • Robust processing • “Tools for Neuroscience” • “Successful in the Physical World” • World knowledge • “Constructed brain” • Scene analysis and representation • Intelligent human-machine interaction • Natural feedback • Intelligent virtual person • “Conscious Machines” • Self-learning software • “Factor 10” • Digital and/or analog neuronal networks • Massively parallel processing hardware
Neuro-IT Roadmap: Successful in the Physical World Werner Hemmert, Infineon Technologies AG, CPR-ST • Prof. Dr. Dr. h.c. H.-P. Zenner • Prof. Dr. A.W. Gummer • Prof. Dr. D.M. Freeman • Dr. M. Mermelstein, B. Tsai • U. Dürig, M. Despont, G. Genolet, U. Drechsler, P. Vettiger, G. Binnig • Prof. Dr. U. Ramacher • J.-P. de la Cruz-Guiterrez, M. Holmberg • Dr. A. Steinhage, Dr. A. Techmer