Voice Recognition for Wheelchair Control Theo Theodoridis, Xin Liu, and Huosheng Hu
Contents
• Introduction
• Speech Recognition Structure
• Microsoft Speech SDK
• Testing
Introduction
• Task: To use voice recognition for controlling a wheelchair
• Purpose: To aid people with limited physical capability
• Software: The Microsoft Speech SDK
• Hardware: The Essex robotic wheelchair
• Experimentation: The Essex robotic arena
Speech Recognition Structure
• Driving components:
  - Start: Capture voice command
  - Sampling: Sample voice signal in real-time
  - Calculate energy: Validate signal’s presence
  - Calculate zero-crossing rate: Validate signal’s changes
  - Calculate entropy: Validate signal’s utterance
  - Speech recognition by parser: Microsoft Speech SDK
  - Driving: Execute the commands (Forw, Back, Left, Right, Stop)
[Figure: Speech Recognition flow chart. Start → Sampling real-time signals → Calculate energy → Calculate zero-crossing rate → Calculate entropy → Speech recognition by parser → Driving]
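The three validation steps above can be sketched for a single frame of samples. This is a minimal illustration, not the authors' implementation: the function names, the histogram-based entropy, the bin count, and all thresholds in `looks_like_speech` are assumptions chosen for the example.

```python
import math

def short_time_energy(frame):
    """Mean squared amplitude: near zero when no signal is present."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def amplitude_entropy(frame, bins=8):
    """Shannon entropy (bits) of a histogram of sample amplitudes.

    The histogram form and the bin count are assumptions of this sketch.
    """
    lo, hi = min(frame), max(frame)
    if hi == lo:
        return 0.0  # a constant frame carries no information
    counts = [0] * bins
    for x in frame:
        idx = min(int((x - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    n = len(frame)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def looks_like_speech(frame, energy_min=0.01, zcr_max=0.5, entropy_min=1.0):
    """Gate a frame before passing it to the recognizer.

    All three thresholds are hypothetical placeholders.
    """
    return (short_time_energy(frame) >= energy_min
            and zero_crossing_rate(frame) <= zcr_max
            and amplitude_entropy(frame) >= entropy_min)

frame = [0.3, 0.6, 0.2, -0.4, -0.7, -0.1, 0.5, 0.8]
print(looks_like_speech(frame), looks_like_speech([0.0] * 8))
```

In a real pipeline the thresholds would be tuned on recordings from the target environment, and only frames that pass the gate would be forwarded to the speech parser.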
Microsoft Speech SDK
• Features:
  ∙ Developed by Microsoft’s Speech Technologies Group
  ∙ Aims to recognize spoken audio and perform text-to-speech synthesis
  ∙ The API can be used from common programming languages, including C++
• FFTW Core:
  ∙ FFTW is a ready-made library for computing the discrete Fourier transform (DFT)
  ∙ Developed at MIT as a C library, callable from C++
  ∙ Can be used to increase running speed
• Recognition Accuracy:
  ∙ Four commands are employed for control
  ∙ Exceptional recognition accuracy
  ∙ Adequate real-time control
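To make the DFT mentioned above concrete, here is a direct evaluation of its definition. This is not FFTW and not the authors' code: it is a naive O(n²) sketch, and the function name `dft` is chosen for the example. FFTW computes the same transform with fast O(n log n) algorithms, which is where the speed gain comes from.

```python
import cmath

def dft(samples):
    """Naive discrete Fourier transform: X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n)."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

# A constant signal has all of its energy in the zero-frequency bin.
spectrum = dft([1.0, 1.0, 1.0, 1.0])
print([round(abs(c), 6) for c in spectrum])
```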
Microsoft Speech SDK
• Recognition Tests:
  ∙ Three noisy background cases applied:
    (a) Silent – no noise (66 dB)
    (b) Music – noisy background (76 dB)
    (c) Live Singing – very noisy background (76 dB)
  ∙ Accuracies achieved:
    (a) Silent – 91% at 66 dB
    (b) Music – 84% at 66-76 dB
    (c) Live Singing – 58% at 66-76 dB
  Overall accuracy: 77.7%
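The slides do not say how the overall figure was derived. Assuming it is the unweighted mean of the three per-condition accuracies, the following check reproduces it:

```python
# Per-condition accuracies (percent) as reported on the slide.
accuracies = {"Silent": 91, "Music": 84, "Live Singing": 58}

# Assumption: the overall figure is a plain, unweighted mean.
overall = sum(accuracies.values()) / len(accuracies)
print(f"Overall accuracy: {overall:.1f}%")
```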
Testing
• Experimental Environment 1
  Environment: A simple corridor with no obstacles
  Task: Reach a destination at the same horizontal coordinate as the origin
• Experimental Environment 2
  Environment: An open area with two obstacles
  Task: Avoid the obstacles in a zigzag fashion and return to the origin