A COMPARISON OF COMMERCIAL SPEECH RECOGNITION COMPONENTS FOR USE IN POLICE CRUISERS • 3rd Annual Intelligent Vehicle Systems Symposium • Andrew L. Kun • Brett Vinciguerra • June 11, 2003
Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion
Project54 Overview • UNH / NHSP / DOJ • Integrates • Controls • Standard Interface
Introduction • What was the goal of this research? • Compare SR engine and microphone combinations • Accuracy and efficiency • Quantitatively
Introduction • Why was this research important? • Limit distraction • Limit frustration • Standard Process
Introduction • How was this goal accomplished? • 16 combinations (4 engines x 4 mics) evaluated • Speech Recognition Evaluation Program (SREP) • Simulates • Classifies • Calculates
Introduction • Accuracy • Number of correct commands versus total commands • Efficiency • False recognitions • Weighted
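The accuracy measure above can be sketched in a few lines. This is an illustration only, not the SREP source: the `UNRECOGNIZED` marker, the function names, and the sample trials are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical marker for "the engine returned no result".
UNRECOGNIZED = "UNRECOGNIZED"


def classify(said: str, heard: str) -> str:
    """Label one trial as 'correct', 'unrecognized', or 'false'."""
    if heard == UNRECOGNIZED:
        return "unrecognized"
    return "correct" if heard == said else "false"


def accuracy(trials: list[tuple[str, str]]) -> float:
    """Percentage of (said, heard) trials that were recognized correctly."""
    counts = Counter(classify(said, heard) for said, heard in trials)
    return 100.0 * counts["correct"] / len(trials)


# Made-up sample: two correct, one unrecognized, one false recognition.
trials = [
    ("LIGHTS", "LIGHTS"),
    ("LIGHTS", UNRECOGNIZED),
    ("LIGHTS", "SIREN ON"),
    ("LIGHTS", "LIGHTS"),
]
print(accuracy(trials))  # 50.0
```

Accuracy treats an unrecognized command and a false recognition the same way; the efficiency score is what separates them.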
SR ENGINE OPTIONS • Speed of Speech • Discrete • Continuous • Type of Application • Command-and-control • Dictation • User-Dependency • Speaker dependent • Speaker independent • Field of Application • PC • Telephone • Noise robust • Grammar File
Comparing SR Engines • Field test • Simulated tests • Speaker source • Background noise • Number of speakers
Accuracy Ratings • Not consistent • Different conditions • Hyde’s Law • ‘Because speech recognisers have an accuracy of 98%, tests must be arranged to prove it’
Component Requirements • Speech Recognition Engine • Must be SAPI 4.0 • Microphone • Must be far-field • Mountable on dashboard • Cancel noise • Array • Directional
[Diagram labels: LOOP ENGINES • LOOP BACKGROUND • LOOP COMMANDS]
Obtaining Sound Files • Laptop w/ SoundBlaster • Earthworks M30BX • Background recorded on patrol • Speech commands in lab • Microsoft Audio Collection Tool • 5 Speakers (4 male, 1 female) • 40 phrases
Processing Sound Files • Matlab script • Signal strength = variance(signal) + mean(signal)^2 • Set volume and signal-to-noise ratio
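A minimal sketch of the signal-strength formula and of scaling a noise recording to hit a target SNR follows. It is written in Python rather than Matlab and is not the authors' script. It assumes SNR is expressed in decibels and uses the population variance, so that variance plus squared mean equals the mean of the squared samples. The function names and sample values are hypothetical.

```python
import math
from statistics import fmean, pvariance


def signal_strength(samples):
    """variance(signal) + mean(signal)^2, i.e. the mean of the squared samples."""
    return pvariance(samples) + fmean(samples) ** 2


def noise_gain(speech, noise, snr_db):
    """Amplitude gain to apply to the noise so speech/noise power matches snr_db."""
    target_noise_power = signal_strength(speech) / (10 ** (snr_db / 10))
    return math.sqrt(target_noise_power / signal_strength(noise))


# Toy signals standing in for WAV sample data.
speech = [0.5, -0.5, 0.5, -0.5]
noise = [0.1, -0.1, 0.1, -0.1]

gain = noise_gain(speech, noise, snr_db=0.0)
scaled_noise = [gain * x for x in noise]
print(gain)  # about 5: at 0 dB the scaled noise carries the same power as the speech
```

Because power scales with the square of amplitude, the gain is the square root of the power ratio.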
Control File Structure • Background Noises • WAV filename • Desired SNR • Signal strength • Description of file • Voice Commands • WAV filename • Number of loops • Signal strength • Phrase
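One way to picture the control file entries is as two record types that drive the test loops. The sketch below is an assumption-laden illustration: the field names are paraphrased from the slide, the SNR unit is assumed to be dB, "number of loops" is read as the number of times a command is replayed, and the file names and values are invented. The loop order follows the LOOP ENGINES / LOOP BACKGROUND / LOOP COMMANDS labels.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class BackgroundNoise:
    wav_filename: str
    desired_snr: float       # unit assumed to be dB
    signal_strength: float
    description: str


@dataclass
class VoiceCommand:
    wav_filename: str
    num_loops: int           # assumed: how many times the command is replayed
    signal_strength: float
    phrase: str


def trials(engines, backgrounds, commands):
    """Yield one (engine, background, command) tuple per playback."""
    for engine, background, command in product(engines, backgrounds, commands):
        for _ in range(command.num_loops):
            yield engine, background, command


# Invented entries for illustration only.
backgrounds = [BackgroundNoise("highway.wav", 10.0, 0.02, "highway driving")]
commands = [VoiceCommand("lights.wav", 2, 0.25, "LIGHTS")]
schedule = list(trials(["engine-1", "engine-2"], backgrounds, commands))
print(len(schedule))  # 4
```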
PRODUCTS TESTED • Four microphones • A, B, C and D. • Four SR engines • 1, 2, 3, and 4. • 16 unique combinations • A1 through D4
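The configuration labels are simply every microphone letter paired with every engine number. A short sketch (variable names are illustrative):

```python
from itertools import product

microphones = ["A", "B", "C", "D"]
engines = ["1", "2", "3", "4"]

# Every microphone paired with every engine: A1, A2, ..., D4.
configurations = [mic + engine for mic, engine in product(microphones, engines)]
print(configurations[0], configurations[-1], len(configurations))  # A1 D4 16
```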
SR ENGINES • SR Engine 1 • Microsoft SR Engine 4.0 • SR Engine 2 • Microsoft SR Engine 4.0 (Telephone mode) • SR Engine 3 • Dragon NaturallySpeaking 4.0 • SR Engine 4 • IBM ViaVoice 8.01
PREPARATION • Freshly installed engines • Minimum training • Default settings • Microphone Set-up Wizard
TEST SCENARIO • Identical conditions • 42 phrase grammar • 10 speech commands • 5 speakers • 6 background noises • 3 SNR levels
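For a sense of scale, the factors listed above can be multiplied out. The slides do not say that every command was played for every speaker, background noise, and SNR level, so the fully crossed design assumed here is a guess, and the totals are arithmetic under that assumption rather than reported figures.

```python
from math import prod

# Factor sizes taken from the test scenario slide.
factors = {
    "speech commands": 10,
    "speakers": 5,
    "background noises": 6,
    "SNR levels": 3,
}

# Assumption: the design is fully crossed.
trials_per_configuration = prod(factors.values())
print(trials_per_configuration)       # 900
print(trials_per_configuration * 16)  # 14400 across the 16 configurations
```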
Efficiency Score • Specific to Project54 • False recognitions
Efficiency Score • SAID / HEARD • LIGHTS / LIGHTS (all 10 rows) • LOSS = 0
Efficiency Score • SAID / HEARD • LIGHTS / LIGHTS (3 rows) • LIGHTS / UNRECOGNIZED (1 row) • LIGHTS / LIGHTS (6 rows) • LOSS = 1
Efficiency Score • SAID / HEARD • LIGHTS / LIGHTS (3 rows) • LIGHTS / SIREN ON (1 row) • SIREN OFF / SIREN OFF (1 row) • LIGHTS / LIGHTS (5 rows) • LOSS = 1.5
Efficiency Score • Scoring system • Correctly recognized = 1.5 • Unrecognized = 0.5 • Falsely recognized = 0 • Eff. = ((#correct * 1.5) + (#unrec. * 0.5)) / 13.5 • Extreme scores • All correct => Eff. = 100 • All unrecognized => Eff. = 33 • All falsely recognized => Eff. = 0
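The scoring system can be sketched as a small function. Two points are assumptions rather than statements from the slides: the result is scaled to a 0 to 100 range, and the divisor is generalized to 1.5 times the number of commands. The slide's constant 13.5 equals 1.5 × 9, so a run of nine commands reproduces the formula and its three extreme scores.

```python
# Weights from the scoring slide.
WEIGHTS = {"correct": 1.5, "unrecognized": 0.5, "false": 0.0}


def efficiency(n_correct: int, n_unrecognized: int, n_false: int) -> float:
    """Weighted score as a percentage of the best possible score."""
    total = n_correct + n_unrecognized + n_false
    score = (n_correct * WEIGHTS["correct"]
             + n_unrecognized * WEIGHTS["unrecognized"])
    # Assumption: normalize by the all-correct score (13.5 when total == 9).
    return 100.0 * score / (WEIGHTS["correct"] * total)


print(round(efficiency(9, 0, 0)))  # 100
print(round(efficiency(0, 9, 0)))  # 33
print(round(efficiency(0, 0, 9)))  # 0
```

A false recognition scores lower than an unrecognized command, which is how the weighting penalizes the engine for acting on the wrong command.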
WINNER • Accuracy • Configuration C2 accuracy = 70.3% • Efficiency • Configuration C2 efficiency = 72.4 • Logical choices • Microphone C • SR Engine 2
WHY LOW ACCURACIES? • Speakers' SR experience • Limited training • Training environment • Default settings • Microphone and speaker placement • SNR • Absolute scores not important
CONCLUSION • The main goal of this research was to compare • SR engine and microphone combinations • Accuracy and efficiency • Quantitatively
CONCLUSION • This research was important in order to • Limit distraction • Limit frustration
CONCLUSION • The goal was reached by • Evaluating 16 combinations (4 engines x 4 mics) • Speech Recognition Evaluation Program (SREP) • Simulated • Classified • Calculated
CONCLUSION • Configuration C2 • Most accurate • Most efficient • SR Engine 2 • Microsoft SR Engine 4.0 • Telephone mode
CURRENT STATUS • 9 vehicles on road • 300 in production • Now supports non-SAPI 4.0 engines • Evaluating new engines
MORE INFORMATION • www.project54.unh.edu • andrew.kun@unh.edu • brettv@unh.edu