
A COMPARISON OF COMMERCIAL SPEECH RECOGNITION COMPONENTS FOR USE IN POLICE CRUISERS


Presentation Transcript


  1. A COMPARISON OF COMMERCIAL SPEECH RECOGNITION COMPONENTS FOR USE IN POLICE CRUISERS 3rd Annual Intelligent Vehicle Systems Symposium Andrew L. Kun Brett Vinciguerra June 11, 2003

  2. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  3. Project54 Overview • UNH / NHSP / DOJ • Integrates • Controls • Standard Interface

  4. Introduction • What was the goal of this research? • Compare SR engine and microphone combinations • Accuracy and efficiency • Quantitatively

  5. Introduction • Why was this research important? • Limit distraction • Limit frustration • Standard Process

  6. Introduction • How was this goal accomplished? • 16 combinations (4 engines x 4 mics) evaluated • Speech Recognition Evaluation Program (SREP) • Simulates • Classifies • Calculates

  7. Introduction • Accuracy • # of correct commands versus total commands • Efficiency • false recognitions • weighted
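
As a concrete illustration of the accuracy metric described on this slide (not the SREP code itself), a minimal Python sketch:

```python
def accuracy(results):
    """Accuracy: number of correctly recognized commands versus total commands.

    results is a list of (said, heard) pairs; heard is None when the engine
    returned nothing.
    """
    correct = sum(1 for said, heard in results if heard == said)
    return 100.0 * correct / len(results)

# Example: 7 of 10 commands recognized correctly -> 70.0 % accuracy
print(accuracy([("LIGHTS", "LIGHTS")] * 7 +
               [("LIGHTS", None)] * 2 +
               [("LIGHTS", "SIREN ON")]))
```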

  8. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  9. SR ENGINE OPTIONS • Speed of Speech • Discrete • Continuous • Type of Application • Command-and-control • Dictation • User-Dependency • Speaker dependent • Speaker independent • Field of Application • PC • Telephone • Noise robust • Grammar File
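
For readers unfamiliar with command-and-control recognition, a toy sketch of what a grammar file provides, namely a closed set of allowed phrases. The phrases below are hypothetical; the real engines load SAPI grammar files whose format is not shown in this talk.

```python
# Hypothetical command phrases; the real system loads a SAPI 4.0 grammar file
# (format not shown in the talk), so this is only a conceptual stand-in.
GRAMMAR = {"LIGHTS", "SIREN ON", "SIREN OFF", "RADAR", "VIDEO"}

def constrain(hypothesis):
    """Return the hypothesis only if it is a phrase the grammar allows."""
    return hypothesis if hypothesis in GRAMMAR else None

print(constrain("SIREN ON"))      # allowed phrase -> "SIREN ON"
print(constrain("good morning"))  # outside the grammar -> None
```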

  10. Comparing SR Engines • Field test • Simulated tests • Speaker source • Background noise • Number of speakers

  11. Accuracy Ratings • Not consistent • Different conditions • Hyde’s Law • ‘Because speech recognisers have an accuracy of 98%, tests must be arranged to prove it’

  12. Component Requirements • Speech Recognition Engine • Must be SAPI 4.0 • Microphone • Must be far-field • Mountable on dashboard • Cancel noise • Array • Directional

  13. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  14. LOOP ENGINES → LOOP BACKGROUND → LOOP COMMANDS (the SREP nested test loop; a sketch follows below)
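
A minimal Python sketch of this nested-loop structure (illustrative only; the actual SREP is a separate program that mixes WAV files and drives SAPI engines, and the helper names below are made up):

```python
from dataclasses import dataclass

@dataclass
class FakeEngine:
    """Stand-in for a SAPI speech recognition engine."""
    name: str
    def recognize(self, audio):
        # Toy behaviour: always "hear" the spoken command correctly.
        return audio["command"]

def mix(command, background):
    """Stand-in for mixing a command WAV with background noise at a set SNR."""
    return {"command": command, "background": background}

def run_evaluation(engines, backgrounds, commands):
    """Nested loops mirroring the SREP flow: engines -> backgrounds -> commands."""
    results = []
    for engine in engines:                         # LOOP ENGINES
        for background in backgrounds:             # LOOP BACKGROUND
            for said in commands:                  # LOOP COMMANDS
                heard = engine.recognize(mix(said, background))
                results.append((engine.name, background, said, heard))
    return results

print(len(run_evaluation([FakeEngine("1"), FakeEngine("2")],
                         ["idle", "siren"], ["LIGHTS", "RADAR"])))  # 8 trials
```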

  15. Obtaining Sound Files • Laptop w/ SoundBlaster • Earthworks M30BX • Background recorded on patrol • Speech commands in lab • Microsoft Audio Collection Tool • 5 Speakers (4 male, 1 female) • 40 phrases

  16. Processing Sound Files • Matlab script • Signal strength = variance(signal) + mean(signal)² • Set volume and signal-to-noise ratio
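
A sketch of that processing step in Python rather than Matlab; the signal-strength formula is taken from the slide, while the SNR-scaling step is an assumed implementation since the talk does not show the script itself:

```python
import numpy as np

def signal_strength(x):
    """Signal strength as defined on the slide: variance(signal) + mean(signal)^2,
    i.e. the mean-square value of the samples."""
    x = np.asarray(x, dtype=float)
    return np.var(x) + np.mean(x) ** 2

def scale_speech_to_snr(speech, noise, target_snr_db):
    """Scale the speech samples so the speech/noise strength ratio hits the target SNR.
    The exact scaling used by the Matlab script is not shown in the talk, so this is
    only an assumed, illustrative implementation."""
    speech = np.asarray(speech, dtype=float)
    current_ratio = signal_strength(speech) / signal_strength(noise)
    target_ratio = 10.0 ** (target_snr_db / 10.0)
    return speech * np.sqrt(target_ratio / current_ratio)

# Example with synthetic data
rng = np.random.default_rng(0)
speech = rng.normal(0.0, 0.2, 16000)
noise = rng.normal(0.0, 0.1, 16000)
scaled = scale_speech_to_snr(speech, noise, target_snr_db=10.0)
print(10 * np.log10(signal_strength(scaled) / signal_strength(noise)))  # ~10 dB
```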

  17. Control File Structure • Background Noises • WAV filename • Desired SNR • Signal strength • Description of file • Voice Commands • WAV filename • Number of loops • Signal strength • Phrase
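
The control-file fields listed above can be pictured as two record types. The sketch below uses hypothetical filenames and values purely for illustration, since the actual control-file syntax is not shown in the talk:

```python
from dataclasses import dataclass

@dataclass
class BackgroundNoise:
    """One background-noise entry from the SREP control file."""
    wav_filename: str       # WAV filename
    desired_snr_db: float   # desired SNR
    signal_strength: float  # pre-computed signal strength
    description: str        # description of file

@dataclass
class VoiceCommand:
    """One voice-command entry from the SREP control file."""
    wav_filename: str       # WAV filename
    loops: int              # number of loops
    signal_strength: float  # pre-computed signal strength
    phrase: str             # phrase the engine should return

# Hypothetical entries (filenames and values are made up for illustration)
noise = BackgroundNoise("patrol_idle.wav", 10.0, 0.032, "cruiser idling at roadside")
command = VoiceCommand("lights_speaker1.wav", 1, 0.047, "LIGHTS")
print(noise, command, sep="\n")
```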

  18. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  19. PRODUCTS TESTED • Four microphones • A, B, C and D. • Four SR engines • 1, 2, 3, and 4. • 16 unique combinations • A1 through D4

  20. SR ENGINES • SR Engine 1 • Microsoft SR Engine 4.0 • SR Engine 2 • Microsoft SR Engine 4.0 (telephone mode, per slide 39) • SR Engine 3 • Dragon NaturallySpeaking 4.0 • SR Engine 4 • IBM ViaVoice 8.01

  21. PREPARATION • Freshly installed engines • Minimum training • Default settings • Microphone Set-up Wizard

  22. TEST SCENARIO • Identical conditions • 42-phrase grammar • 10 speech commands • 5 speakers • 6 background noises • 3 SNR levels
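
Assuming the listed factors are fully crossed (an assumption; the talk does not state the exact trial count), the resulting test-matrix size works out as follows:

```python
# Rough size of the test matrix from this slide, assuming the listed factors
# are fully crossed (an assumption; the talk does not state the trial count).
commands, speakers, backgrounds, snr_levels = 10, 5, 6, 3
configurations = 4 * 4  # 4 SR engines x 4 microphones (A1 through D4)

trials_per_configuration = commands * speakers * backgrounds * snr_levels
print(trials_per_configuration)                   # 900 recognitions per configuration
print(trials_per_configuration * configurations)  # 14400 recognitions overall
```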

  23. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  24. ACCURACY BY ENGINE

  25. ACCURACY BY MIC

  26. RANKED ACCURACY

  27. Efficiency Score • Specific to Project54 • False recognitions

  28. Efficiency Score (LOSS = 0)
      SAID        HEARD
      LIGHTS      LIGHTS        (× 10: every command correctly recognized)

  29. Efficiency Score (LOSS = 1)
      SAID        HEARD
      LIGHTS      LIGHTS        (× 3)
      LIGHTS      UNRECOGNIZED  (one command not recognized)
      LIGHTS      LIGHTS        (× 6)

  30. Efficiency Score (LOSS = 1.5)
      SAID        HEARD
      LIGHTS      LIGHTS        (× 3)
      LIGHTS      SIREN ON      (false recognition)
      SIREN OFF   SIREN OFF     (corrective command to undo the false recognition)
      LIGHTS      LIGHTS        (× 5)

  31. Efficiency Score • Scoring system • Correctly recognized = 1.5 • Unrecognized = 0.5 • Falsely recognized = 0 • Eff. = 100 × ((#correct × 1.5) + (#unrec. × 0.5)) / 13.5, where 13.5 is the maximum attainable score • Extreme scores • All correct => Eff. = 100 • All unrecognized => Eff. ≈ 33 • All falsely recognized => Eff. = 0
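
A minimal sketch of this scoring formula; the weights and the 13.5 normalizer are taken from the slide, and per the extreme-score examples the normalizer corresponds to the maximum attainable score:

```python
def efficiency(n_correct, n_unrecognized, n_false):
    """Project54 efficiency score from the slide: 1.5 points per correct
    recognition, 0.5 per unrecognized command, 0 per false recognition,
    normalized by the maximum attainable 13.5 points (evidently 9 scored
    commands x 1.5) and expressed on a 0-100 scale."""
    return 100.0 * (n_correct * 1.5 + n_unrecognized * 0.5) / 13.5

print(round(efficiency(9, 0, 0)))  # all correct            -> 100
print(round(efficiency(0, 9, 0)))  # all unrecognized       -> 33
print(round(efficiency(0, 0, 9)))  # all falsely recognized -> 0
```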

  32. RANKED EFFICIENCY

  33. WINNER • Accuracy • Configuration C2 accuracy = 70.3 % • Efficiency • Configuration C2 efficiency = 72.4 • Logical choices • Microphone C • SR Engine 2

  34. WHY LOW ACCURACIES? • Speakers' SR experience • Limited training • Training environment • Default settings • Microphone and speaker placement • SNR • Absolute scores not important

  35. Outline of Presentation • Introduction - What, Why and How? • Background • Speech Recognition Evaluation Program Software • Testing • Results and Discussion • Conclusion

  36. CONCLUSION • The main goal of this research was to • Compare SR engine and microphone combinations • For accuracy and efficiency • Quantitatively

  37. CONCLUSION • This research was important in order to • Limit distraction • Limit frustration

  38. CONCLUSION • The goal was reached by • Evaluating 16 combinations (4 engines x 4 mics) • Speech Recognition Evaluation Program (SREP) • Simulated • Classified • Calculated

  39. CONCLUSION • Configuration C2 • Most accurate • Most efficient • SR Engine 2: Microsoft SR Engine 4.0, telephone mode

  40. CURRENT STATUS • 9 vehicles on the road • 300 in production • Now supports non-SAPI 4.0 engines • Evaluating new engines

  41. MORE INFORMATION • www.project54.unh.edu • andrew.kun@unh.edu • brettv@unh.edu
