Manual vs. Speech Input - Voice Interface with UAVs Presented to: Georgetown Linguistics 362 December 6, 2006 Dale Schalow AMN Corp Software Consulting U.S. National Library of Medicine (NIH)
Paper Introduction • Official paper “MANUAL VERSUS SPEECH INPUT FOR UNMANNED AERIAL VEHICLE CONTROL STATION OPERATIONS” • Publishers & Authors • U.S. Air Force Research Laboratory (Dayton, OH) • Sytronics, Inc. (Dayton, OH) • Paper Presentation Background • Proceedings of the Human Factors & Ergonomics Society’s 47th Annual Meeting, October 2003 • SpeechTEK “Bridging the Gap” Conference, Jan/Feb 2006, San Francisco Hyatt Regency
Topics to Discuss • Overview • My interest in the paper • Experiment Methods and Results • Summary of Experiment • Bridge to NLP • For More Info…
Overview • The Big Picture: Command Unmanned Aerial Vehicles with a Voice Recognition Interface • Paper attempts to prove that applied Speech Technology is a preferred input mode compared with Manual interfaces in UAV environments • Some of the Parts… NLP, HUD, ASR, UAV, USAF, Speech, Manual, AVO, IFR
Overview • Parts Vocabulary • ASR = Automatic Speech Recognition • UAV = Unmanned Aerial Vehicle (aircraft) • AVO = Air Vehicle Operator (the UAV pilot) • Voice Macro = Spoken Word or Phrased Command • CRT = Cathode-Ray Tube (the "Camera Display" monitor) • HUD = Heads-Up Display • IFR = Instrument Flight Rules pilot rating • Applied NLP Parts = Zipf's Law, Automata, the Viterbi algorithm, and maybe others… • Additional Aviation and lesser NLP terms are used in the paper
My Interest in the Paper • A SpeechTEK talk given by Dr. Sally Ride of UCSD and NASA, noting strong interest among astronauts in using speech technology for space experiments • Emerging commercialization issues (e.g. FAA-supervised "automated flight control", privatized space ventures)
Experiment Methods • Ten male Instrument Flight Rules (IFR)-rated pilots are the users • Speech Input Modality • “Push-to-talk” microphone • Speech Recognition engine (Nuance v8) • Grammar structures used to support greater accuracy of speech requests • Manual Input Modality • Control stick, keyboard, trackball • “Visual Cues” output on a camera display CRT
Experiment Methods • Ground Control Station Simulator
Experiment Methods • The Pilot's Tasks • "Voice Macros" control the aircraft with speech (see the grammar sketch below) • Motivation: 2,000 to 15,000+ grammars in advanced real-time ASR with naturally spoken language (research and available STT solutions) • 160 Phrases used for the experiment • A construed implementation of Zipf's Law (which I'll connect to the Experiment Results) • Data entry tasks included corridor navigation (e.g. banks/turns) and Info Retrieval • Visual cues of Alerts shown for 10 seconds, awaiting a Confirmation Response from the Pilot
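A minimal sketch, assuming a handful of hypothetical command phrases, of what a constrained "voice macro" grammar can look like (Python; the real study used Nuance v8 grammars and a 160-phrase vocabulary that isn't reproduced here):

```python
# Hypothetical constrained "voice macro" grammar (illustration only; the actual
# experimental phrase set is not available here). Restricting recognition to a
# fixed phrase set is what keeps accuracy high and rejects off-grammar speech.
import re
from typing import Optional, Tuple

VOICE_MACROS = {
    r"^set heading (\d{3})$":    lambda h: ("HEADING", int(h)),
    r"^set airspeed (\d{2,3})$": lambda s: ("AIRSPEED", int(s)),
    r"^set altitude (\d{3,5})$": lambda a: ("ALTITUDE", int(a)),
    r"^confirm$":                lambda: ("CONFIRM",),
}

def parse_macro(utterance: str) -> Optional[Tuple]:
    """Map a recognized utterance to a control-station command, or None."""
    text = utterance.strip().lower()
    for pattern, action in VOICE_MACROS.items():
        match = re.match(pattern, text)
        if match:
            return action(*match.groups())
    return None  # out-of-grammar speech is rejected rather than guessed at

print(parse_macro("Set heading 270"))         # ('HEADING', 270)
print(parse_macro("Confirm"))                 # ('CONFIRM',)
print(parse_macro("open the pod bay doors"))  # None
```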
Experiment Results • Premise • Performance across all measures was worse in High Difficulty missions than in Low Difficulty missions • No further details provided, aside from noting no significant interactions between Mission Difficulty (low/high) and Input Modes (manual/speech) • Overall Task Completion Time • Tasks were completed faster when pilots used Speech Input compared to Manual Input • Measured across Normal Ops Tasks, Warnings, and Information Queries
Experiment Results • No. of tasks completed incorrectly with speech was < 1/3 of the number associated with manual input • "Voice Macros" - applied Zipf's Law? (sketched below) • Voice macros involved fewer steps than manual input (Task Frequency) • Operators performed longer maneuvers more favorably with their voice (Voice Rank)
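A minimal illustration of the Zipf-style rank/frequency relationship the slides allude to; the macro names and usage counts below are invented, not data from the study:

```python
# Invented usage counts for a few hypothetical voice macros (NOT study data),
# just to show the Zipf-style pattern: frequency falls off roughly as f(1)/rank.
macro_counts = {
    "confirm": 400,
    "set heading": 210,
    "set airspeed": 130,
    "set altitude": 95,
    "query fuel": 80,
}

ranked = sorted(macro_counts.items(), key=lambda kv: kv[1], reverse=True)
top_frequency = ranked[0][1]

for rank, (macro, freq) in enumerate(ranked, start=1):
    zipf_prediction = top_frequency / rank  # ideal Zipf curve: f(r) = f(1) / r
    print(f"rank {rank}: {macro:<13} observed={freq:4d}  zipf~{zipf_prediction:6.1f}")
```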
Experiment Results • Performance of speech recognition was excellent: average correct recognition across users was 95.054% (range: 86.93% to 98.29%)
Experiment Results • Time was measured between alert onset (a CRT "Visual Cue") and the user's confirmation response, the voice command "Confirm" • Results did show response time was significantly longer with Speech Input than with Manual Input, for these confirmations and for Information Queries • Although statistically significant, the average difference was very short, < 1 second
Experiment Results • Over ALL Flight/Navigation Tasks, airspeed, path, and altitude errors tended to be smaller with Speech Input than with Manual Input • Pilots (AVOs) personally favored Speech Input over Manual Input in the final debriefing questionnaire
Experiment Summary • Definitively stated that Speech Input was superior to Manual Input for operators in this environment • "Processing time" negligible for voice response compared with the pros of Speech Input (i.e. a heads-up, hands-free environment)
Experiment Summary • The only notable difference when using speech was saying the word 'Confirm' • Some users said it like "Cfirm" or "Firm" • An automaton illustrates these 3 paths (sketched below) • Word: Confirm • Phonemes: k `Ah n f `OR: m • Based on the IPA American English alphabet
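A minimal phone-level sketch of a three-path acceptor like the one described above, covering "Confirm", "Cfirm", and "Firm"; the phone labels are approximate and the state layout is my own illustration, not the slide's figure:

```python
# Hypothetical finite-state acceptor for the three pronunciations heard in the
# experiment: "confirm" (k ah n f er m), "cfirm" (k f er m), and "firm" (f er m).
TRANSITIONS = {
    ("start", "k"):    "after_k",
    ("after_k", "ah"): "after_ah",  # full form: k ah n f er m
    ("after_ah", "n"): "after_n",
    ("after_n", "f"):  "tail_f",
    ("after_k", "f"):  "tail_f",    # clipped vowel: "Cfirm" = k f er m
    ("start", "f"):    "tail_f",    # clipped onset: "Firm"  = f er m
    ("tail_f", "er"):  "tail_er",
    ("tail_er", "m"):  "accept",
}

def accepts(phones):
    """Walk the transition table; True only if the input ends in the accepting state."""
    state = "start"
    for phone in phones:
        state = TRANSITIONS.get((state, phone))
        if state is None:
            return False
    return state == "accept"

print(accepts(["k", "ah", "n", "f", "er", "m"]))  # "Confirm" -> True
print(accepts(["k", "f", "er", "m"]))             # "Cfirm"   -> True
print(accepts(["f", "er", "m"]))                  # "Firm"    -> True
print(accepts(["ah", "f", "er", "m"]))            # rejected  -> False
```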
Bridge to NLP • The main point, from my view, is not just that speech works 'better' than manual interfaces. It shows that unifying Natural Language and Speech can be applied successfully to subject-matter rules - Pilot-rated IFR Vocabularies in this case • The final result of the implemented methods is tightly controlled output via a no-alternatives approach: the recognizer commits to a single best interpretation. This may be quite similar to a Best Path model such as the Viterbi algorithm (sketched below)
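A minimal Viterbi sketch of that "best path" idea, using two toy states and invented probabilities; it is meant only to show the mechanism of keeping a single highest-scoring path, not the paper's recognizer:

```python
# Toy Viterbi decoder: hypothetical states and made-up probabilities. At each
# step we keep the single best-scoring path into each state, then trace back
# one overall best path; no alternative outputs survive.
import math

STATES = ["confirm", "other"]
START_P = {"confirm": 0.5, "other": 0.5}
TRANS_P = {
    "confirm": {"confirm": 0.8, "other": 0.2},
    "other":   {"confirm": 0.3, "other": 0.7},
}
# per-frame emission scores for three observed acoustic frames (invented)
EMIT_P = [
    {"confirm": 0.7, "other": 0.3},
    {"confirm": 0.6, "other": 0.4},
    {"confirm": 0.8, "other": 0.2},
]

def viterbi():
    # work in log space so longer observation sequences don't underflow
    best = {s: math.log(START_P[s]) + math.log(EMIT_P[0][s]) for s in STATES}
    backpointers = []
    for frame in EMIT_P[1:]:
        prev, best, pointers = best, {}, {}
        for s in STATES:
            candidates = {p: prev[p] + math.log(TRANS_P[p][s]) for p in STATES}
            winner = max(candidates, key=candidates.get)
            best[s] = candidates[winner] + math.log(frame[s])
            pointers[s] = winner
        backpointers.append(pointers)
    # trace the single best path backwards from the best final state
    last = max(best, key=best.get)
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return list(reversed(path))

print(viterbi())  # e.g. ['confirm', 'confirm', 'confirm']
```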
For More Information • Paper link • http://www.hec.afrl.af.mil/Publications/HFES03VoiceFinal%20version2.pdf • USAF "Global Hawk" in the news • CNN Video (Nov 2006) • "Doonesbury", The Washington Post (12/03/06) • Northrop Grumman and DoD Web sites
For More Information • SpeechTEK (East/West) conference • Next one in February (Hilton San Francisco)
Manual vs. Speech Input - Voice Interface with UAVs Thank you!