Speech Interface to Virtual Reality Applications. Authors: Wauchope, K., S. Everett, D. Tate, T. Maney; M. Cernak, A. Sannier. Reporter: Chun-Feng Liao.
References • M. Cernak, A. Sannier, "Command Speech Interface to Virtual Reality Applications," Technical Report, Virtual Reality Applications Center, Iowa State University of Science and Technology, June 2002. • Wauchope, K., S. Everett, D. Tate, T. Maney, "Speech-Interactive Virtual Environments for Ship Familiarization," 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83. This report discusses two implementations of a speech interface to virtual reality applications.
Agenda • Introduction • Paper I • Paper II • Conclusion • System design Discussion
Introduction • Both papers are recent (2002, 2003). • Both papers address technical details of speech-VR integration. • The second paper takes a more modern approach. • Both use a similar architecture (which is also similar to ours!). Ex: choosing a VRML + Java Speech API platform, encountering several difficult problems such as Java security constraints, and being forced to use the "browser as an application" instead of the "browser as an applet".
Paper I • M. Cernak, A. Sannier, "Command Speech Interface to Virtual Reality Applications," Technical Report, Virtual Reality Applications Center, Iowa State University of Science and Technology, June 2002.
Purposes of this paper • Describe an approach to controlling VR applications with a multimodal command speech interface (CSI) based on dialog modeling. • Used to improve the usability of VRAC's C6. VRAC: Virtual Reality Applications Center. C6 is a virtual reality system developed by VRAC.
Multimodal Interaction • Command addressing is used to trigger the system to start recording the user's voice for recognition. • U: MoleBio • S: Yes • U: (targets atom 512 with the mouse) • U: Go there! • S: OK (goes to atom number 512). U: User, S: System
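The exchange above can be sketched as a small dialog manager. This is only an illustration, not the authors' CSLU dialog model: the class name `CommandDialog`, its methods, the reply strings other than "Yes" and "OK", and the choice to require addressing before every command are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CommandDialog:
    """Toy dialog manager that listens only after being addressed by name."""

    app_name: str = "molebio"             # addressing keyword from the slide
    listening: bool = False               # True once the user addressed the system
    pointed_target: Optional[str] = None  # last object selected with the mouse

    def on_pointer(self, target: str) -> None:
        # Remember what the user points at; it resolves "there" later.
        self.pointed_target = target

    def on_speech(self, utterance: str) -> str:
        text = utterance.strip().lower().rstrip("!. ")
        if not self.listening:
            if text == self.app_name:
                self.listening = True
                return "Yes"
            return ""  # speech not addressed to the system is ignored
        self.listening = False  # assumption: one command per addressing
        if text == "go there":
            if self.pointed_target is None:
                return "Where?"
            return f"OK (goto {self.pointed_target})"
        return "Sorry?"
```

Usage follows the slide: `on_speech("MoleBio")` answers "Yes", `on_pointer("atom 512")` records the mouse target, and `on_speech("Go There !")` then resolves "there" to that target.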
System Architecture (diagram labels: Dialog Management and Speech Facilities; VR System)
System Architecture • VR : VRAC’s C6 • TTS : Festival • SR : CSLU Toolkit • Platform : Windows OS on PII 400
Three Main Components(1) • Speech Synthesis (TTS) : Festival .
Three Main Components(2) • CSLU Toolkit: dialog modeling, speech recognition, and natural language processing. • The CSLU Toolkit is implemented in C and Tcl/Tk and was developed by the Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute (OGI).
Three Main Components(3) • Communication Bridge to VR application. • To Integrate CSLU(Speech) and C6(VR).
How to Integrate CSLU and C6 • Initial attempt: CORBA. • C6 supports CORBA. • Tried to use "Combat", a Tcl extension, as the CORBA client, but failed. • Tried to use "Tcl Blend": Tcl -> Java -> CORBA -> C6 (efficiency problems). • Result: use a TCP socket.
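A minimal sketch of a TCP bridge of this kind, with the speech side as client and the VR side as server. The slides do not describe the wire protocol, so the newline-terminated text commands, the `GOTO` verb, the reply strings, and all function names here are assumptions made for illustration.

```python
import socket
import threading


def handle_vr_command(command: str) -> str:
    """Stand-in for the VR side; a real system would move the viewpoint here."""
    verb, _, arg = command.partition(" ")
    if verb == "GOTO" and arg:
        return f"OK {arg}"
    return "ERROR unknown command"


def serve_once(server: socket.socket) -> None:
    """Accept one connection and answer newline-terminated commands."""
    conn, _ = server.accept()
    with conn, conn.makefile("r", encoding="utf-8") as reader:
        for line in reader:
            reply = handle_vr_command(line.strip())
            conn.sendall((reply + "\n").encode("utf-8"))


def send_command(host: str, port: int, command: str) -> str:
    """Speech side: send one command and wait for the one-line reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall((command + "\n").encode("utf-8"))
        with sock.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()


def demo() -> str:
    """Run both ends on the loopback interface and return the reply."""
    server = socket.create_server(("127.0.0.1", 0))  # port 0: OS picks a port
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,), daemon=True)
    worker.start()
    reply = send_command("127.0.0.1", port, "GOTO atom_512")
    worker.join(timeout=5)
    server.close()
    return reply
```

A plain text protocol like this keeps the two sides language-independent, which fits a setup where the speech side is Tcl and the VR side is not.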
Natural Language Processing • Instead of using standard JSGF, the authors defined a custom grammar and wrote a dedicated parser to evaluate it. • The custom grammar is very similar to JSGF. • We will not discuss the custom grammar in detail here.
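To give a feel for what a JSGF-like command grammar does, here is a small sketch. It is not the authors' grammar or parser: the rule names, the rule texts, and the approach of compiling rules to regular expressions are assumptions, and only two JSGF-style constructs are covered, `( a | b )` for alternatives and `[ x ]` for optional parts.

```python
import re
from typing import Dict, Optional

# Hypothetical rules in a JSGF-like notation, tokens separated by spaces.
RULES: Dict[str, str] = {
    "goto": "( go | move ) [ to ] there",
    "stop": "stop [ moving ]",
}

# Grammar symbols and the regular-expression fragments they map to.
_SYMBOLS = {"(": "(?:", ")": ")", "[": "(?:", "]": ")?", "|": "|"}


def compile_rule(pattern: str) -> "re.Pattern[str]":
    """Translate a whitespace-separated rule into a regular expression."""
    parts = []
    for token in pattern.split():
        # Every word is matched with one leading space, so skipping an
        # optional word never leaves a stray space behind.
        parts.append(_SYMBOLS.get(token, " " + re.escape(token)))
    return re.compile("".join(parts))


COMPILED = {name: compile_rule(rule) for name, rule in RULES.items()}


def parse(utterance: str) -> Optional[str]:
    """Return the name of the first rule matching the utterance, if any."""
    words = re.findall(r"[a-z0-9']+", utterance.lower())
    normalized = "".join(" " + word for word in words)
    for name, regex in COMPILED.items():
        if regex.fullmatch(normalized):
            return name
    return None
```

With these rules, "Go there!" and "move to there" both map to `goto`, while an incomplete utterance such as "go" matches nothing.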
CSI Test Environment • A RAD (GUI) tool that helps developers quickly build the dialog flow.
Paper I Conclusion • The major advantage of this system is quick deployment. • The problem area is speech recognition accuracy (provided by CSLU), which was poor. • The US Navy also developed a speech interface to a VR system; the authors plan to improve the interaction with VR along the lines of that method.
Future Work • Change TTS and SR to IBM ViaVoice. • Support JSAPI (Java Speech API). • Communicating with C6 via CORBA is easier from Java.
Paper II • Wauchope, K., S. Everett, D. Tate, T. Maney, "Speech-Interactive Virtual Environments for Ship Familiarization," 2nd International EuroConference on Computer and IT Applications in the Maritime Industries (COMPIT '03), Hamburg, Germany, May 14-17, 2003, pp. 70-83.
Introduction • This paper introduces two systems that help crew members newly aboard US Navy ships become familiar with their environment quickly. User: Tell me where Room 101 is!
Motivation • Architects of US Navy ships make heavy use of CAD tools to design ship models. • CAD files can be converted to a 3D model format with little effort. • According to the authors' previous research, this virtual environment did shorten crews' learning time.
Systems introduced • 2 systems • MSFT (Multimodal Ship Familiarization Tool) • ISFS (Interactive Ship Familiarization System) • ISFS is a recent transition of MSFT.
System Architecture: MSFT (diagram note: components run as different processes)
MSFT • The VE viewer component and the speech interface run as two separate processes. • Speech interface: an all-IBM solution: • ViaVoice • IBM's SMAPI • IBM's SRCL grammar. Platform: PIII 500MHz
ISFS • A recent transition of MSFT. • Uses VRML as the 3D modeling language. • Uses JSAPI as the interface to the speech engine. • ViaVoice fully supports JSAPI. • VRML supports Java as a scripting language. • The rest of the structure is identical to the MSFT system. Platform: Xeon 2.0GHz -> Needs more computing power!
Why Choose a Standalone VRML Browser? • Security limitations (details discussed later). • VM limitations (details discussed later). • Provides opportunities to customize the interface to the VRML browser. In my personal experience, the system usually becomes unstable when the speech engine works with a VRML plug-in via EAI's Java interface.
Security Limitations • The JRE imposes security limitations on Java applets. • JSAPI was unable to establish a connection with the speech engine unless we explicitly reconfigured the security settings.
Limited VM • Most VRML browsers' EAI was implemented using ActiveX and thus supports only Microsoft's old VM, which doesn't support most modern Java features. • Ex: This may force us to use Java AWT instead of Swing, which provides a better GUI.
Providing GUI as VUI Fallback • The GUI provides a fallback in case the speech recognizer has trouble accurately transcribing the user's voice. • The GUI is adjusted dynamically to maintain a one-to-one correspondence with the VUI.
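One way to keep the GUI and VUI in one-to-one correspondence is to derive both from a single command table, as in the sketch below. The paper's actual mechanism is not described in these slides; the dialog states, the command phrases, and the `CommandSurface` class are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical dialog states and the commands valid in each of them.
COMMANDS: Dict[str, List[str]] = {
    "idle": ["where is room 101", "show deck plan"],
    "touring": ["stop", "go back"],
}


class CommandSurface:
    """Derives the voice grammar and the button list from one command table."""

    def __init__(self, on_command: Callable[[str], None]) -> None:
        self.on_command = on_command
        self.state = "idle"

    def active_phrases(self) -> List[str]:
        # Phrases the recognizer should accept in the current state.
        return list(COMMANDS[self.state])

    def button_labels(self) -> List[str]:
        # One button per phrase, so the GUI mirrors the VUI exactly.
        return [phrase.capitalize() for phrase in self.active_phrases()]

    def set_state(self, state: str) -> None:
        # Changing state changes the phrases and the buttons together.
        self.state = state

    def heard(self, phrase: str) -> bool:
        return self._dispatch(phrase.lower())

    def clicked(self, label: str) -> bool:
        return self._dispatch(label.lower())

    def _dispatch(self, phrase: str) -> bool:
        # Voice and mouse input end up in the same handler.
        if phrase in COMMANDS[self.state]:
            self.on_command(phrase)
            return True
        return False
```

Because both input paths go through `_dispatch`, a command that the recognizer mishears can always be issued by clicking the matching button.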
Paper II Conclusion • The speech interface is needed because the GUI and the VE viewer both rely on direct manipulation and keep the user's hands too busy. • As HCI becomes increasingly multimodal, care must be taken to integrate the modalities in a natural manner.
Future Work • VRML models are closer to an object-oriented, tree-structured representation. • It is hard to represent them in an RDBMS. • We must find some way to store model data easily and efficiently. Personal thought: use an XML database.
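As a rough illustration of why XML suits tree-structured model data, the sketch below copies a tiny scene tree into XML elements and looks a node up by name. The scene content, the tuple layout of `Node`, and the function names are assumptions, and an in-memory XML tree stands in for an actual XML database.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# A node is (name, attributes, children); names loosely follow VRML node types.
Node = Tuple[str, Dict[str, str], List["Node"]]

# Hypothetical fragment of a ship model, for illustration only.
SCENE: Node = (
    "Group", {"DEF": "Deck1"}, [
        ("Transform", {"DEF": "Room101", "translation": "4 0 2"}, [
            ("Shape", {"DEF": "Door"}, []),
        ]),
    ],
)


def to_xml(node: Node) -> ET.Element:
    """Recursively copy a scene-graph node into an XML element."""
    name, attributes, children = node
    element = ET.Element(name, attributes)
    for child in children:
        element.append(to_xml(child))
    return element


def find_by_def(root: ET.Element, def_name: str) -> List[ET.Element]:
    """Walk the whole tree (root included) and collect nodes with this DEF name."""
    return [element for element in root.iter() if element.get("DEF") == def_name]
```

The nesting of the model maps directly onto nested elements, so no join tables or parent-id columns are needed to recover the hierarchy.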
Switchable! Discussions