Component Description: Multimodal Interface
Carnegie Mellon University
Prepared by: Michael Bett (mbett@cs.cmu.edu)
3/26/99
1 - Overview
Description of the Multimodal Toolkit (MMI). What MMI is:
• Integrated Speech, Handwriting, and Gesture Recognizers
• Java-Based API
• Integrated Recording Feature
• Plug-n-Play Recognizer Interface: allows recognizers to be replaced (see the sketch after this list)
• Internet-Enabled Interface: recognizers may run remotely over the internet
• Simultaneous Multiple-User Support
• Supports Natural Interface Development
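To make the plug-n-play idea concrete, here is a minimal sketch of what a common recognizer contract could look like in Java. The interface and method names below are assumptions for illustration; they are not the toolkit's actual API.

// Hypothetical sketch of a plug-n-play recognizer contract (names are
// assumptions, not the MMI toolkit's actual API). Each recognizer, whether
// speech, handwriting, or gesture, implements the same interface, so one
// implementation can be swapped for another without changing the application.
public interface Recognizer {
    // Modality label, e.g. "speech", "handwriting", or "gesture".
    String getModality();

    // Feed raw input (audio samples, pen points, ...) to the recognizer.
    void process(byte[] input);

    // Return the recognizer's current best hypothesis as text.
    String getHypothesis();
}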
2 - Architecture Overview
• MMI is a toolkit that allows multiple modalities to be easily integrated into applications.
• Applications can mix modalities (speech, gesture, and handwriting).
The Java-based API communicates directly with each recognizer.
The multimodal applet is the user interface; the applet window presents a view onto a domain-dependent representation of application data and state, in the form of objects to be manipulated.
[Figure: sample application using multimodal error repair. The Multimodal Applet connects to the Multimodal Server, which communicates with the Janus speech recognizer, the handwriting recognizer, and the gesture recognizer, each backed by its resources (acoustic model, vocabulary, language model) and fed by speech, handwriting, and gesture input.]
3 - Component Description
The following modalities have the following levels of support in the multimodal toolkit:
4 - External Interfaces
The user defines their grammar using six probabilistically weighted node types (a construction sketch follows this list):
• A Toplevel represents an entire input model and contains one or more sequences, each of which contains exactly one AFrame.
• An AFrame represents an action frame and contains one or more sequences, each of which consists of one or more PSlots.
• A PSlot represents a parameter slot and contains one or more UnimodalNodes (at most one for each input modality).
• A UnimodalNode specifies a sub-grammar for a single input modality and has the same structure as a NonTerm, with the addition of a label specifying the modality.
• A NonTerm is a non-terminal node consisting of one or more sequences, each of which contains zero or more NonTerms or Literals.
• A Literal is a terminal node containing a text string representing one or more input tokens.
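As a rough illustration of how these node types compose, the sketch below builds a tiny grammar for a spoken "move this" command paired with a pointing gesture. The stub classes and constructors are invented so the example is self-contained; the toolkit's real signatures are not reproduced here.

// Stand-in stub classes so this sketch compiles on its own. The node type
// names come from the list above, but these constructors are assumptions.
class Literal      { Literal(String tokens) {} }
class NonTerm      { NonTerm(Object... sequence) {} }
class UnimodalNode { UnimodalNode(String modality, NonTerm grammar) {} }
class PSlot        { PSlot(UnimodalNode... alternatives) {} }
class AFrame       { AFrame(PSlot... sequence) {} }
class Toplevel     { Toplevel(AFrame... sequence) {} }

class GrammarSketch {
    // A "move" action frame: the spoken phrase fills one parameter slot,
    // and a pointing gesture fills the other.
    static Toplevel build() {
        return new Toplevel(
            new AFrame(
                new PSlot(new UnimodalNode("speech",
                    new NonTerm(new Literal("move this")))),
                new PSlot(new UnimodalNode("gesture",
                    new NonTerm(new Literal("POINT"))))));
    }
}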
4 - External Interfaces
• The Multimodal Server sends a series of points to the pen and gesture recognizers.
• The audio is sent to the speech recognizer.
• The pen, gesture, and speech recognizers return their hypotheses to the multimodal toolkit, which is responsible for integrating the results in an optimizing search, as sketched below [Minh Tue Vo, Ph.D. dissertation, CMU, 1998].
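The integration itself is described in the cited dissertation; as a loose illustration only, the sketch below combines independently scored unimodal hypotheses by picking the pair with the best joint score. The additive log-probability scoring is an assumption for this example, not the toolkit's actual search.

// Illustrative only: combine unimodal results by choosing the speech/gesture
// pair with the highest summed log-probability. The real toolkit performs a
// structured search over the multimodal grammar.
class Integrator {
    static class Hyp {
        final String text;
        final double logProb;
        Hyp(String text, double logProb) { this.text = text; this.logProb = logProb; }
    }

    static Hyp[] best(Hyp[] speech, Hyp[] gesture) {
        Hyp[] winner = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Hyp s : speech) {
            for (Hyp g : gesture) {
                double score = s.logProb + g.logProb;  // assumes modalities score independently
                if (score > bestScore) {
                    bestScore = score;
                    winner = new Hyp[] { s, g };
                }
            }
        }
        return winner;
    }
}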
5 - Existing Software “Bridges”
• The multimodal toolkit provides a Java API that allows applets or applications to incorporate multimodal functionality (a usage sketch follows).
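Here is a hedged sketch of what application-side use of such an API might look like. Every name in it (MultimodalClient, HypothesisListener, the host and port) is invented for illustration and is not the toolkit's documented interface.

// Hypothetical application-side view of a multimodal Java API. All class,
// method, host, and port names here are invented for illustration.
interface HypothesisListener {
    void onHypothesis(String hypothesis);  // integrated multimodal result
}

class MultimodalClient {
    void connect(String host, int port)    { /* open a connection to the Multimodal Server */ }
    void setGrammar(Object toplevel)       { /* register the input model (see section 4) */ }
    void addListener(HypothesisListener l) { /* deliver integrated hypotheses */ }
}

class ClientDemo {
    public static void main(String[] args) {
        MultimodalClient client = new MultimodalClient();
        client.connect("mmi-server.example.edu", 7000);  // recognizers may run remotely
        client.setGrammar(null);                         // e.g. a Toplevel built as in section 4
        client.addListener(h -> System.out.println("User intent: " + h));
    }
}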
6 - Information Flow
• Part 1 - Specify how other CPOF components can send data to and receive data from your system. Please be explicit.
Components may directly interface with the multimodal server.
• Part 2 - What are the inputs to your system? Please specify formats and protocol; provide details.
The multimodal grammar.
• Part 3 - What are the outputs of your system? Please specify format and protocol; provide details.
Hypotheses conforming to the multimodal grammar.
7 - Plug-n-Play
• Part 1 - We have not currently identified how our components interact with other CPOF components. Please present a diagram that shows this interaction. TBD
• Part 2 - Are there components in your system that are functionally “similar” to another CPOF component? TBD
• Part 3 - Do any of your components complement other CPOF components (e.g., ZUI and Sage/Visage)? TBD
8 - Operating Environments and COTS
• Multimodal Server - Required Hardware: PC or Sun; Operating System: independent; Required COTS: JDK 1.1.*; Language: Java
• Janus - Required Hardware: Sun Ultra 60; Operating System: Solaris 2.5.1; Required COTS: Tcl/Tk; Language: Tcl/Tk, C
• NPen++ - Required Hardware: Sun or PC; Operating System: Solaris 2.5.1 or Windows NT; Required COTS: none; Language: C++
• Gesture Recognizer - Required Hardware: Sun or PC; Operating System: Solaris 2.5.1 or Windows NT; Required COTS: none; Language: C++
9 - Hardware Platform Requirements
Specify the hardware required to support your system:
• MMI can run on a PC with a minimum of 32 MB of RAM and a 200 MHz processor.
• The speech recognizer requires a dual-processor Sun Ultra 60 with a minimum of 500 MB of RAM. (The recognizer currently under development will require a 500 MHz Pentium III with 128 MB minimum, 256 MB preferred.)
• Video capture cards, Sound Blaster-compatible sound cards, tabletop and lapel microphones, and pan-tilt and stationary cameras are required.