An Overview of QuickSet, from OGI
• Cohen, P. R., Johnston, M., McGee, D., Oviatt, S., Pittman, J., Smith, I., Chen, L., and Clow, J. (1997). QuickSet: Multimodal Interaction for Distributed Applications. In Proceedings of the Fifth International Multimedia Conference (Multimedia '97), ACM Press, pp. 31-40.
• Pittman, J., Smith, I., Cohen, P., Oviatt, S., and Yang, T. (1996). QuickSet: A Multimodal Interface for Military Simulation. In Proceedings of the 6th Conference on Computer-Generated Forces and Behavioral Representation, University of Central Florida, pp. 217-224.
• Pittman, J. (1991). Recognizing Handwritten Text. In Proceedings of CHI 1991, Human Factors in Computing Systems, ACM/SIGCHI, NY, pp. 271-275.
• Oviatt, S. L., Cohen, P. R., Wu, L., Vergo, J., Duncan, L., Suhm, B., Bers, J., Holzman, T., Winograd, T., Landay, J., Larson, J., and Ferro, D. (2000). Designing the User Interface for Multimodal Speech and Pen-based Gesture Applications: State-of-the-Art Systems and Future Research Directions. Human Computer Interaction, 15(4), pp. 263-322.
What is QuickSet?
A pen & voice front-end interface, scalable from a hand-held to a wall-sized format
Used for Map-Based Applications
The user sketches on top of an existing map, e.g., to drive a training simulator for the U.S. Marines.
Open Agent Architecture
Agents communicate using the Interagent Communication Language (ICL)
How is this different from handwriting recognition?
• Sketched routes
• Sketched regions
• Other sketched symbols
Two Recognizers for Pen Gestures
• Neural Network
  • Pre-processing: the ink is size-normalized and centered in a 2D image; the pixels are fed into the NN
• Hidden Markov Model
  • Pre-processing: the ink is smoothed, re-sampled, and converted to deltas
• The probability estimates from the two recognizers are combined to compute a probability for each possible gesture
• Recognized 68 pen gestures (1997) and 190 gestures (2000)
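The HMM pre-processing chain listed above (smooth, re-sample, convert to deltas) can be sketched as follows. This is a minimal illustration under stated assumptions, not QuickSet's code: the moving-average window, the arc-length re-sampling, and the point count `n` are all assumptions, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def smooth(points: List[Point], window: int = 3) -> List[Point]:
    """Moving-average smoothing over a centered window, clipped at the stroke ends."""
    half = window // 2
    out = []
    for i in range(len(points)):
        chunk = points[max(0, i - half): i + half + 1]
        out.append((sum(p[0] for p in chunk) / len(chunk),
                    sum(p[1] for p in chunk) / len(chunk)))
    return out


def resample(points: List[Point], n: int) -> List[Point]:
    """Return n points spaced equally along the stroke's arc length."""
    cum = [0.0]  # cumulative arc length at each original point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    if total == 0 or n < 2:
        # Degenerate stroke (a tap) or too few requested points.
        return [points[0]] * n
    out = []
    j = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the segment that contains the target arc length.
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        t = 0.0 if seg == 0 else (target - cum[j]) / seg
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def to_deltas(points: List[Point]) -> List[Point]:
    """Replace absolute positions with successive (dx, dy) offsets."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(points, points[1:])]


def hmm_features(points: List[Point], n: int = 32) -> List[Point]:
    """Smooth, re-sample, then convert to deltas, in the order the slide lists."""
    return to_deltas(resample(smooth(points), n))
```

For example, `hmm_features(stroke, n=32)` turns a raw list of pen samples into 31 (dx, dy) pairs that could serve as an HMM observation sequence. Deltas make the features independent of where on the map the gesture was drawn.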
From Pittman's 1991 paper on handwriting recognition (preliminary work on the neural net only)
• Standard back-propagation network with 2 hidden layers
• His conclusion: the architecture doesn't matter; the size of the training set does
• Collected & labeled a training corpus of 10,000-20,000 letters
• Pre-processing: the center point of the character is used for normalization & re-sizing
• Context (adjacent characters) is used to help recognition (no help for symbol recognition on the map)
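The neural-net pre-processing described in these slides (size-normalize the ink around its center point and center it in a 2D pixel image) might look like the sketch below. It is an illustration only: taking "center point" to mean the bounding-box center, the 16x16 grid, and joining pen samples with straight lines are assumptions, and `rasterize` / `to_input_vector` are hypothetical names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rasterize(points: List[Point], size: int = 16) -> List[List[int]]:
    """Center the ink in a size x size binary image and scale it to fit.

    Assumptions (not from the source): the "center point" is the bounding-box
    center, scaling preserves aspect ratio, and consecutive pen samples are
    joined by straight lines.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    extent = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0  # avoid /0 for a tap
    scale = (size - 1) / extent
    mid = (size - 1) / 2
    # Map ink coordinates to fractional grid coordinates centered on `mid`.
    gpts = [(mid + (x - cx) * scale, mid + (y - cy) * scale) for x, y in points]
    grid = [[0] * size for _ in range(size)]
    segments = list(zip(gpts, gpts[1:])) or [(gpts[0], gpts[0])]
    for (x0, y0), (x1, y1) in segments:
        # Step size below one pixel along the major axis, so no pixel is skipped.
        steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        for k in range(steps + 1):
            t = k / steps
            grid[round(y0 + t * (y1 - y0))][round(x0 + t * (x1 - x0))] = 1
    return grid


def to_input_vector(grid: List[List[int]]) -> List[int]:
    """Flatten the image row by row; these pixels are what the network would see."""
    return [pixel for row in grid for pixel in row]
```

Because the ink is centered and scaled before rasterizing, the same shape drawn larger or elsewhere on the screen yields the same input vector, which is the point of this normalization step.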
Example
Probabilities are estimated for each gesture; here the route is mis-recognized as an area.
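The slides say the two recognizers' probability estimates are combined into one probability per gesture, but not how. A minimal sketch, assuming a product rule (recognizers treated as independent, uniform priors) followed by normalization, shows how an n-best list arises and how a route can end up ranked below "area". The function name and all scores below are hypothetical, not QuickSet output.

```python
from typing import Dict, List, Tuple


def combine(nn: Dict[str, float], hmm: Dict[str, float]) -> List[Tuple[str, float]]:
    """Combine two recognizers' per-gesture probabilities into a ranked n-best list.

    Assumption (not from the source): the recognizers are treated as independent
    and the class priors as uniform, so the combined score is the normalized
    product of the two estimates. A gesture missing from one recognizer's output
    is given probability 0 by that recognizer.
    """
    labels = nn.keys() | hmm.keys()
    scores = {g: nn.get(g, 0.0) * hmm.get(g, 0.0) for g in labels}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("the recognizers share no gesture with nonzero probability")
    return sorted(((g, s / total) for g, s in scores.items()),
                  key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores for a stroke the user intended as a route.
    nn_scores = {"route": 0.40, "area": 0.50, "point": 0.10}
    hmm_scores = {"route": 0.30, "area": 0.60, "point": 0.10}
    best, prob = combine(nn_scores, hmm_scores)[0]
    print(best, round(prob, 2))  # area 0.7
```

A product rule rewards gestures both recognizers agree on; a weighted average is a common alternative that is more forgiving when one recognizer assigns a near-zero probability to the correct gesture.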
Typed Feature Structure Unification
[Figure: feature structures from the pen gesture recognizer and from the speech recognizer]
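Typed feature structure unification merges a partial interpretation from the speech recognizer with one from the pen gesture recognizer: matching features must agree, and types unify to the more specific compatible type. The sketch below is a simplified illustration, not QuickSet's implementation; the type hierarchy (`point`, `area`, `location`, `unit`, `create_unit`) and feature names are hypothetical, and structure sharing (reentrancy) is not modelled.

```python
from typing import Any, Dict, List, Optional

# Hypothetical type hierarchy (child -> parent); a real system's would be larger.
PARENT = {"point": "location", "line": "location", "area": "location",
          "location": "top", "unit": "top", "create_unit": "top", "top": None}


def supertypes(t: Optional[str]) -> List[str]:
    """t followed by all of its ancestors up to the root."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT.get(t)
    return chain


def unify_types(a: str, b: str) -> Optional[str]:
    """The more specific of two compatible types, or None if they conflict."""
    if a in supertypes(b):
        return b
    if b in supertypes(a):
        return a
    return None


def unify(fs1: Any, fs2: Any) -> Optional[Any]:
    """Unify two feature structures (nested dicts); None signals failure.

    Atomic values unify only if equal. None is reserved for failure, so it
    cannot be used as a feature value.
    """
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result: Dict[str, Any] = {}
        if "type" in fs1 or "type" in fs2:
            t = unify_types(fs1.get("type", "top"), fs2.get("type", "top"))
            if t is None:
                return None
            result["type"] = t
        for key in (fs1.keys() | fs2.keys()) - {"type"}:
            if key in fs1 and key in fs2:
                sub = unify(fs1[key], fs2[key])
                if sub is None:
                    return None
                result[key] = sub
            else:
                # A feature present on only one side is simply carried over.
                result[key] = fs1.get(key, fs2.get(key))
        return result
    return fs1 if fs1 == fs2 else None


if __name__ == "__main__":
    # Hypothetical: speech supplies the command and unit, the pen supplies the location.
    speech = {"type": "create_unit",
              "object": {"type": "unit", "echelon": "company", "role": "medical"},
              "location": {"type": "point"}}
    gesture = {"location": {"type": "point", "coords": (120, 45)}}
    print(unify(speech, gesture))
```

In this sketch a gesture recognized as an `area` fails to unify with a spoken command whose location must be a `point`, so the combined interpretation is rejected rather than silently accepted.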
The Take-away
• Multi-agent architecture
• Pen gesture recognizer
  • Pre-processing
  • Neural Net might work fine for our purposes
  • Doesn't help with object/route vs. a symbol
  • Find out more about the HMM
• Semantic representation