Multimodal Communication in the Staging Virtual farm. Patrizia Paggio and Bart Jongejan Center for Sprogteknologi MUMIN workshop Helsinki 2002. The Staging project (www.staging.dk). Interdisciplinary Danish project: nature and use of 3D applications populated with autonomous agents.
Multimodal Communication in the Staging Virtual farm Patrizia Paggio and Bart Jongejan Center for Sprogteknologi MUMIN workshop Helsinki 2002
The Staging project (www.staging.dk)
Interdisciplinary Danish project: nature and use of 3D applications populated with autonomous agents.
CST’s work: multimodal communication components of a 3D virtual farm.
Focus: multimodal integration, mixed-initiative dialogue, interaction between dialogue and other behaviours.
Paggio and Jongejan - Helsinki ’02

The Staging VE
The VE (virtual environment)
• is in charge of simulating the world
• provides the agents with sensory information
• processes requests from the agents (move objects, produce sounds, play animations)
The Staging VE was developed at CVMT (Aalborg University).
CST has developed a mock-up for testing purposes.

Agents
Agents carry out behaviours
• in reaction to external stimuli and according to their inner state (hunger, tiredness…)
• based on strength of activation level
Engaging in a dialogue with the user’s avatar is also a behaviour.
Dialogue behaviour has a strong degree of activation for the farmer agent.

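A minimal sketch of activation-based behaviour selection. The slide only says that behaviours are chosen "based on strength of activation level"; the highest-activation-wins rule, the numeric values and the names Behaviour and select_behaviour are assumptions made for illustration, not the Staging implementation.

```python
from dataclasses import dataclass


@dataclass
class Behaviour:
    name: str
    activation: float  # hypothetical strength; higher means more urgent


def select_behaviour(behaviours):
    """Return the behaviour with the highest activation level (assumed rule)."""
    return max(behaviours, key=lambda b: b.activation)


# Illustrative values only: dialogue is given a strong activation for the farmer.
farmer_behaviours = [
    Behaviour("eat", 0.2),
    Behaviour("sleep", 0.1),
    Behaviour("dialogue", 0.9),
]
chosen = select_behaviour(farmer_behaviours)
print(chosen.name)
```

With these made-up values the farmer agent engages in dialogue rather than eating or sleeping.
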
The Aalborg VE
The CST farm
[Placeholder: a picture of our VE is to be shown here]

Multimodal communication
The user can interact with agents via various devices: microphone, keyboard, touch screen, data glove.
Commercial speech technology; dedicated gesture recogniser (Karin Husballe Munk at CVMT).
Speech can be combined with deictic, iconic and turn-taking gestures (Cassell and Prevost 1996).
Gestures and speech are merged by a multimodal parser.

Multimodal integration
[Diagram. Inputs: hand movements, speech. Components: speech recognition, gesture recognition (pointing, size, turn-taking), chart initialisation, parsing, semantic mapping, communication management, action.]

More integration
Gesture and word are paired: Feed that cow$1|cow
Gesture adds information to lexicon entry.
• Word and gesture must be (nearly) synchronous
• Syntactic constraints:
  • deictic (pointing) requires noun or pronoun
  • iconic (size) requires noun
• Semantic constraints:
  • semantic types must be compatible

Example
Feed that cow$1|animal.
pointgesture := <object-type>$<internal-id>
{act=request, predicate=feed, arg3={reln=animal, semtype=animal, objectid=cow$1}}
reln and object type unified, semtype compatible, objectid added.

Contradiction example
Feed that cow$1|apple.
{act=request, predicate=feed, arg3={reln=animal, semtype=animal, objectid=cow$1}}
Gesture and noun semantic types are incompatible; only the interpretation provided by the gesture is compatible with the semantics of the predicate and survives.

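The two examples above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the type hierarchy PARENT, the restriction table ARG3_SEMTYPE and the function integrate are hypothetical, the syntactic and synchrony constraints are not modelled, and in the contradiction case the sketch simply sets reln to the gesture's semantic type so that the output mirrors the slide.

```python
import re

# Hypothetical type hierarchy (child -> parent); not the Staging lexicon.
PARENT = {"cow": "animal", "sheep": "animal", "apple": "food", "carrot": "food"}

# Assumed selectional restriction: arg3 of "feed" must be an animal.
ARG3_SEMTYPE = {"feed": "animal"}

# Matches '<object-type>$<internal-id>|<noun>', e.g. 'cow$1|animal'.
TOKEN = re.compile(r"^(\w+)\$(\w*)\|(\w+)$")


def semtype(name):
    """Follow parent links up to the root type, e.g. cow -> animal."""
    while name in PARENT:
        name = PARENT[name]
    return name


def integrate(predicate, token):
    """Merge a pointing gesture and a noun into a semantic representation."""
    match = TOKEN.match(token)
    if match is None:
        raise ValueError(f"not a gesture|word pair: {token!r}")
    object_type, internal_id, noun = match.groups()
    object_id = f"{object_type}${internal_id}"
    required = ARG3_SEMTYPE[predicate]

    if semtype(noun) == semtype(object_type):
        # Compatible: keep the noun's relation and add the pointed-at object.
        readings = [{"reln": noun, "semtype": semtype(noun), "objectid": object_id}]
    else:
        # Incompatible: noun and gesture become competing readings.
        readings = [
            {"reln": noun, "semtype": semtype(noun)},
            {"reln": semtype(object_type), "semtype": semtype(object_type),
             "objectid": object_id},
        ]
    # Keep only readings that satisfy the predicate's semantic restriction.
    surviving = [r for r in readings if r["semtype"] == required]
    if not surviving:
        return None
    return {"act": "request", "predicate": predicate, "arg3": surviving[0]}


print(integrate("feed", "cow$1|animal"))
print(integrate("feed", "cow$1|apple"))
```

Both calls yield the same arg3 as on the slides: in the first the noun and the gesture agree, in the second only the gesture reading passes the predicate's restriction.
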
Examples
Deictic gestures
  U: Feed an animal, please.
  A: Which animal shall I feed?
  U: Take that cow (+ pointing)
Iconic gestures
  U: Feed the sheep, please.
  A: Which food shall I take?
  U: The small apple (+ size)
Turn-giving and taking gestures
  U: Hi (+ give turn)
  A: Shall we...

The Communication Manager
• Interprets user’s dialogue moves
• Builds dialogue trees
• Interprets references not resolved by gestures
• Decides agent’s dialogue moves based on preceding dialogue and on changes in the VE
Dialogue goals arising from the scenario are combined with dialogue obligations created by the preceding dialogue.

Dialogue goals
Dialogue goals are created based on domain-specific action templates (Badler et al 1999).
A template specifies an action with its semantic arguments, the corresponding attribute names in the semantic representation, and the necessary preconditions.
FeedAction(Topic=Feed, Animal=<arg3>, Food=<arg2>, Tool=<instr>, Precondition=Hungry(Animal))

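As a rough sketch of how such a template could drive the dialogue, the code below encodes FeedAction as a dictionary and derives the agent's next move from the slots that are still empty. The names FEED_ACTION, fill_slots and next_move, the is_hungry callback, and the choice to check the precondition as soon as the animal is known are all assumptions; the slides do not describe the actual control flow.

```python
# FeedAction from the slide: role name -> attribute in the semantic representation.
FEED_ACTION = {
    "topic": "feed",
    "slots": {"Animal": "arg3", "Food": "arg2", "Tool": "instr"},
}


def fill_slots(template, semantics, filled=None):
    """Copy every attribute the template knows about into its role slot."""
    filled = dict(filled or {})
    for role, attribute in template["slots"].items():
        if attribute in semantics:
            filled[role] = semantics[attribute]
    return filled


def next_move(template, filled, is_hungry):
    """Pick the agent's reply: precondition failure, a wh-question, or accept."""
    animal = filled.get("Animal")
    # Assumed timing: the Hungry(Animal) precondition is tested once Animal is set.
    if animal is not None and not is_hungry(animal):
        return f"The {animal} is not hungry."
    for role in template["slots"]:
        if role not in filled:
            return f"Which {role.lower()} shall I take?"
    return "Okay, I'll do it."


state = fill_slots(FEED_ACTION, {"predicate": "feed"})
print(next_move(FEED_ACTION, state, lambda a: True))
state = fill_slots(FEED_ACTION, {"arg3": "cow"}, state)
print(next_move(FEED_ACTION, state, lambda a: True))
```

Asking for the first missing role in template order is one simple design choice; it happens to give the animal, food, tool order of questions.
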
Example: feed action
U: Hi come here
A: Okay, I’ll do it
U: Feed an animal.
A: Which animal shall I take?
U: That cow$1|cow.
A: Which food shall I take?
U: (Take) a small$|small apple.
A: Which tool shall I take?
U: Take the pitchfork.
A: Okay, I’ll do it.

Example: precondition not met
U: Give that brown cow$2|cow an apple, please.
...
A: The cow is not hungry.

Example: agent initiative
A: Shall I feed the brown cows and the sheep?
U: Yes, give the animals a carrot.
A: Which tool shall I take?
U: The pitchfork.
A: Okay, I’ll do it.

Dialogue obligations
A set of condition/obligation pairs models valid speech act sequences. E.g.:
  Request / Accept, Reject
  Whque / Answer, Inform
Used to
• produce a correct reaction to a user move
• interpret a user move as either closing a dialogue segment or opening a new one

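The second use can be illustrated with a small stack-based sketch. It reads the pairs above as "a request obliges an accept or a reject; a wh-question obliges an answer or an inform", which is an interpretation of the slide, and the names OBLIGATIONS and classify_move are hypothetical. Implied and coerced speech acts are not handled.

```python
# Condition -> speech acts that discharge the obligation (assumed reading).
OBLIGATIONS = {
    "request": {"accept", "reject"},
    "whque": {"answer", "inform"},
}


def classify_move(open_segments, move):
    """Decide whether `move` closes the innermost open segment or opens a new one.

    `open_segments` is a list used as a stack of speech acts still awaiting
    a response. Returns the decision and the updated stack.
    """
    if open_segments and move in OBLIGATIONS.get(open_segments[-1], set()):
        return "close", open_segments[:-1]
    return "open", open_segments + [move]


stack = []
for act in ["request", "whque", "inform", "accept"]:
    decision, stack = classify_move(stack, act)
    print(act, decision, stack)
```

In this run the wh-question opens a sub-segment inside the request, the inform closes it, and the accept then closes the request, leaving no open segments.
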
Dialogue trees
request  U: Give the white cow an apple please.
whque    A: Which tool shall I use?
whque    U: Where is the pitchfork?
inform   A: The pitchfork is in front of the tree.
request  U: Take the pitchfork then.
accept   A: Okay, I’ll do it.

Relaxing the rules
Condition/obligation pairs do not always fit.
Speech acts can be implied:
  A: Hi
  U: Feed the animals please
They can be coerced:
  U: Feed an animal.
  A: Which animal shall I take?
  U: Feed the brown cow then.

Conclusions
Staging has made an initial attempt at giving an agent multimodal dialogue abilities to allow for mixed-initiative dialogues.
Future research:
• more advanced gesture recognition
• better understanding of how gestures and speech can complement each other
• repairs and self-repairs
• interaction between dialogue and other behaviours

FILM