Working group on multimodal meaning representation. Dagstuhl workshop, Oct. 2001 http://www.dagstuhl.de/DATA/Title/01441.html. Scope. What should we consider as meaning?
Working group on multimodal meaning representation Dagstuhl workshop, Oct. 2001 http://www.dagstuhl.de/DATA/Title/01441.html
Scope
• What should we consider as meaning?
  • How the processing of the input should lead to some update of the information state of the system (domain model, discourse model, user model, etc.);
• Such a representation:
  • Should support both interpretation and generation;
  • Should support any kind of multimodal input and output;
  • Should support the variety of semantic theories;
Objectives
• Provide interface formats within a MM dialogue architecture
  • Incremental construction (reference interpretation etc.) up to a final representation (e.g. fixed frame à la MUC) or system action/feedback;
  • Should also be a basis for the definition of annotation schemes of MM semantic content.
• Specification and comparison of application-specific representations
  • Towards a framework allowing one to compare existing representations (e.g. M3L) or define a new one, while ensuring some level of interoperability between these.
What it is not
• A domain model representation, or an ontology
  • There is OIL, DAML, Topic Maps etc.
• A representation of lower-level linguistic or gestural information (e.g. syntax, etc.)
  • Some features may be percolated, though, or pointed to…
• A representation of the underlying processes
  • Focus on the output of what is done by a given module
Basic constraints
• Uniformity (representation for various types of inputs and outputs)
• Incrementality (usable at various stages)
  • Before/after fusion, semantic/pragmatic aspects
• Extensibility
  • Method for designing schemas (XML or others), rather than one specific schema
• Clear and explicit semantics
Methodology
• Basic components
  • Represent the general organization of any semantic structure
  • Parameterized by
    • data categories taken from a common registry
    • application-specific data categories
• General mechanisms
  • To make the thing work
• General categories
  • Descriptive categories available to all formats
Basic components (1)
• Temporal structures (“events”)
  • Dialogue turns/utterance
  • Gestures
  • Actions on/in the task
• Referential structures (“participants”)
  • Individuals and objects participating in an event
  • Comprises spatial structures
• Propositional content (BG)
Basic components (2)
• Restrictions (on temporal and referential structures)
  • E.g. Gesture types, Linguistic modifiers, Dialogue acts, etc.
• Dependency structures (linking events and referential structures)
  • E.g. Participant roles (cf. AGENT-SOURCE-GOAL), Discourse/rhetorical structure, temporal relations
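The basic components listed above can be sketched as a small data model. This is only an illustration under assumptions: the class names (`Event`, `Participant`, `Relation`, `SemRep`) and the methods `add`, `link` and `roles_of` are hypothetical and do not come from the working group's material. Restrictions are modelled as plain dictionaries of data categories, and dependency structures as role-labelled links.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Temporal structure; restrictions hold data categories such as tense."""
    id: str
    restrictions: dict = field(default_factory=dict)


@dataclass
class Participant:
    """Referential structure; restrictions hold data categories such as lex."""
    id: str
    restrictions: dict = field(default_factory=dict)


@dataclass
class Relation:
    """Dependency structure linking a source node to a target node."""
    source: str
    target: str
    role: str


@dataclass
class SemRep:
    events: dict = field(default_factory=dict)
    participants: dict = field(default_factory=dict)
    relations: list = field(default_factory=list)

    def add(self, node):
        # Store the node in the table that matches its kind.
        table = self.events if isinstance(node, Event) else self.participants
        table[node.id] = node
        return node

    def link(self, source, target, role):
        # Refuse links to identifiers that were never declared.
        known = self.events.keys() | self.participants.keys()
        for node_id in (source, target):
            if node_id not in known:
                raise KeyError(f"unknown node: {node_id}")
        self.relations.append(Relation(source, target, role))

    def roles_of(self, event_id):
        # Map each role attached to the event to the participant filling it.
        return {r.role: r.source for r in self.relations if r.target == event_id}


# "I want to go from Paris to Nancy", with hypothetical identifiers.
rep = SemRep()
rep.add(Event("e1", {"evtType": "wanttogo", "tense": "present"}))
rep.add(Participant("x", {"lex": "I"}))
rep.add(Participant("y", {"lex": "Paris"}))
rep.add(Participant("z", {"lex": "Nancy"}))
rep.link("x", "e1", "agent")
rep.link("y", "e1", "source")
rep.link("z", "e1", "goal")
print(rep.roles_of("e1"))
```

Keeping restrictions as open dictionaries reflects the idea that the set of categories is a parameter of each format rather than part of the meta-model.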
Example
• Typology of gesture types (registry)
  • Communicative gestures (wave)
  • Designation gestures
  • Shrug
  • Nod
  • Movement attributes (intensity, etc.)
• One given format will choose among these, or even define its own categories
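One way to picture "choosing among" registry categories is sketched below. Everything here is an assumption made for illustration: the registry is a plain set whose entries paraphrase the gesture types on the slide, and the `Format` class with its `allows` and `shared_with` methods is hypothetical. The overlap between two formats' registry choices stands in for the level of interoperability between them.

```python
# Hypothetical common registry; entries paraphrase the slide's gesture types.
GESTURE_REGISTRY = frozenset({"communicative", "designation", "shrug", "nod"})


class Format:
    """An application-specific format: registry picks plus its own categories."""

    def __init__(self, name, from_registry, own=()):
        unknown = set(from_registry) - GESTURE_REGISTRY
        if unknown:
            raise ValueError(f"not in registry: {sorted(unknown)}")
        self.name = name
        self.shared = frozenset(from_registry)
        self.own = frozenset(own)

    def allows(self, gest_type):
        # A value is valid if it was picked from the registry or defined locally.
        return gest_type in self.shared or gest_type in self.own

    def shared_with(self, other):
        # Categories both formats took from the registry can be exchanged as-is.
        return self.shared & other.shared


kiosk = Format("kiosk", {"designation", "nod"}, own={"swipe"})
robot = Format("robot", {"designation", "shrug"})
print(kiosk.allows("swipe"), robot.allows("swipe"))
print(sorted(kiosk.shared_with(robot)))
```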
General mechanisms
• Links
  • Internal links
  • To lower levels (syntactic structures, prosodic cues, gestural trajectories, etc.)
  • To domain model (types and instances)
• Alternatives (cf. ambiguities)
  • E.g. disjunction of internal links
General categories
• Architectural
  • Producer (consumer?) of the information, confidence, devices
• Environmental
  • Time stamps, spatial information (speaker’s position, graphical configurations, gestural trajectories etc.)
• Interactional
  • Speaker (user state?), other addressees etc.
Combining basic components and data categories Just to illustrate things…
Example
Pointer to speaker’s characteristics

<semRep id="rep1">
  <event id="e0">
    <cat>utterance</cat>
    <speaker target="Peter"/>
    <adressee target="System"/>
  </event>
  <event id="e1">
    <tense>present</tense>
    <evtType>wanttogo</evtType>
    …
  </event>
  <participant id="x">
    <num>sing</num>
  </participant>
  <relation source="x" target="e1">
    <role>agent</role>
  </relation>
</semRep>

• In black: basic components and mechanisms (meta-model of semantic representation)
• In blue: parameter component chosen from reference registries
  • Categories
  • Values

Peter: I want to go …
<semRep id="rep1">
  <event id="e0">
    <evtCat>utterance</evtCat>
    <speaker target="Peter"/>
    <adressee target="System"/>
    <alt>
      <dialAct cert="0.8">Order</dialAct>
      <dialAct cert="0.3">Inform</dialAct>
    </alt>
  </event>
  <event id="e1">
    <tense>present</tense>
    <voice>active</voice>
    <wh>none</wh>
    <evtType>wanttogo</evtType>
    …
  </event>
  <participant id="x">
    <lex>I</lex>
    <synCat>Pronoun</synCat>
    <num>sing</num>
    <pers>first</pers>
    …
  </participant>
  <participant id="y">
    <lex>Paris</lex>
    <synCat>ProperNoun</synCat>
    <pers>third</pers>
    …
  </participant>
  <participant id="z">
    <lex>Nancy</lex>
    <synCat>ProperNoun</synCat>
    <pers>third</pers>
    …
  </participant>
  <relation source="x" target="e1">
    <role>agent</role>
  </relation>
  <relation source="y" target="e1">
    <role>source</role>
  </relation>
  <relation source="z" target="e1">
    <role>goal</role>
  </relation>
</semRep>

I want to go from Paris to Nancy
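A consumer of such a representation has to decide what to do with the `<alt>` block. The sketch below shows one possible policy, keeping the alternative with the highest `cert` value. The function name `best_alternative` and the policy itself are assumptions made for illustration; the slides do not say how alternatives should be resolved, nor whether `cert` values are probabilities. The fragment is a trimmed copy of the utterance event, and the code relies only on Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the utterance event, with its two competing dialogue acts.
FRAGMENT = """
<event id="e0">
  <evtCat>utterance</evtCat>
  <alt>
    <dialAct cert="0.8">Order</dialAct>
    <dialAct cert="0.3">Inform</dialAct>
  </alt>
</event>
"""


def best_alternative(event, tag):
    """Return (value, cert) for the most certain <tag> inside <alt>, or None."""
    candidates = event.findall(f"./alt/{tag}")
    if not candidates:
        return None
    # A missing cert attribute is treated as zero certainty (an assumption).
    best = max(candidates, key=lambda el: float(el.get("cert", "0")))
    return best.text, float(best.get("cert", "0"))


event = ET.fromstring(FRAGMENT.strip())
print(best_alternative(event, "dialAct"))
```

A module that prefers to keep the ambiguity can pass the whole `<alt>` element downstream instead, which matches the incremental use of the representation described earlier.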
<semRep id="rep1">
  <event id="e0">
    <evtCat>utterance</evtCat>
    <agent target="Peter"/>
    <adressee target="System"/>
    <dialAct>Order</dialAct>
  </event>
  <event id="e1">
    <tense>present</tense>
    <voice>active</voice>
    <wh>none</wh>
    <evtType>wanttogo</evtType>
    …
  </event>
  <event id="e2">
    <evtCat>gestural</evtCat>
    <agent target="Peter"/>
    <when>2001-11-1:xxxxxx</when>
    <gestType>designation</gestType>
    <graphContext target="ctxt23"/>
  </event>
  <participant id="x">
    <lex>I</lex>
    …
  </participant>
  <participant id="y">
    <lex>here</lex>
    <synCat>adverb</synCat>
  </participant>
  <participant id="z">
    <lex>there</lex>
    <synCat>adverb</synCat>
  </participant>
  <relation source="y" target="e2">
    <MMLink>co-designation</MMLink>
  </relation>
</semRep>

I want to go from here to there
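The `co-designation` link is what ties a deictic word to a gesture. A fusion step could follow that link as sketched below. The function `co_designations` is hypothetical, and reading the gesture's `graphContext` target as the thing being pointed at is an assumption made for this example, not something the slides state. The fragment is a reduced, well-formed copy of the representation.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the multimodal example: one gesture, two deictic words.
FRAGMENT = """
<semRep id="rep1">
  <event id="e2">
    <evtCat>gestural</evtCat>
    <gestType>designation</gestType>
    <graphContext target="ctxt23"/>
  </event>
  <participant id="y"><lex>here</lex></participant>
  <participant id="z"><lex>there</lex></participant>
  <relation source="y" target="e2">
    <MMLink>co-designation</MMLink>
  </relation>
</semRep>
"""


def co_designations(sem_rep):
    """Map each co-designated word to the graphic context of its gesture."""
    events = {e.get("id"): e for e in sem_rep.findall("event")}
    participants = {p.get("id"): p for p in sem_rep.findall("participant")}
    resolved = {}
    for rel in sem_rep.findall("relation"):
        if rel.findtext("MMLink") != "co-designation":
            continue
        participant = participants.get(rel.get("source"))
        event = events.get(rel.get("target"))
        if participant is None or event is None:
            continue  # dangling link: leave it unresolved
        context = event.find("graphContext")
        resolved[participant.findtext("lex")] = (
            context.get("target") if context is not None else None
        )
    return resolved


sem_rep = ET.fromstring(FRAGMENT.strip())
print(co_designations(sem_rep))
```

In this fragment only "here" is linked to a gesture, so "there" stays unresolved, as it does in the slide's example.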
Future work
• SIGSEM Working group on meaning representations (ACL)
• Liaison with ISO TC37/SC4 - linguistic resources
  • Preparation of a working draft
• Liaison with Isle
• Liaison with SIGMedia and SIGDial
• W3C/VoiceXML