Towards multimodal meaning representation
Harry Bunt & Laurent Romary
LREC Workshop on standards for language resources, Las Palmas, May 2002
Scope
• What should we consider as meaning?
  • How the processing of the input should lead to some update of the information state of the system (domain model, discourse model, user model, etc.);
  • This comprises both propositional content and communicative function.
• Such a representation:
  • Should support both interpretation and generation;
  • Should support any kind of multimodal input and output;
  • Should support the variety of semantic theories.
Objectives
• Provide interface formats within a multimodal (MM) dialogue architecture
  • Incremental construction (reference interpretation etc.) up to a final representation (e.g. fixed frame à la MUC) or system action/feedback;
  • Should also be a basis for the definition of annotation schemes for MM semantic content.
• Specification and comparison of application-specific representations
  • Towards a framework allowing one to compare existing representations (e.g. M3L) or define a new one, while ensuring some level of interoperability between them.
What it is not
• A domain model representation, or an ontology
  • There are OIL, DAML, Topic Maps, UNL, etc. for that
• A representation of lower-level linguistic or gestural information (e.g. syntax)
  • Some features may be percolated, though, or pointed to…
• A representation of the underlying processes
  • Focus on the output of what is done by a given module
Basic constraints
• Expressive and semantic adequacy
  • Coverage of phenomena and inferencing capacities
• Uniformity (representation for various types of inputs and outputs)
• Incrementality (usable at various stages)
  • Before/after fusion, semantic/pragmatic aspects
• Underspecification and partiality
• Openness and extensibility
  • Compatibility with various theories and approaches
  • Method for designing schemas (XML or others), rather than one specific schema
Methodology
• Basic components
  • Represent the general organization of any semantic structure
  • Parameterized by
    • data categories taken from a common registry
    • application-specific data categories
• General mechanisms
  • To make the thing work
• General categories
  • Descriptive categories available to all formats
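The idea of parameterizing basic components with data categories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the registry contents, the names `COMMON_REGISTRY`, `APP_CATEGORIES`, `build_schema` and `check_restriction`, and the values `past`, `future` and `plur` are hypothetical; only `present`, `sing`, `agent`, `source`, `goal` and `wanttogo` appear in the slides.

```python
# Hypothetical contents of a common data category registry.
# The slides do not list the registry; these entries are assumptions.
COMMON_REGISTRY = {
    "tense": {"present", "past", "future"},
    "num": {"sing", "plur"},
    "role": {"agent", "source", "goal"},
}

# Hypothetical application-specific categories (a travel application).
APP_CATEGORIES = {
    "evtType": {"wanttogo"},
}


def build_schema(common, app_specific):
    """Merge registry and application categories into one lookup table."""
    schema = {name: set(values) for name, values in common.items()}
    for name, values in app_specific.items():
        schema.setdefault(name, set()).update(values)
    return schema


def check_restriction(schema, category, value):
    """Return True if `value` is allowed for `category` in this schema."""
    return value in schema.get(category, set())


schema = build_schema(COMMON_REGISTRY, APP_CATEGORIES)
```

Two applications that share the registry part of their schemas can then exchange at least those restrictions, which is one way to read the interoperability goal stated in the objectives.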
Basic components (1)
• Temporal structures (“events”)
  • Dialogue turns/utterances
  • Gestures
  • Actions on/in the task
• Referential structures (“participants”)
  • Individuals and objects participating in an event
  • Comprises spatial structures
• Propositional content
Basic components (2)
• Restrictions (on temporal and referential structures)
  • E.g. gesture types, linguistic modifiers, dialogue acts, etc.
• Dependency structures (linking events and referential structures)
  • E.g. participant roles (cf. AGENT-SOURCE-GOAL), discourse/rhetorical structure, temporal relations
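The four kinds of basic component can be mirrored as plain data structures. The following is a minimal sketch, not the authors' implementation: the class names `Event`, `Participant`, `Relation` and `SemRep` and the helper `participants_of` are assumptions chosen to echo the element names used in the XML examples of this deck, and restrictions are simplified to a flat dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Temporal structure: an utterance, a gesture, or a task action."""
    id: str
    restrictions: dict = field(default_factory=dict)


@dataclass
class Participant:
    """Referential structure: an individual or object taking part in an event."""
    id: str
    restrictions: dict = field(default_factory=dict)


@dataclass
class Relation:
    """Dependency structure linking a source to a target under a role."""
    source: str
    target: str
    role: str


@dataclass
class SemRep:
    """Container for one semantic representation (hypothetical layout)."""
    id: str
    events: list = field(default_factory=list)
    participants: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def participants_of(self, event_id):
        """Map role -> participant id for every relation targeting the event."""
        return {r.role: r.source for r in self.relations if r.target == event_id}


# "I want to go ...": one event, one participant, one agent relation.
rep = SemRep(
    id="rep1",
    events=[Event("e1", {"tense": "present", "evtType": "wanttogo"})],
    participants=[Participant("x", {"num": "sing"})],
    relations=[Relation("x", "e1", "agent")],
)
roles = rep.participants_of("e1")
```

Keeping relations separate from events and participants, rather than nesting participants inside events, lets a representation stay partial: a participant can exist before its role in any event is known.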
General mechanisms
• Links
  • Internal links
  • To lower levels (syntactic structures, prosodic cues, gestural trajectories, etc.)
  • To domain model (types and instances)
• Alternatives (cf. ambiguities)
  • E.g. disjunction of internal links
General categories
• Architectural
  • Producer (consumer?) of the information, confidence, devices
• Environmental
  • Time stamps, spatial information (speaker’s position, graphical configurations, gestural trajectories etc.)
• Interactional
  • Speaker (user state?), other addressees etc.
Combining basic components and data categories
Just to illustrate things…
Example

<semRep id="rep1">
  <event id="e0">
    <cat>utterance</cat>
    <!-- Pointer to speaker's characteristics -->
    <speaker target="Peter"/>
    <adressee target="System"/>
  </event>
  <event id="e1">
    <tense>present</tense>
    <evtType>wanttogo</evtType>
    …
  </event>
  <participant id="x">
    <num>sing</num>
  </participant>
  <relation source="x" target="e1">
    <role>agent</role>
  </relation>
</semRep>

• In black: basic components and mechanisms (meta-model of semantic representation)
• In blue: parameter component chosen from reference registries
  • Categories
  • Values

Peter: I want to go …
<semRep id="rep1">
  <event id="e0">
    <evtCat>utterance</evtCat>
    <speaker target="Peter"/>
    <adressee target="System"/>
    <alt>
      <dialAct cert="0.8">Order</dialAct>
      <dialAct cert="0.3">Inform</dialAct>
    </alt>
  </event>
  <event id="e1">
    <tense>present</tense>
    <voice>active</voice>
    <wh>none</wh>
    <evtType>wanttogo</evtType>
    …
  </event>
  <participant id="x">
    <lex>I</lex>
    <synCat>Pronoun</synCat>
    <num>sing</num>
    <pers>first</pers>
    …
  </participant>
  <participant id="y">
    <lex>Paris</lex>
    <synCat>ProperNoun</synCat>
    <pers>third</pers>
    …
  </participant>
  <participant id="z">
    <lex>Nancy</lex>
    <synCat>ProperNoun</synCat>
    <pers>third</pers>
    …
  </participant>
  <relation source="x" target="e1">
    <role>agent</role>
  </relation>
  <relation source="y" target="e1">
    <role>source</role>
  </relation>
  <relation source="z" target="e1">
    <role>goal</role>
  </relation>
</semRep>

I want to go from Paris to Nancy
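A consumer of such a representation has to resolve the `<alt>` alternatives and follow the `<relation>` links. The sketch below shows one way to do this with Python's standard `xml.etree.ElementTree`. It rests on several assumptions: the XML is a trimmed copy of the slide's example with straight quotes, the `cert` values are treated as plain scores (they do not sum to 1 on the slide), picking the highest score is an illustrative policy rather than one the slides prescribe, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example, with straight quotes so it parses.
SEM_REP = """
<semRep id="rep1">
  <event id="e0">
    <evtCat>utterance</evtCat>
    <alt>
      <dialAct cert="0.8">Order</dialAct>
      <dialAct cert="0.3">Inform</dialAct>
    </alt>
  </event>
  <event id="e1">
    <evtType>wanttogo</evtType>
  </event>
  <participant id="x"><lex>I</lex></participant>
  <participant id="y"><lex>Paris</lex></participant>
  <participant id="z"><lex>Nancy</lex></participant>
  <relation source="x" target="e1"><role>agent</role></relation>
  <relation source="y" target="e1"><role>source</role></relation>
  <relation source="z" target="e1"><role>goal</role></relation>
</semRep>
"""


def best_alternative(event):
    """Return the text of the highest-scored child of the event's <alt>."""
    alt = event.find("alt")
    if alt is None:
        return None
    # Assumed policy: keep the alternative with the largest cert score.
    best = max(alt, key=lambda el: float(el.get("cert", "0")))
    return best.text


def roles_for(root, event_id):
    """Map each role name to the <lex> of the participant filling it."""
    lex = {p.get("id"): p.findtext("lex") for p in root.iter("participant")}
    return {
        rel.findtext("role"): lex[rel.get("source")]
        for rel in root.iter("relation")
        if rel.get("target") == event_id
    }


root = ET.fromstring(SEM_REP)
utterance = root.find("event[@id='e0']")
dial_act = best_alternative(utterance)
roles = roles_for(root, "e1")
```

A module earlier in the pipeline could instead pass the whole `<alt>` block through unchanged, leaving the choice to a later stage; the representation supports both.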
<semRep id="rep1">
  <event id="e0">
    <evtCat>utterance</evtCat>
    <agent target="Peter"/>
    <adressee target="System"/>
    <dialAct>Order</dialAct>
  </event>
  <event id="e1">
    <tense>present</tense>
    <voice>active</voice>
    <wh>none</wh>
    <evtType>wanttogo</evtType>
    …
  </event>
  <event id="e2">
    <evtCat>gestural</evtCat>
    <agent target="Peter"/>
    <when>2002-02-2:02.02.02</when>
    <gestType>designation</gestType>
    <graphContext target="ctxt23"/>
  </event>
  <participant id="x">
    <lex>I</lex>
    …
  </participant>
  <participant id="y">
    <lex>here</lex>
    <synCat>adverb</synCat>
  </participant>
  <participant id="z">
    <lex>there</lex>
    <synCat>adverb</synCat>
  </participant>
  <relation source="y" target="e2">
    <MMLink>co-designation</MMLink>
  </relation>
</semRep>

I want to go from here to there
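The multimodal link in this example can be followed mechanically: from the participant "here", through the `co-designation` relation, to the gesture event and its graphical context. The sketch below illustrates that traversal with Python's standard `xml.etree.ElementTree`. The function name `designated_context` is hypothetical, the XML is a trimmed copy of the slide's example with straight quotes and a self-closed `graphContext`, and how a real fusion module would use the result is not specified in the slides.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's example, keeping only the gesture-related parts.
SEM_REP = """
<semRep id="rep1">
  <event id="e2">
    <evtCat>gestural</evtCat>
    <gestType>designation</gestType>
    <graphContext target="ctxt23"/>
  </event>
  <participant id="y"><lex>here</lex></participant>
  <participant id="z"><lex>there</lex></participant>
  <relation source="y" target="e2"><MMLink>co-designation</MMLink></relation>
</semRep>
"""


def designated_context(root, participant_id):
    """Follow a co-designation link from a participant to a gesture's context.

    Returns the graphContext target, or None when the participant
    has no co-designation link to a gesture event.
    """
    events = {e.get("id"): e for e in root.iter("event")}
    for rel in root.iter("relation"):
        if rel.get("source") != participant_id:
            continue
        if rel.findtext("MMLink") != "co-designation":
            continue
        gesture = events.get(rel.get("target"))
        if gesture is None:
            continue
        ctx = gesture.find("graphContext")
        if ctx is not None:
            return ctx.get("target")
    return None


root = ET.fromstring(SEM_REP)
here_ctx = designated_context(root, "y")
there_ctx = designated_context(root, "z")
```

In the slide only "here" carries a link, so "there" stays unresolved; returning `None` rather than failing reflects the underspecification and partiality constraint listed earlier.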
Future work
• SIGSEM Working group on meaning representations (ACL)
• Liaison with ISO TC37/SC4 - linguistic resources
• Preparation of a working draft
• Liaison with ISLE
• Liaison with SIGMedia and SIGDial
• W3C/VoiceXML