Beyond Multimedia Integration: corpora and annotations for cross-media decision mechanisms
Katerina Pastra
Language Technology Applications, Institute for Language and Speech Processing, Athens, Greece
The Multimedia Integration context
• MM integration mechanisms: ones that establish associations between medium-specific representations
• Applications: MM dialogue, MM indexing etc.
• Need: MM corpora annotated with such associations
• Existing collections with such annotations: IBM (Lin et al. 2003), the PASCAL visual object categorization challenge (Everingham et al. 2005), and some ad hoc created collections
These collections provide a finite set of textual labels associated with the visual medium (keyframe/image or image region). Adequate for MM integration, but what lies beyond?
Overview
• Beyond MM integration: cross-media decision mechanisms
  - notion and example application (cross-media indexing)
• Corpora for cross-media mechanisms
  - corpus characteristics & the REVEAL THIS corpus
• Annotations for cross-media mechanisms
  - cross-media relations
  - cross-media annotation types
  - annotation language requirements
  - MPEG-7 & EMMA: suitability for such annotations
The notion of Cross-Media Decision Mechanisms
Mechanisms that decide on the relation that holds between medium-specific pieces of information:
• across documents (Boll et al. 1999)
• within documents (Pastra & Piperidis 2006, EuroITV conf.)
The mechanisms decide whether medium-specific pieces of information within the same multimedia document are:
• associated (multimedia integration)
• complementary
• semantically compatible/incompatible
[Slide diagram labels: equivalence, complementarity, independence]
Cross-media Relation Examples (each phrase originally paired with a keyframe/image on the slide)
• Equivalence: “the yellow taxi-boats…”
• Non-essential complementarity: “…we are heading to Patmos…”
• Essential complementarity: “…[pollution has taken its toll] on that…”
• Independence: “…I have finally found a place that’s not overrun by tourists…”
Cross-media relations
• Equivalence: info expressed by different media refers to the same entity (object, state, event or property)
• Complementarity: info in one medium is an (essential or not) complement of the info expressed in another
  - Essential complementarity: usually indicated through association signals (e.g. indexicals)
  - Non-essential complementarity: info in one medium is a modifier/adjunct of info expressed in another
• Independence: each medium carries an independent (but coherent) part of the MM message
  - Incoherence arises from errors in medium-specific processing or from artistic/editorial choices
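A minimal sketch of how this taxonomy could be encoded for annotation or processing; the class and member names below are illustrative assumptions, not taken from the original scheme.

```python
# Illustrative encoding of the cross-media relation taxonomy described above.
from enum import Enum

class CrossMediaRelation(Enum):
    EQUIVALENCE = "equivalence"                            # same entity denoted in both media
    ESSENTIAL_COMPLEMENTARITY = "essential_comp"           # signalled, e.g. by indexicals ("on that")
    NON_ESSENTIAL_COMPLEMENTARITY = "non_essential_comp"   # modifier/adjunct-like information
    INDEPENDENCE = "independence"                          # coherent but independent contributions
```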
Application example: a cross-media indexer’s decisions
Candidate index labels: 1) Landscape – sea/coast  2) Landscape – people
[Slide diagram: the relation detected between the transcript phrase and the imagery determines which label(s) are chosen]
• Equivalence: “the yellow taxi-boats…”
• Independence: “…I have finally found a place that’s not overrun by tourists…”
• Essential complementarity: “…[pollution has taken its toll] on this…”
• Non-essential complementarity: “…we are heading to Patmos…”
How shall we develop such mechanisms?
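A sketch of how such indexing decisions might be operationalised; the label names, rationales and decision policy are assumptions for illustration, not the presented system.

```python
# Illustrative sketch only: how a cross-media indexer might use the detected
# relation between a transcript span and a visual unit to pick index labels.

def index_decision(relation, text_label, visual_label):
    """Return {index_label: rationale} for one transcript-span / visual-unit pair."""
    if relation == "equivalence":
        # Both media denote the same entity, so the shared label is reinforced.
        return {visual_label: "confirmed by both media"}
    if relation == "essential_complementarity":
        # The text is incomplete without the image (e.g. "on that"); the
        # combined multimedia unit carries the indexable content.
        return {visual_label: "resolves the reference in the text"}
    if relation == "non_essential_complementarity":
        # The text adds modifier-like detail; index both contributions together.
        return {visual_label: "from imagery", text_label: "adjunct from text"}
    if relation == "independence":
        # Each medium contributes its own, separately indexable content.
        return {visual_label: "from imagery only", text_label: "from text only"}
    return {}

print(index_decision("equivalence", "taxi-boat", "landscape-sea/coast"))
```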
Corpora for x-media mechanisms
Such a corpus should consist of:
• multimedia documents (e.g. video, illustrated text) – required for within-document algorithms
• multi-genre & multi-domain documents – to cover medium-specific processor needs
The REVEAL THIS corpus:
• Multilingual (EL-EN): part of it parallel, the rest comparable
• Multimedia: MPEG-2 videos (TV programmes, DVD documentaries etc.), but also radio and web text (UTF-8)
• Multi-domain: Politics, Travel, News
• Multi-genre, to accommodate read vs. spontaneous speech, face-rich vs. object-rich imagery, formal vs. colloquial language
Annotation types
Equivalence:
• Association – A(X,Y)
• Partial Association – PA(X,Y), e.g. “coloured” + an image: “attribute” in text vs. “value” in image
Complementarity:
• Association Signal – AS(X,Y)
• Adjunct – AJ(X,Y)
• Apposition – AP(X,Y), e.g. “the president” + an image of the person; the pair forms one unit, do not associate as if type:token
Independence:
• Coherence – CH(X,Y)
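A minimal data-model sketch for the tag inventory above, linking a text span X to a visual unit Y; the record structure and field names are assumptions, not the project’s actual annotation format.

```python
# Illustrative data model for the cross-media annotation tags listed above.
from dataclasses import dataclass

# Tag inventory from the slide: A, PA (equivalence); AS, AJ, AP (complementarity); CH (independence).
TAGS = {"A", "PA", "AS", "AJ", "AP", "CH"}

@dataclass
class TextSpan:
    doc_id: str
    start_token: int
    end_token: int

@dataclass
class VisualUnit:
    doc_id: str
    unit_id: str          # e.g. a keyframe id or an image-region id

@dataclass
class CrossMediaAnnotation:
    tag: str              # one of TAGS
    x: TextSpan
    y: VisualUnit

    def __post_init__(self):
        if self.tag not in TAGS:
            raise ValueError(f"unknown cross-media tag: {self.tag}")

# Example: an Association between a transcript span and a keyframe.
ann = CrossMediaAnnotation("A", TextSpan("doc1", 12, 14), VisualUnit("doc1", "kf_0042"))
```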
Annotation language requirements
A markup language should allow for:
• modular description of the structure of a MM document and of the media it consists of, to facilitate indication of relations between media with different levels of structural granularity (e.g. token – image region, token – set of keyframes etc.)
  - MPEG-7 is ideally suited for such description
• creation of a multimedia unit with re-defined properties in cases of essential conjunction of medium-specific pieces of info forming a MM message (e.g. in essential complementarity cases)
  - EMMA is ideally suited for such a task
MPEG-7 and EMMA
MPEG-7: ISO standard for describing MM content
• low-level feature descriptors (e.g. colour, motion etc.)
• high-level feature descriptors (object, event etc.)
• structure descriptors
• relations between structural units (e.g. spatial, temporal etc.)
• textual annotations for each unit (e.g. controlled vocabulary etc.)
EMMA: W3C standard (working draft) for describing the output of medium-specific processors and their integration in MM user-input scenarios
• hook element ~ association signal
• composite derivation ~ creation of a MM unit, e.g. “destination” + pen pointing to the Boston image region on screen
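A sketch of the idea behind EMMA’s hook/composite mechanism, expressed as plain Python rather than EMMA XML; the slot names and merge policy are illustrative and do not reproduce the standard’s actual syntax.

```python
# Sketch: merge two medium-specific interpretations into one multimedia unit
# by filling the slot the speech interpretation left open (the "hook").

def compose(speech_interp: dict, gesture_interp: dict) -> dict:
    merged = dict(speech_interp)
    for slot, value in list(merged.items()):
        if value == "HOOK":                              # unresolved slot signalled by speech
            merged[slot] = gesture_interp.get("referent", value)
    merged["derived_from"] = ["speech", "gesture"]       # record the composite derivation
    return merged

# e.g. "destination" uttered while the pen points to the Boston region on screen
speech = {"intent": "set_destination", "destination": "HOOK"}
gesture = {"referent": "Boston"}
print(compose(speech, gesture))
# {'intent': 'set_destination', 'destination': 'Boston', 'derived_from': ['speech', 'gesture']}
```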
Conclusions
• Within-document cross-media decision mechanisms
• need multimedia, multi-genre, multi-domain corpora
• annotated with a finite set of description elements
• that allow for indicating the cross-media relations (equivalence, complementarity, independence) that hold in the MM documents
• using a markup language that combines features of MPEG-7 and EMMA
A timely cooperation between the two schemes?