Semantic Multimedia

Semantic Multimedia Steffen Staab, Univ. Sheffield March 28, 2006

My private challenge ….and more than 17,000 other images and mini-movies

Send family recent christmas photos… Send friends pictures that include them… Ask for pictures that depict my children at carnival with a big smile in order to make a presentation about semantic multimedia… Show a friend where we live… Exchange opinions about what are the best shots Record „photo copies“ of signed contracts Query for all architecture images built for X-Media proposal… List to be continued… Store pictures in a folder Query for picture name and date What I would like to do… vs What I can do The Semantic Gap

Strategies for Narrowing the Semantic Gap • Image understanding • Scene classification • People recognition (that, who) • Artifact recognition • Context understanding • Shared Annotation • User Feedback • Highly domain-dependent, but so far: • little domain knowledge Google Flickr Little explored wrt Semantics (e.g.Santini etal.)

Simple Annotations - TagFS

Annotation is expensive as a dedicated process Allow for on-the-fly annotation from all existing applications Motivation Semantically Enabled App Arbitrary Application Semantic Metadata Repository Arbitrary Metadata Format

Retrieval strategies cannot be easily anticipated by users No fixed schema Motivation

Allow for on-the-fly annotation from all existing applications  Annotation through a filesystem interface No fixed schema  Avoid hierarchical filesystem organisation Concept

Use the Filesytem to Link Arbitrary Applications with a Metadata Repository Semantically Enabled App Arbitrary Application ? Virtual Filesystem OS Kernel Semantic Metadata Repository WebDAV FAT32

Queries filesystem paths correspond to queries each directory translates to a parametrized view on the metadata repository views can be nested Example “/tag beach/depictssteffen“ translates into depicts(„steffen“, tag(„beach“, /)) “/” represents metadata repository Metadata Every file is associated with tags given by names of super directories Allowing for linking of directories Multiple hierarchies Directories represent… at creation time at exploration time

listing a directory (= view) returns all information objects corresponding to the view information objects may be files, bookmarks, chapters, ... Class handlers implement file system operations (read, write, ...) for a class of information objects (files, bookmarks,…) RDFFS maps filesystem operations to operations on a RDF-repository Views and Class Handlers provide tagging for files Annotated Information Objects

Architecture II Semantically Enabled App TagFS Arbitrary Application Arbitrary Application Views Class Handlers RDFFs Virtual Filesystem OS Kernel Fuse Semantic Metadata Repository WebDAV FAT32

Take Picture of Steffen and Simon in Sheffield and store it at /tag image/tag sheffield/depicts simon/picture1 Take Picture of Steffen in Rome and store it at /tag rome/depicts steffen/tag image/picture2 Record video of Sheffield and store it at /tag video/tag sheffield/video1 notice that picture1 also depicts Steffen. link picture1 to /depicts Steffen Usage Examples: Annotate

all documents related to Sheffield: /tag sheffield picture1 video1 all images depicting Steffen: /tag image/depicts steffen picture1 picture2 Usage Examples: Retrieve

Shared Annotations:SEA – Semantic Exchange Architecture

Use Case: Virtual Organizations • Project members need to share, i.e. distribute and retrieve, confidential information among each other • Different members have different roles, e.g. manager, researcher, that require different views onto the shared data Lucy@Sheffield Sergej@Koblenz X-Media X-Media Deliverables Contracts Work Package 1 Work Package 2

Use Case: Image Sharing • User X has many images • X wants to share some images publicly, some only with dedicated persons, and some not at all • Due to the amount of images, uploading many images to a central repository is not an option

Use Case: Image Sharing • Neither X nor X's friends want to pay for a dedicated server or hand over their images to a server managed by a 3rd party • They would like to user their own storage

SEA Purpose: • Decentralizedinformation sharing, e.g. image sharing • Tagging as means for • Personal and collaborative organization of information • Information retrieval • Access control

SEA: Architecture • RDF store for meta data • DHT implementation for efficient distribution of shared information

SEA: Features • Autonomy for information distribution and sharing • Flexible information organization • Simple setup and administration of sharing environment • Privacy, data security • Ad-hoc collaboration

Centralized vs. Distributed Sharing with SEA Conventional informationsharing characterized by centralization SEA follows a distributed approach

Access Control in SEA • Access control mechanisms allow to define with whom to share data • based on taggings • e.g. everything tagged as „public“ is public • e.g. everything tagged as „forSteffen“ is accessible for Steffen • based on rules for access

SEA: Data Model • Ontological meta model

Image understanding using ontologies

What is this?

Solution Better use context and background knowledge

Region-Based Image Labelling • Find semantically meaningful regions • Label them with concepts • Infer higher level annotations from initial labellings • Provide user-centred, semantic annotation The overall aim is to improve the access to multimedia content.

Scene classification Image Labeling: 1. Initial, region-based Output: • Segment Classification • Hypothesis set of possible labels for image segments • Degree of confidence

Scene classification Image Labeling: 1. Initial, region-based Output of Person/Face Detection: • Bounding boxes for detected persons/faces • Degree of confidence

B2={l1,l2} A1 A2 A1 over A2 A1 overlaps B2 …  Spatial & topological information Multi-Tier Image Model Segments Label hypotheses Confidence values Bounding boxes Classification of picture ….

Multimedia Reasoning • Aim, now: • integrate available information towards • global, • consistent and • user-oriented annotation • 3 tasks: • Consistency Checking • Region Merging • Generation of a higher-level, user-centered annotation Current Focus

Constraint Satisfaction Problem (CSP) Check that label(s) of a region are consistent wrt labels of neighboring regions Ideally: Leaves one correct label per region More often: more than one label remains decision in favor of highest confidence values Process consists of Transformation of multi-tier description into a CSP Application of constraint reasoning to solve the CSP Computing the “best” labeling using the confidence values Consistency Checking

Constraint Satisfaction Problem (CSP) Definition • Consists of set of variables and set of constraints relating several variables • Each variable may have values from it’s domain • A constraint defines which values can be assigned to a variable depending on the related variables • Standard methods exist to solve the CSP • Two steps: • Consistency checking, i.e. removal of values from the domain that never satisfy the constraints • Computation of full solutions using search algorithms(i.e. model generation)

Constraint Satisfaction Problems Example: • Variables: x, y, z • Domains: D(x) = {1, 2, 3}, D(y) = {2, 3, 4}, D(z) = {2, 3, 4, 5} • Constraints: x >= y, y >= z • After consistency checking: D(x) = {2, 3}, D(y) ={2, 3}, D(z) = {2, 3} • Concrete Solutions (models): (2, 2, 2), (3, 2, 2), … • Not all possible combinations of domain values are a solution!

Image Labelling as a CSP • Each segment s is transformed into a variable vs • Initial Labellings Ls are the domains of the segment variables, D(vs)=Ls • For each spatial relation type, a constraint sp-rel(v,w) is defined • the spatial constraints define which value combinations are legal for the given relation • e.g. left-of(v,w):={(sea,sea),(sky,sky)}, but not (sky,sea) • If two segments s,t are related with a spatial relation sp-rel, a corresponding spatial constraint sp-rel(vs,vt) is instantiated

Example Initial image Segmentation Mask

Confidence Values

Sky can not be left of Sea Confidence Values Spatial Relations

Sea can not be above Sky Confidence Values Spatial Relations

Definition of Spatial Constraints • Spatial constraints form an integral part of the domain knowledge used for multimedia reasoning • Currently they are explicitly defined by a domain expert • But: • Seems not feasible for large amounts of concepts and relations • Preferably each constraint should be accompanied by a confidence value, which can hardly be defined by an expert • Idea: • Learn constraints from pre-annotated images • Allow for later refinement during run-time by user interaction. • Planned extension of M-OntoMat-Annotizer for this purpose • http://www.acemedia.org/aceMedia/results/software/m-ontomat-annotizer.html

M-Ontomat-Annotizer

Current Status: • Use of segment classification • Very recently: integration of person/face detection module

Initial Evaluation • An evaluation framework for region-based image labelling was defined within the aceMedia project. • Ground Truth is defined on a grid-basis • a N x N grid is layered on top of each image • each cell is annotated with all depicted concepts • For evaluation the segments of the segmentation, or the bounding boxes, are mapped to the respective cells. • For each concept it is counted how often • the concept was found correctly, i.e. a correspondence between the segment label and a grid label is found • the concept was found in general • the concept exists in the GT • Based on these values precision and recall for each concept, and the overall process can be defined.

Evaluation Results • Since the method is based on content analysis modules, we evaluated the improvement reached by applying the constraint reasoning to the segment classification. • First, precision, recall and the F-Measure were computed for the segment classification • Then, for the CSP method applied to the initial labelling • Finally the average improvement was calculated

Evaluation Results Segment Classification Constraint Reasoning • Set Up: • Evaluation with ~60 images • A 8x8 grid was used for the ground truth • The segmentation was set up to always produce 8 segments per image • Results are promising, showing an 10% increase in average. • However, results in the overall performance are needed

Next Steps • Soft Constraint Reasoning • Fuzzy Constraints to integrate the confidence values into the reasoning • Incremental Constraints to flexibly add constraints during reasoning • both should provide for more robust results and lead to better reduction of the initial label sets • Incorporation of a region merging step • Would enable an iterative process and a knowledge-based segmentation • Derivation of a higher-level annotation • Currently a simple combination of confidence values is applied • the maximum degree for each concept is kept, and each concept is added to the final annotation • later also relations and additional concepts should be inferred

Semantic Multimedia

Semantic Multimedia

Presentation Transcript

Publishing Cultural Heritage? Semantic Multimedia Content Management

Semantic Integration and Retrieval of Multimedia Metadata

XG Multimedia Semantic News Use Case

Multimedia Semantic Web and MPEG-7

The Multimedia Semantic Web

Semantic Multimedia Web

Multimedia Semantic Analysis in the PrestoSpace Project

Multimedia Semantics and the Semantic Web

Semantic Web - Multimedia Ontology-

Bridge Semantic Gap: A Large Scale Concept Ontology for Multimedia (LSCOM)

Semantic Web - Multimedia Annotation –

Semantic multimedia annotation tool Tutorial authors : Batatia, Piombo hadj.batatia@enseeiht.fr

Semantic Adaptation Of Multimedia Documents

Semantic driven multimedia presentations

Multimedia on the Semantic Web

Publishing Cultural Heritage? Semantic Multimedia Content Management

Semantic Web - Multimedia Ontology-