KI RuG. Multimedia search: From Lab to Web. Prof. dr. L. Schomaker. Invited lecture, presented at the 4e Colloque International sur le Document Electronique, 24-26 October 2001. Schomaker, L.R.B. (2001). Image Search and Annotation: From Lab to Web. Proceedings of CIDE 2001, pp. 373-375, ISBN 2-909285-17-0.
Overview • Methods in content-based image search • The user’s perspective: ergonomics, cognition and perception • Feeding the data-starved machine
Researchers • L. Schomaker • L. Vuurpijl • E. Deleau • E. Hoenkamp • A. Baris
A definition • In content-based image retrieval systems, the goal is to provide the user with a set of images, based on a query which consists, partly or completely, of pictorial information • Excluded: point-and-click navigation in pre-organized image bases
Image-based queries on WWW: existing methods and their problems • IBIR - image-based information retrieval • CBIR - content-based image retrieval • QBIC - query by image content • PBIR - pen-based image retrieval
Existing systems & prototypes • QBIC (IBM) • VisualSEEk (Columbia) • FourEyes (MIT Media Lab) • … and many more: (WebSEEk, Excalibur, ImageRover, Chabot, Piction) • Research: IMEDIA (Inria), Viper/GIFT (Marchand-Maillet)
Query Methods
Example 1. QBIC (IBM) • Features: • Colors, textures, edges, shape • Matching: • Layout, full-image templates, shape
The upper-left picture is the query • “boy in yellow raincoat” • …yields very counter-intuitive results • What was the user’s intention?
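To make the failure mode concrete, the following is a minimal sketch in Python/NumPy of global color-histogram matching, the kind of full-image measure such systems fall back on; it is not QBIC's actual code, and the function names and bin count are illustrative assumptions.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram of an (H, W, 3) uint8 RGB image."""
    quantized = image.reshape(-1, 3) // (256 // bins)              # 0..bins-1 per channel
    codes = (quantized[:, 0] * bins + quantized[:, 1]) * bins + quantized[:, 2]
    hist = np.bincount(codes, minlength=bins ** 3).astype(float)
    return hist / hist.sum()                                       # normalize to a distribution

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (smaller = more similar)."""
    return np.abs(h1 - h2).sum()

def rank_by_color(query_image, database_images):
    """Indices of database images sorted by global color similarity to the query."""
    q = color_histogram(query_image)
    dists = [histogram_distance(q, color_histogram(img)) for img in database_images]
    return np.argsort(dists)
```

Under such a measure, any image dominated by yellow ranks high, whether it shows a boy in a raincoat or not; the user's object-level intention never enters the computation.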
Example 2. VisualSEEk • Features: • Colors, textures, edges, bitmap shape • Matching: • Layout, full-image templates
VisualSEEk (Columbia Univ.) • Layout- and feature-based query construction • Requires detailed user knowledge of pattern-recognition issues!
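The following minimal sketch, not VisualSEEk's actual algorithm, shows what a layout-plus-feature query amounts to; the region representation, the distance weights and the greedy matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Region:
    cx: float            # region center, normalized to [0, 1]
    cy: float
    color: tuple         # mean (R, G, B)

def region_match_cost(query: Region, candidate: Region,
                      w_pos: float = 1.0, w_col: float = 0.01) -> float:
    """Weighted cost combining spatial displacement and mean-color difference."""
    d_pos = ((query.cx - candidate.cx) ** 2 + (query.cy - candidate.cy) ** 2) ** 0.5
    d_col = sum(abs(a - b) for a, b in zip(query.color, candidate.color)) / 3.0
    return w_pos * d_pos + w_col * d_col

def layout_score(query_regions, image_regions):
    """Greedy layout match: each query region takes its cheapest image region."""
    return sum(min(region_match_cost(q, r) for r in image_regions)
               for q in query_regions)
```

Even this stripped-down version forces the user to reason about positions, color distances and their relative weighting, which is exactly the pattern-recognition knowledge noted above.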
Example 3. FourEyes (MIT Media Lab) • Imposed block segmentation • Textual annotation per block • Labels are propagated on the basis of texture matching
FourEyes… • Imposed block segmentation is unrelated to object placement • Object details are lost: only global and textural information remains • Interesting: a role for the user
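A minimal sketch of the grid-plus-propagation principle follows; it is not the actual FourEyes implementation (which uses a richer "society of models"), and the crude texture descriptor and nearest-neighbor rule are simplifying assumptions.

```python
import numpy as np

def block_texture_features(gray, grid=8):
    """Cut a grayscale image into grid x grid blocks and describe each block with
    a crude texture vector: mean, standard deviation, mean absolute gradient."""
    h, w = gray.shape
    bh, bw = h // grid, w // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = gray[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].astype(float)
            grad = (np.abs(np.diff(block, axis=0)).mean()
                    + np.abs(np.diff(block, axis=1)).mean())
            feats.append([block.mean(), block.std(), grad])
    return np.array(feats)                       # shape: (grid * grid, 3)

def propagate_labels(features, user_labels):
    """user_labels: dict {block_index: label} given interactively by the user.
    Each unlabeled block takes the label of the nearest labeled block in feature space."""
    labeled = list(user_labels)
    result = dict(user_labels)
    for b in range(len(features)):
        if b not in result:
            dists = [np.linalg.norm(features[b] - features[l]) for l in labeled]
            result[b] = user_labels[labeled[int(np.argmin(dists))]]
    return result
```

The problems listed above are visible directly in this sketch: the grid ignores where objects actually are, and only coarse per-block statistics survive.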
Problems • Full-image template matching yields bad retrieval results • Feature-based matching requires a lot of input and knowledge from the user • Layout-based search only suits a subset of image needs • Grid-based partitioning misses details and breaks up meaningful objects
Problems… • Reasons behind a retrieved image list are unclear (Picard, 1995) • Features and matching scheme are not easily explainable to the user • An intelligent system should learn from previous queries of the user(s)
A statement • In content-based image retrieval systems, just as in text-based Information Retrieval, the performance of current systems is limited due to their incomplete and weak modeling of the user’s • Needs • Goals • Perception • Cognition (semantics)
User-Interfacing aspects • Computer users are continuously evaluating the value of system responses as a function of the effort spent on input actions (cost / benefit evaluation) • Consequence: after formulating a query with a large number of keystrokes, slider adjustments and mouse clicks, the quality of an image hit list is expected to be very high… • Conversely, user expectations are low when the effort only consists of a single mouse click
Pragmatic aspects • A survey on the WWW revealed that users are interested in objects (71%), not in layout, texture or abstract features • The preferred image type is photographs (68%)
Cognitive & Perceptual aspects • Objects are best recognized from 'canonical views' (Blanz et al., 1999) • Photographers know and utilize this phenomenon by manipulating camera attitude or objects
Photographs and paintings imply communication • [Diagram: World → Photographer / Painter → User, viewer; versus World → Surveillance camera = Computer Vision] • Problems of geometrical invariance are less extreme
Canonical Views • Non-canonical object orientation
Canonical Views • Canonical object orientation
More cognition: Basic-level object categories • In a hierarchy of object classes (an ontology), a node at the 'basic level' (Rosch et al., 1976) adds many structural features to its description compared with the level above, whereas only a few unique features are added when going down to a more specific node.
Basic-level categories, example • “furniture” [virtually no geometrical features] • “chair” [many clearly-defined structural features] • “kitchen chair” [only a few additional features]
Basic-level object categories and mental imagery • A basic level is the highest level for which clear mental imagery exists in an object ontology • A basic-level object elicits almost the same feature description when it is named or shown visually • Basic-level object descriptions often contain reference to structural components (parts) • In verbally describing the contents of a picture, people will tend to use 'basic-level' words. Rosch, E., Mervis, C.B., Gray, W.D., Johnson, D.M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Implication of the ‘basic level’ category • The basic level forms a natural bridge between textual and pictorial information • It is likely to determine both the annotation and the search behavior of users • It is an ideal starting point for developing computer vision systems which (ultimately) generate text on the basis of a photograph
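As a toy illustration of that bridging role, the sketch below shows how annotation terms could be normalized towards the basic level; the ontology fragment simply mirrors the furniture / chair / kitchen chair example given earlier, and all names are hypothetical.

```python
# Hypothetical fragment of an object ontology, with the basic level marked.
# Following Rosch et al. (1976): the basic level ("chair") carries most of the
# structural features; the subordinate level ("kitchen chair") adds only a few.
ONTOLOGY = {
    "furniture":     {"parent": None,        "basic": False},   # superordinate
    "chair":         {"parent": "furniture", "basic": True},    # basic level
    "kitchen chair": {"parent": "chair",     "basic": False},   # subordinate
}

def to_basic_level(term):
    """Walk up the hierarchy to the nearest basic-level ancestor.
    Superordinate terms have no basic-level ancestor and are returned unchanged."""
    node = term
    while node is not None and not ONTOLOGY[node]["basic"]:
        node = ONTOLOGY[node]["parent"]
    return node if node is not None else term

assert to_basic_level("kitchen chair") == "chair"     # subordinate -> basic level
assert to_basic_level("furniture") == "furniture"     # cannot be specialized automatically
```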
Misconception about Perception and Cognition • “A picture is worth a thousand words”? True or False?
“A picture is worth a thousand words”…. • But many pictures could use a few words…!
“A picture is worth a thousand words”? This is part of a rocket engine by NASA
Assumptions • In image retrieval, photographs are the preferred media type • There is a predominant interest in objects (in the broad sense: including humans and animals) • The most likely level of description in real-world images is the “basic-level” category (Rosch et al.)
Goal: object-based image search • Object recognition in an open domain? Not possible yet. • Extensive annotation is needed in any case: for indexed access and for machine learning (MPEG-7 allows for sophisticated annotation) • But who is going to do the annotation: the content provider or the user, and how?
How to realize object-based image search? • Bootstrap process for pattern recognition • cf. the Cyc project (Lenat) and the Open Mind Initiative (Stork) • Collaborative, opportunistic annotation and object labeling (browser side) • Background learning process (server side)
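A minimal sketch, with hypothetical field names and an illustrative URL, of the kind of annotation record a browser-side tool could send to a server-side learning process; nothing in the slides prescribes this exact format.

```python
import json
import time

def make_annotation_record(image_url, outline_xy, label, annotator_id):
    """Package one user-supplied object annotation (an outline plus a label)."""
    return {
        "image_url": image_url,
        "outline": [[float(x), float(y)] for x, y in outline_xy],   # closed curve points
        "label": label,                       # ideally a basic-level term, e.g. "horse"
        "annotator": annotator_id,
        "timestamp": time.time(),
    }

# Browser side: serialize and send to the annotation server (transport not shown).
record = make_annotation_record("http://example.org/wildwest/0042.jpg",
                                [(10, 12), (40, 15), (38, 60), (9, 55)],
                                "horse", "user-17")
payload = json.dumps(record)

# Server side: accumulate records into an incrementally growing training set
# for the background learning process.
training_set = [json.loads(payload)]
```

Storing the outline together with its label makes the same record usable for indexed access now and as labeled training material later.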
Design considerations • Focus on object-based representations and queries • Material: photographs with identifiable objects for which a verbal description can be given • Exploit human perceptual abilities • Allow for incremental annotation to obtain a growing training set
Outline-based queries • In order to bridge the gap between what is currently possible and the ultimate goal of automatic object detection and classification, a closed curve drawn around a known object is used as a bootstrap representation: an outline • This closed curve itself contains shape information (XY, dXdY, curvature) and makes it possible to separate the visual object characteristics, represented by the pixels it encloses, from the background
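A minimal sketch, assuming NumPy and Matplotlib are available and using illustrative function names, of how such an outline serves both roles: as a mask that separates the enclosed pixels from the background, and as a shape signal (XY coordinates and their differences dX, dY).

```python
import numpy as np
from matplotlib.path import Path

def outline_mask(outline_xy, height, width):
    """Rasterize a closed outline (list of (x, y) points) into a boolean mask
    that is True for pixels enclosed by the curve."""
    ys, xs = np.mgrid[0:height, 0:width]
    pixel_centers = np.column_stack([xs.ravel() + 0.5, ys.ravel() + 0.5])
    inside = Path(outline_xy).contains_points(pixel_centers)
    return inside.reshape(height, width)

def separate_object(image, outline_xy):
    """Return the object pixels (background zeroed out) together with the shape
    signals mentioned above: the XY coordinates and their differences dX, dY."""
    h, w = image.shape[:2]
    mask = outline_mask(outline_xy, h, w)
    object_pixels = image * (mask[..., None] if image.ndim == 3 else mask)
    xy = np.asarray(outline_xy, dtype=float)
    dxdy = np.diff(np.vstack([xy, xy[:1]]), axis=0)    # wrap around: the curve is closed
    return object_pixels, xy, dxdy
```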
Examples of outlines from a “Wild West” base of photographs
More outline-based features • Lengths of radii from center of gravity • Curvature • Curvature scale space • Bitmap of an outline • Absolute Fourier transform |FFT| • Others (not tried yet): wavelets, Freeman coding ©2001 LRB Schomaker - KI/RuG
Outline features: coordinates, running angle (cos(φ), sin(φ)), radii, |FFT|
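A minimal sketch of the outline features listed above (radii from the center of gravity, running angle as (cos(φ), sin(φ)), and the Fourier magnitude |FFT|); the resampling to a fixed number of points and the normalizations are illustrative assumptions, not prescriptions from the slides.

```python
import numpy as np

def outline_features(outline_xy, n_points=64):
    """Radii from the center of gravity, running angle (cos(phi), sin(phi)),
    and Fourier magnitude |FFT| of a closed outline given as (x, y) points."""
    xy = np.asarray(outline_xy, dtype=float)

    # Resample the closed curve to n_points equidistant points along its arc length.
    closed = np.vstack([xy, xy[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, s[-1], n_points, endpoint=False)
    pts = np.column_stack([np.interp(t, s, closed[:, 0]),
                           np.interp(t, s, closed[:, 1])])

    # Radii from the center of gravity, normalized for scale invariance.
    cog = pts.mean(axis=0)
    radii = np.linalg.norm(pts - cog, axis=1)
    radii = radii / radii.max()

    # Running angle along the curve, encoded as (cos(phi), sin(phi)).
    d = np.diff(np.vstack([pts, pts[:1]]), axis=0)
    phi = np.arctan2(d[:, 1], d[:, 0])
    angle = np.column_stack([np.cos(phi), np.sin(phi)])

    # Fourier magnitude of the complex contour (translation removed); taking the
    # magnitude makes the descriptor independent of the starting point on the curve.
    z = (pts[:, 0] - cog[0]) + 1j * (pts[:, 1] - cog[1])
    fft_mag = np.abs(np.fft.fft(z))

    return radii, angle, fft_mag
```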