KI RuG. Multimedia search: From Lab to Web. Prof. dr. L. Schomaker. Invited lecture, presented at the 4e Colloque International sur le Document Electronique, 24-26 October 2001. Schomaker, L.R.B. (2001). Image Search and Annotation: From Lab to Web. Proceedings of CIDE 2001, pp. 373-375, ISBN 2-909285-17-0.
Overview • Methods in content-based image search • The user’s perspective: ergonomics, cognition and perception • Feeding the data-starved machine
Researchers • L. Schomaker • L. Vuurpijl • E. Deleau • E. Hoenkamp • A. Baris
A definition • In content-based image retrieval systems, the goal is to provide the user with a set of images, based on a query which consists, partly or completely, of pictorial information • Excluded: point-and-click navigation in pre-organized image bases
Image-based queries on WWW: existing methods and their problems • IBIR - image-based information retrieval • CBIR - content-based image retrieval • QBIC - query by image content • PBIR - pen-based image retrieval
Existing systems & prototypes • QBIC (IBM) • VisualSEEk (Columbia) • FourEyes (MIT Media Lab) • … and many more: (WebSEEk, Excalibur, ImageRover, Chabot, Piction) • Research: IMEDIA (Inria), Viper/GIFT (Marchand-Maillet)
Query Methods
Example 1. QBIC (IBM) • Features: • Colors, textures, edges, shape • Matching: • Layout, full-image templates, shape
The upper-left picture is the query • “boy in yellow raincoat” • …yields very counter-intuitive results • What was the user’s intention?
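To make the failure mode concrete, the following is a minimal sketch in Python/NumPy of global color-histogram matching, the kind of full-image measure such systems fall back on; it is not QBIC's actual code, and the function names and bin count are illustrative assumptions.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Global color histogram of an (H, W, 3) uint8 RGB image."""
    quantized = image.reshape(-1, 3) // (256 // bins)              # 0..bins-1 per channel
    codes = (quantized[:, 0] * bins + quantized[:, 1]) * bins + quantized[:, 2]
    hist = np.bincount(codes, minlength=bins ** 3).astype(float)
    return hist / hist.sum()                                       # normalize to a distribution

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (smaller = more similar)."""
    return np.abs(h1 - h2).sum()

def rank_by_color(query_image, database_images):
    """Indices of database images sorted by global color similarity to the query."""
    q = color_histogram(query_image)
    dists = [histogram_distance(q, color_histogram(img)) for img in database_images]
    return np.argsort(dists)
```

Under such a measure, any image dominated by yellow ranks high, whether it shows a boy in a raincoat or not; the user's object-level intention never enters the computation.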
Example 2. VisualSEEk • Features: • Colors, textures, edges, bitmap shape • Matching: • Layout, full-image templates
VisualSEEk (Columbia Univ.) • Layout- and feature-based query construction • Requires detailed user knowledge of pattern-recognition issues!
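The following minimal sketch, not VisualSEEk's actual algorithm, shows what a layout-plus-feature query amounts to; the region representation, the distance weights and the greedy matching rule are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Region:
    cx: float            # region center, normalized to [0, 1]
    cy: float
    color: tuple         # mean (R, G, B)

def region_match_cost(query: Region, candidate: Region,
                      w_pos: float = 1.0, w_col: float = 0.01) -> float:
    """Weighted cost combining spatial displacement and mean-color difference."""
    d_pos = ((query.cx - candidate.cx) ** 2 + (query.cy - candidate.cy) ** 2) ** 0.5
    d_col = sum(abs(a - b) for a, b in zip(query.color, candidate.color)) / 3.0
    return w_pos * d_pos + w_col * d_col

def layout_score(query_regions, image_regions):
    """Greedy layout match: each query region takes its cheapest image region."""
    return sum(min(region_match_cost(q, r) for r in image_regions)
               for q in query_regions)
```

Even this stripped-down version forces the user to reason about positions, color distances and their relative weighting, which is exactly the pattern-recognition knowledge noted above.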
Example 3. FourEyes (MIT Media Lab) • Imposed block segmentation • Textual annotation per block • Labels are propagated on the basis of texture matching
FourEyes… • Imposed block segmentation is unrelated to object placement • Object details are lost: only global and textural information remains • Interesting: a role for the user
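A minimal sketch of the grid-plus-propagation principle follows; it is not the actual FourEyes implementation (which uses a richer "society of models"), and the crude texture descriptor and nearest-neighbor rule are simplifying assumptions.

```python
import numpy as np

def block_texture_features(gray, grid=8):
    """Cut a grayscale image into grid x grid blocks and describe each block with
    a crude texture vector: mean, standard deviation, mean absolute gradient."""
    h, w = gray.shape
    bh, bw = h // grid, w // grid
    feats = []
    for i in range(grid):
        for j in range(grid):
            block = gray[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw].astype(float)
            grad = (np.abs(np.diff(block, axis=0)).mean()
                    + np.abs(np.diff(block, axis=1)).mean())
            feats.append([block.mean(), block.std(), grad])
    return np.array(feats)                       # shape: (grid * grid, 3)

def propagate_labels(features, user_labels):
    """user_labels: dict {block_index: label} given interactively by the user.
    Each unlabeled block takes the label of the nearest labeled block in feature space."""
    labeled = list(user_labels)
    result = dict(user_labels)
    for b in range(len(features)):
        if b not in result:
            dists = [np.linalg.norm(features[b] - features[l]) for l in labeled]
            result[b] = user_labels[labeled[int(np.argmin(dists))]]
    return result
```

The problems listed above are visible directly in this sketch: the grid ignores where objects actually are, and only coarse per-block statistics survive.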
Problems • Full-image template matching yields bad retrieval results • Feature-based matching requires a lot of input and knowledge from the user • Layout-based search only suits a subset of image needs • Grid-based partitioning misses details and breaks up meaningful objects
Problems… • Reasons behind a retrieved image list are unclear (Picard, 1995) • Features and matching scheme are not easily explainable to the user • An intelligent system should learn from previous queries of the user(s)
A statement • In content-based image retrieval systems, just as in text-based Information Retrieval, the performance of current systems is limited due to their incomplete and weak modeling of the user’s • Needs • Goals • Perception • Cognition (semantics)
User-Interfacing aspects • Computer users are continuously evaluating the value of system responses as a function of the effort spent on input actions (cost / benefit evaluation) • Consequence: after formulating a query with a large number of keystrokes, slider adjustments and mouse clicks, the quality of an image hit list is expected to be very high… • Conversely, user expectations are low when the effort only consists of a single mouse click
Pragmatic aspects • A survey on the WWW revealed that users are interested in objects (71%), not in layout, texture or abstract features • The preferred image type is photographs (68%)
Cognitive & Perceptual aspects • Objects are best recognized from 'canonical views' (Blanz et al., 1999) • Photographers know and utilize this phenomenon by manipulating camera attitude or objects
Photographs and paintings imply communication • [Diagram: World → Photographer / Painter → User, viewer; versus World → Surveillance camera = Computer Vision] • Problems of geometrical invariance are less extreme
Canonical Views • Non-canonical object orientation
Canonical Views • Canonical object orientation
More cognition: Basic-level object categories • In a hierarchy of object classes (an ontology), a node at the 'basic level' (Rosch et al., 1976) adds many structural features to its description compared with the level above, whereas only a few unique features are added when going down to a more specific node.
Basic-level categories, example • “furniture” [virtually no geometrical features] • “chair” [many clearly-defined structural features] • “kitchen chair” [only a few additional features]
Basic-level object categories and mental imagery • A basic level is the highest level for which clear mental imagery exists in an object ontology • A basic-level object elicits almost the same feature description when it is named or shown visually • Basic-level object descriptions often contain reference to structural components (parts) • In verbally describing the contents of a picture, people will tend to use 'basic-level' words. Rosch, E., Mervis, C.B., Gray, W.D., Johnson, D.M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382-439.
Implication of the ‘basic level’ category • The basic level forms a natural bridge between textual and pictorial information • It is likely to determine both the annotation and the search behavior of users • It is an ideal starting point for developing computer vision systems which (ultimately) generate text on the basis of a photograph
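As a toy illustration of that bridging role, the sketch below shows how annotation terms could be normalized towards the basic level; the ontology fragment simply mirrors the furniture / chair / kitchen chair example given earlier, and all names are hypothetical.

```python
# Hypothetical fragment of an object ontology, with the basic level marked.
# Following Rosch et al. (1976): the basic level ("chair") carries most of the
# structural features; the subordinate level ("kitchen chair") adds only a few.
ONTOLOGY = {
    "furniture":     {"parent": None,        "basic": False},   # superordinate
    "chair":         {"parent": "furniture", "basic": True},    # basic level
    "kitchen chair": {"parent": "chair",     "basic": False},   # subordinate
}

def to_basic_level(term):
    """Walk up the hierarchy to the nearest basic-level ancestor.
    Superordinate terms have no basic-level ancestor and are returned unchanged."""
    node = term
    while node is not None and not ONTOLOGY[node]["basic"]:
        node = ONTOLOGY[node]["parent"]
    return node if node is not None else term

assert to_basic_level("kitchen chair") == "chair"     # subordinate -> basic level
assert to_basic_level("furniture") == "furniture"     # cannot be specialized automatically
```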
Misconception about Perception and Cognition • “A picture is worth a thousand words”? True or False?
“A picture is worth a thousand words”…. • But many pictures could use a few words…!
“A picture is worth a thousand words”? This is part of a rocket engine by NASA
Assumptions • In image retrieval, photographs are the preferred media type • There is a predominant interest in objects (in the broad sense: including humans and animals) • The most likely level of description in real-world images is the “basic-level” category (Rosch et al.)
Goal: object-based image search • Object recognition in an open domain? Not possible yet. • Extensive annotation is needed in any case: for indexed access and for machine learning (MPEG-7 allows for sophisticated annotation) • But who is going to do the annotation: the content provider or the user, and how?
How to realize object-based image search? • Bootstrap process for pattern recognition • cf. the Cyc project (Lenat) and the Open Mind Initiative (Stork) • Collaborative, opportunistic annotation and object labeling (browser side) • Background learning process (server side)
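A minimal sketch, with hypothetical field names and an illustrative URL, of the kind of annotation record a browser-side tool could send to a server-side learning process; nothing in the slides prescribes this exact format.

```python
import json
import time

def make_annotation_record(image_url, outline_xy, label, annotator_id):
    """Package one user-supplied object annotation (an outline plus a label)."""
    return {
        "image_url": image_url,
        "outline": [[float(x), float(y)] for x, y in outline_xy],   # closed curve points
        "label": label,                       # ideally a basic-level term, e.g. "horse"
        "annotator": annotator_id,
        "timestamp": time.time(),
    }

# Browser side: serialize and send to the annotation server (transport not shown).
record = make_annotation_record("http://example.org/wildwest/0042.jpg",
                                [(10, 12), (40, 15), (38, 60), (9, 55)],
                                "horse", "user-17")
payload = json.dumps(record)

# Server side: accumulate records into an incrementally growing training set
# for the background learning process.
training_set = [json.loads(payload)]
```

Storing the outline together with its label makes the same record usable for indexed access now and as labeled training material later.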
Design considerations • Focus on object-based representations and queries • Material: photographs with identifiable objects for which a verbal description can be given • Exploit human perceptual abilities • Allow for incremental annotation to obtain a growing training set
Outline-based queries • In order to bridge the gap between what is currently possible and the ultimate goal of automatic object detection and classification, a closed curve drawn around a known object is used as a bootstrap representation: an outline • This closed curve itself contains shape information (XY, dXdY, curvature) and makes it possible to separate the visual object characteristics, represented by the pixels it encloses, from the background
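A minimal sketch, assuming NumPy and Matplotlib are available and using illustrative function names, of how such an outline serves both roles: as a mask that separates the enclosed pixels from the background, and as a shape signal (XY coordinates and their differences dX, dY).

```python
import numpy as np
from matplotlib.path import Path

def outline_mask(outline_xy, height, width):
    """Rasterize a closed outline (list of (x, y) points) into a boolean mask
    that is True for pixels enclosed by the curve."""
    ys, xs = np.mgrid[0:height, 0:width]
    pixel_centers = np.column_stack([xs.ravel() + 0.5, ys.ravel() + 0.5])
    inside = Path(outline_xy).contains_points(pixel_centers)
    return inside.reshape(height, width)

def separate_object(image, outline_xy):
    """Return the object pixels (background zeroed out) together with the shape
    signals mentioned above: the XY coordinates and their differences dX, dY."""
    h, w = image.shape[:2]
    mask = outline_mask(outline_xy, h, w)
    object_pixels = image * (mask[..., None] if image.ndim == 3 else mask)
    xy = np.asarray(outline_xy, dtype=float)
    dxdy = np.diff(np.vstack([xy, xy[:1]]), axis=0)    # wrap around: the curve is closed
    return object_pixels, xy, dxdy
```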
Examples of outlines from a “Wild West” base of photographs
More outline-based features • Lengths of radii from center of gravity • Curvature • Curvature scale space • Bitmap of an outline • Absolute Fourier transform |FFT| • Others (not tried yet): wavelets, Freeman coding ©2001 LRB Schomaker - KI/RuG
Outline features: coordinates, running angle (cos(φ), sin(φ)), radii, |FFT|
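A minimal sketch of the outline features listed above (radii from the center of gravity, running angle as (cos(φ), sin(φ)), and the Fourier magnitude |FFT|); the resampling to a fixed number of points and the normalizations are illustrative assumptions, not prescriptions from the slides.

```python
import numpy as np

def outline_features(outline_xy, n_points=64):
    """Radii from the center of gravity, running angle (cos(phi), sin(phi)),
    and Fourier magnitude |FFT| of a closed outline given as (x, y) points."""
    xy = np.asarray(outline_xy, dtype=float)

    # Resample the closed curve to n_points equidistant points along its arc length.
    closed = np.vstack([xy, xy[:1]])
    seg = np.linalg.norm(np.diff(closed, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    t = np.linspace(0.0, s[-1], n_points, endpoint=False)
    pts = np.column_stack([np.interp(t, s, closed[:, 0]),
                           np.interp(t, s, closed[:, 1])])

    # Radii from the center of gravity, normalized for scale invariance.
    cog = pts.mean(axis=0)
    radii = np.linalg.norm(pts - cog, axis=1)
    radii = radii / radii.max()

    # Running angle along the curve, encoded as (cos(phi), sin(phi)).
    d = np.diff(np.vstack([pts, pts[:1]]), axis=0)
    phi = np.arctan2(d[:, 1], d[:, 0])
    angle = np.column_stack([np.cos(phi), np.sin(phi)])

    # Fourier magnitude of the complex contour (translation removed); taking the
    # magnitude makes the descriptor independent of the starting point on the curve.
    z = (pts[:, 0] - cog[0]) + 1j * (pts[:, 1] - cog[1])
    fft_mag = np.abs(np.fft.fft(z))

    return radii, angle, fft_mag
```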