
Multimedia search: From Lab to Web

KI, RuG. Multimedia search: From Lab to Web. Prof. dr. L. Schomaker. Invited lecture, presented at the 4th Colloque International sur le Document Electronique, 24-26 October 2001. Schomaker, L.R.B. (2001). Image Search and Annotation: From Lab to Web.


Presentation Transcript


  1. KI RuG Multimedia search: From Lab to Web prof. dr. L. Schomaker Invited lecture, presented at the 4th Colloque International sur le Document Electronique, 24-26 October 2001. Schomaker, L.R.B. (2001). Image Search and Annotation: From Lab to Web. Proceedings of CIDE 2001, pp. 373-375, ISBN 2-909285-17-0.

  2. Overview • Methods in content-based image search • The user’s perspective: ergonomics, cognition and perception • Feeding the data-starved machine ©2001 LRB Schomaker - KI/RuG

  3. Researchers • L. Schomaker • L. Vuurpijl • E. Deleau • E. Hoenkamp • A. Baris ©2001 LRB Schomaker - KI/RuG

  4. A definition • In content-based image retrieval systems, the goal is to provide the user with a set of images, based on a query that consists, partly or completely, of pictorial information • Excluded: point & click navigation in pre-organized image collections ©2001 LRB Schomaker - KI/RuG

  5. Image-based queries on WWW: existing methods and their problems • IBIR - image-based information retrieval • CBIR - content-based image retrieval • QBIC - query by image content • PBIR - pen-based image retrieval ©2001 LRB Schomaker - KI/RuG

  6. Existing systems & prototypes • QBIC (IBM) • VisualSEEk (Columbia) • FourEyes (MIT Media Lab) • … and many more: (WebSEEk, Excalibur, ImageRover, Chabot, Piction) • Research: IMEDIA (INRIA), Viper/GIFT (Marchand-Maillet) ©2001 LRB Schomaker - KI/RuG

  7. Query Methods ©2001 LRB Schomaker - KI/RuG

  8. Example 1. QBIC (IBM) • Features: • Colors, textures, edges, shape • Matching • Layout, full-image templates, shape ©2001 LRB Schomaker - KI/RuG
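The color side of this kind of matching can be sketched as follows. This is a toy illustration of histogram-based color retrieval, not IBM's actual QBIC implementation; the function names, the 4-bins-per-channel quantization and the histogram-intersection score are assumptions for the sake of the example.

```python
# Toy sketch of color-feature matching in the style of QBIC (assumed, not
# IBM's code): each image is summarized by a coarse RGB histogram and the
# database is ranked by histogram intersection with the query image.
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Normalized 3-D color histogram of an RGB image (H x W x 3, uint8)."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3),
        bins=(bins_per_channel,) * 3,
        range=((0, 256),) * 3,
    )
    return hist.ravel() / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical color distributions."""
    return np.minimum(h1, h2).sum()

def rank_by_color(query, database):
    """Return database indices sorted from most to least similar."""
    q = color_histogram(query)
    scores = [histogram_intersection(q, color_histogram(img)) for img in database]
    return np.argsort(scores)[::-1]
```

Note that such a global color feature is exactly what produces the counter-intuitive hit lists discussed on the next slide: two images with similar color statistics need not depict similar objects.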

  9. Upper-left picture is the query • “boy in yellow raincoat” • …yields very counter-intuitive results •  What was the user’s intention?

  10. Example 2. VisualSEEk • Features: • Colors, textures, edges, bitmap shape • Matching: • Layout, full-image templates ©2001 LRB Schomaker - KI/RuG

  11. VisualSEEk (Columbia Univ.) • Layout- and feature-based query construction • Requires detailed user knowledge of pattern-recognition issues!

  12. Example 3. FourEyes (MIT Medialab) • Imposed block segmentation • Textual annotation per block • Labels are propagated on the basis of texture matching ©2001 LRB Schomaker - KI/RuG

  13. FourEyes (MIT Medialab)

  14. FourEyes… • Imposed block segmentation is unrelated to object placement • Object details are lost: features are global and textural • Interesting: a role for the user ©2001 LRB Schomaker - KI/RuG
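The grid-plus-propagation idea behind FourEyes can be sketched as follows. This is a toy version under stated assumptions, not the MIT code: the mean/contrast "texture" descriptor, the Euclidean distance threshold and all function names are invented for the illustration.

```python
# Toy sketch of FourEyes-style annotation propagation (assumed, not the MIT
# implementation): the image is cut into a fixed n x n grid, each block gets
# a crude texture feature, and a label attached to one block is propagated
# to every block whose feature lies within a distance threshold.
import numpy as np

def grid_blocks(gray, n=4):
    """Split a grayscale image into an n x n grid of blocks (row-major)."""
    h, w = gray.shape
    bh, bw = h // n, w // n
    return [gray[i*bh:(i+1)*bh, j*bw:(j+1)*bw]
            for i in range(n) for j in range(n)]

def texture_feature(block):
    """Crude texture descriptor: mean intensity and local contrast."""
    return np.array([block.mean(), block.std()])

def propagate_label(blocks, labeled_idx, label, threshold=10.0):
    """Copy `label` to all blocks whose feature is close to the labeled one."""
    feats = [texture_feature(b) for b in blocks]
    ref = feats[labeled_idx]
    return {i: label for i, f in enumerate(feats)
            if np.linalg.norm(f - ref) <= threshold}
```

The sketch also makes the criticism on slide 14 concrete: the grid is imposed regardless of where objects actually sit, so a block boundary can cut straight through the object of interest.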

  15. Problems • Full-image template matching yields bad retrieval results • Feature-based matching requires a lot of input and knowledge by the user • Layout-based search only suits a subset of image needs • Grid-based partitioning misses details and breaks up meaningful objects ©2001 LRB Schomaker - KI/RuG

  16. Problems… • Reasons behind a retrieved image list are unclear (Picard, 1995) • Features and matching scheme are not easily explainable to the user • An intelligent system should learn from previous queries of the user(s) ©2001 LRB Schomaker - KI/RuG

  17. A statement • In content-based image retrieval systems, just as in text-based Information Retrieval, the performance of current systems is limited due to their incomplete and weak modeling of the user’s • Needs • Goals • Perception • Cognition (semantics) ©2001 LRB Schomaker - KI/RuG

  18. User-Interfacing aspects • Computer users are continuously evaluating the value of system responses as a function of the effort spent on input actions (cost / benefit evaluation) • Consequence: after formulating a query with a large number of keystrokes, slider adjustments and mouse clicks, the quality of an image hit list is expected to be very high… • Conversely, user expectations are low when the effort only consists of a single mouse click ©2001 LRB Schomaker - KI/RuG

  19. Pragmatic aspects • A survey on the WWW revealed that users are interested in objects (71%), not in layout, texture or abstract features • The preferred image type is the photograph (68%) ©2001 LRB Schomaker - KI/RuG

  20. Cognitive & Perceptual aspects • Objects are best recognized from 'canonical views' (Blanz et al., 1999) • Photographers know and exploit this phenomenon by manipulating camera attitude or the objects themselves ©2001 LRB Schomaker - KI/RuG

  21-22. Photographs and paintings imply communication • [Diagram: the world is depicted by a photographer or painter for a user/viewer; a surveillance camera corresponds to computer vision] •  Problems of geometrical invariance are less extreme ©2001 LRB Schomaker - KI/RuG

  23. Canonical Views Non-canonical object orientation ©2001 LRB Schomaker - KI/RuG

  24. Canonical Views Canonical object orientation ©2001 LRB Schomaker - KI/RuG

  25. More cognition: Basic-level object categories • In a hierarchy of object classes (ontology), a node of the type 'basic level' (Rosch et al., 1976) adds many structural features to its description compared with the level above, whereas moving down to a more specific node adds only a few unique features. ©2001 LRB Schomaker - KI/RuG

  26. Basic-level categories, example “furniture” [virtually no geometrical features] “chair” [many clearly-defined structural features] “kitchen chair” [only a few additional features]. ©2001 LRB Schomaker - KI/RuG

  27-31. Basic-level object categories and mental imagery • A basic level is the highest level for which clear mental imagery exists in an object ontology • A basic-level object elicits almost the same feature description when it is named or shown visually • Basic-level object descriptions often contain references to structural components (parts) • In verbally describing the contents of a picture, people tend to use 'basic-level' words. Rosch, E., Mervis, C.B., Gray, W.D., Johnson, D.M. and Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8, pp. 382-439. ©2001 LRB Schomaker - KI/RuG

  32. Implication of the ‘basic level’ category • The basic level forms a natural bridge between textual and pictorial information • It is likely to determine both the annotation and the search behavior of the users • It is an ideal starting point for developing computer vision systems that (ultimately) generate text from a photograph ©2001 LRB Schomaker - KI/RuG

  33. Misconception about Perception and Cognition • “A picture is worth a thousand words”?  True or False? ©2001 LRB Schomaker - KI/RuG

  34. “A picture is worth a thousand words”…. • But many pictures could use a few words…!

  35. “A picture is worth a thousand words”? This is a part of a rocket engine by NASA

  36. Assumptions • In image retrieval, the media type of photographs is preferred • There is a predominant interest in objects (in the broad sense: including humans and animals) • The most likely level of description in real-world images is the “basic-level” category (Rosch et al.) ©2001 LRB Schomaker - KI/RuG

  37. Goal: object-based image search • Object recognition in an open domain?  Not possible yet. • Extensive annotation is needed in any case: for indexed access and for machine learning (MPEG-7 allows for sophisticated annotation) • But who is going to do the annotation: the content provider or the user, and how? ©2001 LRB Schomaker - KI/RuG

  38. How to realize object-based image search? • Bootstrap process for pattern recognition • cf. Project Cyc (Lenat) and Open Mind (Stork) • Collaborative, opportunistic annotation and object labeling (browser side) • Background learning process (server side) ©2001 LRB Schomaker - KI/RuG

  39. Design considerations • Focus on object-based representations and queries • Material: photographs with identifiable objects for which a verbal description can be given • Exploit human perceptual abilities • Allow for incremental annotation to obtain a growing training set ©2001 LRB Schomaker - KI/RuG

  40. Outline-based queries • To bridge the gap between what is currently possible and the ultimate goal of automatic object detection and classification, a closed curve drawn around a known object is used as a bootstrap representation: an outline. • This closed curve itself contains shape information (XY, dXdY, curvature) and makes it possible to separate the visual object characteristics, represented by the pixels it encloses, from the background ©2001 LRB Schomaker - KI/RuG
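The two roles of the outline described above can be sketched in code: rasterizing the closed curve into a binary mask that separates enclosed pixels from the background, and extracting the (dX, dY) steps along the curve as a simple shape signal. This is a minimal sketch assuming a polygonal outline; the even-odd crossing rule used here is a standard point-in-polygon test, not necessarily the method used in the project, and the function names are invented.

```python
# Minimal sketch of the outline idea (assumptions: polygonal outline given as
# (x, y) vertices; even-odd point-in-polygon test).
import numpy as np

def outline_mask(outline, height, width):
    """Binary mask of pixels inside a closed polygon (even-odd crossing rule)."""
    mask = np.zeros((height, width), dtype=bool)
    n = len(outline)
    for y in range(height):
        for x in range(width):
            inside = False
            for i in range(n):
                x1, y1 = outline[i]
                x2, y2 = outline[(i + 1) % n]
                # Count edges crossed by a horizontal ray to the right of (x, y);
                # horizontal edges (y1 == y2) never satisfy this condition.
                if (y1 > y) != (y2 > y):
                    x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                    if x < x_cross:
                        inside = not inside
            mask[y, x] = inside
    return mask

def shape_deltas(outline):
    """(dX, dY) steps along the closed curve, a simple shape representation."""
    pts = np.asarray(outline, dtype=float)
    return np.diff(np.vstack([pts, pts[:1]]), axis=0)
```

Masking the enclosed pixels is what allows pixel-based features (color, texture) to be computed for the object alone rather than for the full image.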

  41. Scribbles vs Outlines

  42. Examples of outlines from a “Wild West” base of photographs

  43. Outline, basic features and matching

  44. More outline-based features • Lengths of radii from center of gravity • Curvature • Curvature scale space • Bitmap of an outline • Absolute Fourier transform |FFT| • Others (not tried yet): wavelets, Freeman coding ©2001 LRB Schomaker - KI/RuG
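Two of the features listed above can be sketched concretely: the lengths of radii from the center of gravity, and the magnitude spectrum |FFT| of the outline treated as a complex signal x + iy. This is an illustrative sketch, not the project's code; the fixed resampling length and function names are assumptions.

```python
# Sketch of two outline features (assumed implementation): radii from the
# center of gravity, and |FFT| of the outline as a complex signal. Both are
# resampled to a fixed length so outlines of different sizes are comparable.
import numpy as np

def radii_feature(outline, n_samples=32):
    """Distances from the outline's center of gravity, resampled to n_samples."""
    pts = np.asarray(outline, dtype=float)
    centered = pts - pts.mean(axis=0)
    r = np.hypot(centered[:, 0], centered[:, 1])
    idx = np.linspace(0, len(r) - 1, n_samples)
    return np.interp(idx, np.arange(len(r)), r)

def fft_magnitude_feature(outline, n_samples=32):
    """|FFT| of the outline as x + iy; the magnitude spectrum does not depend
    on where along the curve the user started drawing."""
    pts = np.asarray(outline, dtype=float)
    idx = np.linspace(0, len(pts) - 1, n_samples)
    t = np.arange(len(pts))
    z = np.interp(idx, t, pts[:, 0]) + 1j * np.interp(idx, t, pts[:, 1])
    return np.abs(np.fft.fft(z))
```

The radii feature is translation-invariant by construction (the center of gravity is subtracted), and discarding the FFT phase removes the dependence on the drawing's starting point, which varies freely between users.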

  45. Outline features: coordinates, running angle (cos(φ), sin(φ)), radii, |FFT|

  46. Outline examples from motor bicycle set

  47. Motor bike engine

  48. Image (pixel-based) features

  49. Matching possibilities
