Automatic Image Annotation and Retrieval using Cross-Media Relevance Models. J. Jeon, V. Lavrenko and R. Manmatha, Computer Science Department, University of Massachusetts – Amherst. Presenter: Carlos Diuk.
Introduction • The Problem: • Automatically annotate and retrieve images from large collections. • Retrieval example: answer the query “Tigers in grass” with matching images (example images not preserved).
Introduction • Manual annotation is being done in libraries. • Different approaches to automatic image annotation: • Co-occurrence Model • Translation Model • Cross-media relevance model
Introduction – related work • Co-occurrence Model: looks at the co-occurrence of words with image regions created using a regular grid. • Translation Model: image annotation is viewed as the task of translating from a vocabulary of blobs to a vocabulary of words.
Introduction – CMRM • Cross-media relevance models (CMRM) • Assume that images can be described using a small vocabulary of blobs. • From a training set of annotated images, learn the joint distribution of blobs and words.
Introduction – CMRM • Cross-media relevance models (CMRM) • Allow query expansion: • A standard technique for reducing ambiguity in information retrieval. • Perform an initial query, then expand it using terms from the top-ranked documents. • Example in the image context: tigers are more often associated with grass, water and trees than with cars or computers.
Introduction – CMRM • Variations: • Document-based expansion: • PACMRM (probabilistic annotation CMRM): the blobs of each test image are used to generate words and their associated probabilities, so each test image yields a vector of probabilities over every word in the vocabulary. • FACMRM (fixed annotation-based CMRM): the top N words from PACMRM are used to annotate each image. • Query-based expansion: • DRCMRM (direct-retrieval CMRM): the query words are used to generate a set of blob probabilities. This vector of blob probabilities is compared with the blob vector of each test image using Kullback-Leibler (KL) divergence, and images are ranked by the resulting KL distance.
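The step from PACMRM to FACMRM can be sketched as follows. This is a minimal illustration, assuming the PACMRM output for one image is available as a word-to-probability mapping; the function name `top_n_annotation` and the toy probabilities are hypothetical, not taken from the paper.

```python
def top_n_annotation(word_probs, n=5):
    """Return the n most probable words (ties broken alphabetically)."""
    ranked = sorted(word_probs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical PACMRM output for a single test image.
pacmrm_output = {
    "tiger": 0.40,
    "grass": 0.25,
    "water": 0.15,
    "tree": 0.12,
    "car": 0.08,
}

# FACMRM-style fixed-length annotation: keep only the top N words.
fixed_annotation = top_n_annotation(pacmrm_output, n=3)
print(fixed_annotation)
```

The fixed annotation discards the probabilities, which is what makes it usable with an ordinary text retrieval system.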
Discrete features in images • Segmentation of images into regions yields fragile and erroneous results. • Normalized cuts are used instead (Duygulu et al.): • 33 features are extracted from each region. • K-means clustering (K = 500) groups the regions based on these features. • Result: a vocabulary of 500 blobs.
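The clustering step can be sketched as follows. This is a minimal illustration, assuming plain K-means (Lloyd's algorithm) with a deterministic initialization from the first k points. The toy 2-D feature vectors and the names `kmeans`, `nearest` and `regions` are hypothetical stand-ins for the 33-dimensional region features and K = 500 mentioned on the slide.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point (this index is the blob id)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; centroids start at the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


# Toy region features: two well-separated groups, interleaved.
regions = [(0.0, 0.1), (5.0, 5.1), (0.1, 0.0), (5.1, 5.0), (0.05, 0.05), (4.9, 5.0)]
centroids = kmeans(regions, k=2)

# Each region is replaced by the id of its nearest centroid, i.e. its blob.
blob_ids = [nearest(r, centroids) for r in regions]
print(blob_ids)
```

After this quantization an image is just a set of blob ids, which is what lets it be treated like a document over a discrete vocabulary.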
CMRM Algorithms • An image I = {b1 .. bm} is represented as a set of blobs. • Each image J in the training collection carries both blobs and annotation words: J = {b1 .. bm ; w1 .. wn}. • Two problems: • Given an un-annotated image I, assign meaningful keywords. • Given a text query, retrieve images that contain the objects mentioned.
CMRM Algorithms • Calculating probabilities.
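The slide's formulas are not reproduced in this transcript. One way such probabilities could be estimated is sketched below, assuming a relevance-model style formulation: the joint probability of a word and the test image's blobs is a sum over training images J of P(J) · P(w|J) · Π P(b_i|J), with each per-image estimate smoothed against collection-wide counts. The uniform prior, the smoothing weights `alpha` and `beta`, the normalization choices and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import prod


def smoothed(count_in_image, image_size, count_in_collection, collection_size, weight):
    """Interpolate a per-image relative frequency with a collection-wide one."""
    return ((1 - weight) * count_in_image / image_size
            + weight * count_in_collection / collection_size)


def annotate(test_blobs, training, vocabulary, alpha=0.1, beta=0.9):
    """Return P(w | test_blobs) for every word, normalized to sum to 1."""
    word_totals = Counter(w for _, words in training for w in words)
    blob_totals = Counter(b for blobs, _ in training for b in blobs)
    n_words = sum(word_totals.values())
    n_blobs = sum(blob_totals.values())
    prior = 1 / len(training)  # assumed uniform P(J)

    joint = {}
    for w in vocabulary:
        total = 0.0
        for blobs, words in training:
            p_word = smoothed(words.count(w), len(words),
                              word_totals[w], n_words, alpha)
            p_blobs = prod(
                smoothed(blobs.count(b), len(blobs), blob_totals[b], n_blobs, beta)
                for b in test_blobs
            )
            total += prior * p_word * p_blobs
        joint[w] = total

    norm = sum(joint.values())
    return {w: p / norm for w, p in joint.items()}


# Hypothetical training set: (blob ids, annotation words) per image.
training = [
    ([1, 2], ["tiger", "grass"]),
    ([1, 3], ["tiger", "water"]),
    ([4, 5], ["car", "road"]),
]
vocabulary = sorted({w for _, words in training for w in words})
probs = annotate([1, 2], training, vocabulary)
print(max(probs, key=probs.get))
```

Training images that share blobs with the test image contribute more to the sum, so their annotation words receive higher probability.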
CMRM Algorithms • Image retrieval • INPUT: query Q = w1 .. wn and collection C of images • OUTPUT: images described by query words. • Annotation-based retrieval model (PACMRM-FACMRM) • Annotate images as shown. • Perform text retrieval as usual. • Fixed-length annotation vs probabilistic annotation:
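With probabilistic annotations, one plausible way to "perform text retrieval as usual" is to score each image by the likelihood of the query under its word distribution. The sketch below assumes a simple product of per-word probabilities; the `floor` value for unseen words, the function names and the toy annotations are hypothetical.

```python
from math import prod


def query_likelihood(query_words, word_probs, floor=1e-6):
    """Product of P(w | image) over the query words; unseen words get a small floor."""
    return prod(word_probs.get(w, floor) for w in query_words)


def rank_images(query_words, annotations):
    """Image ids sorted from most to least likely to match the query."""
    scores = {img: query_likelihood(query_words, probs)
              for img, probs in annotations.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical probabilistic annotations for three test images.
annotations = {
    "img_a": {"tiger": 0.5, "grass": 0.3, "water": 0.2},
    "img_b": {"tiger": 0.1, "grass": 0.6, "water": 0.3},
    "img_c": {"car": 0.7, "road": 0.3},
}
ranking = rank_images(["tiger", "grass"], annotations)
print(ranking)
```

A fixed-length annotation would instead treat each image as a short list of words with no weights, so two images annotated with "tiger" could not be told apart by how confident the model was.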
CMRM Algorithms • Image retrieval • INPUT: query Q = w1 .. wn and collection C of images • OUTPUT: images described by query words. • Direct retrieval model (DRCMRM) • Convert the query into the language of blobs, instead of converting images into words. • Estimation: • Ranking:
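The ranking step of direct retrieval can be sketched as follows, assuming the query has already been converted into a distribution over blobs and each image has its own blob distribution. Images are ordered by KL(query || image), smallest first. The `eps` guard against zero probabilities, the function names and the toy distributions are assumptions for illustration.

```python
from math import log


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) for discrete distributions given as dicts; q is floored at eps."""
    return sum(
        p_k * log(p_k / max(q.get(k, 0.0), eps))
        for k, p_k in p.items()
        if p_k > 0
    )


def rank_by_kl(query_blob_probs, image_blob_probs):
    """Image ids sorted by increasing divergence from the query's blob model."""
    return sorted(
        image_blob_probs,
        key=lambda img: kl_divergence(query_blob_probs, image_blob_probs[img]),
    )


# Hypothetical blob distribution generated from the query words.
query_model = {1: 0.6, 2: 0.3, 3: 0.1}

# Hypothetical blob distributions of two test images.
image_models = {
    "img_a": {1: 0.5, 2: 0.4, 3: 0.1},
    "img_b": {1: 0.1, 2: 0.1, 3: 0.8},
}
kl_ranking = rank_by_kl(query_model, image_models)
print(kl_ranking)
```

Note that KL divergence is not symmetric, so the direction of the comparison is itself a design choice.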
Results • Dataset • Corel Stock Photo CDs (5000 images: 4000 training, 500 evaluation, 500 testing). 371 words and 500 blobs. Manual annotations. • Metrics: • Recall: number of correctly retrieved images divided by number of relevant images. • Precision: number of correctly retrieved images divided by number of retrieved images. • Comparisons • Co-occurrence vs Translation vs FACMRM
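The two metrics can be computed as in the sketch below, which follows the definitions on the slide directly. The image ids are hypothetical, and returning 0.0 for an empty retrieved or relevant set is an assumption made here to avoid division by zero.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query, given retrieved and relevant image ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    correct = len(retrieved & relevant)
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result for one query: 4 images retrieved, 3 actually relevant.
precision, recall = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(precision, recall)
```

Here two of the four retrieved images are relevant (precision 0.5), and two of the three relevant images were found (recall 2/3).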
Results • Precision and recall for 70 one-word queries.
Results • PACMRM vs DRCMRM
Some nice examples • Automatically annotated as “sunset”, but not manually.
Some nice examples • Response to query “tiger” • Response to query “pillar”
Google: cooperative annotation? • Google search for “tiger”: • Google search for “Kennedy”:
Questions - Discussion • No semantic representation (just color, texture, shape). • How could we annotate a newspaper’s collection? (“Kennedy”, not just “people”)