Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach
Hsin-Hsi Chen, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Wen-Cheng Lin, Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan
Yih-Cheng Chang, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
ImageCLEF 2005
Why Combine Text and Image Queries in Cross-Language Image Retrieval?
• Text-based image retrieval
• Translation errors in cross-language image retrieval
• Annotation errors in automatic annotation
• Easy to capture semantic meanings
• Easy to construct a textual query
• Content-based image retrieval (CBIR)
• Semantic meanings are hard to represent
• Example images have to be found or drawn
• Avoids translation in cross-language image retrieval
• Annotation is not necessary
How to Combine Text and Image Features in Cross-Language Image Retrieval?
• Parallel approach: conduct text- and content-based retrieval separately and merge the retrieval results (see the sketch after this list)
• Pipeline approach: use textual or visual information to perform an initial retrieval, then employ the other feature to filter out irrelevant images
• Transformation-based approach: mine the relations between images and text, and employ the mined relations to transform textual information into visual information, and vice versa
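The merging step of the parallel approach can be made concrete with a short sketch. This is a minimal illustration, not the exact merging scheme used in the experiments; the min-max normalization, the weight alpha, and all names are assumptions.

```python
# Parallel approach (sketch): run text-based and content-based retrieval
# separately, then merge the two ranked lists by a weighted combination
# of min-max normalized scores. alpha weights the text run.
def merge_runs(text_run, visual_run, alpha=0.7):
    """text_run, visual_run: dicts mapping image_id -> retrieval score."""
    def normalize(run):
        if not run:
            return {}
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        return {img: (s - lo) / span for img, s in run.items()}

    text_n, visual_n = normalize(text_run), normalize(visual_run)
    merged = {img: alpha * text_n.get(img, 0.0) + (1 - alpha) * visual_n.get(img, 0.0)
              for img in set(text_n) | set(visual_n)}
    return sorted(merged, key=merged.get, reverse=True)
```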
Approach at ImageCLEF 2004
• Automatically transform textual queries into visual representations
• Mine the relationships between text and images
• Divide an image into several smaller parts
• Link the words in a caption to the corresponding parts
• Analogous to word alignment in a sentence-aligned parallel corpus
• Build a transmedia dictionary (see the sketch below)
• Transform a textual query into a visual one using the transmedia dictionary
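The word-alignment analogy can be illustrated with a small sketch. This is a hedged illustration under assumed data structures (captions as word lists, images as lists of visual block labels such as region-cluster IDs), not the exact mining algorithm used in 2004.

```python
# Transmedia dictionary (sketch): count co-occurrences between caption words
# and visual block labels across the training collection, analogous to word
# alignment in a sentence-aligned parallel corpus, then normalize per word.
from collections import defaultdict

def build_transmedia_dictionary(corpus):
    """corpus: iterable of (caption_words, block_labels) pairs."""
    cooc = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for words, blocks in corpus:
        for w in words:
            totals[w] += len(blocks)
            for b in blocks:
                cooc[w][b] += 1
    # Estimate P(block | word) from the normalized counts
    return {w: {b: c / totals[w] for b, c in bs.items()}
            for w, bs in cooc.items()}
```

A textual query can then be transformed into a visual one by looking up the most probable visual blocks for each query word.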
System at ImageCLEF2004
[System diagram: a training collection (images and image captions) feeds text-image correlation learning, which produces a transmedia dictionary. A source-language textual query is translated into a target-language textual query using language resources, and transformed into a visual query via the transmedia dictionary. Text-based image retrieval runs against the textual index of the target collection, content-based image retrieval runs against the visual index, and result merging produces the final set of retrieved images.]
Learning Correlation
[Example: the image captioned "Mare and foal in field, slopes of Clatto Hill, Fife" is segmented into blocks B01-B04, and the caption words hill, mare, foal, field, and slope are linked to the corresponding blocks.]
Text-Based Image Retrieval at ImageCLEF2004
• Using similarity-based backward transliteration improves performance (69.71%)
Cross-Language Experiments at ImageCLEF2004
• Adding the (poor) visual run to the text run gave only a +0.46% improvement: an insignificant performance increase
Analyses of These Approaches
• Parallel approach and pipeline approach
• Simple and useful
• Do not exploit the relations between visual and textual features
• Transformation-based approach
• Textual and visual queries can be translated into each other using the relations between visual and textual features
• Hard to learn all relations between all visual and textual features
• The degree of ambiguity of the relations is usually high
Our Approach at ImageCLEF2005: A Corpus-Based Relevance-Feedback Method
• Initiate a content-based retrieval
• Treat the retrieved images and their text descriptions as aligned documents
• Adopt a corpus-based method to select key terms from the text descriptions and generate a new query (see the sketch after this list)
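The feedback cycle can be sketched as follows, assuming the top-ranked CBIR results and the caption collection are available. The TF-IDF scoring and all function names are illustrative assumptions, not the exact term-selection formula used.

```python
# Corpus-based relevance feedback (sketch): treat the captions of the top
# CBIR results as feedback documents and pick high TF-IDF terms as a new query.
import math
from collections import Counter

def select_feedback_terms(top_captions, all_captions, n_terms=10):
    """Select expansion terms from the captions of the top retrieved images."""
    df = Counter()                      # document frequency over the collection
    for cap in all_captions:
        df.update(set(cap.lower().split()))
    n_docs = len(all_captions)

    tf = Counter()                      # term frequency in the feedback set
    for cap in top_captions:
        tf.update(cap.lower().split())

    scores = {t: freq * math.log(n_docs / (1 + df[t])) for t, freq in tf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_terms]
```

The selected terms form a new textual query, which is run against the textual index and merged with the original results.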
Fundamental Concepts of a Corpus-Based Relevance-Feedback Approach
[Example: initial content-based retrieval with the VIPER system for the topic "Aircraft on the ground".]
Bilingual Ad Hoc Retrieval Task
• 28,133 photographs from St. Andrews University Library's photographic collection
• The collection is in English and the queries are in different languages; in our experiments, queries are in Chinese
• All images are accompanied by a textual description written in English by librarians working at St. Andrews Library
• The test set contains 28 topics, and each topic has a text description and an example image
An Example: A Topic in Chinese
[Example topic showing an English title and a Chinese title.]
Experiment Results at ImageCLEF2005
[Results table: improvements of +15.78%, +25.96%, and +11.01% for the combined runs. Performance of EE+EX > CE+EX, and EE > EX > CE > visual run.]
Lessons Learned
• Combining textual and visual information can improve performance
• Compared with the initial visual retrieval, average precision increased from 8.29% to 34.25% after the feedback cycle
Example: Aircraft on the Ground
• Text only (monolingual)
• Text only (cross-lingual)
The top 2 images in the cross-lingual run are non-relevant because of query translation problems: query terms were mistranslated as "clear", "above", and "floor"
Example: Aircraft on the Ground (after integration)
• Text (monolingual) + Visual
The Text+Visual run is better than the monolingual run because it expands the query with useful words, e.g., aeroplane, military air base, airfield
ImageCLEF2004 vs. ImageCLEF2005
• Text-based IR (monolingual case): 0.6304 (2004) vs. 0.3952 (2005)
• This year's topics are a little harder
• Text+Image IR (monolingual case): 0.6591 (2004) vs. 0.5053 (2005)
• Text+Image IR (cross-lingual case): 0.4441 (2004) vs. 0.3977 (2005)
• Cross-lingual performance relative to monolingual text IR: 70.45% (2004) vs. 100.63% (2005)
Automatic Annotation Task
• The automatic annotation task in ImageCLEF 2005 can be seen as a classification task, since each image can be annotated with only one word (i.e., a category)
• We propose several methods to measure the similarity between a test image and a category; a test image is classified into the most similar category
• The proposed methods use the same image features but different classification approaches
Image Feature Extraction
• Resize images to 256 x 256 pixels
• Segment each image into 32 x 32 blocks (each block is 8 x 8 pixels)
• Compute the average gray value of each block to construct a vector with 1,024 elements
• The similarity between two images is measured by the cosine formula (see the sketch after this list)
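The feature extraction steps above map directly to a few lines of code. A minimal sketch, assuming Pillow and NumPy are available:

```python
# Gray-block feature (sketch): resize to 256x256, average each 8x8 block in a
# 32x32 grid to get a 1,024-element vector, and compare vectors by cosine.
import numpy as np
from PIL import Image

def image_vector(path):
    img = Image.open(path).convert("L").resize((256, 256))   # grayscale
    a = np.asarray(img, dtype=np.float64)
    # View the 256x256 array as 32x32 blocks of 8x8 pixels, then average each block
    blocks = a.reshape(32, 8, 32, 8).mean(axis=(1, 3))
    return blocks.ravel()                                    # 1,024 elements

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```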
Some Models and Experimental Results
• NTU-annotate05-1NN: the baseline model; it uses the 1-NN method to classify each image
• NTU-annotate05-Top2: computes the similarity between a test image and a category using the top 2 nearest images in each category, and classifies the test image into the most similar category
• NTU-annotate05-SC: the training data is clustered using the k-means algorithm (k = 1000); we compute the centroid of each category in each cluster and classify a test image into the category of the nearest centroid
(A sketch of the 1-NN and top-2 classifiers follows this list.)
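The 1-NN and top-2 models differ only in how many nearest images per category contribute to the score. A hedged sketch building on the `image_vector`/`cosine` helpers above; tie-breaking and scoring details are assumptions:

```python
# Category scoring (sketch): a test image's similarity to a category is the
# mean of its top-k cosine similarities to that category's training images.
# k=1 reproduces 1-NN (the nearest image overall decides); k=2 is the top-2 model.
def classify(test_vec, train, top_k=1):
    """train: dict mapping category -> list of training feature vectors."""
    best_cat, best_score = None, float("-inf")
    for cat, vecs in train.items():
        sims = sorted((cosine(test_vec, v) for v in vecs), reverse=True)
        score = sum(sims[:top_k]) / min(top_k, len(sims))
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat
```

The centroid-based run (NTU-annotate05-SC) would instead compare the test vector to per-cluster category centroids.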
Conclusion: Bilingual Ad Hoc Retrieval Task
• An approach combining textual and image features, a corpus-based feedback cycle starting from CBIR, is proposed for Chinese-English image retrieval
• Compared with the performance of monolingual IR (0.3952), integrating visual and textual queries achieves better performance in cross-language image retrieval (0.3977), resolving part of the translation errors
• The integration of visual and textual queries also improves the performance of monolingual IR from 0.3952 to 0.5053 by providing more information
• The improvement is the best among all the groups; our cross-lingual run reaches 78.2% of the best monolingual text retrieval
Conclusion: Automatic Annotation Task
• A feature extraction algorithm is proposed, and several classification approaches are explored using the same image features
• The 1-NN and top-2 approaches, both with an error rate of 21.7%, outperform the centroid-based approach (error rate 22.5%)
• Our method is about 9 percentage points worse than the best-performing group (error rate 12.6%), but better than most of the groups in this task