Combining Text and Image Queries at ImageCLEF2005: A Corpus-Based Relevance-Feedback Approach
Hsin-Hsi Chen, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Wen-Cheng Lin, Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan
Yih-Cheng Chang, Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
ImageCLEF 2005
Why Combine Text and Image Queries in Cross-Language Image Retrieval?
• Text-based image retrieval
• Translation errors in cross-language image retrieval
• Annotation errors in automatic annotation
• Easy to capture semantic meanings
• Easy to construct a textual query
• Content-based image retrieval (CBIR)
• Semantic meanings are hard to represent
• Example images have to be found or drawn
• Avoids translation in cross-language image retrieval
• Annotation is not necessary
How to Combine Text and Image Features in Cross-Language Image Retrieval?
• Parallel approach: conduct text- and content-based retrieval separately and merge the retrieval results (see the sketch after this list)
• Pipeline approach: use textual or visual information to perform an initial retrieval, then employ the other feature to filter out irrelevant images
• Transformation-based approach: mine the relations between images and text, and employ the mined relations to transform textual information into visual information, and vice versa
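The merging step of the parallel approach can be made concrete with a short sketch. This is a minimal illustration, not the exact merging scheme used in the experiments; the min-max normalization, the weight alpha, and all names are assumptions.

```python
# Parallel approach (sketch): run text-based and content-based retrieval
# separately, then merge the two ranked lists by a weighted combination
# of min-max normalized scores. alpha weights the text run.
def merge_runs(text_run, visual_run, alpha=0.7):
    """text_run, visual_run: dicts mapping image_id -> retrieval score."""
    def normalize(run):
        if not run:
            return {}
        lo, hi = min(run.values()), max(run.values())
        span = (hi - lo) or 1.0
        return {img: (s - lo) / span for img, s in run.items()}

    text_n, visual_n = normalize(text_run), normalize(visual_run)
    merged = {img: alpha * text_n.get(img, 0.0) + (1 - alpha) * visual_n.get(img, 0.0)
              for img in set(text_n) | set(visual_n)}
    return sorted(merged, key=merged.get, reverse=True)
```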
Approach at ImageCLEF 2004
• Automatically transform textual queries into visual representations
• Mine the relationships between text and images
• Divide an image into several smaller parts
• Link the words in a caption to the corresponding parts
• Analogous to word alignment in a sentence-aligned parallel corpus
• Build a transmedia dictionary (see the sketch below)
• Transform a textual query into a visual one using the transmedia dictionary
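The word-alignment analogy can be illustrated with a small sketch. This is a hedged illustration under assumed data structures (captions as word lists, images as lists of visual block labels such as region-cluster IDs), not the exact mining algorithm used in 2004.

```python
# Transmedia dictionary (sketch): count co-occurrences between caption words
# and visual block labels across the training collection, analogous to word
# alignment in a sentence-aligned parallel corpus, then normalize per word.
from collections import defaultdict

def build_transmedia_dictionary(corpus):
    """corpus: iterable of (caption_words, block_labels) pairs."""
    cooc = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for words, blocks in corpus:
        for w in words:
            totals[w] += len(blocks)
            for b in blocks:
                cooc[w][b] += 1
    # Estimate P(block | word) from the normalized counts
    return {w: {b: c / totals[w] for b, c in bs.items()}
            for w, bs in cooc.items()}
```

A textual query can then be transformed into a visual one by looking up the most probable visual blocks for each query word.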
System at ImageCLEF2004
[System diagram: a training collection (images and image captions) feeds text-image correlation learning, which produces a transmedia dictionary. A source-language textual query is translated into a target-language textual query using language resources, and transformed into a visual query via the transmedia dictionary. Text-based image retrieval runs against the textual index of the target collection, content-based image retrieval runs against the visual index, and result merging produces the final set of retrieved images.]
Learning Correlation
[Example: the image captioned "Mare and foal in field, slopes of Clatto Hill, Fife" is segmented into blocks B01-B04, and the caption words hill, mare, foal, field, and slope are linked to the corresponding blocks.]
Text-Based Image Retrieval at ImageCLEF2004
• Using similarity-based backward transliteration improves performance (69.71%)
Cross-Language Experiments at ImageCLEF2004
• Adding the (poor) visual run to the text run gave only a +0.46% improvement: an insignificant performance increase
Analyses of These Approaches
• Parallel approach and pipeline approach
• Simple and useful
• Do not exploit the relations between visual and textual features
• Transformation-based approach
• Textual and visual queries can be translated into each other using the relations between visual and textual features
• Hard to learn all relations between all visual and textual features
• The degree of ambiguity of the relations is usually high
Our Approach at ImageCLEF2005: A Corpus-Based Relevance-Feedback Method
• Initiate a content-based retrieval
• Treat the retrieved images and their text descriptions as aligned documents
• Adopt a corpus-based method to select key terms from the text descriptions and generate a new query (see the sketch after this list)
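The feedback cycle can be sketched as follows, assuming the top-ranked CBIR results and the caption collection are available. The TF-IDF scoring and all function names are illustrative assumptions, not the exact term-selection formula used.

```python
# Corpus-based relevance feedback (sketch): treat the captions of the top
# CBIR results as feedback documents and pick high TF-IDF terms as a new query.
import math
from collections import Counter

def select_feedback_terms(top_captions, all_captions, n_terms=10):
    """Select expansion terms from the captions of the top retrieved images."""
    df = Counter()                      # document frequency over the collection
    for cap in all_captions:
        df.update(set(cap.lower().split()))
    n_docs = len(all_captions)

    tf = Counter()                      # term frequency in the feedback set
    for cap in top_captions:
        tf.update(cap.lower().split())

    scores = {t: freq * math.log(n_docs / (1 + df[t])) for t, freq in tf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n_terms]
```

The selected terms form a new textual query, which is run against the textual index and merged with the original results.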
Fundamental Concepts of a Corpus-Based Relevance-Feedback Approach
[Example: initial content-based retrieval with the VIPER system for the topic "Aircraft on the ground".]
Bilingual Ad Hoc Retrieval Task
• 28,133 photographs from St. Andrews University Library's photographic collection
• The collection is in English and the queries are in different languages; in our experiments, queries are in Chinese
• All images are accompanied by a textual description written in English by librarians working at St. Andrews Library
• The test set contains 28 topics, and each topic has a text description and an example image
An Example: A Topic in Chinese
[Example topic showing an English title and a Chinese title.]
Experiment Results at ImageCLEF2005
[Results table: improvements of +15.78%, +25.96%, and +11.01% for the combined runs. Performance of EE+EX > CE+EX, and EE > EX > CE > visual run.]
Lessons Learned
• Combining textual and visual information can improve performance
• Compared with the initial visual retrieval, average precision increased from 8.29% to 34.25% after the feedback cycle
Example: Aircraft on the Ground
• Text only (monolingual)
• Text only (cross-lingual)
The top 2 images in the cross-lingual run are non-relevant because of query translation problems: query terms were mistranslated as "clear", "above", and "floor"
Example: Aircraft on the Ground (after integration)
• Text (monolingual) + Visual
The Text+Visual run is better than the monolingual run because it expands the query with useful words, e.g., aeroplane, military air base, airfield
ImageCLEF2004 vs. ImageCLEF2005
• Text-based IR (monolingual case): 0.6304 (2004) vs. 0.3952 (2005)
• This year's topics are a little harder
• Text+Image IR (monolingual case): 0.6591 (2004) vs. 0.5053 (2005)
• Text+Image IR (cross-lingual case): 0.4441 (2004) vs. 0.3977 (2005)
• Cross-lingual performance relative to monolingual text IR: 70.45% (2004) vs. 100.63% (2005)
Automatic Annotation Task
• The automatic annotation task in ImageCLEF 2005 can be seen as a classification task, since each image can be annotated with only one word (i.e., a category)
• We propose several methods to measure the similarity between a test image and a category; a test image is classified into the most similar category
• The proposed methods use the same image features but different classification approaches
Image Feature Extraction
• Resize images to 256 x 256 pixels
• Segment each image into 32 x 32 blocks (each block is 8 x 8 pixels)
• Compute the average gray value of each block to construct a vector with 1,024 elements
• The similarity between two images is measured by the cosine formula (see the sketch after this list)
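The feature extraction steps above map directly to a few lines of code. A minimal sketch, assuming Pillow and NumPy are available:

```python
# Gray-block feature (sketch): resize to 256x256, average each 8x8 block in a
# 32x32 grid to get a 1,024-element vector, and compare vectors by cosine.
import numpy as np
from PIL import Image

def image_vector(path):
    img = Image.open(path).convert("L").resize((256, 256))   # grayscale
    a = np.asarray(img, dtype=np.float64)
    # View the 256x256 array as 32x32 blocks of 8x8 pixels, then average each block
    blocks = a.reshape(32, 8, 32, 8).mean(axis=(1, 3))
    return blocks.ravel()                                    # 1,024 elements

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```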
Some Models and Experimental Results
• NTU-annotate05-1NN: the baseline model; it uses the 1-NN method to classify each image
• NTU-annotate05-Top2: computes the similarity between a test image and a category using the top 2 nearest images in each category, and classifies the test image into the most similar category
• NTU-annotate05-SC: the training data is clustered using the k-means algorithm (k = 1000); we compute the centroid of each category in each cluster and classify a test image into the category of the nearest centroid
(A sketch of the 1-NN and top-2 classifiers follows this list.)
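The 1-NN and top-2 models differ only in how many nearest images per category contribute to the score. A hedged sketch building on the `image_vector`/`cosine` helpers above; tie-breaking and scoring details are assumptions:

```python
# Category scoring (sketch): a test image's similarity to a category is the
# mean of its top-k cosine similarities to that category's training images.
# k=1 reproduces 1-NN (the nearest image overall decides); k=2 is the top-2 model.
def classify(test_vec, train, top_k=1):
    """train: dict mapping category -> list of training feature vectors."""
    best_cat, best_score = None, float("-inf")
    for cat, vecs in train.items():
        sims = sorted((cosine(test_vec, v) for v in vecs), reverse=True)
        score = sum(sims[:top_k]) / min(top_k, len(sims))
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat
```

The centroid-based run (NTU-annotate05-SC) would instead compare the test vector to per-cluster category centroids.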
Conclusion: Bilingual Ad Hoc Retrieval Task
• An approach combining textual and image features, a corpus-based feedback cycle starting from CBIR, is proposed for Chinese-English image retrieval
• Compared with the performance of monolingual IR (0.3952), integrating visual and textual queries achieves better performance in cross-language image retrieval (0.3977), resolving part of the translation errors
• The integration of visual and textual queries also improves the performance of monolingual IR from 0.3952 to 0.5053 by providing more information
• The improvement is the best among all the groups; our cross-lingual run reaches 78.2% of the best monolingual text retrieval
Conclusion: Automatic Annotation Task
• A feature extraction algorithm is proposed, and several classification approaches are explored using the same image features
• The 1-NN and top-2 approaches, both with an error rate of 21.7%, outperform the centroid-based approach (error rate 22.5%)
• Our method is about 9 percentage points worse than the best-performing group (error rate 12.6%), but better than most of the groups in this task