Multi-Modal Image Search for Large-Scale Applications

Explore the evolution and state-of-the-art of multi-modal retrieval, classification, and comparison techniques in image search applications. Learn about fusion types, efficiency, flexibility, and the evaluation domain. Discover selected retrieval techniques and implementations for text and visual search.

Presentation Transcript


  1. Multi-modal image search for large-scale applications Petra Budikova, Michal Batko, Pavel Zezula Masaryk University, Czech Republic

  2. Outline • Motivation • Importance of being multi-modal • Diversity of approaches, lack of evaluation • Especially large-scale • Contributions • Classification of approaches • General multi-modal searching • Implementation in image search domain • Experimental evaluation setup • Queries, ground truth, relevance measures • Selected test datasets • Early analysis of results • Conclusions MDDE 2012, Istanbul, August 31

  3. Evolution of Multimedia Retrieval • Text-based searching • Search by annotations, category search • Limits: missing/erroneous annotations, “image is worth a thousand words” • Content-based retrieval • Query-by-example paradigm • Limits: semantic gap • Multi-modal approaches • Combining orthogonal views on similarity • Overcoming limitations of individual approaches • Text and visual, video and text, location and visual, …

  4. Multi-modal retrieval: state-of-the-art • Commerce: • text-based searching refined by visual rank • Google, Bing, … • Research: • Multi-modal indices • Specialized • Text&visual, location&text, … • General metric space indexing • M-tree family, M-index, … • Threshold algorithm for modality fusion • Fagin: Combining Fuzzy Information from Multiple Systems. • Late fusion, ranking methods • Visual search+text rank, relevance-feedback ranking, … • Numerous solutions presented at ImageCLEF, …

  5. Multi-modal retrieval: state-of-the-art II A number of strategies and techniques exist, but there is no reasonable comparison! • Only for some pairs • Small-scale

  6. Our Objective Large-scale comparison of fundamental approaches to multi-modal retrieval • Classification of solutions • How are the individual modalities processed and fused? • Efficiency, flexibility? • Comparable implementations • Image data domain • MESSIF implementation framework • Real-world evaluation platform • 2 different datasets with 20 million images • Human-evaluated relevance

  7. Classification I: query processing phases • Basic search: evaluate query over the whole database • Postprocessing: evaluate query over candidate objects

  8. Classification II: modality fusion type • All modalities are equal • Early fusion • Specialized indices • Should be efficient • Usually not flexible • May imply costly evaluations of distances • Late fusion • Threshold Algorithm • In theory exact • Can be extremely costly • Medium flexibility • Fusion ranking • Flexible, efficient • Approximate solution

  9. Classification III: modality fusion type • Some modalities are more important • Ranking • Google, Bing, … • Very flexible and efficient • May exploit (pseudo-)relevance feedback • Quality of results strongly influenced by the performance of the primary modality • Inherent fusion • MUFIN • Very flexible • Little added cost compared to ranking • Possibly better quality than with ranking

  10. Evaluation Domain • Image retrieval • Popular application, easy to evaluate • Text and general metric features • the same model applicable to many other domains! • Selected modalities • Text: keywords, tf-idf measure • Visual • MPEG7 global descriptors • SIFT local descriptors • Aggregation function: Weighted sum • Implementation platform • MESSIF library for large-scale metric searching
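
The slide names a weighted sum as the aggregation function over modalities. A minimal Python sketch of such an aggregation follows. The function name `weighted_sum_distance`, the modality keys, and the normalization by the total weight are assumptions made for illustration; the slides do not give the actual weights or normalization.

```python
from typing import Dict


def weighted_sum_distance(distances: Dict[str, float],
                          weights: Dict[str, float]) -> float:
    """Combine per-modality distances into one value by a weighted sum.

    Assumption: every distance is already normalized to [0, 1], and the
    result is divided by the total weight so it stays in [0, 1] as well.
    """
    total_weight = sum(weights[modality] for modality in distances)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[modality] * distances[modality]
                   for modality in distances)
    return weighted / total_weight
```

For example, `weighted_sum_distance({"text": 0.2, "visual": 0.6}, {"text": 1.0, "visual": 1.0})` averages the two distances; raising the text weight pulls the result toward the text distance.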

  11. Selected Techniques • Single modality retrieval • Baseline for evaluation • Needed in some search&postprocess strategies • Text search: • tf-idf relevance measure • Lucene implementation • Visual search: • Only by global descriptors – weighted sum of five MPEG7 features • Local descriptors not feasible • Centralized M-index
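
The text baseline on this slide scores images by the tf-idf relevance of their keyword annotations. The Python sketch below illustrates the general idea only. It is not Lucene's scoring formula; the function name `tf_idf_scores`, the raw term frequency, and the `log(N / df)` idf variant are assumptions chosen for brevity.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf_scores(query_terms: List[str],
                  documents: Dict[str, List[str]]) -> Dict[str, float]:
    """Score each annotated image against the query keywords.

    documents maps an image id to its list of keywords.
    """
    n_docs = len(documents)
    # Document frequency: in how many annotations each keyword occurs.
    doc_freq: Counter = Counter()
    for keywords in documents.values():
        doc_freq.update(set(keywords))

    scores: Dict[str, float] = {}
    for doc_id, keywords in documents.items():
        term_freq = Counter(keywords)
        scores[doc_id] = sum(
            term_freq[term] * math.log(n_docs / doc_freq[term])
            for term in query_terms
            if doc_freq[term] > 0  # skip terms unknown to the collection
        )
    return scores
```

Images whose annotations share no keyword with the query get a score of zero, which is the property the candidate selection in the text-based techniques later relies on.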

  12. Selected Techniques II • Early fusion: combined text&visual basic search • “joint features model” • Fixed combination of modalities => not flexible • Implementation: metric index (M-index) by text&visual similarity • Late fusion: separate TBIR and CBIR followed by results aggregation • Most frequent technique of image-text fusion • Efficient (parallel evaluation of single-modality retrievals), can interconnect existing systems • Aggregation can be costly • Implementations: Threshold Algorithm, fusion ranking
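
The late-fusion variant that uses the Threshold Algorithm can be sketched as follows. This is a simplified in-memory Python illustration of Fagin's algorithm, not the MESSIF implementation. It assumes similarity scores where higher is better, a weighted sum with non-negative weights as the monotone aggregation, and a score of 0 for objects missing from a list; the function name and data layout are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

RankedList = List[Tuple[str, float]]  # (object id, score), best first


def threshold_algorithm(ranked_lists: List[RankedList],
                        weights: List[float],
                        k: int) -> RankedList:
    """Return the k objects with the highest weighted-sum score."""
    if k <= 0:
        return []
    # Random access: per-modality lookup of an object's score.
    lookups = [dict(lst) for lst in ranked_lists]

    def aggregate(scores: List[float]) -> float:
        return sum(w * s for w, s in zip(weights, scores))

    seen: Dict[str, float] = {}
    depth = max((len(lst) for lst in ranked_lists), default=0)
    for pos in range(depth):
        last_scores = []
        # Sorted access: read position `pos` of every list.
        for lst in ranked_lists:
            if pos >= len(lst):
                last_scores.append(0.0)  # exhausted list contributes 0
                continue
            obj, score = lst[pos]
            last_scores.append(score)
            if obj not in seen:
                seen[obj] = aggregate(
                    [table.get(obj, 0.0) for table in lookups])
        # No unseen object can score above this threshold.
        threshold = aggregate(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda item: item[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda item: item[1])
```

The early stop is what makes the result exact in theory; the cost noted on the slides comes from the random accesses and from how deep the sorted access must go before the threshold drops below the k-th best score.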

  13. Selected Techniques III • Text-based retrieval with inherent fusion • Text used for selection of candidates • Combined text&visual distances evaluated • “large-scale ranking”, in distributed environment can be executed in parallel on partial candidate sets • Implementation: text search, all objects with non-zero text score ranked by combined similarity • Content-based retrieval with inherent fusion • Complementary to previous • Implementation: candidate data regions identified in visual-based index, combined similarity evaluated
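
Text-based retrieval with inherent fusion, as described on this slide, can be sketched in Python as below. The sketch assumes text scores normalized to [0, 1] and turned into a distance as `1 - score`, precomputed visual distances to the query in [0, 1], and a weighted sum as the combined distance. All names and the normalization are hypothetical, not taken from the authors' implementation.

```python
from typing import Dict, List


def text_search_with_inherent_fusion(text_scores: Dict[str, float],
                                     visual_distances: Dict[str, float],
                                     w_text: float,
                                     w_visual: float,
                                     k: int) -> List[str]:
    """Select candidates by text, then rank them by combined distance."""
    # Candidate selection: every object with a non-zero text score.
    candidates = [obj for obj, score in text_scores.items() if score > 0]

    def combined_distance(obj: str) -> float:
        text_distance = 1.0 - text_scores[obj]  # assumed normalization
        return w_text * text_distance + w_visual * visual_distances[obj]

    # Smaller combined distance means a better match.
    return sorted(candidates, key=combined_distance)[:k]
```

Unlike plain result ranking, the candidate set here is not cut to a fixed size before the combined distance is evaluated, which is why the slide calls it “large-scale ranking”.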

  14. Selected Techniques IV • Result ranking techniques • Rank by text • Implementation: tf-idf • Rank by visual similarity • Implementation: rank by global descriptors (MPEG7), rank by local descriptors (SIFT) • Pseudo-RF ranking • Popular, rapidly developing methods • Trying to overcome the semantic gap • Explore properties of objects in basic search result, relationships • Implementation: important descriptors rank (low variance), reverse kNN rank, clustering rank
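
Ranking by visual similarity reorders the candidate objects returned by a basic search using only the secondary modality. A minimal Python sketch is given below. It uses plain Euclidean distance over descriptor vectors as a stand-in; the MPEG7 and SIFT descriptors named on the slide use their own distance functions, and the function name and data layout are assumptions.

```python
import math
from typing import Dict, List, Sequence


def rerank_by_visual(candidates: List[str],
                     descriptors: Dict[str, Sequence[float]],
                     query_descriptor: Sequence[float],
                     k: int) -> List[str]:
    """Reorder basic-search candidates by visual distance to the query.

    Euclidean distance is an illustrative stand-in for the real
    descriptor distance functions.
    """
    def visual_distance(obj: str) -> float:
        return math.dist(descriptors[obj], query_descriptor)

    return sorted(candidates, key=visual_distance)[:k]
```

Because only the candidates are touched, the cost grows with the size of the candidate set rather than with the size of the collection.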

  15. Selected Techniques V • Overview of implemented techniques (diagram): Query → Basic search → Candidate objects (sizes: 100 – 2000) → Results postprocessing → Result

  16. Evaluation • Queries: 100 image+keyword queries • Frequent queries from photostock company logs • Easy and difficult queries selected by experience • Ground truth • Pooling approach, human assessors • 3-grade relevance: very good, acceptable, irrelevant • Translated to relevance percentage, averaged • Result quality measures • Precision@k, DCG, NDCG
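
The quality measures listed on this slide can be computed as in the Python sketch below. It uses the common DCG definition with a `log2(rank + 1)` discount. The relevance values are assumed to be fractions in [0, 1] (for instance 1.0 for very good, 0.5 for acceptable, 0.0 for irrelevant; the slides do not state the actual percentages), and the 0.5 cut-off in `precision_at_k` is likewise an assumption.

```python
import math
from typing import List, Optional


def precision_at_k(relevances: List[float], k: int,
                   threshold: float = 0.5) -> float:
    """Fraction of the first k results judged at least `threshold`."""
    hits = sum(1 for rel in relevances[:k] if rel >= threshold)
    return hits / k


def dcg(relevances: List[float], k: int) -> float:
    """Discounted cumulative gain over the first k results."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg(relevances: List[float], k: int,
         pool: Optional[List[float]] = None) -> float:
    """DCG normalized by the DCG of the ideal ordering.

    `pool` holds the relevances of all judged objects for the query;
    if omitted, the ideal ordering is taken from `relevances` itself.
    """
    ideal_source = pool if pool is not None else relevances
    ideal = dcg(sorted(ideal_source, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

With a pooled ground truth, passing the whole pool as `pool` keeps a system from scoring a perfect NDCG simply because it returned only a few relevant objects in the right order.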

  17. Evaluation III Evaluation datasets: • Profimedia dataset • Real-world photo collection created for sale • 20M high-quality images, rich and precise keyword annotations • CoPhIR dataset (Flickr photos) • Real-world photo collection created for fun • 20M images of different quality, sparse and erroneous keyword annotations • Example image annotations shown on the slide: title “Close-up of bee sitting on pink field flower”; keywords: animal, apis, apis mellifera, arthropod, beauty in nature, bee, macro, flower, spring, insect, animals, pollination, bloom, blossom, brown, closeup, closeup, collecting, color, creative_tag, extreme close-up, flora, floral, flower, front view, hairy, honeybee, horizontal, image, insect, invertebrate, honey-bee, no, honeybee

  18. Results: bimodal fusion performance (chart axis in ms) • Text and MPEG7 modalities fusion • Text-based solutions the best • Expected – rich annotations • Ranking significantly improves search • For both text and visual search • Choice of primary modality extremely important! • Threshold algorithm very costly • Not suitable for large-scale • Inherent fusion costs acceptable • Result quality slightly better than with ranking

  19. Results: limits of text-based searching • Queries where text-based solutions do not provide best results (example shown: query text “bird”) • complex queries • “two coins” • ambiguous queries • “shells”, “stamp” • too broad queries • “bird” • Future work: a more detailed analysis of aspects that influence the performance of a given modality for a given query

  20. Results: multi-modal search with ranking • Effectiveness and efficiency of ranking techniques (plots: NDCG at 30 against the number of ranked objects) • Text search ranking: • MPEG7 rank performs as well as SIFT while being more efficient • Visual search ranking: • The most complementary modality is the best • Influence of the number of ranked objects • Differs for text and visual search • May be related to search space dimensionality – future work

  21. Conclusion • State-of-the-art • Multi-modal search paradigm • Rapid development of approaches • Real-world evaluations needed • Our contribution • First extensive evaluation of fundamental approaches to large-scale multi-modal retrieval • No big surprises, but valuable insights gained • Effectiveness vs. efficiency tradeoff, strengths and limits of text-based solutions, performance of various ranking methods • Future work • Evaluate again on qualitatively different data • Determine conditions of usability of individual methods
