Query Specific Fusion for Image Retrieval Shaoting Zhang, Ming Yang NEC Laboratories America
Outline • Overview of image retrieval/search • Basic paradigm • Local features indexed by vocabulary trees • Global features indexed by compact hash codes • Query specific fusion • Graph construction • Graph fusion • Graph-based ranking • Experiments
Content-Based Image Retrieval/Search • Scalability is the key concern: computational efficiency, memory consumption, and retrieval accuracy • Online: query image → feature extraction → hashing → search → re-ranking → rank list • Offline: database images → feature extraction → hashing → inverted indexing
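A toy, runnable sketch of this offline/online split, assuming numpy; random vectors stand in for image features, and a sign-pattern hash (a hypothetical stand-in for a learned hashing function) feeds the inverted index:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(0)

# Offline: extract features, hash them, and build an inverted index.
db_features = rng.standard_normal((1000, 8))   # stand-in for image features

def hash_code(x):
    # Stand-in hash: the sign pattern of the feature vector (8 bits here).
    return tuple(x > 0)

inverted_index = defaultdict(list)
for img_id, feat in enumerate(db_features):
    inverted_index[hash_code(feat)].append(img_id)

# Online: hash the query, look up candidates, re-rank by exact distance.
query = db_features[42] + 0.05 * rng.standard_normal(8)  # near-duplicate of image 42
candidates = inverted_index[hash_code(query)]
rank_list = sorted(candidates, key=lambda i: np.linalg.norm(db_features[i] - query))
print(rank_list[:5])  # image 42 should typically rank first
```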
Local Features Indexed by Vocabulary Trees • Features: SIFT descriptors • Hashing: visual word IDs assigned by hierarchical k-means • Indexing: vocabulary trees • Search: voting on visual words, then sorting the scores • An example: ~1K SIFT features per image, ~1M (10^6) leaf nodes in the tree, query time ~100-200ms against 1M database images Scalable Recognition with a Vocabulary Tree, D. Nister and H. Stewenius, CVPR'06
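A minimal sketch of the vocabulary-tree idea, assuming numpy and scikit-learn; a toy two-level hierarchical k-means (branch factor 10, hence 100 leaves) stands in for the much deeper tree with ~1M leaves, and plain vote counting stands in for weighted scoring:

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def build_tree(descriptors, branch=10):
    """Two-level hierarchical k-means: branch**2 leaf 'visual words'.

    Assumes each coarse cluster holds at least `branch` descriptors,
    which is safe for large training sets.
    """
    root = KMeans(n_clusters=branch, n_init=4, random_state=0).fit(descriptors)
    leaves = [KMeans(n_clusters=branch, n_init=4, random_state=0)
              .fit(descriptors[root.labels_ == c]) for c in range(branch)]
    return root, leaves

def quantize(tree, descriptors, branch=10):
    """Map each descriptor to a leaf visual-word ID."""
    root, leaves = tree
    coarse = root.predict(descriptors)
    ids = np.empty(len(descriptors), dtype=int)
    for c in range(branch):
        mask = coarse == c
        if mask.any():
            ids[mask] = c * branch + leaves[c].predict(descriptors[mask])
    return ids

def search(inverted_index, query_word_ids):
    """Voting: count shared visual words per image, then sort by votes."""
    votes = defaultdict(int)
    for w in query_word_ids:
        for img_id in inverted_index.get(w, []):
            votes[img_id] += 1
    return sorted(votes, key=votes.get, reverse=True)

# Offline usage: word_ids = quantize(tree, image_descriptors), then append
# the image ID to inverted_index[w] for each w in word_ids.
```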
Global Features Indexed by Compact Hash Codes • Features: GIST, RGB or HSV histograms, etc. • Hashing: compact binary codes, e.g., PCA + rotation + binarization • Indexing: flat storage, with or without inverted indexes • Search: exhaustive search with Hamming distances, followed by re-ranking • An example: GIST → PCA → binarization, i.e., 960 floats → 256 floats → 256 bits (120 times smaller than the 30,720-bit raw descriptor) • Query time: 50-100ms to search 1M images with Hamming distances Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, A. Oliva and A. Torralba, IJCV'01 Small Codes and Large Image Databases for Recognition, A. Torralba, R. Fergus, and Y. Weiss, CVPR'08 Iterative Quantization: A Procrustean Approach to Learning Binary Codes, Y. Gong and S. Lazebnik, CVPR'11
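A minimal sketch of this hashing scheme, assuming numpy; it uses plain PCA plus sign binarization (the learned rotation of ITQ is omitted for brevity) and an exhaustive Hamming scan:

```python
import numpy as np

def train_hash(X, n_bits=256):
    """Learn a PCA projection to n_bits dimensions from training features X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_bits].T          # top n_bits principal directions

def encode(X, mean, W):
    """Project and binarize: one boolean row (the hash code) per image."""
    return (X - mean) @ W > 0

def hamming_search(query_code, db_codes, top_k=10):
    """Exhaustive scan: Hamming distance = number of disagreeing bits."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:top_k]

# Usage on random data standing in for 960-D GIST descriptors:
X = np.random.randn(2000, 960)
mean, W = train_hash(X, n_bits=256)
codes = encode(X, mean, W)
print(hamming_search(codes[0], codes, top_k=5))   # index 0 comes back first
```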
Motivation • Each approach has its own pros and cons • Can we combine or fuse the two approaches? • Goal: improve the retrieval precision without sacrificing efficiency • Early fusion (feature level)? • Late fusion (rank-list level)?
Challenges • The features and algorithms are dramatically different • Hard to fuse at the feature level • Hard to aggregate the rank lists • The appropriate fusion is query specific and database dependent • Hard to learn a combination that transfers across different datasets • No supervision and no relevance feedback • Hard to evaluate the retrieval quality online
Query Specific Fusion • How can we evaluate, online, the quality of retrieval results from methods using local or global features? • Assumption: the degree of consensus among the top candidate images reveals the retrieval quality • Measured as the consistency of the top candidates' nearest neighborhoods • A graph-based approach to fusing and re-ranking the retrieval results of different methods
Graph Construction • Construct a weighted undirected graph to represent a set of retrieval results for a query image q • Given the query q, an image database D, and a similarity function S(·,·), denote by N_k(q) the top-k neighborhood of q, i.e., the k database images most similar to q under S • Edges: connect two images when they are reciprocal neighbors, i.e., each appears in the other's top-k neighborhood • Edge weight: the Jaccard similarity between neighborhoods, w(p, q) = |N_k(p) ∩ N_k(q)| / |N_k(p) ∪ N_k(q)|
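A minimal sketch of this construction; `neighborhoods` is a hypothetical dict mapping each image ID to its top-k retrieval set N_k(·) (precomputed offline for database images, computed online for the query). The handling of the query node, which cannot appear in the offline lists, is an assumption of this sketch rather than necessarily the paper's exact rule:

```python
def jaccard(a, b):
    """Jaccard similarity between two neighborhood sets."""
    return len(a & b) / len(a | b)

def build_graph(query, neighborhoods):
    """Weighted undirected graph over the query and its top-k candidates.

    Edges connect reciprocal neighbors and are weighted by the Jaccard
    similarity of the endpoints' neighborhoods.
    """
    nodes = {query} | neighborhoods[query]
    edges = {}   # frozenset({p, q}) -> w(p, q)
    for p in nodes:
        for q in neighborhoods[p]:
            if q not in nodes or q == p:
                continue
            # Reciprocity: each endpoint appears in the other's top-k list.
            # The query itself is absent from the offline lists, so this
            # sketch treats query->candidate edges as reciprocal.
            if p == query or p in neighborhoods[q]:
                edges[frozenset((p, q))] = jaccard(neighborhoods[p],
                                                   neighborhoods[q])
    return edges
```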
Graph Fusion • Fuse the multiple graphs into one graph • Take the union of their nodes and edges, and sum the weights of shared edges
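A minimal sketch of the fusion, reusing the edge dictionaries produced by the construction sketch above; summing via Counter realizes "union of edges, sum of weights":

```python
from collections import Counter

def fuse_graphs(*graphs):
    """Union the edge sets and sum the weights of edges shared by graphs."""
    fused = Counter()
    for g in graphs:
        fused.update(g)   # Counter.update adds weights key by key
    return dict(fused)

# e.g. fused = fuse_graphs(graph_local, graph_global), where the two
# (hypothetical) graphs come from the vocabulary-tree and hash-code results.
```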
Graph-based Ranking • Option 1: ranking by a local PageRank • Perform a link analysis on G • Rank the nodes by their connectivity in G • Option 2: ranking by maximizing weighted density
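A minimal sketch of the link-analysis option, assuming numpy: a standard weighted PageRank on the fused graph (damping 0.85 is the common default, not necessarily the setting used in the paper), with nodes finally sorted by score:

```python
import numpy as np

def pagerank_rank(edges, damping=0.85, tol=1e-9):
    """Rank nodes of a weighted undirected graph by PageRank scores."""
    nodes = list({n for e in edges for n in e})
    idx = {n: i for i, n in enumerate(nodes)}
    W = np.zeros((len(nodes), len(nodes)))
    for e, w in edges.items():
        p, q = tuple(e)
        W[idx[p], idx[q]] = W[idx[q], idx[p]] = w   # symmetric adjacency
    W /= W.sum(axis=0, keepdims=True)               # column-stochastic
    r = np.full(len(nodes), 1.0 / len(nodes))
    while True:                                      # power iteration
        r_next = (1 - damping) / len(nodes) + damping * W @ r
        if np.abs(r_next - r).sum() < tol:
            return sorted(nodes, key=lambda n: -r_next[idx[n]])
        r = r_next
```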
Experiments • Datasets: 4 public benchmarks • UKBench: 2,550 × 4 = 10,200 images (k=5) • Corel-5K: 50 × 100 = 5,000 images (k=15) • Holidays: 1,491 images in 500 groups (k=5) • SFLandmark: 1.06M PCIs and 638K PFIs (k=30) • Baseline methods • Local features: VOC (vocabulary tree with contextual weighting, ICCV'11) • Global features: GIST (960-D → 256 bits), HSV (2,000-D → 256 bits) • Rank aggregation • A fusion method based on an SVM classifier • The top-k nearest neighbors of database images are computed and stored offline
UKBench • Evaluation: 4 × recall at the first four returned images, referred to as the N-S score (maximum = 4)
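For concreteness, a tiny sketch of the metric: each UKBench query has exactly four relevant images, so 4 × recall@4 equals the number of relevant images among the top four results, averaged over queries:

```python
def ns_score(rank_lists, relevant_sets):
    """Average number of relevant images in the top 4 results (max 4.0)."""
    hits = [len(set(r[:4]) & rel) for r, rel in zip(rank_lists, relevant_sets)]
    return sum(hits) / len(hits)

# One query, three of its four relevant images in the top four -> 3.0
print(ns_score([["a", "b", "x", "c"]], [{"a", "b", "c", "d"}]))
```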
Corel-5K • 50 categories, each with 100 images • Evaluation: average top-1 precision over leave-one-out retrievals
Holidays • Evaluation: mAP (%) over 1,491 queries
San Francisco Landmark • Database images: 1.06M perspective central images (PCIs) and 638K perspective frontal images (PFIs) • Query images: 803 images taken with smartphones • Evaluation: the recall rate in terms of correctly retrieved buildings
San Francisco Landmark • The fusion is applied to the top-50 candidates given by VOC.
Computation and Memory Cost • The average query time • Memory cost: 340MB of extra storage for the top-50 nearest neighbors of the 1.7M images in SFLandmark (1.7M × 50 × 4-byte image IDs = 340MB)
Sample Query Results (1) • On the UKBench dataset
Sample Query Results (2) • On the Corel-5K dataset
Sample Query Results (3) • On the SFLandmark dataset
Conclusions • A graph-based, query-specific fusion of retrieval results from local and global features • Requires no supervision • Retains the efficiency of both methods • Consistently improves the retrieval precision on 4 datasets • Easy for other motivated researchers to reproduce • Limitations • Some queries have no reciprocal neighbors in either method • Dynamic insertion or removal of database images is not handled