Bundling Features for Large Scale Partial-Duplicate Web Image Search

Bundling Features for Large Scale Partial-Duplicate Web Image Search • Zhong Wu, Qifa Ke, Michael Isard, and Jian Sun • CVPR 2009

Outline • Introduction • Bundled features • Image retrieval using bundled features • Experiments and results • Conclusion

Target • Given a query image, locate its near- and partial-duplicate images in a large corpus of web images.

Unlike object-based image retrieval

State-of-the-art • Visual words (quantization) & scalable textual index retrieval schemes • Post-processing • Geometric verification • Bundled feature • Weak geometric verification • Bundled feature = SIFT + MSER

MSER • Maximally Stable Extremal Region

MSER

Bundled features

Discriminative power • Ways to increase discriminative power • Feature region size • Feature dimensionality • Drawbacks • Less repeatable • Lower localization accuracy • Sensitive to occlusion and to photometric and geometric changes

Matching bundled features

Bundled features

Advantage • More discriminative • Tolerates large overlap error • Supports partial matching • Robust to • Occlusion • Geometric changes • …etc

Feature quantization • Hierarchical k-means • One million visual words from 50K training images
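
To illustrate how a hierarchical k-means vocabulary is used at query time, the sketch below quantizes a descriptor by descending a tree of cluster centers and picking the nearest child at each level. The tree here is a hand-built 2-D toy, not a trained vocabulary, and the names `quantize`, `leaf`, and `toy_tree` are hypothetical. The slide does not state the branching factor or depth; as an example only, a branching factor of 10 with depth 6 would yield one million leaves.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(descriptor, node):
    """Walk from the root to a leaf, choosing the nearest child center each time.

    Returns the visual-word id stored at the leaf that is reached.
    """
    while node["children"]:
        node = min(node["children"],
                   key=lambda child: squared_distance(descriptor, child["center"]))
    return node["word_id"]


def leaf(center, word_id):
    """Helper for building the toy tree: a node with no children."""
    return {"center": center, "children": [], "word_id": word_id}


# Toy two-level tree (assumed data): 2 clusters, each split into 2 visual words.
toy_tree = {
    "center": None,
    "word_id": None,
    "children": [
        {"center": (0.0, 0.0), "word_id": None,
         "children": [leaf((-1.0, 0.0), 0), leaf((1.0, 0.0), 1)]},
        {"center": (10.0, 10.0), "word_id": None,
         "children": [leaf((9.0, 10.0), 2), leaf((11.0, 10.0), 3)]},
    ],
}

print(quantize((0.8, 0.1), toy_tree))
print(quantize((9.2, 9.5), toy_tree))
```

Each lookup costs roughly (branching factor × depth) distance computations instead of one per visual word, which is why a tree is attractive for very large vocabularies.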

Feature quantization • K-D tree • pointList = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
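
The point list on this slide can be turned into a k-d tree with the usual median-split construction. The following is a minimal sketch of that construction plus a nearest-neighbor query; the names `Node`, `build_kdtree`, and `nearest` are chosen here for illustration and are not taken from the presentation.

```python
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])


def build_kdtree(points, depth=0):
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return Node(ordered[mid], axis,
                build_kdtree(ordered[:mid], depth + 1),
                build_kdtree(ordered[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return the stored point closest to target (squared Euclidean distance)."""
    if node is None:
        return best

    def dist2(p):
        return sum((a - b) ** 2 for a, b in zip(p, target))

    if best is None or dist2(node.point) < dist2(best):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if diff ** 2 < dist2(best):
        best = nearest(far, target, best)
    return best


point_list = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(point_list)
print(tree.point)              # root of the tree
print(nearest(tree, (9, 2)))   # closest stored point to (9, 2)
```

With this point list the root splits on x at (7, 2), and its children split on y at (5, 4) and (9, 6).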

Matching bundled features

Inverted-file index • Documents • T0 = "it is what it is" • T1 = "what is it" • T2 = "it is a banana" • Index • "a": {2} • "banana": {2} • "is": {0, 1, 2} • "it": {0, 1, 2} • "what": {0, 1}
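
The text example on this slide can be reproduced in a few lines. This is a minimal sketch of an inverted file over words; in image retrieval the "words" would be visual-word ids and the "documents" images. The function names `build_inverted_index` and `search_all` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.split():
            index[word].add(doc_id)
    return index


def search_all(index, words):
    """Return ids of documents that contain every query word."""
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings) if postings else set()


documents = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(documents)

for word in sorted(index):
    print(word, sorted(index[word]))
print(sorted(search_all(index, ["what", "is", "it"])))
```

Only the posting lists of the query words are touched, so the lookup cost depends on how many documents share those words rather than on the size of the whole corpus.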

Indexing and retrieval • Supports • 512 bundled features per image • 32 visual words per bundled feature

Indexing and retrieval • Voting

Indexing and retrieval • tf • A document contains 100 words; 'a' occurs 3 times • tf = 0.03 (3/100) • idf • 1,000 documents contain 'a'; total number of documents is 10,000,000 • idf = 9.21 (ln(10,000,000 / 1,000)) • tf-idf = 0.28 (0.03 * 9.21)
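
The arithmetic on this slide can be checked with a short script. It is a sketch using the natural-log idf shown in the slide; the function names are chosen here for illustration.

```python
import math


def tf(term_count, doc_length):
    """Term frequency: occurrences of the term divided by the document length."""
    return term_count / doc_length


def idf(total_docs, docs_with_term):
    """Inverse document frequency using the natural logarithm, as on the slide."""
    return math.log(total_docs / docs_with_term)


def tf_idf(term_count, doc_length, total_docs, docs_with_term):
    """Product of term frequency and inverse document frequency."""
    return tf(term_count, doc_length) * idf(total_docs, docs_with_term)


print(round(tf(3, 100), 2))                         # 0.03
print(round(idf(10_000_000, 1_000), 2))             # 9.21
print(round(tf_idf(3, 100, 10_000_000, 1_000), 2))  # 0.28
```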

Dataset • Basic dataset • One million images most frequently clicked in a popular commercial image-search engine • (50K, 200K, 500K) • Ground truth • 780 manually labeled partial-duplicate web images from 19 groups • Evaluation dataset = basic dataset + ground truth • Query • 150 images from ground truth

mAP • Mean average precision • Example: • Two query images, A and B • A has 4 duplicate images • B has 5 duplicate images • Retrieval ranks for A: 1, 2, 4, 7 • Retrieval ranks for B: 1, 3, 5 (two duplicates not retrieved) • Average precision A = (1/1 + 2/2 + 3/4 + 4/7) / 4 = 0.83 • Average precision B = (1/1 + 2/3 + 3/5 + 0 + 0) / 5 = 0.45 • mAP = (0.83 + 0.45) / 2 = 0.64
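
The worked example can be reproduced with the sketch below. Each query's average precision is normalized by its total number of ground-truth duplicates, so B is divided by 5 (not by the 3 duplicates actually retrieved), which is what gives 0.45. The function names are hypothetical.

```python
def average_precision(hit_ranks, num_relevant):
    """Average precision for one query.

    hit_ranks: 1-based ranks at which true duplicates were retrieved.
    num_relevant: total number of true duplicates; missed ones contribute 0.
    """
    ranks = sorted(hit_ranks)
    precision_sum = sum((i + 1) / rank for i, rank in enumerate(ranks))
    return precision_sum / num_relevant


def mean_average_precision(queries):
    """Mean of the per-query average precision over (hit_ranks, num_relevant) pairs."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)


ap_a = average_precision([1, 2, 4, 7], 4)
ap_b = average_precision([1, 3, 5], 5)
m_ap = mean_average_precision([([1, 2, 4, 7], 4), ([1, 3, 5], 5)])

print(round(ap_a, 2), round(ap_b, 2), round(m_ap, 2))  # 0.83 0.45 0.64
```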

Evaluation • Baseline • Bag-of-features approach with soft assignment [13] • [13] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.

Evaluation • Comparison (HE) • Enhance the bundled-feature approach with Hamming embedding [3] by adding a 24-bit Hamming code to filter out target features. • [3] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.
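
A minimal sketch of the filtering idea: candidate features that fall in the same visual word as the query are kept only if their 24-bit codes are within some Hamming distance of the query's code. The threshold value, the example codes, and the function names are assumptions for illustration; the slide gives only the code length.

```python
CODE_BITS = 24  # code length stated on the slide


def hamming_distance(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def filter_candidates(query_code, candidates, max_distance):
    """Keep ids of candidates whose code is within max_distance bits of the query.

    candidates: iterable of (feature_id, code) pairs sharing the query's visual word.
    max_distance: hypothetical threshold, not taken from the presentation.
    """
    return [feature_id for feature_id, code in candidates
            if hamming_distance(query_code, code) <= max_distance]


query = 0xABCDEF                 # made-up 24-bit code
candidates = [
    ("f1", 0xABCDEF),            # identical code, distance 0
    ("f2", 0xABCDEE),            # one bit differs
    ("f3", 0x543210),            # every bit differs, distance 24
]
print(filter_candidates(query, candidates, max_distance=4))
```

A tighter threshold removes more false matches within a visual word at the cost of dropping some true matches whose descriptors changed.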

Evaluation • Baseline 0.35 to Bundled (mem) 0.40: a 14% improvement • Baseline 0.35 to Bundled 0.49: a 40% improvement • Baseline 0.35 to Bundled+HE 0.52: a 49% improvement

Evaluation • Comparison (re-ranking) • Full geometric verification with RANSAC on the top 300 candidate images

Evaluation • Baseline 0.35 to Bundled+re-rank 0.62: a 77% improvement • Baseline+re-rank 0.50 to Bundled+re-rank 0.62: a 24% improvement
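
The percentages on the evaluation slides are relative gains over the stated reference score, which the sketch below recomputes from the quoted values. The function name is hypothetical.

```python
def relative_improvement(reference, improved):
    """Relative gain of improved over reference, in percent."""
    return (improved - reference) / reference * 100


comparisons = [
    ("Baseline -> Bundled (mem)", 0.35, 0.40),
    ("Baseline -> Bundled", 0.35, 0.49),
    ("Baseline -> Bundled+HE", 0.35, 0.52),
    ("Baseline -> Bundled+re-rank", 0.35, 0.62),
    ("Baseline+re-rank -> Bundled+re-rank", 0.50, 0.62),
]
for label, reference, improved in comparisons:
    print(f"{label}: {relative_improvement(reference, improved):.0f}%")
```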

Evaluation • Trade-off • Run time • Measured on a single CPU of a 3.0 GHz Core Duo desktop with 16 GB of memory

Sample results • AP from 0.51 to 0.74: a 45% improvement

Sample results

Conclusion • Bundled features for large scale partial-duplicate web image search. • Properties of bundled features • More discriminative than individual SIFT features • Simple and robust geometric constraints • Can partially match two groups of SIFT features • Advantage • Robustness to occlusion and to photometric and geometric changes
