Bundling Features for Large Scale Partial-Duplicate Web Image Search • Zhong Wu∗, Qifa Ke, Michael Isard, and Jian Sun • CVPR 2009
Outline • Introduction • Bundled features • Image Retrieval using bundled feature • Experiments and results • Conclusion
Target • Given a query image, the goal is to locate its near- and partial-duplicate images in a large corpus of web images.
State-of-the-art • Visual words (quantization) & scalable text-index retrieval schemes • Post-processing • Geometric verification • Bundled feature • Weak geometric verification • Bundled feature = SIFT + MSER (SIFT features grouped by an MSER region)
MSER • Maximally Stable Extremal Region
Discriminative power • Ways to increase discriminative power • Larger feature region size • Higher feature dimensionality • Drawbacks • Less repeatable • Lower localization accuracy • More sensitive to occlusion and to photometric and geometric changes
Advantages • More discriminative than individual SIFT features • Tolerates large overlap error • Allows partial matching • Robust to • Occlusion • Geometric changes • …etc
Feature quantization • Hierarchical k-means • One million visual words from 50K training images
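Hierarchical k-means builds the vocabulary as a tree: cluster the training descriptors into k groups, then recursively cluster each group, so branching factor k and depth L give up to k^L leaf visual words (for example, k = 10 and L = 6 would give one million; these values are an assumption, not taken from the slides). The toy sketch below works on 2-D points with a naive deterministic initialization; `build_tree`, `quantize`, and all parameters are hypothetical names, not the paper's code.

```python
from typing import List, Optional, Tuple

Vec = Tuple[float, ...]


def dist2(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vec]) -> Vec:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def kmeans(points: List[Vec], k: int, iters: int = 10) -> List[List[Vec]]:
    """Plain Lloyd's k-means; toy init uses the first k points.
    Returns the non-empty clusters."""
    centers = points[:k]
    clusters: List[List[Vec]] = [points]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: dist2(p, centers[i]))
            clusters[j].append(p)
        clusters = [c for c in clusters if c]  # drop empty clusters
        centers = [mean(c) for c in clusters]
    return clusters


class Node:
    def __init__(self, center: Vec):
        self.center = center
        self.children: List["Node"] = []
        self.word_id: Optional[int] = None  # set only on leaves (visual words)


def build_tree(points: List[Vec], k: int, depth: int) -> Tuple[Node, int]:
    """Recursively split the data k ways; each leaf becomes one visual word."""
    root = Node(mean(points))
    n_words = 0

    def grow(node: Node, pts: List[Vec], level: int) -> None:
        nonlocal n_words
        if level == depth or len(pts) <= k:
            node.word_id = n_words
            n_words += 1
            return
        for cluster in kmeans(pts, k):
            child = Node(mean(cluster))
            node.children.append(child)
            grow(child, cluster, level + 1)

    grow(root, points, 0)
    return root, n_words


def quantize(root: Node, p: Vec) -> int:
    """Descend greedily to the nearest child center at each level."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: dist2(p, c.center))
    return node.word_id
```

Quantizing is a greedy descent, so a lookup costs about k·L distance computations rather than k^L, which is what makes a million-word vocabulary practical.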
Feature quantization • K-D tree • pointList = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
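The pointList above is the standard textbook k-d tree example. A minimal sketch, assuming median splits that alternate between x and y, builds the tree and answers a nearest-neighbour query, the kind of lookup used to map a descriptor to its closest visual word. `KDNode`, `build_kdtree`, and `nearest` are hypothetical names.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class KDNode:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[KDNode]" = None, right: "Optional[KDNode]" = None):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build_kdtree(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Split on x at even depths and on y at odd depths, using the median point."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build_kdtree(pts[:mid], depth + 1),
                  build_kdtree(pts[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(node: Optional[KDNode], target: Point, best: Optional[Point] = None) -> Optional[Point]:
    """Exact nearest neighbour with branch pruning."""
    if node is None:
        return best
    if best is None or sq_dist(node.point, target) < sq_dist(best, target):
        best = node.point
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only visit the far side if the splitting plane is closer than the current best.
    if diff * diff < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


pointList = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(pointList)
```

With this list the root is (7, 2), its children are (5, 4) and (9, 6), and `nearest(tree, (9, 2))` returns (8, 1).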
Inverted-file index • Documents • T0 = "it is what it is" • T1 = "what is it" • T2 = "it is a banana" • Index • "a": {2} • "banana": {2} • "is": {0, 1, 2} • "it": {0, 1, 2} • "what": {0, 1}
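The index above can be reproduced with a few lines of code. In image retrieval the "documents" are images and the "terms" are visual words, but the structure is the same. `build_inverted_index` and `query_all` are illustrative names.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: List[str]) -> Dict[str, Set[int]]:
    """Map each term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.split():
            index[term].add(doc_id)
    return dict(index)


def query_all(index: Dict[str, Set[int]], terms: List[str]) -> Set[int]:
    """Documents containing every query term (AND query)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


docs = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(docs)
```

For example, `query_all(index, ["what", "is", "it"])` returns {0, 1}, since only T0 and T1 contain all three terms.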
Indexing and retrieval • Index capacity • Up to 512 bundled features per image • Up to 32 visual words per bundled feature
Indexing and retrieval • Voting
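The slide only names voting. As a generic illustration (not the paper's bundled-feature scoring, which is not reproduced here), each query visual word looks up its posting list and adds a vote to every image in it. Weighting each vote by idf is an assumption, and `vote` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def vote(query_words: List[int], index: Dict[int, Set[int]],
         n_images: int) -> List[Tuple[int, float]]:
    """Accumulate idf-weighted votes per image; return images ranked by score."""
    scores: Counter = Counter()
    for w in query_words:
        postings = index.get(w, set())
        if not postings:
            continue  # visual word never seen in the database
        weight = math.log(n_images / len(postings))  # idf weight (assumption)
        for image_id in postings:
            scores[image_id] += weight
    return scores.most_common()
```

Rare visual words carry more weight, so an image sharing one rare word with the query can outrank an image sharing only common ones.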
Indexing and retrieval • tf (term frequency) • A document contains 100 words, and ‘a’ appears 3 times • tf = 3/100 = 0.03 • idf (inverse document frequency) • 1,000 of 10,000,000 documents contain ‘a’ • idf = ln(10,000,000 / 1,000) = 9.21 • tf-idf = 0.03 * 9.21 = 0.28
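The tf-idf arithmetic above can be checked directly. This sketch uses the natural logarithm, as the slide does (some texts use log base 10, which gives different numbers); the function names are illustrative.

```python
import math


def tf(term_count: int, doc_length: int) -> float:
    """Fraction of the document's words that are this term."""
    return term_count / doc_length


def idf(n_docs: int, docs_with_term: int) -> float:
    """Natural-log inverse document frequency, as on the slide."""
    return math.log(n_docs / docs_with_term)


tf_a = tf(3, 100)                # 0.03
idf_a = idf(10_000_000, 1_000)   # ln(10,000), about 9.21
tfidf_a = tf_a * idf_a           # about 0.28
```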
Dataset • Basic dataset • The one million most frequently clicked images in a popular commercial image-search engine • Subsets of 50K, 200K and 500K images • Ground truth • 780 manually labeled partial-duplicate web images from 19 groups • Evaluation dataset = basic dataset + ground truth • Query • 150 images from the ground truth
mAP • Mean average precision • Example: • Two query images, A and B • A has 4 duplicate images in the corpus • B has 5 duplicate images in the corpus • Ranks at which A's duplicates are retrieved: 1, 2, 4, 7 • Ranks at which B's duplicates are retrieved: 1, 3, 5 (the other two are never retrieved) • Average precision A = (1/1+2/2+3/4+4/7)/4 = 0.83 • Average precision B = (1/1+2/3+3/5+0+0)/5 = 0.45 • mAP = (0.83+0.45)/2 = 0.64
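The average-precision example can be reproduced in code. AP averages the precision at each rank where a duplicate is found and divides by the total number of duplicates, so duplicates that are never retrieved contribute zero. The function names are illustrative.

```python
from typing import List, Tuple


def average_precision(ranks: List[int], n_relevant: int) -> float:
    """ranks: 1-based positions at which relevant images were retrieved.
    Relevant images that are never retrieved contribute 0 precision."""
    ranks = sorted(ranks)
    precisions = [(i + 1) / r for i, r in enumerate(ranks)]
    return sum(precisions) / n_relevant


def mean_average_precision(queries: List[Tuple[List[int], int]]) -> float:
    """queries: (ranks, n_relevant) per query image."""
    aps = [average_precision(r, n) for r, n in queries]
    return sum(aps) / len(aps)


ap_a = average_precision([1, 2, 4, 7], 4)   # about 0.83
ap_b = average_precision([1, 3, 5], 5)      # about 0.45
m = mean_average_precision([([1, 2, 4, 7], 4), ([1, 3, 5], 5)])  # about 0.64
```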
Evaluation • Baseline • Bag-of-features approach with soft assignment [13] • [13] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Lost in quantization: Improving particular object retrieval in large scale image databases. In CVPR, 2008.
Evaluation • Compare (HE) • Enhance bundled features with Hamming embedding [3]: a 24-bit Hamming code is added to each feature to filter out target features that share the query feature's visual word but are far from it in Hamming distance • [3] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.
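A minimal sketch of the Hamming-embedding filter, assuming each feature carries a 24-bit code stored as an integer. Two features that fall into the same visual word are kept as a match only if their codes differ in at most `threshold` bits. The threshold value 8 is hypothetical (the slide does not give one), as are the function names.

```python
from typing import List, Tuple

CODE_BITS = 24


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two 24-bit codes."""
    return bin((a ^ b) & ((1 << CODE_BITS) - 1)).count("1")


def he_filter(query_code: int, candidates: List[Tuple[int, int]],
              threshold: int = 8) -> List[int]:
    """candidates: (image_id, code) pairs that share the query's visual word.
    Keep only image ids whose code is within `threshold` bits (hypothetical value)."""
    return [img for img, code in candidates if hamming(query_code, code) <= threshold]
```

The filter discards matches that quantization alone would accept, which is why it tightens precision without changing the inverted-file lookup itself.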
Evaluation (mAP) • Baseline 0.35 to Bundled (mem) 0.40, a 14% improvement • Baseline 0.35 to Bundled 0.49, a 40% improvement • Baseline 0.35 to Bundled+HE 0.52, a 49% improvement
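The percentages are relative gains over the baseline score, (after - before) / before. A quick check of the arithmetic, with illustrative names:

```python
def relative_improvement(before: float, after: float) -> float:
    """Relative gain in percent: (after - before) / before * 100."""
    return (after - before) / before * 100


baseline = 0.35
for name, score in [("Bundled (mem)", 0.40), ("Bundled", 0.49), ("Bundled+HE", 0.52)]:
    print(f"{name}: {relative_improvement(baseline, score):.0f}%")
# Bundled (mem): 14%
# Bundled: 40%
# Bundled+HE: 49%
```

The same formula gives the re-ranking gains (0.35 to 0.62 is 77%, 0.50 to 0.62 is 24%) and the sample-result gain (0.51 to 0.74 is 45%).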
Evaluation • Compare (re-ranking) • Full geometric verification with RANSAC on the top 300 candidate images
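A minimal sketch of RANSAC-based re-ranking. For brevity it fits a translation-only model from a single sampled match; this is a simplification, since full geometric verification normally fits an affine transformation or a homography. The tolerance, iteration count, and function names are hypothetical. Only the top 300 candidates are re-ordered by inlier count, and the rest keep their first-stage order.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]
Match = Tuple[Point, Point]  # (query keypoint, candidate keypoint)


def ransac_translation(matches: List[Match], iters: int = 100,
                       tol: float = 2.0, seed: int = 0) -> Tuple[float, float, int]:
    """Fit a translation-only model with RANSAC; return (dx, dy, inlier_count)."""
    rng = random.Random(seed)
    best = (0.0, 0.0, 0)
    if not matches:
        return best
    for _ in range(iters):
        (qx, qy), (cx, cy) = rng.choice(matches)  # minimal sample: one match
        dx, dy = cx - qx, cy - qy
        inliers = sum(
            1 for (ax, ay), (bx, by) in matches
            if abs(ax + dx - bx) <= tol and abs(ay + dy - by) <= tol
        )
        if inliers > best[2]:
            best = (dx, dy, inliers)
    return best


def rerank(candidates: List[Tuple[str, List[Match]]], top_k: int = 300) -> List[str]:
    """Re-order the first top_k candidates by inlier count; keep the rest as is."""
    head = [(img, ransac_translation(m)[2]) for img, m in candidates[:top_k]]
    head.sort(key=lambda t: t[1], reverse=True)  # stable: ties keep first-stage order
    return [img for img, _ in head] + [img for img, _ in candidates[top_k:]]
```

Restricting RANSAC to a short list is what keeps the cost bounded, since each candidate needs its own model fit.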
Evaluation • Baseline 0.35 to Bundled+re-rank 0.62, a 77% improvement • Baseline+re-rank 0.50 to Bundled+re-rank 0.62, a 24% improvement
Evaluation • Trade-off • Run time • Measured on a single CPU of a 3.0 GHz Core Duo desktop with 16 GB of memory
Sample results • AP from 0.51 to 0.74, a 45% improvement
Conclusion • Bundled features for large scale partial-duplicate web image search • Properties of bundled features • More discriminative than individual SIFT features • Simple and robust geometric constraints • Allow partial matching of two groups of SIFT features • Advantage • Robust to occlusion and to photometric and geometric changes