A Statistical Approach to Speed Up Ranking/Re-Ranking
Hong-Ming Chen, hc2599@columbia.edu
Advisor: Professor Shih-Fu Chang
Outline
• Flow chart of the overall work
• The idea of using a statistical approach to do re-ranking
  • By feature location relationship
    • O(n^2) time complexity
  • By orientation relationship
    • O(n) time complexity
    • The re-rank accuracy is as good as RANSAC
• Experimental result evaluation
Flow Chart 1 – ranking components construction
[Flow chart; components: Dataset: Ukbench [1]; Hierarchical k-means [1][2]; Code book; Bag-of-words histograms of the database images; Query image; Bag-of-words histogram of the query image; Respond top-N result]
[1] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2161-2168, June 2006.
[2] http://www.vlfeat.org/
Flow Chart 2 – re-ranking components construction
[Flow chart; components: Respond top-N result; Re-rank by RANSAC [3]; Re-rank by proposed statistical approach; Result evaluation]
[3] http://www.csse.uwa.edu.au/~pk/research/matlabfns/, Peter Kovesi, Centre for Exploration Targeting, School of Earth and Environment, The University of Western Australia
1. Feature Locations Relationship
• SIFT features [4] are:
  • Invariant to translation, rotation and scaling
  • Partially invariant to local geometric distortion
• For an ideal similar image pair:
  • Only translation, rotation and scaling are present
  • The ratio of corresponding distance pairs should be constant.
[Figure: points P1a and P2a in Image A, matched points P1b and P2b in Image B, with distances dist1 and dist2]
[4] David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004)
1. Feature Locations Relationship
• SIFT features [4] are:
  • Invariant to translation, rotation and scaling
  • Partially invariant to local geometric distortion
• For a similar image pair with a view angle difference:
  • Translation, rotation and scaling are present
  • Local geometric distortion and wrongly matched feature points also occur
  • The ratio of corresponding distance pairs is nearly constant.
[Figure: points P1a and P2a in Image A, matched points P1b and P2b in Image B, with distances dist1 and dist2]
Example: ukbench00000 and ukbench00001
• Total number of match points: 554
• Mean = 0.85 (mean: scaling)
• Variance = 0.017 (variance: matching error; the smaller the better)
1. Feature Locations Relationship
• Assumption after observation:
  • A similar image pair: a distance-ratio distribution with small variance
  • A dissimilar image pair: a distance-ratio distribution with large variance
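The assumption above can be illustrated with a minimal sketch. It assumes matched points are given as two equal-length lists of (x, y) coordinates, where entry i in one list matches entry i in the other. The function name `distance_ratio_stats` and the choice to divide the Image B distance by the Image A distance are illustrative assumptions, not taken from the slides.

```python
import itertools
import math
import statistics


def distance_ratio_stats(points_a, points_b):
    """Return (mean, variance) of pairwise distance ratios, or None.

    points_a[i] is assumed to match points_b[i]. For every pair of
    matches (i, j), the distance between i and j in Image B is divided
    by the distance between i and j in Image A. This visits all pairs,
    so it costs O(n^2) for n matches.
    """
    ratios = []
    for i, j in itertools.combinations(range(len(points_a)), 2):
        dist_a = math.dist(points_a[i], points_a[j])
        dist_b = math.dist(points_b[i], points_b[j])
        if dist_a > 0 and dist_b > 0:  # skip coincident points
            ratios.append(dist_b / dist_a)
    if not ratios:
        return None
    return statistics.fmean(ratios), statistics.pvariance(ratios)


# Image B is Image A scaled by 2, rotated by 90 degrees and translated.
image_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 2.0)]
image_b = [(5.0, 3.0), (5.0, 5.0), (3.0, 3.0), (1.0, 7.0)]
good_mean, good_var = distance_ratio_stats(image_a, image_b)

# Swapping two entries simulates wrongly matched feature points.
wrong_b = [image_b[1], image_b[0], image_b[2], image_b[3]]
bad_mean, bad_var = distance_ratio_stats(image_a, wrong_b)
```

In this toy case the mean recovers the scale factor and the variance stays near zero for the correct matching, while the wrong matching spreads the ratios out.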
Analysis of feature locations relationship
• Relationship between the number of match pairs and the average variance, for similar image pairs and dissimilar image pairs
[Plot legend] Red: dissimilar image pairs; Blue: similar image pairs
2. Feature Orientation Relationship
• SIFT features [4] are:
  • Invariant to translation, rotation and scaling
  • Partially invariant to local geometric distortion
• For similar image pairs:
  • The rotation angle of P1a -> P1b should be EQUAL to the rotation angle of P2a -> P2b
[Figure: points P1a and P2a in Image A, matched points P1b and P2b in Image B]
Example: ukbench00000 and ukbench00001
[Figure: orientation histograms of ukbench00000 and ukbench00001]
• Shift of about pi/4
• The rotation angle is about 50 degrees
• Distance measured by histogram intersection
2. Feature orientation Relationship • Assumption after observation: • A similar image pair: small orientation histogram distance • A dissimilar image pair: large orientation histogram distance
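One possible reading of the orientation histogram distance is sketched below. Everything beyond "orientation histograms compared by histogram intersection" is an assumption made for illustration: the 36-bin resolution, aligning the two histograms by trying every circular shift, and reporting 1 minus the best intersection as the distance. The function names are hypothetical.

```python
import math


def orientation_histogram(angles, bins=36):
    """Normalized histogram of feature orientations given in radians."""
    hist = [0.0] * bins
    for theta in angles:
        fraction = (theta % (2 * math.pi)) / (2 * math.pi)
        hist[int(fraction * bins) % bins] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def histogram_intersection(hist_1, hist_2):
    """Sum of bin-wise minima; 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(hist_1, hist_2))


def orientation_distance(angles_a, angles_b, bins=36):
    """1 - best intersection over all circular shifts of histogram A.

    The shift search is an assumed way to absorb a global rotation
    between the two images. Building the histograms is O(n) in the
    number of matched features; the shift search depends only on the
    number of bins.
    """
    hist_a = orientation_histogram(angles_a, bins)
    hist_b = orientation_histogram(angles_b, bins)
    best = max(
        histogram_intersection(hist_a[-s:] + hist_a[:-s], hist_b)
        for s in range(bins)
    )
    return 1.0 - best


# Matched features whose orientations all rotate by 50 degrees.
rotation = math.radians(50)
angles_a = [0.1, 1.0, 2.0, 3.0, 4.5]
angles_b = [(a + rotation) % (2 * math.pi) for a in angles_a]
similar_distance = orientation_distance(angles_a, angles_b)

# Orientations with no common rotation, as for a dissimilar pair.
unrelated_b = [0.3, 0.35, 0.4, 5.0, 5.1]
dissimilar_distance = orientation_distance(angles_a, unrelated_b)
```

With a consistent rotation the shifted histograms overlap almost completely, so the distance is close to zero; without one, no single shift aligns the bins.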
Analysis of Feature Orientation Relationship
• Relationship between the number of match pairs and the average orientation intersection difference, for similar image pairs and dissimilar image pairs
[Plot legend] Red: dissimilar image pairs; Blue: similar image pairs
Why do I zoom in on the small-match-number portion of the diagrams?
Dataset and features discussion
• Ukbench dataset analysis:
  • 2550 classes, 4 images per class
  • Similar image pair combinations: C(4, 2) * 2550 = 15300 pairs
  • A high percentage of similar image pairs have only a small number of match points (with the default ratio value = 0.6).
  • The re-ranking criterion should therefore perform well especially when only a small number of match points is available.
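The pair count above can be checked with a short sketch. It assumes, for illustration only, that the images of each class occupy consecutive indices (class c holds images 4c to 4c + 3); the helper name `similar_pairs` is hypothetical.

```python
import itertools
import math

NUM_CLASSES = 2550
IMAGES_PER_CLASS = 4


def similar_pairs(num_classes=NUM_CLASSES, per_class=IMAGES_PER_CLASS):
    """Yield (i, j) image-index pairs that belong to the same class.

    Assumes class c owns the consecutive indices
    c * per_class .. (c + 1) * per_class - 1.
    """
    for c in range(num_classes):
        members = range(c * per_class, (c + 1) * per_class)
        yield from itertools.combinations(members, 2)


# Closed form: C(4, 2) pairs per class times the number of classes.
pair_count = math.comb(IMAGES_PER_CLASS, 2) * NUM_CLASSES
enumerated_count = sum(1 for _ in similar_pairs())
```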
Comparison of two re-ranking approaches: The variance of the scaling distribution itself varies widely, even though its mean is quite distinctive.
Comparison of two re-ranking approaches: The variance of the orientation histogram difference is very small (relative to its mean value) and stable.
Comparison of two re-ranking approaches: Overall, the orientation histogram difference can clearly separate similar and dissimilar image pairs, because the mean values are far apart and the variance is quite small.
Comparison of two re-ranking approaches: When there are more than 5 match points, the orientation histogram difference can roughly separate similar and dissimilar image pairs.
Comparison of two re-ranking approaches: When there are more than 10 match points, the orientation histogram difference can clearly separate similar and dissimilar image pairs.
Experimental results discussion
• 1. The impact of the K value (number of cluster centers)
[Plots for K=1000, K=4096, K=10000, K=50625, K=100000]
Experimental results discussion
• 2. The impact of looking up the code book by different approaches:
  • A. By tracing the vocabulary tree [1]: efficient, but the result is not optimal
  • B. By scanning the whole code book: very slow, but guarantees an optimal BoW result with respect to the K centers
[Plots: K=1000 decoded by tree; K=1000 decoded directly; K=10000 decoded by tree; K=10000 decoded directly]
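The difference between the two lookups can be seen on a toy example. The two-level tree below, its centers and the function names are hypothetical values chosen for illustration; a real vocabulary tree is much deeper and wider.

```python
import math

# Hypothetical two-level vocabulary tree: two branches, two leaves each.
# Each leaf is (word_id, center).
TREE = {
    "children": [
        {"center": (0.0, 0.0), "leaves": [(0, (-3.0, 0.0)), (1, (3.0, 0.0))]},
        {"center": (10.0, 0.0), "leaves": [(2, (5.5, 0.0)), (3, (14.5, 0.0))]},
    ]
}


def lookup_by_tree(tree, descriptor):
    """Greedy descent: nearest branch first, then nearest leaf in it.

    Compares against only a few centers per level, but can miss the
    globally nearest leaf when it sits in another branch.
    """
    branch = min(tree["children"], key=lambda c: math.dist(c["center"], descriptor))
    word_id, _ = min(branch["leaves"], key=lambda leaf: math.dist(leaf[1], descriptor))
    return word_id


def lookup_by_scan(tree, descriptor):
    """Scan every leaf: cost grows with K, but finds the nearest center."""
    leaves = [leaf for c in tree["children"] for leaf in c["leaves"]]
    word_id, _ = min(leaves, key=lambda leaf: math.dist(leaf[1], descriptor))
    return word_id


# A descriptor near the branch boundary gets different code words.
query = (4.8, 0.0)
tree_word = lookup_by_tree(TREE, query)  # descends into the first branch
scan_word = lookup_by_scan(TREE, query)  # nearest leaf is in the second branch
```

For descriptors far from a branch boundary the two lookups agree; the disagreement near boundaries is what makes the tree-decoded histogram non-optimal.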
[Plot: K=1000, re-rank depth = 20; series: Ground truth, Original, Scale var, Rotation, Scale var + rotation, RANSAC]
[Plot: K=50625, re-rank depth = 20; series: Ground truth, Original, Scale var, Rotation, Scale var + rotation, RANSAC]
Experimental result -- all • Re-rank depth = 20
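A re-rank depth of 20 is read here as "only the first 20 results of the original ranking are re-ordered"; that reading, the function name `rerank_top` and the convention that a lower score means a better match are assumptions for this sketch.

```python
def rerank_top(ranked_ids, rerank_score, depth=20):
    """Re-order the first `depth` results by `rerank_score`.

    ranked_ids:   list of image ids from the original ranking, best first.
    rerank_score: callable mapping an image id to a distance-like score
                  (lower is better), for example a ratio variance or an
                  orientation histogram distance.
    Results beyond `depth` keep their original order. sorted() is
    stable, so ties keep their original relative order as well.
    """
    head = sorted(ranked_ids[:depth], key=rerank_score)
    return head + list(ranked_ids[depth:])


# Toy ranking of 30 ids with made-up scores that reverse the top 20.
original_ranking = list(range(30))
reranked = rerank_top(original_ranking, rerank_score=lambda image_id: -image_id)
```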
Time Complexity Analysis
• RANSAC: O(Kn)
  • K: number of random subsets tried
  • n: input data size
  • No upper bound on the time it takes to compute the parameters
• Distribution of the feature location distance relationship:
  • O(n^2): the distribution consists of all distance relationships
  • O(n): when n (the number of match points) is large enough, we can subsample a "reliable enough" number of samples to form the distribution
• The distance between orientation histograms of matched SIFT features:
  • O(n): to generate the rotation angle histograms of matched SIFT features
  • Constant time to compute each rotation angle
  • Only a little overhead with respect to searching for match points
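The O(n) variant of the distance-ratio distribution can be sketched by drawing a fixed number of random partners per match point instead of enumerating all pairs. The function name, the default of 4 partners per point and the fixed seed are assumptions for illustration; the slides do not say how many samples are "reliable enough".

```python
import math
import random
import statistics


def sampled_ratio_variance(points_a, points_b, partners_per_point=4, seed=0):
    """Estimate the distance-ratio variance from O(n) sampled pairs.

    points_a[i] is assumed to match points_b[i]. Each point is paired
    with `partners_per_point` randomly drawn other points, so at most
    n * partners_per_point ratios are computed instead of n * (n - 1) / 2.
    Returns (variance or None, number of ratios used).
    """
    rng = random.Random(seed)
    n = len(points_a)
    ratios = []
    for i in range(n):
        for _ in range(partners_per_point):
            j = rng.randrange(n)
            if j == i:
                continue  # a point cannot be paired with itself
            dist_a = math.dist(points_a[i], points_a[j])
            dist_b = math.dist(points_b[i], points_b[j])
            if dist_a > 0 and dist_b > 0:
                ratios.append(dist_b / dist_a)
    if len(ratios) < 2:
        return None, len(ratios)
    return statistics.pvariance(ratios), len(ratios)


# 20 grid points; Image B is Image A scaled by 0.85.
grid_a = [(float(x), float(y)) for x in range(5) for y in range(4)]
grid_b = [(0.85 * x, 0.85 * y) for x, y in grid_a]
variance, used = sampled_ratio_variance(grid_a, grid_b)
```

For these 20 points at most 80 ratios are computed, where the exhaustive version would compute 190.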
Future work
• We have:
  • 1. Scale information
  • 2. Orientation information
  • 3. Translation, which is trivial to find
  • A good initial guess for precise homography matrix estimation?
• Apply the current approach to quantized SIFT features:
  • Use a code word to represent an interest point, rather than a 128-dimensional vector
  • Move from exact 1-1 mapping to many-to-many mapping.
  • I have tried to solve this problem, but there are no satisfactory results at this stage.
References
• [1] D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2161-2168, June 2006.
• [2] http://www.vlfeat.org/
• [3] http://www.csse.uwa.edu.au/~pk/research/matlabfns/, Peter Kovesi, Centre for Exploration Targeting, School of Earth and Environment, The University of Western Australia
• [4] David G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, 60, 2 (2004)