Low Complexity Keypoint Recognition and Pose Estimation
Vincent Lepetit
Real-Time 3D Object Detection
Runs at 15 Hz.
Keypoint Recognition
The general approach [Lowe, Matas, Mikolajczyk] is a particular case of classification:
• Pre-processing: make the actual classification easier;
• Nearest neighbor classification: search in the database.
One class per keypoint: the set of the keypoint's possible appearances under various perspective, lighting, noise...
The training phase produces the classifier, which is used at run-time to recognize the keypoints.
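We are looking for the most probable class of a patch. If the patch can be represented by a set of image features { f_i }, the posterior probability of each class is proportional to the joint distribution of the features given that class, but a complete representation of the joint distribution is infeasible. Naive Bayesian ignores the correlation between the features. Compromise: ferns, small groups of features whose responses are combined with the Naive Bayesian rule.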
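The formulas of this slide did not survive extraction. As a sketch of the usual formulation behind ferns (the symbols $c_i$ for the classes, $f_j$ for the $N$ binary features, $F_k$ for the $k$-th of $M$ ferns, and the uniform class prior are assumptions of this note, not recovered from the slide):

```latex
\begin{align*}
\hat{c} &= \operatorname*{argmax}_{c_i} \; P(C = c_i \mid f_1, \dots, f_N) \\
P(C = c_i \mid f_1, \dots, f_N) &\propto P(f_1, \dots, f_N \mid C = c_i)
  \qquad \text{(uniform prior)} \\
\text{Naive Bayesian:} \quad P(f_1, \dots, f_N \mid C = c_i) &\approx \prod_{j=1}^{N} P(f_j \mid C = c_i) \\
\text{Ferns:} \quad P(f_1, \dots, f_N \mid C = c_i) &\approx \prod_{k=1}^{M} P(F_k \mid C = c_i)
\end{align*}
```

With $S$ features per fern ($N = M S$), the full joint distribution needs on the order of $2^N$ parameters per class, while the fern model needs $M \cdot 2^S$.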
Ferns: Training
The tests compare the intensities of two pixels around the keypoint. Each test is invariant to light changes by any increasing function. The posterior probabilities are estimated from the training samples.
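Ferns: Training
[Figure: the binary results of a fern's tests on each training patch form an index (here 1, 5, and 6), and the counter at that index is incremented (++).]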
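The counting step pictured above can be sketched as follows. This is a minimal illustration, not the author's code: the `FernCounts` type, its member names, the most-significant-bit-first convention, and the +1 smoothing prior are all assumptions. Under that bit convention, the indices 1, 5 and 6 correspond to the test results 001, 101 and 110.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical container for one fern: S binary tests give 2^S leaves,
// and each leaf keeps one counter per class.
struct FernCounts {
    int num_tests;                  // S
    int num_classes;                // H
    std::vector<int> leaf_counts;   // size 2^S * H, indexed [leaf * H + class]
    std::vector<int> class_totals;  // number of training samples seen per class

    FernCounts(int s, int h)
        : num_tests(s), num_classes(h),
          leaf_counts((std::size_t(1) << s) * h, 0), class_totals(h, 0) {}

    // Concatenate the binary test results, first test as most significant bit.
    static int leaf_index(const std::vector<int>& bits) {
        int index = 0;
        for (int b : bits) index = (index << 1) | (b ? 1 : 0);
        return index;
    }

    // Record one training patch of class `c` whose S test results are `bits`.
    void add_sample(const std::vector<int>& bits, int c) {
        leaf_counts[std::size_t(leaf_index(bits)) * num_classes + c]++;
        class_totals[c]++;
    }

    // Smoothed estimate of log P(leaf | class). The +1 prior (an assumption)
    // keeps leaves never seen in training from getting probability zero.
    double log_posterior(int leaf, int c) const {
        double num = leaf_counts[std::size_t(leaf) * num_classes + c] + 1.0;
        double den = class_totals[c] + double(std::size_t(1) << num_tests);
        return std::log(num / den);
    }
};
```

Because training only increments counters, adding samples later does not require revisiting earlier ones.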
Ferns outperform Trees
500 classes. No orientation or perspective correction.
Ferns responses are combined multiplicatively (Naive Bayesian rule); Trees responses are combined additively (average).
[Plot: recognition rate versus number of structures, for FERNS and TREES.]
Optimized Locations versus Random Locations: We Can Use Random Tests
Comparison of the recognition rates for 200 keypoints: information gain optimization versus randomness.
[Plot: recognition rate versus number of trees.]
We Can Use Random Tests
For a small number of classes, we can try several tests and retain the best one according to some criterion. When the number of classes is large, any test does a decent job.
We Can Use Random Tests: Why It Is Interesting
• Building the ferns takes no time (except for the posterior probabilities estimation);
• Simplifies the classifier structure;
• Allows incremental learning.
Comparison with SIFT: Recognition rate
[Plot: number of inliers versus frame index, for FERNS and SIFT.]
Comparison with SIFT: Computation time
• SIFT: 1 ms to compute the descriptor of a keypoint (without including convolution);
• FERNS: 13.5 microseconds to classify one keypoint into 200 classes.
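Keypoint Recognition in Ten Lines of Code
    // P: H class scores; D: pixel offsets of the tests; K: keypoint location in the image;
    // PF: posterior tables; p: pointer into PF (declared elsewhere, like shift1 and shift2).
    for(int i = 0; i < H; i++) P[i] = 0.;
    for(int k = 0; k < M; k++) {                  // for each of the M ferns
      int index = 0, * d = D + k * 2 * S;
      for(int j = 0; j < S; j++) {                // for each of its S tests
        index <<= 1;
        if (*(K + d[0]) < *(K + d[1]))            // compare two pixel intensities
          index++;
        d += 2; }
      p = PF + k * shift2 + index * shift1;       // scores stored for this fern and index
      for(int i = 0; i < H; i++) P[i] += p[i]; }  // accumulate over the ferns
• Very simple to implement;
• No need for orientation nor perspective correction;
• (Almost) no parameters to tune;
• Very fast.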
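For readers who want to compile the routine, here is a self-contained sketch of the same loop. The function names, the element types (`unsigned char` pixels, `float` scores), and the `[fern][leaf][class]` table layout are assumptions made for illustration. The scores are summed, so the table is assumed to hold log-probabilities, which matches the multiplicative combination rule.

```cpp
#include <algorithm>
#include <vector>

// K  : pointer to the keypoint's center pixel in the image
// D  : 2 * S pixel offsets per fern, relative to K
// PF : score table, laid out as [fern][leaf][class] (assumed layout)
// Returns the accumulated score of each of the H classes.
std::vector<float> classify_keypoint(const unsigned char* K, const int* D,
                                     const float* PF, int M, int S, int H) {
    const int shift1 = H;             // stride between two leaves of a fern
    const int shift2 = (1 << S) * H;  // stride between two ferns
    std::vector<float> P(H, 0.f);
    for (int k = 0; k < M; k++) {
        int index = 0;
        const int* d = D + k * 2 * S;
        for (int j = 0; j < S; j++) {
            index <<= 1;
            if (K[d[0]] < K[d[1]]) index++;  // one binary intensity test
            d += 2;
        }
        const float* p = PF + k * shift2 + index * shift1;
        for (int i = 0; i < H; i++) P[i] += p[i];
    }
    return P;
}

// Index of the class with the highest accumulated score.
int best_class(const std::vector<float>& P) {
    return int(std::max_element(P.begin(), P.end()) - P.begin());
}
```

The caller is responsible for keeping every offset in `D` inside the image around `K`.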
Ferns Tuning
The number of ferns and the number of tests per fern can be tuned to adapt to the hardware in terms of CPU power and memory size.
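The trade-off can be made concrete with a little arithmetic. The sketch below assumes one stored value per (fern, leaf, class) triple, as in the recognition code; the function names and the example settings are hypothetical.

```cpp
#include <cstddef>

// Bytes needed for the score tables: M ferns, 2^S leaves per fern,
// H classes per leaf, `bytes_per_value` bytes per stored value.
std::size_t fern_table_bytes(int M, int S, int H, std::size_t bytes_per_value) {
    return std::size_t(M) * (std::size_t(1) << S) * std::size_t(H) * bytes_per_value;
}

// Number of binary pixel comparisons needed to classify one keypoint.
std::size_t tests_per_keypoint(int M, int S) {
    return std::size_t(M) * std::size_t(S);
}
```

For instance, with the hypothetical setting M = 50, S = 11, H = 200 and 4-byte values, the tables take 81,920,000 bytes (about 78 MiB) and each keypoint costs 550 comparisons. Memory grows exponentially with S but only linearly with M, so reducing S is the quicker way to fit a small device.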
Feature Harvesting
Estimate the posterior probabilities from a training video sequence.
Feature Harvesting
[Figure: training examples and matches; loop between "Detect Object in Current Frame" and "Update Classifier".]
Incremental learning. With the ferns, we can easily:
- add a class;
- remove a class;
- add samples of a class to refine the classifier.
No need to store image patches; we can select the keypoints the classifier can recognize.
EPnP: An Accurate Non-Iterative O(n) Solution to the PnP Problem
Joint Work with Francesc Moreno-Noguer
The Perspective-n-Point (PnP) Problem
2D/3D correspondences known; internal parameters known; Rotation, Translation?
How to take advantage of the internal parameters?
Solutions exist for the specific cases n = 3 [...], n = 4 [...], n = 5 [...], and the general case [...].
A Stable Algorithm
[Plots: mean and median rotation error (%) versus number of points used to estimate pose.]
LHM: Lu-Hager-Mjolsness, Fast and Globally Convergent Pose Estimation from Video Images. PAMI'00. (Alternately optimizes over rotation and translation.)
EPnP: our method.
A Fast Algorithm
[Plot: median rotation error (%) versus computation time (sec), logarithmic scale.]
General Approach
Estimate the coordinates of the 3D points in the camera coordinate system, then recover Rotation, Translation [Lu et al. PAMI00].
Introducing Control Points
The 3D points are expressed as a weighted sum of four control points.
12 unknowns: the coordinates of the control points in the camera coordinate system.
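Written out, the control-point parameterization reads as follows. The notation is an assumption of this note: superscripts $w$ and $c$ denote world and camera coordinates, $p_i$ the 3D points, $c_j$ the control points, and $\alpha_{ij}$ the weights.

```latex
\begin{align*}
p_i^w &= \sum_{j=1}^{4} \alpha_{ij}\, c_j^w, \qquad \sum_{j=1}^{4} \alpha_{ij} = 1, \\
p_i^c &= \sum_{j=1}^{4} \alpha_{ij}\, c_j^c .
\end{align*}
```

Because the weights sum to one, they are unchanged by a rigid motion, so the same $\alpha_{ij}$ computed in world coordinates apply in camera coordinates. The unknowns reduce to the $4 \times 3 = 12$ coordinates of the $c_j^c$.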
The Point Reprojections Give a Linear System
For each correspondence i, the reprojection gives linear equations in the coordinates of the control points. Rewriting and concatenating the equations from all the correspondences gives $Mx = 0$.
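As an illustration of how such a system can be assembled, the sketch below assumes a pinhole camera with focal lengths (fu, fv) and principal point (uc, vc), and stacks the unknowns as x = [x1 y1 z1 ... x4 y4 z4]. Eliminating the projective depth from the projection of a point written as a weighted sum of control points leaves two equations per correspondence: sum_j alpha_j (fu x_j + (uc - u) z_j) = 0 and sum_j alpha_j (fv y_j + (vc - v) z_j) = 0. The type and function names are hypothetical.

```cpp
#include <array>
#include <vector>

struct Intrinsics { double fu, fv, uc, vc; };  // focal lengths, principal point

using Row12 = std::array<double, 12>;

// Append the two rows of M contributed by one 2D/3D correspondence.
// alpha  : the four control-point weights of the 3D point
// (u, v) : the measured image location of that point
void append_rows(std::vector<Row12>& M, const std::array<double, 4>& alpha,
                 double u, double v, const Intrinsics& A) {
    Row12 ru{}, rv{};  // zero-initialized rows
    for (int j = 0; j < 4; j++) {
        ru[3 * j + 0] = alpha[j] * A.fu;        // coefficient of x_j
        ru[3 * j + 2] = alpha[j] * (A.uc - u);  // coefficient of z_j
        rv[3 * j + 1] = alpha[j] * A.fv;        // coefficient of y_j
        rv[3 * j + 2] = alpha[j] * (A.vc - v);  // coefficient of z_j
    }
    M.push_back(ru);
    M.push_back(rv);
}

// Dot product of one row of M with a candidate solution x.
double dot(const Row12& r, const Row12& x) {
    double s = 0.0;
    for (int k = 0; k < 12; k++) s += r[k] * x[k];
    return s;
}
```

With n correspondences, M has 2n rows and 12 columns, and the true control-point coordinates give a zero product with every row when the measurements are noise-free.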
The Solution as Weighted Sum of Eigenvectors
• $Mx = 0$
• $M^T M x = 0$
• $x$ belongs to the null space of $M^T M$: $x = \sum_{i=1}^{N} \beta_i v_i$, with $v_i$ the eigenvectors of matrix $M^T M$ associated with null eigenvalues.
• Computing $M^T M$ is the most costly operation, and it is linear in n, the number of correspondences.
From 12 Unknowns to 1, 2, 3, or 4
• The $\beta_i$ are our N new unknowns;
• N is the dimension of the null space of $M^T M$;
• Without noise: N = 1 (scale ambiguity).
• In practice: no zero eigenvalues, but several very small, and N ≥ 1 (depends on the noise on the 2D locations).
• We found that only the cases N = 1, 2, 3 and 4 must be considered.
How the Control Points Vary with the $\beta_i$
[Figure: when varying the $\beta_i$, the reprojections in the image and the corresponding 3D points.]
Imposing the Rigidity Constraint
The distances between the control points must be preserved: 6 quadratic equations in the $\beta_i$.
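In symbols, and assuming the notation $v_k^{[i]}$ for the three entries of eigenvector $v_k$ that correspond to control point $i$ (a convention introduced here), the constraint can be written as:

```latex
\[
\left\| c_i^c - c_j^c \right\|^2 = \left\| c_i^w - c_j^w \right\|^2,
\quad 1 \le i < j \le 4,
\qquad \text{with} \quad c_i^c = \sum_{k=1}^{N} \beta_k\, v_k^{[i]} .
\]
```

Four control points give $\binom{4}{2} = 6$ pairs, hence the 6 equations. The right-hand sides are known, since the control points are chosen in world coordinates.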
The Case N = 1
$x = \beta_1 v_1$, and 6 quadratic equations:
• $\beta_1$ can easily be computed:
• Its absolute value is the solution of a linear system;
• Its sign is chosen so that the handedness of the control points is preserved.
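One way to read these two steps is sketched below. It is an interpretation rather than the authors' implementation: the magnitude is taken as the least-squares solution of the six equations, which are linear in the square of beta, and the sign is found by comparing the orientation (sign of a determinant) of the control points before and after scaling. All type and function names are hypothetical.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Points4 = std::array<Vec3, 4>;

double squared_distance(const Vec3& a, const Vec3& b) {
    double s = 0.0;
    for (int k = 0; k < 3; k++) s += (a[k] - b[k]) * (a[k] - b[k]);
    return s;
}

// Least-squares solution of beta^2 * a_k = b_k over the 6 control-point pairs,
// where a_k is a squared distance in the eigenvector v and b_k the matching
// squared distance between the world control points.
double beta_magnitude(const Points4& v, const Points4& world) {
    double num = 0.0, den = 0.0;
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++) {
            double a = squared_distance(v[i], v[j]);
            double b = squared_distance(world[i], world[j]);
            num += a * b;
            den += a * a;
        }
    return std::sqrt(num / den);
}

// Determinant of [p1 - p0, p2 - p0, p3 - p0]; its sign encodes handedness.
double orientation(const Points4& p) {
    Vec3 a, b, c;
    for (int k = 0; k < 3; k++) {
        a[k] = p[1][k] - p[0][k];
        b[k] = p[2][k] - p[0][k];
        c[k] = p[3][k] - p[0][k];
    }
    return a[0] * (b[1] * c[2] - b[2] * c[1])
         - a[1] * (b[0] * c[2] - b[2] * c[0])
         + a[2] * (b[0] * c[1] - b[1] * c[0]);
}

// A negative beta flips the handedness, so it is chosen negative exactly when
// v and the world control points have opposite orientations.
double signed_beta(const Points4& v, const Points4& world) {
    double magnitude = beta_magnitude(v, world);
    bool same = (orientation(v) > 0) == (orientation(world) > 0);
    return same ? magnitude : -magnitude;
}
```

The sketch assumes the four control points are not coplanar, so that both determinants are non-zero.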
The Case N = 2
$x = \beta_1 v_1 + \beta_2 v_2$, and 6 quadratic equations. We use the linearization technique, which gives 6 linear equations in $\beta_{11} = \beta_1^2$, $\beta_{12} = \beta_1 \beta_2$, and $\beta_{22} = \beta_2^2$.
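To see where the linear equations come from, expand one distance constraint for N = 2. The shorthand $\Delta_k^{ij} = v_k^{[i]} - v_k^{[j]}$, with $v_k^{[i]}$ the three entries of $v_k$ belonging to control point $i$, is an assumption of this note:

```latex
\begin{align*}
\left\| \beta_1 \Delta_1^{ij} + \beta_2 \Delta_2^{ij} \right\|^2
  &= \left\| c_i^w - c_j^w \right\|^2 \\
\beta_{11} \left\| \Delta_1^{ij} \right\|^2
  + 2\, \beta_{12} \left( \Delta_1^{ij} \right)^{\!\top} \Delta_2^{ij}
  + \beta_{22} \left\| \Delta_2^{ij} \right\|^2
  &= \left\| c_i^w - c_j^w \right\|^2
\end{align*}
```

Each of the 6 pairs $(i, j)$ yields one such equation, linear in $(\beta_{11}, \beta_{12}, \beta_{22})$, giving an overdetermined $6 \times 3$ system. Once it is solved, $\beta_1$ and $\beta_2$ follow from $\beta_{11}$, $\beta_{12}$ and $\beta_{22}$ up to a common sign.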
The Case N = 3
$x = \beta_1 v_1 + \beta_2 v_2 + \beta_3 v_3$, and 6 quadratic equations. Same linearization technique, which gives 6 linear equations for 6 unknowns (the products $\beta_a \beta_b$ with $a \le b$).