Low Complexity Keypoint Recognition and Pose Estimation Vincent Lepetit


Real-Time 3D Object Detection Runs at 15 Hz

Keypoint Recognition
The general approach [Lowe, Matas, Mikolajczyk] is a particular case of classification:
• Pre-processing: make the actual classification easier;
• Search in the database: nearest neighbor classification.
One class per keypoint: the set of the keypoint's possible appearances under various perspective, lighting, noise...

Training phase → Classifier → used at run-time to recognize the keypoints.

A New Classifier: Ferns
Joint Work with Mustafa Özuysal

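If a patch can be represented by a set of image features $\{f_i\}$, we are looking for $\hat{c} = \arg\max_c P(C = c \mid f_1, \dots, f_N)$, which is proportional to $P(f_1, \dots, f_N \mid C = c)\,P(C = c)$, but a complete representation of the joint distribution is infeasible.
Naive Bayesian ignores the correlation: $P(f_1, \dots, f_N \mid C = c) \approx \prod_{i=1}^{N} P(f_i \mid C = c)$.
Compromise: group the features into $M$ ferns $F_k$ of $S$ features each, and assume independence only between ferns: $P(f_1, \dots, f_N \mid C = c) \approx \prod_{k=1}^{M} P(F_k \mid C = c)$.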

Presentation on an Example

Ferns: Training
The tests compare the intensities of two pixels around the keypoint: $f_i = 1$ if $I(m_{i,1}) < I(m_{i,2})$, and $f_i = 0$ otherwise. They are invariant to light change by any increasing function.
[Figure: posterior probabilities.]
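A minimal sketch of such a test, assuming a grayscale patch stored row-major. The names `Patch`, `binary_test`, and `relight` are hypothetical and not from the talk; the intensity change `v -> 2v + 10` is chosen only for illustration as one strictly increasing function.

```cpp
#include <vector>

// Hypothetical patch type: grayscale intensities stored row-major.
struct Patch {
    int width;
    std::vector<int> pixels;
    int at(int x, int y) const { return pixels[y * width + x]; }
};

// One binary test: 1 if the first pixel is darker than the second, else 0.
int binary_test(const Patch& p, int x1, int y1, int x2, int y2) {
    return p.at(x1, y1) < p.at(x2, y2) ? 1 : 0;
}

// Apply a strictly increasing intensity change (here v -> 2v + 10, an
// illustrative choice) to every pixel of the patch.
Patch relight(const Patch& p) {
    Patch out = p;
    for (int& v : out.pixels) v = 2 * v + 10;
    return out;
}
```

Because the remapping preserves the order of any two intensities, `binary_test` returns the same bit on `p` and on `relight(p)` for every pixel pair.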

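Ferns: Training
[Figure: the binary test outputs of three training patches select leaves 1, 5, and 6 of the fern; the counter of each selected leaf is incremented (++).]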
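The counting step can be sketched as follows. This is an illustration under assumptions, not the authors' code: `FernCounts` and `leaf_index` are hypothetical names, and the `+1` added to every count when normalizing is an assumed regularizer that keeps unseen leaves from getting a zero probability.

```cpp
#include <cstddef>
#include <vector>

// Pack the binary test outputs (first test = most significant bit) into a
// leaf index, e.g. {1, 0, 1} -> 5.
int leaf_index(const std::vector<int>& bits) {
    int index = 0;
    for (int b : bits) index = (index << 1) | (b ? 1 : 0);
    return index;
}

// Hypothetical training table for one fern: one counter per (leaf, class).
struct FernCounts {
    int num_tests;            // S
    int num_classes;          // H
    std::vector<int> counts;  // 2^S * H counters, layout [leaf][class]

    FernCounts(int s, int h)
        : num_tests(s), num_classes(h),
          counts((std::size_t(1) << s) * h, 0) {}

    int num_leaves() const { return 1 << num_tests; }

    // The "++" of the figure: one training patch of class c reached this leaf.
    void add_sample(int leaf, int c) { ++counts[leaf * num_classes + c]; }

    // Estimate of P(leaf | class c), with an assumed +1 prior on every leaf.
    double posterior(int leaf, int c) const {
        long total = 0;
        for (int l = 0; l < num_leaves(); ++l) total += counts[l * num_classes + c];
        return (counts[leaf * num_classes + c] + 1.0) / (total + num_leaves());
    }
};
```

With three tests per fern, a patch whose tests output 1, 0, 1 falls into leaf 5, and training amounts to calling `add_sample(5, c)` for its class `c`.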

Ferns: Training

Ferns: Training Results

Ferns: Recognition

It Really Works

Ferns outperform Trees
500 classes. No orientation or perspective correction.
Ferns responses are combined multiplicatively (Naive Bayesian rule); Trees responses are combined additively (average).
[Plot: recognition rate versus number of structures, for FERNS and TREES.]
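To make the two combination rules concrete, here is a small sketch. The `Responses` layout and both function names are assumptions made for illustration; the multiplicative rule is implemented as a sum of logarithms, which selects the same class as the product.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// responses[k][c]: probability that structure k (fern or tree) assigns to
// class c. Hypothetical layout; assumed non-empty and rectangular.
using Responses = std::vector<std::vector<double>>;

// Multiplicative (Naive Bayesian) combination, computed as a sum of logs.
int best_class_multiplicative(const Responses& r) {
    int best = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < r.front().size(); ++c) {
        double score = 0.0;
        for (const auto& row : r) score += std::log(row[c]);
        if (score > best_score) { best_score = score; best = static_cast<int>(c); }
    }
    return best;
}

// Additive combination: average of the responses.
int best_class_additive(const Responses& r) {
    int best = 0;
    double best_score = -std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < r.front().size(); ++c) {
        double score = 0.0;
        for (const auto& row : r) score += row[c];
        score /= static_cast<double>(r.size());
        if (score > best_score) { best_score = score; best = static_cast<int>(c); }
    }
    return best;
}
```

With the made-up responses `{{0.9, 0.5}, {0.9, 0.5}, {0.001, 0.5}}`, the average favors class 0 while the product favors class 1: a single structure that strongly rejects a class weighs much more under the multiplicative rule.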

Optimized Locations versus Random Locations: We Can Use Random Tests
Comparison of the recognition rates for 200 keypoints.
[Plot: recognition rate versus number of trees, for information gain optimization and for randomness.]


We Can Use Random Tests
For a small number of classes we can try several tests, and retain the best one according to some criterion.
When the number of classes is large, any test does a decent job.

Another Graphical Interpretation

We Can Use Random Tests: Why It Is Interesting
• Building the ferns takes no time (except for the posterior probabilities estimation);
• Simplifies the classifier structure;
• Allows incremental learning.
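Building a fern from random tests can be as short as the sketch below. `PixelPair` and `make_random_fern` are hypothetical names, and drawing both pixels uniformly inside a square patch is an assumption about how the locations are sampled.

```cpp
#include <random>
#include <vector>

// One test: the two pixel locations whose intensities are compared.
struct PixelPair {
    int x1, y1, x2, y2;
};

// Draw num_tests random pixel pairs inside a patch_size x patch_size patch.
// No optimization is involved: the structure of the fern is purely random.
std::vector<PixelPair> make_random_fern(int num_tests, int patch_size, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> coord(0, patch_size - 1);
    std::vector<PixelPair> tests(num_tests);
    for (PixelPair& t : tests) {
        t.x1 = coord(rng);
        t.y1 = coord(rng);
        t.x2 = coord(rng);
        t.y2 = coord(rng);
    }
    return tests;
}
```

Only the posterior probabilities then remain to be estimated from training samples.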

Comparison with SIFT: Recognition rate
[Plot: number of inliers versus frame index, for FERNS and SIFT.]

Comparison with SIFT: Computation time
• SIFT: 1 ms to compute the descriptor of a keypoint (without including convolution);
• FERNS: 13.5 microseconds to classify one keypoint into 200 classes.

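Keypoint Recognition in Ten Lines of Code

    // Names as on the slide; declarations come from the surrounding program.
    // Meanings inferred from usage: H classes, M ferns, S tests per fern,
    // K points to the keypoint pixel, D holds the pixel offsets of the tests,
    // PF holds the per-leaf class scores, P receives the score of each class.
    for (int i = 0; i < H; i++) P[i] = 0.;
    for (int k = 0; k < M; k++) {
        int index = 0, *d = D + k * 2 * S;
        for (int j = 0; j < S; j++) {
            index <<= 1;                       // make room for the next bit
            if (*(K + d[0]) < *(K + d[1]))     // compare two pixel intensities
                index++;
            d += 2;
        }
        p = PF + k * shift2 + index * shift1;  // scores stored at the reached leaf
        for (int i = 0; i < H; i++) P[i] += p[i];
    }

Very simple to implement; no need for orientation or perspective correction; (almost) no parameters to tune; very fast.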
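For readers who want to compile the listing, here is one way to wrap it in a function. This is a sketch under assumptions: the `Ferns` struct, the `[fern][leaf][class]` table layout, the resulting strides `shift1 = H` and `shift2 = 2^S * H`, and the interpretation of the table entries as log posteriors (so that adding them multiplies probabilities) are all guesses that are consistent with the slide but not stated on it.

```cpp
#include <vector>

// Hypothetical container for the slide's variables.
struct Ferns {
    int H;                  // number of classes
    int M;                  // number of ferns
    int S;                  // number of tests per fern
    std::vector<int> D;     // 2 * S pixel offsets per fern, relative to K
    std::vector<float> PF;  // assumed log posteriors, layout [fern][leaf][class]
};

// Fills P with one score per class and returns the best-scoring class.
// K points to the keypoint pixel inside the image buffer.
int classify(const Ferns& f, const unsigned char* K, std::vector<float>& P) {
    const int shift1 = f.H;               // stride between two leaves
    const int shift2 = (1 << f.S) * f.H;  // stride between two ferns
    P.assign(f.H, 0.f);
    for (int k = 0; k < f.M; k++) {
        int index = 0;
        const int* d = f.D.data() + k * 2 * f.S;
        for (int j = 0; j < f.S; j++) {
            index <<= 1;
            if (*(K + d[0]) < *(K + d[1])) index++;
            d += 2;
        }
        const float* p = f.PF.data() + k * shift2 + index * shift1;
        for (int i = 0; i < f.H; i++) P[i] += p[i];
    }
    int best = 0;
    for (int i = 1; i < f.H; i++)
        if (P[i] > P[best]) best = i;
    return best;
}
```

The caller is responsible for making sure that every offset in `D` stays inside the image around `K`.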

Ferns Tuning
The number of ferns and the number of tests per fern can be tuned to adapt to the hardware in terms of CPU power and memory size.
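A rough way to reason about this trade-off is to count table entries and operations. The helpers below are hypothetical and assume one stored value per (fern, leaf, class) triple, as in the recognition listing; actual memory use depends on the storage type chosen.

```cpp
#include <cstddef>

// Memory for the tables: M ferns x 2^S leaves x H classes x bytes per entry.
std::size_t fern_table_bytes(int num_ferns, int tests_per_fern,
                             int num_classes, std::size_t bytes_per_entry) {
    return static_cast<std::size_t>(num_ferns) *
           (std::size_t(1) << tests_per_fern) *
           static_cast<std::size_t>(num_classes) * bytes_per_entry;
}

// Work per keypoint: M * S pixel comparisons plus M * H additions.
std::size_t fern_ops_per_keypoint(int num_ferns, int tests_per_fern, int num_classes) {
    return static_cast<std::size_t>(num_ferns) * tests_per_fern +
           static_cast<std::size_t>(num_ferns) * num_classes;
}
```

For illustrative values of 20 ferns, 10 tests per fern, 200 classes, and 4-byte entries, this gives 16,384,000 bytes and 4,200 operations per keypoint. Adding one test per fern doubles the memory but adds only one comparison per fern, whereas adding a fern increases both linearly.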

Feature Harvesting
Estimate the posterior probabilities from a training video sequence.

Feature Harvesting
With the ferns, we can easily:
- add a class;
- remove a class;
- add samples of a class to refine the classifier.
→ Incremental learning
[Diagram: Detect Object in Current Frame → Matches → Training examples → Update Classifier.]
• No need to store image patches;
• We can select the keypoints the classifier can recognize.

Test Sequence

Handling Light Changes


EPnP: An Accurate Non-Iterative O(n) Solution to the PnP Problem
Joint Work with Francesc Moreno-Noguer

The Perspective-n-Point (PnP) Problem
2D/3D correspondences known; internal parameters known; rotation, translation?
How to take advantage of the internal parameters?
Solutions exist for the specific cases n = 3 [...], n = 4 [...], n = 5 [...], and the general case [...].

A Stable Algorithm
[Plots: mean and median rotation error (%) versus the number of points used to estimate pose.]
LHM: Lu-Hager-Mjolsness, Fast and Globally Convergent Pose Estimation from Video Images. PAMI'00. (Alternately optimizes over rotation and translation.)
EPnP: our method.

A Fast Algorithm
[Plot: median rotation error (%) versus computation time (sec), logarithmic scale.]

General Approach
Estimate the coordinates of the 3D points in the camera coordinate system. Rotation and translation then follow from them [Lu et al. PAMI00].

Introducing Control Points
The 3D points are expressed as a weighted sum of four control points.
→ 12 unknowns: the coordinates of the control points in the camera coordinate system.

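The Point Reprojections Give a Linear System
For each correspondence $i$, with $(u_i, v_i)$ the 2D point, $A$ the internal parameters matrix, $\alpha_{ij}$ the weights, and $c_j^c$ the control points in the camera coordinate system: $w_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} = A \sum_{j=1}^{4} \alpha_{ij}\, c_j^c$.
Rewriting and concatenating the equations from all the correspondences: $Mx = 0$, with $x = [c_1^{c\top}, c_2^{c\top}, c_3^{c\top}, c_4^{c\top}]^\top$ the vector of the 12 unknowns.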

The Solution as Weighted Sum of Eigenvectors
• $Mx = 0$
• $M^\top M x = 0$
• $x$ belongs to the null space of $M^\top M$: $x = \sum_{i=1}^{N} \beta_i v_i$,
• with $v_i$ the eigenvectors of matrix $M^\top M$ associated to null eigenvalues.
• Computing $M^\top M$ is the most costly operation, and it is linear in $n$, the number of correspondences.

From 12 Unknowns to 1, 2, 3, or 4
• The $\beta_i$ are our $N$ new unknowns;
• $N$ is the dimension of the null space of $M^\top M$;
• Without noise: $N = 1$ (scale ambiguity).
• In practice: no zero eigenvalues, but several very small, and $N \geq 1$ (depends on the 2D locations noise).
• We found that only the cases $N$ = 1, 2, 3 and 4 must be considered.

How the Control Points Vary with the $\beta_i$
[Figure: when varying the $\beta_i$, the reprojections in the image and the corresponding 3D points.]

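Imposing the Rigidity Constraint
The distances between the control points must be preserved: $\|c_a^c - c_b^c\|^2 = \|c_a^w - c_b^w\|^2$ for each pair $(a, b)$ of control points, where $c^c$ and $c^w$ are the control points in the camera and world coordinate systems.
→ 6 quadratic equations in the $\beta_i$.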

The Case N = 1
$x = \beta_1 v_1$, and 6 quadratic equations:
• $\beta_1$ can easily be computed:
• Its absolute value is solution of a linear system:
• Its sign is chosen so that the handedness of the control points is preserved.
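For N = 1, each of the six distance constraints says that the magnitude of the coefficient times a distance measured in the eigenvector equals the corresponding distance between the world control points. A least-squares solution of these six equations can be sketched as below; `beta_magnitude` and `dist3` are hypothetical names, `v` holds the eigenvector split into four 3-vectors, and the choice of the sign is left out.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<double, 3>;
using ControlPoints = std::array<Vec3, 4>;

// Euclidean distance between two 3D points.
double dist3(const Vec3& a, const Vec3& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Least-squares magnitude m minimizing, over the 6 pairs (a, b),
// the sum of (m * dist(v[a], v[b]) - dist(cw[a], cw[b]))^2.
// v: the eigenvector split into four 3-vectors; cw: world control points.
double beta_magnitude(const ControlPoints& v, const ControlPoints& cw) {
    double num = 0.0, den = 0.0;
    for (int a = 0; a < 4; ++a) {
        for (int b = a + 1; b < 4; ++b) {
            const double dv = dist3(v[a], v[b]);
            const double dw = dist3(cw[a], cw[b]);
            num += dv * dw;
            den += dv * dv;
        }
    }
    return num / den;
}
```

If the eigenvector describes the control points at half their true scale, the function returns 2.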

The Case N = 2
$x = \beta_1 v_1 + \beta_2 v_2$, and 6 quadratic equations:
We use the linearization technique. Gives 6 linear equations in $\beta_{11} = \beta_1^2$, $\beta_{12} = \beta_1 \beta_2$, and $\beta_{22} = \beta_2^2$:
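As a worked expansion of one such equation, write $\Delta_k^{ab}$ for the difference between the 3-vectors of eigenvector $v_k$ that correspond to control points $a$ and $b$, and $c_a^w$, $c_b^w$ for the control points in the world coordinate system. This notation is introduced here for illustration and does not appear on the slide.

```latex
\begin{align*}
\left\| \beta_1 \Delta_1^{ab} + \beta_2 \Delta_2^{ab} \right\|^2
  &= \beta_1^2 \left\|\Delta_1^{ab}\right\|^2
   + 2\,\beta_1 \beta_2 \, \Delta_1^{ab\,\top} \Delta_2^{ab}
   + \beta_2^2 \left\|\Delta_2^{ab}\right\|^2 \\
  &= \beta_{11} \left\|\Delta_1^{ab}\right\|^2
   + 2\,\beta_{12} \, \Delta_1^{ab\,\top} \Delta_2^{ab}
   + \beta_{22} \left\|\Delta_2^{ab}\right\|^2
   = \left\| c_a^w - c_b^w \right\|^2 .
\end{align*}
```

Each of the 6 pairs $(a, b)$ gives one equation that is linear in $(\beta_{11}, \beta_{12}, \beta_{22})$. The system is over-determined and can be solved in the least-squares sense, after which $\beta_1$ and $\beta_2$ are recovered from $\beta_{11}$, $\beta_{12}$, and $\beta_{22}$ up to a common sign.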

The Case N = 3
$x = \beta_1 v_1 + \beta_2 v_2 + \beta_3 v_3$, and 6 quadratic equations:
Same linearization technique. Gives 6 linear equations for 6 unknowns:
