
Evaluation of features detectors and descriptors based on 3D objects



  1. Evaluation of features detectors and descriptors based on 3D objects P. Moreels - P. Perona California Institute of Technology

  2. Features – what for? • Large-baseline stereo [Tuytelaars & Van Gool '00] • Object recognition [Dorko & Schmid '05] [Lowe '04] • Stitching [Brown & Lowe '03]

  3. Moving the viewpoint

  4. Feature stability Feature stability is not perfect… 240 keypoints extracted in one view, 232 in the next.
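The point is easy to reproduce with any modern detector. A minimal sketch using OpenCV's SIFT detector (the API and file names are assumptions for illustration, not part of the talk):

```python
import cv2

# Hypothetical file names: any two views of the same object will do.
img_a = cv2.imread("view_00.png", cv2.IMREAD_GRAYSCALE)
img_b = cv2.imread("view_45.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_a = sift.detect(img_a, None)
kp_b = sift.detect(img_b, None)

# The counts differ between views: detection is not perfectly repeatable.
print(f"{len(kp_a)} keypoints in view A, {len(kp_b)} in view B")
```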

  5. First stage – feature detector • Harris [Harris '88] • Difference of Gaussians [Crowley '84] • Affine-invariant Harris [Mikolajczyk '02] • Kadir & Brady [Kadir '02]
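As an illustration of the first two detectors, a sketch with OpenCV; the parameter values and file name are illustrative assumptions:

```python
import cv2
import numpy as np

img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)  # placeholder path

# Harris [Harris '88]: corner response R = det(M) - k * trace(M)^2 on the
# second-moment matrix M; local maxima of R give corner keypoints.
response = cv2.cornerHarris(img, blockSize=2, ksize=3, k=0.04)
corners = np.argwhere(response > 0.01 * response.max())

# Difference of Gaussians [Crowley '84]: band-pass map whose extrema across
# scales give blob keypoints (the basis of the DoG detector used by SIFT).
dog = cv2.GaussianBlur(img, (0, 0), 1.6 * 2 ** 0.5) - cv2.GaussianBlur(img, (0, 0), 1.6)

print(f"{len(corners)} Harris corners detected")
```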

  6. Second stage – feature descriptor • SIFT [Lowe '04] • Steerable filters [Freeman '91] • Shape context [Belongie '02] • Differential invariants [Schmid '97]
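For concreteness, computing SIFT descriptors over detected keypoints, again with OpenCV as an assumed stand-in for the original implementations:

```python
import cv2

img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder path

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(img, None)

# Each SIFT descriptor is a 128-dimensional histogram of local gradient
# orientations [Lowe '04]; matching operates on these vectors, not on pixels.
print(descriptors.shape)  # (number_of_keypoints, 128)
```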

  7. Evaluations – Mikolajczyk '03-'05 • Large viewpoint change • Ground-truth positions computed via a homography (valid for planar scenes)
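The homography-based ground truth amounts to the following repeatability test. This is a sketch of the criterion only; the pixel tolerance is an illustrative assumption:

```python
import numpy as np

def repeatable(x1, x2, H, tol=2.0):
    """True if keypoint x1 (image 1) lands on keypoint x2 (image 2) when
    mapped through the homography H. x1, x2: (x, y) pixels; H: 3x3 matrix.
    The 2-pixel tolerance is an illustrative assumption."""
    p = H @ np.array([x1[0], x1[1], 1.0])  # homogeneous mapping
    p = p[:2] / p[2]                       # back to pixel coordinates
    return np.linalg.norm(p - np.asarray(x2, dtype=float)) < tol
```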

  8. Evaluations – Mikolajczyk '03-'05 • SIFT-based descriptors rule! • All affine-invariant detectors are good; they should all be used together. [CVPR '03] [PAMI '04] [submitted]

  9. 2D vs. 3D Rankings of detector/descriptor combinations change when switching from planar (2D) to 3D objects

  10. Dataset – 100 3D objects

  11. Viewpoints 45° apart

  12. Viewpoints 45° apart

  13. Viewpoints 45° apart

  14. Viewpoints 45° apart

  15. Ground truth – epipolar constraints
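For non-planar objects a homography no longer applies, so ground truth comes from multi-view geometry: a candidate match must lie near the epipolar lines induced by the calibrated cameras (with three views, the constraint must hold in both image pairs). A sketch of the per-pair test, assuming the fundamental matrix F is known:

```python
import numpy as np

def epipolar_distance(x1, x2, F):
    """Distance in pixels from point x2 (image 2) to the epipolar line of
    x1 (image 1). A match consistent with the ground-truth geometry keeps
    this distance small. x1, x2: (x, y); F: 3x3 fundamental matrix."""
    a, b, c = F @ np.array([x1[0], x1[1], 1.0])  # line a*x + b*y + c = 0
    return abs(a * x2[0] + b * x2[1] + c) / np.hypot(a, b)
```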

  16. Testing setup Unrelated images are used to populate the database of features.

  17. Distance ratio • Correct matches are highly distinctive → lower ratio • Incorrect correspondences are 'random correspondences' → low distinctiveness and ratio close to 1 [Lowe '04]
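The criterion on this slide is Lowe's distance-ratio test. A minimal sketch; the 0.8 threshold is a common choice, not necessarily the value used in the talk:

```python
import numpy as np

def ratio_test(query, database, threshold=0.8):
    """Accept a match only if the nearest database descriptor is much closer
    than the second nearest [Lowe '04]. query: (d,) descriptor;
    database: (n, d) array with n >= 2; threshold is illustrative."""
    dists = np.linalg.norm(database - query, axis=1)
    best, second = np.partition(dists, 1)[:2]
    return best / second < threshold, int(np.argmin(dists))
```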

  18. Are we accepting wrong matches? • Manual classification of triplets into correct and incorrect matches • Comparison with a simpler system: 2 views, only one epipolar constraint. This comparison justifies the 3-camera setup.

  19. Detectors / descriptors tested
Detectors: Harris • Hessian • Harris-affine • Hessian-affine • Difference-of-Gaussians • MSER • Kadir-Brady
Descriptors: SIFT • steerable filters • differential invariants • shape context • PCA-SIFT

  20. Results – viewpoint change No 'background' images; descriptors compared with the Mahalanobis distance
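The Mahalanobis distance mentioned here weights descriptor dimensions by their variability. A sketch, where the inverse covariance is assumed to be estimated offline from descriptor differences of known-correct matches (an assumption about the exact procedure):

```python
import numpy as np

def mahalanobis(d1, d2, cov_inv):
    """Mahalanobis distance between two descriptors. cov_inv is the inverse
    covariance of descriptor differences, e.g. np.linalg.inv(np.cov(diffs.T))
    computed from ground-truth matches (assumed, not from the talk)."""
    diff = np.asarray(d1, dtype=float) - np.asarray(d2, dtype=float)
    return float(np.sqrt(diff @ cov_inv @ diff))
```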

  21. Results – lighting / scale changes Change in lighting – results averaged over 3 lighting conditions. Change in scale – lens focal length varied from 7.0 mm to 14.6 mm

  22. Conclusions • Automated ground truth for 3D objects/scenes • Rankings change from 2D to 3D • Stability is much lower for 3D objects • Detectors – affine-rectified detectors are indeed best • Descriptors – SIFT and shape context performed best • Application: use the ground truth to learn probability densities: 'what does a correct match look like?'
