Compact Signatures for High-speed Interest Point Description and Matching
Calonder, Lepetit, Fua, Konolige, Bowman, Mihelich
(as rendered by Lord)
Just Kidding
• Actually, we're doing three papers:
• Fast Keypoint Recognition in Ten Lines of Code (Ferns)
• Keypoint Signatures for Fast Learning and Recognition (Signatures)
• Compact Signatures for High-speed Interest Point Description and Matching (Compact Signatures)
• We will be doing them briefly, so don't worry
• Context: we're talking about keypoint description and matching
Ferns
• Problem: Features designed to be invariant or robust to commonly observed deformations (e.g. SIFT) are slow to compute, limiting how many keypoints can be handled in many practical applications.
• Solution: Move most of the computation offline via a discriminative learning framework.
Ferns
We want to assign the patch around a keypoint to the most probable class $\hat{c}_i$ given the binary features $f_j$ calculated over it:

$$\hat{c}_i = \operatorname*{argmax}_{c_i} P(C = c_i \mid f_1, f_2, \ldots, f_N)$$

Standard Bayes's rule:

$$P(C = c_i \mid f_1, \ldots, f_N) = \frac{P(f_1, \ldots, f_N \mid C = c_i)\,P(C = c_i)}{P(f_1, \ldots, f_N)}$$

Assuming a uniform prior, this becomes a maximum likelihood expression:

$$\hat{c}_i = \operatorname*{argmax}_{c_i} P(f_1, \ldots, f_N \mid C = c_i)$$

Choose a very simple feature, the sign of the difference between two pixels:

$$f_j = \begin{cases} 1 & \text{if } I(\mathbf{d}_{j,1}) < I(\mathbf{d}_{j,2}) \\ 0 & \text{otherwise} \end{cases}$$
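As a concrete sketch of that feature (patch size and pixel coordinates here are illustrative, not from the paper), each $f_j$ costs exactly one comparison:

```python
import numpy as np

def binary_feature(patch, loc1, loc2):
    """f_j = 1 if the intensity at loc1 is below that at loc2, else 0."""
    return 1 if patch[loc1] < patch[loc2] else 0

# Illustrative usage on a fake 32x32 grayscale patch
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(32, 32))
print(binary_feature(patch, (5, 7), (20, 13)))   # 0 or 1
```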
Ferns
We need about 300 of these features for accurate classification, so the full joint distribution ($2^{300}$ entries per class) can't be represented. As usual, we seek to alleviate this problem with independence assumptions. At the extreme, complete independence:

$$P(f_1, \ldots, f_N \mid C = c_i) = \prod_{j=1}^{N} P(f_j \mid C = c_i)$$

This will of course not really work on anything, since it discards all correlation between features. So, a simple in-between: partition the features at random into M groups of size S = N/M. These groups are the ferns. Model dependence within each group, and assume independence between them:

$$P(f_1, \ldots, f_N \mid C = c_i) = \prod_{k=1}^{M} P(F_k \mid C = c_i)$$

where $F_k$ is the set of S features belonging to the k-th fern.
Ferns
The fern form has $M \cdot 2^S$ parameters per class, with M between 30 and 50 and S about 10. The titular ten lines:
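The paper's actual listing is C-style; what follows is a hedged Python transcription of the same loop (array shapes and names are mine, not the paper's):

```python
import numpy as np

def classify_patch(patch, pairs, log_probs):
    """Sketch of fern classification.

    patch:     2-D grayscale patch around the keypoint
    pairs:     (M, S, 2, 2) array -- for each of M ferns, S pixel-location
               pairs defining the binary features
    log_probs: (M, 2**S, H) array -- per fern and leaf, log P(leaf | class)
               over H classes, estimated offline
    Returns the index of the maximum-likelihood class.
    """
    M, S = pairs.shape[:2]
    score = np.zeros(log_probs.shape[2])
    for k in range(M):                 # independence across ferns...
        index = 0
        for j in range(S):             # ...full dependence within one fern
            index <<= 1
            (r1, c1), (r2, c2) = pairs[k, j]
            if patch[r1, c1] < patch[r2, c2]:
                index |= 1
        score += log_probs[k, index]   # sum of per-fern log-likelihoods
    return int(np.argmax(score))
```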
Ferns
• Other details, which we'll skip:
• Modeling confidence in empirical estimates
• Using thresholds to reduce evaluation count
• Relationship with Random Trees
• Comparison against SIFT
Signatures
• Problem: Ferns are based on an offline training phase, so you can't learn new features online. This renders Ferns useless for, e.g., SLAM.
• Solution: Describe new classes in terms of the old ones (assuming the initial set is rich enough).
Signatures
Pull some keypoints at random from an arbitrary textured scene (here, N DoG/SIFT points, no two within 5 pixels of each other). Call these points the "base set", and train a Randomized Tree classifier on them. (Call the method "Generic Trees".) The response of a keypoint from the base set to the classifier trained on that base set should peak at that keypoint. You also warp the base-set patches to make the class recognition transformation-invariant (TBD).
Signatures
The response of a keypoint not in the base set tends to peak in multiple (but relatively few) locations. This response is the keypoint's "signature" (intended to be transformation-invariant). By thresholding against t, you can replace this signature with a sparse approximation to itself. A signature is essentially the collection of base patches the keypoint most resembles.
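A minimal sketch of the thresholding step (the representation and numbers are illustrative, not from the paper):

```python
import numpy as np

def sparse_signature(response, t):
    """Keep only base classes whose classifier response exceeds t;
    everything else is dropped from the signature."""
    idx = np.flatnonzero(response > t)
    return idx, response[idx]

# A made-up response over six base classes, thresholded at t = 0.01
response = np.array([0.005, 0.42, 0.0, 0.17, 0.003, 0.31])
idx, vals = sparse_signature(response, t=0.01)
print(idx)   # [1 3 5] -- the base patches this keypoint most resembles
```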
Signatures
For evaluation, signatures are matched using best-bin-first search, with geometric ground truth, on baseline image pairs. N and t determine the "signature length", N explicitly and t implicitly (increasing N increases both description and matching time, while changing t affects only matching time). At t = 0.01, signature lengths are short and tightly distributed. Experimentally, they found reason to go beyond N = 300.
Signatures
t does not have to be terribly large to max out your matching performance.
Signatures
The selling point is that this gives very similar matching performance to SIFT at a fraction of the time cost (TBD). According to the paper, this represents a 35-fold speedup; dividing the reported timings myself gives about 53. Am I misunderstanding something, or was that a typo? (They also show this can be applied to SLAM, but we'll note that without getting into it yet. TBD.)
Compact Signatures
• Problem: Signatures are naturally sparse, but the first attempt at them did not exploit this: matching time and memory usage are higher than they need to be.
• Solution: Compress the signatures through random projection.
• (This is the whole paper.)
Compact Signatures
You again have a base classifier consisting of J fern units, although now the "ferns" are combined additively, like random trees, so they're not really the ferns detailed in the reference (TBD). And again, a sparse version of the response is created by thresholding against θ. With base size N, feature count d per fern, and b bytes to store a float, the memory requirement of the approach is

$$J \cdot 2^d \cdot N \cdot b$$

For J = 50, d = 10, and N = 500, this exceeds 100 MB.
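Plugging in those numbers (assuming 4-byte floats for b, which the "exceeds 100 MB" figure implies):

$$50 \cdot 2^{10} \cdot 500 \cdot 4 \ \text{bytes} = 102{,}400{,}000 \ \text{bytes} \approx 102 \ \text{MB}$$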
Compact Signatures
However, you can compress this with an ROP matrix $\Phi \in \mathbb{R}^{M \times N}$, mapping each signature x to $\Phi x$. Because the descriptor is a linear combination of fern responses, you can pre-compress the leaf vectors offline, avoiding storing their uncompressed versions at all. This effectively replaces N by M (the row dimension of Φ), dividing the memory requirement by N/M and requiring N/M times fewer operations when computing descriptors. There is then further (SIMD-enabling) bit-level compression.
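A minimal sketch of the pre-compression trick, assuming a dense Gaussian Φ (the paper considers several projection choices; all shapes and names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
J, d, N, M = 50, 10, 500, 176   # ferns, features/fern, base size, rows of Phi

# Random projection Phi: R^N -> R^M (illustrative dense Gaussian choice)
Phi = rng.standard_normal((M, N), dtype=np.float32) / np.sqrt(M)

# Uncompressed leaf vectors: one length-N response per fern leaf
leaves = rng.random((J, 2**d, N), dtype=np.float32)   # ~102 MB, the slide's figure

# Pre-compress every leaf once, offline; the N-dim versions can be discarded
compressed = leaves @ Phi.T                           # (J, 2**d, M): ~36 MB

# At runtime a descriptor is the sum of one leaf per fern, and since
# Phi(sum of leaves) == sum of Phi(leaves), it is built directly in R^M
leaf_idx = rng.integers(0, 2**d, size=J)              # stand-in for fern outputs
descriptor = compressed[np.arange(J), leaf_idx].sum(axis=0)
print(descriptor.shape)                               # (176,)
```

The key point is linearity: compressing after summing and summing after compressing give the same descriptor, so the N-dimensional leaf vectors never need to exist at runtime.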
Compact Signatures
The transformation from the previous approach (top) to this one (bottom) can be pictured like this (figure in the original slides).
Compact Signatures
There's no reason to make M larger than 176, and no reason to worry much about how you do the projection.
Compact Signatures
This paper was about time and space. There are also details about PTAM incorporation and a small appendix on compressive sensing, which we don't cover in detail here.
TBD
• "Transformation-invariance"
• SLAM application
• Ferns vs. random trees