Object Recognition using Local Affine Frames on Maximally Stable Extremal Regions

Presentation Transcript


  1. Object Recognition using Local Affine Frames on Maximally Stable Extremal Regions. Štěpán Obdržálek, Jiří Matas

  2. Proposed Algorithm. (1) Identify affine-covariant regions of interest: MSER detector. (2) Construct local affine frames (LAFs): invariant to geometry and photometrics; normalize LAF geometry and color. (3) Generate descriptors of patches: discrete cosine transform. (4) Recognition and localization: establish tentative correspondences; find a globally consistent subset; infer the presence and location of the object.

  3. Requirements for Region Detectors. Consistent; discriminative; invariant (more precisely, covariant): the appearance of the detected region changes consistently with the transformation (scaling, rotation, shearing). A fixed region shape is insufficient: the shape must be covariant with the object's position (“sticky”).

  4. Popular Affine Covariant Detectors. Harris-Affine; Hessian-Affine; edge-based; intensity extrema; salient regions; MSER.

  5. Harris-Affine & Hessian-Affine. Detect interest points: identify corners in the image using the Harris corner detector. Determine the “characteristic” scale: maximization of the Laplacian-of-Gaussian. Determine an elliptical region for each point: second moment matrix.

  6. Edge-based detector. Edges are stable across view, scale, and illumination. Detect interest points: identify corners in the image using the Harris corner detector; identify edges using the Canny detector; combine them to form a parallelogram. Determine the “characteristic” scale: parallelograms where texture measures reach an extremum.

  7. Intensity-based detector. Detect interest points: identify local extrema in intensity; analyze rays projecting radially from them. Determine the “characteristic” scale: best-fit ellipse through the ray points with large intensity shifts.

  8. Salient region detector. Based on the PDF of intensity values computed over an elliptical region. Detect interest points: measure the pixel entropy within elliptical regions; select regions with high “complexity”. Determine the “characteristic” scale: the optimal scale is determined by the identified region.

  9. Maximally Stable Extremal Region (MSER). A connected component of a thresholded image. Efficient to implement: O(number of pixels). Detect interest points: all pixels inside the MSER have either higher or lower intensities than the pixels on its surrounding boundary; regions are selected to be stable over a range of intensity thresholds. Determine the “characteristic” scale: the scale follows automatically from the MSER algorithm.
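
The stability criterion on this slide can be illustrated with a small sketch. It assumes a grayscale `uint8` image, follows a single seed pixel rather than building a full component tree, and uses hypothetical names (`region_area`, `most_stable_threshold`, `delta`); it is far slower than the O(number of pixels) algorithm the slide refers to and is meant only to show what "stable over an intensity range" means.

```python
from collections import deque

import numpy as np


def region_area(img, seed, thresh):
    """Area of the 4-connected component of {img <= thresh} containing seed."""
    if img[seed] > thresh:
        return 0
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    seen[seed] = True
    queue = deque([seed])
    area = 0
    while queue:
        r, c = queue.popleft()
        area += 1
        for rr, cc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= rr < h and 0 <= cc < w and not seen[rr, cc] and img[rr, cc] <= thresh:
                seen[rr, cc] = True
                queue.append((rr, cc))
    return area


def most_stable_threshold(img, seed, delta=5):
    """Threshold at which the seed's dark region changes area the least.

    The score is the relative area change between thresholds t - delta and
    t + delta; a maximally stable region is a local minimum of this score.
    """
    areas = np.array([region_area(img, seed, t) for t in range(256)], dtype=float)
    best_t, best_score = None, np.inf
    for t in range(delta, 256 - delta):
        if areas[t] == 0:
            continue  # seed is not inside any region at this threshold
        score = (areas[t + delta] - areas[t - delta]) / areas[t]
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score
```

Bright regions can be handled the same way by running the sketch on the inverted image (`255 - img`).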

  10. Runtime comparison

  11. Local Affine Frame (LAF) from Features. Comparing transformed image regions can be simplified by constructing a viewpoint-invariant, feature-based coordinate system: coordinates are based on local features; coordinates “stick” to features. Features must fix 6 degrees of freedom: simple points and ellipses are not sufficient; MSER regions are sufficient. Assumptions: local planarity; perspective camera.

  12. Local Affine Frame (LAF) from Features

  13. Local Affine Frame (LAF) from Features. A 2D affine transformation has 6 degrees of freedom, so 6 independent constraints must be found, e.g. the correspondence of 3 non-collinear points. Constraints are derived from detected primitives.
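
As a worked illustration of the counting argument, three non-collinear point correspondences give six linear equations, which determine the six affine parameters exactly. The sketch below assumes NumPy; the function name `affine_from_three_points` is hypothetical and not taken from the presentation.

```python
import numpy as np


def affine_from_three_points(src, dst):
    """Solve dst_i = A @ src_i + t for a 2x2 matrix A and a 2-vector t.

    src, dst: three (x, y) points each; the src points must not be collinear.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    # Row form of the same equations: [x, y, 1] @ M = [x', y'], with M of shape 3x2.
    X = np.hstack([src, np.ones((3, 1))])
    if abs(np.linalg.det(X)) < 1e-12:
        raise ValueError("source points are collinear")
    M = np.linalg.solve(X, dst)
    A = M[:2].T  # linear part (4 parameters)
    t = M[2]     # translation (2 parameters)
    return A, t
```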

  14. Local Affine Frame (LAF) from Features. Region shape constructions. Center of gravity (COG): 2 constraints, resolves translation. 2x2 covariance matrix Σ: 3 constraints; together with the COG, fixes the affine transformation up to an unknown rotation. Concavities: 4 constraints (a line and a point tangent to the line); do not require detection of the whole region. Curvature inflection points: transitions from concave to convex. Straight line segments of the boundary.
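
The claim that the center of gravity and covariance matrix fix the frame up to a rotation can be checked numerically. This is a minimal sketch, assuming the region is given as an N x 2 array of pixel coordinates and using a Cholesky factor as one possible whitening transform; `shape_normalization` is a hypothetical name.

```python
import numpy as np


def shape_normalization(points):
    """Return (cog, W) such that W @ (p - cog) has zero mean and identity covariance."""
    pts = np.asarray(points, dtype=float)
    cog = pts.mean(axis=0)                   # 2 constraints: translation
    cov = np.cov((pts - cog).T, bias=True)   # 3 constraints: symmetric 2x2 matrix
    L = np.linalg.cholesky(cov)              # cov = L @ L.T
    W = np.linalg.inv(L)                     # so W @ cov @ W.T = I
    return cog, W
```

If the same region is observed under an affine map `B`, the two normalized point sets differ by the matrix `W2 @ B @ inv(W1)`, which is orthogonal. That remaining rotation (or reflection) is what the other primitives, such as concavities or gradient orientations, are used to resolve.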

  15. Local Affine Frame (LAF) from Features. Intensity constructions (pixels inside a region). Orientations of gradients: rotation. Direction of dominant texture periodicity: rotation. Extrema of R, G, B, or any scalar function: 2 constraints.

  16. Local Affine Frame (LAF) from Features. Topology of regions (mutual configuration of regions): nested regions; neighboring regions; holes; incident regions.

  17. LAF Construction Construction of primitives covering 6 degrees of freedom

  18. Geometric Normalization. Translate between the canonical frame and the image frame. Canonical frame: origin = (0,0)^T, basis vectors = (1,0)^T, (0,1)^T. Measurement Region (MR): the image region used to determine local correspondences, (-2,3) x (-2,3) in canonical coordinates.
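
The mapping between the canonical frame and the image frame can be sketched as follows. The sketch assumes a LAF is represented by its origin and the image positions of the two canonical basis points; all function names and the grid resolution `n` are hypothetical.

```python
import numpy as np


def frame_matrix(origin, p1, p2):
    """Affine map taking (0,0) -> origin, (1,0) -> p1, (0,1) -> p2."""
    o = np.asarray(origin, dtype=float)
    A = np.column_stack([np.asarray(p1, dtype=float) - o,
                         np.asarray(p2, dtype=float) - o])
    return A, o


def canonical_to_image(pts, A, o):
    """Map N x 2 canonical coordinates into image coordinates."""
    return np.asarray(pts, dtype=float) @ A.T + o


def image_to_canonical(pts, A, o):
    """Inverse map: image coordinates back into the canonical frame."""
    return np.linalg.solve(A, (np.asarray(pts, dtype=float) - o).T).T


def measurement_region_grid(n=21, lo=-2.0, hi=3.0):
    """Regular n x n grid of canonical sample points covering (lo, hi) x (lo, hi)."""
    axis = np.linspace(lo, hi, n)
    u, v = np.meshgrid(axis, axis)
    return np.column_stack([u.ravel(), v.ravel()])
```

Sampling the image at `canonical_to_image(measurement_region_grid(), A, o)` would then yield a geometrically normalized patch; the interpolation scheme is left open here.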

  19. Photometric Normalization. Translate between the canonical frame and the image frame. Reflections and shadows are ignored. Illumination, gain, aperture, etc. are modeled by affine transformations of the color channels. The transformation between two patches I and I’ therefore requires 6 additional normalization parameters. Intensities are affinely transformed to have zero mean and unit variance.
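
A minimal sketch of this normalization, assuming that the 6 parameters are one offset and one scale per color channel (an interpretation of the slide, not something it states explicitly) and that patches are H x W x 3 arrays; `photometric_normalize` is a hypothetical name.

```python
import numpy as np


def photometric_normalize(patch):
    """Shift and scale each color channel to zero mean and unit variance.

    Returns the normalized patch together with the 3 means and 3 standard
    deviations, i.e. the 6 photometric normalization parameters.
    """
    patch = np.asarray(patch, dtype=float)
    mean = patch.mean(axis=(0, 1))
    std = patch.std(axis=(0, 1))
    std = np.where(std > 0, std, 1.0)  # leave constant channels unscaled
    return (patch - mean) / std, mean, std
```

Two patches that differ only by a positive per-channel gain and an offset normalize to the same values, which is the invariance the slide describes.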

  20. Normalization of Local Representation. Translate between the canonical frame and the image frame: 12 normalization parameters (6 geometric, 6 photometric) are stored with the descriptor. Coverage.

  21. Descriptors. Desirable properties: distinguish between a large number of regions; maximize the ratio of similarities between matches and mismatches; robust or invariant to localization errors and transformations; efficient in memory and speed. Discrete cosine transform (as in JPEG compression): algorithms run in O(n log n); hardware implementations exist; robust to misalignment; same discrimination as SIFT.
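
A DCT-based patch descriptor can be sketched as below. The sketch assumes a square, single-channel normalized patch and keeps a low-frequency block of coefficients; the number of retained coefficients (`keep`) is a hypothetical choice, not a value from the presentation. For clarity it uses an explicit orthonormal DCT-II matrix instead of a fast O(n log n) transform.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    x = np.arange(n)[None, :]  # sample index (columns)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)       # rescale the DC row so that C @ C.T = I
    return C


def dct_descriptor(patch, keep=4):
    """Low-frequency keep x keep block of the 2D DCT of a square patch, flattened."""
    patch = np.asarray(patch, dtype=float)
    C = dct_matrix(patch.shape[0])
    coeffs = C @ patch @ C.T   # separable 2D DCT
    return coeffs[:keep, :keep].ravel()
```

Keeping only low frequencies discards fine detail, which is one plausible reason for the robustness to misalignment noted on the slide.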

  22. Matching detected frames with query frames. Comparison: compute similarities between all detected and query frames. Matching: select the most likely matches. Verification: a consistency check that incorporates geometric constraints.

  23. Comparison. Determine the probability that a transformation can take place, based on training experience. If the probability is below a threshold, the pair is ruled out (its distance is set to ∞); otherwise the score is determined by descriptor similarity.

  24. Matching. Nearest match: the most common strategy; for each detected frame, find the closest query frame. Mutually nearest match: for symmetric matching (e.g. stereo); for each detected frame, find the closest query frame; for each query frame, find the closest detected frame; match if (closest query = closest detected) or (difference < threshold). All (or the N most) similar: for repetitive structures (many ambiguous correspondences); keep all correspondences and leave the resolution to verification; yields a high number of false correspondences.
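
The three matching strategies can be sketched on a plain distance matrix. The sketch assumes descriptors are compared by Euclidean distance, which the slides do not specify, and omits the "difference < threshold" relaxation of the mutual test; all function names are hypothetical.

```python
import numpy as np


def pairwise_distances(detected, query):
    """Euclidean distance between every detected and every query descriptor."""
    d = np.asarray(detected, dtype=float)[:, None, :] - np.asarray(query, dtype=float)[None, :, :]
    return np.sqrt((d ** 2).sum(axis=2))


def nearest_matches(dist):
    """For each detected frame, the closest query frame."""
    return [(i, int(j)) for i, j in enumerate(dist.argmin(axis=1))]


def mutually_nearest_matches(dist):
    """Pairs (i, j) where j is closest to i and i is closest to j."""
    best_query = dist.argmin(axis=1)
    best_detected = dist.argmin(axis=0)
    return [(i, int(j)) for i, j in enumerate(best_query) if best_detected[j] == i]


def n_most_similar(dist, n):
    """For each detected frame, its n closest query frames (ambiguities kept)."""
    order = np.argsort(dist, axis=1)[:, :n]
    return [(i, int(j)) for i in range(dist.shape[0]) for j in order[i]]
```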

  25. Verification. All matches should be consistent with the same model. 3D models would only be effective if the visible parts of the image are very large (building interiors); it is sufficient to model objects as planar surfaces. If 2 tentative correspondences are part of the same plane, they have a similar geometric transformation and a similar photometric transformation. The set of all correspondences is decomposed into subsets of consistent correspondences; each subset represents a single plane in the scene; small sets are rejected.

  26. Experimental Validation: COIL-100. 100 objects; 72 images of each object; 5° pose intervals; controlled lighting.

  27. Experimental Validation: ZuBuD. 201 buildings; 5 pictures each.

  28. Experimental Validation: FOCUS. Product logos; logos occupy a small portion of the image; 360 color images.

  29. Conclusion. Object recognition based on local measurements. Affine invariance is achieved by expressing local appearance in affine-covariant coordinates. Promising results. Problems: speed is the primary issue, since every query frame is compared to every database frame; speed can be improved using hashing, possibly at the cost of accuracy; the planar-surface assumption; rigid objects only; shadows, etc.
