Wonwoo Lee, Youngmin Park, Vincent Lepetit, Woontack Woo IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 10, OCTOBER 2011 Video-Based In Situ Tagging on Mobile Phones
Outline • Introduction • Online Target Learning • Detection and Tracking • Experimental Results • Conclusion
Introduction • Objective: Augment a real-world scene with minimal user intervention on a mobile phone: “anywhere augmentation.” • Considerations: • Avoid reconstruction of the 3D scene • Perspective patch recognition • Mobile phone processing power • Mobile phone accelerometers • Mobile phone Bluetooth connectivity • http://www.youtube.com/watch?v=Hg20kmM8R1A
Introduction • The proposed method follows a standard procedure of target learning and detection.
Online Target Learning • Input: Image of the target plane • Output: Patch data and camera poses • Assumptions • Known camera parameters • Horizontal or vertical surface
Frontal View Generation • We need a frontal view to create the patch data and their associated poses. Targets whose frontal views are available.
Frontal View Generation • However, frontal views are not always available in the real world. Targets whose frontal views are NOT available.
Frontal View Generation • Objective: Generate a fronto-parallel view image from the input image. • Approach: Exploit the phone’s built-in accelerometer. • Assumption: The patch lies on a horizontal or vertical surface.
Frontal View Generation • The orientation of a target (horizontal or vertical) is recommended based on the current pose of the phone. [Figure: the gravity direction G detected by the accelerometer is compared against ±π/4 thresholds to choose between a horizontal surface (phone parallel to the ground) and a vertical one.]
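As a sketch of the ±π/4 decision rule in the figure (the camera-axis convention, the function name, and the NumPy implementation are assumptions, not the paper's code):

```python
import numpy as np

def recommend_orientation(gravity):
    """Recommend a target orientation from the accelerometer reading.

    gravity: gravity direction in camera coordinates.  If the optical
    axis (+z) is within pi/4 of gravity, the phone is looking down at
    a horizontal surface; otherwise a vertical surface is suggested."""
    g = np.asarray(gravity, dtype=float)
    g /= np.linalg.norm(g)
    angle = np.arccos(np.clip(g[2], -1.0, 1.0))  # angle to the +z axis
    return "horizontal" if angle < np.pi / 4 else "vertical"
```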
Frontal View Generation Under the one-degree-of-freedom assumption: • Frontal view camera: [I | 0] • Captured view camera: [R | t], with t = −Rc • A homography induced by the target plane warps the input image to the virtual frontal view [12]. [12] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge, U.K.: Cambridge Univ. Press, 2000.
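The warp in [12] is the standard plane-induced homography; schematically, with K the camera intrinsics and the target plane n⊤X + d = 0 in the frontal frame (notation assumed here; signs depend on conventions):

```latex
% Homography induced by the plane n^T X + d = 0 between the frontal
% camera K[I | 0] and the captured camera K[R | t], t = -Rc:
\[
  H = K \left( R - \frac{t\, n^{\top}}{d} \right) K^{-1},
  \qquad x' \sim H\, x .
\]
% Warping the captured image by H^{-1} gives the virtual frontal view.
```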
Blurred Patch Generation • Objective: Learn the appearance of a target surface quickly. • Approach: Adopt the patch learning approach of “Gepard” [6], which achieves real-time learning of a patch on a desktop computer. [6] S. Hinterstoisser, V. Lepetit, S. Benhimane, P. Fua, and N. Navab, “Learning real-time perspective patch rectification,” Int. J. Comput. Vis., vol. 91, pp. 107–130, Jan. 2011.
Review: Gepard [6] • Fast patch learning by linearizing image warping with principal component analysis. • A “mean patch” serves as the patch descriptor. • Difficult to apply directly on a mobile phone platform: • Limited mobile phone CPU performance • A large amount of precomputed data is required (about 90 MB)
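For reference, Gepard's mean patch averages the reference patch warped over the poses of one pose class; schematically (our notation, not the paper's):

```latex
% Mean patch for pose class i: average the reference patch p warped
% by each pose H_j in the class; N(.) normalizes each warped patch.
\[
  \bar{p}_i = \frac{1}{|\mathcal{P}_i|}
      \sum_{H_j \in \mathcal{P}_i} \mathcal{N}\big( w(p, H_j) \big)
\]
```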
Modified Gepard [6] • Remove the need for a fronto-parallel view • Using the phone’s accelerometer and limiting targets to two plane orientations • Skip the feature point detection step • Instead use larger patches for robustness • Replace how templates are constructed • By blurring instead • Add Bluetooth sharing of the AR configuration
Blurred Patch Generation • Approach: Use a blurred patch instead of the mean patch
Blurred Patch Generation • Generate blurred patches through multi-pass rendering on the GPU. • Faster image processing by exploiting the GPU’s parallelism.
Blurred Patch Generation • 1st pass: warping • Render the input patch as seen from a sampled viewpoint • Much faster than on the CPU
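A CPU reference for this pass, assuming OpenCV and a given homography H for the sampled viewpoint (the paper runs this as a GPU shader):

```python
import cv2

def warp_patch(frontal_view, H, size=128):
    """Pass 1: render the frontal-view patch as seen from a sampled
    viewpoint; H maps frontal-view pixels to the new view."""
    return cv2.warpPerspective(frontal_view, H, (size, size),
                               flags=cv2.INTER_LINEAR)
```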
Blurred Patch Generation • 2nd pass: radial blurring of the warped patch • Allows the blurred patch to cover a range of poses close to the exact pose
Blurred Patch Generation • 3rd pass: Gaussian blurring of the radially blurred patch • Makes the blurred patch robust to image noise
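A CPU sketch of passes 2 and 3, approximating the radial blur by averaging small in-plane rotations about the patch center (the ±10° range and 11×11 kernel come from the parameters reported later; the rotation-averaging implementation and step count are assumptions, and the paper runs both passes as GPU shaders):

```python
import cv2
import numpy as np

def radial_then_gaussian_blur(patch, max_angle_deg=10.0, steps=11,
                              gauss_kernel=(11, 11)):
    """Pass 2: radial blur, approximated by averaging the grayscale
    patch rotated about its center over a small angular range, so the
    template covers poses close to the sampled one.
    Pass 3: Gaussian blur for robustness to image noise."""
    h, w = patch.shape[:2]
    center = (w / 2.0, h / 2.0)
    acc = np.zeros((h, w), np.float32)
    for angle in np.linspace(-max_angle_deg, max_angle_deg, steps):
        M = cv2.getRotationMatrix2D(center, angle, 1.0)
        acc += cv2.warpAffine(patch, M, (w, h)).astype(np.float32)
    radial = acc / steps
    return cv2.GaussianBlur(radial, gauss_kernel, 0)
```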
Blurred Patch Generation • Fig. 7. Effectiveness of radial blur. Combining the radial blur and the Gaussian blur outperforms simple Gaussian blurring.
Blurred Patch Generation • 4th pass: accumulation of blurred patches in a texture unit • Reduces the number of readbacks from GPU memory to CPU memory
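A NumPy analogue of this accumulation, tiling every blurred patch into one atlas so all templates come back in a single readback (the 15×15 layout matches the 225 views; the patch size and layout are assumptions):

```python
import numpy as np

def accumulate_into_atlas(patches, patch_size=128, cols=15):
    """Pass 4: tile the blurred patches into one large texture so a
    single GPU-to-CPU readback retrieves every template."""
    rows = (len(patches) + cols - 1) // cols
    atlas = np.zeros((rows * patch_size, cols * patch_size), np.float32)
    for i, p in enumerate(patches):
        r, c = divmod(i, cols)
        atlas[r * patch_size:(r + 1) * patch_size,
              c * patch_size:(c + 1) * patch_size] = p
    return atlas
```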
Post-Processing • Downsampling blurred patches from 128×128 to 32×32 • Normalization to zero mean and a standard deviation of 1
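A sketch of both post-processing steps, with OpenCV and NumPy standing in for the actual implementation:

```python
import cv2
import numpy as np

def post_process(blurred_patch):
    """Downsample 128x128 -> 32x32, then normalize to zero mean and
    unit standard deviation so patches can be compared by dot product."""
    small = cv2.resize(blurred_patch, (32, 32),
                       interpolation=cv2.INTER_AREA).astype(np.float32)
    small -= small.mean()
    small /= small.std() + 1e-8  # guard against flat patches
    return small
```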
Detection & Tracking • The user points the camera at the target. • The square patch at the center of the image is used for detection.
Detection & Tracking • The initial pose is retrieved by comparing the input patch with the learned blurred patches. • ESM-Blur [20] is applied for further pose refinement. • NEON instructions are used for faster pose refinement. [20] Y. Park, V. Lepetit, and W. Woo, “ESM-Blur: Handling and rendering blur in 3D tracking and augmentation,” in Proc. Int. Symp. Mixed Augment. Reality, 2009, pp. 163–166.
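Since every template is zero mean with unit standard deviation, comparing patches by normalized cross-correlation reduces to a dot product; a minimal sketch of the initial pose retrieval (function and variable names are assumptions):

```python
import numpy as np

def detect_initial_pose(input_patch, templates, poses):
    """Compare the normalized 32x32 input patch against the learned
    templates (one per view); NCC on normalized patches is a dot
    product.  Returns the best match's pose for ESM-Blur to refine."""
    scores = [float(np.sum(input_patch * t)) for t in templates]
    best = int(np.argmax(scores))
    return poses[best], scores[best]
```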
Experimental Results • Patch size: 128×128 • Number of views used for learning: 225 • Maximum radial blur range: 10 degrees • Gaussian blur kernel: 11×11 • Memory requirement: 900 KB per target
Experimental Results • More views mean more rendering. • Radial blur is slow on the mobile phone. • Possible speed improvement through shader optimization. [Chart: learning time on iPhone 3GS, iPhone 4, and PC.]
Experimental Results • Comparison with Gepard [6] Fig. 11. Planar targets used for evaluation. (a) Sign-1. (b) Sign-2. (c) Car. (d) Wall. (e) City. (f) Cafe. (g) Book. (h) Grass. (i) MacMini. (j) Board. The patches delimited by the yellow squares are used as reference patches. [6] S. Hinterstoisser, V. Lepetit, S. Benhimane, P. Fua, and N. Navab, “Learning real-time perspective patch rectification,” Int. J. Comput. Vis., vol. 91, pp. 107–130, Jan. 2011.
Experimental Results • Our approach performs slightly worse in terms of recognition rates, but it is better adapted to mobile phones.
Experimental Results • The mean patch comparison takes about 3 ms with 225 views. • The speed of pose estimation and tracking with ESM-Blur depends on the accuracy of the initial pose provided by patch detection.
Limitations • Weak against repetitive textures and reflective surfaces. • Currently limited to a single target.
Conclusion • Potential applications • AR tagging of the real world • AR apps “anywhere, anytime” • Future work • Further optimization on mobile phones • Detection of multiple targets at the same time
Video http://www.youtube.com/watch?v=DLegclJVa0E