Structure-from-Motion Algorithm to Capture 3D Information from a Sequence of Video Images

EE 990 Seminar, Dec 04, 2003
Structure-from-Motion Algorithm to Capture 3D Information from a Sequence of Video Images
By Rishabh Malhotra
Supervisor: Dr. Kunio Takaya
TRLabs / University of Saskatchewan
New Media

Outline of the Presentation
• Introduction: Problem Definition
• The “Structure-from-Motion” Algorithm, with an illustration
• Conclusion
• Applications

Defining the Problem
What is SfM?
• The computation of 3D geometry (Structure) from a sequence of 2D frames in which the camera or the object moves (Motion).
• At least 2 views are required.
• The 2D information is already available; the third dimension (depth) has to be found.
• The depth information obtained yields a 3D model of the visible part only, hence the name 2.5D model.
• Motion of the object changes the intensity observed at the same pixels of the object; this change is used to calculate the depth information that leads to the 3D structure.

• The scene (sphere and spot light) remains fixed. Only the camera moves, in pure translational motion, showing different regions of shadow on the sphere.
• For example: the scene moves to the LEFT or the camera moves to the RIGHT; the scene moves to the RIGHT or the camera moves to the LEFT.
• Camera, spot light and sphere are collinear.
• This world was developed in VRML (Virtual Reality Modeling Language).
What is VRML?
• A specification for displaying 3-D objects on the WWW; the 3-D equivalent of HTML.
• Requires a VRML browser or a VRML plug-in for a Web browser, e.g. the Cortona plug-in from Parallel Graphics.
• Produces a hyperspace (or a world), a 3-D space that appears on the display screen. The viewer can figuratively move within this space.

Surface and Depth (of Third Dimension) Estimation
Observation: different intensity values on the same pixel of the two frames.
Assumptions:
• Relatively large sphere radius.
• Very small camera displacement.
The camera moves a very small distance to the right.
Conclusion: the surface is concave and must be given a lower elevation (third dimension).
[Figure: diagram with label N]

Phong Lighting and Shading Model
The concept:
• The Phong reflection model describes how light reflects from surfaces.
• Phong shading is a form of interpolated shading for approximating curved surfaces. Instead of interpolating intensities, it interpolates the vertex normals.
Why is it needed here? For depth estimation at a location on the object.
• Phong Lighting: an empirical model for calculating the illumination at a point on a surface.
• Phong Shading: linearly interpolates the surface normal across the facet and applies the Phong lighting model at every pixel (normal-vector interpolation shading).
[Figure: examples of images made using Phong's model]

Phong Lighting and Shading Model
Uses 4 vectors:
• Light source direction (L)
• To Viewer (V)
• Normal (N)
• Perfect Reflector (R)
[Figure, from Akenine-Moller & Haines: an example showing the Ambient, Diffuse and Specular contributions]
The Phong equation has 3 components:
• Ambient Light
• Diffuse Reflection
• Specular Reflection
[Equation image not recovered; its annotated terms are: Ambient Intensity, Quadratic Attenuation Term, Diffuse Reflection, Specular Reflection, Shininess Coefficient]
• As the shininess coefficient (γ) is increased, the reflected light is concentrated in a narrower region, centred on the direction of perfect reflection.
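The three components and the attenuation term named above can be sketched as follows. This is a minimal illustration that assumes the common textbook form I = ka·Ia + f_att·Il·[kd·(N·L) + ks·(R·V)^γ] with f_att = 1/(a + b·d + c·d²); it is not necessarily the exact equation shown on the slide. The coefficient names (ka, kd, ks, ia, il, atten) and their default values are assumptions made for the example, and L is taken here as the unit vector pointing from the surface point toward the light.

```python
import numpy as np

def normalize(vec):
    """Return vec scaled to unit length."""
    vec = np.asarray(vec, dtype=float)
    return vec / np.linalg.norm(vec)

def phong_intensity(n, l, v, d, ka=0.1, kd=0.6, ks=0.3,
                    ia=1.0, il=1.0, gamma=10.0, atten=(1.0, 0.0, 0.0)):
    """Scalar Phong intensity at one surface point (illustrative only).

    n: surface normal, l: direction to the light, v: direction to the viewer,
    d: distance to the light, gamma: shininess coefficient,
    atten: (a, b, c) of the assumed attenuation 1 / (a + b*d + c*d^2).
    """
    n, l, v = normalize(n), normalize(l), normalize(v)
    n_dot_l = float(np.dot(n, l))
    a, b, c = atten
    f_att = 1.0 / (a + b * d + c * d * d)

    ambient = ka * ia
    if n_dot_l <= 0.0:
        # The light is behind the surface: only the ambient term remains.
        return ambient

    r = 2.0 * n_dot_l * n - l                  # perfect reflector R
    diffuse = kd * n_dot_l
    specular = ks * max(float(np.dot(r, v)), 0.0) ** gamma
    return ambient + f_att * il * (diffuse + specular)
```

Evaluating the function for a viewer a few degrees away from R with a small and a large `gamma` shows the narrowing of the highlight described in the last bullet.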

The Structure-from-Motion Algorithm
• Gradient Vector Flow (GVF)
• Image Segmentation using 2D Wavelets
• Motion Vector Estimation using Berkeley MPEG Tools
• Phong Lighting and Shading Model

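Step 1: Gradient Vector Flow Calculation
The gradient is a vector quantity and is a 2D first-derivative measure of change.
The slide's equation images did not survive extraction; the standard definitions are given below, with f as an assumed symbol for the image.
By definition, the gradient of an image f(x, y) of continuous spatial coordinates x and y is
$$\nabla f(x, y) = \frac{\partial f}{\partial x}\,\mathbf{i}_x + \frac{\partial f}{\partial y}\,\mathbf{i}_y$$
where $\mathbf{i}_x$ and $\mathbf{i}_y$ are unit vectors in the x and y directions.
Hence, the magnitude of the gradient is
$$|\nabla f| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2}$$
and its direction is
$$\theta = \tan^{-1}\!\left(\frac{\partial f / \partial y}{\partial f / \partial x}\right)$$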
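The gradient magnitude and direction of a sampled image can be approximated with finite differences. The sketch below uses NumPy's `np.gradient` and covers only the plain image gradient; the diffusion step of a full gradient vector flow computation is not included. The function name `image_gradient` and the synthetic 100 x 100 sphere image are assumptions made for the example.

```python
import numpy as np

def image_gradient(img):
    """Finite-difference gradient of a 2D image.

    Returns (gx, gy, magnitude, direction), where gx is the derivative along
    columns (x), gy along rows (y), and direction is in radians.
    """
    img = np.asarray(img, dtype=float)
    gy, gx = np.gradient(img)          # np.gradient returns the axis-0 (rows) derivative first
    magnitude = np.hypot(gx, gy)
    direction = np.arctan2(gy, gx)
    return gx, gy, magnitude, direction

# Synthetic stand-in for the slide's test image: a shaded sphere, 100 x 100 pixels.
ys, xs = np.mgrid[0:100, 0:100]
r2 = (xs - 50) ** 2 + (ys - 50) ** 2
sphere = np.sqrt(np.clip(40.0 ** 2 - r2, 0.0, None)) / 40.0

gx, gy, mag, ang = image_gradient(sphere)
```

Plotting `gx` and `gy` as arrows (for example with a quiver plot) gives a vector map of the kind the results slide refers to.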

Results using Gradient Vector Flow Calculation
The simplest image: a sphere (an oversimplified image, as it has no edges).
[Figure panels: Original Image (100 x 100 pixels); Gradient Vector Flow Map; Zoom In View]

Step 2: Image Segmentation using 2D Wavelets
• Image segmentation is the process of separating objects from the background, as well as from each other, by deciding which pixels belong to each object.
• The wavelet transform is applied to the vector potential defined in a 2D image.
• Sub-band filtering applied to the vector potential can produce contour images at different scales.
• The Mallat or Haar wavelet is considered for image segmentation.
GVF Snake Method
• Active contours, or snakes, are computer-generated curves that move within images to find object boundaries.
• A GVF snake can start far from the boundary and will converge into boundary concavities.
[Figure: GVF field for a U-shaped object, with an active contour. These vectors pull the active contour toward the boundary of the object.]
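Sub-band filtering with the Haar wavelet can be illustrated with a plain NumPy sketch. This example assumes an orthonormal Haar filter pair applied directly to a grayscale image with even dimensions; the slides apply the transform to a vector potential, which is not reproduced here. The function names `haar2d_level` and `haar2d_pyramid` are hypothetical.

```python
import numpy as np

def haar2d_level(img):
    """One level of a 2D orthonormal Haar decomposition.

    Returns (approx, detail_rows, detail_cols, detail_diag), each half the
    size of the input along both axes.
    """
    a = np.asarray(img, dtype=float)
    if a.shape[0] % 2 or a.shape[1] % 2:
        raise ValueError("image dimensions must be even")
    s = np.sqrt(2.0)
    # Filter along x: sums and differences of adjacent column pairs.
    lo = (a[:, 0::2] + a[:, 1::2]) / s
    hi = (a[:, 0::2] - a[:, 1::2]) / s
    # Filter along y: sums and differences of adjacent row pairs.
    approx = (lo[0::2] + lo[1::2]) / s
    detail_rows = (lo[0::2] - lo[1::2]) / s   # change between rows (horizontal contours)
    detail_cols = (hi[0::2] + hi[1::2]) / s   # change between columns (vertical contours)
    detail_diag = (hi[0::2] - hi[1::2]) / s
    return approx, detail_rows, detail_cols, detail_diag

def haar2d_pyramid(img, levels):
    """Repeat the decomposition on the approximation band to get coarser scales."""
    approx = np.asarray(img, dtype=float)
    bands = []
    for _ in range(levels):
        approx, d_rows, d_cols, d_diag = haar2d_level(approx)
        bands.append((d_rows, d_cols, d_diag))
    return approx, bands
```

The detail bands at each level respond to intensity changes at that scale, which is one way to read the "contour images at different scales" mentioned above.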

Other popular image segmentation methods include:
• Edge Detection
• Segmentation based on color
• Region Growing and Shrinking
• Clustering
• Morphological Filtering
Image segmentation using edge detection for a more complicated image: a face.
[Figure panels: Original Image (640 x 480 pixels); Edge Detection using x-direction Sobel operator (Threshold: 153)]
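Edge detection with the x-direction Sobel operator can be sketched as below. How the threshold of 153 was applied on the slide is not stated; this example assumes an 8-bit grayscale image and thresholds the absolute filter response. The function name `sobel_x_edges` is hypothetical.

```python
import numpy as np

# x-direction Sobel kernel: responds to intensity changes along x (vertical edges).
SOBEL_X = np.array([[-1.0, 0.0, 1.0],
                    [-2.0, 0.0, 2.0],
                    [-1.0, 0.0, 1.0]])

def sobel_x_edges(img, threshold=153.0):
    """Return a boolean edge map: |Sobel-x response| > threshold."""
    a = np.asarray(img, dtype=float)
    h, w = a.shape
    padded = np.pad(a, 1, mode="edge")     # replicate border pixels
    response = np.zeros_like(a)
    # Accumulate the 3 x 3 weighted neighbourhood using shifted views of the image.
    for dy in range(3):
        for dx in range(3):
            response += SOBEL_X[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return np.abs(response) > threshold
```

Because the kernel differentiates along x only, horizontal edges produce no response; a y-direction kernel (the transpose of `SOBEL_X`) would be needed to detect them.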

Step 3: Motion Vector Estimation using Berkeley MPEG Tools
Block Matching Algorithm
BMA partitions the current frame into small, fixed-size blocks and matches them against the previous frame in order to estimate the displacement of each block (referred to as its motion vector) between two successive frames.
• Previous approaches (w is the maximum search displacement in each direction):
  • Full Search Algorithm: most precise matching, but the computational complexity is (2w+1)^2 search points per block
  • Conjugate Direction Search: complexity is reduced noticeably, to 3+2w
  • Modified Logarithmic Search: efficient and fast, 2+7log(w)
What is needed?
• A novel motion vector prediction technique
• A highly localized search pattern
• A computational constraint explicitly incorporated into the cost measure
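The full search approach can be sketched as follows. This is an illustrative NumPy version, not the Berkeley MPEG Tools implementation. It assumes the sum of absolute differences (SAD) as the matching cost, and the function name `full_search`, the block size and the search range are choices made for the example.

```python
import numpy as np

def full_search(cur, ref, top, left, block=8, w=4):
    """Exhaustive block matching for one block of the current frame.

    Tests every displacement (dx, dy) with |dx|, |dy| <= w that keeps the
    candidate block inside the reference frame, and returns
    (motion_vector, best_sad, candidates_tested), with motion_vector = (dx, dy).
    """
    cur = np.asarray(cur, dtype=float)
    ref = np.asarray(ref, dtype=float)
    height, width = ref.shape
    target = cur[top:top + block, left:left + block]

    best_mv, best_sad, tested = (0, 0), np.inf, 0
    for dy in range(-w, w + 1):
        for dx in range(-w, w + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > height or x + block > width:
                continue                       # candidate falls outside the frame
            sad = np.abs(target - ref[y:y + block, x:x + block]).sum()
            tested += 1
            if sad < best_sad:
                best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad, tested
```

For a block far enough from the frame border, `tested` equals (2w+1)^2, which is the cost that the faster search patterns listed above try to reduce.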

Step 3: Motion Vector Estimation using Berkeley MPEG Tools
Motion estimation technique, using the Block Matching Algorithm:
• Find the “best” block from an earlier frame to construct an area of the current frame.
• Apply the block-matching algorithm to compute motion vectors.
• Translate motion vectors into motion predictions.
Motion vector: the displacement of the closest matching block in the reference frame (past or future) for a block in the current frame.
[Figure: Frame #1, Frame #2, Frame #3, with a candidate motion vector <2,0> and the question “Is <2,0> valid?”]

Conclusion
A 2.5-dimensional figure of the object, similar to a relief carving, is produced as a result of the series of processing steps.
Applications
• 3D Model Reconstruction
• 3D Motion Matching
• Camera Calibration
• 3D Vision
• Stereo Television
  • Conversion of ordinary 2D films to stereo movies to be displayed on a stereo TV.
  • Will become available as the next generation of television.

Thank You
Questions?

Shininess Coefficient and Specular Component

Step 3: Motion Vector Estimation using Berkeley MPEG Tools
MPEG Encoding
Frame sequence: I1 B1 B2 B3 P1 B4 B5 B6 P2 B7 B8 B9 I2
Frame types:
• I (Intra): encodes the complete image, similar to JPEG
• P (Forward Predicted): motion relative to the previous I or P frame
• B (Bidirectionally Predicted): motion relative to previous and future I and P frames
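Because a B frame depends on a future I or P frame, that reference frame has to be coded before the B frames that precede it in display order. The sketch below reorders the frame sequence above accordingly. It is a simplified illustration; the function name `coding_order` and the representation of frames as labelled strings are assumptions.

```python
def coding_order(display_order):
    """Reorder frames so each I/P reference precedes the B frames that depend on it."""
    ordered = []
    pending_b = []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)      # hold until the next reference frame is coded
        else:
            ordered.append(frame)        # I or P frame: code it first
            ordered.extend(pending_b)    # then the B frames displayed before it
            pending_b = []
    ordered.extend(pending_b)            # trailing B frames with no later reference
    return ordered

display = ["I1", "B1", "B2", "B3", "P1", "B4", "B5", "B6",
           "P2", "B7", "B8", "B9", "I2"]
print(coding_order(display))
```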
