
2D Tracking to 3D Reconstruction of Human Body from Monocular Video



  1. 2D Tracking to 3D Reconstruction of Human Body from Monocular Video Moin Nabi Mohammad Rastegari

  2. Introduction to 3D Reconstruction. Approaches: Stereo [multiple cameras] and Monocular [single camera] (difficult!)

  3. Difficulties of 3D Reconstruction

  4. Difficulties of Monocular 3D Reconstruction: local properties are not enough for depth estimation; we need to learn the global structure. • Overall organization of the image • Contextual information

  5. Difficulties of Monocular 3D Reconstruction

  6. Difficulties of Monocular 3D Reconstruction: the depth ambiguity problem. A single observation is consistent with innumerable 3D states, so depth must be estimated.

  7. Difficulties of Monocular 3D Reconstruction: the forward-backward ambiguity problem. There can be 2^#limbs possible configurations, which can be pruned • with physical constraints • with learning

  8. Applications of Monocular 3D Reconstruction • 3D motion capture • 3D medical imaging

  9. Applications of Monocular 3D Reconstruction • Human-computer interfaces • Video games, more realism

  10. Problem Background • Humans interpret 2D video easily • 2D video offers limited clues about the actual 3D motion • Goal: reliable 3D reconstruction from standard single-camera input

  11. Workflow of Monocular 3D Reconstruction: Skeleton Extraction -> 2D Tracking -> 3D Reconstruction -> Build Flesh (from 2D to 3D)

  12. Skeleton Extraction

  13. Skeleton Extraction • Proposed Skeleton for Human Body

  14. Overview of the approach: 2D Tracking -> 3D Reconstruction

  15. Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image [Camillo J. Taylor, 2000] Objective: recover the configuration of an articulated object from image measurements. Assumptions: scaled orthographic projection (unknown scale); relative lengths of the segments in the model are known. Input: correspondences between joints in the model and points in the image. Output: a characterization of the set of all possible configurations.

  16. Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image [Camillo J. Taylor, 2000] ?

  17. Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image [Camillo J. Taylor, 2000] The set of all possible solutions can be characterized by a single scalar parameter, s, and a set of binary flags indicating the direction of each segment. Solutions are shown for various values of the s parameter.

  18. Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image [Camillo J. Taylor, 2000] In practice, the policy of choosing the minimum allowable value of the scale parameter as the default usually yields acceptable results, since it reflects the fact that one or more segments in the model are typically quite close to perpendicular to the viewing direction and are, therefore, not significantly foreshortened. The scalar s was chosen to be the minimum possible value, and the segment directions were specified by the user.
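A minimal sketch of this recovery step, assuming scaled orthography: a segment with 2D displacement (du, dv), known relative length L, and scale s has relative depth dZ = ±sqrt(L² − (du² + dv²)/s²), and the minimum allowable scale is the smallest s that keeps every square root real. The function and variable names below are illustrative, not Taylor's code.

```python
# Sketch of Taylor-style relative-depth recovery under scaled orthography.
# Names (joints_2d, segments, lengths, flags) are illustrative, not from the paper.
import numpy as np

def min_allowable_scale(joints_2d, segments, lengths):
    """Smallest scale s for which every segment satisfies du^2 + dv^2 <= (s * L)^2."""
    ratios = []
    for (i, j), L in zip(segments, lengths):
        du, dv = joints_2d[i] - joints_2d[j]
        ratios.append(np.hypot(du, dv) / L)
    return max(ratios)

def relative_depths(joints_2d, segments, lengths, s, flags):
    """Per segment, dZ = sign * sqrt(L^2 - (du^2 + dv^2) / s^2); `flags` holds the
    user-specified direction of each segment (+1 towards / -1 away from the camera)."""
    dZ = []
    for (i, j), L, sign in zip(segments, lengths, flags):
        du, dv = joints_2d[i] - joints_2d[j]
        dZ.append(sign * np.sqrt(max(L ** 2 - (du ** 2 + dv ** 2) / s ** 2, 0.0)))
    return np.array(dZ)

# Toy example: a 3-joint chain with two segments of equal known relative length.
joints_2d = np.array([[0.0, 0.0], [1.0, 0.0], [1.5, 0.5]])
segments, lengths = [(0, 1), (1, 2)], [1.0, 1.0]
s = min_allowable_scale(joints_2d, segments, lengths)
print(relative_depths(joints_2d, segments, lengths, s, flags=[+1, -1]))
```

With s fixed and the flags chosen (by the user, as on the slide), the 3D configuration follows by accumulating dZ along the kinematic chain.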

  19. Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image [Camillo J. Taylor, 2000] Experimental results:

  20. Reconstruction of Articulated Objects from Point Correspondences in a Single Uncalibrated Image [Camillo J. Taylor, 2000]

  21. Bayesian Reconstruction of 3D Human Motion from Single-Camera Video [N. R. Howe, M. E. Leventon, W. T. Freeman, 2001] Motion is divided into short movements, informally called snippets. Probabilities are assigned to 3D snippets by analyzing a knowledge base. Each snippet of 2D observations is matched to the most likely 3D motion. The resulting snippets are stitched together to reconstruct the complete movement.

  22. Bayesian Reconstruction of 3D Human Motion from Single-Camera Video [N. R. Howe, M. E. Leventon, W. T. Freeman, 2001] Learning priors on human motion: choose the snippet length to be long enough to be informative, but short enough to characterize. Collect known 3D motions and form snippets. Group similar movements and assemble a matrix. The SVD gives a Gaussian probability cloud that generalizes to similar movements.
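As a rough illustration of that prior-building step (not the authors' code), one could stack each group's flattened 3D snippets as rows of a matrix, take its SVD, and read off the mean, principal directions, and variances of the resulting Gaussian cloud:

```python
# Illustrative sketch: fit a Gaussian "probability cloud" to one group of similar
# 3D snippets by stacking them as matrix rows and taking the SVD.
import numpy as np

def fit_snippet_prior(snippets):
    """snippets: (n_examples, snippet_dim) array, each row one flattened 3D snippet."""
    mean = snippets.mean(axis=0)
    U, S, Vt = np.linalg.svd(snippets - mean, full_matrices=False)
    variances = (S ** 2) / len(snippets)   # variance along each principal direction
    return mean, Vt, variances             # mean + axes + spreads define the Gaussian

# Toy data standing in for real motion-capture snippets (dimensions are made up).
rng = np.random.default_rng(0)
mean, axes, variances = fit_snippet_prior(rng.normal(size=(50, 45)))
print(axes.shape, variances[:3])
```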

  23. Bayesian Reconstruction of 3D Human Motion from Single-Camera Video [N. R. Howe, M. E. Leventon, W. T. Freeman, 2001] Posterior probability: Bayes’ law gives the probability of a 3D snippet given the 2D observations: P(snip|obs) = k P(obs|snip) P(snip). The training database gives the prior P(snip). Assuming a normal distribution of tracking errors gives the likelihood P(obs|snip).
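A hedged sketch of that scoring step: the log-posterior of a candidate 3D snippet is the Gaussian tracking-error log-likelihood plus the Gaussian log-prior, up to a constant. Here `project_to_2d`, `prior_cov_inv`, and `sigma` are stand-ins for the paper's camera model and learned prior parameters, not names from the paper.

```python
# Hedged sketch of scoring a candidate 3D snippet against 2D observations.
import numpy as np

def log_posterior(snip_3d, obs_2d, project_to_2d, prior_mean, prior_cov_inv, sigma=1.0):
    residual = project_to_2d(snip_3d) - obs_2d            # 2D tracking error
    log_lik = -0.5 * np.sum(residual ** 2) / sigma ** 2   # normal tracking-error model
    d = snip_3d - prior_mean
    log_prior = -0.5 * d @ prior_cov_inv @ d              # Gaussian snippet prior
    return log_lik + log_prior
```

The best 3D snippet for a given 2D observation is then the candidate from the knowledge base with the highest score.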

  24. Bayesian Reconstruction of 3D Human Motion from Single-Camera Video [N. R. Howe, M. E. Leventon, W. T. Freeman, 2001] Stitching: snippets overlap by n frames; weighted interpolation is used for the frames of overlapping snippets.
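One plausible reading of "weighted interpolation" is a linear cross-fade over the n overlapping frames; the paper's exact weighting scheme may differ.

```python
# Sketch of stitching two snippets that overlap by n frames via a linear cross-fade.
import numpy as np

def stitch(snippet_a, snippet_b, n_overlap):
    """snippet_a, snippet_b: (frames, pose_dim) arrays sharing n_overlap frames."""
    w = np.linspace(1.0, 0.0, n_overlap)[:, None]              # weight on snippet_a
    blended = w * snippet_a[-n_overlap:] + (1 - w) * snippet_b[:n_overlap]
    return np.vstack([snippet_a[:-n_overlap], blended, snippet_b[n_overlap:]])

a, b = np.zeros((10, 6)), np.ones((10, 6))
print(stitch(a, b, n_overlap=4).shape)   # (16, 6): 6 + 4 blended + 6 frames
```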

  25. Bayesian Reconstruction of 3D Human Motion from Single-Camera Video [N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

  26. Bayesian Reconstruction of 3D Human Motion from Single-Camera Video [N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

  27. Bayesian Reconstruction of 3D Human Motion from Single-Camera Video [N. R. Howe, M. E. Leventon, W. T. Freeman, 2001]

  28. Monocular Reconstruction of 3D Human Motion by Qualitative Selection [M. Eriksson, S. Carlsson, 2004] Depth ambiguity -> resolved using Taylor’s method. Forward-backward ambiguity -> prune the possible binary configurations.

  29. Monocular Reconstruction of 3D Human Motion by Qualitative Selection [M. Eriksson, S. Carlsson, 2004] Forward-backward ambiguity -> for any point set X representing a motion, we can represent its binary configuration with respect to the image plane, where 0 means the limb points outwards, away from the image plane, and 1 means the limb points inwards, towards the image plane.

  30. Monocular Reconstruction of 3D Human Motion by Qualitative Selection [M. Eriksson, S. Carlsson, 2004] Example for 4 limbs: in this case, limb 1 and limb 2 are both parallel to the image plane. If limb 1 is the root segment, limb 3 points towards the image plane, while limb 4 points away from it. Any infinitesimal rotation of this structure (except for rotations around limb 1 and limb 2) will put it into one of the following four binary configurations: [0, 0, 1, 0], [0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 1, 0]
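For concreteness, the binary configuration of a reconstructed pose can be read off from the sign of each limb's depth component, as in this sketch; the convention that +z points towards the image plane (bit 1) is an assumption here, not taken from the paper.

```python
# Sketch of reading off a pose's binary configuration: one bit per limb, from the
# sign of the limb's depth component (+z towards the image plane is assumed).
import numpy as np

def binary_configuration(joints_3d, limbs):
    """joints_3d: (n_joints, 3) array; limbs: list of (parent, child) index pairs."""
    return [1 if joints_3d[child, 2] - joints_3d[parent, 2] > 0 else 0
            for parent, child in limbs]

pose = np.array([[0.0, 0, 0], [0, 0, 1], [0, 0, -1]])
print(binary_configuration(pose, limbs=[(0, 1), (0, 2)]))   # [1, 0]
```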

  31. Monocular Reconstruction of 3D Human Motion by Qualitative Selection [M. Eriksson, S. Carlsson, 2004] Limited domain: 3D reconstruction in a limited domain; key-frame selection.

  32. Monocular Reconstruction of 3D Human Motion by Qualitative Selection [M. Eriksson, S. Carlsson, 2004] Qualitative measure: sign of determinant; Hamming distance.
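A small sketch of these two measures as I read them: a determinant sign as a sidedness/chirality test on a quadruple of joints, and the Hamming distance between binary configurations. The specific joint tuples the paper evaluates are not reproduced here.

```python
# Sketch of the two qualitative measures named on the slide (interpretation assumed).
import numpy as np

def sidedness_sign(p0, p1, p2, p3):
    """Sign of det([p1-p0, p2-p0, p3-p0]): which side of plane (p1, p2, p3) p0 lies on."""
    return int(np.sign(np.linalg.det(np.stack([p1 - p0, p2 - p0, p3 - p0]))))

def hamming_distance(config_a, config_b):
    """Number of limbs whose in/out bit differs between two binary configurations."""
    return sum(a != b for a, b in zip(config_a, config_b))

print(hamming_distance([0, 0, 1, 0], [0, 1, 1, 0]))   # 1
```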

  33. Monocular Reconstruction of 3D Human Motion by Qualitative Selection [M. Eriksson, S. Carlsson, 2004] Experimental results:

  34. Monocular Reconstruction of 3D Human Motion by Qualitative Selection [M. Eriksson, S. Carlsson, 2004]

  35. Learning to Reconstruct 3D Human Pose and Motion from Silhouettes [A. Agarwal, B. Triggs, 2004] • Recover 3D human body pose from image silhouettes • 3D pose = joint angles • Use either individual images or video sequences

  36. 2 Broad Classes of Approaches • Model-based approaches: presuppose an explicitly known parametric body model; invert the kinematics / numerical optimization; subcase: model-based tracking • Learning-based approaches: avoid accurate 3D modeling/rendering; e.g. example-based methods

  37. “Model Free” Learning-Based Approach • Recovers 3D pose (joint angles) by direct regression on robust silhouette descriptors • Sparse kernel-based regressor trained using human motion capture data • Advantages: no need to build an explicit 3D model; easily adapted to different people / appearances; may be more robust than model-based approaches • Disadvantages: harder to interpret than an explicit model, and may be less accurate

  38. The Basic Idea To learn a compact system that directly outputs pose from an image • Represent the input (image) by a descriptor vector z. • Write the multi-parameter output (pose) as a vector x. • Learn a regressor x = F(z) + ε Note: this assumes a functional relationship between z and x, which might not really be the case.

  39. Silhouette Descriptors

  40. Why Use Silhouettes ? • Captures most of the available pose information • Can (often) be extracted from real images • Insensitive to colour, texture, clothing • No prior labeling (e.g. of limbs) required Limitations • Artifacts like attached shadows are common • Depth ordering / sidedness information is lost

  41. Ambiguities Which arm / leg is forwards? Front or back view? Where is occluded arm? How much is knee bent? Silhouette-to-pose problem is inherently multi-valued … Single-valued regressors sometimes behave erratically

  42. Shape Context Histograms • Need to capture silhouette shape but be robust against occlusions/segmentation failures • Avoid global descriptors like moments • Use Shape Context Histograms – distributions of local shape context responses

  43. Shape Context Histograms Encode Locality First two principal components of the shape context (SC) distribution from the combined training data, with k-means centres superimposed, and an SC distribution from a single silhouette. SCs implicitly encode position on the silhouette: averaged over all human silhouettes, a human-silhouette-like form is discernible.
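A condensed sketch of one way such a descriptor could be assembled: compute a log-polar shape context at each sampled boundary point, then histogram the contexts over k-means codebook centres. The bin counts, codebook size, and hard nearest-centre assignment are simplifications of this sketch, not the paper's exact choices.

```python
# Sketch: shape-context histogram descriptor for one silhouette boundary.
import numpy as np

def shape_contexts(points, n_r=5, n_theta=12):
    """points: (n, 2) silhouette boundary samples -> (n, n_r * n_theta) contexts."""
    diffs = points[None, :, :] - points[:, None, :]            # pairwise offsets
    r = np.log1p(np.linalg.norm(diffs, axis=-1))               # log radial distance
    theta = np.arctan2(diffs[..., 1], diffs[..., 0]) + np.pi   # angle in [0, 2*pi]
    r_bin = np.minimum((r / (r.max() + 1e-9) * n_r).astype(int), n_r - 1)
    t_bin = np.minimum((theta / (2 * np.pi) * n_theta).astype(int), n_theta - 1)
    contexts = np.zeros((len(points), n_r * n_theta))
    for i in range(len(points)):
        for j in range(len(points)):
            if i != j:
                contexts[i, r_bin[i, j] * n_theta + t_bin[i, j]] += 1
    return contexts / contexts.sum(axis=1, keepdims=True)

def sc_histogram(points, codebook):
    """Assign each shape context to its nearest codebook centre and histogram the votes."""
    sc = shape_contexts(points)
    nearest = np.argmin(((sc[:, None, :] - codebook[None]) ** 2).sum(-1), axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```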

  44. Nonlinear Regression

  45. Regression Model Predict the output vector x (here 3D human pose), given the input vector z (here a shape context histogram): x = ∑_{k=1}^{p} a_k φ_k(z) + ε ≡ A f(z) + ε • {φ_k(z) | k = 1…p}: basis functions • A ≡ (a_1 a_2 … a_p) • f(z) = (φ_1(z) φ_2(z) … φ_p(z))^T • Kernel bases φ_k = K(z, z_k) for given centre points z_k and kernel K, e.g. K(z, z_k) = exp(−β║z − z_k║²)
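Concretely, the feature vector f(z) with Gaussian kernel bases could look like this sketch; the value of β and the choice of centre points are assumptions here.

```python
# Sketch of the kernel feature map: f(z) = (K(z, z_1), ..., K(z, z_p))^T with Gaussian kernels.
import numpy as np

def kernel_features(z, centres, beta=1.0):
    """z: (d,) descriptor; centres: (p, d) centre points -> (p,) feature vector f(z)."""
    return np.exp(-beta * np.sum((centres - z) ** 2, axis=1))
```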

  46. Regularized Least Squares A = argmin_A { ∑_{i=1}^{n} ║A f(z_i) − x_i║² + R(A) } = argmin_A { ║A F − X║² + R(A) } • R(A): regularizer / penalty function to control overfitting • Ridge regression: R(A) = trace(Aᵀ A)
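With F stacking the f(z_i) as columns and X stacking the poses x_i, this objective has the closed-form solution A = X Fᵀ (F Fᵀ + λI)⁻¹; the explicit weight `lam` on the ridge penalty and the toy shapes below are assumptions of this sketch.

```python
# Sketch of the closed-form ridge solution for A in x = A f(z).
import numpy as np

def fit_ridge(F, X, lam=1e-2):
    """F: (p, n) feature vectors f(z_i) as columns; X: (m, n) poses x_i as columns."""
    p = F.shape[0]
    return X @ F.T @ np.linalg.inv(F @ F.T + lam * np.eye(p))

# Toy usage with Gaussian kernel features (all dimensions are made up).
rng = np.random.default_rng(0)
Z, X_pose = rng.normal(size=(200, 60)), rng.normal(size=(40, 200))   # descriptors, poses
centres = Z[:30]                                                     # p = 30 kernel centres
F = np.stack([np.exp(-0.01 * np.sum((centres - z) ** 2, axis=1)) for z in Z], axis=1)
A = fit_ridge(F, X_pose)
print((A @ F[:, 0]).shape)   # predicted pose x = A f(z) for the first frame: (40,)
```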

  47. Synthetic Spiral Walk Test Sequence

  48. Spiral Walk Test Sequence Mostly OK, but ~15% “glitches” owing to pose ambiguities

  49. Glitches • Results are OK most of the time, but there are frequent “glitches” • The regressor either chooses the wrong case of an ambiguous pair, or remains undecided • The problem is especially evident for the heading angle, the most visible pose variable

  50. Real Image example
