Accurate Camera Calibration from Multi-view Stereo and Bundle Adjustment Yasutaka Furukawa, University of Illinois at Urbana-Champaign, USA Jean Ponce, École Normale Supérieure, Paris, France Reporter: Jheng-You Lin
Outline • Introduction • Imaging Model • Algorithm • Experimental Results
Introduction • Multi-view stereo (MVS) • Bundle Adjustment • Calibration problem
Introduction • Multi-view stereo (MVS) • PMVS (Patch-based Multi-view Stereo) [Furukawa and Ponce, 2007] • Bundle Adjustment • Calibration problem Images → patches (an oriented point set with visibility information)
Introduction • Multi-view stereo (MVS) • Bundle Adjustment • Fine-tunes the positions of the scene points and the entire set of camera parameters • Standard Bundle Adjustment (SBA) • Calibration problem
Introduction • Multi-view stereo (MVS) • Bundle Adjustment • Calibration problem • Chart-based calibration (CBC) • Tsai, 1987 and Bouguet, 2008 • Structure from motion (SFM) • Ineffective for poorly textured or widely separated images • Selection of feature correspondences (SFC)
Introduction • Contributions • Better feature localization (surface geometry estimates are taken into account) • Better coverage and denser feature correspondences (via visibility information and surface geometry) • Handles weak texture and accumulated errors (two difficult issues for SFM/BA)
Imaging Model • Standard perspective projection model • Lens distortion is assumed to be negligible.
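In symbols, this pinhole model maps a scene point P to the image point of camera j as follows (a standard formulation consistent with the slides, written in LaTeX; fx, fy, cx, cy, s denote the usual intrinsic parameters):

    \lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
        = K_j \left( R_j P + t_j \right),
    \qquad
    K_j = \begin{pmatrix} f_x & s & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix}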
Imaging Model • SBA (minimizing the sum of squared reprojection errors)
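Written out, SBA jointly refines the scene points and all camera parameters by minimizing the objective below (a standard formulation; F denotes the projection model above and pij the measured feature location in image j):

    \min_{\{P_i\},\, \{K_j, R_j, t_j\}} \;
    \sum_i \sum_{j \in V_i} \left\| p_{ij} - F(P_i;\, K_j, R_j, t_j) \right\|^2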
Algorithm
Input: camera parameters {Kj, Rj, tj} and an expected reprojection error Er.
Output: refined camera parameters {Kj, Rj, tj}.
Build image pyramids for all the images.
Compute the pyramid level L at which to run PMVS: L ← max(0, floor(log2 Er)).
Repeat four times:
    Run PMVS on level L of the pyramids to obtain patches {Pi} and their visibility information {Vi}.
    Initialize feature locations: pij ← F(Pi, {Kj, Rj, tj}).
    Sub-sample the feature correspondences.
    For each feature correspondence {pij | j ∈ Vi}:
        Identify the reference camera Cj0 in Vi with the minimum foreshortening factor.
        For each non-reference feature pij (j ∈ Vi, j ≠ j0):
            For L* ← L down to 0:
                Use level L* of the image pyramids to refine pij: pij ← argmax_pij NCC(qij, qij0).
    Filter out features that have moved too much.
    Refine {Pi, Kj, Rj, tj} by a standard BA with {pij}.
    Update Er from the mean and standard deviation of the reprojection errors.
• Choice of L: PMVS is robust to errors in the camera parameters as long as the image resolution matches the corresponding reprojection errors, which is why PMVS is run at pyramid level L rather than at full resolution.
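A minimal sketch of this level selection, assuming Er > 0 (the function name is hypothetical):

    import math

    def pmvs_level(er):
        """Pyramid level for PMVS: L = max(0, floor(log2(Er))).

        Downsampling by a factor of 2**L shrinks an expected reprojection
        error of Er pixels to roughly one pixel at level L, the regime
        that PMVS tolerates."""
        return max(0, math.floor(math.log2(er)))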
• Initializing feature locations: project each patch Pi into the images where it is visible to obtain an initial set of image correspondences, pij = F(Pi, Cj).
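A minimal numpy sketch of this initialization, assuming each camera Cj is a (Kj, Rj, tj) triple and each patch is represented by its 3D center Pi (all names are hypothetical):

    import numpy as np

    def F(P, K, R, t):
        """Pinhole projection of the 3D point P into camera (K, R, t)."""
        q = K @ (R @ P + t)   # homogeneous image coordinates
        return q[:2] / q[2]   # perspective division -> pixel (u, v)

    def initialize_features(patches, visibility, cameras):
        """p_ij <- F(P_i, C_j) for every camera j that sees patch i."""
        return {(i, j): F(P, *cameras[j])
                for i, (P, V) in enumerate(zip(patches, visibility))
                for j in V}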
• Sub-sampling feature correspondences: divide each image into 10×10 blocks and, in each block, randomly select at most ε features.
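A sketch of this sub-sampling for one image, with ε the per-block cap from the slide (the grid-indexing details are assumptions):

    import random

    def subsample_features(features, width, height, eps, grid=10):
        """Keep at most eps features in each cell of a grid x grid partition.

        features maps a feature id to its (u, v) pixel location."""
        cells = {}
        for fid, (u, v) in features.items():
            cell = (min(int(u * grid / width), grid - 1),
                    min(int(v * grid / height), grid - 1))
            cells.setdefault(cell, []).append(fid)
        kept = []
        for ids in cells.values():
            random.shuffle(ids)
            kept.extend(ids[:eps])   # at most eps features per block
        return kept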
• NCC refinement: a patch Qi is represented by a δ×δ grid of 3D points; qij denotes the image values sampled at the grid's projection into image j, and each non-reference feature pij is moved to maximize NCC(qij, qij0) against the reference, coarse-to-fine from level L down to level 0.
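A sketch of the matching score and of the argmax step, with the optimization shown as a brute-force local search since the slide does not specify the optimizer (sample, q_ref, and the search radius are assumptions):

    import numpy as np

    def ncc(a, b):
        """Normalized cross-correlation of two delta x delta intensity grids."""
        a = a - a.mean()
        b = b - b.mean()
        d = np.linalg.norm(a) * np.linalg.norm(b)
        return float((a * b).sum() / d) if d > 0 else 0.0

    def refine(p, sample, q_ref, radius=2.0, step=0.5):
        """p_ij <- argmax NCC(q_ij, q_ij0), via grid search around p.

        sample(p) returns the delta x delta grid q_ij obtained by sampling
        the image where the patch grid projects when its center is at p."""
        best, best_score = p, -np.inf
        for dx in np.arange(-radius, radius + step, step):
            for dy in np.arange(-radius, radius + step, step):
                cand = (p[0] + dx, p[1] + dy)
                score = ncc(sample(cand), q_ref)
                if score > best_score:
                    best, best_score = cand, score
        return best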
Experimental Results - Datasets • [Figure: the datasets, labeled by the source of each initial calibration: CBC (turntable), SFM, added noise, or manual.] • The silhouettes are manually extracted using Photoshop, except for "dino".
Experimental Results The proposed method is able to match features in many images even without salient textures. (use of surface geometry and visible information)
Experimental Results • A full quantitative evaluation is impossible for these datasets, so indirect measures are used instead. • First measure: mean and standard deviation of the reprojection errors.
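These statistics follow directly from the correspondences; a sketch reusing the projection F from above (names hypothetical):

    import numpy as np

    def reprojection_stats(observed, patches, visibility, cameras):
        """Mean and std of ||p_ij - F(P_i, C_j)|| over all visible pairs."""
        errors = [np.linalg.norm(np.asarray(observed[(i, j)]) - F(P, *cameras[j]))
                  for i, (P, V) in enumerate(zip(patches, visibility))
                  for j in V]
        return float(np.mean(errors)), float(np.std(errors))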
Experimental Results • Second measure: the recovered epipolar geometry.
Experimental Results • Running time, in minutes, of one iteration (measured on a dual Xeon 3.2 GHz machine).