Building Rome in a Day • Sameer Agarwal (1), Noah Snavely (2), Ian Simon (1), Steven M. Seitz (1), Richard Szeliski (3) • (1) University of Washington, (2) Cornell University, (3) Microsoft Research
Outline • 1. Introduction • 2. System Design • 3. Result • 4. Conclusion
Introduction • Entering the search term “Rome” on Flickr returns more than two million photographs. • 3D reconstruction • e.g. the 3D city models in Google Earth and Microsoft’s Virtual Earth
Outline • 1. Introduction • 2. System Design • 2.1 Pre-processing & feature extraction • 2.2 Matching • 2.3 Geometric estimation • 3. Result • 4. Conclusion
Scene reconstruction • Automatically estimate • position, orientation, and focal length of cameras • 3D positions of feature points
Feature detection Detect features using SIFT [Lowe, IJCV 2004]
Feature matching • Match features between each pair of images using approximate nearest neighbor matching
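The matching step can be sketched as follows. This is a minimal illustration, not the system's implementation: an exact brute-force search stands in for the approximate nearest neighbor search, and the ratio test with its 0.8 threshold is an assumption that does not appear on the slides. The name `match_features` is hypothetical.

```python
import numpy as np

def match_features(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbor in desc_b.

    desc_a, desc_b: float arrays of shape (n, d) and (m, d), with m >= 2.
    Brute-force search is used here in place of an approximate index.
    The ratio test (assumed, threshold 0.8) rejects ambiguous matches.
    """
    # Pairwise squared Euclidean distances, shape (n, m).
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    # Indices of the two closest descriptors in desc_b for every row.
    order = np.argsort(d2, axis=1)[:, :2]
    rows = np.arange(len(desc_a))
    best = np.sqrt(d2[rows, order[:, 0]])
    second = np.sqrt(d2[rows, order[:, 1]])
    # Keep a match only if it is clearly better than the runner-up.
    keep = best < ratio * second
    return [(int(i), int(order[i, 0])) for i in np.flatnonzero(keep)]
```

Each returned pair is (index in the first image, index in the second image). At scale, the brute-force distance matrix would be replaced by an approximate index such as a k-d tree.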
Feature matching Refine matching using RANSAC [Fischler & Bolles 1987] to estimate fundamental matrices between pairs
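A minimal sketch of this refinement, assuming the normalized eight-point algorithm as the model fitter and the Sampson distance as the inlier test. Neither choice, nor the iteration count or the 1-pixel threshold, is stated on the slides, and all function names are hypothetical.

```python
import numpy as np

def eight_point(x1, x2):
    """Normalized eight-point estimate of F with x2_h^T F x1_h = 0."""
    def normalize(x):
        mean = x.mean(axis=0)
        scale = np.sqrt(2) / np.mean(np.linalg.norm(x - mean, axis=1))
        T = np.array([[scale, 0, -scale * mean[0]],
                      [0, scale, -scale * mean[1]],
                      [0, 0, 1]])
        return np.column_stack([x, np.ones(len(x))]) @ T.T, T
    a, Ta = normalize(x1)
    b, Tb = normalize(x2)
    # Each correspondence gives one linear equation in the 9 entries of F.
    A = np.column_stack([b[:, [0]] * a, b[:, [1]] * a, a])
    F = np.linalg.svd(A)[2][-1].reshape(3, 3)
    # Enforce rank 2 by zeroing the smallest singular value.
    u, s, vt = np.linalg.svd(F)
    F = u @ np.diag([s[0], s[1], 0.0]) @ vt
    return Tb.T @ F @ Ta  # undo the normalization

def sampson_distance(F, x1, x2):
    """First-order approximation of the squared geometric error."""
    a = np.column_stack([x1, np.ones(len(x1))])
    b = np.column_stack([x2, np.ones(len(x2))])
    Fa = a @ F.T   # rows are F @ x1
    Ftb = b @ F    # rows are F^T @ x2
    num = np.sum(b * Fa, axis=1) ** 2
    den = Fa[:, 0] ** 2 + Fa[:, 1] ** 2 + Ftb[:, 0] ** 2 + Ftb[:, 1] ** 2
    return num / den

def ransac_fundamental(x1, x2, iters=500, thresh=1.0, seed=0):
    """Return (F, inlier mask) for matched pixel coordinates x1, x2 of shape (n, 2)."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(x1), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(x1), size=8, replace=False)
        F = eight_point(x1[sample], x2[sample])
        inliers = sampson_distance(F, x1, x2) < thresh ** 2
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 8:
        raise ValueError("no consensus set found")
    # Refit on the whole consensus set.
    return eight_point(x1[best], x2[best]), best
```

Matches outside the returned inlier mask are discarded before geometric estimation.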
Structure from motion • Structure from motion (SfM): automatic recovery of camera motion and scene structure from two or more images. • It is a self-calibration technique, also called automatic camera tracking or match moving. • The camera viewpoints are unknown.
Structure from motion • Find the rotations R, positions t, and 3D point locations P that minimize the sum of squared reprojection errors f(R, t, P) • [Figure: 3D points p1 to p7 observed by Camera 1 (R1, t1), Camera 2 (R2, t2), and Camera 3 (R3, t3)]
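The objective f can be sketched as below. This assumes a simple pinhole camera with one focal length per camera, the principal point at the image origin, no lens distortion, and t used as the translation in x = R X + t. The function name and argument layout are hypothetical.

```python
import numpy as np

def reprojection_error(rotations, translations, focals, points, observations):
    """Sum of squared reprojection errors over all observations.

    rotations[i]: 3x3 rotation of camera i; translations[i]: 3-vector;
    focals[i]: focal length of camera i; points[j]: 3D point j.
    observations: iterable of (i, j, observed_xy) meaning camera i saw
    point j at the 2D position observed_xy.
    """
    total = 0.0
    for i, j, observed in observations:
        # Move the 3D point into the coordinate frame of camera i.
        pc = rotations[i] @ points[j] + translations[i]
        # Perspective projection (assumed pinhole model).
        projected = focals[i] * pc[:2] / pc[2]
        total += float(np.sum((projected - np.asarray(observed)) ** 2))
    return total
```

A bundle adjustment solver would minimize this quantity over all rotations, translations, focal lengths, and points with a nonlinear least-squares method.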
Vocabulary trees (Nistér & Stewénius, 2006) • Computational efficiency • A hierarchical k-means tree is used to quantize the feature descriptors into visual words
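A toy sketch of quantization with a k-means tree, assuming plain Lloyd iterations with random initialization and a small branching factor. The parameters and the names `kmeans`, `build_tree`, and `quantize` are illustrative, not taken from the slides.

```python
import numpy as np

def kmeans(data, k, iters=10, seed=0):
    """Plain Lloyd k-means; returns (centers, labels)."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)]

    def assign(centers):
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.argmin(d2, axis=1)

    for _ in range(iters):
        labels = assign(centers)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = data[labels == c].mean(axis=0)
    return centers, assign(centers)

def build_tree(data, branching=3, depth=2):
    """Recursively cluster the descriptors; leaves are represented by None."""
    data = np.asarray(data, dtype=float)
    if depth == 0 or len(data) < branching:
        return None
    centers, labels = kmeans(data, branching)
    children = [build_tree(data[labels == c], branching, depth - 1)
                for c in range(branching)]
    return {"centers": centers, "children": children}

def quantize(tree, descriptor):
    """Descend the tree; the path of branch indices acts as the visual word."""
    path = []
    node = tree
    while node is not None:
        d2 = ((node["centers"] - descriptor) ** 2).sum(axis=1)
        c = int(np.argmin(d2))
        path.append(c)
        node = node["children"][c]
    return tuple(path)
```

Quantizing a descriptor costs about branching × depth distance computations instead of one per visual word, which is where the efficiency gain comes from.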
TF-IDF (term frequency–inverse document frequency) • Consider a document containing 100 words in which the word “cow” appears 3 times. • Term frequency: TF = 3 / 100 = 0.03. • Assume we have 10 million documents and “cow” appears in one thousand of them. • Inverse document frequency: IDF = log10(10,000,000 / 1,000) = 4.
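The TF-IDF score is the product of these quantities: 0.03 × 4 = 0.12 • A word is important if its TF-IDF score is large • A high term frequency within a particular document, together with a low document frequency of that term across the whole collection, yields a high TF-IDF weight. TF-IDF therefore tends to filter out common terms and keep the important ones.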
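The arithmetic of the example can be checked with a few lines of Python. The base-10 logarithm is inferred from the slide's result of 4, and the function name `tf_idf` is hypothetical.

```python
import math

def tf_idf(term_count, doc_length, num_docs, docs_with_term):
    """Return (TF, IDF, TF-IDF) using a base-10 logarithm for the IDF."""
    tf = term_count / doc_length
    idf = math.log10(num_docs / docs_with_term)
    return tf, idf, tf * idf

# The "cow" example: 3 occurrences in a 100-word document,
# and 1,000 of 10,000,000 documents contain the word.
tf, idf, score = tf_idf(3, 100, 10_000_000, 1_000)
print(tf, idf, score)
```

In the image-matching setting, a "document" is an image and a "word" is a quantized feature descriptor (visual word).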
Query expansion • Large-scale image matching • Better approach than matching every pair of images: use a bag-of-words technique to find likely matches • For each image, find the top M scoring other images and run detailed SIFT matching only on those
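Candidate selection can be sketched as follows, assuming each image is already described by a TF-IDF weighted bag-of-words vector and that images are scored by cosine similarity. The scoring function is an assumption, and `top_m_candidates` is a hypothetical name.

```python
import numpy as np

def top_m_candidates(tfidf, m):
    """For each image (one row of tfidf), return the m most similar other images.

    tfidf: array of shape (num_images, vocabulary_size).
    Returns an integer array of shape (num_images, m), best match first.
    """
    tfidf = np.asarray(tfidf, dtype=float)
    norms = np.linalg.norm(tfidf, axis=1, keepdims=True)
    unit = tfidf / np.maximum(norms, 1e-12)
    # Cosine similarity between every pair of images.
    sim = unit @ unit.T
    # An image is never its own match candidate.
    np.fill_diagonal(sim, -np.inf)
    return np.argsort(-sim, axis=1)[:, :m]
```

Only the returned pairs go on to detailed SIFT matching, so the number of expensive comparisons grows with M times the number of images rather than with the square of the number of images.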
Outline • 1. Introduction • 2. System Design • 3. Result • 4. Conclusion
Building Rome in a Day • Rome, Italy • Landmarks shown: St. Peter’s Basilica, Colosseum, Trevi Fountain • Reconstructed from 150,000 images in 21 hours on 496 cores
Dubrovnik • Dubrovnik, Croatia • 4,619 images (out of an initial 57,845) • Total reconstruction time: 23 hours • Number of cores: 352
San Marco Square • San Marco Square and environs, Venice • 14,079 photos (out of an initial 250,000) • Total reconstruction time: 3 days • Number of cores: 496
Outline • 1. Introduction • 2. System Design • 3. Result • 4. Conclusion
Conclusion • Our experimental results demonstrate that it is now possible to reconstruct cities consisting of 150K images in less than a day on a cluster with 500 compute cores. • Large-scale image matching • 3D models • http://grail.cs.washington.edu/rome/ • http://phototour.cs.washington.edu/applet/index.html