Collaborative Mobile Visual Computing • Zoltan Kato • University of Szeged „Infocommunication technologies and the society of future (FuturICT.hu)” TÁMOP-4.2.2.C-11/1/KONV-2012-0013
Future: Collaborative sensing • (Video) cameras have become standard on mobile phones • Almost everybody is equipped with a camera • Collaborative sensing (ad-hoc mobile network of cameras) • Collection of still images and video • Mobile phones’ computing power is increasing quickly (a GPU is also becoming standard) • Panorama stitching • 3D reconstruction • Wide-range tracking (e.g. everywhere in a city) • …
Collaborative sensing applications • Emergency situations (e.g. Great East Japan Earthquake 2011: 125,000 buildings damaged or destroyed) • Fast environment mapping is critical • Image-based navigation, looking at distant places • Rendering synthetic views/videos of e.g. sport events (always see the actual event from the best point of view) • Security: detection of unusual events, wide-range visual search for a suspect car/person • etc.
Motivation • Mobile • Smartphone explosion! • Sensors on board • Camera module + position, orientation, acceleration, … • Network connection • Wi-Fi and/or mobile internet communication • Capable and still increasing computing power • Billions of potential users! • Collaborative • Near-synchronous imaging • Decentralized, ad-hoc camera network • Computation is done by the peers • Data is shared among them on request
Problem Statement • High-level collaborative tasks • 3D scene reconstruction • Synthetic view generation • Panorama generation • Fundamental algorithmic problems • Correspondence • Detect corresponding objects in the images • Ad-hoc camera network calibration • How is the 3D scene projected onto the image plane? • Which cameras are close to each other? • Which cameras have a common view? • Reconstructing the third dimension • Each camera provides a 2D image • Fuse the visual content of several cameras to produce a 3D image • Communication • Infrastructure for peer-to-peer data exchange • Distributed algorithm design
Image database • Scene types • Large flat regions • Building facades • Street view • Planes with different orientations • Landscape images • Objects far away • Indoor scenes • Image acquisition • 4-5 mobile devices with different cameras • Smartphones and tablets • VGA and 2-megapixel images • Photos + sensor information • Position, orientation • Goal • Approx. 5 photos per scene for approx. 50 different scenes
Correspondence • Problem • Dense correspondence: find a pair for each pixel in the other images • Sparse correspondence: find the occurrence of detected objects (points, regions) in other images of the same scene • Wide-baseline • Possible solutions • Keypoint detection • Region detection • Outlier filtering is important!
Keypoint-based Correspondence • Keypoint detection • Detect pixels in the images with unique neighborhood properties • State-of-the-art methods from the literature: SIFT, SURF, GFTT, MSER, STAR • Provided by the OpenCV software package • Descriptors • Describe the keypoint neighborhood • Feature vector of 64 or 128 dimensions • State-of-the-art methods from the literature: SIFT, SURF • Pairing • Based on the distance between descriptor feature vectors • FLANN algorithm • Outlier detection • Remove invalid pairs • RANSAC algorithm with fundamental matrix hypothesis and reprojection error
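The pairing and outlier-rejection steps above can be sketched in plain Python. This is only an illustrative sketch, not the project's implementation: it swaps FLANN for a brute-force nearest-neighbour search, adds a ratio test (an assumption, not mentioned on the slide), and shows only the inlier test that a RANSAC loop would apply to a candidate fundamental matrix `F`. All function names and thresholds are hypothetical.

```python
import math

def l2(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Brute-force nearest-neighbour pairing (stand-in for FLANN).

    A pair (i, j) is kept only if the best match is clearly better than
    the second-best one (ratio test, an assumption of this sketch).
    """
    matches = []
    for i, d in enumerate(desc_a):
        ranked = sorted((l2(d, e), j) for j, e in enumerate(desc_b))
        if len(ranked) < 2:
            continue
        (best, j), (second, _) = ranked[0], ranked[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

def epipolar_distance(F, p1, p2):
    """Pixel distance of p2 from the epipolar line F * p1."""
    x, y = p1
    a = F[0][0] * x + F[0][1] * y + F[0][2]
    b = F[1][0] * x + F[1][1] * y + F[1][2]
    c = F[2][0] * x + F[2][1] * y + F[2][2]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return float("inf")  # degenerate line, treat as outlier
    return abs(a * p2[0] + b * p2[1] + c) / norm

def filter_inliers(F, pairs, threshold=2.0):
    """Keep point pairs consistent with the hypothesised F."""
    return [(p1, p2) for p1, p2 in pairs
            if epipolar_distance(F, p1, p2) <= threshold]
```

Repetitive patterns tend to produce several near-identical descriptors, which is the situation the ratio test is meant to reject; the epipolar check then removes pairs that are geometrically inconsistent.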
Preliminary Results • Today’s high-end mobile devices can solve the correspondence problem for VGA and 2-megapixel image sizes in acceptable time • Repetitive patterns generate outliers (frequent in urban scenes)…
Region-based Correspondences • Advantages over keypoints • Keypoint detection can be problematic due to poor imaging hardware and lossy JPEG image compression • Method of choice • MSER (Maximally Stable Extremal Regions) • Appearance is consistent with the transformation • Shape must be covariant to object position • Problems to solve • Finding corresponding regions • Using computed plane normals • Region merging and rejection algorithm • Application • Patch-based 3D reconstruction
Calibration & Vision Graph Construction • Our solution to the problems • The Vision Graph of the network is constructed from point correspondences and sensor information • The relative pose is estimated with respect to a planar structure containing a low-rank texture (e.g. flats, windows, brick walls) • Given a set of mobile cameras, our task is to determine the location of each camera with respect to the 3D scene using visual information and sensor data • The main problems • Determine which cameras see a common view • Estimate the pose of these cameras
Vision graph construction • Taking images with a custom Android app • Image + sensor data (location, orientation, FOV) • Sensor placement based on • Location data • Orientation and FOV data • Constructing graphs G(V,E) from sensor data • V: sensor locations • E: connected if 3D sensor views have some overlap
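The graph G(V,E) above can be sketched as follows. This is a simplified illustration under several assumptions not stated on the slide: cameras live in a flat 2D metric frame, each is a tuple `(x, y, heading_deg, fov_deg)`, views are limited to a hypothetical `max_range`, and overlap is tested approximately by sampling points inside each viewing sector.

```python
import math
from itertools import combinations

def sees(cam, point, max_range):
    """True if `point` lies inside the camera's 2D viewing sector."""
    x, y, heading_deg, fov_deg = cam
    dx, dy = point[0] - x, point[1] - y
    dist = math.hypot(dx, dy)
    if dist > max_range:
        return False
    if dist == 0.0:
        return True
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular difference wrapped to [-180, 180).
    diff = (bearing - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0

def sector_samples(cam, max_range, n_angles=9, n_radii=8):
    """Yield sample points spread over the camera's viewing sector."""
    x, y, heading_deg, fov_deg = cam
    for i in range(n_angles):
        ang = math.radians(heading_deg - fov_deg / 2.0
                           + fov_deg * i / (n_angles - 1))
        for k in range(1, n_radii + 1):
            r = max_range * k / n_radii
            yield (x + r * math.cos(ang), y + r * math.sin(ang))

def views_overlap(cam_a, cam_b, max_range):
    """Approximate test: does any sampled point of one view fall in the other?"""
    return (any(sees(cam_b, p, max_range) for p in sector_samples(cam_a, max_range))
            or any(sees(cam_a, p, max_range) for p in sector_samples(cam_b, max_range)))

def build_vision_graph(cams, max_range=100.0):
    """Return the edge set E as index pairs (i, j) with i < j."""
    edges = set()
    for i, j in combinations(range(len(cams)), 2):
        if views_overlap(cams[i], cams[j], max_range):
            edges.add((i, j))
    return edges
```

Because the overlap test is sampled, very thin overlaps can be missed; a real system would also have to account for sensor noise in location and orientation.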
Placing images based on sensor data and graph information • Using a ~300 m radius • Filtering images of the same group based on content • Extracting interest points (SURF) • Extracting local image features around interest points • LBP, texture, edge histograms • Filtering interest points based on local feature (dis-)similarity • Checking for existing correspondences between such images • Reason: possible content occlusions in images of the same area (we do not have map/street information) • Goal: image placement based on the above location and filtering information
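The ~300 m radius grouping can be illustrated with a great-circle distance check. This sketch assumes the location data are GPS latitude/longitude pairs in degrees, which the slide does not state explicitly; the function names and the sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def neighbours_within(images, radius_m=300.0):
    """Map each image id to the ids of other images taken within radius_m.

    `images` maps an image id to its (latitude, longitude) in degrees.
    """
    result = {name: [] for name in images}
    names = list(images)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if haversine_m(*images[a], *images[b]) <= radius_m:
                result[a].append(b)
                result[b].append(a)
    return result

# Hypothetical example: "a" and "b" are about 111 m apart, "c" is over 2 km away.
photos = {"a": (46.2530, 20.1414), "b": (46.2540, 20.1414), "c": (46.2730, 20.1414)}
groups = neighbours_within(photos)
```

Only images that pass this coarse location filter would then need the more expensive content-based filtering.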
Camera Pose Estimation • Calculate the relative pose within the network • Choose an arbitrary main camera. The main camera estimates the relative translation and rotation to its neighbors. The relative pose can be extracted from the essential matrices. • The necessary point correspondences are determined from extracted SIFT or SURF features. • Relate the camera network to a planar surface • The main camera determines the relative pose to an extracted patch of a low-rank texture using the TILT (Transform Invariant Low-rank Textures) algorithm. • The algorithm estimates the planar homography that best aligns the extracted pattern so that it becomes low-rank. • The relative position and orientation are factorized from the estimated planar homography.
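For reference, the standard two-view relations behind these steps can be written out as below. The notation is an assumption of this note, not taken from the slides: $K_1, K_2$ are the intrinsic matrices, $(R, t)$ maps main-camera coordinates to neighbor coordinates ($X_2 = R X_1 + t$), $\hat{x}_i$ are normalized image points, and the plane is $n^\top X_1 = d$ in the main camera frame.

```latex
% Epipolar constraint on normalized image points and the essential matrix
\hat{x}_2^{\top} E \, \hat{x}_1 = 0, \qquad E = [t]_{\times} R, \qquad E = K_2^{\top} F K_1

% Homography induced by the plane n^T X_1 = d (defined up to scale)
H \simeq K_2 \left( R + \frac{t \, n^{\top}}{d} \right) K_1^{-1}
```

Since $E$ is defined only up to scale, the translation direction is recoverable but not its length; likewise the homography exposes only the ratio $t/d$. This is one way to see why a relative scale has to be fixed before the whole network can be calibrated consistently.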
Calibrating the whole network w.r.t. the 3D world • The camera network pose and the low-rank homography are usually not at the same scale, so we have to determine a relative scale to achieve a consistent calibration of the network. • Assuming that at least one other camera sees the same pattern, this problem can be solved by a classical mutual-information-based registration algorithm. • This algorithm runs on each mobile device in the network; the final scale is the median of the estimated scales.
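The final aggregation step can be sketched in a few lines. This is a minimal illustration, assuming each peer reports a single positive scale estimate; the function name and the sample values are hypothetical.

```python
from statistics import median

def consensus_scale(scale_estimates):
    """Combine per-device scale estimates into one value via the median.

    Non-positive estimates are discarded as invalid (an assumption of
    this sketch); the median keeps a single bad peer from skewing the result.
    """
    valid = [s for s in scale_estimates if s > 0]
    if not valid:
        raise ValueError("no valid scale estimate")
    return median(valid)

# Hypothetical estimates from five devices; 9.5 is a failed registration.
final_scale = consensus_scale([2.1, 1.9, 2.0, 9.5, 2.05])
```

Compared with the mean, the median is barely affected by the outlying 9.5 in the example.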
Patch-based 3D reconstruction • Given a pair of corresponding regions, determine the normal & depth of the 3D surface • The usual way to determine normals is based on cross-product computation (using surface points reconstructed beforehand) • Our method requires knowledge of the affine transformation between the images of a surface patch • The proposed method is compatible with region-based correspondence computation (e.g. MSER) • The affine transformation can be computed without establishing point-to-point correspondences
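For comparison, the "usual way" mentioned above, a cross product of vectors between previously reconstructed surface points, can be sketched as follows. This is the baseline only, not the proposed affine-based method, whose closed form is not given on the slides; the function names are hypothetical.

```python
import math

def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three reconstructed 3D points."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    # Cross product u x v.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    length = math.sqrt(sum(c * c for c in n))
    if length == 0.0:
        raise ValueError("points are collinear, normal is undefined")
    return [c / length for c in n]

def plane_offset(normal, point):
    """Signed distance of the plane from the origin along `normal`."""
    return sum(n * p for n, p in zip(normal, point))

# Three points on the plane z = 5 in the camera frame.
n = plane_normal((0, 0, 5), (1, 0, 5), (0, 1, 5))
d = plane_offset(n, (0, 0, 5))
```

This baseline needs at least three accurately reconstructed, non-collinear points per patch, which is the dependency on point correspondences that a region-based approach tries to avoid.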
Properties • The proposed method • Has a closed-form expression for normal vectors • Gives an exact solution for any calibrated projection (i.e. it isn’t restricted to perspective cameras) • Gives good approximate solutions for any (smooth) surface • The precalculated normal can be used for distance calculation (a closed-form solution exists for planar patches observed with perspective cameras) • Limitations: the analysis shows that it can’t be used in the following cases • If the camera centers and the observed region are on the same line (i.e. the transformation is a scaling) • For objects’ contour points (where the normal vector is perpendicular to the reprojected ray tangent to the object) • These cases can easily be detected algebraically
Team members • Jozsef Molnar, PhD • Rui Huang, PhD • Levente Kovacs, PhD • Attila Tanacs, PhD • Atul Rai • Zsolt Santa • Endre Juhasz • This work was partially supported by the European Union and the European Social Fund through project FuturICT.hu (grant no.: TAMOP-4.2.2.C-11/1/KONV-2012-0013).