• 230 likes • 392 Views
Image Mosaicing with Motion Segmentation from Video. Augusto Roman, Taly Gilat EE392J Final Project 03/20/01. Project Goal. To create a single panoramic image from a sequence of video frames Align frames using motion estimation. The Wang Adelson Algorithm.
E N D
Image Mosaicing with Motion Segmentation from Video Augusto Roman, Taly Gilat EE392J Final Project 03/20/01
Project Goal • To create a single panoramic image from a sequence of video frames • Align frames using motion estimation
The Wang Adelson Algorithm • Dense MV is gold standard estimate for each pixel • Estimate region motion • Segment regions based on motions • Iterate
Dense Motion Estimation • Need a motion vector for every pixel. • Full search block matching is far too slow (more than 2 hours per frame pair!) • Phase Correlation used to estimate motion • Results in a few possible motions for each block. • For each pixel in the center of the block, test with each of the possible motions and choose the motion with the smallest MSE. • Shift block by small amount and compute new phase correlation and repeat. • Much faster! ~30 sec per frame pair!
Phase Correlation • Given a block B1,B2 from each image • Compute 2D FFT of each • Compute cross-power spectrum • Normalized value of: F{B1} F{B2}* • Take IFFT to get Phase Correlation Function • This is very similar to the correlation function between the two blocks
Motion Vectors vs Motions Hypothesis • There are a number of different motion hypotheses available. • In theory, each of these hypothesis corresponds to a distinct motion in the video. • Each pixel is assigned to the motion hypothesis that most closely approximates its motion vector. • This segments the frame into distinct regions, one for each motion hypothesis.
Motion Hypothesis Generation • For each region want motion hypothesis that best represents all pixel motions in that region. • Least squares fit to find best affine motion parameter in a region • First iteration initialized with blocks
Motion Hypothesis Refinement • K-means used to cluster motion hypotheses • K unknown • Empty clusters removed • Large clusters split to maintain minimum k value
Region Segmentation • For each pixel compare hypotheses to dense MV • Find closest hypothesis • Group all pixels represented by a motion hypothesis into a region • Pixels with large error unassigned • Hypotheses without membership removed
Region Adjustment • Region Splitter • Assumes areas with same motion are connected • Disconnected areas within a region are split into separate regions • Increases number of hypotheses for k-means • Region Filter • Small regions give poor motion estimates • Remove all regions with area below threshold • Disconnected objects with same motion will be merged at next segmentation step
Segmentation Results Frame 1 Frame 2
Segmentation Results Initial Segmentation Iteration 1 Iteration 2 Iteration 3 Iteration 5 Iteration 7
The Whole Enchilada • The dense motion estimate, region segmentation and motion estimation performed for all frames pairs. • For first pair, segmentation initialized to blocks and k-means initialized to lattice in 6D affine space. • Subsequent frame pairs initialized with final segmentation and motion hypotheses from previous frame pair.
Layer Synthesis • Motion estimates relate each frame only to the previous frame • Frames are projected onto first video frame • Cumulative projection kept in 3x3 transformation matrix • Layers are not necessarily ordered similarly between frames • Assumed largest layer is background • Median taken of all values projected to each pixel in final image
Implementation Difficulties • Parameter tweaking • Phase Correlation • Attempted to focus on textured areas of image • Initial test videos lacked texture • Registering layers across frames • Cumulative motion error • Resolution loss in final panoramic
Conclusions • Nice panoramas • Moving objects removed • Future improvements • Sub-pixel dense motion estimation • More accurate segmentation • Region registration across frame