Edge Templates, Tracking CSE 4310 – Computer Vision Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Contours • A contour is a curve/line (typically not straight) that delineates the boundary of a region, or the boundary between two regions.
Shapes Without Texture • Letters/numbers. • Contours. • Edge templates.
Detecting Shapes Without Texture (Images: star1, star3, and their combination.) • Normalized correlation does not work well. • Slight misalignments have a great impact on the correlation score.
Chamfer Distance • For each edge pixel in star1: • How far is it from the nearest edge pixel in star3? • The average of all those answers is the directed chamfer distance from star1 to star3.
Chamfer Distance • For each edge pixel in star3: • How far is it from the nearest edge pixel in star1? • The average of all those answers is the directed chamfer distance from star3 to star1.
Directed Chamfer Distance • Input: two sets of points, red and green. • c(red, green): average distance from each red point to the nearest green point. • c(green, red): average distance from each green point to the nearest red point.
Chamfer Distance • Input: two sets of points, red and green. • c(red, green): average distance from each red point to the nearest green point. • c(green, red): average distance from each green point to the nearest red point. • Chamfer distance: C(red, green) = c(red, green) + c(green, red).
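As a rough illustration, the chamfer distance between two binary edge images e1 and e2 can be computed in a few lines of MATLAB, assuming both images contain at least one edge pixel (a minimal sketch; bwdist is from the Image Processing Toolbox and the variable names are illustrative):
d1 = bwdist(e1);          % distance from every pixel to the nearest edge pixel of e1
d2 = bwdist(e2);          % distance from every pixel to the nearest edge pixel of e2
c12 = mean(d2(e1 ~= 0));  % directed chamfer distance from e1 to e2
c21 = mean(d1(e2 ~= 0));  % directed chamfer distance from e2 to e1
C = c12 + c21;            % chamfer distance C(e1, e2)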
Chamfer Distance • On two stars: • 31 pixels are nonzero in both images. • On star and crescent: • 33 pixels are nonzero in both images. • Correlation scores can be misleading.
Chamfer Distance • Chamfer distance is much smaller between the two stars than between the star and the crescent.
Detecting Hands (Images: template, input image.) • Problem: hands are highly deformable. • Normalized correlation does not work as well. • Alternative: use edges.
Detecting Hands (Images: template, window.) • Compute the chamfer distance with the template at all windows and all scales. • Which version? Directed or undirected? • We want a small distance with the correct window and a large distance with incorrect windows.
Direction Matters (Images: template, window.) • Chamfer distance from window to template: problems? • Clutter (edges not belonging to the hand) causes the distance to be high.
Direction Matters (Images: template, window.) • Chamfer distance from template to window: problems? • What happens when comparing to a window with lots of edges? The score is low.
Choice of Direction (Images: template, window.) • For detection, we compute the chamfer distance from the template to the window. • Robustness to clutter is a big plus: it ensures that the correct results will be included. • Incorrect detections can be discarded with additional checks.
Computing the Chamfer Distance • Compute the chamfer distance with the template at all windows and all scales. • This can be very time consuming.
Distance Transform (Images: edge image e1, distance transform d1.) • For every pixel, compute the distance to the nearest edge pixel: d1 = bwdist(e1);
Distance Transform (Images: template t1, edge image e1, distance transform d1.) • If template t1 is of size (r, c), the chamfer distance with window (i:(i+r-1), j:(j+c-1)) of e1 can be written as: window = d1(i:(i+r-1), j:(j+c-1)); sum(sum(t1 .* window)) • This sums the distances at the template's edge pixels; dividing by the number of edge pixels gives the directed chamfer distance from the template to the window.
Distance Transform (Images: template t1, edge image e1, distance transform d1.) • Computing the image of chamfer scores for one scale s: resized = imresize(image, s, 'bilinear'); resized_edges = canny(resized, 7); resized_dt = bwdist(resized_edges); chamfer_scores = imfilter(resized_dt, t1, 'symmetric'); figure(3); imshow(chamfer_scores, []); • How long does that take? Can it be more efficient?
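To make the multi-scale search concrete, here is a hedged sketch that repeats the single-scale computation at several scales and keeps the best-scoring window; the scale list is an arbitrary assumption, and canny refers to the same helper used in the code above (it is not a built-in MATLAB function):
best_score = inf;
for s = [0.5, 0.75, 1.0, 1.25, 1.5]            % assumed set of scales
    resized = imresize(image, s, 'bilinear');
    resized_edges = canny(resized, 7);
    resized_dt = bwdist(resized_edges);
    chamfer_scores = imfilter(resized_dt, t1, 'symmetric');
    [score, idx] = min(chamfer_scores(:));     % best (smallest) score at this scale
    if score < best_score
        best_score = score;
        best_scale = s;
        [best_i, best_j] = ind2sub(size(chamfer_scores), idx);
    end
end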
Improving Efficiency (Images: template t1, edge image e1, distance transform d1.) • Which parts of the template contribute to the score of each window? Just the nonzero parts. • How can we use that? • Compute a list of the nonzero pixels in the template. • Consider only those pixels when computing the sum for each window, as sketched below.
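A minimal sketch of this optimization, assuming d1 and t1 as above (the loop structure and variable names are illustrative, not the lecture's exact implementation):
[ti, tj] = find(t1 ~= 0);                 % coordinates of the template's edge pixels
num_edges = numel(ti);
[rows, cols] = size(d1);
[r, c] = size(t1);
chamfer_scores = zeros(rows - r + 1, cols - c + 1);
for i = 1:(rows - r + 1)
    for j = 1:(cols - c + 1)
        total = 0;
        for k = 1:num_edges                % visit only the nonzero template pixels
            total = total + d1(i + ti(k) - 1, j + tj(k) - 1);
        end
        chamfer_scores(i, j) = total;      % same quantity as sum(sum(t1 .* window))
    end
end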
Results for Single Scale Search • What is causing the false result? • A window with lots of edges. • How can we refine these results? • Skin color, or background subtraction.
What Is Tracking? • We are given: the state of one or more objects in the previous frame. • We want to estimate: the state of those objects in the current frame. • "State" can include: • Bounding box. This will be the default case in our class, unless we specify otherwise. • Velocity (a 2D vector: motion along the y axis per frame and motion along the x axis per frame). • Precise pixel-by-pixel shape. • Orientation, scale, 3D orientation, 3D position, …
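For concreteness, one possible way to store such a state in MATLAB is a struct; the field names below are purely illustrative:
state.box = [top, left, bottom, right];   % bounding box in the current frame
state.velocity = [dy, dx];                % motion per frame along the y and x axes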
Why Do We Care About Tracking? • It improves speed: we do not have to run detection at all locations, all scales, and all orientations. • It allows us to establish correspondences across frames. • It provides representations such as "the person moved left", as opposed to "there is a person at (i1, j1) at frame 1, and there is a person at (i2, j2) at frame 2". • It is needed in order to recognize gestures, actions, and activities.
Example Applications • Activity recognition/surveillance. • Figure out if people are coming out of a car, or loading a truck. • Gesture recognition. • Respond to commands given via gestures. • Recognize sign language. • Traffic monitoring. • Figure out if any car is approaching a traffic light. • Figure out if a street/highway is congested. • In all these cases, we must track objects across multiple frames.
Estimating Motion of a Block • What is a block? A rectangular region in the image; in other words, an image window specified by a bounding box. • Given a block at frame t, how can we figure out where the block moved to at frame t+1? • Simplest method: normalized correlation, sketched below.
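As a rough sketch of this simplest method, normalized correlation can be computed with normxcorr2 from the Image Processing Toolbox; the variable names are assumptions, and the frames are assumed to be grayscale:
scores = normxcorr2(double(block), double(next_frame));
[~, idx] = max(scores(:));                 % location of the best match
[peak_i, peak_j] = ind2sub(size(scores), idx);
% normxcorr2 pads the result by the block size, so subtract it to get
% the top-left corner of the best-matching window in next_frame.
top  = peak_i - size(block, 1) + 1;
left = peak_j - size(block, 2) + 1;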
Main Loop for Block Tracking Input: block extracted from previous frame. 1. read current frame. 2. find best match of block in current frame (here we can search the entire image or just a region close to the location in the previous frame). 3. (optional) update description of block to match the appearance in the current frame. 4. advance frame counter. 5. goto 1. • What is missing to make this framework fully automatic? • Detection/initialization: find the object and obtain an initial object description.
Main Loop for Block Tracking • Tracking methods ignore the initialization problem. • Any detection method can be used to address it.
Source of Efficiency • Why exactly is tracking more efficient than detection? In what lines of the pseudocode is efficiency improved? • Line 2: we search fewer locations/scales/orientations.
Updating Object Description • How can we update the block description in Step 3? • The simplest approach is to store the image subwindow corresponding to the bounding box that was found in Step 2, as in the sketch below.
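Putting the pieces together, here is a hedged sketch of the main loop with the optional Step 3 update; read_frame, num_frames, and initial_block are assumed to be provided, and the frames are assumed to be grayscale:
block = initial_block;                     % from detection/initialization
for t = 2:num_frames
    frame = read_frame(t);                 % Step 1: read current frame
    scores = normxcorr2(double(block), double(frame));    % Step 2: best match
    [~, idx] = max(scores(:));
    [peak_i, peak_j] = ind2sub(size(scores), idx);
    top  = peak_i - size(block, 1) + 1;
    left = peak_j - size(block, 2) + 1;
    % Step 3 (optional): update the block description with the new window;
    % doing this every frame risks drifting (see the next slide).
    block = frame(top:(top + size(block, 1) - 1), ...
                  left:(left + size(block, 2) - 1));
end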
Drifting • The estimate can be off by a pixel or so at each frame. • Sometimes larger errors occur. • If we update the appearance, errors can accumulate.
Changing Appearance • Sometimes the appearance of an object changes from frame to frame. • Example: the left foot and right foot in the walkstraight sequence. • There is a fundamental dilemma between avoiding drift and updating the appearance of the object. • If we do not update the object description, at some point the description is not good enough. • If we update the object description at each frame, the slightest tracking error means that we update the description using the wrong bounding box, which can lead to more tracking errors in subsequent frames.