Robust Object Tracking by Hierarchical Association of Detection Responses. Presented by fakewen. Outline: Introduction, Hierarchical Association of Detection Responses, Low-Level Association, Middle-Level Association, High-Level Association, Experimental Results, Conclusion.
Robust Object Tracking by Hierarchical Association of Detection Responses • Presented by fakewen
Outline • Introduction • Hierarchical Association of Detection Responses • Low-Level Association • Middle-Level Association • High-Level Association • Experimental Results • Conclusion
Introduction • A detection-based, three-level hierarchical association approach that robustly tracks multiple objects in crowded environments from a single camera.
Low level: reliable tracklets are generated by linking detection responses. • Middle level: these tracklets are further associated to form longer tracklets based on more complex affinity measures. • High level: entries, exits and scene occluders are estimated from the already computed tracklets and are used to refine the final trajectories.
Notation • detection response, composed of: position • size • occurrence frame index • color histogram
Notation • T_k: object trajectory/tracklet • object trajectory/tracklet set • T^L, T^M, T^H: association results of the low level, the middle level and the high level respectively
Low-Level Association • input: the set of all detection responses
Middle-Level Association • The middle-level association is an iterative process: each round takes the tracklets generated in the previous round as input and associates them further.
First round • input: T^L, the tracklets from the low level • S_k: a tracklet association • l_k is the number of tracklets in S_k • T_k^M: the corresponding trajectory of S_k • S = {S_k}: the tracklet association set
Hungarian Algorithm(1) • Arrange your information in a matrix with the "people" on the left and the "activity" along the top, with the "cost" for each pair in the middle.
Hungarian Algorithm(2) • Ensure that the matrix is square by the addition of dummy rows/columns if necessary. Conventionally, each element in the dummy row/column is the same as the largest number in the matrix.
Hungarian Algorithm(3) • Reduce the rows by subtracting the minimum value of each row from that row.
Hungarian Algorithm(4) • Reduce the columns by subtracting the minimum value of each column from that column.
Hungarian Algorithm(5) • Cover all zero elements with the minimum possible number of horizontal and vertical lines. (If the number of lines equals the number of rows, go to step 9.)
Hungarian Algorithm(6) • Add the minimum uncovered element to every covered element. If an element is covered twice, add the minimum element to it twice.
Hungarian Algorithm(7) • Subtract the minimum element of the resulting matrix (which equals the minimum uncovered element used in step 6) from every element in the matrix.
Hungarian Algorithm(8) • Cover the zero elements again. If the number of lines covering the zero elements is not equal to the number of rows, return to step 6.
Hungarian Algorithm(9) • Select a matching by choosing a set of zeros such that each row and each column contains exactly one selected zero.
Hungarian Algorithm(10) • Apply the matching to the original matrix, disregarding dummy rows. This shows who should do which activity, and adding the costs will give the total minimum cost.
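The ten manual steps above can be cross-checked in code. The following is a minimal sketch, not the presenter's implementation: it pads the matrix as in step 2 and then solves the assignment with the potentials-based formulation of the Hungarian algorithm rather than the line-covering bookkeeping of steps 5 to 8. The names `pad_to_square` and `hungarian` and the example costs are illustrative assumptions.

```python
def pad_to_square(cost):
    """Step 2: add dummy rows/columns filled with the largest cost."""
    n = max(len(cost), max(len(row) for row in cost))
    big = max(max(row) for row in cost)
    padded = [list(row) + [big] * (n - len(row)) for row in cost]
    padded += [[big] * n for _ in range(n - len(padded))]
    return padded


def hungarian(cost):
    """Return (assignment, total) of a minimum-cost matching.

    assignment[i] is the column matched to row i of the original matrix,
    or None if row i ended up on a dummy column.
    """
    rows, cols = len(cost), len(cost[0])
    a = pad_to_square(cost)
    n = len(a)
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials
    v = [0] * (n + 1)    # column potentials
    p = [0] * (n + 1)    # p[j]: row matched to column j (1-based, 0 = none)
    way = [0] * (n + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = a[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    # Step 10: read the matching back on the original matrix, ignoring dummies.
    assignment = [None] * rows
    for j in range(1, n + 1):
        if p[j] <= rows and j <= cols:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][j] for i, j in enumerate(assignment) if j is not None)
    return assignment, total
```

For example, `hungarian([[4, 1, 3], [2, 0, 5], [3, 2, 2]])` returns `([1, 0, 2], 5)`: person 0 takes activity 1, person 1 takes activity 0, and person 2 takes activity 2, for a total cost of 5.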
Implementation Details • For each input tracklet, a Kalman filter is used to refine the positions and sizes of its detection responses and to estimate their velocities. • The color histogram is refined by a RANSAC method.
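As a rough illustration of the Kalman filter refinement, the sketch below filters one coordinate of a tracklet with a constant-velocity model. The motion model, the one-frame time step, and the noise parameters `q` and `r` are assumptions made for the example; the slides do not specify them.

```python
def kalman_constant_velocity(measurements, q=1e-3, r=1.0):
    """Filter 1-D positions; return a list of (position, velocity) estimates.

    q (process noise) and r (measurement noise) are hypothetical values.
    """
    x, v = measurements[0], 0.0
    # Covariance P = [[pxx, pxv], [pxv, pvv]]; velocity starts very uncertain.
    pxx, pxv, pvv = r, 0.0, 100.0
    estimates = [(x, v)]
    for z in measurements[1:]:
        # Predict one frame ahead: x <- x + v, P <- F P F^T + Q.
        x = x + v
        pxx = pxx + 2.0 * pxv + pvv + q
        pxv = pxv + pvv
        pvv = pvv + q
        # Update with the measured position (H = [1, 0]).
        s = pxx + r
        kx, kv = pxx / s, pxv / s
        innovation = z - x
        x += kx * innovation
        v += kv * innovation
        pxx, pxv, pvv = (1 - kx) * pxx, (1 - kx) * pxv, pvv - kv * pxv
        estimates.append((x, v))
    return estimates
```

Running the same filter on the x, y and size sequences of a tracklet would give smoothed values plus a velocity estimate for each, which is what the later affinity terms need.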
appearance affinity • motion affinity • frame gap between the tail of one tracklet and the head of the next
temporal affinity • number of frames in which the tracked object is occluded by other objects • number of frames in which the tracked object is visible but missed by the detector
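To make the three affinities concrete, here is a minimal sketch of how they might be computed and combined for a pair of tracklets. The specific forms are assumptions chosen for illustration and are not taken from the slides: histogram intersection for appearance, Gaussian-scored forward and backward prediction errors for motion, and per-frame penalties `alpha` (visible but missed) and `beta` (occluded) for the temporal term.

```python
import math


def appearance_affinity(hist_a, hist_b):
    """Histogram intersection of two normalized color histograms, in [0, 1]."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def gaussian(error, sigma):
    """Unnormalized zero-mean Gaussian evaluated at a 2-D error vector."""
    sq = error[0] ** 2 + error[1] ** 2
    return math.exp(-sq / (2.0 * sigma ** 2))


def motion_affinity(tail_pos, tail_vel, head_pos, head_vel, gap, sigma=5.0):
    """Score how well one tracklet's tail predicts the next tracklet's head.

    The tail is extrapolated forward by `gap` frames and the head backward;
    sigma is a hypothetical tolerance in pixels.
    """
    forward = (tail_pos[0] + tail_vel[0] * gap - head_pos[0],
               tail_pos[1] + tail_vel[1] * gap - head_pos[1])
    backward = (head_pos[0] - head_vel[0] * gap - tail_pos[0],
                head_pos[1] - head_vel[1] * gap - tail_pos[1])
    return gaussian(forward, sigma) * gaussian(backward, sigma)


def temporal_affinity(gap, occluded, max_gap, alpha=0.2, beta=0.9):
    """Penalize the frames between two tracklets (assumed form).

    `occluded` frames (at most gap - 1) cost `beta` each; the remaining
    frames, where the object was visible but missed, cost `alpha` each.
    """
    if not 1 <= gap <= max_gap:
        return 0.0
    missed = gap - 1 - occluded
    return (alpha ** missed) * (beta ** occluded)


def link_score(appearance, motion, temporal):
    """Combine the three affinities as a product (an assumed combination)."""
    return appearance * motion * temporal
```

With these defaults, a gap of 3 frames containing one occluded frame and one missed frame gives a temporal affinity of about 0.18, while any gap larger than `max_gap` gives 0 and rules the link out.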
Initialization and termination probabilities of each tracklet • In the following rounds, tracklets with longer frame gaps are associated by progressively increasing the maximum allowed frame gap.
High-Level Association • During the middle-level association, all tracklets have the same initialization/termination probabilities as there is no prior knowledge about entries and exits at that stage.
High-Level Association • At the high level, an entry map and an exit map are inferred from T^M and are used to specify the initialization/termination of each tracklet in the scene. • A scene occluder map is also inferred from T^M to revise the link probabilities.
The three maps, as hidden variables, constitute a scene structure model in the high-level association. • The scene structure model is estimated in ground-plane coordinates for better accuracy. • This coupled scene-estimation/tracklet-association problem is solved by an EM-like algorithm.
E-step • Bayesian inference • indicator function for entries, exits or scene occluders (q ∈ {en, ex, oc}) at position x on the ground plane.
The complete version of a tracklet T_k, which includes missed detections, is obtained by filling the gaps between non-consecutive detection responses with interpolated ones.
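The gap filling can be sketched as follows, assuming plain linear interpolation and a hypothetical `(frame, x, y)` tuple layout for each detection response; the slides do not say which interpolation scheme is used.

```python
def complete_tracklet(responses):
    """Fill frame gaps in a tracklet by linear interpolation.

    `responses` is a list of (frame, x, y) tuples sorted by frame.
    Returns one entry per frame from the first to the last response.
    """
    filled = [responses[0]]
    for (f0, x0, y0), (f1, x1, y1) in zip(responses, responses[1:]):
        for f in range(f0 + 1, f1):
            w = (f - f0) / (f1 - f0)  # fraction of the way across the gap
            filled.append((f, x0 + w * (x1 - x0), y0 + w * (y1 - y0)))
        filled.append((f1, x1, y1))
    return filled
```

For instance, a tracklet observed at frames 1, 3 and 4 gains one interpolated response at frame 2, placed halfway between the responses at frames 1 and 3.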
entry/exit map • estimated position and velocity at the head/tail of the tracklet, given by the Kalman filter • short time span used for predicting the positions of the entry and the exit.
entry/exit map • where is the position of response .
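One way to read the entry/exit prediction is as a short linear extrapolation from the two ends of a tracklet. The sketch below assumes the entry is found by stepping backward from the head and the exit by stepping forward from the tail, using the Kalman-estimated velocities; the function name and the `span` parameter (the short time span, in frames) are illustrative.

```python
def predict_entry_exit(head_pos, head_vel, tail_pos, tail_vel, span):
    """Extrapolate likely entry and exit positions of one tracklet.

    Positions and velocities are (x, y) pairs on the ground plane;
    `span` is a hypothetical short time span in frames.
    """
    entry = (head_pos[0] - head_vel[0] * span,
             head_pos[1] - head_vel[1] * span)
    exit_ = (tail_pos[0] + tail_vel[0] * span,
             tail_pos[1] + tail_vel[1] * span)
    return entry, exit_
```

Predicted points collected over all tracklets could then serve as evidence when estimating the entry and exit maps.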
occluder map • subset of the complete tracklet
M-step • In the M-step, the tracklets in T^M are further associated to form even longer ones. • Based on the scene structure model obtained from the E-step, the initialization and termination probabilities of each tracklet are specified.
M-step • the number of missed-detection frames between the head (or tail) of T_k and the nearest entry (or exit).
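A possible way to count these frames is to walk backward from the tracklet head along its velocity until the entry map indicates an entry. The sketch below is only an assumption about how this could be done: the grid-cell dictionary used for the entry map, the 0.5 threshold, and the `max_frames` cut-off are all hypothetical.

```python
def frames_to_entry(head_pos, head_vel, entry_map, max_frames,
                    cell=1.0, threshold=0.5):
    """Count frames stepped back from a tracklet head until an entry is hit.

    `entry_map` maps integer grid cells (ix, iy) to an entry probability.
    Returns the frame count, or None if no entry is reached in time.
    """
    for dt in range(max_frames + 1):
        x = head_pos[0] - head_vel[0] * dt
        y = head_pos[1] - head_vel[1] * dt
        key = (int(x // cell), int(y // cell))
        if entry_map.get(key, 0.0) > threshold:
            return dt
    return None
```

The same routine applies to exits by passing the tail position, the negated tail velocity, and the exit map. A small count suggests the tracklet really starts (or ends) at that entry (or exit), so its initialization (or termination) probability can be raised accordingly.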
This is used to revise the temporal affinity in Eq. 13 by considering occlusions by scene occluders when counting the number of occluded frames.