Tracking with Local Spatio-Temporal Motion Patterns in Extremely Crowded Scenes • Presented by 陳群元
Outline • Introduction • Previous work • Predicting motion patterns • Spatio-temporal transition distribution • Discerning pedestrians • Experimental results • Conclusion
Introduction • Tracking individuals in extremely crowded scenes is a challenging task. • We predict the local spatio-temporal motion patterns that describe the pedestrian movement at each space-time location in the video. • We robustly model each individual’s unique motion and appearance to discern them from surrounding pedestrians.
Previous work • Previous work tracks features and associates similar trajectories to detect individual moving entities within crowded scenes. • In contrast, we encode many possible motions in an HMM and derive a full distribution of the motion at each spatio-temporal location in the video.
Markov Model • An example: a 3-state Markov chain λ. • State 1 generates symbol A only, State 2 generates symbol B only, and State 3 generates symbol C only. • Given a sequence of observed symbols O = {CABBCABC}, the only corresponding state sequence is {S3 S1 S2 S2 S3 S1 S2 S3}, and the corresponding probability is $P(O|\lambda) = P(q_0=S_3)\,P(S_1|S_3)\,P(S_2|S_1)\,P(S_2|S_2)\,P(S_3|S_2)\,P(S_1|S_3)\,P(S_2|S_1)\,P(S_3|S_2) = 0.1 \times 0.3 \times 0.3 \times 0.7 \times 0.2 \times 0.3 \times 0.3 \times 0.2 = 0.00002268$. • [Figure: state-transition diagram of s1 (emits A), s2 (emits B), and s3 (emits C), with transition probabilities on the edges]
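The product in the worked example can be checked with a few lines of Python. This is only a sketch: the dictionary names and the function `sequence_probability` are illustrative, and only the transition probabilities that the example actually quotes are listed, since the full matrix is not needed for this sequence.

```python
# Transition probabilities quoted in the worked example, keyed as (from, to).
# Entries the example does not use are deliberately left out.
TRANSITION = {
    ("S3", "S1"): 0.3,
    ("S1", "S2"): 0.3,
    ("S2", "S2"): 0.7,
    ("S2", "S3"): 0.2,
}
INITIAL = {"S3": 0.1}  # P(q0 = S3), the only initial probability quoted

# Each state emits exactly one symbol, so the state sequence is fully determined.
SYMBOL_TO_STATE = {"A": "S1", "B": "S2", "C": "S3"}


def sequence_probability(observations):
    """Probability of an observed symbol string under the 3-state Markov chain."""
    states = [SYMBOL_TO_STATE[symbol] for symbol in observations]
    prob = INITIAL[states[0]]
    for previous, current in zip(states, states[1:]):
        prob *= TRANSITION[(previous, current)]
    return prob


print(sequence_probability("CABBCABC"))
```

Because every state emits a single symbol, no summation over state sequences is needed here; that changes once the states become hidden.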
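Hidden Markov Model • An example: a 3-state discrete HMM λ with emission probabilities s1 {A: .3, B: .2, C: .5}, s2 {A: .7, B: .1, C: .2}, s3 {A: .3, B: .6, C: .1}. • Given a sequence of observations O = {ABC}, there are 27 possible corresponding state sequences, and therefore the corresponding probability is the sum over all of them: $P(O|\lambda) = \sum_{Q} P(O, Q|\lambda) = \sum_{Q} P(O|Q,\lambda)\,P(Q|\lambda)$. • [Figure: state-transition diagram of s1, s2, and s3, with an emission distribution attached to each state]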
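As a rough illustration of why 27 state sequences must be summed, the sketch below computes P(O|λ) for O = {ABC} both by brute-force enumeration and with the forward algorithm. The emission tables come from the slide (assigned to s1, s2, s3 in the order they appear). The initial vector `PI` and the full transition matrix `A` are assumptions: they are chosen to agree with the probabilities quoted in the Markov-chain example, but the remaining entries are not stated in the slides.

```python
from itertools import product

# ASSUMED initial distribution and transition matrix (rows: from-state, cols: to-state).
# They match the values quoted in the Markov-chain example; other entries are guesses.
PI = [0.4, 0.5, 0.1]
A = [
    [0.6, 0.3, 0.1],
    [0.1, 0.7, 0.2],
    [0.3, 0.2, 0.5],
]
# Emission probabilities from the slide, in the order s1, s2, s3.
B = [
    {"A": 0.3, "B": 0.2, "C": 0.5},
    {"A": 0.7, "B": 0.1, "C": 0.2},
    {"A": 0.3, "B": 0.6, "C": 0.1},
]
N_STATES = len(PI)


def brute_force(observations):
    """Sum P(O, Q | lambda) over every possible state sequence Q."""
    total = 0.0
    for path in product(range(N_STATES), repeat=len(observations)):
        p = PI[path[0]] * B[path[0]][observations[0]]
        for t in range(1, len(observations)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][observations[t]]
        total += p
    return total


def forward(observations):
    """Forward algorithm: same quantity in O(T * N^2) instead of O(N^T)."""
    alpha = [PI[s] * B[s][observations[0]] for s in range(N_STATES)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[r] * A[r][s] for r in range(N_STATES)) * B[s][symbol]
            for s in range(N_STATES)
        ]
    return sum(alpha)


n_paths = len(list(product(range(N_STATES), repeat=3)))
print(n_paths, brute_force("ABC"), forward("ABC"))
```

Both functions return the same value; the forward recursion is the one that scales to the long observation sequences used when training HMMs.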
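Spatio-temporal gradient • The derivative at a position is estimated by averaging the forward and backward differences: f'(Pos) = ((f(Pos+1) - f(Pos)) + (f(Pos) - f(Pos-1)))/2 = (f(Pos+1) - f(Pos-1))/2. • For each pixel i in the cuboid, the spatio-temporal gradient is $\nabla I_i = [I_{i,x}, I_{i,y}, I_{i,t}]^T = [\partial I/\partial x, \partial I/\partial y, \partial I/\partial t]^T$. • I is the intensity.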
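A minimal sketch of the central difference applied along x, y, and t is given below. It assumes the video is stored as a nested list indexed `video[t][y][x]`, which is an illustrative layout rather than anything stated in the slides, and it only handles interior pixels.

```python
def spatio_temporal_gradient(video, t, y, x):
    """Central-difference gradient (I_x, I_y, I_t) at an interior pixel.

    `video[t][y][x]` holds the intensity I; the indexing order is an assumption.
    """
    i_x = (video[t][y][x + 1] - video[t][y][x - 1]) / 2.0
    i_y = (video[t][y + 1][x] - video[t][y - 1][x]) / 2.0
    i_t = (video[t + 1][y][x] - video[t - 1][y][x]) / 2.0
    return (i_x, i_y, i_t)


# A synthetic intensity ramp I = 2x + 3y + 5t has a constant gradient (2, 3, 5).
ramp = [[[2 * x + 3 * y + 5 * t for x in range(3)] for y in range(3)] for t in range(3)]
print(spatio_temporal_gradient(ramp, 1, 1, 1))
```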
Spatio-temporal motion pattern • The local spatio-temporal motion pattern of each cuboid is represented by a 3D Gaussian of its spatio-temporal gradients.
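Fitting that Gaussian amounts to taking the sample mean and covariance of the gradients inside a cuboid. The sketch below assumes the gradients are already available as a list of (I_x, I_y, I_t) tuples and uses the 1/N (maximum-likelihood) normalization; the function name `fit_motion_pattern` is illustrative.

```python
def fit_motion_pattern(gradients):
    """Return (mean, covariance) of a list of 3D gradient vectors.

    Uses the 1/N normalization, which is an assumption about the estimator.
    """
    n = len(gradients)
    mean = [sum(g[k] for g in gradients) / n for k in range(3)]
    cov = [
        [sum((g[a] - mean[a]) * (g[b] - mean[b]) for g in gradients) / n for b in range(3)]
        for a in range(3)
    ]
    return mean, cov


# Toy cuboid with four gradient vectors.
mean, cov = fit_motion_pattern([(1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0)])
print(mean, cov)
```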
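Training HMM • The hidden states of the HMM are represented by a set of motion patterns. • The probability of an observed motion pattern given a hidden state s is measured using the Kullback–Leibler divergence between the observed pattern and the pattern of s.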
Kullback–Leibler divergence • The Kullback–Leibler divergence is a non-symmetric measure of the difference between two probability distributions P and Q.
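The non-symmetry is easy to see numerically. The sketch below uses the discrete form D_KL(P||Q) = Σ_i P(i) log(P(i)/Q(i)) on two small distributions; the motion patterns in this talk are Gaussians, for which a closed-form expression exists, so the discrete case is only a stand-in chosen for brevity.

```python
import math


def kl_divergence(p, q):
    """Discrete KL divergence D_KL(P || Q) in nats; terms with P(i) = 0 contribute 0."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


p = [0.5, 0.5]
q = [0.9, 0.1]
forward_kl = kl_divergence(p, q)
reverse_kl = kl_divergence(q, p)
print(forward_kl, reverse_kl)  # the two directions give different values
```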
Predictive distribution • After training a collection of HMMs on a video of typical crowd motion, we predict the motion pattern at each space-time location that contains the tracked subject. • where S is the set of hidden states, w(s) is defined by
Vector of scaled messages • Reference: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.
Predicted local spatio-temporal motion pattern • A weighted sum of the 3D Gaussian distributions associated with the HMM’s hidden states.
The centroid we are interested in is a multivariate normal density that minimizes the total distortion. Formally, a centroid c is defined as • Reference: On Divergence Based Clustering of Normal Distributions and Its Application to HMM Adaptation
Predicted motion pattern • where $\mu_s$ and $\Sigma_s$ are the mean and covariance of the hidden state s, respectively.
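One way to collapse a weighted set of Gaussians into a single Gaussian is moment matching: the mean is the weighted sum of the state means, and the covariance is the weighted second moment minus the outer product of the combined mean. The sketch below implements that form. It is an assumption that this is the centroid intended on the slide, since the equation itself is not preserved, and the weights w(s) are taken as given and assumed to sum to 1.

```python
def predict_motion_pattern(weights, means, covs):
    """Moment-matched Gaussian for a weighted set of Gaussians (assumed centroid form).

    weights: w(s) per hidden state, assumed non-negative and summing to 1
    means:   mean vector per hidden state
    covs:    covariance matrix (nested lists) per hidden state
    """
    dim = len(means[0])
    mu = [sum(w * m[k] for w, m in zip(weights, means)) for k in range(dim)]
    # Weighted second moment: sum_s w(s) * (Sigma_s + mu_s mu_s^T)
    second = [
        [sum(w * (c[a][b] + m[a] * m[b]) for w, m, c in zip(weights, means, covs))
         for b in range(dim)]
        for a in range(dim)
    ]
    sigma = [[second[a][b] - mu[a] * mu[b] for b in range(dim)] for a in range(dim)]
    return mu, sigma


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
mu, sigma = predict_motion_pattern(
    [0.5, 0.5], [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)], [identity, identity]
)
print(mu, sigma)
```

In the toy call, two states with opposite means produce a zero combined mean and a covariance that is widened along the first axis, which is how disagreement between states shows up as uncertainty in the prediction.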
We use the gradient information to estimate the optical flow within each specific sub-volume and track the target in a Bayesian framework. • Bayesian tracking can be formulated as maximizing the posterior distribution of the state x_t of the target at time t given the available measurements z_{1:t} = {z_i, i = 1, …, t}: $p(x_t | z_{1:t}) \propto p(z_t | x_t) \int p(x_t | x_{t-1})\, p(x_{t-1} | z_{1:t-1})\, dx_{t-1}$. • z_t is the image at time t, p(x_t | x_{t-1}) is the transition distribution, and p(z_t | x_t) is the likelihood. • We define the state vector x_t as the width, height, and 2D location of the target within the image.
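The predict-then-update structure of Bayesian tracking can be sketched on a small discrete state space, where the integral over the previous state becomes a sum. This is a toy grid filter with hypothetical numbers, not the tracker described in the talk (which works on a continuous state of width, height, and 2D location).

```python
def bayes_filter_step(prior, transition, likelihood):
    """One step of a discrete Bayes filter.

    prior[j]         : p(x_{t-1} = j | z_{1:t-1})
    transition[j][i] : p(x_t = i | x_{t-1} = j)
    likelihood[i]    : p(z_t | x_t = i)
    Returns the normalized posterior p(x_t = i | z_{1:t}).
    """
    n = len(prior)
    predicted = [sum(transition[j][i] * prior[j] for j in range(n)) for i in range(n)]
    unnormalized = [likelihood[i] * predicted[i] for i in range(n)]
    total = sum(unnormalized)
    return [value / total for value in unnormalized]


def map_estimate(posterior):
    """Index of the state that maximizes the posterior."""
    return max(range(len(posterior)), key=posterior.__getitem__)


# Hypothetical example: three positions, target tends to move one cell to the right.
prior = [1.0, 0.0, 0.0]
transition = [[0.2, 0.8, 0.0], [0.0, 0.2, 0.8], [0.0, 0.0, 1.0]]
likelihood = [0.1, 0.7, 0.2]
posterior = bayes_filter_step(prior, transition, likelihood)
print(posterior, map_estimate(posterior))
```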
We focus on the target’s movement between frames and use a 2nd-degree autoregressive model for the transition distribution of the target’s width and height. • Ideally, the state transition distribution p(x_t | x_{t-1}) directly reflects the two-dimensional motion of the target between frames t-1 and t, so we model it as a normal distribution $p(x_t | x_{t-1}) = \mathcal{N}(x_t - x_{t-1}; \omega, \Lambda)$, • where $\omega$ is the 2D optical flow vector and $\Lambda$ is the covariance matrix.
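Optical flow • Assuming the movement to be small, the image constraint at I(x, y, t) can be developed with a Taylor series to get $I(x+\delta x, y+\delta y, t+\delta t) = I(x,y,t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t + \text{H.O.T.}$, where H.O.T. denotes the higher-order terms.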
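The standard next step, written out here for completeness, combines the first-order expansion with the brightness constancy assumption to obtain the optical flow constraint. The shorthand $I_x, I_y, I_t$ for the partial derivatives and $u, v$ for the flow components is notation introduced for this derivation.

```latex
\begin{align}
I(x+\delta x,\, y+\delta y,\, t+\delta t) &= I(x, y, t)
  && \text{(brightness constancy)} \\
\frac{\partial I}{\partial x}\,\delta x + \frac{\partial I}{\partial y}\,\delta y
  + \frac{\partial I}{\partial t}\,\delta t &\approx 0
  && \text{(first-order expansion, H.O.T. dropped)} \\
I_x u + I_y v + I_t &= 0,
  \qquad u = \frac{\delta x}{\delta t},\; v = \frac{\delta y}{\delta t}
  && \text{(divide by } \delta t \text{)}
\end{align}
```

Equivalently, $\nabla I^T [u, v, 1]^T = 0$: the spatio-temporal gradient is orthogonal to the flow direction, which is why gradient statistics carry motion information.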
The predicted motion pattern is defined by a mean gradient vector $\tilde{\mu}$ and a covariance matrix $\tilde{\Sigma}$. • The motion information encoded in the spatio-temporal gradients can be expressed in the form of the structure tensor matrix $\tilde{G} = \tilde{\Sigma} + \tilde{\mu}\tilde{\mu}^T$. • The optical flow can then be estimated from the structure tensor by solving $\tilde{G} w = 0$, • where $w = [u, v, z]^T$ is the 3D optical flow.
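A small sketch of recovering flow from a structure tensor follows. It makes two simplifying assumptions that are not in the slides: the tensor is built directly from gradient samples as their averaged outer product, and instead of an eigen-decomposition the homogeneous system is solved by fixing the third flow component to 1 and solving the remaining 2×2 spatial block, so the returned pair corresponds to (u/z, v/z).

```python
def structure_tensor(gradients):
    """Averaged outer product of 3D gradient vectors (I_x, I_y, I_t)."""
    n = len(gradients)
    return [[sum(g[a] * g[b] for g in gradients) / n for b in range(3)] for a in range(3)]


def flow_from_tensor(tensor):
    """Solve the first two rows of G w = 0 with w = [u, v, 1] (simplifying assumption)."""
    a, b, c = tensor[0][0], tensor[0][1], tensor[1][1]
    det = a * c - b * b
    if abs(det) < 1e-12:
        raise ValueError("spatial gradients are degenerate (aperture problem)")
    rhs_x, rhs_y = -tensor[0][2], -tensor[1][2]
    u = (c * rhs_x - b * rhs_y) / det
    v = (a * rhs_y - b * rhs_x) / det
    return u, v


# Synthetic gradients that satisfy I_x*u + I_y*v + I_t = 0 for a known flow.
true_u, true_v = 1.5, -0.5
spatial = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
gradients = [(ix, iy, -(ix * true_u + iy * true_v)) for ix, iy in spatial]
estimated_u, estimated_v = flow_from_tensor(structure_tensor(gradients))
print(estimated_u, estimated_v)
```

With noisy real gradients the system has no exact solution, and taking the eigenvector of the smallest eigenvalue is the more common choice; the 2×2 solve is used here only to keep the example dependency-free.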
Typical models of the likelihood distribution p(z_t | x_t) take the form $p(z_t | x_t) = \frac{1}{Z}\exp\left(-\frac{d(R, T)}{\sigma}\right)$, • where $\sigma$ is the variance, $d(\cdot)$ is a distance measure, and Z is a normalization term. • The distance measures the difference between a region R (defined by state x_t) of the observed image z_t and the template T. • We assume pedestrians exhibit consistency in their appearance and their motion, and model them in a joint likelihood by $p(z_t | x_t) = p_A(z_t | x_t)\, p_M(z_t | x_t)$, • where p_A and p_M are the appearance and motion likelihoods.
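The joint likelihood can be sketched as a product of two exponential terms, one per cue. The exponential-of-negative-distance form is an assumption about the "typical model" the slide refers to, the normalization term Z is omitted because it is constant across candidate states, and the default variances reuse the 0.05 reported in the experiments.

```python
import math


def cue_likelihood(distance, variance):
    """Unnormalized likelihood exp(-d / variance) for one cue (assumed form)."""
    return math.exp(-distance / variance)


def joint_likelihood(appearance_distance, motion_distance,
                     appearance_variance=0.05, motion_variance=0.05):
    """Product of the appearance and motion likelihoods."""
    return (cue_likelihood(appearance_distance, appearance_variance)
            * cue_likelihood(motion_distance, motion_variance))


# A candidate that matches the templates better in appearance scores higher.
good = joint_likelihood(0.05, 0.1)
poor = joint_likelihood(0.10, 0.1)
print(good, poor)
```

Multiplying the two terms means a candidate must agree with both the appearance and the motion template to score well, which is what separates the target from nearby pedestrians that match on only one cue.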
Update motion template • After tracking in frame t, we update each pixel i in the motion template by $T_t^i = \alpha R_{t|t}^i + (1-\alpha)\, T_{t-1}^i$, • where $T_t$ is the motion template at time t, • $R_{t|t}$ is the region of spatio-temporal gradients defined by the tracking result (i.e., the expected value of the posterior), and • $\alpha$ is the learning rate.
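A per-pixel template update with a learning rate is commonly written as an exponential moving average, and the sketch below assumes that form. Templates and regions are represented as lists of gradient tuples, one per pixel, which is an illustrative layout.

```python
def update_template(template, region, learning_rate=0.05):
    """Blend the tracked region into the template, pixel by pixel.

    Assumed update: new = learning_rate * region + (1 - learning_rate) * template.
    """
    return [
        tuple(learning_rate * r + (1.0 - learning_rate) * t
              for t, r in zip(template_pixel, region_pixel))
        for template_pixel, region_pixel in zip(template, region)
    ]


old_template = [(1.0, 0.0, 0.0)]
tracked_region = [(0.0, 1.0, 0.0)]
print(update_template(old_template, tracked_region, learning_rate=0.5))
```

A small learning rate such as 0.05 keeps the template stable against a single bad frame while still letting it adapt as the pedestrian's motion gradually changes.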
Update the error measurement • The error at pixel i and time t becomes $E_t^i = \alpha \arccos(t_i \cdot r_i) + (1-\alpha)\, E_{t-1}^i$, • where $t_i$ and $r_i$ are the normalized gradient vectors of the motion template and the tracking result at time t. • To reduce the contributions of frequently changing pixels to the computation of the motion likelihood, we weight each pixel in the likelihood’s distance measure by $w_t^i = \frac{1}{Z}\left(\pi - E_{t-1}^i\right)$, • where Z is a normalization term such that $\sum_i w_t^i = 1$.
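Distance measure • The distance measure of the motion likelihood distribution becomes $d_M(R_t, T_{t-1}) = \sum_i w_t^i \arccos(t_i \cdot r_i)$, with per-pixel weights $w_t^i$ and normalized gradient vectors $t_i$ and $r_i$.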
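The per-pixel error, the weights, and the weighted distance can be sketched together. The specific forms used here (angular error via arccos of the dot product, weights proportional to π minus the running error) are assumptions consistent with the description of down-weighting frequently changing pixels; all function names are illustrative, and the gradient vectors are assumed to be unit length.

```python
import math


def angle(t_vec, r_vec):
    """Angle in radians between two unit vectors, clamped for numerical safety."""
    dot = sum(a * b for a, b in zip(t_vec, r_vec))
    return math.acos(max(-1.0, min(1.0, dot)))


def update_errors(errors, template, region, learning_rate=0.05):
    """Running per-pixel angular error (assumed exponential moving average)."""
    return [learning_rate * angle(t, r) + (1.0 - learning_rate) * e
            for e, t, r in zip(errors, template, region)]


def pixel_weights(errors):
    """Weights proportional to (pi - error), normalized to sum to 1.

    Assumes at least one error is below pi so the normalizer is non-zero.
    """
    raw = [math.pi - e for e in errors]
    z = sum(raw)
    return [value / z for value in raw]


def motion_distance(template, region, weights):
    """Weighted sum of per-pixel angular differences."""
    return sum(w * angle(t, r) for w, t, r in zip(weights, template, region))


weights = pixel_weights([0.0, math.pi / 2])  # the stable pixel gets the larger weight
template = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
region = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(weights, motion_distance(template, region, weights))
```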
Experimental results • The training video for the concourse scene contains 300 frames (about 10 seconds of video); • the video for the ticket gate scene contains 350 frames. • We set the cuboid size to 10 × 10 × 10 for both scenes. • The learning rate, appearance variance, and motion variance are all set to 0.05.
Conclusion • In this paper, we derived a novel probabilistic method that exploits the inherent spatially and temporally varying structured pattern of a crowd’s motion to track individuals in extremely crowded scenes.
The end • Thank you