Learning an Attribute Dictionary for Human Action Classification
Presenter: Qiang Qiu
Qiang Qiu, Zhuolin Jiang, and Rama Chellappa, "Sparse Dictionary-based Representation and Recognition of Action Attributes", ICCV 2011
Action Feature Representation
[Figure: shape and motion features; HOG descriptors.]
Action Sparse Representation
• An action feature y is represented as a sparse linear combination of atoms from an action dictionary D: y = Dx, where x is the sparse code.
[Figure: two example decompositions with sparse codes (0.43, 0.63, 0, 0, −0.33, 0, −0.36, 0, 0) and (0, 0, 0.64, 0.53, −0.40, 0.35, 0, 0, 0); each signal is a weighted sum of only four atoms.]
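To make y = Dx concrete, here is a tiny NumPy sketch using the first sparse code above; the dictionary contents and the 32-dimensional atom size are stand-in assumptions, since the slide only shows the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((32, 9))   # stand-in dictionary: 9 atoms of dimension 32 (assumed)
x = np.array([0.43, 0.63, 0, 0, -0.33, 0, -0.36, 0, 0])  # sparse code from the slide
y = D @ x                          # y = 0.43*d1 + 0.63*d2 - 0.33*d5 - 0.36*d7
print(np.count_nonzero(x), "nonzero coefficients out of", x.size)
```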
K-SVD [1]
• Input: signals Y, dictionary size, sparsity T
• Output: dictionary D, sparse codes X
• Y = DX: each column y_i of Y is an input signal, each column x_i of X its sparse code, and the columns d_1, d_2, d_3, … of D are the atoms
• Objective: arg min_{D,X} ||Y − DX||² s.t. ∀i, ||x_i||_0 ≤ T
[1] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation", IEEE Trans. on Signal Processing, 2006
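Below is a minimal NumPy sketch of the K-SVD alternation (OMP sparse coding, then rank-1 SVD atom updates). It illustrates the objective above; it is not the authors' or Aharon et al.'s implementation, and `omp`/`ksvd` are hypothetical helper names.

```python
import numpy as np

def omp(D, y, T):
    """Orthogonal matching pursuit: greedily pick at most T atoms for y."""
    residual, idx = y.copy(), []
    for _ in range(T):
        # Atom most correlated with the current residual.
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        # Least-squares coefficients over the selected atoms.
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    x = np.zeros(D.shape[1])
    x[idx] = coef
    return x

def ksvd(Y, k, T, iters=10):
    """arg min ||Y - DX||^2 s.t. ||x_i||_0 <= T, by alternating updates."""
    rng = np.random.default_rng(0)
    # Initialize atoms from random signals, normalized to unit length.
    D = Y[:, rng.choice(Y.shape[1], size=k, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0)
    for _ in range(iters):
        # Sparse coding stage: one OMP problem per signal (column of Y).
        X = np.column_stack([omp(D, y, T) for y in Y.T])
        # Dictionary update stage: refit each atom with a rank-1 SVD.
        for j in range(k):
            users = np.nonzero(X[j])[0]        # signals that use atom j
            if users.size == 0:
                continue
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, j], X[j, users] = U[:, 0], s[0] * Vt[0]
    return D, X
```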
Objective: Learn a compact and discriminative dictionary.
Probabilistic Model for Sparse Representation
• A Gaussian Process
• Dictionary Class Distribution
More Views of Sparse Representation
[Figure: the sparse-code matrix X for input signals y1…y4, which carry class labels l1 and l2. Each column x_i of X is the sparse code of signal y_i; each row x_di is the profile of atom d_i, recording how every signal uses that atom.]
A Gaussian Process
[Figure: atoms d1…d5 paired with their profiles x_d1…x_d5 over signals y1…y4 (labels l1, l2); e.g. x_d5 = (−0.33, −0.40, 0, −0.42).]
• A Gaussian Process over atom profiles
• Covariance function entry: K(i, j) = cov(x_di, x_dj)
• P(x_d* | x_D*) is a Gaussian with a closed-form conditional variance
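A small sketch of both points, assuming the covariance is estimated empirically from the rows of the sparse-code matrix X (the atom profiles); `atom_covariance` and `conditional_entropy` are hypothetical helper names.

```python
import numpy as np

def atom_covariance(X):
    """K[i, j] = cov(x_di, x_dj): covariance between the rows of X
    (each row is one atom's profile across all signals)."""
    return np.cov(X)

def conditional_entropy(K, d, S):
    """H(x_d | x_S) for a Gaussian process, via the closed-form
    conditional variance var = K[d,d] - K[d,S] K[S,S]^{-1} K[S,d]."""
    S = list(S)
    if not S:
        var = K[d, d]
    else:
        var = K[d, d] - K[d, S] @ np.linalg.solve(K[np.ix_(S, S)], K[S, d])
    # Differential entropy of a 1-D Gaussian; guard against tiny variances.
    return 0.5 * np.log(2 * np.pi * np.e * max(float(var), 1e-12))
```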
Dictionary Class Distribution
[Figure: same atom/signal view; signals y1, y2 belong to class l1 and y3, y4 to class l2; x_d5 = (−0.33, −0.40, 0, −0.42).]
• P(L | d_i), L ∈ [1, M]
• Aggregate |x_di| based on class labels to obtain an M-dimensional vector
• P(L = l1 | d5) = (0.33 + 0.40) / (0.33 + 0.40 + 0.42) ≈ 0.63
• P(L = l2 | d5) = (0 + 0.42) / (0.33 + 0.40 + 0.42) ≈ 0.37
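A sketch of this aggregation, reproducing the d5 example from the slide; `class_distribution` is a hypothetical helper, with class labels encoded as integers 0…M−1.

```python
import numpy as np

def class_distribution(X, labels, M):
    """P(L | d_i): aggregate |x_di| per class label, then normalize rows."""
    P = np.zeros((X.shape[0], M))
    for l in range(M):
        P[:, l] = np.abs(X[:, labels == l]).sum(axis=1)
    return P / P.sum(axis=1, keepdims=True)

# Slide example: x_d5 = (-0.33, -0.40, 0, -0.42), y1/y2 in class l1,
# y3/y4 in class l2 -> P(l1|d5) = 0.73/1.15 ~ 0.63, P(l2|d5) ~ 0.37.
X = np.array([[-0.33, -0.40, 0.0, -0.42]])
print(class_distribution(X, np.array([0, 0, 1, 1]), 2))  # [[0.6348 0.3652]]
```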
Dictionary Learning Approaches
• Maximization of Joint Entropy (ME)
• Maximization of Mutual Information (MMI)
  • Unsupervised Learning (MMI-1)
  • Supervised Learning (MMI-2)
Maximization of Joint Entropy (ME)
• Initialize the dictionary Do using K-SVD
• Start with D* = ∅
• Until |D*| = k, iteratively choose d* from Do \ D*:
  d* = arg max_d H(d | D*)
• The resulting ME dictionary is a good approximation to the ME criterion arg max_D H(D)
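A sketch of the ME greedy loop, reusing `conditional_entropy` from the Gaussian Process sketch above.

```python
def me_select(K, k):
    """Greedy ME: repeatedly add the atom with maximum H(d | D*)."""
    selected, remaining = [], set(range(K.shape[0]))
    while len(selected) < k:
        d_star = max(remaining,
                     key=lambda d: conditional_entropy(K, d, selected))
        selected.append(d_star)
        remaining.remove(d_star)
    return selected
```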
Maximization of Mutual Information for Unsupervised Learning (MMI-1)
• Initialize the dictionary Do using K-SVD
• Start with D* = ∅
• Until |D*| = k, iteratively choose d* from Do \ D*:
  d* = arg max_d H(d | D*) − H(d | Do \ (D* ∪ d))
  (the two terms trade off coverage of the unselected atoms against diversity from the selected ones)
• Both conditional entropies have closed forms under the Gaussian Process model
• The resulting MMI dictionary is a near-optimal approximation to the MMI criterion arg max_D I(D; Do \ D), within (1 − 1/e) of the optimum
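The same greedy loop with the MMI-1 gain; again a sketch reusing `conditional_entropy`, not the paper's code.

```python
def mmi1_select(K, k):
    """Greedy MMI-1 selection: coverage minus diversity gain."""
    selected, remaining = [], set(range(K.shape[0]))
    while len(selected) < k:
        def gain(d):
            rest = list(remaining - {d})   # Do \ (D* ∪ {d})
            return (conditional_entropy(K, d, selected)
                    - conditional_entropy(K, d, rest))
        d_star = max(remaining, key=gain)
        selected.append(d_star)
        remaining.remove(d_star)
    return selected
```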
Revisit: Dictionary Class Distribution
[Figure: same atom/signal view as before; x_d5 = (−0.33, −0.40, 0, −0.42).]
• P(L | d_i), L ∈ [1, M]: aggregate |x_di| based on class labels to obtain an M-dimensional vector
• P(l1 | d5) = (0.33 + 0.40) / (0.33 + 0.40 + 0.42) ≈ 0.63
• P(l2 | d5) = (0 + 0.42) / (0.33 + 0.40 + 0.42) ≈ 0.37
• Notation: P(L_d) = P(L | d) and P(L_D) = P(L | D)
Maximization of Mutual Information for Supervised Learning (MMI-2)
• Initialize the dictionary Do using K-SVD
• Start with D* = ∅
• Until |D*| = k, iteratively choose d* from Do \ D*:
  d* = arg max_d [H(d | D*) − H(d | Do \ (D* ∪ d))] + λ[H(L_d | L_{D*}) − H(L_d | L_{Do \ (D* ∪ d)})]
• MMI-1 is a special case of MMI-2 with λ = 0
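A sketch of the MMI-2 loop. Because the slides do not spell out how H(L_d | L_S) is computed, the label term is taken as a caller-supplied function here; `label_entropy` is a placeholder, not the paper's API.

```python
def mmi2_select(K, k, label_entropy, lam=1.0):
    """Greedy MMI-2: appearance gain plus lam * label gain."""
    selected, remaining = [], set(range(K.shape[0]))
    while len(selected) < k:
        def gain(d):
            rest = list(remaining - {d})   # Do \ (D* ∪ {d})
            app = (conditional_entropy(K, d, selected)
                   - conditional_entropy(K, d, rest))
            lab = label_entropy(d, selected) - label_entropy(d, rest)
            return app + lam * lab
        d_star = max(remaining, key=gain)
        selected.append(d_star)
        remaining.remove(d_star)
    return selected
```

Passing lam = 0 makes the label term vanish, recovering MMI-1 exactly as the slide notes.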
Representation Consistency [1]
[Figure: representation-consistency comparison.]
[1] J. Liu and M. Shah, "Learning Human Actions via Information Maximization", CVPR 2008
Recognition Accuracy
[Figure: recognition accuracy using the initial dictionary Do: (a) 0.23, (b) 0.42, (c) 0.71.]
Recognizing Realistic Actions
• 150 broadcast sports videos
• 10 different actions
• Average recognition rate: 83.6%
• Best reported result: 86.6%