Analyzing Human Actions as Space-Time Shapes

Actions as Space-Time Shapes Benny Yonovich, Leon Ribinik

“Actions as Space-Time Shapes” Goal • Recognize, detect and cluster human actions. Approach • Represent actions as space-time shapes.

Motivation • Limitations of current methods: • Optical flow estimation is difficult. • Periodicity analysis is limited to cyclic actions. • Treating a video sequence as a space-time volume is useful for analyzing actions. • Silhouettes contain detailed information about the shape of objects.

Space-Time Shapes • Induced by a concatenation of 2D silhouettes in the space-time volume. • Contain both spatial and dynamic information.

Concept Generalization of a method developed for the analysis of 2D shapes to deal with volumetric space-time shapes induced by human actions.

Algorithm Overview • Input: video sequence. • Step 1: Extract the 2D silhouettes and build the space-time volume. • Step 2: Calculate a shape descriptor by solving a Poisson equation. • Step 3: Use the solution to extract space-time shape features and a global features measure. • Step 4: Classify, cluster and detect actions using the global features measure.

Extract the 2D silhouettes and build the space-time volume • Segmenting a moving person is simpler in video than in a single image. • Method: background subtraction.
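The silhouette step can be sketched as follows. This is a minimal illustration, not the presenters' code: it assumes grayscale frames stored as a NumPy array ordered (t, y, x), a per-pixel median background, and a hypothetical fixed `threshold`.

```python
import numpy as np

def build_space_time_volume(frames, threshold=30.0):
    """Stack per-frame silhouettes into a boolean volume of shape (T, H, W).

    frames: grayscale video as an array of shape (T, H, W).
    threshold: hypothetical intensity difference that marks foreground.
    """
    frames = np.asarray(frames, dtype=np.float64)
    background = np.median(frames, axis=0)   # per-pixel median over time
    diff = np.abs(frames - background)       # background subtraction
    return diff > threshold                  # True inside each 2D silhouette

# Synthetic example: a bright 4x4 square moving right over a dark background.
T, H, W = 9, 16, 32
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, 6:10, 2 + 3 * t : 6 + 3 * t] = 200.0
volume = build_space_time_volume(frames)
```

Each slice `volume[t]` is one 2D silhouette, and the stacked array is the space-time shape used by the later steps.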

Calculate shape descriptor • First approach: Medial axis distance transform. • Assign each internal pixel a value reflecting its minimum distance to the boundary contour. • Does not reflect global properties of a silhouette. • Article approach: Shape representation using the Poisson equation. • A measure that “senses” the boundaries and assigns each pixel a value reflecting its relative position.

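Poisson equation • Partial differential equation with broad utility in electrostatics, mechanical engineering and theoretical physics. • General form: $\Delta \varphi = f$, where $\Delta$ is the Laplace operator. • In Euclidean space the Laplace operator is also denoted by $\nabla^2$, so the equation reads $\nabla^2 \varphi = f$. • In three-dimensional Cartesian coordinates, the equation takes the form: $\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)\varphi(x,y,z) = f(x,y,z)$.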

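Shape representation using the Poisson equation [1] • Compute: $\Delta U(x,y,t) = -1$ for $(x,y,t) \in S$, subject to $U(x,y,t) = 0$ on the bounding surface $\partial S$. • Laplacian: $\Delta U = U_{xx} + U_{yy} + U_{tt}$. • Artificial boundary condition (Neumann) at the first and last frames: $U_t = 0$. • Random walk: $U$ at a point is the mean time a random walk starting there needs to hit the boundary. • Solution method: geometric multigrid solver. Figure: an action and its space-time shape S.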
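A small numerical sketch of this step follows. It swaps the geometric multigrid solver named on the slide for plain Jacobi iterations, which are slower but fit in a few lines. It assumes unit grid spacing, the equation $\Delta U = -1$ inside the shape with $U = 0$ outside it, and a zero temporal derivative at the first and last frames; the function name and `n_iter` are hypothetical.

```python
import numpy as np

def solve_poisson(mask, n_iter=500):
    """Jacobi iterations for Laplacian(U) = -1 inside a boolean mask (T, H, W)."""
    U = np.zeros(mask.shape, dtype=np.float64)
    for _ in range(n_iter):
        # Replicate edge frames in t (Neumann, U_t = 0); pad zeros in y and x.
        P = np.pad(U, ((1, 1), (0, 0), (0, 0)), mode="edge")
        P = np.pad(P, ((0, 0), (1, 1), (1, 1)), mode="constant")
        neighbours = (P[:-2, 1:-1, 1:-1] + P[2:, 1:-1, 1:-1] +
                      P[1:-1, :-2, 1:-1] + P[1:-1, 2:, 1:-1] +
                      P[1:-1, 1:-1, :-2] + P[1:-1, 1:-1, 2:])
        # Discrete Poisson update inside the shape, Dirichlet zero outside it.
        U = np.where(mask, (neighbours + 1.0) / 6.0, 0.0)
    return U

# Example: a static disk of radius 6 repeated over 5 frames.
yy, xx = np.mgrid[0:17, 0:17]
disk = (yy - 8) ** 2 + (xx - 8) ** 2 <= 36
mask = np.repeat(disk[None, :, :], 5, axis=0)
U = solve_poisson(mask)
```

For this static shape the solution is constant in time and peaks at the disk center, with values falling to zero at the silhouette boundary.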

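Let’s get some intuition – 2D Poisson equation • Consider a shape bounded by a conic: $P(x,y) = ax^2 + by^2 + cxy + dx + ey + f = 0$. • Poisson equation solution: $U(x,y) = -\frac{P(x,y)}{2(a+b)}$. • Special case – circle of radius $R$: $U(x,y) = \frac{R^2 - x^2 - y^2}{4}$. • Boundary: $U = 0$ on $x^2 + y^2 = R^2$. • Maximum point – center: $U(0,0) = R^2/4$. • Monotonic decreasing: $U$ falls monotonically with distance from the center.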
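The circle case can be checked by hand. The derivation below assumes a circle of radius $R$ centered at the origin and the equation $\Delta U = -1$ with $U = 0$ on the boundary; the symbols $R$ and $r$ are introduced here for illustration.

```latex
\begin{aligned}
U(x,y) &= \frac{R^2 - x^2 - y^2}{4}, \\
\Delta U &= U_{xx} + U_{yy} = -\tfrac{1}{2} - \tfrac{1}{2} = -1, \\
U\big|_{x^2+y^2=R^2} &= \frac{R^2 - R^2}{4} = 0, \\
U(r) &= \frac{R^2 - r^2}{4}, \qquad \frac{dU}{dr} = -\frac{r}{2} \le 0, \qquad \max U = U(0) = \frac{R^2}{4}.
\end{aligned}
```

So the solution vanishes on the boundary, decreases monotonically with the distance $r$ from the center, and peaks at the center.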

Shape representation using the Poisson equation [2] • High values of U are attained in the central part of the shape.

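Extract space-time shape features [1] • Space-Time Saliency • Distinguish between different human parts. • Emphasize torso: $\Phi = U + \frac{3}{2}\|\nabla U\|^2$, where: $\nabla U = (U_x, U_y, U_t)$. • Emphasize fast moving parts: $w_\Phi(x,y,t) = 1 - \frac{\log(1 + \Phi(x,y,t))}{\max_{(x,y,t) \in S} \log(1 + \Phi(x,y,t))}$.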

Extract space-time shape features [2] Emphasize fast moving parts:
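The saliency features can be sketched numerically. The block assumes the definitions from Gorelick et al., $\Phi = U + \frac{3}{2}\|\nabla U\|^2$ and $w_\Phi = 1 - \log(1+\Phi)/\max \log(1+\Phi)$, with arrays ordered (t, y, x) and unit grid spacing. The function name is hypothetical, and the test shape is an analytic ball rather than a real action.

```python
import numpy as np

def saliency_features(U, mask):
    """Return (phi, w_phi) for a Poisson solution U inside a boolean mask."""
    Ut, Uy, Ux = np.gradient(U)                       # axes ordered (t, y, x)
    phi = U + 1.5 * (Ux ** 2 + Uy ** 2 + Ut ** 2)     # assumed definition of Phi
    log_phi = np.log1p(phi)
    w_phi = 1.0 - log_phi / log_phi[mask].max()       # normalize over the shape
    return phi, np.where(mask, w_phi, 0.0)

# Analytic ball of radius R: U = (R^2 - r^2) / 6 satisfies Laplacian(U) = -1.
t, y, x = np.mgrid[0:21, 0:21, 0:21]
r2 = (t - 10) ** 2 + (y - 10) ** 2 + (x - 10) ** 2
R = 8.0
ball = r2 < R ** 2
U_ball = np.where(ball, (R ** 2 - r2) / 6.0, 0.0)
phi, w_phi = saliency_features(U_ball, ball)
```

For the ball, which has no protruding or moving parts, `phi` is constant away from the boundary, so no region is singled out as more salient than another.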

Extract space-time shape features [3] • Space-Time Orientations • Estimate the local orientation and aspect ratio of different space-time parts. • Use the 3x3 Hessian H of U. • Hessian matrix - square matrix of second-order partial derivatives of a function

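Extract space-time shape features [4] • Let $\lambda_1 \ge \lambda_2 \ge \lambda_3$ be the eigenvalues of H. • The first principal eigenvector corresponds to the shortest direction. • The third principal eigenvector corresponds to the elongated direction. • $\lambda_1 \approx \lambda_2 \gg \lambda_3$ - “stick” structure. • $\lambda_1 \gg \lambda_2 \approx \lambda_3$ - “plate” structure. • $\lambda_1 \approx \lambda_2 \approx \lambda_3$ - “ball” structure.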

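Extract space-time shape features [5] • “Plateness”: $S_{pl} = e^{-\alpha \lambda_2 / \lambda_1}$ • “Stickness”: $S_{st} = (1 - S_{pl})\, e^{-\alpha \lambda_3 / \lambda_2}$ • “Ballness” – redundant (determined by the other two). • Deviation of dominant eigenvector $v$ from principal axes: $D_j(v)$, with $j \in \{1,2,3\}$ for the x, y and t axes. • Orientation local features: $w_{i,j}(x,y,t) = S_i(x,y,t) \cdot D_j(v)$, with $i \in \{pl, st\}$.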
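The local structure measures can be sketched as below. The block assumes the definitions from Gorelick et al., $S_{pl} = e^{-\alpha\lambda_2/\lambda_1}$ and $S_{st} = (1 - S_{pl})e^{-\alpha\lambda_3/\lambda_2}$. Sorting eigenvalues by absolute value, the default `alpha=3.0`, the `eps` guard and the function name are assumptions made for this illustration.

```python
import numpy as np

def plateness_stickness(U, alpha=3.0, eps=1e-12):
    """Per-voxel plateness and stickness from the 3x3 Hessian of U."""
    first = np.gradient(U)                        # first derivatives along (t, y, x)
    H = np.empty(U.shape + (3, 3))
    for i in range(3):
        second = np.gradient(first[i])            # second derivatives
        for j in range(3):
            H[..., i, j] = second[j]
    H = 0.5 * (H + np.swapaxes(H, -1, -2))        # symmetrize the finite differences
    # Eigenvalue magnitudes sorted so that lam[..., 0] >= lam[..., 1] >= lam[..., 2].
    lam = np.sort(np.abs(np.linalg.eigvalsh(H)), axis=-1)[..., ::-1]
    s_pl = np.exp(-alpha * lam[..., 1] / (lam[..., 0] + eps))
    s_st = (1.0 - s_pl) * np.exp(-alpha * lam[..., 2] / (lam[..., 1] + eps))
    return s_pl, s_st

# Toy fields: curvature along one axis (plate) or along two axes (stick).
t, y, x = np.mgrid[0:9, 0:9, 0:9].astype(float)
U_plate = -0.5 * (x - 4) ** 2
U_stick = -0.5 * ((x - 4) ** 2 + (y - 4) ** 2)
pl_of_plate, st_of_plate = plateness_stickness(U_plate)
pl_of_stick, st_of_stick = plateness_stickness(U_stick)
```

Away from the array edges, the plate field gives plateness close to 1, while the stick field, which is elongated along t, gives stickness close to $1 - e^{-\alpha}$.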

Extract space-time shape features [6] • Global Features: in order to represent an action with global features, a weighted moments measure is used: $m_{pqr} = \int\!\!\int\!\!\int w(x,y,t)\, g(x,y,t)\, x^p y^q t^r \, dx\, dy\, dt$, where: • g(x,y,t) – characteristic function of the space-time shape. • w(x,y,t) – one of the seven possible weighting functions.
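A discrete version of the weighted moments might look like the sketch below. It assumes moments of the form $m_{pqr} = \sum w \, g \, x^p y^q t^r$ with coordinates measured from the shape centroid. The cutoff `p + q + r <= max_order` and the function name are hypothetical choices, not taken from the slides.

```python
import numpy as np
from itertools import product

def weighted_moments(g, w, max_order=2):
    """Weighted moments of a space-time shape.

    g: characteristic function (1 inside the shape, 0 outside), shape (T, H, W).
    w: one weighting function sampled on the same grid.
    """
    t, y, x = np.indices(g.shape).astype(float)
    mass = g.sum()
    xc, yc, tc = (x * g).sum() / mass, (y * g).sum() / mass, (t * g).sum() / mass
    wg = w * g
    moments = {}
    for p, q, r in product(range(max_order + 1), repeat=3):
        if p + q + r <= max_order:                # hypothetical order cutoff
            term = wg * (x - xc) ** p * (y - yc) ** q * (t - tc) ** r
            moments[(p, q, r)] = float(term.sum())
    return moments

# Example: a box-shaped "action" with a uniform weighting function.
g = np.zeros((8, 10, 12))
g[2:6, 3:7, 4:10] = 1.0
w = np.ones_like(g)
m = weighted_moments(g, w)
```

Concatenating the moments obtained for each of the seven weighting functions gives one global feature vector per space-time cube.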

Results and Experiments [1] • Action classification and clustering: • 90 low-resolution (180x144, deinterlaced 50 fps) video sequences showing 9 different people, each performing 10 natural actions (“run”, “walk”, “jumping-jack” and more). • Silhouettes were obtained by subtracting the median background from each of the sequences. • The Poisson equation was solved and the seven features were computed.

Results and Experiments [2] • Action classification and clustering: • A sliding window in time extracts 8-frame space-time cubes, with an overlap of 4 frames between consecutive cubes. • Each space-time cube is centered around its space-time centroid. • The procedure does not involve any global video alignment! • A global features measure vector is computed from the weighted moments of each cube.
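The windowing described above can be sketched as follows, assuming the space-time volume is a boolean array ordered (t, y, x). The function names are hypothetical.

```python
import numpy as np

def extract_cubes(volume, length=8, overlap=4):
    """Slide a window along t: 8-frame cubes with 4 frames of overlap."""
    step = length - overlap
    starts = range(0, volume.shape[0] - length + 1, step)
    return [volume[s:s + length] for s in starts]

def centroid(cube):
    """Space-time centroid (t, y, x) of the voxels inside the shape."""
    t, y, x = np.nonzero(cube)
    return t.mean(), y.mean(), x.mean()

# Example: a 20-frame volume yields cubes starting at frames 0, 4, 8 and 12.
volume = np.zeros((20, 16, 16), dtype=bool)
volume[:, 4:12, 6:10] = True
cubes = extract_cubes(volume)
```

Measuring coordinates relative to `centroid(cube)` is what removes the need for global video alignment.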

Results and Experiments [3]

Action Classification • Leave-one-out procedure: remove the entire sequence from the database, keep the other actions of the same person. • Compare each cube of the removed sequence to all the cubes in the database. • Classify using the nearest neighbor procedure on the global features measure (Euclidean distance). • Results: the algorithm misclassified 20 out of 923 space-time cubes (2.17% error)!
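The leave-one-out protocol can be sketched as below, assuming one feature vector per cube plus an action label and a sequence id for each cube. The function name and the toy data are hypothetical and do not reproduce the reported experiment.

```python
import numpy as np

def leave_one_sequence_out(features, labels, seq_ids):
    """Nearest-neighbor error rate when each cube's own sequence is held out."""
    errors = 0
    for i in range(len(features)):
        keep = seq_ids != seq_ids[i]              # drop the whole sequence of cube i
        dists = np.linalg.norm(features[keep] - features[i], axis=1)
        predicted = labels[keep][np.argmin(dists)]
        errors += int(predicted != labels[i])
    return errors / len(features)

# Toy data: two actions, two sequences per action, two cubes per sequence.
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])
labels = np.array(["walk", "walk", "walk", "walk", "run", "run", "run", "run"])
seq_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3])
error_rate = leave_one_sequence_out(features, labels, seq_ids)
```

Holding out the whole sequence matters because overlapping cubes from the same sequence are nearly identical and would make the nearest-neighbor test trivial.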

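Action Clustering • Spectral Clustering. • A common spectral clustering algorithm was applied to 90 unlabeled action sequences, representing 10 different actions. • Distance between two sequences is a variant of the Median Hausdorff Distance: $D_H(s_1, s_2) = \mathrm{median}_j \left( \min_i \| c_i^1 - c_j^2 \| \right) + \mathrm{median}_i \left( \min_j \| c_i^1 - c_j^2 \| \right)$, where $\{c_i^1\}$ and $\{c_j^2\}$ are the space-time cubes of sequences $s_1$ and $s_2$. • Results: 4 out of 90 sequences were misclustered (4.4% error).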
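A sequence-to-sequence distance of this kind can be sketched as below. The block assumes the variant is the sum of the two directed median-of-minimum distances between the cube feature vectors of the two sequences. The function name is hypothetical.

```python
import numpy as np

def median_hausdorff(A, B):
    """Symmetric median Hausdorff distance between two sets of feature vectors.

    A: array (n, d) of cube features from one sequence.
    B: array (m, d) of cube features from the other sequence.
    """
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)   # (n, m) pairwise
    a_to_b = np.median(d.min(axis=1))   # each cube of A to its nearest cube in B
    b_to_a = np.median(d.min(axis=0))   # each cube of B to its nearest cube in A
    return a_to_b + b_to_a

# Example: two short "sequences" shifted by 3 units along the first feature.
A = np.array([[0.0, 0.0], [1.0, 0.0]])
B = A + np.array([3.0, 0.0])
dist = median_hausdorff(A, B)
```

Using the median instead of the maximum makes the distance less sensitive to a few outlying cubes. The resulting pairwise distances can be turned into an affinity matrix for spectral clustering.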

Robustness [1] • 10 test video sequences of people walking in various difficult scenarios. • 10 additional sequences, each showing the “walk” action captured from a different viewpoint. • Measured the Median Hausdorff Distance between each sequence and each action type. • Classified each sequence as the action type with the smallest distance.

Robustness [2] • Results: • First-group sequences were classified correctly as the “walk” action, with a relatively large difference between the first and second choices. • Second-group sequences were classified correctly for viewpoints between 0 and 54 degrees, with a relatively large difference. • For larger viewpoint angles, a gradual deterioration occurs.

Action Detection [1] Ballet movie. Let’s find all the places with the male dancer performing a “cabriole pa”! Simple Euclidean distances threshold.

Action Detection [2] 111Kbps, wmv format 192x144x750 ballet movie Query:

Bibliography • L. Gorelick, M. Galun, E. Sharon, A. Brandt, and R. Basri, “Shape Representation and Classification Using the Poisson Equation”. • A. Ng, M. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an Algorithm”. • Lena Gorelick’s website and materials (http://www.wisdom.weizmann.ac.il/~yelenag). • Wikipedia.
