Recognizing Movement using Digital Image Processing. 1999. 4/22 Lee, Jeong-Cheol. Recognizing Movement using Motion Histograms. James W. Davis MIT Media Laboratory, 20 Ames Street Cambridge, MA 02139 M.I.T Media Lab. Perceptual Computing Section Technical Report No. 487, April, 1998.
Recognizing Movement using Digital Image Processing 1999. 4/22 Lee, Jeong-Cheol
Recognizing Movement using Motion Histograms James W. Davis MIT Media Laboratory, 20 Ames Street Cambridge, MA 02139 M.I.T Media Lab. Perceptual Computing Section Technical Report No. 487, April, 1998
About Author
• PhD candidate in Media Arts and Sciences
• Research interests
  • Representing and recognizing human and animal movement, natural computation, human-computer interaction, gesture recognition, and motion tracking
• Publications
  • Appearance-based motion recognition of human action, 1996
  • The representation and recognition of movement using temporal templates, 1997
  • A robust human-silhouette extraction technique for interactive virtual environments, 1998
IEFAL Seminar
Contents
• Introduction
• Motion histograms
  • Motion history images
  • Gradient of motion
  • Histogram hierarchy
• Recognition
  • Movement model
  • Matching
  • Variable length movements
• Extensions for occlusions and distractor motions
• Conclusion
Introduction
• Giving computers the ability to see is not an easy task.
• Computer vision aims to recognize the 3D world from 2D images that carry 3D information about the original scene.
• What images to use?
• How to recognize the images?
• This paper presents a real-time computer vision approach to recognizing human movement.
Binary Silhouette Image (1/4)
• Motion between frames is generated by differencing successive binary silhouette images of the person.
• Optical flow methods give the magnitude and direction of the motion, but they are still too brittle for real imagery of people moving (due to noise, shadows, textures, and rate of movement) and are generally computationally taxing.
• Clothing texture frequently signals unwanted motion, which can cause problems when using motion for recognition.
• Image differencing remains a fairly robust and cheap way to locate the presence of motion, but it cannot recover the magnitude or direction of the motion.
• The accumulation of image differences can yield directional motion information.
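The differencing step on this slide can be sketched in a few lines. This is only an illustration, assuming silhouettes are stored as nested lists of 0/1 values; the function name `silhouette_difference` is hypothetical and not taken from the paper.

```python
def silhouette_difference(prev, curr):
    """Return a binary motion mask: 1 where two silhouettes disagree, else 0."""
    return [
        [1 if p != c else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


# Toy example: a 2-pixel-wide silhouette shifts one pixel to the right.
prev_frame = [[0, 1, 1, 0],
              [0, 1, 1, 0]]
curr_frame = [[0, 0, 1, 1],
              [0, 0, 1, 1]]
motion_mask = silhouette_difference(prev_frame, curr_frame)
print(motion_mask)  # [[0, 1, 0, 1], [0, 1, 0, 1]]
```

The mask marks where motion occurred (the trailing and leading edges) but says nothing about its direction, which is why the later slides accumulate differences over time.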
Binary Silhouette Image (2/4)
Fig. 1. Conceptual drawing of blocking (eclipsing) infrared light from the camera to generate a silhouette of the person
Binary Silhouette Image (3/4)
Binary Silhouette Image (4/4)
Fig. 2. Image processing
(a) Reference image
(b) Input image
(c) Binarized difference
(d) Image morphology and region growing result
Motion Energy Image (MEI)
• Cumulative binary motion image
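One way to read "cumulative binary motion image" is a pixel-wise OR over a sequence of binary motion masks. The sketch below assumes that reading; the function name and the nested-list image format are illustrative choices, not the paper's.

```python
def motion_energy_image(motion_masks):
    """Pixel-wise OR of a sequence of equally sized binary motion masks."""
    rows, cols = len(motion_masks[0]), len(motion_masks[0][0])
    mei = [[0] * cols for _ in range(rows)]
    for mask in motion_masks:
        for y in range(rows):
            for x in range(cols):
                if mask[y][x]:
                    mei[y][x] = 1  # once a pixel has moved, it stays set
    return mei


masks = [
    [[1, 0, 0, 0]],
    [[0, 1, 0, 0]],
    [[0, 0, 1, 0]],
]
print(motion_energy_image(masks))  # [[1, 1, 1, 0]]
```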
Motion History Image (1/3)
Motion History Image (2/3)
Motion History Image (3/3)
• Update rule:
$$\mathrm{MHI}(x, y) = \begin{cases} T & \text{if current motion at } (x, y) \\ 0 & \text{else if } \mathrm{MHI}(x, y) < (T - d) \end{cases}$$
  T : current time-stamp
  d : maximum time duration constant
• By linearly normalizing the MHI time-stamps to values between 0 and 255, we can see that the more recently moving pixels are brighter than pixels belonging to older motion.
• It can be said that the MHI "visually encodes" some motion information from the silhouette boundary.
• We see the direction of movement clearly, but the magnitude is not as accessible.
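The MHI update rule can be sketched directly from its two cases. The code below is a minimal illustration, assuming integer time-stamps that start at 1 (so that 0 can mean "no motion") and nested-list images; `update_mhi` and `normalize_mhi` are hypothetical names, and the integer scaling used for display is just one possible linear normalization.

```python
def update_mhi(mhi, motion, timestamp, duration):
    """Apply the MHI rule in place for one frame and return the image."""
    for y, row in enumerate(motion):
        for x, moving in enumerate(row):
            if moving:
                mhi[y][x] = timestamp            # stamp current motion with T
            elif mhi[y][x] < timestamp - duration:
                mhi[y][x] = 0                    # forget motion older than T - d
    return mhi


def normalize_mhi(mhi, timestamp, duration):
    """Linearly map time-stamps in [T - d, T] to 0..255 (assumed display scaling)."""
    oldest = timestamp - duration
    return [
        [max(0, 255 * (v - oldest) // duration) if v > 0 else 0 for v in row]
        for row in mhi
    ]


# A single moving pixel sweeps left to right over three frames, then stops.
mhi = [[0, 0, 0]]
frames = [[[1, 0, 0]], [[0, 1, 0]], [[0, 0, 1]], [[0, 0, 0]]]
for t, motion in enumerate(frames, start=1):
    update_mhi(mhi, motion, t, duration=2)

print(mhi)                        # [[0, 2, 3]]
print(normalize_mhi(mhi, 4, 2))   # [[0, 0, 127]]
```

After the fourth frame the oldest stamp has expired, and the remaining values increase toward the most recent motion, which is the brightness ramp the slide describes.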
Gradient of Motion (1/2)
• The local gradient orientations of the MHI directly show the direction of the silhouette boundary movement.
• Therefore, we can convolve classic gradient masks with the MHI to extract the directional motion information.
• Sobel gradient masks:
$$F_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad F_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}, \qquad \theta = \arctan(F_y / F_x)$$
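The gradient step can be illustrated on a tiny MHI. This sketch makes two assumptions beyond the slide: the masks are applied as a sliding-window correlation (the common image-processing convention), and `math.atan2` replaces the plain arctangent so that the orientation covers the full 0 to 360 degree range. Function names are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def apply_mask(image, mask, y, x):
    """Weighted sum of the 3x3 neighbourhood centred on the interior pixel (y, x)."""
    return sum(
        mask[j][i] * image[y - 1 + j][x - 1 + i]
        for j in range(3)
        for i in range(3)
    )


def gradient_orientation(mhi, y, x):
    """Orientation in degrees [0, 360) at (y, x), or None where the gradient is zero."""
    fx = apply_mask(mhi, SOBEL_X, y, x)
    fy = apply_mask(mhi, SOBEL_Y, y, x)
    if fx == 0 and fy == 0:
        return None
    return math.degrees(math.atan2(fy, fx)) % 360.0


# Time-stamps grow from left to right, so the motion points along +x (0 degrees).
rightward = [[1, 2, 3],
             [1, 2, 3],
             [1, 2, 3]]
print(gradient_orientation(rightward, 1, 1))  # 0.0
```

Because image rows grow downward, an angle of 90 degrees in this sketch means motion toward the bottom of the image.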
Gradient of Motion (2/2)
• The boundaries of the MHI should not be used because non-moving (zero valued) pixels would be included in the gradient calculation.
• The gradient directions show the approximate motion of the arms.
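One possible way to exclude MHI boundaries is to keep only interior pixels whose whole 3x3 neighbourhood is non-zero, so that no non-moving pixel enters the Sobel sum. That criterion is an assumption made for this sketch; the slide does not say how the paper selects valid pixels.

```python
def valid_gradient_pixels(mhi):
    """Return (y, x) positions whose full 3x3 neighbourhood contains no zero."""
    rows, cols = len(mhi), len(mhi[0])
    valid = []
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            neighbourhood = (
                mhi[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            )
            if all(v != 0 for v in neighbourhood):
                valid.append((y, x))
    return valid


# The right-hand column never moved (value 0), so (1, 2) touches the boundary.
mhi = [[1, 2, 3, 0],
       [1, 2, 3, 0],
       [1, 2, 3, 0]]
print(valid_gradient_pixels(mhi))  # [(1, 1)]
```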
Histogram Hierarchy (1/3)
• A simple means of localizing the motion for recognition is to attend separately to different regions around the body.
• One way of doing this is to divide the MHI into various regions (or windows) and then characterize each region by a histogram of the motion orientations for the region.
• To generate the histograms for these window regions, we first quantize the gradient directions from the MHI into multiples of 30 degrees, resulting in histograms with 12 bins each (mainly for speed during recognition).
• To handle changes in scale between different sized people (or differences in depth), we normalize each window by the sum of all the motion orientation pixels found in the gradient map.
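The quantization and normalization described here might look as follows. The sketch assumes each angle is rounded to the nearest multiple of 30 degrees (the slide does not say whether rounding or flooring is used) and that `total_pixels` is the number of orientation pixels in the whole gradient map; both the function name and its parameters are illustrative.

```python
def orientation_histogram(angles_deg, total_pixels, bins=12):
    """Quantize angles to the nearest multiple of 360/bins and normalize the counts."""
    width = 360.0 / bins
    counts = [0] * bins
    for angle in angles_deg:
        # int(v + 0.5) rounds half up; the final modulo wraps 360 back to bin 0.
        index = int((angle % 360.0) / width + 0.5) % bins
        counts[index] += 1
    if total_pixels == 0:
        return [0.0] * bins
    return [count / total_pixels for count in counts]


# Five orientation pixels from one window; here the window is the whole map.
window_angles = [0.0, 10.0, 29.0, 95.0, 350.0]
histogram = orientation_histogram(window_angles, total_pixels=len(window_angles))
print(histogram[0], histogram[1], histogram[3])  # 0.6 0.2 0.2
```

Dividing by the map-wide pixel count rather than the window's own count keeps smaller windows comparable to the full-body window, which matches the normalization wording on the slide.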
Histogram Hierarchy (2/3)
• Overlapping windows for generating motion histograms
• The dark areas represent areas which are included in that window's histogram.
• The white areas are ignored.
• The first window (top window) covers the entire motion region within the MHI.
• The windows below cover progressively smaller regions of the motion.
Histogram Hierarchy (3/3)
[Figure: motion histograms; x-axis: direction bins, y-axis: normalized count]
Recognition
• The result of generating motion histograms for the body movement is a collection of nine 12-bin histograms for one body movement.
• To use this data for recognition, we concatenate the histograms into a single column vector (108 × 1) and use the Euclidean distance between an input and a stored model vector as a measure of closeness for recognition.
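Concatenating nine 12-bin histograms gives 9 × 12 = 108 values, which is where the vector length comes from. A minimal sketch of the concatenation and the distance, with hypothetical function names and made-up histogram values:

```python
import math


def concatenate(histograms):
    """Flatten a list of histograms into one feature vector."""
    return [value for histogram in histograms for value in histogram]


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Nine empty 12-bin histograms for the model; the input differs in two bins.
model = concatenate([[0.0] * 12 for _ in range(9)])
input_histograms = [[0.0] * 12 for _ in range(9)]
input_histograms[0][3] = 0.3
input_histograms[4][7] = 0.4
input_vector = concatenate(input_histograms)

print(len(input_vector))                                  # 108
print(round(euclidean_distance(input_vector, model), 6))  # 0.5
```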
Movement Model
• To generate a model for a particular movement:
1. Gather the samples of movements.
2. Calculate a set of mean motion histograms.
3. Find the mean and variance of the Euclidean distance from the training vectors to the newly generated mean vector.
4. Store the vector mean, distance mean, and distance variance of the training motion histograms as the model for that particular movement.
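The four modelling steps can be sketched as below. The dictionary layout and the use of the population variance (dividing by N rather than N - 1) are assumptions made for illustration; the slide does not specify either.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_model(training_vectors):
    """Return the mean vector plus the mean and variance of distances to it."""
    n = len(training_vectors)
    mean_vector = [sum(column) / n for column in zip(*training_vectors)]
    distances = [euclidean_distance(v, mean_vector) for v in training_vectors]
    dist_mean = sum(distances) / n
    # Population variance; an assumption, the slide only says "variance".
    dist_var = sum((d - dist_mean) ** 2 for d in distances) / n
    return {"mean_vector": mean_vector, "dist_mean": dist_mean, "dist_var": dist_var}


# Three toy 2-D training vectors stand in for 108-D motion histogram vectors.
model = build_model([[0.0, 0.0], [4.0, 0.0], [2.0, 0.0]])
print(model["mean_vector"])  # [2.0, 0.0]
```

For this toy set the distances to the mean are 2, 2, and 0, so the stored distance mean is 4/3 and the distance variance is 8/9.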
Matching (1/2)
• To match new input:
1. Calculate the Euclidean distance between the input motion histogram vector and the model mean vector.
2. Calculate the Mahalanobis distance for the new vector's Euclidean measure.
3. Repeat this process to seek a match against all the stored movement models (without much computational expense).
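For a single scalar such as the Euclidean measure, the Mahalanobis distance reduces to the absolute deviation from the stored distance mean divided by the standard deviation. The sketch below uses that form. The acceptance threshold `max_sigma`, the model dictionary layout, and the movement names are hypothetical, and each stored variance is assumed to be greater than zero.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mahalanobis_1d(value, mean, variance):
    """Mahalanobis distance of a scalar; assumes variance > 0."""
    return abs(value - mean) / math.sqrt(variance)


def match(input_vector, models, max_sigma=3.0):
    """Return the name of the closest model within max_sigma, or None."""
    best_name, best_score = None, max_sigma
    for name, model in models.items():
        d = euclidean_distance(input_vector, model["mean_vector"])
        score = mahalanobis_1d(d, model["dist_mean"], model["dist_var"])
        if score <= best_score:
            best_name, best_score = name, score
    return best_name


# Two made-up models with 2-D vectors standing in for 108-D histogram vectors.
models = {
    "wave":   {"mean_vector": [1.0, 0.0], "dist_mean": 0.5, "dist_var": 0.04},
    "crouch": {"mean_vector": [0.0, 1.0], "dist_mean": 0.5, "dist_var": 0.04},
}
print(match([1.0, 0.6], models))  # wave
print(match([5.0, 5.0], models))  # None
```

Each comparison costs one distance computation and one division, which is consistent with the slide's remark that checking every stored model is cheap.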
Matching (2/2)
Variable length movements (1/2)
• Since a set of movements to recognize most likely contains gestures of different time lengths, we need a recognition mechanism that can handle variable length movements.
• Recognition mechanism
1. Generate the MHI for the movement that has the longest time duration.
2. Lower the time duration constant for the MHI.
3. Look for a match with the new simulated MHI and its updated motion histograms.
4. Repeat until the time duration reaches the minimum value for the recognizable movements.
Variable length movements (2/2)
• This method progressively removes older motion from the MHI, so that every MHI that could have been generated within the movement duration window is quickly created and examined for a match during the iteration phase.
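The iteration over shorter durations can be sketched by re-thresholding a single long-duration MHI instead of rebuilding it. In this illustration `matcher` is a hypothetical callback standing in for the histogram generation and model matching, and `step` is an assumed decrement; neither is specified on the slides.

```python
def threshold_mhi(mhi, timestamp, duration):
    """Simulate a shorter-duration MHI by zeroing stamps older than T - d."""
    oldest = timestamp - duration
    return [[v if v >= oldest else 0 for v in row] for row in mhi]


def search_durations(mhi, timestamp, max_duration, min_duration, step, matcher):
    """Try progressively shorter durations; return (duration, result) or None."""
    duration = max_duration
    while duration >= min_duration:
        candidate = threshold_mhi(mhi, timestamp, duration)
        result = matcher(candidate)
        if result is not None:
            return duration, result
        duration -= step
    return None


def toy_matcher(candidate):
    """Stand-in matcher: 'recognizes' an MHI with exactly two moving pixels."""
    moving = sum(1 for row in candidate for v in row if v > 0)
    return "short gesture" if moving == 2 else None


mhi = [[1, 2, 3, 4]]  # built with the longest duration, current time-stamp T = 4
print(search_durations(mhi, 4, max_duration=3, min_duration=1, step=1,
                       matcher=toy_matcher))  # (1, 'short gesture')
```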
Extensions for occlusion and distractor motions
• Retain the valid regions and remove the occluded regions.
• Compute plausibility for the new input on a window-by-window basis.
• Normalize based not on the overall window but on the largest plausible window.
Conclusion
• A real-time computer vision approach to recognizing human movements based on patterns of motion.
• The MHI is characterized by multiple, overlapping histograms of the motion orientations.
• These histograms separate and localize regions of motion for a better description of the movement.
• Occlusion and distractor motions are also addressable within this framework.