MPEG-7 Motion Descriptors
Reference • ISO/IEC JTC1/SC29/WG11 N4031 • ISO/IEC JTC1/SC29/WG11 N4062 • MPEG-7 Visual Motion Descriptors (IEEE Transactions on Circuits and Systems for Video Technology) • Video Indexing using Descriptors of Spatial Distribution of Motion Activity (submitted to IEEE Transactions on Circuits and Systems for Video Technology)
Introduction • MPEG-7, formally named “Multimedia Content Description Interface”, is a standard for describing features of multimedia content. • Users can search, browse, and retrieve that content more efficiently and effectively than they could using today’s mainly text-based search engines. • We describe tools and techniques for representing motion information in the context of MPEG-7.
Camera Motion • This descriptor characterizes 3-D camera motion parameters. • It supports the following well-known basic camera operations: fixed, panning, tracking, tilting, booming, zooming, dollying, and rolling.
Motion Trajectory • Motion trajectory is a high-level feature, defined as the spatio-temporal localization of one representative point of an object. • The descriptor is essentially a list of keypoints along with an optional set of interpolating functions that describe the path of the object between keypoints.
In surveillance, alarms can be triggered if an object has a trajectory identified as dangerous; in sports, specific actions can be recognized.
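As a rough illustration of how the keypoint list can be used, the following C sketch linearly interpolates the position of the representative point between two keypoints; the Keypoint struct and interp_position function are illustrative names, not part of the standard.

/* Minimal sketch (illustrative): piecewise-linear interpolation of a
 * trajectory between two stored keypoints. */
#include <stdio.h>

typedef struct { double t, x, y; } Keypoint;   /* time and 2-D position */

/* Interpolate the position at time t between keypoints k0 and k1. */
static void interp_position(Keypoint k0, Keypoint k1, double t,
                            double *x, double *y)
{
    double a = (t - k0.t) / (k1.t - k0.t);     /* normalized time in [0,1] */
    *x = k0.x + a * (k1.x - k0.x);
    *y = k0.y + a * (k1.y - k0.y);
}

int main(void)
{
    Keypoint k0 = {0.0, 10.0, 20.0}, k1 = {1.0, 50.0, 40.0};
    double x, y;
    interp_position(k0, k1, 0.25, &x, &y);
    printf("position at t=0.25: (%.1f, %.1f)\n", x, y);   /* (20.0, 25.0) */
    return 0;
}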
Parametric Motion • This descriptor addresses the motion of objects in video sequences as a 2-D parametric model.
• Translational model: vx(x, y) = a1; vy(x, y) = a2
• Rotation/scaling model: vx(x, y) = a1 + a3·x + a4·y; vy(x, y) = a2 − a4·x + a3·y
• Affine model: vx(x, y) = a1 + a3·x + a4·y; vy(x, y) = a2 + a5·x + a6·y
• Perspective model: vx(x, y) = (a1 + a3·x + a4·y) / (1 + a7·x + a8·y); vy(x, y) = (a2 + a5·x + a6·y) / (1 + a7·x + a8·y)
• Quadratic model: vx(x, y) = a1 + a3·x + a4·y + a7·xy + a9·x² + a10·y²; vy(x, y) = a2 + a5·x + a6·y + a8·xy + a11·x² + a12·y²
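For illustration, the sketch below evaluates the affine model at a pixel location; the affine_motion function and the parameter-array layout are assumptions made for this example, not part of the standard.

/* Minimal sketch (illustrative): evaluate the affine parametric motion
 * model vx = a1 + a3*x + a4*y, vy = a2 + a5*x + a6*y at a pixel. */
#include <stdio.h>

static void affine_motion(const double a[7], double x, double y,
                          double *vx, double *vy)
{
    *vx = a[1] + a[3] * x + a[4] * y;
    *vy = a[2] + a[5] * x + a[6] * y;
}

int main(void)
{
    /* a[0] is unused so the indices match a1..a6 in the model equations */
    double a[7] = {0.0, 1.0, -0.5, 0.01, 0.0, 0.0, 0.01};
    double vx, vy;
    affine_motion(a, 100.0, 50.0, &vx, &vy);
    printf("motion at (100,50): vx=%.2f vy=%.2f\n", vx, vy);  /* vx=2.00 vy=0.00 */
    return 0;
}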
Motion Activity • Video content in general spans the gamut from high to low activity; therefore, we need a descriptor that accurately expresses the activity level of a given sequence/shot. • The activity descriptor includes the following attributes: • Intensity of Activity • Direction of Activity (optional) • Spatial Distribution of Activity (optional) • Temporal Distribution of Activity (optional)
Intensity of Activity • Expressed by a 3-bit integer in the range 1–5. • A high value indicates high activity, while a low value indicates low activity. • For example, a still shot has a low intensity of activity, while a “fast break” basketball shot has a high intensity of activity.
Intensity is defined as the standard deviation of motion vector magnitudes, appropriately normalized by the frame resolution.
1 – very low activity
2 – low activity
3 – medium activity
4 – high activity
5 – very high activity

if (std_dev < t1)      intensity = 1;
else if (std_dev < t2) intensity = 2;
else if (std_dev < t3) intensity = 3;
else if (std_dev < t4) intensity = 4;
else                   intensity = 5;

t1 = 0.257*l/F   t2 = 0.706*l/F   t3 = 1.280*l/F   t4 = 2.111*l/F
where the diagonal length l = sqrt(w*w + h*h) and F is the frame rate in frames/second.
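Putting the definition and the thresholds together, here is a minimal C sketch (an assumed implementation, not the normative extraction procedure) that computes std_dev from per-block motion vectors and applies the quantization above; compute_intensity and its argument layout are illustrative.

/* Minimal sketch: intensity of activity from motion-vector magnitudes. */
#include <math.h>
#include <stdio.h>

static int compute_intensity(const double *mvx, const double *mvy, int n,
                             int width, int height, double frame_rate)
{
    double mean = 0.0, var = 0.0, mag;
    int i;

    for (i = 0; i < n; i++) {                      /* mean magnitude */
        mag = sqrt(mvx[i] * mvx[i] + mvy[i] * mvy[i]);
        mean += mag;
    }
    mean /= n;
    for (i = 0; i < n; i++) {                      /* variance of magnitudes */
        mag = sqrt(mvx[i] * mvx[i] + mvy[i] * mvy[i]);
        var += (mag - mean) * (mag - mean);
    }
    double std_dev = sqrt(var / n);

    double l = sqrt((double)width * width + (double)height * height);
    double t1 = 0.257 * l / frame_rate, t2 = 0.706 * l / frame_rate;
    double t3 = 1.280 * l / frame_rate, t4 = 2.111 * l / frame_rate;

    if (std_dev < t1)      return 1;   /* very low activity  */
    else if (std_dev < t2) return 2;   /* low activity       */
    else if (std_dev < t3) return 3;   /* medium activity    */
    else if (std_dev < t4) return 4;   /* high activity      */
    else                   return 5;   /* very high activity */
}

int main(void)
{
    double mvx[4] = {2.0, -3.0, 1.0, 0.0}, mvy[4] = {1.0, 0.0, -2.0, 4.0};
    printf("intensity = %d\n", compute_intensity(mvx, mvy, 4, 352, 288, 25.0));
    return 0;
}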
Spatial Distribution of Activity • The descriptor indicates whether the activity is spread across many regions or restricted to one large region. • It is an indication of the number and size of “active” regions in a frame. • For example, a talking-head sequence has one large active region, while a shot of a busy street has many small active regions.
The descriptor is extracted by recording the lengths of runs of zeros in raster-scan order over the thresholded motion-vector magnitude matrix (see the sketch after the figure notes below). • Short runs are shorter than 1/3 of the frame width. • Medium runs are between 1/3 and 2/3 of the frame width. • Long runs are longer than 2/3 of the frame width. • The element consists of three fields, Nsr, Nmr, and Nlr, which contain the numbers of short, medium, and long runs of zeros, respectively.
In the example, the dark area consists of macroblocks that have non-zero values after thresholding; the remaining area consists of macroblocks that are zeroed out after thresholding.
With smaller, widely spaced objects, note that there are more long and medium run lengths.
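The following C sketch (an assumed implementation; the RunCounts struct and count_zero_runs function are illustrative names) counts short, medium, and long zero runs over a thresholded motion-vector magnitude mask in raster-scan order.

/* Minimal sketch: Nsr, Nmr, Nlr from a thresholded magnitude mask. */
#include <stdio.h>

typedef struct { int nsr, nmr, nlr; } RunCounts;

static RunCounts count_zero_runs(const int *mask, int width, int height)
{
    RunCounts c = {0, 0, 0};
    int run = 0, i, n = width * height;

    for (i = 0; i <= n; i++) {
        if (i < n && mask[i] == 0) {
            run++;                                 /* extend the current zero run */
        } else if (run > 0) {                      /* run ends: classify it */
            if (run < width / 3)           c.nsr++;   /* short run  */
            else if (run < 2 * width / 3)  c.nmr++;   /* medium run */
            else                           c.nlr++;   /* long run   */
            run = 0;
        }
    }
    return c;
}

int main(void)
{
    /* 6x3 thresholded magnitude mask: 1 = active macroblock, 0 = inactive */
    int mask[18] = {0,0,1,0,0,0,
                    1,1,0,0,0,0,
                    0,0,0,0,1,0};
    RunCounts c = count_zero_runs(mask, 6, 3);
    printf("Nsr=%d Nmr=%d Nlr=%d\n", c.nsr, c.nmr, c.nlr);
    return 0;
}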
Direction of Activity • While a video shot may contain several objects moving with different activities, we can often identify a dominant direction.

/* quantize angle using uniform 3-bit quantization over 0-360 degrees,
   i.e. levels 0, 45, 90, 135, 180, 225, 270, 315 */
if ((f_angle >= -22.5) && (f_angle < 22.5))        direction = 0;
else if ((f_angle >= 22.5) && (f_angle < 67.5))    direction = 1;
else if ((f_angle >= 67.5) && (f_angle < 112.5))   direction = 2;
else if ((f_angle >= 112.5) && (f_angle < 157.5))  direction = 3;
else if ((f_angle >= 157.5) && (f_angle < 202.5))  direction = 4;
else if ((f_angle >= 202.5) && (f_angle < 247.5))  direction = 5;
else if ((f_angle >= 247.5) && (f_angle < 292.5))  direction = 6;
else if ((f_angle >= 292.5) && (f_angle < 337.5))  direction = 7;
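One plausible way to obtain f_angle before quantization, shown purely as an assumption rather than the normative extraction, is to take the angle of the summed (average) motion vector; dominant_angle_deg is an illustrative name.

/* Minimal sketch (assumption): dominant direction as the angle of the
 * vector sum of all motion vectors, shifted to suit the quantizer above. */
#include <math.h>
#include <stdio.h>

#define PI 3.14159265358979323846

static double dominant_angle_deg(const double *mvx, const double *mvy, int n)
{
    double sx = 0.0, sy = 0.0;
    int i;
    for (i = 0; i < n; i++) {          /* vector sum of all motion vectors */
        sx += mvx[i];
        sy += mvy[i];
    }
    double deg = atan2(sy, sx) * 180.0 / PI;   /* range (-180, 180] */
    if (deg < -22.5)
        deg += 360.0;                  /* shift into [-22.5, 337.5) for the quantizer */
    return deg;
}

int main(void)
{
    double mvx[3] = {1.0, 2.0, 1.5}, mvy[3] = {1.0, 2.5, 1.0};
    printf("f_angle = %.1f degrees\n", dominant_angle_deg(mvx, mvy, 3));  /* 45.0 */
    return 0;
}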
Temporal Distribution of Activity • Expresses the variation of activity over the duration of the video segment/shot. • A histogram of five bins, where bins N0, N1, N2, N3, and N4 correspond to intensity values 1, 2, 3, 4, and 5, respectively. • Each value is the percentage of occurrences of the corresponding quantized intensity level.
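A minimal C sketch of building this histogram from per-frame (or per-segment) quantized intensity values; temporal_activity_histogram is an illustrative name, not from the standard.

/* Minimal sketch: five-bin temporal activity histogram as percentages. */
#include <stdio.h>

static void temporal_activity_histogram(const int *intensity, int n_frames,
                                        double hist[5])
{
    int counts[5] = {0, 0, 0, 0, 0};
    int i;
    for (i = 0; i < n_frames; i++)
        counts[intensity[i] - 1]++;                 /* intensity 1..5 -> bin 0..4 */
    for (i = 0; i < 5; i++)
        hist[i] = 100.0 * counts[i] / n_frames;     /* percentage per bin */
}

int main(void)
{
    int intensity[8] = {1, 2, 2, 3, 3, 3, 5, 5};
    double hist[5];
    int i;
    temporal_activity_histogram(intensity, 8, hist);
    for (i = 0; i < 5; i++)
        printf("N%d = %.1f%%\n", i, hist[i]);
    return 0;
}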
Usage and Applications • Video browsing: The motion-activity intensity descriptor enables selection of the video segments of a program based on the intensity of motion activity. • Content-based querying of video databases: We can use motion activity to separate the high- and low-motion parts of a video sequence and/or as a first-stage content filter.