MPEG-7 Visual part of eXperimentation Model
Presented by: Moustafa A. Hammad
Introduction
• MPEG-7: "Multimedia Content Description Interface".
• A quick review: Feature, Descriptor, Descriptor Value, Description Scheme, Description Definition Language (DDL).
• Visual elements: Color, Spatial Structure, Shape, Motion.
• Representation of descriptors using the DDL (the MPEG-7 DDL was approved in October 1999).
Topics of Discussion
• Visual element descriptors:
    • Color
        • Color Space
        • Dominant Color
        • Color Histogram
    • Spatial Structure
        • Grid Layout
    • Shape
        • Object Bounding Box
    • Motion
        • Camera Motion
        • Object Motion Trajectory
Color
• Color Space
    • Supported spaces: RGB, YCrCb, HSV, or a linear transformation matrix with reference to RGB.
• Syntax

    Color_Space_Descriptor {
        enum description_color_space {rgb, ycrcb, hsv, linear_matrix};
        if (description_color_space == linear_matrix) {
            int color_trans_mat[3][3]
        }
    }
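The linear_matrix case can be illustrated with a small sketch. It assumes the 3x3 matrix is applied directly to an (R, G, B) column vector with no fixed-point scaling, which the slide does not specify. The function name and the two sample matrices are hypothetical.

```python
from typing import List, Sequence

Matrix3 = Sequence[Sequence[int]]


def apply_color_transform(color_trans_mat: Matrix3, rgb: Sequence[int]) -> List[int]:
    """Multiply a 3x3 integer matrix by an (R, G, B) column vector.

    Assumption: the matrix entries are used as-is (no scaling factor).
    """
    if len(color_trans_mat) != 3 or any(len(row) != 3 for row in color_trans_mat):
        raise ValueError("color_trans_mat must be 3x3")
    if len(rgb) != 3:
        raise ValueError("rgb must have three components")
    return [sum(row[k] * rgb[k] for k in range(3)) for row in color_trans_mat]


# Hypothetical sample matrices, chosen only to make the effect easy to see.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # leaves RGB unchanged
SWAP_RB = [[0, 0, 1], [0, 1, 0], [1, 0, 0]]    # exchanges the R and B channels

print(apply_color_transform(IDENTITY, (10, 20, 30)))
print(apply_color_transform(SWAP_RB, (10, 20, 30)))
```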
Color (Cont.)
• Dominant Color (DC)
    • Specifies a set of dominant colors in a shaped region.
• Syntax

    Dominant_Color_D {
        int Dominant_Colors_Number
        struct Dominant_Color DCs[Dominant_Colors_Number]
        int Confidence_Measure
    }

    struct Dominant_Color {
        int Color_Value[Color_Space_Dimension]
        int Percentage
    }

• Extraction (non-normative part): use the color histogram to get the dominant colors.
• Used for content-based retrieval.
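A minimal sketch of getting dominant colors from a histogram. The extraction is non-normative, so this is only one possible approach: it assumes the histogram is a mapping from a color value to a pixel count and simply keeps the most populated entries. The function name and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Color = Tuple[int, ...]


def dominant_colors(histogram: Dict[Color, int], n: int) -> List[Tuple[Color, int]]:
    """Return up to n (Color_Value, Percentage) pairs, most frequent first.

    Percentage is an integer share of all counted pixels, rounded down.
    """
    total = sum(histogram.values())
    if total <= 0:
        raise ValueError("histogram is empty")
    ranked = sorted(histogram.items(), key=lambda item: item[1], reverse=True)
    return [(color, count * 100 // total) for color, count in ranked[:n]]


# Hypothetical region with three colors.
region_histogram = {(255, 0, 0): 50, (0, 255, 0): 30, (0, 0, 255): 20}
print(dominant_colors(region_histogram, 2))
```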
Color (Cont.)
• Color Histogram
    • Denotes the percentage of each color in an image object.
• Syntax

    Color_Histogram_Descriptor {
        int histogram_norm_factor
        int number_histogram_bins
        int histogram_value[number_histogram_bins]
    }

[Figure: histogram plot with axes labeled Histogram_bin and Histogram_value]
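The three fields of the histogram descriptor can be filled as in the sketch below. It assumes 8-bit RGB pixels, uniform quantization into the same number of levels per channel, and that histogram_norm_factor is the value the bins are scaled to sum to. None of these choices is stated on the slide, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def color_histogram(pixels: Iterable[Tuple[int, int, int]],
                    levels_per_channel: int = 2,
                    histogram_norm_factor: int = 100) -> Dict[str, object]:
    """Build a normalized color histogram from 8-bit RGB pixels."""
    b = levels_per_channel
    counts: List[int] = [0] * (b * b * b)
    total = 0
    for r, g, bl in pixels:
        # Quantize each 0..255 channel into b levels, then flatten to one index.
        qr, qg, qb = r * b // 256, g * b // 256, bl * b // 256
        counts[(qr * b + qg) * b + qb] += 1
        total += 1
    if total == 0:
        raise ValueError("no pixels given")
    # Integer scaling rounds down, so the bins may sum to slightly less
    # than histogram_norm_factor.
    values = [c * histogram_norm_factor // total for c in counts]
    return {
        "histogram_norm_factor": histogram_norm_factor,
        "number_histogram_bins": len(values),
        "histogram_value": values,
    }


# Hypothetical object: three red pixels and one blue pixel.
descriptor = color_histogram([(255, 0, 0)] * 3 + [(0, 0, 255)])
print(descriptor["histogram_value"])
```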
Spatial Structure
• Grid Layout

    Grid_Layout {
        int PartNumberH;
        int PartNumberV;
    }
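As a rough illustration, the grid can be materialized as a list of cell rectangles. The sketch assumes PartNumberH counts partitions along the horizontal direction and PartNumberV along the vertical one, and that cells are listed row by row. The function name is hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), with x1 and y1 exclusive


def grid_cells(width: int, height: int,
               part_number_h: int, part_number_v: int) -> List[Rect]:
    """Split a width x height image into part_number_h x part_number_v cells."""
    if part_number_h < 1 or part_number_v < 1:
        raise ValueError("partition counts must be positive")
    cells = []
    for j in range(part_number_v):          # rows, top to bottom
        y0 = j * height // part_number_v
        y1 = (j + 1) * height // part_number_v
        for i in range(part_number_h):      # columns, left to right
            x0 = i * width // part_number_h
            x1 = (i + 1) * width // part_number_h
            cells.append((x0, y0, x1, y1))
    return cells


print(grid_cells(4, 2, 2, 1))
```

Each cell can then carry its own descriptor, for example a color histogram computed over the pixels inside the rectangle.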
Shape
• Object Bounding Box

    Bounding_Box {
        enum LengthUnits;    // Picture height, meters.
        double BoxHeight
        double BoxWidth
        double BoxDepth
        double FractionalOccupancy
        boolean Is3D
        if (HasCompositionInfo) {
            double BoxCentreH
            double BoxCentreV
        }
        if (Is3D) {
            double BoxCentreD
        }
    }

[Figure: bounding box drawn on h and v axes with origin (0,0), extents labeled 1 and 1/DAR, and the major axis marked; DAR = h/w]
Shape (Cont.)
• Extraction for 2D objects (non-normative part)

[Figure: extraction pipeline. Image, Segmentation, Segmentation map; given the ObjectID of interest: identifying samples belonging to the object of interest, Bitmap of the object of interest, Bounding Box Estimation, Aspect Ratio]

• Matching/Query process (non-normative part)
    • Find the 2D objects whose aspect ratio is (=, >, <) a certain value or within a certain range.
    • Find the 2D or 3D objects whose sizes are similar to that of this object, or are (=, >, <) a certain value or within a certain range.
    • Find the objects that are positioned near an (x, y) or (x, y, z) location in the picture/3D world.
    • Find the objects whose height/width/depth are (=, >, <) a certain value or within a certain range.
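The 2D extraction steps and the aspect-ratio query can be sketched as follows. This is an illustration under stated assumptions, not the XM procedure: the segmentation map is taken to be a 2D list of object IDs, aspect ratio is taken as box width divided by box height, and fractional occupancy as object samples divided by box area. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Box:
    top: int
    left: int
    height: int
    width: int
    fractional_occupancy: float  # assumed: object samples / box area

    @property
    def aspect_ratio(self) -> float:
        return self.width / self.height  # assumed definition


def object_bitmap(segmentation_map: Sequence[Sequence[int]], object_id: int) -> List[List[int]]:
    """Mark the samples that belong to the object of interest."""
    return [[1 if v == object_id else 0 for v in row] for row in segmentation_map]


def bounding_box(bitmap: Sequence[Sequence[int]]) -> Optional[Box]:
    """Estimate the tight axis-aligned box around the set samples."""
    points = [(y, x) for y, row in enumerate(bitmap) for x, v in enumerate(row) if v]
    if not points:
        return None
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return Box(min(ys), min(xs), height, width, len(points) / (height * width))


def find_by_aspect_ratio(boxes: Dict[int, Box], low: float, high: float) -> List[int]:
    """Query: IDs of objects whose aspect ratio lies within [low, high]."""
    return [oid for oid, box in boxes.items() if low <= box.aspect_ratio <= high]


# Hypothetical segmentation map with objects 7 and 2 on background 0.
seg_map = [
    [0, 0, 0, 0],
    [0, 7, 7, 7],
    [0, 7, 0, 7],
    [0, 0, 2, 0],
]
boxes = {oid: bounding_box(object_bitmap(seg_map, oid)) for oid in (7, 2)}
print(find_by_aspect_ratio(boxes, 1.2, 2.0))
```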
Motion
[Figure: camera operations: track left/right, boom up/down, dolly forward/backward, pan left/right, tilt up/down, roll]
• Camera Motion
    • Camera operations (8 well-known operations): the operations shown in the figure, plus zooming (a change of the focal length) and fixed.
    • Extraction: camera motion parameter information.
    • Sub-shots: runs of frames characterized by their type(s) of camera motion, in mixture or non-mixture mode.
    • Sub-shots are the building blocks of this descriptor.
Motion (Cont.)
• Syntax

    Camera_Motion_Descriptor {
        int Num_Segment_Description
        int Description_Mode
        Segmented_Camera_Motion Info[Num_Segment_Description]
    }

    Segmented_Camera_Motion {
        TimeStamp start_time
        float duration
        Fractional_Presence presence
        Amount_of_Motion speeds
        float FOE_FOC_Horizontal_Position
        float FOE_FOC_Vertical_Position
    }

    Fractional_Presence {
        float TRACK_LEFT
        float TRACK_RIGHT
        float BOOM_DOWN
        float BOOM_UP
        float DOLLY_FORWARD
        float DOLLY_BACKWARD
        float PAN_LEFT
        float PAN_RIGHT
        float TILT_UP
        float TILT_DOWN
        float ROLL_CLOCKWISE
        float ROLL_ANTICLOCKWISE
        float ZOOM_IN
        float ZOOM_OUT
        float FIXED
    }

    Amount_of_Motion {
        float TRACK_LEFT
        float TRACK_RIGHT
        float BOOM_DOWN
        float BOOM_UP
        float DOLLY_FORWARD
        float DOLLY_BACKWARD
        float PAN_LEFT
        float PAN_RIGHT
        float TILT_UP
        float TILT_DOWN
        float ROLL_CLOCKWISE
        float ROLL_ANTICLOCKWISE
        float ZOOM_IN
        float ZOOM_OUT
    }

• Example: a shot represented in mixture mode, with a duration of 40 time units.

    Shot:
        Num_Segment_Description = 1
        Description_Mode = 1    // mixture mode
        Segmented_Camera_Motion Info[1]
            Start_time = 0
            Duration = 40
            Presence = 0.25, -, 0.25, 0.32, -, -, -, 0.25, -, -, -, -, 0.2, -, 0.2
            <rest of attributes>
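One way to hold this descriptor in memory is sketched below. The field names follow the syntax above, but several choices are assumptions: TimeStamp is simplified to a number, presence and speeds are dictionaries keyed by motion type, and a missing key is read as "motion type absent". The sample values and the segments_with helper are hypothetical and are not the values from the slide example.

```python
from dataclasses import dataclass
from typing import Dict, List

MOTION_TYPES = (
    "TRACK_LEFT", "TRACK_RIGHT", "BOOM_DOWN", "BOOM_UP",
    "DOLLY_FORWARD", "DOLLY_BACKWARD", "PAN_LEFT", "PAN_RIGHT",
    "TILT_UP", "TILT_DOWN", "ROLL_CLOCKWISE", "ROLL_ANTICLOCKWISE",
    "ZOOM_IN", "ZOOM_OUT",
)
PRESENCE_TYPES = MOTION_TYPES + ("FIXED",)  # Fractional_Presence also has FIXED


@dataclass
class SegmentedCameraMotion:
    start_time: float                 # assumption: TimeStamp reduced to a number
    duration: float
    presence: Dict[str, float]        # Fractional_Presence
    speeds: Dict[str, float]          # Amount_of_Motion
    foe_foc_horizontal_position: float = 0.0
    foe_foc_vertical_position: float = 0.0

    def __post_init__(self) -> None:
        unknown = (set(self.presence) - set(PRESENCE_TYPES)) | (set(self.speeds) - set(MOTION_TYPES))
        if unknown:
            raise ValueError(f"unknown motion types: {sorted(unknown)}")


@dataclass
class CameraMotionDescriptor:
    description_mode: int             # 1 = mixture mode, as in the slide example
    info: List[SegmentedCameraMotion]

    @property
    def num_segment_description(self) -> int:
        return len(self.info)


def segments_with(descriptor: CameraMotionDescriptor, motion_type: str,
                  min_presence: float) -> List[SegmentedCameraMotion]:
    """Segments in which motion_type is present for at least min_presence."""
    return [s for s in descriptor.info
            if s.presence.get(motion_type, 0.0) >= min_presence]


# Hypothetical shot: one mixture-mode segment of 40 time units.
shot = CameraMotionDescriptor(
    description_mode=1,
    info=[SegmentedCameraMotion(
        start_time=0.0, duration=40.0,
        presence={"PAN_RIGHT": 0.25, "ZOOM_IN": 0.2},
        speeds={"PAN_RIGHT": 1.5, "ZOOM_IN": 0.5},
    )],
)
print(len(segments_with(shot, "PAN_RIGHT", 0.2)))
```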
Motion (Cont.)
• Object Motion Trajectory
    • Spatio-temporal localization of one representative point of the object (such as its centroid).
• Syntax

    Object_Motion_Trajectory {
        int Camera_follows_object
        enum Coord_system {local, world}
        if (Coord_system == world) {
            boolean local_to_world_parameters_known
            if (local_to_world_parameters_known) {
                world_coord_info *world_params
            }
        }
        enum spatial_units {picture_height, meters}
        boolean Coords_are_3D
        int N_key_points
        double Key_point_t[N_key_points]
        double Key_point_x[N_key_points]
        double Key_point_y[N_key_points]
        if (Coords_are_3D) {
            double Key_point_z[N_key_points]
        }
        boolean Object_Always_Visible
        if (!Object_Always_Visible) {
            boolean Object_is_visible[N_key_points - 1]
        }
        boolean Use_default_interp_only
        if (!Use_default_interp_only) {
            if (Coords_are_3D) {
                Interval_3D_info Intervals_3D[N_key_points - 1]
            } else {
                Interval_2D_info Intervals_2D[N_key_points - 1]
            }
        }
    }

    Interval_3D_info {
        interpolation_function_info x_function
        interpolation_function_info y_function
        interpolation_function_info z_function
    }

    Interval_2D_info {
        interpolation_function_info x_function
        interpolation_function_info y_function
    }

    Interpolation_function_info {
        int Function_ID
        if (Function_ID > 0) {
            list of parameters_values
        }
    }

• Interpolation functions (first order, then second order):

    f(t) = f_a + v_a (t - t_a)
    f(t) = f_a + v_a (t - t_a) + \frac{1}{2} a_a (t - t_a)^2
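The two interpolation functions can be evaluated as in this sketch. It assumes the quadratic term of the second-order function is ½ a_a (t - t_a)², and that the default (first-order) interpolation takes v_a as the slope between the two key points that bracket t. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def first_order(f_a: float, v_a: float, t_a: float, t: float) -> float:
    """f(t) = f_a + v_a (t - t_a)"""
    return f_a + v_a * (t - t_a)


def second_order(f_a: float, v_a: float, a_a: float, t_a: float, t: float) -> float:
    """f(t) = f_a + v_a (t - t_a) + 1/2 a_a (t - t_a)^2 (assumed form)."""
    dt = t - t_a
    return f_a + v_a * dt + 0.5 * a_a * dt * dt


def interpolate_linear(key_t: Sequence[float], key_f: Sequence[float], t: float) -> float:
    """Evaluate one coordinate at time t between key points.

    Assumption: v_a is the slope between the bracketing key points.
    """
    if len(key_t) != len(key_f) or len(key_t) < 2:
        raise ValueError("need at least two key points per coordinate")
    if not key_t[0] <= t <= key_t[-1]:
        raise ValueError("t is outside the trajectory")
    # Index of the interval [key_t[i], key_t[i + 1]] that contains t.
    i = min(bisect_right(key_t, t) - 1, len(key_t) - 2)
    v_a = (key_f[i + 1] - key_f[i]) / (key_t[i + 1] - key_t[i])
    return first_order(key_f[i], v_a, key_t[i], t)


print(first_order(1.0, 2.0, 0.0, 3.0))
print(second_order(1.0, 2.0, 4.0, 0.0, 3.0))
print(interpolate_linear([0, 10], [0.0, 5.0], 4))
```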
Motion (Cont.)
• Extraction (non-normative part)
    • Input: object binary segmentation mask sequence.
    • Output: global motion information.
    • Process:
        • Instantiate the key-point time instants in the description (one per frame).
        • Calculate the mass center of the mask at each frame.
        • Calculate speed and acceleration (z information may be deduced from the size variation).
        • Choose the interpolation function.
    • If the object is moving, optionally subtract the global motion from the object motion when the trajectory is wanted in the scene reference rather than the camera reference.
    • We define an object followed by the camera as either:
        • An unmoving object at a position near the image center.
        • An object with irregular, small-amplitude displacement around a position near the image center.
• Matching
    • The distance between two trajectory descriptors D, D' is:

    d(D, D') = \sum_i \left[ \frac{(x_i - x'_i)^2 + (y_i - y'_i)^2 + (z_i - z'_i)^2}{\Delta t_i} + \frac{(v_{x_i} - v'_{x_i})^2 + (v_{y_i} - v'_{y_i})^2 + (v_{z_i} - v'_{z_i})^2}{\Delta t_i} + \frac{(a_{x_i} - a'_{x_i})^2 + (a_{y_i} - a'_{y_i})^2 + (a_{z_i} - a'_{z_i})^2}{\Delta t_i} \right]
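Two of the steps above lend themselves to a short sketch: the mass center of a binary mask, and the trajectory distance. The distance code assumes the sum runs over corresponding key points of the two descriptors and that each term is divided by the duration Δt_i of interval i. The slide does not spell this out, so treat it as one reading of the formula. All names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
KeyPoint = Tuple[Vec3, Vec3, Vec3]  # (position, velocity, acceleration)


def mass_center(mask: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Mean (x, y) of the nonzero samples of a binary mask."""
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        raise ValueError("mask contains no object samples")
    return sum_x / count, sum_y / count


def _sq_dist(p: Vec3, q: Vec3) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trajectory_distance(d1: Sequence[KeyPoint], d2: Sequence[KeyPoint],
                        dts: Sequence[float]) -> float:
    """Sum of squared position, velocity and acceleration differences,
    each divided by the interval duration (assumed reading of the formula)."""
    if not (len(d1) == len(d2) == len(dts)):
        raise ValueError("descriptors and durations must have equal length")
    total = 0.0
    for (p1, v1, a1), (p2, v2, a2), dt in zip(d1, d2, dts):
        total += (_sq_dist(p1, p2) + _sq_dist(v1, v2) + _sq_dist(a1, a2)) / dt
    return total


# Hypothetical data: a 2x2 mask and two one-point trajectories.
print(mass_center([[0, 1], [0, 1]]))
traj_a = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0))]
traj_b = [((3.0, 4.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0))]
print(trajectory_distance(traj_a, traj_b, [5.0]))
```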
Motion (Cont.)
• Example query
    • Find the video frames in which object k is moving to the right (left, up, down).
    • Solution: for all pairs of frames contained within the existence interval of object k, compute the motion vector

        v(t2) = pos(t2) - pos(t1)

      where pos(t) is the object's spatial coordinate at time t.
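A minimal sketch of this query. It simplifies "all pairs of frames" to consecutive frames only, and assumes image coordinates in which x grows to the right and y grows downward. Both choices are assumptions, and the function name and sample track are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # (t, x, y) for one frame of object k

_DIRECTION_TESTS = {
    "right": lambda dx, dy: dx > 0,
    "left": lambda dx, dy: dx < 0,
    "up": lambda dx, dy: dy < 0,     # assumed: y axis points down
    "down": lambda dx, dy: dy > 0,
}


def moving_frames(track: Sequence[Sample], direction: str) -> List[float]:
    """Times t2 at which v(t2) = pos(t2) - pos(t1) points in `direction`."""
    if direction not in _DIRECTION_TESTS:
        raise ValueError(f"unknown direction: {direction}")
    test = _DIRECTION_TESTS[direction]
    hits = []
    for (t1, x1, y1), (t2, x2, y2) in zip(track, track[1:]):
        if test(x2 - x1, y2 - y1):
            hits.append(t2)
    return hits


# Hypothetical positions of object k over four frames.
track_k = [(0, 0, 0), (1, 2, 0), (2, 2, 1), (3, 1, 1)]
print(moving_frames(track_k, "right"))
```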
Descriptor representation using MPEG-7 proposed DDL

    <DType name='speeds'>
        <attribute name='TRACK_LEFT' type='real'/>
        <attribute name='TRACK_RIGHT' type='real'/>
        <attribute name='BOOM_DOWN' type='real'/>
        <attribute name='BOOM_UP' type='real'/>
        <attribute name='DOLLY_FORWARD' type='real'/>
        <attribute name='DOLLY_BACKWARD' type='real'/>
        <attribute name='PAN_LEFT' type='real'/>
        <attribute name='PAN_RIGHT' type='real'/>
        <attribute name='TILT_UP' type='real'/>
        <attribute name='TILT_DOWN' type='real'/>
        <attribute name='ROLL_CLOCKWISE' type='real'/>
        <attribute name='ROLL_ANTICLOCKWISE' type='real'/>
        <attribute name='ZOOM_IN' type='real'/>
        <attribute name='ZOOM_OUT' type='real'/>
    </DType>

    <DType name='presence'>
        <subDOf name='speeds'/>
        <attribute name='fixed' type='real'/>
    </DType>

• Camera motion descriptor:

    <DType name='CameraMotionD'>
        <attribute name='NumSegmentDescription' type='integer'/>
        <attribute name='DescriptionMode' type='boolean'/>
        <seq minOccurs='0' maxOccursPar='NumSegmentDescription'>
            <DTypeRef name='SegmentedCameraMotionD'/>
        </seq>
    </DType>

    <DType name='SegmentedCameraMotionD'>
        <DType name='start_time' type='time'/>
        <DType name='duration' type='timeDuration'/>
        <DTypeRef name='presence'/>
        <DTypeRef name='speeds'/>
        <DType name='FOE_FOC_HorizontalPosition' type='real'/>
        <DType name='FOE_FOC_VerticalPosition' type='real'/>
    </DType>