3rd SINO-FRANCO WORKSHOP ON MULTIMEDIA AND WEB TECHNOLOGIES, 10/11/2014
REPRESENTATION, RECOGNITION AND VISUALISATION OF HUMAN BEHAVIOURS FOR VIDEO INTERPRETATION
François BREMOND, Monique THONNAT and Thinh VU Van
ORION lab, INRIA Sophia-Antipolis, FRANCE
Plan of presentation
• Part I: Video interpretation
  - Global framework
  - Scenario recognition
• Part II: Visualisation of the interpretation
  - Scene context (3D geometry)
  - Human body
  - Human behaviour
  - Results
• Conclusion
Part I: Video interpretation: Global framework
• Our goal: to model the interpretation process of video sequences from pixels up to behaviour.
• Main issue: current video interpretation systems are based on specific (ad hoc) routines that:
  - depend on the sensors (e.g. camera orientation).
  - are dedicated to specific scenarios (e.g. detection of fighting people) and sites (e.g. metro stations).
[Figure: over time, the video stream goes through mobile object detection & tracking, then scenario recognition, which outputs recognised scenarios such as “Car accident?” or “Two strangers exchanging objects?”]
Video interpretation: Global framework
We define several entities:
• Context object: predefined static object of the scene environment (entrance zone, bench, walls, equipment, ...).
• Moving region: any intensity change between a reference image and the current image.
• Mobile object: any moving region which has been tracked and classified (person, group of persons, vehicle, noise, etc.).
• Basic action: a spatio-temporal property that is instantaneous, numerical and generic (state or event).
• Scenario: a long-term, symbolic, application-dependent behaviour or activity.
Video interpretation: Global framework
[Figure: processing chain. Video stream, moving region detection, mobile object tracking, recognition of actions, scenario recognition module (recognition of scenario 1, scenario 2, ..., scenario n), recognised scenario. A priori knowledge used along the chain: sensors information, context objects, mobile object classes, tracked object types, descriptions of action recognition routines, scenario library.]
Video interpretation: scene context
• Definition: a priori knowledge describing:
  - the sensors (cameras, optical cells and contact sensors): 3D position of the sensor, camera type (colour, resolution), field of view and calibration matrix.
  - the context objects: equipment (bench, trash, door), walls, interesting zones (entrance zone), areas of interest.
  - 3D geometry: 3D location of the object and its volume.
  - semantic information: type of the object (equipment), its characteristics (yellow, fragile) and its function (seat).
• Role:
  - to keep the interpretation independent from the sensors and the sites.
  - to provide additional knowledge to interpret up to the scenario level.
Video interpretation: basic actions and scenarios
• Issues: large variety of actions and scenarios:
  - more or less abstract (running/fighting).
  - general (standing) or sensor- and application-dependent (sitting down).
  - spatial granularity: the view observed by one camera or the whole site.
  - temporal granularity: instantaneous or long term.
• 3 levels of complexity, depending on the complexity of the temporal relations and on the number of actors:
  - non-temporal constraint relative to one actor (being seated).
  - temporal sequence of sub-scenarios relative to one actor (open the door, go toward the chair, then sit down).
  - complex temporal constraints relative to several actors (A meets B at the coffee machine, then C gets up and leaves).
Video interpretation: basic actions and scenarios
We use several formalisms:
• Action and scenario representation:
  - n-ary tree.
  - finite state automaton.
  - graph.
  - set of constraints.
• Action and scenario recognition:
  - specific routines.
  - classification.
  - Bayes.
  - HMM.
  - propagation of temporal constraints.
  - constraint resolution.
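A minimal sketch of the finite state automaton formalism listed above, applied to a single-actor temporal sequence (open the door, go toward the chair, then sit down). The action labels and the function name are hypothetical, and the slides do not say how the authors' automata handle unrelated observations; here they are simply ignored.

```python
# Hypothetical action labels: the slides do not define a concrete alphabet.
SEQUENCE = ["open_door", "go_toward_chair", "sit_down"]


def recognise_sequence(observed_actions, expected=SEQUENCE):
    """Return True if the expected actions occur in order in the observations.

    The automaton state is the index of the next expected action.
    Observations that do not match it leave the state unchanged
    (an assumption of this sketch).
    """
    state = 0
    for action in observed_actions:
        if state < len(expected) and action == expected[state]:
            state += 1
    return state == len(expected)
```

With this sketch, an observation stream such as open_door, walk, go_toward_chair, sit_down is accepted, while a stream in which sit_down comes first is rejected.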
Video interpretation: basic actions and scenarios
• Example: a scenario is represented by a set of constraints.
    Scenario(vandalism_against_ticket_machine,
      Actors((p : Person), (eq : Equipment, Name = "Ticket_Machine"))
      Constraints(
        (exist
          ((action s1: p move_close_to eq)
           (action s2: p stay_at eq)
           (action s3: p move_away_from eq)
           (action s4: p move_close_to eq)
           (action s5: p stay_at eq))
          ((s1 != s4) (s2 != s5)
           (s1 before s2) (s2 before s3) (s3 before s4) (s4 before s5))))
      Production(
        (sc : Scenario)
        ((Name of sc := "vandalism_against_ticket_machine")
         (StartTime of sc := StartTime of s1)
         (EndTime of sc := EndTime of s5))))
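The constraint set of the vandalism scenario can be checked against a list of detected actions with a small backtracking search. This is only a sketch under stated assumptions: the `Action` record, the function names and the returned dictionary are hypothetical, all actions are assumed to concern the same person and the ticket machine, and `before` is read as "ends strictly before the other starts". It is not the authors' constraint resolution algorithm.

```python
from typing import NamedTuple


class Action(NamedTuple):
    """Hypothetical record for a detected basic action."""
    name: str
    start: float
    end: float


# Order of action types s1..s5 in the scenario description.
PATTERN = ["move_close_to", "stay_at", "move_away_from",
           "move_close_to", "stay_at"]


def before(a, b):
    """Assumed reading of 'before': a ends strictly before b starts."""
    return a.end < b.start


def recognise_vandalism(actions):
    """Search for s1..s5 matching PATTERN with s1 before s2 ... before s5.

    With start <= end for every action, the strict 'before' chain already
    implies s1 != s4 and s2 != s5, so these are not tested separately.
    Returns the produced scenario as a dict, or None if nothing matches.
    """
    def extend(chosen):
        if len(chosen) == len(PATTERN):
            return chosen
        wanted = PATTERN[len(chosen)]
        for candidate in actions:
            if candidate.name != wanted:
                continue
            if chosen and not before(chosen[-1], candidate):
                continue
            found = extend(chosen + [candidate])
            if found is not None:
                return found
        return None

    match = extend([])
    if match is None:
        return None
    return {"name": "vandalism_against_ticket_machine",
            "start": match[0].start,
            "end": match[-1].end}
```

The returned start and end times mirror the Production part of the description: the start time of s1 and the end time of s5.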
Video interpretation: Part I: conclusion
• Approach: a framework combining several formalisms:
  - to structure the knowledge in order to obtain a general model.
  - to have a declarative description of the knowledge.
  - to make the knowledge explicit.
  - to mix bottom-up and top-down processing.
  - to use evaluation and learning techniques.
Part II: Visualisation of the interpretation
• Development of a test platform for an AVIS (Automatic Video Interpretation System):
  (a) visualisation of the scenarios recognised by an AVIS.
  (b) simulation of the input of an AVIS.
  (c) verification that the test platform is coherent with the AVIS.
  (d) validation of the AVIS.
Visualisation of the interpretation
• 3 tasks of the test platform:
  (1) generation of realistic 3D animations corresponding to the scenarios recognised by an interpretation system.
  (2) generation of videos from 3D animations using a model of a virtual camera.
  (3) generation of realistic 3D animations corresponding to the scenarios described by an expert.
[Figure: AVIS. The image sequence acquired by a camera feeds the scenario recognition, which uses the state, event and scenario models for the recognition together with the scene context model, and outputs the recognised scenario.]
[Figure: AVIS and test platform. AVIS: image sequence acquired by a camera, scenario recognition (state, event and scenario models for the recognition; scene context model), recognised scenario. Test platform: recognised scenario or scenario described by experts, scenario visualisation (human body, action, scenario and animation models for the visualisation; scene context model), 3D animation corresponding to the scenario, generated image sequence. The links are labelled with the test platform task numbers 1, 2 and 3.]
Visualisation of the interpretation: approach
• Design of the test platform based on six generic models: scene context, camera, human body, actions, scenarios and animation.
• Visualisation using GEOMVIEW.
Visualisation of the interpretation: Scene context
[Figures: (1) visualisation of a scene context for a metro station; (2) example of a context object: a bench]
Visualisation of the interpretation: Human body
• Model: hierarchical and articulated.
• The human body parts are built from three primitives:
  (1) sphere.
  (2) truncated cone.
  (3) parallelepiped.
Visualisation of the interpretation: Human body
• Generic model of the human body parts:
  (1) the relative position of the body part in the reference frame of its parent body part. For example, the hand is defined relative to the arm.
  (2) the angular co-ordinates of the body part in its reference frame.
  (3) the size of the body part along the axes of its reference frame.
  (4) the sub-parts and/or geometric primitives that constitute the body part.
  (5) the colour of the body part.
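The five attributes of the generic body part model map naturally onto a recursive record. The sketch below is an illustration only: the class name, field names, default values and the helper `absolute_positions` are hypothetical, and the helper composes translations only, ignoring the angular co-ordinates for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class BodyPart:
    """Hypothetical record mirroring the five attributes of the model."""
    name: str
    relative_position: tuple            # (1) position in the parent's frame
    angles: tuple = (0.0, 0.0, 0.0)     # (2) angular co-ordinates
    size: tuple = (1.0, 1.0, 1.0)       # (3) size along the frame axes
    primitive: str = "parallelepiped"   # (4) geometric primitive
    sub_parts: list = field(default_factory=list)  # (4) sub-parts
    colour: str = "grey"                # (5) colour


def absolute_positions(part, origin=(0.0, 0.0, 0.0)):
    """Walk the hierarchy and accumulate positions in the root frame.

    Simplification of this sketch: rotations are ignored, so only the
    relative translations are summed from parent to child.
    """
    position = tuple(o + r for o, r in zip(origin, part.relative_position))
    result = {part.name: position}
    for sub in part.sub_parts:
        result.update(absolute_positions(sub, position))
    return result
```

For example, a hand placed at (0, 0, -0.5) relative to an arm located at (0.25, 0, 1.5) ends up at (0.25, 0, 1.0) in the root frame.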
Visualisation of the interpretation: Human body
• Definition of 14 classes of human body parts: human body, head, arm, leg, shoulder, ...
• Different views:
[Figures: views (1) to (5) of the human body model]
Visualisation of the interpretation: Human behaviour
• Human behaviours for interpretation systems:
  - basic action:
    - state: characterises an individual at a given time.
    - event: change of states at two successive times.
  - scenario: combination of actions.
• Human behaviours for the test platform:
  - posture: corresponds to all body parameters of an individual at a given time.
  - action: change of body parameters of an individual.
  - scenario: combination of actions.
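The definition of an event as a change of states at two successive times can be illustrated in a few lines. The function name and the (time, state) representation are assumptions made for this sketch; the slides do not specify how states are encoded.

```python
def detect_events(states):
    """Return the state changes found in a time-ordered list of states.

    `states` is a list of (time, state) pairs sorted by time (assumed
    encoding). Each event is reported as (time, previous_state, new_state),
    where `time` is the instant at which the new state is first observed.
    """
    events = []
    for (_, previous), (time, current) in zip(states, states[1:]):
        if current != previous:
            events.append((time, previous, current))
    return events
```

An individual observed as standing at times 0 and 1 and seated at time 2 yields a single event at time 2, from "standing" to "seated".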
Visualisation of the interpretation: Human behaviour: action
Generic model of action:
• concerned human body part.
• initial/final positions.
• variation of rotation angles around its reference frame.
• global period of the action.
• list of sub-actions with:
  - the concerned sub-part of the human body.
  - the variation of rotation angles around the sub-part's reference frame.
  - their relative period.
• visualisation speed.
• fixed part of the human body on the ground.
[Figure: posture of the body at t = t1 and at t = t2]
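One way to turn the "variation of rotation angles" and "global period" of an action into intermediate postures is linear interpolation. The slides do not state which interpolation the platform uses, so the linear scheme, the function name and the parameter names below are assumptions of this sketch.

```python
def interpolate_angles(initial, variation, period, elapsed):
    """Angles of a body part `elapsed` time units after the action starts.

    `initial` and `variation` are tuples of rotation angles (same units),
    `period` is the global period of the action. Linear interpolation is
    an assumption; time is clamped to the interval [0, period].
    """
    if period <= 0:
        raise ValueError("period must be positive")
    fraction = min(max(elapsed / period, 0.0), 1.0)
    return tuple(a + fraction * d for a, d in zip(initial, variation))
```

Halfway through an action that rotates a part by 90 degrees around its first axis, this sketch returns 45 degrees; after the end of the period the final angle is held.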
Visualisation of the interpretation: Human behaviour: action
• 21 classes of actions: «walking», «running», ...
[Figure: actions «walking», «running» and «hand up»]
Visualisation of the interpretation: Human behaviour: visualisation of action
• calculation of the current posture from the previous instant.
• calculation of the global position of the individual:
  - automatic recognition: use the position of the detected individual.
  - expert description: based on a fixed point on the ground.
• visualisation of geometric primitives through GEOMVIEW.
[Figure: postures at t = 100 and t = 150, with the fixed point on the ground]
Visualisation of the interpretation: Human behaviour: scenario
• A scenario is a set of actions combining the individuals of the scene and the context objects which are relevant to the same activity.
• Sequence of sub-scenarios ordered by their period.
• Elementary scenario: action.
[Figure: scenario at t = 80 and t = 240]
Visualisation of the interpretation: Human behaviour: animation
[Figures: «Walking on the platform»; «Person A and person B meet at the coffee machine M»]
Visualisation of the interpretation: Human behaviour: animation
[Figures: «Pushing someone on the tracks»; «Following another person»]
Visualisation of the interpretation: Results
• Construction of models:
  - human body with 25 primitives.
  - 21 types of individual actions.
  - 4 types of scenarios.
  - 4 types of animations.
• Generation of 7 types of 3D animations from descriptions.
• Generation of 3D animations visualising individuals tracked by AVIS.
• Checking the coherence by taking animations as input for AVIS.
Visualisation of the interpretation: Results: comparison of 2 animations
[Figure: raw video; tracked individuals; animation of tracked individuals; animation from a synthesised video]
Visualisation of the interpretation: Part II: Conclusion and contributions
• Six generic models:
  - scene context.
  - virtual camera.
  - human body.
  - individual actions.
  - scenarios.
  - animations.
• A description language for modeling the knowledge of the scene.
• Validation of these models on metro scenes.
Part I & II: conclusion and perspectives
• Help the developer:
  - visualisation of the results of the interpretation (multi-camera case).
  - generation of test sequences (with added noise phenomena) for validating an AVIS and establishing its limits.
• Help the expert describe new scenarios.
• Define a unified platform using the same models for the interpretation and the test platform.
Visualisation of the interpretation: Scene context
• A scene context is composed of 4 elements:
  - zones (e.g. zone of bench) with semantic information (e.g. expected mobile objects): represented by polygons.
  - walls: represented by vertical polygons.
  - context objects (e.g. bench) with semantic information (e.g. function of the object, time and distance of utilization): represented by 3D geometric primitives (sphere, truncated cone, parallelepiped).
  - camera information: calibration matrix containing the parameters of the virtual camera (e.g. position, direction, FOV).
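The role of the calibration matrix can be illustrated with the standard pinhole projection. The slides do not give the matrix format, so this sketch assumes a 3x4 projection matrix that maps homogeneous 3D scene co-ordinates to homogeneous pixel co-ordinates, and assumes that a positive third component means the point is in front of the camera. The function name is hypothetical.

```python
def project(calibration, point):
    """Project a 3D point to pixel co-ordinates with a 3x4 matrix.

    `calibration` is a list of three rows of four numbers (assumed format),
    `point` is (x, y, z) in scene co-ordinates. Returns (u, v), or None if
    the homogeneous depth is not positive (point not in front of the
    camera under the assumed sign convention).
    """
    x, y, z = point
    homogeneous = (x, y, z, 1.0)
    u, v, w = (sum(m * c for m, c in zip(row, homogeneous))
               for row in calibration)
    if w <= 0:
        return None
    return (u / w, v / w)
```

With a camera at the origin looking along the z axis, a focal length of 500 pixels and a principal point at (320, 240), a point on the optical axis projects onto the principal point.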
Video interpretation: tracking of mobile objects
• Issue: detection errors, non-rigid objects, occlusions, merging and splitting of trajectories.
• Approach: combining different types of tracking:
  - frame-to-frame tracker: computes correspondences between the mobile objects of successive frames.
  - individual tracker: tracks specific individuals using a time delay.
  - group tracker: global tracking of groups of persons.
• For example, a group of persons is defined as a set of individuals which has four characteristics:
  - spatial coherence: the mobile objects are close to each other.
  - size coherence: the mobile objects are bigger than a person.
  - temporal coherence: the motion of the mobile objects corresponds to the motion of a person.
  - structure coherence: the number and the size of the mobile objects are stable.
• This makes it possible to compute a reliable history of all mobile objects.
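As an illustration of what a frame-to-frame tracker has to compute, the sketch below matches mobile objects between two successive frames by greedy nearest-centroid association. This is a swapped-in, deliberately simple technique, not the authors' tracker: the function name, the 2D centroid representation and the `max_distance` threshold are all assumptions.

```python
import math


def match_frame_to_frame(previous, current, max_distance):
    """Greedy nearest-centroid matching between two successive frames.

    `previous` and `current` map object identifiers to (x, y) centroids
    (assumed representation). Pairs are considered from closest to
    farthest; each object is matched at most once and pairs farther apart
    than `max_distance` are never matched. Returns {previous_id: current_id}.
    """
    pairs = sorted(
        (math.dist(p, c), pid, cid)
        for pid, p in previous.items()
        for cid, c in current.items()
    )
    matches, used = {}, set()
    for distance, pid, cid in pairs:
        if distance > max_distance:
            break
        if pid in matches or cid in used:
            continue
        matches[pid] = cid
        used.add(cid)
    return matches
```

Objects left unmatched in the current frame would be candidates for new trajectories, and unmatched objects of the previous frame candidates for occlusions or lost tracks; handling those cases is outside this sketch.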
Visualisation of the interpretation: Human behaviour: animation
• An animation combines and instantiates all previously defined scenarios:
  - scene context: with the set up values (e.g. colour of the context objects).
  - actors with their set up values (e.g. position).
  - scenarios with the involved actors and their period of occurrence.
  - virtual camera used to visualise the scene.
  - visualisation speed.