WP 4: Context Aware Video Acquisition

James L. Crowley
• Professor, INP Grenoble
• Project PRIMA, Laboratoire GRAVIR
• INRIA Rhône Alpes

Plan
• Situations and Actions for the Lecture Scenario
• Assembling a Process Federation
• Version 1.0 of the Intelligent Camera Man
• Current Work

FAME Augmented Meeting Environment
• 5 Sony steerable cameras
• Wide-angle camera
• Microphone array
• 3 video interaction devices:
  • Vertical
  • Horizontal
  • Steerable


Plan
• Situations and Actions for the Lecture Scenario
  • Process Federation for Context Aware Video Acquisition v1.0
  • Task, Situation and Context
  • Compiling the Situation Graph
• Assembling a Process Federation
• Version 1.0 of the Intelligent Camera Man
• Current Work

Camera Control Federation
[Figure: a Supervisory Process and an Event Bus connecting three Agent Tracker processes, each attached to a camera, and Speech Detection and Speech Location processes attached to the Microphone Array.]

Task, Situation and Context
• Situation: a configuration of roles and relations.
• Role: an interpretation of an entity or agent.
• Relation: a predicate over entities and agents.
• A task model describes the state space of situations and the actions of the system for each situation.
• Approach: compile a federation of processes to observe the agents and entities that define situations.

Task, Situation and Context
• Basic concepts:
  • Property: any value observed by a process.
  • Entity: a “correlated” set of properties.
  • Composite entity: a composition of entities.
  • Relation: a predicate defined over entities.
  • Agent: an entity that can act.
  • Situation: a configuration of roles and relations.
  • Context: a network of situations.
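The basic concepts can be made concrete with a small data model. The sketch below is a hypothetical Python rendering (the names `Entity`, `Situation` and `same_agent` are illustrative, not taken from the PRIMA software): a situation lists the roles it needs and the relations that must hold between the entities playing them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class Entity:
    """A correlated set of properties; each property is a value observed by a process."""
    name: str
    properties: Dict[str, float] = field(default_factory=dict)
    parts: List["Entity"] = field(default_factory=list)  # non-empty for a composite entity
    can_act: bool = False  # an agent is an entity that can act

# A relation is a predicate defined over entities.
Relation = Callable[..., bool]

@dataclass
class Situation:
    """A configuration of roles and of relations over the entities playing them."""
    name: str
    roles: List[str]
    relations: List[Tuple[Relation, Tuple[str, ...]]] = field(default_factory=list)

    def holds(self, assignment: Dict[str, Entity]) -> bool:
        # Every role must be played, and every relation must hold over the
        # entities that play the named roles.
        if any(role not in assignment for role in self.roles):
            return False
        return all(rel(*(assignment[r] for r in args)) for rel, args in self.relations)

def same_agent(a: Entity, b: Entity) -> bool:
    return a.name == b.name

# Hypothetical S2: the lecturer and speaker roles are played by the same agent.
s2 = Situation("S2", ["lecturer", "speaker"], [(same_agent, ("lecturer", "speaker"))])
alice = Entity("agent-1", {"x": 1.2, "y": 0.4}, can_act=True)
bob = Entity("agent-2", {"x": 4.0, "y": 3.1}, can_act=True)
print(s2.holds({"lecturer": alice, "speaker": alice}))  # True
print(s2.holds({"lecturer": alice, "speaker": bob}))    # False
```

A context would then be a network of such situations, for example a graph whose edges are the transitions between them.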

Context Aware Video Acquisition
• Design method:
  1) Define the actions to be taken by the system.
  2) Define the situations for each action.
  3) Define the roles and relations.
  4) Define the observation processes.
  5) Compile the situation graph into supervisor rules.

Actions to Be Taken by the Context Aware Video Acquisition System v1.0
• Record shots:
  • A1: record a wide-angle view of the scene
  • A2: record the speaker
  • A3: record the audience

Camera Angles
[Figure: the views from Camera 1, Camera 2 and Camera 3.]

Situations for the Context Aware Video Acquisition System
• Situations:
  • S0: empty room → A1
  • S1: an actor enters the room → A1
  • S2: the speaker (actor) speaks → A2
  • S3: a member of the audience (actor) asks a question → A3
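As a sketch, the situation-to-action table amounts to a lookup performed by the supervisor. The names `Action`, `SITUATION_ACTIONS` and `select_shot` are hypothetical, and the fallback to the wide-angle shot for an unknown situation is an assumption, not something the slide specifies.

```python
from enum import Enum

class Action(Enum):
    A1 = "record wide-angle view of the scene"
    A2 = "record the speaker"
    A3 = "record the audience"

# Situation -> action table, as listed on the slide.
SITUATION_ACTIONS = {
    "S0": Action.A1,  # empty room
    "S1": Action.A1,  # an actor enters the room
    "S2": Action.A2,  # the speaker speaks
    "S3": Action.A3,  # a member of the audience asks a question
}

def select_shot(situation: str) -> Action:
    # Assumption: any situation not in the table falls back to the wide-angle shot.
    return SITUATION_ACTIONS.get(situation, Action.A1)

for s in ["S0", "S1", "S2", "S3", "S2"]:
    print(s, select_shot(s).name)
```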

Process Federation Tool
• A JESS (CLIPS in Java) environment sends messages to the processes.
[Figure: the JESS (CLIPS in Java) environment connected to Process 1, Process 2 and Process 3; diagram labels: events, data, properties.]
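The message flow between the supervisor and the processes can be illustrated with a minimal publish/subscribe bus. This is an in-process Python stand-in; the topic names `event` and `config/tracker1` are purely hypothetical, and the actual federation runs separate processes coordinated from JESS.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Message = Dict[str, Any]
Handler = Callable[[Message], None]

class EventBus:
    """In-process publish/subscribe bus: handlers subscribe to a topic and
    receive every message published on it."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Message) -> None:
        # Iterate over a copy so a handler may subscribe new handlers safely.
        for handler in list(self._subscribers[topic]):
            handler(message)

bus = EventBus()

# The supervisor listens for events from the perceptual processes...
supervisor_log: List[Message] = []
bus.subscribe("event", supervisor_log.append)

# ...and a tracking process listens for configuration messages from the supervisor.
tracker_config: Message = {}
bus.subscribe("config/tracker1", tracker_config.update)

bus.publish("event", {"source": "tracker1", "type": "new_target", "id": 7})
bus.publish("config/tracker1", {"detection_threshold": 0.6})
print(supervisor_log)
print(tracker_config)
```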

Define the Roles and Relations for the Context Aware Video Acquisition System
• Roles and relations for the camera controller:
  • R1: an agent in the audience asks a question (an agent in the audience is speaking)
  • R2: Lecturer (agent at the lecture position)
  • R3: Arriver (agent at the door)
  • R4: Audience (agents in the audience region)
  • R5: Speaker (agent currently speaking)
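One way to read R1 to R5 is as tests over tracked agents. The sketch assumes each agent is reported with a floor position and a speaking flag, and that the lecture position, door and audience region are rectangles; the coordinates in `REGIONS` and the names `Agent` and `assign_roles` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in metres

# Hypothetical room layout; real regions would come from calibration.
REGIONS: Dict[str, Rect] = {
    "lecture_position": (0.0, 0.0, 2.0, 1.5),
    "door": (5.0, 0.0, 6.0, 1.0),
    "audience": (0.0, 3.0, 6.0, 8.0),
}

@dataclass
class Agent:
    ident: int
    x: float
    y: float
    speaking: bool = False

def inside(agent: Agent, rect: Rect) -> bool:
    x0, y0, x1, y1 = rect
    return x0 <= agent.x <= x1 and y0 <= agent.y <= y1

def assign_roles(agents: List[Agent]) -> Dict[str, List[int]]:
    """Return, for each role R1-R5, the ids of the agents currently playing it."""
    audience = [a for a in agents if inside(a, REGIONS["audience"])]
    return {
        "R1_questioner": [a.ident for a in audience if a.speaking],
        "R2_lecturer": [a.ident for a in agents if inside(a, REGIONS["lecture_position"])],
        "R3_arriver": [a.ident for a in agents if inside(a, REGIONS["door"])],
        "R4_audience": [a.ident for a in audience],
        "R5_speaker": [a.ident for a in agents if a.speaking],
    }

agents = [Agent(1, 1.0, 0.5, speaking=True), Agent(2, 3.0, 5.0), Agent(3, 5.5, 0.5)]
print(assign_roles(agents))
```

Here agent 1 is both lecturer and speaker, which is exactly the condition the generated rule for transition t2 checks with `isSameAs`.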

Compiling Situations to Rules
Situations → Petri net → XML description → Rules
XML:
    <role name="lecturer" arity="1">
      <description> The person giving a lecture </description>
    </role>
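The XML step of the pipeline could be sketched as a translator from role descriptions to rule templates. Assuming (this is not stated in the slides) that a unary role becomes a CLIPS/JESS `deftemplate` with a single `isPlayedBy` slot, which matches the `(lecturer (isPlayedBy ...))` patterns in the generated rules, Python's standard `xml.etree.ElementTree` is enough. The function name `role_to_deftemplate` is hypothetical.

```python
import xml.etree.ElementTree as ET

ROLE_XML = """<role name="lecturer" arity="1">
  <description> The person giving a lecture </description>
</role>"""

def role_to_deftemplate(xml_text: str) -> str:
    """Translate a role description into a CLIPS/JESS deftemplate.

    Assumption: a role of arity 1 becomes a template with one slot,
    isPlayedBy. Higher arities are not handled in this sketch.
    """
    role = ET.fromstring(xml_text)
    name = role.get("name")
    arity = int(role.get("arity", "1"))
    if arity != 1:
        raise NotImplementedError("this sketch only handles unary roles")
    doc = (role.findtext("description") or "").strip()
    # deftemplate syntax: (deftemplate <name> ["comment"] <slot-definition>*)
    return f'(deftemplate {name} "{doc}" (slot isPlayedBy))'

print(role_to_deftemplate(ROLE_XML))
# (deftemplate lecturer "The person giving a lecture" (slot isPlayedBy))
```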

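Compiling Situations to Rules
Situations → Petri net → XML description → Rules
CLIPS (JESS):
    (defrule t2EventTransition
      ; transition t2 is triggered and its lock count is non-zero
      ?tr <- (transitionTrigger (name "t2") (lock ?l&:(neq ?l 0)))
      ; place S1 is marked; ?newComer is the entity recorded in S1
      ?pre_S1 <- (situation (name "S1") (entities ?newComer))
      (situationPlace (name "S1") (mark ?m_S1&:(neq ?m_S1 0)))
      ; that entity no longer plays the newComer role
      (or (not (newComer (isPlayedBy ?newComer))))
      ; the lecturer and the speaker are verified to be the same agent
      (lecturer (isPlayedBy ?new_lecturer))
      (speaker (isPlayedBy ?new_speaker))
      (isSameAs (isVerifiedBy ?new_speaker ?new_lecturer))
      ?post_S2 <- (situation (name "S2"))
      =>
      (modify ?tr (lock (- ?l 1)))                             ; consume one unit of the lock
      (assert (event (name "t2")))                             ; announce that t2 fired
      (modify ?pre_S1 (entities))                              ; clear the entities of S1
      (modify ?post_S2 (entities ?new_lecturer ?new_speaker))  ; record them in S2
    )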
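The generated rule implements one transition of the Petri net built from the situation graph. A minimal Python interpreter for such a net, assuming one place per situation and guards written as predicates over role bindings, might look like this. The names `Transition` and `SituationNet` are hypothetical; the sketch moves the token from S1 to S2 directly, whereas the generated rule asserts a `t2` event and manages a lock counter, which is not modelled here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Facts = Dict[str, str]  # role name -> id of the agent playing it

@dataclass
class Transition:
    name: str
    source: str  # input place (situation)
    target: str  # output place (situation)
    guard: Callable[[Facts], bool]

class SituationNet:
    """Tiny Petri-net interpreter with one place per situation."""

    def __init__(self, marking: Dict[str, int], transitions: List[Transition]):
        self.marking = dict(marking)
        self.transitions = transitions
        self.events: List[str] = []

    def step(self, facts: Facts) -> bool:
        """Fire the first enabled transition; return True if one fired."""
        for t in self.transitions:
            if self.marking.get(t.source, 0) > 0 and t.guard(facts):
                self.marking[t.source] -= 1
                self.marking[t.target] = self.marking.get(t.target, 0) + 1
                self.events.append(t.name)
                return True
        return False

# t2: S1 -> S2, enabled when the lecturer and speaker roles are played by the same agent.
t2 = Transition("t2", "S1", "S2",
                lambda f: "lecturer" in f and f.get("speaker") == f["lecturer"])
net = SituationNet({"S0": 0, "S1": 1, "S2": 0, "S3": 0}, [t2])
print(net.step({"lecturer": "agent-1", "speaker": "agent-2"}))  # False
print(net.step({"lecturer": "agent-1", "speaker": "agent-1"}))  # True
print(net.marking, net.events)
```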

Plan
• Situations and Actions for the Lecture Scenario
• Assembling a Process Federation
  • Perceptual Processes
  • Tracking Bodies, Hands and Faces
  • Recognizing and Locating Speech Sounds
• Version 1.0 of the Intelligent Camera Man
• Current Work

Video Acquisition Process Federation
[Figure: a Supervisory Process and an Event Bus connecting three Agent Tracker processes, each attached to a camera, and Speech Detection and Speech Location processes attached to the Microphone Array.]

Processes to Observe Agents, Entities and Relations
• P1: Supervisory controller
• P2: Visual tracking process for agents with Camera 1 (wide-angle camera)
• P3: Visual tracking process for agents with Camera 2 (audience region)
• P4: Visual tracking process for agents with Camera 3 (lecturer region)
• P5: Speech preprocessing and detection
• P6: Speech position estimation

Agent Detection and Tracking Process
• Observation modules:
  • Color histogram ratio
  • Background difference
  • Receptive field histograms
  • Motion history image
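Of the observation modules, the color histogram ratio is the simplest to show. The sketch assumes colors are already quantised into cells (for example normalised (r, g) bins) and estimates the probability that a pixel belongs to the target as the ratio of the target histogram to the histogram of the whole image. The function names and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Color = Tuple[int, int]  # a quantised colour cell, e.g. normalised (r, g) bins

def histogram_ratio(target_pixels: Iterable[Color],
                    image_pixels: Iterable[Color]) -> Dict[Color, float]:
    """Estimate p(target | colour) as h_target(c) / h_total(c) for every colour seen."""
    h_target = Counter(target_pixels)
    h_total = Counter(image_pixels)
    return {c: h_target[c] / n for c, n in h_total.items()}

def probability_map(image: List[List[Color]],
                    ratio: Dict[Color, float]) -> List[List[float]]:
    # Colours never seen while building the histograms get probability 0.
    return [[ratio.get(c, 0.0) for c in row] for row in image]

image = [[(1, 1), (1, 1), (2, 0)],
         [(2, 0), (1, 1), (0, 3)]]
target = [(1, 1), (1, 1), (2, 0)]  # pixels labelled as target (e.g. skin)
ratio = histogram_ratio(target, [c for row in image for c in row])
print(probability_map(image, ratio))
```

The resulting probability map is what a tracker would then threshold or weight to locate the target.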

Agent Detection and Tracking Process
• Process phases:
  While true do
    • Acquire the next image
    • Calculate the region of interest (ROI) for each target
    • Verify and update targets
    • Detect new targets
    • Regulate module parameters
    • Interpret entities
    • Process messages
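The cyclic structure of the process can be sketched as a loop. In this hypothetical version each "image" is replaced by a list of detected blob centres, the ROI is predicted with a constant-velocity model plus a fixed margin, and a target is kept only if a detection falls inside its ROI; targets with no detection are dropped. Parameter regulation, interpretation and message processing are left as a comment, and a finite sequence of frames stands in for the `While true` loop. The names `predict_roi` and `run_tracker` are illustrative.

```python
from typing import Dict, Iterable, List, Tuple

Target = Dict[str, float]  # hypothetical state: {"x", "y", "vx", "vy"}
Detection = Tuple[float, float]

def predict_roi(t: Target, margin: float = 2.0) -> Tuple[float, float, float, float]:
    # Predict the next position with constant velocity, then grow by a search margin.
    x, y = t["x"] + t["vx"], t["y"] + t["vy"]
    return (x - margin, y - margin, x + margin, y + margin)

def run_tracker(frames: Iterable[List[Detection]]) -> List[Target]:
    targets: List[Target] = []
    for detections in frames:                          # acquire next image
        unused = list(detections)
        updated: List[Target] = []
        for t in targets:
            x0, y0, x1, y1 = predict_roi(t)            # calculate ROI for targets
            hits = [d for d in unused if x0 <= d[0] <= x1 and y0 <= d[1] <= y1]
            if hits:                                   # verify and update targets
                d = hits[0]                            # take the first detection in the ROI
                unused.remove(d)
                updated.append({"x": d[0], "y": d[1],
                                "vx": d[0] - t["x"], "vy": d[1] - t["y"]})
        for d in unused:                               # detect new targets
            updated.append({"x": d[0], "y": d[1], "vx": 0.0, "vy": 0.0})
        targets = updated
        # Regulating module parameters, interpreting entities and processing
        # messages would follow here; they are omitted in this sketch.
    return targets

print(run_tracker([[(0.0, 0.0)], [(1.0, 0.0)], [(2.0, 0.0), (10.0, 10.0)]]))
```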

Agent Detection and Tracking
• Actors: composite entities
• Entity tracker: background difference, motion and color
• Entity grouper: assigns blobs the roles of body, hands, face or eyes

Acoustic Perception Processes
• Audio Router: synchronized audio channels (Channel 1 to Channel n)
• Audio preprocessing: hardware offset removal, echo cancellation
• Voice activity detection: energy analysis, speech signal recognition
• Audio Localisation: Time Difference Of Arrival (TDOA), then geometric coordinate evaluation (4 microphones => 3D localisation)
• Speech recognition
• Communication: software bus, TCP/IP clients and a TCP/IP server
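The TDOA step can be illustrated for one pair of channels with plain cross-correlation: the lag that maximises the correlation gives the delay in samples, which converts into a path-length difference through the speed of sound. This is an assumed, simplified estimator; practical systems often use generalized cross-correlation with phase transform (GCC-PHAT), and the slides do not say which estimator is used. The function names are hypothetical.

```python
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C

def tdoa_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) maximising sum_n a[n] * b[n + lag].

    A positive lag means the sound reaches microphone b later than microphone a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(a[n] * b[n + lag] for n in range(len(a)) if 0 <= n + lag < len(b))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def path_difference(lag: int, sample_rate: float) -> float:
    # Difference (m) between the source-to-b and source-to-a distances.
    return lag / sample_rate * SPEED_OF_SOUND

a = [0, 0, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 1, 0, 0, 0]  # same impulse, two samples later
lag = tdoa_lag(a, b, max_lag=3)
print(lag, path_difference(lag, sample_rate=16000))
```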

Acoustic Process Federation
[Figure: the Supervisory Controller, Context Tracker, Audio Router and Audio Localisation processes communicate over the software bus, exchanging configuration messages, commands, speech activity messages, and the positions of video targets and of audio localisation targets.]

Multi-channel Acoustic Server Process
• Channel selection
• Remove soundcard offset
• Adaptive cepstral echo cancellation
• Temporal energy analysis
• Voice detection
• Final voice activity detection (using energy and neural net results)
[Figure: interface showing the speech waveform, spectrogram and usage.]
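The temporal energy analysis stage can be sketched as frame energies compared against a noise floor. Here the noise floor is assumed to be the quietest frame and the 10 dB margin is an arbitrary choice; the neural-net detector that the final decision also uses is not modelled. The function names are hypothetical.

```python
import math
from typing import List, Sequence

def frame_energies_db(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean-square energy in dB for consecutive non-overlapping frames."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        e = sum(x * x for x in frame) / frame_len
        energies.append(10.0 * math.log10(e + 1e-12))  # epsilon avoids log10(0)
    return energies

def energy_vad(signal: Sequence[float], frame_len: int, margin_db: float = 10.0) -> List[bool]:
    """Flag frames whose energy exceeds the noise floor by more than margin_db."""
    energies = frame_energies_db(signal, frame_len)
    if not energies:
        return []
    floor = min(energies)  # assumption: the quietest frame is background noise
    return [e > floor + margin_db for e in energies]

signal = [0.001] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.001] * 4
print(energy_vad(signal, frame_len=4))
```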

Acoustic Position Estimation
[Figure: interface showing the microphones, room map, current target, confidence, configuration, processing flag, and software bus connection status.]
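Going from TDOAs to a 3D position with four microphones can be sketched as a grid search over the room: each candidate point predicts the delays relative to a reference microphone, and the point with the smallest squared error is kept. The microphone layout, room size, grid step and the name `locate` are hypothetical; a closed-form or iterative least-squares solver would be faster than this brute-force search.

```python
import itertools
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
SPEED_OF_SOUND = 343.0  # m/s

def locate(mics: Sequence[Point], tdoas: Sequence[float],
           room: Point, step: float = 0.1) -> Point:
    """Grid search for the point whose predicted TDOAs best match the measured ones.

    tdoas[i - 1] is the arrival time at mics[i] minus the arrival time at mics[0], in seconds.
    """
    nx, ny, nz = (int(round(r / step)) + 1 for r in room)
    best, best_err = (0.0, 0.0, 0.0), float("inf")
    for i, j, k in itertools.product(range(nx), range(ny), range(nz)):
        p = (i * step, j * step, k * step)
        d0 = math.dist(p, mics[0])  # math.dist requires Python 3.8+
        err = sum(((math.dist(p, m) - d0) / SPEED_OF_SOUND - t) ** 2
                  for m, t in zip(mics[1:], tdoas))
        if err < best_err:
            best, best_err = p, err
    return best

# Four non-coplanar microphones in a hypothetical 4 m x 4 m x 3 m room.
mics = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 3.0)]
source = (1.0, 2.0, 1.5)
d0 = math.dist(source, mics[0])
tdoas = [(math.dist(source, m) - d0) / SPEED_OF_SOUND for m in mics[1:]]
print(locate(mics, tdoas, room=(4.0, 4.0, 3.0), step=0.5))
```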

Speech Detection Process

Context Aware Video Acquisition v1.0

Plan
• Situations and Actions for the Lecture Scenario
• Assembling a Process Federation
• Version 1.0 of the Intelligent Camera Man
• Current Work
  • Adding camera pan-tilt control
  • Adding new cameras
  • Estimating face orientation
  • Integration of a topic spotter

Camera Control

Face Orientation Estimation

PRIMA Group, Laboratoire GRAVIR (UMR)
• INPG (P2), INRIA (P8), UJF (P9), CNRS (P10)
• Personnel contributing during the period:
  • James L. Crowley (Professor, INPG)
  • Augustin Lux (Professor, INPG)
  • Patrick Reignier (Associate Professor (Maître de Conférences), UJF)
  • Dominique Vaufreydaz (Postdoctoral researcher, INPG)
  • Alban Caparossi (Engineer, UJF)
  • Stan Borkowski (Doctoral student, INPG)
  • Hai Tranh (Doctoral student, INPG)
  • Nicolas Gourier (Doctoral student, INRIA)
