Knowledge-based event recognition from salient regions of activity

Nicolas Moënne-Loccoz
Viper group, Computer vision & multimedia laboratory, University of Geneva
M4 – Meeting – January 2004
January 23 2003 / Nicolas.Moenne-Loccoz@cui.unige.ch

Outline
• Context
• Salient Regions of Activity (SRA)
• Learning the semantics of SRA
• Visual Event Query language
• Conclusion
NML - CVML - UniGe

Context
• Retrieval of visual events based on user queries
  • Abstract representation of the visual content
  • Query language to express visual events
• Approach
  • Region-based description of the content
  • Classification of the regions
  • Events queried as spatio-temporal constraints on the regions

Overview
• Pipeline: Video database → Region extraction → Salient regions of activity → Classification → Labelled regions
• Inputs: Domain knowledge, User queries

Salient regions of activity
• Regions of the image space
  • Moving in the scene
  • Having a homogeneous colour distribution
→ Moving objects or meaningful parts of moving objects
• Extraction:
  • From moving salient points
  • By an adaptive mean-shift algorithm

Salient points extraction
• Scale-invariant interest points (Mikolajczyk, Schmid 2001)
• Extracted in the linear scale-space
  • Local maxima of the scale-normalized Harris function (image space)
  • Local maxima of the scale-normalized Laplacian (scale space)
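
The two detection criteria above can be written out as follows. This is a sketch assuming the usual definitions from the cited Mikolajczyk and Schmid paper; the symbols (L for the Gaussian-smoothed image and its derivatives, σ_I and σ_D for the integration and derivation scales, α for an empirical constant) are not given on the slides and are introduced here for illustration only.

```latex
% Scale-adapted second moment matrix
% (integration scale \sigma_I, derivation scale \sigma_D)
\mu(\mathbf{x}, \sigma_I, \sigma_D) = \sigma_D^2 \, g(\sigma_I) *
\begin{pmatrix}
L_x^2(\mathbf{x}, \sigma_D) & L_x L_y(\mathbf{x}, \sigma_D) \\
L_x L_y(\mathbf{x}, \sigma_D) & L_y^2(\mathbf{x}, \sigma_D)
\end{pmatrix}

% Scale-normalized Harris function: local maxima over the image position \mathbf{x}
H(\mathbf{x}, \sigma_I, \sigma_D) = \det(\mu) - \alpha \, \operatorname{trace}^2(\mu)

% Scale-normalized Laplacian: local maxima over the scale \sigma
\left| \mathrm{LoG}(\mathbf{x}, \sigma) \right|
  = \sigma^2 \left| L_{xx}(\mathbf{x}, \sigma) + L_{yy}(\mathbf{x}, \sigma) \right|
```

A point is kept when it is a local maximum of H in the image plane and of the normalized Laplacian across scales, which is what gives each point its characteristic scale.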

Salient points extraction
• Example: scale

Salient points trajectories
• Trajectories used to:
  • Find salient points moving in the scene
  • Track salient points over time
• Point matching using local grayvalue invariants (Schmid)

Salient points trajectories
• Mahalanobis distance:
• Set of matching points minimizes
• Greedy Winner-Takes-All algorithm
• Set of point trajectories
• Moving salient points:
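
The matching step can be illustrated with a minimal sketch. It is not the author's implementation: it assumes a diagonal covariance for the descriptor components, and the names `greedy_match`, `variances` and the rejection threshold `max_dist` are hypothetical.

```python
import math

def mahalanobis(u, v, variances):
    """Mahalanobis distance between two descriptors, assuming a diagonal
    covariance matrix given as per-component variances (a simplification)."""
    return math.sqrt(sum((a - b) ** 2 / s for a, b, s in zip(u, v, variances)))

def greedy_match(desc_prev, desc_next, variances, max_dist):
    """Greedy winner-takes-all matching between the points of two frames.

    Candidate pairs are visited by increasing distance. A pair is kept only
    if neither point is already matched and its distance does not exceed
    max_dist (a hypothetical rejection threshold).
    """
    candidates = sorted(
        (mahalanobis(u, v, variances), i, j)
        for i, u in enumerate(desc_prev)
        for j, v in enumerate(desc_next)
    )
    used_prev, used_next, matches = set(), set(), []
    for dist, i, j in candidates:
        if dist > max_dist:
            break  # all remaining candidates are farther apart
        if i in used_prev or j in used_next:
            continue
        used_prev.add(i)
        used_next.add(j)
        matches.append((i, j))
    return matches

# Toy descriptors: two points have a close counterpart, the third has none.
prev_frame = [(0.0, 0.0), (5.0, 5.0), (9.0, 1.0)]
next_frame = [(5.2, 4.9), (0.1, -0.1), (20.0, 20.0)]
matches = greedy_match(prev_frame, next_frame, variances=(1.0, 1.0), max_dist=1.0)
print(matches)
```

Chaining such matches from frame to frame yields the point trajectories; a point whose trajectory shows displacement would then count as a moving salient point.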

Salient regions estimation
• Estimate characteristic regions of the moving salient points
• Mean-shift algorithm: estimate the position
  • Likelihood of pixels (RGB colour distribution)
  • Ellipsoidal Epanechnikov kernel
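
The position estimate can be sketched as below. With an Epanechnikov kernel, each mean-shift step reduces to the likelihood-weighted mean of the pixel positions inside the kernel support. The sketch makes two simplifying assumptions that are not in the slides: the pixel likelihood map is given directly, and the kernel is circular rather than ellipsoidal. The function name and parameters are hypothetical.

```python
def mean_shift(likelihood, start, radius, max_iter=20, eps=1e-3):
    """Move a circular window to a local mode of a pixel likelihood map.

    likelihood: 2-D list with likelihood[y][x] >= 0
    start:      (x, y) initial centre, e.g. a moving salient point
    radius:     kernel bandwidth in pixels
    """
    cx, cy = start
    for _ in range(max_iter):
        total = sx = sy = 0.0
        for y, row in enumerate(likelihood):
            for x, w in enumerate(row):
                # Only pixels inside the kernel support contribute.
                if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                    total += w
                    sx += w * x
                    sy += w * y
        if total == 0.0:
            break  # no likely pixel in the window: stay in place
        nx, ny = sx / total, sy / total
        shift = ((nx - cx) ** 2 + (ny - cy) ** 2) ** 0.5
        cx, cy = nx, ny
        if shift < eps:
            break  # converged
    return cx, cy

# Toy map: a 3x3 block of likely pixels centred on (6, 5).
likelihood = [
    [1.0 if 5 <= x <= 7 and 4 <= y <= 6 else 0.0 for x in range(10)]
    for y in range(10)
]
mode = mean_shift(likelihood, start=(3.0, 3.0), radius=3.0)
print(mode)
```

The window starts on the edge of the blob and moves to its centre in a few iterations.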

Salient regions estimation
• Kernel adaptation step: estimate shape and size
• Algorithm:

Salient regions representation
• Set of salient regions of activity represented by:
  • Position
  • Ellipsoid
  • Colour distribution
  • Set of salient points
• Salient regions tracking
  • Regions are matched by a majority vote of their salient points
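
The majority vote used for region tracking might look like the following sketch. It assumes that each salient point belongs to at most one region and that point matches between consecutive frames are already available; the data layout and the name `match_regions` are hypothetical.

```python
from collections import Counter

def match_regions(regions_prev, regions_next, point_matches):
    """Match regions across two frames by majority vote of their salient points.

    regions_prev, regions_next: dict mapping region id -> set of point ids
    point_matches: dict mapping a point id in the previous frame to the id
                   of its matched point in the next frame
    Returns a dict: previous region id -> next region id, or None if no
    point of the region has a match inside a region of the next frame.
    """
    owner_next = {p: r for r, pts in regions_next.items() for p in pts}
    result = {}
    for region, points in regions_prev.items():
        votes = Counter(
            owner_next[point_matches[p]]
            for p in points
            if p in point_matches and point_matches[p] in owner_next
        )
        result[region] = votes.most_common(1)[0][0] if votes else None
    return result

regions_prev = {"A": {1, 2, 3}, "B": {4, 5, 6}, "C": {7}}
regions_next = {"X": {11, 12, 14}, "Y": {13, 15, 16}}
point_matches = {1: 11, 2: 12, 3: 13, 4: 14, 5: 15, 6: 16}
tracked = match_regions(regions_prev, regions_next, point_matches)
print(tracked)
```

Region A is matched to X even though one of its points drifted into Y, which is the robustness a vote is meant to provide.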

Salient regions of activity

Regions classification
• To obtain an abstract description:
  • Map regions to a domain-specific basic vocabulary
→ Meetings: {Arm, Head, Body, Noise}
• SVM classifier:
  • Set of 500 annotated salient regions of activity (~200 frames)

Regions classification
• Confusion matrix:
• Discussion:
  • Noise class is ill-defined
  • Good results explained by the limited number of classes

Visual event language
• To express visual event queries
  • Spatio-temporal constraints on labelled regions (LR)
• To integrate domain knowledge
  • As specification of the layout (L)
  • As a set of basic events
• A formula of the language is a conjunction of:
  • Temporal relations {after, just-after} between 2 LR
  • Spatial relations {above, left} between 2 LR; {in} between a LR and a L
  • Identity relations {is} between 2 LR; {is-a} between a LR and a label
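
The relations of the language can be sketched as boolean predicates over labelled regions. Everything concrete here is an assumption made for illustration: the `LabelledRegion` fields, the rectangular layout zones and their coordinates, the reading of just-after as "at most one frame later", and the reading of `is` as "same tracked region". The `in` and `is` relations are renamed `inside` and `same` because both are Python keywords.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabelledRegion:
    track: int   # identity of the tracked region
    label: str   # class from the basic vocabulary, e.g. "Head"
    frame: int   # frame index
    x: float     # centre position, image coordinates
    y: float

# Hypothetical layout: each zone is an axis-aligned box (x0, y0, x1, y1).
LAYOUT = {
    "SEATS": (0, 60, 100, 100),
    "DOOR": (0, 0, 10, 50),
    "BOARD": (80, 0, 100, 50),
}

def after(a, b):
    return a.frame > b.frame

def just_after(a, b, max_gap=1):
    return 0 < a.frame - b.frame <= max_gap

def above(a, b):
    return a.y < b.y  # the image y axis points down

def left(a, b):
    return a.x < b.x

def inside(zone, region):
    x0, y0, x1, y1 = LAYOUT[zone]
    return x0 <= region.x <= x1 and y0 <= region.y <= y1

def same(a, b):
    return a.track == b.track

def is_a(label, region):
    return region.label.lower() == label.lower()

standing_head = LabelledRegion(track=1, label="Head", frame=10, x=50, y=40)
seated_head = LabelledRegion(track=1, label="Head", frame=11, x=50, y=70)

# A formula is a conjunction of (possibly negated) relations.
formula_holds = (
    is_a("head", standing_head)
    and same(standing_head, seated_head)
    and just_after(seated_head, standing_head)
    and not inside("SEATS", standing_head)
    and inside("SEATS", seated_head)
)
print(formula_holds)
```

Because every relation returns a hard true or false, a formula either holds or fails outright for a given binding of its regions.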

Knowledge - Meetings
• Scene layout: L = {SEATS, DOOR, BOARD}

Knowledge - Meetings
• Basic events: {Meeting-participant, sitting, standing}
• Meeting-participant: actor: LR
  • constraints: is-a(head, LR).
• Sitting: actor: LR
  • constraints: Meeting-participant(LR), in(SEATS, LR).
• Standing: actor: LR
  • constraints: Meeting-participant(LR), ~in(SEATS, LR).

Event queries
• Examples of user queries:
• Sitting-down: actors: LR1, LR2
  • constraints: is(LR1, LR2), sitting(LR1), standing(LR2), just-after(LR1, LR2).
• Go-to-board: actors: LR1, LR2
  • constraints: is(LR1, LR2), standing(LR1), ~in(BOARD, LR1), standing(LR2), in(BOARD, LR2), just-after(LR2, LR1).
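
The Sitting-down query can be evaluated with a brute-force search over pairs of labelled regions, as in the sketch below. The `Region` tuple, the seat zone coordinates, the reading of just-after as "exactly one frame later" and the `query` helper are all assumptions made for illustration, not the system described in the slides.

```python
from collections import namedtuple
from itertools import permutations

Region = namedtuple("Region", "track label frame x y")
SEATS = (0, 60, 100, 100)  # hypothetical seat zone (x0, y0, x1, y1)

def in_zone(zone, r):
    x0, y0, x1, y1 = zone
    return x0 <= r.x <= x1 and y0 <= r.y <= y1

def meeting_participant(r):
    return r.label == "Head"  # is-a(head, LR)

def sitting(r):
    return meeting_participant(r) and in_zone(SEATS, r)

def standing(r):
    return meeting_participant(r) and not in_zone(SEATS, r)

def just_after(a, b):
    return a.frame == b.frame + 1  # assumed: consecutive frames

def sitting_down(lr1, lr2):
    return (
        lr1.track == lr2.track  # is(LR1, LR2)
        and sitting(lr1)
        and standing(lr2)
        and just_after(lr1, lr2)
    )

def query(event, regions):
    """Return every ordered pair of labelled regions that satisfies the event."""
    return [(a, b) for a, b in permutations(regions, 2) if event(a, b)]

regions = [
    Region(1, "Head", 10, 50, 40),  # participant 1, standing
    Region(1, "Head", 11, 50, 70),  # participant 1, seated one frame later
    Region(2, "Head", 10, 20, 80),  # participant 2, seated throughout
    Region(2, "Head", 11, 20, 80),
    Region(3, "Arm", 11, 55, 75),   # not a head, so not a participant
]
hits = query(sitting_down, regions)
print(hits)
```

Only participant 1 is returned. Go-to-board would follow the same pattern with a BOARD zone in place of SEATS.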

Event queries - Results
• Results:
• Discussion:
  • Recall validates the retrieval capability
  • False alarms occur because of the hard decision

Conclusion
• Contributions
  • Well-suited framework for constrained domains
  • Generic representation of the visual content
  • Paradigm to retrieve visual events from videos
• Limitations
  • Cannot retrieve all visual events (e.g. emotion)
• Ongoing work
  • Uncertainty handling and fuzziness
  • Integration of other modalities (e.g. transcripts)
