Evaluation of Research Theme CogB
Objectives • LEAR: LEArning and Recognition in vision • Visual recognition and scene understanding • Particular objects and scenes • Object classes and categories • Human motion and actions • Strategy: Robust image description + learning techniques
Axes • Robust image description • Appropriate descriptors for objects and categories • Statistical modeling and machine learning for vision • Selection and adaptation of existing techniques • Visual object recognition and scene understanding • Description + learning
Overview • Presentation of the team • Positioning within INRIA and internationally • Progress towards initial goals • Main scientific contributions • Future – next four years
Team • Creation of the LEAR team in July 2003
Positioning in INRIA • Main INRIA strategic challenge: Developing multimedia data and multimedia information processing • The only INRIA team with object recognition as its central goal • Expertise in image description and applied learning
INRIA teams with related themes • Imedia: indexing, navigation and browsing in large multi-media data streams • TexMex: management of multi-media databases, handling large data collections and developing multi-media and text descriptors • Vista: analysis of image sequences, motion descriptors • Ariana: image processing for remote sensing
International positioning • In France and Europe: a few groups work on the problem (Amsterdam, Oxford, Leuven, TU Darmstadt) • In the US: several groups use machine learning for visual recognition (CMU, Caltech, MIT, UBC, UCB, UCLA, UIUC) • Competitive results compared to the above groups in • Image description (scale and affine invariant regions) • Classification and localization of object categories; winner of 14 out of 18 tasks of the PASCAL object recognition challenge • Learning-based human motion modeling
Progress towards initial goals • LEAR was created two and a half years ago • Significant progress towards each goal, especially • Category classification and detection • Machine learning • Scientific production • Publications (65 journals, conferences & books in 3 years, mainly in the most competitive journals and conferences) • Software, databases available on our web page • Collaborations (INRIA team MISTIS, UIUC in the US, ANU in Australia, Oxford, Leuven, LASMEA Clermont-Ferrand …)
Progress towards initial goals • Industrial contracts (MBDA, Bertin Technologies, Thales Optronics, Techno-Vision project Robin) • Research contracts (French grant ACI “Large quantities of data” MoviStar, EU network PASCAL, EU project AceMedia, EU project CLASS, EADS and Marie Curie postdoctoral grants) • Scientific organization (Editorial boards of PAMI and IJCV; program committees/area chairs of all major computer vision conferences; organization of ICCV’03 and CVPR’05; vice-head of AFRIF; co-ordination of EU project CLASS, Techno-Vision project Robin and ACI MoviStar)
Main contributions - overview • Image descriptors • Scale- and affine-invariant detectors + descriptors • Local dense representations • Shape descriptors • Color descriptors • Learning • Clustering • Dimensionality reduction • Markov random fields • SVM kernels
Main contributions - overview • Object recognition • Texture recognition • Bag-of-features representation • Spatial features (semi-local parts, hierarchical spatial model) • Multi-class hierarchical classification • Recognition with 3D models • Human detection • Human tracking and action recognition • Learning dynamical models for 2D articulated human tracking • 3D human pose and motion from monocular images
Invariant detectors and descriptors • Scale- and affine-invariant keypoint detectors [IJCV’04] • Matching in the presence of large viewpoint changes
Invariant detectors and descriptors • Evaluation of detectors and descriptors [PAMI’05, IJCV’06] • Database with different scene types (textured and structured) and transformations • Definition of evaluation criteria • Collaboration with Oxford, Leuven, Prague • Database and binaries available on the web • 4000 accesses and 1000 downloads
Dense representation • Dense multi-scale local descriptors [ICCV’05] • Still local, but captures more of the available information • Clustering to obtain representative features • Our clustering algorithm copes with clusters of very different densities • Feature selection determines the most characteristic clusters
Bag-of-features for image classification • Extract regions • Compute descriptors • Find clusters and frequencies • Compute distance matrix • Classification (SVM)
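The middle steps of the pipeline above (cluster frequencies and the distance matrix) can be sketched as follows. This is a minimal illustration, not the team's implementation: the cluster centers are assumed to be given, the descriptors are toy 2D vectors, and the chi-square distance is an assumption, since the slide does not say which distance is used. All function names are hypothetical.

```python
from collections import Counter


def nearest_center(descriptor, centers):
    """Index of the cluster center closest to a descriptor (squared Euclidean)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(descriptor, center))
    return min(range(len(centers)), key=lambda k: sq_dist(centers[k]))


def bag_of_features(descriptors, centers):
    """Normalized histogram of cluster frequencies for one image."""
    if not descriptors:
        return [0.0] * len(centers)
    counts = Counter(nearest_center(d, centers) for d in descriptors)
    return [counts.get(k, 0) / len(descriptors) for k in range(len(centers))]


def chi2_distance(h1, h2):
    """Chi-square distance between two histograms (assumed choice of distance)."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def distance_matrix(histograms):
    """Pairwise distances between image histograms, e.g. as input to a kernel."""
    return [[chi2_distance(hi, hj) for hj in histograms] for hi in histograms]


# Toy data: two cluster centers and two "images" with four descriptors each.
centers = [(0.0, 0.0), (1.0, 1.0)]
image_a = [(0.1, 0.0), (0.0, 0.2), (0.9, 1.0), (1.1, 0.9)]
image_b = [(0.0, 0.1), (0.1, 0.1), (0.2, 0.0), (1.0, 1.0)]

hist_a = bag_of_features(image_a, centers)
hist_b = bag_of_features(image_b, centers)
D = distance_matrix([hist_a, hist_b])
```

A distance matrix of this kind is typically turned into a kernel before training the SVM; that final step is omitted here.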
Bag-of-features for image classification • Example categories: bikes, books, building, cars, people, phones, trees • Excellent results in the presence of background clutter • Our team won all image classification tasks of the PASCAL network challenge on visual object recognition
Recognition with spatial relations • Approach [ICCV’05] • Semi-local parts: point regions and similar geometric neighborhood structure • Validation, i.e. part selection • Learn a probabilistic model of the object class (discriminative maximum entropy framework)
Recognition with spatial relations • Improved recognition for classes with structure
Human detection [CVPR’05] • Histogram of oriented image gradients as image descriptor • SVM as classifier, importance weighted descriptors • Winner of the PASCAL challenge on human detection
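To make the descriptor concrete, the sketch below computes a gradient-orientation histogram for a single image cell. It is a simplified illustration under stated assumptions, not the published method: the 9 unsigned orientation bins, the central-difference gradients and the plain L2 normalization are choices made here, and the block structure and SVM stage are left out. The function name is hypothetical.

```python
import math


def orientation_histogram(patch, n_bins=9):
    """Magnitude-weighted histogram of gradient orientations for one cell.

    patch is a list of rows of grey values. Orientations are unsigned
    (0 to 180 degrees); n_bins=9 is an assumed setting.
    """
    rows, cols = len(patch), len(patch[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    # Central differences on interior pixels only (border pixels are skipped).
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against angle rounding up to 180.0.
            hist[int(angle // bin_width) % n_bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


# A vertical edge has horizontal gradients, so all votes fall in the first bin.
vertical_edge = [[0, 0, 1, 1] for _ in range(4)]
h_vertical = orientation_histogram(vertical_edge)

# A horizontal edge has vertical gradients (90 degrees), i.e. the middle bin.
horizontal_edge = [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
h_horizontal = orientation_histogram(horizontal_edge)
```

In a full detector, histograms from many cells of a detection window would be concatenated into one feature vector and passed to the classifier.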
Evaluation of category recognition • Techno-Vision project Robin (2005-2007) • Funded by the French ministries of defence and of research • Construction of datasets and ground truth • Industrial partnership with MBDA, SAGEM, THALES, Bertin Tech, Cybernetix, EADS and CNES • Production of six datasets with thousands of annotated images, from satellite images to ground level images
Evaluation of category recognition • Evaluation metrics for category classification and localization in collaboration with ONERA and CTA/DGA • Organization of competitions in 2006, 38 registered participants (research teams) at the moment • Datasets, metrics and evaluation tools will be publicly available for benchmarking
Learning-based human motion capture [CVPR’04, ICML’04, PAMI’06] • Best student paper at the Rank Foundation Symposium on Machine Understanding of People
Future – next four years • The major objectives remain valid • Image description [low risk] • Learn image descriptors [PhD of D. Larlus] • Shape descriptors [postdoc of V. Ferrari] • Color descriptors [postdoc of J. Van de Weijer] • Spatial relations [PhD of M. Marszalek] • Learning [medium risk] • Semi- & unsupervised learning, automatic annotation • Hierarchical structuring of categories • Existing collaborations, EU project CLASS, postdoc of J. Verbeek
Future – next four years • Object recognition • Object detection & localization [low risk] • Large number of object categories [medium risk] • Scene interpretation [high risk] • Human modeling and action recognition • Pose & motion for humans in general conditions [PhD of A. Agarwal] • Recognition of actions and interactions