
Jenny Benois-Pineau, LaBRI – Université de Bordeaux – CNRS UMR 5800/ University Bordeaux1

Jenny Benois-Pineau, LaBRI – Université de Bordeaux – CNRS UMR 5800 / University Bordeaux 1; H. Boujut, V. Buso, L. Letoupin, Ivan Gonsalez-Diaz (University Bordeaux 1); Y. Gaestel, J.-F. Dartigues (INSERM).


Presentation Transcript


  1. Jenny Benois-Pineau, LaBRI – Université de Bordeaux – CNRS UMR 5800 / University Bordeaux 1; H. Boujut, V. Buso, L. Letoupin, Ivan Gonsalez-Diaz (University Bordeaux 1); Y. Gaestel, J.-F. Dartigues (INSERM) • Egocentric vision from wearable cameras for studies of neurodegenerative diseases: visual attention maps

  2. Summary • Introduction and motivation • Wearable video • Visual attention/saliency maps from wearable video: application to recognition of manipulated objects • Perspectives

  3. Introduction and motivation • Recognition of Instrumental Activities of Daily Living (IADL) of patients suffering from Alzheimer's disease • Decline in IADL is correlated with future dementia • IADL analysis: • Survey for the patient and relatives → subjective answers • Observations of IADL with the help of video cameras worn by the patient at home • Objective observations of the evolution of the disease • Adjustment of the therapy for each patient: IMMED ANR, Dem@care IP FP7 EC

  4. Context • Projects: • ANR Blanc IMMED 2009 – 2012: LABRI, IMS, ISPED, IRIT • EU FP7 IP Dem@care 2011 – 2015: ITI-CERTH (Gr), INRIA Sophia, UBx1 (LABRI, IMS), CHUN/ISPED, LTU (Sw), DCU/DCU memory clinic (Ir), Cassidian (Fr), Philips (NL), VISTEK ISRA vision (T) • General trend: computer vision and multimedia indexing for healthcare applications (very active in the USA: Georgia Tech, Harvard, USC, etc.) as a non-intrusive, ecological approach

  5. 2. Wearable videos • Video acquisition setup • Wide-angle camera on the shoulder • Non-intrusive and easy-to-use device • IADL capture: from 40 minutes up to 2.5 hours • Natural integration into the home-visit protocol of paramedical assistants • Other devices: Loxie (ear-worn); glasses worn with an eye-tracker (eyebrain(?))

  6. Wearable videos • 4 examples of activities recorded with this camera: • Making the bed, washing dishes, sweeping, hoovering

  7. 4. Visual attention/saliency maps from wearable video: application to recognition of manipulated objects • Introduction • State of the art • Saliency Modeling • Viewpoint: Actor vs. Observer • Object recognition in egocentric videos with saliency • Results • Conclusion

  8. Introduction • Object recognition (Dem@care IP FP7 EU Funded, 7M€) • From wearable camera • Egocentric viewpoint • Manipulated objects from activities of daily living

  9. Window search: a common approach • Objectness [1][2]: a measure that quantifies how likely it is for an image window to contain an object. [1] Alexe, B., Deselaers, T. and Ferrari, V. What is an object? CVPR 2010. [2] Alexe, B., Deselaers, T. and Ferrari, V. Measuring the objectness of image windows. PAMI 2012.

  10. Object Recognition with Saliency • Many objects may be present in the camera field • How to single out the object of interest? • Our proposal: by using visual saliency • Figure: example frames from the IMMED DB

  11. Our approach: modeling visual attention • Several approaches • Bottom-up or top-down • Overt or covert attention • Spatial or spatio-temporal • Scanpath or pixel-based saliency • Features • Intensity, color, and orientation (Feature Integration Theory [1]), HSI or L*a*b* color space • Relative motion [2] • Plenty of models in the literature • In their 2012 survey, A. Borji and L. Itti [3] inventoried 48 significant visual attention methods [1] Anne M. Treisman & Garry Gelade. A feature-integration theory of attention. Cognitive Psychology, vol. 12, no. 1, pages 97–136, January 1980. [2] Scott J. Daly. Engineering Observations from Spatiovelocity and Spatiotemporal Visual Models. In IS&T/SPIE Conference on Human Vision and Electronic Imaging III, volume 3299, pages 180–191, January 1998. [3] Ali Borji & Laurent Itti. State-of-the-art in Visual Attention Modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 99, no. PrePrints, 2012.

  12. State of the Art • Saliency-based approach • [1] performed a recent comparison of action recognition performance using a saliency-based modification of the BoW framework on the Hollywood2 dataset (videos extracted from movies) • Results outperform the previous state of the art with a 61.9% action recognition rate (previously 58.3%) • [1] E. Vig, M. Dorr, and D. Cox. Space-variant descriptor sampling for action recognition based on saliency and eye movements. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, editors, Computer Vision ECCV 2012. • Fig. 1: 6 different saliency masks, from left to right. Top row: unmasked, center mask, empirical saliency mask (eye-tracker). Bottom row: analytical saliency mask, peripheral mask, offset empirical mask.

  13. "Objective"/automatic saliency from video: Itti's model • The most widely used model • Designed for still images • Does not consider the temporal dimension of videos [1] Itti, L., Koch, C., Niebur, E. "A model of saliency-based visual attention for rapid scene analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 20, no. 11, pp. 1254-1259, Nov. 1998.

  14. Spatiotemporal Saliency Modeling • Most spatio-temporal bottom-up methods work in the same way [1], [2] • Extraction of the spatial saliency map (static pathway) • Extraction of the temporal saliency map (dynamic pathway) • Fusion of the spatial and the temporal saliency maps (fusion) [1] Olivier Le Meur, Patrick Le Callet & Dominique Barba. Predicting visual fixations on video based on low-level visual features. Vision Research, vol. 47, no. 19, pages 2483–2498, Sep 2007. [2] Sophie Marat, Tien Ho Phuoc, Lionel Granjon, Nathalie Guyader, Denis Pellerin & Anne Guérin-Dugué. Modelling spatio-temporal saliency to predict gaze direction for short videos. International Journal of Computer Vision, vol. 82, no. 3, pages 231–243, 2009.

  15. Spatial Saliency Model • Based on the sum of 7 color contrast descriptors in the HSI domain [1][2] • Saturation contrast • Intensity contrast • Hue contrast • Opposite color contrast • Warm and cold color contrast • Dominance of warm colors • Dominance of brightness and hue • The 7 descriptors are computed for each pixel of a frame I using its 8-connected neighborhood. • The spatial saliency map is computed from these descriptors (formula not preserved in this transcript). • Finally, the spatial saliency map is normalized between 0 and 1 according to its maximum value. [1] M.Z. Aziz & B. Mertsching. Fast and Robust Generation of Feature Maps for Region-Based Visual Attention. IEEE Transactions on Image Processing, vol. 17, no. 5, pages 633–644, May 2008. [2] Olivier Brouard, Vincent Ricordel & Dominique Barba. Cartes de Saillance Spatio-Temporelle basées Contrastes de Couleur et Mouvement Relatif. In Compression et representation des signaux audiovisuels, CORESA 2009, 6 pages, Toulouse, France, March 2009.
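
The last two steps of the slide above (sum the descriptor maps, then normalize by the maximum) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the 7 contrast descriptor maps have already been computed as non-negative 2D grids of equal size, and the function name `spatial_saliency` is hypothetical. Plain Python lists are used to keep the sketch dependency-free.

```python
def spatial_saliency(descriptor_maps):
    """Sum per-pixel descriptor maps and normalize by the maximum value.

    descriptor_maps: list of 2D lists of equal shape, one per descriptor
    (assumed precomputed and non-negative; computing them is out of scope).
    """
    rows = len(descriptor_maps[0])
    cols = len(descriptor_maps[0][0])
    # Pixel-wise sum over all descriptor maps.
    summed = [[sum(m[r][c] for m in descriptor_maps) for c in range(cols)]
              for r in range(rows)]
    peak = max(max(row) for row in summed)
    if peak == 0:
        return summed  # flat frame: nothing stands out
    # Scale so that the most salient pixel has value 1.
    return [[v / peak for v in row] for row in summed]
```

With two toy maps `[[1, 0], [0, 1]]` and `[[1, 2], [0, 0]]`, the sum is `[[2, 2], [0, 1]]` and the normalized map is `[[1.0, 1.0], [0.0, 0.5]]`.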

  16. Temporal Saliency Model • The temporal saliency map is extracted in 4 steps [Daly 98][Brouard et al. 09][Marat et al. 09] • The optical flow is computed for each pixel of frame i. • The motion is accumulated and the global motion is estimated. • The residual motion is computed (formula not preserved in this transcript). • Finally, the temporal saliency map is computed by filtering the amount of residual motion in the frame (filter parameters not preserved in this transcript).
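
The residual-motion step can be illustrated with a small sketch. Two assumptions are made here that are not in the slides: residual motion is taken as the per-pixel optical flow minus the global motion, and the global motion is approximated by the median flow vector (a translation-only stand-in for whatever global motion model the authors estimate). The final filtering step is omitted because its parameters are not available. The function name is hypothetical.

```python
import math
from statistics import median


def residual_motion_magnitude(flow):
    """Return the magnitude of the flow left after removing global motion.

    flow: 2D list of (dx, dy) optical-flow vectors, one per pixel.
    Global motion is ASSUMED to be a pure translation, estimated here
    as the component-wise median of the flow field.
    """
    global_dx = median(dx for row in flow for dx, _ in row)
    global_dy = median(dy for row in flow for _, dy in row)
    # Residual motion: what moves differently from the camera.
    return [[math.hypot(dx - global_dx, dy - global_dy) for dx, dy in row]
            for row in flow]
```

In a frame where three pixels move with the camera at `(1, 0)` and one pixel moves at `(4, 4)`, only that last pixel keeps a non-zero residual (magnitude 5).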

  17. Saliency Model Improvement • Spatio-temporal saliency models were designed for edited videos • Not well suited for unedited egocentric video streams • Our proposal: • Add a geometric saliency cue that accounts for the anticipation of camera motion [1] H. Boujut, J. Benois-Pineau, and R. Megret. Fusion of multiple visual cues for visual saliency extraction from wearable camera settings with strong motion. In A. Fusiello, V. Murino, and R. Cucchiara, editors, Computer Vision ECCV 2012, IFCV WS.

  18. Geometric Saliency Model • A 2D Gaussian was already applied in the literature [1] • "Center bias", Buswell, 1935 [2] • Suitable for edited videos • Our proposal: • Train the center position as a function of camera position • Move the 2D Gaussian center according to camera center motion • Computed from the global motion estimation • Considers the anticipation phenomenon [Land et al.] [1] Tilke Judd, Krista A. Ehinger, Frédo Durand & Antonio Torralba. Learning to predict where humans look. In ICCV, pages 2106–2113. IEEE, 2009. [2] Michael Dorr, et al. Variability of eye movements when viewing dynamic natural scenes. Journal of Vision (2010), 10(10):28, 1-17. • Figure: geometric saliency map

  19. Geometric Saliency Model • The saliency peak is never located on the visible part of the shoulder • Most of the saliency peaks are located in the top two thirds of the frame • So the 2D Gaussian center is set accordingly (coordinates not preserved in this transcript) • Figure: saliency peaks on frames from all videos of the eye-tracker experiment
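
A geometric saliency map of this kind can be sketched as a 2D Gaussian whose center follows the camera motion. This is only an illustration under stated assumptions: the base center, the spread `sigma`, and the sign convention of the shift (center moved in the direction of the estimated camera motion) are placeholders, since the slides do not preserve the actual values. Both function names are hypothetical.

```python
import math


def shifted_center(base_center, camera_motion):
    """Move the Gaussian center by the estimated camera motion.

    The sign convention (shift in the direction of motion) is an
    ASSUMPTION made for this sketch.
    """
    return (base_center[0] + camera_motion[0],
            base_center[1] + camera_motion[1])


def geometric_saliency(width, height, center, sigma):
    """Return a height x width map holding an isotropic 2D Gaussian."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]
```

For example, `geometric_saliency(5, 5, shifted_center((2, 2), (1, 0)), 1.0)` puts the peak at column 3 of row 2 instead of the frame center.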

  20. Saliency Fusion • Several fusion methods for pooling spatio-temporal saliency cues already exist in the literature (without geometric saliency) [1], [2]. • We have tested three fusion methods on wearable video databases: • Log sum fusion • Squared sum fusion (only on GTEA) • Multiplicative fusion [1] Sophie Marat, Tien Ho Phuoc, Lionel Granjon, Nathalie Guyader, Denis Pellerin & Anne Guérin-Dugué. Modelling spatio-temporal saliency to predict gaze direction for short videos. International Journal of Computer Vision, vol. 82, no. 3, pages 231–243, 2009. [2] H. Boujut, J. Benois-Pineau, T. Ahmed, O. Hadar, and P. Bonnet, "A Metric For No Reference video quality assessment for HD TV Delivery based on Saliency Maps," ICME 2011, Workshop on Hot Topics in Multimedia Delivery, Jul. 2011.
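
The three fusion rules named above can be sketched pixel-wise as follows. The slides give only the names, so the exact formulas here are assumptions chosen as plausible readings: a product of the cues, a sum of `log(1 + s)` terms, and a sum of squares, each followed by normalization to a maximum of 1. All function names are hypothetical.

```python
import math


def multiplicative(values):
    """Product of the cue values at one pixel (assumed formula)."""
    return math.prod(values)


def log_sum(values):
    """Sum of log(1 + s) over the cues at one pixel (assumed formula)."""
    return sum(math.log1p(v) for v in values)


def squared_sum(values):
    """Sum of squared cue values at one pixel (assumed formula)."""
    return sum(v * v for v in values)


def fuse(maps, rule):
    """Apply a pixel-wise fusion rule to equally sized maps, then normalize."""
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[rule([m[r][c] for m in maps]) for c in range(cols)]
             for r in range(rows)]
    peak = max(max(row) for row in fused)
    if peak <= 0:
        return fused
    return [[v / peak for v in row] for row in fused]
```

One practical difference shows up immediately: with the multiplicative rule a pixel is suppressed as soon as one cue is zero, whereas the two additive rules keep it as long as any cue responds.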

  21. Saliency Fusion • Figure panels: frame; spatio-temporal-geometric saliency map; subjective saliency map

  22. Visual attention maps: Subjective Saliency • D. S. Wooding's method, 2002 (tested on over 5000 participants) • Eye fixations from the eye-tracker + 2D Gaussians (fovea area = 2° spread) → subjective saliency map
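
The construction described above (one 2D Gaussian per recorded fixation, accumulated and normalized) can be sketched as below. This is an illustration of the general idea, not Wooding's exact procedure: `sigma` is given directly in pixels, whereas converting the 2° foveal spread into pixels depends on screen size and viewing distance, which the slides do not state. The function name is hypothetical.

```python
import math


def subjective_saliency(width, height, fixations, sigma):
    """Accumulate one Gaussian per fixation and normalize the peak to 1.

    fixations: list of (x, y) pixel positions from the eye-tracker.
    sigma: Gaussian spread in pixels (ASSUMED to be derived elsewhere
    from the 2-degree foveal area and the viewing geometry).
    """
    acc = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                acc[y][x] += math.exp(-d2 / (2 * sigma ** 2))
    peak = max(max(row) for row in acc)
    if peak == 0:
        return acc  # no fixations recorded
    return [[v / peak for v in row] for row in acc]
```

Positions fixated by more subjects accumulate more Gaussians, so they end up with higher values in the normalized map.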

  23. Subjective Saliency

  24. How do people (observers) watch videos from a wearable camera? • Psycho-visual experiment • Gaze measured with an eye-tracker (Cambridge Research Systems Ltd. HS VET 250 Hz) • 31 HD video sequences from the IMMED database • Duration 13’30’’ • 25 subjects (5 discarded) • 6 562 500 gaze positions recorded • We noticed that subjects anticipate camera motion

  25. Evaluation on IMMED DB • Metric: Normalized Scanpath Saliency (NSS) correlation method • Comparison of: • Baseline spatio-temporal saliency • Spatio-temporal-geometric saliency without camera motion • Spatio-temporal-geometric saliency with camera motion • Results: • Up to 50% better than spatio-temporal saliency • Up to 40% better than spatio-temporal-geometric saliency without camera motion • H. Boujut, J. Benois-Pineau, R. Megret: "Fusion of Multiple Visual Cues for Visual Saliency Extraction from Wearable Camera Settings with Strong Motion". ECCV Workshops (3) 2012: 436-445
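
For readers unfamiliar with NSS, the sketch below shows the standard fixation-based form of the metric: the predicted map is standardized to zero mean and unit standard deviation, and the score is the mean standardized value at the fixated pixels. The slides do not say which exact variant the authors use, so this is an assumption, and the function name `nss` is hypothetical.

```python
from statistics import mean, pstdev


def nss(saliency, fixations):
    """Normalized Scanpath Saliency of a predicted map.

    saliency: 2D list of predicted saliency values.
    fixations: list of (x, y) fixated pixel positions.
    Returns 0.0 for a constant map, where standardization is undefined.
    """
    values = [v for row in saliency for v in row]
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return 0.0
    # Average z-score of the prediction at the fixated locations.
    return mean((saliency[y][x] - mu) / sd for x, y in fixations)
```

A positive score means fixations fall on regions predicted as more salient than average, and a score near zero means the prediction is no better than chance.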

  26. Evaluation on GTEA DB • IADL dataset • 8 videos, duration 24’43’’ • Eye-tracking measurements for actors and observers • Actors (8 subjects) • Observers (31 subjects); each video was seen by 15 subjects • SD-resolution video at 15 fps recorded with eye-tracker glasses • A. Fathi, Y. Li, and J. Rehg. Learning to recognize daily actions using gaze. In A. Fitzgibbon, S. Lazebnik, P. Perona, Y. Sato, and C. Schmid, editors, Computer Vision ECCV 2012

  27. Evaluation on GTEA DB • "Center bias": the camera is mounted on glasses, so head movement compensates • Correlation is database dependent • Figure: objective vs. subjective (viewer) saliency

  28. Visual attention maps in actions: Actor vs. Observer • GTEA dataset • Gaze focused on the next action • Figure panels: Actor; Observer

  29. Viewpoint: Actor vs. Observer (cont.) Time relationships of vision and motor acts [1][2] Tea-making: average relative timings of body movements, object fixation, and object manipulation [1] [1] Land, M. F., Mennie, N., & Rusted, J. (1999). The roles of vision and eye movements in the control of activities of daily living. Perception, 28, 1311–1328. [2] C. Prablanc, J. Echailler, E. Komilis, and M. Jeannerod. Optimal response of eye and hand motor systems in pointing at a visual target. Biol. Cybernetics, 35:113–124, 1979.

  30. Distributions of quality metrics

  31. Viewpoint: Actor vs. Observer (cont.) Actor saliency correlation with Viewer saliency (GTEA database / 8 videos at 15 fps / 31 subjects) Wilcoxon test : OK

  32. Conclusion 1 • A time shift is identified in wearable video between the actor's and the observer's points of view at the beginning of actions. • The value is around 500 ms; this confirms the findings by M. Land (1999) and C. Prablanc (1979) for elementary grasping actions. • From wearable video we can • Confirm the known bio-physical fact: the actor anticipates the grasping action. He foveates the object of interest before the action.
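
To make the idea of a measured time shift concrete, the sketch below searches for the frame lag at which an observer's gaze signal best matches the actor's, then converts it to milliseconds. It is purely illustrative: the slides do not describe how the shift was estimated, so the 1D gaze signals, the mean-absolute-difference criterion, and all function names are assumptions.

```python
def best_lag(actor, observer, max_lag):
    """Return the delay (in frames) of observer relative to actor.

    actor, observer: equal-rate 1D gaze signals (e.g. x coordinates).
    max_lag must be smaller than the signal length.
    The lag minimizing the mean absolute difference is returned.
    """
    def cost(lag):
        pairs = [(actor[t], observer[t + lag])
                 for t in range(len(actor))
                 if t + lag < len(observer)]
        return sum(abs(a - o) for a, o in pairs) / len(pairs)

    return min(range(max_lag + 1), key=cost)


def frames_to_ms(frames, fps):
    """Convert a lag in frames to milliseconds."""
    return frames * 1000.0 / fps


def ms_to_frames(ms, fps):
    """Convert a duration in milliseconds to a number of frames."""
    return ms * fps / 1000.0
```

At the 15 fps of the GTEA recordings, the reported shift of about 500 ms corresponds to `ms_to_frames(500, 15)`, that is 7.5 frames.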

  33. Conclusion: actor vs. observer (1) • The observer is focused on an action; his visual attention is delayed in time at the beginning of an action • The visual attention maps of actors vs. observers, NC vs. AD, and NC vs. PD subjects can be compared in order to identify and quantify the deviation. • The visual attention maps of NC observers (subjective saliency) can be reasonably predicted by existing signal-based models. Therefore, only the actor's visual attention map would need to be measured.

  34. Object recognition with predicted saliency maps • Pipeline (diagram): local patch detection & description → mask computation → BoW computation (with a visual vocabulary) → image matching → image retrieval, or supervised classifier → object recognition • Spatially constrained approach using saliency methods
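
One way to read the "mask computation" and "BoW computation" stages is as a saliency-weighted Bag-of-Words histogram, sketched below. This is an assumption about how the spatial constraint is applied, not a confirmed description of the authors' method: each local patch is taken as already assigned to a visual word, and its vote is weighted by the saliency value at its keypoint. The function name is hypothetical.

```python
def saliency_weighted_bow(keypoints, word_ids, saliency, vocab_size):
    """Build a normalized BoW histogram with saliency-weighted votes.

    keypoints: list of (x, y) patch positions.
    word_ids: visual-word index of each patch (ASSUMED precomputed by
    nearest-neighbour assignment to a visual vocabulary).
    saliency: 2D list acting as a soft mask with values in [0, 1].
    """
    hist = [0.0] * vocab_size
    for (x, y), word in zip(keypoints, word_ids):
        # Patches in non-salient areas contribute little or nothing.
        hist[word] += saliency[y][x]
    total = sum(hist)
    if total == 0:
        return hist
    return [h / total for h in hist]
```

With a uniform mask this reduces to the global BoW baseline, which makes the two easy to compare.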

  35. GTEA Dataset • GTEA [1] is an egocentric video dataset containing 7 types of daily activities performed by 14 different subjects. • The camera is mounted on a cap worn by the subject. • Scenes show 15 object categories of interest. • Data split into a training set (294 frames) and a test set (300 frames) • Table (not preserved): categories in the GTEA dataset with the number of positives in the train/test sets [1] Alireza Fathi, Yin Li, James M. Rehg, Learning to recognize daily actions using gaze, ECCV 2012.

  36. Assessment of Visual Saliency in Object Recognition • We tested various parameters of the model: • Various approaches for local region sampling: • Sparse detectors (SIFT, SURF) • Dense detectors (grids at different granularities) • Different options for the spatial constraints: • Without masks: global BoW • Ideal manually annotated masks • Saliency masks: geometric, spatial, fusion schemes… • In two tasks: • Object retrieval (image matching): mAP ~ 0.45 • Object recognition (learning): mAP ~ 0.91
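
The mAP figures quoted above are mean average precision scores. As a reminder of how such a score is obtained, here is a minimal sketch of the usual definition over ranked result lists. The evaluation protocol actually used by the authors is not given in the slides, and the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list.

    ranked_relevance: list of booleans (or 0/1), best-ranked item first,
    True where the retrieved item is relevant.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the average precision over all queries or categories."""
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)
```

For the ranked list `[1, 0, 1]` the precision is 1 at the first hit and 2/3 at the second, so the average precision is about 0.83.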

  37. Object Recognition with Saliency Maps

  38. Object Recognition with Saliency Maps • Best results: ideal masks, geometric, squared-with-geometric • Actors' saliency scores low

  39. Conclusion 2 • The proposed automatic ("objective") observers' saliency maps are as good as the subjective visual attention maps of observers in automatic object recognition tasks • Time-shifted gaze correlation between actors and observers at the beginning of actions • Perspective: study of "normal" and "abnormal" saliency maps in video for patients with various neurodegenerative diseases • Automatic prediction of "normal" saliency maps

  40. Perspectives • Fusion of multiple media cues: • Video, audio, accelerometers, gyroscopes: • ParkinsonSTIC: regional project submitted, TECSAN submitted, UBx1 project granted • (Problem of recognizing the context of falls in patients with Parkinson's disease) • Visual saliency: study of "normal" and "abnormal" saliency maps in video for patients with various neurodegenerative diseases • Automatic prediction of "normal" saliency maps

  41. Acknowledgments • PUPH. Jean-François Dartigues, INSERM • Em. Pr. Dominique Barba, IRCCyN/University of Nantes • Pr. Mark L. Latash, PSU(USA)
