Scene Classification: Computational and Cognitive Approaches Hamed Kiani
OUTLINE • Introduction • Background • Scene Classification • Conclusion
Introduction • Scene “A semantically coherent human-scaled view of a real-world environment comprising background elements and multiple discrete objects arranged in a spatially related layout.” [Henderson]
Introduction (cont.) • Scene Classification (S.C) problem: mapping a 2D image to a class label ("where?") • Example labels: Outdoor, City, Man-made; Outdoor, Mountain, Natural; Indoor, Office, Man-made
OUTLINE • Introduction • Background (Computational Vision: Feature Level; Scene Perception) • Scene Classification • Conclusion
Background (Cont.) • Features (level of information): • Low-Level Features • Contextual-Level Features
Background (Cont.) • Low-Level Features • Color: • RGB, LAB, LUV, HSV (HSL), YCrCb and the hue-min-max-difference (HMMD) color spaces [Liu et al.], • Color-covariance matrix, color histogram, color moments, etc. • Texture: • Describes the content of many natural images: fruit skin, clouds, trees, bricks, and fabric.
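A minimal sketch of two of these low-level color features (a joint RGB histogram and per-channel color moments), assuming an image already loaded as an H x W x 3 NumPy array; the function names are illustrative, not from any cited work.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Joint RGB histogram with `bins` levels per channel, L1-normalized."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def color_moments(image):
    """Per-channel mean, standard deviation and skewness (first three moments)."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    skew = ((pixels - mean) ** 3).mean(axis=0) / np.maximum(std ** 3, 1e-8)
    return np.concatenate([mean, std, skew])
```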
Background (Cont.) • Low-Level Features • Edge: • Edge histogram descriptor (EHD), SIFT, HOG • Useful for man-made objects • Shape: • Aspect ratio, circularity, Fourier descriptors, moment invariants, object boundary, etc.
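As an illustration of the edge features, here is a hedged sketch that computes a HOG descriptor with scikit-image and a simple edge-direction histogram from image gradients; it is an assumption about typical usage, not the slides' implementation. `gray` is a 2-D grayscale array.

```python
import numpy as np
from skimage.feature import hog

def hog_descriptor(gray):
    """Histogram of Oriented Gradients over 8x8-pixel cells."""
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def edge_direction_histogram(gray, bins=8, threshold=0.05):
    """Histogram of gradient directions at pixels with a strong edge response."""
    gy, gx = np.gradient(gray.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    angles = np.arctan2(gy, gx)[magnitude > threshold]
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)
```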
Background (Cont.) • Contextual-Level Features • Context: "Any information that may influence the way a scene and the objects within it are perceived" [Strat]. • Why contextual-level features? The "Semantic Gap"
Background (Cont.) • Feature Extraction: Contextual-Level Features • "Semantic Gap": the gap between the limited descriptive power of primitive image features and the richness of human semantics [Chen et al.]. • How to bridge the "Semantic Gap"? By building representative high-level features from different sources of context in the image.
Background (Cont.) • Contextual-Level Features • Local Context • 2D Scene Gist • Semantic Context
Background (Cont.) • Contextual-Level Features – Local Context • Any context represented by: • Object boundaries, • Object shape/contour models, • Code words (bag-of-features, visual codes)
Background (Cont.) • Contextual-Level Features – 2D Scene Gist • Global statistics of an image that capture its "gist" or "concept frame" [Oliva and Torralba].
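A minimal gist-like sketch, assuming the common recipe of filtering a downsampled image with a small Gabor filter bank and averaging filter energy over a coarse spatial grid; this approximates the idea and is not Oliva and Torralba's released GIST code.

```python
import numpy as np
from skimage.transform import resize
from skimage.filters import gabor

def gist_like_descriptor(gray, size=128, grid=4,
                         frequencies=(0.1, 0.25), orientations=4):
    """Concatenate Gabor-energy averages over a coarse grid as a global descriptor."""
    img = resize(gray, (size, size), anti_aliasing=True)
    cell = size // grid
    features = []
    for f in frequencies:
        for k in range(orientations):
            real, imag = gabor(img, frequency=f, theta=k * np.pi / orientations)
            energy = np.hypot(real, imag)
            # Average filter energy over a grid x grid layout to keep only the
            # global spatial organization ("spatial envelope").
            pooled = energy[:grid * cell, :grid * cell].reshape(
                grid, cell, grid, cell).mean(axis=(1, 3))
            features.append(pooled.ravel())
    return np.concatenate(features)
```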
Background (Cont.) • Contextual-Level Features – Semantic Context • Events, activities, • Sub-concepts, • Presence and location (spatial context) of objects, parts and regions [Galleguillos et al.].
Background (Cont.) • Human Scene Perception • How does the human brain perceive real-world scenes? Object-centered or scene-centered representation?
Background (Cont.) • Human Scene Perception – Object-centered representation: a scene is represented by a set of objects and parts as its atoms (basic elements) [Fergus et al.].
Background (Cont.) • Human Scene Perception – Object-centered representation
Background (Cont.) • Human Scene Perception – Object-centered representation Why not object-centered? • The human brain recognizes a scene image very rapidly (~70 ms), even in the presence of blurring [Biederman]
Background (Cont.) • Human Scene Perception – Scene-centered representation: a scene is represented by global information, "schemas" or "gist", based on the overall spatial organization of objects in the early stage of perception: • Low-frequency spatial information (see the sketch below) • Diagnostic objects (man-made/indoor images) • Color as a key characteristic (natural images)
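As a rough illustration (an assumption, not a model from the cognition literature), the low-frequency spatial information mentioned above can be approximated by blurring and downsampling an image: the coarse spatial layout survives even when object detail is removed.

```python
from skimage.filters import gaussian
from skimage.transform import resize

def coarse_layout(gray, sigma=8, size=32):
    """Keep only low spatial frequencies: heavy Gaussian blur, then downsample."""
    return resize(gaussian(gray, sigma=sigma), (size, size), anti_aliasing=True)
```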
OUTLINE • Introduction • Background • Scene Classification (S.C based on Computational Vision; S.C based on Visual Cognition) • Conclusion
Literature Review (Cont.) • Scene Classification (S.C) based on Computational Vision • Local Scale Classification • Global Scale Classification • Multimodal Classification Systems
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification • S.C is performed using features extracted from sub-image elements such as superpixels, blocks, code words (bag-of-features), objects, blobs, parts and regions.
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification Bosch et al. [Bosch et al.]: • Discovering the objects (grass, buildings, roads, etc.) in each image, • Representing them by visual words (color, texture, orientation), • Using the distribution of visual words to perform scene classification via probabilistic Latent Semantic Analysis (pLSA); see the bag-of-visual-words sketch below.
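The bag-of-visual-words step can be sketched as below, assuming SIFT descriptors (via OpenCV's `cv2.SIFT_create`, available in recent opencv-python builds) quantized with k-means from scikit-learn; the pLSA topic modeling that Bosch et al. apply on top of these histograms is not shown.

```python
import numpy as np
import cv2
from sklearn.cluster import MiniBatchKMeans

def build_codebook(gray_images, n_words=200):
    """Learn a visual vocabulary by clustering SIFT descriptors from training images."""
    sift = cv2.SIFT_create()
    descriptors = []
    for img in gray_images:                      # uint8 grayscale arrays
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)
    return MiniBatchKMeans(n_clusters=n_words).fit(np.vstack(descriptors))

def bow_histogram(gray_image, codebook):
    """Represent one image as a normalized histogram of visual-word counts."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray_image, None)
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()
```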
Literature Review (Cont.) Bosch et al. [Bosch et al., 2006]:
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification Vogel & Schiele [Vogel and Schiele]: • Dividing images into a grid of 10x10 local blocks, • Classifying each block into one of nine local-concept classes (sky, water, grass, trunks, foliage, field, rocks, flowers, and sand), • Calculating the occurrence vector of local concepts for each image (see the sketch below), • Using the occurrence vectors for learning and image categorization.
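A hedged sketch of the occurrence-vector idea referenced above: the block-level concept classifier itself is assumed to exist (passed in as a callable returning a concept index), and only the grid split and counting are shown.

```python
import numpy as np

CONCEPTS = ["sky", "water", "grass", "trunks", "foliage",
            "field", "rocks", "flowers", "sand"]

def occurrence_vector(image, block_classifier, grid=10):
    """Split the image into a grid x grid layout of blocks and count local concepts."""
    h, w = image.shape[:2]
    bh, bw = h // grid, w // grid
    counts = np.zeros(len(CONCEPTS))
    for i in range(grid):
        for j in range(grid):
            block = image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            counts[block_classifier(block)] += 1   # classifier returns a concept index
    return counts / counts.sum()
```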
Literature Review (Cont.) Vogel & Schiele [Vogel and Schiele, 2007]
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification Carson et al. [Carson et al.]: • Representing the image as a combination of fine to coarse blobs: Blobworld (texture, color), • Classifying based on the similarity between training Blobworld representations and a given input query.
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification Carson et al. [Carson et al.]
Literature Review (Cont.) • S.C based on Computational Vision: Global scale classification • S.C is performed using features from the global configuration of the image, • Ignoring details about local concepts and object information.
Literature Review (Cont.) • S.C based on Computational Vision: Global scale classification Renninger & Malik [Renninger and Malik]: • Representing image textures as a vocabulary of distinctive patterns, • Encoding the pattern vocabulary as textons, • Constructing a global representation of the image as a frequency histogram of textons, • Classifying an input query by χ2 similarity integrated with k-NN (see the sketch below)
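The χ2 + k-NN step can be sketched as follows, under the assumption that each image is already summarized as a texton frequency histogram; the texton extraction itself is not shown.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two (normalized) histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def knn_classify(query_hist, train_hists, train_labels, k=5):
    """Majority vote among the k training histograms closest in chi-squared distance."""
    distances = np.array([chi2_distance(query_hist, h) for h in train_hists])
    nearest = np.asarray(train_labels)[np.argsort(distances)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]
```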
Literature Review (Cont.) Renninger & Malik [Renninger and Malik]
Literature Review (Cont.) • S.C based on Computational Vision: Global scale classification Vailaya et al. [Vailaya et al.]: City vs. landscape classification • Representing images globally by a set of salient features based on color (histogram, coherence vectors), texture (moments of the DCT coefficients) and edge (direction histogram and direction coherence vectors), • Classifying an input query using a k-NN classifier on the five low-level features
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems • Integrating the evidence presented by multiple sources of information at: • Feature level, • Sub-task level, • Classifier level.
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Boutell and Luo [Boutell and Luo, 2004]: Integrating low-level features (color histograms and wavelet texture features) with camera metadata (exposure time, flash fired, and subject distance) using a Bayesian classifier for indoor vs. outdoor scene classification
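A minimal sketch of this kind of feature/metadata fusion, assuming (as a simplification of Boutell and Luo's Bayesian approach) that the image features and the three metadata fields are simply concatenated and fed to a Gaussian naive Bayes classifier.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def fuse_features(image_feature_vec, exposure_time, flash_fired, subject_distance):
    """Append camera metadata to the low-level image feature vector."""
    metadata = np.array([exposure_time, float(flash_fired), subject_distance])
    return np.concatenate([image_feature_vec, metadata])

def train_indoor_outdoor(fused_vectors, labels):
    """Fit a naive Bayes classifier; labels: 0 = indoor, 1 = outdoor."""
    return GaussianNB().fit(np.stack(fused_vectors), labels)
```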
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Sub-task level: integration of a set of computational vision tasks such as occlusion reasoning, surface orientation estimation, object recognition, segmentation and scene categorization.
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Heitz et al. [Heitz et al.]: Cascaded Classification Models (CCMs) integrate scene classification, object detection, multi-class segmentation, and 3D reconstruction to improve performance on some or all tasks.
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Heitz et al. [Heitz et al.]
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Li et al. [Li et al.]: Feedback Enabled Cascaded Classification Models (FE-CCM), integrating scene classification, depth estimation, event categorization, saliency detection, object detection and geometric labeling.
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Li et al. [Li et al.] (FE-CCM)
Literature Review (Cont.) • Scene Classification based on Visual Cognition: • How do humans perceive surrounding scenes? • Which scene categories are relevant for humans? • Which image features are likely evaluated by humans? • How can weak and strong objects affect the accuracy of scene perception? • How can visual cognition be modeled for scene classification?
Literature Review (Cont.) • Scene Classification based on Visual Cognition: Determining semantic categories of photographs [Rogowitz et al.]: • Database of 97 images, • Categorizing images along two main axes (man-made vs. natural and human vs. non-human) into 4 main categories and 20 subcategories, • Role of color/edges/boundaries/lines in natural vs. man-made classification
Literature Review (Cont.) • Scene Classification based on Visual Cognition: Which image features are generally used by humans to perform scene recognition? [McCotter et al.] • Experiments on eight scene categories (highway, street, …), • Phase spectra of the scene categories, • Category-specific diagnostic regions in the phase spectra
OUTLINE • Introduction • Background • Scene Classification • Summary (Summary/Possible Research Topics; Conclusion)
Summary and Conclusion (Cont.) • Summary
Summary and Conclusion (Cont.) • Possible Research Topics: Cognitive Scene Classification • Integrating visual cognition findings with computational vision/machine learning techniques, • Providing a platform to model scenes, space, context and the relations among different elements of a scene, inspired by human scene perception.
Summary and Conclusion (Cont.) • Possible Research Topics: Multimodal Scene Classification • Integrating different sources of knowledge provided by different types of features, classifiers and vision tasks, • Overcoming some of the limitations of uni-modal classification systems.
Summary and Conclusion (Cont.) • Possible Research Topics: Contextual Scene Classification • Modeling more discriminative and meaningful representations of concepts, • Proposing a comprehensive model of scenes/concepts/objects/parts/regions to convey different sources of context extracted from the image.
Summary and Conclusion • Scene classification • Bridging the "semantic gap": from low-level to contextual-level features • Classification scale (local vs. global) • Computational vision vs. cognition-based scene classification
References
[Henderson and Hollingworth, 1999b] J. M. Henderson and A. Hollingworth, "High-level scene perception", Annual Review of Psychology, vol. 50, pp. 243-271, 1999.
[Ballard and Brown, 1982] D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
[Liu et al., 2004a]
[Strat, 1993] T. M. Strat, "Employing contextual information in computer vision", In Proc. of ARPA Image Understanding Workshop, 1993.
[Chen et al., 2003]
[Oliva and Torralba, 2001] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope", International Journal of Computer Vision, vol. 42, pp. 145-175, 2001.
[Galleguillos et al., 2008] C. Galleguillos, A. Rabinovich, and S. Belongie, "Object categorization using co-occurrence, location and appearance", In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[Paek and Chang, 2000]