Scene Classification: Computational and Cognitive Approaches Hamed Kiani
OUTLINE • Introduction • Background • Scene Classification • Conclusion
Introduction • Scene “A semantically coherent human-scaled view of a real-world environment comprising background elements and multiple discrete objects arranged in a spatially related layout.” [Henderson]
Introduction (cont.) • Scene Classification (S.C) problem: mapping a 2D image to a class label ("where?") • Example labels: Outdoor, City, Man-made; Outdoor, Mountain, Natural; Indoor, Office, Man-made
OUTLINE • Introduction • Background (Computational Vision: Feature Level; Scene Perception) • Scene Classification • Conclusion
Background (Cont.) • Features (level of information): • Low-Level Features • Contextual-Level Features
Background (Cont.) • Low-Level Features • Color: • RGB, LAB, LUV, HSV (HSL), YCrCb and the hue-min-max-difference (HMMD) color spaces [Liu et al.], • Color-covariance matrix, color histogram, color moments, etc. • Texture: • Describes the content of many natural images: fruit skin, clouds, trees, bricks, and fabric.
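A minimal sketch of two of these low-level color features (a joint RGB histogram and per-channel color moments), assuming an image already loaded as an H x W x 3 NumPy array; the function names are illustrative, not from any cited work.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Joint RGB histogram with `bins` levels per channel, L1-normalized."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3)
    hist = hist.ravel()
    return hist / hist.sum()

def color_moments(image):
    """Per-channel mean, standard deviation and skewness (first three moments)."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    skew = ((pixels - mean) ** 3).mean(axis=0) / np.maximum(std ** 3, 1e-8)
    return np.concatenate([mean, std, skew])
```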
Background (Cont.) • Low-Level Features • Edge: • Edge histogram descriptor (EHD), SIFT, HOG • Useful for man-made objects • Shape: • Aspect ratio, circularity, Fourier descriptors, moment invariants, object boundary, etc.
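As an illustration of the edge features, here is a hedged sketch that computes a HOG descriptor with scikit-image and a simple edge-direction histogram from image gradients; it is an assumption about typical usage, not the slides' implementation. `gray` is a 2-D grayscale array.

```python
import numpy as np
from skimage.feature import hog

def hog_descriptor(gray):
    """Histogram of Oriented Gradients over 8x8-pixel cells."""
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), feature_vector=True)

def edge_direction_histogram(gray, bins=8, threshold=0.05):
    """Histogram of gradient directions at pixels with a strong edge response."""
    gy, gx = np.gradient(gray.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    angles = np.arctan2(gy, gx)[magnitude > threshold]
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)
```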
Background (Cont.) • Contextual-Level Features • Context: "Any information that may influence the way a scene and the objects within it are perceived" [Strat]. • Why contextual-level features? The "Semantic Gap"
Background (Cont.) • Feature Extraction: Contextual-Level Features • "Semantic Gap": the gap between the limited descriptive power of primitive image features and the richness of human semantics [Chen et al.]. • How to bridge the "Semantic Gap"? By building representative high-level features from different sources of context in the image.
Background (Cont.) • Contextual-Level Features • Local Context • 2D Scene Gist • Semantic Context
Background (Cont.) • Contextual-Level Features – Local Context • Any context represented by: • Object boundaries, • Object shape/contour models, • Code words (bag-of-features, visual codes)
Background (Cont.) • Contextual-Level Features – 2D Scene Gist • Global statistics of an image that capture its "gist" or "concept frame" [Oliva and Torralba].
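A minimal gist-like sketch, assuming the common recipe of filtering a downsampled image with a small Gabor filter bank and averaging filter energy over a coarse spatial grid; this approximates the idea and is not Oliva and Torralba's released GIST code.

```python
import numpy as np
from skimage.transform import resize
from skimage.filters import gabor

def gist_like_descriptor(gray, size=128, grid=4,
                         frequencies=(0.1, 0.25), orientations=4):
    """Concatenate Gabor-energy averages over a coarse grid as a global descriptor."""
    img = resize(gray, (size, size), anti_aliasing=True)
    cell = size // grid
    features = []
    for f in frequencies:
        for k in range(orientations):
            real, imag = gabor(img, frequency=f, theta=k * np.pi / orientations)
            energy = np.hypot(real, imag)
            # Average filter energy over a grid x grid layout to keep only the
            # global spatial organization ("spatial envelope").
            pooled = energy[:grid * cell, :grid * cell].reshape(
                grid, cell, grid, cell).mean(axis=(1, 3))
            features.append(pooled.ravel())
    return np.concatenate(features)
```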
Background (Cont.) • Contextual-Level Features – Semantic Context • Events, activities, • Sub-concepts, • Presence and location (spatial context) of objects, parts and regions [Galleguillos et al.].
Background (Cont.) • Human Scene Perception • How does the human brain perceive real-world scenes? Object-centered or scene-centered representation?
Background (Cont.) • Human Scene Perception – Object-centered representation: a scene is represented by a set of objects and parts as its atoms (basic elements) [Fergus et al.].
Background (Cont.) • Human Scene Perception – Object-centered representation
Background (Cont.) • Human Scene Perception – Object-centered representation Why not object-centered? • The human brain recognizes a scene image very rapidly (~70 ms), even in the presence of blurring [Biederman]
Background (Cont.) • Human Scene Perception – Scene-centered representation: a scene is represented by global information, "schemas" or "gist", based on the overall spatial organization of objects in the early stage of perception: • Low-frequency spatial information (see the sketch below) • Diagnostic objects (man-made/indoor images) • Color as a key characteristic (natural images)
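As a rough illustration (an assumption, not a model from the cognition literature), the low-frequency spatial information mentioned above can be approximated by blurring and downsampling an image: the coarse spatial layout survives even when object detail is removed.

```python
from skimage.filters import gaussian
from skimage.transform import resize

def coarse_layout(gray, sigma=8, size=32):
    """Keep only low spatial frequencies: heavy Gaussian blur, then downsample."""
    return resize(gaussian(gray, sigma=sigma), (size, size), anti_aliasing=True)
```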
OUTLINE • Introduction • Background • Scene Classification (S.C based on Computational Vision; S.C based on Visual Cognition) • Conclusion
Literature Review (Cont.) • Scene Classification (S.C) based on Computational Vision • Local Scale Classification • Global Scale Classification • Multimodal Classification Systems
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification • S.C is performed using features extracted from sub-image elements such as superpixels, blocks, code words (bag-of-features), objects, blobs, parts and regions.
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification Bosch et al. [Bosch et al.]: • Discovering the objects (grass, buildings, roads, etc.) in each image, • Representing them by visual words (color, texture, orientation), • Using the distribution of visual words to perform scene classification via probabilistic Latent Semantic Analysis (pLSA); see the bag-of-visual-words sketch below.
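The bag-of-visual-words step can be sketched as below, assuming SIFT descriptors (via OpenCV's `cv2.SIFT_create`, available in recent opencv-python builds) quantized with k-means from scikit-learn; the pLSA topic modeling that Bosch et al. apply on top of these histograms is not shown.

```python
import numpy as np
import cv2
from sklearn.cluster import MiniBatchKMeans

def build_codebook(gray_images, n_words=200):
    """Learn a visual vocabulary by clustering SIFT descriptors from training images."""
    sift = cv2.SIFT_create()
    descriptors = []
    for img in gray_images:                      # uint8 grayscale arrays
        _, desc = sift.detectAndCompute(img, None)
        if desc is not None:
            descriptors.append(desc)
    return MiniBatchKMeans(n_clusters=n_words).fit(np.vstack(descriptors))

def bow_histogram(gray_image, codebook):
    """Represent one image as a normalized histogram of visual-word counts."""
    sift = cv2.SIFT_create()
    _, desc = sift.detectAndCompute(gray_image, None)
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()
```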
Literature Review (Cont.) Bosch et al. [Bosch et al., 2006]:
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification Vogel & Schiele [Vogel and Schiele]: • Dividing images into a grid of 10x10 local blocks, • Classifying each block into one of nine local-concept classes (sky, water, grass, trunks, foliage, field, rocks, flowers, and sand), • Calculating the occurrence vector of local concepts for each image (see the sketch below), • Using the occurrence vectors for learning and image categorization.
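A hedged sketch of the occurrence-vector idea referenced above: the block-level concept classifier itself is assumed to exist (passed in as a callable returning a concept index), and only the grid split and counting are shown.

```python
import numpy as np

CONCEPTS = ["sky", "water", "grass", "trunks", "foliage",
            "field", "rocks", "flowers", "sand"]

def occurrence_vector(image, block_classifier, grid=10):
    """Split the image into a grid x grid layout of blocks and count local concepts."""
    h, w = image.shape[:2]
    bh, bw = h // grid, w // grid
    counts = np.zeros(len(CONCEPTS))
    for i in range(grid):
        for j in range(grid):
            block = image[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            counts[block_classifier(block)] += 1   # classifier returns a concept index
    return counts / counts.sum()
```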
Literature Review (Cont.) Vogel & Schiele [Vogel and Schiele, 2007]
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification Carson et al. [Carson et al.]: • Representing the image as a combination of fine to coarse blobs: Blobworld (texture, color), • Classifying based on the similarity between training Blobworld representations and a given input query.
Literature Review (Cont.) • S.C based on Computational Vision: Local scale classification Carson et al. [Carson et al.]
Literature Review (Cont.) • S.C based on Computational Vision: Global scale classification • S.C is performed using features from the global configuration of the image, • Ignoring details about local concepts and object information.
Literature Review (Cont.) • S.C based on Computational Vision: Global scale classification Renninger & Malik [Renninger and Malik]: • Representing image textures as a vocabulary of distinctive patterns, • Encoding the pattern vocabulary as textons, • Constructing a global representation of the image as a frequency histogram of textons, • Classifying an input query by χ2 similarity integrated with k-NN (see the sketch below)
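The χ2 + k-NN step can be sketched as follows, under the assumption that each image is already summarized as a texton frequency histogram; the texton extraction itself is not shown.

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-10):
    """Chi-squared distance between two (normalized) histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def knn_classify(query_hist, train_hists, train_labels, k=5):
    """Majority vote among the k training histograms closest in chi-squared distance."""
    distances = np.array([chi2_distance(query_hist, h) for h in train_hists])
    nearest = np.asarray(train_labels)[np.argsort(distances)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]
```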
Literature Review (Cont.) Renninger & Malik [Renninger and Malik]
Literature Review (Cont.) • S.C based on Computational Vision: Global scale classification Vailaya et al. [Vailaya et al.]: City vs. landscape classification • Representing images globally by a set of salient features based on color (histogram, coherence vectors), texture (moments of the DCT coefficients) and edge (direction histogram and direction coherence vectors), • Classifying an input query using a k-NN classifier on the five low-level features
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems • Integrating the evidence presented by multiple sources of information at: • Feature level, • Sub-task level, • Classifier level.
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Boutell and Luo [Boutell and Luo, 2004]: Integrating low-level features (color histograms and wavelet texture features) with camera metadata (exposure time, flash fired, and subject distance) using a Bayesian classifier for indoor vs. outdoor scene classification
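A minimal sketch of this kind of feature/metadata fusion, assuming (as a simplification of Boutell and Luo's Bayesian approach) that the image features and the three metadata fields are simply concatenated and fed to a Gaussian naive Bayes classifier.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def fuse_features(image_feature_vec, exposure_time, flash_fired, subject_distance):
    """Append camera metadata to the low-level image feature vector."""
    metadata = np.array([exposure_time, float(flash_fired), subject_distance])
    return np.concatenate([image_feature_vec, metadata])

def train_indoor_outdoor(fused_vectors, labels):
    """Fit a naive Bayes classifier; labels: 0 = indoor, 1 = outdoor."""
    return GaussianNB().fit(np.stack(fused_vectors), labels)
```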
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Sub-task level: integration of a set of computational vision tasks such as occlusion reasoning, surface orientation estimation, object recognition, segmentation and scene categorization.
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Heitz et al. [Heitz et al.]: Cascaded Classification Models (CCMs) integrate scene classification, object detection, multi-class segmentation, and 3D reconstruction to improve performance on some or all tasks.
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Heitz et al. [Heitz et al.]
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Li et al. [Li et al.]: Feedback Enabled Cascaded Classification Models (FE-CCM), integrating scene classification, depth estimation, event categorization, saliency detection, object detection and geometric labeling.
Literature Review (Cont.) • S.C based on Computational Vision: Multimodal systems Li et al. [Li et al.] (FE-CCM)
Literature Review (Cont.) • Scene Classification based on Visual Cognition: • How do humans perceive surrounding scenes? • Which scene categories are relevant for humans? • Which image features are likely evaluated by humans? • How can weak and strong objects affect the accuracy of scene perception? • How can visual cognition be modeled for scene classification?
Literature Review (Cont.) • Scene Classification based on Visual Cognition: Determining semantic categories of photographs [Rogowitz et al.]: • Database of 97 images, • Categorizing images along two main axes (man-made vs. natural and human vs. non-human) into 4 main categories and 20 subcategories, • Role of color/edges/boundaries/lines in natural vs. man-made classification
Literature Review (Cont.) • Scene Classification based on Visual Cognition: Which image features are generally used by humans to perform scene recognition? [McCotter et al.] • Experiments on eight scene categories (highway, street, …), • Phase spectra of the scene categories, • Category-specific diagnostic regions in the phase spectra
OUTLINE • Introduction • Background • Scene Classification • Summary (Summary/Possible Research Topics; Conclusion)
Summary and Conclusion (Cont.) • Summary
Summary and Conclusion (Cont.) • Possible Research Topics: Cognitive Scene Classification • Integrating visual cognition findings with computational vision/machine learning techniques, • Providing a platform to model scenes, space, context and the relations among different elements of a scene, inspired by human scene perception.
Summary and Conclusion (Cont.) • Possible Research Topics: Multimodal Scene Classification • Integrating different sources of knowledge provided by different types of features, classifiers and vision tasks, • Overcoming some of the limitations of uni-modal classification systems.
Summary and Conclusion (Cont.) • Possible Research Topics: Contextual Scene Classification • Modeling more discriminative and meaningful representations of concepts, • Proposing a comprehensive model of scenes/concepts/objects/parts/regions to convey different sources of context extracted from the image.
Summary and Conclusion • Scene classification • Bridging the "semantic gap": from low-level to contextual-level features • Classification scale (local vs. global) • Computational vision vs. cognition-based scene classification
References
[Henderson and Hollingworth, 1999b] J. M. Henderson and A. Hollingworth, "High-level scene perception", Annual Review of Psychology, vol. 50, pp. 243-271, 1999.
[Ballard and Brown, 1982] D. H. Ballard and C. M. Brown, Computer Vision, Prentice-Hall, Englewood Cliffs, NJ, 1982.
[Liu et al., 2004a]
[Strat, 1993] T. M. Strat, "Employing contextual information in computer vision", In Proc. of ARPA Image Understanding Workshop, 1993.
[Chen et al., 2003]
[Oliva and Torralba, 2001] A. Oliva and A. Torralba, "Modeling the shape of the scene: A holistic representation of the spatial envelope", International Journal of Computer Vision, vol. 42, pp. 145-175, 2001.
[Galleguillos et al., 2008] C. Galleguillos, A. Rabinovich, and S. Belongie, "Object categorization using co-occurrence, location and appearance", In Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2008.
[Paek and Chang, 2000]