Non-Local Characterization of Scenery Images: Statistics, 3D Reasoning, and a Generative Model
Tamar Avraham and Michael Lindenbaum, Technion
Characterization of Scenery Images: Overview
Pipeline: scenery images (LabelMe) → manual segmentation and region annotation → statistical characterization
• Statistical Characterization
  • Rough shape of regions
  • Relative location of regions
  • Shape of boundaries
• 3D Reasoning
  • Why are background contours horizontal?
• A Generative Model
  • Provides a prior on scenery image annotation
  • Generates image sketches capturing the gist of scenery images
Motivating question (figure): given a segmentation without texture, which region labeling is more likely: ground / sky / trees / mountain, or mountain / sea / sand / rocks?
Property 1: Horizontalness
• Most background objects exceed the image width
• Background objects are wide and of low height, while the shapes of foreground objects tend to be isotropic
• Background objects: sky, mountain, sea, trees, field, river, sand, ground, grass, land, rocks, plants, snow, plain, valley, bank, fog bank, desert, lake, beach, cliff, floor
• Foreground objects: all others
Property 2: Order / Relative Location
• The relative top-bottom locations of background region types are often highly predictable
• The probability that a background region of identity A appears above a background region of identity B is summarized in a histogram for the various background identity pairs
• A topological ordering of background identities can be defined: a DAG associated with the reachability relation R = {(A, B) | p(A above B) > 0.7}
• (Figure: the DAG over background identities sky, desert, mountain, trees, field, valley, river, lake, land, sea, sand, plants, ground, grass, rocks)
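A minimal sketch of how the above-relation statistics and the reachability edges could be estimated from annotated images. The function name and the data layout (a list of per-image dicts mapping a label to its mean image row) are illustrative assumptions, not the authors' code.

```python
from collections import defaultdict
from itertools import permutations

def above_statistics(images, threshold=0.7):
    """Estimate p(A above B) for background label pairs and keep the
    highly predictable pairs, as in the reachability relation R above.

    `images`: list of dicts mapping a background label to the mean image row
    (smaller row = higher in the image) of that region in one annotated image.
    """
    above = defaultdict(int)   # (A, B) -> # images where A appears above B
    both = defaultdict(int)    # (A, B) -> # images containing both A and B

    for regions in images:
        for a, b in permutations(regions, 2):
            both[(a, b)] += 1
            if regions[a] < regions[b]:      # A's mean row is higher up
                above[(a, b)] += 1

    p_above = {pair: above[pair] / both[pair] for pair in both}
    # Edges of the DAG: pairs for which "A above B" is highly predictable
    edges = {pair for pair, p in p_above.items() if p > threshold}
    return p_above, edges

# Toy usage with two annotated images (mean row per label):
imgs = [{"sky": 40, "mountain": 120, "sea": 200},
        {"sky": 30, "trees": 150, "ground": 220}]
probs, dag_edges = above_statistics(imgs)
```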
Property 3: Boundary Shape
• The characteristics of a region's upper contour correlate with the region's identity
• (Figure: a sample of contour segments associated with the background object classes mountain, sea, and trees)
• Chunks of upper boundaries, viewed as 1D signals:
  • Curves associated with sea, grass, or field resemble DC signals
  • Curves associated with trees and plants are high-frequency signals
  • Curves associated with mountains resemble low-frequency, high-amplitude signals
3D Reasoning: Why are background regions horizontal?
• Land regions whose contour tangents in aerial images are uniformly distributed appear with a strong horizontal bias in images taken by a photographer standing on the ground
• "Place a penny on the middle of one of your tables in Space ... look down upon it. It will appear a circle.... gradually lower your eyes ... and you will find the penny becoming more and more oval to your view...." (From Flatland, by Edwin A. Abbott, 1884)
• Θ: the set of tangent angles of contours in aerial images (relative to an arbitrary 2D axis on the surface)
• Θ': the set of angles that are the projections of the angles in Θ onto the camera's image plane
• (Figure: a schematic aerial image with land-cover regions weed, sand, flora, lake, grass, soil, a point p with θ ∈ Θ, and the image taken by a photographer standing on the ground with θ' ∈ Θ')
• The distribution of Θ' is computed assuming Θ = U[0°, 180°), h ≈ 2 m, z ~ U[0, 1000] m, x ~ U[0, 500] m
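A minimal Monte Carlo sketch of the projection argument, under assumptions not spelled out on the slide: a pinhole camera at height h with a horizontal optical axis, z interpreted as the depth of the contour point and x as its lateral offset. It only illustrates why uniformly distributed ground-plane tangents project to strongly horizontal image tangents; it is not the paper's derivation.

```python
import numpy as np

def image_tangent_angles(x, z, theta, h=2.0, f=1.0, eps=1e-3):
    """Angle (degrees, in [0, 180)) of a ground-plane contour tangent after
    projection into the image of a pinhole camera at height h whose optical
    axis is horizontal. x = lateral offset, z = depth of the contour point,
    theta = tangent direction on the ground plane (radians)."""
    x0, y0, z0 = x, -h * np.ones_like(x), z            # point on the ground
    x1, z1 = x0 + eps * np.cos(theta), z0 + eps * np.sin(theta)
    y1 = y0                                            # stays on the ground plane
    u0, v0 = f * x0 / z0, f * y0 / z0                  # pinhole projection
    u1, v1 = f * x1 / z1, f * y1 / z1
    return np.degrees(np.arctan2(v1 - v0, u1 - u0)) % 180.0

rng = np.random.default_rng(0)
n = 100_000
thetas = rng.uniform(0.0, np.pi, n)        # Theta = U[0, 180 degrees)
zs = rng.uniform(1.0, 1000.0, n)           # depth (1 m lower bound avoids z = 0)
xs = rng.uniform(0.0, 500.0, n)            # lateral offset
angles = image_tangent_angles(xs, zs, thetas)
# The projected tangents cluster near 0/180 degrees, i.e. near horizontal:
print("fraction within 10 deg of horizontal:",
      np.mean((angles < 10) | (angles > 170)))
```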
3D Reasoning cont.: Ground elevation and slope statistics
Two types of landscape image contours:
1) Contours between different types of regions on the terrain
2) Contours of mountains associated with occluding boundaries (e.g., skylines)
3D Reasoning cont.: Ground elevation and slope statistics
Contours between different types of regions on the terrain:
• A point p lies on a boundary between land regions, located on an elevated surface with gradient angle ϕ; the plane is rotated by an angle ω relative to the X1 axis (figure: axes X1, X2, X3 with origin O)
• The distribution of Θ' is computed assuming Θ = U[0°, 180°), ϕ ~ slope statistics, ω ~ [-90°, 90°]; H's distribution was estimated by sampling an elevation map at pairs of locations up to 9 km apart
• (Figure: estimated terrain slope distribution using the IIASA-LUC dataset)
3D Reasoning cont.: Ground elevation and slope statistics
Contours of mountains associated with occluding boundaries:
• Tangents in images are bounded by the max-slope-over-land-regions statistics
• (Figure: estimated distribution of the maximum slope over land regions, each covering approx. 9 square kilometers)
• The paper also discusses the effect of land cover and points out other factors that should be considered in a more complete analysis
The Generative Model
• Top-bottom order: modeled as a Markov chain over background identities (figure: a chain over the states top, sky, trees, ground, sea, bottom)
• Region height: a normal distribution for the height covered by each region type
• Upper boundaries: modeled by PCA of "1D" signals
The Generative Model: Advantages
The generative nature of the model makes it possible to:
• Generate image sketches capturing the gist of scenery images (ECCV 2010)
• Obtain priors for region annotation (more recent work)
The Generative Model: Training
Given a set of manually segmented and annotated scenery images:
• Top-bottom order: estimate the transition matrix M by counting the number of occurrences of the different 'moves' (figure: a chain over the states top, sky, trees, ground, sea, bottom)
• Relative region coverage: estimate the mean and variance of the relative average height of each region type
• Upper boundary: for each background region type, collect chunks of 64-pixel length; find the first k principal components and eigenvalues such that 95% of the variation in the training set is modeled
• It is possible to train different models for different scenery categories; here we trained a single model on 3 categories together: coast, mountain, open country
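A minimal sketch of the training step, assuming the annotations are already available as top-to-bottom label sequences, per-region relative heights, and 64-pixel upper-boundary chunks. The function name and data layout are illustrative, not the authors' code.

```python
import numpy as np
from collections import defaultdict

def train_layout_model(sequences, heights, chunks, var_kept=0.95):
    """sequences: list of top-to-bottom label sequences, e.g. ['top','sky','trees','ground','bottom']
    heights:   dict label -> list of relative heights observed for that label
    chunks:    dict label -> array (n_chunks, 64) of upper-boundary chunks
    Returns the transition probabilities, per-label height stats, and per-label PCA."""
    # 1) Transition matrix M: count the 'moves' between consecutive labels
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a][b] += 1
    M = {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
         for a, nxt in counts.items()}

    # 2) Relative region coverage: mean and variance per label
    height_stats = {lab: (np.mean(h), np.var(h)) for lab, h in heights.items()}

    # 3) Upper boundary: PCA keeping enough components for 95% of the variance
    pca = {}
    for lab, X in chunks.items():
        X = np.asarray(X, dtype=float)
        mean = X.mean(axis=0)
        U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
        var = s ** 2
        k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_kept)) + 1
        pca[lab] = {"mean": mean, "components": Vt[:k], "eigvals": var[:k] / len(X)}
    return M, height_stats, pca
```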
The Generative Model: Generating Sketches
• Randomly select the top-bottom sequence by a random walk on the Markov network, starting at 'top' and stopping at the sink 'bottom'
• Randomly select the relative average height of each region
• Randomly generate the boundaries: for each boundary, generate 4 chunks
• (Figure: a generated layout with the sequence sky, mountain, mountain, trees, trees)
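A minimal sketch of the generation step, continuing the assumed data layout from the training sketch above. Sampling boundary chunks as random combinations of the principal components is an illustrative choice, not necessarily the exact procedure of the paper.

```python
import numpy as np

def generate_sketch(M, height_stats, pca, n_chunks=4, rng=None):
    """Sample a scenery layout: a top-to-bottom label sequence, a relative
    height per region, and an upper-boundary signal per region."""
    rng = rng or np.random.default_rng()

    # 1) Random walk on the Markov chain from 'top' to the sink 'bottom'
    seq, state = [], "top"
    while state != "bottom":
        nxt = M[state]
        state = rng.choice(list(nxt), p=list(nxt.values()))
        if state != "bottom":
            seq.append(state)

    # 2) Sample a relative height for each region and renormalize to sum to 1
    h = np.array([rng.normal(height_stats[lab][0], np.sqrt(height_stats[lab][1]))
                  for lab in seq]).clip(min=0.01)
    h /= h.sum()

    # 3) Sample each region's upper boundary from its PCA model
    boundaries = {}
    for lab in seq:
        p = pca[lab]
        parts = []
        for _ in range(n_chunks):
            coeffs = rng.normal(0.0, np.sqrt(p["eigvals"]))
            parts.append(p["mean"] + coeffs @ p["components"])
        boundaries[lab] = np.concatenate(parts)   # e.g. 4 x 64 = 256 pixels wide
    return seq, h, boundaries
```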
The Generative Model: Generated Results
• Typical scenery images (LabelMe), with manual segmentation and region annotation
• Semantic sketches of scenery images generated by our model
The Generative Model: (More) Generated Results
Region Classification
Q: Can the new cues contribute to region classification/annotation?
A: They are complementary to textural & color cues
Goal: to show that region classification using global + local descriptors is better than using local descriptors only
(Figure: layout cues alone leave ambiguities such as sky?/sea? and mountain?/ground?; color & texture alone leave ambiguities such as sea?/ground? and rocks?/plants?; combining the two yields sky, mountain, sea, rocks)
Region Classification - HMM
• The top-to-bottom sequence of background regions is modeled as a chain (HMM)
• Marginals are computed by the sum-product message-passing algorithm
• Each region is classified by the maximum marginal
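A minimal sketch of exact sum-product (forward-backward) marginals on a chain of region labels. The transition and emission arrays below are placeholders standing in for the layout prior and the local color/texture classifier scores; they are not the paper's exact factors.

```python
import numpy as np

def chain_marginals(emissions, transitions):
    """Exact per-node marginals on a chain via forward-backward (sum-product).

    emissions:   (n_regions, n_labels) non-negative local scores per region
    transitions: (n_labels, n_labels)  non-negative score for a label pair
                 (upper region label, lower region label)
    """
    n, k = emissions.shape
    fwd = np.zeros((n, k))
    bwd = np.zeros((n, k))

    # Forward pass (top region to bottom region), normalized for stability
    fwd[0] = emissions[0] / emissions[0].sum()
    for i in range(1, n):
        fwd[i] = emissions[i] * (fwd[i - 1] @ transitions)
        fwd[i] /= fwd[i].sum()

    # Backward pass
    bwd[-1] = 1.0
    for i in range(n - 2, -1, -1):
        bwd[i] = transitions @ (emissions[i + 1] * bwd[i + 1])
        bwd[i] /= bwd[i].sum()

    marg = fwd * bwd
    return marg / marg.sum(axis=1, keepdims=True)

# Toy usage: 3 regions, 4 labels; classify each region by the max marginal
emis = np.array([[0.7, 0.1, 0.1, 0.1],
                 [0.2, 0.5, 0.2, 0.1],
                 [0.1, 0.2, 0.1, 0.6]])
trans = np.ones((4, 4)) + np.eye(4)   # placeholder layout compatibility
labels = chain_marginals(emis, trans).argmax(axis=1)
```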
Region Classification - Discussion
General object annotation and detection using context:
• G. Csurka and F. Perronnin. An efficient approach to semantic segmentation. IJCV, 2010.
• C. Desai, D. Ramanan, and C. Fowlkes. Discriminative models for multi-class object layout. ICCV, 2009.
• C. Galleguillos and S. Belongie. Context based object categorization: A critical survey. Computer Vision and Image Understanding, 2010.
• X. He, R. S. Zemel, and D. Ray. Learning and incorporating top-down cues in image segmentation. ECCV, 2006.
• S. Kumar and M. Hebert. A hierarchical field framework for unified context-based classification. ICCV, 2005.
• A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. ICCV, 2007.
• J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost for image understanding: multi-class object recognition and segmentation by jointly modeling appearance, shape and context. IJCV, 81(1):2–23, 2009.
These approaches need approximate inference (e.g., greedy iterative methods, loopy belief propagation).
Background region classification of scenery images is a 1D problem, which enables exact inference.
Region Classification - Details
• Textural & color features, as in Vogel & Schiele, IJCV 2007:
  • HSV color histograms
  • Edge direction histograms
  • Gray-level co-occurrence matrices (GLCM, Haralick et al. 1973), 4 offsets; for each: contrast, energy, entropy, homogeneity, inverse difference moment, and correlation
• Each of these is modeled with a multiclass probabilistic SVM (LibSVM; Wu, Lin & Weng 2004), RBF kernel
• 5-fold cross validation at the image level; each training includes parameter selection by cross validation within the training set
• Dataset of 1144 images (LabelMe: coast, open country, mountains)
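A minimal sketch of this classifier setup using scikit-learn rather than the LibSVM interface used in the paper. Feature extraction is out of scope here; the random data, grid of hyperparameters, and group sizes are arbitrary illustrations.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: one row of color/texture (or layout) features per region,
# y: region labels, groups: the image id of each region (folds split by image)
X = np.random.rand(200, 60)
y = np.random.randint(0, 5, 200)
groups = np.repeat(np.arange(40), 5)

# Multiclass probabilistic SVM with an RBF kernel; the inner grid search
# stands in for the per-fold parameter selection mentioned on the slide.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
grid = GridSearchCV(svm, {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 0.01]}, cv=3)

outer = GroupKFold(n_splits=5)                 # 5-fold CV at the image level
for train_idx, test_idx in outer.split(X, y, groups):
    grid.fit(X[train_idx], y[train_idx])
    proba = grid.predict_proba(X[test_idx])    # per-class probabilities, usable as HMM emissions
```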
Region Classification – Results 1 (figure panels: input image, ground truth, relative location, boundary shape, color & texture, all cues)
Region Classification – Results 2 (figure panels: input image, ground truth, relative location, boundary shape, color & texture, all cues)
Region Classification – Results 3
• Accuracy per class:
  • Color & texture: higher accuracy for trees, field, rocks, plants, snow
  • New cues: better for sky, mountain, sea, sand
  • Performance for the other classes is very low, due to their small number of examples
• Discussion:
  • We achieved the goal of showing that the new cues improve region classification based on texture & color only
  • Many classifications counted as errors are actually correct
  • Related to recent work on object categorization with a huge number of categories (Deng, Berg, Li, Fei-Fei, ECCV 2010; Fergus, Weiss, Torralba, ECCV 2010)
  • Work in progress; 19 categories
Summary
• Focus on characterization of scenery images
• Intuitive observations regarding the statistics of co-occurrence, relative location, and shape of background regions were explicitly quantified and modeled
• Some 3D reasoning
• Non-local properties can capture the gist of images
• Contextual background region classification with exact inference
• The new cues improve region classification based on local descriptors
Future & General Discussion
• A better way to evaluate region classification: work in progress
• Use the layout cues for better top-down segmentation (Felzenszwalb & Veksler, CVPR 2010); a shape prior to address the "shrinking bias" (Vicente, Kolmogorov, Rother, CVPR 2008)
• Use the layout cues to improve scene categorization
• Augment foreground objects into the model; extend the model to other domains
• Use the cues to align pictures
• Generated sketches as a basis for rendering
• Scenery: too simple? Let's first succeed in understanding those images, following the evolution of the biological visual system
Thank You For Your Time