Learning Models for Object Recognition from Natural Language Descriptions
Presenters: Sagardeep Mahapatra – 108771077, Keerti Korrapati – 108694316
Goal
• Learning models for visual object recognition from natural language descriptions alone

Why learn models from natural language?
• Manually collecting and labeling large image sets is difficult
• A new training set needs to be created for each new category
• Finding images for fine-grained object categories is hard
  • e.g., species of plants and animals
• But detailed visual descriptions may be readily available
Outline
• Datasets for training and testing
• Natural Language Processing methods
• Template filling
• Extraction of visual attributes from test images
• Scoring an image against the learnt template models
• Results
• Observations
Dataset
• Text descriptions associated with ten species of butterflies from the eNature guide are used to construct the template models
• Butterflies were chosen because they have distinctive visual features such as wing colors, spots, etc.
• Images downloaded from Google for each of the ten butterfly categories form the test set

The ten species: Danaus plexippus, Heliconius charitonius, Heliconius erato, Junonia coenia, Lycaena phlaeas, Nymphalis antiopa, Papilio cresphontes, Pieris rapae, Vanessa atalanta, Vanessa cardui
Natural Language Processing
• Goal: convert unstructured data in the descriptions into structured templates
(Diagram: factual but unstructured text is passed through information extraction to produce filled templates.)
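As a concrete illustration, a filled template for one species might look like the sketch below. The slot names (dominant_color, spots) are illustrative assumptions, not the paper's exact schema.

    # Hypothetical filled template for one category; the slot names here
    # are illustrative assumptions, not the paper's exact schema.
    template_danaus_plexippus = {
        "category": "Danaus plexippus",
        "dominant_color": "orange",                               # dominant wing color slot
        "spots": [{"color": "white", "position": "wing edges"}],  # one entry per described spot
    }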
Template Filling
Pipeline: Tokenization → Part-of-Speech Tagging → Custom Transformations → Chunking → Template Filling
• Text is tokenized into words
• Tokens are tagged with parts of speech (using the C&C tagger)
• Custom transformations are performed to correct known mistakes
  • Required because the eNature guide tends to suppress some information
• Chunks of text matching pre-defined tag sequences are extracted
  • e.g., noun phrases ('wings have blue spots'), adjective phrases ('wings are black')
• Extracted phrases are filtered through a list of colors, patterns, and positions to fill the template slots (see the sketch below)
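A minimal sketch of this pipeline, assuming NLTK as a stand-in for the C&C tagger and a toy color/position vocabulary (the paper's filter lists are larger):

    import nltk  # assumes nltk plus its tokenizer and POS-tagger data are installed

    COLORS = {"blue", "black", "orange", "white", "red"}  # toy filter list

    def fill_template(description):
        tokens = nltk.word_tokenize(description)   # 1) tokenization
        tagged = nltk.pos_tag(tokens)              # 2) POS tagging (stand-in for C&C)
        # 3) chunk phrases matching a pre-defined tag sequence:
        #    one or more adjectives, optionally followed by nouns
        parser = nltk.RegexpParser("PHRASE: {<JJ>+<NN.*>*}")
        tree = parser.parse(tagged)
        template = {"dominant_color": None, "spots": []}
        for subtree in tree.subtrees(filter=lambda t: t.label() == "PHRASE"):
            words = [w.lower() for w, _ in subtree.leaves()]
            colors = [w for w in words if w in COLORS]  # 4) filter through color list
            if "spots" in words and colors:
                template["spots"].append({"color": colors[0]})
            elif colors:
                template["dominant_color"] = colors[0]
        return template

    print(fill_template("The wings are black and have blue spots."))
    # -> {'dominant_color': 'black', 'spots': [{'color': 'blue'}]}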
Visual Processing
Performed based on two attributes of butterflies:
• Dominant wing color
• Colored spots

1) Image Segmentation
• Variation in the background can pose challenges during image classification
• Hence, the butterfly is segmented from the background using the 'star shape' graph-cut approach (a sketch follows below)
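Since the star-shape graph cut has no off-the-shelf implementation in common libraries, the sketch below uses OpenCV's GrabCut (also a graph-cut segmentation, but without the star-shape prior) as a stand-in; the central initialization rectangle is an assumption:

    import cv2
    import numpy as np

    def segment_butterfly(image_bgr):
        # GrabCut as a stand-in for the paper's star-shape graph cut
        mask = np.zeros(image_bgr.shape[:2], np.uint8)
        h, w = mask.shape
        rect = (w // 8, h // 8, 3 * w // 4, 3 * h // 4)  # assume roughly central subject
        bgd = np.zeros((1, 65), np.float64)
        fgd = np.zeros((1, 65), np.float64)
        cv2.grabCut(image_bgr, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
        # keep definite and probable foreground pixels
        fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype("uint8")
        return image_bgr * fg[:, :, None]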
2) Spot Detection (using a spot classifier)
• Hand-marked butterfly images with no prior class information form the training set for the spot classifier
• Candidate regions likely to be spots are extracted using a Difference-of-Gaussians interest point operator
• Image descriptors (SIFT features) are extracted around each candidate spot to classify it as spot or non-spot

3) Color Modelling
• Required to connect the color names of dominant wing colors and spot colors in the learnt templates to image observations
• For each color name c_i, a probability distribution p(z | c_i) is learnt from training butterfly images, where z is a pixel color observation in the L*a*b* color space

(A sketch of both steps follows below.)
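A minimal sketch of both steps. OpenCV's SIFT detector is itself a Difference-of-Gaussians interest point operator, so it provides both candidate locations and descriptors; the single-Gaussian color density is an assumption, as the slides do not specify the paper's exact density model:

    import cv2
    import numpy as np
    from scipy.stats import multivariate_normal

    def spot_candidates(gray_image):
        # SIFT keypoint detection = Difference-of-Gaussians interest points;
        # the descriptors around each candidate would feed a separately
        # trained spot / non-spot classifier (not shown here)
        sift = cv2.SIFT_create()
        keypoints, descriptors = sift.detectAndCompute(gray_image, None)
        return keypoints, descriptors

    def fit_color_model(lab_pixels):
        # Fit p(z | c_i) for one color name from hand-labelled L*a*b* pixels;
        # lab_pixels is an (N, 3) float array. A single Gaussian is assumed.
        mean = lab_pixels.mean(axis=0)
        cov = np.cov(lab_pixels, rowvar=False)
        return multivariate_normal(mean, cov)  # .pdf(z) gives p(z | c_i)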
Generative Model
Given an input image I, the probability of the image under butterfly category B_i is modelled as a product over the spot and wing observations:

  p(I | B_i) = p(z_dom | B_i) · Π_j p(s_j | B_i)

• The dominant-wing-color term p(z_dom | B_i) uses the dominant color name prior for B_i
• Each spot observation s_j is marginalized over spot color names, with equal priors assigned to all spot colors
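A minimal scoring sketch under the structures assumed above (templates as dicts, color models as fitted Gaussians); the exact factorization in the paper may differ in detail:

    import numpy as np

    def log_score(image_obs, template, color_models):
        # log p(I | B): dominant-wing-color term plus a sum over detected
        # spots, each spot marginalized over the template's spot color
        # names with equal priors. image_obs is assumed to hold
        # "wing_pixels" (N, 3 array) and "spot_pixels" (list of 3-vectors).
        score = 0.0
        dom = color_models[template["dominant_color"]]
        score += np.sum(np.log(dom.pdf(image_obs["wing_pixels"]) + 1e-12))
        if template["spots"]:
            spot_colors = [s["color"] for s in template["spots"]]
            prior = 1.0 / len(spot_colors)  # equal priors over spot colors
            for z in image_obs["spot_pixels"]:
                score += np.log(sum(prior * color_models[c].pdf(z)
                                    for c in spot_colors) + 1e-12)
        return score

    # classification: pick the category template with the highest score
    # best = max(templates, key=lambda t: log_score(obs, t, color_models))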
Experimental Results
Two sets of experiments were performed:
• Performance of human beings in recognizing butterflies from textual descriptions
  • This may reasonably be considered an upper bound
• Performance of the proposed method
Observations
• Accuracy of the proposed method was comparable to that of non-native English speakers
• Accuracy of the proposed method was above 80 percent for four categories
• 'Heliconius charitonius' was the hardest category for humans, and also for both the ground-truth and learnt templates
• Performance with ground-truth templates was comparable to that with the learnt templates
  • Errors introduced into the templates by the NLP methods did not have much impact