Building text features for object image classification Group 1: Eddie Sun, Youngbum Kim, Yulong Wang
Main idea & Insights
• Main idea
  • Determine which objects are present in an image based on the text that surrounds similar images.
• Insights
  • It is often easier to determine image content from the surrounding text than from currently available image features.
  • Given a large enough dataset, we are bound to find images very similar to an input image, even when matching with simple image features.
Illustration for building text features
[Figure: Internet images with text → text features]
Framework of the approach
• Visual features: SIFT, Gist, Color, Gradient, and Unified (a combination of the previous four)
• K most similar images
• Texts of these similar images
• Training process
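The retrieval step in this framework can be sketched as a plain nearest-neighbor search. This is only a minimal illustration: the Euclidean distance, the in-memory `internet_images` list, and the function names are assumptions, not details given on the slides.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_most_similar(query, dataset, k):
    """Return the k (distance, tags) pairs closest to the query vector.

    dataset is a list of (feature_vector, tags) pairs; it is a hypothetical
    stand-in for the Internet image collection with surrounding text.
    """
    scored = [(euclidean(query, feat), tags) for feat, tags in dataset]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:k]

# Toy data (made up for illustration): 2-D "visual features" with tags.
internet_images = [
    ([0.9, 0.1], ["dog", "puppy"]),
    ([0.8, 0.2], ["dog", "pet", "animal"]),
    ([0.1, 0.9], ["car", "street"]),
]

neighbors = k_most_similar([0.9, 0.12], internet_images, k=2)
neighbor_tags = [tags for _, tags in neighbors]
```

The texts of the returned neighbors (`neighbor_tags` here) are what the later stages turn into text features.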
Experiment • Dataset • The PASCAL Visual Object Classes Challenge
Experiment
• Features
  • SIFT
  • Gist: an abstract representation of the scene that spontaneously activates memory representations of scene categories (a city, a mountain, etc.)
  • Color: color features in the RGB space
  • Gradient
  • Unified: a concatenation of the above four features
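A minimal sketch of the Unified feature as a concatenation of the four descriptors. The per-block L2 normalization is an assumption added here so that no single descriptor dominates; the slides do not say how the blocks are scaled, and the function names are hypothetical.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def unified_feature(sift, gist, color, gradient):
    """Concatenate the four descriptors into one vector.

    Normalizing each block first is an assumption, not something
    stated in the presentation.
    """
    unified = []
    for block in (sift, gist, color, gradient):
        unified.extend(l2_normalize(block))
    return unified

# Toy descriptors (made-up values and lengths, for illustration only).
example = unified_feature([3.0, 4.0], [1.0, 0.0], [0.0, 0.0], [2.0])
```

The resulting vector has one entry per entry of the input descriptors, so it can be used directly in the same similarity search as the individual features.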
Summary
• How it works
• Results
How it works?
• Input image: (1) training images, (2) test images
• Extract visual features: SIFT, Gist, Color, Gradient, Unified
• Get similar images based on visual features from the Internet image dataset with text
• Return the most similar images with their labels, e.g. "Cute, puppy, canine", "Dog", "cool dogs, boxer", "Puppy", "Dog, pet, animal"
• Construct text features from the labels
• Visual features → Visual Classifier; text features → Text Classifier
• Learn parameters on training images
• Merge → Fusion Classifier → final output: "Dog"
• Notes
  • Unified feature: weighted average of the above 4 features
  • Text features: normalized histogram of tag counts
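The text feature described in the notes, a normalized histogram of tag counts, can be sketched as follows. The fixed `vocabulary`, the lowercasing, and the function name are assumptions made for this illustration; the example tags are the ones shown on the slide.

```python
from collections import Counter

def text_feature(neighbor_tags, vocabulary):
    """Build a normalized histogram of tag counts over a fixed vocabulary.

    neighbor_tags: one list of tags per retrieved similar image.
    vocabulary: ordered list of tags to count (hypothetical; the slides
    do not say how the vocabulary is chosen).
    """
    counts = Counter(tag.lower() for tags in neighbor_tags for tag in tags)
    total = sum(counts[word] for word in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)  # no known tag among the neighbors
    return [counts[word] / total for word in vocabulary]

# Labels of the retrieved images, as in the slide's example.
neighbor_tags = [
    ["Cute", "puppy", "canine"],
    ["Dog"],
    ["cool dogs", "boxer"],
    ["Puppy"],
    ["Dog", "pet", "animal"],
]
vocabulary = ["dog", "puppy", "pet", "car"]
feature = text_feature(neighbor_tags, vocabulary)
```

Tags outside the vocabulary (such as "cool dogs" here) are simply ignored, and the histogram sums to 1 whenever at least one vocabulary tag is present.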
Results
• Text features are built from visual features: better visual features -> better text features
• Combining visual and text classifiers: the visual and text classifiers correct each other
• Number of training images
  • Small number of training images -> text classifiers outperform visual classifiers
  • Combined -> always better
• Number of Internet images in the dataset
  • 200,000 -> 600,000: big improvement
  • 600,000 -> 1 million: very small improvement
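To illustrate how two classifiers can correct each other, here is a small late-fusion sketch. It is not the presenters' fusion classifier, whose form the slides do not specify: the weighted average of scores, the 0.5 decision threshold, the grid search over the weight, and all scores below are assumptions chosen for the example.

```python
def fuse(visual_score, text_score, weight):
    """Weighted average of the two classifier scores (assumed fusion rule)."""
    return weight * visual_score + (1.0 - weight) * text_score

def accuracy_count(visual_scores, text_scores, labels, weight):
    """Number of examples classified correctly at threshold 0.5."""
    return sum(
        (fuse(v, t, weight) >= 0.5) == bool(y)
        for v, t, y in zip(visual_scores, text_scores, labels)
    )

def learn_weight(visual_scores, text_scores, labels, steps=10):
    """Pick the fusion weight with the most correct training examples."""
    best_weight, best_correct = 0.0, -1
    for i in range(steps + 1):
        w = i / steps
        correct = accuracy_count(visual_scores, text_scores, labels, w)
        if correct > best_correct:
            best_weight, best_correct = w, correct
    return best_weight

# Made-up scores: each classifier is wrong on two of the four examples.
visual_scores = [0.9, 0.3, 0.7, 0.1]
text_scores = [0.3, 0.9, 0.1, 0.7]
labels = [1, 1, 0, 0]

w = learn_weight(visual_scores, text_scores, labels)
fused_correct = accuracy_count(visual_scores, text_scores, labels, w)
```

In this toy data each classifier alone gets only half of the examples right, while the fused score at the learned weight classifies all four correctly.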