I know what you did last summer: object-level auto-annotation of holiday snaps Stephan Gammeter, Lukas Bossard, Till Quack, Luc Van Gool
Outline • Introduction • Automatic object mining • Scalable object cluster retrieval • Object knowledge from the wisdom of crowds • Object-level auto-annotation • Experiments and Results • Conclusions
Introduction • Most photo organization tools allow tagging (labeling) with keywords • Manual tagging is a tedious process • Goal: automated annotation
Auto-annotation steps • First step: build a database by large-scale crawling of community photo collections • Second step: recognize objects in a query image against that database
Step detail • The crawling stage: • Creates a large database of object models; each object is represented as a cluster of images (an object cluster) • Tells us what each cluster contains (labels, GPS location, related content) • The retrieval stage: • Consists of a large-scale retrieval system based on local image features • This is the stage that is optimized
Step detail (2) • The annotation stage: • Estimates the position of the object within the image (bounding box) • Annotates it with text, location, and related content from the database
How the resulting method differs • It is not general annotation of images with words • Annotation happens at the object level and includes textual labels, related web sites, and GPS location • Annotation of a query image happens within seconds • Example labels: Building, Taipei 101
Automatic object mining • A geospatial grid is overlaid on the earth, and Flickr is queried for the geo-tagged photos (photos with a GPS location) in each grid cell
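The grid overlay can be sketched as follows. This is a minimal illustration, not the authors' crawler: the function name `grid_cells`, the cell size in degrees, and the bounding-box layout are all assumptions made for the example, and the actual photo query per cell is left out.

```python
import math
from typing import Iterator, Tuple

# Hypothetical layout: (west, south, east, north) in degrees.
BBox = Tuple[float, float, float, float]


def grid_cells(min_lat: float, max_lat: float,
               min_lon: float, max_lon: float,
               step_deg: float) -> Iterator[BBox]:
    """Yield the bounding box of every grid cell covering the region."""
    # Rounding before ceil guards against floating-point noise in the division.
    n_rows = math.ceil(round((max_lat - min_lat) / step_deg, 9))
    n_cols = math.ceil(round((max_lon - min_lon) / step_deg, 9))
    for row in range(n_rows):
        for col in range(n_cols):
            south = min_lat + row * step_deg
            west = min_lon + col * step_deg
            # Clamp the last row/column to the region boundary.
            east = min(west + step_deg, max_lon)
            north = min(south + step_deg, max_lat)
            yield (west, south, east, north)
```

Each yielded box would then be used as the spatial filter of one geo-tagged photo query; the 0.25 degree step used in a call such as `grid_cells(25.0, 25.5, 121.0, 122.0, 0.25)` is only an example value.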
Scalable object cluster retrieval • Visual vocabulary technique: the vocabulary is created by clustering the descriptor vectors of local visual features such as SIFT or SURF • Candidate images are ranked using TF*IDF • RANSAC is used to estimate a homography between each candidate and the query image • A candidate is retained only when the number of inliers exceeds a given threshold
TF*IDF • D: candidate document (candidate image), which contains a set of visual words • v: visual word (quantized local feature) • df(v): document frequency of visual word v • Note: we want to know which object is present in the query image, so we return a ranked list of object clusters instead of a ranked list of images
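A rough sketch of two of the steps above: assigning descriptors to visual words, and keeping only geometrically verified candidates. It assumes the vocabulary (cluster centers) already exists and that the RANSAC inlier counts were computed elsewhere; the function names and the tiny 2-D descriptors are illustrative only (real SIFT or SURF descriptors have 128 or 64 dimensions).

```python
from collections import Counter
from typing import List, Sequence, Tuple


def nearest_word(descriptor: Sequence[float],
                 vocabulary: Sequence[Sequence[float]]) -> int:
    """Return the index of the closest vocabulary center (squared L2)."""
    best_index, best_dist = -1, float("inf")
    for index, center in enumerate(vocabulary):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_words(descriptors: Sequence[Sequence[float]],
                 vocabulary: Sequence[Sequence[float]]) -> Counter:
    """Count how often each visual word occurs in one image."""
    return Counter(nearest_word(d, vocabulary) for d in descriptors)


def keep_verified(candidates: List[Tuple[str, int]],
                  min_inliers: int) -> List[str]:
    """Keep candidates whose inlier count exceeds the threshold.

    `candidates` holds (image_id, inlier_count) pairs; the counts are
    assumed to come from a RANSAC homography fit done elsewhere.
    """
    return [image_id for image_id, inliers in candidates
            if inliers > min_inliers]
```

A linear scan over the vocabulary is only workable for toy sizes; at the vocabulary sizes mentioned later in the slides an approximate nearest-neighbour structure would be needed.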
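For illustration, a textbook TF*IDF ranking over bags of visual words might look like the sketch below. The weighting (relative term frequency times log(N / df(v))) and the cosine comparison are the standard formulation and are an assumption here; the slide does not show the exact variant the authors used.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def idf_weights(docs: Dict[str, Counter]) -> Dict[int, float]:
    """idf(v) = log(N / df(v)), with df(v) the number of images containing v."""
    df: Counter = Counter()
    for words in docs.values():
        df.update(words.keys())
    n_docs = len(docs)
    return {v: math.log(n_docs / count) for v, count in df.items()}


def tfidf_vector(words: Counter, idf: Dict[int, float]) -> Dict[int, float]:
    """Weight each visual word by relative frequency times idf."""
    total = sum(words.values())
    return {v: (c / total) * idf.get(v, 0.0) for v, c in words.items()}


def cosine(a: Dict[int, float], b: Dict[int, float]) -> float:
    dot = sum(w * b.get(v, 0.0) for v, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_images(query: Counter,
                docs: Dict[str, Counter]) -> List[Tuple[str, float]]:
    """Return (image_id, score) pairs, best match first."""
    idf = idf_weights(docs)
    q = tfidf_vector(query, idf)
    scored = [(doc_id, cosine(q, tfidf_vector(words, idf)))
              for doc_id, words in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A production system would score through an inverted index rather than looping over every image; the loop keeps the example short. Turning the image ranking into a cluster ranking then only requires mapping each image to its parent cluster.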
Object knowledge from the wisdom of crowds • The database is organized by object clusters, not by individual images • The partly redundant information within a cluster can be used to: • Obtain a better understanding of the object's appearance • Segment objects • Create more compact inverted indices
Object-specific feature confidence score • The pair-wise feature matches within a cluster are used to derive a score for each feature • Only features that match many of their counterparts in other images receive a high score • Because the photos are taken from varying viewpoints around the object, background features receive fewer matches
Object-specific feature confidence score (2) • f: feature • i: image • The score uses the set of inlying feature matches for each image pair (i, j) • It also uses the number of images in the current object cluster o • Two parameters, set to 1 and 1/3 • Note: the bounding box is drawn around all features with a confidence higher than a threshold
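The idea can be sketched with a deliberately simplified score. The formula below (the fraction of other images in the cluster in which a feature has an inlying match) is an assumption made for the example and is not the authors' formula, which is not legible in the slide; it does not use the two parameters mentioned there. The names `feature_confidence` and `bounding_box` and the threshold value are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def feature_confidence(matches: Dict[Tuple[str, str], Set[int]],
                       image: str, n_images: int) -> Dict[int, float]:
    """Score each feature of `image` by how many other images it matches.

    matches[(i, j)] is the set of feature ids of image i that are
    inliers in the verified match between images i and j.
    """
    counts: Counter = Counter()
    for (i, _j), features in matches.items():
        if i == image:
            counts.update(features)
    others = max(n_images - 1, 1)  # avoid division by zero
    return {f: count / others for f, count in counts.items()}


def bounding_box(positions: Dict[int, Point],
                 confidence: Dict[int, float],
                 threshold: float) -> Optional[Box]:
    """Box around all features whose confidence exceeds the threshold."""
    points = [positions[f] for f, c in confidence.items() if c > threshold]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))
```

With this score, a background feature that matches in only one of many images stays below the threshold and does not enlarge the box.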
Better indices through object-specific feature sampling • Estimated bounding boxes help compact the inverted index of visual words, since features outside the box can be dropped • Object clusters whose photos were all taken by a single user are removed
Last step of retrieval stage • Select the best object cluster as the final result • Simple voting: each retrieved image votes for its parent cluster • Normalizing by cluster size is not feasible • Only the votes of the 5 images per cluster with the highest retrieval scores are counted
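Both pruning steps are simple filters. The sketch below assumes a hypothetical in-memory layout (a list of photo owners per cluster, and a position per feature); the real index structure is not described in the slides.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def drop_single_user_clusters(
        cluster_users: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only clusters whose photos come from at least two users."""
    return {cluster: users for cluster, users in cluster_users.items()
            if len(set(users)) > 1}


def features_in_box(positions: Dict[int, Point], box: Box) -> List[int]:
    """Return the ids of features that lie inside the bounding box."""
    x_min, y_min, x_max, y_max = box
    return [f for f, (x, y) in positions.items()
            if x_min <= x <= x_max and y_min <= y <= y_max]
```

Only the surviving features of the surviving clusters would then be written to the inverted index.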
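The capped voting can be sketched as below. It assumes each vote is weighted by the image's retrieval score, which the slide does not state; the cap of 5 votes per cluster is from the slide. The function name and the input layout are illustrative.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def best_cluster(retrieved: List[Tuple[str, str, float]],
                 votes_per_cluster: int = 5) -> str:
    """Pick the cluster with the highest capped vote total.

    `retrieved` holds (image_id, cluster_id, retrieval_score) triples.
    Only the `votes_per_cluster` best-scoring images of each cluster
    vote, so large clusters cannot win on size alone.
    """
    scores_by_cluster: Dict[str, List[float]] = defaultdict(list)
    for _image_id, cluster_id, score in retrieved:
        scores_by_cluster[cluster_id].append(score)
    totals = {
        cluster: sum(heapq.nlargest(votes_per_cluster, scores))
        for cluster, scores in scores_by_cluster.items()
    }
    return max(totals, key=lambda cluster: totals[cluster])
```

The cap plays the role that normalizing by cluster size would otherwise play: a cluster with eight weak matches no longer beats a cluster with three strong ones.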
Object-level auto-annotation • Consists of two steps: • Bounding box estimation • Labelling • Bounding box estimation • The box is estimated in the same way as for database images • The query image is matched to a number of images in the top-ranked cluster • Labelling • The information stored with the object cluster is simply copied to serve as labels for the query image
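The labelling step amounts to copying cluster metadata onto the query. A minimal sketch, assuming hypothetical record types `ObjectCluster` and `Annotation` whose fields mirror the kinds of information named in the slides (labels, GPS location, related content):

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class ObjectCluster:
    """Metadata mined for one object cluster (hypothetical layout)."""
    labels: List[str]
    gps: Tuple[float, float]  # (latitude, longitude)
    related_urls: List[str]


@dataclass
class Annotation:
    """Object-level annotation attached to a query image."""
    box: Box
    labels: List[str]
    gps: Tuple[float, float]
    related_urls: List[str]


def annotate(query_box: Box, cluster: ObjectCluster) -> Annotation:
    """Copy the cluster's metadata onto the estimated bounding box."""
    return Annotation(
        box=query_box,
        labels=list(cluster.labels),          # copy, do not share lists
        gps=cluster.gps,
        related_urls=list(cluster.related_urls),
    )
```

Copying the lists keeps later edits to one annotation from changing the cluster record.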
Experiments • Conducted on a large dataset collected from Flickr • Collected a challenging test set of 674 images from Picasa Web Albums • Estimated bounding boxes cover on average 52% of each image
Efficiency and Precision of Recognition • Compared configurations: • Baseline: TF*IDF ranking on a 500K visual vocabulary, as used in other work • Bounding-box features + no single-user clusters • All features + no single-user clusters • 66% random feature subset + no single-user clusters • 66% random feature subset
Annotation precision • Evaluates how well the system localizes bounding boxes by measuring the intersection-over-union (IoU) of the ground-truth and hypothesis boxes • 76.1%
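For reference, intersection-over-union for two axis-aligned boxes can be computed as in this short sketch; the (x_min, y_min, x_max, y_max) box layout is an assumption of the example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection area divided by union area of two boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Overlap extents are clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0
```

Two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.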
Conclusions • Presented a full auto-annotation pipeline for holiday snaps • Object-level annotation with bounding box, relevant tags, Wikipedia articles and GPS location