I know what you did last summer: object-level auto-annotation of holiday snaps

Stephan Gammeter, Lukas Bossard, Till Quack, Luc Van Gool

Outline • Introduction • Automatic object mining • Scalable object cluster retrieval • Object knowledge from the wisdom of crowds • Object-level auto-annotation • Experiments and Results • Conclusions

Introduction • Most photo organization tools allow tagging (labeling) with keywords • Tagging is a tedious process • Goal : automated annotation

Auto-annotation steps • First step : Build a database by large-scale crawling of community photo collections • Second step : Recognize objects in a query image against this database

Step detail • The crawling stage : • Creates a large database of object models; each object is represented as a cluster of images (object cluster) • Tells us what each cluster contains (labels, GPS location, related content) • The retrieval stage : • Consists of a large-scale retrieval system based on local image features • Optimize this stage

Step detail (2) • The annotation stage : • Estimates the position of the object within the image (bounding box) • Annotates it with text, location, and related content from the database

How the resulting method differs • No general annotation of images with words • Annotation happens at the object level and includes textual labels, related web sites, and GPS location • Annotating a query image takes only seconds • Example annotation : Building Taipei 101

Automatic object mining • A geospatial grid is overlaid on the earth, and Flickr is queried per grid tile to retrieve geo-tagged photos (GPS location)
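The grid overlay can be sketched as follows. This is a minimal illustration, assuming square tiles of a fixed size in degrees; the function name `grid_tiles`, the tile size, and the example region are hypothetical, and the actual Flickr query for each tile is left out.

```python
import math

def grid_tiles(region, step_deg):
    """Yield (west, south, east, north) tiles, in degrees, that cover region.

    region is (min_lon, min_lat, max_lon, max_lat). Tiles on the far edge
    are clipped so that they never extend past the region.
    """
    min_lon, min_lat, max_lon, max_lat = region
    n_cols = math.ceil((max_lon - min_lon) / step_deg)
    n_rows = math.ceil((max_lat - min_lat) / step_deg)
    for row in range(n_rows):
        for col in range(n_cols):
            # Compute corners from integer indices to avoid accumulating
            # floating-point error over many additions.
            west = min_lon + col * step_deg
            south = min_lat + row * step_deg
            east = min(west + step_deg, max_lon)
            north = min(south + step_deg, max_lat)
            yield (west, south, east, north)

# Hypothetical region, split into quarter-degree tiles.
tiles = list(grid_tiles((121.0, 25.0, 122.0, 25.5), 0.25))
```

Each tile would then serve as the bounding box of one geo-tagged photo query.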

Scalable object cluster retrieval • Visual vocabulary technique : created by clustering the descriptor vectors of local visual features such as SIFT or SURF • Candidates are ranked using TF*IDF • RANSAC is used to estimate a homography between candidate and query image • A candidate is retained only when the number of inliers exceeds a given threshold

TF*IDF • D : candidate document (candidate image), containing a set of visual words • v : visual word (local feature) • df(v) : document frequency of visual word v • Note : we want to know which object is present in the query image, so we return a ranked list of object clusters instead of images
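To make the ranking concrete, here is a small sketch of TF*IDF scoring over visual words. It uses the textbook form (term frequency times log of N over df(v), compared by cosine similarity), which is an assumption and may differ from the formula on the slide. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def idf_weights(database):
    """database maps image id -> list of visual word ids."""
    n_docs = len(database)
    df = Counter()
    for words in database.values():
        df.update(set(words))  # each image counts once per visual word
    return {v: math.log(n_docs / count) for v, count in df.items()}

def tfidf_vector(words, idf):
    """Sparse TF*IDF vector; words unseen in the database are ignored."""
    counts = Counter(words)
    total = len(words)
    return {v: (c / total) * idf[v] for v, c in counts.items() if v in idf}

def cosine(a, b):
    dot = sum(w * b[v] for v, w in a.items() if v in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_images(query_words, database):
    """Return image ids sorted by decreasing similarity to the query."""
    idf = idf_weights(database)
    query_vec = tfidf_vector(query_words, idf)
    scores = {img: cosine(query_vec, tfidf_vector(words, idf))
              for img, words in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy database: three images described by visual word ids.
database = {"a": [1, 2, 3, 3], "b": [4, 5, 6], "c": [1, 7, 8]}
ranking = rank_images([3, 3, 2], database)
```

A production system would evaluate the same scores through an inverted index rather than by scanning every image.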

Object knowledge from the wisdom of crowds • Database : • Not organized by individual images but by object clusters • We can use partly redundant information to : • Obtain a better understanding of the object appearance • Segment objects • Create more compact inverted indices

Object-specific feature confidence score • The feature matches from pair-wise image matching can be used to derive a score for each feature • Only features that match many of their counterparts in other images receive a high score • Because many of the photos are taken from varying viewpoints around the object, background features receive fewer matches

Object-specific feature confidence score • f : feature, i : image • Set of inlying feature matches for the image pair i, j • Number of images in the current object cluster o • Two parameters, set to 1 and 1/3 • Note : The bounding box is drawn around all features with confidence higher than a threshold
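The idea behind the score can be illustrated with a simplified stand-in. The sketch below is not the formula from the slide: it assumes the confidence of a feature is simply the fraction of partner images in which that feature was an inlier, and it does not use the two parameters mentioned above. The function names, the threshold value, and the toy data are hypothetical.

```python
from collections import Counter

def feature_confidence(inlier_sets, n_images):
    """Confidence per feature of one image i.

    inlier_sets holds one set per partner image j, containing the ids of
    features of image i that were inliers in the match between i and j.
    n_images is the number of images in the object cluster.
    """
    counts = Counter()
    for matched in inlier_sets:
        counts.update(matched)
    partners = max(n_images - 1, 1)  # avoid division by zero
    return {f: c / partners for f, c in counts.items()}

def bounding_box(positions, confidence, threshold):
    """Box (x_min, y_min, x_max, y_max) around features above threshold."""
    points = [positions[f] for f, c in confidence.items() if c > threshold]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

# Image i matched against three partner images in a cluster of four.
confidence = feature_confidence([{1, 2, 3}, {1, 2}, {1, 4}], n_images=4)
positions = {1: (10, 20), 2: (50, 80), 3: (200, 5), 4: (0, 300)}
box = bounding_box(positions, confidence, threshold=0.5)
```

Features 3 and 4 match only once, so they fall below the threshold and stay outside the box, which mirrors how rarely matched background features are excluded.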

Object-specific feature confidence score

Better indices through object-specific feature sampling • Estimated bounding boxes help to compact our inverted index of visual words • Object clusters taken by a single user are removed
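A rough sketch of these two filters follows. It assumes a hypothetical in-memory layout in which each cluster lists its images, and each image carries its uploader, its estimated bounding box, and its features as (visual word, position) pairs. None of these names come from the original system.

```python
def inside(box, point):
    """True if point (x, y) lies within box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max

def features_to_index(clusters):
    """Return (cluster_id, image_id, visual_word) entries worth indexing."""
    entries = []
    for cluster_id, cluster in clusters.items():
        users = {img["user"] for img in cluster["images"].values()}
        if len(users) < 2:
            continue  # drop clusters whose photos all come from one user
        for image_id, img in cluster["images"].items():
            for word, point in img["features"]:
                if inside(img["bbox"], point):  # keep object features only
                    entries.append((cluster_id, image_id, word))
    return entries

clusters = {
    "tower": {"images": {
        "t1": {"user": "u1", "bbox": (0, 0, 100, 100),
               "features": [(7, (10, 10)), (9, (150, 20))]},
        "t2": {"user": "u2", "bbox": (0, 0, 50, 50),
               "features": [(7, (25, 25))]},
    }},
    "cafe": {"images": {
        "c1": {"user": "u3", "bbox": (0, 0, 10, 10),
               "features": [(3, (1, 1))]},
    }},
}
entries = features_to_index(clusters)
```

Only the entries returned here would be written to the inverted index, which is what makes it more compact.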

Last step of retrieval stage • Select the best object cluster as the final result • Simple voting : each retrieved image votes for its parent cluster • Normalizing by cluster size is not feasible • Only the votes of the 5 images per cluster with the highest retrieval scores are counted
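The voting step might look like the sketch below. It assumes that each counted image contributes its retrieval score as its vote; the slide does not say whether votes are weighted, so this is one possible reading. The function name and toy data are hypothetical.

```python
from collections import defaultdict

def best_cluster(retrieved, image_to_cluster, votes_per_cluster=5):
    """Pick the winning cluster from (image_id, score) retrieval results.

    Only the top-scoring votes_per_cluster images of each cluster count,
    so large clusters cannot win on sheer size.
    """
    scores_by_cluster = defaultdict(list)
    for image_id, score in retrieved:
        scores_by_cluster[image_to_cluster[image_id]].append(score)
    totals = {
        cluster: sum(sorted(scores, reverse=True)[:votes_per_cluster])
        for cluster, scores in scores_by_cluster.items()
    }
    return max(totals, key=totals.get), totals

# A large cluster with eight weak matches against a small one with three
# strong matches.
image_to_cluster = {f"big{i}": "big" for i in range(8)}
image_to_cluster.update({f"small{i}": "small" for i in range(3)})
retrieved = [(f"big{i}", 0.4) for i in range(8)]
retrieved += [("small0", 0.9), ("small1", 0.8), ("small2", 0.7)]

winner, totals = best_cluster(retrieved, image_to_cluster)
```

Without the cap the large cluster would collect eight votes and win; with it, the three strong matches of the small cluster decide the result.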

Object-level auto-annotation • Consists of two steps : • Bounding box estimation • Labelling • Bounding box estimation • Estimated in the same way as for database images • The query image is matched to a number of images in the top-ranked cluster • Labelling • Simply copy the information from the object cluster to serve as labels for the query image

Experiments • Conducted on a large dataset collected from Flickr • Collected a challenging test set of 674 images from Picasa Web Albums • Estimated bounding boxes cover on average 52% of each image

Efficiency and Precision of Recognition • Compared configurations : • Baseline : TF*IDF ranking on a 500K visual vocabulary, as used in other work • Bounding box features + no single-user clusters • All features + no single-user clusters • 66% random feature subset + no single-user clusters • 66% random feature subset

67%

Annotation precision • Evaluate how well our system localizes bounding boxes by measuring the intersection-over-union (IoU) of the ground-truth and hypothesis boxes 76.1%
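For reference, intersection-over-union for two axis-aligned boxes can be computed as in this short sketch. The (x_min, y_min, x_max, y_max) box layout and the function name `iou` are assumptions made for the example.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Width and height of the overlap, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0

ground_truth = (0, 0, 10, 10)
hypothesis = (5, 5, 15, 15)
overlap = iou(ground_truth, hypothesis)  # 25 / 175
```

A value of 1 means the hypothesis box coincides with the ground truth, and 0 means the two boxes do not overlap at all.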

Results

Conclusions • Presented a full auto-annotation pipeline for holiday snaps • Object-level annotation with bounding box, relevant tags, Wikipedia articles and GPS location

Thanks!!!!
