Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information. Presented by Ivan Chiou. Authors: Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale
Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information Presented by Ivan Chiou
Authors • Duc-Tien Dang-Nguyen • Giulia Boato • Alessandro Moschitti • Francesco G.B. De Natale • Department of Information and Computer Science, University of Trento, Italy
Abstract • Background: approaches to improving image ranking • Concerned with user annotation, time and location • Proposal • Define a novel multimodal similarity measure • Combine visual features, annotated concepts, and geotagging • Propose a learning approach based on SVMs (Support Vector Machines).
Introduction • Image-graph based techniques • Vertices include visual and semantic information. • Probabilistic models • PLSA (Probabilistic Latent Semantic Analysis) methodology • Visual features • Annotation • GPS coordinates • SVMs, able to learn from the data the weights to be assigned. • Random set of image queries • Retrieve the set of images having the highest similarity • Judged relevant by human annotators • Train SVMs with these examples.
Combining Visual, Concept and GPS Signals (1/2) • PLSA • User-generated multimedia content • Visual content • Image tagging • Geolocation • Produces corresponding topic spaces with reduced dimensions. • Expectation Maximization • Fast online retrieval for very large datasets
Combining Visual, Concept and GPS Signals (2/2) • PLSA with 100 topics. • Visual features • SIFT (Scale Invariant Feature Transform) • 128-element descriptor with 2500 salient points. • 2500 salient points (K-Means, training set of 5000 images) • Bag-of-words associating a feature vector with each image. • Image annotation • Consists of all the tags in the dataset, except words used just once or by a single user. • Total number: 5500 words • GPS coordinates • Calculated as the distance between the GPS coordinates of the query and those of the retrieved images.
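Two of the steps above lend themselves to a short sketch: turning local descriptors into a bag-of-words histogram, and measuring the distance between two GPS positions. The sketch assumes the K-Means centroids act as the visual vocabulary and uses the haversine great-circle formula for the GPS distance; the slides do not name the distance formula, so that choice, the function names, and the toy values are assumptions.

```python
import math


def bag_of_words(descriptors, centroids):
    """Count, for each centroid, how many descriptors fall nearest to it."""
    hist = [0] * len(centroids)
    for desc in descriptors:
        nearest = min(
            range(len(centroids)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(desc, centroids[k])),
        )
        hist[nearest] += 1
    return hist


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Toy example with 2-D descriptors and a 2-word vocabulary
# (real SIFT descriptors have 128 elements).
print(bag_of_words([[0, 0], [1, 1], [9, 9]], [[0, 0], [10, 10]]))  # prints [2, 1]
```

A smaller distance between the query position and a candidate position would translate into a higher geographic similarity; how the distance is mapped to a score is not stated on the slide.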
Supervised Multimodal Approach (1/2) • Improves retrieval accuracy • Relies on a Development Set (DS) • Relevant images • Relevant • Irrelevant • Annotated by users • Proposes SVMs • Two important properties • They are robust to overfitting, offering the possibility to trade off between generalization and empirical error in order to tune the model to a more general setting. • They can include additional features in the parameter vector
Supervised Multimodal Approach(2/2) • SVMs: • Multimodal 2(MM2)
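The idea of letting an SVM learn how much weight each modality deserves can be illustrated as follows. This is a sketch, not the authors' MM2 formulation, which the slide does not spell out: it assumes each (query, candidate) pair is described by three similarity scores (visual, tag, GPS), and it trains a linear SVM with plain sub-gradient descent on the regularized hinge loss instead of a dedicated SVM solver. All names and toy numbers are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=2000):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by full-batch sub-gradient descent.

    X: list of feature vectors, y: labels in {+1, -1}. Returns (w, b).
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # example violates the margin
                for j in range(dim):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def score(w, b, x):
    """Signed distance-like score used to rank candidates for a query."""
    return dot(w, x) + b


# Toy development set: [visual_sim, tag_sim, gps_sim] per (query, candidate) pair.
X_dev = [[0.9, 0.8, 0.9], [0.8, 0.7, 0.95], [0.7, 0.9, 0.8],   # judged relevant
         [0.2, 0.1, 0.3], [0.3, 0.2, 0.1], [0.1, 0.3, 0.2]]    # judged irrelevant
y_dev = [1, 1, 1, -1, -1, -1]
weights, bias = train_linear_svm(X_dev, y_dev)

# Rank unseen candidates by their score, highest first.
candidates = {"img_a": [0.85, 0.6, 0.9], "img_b": [0.15, 0.2, 0.25]}
ranking = sorted(candidates, key=lambda k: score(weights, bias, candidates[k]),
                 reverse=True)
```

The learned `weights` play the role of per-modality weights, and the regularization strength `lam` is the knob that trades empirical error against generalization. Extra features can be appended to each vector without changing the training code.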
Experimental Results (1/5) • 100,000 images of Paris from Flickr. • 2500 SIFT / 50,000 images. • 5,500 tags / 50,000 images. • Maximum of two images per user. • Avoids similar images taken by the same photographer. • 100 query images, with the top-ranked 9 images retrieved for each • How relevance is judged • Half of the 72 annotators must consider the image relevant
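The relevance rule above amounts to a vote threshold. A minimal sketch, assuming "half" means "at least half" of the annotators (the slide does not say whether a tie counts); the function name and default threshold are hypothetical.

```python
def is_relevant(votes, threshold=0.5):
    """Return True when the share of positive votes reaches the threshold.

    votes: iterable of booleans, one per annotator.
    threshold=0.5 encodes "at least half", which is an assumption of this sketch.
    """
    votes = list(votes)
    return sum(votes) >= threshold * len(votes)


# 72 annotators, 36 of whom mark the image as relevant.
print(is_relevant([True] * 36 + [False] * 36))  # prints True
```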
Experimental Results (2/5) • Results • 900 retrieved images • VS: 305 relevant images • TS: 218 relevant images • VS+TS: 308 relevant images • MM1 (with GPS coordinates): 641 relevant images • MM2: accuracy of 72% and MAP of 0.78
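The counts above convert directly into precision over the 900 retrieved images (100 queries times 9 results). The sketch below does that arithmetic and also shows one common way to compute average precision for a single ranked list; MAP is then the mean of that value over all queries. The normalization by the number of relevant images retrieved is an assumption, since the slides do not state which MAP variant was used, and the function names are hypothetical.

```python
def precision(n_relevant, n_retrieved):
    """Fraction of retrieved images that were judged relevant."""
    return n_relevant / n_retrieved


def average_precision(relevance_flags):
    """Average precision of one ranked list of booleans (True = relevant).

    Normalised by the number of relevant items in the list (an assumption).
    """
    hits, total = 0, 0.0
    for rank, is_rel in enumerate(relevance_flags, start=1):
        if is_rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of average_precision over all queries."""
    return sum(average_precision(flags) for flags in ranked_lists) / len(ranked_lists)


# Relevant-image counts reported on the slide, out of 900 retrieved images.
reported = {"VS": 305, "TS": 218, "VS+TS": 308, "MM1": 641}
for name, n_rel in reported.items():
    print(name, round(100 * precision(n_rel, 900), 1))
# prints: VS 33.9, TS 24.2, VS+TS 34.2, MM1 71.2 (one per line)
```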
Experimental Results (4/5) • Figures 4-8 • Improves on the basic model when the tag annotation is not reliable • Improves the diversity of the retrieval results (fewer pictures of the same subject that differ only in night or day, perspective, or point of view)
Conclusion • Presented a novel way to combine visual information with tags and GPS. • Proposed a supervised machine learning approach (MM2) based on Support Vector Machines. • Results confirm that the approaches improve accuracy.