

Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information. Presented by Ivan Chiou. Authors: Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale.


Presentation Transcript


  1. Supervised Models for Multimodal Image Retrieval based on Visual, Semantic and Geographic Information Presented by Ivan Chiou

  2. Authors • Duc-Tien Dang-Nguyen, • Giulia Boato, • Alessandro Moschitti, • Francesco G.B. De Natale • Department of Information Engineering and Computer Science – University of Trento – Italy

  3. Abstract • Background: approaches to improving image ranking • Concerned with user annotations, time, and location • Proposal • Define a novel multimodal similarity measure • Combining visual features, annotated concepts, and geo-tagging • Propose a learning approach based on SVMs (Support Vector Machines)

  4. Introduction • Image-graph based techniques • Vertices represent images, combining visual and semantic information • Probabilistic models • PLSA (Probabilistic Latent Semantic Analysis) methodology • Visual features • Annotations • GPS coordinates • SVMs, able to learn from the data the weights to be assigned • Random set of image queries • Retrieve the set of images with the highest similarity • Judged relevant by human annotators • Train SVMs with these examples

  5. Combining Visual, Concept and GPS Signals (1/2) • PLSA • User-generated multimedia content • Visual content • Image tagging • Geo-location • Producing corresponding topic spaces with reduced dimensionality • Expectation Maximization • Fast on-line retrieval for very large datasets
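The PLSA step above can be sketched with a minimal EM implementation on a toy term-document count matrix. This is an illustrative reconstruction, not the authors' code: the corpus, the 2-topic setting, and all dimensions here are made up (the paper uses 100 topics per modality).

```python
import numpy as np

def plsa(X, n_topics=2, n_iter=50, seed=0):
    """Fit PLSA with EM on a count matrix X (docs x words).
    Returns P(topic|doc) and P(word|topic)."""
    rng = np.random.default_rng(seed)
    D, W = X.shape
    # Random initialisation of the two conditional distributions.
    p_z_d = rng.random((D, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, W)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: topic responsibility per (doc, word) pair, shape (D, W, Z).
        resp = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        resp /= resp.sum(2, keepdims=True) + 1e-12
        # M-step: re-estimate distributions from expected counts.
        counts = X[:, :, None] * resp
        p_z_d = counts.sum(1); p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
        p_w_z = counts.sum(0).T; p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy corpus: docs 0-1 share a vocabulary, docs 2-3 share a disjoint one.
X = np.array([[5, 4, 0, 0],
              [4, 5, 0, 0],
              [0, 0, 5, 4],
              [0, 0, 4, 5]], dtype=float)
p_z_d, p_w_z = plsa(X)
# Similar documents end up with similar low-dimensional topic mixtures,
# which is what makes fast retrieval in the reduced space possible.
```

Retrieval then compares images by their topic mixtures `p_z_d` rather than raw features, which is why the slide stresses reduced dimensionality.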

  6. Combining Visual, Concept and GPS Signals (2/2) • PLSA with 100 topics • Visual features • SIFT (Scale Invariant Feature Transform) • 128-element descriptors, 2500 salient points per image • Vocabulary of 2500 visual words (K-Means on a training set of 5000 images) • Bag-of-words associating a feature vector with each image • Image annotation • Consists of all the tags in the dataset, except words used just once or by a single user • Total: 5500 words • GPS coordinates • Similarity calculated as the distance between the GPS coordinates of the query and the retrieved images
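The bag-of-visual-words quantisation and the GPS distance above can be sketched as follows. The 2-D descriptors and 3-word vocabulary are toy stand-ins (real SIFT descriptors are 128-dimensional with a 2500-word vocabulary), and haversine is an assumption, since the slide does not name the exact distance function.

```python
import numpy as np

def bow_histogram(descriptors, vocab):
    """Assign each local descriptor to its nearest visual word (k-means
    centroid) and return a normalised bag-of-words histogram."""
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

def gps_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two GPS coordinates."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 6371.0 * 2 * np.arcsin(np.sqrt(a))

# Toy 3-word vocabulary in a 2-D descriptor space.
vocab = np.array([[0., 0.], [10., 0.], [0., 10.]])
desc = np.array([[0.1, 0.2], [9.8, 0.1], [9.9, -0.2], [0.2, 9.7]])
print(bow_histogram(desc, vocab))        # → [0.25 0.5  0.25]
# Eiffel Tower vs. Notre-Dame, roughly 4 km apart.
print(round(gps_distance_km(48.8584, 2.2945, 48.8530, 2.3499), 1))  # → 4.1
```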

  7. Supervised Multimodal Approach (1/2) • Improves retrieval accuracy • Relies on a Development Set (DS) • Retrieved images • Relevant • Irrelevant • Annotated by users • Proposing SVMs • Two important properties • Robust to overfitting, offering the possibility to trade off generalization against empirical error to tune the model to a more general setting • Easy to include additional features in the parameter vector

  8. Supervised Multimodal Approach (2/2) • SVMs • Multimodal 2 (MM2)
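The idea behind MM2, learning how to weight the per-modality similarities from annotated examples, can be sketched with a hand-rolled linear SVM (sub-gradient descent on the hinge loss). This is not the authors' exact formulation; the `[visual_sim, tag_sim, gps_sim]` feature layout and the development-set values are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Linear SVM via sub-gradient descent on the regularised hinge loss.
    X: per-image similarity features; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                    # inside margin: hinge is active
                w += lr * (y[i] * X[i] - lam * w)
                b += lr * y[i]
            else:                             # only the regulariser acts
                w -= lr * lam * w
    return w, b

# Toy development set: rows are [visual_sim, tag_sim, gps_sim] for a
# retrieved image; +1 = judged relevant by annotators, -1 = irrelevant.
X = np.array([[0.9, 0.8, 0.9], [0.8, 0.9, 0.7], [0.7, 0.9, 0.8],
              [0.2, 0.1, 0.3], [0.1, 0.3, 0.2], [0.3, 0.2, 0.1]])
y = np.array([1, 1, 1, -1, -1, -1])
w, b = train_linear_svm(X, y)
print(np.sign(X @ w + b))   # → [ 1.  1.  1. -1. -1. -1.]
```

The learned `w` plays the role of the modality weights: the SVM decides from the data how much visual, tag, and GPS similarity each contribute to the final ranking score.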

  9. Experimental Results (1/5) • 100,000 images of Paris from Flickr • 2500 SIFT visual words / 50,000 images • 5500 tags / 50,000 images • Maximum of two images per user • Avoids similar images taken by the same photographer • 100 query images; the top-ranked 9 images retrieved for each • Relevance judgment: an image is relevant if at least half of the 72 annotators consider it relevant

  10. Experimental Results (2/5) • Results over 900 retrieved images • VS (visual): 305 relevant images • TS (tags): 218 relevant images • VS+TS: 308 relevant images • MM1 (adds GPS coordinates): 641 relevant images • MM2: accuracy of 72% and MAP of 0.78
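The MAP figure above is the mean over queries of average precision on each ranked list. A small sketch of the computation, with made-up relevance lists rather than the paper's data:

```python
import numpy as np

def average_precision(relevance):
    """AP for one ranked list: relevance is 1/0 per rank position."""
    relevance = np.asarray(relevance, dtype=float)
    if relevance.sum() == 0:
        return 0.0
    hits = np.cumsum(relevance)
    precision_at_k = hits / np.arange(1, len(relevance) + 1)
    # Average precision over the positions of the relevant items only.
    return float((precision_at_k * relevance).sum() / relevance.sum())

# Two example queries, 9 retrieved images each (1 = judged relevant).
rankings = [[1, 1, 0, 1, 0, 0, 1, 0, 0],
            [1, 0, 1, 1, 0, 1, 0, 0, 1]]
mean_ap = np.mean([average_precision(r) for r in rankings])
```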

  11. Experimental Results (3/5)

  12. Experimental Results (4/5) • Figures 4-8 • Improves on the basic model when the tag annotation is not reliable • Improves diversification of the retrieval results (fewer near-duplicates of the same scene differing only in night/day lighting, perspective, or point of view)

  13. Experimental Results (5/5)

  14. Conclusion • Presented a novel way to combine visual information with tags and GPS • Proposed a supervised machine learning approach (MM2) based on Support Vector Machines • Results confirm that the approaches improve accuracy

  15. BACKUP Presented by Ivan Chiou
