
How Many Words is a Picture Worth? Automatic Caption Generation for News Images

Yansong Feng and Mirella Lapata; presentation by Ashish Bagate. The paper explores the feasibility of automatic caption generation for images in the news domain.

Presentation Transcript

  1. How Many Words is a Picture Worth? Automatic Caption Generation for News Images. Yansong Feng and Mirella Lapata. Ashish Bagate

  2. What this paper is about • Explores the feasibility of automatic caption generation for images in the news domain • Why the news domain in particular – training data is easily and abundantly available

  3. Why • Many digital images are available on the Web • Improved image search • Analysis of the image content • Keyword-only searches are ambiguous • Targeted queries using longer search strings • Web accessibility

  4. General Approach • Two-step process • Analyze the image and build a representation of it • Run a text generation engine on the image representation to produce a natural language description

  5. Related Work • Hede et al. – impractical because it relies on a controlled data set and a manually created database • Yao et al. – based on the image alone • Elzer et al. – focus on what the graphic depicts, with little emphasis on graphics generation • These methods rely on background information or domain terminology

  6. Problem Formulation • Given an image I and its accompanying document D, generate a caption C • Training data consists of document – image – caption tuples • Caption generation is difficult even for humans • A good caption must be succinct and informative, clearly identify the subject of the picture, and draw the reader into the article

  7. Overview of the method • Similar to the headline generation task • The training data is noisy • Follows a two-stage approach • Extract keywords from the image (image annotation model) • Generate the caption from these image keywords • Image features help keep the description faithful to, and meaningful for, the image

  8. Image Annotation • Probabilistic model – well suited to noisy data • Compute SIFT descriptors for each image • Quantize them into visual words with k-means clustering • Extract keywords with LDA (Latent Dirichlet Allocation) • dmix – a bag of words representing the image, document, and caption together
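
The visual-word step on slide 8 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' code: SIFT descriptors are assumed to be already extracted (toy 2-D vectors stand in for real 128-D ones), clustering is plain Lloyd's k-means, and `dmix_bag` is a hypothetical helper that simply merges word counts with visual-word counts. The LDA step that turns such a bag into keywords is not shown.

```python
import random
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; the k centroids form the visual vocabulary."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def visual_words(descriptors, centroids):
    """Map each local descriptor to a visual-word token such as 'vw1'."""
    return [f"vw{nearest(d, centroids)}" for d in descriptors]


def dmix_bag(text_tokens, descriptors, centroids):
    """Hypothetical helper: one bag mixing textual words and visual words."""
    return Counter(text_tokens) + Counter(visual_words(descriptors, centroids))


# Toy 2-D vectors stand in for real SIFT descriptors (assumption).
descs = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
vocab = kmeans(descs, k=2)
print(dmix_bag(["flood", "river", "flood"], descs, vocab))
```

Treating visual words as ordinary tokens is what lets a single topic model see text and image in the same bag; how the paper actually weights or combines the two sources may differ.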

  9. Extractive Caption Generation • Requires little linguistic analysis • The caption is the sentence from the document that is most similar to the image's description keywords
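
Extractive selection reduces to an argmax over the document's sentences. The sketch below assumes the keywords have already been produced by the annotation model; the crude tokenizer and the overlap normalization (shared keywords divided by the number of keywords) are illustrative choices, not necessarily the paper's.

```python
import re


def tokenize(text):
    """Lowercased word tokens; deliberately crude."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_overlap(keywords, sentence):
    """Fraction of the description keywords that occur in the sentence."""
    kw = {k.lower() for k in keywords}
    if not kw:
        return 0.0
    return len(kw & set(tokenize(sentence))) / len(kw)


def extract_caption(sentences, keywords, similarity=word_overlap):
    """Return the document sentence most similar to the image keywords."""
    return max(sentences, key=lambda s: similarity(keywords, s))


doc = [
    "The council met on Tuesday.",
    "Floodwater covered the river bank after heavy rain.",
    "Officials expect repairs to take weeks.",
]
print(extract_caption(doc, ["flood", "river", "rain", "water"]))
# prints "Floodwater covered the river bank after heavy rain."
```

The `similarity` parameter is pluggable, so any of the measures on the next slide can replace `word_overlap` without changing the selection logic.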

  10. Types of Similarity • Word overlap • Cosine similarity • Probabilistic similarity • KL divergence – the similarity between an image and a sentence is measured by how closely their topic distributions match
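
Two of these measures can be sketched as follows. Assumptions: topic distributions for the image and the sentence are already available (for example from an LDA model) as equal-length lists, KL is taken in the direction KL(image || sentence), and a small `eps` guards against zero probabilities. The paper may use a different direction, smoothing, or a symmetric variant.

```python
import math
from collections import Counter


def cosine(tokens_a, tokens_b):
    """Cosine similarity between two term-frequency vectors."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm_sq = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return dot / math.sqrt(norm_sq) if norm_sq else 0.0


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same topics."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def topic_similarity(image_topics, sentence_topics):
    """Higher (closer to 0) when the sentence's topic mix matches the image's."""
    return -kl_divergence(image_topics, sentence_topics)


image = [0.7, 0.2, 0.1]  # hypothetical topic proportions for an image
print(topic_similarity(image, [0.6, 0.3, 0.1]) > topic_similarity(image, [0.1, 0.2, 0.7]))
# prints True
```

Negating the divergence turns "smaller distance" into "larger similarity", so `topic_similarity` can be dropped into the same argmax used for extractive selection.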

  11. Issues with Extractive Caption Generation • Often no single sentence represents the image well • Selected sentences may be longer than the average caption • They may not be catchy

  12. Abstractive Caption Generation • Word-based model • Adapted from headline generation • Caption = the sequence of words that maximizes P
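
The slide's formula for P is not reproduced in this transcript, so the following is only a sketch under an assumption: a headline-generation-style score that multiplies a per-word content-selection probability by a bigram language-model probability. Because that score decomposes over adjacent word pairs, a Viterbi search finds the best fixed-length sequence exactly. A length prior is omitted because the length is fixed; the sketch also does not forbid repeated words. All names and probabilities are hypothetical.

```python
import math


def best_caption(content_prob, bigram_prob, length, start="<s>"):
    """Viterbi search for the `length`-word sequence maximizing
        sum_i log P(w_i in caption) + sum_i log P(w_i | w_{i-1}).
    content_prob: {word: probability the word appears in the caption}
    bigram_prob:  {(previous_word, word): probability}
    Zero-probability words and unseen bigrams are never chosen."""
    # best[w] = (log score, words) of the highest-scoring prefix ending in w
    best = {start: (0.0, [])}
    for _ in range(length):
        extended = {}
        for prev, (score, words) in best.items():
            for w, pc in content_prob.items():
                pb = bigram_prob.get((prev, w), 0.0)
                if pc <= 0.0 or pb <= 0.0:
                    continue
                s = score + math.log(pc) + math.log(pb)
                if w not in extended or s > extended[w][0]:
                    extended[w] = (s, words + [w])
        best = extended
    if not best:
        return None  # no sequence of this length is possible
    return max(best.values(), key=lambda sw: sw[0])[1]


content = {"floods": 0.9, "hit": 0.5, "town": 0.8, "the": 0.3}
bigrams = {("<s>", "floods"): 0.5, ("<s>", "the"): 0.5, ("floods", "hit"): 0.6,
           ("hit", "town"): 0.7, ("the", "town"): 0.9, ("town", "floods"): 0.2}
print(best_caption(content, bigrams, 3))  # ['floods', 'hit', 'town']
```

In a captioning setting the content probabilities would come from the image annotation stage rather than being hand-set as here.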

  13. Abstractive Caption Generation • Phrase-based model • Caption = the sequence of words that maximizes P
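
A phrase-based variant can be sketched the same way, again under assumptions since the slide's P is not shown: each candidate unit is a phrase (a tuple of words) with its own content probability, and consecutive phrases are linked by a bigram probability between the last word of one and the first word of the next. Keying the search state by the last emitted word keeps it exact, because that word is all the next link depends on. The scoring and every name here are hypothetical illustrations, not the paper's model.

```python
import math


def best_phrase_caption(phrase_prob, bigram_prob, num_phrases, start="<s>"):
    """Choose `num_phrases` phrases maximizing
        sum log P(phrase) + sum log P(first word | last word of previous unit).
    phrase_prob: {tuple_of_words: probability}
    bigram_prob: {(previous_word, word): probability}"""
    # best[last_word] = (log score, words) of the best prefix ending in last_word
    best = {start: (0.0, [])}
    for _ in range(num_phrases):
        extended = {}
        for prev_word, (score, words) in best.items():
            for phrase, pp in phrase_prob.items():
                link = bigram_prob.get((prev_word, phrase[0]), 0.0)
                if pp <= 0.0 or link <= 0.0:
                    continue
                s = score + math.log(pp) + math.log(link)
                last = phrase[-1]
                if last not in extended or s > extended[last][0]:
                    extended[last] = (s, words + list(phrase))
        best = extended
    if not best:
        return None  # no combination of this many phrases is possible
    return max(best.values(), key=lambda sw: sw[0])[1]


phrases = {("flood", "waters"): 0.6, ("hit", "the", "town"): 0.5, ("rise",): 0.4}
bigrams = {("<s>", "flood"): 0.7, ("<s>", "rise"): 0.1,
           ("waters", "hit"): 0.5, ("waters", "rise"): 0.4}
print(best_phrase_caption(phrases, bigrams, 2))
# ['flood', 'waters', 'hit', 'the', 'town']
```

Working with phrases lets the generator reuse well-formed chunks from the document, which tends to produce more grammatical output than stitching together individual words.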

  14. Evaluation…

  15. Evaluation…

  16. Evaluation

  17. Thanks!
