How Many Words is a Picture Worth? Automatic Caption Generation for News Images
Yansong Feng and Mirella Lapata
Presented by Ashish Bagate
What this paper is about
• Explores the feasibility of automatic caption generation for images in the news domain
• Why the news domain in particular: training data is easily and abundantly available
Why
• Lots of digital images are available on the Web
• Improved searching: keyword-only searches are ambiguous, whereas captions enable targeted queries using longer search strings
• Analysis of image content
• Web accessibility
General Approach
• A two-step process:
• Analyze the image and build a representation of its content
• Run a text generation engine over that image representation to produce a natural-language description (a minimal sketch of the pipeline follows below)
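A minimal sketch of this two-stage pipeline; the function names (annotate_image, generate_caption, caption) are illustrative and not taken from the paper:

```python
# Minimal sketch of the two-stage pipeline described above.
# Function names are illustrative, not from the paper.

def annotate_image(image, document):
    """Stage 1: produce a set of keywords describing the image,
    using both visual features and the accompanying document."""
    raise NotImplementedError  # see the image annotation slide below

def generate_caption(keywords, document):
    """Stage 2: turn the keywords into a natural-language caption,
    either by extracting a sentence or by composing a new one."""
    raise NotImplementedError  # see the extractive/abstractive slides below

def caption(image, document):
    keywords = annotate_image(image, document)
    return generate_caption(keywords, document)
```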
Related Work
• Hede et al.: not practical, since it relies on a controlled data set and a manually created database
• Yao et al.: based on the image alone
• Elzer et al.: describes what a graphic depicts, with little emphasis on generation
• These methods rely on background information and hand-crafted terminologies
Problem Formulation
• Given an image I and its accompanying document D, generate a caption C
• Training data consists of document-image-caption tuples (see the data sketch below)
• Caption generation is a difficult task even for humans
• A good caption must be succinct and informative, clearly identify the subject of the picture, and draw the reader to the article
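A minimal sketch of one training instance, assuming a simple in-memory representation; the class and field names are illustrative, not taken from the paper's data release:

```python
from dataclasses import dataclass

# Hypothetical container for one training instance.
@dataclass
class NewsItem:
    document: str    # full article text D
    image_path: str  # news image I
    caption: str     # gold caption C written by the news editor

# A training corpus is simply a list of such document-image-caption tuples
# harvested automatically from news sites, hence the noisy supervision
# mentioned in the method overview.
corpus: list[NewsItem] = []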
Overview of the Method
• Similar to the headline generation task
• Gather the training data automatically (so it will be noisy)
• Follows a two-stage approach:
• Obtain keywords for the image (image annotation model)
• Generate the caption from those image keywords
• Image features are used so that the description stays faithful to and meaningful for the image
Image Annotation
• Probabilistic model, well suited to noisy data
• Compute SIFT descriptors for the images
• Form visual words by K-means clustering of the descriptors
• Obtain the keywords via LDA
• dmix: the mixed bag of words representing image, document, and caption (a simplified sketch of this pipeline follows below)
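A simplified sketch of the annotation pipeline using off-the-shelf components (OpenCV SIFT, scikit-learn K-means and LDA). The paper trains a dedicated mixed-modality topic model over dmix; this sketch instead fits vanilla LDA on the concatenated bags of visual and textual words, so it is an approximation, and the function names and parameter values are assumptions:

```python
import cv2                                    # OpenCV, for SIFT features
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

def visual_words(image_paths, n_clusters=500):
    """Quantise SIFT descriptors into discrete 'visual word' tokens via K-means."""
    sift = cv2.SIFT_create()
    per_image, all_desc = [], []
    for path in image_paths:
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(gray, None)
        per_image.append(desc)
        all_desc.append(desc)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(np.vstack(all_desc))
    # each image becomes a bag of visual-word tokens such as "vw_17 vw_231 ..."
    return [" ".join(f"vw_{c}" for c in kmeans.predict(d)) for d in per_image]

def annotate(image_words, documents, captions, n_topics=100, n_keywords=10):
    """Fit LDA on the mixed bag of words (dmix = image + document + caption)
    and read off the most probable textual words for each item."""
    d_mix = [f"{v} {d} {c}" for v, d, c in zip(image_words, documents, captions)]
    vectorizer = CountVectorizer()
    counts = vectorizer.fit_transform(d_mix)
    lda = LatentDirichletAllocation(n_components=n_topics).fit(counts)
    vocab = np.array(vectorizer.get_feature_names_out())
    theta = lda.transform(counts)             # per-item topic mixtures
    phi = lda.components_ / lda.components_.sum(axis=1, keepdims=True)
    p_word = theta @ phi                      # P(word | item) under the topic model
    keywords = []
    for row in p_word:
        ranked = vocab[row.argsort()[::-1]]
        keywords.append([w for w in ranked if not w.startswith("vw_")][:n_keywords])
    return keywords
```

At test time the caption part of dmix is of course unavailable; a new image-document pair would be represented by its visual and document words only, with keywords read off in the same way.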
Extractive Caption Generation
• Little linguistic analysis is needed
• The caption is the sentence from the document that is maximally similar to the description keywords
Types of Similarity
• Word overlap
• Cosine similarity
• Probabilistic similarity: KL divergence, where the similarity between an image and a sentence is measured by the extent to which they share the same topic distributions
(a sketch of extractive selection with these measures follows below)
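A sketch of extractive selection using the three measures above. The tokenisation, the overlap normalisation, and the add-alpha smoothing are illustrative choices, and KL divergence is computed here over smoothed word distributions rather than over LDA topic distributions as in the paper:

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"\w+", text.lower())

def word_overlap(keywords, sentence):
    # one common normalisation (intersection over union); the paper's
    # exact normalisation may differ
    s = set(tokens(sentence))
    return len(set(keywords) & s) / max(len(set(keywords) | s), 1)

def cosine(keywords, sentence):
    a, b = Counter(keywords), Counter(tokens(sentence))
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def neg_kl(keywords, sentence, vocab, alpha=0.1):
    """Negative KL(P_keywords || P_sentence) with add-alpha smoothing."""
    a, b = Counter(keywords), Counter(tokens(sentence))
    def prob(c, w):
        return (c[w] + alpha) / (sum(c.values()) + alpha * len(vocab))
    return -sum(prob(a, w) * math.log(prob(a, w) / prob(b, w)) for w in vocab)

def extract_caption(keywords, sentences, score=cosine):
    """Return the document sentence most similar to the image keywords."""
    return max(sentences, key=lambda s: score(keywords, s))
```

To use the KL measure, pass a closure that fixes the vocabulary, e.g. `score=lambda k, s: neg_kl(k, s, vocab)`.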
Issues with Extractive Caption Generation
• No single sentence may adequately represent the image
• The selected sentence may be longer than the average caption length
• The result may not be catchy
Abstractive Caption Generation
• Word-based model
• Adapted from headline generation
• Caption: the sequence of words that maximizes the model's probability (a simplified sketch follows below)
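The probability being maximized is truncated on the slide; in the spirit of the headline-generation model it adapts, the sketch below assumes the score combines a content term (how likely a word is given the image and document, from the annotation model) with a bigram language-model term, and it uses a greedy search for simplicity. The probability tables and search strategy are assumptions, not the paper's exact formulation:

```python
import math

def word_based_caption(content_prob, bigram_prob, length=8, start="<s>"):
    """content_prob[w]     ~ P(w | image, document) from the annotation model
       bigram_prob[(u, w)] ~ P(w | u) from an n-gram language model"""
    caption, prev, used = [], start, set()
    for _ in range(length):
        candidates = [w for w in content_prob if w not in used]
        if not candidates:
            break
        # greedily pick the word that is both about the image and fluent
        w = max(candidates,
                key=lambda w: math.log(content_prob[w] + 1e-12)
                            + math.log(bigram_prob.get((prev, w), 1e-6)))
        caption.append(w)
        used.add(w)
        prev = w
    return " ".join(caption)
```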
Abstractive Caption Generation
• Phrase-based model
• Caption: the sequence of words that maximizes the model's probability, assembled from phrases rather than individual words (a simplified sketch follows below)
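A heavily simplified sketch of the phrase-based idea: build the caption from whole phrases taken from the document rather than from individual words, which tends to yield more grammatical output. The chunking, scoring, and greedy combination below are illustrative stand-ins, not the paper's actual model:

```python
import math

def phrase_based_caption(phrases, content_prob, max_words=10):
    """phrases:      candidate phrases extracted from the document
       content_prob: P(w | image, document) from the annotation model"""
    def score(phrase):
        words = phrase.lower().split()
        return sum(math.log(content_prob.get(w, 1e-6)) for w in words) / max(len(words), 1)

    # greedily add the highest-scoring phrases that still fit the length budget
    caption, budget = [], max_words
    for phrase in sorted(phrases, key=score, reverse=True):
        n = len(phrase.split())
        if n <= budget:
            caption.append(phrase)
            budget -= n
    return " ".join(caption)
```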