
Unsupervised Learning of Visual Sense Models for Polysemous Words


Presentation Transcript


  1. Unsupervised Learning of Visual Sense Models for Polysemous Words Kate Saenko, Trevor Darrell, Deepak

  2. Polysemy • Ambiguity of an individual word or phrase that can be used in different contexts to express two or more meanings • E.g. “present”: right now • “present”: a gift • Visual polysemy refers to different meanings of a word that are visually distinct

  3. E.g.: “mouse” (the animal vs. the computer pointing device)

  4. Unsupervised learning of object classifiers suffers because of polysemy • Existing approaches try to filter out unrelated images, either by bootstrapping an object classifier or by clustering the images into coherent components • But they do not take the polysemy of words into account • This paper proposes an unsupervised method that takes word sense into account

  5. Idea behind the paper • Input: a list of words and their dictionary meanings • Learn a text model of each word sense • Use that model to retrieve images of a specific sense • Use these re-ranked images as training data for an object classifier • This classifier can then predict the correct sense of the word associated with an image

  6. Model • Three main steps: • Discovering latent dimensions (topics) in text • Learning probabilistic models of the dictionary senses • Using the above sense models to construct sense-specific image classifiers

  7. Latent Text Space • Bag-of-words approach using the words near the image link • A bag-of-words approach is used because this text rarely follows grammatical structure, so features such as POS tags cannot be relied on • Uses LDA (Latent Dirichlet Allocation) to discover hidden topics in the data

  8. A dataset of text-only webpages returned by a regular web search is created • The LDA model is learned on this dataset • This is done because learning topics only from the words directly surrounding the images leads to overfitting
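The slides contain no code, but a minimal sketch may help make the topic-learning step concrete. The example below fits an LDA model on a tiny stand-in for the text-only corpus using scikit-learn; the library choice, the number of topics, and the toy documents are assumptions for illustration, not details from the paper.

```python
# Minimal sketch: fitting an LDA topic model on a text-only web corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

documents = [
    "the bass guitar has four strings and a deep tone",
    "bass fishing on the lake at dawn with live bait",
    "speakers with good bass response for home audio",
]  # in practice: thousands of text-only pages returned by a web search for the word

# Bag-of-words counts; no POS tags, since web text is often ungrammatical.
vectorizer = CountVectorizer(stop_words="english", max_features=5000)
counts = vectorizer.fit_transform(documents)

# Learn K latent topics; K = 8 is an arbitrary illustrative choice.
lda = LatentDirichletAllocation(n_components=8, random_state=0)
lda.fit(counts)

# P(z|d) for new text, e.g. the words surrounding an image link.
new_dist = lda.transform(vectorizer.transform(["bass player on stage"]))
print(new_dist)  # one row of topic probabilities, summing to 1
```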

  9. Dictionary Sense Model • Relates the dictionary senses to the topics formed in the previous step • The dictionary senses are obtained from WordNet • From the sense definitions and the topics, the likelihood of a particular sense given each topic, P(s|z=j), is computed • Using this, the probability of a particular sense in a document, P(s|d), is computed • This gives a sense estimate for each image • Images can then be grouped according to sense
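To make the sense-assignment step concrete, here is a small numeric sketch of combining the two quantities named on the slide: the sense-topic likelihoods P(s|z=j) and a document's topic mixture P(z=j|d). Marginalizing over topics, P(s|d) = Σ_j P(s|z=j) P(z=j|d), is the natural reading of the slide; the exact combination rule and all numbers below are illustrative assumptions.

```python
# Toy sketch of P(s|d) = sum_j P(s|z=j) * P(z=j|d); all numbers are made up.
import numpy as np

# P(s|z=j): one row per sense, one column per topic (2 senses, 4 topics),
# e.g. estimated from how well each WordNet gloss matches each topic.
sense_given_topic = np.array([
    [0.70, 0.10, 0.15, 0.05],   # sense 1: "mouse" the rodent
    [0.05, 0.80, 0.10, 0.05],   # sense 2: "mouse" the pointing device
])

# P(z=j|d): topic mixture of the text surrounding one image link
# (e.g. the output of lda.transform in the earlier sketch).
topic_given_doc = np.array([0.05, 0.85, 0.05, 0.05])

# Marginalize over topics to get P(s|d), then normalize over senses.
sense_given_doc = sense_given_topic @ topic_given_doc
sense_given_doc /= sense_given_doc.sum()

print(sense_given_doc)  # this image would be grouped under the device sense
```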

  10. Visual Sense Model • Uses the sense model obtained in the above two steps to generate training data for an image-based classifier • A discriminative classifier (SVM) is used • Images are re-ranked according to the probability of the sense of interest • From the re-ranked images, the N highest-ranked examples are selected as positive training data for the SVM
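A minimal sketch of the re-rank-and-train step follows. The slide only states that the N highest-ranked images become positive training data for an SVM; the use of scikit-learn, the random stand-in features, and the choice of the lowest-ranked images as negatives are assumptions made for illustration.

```python
# Toy sketch: re-rank images by P(s|d) and train an SVM on the top-N.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_images, dim, top_n = 500, 128, 100

features = rng.normal(size=(n_images, dim))   # visual feature vectors (stand-ins)
p_sense = rng.random(n_images)                # P(s|d) from the text sense model

# Re-rank images by sense probability; the top-N become positive examples.
order = np.argsort(-p_sense)
pos = order[:top_n]
neg = order[-top_n:]          # assumption: lowest-ranked images act as negatives

X = np.vstack([features[pos], features[neg]])
y = np.concatenate([np.ones(top_n), np.zeros(top_n)])

clf = LinearSVC(C=1.0)
clf.fit(X, y)

# The trained classifier can then score new images for this sense.
print(clf.decision_function(features[:5]))
```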

  11. Datasets • Three datasets are used: • Images for “bass”, “face”, “mouse”, “speaker”, and “watch” from Yahoo Image Search, with the returned images annotated as unrelated, partial, or good • A second dataset collected using sense-specific search terms • A third, text-only dataset collected using a regular web search

  12. Features • Words in webpages are extracted with HTML tags removed, then tokenized, stop-word filtered, and stemmed • For the text associated with an image, the words surrounding the image link are extracted • A window size of 100 words is used for this step • Image features are obtained by first resizing each image to 300 pixels, converting it to grayscale, and then extracting edge features and scale-invariant salient points
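A short sketch of the text preprocessing described on this slide, using NLTK as an assumed toolkit (the paper does not name a specific library); the example document and the regex used to strip HTML are illustrative.

```python
# Requires: nltk.download("punkt"); nltk.download("stopwords")
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

stemmer = PorterStemmer()
stops = set(stopwords.words("english"))

def preprocess(html_text):
    text = re.sub(r"<[^>]+>", " ", html_text)   # strip HTML tags
    tokens = word_tokenize(text.lower())        # tokenize
    tokens = [t for t in tokens if t.isalpha() and t not in stops]  # drop stop words
    return [stemmer.stem(t) for t in tokens]    # stem

print(preprocess("<p>Wireless optical mice for gaming laptops</p>"))
# e.g. ['wireless', 'optic', 'mice', 'game', 'laptop']
```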
