
Semantics of words and images

Semantics of words and images. Presented by Gal Zehavi & Ilan Gendelman. What is semantics? Semantics: “a branch of philosophy dealing with the relations between signs and what they refer to (their meaning)” (Webster).


Presentation Transcript


  1. Semantics of words and images Presented by Gal Zehavi & Ilan Gendelman

  2. What is semantics? • Semantics – “a branch of philosophy dealing with the relations between signs and what they refer to (their meaning)” (Webster)

  3. Content 1) Motivation

  4. Motivation • Better segment aggregation • Image reconstruction • Object recognition (example labels: waterfall, bear, fish) Motivation

  5. More motivation… • Data mining: Content-Based Image Retrieval (CBIR) with higher precision, higher recall and quicker search • Applications: • Biomedicine (X-ray, pathology, CT, MRI, …) • Government (radar, aerial, trademark, …) • Commercial (fashion catalog, journalism, …) • Cultural (museums, art galleries, …) • Education and training • Entertainment, WWW (100 billion?!), … Motivation

  6. Even more motivation… • Auto annotator • Auto illustrator (slide labels: Moby Dick; Ocean, Helicopter, Shark, Man, Bridge, Mountain) Motivation

  7. Content • Motivation • Introduction to Semantics

  8. Different aspects of semantics… Introduction to Semantics

  9. Specific Objects • The elephant played chess Introduction to Semantics

  10. Family of Objects • The tree stood alone on the hill. • This car is as fast as the wind. Introduction to Semantics

  11. Scenarios • A couple on the beach Easy to imagine (statistically clear) but variety is large… Introduction to Semantics

  12. Semantics from Context The captain was on the bridge. Introduction to Semantics

  13. Abstract Semantics • Vacation • Strength ? • Experience Introduction to Semantics

  14. Content • Motivation • Introduction to Semantics • Difficulties

  15. Difficulties in Text-Image Semantics • Ambiguity (alternative annotations: “corals”; “couple, beach, sunset”) • Level of abstraction (alternative annotations: “men, bottle, liquid”; “drivers, car race, champagne”; “celebration, winning, happiness”; “M. Schumacher, Formula 1, Dom-Perignon”; “competitors, race, alcohol”) Difficulties

  16. Content • Motivation • Introduction to Semantics • Difficulties • Possible Approaches

  17. Possible Approaches • Searching images by text only: (+) uses existing text search infrastructure; (+) no specific processing; (-) images must be in textual context; (-) misses real image semantics and features • Query by example: (+) relates to the user's visualisation; (-) misses real image semantics • Semantics through user interaction: (+) refined visualisation; (+) higher level of abstraction; (-) a complex user interface is required • Search by image feature similarity: (+) low complexity; (-) misses real image semantics • Search by segment features Possible approaches

  18. Content • Motivation • Introduction to Semantics • Difficulties • Possible Approaches • Models

  19. Models • Object recognition as machine translation: “Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary”, P. Duygulu, K. Barnard, N. de Freitas and D. Forsyth (2002) • Learning semantics by hierarchical clustering of words and image regions: “Learning the Semantics of Words and Pictures”, K. Barnard, D. Forsyth (2001); “Clustering Art”, K. Barnard, P. Duygulu, D. Forsyth (2001) Models

  20. Content • Motivation • Introduction to Semantics • Difficulties • Possible Approaches • Models • Model #1 : • Object recognition as machine translation

  21. Object recognition as machine translation • Description: learning a lexicon for a fixed image vocabulary Model #1

  22. Our goal is to describe a scene using the lexicon we learned (example region labels: sky, rock, mountain/forest, sea, sand). Model #1

  23. How do we do it? • By applying a method similar to that of a language translator • Word translations are learned from many corresponding sentence pairs by accumulating statistics • Example: “I am a great man” ↔ “אני איש גדול\נהדר” (“great” may translate as גדול, “big”, or as נהדר, “wonderful”) Model #1

  24. The blob notion • First we segment the image into regions using a min-cut method. Model #1

  25. Assigning Regions to Blobs • How do we assign a region to a blob? 1) Eliminate regions smaller than a threshold 2) Define a set of features 3) Discretise each feature distribution 4) Cluster the finite-dimensional feature vectors 5) Each cluster = a blob Model #1

  26. In the article's experiment • The 33 features are discretised using the k-means algorithm • The following features are included: region color, convexity, standard deviation, first moment, region orientation energy, region size, location Model #1

  27. To apply the method to images we need to discretise the image • Using the Corel database with 371 words and 4500 images of 5 to 10 segments each: ~35,000 segments → 500 blobs Model #1

  28. The K-means algorithm • Fits the best k representative values to a continuous distribution • Given a distribution of n-dimensional vectors, we can group them into clusters Model #1

  29. K-means • Input: continuous data (some data) • Output: discrete data Model #1 Acknowledgments to Andrew W. Moore, Carnegie Mellon University

  30. K-means • Iterative algorithm • Choose k (e.g. 5) • Randomly guess k center locations • Each data point “belongs” to its nearest center • Each center finds the centroid of its points • The centroid defines the new center • Iterate… Model #1 Acknowledgments to Andrew W. Moore, Carnegie Mellon University
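
The k-means steps on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function names `kmeans` and `nearest`, the fixed random seed, and the choice of initial centers as randomly picked data points are assumptions, not details taken from the presentation.

```python
import random

def nearest(point, centers):
    """Index of the center closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda j: sum((x - c) ** 2 for x, c in zip(point, centers[j])),
    )

def kmeans(points, k, n_iter=100, seed=0):
    """Cluster `points` (equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Randomly guess k center locations (here: k distinct data points).
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Each data point "belongs" to its nearest center.
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # The centroid of the members defines the new center.
                new_centers.append([sum(c) / len(members) for c in zip(*members)])
            else:
                # A center that owns no points stays where it is.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # nothing moved: converged
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels
```

In the blob setting, `points` would be the region feature vectors and each returned label would be a blob index.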

  31. K-means • An example Model #1 Acknowledgments to Andrew W. Moore, Carnegie Mellon University

  32. Now for every image we have a set of blobs • Match with a set of words • So the translation can begin Model #1

  33. Notation (n denotes the nth image) • The set of blobs b_n = (b_n1, b_n2, …, b_nl), where l is the size of the blob string • The set of words w_n = (w_n1, w_n2, …, w_nm), where m is the size of the word string • The alignment a_n = (a_n1, a_n2, …, a_nm) • The event a_nj = i means that the jth word in the possible translation translates the ith blob Model #1

  34. More about the alignment space A(w,b) • For b = (b1, b2, …, bl) and w = (w1, w2, …, wm), an alignment a = (a1, a2, …, am) is a sequence taking values in 0 … l, i.e. a discrete function from word positions to blob positions (figure: a possible a, linking w1 … wm to b1 … bl) Model #1
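
To get a feel for the size of the alignment space, the following sketch enumerates every alignment for small m and l. The helper name `alignments` is hypothetical, and reading the value 0 as "aligned to no blob" is an assumption borrowed from the usual machine-translation convention; the slide only states that the values range over 0 to l.

```python
from itertools import product

def alignments(m, l):
    """Yield every alignment (a1, ..., am) with each aj in 0..l.

    aj = i > 0 means word j translates blob i; aj = 0 is assumed
    here to mean that word j is aligned to no blob.
    """
    return product(range(l + 1), repeat=m)

# For m words and l blobs there are (l + 1) ** m alignments.
all_a = list(alignments(3, 2))
print(len(all_a))   # 27
print(all_a[:4])    # [(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 0)]
```

The count grows exponentially with the number of words, which is one reason a sum over all alignments has to be factorised rather than enumerated.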

  35. The likelihood function • We call the conditional probability p(w|b) the likelihood function, since it gives the probability that a set of words is the translation of a given set of blobs. • How do we generate it? Model #1

  36. Generating a translation • Given: a set of blobs b = (b1, b2, b3) and a lexicon: (book) (chair) (sky) (tree) (sun) (fish) (ship) (ring) (sea) (cloud) … • Choose the string size m with P(m|b) • Then alternately choose an alignment and a word: P(a1|b,m), P(w1|a1,b,m), P(a2|a1,w1,b,m), P(w2|a2,w1,b,m), P(a3|a2,w2,b,m), P(w3|a3,w2,b,m) • Possible translation: (w1, w2, w3) = (cloud, sky, sun) • Multiplying the factors gives the joint probability P(w,a|b) = P(m|b) P(a1|b,m) P(w1|a1,b,m) P(a2|a1,w1,b,m) P(w2|a2,w1,b,m) P(a3|a2,w2,b,m) P(w3|a3,w2,b,m) Model #1

  37. Finally we get, without loss of generality: … • The problem with this formulation is the enormous number of system parameters Model #1
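
The formula on this slide did not survive in the transcript. As a hedged reconstruction, the general chain-rule form that the factors on slide 36 appear to instantiate can be written as below. It follows the standard statistical machine translation decomposition, so it may differ in notation from the authors' slide. The shorthand a_1^{j-1} for (a_1, …, a_{j-1}), and likewise for w, is an assumption of this sketch; the slide 36 diagram writes only the most recent term of each history.

```latex
\begin{align*}
P(\mathbf{w}, \mathbf{a} \mid \mathbf{b})
  &= P(m \mid \mathbf{b}) \prod_{j=1}^{m}
     P(a_j \mid a_1^{j-1}, w_1^{j-1}, \mathbf{b}, m)\,
     P(w_j \mid a_1^{j}, w_1^{j-1}, \mathbf{b}, m), \\
p(\mathbf{w} \mid \mathbf{b})
  &= \sum_{\mathbf{a} \in A(\mathbf{w}, \mathbf{b})}
     P(\mathbf{w}, \mathbf{a} \mid \mathbf{b}).
\end{align*}
```

Every factor conditions on the full history of earlier alignments and words, which is where the very large number of parameters comes from.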

  38. A simpler model • Assumptions: 1) Disregard the context of the blobs and the words 2) Assume that the alignment depends only on the position of the translating word 3) Assume that translated strings can have any length Model #1

  39. The simplified factors • String size: P(m|b) = const • Alignment: P(a3|3,b,m) replaces P(a3|a2,w2,b,m) • Word translation: t(w3|b_a3) replaces P(w3|a3,w2,b,m) • Example: set of blobs b = (b1, b2, b3); lexicon: (book) (chair) (sky) (tree) (sun) (fish) (ship) (ring) (sea) (cloud) …; possible word translation for w3: sun Model #1
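
Under these simplifications each word's alignment is chosen independently of the others, so the sum over all alignments factorises into a product over word positions. A sketch of the resulting likelihood, with the constant P(m|b) absorbed into the proportionality and the inner sum running over the allowed blob indices, is:

```latex
\[
p(\mathbf{w} \mid \mathbf{b}) \;\propto\;
\prod_{j=1}^{m} \sum_{i}
P(a_j = i \mid j, \mathbf{b}, m)\; t(w_j \mid b_i)
\]
```

Here t(w_j | b_i) is an entry of the blob-to-word translation table. This is the usual form for this kind of model and is offered as a reading of the slide, not as a quotation from it.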

  40. Our mathematical goal is to build a probability table that gives, for each blob, the distribution over all its possible word translations Model #1

  41. Model #1

  42. E-step: define the expectation of the complete-data log-likelihood and compute it Model #1

  43. M-step: maximize the expectation we computed • Taking into consideration the following constraints: … and … • We can use Lagrange multipliers to maximize the likelihood function • The obtained Lagrangian is: … • And the equations to be solved for the maximization are: … Model #1

  44. Solving this set of equations yields update rules that are applied iteratively until convergence Model #1
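
The E-step/M-step iteration can be illustrated with a small EM loop that learns a table t(word | blob) from images given as (blobs, words) pairs. This is a minimal sketch and not the authors' implementation: the function name `train_lexicon` is hypothetical, blobs and words are represented as plain strings, and the alignment probabilities are assumed uniform (as in the simplest word-based translation models), so only the translation table is re-estimated.

```python
from collections import defaultdict

def train_lexicon(images, n_iter=20):
    """images: list of (blobs, words) pairs. Returns t[blob][word] ~ P(word | blob)."""
    vocab = {w for _, words in images for w in words}
    blob_ids = {b for blobs, _ in images for b in blobs}
    # Start from a uniform translation table.
    t = {b: {w: 1.0 / len(vocab) for w in vocab} for b in blob_ids}
    for _ in range(n_iter):
        counts = {b: defaultdict(float) for b in blob_ids}
        # E-step: expected number of times each blob "explains" each word,
        # assuming a uniform prior over alignments.
        for blobs, words in images:
            for w in words:
                norm = sum(t[b][w] for b in blobs)
                if norm == 0:
                    continue  # no blob can explain this word
                for b in blobs:
                    counts[b][w] += t[b][w] / norm
        # M-step: renormalise the expected counts so that, for every blob,
        # the word probabilities sum to one.
        for b in blob_ids:
            total = sum(counts[b].values())
            if total > 0:
                t[b] = {w: counts[b][w] / total for w in vocab}
    return t

# Toy data: three images, each with two blobs and two keywords.
images = [
    (["b_sky", "b_sun"], ["sky", "sun"]),
    (["b_sky", "b_sea"], ["sky", "sea"]),
    (["b_sun", "b_sea"], ["sun", "sea"]),
]
table = train_lexicon(images)
print(max(table["b_sky"], key=table["b_sky"].get))  # sky
```

Normalising per blob plays the role of the sum-to-one constraint that a Lagrange multiplier would enforce in the M-step.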

  45. Further refinements • Some words may not be predicted with the highest probability by any blob • Assign NULL to a blob when P(word|blob) < threshold • Rerun the process • Choose a smaller lexicon Model #1
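
The NULL refinement can be sketched as a thresholded prediction. The sketch assumes that a blob predicts NULL when even its most probable word falls below the threshold. The function name `predict_word`, the use of `None` for NULL, and the dictionary representation of one row of the probability table are all assumptions; the default 0.2 is the null threshold mentioned on the results slide.

```python
def predict_word(word_probs, threshold=0.2):
    """Return the most probable word for one blob, or None (NULL).

    word_probs maps each word to P(word | blob) for a single blob.
    """
    word, prob = max(word_probs.items(), key=lambda kv: kv[1])
    return word if prob >= threshold else None

print(predict_word({"sky": 0.7, "sea": 0.3}))          # sky
print(predict_word({w: 0.1 for w in "abcdefghij"}))    # None
```

Blobs that predict NULL contribute no word, which trades some recall for higher precision.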

  46. Indistinguishable words • Visually indistinguishable • Practically indistinguishable • Entangled correspondence (e.g. polar – bear, mare/foals – horse) • Rerun the process • Cluster similar words Model #1

  47. Experimental Results • Settings: • 4500 Corel images, with 4-5 keywords each • 371-word vocabulary • typically 5-10 regions per image • 500 blobs • 33 features for each region Model #1 - Results

  48. Annotation Recall / Precision • 500 test images • Only 80 words predicted • (plots: precision and recall for original words, refitted words and clustered words) Model #1 - Results
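
Per-word annotation precision and recall can be computed as in the sketch below. It assumes the usual definitions: precision is the fraction of images predicted to contain the word that actually have it as a keyword, and recall is the fraction of images with that keyword for which it was predicted. The function name and the toy data are hypothetical and are not the presentation's results.

```python
def word_precision_recall(predicted, actual, word):
    """predicted, actual: lists of word sets, one pair per test image."""
    correct = sum(1 for p, a in zip(predicted, actual) if word in p and word in a)
    n_predicted = sum(1 for p in predicted if word in p)
    n_actual = sum(1 for a in actual if word in a)
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_actual if n_actual else 0.0
    return precision, recall

predicted = [{"sky", "sun"}, {"sky"}, {"sea"}]
actual = [{"sky"}, {"sea"}, {"sea", "sky"}]
print(word_precision_recall(predicted, actual, "sky"))  # (0.5, 0.5)
```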

  49. Correspondence • 100 test images • (plots: prediction rate for original words, clustered words and null threshold 0.2) • Light blue: average number of times a blob predicts the word correctly in the right place • Dark blue: total number of times a blob predicts a word that is one of the image keywords Model #1 - Results

  50. Some Results • Successful results Model #1 - Results
