
Inference Network Approach to Image Retrieval



Presentation Transcript


  1. Inference Network Approach to Image Retrieval Don Metzler R. Manmatha Center for Intelligent Information Retrieval University of Massachusetts, Amherst

  2. Motivation
  • Most image retrieval systems assume:
    • an implicit “AND” between query terms
    • equal weight for all query terms
    • a query made up of a single representation (keywords or an image)
  • “tiger grass” => “find images of tigers AND grass, where each is equally important”
  • How can we search with queries made up of both keywords and images?
  • How do we perform the following queries?
    • “swimmers OR jets”
    • “tiger AND grass, with more emphasis on tigers than grass”
    • “find me images of birds that are similar to this image”

  3. Related Work
  • Inference networks
  • Semantic image retrieval
  • Kernel methods

  4. Inference Networks
  • Inference network framework [Turtle and Croft ’89]
    • Formal information retrieval framework
    • Basis of the INQUERY search engine
    • Allows structured queries: phrases, term weighting, synonyms, etc.
      • e.g. #wsum( 2.0 #phrase( image retrieval ) 1.0 model )
    • Handles multiple document representations (full text, abstracts, etc.)
  • MIRROR [de Vries ’98]
    • General multimedia retrieval framework based on the inference network framework
    • Probabilities based on clustering of metadata + feature vectors
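
To make the structured-query example above concrete, here is a minimal Python sketch of how a #wsum node could combine the beliefs of its children. The closed form assumed here (weight-normalized sum of parent beliefs) follows the usual INQUERY convention; the function name `wsum` and the belief values 0.6 and 0.3 are illustrative, not taken from the slides.

```python
def wsum(weighted_beliefs):
    """#wsum belief: weight-normalized sum of the parents' beliefs.

    weighted_beliefs: list of (weight, belief) pairs, one per parent node.
    """
    total = sum(w for w, _ in weighted_beliefs)
    return sum(w * p for w, p in weighted_beliefs) / total

# "#wsum( 2.0 #phrase( image retrieval ) 1.0 model )":
# suppose the #phrase node's belief is 0.6 and "model"'s belief is 0.3.
score = wsum([(2.0, 0.6), (1.0, 0.3)])  # (2*0.6 + 1*0.3) / 3 = 0.5
```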

  5. Image Retrieval / Annotation
  • Co-occurrence model [Mori et al.]
  • Translation model [Duygulu et al.]
  • Correspondence LDA [Blei and Jordan]
  • Relevance model-based approaches
    • Cross-Media Relevance Models (CMRM) [Jeon et al.]
    • Continuous Relevance Models (CRM) [Lavrenko et al.]

  6. Goals
  • Input:
    • a set of annotated training images
    • the user’s information need, expressed with:
      • terms
      • images
      • “soft” Boolean operators (AND, OR, NOT)
      • weights
    • a set of test images with no annotations
  • Output:
    • a ranked list of test images relevant to the user’s information need

  7. Data
  • Corel data set†
    • 4500 training images (annotated)
    • 500 test images
    • 374-word vocabulary
  • Each image automatically segmented using normalized cuts
  • Each image represented as a set of representation vectors
    • 36 geometric, color, and texture features
    • Same features used in similar past work
  † Available at: http://vision.cs.arizona.edu/kobus/research/data/eccv_2002/

  8. Features
  • Geometric (6)
    • area
    • position (2)
    • boundary/area
    • convexity
    • moment of inertia
  • Color (18)
    • avg. RGB × 2 (6)
    • std. dev. of RGB (3)
    • avg. L*a*b × 2 (6)
    • std. dev. of L*a*b (3)
  • Texture (12)
    • mean oriented energy, 30° increments (12)

  9. Image Representation
  [Figure: an example image annotated “cat, grass, tiger, water”]
  • Representation vector: real-valued, one per image segment
  • Annotation vector: binary, the same for every segment

  10. Image Inference Network
  [Figure: the network, with image node J at the top, term nodes qw1 … qwk and representation nodes qr1 … qrk below it, operator nodes qop1, qop2, and information-need node I at the bottom. The “image network” (J and its children) is fixed given the image; the “query network” (operator nodes and I) is dynamic, built from the query.]
  • J – representation vectors for the image (continuous, observed)
  • qw – word w appears in the annotation (binary, hidden)
  • qr – representation vector r describes the image (binary, hidden)
  • qop – query operator satisfied (binary, hidden)
  • I – user’s information need is satisfied (binary, hidden)

  11. Example Instantiation
  [Figure: an instantiated query network in which term nodes “tiger” and “grass” feed an #and node, which in turn feeds an #or node]

  12. What needs to be estimated?
  [Figure: the network diagram, with nodes J; qw1 … qwk; qr1 … qrk; qop1, qop2; I]
  • P(qw | J)
  • P(qr | J)
  • P(qop | J)
  • P(I | J)

  13. P(qw | J)  [e.g. P( “tiger” | image )]
  • Probability that term w appears in the annotation, given image J
  • Apply Bayes’ rule and use non-parametric density estimation
  • Assumes the representation vectors are conditionally independent, given that term w annotates the image

  14. How can we compute P(ri | qw)?
  [Figure: training-set representation vectors in feature space; vectors drawn from images annotated with w form an area of high likelihood, while sparsely populated regions form areas of low likelihood]

  15. P(qw | J)  [final form]
  [Equation not preserved in the transcript]
  • Σ assumed to be diagonal, estimated from the training data
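
The final-form equation itself did not survive in the transcript, but the slides describe the ingredients: Bayes’ rule, conditional independence of the image’s representation vectors given the word, and a non-parametric (kernel) density with a diagonal covariance Σ estimated from training data. The sketch below is one plausible reading under those assumptions; the function names and the use of a Gaussian kernel over the training vectors for word w are my own illustrative choices, not the authors’ exact formulation.

```python
import math

def gaussian_kernel(x, mean, var):
    """Diagonal-covariance Gaussian density; var is a per-dimension variance list."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2 * math.pi * vi) - (xi - mi) ** 2 / (2 * vi)
    return math.exp(log_p)

def p_image_given_word(image_vectors, word_vectors, var):
    """P(J | qw): product over the image's representation vectors of a kernel
    density estimate built from training vectors drawn from images annotated
    with word w (the conditional-independence assumption on the slide)."""
    p = 1.0
    for r in image_vectors:
        p *= sum(gaussian_kernel(r, t, var) for t in word_vectors) / len(word_vectors)
    return p

def p_word_given_image(image_vectors, word_vectors, var, prior):
    """Unnormalized P(qw | J) ∝ P(J | qw) P(qw), by Bayes' rule."""
    return p_image_given_word(image_vectors, word_vectors, var) * prior
```

With this estimate, a word whose training vectors lie near the image’s segment vectors receives a higher (unnormalized) posterior than a word whose training vectors lie far away, which is exactly the high-likelihood/low-likelihood picture of the previous slide.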

  16. Regularized estimates…
  • The P(qw | J) estimates are good, but not comparable across images
    • Is the 2nd image really 2× more “cat-like”?
    • Probabilities are relative per image

  17. Regularized estimates…
  • Impact transformations
    • Used in information retrieval: “rank is more important than value” [Anh and Moffat]
  • Idea:
    • rank each term according to P(qw | J)
    • give higher probabilities to higher-ranked terms
    • P(qw | J) ≈ 1 / rank(qw)
  • Zipfian assumption on relevant words:
    • a few words are very relevant
    • a medium number of words are somewhat relevant
    • many words are not relevant

  18. Regularized estimates…

  19. What needs to be estimated?
  • P(qw | J)
  • P(qr | J)
  • P(qop | J)
  • P(I | J)

  20. P(qr | J)
  • Probability that representation vector r is observed, given J
  • Use non-parametric density estimation again
    • Impose a density over J’s representation vectors, just as in the previous case
  • Estimates may be poor
    • Based on a small sample (~10 representation vectors)
  • Naïve and simple, yet somewhat effective
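
Mirroring the term estimate, a density for P(qr | J) can be imposed over the image’s own handful of segment vectors. This short sketch reuses the same diagonal-covariance Gaussian kernel idea; as with the previous block, the exact kernel and the function name are assumptions, not the authors’ stated formula.

```python
import math

def p_repvec_given_image(r, image_vectors, var):
    """P(qr | J): kernel density at query representation vector r, built
    from the image's own (~10) representation vectors. With so few sample
    points, the estimate can be poor, as the slide notes."""
    def kernel(x, mean):
        log_p = sum(-0.5 * math.log(2 * math.pi * v) - (xi - mi) ** 2 / (2 * v)
                    for xi, mi, v in zip(x, mean, var))
        return math.exp(log_p)
    return sum(kernel(r, t) for t in image_vectors) / len(image_vectors)
```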

  21. What needs to be estimated?
  • P(qw | J)
  • P(qr | J)
  • P(qop | J)
  • P(I | J)

  22. Query Operators
  • “Soft” Boolean operators
    • #and / #wand (weighted AND)
    • #or
    • #not
  • One node added to the query network for each operator present in the query
  • Many other operators are possible
    • #max, #sum, #wsum
    • #syn, #odn, #uwn, #phrase, etc.

  23. Example: #or( #and( tiger grass ) )
  [Figure: term nodes “tiger” and “grass” feed an #and node, which feeds an #or node]

  24. Operator Nodes
  • Combine probabilities from the term and image nodes
  • Closed forms derived from the corresponding link matrices
    • Allows efficient inference within the network
  • Par(q) = set of q’s parent nodes
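
The closed forms themselves are not reproduced in the transcript; the sketch below uses the standard inference-network link-matrix results for soft Boolean operators (product for #and, complement-product for #or, complement for #not, and a weight-normalized geometric mean for #wand). Treat it as a plausible rendering rather than the authors’ exact derivation; the numeric beliefs are made up.

```python
from functools import reduce

def and_node(beliefs):
    """P(#and) = product of the parents' beliefs."""
    return reduce(lambda acc, p: acc * p, beliefs, 1.0)

def or_node(beliefs):
    """P(#or) = 1 - product of (1 - p) over the parents."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), beliefs, 1.0)

def not_node(belief):
    """P(#not) = 1 - p."""
    return 1.0 - belief

def wand_node(weighted_beliefs):
    """P(#wand) = product of p_i ** (w_i / sum of weights)."""
    total = sum(w for w, _ in weighted_beliefs)
    return reduce(lambda acc, wp: acc * wp[1] ** (wp[0] / total),
                  weighted_beliefs, 1.0)

# Evaluating "#or( #and( tiger grass ) )" from the previous slide,
# with illustrative beliefs P(tiger | J) = 0.8 and P(grass | J) = 0.5:
score = or_node([and_node([0.8, 0.5])])  # = 0.4
```

Because each operator reduces to a product over its parents, inference is a single bottom-up pass through the query network, which is what makes retrieval fast.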

  25. Results - Annotation

  26. [Figure: three test images with their top-ranked predicted annotations]
  • foals (0.46), mare (0.33), horses (0.20), field (1.9E-5), grass (4.9E-6)
  • railroad (0.67), train (0.27), smoke (0.04), locomotive (0.01), ruins (1.7E-5)
  • sphinx (0.99), polar (5.0E-3), stone (1.0E-3), bear (9.7E-4), sculpture (6.0E-4)

  27. Results - Retrieval

  28. Future Work
  • Use rectangular segmentation and improved features
  • Different probability estimates
    • Better methods for estimating P(qr | J)
    • Use the CRM to estimate P(qw | J)
  • Apply to documents containing both text and images
  • Develop a method/testbed for evaluating more “interesting” queries

  29. Conclusions
  • General, robust model based on the inference network framework
    • A departure from the implicit “AND” between query terms
  • Unique non-parametric method for estimating the network probabilities
  • Pros:
    • retrieval (inference) is fast
    • makes no assumptions about the distribution of the data
  • Cons:
    • estimation of the term probabilities is slow
    • requires sufficient data to get a good estimate
