Inference Network Approach to Image Retrieval
Don Metzler, R. Manmatha
Center for Intelligent Information Retrieval, University of Massachusetts, Amherst
Motivation
• Most image retrieval systems assume:
  • An implicit "AND" between query terms
  • Equal weight for all query terms
  • A query made up of a single representation (keywords or an image)
• "tiger grass" => "find images of tigers AND grass, where each is equally important"
• How can we search with queries made up of both keywords and images?
• How do we perform the following queries?
  • "swimmers OR jets"
  • "tiger AND grass, with more emphasis on tigers than grass"
  • "find me images of birds that are similar to this image"
Related Work
• Inference networks
• Semantic image retrieval
• Kernel methods
Inference Networks
• Inference Network Framework [Turtle and Croft '89]
  • Formal information retrieval framework
  • Basis of the INQUERY search engine
  • Allows structured queries: phrases, term weighting, synonyms, etc.
    • e.g., #wsum( 2.0 #phrase( image retrieval ) 1.0 model )
  • Handles multiple document representations (full text, abstracts, etc.)
• MIRROR [de Vries '98]
  • General multimedia retrieval framework based on the inference network framework
  • Probabilities based on clustering of metadata + feature vectors
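To illustrate the structured-query idea, here is a minimal sketch of how a #wsum operator might combine beliefs. The function name and the belief values are illustrative assumptions, not from INQUERY itself:

```python
# Hypothetical sketch: #wsum( 2.0 #phrase( image retrieval ) 1.0 model )
# scores a document by a weight-normalized sum of its children's beliefs.

def wsum(weighted_beliefs):
    """Combine (weight, belief) pairs into a single belief score."""
    total_weight = sum(w for w, _ in weighted_beliefs)
    return sum(w * b for w, b in weighted_beliefs) / total_weight

# Suppose a document's belief for the phrase "image retrieval" is 0.6
# and for the term "model" is 0.3 (illustrative numbers only).
score = wsum([(2.0, 0.6), (1.0, 0.3)])
print(score)  # (2.0*0.6 + 1.0*0.3) / 3.0 = 0.5
```

The weighting lets the query author say that the phrase matters twice as much as the single term, rather than treating all query parts equally.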
Image Retrieval / Annotation
• Co-occurrence model [Mori et al.]
• Translation model [Duygulu et al.]
• Correspondence LDA [Blei and Jordan]
• Relevance model-based approaches
  • Cross-Media Relevance Models (CMRM) [Jeon et al.]
  • Continuous Relevance Models (CRM) [Lavrenko et al.]
Goals
• Input:
  • A set of annotated training images
  • The user's information need, expressed via:
    • Terms
    • Images
    • "Soft" Boolean operators (AND, OR, NOT)
    • Weights
  • A set of test images with no annotations
• Output:
  • A ranked list of test images relevant to the user's information need
Data
• Corel data set†
  • 4500 annotated training images
  • 500 test images
  • 374-word vocabulary
• Each image automatically segmented using normalized cuts
• Each image represented as a set of representation vectors
  • 36 geometric, color, and texture features
  • Same features used in similar past work
† Available at: http://vision.cs.arizona.edu/kobus/research/data/eccv_2002/
Features
• Geometric (6)
  • area
  • position (2)
  • boundary/area
  • convexity
  • moment of inertia
• Color (18)
  • avg. RGB x 2 (6)
  • std. dev. of RGB (3)
  • avg. L*a*b x 2 (6)
  • std. dev. of L*a*b (3)
• Texture (12)
  • mean oriented energy, 30 deg. increments (12)
Image Representation
• Example annotation: cat, grass, tiger, water
• Representation vector (real-valued, one per image segment)
• Annotation vector (binary, the same for each segment)
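A minimal sketch of this representation: each image carries one real-valued feature vector per segment plus a single binary annotation vector shared by all segments. The variable names, toy vocabulary, and segment count are illustrative assumptions:

```python
import random

# Toy vocabulary; the real data set has a 374-word vocabulary.
VOCAB = ["cat", "grass", "tiger", "water"]

# One real-valued representation vector per segment (36 features each,
# matching the geometric + color + texture features described above).
segments = [[random.random() for _ in range(36)] for _ in range(4)]

# One binary annotation vector for the whole image; every segment of
# this image shares the same annotation.
annotation = {w: 1 for w in VOCAB}

# Pair each segment's features with the shared annotation vector.
pairs = [(seg, annotation) for seg in segments]
```

Note that the annotation is per-image, not per-segment: the data does not say which segment is the tiger and which is the grass, which is why the model must reason about correspondence probabilistically.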
Image Inference Network
• "Image Network" (fixed, based on the image): J, qw1 … qwk, qr1 … qrk
• "Query Network" (dynamic, based on the query): qop1, qop2, I
• Node semantics:
  • J – representation vectors for the image (continuous, observed)
  • qw – word w appears in the annotation (binary, hidden)
  • qr – representation vector r describes the image (binary, hidden)
  • qop – query operator satisfied (binary, hidden)
  • I – user's information need is satisfied (binary, hidden)
Example Instantiation
• [diagram: network instantiated for the terms "tiger" and "grass", combined by #and and #or nodes]
What needs to be estimated?
• P(qw | J)
• P(qr | J)
• P(qop | J)
• P(I | J)
P(qw | J)
• Probability that term w appears in the annotation given image J, e.g. P( tiger | J )
• Apply Bayes' rule and use non-parametric density estimation
• Assumes the representation vectors are conditionally independent given that term w annotates the image
How can we compute P(ri | qw)?
• [diagram: training-set representation vectors; vectors from images annotated by w define areas of high likelihood, all others areas of low likelihood]
P(qw | J) [final form]
• [equation: kernel density estimate of P(qw | J)]
• Σ assumed to be diagonal, estimated from training data
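The estimate described above (Bayes' rule, conditional independence of the representation vectors, and a Gaussian kernel with diagonal Σ placed on each training vector) can be sketched as follows. The function names, the 2-D toy vectors, and the variance values are illustrative assumptions, not values from the paper:

```python
import math

def gaussian_kernel(x, mu, sigma2):
    """Diagonal-covariance Gaussian density N(x; mu, diag(sigma2))."""
    norm = math.prod(math.sqrt(2 * math.pi * s) for s in sigma2)
    expo = sum((xi - mi) ** 2 / (2 * s) for xi, mi, s in zip(x, mu, sigma2))
    return math.exp(-expo) / norm

def p_r_given_w(r, training_vectors, sigma2):
    """Kernel density estimate of P(r | q_w): one Gaussian bump per
    representation vector drawn from training images annotated with w."""
    return sum(gaussian_kernel(r, t, sigma2)
               for t in training_vectors) / len(training_vectors)

def p_w_given_image(image_vectors, training_vectors, sigma2, prior=1.0):
    """P(q_w | J) proportional to P(q_w) * prod_i P(r_i | q_w), assuming
    the r_i are conditionally independent given that w annotates J."""
    score = prior
    for r in image_vectors:
        score *= p_r_given_w(r, training_vectors, sigma2)
    return score

# Toy 2-D example: "tiger" training vectors cluster near (1, 0),
# "grass" training vectors near (0, 1).
tiger_train = [[1.0, 0.0], [0.9, 0.1]]
grass_train = [[0.0, 1.0], [0.1, 0.9]]
image = [[0.95, 0.05]]  # one segment, close to the tiger cluster

s_tiger = p_w_given_image(image, tiger_train, [0.1, 0.1])
s_grass = p_w_given_image(image, grass_train, [0.1, 0.1])
```

Because the image's segment falls in the high-density region of the tiger training vectors, `s_tiger` comes out much larger than `s_grass`, which is exactly the "area of high likelihood" intuition from the previous slide.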
Regularized estimates…
• P(qw | J) estimates are good, but not comparable across images
  • Is the 2nd image really 2x more "cat-like"?
  • Probabilities are relative per image
Regularized estimates…
• Impact transformations
  • Used in information retrieval: "rank is more important than value" [Anh and Moffat]
• Idea:
  • rank each term according to P(qw | J)
  • give higher probabilities to higher-ranked terms
  • P(qw | J) ≈ 1/rank(qw)
• Zipfian assumption on relevant words:
  • a few words are very relevant
  • a medium number of words are somewhat relevant
  • many words are not relevant
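The rank-based transformation above is simple to state in code. This is a minimal sketch with illustrative probability values; the function name is an assumption:

```python
def impact_transform(word_probs):
    """Replace each raw P(q_w | J) with 1/rank, so scores become
    comparable across images (rank 1 = most probable word)."""
    ranked = sorted(word_probs, key=word_probs.get, reverse=True)
    return {w: 1.0 / (i + 1) for i, w in enumerate(ranked)}

# Illustrative raw estimates for one image:
probs = {"cat": 0.008, "tiger": 0.020, "grass": 0.004}
impacts = impact_transform(probs)  # tiger -> 1.0, cat -> 0.5, grass -> 1/3
```

Only the ordering of the raw estimates survives the transformation, which is the point: two images whose densities live on very different scales still produce comparable 1/rank scores, and the 1/rank decay loosely matches the Zipfian assumption that relevance falls off quickly after the top few words.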
P(qr | J)
• Probability that representation vector r is observed given image J
• Use non-parametric density estimation again
  • Impose a density over J's representation vectors, just as in the previous case
• Estimates may be poor
  • Based on a small sample (~10 representation vectors)
• Naïve and simple, yet somewhat effective
Query Operators
• "Soft" Boolean operators
  • #and / #wand (weighted AND)
  • #or
  • #not
• One node added to the query network for each operator present in the query
• Many other operators are possible
  • #max, #sum, #wsum
  • #syn, #odn, #uwn, #phrase, etc.
#or( #and( tiger grass ) )
• [diagram: term nodes "tiger" and "grass" feed the #and node, which feeds the #or node]
Operator Nodes
• Combine probabilities from the term and image nodes
• Closed forms derived from the corresponding link matrices
  • Allows efficient inference within the network
• Par(q) = set of q's parent nodes
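The slide does not spell out the closed forms, but the link-matrix closed forms commonly used in the inference network framework (Turtle and Croft) can be sketched as follows; treat this as an assumption about which forms are meant, with illustrative belief values:

```python
import math

def p_and(parents):
    """#and: probability that all parent nodes are true."""
    return math.prod(parents)

def p_or(parents):
    """#or: probability that at least one parent node is true."""
    return 1.0 - math.prod(1.0 - p for p in parents)

def p_not(p):
    """#not: probability that the single parent node is false."""
    return 1.0 - p

def p_wand(weighted):
    """#wand: weighted AND as a normalized geometric mean of parents,
    given (weight, belief) pairs."""
    total = sum(w for w, _ in weighted)
    return math.prod(p ** (w / total) for w, p in weighted)

# Belief flows bottom-up through the query network, e.g. for
# #or( #and( tiger grass ) ) with illustrative term beliefs:
belief = p_or([p_and([0.8, 0.5])])
```

Because each operator is a closed-form function of its parents' beliefs, evaluating a query is a single bottom-up pass over the query network, which is why inference is fast at retrieval time.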
[example annotation results for three test images:
• foals (0.46), mare (0.33), horses (0.20), field (1.9E-5), grass (4.9E-6)
• railroad (0.67), train (0.27), smoke (0.04), locomotive (0.01), ruins (1.7E-5)
• sphinx (0.99), polar (5.0E-3), stone (1.0E-3), bear (9.7E-4), sculpture (6.0E-4)]
Future Work
• Use rectangular segmentation and improved features
• Different probability estimates
  • Better methods for estimating P(qr | J)
  • Use the CRM to estimate P(qw | J)
• Apply to documents containing both text and images
• Develop a method/testbed for evaluating more "interesting" queries
Conclusions
• General, robust model based on the inference network framework
  • A departure from the implied "AND" between query terms
• Unique non-parametric method for estimating network probabilities
• Pros:
  • Retrieval (inference) is fast
  • Makes no assumptions about the distribution of the data
• Cons:
  • Estimation of term probabilities is slow
  • Requires sufficient data to get a good estimate