Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning. Rong Jin, Joyce Y. Chai (Michigan State University); Luo Si (Carnegie Mellon University). ACM International Conference on Multimedia, 2004.
Introduction • Automatic image annotation • learn the correlation between image features and textual words from examples of annotated images • apply the learned correlation to predict words for unseen images • Problem • each annotated word for an image is predicted independently of the other annotated words for the same image
Introduction • Word-to-word correlation is particularly important when image features are insufficient to determine an appropriate word annotation • sky vs. ocean • if ‘grass’ is very likely to be an annotated word, ‘sky’ will usually be preferred over ‘ocean’
Introduction • They propose a coherent language model for automatic image annotation that takes into account word-to-word correlation • It is able to automatically determine the annotation length for a given image • It can be naturally used for active learning to significantly reduce the required number of annotated image examples
Related work • Machine translation model • Co-occurrence model • Latent space approaches • Graphical models • Classification approaches • Relevance language models
Relevance language model • Idea • first find annotated images that are similar to a test image • use the words shared by the annotations of the similar images to annotate the test image • T: collection of annotated images • J_i = {b_{i,1}, b_{i,2}, …, b_{i,m}; w_{i,1}, w_{i,2}, …, w_{i,n}} • J_i: the i-th annotated image • b_{i,j}: the number of times the j-th blob appears in the i-th image • w_{i,j}: a binary variable indicating whether or not the j-th word appears in the annotation of the i-th image
Relevance language model • Given an image I = {b_{i,1}, b_{i,2}, …, b_{i,m}} • estimate the likelihood of each word being an annotation for I
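The idea on the two slides above (find similar annotated images, then reuse their words) can be illustrated with a minimal sketch. This is not the relevance language model's actual estimator, whose formula is not preserved in these slides; it is a simplified similarity-weighted vote. The cosine similarity, the function names, and the parameter `k` are all assumptions made for illustration.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two blob-count vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def word_scores(test_blobs, training_set, k=2):
    """Score each word by similarity-weighted votes from the k most similar images.

    training_set: list of (blob_counts, word_indicators) pairs, mirroring J_i.
    Returns one score in [0, 1] per vocabulary word.
    """
    ranked = sorted(training_set,
                    key=lambda j: cosine(test_blobs, j[0]),
                    reverse=True)[:k]
    n_words = len(training_set[0][1])
    scores = [0.0] * n_words
    total = 0.0
    for blobs, words in ranked:
        sim = cosine(test_blobs, blobs)
        total += sim
        for j, w in enumerate(words):
            scores[j] += sim * w  # w is the binary indicator w_{i,j}
    return [s / total if total else 0.0 for s in scores]

# Toy data: 3 blobs, vocabulary of 3 words (hypothetical values).
training = [([2, 0, 1], [1, 0, 1]),
            ([0, 3, 0], [0, 1, 0]),
            ([1, 0, 2], [1, 0, 0])]
scores = word_scores([2, 0, 2], training, k=2)
```

Note that each word is still scored independently here, which is exactly the limitation the coherent language model is meant to address.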
Coherent language model • Estimate the probability of annotating image I with a set of words {w}, p({w}|I) • Estimate the probability of using a language model θ_w to generate annotation words for image I, p(θ_w|I) • θ_w = {p_1(θ_w), p_2(θ_w), …, p_n(θ_w)} • p_j(θ_w) = p(w_j=1|θ_w) • how likely the j-th word is to be used for annotation • p(θ_w|I) ∝ p(I|θ_w) p(θ_w)
Coherent language model • Use the Expectation-Maximization (EM) algorithm to find the optimal solution • E-step • Z_I: normalization constant that ensures • M-step • Z_w: normalization constant that ensures
Determining Annotation Length • It would be more appropriate to describe annotation words with Bernoulli distributions than multinomial distributions • each word is annotated at most once for an image
Determining Annotation Length • is no longer a constant • a word is used for annotation if and only if the corresponding probability
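A minimal sketch of how a per-word Bernoulli probability can yield a variable annotation length. The exact condition on the slide is not preserved, so the comparison against a fixed threshold is an assumption (the Future work slide does mention threshold values); the value 0.5, the function name, and the example probabilities are hypothetical.

```python
def select_annotation(word_probs, threshold=0.5):
    """Keep every word whose probability exceeds the (assumed) threshold.

    word_probs: dict mapping word -> probability of being an annotation word.
    The number of returned words is not fixed in advance.
    """
    ordered = sorted(word_probs.items(), key=lambda kv: -kv[1])
    return [word for word, p in ordered if p > threshold]

# Hypothetical probabilities for one image.
probs = {"sky": 0.92, "grass": 0.71, "ocean": 0.18, "tiger": 0.55}
annotation = select_annotation(probs)
```

With these values three words pass the threshold; a different image could yield one word or five without changing the code.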
Active Learning for Automatic Image Annotation • Active learning • selectively sample examples for labeling so that the uncertainty in determining the right model is reduced most significantly • choose the examples that are most informative to a statistical model • For each un-annotated image, they apply the CLMFL (coherent language model with flexible length) model to determine its annotation words and compute its average word probability • The un-annotated image with the lowest average word probability is chosen for users to annotate.
Active Learning for Automatic Image Annotation • Select the images that are not only poorly annotated by the current model but also similar to the test images • from the set of images for which the current annotation model cannot produce any annotations, choose the images that are most similar to the test images
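The two selection rules on the slides above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the per-image word probabilities and the similarity-to-test scores are taken as given inputs, and all function names, the data layout, and the default batch size are hypothetical.

```python
def average_word_probability(word_probs):
    """Mean probability of the words generated for one image (0.0 if none)."""
    return sum(word_probs) / len(word_probs) if word_probs else 0.0

def select_for_labeling(candidates, batch_size=20):
    """Pick the un-annotated images with the lowest average word probability.

    candidates: dict image_id -> list of probabilities of the generated words.
    """
    ranked = sorted(candidates,
                    key=lambda i: average_word_probability(candidates[i]))
    return ranked[:batch_size]

def select_similar_unannotated(candidates, similarity_to_test, batch_size=20):
    """Among images with no generated annotation, pick those most similar to the test set.

    similarity_to_test: dict image_id -> similarity score (assumed precomputed).
    """
    empty = [i for i, probs in candidates.items() if not probs]
    empty.sort(key=lambda i: similarity_to_test.get(i, 0.0), reverse=True)
    return empty[:batch_size]

# Hypothetical model outputs for five un-annotated images.
candidates = {"a": [0.9, 0.8], "b": [0.4, 0.5], "c": [], "d": [0.6], "e": []}
least_confident = select_for_labeling(candidates, batch_size=2)
most_useful = select_similar_unannotated(candidates, {"c": 0.7, "e": 0.9}, batch_size=1)
```

Treating an image with no generated words as having probability 0.0 is a design choice of this sketch; it makes such images the first candidates under the first rule as well.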
Experiments • Data (Duygulu, et al., 2002) • 5,000 images from 50 Corel Stock Photo CDs • Segmentation by normalized cut • the 10 largest regions are kept for each image • the K-means algorithm is used to cluster all image regions into 500 different blobs • Each image is annotated with 1 to 5 words, 371 distinct words in total • 4,500 images are used as training examples and the remaining 500 images are used for testing
Experiments • The quality of automatic image annotation is measured by the performance of retrieving auto-annotated images for single-word queries • Precision • Recall • There are 263 distinct words in total in the annotations of the test images • focus on the 140 words that appear at least 20 times in the training dataset
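The retrieval-based evaluation above can be sketched as follows, using the standard definitions of precision and recall for a single-word query. The data layout (dicts mapping image ids to word sets), the function name, and the toy annotations are assumptions for illustration.

```python
def precision_recall(word, predicted, truth):
    """Precision and recall of retrieving images with a single-word query.

    predicted: dict image_id -> set of automatically generated words.
    truth:     dict image_id -> set of manually annotated words.
    An image is retrieved if the query word is in its generated annotation.
    """
    retrieved = {i for i, words in predicted.items() if word in words}
    relevant = {i for i, words in truth.items() if word in words}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy test set of three images (hypothetical annotations).
predicted = {1: {"sky", "grass"}, 2: {"sky"}, 3: {"ocean"}}
truth = {1: {"sky"}, 2: {"grass"}, 3: {"sky", "ocean"}}
p, r = precision_recall("sky", predicted, truth)
```

Averaging these two numbers over a set of query words gives a single summary figure per model.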
Coherent Language Model vs. Relevance Language Model • The coherent language model performs better than the relevance language model
Coherent Language Model vs. Relevance Language Model • Word-to-word correlation has little impact on the very top-ranked words, which have been determined by the image features with high confidence • It has much more influence on the words that are not ranked at the very top • For those words, the word-to-word correlation is used to promote the words that are more consistent with the very top-ranked words
Generating Annotations with Automatically Determined Length • The generated annotations average about 3 words each • The CLMFL model performs significantly better than the CLM model with a fixed length of 3 • Two-word queries • the 100 most frequent two-word combinations from the annotations of the test images are used as two-word queries
Generating Annotations with Automatically Determined Length • CLMFL model • the generated annotations are able to reflect the content of images more accurately than the CLM model that uses a fixed annotation length
Generating Annotations with Automatically Determined Length • In the fifth image, ‘water’ does not appear in the annotation that is generated by the CLMFL model
Active Learning for Automatic Image Annotation • First, 1000 annotated images are randomly selected from the training set and used as the initial training examples • Then, the system iteratively acquires annotations for selected images • In each iteration, at most 20 images from the training pool can be selected for manual annotation • four iterations • at most 80 additional annotated images are acquired • At each iteration, generate annotations for the 500 testing images
Active Learning for Automatic Image Annotation • Baseline model • randomly selects 20 images for each iteration
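The experimental protocol on the two slides above (1000 initial examples, four iterations, at most 20 new images per iteration, random baseline) can be sketched as a loop with a pluggable selection strategy. Retraining and evaluation on the 500 test images are omitted; the function names, the `select` callback signature, and the fixed seed are assumptions for illustration.

```python
import random

def run_active_learning(pool_ids, select, initial_size=1000,
                        batch_size=20, iterations=4, seed=0):
    """Simulate the acquisition schedule; returns labeled ids and set sizes.

    select(labeled, unlabeled, batch_size) returns the ids to annotate next.
    """
    rng = random.Random(seed)
    pool = list(pool_ids)
    rng.shuffle(pool)
    labeled, unlabeled = pool[:initial_size], pool[initial_size:]
    history = [len(labeled)]
    for _ in range(iterations):
        # Enforce the "at most batch_size images per iteration" limit.
        chosen = select(labeled, unlabeled, batch_size)[:batch_size]
        chosen_set = set(chosen)
        labeled.extend(chosen)
        unlabeled = [i for i in unlabeled if i not in chosen_set]
        history.append(len(labeled))  # retraining and testing would happen here
    return labeled, history

def random_baseline(labeled, unlabeled, batch_size):
    """Baseline strategy: pick images uniformly at random from the pool."""
    return random.sample(unlabeled, min(batch_size, len(unlabeled)))

# 4500 training images, identified here by integer ids.
labeled, history = run_active_learning(range(4500), random_baseline)
```

Swapping `random_baseline` for a strategy based on average word probability would give the active learning variant while keeping the schedule identical.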
Active Learning for Automatic Image Annotation • The active learning method gives the annotation model more chances to learn new objects with new words
Conclusion • Coherent language model • takes advantage of word-to-word correlation • Coherent language model with flexible length • automatically determines the annotation length • Active learning method (based on the CLM model) • effectively reduces the required number of annotated images
Conclusion • Future work • Learning the threshold values from training examples • use thresholds that depend on the properties of the annotation words • Using different measures of uncertainty for the active learning method