Automatic Image Annotation and Retrieval using Cross-Media Relevance Models J. Jeon, V. Lavrenko and R. Manmatha Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts Amherst Presenter: 1096304144 鄭志毅
Introduction(1) • What is image retrieval? Given a database of images and a query string (e.g. words), which images are described by those words? Query string: “jet”
Introduction(2) • Query by example: QBIC (IBM), Photobook (MIT), VisualSEEk (Columbia)
Introduction(3) • What is image annotation (1)? (object recognition) • Each region has a word describing it
Introduction(4) • What is image annotation (2)? Given an image, what are the words that describe it? (annotate the image with a set of words)
Outline • Preprocessing • Cross-Media Relevance Model • Experiment • Conclusions
Preprocessing(1)_segment
• Normalized-cuts segmentation (local, region-based): each region i yields a feature vector bi, so an image is a set of region vectors {b1, b2, …}
• Grid segmentation (global, grid-based): each grid cell i yields a feature vector xi, so x = {x1, x2, …} is the vector of feature vectors and w = {w1, w2, …} is the vector of words (each wi is one word)
Preprocessing(2)_feature extraction
• Extract 30 features [22] from each region:
• Shape (6): area, position (x, y), boundary_len^2/area, convexity, moment of inertia
• Color moments (12): mean RGB (3), RGB standard deviation (3), mean L*a*b (3), L*a*b standard deviation (3)
• Texture (12): oriented energy at 30-degree increments
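As an illustration, here is a minimal sketch of the 12 color-moment features (mean and standard deviation of RGB and L*a*b over one segmented region); the function name is hypothetical and the use of scikit-image for the L*a*b conversion is an assumption, not necessarily the paper's pipeline:

```python
import numpy as np
from skimage.color import rgb2lab  # assumed dependency for the L*a*b conversion

def color_moments(image_rgb, region_mask):
    """Mean and stdev of RGB and L*a*b over one segmented region (12 features)."""
    rgb = image_rgb[region_mask].astype(float)   # (n_pixels, 3) RGB values inside the region
    lab = rgb2lab(image_rgb)[region_mask]        # convert the whole image, then select the region
    return np.concatenate([
        rgb.mean(axis=0), rgb.std(axis=0),       # ave RGB (3) + RGB stdev (3)
        lab.mean(axis=0), lab.std(axis=0),       # ave L*a*b (3) + L*a*b stdev (3)
    ])
```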
Preprocessing(3)_clustering to blobs
• Use k-means to cluster the region feature vectors (k = 500)
• This yields a cluster map; each cluster is called a “blob”, and every region is represented by the id of its nearest cluster
(Figure: segments quantized into blobs, k = 500)
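A hedged sketch of this quantization step using scikit-learn's KMeans; the paper only specifies k = 500, so everything else here (file name, variable names) is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

# 30-dimensional feature vectors pooled from every segmented region
# in the training set; the file name is hypothetical.
region_features = np.load("region_features.npy")

kmeans = KMeans(n_clusters=500, random_state=0).fit(region_features)

def image_to_blobs(image_region_features):
    """Replace each region by the id of its nearest cluster ("blob")."""
    return set(kmeans.predict(image_region_features))
```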
Preprocessing(4)_final
• Each unannotated image is represented as I = {b1, b2, b3, …}
• Each training-set image has one to five keywords: J = {b1, b2, …, bm; w1, w2, …, wn}, where the word counts serve as term frequencies (tf)
Cross-Media Relevance Model
• Estimating the relevance model: the joint distribution of words and blobs
• Find the probability of observing a word w and image regions b1, …, bm together, P(w, b1, …, bm) (a relevance model in the information-retrieval / language-modeling sense)
• To annotate an image from its blobs (e.g., regions showing grass, tiger, water, road), compute P(w | b1, b2, b3, b4)
• If the top three probabilities are for the words grass, water, tiger, then annotate the image with grass, water, tiger
(Figure: example image with regions labeled tiger, water, grass)
Relevance Models
• Annotation: the joint distribution is computed as an expectation over the training-set images J
• Given J, the word and blob events are assumed independent; the equations below spell this out
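Written out in the paper's formulation, with T the training set, #(w, J) the count of word w in image J's annotation, |J| the total count of words and blobs in J, and α, β smoothing parameters:

```latex
P(w, b_1, \dots, b_m) \;=\; \sum_{J \in T} P(J)\, P(w \mid J) \prod_{i=1}^{m} P(b_i \mid J)
```

with the smoothed maximum-likelihood estimates

```latex
P(w \mid J) = (1 - \alpha_J)\,\frac{\#(w, J)}{|J|} + \alpha_J\,\frac{\#(w, T)}{|T|},
\qquad
P(b \mid J) = (1 - \beta_J)\,\frac{\#(b, J)}{|J|} + \beta_J\,\frac{\#(b, T)}{|T|}
```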
Image Annotation
• Compute P(w | I) for every word w in the vocabulary
• Probabilistic annotation: annotate the image with every possible w, each with its associated probability
• Fixed annotation: take the top few (e.g., 3 or 4) words for every image and annotate the image with them (see the sketch below)
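A minimal sketch of the fixed annotation step under the joint distribution above, assuming a uniform prior P(J); all names and the (word_counts, blob_counts, size) layout of the training data are illustrative:

```python
import numpy as np

def cmrm_annotate(image_blobs, train, word_bg, blob_bg, alpha=0.1, beta=0.9, top_n=4):
    """Rank words by P(w, b1..bm) = sum_J P(J) P(w|J) prod_i P(bi|J).

    train:   list of (word_counts, blob_counts, size) tuples, one per training image J
    word_bg: dict word -> #(w, T) / |T|, the background word probability
    blob_bg: dict blob -> #(b, T) / |T|, the background blob probability
    """
    scores = {w: 0.0 for w in word_bg}
    p_J = 1.0 / len(train)                        # uniform prior over training images
    for word_counts, blob_counts, size in train:
        # P(b_i | J) for every blob of the image, smoothed with the background
        p_blobs = np.prod([(1 - beta) * blob_counts.get(b, 0) / size + beta * blob_bg[b]
                           for b in image_blobs])
        for w in scores:
            p_w = (1 - alpha) * word_counts.get(w, 0) / size + alpha * word_bg[w]
            scores[w] += p_J * p_w * p_blobs
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```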
Image Retrieval
• Language-modeling approach: given a query Q = {w1, …, wk}, the probability of drawing Q from image I is P(Q | I) = ∏_{w ∈ Q} P(w | I)
• Alternatively, use the probabilistic annotation directly
• Rank images according to this probability
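A sketch of the ranking step on top of the annotation model; the helper cmrm_word_distribution (returning a normalized dict word -> P(w | I) for an image's blobs) is a hypothetical wrapper around the scoring above:

```python
def rank_images(query_words, images, cmrm_word_distribution):
    """Rank images by the query likelihood P(Q|I) = prod_{w in Q} P(w|I)."""
    scored = []
    for image_id, blobs in images:               # images: iterable of (id, blob set) pairs
        p_w_given_I = cmrm_word_distribution(blobs)
        score = 1.0
        for w in query_words:
            score *= p_w_given_I.get(w, 0.0)     # unseen query words zero out the image
        scored.append((score, image_id))
    return [image_id for _, image_id in sorted(scored, reverse=True)]
```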
Experiment Dataset
• [22] 5,000 images from 50 Corel Stock Photo CDs (4,500 training images, 500 test images)
• Segmentation using normalized cuts followed by quantization ensures that each image has 1-10 blobs
• Each image was also assigned 1-5 keywords
• Vocabulary: 371 words and 500 blobs
Evaluation for a single word (comparing against two other models for image annotation):
• Nc = number of correctly predicted test images
• N = number of test images the system annotated with the word
• Nr = number of test images actually annotated with the word
• precision = Nc / N, recall = Nc / Nr (see the sketch below)
Comparison of 3 models: the graph shows mean precision and recall of the 3 models over 70 one-word queries
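A small sketch of this per-word evaluation (names are illustrative; predicted and truth map each test-image id to its set of words):

```python
def word_precision_recall(word, predicted, truth):
    """Per-word precision = Nc / N and recall = Nc / Nr over the test set."""
    n = sum(word in words for words in predicted.values())               # N
    n_r = sum(word in words for words in truth.values())                 # Nr
    n_c = sum(word in predicted[i] and word in truth[i] for i in truth)  # Nc
    return (n_c / n if n else 0.0), (n_c / n_r if n_r else 0.0)
```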
Annotation examples – CMRM
• Retrieval examples – top 4 images returned by CMRM for the queries “tiger” and “pillar”
Conclusions
• The approach needs large amounts of labeled training and test data
• Better feature extraction or the use of continuous features will probably improve the results