240 likes | 358 Views
IPAL at ImageClef 2007 Mixing Features, Models and Knowledge. Sheng Gao. IPAL French-Singaporean Joint Lab Institute for Infocomm Research Singapore. IPAL at ImageClef’07. Team members Jean-Pierre Chevallet, IPAL & CNRS, France (photo & medical)
E N D
IPAL at ImageClef 2007Mixing Features, Models and Knowledge Sheng Gao IPAL French-Singaporean Joint Lab Institute for Infocomm Research Singapore
IPAL at ImageClef’07 • Team members • Jean-Pierre Chevallet, IPAL & CNRS, France (photo & medical) • Thi Hoang Diem Le, I2R, Singapore (photo & medical) • Trong Ton Pham, I2R, Singapore (photo) • Joo Hwee Lim, I2R, Singapore (photo)
Outline • Ad-hoc photographic image retrieval (ImageCLEFphoto) • Content based image retrieval (CBIR) system • Text based information retrieval (TBIR) system • Mix-modality retrieval system • Benchmark results • Summary • Ad-hoc medical image retrieval (ImageCLEFmed) • UMLS based medical retrieval system • Benchmark results • Summary
ImageCLEFphoto’07 vs.ImageCLEFphoto’06 • Less text information is available in ImageCLEFphoto’07. • Image text annotations: notes are excluded this year. • Query: only annotation in the title field are used. • Visual information plays more important role in ImageCLEFphoto’07 than ImageCLEFphoto’06. • Query image samples are excluded from the image database. • 244 new images are added in the image database.
01/1311 01/1310 Similar text annotation, different visual representation • Visual evidence plays a critical role in the case. Title: Accommodation Huanchaco - Exterior View Title: Accommodation Huanchaco - Interior View
Low-level visual features • COR: Auto color correlogram, 324-dimension. • HSV: 166-dimensional histogram in HSV(162-dimension,18*3*3) and gray image (4-dimension). • GABOR: 48-dimension including means and variances at 2-scale and 12-orientations in 5x5 grids. • SIFT: 128-dimensional appearance feature. • EDGE: Canny edge histogram, 80-dimension (6-orientations * 16 patches). • HSV_UNI: 96-dimensional HSV histogram, 32-bin per channel. • GABOR_global: 60-dimension including means and variances at 5-scale and 6-orientations in whole image.
Indexing and similarity measure • Indexing • tf-idf: index histogram, each bin treated as a word. • SVD: indexing at the eigen-space (80% eigenvectors are kept). • ISM: Integrated Statistic Model (ISM) based supervised learning is used to learn ranking function (refer to S. Gao, et al., ACM Multimedia’07). • HME: Hidden Maximum Entropy (HME) based supervised learning is used for ranking function (refer to S. Gao, et al. ICME’07 and ICIP’07). • WORD_SVD: index at intermediate concepts • 200 frequent keywords are intermediate concepts. • HME models are trained for 200 concepts. • Indexing image at 200-dim concept space. • WORD_ISM: ISM is used for ranking function at 200-dim feature. • BoV: index with the bag-of-visterm (1000 visterms).
Indexing and similarity measure • Similarity measure • Cosine distance if the image is indexed by one feature vector, e.g. tf-idf, SVD, etc. • Likelihood ratio between the positive and negative models, if supervised learning is used, e.g. ISM, HME.
Visual feature Indexing and ranking Fusion Fusion Tf-idf COR cor SVD HSV Fusion system hsv ISM HME GABOR gabor WORD_SVD SIFT sift WORD_ISM CBIR system • Example fusion structure
TBIR system • Process XML annotation files XML reader Remove stop words Stemming Lexicon
Estimate LM Lexicon TBIR system • Language model based IR: a document-dependent language model is estimated for each document in the database. P(w1|D) P(w2|D) …… P(wn|D) • Ranking according to the probability of the query, Q, is generated by the document-dependent LM. Q: q1,q2,……,qm
Term-document matrix Lexicon TBIR system Eigen- space • Latent Semantic Indexing (LSI) based IR SVD Eigen- space Index in eigen-space • Cosine distance is used for similarity measure in LSI space.
TBIR system • Access Wikipedia to extract external knowledge for query / document expansion. • 4,881,983 pages are downloaded (Wikipedia in English, April 2, 2007 ). • 23,399 animal terms are extracted, which are useful for queries 5, 20 and 35. • 709 geographical terms are extracted, of which only mountain should be useful for queries 4 and 44.
w 1-w Mix-modality system • Linear combination. CBIR Mix-system TBIR • Cross-modality pseudo-relevance feedback (PRF) Query words Query expansion TBIR Top N image document CBIR One scheme: CBIR to boost TBIR
Results • 27 runs are submitted including CBIR, TBIR and mix-modality runs. • Best run MAP: 0.2833 • 6th place among 476 runs; 2nd place among automatic runs. • IPAL_04V_12RUNS WEIGHT: combine 12 CBIR runs using the empirically tuned weights. • IPAL_11TrV_LM_12RUNSVISUAL: LM-based TBIR plus the PRF. • CBIR best run MAP: 0.1204 without PRF, 4th place among CBIR. • 1st run from INAOE (MAP: 0.1925) and the 2nd run from XRCE (MAP: 0.1890)
Results • Our TBIR best run MAP: 0.1806 with automatic feedback, 7th place among TBIR. • 19TiV_WTmM S0M2D0.8C6T6: with a very small thesaurus manually extracted from Wikipedia. It is terms that are not from info boxes but are relevant to this collection. • Using a black and white image detector based on HSV value of image. • Run15: LM, document expansion with automatic Wikipedia. • Top TBIR runs’ MAP: 0.2020 without feedback (Budapest) and 0.2075 with feedback (XRCE).
PRF analysis • CBIR system • Feedback from the TBIR has few effect. • MAP of the HSV-based CBIR (run 02) is only increased to 0.0693 from 0.0684 (run 01). • Combining 12 CBIR PRF runs, MAP is increased to 0.1358 (run 05) from 0.1204 (run 04). • TBIR system • Feedback from CBIR significantly improve MAP. • MAP of the LM-based TBIR is 0.1377 (run 08). With PRF (run 04), MAP reaches 0.2442 (run 11).
Summary on ImageCLEFphoto • Combing rich visual content representations and indexing techniques significantly improve the CBIR system comparing with any individual visual system. • CBIR based pseudo-relevance feedback significantly boost text based search system. • Exploiting external knowledge such as Wikipedia gives an extra bonus, however, it is less effective than expected. Its large size causes confusion due to lake of disambiguation.
Images / texts French German English q TreeTagger c1 c2 cj UMLS XIotamap Metamap cn cm ... Concepts d ImageCLEFmed - Bayesian network based approach • Conceptualization: • Knowledge base: UMLS Metathesaurus (NLM).
q c1 c2 cj cn cm ... d ImageCLEFmed - Bayesian network based approach • Retrieval process • Document Dobserved: P(D)=1
ImageCLEFmed - Bayesian network based approach • Inference via semantic links from document concept nodes to query concept nodes L:Maximum length of UMLS taxonomy l: minimallength of path between 2 concepts pa(c):document concept nodeswhich are parent nodes of c
ImageCLEFmed - Bayesian network based approach • Relevance status value, RSV(q,d) : belief at q
Summary on ImageCLEFmed • Bayesian model approach exploits semantic relationship between documents concepts and query concepts in an unified framework. • It enhances the VSM by using the semantic relatedness between concepts. • Improvements on relationship weighting issue as well as performance of model are our further study.