This project overview highlights the PicHunter Bayesian Image Retrieval System, including its target testing and optimized interaction strategy for relevance feedback. It introduces the concept of hidden annotation and reports on psychophysical studies of the system's performance. The paper concludes with a summary of the system's theory, implementation, and psychophysical experiments.
Project overview • Target Testing and the PicHunter Bayesian Multimedia Retrieval System, I.J. Cox, Matt Miller, S.M. Omohundro, P.N. Yianilos, Proceedings of the Forum on Research & Technology Advances in Digital Libraries, pp. 66-75, 1996. • PicHunter: Bayesian Relevance Feedback for Image Retrieval, I.J. Cox, Matt Miller, Stephen Omohundro, P.N. Yianilos, 13th International Conference on Pattern Recognition, Vol. III, Track C, pp. 361-369, August 1996. • Introduces PicHunter and the Bayesian framework, and describes a working system including measured user performance. • Hidden Annotation in Content Based Image Retrieval, I.J. Cox, Joumana Ghosn, Matt Miller, T. Papathomas, P.N. Yianilos, IEEE Workshop on Content-Based Access of Image & Video Libraries, pp. 76-81, June 1997. • Introduces the idea of “hidden annotation” and reports results demonstrating that it improves performance.
Project overview • An Optimized Interaction Strategy for Bayesian Relevance Feedback, I. J. Cox, M. L. Miller, T. Minka, P. N. Yianilos, IEEE International Conference on Computer Vision and Pattern Recognition - CVPR '98, Santa Barbara, CA, pp. 553-558, 1998. • Introduces an improved stochastic image display strategy allowing the system to “ask better questions.” • Psychophysical Studies of the Performance of an Image Database Retrieval System, T. Papathomas, T. Conway, I. Cox, J. Ghosn, M. Miller, T. Minka, P. Yianilos, Proceedings of Human Vision & Electronic Imaging III, San Jose, CA, Vol. 3299, pp. 591-602, January 1998. • Describes psychophysical studies of the system in a controlled environment.
Project summary • The Bayesian Image Retrieval System, PicHunter: Theory, Implementation and Psychophysical Experiments, I. J. Cox, M. L. Miller, T. P. Minka, T. V. Papathomas, P. N. Yianilos, IEEE Transactions on Image Processing, 9, 1, 20-37, (2000)
Introduction • A search consists of • Query • Repeated relevance feedback • To date, emphasis on query phase • better representations, relevance feedback crude or non-existent • Lack of quantitative measures for comparing performance of search algorithms
The main ideas • Bayesian relevance feedback • Learn from human interactions • Model the user's actions, not his/her query • Quantifiable testing • Target testing • Baseline testing • Optimize the image display
Target testing • The user is shown an image from the database. His/her task is to use the system to find it. We measure the number of interactions required. This number is easily compared against a simple linear search • Not a perfect model for all intended uses, but something we can measure and use for comparisons
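The linear-search baseline that target testing is compared against can be worked out directly. The sketch below assumes the target is equally likely to sit at any position in the scan order; the function names and the `display_size` parameter are illustrative and not part of PicHunter.

```python
import math


def expected_images_examined(n_images: int) -> float:
    """Mean number of images inspected by a linear scan when the target
    is equally likely to be at any of the n_images positions."""
    return (n_images + 1) / 2


def expected_displays(n_images: int, display_size: int) -> float:
    """Mean number of screens needed when the scan shows display_size
    images per screen (display_size is a hypothetical parameter)."""
    total = sum(math.ceil(pos / display_size) for pos in range(1, n_images + 1))
    return total / n_images
```

For the 4522-image database used in the experiments, `expected_images_examined(4522)` gives 2261.5 images, which is the figure a relevance-feedback search has to beat.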
Features • Pictorial features • Originally 18 global features • % of pixels that are one of 11 colors • Mean color saturation • Median intensity of the image • Image width and height • A measure of global contrast • Two measures of the number of edgels computed at different thresholds
Features • Hidden annotation • Provides semantic labels • 147 attributes • Boolean vector, normalized Hamming distance
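A normalized Hamming distance between two Boolean annotation vectors is the fraction of attributes on which they disagree. The following is a minimal sketch; the function name and the list-of-bools representation are assumptions made for illustration.

```python
def normalized_hamming(a, b):
    """Fraction of positions at which two equal-length Boolean vectors differ."""
    if len(a) != len(b):
        raise ValueError("annotation vectors must have the same length")
    if not a:
        return 0.0
    return sum(x != y for x, y in zip(a, b)) / len(a)
```

With 147 attributes per image, two images that disagree on 21 attributes would be at distance 21/147.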
Bayesian relevance feedback • At denotes the current user action • Dt is the current display • Ht is the session history, including the images currently displayed. Thus, • Ht = {D1, A1, D2, A2, …, Dt, At} • T is the target image.
Bayesian relevance feedback • We build a predictive model P(A|T,H) • Then from Bayes rule
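With the notation defined above, Bayes' rule gives the posterior over candidate targets in the following standard form. This is a reconstruction from those definitions, assuming that the display D_t is chosen by the system from H_{t-1} alone, so that seeing D_t by itself carries no information about T.

```latex
P(T = T_i \mid H_t)
  = \frac{P(A_t \mid T = T_i, D_t, H_{t-1}) \, P(T = T_i \mid H_{t-1})}
         {\sum_{j} P(A_t \mid T = T_j, D_t, H_{t-1}) \, P(T = T_j \mid H_{t-1})},
\qquad H_t = H_{t-1} \cup \{D_t, A_t\}
```

The numerator is the predictive user model times the previous belief; the denominator only renormalizes over all images in the database.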
Bayesian relevance feedback • Assume time-invariance and same for all users
Absolute-distance model • Only one image, Xq, in the display Dt can be selected at each iteration • The probability of Ti increases or decreases depending on the distance d(Ti, Xq) • P(T=Ti) ← P(T=Ti) G(d(Ti, Xq)), followed by renormalization so that the probabilities sum to 1
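The absolute-distance update can be sketched as a reweighting followed by renormalization. The slide does not specify G, so the exponential decay `exp(-d / sigma)` and the `sigma` parameter below are assumptions chosen only to make the example concrete.

```python
import math


def absolute_distance_update(prior, distances, sigma=1.0):
    """Multiply each P(T=Ti) by G(d(Ti, Xq)) and renormalize.

    prior     -- current probabilities P(T=Ti), one per database image
    distances -- d(Ti, Xq) from each image to the selected image Xq
    sigma     -- hypothetical scale of the assumed G(d) = exp(-d / sigma)
    """
    weights = [p * math.exp(-d / sigma) for p, d in zip(prior, distances)]
    total = sum(weights)
    return [w / total for w in weights]
```

Images close to the selected image gain probability mass and distant ones lose it, which matches the behavior described on the slide.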
Relative-distance model • Let Q={Xq1, Xq2, …, XqC} denote the set of selected images in display Dt and • Let N={Xn1, Xn2, …, XnL} denote the set of unselected images • Then we compute the distance difference • d(Ti, Xqk) - d(Ti, Xnm) for all pairs {Xqk, Xnm} • The probabilities of images Ti that are closer to Xqk are increased while those closer to Xnm are decreased.
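One way to turn the pairwise distance differences into a probability update is to pass each difference through a sigmoid and multiply the results. The sigmoid, the `sigma` scale, and the Euclidean distance over feature tuples are assumptions for this sketch; the slide only states the direction of the change.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def relative_distance_update(prior, candidates, selected, unselected, sigma=1.0):
    """Reweight P(T=Ti) using d(Ti, Xqk) - d(Ti, Xnm) for every pair.

    candidates -- feature tuples of all database images Ti
    selected   -- feature tuples of the selected images Q
    unselected -- feature tuples of the unselected images N
    """
    posterior = []
    for p, t in zip(prior, candidates):
        w = p
        for xq in selected:
            for xn in unselected:
                diff = math.dist(t, xq) - math.dist(t, xn)
                # Negative diff (Ti closer to the selected image) raises the weight.
                w *= sigmoid(-diff / sigma)
        posterior.append(w)
    total = sum(posterior)
    return [w / total for w in posterior]
```

For example, with one selected image at 0.5 and one unselected image at 4.5 on a one-dimensional feature axis, candidates near 0.5 end up more probable than candidates near 4.5.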
Display updating algorithm • Most probable display • Most informative display (Max. mutual information) • Sampling • Query by example
Experimental setup • Database of 4522 images • 1500 annotated • Conditions coded as M/N, A/R, P/S/B • Memory / no memory (relevance feedback history) • Absolute / relative distance • Pictorial / semantic / both features
Experimental notation • MRB – memory, relative distance, pictorial and semantic features • MAB – memory, absolute distance, pictorial and semantic features • NRB – no memory, … • NAB • MRS – memory, relative, semantic features • MRP – memory, relative, pictorial features
Baseline testing • Similarity testing • How many images are examined before the user sees a similar image? • Compare to number needed when randomly searching the database
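The random-search baseline has a closed form. If k of the N database images count as similar and images are examined in random order without repeats, the expected number examined until the first similar one appears is (N + 1) / (k + 1). The sketch below states that formula and checks it by brute-force enumeration on a small case; the function names are illustrative.

```python
from itertools import combinations


def expected_random_examined(n_images: int, n_similar: int) -> float:
    """Expected number of images examined before the first similar image,
    when scanning the database in uniformly random order."""
    return (n_images + 1) / (n_similar + 1)


def brute_force_expected(n_images: int, n_similar: int) -> float:
    """Average position of the earliest similar image over all placements."""
    placements = list(combinations(range(1, n_images + 1), n_similar))
    return sum(min(p) for p in placements) / len(placements)
```

A similarity search is doing useful work only if users see a similar image after examining noticeably fewer images than this baseline predicts.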
Improved pictorial features • Pictorial features • HSV 64-element histogram • HSV 256-element autocorrelogram • RGB 128-element color coherence vector
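As an illustration of the first feature, a 64-element HSV histogram can be built by quantizing each of the three channels into 4 levels. The 4 x 4 x 4 binning, the 0-255 RGB input, and the normalization to fractions are assumptions; the slide gives only the histogram length.

```python
import colorsys


def hsv_histogram(pixels, bins_per_channel=4):
    """Normalized HSV histogram with bins_per_channel**3 elements.

    pixels -- iterable of (r, g, b) tuples with components in 0..255
    """
    pixels = list(pixels)
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        hsv = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        idx = 0
        for channel in hsv:
            # Clamp so that a channel value of exactly 1.0 lands in the top bin.
            level = min(int(channel * bins_per_channel), bins_per_channel - 1)
            idx = idx * bins_per_channel + level
        hist[idx] += 1
    if pixels:
        hist = [count / len(pixels) for count in hist]
    return hist
```

Two such histograms can then be compared with any vector distance to obtain the d(Ti, Xq) used by the user models.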
Display updating algorithms • Most probable display • Most informative display (Max. mutual information) • Sampling • Query by example
Most Probable Display • Performs quite well • However, the greedy strategy suffers from “over-learning” • PicHunter “gets stuck” in a local maximum • Display after display of “lions”, say
Most-Informative Display • Try to minimize the total number of iterations required in a search • Try to elicit as much information from the user as possible • Information theory suggests entropy as an estimate of the number of questions one needs to ask to resolve the ambiguity
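The entropy of the current distribution over targets can be computed as follows. This is a minimal sketch with an illustrative function name; it measures entropy in bits, so the value can be read as a number of ideal yes/no questions.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

A uniform distribution over 4 images has entropy 2 bits, and a uniform prior over the 4522-image database has entropy log2(4522), roughly 12 bits.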
Most-informative display • Consider the ideal (deterministic) case, in which the display consists of two images
Most-informative display • Generalization to the non-deterministic case
Most informative display • Performing the minimization is non-trivial • Perform Monte Carlo simulation • Draw random displays {X1, X2, …, XND} from the distribution P(T=Ti) • Sampling is a special case of the most-informative method where only one Monte Carlo sample is drawn
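The Monte Carlo search can be sketched as: draw candidate displays from P(T=Ti), score each by the expected entropy left after the user responds, and keep the best. The scoring below assumes the ideal deterministic user who always selects the displayed image nearest the target; that user model, the Euclidean distance, and all function names are assumptions made for illustration, not the system's actual implementation.

```python
import math
import random


def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)


def sample_display(probs, size, rng):
    """Draw `size` distinct image indices, each weighted by P(T=Ti).
    Assumes at least `size` images have non-zero probability."""
    remaining = list(range(len(probs)))
    weights = list(probs)
    display = []
    for _ in range(size):
        pick = rng.choices(range(len(remaining)), weights=weights)[0]
        display.append(remaining.pop(pick))
        weights.pop(pick)
    return display


def expected_entropy_after(display, probs, features):
    """Expected posterior entropy if the user picks the displayed image
    closest to the target (assumed deterministic user model)."""
    groups = {x: [] for x in display}
    for i, p in enumerate(probs):
        if p > 0:
            nearest = min(display, key=lambda x: math.dist(features[i], features[x]))
            groups[nearest].append(p)
    expected = 0.0
    for group in groups.values():
        mass = sum(group)
        if mass > 0:
            expected += mass * entropy_bits([p / mass for p in group])
    return expected


def most_informative_display(probs, features, size, n_samples, seed=0):
    """Return the sampled display with the lowest expected posterior entropy."""
    rng = random.Random(seed)
    best, best_score = None, math.inf
    for _ in range(n_samples):
        display = sample_display(probs, size, rng)
        score = expected_entropy_after(display, probs, features)
        if score < best_score:
            best, best_score = display, score
    return best, best_score
```

Calling `most_informative_display(..., n_samples=1)` reduces to the sampling strategy, since the single drawn display is returned regardless of its score.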
Future directions • More efficient algorithms • Automatic detection of hidden features • Explore slightly richer user interfaces • Explore increased use of online learning