Active hashing and its application to image and text retrieval
Yi Zhen, Dit-Yan Yeung. Published in DMKD, Feb 2012.
Presented by Arshad Jamal, Rajesh Dhania, Vinkal Vishnoi.
Introduction
• Computing similarity plays a fundamental role
• Hashing-based methods have gained popularity for large-scale similarity search
• Similarity search methods:
  • Tree based: suitable for low dimensions
  • Hashing based:
    • Data independent
    • Data dependent: unsupervised, semi-supervised
• This paper proposes a novel framework for active hashing
Related work
• Locality Sensitive Hashing
  • Goal is to assign similar binary codes to data points that are close in feature space [random linear projection + thresholding]
  • Code length could become quite large
• Spectral Hashing
  • Performs spectral decomposition to learn hash functions
  • Assumes data to be uniformly distributed
• Active Learning
  • Identifies and presents the most informative unlabeled data to human experts for labeling
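The "random linear projection + thresholding" idea behind Locality Sensitive Hashing can be sketched in a few lines. This is only an illustration using the Python standard library: the helper names (`make_projections`, `lsh_code`, `hamming`), the Gaussian projections, and the toy vectors are assumptions, not taken from the slides.

```python
import random

def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions in `dim` dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_code(x, projections):
    """One bit per direction: 1 if the projection w.x is non-negative, else 0."""
    return tuple(
        1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else 0
        for w in projections
    )

def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return sum(u != v for u, v in zip(a, b))

projections = make_projections(dim=3, n_bits=16)
a = [1.0, 0.9, 0.1]
b = [0.9, 1.0, 0.0]          # points in nearly the same direction as a
neg_a = [-1.0, -0.9, -0.1]   # points in the opposite direction

# Codes of nearby directions tend to differ in few bits; opposite directions flip every bit.
print(hamming(lsh_code(a, projections), lsh_code(b, projections)))
print(hamming(lsh_code(a, projections), lsh_code(neg_a, projections)))
```

Because each bit comes from an independent random direction, many bits may be needed before Hamming distance tracks similarity well, which is the "code length could become quite large" concern.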
Related Work: Semi-supervised Hashing
• Given N normalized data points of D dimensions
• Learn K hash functions to generate a K-bit binary code
• Build two sets of point pairs: S (similar) and D (dissimilar)
• Together they characterize the semantic similarity
• Hash functions are learned by maximizing an objective function
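A minimal sketch of such an objective, assuming the empirical-fitness form used by Wang et al. (2010a) with linear hash functions h_k(x) = sgn(w_k · x): pairs in S are rewarded for sharing a bit and pairs in D are rewarded for differing. The regularization term of that paper is left out, and the function names and toy data are hypothetical.

```python
def hash_bits(x, W):
    """h_k(x) = sign(w_k . x), returned as +1/-1 for each of the K directions in W."""
    return [1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else -1 for w in W]

def empirical_fitness(X, W, similar, dissimilar):
    """Sum over bits of h_k(x_i) * h_k(x_j) on similar pairs, minus the same sum on dissimilar pairs."""
    codes = [hash_bits(x, W) for x in X]
    score = 0
    for i, j in similar:
        score += sum(a * b for a, b in zip(codes[i], codes[j]))
    for i, j in dissimilar:
        score -= sum(a * b for a, b in zip(codes[i], codes[j]))
    return score

# Toy data: points 0 and 1 belong together, as do points 2 and 3.
X = [[1.0, 0.2], [0.9, 0.1], [-1.0, 0.1], [-0.8, -0.2]]
S = [(0, 1), (2, 3)]   # similar pairs
D = [(0, 2), (1, 3)]   # dissimilar pairs

good = empirical_fitness(X, [[1.0, 0.0]], S, D)  # one bit along the separating axis
bad = empirical_fitness(X, [[0.0, 1.0]], S, D)   # one bit along an uninformative axis
print(good, bad)
```

A learner would search for the W with the highest score; here the two candidate directions are simply compared by hand.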
Limitations of SSH
• Point pairs from both the S and D sets are considered equally important
• For multi-class data, dissimilar points picked from a nearby class and from a distant class contribute the same weight
• More dissimilar points will spoil the learned hash function
[Figure: three classes C1, C2, C3]
Active Hashing (Greedy AH)
• Tries to overcome the limitations of SSH by picking the most informative points
• Algorithm: three main steps, given labeled and unlabeled data points (L, U) and a candidate set C
  • Select the most informative points A from C
  • Get A labeled by an expert, then update L, U and C
  • Train the hash functions based on L and U
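The select / label / retrain loop can be sketched generically as follows. Everything here is a stand-in: `train`, `certainty` and `oracle` are hypothetical callbacks supplied by the caller, and the toy example replaces the hash-function learner with a one-dimensional threshold (a single one-bit hash) purely to make the loop runnable.

```python
def greedy_active_hashing(labeled, unlabeled, candidates,
                          train, certainty, oracle, n_rounds, n_per_round):
    """Repeat: pick the least certain candidates, have them labeled, update the sets, retrain."""
    labeled = dict(labeled)        # point -> label (L)
    unlabeled = set(unlabeled)     # U
    candidates = set(candidates)   # C
    model = train(labeled, unlabeled)
    for _ in range(n_rounds):
        if not candidates:
            break
        # Step 1: select the most informative (least certain) points A from C.
        picked = sorted(candidates, key=lambda p: certainty(model, p))[:n_per_round]
        # Step 2: get A labeled by the expert and update L, U, C.
        for p in picked:
            labeled[p] = oracle(p)
        unlabeled -= set(picked)
        candidates -= set(picked)
        # Step 3: retrain on the updated sets.
        model = train(labeled, unlabeled)
    return model, labeled

def train_threshold(labeled, unlabeled):
    """Toy learner: midpoint between the two class means (ignores U in this sketch)."""
    neg = [p for p, y in labeled.items() if y == 0]
    pos = [p for p, y in labeled.items() if y == 1]
    return (sum(neg) / len(neg) + sum(pos) / len(pos)) / 2.0

model, labeled_after = greedy_active_hashing(
    labeled={-3.0: 0, 3.0: 1},
    unlabeled={-2.0, -0.5, 0.4, 2.5},
    candidates={-2.0, -0.5, 0.4, 2.5},
    train=train_threshold,
    certainty=lambda threshold, p: abs(p - threshold),  # distance to the decision boundary
    oracle=lambda p: int(p >= 0),                       # stands in for the human expert
    n_rounds=2,
    n_per_round=1,
)
print(sorted(labeled_after))
```

In this toy run the two points closest to the current threshold are the ones sent to the expert, while the far-away points are never queried.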
Greedy AH: Selecting data points
• Based on SSH model hash function
• Intuitively, the term indicates the certainty of x
• Data certainty (DC):
• Data points with smallest f will be the most informative points
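As an illustration of certainty-based selection, the sketch below assumes linear hash functions h_k(x) = sgn(w_k · x) and takes the certainty f(x) to be the Euclidean norm of the K projections w_k · x, so that points lying close to all hashing hyperplanes score lowest. This specific form of f, the helper names and the toy data are assumptions made for the example.

```python
import math

def data_certainty(x, W):
    """Assumed form of f(x): Euclidean norm of the projections of x onto the rows of W."""
    return math.sqrt(sum(
        sum(w_i * x_i for w_i, x_i in zip(w, x)) ** 2 for w in W
    ))

def least_certain(candidates, W, m):
    """Indices of the m candidates with the smallest certainty, least certain first."""
    order = sorted(range(len(candidates)),
                   key=lambda i: data_certainty(candidates[i], W))
    return order[:m]

W = [[1.0, 0.0], [0.0, 1.0]]   # two hypothetical hash directions
C = [[0.9, 0.8], [0.05, -0.1], [-0.7, 0.6], [0.1, 0.0]]

print(least_certain(C, W, 2))  # the two points nearest the hyperplanes
```

Points 3 and 1 sit almost on the hyperplanes, so a small perturbation could flip their bits; those are the points a greedy selector would send to the expert first.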
Batch mode Active Hashing
• Selecting points one by one is inefficient and suboptimal
• A set of points is selected and processed together to learn the hash functions
• µ is an indicator vector marking whether each point in C is selected
• f is a vector of normalized certainty values in C
• K is a positive semi-definite similarity matrix defined on C
• Choose the M examples with the largest µ values
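One way to combine these ingredients, shown here only as a sketch, is to minimize fᵀµ + (λ/M) µᵀKµ over 0 ≤ µ ≤ 1 with Σµ = M, a relaxation of the 0/1 indicator, and then keep the M largest entries of µ. The first term favors uncertain points and the second penalizes picking points that are similar to each other. The objective form, the weight `lam`, the projected-gradient solver and the toy data are all assumptions for this example.

```python
def project_capped_simplex(v, m, iters=60):
    """Project v onto {mu : 0 <= mu_i <= 1, sum(mu) = m} by bisection on a shift tau."""
    lo, hi = min(v) - 1.0, max(v)
    for _ in range(iters):
        tau = (lo + hi) / 2.0
        total = sum(min(1.0, max(0.0, vi - tau)) for vi in v)
        if total > m:
            lo = tau
        else:
            hi = tau
    tau = (lo + hi) / 2.0
    return [min(1.0, max(0.0, vi - tau)) for vi in v]

def batch_select(f, K, m, lam=1.0, n_steps=300):
    """Projected gradient descent on f.mu + (lam/m) mu'K mu, then keep the m largest mu."""
    n = len(f)
    scale = 2.0 * lam / m
    # Step size 1/L, with L bounded by the largest absolute row sum of the Hessian.
    step = 1.0 / (scale * max(sum(abs(k) for k in row) for row in K))
    mu = [m / n] * n
    for _ in range(n_steps):
        grad = [f[i] + scale * sum(K[i][j] * mu[j] for j in range(n)) for i in range(n)]
        mu = project_capped_simplex([mu[i] - step * grad[i] for i in range(n)], m)
    chosen = sorted(sorted(range(n), key=lambda i: -mu[i])[:m])
    return chosen, mu

# Candidates 0 and 1 are both uncertain but nearly identical to each other.
f = [0.10, 0.12, 0.50, 0.90]
K = [[1.00, 0.95, 0.0, 0.0],
     [0.95, 1.00, 0.0, 0.0],
     [0.00, 0.00, 1.0, 0.0],
     [0.00, 0.00, 0.0, 1.0]]

chosen, mu = batch_select(f, K, m=2)
print(chosen)
```

Picking purely by certainty would return the near-duplicate pair (0, 1); with the similarity penalty the sketch keeps candidate 0 and swaps its near-duplicate for candidate 2.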
Experimental evaluation-I
• Image retrieval (MNIST data set): results reported for different parameter settings
• Text retrieval (20 Newsgroups (NEWS) data set)
• Random vs BMAH: performance improvement
Experimental evaluation-II
• Image retrieval (MNIST data set)
• BMAH vs GAH: BMAH takes less time
References
• Andoni A, Indyk P (2006) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. In: Proceedings of the 47th annual IEEE symposium on foundations of computer science, FOCS '06, IEEE Computer Society, Washington, pp 459–468
• Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: Koller D, Schuurmans D, Bengio Y, Bottou L (eds) Advances in neural information processing systems 21, NIPS 21, The MIT Press, Cambridge, MA, pp 1753–1760
• Wang J, Kumar S, Chang S-F (2010a) Semi-supervised hashing for scalable image retrieval. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 3424–3431
• Salakhutdinov R, Hinton GE (2009) Semantic hashing. Int J Approx Reason 50:969–978
Thanks