IEEE Eleventh DSP Workshop, August 3rd 2004
Clustering Algorithms for Perceptual Image Hashing
Vishal Monga, Arindam Banerjee, and Brian L. Evans
{vishal, abanerje, bevans}@ece.utexas.edu
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
http://signal.ece.utexas.edu
Research supported by a gift from the Xerox Foundation
Hash Example
• Hash function: projects values from a set with a large (possibly infinite) number of members to a set with a fixed, smaller number of members
• Irreversible
• Provides a short, simple representation of a large digital message
• Example: sum of ASCII codes for the characters in a name, modulo N, a prime number (N = 7)
• Database name search example
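A minimal sketch of the toy hash above: sum the ASCII codes of a name and reduce modulo the prime N = 7. The names in the lookup example are hypothetical.

```python
N = 7  # prime modulus from the slide

def toy_hash(name: str) -> int:
    """Sum of ASCII codes modulo N: short, fixed-size, irreversible."""
    return sum(ord(c) for c in name) % N

# Hypothetical database name search: index names by their hash bucket,
# so a lookup only inspects the bucket the query hashes to.
buckets = {}
for name in ["Alice", "Bob", "Carol"]:
    buckets.setdefault(toy_hash(name), []).append(name)

query = "Bob"
print(toy_hash(query), buckets.get(toy_hash(query), []))
```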
Perceptual Hash: Desirable Properties
• Perceptual robustness
• Fragility to distinct inputs
• Randomization: necessary in security applications to minimize vulnerability against malicious attacks
Hashing Framework
[Block diagram: Input Image → Feature Vector Extraction → Visually Robust Feature Vector → Compress (or cluster) Feature Vectors → Final Hash]
• Two-stage hash algorithm
• Goal: retain perceptual significance. Let (li, lj) denote vectors in the metric space V of feature vectors, with distance D(·,·) and 0 < ε < δ; then it is desired that vectors with D(li, lj) < ε map to the same cluster (same hash) and vectors with D(li, lj) > δ map to different clusters
• Minimizing the average distance between clusters is inappropriate
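A short sketch of the requirement above as a checkable predicate, assuming feature vectors are tuples, a Euclidean distance D, and a hypothetical dict cluster_of mapping each vector to its cluster label.

```python
import itertools
import math

def distance(u, v):
    """Euclidean distance; any metric D on the feature space V could be used."""
    return math.dist(u, v)

def satisfies_hash_property(vectors, cluster_of, eps, delta):
    """Check the desired property: vectors closer than eps share a cluster,
    vectors farther than delta fall in different clusters (eps < delta)."""
    for u, v in itertools.combinations(vectors, 2):
        d = distance(u, v)
        same = cluster_of[u] == cluster_of[v]
        if d < eps and not same:
            return False          # similar vectors split apart
        if d > delta and same:
            return False          # distinct vectors clustered together
    return True
```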
Cost Function for Feature Vector Compression
• Define joint cost matrices C1 and C2 (n × n), where n is the total number of vectors to be clustered and C(li), C(lj) denote the clusters that these vectors are mapped to
• Exponential cost: ensures a severe penalty when feature vectors that are far apart ("perceptually distinct") are clustered together
• α > 0, Γ > 1 are algorithm parameters
Cost Function for Feature Vector Compression
• Define S1 as the total cost accumulated by C1 over all vector pairs; S2 is defined similarly from C2
• Normalize to get the normalized costs S̃1 and S̃2
• Then, minimize the "expected" cost, with each pair weighted by its probability mass: p(i) = p(li), p(j) = p(lj)
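A minimal sketch of the expected-cost computation. The slides do not reproduce the penalty expressions, so the forms below (C1 charged when vectors farther than δ share a cluster, C2 when vectors closer than ε are split, both exponential with hypothetical parameters α and Γ) are assumptions chosen to match their qualitative description.

```python
import math

def expected_costs(vectors, labels, p, eps, delta, alpha=1.0, gamma=2.0):
    """Accumulate the two clustering costs, probability-weighted.

    Assumed penalty forms (hypothetical, matching the slides' description):
      C1(i, j) = alpha * gamma**D(li, lj)    if D > delta and same cluster
                 (perceptually distinct vectors clustered together; grows with D)
      C2(i, j) = alpha * gamma**(-D(li, lj)) if D < eps and different clusters
                 (perceptually similar vectors separated; worse the closer they are)
    Each term is weighted by p(i) * p(j) to form the "expected" costs S1 and S2.
    """
    s1 = s2 = 0.0
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            same = labels[i] == labels[j]
            weight = p[i] * p[j]
            if d > delta and same:
                s1 += weight * alpha * gamma ** d
            elif d < eps and not same:
                s2 += weight * alpha * gamma ** (-d)
    return s1, s2
```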
Basic Clustering Algorithm
1. Obtain ε, δ; set k = 1. Select the data point associated with the highest probability mass and label it l1
2. Form the first cluster by including all unclustered points lj such that D(l1, lj) < ε/2
3. k = k + 1. Select the highest-probability data point lk among the unclustered points such that D(lk, S) ≥ 3ε/2 for every cluster S in C, the set of clusters formed up to this step (with the ε/2 inclusion radius, this keeps clusters at least ε apart)
4. Form the k-th cluster Sk by including all unclustered points lj such that D(lk, lj) < ε/2
5. Repeat steps 3-4 until no more clusters can be formed
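A sketch of the basic algorithm under two assumptions: Euclidean distance, and point-to-cluster distance taken as the minimum over cluster members. Leftover points are returned for the assignment approaches that follow.

```python
import math

def point_to_cluster(x, cluster, vectors):
    """Distance from point index x to a cluster = min distance to its members."""
    return min(math.dist(vectors[x], vectors[j]) for j in cluster)

def basic_clustering(vectors, prob, eps):
    """Greedy clustering from the slides: highest-probability points seed
    clusters of radius eps/2, kept at least eps apart from existing clusters."""
    unclustered = set(range(len(vectors)))
    clusters = []
    while unclustered:
        # Candidate seeds: far enough (>= 3*eps/2) from every existing cluster.
        candidates = [i for i in unclustered
                      if all(point_to_cluster(i, c, vectors) >= 1.5 * eps
                             for c in clusters)]
        if not candidates:
            break  # no more clusters can be formed (step 5)
        seed = max(candidates, key=lambda i: prob[i])                   # steps 1 / 3
        members = {j for j in unclustered
                   if math.dist(vectors[seed], vectors[j]) < eps / 2}   # steps 2 / 4
        members.add(seed)
        clusters.append(members)
        unclustered -= members
    return clusters, unclustered  # leftovers are handled by Approach 1 or 2
```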
Observations
• For any (li, lj) in cluster Sk, D(li, lj) < ε (both lie within ε/2 of lk)
• No errors up to this stage of the algorithm
  – Each cluster is at least ε away from any other cluster
  – Within each cluster, the maximum distance between any two points is at most ε
Approach 1
1. Select the data point l* among the unclustered data points that has the highest probability mass
2. For each existing cluster Si, i = 1, 2, …, k, compute di = D(l*, Si) and let S(δ) = {Si such that di ≤ δ}
3. IF S(δ) = ∅ THEN k = k + 1 and Sk = {l*} is a cluster of its own
   ELSE for each Si in S(δ) define the cost F(Si) over S̄i, the complement of Si, i.e. all clusters in S(δ) except Si; l* is then assigned to the cluster S* = arg min F(Si)
4. Repeat steps 1 through 3 until all data points are exhausted
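A sketch of the Approach 1 assignment step, continuing the basic-algorithm sketch above. The slide does not reproduce F(Si), so split_cost below (a C2-type penalty for keeping l* away from nearby points, with hypothetical parameters α and Γ) is an assumed stand-in.

```python
import math

def split_cost(x, others, vectors, p, eps, alpha=1.0, gamma=2.0):
    """Assumed C2-type penalty: separating point x from members of the clusters
    in `others` that lie closer than eps (larger the closer they are)."""
    return sum(p[x] * p[j] * alpha * gamma ** (-math.dist(vectors[x], vectors[j]))
               for cluster in others for j in cluster
               if math.dist(vectors[x], vectors[j]) < eps)

def assign_approach1(vectors, prob, clusters, unclustered, eps, delta):
    """Visit leftover points in decreasing probability; a point with no cluster
    within delta starts its own cluster, otherwise it joins the candidate whose
    assignment leaves the smallest penalty over the remaining nearby clusters."""
    for x in sorted(unclustered, key=lambda i: prob[i], reverse=True):
        near = [c for c in clusters
                if min(math.dist(vectors[x], vectors[j]) for j in c) <= delta]
        if not near:
            clusters.append({x})   # S(delta) empty: l* becomes its own cluster
            continue
        best = min(near, key=lambda c: split_cost(
            x, [o for o in near if o is not c], vectors, prob, eps))
        best.add(x)                # S* = arg min F(Si)
    return clusters
```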
Approach 2
1. Select the data point l* among the unclustered data points that has the highest probability mass
2. For each existing cluster Si, i = 1, 2, …, k, define the cost F(Si) with weighting parameter β ∈ [1/2, 1], where S̄i denotes the complement of Si, i.e. all existing clusters except Si; l* is then assigned to the cluster S* = arg min F(Si)
3. Repeat steps 1 and 2 until all data points are exhausted
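A matching sketch for Approach 2, reusing split_cost from the Approach 1 sketch. The blended form of F(Si) below, with join_cost as a hypothetical C1-type penalty, is an assumption: β = 1/2 weighs both penalties equally, β = 1 considers only the joining penalty.

```python
import math

def join_cost(x, cluster, vectors, p, delta, alpha=1.0, gamma=2.0):
    """Assumed C1-type penalty: clustering point x with members farther than delta."""
    return sum(p[x] * p[j] * alpha * gamma ** math.dist(vectors[x], vectors[j])
               for j in cluster if math.dist(vectors[x], vectors[j]) > delta)

def assign_approach2(vectors, prob, clusters, unclustered, eps, delta, beta=0.5):
    """Every leftover point joins the existing cluster minimizing the blended cost
    F(Si) = beta * join_cost(Si) + (1 - beta) * split_cost(complement of Si)."""
    for x in sorted(unclustered, key=lambda i: prob[i], reverse=True):
        best = min(clusters, key=lambda c:
                   beta * join_cost(x, c, vectors, prob, delta)
                   + (1 - beta) * split_cost(
                       x, [o for o in clusters if o is not c], vectors, prob, eps))
        best.add(x)
    return clusters
```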
Summary
• Approach 1: tries to minimize S̃2 conditioned on S̃1 = 0
• Approach 2: smoothly trades off the minimization of S̃1 vs. S̃2 via the parameter β
  – β = 1/2: joint minimization
  – β = 1: exclusive minimization of S̃1
• Final hash length determined automatically! Given by ⌈log2 k⌉ bits, where k is the number of clusters formed
• Proposed clustering can compress feature vectors in any metric space, e.g. Euclidean, Hamming, and Levenshtein
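A one-line illustration of the automatic hash length, assuming the missing expression is the standard count of bits needed to index k clusters.

```python
import math

def hash_length_bits(k: int) -> int:
    """Bits needed to index k clusters, i.e. the final hash length."""
    return math.ceil(math.log2(k))
```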
Clustering Results
• Compress binary feature vectors of L = 240 bits: final hash length = 46 bits with Approach 2, β = 1/2
• Value of the cost function is orders of magnitude lower for the proposed clustering
Conclusion & Future Work
• Two-stage framework for image hashing
  – Feature extraction followed by feature vector compression
  – Second stage is media independent
• Clustering algorithms for compression
  – Novel cost function for hashing applications
  – Applicable to feature vectors in any metric space
  – Trade-offs facilitated between robustness and fragility
  – Final hash length determined automatically
• Future work
  – Randomized clustering for secure hashing
  – Information-theoretically secure hashing