IEEE Eleventh DSP Workshop, August 3rd 2004
Clustering Algorithms for Perceptual Image Hashing
Vishal Monga, Arindam Banerjee, and Brian L. Evans
{vishal, abanerje, bevans}@ece.utexas.edu
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
http://signal.ece.utexas.edu
Research supported by a gift from the Xerox Foundation
Hash Example
• Hash function: projects values from a set with a large (possibly infinite) number of members to a set with a fixed, smaller number of members
• Irreversible
• Provides a short, simple representation of a large digital message
• Example: sum of ASCII codes for the characters in a name, modulo N, a prime number (N = 7)
• Database name search example
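A minimal sketch of the toy hash above: sum the ASCII codes of a name and reduce modulo the prime N = 7. The names in the lookup example are hypothetical.

```python
N = 7  # prime modulus from the slide

def toy_hash(name: str) -> int:
    """Sum of ASCII codes modulo N: short, fixed-size, irreversible."""
    return sum(ord(c) for c in name) % N

# Hypothetical database name search: index names by their hash bucket,
# so a lookup only inspects the bucket the query hashes to.
buckets = {}
for name in ["Alice", "Bob", "Carol"]:
    buckets.setdefault(toy_hash(name), []).append(name)

query = "Bob"
print(toy_hash(query), buckets.get(toy_hash(query), []))
```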
Perceptual Hash: Desirable Properties
• Perceptual robustness
• Fragility to distinct inputs
• Randomization: necessary in security applications to minimize vulnerability against malicious attacks
Hashing Framework
[Block diagram: Input Image → Feature Vector Extraction → Visually Robust Feature Vector → Compress (or cluster) Feature Vectors → Final Hash]
• Two-stage hash algorithm
• Goal: retain perceptual significance. Let (li, lj) denote vectors in the metric space V of feature vectors, with distance D(·,·) and 0 < ε < δ; then it is desired that vectors with D(li, lj) < ε map to the same cluster (same hash) and vectors with D(li, lj) > δ map to different clusters
• Minimizing the average distance between clusters is inappropriate
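A short sketch of the requirement above as a checkable predicate, assuming feature vectors are tuples, a Euclidean distance D, and a hypothetical dict cluster_of mapping each vector to its cluster label.

```python
import itertools
import math

def distance(u, v):
    """Euclidean distance; any metric D on the feature space V could be used."""
    return math.dist(u, v)

def satisfies_hash_property(vectors, cluster_of, eps, delta):
    """Check the desired property: vectors closer than eps share a cluster,
    vectors farther than delta fall in different clusters (eps < delta)."""
    for u, v in itertools.combinations(vectors, 2):
        d = distance(u, v)
        same = cluster_of[u] == cluster_of[v]
        if d < eps and not same:
            return False          # similar vectors split apart
        if d > delta and same:
            return False          # distinct vectors clustered together
    return True
```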
Cost Function for Feature Vector Compression
• Define joint cost matrices C1 and C2 (n × n), where n is the total number of vectors to be clustered and C(li), C(lj) denote the clusters that these vectors are mapped to
• Exponential cost: ensures a severe penalty when feature vectors that are far apart ("perceptually distinct") are clustered together
• α > 0, Γ > 1 are algorithm parameters
Cost Function for Feature Vector Compression
• Define S1 as the total cost accumulated by C1 over all vector pairs; S2 is defined similarly from C2
• Normalize to get the normalized costs S̃1 and S̃2
• Then, minimize the "expected" cost, with each pair weighted by its probability mass: p(i) = p(li), p(j) = p(lj)
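A minimal sketch of the expected-cost computation. The slides do not reproduce the penalty expressions, so the forms below (C1 charged when vectors farther than δ share a cluster, C2 when vectors closer than ε are split, both exponential with hypothetical parameters α and Γ) are assumptions chosen to match their qualitative description.

```python
import math

def expected_costs(vectors, labels, p, eps, delta, alpha=1.0, gamma=2.0):
    """Accumulate the two clustering costs, probability-weighted.

    Assumed penalty forms (hypothetical, matching the slides' description):
      C1(i, j) = alpha * gamma**D(li, lj)    if D > delta and same cluster
                 (perceptually distinct vectors clustered together; grows with D)
      C2(i, j) = alpha * gamma**(-D(li, lj)) if D < eps and different clusters
                 (perceptually similar vectors separated; worse the closer they are)
    Each term is weighted by p(i) * p(j) to form the "expected" costs S1 and S2.
    """
    s1 = s2 = 0.0
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(vectors[i], vectors[j])
            same = labels[i] == labels[j]
            weight = p[i] * p[j]
            if d > delta and same:
                s1 += weight * alpha * gamma ** d
            elif d < eps and not same:
                s2 += weight * alpha * gamma ** (-d)
    return s1, s2
```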
Basic Clustering Algorithm
1. Obtain ε, δ; set k = 1. Select the data point associated with the highest probability mass and label it l1
2. Form the first cluster by including all unclustered points lj such that D(l1, lj) < ε/2
3. k = k + 1. Select the highest-probability data point lk among the unclustered points such that D(lk, S) ≥ 3ε/2 for every cluster S in C, the set of clusters formed up to this step (with the ε/2 inclusion radius, this keeps clusters at least ε apart)
4. Form the k-th cluster Sk by including all unclustered points lj such that D(lk, lj) < ε/2
5. Repeat steps 3-4 until no more clusters can be formed
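A sketch of the basic algorithm under two assumptions: Euclidean distance, and point-to-cluster distance taken as the minimum over cluster members. Leftover points are returned for the assignment approaches that follow.

```python
import math

def point_to_cluster(x, cluster, vectors):
    """Distance from point index x to a cluster = min distance to its members."""
    return min(math.dist(vectors[x], vectors[j]) for j in cluster)

def basic_clustering(vectors, prob, eps):
    """Greedy clustering from the slides: highest-probability points seed
    clusters of radius eps/2, kept at least eps apart from existing clusters."""
    unclustered = set(range(len(vectors)))
    clusters = []
    while unclustered:
        # Candidate seeds: far enough (>= 3*eps/2) from every existing cluster.
        candidates = [i for i in unclustered
                      if all(point_to_cluster(i, c, vectors) >= 1.5 * eps
                             for c in clusters)]
        if not candidates:
            break  # no more clusters can be formed (step 5)
        seed = max(candidates, key=lambda i: prob[i])                   # steps 1 / 3
        members = {j for j in unclustered
                   if math.dist(vectors[seed], vectors[j]) < eps / 2}   # steps 2 / 4
        members.add(seed)
        clusters.append(members)
        unclustered -= members
    return clusters, unclustered  # leftovers are handled by Approach 1 or 2
```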
Observations
• For any (li, lj) in cluster Sk, D(li, lj) < ε (both lie within ε/2 of lk)
• No errors up to this stage of the algorithm
  – Each cluster is at least ε away from any other cluster
  – Within each cluster, the maximum distance between any two points is at most ε
Approach 1
1. Select the data point l* among the unclustered data points that has the highest probability mass
2. For each existing cluster Si, i = 1, 2, …, k, compute di = D(l*, Si) and let S(δ) = {Si such that di ≤ δ}
3. IF S(δ) = ∅ THEN k = k + 1 and Sk = {l*} is a cluster of its own
   ELSE for each Si in S(δ) define the cost F(Si) over S̄i, the complement of Si, i.e. all clusters in S(δ) except Si; l* is then assigned to the cluster S* = arg min F(Si)
4. Repeat steps 1 through 3 until all data points are exhausted
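A sketch of the Approach 1 assignment step, continuing the basic-algorithm sketch above. The slide does not reproduce F(Si), so split_cost below (a C2-type penalty for keeping l* away from nearby points, with hypothetical parameters α and Γ) is an assumed stand-in.

```python
import math

def split_cost(x, others, vectors, p, eps, alpha=1.0, gamma=2.0):
    """Assumed C2-type penalty: separating point x from members of the clusters
    in `others` that lie closer than eps (larger the closer they are)."""
    return sum(p[x] * p[j] * alpha * gamma ** (-math.dist(vectors[x], vectors[j]))
               for cluster in others for j in cluster
               if math.dist(vectors[x], vectors[j]) < eps)

def assign_approach1(vectors, prob, clusters, unclustered, eps, delta):
    """Visit leftover points in decreasing probability; a point with no cluster
    within delta starts its own cluster, otherwise it joins the candidate whose
    assignment leaves the smallest penalty over the remaining nearby clusters."""
    for x in sorted(unclustered, key=lambda i: prob[i], reverse=True):
        near = [c for c in clusters
                if min(math.dist(vectors[x], vectors[j]) for j in c) <= delta]
        if not near:
            clusters.append({x})   # S(delta) empty: l* becomes its own cluster
            continue
        best = min(near, key=lambda c: split_cost(
            x, [o for o in near if o is not c], vectors, prob, eps))
        best.add(x)                # S* = arg min F(Si)
    return clusters
```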
Approach 2
1. Select the data point l* among the unclustered data points that has the highest probability mass
2. For each existing cluster Si, i = 1, 2, …, k, define the cost F(Si) with weighting parameter β ∈ [1/2, 1], where S̄i denotes the complement of Si, i.e. all existing clusters except Si; l* is then assigned to the cluster S* = arg min F(Si)
3. Repeat steps 1 and 2 until all data points are exhausted
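A matching sketch for Approach 2, reusing split_cost from the Approach 1 sketch. The blended form of F(Si) below, with join_cost as a hypothetical C1-type penalty, is an assumption: β = 1/2 weighs both penalties equally, β = 1 considers only the joining penalty.

```python
import math

def join_cost(x, cluster, vectors, p, delta, alpha=1.0, gamma=2.0):
    """Assumed C1-type penalty: clustering point x with members farther than delta."""
    return sum(p[x] * p[j] * alpha * gamma ** math.dist(vectors[x], vectors[j])
               for j in cluster if math.dist(vectors[x], vectors[j]) > delta)

def assign_approach2(vectors, prob, clusters, unclustered, eps, delta, beta=0.5):
    """Every leftover point joins the existing cluster minimizing the blended cost
    F(Si) = beta * join_cost(Si) + (1 - beta) * split_cost(complement of Si)."""
    for x in sorted(unclustered, key=lambda i: prob[i], reverse=True):
        best = min(clusters, key=lambda c:
                   beta * join_cost(x, c, vectors, prob, delta)
                   + (1 - beta) * split_cost(
                       x, [o for o in clusters if o is not c], vectors, prob, eps))
        best.add(x)
    return clusters
```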
Summary
• Approach 1: tries to minimize S̃2 conditioned on S̃1 = 0
• Approach 2: smoothly trades off the minimization of S̃1 vs. S̃2 via the parameter β
  – β = 1/2: joint minimization
  – β = 1: exclusive minimization of S̃1
• Final hash length determined automatically! Given by ⌈log2 k⌉ bits, where k is the number of clusters formed
• Proposed clustering can compress feature vectors in any metric space, e.g. Euclidean, Hamming, and Levenshtein
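A one-line illustration of the automatic hash length, assuming the missing expression is the standard count of bits needed to index k clusters.

```python
import math

def hash_length_bits(k: int) -> int:
    """Bits needed to index k clusters, i.e. the final hash length."""
    return math.ceil(math.log2(k))
```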
Clustering Results
• Compress binary feature vectors of L = 240 bits: final hash length = 46 bits with Approach 2, β = 1/2
• Value of the cost function is orders of magnitude lower for the proposed clustering
Conclusion & Future Work
• Two-stage framework for image hashing
  – Feature extraction followed by feature vector compression
  – Second stage is media independent
• Clustering algorithms for compression
  – Novel cost function for hashing applications
  – Applicable to feature vectors in any metric space
  – Trade-offs facilitated between robustness and fragility
  – Final hash length determined automatically
• Future work
  – Randomized clustering for secure hashing
  – Information-theoretically secure hashing