Algorithmic Aspects of Finite Metric Spaces Moses Charikar Princeton University
Metric Space • A set of points X • Distance function d(x,y), d : X × X → [0, ∞) • d(x,y) = 0 iff x = y • d(x,y) = d(y,x) (symmetry) • d(x,z) ≤ d(x,y) + d(y,z) (triangle inequality) • Metric space M(X,d)
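To make the definition concrete, here is a minimal sketch (mine, not from the talk; the function name is illustrative) that checks the three axioms for a finite metric given as a distance matrix:

```python
# Illustrative check of the metric axioms for a finite metric space,
# represented as an n x n distance matrix.
import numpy as np

def is_metric(d: np.ndarray, tol: float = 1e-9) -> bool:
    n = d.shape[0]
    if not np.allclose(np.diag(d), 0.0):             # d(x,x) = 0
        return False
    if np.any((d <= tol) & ~np.eye(n, dtype=bool)):  # d(x,y) > 0 for x != y
        return False
    if not np.allclose(d, d.T):                      # symmetry
        return False
    for y in range(n):                               # triangle inequality:
        if np.any(d > d[:, [y]] + d[[y], :] + tol):  # d(x,z) <= d(x,y)+d(y,z)
            return False
    return True

# Shortest-path distances on the path 0-1-2 form a metric.
print(is_metric(np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)))
```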
Example Metrics: Normed spaces • x = (x1, x2, …, xd), y = (y1, y2, …, yd) • ℓp norm: ‖x − y‖p = (Σi |xi − yi|^p)^(1/p) • Special cases: ℓ1, ℓ2 (Euclidean), ℓ∞ • ℓp^d : ℓp norm in R^d • Hamming cube {0,1}^d
Example Metrics: domain specific • Shortest path distances on graph • Symmetric difference on sets • Edit distance on strings • Hausdorff distance, Earth Mover Distance on sets of n points
Metric Embeddings • General idea: map complex metrics to simple metrics • Why? Richer algorithmic toolkit for simple metrics • Simple metrics: • normed spaces ℓp • low dimensional normed spaces ℓp^d • tree metrics • Mapping should not change distances much (low distortion)
Low Distortion Embeddings • Metric spaces (X1,d1) & (X2,d2): an embedding f : X1 → X2 has distortion D if the ratio of distances changes by at most D, i.e. (after uniform rescaling) for all x,y ∈ X1: d1(x,y) ≤ d2(f(x),f(y)) ≤ D·d1(x,y)
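As an illustration (mine, not the speaker's), distortion can be computed directly from this definition for two finite metrics given as distance matrices:

```python
# Illustrative helper: f maps point i of (X1,d1) to point f[i] of (X2,d2).
import itertools
import numpy as np

def distortion(d1: np.ndarray, d2: np.ndarray, f: list) -> float:
    # Ratio d2(f(x),f(y)) / d1(x,y) over all pairs; after the best uniform
    # rescaling, the distortion is (max ratio) / (min ratio).
    ratios = [d2[f[x], f[y]] / d1[x, y]
              for x, y in itertools.combinations(range(len(f)), 2)]
    return max(ratios) / min(ratios)

d = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
print(distortion(d, d, [0, 1, 2]))   # identity map: distortion 1.0
```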
Applications • High dimensional → Low dimensional (Dimension reduction) • Algorithmic efficiency (running time) • Compact representation (storage space) • Streaming algorithms: solve problems on very large data sets in one pass using a very small amount of storage • Specific metrics → normed spaces • Nearest neighbor search • Optimization problems • General metrics → tree metrics • Optimization problems, online algorithms
A (very) Brief History: fundamental results • Metric spaces studied in functional analysis • n point metric embeds into ℓ∞^n with no distortion [Fréchet] • n point metric embeds into ℓp with distortion O(log n) [Bourgain ’85] • Dimension reduction to O((log n)/ε²) dimensions for n point Euclidean metrics with distortion 1+ε [Johnson, Lindenstrauss ’84]
A (very) Brief History: applications in Computer Science • Optimization problems • Application to graph partitioning [Linial, London, Rabinovich ’95] [Arora, Rao, Vazirani ’04] • n point metrics into tree metrics [Bartal ’96, ’98] [FRT ’03] • Efficient algorithms • Dimension reduction • Nearest neighbor search, streaming algorithms
Outline • Metric as data: dimension reduction, streaming data model, compact representation • Metric as model: finite metrics in optimization (graph partitioning and clustering) • Embedding theorems for finite metrics
Disclaimer • This is not an attempt at a survey • Biased by my own interests • Much more relevant and related work than I can do justice to in limited time • Goal: give a glimpse of different applications of finite metric spaces • Core ideas, no messy details
Disclaimer: Community Bias • Theoretical viewpoint • Focus on algorithmic techniques with performance guarantees • Worst case guarantees
Outline • Metric as data: dimension reduction, streaming data model, compact representation • Metric as model: finite metrics in optimization (graph partitioning and clustering) • Embedding theorems for finite metrics
Metric as data • What is the data? • Mathematical representation of objects (e.g. documents, images, customer profiles, queries) • Sets, vectors, points in Euclidean space, points in a metric space, vertices of a graph • Metric is part of the data
Johnson Lindenstrauss [JL84] • n points in Euclidean space (ℓ2 norm) can be mapped down to O((log n)/ε²) dimensions with distortion at most 1+ε • Quite simple proofs [JL84, FM88, IM98, AV99, DG99, Ach01] • Project onto random unit vectors • projection of (u−v) onto one random vector behaves like a Gaussian scaled by ‖u−v‖2 • Need O(log n) dimensions for tight concentration bounds • Even a random {−1,+1} vector works…
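A minimal sketch of such a projection (the constant 4 in the target dimension is illustrative; the paper's constants differ):

```python
# JL-style random projection: linear and oblivious.
import numpy as np

def jl_project(points: np.ndarray, eps: float, seed: int = 0) -> np.ndarray:
    n, d = points.shape
    k = int(np.ceil(4 * np.log(n) / eps ** 2))    # k = O(log(n) / eps^2)
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, k)) / np.sqrt(k)  # random Gaussian matrix
    return points @ R   # w.h.p. all pairwise l2 distances preserved to 1+eps
```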
Dimension reduction for ℓ2 • Two interesting properties: • Linear mapping • Oblivious: choice of linear mapping does not depend on the point set • Many applications… • Making high dimensional problems tractable • Streaming algorithms • Learning mixtures of Gaussians [Dasgupta ’99] • Learning robust concepts [Arriaga,Vempala ’99] [Klivans,Servedio ’04]
Dimension reduction for ℓ1 • [C,Sahai ‘02] Linear embeddings are not good for dimension reduction in ℓ1 • There exist n points in ℓ1 in n dimensions such that any linear mapping with distortion δ needs n/δ² dimensions
Dimension reduction for ℓ1 • [C, Brinkman ‘03] Strong lower bounds for dimension reduction in ℓ1 • There exist n points in ℓ1 such that any embedding with constant distortion needs n^(1/2) dimensions • An alternative, simpler proof: [Lee, Naor ’03]
Outline • Metric as data: dimension reduction, streaming data model, compact representation • Metric as model: finite metrics in optimization (graph partitioning and clustering) • Embedding theorems for finite metrics • (Streaming: solve problems on very large data sets in one pass using a very small amount of storage)
Frequency Moments [Alon,Matias,Szegedy ‘99] • Data stream is a sequence of elements in [n] • ni : frequency of element i • Fk = Σi ni^k : kth frequency moment • F0 = number of distinct elements • F2 = skewness measure of the data stream • Goal: given a data stream, estimate Fk in one pass and sub-linear space
Estimating F2 • Consider a single counter c and a randomly chosen xi ∈ {+1, −1} for each i in [n] • On seeing element i, update c += xi • c = Σi ni·xi • Claim: E[c²] = Σi ni² = F2, Var[c²] ≤ 2(F2)² (4-wise independence suffices) • Average O(1/ε²) copies of this estimator to get a (1+ε) approximation
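A toy version of this estimator (a sketch under assumptions: the paper uses 4-wise independent signs, while this demo derives each sign from a hash of the element, which also keeps signs consistent across streams):

```python
# Toy AMS sketch for F2. All names are illustrative.
import hashlib

def _sign(seed: int, copy: int, item: str) -> int:
    digest = hashlib.blake2b(f"{seed}|{copy}|{item}".encode(),
                             digest_size=1).digest()[0]
    return 1 if digest & 1 else -1

class F2Sketch:
    def __init__(self, copies: int = 400, seed: int = 0):
        self.copies, self.seed = copies, seed
        self.counters = [0] * copies

    def update(self, item: str) -> None:         # on element i: c += x_i
        for j in range(self.copies):
            self.counters[j] += _sign(self.seed, j, item)

    def estimate(self) -> float:                 # average of c^2 estimates F2
        return sum(c * c for c in self.counters) / self.copies

s = F2Sketch()
for item in ["a", "a", "b"]:                     # n_a = 2, n_b = 1
    s.update(item)
print(s.estimate())                              # concentrates around F2 = 5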
Differences between data streams • ni : frequency of element i in stream 1 • mi : frequency of element i in stream 2 • Goal: measure Σi (ni − mi)² • F2 sketches are additive: Σi ni·xi − Σi mi·xi = Σi (ni − mi)·xi • Basically, dimension reduction in the ℓ2 norm • Very useful primitive, e.g. frequent items [C, Chen, Farach-Colton ’02]
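Reusing the toy F2Sketch above (same hypothetical names), additivity is immediate: with the same seed both sketches assign each element the same sign, so the counter difference is a sketch of the difference stream.

```python
# Requires the F2Sketch class from the previous snippet.
s1, s2 = F2Sketch(seed=7), F2Sketch(seed=7)
for item in ["a", "a", "b"]:                 # stream 1: n_a = 2, n_b = 1
    s1.update(item)
for item in ["a", "c"]:                      # stream 2: m_a = 1, m_c = 1
    s2.update(item)
diff = [x - y for x, y in zip(s1.counters, s2.counters)]
print(sum(c * c for c in diff) / len(diff))  # estimates 1 + 1 + 1 = 3
```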
Estimate ℓ1 norms? [Indyk ’00] • p-stable distribution: distribution of X over R such that Σi ni·xi is distributed as (Σi |ni|^p)^(1/p)·X when the xi are i.i.d. copies of X • Cauchy distribution, density c(x) ∝ 1/(1+x²), is 1-stable • Gaussian distribution is 2-stable • As before, c = Σi ni·xi • Cauchy does not have finite expectation! • Estimate the scale factor by taking the median
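A minimal sketch of the 1-stable estimator (for clarity the frequency vector is materialized; in a real stream each counter c_j is updated incrementally as elements arrive):

```python
# Cauchy (1-stable) sketch for the l1 norm. Names are illustrative.
import numpy as np

def l1_sketch(freq: np.ndarray, copies: int = 400, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    X = rng.standard_cauchy((copies, len(freq)))  # 1-stable projections
    return X @ freq                               # c_j = sum_i n_i * x_ij

def l1_estimate(sketch: np.ndarray) -> float:
    # The Cauchy has no finite mean, so averaging fails; instead take the
    # median of |c_j| (the median of |Cauchy| is exactly 1).
    return float(np.median(np.abs(sketch)))

n = np.array([3.0, -1.0, 2.0])
print(l1_estimate(l1_sketch(n)))                  # close to ||n||_1 = 6
```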
Outline • Metric as data: dimension reduction, streaming data model, compact representation • Metric as model: finite metrics in optimization (graph partitioning and clustering) • Embedding theorems for finite metrics
Similarity Preserving Hash Functions • Similarity function sim(x,y) • Family of hash functions F with a probability distribution such that Pr_{h∈F}[h(x) = h(y)] = sim(x,y)
Applications • Compact representation scheme for estimating similarity • Approximate nearest neighbor search [Indyk,Motwani ’98] [Kushilevitz,Ostrovsky,Rabani ‘98]
Estimating Set Similarity [Broder,Manasse,Glassman,Zweig,’97] [Broder,C,Frieze,Mitzenmacher,’98] • Collection of subsets of a universe U; sim(A,B) = |A∩B| / |A∪B| (Jaccard coefficient) • Min-wise hashing: pick a random permutation π of U and let h(A) = min π(A); then Pr[h(A) = h(B)] = sim(A,B)
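A minimal min-hashing sketch of this scheme (illustrative names; explicit permutations, so only practical for a small universe):

```python
# Min-hash for Jaccard: Pr[min pi(A) = min pi(B)] = |A & B| / |A | B|.
import random

def minhash_signature(s: set, perms: list) -> list:
    return [min(p[x] for x in s) for p in perms]  # one min-hash per permutation

def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

universe, rng, perms = list(range(100)), random.Random(0), []
for _ in range(200):                              # 200 random permutations
    order = universe[:]
    rng.shuffle(order)
    perms.append({x: r for r, x in enumerate(order)})

A, B = set(range(0, 60)), set(range(30, 90))      # Jaccard = 30/90 = 1/3
print(estimate_jaccard(minhash_signature(A, perms),
                       minhash_signature(B, perms)))
```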
Existence of SPH schemes [C ’02] • sim(x,y) admits an SPH scheme if there exists a family of hash functions F such that Pr_{h∈F}[h(x) = h(y)] = sim(x,y) • Theorem: if sim(x,y) admits an SPH scheme, then 1−sim(x,y) satisfies the triangle inequality and embeds into ℓ1 • Rounding procedures for LPs and SDPs yield similarity and distance preserving hashing schemes.
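One concrete scheme from [C ’02] is random-hyperplane hashing (the Goemans-Williamson rounding turned into a hash): h_r(x) = sign(⟨r, x⟩) with Gaussian r satisfies Pr[h_r(x) = h_r(y)] = 1 − θ(x,y)/π. A minimal sketch:

```python
# Random hyperplane hashing for the angular similarity 1 - theta/pi.
import numpy as np

def simhash(x: np.ndarray, R: np.ndarray) -> np.ndarray:
    return R @ x >= 0                  # one bit per random hyperplane

rng = np.random.default_rng(0)
R = rng.standard_normal((512, 3))      # 512 random hyperplanes in R^3
x, y = np.array([1.0, 0.0, 0.0]), np.array([1.0, 1.0, 0.0])
agreement = np.mean(simhash(x, R) == simhash(y, R))
print(agreement)                       # theta = pi/4, so approx 0.75
```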
Earth Mover Distance (EMD) • LP rounding algorithms for an optimization problem (metric labelling) yield an O(log n) approximate estimator for EMD on n points • Implies that EMD embeds into ℓ1 with distortion O(log n) • [figure: optimal flow between point sets P and Q realizing EMD(P,Q)]
Outline • Metric as data: dimension reduction, streaming data model, compact representation • Metric as model: finite metrics in optimization (graph partitioning and clustering) • Embedding theorems for finite metrics
Graph partitioning problems • Given a graph, partition the vertices into U, V • Maximum cut: maximize |E(U,V)| • Sparsest cut: minimize |E(U,V)| / (|U|·|V|)
Correlation clustering [Cohen,Richman,’02] [Bansal,Blum,Chawla,’02] • Pairs labelled Similar (+) or Dissimilar (−) • [figure: coreference example linking the mentions “Mr. Rumsfeld”, “his”, “The secretary”, “he”, “Saddam Hussein”; example courtesy Shuchi Chawla]
Graph partitioning as metric problem • Partitioning is equivalent to finding an appropriate {0,1} metric (the cut metric: d(u,v) = 1 if u and v are on different sides, 0 otherwise) • possibly with additional constraints • Objective function is linear in the metric • Find the best {0,1} (cut) metric → relaxation
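A tiny illustration (mine) of "objective linear in the metric": for the cut metric d(u,v) = |side(u) − side(v)|, the cut size is just a sum of metric values over the edges.

```python
# Cut size as a linear function of the cut metric.
def cut_size(edges, side):                          # side[v] in {0, 1}
    return sum(abs(side[u] - side[v]) for u, v in edges)

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(cut_size(edges, {0: 0, 1: 0, 2: 1, 3: 1}))    # 2 edges cross the cut
```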
Metric relaxation approaches • Max Cut [Goemans,Williamson ’94] • map vertices to points on the unit sphere (SDP) • exploit geometry to get a good solution (random hyperplane cut) • Sparsest Cut [Linial,London,Rabinovich ’95] • LP gives the best metric; need an ℓ1 metric • [Bourgain ’85] embeds any metric into ℓ1 with distortion O(log n) • the existential theorem can be made algorithmic • O(log n) approximation • recent SDP based O(√log n) approximation [Arora,Rao,Vazirani ’04]
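A minimal sketch of the random-hyperplane rounding step of [Goemans,Williamson ’94] (the SDP solve is omitted; the unit vectors v_i are assumed precomputed):

```python
# Rounding the Max Cut SDP: cut with a random hyperplane through the origin.
import numpy as np

def random_hyperplane_cut(vectors: np.ndarray, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    r = rng.standard_normal(vectors.shape[1])   # random normal direction
    return vectors @ r >= 0                     # side of hyperplane = side of cut

# Each edge (i,j) is cut with probability theta(v_i, v_j) / pi, which is at
# least 0.878 times its SDP contribution (1 - <v_i, v_j>) / 2.
```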
Metric relaxation approaches • Correlation clustering [C,Guruswami,Wirth,’03] [Emanuel,Fiat,’03] [Immorlica,Karger,’03] • Find the best [0,1] metric from similarity/dissimilarity data via LP • Use the metric to guide clustering • close points in the same cluster • distant points in different clusters • “Learning” the best metric? • Note: in many cases, the LP/SDP can be eliminated to yield efficient algorithms
Outline • Metric as data: dimension reduction, streaming data model, compact representation • Metric as model: finite metrics in optimization (graph partitioning and clustering) • Embedding theorems for finite metrics
Some connections to learning • Dimension reduction in ℓ2: • Learning mixtures of Gaussians [Dasgupta ’99]: random projections make skewed Gaussians more spherical, making learning easier • Learning with large margin [Arriaga,Vempala ’99] [Klivans,Servedio ’04]: random projections preserve the margin; large margin → few dimensions • Kernel methods for SVMs: mappings to ℓ2
Ongoing developments • Notion of intrinsic dimensionality of a metric space [Gupta,Krauthgamer,Lee,’03] [Krauthgamer,Lee,Mendel,Naor,’04] • Doubling dimension: how many balls of radius R are needed to cover a ball of radius 2R? • Complexity measure of a metric space • natural parameter for embeddings • Open: can every metric of constant doubling dimension in ℓ2 be embedded into ℓ2 with O(1) dimensions and O(1) distortion? • Not true for ℓ1 • related to learning low dimensional manifolds, PCA, MDS, LLE, Isomap
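For intuition, the doubling constant of a small finite metric can be bounded with a greedy cover (illustrative code, not from the talk; greedy gives an upper bound, not the exact minimum cover):

```python
# Rough upper bound on the doubling constant of a finite metric:
# greedily cover each ball of radius 2R by balls of radius R.
import numpy as np

def doubling_constant_upper_bound(d: np.ndarray) -> int:
    n, worst = d.shape[0], 1
    for x in range(n):
        for R in np.unique(d[d > 0]):
            uncovered = set(np.flatnonzero(d[x] <= 2 * R).tolist())  # B(x, 2R)
            used = 0
            while uncovered:                     # cover greedily by R-balls
                c = uncovered.pop()
                uncovered -= set(np.flatnonzero(d[c] <= R).tolist())
                used += 1
            worst = max(worst, used)
    return worst

d = np.array([[0, 1, 2], [1, 0, 1], [2, 1, 0]], dtype=float)
print(doubling_constant_upper_bound(d))          # small constant for a path
```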
Some things I didn’t mention • Approximating general metrics via tree metrics • modified notion of distortion • useful for approximation, online algorithms • Many mathematically appealing questions • Embeddings between normed spaces • Spectral methods for approximating matrices (SVD, LSI) • PCA, MDS, LLE, Isomap
Conclusions • Whirlwind tour of finite metrics • Rich algorithmic toolkit for finite metric spaces • Synergy between Computer Science and Mathematics • Exciting area of active research, ranging from practical applications to deep theoretical questions • Many more applications to be discovered