Introduction to Fractals and Fractal Dimension • Christos Faloutsos
Intro to Fractals - Outline • Motivation – 3 problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples • Discussion - putting fractals to work! • Conclusions – practitioner’s guide • Appendix: gory details - boxcounting plots Copyright: C. Faloutsos (2001)
Problem #1: GIS - points • Road end-points of Montgomery county: • Q : distribution? • not uniform • not Gaussian • ??? • no rules/pattern at all!?
Problem #2: spatial data mining (d.m.) • Galaxies (Sloan Digital Sky Survey, w/ B. Nichol) • ‘spiral’ and ‘elliptical’ galaxies (or, analogously, stores and households ...) • patterns? • attraction/repulsion? • how many ‘spi’ within r from an ‘ell’?
Problem #3: traffic • disk trace (from HP - J. Wilkes); Web traffic • how many explosions to expect? • queue length distr.? • [Plot: #bytes vs. time; one curve labeled ‘Poisson’]
Common answer for all three: • Fractals / self-similarities / power laws • Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)
Road map • Motivation – 3 problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples and tools • Discussion - putting fractals to work! • Conclusions – practitioner’s guide • Appendix: gory details - boxcounting plots
Definitions (cont’d) • Simple 2-D figure has finite perimeter and finite area • area ≈ perimeter^2 • Do all 2-D figures have finite perimeter, finite area, and area ≈ perimeter^2?
What is a fractal? = self-similar point set, e.g., Sierpinski triangle: zero area; infinite length! ...
Definitions (cont’d) • Paradox: Infinite perimeter ; Zero area! • ‘dimensionality’: between 1 and 2 • actually: Log(3)/Log(2) = 1.58…
Definitions (cont’d) • “self similar” • pieces look like whole independent of scale • “dimension” dim = log(# self-similar pieces) / log(scale) • dim(line) = log(2)/log(2) = 1 • dim(square) = log(4)/log(2) = 2 • dim(cube) = log(8)/log(2) = 3
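As a quick check of the formula above, here is a minimal Python sketch. The function name `similarity_dimension` and its parameter names are illustrative and do not come from the slides.

```python
import math


def similarity_dimension(pieces: int, scale: float) -> float:
    """dim = log(# self-similar pieces) / log(scale).

    `pieces` is how many smaller copies make up the whole; `scale` is the
    factor by which each copy is shrunk (hypothetical parameter names).
    """
    if pieces < 1 or scale <= 1:
        raise ValueError("need pieces >= 1 and scale > 1")
    return math.log(pieces) / math.log(scale)


if __name__ == "__main__":
    shapes = {
        "line": (2, 2),
        "square": (4, 2),
        "cube": (8, 2),
        "Sierpinski triangle": (3, 2),
    }
    for name, (pieces, scale) in shapes.items():
        print(f"{name}: {similarity_dimension(pieces, scale):.2f}")
```

Halving the side of a Sierpinski triangle yields 3 copies rather than the 4 a filled triangle would need, which is where the non-integer value comes from.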
Intrinsic (‘fractal’) dimension • Q: fractal dimension of a line? • A: 1 (= log(2)/log(2))
Intrinsic (‘fractal’) dimension • Q: dfn for a given set of points? e.g., (x, y) = (5, 1), (4, 2), (3, 3), (2, 4)
Intrinsic (‘fractal’) dimension • Q: fractal dimension of a line? • A: nn(<= r) ~ r^1 (‘power law’: y = x^a) • Q: fd of a plane? • A: nn(<= r) ~ r^2 • fd = slope of (log(nn) vs. log(r))
Intrinsic (‘fractal’) dimension • Algorithm, to estimate it? • Notice: avg nn(<= r) is exactly tot#pairs(<= r) / N when tot#pairs includes ‘mirror’ pairs, i.e., counts both (a, b) and (b, a); a constant factor does not change the slope
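The pair-counting idea can be sketched in a few lines of Python. This is a brute-force O(N^2) illustration under stated assumptions, not the author's code: the helper names, the chaos-game sampler for the Sierpinski triangle, and the choice of radii are all made up for the example. It counts unordered pairs, which differs from the average neighbor count only by a constant factor and therefore leaves the log-log slope unchanged.

```python
import bisect
import itertools
import math
import random


def sierpinski_points(n, seed=0):
    """Sample n points of a right-angled Sierpinski triangle (chaos game)."""
    rng = random.Random(seed)
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    x, y = 0.0, 0.0  # start on a vertex, which belongs to the set
    points = []
    for _ in range(n):
        vx, vy = rng.choice(vertices)
        x, y = (x + vx) / 2, (y + vy) / 2  # jump halfway to a random vertex
        points.append((x, y))
    return points


def pair_counts(points, radii):
    """For each r in radii, count unordered pairs at distance <= r."""
    dists = sorted(math.dist(p, q) for p, q in itertools.combinations(points, 2))
    return [bisect.bisect_right(dists, r) for r in radii]


def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def correlation_dimension(points, radii):
    """Slope of log(#pairs within <= r) vs. log(r); empty radii are skipped."""
    pairs = [(r, c) for r, c in zip(radii, pair_counts(points, radii)) if c > 0]
    return fit_slope([math.log(r) for r, _ in pairs],
                     [math.log(c) for _, c in pairs])


if __name__ == "__main__":
    radii = [0.01 * 2 ** k for k in range(5)]
    rng = random.Random(1)
    line = [(t, t) for t in (rng.random() for _ in range(1000))]
    print("line:", round(correlation_dimension(line, radii), 2))
    print("Sierpinski:", round(correlation_dimension(sierpinski_points(1000), radii), 2))
```

The estimate depends on the range of radii: radii comparable to the size of the whole set, or so small that few pairs qualify, bend the plot away from a straight line.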
Dfn of fd: ONLY for a perfectly self-similar point set, e.g., Sierpinski triangle (zero area; infinite length!): fd = log(n)/log(f) = log(3)/log(2) = 1.58, with n = # of self-similar pieces and f = scale factor
Sierpinski triangle • [Plot: log(#pairs within <= r) vs. log(r); slope 1.58] • == ‘correlation integral’
Observations: • Euclidean objects have integer fractal dimensions • point: 0 • lines and smooth curves: 1 • smooth surfaces: 2 • fractal dimension -> roughness of the periphery
Important properties • fd = embedding dimension -> uniform point set • a point set may have several fd, depending on scale
Road map • Motivation – 3 problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples and tools • Discussion - putting fractals to work! • Conclusions – practitioner’s guide • Appendix: gory details - boxcounting plots
Problem #1: GIS points • Cross-roads of Montgomery county: • any rules?
Solution #1 • A: self-similarity <=> fractals <=> scale-free <=> power-laws (y=x^a, F=C*r^(-2)) • avg#neighbors(<= r) ~ r^D • [Plot: log(#pairs within <= r) vs. log(r); slope 1.51]
Solution #1 • A: self-similarity • avg#neighbors(<= r) ~ r^(1.51) • [Plot: log(#pairs within <= r) vs. log(r); slope 1.51]
Examples: MG county • Montgomery County of MD (road end-points)
Examples: LB county • Long Beach county of CA (road end-points)
Solution #2: spatial d.m. • Galaxies (‘BOPS’ plot - [sigmod2000]) • [Plot: log(#pairs) vs. log(r)]
Solution #2: spatial d.m. • 1.8 slope • plateau! • repulsion! • [Plot: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell]
spatial d.m. • Heuristic on choosing # of clusters • [Figure labels: r1, r2]
spatial d.m. • 1.8 slope • plateau! • repulsion!! • [Plot: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell; annotation: ‘duplicates’]
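The question "how many ‘spi’ within r from an ‘ell’?" amounts to counting cross pairs between two point sets. Below is a minimal brute-force sketch of that count. The function name `cross_pair_counts` and the toy coordinates are assumptions for illustration only; they are not survey data, and this is not the implementation behind the plots.

```python
import bisect
import math


def cross_pair_counts(set_a, set_b, radii):
    """For each r, count pairs (a, b), a in set_a and b in set_b, with dist <= r."""
    dists = sorted(math.dist(a, b) for a in set_a for b in set_b)
    return [bisect.bisect_right(dists, r) for r in radii]


if __name__ == "__main__":
    # Toy coordinates, made up for illustration.
    ell = [(0.0, 0.0), (10.0, 0.0)]
    spi = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (10.0, 1.0)]
    radii = [1.0, 2.0, 5.0]
    for r, count in zip(radii, cross_pair_counts(ell, spi, radii)):
        print(f"r = {r}: {count} spi-ell pairs")
```

Plotting log(count) against log(r) for ell-ell, spi-spi, and spi-ell pairs gives one curve per combination. Passing the same set twice counts each pair in both orders plus the zero-distance self pairs, so a self-plot needs a small adjustment.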
Solution #3: traffic • disk traces: self-similar • [Plot: #bytes vs. time]
Solution #3: traffic • disk traces (80-20 ‘law’ = ‘multifractal’) • [Plot: #bytes vs. time, with an 80% / 20% split marked]
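One way to see how an 80-20 ‘law’ produces bursty, self-similar traffic is a simple multiplicative cascade: split the total volume 80/20 between the two halves of the time interval, then repeat inside each half. The slides do not spell out a construction, so the sketch below is an assumption; the function name `biased_split` and its parameters are hypothetical.

```python
import math
import random


def biased_split(total, levels, bias=0.8, seed=0):
    """Spread `total` over 2**levels time slots by recursive biased splits.

    At every level each interval gives a fraction `bias` of its volume to one
    half (picked at random) and the remaining 1 - bias to the other half.
    """
    rng = random.Random(seed)
    volumes = [float(total)]
    for _ in range(levels):
        next_volumes = []
        for v in volumes:
            heavy, light = v * bias, v * (1 - bias)
            if rng.random() < 0.5:
                next_volumes.extend((heavy, light))
            else:
                next_volumes.extend((light, heavy))
        volumes = next_volumes
    return volumes


if __name__ == "__main__":
    trace = biased_split(total=1_000_000, levels=10)
    print("slots:", len(trace))
    print("largest slot share:", max(trace) / sum(trace))
    print("uniform share would be:", 1 / len(trace))
```

With bias = 0.5 the result is flat; any other value concentrates volume in a few slots, and the same uneven pattern shows up at every time scale.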
Solution #3: traffic • Clarification: • fractal: a set of points that is self-similar • multifractal: a probability density function that is self-similar • Many other time-sequences are bursty/clustered: (such as?)
Tape accesses • # tapes needed, to retrieve n records? • (# days down, due to failures / hurricanes / communication noise...) • [Figure: accesses over time, Tape #1 ... Tape #N]
Tape accesses • [Figure: accesses over time, Tape #1 ... Tape #N] • [Plot: # tapes retrieved vs. # qual. records; curves: ‘50-50 = Poisson’ and ‘real’]
Fast estimation of fd(s): • How, for the (correlation) fractal dimension? • A: Box-counting plot: log(sum(p_i^2)) vs. log(r) • [Figure: grid of cell side r; p_i marks the i-th cell]
Definitions • p_i: the percentage (or count) of points in the i-th cell • r: the side of the grid
Fast estimation of fd(s): • compute sum(p_i^2) for another grid side, r’ • [Figure: grid of cell side r’, with cell occupancies p_i’; plot of log(sum(p_i^2)) vs. log(r)]
Fast estimation of fd(s): • etc; if the resulting plot has a linear part, its slope is the correlation fractal dimension D2 • [Plot: log(sum(p_i^2)) vs. log(r)]
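The box-counting recipe for D2 can be sketched as follows: grid the points at several cell sides, compute the sum of squared occupancy fractions for each, and fit a line in log-log space. This is an illustration under assumptions, not the code referred to in the slides; the function names, the 2-D restriction, and the grid sides are chosen for the example.

```python
import math
import random
from collections import Counter


def sum_of_squared_occupancies(points, side):
    """Grid the plane with cells of the given side; return sum(p_i^2),
    where p_i is the fraction of points that fall in the i-th cell."""
    cells = Counter((math.floor(x / side), math.floor(y / side)) for x, y in points)
    n = len(points)
    return sum((count / n) ** 2 for count in cells.values())


def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def correlation_dimension_boxcount(points, sides):
    """Slope of log(sum(p_i^2)) vs. log(r) over the given grid sides."""
    xs = [math.log(side) for side in sides]
    ys = [math.log(sum_of_squared_occupancies(points, side)) for side in sides]
    return fit_slope(xs, ys)


if __name__ == "__main__":
    rng = random.Random(0)
    sides = [1 / 2, 1 / 4, 1 / 8, 1 / 16]
    square = [(rng.random(), rng.random()) for _ in range(20000)]
    diagonal = [(t, t) for t in (rng.random() for _ in range(20000))]
    print("uniform square:", round(correlation_dimension_boxcount(square, sides), 2))
    print("diagonal line:", round(correlation_dimension_boxcount(diagonal, sides), 2))
```

Each grid needs one pass over the points, so the cost is linear in N per grid side. The sketch fits all supplied sides; on real data the fit should be limited to the part of the plot that is actually linear.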
Hausdorff or box-counting fd: • Box-counting plot: log(N(r)) vs. log(r) • r: grid side • N(r): count of non-empty cells • (Hausdorff) fractal dimension D0: the negated slope of the linear part of the plot, i.e., N(r) ~ r^(-D0)
Definitions (cont’d) • Hausdorff fd: • [Plot: log(#non-empty cells) vs. log(r), for a grid of side r; the linear part gives D0]
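Counting non-empty cells is even simpler than the sum of squares. The sketch below estimates D0 for a Sierpinski triangle sampled with the chaos game. The helper names, the sampler, and the dyadic grid sides are assumptions made for the example, not part of the slides.

```python
import math
import random


def sierpinski_points(n, seed=0):
    """Sample n points of a right-angled Sierpinski triangle (chaos game)."""
    rng = random.Random(seed)
    vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    x, y = 0.0, 0.0  # start on a vertex, which belongs to the set
    points = []
    for _ in range(n):
        vx, vy = rng.choice(vertices)
        x, y = (x + vx) / 2, (y + vy) / 2  # jump halfway to a random vertex
        points.append((x, y))
    return points


def nonempty_cells(points, side):
    """N(r): number of grid cells of the given side holding at least one point."""
    return len({(math.floor(x / side), math.floor(y / side)) for x, y in points})


def fit_slope(xs, ys):
    """Least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def box_counting_dimension(points, sides):
    """D0 estimate: negated slope of log(N(r)) vs. log(r)."""
    xs = [math.log(side) for side in sides]
    ys = [math.log(nonempty_cells(points, side)) for side in sides]
    return -fit_slope(xs, ys)


if __name__ == "__main__":
    sides = [2.0 ** -k for k in range(1, 6)]
    points = sierpinski_points(20000)
    print("estimated D0:", round(box_counting_dimension(points, sides), 3))
    print("log(3)/log(2):", round(math.log(3) / math.log(2), 3))
```

The grid sides must stay coarse relative to the sample size: once cells become so small that most points sit alone, N(r) saturates at the number of points and the plot flattens.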
Observations • q=0: Hausdorff fractal dimension • q=2: Correlation fractal dimension (identical to the exponent of the number of neighbors vs radius) • q=1: Information fractal dimension
Observations, cont’d • in general, the Dq’s take similar, but not identical, values. • except for perfectly self-similar point-sets, where Dq=Dq’ for any q, q’
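The transcript does not reproduce a definition of Dq. For reference, a standard formulation of the generalized dimensions is given below; it is stated here as an assumption about what the slides intend. It uses the box-counting quantities from earlier: r is the grid side and p_i is the fraction of points in the i-th non-empty cell, with slopes taken over the linear part of the plot.

```latex
D_q = \frac{1}{q-1}\,
      \frac{\partial \log \sum_i p_i^{\,q}}{\partial \log r}
      \quad (q \neq 1),
\qquad
D_1 = \frac{\partial \sum_i p_i \log p_i}{\partial \log r}
```

Setting q = 0 turns the sum into the count of non-empty cells N(r) and gives D0 as the negated slope of log(N(r)) vs. log(r); setting q = 2 gives D2 as the slope of log(sum(p_i^2)) vs. log(r).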
Examples: MG county • Montgomery County of MD (road end-points)
Examples: LB county • Long Beach county of CA (road end-points)
Summary So Far • many fractal dimensions, with nearby values • can be computed quickly (O(N) or O(N log(N))) • (code: on the web)
Dimensionality Reduction with FD • Problem definition: ‘Feature selection’ • given N points, with E dimensions • keep the k most ‘informative’ dimensions [Traina+, SBBD’00]
Dim. reduction - w/ fractals • [Figure annotation: ‘not informative’]
Dim. reduction • Problem definition: ‘Feature selection’ • given N points, with E dimensions • keep the k most ‘informative’ dimensions • Re-phrased: spot and drop attributes with strong (non-)linear correlations • Q: how do we do that?