Fast Gauss Transform and Applications
Changjiang Yang, Ramani Duraiswami, Nail A. Gumerov, Larry Davis
Outline
• Introduction to fast Gauss transform
• A new multivariate Taylor expansion
• K-center algorithm for space subdivision
• Error bound analysis
• Kernel density estimation
• Mean-shift algorithm
• Experimental results
• Conclusions
Fast Gauss Transform (FGT)
• Originally proposed by Greengard and Strain (1991) to efficiently evaluate the weighted sum of Gaussians:
G(y_j) = \sum_{i=1}^N q_i e^{-\|y_j - x_i\|^2 / h^2}, \quad j = 1, \ldots, M.
• The FGT is an important variant of the Fast Multipole Method (Greengard & Rokhlin 1987, Greengard & Strain 1990).
• The FGT is a counterpart of the fast Fourier transform, except that the FFT requires data on a regular grid.
• Widely used in mathematical physics, computational chemistry, and mechanics. Recently applied to RBF interpolation, acoustic scattering problems, and computer vision (Elgammal et al. 2001).
Fast Gauss Transform (cont'd)
• The weighted sum of Gaussians is equivalent to the matrix-vector product G = Kq, where K_{ji} = e^{-\|y_j - x_i\|^2 / h^2}.
• Direct evaluation of the sum of Gaussians requires O(N^2) operations (sketched below).
• The fast Gauss transform reduces the cost to O(N log N) operations.
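For concreteness, a minimal Python/NumPy sketch of the direct evaluation that the FGT avoids (the function and array names are illustrative, not from the paper):

```python
import numpy as np

def direct_gauss_transform(sources, targets, weights, h):
    """Direct O(M*N) evaluation of G(y_j) = sum_i q_i exp(-||y_j - x_i||^2 / h^2)."""
    G = np.empty(len(targets))
    for j, y in enumerate(targets):
        d2 = np.sum((sources - y) ** 2, axis=1)   # squared distances to all sources
        G[j] = weights @ np.exp(-d2 / h ** 2)
    return G
```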
Fast Gauss Transform (cont'd)
Three key components:
• The factorization or separation of variables
• Space subdivision scheme
• Error bound analysis
Factorization of Gaussian
• Factorization is one of the key parts of the FGT.
• Factorization breaks the entanglement between the sources and targets.
• The contributions of the sources are accumulated at the centers, then distributed to each target.
Hermite Expansion in 1D
• The Gaussian kernel is factorized into Hermite functions:
e^{-(y_j - x_i)^2 / h^2} = \sum_{n=0}^{p-1} \frac{1}{n!} \left( \frac{x_i - x_*}{h} \right)^n h_n\!\left( \frac{y_j - x_*}{h} \right) + \epsilon(p),
where the Hermite function h_n(x) is defined by h_n(x) = (-1)^n \frac{d^n}{dx^n} e^{-x^2}.
• Exchange the order of the summations:
G(y_j) = \sum_{i=1}^N q_i e^{-(y_j - x_i)^2 / h^2} \approx \sum_{n=0}^{p-1} A_n h_n\!\left( \frac{y_j - x_*}{h} \right),
where A_n = \frac{1}{n!} \sum_{i=1}^N q_i \left( \frac{x_i - x_*}{h} \right)^n, a.k.a. the moment.
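A sketch of this 1D expansion under the definitions above (assuming a single expansion center and no error control; the Hermite functions are generated with the standard recurrence h_{n+1}(t) = 2t h_n(t) - 2n h_{n-1}(t)):

```python
import math
import numpy as np

def hermite_functions(t, p):
    """h_0..h_{p-1} at points t, via h_{n+1}(t) = 2t*h_n(t) - 2n*h_{n-1}(t)."""
    H = np.empty((p, len(t)))
    H[0] = np.exp(-t ** 2)
    if p > 1:
        H[1] = 2 * t * H[0]
    for n in range(1, p - 1):
        H[n + 1] = 2 * t * H[n] - 2 * n * H[n - 1]
    return H

def fgt_1d(sources, targets, weights, h, center, p):
    """Truncated Hermite expansion about `center` with p terms."""
    s = (sources - center) / h
    # Moments A_n, accumulated once over all sources.
    A = np.array([weights @ s ** n / math.factorial(n) for n in range(p)])
    # Distribute to targets: sum_n A_n h_n((y - center)/h).
    return A @ hermite_functions((targets - center) / h, p)
```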
Hermite Expansion in Higher Dimensions
• The higher-dimensional Hermite expansion is the product of 1D Hermite expansions along each dimension.
• The total number of terms is O(p^d), where p is the number of truncation terms.
• The number of operations in one factorization is O(p^d).
Space Subdivision in FGT
• The FGT subdivides the space into uniform boxes and assigns the source points and target points to boxes.
• For each box the FGT maintains a neighbor list.
• The number of boxes increases exponentially with the dimensionality.
FGT in Higher Dimensions
• The FGT was originally designed to solve problems in mathematical physics (heat equation, vortex methods, etc.), where the dimension is at most 3.
• The higher-dimensional Hermite expansion is the product of univariate Hermite expansions along each dimension; the total number of terms is O(p^d).
• The space subdivision scheme in the FGT uses uniform boxes; the number of boxes grows exponentially with the dimension.
• The exponential dependence on the dimension makes the FGT extremely inefficient in higher dimensions.
Multivariate Taylor Expansions
• The Taylor expansion of the Gaussian function (verified numerically below):
e^{-\|y - x\|^2 / h^2} = e^{-\|y - x_*\|^2 / h^2} \, e^{-\|x - x_*\|^2 / h^2} \, e^{2(y - x_*) \cdot (x - x_*) / h^2}.
• The first two terms can be computed independently.
• The Taylor expansion of the last term is:
e^{2(y - x_*) \cdot (x - x_*) / h^2} = \sum_{\alpha \ge 0} \frac{2^{|\alpha|}}{\alpha!} \left( \frac{x - x_*}{h} \right)^\alpha \left( \frac{y - x_*}{h} \right)^\alpha,
where \alpha = (\alpha_1, \ldots, \alpha_d) is a multi-index.
• The multivariate Taylor expansion about the center x_*:
G(y_j) \approx \sum_{|\alpha| < p} C_\alpha \, e^{-\|y_j - x_*\|^2 / h^2} \left( \frac{y_j - x_*}{h} \right)^\alpha,
where the coefficients C_\alpha are given by
C_\alpha = \frac{2^{|\alpha|}}{\alpha!} \sum_{i=1}^N q_i \, e^{-\|x_i - x_*\|^2 / h^2} \left( \frac{x_i - x_*}{h} \right)^\alpha.
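A quick numerical check of the factorization in the first bullet (h = 1 and random 3D points, purely illustrative; it is just the identity \|y-x\|^2 = \|y-x_*\|^2 + \|x-x_*\|^2 - 2(y-x_*)\cdot(x-x_*) inside the exponential):

```python
import numpy as np

rng = np.random.default_rng(1)
x, y, c = rng.random(3), rng.random(3), rng.random(3)   # source, target, center x*

lhs = np.exp(-np.sum((y - x) ** 2))
rhs = (np.exp(-np.sum((y - c) ** 2)) * np.exp(-np.sum((x - c) ** 2))
       * np.exp(2 * np.dot(y - c, x - c)))
assert np.isclose(lhs, rhs)   # the two sides agree to rounding error
```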
Reduction from Taylor Expansions
• The idea of the Taylor expansion of the Gaussian kernel was first introduced in the course CMSC878R by Ramani and Nail.
• The number of terms in the multivariate Hermite expansion is O(p^d).
• The number of terms in the multivariate Taylor expansion is \binom{p+d-1}{d}, asymptotically O(d^p), a big reduction for large d.
[Plots: number of terms vs. d for fixed p = 10, and vs. p for fixed d = 10.]
Monomial Orders
• Let \alpha = (\alpha_1, \ldots, \alpha_n) and \beta = (\beta_1, \ldots, \beta_n); three standard monomial orders:
• Lexicographic order, or "dictionary" order: \alpha \prec_{lex} \beta iff the leftmost nonzero entry of \alpha - \beta is negative.
• Graded lexicographic order: \alpha \prec_{grlex} \beta iff \sum_{1 \le i \le n} \alpha_i < \sum_{1 \le i \le n} \beta_i, or (\sum_{1 \le i \le n} \alpha_i = \sum_{1 \le i \le n} \beta_i and \alpha \prec_{lex} \beta).
• Graded reverse lexicographic order: \alpha \prec_{grevlex} \beta iff \sum_{1 \le i \le n} \alpha_i < \sum_{1 \le i \le n} \beta_i, or (\sum_{1 \le i \le n} \alpha_i = \sum_{1 \le i \le n} \beta_i and the rightmost nonzero entry of \alpha - \beta is positive).
• Example: let f(x,y,z) = xy^5z^2 + x^2y^3z^3 + x^3; then
• w.r.t. lex, f is: f(x,y,z) = x^3 + x^2y^3z^3 + xy^5z^2;
• w.r.t. grlex, f is: f(x,y,z) = x^2y^3z^3 + xy^5z^2 + x^3;
• w.r.t. grevlex, f is: f(x,y,z) = xy^5z^2 + x^2y^3z^3 + x^3.
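In code, grlex comparison is just a tuple comparison on (total degree, exponent vector). A small Python check against the example above (names are illustrative):

```python
def grlex_key(alpha):
    """Graded lex: compare total degree first, then lexicographically."""
    return (sum(alpha), alpha)

# Exponent vectors of f(x,y,z) = x*y^5*z^2 + x^2*y^3*z^3 + x^3:
terms = [(1, 5, 2), (2, 3, 3), (3, 0, 0)]
print(sorted(terms, key=grlex_key, reverse=True))
# -> [(2, 3, 3), (1, 5, 2), (3, 0, 0)], matching the grlex ordering of f above.
```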
Horner's Rule
• Horner's rule (W. G. Horner, 1819) recursively evaluates the polynomial p(x) = a_n x^n + \cdots + a_1 x + a_0 as p(x) = (\cdots((a_n x + a_{n-1})x + a_{n-2})x + \cdots)x + a_0. It costs n multiplications and n additions, with no extra storage.
• We evaluate the multivariate monomials iteratively in graded lexicographic order: each monomial of degree n is obtained from a stored monomial of degree n-1 by one multiplication, at the cost of one multiplication and one addition per term plus storage for the computed terms (see the sketch below).
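A sketch of this multivariate scheme (a straightforward, unoptimized reading of the idea, not the paper's code): every monomial of degree n reuses a stored degree-(n-1) value, so each of the \binom{p+d-1}{d} terms costs a single multiplication.

```python
from itertools import combinations_with_replacement

def monomials_graded(x, p):
    """Values of all monomials x^alpha with |alpha| <= p-1, each computed
    from a lower-degree monomial by a single multiplication."""
    d = len(x)
    values = {(0,) * d: 1.0}
    for degree in range(1, p):
        # combinations_with_replacement enumerates the degree-n monomials.
        for combo in combinations_with_replacement(range(d), degree):
            alpha = tuple(combo.count(i) for i in range(d))
            parent = list(alpha)
            parent[combo[0]] -= 1                                 # drop one variable...
            values[alpha] = values[tuple(parent)] * x[combo[0]]   # ...multiply it back
    return values

print(monomials_graded((2.0, 3.0), p=3))
# -> {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 3.0, (2, 0): 4.0, (1, 1): 6.0, (0, 2): 9.0}
```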
An Example of Taylor Expansion
• Suppose x = (x_1, x_2, x_3) and y = (y_1, y_2, y_3). The truncated Taylor expansion of e^{2x \cdot y} is evaluated as a product of a constant factor, a source factor accumulated over all N points, and a target factor, with the monomials computed in graded lexicographic order.
• Operation counts with the Horner-like scheme vs. direct evaluation: the constant factor costs 19 ops (direct: 29 ops), the accumulation over the sources costs 21N ops (direct: 31N ops), and each target evaluation costs 20 ops (direct: 30 ops).
Space Subdivision Scheme
• The space subdivision scheme in the original FGT is uniform boxes; the number of boxes grows exponentially with the dimensionality.
• The desirable space subdivision should adaptively fit the density of the points.
• The cells should be as compact as possible.
• The algorithm should preferably be progressive.
• Based on the above considerations, we model the task as a k-center problem.
K-center Algorithm
• The k-center problem seeks the "best" partition of a set of points into clusters (Gonzalez 1985, Hochbaum and Shmoys 1985, Feder and Greene 1988).
• The k-center problem is NP-hard, but there exists a simple 2-approximation algorithm.
• Given a set of points and a predefined number k, k-center clustering finds a partition S = S_1 \cup S_2 \cup \cdots \cup S_k that minimizes \max_{1 \le i \le k} \mathrm{radius}(S_i), where \mathrm{radius}(S_i) is the radius of the smallest disk that covers all points in S_i.
Farthest-Point Algorithm
• The farthest-point algorithm (a.k.a. the k-center algorithm) is a 2-approximation of the optimal solution (Gonzalez 1985).
• The total running time is O(kn), where n is the number of points. It can be reduced to O(n log k) using a slightly more complicated algorithm (Feder and Greene 1988).
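A compact sketch of the farthest-point heuristic (Python/NumPy; illustrative, with an arbitrary first center):

```python
import numpy as np

def farthest_point_clustering(points, k):
    """Gonzalez's 2-approximation: repeatedly promote the point farthest
    from the current centers. O(kn) distance computations."""
    centers = [0]                                   # arbitrary first center
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(k - 1):
        centers.append(int(np.argmax(dist)))        # farthest point -> new center
        new = np.linalg.norm(points - points[centers[-1]], axis=1)
        dist = np.minimum(dist, new)                # distance to nearest center
    # Max cluster radius is dist.max(); assign each point to its nearest center.
    labels = np.argmin(
        [np.linalg.norm(points - points[c], axis=1) for c in centers], axis=0)
    return centers, labels
```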
Results of k-center Algorithm
• The results of the k-center algorithm: 40000 points are divided into 64 clusters in 0.48 s on a 900 MHz PIII PC.
Error Bound of FGT
• The total error from the series truncation and the cutoff outside of the neighborhood of the targets is bounded by
|E(y_j)| \le Q \left( \frac{2^p}{p!} \left( \frac{r_x}{h} \right)^p \left( \frac{r_y}{h} \right)^p + e^{-r_y^2 / h^2} \right), \quad Q = \sum_{i=1}^N |q_i|,
where the first term is the truncation error and the second the cutoff error; r_x is the radius of the source clusters and r_y the cutoff radius around the targets.
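Taking this bound at face value, the truncation order p can be chosen adaptively at run time; a sketch (assuming the truncation term has exactly the form shown above, with the cutoff term handled separately):

```python
def pick_truncation_order(rx, ry, h, eps, p_max=60):
    """Smallest p whose truncation term (2^p / p!) (rx/h)^p (ry/h)^p < eps."""
    u = 2.0 * rx * ry / h ** 2
    term = 1.0                     # u^0 / 0!
    for p in range(1, p_max + 1):
        term *= u / p              # term = u^p / p! after this update
        if term < eps:
            return p
    return p_max
```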
Error Bound Analysis
• Increasing the number of truncation terms p decreases the error bound.
• As the k-center algorithm progresses, the radius of the source points r_x decreases, until the error bound falls below a given precision.
• The error bound first decreases, then increases, as a function of the cutoff radius r_y.
Kernel Density Estimation (KDE)
• Kernel density estimation (a.k.a. the Parzen window method; Rosenblatt 1956, Parzen 1962) is an important nonparametric technique.
• Given a set of observations {x_1, \ldots, x_n}, an estimate of the density function is
\hat{f}(x) = \frac{1}{n h^d} \sum_{i=1}^n K\!\left( \frac{x - x_i}{h} \right),
where K is the kernel function, h the bandwidth, and d the dimensionality.
• Some commonly used kernel functions: Epanechnikov, Triangular, Gaussian, Rectangular.
KDE and FGT
• In practice, the most widely used kernel is the Gaussian: K(u) = (2\pi)^{-d/2} e^{-\|u\|^2 / 2}.
• The density estimate using the Gaussian kernel:
\hat{f}(x) = \frac{1}{n (2\pi)^{d/2} h^d} \sum_{i=1}^n e^{-\|x - x_i\|^2 / 2h^2}.
• The computational requirement for large datasets is high, especially in higher dimensions. For N points, the computational cost is O(N^2).
• The fast Gauss transform can reduce the cost to O(N log N).
Mean-Shift Algorithm
• The mean-shift algorithm is a general clustering technique based on kernel density estimation. It was originally proposed by Fukunaga and Hostetler in 1975 and first applied to computer vision applications by Comaniciu and Meer.
• Given N data points, a kernel k(x) and a bandwidth h, for each data point x the mean-shift algorithm iteratively performs:
• Compute the mean-shift vector
m(x) = \frac{\sum_{i=1}^N x_i \, g(\|(x - x_i)/h\|^2)}{\sum_{i=1}^N g(\|(x - x_i)/h\|^2)} - x, \quad g = -k',
• Update the current position: x \leftarrow x + m(x).
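A minimal sketch of one mode-seeking run with the Gaussian kernel, for which g is again a Gaussian (Python/NumPy; the tolerance and iteration cap are arbitrary choices):

```python
import numpy as np

def mean_shift_point(x, data, h, tol=1e-6, max_iter=500):
    """Iterate x <- x + m(x) under a Gaussian kernel until m(x) is negligible."""
    for _ in range(max_iter):
        w = np.exp(-np.sum((data - x) ** 2, axis=1) / h ** 2)  # kernel weights
        m = w @ data / w.sum() - x                             # mean-shift vector
        x = x + m
        if np.linalg.norm(m) < tol:
            break
    return x   # a mode of the kernel density estimate
```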
Mean-Shift with FGT
• Each step of the mean-shift algorithm requires kernel density evaluations whose complexity is quadratic.
• The quadratic computational complexity can be reduced to linear order using the FGT.
• The mean-shift algorithm is also slow because the steepest-ascent method it uses is very inefficient, especially for complex density functions.
• Second-order information is necessary for the mean-shift algorithm to reach the nearest stationary point at a globally optimal convergence rate.
Mean-Shift with Quasi-Newton Methods
• An exact Newton step requires computation of the Hessian matrix of the density, which is computationally expensive.
• Quasi-Newton methods avoid this by using function and gradient values to approximate the Hessian (e.g., BFGS, applied to the negative of the density estimate):
d_k = -H_k \nabla f(x_k) (update direction),
H_{k+1} = (I - \rho_k s_k y_k^T) H_k (I - \rho_k y_k s_k^T) + \rho_k s_k s_k^T (update of the inverse-Hessian approximation),
where s_k = x_{k+1} - x_k, y_k = \nabla f(x_{k+1}) - \nabla f(x_k), and \rho_k = 1 / (y_k^T s_k).
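For reference, a sketch of the standard BFGS inverse-Hessian update (the generic textbook formula, not code from the paper):

```python
import numpy as np

def bfgs_update(H, s, y):
    """Update the inverse-Hessian approximation H from the step
    s = x_{k+1} - x_k and the gradient change y = g_{k+1} - g_k."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

# Next search direction for minimizing f: d = -H @ grad_f(x).
```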
Experimental Result (1) • The speedup of the fast Gauss transform in 4, 6, 8, 10 dimensions (h=1.0).
Experimental Result (2) • The comparison between the direct evaluation, the original fast Gauss transform, and our improved version of the fast Gauss transform (h=0.2). Relative error is under 2%.
Experimental Result (3)
• Image segmentation results of the mean-shift algorithm with the Gaussian kernel. Size: 432×294, time: 7.984 s; size: 481×321, time: 12.359 s.
Experimental Result (4) • The results of the mean-shift with BFGS algorithm
Experimental Result (5) • The results of the mean-shift with BFGS algorithm
Adaptive Bandwidth Selection
• KDE depends crucially on the choice of bandwidth.
• Data-driven bandwidth selection methods are preferred:
• Rules of thumb (Deheuvels 1977, Silverman 1986)
• Least-squares cross-validation (Bowman 1984, Rudemo 1982)
• Biased cross-validation (Scott and Terrell 1987)
• Solve-the-equation plug-in approach (Hall 1980, Sheather 1983, 1986)
• Smoothed bootstrap (Faraway and Jhun 1990, Taylor 1989)
• Adaptive bandwidth selection requires a local estimate of the point distribution.
• The k-center algorithm gives a rough estimate of the point distribution (similar to a histogram).
K-center and Adaptive Bandwidth Selection
• The objective of k-center is to find k spheres, as small as possible, that enclose all points.
• Each partition produced by the k-center algorithm can be used to estimate the bandwidth locally.
Conclusions
• The fast Gauss transform is very effective for speeding up the evaluation of weighted sums of Gaussians.
• The exponential dependence on the dimensionality makes the original FGT inefficient in higher dimensions.
• Three techniques are applied to speed up the FGT: a new Taylor expansion scheme, a pyramid-like data structure to store and manipulate the expansions, and an adaptive space subdivision technique based on the k-center algorithm.
• The FGT is applied to kernel density estimation, with the mean-shift algorithm as a case study.