Dimensionality Reduction, Part 2: Nonlinear Methods Comp 790-090, Spring 2007
Previously… • Linear Methods for Dimensionality Reduction • PCA: rotate data so that principal axes lie in the directions of maximum variance • MDS: find coordinates that best preserve pairwise distances
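For reference, a minimal numpy sketch of the two linear methods recapped above; the function names and this particular implementation are illustrative, not course code.

```python
import numpy as np

def pca_coords(X, d):
    """PCA: rotate mean-centered data onto its principal axes and keep the top d."""
    Xc = X - X.mean(axis=0)                    # mean-center (a PCA assumption)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T                       # coordinates along the top-d principal axes

def classical_mds(D, d):
    """MDS: find d-D coordinates whose Euclidean distances best match the matrix D."""
    N = D.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered squared distances
    w, V = np.linalg.eigh(B)                   # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:d]              # keep the d largest eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```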
Motivation • Linear dimensionality reduction doesn’t always work • Data violates the underlying “linear” assumptions • Data is not accurately modeled by “affine” combinations of measurements • Structure of the data, while apparent, is not simple • In the end, linear methods do nothing more than “globally transform” (rotate, translate, and scale) all of the data; sometimes what’s needed is to “unwrap” the data first
Stopgap Remedies • Local PCA • Compute PCA models for small overlapping item neighborhoods • Requires a clustering preprocess • Fast and simple, but results in no global parameterization • Neural Networks • Assumes a solution of a given dimension • Uses relaxation methods to deform given solution to find a better fit • Relaxation step is modeled as “layers” in a network where properties of future iterations are computed based on information from the current structure • Many successes, but a bit of an art
Why Linear Modeling Fails • Suppose that your sample data lies on some low-dimensional surface embedded within the high-dimensional measurement space • Linear models allow ALL affine combinations • Often, certain combinations are atypical of the actual data • Recognizing this is harder as dimensionality increases
What does PCA Really Model? • Principal Component Analysis assumptions • Mean-centered distribution • What if the mean itself is atypical? • Eigenvectors of Covariance • Basis vectors aligned with successive directions of greatest variance • Classic 1st-order statistical model • Distribution is characterized by its mean and variance (Gaussian hyperspheres)
Non-Linear Dimensionality Reduction • Non-linear Manifold Learning • Instead of preserving global pairwise distances, non-linear dimensionality reduction tries to preserve only the geometric properties of local neighborhoods • Discover a lower-dimensional “embedding” manifold • Find a parameterization over that manifold • Linear parameter space • Projection mapping from the original M-D space to the d-D embedding space (figure: “projection” maps into the linear embedding space; “reprojection,” “elevating,” or “lifting” maps back)
Nonlinear DimRedux Steps • Discover a low-dimensional embedding manifold • Find a parameterization over the manifold • Project data into parameter space • Analyze, interpolate, and compress in embedding space • Orient (by linear transformation) the parameter space to align axes with salient features • Linear (affine) combinations are valid here • In the case of interpolation and compression, use “lifting” to estimate the original M-D data
Nonlinear Methods • Local Linear Embeddings [Roweis 2000] • Isomaps [Tenenbaum 2000] • These two papers ignited the field • Principled approach (asymptotically, as the amount of data goes to infinity, they have been proven to find the “real” manifold) • Widely applied • Hotly contested
Local Linear Embeddings • First Insight • Locally, at a fine enough scale, everything looks linear
Local Linear Embeddings • First Insight • Find the affine combination of the “neighborhood” about a point that best approximates it
Finding a Good Neighborhood • This is the remaining “art” aspect of nonlinear methods • Common choices • ε-ball: find all items that lie within an epsilon ball of the target item as measured under some metric • Best if density of items is high and every point has a sufficient number of neighbors • K-nearest neighbors: find the k closest neighbors to a point under some metric • Guarantees all items are similarly represented; limits dimension to K-1
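A brute-force sketch of the two neighborhood choices above; the all-pairs distance computation and function names are illustrative assumptions, adequate only for small N.

```python
import numpy as np

def knn_neighbors(X, k):
    """Indices of the k nearest neighbors of every point (the point itself excluded)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # all pairwise distances
    return np.argsort(D, axis=1)[:, 1:k + 1]                    # column 0 is the point itself

def epsilon_neighbors(X, eps):
    """For each point, indices of all other points within an epsilon ball of radius eps."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return [np.flatnonzero((row > 0) & (row <= eps)) for row in D]
```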
Affine “Neighbor” Combinations • Within locally linear neighborhoods, each point can be considered as an affine combination of its neighbors • Imagine cutting out patches from the manifold and placing them in a lower dimension so that angles between points are preserved • Weights should still be valid in the lower-dimensional embedding space
Find Weights • Each point is approximated as an affine combination of its neighbors: $x_i \approx \sum_j W_{ij}\,x_j$, with $W_{ij} = 0$ whenever $x_j$ is not a neighbor of $x_i$ • Rewriting as a matrix equation for all points at once: $X \approx X\,W^{\top}$, where the unknown $W$ is $N \times N$ and the data matrix $X$ is $M \times N$ (one item per column) • Want to find the $W$ that minimizes $\varepsilon(W) = \sum_i \| x_i - \sum_j W_{ij} x_j \|^2$ and satisfies the “sum-to-one” constraint $\sum_j W_{ij} = 1$ • Ends up as a constrained “least-squares” problem
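A sketch of that constrained least-squares problem solved neighborhood by neighborhood, in the standard LLE form; the regularization term anticipates the “Numerical Issues” slide, and the function name and default value are my own.

```python
import numpy as np

def lle_weights(X, neighbors, reg=1e-3):
    """Solve for the N x N weight matrix W, one constrained least-squares problem per point.

    X         : (N, M) data, one item per row.
    neighbors : (N, k) indices of each item's neighborhood.
    """
    N, k = neighbors.shape
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[neighbors[i]] - X[i]             # neighbors shifted so x_i is the origin
        C = Z @ Z.T                            # local k x k covariance (Gram) matrix
        C += reg * np.trace(C) * np.eye(k)     # regularize if ill-conditioned (k > M)
        w = np.linalg.solve(C, np.ones(k))     # solve C w = 1
        W[i, neighbors[i]] = w / w.sum()       # enforce the sum-to-one constraint
    return W
```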
Find Linear Embedding Space • Now that we have the weight matrix W, find embedding coordinates $Y = [\,y_1 \cdots y_N\,]$ that best satisfy $Y \approx Y\,W^{\top}$, i.e. $Y\,(I - W)^{\top} \approx 0$, where $W$ is $N \times N$ and $Y$ is $d \times N$ • This can be found from the (approximate) null space of $(I - W)$ • Classic problem: run SVD on $(I - W)$ and take the right singular vectors associated with the smallest singular values; the very smallest singular value will be zero and represents the system’s invariance to translation, so it is discarded and the next $d$ vectors give the embedding
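One common way to carry out this step (a sketch, not necessarily the course's code): instead of an explicit SVD of (I − W), take the bottom eigenvectors of (I − W)ᵀ(I − W), which span the same space.

```python
import numpy as np

def lle_embedding(W, d):
    """d-D coordinates that the weights W reconstruct best.

    Eigenvectors of (I - W)^T (I - W) with the smallest eigenvalues; the very
    smallest (the constant vector, i.e. translation invariance) is discarded.
    """
    N = W.shape[0]
    A = np.eye(N) - W
    M = A.T @ A
    w, V = np.linalg.eigh(M)                   # eigenvalues in ascending order
    return V[:, 1:d + 1]                       # rows are the embedding coordinates y_i
```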
Numerical Issues • Numerical problems can arise in computing LLEs • The local covariance matrix that arises in the least-squares computation of the weight matrix W can be ill-conditioned • Regularization (add a small multiple of the identity to the covariance matrix, scaled to the measurements) • Finding small singular (eigen) values is not as well conditioned as finding large ones; the small ones are subject to numerical precision errors and tend to get mixed • Good (but slow) solvers exist; you have to use them
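As one illustration of using a more careful solver for the small eigenvalues, a sparse shift-invert ARPACK sketch; the tiny shift and the helper name are assumptions of mine, not from the slides.

```python
import numpy as np
from scipy.sparse import identity, csr_matrix
from scipy.sparse.linalg import eigsh

def bottom_eigenvectors(W, d, shift=-1e-6):
    """Target the smallest eigenvalues of (I - W)^T (I - W) directly.

    Shift-invert ARPACK is better conditioned than reading the tail of a dense
    SVD; the small negative shift keeps the factored matrix nonsingular, since
    the true smallest eigenvalue is exactly zero.
    """
    N = W.shape[0]
    IW = identity(N) - csr_matrix(W)
    M = (IW.T @ IW).tocsc()
    vals, vecs = eigsh(M, k=d + 1, sigma=shift, which='LM')  # eigenvalues nearest the shift
    order = np.argsort(vals)
    return vecs[:, order[1:]]                  # drop the near-zero translation mode
```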
Results • The resulting parameter vector, $y_i$, gives the low-dimensional coordinates associated with the item $x_i$ • The $d$-th embedding coordinate is formed from the right singular vector associated with the $d$-th smallest non-zero singular value of $(I - W)$
Reprojection • Often, for data analysis, a parameterization is enough • For interpolation and compression we might want to map points from the parameter space back to the “original” space • No perfect solution, but a few approximations • Delaunay-triangulate the points in the embedding space, find the triangle that the desired parameter setting falls into, compute its barycentric coordinates, and use them as weights • Interpolate by using a radially symmetric kernel centered about the desired parameter setting • Works, but mappings might not be one-to-one
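A sketch of the Delaunay/barycentric option using scipy.spatial.Delaunay; the function name, error handling, and per-call triangulation are my own simplifications.

```python
import numpy as np
from scipy.spatial import Delaunay

def lift(y, Y, X):
    """Map a parameter-space point y back toward the original space.

    Y : (N, d) embedding coordinates of the training items.
    X : (N, M) original high-dimensional items.
    """
    tri = Delaunay(Y)                          # triangulate the embedding points
    y = np.asarray(y, dtype=float)
    s = int(tri.find_simplex(y))               # simplex containing y
    if s < 0:
        raise ValueError("point lies outside the triangulated parameter region")
    T = tri.transform[s]                       # affine map to barycentric coordinates
    b = T[:-1] @ (y - T[-1])
    bary = np.append(b, 1.0 - b.sum())         # weights sum to one
    return bary @ X[tri.simplices[s]]          # weighted combination of original items
```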
LLE Example • 3-D S-Curve manifold with points color-coded • Compute a 2-D embedding • The local affine structure is well maintained • The metric structure is okay locally, but can drift slowly over the domain (this causes the manifold to taper)
LLE Failures • Does not work on closed manifolds • Cannot recognize topology
Isomap • An alternative non-linear dimensionality reduction method that extends MDS • Key Observation: On a manifold, distances are measured using geodesic distances rather than Euclidean distances (two points can have a small Euclidean distance but a large geodesic distance)
Problem: How to Get Geodesics • Without knowledge of the manifold it is difficult to compute the geodesic distance between points • It is even difficult if you know the manifold • Solution • Use a discrete geodesic approximation • Apply a graph algorithm to approximate the geodesic distances
Dijkstra’s Algorithm • Efficient solution to the single-source shortest-path problem; running it from every point yields all pairwise shortest paths • Greedy, breadth-first-style algorithm
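A minimal heap-based sketch of the greedy algorithm on an adjacency-list graph; any standard implementation or library routine would serve equally well.

```python
import heapq
import math

def dijkstra(graph, source):
    """Single-source shortest-path distances.

    graph : dict mapping node -> list of (neighbor, edge_weight) pairs.
    """
    dist = {source: 0.0}
    frontier = [(0.0, source)]                 # greedy frontier, closest node first
    while frontier:
        d, u = heapq.heappop(frontier)
        if d > dist.get(u, math.inf):
            continue                           # stale entry, already settled closer
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(frontier, (nd, v))
    return dist
```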
Isomap algorithm • Compute a fully-connected neighborhood of points for each item • Can be k nearest neighbors or an ε-ball • Neighborhoods must be symmetric • Test that the resulting graph is fully connected; if not, increase either K or ε • Calculate pairwise Euclidean distances within each neighborhood • Use Dijkstra’s Algorithm to compute the shortest path from each point to non-neighboring points • Run MDS on the resulting distance matrix
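Putting the steps together, a hedged end-to-end sketch of the pipeline (brute-force distances, scipy's Dijkstra routine for the geodesics, then classical MDS); the function and parameter names are mine.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra as csgraph_dijkstra

def isomap(X, k, d):
    """k-NN graph -> approximate geodesic distances -> classical MDS embedding."""
    N = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean distances
    G = np.zeros_like(D)
    for i in range(N):                          # weighted k-nearest-neighbor graph
        nbrs = np.argsort(D[i])[1:k + 1]
        G[i, nbrs] = D[i, nbrs]
    G = np.maximum(G, G.T)                      # make neighborhoods symmetric
    geo = csgraph_dijkstra(csr_matrix(G), directed=False)       # shortest-path geodesics
    if np.isinf(geo).any():
        raise ValueError("neighborhood graph is not connected; increase k")
    J = np.eye(N) - np.ones((N, N)) / N         # classical MDS on geodesic distances
    B = -0.5 * J @ (geo ** 2) @ J
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```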
Isomap Results • Find a 2D embedding of the 3D S-curve (also shown for LLE) • Isomap does a good job of preserving metric structure (not surprising) • The affine structure is also well preserved
Isomap Failures • Isomap also has problems on closed manifolds of arbitrary topology
Non-Linear Example • A Data-Driven Reflectance Model (Matusik et al., SIGGRAPH 2003) • Bidirectional Reflectance Distribution Functions (BRDFs) • Define the ratio of the reflected radiance in a particular outgoing direction to the incident irradiance from a particular incoming direction • Isotropic BRDF
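In standard radiometric notation (my notation, not copied from the slide), the ratio and its isotropic reduction are:

```latex
% BRDF: differential reflected radiance over differential incident irradiance
f_r(\omega_i, \omega_o)
  \;=\; \frac{\mathrm{d}L_o(\omega_o)}{\mathrm{d}E_i(\omega_i)}
  \;=\; \frac{\mathrm{d}L_o(\omega_o)}{L_i(\omega_i)\,\cos\theta_i\,\mathrm{d}\omega_i}

% Isotropic BRDF: invariant to rotation about the surface normal, so only the
% difference of azimuth angles matters
f_r \;=\; f_r(\theta_i,\ \theta_o,\ \phi_o - \phi_i)
```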
Measurement • Modeling Bidirectional Reflectance Distribution Functions (BRDFs)
Measurement • A “fast” BRDF measurement device inspired by Marschner [1998]
Measurement • 20-80 million reflectance measurements per material • Each tabulated BRDF entails 90 × 90 × 180 × 3 = 4,374,000 measurement bins
Rendering from Tabulated BRDFs • Even without further analysis, our BRDFs are immediately useful • Renderings made with Henrik Wann Jensen’s Dali renderer (materials shown: Nickel, Hematite, Gold Paint, Pink Felt)
BRDFs as Vectors in High-Dimensional Space • Each tabulated BRDF is a vector in 90 × 90 × 180 × 3 = 4,374,000-dimensional space (figure: the 90 × 90 × 180 table is “unrolled” into a single 4,374,000-entry vector)
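A tiny illustration of the unrolling; the bin counts match the slide, but the array contents and material count here are placeholders.

```python
import numpy as np

# Hypothetical tabulated isotropic BRDF: 90 x 90 x 180 angular bins, 3 color channels.
brdf_table = np.zeros((90, 90, 180, 3))        # stand-in for real measurements
brdf_vector = brdf_table.reshape(-1)           # "unroll" into one long vector
assert brdf_vector.size == 90 * 90 * 180 * 3   # 4,374,000 dimensions

# One unrolled vector per measured material gives the data matrix that the
# linear (PCA) and non-linear analyses operate on.
num_materials = 100                            # hypothetical count
data = np.zeros((num_materials, brdf_vector.size))
```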
Linear Analysis (PCA) • Find an optimal “linear basis” for our data set • 45 components are needed to reduce the residual to under the measurement error (plot: eigenvalue magnitude vs. dimension; reconstructions shown for the mean and for 5, 10, 20, 30, 45, 60, and all components)
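A sketch of how a statement like “45 components” can be computed: the smallest number of principal components whose relative reconstruction residual falls below a tolerance (the exact error metric used in the original analysis may differ).

```python
import numpy as np

def components_needed(data, tolerance):
    """Smallest k such that the relative RMS residual of a k-component PCA
    reconstruction falls below `tolerance` (e.g. the measurement error)."""
    Xc = data - data.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, descending
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    residual = np.sqrt(np.maximum(1.0 - energy, 0.0))  # residual left after k components
    for k, r in enumerate(residual, start=1):
        if r < tolerance:
            return k
    return len(s)
```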
Problems with Linear Subspace Modeling • Large number of basis vectors (45) • Some linear combinations yield invalid or unlikely BRDFs (combinations outside the convex hull of the data, and even some inside it)
Results of Non-Linear Manifold Learning • At 15 dimensions, reconstruction error is less than 1% • Parameter count is similar to that of analytical models (plot: error vs. dimensionality)
Non-Linear Advantages • 15-dimensional parameter space • More robust than the linear model • More extrapolations are plausible (figures: linear model extrapolation vs. non-linear model extrapolation)