Dimensionality Reduction June 2013 Computer Graphics Course
What is high dimensional data? Images Videos Documents Most data, actually!
How many dimensions? • Images – dimension 3·X·Y • This is the number of bytes in an uncompressed RGB image file • We can treat each byte as a dimension • Each image is a point in a high-dimensional space • Which space? • “the space of images of size X·Y”
How many dimensions? • But we can describe an image using fewer bytes! • “Blue sky, green grass, yellow road…” • “Drawing of a kung-fu rat”
Why do Dimensionality Reduction? • Visualization: Understanding the structure of data • Fewer dimensions make it easier to describe the data and to find correlations (rules) • Compression of data for efficiency • Clustering • Discovering similarities between elements
Why do Dimensionality Reduction? • Curse of dimensionality • 100000000000 • 010000000000 • 001000000000 • 000100000000 • … • All these vectors are the same Euclidean distance from each other (see the sketch below) • But some dimensions could be “worth more” • Can you work with 1,000 images of 1,000,000 dimensions?
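To see the equidistance claim concretely, here is a minimal numpy sketch using the 12-dimensional one-hot vectors from the slide:

```python
# Every pair of one-hot vectors is exactly sqrt(2) apart, so Euclidean
# distance alone cannot rank them by similarity.
import numpy as np

d = 12                      # dimension, as in the 100000000000 example
X = np.eye(d)               # each row is a one-hot vector

# Pairwise Euclidean distances between all distinct rows
dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
off_diagonal = dists[~np.eye(d, dtype=bool)]
print(off_diagonal.min(), off_diagonal.max())  # both ~1.414: all equal
```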
How to reduce dimensions? • Image features: • Average colors • Histograms • FFT based features (Frequency space) • More… • Video features • Document features • Etc…
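As a rough illustration of the first two image features, a small numpy sketch (the function names and the 8-bin choice are illustrative, not from any particular library):

```python
# `image` is assumed to be an H x W x 3 uint8 RGB array.
import numpy as np

def average_color(image):
    """Mean R, G, B over all pixels: a 3-dimensional feature."""
    return image.reshape(-1, 3).mean(axis=0)

def color_histogram(image, bins=8):
    """Per-channel intensity histogram: a 3*bins-dimensional feature."""
    feats = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    hist = np.concatenate(feats).astype(float)
    return hist / hist.sum()   # normalize so image size doesn't matter

image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)  # stand-in
print(average_color(image).shape, color_histogram(image).shape)  # (3,) (24,)
```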
How to reduce dimensions? • Feature dimension is still quite high (512, 1024, etc) • What now?
Linear Dimensionality Reduction • Simplest way: Project all points on a plane (2D) or a lower-dimensional subspace
Linear Dimensionality Reduction • Simplest way: Project all points on a plane (2D) • Only one question: Which plane is the best? • PCA (SVD) • For specific applications: • CCA (correlation) • LDA (data with labels) • NMF (non-negative components) • ICA (multiple sources)
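A minimal sketch of PCA via SVD, the plane-fitting step named above (the 512-dimensional random data is just a stand-in for real features):

```python
# Project n points of dimension d onto the k directions of largest variance.
import numpy as np

def pca(X, k):
    """X: (n, d) data matrix. Returns the (n, k) projection."""
    Xc = X - X.mean(axis=0)          # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T             # project onto top-k principal axes

X = np.random.randn(1000, 512)       # e.g. 512-dimensional features
Y = pca(X, 2)                        # best 2D plane in the least-squares sense
print(Y.shape)                       # (1000, 2)
```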
Non-Linear Dimensionality Reduction • What if data is not linear? • No plane will work here
Non-Linear Dimensionality Reduction • MDS – Multidimensional Scaling • Use only distances between elements • Try to reconstruct element positions from the distances, such that the reconstructed pairwise distances match the original ones as closely as possible • Reconstruction can happen in 1D, 2D, 3D, … • More dimensions = less error
Non-Linear Dimensionality Reduction • MDS – Multidimensional Scaling • Classical MDS: an algebraic solution • Construct the matrix of squared distances and normalize it by “double centering”: B = −½·J·D²·J • Extract the d largest eigenvectors / eigenvalues of B • Multiply each eigenvector by sqrt(eigenvalue) • Each row is now the coordinates of its corresponding point
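The steps above translate almost line-for-line into numpy; a sketch, with a sanity check that Euclidean distances are recovered exactly:

```python
# D is assumed to be an (n, n) matrix of pairwise Euclidean distances.
import numpy as np

def classical_mds(D, d=2):
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double centering
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:d]        # d largest eigenvalues
    L = np.sqrt(np.maximum(eigvals[idx], 0))   # clip tiny negatives
    return eigvecs[:, idx] * L                 # rows = point coordinates

# Sanity check: distances of random 3D points, recovered in 3D
X = np.random.randn(10, 3)
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
Y = classical_mds(D, d=3)
D2 = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D2, atol=1e-6))           # True: distances preserved
```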
Non-Linear Dimensionality Reduction • MDS – Multidimensional Scaling • Classical MDS: an algebraic solution • [Diagram: the eigenvectors e1…e5 as columns against the points x1…x5 as rows; each eigenvector adds a dimension to the mapping]
Non-Linear Dimensionality Reduction • MDS as an optimization problem • Example: Sammon’s projection • Start from random positions for each element • Define the stress of the system: E = (1/c)·Σi<j (Dij − dij)² / Dij, with c = Σi<j Dij, where Dij are the original distances and dij the distances in the projection • In each step, move towards positions that reduce the stress (gradient descent) • Continue until convergence
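A gradient-descent sketch of Sammon’s projection under the stress defined above (the learning rate and iteration count are illustrative choices; the analytic gradient is derived from that stress):

```python
# D is the (n, n) matrix of original pairwise distances.
import numpy as np

def sammon(D, dim=2, lr=0.1, n_iter=500, eps=1e-9):
    n = D.shape[0]
    Y = np.random.randn(n, dim) * 0.01             # random starting positions
    c = D.sum() / 2                                # normalizing constant
    for _ in range(n_iter):
        diff = Y[:, None, :] - Y[None, :, :]       # (n, n, dim)
        d = np.linalg.norm(diff, axis=-1) + np.eye(n)  # avoid zero diagonal
        # gradient of the stress (1/c) * sum (D_ij - d_ij)^2 / D_ij
        w = (D - d) / (np.maximum(D, eps) * d)
        np.fill_diagonal(w, 0)
        grad = (-2 / c) * (w[:, :, None] * diff).sum(axis=1)
        Y -= lr * grad                             # move to reduce the stress
    return Y
```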
Non-Linear Dimensionality Reduction • Spectral embedding: • Create a graph of nearest neighbors • Compute the graph Laplacian (related to the probability of walking along each edge in a random walk) • Compute eigenvectors / eigenvalues – why? • Computing eigenvectors is like multiplying the matrix by itself many, many times (power iteration), which is like performing random walks over and over until we reach a stable distribution • Again, the eigenvectors are the coordinates • Does not preserve distances like MDS – instead it groups together points that are likely neighbors
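A compact sketch of the pipeline above, using the unnormalized Laplacian for brevity (the random-walk interpretation on the slide corresponds to the normalized variant) and assuming a connected neighbor graph:

```python
# k-nearest-neighbor graph, graph Laplacian, and the smallest
# non-trivial eigenvectors as coordinates.
import numpy as np

def spectral_embedding(X, k=10, dim=2):
    n = X.shape[0]
    dists = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.zeros((n, n))
    # connect each point to its k nearest neighbors (excluding itself)
    for i in range(n):
        for j in np.argsort(dists[i])[1:k + 1]:
            W[i, j] = W[j, i] = 1.0              # symmetric adjacency
    L = np.diag(W.sum(axis=1)) - W               # unnormalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)
    # skip the trivial constant eigenvector (eigenvalue 0)
    return eigvecs[:, 1:dim + 1]
```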
Non-Linear Dimensionality Reduction • Other non-linear methods • Locally Linear Embedding (LLE): express each point as a linear combination of its neighbors • Isomap: takes the adjacency graph as input and computes MDS on the geodesic distances (distances along the graph) • Self-Organizing Maps (SOM): next part…
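Both LLE and Isomap are available off the shelf in scikit-learn; a short usage sketch on a synthetic Swiss roll, standing in for real feature vectors:

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000)   # classic non-linear 3D data

iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
print(iso.shape, lle.shape)   # (1000, 2) (1000, 2)
```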
Self Organizing Maps & Recent Applications June 2013 Computer Graphics Course
Self Organizing Maps (SOM) • Originated from neural networks • Created by Kohonen, 1982 • Also known as Kohonen Maps • Teuvo Kohonen: a Finnish researcher working on learning and neural networks • Due to SOM, he became the most cited Finnish scientist! • More than 8,000 citations • So what is it?
What is a SOM? • A type of neural network • What is a neuron? • A function with several inputs and one output • In this case – usually a linear combination of the input according to weights
What is a SOM? • [Diagram: a grid of neurons, each holding a weight vector (mik), all fed by the same input (xk); there are no connections (feedback / feed forward) between the neurons]
Training a SOM • Start from random weights • For each input X(t) at iteration t: • Find the Best Matching Cell (BMC, also called Best Matching Unit or BMU) for X(t) • Update the weights of each neuron close to the BMC • Weights are updated according to a decaying learning rate and radius
Training a SOM • [Diagram: input X(1) pulls the weights of its BMC(1) and the nearby neurons (mi) towards it; the next input X(2) does the same around BMC(2)]
Training a SOM – The Math • Best Matching Cell: the mc for which ||x(t) − mc(t)|| is minimal • Another option for BMC: maximal dot product x(t)Tmc(t) • Weight adaptation: mi(t+1) = mi(t) + hci(t)·[x(t) − mi(t)] • hci(t) is a learning rate dependent on both the time and the distance of mi from the BMC mc
Training a SOM – The Math • Example (motion map): hci(t) is a Gaussian kernel of the distance between the BMC and mi, scaled by a learning rate that decays with the iteration number; the kernel width is set from the height and width of the neuron map and shrinks with t, with the decay controlled by the maximum number of iterations
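Putting the pieces together, a minimal SOM training sketch in numpy: the BMC search and weight adaptation follow the formulas above, while the exact decay schedules for alpha and sigma are illustrative assumptions rather than the motion-map paper’s choices:

```python
import numpy as np

def train_som(X, h=10, w=10, t_max=2000, alpha0=0.5, sigma0=5.0):
    d = X.shape[1]
    M = np.random.rand(h, w, d)                  # random initial weights
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                indexing="ij"), axis=-1)  # neuron positions
    for t in range(t_max):
        x = X[np.random.randint(len(X))]         # pick a training sample
        # Best Matching Cell: neuron with minimal ||x - m_i||
        bmc = np.unravel_index(
            np.argmin(((M - x) ** 2).sum(axis=-1)), (h, w))
        # learning rate and kernel width decay over time (assumed schedule)
        alpha = alpha0 * (1 - t / t_max)
        sigma = sigma0 * (1 - t / t_max) + 1e-3
        # Gaussian neighborhood around the BMC on the grid
        dist2 = ((grid - np.array(bmc)) ** 2).sum(axis=-1)
        h_ci = alpha * np.exp(-dist2 / (2 * sigma ** 2))
        # weight adaptation: m_i += h_ci * (x - m_i)
        M += h_ci[..., None] * (x - M)
    return M

# e.g. a color map: 3 inputs per neuron, trained on random RGB triples;
# som[i, j] is then a color, so the whole grid can be shown as an image
colors = np.random.rand(500, 3)
som = train_som(colors)
```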
Presenting a SOM • Option 1: at each node, present the data that relates to the vector mi (3D data, colors, continuous spaces) • So for a color map with 3 inputs, if a neuron’s weights are (0.7, 0.2, 0.3) we would show a reddish color with a 0.7 red component, 0.2 green component and 0.3 blue component • For a map of points on the plane with 2 inputs, we would draw a point for each neuron at position (Wx, Wy)
Presenting a SOM • Option 2: give each neuron a representative, the element of the training set X whose vector is closest to mi
Motion Map • Motion Map: Image-based Retrieval and Segmentation of Motion Data • Sakamoto, Kuriyama, Kaneko • SCA: Symposium on Computer Animation 2004 • Goal: presenting the user with a grid of postures in order to select a clip of motion data from a large database • Perform clustering on the SOM instead of on the abstract data
Motion Map • Example results: 436 posture samples from 55K frames of 51 motion files
Motion Map • Example results: Clustering based on SOM
Motion Map - Details • A map of posture samples is created from all motion files together • The postures are subsampled, keeping only enough samples that each posture’s similarity to its closest sample exceeds a given threshold; this reduces computation time • A standard SOM is calculated • Each posture is then connected to a hash table of the motion files that contain similar postures • Clustering the SOM enables display of a simplified map to the user (next slide)
Motion Map - Details • Simplified map after SOM clustering: 17 dance styles
Procedural Texture Preview • Eurographics 2012 • Goal: Present the user with a single image which shows all possibilities of a procedural texture • Method overview: • Selecting candidate vectors of parameters which maximize completeness, variety and smoothness • Organizing the candidates in a SOM • Synthesis of a continuous map
Procedural Texture Preview • Results • [Figure: for a set of texture parameters, thumbnails of random parameter settings next to the texture preview that combines all variations in a single image]
Procedural Texture Preview - Details • Selecting candidates for the parameter map using the following optimizations: C = a set of dense samples, X = the candidates in the parameter map • Completeness: minimize the distance from every dense sample in C to its nearest candidate in X • Variety: maximize how spread out the candidates in X are from each other • Smoothness: minimize the difference between candidates in neighboring map cells
Procedural Texture Preview - Details • A standard SOM will jointly optimize the completeness and the smoothness • To optimize the variety as well, the SOM implementation alternates between minimizing the completeness energy Ec and maximizing the variety energy Ev • Instead of the regular learning-rate update, at each step a candidate (weight vector) is replaced by a new candidate according to the above objectives (see the sketch below)
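The exact energy formulas are in the paper; as a loose sketch of the three objectives assuming only the verbal definitions above (C, X and the neighbor pairs are hypothetical stand-ins, not the paper’s definitions):

```python
import numpy as np

def completeness(C, X):
    # minimize: every dense sample should be near some candidate
    d = np.linalg.norm(C[:, None] - X[None, :], axis=-1)
    return d.min(axis=1).sum()

def variety(X):
    # maximize: candidates should be spread apart from each other
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    return d[~np.eye(len(X), dtype=bool)].min()

def smoothness(X, neighbor_pairs):
    # minimize: neighboring map cells should hold similar parameters
    return sum(np.linalg.norm(X[i] - X[j]) for i, j in neighbor_pairs)

C = np.random.rand(200, 4)                # dense samples of 4 parameters
X = np.random.rand(16, 4)                 # candidates for a 4x4 map
pairs = [(i, i + 1) for i in range(15)]   # e.g. neighbors along one row
print(completeness(C, X), variety(X), smoothness(X, pairs))
```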
Procedural Texture Preview - Details • After the candidate selection, an image is synthesized which smoothly combines all selected candidates • Stitching is done using standard patch-based texture synthesis methods (Graphcut Textures, Kwatra et al., TOG 2003)
Procedural Texture Preview • Some more results
That’s all folks! • Questions?