The Curse of Dimensionality Atul Santosh Tirkey Y7102
Curse of Dimensionality • A term coined by Richard E. Bellman. • It refers to the problems caused by the exponential increase in volume that comes with adding an extra dimension to a mathematical space. • The basic problems associated with an increase in dimensionality are: • There aren't enough observations to make good estimates. • Adding more features can increase the noise, and hence the error.
Examples • To sample a unit interval with a spacing of 0.01 between points, 100 evenly spaced points suffice. • But an equivalent sampling of the 10-dimensional unit hypercube with a lattice spacing of 0.01 between adjacent points requires $10^{20}$ sample points. • Comparison of the volume of the hypercube of side $2r$ and the sphere of radius $r$ in $d$ dimensions: the volume of the sphere is $V_{\text{sphere}}(d, r) = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)} r^d$, while the volume of the cube is $V_{\text{cube}}(d, r) = (2r)^d$, so the sphere occupies a vanishing fraction of the cube as $d$ grows (see the sketch below).
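To make the volume comparison concrete, here is a minimal Python sketch (illustrative, not part of the slides) that computes the ratio of the volume of the $d$-dimensional ball of radius $r$ to that of its enclosing hypercube of side $2r$; the ratio collapses toward zero as $d$ grows.

```python
# Ratio of hypersphere volume to enclosing hypercube volume as d grows.
import math

def sphere_volume(d, r=1.0):
    # Volume of a d-dimensional ball of radius r: pi^(d/2) / Gamma(d/2 + 1) * r^d
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d

def cube_volume(d, r=1.0):
    # Volume of the enclosing hypercube of side 2r
    return (2 * r) ** d

for d in (1, 2, 3, 5, 10, 20):
    ratio = sphere_volume(d) / cube_volume(d)
    print(f"d={d:2d}  sphere/cube volume ratio = {ratio:.6f}")
```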
Principal Component Analysis, invented by Karl Pearson in 1901 • The idea is to project the data onto the subspace that accounts for most of the variance. • The data is projected onto the eigenvectors of the covariance matrix associated with the largest eigenvalues. • Steps involved (see the sketch below): • Calculate the covariance matrix • Calculate the eigenvectors and eigenvalues of the covariance matrix • Choose the components and form the feature vector • Derive the new data set
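The four PCA steps listed above can be sketched in a few lines of NumPy; the toy data set, the choice of two retained components, and the variable names are illustrative assumptions.

```python
# A minimal sketch of the PCA steps above, on toy data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 samples, 5 features (toy data)

# 1. Covariance matrix of the mean-centred data
X_centred = X - X.mean(axis=0)
cov = np.cov(X_centred, rowvar=False)

# 2. Eigenvectors and eigenvalues of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# 3. Choose the components with the largest eigenvalues (here: the top 2)
order = np.argsort(eigvals)[::-1]
feature_vector = eigvecs[:, order[:2]]

# 4. Derive the new (projected) data set
X_reduced = X_centred @ feature_vector
print(X_reduced.shape)                 # (100, 2)
```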
Principal component analysis [figures: original data set; projection of the data along one eigenvector]
Fisher Linear Discriminant • Needed because the directions of maximum variance may be useless for classification.
Fisher Linear Discriminant • Main idea: find a projection onto a line such that samples from different classes are well separated.
Fisher Linear Discriminant – Methodology • Let $\tilde{\mu}_1$ and $\tilde{\mu}_2$ be the projected means of classes 1 and 2 on a particular line. • If $Z_1, Z_2, \ldots, Z_n$ are the samples of a class and $y_i = v^t Z_i$ their projections, the projected sample mean is $\tilde{\mu} = \frac{1}{n} \sum_{i=1}^{n} y_i$ and the scatter is defined as $\tilde{s}^2 = \sum_{i=1}^{n} (y_i - \tilde{\mu})^2$. • The target is to find the $v$ that makes $J(v) = \frac{|\tilde{\mu}_1 - \tilde{\mu}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$ large, to guarantee that the classes are well separated (see the sketch below).
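A minimal NumPy sketch of the Fisher criterion, assuming two toy Gaussian classes; the closed-form direction $v \propto S_W^{-1}(\mu_1 - \mu_2)$ used here is the standard maximizer of $J(v)$, not something stated explicitly on the slide.

```python
# Fisher linear discriminant on two toy Gaussian classes.
import numpy as np

rng = np.random.default_rng(1)
Z1 = rng.normal(loc=[0, 0], scale=1.0, size=(50, 2))   # class 1 samples
Z2 = rng.normal(loc=[3, 1], scale=1.0, size=(50, 2))   # class 2 samples

mu1, mu2 = Z1.mean(axis=0), Z2.mean(axis=0)

# Within-class scatter matrix S_W = S_1 + S_2
S1 = (Z1 - mu1).T @ (Z1 - mu1)
S2 = (Z2 - mu2).T @ (Z2 - mu2)
S_W = S1 + S2

# The v maximising J(v) is proportional to S_W^{-1} (mu1 - mu2)
v = np.linalg.solve(S_W, mu1 - mu2)
v /= np.linalg.norm(v)

# Projected means and scatters on the line defined by v
y1, y2 = Z1 @ v, Z2 @ v
J = (y1.mean() - y2.mean()) ** 2 / (y1.var() * len(y1) + y2.var() * len(y2))
print("Fisher criterion J(v) =", J)
```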
Taking on the curse of dimensionality in joint distributions using neural networks • Samy Bengio and Yoshua Bengio, IEEE Transactions on Neural Networks, Vol. 11, No. 3, May 2000 • In this paper they propose a new architecture for modeling high-dimensional data that requires comparatively fewer parameters, using a multilayer neural network to represent the joint distribution of the variables as a product of conditional distributions.
Proposed Architecture • One can see the network as a kind of autoencoder in which each variable is predicted from the previous variables. • The neural network represents the parameterized function $\log \hat{P}(Z_1 = z_1, \ldots, Z_n = z_n) = \sum_{i=1}^{n} \log \hat{P}(Z_i = z_i \mid Z_1 = z_1, \ldots, Z_{i-1} = z_{i-1})$, i.e., the joint log probability is computed as the sum of the conditional log probabilities, where $g_i(z_1, \ldots, z_{i-1})$ is the vector-valued output of the $i$-th group of output units, and it gives the parameters of the distribution of $Z_i$ when $Z_1 = z_1, Z_2 = z_2, \ldots, Z_{i-1} = z_{i-1}$ (an illustrative sketch follows below).
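An illustrative sketch (not the authors' implementation) of the factorization above: the joint log probability of a discrete vector is accumulated as the sum of conditional log probabilities, with a stand-in function playing the role of the network output groups $g_i$.

```python
# Joint log probability as a sum of conditional log probabilities.
import numpy as np

def joint_log_prob(z, conditional_prob):
    """z: sequence of discrete values; conditional_prob(i, prefix) returns the
    probability vector over Z_i given Z_1..Z_{i-1} = prefix (the role of g_i)."""
    total = 0.0
    for i in range(len(z)):
        p_i = conditional_prob(i, z[:i])      # output group g_i of the network
        total += np.log(p_i[z[i]])            # log P(z_i | z_1, ..., z_{i-1})
    return total

# Toy stand-in for the network: a uniform conditional over 3 possible values
uniform = lambda i, prefix: np.full(3, 1.0 / 3.0)
print(joint_log_prob([0, 2, 1], uniform))     # equals 3 * log(1/3)
```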
Proposed Architecture • In the discrete case we have $\hat{P}(Z_i = i' \mid Z_1 = z_1, \ldots, Z_{i-1} = z_{i-1}) = g_{i,i'}(z_1, \ldots, z_{i-1})$, where $g_{i,i'}$ is the $i'$-th output element of the vector $g_i$. • In this case a softmax output for the $i$-th group may be used to force these parameters to be positive and sum to one, i.e., $g_{i,i'} = \frac{e^{a_{i,i'}}}{\sum_{i''} e^{a_{i,i''}}}$, where the $a_{i,i'}$ are the pre-softmax activations of the $i$-th output group.
Proposed Architecture • The hidden unit activations may be computed as $h_{j,j'} = \tanh\!\big(c_{j,j'} + \sum_{k<j} \sum_{k'} v_{j,j',k,k'}\, z'_{k,k'}\big)$, where the $c$'s are biases, the $v_{j,j',k,k'}$'s are the weights of the hidden layer (from input unit $(k,k')$ to hidden unit $(j,j')$), and $z'_{k,k'}$ is the $k'$-th element of the vectorial input representation for the value $Z_k = z_k$.
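The following sketch puts these pieces together under stated assumptions: the toy layer sizes, the tanh hidden nonlinearity, the one-hot input encoding, and a single weight tensor shared across output groups are illustrative choices, not details taken from the paper. It only mimics the left-to-right connectivity (the units predicting $Z_i$ see only $z_1, \ldots, z_{i-1}$) and the per-group softmax outputs.

```python
# Hedged sketch of the conditional-distribution network (toy sizes, shared weights).
import numpy as np

n_vars, n_values, n_hidden = 4, 3, 8                             # assumed toy sizes

rng = np.random.default_rng(2)
V = rng.normal(scale=0.1, size=(n_hidden, n_vars, n_values))     # hidden-layer weights
c = np.zeros(n_hidden)                                           # hidden biases
W = rng.normal(scale=0.1, size=(n_vars, n_values, n_hidden))     # output-layer weights
b = np.zeros((n_vars, n_values))                                 # output biases

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def conditionals(z):
    """Return the softmax parameters of P(Z_i | Z_1..Z_{i-1}) for every i."""
    z_onehot = np.eye(n_values)[z]            # z'_{k,k'}: one-hot input encoding
    probs = []
    for i in range(n_vars):
        # Only the inputs z_1..z_{i-1} may feed the units that predict Z_i
        h = np.tanh(c + np.einsum('jkl,kl->j', V[:, :i, :], z_onehot[:i]))
        probs.append(softmax(b[i] + W[i] @ h))
    return probs

print(conditionals(np.array([0, 2, 1, 0]))[2])   # parameters of P(Z_3 | z_1, z_2)
```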
Proposed Architecture • To optimize the parameters they simply used gradient-based optimization methods, with conjugate or stochastic (on-line) gradient, to maximize a MAP (maximum a posteriori) criterion, that is, the sum of a log-prior and the total log-likelihood. • They used a "weight decay" log-prior, which gives a quadratic penalty $-\sum_i \gamma_i \theta_i^2$ to the parameters $\theta$; the inverse variances $\gamma_i$ are chosen proportional to the number of weights incoming into a neuron (a sketch of this criterion follows below).
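A minimal sketch of the MAP criterion described above: the total log-likelihood plus a weight-decay log-prior $-\sum_i \gamma_i \theta_i^2$, maximized by plain gradient ascent; the toy log-likelihood, learning rate, and $\gamma$ values are illustrative assumptions.

```python
# MAP criterion: log-likelihood plus a quadratic weight-decay log-prior.
import numpy as np

def map_objective(theta, log_likelihood, gamma):
    # Log-prior: quadratic (weight decay) penalty on the parameters
    return log_likelihood(theta) - np.sum(gamma * theta ** 2)

def gradient_step(theta, grad_log_likelihood, gamma, lr=0.01):
    # Gradient ascent on the MAP criterion
    grad = grad_log_likelihood(theta) - 2.0 * gamma * theta
    return theta + lr * grad

# Toy quadratic "log-likelihood" just to make the sketch runnable
ll = lambda th: -np.sum((th - 1.0) ** 2)
grad_ll = lambda th: -2.0 * (th - 1.0)

theta = np.zeros(3)
gamma = np.full(3, 0.1)          # inverse variances of the Gaussian prior
for _ in range(200):
    theta = gradient_step(theta, grad_ll, gamma)
print(theta, map_objective(theta, ll, gamma))
```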
Questions • What is PCA? • What is the Fisher linear discriminant? • How is FLD better than PCA for classification? • How are the weights calculated in the newly proposed architecture?