Estimating Intrinsic Dimension

Justin Eberhardt
UMD, Mathematics and Statistics
Advisor: Dr. Kang James

Outline
• Introduction
• Nearest Neighbor Estimators
• Regression Estimator
• Maximum Likelihood Estimator
• Revised Maximum Likelihood Estimator
• Comparison
• Summary

Intrinsic Dimension: Definition
• The least number of parameters required to generate a dataset
• The minimum number of dimensions that describes a dataset without significant loss of features

Ex 1: Intrinsic Dimension
[Figure: a surface in (x, y, z) is flattened (unrolled) onto the (x, y) plane]
Int Dim = 2

Ex 2: Intrinsic Dimension
[Figure: one 28 x 28 pixel image]
One image: 28 x 28 = 784 dimensions

Ex 2: Intrinsic Dimension
[Isomap Project, J. Tenenbaum & J. Langford, Stanford]
[Figure labels: No Loop; Top & Bottom Loop]
Int Dim = 2

Applications
• Biometrics
  • Facial Recognition, Fingerprints, Iris
• Genetics

Why do we need to reduce dimensionality?
• Low-dimensional datasets are more efficient to store and process
• Very high-dimensional matrices can be too large for even supercomputers to handle
• Data in 1, 2 and 3 dimensions can be visualized

Ex: Facial Recognition in MN
• 5 Million People
• 2 Images per Person (Front and Profile)
• 1028 x 1028 Pixels per Image (about 1 Megapixel)
• Total Memory Required:
  • n = 5,000,000
  • p = (2)(1028)(1028) = 2.11 Million Dimensions
  • Matrix Size: (5 x 10^6)(2.11 x 10^6) = about 10 x 10^12 cells (10 trillion)
  • Memory (at 2 bytes per cell): 2(10 x 10^12) = 20 x 10^12 bytes = about 20 Terabytes
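The arithmetic on this slide can be checked with a short Python sketch. The 2 bytes per cell is an assumption read off the factor of 2 in the memory line, and the variable names are illustrative only.

```python
# Sizes taken from the slide.
n = 5_000_000            # people (rows of the data matrix)
images_per_person = 2    # front and profile
pixels_per_side = 1028

# Assumption: each cell (one pixel value) is stored in 2 bytes.
bytes_per_cell = 2

p = images_per_person * pixels_per_side ** 2   # raw dimension (columns)
cells = n * p                                  # entries in the n x p matrix
memory_bytes = cells * bytes_per_cell

print(f"p = {p:,} dimensions")
print(f"cells = {cells:.3e}")
print(f"memory = {memory_bytes / 1e12:.1f} TB")
```

The exact product is slightly above the rounded figures on the slide, which use 10 x 10^12 cells.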

Intrinsic Dimension Estimators
Objective: To find a simple formula that uses nearest neighbor (NN) information to quickly estimate intrinsic dimension

Intrinsic Dimension Estimators
Project Description: Through simulation, we will compare the effectiveness of three proposed NN intrinsic dimension estimators.

Intrinsic Dimension Estimators
Note: Traditional methods for estimating intrinsic dimension, such as PCA, fail on non-linear manifolds.
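A minimal sketch of why PCA struggles here, assuming a common Swiss roll parameterization (the angle range and height of 21 are illustrative choices, not taken from the slides). The data lie on a 2-dimensional surface, yet all three principal components carry a substantial share of the variance, so a variance threshold would suggest 3 dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Assumed Swiss roll parameterization: angle t and height h are the
# two intrinsic parameters; the embedding is in 3 dimensions.
t = 1.5 * np.pi * (1 + 2 * rng.uniform(size=n))
h = 21.0 * rng.uniform(size=n)
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])

# PCA via SVD of the centered data: squared singular values are
# proportional to the variance along each principal component.
Xc = X - X.mean(axis=0)
s = np.linalg.svd(Xc, compute_uv=False)
explained = s ** 2 / np.sum(s ** 2)

print("explained variance ratios:", explained)
```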

Intrinsic Dimension Estimators
Nearest-Neighbor Methods
• Regression Estimator
  K. Pettis, T. Bailey, A. Jain & R. Dubes, 1979
• Maximum Likelihood Estimator
  E. Levina & P. Bickel, 2005
  D. MacKay & Z. Ghahramani, 2005

Distance Matrix
D_{i,j}: Euclidean distance from x_i to x_j
Example: D_{2,3} is the distance from x_2 to x_3
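As a rough illustration, the distance matrix can be computed with NumPy broadcasting. The function name `distance_matrix` and the three sample points are hypothetical; note that Python indexes from 0, so the distance from x_2 to x_3 is `D[1, 2]`.

```python
import numpy as np

def distance_matrix(X):
    """Return the n x n matrix D with D[i, j] = Euclidean distance from X[i] to X[j]."""
    diff = X[:, None, :] - X[None, :, :]      # shape (n, n, p)
    return np.sqrt((diff ** 2).sum(axis=-1))  # shape (n, n)

# Three illustrative observations in p = 2 dimensions.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = distance_matrix(X)
print(D)
```

This builds an n x n x p intermediate array, so it is only suitable for small n.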

Nearest Neighbor Matrix
T_{i,k}: Euclidean distance between x_i and the kth NN to x_i
Example: T_{2,k} is the distance between x_2 and the kth NN to x_2
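One way to obtain the nearest neighbor matrix is to sort each row of the distance matrix and drop the zero self-distance. This is a sketch under that assumption; `nn_distance_matrix` and `k_max` are illustrative names, and the approach assumes no duplicate observations.

```python
import numpy as np

def nn_distance_matrix(X, k_max):
    """Return T with T[i, k-1] = distance from X[i] to its kth nearest neighbor."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    D_sorted = np.sort(D, axis=1)
    # Column 0 is the distance from each point to itself (zero), so skip it.
    return D_sorted[:, 1:k_max + 1]

# Four illustrative observations on a line (p = 1).
X = np.array([[0.0], [1.0], [3.0], [7.0]])
T = nn_distance_matrix(X, k_max=2)
print(T)
```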

Notation
• m: Intrinsic dimension
• p: Dimension of the raw dataset
• n: Number of observations
• f(x): Probability density at observation x
• T_{x,k} or T_k: Distance from observation x to its kth NN
• N(t,x): Number of observations within distance t of observation x

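Notation (example)
[Figure: N = 12 observations with p = 2 and m = 1; a neighborhood of radius t around x gives N(t,x) = 3; distances t1, t2, t3 are marked from x]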
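The counting function N(t,x) can be sketched as follows. The helper name `count_within` and the arc-shaped sample are hypothetical; the sample mimics the slide's setting of 12 points on a curve (p = 2, m = 1) but is not the slide's data.

```python
import numpy as np

def count_within(X, x, t):
    """N(t, x): number of observations in X within distance t of x.

    The point x itself is excluded by requiring a positive distance,
    which assumes the sample has no duplicate observations.
    """
    dist = np.sqrt(((X - x) ** 2).sum(axis=1))
    return int(np.sum((dist > 0) & (dist <= t)))

# 12 illustrative points on a half circle: raw dimension 2, intrinsic dimension 1.
theta = np.linspace(0.0, np.pi, 12)
X = np.column_stack([np.cos(theta), np.sin(theta)])

count = count_within(X, X[5], t=0.3)
print(count)
```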

NN Regression Estimator
1. Density of distance to kth NN (single observation, approximated as Poisson)
2a. Expected distance to kth NN (single observation)
2b. Sample-averaged distance to kth NN
3. Expected distance to sample-averaged kth NN

Regression Estimator: pdf of distance to kth NN
[Equations: Trinomial Distribution, Binomial Distribution]
• Assumptions
  • f(x) is constant
  • n is large
  • f(x)V_t is small

Regression Estimator
• Approximate as Poisson
• Expected distance to kth NN
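The equations for these steps did not survive in the transcript. The following is a reconstruction sketch based on the standard Poisson argument and the form used by Pettis et al., not a copy of the slide. It assumes V(m) denotes the volume of the unit ball in R^m, so that V_t = V(m) t^m, and it writes the sample-averaged kth-NN distance as \bar{T}_k.

```latex
% Assumed notation: V(m) = volume of the unit ball in R^m, V_t = V(m) t^m.
\begin{align*}
N(t,x) &\sim \mathrm{Binomial}\bigl(n,\ f(x)V(m)t^m\bigr)
        \approx \mathrm{Poisson}\bigl(\lambda(t)\bigr),
        \qquad \lambda(t) = n f(x) V(m)\, t^m \\
P(T_k > t) &= P\bigl(N(t,x) < k\bigr)
        = \sum_{j=0}^{k-1} \frac{e^{-\lambda(t)}\lambda(t)^j}{j!} \\
g_k(t) &= \frac{m}{t}\,\frac{\lambda(t)^k e^{-\lambda(t)}}{(k-1)!} \\
E[T_k] &= \bigl(n f(x) V(m)\bigr)^{-1/m}\,\frac{\Gamma(k + 1/m)}{\Gamma(k)}
        = \frac{k^{1/m}}{G_{k,m}}\,\bigl(n f(x) V(m)\bigr)^{-1/m},
        \qquad G_{k,m} = \frac{k^{1/m}\,\Gamma(k)}{\Gamma(k + 1/m)} \\
\log G_{k,m} + \log E[\bar{T}_k] &= \frac{1}{m}\log k + \log C_n
\end{align*}
```

Here C_n collects the terms that do not depend on k, so the last line is linear in log k with slope 1/m.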

Regression Estimator
[Equation terms: G_{k,m}, C_n]
Estimate m using simple linear regression
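A minimal sketch of a regression estimator of this kind, assuming the relation log G_{k,m} + log(mean kth-NN distance) = (1/m) log k + log C_n with G_{k,m} = k^{1/m} Γ(k) / Γ(k + 1/m). The function names, the choice `k_max=10`, and the fixed number of refinement passes are illustrative assumptions, not the author's implementation.

```python
import math
import numpy as np

def knn_distances(X, k_max):
    """Distances from each observation to its 1st..k_max-th nearest neighbor."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return np.sort(D, axis=1)[:, 1:k_max + 1]   # drop the zero self-distance

def regression_estimate(X, k_max=10, n_iter=5):
    """Estimate intrinsic dimension m from the slope of log mean kNN distance vs log k."""
    T = knn_distances(X, k_max)
    ks = np.arange(1, k_max + 1)
    log_k = np.log(ks)
    log_rbar = np.log(T.mean(axis=0))           # sample-averaged distance per k

    # First pass: treat G_{k,m} as 1, so the slope alone estimates 1/m.
    m = 1.0 / np.polyfit(log_k, log_rbar, 1)[0]

    # Refinement: plug the current m into G_{k,m} and refit.
    for _ in range(n_iter):
        log_G = np.array([
            math.log(k) / m + math.lgamma(k) - math.lgamma(k + 1.0 / m)
            for k in ks
        ])
        m = 1.0 / np.polyfit(log_k, log_rbar + log_G, 1)[0]
    return m

# Illustrative data: a flat 2-dimensional sheet embedded in 3 dimensions.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(size=(800, 2)), np.zeros(800)])
m_hat = regression_estimate(X)
print(m_hat)
```

The estimate should land near 2 for this sheet; points near the boundary of the sheet pull it slightly away from the exact value.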

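Ex: Swiss Roll Dataset
[Figure: fitted regression line]
Fitted slope = 0.49; since the slope estimates 1/m, the estimated Int Dim is 1/0.49 ≈ 2.0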

Datasets
• Gaussian Sphere: Raw Dim = 3, Int Dim = 3
• Swiss Roll: Raw Dim = 3, Int Dim = 2
• Dbl Swiss Roll: Raw Dim = 3, Int Dim = 2
• Faces: Raw Dim = 4096, Int Dim ~ 3 to 5
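Two of the synthetic datasets can be simulated roughly as below. The slides do not give the generating formulas, so both are assumptions: the Gaussian sphere is taken to be an isotropic 3-dimensional normal sample, and the Swiss roll uses a common parameterization with an angle range and height chosen for illustration. The double Swiss roll and the faces data are not sketched.

```python
import numpy as np

def gaussian_sphere(n, rng):
    """Assumed form: isotropic standard normal sample in 3 dimensions (Int Dim = 3)."""
    return rng.standard_normal((n, 3))

def swiss_roll(n, rng):
    """Assumed form: 2-D sheet (angle t, height h) rolled up in 3 dimensions (Int Dim = 2)."""
    t = 1.5 * np.pi * (1 + 2 * rng.uniform(size=n))
    h = 21.0 * rng.uniform(size=n)
    return np.column_stack([t * np.cos(t), h, t * np.sin(t)])

rng = np.random.default_rng(0)
sphere = gaussian_sphere(500, rng)
roll = swiss_roll(500, rng)
print(sphere.shape, roll.shape)
```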

Results: Regression Estimator (K = N / 100)
• Gaussian Sphere: ~ 3.0
• Swiss Roll: ~ 2.0
• Dbl Swiss Roll: ~ 2.0
• Faces: ~ 3.5

NN Maximum Likelihood Estimator
1. Counting process: binomial (approximated as Poisson)
2. Joint counting probability; joint occurrence density
3. Log-likelihood function
4. Maximum likelihood estimate

Maximum Likelihood Estimator
• N(t,x) = number of observations within distance t of x
• The number of observations between distance r and distance s of x is binomial

Maximum Likelihood Estimator

Joint pdf of Distances to K NN

Log-Likelihood Function
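The log-likelihood itself is missing from the transcript. As a hedged reconstruction following the Poisson-process argument of Levina & Bickel (not a copy of the slide), assume that N(t,x) is approximated by a Poisson process with rate \lambda(t), that V(m) is the volume of the unit ball in R^m, and that R is a fixed radius around x:

```latex
% Assumed notation: \lambda(t) = rate of the Poisson approximation to N(t,x),
% V(m) = volume of the unit ball in R^m, R = fixed radius around x.
\begin{align*}
\lambda(t) &= n f(x) V(m)\, m\, t^{m-1} \\
L(m) &= \sum_{j=1}^{N(R,x)} \log \lambda\bigl(T_j(x)\bigr) - \int_0^R \lambda(t)\,dt \\
\hat{m}_R(x) &= \left[ \frac{1}{N(R,x)} \sum_{j=1}^{N(R,x)} \log\frac{R}{T_j(x)} \right]^{-1} \\
\hat{m}_k(x) &= \left[ \frac{1}{k-1} \sum_{j=1}^{k-1} \log\frac{T_k(x)}{T_j(x)} \right]^{-1}
\end{align*}
```

The third line comes from setting the derivatives of L with respect to m and to the constant n f(x) V(m) to zero; the last line is the same estimate with the radius set to the kth-NN distance.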

Combining the per-observation estimates (using MLE)
• E. Levina & P. Bickel: averaging over N observations
• D. MacKay & Z. Ghahramani: averaging inverses over N observations
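The two averaging schemes can be contrasted in a short sketch. It assumes the per-observation estimate is the inverse of the mean of log(T_k / T_j) over j = 1..k-1; the function names and the choice k = 10 are illustrative, and this is not the author's code.

```python
import numpy as np

def knn_distances(X, k):
    """Distances from each observation to its 1st..kth nearest neighbor."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return np.sort(D, axis=1)[:, 1:k + 1]       # drop the zero self-distance

def mle_estimates(X, k=10):
    """Return (Levina-Bickel style, MacKay-Ghahramani style) dimension estimates."""
    T = knn_distances(X, k)                     # shape (n, k)
    # Inverse of the local estimate: mean over j < k of log(T_k / T_j).
    inv_local = np.log(T[:, -1:] / T[:, :-1]).mean(axis=1)
    m_lb = np.mean(1.0 / inv_local)             # average the local estimates
    m_mg = 1.0 / np.mean(inv_local)             # average the inverses, then invert
    return m_lb, m_mg

# Illustrative data: a flat 2-dimensional sheet embedded in 3 dimensions.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(size=(800, 2)), np.zeros(800)])
m_lb, m_mg = mle_estimates(X)
print(m_lb, m_mg)
```

Because the mean of reciprocals is never smaller than the reciprocal of the mean for positive values, the first estimate is always at least as large as the second.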

Results: MLE Estimator (Revised, MacKay & Ghahramani) (K = N / 100)
• Gaussian Sphere: ~ 3.0
• Swiss Roll: ~ 2.0
• Dbl Swiss Roll: ~ 2.1
• Faces: ~ 3.5

Comparison
[Figures: six slides of comparison plots]

Isomap

Summary
• The regression and revised MLE estimators share similar characteristics when intrinsic dimension is small
• As intrinsic dimension increases, the estimators become more dependent on K
• Distribution type does not appear to be highly influential when the intrinsic dimension is small

Thank You!
• Dr. Kang James & Dr. Barry James
• Dr. Steve Trogdon

Example Swiss Roll Data Int Dim = 2
