
Estimating Intrinsic Dimension

Justin Eberhardt, UMD, Mathematics and Statistics. Advisor: Dr. Kang James. Outline: Introduction; Nearest Neighborhood Estimators; Regression Estimator; Maximum Likelihood Estimator; Revised Maximum Likelihood Estimator; Comparison; Summary.


Presentation Transcript


  1. Estimating Intrinsic Dimension. Justin Eberhardt, UMD, Mathematics and Statistics. Advisor: Dr. Kang James

  2. Outline • Introduction • Nearest Neighborhood Estimators • Regression Estimator • Maximum Likelihood Estimator • Revised Maximum Likelihood Estimator • Comparison • Summary

  3. Intrinsic Dimension: Definition • The smallest number of parameters required to generate a dataset • The minimum number of dimensions that describes a dataset without significant loss of features

  4. Ex 1: Intrinsic Dimension. [Figure: a surface plotted on x, y, z axes is flattened (unrolled) onto x, y axes.] Int Dim = 2

  5. Ex 2: Intrinsic Dimension. One image: 28 × 28 pixels, 784-dimensional

  6. Ex 2: Intrinsic Dimension. [Figure; labels: No Loop, Top & Bottom Loop. Source: Isomap Project, J. Tenenbaum & J. Langford, Stanford.] Int Dim = 2

  7. Applications • Biometrics: facial recognition, fingerprints, iris • Genetics

  8. Why do we need to reduce dimensionality? • Low-dimensional datasets are more efficient to store and process • Not even supercomputers can handle very high-dimensional matrices • Data in 1, 2, and 3 dimensions can be visualized

  9. Ex: Facial Recognition in MN • 5 million people • 2 images per person (front and profile) • 1028 × 1028 pixels per image (about 1 megapixel) • Total memory required: n = 5,000,000; p = (2)(1028)(1028) ≈ 2.11 million dimensions • Matrix size: $(5 \times 10^{6})(2.11 \times 10^{6}) \approx 10 \times 10^{12}$ cells (about 10 trillion) • Memory at 2 bytes per cell: $2(10 \times 10^{12}) = 20 \times 10^{12}$ bytes = 20 terabytes
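
The arithmetic in this example can be checked directly. The sketch below assumes the factor of 2 in the memory line means 2 bytes per matrix cell; that reading is an assumption, as are the variable names.

```python
# Back-of-the-envelope storage estimate for the facial recognition example.
n = 5_000_000                 # people (rows of the data matrix)
images_per_person = 2         # front and profile
pixels_per_side = 1028
p = images_per_person * pixels_per_side * pixels_per_side  # raw dimension (columns)
bytes_per_cell = 2            # ASSUMPTION: 2 bytes per stored pixel value

cells = n * p
memory_bytes = cells * bytes_per_cell

print(f"p      = {p:,} dimensions")
print(f"cells  = {cells:.3e}")
print(f"memory = {memory_bytes / 1e12:.1f} TB")
```

Without rounding, the matrix has about 10.6 × 10^12 cells and needs about 21 TB, which the slide rounds to 10 × 10^12 and 20 TB.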

  10. Intrinsic Dimension Estimators. Objective: to find a simple formula that uses nearest neighbor (NN) information to quickly estimate intrinsic dimension

  11. Intrinsic Dimension Estimators. Project description: through simulation, we will compare the effectiveness of three proposed NN intrinsic dimension estimators.

  12. Intrinsic Dimension Estimators. Note: traditional methods for estimating intrinsic dimension, such as PCA, fail on non-linear manifolds.

  13. Intrinsic Dimension Estimators: Nearest-Neighbor Methods • Regression Estimator: K. Pettis, T. Bailey, A. Jain & R. Dubes, 1979 • Maximum Likelihood Estimator: E. Levina & P. Bickel, 2005; D. MacKay & Z. Ghahramani, 2005

  14. Distance Matrix. $D_{i,j}$: Euclidean distance from $x_i$ to $x_j$. Example: $D_{2,3}$ is the distance from $x_2$ to $x_3$

  15. Nearest Neighbor Matrix. $T_{i,k}$: Euclidean distance between $x_i$ and the kth NN to $x_i$. Example: $T_{2,k}$ is the distance between $x_2$ and the kth NN to $x_2$
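
The two matrices can be built in a few lines. This is a minimal sketch using only the Python standard library; the function names `distance_matrix` and `nn_matrix` and the four sample points are hypothetical, chosen for illustration.

```python
import math

def distance_matrix(points):
    """D[i][j] = Euclidean distance from points[i] to points[j]."""
    return [[math.dist(a, b) for b in points] for a in points]

def nn_matrix(D, k_max):
    """T[i][k-1] = distance from observation i to its kth nearest neighbor."""
    T = []
    for i, row in enumerate(D):
        # Drop the self-distance D[i][i], then sort the remaining distances.
        others = sorted(d for j, d in enumerate(row) if j != i)
        T.append(others[:k_max])
    return T

# Hypothetical toy data: four observations in the plane.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
D = distance_matrix(points)
T = nn_matrix(D, k_max=2)
print(D[1][2])  # distance from the second point to the third
print(T[0])     # distances from the first point to its 1st and 2nd NN
```

Sorting every row costs on the order of n² log n operations, which is fine for a sketch but would need a spatial index for large n.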

  16. Notation • m: intrinsic dimension • p: dimension of the raw dataset • n: number of observations • f(x): probability density at observation x • $T_{x,k}$ or $T_k$: distance from observation x to its kth NN • N(t,x): number of observations within distance t of observation x

  17. Notation. [Figure: example point set with p = 2, m = 1, N = 12; labels: x, t, $t_1$, $t_2$, $t_3$; N(t,x) = 3.]

  18. NN Regression Estimator. Steps: (1) density of distance to kth NN (single observation, approximated as Poisson); (2a) expected distance to kth NN (single observation); (2b) sample-averaged distance to kth NN; (3) expected distance to sample-averaged kth NN

  19. Regression Estimator: Distance to Kth NN pdf. Trinomial distribution; binomial distribution. Assumptions • f(x) is constant • n is large • $f(x)V_t$ is small

  20. Regression Estimator. Approximate as Poisson. Expected distance to Kth NN
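
The formulas for these two slides are not present in this transcript. Under the stated assumptions (f(x) constant near x, n large, small ball volume), one standard way to write the Poisson approximation is sketched below. Here $V_m$ denotes the volume of the unit ball in m dimensions, so a ball of radius t has volume $V_m t^m$; this symbol and the exact form are assumptions for illustration, not the slide's own notation.

```latex
% Count of observations within distance t of x, approximated as Poisson
N(t,x) \sim \mathrm{Poisson}\!\left(n f(x) V_m t^{m}\right)

% Density of the distance T_k to the kth NN, from P(T_k > t) = P(N(t,x) \le k-1)
f_{T_k}(t) = \frac{\left(n f(x) V_m t^{m}\right)^{k-1}}{(k-1)!}\,
             e^{-n f(x) V_m t^{m}}\; n f(x) V_m\, m\, t^{m-1}

% Expected distance to the kth NN (a Gamma integral)
E[T_k] = \frac{\Gamma(k + 1/m)}{\Gamma(k)}\,\bigl(n f(x) V_m\bigr)^{-1/m}
```

The last line follows because $n f(x) V_m T_k^{m}$ has a Gamma(k, 1) distribution under the Poisson approximation.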

  21. $G_{k,m}$, $C_n$. Estimate m using simple linear regression
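
A minimal sketch of the regression step follows. It assumes the relation $\log(G_{k,m}\,\bar{T}_k) = (1/m)\log k + \log C_n$ with $G_{k,m} = k^{1/m}\,\Gamma(k)/\Gamma(k+1/m)$, which is the form usually attributed to the Pettis et al. estimator; the slide's own formula is missing here, so treat that form, the iteration scheme, and all function names as assumptions.

```python
import math
import random

def mean_knn_distances(points, k_max):
    """Sample-averaged distance to the kth NN, for k = 1..k_max."""
    n = len(points)
    sums = [0.0] * k_max
    for i, a in enumerate(points):
        d = sorted(math.dist(a, b) for j, b in enumerate(points) if j != i)
        for k in range(k_max):
            sums[k] += d[k]
    return [s / n for s in sums]

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def log_G(k, m):
    """log G_{k,m}, ASSUMING G_{k,m} = k^(1/m) * Gamma(k) / Gamma(k + 1/m)."""
    return math.log(k) / m + math.lgamma(k) - math.lgamma(k + 1.0 / m)

def regression_estimate(points, k_max=10, iterations=5):
    """Estimate m from the slope of log(G * mean T_k) against log k."""
    t_bar = mean_knn_distances(points, k_max)
    ks = list(range(1, k_max + 1))
    log_k = [math.log(k) for k in ks]
    log_t = [math.log(t) for t in t_bar]
    m = 1.0 / slope(log_k, log_t)        # first pass: treat G_{k,m} as 1
    for _ in range(iterations):          # G_{k,m} depends on m, so refit
        y = [log_G(k, m) + lt for k, lt in zip(ks, log_t)]
        m = 1.0 / slope(log_k, y)
    return m

random.seed(0)
# Hypothetical test data: a flat 2-D sheet embedded in 3-D (p = 3, m = 2).
sheet = [(random.random(), random.random(), 0.0) for _ in range(400)]
print(round(regression_estimate(sheet), 2))
```

The slope of the fitted line estimates 1/m, so the estimate is its reciprocal. The refit loop is needed only because the correction term depends on the unknown m.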

  22. Ex: Swiss Roll Dataset. Fitted slope = 0.49, which corresponds to an intrinsic dimension estimate of about 1/0.49 ≈ 2.0

  23. Datasets • Gaussian Sphere: Raw Dim = 3, Int Dim = 3 • Swiss Roll: Raw Dim = 3, Int Dim = 2 • Dbl Swiss Roll: Raw Dim = 3, Int Dim = 2 • Faces: Raw Dim = 4096, Int Dim ~ 3 to 5
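
For readers who want to reproduce similar test data, two of these datasets can be simulated as sketched below. The parameterizations are assumptions: the Gaussian sphere is taken to be a standard 3-D normal sample, and the Swiss roll uses a commonly seen spiral parameterization. The presentation does not state which ones were actually used.

```python
import math
import random

def gaussian_sphere(n, p=3, rng=random):
    """n draws from a standard p-dimensional Gaussian (intrinsic dim = p)."""
    return [tuple(rng.gauss(0.0, 1.0) for _ in range(p)) for _ in range(n)]

def swiss_roll(n, rng=random):
    """n points on a rolled-up 2-D sheet in 3-D (intrinsic dim = 2).

    ASSUMED parameterization: angle t in [1.5*pi, 4.5*pi], height h in [0, 21].
    """
    pts = []
    for _ in range(n):
        t = 1.5 * math.pi * (1.0 + 2.0 * rng.random())  # position along the roll
        h = 21.0 * rng.random()                          # position across the roll
        pts.append((t * math.cos(t), h, t * math.sin(t)))
    return pts

random.seed(0)
roll = swiss_roll(1000)
ball = gaussian_sphere(1000)
print(len(roll), len(roll[0]))  # number of observations n, raw dimension p
```

Each Swiss roll point is generated from two parameters (t and h), which is why its intrinsic dimension is 2 even though it lives in 3 dimensions.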

  24. Results: Regression Estimator (K = N / 100) • Gaussian Sphere ~ 3.0 • Swiss Roll ~ 2.0 • Dbl Swiss Roll ~ 2.0 • Faces ~ 3.5

  25. NN Maximum Likelihood Estimator. Steps: counting process, binomial (approximated as Poisson); joint counting probability; joint occurrence density; log-likelihood function

  26. Maximum Likelihood Estimator. N(t,x) = number of observations within distance t of x. The number of observations between distances r and s is binomial

  27. Maximum Likelihood Estimator

  28. Joint pdf of Distances to K NN

  29. Log-Likelihood Function
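
The log-likelihood itself is not present in this transcript. For orientation, the form usually given for the nearest-neighbor MLE treats N(t,x), for 0 ≤ t ≤ R, as a Poisson process; the sketch below follows that approach. The symbols $\lambda(t)$, $V_m$ (volume of the unit ball in m dimensions) and R are assumptions introduced here, not the slide's notation.

```latex
% Rate of the approximating Poisson process
\lambda(t) = n f(x)\, V_m\, m\, t^{m-1}

% Log-likelihood of the observed neighbor distances within radius R of x
L(m) = \int_0^R \log \lambda(t)\, dN(t,x) \;-\; \int_0^R \lambda(t)\, dt

% Maximizing over m (and the unknown density level) with R set to T_{x,k}
\hat{m}_k(x) = \left[\frac{1}{k-1}\sum_{j=1}^{k-1}
               \log\frac{T_{x,k}}{T_{x,j}}\right]^{-1}
```

The density level f(x) drops out of the estimate, so only ratios of neighbor distances are needed.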

  30. E. Levina & P. Bickel: averaging over N observations. D. MacKay & Z. Ghahramani: averaging inverses over N observations (using MLE)
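
The difference between the two averaging schemes is easiest to see in code. This is a minimal sketch, assuming the per-observation estimate is the inverse of the mean of log(T_k / T_j) over j = 1..k-1; the function names and the test data are hypothetical, and the code assumes k ≥ 2 and no duplicate points.

```python
import math
import random

def knn_distances(points, k):
    """For each observation, the sorted distances to its k nearest neighbors."""
    out = []
    for i, a in enumerate(points):
        d = sorted(math.dist(a, b) for j, b in enumerate(points) if j != i)
        out.append(d[:k])
    return out

def mean_log_ratio(t):
    """(1/(k-1)) * sum over j < k of log(T_k / T_j), i.e. 1 / m_hat_k(x)."""
    k = len(t)
    return sum(math.log(t[-1] / tj) for tj in t[:-1]) / (k - 1)

def mle_levina_bickel(points, k):
    """Average the per-observation estimates m_hat_k(x_i)."""
    inv = [mean_log_ratio(t) for t in knn_distances(points, k)]
    return sum(1.0 / v for v in inv) / len(inv)

def mle_mackay_ghahramani(points, k):
    """Average the inverses 1 / m_hat_k(x_i), then invert once."""
    inv = [mean_log_ratio(t) for t in knn_distances(points, k)]
    return len(inv) / sum(inv)

random.seed(0)
# Hypothetical test data: a flat 2-D sheet embedded in 3-D (p = 3, m = 2).
sheet = [(random.random(), random.random(), 0.0) for _ in range(400)]
print(round(mle_levina_bickel(sheet, k=10), 2))
print(round(mle_mackay_ghahramani(sheet, k=10), 2))
```

The second scheme is the harmonic mean of the per-observation estimates, so it can never exceed the first, which is their arithmetic mean. The gap is largest when K is small.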

  31. Results: MLE Estimator (Revised MacKay & Ghahramani) (K = N / 100) • Gaussian Sphere ~ 3.0 • Swiss Roll ~ 2.0 • Dbl Swiss Roll ~ 2.1 • Faces ~ 3.5

  32. Comparison

  33. Comparison

  34. Comparison

  35. Comparison

  36. Comparison

  37. Comparison

  38. Isomap

  39. Summary • The regression and revised MLE estimators share similar characteristics when intrinsic dimension is small • As intrinsic dimension increases, the estimators become more dependent on K • Distribution type does not appear to be highly influential when the intrinsic dimension is small

  40. Thank You! • Dr. Kang James & Dr. Barry James • Dr. Steve Trogdon

  41. Example: Swiss Roll Data, Int Dim = 2
