
Kernel Analysis of Deep Networks

This paper explores the role of depth in deep networks and measures the goodness of their representations using kernel methods. It investigates the simplicity, dimensionality, and accuracy of representations, and provides insights into how architecture affects representation. The experimental results and analysis shed light on the black-box nature of deep learning and suggest directions for future research.


Presentation Transcript


  1. Kernel Analysis of Deep Networks. By: Grégoire Montavon, Mikio L. Braun, Klaus-Robert Müller (Technical University of Berlin), JMLR 2011. Presented by: Behrang Mehrparvar (University of Houston), April 8th, 2014

  2. Roadmap • Deep Learning • Goodness of Representations • Measuring goodness • Role of architecture

  3. Deep Learning? • Distributed representation • Fewer examples needed per region • Capture global structure • Depth • Efficient representation • Abstraction • Higher-level features • Flexibility • Incorporate prior knowledge

  4. Distributed Representation [1]

  5. Depth [2]

  6. Abstraction [?]

  7. Problem Specification • Deep Learning is still a Black Box! • Theoretical aspect • e.g. studying depth in sum-product networks • Analytical arguments • e.g. analysis of depth • Experimental results • e.g. performance in application domains • Visualization • e.g. measuring invariance

  8. Kernel Methods • Decouple learning algorithms from the data representation • Kernel operator: • Measures similarity between points • Encodes all the prior knowledge about the learning problem • In this paper: • Not a learning machine • An abstraction tool to model the deep network
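A kernel operator replaces explicit features with pairwise similarities. As a minimal sketch, assuming a Gaussian (RBF) kernel (a common choice for this kind of analysis; the helper name `rbf_kernel` and the width `sigma` are illustrative and not taken from the slides), the similarity matrix for a set of points can be computed as:

```python
import numpy as np


def rbf_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix of the rows of X:
    K[i, j] = exp(-||X[i] - X[j]||^2 / (2 * sigma^2)).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances via broadcasting: shape (n, n)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


X = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
K = rbf_kernel(X, sigma=1.0)
print(np.round(K, 3))
```

Nearby points get a similarity close to 1 and distant points a similarity close to 0; `sigma` sets the distance scale at which similarity decays.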

  9. Kernel Methods (cont.) • Kernel Methods • model the deep network • Used to quantify ... • the goodness of representations • the evolution of good representations

  10. Hypothesis • Representations become simpler and more accurate with depth • The structure of the network (its restrictions) determines how quickly good representations form • Representations evolve from the distribution of pixels to the distribution of classes

  11. Problem Specification • Problem: role of depth in the goodness of representations • Challenge: defining and measuring goodness • Solution: • Simplicity • Dimensionality: number of kernel principal components (PCs) • Number of local variations • Accuracy • Classification error

  12. Hypothesis (Cont.)

  13. Method • Train the deep network • Infer the representation at each layer • Apply kernel PCA to each layer's representation • Project the data points onto the first d eigenvectors • Analyze the results
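The method steps can be sketched as follows, under stated assumptions: a Gaussian kernel of width `sigma`, a least-squares linear classifier on one-hot targets, and training error as the accuracy measure. `R` stands in for one layer's representation (here synthetic two-class data rather than activations of a trained network), and `kpca_error_curve` is a hypothetical helper. The paper's exact protocol (held-out error, choice of `sigma`, model fitted on the projections) may differ.

```python
import numpy as np


def rbf_kernel(X, sigma):
    """Gaussian kernel matrix of the rows of X with width sigma."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def kpca_error_curve(R, y, sigma, dims):
    """For each d in dims: project R onto its d leading kernel PCs, fit a
    least-squares linear model to one-hot labels, return the training error."""
    n = R.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    Kc = H @ rbf_kernel(R, sigma) @ H              # kernel centered in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)          # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # leading components first
    classes = np.unique(y)
    T = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets
    errors = []
    for d in dims:
        # Projection of training point i onto PC k is sqrt(lambda_k) * u_k[i]
        Z = eigvecs[:, :d] * np.sqrt(np.maximum(eigvals[:d], 0.0))
        Z1 = np.hstack([Z, np.ones((n, 1))])       # add a bias column
        W, *_ = np.linalg.lstsq(Z1, T, rcond=None)
        pred = classes[np.argmax(Z1 @ W, axis=1)]
        errors.append(float(np.mean(pred != y)))
    return errors


# Synthetic stand-in for one layer's representation: two separated blobs
rng = np.random.default_rng(0)
R = np.vstack([rng.normal(-2.0, 0.5, (30, 2)), rng.normal(2.0, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
errs = kpca_error_curve(R, y, sigma=1.0, dims=[1, 2, 5, 10])
print(errs)
```

Repeating this for every layer gives one error curve per layer: a layer whose error reaches its minimum with fewer components has a simpler (lower-dimensional) representation of the task, and the level the curve reaches reflects its accuracy.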

  14. Method (Analysis)

  15. Why Kernels? • Incorporating prior knowledge • Measurable simplicity and accuracy • Theoretical framework and convergence bounds [3] • Flexibility

  16. Dimensionality and Complexity

  17. Dimensionality and Complexity (cont.)

  18. Intuition • Accuracy • Task-relevant information • Simplicity • Number of allowed local variations in the input space • However, does not account for domain-specific regularities • Robust to the number of samples (unlike the number of support vectors)
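One way to illustrate the link between simplicity and local variations, assuming a Gaussian kernel: its width `sigma` controls how many local variations the kernel allows, which shows up as the number of kernel principal components needed to capture most of the variance in feature space. The 90% threshold and the helper `components_needed` are arbitrary choices for this sketch.

```python
import numpy as np


def rbf_kernel(X, sigma):
    """Gaussian kernel matrix of the rows of X with width sigma."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def components_needed(X, sigma, frac=0.9):
    """Number of leading kernel PCs needed to reach `frac` of the total
    feature-space variance (the trace of the centered kernel matrix)."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    eigvals = np.linalg.eigvalsh(H @ rbf_kernel(X, sigma) @ H)[::-1]
    eigvals = np.maximum(eigvals, 0.0)             # clip round-off negatives
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, frac) + 1)


rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(60, 2))
for sigma in [0.01, 0.1, 1.0, 10.0]:
    print(sigma, components_needed(X, sigma))
```

A narrow kernel treats almost every point as unrelated to its neighbours, so the centered spectrum is nearly flat and many components are needed; a wide kernel behaves almost like a linear kernel on this 2-D data, so about two components suffice.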

  19. Effects of Kernel mapping

  20. Experiment setup • Datasets • MNIST • CIFAR • Tasks • Supervised learning • Transfer learning • Architectures • Multilayer perceptron (MLP) • Pretrained multilayer perceptron (PMLP) • Convolutional neural networks (CNN)

  21. Effect of Settings

  22. Effect of Depth (Hyp. 1)

  23. Observation • Higher layers • More accurate representations • More simple representations

  24. Architectures • Multilayer perceptrons (MLP) • No preconditioning of the learning problem • Prior: none • Pretrained multilayer perceptrons (PMLP) • Better represent the underlying structure of the input • Already contain part of the solution • Prior: generative model of the input • Convolutional neural networks (CNN) • Prior: spatial invariance

  25. Multilayer Perceptron [4]

  26. Convolutional Neural Networks [4]

  27. Effect of Architecture (Hyp. 2)

  28. Observation • MNIST: • MLP: discrimination is solved greedily • PMLP and CNN: postpone discrimination to the last layers • CIFAR: • MLP: does not discriminate until the last layer • PMLP and CNN: spread discrimination over more layers • Why? • Good observation, but no explanation! • Hints: dataset, priors, etc.?

  29. Effect of Architecture (Cont.)

  30. Observation • Regularities in PMLP and CNN • Facilitate the construction of a structured solution • Control the rate of discrimination at every level

  31. Label Contribution of PCs
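The slide's figure is not recoverable here, but a quantity in the spirit of Braun et al. [3] can be sketched: express the centered one-hot labels in the basis of kernel principal components and measure what fraction of the label signal falls on each component. This is an assumed formulation for illustration, not necessarily the exact quantity plotted on the slide; `label_contributions` is a hypothetical helper and the data are synthetic.

```python
import numpy as np


def rbf_kernel(X, sigma):
    """Gaussian kernel matrix of the rows of X with width sigma."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def label_contributions(R, y, sigma):
    """Fraction of the centered one-hot label signal lying along each
    kernel principal component, leading components first."""
    n = R.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    _, eigvecs = np.linalg.eigh(H @ rbf_kernel(R, sigma) @ H)
    eigvecs = eigvecs[:, ::-1]                     # descending eigenvalue order
    classes = np.unique(y)
    T = (y[:, None] == classes[None, :]).astype(float)
    T -= T.mean(axis=0)                            # center each label column
    coeffs = eigvecs.T @ T                         # labels in the orthonormal eigenbasis
    contrib = (coeffs ** 2).sum(axis=1)
    return contrib / contrib.sum()


rng = np.random.default_rng(0)
R = np.vstack([rng.normal(-2.0, 0.5, (30, 2)), rng.normal(2.0, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
c = label_contributions(R, y, sigma=1.0)
print(np.round(c[:5], 3))
```

Because the eigenvectors form an orthonormal basis, the contributions sum to 1. When most of the label signal concentrates in the leading components, a simple model on a few components suffices, which is the sense in which a representation is both simple and accurate.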

  32. Comments • Strengths • Important and interesting problem • Simple and intuitive approach • Well-designed experiments • Good analysis of results • Weaknesses • Too many observations, e.g. the role of sigma (kernel width) in scale invariance • Observations are not explained

  33. Future work? • Experiments on unsupervised learning • Explaining the results • Analysis of biological neural systems?

  34. References • [1] Bengio, Yoshua, and Olivier Delalleau. "On the expressive power of deep architectures." Algorithmic Learning Theory. Springer Berlin Heidelberg, 2011. • [2] Poon, Hoifung, and Pedro Domingos. "Sum-product networks: A new deep architecture." Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on. IEEE, 2011. • [3] Braun, Mikio L., Joachim M. Buhmann, and Klaus-Robert Müller. "On relevant dimensions in kernel feature spaces." The Journal of Machine Learning Research 9 (2008): 1875-1908. • [4] http://deeplearning.net/

  35. Thanks ...
