Deep Learning Tutorial Mitesh M. Khapra IBM Research India (Ideas and material borrowed from Richard Socher's tutorial @ ML Summer School 2014, Yoshua Bengio's tutorial @ ML Summer School 2014 & Hugo Larochelle's lecture videos & slides)
Roadmap • What? • Why? • How? • Where?
Roadmap • What are Deep Neural Networks? • Why? • How? • Where?
Roadmap • What are Deep Neural Networks? • Why should I be interested in Deep Learning? • How? • Where?
Roadmap • What are Deep Neural Networks? • Why should I be interested in Deep Learning? • How do I make a Deep Neural Network work? • Where?
Roadmap • What are Deep Neural Networks? • Why should I be interested in Deep Learning? • How do I train a Deep Neural Network? • Where?
Roadmap • What are Deep Neural Networks? • Why should I be interested in Deep Learning? • How do I train a Deep Neural Network? • Where can I find additional material?
A typical machine learning example: data → feature extraction → feature vector → label. Hand-crafted features might include the number of positive words, number of negative words, length of review, author name, bag of words, etc.
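To make the hand-crafted feature step concrete, here is a minimal Python sketch (not from the tutorial) that computes a few such features for a review; the word lists and the extract_features helper are illustrative assumptions, and the bag-of-words and author-name features are omitted for brevity.

```python
# Illustrative hand-crafted feature extraction for a movie review.
# The word lists and function name below are assumptions for this sketch.
POSITIVE_WORDS = {"good", "great", "excellent", "enjoyable"}
NEGATIVE_WORDS = {"bad", "boring", "terrible", "dull"}

def extract_features(review):
    """Turn raw text into a fixed-length feature vector."""
    tokens = review.lower().split()
    num_pos = sum(t in POSITIVE_WORDS for t in tokens)  # number of positive words
    num_neg = sum(t in NEGATIVE_WORDS for t in tokens)  # number of negative words
    length = len(tokens)                                # length of the review
    return [num_pos, num_neg, length]

print(extract_features("a great and enjoyable movie never boring"))  # -> [2, 1, 7]
```

A classifier is then trained on these feature vectors to predict the label (e.g., positive or negative review).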
So, where does deep learning fit in? • Machine Learning: hand-crafted features, optimize weights to improve prediction • Representation Learning: automatically learn features • Deep Learning: automatically learn multiple levels of features (from Richard Socher's tutorial @ ML Summer School, Lisbon)
The basic building block: the single artificial neuron
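As a rough illustration of the computation a single artificial neuron performs (a weighted sum of its inputs plus a bias, passed through a nonlinearity), here is a minimal numpy sketch; the sigmoid activation and the example numbers are assumptions made for illustration.

```python
import numpy as np

# A single artificial neuron: weighted sum of inputs plus a bias,
# passed through a nonlinear activation (here the logistic sigmoid).
def neuron(x, w, b):
    pre_activation = np.dot(w, x) + b              # a = w . x + b
    return 1.0 / (1.0 + np.exp(-pre_activation))   # h = sigmoid(a), in (0, 1)

x = np.array([0.5, -1.2, 3.0])   # input vector
w = np.array([0.8, 0.1, -0.4])   # one weight per input
b = 0.2                          # bias
print(neuron(x, w, b))
```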
Okay, so what can I use it for? • For binary classification problems, by treating the neuron's output as the predicted class (e.g., thresholding it) • Works when the data is linearly separable (image from Hugo Larochelle's slides)
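For example, a single neuron can learn the linearly separable AND function using the classic perceptron update rule; the sketch below is an illustration under that assumption, not code from the tutorial.

```python
import numpy as np

# Perceptron-style training on the (linearly separable) AND function:
# predict 1 when w . x + b > 0, and 0 otherwise.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])           # AND labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(20):                  # a few passes over the data suffice here
    for xi, yi in zip(X, y):
        pred = 1 if np.dot(w, xi) + b > 0 else 0
        w += lr * (yi - pred) * xi   # perceptron update rule
        b += lr * (yi - pred)

print([1 if np.dot(w, xi) + b > 0 else 0 for xi in X])  # -> [0, 0, 0, 1]
```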
What are its limitations? • Fails when the data is not linearly separable… (images from Hugo Larochelle's slides) • …unless the input is suitably transformed
A neural network for XOR (a multi-layered neural network) Wait…, are you telling me that I will always have to meditate on the data and then decide the transformation/network? No, definitely not. The XOR example is only to give the intuition. The key takeaway is that by adding more layers you can make the data separable. Let's spend some more time understanding this…
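As a concrete illustration of the point above, here is a small numpy sketch of a two-layer network that computes XOR; the hard-threshold activation and the hand-picked weights are assumptions (one of many choices that work), chosen so the hidden layer maps the inputs into a space where a single output neuron can separate them.

```python
import numpy as np

def step(a):
    return (a > 0).astype(float)       # hard-threshold activation

# Hand-wired two-layer network for XOR: the hidden layer computes
# roughly OR(x1, x2) and AND(x1, x2); the output fires when OR is on
# but AND is off, which is exactly XOR.
def xor_net(x):
    W1 = np.array([[1.0, 1.0],         # hidden unit 1 ~ OR
                   [1.0, 1.0]])        # hidden unit 2 ~ AND
    b1 = np.array([-0.5, -1.5])
    h = step(W1 @ x + b1)              # hidden layer transforms the input...
    w2 = np.array([1.0, -1.0])         # ...so one output neuron can separate it
    b2 = -0.5
    return step(w2 @ h + b2)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", xor_net(np.array(x, dtype=float)))   # 0, 1, 1, 0
```

In practice the weights would be learned by training; the point is only that the extra layer makes the XOR data separable.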
Capacity of a multi-layer network (graphs from Pascal Vincent's slides)
Capacity of a multi-layer network (image from Pascal Vincent’s slides)
Capacity of a multi-layer network In particular, we can find a separator for the XOR problem (images from Pascal Vincent's slides) • Universal Approximation Theorem (Hornik, 1991): "a single hidden layer neural network with a linear output unit can approximate any continuous function arbitrarily well, given enough hidden units"
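To illustrate the spirit of the theorem, the following numpy sketch fits sin(x) with a single hidden layer of tanh units and a linear output unit, trained by plain gradient descent; the layer size, learning rate and iteration count are arbitrary assumptions made for the example.

```python
import numpy as np

# Fit sin(x) with one hidden layer (tanh) and a linear output unit.
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(x)

H = 20                                   # number of hidden units
W1, b1 = rng.normal(0, 1, (1, H)), np.zeros(H)
W2, b2 = rng.normal(0, 1, (H, 1)), np.zeros(1)

lr = 0.01
for _ in range(5000):
    h = np.tanh(x @ W1 + b1)             # hidden layer
    pred = h @ W2 + b2                   # linear output unit
    err = pred - y
    # Squared-error gradient (up to a constant factor), backpropagated through both layers.
    gW2, gb2 = h.T @ err / len(x), err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    gW1, gb1 = x.T @ dh / len(x), dh.mean(0)
    W1, b1, W2, b2 = W1 - lr * gW1, b1 - lr * gb1, W2 - lr * gW2, b2 - lr * gb2

print("mean squared error:", float(np.mean(err ** 2)))  # shrinks as training proceeds
```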
Let's take a minute here… If "a single hidden layer neural network" is enough, then why go deeper? Hand-crafted representations vs. automatically learned representations
Multiple layers = multiple levels of features But why would I be interested in learning multiple levels of representations? Let's see where the motivation comes from…
The brain analogy: Layer 1, Layer 2 and Layer 3 representations capture increasingly abstract structure, from parts such as the nose, mouth and eyes up to the whole face (idea from Hugo Larochelle's slides)
YAWN!!! Enough with the brain tampering. Just tell me: why should I be interested in Deep Learning? ("Show me the money")
Used in a wide variety of applications (from Y. Bengio's MLSS 2014 slides)
Industrial-scale success stories: Speech Recognition, Object Recognition, Face Recognition, Cross-Language Learning, Machine Translation, Text Analytics. Dramatic improvements reported in some cases. (Disclaimer: some nodes and edges may be missing due to limited public knowledge.)
Some more success stories (from Y. Bengio's MLSS 2014 slides)
Let me see if I understand this correctly… • Speech Recognition, Machine Translation, etc. are more than 50 years old • Single artificial neurons have been around for more than 50 years • So has Deep Learning also been around for 50+ years? No, even deep neural networks have been around for many, many years, but prior to 2006 training deep nets was unsuccessful
So what has changed since 2006? • New methods for unsupervised pre-training have been developed • More efficient parameter estimation methods • Better understanding of model regularization • Faster machines and more data help DL more than other algorithms (from Y. Bengio's MLSS 2014 slides)
Recap: the single artificial neuron
Switching to slides corresponding to lecture 2 from Hugo Larochelle’s course http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html
Some pointers to additional material • http://deeplearning.net/ • http://info.usherbrooke.ca/hlarochelle/neural_networks/content.html