
Lecture 8 Why deep?

Lecture 8: Why deep? We explain deep learning from two aspects: experimental evidence and theoretical proof. First, experiments show that deeper is better. This is not surprising: more parameters, better performance?

foleyj

Presentation Transcript


  1. Lecture 8: Why deep? We explain deep learning from two aspects: experimental evidence and theoretical proof.

  2. 1. Experiments show deeper is better. This is not surprising: more parameters, better performance? Source: Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech. 2011.

  3. Fat + Short vs. Thin + Tall. With the same number of parameters, which one is better: a shallow, wide network or a deep, narrow one? [Figure: a shallow network and a deep network side by side.]
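To make "the same number of parameters" concrete, here is a minimal sketch that counts the weights and biases of a fully connected network. The layer sizes are hypothetical and chosen only so the two totals come out close; they are not taken from the experiments cited in the slides.

```python
def mlp_param_count(layer_sizes):
    """Count weights and biases of a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    Each layer contributes (n_in + 1) * n_out parameters (weights plus biases).
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical sizes, picked only to give nearly equal parameter budgets.
fat_short = [100, 1000, 10]           # one wide hidden layer
thin_tall = [100, 209, 209, 209, 10]  # three narrower hidden layers

print(mlp_param_count(fat_short))  # 111010
print(mlp_param_count(thin_tall))  # 110989
```

Both networks spend roughly 111,000 parameters, so any difference in accuracy between them cannot be attributed to parameter count alone.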

  4. Fat + Short vs. Thin + Tall: why? Source: Seide, Frank, Gang Li, and Dong Yu. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." Interspeech. 2011.

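  5. They call this “modularization”. Without it, the image is fed directly to four separate classifiers: Classifier 1 (girls with long hair), Classifier 2 (boys with long hair), Classifier 3 (girls with short hair), Classifier 4 (boys with short hair). The classifier for boys with long hair is weak because it lacks data.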

  6. Modularization. Intuitive example: the image first goes to basic classifiers for the attributes ("Boy or girl?" and "Long or short?"). Each basic classifier can have sufficient training examples.

  7. Modularization or deeper reasons? The image goes to the basic classifiers ("Boy or girl?" and "Long or short?"), which are shared as modules by the following classifiers: Classifier 1 (girls with long hair), Classifier 2 (boys with long hair), Classifier 3 (girls with short hair), Classifier 4 (boys with short hair). Because they build on the shared modules, these classifiers can be trained with little data: little data is fine.
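The sharing idea can be sketched in a few lines. The example below assumes the two basic classifiers output probabilities and combines them with a simple product rule, which treats the two attributes as independent. That rule is an illustrative simplification, not the method in the slides, where the following classifiers are trained on top of the basic ones. The function name and the input values are hypothetical.

```python
def combine_attributes(p_girl, p_long):
    """Score the four classes from two shared basic classifiers.

    p_girl: output of the "Boy or girl?" classifier (probability of girl).
    p_long: output of the "Long or short?" classifier (probability of long hair).
    Assumption: a plain product rule stands in for the trained later classifiers.
    """
    return {
        "girl, long hair": p_girl * p_long,
        "boy, long hair": (1 - p_girl) * p_long,
        "girl, short hair": p_girl * (1 - p_long),
        "boy, short hair": (1 - p_girl) * (1 - p_long),
    }


# Hypothetical outputs of the basic classifiers for one image.
scores = combine_attributes(p_girl=0.2, p_long=0.9)
best = max(scores, key=scores.get)
print(best)  # boy, long hair
```

The rare class "boy, long hair" gets a usable score even though no classifier was trained on that combination directly: each basic classifier only needs examples of its own attribute.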

  8. Hidden nodes, modularization, features → less training data? The modularization (the hidden nodes) is learned automatically. The 1st layer holds the most basic classifiers; the 2nd layer uses the 1st layer as modules to build classifiers; the next layer uses the 2nd layer as modules, and so on. [Figure: a multi-layer network with each layer annotated.]

  9. Image understanding: levels of features. The 1st layer holds the most basic classifiers; the 2nd layer uses the 1st layer as modules to build classifiers; later layers use the 2nd layer as modules for objects. [Figure: a multi-layer network with each layer annotated.] Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833).

  10. 2. Theoretical proof: deeper is better. Reference: M. Telgarsky, "The benefit of depth in neural networks", 2016. • We give an informal version of Telgarsky's proof. • Claim 1. A function with few oscillations cannot fit a function with many oscillations. Proof by picture. [Figure: the two functions overlaid, with stars marking the regions where they disagree.]

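  11. Claim 2. ReLU can make exponentially many oscillations. Let $\mathrm{ReLU}(x) := \max\{0, x\}$ and $h(x) := \mathrm{ReLU}(\mathrm{ReLU}(2x) - \mathrm{ReLU}(4x - 2))$. Then $h(x) = 2x$ for $x \in [0, 1/2]$, $h(x) = 2(1 - x)$ for $x \in [1/2, 1]$, and $h(x) = 0$ otherwise, so the graph of $h$ is a single triangle through $(0,0)$, $(1/2,1)$, and $(1,0)$. [Figure: graphs of $h(x)$, $h \circ h(x)$, and $h \circ h \circ h(x)$.] $h$ has 1 peak, and the $k$-fold composition $h^k := h \circ \cdots \circ h$ has $2^{k-1}$ peaks.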
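Claim 2 can be checked numerically. The sketch below implements h exactly as defined on the slide and counts the strict local maxima of its k-fold composition on a dyadic grid. The helper names `h_iter` and `count_peaks` and the grid resolution are choices made for this illustration.

```python
def relu(x):
    return max(0.0, x)


def h(x):
    """Triangle map from the slide: ReLU(ReLU(2x) - ReLU(4x - 2))."""
    return relu(relu(2 * x) - relu(4 * x - 2))


def h_iter(x, k):
    """Apply h to x k times, i.e. evaluate the k-fold composition."""
    for _ in range(k):
        x = h(x)
    return x


def count_peaks(k):
    """Count strict local maxima of the k-fold composition on [0, 1].

    The grid step 1 / 2**(k + 2) places four grid intervals on every
    linear piece, and all grid values are exact in floating point.
    """
    n = 2 ** (k + 2)
    ys = [h_iter(i / n, k) for i in range(n + 1)]
    return sum(1 for i in range(1, n) if ys[i - 1] < ys[i] > ys[i + 1])


for k in range(1, 7):
    print(k, count_peaks(k))  # the count doubles with each extra composition
```

For k = 1 to 6 the counts are 1, 2, 4, 8, 16, 32, matching the $2^{k-1}$ peaks stated in the claim. Each extra composition costs only a constant number of ReLU nodes, which is where the exponential gain from depth comes from.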

  12. Claim 3. Few layers implies few oscillations. Call a function s-affine if it is piecewise affine with s pieces. Let f be s-affine and g be t-affine. Same layer: the sum f + g is at most (s+t−1)-affine. Composition (next layer): f ∘ g is at most (st)-affine. ReLU is 2-affine, so after k levels the network can be exp(k)-affine. Hence with O(1) layers, one needs exponentially many nodes to approximate k layers.
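The last step of Claim 3 can be spelled out using only the two counting rules above. The following is an informal sketch under stated assumptions, not Telgarsky's exact statement: the network has a scalar input, $L$ hidden layers of width $m$ (both symbols introduced here), and $p_\ell$ denotes an upper bound on the number of affine pieces of any node in layer $\ell$. The target is the $k$-fold composition of the triangle map, which has $2^{k-1}$ peaks and therefore $2^k$ affine pieces on $[0,1]$.

```latex
\begin{align*}
p_0 &= 1
  && \text{(the input $x$ is 1-affine)} \\
p_{\ell+1} &\le 2\bigl(m\,p_\ell - (m-1)\bigr) \le 2m\,p_\ell
  && \text{(sum of $m$ functions, then ReLU)} \\
p_L &\le (2m)^L
  && \text{(unrolling the recursion)} \\
\#\text{pieces of the output} &\le m\,(2m)^L
  && \text{(the output sums $m$ nodes)} \\
m\,(2m)^L \ge 2^k &\iff m \ge 2^{(k-L)/(L+1)}
  && \text{(needed to reach $2^k$ pieces)}
\end{align*}
```

With $L = O(1)$ the required width $m$ grows exponentially in $k$, whereas a deep network reaches $2^k$ pieces with $O(k)$ nodes by stacking the triangle map $k$ times. Matching the piece count is only a necessary condition here; Claim 1 is what turns too few pieces into an approximation error.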

  13. What does this mean? • There exists a function that can be learned by a “deep” neural network with a polynomial number of nodes, but that requires exponentially many nodes for any “shallow” neural network. • Open question: this only says that such a function exists; it does not tell us which function. The function might be one we do not care about.
