1 / 100

Lecture 5: Review of Neural Networks

This lecture delves into the basic concepts of artificial neural networks, which are simplified parametric models inspired by the human brain. Neural networks consist of neurons that store information to be learned through weights or states. Various architectures cater to different tasks and are scalable to handle massive data. Key components such as activation functions, loss functions, output units, and architecture are explored. The lecture emphasizes the importance of non-linearity in activation functions and the need to maintain gradients through hidden units. It compares linear models with deep learning, highlighting the efficiency and capacity of deep learning in encoding prior beliefs and generalizing well. The concept of the universal approximation theorem is discussed, suggesting that deeper neural networks can better represent functions with improved accuracy compared to shallow networks.

dorothyw
Download Presentation

Lecture 5: Review of Neural Networks

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 5: Review of Neural Networks

  2. Artificial Neural Networks Machine Learning Algorithm Very simplified parametric models of our brain Networks of basic processing units: neurons Neurons store information which has to be learned (the weights or the state for RNNs) Many types of architectures to solve different tasks They can scale to massive data ... ... ... ... ... 99.9% dog

  3. Outline Anatomy of a NN Design choices Learning

  4. Review of Feed Forward Artificial Neural Networks Anatomy of a NN Design choices • Activation function • Loss function • Output units • Architecture Learning

  5. Review of Feed Forward Artificial Neural Networks Anatomy of a NN Design choices • Activation function • Loss function • Output units • Architecture Learning

  6. Anatomy of artificial neural network (ANN) neuron node output input Y X

  7. Anatomy of artificial neural network (ANN) neuron node output input Affine transformation Activation Y X h We will talk later about the choice of activation function.

  8. Anatomy of artificial neural network (ANN) Input layer hidden layer output layer Loss function Output function We will talk later about the choice of the output layer and the loss function.

  9. Anatomy of artificial neural network (ANN) Input layer hidden layer 2 output layer hidden layer 1

  10. Anatomy of artificial neural network (ANN) Input layer hidden layer n output layer hidden layer 1 … … We will talk later about the choice of the number of layers.

  11. Anatomy of artificial neural network (ANN) Input layer hidden layer n 3 nodes output layer hidden layer 1, 3 nodes …

  12. Anatomy of artificial neural network (ANN) Input layer hidden layer n output layer hidden layer 1, m nodes m nodes … … We will talk later about the choice of the number of nodes.

  13. Anatomy of artificial neural network (ANN) Input layer hidden layer n output layer hidden layer 1, m nodes m nodes Number of inputs d … … Number of inputs is specified by the data

  14. Review of Feed Forward Artificial Neural Networks Anatomy of a NN Design choices • Activation function • Loss function • Output units • Architecture Learning

  15. Activation function The activation function should: Ensures not linearity Ensure gradients remain large through hidden unit Common choices are Sigmoid Relu, leaky ReLU, Generalized ReLU, MaxOut softplus tanh swish

  16. Activation function The activation function should: Ensures not linearity Ensure gradients remain large through hidden unit Common choices are Sigmoid Relu, leaky ReLU, Generalized ReLU, MaxOut softplus tanh swish

  17. Beyond Linear Models Linear models • Can be fit efficiently (via convex optimization) • Limited model capacity Alternative: Where is a non-linear transform

  18. Traditional ML Manually engineer • Domain specific, enormous human effort Generic transform • Maps to a higher-dimensional space • Kernel methods: e.g. RBF kernels • Over fitting: does not generalize well to test set • Cannot encode enough prior information

  19. Deep Learning Directly learn • where are parameters of the transform defines hidden layers Can encode prior beliefs, generalizes well

  20. Activation function The activation function should: Ensures not linearity Ensure gradients remain large through hidden unit Common choices are Sigmoid Relu, leaky ReLU, Generalized ReLU, MaxOut softplus tanh swish

  21. Review of Feed Forward Artificial Neural Networks Anatomy of a NN Design choices • Activation function • Loss function • Output units • Architecture Learning

  22. Loss Function Cross-entropy between training data and model distribution (i.e. negative log-likelihood) (y|x) Do not need to design separate loss functions. Gradient of cost function must be large enough

  23. Review of Feed Forward Artificial Neural Networks Anatomy of a NN Design choices • Activation function • Loss function • Output units • Architecture Learning

  24. Output Units

  25. Link function X X Y OUTPUT UNIT Y

  26. Output Units

  27. Output Units

  28. Link function multi-class problem X X Y OUTPUT UNIT

  29. Output Units

  30. Output Units

  31. Output Units

  32. Review of Feed Forward Artificial Neural Networks Anatomy of a NN Design choices • Activation function • Loss function • Output units • Architecture Learning

  33. Architecture

  34. Architecture (cont)

  35. Architecture (cont)

  36. Architecture (cont)

  37. Architecture (cont)

  38. Architecture (cont)

  39. Architecture (cont)

  40. Universal Approximation Theorem Think of Neural Network as function approximation. NN: One hidden layer is enough to represent an approximation of any function to an arbitrary degree of accuracy So why deeper? • Shallow net may need (exponentially) more width • Shallow net may overfit more

  41. What does an astronomer blow with gum? Hubbles

  42. Auditors: Volunteers

  43. Better Generalization with Depth (Goodfellow 2017)

  44. Large, Shallow Nets Overfit More (Goodfellow 2017)

  45. Why layers? Representation Representation Matters

  46. Learning Multiple Components

  47. Depth = Repeated Compositions

  48. Review of Feed Forward Artificial Neural Networks Anatomy of a NN Design choices • Activation function • Loss function • Output units • Architecture Learning

More Related