Deep Learning Tutorial Courtesy of Hung-yi Lee
Machine Learning Basics
Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed: methods that can learn from and make predictions on data.
[Diagram: labeled data → machine learning algorithm (training) → learned model; new data → learned model → prediction.]
Types of Learning
Supervised: learning with a labeled training set. Example: email classification with already-labeled emails.
Unsupervised: discover patterns in unlabeled data. Example: cluster similar documents based on their text.
Reinforcement learning: learn to act based on feedback/reward. Example: learn to play Go; reward: win or lose.
Typical tasks: classification, regression, clustering, anomaly detection, sequence labeling, ...
http://mbjoseph.github.io/2013/11/27/measure.html
ML vs. Deep Learning
Most machine learning methods work well because of human-designed representations and input features; ML then becomes just optimizing weights to best make a final prediction.
What is Deep Learning (DL)?
A machine learning subfield concerned with learning representations of data, exceptionally effective at learning patterns. Deep learning algorithms attempt to learn (multiple levels of) representation by using a hierarchy of multiple layers. If you provide the system tons of information, it begins to understand it and respond in useful ways.
Why is DL useful?
• Manually designed features are often over-specified, incomplete, and take a long time to design and validate.
• Learned features are easy to adapt and fast to learn.
• Deep learning provides a very flexible, (almost?) universal, learnable framework for representing world, visual, and linguistic information.
• Can learn both unsupervised and supervised.
• Enables effective end-to-end joint system learning.
• Can utilize large amounts of training data.
Around 2010, DL started outperforming other ML techniques, first in speech and vision, then NLP.
Image Classification: a core task in Computer Vision
Given an image and a fixed set of discrete labels {dog, cat, truck, plane, ...}, assign the image one label, e.g. cat. (Image by Nikita, licensed under CC-BY 2.0.)
The Problem: Semantic Gap
What the computer sees: an image is just a big grid of numbers in [0, 255], e.g. 800 x 600 x 3 (3 RGB channels).
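As a quick illustration of that grid of numbers, here is a minimal NumPy sketch (the array here is random; a real image would come from a file):

```python
import numpy as np

# A toy stand-in for "what the computer sees": an 800x600 RGB image is
# just a 3-D array of integers in [0, 255], one number per pixel per channel.
height, width, channels = 600, 800, 3
image = np.random.randint(0, 256, size=(height, width, channels), dtype=np.uint8)

print(image.shape)   # (600, 800, 3)
print(image[0, 0])   # the three RGB values of the top-left pixel
```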
Challenges: Viewpoint variation
All pixels change when the camera moves!
Challenges: Illumination
Challenges: Deformation
Challenges: Occlusion
Challenges: Background clutter
Challenges: Intraclass variation
(Each challenge is illustrated on the slides with CC-licensed example photos.)
Linear Classification
Recall CIFAR-10: 50,000 training images and 10,000 test images; each image is 32x32x3.
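For readers who want to poke at the dataset, a minimal loading sketch follows; torchvision is an assumption here, since the slides name no library:

```python
# A minimal sketch; torchvision is an assumption (the slides name no library).
import numpy as np
from torchvision.datasets import CIFAR10

train = CIFAR10(root="./data", train=True, download=True)
test = CIFAR10(root="./data", train=False, download=True)

print(len(train), len(test))          # 50000 10000
print(np.asarray(train[0][0]).shape)  # (32, 32, 3): one 32x32 RGB image
```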
Parametric Approach: Linear Classifier
f(x, W) = Wx + b
The image, an array of 32x32x3 numbers (3072 numbers total), is stretched into a column vector x of shape 3072x1. The parameters (weights) W form a 10x3072 matrix and b is a 10x1 bias vector, so f(x, W) outputs 10 numbers giving the class scores.
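A minimal NumPy sketch of this score function; W, b, and the input below are random stand-ins for learned parameters and a real image:

```python
import numpy as np

def linear_classifier(x, W, b):
    """Score function f(x, W) = Wx + b: returns 10 class scores for one image."""
    return W @ x + b

x = np.random.rand(3072)       # a 32x32x3 image stretched into a 3072-vector
W = np.random.randn(10, 3072)  # stand-in for learned weights
b = np.random.randn(10)        # stand-in for learned biases

print(linear_classifier(x, W, b).shape)  # (10,): one score per class
```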
Example with an image with 4 pixels and 3 classes (cat/dog/ship)
Algebraic viewpoint: f(x, W) = Wx + b. Stretch the pixels into a column vector x; each row of the 3x4 weight matrix W, plus one entry of the bias b = (1.1, 3.2, -1.2), yields one class score. For the example image the scores come out to -96.8 (cat), 437.9 (dog), and 61.95 (ship).
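A sketch of the same computation. The pixel values and weight matrix below are illustrative stand-ins (only b survives the slide extraction cleanly), so the printed scores will not exactly match the slide's:

```python
import numpy as np

# Illustrative stand-ins (not the slide's actual numbers): 4 pixel values
# stretched into a column, scored against 3 class templates.
x = np.array([56.0, 231.0, 24.0, 2.0])    # hypothetical pixel values
W = np.array([[0.2, -0.5, 0.1, 2.0],      # hypothetical "cat" template
              [1.5, 1.3, 2.1, 0.0],       # hypothetical "dog" template
              [0.0, 0.25, 0.2, -0.3]])    # hypothetical "ship" template
b = np.array([1.1, 3.2, -1.2])            # biases, as on the slide

for label, score in zip(["cat", "dog", "ship"], W @ x + b):
    print(f"{label}: {score:.2f}")
```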
Interpreting a Linear Classifier
Interpreting a Linear Classifier: Geometric Viewpoint
f(x, W) = Wx + b. Each image, an array of 32x32x3 numbers (3072 numbers total), is a point in a 3072-dimensional space; each class score is a linear function of the pixels, so class boundaries are hyperplanes in that space. (Plot created using Wolfram Cloud; cat image by Nikita, licensed under CC-BY 2.0.)
Hard cases for a linear classifier:
• Class 1: first and third quadrants; Class 2: second and fourth quadrants.
• Class 1: 1 <= L2 norm <= 2; Class 2: everything else.
• Class 1: three separate modes; Class 2: everything else.
Linear Classifier: Three Viewpoints
Algebraic viewpoint: f(x, W) = Wx. Visual viewpoint: one template per class. Geometric viewpoint: hyperplanes cutting up space.
Deep learning attracts lots of attention. (Google Trends, 2007-2015.)
How the Human Brain Learns
• In the human brain, a typical neuron collects signals from others through a host of fine structures called dendrites.
• The neuron sends out spikes of electrical activity through a long, thin strand known as an axon, which splits into thousands of branches.
• At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that inhibit or excite activity in the connected neurons.
A Neuron Model
• When a neuron receives excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes.
• We construct artificial neural networks by first trying to deduce the essential features of neurons and their interconnections.
• We then typically program a computer to simulate these features.
A Simple Neuron
• An artificial neuron is a device with many inputs and one output.
• The neuron has two modes of operation: the training mode and the using mode.
A Simple Neuron (Cont.)
• In the training mode, the neuron can be trained to fire (or not) for particular input patterns.
• In the using mode, when a taught input pattern is detected at the input, its associated output becomes the current output. If the input pattern does not belong to the taught list of input patterns, the firing rule is used to determine whether to fire or not.
• The firing rule is an important concept in neural networks and accounts for their high flexibility. A firing rule determines how one calculates whether a neuron should fire for any input pattern, not only the ones on which the node was previously trained.
Part I: Introduction to Deep Learning (what people already knew in the 1980s)
Example Application
• Handwriting Digit Recognition: an image of a digit goes into the machine, and the label (e.g. “2”) comes out.
Handwriting Digit Recognition
Input: a 16 x 16 = 256-dimensional vector; each pixel is one dimension (ink → 1, no ink → 0).
Output: a 10-dimensional vector y1, ..., y10; each dimension represents the confidence that the image is that digit. For example, y1 = 0.1 (“is 1”), y2 = 0.7 (“is 2”), ..., y10 = 0.2 (“is 0”), so the image is read as “2”.
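A small sketch of this input/output convention; the stroke and the output vector below are made up for illustration:

```python
import numpy as np

# Input: a 16x16 binary image (ink -> 1, no ink -> 0), flattened to 256 values.
image = np.zeros((16, 16))
image[3:13, 7] = 1.0   # a crude vertical stroke, just for illustration
x = image.reshape(256)

# Output: 10 confidences, one per digit; y1 = "is 1", ..., y10 = "is 0".
y = np.array([0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2])  # made up
digits = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]
print("prediction:", digits[int(np.argmax(y))])  # prediction: 2
```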
Example Application
• Handwriting Digit Recognition: the machine implements a function from the 256-dimensional input to the 10-dimensional output. In deep learning, this function is represented by a neural network.
Element of Neural Network
A single neuron takes inputs a1, ..., aK, multiplies them by weights w1, ..., wK, adds a bias b to get z = w1*a1 + ... + wK*aK + b, and outputs a = σ(z), where σ is the activation function.
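A minimal sketch of one such neuron with a sigmoid activation; the example weights and bias are those of the first neuron in the network example that follows:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: sigma(z) = 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(a, w, b):
    """One neuron: weighted sum of the inputs plus bias, passed through sigma."""
    return sigmoid(np.dot(w, a) + b)

# Weights/bias of the first neuron of the example network below:
print(neuron(np.array([1.0, -1.0]), np.array([1.0, -2.0]), 1.0))  # ~0.98
```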
Neural Network
A fully connected feedforward network: an input layer x, hidden layers 1 through L of neurons, and an output layer y1, ..., yM; the output of each layer is the input of the next. “Deep” means many hidden layers.
Example of Neural Network
A small network with 2 inputs, two hidden layers of 2 sigmoid neurons each, and 2 outputs. With input (1, -1): the first layer computes σ(1*1 + (-2)*(-1) + 1) = σ(4) = 0.98 and σ((-1)*1 + 1*(-1) + 0) = σ(-2) = 0.12; the second layer gives (0.86, 0.11); the output layer gives (0.62, 0.83).
With input (0, 0): the hidden layers give (0.73, 0.5) and (0.72, 0.12), and the outputs are (0.51, 0.85). Different parameters define different functions.
Matrix Operation
The first layer of the example, written as a matrix operation:
σ( [1 -2; -1 1] [1; -1] + [1; 0] ) = σ( [4; -2] ) = [0.98; 0.12]
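The same computation in NumPy, as a sanity check on the numbers above:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = np.array([[1.0, -2.0],
              [-1.0, 1.0]])
b = np.array([1.0, 0.0])
x = np.array([1.0, -1.0])

print(np.round(sigmoid(W @ x + b), 2))  # [0.98 0.12]
```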
Neural Network
The whole network is a chain of such matrix operations:
a1 = σ(W1 x + b1)
a2 = σ(W2 a1 + b2)
...
y = σ(WL aL-1 + bL)
so y = f(x) = σ(WL ... σ(W2 σ(W1 x + b1) + b2) ... + bL). Since every layer is a matrix operation, parallel computing techniques can be used to speed it up.
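A minimal sketch of this layer-by-layer forward pass; the layer sizes below are hypothetical (256 inputs and 10 outputs echo the digit example, the hidden width of 500 is made up), and the weights are random:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """y = sigma(WL ... sigma(W2 sigma(W1 x + b1) + b2) ... + bL)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # each layer is one matrix operation
    return a

rng = np.random.default_rng(0)
sizes = [256, 500, 500, 10]  # 256 in / 10 out echo the digit example; 500 is made up
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
print(forward(rng.random(256), weights, biases).shape)  # (10,)
```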
Softmax
With an ordinary output layer, the outputs of the network can in general be any values, so they may not be easy to interpret. Instead, use a softmax layer as the output layer.
Softmax
A softmax layer as the output layer turns the scores into probabilities:
yi = e^zi / Σj e^zj, so 1 ≥ yi ≥ 0 and Σi yi = 1.
Example: z = (3, 1, -3) → e^z ≈ (20, 2.7, 0.05) → y ≈ (0.88, 0.12, ≈0).
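A short softmax sketch; shifting by max(z) is a standard numerical-stability trick, not something the slide mentions:

```python
import numpy as np

def softmax(z):
    """y_i = e^(z_i) / sum_j e^(z_j); shifting by max(z) avoids overflow."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(np.round(softmax(np.array([3.0, 1.0, -3.0])), 2))  # [0.88 0.12 0.  ]
```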
How to Set Network Parameters
Set the network parameters θ such that, for an image of “1”, output y1 has the maximum value; for an image of “2”, y2 has the maximum value; and so on. (Input: 16 x 16 = 256 pixels, ink → 1, no ink → 0; output: softmax over y1, ..., y10.) How do we let the neural network achieve this?
Training Data
• Preparing training data: images and their labels (“1”, “0”, “4”, “5”, “3”, “2”, “9”, “1”, ...).
Use the training data to find the network parameters.
Cost
Given a set of network parameters θ, each training example has a cost value. For an image of “1”, the target is (1, 0, ..., 0); the cost measures how far the network output y = (y1, ..., y10), e.g. (0.2, 0.3, ..., 0.5), is from this target. The cost can be the Euclidean distance or the cross entropy between the network output and the target.
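Both candidate costs, sketched in NumPy using the output vector and target shown on the slide:

```python
import numpy as np

def euclidean_cost(y, target):
    """Squared Euclidean distance between network output and target."""
    return np.sum((y - target) ** 2)

def cross_entropy_cost(y, target, eps=1e-12):
    """Cross entropy between target and network output."""
    return -np.sum(target * np.log(y + eps))

y = np.array([0.2, 0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5])  # from the slide
target = np.zeros(10)
target[0] = 1.0  # target for the digit "1": (1, 0, ..., 0)
print(euclidean_cost(y, target), cross_entropy_cost(y, target))
```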
Total Cost
For all R training examples (x1, y1), ..., (xR, yR), the total cost is C(θ) = C1(θ) + C2(θ) + ... + CR(θ); it measures how bad the network parameters θ are on this task. Training means finding the network parameters that minimize this value.