Artificial Intelligence is transforming the world. Deep Learning, an integral part of this new Artificial Intelligence paradigm, is becoming one of the most sought-after skills. Learn more about Deep Learning and its evolution.
Lecture Series: AI is the New Electricity
Deep Learning - SCOPING, EVOLUTION & FUTURE TRENDS
Dr. Chiranjit Acharya
AILABS Academy, J-3, GP Block, Sector V, Salt Lake City, Kolkata, West Bengal 700091
Presented at AILABS Academy, Kolkata on April 18th 2018
Confidential, unpublished property of aiLabs. Do not duplicate or distribute. Use and distribution limited solely to authorized personnel. (c) Copyright 2018
A Journey into Deep Learning
▪ Cutting-edge technology
▪ Has gained traction in both industry and academia
▪ Achieves near-human-level performance in many pattern recognition tasks
▪ Excels in
  ▪ structured, relational data
  ▪ unstructured rich-media data such as image, video, audio and text
A Journey into Deep Learning
▪ What is Deep Learning? Where is the “deepness”?
▪ Where does Deep Learning come from?
▪ What are the models and algorithms of Deep Learning?
▪ What is the trajectory of evolution of Deep Learning?
▪ What are the future trends of Deep Learning?
Artificial Intelligence
Holy Grail of AI Research
▪ Understanding the neuro-biological and neuro-physical basis of human intelligence
  ▪ science of intelligence
▪ Building intelligent machines which can think and act like humans
  ▪ engineering of intelligence
Artificial Intelligence
Facets of AI Research
▪ knowledge representation
▪ reasoning
▪ natural language understanding
▪ natural scene understanding
▪ natural speech understanding
▪ problem solving
▪ perception
▪ learning
▪ planning
Machine Learning
Basic Doctrine of Learning
▪ learning from examples
Outcome of Learning
▪ rules of inference for some predictive task
▪ embodiment of the rules = model
▪ model is an abstract computing device
  ▪ kernel machine, decision tree, neural network
Machine Learning
Connotations of Learning
▪ process of generalization
▪ discovering nature/traits of data
▪ unraveling patterns and anti-patterns in data
▪ knowing distributional characteristics of data
▪ identifying causal effects and propagation
▪ identifying non-causal covariations & correlations
Machine Learning
Design Aspects of Learning System
▪ Choose the training experience
▪ Choose exactly what is to be learned, i.e. the target function / machine
▪ Choose objective function & optimality criteria
▪ Choose a learning algorithm to infer the target function from the experience
Learning Work Flow
▪ Stage 1: Feature Extraction, Feature Subset Selection, Feature Vector Representation
▪ Stage 2: Training / Testing Set Creation and Augmentation
▪ Stage 3: Training the Inference Machine
▪ Stage 4: Running the Inference Machine on Test Set
▪ Stage 5: Stratified Sampling and Validation
Feature Extraction / Selection
[Diagram labels: Corpus; Domain Expert; Knowledge Engineer; Cognitive Elements (low-level parts, mid-level parts, high-level parts, additional descriptors); Sparse Coder; Sparse Representation]
Training Set Augmentation
[Diagram labels: Sparse Representation; Random Sampler; Samples; Reviewer; Existing training set; Augmented training set]
Training and Prediction / Recognition
[Diagram labels: Training Set; Adaptive Learner; Prediction / Recognition Model; Unlabelled Residual Corpus; Predicted / Recognized Corpus]
Sampling, Validation & Convergence
[Flowchart labels: Predicted Corpus; Stratified Sampler; Stratified sub-samples; Reviewer; Human Reviewed Stratified sub-samples; Precision & Recall Calculator; Converged? (Yes: End of Relevance Scoring. No: Go back to Training Set Augmentation)]
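The precision and recall check in this loop can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the function names, the boolean "relevant" labels and the 0.9 convergence target are all assumptions made for the example.

```python
def precision_recall(predicted, reviewed):
    """Compare predicted labels with human-reviewed labels (True = relevant)."""
    pairs = list(zip(predicted, reviewed))
    tp = sum(1 for p, r in pairs if p and r)          # predicted relevant, reviewer agrees
    fp = sum(1 for p, r in pairs if p and not r)      # predicted relevant, reviewer disagrees
    fn = sum(1 for p, r in pairs if not p and r)      # missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def converged(precision, recall, target=0.9):
    """Hypothetical stopping rule: both scores reach the target."""
    return precision >= target and recall >= target


# Labels for one stratified sub-sample (made-up values).
predicted = [True, True, False, True, False]
reviewed = [True, False, False, True, True]
p, r = precision_recall(predicted, reviewed)
print(p, r, converged(p, r))
```

If `converged` returns False, the flow goes back to training set augmentation; otherwise scoring ends.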
Evolution of Connectionist Models
1943: Artificial neuron model (McCulloch & Pitts)
▪ "A logical calculus of the ideas immanent in nervous activity"
▪ simple artificial “neurons” could be made to perform basic logical operations such as AND, OR and NOT
▪ known as Linear Threshold Gate
▪ NO learning
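A linear threshold gate performing AND, OR and NOT can be sketched in a few lines. The weights and thresholds below are hand-picked for illustration (nothing is learned), and the "fire when the sum reaches the threshold" convention is an assumption.

```python
def threshold_gate(inputs, weights, threshold):
    """Fire (1) when the weighted input sum reaches the threshold, else 0."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0


def AND(a, b):
    return threshold_gate([a, b], [1, 1], 2)   # both inputs must be on


def OR(a, b):
    return threshold_gate([a, b], [1, 1], 1)   # one input is enough


def NOT(a):
    return threshold_gate([a], [-1], 0)        # inhibitory weight


for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b))
print(NOT(0), NOT(1))
```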
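Evolution of Connectionist Models
1943: Artificial neuron model (McCulloch & Pitts)
[Diagram: inputs $x_1, x_2, \ldots, x_n$ with weights $w_{1j}, w_{2j}, \ldots, w_{nj}$ and bias $b_j$ feed neuron $j$, which produces output $y_j$]
$s_j = \sum_{i=1}^{n} w_{ij} x_i + b_j, \qquad y_j = f(s_j)$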
Evolution of Connectionist Models
1957: Perceptron model (Rosenblatt)
▪ invention of learning rules inspired by ideas from neuroscience
  if $\sum_i \text{input}_i \cdot \text{weight}_i > \text{threshold}$, output = +1
  if $\sum_i \text{input}_i \cdot \text{weight}_i < \text{threshold}$, output = -1
▪ learns to classify input into two output classes
▪ Sigmoid transfer function: boundedness, graduality
  $y \to 1$ as $x \to +\infty$, $y \to 0$ as $x \to -\infty$
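The thresholded output and a learning rule can be sketched as below. The update used here is the classic error-driven perceptron rule (add the input, scaled by the target, whenever a case is misclassified). The slide does not spell out the rule, so treat the update, the learning rate and the OR-style toy data as assumptions. The threshold is folded into a bias term.

```python
def predict(weights, bias, x):
    """Output +1 when the weighted sum exceeds the threshold (bias = -threshold)."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else -1


def train_perceptron(data, lr=1.0, max_epochs=100):
    """data: list of (inputs, target) with target in {+1, -1}."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in data:
            if predict(weights, bias, x) != target:
                # Move the decision boundary toward the misclassified case.
                weights = [w + lr * target * xi for w, xi in zip(weights, x)]
                bias += lr * target
                mistakes += 1
        if mistakes == 0:      # every case classified correctly
            break
    return weights, bias


# Linearly separable toy problem: logical OR with +1 / -1 targets.
or_data = [([0, 0], -1), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_perceptron(or_data)
print([predict(weights, bias, x) for x, _ in or_data])
```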
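Evolution of Connectionist Models
1943: Artificial neuron model (McCulloch & Pitts)
[Diagram: inputs $x_1, x_2, \ldots, x_n$ with weights $w_{1j}, w_{2j}, \ldots, w_{nj}$ and bias $b_j$ feed neuron $j$, which produces output $y_j$]
$s_j = \sum_{i=1}^{n} w_{ij} x_i + b_j, \qquad y_j = f(s_j) = \frac{1}{1 + e^{-s_j}}$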
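The neuron with a sigmoid transfer function can be sketched directly from its two steps: a weighted sum plus bias, then the squashing function. The input and weight values below are made up for the example.

```python
import math


def sigmoid(s):
    """Bounded, gradual transfer function: tends to 0 for very negative s, 1 for very positive s."""
    return 1.0 / (1.0 + math.exp(-s))


def neuron_output(x, w, b):
    """s = sum_i w_i * x_i + b, then y = f(s) with f the sigmoid."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(s)


y = neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)   # weighted sum is 0 here
print(y)
```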
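Evolution of Connectionist Models
1960s: Delta Learning Rule (Widrow & Hoff)
▪ Define the error as the squared residuals summed over all training cases:
  $E = \frac{1}{2} \sum_{n} (y^n - \hat{y}^n)^2$
▪ Now differentiate to get error derivatives for weights, with $E^n = (y^n - \hat{y}^n)^2$ and a linear unit $\hat{y}^n = \sum_i w_i x_i^n$:
  $\frac{\partial E}{\partial w_i} = \frac{1}{2} \sum_{n} \frac{\partial \hat{y}^n}{\partial w_i} \frac{dE^n}{d\hat{y}^n} = -\sum_{n} x_i^n (y^n - \hat{y}^n)$
▪ The batch delta rule changes the weights in proportion to their error derivatives summed over all training cases ($\varepsilon$ is the learning rate):
  $\Delta w_i = -\varepsilon \frac{\partial E}{\partial w_i}$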
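The batch delta rule can be sketched for a linear unit as follows. The toy data (targets generated as 2*x1 - x2), the learning rate 0.1 and the step count are assumptions chosen so that the iteration is stable. They are not from the lecture.

```python
def predict(w, x):
    """Linear unit: y_hat = sum_i w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))


def error(w, X, Y):
    """E = 1/2 * sum over training cases of the squared residual."""
    return 0.5 * sum((y - predict(w, x)) ** 2 for x, y in zip(X, Y))


def batch_delta_step(w, X, Y, eps):
    """One batch update: w_i <- w_i - eps * dE/dw_i."""
    grad = [0.0] * len(w)
    for x, y in zip(X, Y):
        residual = y - predict(w, x)
        for i, xi in enumerate(x):
            grad[i] += -xi * residual          # dE/dw_i summed over cases
    return [wi - eps * gi for wi, gi in zip(w, grad)]


X = [[1, 0], [0, 1], [1, 1], [2, 1]]
Y = [2, -1, 1, 3]                              # generated by 2*x1 - 1*x2
w = [0.0, 0.0]
initial_error = error(w, X, Y)
for _ in range(500):
    w = batch_delta_step(w, X, Y, eps=0.1)
print(w, error(w, X, Y))
```

With a learning rate that is too large the same loop diverges, which is why the step size is listed as an assumption.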
Evolution of Connectionist Models
1969: Minsky's objection to Perceptrons
▪ Marvin Minsky & Seymour Papert: Perceptrons
▪ Unless input categories are linearly separable, a perceptron cannot learn to discriminate between them.
▪ Unfortunately, it appeared that many important categories were not linearly separable.
Evolution of Connectionist Models
1969: Minsky's objection to Perceptrons
Perceptrons are good at linear classification but ...
[Figure: points plotted in the (x1, x2) plane]
Evolution of Connectionist Models
1969: Minsky's objection to Perceptrons
Perceptrons are incapable of simple nonlinear classification like XOR
X1  X2  Output
0   0   0
0   1   1
1   0   1
1   1   0
(XOR operation)
[Figure: the four XOR points plotted in the (x1, x2) plane, labeled with their outputs (0) and (1)]
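The XOR limitation can be illustrated with a brute-force search: try every threshold gate on a grid of weights and thresholds and check whether any reproduces a truth table. The grid (values from -4 to 4 in steps of 0.5) is an arbitrary assumption, so this is an illustration rather than a proof. The actual argument is that no line separates the two XOR classes.

```python
from itertools import product


def gate(x1, x2, w1, w2, t):
    """Single linear threshold unit with two inputs."""
    return 1 if w1 * x1 + w2 * x2 > t else 0


def realizable(truth_table, grid):
    """True if some (w1, w2, t) on the grid reproduces the whole truth table."""
    for w1, w2, t in product(grid, repeat=3):
        if all(gate(x1, x2, w1, w2, t) == out
               for (x1, x2), out in truth_table.items()):
            return True
    return False


grid = [i / 2 for i in range(-8, 9)]           # -4.0, -3.5, ..., 4.0
AND_TABLE = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR_TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

print(realizable(AND_TABLE, grid))             # a separating gate exists
print(realizable(XOR_TABLE, grid))             # no single gate works
```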
Universal Approximation Theorem
Existential Version (Kolmogorov)
▪ There exists a finite combination of superposition and addition of continuous functions of single variables which can represent any continuous, multivariate function on compact subsets of R^d.
Constructive Version (Cybenko)
▪ The standard multilayer feed-forward network with a single hidden layer, containing a finite number of hidden neurons, is a universal approximator among continuous functions on compact subsets of R^d, under mild assumptions on the activation function.
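In symbols, the single-hidden-layer statement can be written as below. The notation ($G$, $N$, $\alpha_j$, $w_j$, $b_j$, $\sigma$, $K$) is introduced here for illustration and does not appear on the slide. $\sigma$ stands for the activation function satisfying the mild assumptions.

```latex
% Network with one hidden layer of N neurons:
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left( w_j^{\top} x + b_j \right),
\qquad x \in \mathbb{R}^d .

% Universal approximation: for every continuous f on a compact set
% K \subset \mathbb{R}^d and every \varepsilon > 0 there exist
% N, \alpha_j, w_j, b_j such that
\sup_{x \in K} \left| G(x) - f(x) \right| < \varepsilon .
```

The statement guarantees that suitable weights exist. It says nothing about how many hidden neurons are needed or how to find the weights by training.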
Evolution of Connectionist Models
1986: Backpropagation for Multi-Layer Perceptrons (Rumelhart, Hinton & Williams)
▪ solution to Minsky's objection regarding perceptron's limitation
▪ nonlinear classification is achieved by fully connected, multilayer, feedforward networks of perceptrons (MLP)
▪ MLP can be trained by backpropagation
▪ Two-pass algorithm
  ▪ forward propagation of activation signals from input to output
  ▪ backward propagation of error derivatives from output to input
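The two-pass algorithm can be sketched on the XOR problem with a tiny MLP. Everything specific here is an assumption for illustration: three sigmoid hidden units, one sigmoid output, squared error, batch updates, learning rate 0.5 and a fixed random seed. The error typically falls during training, but how far depends on the initial weights.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def init_net(n_in, n_hidden, seed=0):
    rng = random.Random(seed)
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "W2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": 0.0,
    }


def forward(net, x):
    """Forward pass: activations flow from input to hidden to output."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(net["W1"], net["b1"])]
    y = sigmoid(sum(w * hj for w, hj in zip(net["W2"], h)) + net["b2"])
    return h, y


def total_error(net, data):
    return 0.5 * sum((t - forward(net, x)[1]) ** 2 for x, t in data)


def backprop_step(net, data, lr):
    """Backward pass: error derivatives flow from output to input, then update."""
    n_hidden, n_in = len(net["W1"]), len(net["W1"][0])
    gW1 = [[0.0] * n_in for _ in range(n_hidden)]
    gb1 = [0.0] * n_hidden
    gW2 = [0.0] * n_hidden
    gb2 = 0.0
    for x, t in data:
        h, y = forward(net, x)
        d_out = (y - t) * y * (1.0 - y)                # dE/ds at the output unit
        gb2 += d_out
        for j in range(n_hidden):
            gW2[j] += d_out * h[j]
            d_hid = d_out * net["W2"][j] * h[j] * (1.0 - h[j])   # dE/ds at hidden unit j
            gb1[j] += d_hid
            for i in range(n_in):
                gW1[j][i] += d_hid * x[i]
    for j in range(n_hidden):
        net["W2"][j] -= lr * gW2[j]
        net["b1"][j] -= lr * gb1[j]
        for i in range(n_in):
            net["W1"][j][i] -= lr * gW1[j][i]
    net["b2"] -= lr * gb2


xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
net = init_net(n_in=2, n_hidden=3)
error_before = total_error(net, xor_data)
for _ in range(2000):
    backprop_step(net, xor_data, lr=0.5)
error_after = total_error(net, xor_data)
print(error_before, error_after)
```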
Evolution of Connectionist Models
1986: Backpropagation for Multi-Layer Perceptrons (Rumelhart, Hinton & Williams)
[Diagram: inputs x1, x2, ..., xN enter the input layer, pass through the hidden layer (Layer 1) and the output layer (Layer 2), and produce outputs y1, y2, ..., yM]
Machine Learning Example
Handwriting Digit Recognition
[Diagram: handwritten digit image → Machine → “2”]
Handwriting Digit Recognition
Input: 16 x 16 = 256 pixels, $x_1, x_2, \ldots, x_{256}$ (Color → 1, No color → 0)
Output: $y_1, y_2, \ldots, y_{10}$; each output represents the confidence of a digit
▪ $y_1 = 0.1$ (is 1)
▪ $y_2 = 0.7$ (is 2)
▪ …
▪ $y_{10} = 0.2$ (is 0)
The image is “2”
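The input encoding and the readout of the ten confidences can be sketched as below. The function names are hypothetical, the blank image is a placeholder, and the confidences other than 0.1, 0.7 and 0.2 are filled with zeros purely for the example.

```python
def encode(image):
    """Flatten a 16 x 16 image into 256 inputs: colored pixel -> 1, blank -> 0."""
    return [1 if pixel else 0 for row in image for pixel in row]


def recognize(outputs):
    """outputs[k] is the confidence y_(k+1): y1..y9 stand for digits 1..9, y10 for 0."""
    if len(outputs) != 10:
        raise ValueError("expected 10 confidences")
    best = max(range(10), key=lambda k: outputs[k])
    return (best + 1) % 10


blank_image = [[0] * 16 for _ in range(16)]
x = encode(blank_image)                        # 256 input values

confidences = [0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
print(len(x), recognize(confidences))
```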
Example Application
Handwriting Digit Recognition
[Diagram: inputs x1, x2, ..., x256 → Machine → outputs y1, y2, ..., y10 → “2”]
Evolution of Connectionist Models
1989: Convolutional Neural Network (LeCun)
[Diagram: inputs x1, x2, ..., xN pass through the input layer, hidden layers of neurons (Layer 1, Layer 2, ..., Layer L) and the output layer, producing outputs y1, y2, ..., yM]
Deep means many hidden layers
Convolutional Neural Network
▪ Input can have very high dimension. Using a fully-connected neural network would need a large number of parameters.
▪ CNNs are a special type of neural network whose hidden units are only connected to a local receptive field.
▪ The number of parameters needed by CNNs is much smaller.
▪ Example: 200x200 image
  a) fully connected: 40,000 hidden units => 1.6 billion parameters
  b) CNN: 5x5 kernel (filter), 100 feature maps => 2,500 parameters
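The parameter counts in the example follow from simple arithmetic, sketched here. The counts cover weights only: biases are ignored and a single input channel is assumed, which is what the slide's figures imply.

```python
def fully_connected_params(height, width, hidden_units):
    """Every hidden unit is connected to every pixel (weights only, no biases)."""
    return height * width * hidden_units


def conv_params(kernel_h, kernel_w, feature_maps):
    """One shared kernel per feature map (single input channel, no biases)."""
    return kernel_h * kernel_w * feature_maps


fc = fully_connected_params(200, 200, 40_000)   # 40,000 pixels x 40,000 hidden units
cnn = conv_params(5, 5, 100)                    # 25 weights x 100 feature maps
print(fc, cnn, fc // cnn)
```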
Convolution Operation
[Figure label: Patch]
Convolution Operation in CNN
▪ Input: an image (2-D array): x
▪ Convolution kernel (2-D array of learnable parameters): w
▪ Feature map (2-D array of processed data): s
▪ Convolution operation in 2-D domains:
  $s(i, j) = (x * w)(i, j) = \sum_{m} \sum_{n} x(m, n)\, w(i - m, j - n)$
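A 2-D convolution of an image x with a kernel w, producing a feature map s, can be sketched in plain Python. Two assumptions are made here: only positions where the kernel fits entirely inside the image are computed ("valid" output), and the `flip` parameter is a hypothetical switch between true convolution (kernel flipped) and the unflipped cross-correlation that many deep learning libraries compute under the name convolution.

```python
def conv2d_valid(x, w, flip=True):
    """Slide kernel w over image x and return the feature map s."""
    kh, kw = len(w), len(w[0])
    if flip:
        # True convolution: reverse the kernel along both axes.
        w = [row[::-1] for row in w[::-1]]
    out_h = len(x) - kh + 1
    out_w = len(x[0]) - kw + 1
    return [[sum(x[i + m][j + n] * w[m][n]
                 for m in range(kh) for n in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
w = [[1, 0],
     [0, -1]]

s_conv = conv2d_valid(x, w, flip=True)     # flipped kernel
s_corr = conv2d_valid(x, w, flip=False)    # unflipped kernel
print(s_conv, s_corr)
```

Because the kernel values are learned, the flip makes no practical difference to what a CNN can represent. It only changes which learned kernel produces a given feature map.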
Convolution Filters
Convolution Operation with Filters
Convolution Layers
[Figure labels: Convolution Layer; Channels; Feature Maps]
3 Stages of a Convolutional Layer
Non-Linear Stage
[Figure labels: Tanh(x); ReLU]
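The two nonlinearities named on this slide can be sketched as element-wise functions applied to a feature map. The feature map values are made up for the example.

```python
import math


def relu(v):
    """Rectified linear unit: pass positive values, clamp negatives to zero."""
    return max(0.0, v)


def tanh(v):
    """Hyperbolic tangent: squashes values into the open interval (-1, 1)."""
    return math.tanh(v)


feature_map = [[-2.0, 0.5],
               [1.5, -0.1]]
rectified = [[relu(v) for v in row] for row in feature_map]
squashed = [[tanh(v) for v in row] for row in feature_map]
print(rectified)
print(squashed)
```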
Evolution of Connectionist Models
2006: Deep Belief Networks (Hinton), Stacked Auto-Encoders (Bengio)
[Diagram: inputs x1, x2, ..., xN pass through the input layer, hidden layers of neurons (Layer 1, Layer 2, ..., Layer L) and the output layer, producing outputs y1, y2, ..., yM]
Deep means many hidden layers
Deep Learning
Traditional pattern recognition models use hand-crafted features and a relatively simple trainable classifier.
[Diagram: hand-crafted feature extractor → “simple” trainable classifier → output]
This approach has the following limitations:
▪ It is very tedious and costly to develop hand-crafted features
▪ The hand-crafted features are usually highly dependent on one application, and cannot be transferred easily to other applications
Deep Learning
Deep learning = representation learning
Seeks to learn hierarchical representations (i.e. features) automatically through multiple stages of a feature learning process.
[Diagram: low-level features → mid-level features → high-level features → trainable classifier → output]
Feature visualization of convolutional net trained on ImageNet (Zeiler and Fergus, 2013)
Learning Hierarchical Representations
[Diagram: low-level features → mid-level features → high-level features → trainable classifier → output, with increasing level of abstraction]
Hierarchy of representations with increasing level of abstraction. Each stage is a kind of trainable nonlinear feature transformation.
Image recognition
▪ Pixel → edge → motif → part → object
Text
▪ Character → word → word group → clause → sentence → story
Pooling
Common pooling operations:
▪ Max pooling: report the maximum output within a rectangular neighborhood.
▪ Average pooling: report the average output of a rectangular neighborhood (possibly weighted by the distance from the central pixel).
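Both pooling operations can be sketched with one helper that applies a reducing function to each neighborhood. The assumptions here are square, non-overlapping windows (stride equal to window size) and a plain unweighted average. The distance-weighted variant is not shown.

```python
def pool2d(x, size, reduce):
    """Apply `reduce` to each non-overlapping size x size neighborhood of x."""
    return [[reduce([x[i + m][j + n] for m in range(size) for n in range(size)])
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]


def mean(values):
    return sum(values) / len(values)


x = [[1, 3, 2, 4],
     [5, 6, 1, 2],
     [7, 2, 9, 1],
     [3, 4, 5, 6]]

max_pooled = pool2d(x, 2, max)     # largest value in each 2 x 2 block
avg_pooled = pool2d(x, 2, mean)    # average value in each 2 x 2 block
print(max_pooled)
print(avg_pooled)
```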
CIFAR-10
[Figure]
Deep CNN on CIFAR-10
[Figure]
Future Trends
▪ A different and wider range of problems is being addressed
  ▪ natural language understanding
  ▪ natural scene understanding
  ▪ natural speech understanding
▪ Feature learning is being investigated at a deeper level
▪ Manifold learning
▪ Reinforcement learning
▪ Integration with other paradigms of machine learning