ECE 517: Reinforcement Learning in Artificial Intelligence. Lecture 21: Deep Machine Learning. November 8, 2010. Dr. Itamar Arel, College of Engineering, Department of Electrical Engineering and Computer Science, The University of Tennessee, Fall 2010.
RL and General AI • RL seems like a good AI framework • Some pieces are missing: • Long/short-term memory: what is the optimal value (or cost-to-go) function to be used? • How do we treat multi-dimensional reward signals? • How do we deal with high-dimensional inputs (observations)? • How do we generalize to address a near-infinite state space? • How long will it take to train such a system? • If we want to use hardware, how do we go about doing it? • Storage capacity – human brain ~10^14 synapses (i.e., weights) • Processing power – ~10^11 neurons • Communications – fully or partially connected architectures
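To put the slide's hardware figures in perspective, the short Python sketch below does the arithmetic. It assumes (this is not on the slide) that each synapse is stored as one 4-byte floating-point weight; the constant names are illustrative only.

```python
# Back-of-the-envelope sizing of the figures quoted on the slide.
# Assumption (not from the slide): one 32-bit float (4 bytes) per synapse weight.
SYNAPSES = 10**14       # ~synapses (weights) in the human brain
NEURONS = 10**11        # ~neurons in the human brain
BYTES_PER_WEIGHT = 4    # hypothetical storage cost per weight

weight_bytes = SYNAPSES * BYTES_PER_WEIGHT
fan_in = SYNAPSES // NEURONS  # average number of synapses per neuron

print(f"weight storage: {weight_bytes / 10**12:.0f} TB")
print(f"average fan-in: {fan_in} synapses per neuron")
```

Under that assumption it prints 400 TB of weight storage and an average fan-in of 1000 synapses per neuron.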
Why Deep Learning? • Mimicking the way the brain represents information is a key challenge • Deals efficiently with high-dimensional data • Handles multi-modal data fusion • Captures temporal dependencies spanning large scales • Incremental knowledge acquisition • The challenge with high dimensionality • Real-world problems • Curse of dimensionality (Bellman) • Spatial and temporal dependencies • How to represent key features?
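The curse of dimensionality can be made concrete with a simple count. The sketch below assumes each input dimension is discretized into 10 bins (an arbitrary illustrative choice) and shows how fast the number of cells a tabular method would have to cover grows with the number of dimensions.

```python
# Bellman's "curse of dimensionality": with a fixed number of bins per axis,
# the number of distinct grid cells (states) grows exponentially in the dimension.
def num_cells(bins_per_dim: int, dims: int) -> int:
    """Number of grid cells when each of `dims` axes has `bins_per_dim` bins."""
    return bins_per_dim ** dims


for d in (1, 2, 4, 8, 16):
    print(f"{d:2d} dims -> {num_cells(10, d):,} cells")
```

Already at 16 dimensions this is 10^16 cells, far more than any table can store, which is why raw high-dimensional observations call for a different representation.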
Main application: classification • [Figure: conventional pipeline ROI detection → feature extraction → classification, with the data dimensionality shrinking from roughly 10^6 to 10^4 to 10^2 along the pipeline] • Hard (unsolved) problem due to … • High-dimensional data • Distortions (noise, rotation, displacement, perspective, lighting conditions, etc.) • Partial observability • Mainstream approach …
The power of hierarchical representation • Core idea: partition high-dimensional data into small patches, model them, and discover dependencies between them • Decomposes the problem • Suggests a trade-off: more scope, less detail • Key ideas: • Basic cortical circuit • Massively parallel architecture • Discovers structure based on regularities in the observations • Multi-modal • Goal: situation/state inference
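A minimal sketch of the first step of the core idea, partitioning a high-dimensional input into small patches that can each be modeled separately. It assumes a 2-D image stored as nested Python lists whose sides are multiples of the patch size; `to_patches` is a hypothetical helper, not part of any library.

```python
from typing import List

Image = List[List[int]]


def to_patches(image: Image, patch: int) -> List[Image]:
    """Split a 2-D image into non-overlapping patch x patch blocks.

    Assumes the image height and width are multiples of `patch`.
    Patches are returned in row-major order.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            block = [row[left:left + patch] for row in image[top:top + patch]]
            patches.append(block)
    return patches


# A 4x4 "image" becomes four 2x2 patches, each small enough to model on its own.
img = [[r * 4 + c for c in range(4)] for r in range(4)]
parts = to_patches(img, 2)
print(len(parts))   # 4
print(parts[0])     # [[0, 1], [4, 5]]
```

Modeling each patch independently is the "less scope, more detail" end of the trade-off; discovering dependencies between patches is left to the higher layers.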
The power of hierarchical representation (cont.) • Hypothesis: the brain represents information using a hierarchical architecture composed of basic cortical circuits • An effective way of dealing with large-scale POMDPs • DL – state inference • RL – decision making under uncertainty • Suggests a semi-supervised learning framework • Unsupervised – learns the structure of natural data • Supervised – maps states to classes
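The semi-supervised split can be sketched in two stages. Here plain 1-D k-means stands in for the unsupervised structure-learning stage (the slide does not name an algorithm), and a majority vote over a few labeled examples stands in for the supervised mapping from states to classes. All function names and data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def nearest(x: float, centers: Sequence[float]) -> int:
    """Index of the closest center, i.e. the inferred 'state' of input x."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))


def learn_states(xs: Sequence[float], centers: List[float], iters: int = 10) -> List[float]:
    """Unsupervised stage (stand-in): 1-D k-means on unlabeled data."""
    centers = list(centers)
    for _ in range(iters):
        groups: Dict[int, List[float]] = defaultdict(list)
        for x in xs:
            groups[nearest(x, centers)].append(x)
        for k in range(len(centers)):
            if groups[k]:  # keep a center unchanged if no point was assigned to it
                centers[k] = sum(groups[k]) / len(groups[k])
    return centers


def map_states_to_classes(labeled: Sequence[Tuple[float, str]],
                          centers: Sequence[float]) -> Dict[int, str]:
    """Supervised stage: label each state by majority vote of the labeled examples."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for x, label in labeled:
        votes[nearest(x, centers)][label] += 1
    return {state: c.most_common(1)[0][0] for state, c in votes.items()}


unlabeled = [0.9, 1.1, 1.0, 5.0, 5.2, 4.8]
centers = learn_states(unlabeled, [0.0, 6.0])
state_to_class = map_states_to_classes([(1.05, "cat"), (4.9, "dog")], centers)
print(state_to_class[nearest(0.8, centers)])  # cat
```

Only two labeled points are needed here because the unlabeled data already determined the states; the labels just name them.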
The Deep Learning Theory • The basic idea is to decompose the large image into smaller images that can each be modeled • The hierarchy is one of abstraction • Higher levels of the state represent more abstract notions • The higher the layer, the more scope it encompasses and the less detail it offers • Multi-scale spatial-temporal context representation • Lower levels interpret or control limited domains of experience, or sensory systems • Connections from the higher-level states predispose some selected transitions in the lower-level state machines
Inspiration: Role of Cerebral Cortex • The cerebral cortex (aka neocortex), made up of four lobes, is involved in many complex cognitive functions, including memory, attention, perceptual awareness, "thinking", language, and consciousness • The cortex is the primary brain subsystem responsible for learning … • Rich in neurons (makes up over 80% of the human brain by mass) • It embeds the hierarchical auto-associative memory architecture • Receives sensory information from many different sensory organs, e.g., eyes, ears, etc., and processes the information • Areas that receive that particular information are called sensory areas
Deep Machine Learning – general framework • The lower layers predict short-term sequences • As you go higher in the hierarchy: “less accuracy, broader perspective” • Analogy to a general commanding an army, or a poem being recited • “Surprise” sequences should propagate up to the appropriate layer
DL for Invariant Pattern Recognition • Initial focus on the visual cortex • Aims at the invariant visual pattern recognition performed by the visual cortex • Humans recognize objects despite different scalings, rotations, and translations without conscious effort • The same holds for varying lighting conditions and various kinds of noise (additive, multiplicative) • This is currently difficult for machine learning to achieve • The approach taken is that geometric invariance is linked to motion • When we look at an object, the patterns on our retina change a lot while the object (cause) remains the same • Thus, learning persistent patterns on the retina would correspond to learning objects in the visual world • Associating patterns with their causes corresponds to invariant pattern recognition
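One way to make "persistent patterns correspond to causes" concrete is to group quantized retinal patterns that frequently follow one another in time. The sketch below assumes patterns are already quantized to integer ids and merges any two that succeed each other at least `min_count` times, using a union-find structure. The threshold rule is an illustrative assumption, not necessarily the grouping criterion used in the lecture.

```python
from collections import Counter
from typing import Dict, List, Sequence


def temporal_groups(seq: Sequence[int], min_count: int = 2) -> List[List[int]]:
    """Group pattern ids that frequently follow one another in time.

    Patterns that succeed each other at least `min_count` times (in either
    order) are merged into one group, on the assumption that they are
    different retinal views of the same underlying cause.
    """
    transitions: Counter = Counter()
    for a, b in zip(seq, seq[1:]):
        if a != b:
            transitions[frozenset((a, b))] += 1

    parent: Dict[int, int] = {p: p for p in seq}

    def find(p: int) -> int:
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for pair, count in transitions.items():
        if count >= min_count:
            a, b = tuple(pair)
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for p in sorted(parent):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


seq = [0, 1, 2, 1, 0, 1, 2, 5, 6, 5, 6, 5]
print(temporal_groups(seq))  # [[0, 1, 2], [5, 6]]
```

In the example sequence, patterns 0, 1 and 2 alternate while one object moves and 5 and 6 alternate for a second object; the single 2-to-5 switch falls below the threshold, so two groups (two causes) result.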
DL for Invariant Pattern Recognition (cont.) • Each level in the system hierarchy has several modules that model cortical regions • A module can have several children and one parent, so modules are arranged in a tree structure • The bottommost level is called level 1, and the level number increases going up the hierarchy • Inputs go directly to the modules at level 1 • Level 1 modules have small receptive fields compared to the size of the whole image, i.e., they receive their inputs from a small patch of the visual field • Several such level 1 modules tile the visual field, possibly with overlap
General System Architecture • Thus a level 2 module covers more of the visual field than a level 1 module. However, a level 2 module gets its information only through level 1 modules • This pattern is repeated throughout the hierarchy • Receptive field sizes increase going up the hierarchy • The module at the root of the tree covers the entire visual field by pooling inputs from its child modules
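The module tree described in the architecture can be sketched as follows. For brevity the sketch assumes a 1-D visual field, level-1 patches of size `patch` placed every `stride` positions (overlapping when `stride < patch`), a fixed number of children per parent (`fan_in`), and patches that tile the field exactly. Each module's receptive field is computed by pooling its children's fields, so the root ends up covering the whole field. `Module` and `build_hierarchy` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Module:
    level: int
    children: List["Module"] = field(default_factory=list)
    span: Tuple[int, int] = (0, 0)  # half-open interval [start, end) seen by a level-1 module

    def receptive_field(self) -> Tuple[int, int]:
        """Level-1 modules see their own patch; higher modules pool their children."""
        if not self.children:
            return self.span
        fields = [c.receptive_field() for c in self.children]
        return (min(s for s, _ in fields), max(e for _, e in fields))


def build_hierarchy(width: int, patch: int, stride: int, fan_in: int) -> Module:
    """Build the module tree bottom-up and return its root (assumes width >= patch)."""
    # Level 1: patches tile the field; stride < patch gives overlapping patches.
    level = [Module(1, span=(s, s + patch)) for s in range(0, width - patch + 1, stride)]
    n = 1
    while len(level) > 1:
        n += 1
        # Each parent takes up to `fan_in` consecutive modules as children.
        level = [Module(n, children=level[i:i + fan_in]) for i in range(0, len(level), fan_in)]
    return level[0]


root = build_hierarchy(width=16, patch=4, stride=4, fan_in=2)
for child in root.children:
    print(child.level, child.receptive_field())
print(root.level, root.receptive_field())
```

This prints `2 (0, 8)`, `2 (8, 16)` and `3 (0, 16)`: receptive fields double at each level until the root covers the entire field.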
Learning Framework • Let X_n^(1) and X_n^(2) denote the sequences of inputs to modules 1 and 2 • Learning occurs in three phases: • First, each module learns the most likely sequences of its inputs • Second, each module passes on the index of its most likely observed input sequence • Third, each module learns the most frequent “coincidences” of indices originating from the lower-layer modules • Next …
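The three learning phases can be sketched for two level-1 modules and one parent. Assumptions not stated on the slide: inputs are integer symbols, "sequences" are non-overlapping windows of fixed length, and each alphabet keeps only the `top_k` most frequent entries. All names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Window = Tuple[int, ...]


def windows(stream: Sequence[int], length: int) -> List[Window]:
    """Non-overlapping length-`length` windows of a module's input stream."""
    return [tuple(stream[i:i + length]) for i in range(0, len(stream) - length + 1, length)]


def learn_sequences(stream: Sequence[int], length: int, top_k: int) -> List[Window]:
    """Phase 1: the module's alphabet = its `top_k` most frequent input windows."""
    return [w for w, _ in Counter(windows(stream, length)).most_common(top_k)]


def emit_indices(stream: Sequence[int], length: int,
                 alphabet: List[Window]) -> List[Optional[int]]:
    """Phase 2: for each window, the index of the matching learned sequence (None if unseen)."""
    index = {w: i for i, w in enumerate(alphabet)}
    return [index.get(w) for w in windows(stream, length)]


def learn_coincidences(child_indices: Sequence[Sequence[Optional[int]]],
                       top_k: int) -> List[Tuple[Optional[int], ...]]:
    """Phase 3 (parent module): most frequent tuples of child indices seen at the same step."""
    return [c for c, _ in Counter(zip(*child_indices)).most_common(top_k)]


x1 = [1, 2, 1, 2, 3, 4, 1, 2]   # input stream X^(1) of module 1
x2 = [7, 8, 7, 8, 9, 9, 7, 8]   # input stream X^(2) of module 2
a1, a2 = learn_sequences(x1, 2, 2), learn_sequences(x2, 2, 2)
i1, i2 = emit_indices(x1, 2, a1), emit_indices(x2, 2, a2)
print(learn_coincidences([i1, i2], top_k=2))  # [(0, 0), (1, 1)]
```

The parent never sees raw inputs, only the children's indices, which is what lets its alphabet describe larger-scale structure.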
Contextual Embedding (if it exists) • Feedback loop from layer 2 back to layer 1 (its children) • This feedback provides contextual inference (from higher layers) • This stage is initiated once the level 2 module has formed its alphabet, Y_k • Lower-layer nodes eventually learn the CPD (conditional probability distribution) matrix P(X^(1) | Y)
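A minimal sketch of how a lower-layer node could estimate the CPD P(X^(1) | Y) once the parent's alphabet exists: count how often each child state co-occurs with each parent state and normalize per parent state. Maximum-likelihood counting is an assumption here; the slide does not say how the CPD is estimated, and the state labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def estimate_cpd(child_states: Sequence[Hashable],
                 parent_states: Sequence[Hashable]) -> Dict[Hashable, Dict[Hashable, float]]:
    """Maximum-likelihood estimate of P(X | Y) from paired observations.

    Returns cpd[y][x] = count(x, y) / count(y).
    """
    joint: Dict[Hashable, Counter] = defaultdict(Counter)
    for x, y in zip(child_states, parent_states):
        joint[y][x] += 1
    cpd = {}
    for y, counts in joint.items():
        total = sum(counts.values())
        cpd[y] = {x: n / total for x, n in counts.items()}
    return cpd


child = ["a", "a", "b", "b", "b", "a"]   # child (level-1) state at each step
parent = [0, 0, 0, 1, 1, 1]              # parent alphabet index Y_k at each step
cpd = estimate_cpd(child, parent)
print({x: round(p, 3) for x, p in cpd[0].items()})  # {'a': 0.667, 'b': 0.333}
```

Each row cpd[y] sums to one, so it can be used directly as the top-down prediction the parent feeds back to its child.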
Bayesian Network Obtained • Bottom-layer random variables correspond to quantizations of input patterns • Random variables at the intermediate layers represent object parts that move together persistently • Random variables at the top layer correspond to objects
Learning algorithm (cont.) • After the system has learned (seen many examples) and obtained the CPD at each layer, we seek P(Y | I), the posterior over the top-layer (object) variables Y, where I is the observed image • A Bayesian belief propagation method is typically used to determine this posterior, based on a hierarchy of beliefs • Drawbacks of current schemes • No “natural” spatiotemporal information representation • Layer-by-layer training is needed • Lack of modality independence (most current schemes are limited to image data sets)
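For a single parent with conditionally independent children, the upward (lambda) pass of belief propagation reduces to the sketch below: each child sends m(y) = Σ_x P(x | y) · e(x), where e is the evidence on that child from the image, and the root multiplies these messages with its prior and normalizes to get P(Y | I). A deeper hierarchy would apply the same step recursively. The object classes, part states, shared CPD, and all probabilities are hypothetical.

```python
from typing import Dict, List

Dist = Dict[str, float]


def upward_message(cpd: Dict[str, Dist], evidence: Dist) -> Dist:
    """Lambda message from a child to its parent: m(y) = sum_x P(x | y) * e(x)."""
    return {y: sum(p * evidence.get(x, 0.0) for x, p in row.items()) for y, row in cpd.items()}


def posterior(prior: Dist, messages: List[Dist]) -> Dist:
    """P(Y | I) proportional to P(Y) times the product of the children's messages."""
    unnorm = {}
    for y, p in prior.items():
        for m in messages:
            p *= m[y]
        unnorm[y] = p
    z = sum(unnorm.values())
    return {y: v / z for y, v in unnorm.items()}


prior = {"face": 0.5, "car": 0.5}
# P(part state | object), shared by both children for simplicity.
cpd = {"face": {"eye": 0.9, "wheel": 0.1}, "car": {"eye": 0.2, "wheel": 0.8}}
evidence_left = {"eye": 1.0, "wheel": 0.0}   # left patch clearly looks like an eye
evidence_right = {"eye": 0.5, "wheel": 0.5}  # right patch is ambiguous
post = posterior(prior, [upward_message(cpd, evidence_left),
                         upward_message(cpd, evidence_right)])
print({y: round(p, 3) for y, p in post.items()})  # {'face': 0.818, 'car': 0.182}
```

The ambiguous child sends a flat message (0.5 for both objects), so the belief is driven entirely by the unambiguous patch.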
Alternative Explanations for Biological Phenomena • Physiological experiments found that neurons sometimes respond to illusory contours in a Kanizsa figure • In other words, a neuron responds to a contour that does not exist in its receptive field • Possible interpretation: the activity of a neuron represents the probability that a contour should be present • This probability originates from the neuron's own state and the state information of higher-level neurons • DL-based explanation for this phenomenon • Contrasts with current hypotheses that assume “signal subtraction” occurs for some reason
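The probabilistic interpretation can be illustrated with Bayes' rule for a single binary "contour present" variable: the higher-level context supplies the prior, and the neuron's receptive field supplies the likelihood. The numbers below are hypothetical, chosen only to show that a strong top-down prior can yield a high contour belief even when the local input is blank.

```python
def contour_belief(prior_from_context: float, lik_if_contour: float, lik_if_none: float) -> float:
    """Posterior probability that a contour is present at this location.

    prior_from_context: P(contour) supplied top-down by higher-level neurons.
    lik_if_contour / lik_if_none: P(local input | contour) and P(local input | no contour).
    """
    num = lik_if_contour * prior_from_context
    den = num + lik_if_none * (1.0 - prior_from_context)
    return num / den


# Hypothetical likelihoods for a blank gap in the Kanizsa figure:
# a blank patch is less likely if a contour is present (0.3) than if not (0.6).
weak_context = contour_belief(0.1, 0.3, 0.6)     # no illusory square perceived above
strong_context = contour_belief(0.9, 0.3, 0.6)   # higher levels "see" a square
print(round(weak_context, 3), round(strong_context, 3))  # 0.053 0.818
```

With a weak contextual prior (0.1) the belief stays near 0.05; with a strong "square" prior (0.9) it rises to about 0.82, despite identical local evidence.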