Today’s Lecture

  1. Today’s Lecture • Administrative Details • Learning • Decision trees: cleanup & details • Belief nets • Sub-symbolic learning • Neural networks CS-424 Gregory Dudek

  2. Administrativia • Assignment 3 is now available. Due in 1 week. <http://www.cim.mcgill.ca/~dudek/424/game.pdf> • The game for the final project has been defined. Its description can be found via the course home page at: <http://www.cim.mcgill.ca/~dudek/424/game.pdf> • Examine the game now, try it, and think about what good strategies might be. • Note: the normal late policy will not apply to the project. • You **must** submit the electronic version (executable) on time, or it may not be evaluated (i.e. you get zero)! • It must run on Linux. Be certain to compile and test it on one of the Linux machines in the lab well before it’s due. • If you are developing on another platform, test regularly on Linux during development.

  3. ID3 (more…) • Last class, we discussed using entropy to select a question for building a decision tree. This idea was first developed by Quinlan (1979) in the ID3 system, which was later improved, resulting in C4.5. • Recap: • The entropy for classification into the sets $p^+$ and $p^-$ is $I(p^+, p^-)$. • Consider the information gain for each attribute $A$. • For each subtree $A_i$, we still need $I(p_i^+, p_i^-)$ bits, determined by the distribution of cases on that subtree. To fully classify all sub-cases, we need $\mathrm{Remainder}(A) = \sum_i \mathrm{weight}(i)\, I(p_i^+, p_i^-)$ bits, where $\mathrm{weight}(i)$ is the fraction of the examples that fall into subtree $i$. • Thus, the information gain is what’s left: $\mathrm{Gain}(A) = I(p^+, p^-) - \mathrm{Remainder}(A)$
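The recap above can be made concrete with a few lines of Python. This is a minimal sketch, assuming Boolean classification and taking weight(i) to be the fraction of examples that reach subtree i; the function names and the example split counts are hypothetical, chosen only for illustration.

```python
import math

def entropy(pos, neg):
    """I(p+, p-): bits needed to classify a set with pos positive and neg negative cases."""
    total = pos + neg
    bits = 0.0
    for count in (pos, neg):
        if count:  # treat 0 * log(0) as 0
            p = count / total
            bits -= p * math.log2(p)
    return bits

def remainder(splits):
    """Remainder(A) = sum_i weight(i) * I(p_i+, p_i-), with splits = [(pos_i, neg_i), ...]."""
    total = sum(p + n for p, n in splits)
    return sum((p + n) / total * entropy(p, n) for p, n in splits)

def gain(splits):
    """Gain(A) = I(p+, p-) - Remainder(A)."""
    pos = sum(p for p, _ in splits)
    neg = sum(n for _, n in splits)
    return entropy(pos, neg) - remainder(splits)

# Hypothetical attribute splitting 6 positive / 6 negative cases into three subtrees.
print(round(gain([(4, 0), (2, 2), (0, 4)]), 3))  # 0.667
# An attribute whose subtrees keep the original 50/50 mix gains nothing.
print(round(gain([(3, 3), (3, 3)]), 3))
```

ID3 would ask about the first attribute before the second, since it leaves fewer bits to resolve.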

  4. Final thoughts on entropy... • Provoked by yesterday’s seminar. • Idea: • Entropy tells you about unpredictability, or randomness. • When selecting a question, the one whose answer has the highest entropy carries the most information relative to what you already knew, because its answer is hardest to predict. • When asking whether we know something, high entropy is bad. • Consider the PDF of a robot’s position estimate: more entropy means more uncertainty.
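A small sketch of the entropy computation behind both uses on this slide, assuming discrete distributions; the probability values, including the four-cell robot position estimates, are made up for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum p * log2(p) of a discrete distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Choosing a question: a 50/50 answer is hardest to predict, so it is most informative.
print(entropy_bits([0.5, 0.5]))  # 1.0
print(entropy_bits([0.9, 0.1]))  # less than 1 bit

# Knowing something: hypothetical robot position estimates over 4 grid cells.
confident = [0.97, 0.01, 0.01, 0.01]
lost = [0.25, 0.25, 0.25, 0.25]
print(entropy_bits(confident))  # low entropy: the robot is fairly sure where it is
print(entropy_bits(lost))       # 2.0: maximal uncertainty over 4 cells
```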

  5. Training & testing • When constructing a learning system, we want to generalize to new examples (already discussed ad nauseam). • How can we tell if it’s working? • Look at how well we do on our training data. • But… what if we just learned a “quirk” of the data? • Overfitting? Bad features? • (Tank classification example; table lookup.) • Look at how we do on a set of examples never used for any training: a “test set”. • But… what if we can’t afford the data? • Cross validation (leave-one-out): for learner $L$ on training set $X$, $e(L, X) = \sum_{i \in X} \mathrm{error}\big(L(S_i) \text{ on case } i\big)^2$, where $S_i = X - \{i\}$.
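The leave-one-out error can be sketched as below. The learners and the four data points are hypothetical: `table_lookup` echoes the table-lookup example (perfect on its own training data, useless on unseen cases), and `nearest_neighbour` is a simple stand-in learner, not something the lecture prescribes.

```python
def loo_error(learn, examples):
    """e(L, X): for each case i, train on S_i = X - {i}, test on case i, sum squared errors."""
    total = 0.0
    for i, (x, y) in enumerate(examples):
        held_out = examples[:i] + examples[i + 1:]  # S_i = X - {i}
        model = learn(held_out)
        total += (model(x) - y) ** 2
    return total

def nearest_neighbour(train):
    """Hypothetical learner: predict the target of the closest training input (1-NN)."""
    def predict(x):
        _, y = min(train, key=lambda ex: abs(ex[0] - x))
        return y
    return predict

def table_lookup(train):
    """Hypothetical learner: memorize the training data, answer 0 for anything unseen."""
    table = dict(train)
    return lambda x: table.get(x, 0.0)

data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]  # made-up (input, target) pairs

memorized = table_lookup(data)
print(sum((memorized(x) - y) ** 2 for x, y in data))  # 0.0 on its own training data
print(loo_error(table_lookup, data))                  # 14.0
print(loo_error(nearest_neighbour, data))             # 4.0
```

The training error makes table lookup look perfect; only the held-out error exposes that it learned nothing that generalizes.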

  6. Simple functions? Is there a fixed circuit network topology that can be used to represent a family of functions? Yes! Neural-like networks (a.k.a. artificial neural networks) allow us this flexibility and more; we can represent arbitrary families of continuous functions using fixed topology networks.

  7. Belief networks (ch. 15), briefly • We will cover only R&N Sections 15.1 & 15.2 (briefly), and then segue to chapter 19. You should read 15.3. This will be cursory coverage only. • A belief net (a.k.a. Bayes net) is a formalism for describing probabilistic relationships. • It is a graph (in fact a DAG) G(V,E). Nodes are random variables. A directed edge indicates that one node has a direct influence on another. • Each node has an associated conditional probability table quantifying the effects of its “parents”. • No directed cycles.
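To make the CPT idea concrete, here is a minimal sketch assuming Boolean variables, with each table storing P(node = True | parent values). The network, its numbers, and the helper names are hypothetical. The joint probability of a full assignment is the product of each node's entry given its parents, and `is_dag` checks the no-directed-cycles condition.

```python
from itertools import product

# Hypothetical net: Rain -> WetGrass <- Sprinkler (all Boolean).
parents = {"Rain": [], "Sprinkler": [], "WetGrass": ["Rain", "Sprinkler"]}

# One CPT per node: P(node = True | parent values), keyed by a tuple of parent values.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {(True, True): 0.99, (True, False): 0.9,
                 (False, True): 0.8, (False, False): 0.0},
}

def is_dag(parents):
    """True if there are no directed cycles (repeatedly peel off nodes with no remaining parents)."""
    remaining = set(parents)
    while remaining:
        ready = {n for n in remaining if not (set(parents[n]) & remaining)}
        if not ready:
            return False  # every remaining node still has a remaining parent: a cycle
        remaining -= ready
    return True

def joint(assignment):
    """P(x_1, ..., x_n) as the product over nodes of P(x_i | parents(x_i))."""
    p = 1.0
    for node, value in assignment.items():
        p_true = cpt[node][tuple(assignment[q] for q in parents[node])]
        p *= p_true if value else 1.0 - p_true
    return p

names = list(parents)
total = sum(joint(dict(zip(names, vals)))
            for vals in product([True, False], repeat=len(names)))
print(is_dag(parents), round(total, 6))  # the joint sums to 1 over all 8 assignments
print(joint({"Rain": True, "Sprinkler": False, "WetGrass": True}))
```

Three small tables (1 + 1 + 4 entries) stand in for the full 8-entry joint; the saving grows quickly as nets get larger and stay locally structured.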

  8. Why? • Objective: • Compute probabilities of variables of interest (query variables), • given observations of specific phenomena in the world (evidence variables). • The net you get depends on how you construct it, not just on the problem and the probabilities. • Seek compactness (fewer links, tighter clusters): such nets are called locally structured.
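Answering a query given evidence can be sketched by brute-force enumeration: sum the joint over the hidden variables, then normalize. This is a sketch under stated assumptions (a hypothetical three-node Boolean chain with made-up numbers); enumeration is exponential in the number of hidden variables and is shown only because it is the simplest correct method.

```python
from itertools import product

# Hypothetical chain: Cloudy -> Rain -> Umbrella (all Boolean).
order = ["Cloudy", "Rain", "Umbrella"]  # a topological order of the DAG
parents = {"Cloudy": [], "Rain": ["Cloudy"], "Umbrella": ["Rain"]}
cpt = {  # P(node = True | parent values)
    "Cloudy": {(): 0.5},
    "Rain": {(True,): 0.8, (False,): 0.1},
    "Umbrella": {(True,): 0.9, (False,): 0.2},
}

def joint(a):
    """Product of P(x_i | parents(x_i)) for a complete assignment a."""
    p = 1.0
    for node in order:
        p_true = cpt[node][tuple(a[q] for q in parents[node])]
        p *= p_true if a[node] else 1.0 - p_true
    return p

def query(var, evidence):
    """P(var | evidence) by enumeration: sum the joint over hidden variables, then normalize."""
    hidden = [v for v in order if v != var and v not in evidence]
    dist = {}
    for value in (True, False):
        dist[value] = sum(
            joint({**evidence, var: value, **dict(zip(hidden, vals))})
            for vals in product((True, False), repeat=len(hidden))
        )
    norm = dist[True] + dist[False]
    return {v: p / norm for v, p in dist.items()}

# Query variable: Rain. Evidence variable: Umbrella = True.
print(query("Rain", {"Umbrella": True}))
print(query("Rain", {}))  # prior, for comparison
```

Seeing the umbrella raises the (hypothetical) probability of rain from 0.45 to roughly 0.79.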

  9. See overheads...

  10. Not in text Issues • Where do the probabilities come from? • They can be learned or inferred from data. • Where does the causal structure (the topology) come from? • It’s (sometimes) very hard to learn. • Problem: lots of alternative topologies are possible. What’s really cause and what’s effect? • Did it really rain because I brought my umbrella? Can a computer infer this (or the opposite) just from weather data? • Both of these topics are current research areas.
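For the first question, when the topology is already fixed, the probabilities can be estimated by counting. The sketch below assumes fully observed Boolean records and uses plain maximum-likelihood frequencies with no smoothing; the records are made up. Note that the counts support either direction (`Umbrella` given `Rain`, or `Rain` given `Umbrella`) equally well, which is why the data alone do not settle the causal question.

```python
from collections import Counter

def estimate_cpt(records, child, parent_names):
    """Maximum-likelihood estimate of P(child = True | parent values) by counting."""
    seen = Counter()        # how often each combination of parent values occurs
    child_true = Counter()  # how often the child is True for that combination
    for r in records:
        key = tuple(r[p] for p in parent_names)
        seen[key] += 1
        if r[child]:
            child_true[key] += 1
    return {key: child_true[key] / seen[key] for key in seen}

# Hypothetical fully observed weather records.
data = [
    {"Rain": True, "Umbrella": True},
    {"Rain": True, "Umbrella": True},
    {"Rain": True, "Umbrella": False},
    {"Rain": False, "Umbrella": False},
    {"Rain": False, "Umbrella": True},
    {"Rain": False, "Umbrella": False},
    {"Rain": False, "Umbrella": False},
    {"Rain": False, "Umbrella": False},
]

print(estimate_cpt(data, "Umbrella", ["Rain"]))  # P(Umbrella | Rain)
print(estimate_cpt(data, "Rain", ["Umbrella"]))  # P(Rain | Umbrella)
```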

  11. Neural Networks? Artificial Neural Nets a.k.a. Connectionist Nets (connectionist learning) a.k.a. Sub-symbolic learning a.k.a. Perceptron learning (a special case)

  12. Networks that model the brain? • Note that there is an interesting connection to Bayes nets, which isn’t considered in the book. • Something to reflect on. • Idea: model intelligence without “jumping ahead” to symbolic representations. • Related to the earliest work on cybernetics.

  13. The idealized neuron • Artificial neural networks come in several “flavors”. • Most are based on a simplified model of a neuron: • A set of (many) inputs. • One output. • The output is a function of the sum of the inputs. • Typical functions: • Weighted sum • Threshold • Gaussian
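A minimal sketch of the idealized neuron, assuming the output function is applied to the weighted sum of the inputs; the threshold `t`, the Gaussian parameters `mu` and `sigma`, and the example inputs and weights are illustrative assumptions, not values from the lecture.

```python
import math

def neuron(inputs, weights, activation):
    """Idealized neuron: many inputs, one output computed from the weighted input sum."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return activation(s)

def linear(s):
    return s  # output is just the weighted sum

def threshold(s, t=0.0):
    return 1.0 if s > t else 0.0  # fires only when the sum exceeds t

def gaussian(s, mu=0.0, sigma=1.0):
    return math.exp(-((s - mu) ** 2) / (2 * sigma ** 2))  # largest when s is near mu

x = [1.0, 0.5, 1.0]    # hypothetical inputs
w = [0.5, 0.5, -0.5]   # hypothetical weights; weighted sum = 0.25
for f in (linear, threshold, gaussian):
    print(f.__name__, neuron(x, w, f))
```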

  14. Not in text Why neural nets? • Motives: • We wish to create systems with abilities akin to those of the human mind. • The mind is usually assumed to be a direct consequence of the structure of the brain. • Let’s mimic the structure of the brain! • By using simple computing elements, we obtain a system that might scale up easily to parallel hardware. • Avoids (or solves?) the key unresolved problem of how to get from the “signal domain” to symbolic representations. • Fault tolerance.

  15. Not in text

  16. Not in text

  17. Real and fake neurons • Signals in neurons are coded by “spike rate”. • In ANNs, inputs can be: • 0 or 1 (binary) • in [0,1] • in [-1,1] • in R (real) • Each input $I_i$ has an associated real-valued weight $w_i$. • Learning works by changing the weights at the synapses.

  18. Not in text

  19. Brains • The brain seems to be divided into functional areas. • These are often seen as analogous to modules in a software system. • Why would it be like this? (2 possible answers) • Evolution: incremental improvement is easier in a modular system. • Advantage of combining complementary solutions. • It isn’t!

  20. Not in text Inductive bias? • Where’s the inductive bias? • In the topology and architecture of the network. • In the learning rules. • In the input and output representation. • In the initial weights.

  21. Not in text Simple neural models • The oldest ANN model is the McCulloch-Pitts neuron [1943]. • Inputs are +1 or -1, with real-valued weights. • If the sum of weighted inputs is > 0, the neuron “fires” and outputs +1. • Showed that you can compute logical functions. • Relation to learning proposed (later!) by Donald Hebb [1949]. • Perceptron model [Rosenblatt, 1958]. • Single-layer network with the same kind of neuron. • Fires when the weighted input is above a threshold: $\sum_i x_i w_i > t$. • Added a learning rule to allow weight selection.
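A McCulloch-Pitts style unit computing logical functions might look like this. The slide does not say what a non-firing unit outputs; this sketch assumes -1, and it implements the threshold for OR as an extra constant +1 input (a common trick, assumed here). The weights are hand-picked, not learned.

```python
from itertools import product

def mp_neuron(inputs, weights):
    """McCulloch-Pitts style unit: fires (+1) if the weighted sum is > 0; otherwise -1 (assumed)."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > 0 else -1

def AND(a, b):
    return mp_neuron([a, b], [1, 1])          # sum > 0 only when both inputs are +1

def OR(a, b):
    return mp_neuron([a, b, 1], [1, 1, 1])    # constant +1 input shifts the threshold

def NOT(a):
    return mp_neuron([a], [-1])

for a, b in product([1, -1], repeat=2):
    print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
```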

  22. Perceptron nets

  23. Perceptron learning • Perceptron learning: • We have a set of training examples (TS) encoded as input values (i.e. in the form of binary vectors). • We have a set of desired output values associated with these inputs. • This is supervised learning. • Problem: how to adjust the weights to make the actual outputs match the training examples. • NOTE: we do not allow the topology to change! [You should be thinking of a question here.] • Intuition: when a perceptron makes a mistake, its weights are wrong. • Modify them to make the output bigger or smaller, as desired.

  24. Learning algorithm • Desired output $T_i$, actual output $O_i$. • Weight update formula (weight from unit $j$ to unit $i$): $W_{j,i} \leftarrow W_{j,i} + k \, x_j \, (T_i - O_i)$, where $k$ is the learning rate. • If the examples can be learned (encoded), then the perceptron learning rule will find the weights. • How? Gradient descent. The key thing to prove is the absence of local minima.
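The update rule can be sketched as follows for a single threshold output unit (so the index $i$ is dropped), assuming 0/1 outputs and a constant 1 as input `x[0]` so that the threshold is learned as an ordinary weight. The data set (logical AND), the zero initial weights, and the learning rate `k = 0.5` are illustrative choices; AND is linearly separable, so the rule settles on weights that classify every example.

```python
def step(s):
    """Hard threshold: output 1 if the weighted sum is positive, else 0."""
    return 1 if s > 0 else 0

def predict(w, x):
    return step(sum(wj * xj for wj, xj in zip(w, x)))

def train_perceptron(examples, k=0.5, epochs=100):
    """Perceptron rule W_j <- W_j + k * x_j * (T - O); stops after an error-free pass."""
    w = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        mistakes = 0
        for x, target in examples:
            out = predict(w, x)
            if out != target:
                mistakes += 1
                for j, xj in enumerate(x):
                    w[j] += k * xj * (target - out)
        if mistakes == 0:
            break
    return w

# Logical AND; x[0] = 1 is the constant bias input (an assumption of this sketch).
AND_DATA = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
w = train_perceptron(AND_DATA)
print(w, [predict(w, x) for x, _ in AND_DATA])
```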

  25. Perceptrons: what can they learn? • Only linearly separable functions [Minsky & Papert 1969]. • In N dimensions, the decision boundary is an (N-1)-dimensional hyperplane.
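XOR is the standard example of a function that is not linearly separable. With a bias b and weights w1, w2, a threshold unit computing XOR would need b ≤ 0 and b + w1 + w2 ≤ 0 (outputs 0), while b + w1 > 0 and b + w2 > 0 (outputs 1); adding each pair gives both 2b + w1 + w2 ≤ 0 and 2b + w1 + w2 > 0, a contradiction. The hypothetical grid search below only illustrates this; it is not a proof. It finds a separating line for AND but none for XOR.

```python
from itertools import product

def step(s):
    return 1 if s > 0 else 0

def separable_on_grid(examples, grid):
    """Return (bias, w1, w2) from the grid that classifies every example, or None."""
    for b, w1, w2 in product(grid, repeat=3):
        if all(step(b + w1 * x1 + w2 * x2) == t for (x1, x2), t in examples):
            return (b, w1, w2)
    return None

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

print(separable_on_grid(AND_DATA, grid))  # a separating line exists for AND
print(separable_on_grid(XOR_DATA, grid))  # None: no line separates XOR
```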

  26. More general networks • Generalize in 3 ways: • Allow continuous output values in [0,1]. • Allow multiple layers. • This is key to learning a larger class of functions. • Allow a more complicated function than thresholded summation [why??]. Generalize the learning rule to accommodate this: let’s see how it works.
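Multiple layers are what let a network escape the linear-separability limit. As a sketch, a two-layer network of threshold units with hand-chosen (hypothetical) weights computes XOR: one hidden unit acts as OR, the other as AND, and the output fires for OR but not AND. Learning such weights, rather than setting them by hand, is what a generalized learning rule is for.

```python
def step(s):
    return 1 if s > 0 else 0

def xor_net(x1, x2):
    """Two-layer threshold network with hand-chosen (hypothetical) weights and biases."""
    h_or = step(x1 + x2 - 0.5)       # hidden unit 1: fires for OR
    h_and = step(x1 + x2 - 1.5)      # hidden unit 2: fires for AND
    return step(h_or - h_and - 0.5)  # output: OR but not AND, i.e. XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))
```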

  27. The threshold • The key variant: • Change the threshold into a differentiable function. • The sigmoid, known as a “soft non-linearity” (silly): $M = \sum_i x_i w_i$, $O = 1 / (1 + e^{-kM})$
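A sketch of the sigmoid unit, using the same $M$, $k$, and $O$ as the formula above. Differentiating $1/(1 + e^{-kM})$ gives $dO/dM = k\,O\,(1 - O)$; having the slope in closed form is what makes gradient-based learning possible. As $k$ grows, the sigmoid approaches the hard threshold at $M = 0$. The example inputs and weights are hypothetical.

```python
import math

def sigmoid(m, k=1.0):
    """O = 1 / (1 + exp(-k * M)): a smooth, differentiable replacement for the threshold."""
    return 1.0 / (1.0 + math.exp(-k * m))

def sigmoid_slope(m, k=1.0):
    """dO/dM = k * O * (1 - O), the quantity a gradient-based learning rule needs."""
    o = sigmoid(m, k)
    return k * o * (1.0 - o)

def unit_output(xs, ws, k=1.0):
    m = sum(x * w for x, w in zip(xs, ws))  # M = sum_i x_i w_i
    return sigmoid(m, k)

print(sigmoid(0.0))                                 # 0.5 at M = 0
print(sigmoid(0.1, k=100), sigmoid(-0.1, k=100))    # close to 1 and 0: nearly a step
print(unit_output([1.0, 2.0], [0.5, -0.25]))        # M = 0, so 0.5
```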
