Inductive Learning of Rules

    Spores  Spots  Color  Edible?
      Y       N    Brown     N
      Y       Y    Grey      Y
      N       Y    Black     Y
      N       N    Brown     N
      Y       N    White     N
      Y       Y    Brown     Y
      Y       N    Brown     N
      N       ?    Red       ?

Don't try this at home...
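A minimal sketch (mine, not the lecture's) of the induction this slide invites: brute-force search for a single-attribute rule consistent with every training row of the table. On this data it recovers "edible iff spots = Y".

    # Hypothetical illustration: search for a single-attribute rule that
    # is consistent with every labeled row of the mushroom table above.
    examples = [
        # (spores, spots, color, edible)
        ("Y", "N", "Brown", "N"),
        ("Y", "Y", "Grey",  "Y"),
        ("N", "Y", "Black", "Y"),
        ("N", "N", "Brown", "N"),
        ("Y", "N", "White", "N"),
        ("Y", "Y", "Brown", "Y"),
        ("Y", "N", "Brown", "N"),
    ]
    attributes = ["spores", "spots", "color"]

    for i, name in enumerate(attributes):
        for v in {ex[i] for ex in examples}:
            # Candidate rule: "edible iff <attribute> == v"
            if all((ex[i] == v) == (ex[3] == "Y") for ex in examples):
                print(f"consistent rule: edible iff {name} == {v}")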
Types of Learning
• What is learning?
  • Improved performance over time/experience
  • Increased knowledge
• Speedup learning
  • No change to the set of theoretically inferable facts
  • Change to the speed with which the agent can infer them
• Inductive learning
  • More facts can be inferred
Mature Technology
• Many applications:
  • Detect fraudulent credit card transactions
  • Information filtering systems that learn user preferences
  • Autonomous vehicles that drive public highways (ALVINN)
  • Decision trees for diagnosing heart attacks
  • Speech synthesis (correct pronunciation) (NETtalk)
  • Data mining: huge datasets, scaling issues
Defining a Learning Problem
• Experience:
• Task:
• Performance Measure:

A program is said to learn from experience E with respect to task T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Example: Checkers
• Task T: playing checkers
• Performance measure P: percent of games won against opponents
• Experience E: playing practice games against itself
Example: Handwriting Recognition
• Task T: recognizing and classifying handwritten words within images
• Performance measure P:
• Experience E:
Example: Robot Driving
• Task T: driving on a public four-lane highway using vision sensors
• Performance measure P:
• Experience E:
Example: Speech Recognition
• Task T: identification of a word sequence from audio recorded from arbitrary speakers ... noise
• Performance measure P:
• Experience E:
Issues
• What feedback (experience) is available?
• What kind of knowledge is being increased?
• How is that knowledge represented?
• What prior information is available?
• What is the right learning algorithm?
• How to avoid overfitting?
Choosing the Training Experience
• Credit assignment problem:
  • Direct training examples:
    • E.g. individual checker boards + correct move for each
  • Indirect training examples:
    • E.g. complete sequence of moves and final result
• Which examples:
  • Random, teacher chooses, learner chooses
• Supervised learning
• Reinforcement learning
• Unsupervised learning
Choosing the Target Function
• What type of knowledge will be learned?
• How will the knowledge be used by the performance program?
• E.g. checkers program
  • Assume it knows the legal moves
  • Needs to choose the best move
  • So learn the function F: Boards -> Moves
    • hard to learn
  • Alternative: F: Boards -> ℝ (a real-valued board evaluation)
The Ideal Evaluation Function
• V(b) = 100 if b is a final, won board
• V(b) = -100 if b is a final, lost board
• V(b) = 0 if b is a final, drawn board
• Otherwise, if b is not final, V(b) = V(s), where s is the best final board reachable from b (assuming optimal play)

Nonoperational... we want an operational approximation V̂ of V.
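To see why the ideal V is nonoperational, here is a toy sketch (hypothetical game tree, and a single maximizing player for brevity; a real two-player game would alternate max and min levels): computing V at the root forces a search all the way down to final boards.

    # Hypothetical toy game tree; leaves are final boards with known values.
    tree = {"start": ["a", "b"], "a": ["win", "draw"], "b": ["lose"]}
    final_values = {"win": 100, "lose": -100, "draw": 0}

    def V(board):
        if board in final_values:        # final board: value is given
            return final_values[board]
        # Nonfinal board: value of the best reachable final board.
        # (Single maximizing player here; real play alternates max/min.)
        return max(V(s) for s in tree[board])

    print(V("start"))   # 100, but only after searching the whole subtree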
How to Represent the Target Function
• x1 = number of black pieces on the board
• x2 = number of red pieces on the board
• x3 = number of black kings on the board
• x4 = number of red kings on the board
• x5 = number of black pieces threatened by red
• x6 = number of red pieces threatened by black

V̂(b) = a + b·x1 + c·x2 + d·x3 + e·x4 + f·x5 + g·x6

Now we just need to learn 7 numbers!
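A minimal sketch of this representation, with an LMS-style weight update of the kind standardly used to tune these seven numbers (the feature values and learning rate below are made up):

    weights = [0.0] * 7                      # a, b, c, d, e, f, g

    def v_hat(x):
        """Linear evaluation: a + b*x1 + c*x2 + ... + g*x6."""
        return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))

    def lms_update(x, v_train, lr=0.001):
        """Nudge each weight toward the training value for this board."""
        error = v_train - v_hat(x)
        weights[0] += lr * error             # constant term a
        for i, xi in enumerate(x):
            weights[i + 1] += lr * error * xi

    x = [8, 8, 1, 0, 2, 1]                   # hypothetical x1..x6 for one board
    lms_update(x, v_train=25.0)
    print(v_hat(x))                          # moved toward 25.0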
Target Function
• Profound formulation: any type of inductive learning can be expressed as approximating a function
  • E.g. checkers: V: boards -> evaluation
  • E.g. handwriting recognition: V: image -> word
  • E.g. mushrooms: V: mushroom-attributes -> {E, P}
• Inductive bias
Theory of Inductive Learning
• Suppose our examples are drawn with a probability distribution Pr(x), and that we learn a hypothesis f to describe a concept C.
• We can define Error(f) to be:

  Error(f) = Σ_{x ∈ D} Pr(x)

  where D is the set of all examples on which f and C disagree.
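A small sketch of this definition (the concept, hypothesis, and distribution are all made up here): Error(f) is the probability mass of the disagreement set D, which we can estimate by sampling x ~ Pr.

    import random

    def concept(x):        # hypothetical true concept C
        return x > 0.5

    def hypothesis(x):     # hypothetical learned f, slightly off
        return x > 0.6

    # Pr = Uniform[0, 1]; D = (0.5, 0.6], so Error(f) should be ~0.1.
    xs = [random.random() for _ in range(100_000)]
    error = sum(hypothesis(x) != concept(x) for x in xs) / len(xs)
    print(error)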
PAC Learning
• We're not perfect (in more than one way). So why should our programs be perfect?
• What we want is:
  • Error(f) < ε, for some chosen ε.
• But sometimes we're completely clueless (hopefully with low probability). What we really want is:
  • Prob(Error(f) > ε) < δ.
• As the number of examples grows, ε and δ should decrease.
• We call this probably approximately correct (PAC).
Definition of PAC Learnability
• Let C be a class of concepts.
• We say that C is PAC learnable by a hypothesis space H if:
  • there is a polynomial-time algorithm A,
  • and a polynomial function p,
  • such that for every concept c in C, every probability distribution Pr, and every ε and δ,
  • if A is given at least p(1/ε, 1/δ) examples,
  • then A returns, with probability 1-δ, a hypothesis whose error is less than ε.
• k-DNF and k-CNF are PAC learnable.
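The slide leaves p unspecified; one textbook choice (the bound for a consistent learner over a finite hypothesis space H, not necessarily the lecture's) is m ≥ (1/ε)(ln|H| + ln(1/δ)):

    import math

    def sample_bound(h_size, eps, delta):
        """Examples sufficient for a consistent learner over finite H."""
        return math.ceil((math.log(h_size) + math.log(1 / delta)) / eps)

    # E.g. conjunctions over n boolean attributes: |H| = 3**n + 1.
    print(sample_bound(3**10 + 1, eps=0.1, delta=0.05))   # 140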
Version Spaces: A Learning Algorithm
• Key idea:
  • Maintain the most specific and most general hypotheses at every point. Update them as examples come in.
• We describe objects in the space by attributes:
  • faculty, staff, student
  • 20's, 30's, 40's
  • male, female
• Concepts: Boolean combinations of attribute values:
  • faculty, 30's, male
  • female, 20's
Generalization and Specialization
• A concept C1 is more general than C2 if it describes a superset of the objects:
  • C1 = {20's, faculty} is more general than C2 = {20's, faculty, female}.
  • C2 is a specialization of C1.
• Immediate specializations (generalizations).
• The version space algorithm maintains the most specific and most general boundaries at every point of the learning, as in the sketch below.
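A minimal sketch of these boundaries (toy examples of mine, and brute-force enumeration rather than the incremental candidate-elimination updates): the version space is the set of conjunctive hypotheses consistent with the examples seen so far, and S and G are its minimal and maximal elements.

    from itertools import product

    # Attributes from the slide; None plays the role of "?" (any value).
    ROLES = ["faculty", "staff", "student"]
    AGES  = ["20's", "30's", "40's"]
    SEXES = ["male", "female"]

    hypotheses = list(product(ROLES + [None], AGES + [None], SEXES + [None]))

    def matches(h, x):
        return all(c is None or c == v for c, v in zip(h, x))

    def consistent(h, examples):
        return all(matches(h, x) == label for x, label in examples)

    examples = [
        (("faculty", "30's", "male"), True),    # hypothetical positive
        (("student", "20's", "male"), False),   # hypothetical negative
    ]

    version_space = [h for h in hypotheses if consistent(h, examples)]

    def more_general(h1, h2):
        """h1 is at least as general as h2."""
        return all(a is None or a == b for a, b in zip(h1, h2))

    S = [h for h in version_space
         if not any(h2 != h and more_general(h, h2) for h2 in version_space)]
    G = [h for h in version_space
         if not any(h2 != h and more_general(h2, h) for h2 in version_space)]
    print("S:", S)   # most specific boundary
    print("G:", G)   # most general boundary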
Example: the generalization lattice
• Top (most general): T
• One attribute: faculty; student; male; female; 20's; 30's
• Two attributes: male,fac; male,stud; female,fac; female,stud; fac,20's; fac,30's
• Three attributes: male,fac,20; male,fac,30; fem,fac,20; male,stud,30