CS 9633 Machine Learning: Explanation-Based Learning
Analytical Learning • Inductive learning: given a large set of examples, generalize to find features that distinguish positive and negative examples. • Examples include neural networks, genetic algorithms, decision trees, support vector machines, etc. • Problem: these methods perform poorly with very small training sets. • Analytical learning combines examples with a domain model.
Learning by People • People can often learn a concept from a single example. • They appear to do this by analyzing the example in terms of prior knowledge to determine the most relevant features. • Some inductive algorithms use domain knowledge to increase the hypothesis space. • Explanation-based learning uses domain knowledge to decrease the size of the hypothesis space.
Example • Positive example of the target concept: chess positions in which black will lose its queen within two moves.
Inductive versus Analytical Learning • Inductive learning: given a hypothesis space H and a set of training examples D, the desired output is a hypothesis consistent with the training examples. • Analytical learning: given a hypothesis space H, a set of training examples D, and a domain theory B, the desired output is a hypothesis consistent with both B and D.
SafeToStack Problem Instances • Instance space: each instance describes a pair of objects represented by the predicates • Type (e.g., Box, Endtable, …) • Color • Volume • Owner • Material • Density • On
SafeToStack Hypothesis Space • Hypothesis space H is a set of Horn clause rules. • The head of each rule is a literal containing the target predicate SafeToStack • The body of each rule is a conjunction of literals based on • The predicates used to describe the instances • Additional general-purpose predicates like: • LessThan • Equal • Greater • Additional general-purpose functions like: • Plus • Minus • Times Example hypothesis: SafeToStack(x,y) ← Volume(x,vx) ∧ Volume(y,vy) ∧ LessThan(vx,vy)
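The hypothesis space above can be sketched as a data structure. The encoding below (Literal/Clause, with `?`-prefixed strings as variables) is my own illustration, not the original system's representation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    pred: str
    args: tuple   # constants, or variables written as '?x', '?vx', ...

@dataclass(frozen=True)
class Clause:
    head: Literal   # literal containing the target predicate
    body: tuple     # conjunction of body literals

# SafeToStack(x,y) <- Volume(x,vx) ^ Volume(y,vy) ^ LessThan(vx,vy)
h = Clause(
    head=Literal("SafeToStack", ("?x", "?y")),
    body=(Literal("Volume", ("?x", "?vx")),
          Literal("Volume", ("?y", "?vy")),
          Literal("LessThan", ("?vx", "?vy"))),
)
print(h.head.pred)  # -> SafeToStack
```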
SafeToStack Target Concept SafeToStack(x,y)
SafeToStack Training Examples • SafeToStack(Obj1, Obj2) • On(Obj1, Obj2) • Type(Obj1,Box) • Type(Obj2, Endtable) • Color(Obj1, Red) • Color(Obj2, Blue) • Volume(Obj1, 2) • Owner(Obj1, Fred) • Owner(Obj2, Louise) • Density(Obj1, 0.3) • Material(Obj1, Cardboard) • Material(Obj2, Wood)
SafeToStack Domain Theory B • SafeToStack(x,y) ← ¬Fragile(y) • SafeToStack(x,y) ← Lighter(x,y) • Lighter(x,y) ← Weight(x,wx) ∧ Weight(y,wy) ∧ LessThan(wx,wy) • Weight(x,w) ← Volume(x,v) ∧ Density(x,d) ∧ Equal(w, times(v,d)) • Weight(x,5) ← Type(x,Endtable) • Fragile(x) ← Material(x,Glass)
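A toy backward-chaining prover over the Lighter branch of this domain theory can make the upcoming explanation step concrete. This is only a sketch: the term encoding, function names, and the treatment of LessThan/Equal as evaluated built-ins are my own choices, not Prolog-EBG's:

```python
import itertools

# Variables are strings starting with '?'; ('times', a, b) is a function term.
def subst(t, s):
    if isinstance(t, str) and t.startswith('?'):
        return subst(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return tuple(subst(x, s) for x in t)
    return t

def unify(a, b, s):
    a, b = subst(a, s), subst(b, s)
    if a == b:
        return s
    if isinstance(a, str) and a.startswith('?'):
        return {**s, a: b}
    if isinstance(b, str) and b.startswith('?'):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def evalt(t):  # evaluate times(...) terms to numbers
    if isinstance(t, tuple) and t[0] == 'times':
        return evalt(t[1]) * evalt(t[2])
    return t

FACTS = [('On', 'Obj1', 'Obj2'), ('Type', 'Obj1', 'Box'),
         ('Type', 'Obj2', 'Endtable'), ('Volume', 'Obj1', 2),
         ('Density', 'Obj1', 0.3)]

RULES = [(('SafeToStack', '?x', '?y'), [('Lighter', '?x', '?y')]),
         (('Lighter', '?x', '?y'),
          [('Weight', '?x', '?wx'), ('Weight', '?y', '?wy'),
           ('LessThan', '?wx', '?wy')]),
         (('Weight', '?x', '?w'),
          [('Volume', '?x', '?v'), ('Density', '?x', '?d'),
           ('Equal', '?w', ('times', '?v', '?d'))]),
         (('Weight', '?x', 5), [('Type', '?x', 'Endtable')])]

_fresh = itertools.count()

def rename(t, tag):  # fresh variable names for each rule use
    if isinstance(t, str) and t.startswith('?'):
        return t + tag
    if isinstance(t, tuple):
        return tuple(rename(x, tag) for x in t)
    return t

def prove(goals, s):
    if not goals:
        yield s
        return
    goal, rest = subst(goals[0], s), goals[1:]
    if goal[0] == 'LessThan':      # built-in: numeric comparison
        a, b = evalt(goal[1]), evalt(goal[2])
        if isinstance(a, (int, float)) and isinstance(b, (int, float)) and a < b:
            yield from prove(rest, s)
        return
    if goal[0] == 'Equal':         # built-in: bind w to v*d
        s2 = unify(goal[1], evalt(goal[2]), s)
        if s2 is not None:
            yield from prove(rest, s2)
        return
    for fact in FACTS:
        s2 = unify(goal, fact, s)
        if s2 is not None:
            yield from prove(rest, s2)
    for head, body in RULES:
        tag = '_%d' % next(_fresh)
        s2 = unify(goal, rename(head, tag), s)
        if s2 is not None:
            yield from prove([rename(b, tag) for b in body] + rest, s2)

print(any(prove([('SafeToStack', 'Obj1', 'Obj2')], {})))  # True
```

The proof that succeeds here is exactly the Lighter chain used in the explanation step below: Weight(Obj1,0.6) from volume times density, Weight(Obj2,5) from the endtable rule, and LessThan(0.6,5).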
Analytical Learning Problem • We must provide a domain theory sufficient to explain why observed positive examples satisfy the target concept. • The domain theory is a set of Horn clauses.
Learning with Perfect Domain Theories • Prolog EBG is an example system. • Domain theory must be: • Correct • Complete with respect to target concept and instance space
Reasonableness of Perfect Domain Theories • In some cases it is feasible to develop a perfect domain theory (chess is an example); EBL can then improve the performance of search-intensive planning and optimization problems. • In many domains it is not feasible to develop a perfect domain theory, and the learner must be able to generate plausible explanations instead.
Prolog-EBG (see Table 11.2 for details) • For each new positive training example not yet covered by a learned Horn clause, form a new Horn clause by: • Explaining the new positive training example by “proving” its truth • Analyzing this explanation to determine an appropriate generalization • Refining the current hypothesis by adding a new Horn clause that covers this positive example as well as other similar instances
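The three-step loop above can be sketched as follows. The `explain` and `analyze` arguments here are hypothetical stand-ins for the real proof and regression procedures of Table 11.2, and the toy run at the bottom is purely illustrative:

```python
def prolog_ebg(examples, explain, analyze):
    learned = []                      # hypothesis: set of learned clauses
    for ex in examples:
        if any(rule(ex) for rule in learned):
            continue                  # already covered by a learned clause
        proof = explain(ex)           # 1. explain (prove) the example
        rule = analyze(proof)         # 2. generalize via weakest preimage
        learned.append(rule)          # 3. refine the current hypothesis
    return learned

# toy run: "explanations" are parities, each learned rule covers one parity
rules = prolog_ebg([2, 4, 3], explain=lambda e: e % 2,
                   analyze=lambda p: (lambda e, p=p: e % 2 == p))
print(len(rules))  # 2: one clause learned for 2 (4 is covered), one for 3
```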
1. Explaining the Training Example • Provide a proof that the training example satisfies the target concept. • If the domain theory is correct and complete, use a proof procedure like resolution. • If the domain theory is not correct and complete, must extend “proof procedure” to allow plausible approximate arguments.
Explanation (proof tree) for SafeToStack(Obj1,Obj2):

SafeToStack(Obj1,Obj2)
  Lighter(Obj1,Obj2)
    Weight(Obj1,0.6)
      Volume(Obj1,2)
      Density(Obj1,0.3)
      Equal(0.6, times(2,0.3))
    LessThan(0.6,5)
    Weight(Obj2,5)
      Type(Obj2,Endtable)
Training example as an instance graph: Obj1 (Type: Box, Material: Cardboard, Color: Red, Owner: Fred, Volume: 2, Density: 0.3) is On Obj2 (Type: Endtable, Material: Wood, Color: Blue, Owner: Louise).
2. Generating a General Rule • General rule from the domain theory: SafeToStack(x,y) ← Volume(x,2) ∧ Density(x,0.3) ∧ Type(y,Endtable) • Note that we omitted the leaf nodes that are always satisfied independent of x and y: • Equal(0.6, times(2,0.3)) • LessThan(0.6, 5) • However, we would like an even more general rule
Weakest Preimage • Goal is to compute the most general rule that can be justified by the explanation. • We do this by computing the weakest preimage • Definition: the weakest preimage of a conclusion C with respect to proof P is the most general set of assertions A, such that A entails C according to P.
Most General Rule • The most general rule that can be justified by the explanation is: SafeToStack(x,y) ← Volume(x,vx) ∧ Density(x,dx) ∧ Equal(wx, times(vx,dx)) ∧ LessThan(wx,5) ∧ Type(y,Endtable) • Use a general procedure called regression to generate this rule: • Start with the target concept with respect to the final step in the explanation • Generate the weakest preimage of the target concept with respect to the preceding step • Terminate after iterating over all steps in the explanation
Regression trace (frontier literals from the training example on the left, the corresponding general literals on the right):

• Step 0 (target concept): SafeToStack(Obj1,Obj2) / SafeToStack(x,y)
• Step 1, regress through SafeToStack(x,y) ← Lighter(x,y): Lighter(Obj1,Obj2) / Lighter(x,y)
• Step 2, regress through Lighter(x,y) ← Weight(x,wx) ∧ Weight(y,wy) ∧ LessThan(wx,wy): Weight(Obj1,0.6), LessThan(0.6,5), Weight(Obj2,5) / Weight(x,wx) ∧ LessThan(wx,wy) ∧ Weight(y,wy)
• Step 3, regress Weight(Obj1,0.6) through Weight(x,w) ← Volume(x,v) ∧ Density(x,d) ∧ Equal(w, times(v,d)): Volume(Obj1,2), Density(Obj1,0.3), Equal(0.6, times(2,0.3)), LessThan(0.6,5), Weight(Obj2,5) / Volume(x,vx) ∧ Density(x,dx) ∧ Equal(wx, times(vx,dx)) ∧ LessThan(wx,wy) ∧ Weight(y,wy)
• Step 4, regress Weight(Obj2,5) through Weight(x,5) ← Type(x,Endtable): Type(Obj2,Endtable) / Volume(x,vx) ∧ Density(x,dx) ∧ Equal(wx, times(vx,dx)) ∧ LessThan(wx,5) ∧ Type(y,Endtable)
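A single regression step can be sketched in code. The encoding below is my own (flat literals as tuples, `?`-prefixed variables); it regresses the general frontier through Weight(z,5) ← Type(z,Endtable), replacing the matched Weight literal with the clause body under the unifying substitution:

```python
def unify_args(xs, ys, s):
    # flat unification of argument lists; variables start with '?'
    for x, y in zip(xs, ys):
        x, y = s.get(x, x), s.get(y, y)
        if x == y:
            continue
        if isinstance(x, str) and x.startswith('?'):
            s[x] = y
        elif isinstance(y, str) and y.startswith('?'):
            s[y] = x
        else:
            return None
    return s

def regress(frontier, target, head, body):
    s = unify_args(head[1:], target[1:], {})   # match clause head to target
    kept = [l for l in frontier if l != target] + list(body)
    return [(l[0],) + tuple(s.get(a, a) for a in l[1:]) for l in kept]

frontier = [('Weight', '?x', '?wx'), ('LessThan', '?wx', '?wy'),
            ('Weight', '?y', '?wy')]
result = regress(frontier, ('Weight', '?y', '?wy'),
                 ('Weight', '?z', 5), [('Type', '?z', 'Endtable')])
print(result)
# Weight(y,wy) is replaced by Type(y,Endtable), and wy is bound to 5,
# yielding Weight(x,wx), LessThan(wx,5), Type(y,Endtable)
```

This reproduces the final regression step of the trace above: the weakest preimage swaps Weight(y,wy) for Type(y,Endtable) and specializes wy to 5.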
3. Refine the Current Hypothesis • The current hypothesis is the set of Horn clauses learned so far. • At each stage, a new positive example is picked that is not yet covered by the current hypothesis and a new rule is developed to cover it. • Only positive examples are covered by the rules. • Instances not covered by the rules are classified as negative (negation-as-failure approach)
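The negation-as-failure classification rule above can be shown in a few lines. The learned clause here is a made-up stand-in (weight below 5 via volume times density), not a rule from the source:

```python
def classify(instance, learned):
    # positive iff some learned clause covers the instance;
    # anything not covered is classified negative (negation as failure)
    return any(rule(instance) for rule in learned)

learned = [lambda i: i['Volume'] * i['Density'] < 5]  # hypothetical clause
print(classify({'Volume': 2, 'Density': 0.3}, learned))   # True (positive)
print(classify({'Volume': 10, 'Density': 1.0}, learned))  # False -> negative
```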
EBL Summary • Individual examples are explained (proven) using prior knowledge • Attributes included in the proof are considered relevant. • Regression is used to generalize the rule. • Generality of learned clauses depends on the formulation of the domain theory, the order in which examples are encountered, and other instances that share the same explanation. • Assumes domain theory is complete and correct.
Different Perspectives on EBL • EBL is a theory-guided generalization of examples. • EBL is an example-guided reformulation of theories: it creates rules that • Follow deductively from the domain theory • Classify the observed training examples in a single inference step • EBL is just a restatement of what the learner already knows (knowledge compilation)
Inductive Bias of EBL • Domain theory • Algorithm (sequential covering) used to choose among alternative Horn clauses. • Generalization procedure favors small sets of Horn clauses.
EBL for Search Strategies • Requirement for correct and complete domain theory is often difficult to meet, but can often be met in complex search tasks. • This type of learning is called speedup learning. • Can use EBL to learn efficient sequences of operators (evolve meta-operators)