Learning Sets of Rules • Introduction • Sequential Covering Algorithms • First Order Rules • Induction as Inverted Deduction • Inverting Resolution • Summary
First Order Rules • We now consider rules that contain variables (first-order rules). • More specifically, we will learn first-order Horn theories. • Learning first-order rules is also known as inductive logic programming. • First-order Horn clauses are rules with one or more preconditions and a single consequent, where the predicates may have variables.
Example
Consider the following example. We wish to learn the relation Daughter(x,y), which means x is the daughter of y. We are given the following training example, describing the pair of persons Sharon and Bob:

Name1   Mother1  Father1  Male1  Female1
Sharon  Louise   Bob      False  True

Name2   Mother2  Father2  Male2  Female2  Daughter1,2
Bob     Nora     Victor   True   False    True
Example
Now, if we have many of these examples, we could learn the following rule:

If Father(x,y) and Female(x) then Daughter(x,y)

(reading Father(x,y) as "the father of x is y"). This is more powerful than the propositional approach:

If (Father1 = Bob) and (Name2 = Bob) and (Female1 = True) then (Daughter1,2 = True)

The advantage lies in our ability to express relations among attribute values.
Terminology Expressions contain the following: Constants: Bob, Louise, etc. Variables: x,y, etc. Predicates: Daughter, Father Functions: age
Terminology Term: a constant, variable, or function applied to a term (Bob, x, age(Bob)) Literal: a predicate or its negation (Married(Bob,Louise)) Clause: a disjunction of literals. Horn Clause: a clause containing at most one positive literal: H V ~L1 V … V ~Ln
Terminology
The following Horn clause

H V ~L1 V … V ~Ln

is equivalent to

H ← (L1 ^ … ^ Ln)

where H is the consequent (head) and the conjunction of literals (L1 ^ … ^ Ln) is the body or antecedent. Question: how do we learn sets of first-order rules?
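As a quick sanity check, this equivalence can be verified by brute force over all truth assignments; a minimal Python sketch for the two-literal case:

```python
from itertools import product

# Check that the Horn clause H V ~L1 V ~L2 is logically equivalent
# to the implication (L1 ^ L2) -> H for every truth assignment.
for H, L1, L2 in product([False, True], repeat=3):
    clause = H or (not L1) or (not L2)
    implication = (not (L1 and L2)) or H
    assert clause == implication
print("H V ~L1 V ~L2 is equivalent to (L1 ^ L2) -> H")
```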
Learning Sets of First Order Rules • A popular algorithm is FOIL (Quinlan, 1990). • The method is very similar to sequential covering.

FOIL(Target-predicate, Predicates, Examples)
• Pos ← those Examples for which Target-predicate is true
• Neg ← those Examples for which Target-predicate is false
• Learned-Rules ← {}
• While Pos is not empty do
  1. Learn a new rule NewRule
  2. Learned-Rules ← Learned-Rules + NewRule
  3. Pos ← Pos − {members of Pos covered by NewRule}
• Return Learned-Rules
Learning New Rules
• NewRule ← "If {} then Target-predicate" (the most general rule, with an empty precondition)
• CoveredNeg ← Neg
• While CoveredNeg is not empty do
  a. Candidate-literals ← generate new candidate literals for NewRule
  b. BestLiteral ← argmax over L in Candidate-literals of Foil_Gain(L, NewRule)
  c. Add BestLiteral to the preconditions of NewRule
  d. CoveredNeg ← subset of CoveredNeg satisfied by NewRule
• End While
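The pseudocode above maps onto two nested loops. A minimal Python sketch, assuming hypothetical helpers candidate_literals(rule), covers(rule, example), and gain(literal, rule, pos, neg) whose implementations depend on how rules, literals, and variable bindings are represented:

```python
def foil(pos, neg, candidate_literals, covers, gain):
    """Outer loop: sequential covering over first-order rules."""
    learned_rules = []
    while pos:                                   # until all positives are covered
        rule = learn_new_rule(pos, neg, candidate_literals, covers, gain)
        learned_rules.append(rule)
        # remove the positive examples covered by the new rule
        pos = [e for e in pos if not covers(rule, e)]
    return learned_rules

def learn_new_rule(pos, neg, candidate_literals, covers, gain):
    """Inner loop: general-to-specific hill climbing on one rule."""
    rule = []                                    # empty precondition = most general rule
    covered_neg = list(neg)
    while covered_neg:                           # specialize until no negatives remain
        best = max(candidate_literals(rule),
                   key=lambda lit: gain(lit, rule, pos, covered_neg))
        rule.append(best)                        # add best literal to the preconditions
        covered_neg = [e for e in covered_neg if covers(rule, e)]
    return rule
```

The sketch assumes every learned rule covers at least one positive example and eventually excludes all negatives; a real implementation would need extra stopping criteria for noisy data.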
Considerations • FOIL learns only rules that predict when the target predicate is true; the sequential covering algorithm seen earlier also learns rules predicting when the target is false. • FOIL performs a hill-climbing search, keeping only the best literal at each step; sequential covering performs a beam search. • FOIL rules are more expressive than Horn clauses. Why? Because the preconditions may contain negated literals.
Considerations • At the level of rule sets, FOIL searches from specific to general: each new rule adds one more disjunct, so the overall theory covers more positive examples. • Within each single rule, FOIL searches from general to specific, starting with a NULL (empty) precondition and greedily adding literals (hill climbing).
Generating Specializations
Assume our current rule is

P(x1, x2, …, xk) ← L1 ^ … ^ Ln

where each Li is a literal and P(x1, x2, …, xk) is the head or postcondition. FOIL considers new literals Ln+1 of the following forms to add to the rule:
• Predicates: Q(v1, …, vr), where Q is any predicate and each vi is an existing or new variable (at least one vi must already be present in the rule).
• Equality tests: Equal(xj, xk), where xj and xk are variables already present in the rule.
• Negations of the literals above.
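A small sketch of how these candidate literals could be enumerated, assuming predicates are given as a name-to-arity mapping and only two fresh variables are introduced (both choices are illustrative, not part of FOIL's definition):

```python
from itertools import product

def candidate_literals(rule_vars, predicates):
    """Enumerate candidate literals for the current rule.

    rule_vars:  variables already in the rule, e.g. ['x', 'y']
    predicates: name -> arity, e.g. {'Father': 2, 'Female': 1}
    """
    pool = rule_vars + ['z1', 'z2']           # existing plus a few fresh variables
    candidates = []
    for pred, arity in predicates.items():
        for args in product(pool, repeat=arity):
            if any(v in rule_vars for v in args):   # at least one existing variable
                candidates.append((pred, args))         # Q(v1, ..., vr)
                candidates.append(('not', pred, args))  # its negation
    for i, a in enumerate(rule_vars):         # Equal(xj, xk) over existing variables
        for b in rule_vars[i + 1:]:
            candidates.append(('Equal', (a, b)))
    return candidates

print(candidate_literals(['x', 'y'], {'Father': 2, 'Female': 1}))
```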
Example
We wish to learn the target predicate GrandDaughter(x,y). Our predicates are Father(x,y) and Female(x). Our constants are Victor, Sharon, Bob, and Tom. We start with the most general rule (an empty precondition):

GrandDaughter(x,y) ←
Example
Possible literals we could add: Equal(x,y), Female(x), Female(y), Father(x,y), … and their negations. Assume we find the best choice is

GrandDaughter(x,y) ← Father(y,z)
Example
We add the best candidate literal and continue adding literals until we generate a rule like the following:

GrandDaughter(x,y) ← Father(y,z) ^ Father(z,x) ^ Female(x)

At this point we remove all positive examples covered by the rule and begin the search for a new rule.
Choosing the Best Literal
To evaluate candidate literals, FOIL considers all possible bindings of the rule's variables to the known constants. For the target predicate GrandDaughter(x,y), one such binding is {x/Bob, y/Sharon}. A binding is positive if it corresponds to a positive example of the target predicate, and negative otherwise.
Choosing the Best Literal
Now compare the rule R before adding a literal L with the rule R' after adding it:

Foil_Gain(L,R) = t [ log2 (p1 / (p1 + n1)) - log2 (p0 / (p0 + n0)) ]

t: number of positive bindings of rule R still covered after adding literal L
p0: positive bindings of rule R
n0: negative bindings of rule R
p1: positive bindings of rule R'
n1: negative bindings of rule R'
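A direct translation of this formula, with the binding counts passed in as arguments (a real implementation must also guard against p1 = 0, where the specialized rule no longer covers any positive binding):

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """FOIL gain for adding literal L to rule R (R' is R with L added).

    p0, n0: positive / negative bindings of R
    p1, n1: positive / negative bindings of R'
    t:      positive bindings of R still covered by R'
    """
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# e.g. adding L keeps 10 of 12 positive bindings and cuts negatives from 20 to 2
print(foil_gain(p0=12, n0=20, p1=10, n1=2, t=10))
```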
Learning Recursive Rule Sets
What happens if we include the target predicate itself in the list of available predicates? Then FOIL can consider it as a candidate literal, which allows it to learn recursive rule sets. Example:

If Parent(x,y) then Ancestor(x,y)
If Parent(x,z) and Ancestor(z,y) then Ancestor(x,y)
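A minimal sketch of how such a recursive rule set evaluates, with a hypothetical set of Parent facts (assumed acyclic) over constants used in these slides:

```python
def ancestor(x, y, parent_facts):
    """Ancestor(x,y) <- Parent(x,y)
       Ancestor(x,y) <- Parent(x,z) ^ Ancestor(z,y)"""
    if (x, y) in parent_facts:                      # base case: first rule
        return True
    return any(ancestor(z, y, parent_facts)         # recursive case: second rule
               for (p, z) in parent_facts if p == x)

facts = {('Tom', 'Bob'), ('Bob', 'Sharon')}
print(ancestor('Tom', 'Sharon', facts))             # True: Tom -> Bob -> Sharon
```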
Induction as Inverted Deduction
What is the difference between induction and deduction? Induction is inference from the specific to the general; deduction is inference from the general to the specific. Induction can be cast as a deduction problem as follows: we wish to find a hypothesis h such that the classification f(xi) of each training instance xi follows deductively from h, the instance xi, and the background knowledge B:

(B ^ h ^ xi) |-- f(xi)

where |-- denotes entailment.
Example
Learn the target Child(u,v), meaning u is the child of v.
Positive example: Child(Bob, Sharon)
Given instance: Male(Bob), Female(Sharon), Father(Sharon, Bob)
Background knowledge: Parent(u,v) ← Father(u,v)
Two hypotheses satisfying the constraint are:
h1: Child(u,v) ← Father(v,u)
h2: Child(u,v) ← Parent(v,u)
h2 can only be derived with the help of the background knowledge; it illustrates the problem of constructive induction.
Inverting Resolution
Automated deduction uses the resolution rule (Robinson, 1965). Let L be a propositional literal and P, R propositional clauses. The resolution rule is:

P V L
~L V R
___________
P V R
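A minimal propositional sketch, representing a clause as a Python set of string literals with '~' marking negation (using the exam example from the next slide):

```python
def resolve(c1, c2, lit):
    """Resolve c1 (which contains lit) against c2 (which contains ~lit)."""
    neg = lit[1:] if lit.startswith('~') else '~' + lit
    assert lit in c1 and neg in c2, "clauses must contain complementary literals"
    return (c1 - {lit}) | (c2 - {neg})

c1 = {'PassExam', '~KnowMaterial'}
c2 = {'KnowMaterial', '~Study'}
print(resolve(c1, c2, '~KnowMaterial'))   # {'PassExam', '~Study'}
```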
Example
C1: PassExam V ~KnowMaterial
C2: KnowMaterial V ~Study
C:  PassExam V ~Study
If you know C1 and C, how can you induce C2?
Inverting Resolution
Suppose we have two clauses:
C1: B V D
C2: ??
C:  A V B
1. Any literal that occurs in C but not in C1 must appear in C2: A.
2. The literal that occurs in C1 but not in C must be the one resolved away, so its negation must appear in C2: ~D.
Hence C2: A V ~D. There are other solutions; what are they? Inverse resolution is not deterministic.
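Using the same set-of-literals representation as above, one possible inverse can be computed directly. This sketch returns only the solution in which C2 shares no extra literals with C1; since inverse resolution is not deterministic, other choices of C2 also exist:

```python
def inverse_resolve(c1, c):
    """Given C1 and resolvent C, return one candidate C2."""
    negate = lambda l: l[1:] if l.startswith('~') else '~' + l
    removed = (c1 - c).pop()          # the literal of C1 resolved away, here 'D'
    return (c - c1) | {negate(removed)}

c1 = {'B', 'D'}
c = {'A', 'B'}
print(inverse_resolve(c1, c))         # {'A', '~D'}
```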
Summary • Sequential covering learns a disjunctive set of rules by learning one rule, removing the positive examples covered by that rule, and repeating the process until all positive examples are covered. • Examples of sequential covering algorithms are the AQ and CN2 families of programs.
Summary • Learning first-order Horn clauses is the problem of inductive logic programming. • FOIL applies sequential covering to first order rules. • Induction can be seen as the inverse of deduction; programs exist to do this form of induction.