Chap. 10 Learning Sets of Rules 박성배, Dept. of Computer Engineering, Seoul National University
Learning Disjunctive Sets of Rules • Method 1: Learn a Decision Tree, then Translate the Tree into Rules • Method 2: Genetic Algorithm • Method 3: Learn Rule Sets Directly (Sequential Covering Algorithm)
Sequential Covering Algorithm (1)
SEQUENTIAL-COVERING(Target_attribute, Attributes, Examples, Threshold)
• Learned_rules ← {}
• Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
• while PERFORMANCE(Rule, Examples) > Threshold, do
  • Learned_rules ← Learned_rules + Rule
  • Examples ← Examples - {examples correctly classified by Rule}
  • Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
• Learned_rules ← sort Learned_rules according to PERFORMANCE over Examples
• return Learned_rules
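A minimal Python sketch of this loop; learn_one_rule, performance, and correctly_classified are hypothetical helpers assumed to be supplied by the caller, not part of the original pseudocode:

```python
def sequential_covering(learn_one_rule, performance, correctly_classified,
                        examples, threshold):
    """Greedy sequential covering (an illustrative sketch).

    learn_one_rule(examples)            -> one candidate rule
    performance(rule, examples)         -> accuracy-style score of the rule
    correctly_classified(rule, example) -> True if the rule classifies it correctly
    All three callables are assumptions of this sketch.
    """
    learned_rules = []
    rule = learn_one_rule(examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # Drop the examples this rule already handles, then learn the next rule.
        examples = [ex for ex in examples
                    if not correctly_classified(rule, ex)]
        rule = learn_one_rule(examples)
    # Sort so that better-performing rules are applied first.
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules
```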
Sequential Covering Algorithm (2) 1. Learn one rule with high accuracy and any coverage 2. Remove the positive examples covered by this rule 3. Repeat • Greedy search: no guarantee of finding the smallest or best set of rules
Learn-One-Rule (1) • General to Specific Search • Greedy Depth-First Search • No Backtracking • Begin with the Most General Rule • Greedily Add the Attribute Test that Most Improves Rule Performance • Result: High Accuracy, Possibly Incomplete Coverage
General to Specific Beam Search • To Reduce the Risk of a Suboptimal Greedy Choice • Maintain a List of the k Best Candidates at Each Step (see the sketch below)
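A compact sketch of LEARN-ONE-RULE as a general-to-specific beam search; the helpers specialize and performance, and the beam_width default, are assumptions made for illustration:

```python
def learn_one_rule(most_general_rule, specialize, performance,
                   examples, beam_width=5):
    """General-to-specific beam search for a single rule (illustrative sketch).

    most_general_rule           -> rule with an empty precondition
    specialize(rule)            -> candidate rules with one extra attribute test
                                   (returns [] once no further test can be added)
    performance(rule, examples) -> score to maximize (e.g., accuracy)
    """
    best_rule = most_general_rule
    frontier = [most_general_rule]
    while frontier:
        # All one-step specializations of every rule currently in the beam.
        candidates = [spec for rule in frontier for spec in specialize(rule)]
        if not candidates:
            break
        # Remember the best rule seen anywhere in the search.
        best_candidate = max(candidates, key=lambda r: performance(r, examples))
        if performance(best_candidate, examples) > performance(best_rule, examples):
            best_rule = best_candidate
        # Keep only the k best candidates as the next beam.
        frontier = sorted(candidates, key=lambda r: performance(r, examples),
                          reverse=True)[:beam_width]
    return best_rule
```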
Learning Rule Sets (1) • Sequential Covering Algorithm • Learns One Rule at a Time • Partitions Data by Attribute-Value Pair • ID3 • Learns the Entire Set of Disjunctive Rules Simultaneously • Partitions Data by Attribute • If data is plentiful, it can support the many independent choices sequential covering makes, so sequential covering may be preferable.
Learning Rule Sets (2) • Sequential Covering Algorithm • General to Specific Search • Begins with the Single Maximally General Hypothesis • Generate-then-Test Search • Robust: the Impact of Noisy Data is Minimized • Find-S Algorithm • Specific to General Search • Example-Driven
Learning Rule Sets (3) • Rule Post-Pruning, as for Decision Trees • Rule PERFORMANCE measures (sketched below) • relative frequency • m-estimate of accuracy • entropy (used by the information gain measure)
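The three evaluation functions as a small Python sketch; n_pos and n_neg denote the positive and negative examples covered by the rule, and prior and m are the m-estimate parameters (the names are illustrative only):

```python
import math

def relative_frequency(n_pos, n_neg):
    """Fraction of covered examples the rule classifies correctly."""
    return n_pos / (n_pos + n_neg)

def m_estimate(n_pos, n_neg, prior, m):
    """Accuracy smoothed toward the prior; m controls the weight of the prior."""
    return (n_pos + m * prior) / (n_pos + n_neg + m)

def entropy(n_pos, n_neg):
    """Entropy of the class distribution over the covered examples
    (lower is better, so searches often maximize its negative)."""
    total = n_pos + n_neg
    result = 0.0
    for count in (n_pos, n_neg):
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result
```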
Learning First-Order Rules • Motivation for First-Order Rules • More Expressive • Inductive Logic Programming (ILP) • Inductive Learning of First-Order Rules, i.e., Automatically Inferring PROLOG Programs • First-Order Horn Clauses • Horn Clause • a Clause Containing at Most One Positive Literal • H ← L1 ∧ … ∧ Ln • equivalently, H ∨ ¬(L1 ∧ … ∧ Ln)
Learning Sets of First-Order Rules: FOIL • FOIL • Natural Extension of SEQUENTIAL-COVERING & LEARN-ONE-RULE • Literals May Not Contain Function Symbols • Literals in the Rule Body May Be Negated
FOIL (2) • Seek Rules that Predict When the Target Literal is True • Hill-Climbing Search • Outer Loop • Adds New Rules to Generalize the Current Disjunctive Hypothesis • Specific to General Search • Inner Loop • Hypothesis Space Consists of Conjunctions of Literals • General to Specific, Hill-Climbing Search
Generating Candidate Specializations in FOIL (1) • Suppose the Current Rule is P(x1, x2, …, xk) ← L1 ∧ … ∧ Ln • New Literal Ln+1 must fit one of the following Forms: • Q(v1, …, vr) • Q: a Predicate Name Occurring in Predicates • each vi: either a New Variable or a Variable already Present in the Rule • At Least One vi Must Already Occur in the Current Rule • Equal(xj, xk), where xj and xk are Variables already Present in the Rule • the Negation of Either of the Above Forms
Generating Candidate Specializations in FOIL (2) • Example • Begin with the Most General Rule • GrandDaughter(x, y) ← • Generate the Following Candidate Literals • Equal(x, y), Female(x), Female(y), Father(x, y), Father(x, z), Father(z, x), Father(y, z), Father(z, y), and the negations of these literals • Suppose Father(y, z) is the Most Promising • GrandDaughter(x, y) ← Father(y, z) • Iterate • GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
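A rough Python sketch of how such candidate literals could be enumerated; the predicate-arity map, the single fresh variable "z", and the output format are assumptions, and the result is a superset of the literals listed on the slide:

```python
from itertools import product

def candidate_literals(rule_vars, predicates):
    """Enumerate candidate literals Q(v1, ..., vr) for specializing a rule.

    rule_vars  : variables already in the rule, e.g. ["x", "y"]
    predicates : predicate name -> arity, e.g. {"Female": 1, "Father": 2}
    Returns (name, args, negated) triples; each literal also appears negated,
    and Equal literals over existing variables are included.
    """
    new_var = "z"  # one fresh variable, as in the GrandDaughter example
    candidates = []
    for name, arity in predicates.items():
        for args in product(rule_vars + [new_var], repeat=arity):
            # FOIL's constraint: at least one argument must already be in the rule.
            if any(a in rule_vars for a in args):
                candidates.append((name, args, False))
                candidates.append((name, args, True))  # negated form
    # Equal(xj, xk) over distinct variables already present in the rule.
    for a in rule_vars:
        for b in rule_vars:
            if a != b:
                candidates.append(("Equal", (a, b), False))
                candidates.append(("Equal", (a, b), True))
    return candidates

# Example: candidates for GrandDaughter(x, y) <- with Female/1 and Father/2.
print(candidate_literals(["x", "y"], {"Female": 1, "Father": 2}))
```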
Guiding Search in FOIL • To Select the Most Promising Literal • Consider Performance of Rule Over Training Data • Consider All Possible Bindings of Each Variable
Guiding Search in FOIL • Information Gain in FOIL: Foil_Gain(L, R) = t · ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) ), where • L is the candidate literal to add to rule R • p0 = number of positive bindings of R • n0 = number of negative bindings of R • p1 = number of positive bindings of R + L • n1 = number of negative bindings of R + L • t = number of positive bindings of R that are still covered by R + L • Interpretation: the reduction, due to L, in the number of bits needed to encode the classification of the positive bindings
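A direct Python transcription of this measure; the argument names mirror the definitions above, and the binding counts are assumed to be computed elsewhere:

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """FOIL's information-gain heuristic for adding literal L to rule R.

    p0, n0 : positive / negative bindings of the original rule R
    p1, n1 : positive / negative bindings of the specialized rule R + L
    t      : positive bindings of R still covered by R + L
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # guard against log(0); no positive bindings means no gain
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
```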
Induction As Inverted Deduction (1) • Induction is Finding h such that (∀ ⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi), where • xi is the ith training instance • f(xi) is the target function value for xi • B is other background knowledge
Induction As Inverted Deduction (2) • Designing Inverse Entailment Operators: O(B, D) = h such that (∀ ⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi) • Minimum Description Length Principle • used to choose among the many hypotheses that satisfy this constraint • Practical Difficulties • Noisy Training Data is Not Tolerated • The Number of Hypotheses Satisfying the Constraint is Very Large • The Complexity of the Hypothesis Space Increases as B Grows
Deduction: Resolution Rule • From P ∨ L and ¬L ∨ R, conclude P ∨ R 1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2. 2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L: C = (C1 - {L}) ∪ (C2 - {¬L})
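A small Python sketch of this propositional resolution step; representing a clause as a frozenset of literals and a negated literal as a ("not", symbol) pair is purely an illustrative choice:

```python
def negate(literal):
    """Return the complement of a propositional literal."""
    return literal[1] if literal[0] == "not" else ("not", literal)

def resolve(c1, c2):
    """Propositional resolution: C = (C1 - {L}) ∪ (C2 - {¬L}).

    c1, c2 : clauses as frozensets of literals, e.g. frozenset({"P", "L"})
    Returns the first resolvent found, or None if the clauses do not resolve.
    """
    for lit in c1:
        if negate(lit) in c2:
            return (c1 - {lit}) | (c2 - {negate(lit)})
    return None

# (P ∨ L) and (¬L ∨ R) resolve to (P ∨ R).
print(resolve(frozenset({"P", "L"}), frozenset({("not", "L"), "R"})))
```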
Inverse Resolution Operator • Not Deterministic • There are Multiple Clauses C2 such that C1 and C2 Produce C • Prefer the Shorter One 1. Given the initial clause C1 and the resolvent C, find a literal L that occurs in clause C1 but not in clause C. 2. Form the second clause C2 by including the following literals: C2 = (C - (C1 - {L})) ∪ {¬L}
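The corresponding inverse step, again as a hedged sketch using the same frozenset clause representation (the negate helper is repeated here so the snippet stands alone):

```python
def negate(literal):
    """Complement of a propositional literal (same helper as in the sketch above)."""
    return literal[1] if literal[0] == "not" else ("not", literal)

def inverse_resolve(c, c1):
    """One non-deterministic inverse resolution step: C2 = (C - (C1 - {L})) ∪ {¬L}.

    c  : the resolvent clause
    c1 : one of the parent clauses
    Returns every C2 obtained by choosing a literal L of C1 that is absent
    from C; resolving C1 with any of them yields C again.
    """
    results = []
    for lit in c1:
        if lit not in c:
            c2 = (c - (c1 - {lit})) | {negate(lit)}
            results.append(c2)
    return results

# Recover (¬L ∨ R) from resolvent (P ∨ R) and parent (P ∨ L).
print(inverse_resolve(frozenset({"P", "R"}), frozenset({"P", "L"})))
```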
Rule-Learning Algorithm Based on Inverse Entailment Operators • Use the Sequential Covering Algorithm 1. Select a Training Example <xi, f(xi)> Not Yet Covered 2. Apply Inverse Resolution to Generate Hypotheses hi That Satisfy (B ∧ hi ∧ xi) ⊢ f(xi) 3. Iterate
First-Order Resolution • θ is a Unifying Substitution for two Literals L1 and L2 if L1θ = ¬L2θ. 1. Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ. 2. Form the Resolvent C by including all literals from C1θ and C2θ, except for L1θ and L2θ: C = (C1 - {L1})θ ∪ (C2 - {L2})θ
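A worked instance of this step (the clauses are chosen here to match the GrandChild data on the later slides): resolving C1 = GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y) with C2 = Father(Shannon, Tom), taking L1 = ¬Father(x, z), L2 = Father(Shannon, Tom), and θ = {x/Shannon, z/Tom}, gives the resolvent C = GrandChild(y, Shannon) ∨ ¬Father(Tom, y).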
Inverting First-Order Resolution (1) • C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2, where θ = θ1θ2 • By Definition, L2 = ¬L1θ1θ2⁻¹ • C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}
Inverting First-Order Resolution (2) • Training Data D = {GrandChild(Bob, Shannon)}, Background Knowledge B = {Father(Shannon, Tom), Father(Tom, Bob)}
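One derivation consistent with the inverse resolution operator defined above (illustrative only; since the operator is not deterministic, other choices of C2 are equally valid): • Inverse resolving C = GrandChild(Bob, Shannon) with C1 = Father(Shannon, Tom), using L1 = Father(Shannon, Tom) and the inverse substitution {Shannon/x}, yields C2 = GrandChild(Bob, x) ∨ ¬Father(x, Tom), i.e. GrandChild(Bob, x) ← Father(x, Tom) • Inverse resolving that clause with Father(Tom, Bob), using the inverse substitution {Tom/z, Bob/y}, yields GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y), i.e. the general rule GrandChild(y, x) ← Father(x, z) ∧ Father(z, y)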