Chap. 10 Learning Sets of Rules 박성배, Dept. of Computer Engineering, Seoul National University
Learning Disjunctive Sets of Rules • Method 1: Learn a Decision Tree, then Translate the Tree into Rules • Method 2: Genetic Algorithm • Method 3: Learn Rule Sets Directly (Sequential Covering Algorithm)
Sequential Covering Algorithm (1)
SEQUENTIAL-COVERING(Target_attribute, Attributes, Examples, Threshold)
• Learned_rules ← {}
• Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
• while PERFORMANCE(Rule, Examples) > Threshold, do
  • Learned_rules ← Learned_rules + Rule
  • Examples ← Examples - {examples correctly classified by Rule}
  • Rule ← LEARN-ONE-RULE(Target_attribute, Attributes, Examples)
• Learned_rules ← sort Learned_rules according to PERFORMANCE over Examples
• return Learned_rules
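A minimal Python sketch of this loop; learn_one_rule, performance, and correctly_classified are hypothetical helpers assumed to be supplied by the caller, not part of the original pseudocode:

```python
def sequential_covering(learn_one_rule, performance, correctly_classified,
                        examples, threshold):
    """Greedy sequential covering (an illustrative sketch).

    learn_one_rule(examples)            -> one candidate rule
    performance(rule, examples)         -> accuracy-style score of the rule
    correctly_classified(rule, example) -> True if the rule classifies it correctly
    All three callables are assumptions of this sketch.
    """
    learned_rules = []
    rule = learn_one_rule(examples)
    while performance(rule, examples) > threshold:
        learned_rules.append(rule)
        # Drop the examples this rule already handles, then learn the next rule.
        examples = [ex for ex in examples
                    if not correctly_classified(rule, ex)]
        rule = learn_one_rule(examples)
    # Sort so that better-performing rules are applied first.
    learned_rules.sort(key=lambda r: performance(r, examples), reverse=True)
    return learned_rules
```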
Sequential Covering Algorithm (2) 1. Learn one rule with high accuracy and any coverage 2. Remove the positive examples covered by this rule 3. Repeat • Greedy search: no guarantee of finding the smallest or best set of rules
Learn-One-Rule (1) • General to Specific Search • Greedy Depth-First Search • No Backtracking • Begin with the Most General Rule • Greedily Add the Attribute Test that Most Improves Rule Performance • Result: High Accuracy, Possibly Incomplete Coverage
General to Specific Beam Search • To Reduce the Risk of a Suboptimal Greedy Choice • Maintain a List of the k Best Candidates at Each Step (see the sketch below)
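A compact sketch of LEARN-ONE-RULE as a general-to-specific beam search; the helpers specialize and performance, and the beam_width default, are assumptions made for illustration:

```python
def learn_one_rule(most_general_rule, specialize, performance,
                   examples, beam_width=5):
    """General-to-specific beam search for a single rule (illustrative sketch).

    most_general_rule           -> rule with an empty precondition
    specialize(rule)            -> candidate rules with one extra attribute test
                                   (returns [] once no further test can be added)
    performance(rule, examples) -> score to maximize (e.g., accuracy)
    """
    best_rule = most_general_rule
    frontier = [most_general_rule]
    while frontier:
        # All one-step specializations of every rule currently in the beam.
        candidates = [spec for rule in frontier for spec in specialize(rule)]
        if not candidates:
            break
        # Remember the best rule seen anywhere in the search.
        best_candidate = max(candidates, key=lambda r: performance(r, examples))
        if performance(best_candidate, examples) > performance(best_rule, examples):
            best_rule = best_candidate
        # Keep only the k best candidates as the next beam.
        frontier = sorted(candidates, key=lambda r: performance(r, examples),
                          reverse=True)[:beam_width]
    return best_rule
```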
Learning Rule Sets (1) • Sequential Covering Algorithm • Learns One Rule at a Time • Partitions Data by Attribute-Value Pair • ID3 • Learns the Entire Set of Disjunctive Rules Simultaneously • Partitions Data by Attribute • If data is plentiful, it can support the many independent choices sequential covering makes, so sequential covering may be preferable.
Learning Rule Sets (2) • Sequential Covering Algorithm • General to Specific Search • Begins with the Single Maximally General Hypothesis • Generate-then-Test Search • Robust: the Impact of Noisy Data is Minimized • Find-S Algorithm • Specific to General Search • Example-Driven
Learning Rule Sets (3) • Rule Post-Pruning, as for Decision Trees • Rule PERFORMANCE measures (sketched below) • relative frequency • m-estimate of accuracy • entropy (used by the information gain measure)
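The three evaluation functions as a small Python sketch; n_pos and n_neg denote the positive and negative examples covered by the rule, and prior and m are the m-estimate parameters (the names are illustrative only):

```python
import math

def relative_frequency(n_pos, n_neg):
    """Fraction of covered examples the rule classifies correctly."""
    return n_pos / (n_pos + n_neg)

def m_estimate(n_pos, n_neg, prior, m):
    """Accuracy smoothed toward the prior; m controls the weight of the prior."""
    return (n_pos + m * prior) / (n_pos + n_neg + m)

def entropy(n_pos, n_neg):
    """Entropy of the class distribution over the covered examples
    (lower is better, so searches often maximize its negative)."""
    total = n_pos + n_neg
    result = 0.0
    for count in (n_pos, n_neg):
        if count > 0:
            p = count / total
            result -= p * math.log2(p)
    return result
```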
Learning First-Order Rules • Motivation for First-Order Rules • More Expressive • Inductive Logic Programming (ILP) • Inductive Learning of First-Order Rules, i.e., Automatically Inferring PROLOG Programs • First-Order Horn Clauses • Horn Clause • a Clause Containing at Most One Positive Literal • H ← L1 ∧ … ∧ Ln • equivalently, H ∨ ¬(L1 ∧ … ∧ Ln)
Learning Sets of First-Order Rules: FOIL • FOIL • Natural Extension of SEQUENTIAL-COVERING & LEARN-ONE-RULE • Literals May Not Contain Function Symbols • Literals in the Rule Body May Be Negated
FOIL (2) • Seek Rules that Predict When the Target Literal is True • Hill-Climbing Search • Outer Loop • Adds New Rules to Generalize the Current Disjunctive Hypothesis • Specific to General Search • Inner Loop • Hypothesis Space Consists of Conjunctions of Literals • General to Specific, Hill-Climbing Search
Generating Candidate Specializations in FOIL (1) • Suppose the Current Rule is P(x1, x2, …, xk) ← L1 ∧ … ∧ Ln • New Literal Ln+1 must fit one of the following Forms: • Q(v1, …, vr) • Q: a Predicate Name Occurring in Predicates • each vi: either a New Variable or a Variable already Present in the Rule • At Least One vi Must Already Occur in the Current Rule • Equal(xj, xk), where xj and xk are Variables already Present in the Rule • the Negation of Either of the Above Forms
Generating Candidate Specializations in FOIL (2) • Example • Begin with the Most General Rule • GrandDaughter(x, y) ← • Generate the Following Candidate Literals • Equal(x, y), Female(x), Female(y), Father(x, y), Father(x, z), Father(z, x), Father(y, z), Father(z, y), and the negations of these literals • Suppose Father(y, z) is the Most Promising • GrandDaughter(x, y) ← Father(y, z) • Iterate • GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
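A rough Python sketch of how such candidate literals could be enumerated; the predicate-arity map, the single fresh variable "z", and the output format are assumptions, and the result is a superset of the literals listed on the slide:

```python
from itertools import product

def candidate_literals(rule_vars, predicates):
    """Enumerate candidate literals Q(v1, ..., vr) for specializing a rule.

    rule_vars  : variables already in the rule, e.g. ["x", "y"]
    predicates : predicate name -> arity, e.g. {"Female": 1, "Father": 2}
    Returns (name, args, negated) triples; each literal also appears negated,
    and Equal literals over existing variables are included.
    """
    new_var = "z"  # one fresh variable, as in the GrandDaughter example
    candidates = []
    for name, arity in predicates.items():
        for args in product(rule_vars + [new_var], repeat=arity):
            # FOIL's constraint: at least one argument must already be in the rule.
            if any(a in rule_vars for a in args):
                candidates.append((name, args, False))
                candidates.append((name, args, True))  # negated form
    # Equal(xj, xk) over distinct variables already present in the rule.
    for a in rule_vars:
        for b in rule_vars:
            if a != b:
                candidates.append(("Equal", (a, b), False))
                candidates.append(("Equal", (a, b), True))
    return candidates

# Example: candidates for GrandDaughter(x, y) <- with Female/1 and Father/2.
print(candidate_literals(["x", "y"], {"Female": 1, "Father": 2}))
```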
Guiding Search in FOIL • To Select the Most Promising Literal • Consider Performance of Rule Over Training Data • Consider All Possible Bindings of Each Variable
Guiding Search in FOIL • Information Gain in FOIL: Foil_Gain(L, R) = t · ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) ), where • L is the candidate literal to add to rule R • p0 = number of positive bindings of R • n0 = number of negative bindings of R • p1 = number of positive bindings of R + L • n1 = number of negative bindings of R + L • t = number of positive bindings of R that are still covered by R + L • Interpretation: the reduction, due to L, in the number of bits needed to encode the classification of the positive bindings
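A direct Python transcription of this measure; the argument names mirror the definitions above, and the binding counts are assumed to be computed elsewhere:

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """FOIL's information-gain heuristic for adding literal L to rule R.

    p0, n0 : positive / negative bindings of the original rule R
    p1, n1 : positive / negative bindings of the specialized rule R + L
    t      : positive bindings of R still covered by R + L
    """
    if p0 == 0 or p1 == 0:
        return 0.0  # guard against log(0); no positive bindings means no gain
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))
```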
Induction As Inverted Deduction (1) • Induction is Finding h such that (∀ ⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi), where • xi is the ith training instance • f(xi) is the target function value for xi • B is other background knowledge
Induction As Inverted Deduction (2) • Designing Inverse Entailment Operators: O(B, D) = h such that (∀ ⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi) • Minimum Description Length Principle • used to choose among the many hypotheses that satisfy this constraint • Practical Difficulties • Noisy Training Data is Not Tolerated • The Number of Hypotheses Satisfying the Constraint is Very Large • The Complexity of the Hypothesis Space Increases as B Grows
Deduction: Resolution Rule • From P ∨ L and ¬L ∨ R, conclude P ∨ R 1. Given initial clauses C1 and C2, find a literal L from clause C1 such that ¬L occurs in clause C2. 2. Form the resolvent C by including all literals from C1 and C2, except for L and ¬L: C = (C1 - {L}) ∪ (C2 - {¬L})
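A small Python sketch of this propositional resolution step; representing a clause as a frozenset of literals and a negated literal as a ("not", symbol) pair is purely an illustrative choice:

```python
def negate(literal):
    """Return the complement of a propositional literal."""
    return literal[1] if literal[0] == "not" else ("not", literal)

def resolve(c1, c2):
    """Propositional resolution: C = (C1 - {L}) ∪ (C2 - {¬L}).

    c1, c2 : clauses as frozensets of literals, e.g. frozenset({"P", "L"})
    Returns the first resolvent found, or None if the clauses do not resolve.
    """
    for lit in c1:
        if negate(lit) in c2:
            return (c1 - {lit}) | (c2 - {negate(lit)})
    return None

# (P ∨ L) and (¬L ∨ R) resolve to (P ∨ R).
print(resolve(frozenset({"P", "L"}), frozenset({("not", "L"), "R"})))
```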
Inverse Resolution Operator • Not Deterministic • There are Multiple Clauses C2 such that C1 and C2 Produce C • Prefer the Shorter One 1. Given the initial clause C1 and the resolvent C, find a literal L that occurs in clause C1 but not in clause C. 2. Form the second clause C2 by including the following literals: C2 = (C - (C1 - {L})) ∪ {¬L}
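The corresponding inverse step, again as a hedged sketch using the same frozenset clause representation (the negate helper is repeated here so the snippet stands alone):

```python
def negate(literal):
    """Complement of a propositional literal (same helper as in the sketch above)."""
    return literal[1] if literal[0] == "not" else ("not", literal)

def inverse_resolve(c, c1):
    """One non-deterministic inverse resolution step: C2 = (C - (C1 - {L})) ∪ {¬L}.

    c  : the resolvent clause
    c1 : one of the parent clauses
    Returns every C2 obtained by choosing a literal L of C1 that is absent
    from C; resolving C1 with any of them yields C again.
    """
    results = []
    for lit in c1:
        if lit not in c:
            c2 = (c - (c1 - {lit})) | {negate(lit)}
            results.append(c2)
    return results

# Recover (¬L ∨ R) from resolvent (P ∨ R) and parent (P ∨ L).
print(inverse_resolve(frozenset({"P", "R"}), frozenset({"P", "L"})))
```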
Rule-Learning Algorithm Based on Inverse Entailment Operators • Use the Sequential Covering Algorithm 1. Select a Training Example <xi, f(xi)> Not Yet Covered 2. Apply Inverse Resolution to Generate Hypotheses hi That Satisfy (B ∧ hi ∧ xi) ⊢ f(xi) 3. Iterate
First-Order Resolution • θ is a Unifying Substitution for two Literals L1 and L2 if L1θ = ¬L2θ. 1. Find a literal L1 from clause C1, a literal L2 from clause C2, and a substitution θ such that L1θ = ¬L2θ. 2. Form the Resolvent C by including all literals from C1θ and C2θ, except for L1θ and L2θ: C = (C1 - {L1})θ ∪ (C2 - {L2})θ
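A worked instance of this step (the clauses are chosen here to match the GrandChild data on the later slides): resolving C1 = GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y) with C2 = Father(Shannon, Tom), taking L1 = ¬Father(x, z), L2 = Father(Shannon, Tom), and θ = {x/Shannon, z/Tom}, gives the resolvent C = GrandChild(y, Shannon) ∨ ¬Father(Tom, y).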
Inverting First-Order Resolution (1) • C = (C1 - {L1})θ1 ∪ (C2 - {L2})θ2, where θ = θ1θ2 • By Definition, L2 = ¬L1θ1θ2⁻¹ • C2 = (C - (C1 - {L1})θ1)θ2⁻¹ ∪ {¬L1θ1θ2⁻¹}
Inverting First-Order Resolution (2) • Training Data D = {GrandChild(Bob, Shannon)}, Background Knowledge B = {Father(Shannon, Tom), Father(Tom, Bob)}
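One derivation consistent with the inverse resolution operator defined above (illustrative only; since the operator is not deterministic, other choices of C2 are equally valid): • Inverse resolving C = GrandChild(Bob, Shannon) with C1 = Father(Shannon, Tom), using L1 = Father(Shannon, Tom) and the inverse substitution {Shannon/x}, yields C2 = GrandChild(Bob, x) ∨ ¬Father(x, Tom), i.e. GrandChild(Bob, x) ← Father(x, Tom) • Inverse resolving that clause with Father(Tom, Bob), using the inverse substitution {Tom/z, Bob/y}, yields GrandChild(y, x) ∨ ¬Father(x, z) ∨ ¬Father(z, y), i.e. the general rule GrandChild(y, x) ← Father(x, z) ∧ Father(z, y)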