
Efficient Rule Induction Algorithm by William W. Cohen Overview

This paper discusses a rule-based learning algorithm that efficiently handles large noisy datasets and rivals traditional symbolic learning methods. Discover the Sequential Covering Algorithm, Pruning Techniques, IREP Algorithm, and Evolution of Ripper. Goal: Develop a competitive rule learning algorithm.

mccarthyj

Presentation Transcript


  1. Fast Effective Rule Induction By William W. Cohen

  2. Overview • Rule Based Learning • Rule Learning Algorithm • Pruning Techniques • Modifications to IREP • Evolution of Ripper • Conclusion

  3. Goal of the Paper • The goal of this paper is to develop a rule learning algorithm that performs efficiently on large noisy datasets and is competitive in generalization performance with more mature symbolic learning methods, such as decision trees.

  4. Concepts to Refresh • Overfit and simplify strategy - Separate and Conquer • Pruning

  5. Separate and Conquer • General Idea: 1. Learn one rule that covers a certain number of positive examples 2. Remove the examples covered by the rule 3. Repeat until no positive examples are left.

  6. Sequential Covering Algorithm • Sequential-Covering(class, attributes, examples, threshold T) • RuleSet = {} • Rule = Learn-one-rule(class, attributes, examples) • While (performance(Rule) > T) do • a. RuleSet += Rule • b. examples = examples \ {examples classified correctly by Rule} • c. Rule = Learn-one-rule(class, attributes, examples) • Sort RuleSet based on the performance of the rules • Return RuleSet
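
The loop on this slide can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: rules are assumed to be conjunctions of attribute == value tests, `performance` is assumed to be the fraction of covered examples in the target class, `learn_one_rule` is a hypothetical stand-in that picks the best single test, and the threshold is assumed to be non-negative so that every accepted rule removes at least one example.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, str], str]   # (attribute values, class label)
Rule = Dict[str, str]                  # conjunction of attribute == value tests


def covers(rule: Rule, attrs: Dict[str, str]) -> bool:
    """True if the example satisfies every test in the rule."""
    return all(attrs.get(a) == v for a, v in rule.items())


def performance(rule: Rule, target: str, examples: List[Example]) -> float:
    """Assumed metric: fraction of covered examples that have the target class."""
    covered = [label for attrs, label in examples if covers(rule, attrs)]
    return sum(l == target for l in covered) / len(covered) if covered else 0.0


def learn_one_rule(target: str, attributes: List[str],
                   examples: List[Example]) -> Rule:
    """Hypothetical stand-in learner: the best single attribute == value test."""
    candidates = [{a: attrs[a]} for attrs, _ in examples for a in attributes]
    if not candidates:
        return {}
    return max(candidates, key=lambda r: performance(r, target, examples))


def sequential_covering(target: str, attributes: List[str],
                        examples: List[Example], threshold: float) -> List[Rule]:
    scored = []                          # (rule, performance when it was learned)
    remaining = list(examples)
    rule = learn_one_rule(target, attributes, remaining)
    while rule and performance(rule, target, remaining) > threshold:
        scored.append((rule, performance(rule, target, remaining)))
        # Remove the examples the rule classifies correctly.
        remaining = [(x, y) for x, y in remaining
                     if not (covers(rule, x) and y == target)]
        rule = learn_one_rule(target, attributes, remaining)
    scored.sort(key=lambda rp: rp[1], reverse=True)   # best rules first
    return [r for r, _ in scored]
```

Passing a different `learn_one_rule` or `performance` changes the learner without touching the covering loop, which is the point of the separate-and-conquer structure.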

  7. Pruning • Why do we need pruning? • Techniques of pruning: 1. Reduced Error Pruning 2. Grow 3. Incremental Reduced Error Pruning

  8. IREP Algorithm

  9. How to build a rule in IREP? • First, the uncovered examples are randomly partitioned into two subsets, a growing set and a pruning set. • Next, a rule is grown. The implementation of Grow Rule is a propositional version of FOIL.
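
The random partition could look like the sketch below. The slide does not give the split ratio; the two-thirds growing fraction used as the default here is an assumption, and `split_grow_prune` is a hypothetical helper name.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_grow_prune(examples: Sequence[T],
                     grow_fraction: float = 2 / 3,   # assumed ratio, not from the slides
                     seed: Optional[int] = None) -> Tuple[List[T], List[T]]:
    """Randomly partition examples into a growing set and a pruning set."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * grow_fraction))
    return shuffled[:cut], shuffled[cut:]
```

In IREP the split is redone on the still-uncovered examples before each new rule is grown, so a helper like this would be called once per rule.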

  10. Grow Rule • It begins with an empty conjunction of conditions and considers adding to this any condition of the form An = v, Ac ≤ θ, or Ac ≥ θ, where An is a nominal attribute and v is a legal value for An, or Ac is a continuous attribute and θ is some value for Ac that occurs in the training data.

  11. Grow Rule • Grow Rule repeatedly adds the condition that maximizes FOIL’s information gain criterion until the rule covers no negative examples from the growing dataset.
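
A minimal sketch of this greedy growth step follows. The slides do not spell out the gain formula; the code assumes the usual propositional form of FOIL's gain, p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))), where p0 and n0 count the positives and negatives covered before a condition is added and p1 and n1 count them after. The tuple encoding of conditions and all function names are illustrative choices.

```python
import math
from typing import Any, Dict, List, Tuple

Example = Dict[str, Any]
Condition = Tuple[str, str, Any]       # (attribute, operator, value)


def satisfies(ex: Example, cond: Condition) -> bool:
    attr, op, value = cond
    if op == "==":
        return ex[attr] == value
    if op == "<=":
        return ex[attr] <= value
    return ex[attr] >= value           # op == ">="


def foil_gain(p0: int, n0: int, p1: int, n1: int) -> float:
    """Assumed FOIL gain; defined as 0 when no positives are covered."""
    if p0 == 0 or p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))


def candidate_conditions(examples: List[Example], nominal: List[str],
                         continuous: List[str]) -> List[Condition]:
    """All tests An == v, Ac <= t, Ac >= t with values seen in the data."""
    conds = set()
    for ex in examples:
        for a in nominal:
            conds.add((a, "==", ex[a]))
        for a in continuous:
            conds.add((a, "<=", ex[a]))
            conds.add((a, ">=", ex[a]))
    return sorted(conds, key=repr)     # fixed order for reproducible ties


def grow_rule(grow_pos: List[Example], grow_neg: List[Example],
              nominal: List[str], continuous: List[str]) -> List[Condition]:
    rule: List[Condition] = []
    pos, neg = list(grow_pos), list(grow_neg)
    while neg:                         # stop once no negatives are covered
        best, best_gain = None, 0.0
        for cond in candidate_conditions(pos + neg, nominal, continuous):
            p1 = sum(satisfies(e, cond) for e in pos)
            n1 = sum(satisfies(e, cond) for e in neg)
            gain = foil_gain(len(pos), len(neg), p1, n1)
            if gain > best_gain:
                best, best_gain = cond, gain
        if best is None:               # no condition has positive gain
            break
        rule.append(best)
        pos = [e for e in pos if satisfies(e, best)]
        neg = [e for e in neg if satisfies(e, best)]
    return rule
```

The early exit when no condition has positive gain is a safeguard added for this sketch; it keeps the loop from running forever on data where positives and negatives cannot be separated.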

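  12. Pruning • After growing, the rule is immediately pruned: the algorithm considers deleting any final sequence of conditions from the rule and chooses the deletion that maximizes the function v(Rule, PrunePos, PruneNeg) = (p + (N - n)) / (P + N), where P and N are the numbers of examples in PrunePos and PruneNeg, and p and n are the numbers of those examples covered by Rule.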
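
The pruning step can be sketched as below, reading the metric as v = (p + (N - n)) / (P + N) with P and N the sizes of the pruning sets and p and n the covered counts. Conditions are modelled as plain Python predicates, which is an illustrative choice. Two further assumptions: at least one condition is always kept, and the pruning set is non-empty.

```python
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]
Condition = Callable[[Example], bool]


def rule_value(rule: List[Condition], prune_pos: List[Example],
               prune_neg: List[Example]) -> float:
    """v = (p + (N - n)) / (P + N): covered positives plus rejected negatives."""
    P, N = len(prune_pos), len(prune_neg)
    p = sum(all(c(e) for c in rule) for e in prune_pos)
    n = sum(all(c(e) for c in rule) for e in prune_neg)
    return (p + (N - n)) / (P + N)


def prune_rule(rule: List[Condition], prune_pos: List[Example],
               prune_neg: List[Example]) -> List[Condition]:
    """Try every deletion of a final sequence of conditions; keep the best prefix."""
    if not rule:
        return rule
    # Prefixes from the full rule down to a single condition (assumption:
    # the empty rule is not considered). Ties go to the longer prefix.
    prefixes = [rule[:k] for k in range(len(rule), 0, -1)]
    return max(prefixes, key=lambda r: rule_value(r, prune_pos, prune_neg))
```

For instance, if the last condition excludes a positive pruning example without excluding any negative one, the shorter prefix scores higher and the condition is dropped.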

  13. IREP • The IREP algorithm handles - Two-class problems - Multiple classes - Missing attributes

  14. Experiments with IREP The First Graph

  15. CPU times for C4.5, IREP and RIPPER2

  16. Improvements to IREP • Improving IREP requires three modifications: 1. The rule-value metric 2. The stopping criterion 3. Rule optimization

  17. Evolution of RIPPER • First, IREP* is used to obtain the initial rule set. This rule set is then optimized, and finally rules are added to cover any remaining positive examples using IREP*. This leads to a new algorithm, namely RIPPER.
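
The three stages on this slide can be laid out as a short control-flow skeleton. Everything passed in is a hypothetical placeholder: `irep_star`, `optimize_rules`, and `covers` stand for components the slides name but do not define, so this shows only the order of the steps, not how RIPPER implements them.

```python
from typing import Any, Callable, List

Rule = Any
Example = Any


def ripper(pos: List[Example], neg: List[Example],
           irep_star: Callable[[List[Example], List[Example]], List[Rule]],
           optimize_rules: Callable[[List[Rule], List[Example], List[Example]], List[Rule]],
           covers: Callable[[Rule, Example], bool]) -> List[Rule]:
    # 1. Obtain the initial rule set with IREP*.
    rules = irep_star(pos, neg)
    # 2. Optimize that rule set.
    rules = optimize_rules(rules, pos, neg)
    # 3. Add rules for positives that are still uncovered, again with IREP*.
    uncovered = [e for e in pos if not any(covers(r, e) for r in rules)]
    if uncovered:
        rules = rules + irep_star(uncovered, neg)
    return rules
```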
