
RIPPER Fast Effective Rule Induction

Machine Learning 2003. Merlin Holzapfel & Martin Schmidt (Mholzapf@uos.de, Martisch@uos.de).


Presentation Transcript


  1. RIPPER: Fast Effective Rule Induction • Machine Learning 2003 • Merlin Holzapfel & Martin Schmidt • Mholzapf@uos.de, Martisch@uos.de

  2. Rule Sets - advantages • easy to understand • usually better than decision tree learners • representable in first-order logic, hence easy to implement in Prolog • prior knowledge can be added

  3. Rule Sets - disadvantages • scale poorly with training set size • problems with noisy data, which is likely in real-world data • goal: develop a rule learner that is efficient on noisy data and competitive with C4.5 / C4.5rules

  4. Problem with Overfitting • an overfitted concept also fits the noisy cases • an underfitted concept is too general • solution: pruning • reduced error pruning (REP) • post pruning • pre pruning

  5. Post Pruning (C4.5) • overfit & simplify • construct a tree that overfits • convert the tree to rules • prune every rule separately • sort rules according to accuracy • consider the order when classifying • bottom-up

  6. Pre pruning • some examples are ignored during concept generation • final concept does not classify all training data correctly • can be implemented in form of stopping criteria

  7. Reduced Error Pruning • separate and conquer • split the data into a training set and a validation set • construct an overfitting tree • repeat until pruning reduces accuracy: evaluate the impact on the validation set of pruning each rule, then remove the rule whose removal improves accuracy most

  8. Time Complexity • REP has a time complexity of O(n⁴) • the initial phase of overfitting alone has a complexity of O(n²) • alternative concept Grow: • faster in benchmarks • time complexity still O(n⁴) with noisy data

  9. Incremental Reduced Error Pruning - IREP • by Fürnkranz & Widmer (1994) • competitive error rates • faster than REP and Grow

  10. How IREP Works • iterative application of REP • random split of sets → a bad split has a negative influence (but not as bad as with REP) • pruning immediately after a rule is grown (top-down approach) → no overfitting

  11. Cohen's IREP Implementation • build rules until a new rule results in too large an error rate • divide the data (randomly) into a growing set (2/3) and a pruning set (1/3) • grow a rule from the growing set • immediately prune the rule: delete the final sequence of conditions that maximizes the function v, repeating until no deletion improves the value of v • add the pruned rule to the rule set • delete every example covered by the rule (positive and negative)

  12. Cohen's IREP - Algorithm
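The algorithm figure from this slide is not part of the transcript. The steps listed on the previous slide can be sketched roughly as follows. This is a minimal illustration, not Cohen's implementation: examples are assumed to be dicts of nominal attribute values, rules are lists of (attribute, value) equality tests, and the growing phase ranks tests by precision (an assumption made here for brevity). The pruning metric is v = (p + (N - n)) / (P + N), evaluated on the pruning set. All function names are hypothetical.

```python
import random

def covers(rule, x):
    """A rule is a list of (attribute, value) equality tests; all must hold."""
    return all(x.get(a) == v for a, v in rule)

def count(rule, examples):
    """Number of examples covered by the rule."""
    return sum(1 for x in examples if covers(rule, x))

def grow_rule(pos, neg):
    """Greedily add tests until no negative growing example is covered.
    Assumption: tests are ranked by precision, a simplification."""
    rule = []
    while neg:
        candidates = sorted({(a, v) for x in pos for a, v in x.items()} - set(rule))
        if not candidates:
            break

        def score(test):
            p, n = count(rule + [test], pos), count(rule + [test], neg)
            return (p / (p + n), p)  # p >= 1: each test comes from a covered positive

        rule.append(max(candidates, key=score))
        pos = [x for x in pos if covers(rule, x)]
        neg = [x for x in neg if covers(rule, x)]
    return rule

def prune_rule(rule, pos, neg):
    """Delete the final sequence of conditions that maximizes v on the pruning set."""
    total = len(pos) + len(neg)

    def v(r):
        return (count(r, pos) + len(neg) - count(r, neg)) / total if total else 0.0

    prefixes = [rule[:k] for k in range(len(rule), 0, -1)]
    return max(prefixes, key=v) if prefixes else rule

def irep(pos, neg, seed=0):
    """Grow and prune one rule at a time; stop when a rule's error exceeds 50%."""
    rng = random.Random(seed)
    pos, neg, rules = list(pos), list(neg), []
    while pos:
        rng.shuffle(pos)
        rng.shuffle(neg)
        gp, gn = (2 * len(pos) + 2) // 3, (2 * len(neg) + 2) // 3  # 2/3 for growing
        rule = prune_rule(grow_rule(pos[:gp], neg[:gn]), pos[gp:], neg[gn:])
        p, n = count(rule, pos[gp:]), count(rule, neg[gn:])
        if not rule or p + n == 0 or n / (p + n) > 0.5:
            break
        rules.append(rule)
        # Remove every example (positive and negative) covered by the new rule.
        pos = [x for x in pos if not covers(rule, x)]
        neg = [x for x in neg if not covers(rule, x)]
    return rules
```

On a toy data set in which every positive example is red and every negative example is blue, the sketch returns the single rule `[("color", "red")]`.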

  13. IREP and Multiple Classes • order the classes by increasing prevalence (C1, ..., Ck) • find a rule set that separates C1 from the other classes: IREP(PosData=C1, NegData=C2,...,Ck) • remove all instances covered by the learned rule set • find a rule set that separates C2 from C3,...,Ck, and so on • Ck remains as the default class
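The class ordering above can be sketched as a small helper that turns a list of labels into a sequence of two-class problems. This is only an illustration; the function name and the tie-breaking by label text are assumptions, not taken from the slides.

```python
from collections import Counter

def ordered_binary_problems(labels):
    """Return ([(positive_class, negative_classes), ...], default_class),
    with classes ordered by increasing prevalence."""
    counts = Counter(labels)
    # Ties are broken by the label's string form (an assumption for determinism).
    order = sorted(counts, key=lambda c: (counts[c], str(c)))
    problems = [(c, order[i + 1:]) for i, c in enumerate(order[:-1])]
    return problems, order[-1]

problems, default = ordered_binary_problems(["a"] * 5 + ["b"] * 2 + ["c"] * 9)
print(problems)  # [('b', ['a', 'c']), ('a', ['c'])]
print(default)   # c
```

Each pair would be handed to the two-class learner in turn, after removing the instances covered by the rule sets learned so far.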

  14. IREP and Missing Attributes • handling missing attributes: every test involving attribute A fails for an instance whose value of A is missing
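A minimal sketch of this convention, assuming instances are dicts and a missing value is represented either by an absent key or by `None` (both representations are assumptions made for the example):

```python
def condition_holds(instance, attribute, value):
    """Equality test on one attribute; a missing value makes the test fail."""
    observed = instance.get(attribute)
    if observed is None:  # attribute absent or explicitly missing
        return False
    return observed == value

print(condition_holds({"color": "red"}, "color", "red"))  # True
print(condition_holds({"size": "big"}, "color", "red"))   # False
print(condition_holds({"color": None}, "color", "red"))   # False
```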

  15. Differences: Cohen vs. Original • pruning: final sequence of conditions (Cohen) vs. single final condition (original) • stopping condition: error rate above 50% (Cohen) vs. accuracy(rule) < accuracy(empty rule) (original) • application: missing attributes, numerical variables, multiple classes (Cohen) vs. two-class problems (original)

  16. Time Complexity • IREP: O(m log² m), where m = number of examples (assuming a fixed rate of classification noise)

  17. 37 Benchmark Problems

  18. Generalization Performance • IREP performs worse on benchmark problems than C4.5rules • won-lost-tie ratio: 11-23-3 • error ratio • 1.13 excluding mushroom • 1.52 including mushroom

  19. Improving IREP • three modifications: • alternative metric in pruning phase • new stopping heuristics for rule adding • post pruning of whole rule set (non-incremental pruning)

  20. The Rule-Value Metric • the old metric is not intuitive • example: R1: p1 = 2000, n1 = 1000; R2: p2 = 1000, n2 = 1 • the old metric prefers R1 (for fixed P, N) • this leads to occasional failure to converge • new metric (IREP*)
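The formulas themselves are not in the transcript. As given in Cohen (1995), the IREP metric is v = (p + (N - n)) / (P + N) and the IREP* metric is v* = (p - n) / (p + n), where p and n are the positive and negative pruning examples covered by the rule and P and N are the totals. The sketch below evaluates both on the slide's example; the values P = 3000 and N = 2000 are assumed purely for illustration.

```python
def v_irep(p, n, P, N):
    """IREP pruning metric: fraction of pruning examples handled correctly."""
    return (p + (N - n)) / (P + N)

def v_irep_star(p, n):
    """IREP* pruning metric: depends only on what the rule itself covers."""
    return (p - n) / (p + n)

P, N = 3000, 2000  # assumed totals, not from the slides
r1 = (2000, 1000)  # R1 covers 2000 positives and 1000 negatives
r2 = (1000, 1)     # R2 covers 1000 positives and 1 negative

print(v_irep(*r1, P, N), v_irep(*r2, P, N))  # 0.6 0.5998
print(v_irep_star(*r1), v_irep_star(*r2))
```

With these numbers the old metric ranks R1 slightly above R2 even though a third of what R1 covers is wrong, while the new metric clearly prefers R2 (about 0.998 versus about 0.333).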

  21. Stopping Condition • the 50% heuristic often stops too soon with moderate-sized samples • sensitive to the 'small disjunct problem' • solution: • after a rule is added, compute the total description length of the rule set and the misclassifications (DL = C + E) • if DL is more than d bits larger than the smallest length found so far, stop (min(DL) + d < DLcurrent) • d = 64 in Cohen's implementation → MDL (Minimum Description Length) heuristic
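The stopping test itself is a one-line comparison. The sketch below assumes the description length after each added rule has already been computed and collected in a list; how DL = C + E is encoded in bits is not covered here, and the function name is hypothetical.

```python
def should_stop(dl_history, d=64):
    """Stop once the current description length exceeds the smallest
    length seen so far by more than d bits."""
    return dl_history[-1] > min(dl_history) + d

print(should_stop([100, 90, 120]))  # False: 120 <= 90 + 64
print(should_stop([100, 90, 160]))  # True: 160 > 90 + 64
```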

  22. IREP* • IREP* is IREP improved by the new rule-value metric and the new stopping condition • won-lost-tie 28-8-1 against IREP • won-lost-tie 16-21-0 against C4.5rules • error ratio 1.06 excluding mushroom (IREP: 1.13) and 1.04 including mushroom (IREP: 1.52)

  23. Rule Optimization • post-prunes the rules produced by IREP* • the rules are considered in turn • for each rule Ri, two alternatives are constructed • Ri′ (replacement): a new rule • Ri″ (revision): based on Ri • the final rule is chosen according to MDL

  24. RIPPER • step 1: IREP* is used to obtain a rule set • step 2: rule optimization takes place • step 3: IREP* is used to cover the remaining positive examples → Repeated Incremental Pruning to Produce Error Reduction

  25. RIPPERk • apply steps 2 and 3 of RIPPER k times
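The control flow of RIPPERk can be sketched as a small driver. The three callables `irep_star`, `optimize` and `covered` are hypothetical placeholders for the components described on the earlier slides; only the loop structure is illustrated, and passing the full negative set to step 3 is an assumption.

```python
def ripper_k(pos, neg, k, irep_star, optimize, covered):
    """Step 1 once, then steps 2 and 3 repeated k times."""
    rules = irep_star(pos, neg)                    # step 1: initial rule set
    for _ in range(k):
        rules = optimize(rules, pos, neg)          # step 2: rule optimization
        rest = [x for x in pos if not covered(rules, x)]
        rules = rules + irep_star(rest, neg)       # step 3: cover remaining positives
    return rules
```

With k = 0 the driver reduces to plain IREP*, and k = 1 corresponds to RIPPER as described on the previous slide.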

  26. RIPPER Performance • 28-7-2 against IREP*

  27. Error Rates • RIPPER is clearly competitive

  28. Efficiency of RIPPERk • the modifications do not change the complexity

  29. Reasons for Efficiency • find a model with IREP* and then improve it • efficient first model of the right size • optimization takes linear time • C4.5 has an expensive optimization process because its initial model is too large • RIPPER is especially more efficient on large noisy datasets

  30. Conclusions • IREP is an efficient rule learner for large noisy datasets but performs worse than C4.5 • IREP is improved to IREP* • IREP* is improved to RIPPER • k-times iterated RIPPER is RIPPERk • RIPPERk is more efficient and performs better than C4.5

  31. References • William W. Cohen [1995], Fast Effective Rule Induction • J. Fürnkranz & G. Widmer [1994], Incremental Reduced Error Pruning • William W. Cohen [1993], Efficient Pruning Methods
