
Rule-Based Classifiers



Presentation Transcript


  1. Rule-Based Classifiers

  2. Rule-Based Classifier • Classify records by using a collection of “if…then…” rules • Rule: (Condition) → y • where • Condition is a conjunction of attribute tests • y is the class label • LHS: rule antecedent or condition • RHS: rule consequent • Examples of classification rules: • (Blood Type=Warm) ∧ (Lay Eggs=Yes) → Birds • (Taxable Income < 50K) ∧ (Refund=Yes) → Evade=No

  3. Rule-based Classifier (Example) R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles R5: (Live in Water = sometimes) → Amphibians

  4. Application of Rule-Based Classifier • A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles R5: (Live in Water = sometimes) → Amphibians The rule R1 covers a hawk ⇒ Bird The rule R3 covers the grizzly bear ⇒ Mammal
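The covering check can be sketched in a few lines of Python. The representation here is illustrative, not from any library: a rule is a (condition, class) pair, and a condition is a dict of attribute → required value.

```python
# A rule is a (condition, class) pair; a condition is a conjunction of
# attribute tests, represented as a dict of attribute -> required value.
rules = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "Birds"),         # R1
    ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),  # R2
    ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),  # R3
    ({"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),       # R4
    ({"Live in Water": "sometimes"}, "Amphibians"),            # R5
]

def covers(condition, instance):
    """A rule covers an instance if the instance satisfies every test."""
    return all(instance.get(attr) == val for attr, val in condition.items())

hawk = {"Give Birth": "no", "Can Fly": "yes",
        "Live in Water": "no", "Blood Type": "warm"}
grizzly = {"Give Birth": "yes", "Can Fly": "no",
           "Live in Water": "no", "Blood Type": "warm"}

print(covers(rules[0][0], hawk))     # R1 covers the hawk -> True
print(covers(rules[2][0], grizzly))  # R3 covers the grizzly bear -> True
```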

  5. Rule Coverage and Accuracy • Coverage of a rule: • Fraction of records that satisfy the antecedent of a rule • Accuracy of a rule: • Fraction of records that satisfy both the antecedent and consequent of a rule (over those that satisfy the antecedent) (Status=Single) → No Coverage = 40%, Accuracy = 50%
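The two definitions above can be computed directly. This is a minimal sketch; the ten records below are invented purely to reproduce the 40% / 50% figures from the slide (the original slide's table is not shown in this transcript).

```python
def coverage_and_accuracy(condition, consequent, records):
    """Coverage: fraction of records satisfying the antecedent.
    Accuracy: among covered records, the fraction whose class
    matches the consequent."""
    covered = [r for r in records
               if all(r.get(a) == v for a, v in condition.items())]
    coverage = len(covered) / len(records)
    accuracy = (sum(r["class"] == consequent for r in covered) / len(covered)
                if covered else 0.0)
    return coverage, accuracy

# Ten illustrative records: 4 Single (2 No, 2 Yes), 6 Married (all No).
records = ([{"Status": "Single", "class": "No"}] * 2 +
           [{"Status": "Single", "class": "Yes"}] * 2 +
           [{"Status": "Married", "class": "No"}] * 6)

cov, acc = coverage_and_accuracy({"Status": "Single"}, "No", records)
print(cov, acc)  # 0.4 0.5  (i.e. coverage 40%, accuracy 50%)
```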

  6. Decision Trees vs. rules From trees to rules. • Easy: converting a tree into a set of rules • One rule for each leaf: • Antecedent contains a condition for every node on the path from the root to the leaf • Consequent is the class assigned by the leaf • Straightforward, but rule set might be overly complex
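The one-rule-per-leaf conversion can be sketched as a recursive walk. The nested-tuple tree encoding below is an assumption made for illustration, not a standard format: an internal node is ("attribute", {value: subtree}) and a leaf is just its class label.

```python
# One rule per leaf: collect an attribute test at every internal node
# on the root-to-leaf path; the leaf's class becomes the consequent.
def tree_to_rules(tree, path=()):
    if isinstance(tree, str):              # leaf: emit one rule
        return [(dict(path), tree)]
    attr, branches = tree                  # internal node: recurse per branch
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, path + ((attr, value),))
    return rules

# A small tree in the spirit of the vertebrate example:
tree = ("Give Birth", {"yes": "Mammals",
                       "no": ("Can Fly", {"yes": "Birds", "no": "Reptiles"})})
for condition, label in tree_to_rules(tree):
    print(condition, "->", label)
```

The resulting rule set is mutually exclusive and exhaustive by construction, since the tree's branches partition the instance space.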

  7. Decision Trees vs. rules • From rules to trees • More difficult: transforming a rule set into a tree • Tree cannot easily express disjunction between rules • Example: • If a and b then x • If c and d then x • Corresponding tree contains identical subtrees (⇒ “replicated subtree problem”)

  8. A tree for a simple disjunction

  9. How does Rule-based Classifier Work? R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles R5: (Live in Water = sometimes) → Amphibians A lemur triggers rule R3, so it is classified as a mammal A turtle triggers both R4 and R5 A dogfish shark triggers none of the rules

  10. Desiderata for Rule-Based Classifier • Mutually exclusive rules • No two rules are triggered by the same record. • This ensures that every record is covered by at most one rule. • Exhaustive rules • There exists a rule for each combination of attribute values. • This ensures that every record is covered by at least one rule. • Together these properties ensure that every record is covered by exactly one rule.

  11. From Decision Trees To Rules (again) Rules are mutually exclusive and exhaustive Rule set contains as much information as the tree

  12. Rules • Non mutually exclusive rules • A record may trigger more than one rule • Solution? • Ordered rule set • Non exhaustive rules • A record may not trigger any rules • Solution? • Use a default class

  13. Ordered Rule Set • Rules are rank ordered according to their priority (e.g. based on their quality) • An ordered rule set is known as a decision list • When a test record is presented to the classifier • It is assigned the class label of the highest-ranked rule it triggers • If none of the rules fire, it is assigned the default class R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles R5: (Live in Water = sometimes) → Amphibians
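A decision list is then a simple first-match loop with a default fallback. A minimal sketch, using the same dict-based rule representation as before (illustrative, not from any library):

```python
# Ordered rule set (decision list): the first rule that fires wins;
# a default class handles records that no rule covers.
rules = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "Birds"),         # R1
    ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),  # R2
    ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),  # R3
    ({"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),       # R4
    ({"Live in Water": "sometimes"}, "Amphibians"),            # R5
]

def classify(instance, rules, default="Unknown"):
    for condition, label in rules:
        if all(instance.get(a) == v for a, v in condition.items()):
            return label  # highest-ranked rule that triggers
    return default        # no rule fired

turtle = {"Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes"}
shark = {"Give Birth": "yes", "Can Fly": "no",
         "Live in Water": "yes", "Blood Type": "cold"}

print(classify(turtle, rules))  # R4 fires before R5 -> Reptiles
print(classify(shark, rules))   # no rule fires -> Unknown (the default)
```

The ordering resolves the turtle's R4/R5 conflict, and the default class handles the dogfish shark from the earlier slide.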

  14. Building Classification Rules: Sequential Covering • Start from an empty rule • Grow a rule using some Learn-One-Rule function • Remove training records covered by the rule • Repeat Steps (2) and (3) until the stopping criterion is met
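The outer loop can be sketched generically, with Learn-One-Rule left as a parameter. The `first_attr_rule` stand-in below is a deliberately naive placeholder for illustration, not a real rule learner:

```python
# Sequential covering: repeatedly learn one rule, then remove the
# training records it covers, until no further rule can be learned.
def sequential_covering(records, target_class, learn_one_rule, max_rules=10):
    rules = []
    remaining = list(records)
    while remaining and len(rules) < max_rules:       # stopping criterion
        rule = learn_one_rule(remaining, target_class)   # grow one rule
        if rule is None:
            break
        condition, _ = rule
        covered = [r for r in remaining
                   if all(r.get(a) == v for a, v in condition.items())]
        if not covered:
            break
        rules.append(rule)
        remaining = [r for r in remaining if r not in covered]  # remove covered
    return rules

def first_attr_rule(records, cls):
    """Toy Learn-One-Rule stand-in: the first attribute=value seen
    on a record of the target class."""
    for r in records:
        if r["class"] == cls:
            for a, v in r.items():
                if a != "class":
                    return ({a: v}, cls)
    return None

data = [{"x": "1", "class": "b"}, {"x": "1", "class": "b"},
        {"x": "2", "class": "a"}]
learned = sequential_covering(data, "b", first_attr_rule)
print(learned)  # [({'x': '1'}, 'b')]
```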

  15. This approach is called a covering approach because at each stage a rule is identified that covers some of the instances

  16. Example: generating a rule • Possible rule set for class “b”: • More rules could be added for “perfect” rule set If x ≤ 1.2 then class = b If x > 1.2 and y ≤ 2.6 then class = b

  17. A simple covering algorithm • Generates a rule by adding tests that maximize rule’s accuracy • Similar to situation in decision trees: problem of selecting an attribute to split on. • But: decision tree inducer maximizes overall purity • Here, each new test (growing the rule) reduces rule’s coverage.

  18. Selecting a test • Goal: maximizing accuracy • t: total number of instances covered by rule • p: positive examples of the class covered by rule • t-p: number of errors made by rule ⇒ Select test that maximizes the ratio p/t • We are finished when p/t = 1 or the set of instances can’t be split any further
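This greedy selection step can be sketched as an exhaustive search over candidate tests A = v, scored by (p/t, p) so that ties in accuracy are broken by larger p. The data below is made up for illustration:

```python
# Pick the candidate test A = v that maximizes p/t (accuracy),
# breaking ties by the larger p, as in PRISM.
def best_test(records, target_class, used_attrs=()):
    best, best_key = None, (-1.0, -1)
    attrs = {a for r in records for a in r if a != "class"} - set(used_attrs)
    for a in attrs:
        for v in {r[a] for r in records if a in r}:
            covered = [r for r in records if r.get(a) == v]
            t = len(covered)                                  # total covered
            p = sum(r["class"] == target_class for r in covered)  # positives
            if t and (p / t, p) > best_key:
                best, best_key = (a, v), (p / t, p)
    return best, best_key

data = [{"x": "1", "y": "a", "class": "pos"},
        {"x": "1", "y": "b", "class": "pos"},
        {"x": "2", "y": "a", "class": "neg"},
        {"x": "2", "y": "b", "class": "neg"}]

best, key = best_test(data, "pos")
print(best, key)  # ('x', '1') (1.0, 2) -- a perfect test, p/t = 1
```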

  19. Example: contact lenses data

  Age             Spectacle prescription   Astigmatism   Tear production rate   Recommended lenses
  young           myope                    no            reduced                none
  young           myope                    no            normal                 soft
  young           myope                    yes           reduced                none
  young           myope                    yes           normal                 hard
  young           hypermetrope             no            reduced                none
  young           hypermetrope             no            normal                 soft
  young           hypermetrope             yes           reduced                none
  young           hypermetrope             yes           normal                 hard
  pre-presbyopic  myope                    no            reduced                none
  pre-presbyopic  myope                    no            normal                 soft
  pre-presbyopic  myope                    yes           reduced                none
  pre-presbyopic  myope                    yes           normal                 hard
  pre-presbyopic  hypermetrope             no            reduced                none
  pre-presbyopic  hypermetrope             no            normal                 soft
  pre-presbyopic  hypermetrope             yes           reduced                none
  pre-presbyopic  hypermetrope             yes           normal                 none
  presbyopic      myope                    no            reduced                none
  presbyopic      myope                    no            normal                 none
  presbyopic      myope                    yes           reduced                none
  presbyopic      myope                    yes           normal                 hard
  presbyopic      hypermetrope             no            reduced                none
  presbyopic      hypermetrope             no            normal                 soft
  presbyopic      hypermetrope             yes           reduced                none
  presbyopic      hypermetrope             yes           normal                 none

  20. Example: contact lenses data The numbers on the right show the fraction of “correct” instances in the set singled out by that choice. In this case, correct means that their recommendation is “hard.”

  21. Modified rule and resulting data The rule isn’t very accurate: it classifies correctly only 4 of the 12 instances it covers. So, it needs further refinement.

  22. Further refinement

  23. Modified rule and resulting data Should we stop here? Perhaps. But let’s say we are going for exact rules, no matter how complex they become. So, let’s refine further.

  24. Further refinement

  25. The result

  26. Pseudo-code for PRISM

  Heuristic: order the classes C in ascending order of occurrence.

  For each class C
    Initialize E to the instance set
    While E contains instances in class C
      Create a rule R with an empty left-hand side that predicts class C
      Until R is perfect (or there are no more attributes to use) do
        For each attribute A not mentioned in R, and each value v,
          consider adding the condition A = v to the left-hand side of R
        Select A and v to maximize the accuracy p/t
          (break ties by choosing the condition with the largest p)
        Add A = v to R
      Remove the instances covered by R from E

  The RIPPER algorithm is similar; it uses FOIL's information gain instead of p/t.
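The pseudo-code above can be sketched as a compact Python implementation. This is a simplified illustration, not Weka's implementation: it assumes records are dicts with a "class" key and reuses the greedy (p/t, p) test selection shown earlier.

```python
# PRISM sketch: for each class, greedily grow rules until perfect
# (or attributes run out), then remove the instances each rule covers.
def prism(records, classes=None):
    if classes is None:
        classes = sorted({r["class"] for r in records})
    rules = []
    for cls in classes:
        E = list(records)                       # initialize E per class
        while any(r["class"] == cls for r in E):
            condition = {}                      # rule R with empty LHS
            candidates = list(E)
            while True:
                t = len(candidates)
                p = sum(r["class"] == cls for r in candidates)
                if p == t:                      # R is perfect
                    break
                attrs = {a for r in candidates for a in r
                         if a != "class" and a not in condition}
                if not attrs:                   # no more attributes to use
                    break
                best, best_key = None, (-1.0, -1)
                for a in attrs:                 # pick A = v maximizing p/t
                    for v in {r[a] for r in candidates if a in r}:
                        cov = [r for r in candidates if r.get(a) == v]
                        tt = len(cov)
                        pp = sum(r["class"] == cls for r in cov)
                        if tt and (pp / tt, pp) > best_key:
                            best, best_key = (a, v), (pp / tt, pp)
                if best is None:
                    break
                condition[best[0]] = best[1]    # add A = v to R
                candidates = [r for r in candidates
                              if r.get(best[0]) == best[1]]
            rules.append((dict(condition), cls))
            # Remove the instances covered by R from E (dict equality;
            # identical duplicate records are removed together).
            E = [r for r in E
                 if not all(r.get(a) == v for a, v in condition.items())]
    return rules

data = [{"x": "1", "y": "a", "class": "pos"},
        {"x": "1", "y": "b", "class": "pos"},
        {"x": "2", "y": "a", "class": "neg"},
        {"x": "2", "y": "b", "class": "neg"}]
print(prism(data))  # one perfect rule per class on this toy data
```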

  27. Separate and conquer • Methods like PRISM (for dealing with one class) are separate-and-conquer algorithms: • First, a rule is identified • Then, all instances covered by the rule are separated out • Finally, the remaining instances are “conquered” • Difference from divide-and-conquer methods: • Subset covered by a rule doesn’t need to be explored any further
