
A rule induction algorithm

ID3 (1986), Interactive Dichotomizer 3, followed by C4.5 (1993) and then C5.0 (post-2000) [Ross Quinlan]. The training set is partitioned into smaller and smaller subsets. A selection criterion forms the basis on which the training set is subdivided.


Presentation Transcript


  1. ID3 (1986), Interactive Dichotomizer 3, followed by C4.5 (1993) and then C5.0 (post-2000) [Ross Quinlan]. The training set is partitioned into smaller and smaller subsets; a selection criterion forms the basis on which it is subdivided. The algorithm uses a `divide and conquer' method to build the tree: data is divided into subsets until each subset contains a single class. The algorithm is recursive.

The basic algorithm:

PROCEDURE BuildTree(ExampleSubset)
    NumberOfClasses = calculate the number of classes in the example subset
    IF NumberOfClasses = 0 THEN
        Null leaf
    ELSE
        IF NumberOfClasses = 1 THEN
            store the output class as a leaf in the tree
        ELSE
            DecisionNodeInput = determine the input to split on
            IF DecisionNodeInput = 0 THEN
                Error: more than one class has all the same attributes
            ELSE
                create a decision node for the DecisionNodeInput
                FOR all values of the DecisionNodeInput
                    determine the NewExampleSubset for this input value
                    BuildTree(NewExampleSubset)
                ENDFOR
            ENDIF
        ENDIF
    ENDIF
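The pseudocode above can be sketched in Python. This is a minimal illustration, not Quinlan's implementation: the function and variable names (`build_tree`, the dict-based tree) are my own, examples are `(inputs_dict, class)` pairs, and attributes are tried in a fixed caller-supplied order rather than chosen by an information-based criterion.

```python
def build_tree(examples, attributes):
    """Recursively split `examples` (pairs of (inputs_dict, class)), following
    the slide's pseudocode: Null leaf for 0 classes, class leaf for 1 class,
    error when no attribute remains to separate conflicting examples."""
    classes = {cls for _, cls in examples}
    if len(classes) == 0:
        return None                        # Null leaf
    if len(classes) == 1:
        return classes.pop()               # store the class as a leaf
    if not attributes:                     # the "DecisionNodeInput = 0" case
        return "ERROR: identical attributes, more than one class"
    attr, rest = attributes[0], attributes[1:]   # fixed selection order
    # Simplification: branch values are taken from the examples at this node,
    # so empty (Null-leaf) branches for unseen values are omitted.
    values = {inputs[attr] for inputs, _ in examples}
    return {attr: {v: build_tree([e for e in examples if e[0][attr] == v], rest)
                   for v in values}}
```

A decision node becomes a one-key dict mapping the chosen attribute to its branches; leaves are class names (or `None` / an error string).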

  2. An example (animal classification). Four input attributes: hair [T, F], swims [T, F], colour [white, brown, gray], size [small, medium, large]. Three classes: A = KANGAROO, B = DOLPHIN, C = WHALE.

  #  hair  swims  colour  size    CLASS
  1  T     F      gray    medium  KANGAROO
  2  T     F      brown   medium  KANGAROO
  3  F     T      gray    large   DOLPHIN
  4  F     T      white   medium  DOLPHIN
  5  F     T      brown   large   WHALE
  6  T     F      gray    large   KANGAROO

  3. An example (animal classification), using the same table as slide 2. Order of attribute selection will be hair, swims, colour, size.

  4. Same table; change the order of attribute selection to hair, colour, swims, size.

  5. Same table; change the order of attribute selection to size, swims, colour, hair.
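The effect of the selection order in slides 3-5 can be checked directly. The compact sketch below (illustrative names, conflict handling dropped since these six examples are conflict-free) builds the tree for two of the orders and confirms the resulting trees differ:

```python
def build_tree(examples, attributes):
    """Recursive 'divide and conquer' split on a fixed attribute order."""
    classes = {cls for _, cls in examples}
    if len(classes) <= 1:
        return classes.pop() if classes else None   # leaf or Null leaf
    attr, rest = attributes[0], attributes[1:]
    values = {inputs[attr] for inputs, _ in examples}
    return {attr: {v: build_tree([e for e in examples if e[0][attr] == v], rest)
                   for v in values}}

# The six-example table from slide 2.
table = [
    ({"hair": "T", "swims": "F", "colour": "gray",  "size": "medium"}, "KANGAROO"),
    ({"hair": "T", "swims": "F", "colour": "brown", "size": "medium"}, "KANGAROO"),
    ({"hair": "F", "swims": "T", "colour": "gray",  "size": "large"},  "DOLPHIN"),
    ({"hair": "F", "swims": "T", "colour": "white", "size": "medium"}, "DOLPHIN"),
    ({"hair": "F", "swims": "T", "colour": "brown", "size": "large"},  "WHALE"),
    ({"hair": "T", "swims": "F", "colour": "gray",  "size": "large"},  "KANGAROO"),
]

t1 = build_tree(table, ["hair", "swims", "colour", "size"])
t2 = build_tree(table, ["size", "swims", "colour", "hair"])
```

Splitting on hair first settles all kangaroos in one branch immediately, while splitting on size first scatters them, so the two trees have different shapes.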

  6. Introduce a conflicting example (the `non-swimming' smallish whale!):

  #  hair  swims  colour  size    CLASS
  1  T     F      gray    medium  KANGAROO
  2  T     F      brown   medium  KANGAROO
  3  F     T      gray    large   DOLPHIN
  4  F     T      white   medium  DOLPHIN
  5  F     T      brown   large   WHALE
  6  T     F      gray    large   KANGAROO
  7  T     F      gray    medium  WHALE

Order of attribute selection: hair, colour, swims, size. The basic algorithm is the same as in slide 1.
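Why must this produce an error leaf? Examples 1 and 7 have identical values for every attribute but different classes, so no split, in any order, can ever separate them; the recursion exhausts the attributes with two classes still present. A small check (helper name `classes_at` is my own):

```python
def classes_at(examples):
    """The set of output classes present in a subset of examples."""
    return {cls for _, cls in examples}

# Examples 1 and 7 from the table: identical inputs, different classes.
e1 = ({"hair": "T", "swims": "F", "colour": "gray", "size": "medium"}, "KANGAROO")
e7 = ({"hair": "T", "swims": "F", "colour": "gray", "size": "medium"}, "WHALE")

# Every attribute takes a single value over this pair, so no split separates them...
assert all(len({ex[0][a] for ex in (e1, e7)}) == 1
           for a in ("hair", "swims", "colour", "size"))
# ...yet two classes remain: this is the error-leaf condition.
assert len(classes_at([e1, e7])) == 2
```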

  7. A rule induction algorithm. The tree built from examples #1-7 (order hair, colour, swims, size):

Hair?
  T -> #1, 2, 6, 7: colour?
      white  -> NULL
      brown  -> #2: KANGAROO
      gray   -> #1, 6, 7: swims?
          T -> NULL
          F -> #1, 6, 7: size?
              small  -> NULL
              medium -> #1, 7: error
              large  -> #6: KANGAROO
  F -> #3, 4, 5: colour?
      white  -> #4: DOLPHIN
      gray   -> #3: DOLPHIN
      brown  -> #5: WHALE

  8. A rule induction algorithm. How might an algorithm of this type handle missing data? As the attribute value set has to be finite and discrete, the simplest way is to treat a missing value as an extra attribute value, e.g.

  #  hair  swims  colour  size    CLASS
  1  T     ?      gray    medium  KANGAROO
  2  T     F      brown   medium  KANGAROO
  3  F     T      gray    large   DOLPHIN
  4  F     T      white   medium  DOLPHIN
  5  F     T      brown   large   WHALE
  6  T     F      gray    large   KANGAROO

The value set for swims now becomes {T, F, ?}.
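This "missing value as an extra value" idea is a one-liner in practice. A sketch (the helper `value_set` and the `"?"` marker are illustrative choices, not part of the slides):

```python
def value_set(examples, attr, missing="?"):
    """Collect the finite, discrete value set for `attr` over the examples,
    treating an absent entry as the extra value '?'."""
    return {inputs.get(attr, missing) for inputs, _ in examples}

# Rows 1-3 of the slide's table, with example 1 missing its swims value.
rows = [({"hair": "T", "colour": "gray"},               "KANGAROO"),
        ({"hair": "T", "swims": "F", "colour": "brown"}, "KANGAROO"),
        ({"hair": "F", "swims": "T", "colour": "gray"},  "DOLPHIN")]
```

The splitting procedure then branches on `?` exactly like any other value, with no special-casing.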

  9. The rule sets can now be written down from the decision trees. For example, from the tree

Hair?
  T -> KANGAROO
  F -> Colour?
      white -> DOLPHIN
      gray  -> DOLPHIN
      brown -> WHALE

the rules are:

IF hair = T THEN KANGAROO
IF hair = F AND colour = white THEN DOLPHIN
IF hair = F AND colour = brown THEN WHALE
IF hair = F AND colour = gray THEN DOLPHIN
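Reading rules off a tree is a walk from the root to each leaf, ANDing the attribute tests along the path. A sketch over the dict-shaped trees used earlier (the `rules` function is my own naming):

```python
def rules(tree, path=()):
    """Flatten a nested {attr: {value: subtree}} tree into IF/THEN rule strings,
    one rule per root-to-leaf path."""
    if not isinstance(tree, dict):                      # leaf: emit one rule
        cond = " AND ".join(f"{a} = {v}" for a, v in path)
        return [f"IF {cond} THEN {tree}"]
    (attr, branches), = tree.items()                    # single-attribute node
    out = []
    for value, sub in branches.items():
        out.extend(rules(sub, path + ((attr, value),)))
    return out

# The tree from this slide.
tree = {"hair": {"T": "KANGAROO",
                 "F": {"colour": {"white": "DOLPHIN",
                                  "gray":  "DOLPHIN",
                                  "brown": "WHALE"}}}}
```

Calling `rules(tree)` yields the four rules listed above (in branch order).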

  10. A rule induction algorithm. Comments:
  - The choice of which attribute to split on is crucial.
  - The algorithm can deal with missing data.
  - The algorithm can deal with conflict, by flagging that it exists.

Next time …
