Rule induction: Ross Quinlan's ID3 algorithm
Fredda Weinberg
CIS 718X, Fall 2005
Professor Kopec
Assignment #3
The learning problem
• You are presented with the data.
• You have a supervised learning problem (that is, a target variable).
• In practice, there is no such thing as the correct model.
• You are looking for a “best approximating” model.
• There is no reason to think that linear models provide the “best approximating” model.
Source: SPSS Clementine Users Group
Terms
• General:
  • Decision trees.
  • “Recursive partitioning”: apply the same splitting rule to smaller and smaller partitions of the sample space.
• Classification:
  • Tree-based classification.
  • Classification trees.
Source: ibid
Rule induction
1. For each attribute, compute its entropy with respect to the conclusion, that is, the weighted average entropy of the conclusion within the subsets that the attribute's values produce.
2. Select the attribute (say A) with the lowest entropy.
3. Divide the data into separate sets so that within a set, A has a fixed value (e.g. for eye color, Color=green in one set, Color=brown in another, etc.).
4. Build a tree with branches:
   if A=a1 then ... (subtree1)
   if A=a2 then ... (subtree2)
   ...etc...
5. For each subtree, repeat this process from step 1.
6. At each iteration, one attribute gets removed from consideration. The process stops when there are no attributes left to consider, or when all the data being considered in a subtree have the same value for the conclusion (e.g. they all say Conclusion=safe from sunburn).
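The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not Quinlan's original code: the function names (`entropy`, `split_entropy`, `id3`), the nested-dict tree representation, the majority-vote fallback when attributes run out, and the tiny sunburn-style dataset are all assumptions made for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def split_entropy(rows, attr, target):
    """Step 1: weighted average entropy of the target after splitting on attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

def id3(rows, attrs, target):
    """Return a nested dict {attribute: {value: subtree_or_label}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:        # step 6: all cases agree on the conclusion
        return labels[0]
    if not attrs:                    # step 6: no attributes left (assumed fallback: majority label)
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the attribute with the lowest entropy.
    best = min(attrs, key=lambda a: split_entropy(rows, a, target))
    remaining = [a for a in attrs if a != best]   # the chosen attribute is removed
    tree = {best: {}}
    # Steps 3-5: one subset and one recursively built subtree per value of best.
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree

# Hypothetical toy data, invented for this sketch.
rows = [
    {"hair": "blonde", "lotion": "no",  "result": "sunburned"},
    {"hair": "blonde", "lotion": "yes", "result": "safe"},
    {"hair": "brown",  "lotion": "yes", "result": "safe"},
    {"hair": "brown",  "lotion": "no",  "result": "safe"},
    {"hair": "red",    "lotion": "no",  "result": "sunburned"},
    {"hair": "blonde", "lotion": "no",  "result": "sunburned"},
]
tree = id3(rows, ["hair", "lotion"], "result")
print(tree)
# {'hair': {'blonde': {'lotion': {'no': 'sunburned', 'yes': 'safe'}},
#           'brown': 'safe', 'red': 'sunburned'}}
```

On this data, splitting on `hair` leaves less entropy (about 0.46 bits) than splitting on `lotion` (about 0.54 bits), so `hair` becomes the root, and only the mixed `blonde` branch needs a further split.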
Iterative Dichotomiser
The rule induction algorithm was first used by Hunt in his CLS (Concept Learning System) in 1962. Then, with extensions for handling numeric data too, it was used by Ross Quinlan for his ID3 system in 1979. Quinlan's ID3 tried to cut down on effort by inducing a set of rules from a small subset of the data, and then testing whether those rules explained the rest of the data. Data not explained were then added to the chosen subset, and new rules were induced. This process continued until all the data were accounted for. The letters ID stood for 'iterative dichotomiser', a fancy name for this simple algorithm.
Entropy
• Entropy = $-\sum_i p_i \log_2 p_i$, where $p_i$ is the proportion of cases in class $i$.
• Information-theoretic criterion: the minimum average number of bits needed to encode the classification of an arbitrary case.
• Ranges from 0 to 1 for a two-class target (in general, from 0 to $\log_2 k$ for $k$ classes).
• 0 if p is concentrated in one class.
• Maximal if p is uniform across classes.
• Entropy gain is the reduction in entropy after a split. Interpretation: the number of bits saved when encoding the target value with knowledge of the predictor.
• Entropy gain is biased in favor of attributes with many values. Gain ratio discourages the selection of attributes with many uniformly distributed values.
Source: SPSS Clementine Users Group
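To make these quantities concrete, here is a small sketch of entropy, entropy gain, and gain ratio. The function names and the four-case example are assumptions made for illustration. Gain ratio is taken here as gain divided by the entropy of the attribute's own value distribution (the "split information"), which is the usual definition from Quinlan's later C4.5 work rather than something the slide spells out.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in target entropy after splitting on an attribute's values."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    after = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after

def gain_ratio(values, labels):
    """Gain divided by split information (entropy of the attribute itself)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

# Hypothetical four-case example.
labels = ["yes", "yes", "no", "no"]
case_id = ["a", "b", "c", "d"]     # unique per case: many values, no generality
binary = ["x", "x", "y", "y"]      # two values that match the target exactly

print(entropy(labels))                     # 1.0 (two equally likely classes)
print(entropy(["a", "b", "c"]))            # log2(3), about 1.585, for three classes
print(information_gain(case_id, labels))   # 1.0
print(information_gain(binary, labels))    # 1.0
print(gain_ratio(case_id, labels))         # 0.5
print(gain_ratio(binary, labels))          # 1.0
```

Both attributes achieve the same entropy gain, but the per-case identifier does so only by having one value per case. Dividing by its split information of 2 bits halves its score, which is how gain ratio discourages attributes with many uniformly distributed values.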
Tech Support toy database: is it the equipment or the commander?
Source: Decision Trees by Computational Intelligence
Applications
• Predicting Magnetic Properties of Crystals
• Profiling High Income Earners from Census Data
• Assessing Churn Risk
• Detecting Advertisements on the Web
• Identifying Spam
• Diagnosing Hypothyroidism