Split Selection for Decision Tree

• We already learned one selection method: Information Gain (entropy)
• Are there any other selection criteria? Yes
• How many are there? A lot
• Intuition
• Formalism
• Various Selection Methods
CAP5610


How to Determine the Best Split
• Greedy approach: nodes with a homogeneous class distribution are preferred
• Need a measure of node impurity:
  • Non-homogeneous: high degree of impurity
  • Homogeneous: low degree of impurity

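How to Find the Best Split
[Figure: before splitting, the node has impurity M0. Splitting on attribute A (Yes/No) gives Node N1 and Node N2 with impurities M1 and M2, combined as M12. Splitting on attribute B (Yes/No) gives Node N3 and Node N4 with impurities M3 and M4, combined as M34.]
Gain = M0 - M12 vs. M0 - M34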
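The gain comparison above can be sketched in a few lines of Python. This is only an illustration: it assumes M12 and M34 are the size-weighted averages of the child impurities (the slide does not spell out how M1 and M2 are combined), uses entropy as the impurity measure, and the labels and the helper names `entropy` and `split_gain` are made up for the example.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def split_gain(parent, children, impurity=entropy):
    """Gain = M0 - M_children.

    M_children is taken to be the size-weighted average impurity of the
    child nodes (an assumption; the slide only names it M12 / M34).
    """
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

# Hypothetical data: 6 positive and 6 negative records before splitting.
parent = ["+"] * 6 + ["-"] * 6
# Attribute A separates the classes fairly well; attribute B does not.
split_a = [["+"] * 5 + ["-"] * 1, ["+"] * 1 + ["-"] * 5]
split_b = [["+"] * 3 + ["-"] * 3, ["+"] * 3 + ["-"] * 3]

gain_a = split_gain(parent, split_a)  # M0 - M12
gain_b = split_gain(parent, split_b)  # M0 - M34
best = "A" if gain_a > gain_b else "B"
print(f"gain A = {gain_a:.3f}, gain B = {gain_b:.3f}, choose {best}")
```

Under these assumptions the greedy rule picks attribute A, because its children are more homogeneous than those of B.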


Remedy: Concavity
• Use impurity functions that are concave: Φ'' < 0
• Example impurity functions: Entropy, Gini index
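A quick numerical check of the concavity condition, restricted to the two-class case where each impurity is a function of a single probability p. This is a sketch rather than a proof: the function names, the grid, and the step size `h` are arbitrary choices made for the example.

```python
from math import log2

def entropy2(p):
    """Two-class entropy as a function of p = P(class 1)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def gini2(p):
    """Two-class Gini index as a function of p = P(class 1)."""
    return 1 - p**2 - (1 - p) ** 2

def second_difference(phi, p, h=1e-3):
    """Central finite-difference estimate of phi''(p)."""
    return (phi(p + h) - 2 * phi(p) + phi(p - h)) / h**2

# Interior grid points 0.05, 0.10, ..., 0.95.
grid = [i / 20 for i in range(1, 20)]
entropy_concave = all(second_difference(entropy2, p) < 0 for p in grid)
gini_concave = all(second_difference(gini2, p) < 0 for p in grid)
print(entropy_concave, gini_concave)
```

For the two-class Gini index the second derivative is the constant -4, so the finite-difference estimate should sit very close to that value at every grid point.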

Entropy as Impurity Measure
• Entropy at a given node t: $\mathrm{Entropy}(t) = -\sum_{j} p(j \mid t) \log_2 p(j \mid t)$
  (NOTE: p(j | t) is the relative frequency of class j at node t.)
• Measures homogeneity of a node.
  • Maximum (log2 nc, where nc is the number of classes) when records are equally distributed among all classes, implying least information
  • Minimum (0.0) when all records belong to one class, implying most information
• Entropy-based computations are similar to the GINI index computations
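A minimal sketch of the entropy computation from per-class record counts, using base-2 logarithms and the usual convention that a class with zero records contributes nothing. The function name and the example counts are illustrative only.

```python
from math import log2

def entropy(class_counts):
    """Entropy (in bits) of a node, given the record count of each class."""
    n = sum(class_counts)
    # Skip empty classes: p * log2(p) is treated as 0 when p = 0.
    probs = [c / n for c in class_counts if c > 0]
    return sum(-p * log2(p) for p in probs)

uniform = entropy([5, 5, 5, 5])  # records spread evenly over nc = 4 classes
pure = entropy([20, 0, 0, 0])    # all records in one class
mixed = entropy([1, 5])          # a mostly homogeneous 2-class node
print(uniform, pure, round(mixed, 3))
```

The evenly distributed node reaches the maximum log2(4) = 2, and the pure node reaches the minimum 0.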

Measure of Impurity: GINI
• Gini index for a given node t: $\mathrm{GINI}(t) = 1 - \sum_{j} [p(j \mid t)]^2$
  (NOTE: p(j | t) is the relative frequency of class j at node t.)
• Maximum (1 - 1/nc, where nc is the number of classes) when records are equally distributed among all classes, implying least interesting information
• Minimum (0.0) when all records belong to one class, implying most interesting information
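A minimal sketch of the Gini index computed from per-class record counts. The function name and the example counts are illustrative only.

```python
def gini(class_counts):
    """Gini index of a node, given the record count of each class."""
    n = sum(class_counts)
    return 1.0 - sum((c / n) ** 2 for c in class_counts)

uniform = gini([5, 5, 5, 5])  # records spread evenly over nc = 4 classes
pure = gini([20, 0, 0, 0])    # all records in one class
mixed = gini([1, 5])          # a mostly homogeneous 2-class node
print(uniform, pure, round(mixed, 3))
```

The evenly distributed node reaches the maximum 1 - 1/4 = 0.75, and the pure node reaches the minimum 0.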

Misclassification Error / Resubstitution Error
• Classification error at a node t: $\mathrm{Error}(t) = 1 - \max_{j} p(j \mid t)$
• Measures the misclassification error made by a node.
  • Maximum (1 - 1/nc, where nc is the number of classes) when records are equally distributed among all classes, implying least interesting information
  • Minimum (0.0) when all records belong to one class, implying most interesting information
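A minimal sketch of the classification error at a node, computed from per-class record counts: the node predicts its majority class, so every other record is misclassified. The function name and the example counts are illustrative only.

```python
def classification_error(class_counts):
    """Fraction of records at a node that are not in the majority class."""
    n = sum(class_counts)
    return 1.0 - max(class_counts) / n

uniform = classification_error([5, 5, 5, 5])  # nc = 4, evenly distributed
pure = classification_error([20, 0, 0, 0])    # all records in one class
mixed = classification_error([1, 5])          # one record out of six is wrong
print(uniform, pure, round(mixed, 3))
```

The evenly distributed node reaches the maximum 1 - 1/4 = 0.75, and the pure node reaches the minimum 0.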

Comparison among Splitting Criteria
For a 2-class problem:
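For a 2-class problem the three measures depend only on p, the fraction of records in one of the classes, so they can be tabulated side by side. The sketch below does that on a coarse grid; the grid spacing and the function names are choices made for the example.

```python
from math import log2

def entropy(p):
    """Two-class entropy (bits) as a function of p = P(class 1)."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def gini(p):
    """Two-class Gini index."""
    return 1 - p**2 - (1 - p) ** 2

def error(p):
    """Two-class misclassification error."""
    return 1 - max(p, 1 - p)

# One row per p in 0.0, 0.1, ..., 1.0: (p, entropy, gini, error).
rows = [(i / 10, entropy(i / 10), gini(i / 10), error(i / 10)) for i in range(11)]

print("p    entropy  gini   error")
for p, h, g, e in rows:
    print(f"{p:.1f}  {h:.3f}    {g:.3f}  {e:.3f}")
```

All three measures are 0 at p = 0 and p = 1 and peak at p = 0.5, where entropy is 1 and both Gini and misclassification error are 0.5.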
