The Good News: Why Are Decision Trees Currently Quite Popular for Classification Problems?
• Very robust --- good average testing performance: they outperform other methods over sets of diverse benchmarks.
• Decision trees are still somewhat understandable for domain experts.
• Very useful in the early stages of a data analysis project: attributes near the root are very important, attributes near the leaves are somewhat important, and attributes that do not occur, or occur only rarely near the leaves, are not important.
• The information gain heuristic avoids searching a huge search space --- claim: it searches an NP-hard search space quite well.
• The approach avoids the combinatorial explosion of rules/nodes that other approaches face, through the use of sophisticated pruning techniques and because of its hierarchical knowledge representation.
• Can cope with missing data, noisy data, and mixed (numerical and symbolic) data.
• Easy to use; does not require users to provide additional domain knowledge.
• The simplicity of the approach is appealing.
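The information gain heuristic mentioned above can be made concrete with a small from-scratch sketch: gain is the reduction in label entropy obtained by partitioning the data on an attribute, and the attribute with the highest gain is placed nearer the root. The toy `outlook`/`windy` data and the function names below are illustrative assumptions, not taken from the source.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(labels, attribute_values):
    """Entropy reduction from partitioning `labels` by an attribute's values."""
    n = len(labels)
    partitions = {}
    for y, v in zip(labels, attribute_values):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

# Hypothetical toy data: "outlook" separates the classes perfectly,
# so it gets the higher gain and would sit nearer the root.
labels  = ["yes", "yes", "no", "no", "yes", "no"]
outlook = ["sun", "sun", "rain", "rain", "sun", "rain"]
windy   = [True, False, True, False, True, False]

print(information_gain(labels, outlook))  # 1.0 (perfect split)
print(information_gain(labels, windy))    # much smaller
```

A greedy tree learner repeats this gain comparison at every node, which is why it explores only a tiny fraction of the NP-hard space of all possible trees.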
Decision Trees: The Bad News
• Rely on rectangular approximations --- this kind of approximation is sometimes not well suited to particular application domains.
• Decision trees rely on the ordering of attribute values, not on their absolute differences; e.g., 5>3>1 and 3.0001>3>2.9999 are treated the same by C5.0. Basically, decision trees employ ordering-based classification, in contrast to the distance-based classification used by techniques such as nearest neighbors. If the notion of distance is of key importance for an application, decision trees might be less suitable for it.
• Not necessarily good for applications in which many attributes have a minor impact and very few or no attributes have a major impact on a decision --- this violates the hierarchical nature of decision trees.
• Data collections have to be in flat-file format, which causes problems with multi-valued attributes (but other approaches face similar problems).
Summary: Although decision trees might not be "perfect" for all applications, I consider decision trees one of the most promising machine learning and data mining technologies for classification tasks.
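The ordering-based point above can be demonstrated with a minimal sketch: a threshold split chosen by information gain depends only on how the sorted values interleave with the labels, so any monotone stretching of the feature (here an exponential, chosen as an illustrative assumption, like the 3.0001 vs. 3 example) yields exactly the same partition. The data and function names are hypothetical.

```python
import math

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_split(xs, ys):
    """Indices sent to the left branch by the best single-threshold split."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    base = entropy(ys)
    best, best_gain = None, -1.0
    for k in range(1, len(xs)):  # try every cut point in sorted order
        left = [ys[i] for i in order[:k]]
        right = [ys[i] for i in order[k:]]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if base - remainder > best_gain:
            best_gain, best = base - remainder, frozenset(order[:k])
    return best

xs = [1.0, 3.0, 3.0001, 5.0, 8.0]      # tiny vs. huge gaps between values
ys = ["a", "a", "b", "b", "b"]
stretched = [math.exp(x) for x in xs]  # monotone transform: same ordering

print(best_split(xs, ys) == best_split(stretched, ys))  # True: identical split
```

A distance-based method such as nearest neighbors would generally change its predictions under such a transform, which is exactly the contrast the slide draws.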
Decision Trees & the Concept Learning / Classification Tool Market
• Main competitors (performance is "comparable" to decision trees):
• Neural Networks (good overall learning performance, but it is hard to tell what they have learned)
• Support Vector Machines (somewhat new)
• Other competitors ("inferior performance" or other problems):
• Fuzzy Techniques (combinatorial explosion of rules, not easy to use, lack of heuristics, poor learning performance)
• Discriminant Analysis (sound theoretical foundation, but not very stable learning performance: does very well on some benchmarks and very badly on others)
• Association rule learning (needs symbolic data sets; combinatorial explosion of rules)
• Bayesian rule-learning approaches (many diverse approaches, which makes it difficult to evaluate the members of this group; most approaches are restricted to symbolic data sets)
• Classical and Symbolic Regression (poor learning performance)
• Nearest neighbor (success strongly depends on the availability of a "good" distance function; learning performance is not very stable)
• Logic-based rule-learning approaches, such as the AQ family (currently not very popular)
Remark: The preceding evaluation is based on research projects, conducted by the author and his students Y.J. Kim, Brandon Rabke, Ruijiang Zhang, Jim Reynolds, and Zheng Wen, that benchmarked various approaches.