Near-Minimax Optimal Learning with Decision Trees
Rob Nowak and Clay Scott
University of Wisconsin-Madison and Rice University
nowak@engr.wisc.edu
Supported by the NSF and the ONR
Basic Problem
Classification: build a decision rule based on labeled training data.
Given n training points, how well can we do?
Smooth Decision Boundaries
Suppose that the Bayes decision boundary behaves locally like a Lipschitz function.
(Mammen & Tsybakov '99)
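A sketch of what this assumption typically means in the boundary-fragment setting of Mammen & Tsybakov '99 (my paraphrase; the precise class and constants are in their paper):

```latex
% Bayes decision set bounded by the graph of a Lipschitz function g:
G^* \;=\; \bigl\{ (s, t) \in [0,1]^{d-1} \times [0,1] \;:\; t \le g(s) \bigr\},
\qquad
|g(s) - g(s')| \;\le\; L\,\|s - s'\| \quad \text{for all } s, s'.
```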
Dyadic Thinking about Classification Trees
[Figure: a recursive dyadic partition]
Dyadic Thinking about Classification Trees
[Figure: a pruned dyadic partition and the corresponding pruned dyadic tree]
Hierarchical structure facilitates optimization.
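A minimal Python sketch of the recursive dyadic partition and its pruning (illustrative only; the class and method names are my own, not from the talk):

```python
import numpy as np

class DyadicNode:
    """A cell of a recursive dyadic partition of [0,1]^d.

    A cell is either a leaf (carrying a 0/1 label) or is split at the
    midpoint of one coordinate into two child cells.
    """
    def __init__(self, lower, upper, depth=0):
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)
        self.depth = depth
        self.children = None   # None => this node is a leaf
        self.label = 0

    def split(self):
        """Dyadically split this cell along coordinate (depth mod d)."""
        dim = self.depth % len(self.lower)
        mid = 0.5 * (self.lower[dim] + self.upper[dim])
        lo_upper = self.upper.copy()
        lo_upper[dim] = mid
        hi_lower = self.lower.copy()
        hi_lower[dim] = mid
        self.children = (DyadicNode(self.lower, lo_upper, self.depth + 1),
                         DyadicNode(hi_lower, self.upper, self.depth + 1))
        return self.children

    def prune(self):
        """Replace the subtree rooted here by a single leaf."""
        self.children = None

    def leaves(self):
        if self.children is None:
            return [self]
        return self.children[0].leaves() + self.children[1].leaves()

# Grow a complete RDP of depth 3 over [0,1]^2, then prune one subtree.
root = DyadicNode([0.0, 0.0], [1.0, 1.0])
frontier = [root]
for _ in range(3):
    frontier = [child for node in frontier for child in node.split()]
print("leaves before pruning:", len(root.leaves()))   # 8
root.children[0].prune()                               # prune the left subtree
print("leaves after pruning: ", len(root.leaves()))   # 5
```

The hierarchy is what makes pruning cheap: any pruned dyadic tree is obtained from the complete one by collapsing subtrees, so optimization over pruned trees can proceed bottom-up.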
The Classification Problem
Problem: given n labeled training points (X_1, Y_1), ..., (X_n, Y_n), drawn i.i.d. from an unknown distribution, with features X in [0,1]^d and labels Y in {0,1}, find a classifier f with small probability of error R(f) = P(f(X) != Y).
Classifiers
The Bayes Classifier: the minimum-probability-of-error classifier, obtained by thresholding the regression function at 1/2.
Minimum Empirical Risk Classifier: the classifier in a candidate class that makes the fewest errors on the training data.
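The formulas on this slide did not survive extraction; the standard definitions they presumably refer to, with eta(x) = P(Y = 1 | X = x) and F a candidate class such as the dyadic trees below, are:

```latex
% Bayes classifier: threshold the regression function at 1/2.
f^*(x) \;=\; \mathbf{1}\{\eta(x) \ge 1/2\},
  \qquad \eta(x) = \mathbb{P}(Y = 1 \mid X = x)

% Minimum empirical risk classifier over a class \mathcal{F}:
\hat{f}_n \;=\; \arg\min_{f \in \mathcal{F}}\ \widehat{R}_n(f),
  \qquad \widehat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{f(X_i) \ne Y_i\}
```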
Ex. Dyadic Classification Trees
[Figure panels: labeled training data with the Bayes decision boundary; complete RDP; pruned RDP; the resulting dyadic classification tree]
Codes for DCTs
Code-lengths: a pruned dyadic tree can be described by a short prefix code, roughly one bit per node for the tree structure plus one bit per leaf for its class label.
Ex: code 0001001111 + 6 bits for leaf labels.
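A small Python sketch of that bookkeeping (hypothetical encoding: one structure bit per node, 0 for a split and 1 for a leaf, plus one label bit per leaf; the slide's exact convention may differ slightly, e.g. it may omit the root bit):

```python
def structure_bits(tree):
    """Structure bits for a pruned dyadic tree given as nested tuples:
    a leaf is the string 'leaf', an internal node is a pair (left, right).
    One bit per node: 0 = split, 1 = leaf."""
    if tree == 'leaf':
        return '1'
    left, right = tree
    return '0' + structure_bits(left) + structure_bits(right)

def code_length(tree, num_leaves):
    """Total description length: structure bits + one label bit per leaf."""
    return len(structure_bits(tree)) + num_leaves

# Example: an unevenly pruned tree with 6 leaves.
tree = ((('leaf', 'leaf'), 'leaf'), (('leaf', 'leaf'), 'leaf'))
bits = structure_bits(tree)
print(bits, "->", len(bits), "structure bits + 6 label bits")
```

Because the code is prefix-free, the code-lengths can play the role of a prior over trees in a union bound, which is what the error bounds on the next slide exploit.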
Error Bounds for DCTs
Compare with CART.
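The bound itself is missing from the extracted text; a sketch of the standard coding-based error bound this slide presumably shows (Chernoff bound plus a union bound weighted by the prefix code, with L(T) the codelength in bits of tree T from the previous slide):

```latex
% With probability at least 1 - \delta, simultaneously for all DCTs T:
R(T) \;\le\; \widehat{R}_n(T)
      \;+\; \sqrt{\frac{L(T)\,\log 2 + \log(1/\delta)}{2n}}

% Learned tree: minimize the penalized empirical risk
\widehat{T}_n \;=\; \arg\min_{T}\ \left\{ \widehat{R}_n(T)
      + \sqrt{\frac{L(T)\,\log 2 + \log(1/\delta)}{2n}} \right\}
```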
Rate of Convergence
Suppose that the Bayes decision boundary behaves locally like a Lipschitz function.
(Mammen & Tsybakov '99; C. Scott & RN '02)
Why too slow?
Because the Bayes boundary is a (d-1)-dimensional manifold, "good" trees are unbalanced.
Yet all |T|-leaf trees are equally favored by the global bound.
Local Error Bounds in Classification
Spatial Error Decomposition (Mansour & McAllester '00):
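The decomposition formula is missing from the extracted text; the standard form, which splits both risks over the leaves of T with R(T, A) the contribution of cell A, is:

```latex
% Risk restricted to a leaf cell A:
R(T, A) = \mathbb{P}\bigl(T(X) \ne Y,\ X \in A\bigr), \qquad
\widehat{R}_n(T, A) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{T(X_i) \ne Y_i,\ X_i \in A\}

% Spatial error decomposition: the global deviation is a sum of local deviations
R(T) - \widehat{R}_n(T)
  \;=\; \sum_{A \in \mathrm{leaves}(T)} \bigl( R(T, A) - \widehat{R}_n(T, A) \bigr)
```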
Global vs. Local
Key: local complexity is offset by small volumes!
Unbalanced Tree
Example: a cascade tree with J leaves and depth J-1.
Global bound vs. local bound:
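An illustrative back-of-the-envelope comparison (not the paper's exact penalties): assume the global penalty scales like sqrt(J/n) for a tree with J leaves, while each leaf A contributes a local penalty scaling like sqrt(p_A * depth(A) / n), with p_A roughly 2^(-depth(A)) for the cascade tree above.

```python
import math

def global_penalty(J, n):
    # Global bound: complexity grows with the total number of leaves J.
    return math.sqrt(J / n)

def local_penalty(J, n):
    # Cascade tree: one leaf at each depth 1..J-1 plus a second leaf at
    # depth J-1; a leaf at depth j covers a cell of volume ~ 2^{-j}, so its
    # local complexity term is offset by that small volume.
    depths = list(range(1, J)) + [J - 1]
    return sum(math.sqrt(2.0 ** (-j) * j / n) for j in depths)

n = 10_000
for J in (8, 16, 32, 64):
    print(J, round(global_penalty(J, n), 4), round(local_penalty(J, n), 4))
```

As J grows, the global penalty grows like sqrt(J/n) while the sum of volume-weighted local penalties stays bounded, which is the sense in which local complexity is offset by small volumes.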
Convergence to Bayes Error
(Mammen & Tsybakov '99; C. Scott & RN '03)
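The rates themselves are missing from the extracted text; as I recall them from the cited papers (regularity conditions and constants omitted, so treat this as a sketch):

```latex
% Minimax rate over the locally-Lipschitz-boundary class (Mammen & Tsybakov '99):
\inf_{\hat{f}_n}\ \sup\ \mathbb{E}\bigl[R(\hat{f}_n)\bigr] - R^* \;\asymp\; n^{-1/d}

% DCTs with the spatially adaptive (local) penalty (Scott & Nowak '03):
\mathbb{E}\bigl[R(\widehat{T}_n)\bigr] - R^*
    \;=\; O\!\left(\Bigl(\tfrac{\log n}{n}\Bigr)^{1/d}\right)
% i.e. within a logarithmic factor of the minimax rate ("near-minimax").
```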
Concluding Remarks
Data-dependent bound.
Neural Information Processing Systems 2002, 2003
nowak@engr.wisc.edu