Near-Minimax Optimal Learning with Decision Trees
Rob Nowak and Clay Scott
University of Wisconsin-Madison and Rice University
nowak@engr.wisc.edu
Supported by the NSF and the ONR
Basic Problem
Classification: build a decision rule based on labeled training data.
Given n training points, how well can we do?
Smooth Decision Boundaries
Suppose that the Bayes decision boundary behaves locally like a Lipschitz function.
(Mammen & Tsybakov '99)
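A sketch of what this assumption typically means in the boundary-fragment setting of Mammen & Tsybakov '99 (my paraphrase; the precise class and constants are in their paper):

```latex
% Bayes decision set bounded by the graph of a Lipschitz function g:
G^* \;=\; \bigl\{ (s, t) \in [0,1]^{d-1} \times [0,1] \;:\; t \le g(s) \bigr\},
\qquad
|g(s) - g(s')| \;\le\; L\,\|s - s'\| \quad \text{for all } s, s'.
```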
Dyadic Thinking about Classification Trees
[Figure: a recursive dyadic partition]
Dyadic Thinking about Classification Trees
[Figure: a pruned dyadic partition and the corresponding pruned dyadic tree]
Hierarchical structure facilitates optimization.
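A minimal Python sketch of the recursive dyadic partition and its pruning (illustrative only; the class and method names are my own, not from the talk):

```python
import numpy as np

class DyadicNode:
    """A cell of a recursive dyadic partition of [0,1]^d.

    A cell is either a leaf (carrying a 0/1 label) or is split at the
    midpoint of one coordinate into two child cells.
    """
    def __init__(self, lower, upper, depth=0):
        self.lower = np.asarray(lower, dtype=float)
        self.upper = np.asarray(upper, dtype=float)
        self.depth = depth
        self.children = None   # None => this node is a leaf
        self.label = 0

    def split(self):
        """Dyadically split this cell along coordinate (depth mod d)."""
        dim = self.depth % len(self.lower)
        mid = 0.5 * (self.lower[dim] + self.upper[dim])
        lo_upper = self.upper.copy()
        lo_upper[dim] = mid
        hi_lower = self.lower.copy()
        hi_lower[dim] = mid
        self.children = (DyadicNode(self.lower, lo_upper, self.depth + 1),
                         DyadicNode(hi_lower, self.upper, self.depth + 1))
        return self.children

    def prune(self):
        """Replace the subtree rooted here by a single leaf."""
        self.children = None

    def leaves(self):
        if self.children is None:
            return [self]
        return self.children[0].leaves() + self.children[1].leaves()

# Grow a complete RDP of depth 3 over [0,1]^2, then prune one subtree.
root = DyadicNode([0.0, 0.0], [1.0, 1.0])
frontier = [root]
for _ in range(3):
    frontier = [child for node in frontier for child in node.split()]
print("leaves before pruning:", len(root.leaves()))   # 8
root.children[0].prune()                               # prune the left subtree
print("leaves after pruning: ", len(root.leaves()))   # 5
```

The hierarchy is what makes pruning cheap: any pruned dyadic tree is obtained from the complete one by collapsing subtrees, so optimization over pruned trees can proceed bottom-up.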
The Classification Problem
Problem: given n labeled training points (X_1, Y_1), ..., (X_n, Y_n), drawn i.i.d. from an unknown distribution, with features X in [0,1]^d and labels Y in {0,1}, find a classifier f with small probability of error R(f) = P(f(X) != Y).
Classifiers
The Bayes Classifier: the minimum-probability-of-error classifier, obtained by thresholding the regression function at 1/2.
Minimum Empirical Risk Classifier: the classifier in a candidate class that makes the fewest errors on the training data.
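The formulas on this slide did not survive extraction; the standard definitions they presumably refer to, with eta(x) = P(Y = 1 | X = x) and F a candidate class such as the dyadic trees below, are:

```latex
% Bayes classifier: threshold the regression function at 1/2.
f^*(x) \;=\; \mathbf{1}\{\eta(x) \ge 1/2\},
  \qquad \eta(x) = \mathbb{P}(Y = 1 \mid X = x)

% Minimum empirical risk classifier over a class \mathcal{F}:
\hat{f}_n \;=\; \arg\min_{f \in \mathcal{F}}\ \widehat{R}_n(f),
  \qquad \widehat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{f(X_i) \ne Y_i\}
```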
Ex. Dyadic Classification Trees
[Figure panels: labeled training data with the Bayes decision boundary; complete RDP; pruned RDP; the resulting dyadic classification tree]
Codes for DCTs
Code-lengths: a pruned dyadic tree can be described by a short prefix code, roughly one bit per node for the tree structure plus one bit per leaf for its class label.
Ex: code 0001001111 + 6 bits for leaf labels.
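A small Python sketch of that bookkeeping (hypothetical encoding: one structure bit per node, 0 for a split and 1 for a leaf, plus one label bit per leaf; the slide's exact convention may differ slightly, e.g. it may omit the root bit):

```python
def structure_bits(tree):
    """Structure bits for a pruned dyadic tree given as nested tuples:
    a leaf is the string 'leaf', an internal node is a pair (left, right).
    One bit per node: 0 = split, 1 = leaf."""
    if tree == 'leaf':
        return '1'
    left, right = tree
    return '0' + structure_bits(left) + structure_bits(right)

def code_length(tree, num_leaves):
    """Total description length: structure bits + one label bit per leaf."""
    return len(structure_bits(tree)) + num_leaves

# Example: an unevenly pruned tree with 6 leaves.
tree = ((('leaf', 'leaf'), 'leaf'), (('leaf', 'leaf'), 'leaf'))
bits = structure_bits(tree)
print(bits, "->", len(bits), "structure bits + 6 label bits")
```

Because the code is prefix-free, the code-lengths can play the role of a prior over trees in a union bound, which is what the error bounds on the next slide exploit.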
Error Bounds for DCTs
Compare with CART.
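The bound itself is missing from the extracted text; a sketch of the standard coding-based error bound this slide presumably shows (Chernoff bound plus a union bound weighted by the prefix code, with L(T) the codelength in bits of tree T from the previous slide):

```latex
% With probability at least 1 - \delta, simultaneously for all DCTs T:
R(T) \;\le\; \widehat{R}_n(T)
      \;+\; \sqrt{\frac{L(T)\,\log 2 + \log(1/\delta)}{2n}}

% Learned tree: minimize the penalized empirical risk
\widehat{T}_n \;=\; \arg\min_{T}\ \left\{ \widehat{R}_n(T)
      + \sqrt{\frac{L(T)\,\log 2 + \log(1/\delta)}{2n}} \right\}
```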
Rate of Convergence
Suppose that the Bayes decision boundary behaves locally like a Lipschitz function.
(Mammen & Tsybakov '99; C. Scott & RN '02)
Why too slow?
Because the Bayes boundary is a (d-1)-dimensional manifold, "good" trees are unbalanced.
Yet all |T|-leaf trees are equally favored by the global bound.
Local Error Bounds in Classification
Spatial Error Decomposition (Mansour & McAllester '00):
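The decomposition formula is missing from the extracted text; the standard form, which splits both risks over the leaves of T with R(T, A) the contribution of cell A, is:

```latex
% Risk restricted to a leaf cell A:
R(T, A) = \mathbb{P}\bigl(T(X) \ne Y,\ X \in A\bigr), \qquad
\widehat{R}_n(T, A) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{T(X_i) \ne Y_i,\ X_i \in A\}

% Spatial error decomposition: the global deviation is a sum of local deviations
R(T) - \widehat{R}_n(T)
  \;=\; \sum_{A \in \mathrm{leaves}(T)} \bigl( R(T, A) - \widehat{R}_n(T, A) \bigr)
```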
Global vs. Local
Key: local complexity is offset by small volumes!
Unbalanced Tree
Example: a cascade tree with J leaves and depth J-1.
Global bound vs. local bound:
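An illustrative back-of-the-envelope comparison (not the paper's exact penalties): assume the global penalty scales like sqrt(J/n) for a tree with J leaves, while each leaf A contributes a local penalty scaling like sqrt(p_A * depth(A) / n), with p_A roughly 2^(-depth(A)) for the cascade tree above.

```python
import math

def global_penalty(J, n):
    # Global bound: complexity grows with the total number of leaves J.
    return math.sqrt(J / n)

def local_penalty(J, n):
    # Cascade tree: one leaf at each depth 1..J-1 plus a second leaf at
    # depth J-1; a leaf at depth j covers a cell of volume ~ 2^{-j}, so its
    # local complexity term is offset by that small volume.
    depths = list(range(1, J)) + [J - 1]
    return sum(math.sqrt(2.0 ** (-j) * j / n) for j in depths)

n = 10_000
for J in (8, 16, 32, 64):
    print(J, round(global_penalty(J, n), 4), round(local_penalty(J, n), 4))
```

As J grows, the global penalty grows like sqrt(J/n) while the sum of volume-weighted local penalties stays bounded, which is the sense in which local complexity is offset by small volumes.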
Convergence to Bayes Error
(Mammen & Tsybakov '99; C. Scott & RN '03)
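The rates themselves are missing from the extracted text; as I recall them from the cited papers (regularity conditions and constants omitted, so treat this as a sketch):

```latex
% Minimax rate over the locally-Lipschitz-boundary class (Mammen & Tsybakov '99):
\inf_{\hat{f}_n}\ \sup\ \mathbb{E}\bigl[R(\hat{f}_n)\bigr] - R^* \;\asymp\; n^{-1/d}

% DCTs with the spatially adaptive (local) penalty (Scott & Nowak '03):
\mathbb{E}\bigl[R(\widehat{T}_n)\bigr] - R^*
    \;=\; O\!\left(\Bigl(\tfrac{\log n}{n}\Bigr)^{1/d}\right)
% i.e. within a logarithmic factor of the minimax rate ("near-minimax").
```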
Concluding Remarks
Data-dependent bound.
Neural Information Processing Systems 2002, 2003
nowak@engr.wisc.edu