
Learning with Trees


Presentation Transcript


  1. Learning with Trees. Rob Nowak, University of Wisconsin-Madison. Collaborators: Rui Castro, Clay Scott, Rebecca Willett. www.ece.wisc.edu/~nowak. Artwork: Piet Mondrian.

  2. Basic Problem: Partitioning. Many problems in statistical learning theory boil down to finding a good partition of the input space, together with a function defined on that partition.
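
A minimal sketch (not from the slides; the split rule, depth limit, and cell representation are illustrative assumptions) of the recursive dyadic partitioning that a tree encodes:

```python
# Illustrative sketch: a recursive dyadic partition of the unit square.
# Each node splits its cell in half along one axis; the leaves are the
# nested rectangular cells that a tree represents.

def dyadic_partition(cell=(0.0, 1.0, 0.0, 1.0), depth=0, max_depth=3, axis=0):
    """Return the leaf cells (xmin, xmax, ymin, ymax) of a depth-limited
    recursive dyadic partition, alternating the split axis at each level."""
    if depth == max_depth:
        return [cell]
    xmin, xmax, ymin, ymax = cell
    if axis == 0:                       # split the cell in half along x
        mid = 0.5 * (xmin + xmax)
        children = [(xmin, mid, ymin, ymax), (mid, xmax, ymin, ymax)]
    else:                               # split the cell in half along y
        mid = 0.5 * (ymin + ymax)
        children = [(xmin, xmax, ymin, mid), (xmin, xmax, mid, ymax)]
    leaves = []
    for child in children:
        leaves += dyadic_partition(child, depth + 1, max_depth, 1 - axis)
    return leaves

print(len(dyadic_partition()))  # 2**3 = 8 cells at depth 3
```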

  3. Classification. Learning and classification: build a decision rule based on labeled training data. The classification rule built from the labeled training features is a partition of the feature space.
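
A hedged sketch of this step; scikit-learn and the synthetic data below are assumptions for illustration (the talk does not name a library). The fitted tree's leaves are exactly the cells of the learned partition of feature space.

```python
# Illustrative only: learn a tree-structured partition of a 2-D feature
# space from labeled training data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))                 # training features in [0,1]^2
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)      # labels from a simple boundary

# Limiting the number of leaves controls how fine the partition is.
clf = DecisionTreeClassifier(max_leaf_nodes=16, random_state=0)
clf.fit(X, y)
print(clf.predict([[0.2, 0.9], [0.1, 0.1]]))   # classify new feature vectors
```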

  4. Signal and Image Processing. MRI data of a brain aneurysm; extracted vascular network. Goal: recover complex geometrical structure from noisy data.

  5. Partitioning Schemes: support vector machine; image partitions.

  6. Why Trees? Trees are one of the most popular and widely used machine learning / data analysis tools. • Simplicity of design • Interpretability • Ease of implementation • Good performance in practice. CART: Breiman, Friedman, Olshen, and Stone, 1984, Classification and Regression Trees. C4.5: Quinlan, 1993, C4.5: Programs for Machine Learning. JPEG 2000: image compression standard, 2000, http://www.jpeg.org/jpeg2000/

  7. Example: Gamma-Ray Burst Analysis. Photon counts vs. time from the Burst and Transient Source Experiment (BATSE) on the Compton Gamma-Ray Observatory. One burst (tens of seconds) emits as much energy as our entire Milky Way does in one hundred years! An x-ray “afterglow” follows the burst.

  8. Trees and Partitions: coarse partition vs. fine partition.

  9. Estimation Using a Pruned Tree. A piecewise-constant fit to the data on each piece of the partition provides a good estimate. Each leaf corresponds to one sample f(t_i), i = 0, …, N−1.
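
A sketch of the same idea; the library, the synthetic signal, and the pruning parameter are illustrative assumptions rather than the talk's own choices. Each leaf of the pruned regression tree holds one constant, the mean of the samples falling in its cell.

```python
# Illustrative only: piecewise-constant estimate of noisy samples
# f(t_i), i = 0, ..., N-1, using a pruned regression tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

N = 256
t = np.arange(N).reshape(-1, 1)
truth = np.where(t.ravel() < 100, 1.0,
                 np.where(t.ravel() < 180, 3.0, 0.5))
y = truth + 0.3 * np.random.default_rng(1).standard_normal(N)   # noisy samples

# Cost-complexity pruning (ccp_alpha) plays the role of the pruned tree.
est = DecisionTreeRegressor(ccp_alpha=0.01).fit(t, y)
fhat = est.predict(t)      # piecewise-constant estimate on the leaf partition
```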

  10. Gamma-Ray Burst 845: piecewise-linear fit on each cell; piecewise-polynomial fit on each cell.
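
Higher-order fits per cell can be sketched the same way; the helper name, partition edges, and polynomial degree below are assumptions for illustration only.

```python
# Illustrative only: fit a low-order polynomial (degree 1 = piecewise
# linear) separately on each cell of a fixed partition of the time axis.
import numpy as np

def piecewise_poly_fit(t, y, cell_edges, degree=1):
    """t, y: 1-D arrays of times and noisy samples; cell_edges: sorted
    breakpoints.  Returns the fitted values on each cell [a, b)."""
    fit = np.empty_like(y, dtype=float)
    for a, b in zip(cell_edges[:-1], cell_edges[1:]):
        idx = (t >= a) & (t < b)
        if idx.sum() > degree:                    # enough points to fit
            coeffs = np.polyfit(t[idx], y[idx], degree)
            fit[idx] = np.polyval(coeffs, t[idx])
        elif idx.any():                           # fall back to the cell mean
            fit[idx] = y[idx].mean()
    return fit
```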

  11. Recursive Partitions

  12. Adapted Partition

  13. Image Denoising

  14. Decision (Classification) Trees. Labeled training data and the Bayes decision boundary; complete partition; pruned partition; decision tree, with a majority vote at each leaf.
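
The majority-vote rule named on this slide can be written as below; the notation, in particular the leaf cell A(x) containing the point x, is assumed here rather than copied from the slide.

```latex
% Notation assumed: A(x) is the leaf cell containing x; the labels Y_i are 0/1.
\[
\hat{f}(x) =
\begin{cases}
1, & \displaystyle\sum_{i:\,X_i \in A(x)} Y_i \;\ge\; \tfrac{1}{2}\,\#\{i : X_i \in A(x)\},\\[4pt]
0, & \text{otherwise,}
\end{cases}
\]
% i.e., each leaf is labeled by the majority of the training labels it contains.
```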

  15. Classification: ideal classifier, adapted partition, and histogram; 256 cells in each partition.
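
A hedged sketch of the histogram classifier with a fixed 16 x 16 grid (256 cells, matching the slide); the grid construction and helper names are illustrative assumptions.

```python
# Illustrative only: histogram classifier on a regular grid over [0,1]^2,
# labeling each cell by a majority vote of the training labels it contains.
import numpy as np

def fit_histogram_classifier(X, y, bins=16):
    """X: (n, 2) features in [0,1]^2; y: 0/1 labels.
    Returns a (bins, bins) array of cell labels."""
    votes = np.zeros((bins, bins, 2))
    cells = np.clip((X * bins).astype(int), 0, bins - 1)
    for (i, j), label in zip(cells, y):
        votes[i, j, label] += 1
    return votes.argmax(axis=-1)            # majority vote in each cell

def predict_histogram(cell_labels, X):
    bins = cell_labels.shape[0]
    cells = np.clip((X * bins).astype(int), 0, bins - 1)
    return cell_labels[cells[:, 0], cells[:, 1]]
```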

  16. Image Partitions: 1024 cells in each partition.

  17. Image Coding: JPEG at 0.125 bpp (non-adaptive partitioning) vs. JPEG 2000 at 0.125 bpp (adaptive partitioning).

  18. Probabilistic Framework

  19. Prediction Problem

  20. Challenge

  21. Empirical Risk

  22. Empirical Risk Minimization
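
The formulas on these two slides were not captured in the transcript; the standard definitions, in notation assumed here, are:

```latex
% Notation assumed: (X_1,Y_1),\dots,(X_n,Y_n) are i.i.d. training pairs and
% \ell is the loss (0/1 loss for classification, squared error for regression).
\[
\begin{aligned}
  R(f) &= \mathbb{E}\bigl[\ell(f(X),Y)\bigr]
      && \text{(true risk)}\\
  \widehat{R}_n(f) &= \frac{1}{n}\sum_{i=1}^{n}\ell\bigl(f(X_i),Y_i\bigr)
      && \text{(empirical risk)}\\
  \widehat{f}_n &= \arg\min_{f\in\mathcal{F}} \widehat{R}_n(f)
      && \text{(empirical risk minimization over the class } \mathcal{F}\text{)}
\end{aligned}
\]
```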

  23. Classification and Regression Trees

  24. Classification and Regression Trees. [Figure: a recursive partition whose cells carry binary labels 0/1.]

  25. Empirical Risk Minimization on Trees

  26. Overfitting Problem: a crude fit is stable; an accurate fit is variable.

  27. Bias/Variance Trade-off. Coarse partition: large bias, small variance. Fine partition: small bias, large variance.
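
One standard way to write this trade-off (notation assumed; not necessarily the slide's exact expression) splits the excess risk into an approximation term and an estimation term, matching the next slide's title:

```latex
% Notation assumed: f^* is the Bayes rule, \mathcal{F} the class of rules
% on a given partition, and \bar{f} = \arg\min_{f \in \mathcal{F}} R(f).
\[
R(\widehat{f}_n) - R(f^*)
  = \underbrace{R(\bar{f}) - R(f^*)}_{\substack{\text{approximation error (bias):}\\ \text{shrinks as the partition gets finer}}}
  + \underbrace{R(\widehat{f}_n) - R(\bar{f})}_{\substack{\text{estimation error (variance):}\\ \text{grows as the partition gets finer}}}
\]
```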

  28. Estimation and Approximation Error

  29. Estimation Error in Regression

  30. Estimation Error in Classification

  31. Partition Complexity and Overfitting. [Figure: risk, empirical risk, and variance plotted against the number of leaves; the empirical risk can be trusted for small partitions (“trust”) but not for large ones (“no trust”), where it reflects overfitting to the data.]

  32. Controlling Overfitting

  33. Complexity Regularization

  34. Per-Cell Variance Bounds: Regression

  35. Per-Cell Variance Bounds: Classification

  36. Variance Bounds

  37. A Slightly Weaker Variance Bound

  38. Complexity Regularization: “small” leaves contribute very little to the penalty.
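
A common form of the penalized criterion (the slide's exact penalty was not captured; the additive, per-leaf structure below is an assumption consistent with the remark that small leaves contribute little):

```latex
% Notation and the per-leaf penalty structure are assumptions.  T ranges
% over pruned subtrees of the full tree, and \pi(T) is the set of leaves
% of T.  A per-leaf penalty that scales with how much data the leaf can
% receive makes "small" leaves nearly free.
\[
\widehat{T} = \arg\min_{T}\Bigl\{\widehat{R}_n(T) + \operatorname{pen}(T)\Bigr\},
\qquad
\operatorname{pen}(T) = \sum_{A \in \pi(T)} \operatorname{pen}(A)
\]
```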

  39. Example: Image Denoising. This is a special case of “wavelet denoising” using the Haar wavelet basis.
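
The sketch below shows Haar-wavelet soft-threshold denoising of a 2-D image; the library (PyWavelets), the threshold rule, and the threshold value are assumptions for illustration, since the slide only states the connection.

```python
# Illustrative only: denoise a 2-D image by soft-thresholding its Haar
# detail coefficients and reconstructing.
import numpy as np
import pywt

def haar_denoise(img, threshold, level=4):
    coeffs = pywt.wavedec2(img, 'haar', level=level)
    approx, details = coeffs[0], coeffs[1:]
    shrunk = [approx]                      # keep the coarse approximation
    for (cH, cV, cD) in details:           # shrink the detail coefficients
        shrunk.append(tuple(pywt.threshold(c, threshold, mode='soft')
                            for c in (cH, cV, cD)))
    return pywt.waverec2(shrunk, 'haar')

# Example: a noisy piecewise-constant image, as in the slide's setting.
clean = np.zeros((64, 64))
clean[16:48, 16:48] = 1.0
noisy = clean + 0.2 * np.random.default_rng(2).standard_normal(clean.shape)
denoised = haar_denoise(noisy, threshold=0.4)
```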

  40. Theory of Complexity Regularization

  41. Coffee Break!

  42. Classification. [Figure: “DALI?” example, with features such as the mustache and eyes.]

  43. Probabilistic Framework

  44. Learning from Data. [Figure: training data with binary (0/1) labels.]

  45. Approximation and Estimation. Approximation error corresponds to bias; estimation error (model selection) corresponds to variance.

  46. Classifier Approximations

  47. Approximation Error: measured via the symmetric difference set.
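
The standard identity behind this slide (notation assumed) expresses the excess risk of a candidate set G as an integral over the symmetric difference with the Bayes set:

```latex
% Notation assumed: \eta(x) = \Pr(Y = 1 \mid X = x), the Bayes set is
% G^* = \{x : \eta(x) \ge 1/2\}, and G \,\Delta\, G^* is the symmetric
% difference of G and G^*.
\[
R(G) - R(G^*) = \int_{G \,\Delta\, G^*} \bigl|\,2\eta(x) - 1\,\bigr| \, dP_X(x)
\]
```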

  48. Approximation Error: boundary smoothness; risk functional (transition) smoothness.

  49. Boundary Smoothness
