Learning with Trees. Rob Nowak, University of Wisconsin-Madison. Collaborators: Rui Castro, Clay Scott, Rebecca Willett. www.ece.wisc.edu/~nowak. Artwork: Piet Mondrian.
Basic Problem: Partitioning. Many problems in statistical learning theory boil down to finding a good partition. [Figure labels: function; partition]
Classification. Learning and classification: build a decision rule based on labeled training data. [Figure labels: labeled training features; classification rule: partition of feature space]
Signal and Image Processing. Recover complex geometrical structure from noisy data. [Figure labels: MRI data; brain aneurysm; extracted vascular network]
Partitioning Schemes. [Figure labels: Support Vector Machine; image partitions]
Why Trees?
• Simplicity of design
• Interpretability
• Ease of implementation
• Good performance in practice
Trees are one of the most popular and widely used machine learning / data analysis tools.
CART: Breiman, Friedman, Olshen, and Stone, Classification and Regression Trees, 1984
C4.5: Quinlan, C4.5: Programs for Machine Learning, 1993
JPEG 2000: Image compression standard, 2000, http://www.jpeg.org/jpeg2000/
Example: Gamma-Ray Burst Analysis. Compton Gamma-Ray Observatory, Burst and Transient Source Experiment (BATSE). One burst (tens of seconds) emits as much energy as our entire Milky Way does in one hundred years! [Figure labels: photon counts vs. time; burst; X-ray “afterglow”]
Trees and Partitions. [Figure labels: coarse partition; fine partition]
Estimation Using a Pruned Tree. A piecewise constant fit to the data on each piece of the partition provides a good estimate. Each leaf corresponds to a sample f(t_i), i = 0, …, N-1.
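
As a rough illustration of the piecewise constant estimate, the sketch below replaces the samples in each cell of a given partition by their mean. The function name `piecewise_constant_fit`, the way the partition is encoded (a list of cell start indices), and the toy data are assumptions made for this example; they are not taken from the slides.

```python
def piecewise_constant_fit(samples, boundaries):
    """Fit a constant (the cell mean) on each cell of a 1-D partition.

    samples    : list of N noisy observations of f(t_i), i = 0, ..., N-1
    boundaries : sorted start indices of the cells, beginning with 0
                 (hypothetical encoding chosen for this sketch)
    """
    edges = list(boundaries) + [len(samples)]
    fit = []
    for start, stop in zip(edges[:-1], edges[1:]):
        cell = samples[start:stop]
        mean = sum(cell) / len(cell)
        # Every sample in the cell is replaced by the cell mean.
        fit.extend([mean] * len(cell))
    return fit


# Toy data: two flat pieces with small perturbations.
data = [1.0, 3.0, 2.0, 2.0, 9.0, 11.0, 10.0, 10.0]
estimate = piecewise_constant_fit(data, [0, 4])
```

With the two-cell partition used here, the estimate is constant at the mean of the first four samples and then at the mean of the last four.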
Gamma-Ray Burst 845. [Figure labels: piecewise linear fit on each cell; piecewise polynomial fit on each cell]
Decision (Classification) Trees. Decision tree: majority vote at each leaf. [Figure labels: labeled training data; Bayes decision boundary; complete partition; pruned partition]
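
The majority-vote rule can be sketched in a few lines. The example assumes the partition is supplied as a function mapping a feature vector to a leaf identifier; `majority_vote_labels`, `leaf_of`, and the training points are hypothetical names and data introduced only for illustration.

```python
from collections import Counter


def majority_vote_labels(points, labels, leaf_of):
    """Give each leaf the most common training label among the points in it."""
    votes = {}
    for x, y in zip(points, labels):
        votes.setdefault(leaf_of(x), Counter())[y] += 1
    return {leaf: counts.most_common(1)[0][0] for leaf, counts in votes.items()}


def leaf_of(x):
    """Hypothetical one-split partition of [0, 1)^2 at x[0] = 0.5."""
    return "left" if x[0] < 0.5 else "right"


points = [(0.1, 0.2), (0.3, 0.9), (0.4, 0.5), (0.7, 0.1), (0.9, 0.8)]
labels = [0, 0, 1, 1, 1]
rule = majority_vote_labels(points, labels, leaf_of)


def classify(x):
    """Classify a new point with the label of the leaf it falls in."""
    return rule[leaf_of(x)]
```

Leaves that receive no training points get no label in this sketch; a fuller implementation would need a policy for them, such as inheriting the parent's label.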
Classification. [Figure labels: ideal classifier; histogram; adapted partition] 256 cells in each partition.
Image Partitions. 1024 cells in each partition.
Image Coding. [Figure labels: JPEG, 0.125 bpp, non-adaptive partitioning; JPEG 2000, 0.125 bpp, adaptive partitioning]
Classification and Regression Trees. [Figure: tree and partition with leaves labeled 0 or 1]
Overfitting Problem. [Figure labels: crude, stable; accurate, variable]
Bias/Variance Trade-off. Coarse partition: large bias, small variance. Fine partition: small bias, large variance.
Partition Complexity and Overfitting. [Figure: risk, empirical risk, and variance plotted against # leaves; regions marked “trust” and “no trust”; overfitting to data]
Complexity Regularization. “Small” leaves contribute very little to the penalty.
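
One common way to carry out complexity regularization is bottom-up pruning of a recursive dyadic partition. The sketch below assumes a penalty proportional to the number of leaves, with weight `lam`. That is simpler than the leaf-size-dependent penalty the slide alludes to and is used here only as an illustration. The function name `prune` and the toy data are hypothetical.

```python
def prune(samples, lo, hi, lam):
    """Minimize squared error + lam * (# leaves) over dyadic partitions.

    Returns (cost, cells) for samples[lo:hi], where cells is a list of
    (start, stop) index pairs. Assumption: each leaf pays the same
    penalty lam, regardless of its size.
    """
    cell = samples[lo:hi]
    mean = sum(cell) / len(cell)
    # Cost of keeping this cell as a single leaf.
    leaf_cost = sum((v - mean) ** 2 for v in cell) + lam
    if hi - lo == 1:
        return leaf_cost, [(lo, hi)]
    mid = (lo + hi) // 2
    left_cost, left_cells = prune(samples, lo, mid, lam)
    right_cost, right_cells = prune(samples, mid, hi, lam)
    # Split only if the two subtrees are strictly cheaper than one leaf.
    if left_cost + right_cost < leaf_cost:
        return left_cost + right_cost, left_cells + right_cells
    return leaf_cost, [(lo, hi)]


data = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
cost, cells = prune(data, 0, len(data), 0.5)
coarse_cost, coarse_cells = prune(data, 0, len(data), 100.0)
```

With a small `lam` the jump in the middle of the toy data is worth a split, while a very large `lam` collapses everything into one cell. This is the bias/variance trade-off expressed through the penalty weight.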
Example: Image Denoising. This is a special case of “wavelet denoising” using the Haar wavelet basis.
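
To make the connection concrete, here is a minimal one-dimensional sketch of Haar wavelet denoising by hard thresholding, assuming a signal whose length is a power of two. The slide's example is an image; the 1-D setting, the function names, the threshold value, and the toy signal are all assumptions made for brevity.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(signal):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / SQRT2 for a, b in pairs]
        approx = [(a + b) / SQRT2 for a, b in pairs]
        details = detail + details  # coarser details go in front
    return approx + details


def haar_inverse(coeffs):
    """Invert haar_forward."""
    approx = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        finer = []
        for a, d in zip(approx, detail):
            finer.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        approx = finer
    return approx


def haar_denoise(signal, threshold):
    """Zero every detail coefficient with magnitude at most threshold."""
    coeffs = haar_forward(signal)
    kept = coeffs[:1] + [c if abs(c) > threshold else 0.0 for c in coeffs[1:]]
    return haar_inverse(kept)


noisy = [1.1, 0.9, 1.0, 1.0, 5.0, 5.1, 4.9, 5.0]
denoised = haar_denoise(noisy, 0.5)
```

Zeroing a Haar detail coefficient merges the two halves of its dyadic interval into a single average, so thresholding acts like pruning a tree of dyadic cells. On the toy signal only the coefficient that captures the large jump survives, and the result is approximately constant on each half.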
Classification. DALI? [Figure labels: mustache; eyes]
Learning from Data. [Figure: regions labeled 0 and 1]
Approximation and Estimation. [Figure labels: approximation; BIAS; model selection; VARIANCE; class labels 0 and 1]
Approximation Error. [Figure labels: symmetric difference set; error]
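
The size of a symmetric difference set can be approximated numerically. The sketch below counts grid midpoints in [0, 1)^2 where two decision sets disagree. The function name, the grid resolution, and both example sets (a diagonal boundary and a one-split tree approximation) are hypothetical choices for illustration, not the sets shown on the slide.

```python
def symmetric_difference_volume(in_a, in_b, n=100):
    """Approximate the area of the symmetric difference of two subsets of
    [0, 1)^2 by counting disagreeing midpoints on an n x n grid."""
    disagree = 0
    for i in range(n):
        for j in range(n):
            x = ((i + 0.5) / n, (j + 0.5) / n)
            if in_a(x) != in_b(x):
                disagree += 1
    return disagree / n ** 2


def below_diagonal(x):
    """Hypothetical target set: points under the diagonal x[1] = x[0]."""
    return x[1] < x[0]


def right_half(x):
    """Hypothetical one-split tree approximation: the right half of the square."""
    return x[0] >= 0.5


volume = symmetric_difference_volume(below_diagonal, right_half)
```

For these two sets the exact area of the symmetric difference is 0.25, and the grid count reproduces that value. A finer partition aligned with the diagonal would shrink it.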
Approximation Error. [Figure labels: boundary smoothness; risk functional (transition) smoothness]