200 likes | 334 Views
Margin Trees for High-dimensional Classification. Tibshirani and Hastie. Errata (confirmed by Tibshirani). Section 2 (a) about the property of 'single linkage‘. M should be M 0 Section 2.1 close to the last line of second paragraph. “at least” should be “at most”
E N D
Margin Trees for High-dimensional Classification Tibshirani and Hastie
Errata (confirmed by Tibshirani) • Section 2 (a) about the property of 'single linkage‘. M should be M0 • Section 2.1 close to the last line of second paragraph. “at least” should be “at most” • The statements about complete/single linkage are misleading. In fact, they use standard definition of complete/single linkage except the distance metric is replaced with margin between pairwise classes. (I traced their code to confirm this).
Targeted Problem • Multi-class • #class >> 2 • High-dimensional, few samples • #features >> #data linear separable • already good accuracy, need interpretable model • Ex. micro-array data • feature : gene expression measurement • class: type of cancer • Instances: patients
T ( ) ¯ ¯ S i + x g n x 0 Learn a Highly Interpretable Structure for Domain Experts Check certain genes Help create the link of gene to cancer
Higher Interpretability • Multi-class problems reduce to binary • 1vs1 voting not meaningful • tree representation • Non-linear-separable data • single non-linear classifier • organized teams of linear classifiers • Solution: • Margintree =Hierarchical Tree + max-margin classifier + Feature Selection (interpretation) (minimize risk) (limited #feature/split)
Training Construct tree structure Train max-margin classifier at each splitter Testing Start from root node Going down following the prediction of classifiers at splitting points ex. Right, Right class: 3 Using margin-Tree {1} vs{2,3} {2} vs {3}
Tree Structure(1/2) • Top-down Construction • Greedy
Greedy (1/3) 1,2,3 • Starting from root with all classes {1,2,3} • find maximum margin among all partitions {1} vs {2,3}; {2} vs {1,3}; {3}vs{1,2} 2n-1partitions!
Greedy (2/3) 1,2,3 2,3 • Repeat in child nodes.
Greedy (2/3) 1,2,3 2,3 • Done! • Warning: Greedy not necessary lead to global optimum • i.e. find out the global maximal margin
Tree Structure(2/2) • Bottom-up Treeiteratively merge closest groups. • Single linkage: distance = nearest pair. • Complete linkage: distance = farthest pair.
Complete Tree Height(subtree) = distance(the farthest pair of classes)≥ Margin(cutting through the subtree) When looking for a Margin > Height(substree), never break classes in the subtree
Efficient Greedy Tree Construction • Construct a complete linkage tree T • Estimate current lower bound of maximal margin M0= max Margin(individual class, rest) • To find a margin ≥ M0We only needto consider partition between{5,4,6}, {1}, {2,3} M0
Comparable testing performances (also 1vs1 voting) • Complete linkage tree more balance more interpretable
T T ( ) ¯ ¯ ¯ ¯ D S i i i 0 + + e c x s o n g n x = = 0 0 Recall the cutting plane βis the weight of features in decision function
Feature Selection • Hard-thresholding at each split • Discard n features with low abs(βi) by setting βi=0 • Proportional to margin: n = α|Margin| • α chosen by cross-validation error • βunavailable using non-linear kernel • Alternative methods • L1-norm SVM force βi to zero
T T ( ) ¯ ¯ ¯ ¯ ¯ D S i i i 0 + + e c x s o n g n x = = 0 0 Setting βi=0
Discussion • Good for multi-class, high-dimensional data • Bad for non-linear separable data. • Each node will contain impure dataimpure β • Testing performance comparable to traditional multi-class max-margin classifiers (SVMs).