Margin Trees for High-dimensional Classification



Presentation Transcript


  1. Margin Trees for High-dimensional Classification Tibshirani and Hastie

  2. Errata (confirmed by Tibshirani) • Section 2 (a), about the property of 'single linkage': M should be M0 • Section 2.1, near the last line of the second paragraph: "at least" should be "at most" • The statements about complete/single linkage are misleading. In fact, the authors use the standard definitions of complete/single linkage, except that the distance metric is replaced by the margin between pairwise classes. (I traced their code to confirm this.)

  3. Targeted Problem • Multi-class: #classes >> 2 • High-dimensional, few samples: #features >> #data → linearly separable • Accuracy is already good; what is needed is an interpretable model • Example: microarray data • features: gene expression measurements • classes: types of cancer • instances: patients

  4. Each split is the linear decision rule sign(βᵀx + β0). Learn a highly interpretable structure for domain experts: check specific genes and help establish the link from gene to cancer.

  5. Higher Interpretability • Multi-class problems → reduce to binary • 1-vs-1 voting → not meaningful; a tree representation is • Non-linearly-separable data → either a single non-linear classifier or an organized team of linear classifiers • Solution: Margin tree = hierarchical tree (interpretation) + max-margin classifier (minimize risk) + feature selection (limited #features per split)

  6. Using a margin tree • Training: construct the tree structure, then train a max-margin classifier at each split • Testing: start from the root node and go down, following the prediction of the classifier at each split • e.g. right, right → class 3, through the splits {1} vs {2,3} and {2} vs {3} (see the sketch below)
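To make the train/test flow concrete, here is a minimal sketch of a margin-tree node with a linear SVM at each split. The node layout, helper names, and the use of scikit-learn's LinearSVC are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of a margin-tree node and its use at train/test time (illustrative only).
import numpy as np
from sklearn.svm import LinearSVC

class MarginTreeNode:
    def __init__(self, classes, clf=None, left=None, right=None):
        self.classes = classes   # set of class labels under this node
        self.clf = clf           # binary max-margin classifier at this split (None at a leaf)
        self.left = left         # child holding the classes predicted as 0
        self.right = right       # child holding the classes predicted as 1

def fit_split(X, y, left_classes, right_classes):
    """Train the max-margin classifier for one split: left group vs right group."""
    mask = np.isin(y, list(left_classes | right_classes))
    labels = np.isin(y[mask], list(right_classes)).astype(int)  # 1 = right group
    clf = LinearSVC(C=1e6)   # very large C approximates a hard-margin SVM on separable data
    return clf.fit(X[mask], labels)

def predict_one(node, x):
    """Walk from the root to a leaf, following each split's prediction."""
    while node.clf is not None:
        side = node.clf.predict(x.reshape(1, -1))[0]
        node = node.right if side == 1 else node.left
    return next(iter(node.classes))   # a leaf holds a single class, e.g. {3}
```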

  7. Tree Structure (1/2) • Top-down construction • Greedy

  8. Greedy (1/3) • Start from the root with all classes {1,2,3} • Find the maximum margin among all partitions: {1} vs {2,3}; {2} vs {1,3}; {3} vs {1,2} • For n classes there are 2^(n-1) − 1 such partitions!
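A sketch of the exhaustive search at one node: enumerate all 2^(n-1) − 1 two-group partitions of the classes and keep the one with the largest margin. `margin_between` is a hypothetical helper that would fit a hard-margin SVM between the two groups and return its margin.

```python
from itertools import combinations

def best_partition(classes, margin_between):
    """Enumerate all 2^(n-1) - 1 splits of `classes` into two non-empty groups
    and return the split with the largest margin."""
    classes = list(classes)
    anchor, rest_pool = classes[0], classes[1:]   # fix one class to avoid mirror duplicates
    best = (None, None, float("-inf"))
    for r in range(len(rest_pool) + 1):
        for extra in combinations(rest_pool, r):
            left = frozenset({anchor, *extra})
            right = frozenset(classes) - left
            if not right:                         # skip the trivial "everything vs nothing" split
                continue
            m = margin_between(left, right)       # margin of the SVM separating the two groups
            if m > best[2]:
                best = (left, right, m)
    return best                                   # e.g. ({1}, {2, 3}, margin)
```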

  9. Greedy (2/3) • Repeat in the child nodes (e.g. the node holding {2,3}).

  10. Greedy (3/3) • Done! • Warning: greedy does not necessarily lead to the global optimum, i.e. it may not find the globally maximal margin.

  11. Tree Structure (2/2) • Bottom-up tree: iteratively merge the closest groups • Single linkage: group distance = nearest pair • Complete linkage: group distance = farthest pair • Per the errata, "distance" here is the margin between pairwise classes.
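A sketch of the bottom-up construction, following the errata's reading: standard agglomerative single/complete linkage, but with the between-class "distance" taken to be the pairwise margin. `pairwise_margin` is a hypothetical helper returning the max-margin separation between two individual classes.

```python
def bottom_up_tree(classes, pairwise_margin, linkage="complete"):
    """Iteratively merge the two closest groups.  The distance between two
    groups is the nearest-pair margin (single linkage) or the farthest-pair
    margin (complete linkage) over classes drawn from each group."""
    groups = [frozenset([c]) for c in classes]
    agg = max if linkage == "complete" else min
    merges = []
    while len(groups) > 1:
        best = None   # (distance, i, j) of the closest pair of groups found so far
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                d = agg(pairwise_margin(a, b) for a in groups[i] for b in groups[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((groups[i], groups[j], d))   # record the merge and its height
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [groups[i] | groups[j]]
    return merges
```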

  12. Complete Tree

  13. Complete Tree • Height(subtree) = distance of the farthest pair of classes ≥ Margin(any cut through the subtree) • So when looking for a margin greater than Height(subtree), never break up the classes in that subtree.

  14. Efficient Greedy Tree Construction • Construct a complete-linkage tree T • Estimate a lower bound on the maximal margin: M0 = max over classes of Margin(individual class, rest) • To find a margin ≥ M0, we only need to consider partitions between the intact groups {5,4,6}, {1}, {2,3} (see the sketch below)
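A sketch of the pruning rule these two slides describe: walk down the complete-linkage tree and keep intact every maximal subtree whose height is below the current lower bound M0, since no cut through such a subtree can reach a margin of M0. The `LinkageNode` layout is an assumed representation of the complete-linkage tree, and the returned groups (e.g. {5,4,6}, {1}, {2,3}) would then be treated as atomic units in the greedy partition search above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LinkageNode:
    classes: frozenset                      # classes under this subtree
    height: float = 0.0                     # complete-linkage height = farthest-pair margin
    left: Optional["LinkageNode"] = None
    right: Optional["LinkageNode"] = None

def intact_groups(node, M0):
    """Return the maximal subtrees whose height is below M0.  Any cut through
    such a subtree has margin <= its height < M0, so its classes can be kept
    together as one unit when enumerating candidate partitions."""
    if node.left is None or node.height < M0:
        return [node.classes]
    return intact_groups(node.left, M0) + intact_groups(node.right, M0)
```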

  15. Comparable testing performance (also vs. 1-vs-1 voting) • The complete-linkage tree is more balanced → more interpretable

  16. Recall the cutting plane: the decision function is sign(βᵀx + β0), and the separating hyperplane is βᵀx + β0 = 0. β is the weight of the features in the decision function.

  17. Feature Selection • Hard-thresholding at each split • Discard the n features with the lowest |βi| by setting βi = 0 • n is proportional to the margin: n = α|Margin| • α is chosen by cross-validation error • β is unavailable with a non-linear kernel • Alternative method: an L1-norm SVM, which forces βi to zero
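A sketch of the hard-thresholding step at one split, under the slide's rule that the number of discarded coefficients scales with the margin through the constant α. The function name, its arguments, and the exact rounding are illustrative choices, not the authors' code.

```python
import numpy as np

def hard_threshold(beta, margin, alpha):
    """Zero out the n coefficients with smallest |beta_i|, with n = alpha * |margin|
    (alpha itself would be picked by cross-validation error)."""
    beta = beta.copy()
    n = int(alpha * abs(margin))             # number of features dropped at this split
    if n > 0:
        drop = np.argsort(np.abs(beta))[:n]  # indices of the smallest-magnitude weights
        beta[drop] = 0.0
    return beta
```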

  18. Setting βi = 0 removes feature i from the decision function sign(βᵀx + β0) and from the hyperplane βᵀx + β0 = 0.

  19. Feature Selection Result

  20. Discussion • Good for multi-class, high-dimensional data • Bad for non-linearly-separable data: each node would contain impure data → an impure β • Testing performance is comparable to traditional multi-class max-margin classifiers (SVMs).
