
Presentation Transcript


  1. 2004 Open Lecture at ISM: Recent Topics in Machine Learning: Boosting. November 24 (Wed) to 26 (Fri), 2004. Open Lecture "Essentials of Statistical Mathematics: Recent Topics in Machine Learning". Boost Learning. Shinto Eguchi (The Institute of Statistical Mathematics / Department of Statistical Science, The Graduate University for Advanced Studies)

  2. Course contents. Boost learning: an outline of AdaBoost, a method of statistical pattern recognition, followed by a discussion of its strengths and weaknesses. Examples of applications such as gene expression and remote-sensing data are also introduced.

  3. Boost Learning (I). 10:00-12:30, Thursday, November 25. Boost learning algorithms: AdaBoost; AsymAdaBoost (asymmetric learning); EtaBoost (robust learning); GroupBoost (group learning).

  4. Boost Learning (II). 13:30-16:00, Friday, November 26. BridgeBoost (meta learning); LocalBoost (local learning). Statistical discussion: probabilistic framework; Bayes rule, Fisher's LDF, logistic regression; the optimal classifier given by AdaBoost.

  5. Acknowledgments. Much of the material presented in this course comes from joint work with the following collaborators, to whom I am grateful: Noboru Murata (Waseda University, Science and Engineering), Ryuei Nishii (Kyushu University, Mathematics), Takafumi Kanamori (Tokyo Institute of Technology, Mathematical and Computing Sciences), Takashi Takenouchi (The Institute of Statistical Mathematics), Masanori Kawakita (The Graduate University for Advanced Studies, Statistical Science), John B. Copas (Dept. of Statistics, University of Warwick).

  6. The strength of weak learnability. Schapire, R. (1990). Strong learnability: given access to a source of examples of the unknown concept, the learner, with high probability, is able to output a hypothesis that is correct on all but an arbitrarily small fraction of the instances. Weak learnability: the concept class is weakly learnable if the learner can produce a hypothesis that performs only slightly better than random guessing.

  7. Web pages on boosting. Boosting Research Site: http://www.boosting.org/ Robert Schapire's home page: http://www.cs.princeton.edu/~schapire/ Yoav Freund's home page: http://www1.cs.columbia.edu/~freund/ John Lafferty: http://www-2.cs.cmu.edu/~lafferty/

  8. Statistical pattern recognition. Recognition of characters, images, speakers, signals, faces, languages, ... Prediction of weather, earthquakes, disasters, finance, interest rates, company bankruptcy, credit, default, infection, disease, adverse effects. Classification of species, parentage, genomic type, gene expression, protein expression, system failure, machine trouble.

  9. Multi-class classification: class label, feature vector, discriminant function, classification rule.
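
The formulas on this slide are lost in the transcript. As a minimal sketch, the standard multi-class setup (with notation chosen here, not taken from the slide) reads:

\[
y \in \{1,\dots,K\} \ \text{(class label)}, \qquad
\mathbf{x} \in \mathbb{R}^{p} \ \text{(feature vector)},
\]
\[
F_{1}(\mathbf{x}),\dots,F_{K}(\mathbf{x}) \ \text{(discriminant functions)}, \qquad
\hat{y}(\mathbf{x}) = \operatorname*{argmax}_{k=1,\dots,K} F_{k}(\mathbf{x}) \ \text{(classification rule)}.
\]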

  10. Binary classification: label, 0-normalization, classification rule. Learn from a training dataset and make a classification.
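
Again the slide's formulas do not survive; a sketch of the usual binary setup assumed in the rest of the lecture (notation mine) is:

\[
y \in \{+1,-1\}, \qquad \text{training dataset } \{(\mathbf{x}_{i}, y_{i})\}_{i=1}^{n},
\]
\[
\text{score } F(\mathbf{x}) \text{ with the decision threshold normalized to } 0, \qquad
\hat{y}(\mathbf{x}) = \operatorname{sign} F(\mathbf{x}).
\]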

  11. Statistical learning theory. Boost learning: boosting by filtering (Schapire, 1990); bagging, arcing (bootstrap) (Breiman, Friedman, Hastie); AdaBoost (Schapire, Freund, Bartlett, Lee). Support vector machines: maximize the margin; kernel space (Vapnik, Schölkopf).

  12. Class of weak machines: stump class, linear class, ANN class, SVM class, kNN class. Point: weak machines should have a distinctive (colorful) character rather than a universal one.

  13. AdaBoost

  14. Learning algorithm and the final machine.
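
Slides 13-14 show the algorithm only as figures. For reference, the textbook formulation of AdaBoost, in standard notation rather than the slide's own, is:

\[
w_{1}(i) = 1/n, \quad i = 1,\dots,n.
\]
\[
\text{For } t = 1,\dots,T: \text{ choose the weak machine } f_{t} \text{ minimizing }
\epsilon_{t} = \sum_{i=1}^{n} w_{t}(i)\,\mathbb{1}\{f_{t}(\mathbf{x}_{i}) \ne y_{i}\},
\]
\[
\alpha_{t} = \tfrac{1}{2}\log\frac{1-\epsilon_{t}}{\epsilon_{t}}, \qquad
w_{t+1}(i) \propto w_{t}(i)\exp\!\big(-\alpha_{t}\, y_{i} f_{t}(\mathbf{x}_{i})\big).
\]
\[
\text{Final machine: } F(\mathbf{x}) = \sum_{t=1}^{T} \alpha_{t} f_{t}(\mathbf{x}), \qquad
\hat{y}(\mathbf{x}) = \operatorname{sign} F(\mathbf{x}).
\]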

  15. Simulation (complete separation): feature space [-1,1] × [-1,1]; decision boundary.

  16. Set of weak machines: linear classification machines, generated at random.
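
The setting of slides 15-17 (random linear weak machines on [-1,1] × [-1,1], combined by AdaBoost) can be reproduced roughly with the Python sketch below. The particular decision boundary, sample size, pool size and number of rounds are illustrative choices, not the ones used in the lecture.

import numpy as np

rng = np.random.default_rng(0)

# Toy data on [-1, 1]^2; the wavy decision boundary is an illustrative choice.
n = 200
X = rng.uniform(-1.0, 1.0, size=(n, 2))
y = np.where(X[:, 1] > 0.5 * np.sin(np.pi * X[:, 0]), 1, -1)

# Pool of weak machines: randomly generated linear classifiers sign(a.x + b).
M = 500
A = rng.normal(size=(M, 2))
B = rng.uniform(-1.0, 1.0, size=M)
H = np.sign(X @ A.T + B)          # n x M matrix of weak-machine outputs
H[H == 0] = 1

# AdaBoost over the fixed pool of random linear machines.
T = 100
w = np.full(n, 1.0 / n)
alphas, chosen = [], []
for t in range(T):
    err = ((H != y[:, None]) * w[:, None]).sum(axis=0)   # weighted error of each machine
    j = int(np.argmin(err))                              # best machine this round
    eps = float(np.clip(err[j], 1e-12, 1 - 1e-12))
    alpha = 0.5 * np.log((1 - eps) / eps)
    w *= np.exp(-alpha * y * H[:, j])                    # reweight the examples
    w /= w.sum()
    alphas.append(alpha)
    chosen.append(j)

# Final machine F(x) = sum_t alpha_t f_t(x); classify by its sign.
F = H[:, chosen] @ np.array(alphas)
print("training error:", float(np.mean(np.sign(F) != y)))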

  17. Learning process (I): training error over the boosting iterations. Iter = 1: 0.21; Iter = 13: 0.18; Iter = 17: 0.10; Iter = 23: 0.10; Iter = 31: 0.095; Iter = 47: 0.08.

  18. Learning process (II). Iter = 55: train err = 0.061; Iter = 99: train err = 0.032; Iter = 155: train err = 0.016.

  19. Final stage: contour of F(x) and sign(F(x)).

  20. Learning curve: training error versus boosting iteration, Iter = 1, ..., 277.

  21. Characteristics: the weight update and the weighted error rates (least favorable).
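
The update formula itself is missing from the transcript. The property usually meant by "least favorable", assumed here to be what the slide states, is the standard fact that after the weight update the machine just chosen is no better than random guessing:

\[
\sum_{i=1}^{n} w_{t+1}(i)\,\mathbb{1}\{f_{t}(\mathbf{x}_{i}) \ne y_{i}\} = \frac{1}{2},
\]

so the new weights are the worst possible for f_t, forcing the next round to look for a different machine.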

  22. Exponential loss, and the update rule.
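
A sketch of the standard definitions behind this slide (again in assumed textbook notation):

\[
L_{\exp}(F) = \frac{1}{n}\sum_{i=1}^{n}\exp\!\big(-y_{i}F(\mathbf{x}_{i})\big),
\qquad F \ \leftarrow\ F + \alpha f,
\]

so each round adds one weak machine f with coefficient alpha to the current score F, and the example weights w(i) proportional to exp(-y_i F(x_i)) are just the per-example exponential losses.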

  23. Sequential minimization (and the condition under which equality holds).
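
The inequality meant here is presumably the usual single-round minimization of the exponential loss in alpha for a fixed weak machine f (standard derivation, not copied from the slide):

\[
\sum_{i} w(i)\,e^{-\alpha y_{i} f(\mathbf{x}_{i})}
 = e^{-\alpha}(1-\epsilon) + e^{\alpha}\epsilon
 \ \ge\ 2\sqrt{\epsilon(1-\epsilon)},
\qquad
\epsilon = \sum_{i} w(i)\,\mathbb{1}\{f(\mathbf{x}_{i}) \ne y_{i}\},
\]

with the weights normalized to sum to one. Equality holds iff e^{2\alpha} = (1-\epsilon)/\epsilon, i.e. \alpha = \tfrac{1}{2}\log\{(1-\epsilon)/\epsilon\}, which is exactly the AdaBoost coefficient.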

  24. AdaBoost = minimum exp-loss
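
A standard consequence, assumed to be the content of this slide: at the population level the minimizer of the exponential loss is half the log-odds, which connects AdaBoost to the Bayes rule treated in part (II):

\[
F^{*}(\mathbf{x}) = \operatorname*{argmin}_{F}\ \mathbb{E}\big[e^{-yF(\mathbf{x})}\big]
 = \frac{1}{2}\log\frac{P(y=+1 \mid \mathbf{x})}{P(y=-1 \mid \mathbf{x})},
\]

so that sign F*(x) coincides with the Bayes-optimal classifier.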

  25. Simulation (complete random)

  26. Over-learning of AdaBoost. Iter = 51: train err = 0.21; Iter = 151: train err = 0.06; Iter = 301: train err = 0.0.

  27. Drawbacks of AdaBoost. 1. Unbalanced learning: AsymAdaBoost balances the false negatives/positives. 2. Over-learning, even on noisy datasets: EtaBoost robustifies against mislabelled examples; GroupBoost relaxes the p >> n problem; LocalBoost extracts spatial information; BridgeBoost combines different datasets.

  28. AsymBoost: a small modification of AdaBoost in step 2(b)'; the selection of k and its default choice.

  29. Weighted errors by k

  30. Result of AsymBoost

  31. Eta-loss function regularized

  32. EtaBoost (b)

  33. A toy example

  34. AdaBoost vs EtaBoost

  35. Simulation (complete random): over-learning of AdaBoost. Iter = 51: train err = 0.21; Iter = 301: train err = 0.0.

  36. EtaBoost. Iter = 51: train err = 0.25; Iter = 51: train err = 0.15; Iter = 351: train err = 0.18.

  37. Mis-labeled examples.

  38. Comparison AdaBoost EtaBoost

  39. GroupBoost: relax the over-learning of AdaBoost by group learning. Idea: in AdaBoost step 2(a) only the single best machine is selected, and the other good machines are cast off. Is there any wise way of grouping the G best machines?
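
The actual GroupBoost update appears only in the slide figures. The fragment below is a speculative Python illustration of one natural answer to the question above (average the G best machines in a round instead of keeping only the single best); it is not claimed to reproduce the algorithm on the slides, and all sizes and names are hypothetical.

import numpy as np

rng = np.random.default_rng(1)

# Synthetic pool of weak-machine outputs (n examples x M machines), labels, weights.
n, M, G = 100, 50, 5
H = np.sign(rng.normal(size=(n, M)))
y = np.sign(rng.normal(size=n))
w = np.full(n, 1.0 / n)

# One boosting round, "group" variant: instead of keeping only the single best
# machine (smallest weighted error), combine the outputs of the G best ones.
err = ((H != y[:, None]) * w[:, None]).sum(axis=0)
best_G = np.argsort(err)[:G]
f_group = np.sign(H[:, best_G].mean(axis=1))      # grouped machine (majority vote)
f_group[f_group == 0] = 1

eps = float(((f_group != y) * w).sum())
alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))
w = w * np.exp(-alpha * y * f_group)              # reweight with the grouped machine
w /= w.sum()
print("grouped weighted error:", eps, "coefficient:", alpha)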

  40. Grouping machines

  41. GroupBoost

  42. Grouping jumps for the next

  43. Learning architecture: grouping G machines.

  44. AdaBoost and GroupBoost: updating the weights.

  45. From microarrays. Contest program from bioinformatics (BIP2003): http://contest.genome.ad.jp/ Microarray data: number of genes p = 1,000-100,000; number of individuals n = 10-100.

  46. Output http://genome-www.stanford.edu/cellcycle/
