
A Statistical Mechanical Analysis of Online Learning:

This study analyzes the generalization performance of a student in an online learning model composed of a true teacher, ensemble teachers, and the student. By using statistical mechanics, the relationship between the number and diversity of ensemble teachers and the generalization error is discussed.

jhattie

Presentation Transcript


  1. A Statistical Mechanical Analysis of Online Learning: Seiji MIYOSHI Kobe City College of Technology miyoshi@kobe-kosen.ac.jp

  2. Background (1) • Batch learning: examples are used repeatedly; correct answers can be given for all examples; takes a long time; needs a large memory • Online learning: examples are discarded after being used once; cannot give correct answers for all examples; a large memory isn't necessary; can follow a time-variant teacher

  3. Jan. 2006 A Statistical Mechanical Analysis of Online Learning: Can Student be more Clever than Teacher? Seiji MIYOSHI Kobe City College of Technology miyoshi@kobe-kosen.ac.jp

  4. [Diagram labels: True Teacher A, Student, Moving Teacher]

  5. A Statistical Mechanical Analysis of Online Learning: Many Teachers or Few Teachers? Seiji MIYOSHI Kobe City College of Technology miyoshi@kobe-kosen.ac.jp

  6. [Diagram labels: True teacher, Ensemble teachers, Student]

  7. PURPOSE • To analyze the generalization performance of a model composed of a student, a true teacher, and K teachers (ensemble teachers) that exist around the true teacher • To discuss the relationship between the number and diversity of ensemble teachers and the generalization error

  8. MODEL (1/4) • J learns B1, B2, ... in turn. • J cannot learn A directly. • A, B1, B2, ..., J are linear perceptrons with noise. [Diagram labels: True teacher, Ensemble teachers, Student]

  9. Simple Perceptron [Diagram labels: output (+1 or -1), connection weights, inputs]

  10. Linear Perceptron and Simple Perceptron [Diagram labels: output, connection weights, inputs]

  11. MODEL (2/4) Linear Perceptrons with Noise

  12. MODEL (3/4) • Inputs: • Initial value of student: • True teacher: • Ensemble teachers: • N→∞ (Thermodynamic limit) • Order parameters • Length of student • Direction cosines

  13. [Diagram labels: True teacher, Ensemble teachers, Student]

  14. MODEL (4/4) • Student learns K ensemble teachers in turn. • Squared errors • Gradient method
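The learning procedure on slides 8 to 14 can be illustrated with a small finite-size simulation. This is only a minimal sketch, not the author's code: the slide formulas did not survive extraction, so the input scaling (components drawn from N(0, 1/N)), the way the ensemble teachers are built around the true teacher (independent perturbations, which gives q close to R_B squared), and the function name `simulate_online_learning` are all assumptions made here for illustration.

```python
import numpy as np


def simulate_online_learning(N=500, K=3, eta=0.3, t_max=10.0, R_B=0.7,
                             sigma_A2=0.0, sigma_B2=0.1, sigma_J2=0.2, seed=0):
    """Student J learns K noisy linear teachers in turn (hypothetical sketch).

    Returns the generalization error measured once per N updates
    (i.e. at time t = 0, 1, 2, ... in units of m / N).
    """
    rng = np.random.default_rng(seed)
    A = rng.standard_normal(N)  # true teacher, |A| ~ sqrt(N)
    # ASSUMPTION: each ensemble teacher is the true teacher plus an
    # independent perturbation, so B_k.A/N ~ R_B and B_k.B_k'/N ~ R_B**2.
    B = R_B * A + np.sqrt(1.0 - R_B**2) * rng.standard_normal((K, N))
    J = rng.standard_normal(N)  # initial student

    def gen_error(J):
        # Mean of 0.5 * (true teacher output - student output)**2 over
        # new inputs and output noises, for inputs with variance 1/N.
        return 0.5 * ((A @ A - 2.0 * (A @ J) + J @ J) / N + sigma_A2 + sigma_J2)

    history = [gen_error(J)]
    for m in range(int(t_max * N)):
        x = rng.standard_normal(N) / np.sqrt(N)  # fresh input, used once
        k = m % K                                # teachers are used in turn
        v = B[k] @ x + np.sqrt(sigma_B2) * rng.standard_normal()  # teacher output
        u = J @ x + np.sqrt(sigma_J2) * rng.standard_normal()     # student output
        J = J + eta * (v - u) * x  # gradient step on 0.5 * (v - u)**2
        if (m + 1) % N == 0:
            history.append(gen_error(J))
    return history


if __name__ == "__main__":
    for t, e_g in enumerate(simulate_online_learning()):
        print(t, round(float(e_g), 3))
```

The default parameter values mirror those quoted on slide 19. The student never sees the true teacher's output; `sigma_A2` enters only through the generalization error.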

  15. GENERALIZATION ERROR • A goal of statistical learning theory is to obtain the generalization error theoretically. • Generalization error = mean of the error over the distribution of new inputs
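As a worked example of this definition, the mean can be carried out explicitly under assumptions that are not recoverable from the transcript and are chosen here only for illustration: inputs x with independent components of variance 1/N, a true teacher with |A| = sqrt(N), a student with |J| = l sqrt(N), outputs y = A·x + n_A and s = J·x + n_J with independent Gaussian noises of variances sigma_A^2 and sigma_J^2, and the squared error epsilon = (y - s)^2 / 2. R_J denotes the direction cosine between A and J.

```latex
\begin{align}
  e_g &= \left\langle \tfrac{1}{2}\,(y - s)^2 \right\rangle_{x,\,n_A,\,n_J} \\
      &= \tfrac{1}{2}\left( \frac{\|A\|^2}{N}
           - \frac{2\,A\cdot J}{N}
           + \frac{\|J\|^2}{N}
           + \sigma_A^2 + \sigma_J^2 \right) \\
      &= \tfrac{1}{2}\left( 1 - 2 R_J\, l + l^2 + \sigma_A^2 + \sigma_J^2 \right).
\end{align}
```

Under these assumptions the generalization error depends on the weights only through the two order parameters l and R_J, which is why the later slides track exactly these quantities.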

  16. Simultaneous differential equations in deterministic forms, which describe dynamical behaviors of order parameters

  17. Analytical solutions of order parameters

  18. GENERALIZATION ERROR • A goal of statistical learning theory is to obtain the generalization error theoretically. • Generalization error = mean of the error over the distribution of new inputs

  19. Dynamical behaviors of generalization error, R_J and l (η=0.3, K=3, R_B=0.7, σ_A²=0.0, σ_B²=0.1, σ_J²=0.2) [Plot labels: Student J, Ensemble teachers]

  20. Analytical solutions of order parameters

  21. Steady state analysis (t → ∞) ・If 0<η<2, the order parameters converge. ・If η<0 or η>2, the generalization error and the length of the student diverge. ・If η<1, the more teachers exist or the richer the diversity of the teachers is, the cleverer the student can become. ・If η>1, the fewer teachers exist or the poorer the diversity of the teachers is, the cleverer the student can become.
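The steady-state statements on this slide can be checked numerically. The slide's own formulas were lost in extraction, so the expressions below are not taken from the presentation: they are a sketch obtained by averaging the gradient update over inputs and noise under assumed conventions (inputs of variance 1/N per component, unit-normalized teachers, every pair of ensemble teachers sharing the same direction cosine q, every teacher having direction cosine R_B with the true teacher). The function name `steady_state` is hypothetical.

```python
from math import sqrt


def steady_state(eta, K, q, R_B, sigma_A2=0.0, sigma_B2=0.1, sigma_J2=0.2):
    """Return (e_g, R_J, l) as t -> infinity under the assumed model."""
    if not 0.0 < eta < 2.0:
        raise ValueError("the order parameters only converge for 0 < eta < 2")
    # Average overlap of the student with one teacher at the fixed point:
    # the student sees that teacher itself 1/K of the time, others otherwise.
    p = (1.0 + (K - 1) * q) / K
    # Fixed point of the squared student length l**2.
    l2 = (2.0 * (1.0 - eta) * p + eta * (1.0 + sigma_B2 + sigma_J2)) / (2.0 - eta)
    # Fixed point of the overlap R_J * l with the true teacher.
    r = R_B
    e_g = 0.5 * (1.0 - 2.0 * r + l2 + sigma_A2 + sigma_J2)
    return e_g, r / sqrt(l2), sqrt(l2)


if __name__ == "__main__":
    for eta in (0.3, 1.5):
        for K in (1, 3, 10, 100):
            e_g, R_J, l = steady_state(eta, K, q=0.49, R_B=0.7)
            print(f"eta={eta} K={K} e_g={e_g:.4f} R_J={R_J:.4f} l={l:.4f}")
```

In this sketch K and q enter only through p, and p is multiplied by (1 - η), so the sign change at η = 1 is what reverses the effect of adding teachers or increasing their diversity.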

  22. Steady value of generalization error, R_J and l (K=3, R_B=0.7, σ_A²=0.0, σ_B²=0.1, σ_J²=0.2) [Plot label: J]

  23. Steady value of generalization error, R_J and l (q=0.49, R_B=0.7, σ_A²=0.0, σ_B²=0.1, σ_J²=0.2) [Plot label: J]

  24. CONCLUSIONS We have analyzed the generalization performance of a student in a model composed of linear perceptrons: a true teacher, K teachers, and the student. Calculating the generalization error of the student analytically using statistical mechanics in the framework of on-line learning, we have proven that when the learning rate satisfies η<1, the larger the number K is and the more diversity the teachers have, the smaller the generalization error is. On the other hand, when η>1, the properties are completely reversed. If the diversity of the K teachers is rich enough, the direction cosine between the true teacher and the student becomes unity in the limit of η→0 and K→∞.
