This study analyzes the generalization performance of a student in an online learning model composed of a true teacher, ensemble teachers, and the student. By using statistical mechanics, the relationship between the number and diversity of ensemble teachers and the generalization error is discussed.
A Statistical Mechanical Analysis of Online Learning: Seiji MIYOSHI Kobe City College of Technology miyoshi@kobe-kosen.ac.jp
Background (1)
• Batch Learning
  • Examples are used repeatedly
  • Correct answers for all examples
  • Long time
  • Large memory
• Online Learning
  • Examples used once are discarded
  • Cannot give correct answers for all examples
  • Large memory isn't necessary
  • Time variant teacher
A Statistical Mechanical Analysis of Online Learning: Can Student be more Clever than Teacher? Seiji MIYOSHI Kobe City College of Technology miyoshi@kobe-kosen.ac.jp Jan. 2006
[Figure: true teacher A, moving teacher, and student]
A Statistical Mechanical Analysis of Online Learning: Many Teachers or Few Teachers? Seiji MIYOSHI Kobe City College of Technology miyoshi@kobe-kosen.ac.jp
[Figure: true teacher, ensemble teachers, and student]
P U R P O S E
• To analyze generalization performance of a model composed of a student, a true teacher and K teachers (ensemble teachers) who exist around the true teacher
• To discuss the relationship between the number, the diversity of ensemble teachers and the generalization error
M O D E L (1/4)
[Figure: true teacher A, ensemble teachers B1, B2, ..., and student J]
• J learns B1, B2, ... in turn.
• J cannot learn A directly.
• A, B1, B2, ..., J are linear perceptrons with noises.
[Figure: simple perceptron with inputs, connection weights, and an output of +1 or -1]
[Figure: linear perceptron shown next to the simple perceptron, each with inputs, connection weights, and an output]
M O D E L (2/4) Linear Perceptrons with Noises
M O D E L (3/4)
• Inputs:
• Initial value of student:
• True teacher:
• Ensemble teachers:
• N→∞ (Thermodynamic limit)
• Order parameters
  • Length of student
  • Direction cosines
M O D E L (4/4)
• Student learns K ensemble teachers in turn.
• Squared errors
• Gradient method (update term f_k^m)
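The learning rule on this slide can be tried numerically. The following is a minimal sketch, not the author's code: because the slide formulas did not survive extraction, it assumes input components drawn from N(0, 1/N), teacher and initial student components drawn from N(0, 1), additive Gaussian output noise, and ensemble teachers built as B_k = R_B A + sqrt(1 - R_B^2) * (independent noise). The name `simulate` and all of its parameters are hypothetical.

```python
import math
import random


def simulate(eta, K=3, N=100, t_max=20.0, R_B=0.7,
             var_A=0.0, var_B=0.1, var_J=0.2, seed=0):
    """Online gradient learning of a noisy linear student from K teachers.

    Returns (eps_g, R_J, l) after t_max * N examples. All distributional
    choices below are assumptions of this sketch, not taken from the slides.
    """
    rng = random.Random(seed)
    gauss = rng.gauss

    # True teacher A and initial student J: components ~ N(0, 1) (assumed).
    A = [gauss(0.0, 1.0) for _ in range(N)]
    J = [gauss(0.0, 1.0) for _ in range(N)]

    # Ensemble teachers scattered around A (assumed construction):
    # cos(A, B_k) is close to R_B and cos(B_k, B_k') is close to R_B**2.
    c = math.sqrt(1.0 - R_B ** 2)
    B = [[R_B * a + c * gauss(0.0, 1.0) for a in A] for _ in range(K)]

    sd_x = 1.0 / math.sqrt(N)
    sd_B = math.sqrt(var_B)
    sd_J = math.sqrt(var_J)
    for m in range(int(t_max * N)):
        x = [gauss(0.0, sd_x) for _ in range(N)]   # each example is used once
        Bk = B[m % K]                              # teachers are used in turn
        v = sum(b * xi for b, xi in zip(Bk, x)) + gauss(0.0, sd_B)
        u = sum(j * xi for j, xi in zip(J, x)) + gauss(0.0, sd_J)
        # Gradient step on the squared error 0.5 * (v - u)**2 with respect to J.
        step = eta * (v - u)
        J = [j + step * xi for j, xi in zip(J, x)]

    norm_A = math.sqrt(sum(a * a for a in A))
    norm_J = math.sqrt(sum(j * j for j in J))
    R_J = sum(a * j for a, j in zip(A, J)) / (norm_A * norm_J)
    l = norm_J / math.sqrt(N)
    # Mean of 0.5 * (y - u)**2 over new Gaussian inputs, in closed form.
    eps_g = 0.5 * (sum((a - j) ** 2 for a, j in zip(A, J)) / N + var_A + var_J)
    return eps_g, R_J, l
```

Calling `simulate(0.3)` uses the parameter values quoted for the dynamics plot (K=3, R_B=0.7 and the three noise variances), while `simulate(2.5)` lets the student length grow without bound. With finite N the results fluctuate from seed to seed, so they only approximate the N→∞ theory.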
GENERALIZATION ERROR
• A goal of statistical learning theory is to obtain generalization error theoretically.
• Generalization error = mean of errors over the distribution of new input
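As a worked sketch of this definition, the mean over new inputs can be evaluated in closed form for linear perceptrons. The slide's own formula did not survive extraction, so the conventions here are assumptions: the error is half the squared difference between the true teacher output y = A·x + n_A and the student output u = J·x + n_J, the input components are independent Gaussians with mean 0 and variance 1/N, the noises are independent with variances σ_A^2 and σ_J^2, and the true teacher has squared norm N.

```latex
\epsilon_g
  = \left\langle \tfrac{1}{2}\,(y - u)^2 \right\rangle_{x,\,n_A,\,n_J}
  = \tfrac{1}{2}\left( \frac{\lVert A - J \rVert^2}{N} + \sigma_A^2 + \sigma_J^2 \right)
  = \tfrac{1}{2}\left( 1 - 2 R_J\, l + l^2 + \sigma_A^2 + \sigma_J^2 \right),
\qquad
l = \frac{\lVert J \rVert}{\sqrt{N}}, \quad
R_J = \frac{A \cdot J}{\lVert A \rVert\, \lVert J \rVert}.
```

Under these assumptions the generalization error depends on the student only through the two order parameters, its length l and its direction cosine R_J with the true teacher.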
Simultaneous differential equations in deterministic forms, which describe dynamical behaviors of order parameters
Dynamical behaviors of generalization error, R_J and l (η=0.3, K=3, R_B=0.7, σ_A^2=0.0, σ_B^2=0.1, σ_J^2=0.2)
[Figure: curves for student J and for the ensemble teachers]
Steady state analysis (t → ∞)
・If η<0 or η>2: Generalization error and length of student diverge.
・If 0<η<2:
  ・If η<1, the more teachers exist or the richer the diversity of teachers is, the cleverer the student can become.
  ・If η>1, the fewer teachers exist or the poorer the diversity of teachers is, the cleverer the student can become.
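One way to see where the thresholds η=1 and η=2 come from is the following hedged reconstruction. It is not copied from the slides. It assumes the gradient update J ← J + η(v_k − u)x with inputs of variance 1/N per component, time t = m/N, teacher norms equal to sqrt(N), R_B as the direction cosine between the true teacher and each ensemble teacher, and q as the direction cosine between any two ensemble teachers. The symbol \bar r, the average of B_k·J/N over the K teachers, is introduced here only for illustration.

```latex
\frac{d\bar r}{dt} = \eta\left( q + \frac{1-q}{K} - \bar r \right),
\qquad
\frac{d(R_J l)}{dt} = \eta\left( R_B - R_J l \right),
\\[4pt]
\frac{dl^2}{dt} = \eta(\eta - 2)\, l^2 + 2\eta(1-\eta)\,\bar r
                + \eta^2\left( 1 + \sigma_B^2 + \sigma_J^2 \right),
\\[4pt]
t \to \infty:\quad
R_J l = R_B,
\qquad
l^2 = \frac{2(1-\eta)\left( q + \dfrac{1-q}{K} \right)
          + \eta\left( 1 + \sigma_B^2 + \sigma_J^2 \right)}{2 - \eta}.
```

In this sketch the coefficient η(η−2) of l^2 is negative only for 0<η<2, which is the condition for a finite steady state. The factor (1−η) multiplying q + (1−q)/K changes sign at η=1. That term shrinks as K grows or as q falls (richer diversity), so l^2, and with it the generalization error, decreases with more numerous or more diverse teachers when η<1 and increases when η>1.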
Steady value of generalization error, R_J and l (K=3, R_B=0.7, σ_A^2=0.0, σ_B^2=0.1, σ_J^2=0.2)
Steady value of generalization error, R_J and l (q=0.49, R_B=0.7, σ_A^2=0.0, σ_B^2=0.1, σ_J^2=0.2)
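Steady values like those plotted can be explored with a small calculator. This is a sketch under stated assumptions rather than the author's formulas: it uses a closed form obtained for the plain gradient update with inputs of variance 1/N, namely l^2 = [2(1−η)(q + (1−q)/K) + η(1 + σ_B^2 + σ_J^2)] / (2−η), R_J l = R_B, and ε_g = (1 − 2 R_J l + l^2 + σ_A^2 + σ_J^2)/2, with q taken to be the direction cosine between any two ensemble teachers. The function name `steady_state` is hypothetical.

```python
def steady_state(eta, K, q, R_B=0.7, var_A=0.0, var_B=0.1, var_J=0.2):
    """Steady-state (eps_g, R_J, l) under the assumptions stated in the text.

    This closed form is a reconstruction for illustration only; it is not
    taken from the slides.
    """
    if not 0.0 < eta < 2.0:
        raise ValueError("a finite steady state requires 0 < eta < 2")
    # Mean overlap of two teachers drawn with replacement from the ensemble.
    overlap = q + (1.0 - q) / K
    l2 = (2.0 * (1.0 - eta) * overlap
          + eta * (1.0 + var_B + var_J)) / (2.0 - eta)
    l = l2 ** 0.5
    R_J = R_B / l                      # from the steady condition R_J * l = R_B
    eps_g = 0.5 * (1.0 - 2.0 * R_B + l2 + var_A + var_J)
    return eps_g, R_J, l


if __name__ == "__main__":
    # Sweep the number of teachers on either side of eta = 1.
    for eta in (0.3, 1.5):
        for K in (1, 3, 10):
            eps_g, R_J, l = steady_state(eta, K, q=0.49)
            print(f"eta={eta} K={K} eps_g={eps_g:.4f} R_J={R_J:.4f} l={l:.4f}")
```

Within this sketch, increasing K lowers the generalization error at η=0.3 and raises it at η=1.5. Taking η close to 0 and K very large with q = R_B^2 = 0.49 drives R_J toward 1.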
CONCLUSIONS
We have analyzed the generalization performance of a student in a model composed of linear perceptrons: a true teacher, K ensemble teachers, and the student. Calculating the generalization error of the student analytically using statistical mechanics in the framework of online learning, we have proven that when the learning rate satisfies η<1, the larger the number K is and the more diversity the teachers have, the smaller the generalization error is. On the other hand, when η>1, these properties are completely reversed. If the diversity of the K teachers is rich enough, the direction cosine between the true teacher and the student becomes unity in the limit of η→0 and K→∞.