Section 2: On-line Learning
Based on slides from Michael Biehl's summer course.
Mini-course on ANN and BN, The Multidisciplinary Brain Research Center, Bar-Ilan University, May 2004
Section 2.1: The Perceptron
The Perceptron
[Diagram: input ξ feeding through adaptive weights J to the output S]
The Perceptron: binary output
S(ξ) = sign(J·ξ) = ±1
Implements a linearly separable classification of the inputs.
Milestones:
• Perceptron convergence theorem: Rosenblatt (1958)
• Capacity: Winder (1963), Cover (1965)
• Statistical physics of perceptron weights: Gardner (1988)
How does this device learn?
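A minimal sketch of this device (in Python, as a stand-in for the course's Matlab examples; the function and variable names are ours):

```python
import numpy as np

def perceptron_output(J, xi):
    """Binary perceptron output S = sign(J . xi), in {-1, +1}."""
    # map the measure-zero tie J.xi == 0 to +1 for definiteness
    return 1 if np.dot(J, xi) >= 0 else -1
```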
Learning a linearly separable rule from reliable examples
• Unknown rule: S_T(ξ) = sign(B·ξ) = ±1 defines the correct classification. It is parameterized through a teacher perceptron with weights B ∈ R^N (B·B = 1).
• Only available information: the example data
D = { ξ^μ, S_T(ξ^μ) = sign(B·ξ^μ) } for μ = 1 … P
Learning a linearly separable rule… (Cont.)
• Training: finding the student weights J
• J parameterizes a hypothesis S_S(ξ) = sign(J·ξ)
• Supervised learning is based on the student's performance with respect to the training data D
• Binary error measure:
ε_T^μ(J) = 0 if S_S(ξ^μ) = S_T(ξ^μ)
ε_T^μ(J) = 1 if S_S(ξ^μ) ≠ S_T(ξ^μ)
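In code, the error measure is a straightforward comparison (a sketch with hypothetical names):

```python
import numpy as np

def binary_error(J, xi, s_teacher):
    """0 if the student agrees with the teacher label on input xi, else 1."""
    s_student = 1 if np.dot(J, xi) >= 0 else -1
    return 0 if s_student == s_teacher else 1
```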
Off-line learning
• Guided by the minimization of a cost function H(J), e.g., the training error
H(J) = Σ_{μ=1}^{P} ε_T^μ(J)
Equilibrium statistical mechanics treatment:
• Energy H of the N degrees of freedom J
• An ensemble of systems is in thermal equilibrium at a formal temperature
• Disorder average over the random examples (replicas); assumes a distribution over the inputs
• Macroscopic description in terms of order parameters
• Typical properties of large systems with P = αN examples
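The training error is easy to state in code (a sketch; the vectorized form assumes the inputs are stacked as rows):

```python
import numpy as np

def training_error(J, xis, labels):
    """H(J): number of misclassified training examples; xis has one input per row."""
    s_student = np.where(xis @ J >= 0, 1, -1)
    return int(np.sum(s_student != labels))
```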
On-line training
• Single presentation of an uncorrelated (new) example { ξ^μ, S_T(ξ^μ) }
• Update of the student weights: J^μ = J^{μ−1} + ΔJ^μ
• Learning dynamics in discrete time μ = 1, 2, …
On-line training: statistical physics approach
• Consider a sequence of independent, random inputs ξ^μ
• Thermodynamic limit N → ∞
• Disorder average over the latest example; self-averaging properties
• Continuous time limit α = μ/N
Generalization
Performance of the student (after training) with respect to an arbitrary, new input ξ:
• In practice: empirical mean of the error measure over a set of test inputs
• In the theoretical analysis: average over the (assumed) probability density of inputs
Generalization error: ε_g(J) = ⟨ε(J)⟩_ξ = Prob[ S_S(ξ) ≠ S_T(ξ) ]
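A minimal estimator matching the "in practice" definition (a sketch; binary iid test inputs are our assumption):

```python
import numpy as np

def empirical_eps_g(J, B, n_test=10_000, rng=None):
    """Fraction of random test inputs on which student and teacher disagree."""
    rng = rng or np.random.default_rng(0)
    xi = rng.choice([-1.0, 1.0], size=(n_test, B.size))
    return float(np.mean(np.sign(xi @ J) != np.sign(xi @ B)))
```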
Generalization (Cont.)
The simplest model distribution: isotropic density P(ξ), uncorrelated with B and J.
Consider vectors ξ of independent, identically distributed (iid) components ξ_j with
⟨ξ_j⟩ = 0 and ⟨ξ_j²⟩ = 1
Geometric argument
[Diagram: teacher vector B and student vector J spanning an angle θ; the projection of the data into the (B, J)-plane yields an isotropic density of inputs, and S_T(ξ) ≠ S_S(ξ) exactly in the wedge of angle θ between the two decision boundaries]
ε_g = θ/π, with θ the angle between B and J (for |B| = 1)
Overlap Parameters
Sufficient to quantify the success of learning:
R = B·J and Q = J·J
• Random guessing: R = 0, ε_g = 1/2
• Perfect generalization: J ∝ B (R = √Q), ε_g = 0
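Combining the geometric argument with these definitions: cos θ = B·J/(|B||J|) = R/√Q, so ε_g = arccos(R/√Q)/π. A sketch:

```python
import numpy as np

def generalization_error(R, Q):
    """eps_g = theta/pi, with cos(theta) = R/sqrt(Q) (teacher normalized, |B| = 1)."""
    rho = np.clip(R / np.sqrt(Q), -1.0, 1.0)   # guard against rounding noise
    return float(np.arccos(rho) / np.pi)

assert abs(generalization_error(0.0, 1.0) - 0.5) < 1e-12   # random guessing
assert generalization_error(2.0, 4.0) < 1e-12              # J parallel to B
```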
Derivation for large N
Given B, J, and an uncorrelated random input ξ with
⟨ξ_i⟩ = 0, ⟨ξ_i ξ_j⟩ = δ_ij,
consider the student/teacher fields, which are sums of (many) independent random quantities:
x = J·ξ = Σ_i J_i ξ_i
y = B·ξ = Σ_i B_i ξ_i
Central Limit Theorem
For N → ∞, the joint density of (x, y) is a two-dimensional Gaussian, fully specified by the first and second moments:
⟨x⟩ = Σ_i J_i ⟨ξ_i⟩ = 0, ⟨y⟩ = Σ_i B_i ⟨ξ_i⟩ = 0
⟨x²⟩ = Σ_{i,j} J_i J_j ⟨ξ_i ξ_j⟩ = Σ_i J_i² = Q
⟨y²⟩ = Σ_{i,j} B_i B_j ⟨ξ_i ξ_j⟩ = Σ_i B_i² = 1
⟨xy⟩ = Σ_{i,j} J_i B_j ⟨ξ_i ξ_j⟩ = Σ_i J_i B_i = R
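A quick Monte Carlo check of these moments (a sketch; sizes are chosen small enough to run in memory):

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 500, 20_000
B = rng.normal(size=N); B /= np.linalg.norm(B)   # teacher, B.B = 1
J = rng.normal(size=N)                           # arbitrary student
Q, R = float(J @ J), float(B @ J)

xi = rng.choice([-1.0, 1.0], size=(M, N))        # binary iid inputs
x, y = xi @ J, xi @ B                            # student and teacher fields
print(x.mean(), y.mean())                        # both ~ 0
print(x.var(), "vs Q =", Q)                      # ~ Q
print(y.var())                                   # ~ 1
print(np.mean(x * y), "vs R =", R)               # ~ R
```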
Central Limit Theorem (Cont.)
The details of the input distribution are irrelevant. Some possible examples:
• binary, ξ_i = ±1 with equal probability
• uniform
• Gaussian
Generalization Error
The isotropic distribution is also assumed to describe the statistics of the example-data inputs.
Exercise: derive the generalization error as a function of R and Q. Use the mathematical notes.
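A sketch of the standard route (our notation; it reproduces the geometric result above by integrating the joint Gaussian of (x, y) over the region where the signs disagree):

```latex
\epsilon_g
 = \big\langle \Theta(-xy) \big\rangle
 = 2\int_{-\infty}^{0}\!dx \int_{0}^{\infty}\!dy\; P(x,y)
 = \frac{1}{\pi}\arccos\!\left(\frac{R}{\sqrt{Q}}\right),
\qquad
P(x,y) = \frac{1}{2\pi\sqrt{\det C}}
         \exp\!\Big[-\tfrac12\,(x,y)\,C^{-1}(x,y)^{\top}\Big],
\qquad
C = \begin{pmatrix} Q & R \\ R & 1 \end{pmatrix}.
```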
Assumptions about the data
• No spatial correlations
• No distinguished directions in input space
• No temporal correlations
• No correlations with the rule
• Single presentation, without repetitions
Consequences:
• The average over the data can be performed step by step
• The actual choice of B is irrelevant; it is not necessary to average over the teacher
Hebbian learning (revisited) [Hebb 1949]
• Off-line interpretation [Vallet 1989]: choice of the student weights given D = {ξ^μ, S_T^μ}, μ = 1 … P:
J(P) = (1/N) Σ_{μ=1}^{P} ξ^μ S_T^μ
• Equivalent on-line interpretation: dynamics upon single presentation of examples:
J(μ) = J(μ−1) + (1/N) ξ^μ S_T^μ
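A direct implementation of the on-line form (a sketch; binary iid inputs and a normalized random teacher are our assumptions):

```python
import numpy as np

def hebb_online(B, P, rng):
    """Single pass of on-line Hebbian learning over P fresh random examples."""
    N = B.size
    J = np.zeros(N)                             # tabula rasa
    for _ in range(P):
        xi = rng.choice([-1.0, 1.0], size=N)    # new, uncorrelated input
        s_t = 1.0 if B @ xi >= 0 else -1.0      # teacher label sign(B.xi)
        J += s_t * xi / N                       # Hebb update
    return J
```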
Hebb: on-line
From microscopic to macroscopic: recursions for the overlaps R and Q.
Exercise: derive the update equations for R and Q.
Hebb: on-line (Cont.)
Average over the latest example:
• The random input ξ^μ enters only through the fields x and y
• The random input and J(μ−1), B are statistically independent
• The Central Limit Theorem applies and yields the joint density of (x, y)
Hebb: on-line (Cont.)
Exercise: derive the update equations of R and Q as functions of α. Use the mathematical notes.
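For reference, a sketch of where the exercise leads (our derivation, following the CLT moments above): inserting the Hebb update into the definitions of R and Q gives recursions whose averages close in terms of (x, y):

```latex
R^{\mu} = R^{\mu-1} + \tfrac{1}{N}\, y^{\mu}\,\operatorname{sign}(y^{\mu}),
\qquad
Q^{\mu} = Q^{\mu-1} + \tfrac{2}{N}\, x^{\mu}\,\operatorname{sign}(y^{\mu})
        + \tfrac{1}{N^{2}}\,\xi^{\mu}\!\cdot\xi^{\mu} .
% Averaging with the joint Gaussian density of (x, y):
\left\langle |y| \right\rangle = \sqrt{\tfrac{2}{\pi}},\quad
\left\langle x \operatorname{sign}(y)\right\rangle = \sqrt{\tfrac{2}{\pi}}\,R,\quad
\xi\cdot\xi \approx N
\;\Longrightarrow\;
\frac{dR}{d\alpha} = \sqrt{\tfrac{2}{\pi}},
\qquad
\frac{dQ}{d\alpha} = 2\sqrt{\tfrac{2}{\pi}}\,R + 1 .
```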
Hebb: on-line (Cont.)
Continuous time limit: N → ∞, α = μ/N, dα = 1/N.
Initial conditions (tabula rasa): R(0) = Q(0) = 0.
What are the mean values of R and Q after training with αN examples? [See the Matlab code]
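A Python stand-in for the referenced Matlab code, assuming the ODEs sketched above; integrating them from the tabula-rasa initial conditions gives closed forms:

```python
import numpy as np

def hebb_theory(alpha):
    """Mean order parameters of on-line Hebb after alpha*N examples
    (solution of dR/da = sqrt(2/pi), dQ/da = 2*sqrt(2/pi)*R + 1, R(0)=Q(0)=0)."""
    a = np.asarray(alpha, dtype=float)
    R = np.sqrt(2.0 / np.pi) * a
    Q = (2.0 / np.pi) * a**2 + a
    return R, Q

R, Q = hebb_theory(5.0)
print(R, Q, np.arccos(R / np.sqrt(Q)) / np.pi)   # eps_g at alpha = 5
```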
Hebb: on-line mean values
Self-averaging properties of an observable A(J):
• The width of its distribution vanishes as N → ∞
• Observing a value of A different from its mean occurs with vanishing probability
The order parameters Q and R are self-averaging for infinite N.
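Self-averaging can be checked numerically (a sketch; it uses the equivalent off-line form of the Hebb student): at fixed α, the run-to-run spread of R shrinks as N grows.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 2.0
for N in (100, 400, 1600):
    P = int(alpha * N)
    Rs = []
    for _ in range(50):
        B = rng.normal(size=N); B /= np.linalg.norm(B)
        xi = rng.choice([-1.0, 1.0], size=(P, N))
        J = (np.sign(xi @ B) @ xi) / N        # off-line Hebb weights
        Rs.append(float(B @ J))
    print(N, np.mean(Rs), np.std(Rs))         # std decreases with N
```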
Learning Curve: dependence on the order parameters
The normalized overlap R/√Q between B and J determines the angle between the two vectors, θ = arccos(R/√Q), and hence the learning curve ε_g(α) = θ(α)/π.
Learning Curve: dependence on the order parameters (Cont.)
[Figure: ε_g as a function of α = P/N for on-line Hebbian learning]
Asymptotic expansion
For α → ∞ the Hebbian learning curve decays as ε_g ∝ α^(−1/2). [draw with Matlab; a Python sketch follows]
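A plotting stand-in for the Matlab figure (our sketch, using the closed forms derived above; the asymptote (2πα)^(−1/2) follows from expanding arccos near 1):

```python
import numpy as np
import matplotlib.pyplot as plt

alpha = np.logspace(-1, 2, 400)                      # alpha = P/N
rho = 1.0 / np.sqrt(1.0 + np.pi / (2.0 * alpha))     # R/sqrt(Q) for on-line Hebb
eps_g = np.arccos(rho) / np.pi
plt.loglog(alpha, eps_g, label="Hebb learning curve")
plt.loglog(alpha, 1.0 / np.sqrt(2.0 * np.pi * alpha), "--",
           label=r"asymptote $(2\pi\alpha)^{-1/2}$")
plt.xlabel(r"$\alpha = P/N$")
plt.ylabel(r"$\epsilon_g$")
plt.legend()
plt.show()
```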
Modified Hebbian learning
The training algorithm is defined by a modulation function f:
J(μ) = J(μ−1) + (1/N) f(…) ξ^μ S_T^μ
Restriction: f may depend only on the available quantities, f(J(μ−1), ξ^μ, S_T^μ).
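A generic trainer parameterized by the modulation function (a sketch; the two example modulations are plain Hebb, f = 1, and the Rosenblatt perceptron rule, which learns only on mistakes):

```python
import numpy as np

def train_online(B, P, f, rng):
    """On-line rule J <- J + (1/N) * f(J, xi, s_t) * s_t * xi over P examples."""
    N = B.size
    J = np.zeros(N)
    for _ in range(P):
        xi = rng.choice([-1.0, 1.0], size=N)
        s_t = 1.0 if B @ xi >= 0 else -1.0
        J += f(J, xi, s_t) * s_t * xi / N
    return J

hebb       = lambda J, xi, s_t: 1.0                                  # plain Hebb
perceptron = lambda J, xi, s_t: 1.0 if (J @ xi) * s_t <= 0 else 0.0  # mistakes only
```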
Questions:
• Does the perceptron algorithm [Rosenblatt 1959], which learns only when there is a mistake, perform better than the Hebb algorithm?
• Which training algorithm provides the best learning, i.e., the fastest asymptotic decrease of ε_g?
• Is it possible to achieve the optimal asymptotic behavior on-line?