Chapter 3: Mapping Networks. Dr. Chuan-Yu Chang (張傳育), Graduate Institute of Computer Science and Information Engineering, National Yunlin University of Science and Technology. Office: ES 709. TEL: 05-5342601 ext. 4337. E-mail: chuanyu@yuntech.edu.tw
Introduction • Mapping Networks • Associative memory networks • Feed-forward multilayer perceptrons • Counter-propagation networks • Radial Basis Function networks • [Figure: an ANN acting as a mapping from an input x to an output y]
Associative Memory Networks • Memory capability: for an information system, it is important both to remember information and to reduce the amount of information that must be stored. • Stored information must be placed appropriately in the network's memory: given a key input (stimulus), the memorized pattern is retrieved from the associative memory and presented at the output in a suitable form.
Associative Memory Networks (cont.) • In neurobiological systems, the concept of memory is tied to the neural changes caused by the interaction between an organism and its environment. If no such change occurs, no memory exists. • For memory to be useful, it must be accessible to the nervous system, so that learning can take place and information can be retrieved. • A pattern can be stored in memory through a learning process.
Associative Memory Networks (cont.) • Depending on how long information is retained, there are two forms of memory: • Long-term memory • Short-term memory • There are two basic types of associative memory: • Autoassociative memory • A key input vector is associated with itself. • The input and output space dimensions are the same. • Heteroassociative memory • Key input vectors are associated with arbitrary memorized vectors. • The output space dimension may be different from the input space dimension.
General Linear Distributed Associative Memory • During learning, the network is presented with a key input pattern (vector), and the memory transforms this vector into a stored (memorized) pattern.
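General Linear Distributed Associative Memory (cont.) • Input (key input pattern): $\mathbf{x}_k = [x_{k1}, x_{k2}, \ldots, x_{kn}]^T$ (3.1) • Output (memorized pattern): $\mathbf{y}_k = [y_{k1}, y_{k2}, \ldots, y_{kn}]^T$ (3.2) • An $n$-dimensional neural structure can associate $h$ patterns, with $h \le n$; in practice $h < n$.
General Linear Distributed Associative Memory (cont.) • The linear mapping between the key vector $\mathbf{x}_k$ and the memorized vector $\mathbf{y}_k$ can be written as $\mathbf{y}_k = \mathbf{W}(k)\,\mathbf{x}_k$ (3.3), where $\mathbf{W}(k)$ is the weight matrix. • From these weight matrices, the memory matrix $\mathbf{M}$ can be constructed: $\mathbf{M} = \sum_{k=1}^{h} \mathbf{W}(k)$ (3.4). • $\mathbf{M}$ is the sum of the weight matrices over every input/output pair. It can also be written recursively as $\mathbf{M}_k = \mathbf{M}_{k-1} + \mathbf{W}(k)$, $k = 1, \ldots, h$ (3.5), so the memory matrix can be thought of as representing the collective experience.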
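Correlation Matrix Memory • Estimation of the memory matrix • Outer product rule: $\hat{\mathbf{M}} = \sum_{k=1}^{h} \mathbf{y}_k \mathbf{x}_k^T$ (3.6), or recursively $\hat{\mathbf{M}}_k = \hat{\mathbf{M}}_{k-1} + \mathbf{y}_k \mathbf{x}_k^T$ (3.7) • Each outer product matrix $\mathbf{y}_k \mathbf{x}_k^T$ is an estimate of the weight matrix $\mathbf{W}(k)$ that maps the key pattern $\mathbf{x}_k$ onto the memorized pattern $\mathbf{y}_k$. • An associative memory designed with the outer product rule is called a correlation matrix memory.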
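Correlation Matrix Memory (cont.) • When a key pattern is presented, the memory matrix should recall the corresponding memorized pattern. Suppose the estimated memory matrix has learned $h$ associations through (3.6). Presenting the $q$-th key input $\mathbf{x}_q$, the recalled pattern is $\hat{\mathbf{y}} = \hat{\mathbf{M}}\,\mathbf{x}_q$ (3.8). • Substituting (3.6) into (3.8) gives $\hat{\mathbf{y}} = \sum_{k=1}^{h} \mathbf{y}_k \mathbf{x}_k^T \mathbf{x}_q = \sum_{k=1}^{h} (\mathbf{x}_k^T \mathbf{x}_q)\,\mathbf{y}_k$ (3.9). • Assume each key input vector is normalized to unit length: $\mathbf{x}_k^T \mathbf{x}_k = 1$, $k = 1, \ldots, h$ (3.10).
Correlation Matrix Memory (cont.) • Unit length is obtained by dividing each key vector by its Euclidean norm: $\mathbf{x}_k \leftarrow \mathbf{x}_k / \lVert \mathbf{x}_k \rVert_2$ (3.11). • With unit-length keys, (3.9) can be rewritten as $\hat{\mathbf{y}} = \mathbf{y}_q + \mathbf{z}_q$ (3.12), where $\mathbf{y}_q$ is the desired response and $\mathbf{z}_q = \sum_{k=1,\,k \ne q}^{h} (\mathbf{x}_k^T \mathbf{x}_q)\,\mathbf{y}_k$ (3.13) is the noise, or crosstalk.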
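The split of a recalled pattern into a desired response plus crosstalk can be checked numerically. The following is a minimal sketch using only plain Python lists; the helper names (`outer`, `mat_vec`, `build_memory`) and the example keys and patterns are hypothetical and chosen only for illustration. It builds the memory matrix with the outer product rule, recalls a pattern from orthonormal keys, and then repeats the recall with unit-length keys that are not orthogonal.

```python
def outer(y, x):
    """Outer product y x^T as a list of rows."""
    return [[yi * xj for xj in x] for yi in y]

def mat_add(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_vec(m, x):
    """Matrix-vector product m x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def build_memory(keys, patterns):
    """Outer product rule: sum over k of y_k x_k^T."""
    n_out, n_in = len(patterns[0]), len(keys[0])
    m = [[0.0] * n_in for _ in range(n_out)]
    for x, y in zip(keys, patterns):
        m = mat_add(m, outer(y, x))
    return m

# Hypothetical orthonormal keys and arbitrary memorized patterns.
keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
patterns = [[2.0, -1.0, 0.5], [0.0, 3.0, 1.0]]
M = build_memory(keys, patterns)
recalled = mat_vec(M, keys[0])        # orthogonal keys: no crosstalk

# Unit-length keys that are NOT orthogonal produce crosstalk.
s = 2 ** -0.5
skewed_keys = [[1.0, 0.0, 0.0], [s, s, 0.0]]
M_skewed = build_memory(skewed_keys, patterns)
noisy = mat_vec(M_skewed, skewed_keys[0])
crosstalk = [a - b for a, b in zip(noisy, patterns[0])]
```

With the skewed keys, the crosstalk equals the inner product of the two keys times the other stored pattern, which is why it vanishes only when the keys are orthogonal.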
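Correlation Matrix Memory (cont.) • From (3.13), when the network is presented with an associated key pattern, ideally $\hat{\mathbf{y}} = \mathbf{y}_q$ and $\mathbf{z}_q = \mathbf{0}$. • $\hat{\mathbf{y}} = \mathbf{y}_q$ means that the input $\mathbf{x}_q$ recalls its associated output $\mathbf{y}_q$ exactly. • A nonzero $\mathbf{z}_q$ indicates noise or crosstalk. • If the key inputs are mutually orthogonal, $\mathbf{x}_k^T \mathbf{x}_q = 0$ for $k \ne q$ (3.14), the crosstalk is zero, • and the memorized pattern is recalled perfectly. • If the key inputs are linearly independent but not orthogonal, Gram-Schmidt orthogonalization can be applied before computing the memory matrix to produce orthogonal input key vectors.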
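As an illustration of the orthogonalization step, here is a small sketch of the modified Gram-Schmidt procedure in plain Python. The function names and the example key vectors are assumptions made for this sketch; the slides do not specify which variant of Gram-Schmidt is used.

```python
import math

def dot(a, b):
    """Inner product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def gram_schmidt(vectors):
    """Return orthonormal vectors spanning the same space as `vectors`.

    Uses the modified Gram-Schmidt variant: each projection is removed
    from the partially orthogonalized vector, not from the original one.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            proj = dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("input vectors are not linearly independent")
        basis.append([wi / norm for wi in w])
    return basis

# Hypothetical linearly independent, non-orthogonal key vectors.
raw_keys = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
ortho_keys = gram_schmidt(raw_keys)
```

Keys produced this way have unit length and zero mutual inner products, so a correlation matrix memory built from them recalls without crosstalk. Note that the memory then has to be queried with the orthogonalized keys, not the raw ones.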
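Correlation Matrix Memory (cont.) • The storage capacity of the associative memory is bounded by the rank of the memory matrix, which for an $n \times n$ matrix is at most $n$. • The storage limit therefore depends on the rank of the memory matrix.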
Correlation Matrix Memory (cont.) • Example 3.1 • Autoassociative memory • The memory is trained with three key vectors (3.15). • Each of these vectors has unit length.
Correlation Matrix Memory (cont.) • The angles between these vectors: • These three vectors are far from being mutually orthogonal. (3.16) (3.17) (3.18)
Correlation Matrix Memory (cont.) • The memory matrix of the autoassociative network • Using (3.19), the estimate of the input key patterns (3.19) (3.20)
Correlation Matrix Memory (cont.) • The Euclidean distance of the response vector from each of the key vectors. (3.21) (3.22) (3.23) (annotation: smaller error)
Correlation Matrix Memory (cont.) • Another key vector with unit length • The angles between these vectors (3.24) (3.25) (3.26) (3.27)
Correlation Matrix Memory (cont.) • The memory matrix is • The estimate of input key patterns (3.28) (3.29)
Correlation Matrix Memory (cont.) • The Euclidean distance of the response vector from each of the key vectors. (3.30) (3.31) (3.32) • These errors are lower than those in (3.21)-(3.23).
Correlation Matrix Memory (cont.) • An error correction approach for correlation matrix memories • The drawback of associative memory is the relatively large number of errors that can occur during recall. • The simplest correlation matrix memory has no provision for correcting errors • Lack of feedback from the output to input.
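Correlation Matrix Memory (cont.) • The objective of the error correction approach is to have the associative memory reconstruct the memorized pattern in an optimal sense. • The error vector is defined as $\mathbf{e}_k = \mathbf{y}_k - \hat{\mathbf{M}}(k)\,\mathbf{x}_k$ (3.33), where $\mathbf{y}_k$ is the desired pattern associated with the key input pattern $\mathbf{x}_k$. • The discrete-time learning rule based on steepest descent is $\hat{\mathbf{M}}(k+1) = \hat{\mathbf{M}}(k) - \mu \nabla_{\hat{\mathbf{M}}} E(\hat{\mathbf{M}})$ (3.34), where $\mu > 0$ is the learning rate.
Correlation Matrix Memory (cont.) • The energy function is defined as $E(\hat{\mathbf{M}}) = \tfrac{1}{2} \lVert \mathbf{e}_k \rVert_2^2 = \tfrac{1}{2} \lVert \mathbf{y}_k - \hat{\mathbf{M}}(k)\,\mathbf{x}_k \rVert_2^2$ (3.35). • Computing the gradient of (3.35): $\nabla_{\hat{\mathbf{M}}} E = -[\mathbf{y}_k - \hat{\mathbf{M}}(k)\,\mathbf{x}_k]\,\mathbf{x}_k^T$ (3.36). • Substituting (3.36) into (3.34): $\hat{\mathbf{M}}(k+1) = \hat{\mathbf{M}}(k) + \mu\,[\mathbf{y}_k - \hat{\mathbf{M}}(k)\,\mathbf{x}_k]\,\mathbf{x}_k^T$ (3.37), where the bracketed factor is the error vector.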
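Correlation Matrix Memory (cont.) • The bracketed factor in the second term of (3.37) is the error vector (see (3.33)), which is what gives the rule its error-correcting effect. • Expanding (3.37) gives $\hat{\mathbf{M}}(k+1) = \hat{\mathbf{M}}(k) + \mu\,\mathbf{y}_k \mathbf{x}_k^T - \mu\,\hat{\mathbf{M}}(k)\,\mathbf{x}_k \mathbf{x}_k^T = \hat{\mathbf{M}}(k)\,[\mathbf{I} - \mu\,\mathbf{x}_k \mathbf{x}_k^T] + \mu\,\mathbf{y}_k \mathbf{x}_k^T$ (3.38), (3.39). Compared with (3.7), there is an additional term $-\mu\,\hat{\mathbf{M}}(k)\,\mathbf{x}_k \mathbf{x}_k^T$. • The error-correction supervised learning of (3.37) is carried out by training repeatedly on each of the $h$ associations.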
Correlation Matrix Memory (cont.) • Selecting the learning rate parameter • Fixed learning rate • Adjustable learning rate with respect to time • For each association, the iterative adjustments to the memory matrix (3.37) continue until the error vector $\mathbf{e}_k$ (3.33) becomes negligibly small. • The initial memory matrix is $\hat{\mathbf{M}}(0) = \mathbf{0}$. • Comparing with (2.29) shows that (3.37) is also an LMS rule.
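The error-correction procedure can be sketched as follows, assuming the update takes the usual LMS form "memory matrix plus learning rate times error vector times key transposed". The function name, the fixed learning rate of 0.5, the tolerance, and the example keys and patterns are all hypothetical choices for this sketch.

```python
def mat_vec(m, x):
    """Matrix-vector product m x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def train_error_correction(keys, patterns, mu=0.5, tol=1e-8, max_epochs=1000):
    """Repeat the error-correction update over all associations.

    Stops once the largest error component seen in an epoch is below `tol`.
    """
    n_out, n_in = len(patterns[0]), len(keys[0])
    m = [[0.0] * n_in for _ in range(n_out)]      # initial memory matrix is zero
    for _ in range(max_epochs):
        worst = 0.0
        for x, y in zip(keys, patterns):
            # Error vector: desired pattern minus current recall.
            e = [yi - oi for yi, oi in zip(y, mat_vec(m, x))]
            for i in range(n_out):
                for j in range(n_in):
                    m[i][j] += mu * e[i] * x[j]   # rank-one correction
            worst = max(worst, max(abs(ei) for ei in e))
        if worst < tol:
            break
    return m

# Hypothetical unit-length keys that are not orthogonal.
s = 2 ** -0.5
keys = [[1.0, 0.0, 0.0], [s, s, 0.0]]
patterns = [[2.0, -1.0, 0.5], [0.0, 3.0, 1.0]]
M = train_error_correction(keys, patterns)
recalls = [mat_vec(M, x) for x in keys]
```

Unlike the plain outer product rule, the trained matrix recalls both patterns closely even though the keys overlap, because each pass removes part of the remaining recall error.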
Backpropagation Learning Algorithm • Backpropagation algorithm • Generalized delta rule • Training MLPs with backpropagation algorithms results in a nonlinear mapping. • An MLP can have its synaptic weights adjusted by the BP algorithm to develop a specific nonlinear mapping. • After training, the fixed weights can perform an association task such as classification, pattern recognition, or diagnosis. • During the training phase of the MLP, the synaptic weights are adjusted to minimize the disparity between the actual and desired outputs of the MLP, averaged over all input patterns.
Backpropagation Learning Algorithm (cont.) • Basic BP algorithm for the feedforward MLP • [Figure: three-layer network with two hidden layers and one output layer]
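Backpropagation Learning Algorithm (cont.) • The standard BP algorithm for training the MLP NN is based on the steepest descent gradient approach applied to the minimization of an energy function representing the instantaneous error: $E_q = \tfrac{1}{2} (\mathbf{d}_q - \mathbf{x}_{\mathrm{out}}^{(3)})^T (\mathbf{d}_q - \mathbf{x}_{\mathrm{out}}^{(3)}) = \tfrac{1}{2} \sum_{j} (d_{qj} - x_{\mathrm{out},j}^{(3)})^2$ (3.40), where $\mathbf{d}_q$ is the desired output for the $q$-th input and $\mathbf{x}_{\mathrm{out}}^{(3)} = \mathbf{y}_q$ is the actual output of the MLP.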
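Backpropagation Learning Algorithm (cont.) • Using the steepest-descent gradient approach, the learning rule for a network weight in any one of the network layers is given by $\Delta w_{ji}^{(s)} = -\mu^{(s)}\,\partial E_q / \partial w_{ji}^{(s)}$ (3.41), where $s = 1, 2, 3$ indexes the layer and $\mu^{(s)} > 0$ is the learning rate of layer $s$.
Backpropagation Learning Algorithm (cont.) • The weights in the output layer can be updated according to $\Delta w_{ji}^{(3)} = -\mu^{(3)}\,\partial E_q / \partial w_{ji}^{(3)}$ (3.42). • Using the chain rule for the partial derivatives, (3.42) can be rewritten as $\Delta w_{ji}^{(3)} = -\mu^{(3)}\,\dfrac{\partial E_q}{\partial v_j^{(3)}}\,\dfrac{\partial v_j^{(3)}}{\partial w_{ji}^{(3)}}$ (3.43), where $v_j^{(3)}$ is the activation level of output neuron $j$.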
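Backpropagation Learning Algorithm (cont.) • The factors of (3.43) are computed separately: $\dfrac{\partial v_j^{(3)}}{\partial w_{ji}^{(3)}} = \dfrac{\partial}{\partial w_{ji}^{(3)}} \sum_{h} w_{jh}^{(3)} x_{\mathrm{out},h}^{(2)} = x_{\mathrm{out},i}^{(2)}$ (3.44). • Substituting (3.40) for $E_q$: $\dfrac{\partial E_q}{\partial v_j^{(3)}} = \dfrac{\partial}{\partial v_j^{(3)}}\,\tfrac{1}{2} \sum_{h} [d_{qh} - f(v_h^{(3)})]^2 = -[d_{qj} - f(v_j^{(3)})]\,f'(v_j^{(3)})$ (3.45), or $\dfrac{\partial E_q}{\partial v_j^{(3)}} = -(d_{qj} - x_{\mathrm{out},j}^{(3)})\,f'(v_j^{(3)}) \triangleq -\delta_j^{(3)}$ (3.46), where $f$ is the activation function and $\delta_j^{(3)}$ is the local error (delta).
Backpropagation Learning Algorithm (cont.) • Combining (3.43), (3.44), and (3.46), the learning rule for the weights in the output layer of the network is $\Delta w_{ji}^{(3)} = \mu^{(3)} \delta_j^{(3)} x_{\mathrm{out},i}^{(2)}$ (3.47), or $w_{ji}^{(3)}(k+1) = w_{ji}^{(3)}(k) + \mu^{(3)} \delta_j^{(3)} x_{\mathrm{out},i}^{(2)}$ (3.48). • In the hidden layer, applying the steepest descent gradient approach gives $\Delta w_{ji}^{(2)} = -\mu^{(2)}\,\dfrac{\partial E_q}{\partial w_{ji}^{(2)}} = -\mu^{(2)}\,\dfrac{\partial E_q}{\partial v_j^{(2)}}\,\dfrac{\partial v_j^{(2)}}{\partial w_{ji}^{(2)}}$ (3.49).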
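Backpropagation Learning Algorithm (cont.) • The partial-derivative factors on the right-hand side of (3.49) can be expressed as $\dfrac{\partial v_j^{(2)}}{\partial w_{ji}^{(2)}} = x_{\mathrm{out},i}^{(1)}$ (3.50) and $\dfrac{\partial E_q}{\partial v_j^{(2)}} = \sum_{h} \dfrac{\partial E_q}{\partial v_h^{(3)}}\,\dfrac{\partial v_h^{(3)}}{\partial x_{\mathrm{out},j}^{(2)}}\,\dfrac{\partial x_{\mathrm{out},j}^{(2)}}{\partial v_j^{(2)}}$ (3.51) $= -\Big(\sum_{h} \delta_h^{(3)} w_{hj}^{(3)}\Big) f'(v_j^{(2)}) \triangleq -\delta_j^{(2)}$ (3.52).
Backpropagation Learning Algorithm (cont.) • Combining equations (3.49), (3.50), and (3.52) yields $\Delta w_{ji}^{(2)} = \mu^{(2)} \delta_j^{(2)} x_{\mathrm{out},i}^{(1)}$ (3.53), or $w_{ji}^{(2)}(k+1) = w_{ji}^{(2)}(k) + \mu^{(2)} \delta_j^{(2)} x_{\mathrm{out},i}^{(1)}$ (3.54).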
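Backpropagation Learning Algorithm (cont.) • Generalized weight update form: $w_{ji}^{(s)}(k+1) = w_{ji}^{(s)}(k) + \mu^{(s)} \delta_j^{(s)} x_{\mathrm{out},i}^{(s-1)}$ (3.55), where $\delta_j^{(s)} = (d_{qj} - x_{\mathrm{out},j}^{(s)})\,f'(v_j^{(s)})$ (3.56) for the output layer and $\delta_j^{(s)} = \Big(\sum_{h} \delta_h^{(s+1)} w_{hj}^{(s+1)}\Big) f'(v_j^{(s)})$ (3.57) for the hidden layers.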
Backpropagation Learning Algorithm (cont.) • Standard BP algorithm • Step 1: Initialize the network synaptic weights to small random values. • Step 2: From the set of training input/output pair, present an input pattern and calculate the network response. • Step 3: The desired network response is compared with the actual output of the network, and by using (3.56) and (3.57) all the local errors can be computed. • Step 4: The weights of the network are updated according to (3.55). • Step 5: Until the network reaches a predetermined level of accuracy in producing the adequate response for all the training patterns, continue steps 2 through 4.
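The five steps above can be sketched in plain Python. This is a minimal illustration, not the course's reference implementation: it assumes logistic sigmoid activations, stores each neuron's bias as an extra weight on a constant input of 1, uses one shared learning rate for all layers, and trains on the logical OR function for a fixed number of epochs instead of testing a target accuracy. All names, the network size, and the task are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def init_weights(sizes, rng):
    """Step 1: small random weights; the last entry of each row is the bias."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_out)]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(weights, x):
    """Step 2: return the outputs of every layer, starting with the input."""
    outputs = [list(x)]
    for layer in weights:
        inp = outputs[-1] + [1.0]              # constant input for the bias
        outputs.append([sigmoid(sum(w * a for w, a in zip(row, inp)))
                        for row in layer])
    return outputs

def train_pattern(weights, x, d, mu):
    """Steps 3 and 4 for one input/output pair."""
    outputs = forward(weights, x)
    y = outputs[-1]
    # Output-layer local errors; y * (1 - y) is the sigmoid derivative.
    delta = [(dj - yj) * yj * (1.0 - yj) for dj, yj in zip(d, y)]
    for s in range(len(weights) - 1, -1, -1):
        inp = outputs[s] + [1.0]
        if s > 0:
            # Hidden-layer local errors, using the weights before the update.
            prev = [sum(delta[h] * weights[s][h][j] for h in range(len(delta)))
                    * outputs[s][j] * (1.0 - outputs[s][j])
                    for j in range(len(outputs[s]))]
        for j, row in enumerate(weights[s]):
            for i in range(len(row)):
                row[i] += mu * delta[j] * inp[i]
        if s > 0:
            delta = prev

def total_error(weights, data):
    """Sum of the instantaneous errors over all training pairs."""
    return sum(0.5 * sum((dj - yj) ** 2
                         for dj, yj in zip(d, forward(weights, x)[-1]))
               for x, d in data)

rng = random.Random(0)
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]   # logical OR
weights = init_weights([2, 3, 3, 1], rng)           # two hidden layers
error_before = total_error(weights, data)
for epoch in range(3000):                           # Step 5, fixed budget
    for x, d in data:
        train_pattern(weights, x, d, mu=0.5)
error_after = total_error(weights, data)
```

The hidden-layer local errors are computed before the weights of the layer above are changed, so every update in a step uses the same set of weights that produced the forward pass.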
Backpropagation Learning Algorithm (cont.) • Some Practical Issues in Using Standard BP • Initialization of synaptic weights • Initially set to small random values. • If the initial weights are too large, the neurons are likely to saturate. • Heuristic algorithm (Ham, Kostanic 2001) • Set the weights to uniformly distributed random numbers in the interval from -0.5/fan-in to 0.5/fan-in.
Backpropagation Learning Algorithm (cont.) • Nguyen and Widrow's initialization algorithm (suited to architectures with one hidden layer) • Define • $n_0$: number of components in the input layer • $n_1$: number of neurons in the hidden layer • $\gamma$: scaling factor • Step 1: Compute the scaling factor according to $\gamma = 0.7\,n_1^{1/n_0}$ (3.58) • Step 2: Initialize the weights $w_{ij}$ of the layer as random numbers between -0.5 and 0.5 • Step 3: Reinitialize the weights according to $w_{ij} \leftarrow \gamma\,w_{ij} \big/ \sqrt{\sum_{j=1}^{n_0} w_{ij}^2}$ (3.59) • Step 4: For the $i$-th neuron in the hidden layer, set the bias to a random number between $-w_{ij}$ and $w_{ij}$.
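A possible sketch of these four steps is shown below. It assumes the commonly cited form of the Nguyen-Widrow scaling factor, 0.7 times n1 raised to the power 1/n0, and rescales each hidden neuron's weight vector to have length equal to that factor. Step 4 is ambiguous in the slide, so the sketch draws each bias uniformly from the interval between minus the scaling factor and plus the scaling factor; that choice, like the function name, is an assumption.

```python
import math
import random

def nguyen_widrow(n0, n1, rng):
    """Initialize a hidden layer with n0 inputs and n1 neurons."""
    gamma = 0.7 * n1 ** (1.0 / n0)                        # Step 1
    weights, biases = [], []
    for _ in range(n1):
        w = [rng.uniform(-0.5, 0.5) for _ in range(n0)]   # Step 2
        norm = math.sqrt(sum(wj * wj for wj in w))
        if norm == 0.0:                                   # extremely unlikely
            w, norm = [1.0] + [0.0] * (n0 - 1), 1.0
        weights.append([gamma * wj / norm for wj in w])   # Step 3
        biases.append(rng.uniform(-gamma, gamma))         # Step 4 (assumed range)
    return weights, biases, gamma

rng = random.Random(1)
hidden_weights, hidden_biases, gamma = nguyen_widrow(n0=2, n1=4, rng=rng)
```

After the call, every row of `hidden_weights` has Euclidean length `gamma`, which spreads the active regions of the hidden neurons over the input space instead of leaving them clustered near the origin.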
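Backpropagation Learning Algorithm (cont.) • Network configuration and ability of the network to generalize • The configuration of the MLP NN is determined by: • the number of hidden layers, the number of neurons in each hidden layer, and the activation function used by the neurons. • Network performance depends less on the form of the activation function than on the number of hidden layers and the number of neurons in each hidden layer. • The number of hidden-layer neurons is determined by trial and error. • MLP NNs are generally designed with two hidden layers. • In general, a larger number of hidden-layer neurons tends to give good performance on the training set, but an "over-designed" architecture will "over-fit" and lose the network's ability to generalize.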
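Backpropagation Learning Algorithm (cont.) • Example 3.2 • The network has one hidden layer with 50 neurons. • It is trained to approximate a nonlinear function. • (a) The function is sampled on [0, 4] every 0.2, giving 21 points. • (b) The function is sampled on [0, 4] every 0.01, giving 400 points. (causes overfitting)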
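Backpropagation Learning Algorithm (cont.) • Independent validation • Using the training data to assess the final performance quality of the network leads to overfitting. • Independent validation can be used to avoid this problem. • The available data are divided into a training set and a testing set. • First, the data are randomized. • Then the data are split into two parts: the training set is used to update the network weights, and the testing set is used to assess the performance of the training.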
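The randomize-then-split procedure can be written in a few lines. The sketch below is illustrative only; the function name, the 80/20 split, and the fixed seed are assumptions, since the slides do not state a split ratio.

```python
import random

def split_data(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples, then split them into training and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)     # randomize the order first
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical data set of ten sample indices.
train_set, test_set = split_data(range(10), train_fraction=0.8, seed=0)
```

Only `train_set` should be used to update the weights; the error on `test_set` is then an estimate of how well the trained network generalizes.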
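Backpropagation Learning Algorithm (cont.) • Speed of convergence • The convergence properties depend on the magnitude of the learning rate. • To guarantee that the network converges and to avoid oscillation during training, the learning rate must be set to a fairly small value. • If training starts far from the global minimum, many neurons saturate, the gradient becomes small, and training may even get stuck in a local minimum of the error surface, which slows convergence. • Faster algorithms fall into two broad categories: • Various heuristic improvements to the standard BP algorithm • Use of standard numerical optimization techniques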
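Backpropagation Learning Algorithm (cont.) • Backpropagation Learning Algorithm with Momentum Updating • The weights are updated in a direction that is a linear combination of the current gradient of the instantaneous error surface and the one obtained in the previous training step. • The weights are updated according to $\Delta w_{ji}^{(s)}(k) = \mu^{(s)} \delta_j^{(s)}(k)\,x_{\mathrm{out},i}^{(s-1)}(k) + \alpha\,\Delta w_{ji}^{(s)}(k-1)$ (3.61), or $w_{ji}^{(s)}(k+1) = w_{ji}^{(s)}(k) + \mu^{(s)} \delta_j^{(s)}(k)\,x_{\mathrm{out},i}^{(s-1)}(k) + \alpha\,[w_{ji}^{(s)}(k) - w_{ji}^{(s)}(k-1)]$ (3.62), where the last term is the momentum term and $0 \le \alpha < 1$ is the forgetting factor.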
Backpropagation Learning Algorithm (cont.) • This type of learning improves convergence: • If the training patterns contain some element of uncertainty, updating with momentum provides a form of low-pass filtering by preventing rapid changes in the direction of the weight updates. • It renders the training relatively immune to outliers or erroneous training pairs. • It increases the rate of weight change, so the speed of convergence is increased.
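Backpropagation Learning Algorithm (cont.) • The preceding points can be expressed by the update equation $\Delta w_{ji}^{(s)}(k) = \mu^{(s)} \sum_{t=0}^{k} \alpha^{k-t}\,\delta_j^{(s)}(t)\,x_{\mathrm{out},i}^{(s-1)}(t)$ (3.63). • If the network operates in a flat region of the error surface, the gradient does not change, so (3.63) can be rewritten as $\Delta w_{ji}^{(s)}(k) = \mu^{(s)} \delta_j^{(s)} x_{\mathrm{out},i}^{(s-1)} \sum_{t=0}^{k} \alpha^{k-t} \approx \dfrac{\mu^{(s)}}{1-\alpha}\,\delta_j^{(s)} x_{\mathrm{out},i}^{(s-1)}$ (3.64). • Because the forgetting factor $\alpha$ is always less than 1, the effective learning rate can be defined as $\mu_{\mathrm{eff}}^{(s)} = \mu^{(s)} / (1-\alpha)$ (3.65).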
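The effective learning rate in a flat region can be checked with a scalar experiment. The sketch below assumes the common momentum form in which each weight change is the plain gradient step plus the forgetting factor times the previous change, and it holds the gradient constant to mimic a flat region. The function name and the numeric values are hypothetical.

```python
def momentum_steps(grad, mu, alpha, n_steps):
    """Weight changes under a constant gradient with momentum updating."""
    dw, history = 0.0, []
    for _ in range(n_steps):
        dw = -mu * grad + alpha * dw      # gradient step plus momentum term
        history.append(dw)
    return history

mu, alpha = 0.1, 0.9
steps = momentum_steps(grad=1.0, mu=mu, alpha=alpha, n_steps=500)
effective_rate = mu / (1.0 - alpha)       # learning rate divided by (1 - alpha)
```

The first step has size `mu`, but the steps grow geometrically toward `effective_rate` times the gradient, here roughly ten times the plain step, which is the speed-up that momentum gives on flat parts of the error surface.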
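Backpropagation Learning Algorithm (cont.) • Batch Updating • Standard BP assumes that the weights are updated for every input/output training pair. • Batch updating accumulates the corrections over many training patterns before updating the weights (it can be viewed as averaging the corrections of the individual I/O pairs and then applying the averaged correction). • Batch updating has the following advantages: • Gives a much better estimate of the error surface. • Provides some inherent low-pass filtering of the training pattern. • Suitable for more sophisticated optimization procedures.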
Backpropagation Learning Algorithm (cont.) • Search-Then-Converge Method • A kind of heuristic strategy for speeding up BP • Search phase: • The network is relatively far from the global minimum. The learning rate is kept sufficiently large and relatively constant. • Converge phase • The network is approaching the global minimum. The learning rate is decreased at each iteration.
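Backpropagation Learning Algorithm (cont.) • Because in practice the distance of the network from the global minimum is unknown, the learning rate can be scheduled with either of the following two expressions: $\mu(k) = \dfrac{\mu_0}{1 + k/k_0}$ (3.66) or $\mu(k) = \mu_0\,\dfrac{1 + (c/\mu_0)(k/k_0)}{1 + (c/\mu_0)(k/k_0) + k_0\,(k/k_0)^2}$ (3.67). • Typically $1 < c/\mu_0 < 100$ and $100 < k_0 < 500$. • When $k \ll k_0$, the learning rate is approximately $\mu_0$ (search phase). • When $k \gg k_0$, the learning rate decreases in proportion to $1/k$ to $1/k^2$ (converge phase).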
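A short sketch of a search-then-converge schedule follows. The slide's two expressions did not survive extraction, so the code assumes the forms commonly attributed to Darken and Moody: a simple schedule equal to the initial rate divided by (1 + k/k0), and a more general one with an extra constant c. The function names and parameter values are hypothetical.

```python
def stc_simple(k, mu0, k0):
    """Simple schedule: roughly mu0 for k << k0, roughly mu0 * k0 / k for k >> k0."""
    return mu0 / (1.0 + k / k0)

def stc_general(k, mu0, k0, c):
    """General schedule with constant c; behaves like c / k for k >> k0."""
    r = k / k0
    return mu0 * (1.0 + (c / mu0) * r) / (1.0 + (c / mu0) * r + k0 * r * r)

mu0, k0, c = 0.1, 200, 1.0                  # c / mu0 = 10, k0 within 100..500
search_rate = stc_simple(1, mu0, k0)        # early iteration, k << k0
late_rate = stc_simple(10**6, mu0, k0)      # late iteration, k >> k0
late_general = stc_general(10**6, mu0, k0, c)
```

Both functions return `mu0` at `k = 0` and stay close to it while `k` is small compared with `k0`, then decay once `k` passes `k0`, which matches the search and converge phases described in the slides.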
Backpropagation Learning Algorithm (cont.) • Batch Updating with Variable Learning Rate • A simple heuristic strategy to increase the convergence speed of BP with batch update. • The learning rate is increased if the previous learning step decreased the total error function. • If the error function has increased, the learning rate needs to be decreased.
Backpropagation Learning Algorithm (cont.) • The algorithm can be summarized as • If the error function over the entire training set has decreased, increase the learning rate by multiplying it by a number h>1 (typically h=1.05) • If the error function has increased by more than some set percentage x, decrease the learning rate by multiplying it by a number c<1 (typically c=0.7) • If the error function has increased by less than the percentage x, the learning rate remains unchanged. • Applying the variable learning rate to batch updating can significantly speed up convergence. • The algorithm can easily become trapped in a local minimum. • A minimum learning rate $\mu_{\min}$ can be set.
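The three rules can be collected into one small function. In this sketch the factors 1.05 and 0.7 come from the slide, while the 4% tolerance for the allowed error increase, the default minimum learning rate, and all names are hypothetical.

```python
def adapt_learning_rate(mu, prev_error, new_error,
                        inc=1.05, dec=0.7, max_increase=0.04, mu_min=1e-6):
    """Return the learning rate to use for the next batch update."""
    if new_error < prev_error:
        return mu * inc                     # error decreased: speed up
    if new_error > prev_error * (1.0 + max_increase):
        return max(mu * dec, mu_min)        # error grew too much: slow down
    return mu                               # small increase: keep the rate

faster = adapt_learning_rate(0.1, prev_error=1.0, new_error=0.9)
slower = adapt_learning_rate(0.1, prev_error=1.0, new_error=1.1)
same = adapt_learning_rate(0.1, prev_error=1.0, new_error=1.02)
```

In a training loop, the function would be called once per batch update with the total error before and after the update, and the lower bound `mu_min` keeps repeated decreases from driving the learning rate to zero.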