Temporal Sequence Processing using Recurrent SOM
Advisor: Dr. Hsu
Graduate: Ching-Lung Chen
Authors: Timo Koskela, Markus Varsta, Jukka Heikkonen, Kimmo Kaski
Outline
• Motivation
• Objective
• Introduction
• Recurrent Self-Organizing Map
• Temporal Kohonen Map
• Modified TKM: RSOM
• Case Studies
• Conclusions
• Personal Opinion
Motivation
• Temporal sequence processing (TSP) is a popular research area, with applications ranging from weather forecasting to time series prediction, but few nonlinear or unsupervised approaches can handle large data sets.
• The traditional approach has some problems.
Objective
• Present and explain recent approaches to using neural networks in TSP, namely the TKM and the RSOM.
Introduction (1/3)
• The traditional way is to convert the temporal sequence into a concatenated vector via a tapped delay line and feed the resulting vector as input to a network.
• This has drawbacks, the most serious being the difficulty of determining the proper length for the delay line.
Introduction (2/3)
• The Temporal Kohonen Map (TKM) is an unsupervised approach for TSP derived from Kohonen's Self-Organizing Map (SOM) algorithm.
• In the TKM the involvement of the earlier input vectors in each unit is represented by a recursive difference equation which defines the current unit activation as a function of the previous activation and the current input vector.
Introduction (3/3)
• The RSOM defines a difference vector for each unit of the map, which is used both for selecting the best matching unit and for adapting the weights of the map.
• The difference vector captures the magnitude and direction of the error in the weight vectors and allows learning temporal context.
Recurrent Self-Organizing Map (1/3)
• The Recurrent Self-Organizing Map (RSOM) allows storing certain information from the past input vectors. The information is stored in the form of difference vectors in the map units.
• The best matching unit (bmu) b for a given input pattern x(n) is selected using some metric-based criterion, such as the one below.
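The equations in this deck were images that did not survive the export. What follows throughout is a reconstruction of the standard SOM/TKM/RSOM formulas the slides describe, with numbering chosen to match the slides' later cross-references (Eqs. 4-9, 13, 14). The usual Euclidean bmu criterion is assumed here:

  ||x(n) − w_b(n)|| = min_i ||x(n) − w_i(n)||    (1)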
Recurrent Self-Organizing Map (2/3)
• During the learning phase the weights in the map are updated towards the given input pattern x(n) according to the rule below.
• γ(n) is the learning rate.
• h_ib(n) is the neighborhood function; a typical choice is a Gaussian.
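A reconstruction of the missing update rule in the standard SOM form (the symbols γ(n) and h_ib(n) are assumed, since they are not visible in the export):

  w_i(n+1) = w_i(n) + γ(n)·h_ib(n)·(x(n) − w_i(n))    (2)

with, for the Gaussian neighborhood,

  h_ib(n) = exp(−||r_i − r_b||^2 / σ(n)^2)    (3)

where r_i is the location of unit i on the map lattice and σ(n) controls the neighborhood width.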
Recurrent Self-Organizing Map (3/3)
• During this quantization stage the gain has to be sufficiently small to avoid losing the map order; how small exactly varies from case to case.
• Over training we seek to minimize the sum of squared distances E, shown below.
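A reconstruction of the criterion (numbered (4) to match the later reference to Eq. 4 in the RSOM analysis):

  E = Σ_n ||x(n) − w_b(n)||^2    (4)

where b = b(n) is the bmu for x(n).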
Temporal Kohonen Map (1/4)
• The Temporal Kohonen Map (TKM) differs from the SOM only in its outputs.
• The output of the normal SOM follows a typical winner-take-all strategy.
• In the TKM the sharp outputs are replaced with leaky integrator outputs which, once activated, gradually lose their activity.
Temporal Kohonen Map (2/4)
• The modeling of the outputs in the TKM is close to the behavior of natural neurons, which retain an electrical potential on their membranes with decay.
• In the TKM this decay is modeled with the difference equation below (Eq. 5).
• 0 < d < 1 can be viewed as a time constant.
• V_i(n) is the activation of unit i at step n.
• The bmu b is the unit with maximum activity.
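A reconstruction of the difference equation in its standard TKM form:

  V_i(n) = d·V_i(n−1) − (1/2)·||x(n) − w_i(n)||^2,  0 < d < 1    (5)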
Temporal Kohonen Map (3/4)
• Eq. 5 has the general solution given below (Eq. 6).
• Further analysis of Eq. 6, when w_i is assumed constant, goes as shown below (Eq. 7).
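Reconstructions of Eqs. 6 and 7, assuming the standard TKM analysis:

  V_i(n) = −(1/2)·Σ_{k=0..n} d^(n−k)·||x(k) − w_i(k)||^2    (6)

Treating w_i as a constant w and differentiating the weighted sum in Eq. 6 with respect to w gives

  ∂/∂w Σ_{k=0..n} d^(n−k)·||x(k) − w||^2 = −2·Σ_{k=0..n} d^(n−k)·(x(k) − w)    (7)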
Temporal Kohonen Map (4/4)
• When w is optimal in the vector quantization sense, Eq. 7 is zero, since that w minimizes the sum in Eq. 6.
• The result (Eq. 8, below) shows how the optimal weight vectors in the vector quantization sense are linear combinations of the input patterns.
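A reconstruction of Eq. 8, obtained by setting Eq. 7 to zero and solving for w:

  w* = [Σ_{k=0..n} d^(n−k)·x(k)] / [Σ_{k=0..n} d^(n−k)]    (8)

i.e., an exponentially weighted average, hence a linear combination, of the past input patterns.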
Modified TKM: RSOM (1/5)
• Some of the problems of the original TKM have a convenient solution: simply moving the leaky integrators from the unit outputs to the inputs.
• Moving the leaky integrators from the outputs to the inputs yields the temporally leaked difference vector at each map unit (Eq. 9, below).
• α is the leaking coefficient, analogous to d in the TKM; y_i(n) is the leaked difference vector.
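A reconstruction of Eq. 9 in the standard RSOM form (consistent with the impulse response quoted two slides below):

  y_i(n) = (1 − α)·y_i(n−1) + α·(x(n) − w_i(n)),  0 < α ≤ 1    (9)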
Modified TKM: RSOM (2/5)
• [Figure: schematic picture of an RSOM unit]
• Large values of α correspond to short memory, while small values of α correspond to long memory and slow decay of activation.
• Eq. 9 can be written in a familiar form by unrolling the recursion, yielding Eq. 10 below.
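The unrolled form (numbered (10) here by assumption):

  y_i(n) = Σ_{k=0..n} α·(1 − α)^(n−k)·(x(k) − w_i(k))    (10)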
Modified TKM: RSOM (3/5)
• Eq. 10 is an exponentially weighted linear IIR filter with impulse response h(k) = α·(1 − α)^k, k ≥ 0, applied to the feedback quantity.
• In RSOM the bmu b at step n is now searched via the leaked difference vectors (Eq. 11, below).
• Repeating the mathematical analysis done for the TKM in Eqs. 7 and 8 yields the results on the following slides.
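Reconstructions of the bmu criterion and the corresponding weight update (the update along y_i is the standard RSOM rule; the numbering is assumed):

  ||y_b(n)|| = min_i ||y_i(n)||    (11)

  w_i(n+1) = w_i(n) + γ(n)·h_ib(n)·y_i(n)    (12)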
Modified TKM: RSOM (4/5)
• The square of the norm of y(n) is given below (Eq. 13).
• Optimizing the vector quantization criterion given in general form in Eq. 4 with respect to the weights yields the condition used on the next slide.
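A reconstruction, again treating w_i as a constant w as in the TKM analysis:

  ||y_i(n)||^2 = ||Σ_{k=0..n} α·(1 − α)^(n−k)·(x(k) − w)||^2    (13)

Setting the gradient of Eq. 13 with respect to w to zero gives the optimal w solved on the next slide.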
Modified TKM: RSOM (5/5)
• When w is optimal, it can be solved analytically (Eq. 14, below).
• From Eq. 14 one immediately observes that the optimal w's are linear combinations of the x's.
• Since RSOM is trained with the y's, it seeks to minimize this quantization criterion of the leaked difference vectors.
• The TKM, by contrast, seeks to minimize the normal vector quantization criterion.
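A reconstruction of Eq. 14:

  w* = [Σ_{k=0..n} (1 − α)^(n−k)·x(k)] / [Σ_{k=0..n} (1 − α)^(n−k)]    (14)

To make the training loop concrete, here is a minimal sketch of one RSOM step in Python/NumPy, built only from the equations above. It is our own illustration, not the authors' code: the function name, array shapes, fixed γ and σ, and the omission of episode resets of y and of learning-rate decay are all simplifying assumptions.

```python
import numpy as np

def rsom_step(x, W, Y, grid, alpha=0.8, gamma=0.05, sigma=1.0):
    """One RSOM training step (sketch). x: (d,), W and Y: (m, d), grid: (m, q)."""
    Y = (1.0 - alpha) * Y + alpha * (x - W)        # Eq. 9: leaky difference vectors
    b = int(np.argmin(np.linalg.norm(Y, axis=1)))  # Eq. 11: bmu minimizes ||y_i(n)||
    h = np.exp(-np.sum((grid - grid[b]) ** 2, axis=1)
               / sigma ** 2)                       # Gaussian neighborhood (assumed)
    W = W + gamma * h[:, None] * Y                 # Eq. 12: move weights along y_i
    return W, Y, b

# Toy run loosely mirroring the synthetic-data case below: 25 units on a 1-D
# lattice, inputs drawn from {1, 6, 11, 16, 21} with Gaussian noise (noise
# level assumed, not taken from the slides).
rng = np.random.default_rng(0)
m, d = 25, 1
W = rng.uniform(0.0, 22.0, size=(m, d))
Y = np.zeros((m, d))
grid = np.arange(m, dtype=float)[:, None]
inputs = rng.choice([1.0, 6.0, 11.0, 16.0, 21.0], size=2000) + rng.normal(0.0, 0.5, 2000)
for x in inputs:
    W, Y, b = rsom_step(np.array([x]), W, Y, grid)
```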
Case Studies
• Synthetic data.
• Using RSOM to cluster EEG data as "normal" or "containing epileptic activity".
• Using local models with RSOM for time series prediction.
Synthetic Data (1/3)
• Both RSOM and TKM were trained with five one-dimensional input patterns, 1, 6, 11, 16, 21, with additive approximately Gaussian noise.
• Using the heuristic criterion, we arrived at an optimal α = 0.8 and, accordingly, an optimal d = 0.2.
• The maps were trained for 2000 iterations with 25 units.
Synthetic Data (3/3)
• The TKM's solution is more complicated. It basically has two mechanisms:
• Learning temporal context from the past …
• Preserving contextual information in the topological neighborhood.
Clustering of EEG Patterns (1/3)
• The sampling rate of the EEG data used in the test was 200 Hz.
• For spectral feature extraction at time t, a window W_t of 256 samples was collected from the original EEG sequence S_EEG as follows: W_t(i) = S_EEG(t − 127 + i), i = 0, …, 255.
• Daub4 and Daub12 wavelets were used in transforming the continuous signal g(t).
Clustering of EEG Patterns (2/3)
• The extracted EEG features were clustered by four SOMs of 3×3, 5×5, 9×9, and 17×17 units.
• The training data contained a total of 150,987 16-dimensional feature vectors, among which 5,430 patterns correspond to epileptic activity.
Time Series Prediction (1/3)
• Time series prediction with RSOM and local linear models.
• The data is a laser time series consisting of the intensity of an infrared laser.
• The first 2000 samples were used for training, and the remaining 1000 samples for testing.
Time Series Prediction (2/3)
• The free parameters during RSOM training include the input vector length p, the time step between consecutive input vectors s, the number of units n_u, and the leaking coefficient α: RSOM(p, s, n_u, α).
• The local linear regression models use the same parameters and were built in MATLAB 5.
Time Series Prediction (3/3)
• The test compared RSOM with an AR model and an MLP model.
Conclusions (1/2)
• RSOM provided better results when a simple plurality rule was used for classification.
• The results in the prediction case are not the best possible with RSOM, since the search space of the model's free parameters was quite small.
• An important property of RSOM is its visualization ability.
Conclusions (2/2)
• Unsupervised learning of the temporal context is another attractive property of RSOM.
• In this paper we presented an RSOM that has the same feedback structure in all units. It is possible, however, to allow the units of RSOM to have different recurrent structures.
Personal Opinion
• This paper describes only the spirit of the TKM and RSOM, not the details.
• Until now we still do not know what the sequence images mean.
• Maybe we should consult the original paper to help us understand their meaning.