Self-organizing Learning Array based Value System (SOLAR-V)
Yinyin Liu, EE690, Ohio University, Spring 2005
Outline
• What is a Value System
• Basics of SOLAR
• Least Square Method
• Value Learning in SOLAR
• Application
• How to use the SOLAR-V Project
Value System
• In a value system, terms such as 'good', 'bad', 'better', and 'worse' are quantified.
• Reinforcement learning
• Data analysis
• Model interpretation and decision making
• Value prediction
• With the right value system, one will make good short-term decisions and obtain good long-term results.
Value System
[Figure: interaction sequence s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, a_{t+3}, ...]
• Reinforcement Learning (RL)
• A computational approach to learning
• An agent tries to maximize the total reward while interacting with a complex, uncertain environment
• Value: the expected future reward
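To make "expected future reward" concrete, the sketch below sums the rewards that follow a time step. The function name `discounted_return` and the discount factor `gamma` are assumptions for illustration; the slide itself only speaks of total reward, which corresponds to `gamma=1.0`.

```python
def discounted_return(rewards, gamma=1.0):
    """Return r_1 + gamma*r_2 + gamma^2*r_3 + ... for a list of rewards.

    gamma=1.0 gives the plain total reward named on the slide;
    gamma < 1 (an assumption, not from the slides) down-weights later rewards.
    """
    total = 0.0
    # Accumulate from the last reward backwards: G = r + gamma * G_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


rewards = [0.0, 0.0, 1.0]                      # r_{t+1}, r_{t+2}, r_{t+3}
print(discounted_return(rewards))              # total reward: 1.0
print(discounted_return(rewards, gamma=0.5))   # discounted: 0.25
```

The value of a state is then the expectation of this quantity over many such sequences.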
Value System
[Figure: Trial 1 with r=1, from X-initial to X over limit; axes labeled θ and value; inputs x1, x2, x3, x4, a]
• Value functions in RL
• Functions of a state-action pair: how good it is to perform a given action in a given state
• Value functions can be estimated from experience
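One common way to estimate a state-action value function from experience is to average the returns observed after each state-action pair. The sketch below is a generic every-visit Monte Carlo estimate, not the SOLAR-V method; the episode format, the state and action labels, and the name `estimate_action_values` are assumptions made for the example.

```python
from collections import defaultdict


def estimate_action_values(episodes, gamma=1.0):
    """Average the observed returns for every (state, action) pair.

    episodes: list of episodes, each a list of (state, action, reward) steps,
    where reward is the reward received after taking the action.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        g = 0.0
        # Walk backwards so g is the return that follows each step.
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            sums[(state, action)] += g
            counts[(state, action)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Three hypothetical trials; only the first one ends with a reward of 1.
episodes = [
    [("s0", "left", 0.0), ("s1", "left", 1.0)],
    [("s0", "right", 0.0), ("s1", "right", 0.0)],
    [("s0", "left", 0.0), ("s1", "right", 0.0)],
]
q = estimate_action_values(episodes)
print(q[("s0", "left")])   # 0.5: one of the two trials through (s0, left) was rewarded
```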
Basics of SOLAR
• Training data: [Figure: data matrix labeled "N input samples" and "M"]
• SOLAR has N inputs and reads the information in parallel from M feature vectors.
• Prewiring procedure.
• SOLAR is a feed-forward structure.
• Interconnections and neuron operations are dynamic, determined by the data during interaction with the environment.
Basics of SOLAR
• Deviation-based Selection (DBS) determines a proper operation and inputs for each neuron.
• Each neuron is a value estimator. The final value approximation is a global vote over all the neurons.
Least Square Method
• Least Square Method: a least square fit minimizes the sum of squared errors between the data and the approximation
• Function: a linear combination of k basis functions
• W is the set of weights to be found
• Projection onto the space spanned by the basis functions
• Easy to implement and debug: it quantifies the importance of each basis feature, so features can be engineered for better performance.
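A minimal sketch of such a fit, assuming polynomial basis functions 1, x, x^2 and solving the normal equations in plain Python. The helper names `solve_linear_system` and `least_squares_fit` are made up for this example and are not part of SOLAR-V.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def least_squares_fit(xs, ys, basis):
    """Return weights W minimizing sum((y - sum_k W[k]*basis[k](x))**2)."""
    phi = [[f(x) for f in basis] for x in xs]   # one row of basis values per sample
    k = len(basis)
    # Normal equations: (Phi^T Phi) W = Phi^T y
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(k)] for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(phi, ys)) for i in range(k)]
    return solve_linear_system(ata, aty)


basis = [lambda x: 1.0, lambda x: x, lambda x: x * x]   # 1, x, x^2
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]          # noise-free data
weights = least_squares_fit(xs, ys, basis)
print([round(w, 6) for w in weights])
```

The size of each recovered weight indicates how much its basis function contributes to the approximation.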
Least Square Method
• Signal-to-noise Ratio (SNR)-controlled LSF
• How many basis functions should be considered?
• With polynomials as basis functions, up to which order?
• Information may be corrupted by noise, so over-fitting should be avoided
• We need to determine when the difference between the approximated and the measured data has the characteristics of a noise signal.
Least Square Method
• SNR-controlled LSF
• Approximation error signal: e(x) = s(x) - a(x)
• Determine the signal-to-noise ratio of e(x) using signal correlation
• The S/N of Gaussian noise is a random variable, and its statistics can be estimated directly
• If the S/N of e(x) is higher than expected for Gaussian noise, then most likely not all the information was extracted from the sampled data. In that case, increase the basis function order by 1
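The order-selection loop can be sketched as follows. Two parts are assumptions, since the slides do not give the formulas: the S/N of e(x) is estimated here from the correlation between neighboring error samples, and a fixed `threshold` stands in for the limit that the slides derive from the statistics of Gaussian noise. The names `polyfit`, `residual_snr` and `snr_controlled_fit` are hypothetical.

```python
import random


def polyfit(xs, ys, order):
    """Least square fit of a polynomial of the given order (normal equations)."""
    k = order + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * k
    for i in range(k - 1, -1, -1):
        w[i] = (m[i][k] - sum(m[i][j] * w[j] for j in range(i + 1, k))) / m[i][i]
    return w


def residual_snr(e):
    """Assumed estimator: correlated part of e divided by the uncorrelated part."""
    power = sum(v * v for v in e) / len(e)
    corr = sum(e[i] * e[i + 1] for i in range(len(e) - 1)) / (len(e) - 1)
    noise = power - corr
    return corr / noise if noise > 0 else float("inf")


def snr_controlled_fit(xs, ys, threshold=0.5, max_order=6):
    """Raise the polynomial order by 1 until the error looks like noise."""
    for order in range(max_order + 1):
        w = polyfit(xs, ys, order)
        e = [y - sum(wk * x ** k for k, wk in enumerate(w)) for x, y in zip(xs, ys)]
        if residual_snr(e) < threshold:   # hypothetical fixed limit
            break
    return order, w


rng = random.Random(0)
xs = [-1.0 + 2.0 * i / 199 for i in range(200)]
ys = [1.0 + 2.0 * x - 3.0 * x * x + rng.gauss(0.0, 0.05) for x in xs]
order, w = snr_controlled_fit(xs, ys)
print(order, [round(v, 2) for v in w])
```

For this quadratic test signal the residuals of the order-0 and order-1 fits are strongly correlated, so the loop keeps raising the order until at least order 2.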
Least Square Method • SNR-controlled LSF
Least Square Method
• Weighted LSF
• Knowledge is accumulated through the learning process and should relate more to recent information
• Recent data weigh more in the learning
• Apply weights to the data, declining exponentially going back from the most recent sample
Least Square Method • Weighted LSF
Least Square Method
• Weighted LSF
• The error signal has to be weighted as well.
• The SNR of Gaussian noise
• Comparison
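A small sketch of the weighting idea, assuming a straight-line model and a decay factor of 0.7 per step back in time. Both choices, and the names `exponential_weights` and `weighted_line_fit`, are illustrative and not taken from the slides.

```python
def exponential_weights(n, decay=0.7):
    """Weight 1 for the most recent sample, decay**k for the sample k steps back."""
    return [decay ** (n - 1 - i) for i in range(n)]


def weighted_line_fit(xs, ys, weights):
    """Weighted least squares fit of y ~ w0 + w1*x (closed form)."""
    sw = sum(weights)
    mx = sum(w * x for w, x in zip(weights, xs)) / sw
    my = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(weights, xs, ys))
    w1 = sxy / sxx
    return my - w1 * mx, w1


# Data whose trend changes halfway: slope 1 at first, slope 3 for recent samples.
xs = [float(i) for i in range(20)]
ys = [x if x < 10 else 10.0 + 3.0 * (x - 10.0) for x in xs]

_, slope_plain = weighted_line_fit(xs, ys, [1.0] * len(xs))
_, slope_recent = weighted_line_fit(xs, ys, exponential_weights(len(xs)))
print(round(slope_plain, 3), round(slope_recent, 3))
```

The weighted fit follows the recent slope much more closely than the unweighted one, which averages over both regimes.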
Value Learning
[Figure: Trial 1 with r=1, from X-initial to X over limit; axis labeled value; inputs x1, x2, x3, x4, a]
• Training data:
• Neuron initial wiring
Value Learning • Non-linear Scaling:
Value Learning
• Deviation-based Selection in neurons
• Inputs: 1 or 2 inputs
• Operation: ident, half, exp, log, add, sub
• Self-organized structure
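The selection step might look like the sketch below. The slides name the six operations but define neither them nor the deviation measure, so everything here is an assumption: the operation formulas are simple stand-ins for inputs scaled to 0..255, and the deviation is taken to be the RMS error of a least square line from the neuron output to the value. The names `fit_deviation` and `select_operation` are hypothetical.

```python
import math
from itertools import combinations

# Illustrative definitions for inputs scaled to 0..255 (assumed, not from the slides).
UNARY_OPS = {
    "ident": lambda x: x,
    "half": lambda x: x / 2.0,
    "exp": lambda x: math.exp(x / 255.0),
    "log": lambda x: math.log(1.0 + x),
}
BINARY_OPS = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
}


def fit_deviation(z, values):
    """RMS error of the least square line values ~ w0 + w1*z."""
    n = len(z)
    mz, mv = sum(z) / n, sum(values) / n
    szz = sum((a - mz) ** 2 for a in z)
    w1 = sum((a - mz) * (v - mv) for a, v in zip(z, values)) / szz if szz > 0 else 0.0
    w0 = mv - w1 * mz
    return math.sqrt(sum((v - (w0 + w1 * a)) ** 2 for a, v in zip(z, values)) / n)


def select_operation(features, values):
    """Try every operation on every 1- or 2-input choice; keep the lowest deviation.

    features: list of M rows (one per feature), each holding N samples.
    """
    best = None
    for name, op in UNARY_OPS.items():
        for i, row in enumerate(features):
            dev = fit_deviation([op(x) for x in row], values)
            if best is None or dev < best[0]:
                best = (dev, name, (i,))
    for name, op in BINARY_OPS.items():
        for i, j in combinations(range(len(features)), 2):
            z = [op(x, y) for x, y in zip(features[i], features[j])]
            dev = fit_deviation(z, values)
            if best is None or dev < best[0]:
                best = (dev, name, (i, j))
    return best


features = [
    [10.0, 50.0, 90.0, 130.0, 200.0, 240.0],
    [30.0, 20.0, 100.0, 60.0, 10.0, 120.0],
]
values = [a - b for a, b in zip(features[0], features[1])]   # value = x0 - x1
dev, name, inputs = select_operation(features, values)
print(name, inputs)
```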
Value Learning
• Value information is stored in a distributed manner across the neurons.
• Value approximation collects the approximation results from all the neurons.
Value Learning • Information fusion: use information from many sources. • Value approximation: global voting
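One possible form of global voting is a confidence-weighted average of the neuron estimates. The slides do not specify the voting rule, so the inverse-deviation weighting and the name `global_vote` below are assumptions made for illustration.

```python
def global_vote(estimates, deviations, eps=1e-9):
    """Combine neuron estimates into one value by a confidence-weighted average.

    Each neuron's weight is 1 / (deviation + eps), so neurons that fit the
    training values better have more say (assumed rule, not from the slides).
    """
    weights = [1.0 / (d + eps) for d in deviations]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


# Two reliable neurons agree near 1.1; one unreliable neuron says 3.0.
estimates = [1.0, 1.2, 3.0]
deviations = [0.1, 0.1, 1.0]
vote = global_vote(estimates, deviations)
print(round(vote, 3))
```

With equal deviations the vote reduces to the plain mean of the neuron estimates.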
Value Learning • Learning performance
Application
• On-line Learning control by Reinforcement and Association
[Figure: block diagram of the Action Network, Value Network and System, with signals X(t), u(t), J(t) and X(t+1)]
Application
• Financial data analysis
  Data: 103 features from 52 companies.
  Value: 52 gain values given by the one-year gain on investment.
• Prediction
  Predict future gain based on current features.
SOLAR-V Project
• Prepare the data
  • Features: M x N matrix
  • Data samples along the row: N samples
  • Features along the column: M features
  • Given values in a row vector (1 x N): N values
  • Save “features” and “values” in a training MAT file
  • Save “features_test” and “values_test” in a testing MAT file
• How to call the function
  • Run “solar_v_main.m”
  • Enter the MAT file name and the number of layers in the command window.
  • Indicate whether any feature is more significant and, if so, how many times it should be repeated.
  • In the function, data will be scaled to 0~255; values are kept unchanged
  • The function determines how many neurons per layer; you decide how many layers
• Several figures will be generated
  • Prewired network structure
  • Self-organized network structure
  • Learning performance compared with Neural Network
  • Testing results from SOLAR-V compared with Neural Network
SOLAR-V Project
• Example: data from on-line control model
• 5 neurons per layer, 3 layers
Data (4 states, 1 action) and value:
0.0074 0.0082 0.0090 0.0115 0.0156 0.0213 0.0287 0.0377 0.0483…
-0.1366 -0.1548 -0.1739 -0.2220 -0.2995 -0.4067 -0.5443 -0.7131 -0.9139…
0.1312 0.0014 0.1318 0.2622 0.3928 0.5234 0.6542 0.7851 0.9160…
-0.1582 -0.0232 -0.1679 -0.3137 -0.4613 -0.6116 -0.7653 -0.9230 -1.0854…
-0.0074 0.0259 0.0143 0.0268 0.1006 0.2359 0.4482 0.6783 0.8436…
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8…