Fuzzy Inference System Learning By Reinforcement Presented by Alp Sardağ
A Comparison of Fuzzy & Classical Controllers • Fuzzy Controller: an expert system based on if-then rules whose premises and conclusions are expressed by means of linguistic terms. • Rules are close to natural language • Encodes a priori knowledge • Classical Controller: needs an analytical model of the task.
Design Problem of FC • Extracting a priori knowledge is not easy: • experts may disagree • a great number of variables may be necessary to solve the control task
Self-Tuning FIS • A direct teacher: based on an input-output set of training data. • A distal teacher: does not give the correct actions, but the desired effect on the process. • A performance measure: EA (evolutionary algorithm) methods. • A critic: gives rewards and punishments with respect to the state reached by the learner (RL methods).
Goal • To overcome the limitations of classical reinforcement learning methods: discrete state perception and discrete actions. NOTE: in this paper a MISO FIS is used.
A MIMO FIS • A FIS is made of N rules of the following form: Ri: if S1 is Li1 and … and SNi is LiNi then Y1 is Oi1, …, YNo is OiNo • Ri: the ith rule of the rule base • Sj: the input variables • Lij: the linguistic term of input variable Sj, with membership function μLij • Yj: the output variables • Oij: the linguistic term (conclusion) for output variable Yj
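As a concrete illustration, a rule of this form can be held in a small data structure. This is a minimal sketch; the names Rule, antecedents, and conclusions are illustrative, not from the paper:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Rule:
    antecedents: List[int]    # index of the linguistic term L_ij chosen for each input S_j
    conclusions: List[float]  # one crisp conclusion o_ij per output variable Y_j
```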
Rule Preconditions • Membership functions are triangles and trapezoids (although not differentiable): • because they are simple • because they are sufficient in a number of applications • A strong fuzzy partition is used: • all input values activate at least one fuzzy set, so the input universe is completely covered • no more than two fuzzy sets are activated for any input value (see the sketch below)
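A minimal sketch of such a partition, assuming triangular sets with trapezoidal shoulders at the borders of the universe (the function name and the NumPy implementation are mine):

```python
import numpy as np

def triangular_partition(centers, x):
    """Membership degrees of x in a strong fuzzy partition built from
    triangles centered at the sorted `centers`; the two border sets
    saturate at 1 (trapezoidal shoulders). The degrees always sum to 1,
    and at most two neighbouring sets are active, as the slide states."""
    c = np.asarray(centers, dtype=float)
    mu = np.zeros(len(c))
    if x <= c[0]:
        mu[0] = 1.0
    elif x >= c[-1]:
        mu[-1] = 1.0
    else:
        k = np.searchsorted(c, x) - 1        # x lies in [c[k], c[k+1])
        t = (x - c[k]) / (c[k + 1] - c[k])   # linear interpolation weight
        mu[k], mu[k + 1] = 1.0 - t, t
    return mu
```

For example, triangular_partition([-1, 0, 1], 0.25) yields [0.0, 0.75, 0.25], which sums to 1 and activates exactly two neighbouring sets.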
Rule Conclusions • Each rule Ri has NO conclusions, one per output variable Yj. • For each rule, the truth value with respect to the input vector S is computed with a T-norm implemented by a product: αi(S) = ∏j μLij(Sj) • The FIS outputs are the activation-weighted sums of the rule conclusions: Yj = Σi αi(S) oij / Σi αi(S) (with a strong fuzzy partition, Σi αi(S) = 1)
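Continuing the sketch (and reusing the hypothetical Rule class above), the truth values and outputs could be computed as follows:

```python
import numpy as np

def infer(rules, input_memberships):
    """Inference sketch: alpha_i is the product T-norm of the membership
    degrees selected by rule i's antecedents; the outputs are the
    activation-weighted sums of the crisp conclusions o_ij."""
    alphas = np.array([
        np.prod([input_memberships[j][term] for j, term in enumerate(r.antecedents)])
        for r in rules
    ])
    conclusions = np.array([r.conclusions for r in rules])  # shape (N, N_O)
    return alphas @ conclusions / alphas.sum(), alphas
```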
Learning • The number and positions of the input fuzzy labels are set using a priori knowledge. • Structural learning consists in tuning the number of rules. • FACL and FQL are reinforcement learning methods that deal only with the conclusion part.
Reinforcement Learning • NOTE: state observability is total; the agent fully observes the environment state.
Markovian Decision Problem • S: a finite discrete state set • U: a finite discrete action set • R: primary reinforcements, R: S × U → ℝ • P: transition probabilities, P: S × U × S → [0, 1] • State evaluation function: the expected discounted sum of future reinforcements (see below)
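The slide's formula image did not survive extraction; the state evaluation function it refers to is the standard discounted expected return:

```latex
V^{\pi}(s) \;=\; \mathbb{E}_{\pi}\!\left[\,\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_{0}=s \right], \qquad 0 \le \gamma < 1 .
```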
The Curse of Dimensionality • Some form of generalization must be incorporated in the state representation. Various function approximators are used: • CMAC • neural networks • FIS: the state-space encoding is based on the vector of rule truth values computed for the current state.
Adaptive Heuristic Critic • AHC is made of two components: • Adaptive Critic Element: the critic, developed in an adaptive way from primary reinforcements; it represents an evaluation function more informative than the one given by the environment through rewards and punishments (the V(S) values). • Associative Search Element: selects actions which lead to better critic values.
The Critic • At time step t, the critic value is computed with the conclusion vector v: Vt(St) = Σi αi(St) vi • The TD error is given by: εt = rt+1 + γ Vt(St+1) - Vt(St) • TD-learning update rule: vt+1 = vt + β εt Φt, where Φt is the vector of rule truth values αi(St)
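In code form, this is one TD(0) step over the conclusion vector (a minimal sketch; the variable names are mine):

```python
import numpy as np

def critic_step(v, phi_t, phi_t1, r, gamma, beta):
    """One TD(0) critic step over the conclusion vector v:
         V_t(S)  = phi . v
         eps_t   = r_{t+1} + gamma * V_t(S_{t+1}) - V_t(S_t)
         v_{t+1} = v_t + beta * eps_t * phi_t
    """
    td_error = r + gamma * np.dot(phi_t1, v) - np.dot(phi_t, v)
    return v + beta * td_error * phi_t, td_error
```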
The Actor • When rule Ri is activated, one of Ri's local actions is elected to participate in the global action, based on its quality. The global action triggered is: Ut(St) = Σi αi(St) · ε-greedy(wi), where ε-greedy is a function implementing a mixed exploration-exploitation strategy.
Tuning vector w • The TD error is the improvement measure: except in the beginning, the critic is a good approximator of the optimal evaluation function. The actor learning rule reinforces the qualities of the elected local actions in proportion to the TD error and the rule truth values (sketched below).
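A sketch of one actor step under these assumptions: w[i, a] holds the quality of discrete action a in rule i, each activated rule elects its local action ε-greedily, and the elected qualities are then reinforced by the TD error weighted by the rule activations. This update form is my reconstruction, not quoted from the paper:

```python
import numpy as np

def actor_step(w, phi, actions, td_error, eta, epsilon, rng):
    """Elect one local action per rule (epsilon-greedy), mix them into the
    global action weighted by truth values, then reinforce the elected
    qualities with the TD error."""
    n_rules = w.shape[0]
    explore = rng.random(n_rules) < epsilon
    elected = np.where(explore,
                       rng.integers(0, w.shape[1], size=n_rules),  # random local action
                       np.argmax(w, axis=1))                       # greedy local action
    u_global = np.dot(phi, actions[elected])   # activation-weighted global action
    w = w.copy()
    w[np.arange(n_rules), elected] += eta * td_error * phi
    return u_global, w
```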
Meta-Learning Rule • Update strategy for the learning rates: • every parameter should have its own learning rate (ηi, i = 1…n) • every learning rate should be allowed to vary over time (in order for the V values to converge) • when the derivative of a parameter keeps the same sign for several consecutive time steps, its learning rate should be increased • when the derivative sign of a parameter alternates for several consecutive time steps, its learning rate should be decreased • These heuristics are implemented by the Delta-Bar-Delta rule (sketched below).
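A sketch of the Delta-Bar-Delta rule (Jacobs, 1988) as the slide describes it; the constants kappa, phi, and theta are typical values, not taken from the slides:

```python
import numpy as np

def delta_bar_delta(lr, delta, delta_bar, kappa=0.01, phi=0.1, theta=0.7):
    """Per-parameter learning rates grow additively by kappa when the
    parameter derivative keeps its sign (delta * delta_bar > 0) and shrink
    multiplicatively by phi when the sign alternates."""
    prod = delta * delta_bar
    lr = np.where(prod > 0, lr + kappa,
                  np.where(prod < 0, lr * (1 - phi), lr))
    delta_bar = (1 - theta) * delta + theta * delta_bar  # exponential trace of deltas
    return lr, delta_bar
```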
Execution Procedure • Estimation of the evaluation function corresponding to the current state. • Computation of the TD error. • Tuning of parameter vectors v and w. • Estimation of the new evaluation function for the current state with the new conclusion vector vt+1. • Learning-rate updating with the Delta-Bar-Delta rule. • For each activated rule, election of the local action; computation and triggering of the global action Ut+1. • A schematic loop combining these steps follows below.
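Putting the earlier sketches together, a schematic training loop might look as follows. The environment is replaced by dummy stand-ins (random activations and a toy reward) and the Delta-Bar-Delta step is omitted for brevity, so everything beyond the update equations is illustrative:

```python
import numpy as np

N_RULES, N_ACTIONS = 25, 5
ACTIONS = np.linspace(-1.0, 1.0, N_ACTIONS)  # shared discrete action set
GAMMA, BETA, ETA, EPS = 0.95, 0.05, 0.1, 0.1
rng = np.random.default_rng(0)

v = np.zeros(N_RULES)                  # critic conclusion vector
w = np.zeros((N_RULES, N_ACTIONS))     # actor local-action qualities
phi = rng.dirichlet(np.ones(N_RULES))  # stand-in truth-value vector (sums to 1)
td_error = 0.0
for t in range(1000):
    u, w = actor_step(w, phi, ACTIONS, td_error, ETA, EPS, rng)
    r = -abs(u)                                  # toy reward: prefer actions near 0
    phi_next = rng.dirichlet(np.ones(N_RULES))   # stand-in next-state activations
    v, td_error = critic_step(v, phi, phi_next, r, GAMMA, BETA)
    phi = phi_next
```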
Example Cont. • The number of rules is twenty-five. • For the sake of simplicity, the discrete actions available are the same for all rules. • The discrete action set: (formula not recovered from the slide) • The reinforcement function: (formula not recovered from the slide)
Results • Performance measure for distance: (formula not recovered from the slide) • Results: (plots not recovered from the slide)