EC: Lecture 17: Classifier Systems. Ata Kaban University of Birmingham. Classifier Systems. Rule based systems with input and output interface. classifiers=rules: IF cond AND cond … THEN action. ternary strings (0,1,#). Classifier List. Input Interface. Message List. Output Interface.
EC: Lecture 17: Classifier Systems Ata Kaban University of Birmingham
Classifier Systems
• Rule-based systems with an input and an output interface
• Classifiers = rules: IF cond AND cond … THEN action, encoded as ternary strings over (0,1,#)
• Message list = database of messages = facts = binary strings
• Components (labels of the architecture diagram): Input Interface, Message List, Classifier List, Output Interface
The message list
• Finite memory (database) of messages, or facts
• Messages are binary strings of fixed length
• The list may or may not be visible to the external world
• Messages can arrive from several modules:
  • the input interface
  • the classifiers
• Messages can be deleted by the output interface (after it has processed them)
The classifier list
• Finite-size rule base
• Classifier = production rule here: IF cond1 AND cond2 … AND condN THEN action
• Represented by a fixed number of words of the form cond1, cond2, … condN / action
• The number of conditions is fixed
• Conditions can contain NOT
• Conditions have a fixed length, equal to the length of the messages
• Conditions are strings over the alphabet {0,1,#}, where '#' = "don't care"
• Conditions are matched against the messages in the message list (in terms of Hamming distance)
Matching
• A non-negated condition is satisfied if there is at least one message in the message list whose Hamming distance from the condition string is 0, disregarding '#' characters
• E.g. message list: a: 0101, b: 1010, c: 1111; conditions:
  • 0101: satisfied (by a)
  • 1101: not satisfied
  • #101: satisfied (by a)
  • 1###: satisfied (by both b and c)
  • ##00: not satisfied
  • ####: satisfied (by a, b and c)
• Conditions with many '#' tend to be unspecific
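The matching rule on this slide can be sketched in a few lines of Python. The function names `matches` and `satisfied` are illustrative choices and do not come from the lecture; the message list and conditions are those of the example.

```python
def matches(condition, message):
    """True if the message agrees with the condition at every non-'#' position."""
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message)
    )


def satisfied(condition, message_list):
    """A non-negated condition is satisfied if at least one message matches it."""
    return any(matches(condition, m) for m in message_list)


message_list = ["0101", "1010", "1111"]  # messages a, b, c from the example
for cond in ["0101", "1101", "#101", "1###", "##00", "####"]:
    print(cond, "satisfied" if satisfied(cond, message_list) else "not satisfied")
```

Comparing position by position while skipping '#' is the same as requiring a Hamming distance of 0 on the non-'#' positions.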
Matching
• A negated condition is satisfied if no message in the message list matches it
• E.g. message list: a: 0101, b: 1010, c: 1111; conditions:
  • ~0101: not satisfied (matched by a)
  • ~1101: satisfied
  • ~#101: not satisfied (a)
  • ~1###: not satisfied (b, c)
  • ~##00: satisfied
  • ~####: not satisfied (a, b, c)
• Negated conditions with many '#'s tend to be very specific (i.e. they are satisfied in few situations, because many messages can falsify them)
• A condition with M '#' symbols is matched by 2^M of the 2^L possible messages, L being the message length; its negation is falsified whenever any one of those 2^M messages is in the list
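A small Python sketch of negated conditions, assuming a leading '~' marks negation as in the example. The helper names are hypothetical. A brute-force count is included so that the number of messages matched by a given condition can be checked directly.

```python
from itertools import product


def matches(condition, message):
    """True if the message agrees with the condition at every non-'#' position."""
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message)
    )


def satisfied(condition, message_list):
    """Evaluate a condition; a leading '~' (an assumed notation) negates it."""
    if condition.startswith("~"):
        return not any(matches(condition[1:], m) for m in message_list)
    return any(matches(condition, m) for m in message_list)


def count_matching(condition):
    """Count, by enumeration, the binary messages that match a condition."""
    all_messages = ("".join(bits) for bits in product("01", repeat=len(condition)))
    return sum(1 for m in all_messages if matches(condition, m))


message_list = ["0101", "1010", "1111"]  # messages a, b, c from the example
for cond in ["~0101", "~1101", "~#101", "~1###", "~##00", "~####"]:
    print(cond, "satisfied" if satisfied(cond, message_list) else "not satisfied")
print(count_matching("1###"))  # enumerates all 4-bit messages
```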
Actions
• Strings of fixed length over {0,1,#}
• When the conditions of a classifier are satisfied, the classifier is activated and a message is built as follows:
  • 0's and 1's in the action string are copied into the message
  • #'s are replaced by the corresponding characters of the message that matches the first condition of the classifier
E.g. actions
• Message list: a: 0101, b: 1010, c: 1111
• Classifier list:
  • i: #11#, ~#110 / 00##
  • ii: ###1, ~#110 / ###0
  • iii: ##1#, ~1110 / 0##0
• The following set of messages will be produced:
  • 0011 (posted by i, c matches cond1)
  • 0100 (posted by ii, a matches cond1)
  • 1110 (posted by ii, c matches cond1)
  • 0010 (posted by iii, b matches cond1)
  • 0110 (posted by iii, c matches cond1)
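The example above can be reproduced with a short Python sketch. The tuple layout (first condition, negated second condition, action) and the names `build_message` and `fire` are assumptions made for illustration only.

```python
def matches(condition, message):
    """True if the message agrees with the condition at every non-'#' position."""
    return all(c == "#" or c == m for c, m in zip(condition, message))


def build_message(action, message):
    """Copy 0/1 from the action; fill each '#' from the matching message."""
    return "".join(m if a == "#" else a for a, m in zip(action, message))


def fire(classifier, message_list):
    """Return the messages posted by one classifier (empty if it is not active)."""
    cond1, negated_cond2, action = classifier
    # The second condition is negated: no message in the list may match it.
    if any(matches(negated_cond2, m) for m in message_list):
        return []
    # One message is built per message that matches the first condition.
    return [build_message(action, m) for m in message_list if matches(cond1, m)]


message_list = ["0101", "1010", "1111"]  # a, b, c
classifiers = {
    "i": ("#11#", "#110", "00##"),
    "ii": ("###1", "#110", "###0"),
    "iii": ("##1#", "1110", "0##0"),
}
new_messages = [msg for c in classifiers.values() for msg in fire(c, message_list)]
print(new_messages)  # ['0011', '0100', '1110', '0010', '0110']
```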
Many classifiers can be activated in parallel! • Conflict resolution is only necessary if the active classifiers can produce more messages than entries in the message list
The Input Interface
• Initially the message list (database) is empty
• The input interface is the mechanism by which the CFS obtains information about the environment, through messages
• These messages are often descriptions of the state of a set of (binary) detectors that sense various features of the environment
• E.g. they can indicate the position of a robot controlled by a CFS with respect to some obstacles in the environment
The Output Interface
• The output interface is a mechanism for using (and deleting) some of the messages in the message list
• These messages usually represent (and control) the state of a set of effectors which act on the environment, e.g. they control the actions of a robot
• Must distinguish between output messages and other messages!
• Types of messages:
  • Input messages (posted by the input interface)
  • Internal messages (posted by classifiers, to be matched and processed later by other classifiers)
  • Messages meant to be output messages
• A tag can be used to distinguish between the types of messages
E.g.
• A CFS which has to control a robot might have some classifiers devoted to obstacle avoidance:
  • 1####### / 00 0100 00
  • ##1##### / 00 0001 00
  • ####1### / 00 1000 00
  • ######1# / 00 0010 00
• Labels in the original figure: tag; conditions (any obstacles left, right, up, down?); action
The CFS Main Cycle
• The blocks indicated in the basic CFS diagram are activated according to the following main cycle:
  • Activate the input interface and post the input messages it generates to the message list
  • Match all the conditions of all classifiers against the message list
  • Activate the classifiers whose conditions are satisfied and add their messages to the message list
  • Activate the output interface, i.e. remove the output messages from the message list and perform the actions they describe
  • Repeat the previous steps
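One pass of the main cycle might look like the Python sketch below. Several things here are assumptions rather than part of the lecture: the callbacks `read_inputs` and `is_output`, the `capacity` parameter, the convention that a leading '1' tags an output message, the restriction to non-negated conditions, and the crude truncation used in place of conflict resolution when the list is full.

```python
def matches(condition, message):
    """True if the message agrees with the condition at every non-'#' position."""
    return all(c == "#" or c == m for c, m in zip(condition, message))


def build_message(action, message):
    """Copy 0/1 from the action; fill each '#' from the matching message."""
    return "".join(m if a == "#" else a for a, m in zip(action, message))


def run_cycle(message_list, classifiers, read_inputs, is_output, capacity=16):
    """Run one main cycle; return (remaining messages, output messages)."""
    # Step 1: the input interface posts its messages.
    message_list = message_list + read_inputs()
    # Steps 2 and 3: match every condition, then let active classifiers post.
    posted = []
    for conditions, action in classifiers:
        first_matches = [m for m in message_list if matches(conditions[0], m)]
        others_ok = all(
            any(matches(c, m) for m in message_list) for c in conditions[1:]
        )
        if first_matches and others_ok:
            posted += [build_message(action, m) for m in first_matches]
    # Assumed stand-in for conflict resolution: drop what does not fit.
    message_list = (message_list + posted)[:capacity]
    # Step 4: the output interface removes and returns the output messages.
    outputs = [m for m in message_list if is_output(m)]
    remaining = [m for m in message_list if not is_output(m)]
    return remaining, outputs


classifiers = [(["0#0#"], "1###")]  # hypothetical rule: tag matching inputs as output
remaining, outputs = run_cycle(
    [], classifiers, lambda: ["0101"], lambda m: m.startswith("1")
)
print(remaining, outputs)  # ['0101'] ['1101']
```

Calling `run_cycle` in a loop, feeding `remaining` back in as the message list, gives the "repeat the previous steps" part of the cycle.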
What is Missing?
• It has been proved that, given repetition of the main cycle, the use of '#' characters and at least 2 conditions per classifier, a CFS is capable of arbitrarily complex computations and of representing structures like stacks, lists, etc.
• CFSs can be programmed to do useful things
• BUT: they are quite difficult to program
• Need to add adaptation mechanisms to the basic architecture, so that they can learn to behave appropriately in the environment or to perform useful tasks
The Need for Competition • Some classifiers in the system are better than others • i.e. the messages they produce lead to better actions • E.g. for an autonomous agent a good action is the action that leads to getting some food. • In its basic form, a CFS gives the same chances to all classifiers to post their messages • We would like to prioritize good classifiers • Classifiers should compete to post their messages! • Based on some measure of quality
Quality of classifiers
• Usefulness, or strength: the capability of determining a good performance of the whole system
• Relevance to a particular situation, or specificity: (L - number of '#' symbols)/L, where L is the length of a message
• Classifiers that match a particular message (or only a few messages) are 'specialists' for that kind of situation; these are preferred over classifiers that match many situations and therefore provide a kind of default behaviour for the system
• This measure is needed to ensure the competition
Quality of classifiers
• Bid: strength and specificity combined: Bid = const * strength * specificity (const is usually approximately 1/10)
• Notes:
  • To maintain parallelism, we must allow for more than one winner in the competition
  • To avoid premature convergence, we use probabilistic competition, with bid-proportionate winning probability
  • For a given classifier, specificity is a constant, so only the strength can be varied to influence the behaviour of a CFS
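The specificity and bid formulas, together with a bid-proportionate competition with several winners, can be sketched in Python as follows. The function names, the single-condition simplification, the sampling-without-replacement scheme and the example strengths are all assumptions made for illustration.

```python
import random


def specificity(condition):
    """(L - number of '#' symbols) / L for a single condition of length L."""
    return (len(condition) - condition.count("#")) / len(condition)


def bid(strength, condition, const=0.1):
    """Bid = const * strength * specificity."""
    return const * strength * specificity(condition)


def pick_winners(bids, n_winners, rng):
    """Draw up to n_winners distinct classifiers, each draw proportional to bid."""
    candidates = dict(bids)
    winners = []
    while candidates and len(winners) < n_winners:
        names = list(candidates)
        weights = [candidates[n] for n in names]
        if sum(weights) <= 0:  # only zero bids are left; stop drawing
            break
        choice = rng.choices(names, weights=weights, k=1)[0]
        winners.append(choice)
        del candidates[choice]
    return winners


# Hypothetical classifiers: name -> (strength, condition)
population = {"A": (10.0, "1###"), "B": (10.0, "11#0"), "C": (20.0, "0101")}
bids = {name: bid(s, cond) for name, (s, cond) in population.items()}
print(bids)
print(pick_winners(bids, n_winners=2, rng=random.Random(0)))
```

Allowing more than one winner keeps the parallelism of the basic system, while the random draw still gives weak classifiers an occasional chance.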
How to adapt?
• By credit assignment
• Information about the quality of the behaviour of the system comes from the environment in the form of rewards (e.g. +1, -1)
• Need to decide which classifiers are responsible, and to what extent, for the good or bad overall behaviour

The Bucket Brigade Algorithm
• If there is a reward (or punishment), add it to the strength of all classifiers active in the current main cycle
• Make each active classifier pay its bid to the classifiers that prepared the stage for it (i.e. those that posted the messages matched by its conditions)
• In time, strength is propagated backwards and each classifier receives the correct share of credit for the good (or bad) behaviour of the system as a whole
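A single bucket brigade update might be sketched as below. The data layout is hypothetical: `strength` maps classifier names to strengths, `active_bids` holds the bids of the classifiers active in this cycle, and `suppliers` lists, for each active classifier, the classifiers that posted the messages it matched. Splitting a bid equally among several suppliers, and simply deducting it when there are none (for example when the matched message came from the input interface), are assumptions of this sketch.

```python
def bucket_brigade_step(strength, active_bids, suppliers, reward=0.0):
    """Return the updated strengths after one cycle of credit assignment."""
    updated = dict(strength)
    # A reward (or punishment, if negative) goes to every active classifier.
    for name in active_bids:
        updated[name] += reward
    # Each active classifier pays its bid to those that set the stage for it.
    for name, amount in active_bids.items():
        updated[name] -= amount
        payees = suppliers.get(name, [])
        for payee in payees:
            updated[payee] += amount / len(payees)  # assumed equal split
    return updated


# Hypothetical situation: A posted a message last cycle that B matches now.
strength = {"A": 10.0, "B": 10.0}
updated = bucket_brigade_step(
    strength, active_bids={"B": 1.0}, suppliers={"B": ["A"]}, reward=2.0
)
print(updated)  # {'A': 11.0, 'B': 11.0}
```

Repeated over many cycles, payments of this kind pass strength backwards along chains of classifiers, from the one that earned the reward to those that enabled it.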
How to adapt?
• By rule discovery
• Discovery of optimum classifiers by means of genetic algorithms (GAs)
• GAs can be used in 2 ways:
  • Pittsburgh approach: the classifier list is considered as a single individual, whose chromosome is obtained by concatenating the conditions and actions of all classifiers
    • Different sets of classifiers compete
    • The fitness of each CFS is determined by observing the system for some time
  • Michigan approach: each classifier is considered as a separate individual
    • Different classifiers compete or cooperate
    • Requires a fitness measure for each classifier; the bucket brigade algorithm can be used
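The difference between the two encodings can be illustrated with a small Python sketch. The helper names, the fixed word length of 4, the layout of two conditions plus one action per classifier, and the choice of one-point crossover are assumptions for illustration; negation flags are taken to be fixed by position and are not encoded.

```python
WORD = 4           # assumed length of each condition and action string
WORDS_PER_RULE = 3  # assumed layout: cond1, cond2, action


def encode_pittsburgh(classifier_list):
    """One chromosome for the whole rule set: all words concatenated."""
    return "".join("".join(words) for words in classifier_list)


def decode_pittsburgh(chromosome):
    """Split a chromosome back into classifiers of WORDS_PER_RULE words."""
    words = [chromosome[i:i + WORD] for i in range(0, len(chromosome), WORD)]
    return [
        tuple(words[i:i + WORDS_PER_RULE])
        for i in range(0, len(words), WORDS_PER_RULE)
    ]


def encode_michigan(classifier_list):
    """One chromosome per classifier: the population is the rule set itself."""
    return ["".join(words) for words in classifier_list]


def one_point_crossover(parent1, parent2, point):
    """Swap the tails of two equal-length chromosomes at the given point."""
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]


rules = [("#11#", "#110", "00##"), ("###1", "#110", "###0")]
print(encode_pittsburgh(rules))  # a single individual of 24 symbols
print(encode_michigan(rules))    # two individuals of 12 symbols each
print(one_point_crossover("0000", "1111", 2))  # ('0011', '1100')
```

In the Pittsburgh encoding a genetic operator recombines whole rule sets, while in the Michigan encoding it recombines individual classifiers inside one rule set.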
Main Cycle of Learning Classifier Systems • Read input messages from sensors • Find the classifiers that are activated & select those with highest importance (fitness) • Reduce the fitness of the matching classifiers • Clear the message list • Use the matching classifiers • Evaluate the resulting behaviour • Punish or reward the classifiers that were active (reinforcement learning) • Generate a new population of classifiers with an evolutionary algorithm • iterate
Summary • LCS • Rule based system • Evolving LCS • Pitt approach: every individual contains all rules • Michigan approach: every individual contains a single rule • Fitness based on accuracy of strength value