Learning Paradigms and General Aspects of Learning
• Different Forms of Learning:
  • Learning agent receives feedback with respect to its actions (e.g. from a teacher)
    • Supervised Learning: feedback is received with respect to all possible actions of the agent
    • Reinforcement Learning: feedback is received only with respect to the action the agent actually took
  • Unsupervised Learning: learning when there is no hint at all about the correct action
• Inductive Learning is a form of supervised learning that centers on learning a function from sets of training examples. Popular inductive learning techniques include decision trees, neural networks, nearest neighbor approaches, discriminant analysis, and regression.
• The performance of an inductive learning system is usually evaluated using n-fold cross-validation: the examples are split into n folds, and each fold serves once as the test set while the remaining n-1 folds are used for training.
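The n-fold cross-validation procedure mentioned above can be sketched as follows. This is a minimal illustration, not a reference implementation: the toy data set, the interleaved fold assignment, and the one-dimensional nearest-neighbor learner are all assumptions chosen to keep the example short.

```python
def n_fold_splits(num_examples, n):
    """Yield (train_indices, test_indices) for each of the n folds."""
    indices = list(range(num_examples))
    for k in range(n):
        test = indices[k::n]  # every n-th example, starting at k (assumed scheme)
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def nearest_neighbor_predict(train_x, train_y, query):
    """Return the label of the closest training example (1-NN)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[best]


def cross_validate(xs, ys, n):
    """Return the mean test accuracy of 1-NN over n folds."""
    accuracies = []
    for train, test in n_fold_splits(len(xs), n):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        correct = sum(
            nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]
            for i in test
        )
        accuracies.append(correct / len(test))
    return sum(accuracies) / n


# Toy data (hypothetical): the label is 1 when x >= 5, else 0.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
mean_accuracy = cross_validate(xs, ys, n=5)
print(f"mean accuracy over 5 folds: {mean_accuracy:.2f}")
```

Every example is tested exactly once, and never by a model that saw it during training, which is the point of the procedure.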
Classifier Systems
• According to Goldberg [113], a classifier system is “a machine learning system that learns syntactically simple string rules to guide its performance in an arbitrary environment”.
• A classifier system consists of three main components:
  • Rule and message system
  • Apportionment of credit system
  • Genetic algorithm (for evolving classifiers)
• First implemented in a system called CS1 by Holland and Reitman (1978).
• Examples of classifier rules (condition:message, where # is a "don't care" symbol that matches either 0 or 1):
  00##:0000
  00#0:1100
  11##:1000
  ##00:0001
• The fitness of a classifier is defined by its surrounding environment, which pays payoff to classifiers and extracts fees from classifiers.
• Classifier systems employ a Michigan approach (populations consist of single rules) in the context of an externally defined fitness function.
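To make the rule syntax concrete, the following sketch matches messages against the four example rules above. It assumes the usual reading of the notation: the part before the colon is the condition, `#` matches either bit, and the part after the colon is what the rule posts. The function names are illustrative only.

```python
RULES = ["00##:0000", "00#0:1100", "11##:1000", "##00:0001"]


def parse_rule(text):
    """Split 'condition:message' into its two parts."""
    condition, message = text.split(":")
    return condition, message


def matches(condition, message):
    """A condition matches if every position is '#' or equals the message bit."""
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message)
    )


def match_set(rules, message):
    """Return the (condition, message) pairs whose condition matches."""
    parsed = [parse_rule(rule) for rule in rules]
    return [(cond, out) for cond, out in parsed if matches(cond, message)]


for incoming in ["0000", "1100", "0011"]:
    print(incoming, "->", match_set(RULES, incoming))
```

Several rules can match the same message, which is why a classifier system needs a mechanism such as an auction to decide which rule gets to act.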
Challenges in Developing Michigan-style Classifier Systems
• We need a set of rules that can solve problems collaboratively, comparable to assembling a good soccer or baseball team.
• We want a set of rules that covers all the important situations, not a set of rules that can only handle very specialized situations; coverage is an important issue!
• Delayed rewards pose particular problems.
• Only rules responsible for a chosen decision should be rewarded/penalized.
• ‘Lazy’ and inactive rules need to be removed.
• Rules whose reward behavior is predictable are preferable over rules whose reward behavior is harder to predict (cf. prediction error and experience in XCS).
Example: [Figure: rules r1 to r6 and a reward of +5]
Bucket Brigade Algorithm
• Developed by Holland for the apportionment of credit; it relies on the model of a service economy consisting of two main components: an auction and a clearing house.
• The environment as well as the classifiers post messages.
• Each classifier maintains a bank account that measures its strength. Classifiers that match a posted string make a bid proportional to their strength. Usually, the highest bidding classifier is selected to post its message (other, more parallel schemes are also used).
• The auction permits appropriate classifiers to post their messages. Once a classifier is selected for activation, it must clear its payments through the clearing house, paying its bid to other classifiers or to the environment for the matching messages rendered. A matched and activated classifier sends its bid to those classifiers responsible for sending the messages that matched the bidding classifier's conditions. The bid money is distributed in some manner among those classifiers.
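One auction round with clearing-house payment, as described above, might look like the sketch below. Several details are assumptions made for illustration: the bid coefficient `C_BID`, the equal split of the bid among the senders, the single-winner scheme, and the three hypothetical classifiers. A payment to the environment is modelled simply as money leaving the system.

```python
from dataclasses import dataclass

C_BID = 0.1  # assumed bid coefficient; the value is illustrative


@dataclass
class Classifier:
    name: str
    condition: str   # string over {0, 1, #}
    message: str     # message posted when the classifier is activated
    strength: float  # the classifier's "bank account"


def matches(condition, message):
    return len(condition) == len(message) and all(
        c == "#" or c == m for c, m in zip(condition, message)
    )


def run_auction(classifiers, posted, senders):
    """Select the highest bidder for `posted` and clear its payment.

    `senders` are the classifiers that posted the message; an empty list
    means the message came from the environment.
    """
    bidders = [c for c in classifiers if matches(c.condition, posted)]
    if not bidders:
        return None
    winner = max(bidders, key=lambda c: C_BID * c.strength)
    bid = C_BID * winner.strength
    winner.strength -= bid                     # winner pays its bid
    for sender in senders:                     # equal split is an assumption
        sender.strength += bid / len(senders)
    return winner


r1 = Classifier("r1", "00##", "1100", 100.0)
r2 = Classifier("r2", "11##", "1000", 50.0)
r3 = Classifier("r3", "##00", "0001", 80.0)
population = [r1, r2, r3]

# Round 1: the environment posts 0011; only r1 matches and wins.
first = run_auction(population, "0011", senders=[])
# Round 2: r1's message 1100 is posted; r2 and r3 match, r3 bids more.
second = run_auction(population, first.message, senders=[first])
print(first.name, second.name, [round(c.strength, 2) for c in population])
```

In round 2 the winner's bid flows back to r1, the classifier whose message made the activation possible; this backward flow of payments is what lets credit propagate along a chain of rules.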
Bucket Brigade (continued)
• Rules that cooperate with a classifier are rewarded by receiving that classifier's bid: the last classifier in a chain receives the environmental reward, and every other classifier receives its reward from its successor in the chain.
• A classifier's strength might be subject to taxation. The idea that underlies taxation is to punish inactive classifiers: $T_i(t) := c_{tax} S_i(t)$
• The strength of a classifier is updated using the following equation, where $P_i(t)$ is the payment (the bid paid when the classifier is activated) and $R_i(t)$ is the reward received: $S_i(t+1) = S_i(t) - P_i(t) - T_i(t) + R_i(t)$
• A classifier bids proportionally to its strength: $B_i = c_{bid} S_i$
• Genetic algorithms are used to evolve classifiers. A classifier's strength defines its fitness; fitter classifiers reproduce with higher probability (e.g. roulette wheel selection might be employed), and binary string mutation and crossover operators are used to generate new classifiers. Newly generated classifiers replace weak, low-strength classifiers (other schemes such as crowding could also be employed).
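The strength update and the strength-proportional (roulette wheel) selection described on this slide can be sketched as follows. The coefficient values, the three-rule chain, the reward of 50, and the assumption that all payments in the chain are settled in one time step are illustrative choices, not part of the original material.

```python
import random

C_BID = 0.1   # assumed value of c_bid
C_TAX = 0.01  # assumed value of c_tax


def update_strength(strength, active, receipts):
    """S(t+1) = S(t) - P(t) - T(t) + R(t)."""
    payment = C_BID * strength if active else 0.0  # P(t): bid, paid only if activated
    tax = C_TAX * strength                         # T(t): paid by every classifier
    return strength - payment - tax + receipts


def roulette_select(strengths, rng):
    """Pick an index with probability proportional to strength."""
    return rng.choices(range(len(strengths)), weights=strengths, k=1)[0]


# Chain r1 -> r2 -> r3: each rule receives the bid of the rule activated
# after it; the last rule receives the environmental reward.
old = {"r1": 100.0, "r2": 100.0, "r3": 100.0, "idle": 100.0}
new = {
    "r1": update_strength(old["r1"], True, receipts=C_BID * old["r2"]),
    "r2": update_strength(old["r2"], True, receipts=C_BID * old["r3"]),
    "r3": update_strength(old["r3"], True, receipts=50.0),
    "idle": update_strength(old["idle"], False, receipts=0.0),
}
print(new)

# An inactive rule only pays tax, so its strength decays geometrically.
idle_strength = 100.0
for _ in range(10):
    idle_strength = update_strength(idle_strength, False, receipts=0.0)

rng = random.Random(0)
parent = roulette_select(list(new.values()), rng)
```

Rules in the middle of a rewarded chain roughly break even, the rule that collects the reward gains strength, and an idle rule loses a fixed fraction of its strength per step and so becomes ever less likely to be selected for reproduction.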
Pittsburgh-style Systems
• Populations consist of rule-sets, not of individual rules.
• No bucket brigade algorithm is necessary.
• Mechanisms to evaluate individual rules are usually missing.
• Michigan-style systems are geared towards applications with dynamically changing requirements (“models of adaptation”); Pitt-style systems rely on more static environments and assume a fixed fitness function for rule-sets, which is not necessary in the Michigan approach.
• Pittsburgh approach systems usually have to cope with variable-length chromosomes.
• Popular Pittsburgh-style systems include:
  • Smith’s LS-1 system (learns symbolic rule-sets)
  • Janikow’s GIL system (learns symbolic rules; employs operators of Michalski’s inductive learning theory as its genetic operators)
  • Giordana & Saitta’s REGAL (learns symbolic concept descriptions)
  • DELVAUX (learns (numerical) Bayesian rule-sets)
New Trends in Learning Classifier Systems (LCS)
• Holland-style LCS work is very similar to work in reinforcement learning, especially Evolutionary Reinforcement Learning and an approach called “Q-Learning”. Newer papers claim that “bucket brigade” and “Q-Learning” are basically the same thing, and that LCS can benefit from recent advances in the area of Q-learning.
• Wilson’s accuracy-based XCS has received significant attention in the literature (to be covered later).
• Holland stresses the adaptive component of “his invention” in his newer work.
• Recently, many Pittsburgh-style systems have been designed that learn rule-based systems using evolutionary computing; they are quite different from Holland’s data-driven message-passing systems. Examples include:
  • Systems that learn Bayesian rules or Bayesian belief networks
  • Systems that learn fuzzy rules
  • Systems that learn first-order logic rules
  • Systems that learn PROLOG-style programs
• Work somewhat similar to classifier systems has become quite popular in the field of agent-based systems that have to learn how to communicate and collaborate in a distributed environment.
Important Parameters for XCS
XCS learns/maintains the following parameters for all its classifiers during the course of its operation:
• p is the expected payoff; combined with the rule’s fitness value, it strongly influences whether a matching classifier’s action is selected for execution.
• e is the error made in predicting the payoff.
• F (called fitness) denotes a classifier’s “normalized accuracy”; accuracy is the inverse of the degree of error made by a classifier. F combined with a determines which classifiers are chosen to be deleted from the population. The product F·p determines which actions of competing classifiers are selected for execution.
• a estimates the average size of the action sets this classifier has belonged to; the smaller a/F is, the less likely it becomes that this classifier is deleted.
• exp (experience) counts how often the classifier has belonged to the action set; it has some influence on the estimation of the other parameters: if exp is low, default values are used for the other parameters (especially e, F and a).
• Moreover, it is important to know that only classifiers belonging to the action set are considered for reproduction.
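How these parameters interact can be sketched as follows. This is a simplified illustration under stated assumptions, not Wilson's full algorithm: actions are compared by a fitness-weighted average of the payoff predictions p, and the deletion weight is taken to be directly proportional to a/F as the slide suggests. The full XCS deletion vote also involves further quantities not covered here. The class name, field layout, and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class XCSClassifier:
    condition: str
    action: str
    p: float   # expected payoff
    e: float   # prediction error
    F: float   # fitness (normalized accuracy), assumed > 0
    a: float   # estimate of the average action-set size
    exp: int   # experience


def prediction_array(match_set):
    """Fitness-weighted average payoff prediction for each action."""
    weighted, total = {}, {}
    for cl in match_set:
        weighted[cl.action] = weighted.get(cl.action, 0.0) + cl.F * cl.p
        total[cl.action] = total.get(cl.action, 0.0) + cl.F
    return {act: weighted[act] / total[act] for act in weighted}


def best_action(match_set):
    """Greedy choice: the action with the highest predicted payoff."""
    predictions = prediction_array(match_set)
    return max(predictions, key=predictions.get)


def deletion_weights(population):
    """Simplified deletion probabilities, proportional to a / F."""
    raw = [cl.a / cl.F for cl in population]
    norm = sum(raw)
    return [r / norm for r in raw]


match_set = [
    XCSClassifier("0#", "0", p=1000.0, e=5.0, F=0.9, a=10.0, exp=40),
    XCSClassifier("00", "0", p=200.0, e=300.0, F=0.1, a=10.0, exp=3),
    XCSClassifier("#0", "1", p=500.0, e=50.0, F=0.5, a=5.0, exp=25),
]
predictions = prediction_array(match_set)
chosen = best_action(match_set)
weights = deletion_weights(match_set)
print(predictions, chosen, [round(w, 3) for w in weights])
```

The inaccurate second classifier barely affects the prediction for action 0 because of its low fitness, and the same low fitness gives it by far the largest deletion weight.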