NEW TIES WP2: Agent and learning mechanisms
Decision making and learning
• Agents have a controller (a decision tree, DQT)
• Input: the situation (as perceived, i.e. seen/heard/interpreted)
• Output: an action
• Decision making = using the DQT
• Learning = modifying the DQT
• Decisions also depend on inheritable “attitude genes” (learned through evolution)
[Figure: Example of a DQT. The root is a bias node (B) with two equally weighted branches (0.5 / 0.5). Each branch starts with a test node (T): one tests VISUAL: FRONT FOOD REACHABLE, the other BAG: FOOD, each with YES/NO edges. The leaves are action nodes (A) with genetic biases: EAT 1.0, MOVE 0.6, TURN LEFT 0.2, TURN RIGHT 0.2 under one test, and PICKUP 1.0, MOVE 0.6, TURN LEFT 0.2, TURN RIGHT 0.2 under the other. Legend: B = bias node, T = test node, A = action node; numbers on edges are genetic biases, YES/NO mark boolean choices.]
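The figure can be read as a small tree walk: a bias node picks one of its children stochastically, a test node branches on a perceived condition, and an action leaf returns the action to execute. Below is a minimal sketch of that traversal; all class and method names (ActionNode, TestNode, BiasNode, decide) are ours for illustration, not taken from the NEW TIES codebase, and the bias node here uses plain selection probabilities (the learned/genetic split is sketched after the next slide).

import random
from dataclasses import dataclass

@dataclass
class ActionNode:                # leaf: returns an action name
    action: str
    def decide(self, situation):
        return self.action

@dataclass
class TestNode:                  # branches on a perceived boolean concept
    concept: str                 # e.g. "see_food"
    yes: object
    no: object
    def decide(self, situation):
        child = self.yes if situation.get(self.concept, False) else self.no
        return child.decide(situation)

@dataclass
class BiasNode:                  # picks one child stochastically
    children: list
    weights: list                # per-edge selection probabilities
    def decide(self, situation):
        child = random.choices(self.children, weights=self.weights, k=1)[0]
        return child.decide(situation)

# Decision making = one traversal from the root for the current situation.
root = BiasNode(
    children=[
        TestNode("see_food", yes=ActionNode("eat"), no=ActionNode("move")),
        ActionNode("turn_left"),
    ],
    weights=[0.5, 0.5],
)
print(root.decide({"see_food": True}))   # "eat" or "turn_left" (stochastic)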
Interaction of evolution & individual learning
• A bias node has n children, each with a bias b_i
• Bias ≠ probability
• The learned bias b_i changes during lifetime (name: learned bias)
• The genetic bias g_i is inherited, part of the genome, and constant
• Actual probability of choosing child i: p(b_i, g_i) = b_i + (1 − b_i) · g_i
• Learned and inherited behaviour are linked through this formula
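A worked example of the combination formula: for each child, the learned bias b_i and genetic bias g_i combine as p = b_i + (1 − b_i) · g_i. Because the slide stresses that a bias is not a probability, the sketch below additionally normalises the combined values into a selection distribution; that normalisation step is our assumption, not stated in the source.

def combined(b, g):
    """Combine learned bias b with genetic bias g: p = b + (1 - b) * g."""
    return b + (1.0 - b) * g

# Two children with learned biases b_i and inherited genetic biases g_i.
learned = [0.5, 0.5]
genetic = [0.6, 0.2]

raw = [combined(b, g) for b, g in zip(learned, genetic)]   # [0.8, 0.6]
total = sum(raw)
probs = [r / total for r in raw]                           # assumed normalisation
print(probs)                                               # ~[0.571, 0.429]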
DQT nodes & parameters (cont’d)
• Test node language: native concepts + emerging concepts
• Native: see_agent, see_mother, see_food, have_food, see_mate, …
• New concepts can emerge by categorisation (discrimination game)
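A sketch of the extensible test-node vocabulary: native concepts are fixed predicates over the perceived situation, and new concepts can be added at runtime. The threshold-based categoriser below is only a toy stand-in for the discrimination game; the dictionary, feature names, and thresholds are all our assumptions.

# Native concepts: fixed predicates over the perceived situation (a dict).
CONCEPTS = {
    "see_agent": lambda s: s.get("agents_in_view", 0) > 0,
    "see_food":  lambda s: s.get("food_in_view", 0) > 0,
    "have_food": lambda s: s.get("bag_food", 0) > 0,
}

def add_emergent_concept(name, feature, threshold):
    """Toy stand-in for the discrimination game: register a new boolean
    concept that categorises situations by thresholding one feature."""
    CONCEPTS[name] = lambda s: s.get(feature, 0.0) > threshold

# An emerging concept becomes usable in test nodes like any native one.
add_emergent_concept("food_is_near", feature="food_proximity", threshold=0.7)
print(CONCEPTS["food_is_near"]({"food_proximity": 0.9}))   # True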
Learning: the heart of the emergence engine
• Evolutionary learning:
  • not within an agent (not during lifetime), but over generations
  • by variation + selection
• Individual learning:
  • within one agent, during its lifetime
  • by reinforcement learning
• Social learning:
  • during lifetime, between interacting agents
  • by sending/receiving + adopting knowledge pieces
Types of learning: properties
• Evolutionary learning:
  • The agent does not create new knowledge during its lifetime
  • The basic DQT + genetic biases are inheritable
  • “Knowledge creator” = crossover and mutation
• Individual learning:
  • The agent does create new knowledge during its lifetime
  • The DQT + learned biases are modified
  • “Knowledge creator” = reinforcement learning (driven by rewards); see the sketch below
  • Individually learnt knowledge dies with its host agent
• Social learning:
  • The agent imports knowledge already created elsewhere (new? not new?)
  • Adoption of imported knowledge ≈ crossover
  • Importing knowledge pieces can save effort for the recipient and can create novel combinations
  • Exporting knowledge helps its preservation after the death of its host
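The slides only state that individual learning uses reward-driven reinforcement learning to modify the learned biases; the exact update rule is not given. One plausible minimal rule, purely our assumption, nudges the learned bias of the chosen child toward 1 on positive reward and toward 0 on negative reward:

def update_learned_bias(b, reward, rate=0.1):
    """Nudge the chosen child's learned bias toward 1 (reward > 0)
    or toward 0 (reward < 0). Illustrative rule, not the NEW TIES one."""
    target = 1.0 if reward > 0 else 0.0
    return b + rate * (target - b)

b = 0.5
b = update_learned_bias(b, reward=+1.0)   # 0.55: reinforce the choice
b = update_learned_bias(b, reward=-1.0)   # 0.495: weaken it
print(b)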
Present status of the types of learning
• Evolutionary learning:
  • Demonstrated in 2 NEW TIES scenarios
  • Autonomous selection/reproduction causes problems with population stability (implosion/explosion)
• Individual learning:
  • Code exists, but never demonstrated in NEW TIES scenarios
• Social learning:
  • Under construction/design, based on the “telepathy” approach
  • Communication protocols + adoption mechanisms are needed
Evolution: variation operators
• Operators for the DQT:
  • Crossover = subtree swap
  • Mutation =
    • substitute a subtree with a random subtree
    • change concepts in test nodes
    • change the bias on an edge
• Operators for attitude genes (see the sketch below):
  • Crossover = whole arithmetic crossover
  • Mutation =
    • add Gaussian noise
    • replace with a random value
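The attitude-gene operators are standard real-valued EA operators. A sketch, assuming genes lie in [0, 1]; the mixing ratio, mutation rates, and noise width are chosen by us for illustration:

import random

def arithmetic_crossover(mom, dad, alpha=0.5):
    """Whole arithmetic crossover: child gene = alpha*mom + (1 - alpha)*dad."""
    return [alpha * m + (1 - alpha) * d for m, d in zip(mom, dad)]

def mutate(genes, p_gauss=0.8, sigma=0.1, lo=0.0, hi=1.0):
    """Per gene: either add Gaussian noise (clipped to the gene range)
    or replace the gene with a uniformly random value."""
    out = []
    for g in genes:
        if random.random() < p_gauss:
            g = min(hi, max(lo, g + random.gauss(0.0, sigma)))
        else:
            g = random.uniform(lo, hi)
        out.append(g)
    return out

child = mutate(arithmetic_crossover([0.2, 0.9], [0.6, 0.1]))
print(child)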
Evolution: selection operators
• Mate selection:
  • The mate action is chosen by the DQT
  • Propose – accept proposal
  • Both partners must be adults
• Survivor selection:
  • Dead if too old (≥ 80 years)
  • Dead if energy reaches zero
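Survivor selection reduces to a per-step predicate over age and energy; a minimal sketch, with the attribute names ours:

MAX_AGE = 80   # years; the slide's old-age threshold

def survives(agent):
    """An agent stays alive only below the age limit and with energy left."""
    return agent["age"] < MAX_AGE and agent["energy"] > 0.0

population = [
    {"age": 12, "energy": 5.0},   # survives
    {"age": 81, "energy": 9.0},   # dies: too old
    {"age": 30, "energy": 0.0},   # dies: zero energy
]
population = [a for a in population if survives(a)]
print(len(population))   # 1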
Experiment: Simple world
Setup: Environment
• World size: 200 × 200 grid cells
• Agents and food only (no tokens, roads, etc.); both are variable in number
• Initial distribution of agents (500): in the upper-left corner
• Initial distribution of food (10,000): 5,000 each in the upper-left and lower-right corners
Experiment: Simple world
Setup: Agents
• Native knowledge (concepts and DQT subtrees):
  • Navigating (random walk)
  • Eating (identify, pick up, and eat plants)
  • Mating (identify mates, propose/agree)
• Random DQT branches:
  • Differ per agent
  • Based on the “pool” of native concepts
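The two setup slides translate directly into a run configuration. A sketch as a plain dictionary; every key name is ours, only the values come from the slides:

SIMPLE_WORLD = {
    "grid_size": (200, 200),
    "initial_agents": 500,          # all start in the upper-left corner
    "initial_food": 10_000,         # 5,000 upper-left + 5,000 lower-right
    "native_knowledge": ["navigate", "eat", "mate"],
    "random_dqt_branches": True,    # differ per agent, from the native-concept pool
}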
Experiment: Simple world
The simulation was continued for 3 months of real time to test stability.
Experiment: Poisonous Food
Setup: Environment
• Two types of food: poisonous (decreases energy) and edible (increases energy)
• World size: 200 × 200 grid cells
• Agents and food only (no tokens, roads, etc.); both are variable in number
• Initial distribution of agents (500): uniformly random over the grid
• Initial distribution of food (10,000): 5,000 of each type, uniformly random over the same grid space as the agents
Experiment: Poisonous Food
Setup: Agents
• Native knowledge: identical to the simple-world experiment
• Additional native knowledge:
  • Agents can distinguish poisonous from edible plants
  • The relation with eating/picking up is not present
• No random DQT branches
Experiment: Poisonous Food
Measures
• Population size
• Welfare (energy)
• Number of poisonous and edible plants
• Complexity of the controller (number of nodes)
• Age
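A sketch of how these measures could be collected per timestep; the dataclass and all field and key names are ours, chosen only to mirror the slide's list:

from dataclasses import dataclass

@dataclass
class Measures:
    population_size: int
    mean_energy: float             # welfare
    n_poisonous_plants: int
    n_edible_plants: int
    mean_controller_nodes: float   # complexity: number of DQT nodes
    mean_age: float

def take_measures(agents, plants):
    """Snapshot the slide's measures for one timestep."""
    n = max(len(agents), 1)
    return Measures(
        population_size=len(agents),
        mean_energy=sum(a["energy"] for a in agents) / n,
        n_poisonous_plants=sum(1 for p in plants if p["poisonous"]),
        n_edible_plants=sum(1 for p in plants if not p["poisonous"]),
        mean_controller_nodes=sum(a["dqt_nodes"] for a in agents) / n,
        mean_age=sum(a["age"] for a in agents) / n,
    )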