NEW TIES WP2 Agent and learning mechanisms

  1. NEW TIES WP2 Agent and learning mechanisms

  2. Decision making and learning
  • Agents have a controller (a decision tree, the DQT)
  • Input: the situation as perceived (seen / heard / interpreted)
  • Output: an action
  • Decision making = using the DQT
  • Learning = modifying the DQT
  • Decisions also depend on inheritable “attitude genes” (learned through evolution)
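The slides do not show the controller's data structures, so the following is a minimal sketch of how a DQT could be represented and traversed: test nodes check a perceived concept, bias nodes choose stochastically among their children, and action leaves return the chosen action. All class, concept and action names are illustrative, not the actual NEW TIES implementation.

```python
import random

class Action:
    """Leaf node: returns the action name."""
    def __init__(self, name):
        self.name = name
    def decide(self, situation):
        return self.name

class Test:
    """Internal node: Boolean test on a concept in the perceived situation."""
    def __init__(self, concept, yes, no):
        self.concept, self.yes, self.no = concept, yes, no
    def decide(self, situation):
        branch = self.yes if situation.get(self.concept, False) else self.no
        return branch.decide(situation)

class Bias:
    """Internal node: stochastic choice among children, weighted by their biases."""
    def __init__(self, children, biases):
        self.children, self.biases = children, biases
    def decide(self, situation):
        child = random.choices(self.children, weights=self.biases, k=1)[0]
        return child.decide(situation)

# A small tree loosely following the slide-3 example (the exact branch layout is assumed):
dqt = Bias(
    children=[
        Test("front_food_reachable",
             yes=Bias([Action("eat"), Action("move"),
                       Action("turn_left"), Action("turn_right")],
                      [1.0, 0.6, 0.2, 0.2]),
             no=Action("move")),
        Test("bag_contains_food",
             yes=Action("eat"),
             no=Action("move")),
    ],
    biases=[0.5, 0.5])

situation = {"front_food_reachable": True, "bag_contains_food": False}
print(dqt.decide(situation))   # decision making = one stochastic traversal of the DQT
```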

  3. Example of a DQT (diagram). A bias node (B, biases 0.5 / 0.5) chooses between two test nodes (T): “VISUAL: FRONT FOOD REACHABLE” and “BAG: FOOD”. Their YES/NO branches lead to action nodes (A): one group EAT, MOVE, TURN LEFT, TURN RIGHT and one group PICKUP, MOVE, TURN LEFT, TURN RIGHT, each with biases 1.0, 0.6, 0.2, 0.2. Legend: B = bias node, T = test node, A = action node; edges carry a decision, a genetic bias (e.g. 0.2) and a Boolean choice (YES/NO).

  4. Interaction of evolution & individual learning
  • A bias node has n children, child i carrying a bias bᵢ
  • Bias ≠ probability
  • The bias bᵢ is learned and changes during lifetime (the “learned bias”)
  • The genetic bias gᵢ is inherited, part of the genome, and constant
  • Actual probability of choosing child i: p(bᵢ, gᵢ) = bᵢ + (1 − bᵢ) ∙ gᵢ
  • Learned and inherited behaviour are thus linked through this formula
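A small worked example of the formula. How the per-child values p(bᵢ, gᵢ) are turned into one selection distribution is not stated on the slide, so the normalisation step below is an assumption; the bias values are taken from the slide-3 example.

```python
# Combine learned and genetic bias for one child: p(b, g) = b + (1 - b) * g
def combined(b, g):
    return b + (1.0 - b) * g

learned = [1.0, 0.6, 0.2, 0.2]   # learned biases (change during lifetime)
genetic = [0.2, 0.2, 0.2, 0.2]   # genetic biases (inherited, constant)

weights = [combined(b, g) for b, g in zip(learned, genetic)]
probs = [w / sum(weights) for w in weights]   # assumed normalisation step
print(probs)   # the first child gets the largest share because its learned bias is 1.0
```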

  5. DQT nodes & parameters cont’d
  • Test node language: native concepts + emerging concepts
  • Native: see_agent, see_mother, see_food, have_food, see_mate, …
  • New concepts can emerge by categorisation (discrimination game)

  6. Learning: the heart of the emergence engine
  • Evolutionary learning:
    • not within an agent (not during its lifetime), but over generations
    • by variation + selection
  • Individual learning:
    • within one agent, during its lifetime
    • by reinforcement learning
  • Social learning:
    • during lifetime, between interacting agents
    • by sending/receiving + adopting knowledge pieces

  7. Types of learning: properties
  • Evolutionary learning:
    • The agent does not create new knowledge during its lifetime
    • The basic DQT + genetic biases are inheritable
    • “Knowledge creator” = crossover and mutation
  • Individual learning:
    • The agent does create new knowledge during its lifetime
    • The DQT + learned biases are modified
    • “Knowledge creator” = reinforcement learning (driven by rewards)
    • Individually learnt knowledge dies with its host agent
  • Social learning:
    • The agent imports knowledge already created elsewhere (new? not new?)
    • Adoption of imported knowledge ≈ crossover
    • Importing knowledge pieces can save effort for the recipient and can create novel combinations
    • Exporting knowledge helps its preservation after the death of its host
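The reinforcement-learning update used for individual learning is not specified on these slides; the following is a minimal sketch, assuming a simple rule that moves the learned biases on the path that produced the action towards 1 after a reward and towards 0 after a punishment. The learning rate and reward encoding are assumptions.

```python
# Assumed reward-driven adjustment of learned biases (not the actual NEW TIES rule).
def update_bias(b, reward, rate=0.1):
    target = 1.0 if reward > 0 else 0.0
    return b + rate * (target - b)      # keeps b within [0, 1]

path_biases = [0.5, 0.6]                # learned biases on the edges just traversed
reward = +1                             # e.g. energy gained by eating
path_biases = [update_bias(b, reward) for b in path_biases]
print(path_biases)                      # [0.55, 0.64]
```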

  8. Present status of the types of learning
  • Evolutionary learning:
    • Demonstrated in 2 NT scenarios
    • Autonomous selection/reproduction causes problems with population stability (implosion/explosion)
  • Individual learning:
    • Code exists, but never demonstrated in NT scenarios
  • Social learning:
    • Under construction/design, based on the “telepathy” approach
    • Communication protocols + adoption mechanisms needed

  9. Evolution: variation operators
  • Operators for the DQT:
    • Crossover = subtree swap
    • Mutation =
      • substitute a subtree with a random subtree
      • change concepts in test nodes
      • change the bias on an edge
  • Operators for attitude genes:
    • Crossover = full arithmetic crossover
    • Mutation =
      • add Gaussian noise
      • replace with a random value
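A sketch of the attitude-gene operators (full arithmetic crossover followed by mutation). The gene range, mixing weight, mutation probabilities and Gaussian step size are assumptions, not NEW TIES parameters.

```python
import random

def arithmetic_crossover(mum, dad, alpha=0.5):
    """Full arithmetic crossover: child = alpha*mum + (1-alpha)*dad, gene-wise."""
    return [alpha * m + (1 - alpha) * d for m, d in zip(mum, dad)]

def mutate(genes, p_gauss=0.8, sigma=0.05, lo=0.0, hi=1.0):
    """Per gene: usually add Gaussian noise, occasionally replace with a random value."""
    out = []
    for g in genes:
        if random.random() < p_gauss:
            g = g + random.gauss(0.0, sigma)
        else:
            g = random.uniform(lo, hi)
        out.append(min(hi, max(lo, g)))   # clamp to the assumed gene range
    return out

child = mutate(arithmetic_crossover([0.2, 0.7, 0.4], [0.6, 0.1, 0.4]))
print(child)
```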

  10. Evolution: selection operators
  • Mate selection:
    • The mate action is chosen by the DQT
    • One agent proposes, the other accepts the proposal
    • Both must have reached adulthood
  • Survivor selection:
    • Dead if too old (≥ 80 years)
    • Dead if energy reaches zero
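The same rules written as predicates; only the maximum age is given on the slide, so the adulthood threshold below is an assumption.

```python
MAX_AGE = 80        # dead if too old (>= 80 years), from the slide
ADULT_AGE = 18      # assumed adulthood threshold (not given on the slide)

def survives(age, energy):
    return age < MAX_AGE and energy > 0

def can_mate(age_a, age_b, both_agree):
    # The mate action itself is chosen by each agent's DQT (propose / accept proposal).
    return age_a >= ADULT_AGE and age_b >= ADULT_AGE and both_agree
```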

  11. Experiment: Simple world. Setup: Environment
  • World size: 200 x 200 grid cells
  • Agents and food only (no tokens, roads, etc.); both are variable in number
  • Initial distribution of agents (500): in the upper-left corner
  • Initial distribution of food (10000): 5000 in the upper-left corner and 5000 in the lower-right corner
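The same setup written as a configuration sketch; the key names are illustrative, not the actual NEW TIES configuration format.

```python
# Assumed configuration keys for the simple-world experiment.
SIMPLE_WORLD = {
    "world_size": (200, 200),                               # grid cells
    "n_agents": 500,
    "agent_placement": "upper_left",                        # all agents start in one corner
    "n_food": 10000,
    "food_placement": {"upper_left": 5000, "lower_right": 5000},
}
```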

  12. Experiment: Simple world. Setup: Agents
  • Native knowledge (concepts and DQT subtrees):
    • navigating (random walk)
    • eating (identify, pick up and eat plants)
    • mating (identify mates, propose/agree)
  • Random DQT branches:
    • differ per agent
    • based on the “pool” of native concepts

  13. Experiment: Simple world. The simulation was kept running for 3 months of real time to test stability.

  14. Experiment: Poisonous Food. Setup: Environment
  • Two types of food: poisonous (decreases energy) and edible (increases energy)
  • World size: 200 x 200 grid cells
  • Agents and food only (no tokens, roads, etc.); both are variable in number
  • Initial distribution of agents (500): uniform random over the grid
  • Initial distribution of food (10000): 5000 of each type, uniform random over the same grid space as the agents

  15. Experiment: Poisonous Food. Setup: Agents
  • Native knowledge: identical to the simple-world experiment
  • Additional native knowledge:
    • agents can distinguish poisonous from edible plants
    • the relation with eating/picking up is not present natively
  • No random DQT branches

  16. Experiment: Poisonous Food. Measures
  • Population size
  • Welfare (energy)
  • Number of poisonous and edible plants
  • Complexity of the controller (number of nodes)
  • Age

  17. Experiment: Poisonous Food. Demo

  18. Experiment: Poisonous Food. Results
