Emergent Representations and Reasoning in Adaptive Agents

Joost Broekens, Doug DeGroot
LIACS, University of Leiden, The Netherlands
{broekens, degroot}@liacs.nl

Overview
• Introduction
• Interactivism
• Hypothesis
• Computational Model based on Interactivist Concepts
• Experiments
• Results
• Conclusion
• Questions?

Introduction
• Adaptive agents:
  • Flexible models of the world (continuous online learning).
  • Efficient memory retrieval.
  • Efficient, relevant reasoning context (how to select relevant information from a large collection of beliefs).
• How to represent knowledge?
• What is reasoning?

Interactivism (1/3)
• Interactivism (Bickhard) proposes:
  • Coupling of (properties of) situations and the actions possible in those situations: Interaction Potential (IP).
  • The IP concept as primitive for representations.
  • Potential interactions are prepared by prior interactions → an IP is conditional on prior interactions.
    • Example: brush
  • IPs are organized in a hierarchical, web-like fashion.
  • Parts of this web remain invariant under many other interactions.
    • Example: brush
  • IPs stabilize and destabilize based on correct prediction/preparation.

Interactivism (2/3)
[Figure: the brush example laid out over time. Labels: got home, shower, dry, brush, put away, work, brush/desk. Horizontal axis: Time.]

Interactivism (3/3)
• Interactivism and reasoning.
• Model learning: (de)stabilization of IPs through continuous interaction with the world constructs representations of the world.
  • Representations have implicit content (certain properties of a situation a allow interactions x and y, making a different from a situation b that lacks these properties).
  • Truth value (I tried an interaction x, but y happened, so it was not x).
• Task learning: preference between at least two interactions based on bias.
  • Reinforcement signal.
• So: an IP has (at least) two properties: stability and expected return.
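
The two IP properties named above can be made concrete with a minimal Python sketch. The class name `InteractionPotential`, the method `observe`, the starting values, and the moving-average update with a `rate` parameter are all assumptions made for illustration; the slides only state that an IP stabilizes or destabilizes with correct prediction and carries an expected return.

```python
from dataclasses import dataclass


@dataclass
class InteractionPotential:
    """Hypothetical record for one IP; field names are illustrative."""
    name: str
    stability: float = 0.5        # how reliably this IP predicted what followed
    expected_return: float = 0.0  # running estimate of reinforcement after this IP

    def observe(self, predicted_correctly: bool, reinforcement: float,
                rate: float = 0.1) -> None:
        # Assumed update rule: move each property a fraction `rate`
        # toward the latest observation (exponential moving average).
        target = 1.0 if predicted_correctly else 0.0
        self.stability += rate * (target - self.stability)
        self.expected_return += rate * (reinforcement - self.expected_return)
```

A correct prediction pulls `stability` toward 1 and a failed one pulls it toward 0, which is one simple way to read "stabilize and destabilize".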

Hypothesis
• Reasoning and decision making are emergent properties of interactivist representational systems.
• Create a computational model strictly based on interactivist assumptions.
• Create a task that needs a decision by the agent.
• Minimal reasoning:
  • “any observable behavior that reflects a beneficial decision between at least two possibilities that is neither explicable due to chance, nor without representations”.

Computational Model (1/3)
• Basis: hierarchical directed graph.
• The agent’s actions and stimuli from the world are assumed to be the same kind of information.
• Nodes represent interactions.
• Nodes can be active (used) or prepared (hypothesized).
• Primary nodes: stimulus (action or stimulus from the world).
• Secondary nodes: interaction potentials.
• Hierarchy of secondary nodes: IP hierarchy.
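
As a rough illustration of the node types listed above, the following Python sketch models primary and secondary nodes in one hierarchy. The names `Kind`, `State`, `Node`, and `depth`, and the extra `IDLE` state for nodes that are neither active nor prepared, are assumptions for this sketch and do not come from the slides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Kind(Enum):
    PRIMARY = "primary"      # a single stimulus: an action or a stimulus from the world
    SECONDARY = "secondary"  # an interaction potential built from other nodes


class State(Enum):
    IDLE = "idle"            # assumed default: neither active nor prepared
    ACTIVE = "active"        # used in the current interaction
    PREPARED = "prepared"    # hypothesized as a possible next interaction


@dataclass
class Node:
    name: str
    kind: Kind
    parts: Tuple["Node", ...] = ()  # child nodes; empty for primary nodes
    state: State = State.IDLE

    def depth(self) -> int:
        """Level in the IP hierarchy: 0 for primary nodes."""
        if not self.parts:
            return 0
        return 1 + max(part.depth() for part in self.parts)
```

Actions and world stimuli share the same `PRIMARY` kind here, mirroring the assumption that both are the same kind of information.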

Computational Model (2/3)
[Figure: panels a to d showing the graph growing step by step. Node labels: D, 1, 2, D-1, 1-D, D-2, (D-1)-D, (1-D)-2, ((D-1)-D)-2.]
• Example (1, 2 = locations in a maze, D = down):
  • The model is empty at startup.
  • a: the agent goes down, and builds a node for “down”.
  • b: the agent arrives at location 1, and builds an interaction.
  • c: the agent goes down, and builds interactions.
  • d: the agent arrives at location 2, and builds interactions.
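
One possible reading of this example, sketched in Python under stated assumptions: each new stimulus gets a primary node if it has none, plus one interaction node for every node that was active at the previous step. The function names `build_model` and `label` and the tuple encoding are illustrative only. The sketch ignores the active/prepared distinction and never forgets nodes.

```python
def label(node):
    """Render a node in the slide's notation, e.g. (('D', '1'), 'D') -> '(D-1)-D'."""
    if isinstance(node, str):
        return node
    left, right = node
    left_label = label(left)
    if not isinstance(left, str):
        left_label = f"({left_label})"  # parenthesize nested interactions
    return f"{left_label}-{right}"


def build_model(stimuli):
    """Return all nodes, in creation order, after seeing `stimuli` in sequence."""
    nodes = []   # the whole model; empty at startup
    active = []  # nodes activated by the previous stimulus
    for stimulus in stimuli:
        # The primary node for this stimulus, plus one interaction node
        # (previous, stimulus) for each node active at the previous step.
        now_active = [stimulus] + [(previous, stimulus) for previous in active]
        for node in now_active:
            if node not in nodes:
                nodes.append(node)
        active = now_active
    return nodes


model = build_model(["D", "1", "D", "2"])
print([label(node) for node in model])
```

For the sequence down, location 1, down, location 2 this yields the nine node labels that appear in the figure. Because every active node spawns a new interaction at each step, the model grows without bound unless rarely used nodes are discarded.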

Computational Model (3/3)
• Model learning and task learning: exposure (continuous interaction) and reinforcement.
• Exposure (local):
  • Build a conditional probabilistic model of the environment, but only adapt locally: count activations of IPs.
  • If the usage of an IP is lower than an arbitrary threshold, throw away the node.
• Reinforcement (local):
  • Update active IPs with the current reinforcement signal.
  • Propagate reinforcement through the IP hierarchy based on local probabilities of the environment; only use prepared IPs.
• Biased selection:
  • Propose an action based on winner-take-all (WTA) selection of proposed interactions.
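
The three local mechanisms can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: the class `LocalLearner`, its method names, the `prune_below` threshold, the learning `rate`, and the count-weighted scoring of prepared IPs are all assumptions. In particular, propagation through the IP hierarchy is reduced here to weighting each prepared IP's value by its relative activation count.

```python
class LocalLearner:
    """Hypothetical container for per-IP activation counts and values."""

    def __init__(self, prune_below=2):
        self.count = {}   # IP name -> number of activations (exposure)
        self.value = {}   # IP name -> running reinforcement estimate
        self.prune_below = prune_below  # assumed, arbitrary usage threshold

    def expose(self, active_ips):
        # Exposure: only the IPs that are active right now are counted.
        for ip in active_ips:
            self.count[ip] = self.count.get(ip, 0) + 1

    def prune(self):
        # Throw away IPs whose usage is below the threshold.
        for ip in [ip for ip, c in self.count.items() if c < self.prune_below]:
            del self.count[ip]
            self.value.pop(ip, None)

    def reinforce(self, active_ips, signal, rate=0.5):
        # Reinforcement: move each active IP's value toward the current signal.
        for ip in active_ips:
            old = self.value.get(ip, 0.0)
            self.value[ip] = old + rate * (signal - old)

    def propose(self, prepared):
        """prepared maps each candidate action to the prepared IPs supporting it."""
        def score(ips):
            total = sum(self.count.get(ip, 0) for ip in ips)
            if total == 0:
                return 0.0
            # Weight each value by the IP's local relative frequency.
            return sum(self.count.get(ip, 0) / total * self.value.get(ip, 0.0)
                       for ip in ips)
        # Winner-take-all: the single best-scoring action is proposed.
        return max(prepared, key=lambda action: score(prepared[action]))
```

Every update touches only the IPs passed in, which is the sense in which exposure and reinforcement are local.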

Experiments (1/3)
• Model learning: does the agent learn an adaptive model of the environment?
  • Test for reuse of old information in a new situation (a, b, c, d, e).
  • Test for quick adaptation to a new maze (a, b, e).
• Maze setup:
[Figure: mazes a to e. Legend: black = agent, red = lava (Rf = -1), yellow = food (Rf = 1).]

Experiments (2/3)
• Selection task (simple reasoning): is the agent able to make a beneficial, informed decision?
  • Choose between two options; the choice can be made only if there is knowledge (a representation) about the other option (informed choice). (d, b, f)
  • Test for convergence in a randomly changing situation (g).
• Maze setup:
[Figure: mazes g, d, b, f. Legend: black = agent, red = lava (Rf = -1), yellow = food (Rf = 1).]

Experiments (3/3)
• Ran experiments for different maze setups:
  • 30 runs per setup.
  • In every run the agent has 100 trials to find the food.
  • Max 1000 steps per trial.
  • Plotted average learning curves of the trials over the 30 runs.
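
The protocol above can be written down as a short Python sketch. The constants come from the slide. The function `average_learning_curve`, the `run_trial` callback, and the choice of "steps needed in a trial" as the plotted measure are assumptions, since the slide does not state what the learning curves measure.

```python
from typing import Callable, List

RUNS = 30         # runs per maze setup (from the slide)
TRIALS = 100      # trials per run (from the slide)
MAX_STEPS = 1000  # step cap per trial (from the slide)


def average_learning_curve(run_trial: Callable[[int, int, int], int],
                           runs: int = RUNS,
                           trials: int = TRIALS,
                           max_steps: int = MAX_STEPS) -> List[float]:
    """Average the per-trial step count over all runs.

    `run_trial(run, trial, max_steps)` is a hypothetical callback that
    executes one trial and returns the number of steps the agent took.
    """
    totals = [0.0] * trials
    for run in range(runs):
        for trial in range(trials):
            # Enforce the step cap even if the callback overshoots it.
            steps = min(run_trial(run, trial, max_steps), max_steps)
            totals[trial] += steps
    return [total / runs for total in totals]
```

The result has one point per trial, so plotting it against the trial index gives one averaged learning curve per maze setup. Any agent state carried across trials within a run would live inside the callback.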

Results (1/3)
• The agent learns an adaptive model of the environment and reuses information.

Results (2/3)
• The agent learns to make a beneficial decision at the crossing.

Results (3/3)
• A representation of a potential food location is learned:
  • The agent is able to try one location and, if the food is not there, try a second one.
  • This means the agent has a stable representation of “food is not here”.
  • Representation: content (food), truth value (food not here).
• The ability to make an informed choice indeed emerges from an interactivism-based model:
  • The agent learns what a crossing is and how to use it:
    • The concept of a crossing is not introduced in the model.
    • The agent chooses a different action the second time it arrives at the crossing only if food has not been found earlier (informed choice).

Conclusion
• Interactivist-based models are useful for the computational investigation of knowledge representation and reasoning in agents.
• Representations and reasoning can indeed emerge from a computational model based on interactivist assumptions when used in an agent that continuously interacts with the environment.
• Future work:
  • Literature search into machine learning mechanisms.
  • “Imagination”.
  • Neuronal implementation.

Questions?
