Introduction to Robots and Multi-Robot Systems: Agents in Physical and Virtual Environments

  1. Introduction to Robots and Multi-Robot Systems: Agents in Physical and Virtual Environments. Lecture 3: Behavior Selection. Gal A. Kaminka (galk@cs.biu.ac.il)

  2. Previously, on Robots … • Multiple levels of control: behaviors (Layer diagram: Plan changes, Identify objects, Monitor Change, Map, Explore, Wander, Avoid Object)

  3. Subsuming Layers • How to make sure overall output is coherent? • e.g., avoid object is in conflict with explore • Subsumption hierarchy: higher levels modify lower (Layer diagram: Map, Explore, Wander, Avoid Object)

  4. This week, on Robots … • Behavior Selection/Arbitration • Activation-based selection • Winner-take-all selection • argmax selection (priority, utility, success likelihood, …) • Behavior networks • Goal-oriented behavior-based control • Takes direct aim at key weaknesses of the reactive approach • Behavior hierarchies

  5. Behavior Selection (Arbitration) • One behavior takes over completely • All sensors, actions controlled by the behavior • Behaviors compete for control • Key questions: • How do we select the correct behavior? • When do we terminate the selected behavior?

  6. Maes’ Action Selection Mechanism (MASM) Some key highlights: • Merges some planning with behavior-based control • Goal-oriented, allows predictions • Responsive, allows reactivity • “Speed vs. thought” trade-off • Lots of number-hacking • A later article addressed this issue with learning • However, complex environments may still suffer from this

  7. Overall Structure • Behaviors: preconditions, delete/add lists, activation • Activation links spread positive and negative activation (Network diagram: sensors, behaviors, and goals connected by activation links)

  8. Behaviors • Similar to a fully-instantiated planning operator • No variables (i.e., pick-up-A, not pick-up(A)) • Preconditions (what must be true to be executable) • Add/delete list (what changes once the behavior executes)
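To make this concrete, here is a minimal Python sketch of such a behavior. The fields mirror the slide (preconditions, add/delete lists, activation); the name pick-up-A and its propositions are illustrative, not taken from Maes' paper.

```python
from dataclasses import dataclass

@dataclass
class Behavior:
    """A fully-instantiated, variable-free behavior (MASM-style)."""
    name: str
    preconditions: frozenset  # must all hold for the behavior to be executable
    add_list: frozenset       # propositions that become true after executing
    delete_list: frozenset    # propositions that become false after executing
    activation: float = 0.0   # accumulated activation energy

    def executable(self, state):
        return self.preconditions <= state  # all preconditions observed

# One instance per object, since there are no variables:
pick_up_A = Behavior("pick-up-A",
                     preconditions=frozenset({"hand-empty", "clear-A"}),
                     add_list=frozenset({"holding-A"}),
                     delete_list=frozenset({"hand-empty", "clear-A"}))
```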

  9. Connecting Behaviors • Activation: • Sensors to behaviors with matching preconditions (Diagram: a sensor linked to a behavior)

  10. Connecting Behaviors • Activation: • Sensors to behaviors with matching preconditions • Add lists to behaviors with matching preconditions (Diagram: a sensor and a behavior’s add list both feeding behaviors)

  11. Connecting Behaviors (Backward) • Activation: • Goals to behaviors with matching add lists • Behaviors to behaviors with matching add lists (Diagram: a goal feeding behaviors backward through their add lists)

  12. Connecting Behaviors (Backward) • Advantages: • Goal-orientedness (goals drive behaviors) • Reactivity (sensors drive behaviors) • Parameterized!
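Slides 9–12 can be summarized as one spreading step over the Behavior objects sketched above. This is a rough sketch only: the constants phi, gamma, tau and the update rule are illustrative placeholders, not Maes' actual equations (which also normalize and decay activation).

```python
def spread_activation(behaviors, state, goals, phi=0.2, gamma=0.7, tau=0.1):
    """One spreading step; state and goals are sets of propositions."""
    delta = {b.name: 0.0 for b in behaviors}
    for b in behaviors:
        # forward: sensors excite behaviors whose preconditions they match
        delta[b.name] += phi * len(b.preconditions & state)
        # backward: goals excite behaviors whose add lists achieve them
        delta[b.name] += gamma * len(b.add_list & goals)
        for a in behaviors:
            if a is b:
                continue
            # backward, behavior-to-behavior: b excites any a whose add list
            # would satisfy one of b's currently unsatisfied preconditions
            if a.add_list & (b.preconditions - state):
                delta[a.name] += tau * b.activation
            # negative links: b inhibits any a that would delete one of b's
            # currently satisfied preconditions (a conflicter)
            if a.delete_list & (b.preconditions & state):
                delta[a.name] -= tau * b.activation
    for b in behaviors:
        b.activation += delta[b.name]
```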

  13. Handling Conflicts • Conflicting behaviors inhibit each other • This is a winner-take-all configuration (Network diagram: the slide-7 network with mutual-inhibition links between conflicting behaviors)

  14. Winner Take All • A very basic structure in neural networks • Relies on recurrence • Key idea: Nodes compete by inhibiting each other • After some cycles, winner emerges • This is useful in many neural models of behavior

  15. Basic Structure • Each node excited by incoming information • Each node’s activation inhibits its competitors (Diagram: three nodes, each with excitatory inputs (+) and mutual inhibition links (−) to its neighbors)

  16. First activation • Darker == more activation (2 is most active, 1 least) (Diagram: the same three-node network, shaded by activation level)

  17. After a few cycles • 3 and 2 stronger than 1, so 1 quickly deactivates • 2 slightly stronger than 3, so 3 slowly deactivates

  18. After a few more cycles • Once 1 is out of the picture, only 2 and 3 compete • 2 becomes stronger: a weaker 3 inhibits 2 less

  19. Until finally… • Only output from 2 remains (Diagram: the same network; only node 2 remains active)

  20. Winner Take All • Output from the winning node ends up being used • Typically, if over a threshold • Once a node becomes active, it never lets any other in: a basic problem • Standard solutions: reset after some time, decay, … • This mechanism can be used to resolve behavior competition • Activation is the key feature/requirement
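A minimal numeric sketch of the dynamics from slides 15–20: each node is repeatedly excited by its input and inhibited in proportion to its competitors' total activation. The update rule and constants are illustrative, and there is deliberately no decay term, so the winner's activation grows without bound; real models add decay or normalization (one of the "standard solutions" above).

```python
def winner_take_all(inputs, inhibition=0.3, cycles=50):
    """Return the index of the node that wins the mutual-inhibition race."""
    act = list(inputs)
    for _ in range(cycles):
        total = sum(act)
        act = [max(0.0, a + inp - inhibition * (total - a))  # excite - inhibit
               for a, inp in zip(act, inputs)]
    return act.index(max(act))

# Node 1 (input 0.9) slowly suppresses node 2 (0.8); together they quickly
# suppress node 0 (0.4) -- the same progression as slides 16-19:
print(winner_take_all([0.4, 0.9, 0.8]))   # 1
```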

  21. Running a behavior network • Let activation spread for a while, wait for threshold • Once behavior over threshold, execute it • Reset activation after it’s done (Network diagram: sensors, behaviors, and goals, as on slide 7)
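Putting slides 7–21 together, a sketch of the execution loop, reusing the Behavior and spread_activation sketches above; sense and execute are caller-supplied callbacks. Maes' full mechanism also relaxes the threshold when nothing qualifies; that refinement is omitted here.

```python
def run_network(behaviors, sense, execute, goals, threshold=10.0):
    """Loop: spread activation, fire any over-threshold executable
    behavior, reset its activation, and drop the goals it achieved."""
    goals = set(goals)
    while goals:
        state = sense()                       # current set of propositions
        spread_activation(behaviors, state, goals)
        ready = [b for b in behaviors
                 if b.activation > threshold and b.executable(state)]
        if ready:
            chosen = max(ready, key=lambda b: b.activation)
            execute(chosen)
            chosen.activation = 0.0           # reset after it's done
            goals -= chosen.add_list          # achieved goals drop out
```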

  22. Advantages • We’ve discussed planned vs. reactive behavior • The threshold value tunes the “speed vs. thought” trade-off • Larger threshold: more behaviors involved before selection • Smaller threshold: less likely to find the optimal chain • This is not a hybrid architecture; it is really something new!

  23. Criticisms • Where will this fail? Where will it succeed? • What needs improvement? What does not? • What tasks is it good for? • As scientists, you must always ask yourselves these questions

  24. Protected Goals • Sussman Anomaly (a variant): • Given: A on B, B on table, C on table • Do: A on B, B on C, C on table • No way to do this without undoing a subgoal: A on B already holds, yet A must come off so B can go on C • If one is not careful, might go into thrashing: • Take off A, put A back, take off A, … • Maes added a mechanism for protected goals • Not clear where protection comes from

  25. Other problems with MASM • No variables → blow-up in the number of behaviors • Thrashing: a behavior resets, then is re-selected • Bug in the activation algorithm: • Activation from goals is divided by the number of goals • Thus a behavior satisfying more goals is not preferred • Additional minor issues like this were found and corrected later • Tyrrell 1993, 1994; Dorer 1999; Blumberg 1994; …

  26. Reminder • We are talking about behavior selection • Multiple behaviors exist • Question is which one to choose • Behaviors compete for control of robot • Behavior networks have activation: • Goal priority “meets” sensor data (preconditions, effects) • Winner-take-all selection

  27. Activation-based selection • For each behavior, build an activation function: • How useful it is (utility, value) • How urgent it is (priority) • How likely it is to succeed (likelihood of success) • How well it matches the current state (applicability) • … • Can of course combine these (e.g., utility × priority) • Select the behavior with the top activation • Let it run • Re-evaluate all activations

  28. Formal behavior selection • Behaviors are arranged in a DAG <B,E> • DAG: Directed Acyclic Graph • B set of behaviors (vertices) • E set of edges (a,b), where a, b in B. • The graph is structured hierarchically: • Single root behavior is most general • leaf behaviors correspond to primitive actions • A path from every behavior to at least one primitive behavior • children(b) = { all behaviors a, such that (b,a) is in E }
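A minimal encoding of the DAG <B,E> and children(b). The behavior names echo the hierarchy on the next slide, but the edge set here is an illustrative fragment, not the full diagram.

```python
# Vertices (behaviors) and directed edges (parent, child):
B = {"WinGame", "Play", "Interrupt", "Attack", "Move", "Kick", "Pass"}
E = {("WinGame", "Play"), ("WinGame", "Interrupt"),
     ("Play", "Attack"), ("Attack", "Move"), ("Attack", "Kick"),
     ("Interrupt", "Pass")}

def children(b):
    return {a for (parent, a) in E if parent == b}

# Leaf behaviors (no children) correspond to primitive actions:
primitives = {b for b in B if not children(b)}   # {'Move', 'Kick', 'Pass'}
```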

  29. Hierarchical behaviors • The root behavior is always active • An active behavior with no active child must select one • An active behavior can decide to deactivate itself (Example hierarchy: WinGame; Play, Interrupt; Attack-Center, Zone Defense, Attack Pincer; Move, Kick, Pass, Clear, Turn)

  30. argmax selection • At any given time, select the behavior whose priority / value / likelihood of success / applicability is greatest • No sequence of behaviors is known in advance • Many instances of behaviors can co-exist and compete

  31. Formally … • Let f(b) be a function giving behavior b’s activation • Then the arbitration result is: argmax_c f(c), where c in children(b) • For instance, to choose by value: argmax_c value(c) • Or, to choose by priority: argmax_c priority(c) • Or decision-theoretic choice: argmax_c probability(c) * value(c)
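The arbitration itself is one line over children(b) from the sketch above; the value and probability tables are made-up numbers for illustration.

```python
def arbitrate(b, f, children):
    """Select the child of b that maximizes the activation function f."""
    return max(children(b), key=f)

value = {"Play": 5.0, "Interrupt": 2.0}
probability = {"Play": 0.4, "Interrupt": 0.9}

# Decision-theoretic choice: argmax_c probability(c) * value(c)
chosen = arbitrate("WinGame", lambda c: probability[c] * value[c], children)
print(chosen)   # 'Play': 0.4 * 5.0 = 2.0 beats 0.9 * 2.0 = 1.8
```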

  32. Subsumption as argmax selection • Subsumption level of behavior b, given by level(b) • Applicability of behavior b, given by app(b): 0 or 1 • Subsumption arbitration: argmax_b (app(b) * level(b)) (Layer diagram: Map, Explore, Wander, Avoid Object)
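The same one-liner recovers subsumption, as a sketch: the level numbers are illustrative and start at 1 so an applicable bottom layer can still win the argmax.

```python
level = {"Avoid Object": 1, "Wander": 2, "Explore": 3, "Map": 4}

def subsumption_select(app):
    """app: behavior name -> 0/1 applicability; argmax_b app(b)*level(b)."""
    return max(level, key=lambda b: app(b) * level[b])

print(subsumption_select(lambda b: b in {"Wander", "Explore"}))
# 'Explore' -- the higher applicable level subsumes 'Wander'
```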

  33. Case Study: HandleBall Arbitrator (ChaMeleons’01) • HandleBall behavior triggered when the player has the ball • Must select between multiply-instantiated children: • shoot on goal, pass for shot, pass forward, dribble to goal • dribble forward, clear, pass to closer, … • We defined a complex arbitrator combining: • priority, and • likelihood of success

  34. HandleBall Example

  35. “Number-hacking”: Thrashing • De-selection and re-selection of behaviors (Plot: a sensor value hovering around the selection threshold over time, triggering repeated behavior switches)

  36. “Number-hacking”: Sensitivity • Sensitivity to specific values, ranges • Manually adjusting values by 0.1 to get a desired result … • Where do the numbers come from? Learning? • e.g., the programmer forgot a range of values • e.g., the programmer needs to extend a range

  37. State-Based Selection • State-based selection • Look at world and internal state to make selection • Behaviors as operators? Almost. • Pre-conditions, termination-conditions • Selection control rules (non-numeric preferences, priorities) • Finite state machines and hierarchical machines

  38. State-Based Behavior Selection • Elements from reactive control, but with internal state • Quick response to sensor readings • Sensor-driven operation • Behaviors maintain internal state • e.g., previously-executed behaviors • e.g., previous sensor readings • …

  39. Behaviors as operators • Conditions: • Preconditions: When is it applicable? • Termination conditions: When is it done? • Conditions test sensors, internal state • Must maintain World Model • Can be simple (e.g., vector of sensor readings) • Or complex (e.g., internal variables, previous readings)
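A sketch of a behavior-as-operator: applicability and termination are predicates over the world model rather than activation numbers. The sensor names and thresholds are invented; note the deliberate gap between the start and stop conditions (hysteresis), which also helps against the thrashing of slide 35.

```python
class StateBehavior:
    """Operator-like behavior: preconditions say when it may start,
    termination conditions say when it is done; both test the world model."""
    def __init__(self, name, precond, terminate, act):
        self.name = name
        self.precond = precond       # world_model -> bool: applicable?
        self.terminate = terminate   # world_model -> bool: done?
        self.act = act               # world_model -> motor command

avoid_object = StateBehavior(
    "avoid-object",
    precond=lambda wm: wm["distance_front"] < 90,
    terminate=lambda wm: wm["distance_front"] >= 120,  # hysteresis gap
    act=lambda wm: "turn-left")
```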

  40–42. State-Based Selection: Architecture (Diagram, repeated over three slides: behaviors, a World Model of beliefs, and a Command Scheduling stage)

  43. Conflicting Behaviors • What if more than one behavior matches? (Same architecture diagram: several behaviors match the World Model at once)

  44. Preference Rules • Prefer one behavior over another • Provide “local guidance” • Do not consider all possible cases, nor a global ranking • Test the world model (which also records behaviors) (Diagram: Preference Rules added between the behaviors and Command Scheduling)
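A sketch of preference rules as local tie-breakers between two matched behaviors. The rules, behavior names, and world-model fields are invented examples; there is deliberately no global ranking, only the cases the designer anticipated.

```python
def prefer(a, b, world_model):
    """Return the preferred of two applicable behaviors a and b."""
    # Local guidance: each rule covers one anticipated situation.
    if world_model.get("battery") == "low" and "recharge" in (a.name, b.name):
        return a if a.name == "recharge" else b
    if "avoid-object" in (a.name, b.name):      # safety behavior wins
        return a if a.name == "avoid-object" else b
    return a                                    # default: arbitrary but fixed
```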

  45. Questions?

  46. What’s in a world model? (Same architecture diagram, highlighting the World Model)

  47. What’s in a world model? • A vector of sensor readings (simple): • Distance front = 250 • Light left = detected • Battery = medium level • A vector of virtual sensors (more complex): • Distance front < 90 AND light front • Average of front distances = 149.4

  48. What’s in a world model? • A vector of processed data (more complex still): • Estimated X, Y from detected landmarks • Seen purple blob at pixel 2,5 • Communication from a teammate • A vector of world models (most complex): • Position of opponent 2 seconds ago • My position 10 seconds ago
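A sketch covering the simple-to-complex spectrum of slides 47–48: raw readings, virtual sensors derived from them, and a bounded history of past models. The field names are invented for illustration.

```python
from collections import deque

class WorldModel:
    """Raw sensor vector + virtual sensors + history of past models."""
    def __init__(self, history_len=10):
        self.raw = {}                              # e.g. {"distance_front": 250}
        self.history = deque(maxlen=history_len)   # "my position 10 seconds ago"

    def update(self, readings):
        self.history.append(dict(self.raw))        # snapshot before overwrite
        self.raw.update(readings)

    # A virtual sensor: a predicate computed from raw readings
    def blocked_and_lit(self):
        return self.raw["distance_front"] < 90 and self.raw["light_front"]
```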

  49. Hierarchical Behaviors • Hierarchies allow designer to build reusable behaviors • At any given moment, a path is selected • All behaviors in the path are active • May issue action commands • Monitor sensors • This is different from a function call stack • What happens when behavior terminates?
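A sketch of path selection in such a hierarchy, reusing children() and arbitrate() from the earlier sketches: the active path is extended from the root downward, and every behavior along it stays active each cycle, unlike frames on a call stack.

```python
def active_path(root, f, children):
    """Extend the active path from the root down to a primitive leaf."""
    path = [root]
    while children(path[-1]):                     # a non-leaf must pick a child
        path.append(arbitrate(path[-1], f, children))
    return path

# e.g. active_path("WinGame", some_activation_fn, children) might yield
# ["WinGame", "Play", "Attack", "Kick"]; all four are active this cycle.
```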

  50. Case Study: ModSAF • Preference rules manage high-priority interrupts • Preconditions dictate ordering (Hierarchy diagram: Execute Mission; Fly Flight Plan, Halt, Wait-at-Point, Join, Scout, Engage; Fly Route, Land, Find Position; NOE, Low Contour; Unmask, Shoot)
