DARPA ITO/MARS Program
Control and Coordination of Multiple Autonomous Robots
Vijay Kumar
GRASP Laboratory, University of Pennsylvania
http://www.cis.upenn.edu/mars
Motivation
• We are interested in the coordinated control of robots
  • manipulation
  • vision-based control
• Large number of modes
• Scalability
• Individual modes (behaviors) are well understood, but the interactions between them are not
• Software design: modes are designed bottom-up, protocols top-down
[MARS toolchain diagram: CHARON code (high-level language) at the center, with learning algorithms and analysis operating on it; a CHARON-to-Java translator, together with Java libraries and drivers, produces Java code; a control code generator and a simulator code generator produce the control code and the simulator, connected to a human interface]
Outline of the Talk • Language and software architecture • CHARON • agents and modes • examples • Reactive control algorithms • mode switching • hierarchical composition of reactive control algorithms • results • From reactive to deliberative schemes • Simulation • Reinforcement learning • learn mode switching and composition rules • Future work
Participants • Rajeev Alur • Aveek Das • Joel Esposito • Rafael Fierro • Radu Grosu • Greg Grudic • Yerang Hur • Vijay Kumar • Insup Lee • Ben Southall • John Spletzer • Camillo Taylor • Lyle Ungar
Architectural Hierarchy in CHARON
[Diagram: an Agent composed of sub-agents Agent1 and Agent2, each containing sensor, processor, and actuator components connected through input and output ports]
Each agent can be represented as a parallel composition of sub-agents.
Behavioral Hierarchy in CHARON
[Diagram: a main mode containing submodes awayTarget, atTarget, sensing, and control, connected through entry and exit ports]
• Each agent consists of modes or behaviors
• Modes can in turn consist of submodes
CHARON
• Individual components described as agents
  • composition, instantiation, and hiding
• Individual behaviors described as modes
  • encapsulation, instantiation, and scoping
• Support for concurrency
  • shared variables as well as message passing
• Support for discrete and continuous behavior
• Well-defined formal semantics
Reactive Behaviors Based on Vision
[Block diagram: a vision pipeline from the frame grabber through feature extraction (edge detector, color blob finder) and estimation (range mapper, robot position estimator, target detector, collision detector) to the behaviors (pursue, avoid obstacle, collision recovery), the motion controller, and the actuators]
Robot Agent

```
robotController = rmapper || cdetector || explore

rmapper   = rangeMapper()       [rangeMap = rM];
cdetector = collisionDetector() [collisionDetected = cD];
explore   = obstacleAvoider()   [collisionDetected, rangeMap = cD, rM];

agent explore()           { mode top = exploreTopMode() }
agent rangeMapper()       { mode top = rangeMapperTopMode() }
agent collisionDetector() { mode top = collisionDetectorTopMode() }
```
Collision Recovery

```
mode collisionRecoveryMode(real recoveryDuration, real c) {
  entry enPt;
  exit exPt;
  readWrite diff analog real x;
  readWrite diff analog real phi;
  readWrite diff analog real recoveryTimer;
  diffEqn dRecovery { d(x) = -c; d(phi) = 0; d(recoveryTimer) = 1.0 }
  inv invRecovery { 0.0 <= recoveryTimer && recoveryTimer <= recoveryDuration }
} // end of mode collisionRecoveryMode
```
Obstacle Avoidance

```
mode obAvoidanceMode() {
  entry enPt;
  exit exPt;
  read discrete bool collisionDetected;
  read RangeMap rangeMap;
  readWrite diff analog real x;
  readWrite diff analog real phi;
  diffEqn dObAvoidance { d(x) = computeSpeed(rangeMap);
                         d(phi) = computeAngle(rangeMap) }
  inv invObAvoidance { collisionDetected == false }
  initTrans from obAvoidanceMode to obAvoidanceMode
    when true do { x = 0.0; phi = 0.0 }
}
```
Explore

```
mode exploreTopMode() {
  entry enPt;
  read discrete bool collisionDetected;
  read RangeMap rangeMap;
  private diff analog real recoveryTimer;

  mode obAvoidance = obAvoidanceMode()
  mode collisionRecovery = collisionRecoveryMode(recoveryDuration, c)

  initTrans from exploreTopMode to obAvoidance
    when true do { recoveryDuration = 10.0; c = 1.0 }    // initialization

  trans OaToCr from obAvoidance.exPt to collisionRecovery.enPt
    when (collisionDetected == true) do {}

  trans CrToOa from collisionRecovery.exPt to obAvoidance.enPt
    when (recoveryTimer == recoveryDuration) do { recoveryTimer = 0.0 }  // reset the timer
}
```
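The switching logic above is simple enough to sanity-check outside CHARON. Below is a minimal Python sketch, not generated code, that mirrors the two-mode explore automaton; computeSpeed and computeAngle are stubbed, and the function names are ours:

```python
# Minimal Python sketch of the explore automaton above (hypothetical stubs,
# not CHARON-generated code): obstacle avoidance until a collision is
# detected, then a fixed-duration backup, then back to avoidance.

RECOVERY_DURATION = 10.0   # matches the initTrans initialization
C = 1.0                    # backup speed

def compute_speed(range_map):   # stub for the real range-map controller
    return 0.5

def compute_angle(range_map):   # stub
    return 0.0

def step(state, range_map, collision_detected, dt=0.01):
    """One integration step of the hybrid system; state is a dict."""
    if state["mode"] == "obAvoidance":
        state["x"] += compute_speed(range_map) * dt
        state["phi"] += compute_angle(range_map) * dt
        if collision_detected:                    # guard of trans OaToCr
            state["mode"] = "collisionRecovery"
            state["timer"] = 0.0
    else:  # collisionRecovery: back up with phi held constant
        state["x"] -= C * dt
        state["timer"] += dt
        if state["timer"] >= RECOVERY_DURATION:   # guard of trans CrToOa
            state["mode"] = "obAvoidance"
            state["timer"] = 0.0
    return state

state = {"mode": "obAvoidance", "x": 0.0, "phi": 0.0, "timer": 0.0}
```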
[Figure: the Explore automaton; mode obAvoidance with flow dObAvoidance (d(x) = k1*r, d(phi) = -k2*phi) and mode collisionRecovery with flow dRecovery (d(x) = -c, d(phi) = 0, d(recoveryTimer) = 1); both modes read rangeMap and collisionDetected, with a "collision" transition into recovery and a "timeOut" transition back]
Vision-Based Control with Mobile Robots
• Mode switching
• Multiple levels of abstraction of data from sensors
• Parallel composition of software agents
• obstacle avoidance
• wall following
Multiagent Control
[Figure: hybrid automaton for Robot1 with a local diff analog timer; modes awayTarget (dPlan, iAway), atTarget (dStop, iAt), moving (dSteer, aOmega, iFreq), and sensing (dStop, iConst), connected by arrive, move, and sense transitions; flows and guards include d(pos.x) = v * cos(phi), d(pos.y) = v * sin(phi), omega = k * (theta - phi), d(timer) = 1, pos = target, and timer/updateFreq = 0; estimates r1Est1, r1Est2, r2Est1, r2Est2 are exchanged between the robots]
Modular Simulation
• Goal
  • simulation that is efficient and accurate
  • integration of modes at different time scales
  • integration of agents at different time scales
• Modes are simulated using local information
  • submodes are regarded as black boxes
  • submodes are simulated independently of one another
• Agents are simulated using local information
  • agents are regarded as black boxes
  • agents are simulated independently of one another
Time Round of a Mode (Agent)
1. Get the integration time d and the invariants from the supermode (or the scheduler).
2. While (t <= d), starting from t = 0:
  • simplify all invariants;
  • predict the integration step dt based on d and the invariants;
  • execute the time round of the active submode and get its state s and the time elapsed e;
  • integrate for time e and get the new state s;
  • return s and t+e if any invariant was violated;
  • increment t = t+e.
3. Return s and d.
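A compact way to see this recursion is as each mode exposing a time-round method that its supermode calls. Here is a hedged Python sketch; the class and method names are ours, not the CHARON simulator's API:

```python
# Hedged sketch of the recursive time round (our naming, not the actual
# CHARON simulator API): each mode integrates its own flow, treats its
# active submode as a black box, and stops early on invariant violation.

class Mode:
    def __init__(self, name, flow, invariant, submode=None):
        self.name = name            # the state variable this mode owns
        self.flow = flow            # flow(state) -> d(var)/dt
        self.invariant = invariant  # invariant(state) -> bool
        self.submode = submode      # active submode, or None

    def time_round(self, state, d, dt_max=0.01):
        """Advance up to time d; return (state, time actually consumed)."""
        t = 0.0
        while t < d:
            dt = min(dt_max, d - t)               # predicted step
            if self.submode is not None:          # black-box submode first
                state, e = self.submode.time_round(state, dt)
            else:
                e = dt
            state[self.name] += self.flow(state) * e   # forward-Euler step
            t += e
            if not self.invariant(state):         # invariant violated: stop
                return state, t
        return state, t

# Usage: an outer mode whose variable x is driven by an inner mode's y
inner = Mode("y", flow=lambda s: -s["y"], invariant=lambda s: True)
outer = Mode("x", flow=lambda s: s["y"],
             invariant=lambda s: s["x"] < 1.0, submode=inner)
state, t = outer.time_round({"x": 0.0, "y": 1.0}, d=2.0)
```

A supermode's time round simply wraps its submode's, so the behavioral hierarchy of the earlier slides falls out of the recursion.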
Modular Simulation - Global Execution
[Timeline figure: agents A1, A2, A3 at their reached times, with interval d, overshoot dt, and actual increment e]
1. Pick the agents with the minimum and second-minimum reached time.
2. Compute the time-round interval d for the minimum agent (here A2) such that its absolute time may exceed the time reached by the second-minimum agent (A1) by at most dt.
3. The time round may end before the interval d is consumed if the invariants of A2 are violated; the actual time increment is then some e <= d.
4. Agent A2 executes an update round to synchronize its discrete variables with the analog ones.
5. The state of A2 becomes visible to the other agents.
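The scheduling rule itself is small. A hedged sketch, again with our own naming, built on top of the Mode.time_round sketch above:

```python
import heapq

# Hedged sketch of the global scheduler (our naming): always advance the
# agent that is furthest behind, by at most the gap to the next agent
# plus the allowed overshoot dt.

def simulate(agents, t_end, dt_slack=0.01):
    """agents: list of [local_time, state, mode] entries."""
    heap = [(entry[0], i) for i, entry in enumerate(agents)]
    heapq.heapify(heap)
    while heap and heap[0][0] < t_end:
        t_min, i = heapq.heappop(heap)
        t_next = heap[0][0] if heap else t_end    # second-minimum time
        d = (t_next - t_min) + dt_slack           # allowed overshoot
        _, state, mode = agents[i]
        state, e = mode.time_round(state, d)      # may stop early (e < d)
        agents[i] = [t_min + e, state, mode]      # update round goes here
        heapq.heappush(heap, (t_min + e, i))
    return agents
```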
Modular Multi-rate Simulation
[Figure: coupled components x1, x2, x3 integrated with different time step sizes; the ratio of the largest to the smallest step size is held constant]
Use a different time step for each component to exploit multiple time scales, increasing efficiency.
• "Slowest-first" order of integration
• Coupling is accommodated by using interpolants for the slow variables
• Tight error bound: O(h^(m+1))
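To make "slowest-first with interpolants" concrete, here is a hedged two-component sketch using forward Euler and linear interpolation; the 10:1 rate ratio and the example dynamics are arbitrary choices for illustration:

```python
# Hedged sketch of slowest-first multi-rate integration for a coupled pair
#   x' = f(x, y)  (slow),   y' = g(x, y)  (fast).
# The slow variable is integrated first with a big step H; the fast variable
# then takes small steps h, seeing the slow state through an interpolant.

def multirate_step(x, y, f, g, H, ratio=10):
    x_new = x + H * f(x, y)                  # slowest first: one big step
    h = H / ratio
    for k in range(ratio):
        s = (k + 0.5) * h / H                # midpoint of the fast substep
        x_interp = (1 - s) * x + s * x_new   # linear interpolant for x
        y = y + h * g(x_interp, y)           # fast variable: small steps
    return x_new, y

# Example: slow drift coupled to a fast relaxation
x, y = 1.0, 0.0
for _ in range(100):
    x, y = multirate_step(x, y,
                          f=lambda x, y: -0.1 * x + 0.01 * y,
                          g=lambda x, y: -5.0 * (y - x),
                          H=0.1)
```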
Simulation and Analysis of Hierarchical Systems with Mode Switching
• Synthesis of controllers
  • include models of uncertainty
• Sensor fusion
  • include models of noise
• Modular simulation
• Automatic detection of events
  • mode-switching transitions, guards
Traditional Model of Hierarchy
• NASREM architecture [Albus, 80]
• Implementations
  • Demo III
  • NASA robotic systems
Event Detection
Given an event function g(x) and the trajectory x(t) of the dynamics, we re-parameterize time by controlling the integration step size: using feedback linearization, we select our "speed" (step size) along the integral curves so that the simulation converges to the event surface g(x) = 0 instead of stepping over it.
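One way to realize this, as a hedged sketch rather than the exact scheme in the MARS tools: treat g(x) as an output of the simulation and pick each Euler step h so that g decays like a stable first-order system, g <- (1 - lam) * g. To first order, g(x + h f(x)) ~ g + h * (grad g . f), which gives h = -lam * g / (grad g . f):

```python
import numpy as np

# Hedged sketch of event detection by step-size control: choose the Euler
# step h so the event function decays geometrically, g_{k+1} ~ (1 - lam) g_k
# (feedback linearization with g viewed as the system output).

def detect_event(f, g, grad_g, x0, lam=0.5, h_max=0.01, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    while abs(g(x)) > tol:
        dgdt = grad_g(x) @ f(x)                  # Lie derivative of g along f
        if dgdt * g(x) < 0:                      # heading toward the surface:
            h = min(h_max, -lam * g(x) / dgdt)   # shrink the step near it
        else:
            h = h_max                            # moving away: ordinary step
        x = x + h * f(x)                         # forward-Euler step
    return x                                     # state on the event surface

# Example: circular motion hitting the surface g(x) = x[1]
f = lambda x: np.array([-x[1], x[0]])
x_event = detect_event(f, g=lambda x: x[1],
                       grad_g=lambda x: np.array([0.0, 1.0]),
                       x0=np.array([1.0, -0.5]))
```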
Hysteresis
[Figure: parallel composition of an environment agent Env and a hysteresis agent Hyst. Env integrates d(x1) = u (diffEqn dX1); Hyst has output d(y) = 2u (diffEqn dY) and modes strMinus (iStrM, aStrM), up (iUp, aUp), and strPlus (iStrP, aStrP) with inc and dec transitions; u switches between -1 and 1 at the thresholds -(a+2), -a, a, and a+2, with guards such as x1 < a and the initial value x2 = -1]
Global versus Modular Simulation
• Hysteresis example
  • 2 levels of hierarchy
  • global state is two-dimensional
• Significant potential for more complex systems
[Plot: simulation traces of the hysteresis loop between -1 and 1 across the thresholds -(a+2), -a, a, a+2]
Current Implementation Status
• Work to date
  • CHARON semantics
  • parser for CHARON
  • internal representation
  • type checker
• Current work
  • modular simulation scheme
  • internal representation generator
[Toolchain diagram: CHARON specification -> CHARON parser -> type checker -> syntax tree -> internal representation generator -> internal representation -> control code generator, simulator generator, and model checker]
Reactive to Deliberative Schemes
• Reactive scheme is a composition of
  • go to target
  • collision avoidance
• Deliberative scheme
  • preplanned path around the nominal model
• Reactive schemes
  • robust and easy to implement
  • may be limited in terms of being able to accomplish complex tasks
  • may not compare favorably to recursive implementations of deliberative controllers
[Figure: nominal model of the environment with an obstacle]
Toward a composition of reactive and deliberative decision making
• u1 - vector field specified by a reactive planner
• u2 - vector field specified by a deliberative planner
• If u1 ∈ U and u2 ∈ U, then αu1 + (1 - α)u2 ∈ U for α ∈ [0, 1]
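A hedged sketch of what this closure property buys: if both planners output admissible commands, any convex blend is also admissible, so the weight α can be tuned continuously. The two fields below are illustrative stand-ins, not the actual planners:

```python
import numpy as np

# Hedged illustration of convex blending of two admissible vector fields.
# u1: reactive (pushes away from a sensed obstacle); u2: deliberative
# (follows a preplanned direction). Both are illustrative stand-ins.

def u_reactive(x, obstacle):
    d = x - obstacle
    return d / (np.linalg.norm(d)**3 + 1e-9)     # repulsive field

def u_deliberative(x, target):
    d = target - x
    return d / (np.linalg.norm(d) + 1e-9)        # unit vector toward target

def u_blend(x, obstacle, target, alpha):
    """alpha in [0, 1]: 1 = purely reactive, 0 = purely deliberative."""
    u = alpha * u_reactive(x, obstacle) + (1 - alpha) * u_deliberative(x, target)
    n = np.linalg.norm(u)
    return u / n if n > 1.0 else u               # saturate to stay inside U

x = np.array([0.0, 0.0])
print(u_blend(x, obstacle=np.array([1.0, 0.5]),
              target=np.array([5.0, 0.0]), alpha=0.3))
```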
Composition of reactive and deliberative planners
• Framework for decision making
  • U is the set of available control policies
  • Y is the uncertainty set
    • uncertainty in the environment model
    • uncertainty in dynamics
    • uncertainty in localization
• Best decision under the worst uncertainty (a min-max problem; see the sketch after this list)
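Formally, this is u* = argmin over u ∈ U of max over y ∈ Y of J(u, y). A hedged sketch over discretized policy and uncertainty sets; the cost J is a placeholder:

```python
# Hedged sketch of "best decision under the worst uncertainty" over
# discretized sets: u* = argmin_u max_y J(u, y). J is a placeholder cost.

def minimax_decision(U, Y, J):
    """U: candidate controls; Y: uncertainty samples; J(u, y): scalar cost."""
    best_u, best_worst = None, float("inf")
    for u in U:
        worst = max(J(u, y) for y in Y)     # worst uncertainty for this u
        if worst < best_worst:              # keep the control whose worst
            best_u, best_worst = u, worst   # case is least bad
    return best_u, best_worst

# Example: blend weights as the control set, model offset as uncertainty
U = [i / 10 for i in range(11)]             # alpha in {0.0, ..., 1.0}
Y = [-0.2, 0.0, 0.2]                        # perturbations of the model
J = lambda u, y: (u - 0.5 - y) ** 2         # placeholder cost
print(minimax_decision(U, Y, J))
```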
Results
• Minimization
  • weighting prior information against current information
  • resolving the discrepancy between prior and current plans
• Closure property of "basis behaviors" ensures robustness
• Requires a priori calculation of a roadmap
[Figures: the worst-case outcome vs. better-than-worst-case outcomes]
Detailed Analysis
• Global saddle point does not exist due to the non-smooth solution
[Figure: cost function with min-max and max-min values shown along cross-sections in x]
More Results
[Figures: open-loop plans vs. a recursive plan that is best under worst-case uncertainty]
Deliberative and Reactive Behaviors in a Dynamic Setting
• Obstacle dynamics are known; exact inputs are not
[Figure: robot, moving obstacle, and target]
Paradigm for Learning
• Hierarchical structure allows learning at several levels
• Lowest level
  • parameter estimation within each mode
  • algorithms for extracting the information state (features, position and velocity, high-level descriptors)
• Intermediate level
  • select the best mode for a situation
  • determine the best partitioning of states for a given information state
• Advanced level
  • transfer knowledge (programs, behaviors) between robots and humans
• Learning at any level forces changes at the others
[Diagram: sensory information -> information state -> situation partitions -> modes -> action space]
Reinforcement Learning and Robotics
• Successful RL (Kaelbling et al., 96)
  • low-dimensional discrete state space
  • 100,000s of training runs necessary
  • stochastic search required
• Robotic systems
  • large, continuous state space
  • a large number of training runs (e.g., 100,000) may not be practical
  • stochastic search not desirable
Boundary-Localized Reinforcement Learning
• Our approach to robot control
  • noisy state space
  • deterministic modes
• Our approach to reinforcement learning
  • search only the mode boundaries
  • ignore most of the state space
  • minimize stochastic search
    • RL using no stochastic search during learning
Mode Switching Controllers
[Figure: state space partitioned by parameterized mode boundaries; within each region a mode of operation (action ai) is executed]
Reinforcement Learning for Mode Switching Controllers
[Diagram: an initial guess at the boundary parameterization (prior knowledge) is refined through reinforcement feedback R into an "optimal" parameterization]
Reinforcement Learning
• Markov decision process
• Policy: a mapping from states to actions
• Reinforcement feedback from the environment: r_t
• Goal: modify the policy to maximize performance
• Policy-gradient formulation (see the sketch below)
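A hedged, minimal policy-gradient sketch in the REINFORCE style, as a stand-in for the formulation on the slide: the policy is a θ-parameterized Bernoulli choice between two modes, and θ moves along grad log π(a|x) weighted by the return. The environment function env_step is user-supplied:

```python
import numpy as np

# Hedged REINFORCE-style sketch (a stand-in for the slide's policy-gradient
# formulation): two modes, sigmoid switching policy pi(a=1|x) = sigmoid(theta.x).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def episode(theta, env_step, x0, T=50):
    """Roll out one episode; env_step(x, a) -> (x', r, done) is user-supplied.
    Returns per-step grad log pi and rewards."""
    x, grads, rewards = x0, [], []
    for _ in range(T):
        p = sigmoid(theta @ x)
        a = np.random.rand() < p                 # sample mode 0 or 1
        grads.append((float(a) - p) * x)         # grad_theta log pi(a|x)
        x, r, done = env_step(x, a)
        rewards.append(r)
        if done:
            break
    return grads, rewards

def reinforce_update(theta, grads, rewards, lr=0.01, gamma=0.99):
    G = 0.0
    for g, r in zip(reversed(grads), reversed(rewards)):
        G = r + gamma * G                        # return-to-go
        theta = theta + lr * G * g               # gradient ascent on reward
    return theta
```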
Why Policy-Gradient RL?
• Computation linear in the number of parameters θ
  • avoids the blow-up from discretization seen with other RL methods
• Generalization in the state space is implicitly defined by the parametric representation
  • generalization is important for high-dimensional problems
Key Result #1
• Any θ-parameterized probabilistic policy can be transformed into an approximately deterministic policy parameterized by θ
• Deterministic everywhere except near mode boundaries
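A hedged illustration of the transformation: sharpening a sigmoid switching policy with a temperature eps makes it effectively deterministic outside an O(eps) band around the boundary. The boundary function g is an arbitrary example, not one from the paper:

```python
import numpy as np

# Hedged illustration of Key Result #1: as the temperature eps shrinks, the
# probabilistic policy pi(a=1|x) = sigmoid(g(x)/eps) becomes deterministic
# everywhere except an O(eps) band around the boundary g(x) = 0.

g = lambda x: x - 0.5                        # example boundary function
pi = lambda x, eps: 1.0 / (1.0 + np.exp(-g(x) / eps))

for eps in (1.0, 0.1, 0.01):
    xs = np.array([0.0, 0.45, 0.5, 0.55, 1.0])
    print(eps, np.round(pi(xs, eps), 3))     # saturates to 0/1 away from 0.5
```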
Key Result #2
• Convergence to a locally optimal mode-switching policy is obtained by searching near the mode boundaries
• All other regions of the state space can be ignored
• This significantly reduces the search space
Stochastic Search Localized to Mode Boundaries
[Figure: stochastic search regions confined to narrow bands around the mode boundaries in the state space]
Key Result #3
• Reinforcement learning can locally optimize deterministic mode-switching policies without using stochastic search if
  • the robot takes small steps
  • the value of executing actions (Q) is smooth with respect to the state
• These conditions are met almost everywhere in typical robot applications
Deterministic Search at Mode Boundaries
[Figure: deterministic search regions along the mode boundaries in the state space]
Simulation
• State space: robot position
• Boundary definitions: Gaussian centers and widths
  • 2 parameters per Gaussian per dimension
  • 20 parameters total
• 2 types of modes: toward a goal, away from an obstacle
• Reward: +1 for reaching a goal, -1 for hitting an obstacle
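To make the parameterization concrete, here is a hedged sketch of how such Gaussian-defined mode boundaries might look; the specific form is our assumption, not necessarily the exact setup used in the experiments. Each Gaussian has a learned center and width, and the active mode switches where the Gaussian activation crosses a threshold:

```python
import numpy as np

# Hedged sketch of Gaussian-parameterized mode switching (our assumed form,
# not necessarily the exact setup): 5 Gaussians in 2-D -> 2 parameters per
# Gaussian per dimension -> 20 parameters, matching the slide's count.

centers = np.random.rand(5, 2) * 10.0        # learned: 5 x 2 = 10 params
widths = np.ones((5, 2))                     # learned: 5 x 2 = 10 params

def mode(x, threshold=0.5):
    """Pick 'avoid' near any Gaussian bump, 'to_goal' elsewhere."""
    z = (x - centers) / widths               # broadcast over the 5 Gaussians
    activation = np.exp(-0.5 * np.sum(z**2, axis=1)).max()
    return "avoid" if activation > threshold else "to_goal"

print(mode(np.array([5.0, 5.0])))
```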