DARPA ITO/MARS Program
Control and Coordination of Multiple Autonomous Robots
Vijay Kumar
GRASP Laboratory, University of Pennsylvania
http://www.cis.upenn.edu/mars
Motivation
• We are interested in the coordinated control of robots
  • manipulation
  • vision-based control
• Large number of modes
• Scalability
• Individual modes (behaviors) are well understood, but the interactions between them are not
• Software design: modes are designed bottom-up, protocols top-down
[MARS toolchain diagram: CHARON code (high-level language) at the center, with learning algorithms and analysis operating on it; a CHARON-to-Java translator, together with Java libraries and drivers, produces Java code; a control code generator and a simulator code generator produce the control code and the simulator, connected to a human interface]
Outline of the Talk • Language and software architecture • CHARON • agents and modes • examples • Reactive control algorithms • mode switching • hierarchical composition of reactive control algorithms • results • From reactive to deliberative schemes • Simulation • Reinforcement learning • learn mode switching and composition rules • Future work
Participants • Rajeev Alur • Aveek Das • Joel Esposito • Rafael Fierro • Radu Grosu • Greg Grudic • Yerang Hur • Vijay Kumar • Insup Lee • Ben Southall • John Spletzer • Camillo Taylor • Lyle Ungar
Architectural Hierarchy in CHARON
[Diagram: an Agent composed of sub-agents Agent1 and Agent2, each containing sensor, processor, and actuator components connected through input and output ports]
Each agent can be represented as a parallel composition of sub-agents.
Behavioral Hierarchy in CHARON
[Diagram: a main mode containing submodes awayTarget, atTarget, sensing, and control, connected through entry and exit ports]
• Each agent consists of modes or behaviors
• Modes can in turn consist of submodes
CHARON
• Individual components described as agents
  • composition, instantiation, and hiding
• Individual behaviors described as modes
  • encapsulation, instantiation, and scoping
• Support for concurrency
  • shared variables as well as message passing
• Support for discrete and continuous behavior
• Well-defined formal semantics
Reactive Behaviors Based on Vision
[Block diagram: a vision pipeline from the frame grabber through feature extraction (edge detector, color blob finder) and estimation (range mapper, robot position estimator, target detector, collision detector) to the behaviors (pursue, avoid obstacle, collision recovery), the motion controller, and the actuators]
Robot Agent

```
robotController = rmapper || cdetector || explore

rmapper   = rangeMapper()       [rangeMap = rM];
cdetector = collisionDetector() [collisionDetected = cD];
explore   = obstacleAvoider()   [collisionDetected, rangeMap = cD, rM];

agent explore()           { mode top = exploreTopMode() }
agent rangeMapper()       { mode top = rangeMapperTopMode() }
agent collisionDetector() { mode top = collisionDetectorTopMode() }
```
Collision Recovery

```
mode collisionRecoveryMode(real recoveryDuration, real c) {
  entry enPt;
  exit exPt;
  readWrite diff analog real x;
  readWrite diff analog real phi;
  readWrite diff analog real recoveryTimer;
  diffEqn dRecovery { d(x) = -c; d(phi) = 0; d(recoveryTimer) = 1.0 }
  inv invRecovery { 0.0 <= recoveryTimer && recoveryTimer <= recoveryDuration }
} // end of mode collisionRecoveryMode
```
Obstacle Avoidance

```
mode obAvoidanceMode() {
  entry enPt;
  exit exPt;
  read discrete bool collisionDetected;
  read RangeMap rangeMap;
  readWrite diff analog real x;
  readWrite diff analog real phi;
  diffEqn dObAvoidance { d(x) = computeSpeed(rangeMap);
                         d(phi) = computeAngle(rangeMap) }
  inv invObAvoidance { collisionDetected == false }
  initTrans from obAvoidanceMode to obAvoidanceMode
    when true do { x = 0.0; phi = 0.0 }
}
```
Explore

```
mode exploreTopMode() {
  entry enPt;
  read discrete bool collisionDetected;
  read RangeMap rangeMap;
  private diff analog real recoveryTimer;

  mode obAvoidance = obAvoidanceMode()
  mode collisionRecovery = collisionRecoveryMode(recoveryDuration, c)

  initTrans from exploreTopMode to obAvoidance
    when true do { recoveryDuration = 10.0; c = 1.0 }    // initialization

  trans OaToCr from obAvoidance.exPt to collisionRecovery.enPt
    when (collisionDetected == true) do {}

  trans CrToOa from collisionRecovery.exPt to obAvoidance.enPt
    when (recoveryTimer == recoveryDuration) do { recoveryTimer = 0.0 }  // reset the timer
}
```
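The switching logic above is simple enough to sanity-check outside CHARON. Below is a minimal Python sketch, not generated code, that mirrors the two-mode explore automaton; computeSpeed and computeAngle are stubbed, and the function names are ours:

```python
# Minimal Python sketch of the explore automaton above (hypothetical stubs,
# not CHARON-generated code): obstacle avoidance until a collision is
# detected, then a fixed-duration backup, then back to avoidance.

RECOVERY_DURATION = 10.0   # matches the initTrans initialization
C = 1.0                    # backup speed

def compute_speed(range_map):   # stub for the real range-map controller
    return 0.5

def compute_angle(range_map):   # stub
    return 0.0

def step(state, range_map, collision_detected, dt=0.01):
    """One integration step of the hybrid system; state is a dict."""
    if state["mode"] == "obAvoidance":
        state["x"] += compute_speed(range_map) * dt
        state["phi"] += compute_angle(range_map) * dt
        if collision_detected:                    # guard of trans OaToCr
            state["mode"] = "collisionRecovery"
            state["timer"] = 0.0
    else:  # collisionRecovery: back up with phi held constant
        state["x"] -= C * dt
        state["timer"] += dt
        if state["timer"] >= RECOVERY_DURATION:   # guard of trans CrToOa
            state["mode"] = "obAvoidance"
            state["timer"] = 0.0
    return state

state = {"mode": "obAvoidance", "x": 0.0, "phi": 0.0, "timer": 0.0}
```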
[Figure: the Explore automaton; mode obAvoidance with flow dObAvoidance (d(x) = k1*r, d(phi) = -k2*phi) and mode collisionRecovery with flow dRecovery (d(x) = -c, d(phi) = 0, d(recoveryTimer) = 1); both modes read rangeMap and collisionDetected, with a "collision" transition into recovery and a "timeOut" transition back]
Vision-Based Control with Mobile Robots
• Mode switching
• Multiple levels of abstraction of data from sensors
• Parallel composition of software agents
• obstacle avoidance
• wall following
Multiagent Control
[Figure: hybrid automaton for Robot1 with a local diff analog timer; modes awayTarget (dPlan, iAway), atTarget (dStop, iAt), moving (dSteer, aOmega, iFreq), and sensing (dStop, iConst), connected by arrive, move, and sense transitions; flows and guards include d(pos.x) = v * cos(phi), d(pos.y) = v * sin(phi), omega = k * (theta - phi), d(timer) = 1, pos = target, and timer/updateFreq = 0; estimates r1Est1, r1Est2, r2Est1, r2Est2 are exchanged between the robots]
Modular Simulation
• Goal
  • simulation that is efficient and accurate
  • integration of modes at different time scales
  • integration of agents at different time scales
• Modes are simulated using local information
  • submodes are regarded as black boxes
  • submodes are simulated independently of one another
• Agents are simulated using local information
  • agents are regarded as black boxes
  • agents are simulated independently of one another
Time Round of a Mode (Agent)
1. Get the integration time d and the invariants from the supermode (or the scheduler).
2. While (t <= d), starting from t = 0:
  • simplify all invariants;
  • predict the integration step dt based on d and the invariants;
  • execute the time round of the active submode and get its state s and the time elapsed e;
  • integrate for time e and get the new state s;
  • return s and t+e if any invariant was violated;
  • increment t = t+e.
3. Return s and d.
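A compact way to see this recursion is as each mode exposing a time-round method that its supermode calls. Here is a hedged Python sketch; the class and method names are ours, not the CHARON simulator's API:

```python
# Hedged sketch of the recursive time round (our naming, not the actual
# CHARON simulator API): each mode integrates its own flow, treats its
# active submode as a black box, and stops early on invariant violation.

class Mode:
    def __init__(self, name, flow, invariant, submode=None):
        self.name = name            # the state variable this mode owns
        self.flow = flow            # flow(state) -> d(var)/dt
        self.invariant = invariant  # invariant(state) -> bool
        self.submode = submode      # active submode, or None

    def time_round(self, state, d, dt_max=0.01):
        """Advance up to time d; return (state, time actually consumed)."""
        t = 0.0
        while t < d:
            dt = min(dt_max, d - t)               # predicted step
            if self.submode is not None:          # black-box submode first
                state, e = self.submode.time_round(state, dt)
            else:
                e = dt
            state[self.name] += self.flow(state) * e   # forward-Euler step
            t += e
            if not self.invariant(state):         # invariant violated: stop
                return state, t
        return state, t

# Usage: an outer mode whose variable x is driven by an inner mode's y
inner = Mode("y", flow=lambda s: -s["y"], invariant=lambda s: True)
outer = Mode("x", flow=lambda s: s["y"],
             invariant=lambda s: s["x"] < 1.0, submode=inner)
state, t = outer.time_round({"x": 0.0, "y": 1.0}, d=2.0)
```

A supermode's time round simply wraps its submode's, so the behavioral hierarchy of the earlier slides falls out of the recursion.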
Modular Simulation - Global Execution
[Timeline figure: agents A1, A2, A3 at their reached times, with interval d, overshoot dt, and actual increment e]
1. Pick the agents with the minimum and second-minimum reached time.
2. Compute the time-round interval d for the minimum agent (here A2) such that its absolute time may exceed the time reached by the second-minimum agent (A1) by at most dt.
3. The time round may end before the interval d is consumed if the invariants of A2 are violated; the actual time increment is then some e <= d.
4. Agent A2 executes an update round to synchronize its discrete variables with the analog ones.
5. The state of A2 becomes visible to the other agents.
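The scheduling rule itself is small. A hedged sketch, again with our own naming, built on top of the Mode.time_round sketch above:

```python
import heapq

# Hedged sketch of the global scheduler (our naming): always advance the
# agent that is furthest behind, by at most the gap to the next agent
# plus the allowed overshoot dt.

def simulate(agents, t_end, dt_slack=0.01):
    """agents: list of [local_time, state, mode] entries."""
    heap = [(entry[0], i) for i, entry in enumerate(agents)]
    heapq.heapify(heap)
    while heap and heap[0][0] < t_end:
        t_min, i = heapq.heappop(heap)
        t_next = heap[0][0] if heap else t_end    # second-minimum time
        d = (t_next - t_min) + dt_slack           # allowed overshoot
        _, state, mode = agents[i]
        state, e = mode.time_round(state, d)      # may stop early (e < d)
        agents[i] = [t_min + e, state, mode]      # update round goes here
        heapq.heappush(heap, (t_min + e, i))
    return agents
```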
Modular Multi-rate Simulation
[Figure: coupled components x1, x2, x3 integrated with different time step sizes; the ratio of the largest to the smallest step size is held constant]
Use a different time step for each component to exploit multiple time scales, increasing efficiency.
• "Slowest-first" order of integration
• Coupling is accommodated by using interpolants for the slow variables
• Tight error bound: O(h^(m+1))
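To make "slowest-first with interpolants" concrete, here is a hedged two-component sketch using forward Euler and linear interpolation; the 10:1 rate ratio and the example dynamics are arbitrary choices for illustration:

```python
# Hedged sketch of slowest-first multi-rate integration for a coupled pair
#   x' = f(x, y)  (slow),   y' = g(x, y)  (fast).
# The slow variable is integrated first with a big step H; the fast variable
# then takes small steps h, seeing the slow state through an interpolant.

def multirate_step(x, y, f, g, H, ratio=10):
    x_new = x + H * f(x, y)                  # slowest first: one big step
    h = H / ratio
    for k in range(ratio):
        s = (k + 0.5) * h / H                # midpoint of the fast substep
        x_interp = (1 - s) * x + s * x_new   # linear interpolant for x
        y = y + h * g(x_interp, y)           # fast variable: small steps
    return x_new, y

# Example: slow drift coupled to a fast relaxation
x, y = 1.0, 0.0
for _ in range(100):
    x, y = multirate_step(x, y,
                          f=lambda x, y: -0.1 * x + 0.01 * y,
                          g=lambda x, y: -5.0 * (y - x),
                          H=0.1)
```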
Simulation and Analysis of Hierarchical Systems with Mode Switching
• Synthesis of controllers
  • include models of uncertainty
• Sensor fusion
  • include models of noise
• Modular simulation
• Automatic detection of events
  • mode-switching transitions, guards
Traditional Model of Hierarchy
• NASREM architecture [Albus, 80]
• Implementations
  • Demo III
  • NASA robotic systems
Event Detection
Given an event function g(x) and the trajectory x(t) of the dynamics, we re-parameterize time by controlling the integration step size: using feedback linearization, we select our "speed" (step size) along the integral curves so that the simulation converges to the event surface g(x) = 0 instead of stepping over it.
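One way to realize this, as a hedged sketch rather than the exact scheme in the MARS tools: treat g(x) as an output of the simulation and pick each Euler step h so that g decays like a stable first-order system, g <- (1 - lam) * g. To first order, g(x + h f(x)) ~ g + h * (grad g . f), which gives h = -lam * g / (grad g . f):

```python
import numpy as np

# Hedged sketch of event detection by step-size control: choose the Euler
# step h so the event function decays geometrically, g_{k+1} ~ (1 - lam) g_k
# (feedback linearization with g viewed as the system output).

def detect_event(f, g, grad_g, x0, lam=0.5, h_max=0.01, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    while abs(g(x)) > tol:
        dgdt = grad_g(x) @ f(x)                  # Lie derivative of g along f
        if dgdt * g(x) < 0:                      # heading toward the surface:
            h = min(h_max, -lam * g(x) / dgdt)   # shrink the step near it
        else:
            h = h_max                            # moving away: ordinary step
        x = x + h * f(x)                         # forward-Euler step
    return x                                     # state on the event surface

# Example: circular motion hitting the surface g(x) = x[1]
f = lambda x: np.array([-x[1], x[0]])
x_event = detect_event(f, g=lambda x: x[1],
                       grad_g=lambda x: np.array([0.0, 1.0]),
                       x0=np.array([1.0, -0.5]))
```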
Hysteresis
[Figure: parallel composition of an environment agent Env and a hysteresis agent Hyst. Env integrates d(x1) = u (diffEqn dX1); Hyst has output d(y) = 2u (diffEqn dY) and modes strMinus (iStrM, aStrM), up (iUp, aUp), and strPlus (iStrP, aStrP) with inc and dec transitions; u switches between -1 and 1 at the thresholds -(a+2), -a, a, and a+2, with guards such as x1 < a and the initial value x2 = -1]
Global versus Modular Simulation
• Hysteresis example
  • 2 levels of hierarchy
  • global state is two-dimensional
• Significant potential for more complex systems
[Plot: simulation traces of the hysteresis loop between -1 and 1 across the thresholds -(a+2), -a, a, a+2]
Current Implementation Status
• Work to date
  • CHARON semantics
  • parser for CHARON
  • internal representation
  • type checker
• Current work
  • modular simulation scheme
  • internal representation generator
[Toolchain diagram: CHARON specification -> CHARON parser -> type checker -> syntax tree -> internal representation generator -> internal representation -> control code generator, simulator generator, and model checker]
Reactive to Deliberative Schemes
• Reactive scheme is a composition of
  • go to target
  • collision avoidance
• Deliberative scheme
  • preplanned path around the nominal model
• Reactive schemes
  • robust and easy to implement
  • may be limited in terms of being able to accomplish complex tasks
  • may not compare favorably to recursive implementations of deliberative controllers
[Figure: nominal model of the environment with an obstacle]
Toward a composition of reactive and deliberative decision making
• u1 - vector field specified by a reactive planner
• u2 - vector field specified by a deliberative planner
• If u1 ∈ U and u2 ∈ U, then αu1 + (1 - α)u2 ∈ U for α ∈ [0, 1]
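A hedged sketch of what this closure property buys: if both planners output admissible commands, any convex blend is also admissible, so the weight α can be tuned continuously. The two fields below are illustrative stand-ins, not the actual planners:

```python
import numpy as np

# Hedged illustration of convex blending of two admissible vector fields.
# u1: reactive (pushes away from a sensed obstacle); u2: deliberative
# (follows a preplanned direction). Both are illustrative stand-ins.

def u_reactive(x, obstacle):
    d = x - obstacle
    return d / (np.linalg.norm(d)**3 + 1e-9)     # repulsive field

def u_deliberative(x, target):
    d = target - x
    return d / (np.linalg.norm(d) + 1e-9)        # unit vector toward target

def u_blend(x, obstacle, target, alpha):
    """alpha in [0, 1]: 1 = purely reactive, 0 = purely deliberative."""
    u = alpha * u_reactive(x, obstacle) + (1 - alpha) * u_deliberative(x, target)
    n = np.linalg.norm(u)
    return u / n if n > 1.0 else u               # saturate to stay inside U

x = np.array([0.0, 0.0])
print(u_blend(x, obstacle=np.array([1.0, 0.5]),
              target=np.array([5.0, 0.0]), alpha=0.3))
```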
Composition of reactive and deliberative planners
• Framework for decision making
  • U is the set of available control policies
  • Y is the uncertainty set
    • uncertainty in the environment model
    • uncertainty in dynamics
    • uncertainty in localization
• Best decision under the worst uncertainty (a min-max problem; see the sketch after this list)
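Formally, this is u* = argmin over u ∈ U of max over y ∈ Y of J(u, y). A hedged sketch over discretized policy and uncertainty sets; the cost J is a placeholder:

```python
# Hedged sketch of "best decision under the worst uncertainty" over
# discretized sets: u* = argmin_u max_y J(u, y). J is a placeholder cost.

def minimax_decision(U, Y, J):
    """U: candidate controls; Y: uncertainty samples; J(u, y): scalar cost."""
    best_u, best_worst = None, float("inf")
    for u in U:
        worst = max(J(u, y) for y in Y)     # worst uncertainty for this u
        if worst < best_worst:              # keep the control whose worst
            best_u, best_worst = u, worst   # case is least bad
    return best_u, best_worst

# Example: blend weights as the control set, model offset as uncertainty
U = [i / 10 for i in range(11)]             # alpha in {0.0, ..., 1.0}
Y = [-0.2, 0.0, 0.2]                        # perturbations of the model
J = lambda u, y: (u - 0.5 - y) ** 2         # placeholder cost
print(minimax_decision(U, Y, J))
```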
Results
• Minimization
  • weighting prior information against current information
  • resolving the discrepancy between prior and current plans
• Closure property of "basis behaviors" ensures robustness
• Requires a priori calculation of a roadmap
[Figures: the worst-case outcome vs. better-than-worst-case outcomes]
Detailed Analysis
• Global saddle point does not exist due to the non-smooth solution
[Figure: cost function with min-max and max-min values shown along cross-sections in x]
More Results
[Figures: open-loop plans vs. a recursive plan that is best under worst-case uncertainty]
Deliberative and Reactive Behaviors in a Dynamic Setting
• Obstacle dynamics are known; exact inputs are not
[Figure: robot, moving obstacle, and target]
Paradigm for Learning
• Hierarchical structure allows learning at several levels
• Lowest level
  • parameter estimation within each mode
  • algorithms for extracting the information state (features, position and velocity, high-level descriptors)
• Intermediate level
  • select the best mode for a situation
  • determine the best partitioning of states for a given information state
• Advanced level
  • transfer knowledge (programs, behaviors) between robots and humans
• Learning at any level forces changes at the others
[Diagram: sensory information -> information state -> situation partitions -> modes -> action space]
Reinforcement Learning and Robotics
• Successful RL (Kaelbling et al., 96)
  • low-dimensional discrete state space
  • 100,000s of training runs necessary
  • stochastic search required
• Robotic systems
  • large, continuous state space
  • a large number of training runs (e.g., 100,000) may not be practical
  • stochastic search not desirable
Boundary-Localized Reinforcement Learning
• Our approach to robot control
  • noisy state space
  • deterministic modes
• Our approach to reinforcement learning
  • search only the mode boundaries
  • ignore most of the state space
  • minimize stochastic search
    • RL using no stochastic search during learning
Mode Switching Controllers
[Figure: state space partitioned by parameterized mode boundaries; within each region a mode of operation (action ai) is executed]
Reinforcement Learning for Mode Switching Controllers
[Diagram: an initial guess at the boundary parameterization (prior knowledge) is refined through reinforcement feedback R into an "optimal" parameterization]
Reinforcement Learning
• Markov decision process
• Policy: a mapping from states to actions
• Reinforcement feedback from the environment: r_t
• Goal: modify the policy to maximize performance
• Policy-gradient formulation (see the sketch below)
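A hedged, minimal policy-gradient sketch in the REINFORCE style, as a stand-in for the formulation on the slide: the policy is a θ-parameterized Bernoulli choice between two modes, and θ moves along grad log π(a|x) weighted by the return. The environment function env_step is user-supplied:

```python
import numpy as np

# Hedged REINFORCE-style sketch (a stand-in for the slide's policy-gradient
# formulation): two modes, sigmoid switching policy pi(a=1|x) = sigmoid(theta.x).

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def episode(theta, env_step, x0, T=50):
    """Roll out one episode; env_step(x, a) -> (x', r, done) is user-supplied.
    Returns per-step grad log pi and rewards."""
    x, grads, rewards = x0, [], []
    for _ in range(T):
        p = sigmoid(theta @ x)
        a = np.random.rand() < p                 # sample mode 0 or 1
        grads.append((float(a) - p) * x)         # grad_theta log pi(a|x)
        x, r, done = env_step(x, a)
        rewards.append(r)
        if done:
            break
    return grads, rewards

def reinforce_update(theta, grads, rewards, lr=0.01, gamma=0.99):
    G = 0.0
    for g, r in zip(reversed(grads), reversed(rewards)):
        G = r + gamma * G                        # return-to-go
        theta = theta + lr * G * g               # gradient ascent on reward
    return theta
```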
Why Policy-Gradient RL?
• Computation linear in the number of parameters θ
  • avoids the blow-up from discretization seen with other RL methods
• Generalization in the state space is implicitly defined by the parametric representation
  • generalization is important for high-dimensional problems
Key Result #1
• Any θ-parameterized probabilistic policy can be transformed into an approximately deterministic policy parameterized by θ
• Deterministic everywhere except near mode boundaries
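A hedged illustration of the transformation: sharpening a sigmoid switching policy with a temperature eps makes it effectively deterministic outside an O(eps) band around the boundary. The boundary function g is an arbitrary example, not one from the paper:

```python
import numpy as np

# Hedged illustration of Key Result #1: as the temperature eps shrinks, the
# probabilistic policy pi(a=1|x) = sigmoid(g(x)/eps) becomes deterministic
# everywhere except an O(eps) band around the boundary g(x) = 0.

g = lambda x: x - 0.5                        # example boundary function
pi = lambda x, eps: 1.0 / (1.0 + np.exp(-g(x) / eps))

for eps in (1.0, 0.1, 0.01):
    xs = np.array([0.0, 0.45, 0.5, 0.55, 1.0])
    print(eps, np.round(pi(xs, eps), 3))     # saturates to 0/1 away from 0.5
```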
Key Result #2
• Convergence to a locally optimal mode-switching policy is obtained by searching near the mode boundaries
• All other regions of the state space can be ignored
• This significantly reduces the search space
Stochastic Search Localized to Mode Boundaries
[Figure: stochastic search regions confined to narrow bands around the mode boundaries in the state space]
Key Result #3
• Reinforcement learning can locally optimize deterministic mode-switching policies without using stochastic search if
  • the robot takes small steps
  • the value of executing actions (Q) is smooth with respect to the state
• These conditions are met almost everywhere in typical robot applications
Deterministic Search at Mode Boundaries
[Figure: deterministic search regions along the mode boundaries in the state space]
Simulation
• State space: robot position
• Boundary definitions: Gaussian centers and widths
  • 2 parameters per Gaussian per dimension
  • 20 parameters total
• 2 types of modes: toward a goal, away from an obstacle
• Reward: +1 for reaching a goal, -1 for hitting an obstacle
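To make the parameterization concrete, here is a hedged sketch of how such Gaussian-defined mode boundaries might look; the specific form is our assumption, not necessarily the exact setup used in the experiments. Each Gaussian has a learned center and width, and the active mode switches where the Gaussian activation crosses a threshold:

```python
import numpy as np

# Hedged sketch of Gaussian-parameterized mode switching (our assumed form,
# not necessarily the exact setup): 5 Gaussians in 2-D -> 2 parameters per
# Gaussian per dimension -> 20 parameters, matching the slide's count.

centers = np.random.rand(5, 2) * 10.0        # learned: 5 x 2 = 10 params
widths = np.ones((5, 2))                     # learned: 5 x 2 = 10 params

def mode(x, threshold=0.5):
    """Pick 'avoid' near any Gaussian bump, 'to_goal' elsewhere."""
    z = (x - centers) / widths               # broadcast over the 5 Gaussians
    activation = np.exp(-0.5 * np.sum(z**2, axis=1)).max()
    return "avoid" if activation > threshold else "to_goal"

print(mode(np.array([5.0, 5.0])))
```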