Learning Control Knowledge for Planning Yi-Cheng Huang
Outline I. Brief overview of planning II. Planning with Control knowledge III. Learning control knowledge IV. Conclusion
I. Overview of Planning • Planning - a very general framework for many applications: • Robot control; • Airline scheduling; • Hubble space telescope control. • Planning – find a sequence of actions that leads from an initial state to a goal state.
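As a minimal sketch of this definition, the breadth-first forward search below finds a shortest action sequence from an initial state to a goal state. The set-based state encoding, the `plan_bfs` name, and the one-package problem are illustrative assumptions, not taken from the slides.

```python
from collections import deque

def plan_bfs(initial, goal, actions):
    """Breadth-first forward search over states.

    initial: frozenset of facts; goal: set of facts;
    actions: list of (name, preconditions, add_list, delete_list).
    Returns a shortest list of action names, or None if no plan exists.
    """
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:                      # every goal fact holds
            return path
        for name, pre, add, delete in actions:
            if pre <= state:                   # action is applicable
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [name]))
    return None

# Hypothetical problem: one package a, one plane P, two airports.
ACTIONS = [
    ("load(a,P,BOS)", {"at(a,BOS)", "at(P,BOS)"}, {"in(a,P)"}, {"at(a,BOS)"}),
    ("fly(P,BOS,SFO)", {"at(P,BOS)"}, {"at(P,SFO)"}, {"at(P,BOS)"}),
    ("unload(a,P,SFO)", {"in(a,P)", "at(P,SFO)"}, {"at(a,SFO)"}, {"in(a,P)"}),
]
INITIAL = frozenset({"at(a,BOS)", "at(P,BOS)"})
GOAL = {"at(a,SFO)"}

print(plan_bfs(INITIAL, GOAL, ACTIONS))
```

Uninformed search like this visits every reachable state in the worst case, which is why the rest of the talk focuses on ways to avoid search.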
Planning Is Difficult – Abundance of Negative Complexity Results • Domain-independent planning: PSPACE-complete or worse (Chapman 1987; Bylander 1991; Backstrom 1993). • Domain-dependent planning: NP-complete or worse (Chenoweth 1991; Gupta and Nau 1992). • Approximate planning: NP-complete or worse (Selman 1994).
Recent State-of-the-art Planners • Constraint-based planners – Graphplan, Blackbox. • Heuristic search planners – HSP, FF. • Both kinds of planners can solve, in seconds or minutes, problems that take traditional planners hours or days.
Graphplan (Blum & Furst, 1995) [Figure: planning graph with alternating layers of facts and actions, from time i to time i+1] Search on the planning graph to find a plan.
Blackbox (Kautz & Selman, 1999) [Figure: problem → satisfiability tester (Chaff, WalkSat, Satz, RelSat, ...) → plan]
Heuristic Search Based Planning (Bonet & Geffner, ’97) • Use various heuristic functions to approximate the distance from the current state to the goal state based on the planning graph. • Use Best-First Search or A* search to find plans.
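The idea can be sketched as a greedy best-first search. Note that the sketch swaps in a simple goal-count heuristic (the number of goal facts not yet true) in place of the planning-graph-based estimates mentioned on the slide; the function names and the two-package problem are hypothetical.

```python
import heapq
from itertools import count

def unsatisfied_goals(state, goal):
    """Goal-count heuristic (an assumption): goal facts that are still false."""
    return len(goal - state)

def best_first_plan(initial, goal, actions, h=unsatisfied_goals):
    """Greedy best-first search: always expand the state with the lowest h."""
    tie = count()  # unique tie-breaker so the heap never compares states
    frontier = [(h(initial, goal), next(tie), initial, [])]
    seen = {initial}
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if goal <= state:
            return path
        for name, pre, add, delete in actions:
            if pre <= state:
                nxt = frozenset((state - delete) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (h(nxt, goal), next(tie), nxt, path + [name]))
    return None

def execute(state, plan, actions):
    """Apply a plan step by step and return the final state."""
    table = {name: (pre, add, delete) for name, pre, add, delete in actions}
    for name in plan:
        pre, add, delete = table[name]
        assert pre <= state, f"{name} is not applicable"
        state = frozenset((state - delete) | add)
    return state

# Hypothetical problem: packages a and b, plane P, airports BOS and SFO.
ACTIONS = [
    ("load(a)", {"at(a,BOS)", "at(P,BOS)"}, {"in(a,P)"}, {"at(a,BOS)"}),
    ("load(b)", {"at(b,BOS)", "at(P,BOS)"}, {"in(b,P)"}, {"at(b,BOS)"}),
    ("fly(BOS,SFO)", {"at(P,BOS)"}, {"at(P,SFO)"}, {"at(P,BOS)"}),
    ("unload(a)", {"in(a,P)", "at(P,SFO)"}, {"at(a,SFO)"}, {"in(a,P)"}),
    ("unload(b)", {"in(b,P)", "at(P,SFO)"}, {"at(b,SFO)"}, {"in(b,P)"}),
]
INITIAL = frozenset({"at(a,BOS)", "at(b,BOS)", "at(P,BOS)"})
GOAL = frozenset({"at(a,SFO)", "at(b,SFO)"})

PLAN = best_first_plan(INITIAL, GOAL, ACTIONS)
```

Greedy best-first search gives no guarantee on plan length; A* would add the path cost to `h` and requires an admissible heuristic to guarantee optimality.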
II. Planning With Control • General focus in planning: avoid search as much as possible. • Many real-world applications are tailored and simplified by domain-specific knowledge. • TLPlan is an efficient planner that uses control knowledge to guide a forward-chaining search (Bacchus & Kabanza 2000).
TLPlan Temporal Logic Control Formula
A Simple Control Rule Example □ (goal(at(obj loc)) ∧ at(obj loc) ⇒ ○ at(obj loc)) Temporal logic operators: □ “always”, ○ “next”. Do NOT move an object that is already at its goal location.
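To illustrate how such a rule prunes a forward-chaining search, the sketch below checks each candidate successor state and discards those that move an object away from its goal location. TLPlan itself handles general temporal formulas; this sketch hard-codes the one-step consequence of this single rule. The fact encoding as tuples and all names are assumptions made for illustration.

```python
def violates_rule(state, next_state, goal):
    """One-step check of: always(goal(at(o,l)) and at(o,l) => next at(o,l))."""
    return any(
        fact[0] == "at" and fact in state and fact not in next_state
        for fact in goal
    )

def pruned_successors(state, successors, goal):
    """Keep only the successors that do not violate the control rule."""
    return [(name, nxt) for name, nxt in successors
            if not violates_rule(state, nxt, goal)]

# Hypothetical situation: package a is already at its goal location SFO.
GOAL = {("at", "a", "SFO")}
STATE = frozenset({("at", "a", "SFO"), ("at", "truck", "SFO")})
SUCCESSORS = [
    # Loading a would move it away from its goal location: pruned.
    ("load(a,truck,SFO)", frozenset({("in", "a", "truck"), ("at", "truck", "SFO")})),
    # Driving the empty truck leaves a in place: kept.
    ("drive(truck,SFO,OAK)", frozenset({("at", "a", "SFO"), ("at", "truck", "OAK")})),
]

KEPT = [name for name, _ in pruned_successors(STATE, SUCCESSORS, GOAL)]
```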
Question: Can the same level of control be effectively incorporated into a constraint-based planner?
Control Rule Categories • Rules that involve only static information. • Rules that depend on the current state. • Rules that depend on the current state and require dynamic user-defined predicates.
Category I Control Rules (depend only on the goal; toy example) [Figure: package a, location L, and the goal for a] Do NOT unload a package from an airplane if the current location is not in the package’s goal.
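Because a rule of this kind depends only on the goal and on the action's own parameters, violating actions can be removed before any search starts. The sketch below filters a list of ground actions accordingly; the tuple encoding, the `prune_unloads` name, and the example data are assumptions for illustration.

```python
def prune_unloads(ground_actions, goal_at):
    """Drop UnloadAirplane actions at locations that are not the package's goal.

    ground_actions: list of (name, (plane, package, location)).
    goal_at: dict mapping a package to its goal location.
    """
    kept = []
    for name, args in ground_actions:
        if name == "UnloadAirplane":
            _plane, pkg, loc = args
            if pkg in goal_at and goal_at[pkg] != loc:
                continue  # static rule: this action can never be useful
        kept.append((name, args))
    return kept

# Hypothetical ground actions for a package a whose goal location is SFO.
GROUND_ACTIONS = [
    ("LoadAirplane", ("P", "a", "BOS")),
    ("UnloadAirplane", ("P", "a", "BOS")),
    ("UnloadAirplane", ("P", "a", "NYC")),
    ("UnloadAirplane", ("P", "a", "SFO")),
]
KEPT = prune_unloads(GROUND_ACTIONS, {"a": "SFO"})
```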
Pruning the Planning Graph: Category I Rules [Figure: planning graph layers of facts and actions]
Category II Control Rules [Figure: package a and location L] Do NOT move an airplane if there is an object in the airplane that needs to be unloaded at that location.
Control by Adding Constraints [Figure: Temporal Logic Control Rules; Planning Formula; Constraints; Clauses]
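As a rough sketch of what adding constraints could look like for the Category II rule above, the code below emits one clause per time step, airplane, package, and destination, each forbidding a flight away from the package's goal airport while the package is on board. The proposition names (`in`, `at`, `fly`), the string encoding of literals, and the reading of "needs to be unloaded at that location" as "that location is the package's goal" are all assumptions; the slides do not show Blackbox's actual encoding.

```python
from itertools import product

def category2_clauses(planes, goal_at, airports, horizon):
    """Clauses (lists of literals; a leading '-' means negation) of the form
    NOT in(pkg,plane,t) OR NOT at(plane,goal,t) OR NOT fly(plane,goal,dest,t)."""
    clauses = []
    for t, plane, (pkg, goal_apt) in product(range(1, horizon + 1), planes, goal_at.items()):
        for dest in airports:
            if dest == goal_apt:
                continue  # a flight must leave the goal airport to be forbidden
            clauses.append([
                f"-in({pkg},{plane},{t})",
                f"-at({plane},{goal_apt},{t})",
                f"-fly({plane},{goal_apt},{dest},{t})",
            ])
    return clauses

# Hypothetical instance: one plane, one package with goal NYC, two time steps.
CLAUSES = category2_clauses(["P"], {"a": "NYC"}, ["BOS", "NYC", "SFO"], horizon=2)
```

Clauses of this shape would simply be appended to the planning formula before it is handed to the satisfiability tester.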
Rules Without Compact Encoding [Figure: packages a and b, the goal, and cities SFO, DC, NYC, ORL] Do NOT move a vehicle unless (a) there is an object that needs to be picked up, or (b) there is an object in the vehicle that needs to be unloaded.
Complex Encoding for Category III Rules • Need to define extra predicates: need_to_move_by_airplane; need_to_unload_by_airplane. • Introduces extra literals and clauses: O(mn) ground literals and O(mn + km^2) clauses at each time step (m: number of cities, n: number of objects, k: number of airports). • No easy encoding for category III rules. • However, it appears that category I and II rules do most of the work.
Blackbox with Control Knowledge (Logistics domain with hand-coded rules) Note: Logarithmic time scale
Comparison of Blackbox and TLPlan (parallel plan length; “plan quality”)
Summary: Adding Control Knowledge • We have shown how to add declarative control knowledge to a constraint-based planner by using temporal logic statements. • Adding such knowledge gives significant speedups (up to two orders of magnitude). • Pure heuristic search with control can still be faster, but with much lower plan quality.
Motivation • Control rules used in TLPlan and Blackbox are hand-coded. • Idea: learn control rules from a sequence of small problems solved by the planner.
Learning System Framework [Figure: Problem; Blackbox Planner; Plan Justification / Type Inference; ILP Learning Module / Verification; Control Rules]
Target Concepts for Actions • Action Select Rule: indicates conditions under which the action can be performed immediately. • Action Reject Rule: indicates conditions under which the action must not be performed.
Basic Assumption on Learning Control • Plans found by the planner on simple problems are optimal or near-optimal. • Actions that appear in an optimal plan must be selected. • Actions that can be executed but do not appear in the plan must be rejected.
Definition • Real action: an action that appears in the plan. • Virtual action: an action whose preconditions hold but that does not appear in the plan.
A Toy Planning Example [Figure: initial and goal positions of packages a and b among airports BOS, NYC, and SFO]
Real & Virtual Actions for UnloadAirplane
Time 1: LoadAirplane (P a BOS)
Time 2: FlyAirplane (P SFO NYC); UnloadAirplane (P a BOS)
Time 3: LoadAirplane (P b NYC); UnloadAirplane (P a NYC)
Time 4: FlyAirplane (P NYC SFO); UnloadAirplane (P a NYC); UnloadAirplane (P b NYC)
Time 5: UnloadAirplane (P a SFO); UnloadAirplane (P b SFO)
Legend: Virtual, Real
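The labeling of UnloadAirplane actions can be sketched by replaying a plan and, at each time step, listing every unload whose preconditions hold: those in the plan are real, the rest are virtual. The sketch assumes that plane P starts empty at BOS and that the time-2 flight departs from BOS, where the package was loaded; the state representation and function names are hypothetical.

```python
def applicable_unloads(state):
    """UnloadAirplane actions whose preconditions hold:
    the package is in the plane and the plane is at the airport."""
    return sorted(("UnloadAirplane", "P", pkg, state["plane_at"])
                  for pkg in state["cargo"])

def apply_action(state, action):
    kind, _plane, x, y = action
    if kind == "LoadAirplane":       # x = package, y = airport
        state["cargo"].add(x)
    elif kind == "FlyAirplane":      # x = origin, y = destination
        state["plane_at"] = y
    elif kind == "UnloadAirplane":   # x = package, y = airport
        state["cargo"].remove(x)

def label_unloads(plan, state):
    """Return (real, virtual) lists of (time, action) for UnloadAirplane."""
    real, virtual = [], []
    for t, step in enumerate(plan, start=1):
        for act in applicable_unloads(state):
            (real if act in step else virtual).append((t, act))
        for act in step:
            apply_action(state, act)
    return real, virtual

# Assumed plan for the toy example (one list of actions per time step).
PLAN = [
    [("LoadAirplane", "P", "a", "BOS")],
    [("FlyAirplane", "P", "BOS", "NYC")],
    [("LoadAirplane", "P", "b", "NYC")],
    [("FlyAirplane", "P", "NYC", "SFO")],
    [("UnloadAirplane", "P", "a", "SFO"), ("UnloadAirplane", "P", "b", "SFO")],
]
REAL, VIRTUAL = label_unloads(PLAN, {"plane_at": "BOS", "cargo": set()})
```

Under these assumptions the virtual unloads fall at times 2, 3, and 4 and the real unloads at time 5, which matches the UnloadAirplane entries listed on the slide.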
Rule Induction • Based on Quinlan’s FOIL (Quinlan 1990; 1996). Literals: • Xi = Xj, e.g., loc1 = loc2 • P(X1,…, Xn), e.g., at (pkg, loc) • goal (P(X1,…, Xn)), e.g., goal (at (pkg, loc)) • negation of the above
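FOIL grows a rule greedily, at each step adding the candidate literal with the highest information gain. The sketch below computes a simplified version of that gain from coverage counts alone; full FOIL counts variable bindings rather than examples, so this is an approximation. The function name and the example counts are assumptions for illustration.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """Simplified FOIL gain for adding one literal to a rule.

    p0, n0: positive / negative examples covered before adding the literal.
    p1, n1: positive / negative examples still covered afterwards.
    """
    if p1 == 0:
        return 0.0  # the literal excludes every positive example
    info_before = -math.log2(p0 / (p0 + n0))
    info_after = -math.log2(p1 / (p1 + n1))
    return p1 * (info_before - info_after)

# Hypothetical counts: a rule covering 4 positives and 2 negatives is
# specialized by a literal that keeps all 4 positives and no negatives.
GAIN = foil_gain(p0=4, n0=2, p1=4, n1=0)
```

For a reject rule, the positive examples would be the virtual actions and the negative examples the real ones.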
Reject Rule: UnloadAirplane UnloadAirplane (pln pkg apt)
Reject Rule: UnloadAirplane UnloadAirplane (pln pkg apt) ∧ goal(at (pkg loc))
Reject Rule: UnloadAirplane UnloadAirplane (pln pkg apt) ∧ goal(at (pkg loc)) ∧ (apt != loc)
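A learned rule can be checked against the training examples by counting how many virtual and real actions it covers: a good reject rule covers the virtual actions and none of the real ones. The sketch below does this for the final rule; the example lists and the assumption that both packages have goal location SFO are an illustrative reading of the toy example.

```python
def reject_unload(action, goal_at):
    """Reject rule: UnloadAirplane(pln pkg apt) and goal(at(pkg loc)) and apt != loc."""
    _pln, pkg, apt = action
    return pkg in goal_at and goal_at[pkg] != apt

# Assumed goal and labeled UnloadAirplane examples as (plane, package, airport).
GOAL_AT = {"a": "SFO", "b": "SFO"}
VIRTUAL = [("P", "a", "BOS"), ("P", "a", "NYC"), ("P", "a", "NYC"), ("P", "b", "NYC")]
REAL = [("P", "a", "SFO"), ("P", "b", "SFO")]

COVERED_VIRTUAL = sum(reject_unload(act, GOAL_AT) for act in VIRTUAL)
COVERED_REAL = sum(reject_unload(act, GOAL_AT) for act in REAL)
```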
Learned Logistics Control Rules If an object’s goal location is in a different city, do NOT unload the object from airplanes. Unload an object from a truck if the current location is an airport and it is not in the same city as the package’s goal location.
Summary of Learning for Planning • Introduced inductive logic programming methodology into the constraint-based planning framework to obtain a “trainable planner”. • Demonstrated clear practical speedups on a range of benchmark problems.
IV. Single-agent vs. Multi-agent Planning • Observation: heuristic planners degrade rapidly in multi-agent settings; they tend to assign all work to a single agent. • We studied this phenomenon by exploring different workload distributions.
Force the Planners • There is no easy way to modify the heuristic search planners to find better-quality plans. • Limit the number of feature actions an agent can perform to force the planners to find plans with the same level of participation from all agents.
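The effect of such a limit can be illustrated with a small check that counts actions per agent in a finished plan and tests them against a cap. The slides do not say how the limit was imposed inside the planners, so this is only a post-hoc check; the (agent, action) plan format and the function names are assumptions.

```python
from collections import Counter

def actions_per_agent(plan):
    """Count how many actions each agent performs in a plan of (agent, action) pairs."""
    return Counter(agent for agent, _action in plan)

def within_cap(plan, cap):
    """True if no agent performs more than `cap` actions."""
    return all(n <= cap for n in actions_per_agent(plan).values())

# Hypothetical plans for two trucks delivering four packages.
UNBALANCED = [("truck1", "deliver(a)"), ("truck1", "deliver(b)"),
              ("truck1", "deliver(c)"), ("truck1", "deliver(d)")]
BALANCED = [("truck1", "deliver(a)"), ("truck2", "deliver(b)"),
            ("truck1", "deliver(c)"), ("truck2", "deliver(d)")]
```

With a cap of 2, `within_cap(UNBALANCED, 2)` is false and `within_cap(BALANCED, 2)` is true, so only the plan that spreads the work across both agents is accepted.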