Learning Control Knowledge for Planning

Learning Control Knowledge for Planning Yi-Cheng Huang

Outline I. Brief overview of planning II. Planning with Control knowledge III. Learning control knowledge IV. Conclusion

I. Overview of Planning • Planning - a very general framework for many applications: • Robot control; • Airline scheduling; • Hubble space telescope control. • Planning – find a sequence of actions that leads from an initial state to a goal state.
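As a hedged illustration of the definition above, the following sketch finds a shortest action sequence by breadth-first forward search over STRIPS-style states. The `Action` tuple, the `bfs_plan` helper, and the one-package logistics instance are assumptions made for this example; they are not taken from any planner discussed in the talk.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # facts that must hold before the action
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


def bfs_plan(init: FrozenSet[str], goal: FrozenSet[str],
             actions: List[Action]) -> Optional[List[str]]:
    """Return a shortest action sequence from init to a state containing goal."""
    frontier = deque([(init, [])])
    seen = {init}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None  # goal unreachable


# Hypothetical instance: one package a, one plane P, two airports.
ACTIONS = [
    Action("load(a,BOS)", frozenset({"at(a,BOS)", "at(P,BOS)"}),
           frozenset({"in(a,P)"}), frozenset({"at(a,BOS)"})),
    Action("fly(BOS,SFO)", frozenset({"at(P,BOS)"}),
           frozenset({"at(P,SFO)"}), frozenset({"at(P,BOS)"})),
    Action("unload(a,SFO)", frozenset({"in(a,P)", "at(P,SFO)"}),
           frozenset({"at(a,SFO)"}), frozenset({"in(a,P)"})),
]
INIT = frozenset({"at(a,BOS)", "at(P,BOS)"})
GOAL = frozenset({"at(a,SFO)"})
plan = bfs_plan(INIT, GOAL, ACTIONS)
```

Uninformed search like this grows exponentially with plan length, which is the practical face of the complexity results on the next slide.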

Planning Is Difficult – Abundance of Negative Complexity Results • Domain-independent planning: PSPACE-complete or worse (Chapman 1987; Bylander 1991; Backstrom 1993). • Domain-dependent planning: NP-complete or worse (Chenoweth 1991; Gupta and Nau 1992). • Approximate planning: NP-complete or worse (Selman 1994).

Recent State-of-the-art Planners • Constraint-based planners – Graphplan, Blackbox. • Heuristic search planners – HSP, FF. • Both kinds of planners can solve, in seconds or minutes, problems that take traditional planners hours or days.

Graphplan (Blum & Furst, 1995) [Figure: planning graph with alternating layers Facts, Actions, Facts at time i and time i+1.] Search on the planning graph to find a plan.

Blackbox (Kautz & Selman, 1999) [Diagram: problem → Satisfiability Tester (Chaff, WalkSat, Satz, RelSat, ...) → plan.]

Heuristic Search Based Planning (Bonet & Geffner, ’97) • Use various heuristic functions, based on the planning graph, to approximate the distance from the current state to the goal state. • Use best-first search or A* search to find plans.
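One simple way to estimate goal distance from a planning graph is to ignore delete effects and count how many layers are needed before every goal fact appears. The sketch below is an assumption-laden illustration of that idea only; `relaxed_level` and the toy actions are hypothetical names, and HSP and FF each use their own, more refined heuristics.

```python
from typing import FrozenSet, List, Optional, Tuple

# A relaxed action keeps only preconditions and add effects (deletes are ignored).
RelaxedAction = Tuple[FrozenSet[str], FrozenSet[str]]


def relaxed_level(state: FrozenSet[str], goal: FrozenSet[str],
                  actions: List[RelaxedAction]) -> Optional[int]:
    """Number of delete-free graph layers until all goal facts appear, or None."""
    facts = set(state)
    level = 0
    while not goal <= facts:
        new_facts = set(facts)
        for pre, add in actions:
            if pre <= facts:
                new_facts |= add
        if new_facts == facts:
            return None  # fixpoint reached without the goal: unreachable
        facts = new_facts
        level += 1
    return level


# Hypothetical toy instance: load a at BOS, fly to SFO, unload a at SFO.
RELAXED_ACTIONS = [
    (frozenset({"at(a,BOS)", "at(P,BOS)"}), frozenset({"in(a,P)"})),
    (frozenset({"at(P,BOS)"}), frozenset({"at(P,SFO)"})),
    (frozenset({"in(a,P)", "at(P,SFO)"}), frozenset({"at(a,SFO)"})),
]
INIT = frozenset({"at(a,BOS)", "at(P,BOS)"})
estimate = relaxed_level(INIT, frozenset({"at(a,SFO)"}), RELAXED_ACTIONS)
```

Here the estimate is 2 while the real plan needs 3 sequential actions, because loading and flying both become available in the first relaxed layer. A best-first or A* search would use such a value to order states.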

II. Planning With Control • General focus in planning: avoid search as much as possible. • Many real-world applications are tailored and simplified by domain-specific knowledge. • TLPlan is an efficient planner that uses control knowledge to guide a forward-chaining search (Bacchus & Kabanza 2000).

TLPlan Temporal Logic Control Formula

A Simple Control Rule Example: □ (goal(at(obj loc)) ∧ at(obj loc) → ○ at(obj loc)). Temporal logic operators: □ “always”, ○ “next”. Meaning: do NOT move an object that is at its goal location.
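In a forward-chaining planner, a rule of this kind can be checked on each state transition: if an object is at its goal location now, it must still be there in the successor state. The following is a minimal sketch under that reading; the function name and the string encoding of facts are assumptions for illustration, not TLPlan's actual formula progression mechanism.

```python
from typing import FrozenSet


def violates_stay_at_goal(state: FrozenSet[str], nxt: FrozenSet[str],
                          goal: FrozenSet[str]) -> bool:
    """True if an at(...) goal fact holds in `state` but no longer in `nxt`."""
    settled = {f for f in state & goal if f.startswith("at(")}
    return not settled <= nxt


goal = frozenset({"at(a,SFO)"})
s0 = frozenset({"at(a,SFO)", "at(P,SFO)"})
s_moved = frozenset({"in(a,P)", "at(P,SFO)"})   # package a was loaded again
s_kept = frozenset({"at(a,SFO)", "at(P,NYC)"})  # only the plane moved

prune_moved = violates_stay_at_goal(s0, s_moved, goal)
prune_kept = violates_stay_at_goal(s0, s_kept, goal)
```

A search procedure would discard any successor for which the check returns `True`, so the branch that reloads package a is never explored.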

Question: Can the same level of control be effectively incorporated into a constraint-based planner?

Control Rule Categories • Category I: rules that involve only static information. • Category II: rules that depend on the current state. • Category III: rules that depend on the current state and require dynamic user-defined predicates.

Category I Control Rules (depend only on the goal; toy example) [Figure: toy example with package a, location L, and the goal.] Do NOT unload a package from an airplane if the current location is not in the package’s goal.

Pruning the Planning Graph: Category I Rules [Figure: planning graph with layers Facts, Actions, Facts.]
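Because a Category I rule depends only on static information such as the goal, it can be applied once, before search, by deleting ground actions from the graph. The sketch below illustrates this for the unload rule; the tuple representation, `prune_unloads`, and the city table are hypothetical and are not Blackbox's internal data structures.

```python
from typing import Dict, List, Tuple

Unload = Tuple[str, str, str]  # (plane, package, airport); assumed representation


def prune_unloads(unloads: List[Unload],
                  goal_city: Dict[str, str],
                  city_of: Dict[str, str]) -> List[Unload]:
    """Drop unload actions at airports outside the package's goal city."""
    kept = []
    for plane, pkg, apt in unloads:
        # Packages without a goal are left unconstrained.
        if pkg not in goal_city or city_of[apt] == goal_city[pkg]:
            kept.append((plane, pkg, apt))
    return kept


city_of = {"BOS": "Boston", "NYC": "NewYork", "SFO": "SanFrancisco"}
goal_city = {"a": "SanFrancisco"}
unloads = [("P", "a", "BOS"), ("P", "a", "NYC"),
           ("P", "a", "SFO"), ("P", "b", "NYC")]
kept = prune_unloads(unloads, goal_city, city_of)
```

Only the unload of a at SFO and the unconstrained unload of b survive, so the later search never has to consider the pruned nodes.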

Effect of Graph Pruning

Category II Control Rules [Figure: toy example with package a and location L.] Do NOT move an airplane if there is an object in the airplane that needs to be unloaded at that location.

Control by Adding Constraints [Diagram labels: Temporal Logic Control Rules; Planning Formula; Constraints; Clauses.]
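To make the idea of turning a state-dependent rule into clauses concrete, here is a hedged sketch for the Category II airplane rule. For every time step it emits the clause "not in(obj, pln, t) or not fly(pln, loc, dest, t)" whenever loc is the object's goal airport. The proposition names, the `category2_clauses` helper, and the clause representation are all assumptions for illustration; the actual Blackbox encoding may differ.

```python
from itertools import product
from typing import Dict, List, Tuple

Literal = Tuple[bool, str]  # (is_positive, proposition name)
Clause = List[Literal]


def category2_clauses(planes: List[str], airports: List[str],
                      goal_airport: Dict[str, str], horizon: int) -> List[Clause]:
    """One binary clause per (time, plane, object, destination) combination."""
    clauses: List[Clause] = []
    for t in range(1, horizon + 1):
        for pln, (obj, loc) in product(planes, goal_airport.items()):
            for dest in airports:
                if dest != loc:
                    # If obj is in pln at time t, pln may not fly away from loc.
                    clauses.append([(False, f"in({obj},{pln},{t})"),
                                    (False, f"fly({pln},{loc},{dest},{t})")])
    return clauses


clauses = category2_clauses(planes=["P"], airports=["BOS", "NYC", "SFO"],
                            goal_airport={"a": "SFO"}, horizon=2)
```

The clauses are simply appended to the planning formula, so the satisfiability tester itself needs no modification.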

Rules Without Compact Encoding [Figure: packages a and b with their goals; cities SFO, DC, NYC, ORL.] Do NOT move a vehicle unless (a) there is an object that needs to be picked up, or (b) there is an object in the vehicle that needs to be unloaded.

Complex Encoding for Category III Rules • Need to define extra predicates: need_to_move_by_airplane; need_to_unload_by_airplane. • These introduce extra literals and clauses: O(mn) ground literals and O(mn + km^2) clauses at each time step, where m = #cities, n = #objects, k = #airports. • There is no easy encoding for Category III rules. • However, it appears that Category I and II rules do most of the work.
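To get a feel for these bounds, the following sketch evaluates them for concrete sizes. It treats the hidden constant factors as 1, which is an assumption made purely for illustration; the function name and the example sizes are hypothetical.

```python
from typing import Tuple


def category3_overhead(m: int, n: int, k: int) -> Tuple[int, int]:
    """Per-time-step literal and clause counts, taking big-O constants as 1."""
    literals = m * n               # O(mn) ground literals
    clauses = m * n + k * m * m    # O(mn + km^2) clauses
    return literals, clauses


# Hypothetical problem size: 10 cities, 20 objects, 10 airports.
literals, clauses = category3_overhead(m=10, n=20, k=10)
```

With these sizes the estimate is 200 extra literals and 1200 extra clauses per time step, and the km^2 term dominates as the number of cities grows.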

Blackbox with Control Knowledge (Logistics domain with hand-coded rules) Note: logarithmic time scale

Comparison of Blackbox and TLPlan (run time)

Comparison of Blackbox and TLPlan (parallel plan length; “plan quality”)

Summary: Adding Control Knowledge • We have shown how to add declarative control knowledge to a constraint-based planner by using temporal logic statements. • Adding such knowledge gives significant speedups (up to two orders of magnitude). • Pure heuristic search with control can still be faster, but with much lower plan quality.

III. Can we learn domain knowledge from example plans?

Motivation • Control rules used in TLPlan and Blackbox are hand-coded. • Idea: learn control rules from a sequence of small problems solved by the planner.

Learning System Framework [Diagram: Problem → Blackbox Planner → Plan Justification / Type Inference → ILP Learning Module / Verification → Control Rules.]

Target Concepts for Actions • Action select rule: indicates conditions under which the action can be performed immediately. • Action reject rule: indicates conditions under which the action must not be performed.

Basic Assumptions for Learning Control • Plans found by the planner on simple problems are optimal or near-optimal. • Actions that appear in an optimal plan must be selected. • Actions that can be executed but do not appear in the plan must be rejected.

Definitions • Real action: an action that appears in the plan. • Virtual action: an action whose preconditions hold but that does not appear in the plan.

A Toy Planning Example [Figure: initial and goal locations of packages a and b; cities BOS, NYC, SFO.]

Real & Virtual Actions for UnloadAirplane
Time 1: LoadAirplane (P a BOS) [real]
Time 2: FlyAirplane (P BOS NYC) [real]; UnloadAirplane (P a BOS) [virtual]
Time 3: LoadAirplane (P b NYC) [real]; UnloadAirplane (P a NYC) [virtual]
Time 4: FlyAirplane (P NYC SFO) [real]; UnloadAirplane (P a NYC) [virtual]; UnloadAirplane (P b NYC) [virtual]
Time 5: UnloadAirplane (P a SFO) [real]; UnloadAirplane (P b SFO) [real]
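The labeling above can be reproduced by simulating the plan and, at every time step, listing each UnloadAirplane action whose preconditions hold. The sketch below does this for the toy example, assuming the plane P starts at BOS and flies BOS to NYC at time 2. The tuple encoding of actions and the `unload_examples` helper are hypothetical, not the system's actual example extractor.

```python
from typing import Dict, List, Set, Tuple

Act = Tuple[str, str, str, str]  # (name, plane, arg1, arg2); assumed encoding


def unload_examples(plan: List[Set[Act]], plane_at: Dict[str, str],
                    pkg_at: Dict[str, str]):
    """Label each applicable UnloadAirplane per step as real or virtual."""
    plane_at, pkg_at = dict(plane_at), dict(pkg_at)
    real, virtual = [], []
    for t, step in enumerate(plan, start=1):
        # An unload is applicable when the package is inside a plane;
        # it would take place at the plane's current airport.
        for pkg, where in sorted(pkg_at.items()):
            if where in plane_at:
                act = ("UnloadAirplane", where, pkg, plane_at[where])
                (real if act in step else virtual).append((t, act))
        # Apply the step's real actions to obtain the next state.
        for name, pln, x, y in step:
            if name == "LoadAirplane":      # x = package, y = airport
                pkg_at[x] = pln
            elif name == "UnloadAirplane":  # x = package, y = airport
                pkg_at[x] = y
            elif name == "FlyAirplane":     # x = origin, y = destination
                plane_at[pln] = y
    return real, virtual


plan = [
    {("LoadAirplane", "P", "a", "BOS")},
    {("FlyAirplane", "P", "BOS", "NYC")},
    {("LoadAirplane", "P", "b", "NYC")},
    {("FlyAirplane", "P", "NYC", "SFO")},
    {("UnloadAirplane", "P", "a", "SFO"), ("UnloadAirplane", "P", "b", "SFO")},
]
real, virtual = unload_examples(plan, {"P": "BOS"}, {"a": "BOS", "b": "NYC"})
```

Under the basic assumptions, the virtual unloads become positive examples for the reject rule and the real unloads become negative ones.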

Heuristics for Extracting Examples

Rule Induction • Based on Quinlan’s FOIL (Quinlan 1990; 1996). Candidate literals: • Xi = Xj, e.g., loc1 = loc2 • P(X1,…, Xn), e.g., at (pkg, loc) • goal (P(X1,…, Xn)), e.g., goal (at (pkg, loc)) • negation of any of the above

Reject Rule: UnloadAirplane UnloadAirplane (pln pkg apt)

Reject Rule: UnloadAirplane UnloadAirplane (pln pkg apt) goal(at (pkg loc))

Reject Rule: UnloadAirplane UnloadAirplane (pln pkg apt) goal(at (pkg loc)) ^ (apt != loc)
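FOIL grows a rule like this one literal at a time, choosing the literal with the highest information gain. The sketch below computes the standard FOIL gain in the simplified case where every positive example still covered counts exactly once. The example counts are an assumption: they come from reading the toy plan as 4 virtual unloads (positives for the reject rule) and 2 real unloads (negatives), and the talk does not report these numbers.

```python
import math


def foil_gain(p0: int, n0: int, p1: int, n1: int) -> float:
    """FOIL gain for a literal that changes coverage from (p0, n0) to (p1, n1).

    p = covered positive examples, n = covered negative examples.
    Simplification: each retained positive example is counted once.
    """
    if p1 == 0:
        return 0.0
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)


# With body goal(at(pkg loc)) the rule covers all 6 unload examples (4+, 2-).
# Adding (apt != loc) keeps the 4 virtual unloads and excludes both real ones.
gain_neq = foil_gain(p0=4, n0=2, p1=4, n1=0)
# A literal that excludes nothing has zero gain.
gain_useless = foil_gain(p0=4, n0=2, p1=4, n1=2)
```

The first gain is 4 · log2(1.5), roughly 2.34 bits, so the inequality literal would be preferred over any literal that leaves the negatives covered.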

Learning Time

Logistics Domain

Learned Logistics Control Rules • If an object’s goal location is in a different city, do NOT unload the object from airplanes. • Unload an object from a truck if the current location is an airport and it is not in the same city as the package’s goal location.

Briefcase Domain

Grid Domain

Gripper Domain

Mystery Domain

Tireworld Domain

Summary of Learning for Planning • Introduced inductive logic programming methodology into the constraint-based planning framework to obtain a “trainable planner”. • Demonstrated clear practical speedups on a range of benchmark problems.

IV. Single-agent vs. Multi-agent Planning • Observation: heuristic planners degrade rapidly in multi-agent settings. They tend to assign all work to a single agent. • We studied this phenomenon by exploring different workload distributions.

Force the Planners • There is no easy way to modify the heuristic search planners to find better-quality plans. • Limit the number of feature actions an agent can perform, to force the planners to find plans with the same level of participation from all agents.

Sokoban Domain

Restricted Sokoban Domain
