Learning control knowledge and case-based planning Jim Blythe, with additional slides from presentations by Manuela Veloso
Motivation • Planning is hard: PSPACE-hard. • But this is a worst-case result • In many domains, efficient planning strategies may exist • We may be able to derive them automatically from experience
Controlling search • Every planning algorithm does search • At a choice point, if the planner makes an incorrect choice, it must backtrack and try the other choices • If we can make the right choice the first time, we avoid that backtracking
Prodigy • Explicit search control rules can apply to any decision point • Many different learning approaches have been implemented • Relatively old planning approach
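To make "search control rules at a decision point" concrete, here is a minimal sketch loosely modeled on the select, reject and prefer rule types Prodigy uses. It is not Prodigy's rule language or API: the `ControlRules` class, the toy problem (reach a sum of 10 in at most two steps drawn from 1, 3 and 5) and the counters are all hypothetical, chosen only to show how rules reorder or prune alternatives in a chronological-backtracking search.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, ...]                         # hypothetical toy state: choices made so far
Rule = Callable[[State, List[int]], List[int]]  # a rule names some of the alternatives

@dataclass
class ControlRules:
    select: List[Rule] = field(default_factory=list)
    reject: List[Rule] = field(default_factory=list)
    prefer: List[Rule] = field(default_factory=list)

    def order(self, state: State, alts: List[int]) -> List[int]:
        # SELECT: if any select rule fires, keep only the selected alternatives.
        chosen = {a for rule in self.select for a in rule(state, alts)}
        if chosen:
            alts = [a for a in alts if a in chosen]
        # REJECT: prune alternatives named by any reject rule.
        banned = {a for rule in self.reject for a in rule(state, alts)}
        alts = [a for a in alts if a not in banned]
        # PREFER: try preferred alternatives first; sorted() is stable for the rest.
        liked = {a for rule in self.prefer for a in rule(state, alts)}
        return sorted(alts, key=lambda a: a not in liked)

STEPS = [1, 3, 5]            # toy choice point: which increment to apply next
TARGET, MAX_DEPTH = 10, 2

def search(state: State, rules: ControlRules, stats: Dict[str, int]) -> Optional[State]:
    """Depth-first search that backtracks chronologically on failure."""
    if sum(state) == TARGET:
        return state
    if len(state) >= MAX_DEPTH or sum(state) > TARGET:
        return None          # dead end: the caller must backtrack
    for step in rules.order(state, list(STEPS)):
        stats["nodes"] += 1
        found = search(state + (step,), rules, stats)
        if found is not None:
            return found
        stats["backtracks"] += 1
    return None

def run(rules: ControlRules):
    stats = {"nodes": 0, "backtracks": 0}
    return search((), rules, stats), stats

plain = run(ControlRules())
guided = run(ControlRules(prefer=[lambda s, alts: [max(alts)]]))
print(plain)   # ((5, 5), {'nodes': 12, 'backtracks': 10})
print(guided)  # ((5, 5), {'nodes': 2, 'backtracks': 0})
```

Both runs find the same solution; the single prefer rule only changes the order in which alternatives are tried, which is enough to remove every backtrack in this toy case.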
Review of explanation-based learning MV Inputs: • Target concept definition • Training example • Domain theory • Operationality criterion Output: • Generalization of the training example that is • Sufficient to describe the target concept, and • Satisfies the operationality criterion
The safe-to-stack example MV Input: • Target concept: safe-to-stack(x,y) • Training example: on(obj1, obj2) isa(obj1, box) isa(obj2, endtable) color(obj1, red) color(obj2, blue) volume(obj1, 1) density(obj1, 0.1), …
The safe-to-stack example, cont. MV Input: Domain theory: • not(fragile(y)) or lighter(x, y) => safe-to-stack(x,y) • volume(x,v) and density(x,d) => weight(x, v*d) • weight(x1, w1) and weight(x2, w2) and less(w1, w2) => lighter(x1, x2) • isa(x, endtable) => weight(x, 5) • less(0.1, 5), … Operationality criterion: the learned description should use terms that describe objects directly or are ‘easy’ to evaluate, e.g., ‘less’
The safe-to-stack example MV • Explain why obj1 is safe-to-stack on obj2 • Construct a proof • Do goal regression: regress target concept through the proof structure • Proof isolates relevant features
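The explanation step can be sketched by hand-coding the domain theory above as Python functions that prove safe-to-stack(obj1, obj2) and return the leaf facts the proof rests on. This is a simplified sketch, not a general theorem prover and not full goal regression: the fact encoding, the function names and the treatment of `less` as a built-in comparison are assumptions. Because the training example says nothing about fragile(obj2), only the lighter(x, y) branch is attempted.

```python
from typing import Optional, Set, Tuple

Fact = Tuple  # e.g. ("volume", "obj1", 1)

# The training example from the slides, encoded as tuples (hypothetical encoding).
EXAMPLE: Set[Fact] = {
    ("on", "obj1", "obj2"), ("isa", "obj1", "box"), ("isa", "obj2", "endtable"),
    ("color", "obj1", "red"), ("color", "obj2", "blue"),
    ("volume", "obj1", 1), ("density", "obj1", 0.1),
}

def lookup(facts: Set[Fact], pred: str, obj: str):
    """Return the value of a binary fact pred(obj, value), or None."""
    for f in facts:
        if f[0] == pred and f[1] == obj:
            return f[2]
    return None

def weight(facts: Set[Fact], x: str):
    """Prove weight(x, w); return (w, leaf facts used) or None."""
    if ("isa", x, "endtable") in facts:        # isa(x, endtable) => weight(x, 5)
        return 5, {("isa", x, "endtable")}
    v, d = lookup(facts, "volume", x), lookup(facts, "density", x)
    if v is not None and d is not None:        # volume(x,v) and density(x,d) => weight(x, v*d)
        return v * d, {("volume", x, v), ("density", x, d)}
    return None

def safe_to_stack(facts: Set[Fact], x: str, y: str) -> Optional[Set[Fact]]:
    """Prove safe-to-stack(x, y) via lighter(x, y); return the proof's leaf facts."""
    wx, wy = weight(facts, x), weight(facts, y)
    if wx is None or wy is None:
        return None
    if wx[0] < wy[0]:                           # less(w1, w2), treated as operational
        return wx[1] | wy[1] | {("less", wx[0], wy[0])}
    return None

features = safe_to_stack(EXAMPLE, "obj1", "obj2")
print(sorted(features, key=str))
# [('density', 'obj1', 0.1), ('isa', 'obj2', 'endtable'), ('less', 0.1, 5), ('volume', 'obj1', 1)]
```

The colors, on(obj1, obj2) and isa(obj1, box) never appear among the leaves. That is the sense in which the proof isolates the relevant features.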
Generating operational knowledge MV • Generalize proof • Sometimes, simply replace constants by variables • Prove that all identified relevant features are necessary in general • Output: volume(x,v1) and density(x,d1) and isa(y, endtable) and less(v1*d1, 5) => safe-to-stack(x,y)
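The generalized rule uses only operational terms, so it can be evaluated directly on new objects without re-running the proof. The sketch below encodes it as a Python predicate. The fact encoding and the objects `vase`, `anvil`, `table2` and `shelf1` are hypothetical examples, not from the slides.

```python
from typing import Optional, Set, Tuple

Fact = Tuple

def lookup(facts: Set[Fact], pred: str, obj: str) -> Optional[float]:
    """Return the value of a binary fact pred(obj, value), or None."""
    for f in facts:
        if f[0] == pred and f[1] == obj:
            return f[2]
    return None

def learned_safe_to_stack(facts: Set[Fact], x: str, y: str) -> bool:
    """volume(x,v1) and density(x,d1) and isa(y,endtable) and less(v1*d1, 5) => safe-to-stack(x,y)"""
    v1 = lookup(facts, "volume", x)
    d1 = lookup(facts, "density", x)
    return (v1 is not None and d1 is not None
            and ("isa", y, "endtable") in facts
            and v1 * d1 < 5)

# Hypothetical new objects, used only to exercise the rule.
NEW: Set[Fact] = {
    ("isa", "table2", "endtable"), ("isa", "shelf1", "shelf"),
    ("volume", "vase", 2), ("density", "vase", 1.5),
    ("volume", "anvil", 1), ("density", "anvil", 8),
}
print(learned_safe_to_stack(NEW, "vase", "table2"))   # True  (2 * 1.5 = 3 < 5)
print(learned_safe_to_stack(NEW, "anvil", "table2"))  # False (8 is not less than 5)
print(learned_safe_to_stack(NEW, "vase", "shelf1"))   # False (rule only covers endtables)
```

A False result means only that this rule does not fire. The learned description is sufficient for safe-to-stack, not necessary, so stacking on a non-endtable might still be provably safe through other parts of the domain theory.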
Using EBL to improve plan quality • Given: a planning domain, an evaluation function, the planner’s plan, and a better plan • Learn: control knowledge that produces the better plan • Explanation used: why the alternative plan is better • Target concept: control rules that make choices based on the planner’s state and meta-state
EBL in Prodigy • Used by Minton (1988) to improve the efficiency of planning • A version is used in Quality (1995) to improve the quality of solutions
Explaining better plans recursively (target concept: shared subgoal)
Discussion • EBL is always correct with respect to its domain theory, but Quality isn’t: it only learns why plan B is better than plan A • No guarantee of optimality • Linear additive evaluation function: how well does it model the metrics we care about? • Generality of the learned control rules
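As a rough illustration of a linear additive evaluation function, the sketch below scores a plan as the sum of its operators' costs and compares two plans, one of which shares a subgoal (a single clamp) across two drilling steps. The operator names and costs are hypothetical and are not taken from Quality.

```python
from typing import Dict, Sequence

# Hypothetical operator costs, chosen only for illustration.
OP_COST: Dict[str, float] = {"clamp": 1, "drill": 2, "release": 1}

def plan_cost(plan: Sequence[str], op_cost: Dict[str, float] = OP_COST) -> float:
    """Linear additive evaluation: a plan's cost is the sum of its steps' costs."""
    return sum(op_cost[step] for step in plan)

def better(plan_a: Sequence[str], plan_b: Sequence[str]) -> Sequence[str]:
    """Return the cheaper plan under the additive metric (ties go to plan_a)."""
    return plan_b if plan_cost(plan_b) < plan_cost(plan_a) else plan_a

# Plan A clamps separately for each hole; plan B shares one clamp between them.
plan_a = ["clamp", "drill", "release", "clamp", "drill", "release"]
plan_b = ["clamp", "drill", "drill", "release"]
print(plan_cost(plan_a), plan_cost(plan_b))  # 8 6
```

One way to read the slide's question: a sum over steps cannot express metrics such as makespan, where steps on parallel timelines overlap and the cost is a maximum rather than a total.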