Cross-Domain Action-Model Acquisition for Planning via Web Search
Hankz Hankui Zhuo (a), Qiang Yang (b), Rong Pan (a) and Lei Li (a)
(a) Sun Yat-sen University, China; (b) Hong Kong University of Science & Technology, Hong Kong
Motivation • There are many domains that share knowledge with each other, e.g., • "walking" in the driverlog domain • "navigating" in the rovers domain • "moving" in the elevator domain • etc…
Motivation • These actions all share common knowledge about location change, so • it may be possible to "borrow" knowledge from one domain for another, • as the next slide illustrates…
Motivation • The walk model in the driverlog domain:
(:action walk (?d - driver ?l1 - loc ?l2 - loc)
 :precondition (and (at ?d ?l1) (path ?l1 ?l2))
 :effect (and (not (at ?d ?l1)) (at ?d ?l2)))
• Can we guess the model of navigate in the rovers domain?
(:action navigate (?d - rover ?x - waypoint ?y - waypoint)
 :precondition ??
 :effect ??)
• By analogy with walk, a plausible guess is:
(:action navigate (?d - rover ?x - waypoint ?y - waypoint)
 :precondition (and (at ?d ?x) (visible ?x ?y) …)
 :effect (and (not (at ?d ?x)) (at ?d ?y)))
Motivation • In this work, we aim to learn action models for a target domain, • e.g., the model of "navigate" in rovers, • by transferring knowledge from another domain, called the source domain, • e.g., the knowledge in the model of "walk" in driverlog.
Problem Formulation • Formally, our learning problem is stated as follows (illustrated below): • Given as inputs: • action models of a source domain: A_s • a few plan traces from the target domain: {<s0, a1, s1, …, an, sn>}, where s_i is a partial state and a_i is an action • action schemas of the target domain: A' • predicates of the target domain: P • Output: • action models of the target domain: A_t
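As a concrete illustration, the inputs and output can be pictured with the following minimal Python sketch; the class and field names are ours, not from the paper.

from dataclasses import dataclass, field

@dataclass
class ActionModel:                    # a STRIPS action model
    name: str                         # e.g. "walk"
    params: list                      # e.g. ["?d - driver", "?l1 - loc", "?l2 - loc"]
    pre: set = field(default_factory=set)    # precondition literals
    add: set = field(default_factory=set)    # add effects
    dele: set = field(default_factory=set)   # delete effects

@dataclass
class PlanTrace:                      # <s0, a1, s1, ..., an, sn>
    states: list                      # partial states s_i; may be empty sets
    actions: list                     # action names a_i, totally ordered

# Inputs: source models A_s, plan traces, target schemas A' and predicates P.
# Output: one learned ActionModel per schema in A' (the target models A_t).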
Problem Formulation • Our assumptions are: • domains are STRIPS • people do not choose action names randomly: • e.g., they do not use "eat" to express "move"! • full intermediate states need not be observed in plan traces, i.e., intermediate states can be partial or empty • action sequences in plan traces are correct • actions in plan traces are totally ordered, i.e., there are no concurrent actions • information related to the actions is available on the Web
Our Algorithm: LAWS • The inputs (source models A_s, plan traces, predicates and action schemas of the target domain) feed a constraint-building stage; solving the constraints yields the target action models A_t (pseudocode sketch below). • Build constraints: • web constraints: constraints from web searching; • state constraints: constraints from the states between actions; • action constraints: constraints imposed on action models; • plan constraints: constraints to ensure causal links in traces. • Solve constraints: using a weighted MAX-SAT solver.
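The whole pipeline can be summarized by the following Python-style sketch; the helper names are ours, for illustration only.

def LAWS(A_s, traces, schemas, predicates):
    # 1. Build the four kinds of weighted constraints.
    web_cons    = build_web_constraints(A_s, schemas, predicates)  # web search
    state_cons  = build_state_constraints(traces)                  # observed states
    action_cons = build_action_constraints(schemas)                # succinctness
    plan_cons   = build_plan_constraints(traces)                   # causal links
    # 2. Solve all constraints jointly with a weighted MAX-SAT solver.
    assignment = weighted_maxsat(web_cons + state_cons + action_cons + plan_cons)
    # 3. Decode the satisfying assignment into STRIPS action models A_t.
    return decode_action_models(assignment, schemas)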
Web constraints • Used to measure the similarity between two actions. • To do this, we search the Web for both actions. • Specifically, we build predicate-action pairs from the target domain: PA_t = {<p, a>}, where • p is a predicate • a is an action schema • p's parameters are a subset of a's parameters
Web constraints • Similarly, we build predicate-action pairs from the source: PA_s^pre, PA_s^add and PA_s^del, • which denote the sets of precondition-action pairs, add-action pairs and del-action pairs, respectively. • Note that for PA_s^pre we require p ∈ PRE(a), which is known because the source models are given; this differs from PA_t, where such memberships are yet to be learned.
Web constraints • Next, we collect a set of web documents D = {d_i} by searching for the keyword w = <p, a> ∈ PA_t. • We convert each page d_i into a vector y_i by computing tf-idf weights (Jones 1972). • As a result, we have a set of real-valued vectors Y = {y_i}. • Likewise, we get a set of vectors X = {x_i} by searching for the keyword w' = <p', a'> ∈ PA_s^pre (sketched below).
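As a minimal sketch of this step, assuming scikit-learn's TfidfVectorizer and a hypothetical search_web helper that returns the text of the retrieved pages:

from sklearn.feature_extraction.text import TfidfVectorizer

def keyword_vectors(w_target, w_source, search_web):
    docs_t = search_web(w_target)            # D = {d_i} for w  = <p, a>
    docs_s = search_web(w_source)            # documents for w' = <p', a'>
    vec = TfidfVectorizer()                  # tf-idf weighting (Jones 1972)
    # Fit on both sets jointly so Y and X share one vocabulary/feature space.
    M = vec.fit_transform(docs_t + docs_s).toarray()
    Y, X = M[:len(docs_t)], M[len(docs_t):]
    return Y, X

Fitting the vectorizer on both document sets together is a design choice of the sketch: the MMD comparison on the next slide requires Y and X to live in the same feature space.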
Web constraints • We define the similarity between two keywords w and w' as: similarity(w, w') = MMD²(F, Y, X), • where MMD is the Maximum Mean Discrepancy (Borgwardt et al. 2006), whose empirical estimate is
MMD²(F, Y, X) = || (1/|Y|) Σ_{y∈Y} φ(y) − (1/|X|) Σ_{x∈X} φ(x) ||²,
• where φ ∈ F, a set of feature mapping functions of a Gaussian kernel.
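A sketch of the (biased) empirical MMD² estimate with a Gaussian kernel, using NumPy; the bandwidth sigma is a free parameter of the sketch.

import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)), computed pairwise
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2(Y, X, sigma=1.0):
    # ||mean phi(Y) - mean phi(X)||^2, expanded via the kernel trick
    return (gaussian_kernel(Y, Y, sigma).mean()
            - 2 * gaussian_kernel(Y, X, sigma).mean()
            + gaussian_kernel(X, X, sigma).mean())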
Web constraints • Finally, we generate weighted web constraints by the following steps (see the sketch below): • for each w = <p, a> ∈ PA_t and w' = <p', a'> ∈ PA_s^pre, calculate similarity(w, w'); • generate a constraint p ∈ PRE(a) and associate it with similarity(w, w') as its weight; • likewise for ADD(a) and DEL(a).
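In code, the generation step might look like this sketch (the representation of constraints as (assertion, weight) pairs is ours):

def build_web_constraints(PA_t, PA_s_pre, PA_s_add, PA_s_del, similarity):
    constraints = []                  # list of ((p, PART, a), weight)
    sources = [("PRE", PA_s_pre), ("ADD", PA_s_add), ("DEL", PA_s_del)]
    for (p, a) in PA_t:
        for part, PA_s in sources:
            for (p2, a2) in PA_s:
                w = similarity((p, a), (p2, a2))
                constraints.append(((p, part, a), w))  # "p in PART(a)", weight w
    return constraints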
State constraints (given by Yang et al. 2007) • Generally, if p frequently appears in the state before a, it is probably a precondition of a (the formal constraints are given in Yang et al. 2007). • The weights of these constraints are calculated by counting their occurrences in all the plan traces (sketched below).
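The counting can be sketched as follows (our simplification, reusing the PlanTrace class from earlier; the precise constraints are those of Yang et al. 2007):

from collections import Counter

def build_state_constraints(traces):
    counts = Counter()
    for tr in traces:
        for i, a in enumerate(tr.actions):
            for p in tr.states[i]:            # partial state observed before a_i
                counts[(p, "PRE", a)] += 1    # evidence that p is a precondition of a
    return [(assertion, c) for assertion, c in counts.items()]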
Action constraints (given by Yang et al. 2007) • Action constraints are imposed to ensure that the learned action models are succinct, e.g., an action should not add a predicate that already appears in its precondition, and should only delete predicates that appear in its precondition. • These constraints are associated with the maximal weight of all the state constraints, to ensure they are maximally satisfied.
Plan constraints (given by Yang et al. 2007) • We require that causal links in plan traces are not broken. Thus, we build constraints as follows. • For each precondition p of an action a_j in a plan trace, either p is in the initial state s0, or there is an a_i prior to a_j that adds p and no a_k between a_i and a_j (i < k < j) that deletes p. • For each literal q in the goal, either q is in the initial state s0, or there is an a_i that adds q and no later a_k that deletes q. • To ensure these constraints are maximally satisfied, we assign them the maximal weight of the state constraints (a checking sketch follows).
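For intuition, here is a sketch that checks the first condition against a candidate set of models (the SAT encoding asserts the same disjunction as clauses); it reuses the ActionModel and PlanTrace classes sketched earlier.

def causal_link_holds(trace, models, j, p):
    # p must hold when a_j executes: it is established by the initial state s0
    # or by some earlier a_i, and no a_k with i < k < j deletes it.
    providers = [i for i in range(j) if p in models[trace.actions[i]].add]
    if p in trace.states[0]:
        providers.append(-1)                  # -1 stands for the initial state s0
    return any(all(p not in models[trace.actions[k]].dele
                   for k in range(i + 1, j))
               for i in providers)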
Solve constraints • Before solving all these constraints, we adjust the weights of the web constraints, replacing each original weight w_o with an adjusted weight w_o' that depends on w_m, the maximal weight of the state constraints, and a parameter γ ∈ [0, 1). • By varying γ from 0 to 1, w_o' can be adjusted from 0 to +∞.
Solve constraints • We solve these weighted constraints by running a weighted MAX-SAT solver (sketched below). • The resulting assignment is converted back into action models, such as the example on the next slide.
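A sketch of this step using the PySAT library's RC2 MaxSAT solver; the variable encoding is ours, and only the soft unit clauses are shown (hard clauses, e.g. for plan constraints, would be appended without a weight).

from pysat.formula import WCNF
from pysat.examples.rc2 import RC2

def solve(constraints):
    var_of = {}                               # assertion "p in PART(a)" -> variable
    def var(assertion):
        return var_of.setdefault(assertion, len(var_of) + 1)
    wcnf = WCNF()
    for assertion, weight in constraints:
        # RC2 expects positive integer weights; scale/round real weights as needed.
        wcnf.append([var(assertion)], weight=max(1, round(weight)))
    with RC2(wcnf) as solver:
        model = solver.compute()              # maximizes total satisfied weight
    return {a for a, v in var_of.items() if model and v in model}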
Experimental Result • Example result:
(:action walk (?d - rover ?x - waypoint ?y - waypoint)
 :precondition (and (at ?d ?x) (visible ?x ?y))
 :effect (and (not (at ?d ?x)) (at ?d ?y) (not (visible ?x ?y))))
• By comparing against the hand-written action models, we identify missing and extra conditions (one of each is marked in the original slide). • We calculate the error rate by counting all the missing and extra conditions; the accuracy is one minus the error rate (a sketch follows).
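The error counting can be sketched as set differences against the hand-written model (our formulation of the metric described above, reusing the ActionModel class):

def error_rate(learned, truth):
    # Count missing and extra conditions over the PRE, ADD and DEL lists.
    missing = extra = total = 0
    for part in ("pre", "add", "dele"):
        L, T = getattr(learned, part), getattr(truth, part)
        missing += len(T - L)                 # conditions that failed to be learned
        extra   += len(L - T)                 # conditions that should not be there
        total   += len(T)
    return (missing + extra) / max(total, 1)  # accuracy = 1 - error rate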
Experimental Result • We compared LAWS to t-LAMP (Zhuo et al. 2009) and ARMS (Yang et al. 2007), where • t-LAMP "borrows" knowledge by building syntax mappings; • ARMS learns without "borrowing" knowledge. • The results (accuracy curves in the original slides) show:
Experimental Result • We can see that • LAWS > t-LAMP > ARMS: the accuracies of LAWS are higher than those of t-LAMP and ARMS, which empirically shows the advantage of LAWS; • accuracies increase as the number of plan traces increases, which is consistent with our intuition, since more information helps learning.
Experimental Result • We also tested the following three cases: • Case I (γ = 0): not borrowing knowledge; • Case II (γ = 0.5 and w_o = 1): all web constraints have the same weight, i.e., the similarity function is not used; • Case III (γ = 0.5): the similarity function is used. • The results are shown below:
Experimental Result • We can see that: • Case III > the other two: this suggests that the similarity function really helps improve the learning result; • Case II > Case I: this suggests that web constraints are helpful.
Experimental Result • Next, we tested different ratios of observed intermediate states: • accuracy generally increases as the ratio increases; • this is consistent with our intuition, since the additional information helps improve the learning result.
Experimental Result • We also tested different values of γ: • when γ increases from 0 to 0.5, the accuracy increases, showing that as the influence of web knowledge grows, the accuracy gets higher; • however, when γ is larger than 0.5, the accuracy decreases as γ increases, because the impact of the plan traces is relatively reduced. This suggests that knowledge from plan traces is also important for learning high-quality action models.
CPU Time • The CPU time is less than 1,000 seconds on a typical 2 GHz PC with 1 GB of memory, which is quite reasonable for learning. • This figure does not include web-search time, which mainly depends on network quality.
Conclusion • In this paper, we propose an algorithm framework, LAWS, that "borrows" knowledge from another domain via web search, and we empirically show that it improves learning quality. • Our work can be extended to more complex action models, e.g., full PDDL models. • It can also be extended to multi-task action-model acquisition.