
From high level goals to policies: a polynomial time algorithm for k-maintainable goals

This research focuses on developing languages for representing and achieving goals, applying reasoning to model cell behavior, drug effects, and medical diagnosis. It introduces a polynomial time algorithm for parameterized maintainability goals.


Presentation Transcript


  1. From high level goals to policies: a polynomial time algorithm for k-maintainable goals. Chitta Baral, Arizona State University (joint work with Marcus Bjareland, Thomas Eiter, Mutsumi Nakamura, and Tran Son)

  2. Quick overview of my research • Knowledge Representation and Reasoning • Language design; theoretical building blocks; implementation; applications. • Action, change and histories • Developing languages for representing actions, the structure of the world, and the effects of the actions on the world. • Developing languages for expressing goals or directives. • Developing ways to achieve goals • Formulating various kinds of reasoning (e.g. prediction, planning, explanation, diagnosis, counterfactuals, etc.) • Application of the above to modeling cell behavior • Prediction: (side) effect of drugs • Planning: Drug design • Explanation: explaining unusual behavior; medical diagnosis • Others: hypothesis generation

  3. Motivation: Parameterized maintainability goals • Always f, also written as □ f: too strong for many kinds of maintainability (e.g., maintaining a clean room). • Always Eventually f, also written as □◊ f: weak in the sense that it gives no estimate of when f will be made true, and it may not be achievable in the presence of continuous interference by belligerent agents. • From strongest to weakest: □ f ------ □◊k f ------ □◊ f • □◊3 f is shorthand for □ (f ∨ ○f ∨ ○○f ∨ ○○○f), where ○ is the next-time operator. • But if an external agent keeps interfering, how is one supposed to guarantee □◊3 f?

  4. Motivation: a controller-agent transcript Controller (to the agent/robot): Your goal is to maintain the room clean. Robot/Agent: Can you be precise about what you mean by ‘maintain’? Also, can I clean anytime or are there restrictions? Controller: You can only clean when the room is unoccupied. Controller: By ‘maintain’ I mean ALWAYS clean. Robot/Agent: I won’t be able to guarantee that. What if someone makes it dirty while the room is occupied? Controller: OK, I understand. How about ALWAYS EVENTUALLY clean? Controller’s Boss: ‘Eventually’ is too lenient. We can’t have the room unclean for too long. We should put some bound on it.

  5. Controller-agent transcript (cont) Controller: Sorry, Sir. I should have made it more precise: ALWAYS EVENTUALLY3 clean. Robot/Agent: Sorry. I can guarantee neither ALWAYS EVENTUALLY clean nor ALWAYS EVENTUALLY3 clean. What if the room is continuously being used, given that you told me I cannot clean while it is being used? Controller: You have a good point. Let me clarify again. If you are given an opportunity of 3 units of time without the room being occupied (i.e., without any interference from external agents), then you should have the room clean during that time. Robot/Agent: I think I understand you. But as you know I am a robot and not that good at understanding English. Can you please input it in a precise language?

  6. Formulating k-maintainability: a system • A system is a quadruple A = (S, A, Φ, poss), where – S is the set of system states; – A is the set of actions, which is the union of the set of the agent’s actions, Aag, and the set of environmental actions, Aenv; – Φ : S × A → 2^S is a non-deterministic transition function that specifies how the state of the world changes in response to actions; – poss : S → 2^A is a function that describes which actions are possible in which states.
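The quadruple on this slide maps naturally onto a small data structure. The following is a minimal sketch, not the authors' implementation: the class name `System`, its field names, and the two-state "room" instance are all hypothetical and only loosely follow the room-cleaning example from the earlier slides.

```python
from dataclasses import dataclass


@dataclass
class System:
    """A system (S, A, Phi, poss) with agent and environmental actions."""
    states: set
    agent_actions: set
    env_actions: set
    phi: dict   # (state, action) -> set of possible successor states
    poss: dict  # state -> set of actions possible in that state

    @property
    def actions(self):
        # A is the union of the agent's actions and the environmental actions.
        return self.agent_actions | self.env_actions

    def successors(self, s, a):
        """Return Phi(s, a); reject actions that poss does not allow in s."""
        if a not in self.poss.get(s, set()):
            raise ValueError(f"action {a!r} is not possible in state {s!r}")
        return self.phi.get((s, a), set())

    def is_well_formed(self):
        """Check that poss(s) is a subset of A and each Phi(s, a) a subset of S."""
        poss_ok = all(acts <= self.actions for acts in self.poss.values())
        phi_ok = all(
            s in self.states and a in self.actions and succ <= self.states
            for (s, a), succ in self.phi.items()
        )
        return poss_ok and phi_ok


# Hypothetical instance: the agent can sweep a dirty room,
# and the environment can mess up a clean one.
room = System(
    states={"clean", "dirty"},
    agent_actions={"sweep"},
    env_actions={"mess"},
    phi={("dirty", "sweep"): {"clean"}, ("clean", "mess"): {"dirty"}},
    poss={"dirty": {"sweep"}, "clean": {"mess"}},
)
print(room.is_well_formed())
print(room.successors("dirty", "sweep"))
```

Storing Φ as a dictionary from (state, action) pairs to sets of states keeps the non-deterministic case and the deterministic case (singleton sets) in one representation.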

  7. A system [Figure: transition diagram over states s1 to s7 with edges labeled a1 to a5] S = {s1,s2,s3,s4,s5,s6,s7} A = {a1, a2, a3, a4, a5} Φ: as shown in the picture. poss(s1) = {a1, a2, a3}; poss(s4) = {a4}

  8. [Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’, and e] S = {b,c,d,f,g,h} A = {a, a’, e} Aag = {a, a’} Aenv = {e} Φ: as shown in the picture. poss(b) = {a} when our policy dictates a to be executed at b.

  9. Controls and super-controls • Given a system A = (S, A, Φ, poss) and a set Aag (a subset of A) of agent actions, – a control policy for A w.r.t. Aag is a partial function K : S → Aag, such that K(s) is an element of poss(s) whenever K(s) is defined. – a super-control policy for A w.r.t. Aag is a partial function K : S → 2^Aag, such that K(s) is a subset of poss(s) and K(s) ≠ { } whenever K(s) is defined.
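The two definitions above amount to simple validity checks on a partial function. Here is a minimal sketch, assuming a policy is stored as a Python dictionary whose missing keys mean "undefined"; the function names and the sample `POSS` table are hypothetical.

```python
def is_control(K, poss, agent_actions):
    """K maps states to single agent actions that are possible there."""
    return all(
        a in agent_actions and a in poss.get(s, set())
        for s, a in K.items()
    )


def is_super_control(K, poss, agent_actions):
    """K maps states to non-empty sets of agent actions possible there."""
    return all(
        len(acts) > 0 and acts <= agent_actions and acts <= poss.get(s, set())
        for s, acts in K.items()
    )


# Hypothetical poss table; "e" is an environmental action.
AAG = {"a", "a'"}
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}

print(is_control({"b": "a", "c": "a", "d": "a"}, POSS, AAG))   # valid control
print(is_control({"f": "e"}, POSS, AAG))                       # "e" is not an agent action
print(is_super_control({"b": {"a", "a'"}}, POSS, AAG))         # valid super-control
print(is_super_control({"b": set()}, POSS, AAG))               # empty set is not allowed
```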

  10. Reachable states and closure • Reachable states R(A, s): Given a system A = (S, A, Φ, poss) and a state s, R(A, s) (a subset of S) is the smallest set of states that satisfies the following conditions: (i) s is in R(A, s); and (ii) if s’ is in R(A, s) and a is in poss(s’), then Φ(s’, a) is a subset of R(A, s). • Let A = (S, A, Φ, poss) be a system and let S be a set of states of interest, a subset of the system’s state set. Then the closure of A w.r.t. S, denoted by Closure(S, A), is the union of the reachable sets of its members: Closure(S, A) = ⋃_{s ∈ S} R(A, s).

  11. [Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’, and e] A = (S, A, Φ, poss) R(A, d) = {d, h} R(A, f) = {f, g, h} Closure({d, f}, A) = {d, f, g, h}
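Reachability and closure can be computed with a plain graph search. The sketch below is illustrative only. The figure itself is not recoverable from the transcript, so the edge set `PHI` is an assumption, chosen to be consistent with the values of R(A, d), R(A, f), and Closure stated on this slide; the function names are hypothetical.

```python
def reachable(phi, poss, s):
    """R(A, s): the smallest set containing s and closed under possible actions."""
    seen = {s}
    stack = [s]
    while stack:
        u = stack.pop()
        for a in poss.get(u, set()):
            for t in phi.get((u, a), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def closure(phi, poss, start_states):
    """Closure(S, A): union of R(A, s) over all s in S."""
    result = set()
    for s in start_states:
        result |= reachable(phi, poss, s)
    return result


# Assumed edge set (reconstructed, not taken from the lost figure).
PHI = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"},
    ("c", "a"): {"d"}, ("d", "a"): {"h"},
    ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
# Assumption: every action with an outgoing edge is possible in that state.
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}

print(sorted(reachable(PHI, POSS, "d")))
print(sorted(reachable(PHI, POSS, "f")))
print(sorted(closure(PHI, POSS, {"d", "f"})))
```

Under this assumed edge set the three results agree with the sets listed on the slide.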

  12. Unfoldk(s,A,K): • An element of Unfoldk(s,A,K) is a sequence of at most k + 1 states that the system may go through if it follows the control K starting from the state s. Formally: Let A = (S, A, Φ, poss) be a system, let s belong to S, and let K be a control for A. Then Unfoldk(s,A,K) is the set of all sequences σ = s0, s1, ..., sl where l ≤ k and s0 = s, such that K(sj) is defined for all j < l, sj+1 belongs to Φ(sj, K(sj)), and if l < k, then K(sl) is undefined.

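  13. [Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’, and e, including one more a edge than on slide 11] Consider policy K: Do action a in states b, c, and d. Unfold3(b,A,K) = { <b,c,d,h>, <b,g> } Unfold3(c,A,K) = { <c,d,h> }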
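The definition of Unfold from slide 12 can be sketched as a bounded depth-first enumeration. The edge set below is an assumption: the figure is lost, and the only change from the earlier diagram that explains the sequence <b,g> is that action a in state b may lead to either c or g. The function name `unfold` is hypothetical.

```python
def unfold(phi, K, s, k):
    """Return Unfold_k(s, A, K) as a set of tuples of states."""
    results = set()

    def extend(path):
        last = path[-1]
        steps = len(path) - 1
        # A sequence is complete when it has k steps, or earlier
        # if the control is undefined at its last state.
        if steps == k or last not in K:
            results.add(tuple(path))
            return
        for t in phi.get((last, K[last]), set()):
            extend(path + [t])

    extend([s])
    return results


# Assumed edge set: action a in b is non-deterministic (leads to c or g).
PHI = {
    ("b", "a"): {"c", "g"}, ("b", "a'"): {"f"},
    ("c", "a"): {"d"}, ("d", "a"): {"h"},
    ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
K = {"b": "a", "c": "a", "d": "a"}

print(sorted(unfold(PHI, K, "b", 3)))
print(sorted(unfold(PHI, K, "c", 3)))
```

Read literally, the definition yields no sequence through a state where K is defined but has no successor; the sketch follows that reading.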

  14. Definition of k-maintainability: the parameters 1. a system A = (S, A, Φ, poss), 2. a set Aag ⊆ A of agent actions, 3. a set of initial states S, 4. a set of desired states E that we want to maintain, 5. a maintainability parameter k, 6. a function exo : S → 2^Aenv detailing exogenous actions, such that exo(s) is a subset of poss(s), and 7. a control K (mapping a relevant part of S to Aag) such that K(s) belongs to poss(s).

  15. Basic Idea • Ignoring interference: • From any state under consideration, following the control policy should lead to a visit to E within k steps. • Accounting for interference: • Broaden the states under consideration from the initial states to all states that can be reached due to the control policy and the environment. (Use the notion of Closure.) • When using Closure: • take into account the control policy; ignore agent actions other than the one dictated by the control policy. • Also, only consider exogenous actions in exo(s).

  16. Definition of k-maintainability • possK,exo(s) is the set {K(s)} ∪ exo(s). • AK,exo = (S, A, Φ, possK,exo) • Given a system A = (S, A, Φ, poss), a set of agent actions Aag (a subset of A), and a specification of exogenous action occurrence exo, we say that a control K for A w.r.t. Aag k-maintains a subset S of the states with respect to a subset E of the states, where k ≥ 0, if for each state s in Closure(S, AK,exo) and each sequence σ = s0, s1, ..., sr in Unfoldk(s,A,K) with s0 = s, it holds that {s0, s1, ..., sr} ∩ E ≠ { }.

  17. [Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’, and e] Consider policy K: Do action a in states b, c, and d. poss(b) = {a, a’}; possK,exo(b) = {a}; Closure({b,c}, A) = {b,c,d,f,g,h}; Closure({b,c}, AK,exo) = {b,c,d,h}

  18. [Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’, and e] Goal: a 3-maintainable policy for S = {b} w.r.t. E = {h}. Such a policy: Do a in b, c, and d.
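The definition from slide 16 can be checked directly on a small explicit system: compute the closure under the policy and exo, then unfold from every state in it. This is a brute-force sketch for illustration, not the polynomial-time construction the talk develops. The edge set and the `exo` table are assumptions reconstructed from the values stated on slides 11 and 17; all function names are hypothetical.

```python
def closure_under_policy(phi, K, exo, start_states):
    """Closure(S, A_{K,exo}): follow only K(s) and the exogenous actions exo(s)."""
    seen = set(start_states)
    stack = list(start_states)
    while stack:
        s = stack.pop()
        acts = set(exo.get(s, set()))
        if s in K:
            acts.add(K[s])
        for a in acts:
            for t in phi.get((s, a), set()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return seen


def unfold(phi, K, s, k):
    """Unfold_k(s, A, K): sequences obtained by following K for at most k steps."""
    results = set()

    def extend(path):
        last = path[-1]
        if len(path) - 1 == k or last not in K:
            results.add(tuple(path))
            return
        for t in phi.get((last, K[last]), set()):
            extend(path + [t])

    extend([s])
    return results


def k_maintains(phi, K, exo, init, goal, k):
    """True if every unfolding from every state in the closure visits the goal set."""
    return all(
        any(state in goal for state in seq)
        for s in closure_under_policy(phi, K, exo, init)
        for seq in unfold(phi, K, s, k)
    )


# Assumed deterministic edge set and exogenous actions (figure not recoverable).
PHI = {
    ("b", "a"): {"c"}, ("b", "a'"): {"f"},
    ("c", "a"): {"d"}, ("d", "a"): {"h"},
    ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
EXO = {"f": {"e"}}
K = {"b": "a", "c": "a", "d": "a"}

print(sorted(closure_under_policy(PHI, K, EXO, {"b", "c"})))
print(k_maintains(PHI, K, EXO, {"b"}, {"h"}, 3))
print(k_maintains(PHI, K, EXO, {"b"}, {"h"}, 2))
```

Under these assumptions the policy "do a in b, c, and d" passes the check for k = 3 and fails for k = 2, since reaching h from b takes three steps.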

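  19. [Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’, and e, including one more e edge than on slide 18] Goal: a 3-maintainable policy for S = {b} w.r.t. E = {h}. No such policy exists.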

  20. Constructing k-maintainable control policies: pre-formulation attempts • Handwritten policies: subsumption architecture, RAPs, situation control rules, protocols. • Our initial motivation for formulating maintainability came from trying to formalize what a control module was doing. • Kaelbling and Rosenschein 1991: In the control rule “if condition c is satisfied then do action a”, the action a is the action that leads to the goal from any state where the condition c is satisfied.

  21. [Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’, and e] Forward search: If we use minimal paths or minimal-cost paths we might pick a’; then we would have to backtrack. Backward search: Should we include both d and f?

  22. Propositional Encoding of solutions • Input: An input I is a system A = (S, A, Φ, poss), a set of goal states E ⊆ S, a set of initial states S (a subset of the system’s states), a set Aag ⊆ A, a function exo, and an integer k ≥ 0. • Output: A control K such that S is k-maintainable with respect to E (using the control K), if such a control exists. Otherwise the output is the answer that no such control exists. • AIM: Given an input I, we construct a SAT instance sat(I) in polynomial time such that sat(I) is satisfiable if and only if the input I allows for a k-maintainable control, and such that the satisfying assignments for sat(I) encode possible such controls.

  23. Propositional encoding: notation • s_i denotes that (a) there is a path from state s to some state in E using only agent actions and at most i of them, to which we refer as “there is an a-path from s to E of length at most i,” and (b) from each state s’ reachable from s, there is an a-path from s’ to E of length at most k.

  24. The encoding sat(I) (0) For all states s and for all j, 0 ≤ j < k: s_j ⇒ s_{j+1} (1) For all s ∈ E: s_0 (2) For all states s, t such that Φ(s,a) = t for some action a ∈ exo(s): s_k ⇒ t_k (3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1}, where PS(s) = {t ∈ S | ∃ a ∈ Aag ∩ poss(s): t = Φ(s,a)} (4) For all initial states s not in E: s_k (5) For all states s not in E: ¬s_0
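To make the clause groups concrete, the sketch below generates sat(I) for a small deterministic system and checks one assignment against it. Everything in it is illustrative: the literal representation, the function names, and the edge set (reconstructed from values stated on other slides, since the figure is lost) are assumptions. The candidate assignment corresponds to the levels L_M(b) = 3, L_M(c) = 2, L_M(d) = 1 reported on slide 29.

```python
def build_sat(states, phi, poss, aag, exo, init, goal, k):
    """Return sat(I) as CNF; a literal is ((state, index), polarity)."""
    clauses = []
    for s in sorted(states):                       # (0) s_j => s_{j+1}
        for j in range(k):
            clauses.append([((s, j), False), ((s, j + 1), True)])
    for s in sorted(goal):                         # (1) s_0 for goal states
        clauses.append([((s, 0), True)])
    for s in sorted(states):                       # (2) s_k => t_k along exo edges
        for a in sorted(exo.get(s, set())):
            clauses.append([((s, k), False), ((phi[(s, a)], k), True)])
    for s in sorted(states - goal):                # (3) s_i => OR of t_{i-1}
        ps = sorted({phi[(s, a)] for a in aag & poss.get(s, set())})
        for i in range(1, k + 1):
            clauses.append([((s, i), False)] + [((t, i - 1), True) for t in ps])
    for s in sorted(init - goal):                  # (4) s_k for initial states
        clauses.append([((s, k), True)])
    for s in sorted(states - goal):                # (5) not s_0 outside the goal
        clauses.append([((s, 0), False)])
    return clauses


def satisfies(true_atoms, clauses):
    """True if every clause has a literal made true by the assignment."""
    return all(
        any((atom in true_atoms) == polarity for atom, polarity in clause)
        for clause in clauses
    )


# Assumed deterministic system: phi maps (state, action) to a single state.
STATES = {"b", "c", "d", "f", "g", "h"}
PHI = {("b", "a"): "c", ("b", "a'"): "f", ("c", "a"): "d",
       ("d", "a"): "h", ("f", "a"): "h", ("f", "e"): "g"}
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
AAG, EXO, K_BOUND = {"a", "a'"}, {"f": {"e"}}, 3

clauses = build_sat(STATES, PHI, POSS, AAG, EXO, {"b"}, {"h"}, K_BOUND)
# Candidate assignment: s_j is true exactly when j >= level(s); f and g stay false.
levels = {"h": 0, "d": 1, "c": 2, "b": 3}
model = {(s, j) for s, lvl in levels.items() for j in range(lvl, K_BOUND + 1)}
print(len(clauses), satisfies(model, clauses))
```

The instance has one variable per state and index, so its size is polynomial in the number of states and k, in line with the aim stated on slide 22.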

  25. Constructing policies from the models of sat(I) • Let M be a model of sat(I). • C_M = {s ∈ S | M ⊨ s_k} • L_M(s): the smallest index j such that M ⊨ s_j (i.e., s_0, s_1, ..., s_{j-1} are false and s_j is true), which we call the level of s w.r.t. M. • K(s) is defined iff s ∈ C_M \ E, and K(s) ∈ {a ∈ Aag | Φ(s,a) = t, t ∈ C_M, L_M(t) < L_M(s)}
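The construction above can be sketched as follows, assuming a model is given as the set of true atoms (state, index). The system is the same assumed deterministic example used for illustration elsewhere (the figure is lost), and the tie-breaking rule, picking the alphabetically first eligible action, is an arbitrary choice of this sketch.

```python
def policy_from_model(model, states, phi, poss, aag, goal, k):
    """Build a control K and the level function L_M from a model of sat(I)."""
    c_m = {s for s in states if (s, k) in model}
    level = {s: min(j for j in range(k + 1) if (s, j) in model) for s in c_m}
    K = {}
    for s in sorted(c_m - goal):
        # Eligible actions lead to a state in C_M with a strictly lower level.
        candidates = sorted(
            a for a in aag & poss.get(s, set())
            if phi[(s, a)] in c_m and level[phi[(s, a)]] < level[s]
        )
        if candidates:
            K[s] = candidates[0]  # any eligible action would do
    return K, level


# Assumed deterministic system: phi maps (state, action) to a single state.
STATES = {"b", "c", "d", "f", "g", "h"}
PHI = {("b", "a"): "c", ("b", "a'"): "f", ("c", "a"): "d",
       ("d", "a"): "h", ("f", "a"): "h", ("f", "e"): "g"}
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
AAG = {"a", "a'"}

# Model in which b, c, d, h have levels 3, 2, 1, 0 and f, g are never true.
MODEL = {(s, j) for s, lvl in {"h": 0, "d": 1, "c": 2, "b": 3}.items()
         for j in range(lvl, 4)}

K, level = policy_from_model(MODEL, STATES, PHI, POSS, AAG, {"h"}, 3)
print(K)
print(level)
```

In this example action a’ is never chosen at b because its successor f lies outside C_M.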

  26. Proposition • Let I consist of a system A = (S, A, Φ, poss), where Φ is deterministic, a set Aag ⊆ A, a set of goal states E ⊆ S, a set of initial states S (a subset of the system’s states), an exogenous function exo, and an integer k. Then, (i) S is k-maintainable w.r.t. E iff sat(I) is satisfiable. (ii) Given any model M of sat(I), any control K constructed by the procedure on the previous slide k-maintains S w.r.t. E.

  27. Reverse Encoding • a ⇒ b is equivalent to • ¬a ∨ b, which is equivalent to • ¬(¬b) ∨ ¬a, which is equivalent to • ¬b ⇒ ¬a, which is equivalent to • b’ ⇒ a’ (writing x’ for ¬x), which is equivalent to • a’ ← b’

  28. Rearranging sat(I) (each original clause is followed by its reversed form) (0) For all states s and for all j, 0 ≤ j < k: s_j ⇒ s_{j+1} becomes s’_j ← s’_{j+1} (1) For all s ∈ E: s_0 becomes ← s’_0 (2) For all states s, t such that Φ(s,a) = t for some action a ∈ exo(s): s_k ⇒ t_k becomes s’_k ← t’_k (3) For all states s not in E and all i, 1 ≤ i ≤ k: s_i ⇒ ∨_{t ∈ PS(s)} t_{i-1} becomes s’_i ← ∧_{t ∈ PS(s)} t’_{i-1}, where PS(s) = {t ∈ S | ∃ a ∈ Aag ∩ poss(s): t = Φ(s,a)} (4) For all initial states s not in E: s_k becomes ← s’_k (5) For all states s not in E: ¬s_0 becomes s’_0 ←

  29. [Figure: transition diagram over states b, c, d, f, g, h with edges labeled a, a’, and e] (6) b’_0, c’_0, d’_0, f’_0, g’_0 (from 5) (7) g’_1, g’_2, g’_3 (from 3) (8) b’_1, c’_1 (from 6 and 3) (9) f’_3 (from 7 and 2) (10) f’_2 (from 9 and 0) (11) f’_1 (from 10 and 0) (12) b’_2 (from 8, 11, and 3) Thus M = {g’_3, g’_2, g’_1, g’_0, f’_3, f’_2, f’_1, f’_0, b’_2, b’_1, b’_0, c’_1, c’_0, d’_0}. L_M(b) = 3, L_M(c) = 2, L_M(d) = 1
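The derivation on this slide is forward chaining over the reversed Horn rules of slide 28. Below is a minimal sketch that uses a naive fixpoint loop rather than a linear-time Horn solver. The edge set is an assumption chosen to be consistent with the derivation steps shown (the figure is lost), and the function names are hypothetical. An atom (s, i) in the computed set stands for s’_i.

```python
def least_model(states, phi, poss, aag, exo, goal, k):
    """Least model of the reversed rules (0), (2), (3) with the facts from (5)."""
    derived = {(s, 0) for s in states - goal}          # facts s'_0 from (5)
    ps = {s: {phi[(s, a)] for a in aag & poss.get(s, set())} for s in states}
    changed = True
    while changed:
        new = set()
        for (s, j) in derived:                         # (0) s'_j <- s'_{j+1}
            if j > 0:
                new.add((s, j - 1))
        for s in states:                               # (2) s'_k <- t'_k on exo edges
            for a in exo.get(s, set()):
                if (phi[(s, a)], k) in derived:
                    new.add((s, k))
        for s in states - goal:                        # (3) s'_i <- AND of t'_{i-1}
            for i in range(1, k + 1):
                if all((t, i - 1) in derived for t in ps[s]):
                    new.add((s, i))
        changed = not new <= derived
        derived |= new
    return derived


def levels_from(model, states, k):
    """L_M(s): the smallest j such that s'_j was not derived."""
    out = {}
    for s in states:
        free = [j for j in range(k + 1) if (s, j) not in model]
        if free:
            out[s] = free[0]
    return out


# Assumed deterministic system: phi maps (state, action) to a single state.
STATES = {"b", "c", "d", "f", "g", "h"}
PHI = {("b", "a"): "c", ("b", "a'"): "f", ("c", "a"): "d",
       ("d", "a"): "h", ("f", "a"): "h", ("f", "e"): "g"}
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}

M = least_model(STATES, PHI, POSS, {"a", "a'"}, {"f": {"e"}}, {"h"}, 3)
# Constraints (1) and (4): h'_0 and b'_3 must not be derivable.
consistent = ("h", 0) not in M and ("b", 3) not in M
print(sorted(M), consistent)
print(levels_from(M, STATES, 3))
```

Under the assumed edges, the computed set matches the 14 atoms of M listed on the slide, and neither constraint is violated.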

  30. Polynomial time generation of control policy and maximal control policy • Computing a model of a Horn theory is a well-known polynomial-time problem (Dowling & Gallier 84). Thus, • Theorem: Under deterministic state transitions, problem k-MAINTAIN is solvable in polynomial time. • Maximal Control • Each satisfiable Horn theory T has a least model, M_T, which is given by the intersection of all its models. • The least model is computable in time linear in the size of the encoding. • This model not only leads to a k-maintainable control, but also leads to a maximal control, in the sense that the control is defined on a greatest set of states outside E among all possible k-maintainable controls for S’ w.r.t. E such that S is a subset of S’.

  31. Dealing with non-deterministic transition functions • Notation: • We say that there exists an a-path of length at most k ≥ 0 from a state s to a set of states S’ if either s ∈ S’, or s ∉ S’, k > 0, and there is some action a ∈ Aag ∩ poss(s) such that for every t ∈ Φ(s,a) there exists an a-path of length at most k-1 from t to S’. • s_a_i, i > 0, denotes that there is an a-path from s to E of length at most i starting with action a. • The encoding sat’(I) again has groups (0)-(5) of clauses, as follows: (0), (1), (4), and (5) are the same as in sat(I). (2) For any states s and t such that t ∈ Φ(s,a) for some action a ∈ exo(s): s_k ⇒ t_k

  32. Dealing with non-deterministic transition functions (cont.) (3) For every state s not in E and for all i, 1 ≤ i ≤ k: (3.1) s_i ⇒ ∨_{a ∈ Aag ∩ poss(s)} s_a_i; (3.2) for every a ∈ Aag ∩ poss(s) and every t ∈ Φ(s,a): s_a_i ⇒ t_{i-1}; (3.3) for every a ∈ Aag ∩ poss(s), if i < k: s_a_i ⇒ s_a_{i+1}
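The recursive a-path definition from slide 31 translates almost word for word into a recursive function. The sketch below is illustrative and makes two assumptions: the edge set (the figure is lost; action a in b is taken to lead to either c or g, as the unfolding on slide 13 suggests), and the convention that an action with no successors does not count as providing a path. The function name is hypothetical.

```python
def has_a_path(phi, poss, aag, s, targets, k):
    """True if there is an a-path of length at most k from s to the target set."""
    if s in targets:
        return True
    if k == 0:
        return False
    for a in poss.get(s, set()) & aag:
        succ = phi.get((s, a), set())
        # Assumption: an action without successors gives no path.
        # Every possible outcome of a must itself have a shorter a-path.
        if succ and all(has_a_path(phi, poss, aag, t, targets, k - 1) for t in succ):
            return True
    return False


# Assumed non-deterministic edge set.
PHI = {
    ("b", "a"): {"c", "g"}, ("b", "a'"): {"f"},
    ("c", "a"): {"d"}, ("d", "a"): {"h"},
    ("f", "a"): {"h"}, ("f", "e"): {"g"},
}
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
AAG = {"a", "a'"}

print(has_a_path(PHI, POSS, AAG, "b", {"h"}, 3))
print(has_a_path(PHI, POSS, AAG, "b", {"h"}, 1))
print(has_a_path(PHI, POSS, AAG, "g", {"h"}, 3))
```

Note that an a-path ignores exogenous actions entirely: here b reaches h through a’ and then a, even though the environment could still divert the system from f to g. Accounting for that interference is the job of clause group (2).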

  33. A direct algorithm • Initialization • For all states s not in E, make s’_0 true. • For all states s not in E without any outgoing edges labeled with agent actions, make s’_0, ..., s’_k true. • For all states s, if agent action a is not executable in s, make s_a’_0, ..., s_a’_k true. • Repeat until no change or until s’_k is true for some initial state s: • If s’_i is true then make s’_{i-1} true. If s_a’_i is true then make s_a’_{i-1} true. • If t ∈ Φ(s,a) for some exogenous action a and t’_k is true, then make s’_k true. • For any state s not in E: • If t ∈ Φ(s,a) for some agent action a and t’_{i-1} is true, then make s_a’_i true. • If s_a’_i is true for all agent actions a that are executable in s, then make s’_i true.

  34. A direct algorithm (cont.) • If s’_k is true for some initial state s, then the system is not k-maintainable; otherwise construct a super-control as follows: • For states s in E, K(s) is undefined, and for other states K(s) = {a : s_a’_k is not true}.
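The steps on slides 33 and 34 can be sketched as a naive fixpoint computation. This is an illustrative reading, not the authors' code, with several assumptions spelled out: the loop runs to a full fixpoint instead of stopping early, non-executable actions are skipped rather than marked, K(s) is left undefined when the resulting action set would be empty, and the example system is reconstructed from values on other slides because the figure is lost.

```python
def direct_algorithm(states, phi, poss, aag, exo, init, goal, k):
    """Return a super-control as {state: set of actions}, or None if none exists."""
    agent_acts = {s: poss.get(s, set()) & aag for s in states}
    bad = set()       # (s, i) stands for s'_i
    bad_act = set()   # (s, a, i) stands for s_a'_i
    for s in states - goal:                    # initialization
        bad.add((s, 0))
        if not agent_acts[s]:
            bad.update((s, i) for i in range(k + 1))
    while True:
        new_bad, new_bad_act = set(), set()
        for (s, i) in bad:                     # s'_i gives s'_{i-1}
            if i > 0:
                new_bad.add((s, i - 1))
        for (s, a, i) in bad_act:              # s_a'_i gives s_a'_{i-1}
            if i > 1:
                new_bad_act.add((s, a, i - 1))
        for s in states:                       # exogenous edges, at level k only
            for e in exo.get(s, set()):
                if any((t, k) in bad for t in phi.get((s, e), set())):
                    new_bad.add((s, k))
        for s in states - goal:                # agent edges
            for i in range(1, k + 1):
                for a in agent_acts[s]:
                    if any((t, i - 1) in bad for t in phi.get((s, a), set())):
                        new_bad_act.add((s, a, i))
                if agent_acts[s] and all((s, a, i) in bad_act for a in agent_acts[s]):
                    new_bad.add((s, i))
        if new_bad <= bad and new_bad_act <= bad_act:
            break
        bad |= new_bad
        bad_act |= new_bad_act
    if any((s, k) in bad for s in init):
        return None                            # not k-maintainable
    K = {}
    for s in states - goal:
        allowed = {a for a in agent_acts[s] if (s, a, k) not in bad_act}
        if allowed:                            # assumption: skip empty action sets
            K[s] = allowed
    return K


# Assumed system (sets of successors, so non-determinism is allowed).
STATES = {"b", "c", "d", "f", "g", "h"}
PHI = {("b", "a"): {"c"}, ("b", "a'"): {"f"}, ("c", "a"): {"d"},
       ("d", "a"): {"h"}, ("f", "a"): {"h"}, ("f", "e"): {"g"}}
POSS = {"b": {"a", "a'"}, "c": {"a"}, "d": {"a"}, "f": {"a", "e"}}
AAG, EXO = {"a", "a'"}, {"f": {"e"}}

print(direct_algorithm(STATES, PHI, POSS, AAG, EXO, {"b"}, {"h"}, 3))
print(direct_algorithm(STATES, PHI, POSS, AAG, EXO, {"b"}, {"h"}, 2))
```

For k = 3 the sketch keeps only action a at b, because a’ leads to f, from which the environment can force the dead-end state g. For k = 2 it reports that no control exists.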

  35. Direct algorithm using counters • Idea: c[s] = i means s’_0, ..., s’_i are true, and c[s_a] = i means s_a’_0, ..., s_a’_i are true. • Initialization • For all states s not in E, make s’_0 true: c[s] := 0. • For all states s not in E without any outgoing edges labeled with agent actions, make s’_0, ..., s’_k true: c[s] := k. • For all states s, if agent action a is not executable in s, make s_a’_0, ..., s_a’_k true: c[s_a] := k. • The other steps are similar. • The idea can then be extended to actions with durations (or costs).

  36. Computational Complexity • k-maintainability is PTIME-complete (under log-space reductions). PTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action. • k-maintainability is EXPTIME-complete when the system is given in a compact representation. EXPTIME-hardness holds for 1-maintainability, even if all actions are deterministic and there is only one deterministic exogenous action.

  37. Conclusion • High level goal specification is important. • Certain important goal specification notions cannot be expressed using existing goal representation languages. • k-maintainability is an important notion. • Finite maintainability is a reinvention of Dijkstra's notion of self-stabilization. • There is a big research community working on self-stabilization in distributed control and fault tolerance. • But it has not focused much on automatic generation of controls (protocols, in their parlance). • It has focused more on proving the correctness of handwritten protocols. • Most specifications over infinite trajectories would be better off with k-maintainability-like notions as part of the specification. • Role 1 of k: length of the window of opportunity • Role 2 of k: bound within which maintenance is guaranteed

  38. Conclusion (cont.) • Going from a SAT encoding to a Horn logic program encoding: an interesting and novel approach to designing polynomial-time algorithms. • One often does not think in terms of negative propositions.

  39. THANK YOU!
