Energy and Mean-Payoff Parity Markov Decision Processes

Presentation Transcript


  1. Energy and Mean-Payoff Parity Markov Decision Processes. Laurent Doyen (LSV, ENS Cachan & CNRS), Krishnendu Chatterjee (IST Austria). MFCS 2011.

  2.–4. Games for system analysis. [Diagram: a System and its Environment exchanging input/output signals; specification φ(input, output).]
• Verification: check whether a given system is correct ⇒ reduces to graph searching.
• Synthesis: construct a correct system ⇒ reduces to game solving, i.e., finding a winning strategy.
This talk: the environment is abstracted as a stochastic process ⇒ a Markov decision process (MDP).

  5. Markov decision process

  6.–12. Markov decision process (MDP): nondeterministic states (player 1) and probabilistic states.

  13.–14. Markov decision process (MDP): nondeterministic states (player 1) and probabilistic states. Strategy (policy) = a recipe to extend the play prefix.

  15. Markov decision process (MDP): nondeterministic states (player 1) and probabilistic states, viewed as a weak player 2. Strategy (policy) = a recipe to extend the play prefix.
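
To make the model concrete, here is a minimal Python sketch of an MDP and of a strategy extending the play prefix; the dict encoding, state names, and function names are illustrative assumptions, not the talk's formalism:

```python
import random

# Toy MDP: player-1 states carry a list of successors (nondeterminism),
# probabilistic states carry a distribution over successors.
mdp = {
    "q0": {"kind": "player1", "succ": ["q1", "q2"]},
    "q1": {"kind": "prob",    "succ": {"q0": 0.5, "q2": 0.5}},
    "q2": {"kind": "player1", "succ": ["q0"]},
}

def step(state, strategy):
    """Extend the play by one state: the strategy resolves player-1
    choices, coin flips resolve probabilistic states."""
    node = mdp[state]
    if node["kind"] == "player1":
        return strategy(state)
    succ, probs = zip(*node["succ"].items())
    return random.choices(succ, weights=probs)[0]

# A memoryless strategy (policy): the choice depends only on the
# current state, here simply the first listed successor.
memoryless = lambda s: mdp[s]["succ"][0]

play = ["q0"]
for _ in range(10):
    play.append(step(play[-1], memoryless))
print(play)
```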

  16. Objective. Fix a strategy ⇒ an (infinite) Markov chain. A strategy is almost-sure winning if, with probability 1:
• Büchi: accepting states are visited infinitely often.
• Parity: the least priority visited infinitely often is even.
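
Formally (standard definitions; Inf(π) denotes the set of states visited infinitely often along a play π, F the accepting states, and p the priority function, notation the slides leave implicit):

```latex
\mathrm{Buchi}(F)  \;=\; \{\pi \mid \mathrm{Inf}(\pi) \cap F \neq \emptyset\}
\qquad
\mathrm{Parity}(p) \;=\; \{\pi \mid \min\{\, p(q) \mid q \in \mathrm{Inf}(\pi) \,\} \text{ is even}\}
```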

  17. Decision problem. Given an MDP, decide whether there exists an almost-sure winning strategy for the parity objective.

  18. Decision problem. Given an MDP, decide whether there exists an almost-sure winning strategy for the parity objective.
• End-component = a set of states U such that:
• if q ∈ U is a player-1 state, then some successor of q is in U,
• if q ∈ U is a probabilistic state, then all successors of q are in U,
• U is strongly connected.

  19.–20. An end-component is good if its least priority is even. Almost-sure winning for the parity objective reduces to almost-sure reachability to good end-components, which is decidable in PTIME (illustrated below).
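
The end-component conditions and the goodness test can be checked directly on the toy encoding from the earlier sketch (U is a candidate state set and priority a hypothetical state-to-priority map; an illustrative check, not the slide's PTIME algorithm):

```python
def successors(q, mdp):
    node = mdp[q]
    return node["succ"] if node["kind"] == "player1" else list(node["succ"])

def is_end_component(U, mdp):
    """Check the closure conditions and strong connectivity of U."""
    U = set(U)
    for q in U:
        succ = successors(q, mdp)
        if mdp[q]["kind"] == "player1":
            if not any(s in U for s in succ):  # some successor stays in U
                return False
        elif not all(s in U for s in succ):    # all successors stay in U
            return False
    # Strong connectivity: within U, every state reaches every other.
    def reach(q0):
        seen, stack = {q0}, [q0]
        while stack:
            for s in successors(stack.pop(), mdp):
                if s in U and s not in seen:
                    seen.add(s)
                    stack.append(s)
        return seen
    return all(reach(q) == U for q in U)

def is_good(U, priority):
    """Good end-component: the least priority occurring in U is even."""
    return min(priority[q] for q in U) % 2 == 0
```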

  21.–22. Objectives.
• Qualitative: parity conditions, expressing ω-regular specifications (reactivity, liveness, …).
• Quantitative: energy conditions, expressing resource-constrained specifications.

  23.–26. Energy objective.
• Positive and negative weights (encoded in binary).
• Energy level = initial credit plus the running sum of weights along the play; with initial credit 20 in the example: 20, 21, 11, 1, 2, …
• A play is winning if the energy level is always nonnegative: “never exhaust the resource (memory, battery, …)”.
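
The energy level is just the initial credit plus a running sum; a tiny sketch reproducing the slide's numbers (the weights +1, -10, -10, +1 are read off the example):

```python
def energy_levels(initial_credit, weights):
    """Energy level after each step of the play."""
    levels = [initial_credit]
    for w in weights:
        levels.append(levels[-1] + w)
    return levels

levels = energy_levels(20, [1, -10, -10, 1])
print(levels)                          # [20, 21, 11, 1, 2]
print(all(x >= 0 for x in levels))     # energy condition holds: True
```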

  27.–29. Decision problem. Given a weighted MDP, decide whether there exist an initial credit c0 and an almost-sure winning strategy to maintain the energy level always nonnegative. This is equivalent to a two-player game in which each probabilistic state becomes a player-2 state: if player 2 could force a negative energy level along some path, that path is finite and has positive probability in the MDP.

  30. Energy Parity MDP

  31. Objectives.
• Qualitative: parity conditions (ω-regular specifications: reactivity, liveness, …).
• Quantitative: energy conditions (resource-constrained specifications).
• Mixed qualitative-quantitative: energy parity.

  32. Energy parity MDPs. A strategy is almost-sure winning with initial credit c0 if, with probability 1, both the energy condition and the parity condition hold: “never exhaust the resource” and “always eventually do something useful”.

  33.–34. Algorithm for energy Büchi MDPs. For parity, the probabilistic player is my friend; for energy, the probabilistic player is my opponent. Replace each probabilistic state by the gadget [gadget figure omitted].

  35.–36. Algorithm for energy Büchi MDPs.
• Reduction of energy Büchi MDPs to energy Büchi games.
• Reduction of energy parity MDPs to energy Büchi MDPs: player 1 can guess an even priority 2i and win in the energy Büchi MDP where the Büchi states are the 2i-states and transitions to states with priority < 2i are disallowed (see the sketch below).
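
The guess-and-check reduction from the second bullet can be sketched as the construction of one energy Büchi instance per even priority (the (src, weight, dst) edge encoding and the names are assumptions for illustration):

```python
def energy_buchi_instances(states, edges, priority):
    """For each even priority 2i, restrict the MDP to states with
    priority >= 2i and mark the 2i-states as Buchi states."""
    instances = []
    for p in sorted({priority[q] for q in states}):
        if p % 2 != 0:
            continue                    # player 1 guesses an even 2i
        kept = {q for q in states if priority[q] >= p}
        restricted = [(u, w, v) for (u, w, v) in edges
                      if u in kept and v in kept]
        buchi = {q for q in kept if priority[q] == p}
        instances.append((kept, restricted, buchi))
    return instances

# Player 1 wins the energy parity MDP iff it wins the energy Buchi
# MDP of at least one of these instances.
```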

  37. Mean-payoff Parity MDP

  38.–39. Mean-payoff. Mean-payoff value of a play = limit-average of the visited weights. The optimal mean-payoff value can be achieved with a memoryless strategy. Decision problem: given a rational threshold ν, decide whether there exists a strategy for player 1 to ensure mean-payoff value at least ν with probability 1.
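
Written out, the limit-average of a play π = q0 q1 q2 … is (the standard definition; taking liminf is the usual convention, which the slide does not spell out):

```latex
\mathrm{MP}(\pi) \;=\; \liminf_{n \to \infty} \; \frac{1}{n} \sum_{i=0}^{n-1} w(q_i, q_{i+1})
```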

  40. Mean-payoff games. A memoryless strategy σ ensures nonnegative mean-payoff value iff all cycles in Gσ are nonnegative, iff σ is winning in the energy game (with some initial credit). Hence mean-payoff games with threshold 0 are equivalent to energy games.
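
Once a memoryless strategy σ is fixed, Gσ is a plain weighted graph, so the cycle condition is checkable by Bellman-Ford negative-cycle detection (a sketch; the (src, weight, dst) edge encoding is an assumption):

```python
def has_negative_cycle(states, edges):
    """Bellman-Ford negative-cycle detection on G_sigma."""
    dist = {q: 0 for q in states}  # 0 everywhere: detect cycles anywhere
    for _ in range(len(states)):
        updated = False
        for (u, w, v) in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                updated = True
        if not updated:
            return False           # fixpoint reached: no negative cycle
    return True                    # still relaxing after |S| rounds

# sigma ensures nonnegative mean-payoff iff G_sigma has no negative cycle.
```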

  41. Mean-payoff vs. Energy

  42. Mean-payoff parity MDPs. Find a strategy that ensures, with probability 1: the parity condition, and mean-payoff value ≥ ν. The gadget reduction does not work: there are MDPs in which player 1 wins almost-surely, yet player 1 loses the game obtained from the gadget.

  43.–44. Algorithm for mean-payoff parity. An end-component is good if its least priority is even and its expected mean-payoff value is ≥ ν. End-component analysis: almost-surely, all states of an end-component can be visited infinitely often (it is strongly connected), and the expected mean-payoff value is the same in all of its states. Almost-sure reachability to good end-components is in PTIME (see the outline below).
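
Assembled, the algorithm has this shape (an outline only; the three subroutines are passed in as callables and stand for standard polynomial-time constructions that the slides do not spell out: end-component enumeration, expected mean-payoff values in MDPs, and almost-sure reachability):

```python
def almost_sure_mean_payoff_parity(mdp, priority, weight, nu,
                                   end_components,
                                   mean_payoff_value,
                                   almost_sure_reach):
    """Decide almost-sure mean-payoff parity via good end-components.

    end_components(mdp)               -> iterable of state sets
    mean_payoff_value(mdp, weight, U) -> expected value inside U
    almost_sure_reach(mdp, target)    -> set of almost-sure winning states
    (hypothetical interfaces, assumed for this sketch)
    """
    good = set()
    for U in end_components(mdp):
        even = min(priority[q] for q in U) % 2 == 0
        value_ok = mean_payoff_value(mdp, weight, U) >= nu
        if even and value_ok:
            good |= set(U)         # states of some good end-component
    # Almost-sure winning = reaching (and staying in) a good
    # end-component with probability 1.
    return almost_sure_reach(mdp, good)
```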

  45.–46. MDPs.
• Qualitative: parity MDPs.
• Quantitative: energy MDPs, mean-payoff MDPs.
• Mixed qualitative-quantitative: energy parity MDPs, mean-payoff parity MDPs.

  47.–48. Summary.
• Algorithmic complexity: deciding almost-sure winning is in NP ∩ coNP for energy parity MDPs, and in PTIME for mean-payoff parity MDPs.
• Strategy complexity: finite memory suffices for energy parity MDPs, while almost-sure winning strategies for mean-payoff parity MDPs may require infinite memory.

  49. The end. Thank you! Questions?

  50. The end
