Concurrent Reachability Games
Peter Bro Miltersen, Aarhus University
CTW 2009
My apologies…
• For not getting slides ready in time for inclusion in the booklet!
• Slides available at http://www.daimi.au.dk/~bromille
Concurrent reachability games
• Class of two-player zero-sum games generalizing simple stochastic games (Uri’s talk yesterday).
• Studied mainly by the formal methods (“Eurotheory”) community (but sometimes at such venues as FOCS and SODA).
• Very interesting and challenging algorithmic problems!
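Slide stolen from Uri…
Simple Stochastic Games (SSGs): reachability version [Condon (1992)]
[Figure: example game graph with MAX, min, and RAND (R) vertices, arcs of probability 1/2 out of the RAND vertex, a MAX-sink, and a min-sink; also labelled ZP’96.]
• Two players: MAX and min.
• Objective: MAX maximizes and min minimizes the probability of getting to the MAX-sink.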
Another slide stolen from Uri…
Simple Stochastic Games (SSGs): Strategies
• A general strategy may be randomized and history dependent.
• A positional strategy is deterministic and history independent.
• Positional strategy for MAX: choice of an outgoing edge from each MAX vertex.
Last slide stolen from Uri (I promise!)
Simple Stochastic Games (SSGs): Values
• Every vertex i in the game has a value v_i.
• [Equation lost in extraction; its annotations contrast general and positional strategies.]
• Both players have positional optimal strategies.
• There are strategies that are optimal for every starting position.
Concurrent Reachability Games
[Figure: the simple stochastic game example (reachability version, Condon (1992)), repeated under the new title.]
(Simple) concurrent reachability game
• Arena:
  • Finite directed graph.
  • One Max sink (“goal”) node.
  • Each non-sink node is assigned a 2x2 matrix of outgoing arcs.
• Play:
  • A pebble moves from node to node as in a simple stochastic game.
  • In each step, Max chooses a row and Min simultaneously chooses a column of the matrix.
  • The pebble moves along the appropriate arc.
  • If Max reaches the goal node, he wins.
  • If this never happens, Min wins.
Simulation
[Figures: gadgets simulating a MAX vertex, a min vertex, and a RAND (R) vertex with two arcs of probability 1/2.]
• For the RAND vertex: somewhat more subtle that this works!
“Proof” of correctness
• We want values in the CRG to be the same as in the SSG.
• In particular, the value of the node simulating a coin toss should be the average of the values of the two nodes it points to.
• If these two values are the same, this is “clearly” the case.
• If they have different values v1, v2, the simulated coin toss node is a game of Matching Pennies with payoffs v1, v2. This game has value (v1+v2)/2.
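The last bullet can be checked numerically. The following is a minimal sketch, assuming the simulated coin toss node is the 2x2 matrix game [[v1, v2], [v2, v1]] with Max choosing the row; the function names `matrix_game_value` and `coin_toss_value` are hypothetical and not from the talk.

```python
def matrix_game_value(a, b, c, d):
    """Value (for the row player, who maximizes) of the zero-sum game [[a, b], [c, d]]."""
    lower = max(min(a, b), min(c, d))   # best payoff Max can secure with a pure row
    upper = min(max(a, c), max(b, d))   # best cap Min can enforce with a pure column
    if lower == upper:
        return lower                    # saddle point: pure strategies are optimal
    # No saddle point: both players mix; standard closed form for 2x2 games.
    return (a * d - b * c) / (a + d - b - c)


def coin_toss_value(v1, v2):
    """Matching Pennies with payoffs v1, v2 (assumed matrix layout [[v1, v2], [v2, v1]])."""
    return matrix_game_value(v1, v2, v2, v1)


if __name__ == "__main__":
    for v1, v2 in [(0.0, 1.0), (0.2, 0.8), (0.3, 0.3)]:
        print(v1, v2, coin_toss_value(v1, v2), (v1 + v2) / 2)
```

For each pair, the two printed numbers should agree up to floating-point rounding, matching the claimed value (v1+v2)/2.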
Values: Concurrent Reachability Games (CRGs)
• Every vertex i in the game has a value v_i.
• [Equation lost in extraction; its annotations: sup and inf, taken over general and stationary strategies.]
• For every ε > 0, both players have stationary ε-optimal strategies.
• There are such strategies that are ε-optimal for every starting position.
• Stationary: as positional, except that we allow randomization.
Why randomized strategies?
[Figure: gadget with a MAX-sink and a min-sink.]
• 0-1 matrix games can be immediately simulated.
Why sup/inf instead of max/min?
[Figure: game graph with a MAX-sink and a min-sink.]
Why sup/inf instead of max/min?
• “Conditionally repeated matching pennies”:
  • Min hides a penny.
  • Max tries to guess if it is heads up or tails up.
  • If Max guesses correctly, he gets the penny.
  • If Max incorrectly guesses tails, he loses (goes into the min-sink/trap).
  • If Max incorrectly guesses heads, the game repeats.
• What is the value of this game? 1.
Almost optimal strategy for Max
• Guess “heads” with probability 1-ε and “tails” with probability ε (every time).
• Guaranteed to win with probability 1-ε.
• But no strategy of Max wins with probability 1.
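The 1-ε guarantee can be sanity-checked with a small calculation. This is a sketch under an added assumption: Min is restricted to stationary strategies, hiding heads with probability q in every round. The names `win_probability` and `guarantee` and the grid of q values are illustrative choices, not from the talk.

```python
def win_probability(eps, q):
    """Chance that Max eventually wins conditionally repeated matching pennies.

    Max guesses heads with probability 1 - eps each round (the strategy above);
    Min hides heads with probability q each round (assumed stationary).
    """
    win = q * (1 - eps) + (1 - q) * eps   # Max guesses correctly this round
    lose = q * eps                        # Max wrongly guesses tails: trapped
    stop = win + lose                     # otherwise the round repeats
    if stop == 0:
        return 0.0                        # the game repeats forever, so Min wins
    return win / stop


def guarantee(eps, steps=100):
    """Worst case for Max over a grid of stationary Min strategies q = k/steps."""
    return min(win_probability(eps, k / steps) for k in range(steps + 1))


if __name__ == "__main__":
    for eps in [0.5, 0.1, 0.01, 0.0]:
        print(eps, guarantee(eps))
```

On this grid the worst case is Min always hiding heads (q = 1), which gives 1-ε. With ε = 0, Max always guesses heads, Min always hides tails, and the game repeats forever, so Max cannot push the guarantee to 1 this way.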
Values and near-optimal strategies
• Each position in a concurrent reachability game has a value.
• For any ε>0, each player has a stationary strategy guaranteeing the value within ε (an ε-optimal strategy).
• Shown in Everett, “Recursive games”, 1953.
Algorithmic problems
• Qualitatively solving a CRG.
  • Determining which nodes have value 1.
• Quantitatively solving a CRG.
  • Approximately computing the values of the nodes.
• Strategically solving a CRG.
  • Computing an ε-optimal stationary strategy for a given ε.
Qualitatively solving CRGs
• De Alfaro, Henzinger, Kupferman, FOCS 1998.
• Beautiful algorithm!
• A formal methods community type of algorithm!
• Fixed point computation inside a fixed point computation inside a fixed point computation…
• Runs in time O(n²).
• Open (I think): Can this time bound be improved? (For SSGs the corresponding time is linear.)
Quantitatively solving CRGs
• We want to approximate the values of the positions.
• Why not compute them exactly?
The value of a CRG may be irrational!
[Figure: example game from Ferguson, Game Theory.]
• Positive payoffs different from 1 can be simulated with scaling and coin toss gadgets.
• Negative payoffs are harder to simulate, but in this game we can do it by adding a constant to all payoffs.
Quantitatively solving CRGs
• We want to approximate the values of the positions.
• Why not compute them exactly?
• Maybe we want to look at the decision problem consisting of comparing the value to a given rational?
SUM-OF-SQRT hardness
• SUM-OF-SQRT: Given an expression E which is a weighted (by integers) sum of square roots (of integers), does E evaluate to a positive number?
• Not known to be in P or NP or even the polynomial hierarchy (open at least since Garey and Johnson).
• Etessami and Yannakakis, 2005: Comparing the value of a CRG to a rational number is hard for SUM-OF-SQRT.
Sketch of Proof
• We already saw how to make games whose values are the solutions to certain quadratic equations, i.e., square roots + rationals.
• Once we have a bunch of such games, we can easily make a game whose value is the average by a “coin toss gadget”.
Quantitatively solving CRGs
• We want to approximate the values of the positions.
• Why not compute them exactly?
• Maybe we want to compare the value to a given rational?
• Given ε, we want to compute an approximation within ε.
Value iteration
• Assign all nodes “value approximation” 0.
• Replace pointers with value approximations. Each node is now a matrix game.
• Solve each matrix game, replace the approximations with the resulting values, and repeat.
• Theorem: Value approximations converge to values (from below).
• Proof sketch: The value approximations are the exact values of a time-limited version of the game.
• How long does it take to get within 0.01 of the actual values?
  • Even for SSGs this takes exponential time (Condon’93).
  • For CRGs, an open problem until recently (see later).
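The steps above can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the arena encoding (a dict mapping each non-sink node to a 2x2 matrix of successor names), the node names `play` and `trap`, and the function names are all assumptions. The example arena is conditionally repeated matching pennies, with rows as Max's guess (heads, tails) and columns as Min's hidden side (heads, tails).

```python
def matrix_game_value(a, b, c, d):
    """Value of the zero-sum game [[a, b], [c, d]] for the maximizing row player."""
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:
        return lower                              # pure saddle point
    return (a * d - b * c) / (a + d - b - c)      # mixed optimum for 2x2 games


def value_iteration(arena, goal, rounds):
    """Run `rounds` synchronous updates, starting from approximation 0 everywhere."""
    values = {node: 0.0 for node in arena}
    values[goal] = 1.0                            # reaching the goal is worth 1
    for _ in range(rounds):
        new = {goal: 1.0}
        for node, ((s00, s01), (s10, s11)) in arena.items():
            # Replace pointers by current approximations and solve the matrix game.
            new[node] = matrix_game_value(
                values[s00], values[s01], values[s10], values[s11])
        values = new
    return values


# Conditionally repeated matching pennies (hypothetical encoding).
ARENA = {
    "play": (("goal", "play"), ("trap", "goal")),
    "trap": (("trap", "trap"), ("trap", "trap")),  # min-sink: never leaves
}

if __name__ == "__main__":
    for t in [1, 2, 3, 10, 99]:
        print(t, value_iteration(ARENA, "goal", t)["play"])
```

In this example the approximation after t rounds works out to t/(t+1): it rises toward the value 1 from below and needs 99 rounds to get within 0.01.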
Another algorithm for approximating values
• The property of being a number larger or smaller than the value of a CRG can be expressed by a polynomial-length formula in the existential first-order theory of the reals.
  • “There exists a stationary strategy such that…”
• As a corollary to Renegar’89, approximating the value is in PSPACE.
• This is the best known “complexity class” upper bound!
• … and also the best known concrete “big-O” complexity bound (using Basu et al. instead of Renegar).
Why no NP ∩ coNP upper bound?
• Guess a strategy and verify that it works?
• Chatterjee, Majumdar, Jurdzinski, On Nash equilibria in stochastic games, CSL’04 claims such a result.
• In 2007, Kousha Etessami found a technical issue in the proof and the authors retracted the claim.
Computing values vs. finding strategies
[Figure: game graph with a MAX-sink.]
• It is not obvious that computing the values gives any information about the strategies.
• In contrast, for SSGs, optimal strategies can be computed from values in linear time (Andersson and M., ISAAC’09).
Algorithms strategically solving concurrent reachability games
• Chatterjee, de Alfaro, Henzinger. Strategy improvement for concurrent reachability games. QEST’06.
• Chatterjee, de Alfaro, Henzinger. Termination criteria for solving concurrent safety and reachability games. SODA’09.
• Policy improvement! No time bounds given…
“Hardness” of solving CRGs
Theorem [Hansen, Koucky and M., LICS’09]:
• Any algorithm that manipulates ε-optimal strategies of concurrent reachability games must use exponential space (so no NP ∩ coNP algorithm comes from guessing strategies).
• Value iteration requires worst-case doubly exponential time to come within non-trivial distance of actual values (in contrast, value iteration on SSGs converges in only exponential time).
Dante in Purgatory
[Figure: the terraces of Purgatory, numbered 1 to 7.]
• Purgatory has 7 terraces.
• Dante enters Purgatory at terrace 1.
• While in Purgatory, once a second, Dante must play Matching Pennies with Lucifer.
• If Dante wins, he proceeds to the next terrace.
• If Dante wins Matching Pennies at terrace 7, he wins the game of Purgatory.