Uri Zwick – Tel Aviv Univ.

Uri Zwick – Tel Aviv Univ. Randomized pivoting rules for the simplex algorithm: Lower bounds. MDS summer school “The Combinatorics of Linear and Semidefinite Programming”, August 14-16, 2012.

Deterministic pivoting rules: Largest improvement; Largest slope; Dantzig’s rule (largest modified cost); Bland’s rule (avoids cycling); Lexicographic rule (also avoids cycling). All are known to require an exponential number of steps in the worst case: Klee-Minty (1972), Jeroslow (1973), Avis-Chvátal (1978), Goldfarb-Sit (1979), … , Amenta-Ziegler (1996).

Klee-Minty cubes (1972) Taken from a paper by Gärtner-Henk-Ziegler
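For readers without the figure, one common way to write a Klee-Minty cube is the deformed-cube LP below. This formulation is an assumption about what the slide depicts, not a quotation from it; the symbol ε and the choice of objective are illustrative.

```latex
\begin{aligned}
\max\quad & x_n \\
\text{s.t.}\quad & 0 \le x_1 \le 1, \\
& \varepsilon\, x_{i-1} \le x_i \le 1 - \varepsilon\, x_{i-1}, \qquad i = 2, \dots, n,
\end{aligned}
\qquad 0 < \varepsilon < \tfrac{1}{2}.
```

The feasible region is a slightly squashed n-cube with 2n facets and 2^n vertices, which is what allows a badly guided simplex path to pass through exponentially many vertices.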

Randomized pivoting rules. Random-Edge: choose a random improving edge. Random-Facet: described in the previous lecture ☺ [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]. Random-Facet is sub-exponential! Are Random-Edge and Random-Facet polynomial ???
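The Random-Edge rule can be sketched on a cube whose vertices are bit strings. This is a minimal illustration, not the talk's construction: the function names and the linear objective `linear_value` are hypothetical, and vertices are encoded as integers whose bits are the coordinates.

```python
import random

def random_edge(n, value, start, rng):
    """Walk on the n-cube, moving along a uniformly random improving edge each step."""
    v, steps = start, 0
    while True:
        # Neighbours differ from v in exactly one bit; keep those with a larger value.
        improving = [v ^ (1 << i) for i in range(n)
                     if value(v ^ (1 << i)) > value(v)]
        if not improving:
            return v, steps  # v is a sink: no improving edge leaves it
        v = rng.choice(improving)
        steps += 1

def linear_value(v, weights=(1, 2, 4)):
    # Hypothetical objective: sum of the weights of the bits set in v.
    return sum(w for i, w in enumerate(weights) if v >> i & 1)

sink, steps = random_edge(3, linear_value, 0, random.Random(0))
```

With positive weights every improving edge sets one more bit, so from vertex 0 the walk always needs exactly n steps. The lower-bound constructions in the talk are interesting precisely because they are much less well behaved than this.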

Abstract objective functions (AOFs). Acyclic Unique Sink Orientations (AUSOs): every face should have a unique sink.

AUSOs of n-cubes: 2n facets, 2^n vertices. USOs and AUSOs: Stickney, Watson (1978); Morris (2001); Szabó, Welzl (2001); Gärtner (2002). The directed diameter is exactly n. Exercise: Prove it.
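The AUSO conditions can be checked by brute force on small cubes. The sketch below assumes an orientation is given by a hypothetical predicate `goes_out(v, i)`, true when the edge between `v` and `v ^ (1 << i)` is directed away from `v`; it enumerates all faces, so it is only meant for tiny n.

```python
def submasks(mask):
    """Yield every subset of the bit set `mask`, including mask itself and 0."""
    sub = mask
    while True:
        yield sub
        if sub == 0:
            return
        sub = (sub - 1) & mask

def is_uso(n, goes_out):
    """True if every face of the n-cube has exactly one sink."""
    full = (1 << n) - 1
    for mask in range(1 << n):              # free coordinates of the face
        for base in submasks(full ^ mask):  # values of the fixed coordinates
            sinks = sum(
                1 for sub in submasks(mask)
                if not any(goes_out(base | sub, i)
                           for i in range(n) if mask >> i & 1))
            if sinks != 1:
                return False
    return True

def is_acyclic(n, goes_out):
    """Topological-sort check that the oriented cube has no directed cycle."""
    indeg = {v: sum(goes_out(v ^ (1 << i), i) for i in range(n))
             for v in range(1 << n)}
    ready = [v for v in indeg if indeg[v] == 0]
    seen = 0
    while ready:
        v = ready.pop()
        seen += 1
        for i in range(n):
            if goes_out(v, i):
                w = v ^ (1 << i)
                indeg[w] -= 1
                if indeg[w] == 0:
                    ready.append(w)
    return seen == 1 << n

def uniform(v, i):
    # Every edge points towards the endpoint whose bit i is set.
    return not (v >> i & 1)

# A directed 4-cycle on the 2-cube: 00 -> 01 -> 11 -> 10 -> 00.
cyclic_out = {(0, 0), (1, 1), (3, 0), (2, 1)}

def cyclic(v, i):
    return (v, i) in cyclic_out
```

The uniform orientation passes both checks, while the directed 4-cycle has no sink on its 2-dimensional face and so fails both.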

AUSO results. Random-Facet is sub-exponential [Kalai (1992)] [Matoušek-Sharir-Welzl (1996)]. Sub-exponential lower bound for Random-Facet [Matoušek (1994)]. Sub-exponential lower bound for Random-Edge [Matoušek-Szabó (2006)]. Lower bounds do not correspond to actual linear programs. Can geometry help?

Random-Edge and Random-Facet are not polynomial for LPs. Consider LPs that correspond to Markov Decision Processes (MDPs). Simplex ↔ Policy iteration. Obtain sub-exponential lower bounds for the Random-Edge and Random-Facet variants of the Policy Iteration algorithm for MDPs.

Randomized Pivoting Rules. Lower bounds obtained for LPs whose diameter is n. [Kalai ’92] [Matoušek-Sharir-Welzl ’92] [Friedmann-Hansen-Z ’11]

3-bit counter

Turn-based 2-Player Stochastic Games [Shapley ’53] [Gillette ’57] … [Condon ’92]. Total reward version; Discounted version; Limiting average version. Both players have optimal positional strategies. Can optimal strategies be found in polynomial time?

Stopping condition. For the total reward version, assume: no matter what the players do, the game stops with probability 1. Exercise: Show that discounted games correspond directly to stopping total reward games.

Strategies / Policies. A deterministic strategy specifies which action to take given every possible history. A mixed strategy is a probability distribution over deterministic strategies. A memoryless strategy is a strategy that depends only on the current state. A positional strategy is a deterministic memoryless strategy.

Values (defined over general strategies and over positional strategies; the formulas did not survive extraction). Both players have positional optimal strategies. There are positional strategies that are optimal for every starting position.

Markov Decision Processes [Shapley ’53] [Bellman ’57] [Howard ’60] … Total reward version; Discounted version; Limiting average version. Optimal positional policies can be found using LP. Is there a strongly polynomial time algorithm?

Stochastic shortest paths (SSPs): minimize the expected cost of getting to the target.

Turn-based non-Stochastic Games [Ehrenfeucht-Mycielski (1979)]. Total reward version: easy. Limiting average version; Discounted version. Both players have optimal positional strategies. Still no polynomial time algorithms known!

• Turn-based Stochastic Games (SGs): long-term planning in a stochastic and adversarial environment; 2½ players.
• Non-Stochastic Games (MPGs): adversarial, non-stochastic; 2 players.
• Markov Decision Processes (MDPs): non-adversarial, stochastic; 1½ players.
• Deterministic MDPs (DMDPs): non-stochastic, non-adversarial; 1 player.

Parity Games (PGs). A simple example, with priorities 2, 3, 2, 1, 4, 1. EVEN wins if the largest priority seen infinitely often is even.

Parity Games (PGs) (figure labels: priorities 8 and 3; players ODD and EVEN). EVEN wins if the largest priority seen infinitely often is even. Equivalent to many interesting problems in automata and verification: non-emptiness of ω-tree automata; modal μ-calculus model checking.
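When both players use positional strategies, every vertex has a single chosen successor, so a play runs into a cycle and the winner is decided by the largest priority on that cycle. The sketch below illustrates this; the six-vertex graph and the `successor` map are hypothetical and only reuse the priorities 2, 3, 2, 1, 4, 1 for flavour.

```python
def winner(priority, successor, start):
    """Winner of the play from `start` when successor[v] is the move fixed at v."""
    seen_at = {}
    path = []
    v = start
    while v not in seen_at:
        seen_at[v] = len(path)
        path.append(v)
        v = successor[v]
    cycle = path[seen_at[v]:]  # the vertices that are visited infinitely often
    top = max(priority[u] for u in cycle)
    return "EVEN" if top % 2 == 0 else "ODD"

priority = {"a": 2, "b": 3, "c": 2, "d": 1, "e": 4, "f": 1}
successor = {"a": "b", "b": "c", "c": "b", "d": "e", "e": "d", "f": "a"}

from_a = winner(priority, successor, "a")  # ends in the cycle b, c
from_d = winner(priority, successor, "d")  # ends in the cycle d, e
```

Priorities seen only on the way into the cycle (here the 2 at vertex a) do not matter, because they are seen finitely often.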

Parity Games (PGs) → Mean Payoff Games (MPGs) [Stirling (1993)] [Puri (1995)] (figure labels: priorities 8 and 3; players ODD and EVEN). Replace priority k by payoff (−n)^k. Move payoffs to outgoing edges.
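The reduction can be tried out on a small example. The sketch assumes the payoff of a vertex with priority k is (−n)^k with n the number of vertices, and that EVEN is the player who wants a positive mean payoff; the graph and the helper names are hypothetical.

```python
def parity_to_payoffs(priority, edges):
    """Give every edge (u, v) the payoff (-n)**priority[u], n = number of vertices."""
    n = len(priority)
    return {(u, v): (-n) ** priority[u] for (u, v) in edges}

def cycle_sign(payoff, cycle):
    """Sign of the total (hence of the mean) payoff along a cycle of vertices."""
    total = sum(payoff[(u, v)] for u, v in zip(cycle, cycle[1:] + cycle[:1]))
    return (total > 0) - (total < 0)

priority = {"a": 2, "b": 3, "c": 2, "d": 1, "e": 4, "f": 1}
edges = [("a", "b"), ("b", "c"), ("c", "b"),
         ("d", "e"), ("e", "d"), ("f", "a")]
payoff = parity_to_payoffs(priority, edges)

odd_cycle = cycle_sign(payoff, ["b", "c"])   # largest priority 3
even_cycle = cycle_sign(payoff, ["d", "e"])  # largest priority 4
```

On a simple cycle the term of the largest priority p has magnitude n^p, while the at most n − 1 other vertices of lower priority contribute at most (n − 1)·n^(p−1) in total, so the sign of the mean payoff matches the parity of p.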

Let’s focus on MDPs

Evaluating a policy: MDP + policy ⇒ Markov chain. Values of a fixed policy can be found by solving a system of linear equations.
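For a total-reward MDP that satisfies the stopping condition, fixing a policy leaves a Markov chain with a reward vector r and a sub-stochastic transition matrix P, and the values solve (I − P)v = r. The sketch below solves that system exactly; the two-state chain is a made-up example and the function names are hypothetical.

```python
from fractions import Fraction

def solve(A, b):
    """Gauss-Jordan elimination over exact rationals; assumes A is non-singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # find a pivot row
        M[c], M[p] = M[p], M[c]
        M[c] = [x / M[c][c] for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n] for row in M]

def evaluate(P, r):
    """Values of the Markov chain induced by a fixed policy: solve (I - P) v = r."""
    n = len(r)
    A = [[(1 if i == j else 0) - P[i][j] for j in range(n)] for i in range(n)]
    return solve(A, r)

# State 0: reward 1, stays with prob. 1/4, moves to state 1 with prob. 1/2,
# stops with the remaining prob. 1/4. State 1: reward 2, then stops.
P = [[Fraction(1, 4), Fraction(1, 2)],
     [Fraction(0), Fraction(0)]]
r = [1, 2]
values = evaluate(P, r)
```

Here v1 = 2 and v0 = 1 + v0/4 + 1 gives v0 = 8/3. The stopping condition is what guarantees that I − P is invertible.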

Improving a policy (using a single switch)

Policy iteration for MDPs [Howard ’60]
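A minimal sketch of policy iteration in the switch-all style follows, assuming a reward-maximising total-reward MDP with the stopping condition. Actions are hypothetical triples `(state, reward, {next_state: probability})`, with missing probability mass meaning the process stops. Evaluation is done by repeated sweeps rather than by solving the linear system, purely to keep the example short.

```python
def evaluate(actions, policy, n_states, sweeps=2000):
    """Approximate the values of a fixed policy by repeated substitution."""
    v = [0.0] * n_states
    for _ in range(sweeps):
        v = [appraisal(actions[policy[s]], v) for s in range(n_states)]
    return v

def appraisal(action, v):
    """One-step lookahead value of an action with respect to the values v."""
    _, reward, trans = action
    return reward + sum(p * v[t] for t, p in trans.items())

def policy_iteration(actions, policy, n_states, eps=1e-9):
    policy = dict(policy)
    rounds = 0
    while True:
        v = evaluate(actions, policy, n_states)
        switched = False
        for s in range(n_states):
            best = max((a for a in range(len(actions)) if actions[a][0] == s),
                       key=lambda a: appraisal(actions[a], v))
            # An improving switch: the action looks better than the current value.
            if appraisal(actions[best], v) > v[s] + eps:
                policy[s] = best
                switched = True
        if not switched:
            return policy, v, rounds
        rounds += 1

actions = [
    (0, 1.0, {}),         # a0: collect 1 and stop
    (0, 0.0, {1: 0.9}),   # a1: move to state 1 with prob. 0.9
    (1, 2.0, {}),         # a2: collect 2 and stop
    (1, 1.0, {0: 0.5}),   # a3: collect 1, move to state 0 with prob. 0.5
]
policy, v, rounds = policy_iteration(actions, {0: 0, 1: 2}, 2)
```

Starting from the policy that stops immediately in both states, one round switches state 0 to action a1, after which no improving switch remains.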

Dual LP formulation for MDPs

Dual LP formulation for MDPs. a is not an improving switch. Basic solution ↔ (positional) policy.

Primal LP formulation for MDPs. Vertex ↔ complement of a policy.

TB2SG ∈ NP ∩ co-NP. TB2SG ∈ P ???

Policy iteration variants

Random-Facet for MDPs • Choose a random action not in the current policy and ignore it. • Solve recursively without this action. • If the ignored action is not an improving switch with respect to the returned policy, we are done. • Otherwise, switch to the ignored action and solve recursively.
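The four steps above can be sketched as a short recursive function. The MDP encoding (triples `(state, reward, {next_state: probability})`, reward maximisation, evaluation by repeated sweeps) is a hypothetical choice for illustration. The slide does not say which action set the final recursive call uses; this sketch assumes the full set, with the ignored action available again.

```python
import random

def evaluate(actions, policy, n_states, sweeps=2000):
    """Approximate the values of a fixed policy by repeated substitution."""
    v = [0.0] * n_states
    for _ in range(sweeps):
        v = [actions[policy[s]][1]
             + sum(p * v[t] for t, p in actions[policy[s]][2].items())
             for s in range(n_states)]
    return v

def is_improving(actions, a, v, eps=1e-9):
    """True if switching to action a would raise the value of its state."""
    s, reward, trans = actions[a]
    return reward + sum(p * v[t] for t, p in trans.items()) > v[s] + eps

def random_facet(actions, allowed, policy, n_states, rng):
    """allowed: set of usable action indices; policy: state -> index in allowed."""
    unused = sorted(allowed - set(policy.values()))
    if not unused:
        return policy                      # no alternative actions remain
    a = rng.choice(unused)                 # ignore a random unused action
    best = random_facet(actions, allowed - {a}, policy, n_states, rng)
    v = evaluate(actions, best, n_states)
    if not is_improving(actions, a, v):
        return best                        # best is optimal even with a allowed
    switched = dict(best)
    switched[actions[a][0]] = a            # switch to the ignored action
    return random_facet(actions, allowed, switched, n_states, rng)

actions = [
    (0, 1.0, {}),         # a0: collect 1 and stop
    (0, 0.0, {1: 0.9}),   # a1: move to state 1 with prob. 0.9
    (1, 2.0, {}),         # a2: collect 2 and stop
    (1, 1.0, {0: 0.5}),   # a3: collect 1, move to state 0 with prob. 0.5
]
result = random_facet(actions, {0, 1, 2, 3}, {0: 0, 1: 2}, 2, random.Random(1))
```

Whatever random choices are made, the call returns the optimal policy. The randomness only affects how many recursive calls are needed, which is the quantity the lower bounds in this talk are about.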

Policy iteration for 2-player games • Keep a strategy of player 1 and an optimal counter-strategy of player 2. • Perform improving switches for player 1 and recompute an optimal counter-strategy for player 2. Exercise: Does it really work? Random-Facet yields a sub-exponential algorithm for turn-based 2-player stochastic games!

Lower bounds for Policy Iteration. Switch-All for Parity Games is exponential [Friedmann ’09]. Switch-All for MDPs is exponential [Fearnley ’10]. Random-Facet for Parity Games is sub-exponential [Friedmann-Hansen-Z ’11]. Random-Facet and Random-Edge for MDPs, and hence for LPs, are sub-exponential [FHZ ’11].

Lower bound for Random-Facet Implement a randomized counter

Lower bound for Random-Facet: implement a randomized counter. Lower bound for Random-Edge: implement a standard counter.

3-bit counter (−N)^15

3-bit counter 0 1 0

3-bit counter – Improving switches. Random-Edge can choose either one of these improving switches… 0 1 0

Cycle gadgets Cycles close one edge at a time Shorter cycles close faster

Cycle gadgets Cycles open “simultaneously”

3-bit counter 23 1 0 1 0

From b to b+1 in seven phases: (1) Bk-cycle closes; (2) Ck-cycle closes; (3) U-lane realigns; (4) Ai-cycles and Bi-cycles for i<k open; (5) Ak-cycle closes; (6) W-lane realigns; (7) Ci-cycles of 0-bits open.

3-bit counter 34 1 0 1

Size of cycles. Various cycles and lanes compete with each other. Some are trying to open while some are trying to close. We need to make sure that our candidates win! Length of all A-cycles = 8n. Length of all C-cycles = 22n. Length of Bi-cycles = 25i^2n. O(n^4) vertices for an n-bit counter. Can be improved using a more complicated construction and an improved analysis (work in progress).
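The vertex count can be sanity-checked with a short sum. This is a back-of-the-envelope reading and not taken from the slide: it assumes one A-, B- and C-cycle per bit i = 1, …, n, reads the stated lengths as 8n, 22n and 25 i² n, and counts only cycle vertices.

```latex
\sum_{i=1}^{n} \bigl( 8n + 22n + 25\, i^{2} n \bigr)
  = 30 n^{2} + 25 n \cdot \frac{n(n+1)(2n+1)}{6}
  = \Theta(n^{4}).
```

Under these assumptions the B-cycles dominate, which is consistent with the O(n^4) bound.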

Concluding remarks and open problems. The “game-theoretic” perspective helps us understand the behavior of randomized pivoting rules. Polynomial pivoting rule? Polynomial bound on the diameter? Strongly polynomial algorithms for MDPs? Polynomial algorithms for 2-player games?
