Hybrid Markov Decision Processes. Quality of ε-HALP Approximation. Achieving δ-Infeasibility. Factored ε-HALP formulation. Linear Value Function Approximation. Efficient Solution for Factored ε-HALP. Near Feasibility Implies Near Optimality. Irrigation Network Example.
Solving Factored MDPs with Continuous and Discrete Variables
Milos Hauskrecht, Department of Computer Science
Carlos Guestrin, Intel Research, Berkeley
Branislav Kveton, Intelligent Systems Program

Introduction

Hybrid Markov Decision Processes
• Many real-world stochastic planning problems have continuous and discrete variables and are naturally formulated as hybrid MDPs (HMDPs)
• There are few methods for solving hybrid MDPs

Hybrid MDPs are Complex to Solve
• Traditional solution techniques are affected by the curse of dimensionality
• Discrete-state MDPs
  • State and action spaces grow exponentially with the number of variables
• Continuous-state MDPs
  • State and action spaces are infinitely large
  • Often, no closed-form representation for the value function exists
  • Naïve discretization often leads to exponential complexity

Factored Hybrid MDPs
• A multiagent factored hybrid MDP (HMDP) is a 4-tuple (X, A, P, R):
  • X is a vector of state variables (discrete or continuous)
  • A is a vector of action variables (discrete or continuous)
  • Continuous variables are restricted to [0, 1]
  • P is a transition model represented by a DBN
  • R is a reward function, a sum of local rewards
[Figure: DBN with action nodes A1, A2, A3, state nodes X1, X2, X3, next-state nodes X'1, X'2, X'3, and reward nodes R1, R2]

Representing Conditional Probabilities
• Use parametric representation
• Discrete child with discrete parents:
  • Use tabular, decision trees, noisy-or, etc.
• Discrete child with continuous and discrete parents:
  • Use discriminant functions dj(Par(X'i)) ≥ 0
• Continuous child with continuous and discrete parents:
  • Mixture of Beta distributions: p(X'i | Par(X'i)) = Σ Beta(X'i | hi1(Par(X'i)), hi2(Par(X'i)))
  • hi1(Par(X'i)) > 0 and hi2(Par(X'i)) > 0 define moments

Optimal Policy and Value Function
• Value function of an optimal policy satisfies the Bellman-Hamilton-Jacobi fixed point equation
• Value function V(x) is difficult to compute and represent
• Closed-form solution of the value function may not exist due to the recursive integral definition
• Approximate solutions

Approximate LP for HMDPs

Linear Value Function Approximation
• Value function is represented as a linear combination of k basis functions: V(x) = Σi wi fi(x)
• Basis functions fi(x) depend on continuous and discrete variables
• Optimization is performed over weights w

HALP Formulation
• Hybrid approximate LP (HALP) formulation: minimize over w the objective Σi wi αi, subject to Σi wi Fi(x, a) − R(x, a) ≥ 0 for all x and a
• where
  • αi is the state relevance weight
  • Fi(x, a) is the difference between basis function fi(x) and its discounted backprojection
• HALP formulation contains an infinite number of constraints, one for each state x and action a

Quality of HALP Approximation
• Proposition 1: Let w̃ be an optimal solution of the HALP. Then, for any Lyapunov function L(x): [bound not preserved]
• Analogous to the de Farias and Van Roy 2001 result for approximate LP for discrete MDPs

Representational & Computational Challenges
• Constraints require representation of backprojections, functions of continuous and discrete variables
• HALP requires the solution of a (linear) convex problem with an infinite number of constraints

Choice of Representation
• Continuous basis functions defined as polynomials
• Basis function decomposition along continuous and discrete factors
• Closed-form representation of the objective function
• Mixture of betas transition model for continuous factors
• Decomposition of the constraints along continuous and discrete functions and closed-form representation

Factored ε-HALP Algorithm

Factored ε-HALP formulation
• Discretization of continuous state and action variables to (1/ε + 1) equally spaced values
• Total number of grid points per factor is exponential only in the dimension of the factor
• Number of constraints is finite, although exponential in the number of variables

Efficient Solution for Factored ε-HALP
• Discretize continuous state and action variables
• Identify subsets of variables Xi and Ai (Xj and Aj) that the functions Fi(x, a) (Rj(x, a)) depend on
• Compute Fi(xi, ai) (and Rj(xj, aj)) for all possible configurations of Xi and Ai (Xj and Aj)
• Calculate state relevance weights αi
• Use ALP algorithm for factored discrete-valued variables to find the vector of optimal weights w (Guestrin et al. ’01)

Near Feasibility Implies Near Optimality
• Solution of ε-HALP likely violates constraints in the HALP
• Proposition 2: Let w̃ be an optimal solution of the HALP and ŵ be an optimal solution of the ε-HALP, such that solution ŵ is δ-infeasible. Then: [bound not preserved]

Achieving δ-Infeasibility
• Appropriate choice of ε-grid to achieve δ-infeasibility
• Lipschitz modulus of the discretized functions
• (xG, aG) is the closest ε-grid point to the state-action pair (x, a)
• Worst-case Lipschitz constant over functions wiFi(x, a) and Rj(x, a)
• Number of factors

Quality of ε-HALP Approximation
• Theorem 1: Let ŵ be an optimal solution of the ε-HALP satisfying the δ-infeasibility condition. Then, for any Lyapunov function L(x): [bound not preserved]

Experimental Results

Irrigation Network Example
• Irrigation network is a network of irrigation channels connected by regulation devices
• Transition functions represent water flows between channels given actions at regulation devices
• Objective is the operation of valves to maintain optimal water levels
• Reward function characterizes preferred water levels
[Figure: large irrigation network with n-ring and n-ring-of-rings topologies; each irrigation channel is represented by a continuous variable, and each inflow or outflow regulation device by a discrete action node]

Experimental Results
• Continuous formulation of the irrigation network problem cannot be solved exactly by any MDP solver
• Evaluation of solution quality (mean and standard deviation) and running time (in seconds) [results table not preserved]
• The quality of the ε-HALP solution beats alternative approximate optimization techniques on the large irrigation network example
• Time complexity grows polynomially with network topology size n
• Solution quality improves with higher grid resolution
• Time complexity grows polynomially with higher grid resolution 1/ε

Conclusions
• HALP provides an effective formulation for solving hybrid MDPs
  • Including bounds on the quality of the solution
• Factored hybrid MDPs allow for closed-form representation of HALP constraints
  • Number of constraints remains infinite
• Exploit factorization for efficient discretization, ε-HALP
  • Provide bounds on the effect of discretization
  • Lipschitz constant grows linearly in the number of variables
• Using factored LP decomposition to solve ε-HALP
  • For fixed tree-width, running time is polynomial in the number of variables and in the discretization level 1/ε

Summary of Factored ε-HALP Algorithm
• Discretize continuous variables using a regular ε-spaced grid
• Formulate a linear program with constraints restricted only to grid points
• Solve the LP using an ALP algorithm for factored discrete MDPs
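
The poster refers to a Bellman fixed point equation with a "recursive integral definition" and to the "discounted backprojection" of a basis function, but neither formula survives in this transcript. The block below is a sketch of the standard forms these usually take for a hybrid state space, not a copy of the poster's equations. The discount factor \gamma and the split of the next state x' into discrete components x'_D and continuous components x'_C are notation assumed here.

```latex
% Fixed point equation for the optimal value function (assumed standard form)
V^*(\mathbf{x}) = \sup_{\mathbf{a}} \Big[ R(\mathbf{x}, \mathbf{a})
  + \gamma \sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C}
    p(\mathbf{x}' \mid \mathbf{x}, \mathbf{a}) \, V^*(\mathbf{x}') \, d\mathbf{x}'_C \Big]

% Basis function minus its discounted backprojection (assumed standard form)
F_i(\mathbf{x}, \mathbf{a}) = f_i(\mathbf{x})
  - \gamma \sum_{\mathbf{x}'_D} \int_{\mathbf{x}'_C}
    p(\mathbf{x}' \mid \mathbf{x}, \mathbf{a}) \, f_i(\mathbf{x}') \, d\mathbf{x}'_C
```

The integral over the continuous components is what makes a closed-form value function hard to obtain in general, and it is the term that the polynomial basis functions and Beta transition densities are chosen to make tractable.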
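
The discretization step can be illustrated with a short Python sketch. It assumes that 1/ε is an integer and that every continuous variable lies in [0, 1], as stated for the HMDP; the function names are hypothetical and chosen only for this illustration.

```python
from itertools import product


def epsilon_grid(eps):
    """Return the (1/eps + 1) equally spaced values covering [0, 1]."""
    n = round(1 / eps)  # number of intervals; assumes 1/eps is an integer
    return [i / n for i in range(n + 1)]


def factor_grid(eps, dim):
    """All grid points of one factor with `dim` continuous variables."""
    return list(product(epsilon_grid(eps), repeat=dim))


def closest_grid_point(x, eps):
    """Snap each coordinate of x in [0, 1]^d to its nearest grid value."""
    n = round(1 / eps)
    return tuple(round(xi * n) / n for xi in x)


if __name__ == "__main__":
    eps = 0.25
    print(epsilon_grid(eps))                    # five values from 0.0 to 1.0
    print(len(factor_grid(eps, 2)))             # (1/eps + 1)**2 grid points
    print(closest_grid_point((0.3, 0.9), eps))  # nearest grid point
```

The point count of a factor is (1/ε + 1) raised to the number of variables in that factor, which is why the cost is exponential only in the factor dimension rather than in the total number of variables.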
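
To make the grid-restricted constraints concrete, here is a minimal Python sketch for a toy problem with one continuous state variable and one binary action. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the discount factor, a single Beta density in place of a mixture, the parameter function `beta_params`, the reward, and the monomial basis f_i(x) = x^i. It uses the fact that a Beta(a, b) variable has moments E[X^k] = Π_{r<k} (a + r) / (a + b + r), so the backprojection of a polynomial basis function has a closed form.

```python
GAMMA = 0.95  # discount factor (assumed value)


def beta_moment(k, a, b):
    """E[X^k] for X ~ Beta(a, b), as a product of k ratios."""
    m = 1.0
    for r in range(k):
        m *= (a + r) / (a + b + r)
    return m


def beta_params(x, action):
    """Hypothetical h1, h2 > 0: the action shifts the expected next state."""
    shift = 0.2 if action == 1 else -0.2
    target = min(max(x + shift, 0.05), 0.95)
    strength = 20.0
    return strength * target, strength * (1.0 - target)


def F(i, x, action):
    """f_i(x) minus its discounted backprojection, for f_i(x) = x**i."""
    h1, h2 = beta_params(x, action)
    return x ** i - GAMMA * beta_moment(i, h1, h2)


def reward(x, action):
    """Hypothetical reward that prefers states near 0.5."""
    return 1.0 - abs(x - 0.5)


def violated_constraints(w, eps):
    """List grid points where sum_i w_i F_i(x, a) - R(x, a) is negative."""
    n = round(1 / eps)  # assumes 1/eps is an integer
    bad = []
    for j in range(n + 1):
        x = j / n
        for action in (0, 1):
            lhs = sum(w_i * F(i, x, action) for i, w_i in enumerate(w))
            slack = lhs - reward(x, action)
            if slack < 0:
                bad.append((x, action, slack))
    return bad


if __name__ == "__main__":
    print(len(violated_constraints((25.0, 0.0), 0.25)))  # feasible on the grid
    print(len(violated_constraints((0.0, 0.0), 0.25)))   # violated everywhere
```

A weight vector that passes this check is feasible only at the grid points. Constraints between grid points may still be violated, which is the gap that the δ-infeasibility analysis is meant to bound.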