Introduction to Dynamic Programming in Decision Processes

This presentation covers an overview of Dynamic Programming (DP) in Sequential Decision Processes, explaining the benefits, difficulties, and approaches to overcoming challenges. Learn about stages, states, decisions, rewards, transformations, policies, and functional equations in DP modeling. Discover the Principle of Optimality, separation properties, and why DP is ideal for dynamic processes. Explore the art vs. science aspect of modeling and decision-making. Illustrated with an equipment replacement problem in the context of DP analysis.

Presentation Transcript


  1. An Overview of Dynamic Programming • COR@L Seminar Series • Joe Hartman, ISE • October 14, 2004

  2. Goals of this Talk • Overview of Dynamic Programming • Benefits of DP • Difficulties of DP • Art vs. Science • Curse of Dimensionality • Overcoming Difficulties • Approximation Methods

  3. Dynamic Programming • Introduced by Richard Bellman in the 1950s • DP has many applications, but is best known for solving Sequential Decision Processes • Equipment Replacement was one of the first applications.

  4. Sequential Decision Processes At each stage in a process, a decision is made given the state of the system. Based on the decision and state, a reward or cost is incurred and the system transforms to another state, where the process is repeated at the next stage. The goal is to find the optimal policy, which is the best decision for each state of the system

  5. Stages • Stages define when decisions are to be made. • These are defined such that decisions can be ordered. • Stages are generally discrete and numbered accordingly (1,2,3,…), however they may be continuous if decisions are made at arbitrary times

  6. States • A state is a value of the variables that describe the condition of the system under study • The state space is defined by all possible states that the system can reach • States may be single variables, vectors, or matrices • States may be discrete or continuous, although they are usually made discrete for analysis

  7. Decisions • For each given state, there is a set of possible decisions that can be made • Decisions are defined ONLY by the current state of the system at a given stage • A decision or decision variable is one of the choices available from the decision set defined by the state of the system

  8. Rewards and/or Costs • Generally, a reward or cost is incurred when a decision is made for a given state in a given stage • This reward is only based on the current state of the system and the decision

  9. Transformation • Once a decision has been made, the system transforms from an initial state to its final state according to a transformation function • The transformation function and decision define how states change from stage to stage • These transformations may be deterministic (known) or stochastic (random)

  10. Policies • A decision is made at each stage in the process • As a number of stages are evaluated, the decisions for each state in each stage comprise a policy • The set of all policies is the policy space

  11. Returns • A return function is defined for a given state and policy. • The return is what is obtained if the process starts at a given state and decisions associated with the policy are used at each state which the process progresses through. • The optimal policy achieves the optimal return (depends on min or max)

  12. Functional Equation • These terms are all defined in the functional equation, which is used to evaluate different policies (sets of decisions) [Equation figure not captured in transcript; its callouts label the stage, state, decision, decision set, reward, discount factor, and transformation function.]
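The equation on this slide did not survive transcription. As a hedged sketch only, a deterministic functional equation built from the labeled terms is commonly written as below. The symbols are assumptions chosen for illustration, not the speaker's notation: t is the stage, i the state, x the decision, X_t(i) the decision set, r_t(i, x) the reward, α the discount factor, and τ_t(i, x) the transformation function.

```latex
f_t(i) = \max_{x \in X_t(i)} \left\{ r_t(i, x) + \alpha \, f_{t+1}\bigl(\tau_t(i, x)\bigr) \right\}
```

For a cost-minimization problem, the max is replaced by a min.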

  13. Functional Equation • May be stochastic in that the resulting state is probabilistic. Note the recursion is backwards here. • S represents the set of possible outcomes, with probability p for each outcome
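The stochastic equation itself is missing from the transcript. One common way to write such a backward recursion is sketched below. All symbols are assumed for illustration: t is the stage, i the state, x the decision, X_t(i) the decision set, s an outcome in S with probability p_s, r_t the reward, α the discount factor, and τ_t the transformation function.

```latex
f_t(i) = \max_{x \in X_t(i)} \sum_{s \in S} p_s \left[ r_t(i, x, s) + \alpha \, f_{t+1}\bigl(\tau_t(i, x, s)\bigr) \right]
```

The value at stage t is an expectation over outcomes of the immediate reward plus the discounted value of the resulting state at stage t+1, which is why the recursion runs backwards from the final stage.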

  14. Principle of Optimality • Key (and intuitive) to Dynamic Programming: If we are in a given state, a necessary condition for optimality is that the remaining decisions must be chosen optimally with respect to that state.

  15. Principle of Optimality Requires: • Separability of the objective function • Allows for process to be analyzed in stages • State separation property • Decisions for a given stage are only dependent on the current state of the system (not the past) • Markov property

  16. Why Use DP? • Extremely general in its ability to model systems • Can tackle various “difficult” issues in optimization (e.g. non-linearity, integrality, infinite horizons) • Ideal for “dynamic” processes

  17. Why NOT Use DP? • Curse of dimensionality: each dimension in the state space generally leads to an explosion of possible states = exponential run times • There is no “software package” for solution • Modeling is often an art… not science

  18. Art vs. Science • Many means to an end…. Let’s look at an equipment replacement problem.

  19. Replacement Analysis • Let’s put this all in the context of replacement analysis. • Stage: Periods when keep/replace decisions are to be made. Generally years or quarters. • State: Information to describe the system. For simplest problem, all costs are defined by the age of the asset. Thus, age is the state variable • Decisions: Keep or replace the asset at each stage.

  20. Replacement Analysis • Reward and/or Costs: • Keep Decision: pay utilization cost • Replace Decision: receive salvage value, pay purchase and utilization cost • Transformation: • Keep Decision: asset ages one period from stage to stage • Replace Decision: asset is new upon purchase, so it is one period old at end of stage • Goal: Min costs or max returns over horizon

  21. Replacement Analysis • Let’s start easy, assume stationary costs. • Assume the following notation: • Age of asset: i • Purchase Cost: P • Utilization Cost: C(i) • Salvage Value: S(i) • Assume S and P occur at beginning of period and C occurs at end of period.

  22. Example • Many solution approaches to the problem -- even with DP! • Map out decision possibilities and analyze by solving the recursion backwards. • Define the initial state and solve forwards (with reaching)

  23. Decision Map [Figure: network of asset age (starting at age i) against stage (0, 1, 2, 3, …, T); arcs are labeled K (keep) or R (replace).]

  24. Example Decision Map [Figure: example network of asset age against stage (0, 1, 2, 3, 4, 5, …, T); arcs are labeled K (keep) or R (replace).]

  25. Functional Equation • Write functional equation: • Write a boundary condition for the final period (where we sell the asset): • Traditional approach: solve backwards.
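The functional equation and boundary condition on this slide were images and are missing from the transcript. The sketch below shows one way the backward recursion could look with age as the state. It rests on assumptions not confirmed by the source: `C[i]` is the utilization cost paid at the end of a period by an asset that finishes the period at age `i`, the asset is sold at the horizon (boundary value `-S[i]`), an asset at the largest listed age must be replaced, and the function name and sample data are hypothetical.

```python
from functools import lru_cache


def solve_by_age(P, C, S, T, start_age, alpha=1.0):
    """Backward recursion with asset age as the state (illustrative sketch).

    P: purchase cost, paid at the start of a period.
    C: C[i] is the utilization cost at the end of a period for an asset
       that ends the period at age i (C[0] is unused).
    S: S[i] is the salvage value of an asset of age i.
    T: number of decision periods; alpha: one-period discount factor.
    Returns (minimum cost, tuple of 'K'/'R' decisions).
    """
    max_age = len(C) - 1  # assumption: an asset at max_age must be replaced

    @lru_cache(maxsize=None)
    def f(t, i):
        if t == T:
            return -S[i], ()  # boundary: sell the asset at the horizon
        # Replace: receive S[i], pay P now, then run a new asset one period.
        nxt_cost, nxt_plan = f(t + 1, 1)
        best = (P - S[i] + alpha * (C[1] + nxt_cost), ("R",) + nxt_plan)
        # Keep: the asset ages one period and incurs its utilization cost.
        if i < max_age:
            nxt_cost, nxt_plan = f(t + 1, i + 1)
            keep = (alpha * (C[i + 1] + nxt_cost), ("K",) + nxt_plan)
            if keep[0] <= best[0]:
                best = keep
        return best

    return f(0, start_age)


# Hypothetical data: ages 1..3, two decision periods, asset starts at age 1.
cost, plan = solve_by_age(P=10, C=[0, 2, 3, 5], S=[10, 6, 4, 2], T=2, start_age=1)
print(cost, plan)
```

Memoizing on (stage, age) means each node of the decision map is evaluated once, which is the same work as filling the network backwards from the final period.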

  26. Functional Equation • Or the problem can be solved forwards, or with reaching. • Functional equation does not change: • Write a boundary condition for the initial period: • Benefit: don’t have to build network first.
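A minimal sketch of the forward ("reaching") solution, under assumptions that are not in the source: costs are undiscounted, `C[i]` is the utilization cost for an asset that ends a period at age `i`, the asset is sold at the horizon, and an asset at the largest listed age must be replaced. The function name and data are hypothetical. Only states actually reachable from the initial state are ever created, which illustrates the stated benefit of not building the network first.

```python
def solve_by_reaching(P, C, S, T, start_age):
    """Forward reaching with asset age as the state (illustrative sketch).

    reach[i] holds the cheapest known cost of arriving at age i at the
    current stage; each stage pushes those costs forward along the keep
    and replace arcs.
    """
    max_age = len(C) - 1  # assumption: an asset at max_age must be replaced
    inf = float("inf")
    reach = {start_age: 0.0}  # boundary condition for the initial period
    for _ in range(T):
        nxt = {}
        for age, cost in reach.items():
            # Replace arc: sell the old asset, buy new, use it one period.
            cand = cost + P - S[age] + C[1]
            if cand < nxt.get(1, inf):
                nxt[1] = cand
            # Keep arc: the asset ages one period.
            if age < max_age:
                cand = cost + C[age + 1]
                if cand < nxt.get(age + 1, inf):
                    nxt[age + 1] = cand
        reach = nxt
    # Sell whatever asset is held at the horizon.
    return min(cost - S[age] for age, cost in reach.items())


# Hypothetical data: ages 1..3, two decision periods, asset starts at age 1.
print(solve_by_reaching(P=10, C=[0, 2, 3, 5], S=[10, 6, 4, 2], T=2, start_age=1))
```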

  27. Art vs. Science • However, there are more approaches…

  28. Replacement Analysis II • A new approach which mimics that of lot-sizing: • Stage: Decision Period. • State: Decision Period. • Decisions: Number of periods to retain an asset.

  29. Example Decision Map [Figure: nodes for periods 1 to 5; arcs are labeled K1, K2, K3, and K4.]

  30. Functional Equation • Can be solved forwards or backwards. • Write a boundary condition for the final period:
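The equations for this formulation are missing from the transcript. The sketch below shows one way the period-as-state recursion could be written. The assumptions are mine, not the source's: a new asset is bought at period 0, costs are stationary and undiscounted, `C[k]` is the utilization cost in an asset's k-th period of service, and each asset is sold when it is retired. The names are hypothetical.

```python
def solve_by_period(P, C, S, T):
    """Backward recursion with the decision period as the state (sketch).

    f[t] is the minimum cost from period t to the horizon T when a new
    asset is purchased at t; the decision is how many periods n to keep it.
    Returns (minimum cost, list of retention lengths).
    """
    max_age = len(C) - 1

    def cycle_cost(n):
        # Buy, operate for n periods, then salvage at age n.
        return P + sum(C[1:n + 1]) - S[n]

    f = [0.0] * (T + 1)   # boundary condition: f[T] = 0
    keep = [0] * (T + 1)  # best retention length chosen at each period
    for t in range(T - 1, -1, -1):
        best = float("inf")
        for n in range(1, min(max_age, T - t) + 1):
            cand = cycle_cost(n) + f[t + n]
            if cand < best:
                best, keep[t] = cand, n
        f[t] = best

    # Recover the sequence of retention lengths by walking forward.
    plan, t = [], 0
    while t < T:
        plan.append(keep[t])
        t += keep[t]
    return f[0], plan


# Hypothetical data: service lives of 1..3 periods, horizon of 3 periods.
print(solve_by_period(P=10, C=[0, 2, 3, 5], S=[10, 6, 4, 2], T=3))
```

There is one state per period rather than one per (period, age) pair, which is the sense in which this formulation resembles lot-sizing.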

  31. Replacement Analysis III • A new approach which mimics that of solving integer knapsack problems: • Stage: One for each possible age of asset. • State: Number of years of accumulated service. • Decisions: Number of times an asset is utilized for a given length of time over the horizon. • Note: this is only valid for stationary costs.

  32. Example Decision Map [Figure: network of accumulated service; node labels include 0, i, 2i, 3i, …, T/i, i+j, i+2j, …, i+T/j.]

  33. Functional Equation • Can be solved forwards or backwards. • Where: • Write a boundary condition for the first period:
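The equations here are also missing from the transcript. As a hedged illustration of the knapsack-style formulation, the sketch below has one stage per possible service life j, uses accumulated years of service as the state, and decides how many times m an asset is kept for j periods. It assumes stationary, undiscounted costs, `C[k]` as the utilization cost in an asset's k-th period of service, and a new purchase at the start. The names are hypothetical.

```python
def solve_by_knapsack(P, C, S, T):
    """Knapsack-style recursion for stationary costs (illustrative sketch).

    After processing stage j, f[y] is the minimum cost of covering exactly
    y periods of service using only service lives of length 1..j.
    """
    max_age = len(C) - 1
    inf = float("inf")
    f = [0.0] + [inf] * T  # boundary: zero cost for zero accumulated service
    for j in range(1, max_age + 1):
        cost_j = P + sum(C[1:j + 1]) - S[j]  # buy, run j periods, salvage
        g = f[:]  # m = 0: service life j is not used at all
        for y in range(j, T + 1):
            m = 1
            while m * j <= y:
                cand = m * cost_j + f[y - m * j]
                if cand < g[y]:
                    g[y] = cand
                m += 1
        f = g
    return f[T]


# Hypothetical data: service lives of 1..3 periods, horizon of 3 periods.
print(solve_by_knapsack(P=10, C=[0, 2, 3, 5], S=[10, 6, 4, 2], T=3))
```

The order in which the service lives are used does not matter in this formulation, which is why it is valid only when costs are stationary.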

  34. Art vs. Science • Age as the state space: • Conceptually simple, easy to explain. • Period as the state space: • Computationally efficient • Can be generalized to non-stationary costs, multiple challengers easily • Length of service as the state space: • Easy to bound problem • Relates to infinite horizon solutions

  35. Curse of Dimensionality • To give an idea of state space explosion, consider a fleet management problem: • Assign trucks to loads • Loads must move from one location to another within some given time frame • The arrivals of loads are probabilistic • State space: number of trucks (of a given type) at each location in time.
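To put a number on the explosion, the sketch below counts the states for a single truck type at a single point in time, assuming trucks are interchangeable so that a state is just a count of trucks at each location. That simplification and the function name are assumptions for illustration; the count itself is the standard "stars and bars" formula.

```python
from math import comb


def fleet_state_count(trucks, locations):
    """Number of ways to place identical trucks across locations."""
    return comb(trucks + locations - 1, locations - 1)


print(fleet_state_count(10, 5))    # small instance
print(fleet_state_count(100, 20))  # moderately sized fleet
```

Ten trucks over five locations give 1001 states, while 100 trucks over 20 locations give more than 10^20, before multiple truck types or time periods are considered.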

  36. Approximation Methods • These can generally be categorized as follows: • Reduction in granularity • Interpolation • Policy Approximation • Bounding/Fathoming • Cost to Go Function Approximations • Unfortunately, art wins over science here too. Requires intimate knowledge of problem.

  37. Decision Network [Figure: network with states 1 to 5 at each stage (0, 1, 2, 3, 4, 5, …, T).]

  38. Adjusting Granularity • Simply reduce the number of possible states. Instead of evaluating 1,2,3,…,10, evaluate 1,5,10. • Advocate: Bellman [Figure: the decision network with only states 1, 3, and 5 retained at each stage.]

  39. Granularity continued… • Solve successively finer-granularity problems, each based on the previous solution • Advocates: Bean and Smith (Michigan), Bailey (Pittsburgh) [Figure: decision network with states 1 to 5 at each stage.]

  40. Interpolation • Solve for some of the states exactly and then interpolate solutions for “skipped” states • Advocates: Kitanidis (Stanford) [Figure: decision network with states 1 to 5 at each stage.]

  41. Interpolation • Solve for some of the states exactly and then interpolate solutions for “skipped” states • Advocates: Kitanidis (Stanford) [Figure: decision network with states 1 to 5 at each stage; callout: Solve Exactly.]

  42. Interpolation • Solve for some of the states exactly and then interpolate solutions for “skipped” states • Advocates: Kitanidis (Stanford) [Figure: decision network with states 1 to 5 at each stage; callouts: Solve Exactly, Interpolate.]

  43. Interpolation • Interpolations over the entire state space are often called spline methods. Neural networks are also used. • Advocates: Johnson (WPI), Bertsekas (MIT) [Figure: decision network with states 1 to 5 at each stage; callouts: Solve Exactly, Interpolate.]
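The interpolation idea can be sketched in a few lines. The example assumes a one-dimensional integer state and simple linear interpolation between the nearest exactly solved states. The helper name and the values are hypothetical, and the spline and neural-network variants mentioned above are not shown.

```python
from bisect import bisect_left


def interpolate_value(exact, state):
    """Estimate the value of a skipped state from exactly solved neighbors.

    exact: dict mapping solved states to their values.
    state: a state between the smallest and largest solved state.
    """
    if state in exact:
        return exact[state]
    knots = sorted(exact)
    if not knots[0] < state < knots[-1]:
        raise ValueError("state lies outside the range of solved states")
    hi = bisect_left(knots, state)  # index of first solved state above `state`
    lo_s, hi_s = knots[hi - 1], knots[hi]
    w = (state - lo_s) / (hi_s - lo_s)
    return (1 - w) * exact[lo_s] + w * exact[hi_s]


# Hypothetical values for states 1, 3, and 5 solved exactly at one stage.
solved = {1: 10.0, 3: 20.0, 5: 40.0}
print(interpolate_value(solved, 2), interpolate_value(solved, 4))
```

Linear interpolation is only as good as the shape of the true value function between the solved states, which is one reason the talk stresses that these methods require knowledge of the problem.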

  44. Policy Approximation • Reduce the number of possible decisions to evaluate • This merely reduces the number of arcs in the network • Advocates: Bellman

  45. Fathoming Paths • Like branch and bound: use an upper bound (to a minimization problem) to eliminate inferior decisions (paths) • Note: typical DP must be solved completely in order to find an upper bound to a problem • Most easily implemented in “forward” solution problems (not always possible) • Advocate: Marsten

  46. Approximating Cost to Go Functions • This is the hot topic in approximation methods • Highly problem specific • Idea: • Solving a DP determines the “cost-to-go” value for each state in the system -- value or cost to move from that state in a given stage to the final state in the final stage. • If I know this function a priori (or can approximate), then I don’t need to solve the entire DP

  47. Example: Fleet Management • For a given location… [Figure: plot of value against number of trucks.] • If I know this function for each location, then this problem is solved…

  48. How Approximate? • Helps to know what the function looks like (can find by plotting small instances) • Powell (Princeton): Simulate demand and solve the deterministic problem (as a network flow problem) • Repeat and take average of values of each state to approximate functions • Use dual variables from network solutions to build cost-to-go functions

  49. How Approximate? • Bertsimas (MIT) proposes the use of heuristics to approximate the value function • Specifically, when solving a multidimensional knapsack problem, the value function is approximated by adaptively rounding LP relaxations to the problem.

  50. Implementing Approximations • Can use to approximate the final period values and then solve “full” DP from there • Can use approximations for each state and just “read” solution from table (always approximating and updating approximations)