An Overview of Dynamic Programming
COR@L Seminar Series
Joe Hartman, ISE
October 14, 2004
Goals of this Talk • Overview of Dynamic Programming • Benefits of DP • Difficulties of DP • Art vs. Science • Curse of Dimensionality • Overcoming Difficulties • Approximation Methods
Dynamic Programming • Introduced by Richard Bellman in the 1950s • DP has many applications, but is best known for solving Sequential Decision Processes • Equipment Replacement was one of the first applications.
Sequential Decision Processes At each stage in a process, a decision is made given the state of the system. Based on the decision and state, a reward or cost is incurred and the system transforms to another state, where the process is repeated at the next stage. The goal is to find the optimal policy, which is the best decision for each state of the system
Stages • Stages define when decisions are to be made. • These are defined such that decisions can be ordered. • Stages are generally discrete and numbered accordingly (1,2,3,…), however they may be continuous if decisions are made at arbitrary times
States • A state is a description of the condition of the system under study, given by the values of the variables that describe it • The state space is defined by all possible states the system can achieve • States may be single variables, vectors, or matrices • States may be discrete or continuous, although they are usually made discrete for analysis
Decisions • For each given state, there is a set of possible decisions that can be made • Decisions are defined ONLY by the current state of the system at a given stage • A decision or decision variable is one of the choices available from the decision set defined by the state of the system
Rewards and/or Costs • Generally, a reward or cost is incurred when a decision is made for a given state in a given stage • This reward is only based on the current state of the system and the decision
Transformation • Once a decision has been made, the system transforms from an initial state to its final state according to a transformation function • The transformation function and decision define how states change from stage to stage • These transformations may be deterministic (known) or stochastic (random)
Policies • A decision is made at each stage in the process • As a number of stages are evaluated, the decisions for each state in each stage comprise a policy • The set of all policies is the policy space
Returns • A return function is defined for a given state and policy. • The return is what is obtained if the process starts at a given state and the decisions associated with the policy are used at each state through which the process progresses. • The optimal policy achieves the optimal return (minimum or maximum, depending on the objective)
Functional Equation • These terms are all defined in the functional equation, which is used to evaluate different policies (sets of decisions):

f_t(i) = \min_{x \in X(i)} \{\, r(i, x) + \alpha\, f_{t+1}(T(i, x)) \,\}

where t is the stage, i the state, x the decision, X(i) the decision set, r(i, x) the reward, \alpha the discount factor, and T(i, x) the transformation function.
Functional Equation • May be stochastic in that the resulting state is probabilistic. Note the recursion is backwards here:

f_t(i) = \min_{x \in X(i)} \{\, r(i, x) + \alpha \sum_{s \in S} p_s\, f_{t+1}(T(i, x, s)) \,\}

where S represents the set of possible outcomes, with probability p_s for each outcome s.
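Once the state space is finite, the recursion can be computed directly. Below is a minimal sketch of a backward solution for a deterministic problem; the function and parameter names (decisions, reward, transform, terminal_value, alpha) are illustrative placeholders, not notation from the talk.

def solve_backward(T, states, decisions, reward, transform, alpha, terminal_value):
    # f[t][i] = best value attainable from state i at stage t (minimization)
    f = {T: {i: terminal_value(i) for i in states}}
    policy = {}
    for t in range(T - 1, -1, -1):
        f[t], policy[t] = {}, {}
        for i in states:
            best_val, best_x = float("inf"), None
            for x in decisions(i):
                # immediate cost plus discounted value of the resulting state
                val = reward(i, x) + alpha * f[t + 1][transform(i, x)]
                if val < best_val:
                    best_val, best_x = val, x
            f[t][i] = best_val
            policy[t][i] = best_x
    return f, policy

The stochastic version simply replaces the single successor value with the probability-weighted sum over outcomes.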
Principle of Optimality • Key (and intuitive) to Dynamic Programming: If we are in a given state, a necessary condition for optimality is that the remaining decisions must be chosen optimally with respect to that state.
Principle of Optimality Requires: • Separability of the objective function • Allows for process to be analyzed in stages • State separation property • Decisions for a given stage are only dependent on the current state of the system (not the past) • Markov property
Why Use DP? • Extremely general in its ability to model systems • Can tackle various “difficult” issues in optimization (e.g., non-linearity, integrality, infinite horizons) • Ideal for “dynamic” processes
Why NOT Use DP? • Curse of dimensionality: each added dimension in the state space generally leads to an explosion in the number of possible states, and hence exponential run times • There is no “software package” for solution • Modeling is often an art… not a science
Art vs. Science • Many means to an end…. Let’s look at an equipment replacement problem.
Replacement Analysis • Let’s put this all in the context of replacement analysis. • Stage: Periods when keep/replace decisions are to be made. Generally years or quarters. • State: Information to describe the system. For simplest problem, all costs are defined by the age of the asset. Thus, age is the state variable • Decisions: Keep or replace the asset at each stage.
Replacement Analysis • Reward and/or Costs: • Keep Decision: pay utilization cost • Replace Decision: receive salvage value, pay purchase and utilization cost • Transformation: • Keep Decision: asset ages one period from stage to stage • Replace Decision: asset is new upon purchase, so it is one period old at end of stage • Goal: Min costs or max returns over horizon
Replacement Analysis • Let’s start easy, assume stationary costs. • Assume the following notation: • Age of asset: i • Purchase Cost: P • Utilization Cost: C(i) • Salvage Value: S(i) • Assume S and P occur at beginning of period and C occurs at end of period.
Example • Many solution approaches to the problem -- even with DP! • Map out decision possibilities and analyze by solving the recursion backwards. • Define the initial state and solve forwards (with reaching)
Decision Map: [figure: network with asset age on the vertical axis (1, 2, 3, … and i, i+1, i+2, i+3) and stages 0 through T on the horizontal axis; Keep (K) arcs advance the age by one period, Replace (R) arcs return it to age 1]
Example Decision Map: [figure: the same network instantiated for a five-period example, stages 0 through 5, with Keep (K) and Replace (R) arcs between stages]
Functional Equation • Write the functional equation (one plausible form is sketched below): • Write a boundary condition for the final period (where we sell the asset): • Traditional approach: solve backwards.
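The equation images from the slide are not reproduced here. A plausible form of the backward recursion, using the notation defined earlier (age i, purchase cost P, utilization cost C(i), salvage value S(i), horizon T) and ignoring discounting, is:

f_t(i) = \min \{\; \text{Keep: } C(i) + f_{t+1}(i+1), \quad \text{Replace: } P - S(i) + C(0) + f_{t+1}(1) \;\}

with boundary condition f_T(i) = -S(i) (the asset is sold at the end of the horizon). The exact argument of C in the replace branch (written C(0) here, the cost of using a new asset for one period) depends on whether age is measured at the start or the end of the period; the talk's own equation may differ in such details.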
Functional Equation • Or the problem can be solved forwards, or with reaching. • The functional equation does not change (a reaching sketch is given below): • Write a boundary condition for the initial period: • Benefit: the network does not have to be built first.
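In the reaching form, values are pushed forward from the known initial state. A sketch, with g_t(i) denoting the least cost of arriving at age i in period t and i_0 the initial age (again my notation, not the talk's):

g_0(i_0) = 0

\text{Keep: } g_{t+1}(i+1) \leftarrow \min\{\, g_{t+1}(i+1),\; g_t(i) + C(i) \,\}

\text{Replace: } g_{t+1}(1) \leftarrow \min\{\, g_{t+1}(1),\; g_t(i) + P - S(i) + C(0) \,\}

applied to every state reached so far, with the salvage value of the final asset collected at the horizon. Only states actually reachable from i_0 are ever generated, which is the benefit noted above.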
Art vs. Science • However, there are more approaches…
Replacement Analysis II • A new approach which mimics that of lot-sizing: • Stage: Decision Period. • State: Decision Period. • Decisions: Number of periods to retain an asset.
Example Decision Map: [figure: nodes for decision periods 1 through 5, with arcs labeled K1 through K4 indicating how many periods an asset purchased in that period is kept before replacement]
Functional Equation • Can be solved forwards or backwards (one plausible form is sketched below). • Write a boundary condition for the final period:
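The slide's equation is not reproduced here. A plausible form, in the spirit of lot-sizing recursions, with f(t) the minimum cost of covering periods t through T and m(t, j) my shorthand (not the talk's notation) for the total cost of buying an asset in period t and retaining it for j periods:

f(t) = \min_{1 \le j \le T - t} \{\, m(t, j) + f(t + j) \,\}, \qquad f(T) = 0

With stationary costs and no discounting, m(t, j) = P + \sum_{k=0}^{j-1} C(k) - S(j), i.e. purchase cost plus utilization costs over the j periods of ownership, less the salvage value at age j.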
Replacement Analysis III • A new approach which mimics that of solving integer knapsack problems: • Stage: One for each possible age of asset. • State: Number of years of accumulated service. • Decisions: Number of times an asset is utilized for a given length of time over the horizon. • Note: this is only valid for stationary costs.
Example Decision Map: [figure: states giving accumulated years of service, built up in multiples of the chosen keep lengths until the horizon T is covered]
Functional Equation • Can be solved forwards or backwards (one plausible reading, including the per-cycle cost definition, is sketched below). • Write a boundary condition for the first period:
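Again the slide's own equation is not reproduced; one plausible reading, using c(j) as my shorthand for the cost of a single keep cycle of length j (valid only under stationary costs, as noted above):

c(j) = P + \sum_{k=0}^{j-1} C(k) - S(j)

f_j(s) = \min_{k \ge 0,\; k j \le s} \{\, k\, c(j) + f_{j-1}(s - k j) \,\}, \qquad f_0(0) = 0,\; f_0(s) = \infty \text{ for } s > 0

where the stage j indexes the possible cycle lengths, the state s is the accumulated years of service, and the decision k is how many times a cycle of length j is used; evaluating f at total service equal to the horizon gives the answer. This mirrors the integer-knapsack recursion the slide refers to, though the talk's exact formulation may differ.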
Art vs. Science • Age as the state space: • Conceptually simple, easy to explain. • Period as the state space: • Computationally efficient • Easily generalized to non-stationary costs and multiple challengers • Length of service as the state space: • Easy to bound the problem • Relates to infinite-horizon solutions
Curse of Dimensionality • To give an idea of state space explosion, consider a fleet management problem: • Assign trucks to loads • Loads must move from one destination to another within some given time frame • The arrivals of loads are probabilistic • State space: number of trucks (of a given type) at each location in time.
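For a rough sense of scale (the numbers here are chosen for illustration and are not from the talk): distributing N identical trucks of one type over L locations gives \binom{N + L - 1}{L - 1} possible states per time period. Even with only 10 trucks and 10 locations that is \binom{19}{9} = 92{,}378 states, and the count grows combinatorially with the fleet size, the number of locations, and the number of truck types.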
Approximation Methods • These can generally be categorized as follows: • Reduction in granularity • Interpolation • Policy Approximation • Bounding/Fathoming • Cost to Go Function Approximations • Unfortunately, art wins over science here too. Requires intimate knowledge of problem.
Decision Network: [figure: the full grid of states 1 through 5 at each of the stages 0 through 5, with time T on the horizontal axis]
Adjusting Granularity • Simply reduce the number of possible states. Instead of evaluating 1, 2, 3, …, 10, evaluate 1, 5, 10. • Advocate: Bellman • [figure: the state grid thinned to states 1, 3, and 5 at each stage]
Granularity continued… • Solve successively finer-granularity problems based on the previous solution • Advocates: Bean and Smith (Michigan), Bailey (Pittsburgh) • [figure: the grid refined back toward all states 1 through 5]
Interpolation • Solve for some of the states exactly and then interpolate solutions for “skipped” states • Advocates: Kitanidis (Stanford) • [figure: state grid with a subset of states solved exactly and the remaining states interpolated]
Interpolation • Interpolations over the entire state space are often called spline methods. Neural networks are also used. • Advocates: Johnson (WPI), Bertsekas (MIT)
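A minimal sketch of the interpolation idea in Python (purely illustrative: the states and values below are made up, and numpy's one-dimensional linear interpolation stands in for whatever spline or neural-network fit is actually used):

import numpy as np

# Value function computed exactly only at a coarse subset of states.
coarse_states = np.array([1.0, 5.0, 10.0])   # states solved exactly
coarse_values = np.array([12.0, 7.5, 3.0])   # illustrative cost-to-go values

# Interpolate the value function at the skipped states.
all_states = np.arange(1.0, 11.0)            # states 1, 2, ..., 10
approx_values = np.interp(all_states, coarse_states, coarse_values)

# approx_values can now stand in for the exact cost-to-go when
# evaluating decisions at any state, solved or skipped.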
Policy Approximation • Reduce the number of possible decisions to evaluate • This merely reduces the number of arcs in the network • Advocates: Bellman
Fathoming Paths • Like branch and bound: use an upper bound (to a minimization problem) to eliminate inferior decisions (paths) • Note: a typical DP must be solved completely in order to find an upper bound for the problem • Most easily implemented in “forward” solution problems (not always possible; a pruning sketch is given below) • Advocate: Martsen
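A sketch of how an incumbent upper bound can prune states during a forward (reaching) pass; every name here (decisions, cost, transform, lower_bound) is an illustrative placeholder, and lower_bound is any optimistic estimate of the cost to complete the process from a given state:

def forward_with_fathoming(initial_state, T, decisions, cost, transform,
                           upper_bound, lower_bound):
    # g[(t, state)] = best known cost of reaching `state` at stage t
    g = {(0, initial_state): 0.0}
    for t in range(T):
        for (stage, state), val in list(g.items()):
            if stage != t:
                continue
            # Fathom: if even an optimistic completion exceeds the
            # incumbent upper bound, this path cannot be optimal.
            if val + lower_bound(t, state) > upper_bound:
                continue
            for x in decisions(state):
                nxt = (t + 1, transform(state, x))
                new_val = val + cost(state, x)
                if new_val < g.get(nxt, float("inf")):
                    g[nxt] = new_val
    return g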
Approximating Cost to Go Functions • This is the hot topic in approximation methods • Highly problem specific • Idea: • Solving a DP determines the “cost-to-go” value for each state in the system -- the value or cost to move from that state in a given stage to the final state in the final stage. • If I know this function a priori (or can approximate it), then I don’t need to solve the entire DP (see the sketch below)
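If an approximate cost-to-go function is available, a decision can be read off for any state without solving the full DP. A minimal sketch, with all names illustrative and v_hat standing for the approximation:

def choose_decision(state, decisions, reward, transform, v_hat):
    # Pick the decision minimizing immediate cost plus the approximate
    # cost-to-go of the state that decision leads to.
    return min(decisions(state),
               key=lambda x: reward(state, x) + v_hat(transform(state, x)))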
Example: Fleet Management • For a given location… [figure: the cost-to-go value plotted as a function of the number of trucks at the location] • If I know this function for each location, then this problem is solved…
How Approximate? • Helps to know what the function looks like (can find by plotting small instances) • Powell (Princeton): Simulate demand and solve the deterministic problem (as a network flow problem) • Repeat and take average of values of each state to approximate functions • Use dual variables from network solutions to build cost-to-go functions
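A rough sketch of the simulate-and-average idea just described; sample_demand and solve_deterministic_flow are hypothetical stand-ins for the demand simulator and the deterministic network-flow solver, the latter assumed to return a value (or dual variable) for each state:

def approximate_cost_to_go(sample_demand, solve_deterministic_flow, states,
                           n_samples=100):
    # Accumulate, for each state, the values obtained from solving the
    # deterministic problem on sampled demand realizations.
    totals = {s: 0.0 for s in states}
    for _ in range(n_samples):
        demand = sample_demand()                   # one demand scenario
        values = solve_deterministic_flow(demand)  # hypothetical: state -> value/dual
        for s in states:
            totals[s] += values[s]
    # Average over samples to approximate the cost-to-go at each state.
    return {s: totals[s] / n_samples for s in totals}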
How Approximate? • Bertsimas (MIT) proposes the use of heuristics to approximate the value function • Specifically, when solving a multidimensional knapsack problem, the value function is approximated by adaptively rounding LP relaxations of the problem.
Implementing Approximations • Can use the approximation for the final-period values and then solve the “full” DP from there • Can use approximations for each state and just “read” the solution from the table (continually approximating and updating the approximations)