OR II GSLM 52800
Discounted Problem
• the value of $1 in period n+1 is only $α, 0 < α < 1, of its value in period n
• value determination for a fixed policy R: Vi = Ci(R) + α ∑j pij(R) Vj, i = 0, 1, …, M
• solvable: M+1 equations, M+1 unknowns
Evaluating the Expected Value of a Fixed Policy
• α = 0.9
• the optimal policy for long-term average cost: do nothing at states 0 and 1, overhaul at state 2, and replace at state 3
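A minimal sketch of this evaluation in Python: it solves the M+1 value-determination equations (I − αP)V = C for the fixed policy. The transition rows below are illustrative assumptions (overhaul is assumed to return the machine to state 1, replacement to state 0, and the state-0 cost is assumed to be 0); only α = 0.9, the policy, and the decision costs $1000/$3000/$4000/$6000 come from the example.

```python
import numpy as np

# Evaluate the expected total discounted cost of the fixed policy
# "do nothing at 0 and 1, overhaul at 2, replace at 3" by solving
# the linear system (I - alpha * P_R) V = C_R.

alpha = 0.9  # discount factor from the example

# P_R[i][j]: probability of moving from state i to state j under the
# fixed policy; these rows are illustrative assumptions.
P_R = np.array([
    [0.0, 7/8, 1/16, 1/16],  # state 0: do nothing (assumed row)
    [0.0, 3/4, 1/8,  1/8 ],  # state 1: do nothing (assumed row)
    [0.0, 1.0, 0.0,  0.0 ],  # state 2: overhaul -> state 1 (assumed)
    [1.0, 0.0, 0.0,  0.0 ],  # state 3: replace  -> state 0 (assumed)
])
# one-period expected costs under the policy; state 0 cost assumed 0
C_R = np.array([0.0, 1000.0, 3000.0, 6000.0])

# M+1 equations, M+1 unknowns: V = C_R + alpha * P_R @ V
V = np.linalg.solve(np.eye(4) - alpha * P_R, C_R)
print(V)  # expected total discounted cost from each starting state
```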
Policy Improvement
• improves a given policy step by step
• the procedure is similar to that of the MDP under the long-term average cost criterion
Policy Improvement
• 1 Value Determination: fix policy R; solve Vi(R) = Ci(R) + α ∑j pij(R) Vj(R), i = 0, 1, …, M
• 2 Policy Improvement: for each state i, find the decision k that minimizes Cik + α ∑j pij(k) Vj(R)
• 3 Form a new policy from the minimizing decisions in step 2; stop if this policy is the same as R, else go to step 1
Policy Improvement
• it can be proven that Vi(Rn+1) ≤ Vi(Rn) for all i and n
• the algorithm stops in a finite number of iterations
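The three-step loop is easy to state in code. The sketch below reuses the illustrative transition rows assumed earlier and the feasible decision sets of the example (only the decision costs come from the slides), so the numbers it produces are illustrative rather than the slides' own values.

```python
import numpy as np

# Policy improvement (policy iteration) for the discounted problem.
# Transition rows are illustrative assumptions; decision costs and
# feasible decision sets follow the example.

alpha, n_states = 0.9, 4
ROW_RP = [1.0, 0.0, 0.0, 0.0]  # replace -> state 0 (assumed)
actions = {  # state -> {decision: (one-period cost, transition row)}
    0: {"do nothing": (0.0,    [0.0, 7/8, 1/16, 1/16])},
    1: {"do nothing": (1000.0, [0.0, 3/4, 1/8,  1/8 ]),
        "replace":    (6000.0, ROW_RP)},
    2: {"do nothing": (3000.0, [0.0, 0.0, 1/2,  1/2 ]),
        "overhaul":   (4000.0, [0.0, 1.0, 0.0,  0.0 ]),
        "replace":    (6000.0, ROW_RP)},
    3: {"replace":    (6000.0, ROW_RP)},
}

policy = {i: next(iter(actions[i])) for i in range(n_states)}  # arbitrary start
while True:
    # 1 Value determination: solve V = C_R + alpha * P_R V for fixed policy R
    C = np.array([actions[i][policy[i]][0] for i in range(n_states)])
    P = np.array([actions[i][policy[i]][1] for i in range(n_states)])
    V = np.linalg.solve(np.eye(n_states) - alpha * P, C)
    # 2 Policy improvement: minimizing decision k of Cik + alpha sum_j pij(k) Vj
    new_policy = {
        i: min(acts, key=lambda k: acts[k][0] + alpha * np.dot(acts[k][1], V))
        for i, acts in actions.items()
    }
    # 3 Stop when the policy repeats, else iterate
    if new_policy == policy:
        break
    policy = new_policy

print(policy)  # converged policy
print(V)       # its expected total discounted costs
```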
Example
• Iteration 1: Policy Improvement
• nothing can be done at state 0, and the machine must be replaced at state 3
• possible decisions at
• state 1: decision 1 (do nothing, $1000) or decision 3 (replace, $6000)
• state 2: decision 1 (do nothing, $3000), decision 2 (overhaul, $4000), or decision 3 (replace, $6000)
Example
• Iteration 1: Policy Improvement
• [table: Cik + α ∑j pij(k) Vj(R) for every state i and feasible decision k, with the minimum marked]
Example
• resulting policy: do nothing at states 0 and 1, overhaul at state 2, and replace at state 3
• no change in policy, i.e., it is optimal
Linear Programming Approach
• yik = discounted expected time of being in state i and adopting decision k
• βj = initial probability of being in state j
• the expected total discounted cost depends on {βj}, though the minimizing policy does not
Linear Programming Approach
• choose βj such that βj > 0 and ∑j βj = 1
• solve: min ∑i ∑k Cik yik, subject to ∑k yjk − α ∑i ∑k pij(k) yik = βj for each state j, and yik ≥ 0
Linear Programming Approach
• take βj = 1/4 for every state j
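A sketch of this LP with scipy.optimize.linprog, again under the illustrative transition rows assumed earlier and βj = 1/4; a decision k belongs to the optimal policy for state i exactly when yik > 0 in the solution.

```python
import numpy as np
from scipy.optimize import linprog

alpha, n_states = 0.9, 4
ROW_RP = [1.0, 0.0, 0.0, 0.0]  # replace -> state 0 (assumed)
# feasible (state, decision, cost, transition row); rows are assumptions
pairs = [
    (0, "do nothing", 0.0,    [0.0, 7/8, 1/16, 1/16]),
    (1, "do nothing", 1000.0, [0.0, 3/4, 1/8,  1/8 ]),
    (1, "replace",    6000.0, ROW_RP),
    (2, "do nothing", 3000.0, [0.0, 0.0, 1/2,  1/2 ]),
    (2, "overhaul",   4000.0, [0.0, 1.0, 0.0,  0.0 ]),
    (3, "replace",    6000.0, ROW_RP),
]

c = [cost for _, _, cost, _ in pairs]     # minimize sum_i sum_k Cik yik
A_eq = np.zeros((n_states, len(pairs)))   # one constraint per state j
for col, (i, _, _, row) in enumerate(pairs):
    A_eq[i, col] += 1.0                   # the sum_k yjk term
    for j in range(n_states):
        A_eq[j, col] -= alpha * row[j]    # the -alpha sum pij(k) yik term
b_eq = np.full(n_states, 1 / 4)           # beta_j = 1/4 for every j

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(pairs))
for (i, name, _, _), y in zip(pairs, res.x):
    if y > 1e-9:                          # positive yik identifies the policy
        print(f"state {i}: {name} (yik = {y:.4f})")
```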
Successive Approximation
• compute Vi(n) = mink { Cik + α ∑j pij(k) Vj(n−1) }; the policy at iteration n is defined by the minimizing decision k at each state
• stop when the policy converges
Successive Approximation
• Iteration 1
• [table: Vi(1) and the minimizing decision at each state]
Successive Approximation
• Iteration 2
• policy: do nothing at states 0 and 1, overhaul at state 2, and replace at state 3; no change, i.e., optimal
Successive Approximation
• Iteration 3
• policy converged: do nothing at states 0 and 1, overhaul at state 2, and replace at state 3
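For completeness, a sketch of successive approximation under the same illustrative data, starting from the assumed initial values Vi(0) = 0 and stopping when the minimizing decisions repeat between iterations.

```python
import numpy as np

alpha, n_states = 0.9, 4
ROW_RP = [1.0, 0.0, 0.0, 0.0]  # replace -> state 0 (assumed)
actions = {  # state -> {decision: (cost, transition row)}; rows assumed
    0: {"do nothing": (0.0,    [0.0, 7/8, 1/16, 1/16])},
    1: {"do nothing": (1000.0, [0.0, 3/4, 1/8,  1/8 ]),
        "replace":    (6000.0, ROW_RP)},
    2: {"do nothing": (3000.0, [0.0, 0.0, 1/2,  1/2 ]),
        "overhaul":   (4000.0, [0.0, 1.0, 0.0,  0.0 ]),
        "replace":    (6000.0, ROW_RP)},
    3: {"replace":    (6000.0, ROW_RP)},
}

V = np.zeros(n_states)  # Vi(0) = 0 (assumed starting values)
policy = None
for n in range(1, 100):
    # Vi(n) = min_k { Cik + alpha * sum_j pij(k) Vj(n-1) }
    q = {i: {k: cost + alpha * np.dot(row, V) for k, (cost, row) in acts.items()}
         for i, acts in actions.items()}
    new_policy = {i: min(q[i], key=q[i].get) for i in range(n_states)}
    V = np.array([q[i][new_policy[i]] for i in range(n_states)])
    if new_policy == policy:  # stop when the policy converges
        print(f"policy converged at iteration {n}: {new_policy}")
        break
    policy = new_policy
```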