Math 419/519 Prof. Andrew Ross Markov Decision Processes
Highway Pavement Maintenance • Thanks to Pablo Durango-Cohen for this example, though I have made up the numbers. • Classify highway pavement condition as: • Good • Fair • Poor • Can do 4 kinds of repairs: • Expensive • Moderate • Cheap • Nothing
Timeline • April: check condition of road, decide on action. • Summer: repair road as decided. • Fall/Winter: road might deteriorate. • April: check condition again, etc.
Markov Assumptions • How we got to the current condition does not matter. • Future deterioration depends only on the present condition and action. • When choosing an action, we will look only at the present condition, not the past. • This is a policy decision, not a statement about road physics. We could change this policy, but it would make the problem bigger.
If we do Nothing • Road deteriorates according to this transition matrix: • Do the zeros make sense? • Does it make sense that the probabilities decrease from right to left?
If we do Cheap repairs • Road improves/deteriorates according to this transition matrix:
If we do Moderate repairs • Road improves/deteriorates according to this transition matrix:
If we do Expensive repairs • Road improves/deteriorates according to this transition matrix:
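The four transition matrices appeared as images on the original slides, so the actual entries are not recoverable here. Below is a minimal numerical sketch in Python with numbers invented purely for illustration (the lecture's own numbers are also made up); they are chosen to respect the sanity checks above: under Nothing the road never improves, and more expensive repairs shift probability toward Good.

```python
import numpy as np

STATES = ["Good", "Fair", "Poor"]
ACTIONS = ["Nothing", "Cheap", "Moderate", "Expensive"]

# Hypothetical transition matrices, one per action.
# Rows = condition this April, columns = condition next April.
P_by_action = {
    "Nothing": np.array([[0.7, 0.2, 0.1],    # Good: stay, or decay
                         [0.0, 0.6, 0.4],    # Fair: zero chance of improving --
                         [0.0, 0.0, 1.0]]),  # Poor: a road never fixes itself
    "Cheap": np.array([[0.8, 0.15, 0.05],
                       [0.3, 0.5,  0.2],
                       [0.0, 0.4,  0.6]]),
    "Moderate": np.array([[0.9, 0.08, 0.02],
                          [0.5, 0.4,  0.1],
                          [0.2, 0.5,  0.3]]),
    "Expensive": np.array([[0.95, 0.04, 0.01],
                           [0.80, 0.15, 0.05],
                           [0.70, 0.20, 0.10]]),
}

# Every row of every matrix must be a probability distribution.
for A in P_by_action.values():
    assert np.allclose(A.sum(axis=1), 1.0)
```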
Repair Policy • Natural to say: “If it's in Good condition, do Nothing. If it's in Fair condition, do ___. If it's in Poor condition, do ___.” • Rather than if/then, let's make a Policy Matrix:
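The policy matrix itself was an image on the slide; here is one hypothetical way to fill in the blanks, continuing the Python sketch above. Rows are conditions, columns are actions, and each entry is the probability of choosing that action in that condition. A plain if/then rule gives one-hot rows:

```python
# Hypothetical deterministic policy:
# Good -> Nothing, Fair -> Moderate, Poor -> Expensive.
# Columns follow the ACTIONS order above.
policy = np.array([
    [1.0, 0.0, 0.0, 0.0],  # Good: do Nothing
    [0.0, 0.0, 1.0, 0.0],  # Fair: Moderate repair
    [0.0, 0.0, 0.0, 1.0],  # Poor: Expensive repair
])
assert np.allclose(policy.sum(axis=1), 1.0)  # each row is a distribution
```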
Mixed Policies? • Maybe we can't afford to do Expensive repairs every time the road becomes Poor, and can only manage them 30% of the time? Etc.
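A mixed policy just relaxes the one-hot rows. Continuing the sketch, if we do the Expensive repair only 30% of the time a road turns Poor (a made-up split) and fall back to Moderate otherwise, only the Poor row changes:

```python
# Hypothetical mixed policy: when Poor, do Expensive 30% of the time
# and Moderate the other 70%.
mixed_policy = policy.copy()
mixed_policy[2] = [0.0, 0.0, 0.7, 0.3]  # Poor row: [Nothing, Cheap, Moderate, Expensive]
assert np.isclose(mixed_policy[2].sum(), 1.0)
```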
“The” transition matrix? • Changes when you change your policy matrix. • Pr(Good next | Fair now) =
Pr(Good next | Fair now, do Nothing) · Pr(Nothing | Fair)
+ Pr(Good next | Fair now, do Cheap) · Pr(Cheap | Fair)
+ Pr(Good next | Fair now, do Moderate) · Pr(Moderate | Fair)
+ Pr(Good next | Fair now, do Expensive) · Pr(Expensive | Fair)
• And that's just one of 9 entries in the 3×3 matrix!
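This is the law of total probability applied row by row: each state's row of the induced transition matrix is the policy-weighted average of that state's rows under the four actions. A sketch, continuing the code above:

```python
def induced_transition_matrix(policy, P_by_action):
    """Average the action matrices row by row, weighting each action
    by how often the policy chooses it in that state."""
    n = policy.shape[0]
    P = np.zeros((n, n))
    for a, action in enumerate(ACTIONS):
        # policy[:, [a]] is a column of weights; broadcasting scales
        # row i of this action's matrix by Pr(action | state i).
        P += policy[:, [a]] * P_by_action[action]
    return P

P = induced_transition_matrix(mixed_policy, P_by_action)
print(P[1, 0])  # Pr(Good next | Fair now) under this policy
```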
Overall Cost • Given a policy matrix, find the transition matrix. • Then find the steady-state distribution. • Then find how often we do each action. • Then account for the cost of each action. • Then change the policy matrix a little and try to find a cheaper overall cost. • See the book for the math notation.
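Continuing the sketch, here is that pipeline in code: steady-state distribution of the induced chain, long-run action frequencies, and average cost per year. The per-action repair costs are invented for illustration.

```python
# Hypothetical repair costs (arbitrary units per mile, made up).
repair_cost = {"Nothing": 0.0, "Cheap": 2.0, "Moderate": 5.0, "Expensive": 10.0}

def steady_state(P):
    """Solve pi @ P = pi with pi summing to 1 (least squares)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

pi = steady_state(P)             # long-run fraction of time in each condition
action_freq = pi @ mixed_policy  # long-run fraction of time doing each action
avg_cost = sum(action_freq[a] * repair_cost[act]
               for a, act in enumerate(ACTIONS))
print(pi, action_freq, avg_cost)
```

Rerunning this with a tweaked policy matrix and comparing avg_cost is exactly the "change the policy a little" search described above.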
Other Thoughts • Can find optimal policy through: • “Policy Iteration” • “Value Iteration” • Related to Dynamic Programming
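For a flavor of value iteration, here is a minimal discounted-cost sketch, reusing the repair costs above. To make the trade-off non-trivial it also assumes a hypothetical "condition cost" for each year the road spends in a given state; both cost tables are invented for illustration.

```python
# Hypothetical yearly cost of leaving the road in each condition.
condition_cost = {"Good": 0.0, "Fair": 3.0, "Poor": 8.0}

def value_iteration(P_by_action, gamma=0.9, tol=1e-8):
    """Find the policy minimizing expected discounted cost."""
    n = len(STATES)
    V = np.zeros(n)
    while True:
        # Q[i, a] = immediate cost + discounted expected future cost.
        Q = np.array([[condition_cost[STATES[i]] + repair_cost[a]
                       + gamma * P_by_action[a][i] @ V
                       for a in ACTIONS] for i in range(n)])
        V_new = Q.min(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, [ACTIONS[k] for k in Q.argmin(axis=1)]
        V = V_new

V, best = value_iteration(P_by_action)
print(dict(zip(STATES, best)))  # optimal action in each condition
```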
References • Wayne L. Winston, Introduction to Operations Research (textbook). • Ronald A. Howard, “Comments on the Origin and Application of Markov Decision Processes,” Operations Research, Vol. 50, No. 1, 2002.