Some Final Thoughts Abhijit Gosavi
From MDPs to SMDPs • The semi-Markov decision process (SMDP) is a more general model in which the transition time is also a random variable. • The MDP Bellman equations can be extended to SMDPs to accommodate time (see the sketch below).
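As one concrete illustration of how time enters the equations, here is a common form of the average-reward SMDP Bellman optimality equation. The notation is an assumption for this sketch (it is not copied from the slides): h(i) is the value (bias) of state i, r(i,a) the expected immediate reward, t-bar(i,a) the expected transition time, and rho* the optimal average reward per unit time.

```latex
% Average-reward SMDP Bellman optimality equation (one common form).
% The transition time enters through the term \rho^* \bar{t}(i,a):
h(i) = \max_{a \in A(i)} \Big[\, r(i,a) - \rho^* \,\bar{t}(i,a)
        + \sum_{j \in S} p(i,j,a)\, h(j) \Big]
        \quad \text{for all } i \in S.
```

Setting t-bar(i,a) = 1 for every state-action pair recovers the MDP version, which is exactly the sense in which the SMDP generalizes the MDP.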
SMDPs (contd.) • In the average-reward case, the goal is to maximize the average reward per unit time. • In the discounted-reward case, rewards must be discounted in proportion to the time spent in each transition. • The Q-Learning algorithm for discounted reward has a direct extension (a sketch follows this slide). • For average reward, there is a family of algorithms called R-SMART (see the book for references).
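To make the discounted extension concrete, below is a minimal sketch of an SMDP Q-Learning update in which the discount depends on the sojourn time of each transition. This is an illustration, not the book's exact algorithm: the environment interface (reset(), step() returning the transition time tau) and the treatment of the reward as a lump sum are assumptions.

```python
import math
import random
from collections import defaultdict

# Minimal sketch of Q-Learning for a discounted-reward SMDP.
# Assumptions (illustrative, not from the source): `env` exposes
# reset() -> state and step(action) -> (reward, next_state, tau),
# where tau is the random transition time. gamma is a continuous-time
# discount *rate*, so a transition of duration tau is discounted by
# exp(-gamma * tau). The reward is treated as a lump sum earned at the
# start of the transition (a simplification of the general SMDP reward).

def smdp_q_learning(env, actions, gamma=0.1, alpha=0.1,
                    epsilon=0.1, num_steps=100_000):
    Q = defaultdict(float)          # Q[(state, action)] -> value estimate
    state = env.reset()
    for _ in range(num_steps):
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        reward, next_state, tau = env.step(action)
        # Time-dependent discounting: exp(-gamma * tau) replaces the
        # fixed per-step discount factor of the ordinary MDP update.
        discount = math.exp(-gamma * tau)
        best_next = max(Q[(next_state, a)] for a in actions)
        target = reward + discount * best_next
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state
    return Q
```

Note that the only change from standard Q-Learning is the transition-dependent discount factor, which is why the slide calls it a "direct extension."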
Policy Iteration • Another method for solving MDPs: an alternative to value iteration • Slightly more involved mathematically • Sometimes more efficient than value iteration • Its Reinforcement Learning counterpart is called Approximate Policy Iteration • (A sketch of the exact version follows this slide)
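Since policy iteration alternates between exactly evaluating the current policy and greedily improving it, a compact sketch for a finite discounted MDP may help. The data layout (P and R arrays) is an assumption chosen for illustration.

```python
import numpy as np

# Minimal sketch of policy iteration for a finite discounted MDP.
# Assumptions (illustrative): P[a][i][j] is the probability of moving
# from state i to state j under action a; R[a][i] is the expected
# immediate reward for taking action a in state i; gamma is in (0, 1).

def policy_iteration(P, R, gamma=0.9):
    num_actions, num_states, _ = P.shape
    policy = np.zeros(num_states, dtype=int)   # start with an arbitrary policy
    while True:
        # --- policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly ---
        P_pi = P[policy, np.arange(num_states), :]
        r_pi = R[policy, np.arange(num_states)]
        v = np.linalg.solve(np.eye(num_states) - gamma * P_pi, r_pi)
        # --- policy improvement: greedy one-step lookahead ---
        q = R + gamma * (P @ v)                 # q[a, i] = Q(i, a)
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):  # stable policy => optimal
            return policy, v
        policy = new_policy
```

The linear solve in the evaluation step is the "slightly more involved" part; in exchange, the number of improvement steps needed is typically small, which is why policy iteration is sometimes more efficient than value iteration.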
Other Applications • Supply Chain Problems • Disaster Response Management • Production Planning in Remanufacturing Systems • Continuous event systems (LQG control)
What you’ve learned (hopefully) • Markov chains and how they can be employed to model systems • Markov decision processes: the idea of optimizing systems (controls) driven by Markov chains • Some concepts from Artificial Intelligence • Some (hopefully) cool applications of Reinforcement Learning • Some coding (for those who were not averse to doing it) • Systems thinking • Coding iterative algorithms • Some discrete-event simulation • HOPE YOU’VE ENJOYED THE CLASS!