
Some Final Thoughts



Presentation Transcript


  1. Some Final Thoughts Abhijit Gosavi

  2. From MDPs to SMDPs • The semi-Markov decision process (SMDP) is a more general model in which the transition time is also a random variable. • The MDP Bellman equations can be extended to SMDPs to accommodate time.
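The slide above can be sketched numerically. This is a minimal, illustrative value-iteration example for a discounted SMDP, assuming a hypothetical two-state, two-action model with exponentially distributed sojourn times and lump-sum rewards; all the numbers (`p`, `r`, `mu`, `gamma`) are made up for illustration. The key change from an MDP is that the fixed discount factor is replaced by the expected discount over the random transition time, E[exp(-gamma*tau)] = mu/(mu + gamma) for an exponential sojourn with rate mu.

```python
import numpy as np

# Hypothetical 2-state, 2-action SMDP (numbers invented for illustration):
# p[s, a, s'] = transition probabilities, r[s, a] = lump-sum reward,
# mu[s, a] = rate of the exponential sojourn (transition) time.
p  = np.array([[[0.7, 0.3], [0.4, 0.6]],
               [[0.2, 0.8], [0.5, 0.5]]])
r  = np.array([[6.0, 4.0], [3.0, 5.0]])
mu = np.array([[2.0, 1.0], [1.5, 3.0]])
gamma = 0.1   # continuous-time discount rate

# For an exponential sojourn time tau with rate mu, E[exp(-gamma*tau)]
# = mu / (mu + gamma); this replaces the constant discount factor of an MDP.
disc = mu / (mu + gamma)

V = np.zeros(2)
for _ in range(1000):          # value iteration on the SMDP Bellman equation
    Q = r + disc * (p @ V)     # Q[s, a]
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        break
    V = V_new

policy = Q.argmax(axis=1)      # greedy policy at convergence
```

A usage note: because `disc < 1` elementwise, the update is still a contraction, so the iteration converges just as in the plain-MDP case.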

  3. SMDPs (contd.) • In the average reward case, we would be interested in maximizing the average reward per unit time. • For the discounted reward case, we need to discount according to the time spent in each transition. • The Q-Learning algorithm for discounted reward has a direct extension. • For average reward, we have a family of algorithms called R-SMART (see book for references).
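The "direct extension" of Q-Learning to the discounted SMDP case can be sketched as follows. This is a toy simulation-based example, not the book's implementation: the simulator `step`, the exploration scheme, and every constant here are assumptions made for illustration. The only substantive change from MDP Q-Learning is that each update discounts by exp(-gamma*tau), where tau is the sampled sojourn time of that particular transition.

```python
import math
import random

random.seed(0)
gamma = 0.1        # continuous-time discount rate
alpha = 0.1        # learning rate
Q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}

def step(s, a):
    """Hypothetical SMDP simulator: returns (reward, sojourn time, next state)."""
    tau = random.expovariate(2.0)          # random transition time
    s2 = random.choice([0, 1])
    return 5.0, tau, s2

s = 0
for _ in range(5000):
    a = random.choice([0, 1])              # purely exploring behavior policy
    rwd, tau, s2 = step(s, a)
    # Discounted-SMDP Q-Learning: the discount exp(-gamma*tau) depends on
    # the (random) time tau spent in this transition.
    target = rwd + math.exp(-gamma * tau) * max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    s = s2
```

For the average-reward case, R-SMART instead tracks an estimate of the average reward per unit time and subtracts it from each update; see the book's references for the exact algorithm.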

  4. Policy Iteration • Another method to solve the MDP: an alternative to value iteration • Slightly more involved mathematically • Sometimes more efficient than value iteration • Its Reinforcement Learning counterpart is called Approximate Policy Iteration
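A minimal sketch of policy iteration on a small discounted MDP, assuming a hypothetical two-state, two-action model (all numbers invented). Each pass solves the policy-evaluation linear system exactly, then improves the policy greedily; the loop stops when the policy is stable, which is what makes the method "slightly more involved" but often faster than value iteration.

```python
import numpy as np

# Hypothetical 2-state, 2-action discounted MDP (numbers for illustration).
p = np.array([[[0.7, 0.3], [0.4, 0.6]],
              [[0.2, 0.8], [0.5, 0.5]]])
r = np.array([[6.0, 4.0], [3.0, 5.0]])
lam = 0.9                                  # discount factor

policy = np.zeros(2, dtype=int)
while True:
    # Policy evaluation: solve V = r_pi + lam * P_pi @ V exactly.
    P_pi = p[np.arange(2), policy]         # (2, 2) transition matrix under pi
    r_pi = r[np.arange(2), policy]
    V = np.linalg.solve(np.eye(2) - lam * P_pi, r_pi)
    # Policy improvement: act greedily with respect to V.
    Q = r + lam * (p @ V)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break                              # stable policy => optimal
    policy = new_policy
```

Each evaluation step here is an exact linear solve rather than repeated backups, which is the source of both the extra mathematics and the frequent speed advantage over value iteration.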

  5. Other Applications • Supply Chain Problems • Disaster Response Management • Production Planning in Remanufacturing Systems • Continuous event systems (LQG control)

  6. What you’ve learned (hopefully!) • Markov chains and how they can be employed to model systems • Markov decision processes: the idea of optimizing systems (controls) driven by Markov chains • Some concepts from Artificial Intelligence • Some (hopefully) cool applications of Reinforcement Learning • Some coding (for those who were not averse to doing it) • Systems thinking • Coding iterative algorithms • Some discrete-event simulation • HOPE YOU’VE ENJOYED THE CLASS!

  7. HAPPY HOLIDAYS!!!
