Some Final Thoughts Abhijit Gosavi
From MDPs to SMDPs • The semi-Markov decision process (SMDP) is a more general model in which the transition time is also a random variable. • The MDP Bellman equations can be extended to SMDPs to accommodate time (see the sketch below).
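As one concrete illustration of how time enters the equations, here is a common form of the average-reward SMDP Bellman optimality equation. The notation is an assumption for this sketch (it is not copied from the slides): h(i) is the value (bias) of state i, r(i,a) the expected immediate reward, t-bar(i,a) the expected transition time, and rho* the optimal average reward per unit time.

```latex
% Average-reward SMDP Bellman optimality equation (one common form).
% The transition time enters through the term \rho^* \bar{t}(i,a):
h(i) = \max_{a \in A(i)} \Big[\, r(i,a) - \rho^* \,\bar{t}(i,a)
        + \sum_{j \in S} p(i,j,a)\, h(j) \Big]
        \quad \text{for all } i \in S.
```

Setting t-bar(i,a) = 1 for every state-action pair recovers the MDP version, which is exactly the sense in which the SMDP generalizes the MDP.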
SMDPs (contd.) • In the average-reward case, the goal is to maximize the average reward per unit time. • In the discounted-reward case, rewards must be discounted in proportion to the time spent in each transition. • The Q-Learning algorithm for discounted reward has a direct extension (a sketch follows this slide). • For average reward, there is a family of algorithms called R-SMART (see the book for references).
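To make the discounted extension concrete, below is a minimal sketch of an SMDP Q-Learning update in which the discount depends on the sojourn time of each transition. This is an illustration, not the book's exact algorithm: the environment interface (reset(), step() returning the transition time tau) and the treatment of the reward as a lump sum are assumptions.

```python
import math
import random
from collections import defaultdict

# Minimal sketch of Q-Learning for a discounted-reward SMDP.
# Assumptions (illustrative, not from the source): `env` exposes
# reset() -> state and step(action) -> (reward, next_state, tau),
# where tau is the random transition time. gamma is a continuous-time
# discount *rate*, so a transition of duration tau is discounted by
# exp(-gamma * tau). The reward is treated as a lump sum earned at the
# start of the transition (a simplification of the general SMDP reward).

def smdp_q_learning(env, actions, gamma=0.1, alpha=0.1,
                    epsilon=0.1, num_steps=100_000):
    Q = defaultdict(float)          # Q[(state, action)] -> value estimate
    state = env.reset()
    for _ in range(num_steps):
        # epsilon-greedy action selection
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        reward, next_state, tau = env.step(action)
        # Time-dependent discounting: exp(-gamma * tau) replaces the
        # fixed per-step discount factor of the ordinary MDP update.
        discount = math.exp(-gamma * tau)
        best_next = max(Q[(next_state, a)] for a in actions)
        target = reward + discount * best_next
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        state = next_state
    return Q
```

Note that the only change from standard Q-Learning is the transition-dependent discount factor, which is why the slide calls it a "direct extension."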
Policy Iteration • Another method for solving MDPs: an alternative to value iteration • Slightly more involved mathematically • Sometimes more efficient than value iteration • Its Reinforcement Learning counterpart is called Approximate Policy Iteration • (A sketch of the exact version follows this slide)
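Since policy iteration alternates between exactly evaluating the current policy and greedily improving it, a compact sketch for a finite discounted MDP may help. The data layout (P and R arrays) is an assumption chosen for illustration.

```python
import numpy as np

# Minimal sketch of policy iteration for a finite discounted MDP.
# Assumptions (illustrative): P[a][i][j] is the probability of moving
# from state i to state j under action a; R[a][i] is the expected
# immediate reward for taking action a in state i; gamma is in (0, 1).

def policy_iteration(P, R, gamma=0.9):
    num_actions, num_states, _ = P.shape
    policy = np.zeros(num_states, dtype=int)   # start with an arbitrary policy
    while True:
        # --- policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly ---
        P_pi = P[policy, np.arange(num_states), :]
        r_pi = R[policy, np.arange(num_states)]
        v = np.linalg.solve(np.eye(num_states) - gamma * P_pi, r_pi)
        # --- policy improvement: greedy one-step lookahead ---
        q = R + gamma * (P @ v)                 # q[a, i] = Q(i, a)
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):  # stable policy => optimal
            return policy, v
        policy = new_policy
```

The linear solve in the evaluation step is the "slightly more involved" part; in exchange, the number of improvement steps needed is typically small, which is why policy iteration is sometimes more efficient than value iteration.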
Other Applications • Supply Chain Problems • Disaster Response Management • Production Planning in Remanufacturing Systems • Continuous event systems (LQG control)
What you’ve learned (hopefully) • Markov chains and how they can be employed to model systems • Markov decision processes: the idea of optimizing systems (controls) driven by Markov chains • Some concepts from Artificial Intelligence • Some (hopefully) cool applications of Reinforcement Learning • Some coding (for those who were not averse to doing it) • Systems thinking • Coding iterative algorithms • Some discrete-event simulation • HOPE YOU’VE ENJOYED THE CLASS!