
Machine Learning Chapter 13. Reinforcement Learning


Presentation Transcript


  1. Machine Learning, Chapter 13: Reinforcement Learning. Tom M. Mitchell

  2. Control Learning
  Consider learning to choose actions, e.g.:
  • Robot learning to dock on a battery charger
  • Learning to choose actions to optimize factory output
  • Learning to play Backgammon
  Note several problem characteristics:
  • Delayed reward
  • Opportunity for active exploration
  • Possibility that the state is only partially observable
  • Possible need to learn multiple tasks with the same sensors/effectors

  3. One Example: TD-Gammon
  Learns to play Backgammon. Immediate reward:
  • +100 if win
  • -100 if lose
  • 0 for all other states
  Trained by playing 1.5 million games against itself; now approximately equal to the best human players.

  4. Reinforcement Learning Problem
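The body of this slide is a figure in the source deck (the agent/environment interaction loop) and is not in the transcript. As a hedged restatement of the goal it depicts, following Mitchell's formulation: the agent must learn to choose actions that maximize the discounted return

    r_0 + \gamma r_1 + \gamma^2 r_2 + \cdots, \qquad 0 \le \gamma < 1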

  5. Markov Decision Processes
  Assume:
  • a finite set of states S
  • a set of actions A
  • at each discrete time step, the agent observes state st ∈ S and chooses action at ∈ A
  • it then receives immediate reward rt
  • and the state changes to st+1
  Markov assumption: st+1 = δ(st, at) and rt = r(st, at)
  • i.e., rt and st+1 depend only on the current state and action
  • the functions δ and r may be nondeterministic
  • the functions δ and r are not necessarily known to the agent
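To make the definitions concrete, here is a minimal Python sketch of a deterministic MDP. The two-state battery-charger environment and the names delta and reward are hypothetical stand-ins for the slide's δ and r, which the agent generally cannot inspect directly.

    # Hypothetical two-state environment illustrating the MDP interface.
    S = ("low_battery", "charged")   # finite set of states S
    A = ("dock", "wander")           # set of actions A

    def delta(s, a):
        """Transition function: s_{t+1} depends only on (s_t, a_t)."""
        return "charged" if a == "dock" else "low_battery"

    def reward(s, a):
        """Immediate reward r_t = r(s_t, a_t)."""
        return 10 if (s == "low_battery" and a == "dock") else 0

    # One step of the agent/environment loop: observe s_t, choose a_t,
    # receive r_t, and move to s_{t+1}.
    s = "low_battery"
    a = "dock"
    r, s_next = reward(s, a), delta(s, a)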

  6. Agent's Learning Task
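This slide's body is an image in the source. As a sketch of the task as Mitchell states it: execute actions in the environment, observe the results, and learn an action policy \pi : S \to A that maximizes the expected discounted reward

    E\left[ r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots \right]

from any starting state, where 0 \le \gamma < 1 discounts future rewards.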

  7. Value Function
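This slide is also an image; the standard definition from Mitchell's text is: for a policy \pi executed from state s_t, define the cumulative discounted reward

    V^{\pi}(s_t) \equiv r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{i=0}^{\infty} \gamma^i r_{t+i}

and the optimal policy \pi^* \equiv \arg\max_{\pi} V^{\pi}(s) for all s, writing V^* for V^{\pi^*}.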

  8. What to Learn
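The image here, in Mitchell's treatment, sets up the difficulty: knowing V^* would let the agent act optimally via

    \pi^*(s) = \arg\max_a \left[ r(s, a) + \gamma V^*(\delta(s, a)) \right]

but only if it already knows δ and r. When it does not, it needs a function of states and actions it can evaluate directly, which motivates Q.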

  9. Q Function
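As a sketch of the standard definition from the chapter:

    Q(s, a) \equiv r(s, a) + \gamma V^*(\delta(s, a))

so \pi^*(s) = \arg\max_a Q(s, a), and the agent can act optimally from Q alone, without knowing δ or r.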

  10. Training Rule to Learn Q
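Noting that V^*(s) = \max_{a'} Q(s, a') gives Q a recursive form, which yields the deterministic training rule for the learned estimate \hat{Q}:

    Q(s, a) = r(s, a) + \gamma \max_{a'} Q(\delta(s, a), a')
    \hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')

where s' is the state actually reached after taking action a in state s.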

  11. Q Learning for Deterministic Worlds
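Below is a minimal runnable Python sketch of this algorithm. The corridor environment (six states, goal at state 5) and the purely random exploration strategy are hypothetical choices for illustration; the update line is the deterministic training rule above.

    import random
    from collections import defaultdict

    GAMMA = 0.9
    STATES = range(6)      # states 0..5; state 5 is the absorbing goal
    ACTIONS = (-1, +1)     # move left / move right along a corridor

    def delta(s, a):
        """Deterministic transition function (unknown to the learner)."""
        return max(0, min(5, s + a))

    def reward(s, a):
        """Immediate reward: +100 on entering the goal state, else 0."""
        return 100 if delta(s, a) == 5 and s != 5 else 0

    def q_learning(episodes=500):
        q = defaultdict(float)                 # Q-hat, initialized to 0
        for _ in range(episodes):
            s = random.choice([st for st in STATES if st != 5])
            while s != 5:                      # run until absorbing state
                a = random.choice(ACTIONS)     # explore: random action
                r, s_next = reward(s, a), delta(s, a)
                # Training rule: Q(s,a) <- r + gamma * max_a' Q(s',a')
                q[(s, a)] = r + GAMMA * max(q[(s_next, a2)] for a2 in ACTIONS)
                s = s_next
        return q

    q = q_learning()
    policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
    print(policy)   # expected: move right (+1) from every non-goal state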

  12. Nondeterministic Case
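This slide's image, per Mitchell's text, redefines the target values as expectations when δ and r are nondeterministic:

    V^{\pi}(s) \equiv E\left[ \sum_{i=0}^{\infty} \gamma^i r_{t+i} \right]
    Q(s, a) \equiv E\left[ r(s, a) + \gamma V^*(\delta(s, a)) \right]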

  13. Nondeterministic Case (cont'd)
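As a sketch of the revised training rule: dampen each update with a decaying learning rate \alpha_n so the estimate converges despite noisy rewards and transitions,

    \hat{Q}_n(s, a) \leftarrow (1 - \alpha_n)\, \hat{Q}_{n-1}(s, a) + \alpha_n \left[ r + \gamma \max_{a'} \hat{Q}_{n-1}(s', a') \right], \qquad \alpha_n = \frac{1}{1 + \mathrm{visits}_n(s, a)}

where visits_n(s, a) counts how often the pair (s, a) has been visited up through iteration n.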

  14. Temporal Difference Learning
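As a sketch of the idea: rather than backing up the estimate one step, look ahead n steps of observed reward before bootstrapping on \hat{Q},

    Q^{(1)}(s_t, a_t) \equiv r_t + \gamma \max_a \hat{Q}(s_{t+1}, a)
    Q^{(n)}(s_t, a_t) \equiv r_t + \gamma r_{t+1} + \cdots + \gamma^{n-1} r_{t+n-1} + \gamma^n \max_a \hat{Q}(s_{t+n}, a)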

  15. Temporal Difference Learning (cont'd)
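Sutton's TD(\lambda) blends these lookahead distances with a single parameter:

    Q^{\lambda}(s_t, a_t) \equiv (1 - \lambda) \left[ Q^{(1)}(s_t, a_t) + \lambda\, Q^{(2)}(s_t, a_t) + \lambda^2 Q^{(3)}(s_t, a_t) + \cdots \right]

With \lambda = 0 this reduces to one-step Q-learning; TD-Gammon uses updates from this family.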

  16. Subtleties and Ongoing Research
