


  1. Tópicos Especiais em Aprendizagem Prof. Reinaldo Bianchi Centro Universitário da FEI 2012

  2. Objective of this Lecture This is a more informative lecture. • Reinforcement Learning: • Planning and Learning. • Relational Reinforcement Learning. • Using heuristics to accelerate RL. • Dimensions of RL - Conclusions. • Today's lecture: • Chapters 9 and 10 of Sutton & Barto. • The Relational RL paper (MLJ, 2001). • Bianchi's PhD thesis.

  3. Dimensions of Reinforcement Learning Chapter 10 of Sutton & Barto

  4. Objectives • Review the treatment of RL taken in this course • What have we left out? • What are the hot research areas?

  5. Three Common Ideas • Estimation of value functions • Backing up values along real or simulated trajectories • Generalized Policy Iteration: maintain an approximate optimal value function and approximate optimal policy, use each to improve the other
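The Generalized Policy Iteration idea above can be made concrete with a minimal sketch on a small, randomly generated tabular MDP (the names P, R, policy, V are hypothetical, not from the slides): evaluation makes the value function consistent with the current policy, improvement makes the policy greedy with respect to those values, and the two are repeated until neither changes.

import numpy as np

# Minimal GPI sketch on a small random tabular MDP (hypothetical P and R).
n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # R[s, a]

policy = np.zeros(n_states, dtype=int)
V = np.zeros(n_states)
states = np.arange(n_states)

for _ in range(100):
    # Policy evaluation: make V (approximately) consistent with the policy.
    for _ in range(50):
        V = R[states, policy] + gamma * (P[states, policy] * V).sum(axis=1)
    # Policy improvement: make the policy greedy with respect to V.
    greedy = (R + gamma * P @ V).argmax(axis=1)
    if np.array_equal(greedy, policy):
        break  # policy and value function now agree with each other
    policy = greedy

print("greedy policy:", policy)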

  6. Backup Dimensions

  7. Other Dimensions • Function approximation • tables • aggregation • other linear methods • many nonlinear methods • On-policy/Off-policy • On-policy: learn the value function of the policy being followed • Off-policy: try to learn the value function for the best policy, irrespective of what policy is being followed
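The on-policy/off-policy distinction shows up directly in the update rules. A sketch, assuming a tabular Q indexed as Q[s, a] (all names illustrative): SARSA backs up the value of the action the behaviour policy actually takes next, while Q-learning backs up the greedy action regardless of what is actually taken.

import numpy as np

Q = np.zeros((10, 4))  # hypothetical: 10 states x 4 actions

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: the target uses the action actually chosen in s_next
    # by the same policy that is being followed (and learned about).
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    # Off-policy: the target uses the greedy action in s_next,
    # irrespective of what the behaviour policy actually does there.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])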

  8. Still More Dimensions • Definition of return: episodic, continuing, discounted, etc. • Action values vs. state values vs. afterstate values • Action selection/exploration: ε-greedy, softmax, more sophisticated methods • Synchronous vs. asynchronous • Replacing vs. accumulating traces
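The two exploration rules named above, sketched for a single row of action values (function and variable names are illustrative, not from the slides):

import numpy as np

rng = np.random.default_rng()

def epsilon_greedy(q_row, epsilon=0.1):
    # With probability epsilon pick a random action, otherwise the greedy one.
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))

def softmax(q_row, temperature=1.0):
    # Boltzmann exploration: higher-valued actions are chosen more often,
    # with the temperature controlling how sharply they are preferred.
    prefs = np.asarray(q_row, dtype=float) / temperature
    prefs -= prefs.max()                       # for numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_row), p=probs))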

  9. Still More Dimensions • Real vs. simulated experience • Location of backups (search control) • Timing of backups: part of selecting actions or only afterward? • Memory for backups: how long should backed up values be retained?
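These dimensions (real vs. simulated experience, where and when to back up) are exactly what a Dyna-style agent makes explicit. A minimal sketch, assuming a deterministic environment and dictionary-based Q and model (all names hypothetical): the same backup is applied first to the real transition and then to transitions replayed from the learned model.

import random
from collections import defaultdict

Q = defaultdict(lambda: [0.0] * 4)    # hypothetical: 4 actions per state
model = {}                            # (s, a) -> (r, s_next), learned online

def dyna_q_step(s, a, r, s_next, alpha=0.1, gamma=0.9, n_planning=10):
    # 1. Direct RL: backup from the real transition.
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
    # 2. Model learning: remember what the environment did.
    model[(s, a)] = (r, s_next)
    # 3. Planning: the same backup applied to simulated transitions;
    #    search control here is plain random sampling of remembered pairs.
    for _ in range(n_planning):
        (ps, pa), (pr, pn) = random.choice(list(model.items()))
        Q[ps][pa] += alpha * (pr + gamma * max(Q[pn]) - Q[ps][pa])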

  10. Frontier Dimensions • Prove convergence for bootstrapping control methods. • Trajectory sampling • Non-Markov case: • Partially Observable MDPs (POMDPs) • Bayesian approach: belief states • construct state from sequence of observations • Try to do the best you can with non-Markov states • Modularity and hierarchies • Learning and planning at several different levels • Theory of options
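For the POMDP bullet, the Bayesian approach amounts to maintaining a belief state and updating it after every action and observation. A sketch with hypothetical matrices T[a][s, s'] (transition probabilities) and O[a][s', o] (observation likelihoods):

import numpy as np

def belief_update(b, a, o, T, O):
    # Predict: push the current belief through the transition model.
    predicted = T[a].T @ b
    # Correct: weight by the likelihood of the observation received,
    # then renormalise to get the posterior belief over hidden states.
    unnormalised = O[a][:, o] * predicted
    return unnormalised / unnormalised.sum()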

  11. More Frontier Dimensions • Using more structure • factored state spaces: dynamic Bayes nets • factored action spaces

  12. Still More Frontier Dimensions • Incorporating prior knowledge • advice and hints • trainers and teachers • shaping • Lyapunov functions • etc.
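One concrete way to realise the "shaping" item is potential-based reward shaping (Ng, Harada & Russell, 1999): prior knowledge is encoded as a potential function over states and added to the reward; phi here is a hypothetical heuristic, e.g. the negative of an estimated distance to the goal.

def shaped_reward(r, s, s_next, phi, gamma=0.9):
    # Potential-based shaping: add the discounted change in potential.
    return r + gamma * phi(s_next) - phi(s)

Because the shaping term telescopes along any trajectory, the optimal policy of the shaped problem is the same as that of the original one.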

  13. Sutton's Conclusions

  14. Getting the degree of abstraction right • Actions can be low level (e.g., voltages to motors), high level (e.g., accept a job offer), "mental" (e.g., a shift in focus of attention) • Situations: direct sensor readings, or symbolic descriptions, or "mental" situations (e.g., the situation of being lost) • An RL agent is not like a whole animal or robot, which consists of many RL agents as well as other components

  15. Getting the degree of abstraction right • The environment is not necessarily unknown to the RL agent, only incompletely controllable • Reward computation is in the RL agent's environment because the RL agent cannot change it arbitrarily

  16. A Unified View • There is a space of methods, with tradeoffs and mixtures • rather than discrete choices • Value functions are key • all the methods are about learning, computing, or estimating values • all the methods store a value function • All methods are based on backups • different methods just have backups of different shapes and sizes, sources and mixtures

  17. Some of the Open Problems • Incomplete state information • Exploration • Structured states and actions • Incorporating prior knowledge • Using teachers • Theory of RL with function approximators • Modular and hierarchical architectures • Integration with other problem-solving and planning methods

  18. Lessons Learned • 1. Approximate the Solution, Not the Problem • 2. The Power of Learning from Experience • 3. The Central Role of Value Functions in finding optimal sequential behavior • 4. Learning and Planning can be Radically Similar • 5. Accretive Computation "Solves Dilemmas" • 6. A General Approach need not be Flat, Low-level

  19. Final Summary • RL is about a problem: • goal-directed, interactive, real-time decision making • all hail the problem! • Value functions seem to be the key to solving it, and TD methods the key new algorithmic idea. • There are a host of immediate applications • large-scale stochastic optimal control and scheduling • There are also a host of scientific problems
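The "key new algorithmic idea" referred to above is the temporal-difference backup; in its simplest tabular TD(0) form it is a single line (names illustrative):

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    # Move V(s) toward the bootstrapped target r + gamma * V(s').
    V[s] += alpha * (r + gamma * V[s_next] - V[s])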

  20. The End!
