Markov Decision Processes (AIMA: 17.1, 17.2 excluding 17.2.3, 17.3)
From utility to optimal policy • The utility function U(s) allows the agent to select the action that maximizes the expected utility of the subsequent state: π*(s) = argmax_a Σ_s' P(s' | s, a) U(s')
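A minimal sketch of this greedy policy extraction, not from the slides themselves. It assumes the MDP's transitions are stored as P[s][a] = list of (probability, next_state) pairs; all names here are illustrative.

```python
def greedy_policy(U, P, states, actions):
    """Return the policy that is greedy with respect to the utilities U."""
    policy = {}
    for s in states:
        # pick the action maximizing the expected utility of the next state
        policy[s] = max(
            actions,
            key=lambda a: sum(p * U[s2] for p, s2 in P[s][a]),
        )
    return policy
```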
The Bellman equation • If the utility of a state is the expected sum of discounted rewards from that point onwards, then there is a direct relationship between the utility of a state and the utilities of its neighbors: U(s) = R(s) + γ max_a Σ_s' P(s' | s, a) U(s'). The utility of a state is the immediate reward for that state plus the expected discounted utility of the next state, assuming that the agent chooses the optimal action. This is the Bellman equation.
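A sketch of the right-hand side of the Bellman equation for a single state, again assuming the illustrative P[s][a] = list of (probability, next_state) convention from the sketch above; R is a per-state reward table and gamma the discount factor.

```python
def bellman_backup(s, U, R, P, gamma, actions):
    """One Bellman backup: immediate reward plus the best discounted
    expected utility of the successor states."""
    return R[s] + gamma * max(
        sum(p * U[s2] for p, s2 in P[s][a]) for a in actions
    )
```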
The value iteration algorithm • For a problem with n states there are n Bellman equations in n unknowns, but they are NOT linear (because of the max operator) • Start with arbitrary utilities U(s) and update them iteratively with the Bellman update • Guaranteed to converge to the unique solution of the Bellman equations. Demo: http://people.cs.ubc.ca/~poole/demos/mdp/vi.html
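A minimal sketch of value iteration in tabular form, not taken from the slides. It assumes arrays with illustrative shapes: P[s, a, s2] is the transition probability, R[s] the immediate reward, and gamma the discount factor; the stopping threshold eps is likewise an assumption.

```python
import numpy as np

def value_iteration(P, R, gamma, eps=1e-6):
    """Iterate the Bellman update until the utilities stop changing."""
    n_states = R.shape[0]
    U = np.zeros(n_states)                      # arbitrary initial utilities
    while True:
        # Bellman update applied to every state at once:
        # Q[s, a] = R[s] + gamma * sum_s' P[s, a, s'] * U[s']
        Q = R[:, None] + gamma * np.einsum("sat,t->sa", P, U)
        U_new = Q.max(axis=1)
        if np.max(np.abs(U_new - U)) < eps:     # stop when utilities converge
            return U_new
        U = U_new
```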
Policy iteration algorithm • It is possible to obtain an optimal policy even when the utility function estimate is inaccurate • If one action is clearly better than all the others, then the exact magnitudes of the utilities of the states involved need not be precise • Value iteration computes the utilities of states and then extracts the optimal policy; policy iteration instead alternates two steps: policy evaluation (compute the utilities of states for the given policy) and policy improvement (compute a new policy from the given state utilities).
Policy iteration algorithm • With the policy π fixed, the Bellman equation loses its max operator and becomes a linear equation: U(s) = R(s) + γ Σ_s' P(s' | s, π(s)) U(s')
Policy evaluation • For a problem with n states this gives n linear equations in n unknowns, which can be solved exactly in O(n³) time with standard linear algebra; an approximate iterative scheme can also be used (see the sketch below)
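A minimal sketch of policy iteration with exact policy evaluation, using the same illustrative array conventions as the value iteration sketch above (none of this code is from the slides). Evaluation solves the linear system U = R + γ P_π U, which is the O(n³) step; improvement then makes the policy greedy with respect to U.

```python
import numpy as np

def policy_iteration(P, R, gamma):
    """Alternate policy evaluation and policy improvement until the policy is stable."""
    n_states, n_actions, _ = P.shape
    policy = np.zeros(n_states, dtype=int)      # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) U = R for the fixed policy
        P_pi = P[np.arange(n_states), policy]   # (n_states, n_states) transition matrix
        U = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R)
        # Policy improvement: act greedily with respect to the current utilities
        Q = R[:, None] + gamma * np.einsum("sat,t->sa", P, U)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # unchanged policy means it is optimal
            return policy, U
        policy = new_policy
```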
Summary • Markov decision processes • Utility of state sequence • Utility of states • Value iteration algorithm • Policy iteration algorithm