ECE-517: Reinforcement Learning in Artificial Intelligence
Lecture 4: Discrete-Time Markov Chains
September 1, 2010
Dr. Itamar Arel, College of Engineering, Department of Electrical Engineering & Computer Science, The University of Tennessee, Fall 2010
Simple DTMCs
• "States" can be labeled 0, 1, 2, 3, …
• At every time slot a "jump" decision is made randomly based on the current state
• (Sometimes the arrow pointing back to the same state is omitted)
[Figure: a two-state chain with transition probabilities p, 1-p, q, 1-q, and a three-state chain with transitions labeled a through f]
1-D Random Walk
• Time is slotted
• The walker flips a coin every time slot to decide which way to go: right w.p. p, left w.p. 1-p
[Figure: a sample path X(t) of the walk]
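The random walk above is simple enough to simulate directly. A minimal sketch (the function name and use of a fixed seed are our own choices, not from the slides):

```python
import random

def random_walk(steps, p=0.5, seed=0):
    """Simulate a 1-D random walk: each time slot the walker moves
    +1 w.p. p and -1 w.p. 1-p, starting from the origin."""
    rng = random.Random(seed)
    x, path = 0, [0]
    for _ in range(steps):
        x += 1 if rng.random() < p else -1
        path.append(x)
    return path

path = random_walk(1000, p=0.5)
```

Each entry of `path` differs from its predecessor by exactly 1, mirroring the coin flip in the slide.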
Single Server Queue
• Consider a queue at a supermarket
• In every time slot:
• A customer arrives with probability p (Bernoulli(p) arrivals)
• The HoL (head-of-line) customer leaves with probability q (Geom(q) service)
• We'd like to learn about the behavior of such a system
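The slot dynamics described above can be sketched as a simulation. One modeling assumption we make here (not stated on the slide) is that within a slot the departure attempt happens before the arrival; the function name and seed are ours:

```python
import random

def simulate_queue(slots, p, q, seed=0):
    """Slot-by-slot simulation of the single-server queue:
    Bernoulli(p) arrivals, geometric service (the HoL customer
    departs each slot w.p. q). Returns the queue-length trace."""
    rng = random.Random(seed)
    n = 0          # current queue length
    trace = []
    for _ in range(slots):
        if n > 0 and rng.random() < q:  # HoL departure attempt
            n -= 1
        if rng.random() < p:            # possible new arrival
            n += 1
        trace.append(n)
    return trace

trace = simulate_queue(10_000, p=0.3, q=0.5)
```

With p < q the queue is stable, and the empirical distribution of `trace` approaches the stationary distribution derived later for the birth-death chain.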
Birth-Death Chain
• Our queue can be modeled by a Birth-Death Chain (a.k.a. Geom/Geom/1 queue)
• Want to know:
• Queue size distribution
• Average waiting time, etc.
[Figure: birth-death chain over states 0, 1, 2, 3, …]
Markov Property
• The "Future" is independent of the "Past" given the "Present"
• In other words: Memoryless
• We've mentioned memoryless distributions: Exponential and Geometric
• Useful for modeling and analyzing real systems
Discrete Time Random Process (DTRP)
• Random Process: an indexed family of random variables
• Let Xn be a DTRP consisting of a sequence of independent, identically distributed (i.i.d.) random variables with common cdf FX(x). This sequence is called the i.i.d. random process.
• Example: sequence of Bernoulli trials (coin flips)
• In networking: traffic may obey a Bernoulli i.i.d. arrival pattern
• In reality, some degree of dependency/correlation exists between consecutive elements of a DTRP
• Example: correlated packet arrivals (video/audio stream)
Discrete Time Markov Chains
• A sequence of random variables {Xn} is called a Markov Chain if it has the Markov property:
P(Xn+1 = j | Xn = i, Xn-1 = i(n-1), …, X0 = i0) = P(Xn+1 = j | Xn = i)
• States are usually labeled {0, 1, 2, …}
• State space can be finite or infinite
Transition Probability Matrix
• pij: probability of transitioning from state i to state j
• We will assume the MC is homogeneous/stationary: pij is independent of time
• Transition probability matrix: P = {pij}
• Two-state MC (jump 0→1 w.p. p, jump 1→0 w.p. q):
P = [ 1-p   p ; q   1-q ]
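The two-state transition matrix can be written down and exercised numerically. A small sketch (the specific values p = 0.3, q = 0.4 are illustrative, not from the slides):

```python
import numpy as np

# Two-state chain: from state 0 jump to 1 w.p. p (stay w.p. 1-p);
# from state 1 jump to 0 w.p. q (stay w.p. 1-q).
p, q = 0.3, 0.4
P = np.array([[1 - p, p],
              [q, 1 - q]])

assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution

# One step of the chain: pi_{k+1} = pi_k P (row vector times matrix)
pi0 = np.array([1.0, 0.0])   # start in state 0 with certainty
pi1 = pi0 @ P                # distribution after one slot
```

Iterating `pi @ P` propagates the state distribution forward one slot at a time, which is exactly the recursion used on the next slide.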
Stationary Distribution
• Define the row vector pk = (P(Xk = 0), P(Xk = 1), …); then pk+1 = pk P
• Stationary distribution: p = lim (k→∞) pk, if the limit exists
• If p exists, we can solve for it from
p = p P, with Σi p(i) = 1
• These are called balance equations
• Transitions in and out of state i are balanced
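The balance equations can be solved directly as a linear system. A minimal sketch (the helper name `stationary` and the augmented least-squares formulation are our own choices):

```python
import numpy as np

def stationary(P):
    """Solve the balance equations pi = pi P together with
    sum(pi) = 1, posed as an overdetermined linear system."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n),   # rows encoding (pi P - pi) = 0
                   np.ones(n)])       # normalization: sum(pi) = 1
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Two-state chain with p = 0.3, q = 0.4; expect pi = (q, p)/(p+q)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = stationary(P)
```

For this chain the result matches the closed form derived later for the two-state example: pi = (4/7, 3/7).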
General Comment & Conditions for p to Exist (I)
• If we partition all the states into two sets, then transitions between the two sets must be "balanced"
• This can be easily derived from the balance equations
• Definitions:
• State j is reachable from state i if the n-step transition probability from i to j is positive for some n
• States i and j communicate if each is reachable from the other
• The Markov chain is irreducible if all states communicate
Conditions for p to Exist (I) (cont'd)
• Condition: the Markov chain is irreducible
• Counter-example: a chain with a subset of states that cannot be left (or reached), so not all states communicate
• Condition: the Markov chain is aperiodic
• Counter-example: a two-state chain that alternates deterministically (p = 1 in both directions) has period 2 and never settles to a limit
[Figure: example chains illustrating reducibility and periodicity]
Conditions for p to Exist (II)
• For the Markov chain to be recurrent …
• All states i must be recurrent, i.e. starting from i the chain returns to i with probability 1
• Otherwise the state is transient
• With regards to a recurrent MC …
• State i is positive recurrent if E(Ti) < ∞, where Ti is the time between visits to state i
• Otherwise the state is null-recurrent
Solving for p: Example for a Two-State Markov Chain
[Figure: two-state chain with transitions p (0→1) and q (1→0), self-loops 1-p and 1-q]
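The worked solution on this slide was lost in transcription; it can be reconstructed from the balance equation across the single cut between the two states, together with normalization (a sketch consistent with the chain's transition probabilities):

```latex
\pi_0\, p = \pi_1\, q, \qquad \pi_0 + \pi_1 = 1
\;\;\Longrightarrow\;\;
\pi_0 = \frac{q}{p+q}, \qquad \pi_1 = \frac{p}{p+q}
```

Intuitively, the chain spends more time in whichever state is harder to leave: small p inflates \(\pi_0\), small q inflates \(\pi_1\).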
Birth-Death Chain
• Arrival w.p. p; departure w.p. q
• Let u = p(1-q), d = q(1-p), r = u/d
• Balance equations: flow up across each cut equals flow down
[Figure: birth-death chain over states 0, 1, 2, 3, … with up-transitions u, down-transitions d, self-loops 1-u-d (1-u at state 0)]
Birth-Death Chain (cont'd)
Continuing this pattern, we observe that
p(i-1) u = p(i) d
Equivalently, we can draw a bisection (cut) between state i-1 and state i: the probability flow across the cut must balance.
Therefore, we have
p(i) = r^i p(0)
where r = u/d. What we are interested in is the stationary distribution of the states, so we normalize the p(i) to sum to 1.
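The normalization step the slide points to can be completed with a geometric series; assuming r = u/d < 1 (the stability condition) so the series converges:

```latex
\sum_{i=0}^{\infty} \pi(i)
  = \pi(0) \sum_{i=0}^{\infty} r^{i}
  = \frac{\pi(0)}{1-r} = 1
\;\;\Longrightarrow\;\;
\pi(0) = 1-r, \qquad \pi(i) = (1-r)\, r^{i}
```

That is, the stationary queue size is geometrically distributed with parameter r, from which averages such as the mean queue length \(r/(1-r)\) follow directly.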