Stochastic Processes Basics
Probability Spaces
• A probability space comprises: a set Ω, a structure Σ (a σ-algebra), and a measure of confidence P (a probability)
σ-algebra
• A σ-algebra Σ is the class of subsets of Ω for which we wish to assign probabilities (the measurable sets)
• Ω ∈ Σ
• If A ∈ Σ, then Aᶜ ∈ Σ
• If {Aᵢ} are all in Σ, then ∪ᵢAᵢ ∈ Σ
Probability
• Assigns a nonnegative measure (confidence) to subsets A ⊆ Ω
• Let A ∈ Σ; then 0 ≤ P(A) ≤ 1 is called the probability of A
• Let A, B ∈ Σ be disjoint; then P(A ∪ B) = P(A) + P(B)
• Let {Aᵢ} ⊆ Σ be disjoint; then P(∪ᵢAᵢ) = Σᵢ P(Aᵢ)
• P(Ω) = 1
Intersections of σ-algebras
• There is at least one σ-algebra of subsets of Ω, namely 2^Ω (the power set)
• If Σ₁ and Σ₂ are both σ-algebras, then Σ₁ ∩ Σ₂ is a σ-algebra (exercise)
• Let Γ be a class of σ-algebras; then ∩{Σ | Σ ∈ Γ} is a σ-algebra (exercise)
Generating a σ-algebra
• Let Λ be a class of subsets of Ω
• Let Γ be the class of σ-algebras such that Λ ⊆ Σ for all Σ ∈ Γ
• Σ(Λ) = ∩{Σ | Σ ∈ Γ}
• Σ(Λ) is called the σ-algebra generated by Λ
• Σ(Λ) is the smallest σ-algebra including Λ (prove: exercise)
Example
• Ω = N (natural numbers) = {1, 2, ...}
• Λ = {{1}}
• Σ(Λ) = {N, ∅, N∖{1}, {1}}
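For a finite Ω, the generated σ-algebra can be computed by brute force: close Λ under complement and pairwise union until nothing new appears. A minimal Python sketch (the helper name is ours; Ω is truncated to {1, ..., 5} so the power set stays small):

```python
from itertools import combinations

def generate_sigma_algebra(omega, lam):
    """Close a class `lam` of subsets of a finite set `omega`
    under complement and (pairwise, hence finite) union."""
    omega = frozenset(omega)
    sigma = {frozenset(a) for a in lam} | {omega, frozenset()}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for a in current:
            if omega - a not in sigma:      # close under complement
                sigma.add(omega - a); changed = True
        for a, b in combinations(current, 2):
            if a | b not in sigma:          # close under union
                sigma.add(a | b); changed = True
    return sigma

# The slide's example with N truncated to {1, ..., 5}:
for s in sorted(generate_sigma_algebra(range(1, 6), [{1}]), key=len):
    print(sorted(s))   # [], [1], [2,3,4,5], [1,2,3,4,5]
```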
More examples
• Ω = R (real numbers)
• Λ = the open intervals of R
• Σ(Λ) is called the Borel sets
• If Λ is a σ-algebra, then Σ(Λ) = Λ (prove: exercise)
Examples (probabilities)
• Ω = N (natural numbers) = {1, 2, ...}
• Σ = 2^N (all subsets)
• P({n}) = 1/n (wrong, why: exercise)
• P({n}) = K/n² (find K: exercise)
• Λ = {{1}}
• Σ(Λ) = {N, ∅, N∖{1}, {1}}
• P({1}) = 0.5
• Specifying P(A) for all A ∈ Λ is sufficient for specifying P(A) for all A ∈ Σ(Λ)
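A quick numerical check of the two probability candidates above; it partly gives away the exercises (the harmonic series diverges, and Σ 1/n² = π²/6 pins down K):

```python
import math

# P({n}) = 1/n cannot be a probability: the total mass diverges.
print(sum(1.0 / n for n in range(1, 10**6)))         # ≈ 14.4, still growing

# P({n}) = K/n² works with K = 6/π², since Σ 1/n² = π²/6:
K = 6 / math.pi**2
print(K * sum(1.0 / n**2 for n in range(1, 10**6)))  # ≈ 1.0
```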
Random (stochastic) variables
• Consider two sets Ω and X, equipped with σ-algebras Σ and F, respectively
• A mapping x: Ω → X is called measurable if for any A ∈ F, x⁻¹(A) ∈ Σ (inverse images of measurable sets are measurable)
• There can exist A ∈ Σ with x(A) ∉ F (Souslin): forward images of measurable sets need not be measurable
• A random variable is a measurable mapping
• Piecewise continuous mappings g: R → R are measurable w.r.t. the Borel sets
Induced probability distributions
• Let (Ω, Σ, P) be a probability space
• Let x: Ω → R (with the Borel sets)
• Then for a Borel set A ⊆ R define Pₓ(A) = P(x⁻¹(A))
• Define F: R → [0, 1] by F(y) = Pₓ((−∞, y])
• F is called a distribution function (CDF)
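A minimal sketch of an induced distribution: draw many samples of a random variable x and estimate F(y) = Pₓ((−∞, y]) by the empirical frequency (the standard normal is just an assumed example):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.standard_normal(10**5)   # draws of x(ω), ω ~ P

def F(y):
    # F(y) = P(x ≤ y), estimated by the fraction of samples below y
    return (sample <= y).mean()

print(F(0.0), F(1.0))   # ≈ 0.5 and ≈ 0.841 for a standard normal
```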
Integration
• Let A be measurable and let P be a probability
• Let χ_A be the indicator function of A; then we define its integral by ∫ χ_A dP = ∫_A dP = P(A)
• A step function (simple function) is a weighted sum of indicator functions, i.e. g = Σᵢ aᵢ χ_{Aᵢ}
• ∫ g dP = Σᵢ aᵢ P(Aᵢ)
• For a nonnegative measurable mapping g, define the integral ∫ g dP = sup{∫ g′ dP | g′ simple, g′ ≤ g}
• Generally, for a measurable mapping g: ∫ g dP = ∫ g⁺ dP − ∫ g⁻ dP
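The sup-definition can be watched converging numerically. A sketch under assumed names: take the dyadic staircase gₙ = min(⌊2ⁿg⌋/2ⁿ, n), a simple function below g, and integrate it against the uniform probability on [0, 1] (approximated here by a grid average):

```python
import numpy as np

def lower_simple(g, n):
    """Dyadic staircase g_n = min(floor(2^n g)/2^n, n):
    a simple function with g_n <= g, increasing to g pointwise."""
    return lambda x: np.minimum(np.floor(2.0**n * g(x)) / 2.0**n, n)

def integral_uniform(h, m=10**6):
    """Integral of h against the uniform probability on [0,1],
    approximated by an average over a fine midpoint grid."""
    x = (np.arange(m) + 0.5) / m
    return h(x).mean()

g = lambda x: x**2                 # a nonnegative measurable function
for n in (2, 4, 8, 16):
    print(n, integral_uniform(lower_simple(g, n)))
# the values increase toward ∫ x² dx = 1/3, as in the sup-definition
```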
Integrals: Examples
• Let g: [0, 1] → R be continuous
• μ([a, b]) = b − a (Lebesgue measure)
• Then ∫ g dμ is the Riemann integral
• Let g: [0, 1] → {0, 1} be defined by: g(x) = 1 for x ∈ Q (rational), g(x) = 0 elsewhere
• Let μ be the Lebesgue measure; then ∫ g dμ = 0, whereas the Riemann integral is undefined (prove: exercise)
Integrals: Examples
• Let P(A) = χ(x ∈ A) (the probability measure concentrated on x), so P({x}) = 1
• Then ∫ g dP = g(x) (sampling – Dirac)
• Let P be concentrated on a sequence of points {xᵢ}, i.e. P(A) = Σᵢ pᵢ χ(xᵢ ∈ A) (so P({xᵢ}) = pᵢ); then ∫ g dP = Σᵢ pᵢ g(xᵢ) (exercise: prove)
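The discrete case in code, a direct transcription of the last bullet (the points and masses are made-up example data):

```python
# P concentrated on points {x_i} with masses {p_i}, Σ p_i = 1
points = [0.0, 1.0, 2.5]
masses = [0.5, 0.3, 0.2]

def integrate_discrete(g):
    # ∫ g dP = Σ_i p_i g(x_i)
    return sum(p * g(x) for x, p in zip(points, masses))

print(integrate_discrete(lambda x: x**2))  # 0.5·0 + 0.3·1 + 0.2·6.25 = 1.55
```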
Density functions
• Let two measures μ and P be defined on some measurable space
• Assume μ(A) = 0 ⇒ P(A) = 0; then we say that P is absolutely continuous (AC) w.r.t. μ
• Theorem (Radon–Nikodym): let P be AC w.r.t. μ; then a measurable function f: Ω → [0, ∞) exists such that P(A) = ∫_A f dμ
• f is called the density function of P
Density functions: Example
• Consider the reals R, the Borel sets, and the Lebesgue measure μ
• Let P be absolutely continuous w.r.t. the Lebesgue measure (so in particular P({x}) = 0 for all x ∈ R)
• Then f exists such that P(A) = ∫_A f dμ (a.k.a. ∫_A f(x) dx)
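A numerical sanity check of P(A) = ∫_A f dμ, with the exponential density f(x) = e^(−x) on [0, ∞) chosen as an assumed example:

```python
import numpy as np

f = lambda x: np.exp(-x)   # density of the exponential law on [0, ∞)

def P(a, b, m=10**6):
    # P([a,b]) = ∫_a^b f dμ, via a plain midpoint Riemann sum
    dx = (b - a) / m
    x = a + (np.arange(m) + 0.5) * dx
    return (f(x) * dx).sum()

print(P(0, 1), 1 - np.exp(-1))   # both ≈ 0.632
```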
Conditional Probabilities
• Classic (Bayes): let A, B be subsets; P(A|B) = P(A ∩ B) / P(B)
• In measure theory: let P be a probability measure and let F ⊆ Σ be σ-algebras (F is a sub-σ-algebra of Σ)
• Define, for A ∈ Σ, P(A|F) to be some function, measurable in F, such that for any F ∈ F: P(A ∩ F) = ∫_F P(A|F) dP
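For a finite sub-σ-algebra the abstract definition becomes concrete: P(A|F) is constant on each atom of F, with value P(A ∩ F_j)/P(F_j) there. A Monte Carlo sketch (the partition of [0, 1] and the event A are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
omega = rng.random(10**6)           # ω uniform on [0,1], a stand-in for Ω

# F generated by the partition {[0,1/2), [1/2,1)}; these are its atoms:
atoms = [omega < 0.5, omega >= 0.5]
A = omega**2 < 0.3                  # some event A ∈ Σ

# P(A|F) is constant on each atom, equal to P(A ∩ F_j)/P(F_j) there:
cond = np.empty_like(omega)
for Fj in atoms:
    cond[Fj] = (A & Fj).mean() / Fj.mean()

# Defining property: ∫_F P(A|F) dP = P(A ∩ F) for every F ∈ F
for Fj in atoms:
    print(cond[Fj].mean() * Fj.mean(), (A & Fj).mean())   # pairs agree
```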
Conditional Probabilities
• Theorem: P(A|F) exists
• Proof: let μ_A(F) = P(A ∩ F) (a measure on F); μ_A is AC w.r.t. P; Radon–Nikodym guarantees the existence of P(A|F), measurable in F, i.e. P(A ∩ F) = μ_A(F) = ∫_F P(A|F) dP
An identity for conditional probabilities
• Let F ⊆ Σ be σ-algebras
• If P(A|Σ) is F-measurable, then P(A|F) = P(A|Σ) w.P.1
• Proof: for all F ∈ F, P(A ∩ F) = ∫_F P(A|Σ) dP = ∫_F P(A|F) dP
Conditional Probabilities
• Theorem: P(A|F) is almost unique; i.e., let g and g′ be candidates, then ∫_F g dP = ∫_F g′ dP for all F ∈ F, so g = g′ w.P.1
• Proof: (exercise)
Conditional Probabilities: Examples
• Let Ω = R²
• Let Σ be the smallest σ-algebra including B × B (B the Borel sets)
• Let x and y: R² → R (the natural projections) be two random variables
• Let Σₓ and Σ_y be the smallest σ-algebras such that x and y are measurable (generated by x and y, resp.)
• Both Σₓ ⊆ Σ and Σ_y ⊆ Σ
Conditional Probabilities: Examples
• Both Σₓ ⊆ Σ and Σ_y ⊆ Σ
• Define P(y ∈ Y | Σₓ) = P(y⁻¹(Y) | Σₓ) to be some function measurable in Σₓ, such that P(y⁻¹(Y) ∩ x⁻¹(X)) = ∫_{x⁻¹(X)} P(y ∈ Y | Σₓ) dP
• We define P(y ∈ Y | x) = P(y ∈ Y | Σₓ) = P_{y|x}(Y)
Conditional Probabilities: Examples
• Lemma: Σₓ = {B × R | B ∈ B}
• Proof: {B × R | B ∈ B} is a σ-algebra, and x⁻¹(B) = B × R for any B ∈ B; assume B × R ∉ Σₓ for some B ∈ B, then x⁻¹(B) ∉ Σₓ (contradiction); so {B × R | B ∈ B} is minimal
Conditional Probabilities: Examples
• Lemma: P(y ∈ Y | x) is a function of x
• Proof: define g(x, y) = P(y ∈ Y | Σₓ) (= P(y ∈ Y | x)); assume g(x, y₁) = g₁ ≠ g(x, y₂) = g₂ for some y₁ ≠ y₂; then (x, y₁) ∈ g⁻¹(g₁) and (x, y₂) ∉ g⁻¹(g₁), so that g⁻¹(g₁) ∉ {B × R | B ∈ B} = Σₓ, contradicting Σₓ-measurability; q.e.d.
Moments
• Let x be a random variable
• Expectation: E(x) = ∫ x dP
• p-th moment: E(|x|ᵖ)
• Variance: E((x − E(x))²) = E(x²) − E²(x)
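A Monte Carlo sketch of these bullets, using an exponential variable as an assumed example; note that the two variance expressions agree:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=10**6)   # a sample of a random variable

Ex  = x.mean()                               # E(x)   ≈ 2
Ex2 = (x**2).mean()                          # E(x²)  ≈ 8
print(((x - Ex)**2).mean(), Ex2 - Ex**2)     # both ≈ 4: the two variance forms
```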
Conditional moments
• Let F ⊆ Σ be σ-algebras
• Let x and y be Σ-measurable (random variables)
• E(y|F) is an F-measurable function s.t. for all F ∈ F: ∫_F E(y|F) dP = ∫_F y dP
• Define μ_F(A) = P(A|F)
• Then (prove as an exercise) E(y|F) = ∫ y dμ_F
Conditional moments
• E(y|x) = E(y|Σₓ) is measurable in Σₓ (a function of x) (exercise: prove), s.t. for A ∈ Σₓ: ∫_A y dP = ∫_A E(y|Σₓ) dP = ∫_A E(y|x) dP
• E(y|x) = ∫ y dP_{y|x} (exercise: prove)
An identity for conditional expectations
• Let F ⊆ Σ be σ-algebras
• If E(y|Σ) is F-measurable, then E(y|F) = E(y|Σ) w.P.1
• Proof: for all F ∈ F, ∫_F E(y|F) dP = ∫_F y dP = ∫_F E(y|Σ) dP
Another identity for conditional expectations
• Let F ⊆ Σ be σ-algebras
• If y is F-measurable, then E(y|F) = y w.P.1
• Proof: for all F ∈ F, ∫_F E(y|F) dP = ∫_F y dP
Stochastic Processes
• Let T be an index (time) set (R, R⁺, Z, N)
• Let x: Ω × T → R, where for each t ∈ T, x(·, t) is a random variable
• Then we say that x is a stochastic (random) process
Stochastic Process: Example
• Let Ω = {0, 1}
• P({1}) = P({0}) = 1/2
• x(0, t) = 0
• x(1, t) = t
• E(x(·, t)) = ∫ x(·, t) dP = ½ · 0 + ½ · t = t/2
• E(x²(·, t)) = ∫ x²(·, t) dP = ½ · 0 + ½ · t² = t²/2
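The same example in simulation: draw many ω and average (a sketch; the sample moments match the slide's t/2 and t²/2):

```python
import numpy as np

rng = np.random.default_rng(2)
omega = rng.integers(0, 2, size=10**6)   # ω ∈ {0,1}, each with probability 1/2

def x(omega, t):
    # x(0,t) = 0 and x(1,t) = t, i.e. x(ω,t) = ω·t
    return omega * t

for t in (1.0, 2.0, 5.0):
    xt = x(omega, t)
    print(t, xt.mean(), (xt**2).mean())  # ≈ t/2 and ≈ t²/2
```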
Cylinders
• Let {t₁, ..., tₙ} be a finite ordered subset of T
• Let {A₁, ..., Aₙ} be subsets of R
• Then C = {x(·, t₁) ∈ A₁} ∩ ... ∩ {x(·, tₙ) ∈ Aₙ} ⊆ Ω is called a cylinder
• For C defined as above, the shifted cylinder is C_h = {x(·, t₁+h) ∈ A₁} ∩ ... ∩ {x(·, tₙ+h) ∈ Aₙ}
Stationarity
• A random process is stationary iff P(C_h) = P(C) for all cylinders C and shifts h
• x stationary ⇒ E(x(·, t)) = E(x) = E (constant in t)
• x stationary ⇒ E(x²(·, t)) = E(x²)
• x stationary ⇒ E((x(·, t+h) − E)(x(·, t) − E)) = C_xx(h)
• Stationarity ⇒ wide-sense stationarity
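A sketch estimating the autocovariance C_xx(h) from one long path. The stationary AR(1) process x_{t+1} = a·x_t + w_t is an assumed test case; with unit-variance noise its autocovariance is a^|h|/(1 − a²):

```python
import numpy as np

rng = np.random.default_rng(3)
a, n = 0.8, 10**6
w = rng.standard_normal(n)
x = np.empty(n)
x[0] = w[0] / np.sqrt(1 - a**2)      # start in the stationary distribution
for t in range(n - 1):
    x[t + 1] = a * x[t] + w[t + 1]   # x_{t+1} = a x_t + w_t

m = x.mean()                         # ≈ E(x), constant in t by stationarity
for h in (0, 1, 5):
    C = ((x[h:] - m) * (x[:n - h] - m)).mean()
    print(h, C, a**h / (1 - a**2))   # estimated vs. theoretical C_xx(h)
```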
Filtrations
• A filtration is an increasing set of σ-algebras {Σ_t}
• Σ_t ⊆ Σ
• t₁ < t₂ ⇒ Σ_{t₁} ⊆ Σ_{t₂}
• Let x be a random process
• Σ_t^x = σ(x⁻¹(C, τ) for τ ≤ t), i.e. the smallest σ-algebra including all inverse cylinder sets of x(·, τ), τ ≤ t
• Σ^x = {Σ_t^x} is called the natural filtration of x
Adapted processes
• A random process x is adapted to a filtration {Σ_t} iff for every t, x(·, t) is measurable w.r.t. Σ_t
• x is adapted to its natural filtration
• Lemma: let x be adapted to {Σ_t}, and t₁ < ... < tₙ; then {x(·, t₁) ∈ A₁} ∪ ... ∪ {x(·, tₙ) ∈ Aₙ} ∈ Σ_{tₙ}
• Proof: {x(·, tⱼ) ∈ Aⱼ} ∈ Σ_{tⱼ} ⊆ Σ_{tₙ} for all j ≤ n; since Σ_{tₙ} is a σ-algebra, the result follows
Adapted processes
• Lemma: let x be adapted to {Σ_t}, and t₁ < ... < tₙ; then C = {x(·, t₁) ∈ A₁} ∩ ... ∩ {x(·, tₙ) ∈ Aₙ} ∈ Σ_{tₙ}
• Proof: {x(·, t₁) ∈ A₁ᶜ} ∪ ... ∪ {x(·, tₙ) ∈ Aₙᶜ} ∈ Σ_{tₙ} by the previous lemma, hence its complement is in Σ_{tₙ}; by De Morgan that complement is C = {x(·, t₁) ∈ A₁} ∩ ... ∩ {x(·, tₙ) ∈ Aₙ} ∈ Σ_{tₙ}
Stochastic Convergence
• i) Convergence in probability (stochastic convergence) to 0: lim_{t→∞} P(|x(t)| > δ) = 0 ∀ δ > 0
• ii) Convergence in mean/moment: lim_{t→∞} E[|x(t)|^ε] = 0
• iii) Almost sure convergence: P(lim_{t→∞} x(t) = 0) = 1
• iv) Convergence in distribution (weak, in law): lim_{t→∞} P(x(t) ≤ x) = F_X(x) at continuity points of F_X
Convergence relations
• ii) ⇒ i)
• Proof: Markov inequality: P(|x(t)| ≥ a) ≤ E(|x(t)|)/a
• Hence P(|x(t)| ≥ a^{1/ε}) = P(|x(t)|^ε ≥ a) ≤ E(|x(t)|^ε)/a → 0
Convergence relations
• iii) ⇒ i)
• Proof: let A = {ω | lim_{t→∞} x(ω, t) = 0}, so P(A) = 1
• (*) lim_{t→∞} x(ω, t) = 0 ⇒ ∃ n s.t. |x(ω, t)| ≤ δ for all t ≥ n
• For a given ω, let n(ω) be the smallest such n, and let A_m = {ω | n(ω) = m}; the {A_m} are disjoint
• (*) implies A = ∪_m A_m, and in turn 1 = P(A) = P(∪_m A_m) = Σ_m P(A_m), i.e. P(|x(ω, t)| ≤ δ for all t ≥ l) ≥ Σ_{m=1}^{l} P(A_m) → 1 for l → ∞
Convergence relations
• i) does not imply iii)
• Proof by counterexample: define c(k) uniformly distributed on the block {2^k, ..., 2^{k+1} − 1}, with {c(k)} independent
• For n ∈ {2^k, ..., 2^{k+1} − 1}, let x(n) = 1 if c(k) = n, x(n) = 0 otherwise
• P(x(n) ≤ δ) = P(x(n) = 0) = 1 − 2^{−k} → 1 (convergence in probability)
• Yet {x(n)} converges to 0 nowhere: every sample path takes the value 1 once in every block (no a.s. convergence)
• i) does not imply ii) (exercise)
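One sample path of this counterexample, simulated (blocks taken half-open so they partition the indices): the fraction of ones vanishes, yet the path hits 1 in every block, so it converges nowhere.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 15
N = 2**(K + 1)
x = np.zeros(N, dtype=int)
for k in range(K + 1):
    c = rng.integers(2**k, 2**(k + 1))   # c(k) uniform on the k-th block
    x[c] = 1

print(x.sum(), "ones among", N, "indices")   # exactly K+1 ones: P(x(n)=1) → 0
print([x[2**k : 2**(k + 1)].max() for k in range(K + 1)])   # all 1:
# the path equals 1 once per block, so x(n) → 0 holds for no ω
```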
Convergence relations
• iii) does not imply ii)
• Let Ω = [1, ∞)
• Let ω ~ f(ω) = K ω^{−δ} (a density on [1, ∞), requiring δ > 1)
• Let x(ω, t) = ω/t for t ≥ 1
• P(lim_{t→∞} x(ω, t) = 0) = 1 (a.s. convergence)
• E(x(ω, t)^ε) = (1/t^ε) ∫₁^∞ ω^ε K ω^{−δ} dω = (K/t^ε) ∫₁^∞ ω^{ε−δ} dω = (K/t^ε) [ω^{1+ε−δ}]₁^∞ / (1 + ε − δ)
• Finite only for 1 + ε − δ < 0, i.e. δ > 1 + ε
Types of Stochastic Processes
• Markov processes
• Poisson processes
• Martingales
• Brownian motions
• Itô processes
• Itô diffusions
• Lévy processes
• Renewal processes
Markov Processes
• Let X be a random process adapted to the filtration (F_t)
• Then X is a Markov process w.r.t. (F_t) iff for any measurable function f: E(f(X_{t+h}) | F_t) = E(f(X_{t+h}) | X_t) w.P.1
• In particular, if (F_t) is generated by X and f is the indicator function of some set A: P(X_{t+h} ∈ A | X_t, X_{t₁}, ..., X_{t_k}) = P(X_{t+h} ∈ A | X_t) for any selection of times t ≥ t₁ ≥ t₂ ≥ ... ≥ t_k
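An empirical sanity check of the last bullet for a two-state chain (the transition matrix is an assumption chosen for this sketch): the one-step transition frequency out of state 0 should not depend on the state before it.

```python
import numpy as np

rng = np.random.default_rng(5)
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                 # transition matrix, rows sum to 1
n = 10**6
X = np.empty(n, dtype=int)
X[0] = 0
for t in range(n - 1):
    X[t + 1] = rng.random() < P[X[t], 1]   # move to state 1 w.p. P[X_t, 1]

# P(X_{t+1}=1 | X_t=0, X_{t-1}=prev) should be ≈ P[0,1] for both prev:
for prev in (0, 1):
    sel = (X[1:-1] == 0) & (X[:-2] == prev)
    print(prev, X[2:][sel].mean())         # both ≈ 0.1
```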
Markov Processes (Stopping times)
• Let τ be a random variable such that χ(τ(ω) ≤ t) is adapted to (F_t)
• Then τ is said to be a stopping time w.r.t. (F_t)
• If (F_t) is generated by a random process X, a stopping time τ w.r.t. (F_t) depends only on the history of X
• Example (first escape time): let X(0) ∈ U; τ = inf{s | X(s) ∉ U}
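The first escape time in simulation: a Gaussian random walk (an assumed example) started at 0 and stopped on leaving U = (−5, 5). Whether τ ≤ t is decided by the history X(0), ..., X(t) alone, which is what makes τ a stopping time.

```python
import numpy as np

rng = np.random.default_rng(6)

def first_escape_time(n_max=10**5):
    """τ = inf{s | X(s) ∉ (-5, 5)} for the walk X(t+1) = X(t) + w(t)."""
    x, t = 0.0, 0
    while abs(x) < 5 and t < n_max:
        x += rng.standard_normal()
        t += 1
    return t

print([first_escape_time() for _ in range(5)])   # a few samples of τ
```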
Strong Markov Property
• Let X be a Markov process generating (F_t)
• Then X is said to be a strong Markov process iff E(f(X_{τ+h}) | F_τ) = E(f(X_{τ+h}) | X_τ) for any stopping time τ w.r.t. (F_t) (F_τ is generated by the history of X until τ)
• Every discrete-time Markov process (chain) is strong
Non-strong Markov Process: Example
• Let g: R → R² (a 1-D curve in the plane)
• Let x₁ < x₂
• Let G = g(x₁) = g(x₂), with g(x) ≠ g(y) elsewhere (the curve intersects itself exactly at G)
• G₁ = {g(x) | x < x₁}, G₂ = {g(x) | x₁ < x < x₂}, G₃ = {g(x) | x₂ < x}
• Let X: Ω × R → R be a strong Markov process such that 0 < P(X(ω, t+h) ≤ y | X(ω, t) = x) = ∫_{−∞}^y H_h(η − x) dη < 1 for all x, y (so P(X(ω, t) = x₁) = P(X(ω, t) = x₂) = 0 for any fixed t > 0), with X(ω, 0) ≠ x₁ and X(ω, 0) ≠ x₂
• Let τ = inf{s | g(X(s)) = G} = inf{s | X(s) ∈ {x₁, x₂}}
• Assume P(X(ω, τ) = x₁), P(X(ω, τ) = x₂) > 0
Non-strong Markov Process: Example
• The process g_t = g(X_t) is adapted to σ({X_s}); therefore σ({g_s}) ⊆ σ({X_s})
• Since X is (strong) Markov: E(f∘g(X_{t+h}) | σ({X_t})) = E(f∘g(X_{t+h}) | X_t)
• Prove: E(f(g_{t+h}) | σ({g_t})) = E(f(g_{t+h}) | g_t)
• Since X_t = g⁻¹(g_t) (w.P.1, as the self-intersection G is hit with probability 0 at any fixed t), there is a version of E(f∘g(X_{t+h}) | X_t) which is a function of g_t, i.e. E(f∘g(X_{t+h}) | σ({X_t})) = E(f∘g(X_{t+h}) | X_t) = E(f(g_{t+h}) | g_t) w.P.1
• Thus g_t is a Markov process
Non-strong Markov Process: Example
• P(g_{τ+h} ∈ G₂ | σ({g_τ})) = P(g_{τ+h} ∈ G₂ | σ({X_τ})) w.P.1
• P(g_{τ+h} ∈ G₂ | σ({X_τ})) = P(g_{τ+h} ∈ G₂ | X_τ) (X is strong) = P(X_{τ+h} ∈ (x₁, x₂) | X_τ) = ∫_{x₁}^{x₂} H_h(η − X_τ) dη
• On {ω | X(ω, τ) = x₁}: P(X_{τ+h} ∈ (x₁, x₂) | X_τ) = ∫_{x₁}^{x₂} H_h(η − x₁) dη
• On {ω | X(ω, τ) = x₂}: P(X_{τ+h} ∈ (x₁, x₂) | X_τ) = ∫_{x₁}^{x₂} H_h(η − x₂) dη
• Let f(g) = χ(g ∈ G₂); then E(f(g_{τ+h}) | g_τ) = P(g_{τ+h} ∈ G₂ | g_τ)
• Since E(f(g_{τ+h}) | g_τ) is measurable in σ(g_τ), it has to be constant on G
• However, since ∫_{x₁}^{x₂} H_h(η − x₁) dη ≠ ∫_{x₁}^{x₂} H_h(η − x₂) dη, it cannot coincide with P(g_{τ+h} ∈ G₂ | σ({g_τ})); so g is Markov but not strong Markov
Markov Processes: Remarks
• Model uncertain state dynamics
• BJ (Box–Jenkins) models have Markov representations
• Queueing models are typically Markov
• Extensions: semi-Markov processes, GSMPs
• Extensions: HMMs, MDPs
• Discrete time → Markov chains