Kakutani's interval splitting scheme
Willem R. van Zwet, University of Leiden
Bahadur lectures, Chicago 2005
Kakutani's interval splitting scheme

Random variables X_1, X_2, … :
X_1 has a uniform distribution on (0,1);
given X_1, X_2, …, X_{k-1}, the conditional distribution of X_k is uniform on the longest of the k subintervals created by 0, 1, X_1, X_2, …, X_{k-1}.
0———x1——————————1
0———x1———————x2——1
0———x1——x3————x2——1
0———x1——x3——x4—x2——1

Kakutani (1975): As n → ∞, do these points become evenly (i.e. uniformly) distributed in (0,1)?
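The splitting rule is easy to simulate. Below is a minimal Python sketch (the function name `kakutani_points` and the use of Python's `random` module are my own choices, not part of the lecture): each new point is drawn uniformly from the currently longest subinterval.

```python
import random

def kakutani_points(n, rng):
    """Generate n points by Kakutani's scheme: each new point is
    uniform on the longest of the current subintervals of (0,1)."""
    endpoints = [0.0, 1.0]          # sorted endpoints of the subintervals
    xs = []
    for _ in range(n):
        # index of the longest current subinterval
        i = max(range(len(endpoints) - 1),
                key=lambda j: endpoints[j + 1] - endpoints[j])
        x = rng.uniform(endpoints[i], endpoints[i + 1])
        xs.append(x)
        endpoints.insert(i + 1, x)  # insertion keeps the list sorted
    return xs

# The first point is uniform on (0,1), the second is uniform on the
# longer of the two pieces it creates, and so on.
rng = random.Random(2005)
points = kakutani_points(4, rng)
```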
Empirical distribution function of X_1, X_2, …, X_n:

  F_n(x) = n^{-1} Σ_{1≤i≤n} 1_{(0,x]}(X_i).

Uniform d.f. on (0,1):

  F(x) = P(X_1 ≤ x) = x,  x ∈ (0,1).

Formal statement of Kakutani's question:

  (*)  lim_{n→∞} sup_{0<x<1} |F_n(x) − x| = 0 with probability 1?
We know: if X_1, X_2, …, X_n are independent and each is uniformly distributed on (0,1), then (*) is true (Glivenko–Cantelli). So (*) is "obviously" true in this case too!
However, the joint distribution of even the first five points is already utterly hopeless!
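One can at least probe (*) numerically. The sketch below (function names of my own devising) computes the Kolmogorov distance sup_x |F_n(x) − x| for points generated by the scheme; it shrinks as n grows, consistent with Kakutani's conjecture.

```python
import random

def kakutani_points(n, rng):
    """n points of Kakutani's scheme (always split the longest subinterval)."""
    endpoints = [0.0, 1.0]
    xs = []
    for _ in range(n):
        i = max(range(len(endpoints) - 1),
                key=lambda j: endpoints[j + 1] - endpoints[j])
        x = rng.uniform(endpoints[i], endpoints[i + 1])
        xs.append(x)
        endpoints.insert(i + 1, x)
    return xs

def sup_deviation(xs):
    """Kolmogorov statistic sup_x |F_n(x) - x| for points in (0,1),
    evaluated at the order statistics where the sup is attained."""
    n = len(xs)
    s = sorted(xs)
    return max(max((i + 1) / n - s[i], s[i] - i / n) for i in range(n))
```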
Stopping rule:
  0 < t < 1: N_t = first n for which all subintervals have length ≤ t;
  t ≥ 1: N_t = 0.
The stopped sequence X_1, X_2, …, X_{N_t} has the property that a subinterval will receive another random point before we stop iff it is longer than t.
May change order and blow up. Given that X_1 = x:

  0———x—————————1

  L(N_t | X_1 = x) = L(N'_{t/x} + N''_{t/(1−x)} + 1),  0 < t < 1,

where N'_{t/x} and N''_{t/(1−x)} are independent copies of N_{t/x} and N_{t/(1−x)}.
L(Z) denotes the distribution (law) of Z.
  L(N_t | X_1 = x) = L(N'_{t/x} + N''_{t/(1−x)} + 1),  0 < t < 1,

with independent copies on the right. Taking expectations, μ(t) = E N_t satisfies

  μ(t) = ∫_{(0,1)} {μ(t/x) + μ(t/(1−x)) + 1} dx,

which yields

  μ(t) = E N_t = (2/t) − 1,  0 < t < 1.

Similarly

  σ²(t) = E(N_t − μ(t))² = c/t,  0 < t ≤ ½.
  μ(t) = E N_t = (2/t) − 1,  0 < t < 1,
  σ²(t) = E(N_t − μ(t))² = c/t,  0 < t ≤ ½.

Hence

  E{N_t / (2/t)} = 1 − t/2 → 1 as t → 0,
  σ²(N_t / (2/t)) = ct/4 → 0 as t → 0,

so

  lim_{t→0} N_t / (2/t) = 1 w.p. 1.

We have built a clock! As t → 0, the longest interval (length ≤ t) tells the time: n ~ 2/t.
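The exact mean formula E N_t = (2/t) − 1 lends itself to a quick Monte Carlo check (the code and the helper name `sample_N_t` are mine; the formula is from the slide):

```python
import random

def sample_N_t(t, rng):
    """One draw of N_t: points are added to the longest subinterval
    until every subinterval has length <= t."""
    endpoints = [0.0, 1.0]
    n = 0
    while True:
        i = max(range(len(endpoints) - 1),
                key=lambda j: endpoints[j + 1] - endpoints[j])
        if endpoints[i + 1] - endpoints[i] <= t:
            return n
        x = rng.uniform(endpoints[i], endpoints[i + 1])
        endpoints.insert(i + 1, x)
        n += 1

# For t = 0.1 the formula gives E N_t = 2/0.1 - 1 = 19; the sample
# mean over many replications should land close to that value.
rng = random.Random(42)
mean = sum(sample_N_t(0.1, rng) for _ in range(2000)) / 2000
```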
N_t ~ 2/t as t → 0 w.p. 1. Define

  N_t(x) = Σ_{i=1}^{N_t} 1_{(0,x]}(X_i),  x ∈ (0,1).

Then

  N_t(x) ~ N_{t/x} ~ 2x/t as t → 0 w.p. 1.

  N_t(x):   0—.—.—x—.——.—.——.—.——1
  N_{t/x}:  0—.—.—x—.——.—.——.—.——1

  F_{N_t}(x) = N_t(x) / N_t → x as t → 0 w.p. 1.
  F_{N_t}(x) = N_t(x) / N_t → x as t → 0 w.p. 1,

and as N_t → ∞ when t → 0,

  F_n(x) → x as n → ∞ w.p. 1, hence
  sup_{0<x<1} |F_n(x) − x| → 0 w.p. 1.

Kakutani was right (vZ, Ann. Prob. 1978).
We want to show that F_n(x) → x faster than in the i.i.d. uniform case, e.g. by considering the stochastic processes

  B_n(x) = n^{1/2}(F_n(x) − x),  0 ≤ x ≤ 1.

If X_1, X_2, …, X_n are independent and uniformly distributed on (0,1), then B_n →_D B0 as n → ∞, where →_D refers to convergence in distribution (E f(B_n) → E f(B0) for bounded continuous functions f) and B0 denotes the Brownian bridge.
Refresher course 1

W: Wiener process on [0,1], i.e. W = {W(t): 0 ≤ t ≤ 1} with
  W(0) = 0;
  W(t) has a normal (Gaussian) distribution with mean zero and variance E W²(t) = t;
  W has independent increments.

B0: Brownian bridge on [0,1], i.e. B0 = {B0(t): 0 ≤ t ≤ 1} is distributed as W conditioned on W(1) = 0.

Fact: {W(t) − t W(1): 0 ≤ t ≤ 1} is distributed as B0.
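The Fact is easy to see in simulation: subtracting t·W(1) from a random-walk approximation of W ties the path down at both endpoints and gives variance t(1−t) at interior t, as a bridge must have. A minimal sketch (helper names mine):

```python
import math
import random

def wiener_path(m, rng):
    """Random-walk approximation of W on a grid of m steps over [0,1]."""
    w = [0.0]
    step_sd = math.sqrt(1.0 / m)
    for _ in range(m):
        w.append(w[-1] + rng.gauss(0.0, step_sd))
    return w

def bridge_from_wiener(w):
    """B0(k/m) = W(k/m) - (k/m) * W(1): tied down at both endpoints."""
    m = len(w) - 1
    return [w[k] - (k / m) * w[-1] for k in range(m + 1)]
```

On the discrete grid the identity is exact: Var{W(t) − t W(1)} = t + t² − 2t² = t(1−t), so e.g. the bridge at t = ½ has variance ¼.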
So: if X_1, X_2, …, X_n are independent and uniformly distributed on (0,1), then B_n →_D B0 as n → ∞.

If X_1, X_2, …, X_n are generated by Kakutani's scheme, then (Pyke & vZ, Ann. Prob. 2004)

  B_n →_D a·B0 as n → ∞, with a = ½ σ(N_{½}) = (4 log 2 − 5/2)^{1/2} = 0.5221… .

Half a Brownian bridge! Converges twice as fast!
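The factor a ≈ 0.52 can be glimpsed numerically. The sketch below (my own code, not from the lecture) tracks the longest interval with a heap, which is valid here because the scheme only ever splits the current longest interval, and estimates Var{n^{1/2}(F_n(½) − ½)}. For i.i.d. uniforms this variance is ¼; under Kakutani's scheme it should be near a²·¼ ≈ 0.068 for large n, i.e. markedly smaller.

```python
import heapq
import random

def kakutani_F_half(n, rng):
    """Fraction of the first n Kakutani points falling in (0, 1/2]."""
    heap = [(-1.0, 0.0)]   # (-length, left endpoint) of each subinterval
    count = 0
    for _ in range(n):
        neg_len, a = heapq.heappop(heap)   # the longest subinterval
        b = a - neg_len                    # its right endpoint
        x = rng.uniform(a, b)
        if x <= 0.5:
            count += 1
        heapq.heappush(heap, (a - x, a))   # left piece  (a, x)
        heapq.heappush(heap, (x - b, x))   # right piece (x, b)
    return count / n
```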
Refresher course 2

Y: random variable with finite k-th moment μ_k = E Y^k = ∫ Y^k dP < ∞ and characteristic function

  ψ(t) = E e^{itY} = 1 + Σ_{1≤j≤k} μ_j (it)^j / j! + o(t^k).

Then

  log ψ(t) = Σ_{1≤j≤k} κ_j (it)^j / j! + o(t^k).

κ_j: the j-th cumulant.
  κ_1 = μ_1 = E Y;  κ_2 = σ² = E(Y − μ_1)²;
  κ_3 = E(Y − μ_1)³;  κ_4 = E(Y − μ_1)⁴ − 3σ⁴;  etc.
κ_j = 0 for all j ≥ 3 iff Y is normal.
If Y_1 and Y_2 are independent, then characteristic functions multiply and hence cumulants add up:

  κ_j(Y_1 + Y_2) = κ_j(Y_1) + κ_j(Y_2).

Let Y_1, Y_2, … be i.i.d. with mean μ = E Y_1 = 0 and all moments finite. Define

  S_n = n^{−1/2}(Y_1 + Y_2 + … + Y_n).

Then for j ≥ 3,

  κ_j(S_n) = n^{−j/2} κ_j(Σ Y_i) = n^{1−j/2} κ_j(Y_1) → 0,

so S_n is asymptotically normal by a standard moment convergence argument. A poor man's CLT, but sometimes very powerful.
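For discrete laws, cumulant additivity is a finite computation and can be checked directly. In the sketch below (function names mine), cumulants are obtained from raw moments via the standard recursion κ_n = μ'_n − Σ_{1≤j<n} C(n−1, j−1) κ_j μ'_{n−j}:

```python
from math import comb

def raw_moments(pmf, kmax):
    """Raw moments mu'_1 .. mu'_kmax of a finite pmf {value: prob}."""
    return [sum(p * v ** k for v, p in pmf.items()) for k in range(1, kmax + 1)]

def cumulants(pmf, kmax):
    """First kmax cumulants, via the moment-cumulant recursion."""
    mu = [1.0] + raw_moments(pmf, kmax)     # mu[0] = mu'_0 = 1
    kap = []
    for n in range(1, kmax + 1):
        k_n = mu[n] - sum(comb(n - 1, j - 1) * kap[j - 1] * mu[n - j]
                          for j in range(1, n))
        kap.append(k_n)
    return kap

def convolve(p, q):
    """pmf of Y1 + Y2 for independent Y1 ~ p, Y2 ~ q."""
    r = {}
    for v1, p1 in p.items():
        for v2, p2 in q.items():
            r[v1 + v2] = r.get(v1 + v2, 0.0) + p1 * p2
    return r
```

For instance, a Bernoulli(½) variable has κ_1 = ½, κ_2 = ¼, κ_3 = 0, κ_4 = −⅛, and the cumulants of an independent sum are simply the sums of the cumulants.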
We know

  E N_t = (2/t) − 1,  0 < t < 1,
  σ²(N_t) = c/t,  0 < t ≤ ½.

Similarly

  κ_j(N_t) = c_j/t,  0 < t ≤ 1/j,  j = 3, 4, … .

Define I_s = N_{1/s} + 1, i.e. the number of intervals at the first time when all intervals are ≤ 1/s. Then

  κ_j(I_s) = c_j s,  s > j,  j = 1, 2, …,  with c_1 = 2, c_2 = c.
  κ_j(I_s) = c_j s,  s > j,  j = 1, 2, … .

For growing s, I_s behaves more and more like an independent-increments process!

Define

  W_t(x) = (t/c)^{1/2}(N_t(x) − 2x/t),  0 ≤ x ≤ 1.

Then for s = 1/t and s → ∞ (i.e. t → 0),

  W_t(x) =_D (t/c)^{1/2}(N_{t/x} − 2x/t) ≈ (cs)^{−1/2}(I_{xs} − 2xs) →_D W(x)

because of the cumulant argument. (The proof of tightness is very unpleasant!)
Now

  W_t(x) − x W_t(1) = (t/c)^{1/2}{N_t(x) − x N_t(1)}
                    ≈ 2(ct)^{−1/2}{N_t(x) − x N_t(1)} / N_t
                    = 2(ct)^{−1/2}(F_{N_t}(x) − x).

Hence, for t = 2/n and M = N_{2/n} ≈ n,

  n^{1/2}(F_M(x) − x) →_D (c/2)^{1/2} B0(x) = a B0(x),

but the randomness of M is a major obstacle!
Now M = N_{2/n} ≈ n, but it is really nasty to show that

  n^{1/2} sup_{0<x<1} |F_M(x) − F_n(x)| →_P 0.

But then we have

  n^{1/2}(F_n(x) − x) →_D a·B0(x),  with a = (c/2)^{1/2}.