Algorithmic information transfer and its practical use B. Bauwens, L. Boullart, P. Santens
Introduction • Current approach • Tests and results
Problem setting Given: two measured signals x, y. Asked: • Are the underlying sources cooperating? • Does x influence y? • Does y influence x?
Example: how do different brain areas relate?
Goals: The algorithm must infer causal relations: • using little data • for complex interactions • with a notion of confidence • without requiring the end user to set parameters • as objectively as possible
Literature • Granger causality: predict x_t from x_{1…t−1} and a model M_x; predict x_t from x_{1…t−1}, y_{1…t−1} and a model M_xy. If the second prediction is better, y influences x (see the sketch below).
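As an illustration of the Granger test above, here is a minimal sketch (assumptions: ordinary least squares, a fixed lag order p, and residual variance as the prediction error; none of these choices come from the slides).

```python
# Minimal Granger-causality sketch: does adding y's past improve the
# least-squares prediction of x_t? Lag order and error measure are
# illustrative assumptions.
import numpy as np

def granger_improvement(x, y, p=2):
    """Return (error with x's past only, error with the past of x and y)."""
    T = len(x)
    own = [x[p - k - 1:T - k - 1] for k in range(p)]          # x_{t-1..t-p}
    joint = own + [y[p - k - 1:T - k - 1] for k in range(p)]  # plus y_{t-1..t-p}
    target = x[p:]                                            # x_t, t = p..T-1

    def residual_var(cols):
        X = np.column_stack(cols)
        coef = np.linalg.lstsq(X, target, rcond=None)[0]
        return np.var(target - X @ coef)

    return residual_var(own), residual_var(joint)

# Usage: y drives x with a one-step delay, so the joint model fits better.
rng = np.random.default_rng(0)
y = rng.normal(size=2000)
x = np.zeros(2000)
x[1:] = 0.8 * y[:-1] + 0.3 * rng.normal(size=1999)
print(granger_improvement(x, y))   # second variance is clearly smaller
```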
(Shannon) information transfer: fit a probability distribution to (x_t, x_{t+d}, y_t); calculate IT(x ← y) = I(x_{t+d}; y_t | x_t); shuffle y to test significance.
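A minimal sketch of this estimate (assumptions: histogram discretization with a fixed number of bins, lag d = 1, and a simple shuffle count as the significance measure; all illustrative choices).

```python
# Shannon information transfer IT(x <- y) = I(x_{t+d}; y_t | x_t), estimated
# from a discretized 3-D histogram, with a shuffle surrogate test.
import numpy as np

def cond_mutual_info(a, b, c, bins=8):
    """I(a; b | c) in bits, from a histogram estimate of p(a, b, c)."""
    p_abc, _ = np.histogramdd(np.column_stack([a, b, c]), bins=bins)
    p_abc /= p_abc.sum()
    p_ac = p_abc.sum(axis=1)          # marginal over b
    p_bc = p_abc.sum(axis=0)          # marginal over a
    p_c = p_abc.sum(axis=(0, 1))      # marginal over a and b
    cmi = 0.0
    for i, j, k in zip(*np.nonzero(p_abc)):
        cmi += p_abc[i, j, k] * np.log2(
            p_abc[i, j, k] * p_c[k] / (p_ac[i, k] * p_bc[j, k]))
    return cmi

def shannon_it(x, y, d=1, shuffles=100, seed=0):
    rng = np.random.default_rng(seed)
    it = cond_mutual_info(x[d:], y[:-d], x[:-d])
    # Surrogates: shuffling y destroys any coupling but keeps its distribution.
    null = [cond_mutual_info(x[d:], rng.permutation(y[:-d]), x[:-d])
            for _ in range(shuffles)]
    return it, float(np.mean(np.array(null) >= it))   # estimate, p-value
```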
Dependency test • Let d: B × B → N be such that ∑_{x,y} m(x) m(y) 2^d(x,y) < 1 (cf. sum-m test), where m is the universal prior • I(x;y) is a sum-m dependency test • A dependency test d′ dominates d iff for some constant c and for all x, y: d′(x,y) + c > d(x,y), where c does not depend on x and y
Optimality of I(x;y). A test d in a set A of tests is universal for A if it dominates every test in A. • There exists no universal recursive dependency test. • Does a universal upper-enumerable dependency test exist? • Does a universal lower-enumerable dependency test exist? Consider the class of all dependency tests that are enumerable using the halting sequence ξ; its universal element is K(x) + K(y) − K(x,y | ξ) = I(x;y) + I(x,y ; ξ).
Consequence: Assume M: B × B → B is a recursive function. If I(x;y | M(x,y)) is a sum-m dependency test, then I(x;y) − I(x;y | M(x,y)) > −I(x,y ; ξ). (1)
Monotone-conditional algorithmic complexity: K(x | y↑) = min{ |p| : U(p, y_{1…k}) = x_{k+1} for k = 0,…,n−1, and U(p, y_{1…n}) = ε }. One has K(x|y) <_c K(x | y↑) <_c′ K(x) (up to additive constants c, c′). Algorithmic information transfer: IT(x ← y) = K(x) − K(x | y↑)
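K and K(x | y↑) are not computable, so in practice they are replaced by compressor estimates (cf. the Methods slide). A minimal sketch, assuming zlib as a stand-in for K and the concatenation difference C(y + x) − C(y) as a crude proxy for the conditional term; both choices are illustrative and not from the slides.

```python
# Crude compressor-based estimate of IT(x <- y) = K(x) - K(x | y^).
# zlib stands in for K; K(x | y^) is approximated by the extra compressed
# bytes needed for x once y is already known.
import zlib

def clen(data: bytes) -> int:
    return len(zlib.compress(data, 9))

def it_estimate(x: bytes, y: bytes) -> int:
    k_x = clen(x)                        # approximate K(x)
    k_x_given_y = clen(y + x) - clen(y)  # approximate K(x | y^)
    return k_x - k_x_given_y             # > 0 suggests y helps to compress x

# Usage: identical signals, so y's past explains x almost completely.
y = b"the quick brown fox jumps over the lazy dog " * 40
x = y
print(it_estimate(x, y))                 # clearly positive
```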
Decomposition of I(x;y). One has:
I(x;y) >_c IT(x ← y) + IT(y ← x)
I(x;y) = IT(x ← y) + IT(y ← x) + IT(x = y)
IT(x = y) = K(x | y↑) + K(y | x↑) − K(x,y) = I(p; q) − dI(x,y)
0 < dI(x,y) < K(p | x,y) + K(q | x,y) < log |x|
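A short check (not on the slides) that the definition IT(x ← y) = K(x) − K(x | y↑) makes the decomposition an identity, with additive constants suppressed:

```latex
\begin{align*}
IT(x \leftarrow y) + IT(y \leftarrow x) + IT(x = y)
  &= \bigl(K(x) - K(x \mid y\!\uparrow)\bigr)
   + \bigl(K(y) - K(y \mid x\!\uparrow)\bigr) \\
  &\quad + \bigl(K(x \mid y\!\uparrow) + K(y \mid x\!\uparrow) - K(x,y)\bigr) \\
  &= K(x) + K(y) - K(x,y) = I(x;y).
\end{align*}
```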
Is IT(x ← y) as hard to calculate? If x ~ m and y ~ m independently, then Prob{ (x, y) : dI(x, y) > k } < c·k³·2^(−k). If x ~ m and y ~ m independently, then IT(x ← y) and IT(y ← x) are computable from ξ_{1…k} with probability at least 1 − c·k³·2^(−k). Here ξ is the halting sequence (the domain of U).
Comparison with the literature. Granger causality can be stated as IT(x ← y | M(x,y)) = K(x | M(x,y)) − K(x | y↑, M(x,y)); here IT(x ← y) = K(x) − K(x | y↑). In the case of an autoregressive model, M(x,y) are the regression coefficients. • This is not a sum-m dependency test (condition: K(M|x) + K(M|y) − K(M) > c)
Differences: • Predictability improvement = better data compression • The complexity of the model is incorporated. Advantages: • Arbitrarily complex models can be evaluated in the same framework. • If the estimate of K is sufficiently good, we have a sum-m test. • The IT is more objective (one can use different models at the same time and compare them naturally).
Testing Artificial data coming from: • Coupled oscillators (see the simulation sketch below): dx/dt = ω₁ + e₁ sin(y − x) + n₁, dy/dt = ω₂ + e₂ sin(x − y) + n₂ • Rössler system. Number of samples: 1 000, 10 000, 100 000.
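A minimal sketch of generating the coupled-oscillator data by Euler integration; the step size, frequencies, coupling strengths and noise levels are illustrative choices, not the values used in the experiments.

```python
# Euler integration of the coupled phase oscillators:
#   dx/dt = w1 + e1*sin(y - x) + n1,   dy/dt = w2 + e2*sin(x - y) + n2
import numpy as np

def coupled_oscillators(n_samples, w1=1.0, w2=1.3, e1=0.0, e2=0.5,
                        noise=0.1, dt=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x = np.empty(n_samples)
    y = np.empty(n_samples)
    x[0], y[0] = rng.uniform(0, 2 * np.pi, 2)
    for t in range(n_samples - 1):
        x[t + 1] = x[t] + dt * (w1 + e1 * np.sin(y[t] - x[t])) \
                   + noise * np.sqrt(dt) * rng.normal()
        y[t + 1] = y[t] + dt * (w2 + e2 * np.sin(x[t] - y[t])) \
                   + noise * np.sqrt(dt) * rng.normal()
    return x, y

# e1 = 0 and e2 > 0: x drives y, so an asymmetric transfer IT(y <- x) is expected.
x, y = coupled_oscillators(10_000)
```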
Methods Compressors were built using (a minimal predictor-to-code-length sketch follows below): • Naive Bayesian prediction • Linear regression • Recurrent neural networks • Support vector machines • Lempel-Ziv algorithm • Combinations of the above
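How a predictor becomes a compressor: the ideal code length of a sequence is the sum of −log₂ p(next symbol | context), the cost an arithmetic coder would approach. The sketch below uses a Laplace-smoothed frequency model as an illustrative stand-in for the predictors listed above; it is not the implementation from the slides.

```python
# Turn any next-symbol predictor into a code-length (compressor) estimate.
import numpy as np

def code_length(seq, predict, alphabet):
    """Total code length in bits of seq under a predictive model."""
    bits = 0.0
    for t, s in enumerate(seq):
        p = predict(seq[:t], alphabet)     # distribution over the next symbol
        bits += -np.log2(p[s])
    return bits

def laplace_predict(context, alphabet):
    counts = {a: 1 for a in alphabet}      # add-one smoothing
    for s in context:
        counts[s] += 1
    total = sum(counts.values())
    return {a: counts[a] / total for a in alphabet}

# K(x) is approximated by code_length(x, ...); conditional lengths such as
# K(x | y^) would use a predictor that also sees y's past.
print(code_length("00100100", laplace_predict, alphabet="01"))
```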
Results Compared to the Shannon information transfer: • For 10 000 and 100 000 samples there was perfect agreement. • For 1 000 samples most compressors showed a bias towards inferring causality from the simplest signal; the reason is not fully understood.