Slightly beyond Turing’s computability for studying Genetic Programming Olivier Teytaud, Tao, Inria, Lri, UMR CNRS 8623, Univ. Paris-Sud, Pascal, Digiteo
Outline • What is genetic programming • Formal analysis of Genetic Programming • Why is there nothing else than Genetic Programming ? • Computability point of view • Complexity point of view
What is Genetic Programming (GP) • GP = mining Turing-equivalent spaces of functions • Typical example: symbolic regression. • Inputs: • x1,x2,x3,…,xN in {0,1}* • y1,y2,y3,…,yN in {0,1}, with yi=f(xi) • (xi,yi) assumed independently identically distributed (unknown probability distribution) • Goal: • Finding g such that E|g(x)-y| + C·E Time(g,x) is as small as possible
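The score guiding the search can be estimated from the examples alone. Below is a minimal Python sketch; run_with_timeout is a hypothetical interpreter (an assumption, not part of the slides) that executes a candidate program g on an input x and returns its output together with the time spent.

def empirical_score(g, examples, C, run_with_timeout):
    """Empirical estimate of E|g(x)-y| + C * E[Time(g,x)] over the examples (xi, yi)."""
    total_error, total_time = 0.0, 0.0
    for x, y in examples:
        output, elapsed = run_with_timeout(g, x)   # hypothetical interpreter
        total_error += abs(output - y)             # |g(x) - y|, outputs coded as 0/1
        total_time += elapsed
    n = len(examples)
    return total_error / n + C * total_time / n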
How does GP work ? • GP = evolutionary algorithm. • Evolutionary algorithm: • P = initial population • While (my favorite criterion) • Selection = best functions in P according to some score • Mutations = random perturbations of programs in the Selection • Cross-over = merging of programs in the Selection • P ≈ Selection + Mutations + Cross-over (a code sketch of this loop is given after the next slides)
How does GP work ? Does it work ? Definitely, yes: for robust and multimodal optimization in complex domains (trees, bitstrings, …).
How does GP work ? Which score should be used ? A nice question for mathematicians.
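A minimal Python sketch of the evolutionary loop above; random_program, score, mutate and crossover stand for the problem-specific operators and are assumptions passed in by the caller.

import random

def genetic_programming(random_program, score, mutate, crossover,
                        pop_size=100, n_selected=20, n_generations=50):
    # P = initial population
    population = [random_program() for _ in range(pop_size)]
    for _ in range(n_generations):                     # "my favorite criterion"
        # Selection = best programs in P according to some score (lower is better)
        selection = sorted(population, key=score)[:n_selected]
        # Mutations = random perturbations of programs in the Selection
        mutations = [mutate(random.choice(selection)) for _ in range(pop_size // 2)]
        # Cross-over = merging of programs in the Selection
        children = [crossover(random.choice(selection), random.choice(selection))
                    for _ in range(pop_size - n_selected - len(mutations))]
        # P ≈ Selection + Mutations + Cross-over
        population = selection + mutations + children
    return min(population, key=score)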
Why study GP ? • GP is studied by many people • 5440 articles in the GP bibliography [5] • More than 880 authors • GP seemingly works • Human-competitive results http://www.genetic-programming.com/humancompetitive.html • Nothing else exists for mining Turing-equivalent spaces of programs • Probably better than random search • Not so many mathematical foundations in GP • Not so many open problems in computability, in particular with applications
Outline • What is genetic programming • Formal analysis of Genetic Programming • Why is there nothing else than Genetic Programming ? • Computability point of view • Complexity point of view
Formalization of GP What is typical of GP ? • No halting criterion: we stop when time is exhausted. • No use of prior knowledge; no use of f, even when you know it. People (often) do not like GP because: • It is slow and has no halting criterion • It uses the yi=f(xi) and not f (different from automatic code generation) Are these two elements necessary ?
Formalization of GP Summary: GP uses only the f(xi) and the Time(f,xi). GP never halts: it outputs a sequence of programs O1, O2, O3, … Can we do better ?
Outline • What is genetic programming • Formal analysis of Genetic Programming • Why is there nothing else than Genetic Programming ? • Computability point of view • Complexity point of view
Known results Whenever f is available (and not only the f(xi) ), computing O such that • O≡f • O optimal for size (or speed, or space …) is not possible. (i.e. there’s no Turing machine performing that task for all f)
A first (easy) good reason for GP. Whenever f is not available (only the examples yi=f(xi) are), computing O1, O2, …, such that • Op ≡ f for p sufficiently large • lim size(Op) is optimal is possible, with proved convergence rates, e.g. by bloat penalization: - consider a population of programs; set n=1 - while (true) - select the best program P for a compromise between relevance on the n first examples and a penalization of size, e.g. Sum_{i<n} |P(xi)-yi| + C(|P|, n) - n=n+1 (see details of the proof and of the algorithm in the paper)
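A minimal sketch of this bloat-penalized selection, under stated assumptions: evaluate(P, x) and size(P) are hypothetical helpers, the candidate set `programs` is kept fixed rather than evolved, the loop is truncated to n_rounds (the original procedure loops forever), and the penalty C(|P|, n) = size(P)/sqrt(n) is purely illustrative (the paper specifies its own choice of C).

import math

def bloat_penalized_selection(programs, examples, evaluate, size, n_rounds=1000):
    """Returns a finite prefix O_1, ..., O_n_rounds of the sequence of selected programs."""
    best_per_round = []
    for n in range(1, n_rounds + 1):
        def penalized_score(P):
            # compromise: relevance on the n first examples + penalization of size
            error = sum(abs(evaluate(P, x) - y) for x, y in examples[:n])
            return error + size(P) / math.sqrt(n)     # illustrative C(|P|, n)
        best_per_round.append(min(programs, key=penalized_score))   # O_n
    return best_per_round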
A first (easy) good reason for GP. Asymptotically (only!), finding an optimal function O ≡ f is possible. No halting criterion is possible (having one would require an oracle for 0', the halting problem).
Outline • What is genetic programming • Formal analysis of Genetic Programming • Why is there nothing else than Genetic Programming ? • Computability point of view • Complexity point of view: • Kolmogorov’s complexity with bounded time • Application to genetic programming
Kolmogorov’s complexity • Kolmogorov’s complexity of x : Minimum size of a program generating x • Kolmogorov’s complexity of x with time at most T : Minimum size of a program generating x in time at most T. Kolmogorov’s complexity in bounded time = computable.
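Why it is computable: enumerate programs by increasing size and simulate each one for at most T steps. A minimal sketch, where run(p, T) and all_programs_of_size(s) are hypothetical helpers (an interpreter that returns the program's output, or None if it does not halt within T steps, and an enumerator over program encodings).

def bounded_time_kolmogorov(x, T, run, all_programs_of_size, max_size=64):
    """Minimum size of a program generating x in at most T steps (None if none found up to max_size)."""
    for s in range(1, max_size + 1):              # programs by increasing size
        for p in all_programs_of_size(s):
            if run(p, T) == x:                    # p generates x within T steps
                return s
    return None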
Outline • What is genetic programming • Formal analysis of Genetic Programming • Why is there nothing else than Genetic Programming ? • Computability point of view • Complexity point of view: • Kolmogorov’s complexity with bounded time • Application to genetic programming
Kolmogorov’s complexity and genetic programming • GP uses expensive simulations of programs • Can we get rid of the simulation time ? e.g. by using f not only as a black box ? • Essentially, no: • Example of a GP problem: finding O as small as possible with • E Time(O,x) < Tn, • |O| < Sn • O(x) = y • If Tn = Ω(2^n) and some Sn = O(log n), this requires time at least Tn/polynomial(n) • Just simulating all programs shorter than Sn and « faster » than Tn is possible in time polynomial(n)·Tn
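The matching upper bound is plain exhaustive search: simulate every program of size below Sn with a time budget of Tn per run, and keep one that fits the examples. A minimal sketch, where run_on(p, x, T) and all_programs_of_size(s) are hypothetical helpers (an interpreter applying program p to input x for at most T steps, and an enumerator over program encodings).

def brute_force_gp(examples, S_n, T_n, run_on, all_programs_of_size):
    """Smallest program of size < S_n mapping every xi to yi within T_n steps (None if none)."""
    for s in range(1, S_n):
        for p in all_programs_of_size(s):
            if all(run_on(p, x, T_n) == y for x, y in examples):
                return p
    return None

With Sn = O(log n), the outer enumeration covers only polynomially many programs, which is where the polynomial(n)·Tn bound comes from.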
Outline • What is genetic programming • Formal analysis of Genetic Programming • Why is there nothing else than Genetic Programming ? • Computability point of view • Complexity point of view: • Kolmogorov’s complexity with bounded time • Application to genetic programming • Conclusion
Conclusion • Summary • GP typically solves, approximately, problems in 0’ • There is a lot of work on approximating NP-complete problems, but not much on 0’ • We provide a theoretical analysis of GP • Conclusions: • GP uses expensive simulations, but the simulation cost cannot be removed anyway. • GP has no halting criterion, but no halting criterion can be found. • Also, « bloat » penalization ensures consistency; this suggests a parametrization of the usual algorithms.