Language Learning Week 7
Pieter Adriaans: pietera@science.uva.nl
Sophia Katrenko: katrenko@science.uva.nl
Coming weeks
• Central question: why does MDL work?
• Problem: complexity theory by itself does not explain this
• Study computation as a physical process
• Merge: information theory, thermodynamics, complexity theory, learning theory
Claims
• Information theory is a fundamental science
• Nature is a sloppy implementation of information theory
• Learnability is a thermodynamic issue
• Our brain is a data compression machine
Some issues
• First and Second law of thermodynamics, Landauer's principle, Turing machines, Universal Turing machines, uncomputable numbers, diagonalization, the Halting set, recursive sets, recursively enumerable sets, dovetailing computations, Kraft's inequality, Kolmogorov complexity, randomness deficiency, Minimum Description Length, Shannon information, entropy, free energy, intensive and extensive datasets.
Research Program
• Study the learning capacities of human beings in terms of data compression
• Identify the biases that make the process efficient
JPEG compression: 32 K = Order, 639 K = Chaos, 132 K = Facticity
[Figure: the same image at 25%, 50%, 75%, and 100% noise]
Two-part code optimization: Data = Theory + Theory(Data)
[Figure: the image decomposed as structure plus 25%, 50%, and 75% noise; at 100% noise the data is pure noise and the theory part is empty]
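A minimal illustration of this idea, using zlib compression as a crude, computable stand-in for an ideal two-part code; the regular signal and the noise levels below are made up for illustration:

```python
# Sketch: compressed size as a proxy for description length, assuming zlib
# as a crude stand-in for an ideal two-part code (signal and noise levels
# are invented for illustration).
import random
import zlib

def code_length(data: bytes) -> int:
    """Approximate description length in bits via zlib."""
    return 8 * len(zlib.compress(data, 9))

random.seed(0)
signal = bytes(i % 16 for i in range(4096))   # highly regular, "theory-like" data

print("clean signal:", code_length(signal), "bits")
for noise in (0.25, 0.50, 0.75, 1.00):
    corrupted = bytes(random.randrange(256) if random.random() < noise else b
                      for b in signal)
    print(f"{int(noise * 100):3d}% noise:", code_length(corrupted), "bits")
# The regular part stays cheap to describe; the corrupted part is incompressible,
# so the total code length grows with the noise level.
```

The code length increases with the fraction of corrupted bytes, mirroring the JPEG file sizes on the surrounding slides: noise ends up in the data-given-theory part of the code and cannot be compressed away.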
[Figure: three JPEG images, file sizes 8 KB, 7 KB, and 7 KB]
First law of thermodynamics
• The increase in the internal energy (dU) of a thermodynamic system is equal to the amount of heat added to the system (Q) minus the work done by the system on its surroundings (W).
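In symbols, with the sign convention used above (heat added to the system and work done by the system both counted positive):

```latex
% First law of thermodynamics (differential form)
dU = \delta Q - \delta W
```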
Second law of thermodynamics
• The entropy S of an isolated system not in equilibrium will tend to increase over time, approaching a maximum value at equilibrium.
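In symbols, for an isolated system:

```latex
% Second law of thermodynamics: entropy is non-decreasing for an isolated system
\frac{dS}{dt} \ge 0, \qquad S \to S_{\max} \ \text{at equilibrium}
```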
Landauer's Principle (1961)
• "Any logically irreversible manipulation of information, such as the erasure of a bit or the merging of two computation paths, must be accompanied by a corresponding entropy increase in non-information-bearing degrees of freedom of the information-processing apparatus or its environment."
• Specifically, each bit of lost information leads to the release of at least kT ln 2 of heat.
Boltzmann constant
• k or kB is the physical constant relating temperature to energy.
• k = 1.380 6505(24) × 10^-23 joule/kelvin
• Sloppy?
• Landauer's principle criticized
• Bennett (1973), reversible computing
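To get a feel for the scale of Landauer's bound, here is a quick back-of-the-envelope computation; the room temperature of 300 K is an assumption chosen for illustration:

```python
# Landauer bound: minimum heat released per erased bit, k*T*ln(2),
# evaluated at an assumed room temperature of 300 K.
import math

k_B = 1.3806505e-23            # Boltzmann constant in J/K (value from the slide)
T = 300.0                      # assumed temperature in kelvin

energy_per_bit = k_B * T * math.log(2)
print(f"Landauer limit at {T:.0f} K: {energy_per_bit:.2e} J per erased bit")
# Prints roughly 2.87e-21 J -- far below what present-day hardware dissipates per bit.
```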
What is a computer?
The mathematician Alan Turing developed the notion of a Turing machine. The Turing machine manipulates symbols the same way a mathematician would behind his desk.
[Figure: a mathematician taking data from an in-tray, manipulating it, and putting it in an out-tray, next to a computer with input and output; the Turing machine is an abstract model of a mathematician]
Principles of a Turing machine
The tape consists of squares containing symbols that can be read and written by the read/write head of the Turing machine.
Example tape (b = blank): ... b 1 0 1 0 0 b ...
Schematic representation of a Turing machine
[Figure: a read/write head over the tape, connected to a finite control holding the current state and the program]
An example of a simple DTM program
The program is given as a matrix: rows are states, columns are the symbol under the read/write head, and each entry gives (new state, symbol written, head move).

state   read 0     read 1     read b
q0      q0,0,+1    q0,1,+1    q1,b,-1
q1      q2,b,-1    q3,b,-1    qN,b,-1
q2      qY,b,-1    qN,b,-1    qN,b,-1
q3      qN,b,-1    qN,b,-1    qN,b,-1

Example step: the machine is in state q0 and the read/write head reads a 0; it writes a 0, moves one place to the right (+1), and stays in state q0.
Tape: b b 1 0 1 0 0 b b b, with the machine in state q0 and the head on the input.
This program accepts strings that end in '00' (qY = accept, qN = reject); a runnable sketch of this machine follows below.
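The following is a minimal Python simulation of exactly this transition table (the sparse-tape representation and the helper names are my own):

```python
# Simulator for the DTM above: accepts binary strings that end in '00'.
# Transition table copied from the slide; qY accepts, qN rejects.
DELTA = {
    ('q0', '0'): ('q0', '0', +1), ('q0', '1'): ('q0', '1', +1), ('q0', 'b'): ('q1', 'b', -1),
    ('q1', '0'): ('q2', 'b', -1), ('q1', '1'): ('q3', 'b', -1), ('q1', 'b'): ('qN', 'b', -1),
    ('q2', '0'): ('qY', 'b', -1), ('q2', '1'): ('qN', 'b', -1), ('q2', 'b'): ('qN', 'b', -1),
    ('q3', '0'): ('qN', 'b', -1), ('q3', '1'): ('qN', 'b', -1), ('q3', 'b'): ('qN', 'b', -1),
}

def accepts(w: str) -> bool:
    """Run the machine on input w, head starting on the leftmost symbol."""
    tape = dict(enumerate(w))          # sparse tape; every other square is blank 'b'
    state, head = 'q0', 0
    while state not in ('qY', 'qN'):
        state, write, move = DELTA[(state, tape.get(head, 'b'))]
        tape[head] = write
        head += move
    return state == 'qY'

for w in ['100100', '1001', '00', '0', '']:
    print(repr(w), accepts(w))         # True exactly for strings ending in '00'
```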
Turing machines
• An enumeration of Turing machines
• Tx(y): Turing machine x with input y
• Universal Turing machine Ui(yx)
• Tx(y) is defined if Tx stops on input y in an accepting state after a finite number of steps
• Minsky: there is a universal Turing machine with 7 states and 4 tape symbols
Uncomputable numbers
• Suppose there is a recursive function g with g(x,y) = 1 if Tx(y) is defined, and 0 otherwise
• Since g is recursive, there is a Turing machine r such that Tr(y) = 1 if g(y,y) = 0, and Tr(y) is undefined (r loops) if g(y,y) = 1
• But then Tr(r) is defined iff g(r,r) = 0 iff Tr(r) is undefined
• Paradox: ergo g(x,y) cannot be recursive
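The same diagonal argument in program form; would_halt is a hypothetical total predicate playing the role of g, so the stub below only marks the assumption and is not an actual decision procedure:

```python
# Sketch of the diagonalization: assume a total, computable would_halt existed.
def would_halt(program, data) -> bool:
    """Hypothetical stand-in for g: True iff program halts on data."""
    raise NotImplementedError("no such total computable predicate can exist")

def r(y):
    """The diagonal machine: defined on y exactly when would_halt(y, y) is False."""
    if would_halt(y, y):
        while True:        # diverge when the prediction says "halts"
            pass
    return 1               # halt when the prediction says "does not halt"

# Applying r to (an encoding of) itself: r(r) halts  <=>  would_halt(r, r) is False
# <=>  r(r) does not halt -- a contradiction, so would_halt cannot be computable.
```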
Recursive sets, recursively enumerable sets
• A set A is recursively enumerable iff it is accepted by some Turing machine Tx, i.e. Tx stops in an accepting state for each element of A, but not necessarily for elements in the complement of A
• A is recursive iff some Tx stops for every element of A in an accepting state and for every element in the complement of A in a non-accepting state
Halting Set
• Halting set: K0 = { <x,y> : Tx(y) < ∞ }
• Diagonalization (Cantor)
• Dovetailing computations
• Church-Turing thesis: the class of algorithmically computable numerical functions coincides with the class of partial recursive functions
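Dovetailing is easiest to see in code. The "machines" below are toy Python generators standing in for real Turing machine runs (the halting pattern is made up for illustration), but the interleaving schedule is the actual dovetailing idea:

```python
# Dovetailing: in round k, start machine k and advance machines 0..k by one step
# each, so every halting computation is eventually discovered even though some
# machines loop forever.
def machine(n):
    """Toy stand-in for T_n on its own index: halts iff n is even."""
    step = 0
    while True:
        yield step                     # one computation step
        step += 1
        if n % 2 == 0 and step > n:    # even-indexed machines halt eventually
            return

def dovetail(rounds):
    running, halted = {}, []
    for k in range(rounds):
        running.setdefault(k, machine(k))      # start machine k in round k
        for n, m in list(running.items()):
            try:
                next(m)                        # advance by one step
            except StopIteration:
                halted.append(n)               # n enters the enumeration of K0
                del running[n]
    return halted

print(sorted(dovetail(50)))   # small even indices appear; odd ones never will
```

This is exactly why the halting set K0 is recursively enumerable but not recursive: membership is eventually confirmed by dovetailing, but non-membership is never confirmed.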