1 / 17

A Syntactic Justification of Occam’s Razor

A Syntactic Justification of Occam’s Razor. John Woodward, Andy Evans, Paul Dempster Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学. Email: andy.evans@nottingham.edu.cn john.woodward@nottingham.edu.cn paul.dempster@nottingham.edu.cn. Overview. Occam’s Razor

kostya
Download Presentation

A Syntactic Justification of Occam’s Razor

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Syntactic Justification ofOccam’s Razor John Woodward, Andy Evans, Paul Dempster Foundations of Reasoning Group University of Nottingham Ningbo, China 宁波诺丁汉大学 Email: andy.evans@nottingham.edu.cn john.woodward@nottingham.edu.cn paul.dempster@nottingham.edu.cn

  2. Overview • Occam’s Razor • Sampling of Program Spaces (Langdon) • Definitions • Assumptions • Proof • Further Work • Context

  3. Occam’s Razor • Occam’s Razor says has been adopted by the machine learning community to mean; • “Given two hypotheses which agree with the observed data, pick the simplest, as this is more likely to make the correct predictions”

  4. Definitions Program Hypothesis Size • Function • Set of predictions (concept) • Complexity

  5. Langdon 1 (Foundation of Genetic Programming) 1. The limiting distribution of functions is independent of program size! There is a correlation between the frequency in the limiting distribution and the complexity of a function.

  6. Langdon 2 (Foundation of Genetic Programming)

  7. Hypothesis-Concept Spaces

  8. Notation • P is the hypothesis space (i.e. a set of programs). • |P| is the size of the space (i.e. the cardinality of the set of programs). • F is the concept space (i.e. a set of functions represented by the programs in P). • |F| is the size of the space (i.e. the cardinality of the set of functions). • If two programs pi and pj map to the same function (i.e. they are interpreted as the same function, I(pi)=f=I(pj)), they belong to the same equivalence class (i.e. pi is in [pj] ↔ I(pi)=I(pj)). The notation [p] denotes the equivalence class which contains the program p (i.e. given I(pi)=I(pj),[pi]=[pj]). The size of an equivalence class [p] is denoted by |[p]|.

  9. Two assumptions • Uniformly sample the hypothesis space, probability of sampling a given program is 1/|P|. • There are fewer hypotheses that represent complex functions • |[p1]|>|[p2]|↔c(f1)<c(f2), where I(p1)=f1 and I(p2)=f2. • Note that |[p1]|/|P| = p(I(p1))=p(f1), that is |[p1]|/|P|=p(f1) • i.e. the probability of sampling a function is given by the ratio of the size of the equivalence class containing all the programs which are interpreted that function, divided by the size of the hypothesis space.

  10. Proof • starting from a statement of the assumption • |[p1]|>|[p2]| ↔c(f1)< c(f2) • Dividing the left hand side by |P|, • |[p1]|/|P|>|[p2]|/|P| ↔c(f1)< c(f2) • As |[p1]|/|P| = p(I(p1)) =p(f1), we can rewrite as • p(f1)>p(f2) ↔c(f1)< c(f2) • a mathematical statement of Occam’s razor.

  11. Restatement of Occam’s Razor • Often stated as “prefer the shortest consistent hypothesis” • Restatement of Occam’s Razor: The preferred function is the one that is represented most frequently. The equivalence class which contains the shortest program is represented most frequently.

  12. Summary • Occam’s razor states “pick the simplest hypothesis consistent with data” • We agree, but for a different reason. • Restatement. Pick the function that is represented most frequently (i.e. belongs to the largest equivalence class). Occam’s razor is concerned with probability, and we present a simple counting argument. • Unlike many interpretations of Occam’s razor we do not throw out more complex hypotheses we count them in [p]. • We offer no reason to believe the world is simple, our razor only gives a reason to predict using the simplest hypothesis.

  13. further work To prove Assumption 2 • there are fewer hypotheses that represent complex functions: |[p1]|>|[p2]|↔c(f1)<c(f2), • Why are some functions represented more frequently that other functions. • The base functions may contain functions which are: • Symmetrical i.e. f(x, y) = f(y, x), e.g. nand. • Complementary, i.e. f(g(x))= g(f(x)) e.g. inc and dec.

  14. Further work • Further work -> to prove out assumptions. • Does it depend on the primitive set??? • How are the primitive linked together (e.g. tree, lists, directed acyclic graphs…)

  15. How does nature compute? • Heuristics such as Occam’s razor need not be explicitly present as rules. • Random searches of an agents generating capacity may implicitly carry heuristics. • Axiomatic reasoning probably comes late.

  16. Thanks & Questions? • Thomas M. Cover and Joy A. Thomas. Elements of information theory. John Wiley and Sons 1991. • Michael J. Kearns and Umesh V. Vazirani. An introduction to computational learning theory. MIT Press, 1994. • William B. Langdon. Scaling of program fitness spaces. Evolutionary Computation, 7(4):399-428,1999. • Tom M. Mitchell. Machine Learning. McGraw-Hill 1997. • S. Russell and P. Norvig. Artificial Intelligence: A modern approach. Prentice Hall, 1995. • G. I. Webb. Generality is more significant than complexity: Toward an alternative to occam’s razor. In 7th Australian Joint Conference on Artificial Intelligence – Artificial Intelligence: Sowing the Seeds for the Future, 60-67, Singapore, 1994, World Scientific. • Ming Li and Paul Vitanyi . An Introduction to Kolmogorov Complexity and Its Applications (2nd Ed.). Springer Verlag.

More Related