This presentation offers an alternative explanation of Ockham's Razor, examining how a bias toward simple theories could be truth-conducive at all. It critiques the two standard accounts, prior simplicity bias (Bayes, BIC, MDL, MML) and risk minimization (SRM, AIC, cross-validation): prior-based explanations are argued to be circular, convergence in the limit fails to single out Ockham's razor, and minimizing estimation risk need not lead to the true theory. The argument is illustrated with causal data mining, where inferred causal conclusions can flip repeatedly as more variables and larger samples arrive, and it closes by proposing to locate the justification of Ockham's razor in the demon's power to force such reversals.
Simplicity and Truth: An Alternative Explanation of Ockham's Razor
Kevin T. Kelly and Conor Mayo-Wilson
Department of Philosophy, Joint Program in Logic and Computation, Carnegie Mellon University
www.hss.cmu.edu/philosophy/faculty-kelly.php
Ockham Says: Choose the Simplest!
But Why? Gotcha!
Puzzle • A reliable indicator must be sensitive to what it indicates: it should point to "simple" when the truth is simple and to "complex" when the truth is complex.
Puzzle • But Ockham’s razor always points at simplicity, whether the truth is simple or complex.
Puzzle • How can a broken compass help you find something unless you already know where it is?
Standard Accounts • 1. Prior Simplicity Bias: Bayes, BIC, MDL, MML, etc. • 2. Risk Minimization: SRM, AIC, cross-validation, etc.
1. Prior Simplicity Bias The simple theory is more plausible now because it was more plausible yesterday.
More Subtle Version • Simple data are a miracle in the complex theory but not in the simple theory. Regularity: retrograde motion of Venus at solar conjunction. In the simple theory C it has to be so; in the complex theory P it holds only under a special parameter setting.
However… • The evidence e would not be a miracle given P(θ), the complex theory with its parameter tuned to produce the regularity. Why not compare C with P(θ) instead?
The Real Miracle
Ignorance about the model: p(C) ≈ p(P)
+ Ignorance about the parameter setting: p(P(θ) | P) ≈ p(P(θ′) | P)
= Knowledge about C vs. P(θ): p(P(θ)) << p(C).
Lead into gold. Perpetual motion. Free lunch. Ignorance is knowledge. War is peace. I love Big Bayes.
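For concreteness, a toy version of that arithmetic (my own illustration, assuming a finite grid of N parameter settings; not from the slides): if the prior splits evenly between the simple theory C and the complex theory P, and P's mass splits evenly over its settings, then

$$p(C) = p(P) = \tfrac{1}{2}, \qquad p\big(P(\theta_i)\mid P\big) = \tfrac{1}{N} \;\Longrightarrow\; p\big(P(\theta_i)\big) = \tfrac{1}{2N} \ll \tfrac{1}{2} = p(C),$$

so the "ignorant" prior has quietly built in a strong bias against every tuned version of the complex theory.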
Standard Paradox of Indifference
Ignorance of red vs. not-red
+ Ignorance over not-red
= Knowledge about red vs. white.
Knognorance = all the privileges of knowledge with none of the responsibilities. Yeah!
The Ellsberg Paradox • An urn from which red is drawn with known probability 1/3, while the remaining two colors occur in unknown proportions (? ?).
Human Preference • Bet a, which wins on the known-probability (1/3) color, is preferred to bet b, which wins on a color of unknown probability; yet the combined bet b-or-c is preferred to a-or-c.
Human View • In each comparison the preferred bet is the one whose winning probability is known; the rejected bet rests on ignorance.
Bayesian View • For the Bayesian, both bets in each comparison are just knognorance, so a single prior must rank the two comparisons consistently.
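A short reconstruction of why no single prior fits the typical preferences, assuming the standard Ellsberg setup the slide's urn suggests (red drawn with probability 1/3; bets a, b, c on red, black, and yellow respectively; this labeling is my assumption). Writing q for the prior probability of black:

$$a \succ b \;\Rightarrow\; \tfrac{1}{3} > q, \qquad (b \lor c) \succ (a \lor c) \;\Rightarrow\; q + p(\text{yellow}) > \tfrac{1}{3} + p(\text{yellow}) \;\Rightarrow\; q > \tfrac{1}{3},$$

so the two preferences together are incoherent for any single prior, which is the Bayesian's complaint against the human pattern.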
In Any Event The coherentist foundations of Bayesianism have nothing to do with short-run truth-conduciveness. Not so loud!
Bayesian Convergence • Too-simple theories get shot down… [Figure: updated opinion plotted over theories ordered by complexity]
Bayesian Convergence • Plausibility is transferred to the next-simplest theory… Plink! Blam!
Bayesian Convergence • The true theory is nailed to the fence. Zing!
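A minimal simulation sketch of this picture (my own illustration; the nested "uniform on {1, ..., k}" theories and the simplicity-biased prior are assumed for concreteness, not taken from the slides): too-simple theories are refuted by the data and posterior mass hops to the next-simplest survivor.

```python
import numpy as np

# Theories T_1 ... T_K: "outcomes are uniform on {1, ..., k}".  Smaller k = simpler.
K = 5
prior = np.array([2.0 ** -k for k in range(1, K + 1)])
prior /= prior.sum()                      # simplicity-biased prior

def posterior(data, prior):
    post = prior.copy()
    for x in data:
        # T_k assigns probability 1/k to each outcome it allows, 0 to anything larger.
        like = np.array([1.0 / k if x <= k else 0.0 for k in range(1, K + 1)])
        post = post * like
        post /= post.sum()
    return post

data = [1, 1, 2, 1, 3, 2, 3]              # complexity is revealed gradually
for t in range(1, len(data) + 1):
    print(data[:t], posterior(data[:t], prior).round(3))
# Each time a new outcome appears, the current favorite is "shot down" (posterior 0)
# and its plausibility transfers to the next-simplest theory consistent with the data.
```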
Convergence • But alternative strategies also converge: • Anything in the short run is compatible with convergence in the long run.
Summary of Bayesian Approach • Prior-based explanations of Ockham’s razor are circular and based on a faulty model of ignorance. • Convergence-based explanations of Ockham’s razor fail to single out Ockham’s razor.
2. Risk Minimization • Ockham’s razor minimizes expected distance of empirical estimates from the true value.
Unconstrained Estimates • Centered on the truth but widely spread around it.
Constrained Estimates • Off-center but less spread • Overall improvement in expected distance from the truth…
Doesn’t Find the True Theory • The theory that minimizes estimation risk can be quite false.
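A small simulation sketch of the trade-off (my own illustration; the specific numbers are assumed): clamping the estimate to a simple value trades spread for bias and can win on expected squared distance even though the simple "theory" (parameter exactly 0) is false.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = 0.2                               # true parameter: close to 0, but not 0
n, trials = 10, 100_000
samples = rng.normal(truth, 1.0, size=(trials, n))

unconstrained = samples.mean(axis=1)      # sample mean: centered on truth, but spread out
clamped = np.zeros(trials)                # "clamped" estimate: always reports 0

print("MSE unconstrained:", np.mean((unconstrained - truth) ** 2))   # ~ 1/n = 0.10
print("MSE clamped:      ", np.mean((clamped - truth) ** 2))         # = 0.04
# The clamped estimator has lower risk, yet the hypothesis it embodies (parameter = 0) is false.
```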
Makes Sense • …when the loss of an answer is similar across nearby distributions. Close is good enough! [Figure: loss plotted against similarity to the true distribution p]
But Truth Matters • …when the loss of an answer is discontinuous in similarity. Close is no cigar!
E.g. Science If you want true laws, false laws aren’t good enough.
E.g. Science You must be a philosopher. This is a machine learning conference.
E.g., Causal Data Mining • [Figure: a causal graph over Protein A, Protein B, Protein C, and a cancer protein] Now you’re talking! I’m on a cilantro-only diet to get my protein C level under control. Practical enough?
Central Idea • Correlation does imply causation if there are multiple variables, some of which are common effects. [Pearl, Spirtes, Glymour and Scheines]
Core Assumptions • Joint distribution p is causally compatible with directed acyclic graph G iff: • Causal Markov Condition: each variable X is independent of its non-effects given its immediate causes. • Faithfulness Condition: no other conditional independence relations hold in p.
Tell-tale Dependencies • [Figure: a causal graph over C, H, F, F1, F2] • Given F, H gives some information about C (Faithfulness). • Given C, F1 gives no further information about F2 (Markov).
Common Applications • Linear Causal Case: each variable X is a linear function of its parents and a normally distributed hidden variable called an “error term”. The error terms are mutually independent. • Discrete Multinomial Case: each variable X takes on a finite range of values.
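A minimal linear-Gaussian sketch of the previous slide's two tell-tale dependencies, in the style of the linear causal case above (my own illustration: the particular graph, with C a common cause of F1 and F2 and F a common effect of C and H, plus all coefficients, are assumptions, not taken from the slides).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed linear causal model with independent Gaussian error terms:
# C -> F1, C -> F2 (common cause), and C -> F <- H (common effect).
C  = rng.normal(size=n)
H  = rng.normal(size=n)
F1 = 0.8 * C + rng.normal(size=n)
F2 = 0.8 * C + rng.normal(size=n)
F  = 0.7 * C + 0.7 * H + rng.normal(size=n)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out z."""
    rx = x - np.polyval(np.polyfit(z, x, 1), z)
    ry = y - np.polyval(np.polyfit(z, y, 1), z)
    return np.corrcoef(rx, ry)[0, 1]

print(np.corrcoef(H, C)[0, 1])    # ~0: H and C are marginally independent
print(partial_corr(H, C, F))      # nonzero: given F, H gives some info about C (Faithfulness)
print(np.corrcoef(F1, F2)[0, 1])  # nonzero: F1 and F2 are marginally dependent
print(partial_corr(F1, F2, C))    # ~0: given C, F1 gives no further info about F2 (Markov)
```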
A Very Optimistic Assumption • No unobserved latent confounding causes (e.g., no hidden genetic factor causing both smoking and cancer). I’ll give you this one. What’s he up to?
Current Nutrition Wisdom • [Figure: the inferred causal graph over Protein A, Protein B, Protein C, and the cancer protein] English Breakfast? Are you kidding? It’s dripping with Protein C!
As the Sample Increases… • [Figure: with Protein D added, a weak link appears in the inferred graph] This situation approximates the last one, so who cares? I do! Out of my way!
As the Sample Increases Again… • [Figure: with Protein E added, several weak links appear and the inferred graph changes again] Wasn’t that last approximation to the truth good enough? Aaack! I’m poisoned!
Causal Flipping Theorem • No matter what a consistent causal discovery procedure has seen so far, there exists a pair (G, p) satisfying the assumptions such that the current sample is arbitrarily likely and the procedure produces arbitrarily many opposite conclusions in p as sample size increases. Oops, I meant… oops, I meant… oops, I meant…
The Wrong Reaction • The demon undermines justification of science. • He must be defeated to forestall skepticism. • Bayesian circularity • Classical instrumentalism Urk! Grrrr!
Another View • Many explanations have been offered to make sense of the here-today-gone-tomorrow nature of medical wisdom — what we are advised with confidence one year is reversed the next — but the simplest one is that it is the natural rhythm of science. • (Do We Really Know What Makes us Healthy, NY Times Magazine, Sept. 16, 2007).
Zen Approach • Get to know the demon. • Locate the justification of Ockham’s razor in his power.