Implicit learning of common sense for reasoning. Brendan Juba, Harvard University.
A convenient example "Thomson visited Cooper's grave in 1765. At that date, he had been traveling [resp.: dead] for five years." "Who had been traveling [resp.: dead]?" (The Winograd Schema Challenge [Levesque, Davis, and Morgenstern, 2012]) Our approach: learn sufficient knowledge to answer such queries from examples.
The task • The examples may be incomplete (a * in the table) • Given In_grave(Cooper), we wish to infer ¬Traveling(Cooper) • This follows from In_grave(x) ⇒ ¬Alive(x) and Traveling(x) ⇒ Alive(x) • These two rules can be learned from this data • Challenge: how can we tell which rules to learn?
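The rule-checking step above can be sketched in a few lines. This is a minimal illustration with hypothetical data (the attribute names and truth values are made up for the example); `None` plays the role of the * in the table, and a rule is only counted when the partial example actually witnesses it.

```python
# Testing whether a candidate rule such as In_grave => NOT Alive is witnessed
# to hold on partial examples; None stands for a masked (*) value.

def neg(v):
    """Negation lifted to the unknown value."""
    return None if v is None else not v

def implies(a, b):
    """Three-valued material implication: None means 'cannot tell from this example'."""
    if a is False or b is True:
        return True       # implication witnessed to hold
    if a is True and b is False:
        return False      # witnessed counterexample
    return None

examples = [
    {"In_grave": True,  "Alive": False},
    {"In_grave": False, "Alive": True},
    {"In_grave": True,  "Alive": None},   # Alive is masked in this example
]

verdicts = [implies(e["In_grave"], neg(e["Alive"])) for e in examples]
witnessed_fraction = sum(v is True for v in verdicts) / len(examples)
```

Note that the masked third example neither confirms nor refutes the rule, which is exactly the subtlety the talk's "witnessed evaluation" criterion addresses.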
This work Given: examples, KB, and a query… • Proposes a criterion for learnability of rules in reasoning: “witnessed evaluation” • Presents a simple algorithm for efficiently considering all such rules for reasoning in any “natural” (tractable) fragment • “Natural” defined previously by Beame, Kautz, Sabharwal (JAIR 2004) • Tolerant to counterexamples as appropriate for application to “common sense” reasoning
This work • Only concerns learned "common sense" • Cf. Spelke's "core knowledge": naïve theories, etc. • But: the use of logical representations provides a potential "hook" into traditional KR • Focuses on confirming or refuting query formulas on a domain (distribution) • As opposed to: predicting missing attributes in a given example (cf. past work on PAC-Semantics)
Why not use… Bayes nets/Markov Logic/etc.? • Learning is the Achilles heel of these approaches:Even if the distributions are described by a simple network, how do we find the dependencies?
Outline • PAC-Semantics: model for learned knowledge • Suitable for capturing learned common sense • Witnessed evaluation: a learnability criterion under partial information • “Natural” fragments of proof systems • The algorithm and its guarantee
PAC Semantics (for propositional logic) Valiant (AIJ 2000) • Recall: propositional logic consists of formulas built from variables x1,…,xn and connectives, e.g., ∧ (AND), ∨ (OR), ¬ (NOT) • Defined with respect to a background probability distribution D over {0,1}n (Boolean assignments to x1,…,xn) • Definition. A formula φ(x1,…,xn) is (1-ε)-valid under D if PrD[φ(x1,…,xn)=1] ≥ 1-ε. A RULE OF THUMB…
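The definition can be made concrete by estimating validity empirically. The distribution and formula below are illustrative only (a toy product distribution, not anything from the talk): we sample assignments from D and measure the fraction on which φ holds.

```python
import random

# Estimating Pr_D[phi = 1] on samples, for a toy phi and toy distribution D.
random.seed(0)

def phi(x):
    # phi = x1 OR NOT x2 (fails only when x1 = 0 and x2 = 1)
    return x[0] or not x[1]

def sample_D():
    # Toy product distribution over {0,1}^2: x1 ~ Bern(0.9), x2 ~ Bern(0.5)
    return [random.random() < 0.9, random.random() < 0.5]

n = 10_000
validity = sum(phi(sample_D()) for _ in range(n)) / n
# phi fails with probability 0.1 * 0.5 = 0.05, so phi is about 0.95-valid.
```

With these parameters the empirical estimate concentrates near 0.95, so φ would pass a (1-ε)-validity test for any ε noticeably above 0.05.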
Examples In_grave(x) ⇒ ¬Alive(x) appears to be ≈86%-valid on the data… [slide images: a "Buried Alive!!" poster; a grave-digger]
Examples Traveling(x) ⇒ Alive(x) Note: agreeing with all observed examples does not imply 1-validity; rare counterexamples may exist. We can only guarantee (1-ε)-validity, with probability 1-δ.
The theorem, informally Theorem. For every natural tractable proof system, there is an algorithm that efficiently simulates access during proof search to all rules that can be verified (1-ε)-valid on examples. • Can’t afford to explicitly consider all rules! • Won’t even be able to identify rules simulated • Thus: rules are “learned implicitly”
Outline • PAC-Semantics: model for learned knowledge • Witnessed evaluation: a learnability criterion under partial information • “Natural” fragments of proof systems • The algorithm and its guarantee
Masking processes Michael (AIJ 2010) • A masking function m : {0,1}n → {0,1,*}n takes an example (x1,…,xn) to a partial example by replacing some values with * • A masking process M is a masking-function-valued random variable • NOTE: the choice of attributes to hide may depend on the example!
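A small sketch of a masking process, with a made-up masking rule chosen only to show the key point from the slide: which attributes get hidden can depend on the example itself (here, "Alive" is hidden exactly when "In_grave" is 1), on top of independent random hiding.

```python
import random

random.seed(1)

def mask(example):
    """A hypothetical masking function; None plays the role of *."""
    masked = dict(example)
    if example["In_grave"]:
        masked["Alive"] = None            # hiding depends on the example's values
    if random.random() < 0.5:
        masked["Traveling"] = None        # independent random hiding
    return masked

rho = mask({"In_grave": 1, "Alive": 0, "Traveling": 0})
```

Because the rule is example-dependent, the masked data is not "missing completely at random", which is what makes learning under masking processes delicate.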
Restricting formulas Given a formula φ and masked example ρ, the restriction of φ under ρ, written φ|ρ, is obtained by "plugging in" the value ρi for xi whenever ρi ≠ * and recursively simplifying (using game-tree evaluation). I.e., φ|ρ is a formula in the unknown values. [diagram: an ∧/∨/¬ formula tree partially evaluated under ρ: x=0, y=0]
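Restriction can be sketched directly. Formulas are represented here as nested tuples (a representation chosen for this sketch, not from the talk); ρ maps known variables to 0/1, and masked variables are simply absent from it.

```python
# Computing the restriction phi|rho by plugging in known values and
# simplifying recursively, as in game-tree evaluation.
# Formula forms: ("var", name), ("not", f), ("and", f, g, ...), ("or", f, g, ...)

def restrict(f, rho):
    op = f[0]
    if op == "var":
        return rho.get(f[1], f)               # plug in the value if known
    if op == "not":
        g = restrict(f[1], rho)
        return 1 - g if g in (0, 1) else ("not", g)
    # AND/OR: restrict the children, then simplify
    parts = [restrict(g, rho) for g in f[1:]]
    absorbing = 0 if op == "and" else 1       # 0 kills an AND, 1 kills an OR
    if absorbing in parts:
        return absorbing
    parts = [p for p in parts if p != 1 - absorbing]  # drop neutral constants
    if not parts:
        return 1 - absorbing
    return parts[0] if len(parts) == 1 else (op, *parts)

# (not z) AND ((not x) OR y), restricted under rho: x=0, y=0
phi = ("and", ("not", ("var", "z")), ("or", ("not", ("var", "x")), ("var", "y")))
residue = restrict(phi, {"x": 0, "y": 0})     # only z remains unknown
```

As the slide says, the result is again a formula, now over only the masked variables (here, ¬z).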
Witnessed formulas We will learn rules that can be observed to hold under the given partial information: • Definition. ψ is (1-ε)-witnessed under a distribution over partial examples M(D) if Prρ∈M(D)[ψ|ρ=1] ≥ 1-ε • We will aim to succeed whenever there exists a (1-ε)-witnessed formula that completes a simple proof of the query formula… Remark: witnessing is equivalent to "ψ is a tautology given ρ" in standard cases where that test is tractable, e.g., CNFs, intersections of halfspaces; witnessing remains tractable in cases where it is not, e.g., 3-DNFs
Outline • PAC-Semantics: model for learned knowledge • Witnessed evaluation: a learnability criterion under partial information • “Natural” fragments of proof systems • The algorithm and its guarantee
Example: Resolution ("RES") • A proof system for refuting CNFs (ANDs of ORs) • Equiv., for proving DNFs (ORs of ANDs) • Operates on clauses—given a set of clauses {C1,…,Ck}, may derive • ("weakening") Ci∨l from any Ci (where l is any literal—a variable or its negation) • ("cut") C′i∨C′j from Ci=C′i∨x and Cj=C′j∨¬x • Refute a CNF by deriving the empty clause from it
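The two derivation steps are easy to sketch on clauses represented as frozensets of literals (the representation here, with a literal as a (variable, sign) pair, is an implementation choice, not from the talk).

```python
# The two RES derivation steps on clauses as frozensets of (var, sign) literals.

def weaken(clause, lit):
    """Weakening: from C_i, derive C_i OR l."""
    return clause | {lit}

def cut(ci, cj, var):
    """Cut: from C'_i OR x and C'_j OR NOT x, derive C'_i OR C'_j."""
    assert (var, True) in ci and (var, False) in cj
    return (ci - {(var, True)}) | (cj - {(var, False)})

c1 = frozenset({("x", True), ("y", True)})    # x OR y
c2 = frozenset({("x", False), ("z", True)})   # NOT x OR z
resolvent = cut(c1, c2, "x")                  # y OR z
# Cutting two opposite unit clauses yields the empty clause, i.e., a refutation:
empty = cut(frozenset({("x", True)}), frozenset({("x", False)}), "x")
```

The last line shows the refutation condition from the slide: deriving the empty clause (an empty frozenset) from the input CNF.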
Tractable fragments of RES • Bounded width • Treelike, bounded clause space • Space-2 ≡ "unit propagation," simulates chaining [diagram: a treelike refutation deriving ∅ from xi and ¬xi, via clauses ¬xi∨xj and ¬xi∨¬xj]
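Unit propagation, the space-2 fragment mentioned above, can be sketched as follows: repeatedly take a unit clause, fix its literal, and simplify the other clauses, until either the empty clause appears or nothing changes. (This is a simple fixed-point sketch, not an optimized implementation such as watched literals.)

```python
# Unit propagation over clauses given as sets of (var, sign) literals.

def unit_propagate(clauses):
    """Return True iff unit propagation derives the empty clause (a refutation)."""
    clauses = [set(c) for c in clauses]
    assigned = {}                                     # var -> forced sign
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(assigned.get(v) == p for (v, p) in clause):
                continue                              # clause already satisfied
            live = [(v, p) for (v, p) in clause if v not in assigned]
            if not live:
                return True                           # all literals falsified: empty clause
            if len(live) == 1:                        # unit clause: propagate its literal
                v, p = live[0]
                assigned[v] = p
                changed = True
    return False

# The talk's running example: refute Traveling together with the KB and In_grave.
kb_and_query = [
    {("Traveling", True)},                            # negation of the query
    {("In_grave", True)},                             # given
    {("In_grave", False), ("Alive", False)},          # In_grave => NOT Alive
    {("Traveling", False), ("Alive", True)},          # Traveling => Alive
]
refuted = unit_propagate(kb_and_query)
```

Propagation here simulates the chaining from the example slide: In_grave forces ¬Alive, which together with Traveling ⇒ Alive contradicts Traveling.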
Tractable fragments of RES • Bounded width • Treelike, bounded clause space • Applying a restriction to every step of proofs of these forms yields proofs of the same form (from a refutation of φ, we obtain a refutation of φ|ρ of the same syntactic form) • Def'n (BKS'04): such fragments are "natural"
Other “natural” fragments… • Bounded width k-DNF resolution • L1-bounded, sparse cutting planes • Degree-bounded polynomial calculus • (more?) Requires that restrictions preserve the special syntactic form
Outline • PAC-Semantics: model for learned knowledge • Witnessed evaluation: a learnability criterion under partial information • “Natural” fragments of proof systems • The algorithm and its guarantee
The basic algorithm • Given a query DNF φ and masked examples {ρ1,…,ρk} • For each ρi, search for a refutation of ¬φ|ρi • If the fraction of successful refutations is greater than (1-ε), accept φ, and otherwise reject. CAN INCORPORATE A KB CNF Φ: REFUTE [Φ∧¬φ]|ρi
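The loop above can be sketched directly. The `refute` parameter stands in for proof search in any natural tractable fragment (e.g., unit propagation); in the usage below it is replaced by a deliberately trivial check for an immediate empty clause, which suffices for the talk's running example, where restriction alone already empties a clause.

```python
# The basic decision procedure: count restricted refutations of KB AND NOT-query.
# Clauses are sets of (var, sign) literals; rho maps known variables to 0/1.

def accept_query(neg_phi_cnf, kb_cnf, masked_examples, eps, refute):
    """Accept the query iff more than a (1-eps) fraction of restrictions refute."""
    def restrict_cnf(cnf, rho):
        out = []
        for clause in cnf:
            lits, satisfied = set(), False
            for (v, p) in clause:
                if rho.get(v) is None:
                    lits.add((v, p))          # value unknown: literal survives
                elif rho[v] == p:
                    satisfied = True          # clause simplifies to true: drop it
                    break
            if not satisfied:
                out.append(lits)
        return out

    successes = sum(bool(refute(restrict_cnf(kb_cnf + neg_phi_cnf, rho)))
                    for rho in masked_examples)
    return successes > (1 - eps) * len(masked_examples)

# Illustration only: a "refutation search" that just looks for an empty clause.
trivial_refute = lambda cnf: any(len(c) == 0 for c in cnf)

neg_phi = [{("Traveling", 1)}, {("In_grave", 1)}]     # NOT of the query, as a CNF
rho1 = {"In_grave": 0, "Alive": 1}                    # the talk's example rho_1
rho2 = {"Traveling": 0, "Alive": 0}                   # the talk's example rho_2
verdict = accept_query(neg_phi, [], [rho1, rho2], 0.2, trivial_refute)
```

Both masked examples yield trivial refutations, matching the two example slides, so the query is accepted.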
Example: a space-2 treelike RES refutation of the query's negation Traveling, using the given In_grave and the supporting "common sense" premises ¬In_grave∨¬Alive and ¬Traveling∨Alive: cut In_grave with ¬In_grave∨¬Alive to derive ¬Alive; cut ¬Alive with ¬Traveling∨Alive to derive ¬Traveling; cut ¬Traveling with Traveling to derive ∅.
Example: [Traveling∧In_grave]|ρ1 under example ρ1: In_grave = 0, Alive = 1. The supporting premises ¬In_grave∨¬Alive and ¬Traveling∨Alive both restrict to T, and the clause In_grave restricts to the empty clause ∅: a trivial refutation.
Example: [Traveling∧In_grave]|ρ2 under example ρ2: Traveling = 0, Alive = 0. Again both premises restrict to T, and now the clause Traveling restricts to the empty clause ∅: again a trivial refutation.
The theorem, formally The algorithm uses (1/γ²) log(1/δ) partial examples to distinguish the following cases w.p. 1-δ: • The query φ is not (1-ε-γ)-valid • There exists a (1-ε+γ)-witnessed formula ψ for which there exists a proof of the query φ from ψ LEARN ANY ψ THAT HELPS VALIDATE THE QUERY φ. N.B.: ψ MAY NOT BE 1-VALID
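The sample bound is the usual Hoeffding-style count. As a quick sketch (constant factors elided, per the slide's statement):

```python
import math

# Number of partial examples per the theorem's (1/gamma^2) * log(1/delta) bound,
# constant factors omitted.
def num_examples(gamma, delta):
    return math.ceil((1 / gamma**2) * math.log(1 / delta))

# E.g., distinguishing validity gaps of gamma = 0.1 with failure probability 1%:
k = num_examples(0.1, 0.01)
```

Note the bound is independent of the number of variables and of the (possibly exponential) number of candidate rules, which is what makes the implicit-learning approach feasible.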
Analysis • Note that resolution is sound… • So, whenever a proof of φ|ρi exists, φ was satisfied by the example from D • If φ is not (1-ε-γ)-valid, tail bounds imply that it is unlikely that a (1-ε) fraction satisfied φ • On the other hand, consider the proof of φ from the (1-ε+γ)-witnessed CNF ψ… • With probability (1-ε+γ), all of the clauses of ψ simplify to 1 • The restricted proof then does not require the clauses of ψ: "implicitly learned"
Recap: this work… • Proposed a criterion for learnability of common sense rules in reasoning: “witnessed evaluation” • Presented a simple algorithm for efficiently considering all such rules as premises for reasoning in any “natural” (tractable) fragment • “Natural” defined by Beame, Kautz, Sabharwal (JAIR 2004) means: “closed under plugging in partial info.” • Tolerant to counterexamples as appropriate for application to “common sense” reasoning
Prior work: Learning to Reason • Khardon & Roth (JACM 1997) showed that O(log n)-CNF queries could be efficiently answered using complete examples • No mention of theorem-proving whatsoever! • Could only handle low-width queries under incomplete information (Mach. Learn. 1999) • Noise-tolerant learning captures (some kinds of) common sense (Roth, IJCAI’95)
Work in progress • Further integration of learning and reasoning • Deciding general RES for limited learning problems in quasipoly-time: arXiv:1304.4633 • Limits of this approach: ECCC TR13-094 • Integration with “fancier” semantics (e.g., naf) • The point: want to consider proofs using such “implicitly learned” facts & rules
Future work • Empirical validation • Good domain? • Explicit learning of premises • Not hard for our fragments under “bounded concealment” (Michael AIJ 2010) • But: this won’t tolerate counterexamples!