Implicit learning of common sense for reasoning

Brendan Juba, Harvard University

A convenient example
“Thomson visited Cooper’s grave in 1765. At that date, he had been traveling [resp.: dead] for five years.” “Who had been traveling [resp.: dead]?” (The Winograd Schema Challenge [Levesque, Davis, and Morgenstern, 2012])
Our approach: learn sufficient knowledge to answer such queries from examples.

The task
• The examples may be incomplete (a * in the table)
• Given In_grave(Cooper), we wish to infer ¬Traveling(Cooper)
• Follows from In_grave(x)⇒¬Alive(x), Traveling(x)⇒Alive(x)
• These two rules can be learned from this data
• Challenge: how can we tell which rules to learn?

This work
Given: examples, KB, and a query…
• Proposes a criterion for learnability of rules in reasoning: “witnessed evaluation”
• Presents a simple algorithm for efficiently considering all such rules for reasoning in any “natural” (tractable) fragment
• “Natural” defined previously by Beame, Kautz, Sabharwal (JAIR 2004)
• Tolerant to counterexamples, as appropriate for application to “common sense” reasoning

This work
• Only concerns learned “common sense”
• Cf. Spelke’s “core knowledge”: naïve theories, etc.
• But: use of logical representations provides a potential “hook” into traditional KR
• Focuses on confirming or refuting query formulas on a domain (distribution)
• As opposed to: predicting missing attributes in a given example (cf. past work on PAC-Semantics)

Why not use… Bayes nets/Markov Logic/etc.?
• Learning is the Achilles heel of these approaches: even if the distributions are described by a simple network, how do we find the dependencies?

Outline
• PAC-Semantics: model for learned knowledge
• Suitable for capturing learned common sense
• Witnessed evaluation: a learnability criterion under partial information
• “Natural” fragments of proof systems
• The algorithm and its guarantee

PAC Semantics (for propositional logic) (Valiant, AIJ 2000)
• Recall: propositional logic consists of formulas built from variables x1,…,xn, and connectives, e.g., ∧ (AND), ∨ (OR), ¬ (NOT)
• Defined with respect to a background probability distribution D over {0,1}^n (Boolean assignments to x1,…,xn)
• Definition. A formula φ(x1,…,xn) is (1-ε)-valid under D if Pr_D[φ(x1,…,xn)=1] ≥ 1-ε.
A RULE OF THUMB…
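The definition of (1-ε)-validity can be illustrated by estimating Pr_D[φ=1] from samples. The sketch below is only an illustration: the distribution `sample_person` and its probabilities are hypothetical and are not the data behind the slides.

```python
import random

def estimate_validity(formula, sample, num_examples, seed=0):
    """Estimate Pr_D[formula = 1] as the fraction of sampled examples satisfying it."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(num_examples) if formula(sample(rng)))
    return hits / num_examples

def sample_person(rng):
    """Hypothetical distribution D: 30% are in a grave; 10% of those are alive anyway."""
    in_grave = rng.random() < 0.3
    alive = (not in_grave) or (rng.random() < 0.1)
    return {"In_grave": in_grave, "Alive": alive}

def in_grave_implies_not_alive(x):
    """The rule In_grave(x) => not Alive(x), written as a disjunction."""
    return (not x["In_grave"]) or (not x["Alive"])

estimate = estimate_validity(in_grave_implies_not_alive, sample_person, 10_000)
print(f"estimated validity: {estimate:.3f}")
```

Under this made-up distribution the rule fails only for the rare "alive in a grave" cases, so it is a rule of thumb rather than a 1-valid formula.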

Examples: In_grave(x)⇒¬Alive(x)
• Callouts on the example table: “Buried Alive!!”, “Grave-digger”
• Appears to be ≈86%-valid…

Examples: Traveling(x)⇒Alive(x)
• Note: agreeing with all observed examples does not imply 1-validity. Rare counterexamples may exist.
• We only get (1-ε)-valid with probability 1-δ

The theorem, informally
Theorem. For every natural tractable proof system, there is an algorithm that efficiently simulates access during proof search to all rules that can be verified (1-ε)-valid on examples.
• Can’t afford to explicitly consider all rules!
• Won’t even be able to identify the rules simulated
• Thus: rules are “learned implicitly”

Outline
• PAC-Semantics: model for learned knowledge
• Witnessed evaluation: a learnability criterion under partial information
• “Natural” fragments of proof systems
• The algorithm and its guarantee

Masking processes (Michael, AIJ 2010)
• A masking function m : {0,1}^n → {0,1,*}^n takes an example (x1,…,xn) to a partial example by replacing some values with *
• A masking process M is a masking-function-valued random variable
• NOTE: the choice of attributes to hide may depend on the example!
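A minimal sketch of two masking functions, assuming examples are stored as dictionaries from attribute names to 0/1 and hidden values are written as the string "*". Both function names and the attribute names are illustrative choices, not part of the original talk.

```python
import random

STAR = "*"  # marks a hidden attribute

def random_mask(example, hide_prob, rng):
    """Hide each attribute independently with probability hide_prob."""
    return {name: (STAR if rng.random() < hide_prob else value)
            for name, value in example.items()}

def dependent_mask(example):
    """A mask whose choice depends on the example itself:
    the Alive attribute is hidden only when In_grave is 1."""
    masked = dict(example)
    if example.get("In_grave") == 1:
        masked["Alive"] = STAR
    return masked

rng = random.Random(0)
print(random_mask({"In_grave": 1, "Alive": 0, "Traveling": 0}, 0.5, rng))
print(dependent_mask({"In_grave": 1, "Alive": 0, "Traveling": 0}))
```

A masking process would then be a random choice among such functions; `random_mask` with a fresh random source is one simple instance.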

Restricting formulas
Given a formula φ and masked example ρ, the restriction of φ under ρ, φ|ρ, is obtained by “plugging in” the values of ρi for xi whenever ρi ≠ * and recursively simplifying (using game-tree evaluation). I.e., φ|ρ is a formula in the unknown values.
[Figure: a formula tree over the literals ¬x, y, ¬z, z, simplified under ρ: x=0, y=0]
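For CNFs, restriction can be sketched as follows. The representation is an assumption made for illustration: a clause is a sequence of (variable, polarity) pairs, and a masked example is a dictionary with "*" for hidden values. The example formula is likewise made up and is not claimed to be the one in the slide's figure.

```python
STAR = "*"

def restrict_clause(clause, rho):
    """Return True if some known literal satisfies the clause;
    otherwise return the tuple of literals whose variables are still unknown."""
    remaining = []
    for var, positive in clause:
        value = rho.get(var, STAR)
        if value == STAR:
            remaining.append((var, positive))   # unknown: keep the literal
        elif bool(value) == positive:
            return True                         # literal is true: clause simplifies to 1
        # otherwise the literal is false and is dropped
    return tuple(remaining)

def restrict_cnf(cnf, rho):
    """Return True, False, or the list of clauses that remain undetermined."""
    remaining = []
    for clause in cnf:
        restricted = restrict_clause(clause, rho)
        if restricted is True:
            continue                            # satisfied clauses disappear
        if restricted == ():
            return False                        # an empty clause falsifies the CNF
        remaining.append(restricted)
    return True if not remaining else remaining

# (not x or y) and (y or not z), restricted under x=0, y=0, z hidden
cnf = [(("x", False), ("y", True)), (("y", True), ("z", False))]
print(restrict_cnf(cnf, {"x": 0, "y": 0, "z": STAR}))
```

Here the first clause is satisfied by ¬x, and the second loses the false literal y, leaving a formula in the unknown z only.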

Witnessed formulas
We will learn rules that can be observed to hold under the given partial information:
• Definition. ψ is (1-ε)-witnessed under a distribution over partial examples M(D) if Pr_{ρ∈M(D)}[ψ|ρ=1] ≥ 1-ε
• We will aim to succeed whenever there exists a (1-ε)-witnessed formula that completes a simple proof of the query formula…
Remark: equal to “ψ is a tautology given ρ” in standard cases where this is tractable, e.g., CNFs, intersections of halfspaces; remains tractable in cases where this is not, e.g., 3-DNFs
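As a sketch of the definition, the fraction of masked examples that witness a CNF ψ can be computed directly. The clause representation ((variable, polarity) pairs) and the four masked examples are assumptions made for illustration.

```python
STAR = "*"

def clause_witnessed(clause, rho):
    """A clause simplifies to 1 under rho iff some literal is known and true."""
    return any(rho.get(var, STAR) != STAR and bool(rho[var]) == positive
               for var, positive in clause)

def cnf_witnessed(cnf, rho):
    """A CNF simplifies to 1 under rho iff every clause is witnessed."""
    return all(clause_witnessed(clause, rho) for clause in cnf)

def witnessed_fraction(cnf, masked_examples):
    """Empirical estimate of Pr[psi|rho = 1] over the given masked examples."""
    return sum(cnf_witnessed(cnf, rho) for rho in masked_examples) / len(masked_examples)

# psi: In_grave => not Alive, i.e. the clause (not In_grave or not Alive)
psi = [(("In_grave", False), ("Alive", False))]
examples = [
    {"In_grave": 1, "Alive": 0},      # witnessed via not Alive
    {"In_grave": 0, "Alive": STAR},   # witnessed via not In_grave
    {"In_grave": 1, "Alive": STAR},   # not witnessed: Alive is hidden
    {"In_grave": STAR, "Alive": 1},   # not witnessed: In_grave is hidden
]
print(witnessed_fraction(psi, examples))
```

Note that the last two examples may well satisfy ψ in the underlying complete example; they simply do not let us observe it.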

Example: Resolution (“RES”)
• A proof system for refuting CNFs (AND of ORs)
• Equiv., for proving DNFs (ORs of ANDs)
• Operates on clauses: given a set of clauses {C1,…,Ck}, may derive
• (“weakening”) Ci∨l from any Ci (where l is any literal: a variable or its negation)
• (“cut”) C’i∨C’j from Ci = C’i∨x and Cj = C’j∨¬x
• Refute a CNF by deriving the empty clause from it
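The two inference rules can be sketched on clauses represented as frozensets of (variable, polarity) literals. This representation and the function names are illustrative assumptions.

```python
def weaken(clause, literal):
    """Weakening: from C derive C or l, for any literal l."""
    return frozenset(clause) | {literal}

def cut(clause_i, clause_j, var):
    """Cut: from C'_i or x and C'_j or (not x), derive C'_i or C'_j."""
    if (var, True) not in clause_i or (var, False) not in clause_j:
        raise ValueError(f"cannot cut on {var!r}: need x in the first clause and not-x in the second")
    return (frozenset(clause_i) - {(var, True)}) | (frozenset(clause_j) - {(var, False)})

# (not Traveling or Alive) and (not In_grave or not Alive), cut on Alive
c1 = frozenset({("Traveling", False), ("Alive", True)})
c2 = frozenset({("In_grave", False), ("Alive", False)})
print(sorted(cut(c1, c2, "Alive")))
```

Cutting the unit clauses x and ¬x yields the empty frozenset, which plays the role of the empty clause that completes a refutation.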

Tractable fragments of RES
• Bounded-width
• Treelike, bounded clause space
• Space-2 ≡ “unit propagation,” simulates chaining
[Figure: refutation tree with ∅ at the root, xi and ¬xi below it, and ¬xi∨xj, ¬xi∨¬xj among the leaves]

Tractable fragments of RES
• Bounded-width
• Treelike, bounded clause space
• Applying a restriction to every step of proofs of these forms yields proofs of the same form (from a refutation of φ, we obtain a refutation of φ|ρ of the same syntactic form)
• Def’n (BKS’04): such fragments are “natural”

Other “natural” fragments…
• Bounded width k-DNF resolution
• L1-bounded, sparse cutting planes
• Degree-bounded polynomial calculus
• (more?)
Requires that restrictions preserve the special syntactic form

The basic algorithm
• Given query DNF φ and masked ex’s {ρ1,…,ρk}
• For each ρi, search for a refutation of ¬φ|ρi
• If the fraction of successful refutations is greater than (1-ε), accept φ, and otherwise reject.
• Can incorporate a KB CNF Φ: refute [Φ∧¬φ]|ρi
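The steps above can be sketched with unit propagation standing in for the proof search. This is an illustration under stated assumptions, not the talk's implementation: clauses are sequences of (variable, polarity) pairs, masked examples are dictionaries with "*" for hidden values, and the function names and the third example ρ3 are hypothetical.

```python
STAR = "*"  # marks a hidden attribute in a masked example

def restrict_cnf(cnf, rho):
    """Restrict a CNF under rho. Returns None if some clause becomes empty
    (the CNF is falsified), else the list of clauses not yet satisfied."""
    remaining_clauses = []
    for clause in cnf:
        remaining, satisfied = [], False
        for var, positive in clause:
            value = rho.get(var, STAR)
            if value == STAR:
                remaining.append((var, positive))  # still unknown
            elif bool(value) == positive:
                satisfied = True                   # clause simplifies to 1
                break
        if satisfied:
            continue
        if not remaining:
            return None                            # empty clause derived
        remaining_clauses.append(frozenset(remaining))
    return remaining_clauses

def unit_refutes(clauses):
    """Unit propagation: True iff the empty clause is derived."""
    while clauses is not None:
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return False                           # no unit clause left to propagate
        var, positive = next(iter(unit))
        clauses = restrict_cnf(clauses, {var: 1 if positive else 0})
    return True

def decide(negated_query_cnf, kb_cnf, masked_examples, epsilon):
    """Accept iff more than a (1 - epsilon) fraction of restricted refutations succeed."""
    target = list(kb_cnf) + list(negated_query_cnf)
    successes = sum(unit_refutes(restrict_cnf(target, rho)) for rho in masked_examples)
    return successes / len(masked_examples) > 1 - epsilon

# Query: not (Traveling and In_grave); its negation is the CNF Traveling, In_grave.
negated_query = [[("Traveling", True)], [("In_grave", True)]]
rho1 = {"In_grave": 0, "Alive": 1, "Traveling": STAR}
rho2 = {"Traveling": 0, "Alive": 0, "In_grave": STAR}
print(decide(negated_query, [], [rho1, rho2], epsilon=0.1))
```

No rule about Alive is ever passed in: on ρ1 and ρ2 the restricted negated query already contains an empty clause, which is the sense in which the supporting premises are learned implicitly.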

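Example space-2 treelike RES refutation
• Refute: Traveling
• Given: In_grave
• Supporting “common sense” premises: ¬In_grave∨¬Alive, ¬Traveling∨Alive
• Derivation: In_grave and ¬In_grave∨¬Alive give ¬Alive; ¬Alive and ¬Traveling∨Alive give ¬Traveling; ¬Traveling and Traveling give ∅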

Example [Traveling∧In_grave]|ρ1
• Example ρ1: In_grave = 0, Alive = 1
• The given clause In_grave restricts to ∅: a trivial refutation
• The premises ¬In_grave∨¬Alive and ¬Traveling∨Alive both restrict to T

Example [Traveling∧In_grave]|ρ2
• Example ρ2: Traveling = 0, Alive = 0
• The clause Traveling restricts to ∅: a trivial refutation
• ¬Traveling, ¬Alive, ¬In_grave∨¬Alive, and ¬Traveling∨Alive all restrict to T

The theorem, formally
The algorithm uses (1/γ²) log(1/δ) partial examples to distinguish the following cases w.p. 1-δ:
• The query φ is not (1-ε-γ)-valid
• There exists a (1-ε+γ)-witnessed formula ψ for which there exists a proof of the query φ from ψ
Learn any ψ that helps validate the query φ. N.B.: ψ may not be 1-valid
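The (1/γ²) log(1/δ) dependence is what a standard Hoeffding bound gives for estimating a fraction to within γ. The sketch below uses the textbook two-sided constant, k ≥ ln(2/δ) / (2γ²); that constant is an assumption here and is not taken from the talk.

```python
import math

def hoeffding_sample_size(gamma, delta):
    """Number of examples k such that an empirical fraction is within gamma
    of its expectation with probability at least 1 - delta, using the
    two-sided Hoeffding bound 2 * exp(-2 * k * gamma**2) <= delta."""
    if not (0 < gamma < 1 and 0 < delta < 1):
        raise ValueError("gamma and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2 / delta) / (2 * gamma ** 2))

print(hoeffding_sample_size(gamma=0.1, delta=0.05))
```

Halving γ quadruples the number of examples, while shrinking δ only costs a logarithmic factor.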

Analysis
• Note that resolution is sound…
• So, whenever a proof of φ|ρi exists, φ was satisfied by the example from D
• If φ is not (1-ε-γ)-valid, tail bounds imply that it is unlikely that a (1-ε) fraction satisfied φ
• On the other hand, consider the proof of φ from the (1-ε+γ)-witnessed CNF ψ…
• With probability (1-ε+γ), all of the clauses of ψ simplify to 1
• The restricted proof does not require clauses of ψ: “implicitly learned”

Recap: this work…
• Proposed a criterion for learnability of common sense rules in reasoning: “witnessed evaluation”
• Presented a simple algorithm for efficiently considering all such rules as premises for reasoning in any “natural” (tractable) fragment
• “Natural,” as defined by Beame, Kautz, Sabharwal (JAIR 2004), means: “closed under plugging in partial info.”
• Tolerant to counterexamples, as appropriate for application to “common sense” reasoning

Prior work: Learning to Reason
• Khardon & Roth (JACM 1997) showed that O(log n)-CNF queries could be efficiently answered using complete examples
• No mention of theorem-proving whatsoever!
• Could only handle low-width queries under incomplete information (Mach. Learn. 1999)
• Noise-tolerant learning captures (some kinds of) common sense (Roth, IJCAI’95)

Work in progress
• Further integration of learning and reasoning
• Deciding general RES for limited learning problems in quasipoly-time: arXiv:1304.4633
• Limits of this approach: ECCC TR13-094
• Integration with “fancier” semantics (e.g., naf)
• The point: want to consider proofs using such “implicitly learned” facts & rules

Future work
• Empirical validation
• Good domain?
• Explicit learning of premises
• Not hard for our fragments under “bounded concealment” (Michael, AIJ 2010)
• But: this won’t tolerate counterexamples!
