
URDF: Query-Time Reasoning in Uncertain RDF Knowledge Bases


Presentation Transcript


  1. URDF: Query-Time Reasoning in Uncertain RDF Knowledge Bases. Ndapandula Nakashole, Mauro Sozio, Fabian Suchanek, Martin Theobald

  2. Information Extraction
• YAGO/DBpedia et al. (existing high-confidence facts; >120 M facts for YAGO2, mostly from Wikipedia infoboxes): bornOn(Jeff, 09/22/42), gradFrom(Jeff, Columbia), hasAdvisor(Jeff, Arthur), hasAdvisor(Surajit, Jeff), knownFor(Jeff, Theory)
• New fact candidates (100's M additional facts extracted from Wikipedia text, each with a confidence value): type(Jeff, Author) [0.9], author(Jeff, Drag_Book) [0.8], author(Jeff, Cind_Book) [0.6], worksAt(Jeff, Bell_Labs) [0.7], type(Jeff, CEO) [0.4]

  3. Outline
• Motivation & Problem Setting
  • URDF running example: people graduating from universities
• Efficient MAP Inference
  • MaxSAT solving with soft & hard constraints
• Grounding
  • Deductive grounding of soft rules (SLD resolution)
  • Iterative grounding of hard rules (closure)
• MaxSAT Algorithm
  • MaxSAT algorithm in 3 steps
• Experiments & Future Work

  4. URDF: Uncertain RDF Data Model
• Extensional Layer (information extraction & integration)
  • High-confidence facts: existing knowledge base ("ground truth")
  • New fact candidates: extracted facts with confidence values
  • Integration of different knowledge sources: ontology merging or explicit Linked Data (owl:sameAs, owl:equivProp.)
  • Result: a large "uncertain database" of RDF facts
• Intensional Layer (query-time inference)
  • Soft rules: deductive grounding & lineage (Datalog/SLD resolution)
  • Hard rules: consistency constraints (more general FOL rules)
  • Propositional & probabilistic consistency reasoning

  5. Soft Rules vs. Hard Rules
(Soft) Deduction Rules vs. (Hard) Consistency Constraints
• People may live in more than one place:
  livesIn(x,y) ← marriedTo(x,z) ∧ livesIn(z,y) [0.8]
  livesIn(x,y) ← hasChild(x,z) ∧ livesIn(z,y) [0.5]
• People are not born in different places/on different dates:
  bornIn(x,y) ∧ bornIn(x,z) → y=z
• People are not married to more than one person (at the same time, in most countries?):
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y≠z → disjoint(t1,t2)

  6. Soft Rules vs. Hard Rules
(Soft) Deduction Rules vs. (Hard) Consistency Constraints
• People may live in more than one place:
  livesIn(x,y) ← marriedTo(x,z) ∧ livesIn(z,y) [0.8]
  livesIn(x,y) ← hasChild(x,z) ∧ livesIn(z,y) [0.5]
  Rule-based (deductive) reasoning: Datalog, RDF/S, OWL2-RL, etc.
• People are not born in different places/on different dates:
  bornIn(x,y) ∧ bornIn(x,z) → y=z
• People are not married to more than one person (at the same time, in most countries?):
  marriedTo(x,y,t1) ∧ marriedTo(x,z,t2) ∧ y≠z → disjoint(t1,t2)
  FOL constraints (in particular mutex): Datalog with constraints, X-tuples in probabilistic DBs, owl:FunctionalProperty, etc.

  7. URDF Running Example
First-Order Rules:
  hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
  graduatedFrom(x,y) ∧ graduatedFrom(x,z) → y=z
KB: RDF Base Facts (shown as a graph over the entities Jeff, Surajit, David, Stanford, and Princeton and the classes Computer Scientist and University, with type, hasAdvisor, worksAt, and graduatedFrom edges labeled by confidence values; derived graduatedFrom edges are marked with "?")
Derived Facts:
  gradFrom(Surajit, Stanford)
  gradFrom(David, Stanford)

  8. Basic Types of Inference
• Maximum-A-Posteriori (MAP) Inference
  • Find the most likely assignment to the query variables y under given evidence x.
  • Compute: argmax_y P(y | x) (NP-hard for propositional formulas, e.g., MaxSAT over CNFs)
• Marginal/Success Probabilities
  • Probability that the query y is true in a random world under given evidence x.
  • Compute: ∑_y P(y | x) (#P-hard for propositional formulas)
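The slide only names the two inference tasks; the following toy sketch contrasts them by brute-force enumeration over possible worlds. The two facts a and b, their weights, and the Markov-Logic-style exponential world weighting are assumptions of this illustration, not URDF's model.

```python
# Toy contrast of MAP vs. marginal inference over possible worlds (assumed example,
# not URDF code): two hypothetical facts a and b, one soft rule a -> b with weight
# 0.4, and unit clauses a [0.8] and b [0.3]; worlds are weighted Markov-Logic style
# by exp(total weight of satisfied clauses).
from itertools import product
import math

clauses = [
    (0.4, lambda a, b: (not a) or b),   # soft rule: a -> b
    (0.8, lambda a, b: a),              # base fact a
    (0.3, lambda a, b: b),              # base fact b
]

def world_weight(a, b):
    """Unnormalized weight of a world = exp of the total weight of satisfied clauses."""
    return math.exp(sum(w for w, satisfied in clauses if satisfied(a, b)))

worlds = list(product([False, True], repeat=2))
Z = sum(world_weight(a, b) for a, b in worlds)   # normalization constant

# MAP inference: the single most likely world (NP-hard in general, trivial here).
map_world = max(worlds, key=lambda w: world_weight(*w))
print("MAP world:", dict(zip("ab", map_world)))

# Marginal inference: probability that b holds, summed over all worlds (#P-hard in general).
print("P(b) =", round(sum(world_weight(a, b) for a, b in worlds if b) / Z, 3))
```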

  9. General Route: Grounding & MaxSAT Solving
Query: graduatedFrom(x, y)
• 1) Grounding: consider only the facts (and rules) which are relevant for answering the query
• 2) Propositional formula in CNF, consisting of grounded hard & soft rules and uncertain base facts:
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton)) [1000]
  (¬graduatedFrom(David, Stanford) ∨ ¬graduatedFrom(David, Princeton)) [1000]
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford)) [0.4]
  (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford)) [0.4]
  worksAt(Jeff, Stanford) [0.9]
  hasAdvisor(Surajit, Jeff) [0.8]
  hasAdvisor(David, Jeff) [0.7]
  graduatedFrom(Surajit, Princeton) [0.6]
  graduatedFrom(Surajit, Stanford) [0.7]
  graduatedFrom(David, Princeton) [0.9]
• 3) Propositional Reasoning: find a truth assignment to the facts such that the total weight of the satisfied clauses is maximized
  → MAP inference: compute the "most likely" possible world

  10. Why are high weights for hard rules not enough?
• Consider the following CNF (for A, B > 0 and A >> B):
  (¬graduatedFrom(Surajit, Stanford) ∨ ¬graduatedFrom(Surajit, Princeton)) [A]
  graduatedFrom(Surajit, Princeton) [0]
  graduatedFrom(Surajit, Stanford) [B]
• The optimal solution has weight A+B
• The next-best solution has weight A+0
• Hence the ratio of the optimal over the approximate solution is (A+B)/A
• In general, any (1+ε) approximation algorithm, with ε > 0, may set graduatedFrom(Surajit, Princeton) to true, since (A+B)/A → 1 as A → ∞.
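For concreteness (with hypothetical numbers, not taken from the slides): if A = 1000 and B = 0.7, then (A+B)/A = 1000.7/1000 = 1.0007, so any approximation algorithm with a guarantee of 1+ε for ε ≥ 0.0007 is still allowed to return the weight-A solution, i.e. to set graduatedFrom(Surajit, Princeton) to true even though the optimum sets graduatedFrom(Surajit, Stanford) to true. This is why the algorithm on the next slide treats the mutex constraints as genuinely hard constraints rather than as high-weight soft clauses.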

  11. URDF: MaxSAT Solving with Soft & Hard Rules
Special case: Horn clauses as soft rules & mutex constraints as hard rules
• Find: argmax_y P(y | x)
• Resolves to a variant of MaxSAT for propositional formulas
S: Mutex constraints
  { graduatedFrom(Surajit, Stanford), graduatedFrom(Surajit, Princeton) }
  { graduatedFrom(David, Stanford), graduatedFrom(David, Princeton) }
C: Weighted Horn clauses (CNF)
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(Surajit, Stanford)) [0.4]
  (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ graduatedFrom(David, Stanford)) [0.4]
  worksAt(Jeff, Stanford) [0.9]
  hasAdvisor(Surajit, Jeff) [0.8]
  hasAdvisor(David, Jeff) [0.7]
  graduatedFrom(Surajit, Princeton) [0.6]
  graduatedFrom(Surajit, Stanford) [0.7]
  graduatedFrom(David, Princeton) [0.9]
MaxSAT Algorithm:
  Compute W_0 = ∑_clauses C w(C) · P(C is satisfied);
  For each hard constraint S_t {
    For each fact f in S_t {
      Compute W_f^+t = ∑_clauses C w(C) · P(C is satisfied | f = true);
    }
    Compute W_S^-t = ∑_clauses C w(C) · P(C is satisfied | S_t = false);
    Choose the truth assignment to the facts in S_t that maximizes over the W_f^+t and W_S^-t;
    Remove satisfied clauses C;
    t++;
  }
• Runtime: O(|S||C|)
• Approximation guarantee of 1/2
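A compact, runnable sketch of this greedy procedure on the toy example, assuming independence of the fact confidences; the fact names, the data layout, and the default prior of 0.5 for derived facts are choices of this sketch, not the URDF implementation.

```python
# Minimal sketch of the MaxSAT heuristic above: facts carry confidence values,
# clauses are weighted disjunctions of (possibly negated) facts, and hard constraints
# are mutex sets ("at most one fact true"). For each mutex set we pick the option
# (one fact true, or all false) that maximizes the expected weight of satisfied
# clauses, then fix the assignment.

# Weighted clauses: (weight, [(fact, positive?)]).
clauses = [
    (0.4, [("advisor_Surajit_Jeff", False), ("works_Jeff_Stanford", False), ("grad_Surajit_Stanford", True)]),
    (0.4, [("advisor_David_Jeff", False), ("works_Jeff_Stanford", False), ("grad_David_Stanford", True)]),
    (0.9, [("works_Jeff_Stanford", True)]),
    (0.8, [("advisor_Surajit_Jeff", True)]),
    (0.7, [("advisor_David_Jeff", True)]),
    (0.6, [("grad_Surajit_Princeton", True)]),
    (0.7, [("grad_Surajit_Stanford", True)]),
    (0.9, [("grad_David_Princeton", True)]),
]

# Hard mutex constraints: at most one fact of each set may be true.
mutexes = [
    {"grad_Surajit_Stanford", "grad_Surajit_Princeton"},
    {"grad_David_Stanford", "grad_David_Princeton"},
]

# Prior confidences of the base facts (from the slides). Derived facts without a
# confidence get a neutral default of 0.5 -- an assumption of this sketch.
prior = {"advisor_Surajit_Jeff": 0.8, "advisor_David_Jeff": 0.7, "works_Jeff_Stanford": 0.9,
         "grad_Surajit_Stanford": 0.7, "grad_Surajit_Princeton": 0.6, "grad_David_Princeton": 0.9}

def p_true(fact, assignment):
    """Probability that a fact holds: 1/0 if already assigned, otherwise its prior."""
    if fact in assignment:
        return 1.0 if assignment[fact] else 0.0
    return prior.get(fact, 0.5)

def expected_weight(assignment):
    """W = sum over clauses of w(C) * P(C is satisfied), assuming fact independence."""
    total = 0.0
    for w, literals in clauses:
        p_unsatisfied = 1.0
        for fact, positive in literals:
            p_lit = p_true(fact, assignment) if positive else 1.0 - p_true(fact, assignment)
            p_unsatisfied *= 1.0 - p_lit
        total += w * (1.0 - p_unsatisfied)
    return total

assignment = {}
print("W0 (under the prior confidences):", round(expected_weight(assignment), 3))
for mutex in mutexes:
    # Candidate assignments for this hard constraint: exactly one fact true, or all false.
    candidates = [{f: (f == chosen) for f in mutex} for chosen in mutex]
    candidates.append({f: False for f in mutex})
    best = max(candidates, key=lambda cand: expected_weight({**assignment, **cand}))
    assignment.update(best)

print(assignment)
print("expected satisfied weight:", round(expected_weight(assignment), 3))
```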

  12. Deductive Grounding Algorithm (SLD Resolution/Datalog)
First-Order Rules:
  hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z) [0.4]
  graduatedFrom(x,y) ∧ graduatedFrom(x,z) → y=z
Query: graduatedFrom(Surajit, y)
Base Facts:
  graduatedFrom(Surajit, Princeton) [0.7], graduatedFrom(Surajit, Stanford) [0.6], graduatedFrom(David, Princeton) [0.9]
  hasAdvisor(Surajit, Jeff) [0.8], hasAdvisor(David, Jeff) [0.7], worksAt(Jeff, Stanford) [0.9]
  type(Princeton, University) [1.0], type(Stanford, University) [1.0]
  type(Jeff, Computer_Scientist) [1.0], type(Surajit, Computer_Scientist) [1.0], type(David, Computer_Scientist) [1.0]
SLD tree: the query resolves directly against the base facts graduatedFrom(Surajit, Princeton) and graduatedFrom(Surajit, Stanford), and additionally against the rule body hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford)
Grounded Rules:
  hasAdvisor(Surajit, Jeff) ∧ worksAt(Jeff, Stanford) → gradFrom(Surajit, Stanford)
  ¬gradFrom(Surajit, Stanford) ∨ ¬gradFrom(Surajit, Princeton)
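For illustration, a toy backward-chaining grounding of the soft rule against the base facts above, under simplifying assumptions (a single non-recursive rule, ground facts, and the convention that lower-case arguments are variables); the representation is invented for this sketch and is not URDF's API.

```python
# Toy deductive grounding of the soft rule
#   hasAdvisor(x,y) ∧ worksAt(y,z) → graduatedFrom(x,z)  [0.4]
# against the base facts, for a query pattern with a free variable.

facts = {
    ("graduatedFrom", "Surajit", "Princeton"): 0.7,
    ("graduatedFrom", "Surajit", "Stanford"): 0.6,
    ("graduatedFrom", "David", "Princeton"): 0.9,
    ("hasAdvisor", "Surajit", "Jeff"): 0.8,
    ("hasAdvisor", "David", "Jeff"): 0.7,
    ("worksAt", "Jeff", "Stanford"): 0.9,
}

rule_body = [("hasAdvisor", "x", "y"), ("worksAt", "y", "z")]
rule_head = ("graduatedFrom", "x", "z")
rule_weight = 0.4

def match(pattern, atom, subst):
    """Extend the substitution so that the pattern equals the ground atom, or return None."""
    if pattern[0] != atom[0]:
        return None
    new = dict(subst)
    for p, a in zip(pattern[1:], atom[1:]):
        if p[0].islower():                       # lower-case argument = variable
            if new.setdefault(p, a) != a:
                return None
        elif p != a:                             # constant mismatch
            return None
    return new

def answers_query(query, atom):
    """Does a ground atom answer a query pattern (lower-case arguments are wildcards)?"""
    return query[0] == atom[0] and all(q[0].islower() or q == a for q, a in zip(query[1:], atom[1:]))

def ground(query):
    """Resolve the body literals left to right against the facts (SLD-style for one rule)."""
    substitutions = [{}]
    for literal in rule_body:
        substitutions = [s2 for s in substitutions for atom in facts
                         if (s2 := match(literal, atom, s)) is not None]
    grounded = []
    for s in substitutions:
        head = (rule_head[0],) + tuple(s[v] for v in rule_head[1:])
        if answers_query(query, head):
            body = [(lit[0],) + tuple(s[v] for v in lit[1:]) for lit in rule_body]
            grounded.append((rule_weight, body, head))
    return grounded

# Query graduatedFrom(Surajit, z): direct base-fact answers plus one grounded rule.
query = ("graduatedFrom", "Surajit", "z")
print("base facts:", [f for f in facts if answers_query(query, f)])
for weight, body, head in ground(query):
    print(body, "->", head, f"[{weight}]")
```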

  13. Dependency Graph of a Query
• SLD grounding always starts from a query literal and first resolves it against the soft deduction rules.
• Grounding is also iterated over the hard rules in a top-down fashion, using the literals in each hard rule as new subqueries.
• Cycles (due to recursive rules) are detected and resolved via a form of tabling known from Datalog (a minimal sketch follows this slide).
• Grounding terminates when a closure is reached, i.e., when no new facts can be grounded from the rules and all subgoals are either resolved or form the root of a cycle.
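A minimal sketch of the tabling idea mentioned above, with an invented symmetric rule that would otherwise loop forever; propositional strings stand in for the first-order literals URDF actually grounds.

```python
# Cycle handling via tabling: every subgoal is recorded before it is expanded, so a
# recursive/symmetric rule cannot send the grounding into an infinite loop. Rules map
# a subgoal to a list of alternative bodies; subgoals with no rules act as base facts.
# All names here are hypothetical.

rules = {
    # Symmetric (hence cyclic) hypothetical rule: collaboration goes both ways.
    "collaboratesWith(Jeff,Surajit)": [["collaboratesWith(Surajit,Jeff)"], ["coAuthored(Jeff,Surajit)"]],
    "collaboratesWith(Surajit,Jeff)": [["collaboratesWith(Jeff,Surajit)"], ["coAuthored(Surajit,Jeff)"]],
}

def ground(subgoal, table, depth=0):
    """Depth-first SLD-style expansion with a table of already-seen subgoals."""
    indent = "  " * depth
    if subgoal in table:
        print(f"{indent}{subgoal}  <- already tabled: root of a cycle, not expanded again")
        return
    print(f"{indent}{subgoal}")
    table.add(subgoal)
    for body in rules.get(subgoal, []):          # base facts have no rules: recursion ends here
        for literal in body:
            ground(literal, table, depth + 1)

ground("collaboratesWith(Jeff,Surajit)", set())
```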

  14. Weighted MaxSAT Algorithm
General idea: iterate over all hard rules S_t and, using a potential function W_t, set the fact f ∈ S_t that maximizes W_t (or none of them) to true; set all other facts in S_t to false.
• At iteration 0, we have W_0 = ∑_clauses C w(C) · P(C is satisfied).
• At any intermediate iteration t, we compare W_f^+t = ∑_clauses C w(C) · P(C is satisfied | f = true) for each f ∈ S_t against W_S^-t = ∑_clauses C w(C) · P(C is satisfied | S_t = false).
• At the final iteration t_max, all facts are assigned either true or false.
• W_t_max is equal to the total weight of all clauses that are satisfied.

  15. Step 1
• Weights w(f_i) and probabilities p_i
S: Mutex constraints
  { gradFrom(Surajit, Stanford), gradFrom(Surajit, Princeton) }
  { gradFrom(David, Stanford), gradFrom(David, Princeton) }
C: Weighted Horn clauses (CNF)
  (¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford)) [0.4]
  (¬hasAdvisor(David, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(David, Stanford)) [0.4]
  worksAt(Jeff, Stanford) [0.9]
  hasAdvisor(Surajit, Jeff) [0.8]
  hasAdvisor(David, Jeff) [0.7]
  gradFrom(Surajit, Princeton) [0.6]
  gradFrom(Surajit, Stanford) [0.7]
  gradFrom(David, Princeton) [0.9]

  16. Step 2
• Weights w(f_i) and probabilities p_i; mutex constraints and weighted Horn clauses as in Step 1

  17. Step 2 (continued)
• Weights w(f_i) and probabilities p_i; mutex constraints and weighted Horn clauses as in Step 1
• Probability that a grounded clause is satisfied, e.g. for
  C1: ¬hasAdvisor(Surajit, Jeff) ∨ ¬worksAt(Jeff, Stanford) ∨ gradFrom(Surajit, Stanford):
  P(C1) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1
  (a literal over a fact from a single partition contributes 1 - p_i if negated and p_i if positive)

  18. Step 2 (continued)
• Weights w(f_i) and probabilities p_i; mutex constraints and weighted Horn clauses as in Step 1
• P(C1 is satisfied) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1
• P(C2 is satisfied) = 1 - (1-(1-1))(1-(1-1))(1-0) = 0
• ...
• W_0 = 0.4 + 0.9 + 0.8 + 0.7 + 0.6 + 0.7 + 0.9 = 5.0

  19. Step 3
• Weights w(f_i), probabilities p_i, and truth values; mutex constraints and weighted Horn clauses as in Step 1
• P(C1 is satisfied | f1 = true) = 1 - (1-(1-1))(1-(1-1))(1-1) = 1
• P(C1 is satisfied | f2 = true) = 1 - (1-(1-1))(1-(1-1))(1-0) = 0
• ...
• W_1 = 0.4 + 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.8
• W_2 = 0.4 + 0.9 + 0.8 + 0.7 + 0.7 + 0.9 = 4.4

  20. Experiments: Setup
• YAGO Knowledge Base: 2 million entities, 20 million facts
• Soft Rules: 16 soft rules (hand-crafted deduction rules with weights)
• Hard Rules: 5 predicates with functional properties (bornIn, diedIn, bornOnDate, diedOnDate, marriedTo)
• Queries: 10 conjunctive SPARQL queries
• Markov Logic as competitor (based on MCMC)
  • MAP inference: Alchemy employs a form of MaxWalkSAT
  • MC-SAT: iterative MaxSAT & Gibbs sampling

  21. YAGO Knowledge Base: URDF vs. Markov Logic
• First run: ground each query against the rules (SLD grounding + MaxSAT solving) and report the sum of runtimes
• Asymptotic runtime checks: synthetic soft rule expansions
• Plots (omitted): URDF (SLD grounding & MaxSAT solving) vs. Markov Logic (MAP inference & MC-SAT); |C| = # ground literals in soft rules, |S| = # ground literals in hard rules

  22. Recursive Rules & LUBM Benchmark
• 42 inductively learned (partly recursive) rules over 20 million facts in YAGO; URDF grounding with different maximum SLD levels
• URDF (SLD grounding + MaxSAT) vs. Jena (grounding only) over the LUBM benchmark
  • SF-1: 103,397 triples
  • SF-5: 646,128 triples
  • SF-10: 1,316,993 triples

  23. Current & Future Topics
• Temporal consistency reasoning: soft/hard rules with temporal predicates
• Soft deduction rules: deduce the confidence distribution of derived facts
• Learning soft rules & consistency constraints: explore how Inductive Logic Programming can be applied to large, uncertain & incomplete knowledge bases
• More solving/sampling: linear-time constrained & weighted MaxSAT solver; improved Gibbs sampling with soft & hard rules
• Scale-out: distributed grounding via message passing
• Updates/versioning for (linked) RDF data
• Non-monotonic answers for rules with negation!

  24. Online Demo! urdf.mpi-inf.mpg.de
