Learning Markov Logic Network Structure Via Hypergraph Lifting. Stanley Kok, Dept. of Computer Science and Eng., University of Washington, Seattle, USA. Joint work with Pedro Domingos.
Goal of LHL • Input: relational DB with tables Advises, Teaches, and TAs over professors (Pete, Paul, Pat, Phil), students (Sam, Sara, Saul, Sue), and courses (CS1 to CS8) • Output: probabilistic KB, e.g. 2.7 Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s); 1.4 Advises(p, s) ⇒ Teaches(p, c) ∧ TAs(s, c); -1.1 TAs(s, c) ⇒ Advises(s, p); …
Synopsis of LHL [Figure: the relational DB drawn as a hypergraph over these constants and "lifted" into the clusters Professor, Student, and Course, connected by Teaches, Advises, and TAs]
Experimental Results [Figure: bar charts comparing LHL, BUSL, and MSL on conditional log-likelihood (CLL) and area under precision-recall curve (AUC)]
Outline • Background • Learning via Hypergraph Lifting • Experiments • Future Work
Markov Logic • A logical KB is a set of hard constraints on the set of possible worlds • Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible • Give each formula a weight (higher weight ⇒ stronger constraint)
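Markov Logic • A Markov logic network (MLN) is a set of pairs (F, w) • F is a formula in first-order logic • w is a real number • $P(X = x) = \frac{1}{Z} \exp\left(\sum_i w_i \, n_i(x)\right)$, where $x$ is a vector of truth assignments to ground atoms, $n_i(x)$ is the number of true groundings of the $i$th formula, $w_i$ is the weight of the $i$th formula, and $Z$ is the partition function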
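An MLN gives a world x a probability proportional to exp(Σ_i w_i n_i(x)), normalized by the partition function Z. The following is a minimal sketch that computes this by brute-force enumeration on a tiny domain. The single formula, its weight of 1.5, and the one-professor/one-student/one-course domain are illustrative assumptions, not taken from the talk.

```python
import itertools
import math

# Hypothetical toy domain: one professor, one student, one course.
ATOMS = ["Teaches(Pete,CS1)", "TAs(Sam,CS1)", "Advises(Pete,Sam)"]

def n_true_groundings(world):
    """Count true groundings of Teaches(p,c) ^ TAs(s,c) => Advises(p,s).

    With a single professor, student, and course there is exactly one grounding.
    """
    body = world["Teaches(Pete,CS1)"] and world["TAs(Sam,CS1)"]
    return 1 if (not body) or world["Advises(Pete,Sam)"] else 0

def world_probabilities(weight):
    """Return {world: P(x)} with P(x) = exp(weight * n(x)) / Z."""
    unnormalized = {}
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        world = dict(zip(ATOMS, values))
        unnormalized[values] = math.exp(weight * n_true_groundings(world))
    z = sum(unnormalized.values())  # partition function
    return {values: u / z for values, u in unnormalized.items()}

probs = world_probabilities(weight=1.5)  # 1.5 is an assumed weight
violating = (True, True, False)   # body true, head false
satisfying = (True, True, True)   # body true, head true
print(probs[violating], probs[satisfying])
```

The violating world keeps a non-zero probability; it is merely less probable than the satisfying one by a factor of exp(1.5), which is the "soft constraint" behaviour described above.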
MLN Structure Learning • Challenging task • Few approaches to date [Kok & Domingos, ICML'05; Mihalkova & Mooney, ICML'07; Biba et al., ECAI'08; Huynh & Mooney, ICML'08] • Most MLN structure learners greedily and systematically enumerate formulas • Computationally expensive; large search space • Susceptible to local optima
MSL [Kok & Domingos, ICML'05]
While beam not empty
    Add unit clauses to beam
    While beam has changed
        For each clause c in beam
            c' ← add a literal to c
            newClauses ← newClauses ∪ c'
        beam ← k best clauses in beam ∪ newClauses
    Add best clause in beam to MLN
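The inner beam-search loop of the pseudocode can be sketched in Python as follows. This is only an illustration under stated assumptions: clauses are modelled as sets of literal strings, `toy_score` is a hypothetical stand-in for the real clause score, and `beam_width` and `max_len` are made-up parameters. It covers one round of finding a best clause, not the outer loop that adds clauses to the MLN.

```python
# Hypothetical literal vocabulary ("!" marks negation).
LITERALS = ["Advises(p,s)", "!Advises(p,s)", "Teaches(p,c)",
            "!Teaches(p,c)", "TAs(s,c)", "!TAs(s,c)"]
# Assumed "ideal" clause used only to define a toy score.
TARGET = frozenset({"Advises(p,s)", "!Teaches(p,c)", "!TAs(s,c)"})

def toy_score(clause):
    """Stand-in score: negative size of the symmetric difference with TARGET."""
    return -len(clause ^ TARGET)

def beam_search(literals, score, beam_width=3, max_len=3):
    """Return the best clause found by beam search over sets of literals."""
    beam = [frozenset([lit]) for lit in literals]  # start from unit clauses
    best = max(beam, key=score)
    changed = True
    while changed:
        new_clauses = set()
        for clause in beam:
            if len(clause) >= max_len:
                continue
            for lit in literals:
                if lit not in clause:
                    new_clauses.add(clause | {lit})  # c' = c plus one literal
        candidates = set(beam) | new_clauses
        # Keep the k best clauses; ties are broken alphabetically for determinism.
        new_beam = sorted(candidates, key=lambda c: (-score(c), sorted(c)))[:beam_width]
        changed = set(new_beam) != set(beam)
        beam = new_beam
        if score(beam[0]) > score(best):
            best = beam[0]
    return best

print(sorted(beam_search(LITERALS, toy_score)))
```

Even on six literals the candidate set grows quickly with clause length, which is the "large search space" concern raised for systematic enumeration.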
Relational Pathfinding [Richards & Mooney, AAAI'92] • Find paths of linked ground atoms → formulas • Path ≡ conjunction that is true at least once • Exponential search space of paths • Restricted to short paths • Example path: Advises(Pete, Sam) ∧ Teaches(Pete, CS1) ∧ TAs(Sam, CS1); as a formula: Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c) [Figure: hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1 to CS8 with Teaches, Advises, and TAs hyperedges]
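A minimal sketch of relational pathfinding on the slide's three-atom example: enumerate sets of ground atoms in which each new atom shares a constant with the atoms already on the path, then replace constants with variables. The function names, the `max_len` cutoff, and the generated variable names (`v0`, `v1`, …) are assumptions made for illustration.

```python
# True ground atoms from the slide's example, as (predicate, arguments).
DB = [
    ("Advises", ("Pete", "Sam")),
    ("Teaches", ("Pete", "CS1")),
    ("TAs", ("Sam", "CS1")),
]

def find_paths(db, max_len):
    """Enumerate paths: sets of atoms where each added atom shares a constant
    with the atoms already on the path."""
    paths = set()

    def extend(path, constants):
        paths.add(frozenset(path))
        if len(path) == max_len:
            return
        for atom in db:
            if atom not in path and constants & set(atom[1]):
                extend(path + [atom], constants | set(atom[1]))

    for atom in db:
        extend([atom], set(atom[1]))
    return paths

def variabilize(path):
    """Replace each distinct constant by a variable and join atoms with '^'."""
    names = {}
    literals = []
    for pred, args in sorted(path):
        for a in args:
            if a not in names:
                names[a] = f"v{len(names)}"
        literals.append(f"{pred}({', '.join(names[a] for a in args)})")
    return " ^ ".join(literals)

paths = find_paths(DB, max_len=3)
longest = max(paths, key=len)
print(len(paths), variabilize(longest))
```

The recursion revisits the same atom set in several orders, which hints at why the search space is exponential and why the method is restricted to short paths.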
BUSL [Mihalkova & Mooney, ICML'07] • Find short paths with a form of relational pathfinding • Path → Boolean variable → node in Markov network • Greedily tries to link the nodes with edges • Cliques → clauses: form disjunctions of atoms in clique's nodes • Greedily adds clauses to an empty MLN • Example: nodes Advises(p, s) ∧ Teaches(p, c), TAs(s, c), … give clauses Advises(p, s) ∨ Teaches(p, c) ∨ TAs(s, c), ¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c), …
Outline • Background • Learning via Hypergraph Lifting • Experiments • Future Work
Learning via Hypergraph Lifting (LHL) • Uses relational pathfinding to a fuller extent • Induces a hypergraph over clusters of constants [Figure: the ground hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1 to CS8 is "lifted" into a hypergraph over clusters linked by Teaches, Advises, and TAs]
Learning via Hypergraph Lifting (LHL) • Uses a hypergraph (V, E) • V: set of nodes • E: set of labeled, non-empty, ordered subsets of V • Find paths in a hypergraph • Path: set of hyperedges s.t. for any two hyperedges e0 and en, ∃ a sequence of hyperedges in the set that leads from e0 to en
Learning via Hypergraph Lifting (LHL) • Relational DB can be viewed as hypergraph • Nodes ≡ constants • Hyperedges ≡ true ground atoms [Figure: DB tables Advises, Teaches, and TAs next to the corresponding hypergraph over Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1 to CS8]
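The DB-to-hypergraph view can be sketched as a small data structure: each constant becomes a node and each true ground atom becomes a labeled, ordered hyperedge. The rows below are illustrative toy data that reuse names from the slides; `Hyperedge` and `db_to_hypergraph` are hypothetical names.

```python
from collections import namedtuple

# A hyperedge is a labeled, ordered tuple of nodes.
Hyperedge = namedtuple("Hyperedge", ["label", "nodes"])

def db_to_hypergraph(true_atoms):
    """Nodes are constants; hyperedges are true ground atoms."""
    nodes = set()
    edges = []
    for predicate, args in true_atoms:
        nodes.update(args)
        edges.append(Hyperedge(predicate, tuple(args)))
    return nodes, edges

# Illustrative rows only (assumed, not a faithful copy of the slide's tables).
true_atoms = [
    ("Advises", ("Pete", "Sam")),
    ("Teaches", ("Pete", "CS1")),
    ("TAs", ("Sam", "CS1")),
    ("Teaches", ("Paul", "CS2")),
]

nodes, edges = db_to_hypergraph(true_atoms)
print(sorted(nodes))
print(edges[0])
```

Keeping `nodes` as an ordered tuple inside each hyperedge preserves argument order, which matters because Advises(Pete, Sam) and Advises(Sam, Pete) are different atoms.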
LHL = Clustering + Relational Pathfinding • LHL "lifts" hypergraph into a more compact representation • Jointly clusters nodes into higher-level concepts • Clusters hyperedges • Traces paths in lifted hypergraph [Figure: ground hypergraph "lifted" into a hypergraph over clusters]
Learning via Hypergraph Lifting • LHL has three components • LiftGraph: lifts hypergraph • FindPaths: finds paths in lifted hypergraph • CreateMLN: creates rules from paths, and adds good ones to empty MLN
LiftGraph • Defined using Markov logic • Jointly clusters constants in bottom-up agglomerative manner • Allows information to propagate from one cluster to another • Ground atoms also clustered • Number of clusters need not be specified in advance • Each lifted hyperedge contains ≥ one true ground atom
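To make the notion of a lifted hyperedge concrete, the sketch below maps ground hyperedges to hyperedges over clusters, given a cluster assignment. The assignment here is written by hand purely for illustration; in LiftGraph it is the result of the search, which this sketch does not implement. The names `lift` and `cluster_of` are hypothetical.

```python
# Ground hyperedges as (label, ordered nodes); toy data reusing slide names.
edges = [
    ("Teaches", ("Pete", "CS1")),
    ("Teaches", ("Paul", "CS2")),
    ("Advises", ("Pete", "Sam")),
    ("TAs", ("Sam", "CS1")),
]

# Hand-written cluster assignment (an assumption; LiftGraph would learn this).
cluster_of = {
    "Pete": "Professor", "Paul": "Professor",
    "Sam": "Student",
    "CS1": "Course", "CS2": "Course",
}

def lift(ground_edges, assignment):
    """Group ground hyperedges by label and by the clusters of their nodes."""
    lifted = {}
    for label, nodes in ground_edges:
        key = (label, tuple(assignment[n] for n in nodes))
        lifted.setdefault(key, []).append((label, nodes))
    return lifted

lifted = lift(edges, cluster_of)
for key, members in lifted.items():
    print(key, len(members))
```

Because lifted hyperedges are only created from existing ground hyperedges, each one contains at least one true ground atom by construction.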
Learning Problem in LiftGraph • Find cluster assignment C that maximizes posterior prob. P(C | D) ∝ P(D | C) P(C) • D: truth values of ground atoms • P(D | C): defined with an MLN • P(C): defined with another MLN
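Since P(D) does not depend on C, maximizing the posterior is the same as maximizing the sum of the log-likelihood and the log-prior. Written out (the symbol C* for the maximizer is introduced here for convenience):

```latex
C^{*} = \arg\max_{C} P(C \mid D)
      = \arg\max_{C} \left[ \log P(D \mid C) + \log P(C) \right]
```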
LiftGraph's P(D|C) MLN • For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule, e.g. p ∈ Professor ∧ c ∈ Course ⇒ Teaches(p, c) [Figure: clusters Professor (Pete, Paul, Pat, Phil), Course (CS1 to CS8), and Student (Sam, Sara, Saul, Sue) linked by lifted hyperedges Teaches, Advises, and TAs]
LiftGraph's P(D|C) MLN • For each predicate r, we have a default atom prediction rule • Default cluster combination (e.g. x ∈ Professor ∧ y ∈ Professor, …) ⇒ Teaches(x, y)
LiftGraph's P(C) MLN • Each symbol belongs to exactly one cluster: infinite weight • Exponential prior on number of cluster combinations: negative weight −λ
LiftGraph • Hard assignments of constants to clusters • Weights and log-posterior computed in closed form • Searches for cluster assignment with highest log-posterior
LiftGraph's Search Algorithm [Figure: animation of the search on a small hypergraph with constants Pete, Paul, Sam, Sara, CS1, CS2, CS3 and hyperedges Teaches and Advises]
FindPaths • Paths found: Advises( , ) • Advises( , ), Teaches( , ) • Advises( , ), Teaches( , ), TAs( , ) [Figure: lifted hypergraph; each argument slot is a cluster, namely {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, or {CS1 to CS8}]
Clause Creation • Path: Advises( , ), Teaches( , ), TAs( , ) over the clusters {Pete, Paul, Pat, Phil}, {Sam, Sara, Saul, Sue}, and {CS1 to CS8} • Replace each cluster with a variable (p, s, c): Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c) • Clauses: ¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c); Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c); Advises(p, s) ∨ Teaches(p, c) ∨ ¬TAs(s, c); …
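The clause-creation step can be sketched as generating sign variants of the variabilized path. The slide lists a few variants followed by "…"; the sketch below assumes that every combination of negated and non-negated atoms is generated, which may not match the exact procedure. The function name and the "!" and "v" notation are illustrative.

```python
from itertools import product

def clauses_from_path(atoms):
    """Return one clause per way of negating a subset of the path's atoms."""
    clauses = []
    for signs in product([True, False], repeat=len(atoms)):
        literals = [a if positive else "!" + a for a, positive in zip(atoms, signs)]
        clauses.append(" v ".join(literals))
    return clauses

path_atoms = ["Advises(p, s)", "Teaches(p, c)", "TAs(s, c)"]
clauses = clauses_from_path(path_atoms)
for clause in clauses:
    print(clause)
```

A path of n atoms yields 2^n candidate clauses under this assumption, which is one reason a pruning step follows.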
Clause Pruning • Compare each clause against its sub-clauses (taken individually)
Score   Clause
-1.15   ¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)
-1.17   Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)
…       …
-2.21   ¬Advises(p, s) ∨ ¬Teaches(p, c)
-2.23   ¬Advises(p, s) ∨ TAs(s, c)
-2.03   ¬Teaches(p, c) ∨ TAs(s, c)
…       …
-3.13   ¬Advises(p, s)
-2.93   ¬Teaches(p, c)
-3.93   TAs(s, c)
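A minimal sketch of the comparison, using the scores shown on the slide for one clause and its sub-clauses. The slide does not spell out the keep/discard rule; the sketch assumes a clause is kept only if it scores higher than each of its sub-clauses taken individually. The short literal names and the pairing of the three unit-clause scores are as read from the slide.

```python
from itertools import combinations

# Scores as read from the slide ("!" marks negation; names abbreviated).
SCORES = {
    frozenset({"!Advises", "!Teaches", "TAs"}): -1.15,
    frozenset({"!Advises", "!Teaches"}): -2.21,
    frozenset({"!Advises", "TAs"}): -2.23,
    frozenset({"!Teaches", "TAs"}): -2.03,
    frozenset({"!Advises"}): -3.13,
    frozenset({"!Teaches"}): -2.93,
    frozenset({"TAs"}): -3.93,
}

def sub_clauses(clause):
    """Yield every non-empty proper subset of the clause's literals."""
    for k in range(1, len(clause)):
        for subset in combinations(sorted(clause), k):
            yield frozenset(subset)

def keep(clause, scores):
    """Assumed rule: keep a clause only if it beats each scored sub-clause."""
    return all(scores[clause] > scores[sub]
               for sub in sub_clauses(clause) if sub in scores)

full = frozenset({"!Advises", "!Teaches", "TAs"})
print(keep(full, SCORES))
```

Under this assumed rule the three-literal clause survives, since -1.15 is higher than every one of its six sub-clause scores.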
MLN Creation • Add clauses to empty MLN in order of decreasing score • Retrain weights of clauses each time a clause is added • Retain clause in MLN if overall score improves
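The greedy loop described here can be sketched as follows. Weight retraining and the overall score are collapsed into a single hypothetical `evaluate` callback; the toy `gain` values and the 0.5-per-clause penalty are made up so the example runs, and do not reflect the real scoring function.

```python
def create_mln(candidates, evaluate):
    """Greedily build an MLN.

    candidates: list of (clause, clause_score) pairs.
    evaluate:   hypothetical callback that retrains weights for a clause list
                and returns its overall score.
    """
    mln = []
    best = evaluate(mln)
    # Visit clauses in order of decreasing clause score.
    for clause, _ in sorted(candidates, key=lambda cs: cs[1], reverse=True):
        trial = mln + [clause]
        trial_score = evaluate(trial)
        if trial_score > best:  # retain the clause only if the score improves
            mln, best = trial, trial_score
    return mln

# Toy stand-in for "retrain and score": assumed per-clause gains minus a
# size penalty.
gain = {"a": 1.0, "b": 0.3, "c": 0.8}

def toy_evaluate(mln):
    return sum(gain[c] for c in mln) - 0.5 * len(mln)

candidates = [("a", -1.15), ("b", -2.0), ("c", -1.17)]
print(create_mln(candidates, toy_evaluate))
```

In this toy run clause "b" is tried last and rejected because adding it lowers the overall score.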
Outline • Background • Learning via Hypergraph Lifting • Experiments • Future Work
Datasets • IMDB: created from IMDB.com DB; movies, actors, etc., and relationships; 17,793 ground atoms; 1224 true ones • UW-CSE: describes academic department; students, faculty, etc., and relationships; 260,254 ground atoms; 2112 true ones
Datasets • Cora: citations to computer science papers; papers, authors, titles, etc., and their relationships; 687,422 ground atoms; 42,558 true ones
Methodology • Five-fold cross validation • Inferred prob. true for groundings of each predicate, with groundings of all other predicates as evidence • Evaluation measures: area under precision-recall curve (AUC) and average conditional log-likelihood (CLL)
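The two evaluation measures can be sketched from a list of predicted probabilities and true values. This is an illustration under stated assumptions, not the evaluation code used in the experiments: the clamping constant `eps`, the trapezoidal integration, the (recall 0, precision 1) starting point, and the four data points are all choices made here.

```python
import math

def average_cll(probs, truths, eps=1e-6):
    """Average log-probability assigned to each atom's true value."""
    total = 0.0
    for p, t in zip(probs, truths):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0); eps is assumed
        total += math.log(p if t else 1 - p)
    return total / len(probs)

def pr_auc(probs, truths):
    """Area under the precision-recall curve via the trapezoidal rule.

    Atoms are ranked by predicted probability; ties are not treated specially.
    """
    ranked = sorted(zip(probs, truths), key=lambda pt: -pt[0])
    n_pos = sum(truths)
    tp = 0
    points = [(0.0, 1.0)]  # (recall, precision) starting point, by convention
    for i, (_, t) in enumerate(ranked, start=1):
        tp += t
        points.append((tp / n_pos, tp / i))
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area

# Made-up predictions for four test atoms (1 = true, 0 = false).
probs = [0.9, 0.8, 0.3, 0.1]
truths = [1, 1, 0, 0]
print(average_cll(probs, truths), pr_auc(probs, truths))
```

CLL rewards well-calibrated probabilities on every atom, while AUC depends only on how the true atoms are ranked, which is useful when true atoms are a small fraction of all ground atoms, as in these datasets.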
Methodology • MCMC inference algorithms in Alchemy to evaluate the test atoms • 1 million samples • 24 hours
Methodology • Compared with MSL [Kok & Domingos, ICML'05] and BUSL [Mihalkova & Mooney, ICML'07] • Lesion study • NoLiftGraph: LHL with no hypergraph lifting; finds paths directly from unlifted hypergraph • NoPathFinding: LHL with no pathfinding; uses MLN representing LiftGraph
LHL vs. BUSL vs. MSL: Area under Prec-Recall Curve [Figure: bar charts for IMDB, UW-CSE, and Cora comparing LHL, BUSL, and MSL]
LHL vs. BUSL vs. MSL: Conditional Log-likelihood [Figure: bar charts for IMDB, UW-CSE, and Cora comparing LHL, BUSL, and MSL]
LHL vs. BUSL vs. MSL: Runtime [Figure: bar charts comparing LHL, BUSL, and MSL; IMDB in minutes, UW-CSE and Cora in hours]
LHL vs. NoLiftGraph: Area under Prec-Recall Curve [Figure: bar charts for IMDB, UW-CSE, and Cora comparing LHL and NoLiftGraph]
LHL vs. NoLiftGraph: Conditional Log-likelihood [Figure: bar charts for IMDB, UW-CSE, and Cora comparing LHL and NoLiftGraph]
LHL vs. NoLiftGraph: Runtime [Figure: bar charts comparing LHL and NoLiftGraph; IMDB in minutes, UW-CSE and Cora in hours]
LHL vs. NoPathFinding [Figure: bar charts of AUC and CLL on IMDB and UW-CSE comparing LHL and NoPathFinding]
Examples of Rules Learned • If a is an actor and d is a director, and they both worked in the same movie, then a probably worked under d • If p is a professor, and p co-authored a paper with s, then s is likely a student • If papers x and y have the same author, then x and y are likely to be the same paper
Outline • Motivation • Background • Learning via Hypergraph Lifting • Experiments • Future Work
Future Work • Integrate the components of LHL • Integrate LHL with lifted inference [Singla & Domingos, AAAI'08] • Construct ontology simultaneously with probabilistic KB • Further scale LHL up • Apply LHL to larger, richer domains, e.g., the Web
Conclusion • LHL = Clustering + Relational Pathfinding • "Lifts" data into more compact form • Essential for speeding up relational pathfinding • LHL outperforms state-of-the-art structure learners