
Learning the Structure of Markov Logic Networks

Stanley Kok. Overview: Introduction; CLAUDIEN, CRFs; Algorithm; Evaluation Measure; Clause Construction; Search Strategies; Speedup Techniques; Experiments.


Presentation Transcript


  1. Learning the Structure of Markov Logic Networks Stanley Kok

  2. Overview • Introduction • CLAUDIEN, CRFs • Algorithm • Evaluation Measure • Clause Construction • Search Strategies • Speedup Techniques • Experiments

  3. Introduction • Richardson & Domingos (2004) learned MLN structure in two disjoint steps: • Learn first-order (FO) clauses with an off-the-shelf ILP system (CLAUDIEN) • Learn clause weights by optimizing pseudo-likelihood • We develop an algorithm that: • Learns FO clauses by directly optimizing pseudo-likelihood • Is fast enough to be practical • Learns better structure than R&D, pure ILP, purely probabilistic and purely KB approaches

  4. CLAUDIEN • CLAUsal DIscovery ENgine • Starts with the trivially false clause • Repeatedly refines current clauses by adding literals • Adds clauses that satisfy minimum accuracy and coverage to the KB • [Diagram: refinement lattice] true ⇒ false; m ⇒ false; f ⇒ false; h ⇒ false; h ⇒ f; h ⇒ m; f ⇒ h; f ⇒ m; f ^ h ⇒ false; m ⇒ h; m ^ f ⇒ false; m ^ h ⇒ false; m ⇒ f; h ⇒ m v f

  5. CLAUDIEN • Language bias ≡ clause template • Can refine a handcrafted KB • Example: • Professor(P) ⇐ AdvisedBy(S,P) in KB • dlab_template('1-2:[Professor(P),Student(S)]<-AdvisedBy(S,P)') • Professor(P) v Student(S) ⇐ AdvisedBy(S,P)

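  6. Conditional Random Fields • Markov networks used to compute P(y|x) (McCallum 2003) • Model: $P(y \mid x) = \frac{1}{Z_x} \exp\left(\sum_k \lambda_k f_k(y, x)\right)$ • Features $f_k$, e.g. "current word is capitalized and next word is Inc" • [Diagram: chain of labels y1, y2, y3, …, yn-1, yn (Misc, Person, Misc, Org, Misc) over observations x1, x2, …, xn ("IBM hired Alice…")]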

  7. CRF – Feature Induction • Set of atomic features (word=the, capitalized, etc.) • Starts from an empty CRF • While the convergence criterion is not met: • Create list of new features consisting of • Atomic features • Binary conjunctions of atomic features • Conjunctions of atomic features with features already in model • Evaluate gain in P(y|x) of adding each feature to model • Add best K features to model (100s-1000s of features)

  8. Algorithm • High-level algorithm:
       Repeat
         Clauses <- FindBestClauses(MLN)
         Add Clauses to MLN
       Until Clauses = ∅
     • FindBestClauses(MLN):
         Search for, and create, candidate clauses
         For each candidate clause c
           Compute gain (evaluation measure) of adding c to MLN
         Return k clauses with highest gain
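
The loop on this slide can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: clauses are plain strings, the candidate set is fixed up front (the slides generate candidates by search), and `gain` is a hypothetical caller-supplied function standing in for the evaluation measure.

```python
from typing import Callable, List

Clause = str  # assumption: a clause is represented by its text


def find_best_clauses(mln: List[Clause], candidates: List[Clause],
                      gain: Callable[[List[Clause], Clause], float],
                      k: int) -> List[Clause]:
    """Return up to k candidates with the highest positive gain."""
    scored = [(gain(mln, c), c) for c in candidates if c not in mln]
    scored = [(g, c) for g, c in scored if g > 0]
    scored.sort(key=lambda gc: gc[0], reverse=True)
    return [c for _, c in scored[:k]]


def learn_structure(mln: List[Clause], candidates: List[Clause],
                    gain: Callable[[List[Clause], Clause], float],
                    k: int = 5) -> List[Clause]:
    """Repeat: add the best clauses to the MLN until none are returned."""
    mln = list(mln)
    while True:
        best = find_best_clauses(mln, candidates, gain, k)
        if not best:  # "Until Clauses = empty set"
            return mln
        mln.extend(best)


# Toy gain table (made-up numbers, for illustration only).
TOY_GAINS = {"A": 2.0, "B": 1.0, "C": -0.5}


def toy_gain(mln: List[Clause], clause: Clause) -> float:
    return TOY_GAINS[clause]
```

With the toy gains, `learn_structure([], ["A", "B", "C"], toy_gain, k=1)` adds `A`, then `B`, and stops because `C` has negative gain.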

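  9. Evaluation Measure • Ideally use log-likelihood, but it is slow to compute • Recall: $P_w(X=x) = \frac{1}{Z} \exp\left(\sum_i w_i n_i(x)\right)$, where $n_i(x)$ is the number of true groundings of clause $i$ in $x$ • Value: $\log P_w(X=x) = \sum_i w_i n_i(x) - \log Z$ • Gradient: $\frac{\partial}{\partial w_i} \log P_w(X=x) = n_i(x) - E_w[n_i(x)]$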

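  10. Evaluation Measure • Use pseudo-log-likelihood (R&D 2004), but • It gives undue weight to predicates with a large # of groundings • Recall: $\log P^{*}_w(X=x) = \sum_l \log P_w(X_l = x_l \mid MB_x(X_l))$, where $MB_x(X_l)$ is the state of the Markov blanket of ground predicate $X_l$ • E.g.: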

  11. Evaluation Measure • Use weighted pseudo-log-likelihood (WPLL) • E.g.:
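
To make the contrast between the two measures concrete, here is a small Python sketch. It assumes the per-predicate weight is the reciprocal of that predicate's number of groundings; the function names, the dictionary layout and the probabilities are all illustrative assumptions.

```python
import math
from typing import Dict, List, Optional


def pll(clls: Dict[str, List[float]]) -> float:
    """Plain pseudo-log-likelihood: sum of all conditional log-likelihoods."""
    return sum(sum(values) for values in clls.values())


def wpll(clls: Dict[str, List[float]],
         weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted PLL: each predicate's CLL sum is scaled by its weight.

    If no weights are given, use 1 / (number of groundings) per predicate
    (an assumption made for this sketch).
    """
    total = 0.0
    for pred, values in clls.items():
        c_r = weights[pred] if weights else 1.0 / len(values)
        total += c_r * sum(values)
    return total


# Made-up CLLs: AdvisedBy has more groundings than Professor.
toy_clls = {
    "AdvisedBy": [math.log(0.5)] * 4,
    "Professor": [math.log(0.25)] * 2,
}
```

In `pll(toy_clls)` the predicate with more groundings contributes four terms against two; in `wpll(toy_clls)` each predicate contributes only its average CLL.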

  13. Clause Construction • Add a literal (negative/positive) • All possible ways variables of new literal can be shared with those of clause • e.g. !Student(S) v AdvBy(S,P) • Remove a literal (when refining MLN) • Removes spurious conditions from rules • e.g. !Student(S) v !YrInPgm(S,5) v TA(S,C) v TmpAdvBy(S,P)

  14. Clause Construction • Flip signs of literals (when refining MLN) • Moves literals that are on the wrong side of the implication • e.g. !CseQtr(C1,Q1) v !CseQtr(C2,Q2) v !SameCse(C1,C2) v !SameQtr(Q1,Q2) • Done at the beginning of the algorithm • Expensive, optional • Limit # of distinct variables to restrict search space
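
The three clause-construction operators can be sketched as below. This is a simplified illustration under several assumptions: a literal is a `(positive, predicate, args)` tuple, argument types are ignored, every unshared argument slot gets its own fresh variable (named `_V1`, `_V2`, … and assumed not to clash with existing names), and signs are flipped one literal at a time.

```python
from itertools import product
from typing import List, Tuple

Literal = Tuple[bool, str, Tuple[str, ...]]  # (positive?, predicate, variables)
Clause = Tuple[Literal, ...]


def clause_vars(clause: Clause) -> List[str]:
    """Distinct variables of a clause, in order of first appearance."""
    seen: List[str] = []
    for _, _, args in clause:
        for a in args:
            if a not in seen:
                seen.append(a)
    return seen


def add_literal(clause: Clause, pred: str, arity: int,
                max_vars: int = 4) -> List[Clause]:
    """Add pred with either sign, sharing variables in every possible way."""
    old = clause_vars(clause)
    results: List[Clause] = []
    for slots in product(old + [None], repeat=arity):
        if old and all(s is None for s in slots):
            continue  # new literal must share at least one variable
        args, n_fresh = [], 0
        for s in slots:
            if s is None:
                n_fresh += 1
                args.append(f"_V{n_fresh}")  # fresh variable for this slot
            else:
                args.append(s)
        if len(old) + n_fresh > max_vars:
            continue  # limit on # of distinct variables
        for positive in (True, False):
            results.append(clause + ((positive, pred, tuple(args)),))
    return results


def remove_literal(clause: Clause) -> List[Clause]:
    """All clauses obtained by dropping exactly one literal."""
    if len(clause) < 2:
        return []
    return [clause[:i] + clause[i + 1:] for i in range(len(clause))]


def flip_sign(clause: Clause) -> List[Clause]:
    """All clauses obtained by flipping the sign of exactly one literal."""
    out = []
    for i, (positive, pred, args) in enumerate(clause):
        out.append(clause[:i] + ((not positive, pred, args),) + clause[i + 1:])
    return out
```

For example, extending `!Student(S)` with the binary predicate `AdvBy` yields, among others, `!Student(S) v AdvBy(S,_V1)`, which matches the slide's `!Student(S) v AdvBy(S,P)` up to variable naming.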

  16. Search Strategies • Shortest-first search (SFS) • Find gain of each clause in the candidate set • Sort clauses by gain • Return top 5 with positive gain • Add these 5 clauses to MLN • Retrain weights of MLN • (Yikes! All length-2 clauses have gains ≤ 0) • [Diagram: candidate set containing e.g. !AdvBy(S,P) v Stu(S); MLN: wt1, !AdvBy(S,P); wt2, clause2; …]

  17. Shortest-First Search • Extend the 20 length-2 clauses with highest gains • Form new candidate set • Keep the 1000 clauses with highest gains • [Diagram: !AdvBy(S,P) v Stu(S) extended to !AdvBy(S,P) v Stu(S) v Prof(P); MLN: wt1, !AdvBy(S,P); wt2, clause2; …]

  18. Shortest-First Search • Shortest-first search (SFS) • Repeat process • Extend all length-2 clauses before length-3 ones • How do you refine a non-empty MLN? • [Diagram: candidate set; MLN: wt1, clause1; wt2, clause2; …]

  19. SFS – MLN Refinement • (a) Extend the 20 length-2 clauses with highest gains • (b) Extend length-2 clauses in MLN • (c) Remove a predicate from length-4 clauses in MLN • (d) Flip signs of length-3 clauses in MLN (optional) • Clauses from (b), (c), (d) replace the original clause in MLN • [Diagram: MLN: wt1, !AdvBy(S,P); wt2, clause2; …; wtA, clauseA; wtB, clauseB; …]

  20. Search Strategies • Beam Search • Keep a beam of 5 clauses with highest gains • Track best clause • Stop when best clause does not change after two consecutive iterations • How do you refine a non-empty MLN? • [Diagram: MLN: wt1, clause1; wt2, clause2; …; wtA, clauseA; wtB, clauseB; …]
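
The beam search described on this slide can be sketched generically. The code below is an illustration under assumptions, not the author's implementation: `expand` and `gain` are hypothetical caller-supplied functions, clauses are any hashable objects, and only clauses with positive gain can become the best clause.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

C = TypeVar("C")


def beam_search(start: Iterable[C],
                expand: Callable[[C], List[C]],
                gain: Callable[[C], float],
                beam_width: int = 5,
                patience: int = 2) -> Tuple[Optional[C], float]:
    """Keep the beam_width highest-gain clauses per iteration.

    Stops when the best clause has not changed for `patience`
    consecutive iterations, or when the beam becomes empty.
    """
    beam = list(start)
    best, best_gain = None, 0.0
    unchanged = 0
    while beam and unchanged < patience:
        candidates = [c for parent in beam for c in expand(parent)]
        scored = sorted(((gain(c), c) for c in candidates),
                        key=lambda gc: gc[0], reverse=True)
        beam = [c for _, c in scored[:beam_width]]
        if scored and scored[0][0] > best_gain:
            best_gain, best = scored[0]
            unchanged = 0
        else:
            unchanged += 1
    return best, best_gain


# Toy problem: "clauses" are strings over the letters a, b, c.
def toy_expand(clause: str) -> List[str]:
    return [clause + ch for ch in "abc" if ch not in clause]


def toy_gain(clause: str) -> float:
    # Reward the useful letters a and b, penalize length.
    return len(set(clause) & set("ab")) - 0.4 * len(clause)
```

On the toy problem, `beam_search([""], toy_expand, toy_gain)` settles on `"ab"`: longer strings only add the length penalty, so the best clause stops changing and the search terminates.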

  22. Difference from CRF – Feature Induction • CRF feature induction (recap): • Set of atomic features (word=the, capitalized, etc.) • Start from empty CRF • While the convergence criterion is not met • Create list of new features consisting of • Atomic features • Binary conjunctions of atomic features • Conjunctions of atomic features with features already in model • Evaluate gain in P(y|x) of adding each feature to model • Add best K features to model (100s-1000s of features) • Differences in our approach: • We can refine a non-empty MLN • We use pseudo-likelihood; different optimizations • Applicable to arbitrary Markov networks (not only linear chains) • Maintain a separate candidate set • Add the best ≈10s to the model • Flexible enough to fit in different search algorithms

  24. Speedup Techniques • Recall FindBestClauses(MLN):
         Search for, and create, candidate clauses
         For each candidate clause c
           Compute gain (in WPLL) of adding c to MLN
         Return k clauses with highest gain
      • LearnWeights(MLN+c) to optimize WPLL with L-BFGS • L-BFGS computes value and gradient of WPLL • Many candidate clauses; important to compute WPLL and its gradient efficiently

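  25. Speedup Techniques • WPLL: $\log P^{\bullet}_w(X=x) = \sum_{r \in R} c_r \sum_{k=1}^{g_r} \log P_w(X_{r,k} = x_{r,k} \mid MB_x(X_{r,k}))$, where each inner log term is the CLL of one ground predicate • When computing a ground predicate's CLL, ignore clauses in which the predicate does not appear • e.g. predicate l does not appear in clause 1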

  26. Speedup Techniques • A ground predicate's CLL is affected only by the clauses that contain it • Most clause weights do not change significantly • Most CLLs do not change much • Don’t have to recompute all CLLs • Store WPLL and CLLs • Recompute a CLL only if the weights affecting it change beyond some threshold • Subtract old CLLs from, and add new CLLs to, WPLL
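
The caching idea on this slide might look like the following sketch. All names (`CachedWPLL`, `clauses_of`, `coeff`, `compute_cll`) are hypothetical, and the toy CLL function is a stand-in rather than a real conditional log-likelihood.

```python
from typing import Callable, Dict, List, Sequence


class CachedWPLL:
    """Keeps per-atom CLLs and a running WPLL; recomputes lazily."""

    def __init__(self, atoms: Sequence[str],
                 clauses_of: Dict[str, List[int]],
                 coeff: Dict[str, float],
                 compute_cll: Callable[[str, Sequence[float]], float],
                 weights: Sequence[float],
                 threshold: float = 1e-3) -> None:
        self.clauses_of = clauses_of    # atom -> indices of clauses containing it
        self.coeff = coeff              # atom -> its WPLL coefficient
        self.compute_cll = compute_cll
        self.threshold = threshold
        # Weights in effect when each atom's CLL was last computed.
        self.used = {a: {i: weights[i] for i in clauses_of[a]} for a in atoms}
        self.cll = {a: compute_cll(a, weights) for a in atoms}
        self.total = sum(coeff[a] * self.cll[a] for a in atoms)
        self.recomputed = 0

    def update(self, weights: Sequence[float]) -> float:
        """Recompute only the CLLs whose relevant weights moved enough."""
        for atom, used in self.used.items():
            if any(abs(weights[i] - w) > self.threshold for i, w in used.items()):
                new = self.compute_cll(atom, weights)
                # Subtract the old CLL and add the new one to the stored WPLL.
                self.total += self.coeff[atom] * (new - self.cll[atom])
                self.cll[atom] = new
                self.used[atom] = {i: weights[i] for i in used}
                self.recomputed += 1
        return self.total


TOY_CLAUSES_OF = {"x": [0], "y": [1]}


def toy_cll(atom: str, weights: Sequence[float]) -> float:
    # Stand-in for a real CLL: depends only on the clauses containing the atom.
    return -sum(weights[i] ** 2 for i in TOY_CLAUSES_OF[atom])
```

Comparing against the weights used at the last recomputation (rather than the previous iteration's weights) keeps many small drifts from accumulating unnoticed; that is a design choice of this sketch, not something the slide specifies.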

  27. Speedup Techniques • WPLL is a sum over all ground predicates • Estimate WPLL by: • Uniformly sampling groundings of each FO predicate • Sampling x% of the # of groundings, subject to a min and a max • Extrapolating from the sample average
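
A minimal sketch of the subsampling estimate follows. It assumes each predicate's WPLL term is its average CLL (that is, a per-predicate weight of one over its number of groundings); `cll` is a hypothetical caller-supplied function, and the default fraction and bounds are illustrative values only.

```python
import random
from typing import Callable, Dict, List, Optional


def sample_size(n: int, frac: float, lo: int, hi: int) -> int:
    """x% of n groundings, clamped to [lo, hi] and never more than n."""
    return min(n, max(lo, min(hi, round(frac * n))))


def estimate_wpll(groundings: Dict[str, List[str]],
                  cll: Callable[[str], float],
                  frac: float = 0.1, lo: int = 10, hi: int = 1000,
                  rng: Optional[random.Random] = None) -> float:
    """Estimate WPLL from a uniform sample of each predicate's groundings."""
    rng = rng or random.Random(0)
    total = 0.0
    for pred, atoms in groundings.items():
        if not atoms:
            continue
        k = sample_size(len(atoms), frac, lo, hi)
        sample = rng.sample(atoms, k)  # uniform, without replacement
        # The sample mean stands in for the mean over all groundings.
        total += sum(cll(a) for a in sample) / k
    return total
```

Under a different weighting, the sample mean would be multiplied by that predicate's weight times its number of groundings instead.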

  28. Speedup Techniques • WPLL and its gradient • Need to compute # true groundings of a clause • #P-complete problem • Karp & Luby (1983)’s Monte-Carlo algorithm • Gives an estimate that is within ε of the true value with probability 1-δ • Draws samples of a clause • Found that the estimate converges faster than the algorithm specifies • Use convergence test (DeGroot & Schervish 2002) after every 100 samples • Earlier termination
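
The early-termination idea can be illustrated with a much simpler estimator than the one cited on the slide. The sketch below is not Karp & Luby's algorithm: it samples groundings uniformly and, as a stand-in for the slide's convergence test, stops once a normal-approximation 95% interval is narrower than a tolerance, checking after every 100 samples. `sample_is_true` is a hypothetical callback that draws one random grounding and reports whether the clause is true in it.

```python
import math
import random
from typing import Callable, Optional, Tuple


def estimate_true_fraction(sample_is_true: Callable[[random.Random], bool],
                           max_samples: int = 10000,
                           batch: int = 100,
                           tol: float = 0.02,
                           rng: Optional[random.Random] = None
                           ) -> Tuple[float, int]:
    """Return (estimated fraction of true groundings, samples drawn)."""
    rng = rng or random.Random(0)
    hits = n = 0
    while n < max_samples:
        for _ in range(batch):
            hits += bool(sample_is_true(rng))
        n += batch
        p = hits / n
        stderr = math.sqrt(p * (1.0 - p) / n)
        # Caveat: if every sample so far agrees, stderr is 0 and we stop at once.
        if 1.96 * stderr < tol:
            break
    return hits / n, n
```

Multiplying the returned fraction by the clause's total number of groundings gives an estimate of its number of true groundings.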

  29. Speedup Techniques • L-BFGS used to learn clause weights to optimize WPLL • Two parameters: • Max number of iterations • Convergence Threshold • Use smaller # max iterations and looser convergence thresholds • When evaluating candidate clause’s gain • Faster termination

  30. Speedup Techniques • Lexicographic ordering on clauses • Avoids redundant computations for clauses that are syntactically the same • Does not detect semantically identical but syntactically different clauses (NP-complete problem) • Cache new clauses • Avoids recomputation
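
One way to realize the ordering-plus-cache idea is sketched below. This is an assumption-laden illustration: literals are `(positive, predicate, args)` tuples, literals are sorted by predicate name, sign and arity, and variables are renamed in order of first appearance. If two literals in a clause share predicate, sign and arity, the key can still depend on input order, so this is a heuristic rather than a full canonical form.

```python
from typing import Callable, Dict, Iterable, Tuple

Literal = Tuple[bool, str, Tuple[str, ...]]  # (positive?, predicate, variables)


def canonical_key(clause: Iterable[Literal]) -> str:
    """Order literals and rename variables so syntactic variants coincide."""
    ordered = sorted(clause, key=lambda lit: (lit[1], not lit[0], len(lit[2])))
    rename: Dict[str, str] = {}
    parts = []
    for positive, pred, args in ordered:
        new_args = []
        for a in args:
            if a not in rename:
                rename[a] = f"v{len(rename)}"
            new_args.append(rename[a])
        sign = "" if positive else "!"
        parts.append(f"{sign}{pred}({','.join(new_args)})")
    return " v ".join(parts)


def cached_gain(clause: Iterable[Literal],
                compute_gain: Callable[[Tuple[Literal, ...]], float],
                cache: Dict[str, float]) -> float:
    """Compute the gain once per canonical key and reuse it afterwards."""
    clause = tuple(clause)
    key = canonical_key(clause)
    if key not in cache:
        cache[key] = compute_gain(clause)
    return cache[key]
```

For instance, `!Student(S) v AdvBy(S,P)` and `AdvBy(X,Y) v !Student(X)` both map to the key `AdvBy(v0,v1) v !Student(v0)`, so the second lookup hits the cache.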

  31. Speedup Techniques • Also used R&D (2004) techniques for the WPLL gradient: • Ignore predicates that don’t appear in the ith formula • Ignore ground formulas with truth value unaffected by changing truth value of any literal • # true groundings of a clause computed once and cached

  33. Experiments • UW-CSE domain • 22 predicates e.g. AdvisedBy, Professor etc • 10 types e.g. Person, Course, Quarter etc • Total # ground predicates about 4 million • # true ground predicates (in DB) = 3212 • Handcrafted KB with 94 formulas • Each student has at most one advisor • If a student is an author of a paper, so is her advisor etc

  34. Experiments • Cora domain • 1295 citations to 112 CS research papers • Author, Venue, Title, Year fields • 5 Predicates viz. SameCitation, SameAuthor, SameVenue, SameTitle, SameYear • Evidence Predicates e.g. • WordsInCommonInTitle20%(title1, title2) • Total # ground predicates about 5 million • # true ground predicates (in DB) = 378,589 • Handcrafted KB with 26 clauses • If two citations same, then they have same authors, titles etc, and vice versa • If two titles have many words in common, then they are the same, etc

  35. Systems • MLN(KB): weight-learning applied to handcrafted KB • MLN(CL): structure-learning with CLAUDIEN; weight-learning • MLN(KB+CL): structure-learning with CLAUDIEN, using the handcrafted KB as its language bias; weight-learning • MLN(SLB): structure-learning with beam search, start from empty MLN • MLN(KB+SLB): ditto, start from handcrafted KB • MLN(SLB+KB): structure-learning with beam search, start from empty MLN, allow handcrafted clauses to be added in a first search step • MLN(SLS): structure-learning with SFS, start from empty MLN

  36. Systems • CL: CLAUDIEN alone • KB: handcrafted KB alone • KB+CL: CLAUDIEN with KB as its language bias • NB: naïve Bayes • BN: Bayesian networks

  37. Methodology • UW-CSE domain • DB divided into 5 areas: ai, graphics, languages, systems, theory • Leave-one-out testing by area • Cora domain • 5 different train-test splits • Measured • average CLL of the predicates • average area under the precision-recall curve of the predicates (AUC)
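
The leave-one-out protocol and the CLL metric can be sketched as follows. The area names come from the slide; the function names are hypothetical, and averaging log-probabilities over test atoms is only one plausible reading of "average CLL".

```python
import math
from typing import Iterator, List, Sequence, Tuple

AREAS = ["ai", "graphics", "languages", "systems", "theory"]


def leave_one_out(areas: Sequence[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (training areas, held-out test area) for each area in turn."""
    for held_out in areas:
        yield [a for a in areas if a != held_out], held_out


def average_cll(probs_of_truth: Sequence[float]) -> float:
    """Mean log-probability assigned to each test atom's actual truth value."""
    return sum(math.log(p) for p in probs_of_truth) / len(probs_of_truth)
```

Each of the five folds trains on four areas and tests on the fifth; the reported figure would then be the mean of the per-fold scores.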

  38. Results • MLN(SLS), MLN(SLB) better than • MLN(CL), MLN(KB), CL, KB, NB, BN • [Charts: negative CLL; AUC]

  39. Results • MLN(SLS), MLN(SLB) better than • MLN(CL), MLN(KB), CL, KB, NB, BN • [Charts: negative CLL; AUC]

  40. Results • MLN(SLB+KB) better than • MLN(KB+CL), KB+CL • [Charts: negative CLL; AUC]

  41. Results • MLN(SLB+KB) better than • MLN(KB+CL), KB+CL • [Charts: negative CLL; AUC]

  42. Results • MLN(<system>) does better than corresponding <system> • [Charts: negative CLL; AUC]

  43. Results • MLN(<system>) does better than corresponding <system> • [Charts: negative CLL; AUC]

  44. Results • MLN(SLS) on UW-CSE; cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines • With speedups: 5.3 hrs • Without speedups: didn’t finish running in 24 hrs • MLN(SLB) on UW-CSE; single 2.8 GHz Pentium 4 machine • With speedups: 8.8 hrs • Without speedups: 13.7 hrs

  45. Future Work • Speeding up counting of # true groundings of clause • Probabilistically bounding the loss in accuracy due to subsampling • Probabilistic predicate discovery

  46. Conclusion • Develop algorithm: • Learns FO clauses by directly optimizing pseudo-likelihood • Fast enough • Learns better structure than R&D, pure ILP, purely probabilistic and purely KB approaches
