Learning the Structure of Markov Logic Networks Stanley Kok
Overview • Introduction • CLAUDIEN, CRFs • Algorithm • Evaluation Measure • Clause Construction • Search Strategies • Speedup Techniques • Experiments
Introduction • Richardson & Domingos (2004) learned MLN structure in two disjoint steps: • Learn FO clauses with an off-the-shelf ILP system (CLAUDIEN) • Learn clause weights by optimizing pseudo-likelihood • This work develops an algorithm that: • Learns FO clauses by directly optimizing pseudo-likelihood • Is fast enough to be practical • Learns better structure than R&D, pure ILP, purely probabilistic, and purely KB approaches
CLAUDIEN • CLAUsal DIscovery ENgine • Starts with the trivially false clause • Repeatedly refines current clauses by adding literals • Adds clauses that satisfy min accuracy and coverage to the KB • Refinement lattice: true ⇒ false; m ⇒ false; f ⇒ false; h ⇒ false; h ⇒ f; h ⇒ m; f ⇒ h; f ⇒ m; f ∧ h ⇒ false; m ⇒ h; m ∧ f ⇒ false; m ∧ h ⇒ false; m ⇒ f; h ⇒ m ∨ f
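CLAUDIEN's top-down refinement can be sketched in miniature. This is a toy Python sketch over the propositional vocabulary {m, f, h} of the lattice above, not CLAUDIEN's actual implementation; the clause representation (body and head as frozensets) is illustrative.

```python
# Toy sketch of CLAUDIEN-style top-down refinement over the
# propositional vocabulary of the lattice above.
LITERALS = {"m", "f", "h"}

def refine(clause):
    """Add one unused literal to the body (strengthening the condition)
    or to the head (weakening the conclusion)."""
    body, head = clause
    for lit in sorted(LITERALS - (body | head)):
        yield (body | {lit}, head)   # e.g. true => false  ->  m => false
        yield (body, head | {lit})   # e.g. true => false  ->  true => m

# Start from the trivially false clause: true => false.
start = (frozenset(), frozenset())
children = list(refine(start))
# ({'m'}, frozenset()) stands for "m => false", and so on.
```

Each refinement step adds exactly one literal, which is why CLAUDIEN explores shorter clauses before longer ones in the lattice.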
CLAUDIEN • language bias ≡ clause template • Refine a handcrafted KB • Example: • Professor(P) ⇐ AdvisedBy(S,P) in KB • dlab_template(‘1-2:[Professor(P),Student(S)]<-AdvisedBy(S,P)’) • Professor(P) ∨ Student(S) ⇐ AdvisedBy(S,P)
Conditional Random Fields • [figure: linear-chain CRF with labels y1, y2, …, yn (Misc, Person, Org, …) over words x1, x2, …, xn, e.g. “IBM hired Alice…”] • Markov networks used to compute P(y|x) (McCallum 2003) • Model: P(y|x) = (1/Z(x)) exp(Σ_k λ_k f_k(x, y)) • Features f_k, e.g. “current word is capitalized and next word is Inc”
CRF – Feature Induction • Set of atomic features (word=the, capitalized, etc.) • Starts from an empty CRF • While the convergence criterion is not met: • Create a list of new features consisting of • Atomic features • Binary conjunctions of atomic features • Conjunctions of atomic features with features already in the model • Evaluate the gain in P(y|x) of adding each feature to the model • Add the best K features to the model (100s–1000s of features)
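One induction round from the slide can be sketched as follows. Features are modeled as frozensets of atomic predicates (conjunction = set union); the gain function here is a toy stand-in (`len`), not a real likelihood gain.

```python
def conj(f, g):
    return f | g  # conjunction of two features, each a frozenset of atoms

def induce_round(atomic, model, gain, k):
    """One induction round: candidates are atomic features, pairwise
    conjunctions of atomics, and conjunctions of atomics with features
    already in the model; keep the k candidates with highest gain."""
    cands = set(atomic)
    cands |= {conj(a, b) for a in atomic for b in atomic if a != b}
    cands |= {conj(a, m) for a in atomic for m in model}
    cands -= set(model)          # don't re-add existing features
    return sorted(cands, key=gain, reverse=True)[:k]

# Toy atomic features; toy gain = conjunction size (stand-in only).
atomic = [frozenset({"cap"}), frozenset({"next=Inc"}), frozenset({"word=the"})]
best = induce_round(atomic, model=[], gain=len, k=2)
```

In a real CRF the gain would be the improvement in P(y|x) from adding the feature, and K is in the hundreds to thousands as the slide notes.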
Algorithm • High-level algorithm: Repeat: Clauses ← FindBestClauses(MLN); add Clauses to MLN; until Clauses = ∅ • FindBestClauses(MLN): Search for and create candidate clauses; for each candidate clause c, compute the gain (evaluation measure) of adding c to the MLN; return the k clauses with highest gain
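A minimal sketch of the high-level loop, with a toy `find_best` standing in for FindBestClauses (the real one searches clause space and scores gain under the evaluation measure; the clause names and gains below are made up):

```python
def learn_structure(mln, find_best_clauses):
    """High-level loop from the slide: repeat until no clauses found."""
    while True:
        clauses = find_best_clauses(mln)
        if not clauses:          # Clauses = empty set: stop
            return mln
        mln = mln + clauses      # add Clauses to MLN

# Toy FindBestClauses: fixed pool of (made-up) clause names and gains;
# return up to k positive-gain clauses not already in the MLN.
POOL = {"c1": 2.0, "c2": 0.5, "c3": -1.0}

def find_best(mln, k=1):
    cands = sorted(((g, c) for c, g in POOL.items()
                    if g > 0 and c not in mln), reverse=True)
    return [c for _, c in cands[:k]]

mln = learn_structure([], find_best)
```

The loop terminates because each pass either adds positive-gain clauses or finds none and stops.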
Evaluation Measure • Ideally use log-likelihood, but slow • Recall: P_w(X=x) = (1/Z) exp(Σ_i w_i n_i(x)) • Value: log P_w(X=x) = Σ_i w_i n_i(x) − log Z • Gradient: ∂/∂w_i log P_w(X=x) = n_i(x) − E_w[n_i(x)]
Evaluation Measure • Use pseudo-log-likelihood (R&D 2004): log P*_w(X=x) = Σ_l log P_w(X_l=x_l | MB_x(X_l)) • But this gives undue weight to predicates with a large # of groundings • E.g., a predicate with millions of groundings dominates one with a few hundred
Evaluation Measure • Use weighted pseudo-log-likelihood (WPLL): log P*_w(X=x) = Σ_r c_r Σ_k log P_w(X_{r,k}=x_{r,k} | MB_x(X_{r,k})), with c_r = 1/g_r, where g_r is the # of groundings of first-order predicate r • Each first-order predicate now contributes equally, regardless of its # of groundings
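The effect of the c_r weighting can be shown numerically. A toy sketch, assuming we already have each ground predicate's CLL; the predicate names and numbers are illustrative:

```python
def wpll(cll_by_pred):
    """Weighted pseudo-log-likelihood: each first-order predicate's
    summed CLLs are scaled by c_r = 1 / (# groundings of r)."""
    return sum(sum(clls) / len(clls) for clls in cll_by_pred.values())

# Illustrative CLLs: one predicate has 4 groundings, the other 2.
cll = {"AdvisedBy": [-0.1] * 4, "Professor": [-0.5] * 2}
# Plain PLL would give -0.4 + -1.0 = -1.4, dominated by the predicate
# with more groundings; WPLL gives -0.1 + -0.5 = -0.6, each first-order
# predicate weighted equally.
```

Scaled up to the experiments later in the talk (millions of groundings for some predicates), the unweighted sum would be dominated entirely by the largest predicates.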
Algorithm • High-level algorithm: Repeat: Clauses ← FindBestClauses(MLN); add Clauses to MLN; until Clauses = ∅ • FindBestClauses(MLN): Search for and create candidate clauses; for each candidate clause c, compute the gain (evaluation measure) of adding c to the MLN; return the k clauses with highest gain
Clause Construction • Add a literal (negative/positive) • All possible ways the new literal's variables can be shared with those of the clause • !Student(S) v AdvBy(S,P) • Remove a literal (when refining an MLN) • Removes spurious conditions from rules • !Student(S) v !YrInPgm(S,5) v TA(S,C) v TmpAdvBy(S,P)
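Enumerating the variable sharings for a newly added literal can be sketched as follows. A toy sketch: each argument slot of the new literal is either an existing clause variable or a fresh one (the `V0`, `V1` fresh-variable names are invented here for illustration).

```python
from itertools import product

def add_literal_bindings(clause_vars, arity, pred="AdvBy"):
    """All ways a new literal's argument slots can share variables with
    the clause: each slot is an existing variable or a fresh one."""
    n = len(clause_vars)
    out = set()
    for combo in product(range(n + 1), repeat=arity):
        # index < n picks an existing variable; index n means "fresh"
        args = [clause_vars[i] if i < n else f"V{slot}"
                for slot, i in enumerate(combo)]
        out.add(f"{pred}({', '.join(args)})")
    return out

# Adding a binary literal to a clause whose variables are {S}:
lits = add_literal_bindings(["S"], 2)
# {"AdvBy(S, S)", "AdvBy(S, V1)", "AdvBy(V0, S)", "AdvBy(V0, V1)"}
```

The number of bindings grows quickly with arity and clause size, which is why the slides later limit the number of distinct variables to restrict the search space.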
Clause Construction • Flip signs of literals (when refining an MLN) • Moves literals on the wrong side of an implication • !CseQtr(C1,Q1) v !CseQtr(C2,Q2) v !SameCse(C1,C2) v !SameQtr(Q1,Q2) • Done at the beginning of the algorithm • Expensive, so optional • Limit # of distinct variables to restrict the search space
Algorithm • High-level algorithm: Repeat: Clauses ← FindBestClauses(MLN); add Clauses to MLN; until Clauses = ∅ • FindBestClauses(MLN): Search for and create candidate clauses; for each candidate clause c, compute the gain (evaluation measure) of adding c to the MLN; return the k clauses with highest gain
Search Strategies • Shortest-first search (SFS) • Candidate set, e.g. !AdvBy(S,P) v Stu(S) • Find the gain of each clause • Sort clauses by gain • Return the top 5 with positive gain • Add the 5 clauses to the MLN (wt1: !AdvBy(S,P); wt2: clause2; …) • Retrain the MLN's weights • (Yikes! All length-2 clauses have gains ≤ 0)
Shortest-First Search • Extend the 20 length-2 clauses with highest gains, e.g. !AdvBy(S,P) v Stu(S) → !AdvBy(S,P) v Stu(S) v Prof(P) • Form a new candidate set • Keep the 1000 clauses with highest gains
Shortest-First Search • Shortest-first search (SFS) • Repeat process • Extend all length-2 clauses before length-3 ones MLN wt1, clause1 wt2, clause2 … How do you refine a non-empty MLN? candidate set
SFS – MLN Refinement • Extend the 20 length-2 clauses with highest gains • Extend length-2 clauses in the MLN • Remove a predicate from length-4 clauses in the MLN • Flip signs of length-3 clauses in the MLN (optional) • The clauses from the last three operations replace the original clause in the MLN
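The shortest-first discipline (consider shorter clauses before longer ones, keep only positive-gain clauses) can be sketched as follows. Clauses are toy tuples of literals and the gain table is made up; this is not the paper's full SFS, which also manages the 1000-clause candidate beam and retrains weights between rounds.

```python
def sfs_pick(candidates, gain, k=5):
    """Shortest-first selection: exhaust shorter clauses before longer
    ones, keeping up to k clauses with positive gain."""
    picked = []
    for length in sorted({len(c) for c in candidates}):
        same_len = [c for c in candidates if len(c) == length]
        for c in sorted(same_len, key=gain, reverse=True):
            if gain(c) > 0 and len(picked) < k:
                picked.append(c)
    return picked

# Toy clauses as tuples of literals, with a made-up gain table; the
# long clause has the highest gain, but the short one is taken first.
gains = {("a", "b"): 0.3, ("c", "d"): -0.1, ("a", "b", "c"): 0.9}
order = sfs_pick(list(gains), gains.get, k=2)
```

Preferring shorter clauses keeps the learned MLN compact and mirrors the lattice order in which the search extends candidates.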
Search Strategies • Beam Search • Keep a beam of 5 clauses with highest gains • Track best clause • Stop when best clause does not change after two consecutive iterations MLN wt1, clause1 wt2, clause2 … wtA, clauseA wtB, clauseB … How do you refine a non-empty MLN?
Algorithm • High-level algorithm: Repeat: Clauses ← FindBestClauses(MLN); add Clauses to MLN; until Clauses = ∅ • FindBestClauses(MLN): Search for and create candidate clauses; for each candidate clause c, compute the gain (evaluation measure) of adding c to the MLN; return the k clauses with highest gain
Difference from CRF – Feature Induction • Recall CRF induction: set of atomic features (word=the, capitalized, etc.); start from an empty CRF; while the convergence criterion is not met, create a list of new features (atomic features, binary conjunctions of atomic features, conjunctions of atomic features with features already in the model), evaluate the gain in P(y|x) of adding each feature, and add the best K features to the model (100s–1000s of features) • Differences here: • We can refine a non-empty MLN • We use pseudo-likelihood; different optimizations • Applicable to arbitrary Markov networks (not only linear chains) • Maintain a separate candidate set • Add the best ≈10 clauses to the model • Flexible enough to fit into different search algorithms
Overview • Introduction • CLAUDIEN, CRFs • Algorithm • Evaluation Measure • Clause Construction • Search Strategies • Speedup Techniques • Experiments
Speedup Techniques • Recall: FindBestClauses(MLN): Search for and create candidate clauses; for each candidate clause c, compute the gain (WPLL) of adding c to the MLN; return the k clauses with highest gain • LearnWeights(MLN+c) optimizes WPLL with L-BFGS • L-BFGS computes the value and gradient of WPLL • Many candidate clauses, so it is important to compute WPLL and its gradient efficiently
Speedup Techniques • WPLL is a sum of ground-predicate CLLs • When computing a ground predicate's CLL, ignore clauses in which its predicate does not appear • E.g., predicate l does not appear in clause 1, so clause 1 can be skipped when computing l's CLL
Speedup Techniques • A ground predicate's CLL is affected only by clauses that contain it • Most clause weights do not change significantly • So most CLLs do not change much • Don't have to recompute all CLLs • Store WPLL and the CLLs • Recompute a CLL only if the weights affecting it change beyond some threshold • Subtract the old CLLs from, and add the new CLLs to, the WPLL
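The store-and-selectively-recompute idea can be sketched as a small cache. A sketch under assumptions: `compute_cll` is a user-supplied function and per-ground-predicate weight "drift" is tracked explicitly; the real system instead tracks which clauses contain each ground predicate.

```python
class CLLCache:
    """Store each ground predicate's CLL; recompute only when the
    accumulated weight change affecting it crosses a threshold, and
    patch the stored WPLL by (new CLL - old CLL)."""
    def __init__(self, compute_cll, threshold=0.01):
        self.compute_cll = compute_cll
        self.threshold = threshold
        self.cll = {}      # ground predicate -> cached CLL
        self.drift = {}    # ground predicate -> accumulated |weight change|
        self.wpll = 0.0

    def note_weight_change(self, gp, delta):
        self.drift[gp] = self.drift.get(gp, 0.0) + abs(delta)

    def get(self, gp):
        if gp not in self.cll or self.drift.get(gp, 0.0) > self.threshold:
            old = self.cll.get(gp, 0.0)
            new = self.compute_cll(gp)
            self.wpll += new - old       # subtract old CLL, add new CLL
            self.cll[gp] = new
            self.drift[gp] = 0.0
        return self.cll[gp]

vals = {"p": -0.5}
cache = CLLCache(lambda gp: vals[gp])
cache.get("p")                     # computed: -0.5
vals["p"] = -0.2
cache.get("p")                     # no drift recorded: cached -0.5 returned
cache.note_weight_change("p", 0.1)
cache.get("p")                     # drift > threshold: recomputed as -0.2
```

Patching the WPLL incrementally avoids re-summing millions of CLLs after every weight update.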
Speedup Techniques • WPLL is a sum over all ground predicates • Estimate WPLL by: • Uniformly sampling groundings of each FO predicate • Sample x% of the # of groundings, subject to a min and max • Extrapolate from the average
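The subsampling estimate can be sketched directly. The fraction and min/max bounds (`frac`, `lo`, `hi`) are illustrative parameters, not the paper's settings.

```python
import random

def estimate_pred_cll(groundings, cll, frac=0.05, lo=50, hi=5000, seed=0):
    """Estimate a predicate's summed CLL: uniformly sample a fraction
    of its groundings (clamped to [lo, hi]) and extrapolate the
    sample average to the full predicate."""
    n = min(len(groundings), max(lo, min(hi, int(frac * len(groundings)))))
    sample = random.Random(seed).sample(groundings, n)
    avg = sum(cll(g) for g in sample) / n
    return avg * len(groundings)

# Sanity check: constant CLL of -0.3 over 10,000 groundings.
est = estimate_pred_cll(list(range(10_000)), lambda g: -0.3)
```

With millions of groundings per predicate (as in the experiments), sampling a few thousand per predicate is far cheaper than the exact sum.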
Speedup Techniques • WPLL and its gradient require computing the # of true groundings of a clause • A #P-complete problem • Karp & Luby (1983)'s Monte-Carlo algorithm • Gives an estimate within ε of the true value with probability 1−δ • Draws samples of the clause's groundings • Found that the estimate converges faster than the algorithm specifies • Use a convergence test (DeGroot & Schervish 2002) after every 100 samples • Earlier termination
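A simplified Monte-Carlo counter with an early-stopping check in the spirit of the slide: the convergence test here is a crude stabilization check, not the DeGroot & Schervish test, and the (ε, δ) bookkeeping of Karp & Luby is omitted.

```python
import random

def count_true_groundings(n_groundings, is_true, seed=1, batch=100,
                          tol=0.05, max_samples=10_000):
    """Estimate a clause's # of true groundings: sample ground clauses
    uniformly and, after every `batch` samples, stop early once the
    running proportion has stabilized within `tol`."""
    rng = random.Random(seed)
    trues = total = 0
    prev = None
    while total < max_samples:
        for _ in range(batch):
            trues += is_true(rng.randrange(n_groundings))
            total += 1
        est = trues / total
        if prev is not None and abs(est - prev) < tol:
            break                      # converged: terminate early
        prev = est
    return est * n_groundings

# Toy clause true on 70% of 1,000,000 groundings:
est = count_true_groundings(1_000_000, lambda g: g % 10 < 7)
```

Stopping as soon as the running estimate stabilizes is what gives the earlier termination the slide mentions, at the cost of the formal (ε, δ) guarantee.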
Speedup Techniques • L-BFGS used to learn clause weights to optimize WPLL • Two parameters: • Max number of iterations • Convergence Threshold • Use smaller # max iterations and looser convergence thresholds • When evaluating candidate clause’s gain • Faster termination
Speedup Techniques • Impose a lexicographic ordering on clauses • Avoids redundant computation for clauses that are syntactically identical • Doesn't detect semantically identical but syntactically different clauses (an NP-complete problem) • Cache new clauses • Avoids recomputation
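Canonicalizing clauses before caching can be sketched as follows: sorting the literals gives syntactically identical clauses the same key, while semantically equivalent but syntactically different clauses still miss the cache, as the slide notes. The gain functions below are illustrative stand-ins.

```python
def clause_key(clause):
    """Canonical (lexicographic) form of a clause, given as a list of
    signed-literal strings: sorting makes syntactic duplicates equal."""
    return tuple(sorted(clause))

cache = {}

def cached_gain(clause, gain):
    key = clause_key(clause)
    if key not in cache:         # compute each distinct clause once
        cache[key] = gain(clause)
    return cache[key]

g1 = cached_gain(["Stu(S)", "!AdvBy(S,P)"], lambda c: 0.4)
g2 = cached_gain(["!AdvBy(S,P)", "Stu(S)"], lambda c: 99)   # cache hit
```

Both literal orderings map to the same key, so the second (expensive) gain computation is skipped.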
Speedup Techniques • Also used R&D (2004)'s techniques for the WPLL gradient: • Ignore predicates that don't appear in the ith formula • Ignore ground formulas whose truth value is unaffected by changing the truth value of any literal • # of true groundings of a clause computed once and cached
Overview • Introduction • CLAUDIEN, CRFs • Algorithm • Evaluation Measure • Clause Construction • Search Strategies • Speedup Techniques • Experiments
Experiments • UW-CSE domain • 22 predicates e.g. AdvisedBy, Professor etc • 10 types e.g. Person, Course, Quarter etc • Total # ground predicates about 4 million • # true ground predicates (in DB) = 3212 • Handcrafted KB with 94 formulas • Each student has at most one advisor • If a student is an author of a paper, so is her advisor etc
Experiments • Cora domain • 1295 citations to 112 CS research papers • Author, Venue, Title, Year fields • 5 Predicates viz. SameCitation, SameAuthor, SameVenue, SameTitle, SameYear • Evidence Predicates e.g. • WordsInCommonInTitle20%(title1, title2) • Total # ground predicates about 5 million • # true ground predicates (in DB) = 378,589 • Handcrafted KB with 26 clauses • If two citations same, then they have same authors, titles etc, and vice versa • If two titles have many words in common, then they are the same, etc
Systems • MLN(KB): weight-learning applied to handcrafted KB • MLN(CL): structure-learning with CLAUDIEN; weight-learning • MLN(KB+CL): structure-learning with CLAUDIEN, using the handcrafted KB as its language bias; weight-learning • MLN(SLB): structure-learning with beam search, start from empty MLN • MLN(KB+SLB): ditto, start from handcrafted KB • MLN(SLB+KB): structure-learning with beam search, start from empty MLN, allow handcrafted clauses to be added in a first search step • MLN(SLS): structure-learning with SFS, start from empty MLN
Systems • CL: CLAUDIEN alone • KB: handcrafted KB alone • KB+CL: CLAUDIEN with KB as its language bias • NB: naïve Bayes • BN: Bayesian networks
Methodology • UW-CSE domain • DB divided into 5 areas: ai, graphics, languages, systems, theory • Leave-one-out testing by area • Cora domain • 5 different train-test splits • Measured • average CLL of the predicates • average area under the precision-recall curve of the predicates (AUC)
Results • MLN(SLS), MLN(SLB) better than • MLN(CL), MLN(KB), CL, KB, NB, BN • [charts: CLL (−ve) and AUC]
Results • MLN(SLB+KB) better than • MLN(KB+CL), KB+CL • [charts: CLL (−ve) and AUC]
Results • MLN(<system>) does better than the corresponding <system> • [charts: CLL (−ve) and AUC]
Results • MLN(SLS) on UW-CSE; cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines • With speedups: 5.3 hrs • Without speedups: didn't finish in 24 hrs • MLN(SLB) on UW-CSE; single 2.8 GHz Pentium 4 machine • With speedups: 8.8 hrs • Without speedups: 13.7 hrs
Future Work • Speeding up counting of # true groundings of clause • Probabilistically bounding the loss in accuracy due to subsampling • Probabilistic predicate discovery
Conclusion • Developed an algorithm that: • Learns FO clauses by directly optimizing pseudo-likelihood • Is fast enough to be practical • Learns better structure than R&D, pure ILP, purely probabilistic, and purely KB approaches