Decomposition for Reasoning with Biological Network
Gauvain Bourgne, Katsumi Inoue
ISSSB’11, Shonan Village, November 13th-17th, 2011

Motivation
• In bioinformatics, one needs to reason over huge amounts of data
  • Huge networks (e.g. metabolic pathways, signaling pathways…)
• On such problems, centralized methods suffer from
  • Long computation times
  • Memory overflow
• Problem decomposition
  • Divide the problem into smaller problems or steps, then recompose a global solution
  • Requires (1) an automated process to decompose and (2) an algorithm to solve the local problems and recompose the global solution
Automated Problem Decomposition

Example Problem (Krebs Cycle)
[Figure: metabolic network around the Krebs cycle. Nodes are metabolites (e.g. citrate, isocitrate, succinate, fumarate, pyruvate, glucose, urea, arginine, ornithine) and edges are reactions labeled with EC numbers (e.g. 4.2.1.3, 1.1.1.42, 2.3.3.1), with a link to glycolysis.]

Example Problem (Krebs Cycle)
[Figure: the same network distributed among six agents, Ag0 to Ag5, with the reactions on partition boundaries (e.g. 1.1.1.42, 4.2.1.2, 2.3.3.1, 3.5.3.1) appearing on both sides.]

Overview
• Reasoning task
• Partition-based algorithm
• Automated decomposition
• Experimental evaluation
• Conclusion

Logical representation
• Metabolic pathways: set of reactions Ri
  Ri: m1, m2, …, mp → p1, p2, …, pn
• Such a reaction can be represented as a clausal theory:
  • one activation rule
    ¬m1 ∨ ¬m2 ∨ … ∨ ¬mp ∨ Ri
  • n production rules
    ¬Ri ∨ p1
    ¬Ri ∨ p2
    …
    ¬Ri ∨ pn

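The encoding above can be sketched in a few lines of Python. This is only an illustration: the string convention (a leading "-" for negation) and the function name `encode_reaction` are assumptions, not part of the original system.

```python
from typing import FrozenSet, List, Sequence

# Assumed convention: a literal is a string, a leading "-" marks negation.
Clause = FrozenSet[str]


def encode_reaction(name: str,
                    reactants: Sequence[str],
                    products: Sequence[str]) -> List[Clause]:
    """Return the activation clause, then one production clause per product."""
    # Activation rule: -m1 v -m2 v ... v -mp v Ri
    activation = frozenset([f"-{m}" for m in reactants] + [name])
    # Production rules: -Ri v pj, one for each product pj
    productions = [frozenset([f"-{name}", p]) for p in products]
    return [activation] + productions


# Hypothetical reaction R1: m1, m2 -> p1, p2
example_clauses = encode_reaction("R1", ["m1", "m2"], ["p1", "p2"])
```

A reaction with p reactants and n products therefore yields n + 1 clauses, one of length p + 1 and n of length 2.
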
Problems
• (Conditional) accessibility problems
  • Sources (si), conditional sources (ci), targets (ti)
  • Find which ti can be produced from the si, possibly with the addition of some ci as new sources
  • Find all consequences of the form ¬ci ∨ … ∨ ¬ck ∨ tj
• Extraction of sub-networks
• Pathway completion (abduction)
  • Find reactions (sets of clauses)
• Hypotheses on the state of reactions given experiments
→ Consequence finding (with a specific form)

Main reasoning task
• Consequence Finding (CF) in clausal theories
• Input
  • A clausal theory T
  • A production field P = <L, Cond>
    • L is a list of literals
    • Cond is a condition (maximal length of the consequences, or number of occurrences of some literals)
• Output
  • Carc(T,P): all the consequences of T that are subsumption-minimal and belong to P (formed with literals of L and respecting condition Cond)

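For small propositional theories, Carc(T,P) can be computed naively by saturating T under resolution, keeping only subsumption-minimal clauses, and filtering by the production field. The sketch below assumes a production field given by a literal set and an optional maximal length (so that it is closed under subsumption). The function names and the "-" negation convention are illustrative, and this brute-force procedure is a stand-in for a real consequence-finding engine, not the method used in the talk.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Optional, Set

Clause = FrozenSet[str]  # assumed convention: "-x" is the negation of "x"


def neg(lit: str) -> str:
    return lit[1:] if lit.startswith("-") else "-" + lit


def is_tautology(clause: Clause) -> bool:
    return any(neg(lit) in clause for lit in clause)


def minimal(clauses: Iterable[Clause]) -> Set[Clause]:
    """Drop every clause that has a proper subset in the collection."""
    cs = set(clauses)
    return {c for c in cs if not any(d < c for d in cs)}


def prime_implicates(theory: Iterable[Iterable[str]]) -> Set[Clause]:
    """Saturate under resolution, discarding tautologies and subsumed clauses."""
    known = minimal(c for c in map(frozenset, theory) if not is_tautology(c))
    while True:
        new = set()
        for c, d in combinations(known, 2):
            for lit in c:
                if neg(lit) in d:
                    resolvent = (c - {lit}) | (d - {neg(lit)})
                    if not is_tautology(resolvent) and \
                            not any(k <= resolvent for k in known):
                        new.add(resolvent)
        if not new:
            return known
        known = minimal(known | new)


def carc(theory, literals: Iterable[str],
         max_len: Optional[int] = None) -> Set[Clause]:
    """Subsumption-minimal consequences built only from `literals`."""
    allowed = set(literals)
    return {c for c in prime_implicates(theory)
            if c <= allowed and (max_len is None or len(c) <= max_len)}


# Hypothetical accessibility query: source m1, conditional source c1, target t1.
theory = [{"m1"}, {"-m1", "R1"}, {"-R1", "p1"},
          {"-p1", "-c1", "R2"}, {"-R2", "t1"}]
result = carc(theory, {"-c1", "t1"})
```

On this toy theory the only consequence in the production field is ¬c1 ∨ t1: the target t1 becomes reachable once c1 is added as a source.
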
Partition-based CF
• The task
  • Consequence Finding (CF) in clausal theories
• Input
  • A set of clausal theories Ti such that ∪Ti = T, and a set of reasoners ai, one associated with each partition
  • A production field P = <L, Cond>
• Output
  • Carc(T,P)
• Where
  • The output should be produced through local computations and interactions between reasoners (message exchange)

Partition-based Consequence Finding
• Principles
  • Identify common symbols (communication languages)
  • Build a tree structure (cycle-cut)
  • Forward relevant consequences from leaves to root
• Generalization of Partition-based Theorem Proving [Amir & McIlraith, 2005]
• Based on Craig’s Interpolation Theorem: if C entails D, then there is a formula F involving only symbols common to C and D such that C entails F and F entails D.

Communication languages
• Graph G induced from the partition, with edge labels l(i,j)
• Problem: eliminate cycles from it while ensuring a proper labeling
• Cycle-cut
  • While G is not acyclic
    • Take a minimal cycle S = (i1,i2), (i2,i3), …, (ip,i1)
    • Choose (i,j) in S such that [expression not preserved in the extracted slide] is minimal
    • For each (q,r) ≠ (i,j) in S, l(q,r) ← l(q,r) ∪ l(i,j)
    • Remove (i,j) from E
[Figure: example graph whose nodes and edges are labeled with symbol sets such as abc, bfg, ade, acdf]

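The cycle-cut loop can be sketched as follows. The quantity minimized when choosing the edge to remove did not survive extraction, so the cost used here (total size of the other labels on the cycle after they absorb the removed label) is an assumption made for illustration. The graph encoding (a dict from two-element frozensets to label sets) and the function names are also hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set

Edge = FrozenSet[int]
Labels = Dict[Edge, Set[str]]  # l(i,j): symbols shared across edge (i,j)


def shortest_cycle(labels: Labels) -> Optional[List[Edge]]:
    """Return the edges of a shortest cycle, or None if the graph is acyclic."""
    best = None
    for e in labels:
        u, v = tuple(e)
        # Adjacency lists of the graph without edge e.
        adj: Dict[int, List[int]] = {}
        for f in labels:
            if f != e:
                a, b = tuple(f)
                adj.setdefault(a, []).append(b)
                adj.setdefault(b, []).append(a)
        # BFS from u to v: a path found here closes a cycle through e.
        parent = {u: None}
        queue = deque([u])
        while queue and v not in parent:
            x = queue.popleft()
            for y in adj.get(x, []):
                if y not in parent:
                    parent[y] = x
                    queue.append(y)
        if v in parent:
            cycle, node = [e], v
            while parent[node] is not None:
                cycle.append(frozenset((node, parent[node])))
                node = parent[node]
            if best is None or len(cycle) < len(best):
                best = cycle
    return best


def cycle_cut(labels: Labels) -> Labels:
    """Remove edges until the graph is acyclic, keeping labels proper."""
    labels = {e: set(l) for e, l in labels.items()}
    while (cycle := shortest_cycle(labels)) is not None:
        def cost(e: Edge) -> int:
            # ASSUMED criterion: size of the remaining labels after the merge.
            return sum(len(labels[f] | labels[e]) for f in cycle if f != e)
        cut = min(cycle, key=cost)
        for f in cycle:
            if f != cut:
                labels[f] |= labels[cut]  # l(q,r) <- l(q,r) U l(i,j)
        del labels[cut]
    return labels


# Hypothetical triangle of three partitions.
triangle = {
    frozenset({1, 2}): {"a"},
    frozenset({2, 3}): {"b", "c"},
    frozenset({1, 3}): {"d", "e", "f"},
}
tree = cycle_cut(triangle)
```

In the triangle, edge (1,2) is removed and its symbol a is added to the two remaining labels, so anything partitions 1 and 2 needed to exchange about a can still travel through partition 3.
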
Forward Message-passing Algorithm (Sequential)
• Preprocessing
  • Determine initial l(i,j)
  • Apply cycle-cut
  • Determine Pi
    • Non-root agents ai (with parent aj): Pi = <L ∪ l(i,j)>
    • Root ak: Pk = P
• Consequence finding
  • From leaves to root
    • Determine Cni = Carc(Σi, Pi)
    • Forward Cni to the parent
[Figure: tree of agents, each labeled Carc]

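A toy version of the sequential pass is sketched below on a two-agent chain. To stay self-contained it uses a brute-force, truth-table `carc` as a stand-in for a real consequence-finding engine. It further assumes that the communication language contributes both polarities of each shared symbol to Pi and that Σi is the agent's theory plus the clauses received from its children. All names are illustrative.

```python
from itertools import combinations, product
from typing import Dict, FrozenSet, Iterable, List, Optional, Set

Clause = FrozenSet[str]  # assumed convention: "-x" is the negation of "x"


def holds(clause: Clause, model: Dict[str, bool]) -> bool:
    return any(model[l.lstrip("-")] != l.startswith("-") for l in clause)


def carc(theory: Iterable[Clause], allowed: Set[str]) -> Set[Clause]:
    """Brute force: minimal entailed non-tautological clauses over `allowed`."""
    theory = list(theory)
    names = sorted({l.lstrip("-") for c in theory for l in c}
                   | {l.lstrip("-") for l in allowed})
    models = [dict(zip(names, bits))
              for bits in product([False, True], repeat=len(names))]
    models = [m for m in models if all(holds(c, m) for c in theory)]
    entailed = set()
    for n in range(len(allowed) + 1):
        for lits in combinations(sorted(allowed), n):
            c = frozenset(lits)
            tautology = any(("-" + l) in c for l in c)
            if not tautology and all(holds(c, m) for m in models):
                entailed.add(c)
    return {c for c in entailed if not any(d < c for d in entailed)}


def forward_cf(theories: Dict[str, List[Clause]],
               parent: Dict[str, Optional[str]],
               links: Dict[FrozenSet[str], Set[str]],
               field: Set[str]) -> Set[Clause]:
    """Leaves compute first; each agent forwards its consequences upward."""
    root = next(a for a in theories if parent.get(a) is None)

    def solve(agent: str) -> Set[Clause]:
        received = [c for child in theories if parent.get(child) == agent
                    for c in solve(child)]
        if agent == root:
            allowed = set(field)                      # Pk = P
        else:
            link = links[frozenset((agent, parent[agent]))]
            allowed = set(field) | link | {"-" + s for s in link}  # L U l(i,j)
        return carc(list(theories[agent]) + received, allowed)

    return solve(root)


# Hypothetical chain: a1 (leaf) -> a2 (root), sharing only the symbol p1.
theories = {
    "a1": [frozenset({"m1"}), frozenset({"-m1", "R1"}), frozenset({"-R1", "p1"})],
    "a2": [frozenset({"-p1", "-c1", "R2"}), frozenset({"-R2", "t1"})],
}
parent = {"a1": "a2", "a2": None}
links = {frozenset({"a1", "a2"}): {"p1"}}
output = forward_cf(theories, parent, links, {"-c1", "t1"})
```

Here agent a1 only sends the unit clause p1 to a2, which then derives ¬c1 ∨ t1, the same answer a centralized computation gives on the union of both theories.
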
Parallel Variant
• Incremental computations: Newcarc(T∪C, P) = Carc(T∪C, P) \ Carc(T, P)
[Figure: tree of agents, each labeled Carc or Newcarc]

Decomposition of clausal theories
• Given a clausal theory T
• Find a set of partitions Ti such that
  • ∪Ti = T
  • Reasoning is easier, i.e. the application of the partition-based algorithm to this decomposition is as efficient as possible
    • Minimize the size of the communication languages
    • Ensure that some simplification can be done locally
• Partitions should be cohesive and loosely coupled.

Graph representation
• A clausal theory can be represented as a graph
• Focus on common symbols
  c1: ¬b∨c∨e∨f
  c2: ¬a∨d∨e
  c3: ¬d∨g∨h
  c4: ¬e∨g
  c5: ¬g∨¬h∨i
[Figure: graph linking clauses c1 to c5 to their symbols a to i, and a reduced graph over the clauses only, whose edges carry the shared symbols and their number (labels d: 1, e: 1, g,h: 2, g: 1)]

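One simple way to obtain a reduced graph from the five clauses above is to connect every pair of clauses that share at least one symbol and weight the edge by the number of shared symbols. The sketch below does exactly that. Whether the original tool builds pairwise edges in this way is an assumption, as are the function names and the "-" negation convention.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def symbols(clause: Set[str]) -> Set[str]:
    """Strip the polarity: only the symbol matters for the graph."""
    return {lit.lstrip("-") for lit in clause}


def clause_graph(theory: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each pair of clause names to their shared symbols (if any)."""
    edges = {}
    for n1, n2 in combinations(sorted(theory), 2):
        shared = symbols(theory[n1]) & symbols(theory[n2])
        if shared:
            edges[(n1, n2)] = shared
    return edges


# The clauses c1..c5 from the slide, with "-" standing for negation.
slide_theory = {
    "c1": {"-b", "c", "e", "f"},
    "c2": {"-a", "d", "e"},
    "c3": {"-d", "g", "h"},
    "c4": {"-e", "g"},
    "c5": {"-g", "-h", "i"},
}
edges = clause_graph(slide_theory)
weights = {pair: len(shared) for pair, shared in edges.items()}
```

A graph partitioner that minimizes the total weight of cut edges then tends to keep clauses with many common symbols together, which is what keeps the communication languages small.
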
Architecture
• Initial theory (.sol file) → buildGraph → reduced graph representation
• Reduced graph representation + number of partitions → kmetis → partitioned graph
• Partitioned graph → graph2dcf → partitioned clausal theory (.dcf file)
• Partitioned clausal theory + root → partition-based CF → solution
• Root choice heuristic
  • Choose the root with maximal average clause size

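The root choice heuristic is easy to state in code. The sketch below assumes each partition is given as a list of clauses (sets of literals) keyed by a partition name, and that ties are broken arbitrarily. The names are illustrative only.

```python
from typing import Dict, FrozenSet, List


def choose_root(partitions: Dict[str, List[FrozenSet[str]]]) -> str:
    """Pick the partition whose clauses are longest on average."""
    def average_size(name: str) -> float:
        clauses = partitions[name]
        if not clauses:
            return 0.0  # an empty partition is never preferred
        return sum(len(c) for c in clauses) / len(clauses)

    return max(partitions, key=average_size)


# Hypothetical partitions: ag0 averages 1.5 literals per clause, ag1 2.5.
example_partitions = {
    "ag0": [frozenset({"a", "b"}), frozenset({"c"})],
    "ag1": [frozenset({"a", "b", "c"}), frozenset({"d", "e"})],
}
root = choose_root(example_partitions)
```
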
Problem Decomposition
[Figure: the Krebs cycle network of the example problem, including the glycolysis path, with its reactions grouped into six partitions, ag0 to ag5.]

Benchmark Problems
• Biological networks
• TPTP problems
  • Production field:
    • Vocabulary of conjecture (+ removing conjecture)
    • Full vocabulary with length limit
• SAT problems
  • Production field
    • Based on frequency of literals
    • N% most/least frequent literals
  • Size
    • Problems still not tractable as CF problems
    • Solving only a cohesive sub-problem (obtained by partition of the clause graph)

Problem characteristics
[Table not preserved in the extracted text]

Results – Biological Networks
[Table not preserved in the extracted text; one surviving entry: 2 682 252 (3 321 857)]

Results – SAT problems
[Table not preserved in the extracted text]

Results – TPTP problems
[Table not preserved in the extracted text]

Results - summary
[Figures not preserved in the extracted text]

Results
• For almost all problems, decomposition can reduce the number of resolve operations needed. In particular, it can solve some problems that could not be solved otherwise
• Time is not often improved
  • Due to communication time (parsing and similar overhead)
• Approximate decomposition with metis: OK.
• Root choice heuristic: still insufficient, though not bad for biological network problems.

Conclusion
• A sound and complete algorithm combined with automated problem decomposition
• Can increase efficiency (number of operations) for almost all problems
• But results depend on the choice of root

Future works
• Partition-based algorithm
  • Variant for Newcarc computations
  • Common theories for 1st order representations
  • Ordered partitions to break cycles (without removing links)
• Decomposition
  • Directly from metabolic pathway
• Root choice heuristic
  • Learning preference relation on root choice
• Choosing the number of partitions

Thank you for your attention
Any questions?