230 likes | 405 Views
Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes. by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell, P. Reiser and R.D. King Presenter: Mark H. Rich 2/7/2003 University of Wisconsin - Madison
E N D
Combining Inductive Logic Programming, Active Learning and Robotics to Discover the Function of Genes by C.H. Bryant, S.H. Muggleton, S.G. Oliver, D.B. Kell, P. Reiser and R.D. King Presenter: Mark H. Rich 2/7/2003 University of Wisconsin - Madison CS 838 Learning and Modeling Biological Networks
Discovering Gene Function • Yeast (S. cerevisiae) has 6,000 protein-encoding genes • Only 60% can be assigned function with confidence • The cell is a bio-chemical machine • Logic can help us discover these metabolic functions and networks
ASE-Progol Robot Scientist Background Knowledge Learning Engine Analysis New Knowledge Experiment Selection Results
Outline • Introduction • Abduction and Active Learning • Functional Genomics • Metabolism in Logic • Experiments • Results
Logic in AI • Deduction • Given facts with sound and complete proof theory, show that other facts can be proven • Induction • Given positive and negative examples of facts and background knowledge, find hypothesis that explains difference between positives and negatives
Abduction and TCIE • Given a theory and partial facts, discover what facts are missing to form one consistent hypothesis • Lateral Thinking Puzzles • Presented with a confusing situation • There is an Oracle that knows what happened • You can only ask yes or no questions
The Mysterious Package • One day a man received a parcel in the post. Carefully packed was a human arm. He examined it, repacked it and then sent it on to another man. The second man also carefully examined the arm before taking it to the woods and burying it. Why did they do this?
The Mysterious Package • Was the arm cut off intentionally? • Is the arm’s person still alive? • Is he a doctor? • Did the three men know each other? • Are the other men also missing an arm? • Were they ever stuck on a desert island with no food, make a pact to each cut off an arm to eat and survive, but were rescued before the doctor could cut off his own arm, and the doctor later fulfilled his commitment? YES!
Lateral Thinking Lessons • Certain questions are valuable and lead to large leaps of information . . . • How do we form hypotheses? • How can we pick good questions? • probability that question leads to consistent hypotheses • cost of asking question • We want to find quickest cheapest path to consistent hypotheses
Hypothesis Generation • Use contra-positives for inverse entailment Background Knowledge hasbeak(X) :- bird(X). bird(X) :- vulture(X). Example hasbeak(tweety). Hypotheses bird(tweety). bird(X). vulture(tweety). vulture(X).
One possible trial path e1 f t e2 H1 f t H2 H3 Trial Selection Theory
Hypothesis Probability • Each trial partitions H into {H[t],H[t’]} • Assuming optimal encoding scheme… • Prior probability of each hypothesis • Compression is rounded f measure
Experiment Cost • Ct is the cost of a trial t
Functional Genomics • Want to learn gene-enzyme mapping • Genes encode for • Enzymes that catalyze reactions between • Metabolites to eventually create • Amino Acid Products • Perform auxotrophic growth experiments to determine phenotype
gene1 gene2 gene3 Z A Y B C Trp X Functional Genomics: Simple • A, B and C are Enzymes • X is ubiquitous metabolite, Y and Z optional • If we knock out gene2, we need to add nutrient Z to produce Trp want to learn codes(gene2, B, [Y], [Z]) but only ask: pheno_effect(gene2,[Y]) is false pheno_effect(gene2,[Z]) is true pheno_effect(gene2,[Y,Z]) is true
aromatic amino acids enzymes metabolites Aromatic amino acid pathway
Metabolism in Logic • Hypotheses: codes(‘YDR254W’, ‘4.2.1.11’, [‘C00631’],[‘C00074’]). codes(‘YDR254W’, ‘5.3.1.24’, [‘C04302’],[‘C01302’]). etc ... • Background Knowledge: enzyme(‘4.2.1.11’,[‘C00631’],[‘C00074’]). enzyme(‘5.3.1.24’,[‘C04302’],[‘C01302’]). etc ... generated_by_other_pathways([‘C00002’, ‘C00005’, ‘C00006’, ... , ‘C03356’]). ends([‘C00078’, ‘C00079’, ‘C00082’]).
Metabolism in Logic • What the Oracle answers: phenotypic_effect(ORF, Growth_medium):- generated_by_other_pathways(Ubiquitous_metabolites), union(Ubiquitous_metabolites, Growth_medium, Starts), connected(Starts, Wild_products), ends(Ends), subset(Wild_products, Ends), enz(Enzyme, Reactants, Products), encodes(ORF, Enzyme, Reactants, Products), connected_without_this_step(Starts, Mutant_products, Enzyme, Reactants, Products), not(subset(Mutant_products, Ends)).
Experiments • Learn function of 17 genes by removing ORF • Growth Media • 13 optional nutrients, at most 3 at a time • 378 possible experiments for each ORF • Cost of Optional Nutrients • Determined from www.sigmaaldrich.com catalog • Strategies for Comparison • Random • Naïve Cheapest • ASE-Progol
Experiments • Remove all codes(…) facts • Loop • Generate random sample of trials • Generate hypotheses using Theory Completion by Inverse Entailment • Find minimum EC(H,T) trial and perform • Add results to known examples • until hypotheses consistent with trials
Conclusions and Future Work • ASE-Progol finds hypotheses inexpensively and quickly • 5 of 17 genes had only negative examples… why? Look into inhibitors and nonmonotonic logics. • Limited answers to yes/no. Probabilities? • Can this be applied to gene regulatory networks, using microarray technology? • What other networks have similar frameworks?