
How will we efficiently understand the interactions of ~20,000 genes,


Presentation Transcript


  1. How will we efficiently understand the interactions of ~20,000 genes, with ~200 million potential pairwise interactions? Minimally, we need to use the information that exists
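The "~200 million" figure on slide 1 is the number of unordered gene pairs among ~20,000 genes. A minimal check of that arithmetic (the function name is illustrative, not from the slides):

```python
import math


def pairwise_count(n_genes: int) -> int:
    """Number of unordered pairs among n_genes, i.e. n choose 2."""
    return math.comb(n_genes, 2)


# 20,000 genes give 199,990,000 pairs, which the slide rounds to ~200 million.
print(pairwise_count(20_000))
```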

  2. June 1979: 2 relevant papers: S. Brenner (Genetics 1974), The genetics of Caenorhabditis elegans; J. Sulston & R. Horvitz (Developmental Biology 1977), Post-embryonic cell lineages of the nematode, Caenorhabditis elegans. Jan 2008: >200,000 relevant papers.

  3. Prioritizing high resolution genetic interaction tests by knowledge mining. (1) Full text information retrieval: Hans-Michael Müller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan. (2) Predicting gene interactions from information available in public databases: Weiwei Zhong.

  4. Textpresso Literature Search Engine (www.textpresso.org). Scientists spend more time skimming for information than reading papers. Much of the information is in details hidden in the full text, neither stated in the abstract nor captured in MeSH terms. We designed Textpresso to do automated skimming for researchers and database curators. The output can be used for more sophisticated Natural Language Processing.

  5. Can we do better than PubMed and Google Scholar?
     • PubMed: ontology: MeSH, Taxonomy; full text: -; sentence: (-)
     • Google Scholar: ontology: -; full text: +; sentence: -
     • Textpresso: ontology: Gene Ontology, Customized, Neuroscience Information Framework; full text: +; sentence: +

  6. Categories are “bags of words”
     • GENE: FOXO, HOXA1, pax2, PKD1
     • PATHWAY: precursor, upstream, cascade, descendants
     • Drosophila anatomy: denticle, wing, MP2 neuron
     • Reporter Genes: GFP, EGFP, YFP, lacZ, CFP, Green Fluorescent Protein, reporter gene, dsRed, mCherry

  7. Individual sentences in full text are marked up with categories
     • Article text: "egl-38 regulates lin-3 transcription in vulF in L3 larvae"
     • Textpresso categories: egl-38 (gene), regulates (regulation), lin-3 (gene), transcription (process), vulF (anatomy), L3 larvae (life stage)
     • Automatically mark up the whole corpus of papers with terms of categories, and index for rapid searching
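The markup-and-index idea on slide 7 can be sketched roughly as follows. This is not Textpresso's implementation: the tiny `CATEGORIES` lexicon, the function names, and the whole-term regex matching are assumptions made for illustration, using only the example sentence from the slide.

```python
import re
from collections import defaultdict

# Hypothetical mini-lexicon: each category is a "bag of words".
CATEGORIES = {
    "gene": {"egl-38", "lin-3"},
    "regulation": {"regulates"},
    "process": {"transcription"},
    "anatomy": {"vulF"},
    "life stage": {"L3 larvae"},
}


def mark_up(sentence):
    """Return (term, category) pairs found in the sentence, in text order."""
    hits = []
    for category, terms in CATEGORIES.items():
        for term in terms:
            # Match whole terms only: no word character or hyphen on either side.
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            for match in re.finditer(pattern, sentence):
                hits.append((match.start(), term, category))
    return [(term, category) for _, term, category in sorted(hits)]


def build_index(sentences):
    """Map each category to the ids of the sentences containing one of its terms."""
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for _, category in mark_up(sentence):
            index[category].add(sentence_id)
    return index


corpus = [
    "egl-38 regulates lin-3 transcription in vulF in L3 larvae",
    "Worms were grown at 20 degrees.",
]
print(mark_up(corpus[0]))
print(sorted(build_index(corpus)["gene"]))
```

Indexing by category rather than by raw word is what lets a query ask for "any gene" plus "any regulation word" in one sentence without enumerating every term.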

  8. What Arabidopsis genes are expressed in the meristem based on reporter genes? www.textpresso.org/arabidopsis 14,930 A.t. papers

  9. Is a nicotinic receptor associated with Drugs of Abuse other than nicotine? www.textpresso.org/neuroscience 15,786 papers

  10. The problem with clever fly names
      Gene name: abbreviation
      • forager: for
      • ascute: as
      • wee: we
      • Washed eye: We
      Use italics from PDF: ~70%
      Train system to recognize gene names by context: ~85%
      Michael Müller, Arun Rangarajan

  11. What reporter genes have been used with Drosophila genes to study human disease? www.textpresso.org/fly 20,099 full-text fly papers

  12. Database curation: e.g. Gene-Gene Interactions
      • Find all sentences that contain ≥2 gene names and ≥1 association or regulation word
      • 26,000 sentences out of 4,400 articles
      • Simple interface to “check off” sentences: 100 sentences per hour, output into database
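The sentence filter on slide 12 (≥2 gene names and ≥1 association or regulation word) might look like the sketch below. The two word lists are small hypothetical stand-ins for the real category lexicons, and counting distinct gene names rather than mentions is an assumption; the slide does not say which was used.

```python
import re

# Hypothetical stand-ins for the gene and association/regulation categories.
GENE_NAMES = {"egl-38", "lin-3", "let-60", "tax-6"}
INTERACTION_WORDS = {"regulates", "suppresses", "interacts", "binds"}


def is_candidate(sentence):
    """True if the sentence has >=2 distinct gene names and >=1 interaction word."""
    tokens = re.findall(r"[\w-]+", sentence)
    genes = {tok for tok in tokens if tok in GENE_NAMES}
    words = {tok.lower() for tok in tokens} & INTERACTION_WORDS
    return len(genes) >= 2 and len(words) >= 1


sentences = [
    "egl-38 regulates lin-3 transcription in vulF.",
    "lin-3 is expressed in the anchor cell.",
    "tax-6 and let-60 were both tested.",
]
for s in sentences:
    print(is_candidate(s), s)
```

At the stated rate of 100 sentences per hour, checking all 26,000 candidate sentences corresponds to about 260 curator hours.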

  13. Prioritizing high resolution genetic interaction tests by knowledge mining. (1) Full text information retrieval: Hans-Michael Müller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan. (2) Predicting gene interactions from information available in public databases: Weiwei Zhong.

  14. Training set
      • 4775 positive interactions
        • Genetic, literature curation (1909)
        • Yeast two-hybrid screen (2933)
      • 3296 negative genetic interactions
        • cis doubles in genetic mapping
      Benchmark
      • 5515 positives: KEGG database
      • 5000 negatives: randomly selected

  15. Scoring algorithm: a worm gene pair receives a worm score and, through ortholog mapping, a fly score and a yeast score; score integration combines them into a total score.
      • Fly orthologs (fly score): interaction, GO, expression, phenotype, microarray
      • Worm gene pair (worm score): GO, expression, phenotype, microarray
      • Yeast orthologs (yeast score): interaction, GO, localization, phenotype, microarray

  16. Scoring and score integration
      • Likelihood ratio: L = p(v | pos) / p(v | neg)
      • p(v | pos): probability of the predictor having value v if two genes interact
      • p(v | neg): probability of the predictor having value v if two genes do not interact
      • Plot (C. elegans expression): L and term usage (% of annotated genes associated with the term)
      • Score integration: sum the logs of the L's, log L_1 + log L_2 + ... + log L_n, where n is the number of predictors and L_i is the likelihood ratio of each predictor
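The likelihood ratio and log-sum on slide 16 can be illustrated with a short sketch. The function names, the toy predictor values, and the pseudocount smoothing are assumptions added for the example; the slides do not say how probabilities were estimated or how missing predictor values were handled.

```python
import math
from collections import Counter


def likelihood_ratios(pos_values, neg_values, pseudocount=1.0):
    """Estimate L(v) = p(v | pos) / p(v | neg) for every observed value v.

    pos_values / neg_values are the predictor values seen for interacting and
    non-interacting training pairs. The pseudocount (an assumption) keeps a
    value that is absent from one set from producing a zero probability.
    """
    pos, neg = Counter(pos_values), Counter(neg_values)
    values = set(pos) | set(neg)
    n_pos = len(pos_values) + pseudocount * len(values)
    n_neg = len(neg_values) + pseudocount * len(values)
    return {
        v: ((pos[v] + pseudocount) / n_pos) / ((neg[v] + pseudocount) / n_neg)
        for v in values
    }


def log_score(ratios_by_predictor, observed):
    """Sum log L_i over the predictors that have a value for this gene pair."""
    total = 0.0
    for name, ratios in ratios_by_predictor.items():
        value = observed.get(name)
        if value in ratios:  # predictors without a usable value are skipped (assumption)
            total += math.log(ratios[value])
    return total


# Toy training data for one predictor: do the two genes share an annotation term?
ratios = likelihood_ratios(
    pos_values=["shared", "shared", "shared", "none"],
    neg_values=["shared", "none", "none", "none"],
)
model = {"expression": ratios, "phenotype": ratios}
print(log_score(model, {"expression": "shared", "phenotype": "shared"}))
```

Summing logs is equivalent to multiplying the individual likelihood ratios, which treats the predictors as independent evidence.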

  17. Diagram of genes: lin-3, let-23, sem-5, sos-1, gap-1, let-60, lin-45, ksr-1, mek-2, lip-1, mpk-1. Diagram labels: "v1.4 & v1.6", "v1.6".

  18. Testing let-60 ras interactors
      • 87 genes have score >0.9; 17 confirmed from literature
      • Inactivating genes by RNAi in a gain-of-function (gf) let-60 mutant
      • Assay: vulval precursor cell (VPC) induction
      • N2: not Multivulva
      • let-60(gf): strong Multivulva
      • let-60(gf); tax-6(RNAi): weak Multivulva

  19. let-60(gf) VPC induction under various RNAi
      • Plot: VPC induction index; legend: Score > 0.9, Score < 0.6, p < 0.05, p < 0.01
      • 12 hits (p<0.05) in 49 genes; 1 hit in 26 randomly selected genes
      • Combined with literature, 29/66 (44%) predictions confirmed
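The counts on slides 18 and 19 can be checked with a few lines of arithmetic. The 29/66 figure is consistent with adding the 17 literature-confirmed genes to both the 12 hits and the 49 tested genes. The one-sided Fisher exact test at the end is not from the slides; it is included only as one possible way to compare the two hit rates, and the function names are illustrative.

```python
import math


def hit_rate(hits, tested):
    """Fraction of tested genes that scored as hits."""
    return hits / tested


def fisher_one_sided(hits_a, tested_a, hits_b, tested_b):
    """P(at least hits_a hits in group A) if hits were spread at random.

    Hypergeometric tail for a 2x2 table; an illustrative test, not the
    analysis reported in the presentation.
    """
    total = tested_a + tested_b
    total_hits = hits_a + hits_b
    upper = min(total_hits, tested_a)
    tail = sum(
        math.comb(total_hits, x) * math.comb(total - total_hits, tested_a - x)
        for x in range(hits_a, upper + 1)
    )
    return tail / math.comb(total, tested_a)


predicted = hit_rate(12, 49)           # high-scoring genes tested by RNAi
random_set = hit_rate(1, 26)           # randomly selected genes
combined = hit_rate(12 + 17, 49 + 17)  # RNAi hits plus literature-confirmed genes

print(f"predicted: {predicted:.1%}, random: {random_set:.1%}, combined: {combined:.1%}")
print(f"one-sided Fisher p-value: {fisher_one_sided(12, 49, 1, 26):.4f}")
```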

  20. let-60 ras interactors (suppressors)
      • tax-6: calcineurin
      • csn-5: COP-9 signalosome
      • qua-1: hedgehog-related protein
      • C01G8.9: SWI/SNF-related (eyelid)
      • C05D10.3: ABC transporter (white)
      • pfa-3: profilin
      • nhr-4: transcription factor

  21. C. elegans interactions. Input: 4,726 known interactions among 2,713 genes. Predict an additional 18,863, for a total of 23,589 interactions among 4,408 genes.

  22. for Drosophila

  23. D. melanogaster interactions. Input: 4,180 known interactions among 1,262 genes. Predict 13,126, for a total of 17,306 interactions among 6,044 genes.

  24. Automated, Quantitative Phenotyping: locomotion; morphology; generative graphics; plate demographics (Weiwei Zhong); sexual behavior. Chris Cronin: movement analysis (BMC Genetics 2005). E. Fontaine, A. Whittaker, Joel Burdick.

  25. Prioritizing high resolution genetic interaction tests by knowledge mining. (1) Full text information retrieval: Hans-Michael Müller, Arun Rangarajan, Tracy Teal, Kimberly Van Auken, Juancarlos Chan. (2) Predicting gene interactions from information available in public databases: Weiwei Zhong.
