Building biological networks from diverse genomic data

Building biological networks from diverse genomic data. Chad Myers, Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics, Princeton University. PRIME Workshop on Pathway Databases and Modeling Tools, June 16, 2006.

Presentation Transcript


  1. Building biological networks from diverse genomic data • Chad Myers, Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics, Princeton University • PRIME Workshop on Pathway Databases and Modeling Tools, June 16, 2006

  2. Motivation: building biological networks from experimental data • Explosion of functional genomic DATA → ? → KNOWLEDGE of components and inter-relationships that lead to function • Find missing pathway components • Detect uncharacterized crosstalk between pathways • Discover novel pathways

  3. Motivation: building biological networks from experimental data • The data are noisy • How can we harness this information without sacrificing precision?

  4. Directed network discovery: involving the biologist in the search process • Previous approaches to network analysis from genomic data: • largely undirected, global approaches that detect interesting network features • Incorporating expert direction can: • Improve sensitivity and precision by using context information • Focus on information relevant to the biologist user (allows interactivity) • Previous work: • Bader et al. (2003), Asthana et al. (2004) • Yamanishi et al. (2004, 2005), Kato et al. (2005) • [Figure: two-hybrid interaction network, yeast (SH3 domain), Boone lab]

  5. bioPIXIE system overview bioPIXIE: Pathway Inference from eXperimental Interaction Evidence

  6. Overview • How do we integrate heterogeneous evidence? • Expert-driven network discovery • Making it usable: practical visualization and other interface considerations • Does it work? (evaluation experiments and biological validation) • Challenges/opportunities and future work

  7. Heterogeneous data integration • Diverse forms of data: what’s a unifying framework? • Variable coverage, reliability, and relevance • Integration scheme should utilize information in data when available, but be robust when it is missing • Data types: physical binding, cellular localization, genetic interaction, expression, sequence (TF motifs, coding, …) • → Map to associations of genes/proteins → Bayes net

  8. Bayes net for evidence integration • Structure: • Naïve Bayes (~60 nodes) • (also tried TAN) • CPTs: • learned from GO gold standard • We infer: Functional Relationship (fully-connected, weighted graph of proteins) • Input evidence, grouped by lab (source) and by type: microarray correlation, shared transcription factors, synthetic lethality, synthetic rescue, co-localization, purified complex, two-hybrid, affinity precipitation, …
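A minimal sketch of how a naïve Bayes model can combine evidence for one protein pair while staying robust to missing data. The function name, the evidence-source names, and every probability below are hypothetical placeholders, not values or code from bioPIXIE; the slides state only that the CPTs were learned from a GO gold standard.

```python
from math import exp, log


def posterior_functional_relationship(prior, cpts, observations):
    """Return P(functional relationship | evidence) for one protein pair.

    prior        -- P(FR = 1) before seeing any evidence
    cpts         -- {source: {value: (P(value | FR=1), P(value | FR=0))}}
    observations -- {source: value}; sources absent from this dict are
                    skipped, which marginalizes them out under the
                    naive Bayes independence assumption
    """
    log_odds = log(prior) - log(1.0 - prior)
    for source, value in observations.items():
        p_related, p_unrelated = cpts[source][value]
        log_odds += log(p_related) - log(p_unrelated)
    return 1.0 / (1.0 + exp(-log_odds))


# Hypothetical conditional probability tables (illustrative numbers only).
toy_cpts = {
    "two_hybrid": {1: (0.30, 0.01), 0: (0.70, 0.99)},
    "coexpression": {"high": (0.40, 0.10), "low": (0.60, 0.90)},
}

toy_prior = 0.05
no_evidence = posterior_functional_relationship(toy_prior, toy_cpts, {})
one_source = posterior_functional_relationship(
    toy_prior, toy_cpts, {"two_hybrid": 1}
)
two_sources = posterior_functional_relationship(
    toy_prior, toy_cpts, {"two_hybrid": 1, "coexpression": "high"}
)
```

With no observations the posterior falls back to the prior, and each supporting observation raises it, which is the "robust when missing" behavior the previous slide asks for.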

  9. Overview • How do we integrate heterogeneous evidence? • Expert-driven network discovery • Making it usable: practical visualization and other interface considerations • Does it work? (evaluation experiments and biological validation) • Challenges/opportunities and future work

  10. Expert-driven network discovery • Local search in the PPI network centered at the query • Which proteins should we extract as a single, functionally coherent group? • Should consider: confidence in links and topology surrounding query group

  11. Extracting relevant proteins • Basic idea: compute expected linkage to query set • $e_{ij} = P(\text{protein } i \text{ is functionally related to protein } j \mid \text{evidence})$ • $X_{ij}$: binary random variable with $P(X_{ij} = 1) = e_{ij}$ • $S_Q(p_i) = \sum_{j \in Q} X_{ij}$: number of links from protein $i$ to the query set $Q$ • Find proteins that maximize: $E[S_Q(p_i)] = \sum_{j \in Q} e_{ij}$ • What about indirect links to the query set?
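The expected-linkage idea can be sketched in a few lines: because each link is a binary variable with probability e_ij, the expected number of links from a protein to the query set is just the sum of those probabilities. The function name, the data layout (a dict keyed by unordered protein pairs), and the toy protein labels are assumptions made for illustration.

```python
def expected_linkage(edge_prob, query):
    """Rank non-query proteins by expected number of links to the query set.

    edge_prob -- {frozenset({a, b}): probability that a and b are related}
    query     -- set of query protein names
    Returns a list of (protein, score) pairs, highest score first.
    """
    proteins = {p for pair in edge_prob for p in pair}
    scores = {}
    for candidate in proteins - set(query):
        # E[S_Q] = sum of link probabilities to every query member;
        # pairs without an entry are treated as probability 0.
        scores[candidate] = sum(
            edge_prob.get(frozenset((candidate, q)), 0.0) for q in query
        )
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy network (labels and weights are illustrative only).
toy_edges = {
    frozenset(("Q1", "A")): 0.9,
    frozenset(("Q2", "A")): 0.8,
    frozenset(("Q1", "B")): 0.1,
    frozenset(("Q1", "C")): 0.2,
}
ranking = expected_linkage(toy_edges, {"Q1", "Q2"})
ranked_names = [name for name, _ in ranking]
```

Here protein A ranks first because it is strongly linked to both query members; this direct-link score ignores any path that runs through a non-query neighbor.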

  12. Graph search: handling indirect links • Solution: iterative expanding search in which indirect links to the query through high-confidence neighbors are counted
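One possible reading of the iterative expanding search, sketched under stated assumptions: in each round the best-scoring candidates are admitted to the group only if their expected linkage clears a threshold, and later rounds count links to those admitted neighbors as well as to the original query. The slide does not give the actual procedure, so the function name, the parameters `rounds`, `per_round`, and `min_score`, and the toy network are all hypothetical.

```python
def expanding_search(edge_prob, query, rounds=2, per_round=1, min_score=0.5):
    """Grow a protein group outward from the query set.

    edge_prob -- {frozenset({a, b}): probability that a and b are related}
    Returns the proteins added, in the order they were admitted.
    """
    group = set(query)
    added = []
    proteins = {p for pair in edge_prob for p in pair}
    for _ in range(rounds):
        # Score each outside protein by expected links into the current
        # group, which includes neighbors admitted in earlier rounds.
        scores = {
            p: sum(edge_prob.get(frozenset((p, g)), 0.0) for g in group)
            for p in proteins - group
        }
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        accepted = [p for p, s in ranked[:per_round] if s >= min_score]
        if not accepted:
            break
        group.update(accepted)
        added.extend(accepted)
    return added


# Hypothetical toy network: B is weakly linked to the query directly
# but strongly linked to A, a high-confidence neighbor of the query.
toy_network = {
    frozenset(("Q1", "A")): 0.9,
    frozenset(("Q2", "A")): 0.8,
    frozenset(("A", "B")): 0.9,
    frozenset(("Q1", "B")): 0.1,
    frozenset(("Q1", "C")): 0.2,
}
found = expanding_search(toy_network, {"Q1", "Q2"})
```

In this toy case A is admitted in the first round, and B is admitted in the second because its link to A now counts; a direct-link ranking alone would have placed C ahead of B.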

  13. Overview • How do we integrate heterogeneous evidence? • Expert-driven network discovery • Making it usable: practical visualization and other interface considerations • Does it work? (evaluation experiments and biological validation) • Challenges/opportunities and future work

  14. Making bioPIXIE usable • Guiding principles: • Accessibility • (users can access most recent data with little effort) • Simplicity vs. flexibility • Drill-down • (details, e.g. supporting exp. data, hidden until requested) • Browseable

  15. Graph visualization

  16. Overview • How do we integrate heterogeneous evidence? • Expert-driven network discovery • Making it usable: practical visualization and other interface considerations • Does it work? (evaluation experiments and biological validation) • Challenges/opportunities and future work

  17. Evaluation experiments • Recovering known network components: How much does integration help? • Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS) • Setup: use 10 random member proteins as the query set and try to recover the remaining members

  18. Evaluation experiments (2) • Recovering known network components: Do naïve methods of integration/search work just as well? • Results averaged over 31 pathways, processes, and complexes (KEGG, GO, MIPS) • Setup: use 10 random member proteins as the query set and try to recover the remaining members
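The hold-out setup in these two slides can be sketched as follows: sample a query set from a known group, rank the remaining proteins with some search method, and score how early the held-out members appear. Average precision is used here as a common summary of the precision-recall curve; the transcript does not say which metric these slides plot, and the function names and toy identifiers are assumptions.

```python
import random


def split_query(members, query_size=10, seed=0):
    """Split a known group into a random query set and held-out members."""
    rng = random.Random(seed)
    query = set(rng.sample(sorted(members), query_size))
    return query, set(members) - query


def average_precision(ranked, positives):
    """Mean of precision values taken at the rank of each recovered positive."""
    if not positives:
        return 0.0
    hits = 0
    total = 0.0
    for rank, protein in enumerate(ranked, start=1):
        if protein in positives:
            hits += 1
            total += hits / rank
    return total / len(positives)


# Hypothetical group of 15 members with placeholder identifiers.
toy_members = {f"P{i}" for i in range(15)}
toy_query, toy_held_out = split_query(toy_members)

# Score a hypothetical ranking: positives at ranks 1 and 3 of 3.
toy_ap = average_precision(["a", "x", "b"], {"a", "b"})
```

Averaging this score over many random query sets and over all evaluation groups gives one number per method, which is what makes integration schemes and search strategies comparable.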

  19. Biological validation: finding new components • Using bioPIXIE to characterize unknown genes • S. cerevisiae uncharacterized gene, YPL077C • Predicted involvement in chromosome segregation

  20. Biological validation: finding new components • P-value based on blind counting: $1.98 \times 10^{-7}$ (Fisher’s exact test)
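For readers unfamiliar with the test, a one-sided Fisher's exact test on a 2×2 table of counts can be sketched with the hypergeometric distribution. The transcript does not include the counts behind the reported p-value, so the tables below are hypothetical and do not reproduce it; the function name is also an assumption.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided p-value for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with row and column totals fixed, of seeing
    a count at least as large as `a` in the top-left cell.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    upper = min(row1, col1)
    # Hypergeometric tail: sum over all tables at least as extreme.
    numerator = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1)
    )
    return numerator / comb(n, col1)


# Hypothetical tables, e.g. rows = strain, columns = phenotype seen or not.
p_small = fisher_exact_greater(3, 0, 0, 3)
p_minimal = fisher_exact_greater(1, 0, 0, 1)
```

A perfectly separated 3-versus-3 table already gives p = 0.05, so a value near 10^-7 implies a much larger number of counted cells.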

  21. Biological validation: novel links between pathways • DNA replication initiation: Cdc7: “switch” that starts replication (activated by Dbf4) • Linked to Hsp90 complex by our method • Hsp90 (yeast: hsc82, hsp82): cytosolic molecular chaperone that participates in the folding of several signaling kinases and hormone receptors • [Figure credit: Helmut Pospiech]

  22. Genetic analysis of DNA replication-Hsp90 link • YKO Dbf4 vs. hsp82, hsc82 and co-chaperones: cpr7, sti1, cdc37 • [Figure: strains wt, dbf4Δ, hsc82Δ, hsp82Δ, cpr7Δ, dbf4Δhsc82Δ, dbf4Δhsp82Δ, dbf4Δcpr7Δ; panels: 10^5 cells at RT, 30°C, and 37°C]

  23. Overview • How do we integrate heterogeneous evidence? • Expert-driven network discovery • Making it usable: practical visualization and other interface considerations • Does it work? (evaluation experiments and biological validation) • Challenges/opportunities and future work

  24. Practical challenges/opportunities • Visualizing complex networks of interactions in a meaningful way • how does it scale with added data? • easy user navigation around the network • Data-centric vs. established knowledge views • How do we overlay current knowledge of pathways with predictions derived from experimental data?

  25. Future work An observation: The more specific we can be about the end goal, the better the accuracy of our prediction

  26. Future work Exploiting relevance and reliability variation: context-specific integration

  27. Summary • bioPIXIE can facilitate precise network discovery from experimental data using: • Bayesian data integration • Expert-directed search • Web-based dynamic interface • bioPIXIE is an effective tool for browsing genomic evidence and generating specific, testable hypotheses • http://pixie.princeton.edu

  28. Acknowledgements • Olga Troyanskaya, Drew Robson, Adam Wible, Kara Dolinski, Camelia Chiriac, Matt Hibbs, Curtis Huttenhower, David Botstein Lab, Leonid Kruglyak Lab • Thank you! • http://pixie.princeton.edu

  29. Evaluation experiments (3): what about noise in the query set? • [Figure: AUPRC vs. number of random proteins out of 20 total query proteins]

  30. Hydroxyurea sensitivity (replication inhibitor) • [Figure: strains wt, dbf4Δ, hsc82Δ, hsp82Δ, cpr7Δ, sti1Δ, dbf4Δhsc82Δ, dbf4Δhsp82Δ, dbf4Δcpr7Δ, dbf4Δsti1Δ; panels: HU 0 mM, HU 50 mM, HU 100 mM; 10^6 cells at 30°C and 37°C]

  31. Is this interaction specific to DNA replication? • MMS sensitivity (induces DNA damage) • MMS treatment has no apparent effect at RT, 30°C or 37°C (shown) • Conclusions: • Hsp90 complex plays specific role in DNA replication • Hsc82 and hsp82 do not have identical function • Possible new link between signaling cascades, stress, and DNA replication • Our system generates specific, testable hypotheses • [Figure: strains wt, dbf4Δ, hsc82Δ, hsp82Δ, cpr7Δ, sti1Δ, dbf4Δhsc82Δ, dbf4Δhsp82Δ, dbf4Δcpr7Δ, dbf4Δsti1Δ; 10^6 cells at 37°C]
