Building biological networks from diverse genomic data. Chad Myers Department of Computer Science, Lewis-Sigler Institute for Integrative Genomics Princeton University PRIME Workshop on Pathway Databases and Modeling Tools June 16, 2006.
Motivation: building biological networks from experimental data • Explosion of functional genomic DATA: how do we turn it into KNOWLEDGE of the components and inter-relationships that lead to function? • Find missing pathway components • Detect uncharacterized crosstalk between pathways • Discover novel pathways
Motivation: building biological networks from experimental data • The data are noisy: how can we harness this information without sacrificing precision?
Directed network discovery: involving the biologist in the search process • Previous approaches to network analysis from genomic data: • largely undirected, global approaches that detect interesting network features • Incorporating expert direction can: • improve sensitivity and precision by using context information • focus on information relevant to the biologist user (allows interactivity) • Figure: two-hybrid interaction network, yeast (SH3 domain), Boone lab • Previous work: • Bader et al. (2003), Asthana et al. (2004) • Yamanishi et al. (2004, 2005), Kato et al. (2005)
bioPIXIE system overview bioPIXIE: Pathway Inference from eXperimental Interaction Evidence
Overview • How do we integrate heterogeneous evidence? • Expert-driven network discovery • Making it usable: practical visualization and other interface considerations • Does it work? (evaluation experiments and biological validation) • Challenges/opportunities and future work
Heterogeneous data integration • Diverse forms of data (physical binding, cellular localization, genetic interaction, expression, sequence (TF motifs, coding,…)): what's a unifying framework? • Variable coverage, reliability, and relevance • The integration scheme should use the information in the data when it is available, but be robust when it is missing • Approach: map each data type to associations between genes/proteins, then combine them with a Bayes net
Bayes net for evidence integration • We infer: Functional Relationship, giving a fully-connected, weighted graph of proteins • Input evidence, grouped by lab (source) and by type: microarray correlation, shared transcription factors, synthetic lethality, synthetic rescue, co-localization, purified complex, 2-hybrid, affinity precipitation, … • Structure: • Naïve Bayes (~60 nodes) • (also tried TAN) • CPTs (conditional probability tables): • learned from GO gold standard
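A minimal sketch of how a naïve Bayes net turns per-source evidence into a posterior probability of a functional relationship for one protein pair. The source names, discretization bins, CPT values, and prior below are all hypothetical (bioPIXIE learns its CPTs from a GO gold standard over ~60 evidence nodes); the point is only to show that each observed source multiplies the odds by its likelihood ratio and that missing sources can simply be skipped.

```python
import math
from typing import Dict, Optional

# HYPOTHETICAL CPTs: for each evidence source, P(bin | related) and
# P(bin | unrelated). Source names and numbers are made up for illustration.
CPTS: Dict[str, Dict[str, Dict[int, float]]] = {
    "microarray_correlation": {
        "related": {0: 0.2, 1: 0.3, 2: 0.5},
        "unrelated": {0: 0.6, 1: 0.3, 2: 0.1},
    },
    "two_hybrid": {
        "related": {0: 0.7, 1: 0.3},
        "unrelated": {0: 0.98, 1: 0.02},
    },
}
PRIOR_RELATED = 0.01  # HYPOTHETICAL prior P(functionally related)


def posterior_related(evidence: Dict[str, Optional[int]]) -> float:
    """Naive Bayes posterior P(related | evidence) for one protein pair.

    `evidence` maps a source name to a discretized bin. Sources that are
    absent or None are treated as missing and skipped; in a naive Bayes net,
    summing out an unobserved leaf contributes a factor of 1.
    """
    log_odds = math.log(PRIOR_RELATED / (1.0 - PRIOR_RELATED))
    for source, cpt in CPTS.items():
        value = evidence.get(source)
        if value is None:
            continue  # missing data: posterior is unaffected by this source
        # Each observed source multiplies the odds by its likelihood ratio.
        log_odds += math.log(cpt["related"][value] / cpt["unrelated"][value])
    return 1.0 / (1.0 + math.exp(-log_odds))
```

For example, `posterior_related({"microarray_correlation": 2, "two_hybrid": 1})` combines likelihood ratios of 5 and 15 with the prior odds of 1/99, giving 25/58 (about 0.43); dropping the two-hybrid entry gives 5/104. Robustness to missing data comes for free from the naïve independence assumption, which is one reason that structure suits sparse, heterogeneous evidence.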
Expert-driven network discovery • Local search in the PPI network centered at the query • Which proteins should we extract as a single, functionally coherent group? • Should consider: confidence in links and topology surrounding query group
Extracting relevant proteins • Basic idea: compute the expected linkage to the query set • e_ij = P(protein i is functionally related to protein j | evidence) • X_ij: binary random variable that equals 1 with probability e_ij • S_Q(p_i) = Σ_{j∈Q} X_ij: number of links from protein i to the query set Q • Find proteins that maximize E[S_Q(p_i)] = Σ_{j∈Q} e_ij • What about indirect links to the query set?
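A minimal sketch of the expected-linkage score, assuming (as the definitions above suggest) that candidates are ranked by the expected number of links to the query set with no further normalization; bioPIXIE's actual score may differ. Because each X_ij is a Bernoulli variable with mean e_ij, linearity of expectation reduces E[S_Q(p_i)] to a plain sum of confidences. The dictionary-of-pairs representation and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

# e[frozenset((i, j))] = e_ij = P(i functionally related to j | evidence).
# Keys are unordered pairs, so the network is treated as symmetric.
Confidences = Dict[FrozenSet[str], float]


def expected_links_to_query(protein: str, query: Set[str], e: Confidences) -> float:
    """E[S_Q(p_i)]: each X_ij is Bernoulli(e_ij), so by linearity of
    expectation the expected number of links to Q is sum_{j in Q} e_ij."""
    return sum(e.get(frozenset((protein, q)), 0.0) for q in query if q != protein)


def rank_candidates(
    proteins: Iterable[str], query: Set[str], e: Confidences, top_k: int
) -> List[str]:
    """Return the top_k non-query proteins by expected linkage (ties broken by name)."""
    scores = {p: expected_links_to_query(p, query, e) for p in proteins if p not in query}
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]
```

With query {A, B} and confidences e_AC = 0.9, e_BC = 0.8, e_AD = 0.5, e_BD = 0.1, e_CE = 0.99, the ranking starts C (1.7), D (0.6). Protein E scores 0 even though it is almost certainly linked to C, which is exactly the indirect-link gap the slide raises.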
Graph search: handling indirect links • Solution: an iterative expanding search in which indirect links to the query through high-confidence neighbors are also counted
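One way to picture an iterative expanding search is a greedy loop: at each step, add the protein with the highest expected linkage to the current group (the query plus proteins already added), so that links through newly added high-confidence neighbors count in later rounds. This is a hypothetical sketch, not bioPIXIE's exact algorithm; the stopping rule (`max_size`, `min_score`) and the one-protein-per-step policy are assumptions.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Confidences = Dict[FrozenSet[str], float]  # e_ij keyed by unordered protein pair


def expand_from_query(
    proteins: Iterable[str],
    query: Set[str],
    e: Confidences,
    max_size: int,
    min_score: float = 0.0,
) -> List[str]:
    """Greedy expansion (HYPOTHETICAL parameters max_size, min_score).

    Each round scores every outside protein by its expected number of links
    to the *current* group, so a protein connected only through an added
    neighbor can still be recruited. Returns proteins in the order added.
    """
    group = set(query)
    added: List[str] = []
    candidates = sorted(set(proteins))  # deterministic tie-breaking by name
    while len(group) < max_size:
        best, best_score = None, min_score
        for p in candidates:
            if p in group:
                continue
            score = sum(e.get(frozenset((p, g)), 0.0) for g in group)
            if score > best_score:
                best, best_score = p, score
        if best is None:  # nothing clears the threshold
            break
        group.add(best)
        added.append(best)
    return added
```

Using query {A, B} with e_AC = 0.9, e_BC = 0.8, e_AD = 0.5, e_BD = 0.1, e_CE = 0.99, the first round adds C (1.7). In the second round E scores 0.99 through C while D still scores 0.6, so E is added next even though it has no direct link to the query.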
Making bioPIXIE usable • Guiding principles: • Accessibility (users can access the most recent data with little effort) • Simplicity vs. flexibility • Drill-down (details, e.g. supporting experimental data, hidden until requested) • Browseable
Evaluation experiments Recovering known network components: how much does integration help? • For each of 31 pathways, processes, and complexes (KEGG, GO, MIPS), 10 random members are used as the query set and the method tries to recover the remaining members; results are averaged over all 31
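The evaluation protocol above can be sketched as: sample a query from a known pathway, rank every non-query protein, and score how early the held-out members appear. The sketch below uses average precision as a standard estimate of the area under the precision-recall curve (AUPRC); whether the original study computed AUPRC exactly this way is an assumption, as are the function names and the `rank_fn` interface.

```python
import random
from typing import Callable, Iterable, List, Sequence, Set

# HYPOTHETICAL interface: rank_fn(query, candidates) -> candidates, best first.
RankFn = Callable[[Set[str], List[str]], List[str]]


def average_precision(ranked: Sequence[str], positives: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a positive appears
    (an AUPRC estimate; unranked positives contribute zero)."""
    hits, total = 0, 0.0
    for k, protein in enumerate(ranked, start=1):
        if protein in positives:
            hits += 1
            total += hits / k
    return total / len(positives) if positives else 0.0


def evaluate_pathway(
    members: Iterable[str],
    all_proteins: Iterable[str],
    rank_fn: RankFn,
    query_size: int = 10,
    seed: int = 0,
) -> float:
    """Use query_size random members as the query; score recovery of the rest."""
    rng = random.Random(seed)
    member_list = sorted(members)
    query = set(rng.sample(member_list, query_size))
    held_out = set(member_list) - query
    candidates = [p for p in all_proteins if p not in query]
    return average_precision(rank_fn(query, candidates), held_out)
```

Averaging `evaluate_pathway` over the 31 gold-standard sets (and over several random seeds) gives one summary number per method, which is how integrated and single-source rankings can be compared on equal footing.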
Evaluation experiments (2) Recovering known network components: do naïve methods of integration/search work just as well? • Same protocol: for each of 31 pathways, processes, and complexes (KEGG, GO, MIPS), 10 random members are used as the query set and the method tries to recover the remaining members; results are averaged over all 31
Biological validation: finding new components • Using bioPIXIE to characterize unknown genes • S. cerevisiae uncharacterized gene YPL077C: predicted involvement in chromosome segregation
Biological validation: finding new components • P-value based on blind counting: 1.98 × 10⁻⁷ (Fisher's exact test)
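The slide reports a Fisher's exact test P-value but not the underlying 2×2 counts, so the sketch below only shows how such a P-value is computed; any table you feed it here is hypothetical. It also assumes a one-sided (enrichment) test, since the slide does not say which alternative was used. With both margins fixed, the top-left count follows a hypergeometric distribution, and the P-value is its upper tail.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided (enrichment) Fisher's exact test for the 2x2 table

        [[a, b],
         [c, d]]

    With row and column margins fixed, the top-left count X is
    hypergeometric; the P-value is P(X >= a) under that null.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)  # X cannot exceed either margin
    # Exact integer arithmetic for the tail, one division at the end.
    # math.comb returns 0 when k > n, which handles impossible cells.
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)
```

For instance, the table [[3, 0], [0, 3]] gives 1/20 = 0.05. For real analyses, `scipy.stats.fisher_exact(table, alternative="greater")` computes the same one-sided P-value.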
Biological validation: novel links between pathways • DNA replication initiation: Cdc7 is the "switch" that starts replication (activated by Dbf4) • Cdc7 was linked to the Hsp90 complex by our method • Hsp90 (yeast Hsc82, Hsp82): cytosolic molecular chaperone that participates in the folding of several signaling kinases and hormone receptors (Helmut Pospiech)
Genetic analysis of DNA replication-Hsp90 link • Figure: spot assays (10⁵ cells) at RT, 30°C, and 37°C; strains: wt, dbf4Δ, hsc82Δ, hsp82Δ, cpr7Δ, dbf4Δ hsc82Δ, dbf4Δ hsp82Δ, dbf4Δ cpr7Δ • YKO: Dbf4 vs. hsp82, hsc82, and co-chaperones cpr7, sti1, cdc37
Practical challenges/opportunities • Visualizing complex networks of interactions in a meaningful way • how does it scale with added data? • easy user navigation around the network • Data-centric vs. established knowledge views • How do we overlay current knowledge of pathways with predictions derived from experimental data?
Future work An observation: The more specific we can be about the end goal, the better the accuracy of our prediction
Future work Exploiting relevance and reliability variation: context-specific integration
Summary bioPIXIE can facilitate precise network discovery from experimental data using: • Bayesian data integration • Expert-directed search • Web-based dynamic interface bioPIXIE is an effective tool for browsing genomic evidence and generating specific, testable hypotheses http://pixie.princeton.edu
Acknowledgements Olga Troyanskaya Drew Robson Adam Wible Kara Dolinski Camelia Chiriac Matt Hibbs Curtis Huttenhower David Botstein Lab Leonid Kruglyak Lab Thank you! http://pixie.princeton.edu
Evaluation experiments (3): what about noise in the query set? • Figure: AUPRC vs. # of random proteins out of 20 total query proteins
Hydroxyurea (HU) sensitivity (replication inhibitor) • Figure: spot assays (10⁶ cells) at 30°C and 37°C on 0, 50, and 100 mM HU; strains: wt, dbf4Δ, hsc82Δ, hsp82Δ, cpr7Δ, sti1Δ, dbf4Δ hsc82Δ, dbf4Δ hsp82Δ, dbf4Δ cpr7Δ, dbf4Δ sti1Δ
Is this interaction specific to DNA replication? • MMS (induces DNA damage) sensitivity • Figure: spot assays (10⁶ cells, 37°C shown) of the same strains (wt, dbf4Δ, hsc82Δ, hsp82Δ, cpr7Δ, sti1Δ, dbf4Δ hsc82Δ, dbf4Δ hsp82Δ, dbf4Δ cpr7Δ, dbf4Δ sti1Δ); MMS treatment has no apparent effect at RT, 30°C, or 37°C • Conclusions: • The Hsp90 complex plays a specific role in DNA replication • Hsc82 and Hsp82 do not have identical functions • Possible new link between signaling cascades, stress, and DNA replication • Our system generates specific, testable hypotheses