Divining Systems Biology Knowledge from High-throughput Experiments Using EGAN. Jesse Paquette, ISMB 2010. Biostatistics and Computational Biology Core, Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco (AKA BCBC HDFCCC UCSF).
High-throughput experiments
• This talk applies to
  • Expression microarrays
  • aCGH
  • SNP/CNV arrays
  • MS/MS proteomics
  • DNA methylation
  • ChIP-Seq
  • RNA-Seq
  • In-silico experiments
• If parts of the output can be mapped to gene IDs
  • You can use EGAN
What do you hope to accomplish?
• Collect data
• Process data
• Differential analysis
• Publish!
• Clusters and/or gene lists
• Produce insight about the underlying biology
• New papers!
• New testable hypotheses
• New grants!
• Drug targets!
Leverage organic intelligence
• Clusters and/or gene lists
• Summarize
• Produce insight about the underlying biology
• Visualize
• Contextualize
• New testable hypotheses
Producing insight from clusters and gene lists
• Summarize: find enriched pathways (and other gene sets)
  • Hypergeometric over-representation
    • DAVID
  • Global trends
    • GSEA
• Visualize: gene relationships in a graph
  • Protein-protein interactions
    • Cytoscape
  • Network module discovery
    • Ingenuity IPA
  • Literature co-occurrence
    • PubGene
• Contextualize: pertinent literature
  • PubMed
  • Google
  • iHOP
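The hypergeometric over-representation test named on this slide can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code used by DAVID or EGAN; the function name, parameter names, and toy numbers are assumptions made for the example.

```python
from math import comb


def hypergeom_enrichment_p(population_size, set_size, list_size, overlap):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    population_size: number of genes in the background (hypothetical name)
    set_size:        number of background genes that belong to the gene set
    list_size:       number of genes in the cluster or gene list
    overlap:         number of listed genes that belong to the gene set
    """
    max_overlap = min(set_size, list_size)
    total = comb(population_size, list_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(set_size, i) * comb(population_size - set_size, list_size - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers, invented for illustration only.
p = hypergeom_enrichment_p(
    population_size=20000, set_size=150, list_size=200, overlap=10
)
print(f"enrichment p-value: {p:.3g}")
```

Exact integer arithmetic keeps the sketch dependency-free; a production tool would typically also correct for testing many gene sets at once.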
EGAN: Exploratory Gene Association Networks
• Methods: state-of-the-art analysis of clusters and gene lists
  • Hypergeometric enrichment of gene sets
  • Global statistical trends of gene sets
  • Hypergraph visualization (via Cytoscape libraries)
  • Literature identification
  • Network module discovery
• User Interface: responds quickly to new queries from the biologist
  • Sandbox-style functionality
  • Dynamic adjustment of p-value cutoffs
  • Point-and-click interface
  • All data in-memory for immediate access
  • Links to external websites
• Modular: integrates as a flexible plug-and-play cog
  • All data is customizable
  • Proprietary data can be restricted to the client location
  • Java runs on almost every OS (PC, Mac, Linux)
  • Can be configured and launched from a different application (e.g. GenePattern)
  • Analyses can be scripted for automation
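One way to picture "dynamic adjustment of p-value cutoffs" with "all data in-memory" is to compute enrichment p-values once, keep them sorted, and answer each new cutoff with a binary search. The sketch below is an assumption about how such a feature could work, not EGAN's actual implementation; the class name, method name, and gene set labels are hypothetical.

```python
from bisect import bisect_right


class EnrichmentTable:
    """Holds precomputed gene set p-values in memory, sorted once."""

    def __init__(self, p_values):
        # p_values: dict mapping gene set name -> enrichment p-value
        self._rows = sorted(p_values.items(), key=lambda row: row[1])
        self._ps = [p for _, p in self._rows]

    def below(self, cutoff):
        """Return gene set names with p <= cutoff, most significant first."""
        end = bisect_right(self._ps, cutoff)
        return [name for name, _ in self._rows[:end]]


# Hypothetical gene set labels and p-values, for illustration only.
table = EnrichmentTable({"SET_A": 0.2, "SET_B": 0.001, "SET_C": 0.04})
for cutoff in (0.05, 0.01):
    print(cutoff, table.below(cutoff))
```

Because nothing is recomputed when the cutoff moves, each query costs a binary search plus a slice, which is the kind of responsiveness the slide describes.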
Gene sets
• A gene set is a set of semantically related genes
  • e.g. Wnt signaling pathway
• EGAN contains a database of gene sets
  • > 100k gene sets by default
  • KEGG, Reactome, NCI-Nature, Gene Ontology, MeSH, Conserved Domain, Cytoband, miRNA targets
• You can easily add your own
  • Simple file format
  • Download from MSigDB (Broad Institute)
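As an illustration of how simple a gene set file can be, MSigDB distributes gene sets in the tab-delimited GMT format: one gene set per line, with a name, a description, and then the member genes. The slides do not specify EGAN's own file format, so the parser below is only a sketch for GMT text; the function name and the toy gene set are assumptions.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into a dict of gene set name -> set of genes."""
    gene_sets = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # need a name, a description, and at least one gene
        name, _description, *genes = fields
        gene_sets[name] = {gene for gene in genes if gene}
    return gene_sets


# Toy gene set: a small illustrative subset of Wnt pathway genes,
# not a curated pathway definition.
example = "WNT_SIGNALING_TOY\tillustrative only\tWNT1\tCTNNB1\tAPC\tAXIN1\n"
sets = parse_gmt(example)
print(sorted(sets["WNT_SIGNALING_TOY"]))
```

Storing each gene set as a Python set makes the overlap with a cluster or gene list a single intersection, which is the count an enrichment test needs.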
Gene-gene relationships
• EGAN also contains
  • Protein-protein interactions (PPI)
  • Literature co-occurrence
  • Chromosomal adjacency
  • Kinase-target relationships
• Other possibilities
  • Sequence homology
  • Expression correlation
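Several relationship types over the same genes can be held in one graph whose edges carry a type label. The sketch below is an assumed in-memory layout, not EGAN's data model; the class, the relationship labels, and the placeholder gene names are hypothetical, and every edge is treated as undirected for simplicity even though kinase-target relationships are naturally directed.

```python
from collections import defaultdict


class GeneGraph:
    """Undirected gene-gene graph with typed edges."""

    def __init__(self):
        # gene -> relationship type -> set of neighboring genes
        self._adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, gene_a, gene_b, kind):
        self._adj[gene_a][kind].add(gene_b)
        self._adj[gene_b][kind].add(gene_a)

    def neighbors(self, gene, kinds=None):
        """Neighbors of a gene, optionally restricted to some edge types."""
        by_kind = self._adj.get(gene, {})
        if kinds is None:
            kinds = list(by_kind)
        result = set()
        for kind in kinds:
            result |= by_kind.get(kind, set())
        return result


# Placeholder genes and edges, invented for illustration only.
graph = GeneGraph()
graph.add_edge("GENE_A", "GENE_B", "ppi")
graph.add_edge("GENE_A", "GENE_C", "literature_cooccurrence")
graph.add_edge("GENE_B", "GENE_C", "chromosomal_adjacency")
print(sorted(graph.neighbors("GENE_A")))
print(sorted(graph.neighbors("GENE_A", kinds=["ppi"])))
```

Keeping the type on each edge lets a viewer switch relationship layers on and off without rebuilding the graph.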
Example with microarray and aCGH results
• Mirzoeva et al. (2009) Cancer Research
  • UCSF-LBL collaboration
  • Analysis of breast cancer cell lines
  • Basal vs. luminal
• Discoveries in this presentation
  • miRNA regulator of subtype (mir-200)
  • Annexin (ANXA1) as potential regulator of ER, glucocorticoid and EGFR signaling