Identifying functional subnetworks in large-scale datasets Benno Schwikowski Institut Pasteur – Systems Biology Group http://systemsbiology.fr
The three levels of this talk • Discovery of pathways active in HepC infection • Cytoscape plug-ins • Cytoscape platform
Hepatitis C infection • One person out of 30 is infected • No vaccine exists • In 20% of chronic infections, liver fibrosis and cirrhosis develop • Frequently requires liver transplants
Studying HepC infection: mRNA changes • 50% of transplant livers become re-infected with Hepatitis C • Study expression of 7000 genes in re-infected livers after transplantation • 1-24 months post-transplant • Samples taken at 3-6 month intervals • 28 biopsies from 11 patients • Mixture of hepatocytes, hepatic stellate cells, Kupffer cells, various types of blood cells • Compare against pre-transplant reference pool
Result of mRNA expression analysis • Most genes (5968 of 7000) were significantly under- or overexpressed in one or more experiments • High patient-to-patient variation
Our approach • Construct seed network among known molecular players • Expand seed network to include differentially expressed genes • Identify putative pathways by the Active Modules approach
Seed network • Types of interactions: protein-protein, protein-DNA, phosphorylation, activation, repression, covalent bond, methylation
InteractionFetcher plug-in Purpose: • Dynamically retrieves remote information for selected nodes • From SQL database • Requests data via XML-RPC protocol Currently implemented types: • Protein/gene synonyms • Orthologs • Sequences (DNA, protein, DNA upstream) • Gene, protein, • Interactions/associations Options: • Cross-species queries • Ortholog information from Homologene • Inferred interactions (interologs) • Interactive links to source web pages 100% open-source (client and server)
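To make the XML-RPC request style mentioned on this slide concrete, here is a minimal Python sketch that marshals a request and decodes it again without contacting any server. The method name `getInteractions` and its arguments are hypothetical; the slides do not document the actual InteractionFetcher server API.

```python
import xmlrpc.client

# Hypothetical method name and arguments (assumptions for illustration only;
# the real server-side API is not given in the slides).
method = "getInteractions"
params = ("YDR216W", "Saccharomyces cerevisiae")

# Build the XML payload that an XML-RPC client would POST to the server.
request_xml = xmlrpc.client.dumps(params, methodname=method)

# Decode the payload again, as the receiving side would.
decoded_params, decoded_method = xmlrpc.client.loads(request_xml)
print(decoded_method, decoded_params)
```

The point of the sketch is only that a query for selected nodes travels as a named method call with a small parameter list, which is easy to serve from a database-backed endpoint.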
2. Expand seed network Purpose: • Bring significantly up-/downregulated genes “into the picture” Approach: • Add interactions with differentially expressed genes (“in silico pull-down”) • Use BIND, HPRD databases • Only human-curated interactions
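The expansion step can be sketched as a simple filter over a list of curated interactions. This is an illustrative sketch only: the function name, the data layout, and the gene names are assumptions, not the plug-in's implementation or results from the study.

```python
def expand_seed_network(seed_nodes, interactions, diff_expressed):
    """Return curated interactions linking a seed node to a differentially
    expressed gene that is not yet part of the seed network."""
    seed = set(seed_nodes)
    candidates = set(diff_expressed) - seed
    added = []
    for a, b in interactions:
        # Keep the edge when exactly one endpoint is a seed node and the
        # other endpoint is a differentially expressed candidate.
        if (a in seed and b in candidates) or (b in seed and a in candidates):
            added.append((a, b))
    return added


# Toy data with placeholder names.
curated = [("A", "B"), ("B", "X"), ("Y", "A"), ("X", "Z"), ("C", "Q")]
new_edges = expand_seed_network({"A", "B", "C"}, curated, {"X", "Y", "Z"})
print(new_edges)
```

Only one-step neighbors are pulled in here, which mirrors the "in silico pull-down" idea; whether the original analysis iterated further is not stated in the slides.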
Identifying putative pathways: why clustering can be problematic • Many clustering methods are not model-based, so the significance of clusters is unclear • Any given cluster may not be supported by all experiments – noise problem • Clusters tend to contain unrelated genes with vaguely similar profiles
How can the clustering issues be addressed? The ActiveModules plug-in • Define “up-/downregulated” on the basis of a well-defined statistical model • Also derive clusters from subsets of the input experiments • Use additional evidence to focus on “plausible” clusters: protein interactions
Interaction networks • Schwikowski, Uetz, Fields, Nature Biotechnology (2000)
Modular organization of interaction networks
A lot of interaction data is becoming available Databases on... • Protein-protein interactions • Protein-DNA interactions • Genetic interactions • Metabolic pathways • Cell signaling pathways, similarity relationships, literature-based relationships
Multi-criteria detection of modules 1. Interaction network between genes/proteins 2. Differential gene/protein abundances/activities (matrix of genes by experiments)
Scoring a module candidate • Perturbations/conditions: $m$ = total number of conditions, $j$ = size of subset of conditions • $P_z = 1 - \Phi(z_{A(j)})$ • Rank adjustment: binomial summation • $r_{A(j)} = \Phi^{-1}(1 - p_{A(j)})$ • Final score Ideker, Ozier, Schwikowski, Siegel (2002): Bioinformatics 18, S233-240
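The rank adjustment on this slide can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function F on the slide is read as the standard normal CDF, the binomial summation is taken to be the probability that at least j of m conditions exceed the j-th largest score, and the final score is taken as the maximum over j. The function names are mine, and the calibration against random subnetworks described in the cited paper is not shown.

```python
from math import comb
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # assumption: F on the slide is the standard normal CDF


def _clip(p, eps=1e-12):
    """Keep a probability strictly inside (0, 1) so inv_cdf is defined."""
    return min(max(p, eps), 1.0 - eps)


def rank_adjusted_score(z_scores):
    """Best rank-adjusted score r_A(j) over all subset sizes j = 1..m."""
    m = len(z_scores)
    ordered = sorted(z_scores, reverse=True)  # z_A(1) >= ... >= z_A(m)
    best = float("-inf")
    for j, z in enumerate(ordered, start=1):
        p_z = 1.0 - _STD_NORMAL.cdf(z)
        # Binomial summation: chance that at least j of m conditions
        # score above z purely by chance.
        p_aj = sum(
            comb(m, h) * p_z**h * (1.0 - p_z) ** (m - h)
            for h in range(j, m + 1)
        )
        r_aj = _STD_NORMAL.inv_cdf(_clip(1.0 - p_aj))
        best = max(best, r_aj)
    return best


# Toy per-condition module scores (placeholders, not data from the study).
strong = rank_adjusted_score([3.0, 2.5, 0.1, -0.5])
weak = rank_adjusted_score([0.1, -0.5, 0.0, 0.2])
print(strong > weak)
```

With a single condition the adjustment reduces to the original z-score, which is a quick sanity check on the construction.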
Pathways in Rosetta’s compendium (300 conditions)
Active Modules plug-in applied to HCV re-infection data • Iterative application results in four significant, highly overlapping subnetworks • Repeat analysis, retaining only “late-active” re-infection experiments • Eliminates pathways activated by transplant operation • Cutoff: 8 months
Which observations can we make locally? • Network after InteractionFetcher expansion • Bold: differentially regulated subnetwork • Red/green: late-active subnetwork
Cytotalk plug-in • Analysis, using the Cytotalk plug-in and R, of overrepresentation of genes in Gene Ontology classes • Cytotalk enables interactive communication with • C/C++ programs • Java processes • Python • UNIX shell scripts • R, R scripts • Can be run on same machine or any other Internet-connected machine • Can function as Cytoscape plug-in • 100% open-source
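The slide does not say which test the R analysis used. A common choice for overrepresentation in an annotation class is the one-sided hypergeometric test, sketched below in Python as an assumption rather than as the method from the talk. The function name and all numbers except the 7000-gene universe are illustrative placeholders.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, class_size, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `total_genes`, of which `class_size` carry the annotation."""
    upper = min(class_size, selected)
    denom = comb(total_genes, selected)
    tail = sum(
        comb(class_size, k) * comb(total_genes - class_size, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / denom


# 7000 genes measured (from the slides); the other numbers are placeholders:
# a class of 100 genes, a subnetwork of 50 genes, 5 of them in the class.
p_value = hypergeom_enrichment_p(7000, 100, 50, 5)
print(p_value)
```

In practice one such p-value is computed per Gene Ontology class and then corrected for multiple testing, which this sketch leaves out.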
Some Network Visualization Tools • Pajek - Slovenia • Osprey - SLRI, Toronto • VisANT - BU • Biolayout - EBI • GraphViz • PowerPoint • Others • Cytoscape (only open-source biology)
Cytoscape Basic Concepts • Objects visualized as nodes • Relationships visualized as edges • Attributes (name, sequence, source,...) • Mapping of attributes to drawing, customizable through the visual mapper
Cytoscape file formats
Sample interaction file:
YDR216W pd YIL056W
YDR216W pd YKR042W
YDR216W pd YGL096W
YDR216W pd YDR077W
[...]
Sample expression data file:
GENE DESC exp0.sig exp1.sig exp0.sig exp1.sig
GENE0 G0 0.0 0.0 23.2 11.5
GENE1 G1 0.0 0.0 34.6 5.2
GENE2 G2 0.0 0.0 10.0 28.0
GENE3 G3 0.0 0.0 1.64 4.77
[...]
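The sample interaction lines on this slide follow the simple "source, interaction type, target" layout, which is easy to read programmatically. The Python sketch below parses those four lines into edges and builds an undirected neighbor map; the function and variable names are my own and are not part of Cytoscape.

```python
from collections import defaultdict

# The four interaction lines shown on the slide.
SIF_TEXT = """\
YDR216W pd YIL056W
YDR216W pd YKR042W
YDR216W pd YGL096W
YDR216W pd YDR077W
"""


def parse_sif(text):
    """Parse 'source type target [target ...]' lines into edge triples."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines or isolated nodes carry no edge
        source, interaction, *targets = fields
        edges.extend((source, interaction, target) for target in targets)
    return edges


edges = parse_sif(SIF_TEXT)

# Undirected adjacency, ignoring the interaction type.
neighbors = defaultdict(set)
for source, _, target in edges:
    neighbors[source].add(target)
    neighbors[target].add(source)

print(len(edges), sorted(neighbors["YDR216W"]))
```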
Cytoscape • Display: gene & protein expression; protein interactions (physical and non-physical); protein classifications • Analysis plug-in modules • http://www.cytoscape.org/ • Java: platform independent + web-start • 100% open-source
Visual Styles Display gene expression as clear text
Visual Styles Map expression values to node colors using a continuous mapper
Visual Styles Expression data mapped to node colors
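A continuous mapping from expression values to node colors can be pictured as linear interpolation between two endpoint colors. The Python sketch below is a generic illustration under that assumption; it is not Cytoscape's mapper code, and the green-to-red endpoints and the value range are arbitrary choices.

```python
def lerp_color(value, lo, hi, lo_rgb=(0, 255, 0), hi_rgb=(255, 0, 0)):
    """Linearly interpolate an RGB color for `value`, clamped to [lo, hi]."""
    t = (min(max(value, lo), hi) - lo) / (hi - lo)
    return tuple(round(a + t * (b - a)) for a, b in zip(lo_rgb, hi_rgb))


def to_hex(rgb):
    """Format an (r, g, b) tuple as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


# Placeholder expression values mapped over an assumed range of -2 to 2.
for name, ratio in [("GENE0", -2.0), ("GENE1", 0.5), ("GENE2", 3.1)]:
    print(name, to_hex(lerp_color(ratio, -2.0, 2.0)))
```

Values outside the chosen range are clamped to the endpoint colors, so a few extreme measurements do not wash out the rest of the color scale.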
Multidimensional attributes Cytoscape, pre-release plug-in Data from Ideker et al., Science (2001)
Layout • 16 algorithms available through plug-ins • Zooming, hide/show, alignment
Cytoscape Core – Differences from most other approaches • Emphasis on data analysis & integration • No built-in semantics (added by plug-ins) • Very simple concepts • Human-readable input formats • Extensibility
Cytoscape extensibility • Core: 100% open source Java • Plug-in API • Plug-ins are independently licensed • “Just need to do the biology” • Template code samples
Biomodules plug-in Prinz S, Avila-Campillo I, Aldridge C, Srinivasan A, Dimitrov K, Siegel AF, and Galitski T Genome Res. 2004 14: 380-390
Cytoscape Plugins • Modules in Complex Networks: Iliana Avila-Campillo, Tim Galitski • Discovering Regulatory and Signaling Circuits in Molecular Interaction Networks: Trey Ideker, Owen Ozier, Benno Schwikowski, Andrew Siegel • Data Integration in Juvenile Diabetes Research: Marta Janer, Paul Shannon • A network motif sampler: David Reiss, Benno Schwikowski
Cytoscape Core Features • Visualize and lay out networks • Display network data using visual styles • Easily organize multiple networks • Bird’s eye view navigation of large networks • Supports SIF and GML, molecular profiling formats, node/edge attributes • Functional annotation from GO + KEGG • Metanode support (hierarchical groupings) • Extensible through plugins (20 developed)
Baliga et al., Genome Research, June 2004
Collaborators: HCV Institute for Systems Biology, Seattle, WA • David Reiss • Iliana Avila-Campillo • Vesteinn Thorsson • Tim Galitski
Collaborators: Cytoscape • ISB: Leroy Hood, Rowan Christmas • UCSD: Trey Ideker, Chris Workman • Memorial Sloan-Kettering Cancer Center: Chris Sander, Gary Bader, Ethan Cerami • Pasteur: Melissa Cline, Andrea Splendiani, Tero Aittokallio • Agilent Technologies • Unilever PLC • Long-term funding from NIH and participating institutions
Shannon, P., et al. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res 13, 2498-504.
Collaborators: Active Networks • Trey Ideker • Owen Ozier • Andrew Siegel • Richard Karp
Levels of Biological Information: DNA, mRNA, Protein, Pathways, Networks, Cells, Tissues, Organs, Individuals, Populations, Ecologies