Biological networks
Bing Zhang
Department of Biomedical Informatics, Vanderbilt University
bing.zhang@vanderbilt.edu
Protein-protein interaction (PPI)
• Definition
  • Physical association of two or more protein molecules
• Examples
  • Receptor-ligand interactions
  • Kinase-substrate interactions
  • Transcription factor-co-activator interactions
  • Multiprotein complexes, e.g. multimeric enzymes such as RNA polymerase II (12 subunits; Cramer et al. Science 292:1863, 2001)
BCHM352, Spring 2011
Significance of protein interaction
• Most proteins carry out their functions by interacting with other proteins, either
  • to form molecular machines, or
  • to participate in various regulatory processes
• Disruptions of protein interactions can cause disease
Yeast two-hybrid
• Method
  • Bait strain: a protein of interest, the bait (B), fused to a DNA-binding domain (DBD)
  • Prey strains: ORFs fused to a transcriptional activation domain (AD)
  • Mate the bait strain with the prey strains and plate the diploid cells on selective media (e.g. media lacking histidine)
  • If bait and prey interact in a diploid cell, they reconstitute a functional transcription factor, which activates a reporter gene whose expression allows the cell to grow on the selective media
  • Pick colonies, isolate DNA, and sequence it to identify the ORF whose product interacts with the bait
• Pros
  • High-throughput
  • Can detect transient interactions
• Cons
  • False positives
  • Non-physiological (interactions are tested in the yeast nucleus)
  • Cannot detect multiprotein complexes
Uetz P. Curr Opin Chem Biol 6:57, 2002
Tandem affinity purification (TAP)
• Method
  • TAP tag: Protein A, a calmodulin-binding domain, and a TEV protease cleavage site
  • The bait protein gene is fused with the DNA sequence encoding the TAP tag
  • The tagged bait is expressed in cells and forms native complexes
  • Complexes are purified by the TAP method
  • Components of each complex are identified by gel separation followed by MS/MS
• Pros
  • High-throughput
  • Physiological setting
  • Can detect large, stable protein complexes
• Cons
  • High false-positive rate
  • Cannot detect transient interactions
  • Cannot detect interactions that are absent under the given condition
  • Tagging may disturb complex formation
  • Binary (pairwise) interactions within a complex are not resolved
Chepelev et al. Biotechnol & Biotechnol 22:1, 2008
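Because TAP reports whole co-purified complexes rather than pairwise contacts, turning a purification into binary edges requires an assumption. A minimal sketch of two commonly used conventions: the spoke model (the bait is linked to every prey) and the matrix model (every pair of complex members is linked). The function names and protein IDs below are hypothetical.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: assume the bait contacts each co-purified prey directly."""
    return {frozenset((bait, p)) for p in preys if p != bait}


def matrix_edges(bait, preys):
    """Matrix model: assume every pair of complex members interacts."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


if __name__ == "__main__":
    # Hypothetical purification: bait "B" pulled down preys P1, P2 and P3.
    bait, preys = "B", ["P1", "P2", "P3"]
    print(len(spoke_edges(bait, preys)))   # 3
    print(len(matrix_edges(bait, preys)))  # 6
```

For an n-member complex the spoke model gives n - 1 edges and the matrix model n(n - 1)/2, which shows how strongly the binary picture depends on the chosen convention.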
Large-scale protein interaction identification
• Experimental
  • Yeast two-hybrid
  • Tandem affinity purification
• Computational
  • Gene fusion
  • Ortholog interaction
  • Phylogenetic profiling
  • Microarray gene co-expression
Valencia et al. Curr Opin Struct Biol 12:368, 2002
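Of the computational methods listed, phylogenetic profiling is straightforward to sketch: proteins whose homologs are present or absent together across a set of genomes are predicted to be functionally linked. A minimal sketch, assuming binary presence/absence profiles and a Hamming-distance cutoff; the cutoff, function names and profiles are hypothetical.

```python
def hamming(a, b):
    """Number of genomes in which two presence/absence profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles, max_mismatch=0):
    """Protein pairs whose profiles differ in at most `max_mismatch` genomes;
    such pairs are predicted to be functionally linked."""
    names = sorted(profiles)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if hamming(profiles[p], profiles[q]) <= max_mismatch
    ]


if __name__ == "__main__":
    # Hypothetical profiles over five genomes (1 = homolog present, 0 = absent).
    profiles = {"A": (1, 0, 1, 1, 0), "B": (1, 0, 1, 1, 0), "C": (0, 1, 1, 0, 1)}
    print(linked_pairs(profiles))  # [('A', 'B')]
```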
Protein interaction data in the public domain
• Database of Interacting Proteins (DIP): http://dip.doe-mbi.ucla.edu/
• The Molecular INTeraction database (MINT): http://mint.bio.uniroma2.it/mint/
• The Biomolecular Interaction Network Database (BIND): http://www.binddb.org/
• The Biological General Repository for Interaction Datasets (BioGRID): http://www.thebiogrid.org/
• Human Protein Reference Database (HPRD): http://www.hprd.org
• Online Predicted Human Interaction Database (OPHID): http://ophid.utoronto.ca
• The Munich Information Center for Protein Sequences (MIPS): http://mips.gsf.de
HPRD
Protein interaction networks
• Saccharomyces cerevisiae: Jeong et al. Nature 411:41, 2001
• Drosophila melanogaster: Giot et al. Science 302:1727, 2003
• Caenorhabditis elegans: Li et al. Science 303:540, 2004
• Homo sapiens: Rual et al. Nature 437:1173, 2005
Gene regulatory networks
• Experimental
  • Chromatin immunoprecipitation (ChIP)
    • ChIP-chip
    • ChIP-seq
• Computational
  • Promoter sequence analysis
  • Reverse engineering from microarray gene expression data
• Public databases
  • TRANSFAC (http://www.gene-regulation.com)
  • MSigDB (http://www.broadinstitute.org/gsea/msigdb)
  • hPDI (http://bioinfo.wilmer.jhu.edu/PDI/)
Shen-Orr et al. Nat Genet 31:64, 2002
KEGG metabolic network
Network visualization tools
• Cytoscape: http://www.cytoscape.org
Gehlenborg et al. Nature Methods 7:S56, 2010
Graph representation of networks
• Graph: a set of objects called nodes (or vertices) connected by links called edges. In mathematics and computer science, the graph is the basic object of study in graph theory.
(Figure: RNA polymerase II drawn as a graph, with one node and one edge labeled; Cramer et al. Science 292:1863, 2001)
Undirected graph vs directed graph
• Protein interaction network
  • Nodes: proteins
  • Edges: physical interactions
  • Undirected
  • Krogan et al. Nature 440:637, 2006
• Metabolic network
  • Nodes: metabolites
  • Edges: enzymes
  • Directed: substrate -> product
  • Ravasz et al. Science 297:1551, 2002
• Transcriptional regulatory network
  • Nodes: transcription factors and genes
  • Edges: transcriptional regulation
  • Directed: TF -> target gene (e.g. Fhl1 -> RPL2B)
  • Lee et al. Science 298:799, 2002
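In code, a graph of nodes and edges is often stored as an adjacency list, one set of neighbours per node (one common choice, not prescribed by the slides). A minimal sketch of how the undirected/directed distinction shows up in that representation; the function name and node names are hypothetical.

```python
def build_graph(edges, directed):
    """Adjacency-set representation of a graph.

    Undirected (e.g. protein interactions): each edge is stored under both endpoints.
    Directed (e.g. substrate -> product, TF -> target gene): stored under the source only.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())  # keep nodes that have no outgoing edges
        if not directed:
            adj[v].add(u)
    return adj


if __name__ == "__main__":
    # Hypothetical edges: one regulator and two target genes.
    edges = [("TF", "g1"), ("TF", "g2")]
    print(build_graph(edges, directed=False)["g1"])  # {'TF'}
    print(build_graph(edges, directed=True)["g1"])   # set()
```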
Degree, path, shortest path
• Degree: the number of edges incident to a node; a simple measure of node centrality. In a directed graph, a node has an in-degree (incoming edges) and an out-degree (outgoing edges).
• Path: a sequence of nodes such that each node in the sequence has an edge to the next one.
• Shortest path: a path between two nodes for which the sum of the lengths of its constituent edges is minimal; in an unweighted graph, this is the path with the fewest edges.
(Figure labels: Fhl1, out-degree 4, in-degree 0; YDL176W, degree 3)
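The three definitions above translate directly into code. A minimal sketch on adjacency sets, using breadth-first search (BFS) for shortest paths in an unweighted graph; all function and node names are hypothetical. For weighted edges, Dijkstra's algorithm is the usual replacement for BFS.

```python
from collections import deque


def degree(adj, node):
    """Undirected graph: number of edges incident to `node`."""
    return len(adj[node])


def in_out_degree(succ, node):
    """Directed graph stored as successor sets: returns (in-degree, out-degree)."""
    in_deg = sum(node in targets for targets in succ.values())
    return in_deg, len(succ[node])


def shortest_path(adj, source, target):
    """Breadth-first search. In an unweighted graph, BFS first reaches each node
    along a path with the fewest edges. Returns that path as a list of nodes,
    or None if `target` cannot be reached from `source`."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back along the recorded parents
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in adj.get(node, ()):
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


if __name__ == "__main__":
    # Hypothetical undirected toy network.
    adj = {"A": {"B", "E"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": {"A"}}
    print(degree(adj, "A"))              # 2
    print(shortest_path(adj, "E", "D"))  # ['E', 'A', 'B', 'C', 'D']
    # Hypothetical regulator with four targets and no regulators of its own.
    succ = {"TF": {"g1", "g2", "g3", "g4"}, "g1": set(), "g2": set(), "g3": set(), "g4": set()}
    print(in_out_degree(succ, "TF"))     # (0, 4)
```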
Obama vs Lady Gaga: who is more influential?
Account    Twitter following (out-degree)    Twitter followers (in-degree)
Obama      701,301                           7,035,548
Gaga       144,263                           8,873,525
Eminem     0                                 3,509,469
Network properties (I): hubs
• Random network
  • 130 nodes, 215 edges
  • Homogeneous: most nodes have approximately the same number of links
  • The five red nodes with the highest number of links are directly linked to 27% of the nodes
• Scale-free network
  • 130 nodes, 215 edges
  • Heterogeneous: most nodes have one or two links, but a few nodes have a large number of links
  • The five red nodes with the highest degrees (hubs) are directly linked to 60% of the nodes
Albert et al. Nature 406:378, 2000
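The random-vs-scale-free contrast can be explored with a small simulation. A minimal sketch, assuming a G(n, M) random graph (130 nodes, 215 edges, as on the slide) and, as an assumed scale-free generator, Barabasi-Albert-style preferential attachment. With m = 2 links per new node the growth model yields 257 edges rather than 215, so the printed percentages will not reproduce the figure's 27% and 60% and will vary with the seed. All function names are hypothetical.

```python
import random
from itertools import combinations


def random_graph(n, m, seed=0):
    """G(n, M) random graph: m distinct edges drawn uniformly from all node pairs."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for u, v in rng.sample(list(combinations(range(n), 2)), m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def preferential_attachment(n, m, seed=0):
    """Growth with preferential attachment: each new node links to m distinct
    existing nodes, picked with probability proportional to their degree."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    pool = []  # each node appears once per edge end, so sampling is degree-weighted
    for u, v in combinations(range(m + 1), 2):  # start from a complete graph on m + 1 nodes
        adj[u].add(v)
        adj[v].add(u)
        pool += [u, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            pool += [new, t]
    return adj


def fraction_linked_to_top(adj, k=5):
    """Fraction of all nodes that are direct neighbours of at least one of the
    k highest-degree nodes (ties broken by node order)."""
    top = sorted(adj, key=lambda node: len(adj[node]), reverse=True)[:k]
    reached = set().union(*(adj[hub] for hub in top))
    return len(reached) / len(adj)


if __name__ == "__main__":
    graphs = [("random", random_graph(130, 215)),
              ("preferential attachment", preferential_attachment(130, 2))]
    for name, g in graphs:
        n_edges = sum(len(nbrs) for nbrs in g.values()) // 2
        print(f"{name}: {n_edges} edges, top-5 nodes linked to {fraction_linked_to_top(g):.0%}")
```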
Scale-free biological networks
• Metabolic network, C. elegans: Jeong et al. Nature 407:651, 2000
• Protein interaction network, H. sapiens: Stelzl et al. Cell 122:957, 2005
• Gene co-expression network, S. cerevisiae: van Noort et al. EMBO Reports 5:280, 2004
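In a scale-free network the degree distribution P(k), the fraction of nodes with degree k, approximately follows a power law with some exponent γ > 0 (its value is not given on the slide). Taking logarithms shows why such distributions appear as straight lines on log-log plots:

```latex
P(k) \propto k^{-\gamma}
\quad\Longrightarrow\quad
\log P(k) = -\gamma \log k + C
```

Here C is a constant, so the slope of the log-log plot estimates -γ.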
Network properties (II): small-world network
• Stanley Milgram's small-world experiment
  • Social network
  • Average path length between two people
  • Instructions to participants: "If you do not know the target person on a personal basis, do not try to contact him directly. Instead, mail this folder to a personal acquaintance who is more likely than you to know the target person."
  • Six degrees of separation
  • (Map labels: Wichita, Omaha, Boston)
• Small-world network: a graph in which most nodes can be reached from every other node in a small number of steps
• Biological interpretation: efficient transfer of biological information
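The "small number of steps" is usually quantified as the average shortest-path length over all pairs of nodes. A minimal sketch using one BFS per node; the ring network and the two long-range shortcuts are a hypothetical illustration in the spirit of the Watts-Strogatz small-world model, not data from the slide.

```python
from collections import deque


def bfs_distances(adj, source):
    """Number of edges on a shortest path from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs of distinct nodes that are
    connected by some path (unreachable pairs are skipped)."""
    total = pairs = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


def ring(n):
    """Hypothetical ring network: node i is linked to its two neighbours on the ring."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


if __name__ == "__main__":
    g = ring(20)
    print(f"ring of 20: {average_path_length(g):.2f}")  # 5.26
    for u, v in [(0, 10), (5, 15)]:  # hypothetical long-range shortcuts
        g[u].add(v)
        g[v].add(u)
    print(f"ring of 20 + 2 shortcuts: {average_path_length(g):.2f}")
```

A handful of shortcuts is enough to lower the average path length noticeably, which is the intuition behind small-world networks.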
Network properties (III): motifs
• Network motifs: patterns that occur in a real network significantly more often than in randomized networks
• Three-node patterns (figure: feed-forward loop, feedback loop)
Milo et al. Science 298:824, 2002
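The motif definition has two parts: count a pattern in the real network, then compare with randomized networks. A minimal sketch, assuming a directed graph stored as successor sets and a degree-preserving edge-swap randomization as the null model (a common choice; the exact procedure of Milo et al. is not reproduced here). The example network and all function names are hypothetical.

```python
import random
from itertools import permutations


def count_three_node_loops(succ):
    """Count feed-forward loops (x->y, y->z, x->z) and 3-node feedback cycles
    (x->y->z->x) in a directed graph stored as successor sets. Patterns are
    counted even if the same three nodes carry extra edges."""
    ffl = cyc = 0
    for x, y, z in permutations(succ, 3):
        if y in succ[x] and z in succ[y]:
            if z in succ[x]:
                ffl += 1
            if x in succ[z]:
                cyc += 1
    return ffl, cyc // 3  # each cycle is found once per rotation


def randomize(succ, n_swaps, seed=0):
    """Degree-preserving randomization: repeatedly swap the targets of two edges
    (a->b, c->d becomes a->d, c->b), skipping swaps that would create a self-loop
    or a duplicate edge. Every node must appear as a key of `succ`."""
    rng = random.Random(seed)
    edges = [(u, v) for u in succ for v in sorted(succ[u])]
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    out = {u: set() for u in succ}
    for u, v in edges:
        out[u].add(v)
    return out


if __name__ == "__main__":
    # Hypothetical directed network; integers stand in for genes.
    succ = {0: {1, 2}, 1: {2, 3}, 2: {3}, 3: {4}, 4: {5}, 5: set()}
    real_ffl, _ = count_three_node_loops(succ)
    null = [count_three_node_loops(randomize(succ, 50, seed=s))[0] for s in range(200)]
    print("FFLs in network:", real_ffl)
    print("mean FFLs in randomized networks:", sum(null) / len(null))
```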
Network properties (IV): modularity
• A module is a group of physically or functionally linked molecules (nodes) that work together to achieve a relatively distinct function; modularity refers to the organization of a network into such modules.
• Examples
  • Transcriptional module: a set of co-regulated genes sharing a common function
  • Protein complex: an assembly of proteins that builds a piece of cellular machinery; it commonly spans a dense subnetwork of proteins in a protein interaction network
  • Signaling pathway: a chain of interacting proteins that propagates a signal in the cell
(Figures: protein interaction modules, Palla et al. Nature 435:814, 2005; gene co-expression modules, Shi et al. BMC Syst Biol 4:74, 2010)
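One simple way to make the "dense subnetwork" idea concrete is the edge density of a candidate module: the fraction of possible edges among its members that are actually present. A minimal sketch; the measure and the two-triangle toy graph are illustrative choices, not the module-detection methods used in the cited papers.

```python
from itertools import combinations


def subgraph_density(adj, nodes):
    """Fraction of possible edges among `nodes` that are present (undirected graph)."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0.0
    present = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return present / (len(nodes) * (len(nodes) - 1) / 2)


if __name__ == "__main__":
    # Hypothetical network: two triangles (A-B-C and D-E-F) joined by the edge C-D.
    adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
           "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
    print(subgraph_density(adj, ["A", "B", "C"]))  # 1.0
    print(subgraph_density(adj, list(adj)))        # 7 of 15 possible edges
```

Each triangle is maximally dense, while the network as a whole is not, which is the pattern a module-finding method looks for.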
Network distance vs functional similarity
• Proteins that lie closer to one another in a protein interaction network are more likely to have similar functions and to be involved in similar biological processes.
Sharan et al. Mol Syst Biol 3:88, 2007
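This observation underlies "guilt by association" function prediction. One of the simplest direct methods of this kind is a majority vote over a protein's annotated interaction partners. A minimal sketch, assuming one annotation per protein; the function name, proteins and labels are hypothetical.

```python
from collections import Counter


def predict_function(adj, annotations, protein):
    """Majority vote over the annotated direct neighbours of `protein`.
    Returns the most common annotation among them, or None if none are annotated."""
    votes = Counter(annotations[n] for n in adj.get(protein, ()) if n in annotations)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical network: uncharacterized protein "U" with three annotated partners.
    adj = {"U": {"P1", "P2", "P3"}, "P1": {"U"}, "P2": {"U"}, "P3": {"U"}}
    annotations = {"P1": "DNA repair", "P2": "DNA repair", "P3": "transport"}
    print(predict_function(adj, annotations, "U"))  # DNA repair
```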
Network-based disease gene prioritization
• For a specific disease, candidate genes can be ranked by their proximity in the network to known disease genes.
Köhler et al. Am J Hum Genet 82:949, 2008
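Köhler et al. measure this proximity with a random walk with restart from the known disease genes. The following is a minimal sketch of that general idea, not their implementation: the walker moves to a uniformly chosen neighbour and, with probability r, jumps back to a known disease gene. The restart value 0.7, the toy network and all function names are assumptions.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 on an undirected graph, where W moves a
    walker from a node to a uniformly chosen neighbour and p0 is uniform over the
    seed (known disease) genes. Returns the converged probability of each node."""
    seeds = set(seeds)
    if not seeds or not seeds <= set(adj):
        raise ValueError("seeds must be a non-empty subset of the network's nodes")
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in adj}
        # Spread each node's walk probability evenly over its neighbours
        # (nodes without neighbours simply lose that share).
        for u, nbrs in adj.items():
            for v in nbrs:
                nxt[v] += (1 - restart) * p[u] / len(nbrs)
        converged = sum(abs(nxt[n] - p[n]) for n in adj) < tol
        p = nxt
        if converged:
            break
    return p


def rank_candidates(adj, seeds, candidates, **kwargs):
    """Rank candidate genes by their steady-state score, highest first."""
    scores = random_walk_with_restart(adj, seeds, **kwargs)
    return sorted(candidates, key=lambda g: (-scores[g], g))


if __name__ == "__main__":
    # Hypothetical network: known disease gene "S"; candidates "near" and "far".
    adj = {"S": {"near"}, "near": {"S", "x"}, "x": {"near", "far"}, "far": {"x"}}
    print(rank_candidates(adj, {"S"}, ["far", "near"]))  # ['near', 'far']
```

Unlike ranking by shortest-path distance alone, the walk rewards candidates reachable from the disease genes by many short paths.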
Summary
• Biological networks
  • Protein-protein interaction network; gene regulatory network; metabolic network
• Graph representation of networks
  • Graph, node, edge, undirected graph, directed graph, degree, path, shortest path
• Network properties
  • Hubs and scale-free degree distribution
  • Small world
  • Motifs
  • Modularity
• Network-based applications
  • Disease gene prioritization