Exploring Biological Interaction Networks: Motifs, Alignment, and Integration

CSCI2950-CLecture 12Networks Ben Raphael November 18, 2008 http://cs.brown.edu/courses/csci2950-c/

Biological Interaction Networks Many types: • Protein-DNA (regulatory) • Protein-metabolite (metabolic) • Protein-protein (signaling) • RNA-RNA (regulatory) • Genetic interactions (gene knockouts)

Outline • Biology of cellular interaction networks • Network Alignment • Network Motifs • Network Integration

Metabolic Networks Nodes = reactants Edges = reactions labeled by enzyme (protein) that catalyzes reaction

Regulatory Networks

Regulatory Networks Nodes = genes Edges = regulatory interaction A “activates” B A B A “represses” C C Protein-DNA interaction network

Signaling Networks

Protein Interaction Networks • Proteins rarely function in isolation, protein interactions affect all processes in a cell. • Forms of protein-protein interactions: • Modification, complexation [Cardelli, 2005]. phosphorylation protein complex

Protein Interaction Networks High-throughput methods are available to find all interactions, “PPI network”, of a species. • an undirected graph • nodes: protein, edges: interactions • Edges may have weights • Yeast DIP network: ~5K proteins, ~18K interactions • Fly DIP network: ~7K proteins, ~20K interactions PPI network

How are protein-protein interaction networks derived? • Yeast two-hybrid screens

How are protein-protein interaction networks derived? • Protein purification and separation

Computational Problems • Classifying Network Topology • Finding paths, cliques, dense subnetworks, etc. • Comparing Networks Across Species • Using networks to explain data • Dependencies revealed by network topology • Modeling dynamics of networks

Alignment Networks Evolve via gain/loss of proteins or interactions (?) • human • mouse • Sequences • Evolve via substitutions • Conservation implies function • EFTPPVQAAYQKVVAG • DFNPNVQAAFQKVVAG

Motivation • By similar intuition, subnetworks conserved across species are likely functional modules

mismatch/substitution Network Alignment • “Conserved” means two subgraphs contain proteins serving similar functions, having similar interaction profiles • Key word is similar, not identical

Alignment Analogy Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

Earlier approaches: interologs • Interactions conserved in orthologs • Orthology (descended from common ancestor) is a fuzzy notion • Sequence similarity not necessary for conservation of function

Complications • Protein sequence similarity not 1-1 • Orthologs • Paralogs • Interaction data: • Noisy • Incomplete • Dynamic • Computational tractability

Network Alignment Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

The Network Alignment Problem Given: k different interaction networks belonging to different species, Find: Conserved sub-networks within these networks Conserved defined by protein sequence similarity (node similarity) and interaction similarity (network topology similarity)

A B C D A’ X’ D’ Score: match + gap + mismatch + match PathBLAST • Goal: identify conserved pathways (chains) • Idea: can be done efficiently by dynamic programming if networks are DAGs Kelley et al (2003)

Why paths?

PathBLAST (Kelley, et al. PNAS 2003) • Find conserved pathways in protein interaction maps of two species • Model & Scoring: See class notes.

2 1 4 1 4 2 3 5 5 3 5 2 1 4 3 PathBLAST • Problem: Networks are neither acyclic nor directed • Solution: Randomize Impose random ordering on nodes, perform DP; repeat many times • On average, highest scoring path preserved in 2/L! subgraphs • Finds conserved paths of length L within networks of size n in O(L!n) expected time • Drawbacks • Computationally expensive • Restricts search to specific topology Kelley et al (2003)

PathBLAST

PathBLAST: Computational Formulation • Given: • Undirected weighted graph G = (V, E, w) • Set of start vertices I, and end vertex v, • Find: a minimum-weight simple path starting in I and ending at v. • NP-hard in general (reduction from TSP) • Dynamic Programming formulation (see class notes) Scott, et al. JCB 2006

Color-coding(Alon, Yuster, & Zwick) • Assign each vertex random color between 1 and k. • Search for path w/ distinct colors. (colorful path) • Resulting paths are simple. • High-scoring path not discovered when two vertices have same color. • Repeat for many random colorings.

Color-coding(Alon, Yuster, & Zwick) • Extends to many other cases of subgraph isomorphism problem: • Does a graph G have a subgraph isomorphic to graph H? • H = simple path of length k. • H = simple cycle of length k. • H = tree. • H = graph of fixed (bounded) tree-width

Additional Problems • Efficient querying of a network (e.g. QNET) • Find conserved subgraphs Heavy subgraphs in product graph • Multiple network alignment

Exploring Biological Interaction Networks: Motifs, Alignment, and Integration