Networks of Protein Interactions Construction of Networks from Diverse Data Sources. Neda Nategh CS 374 Lecture 16 November 7, 2006. What we have learned about interaction networks in CS374. Properties of interaction networks (Susan) Comparison of networks across species
Networks of Protein Interactions: Construction of Networks from Diverse Data Sources. Neda Nategh, CS 374 Lecture 16, November 7, 2006
What we have learned about interaction networks in CS374 • Properties of interaction networks (Susan) • Comparison of networks across species (Chuan Sheng) Network alignment • Construction of networks from diverse data sources (Neda) Network integration
Basics of protein Interaction Networks Biological aspects
Types of interactions • Physical interactions Protein pairs are in direct contact • Complex interaction Protein pairs participate in the same functional module • Metabolic pathway • Signaling network • Multiprotein complex Eukaryote-like glycosylation system of Campylobacter jejuni Cell division machinery of Caulobacter crescentus
Protein Complex A protein complex is a group of two or more associated proteins. Networks of proteins • Topological properties • Functional organization news.uns.purdue.edu/UNS/images/cramer.photo2.jpeg
Metabolic Pathway • A metabolic pathway is a series of enzyme-catalyzed chemical reactions occurring within a cell, resulting in • formation of a metabolic product, or • initiation of another metabolic pathway • Metabolic Networks http://en.wikipedia.org/wiki/Metabolic_pathway
Signaling network Signal transduction Process by which a cell converts one kind of signal or stimulus into another. A sequence of biochemical reactions inside the cell, which are carried out by enzymes and linked through second messengers. http://en.wikipedia.org/wiki/Signal_transduction
High-throughput data • Co-expression • Co-location • Co-inheritance • Co-evolution • Co-citation • Rosetta stone(Gene-fusions)
Expression Gene expression, or simply expression, is the process by which a gene's DNA sequence is converted into the structures and functions of a cell. Indirectly, the expression of particular genes may be assessed with DNA microarray technology, which can provide a rough measure of the cellular concentration of different messenger RNAs. http://en.wikipedia.org/wiki/DNA_microarray
Inheritance Proteins are clustered according to the similarity of their phylogenetic profiles. Similar profiles show a correlated pattern of inheritance and, by implication, functional linkage.
Evolution Evolution is the change in the heritable traits of a population over successive generations, as determined by shifts in the allele frequencies of genes.
Gene fusion • A fusion gene is a hybrid gene formed from two previously separate genes. • translocation • interstitial deletion • chromosomal inversion • By creating a fusion gene of a protein of interest and green fluorescent protein, the protein of interest may be observed in cells or tissue using fluorescence microscopy. The protein synthesized when a fusion gene is expressed is called a fusion protein. http://en.wikipedia.org/wiki/Gene_fusion
Experiments • Microarray analysis of gene expression • Systematic protein interaction mapping • Mass spectrometry • Yeast two hybrid • Synthetic lethal screens
Microarray analysis of gene expression DNA microarray or gene/genome chip, DNA chip, or gene array Collection of microscopic DNA spots attached to a solid surface, such as glass, plastic or silicon chip forming an array for the purpose of expression profiling, monitoring expression levels for thousands of genes simultaneously. Applications: • Identification of sequence • Determination of expression level of genes http://phys.chem.ntnu.no/~bka/images/MicroArrays.jpg
Affinity purification/Mass spectrometry • For characterization of proteins • Using quantitative mass spectrometry to analyze the composition of a partially purified protein complex. • Interacting proteins can be distinguished from nonspecifically co-purifying proteins by their abundance ratios. • Complexes can be analyzed after a single step purification Better detection of weakly associated proteins http://en.wikipedia.org/wiki/Image:Mass_spectrom.gif
Yeast Two Hybrid Two-hybrid screening is a molecular biology technique used to discover protein-protein interactions by testing for physical interactions (such as binding) between two proteins. Susan Tang’s presentation, CS374 algorithms in biology, Stanford University
Synthetic lethal screening • To interpret genetic networks by examining the effects on the cell when pairs of genes are knocked out simultaneously. • Knocking out each gene separately may have no phenotypic effect because of robustness provided by genetic redundancy, • but knocking out both genes has a severe, possibly lethal effect.
Basics of protein Interaction Networks Computational aspects
Statistics terminology • Probability • Probability density • Conditional probability • Prior/Posterior probability • Bayes’ rule
Graph theory We map interaction networks to graphs • Vertex (node) • Edge • Directed edge (arc) • Weighted edge • Cycle
Networks in our model • Undirected graphs • Nodes correspond to proteins • Edges represent the interactions • Edge weights represent interaction probabilities
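The network model above can be sketched as a small data structure. This is only an illustration, assuming a plain adjacency dictionary; the class name `InteractionNetwork`, its methods, and the protein names are hypothetical and do not come from the lecture.

```python
from collections import defaultdict


class InteractionNetwork:
    """Undirected graph: nodes are proteins, edge weights are interaction probabilities."""

    def __init__(self):
        # protein -> {neighbor: interaction probability}
        self._adj = defaultdict(dict)

    def add_interaction(self, a, b, prob):
        if not 0.0 <= prob <= 1.0:
            raise ValueError("interaction probability must lie in [0, 1]")
        # Store both directions so the graph stays undirected.
        self._adj[a][b] = prob
        self._adj[b][a] = prob

    def probability(self, a, b):
        # Pairs without a recorded edge are treated as probability 0.
        return self._adj.get(a, {}).get(b, 0.0)

    def neighbors(self, a, min_prob=0.0):
        # Neighbors of `a` whose edge weight is at least `min_prob`.
        return sorted(p for p, w in self._adj.get(a, {}).items() if w >= min_prob)


net = InteractionNetwork()
net.add_interaction("ProteinA", "ProteinB", 0.9)
net.add_interaction("ProteinA", "ProteinC", 0.2)
print(net.neighbors("ProteinA", min_prob=0.5))
```

Thresholding on `min_prob` is one simple way to turn the weighted network into an unweighted one for clustering.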
Network Clustering 7000 Yeast interactions among 3000 proteins
Training sets • KEGG(Kyoto Encyclopedia of Genes and Genomes) • GFP(Green Fluorescent Protein) • GO(Gene Ontology) • COG(Cluster of Orthologous Groups of proteins)
Genomics • Genomics • 1 genome • Assembly, Gene Finding • Comparative Genomics • N genomes • Sequence Alignment • Functional Genomics • 1 assay • Microarray Analysis • Integrative Genomics • N assays • Network Integration
A probabilistic functional network of yeast genes Insuk Lee, Shailesh V. Date, Alex T. Adai,Edward M. Marcotte
Motivation Knowledge of gene networks’ structure • Complex roles of individual genes interplay between many systems in a cell
Problem Heterogeneous functional genomics data • Microarray analyses of gene expression • Systematic protein interaction mapping measure different aspects of gene or protein associations • Mass spectrometry measure the tendency for proteins to be components of the same physical module • Yeast two-hybrid assays indicate direct physical interaction(stable or transient) between proteins • Synthetic lethal screens measure the tendency for genes to compensate for the loss of other genes
Idea of integration Constructing a more accurate and extensive gene network Considering functional rather than physical associations • genetic • biochemical • computational probabilistic gene-gene linkages Single coherent network
Scoring scheme • Based on a Bayesian statistics approach • Log-likelihood score • Frequencies of linkages (L) observed in the given experiment (E) between annotated genes operating in • the same pathway is P(L|E) • different pathways is ~P(L|E) Total frequency of linkages between all annotated yeast genes operating in • the same pathway is P(L) • different pathways is ~P(L)
Scoring scheme • LLS > 0 Experiments tend to link genes in the same pathway • Higher scores More confident linkages • proportional to the accuracy of the experiments • Different experiments’ scores are directly comparable
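The formula itself did not survive in the slide text. A minimal sketch, assuming the score is the log of the odds ratio built from the four quantities defined above, LLS = ln[(P(L|E) / ~P(L|E)) / (P(L) / ~P(L))], could look like this. The function name and the example counts are hypothetical.

```python
import math


def log_likelihood_score(same_given_e, diff_given_e, same_prior, diff_prior):
    """LLS = ln( (P(L|E) / ~P(L|E)) / (P(L) / ~P(L)) ).

    Arguments may be frequencies or raw counts, because only the two
    ratios matter. This form of the score is an assumption.
    """
    if min(same_given_e, diff_given_e, same_prior, diff_prior) <= 0:
        raise ValueError("all four frequencies must be positive")
    odds_given_evidence = same_given_e / diff_given_e
    prior_odds = same_prior / diff_prior
    return math.log(odds_given_evidence / prior_odds)


# Hypothetical counts: the experiment links 80 same-pathway pairs and
# 20 different-pathway pairs; among all annotated pairs, 1000 share a
# pathway and 9000 do not.
lls = log_likelihood_score(80, 20, 1000, 9000)
print(round(lls, 2))
```

An experiment that links same-pathway and different-pathway pairs in the same proportion as the background scores 0, which is why LLS > 0 indicates an informative data set.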
Benchmarked accuracy and extent of functional genomics data sets and the integrated networks
Results • Evidence from diverse sources • Estimating the functional coupling between yeast genes • A view of relations between yeast proteins distinct from their physical interactions Probabilistic gene network
Future directions Application of this strategy to other organisms such as human • (i) assemble benchmarks for measuring the accuracy of linkages between human genes • (ii) assemble gold standard sets of highly accurate interactions for calibrating the benchmarks • (iii) benchmark functional genomics data for their ability to correctly link human genes • then integrate the data as described.
Integrated protein interaction networks for 11 Microbes Balaji S. Srinivasan, Antal F. Novak, Jason A. Flannick, Serafim Batzoglou, Harley H. McAdams
Motivation Different methods exist for predicting interactions, but the networks generated by each method are often contradictory. Objective: construct a summary network for each species that uses all the evidence at hand to predict which proteins are functionally linked
Data source: Co-expression (Balaji S. Srinivasan) Microarray data (genes × arrays) give an expression profile per gene; Pearson correlation between profiles gives the matrix: Gene A vs. Gene B = .8, Gene A vs. Gene C = -.7, Gene B vs. Gene C = -.6 (diagonal = 1)
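A co-expression matrix of this kind can be sketched with a plain Pearson correlation over expression profiles. The expression values below are invented for illustration and are not the data behind the slide; the function name `pearson` is likewise an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (norm_x * norm_y)


# Hypothetical expression levels across four arrays.
expression = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [1.5, 1.9, 3.2, 3.9],
    "GeneC": [4.0, 3.0, 2.5, 1.0],
}

genes = sorted(expression)
for i, g in enumerate(genes):
    for h in genes[i + 1:]:
        print(g, h, round(pearson(expression[g], expression[h]), 2))
```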
Data sources: Co-location (Balaji S. Srinivasan) Assembled genomes give each protein's location on four chromosomes: Protein A = .4, .2, .3, .1; Protein B = .5, .25, .25, .05; Protein C = .1, .3, .2, .6. Average chromosomal distance matrix: Protein A vs. Protein B = .06, Protein A vs. Protein C = .25, Protein B vs. Protein C = .25 (diagonal = 0)
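One reading of "average chromosomal distance" is the mean absolute difference between two proteins' locations across the assembled genomes. The sketch below assumes that reading and uses location values as they appear on the slide; how the slide normalizes positions is not stated, so treat the function as illustrative.

```python
def average_distance(pos_a, pos_b):
    """Mean absolute difference between two location profiles."""
    if len(pos_a) != len(pos_b) or not pos_a:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(pos_a, pos_b)) / len(pos_a)


# Location of each protein's gene on four chromosomes (values from the slide).
positions = {
    "ProteinA": [0.4, 0.2, 0.3, 0.1],
    "ProteinB": [0.5, 0.25, 0.25, 0.05],
    "ProteinC": [0.1, 0.3, 0.2, 0.6],
}

d_ab = average_distance(positions["ProteinA"], positions["ProteinB"])
d_ac = average_distance(positions["ProteinA"], positions["ProteinC"])
print(round(d_ab, 2), round(d_ac, 2))
```

Under this reading, Protein A and Protein B sit close together in every genome, which is the co-location signal.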
Data sources: Co-evolution (Balaji S. Srinivasan) Multiple alignments of protein families (A, A’, A’’, A’’’; B, B’, B’’, B’’’; C, C’, C’’, C’’’) give tree distances, which yield the matrix: Prt Fam A vs. Prt Fam B = .9, Prt Fam A vs. Prt Fam C = -.8, Prt Fam B vs. Prt Fam C = -.7 (diagonal = 1)
Data sources: Co-inheritance (Balaji S. Srinivasan) BLAST bit scores across four species: Protein A = 400, 200, 300, 100; Protein B = 500, 250, 250, 50; Protein C = 100, 300, 200, 600. Spearman correlation matrix: Protein A vs. Protein B = .95, Protein A vs. Protein C = -1, Protein B vs. Protein C = -.95 (diagonal = 1)
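The co-inheritance matrix can be reproduced with a Spearman correlation, computed here as the Pearson correlation of average ranks so that the tie in Protein B's scores is handled. The bit scores are the ones shown on the slide; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# BLAST bit scores of each protein against four species (from the slide).
bit_scores = {
    "ProteinA": [400, 200, 300, 100],
    "ProteinB": [500, 250, 250, 50],
    "ProteinC": [100, 300, 200, 600],
}

r_ab = spearman(bit_scores["ProteinA"], bit_scores["ProteinB"])
r_ac = spearman(bit_scores["ProteinA"], bit_scores["ProteinC"])
r_bc = spearman(bit_scores["ProteinB"], bit_scores["ProteinC"])
print(round(r_ab, 2), round(r_ac, 2), round(r_bc, 2))
```

Rank correlation is a natural fit here because bit scores vary over orders of magnitude, and only the pattern of presence and absence across species matters.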
Integration of two predictors • Previous work • Recent work • Method presented in this paper
Previous work We can integrate two given networks (for example, co-expression + co-inheritance) by • intersection • union • average
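The three simple combination rules can be sketched on networks stored as edge-to-weight dictionaries. The slide does not say how weights are combined, so the choices below are assumptions: intersection keeps the smaller weight, union keeps the larger, and average treats a missing edge as weight 0. All names and weights are hypothetical.

```python
def edge(a, b):
    """Order-independent key for an undirected edge."""
    return frozenset((a, b))


def intersect(net1, net2):
    # Keep only edges present in both networks (assumed: minimum weight).
    return {e: min(net1[e], net2[e]) for e in net1.keys() & net2.keys()}


def union(net1, net2):
    # Keep edges present in either network (assumed: maximum weight).
    return {e: max(net1.get(e, 0.0), net2.get(e, 0.0)) for e in net1.keys() | net2.keys()}


def average(net1, net2):
    # Average the weights, treating a missing edge as weight 0 (assumption).
    return {e: (net1.get(e, 0.0) + net2.get(e, 0.0)) / 2 for e in net1.keys() | net2.keys()}


coexpression = {edge("A", "B"): 0.8, edge("A", "C"): 0.4}
coinheritance = {edge("B", "A"): 0.6, edge("B", "C"): 0.9}

print(intersect(coexpression, coinheritance))
print(union(coexpression, coinheritance))
print(average(coexpression, coinheritance))
```

Intersection is conservative and loses coverage, union gains coverage at the cost of accuracy, and none of the three accounts for how reliable each predictor is.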
Recent work Decision Trees (Wong 2004) Bayesian Networks (Troyanskaya 2003) Likelihood Ratios (Lee 2004) Naïve Bayes + Boosting (Lu 2005)
Training sets • COG • GO • KEGG Moving from COG to GO to KEGG: • The fraction of annotated proteins in a given organism decreases • Annotation quality increases
Bayes’ Rule: Calculate conditional probability of linkage given evidence 1D Bayes’ rule Balaji S. Srinivasan
1D Bayes’ rule For a protein pair A, B with known evidence E, the linkage L is unknown (L=?): L=1 means same function, L=0 means different function. Bayes’ rule gives P(L|E). Bayes error rate = min. error rate of classifier Balaji S. Srinivasan
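The 1D case can be sketched directly from Bayes' rule: P(L=1|E) = P(E|L=1) P(L=1) / [P(E|L=1) P(L=1) + P(E|L=0) P(L=0)]. The function name and the numbers in the example are hypothetical; in practice the two likelihoods would be estimated from the training sets.

```python
def posterior_linked(lik_linked, lik_unlinked, prior_linked):
    """P(L=1 | E) from the class-conditional densities at E and the prior P(L=1)."""
    numerator = lik_linked * prior_linked
    denominator = numerator + lik_unlinked * (1.0 - prior_linked)
    if denominator == 0:
        raise ValueError("evidence has zero density under both classes")
    return numerator / denominator


# Hypothetical values: the observed evidence E is six times more likely
# for linked pairs than for unlinked pairs, but only 5% of pairs are linked.
p = posterior_linked(lik_linked=0.6, lik_unlinked=0.1, prior_linked=0.05)
print(round(p, 2))
```

Even strongly favorable evidence yields a modest posterior when linked pairs are rare, which is why the prior matters.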
2D network integration • 2D scatter plot • Separates linked pairs from unlinked pairs more efficiently • co-expression vs. co-inheritance
2D network integration • Estimate densities • Kernel density estimation • Gray-Moore dual tree algorithm
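As a rough illustration of the density-estimation step, the sketch below fits a Gaussian kernel density estimate to linked and unlinked training pairs in the (co-expression, co-inheritance) plane and combines the two densities with Bayes' rule. It uses a naive sum over all training points, not the Gray-Moore dual tree algorithm, which speeds up the same computation. The training points, bandwidth, and prior are invented for the example.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return density(x, y): a naive 2D Gaussian kernel density estimate."""
    if not points or bandwidth <= 0:
        raise ValueError("need at least one point and a positive bandwidth")
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            sq_dist = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical training pairs: (co-expression, co-inheritance) scores.
linked = [(0.8, 0.9), (0.7, 0.8), (0.9, 0.7)]
unlinked = [(0.1, 0.0), (-0.2, 0.1), (0.0, -0.1), (0.2, 0.2)]

p_e_linked = gaussian_kde_2d(linked, bandwidth=0.2)
p_e_unlinked = gaussian_kde_2d(unlinked, bandwidth=0.2)


def posterior(x, y, prior_linked=0.1):
    """P(L=1 | E=(x, y)) from the two estimated densities (prior is assumed)."""
    numerator = p_e_linked(x, y) * prior_linked
    denominator = numerator + p_e_unlinked(x, y) * (1.0 - prior_linked)
    return numerator / denominator if denominator > 0 else prior_linked


print(posterior(0.8, 0.8) > 0.9, posterior(0.0, 0.0) < 0.1)
```

The naive estimate costs one kernel evaluation per training point for every query, which is what tree-based methods are designed to avoid on large training sets.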