Protein Function Prediction

Protein Function Prediction Function categories of proteins : Proteins can be divided into 3 categories 1- Biochemical functions. 2- Sub cellular locations. 3- Cell role.

Sub cellular locations • 1- Cytoplasm • 2- Nuclear • 3- Mitochondria • 4- Extra cellular • 5- Golgi apparatus • 6- Chloroplast • 7- Endoplasmic reticulum • 8- Cytoskeleton • 9- Vacuole • 10- Peroxisome • 11- Lysosome • 12- Plasma membrane

Protein function prediction methods 1- Analyzing Gene expression 2- phylogenetic profiles 3- protein fusion 4- Protein sequences - N- protein protein interaction

protein protein interaction What do we mean by protein interaction ? Do you mean physical contact ? no, but higher levels of relations 1- Inclusion in multi protein complexes 2- Common cellular compartments 3- Same signalling path way 4- Same metabolic path way 5- Co-expression 6- Genetic co-regulation 7- Molecular co-evolution

protein interaction Complete protein interaction  inter actom Very complicated Not only for large number of proteins but for the range of distinct types of protein interaction

Types of protein interaction Three levels of associations 1- Physical interaction 2- Correlated proteins 3- Co-located proteins

Permanent interaction Transient interaction

Metabolic correlation Genetic correlation

Soluble location Membrane location

Databases of Protein interaction • Bimolecular Interaction Network Database (BIND); • Database of Interacting Proteins (DIP); • the General Repository for Interaction Datasets (GRID); • Molecular Interactions Database (MINT); • Database of predicted functional associations among genes/proteins (HNB) that has 3 tools [ SMART - mini PEDANT – STRING ].

How can we predict the function of un annotated proteins through PPI If two proteins interact, they are neighborhood of each other. the functions of un annotated protein’s neighbors contain information about the un annotated proteins f8 f1 f9 p2 p1 f7 f1 p3 m2 f5 m1 p5 Annotated protein f1 f2 Un Annotated protein f5 function f3 f4 f6

Our objective • Is to assign functions to all the un annotated proteins based on the functions of the annotated proteins. Conditions • Protein may have several different functions up to 8 functions • We do not know which combinations of the functions contribute the interaction

Given • Suppose that genome has N proteins P1 PN P1 Pn un annotated proteins Pn+1 Pn+m annotated proteins N=n+m Xi = 1 protein has function 0 protein has not function X = ( x1,x2,…….,xn+m)

Assumptions • Let Oi j observed interaction between Pi and Pj. • Oi j = 1 proteins have interaction • Oi j = 0 other wise • S = { Pi <-> PJ : Oi j , i, j =1…N} • Nei (i) set of proteins interact with Pi • ∏ jfraction of all proteins having function Fj

Previous methods • Neighborhood counting method • Frequencies of its neighbors method • Chi square method • Markov random field method

Neighborhood counting method • Method to predict the function based on the functions of its neighbors ( all annotated functions are ordered in list ) • Dis advantages 1- no significance value 2- ignore the size of functional classes 3- equal weights for distance neighbors F1 most frequent F2 F3 F4 | | FN least frequent

Frequencies of its neighbors method • As same as the previous method Neighborhood counting method but assign k functions for un annotated protein with k largest frequencies in its neighbors. Dis advantages 1- it does not take consideration that some proteins have same function. 2- if we have famous function in the neighbors, the probability that the un annotated protein has the same function is larger than any other function

Chi square method Test • Let ni (j) number of proteins interact with protein i & having function j. observed freq • #Nei (i) number of proteins in Nei (i) • ei (j) = # Nei (j) . ∏j Expected freq Si (j) = [ni (j) – ei (j) ]²/ ei (j) We will take the highest score Significance value but …………………?

Problem of Chi square method • Equally weighted it is obvious that proteins far away from Pi contribute less information than those close neighbors. It is not clear how can we choose the correct weights p2 f1 m2 f7 p5 f5 f6

Problem • The problems are how to assign different weights to the parameters. • How to estimate the probabilities based on the network.

Markov random field method Over come all the above problems • X=(x1, x2, ….., xN) functional annotation • Xi = 1 i protein has the function = 0 other wise Get the prior probability distribution of x based on the interaction network (Gibbs distribution) Calc the posterior probability of the function of the un annotated protein ob Pr (g) ps

Check the accuracy • By using leave one out method for each annotated protein with at least one interaction. We assume that it as un annotated protein and predict the function. Compare the prediction and the annotation Repeat the LOO for all proteins Pi…..Pk Check the sensitivity = ∑ Ki / ∑ ni where Ki over lap between the set of observed & predicted functions Ni number of the functions of protein Pi

Limitations and assumptions • Limitations - Interaction network and functional annotation of proteins are incomplete. - The actual number of interacting protein much than what we have. • Assumptions - annotated proteins have complete functional annotation.

Note • We can take into consideration the correlation between the functions. Ex: protein has function A may increase the chance to have function B because A,B are highly correlated.

Chi square Test • Is a test of significance of overall deviation square in the observed and expected frequencies divided by the expected ones. • X²= ∑ { (o - e)² / e} where o observed freq • e expected freq • Compare x² and the tabulated value • If x² > the tabulated value  value is significance • x² depends on the - number of the classes • - the degree of the freedom • 0--- ∞ Back

Thanks

Protein Function Prediction

Protein Function Prediction

Presentation Transcript

Protein Molecular Function Prediction by Bayesian Phylogenomics

Function Prediction from Protein Sequence

Protein Function

DNA/Protein structure-function analysis and prediction

Analysis and Prediction of Protein Function

DNA/Protein structure-function analysis and prediction

DNA/Protein structure-function analysis and prediction

Protein Function

System approaches to the prediction of protein function

Lecture 3 Protein Function prediction using network concepts

Protein function

DNA/Protein structure-function analysis and prediction

Protein Structure and Function Prediction

Consistent probabilistic outputs for protein function prediction

Prediction of protein function

Protein Function

Protein Function

DNA/Protein structure-function analysis and prediction

Protein Function

Protein Function Prediction Based on Domain Content

Protein Function Prediction from Protein Interactions

Biological Signal Detection for Protein Function Prediction