280 likes | 581 Views
Protein Function Prediction. Function categories of proteins : Proteins can be divided into 3 categories 1- Biochemical functions. 2- Sub cellular locations. 3- Cell role. Sub cellular locations. 1- Cytoplasm 2- Nuclear 3- Mitochondria 4- Extra cellular 5- Golgi apparatus
E N D
Protein Function Prediction Function categories of proteins : Proteins can be divided into 3 categories 1- Biochemical functions. 2- Sub cellular locations. 3- Cell role.
Sub cellular locations • 1- Cytoplasm • 2- Nuclear • 3- Mitochondria • 4- Extra cellular • 5- Golgi apparatus • 6- Chloroplast • 7- Endoplasmic reticulum • 8- Cytoskeleton • 9- Vacuole • 10- Peroxisome • 11- Lysosome • 12- Plasma membrane
Protein function prediction methods 1- Analyzing Gene expression 2- phylogenetic profiles 3- protein fusion 4- Protein sequences - N- protein protein interaction
protein protein interaction What do we mean by protein interaction ? Do you mean physical contact ? no, but higher levels of relations 1- Inclusion in multi protein complexes 2- Common cellular compartments 3- Same signalling path way 4- Same metabolic path way 5- Co-expression 6- Genetic co-regulation 7- Molecular co-evolution
protein interaction Complete protein interaction inter actom Very complicated Not only for large number of proteins but for the range of distinct types of protein interaction
Types of protein interaction Three levels of associations 1- Physical interaction 2- Correlated proteins 3- Co-located proteins
Permanent interaction Transient interaction
Metabolic correlation Genetic correlation
Soluble location Membrane location
Databases of Protein interaction • Bimolecular Interaction Network Database (BIND); • Database of Interacting Proteins (DIP); • the General Repository for Interaction Datasets (GRID); • Molecular Interactions Database (MINT); • Database of predicted functional associations among genes/proteins (HNB) that has 3 tools [ SMART - mini PEDANT – STRING ].
How can we predict the function of un annotated proteins through PPI If two proteins interact, they are neighborhood of each other. the functions of un annotated protein’s neighbors contain information about the un annotated proteins f8 f1 f9 p2 p1 f7 f1 p3 m2 f5 m1 p5 Annotated protein f1 f2 Un Annotated protein f5 function f3 f4 f6
Our objective • Is to assign functions to all the un annotated proteins based on the functions of the annotated proteins. Conditions • Protein may have several different functions up to 8 functions • We do not know which combinations of the functions contribute the interaction
Given • Suppose that genome has N proteins P1 PN P1 Pn un annotated proteins Pn+1 Pn+m annotated proteins N=n+m Xi = 1 protein has function 0 protein has not function X = ( x1,x2,…….,xn+m)
Assumptions • Let Oi j observed interaction between Pi and Pj. • Oi j = 1 proteins have interaction • Oi j = 0 other wise • S = { Pi <-> PJ : Oi j , i, j =1…N} • Nei (i) set of proteins interact with Pi • ∏ jfraction of all proteins having function Fj
Previous methods • Neighborhood counting method • Frequencies of its neighbors method • Chi square method • Markov random field method
Neighborhood counting method • Method to predict the function based on the functions of its neighbors ( all annotated functions are ordered in list ) • Dis advantages 1- no significance value 2- ignore the size of functional classes 3- equal weights for distance neighbors F1 most frequent F2 F3 F4 | | FN least frequent
Frequencies of its neighbors method • As same as the previous method Neighborhood counting method but assign k functions for un annotated protein with k largest frequencies in its neighbors. Dis advantages 1- it does not take consideration that some proteins have same function. 2- if we have famous function in the neighbors, the probability that the un annotated protein has the same function is larger than any other function
Chi square method Test • Let ni (j) number of proteins interact with protein i & having function j. observed freq • #Nei (i) number of proteins in Nei (i) • ei (j) = # Nei (j) . ∏j Expected freq Si (j) = [ni (j) – ei (j) ]²/ ei (j) We will take the highest score Significance value but …………………?
Problem of Chi square method • Equally weighted it is obvious that proteins far away from Pi contribute less information than those close neighbors. It is not clear how can we choose the correct weights p2 f1 m2 f7 p5 f5 f6
Problem • The problems are how to assign different weights to the parameters. • How to estimate the probabilities based on the network.
Markov random field method Over come all the above problems • X=(x1, x2, ….., xN) functional annotation • Xi = 1 i protein has the function = 0 other wise Get the prior probability distribution of x based on the interaction network (Gibbs distribution) Calc the posterior probability of the function of the un annotated protein ob Pr (g) ps
Check the accuracy • By using leave one out method for each annotated protein with at least one interaction. We assume that it as un annotated protein and predict the function. Compare the prediction and the annotation Repeat the LOO for all proteins Pi…..Pk Check the sensitivity = ∑ Ki / ∑ ni where Ki over lap between the set of observed & predicted functions Ni number of the functions of protein Pi
Limitations and assumptions • Limitations - Interaction network and functional annotation of proteins are incomplete. - The actual number of interacting protein much than what we have. • Assumptions - annotated proteins have complete functional annotation.
Note • We can take into consideration the correlation between the functions. Ex: protein has function A may increase the chance to have function B because A,B are highly correlated.
Chi square Test • Is a test of significance of overall deviation square in the observed and expected frequencies divided by the expected ones. • X²= ∑ { (o - e)² / e} where o observed freq • e expected freq • Compare x² and the tabulated value • If x² > the tabulated value value is significance • x² depends on the - number of the classes • - the degree of the freedom • 0--- ∞ Back