Some Topics in Proteomic Bioinformatics Biocomputing group Department of Mathematical Sciences, Tsinghua University
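Group members • Advisor: Lin Yuanlie (林元烈) • Members of the protein modification analysis software project: Guo Jian (郭健), Zhao Lingzhi (赵凌志), Liu Ying (刘颖), Zhao Tongtong (赵同同), Zhang Zhongyang (张仲阳), Chen Jiayi (陈家一)
Content • Section I: Software and related problems for protein post-translational modification • Section II: Protein quaternary structure prediction • Section III: Recognition of G-protein-coupled receptors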
Protein modification • Background • Pre-, co- and post-translational modifications • Three categories of PTMs • Learning method for prediction • Difficulties • Prediction Modules • Acetylation • Myristoylation • Phosphorylation • Glycosylation • Sulfation • GPI • The Server of PTM Prediction • Prediction Web Architecture • Web Server
Protein modification • Release of a completed polypeptide chain from a ribosome is often not the last chemical step in the formation of a protein. Various covalent modifications often occur, either during or after assembly of the polypeptide chain. Most proteins undergo co- and/or post-translational modifications.
Protein modification • Knowledge of these modifications is extremely important because they may alter physical and chemical properties, folding, conformation distribution, stability, activity, and consequently, function of the proteins. Moreover, the modification itself can act as an added functional group.
Examples of the biological effects of protein modifications • Phosphorylation for signal transduction • Ubiquitination for proteolysis • Attachment of fatty acids for membrane anchoring and association • Glycosylation for protein half-life, targeting, and cell:cell and cell:matrix interactions
Categories of modification • Proteins are subject to three classes of modification: pre-, co- and post-translational. • Pre-translational modifications: Two ‘non-standard’ amino acids (selenocysteine and pyrrolysine) are incorporated into proteins by modification of some ‘standard’ amino acids while they are charged on special tRNAs. • Co-translational modifications: Modifications are made while the polypeptide is still being synthesized on the ribosome. • Post-translational modifications (PTMs): Modifications are made when the protein is already folded. Most modifications are PTMs.
Application of PTM • The analysis of proteins and their post-translational modifications is particularly important for the study of heart disease, cancer, neurodegenerative diseases and diabetes.
Types of PTMs • PTMs can themselves be classified into three categories. • Proteolytic cleavage of part of the sequence: such as removal of an initiator methionine, a signal sequence, a transit peptide, etc. • Addition of a chemical group or covalent linkage: such as acetylation, glycosylation, phosphorylation, sulfation, ubiquitination, etc. • Formation of inter- or intra-peptide linkages: such as disulfide bonds, thioether links, etc.
Acetylation • N-terminal acetylation is one of the most common protein modifications in eukaryotes, occurring on approximately 80-90% of cytosolic mammalian proteins.
Distribution of amino acids at the N-terminus of acetylated sequences • Fig. 1. Shannon information (Shannon, 1948) sequence logo of 57 acetylation sites, in the format of the extracted patterns. The acetylated residue is at position 2 of the logo.
Predicted amino acid types • To our knowledge, only four types of amino acids can be acetylated: • Serine, S • Threonine, T • Alanine, A • Glycine, G
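The per-position height of a Shannon information logo such as Fig. 1 can be sketched as follows. This is a minimal illustration, not the code used to draw the figure: the function name `position_information`, the 20-letter alphabet size, and the absence of any small-sample correction are assumptions, and the example patterns are made up.

```python
import math
from collections import Counter

def position_information(patterns, alphabet_size=20):
    """Information content (in bits) of each column of equal-length patterns.

    Uses log2(alphabet_size) - H(column), where H is the Shannon entropy of
    the residue frequencies observed in that column.
    """
    length = len(patterns[0])
    if any(len(p) != length for p in patterns):
        raise ValueError("all patterns must have the same length")
    info = []
    for i in range(length):
        counts = Counter(p[i] for p in patterns)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(math.log2(alphabet_size) - entropy)
    return info

# Hypothetical toy patterns (not the 57 sites behind Fig. 1).
toy_patterns = ["MSDK", "MSEK", "MAEK", "XSDR"]
for position, bits in enumerate(position_information(toy_patterns), start=1):
    print(position, round(bits, 3))
```

A fully conserved column reaches the maximum of log2(20) bits, while a column with no preference approaches zero. If the placeholder "X" is counted as a 21st symbol, `alphabet_size` would need to be adjusted accordingly.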
Previous works • Software: NetAcet 1.0 • Reference: Kiemer, L., Bendtsen, J. D., Blom, N. (2004). NetAcet: Prediction of N-terminal acetylation sites. Bioinformatics.
Previous works • Algorithm used by NetAcet 1.0: back-propagation (BP) neural network • Feature extraction: a window of 9 residues after the methionine (if the N-terminal methionine is present) or at the N-terminus (if the N-terminal methionine is absent)
Acetylation module • Our module: In addition to the residues following an acetylated site, we included one more residue preceding it. If the acetylated residue is the first residue at the N-terminus, we used the symbol “X” to represent the residue ahead of it. After this processing, all positive examples begin with either “M” or “X” (see Fig. 1). Thus information about N-terminal methionine cleavage is encoded in the extracted patterns.
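The pattern extraction described on this slide can be sketched as below. The slide does not state how many following residues the module uses, so `n_following=8` is an assumed default, and the padding character for short sequences is also an assumption; `extract_pattern` is a hypothetical name.

```python
def extract_pattern(sequence, site, n_following=8, pad="-"):
    """Build a pattern: preceding residue, acetylated residue, following residues.

    `site` is the 0-based index of the acetylated residue. When the site is the
    very first residue, "X" stands in for the missing preceding residue.
    """
    if not 0 <= site < len(sequence):
        raise IndexError("site is outside the sequence")
    preceding = sequence[site - 1] if site > 0 else "X"
    # Pad on the right if the sequence ends before the window does (assumption).
    following = sequence[site + 1 : site + 1 + n_following].ljust(n_following, pad)
    return preceding + sequence[site] + following

# Methionine retained: the pattern starts with "M".
print(extract_pattern("MSDKPLTKE", site=1, n_following=4))
# Methionine already cleaved: the pattern starts with "X".
print(extract_pattern("SDKPLTKE", site=0, n_following=4))
```

In both cases the acetylated residue lands at position 2 of the pattern, which matches the layout of the logo in Fig. 1.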
Acetylation • Results comparison:
Metric   Our module   NetAcet
MCC      0.856        0.69
Sens     86%          75%
Spec     97%          92%
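For reference, the three reported metrics (MCC, sensitivity, specificity) can all be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts are illustrative, not the evaluation data behind the comparison.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc

# Hypothetical counts, for illustration only.
sens, spec, mcc = binary_metrics(tp=40, fp=5, tn=95, fn=10)
print(f"Sens={sens:.2f} Spec={spec:.2f} MCC={mcc:.3f}")
```

MCC ranges from -1 to 1 and, unlike accuracy, stays informative when negative examples greatly outnumber positive ones, which is the usual situation in modification-site prediction.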
Myristoylation • Biological interpretation: N-terminal myristoylation is a post-translational modification that adds a myristate to a glycine at the N-terminal end of the amino acid chain. The donor for this modification is myristoyl-CoA. N-terminal myristoylation is widespread in eukaryotic cells and viruses.
Myristoylation • A lipid is attached to the target protein sequence during this modification process, and it interacts with membranes and hydrophobic protein domains. Therefore, myristoylation participates in many important cellular processes, such as signal transduction, apoptosis and oncogene-driven cellular transformation.
Myristoylation • Conserved motif regions: • (1) positions 1-6: fitting the binding pocket; • (2) positions 7-10: interacting with the N-myristoyltransferase surface at the mouth of the catalytic cavity; • (3) positions 11-17: comprising a hydrophilic linker.
Myristoylation • Existing software • Resource: NMT Predictor • Algorithm: A scoring method based on an expert system • Reference: Maurer-Stroh, S., Eisenhaber, B., Eisenhaber, F., J. Mol. Biol. 2002, 317, 541–557.
Myristoylation • Existing software • Resource: Myristoylator • Web URL: http://au.expasy.org/tools/myristoylator/ • Institute: Swiss Institute of Bioinformatics (SIB) and Institute of Physiology, University of Bern, Switzerland • Classification method: Bagging of neural networks • Reference: G. Bologna, C. Yvon, S. Duvaud, A.-L. Veuthey. N-terminal Myristoylation Predictions by Ensembles of Neural Networks. Proteomics, 2004.
Myristoylation • Our myristoylation module • Algorithm: Support vector machine • Dataset: http://au.expasy.org/tools/myristoylator/myristoylator-data.html • Feature extraction: 17 residues of the N-terminal sequence • MCC: ~0.91
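The three regions listed above translate into simple slices of a 17-residue segment. This sketch assumes that position 1 is the first residue of the segment passed in (for example the N-terminal glycine after methionine removal); the slide does not state the numbering origin, and the region names are hypothetical labels.

```python
# 1-based, inclusive position ranges taken from the slide.
MOTIF_REGIONS = {
    "binding_pocket": (1, 6),
    "cavity_mouth": (7, 10),
    "hydrophilic_linker": (11, 17),
}

def split_motif(segment):
    """Split a segment of at least 17 residues into the three motif regions."""
    if len(segment) < 17:
        raise ValueError("segment must contain at least 17 residues")
    return {
        name: segment[start - 1 : end]  # convert 1-based inclusive to a slice
        for name, (start, end) in MOTIF_REGIONS.items()
    }

# Hypothetical example segment.
print(split_motif("GSSKSKPKDPSQRRRSL"))
```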
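Before a support vector machine can be trained, the 17 N-terminal residues have to be turned into a numeric vector. The slide does not say which encoding the module uses; the sketch below assumes a plain one-hot (sparse binary) encoding, which is a common choice, and `one_hot_encode` is a hypothetical helper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot_encode(segment, length=17):
    """Encode the first `length` residues as a flat 0/1 vector of size length * 20.

    Shorter segments are padded, and padding or unknown symbols are encoded
    as all zeros (an assumption made for this sketch).
    """
    segment = segment[:length].ljust(length, "-")
    vector = []
    for residue in segment:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector

features = one_hot_encode("GSSKSKPKDPSQRRRSL")  # hypothetical N-terminal segment
print(len(features), sum(features))
```

Each sequence then becomes one fixed-length row, and a matrix of such rows with myristoylated/non-myristoylated labels is the kind of input an SVM implementation expects.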
Phosphorylation • Biological interpretation • Post-translational phosphorylation is one of the most common and important protein modifications that occur in animal cells. The majority of phosphorylations occur as a mechanism to regulate the biological activity of a protein and as such are transient, i.e., a phosphate (or more than one in many cases) is added and later removed.
Phosphorylation • Physiologically relevant examples are the phosphorylations that occur in glycogen synthase and glycogen phosphorylase in hepatocytes in response to glucagon release from the pancreas. Phosphorylation of synthase inhibits its activity, whereas the activity of phosphorylase is increased. These two events lead to increased hepatic glucose delivery to the blood.
Phosphorylation • The enzymes that phosphorylate proteins are termed kinases and those that remove phosphates are termed phosphatases. Protein kinases catalyze reactions of the following type: • ATP + protein <----> phosphoprotein + ADP
Phosphorylation • In animal cells serine, threonine and tyrosine are the amino acids subject to phosphorylation. The largest group of kinases are those that phosphorylate either serines or threonines and as such are termed serine/threonine kinases. • The ratio of phosphorylation of the three different amino acids is approximately 1000/100/1 for serine/threonine/tyrosine.
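As a quick worked example, the approximate 1000/100/1 ratio can be converted into shares of all phosphorylation events. The numbers are the ones quoted on the slide; nothing else is assumed.

```python
# Approximate serine/threonine/tyrosine ratio quoted on the slide.
ratio = {"serine": 1000, "threonine": 100, "tyrosine": 1}

total = sum(ratio.values())
fractions = {residue: count / total for residue, count in ratio.items()}

for residue, fraction in fractions.items():
    print(f"{residue}: {fraction:.2%}")
```

Serine accounts for roughly 91% of events and threonine for roughly 9%, while tyrosine contributes less than 0.1%.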
Phosphorylation • Although the level of tyrosine phosphorylation is minor, the importance of phosphorylation of this amino acid is profound. As an example, the activity of numerous growth factor receptors is controlled by tyrosine phosphorylation.
Phosphorylation • Our phosphorylation module • Reference: Jong Kun Kim, et al., Prediction of phosphorylation sites using SVM, Bioinformatics, Vol. 20, 2004 • Classification method: SVM • Dataset: Phospho.ELM (http://phospho.elm.eu.org) • Accuracy: 70%-80% • Feature extraction: a 17-residue window flanking the candidate amino acid.
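The window-based feature extraction can be sketched as below. It assumes the 17-residue window is centered on the candidate residue (8 residues on each side) and that windows running past either end of the protein are padded with "-"; the slide specifies neither detail, and `candidate_windows` is a hypothetical name.

```python
def candidate_windows(sequence, targets="STY", flank=8, pad="-"):
    """Return (1-based position, residue, window) for every S/T/Y in a sequence.

    Each window has 2 * flank + 1 residues with the candidate in the middle.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue in targets:
            # Index i in `sequence` is index i + flank in `padded`, so the
            # slice below is centered on the candidate residue.
            windows.append((i + 1, residue, padded[i : i + 2 * flank + 1]))
    return windows

# Hypothetical sequence, for illustration only.
for position, residue, window in candidate_windows("MKRSPEELTAAGYLQ"):
    print(position, residue, window)
```

Each window would then be labeled from the annotated sites in the dataset and encoded numerically before being passed to the classifier.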
Glycosylation • Mechanisms • Glycosylation is an enzymatic process • The donor molecule is an activated nucleotide sugar • The process is site specific
Glycosylation • N-linked glycosylation • O-linked glycosylation • GPI anchor
Glycosylation • N-linked glycosylation of some proteins is required for proper folding. The N-linked glycosylation process occurs in eukaryotes and widely in archaea, but very rarely in bacteria.
Glycosylation • For N-linked oligosaccharides, a 14-sugar precursor is first added to the asparagine in the polypeptide chain of the target protein.
Glycosylation • O-linked glycosylation • O-linked glycosylation occurs at a later stage during protein processing, probably in the Golgi apparatus. • This is the addition of N-acetyl-galactosamine to Serine or Threonine residues by the enzyme UDP-N-acetyl-D-galactosamine:polypeptide N-acetylgalactosaminyltransferase, followed by other carbohydrates (such as galactose and sialic acid).
Glycosylation • A GPI (glycosylphosphatidylinositol) anchor is a common modification of the C-terminus of membrane-attached proteins. It is composed of a hydrophobic phosphatidylinositol group linked through a carbohydrate-containing linker (glucosamine and mannose linked to a phosphoryl ethanolamine residue) to the C-terminal amino acid of a mature protein. The two fatty acids within the hydrophobic phosphatidylinositol group anchor the protein to the membrane.
Glycosylation • Our module • Reference: Jan E. Hansen, et al., Prediction of O-glycosylation of mammalian proteins: specificity of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase, Biochem. J., 1995. • Classification method: SVM • Dataset: http://www.cbs.dtu.dk/databases/OGLYCBASE/ (O-GlycBase v6.00) • Accuracy: ~70%
GPI-Anchoring • Biological interpretation: Glycosylphosphatidylinositol (GPI) lipid anchoring is a common post-translational modification known mainly from extracellular eukaryotic proteins. Attachment of the GPI moiety to the carboxyl terminus (ω-site) of the polypeptide follows proteolytic cleavage of a C-terminal propeptide.
GPI-Anchoring • Current software • big-PI • Reference: Birgit Eisenhaber, Peer Bork and Frank Eisenhaber (1999). Prediction of Potential GPI-modification Sites in Proprotein Sequences. • Algorithm: Scoring method based on an expert system
GPI-Anchoring • Detailed procedure: S = S_profile + S_ppt • S_profile scores the frequency with which each amino acid occurs at specific positions • S_ppt = S_1 + S_2 + ... + S_14 • S_1, ..., S_14 score the physicochemical properties near the modification site, mainly volume and hydrophobicity, together with the number of rare amino acids at fixed positions, and so on
Sulfation • Biological interpretation: Sulfate modification of proteins occurs at the tyrosine residues of fibrinogen and some secreted proteins (e.g. gastrin). The universal sulfate donor is 3'-phosphoadenosyl-5'-phosphosulphate (PAPS). Since sulfate is added permanently, it is not used as a regulatory modification in the way tyrosine phosphorylation is.
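The additive structure of this score, S = S_profile + S_ppt, can be sketched as follows. Only that structure comes from the slide: the log-frequency form of the profile term, the `floor` value for unseen residues, and the example property term are assumptions made for illustration, and none of the numbers are the published predictor's parameters.

```python
import math

def profile_score(window, profile, floor=1e-3):
    """S_profile: sum of log frequencies of the observed residue at each position.

    `profile` is a list of dicts mapping residue -> frequency, one per position.
    Residues missing from a position get `floor` (an assumed pseudo-frequency).
    """
    return sum(math.log(profile[i].get(res, floor)) for i, res in enumerate(window))

def total_score(window, profile, property_terms):
    """S = S_profile + S_ppt, where S_ppt is the sum of the property terms."""
    s_profile = profile_score(window, profile)
    s_ppt = sum(term(window) for term in property_terms)
    return s_profile + s_ppt

# Hypothetical property term: penalize non-hydrophobic residues in the tail.
HYDROPHOBIC = set("AVILMFWC")

def hydrophobic_tail_term(window, tail_length=2):
    tail = window[-tail_length:]
    return -float(sum(1 for residue in tail if residue not in HYDROPHOBIC))

# Hypothetical three-position profile and window.
toy_profile = [{"S": 0.5, "G": 0.3}, {"A": 0.6, "G": 0.2}, {"L": 0.7}]
print(total_score("SAL", toy_profile, [hydrophobic_tail_term]))
```

In the scheme described on the slide there would be fourteen such property terms (S_1 to S_14), each a function of the residues around the candidate site.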