Automatic assignment of NMR spectral data from protein sequences using NeuroBayes
Slavomira Stefkova, Michal Kreps and Rudolf A Roemer
Department of Physics, University of Warwick, Coventry, UK
S.Stefkova@warwick.ac.uk
Abstract
• NeuroBayes, a neural network implementation from particle physics, is used.
• HSQC peaks are assigned by training NeuroBayes on the HSQC plot database of the Biological Magnetic Resonance Bank.
• About 25% of the amino acids within a protein are correctly assigned by NeuroBayes using only the chemical shifts in HSQC spectra.
• This is not as good as the roughly 70% agreement of other codes; however, training with extra data should improve the results.

Introduction
• Automating the assignment of nuclear magnetic resonance (NMR) spectral data from proteins allows for nearly automatic structure determination.
• Assignment is very time-consuming.
• Heteronuclear single quantum correlation (HSQC) experiments are usually the cheapest and quickest. They provide unique scatter plots, fingerprints of proteins, that can be correlated with well-known protein structures.
• Artificial neural networks have been used in other correlation searches for several decades.

Fig. 1: Amino acid. Amino acids are biologically important compounds made from amine (-NH2) and carboxylic acid (-COOH) functional groups as well as a side chain.

Fig. 2: Structure of a protein. A protein is an organic compound consisting of one or more chains of amino acids. The structure of a protein plays a very important role because it usually determines the protein's biological function.

Fig. 3: HSQC plot. NMR spectroscopy is used to determine the structure and the dynamics of proteins. One of the possible measurements is the chemical shift, the resonant frequency of a nucleus relative to a reference. The easiest and cheapest experiment is the heteronuclear single quantum correlation (HSQC) experiment, which measures hydrogen and nitrogen chemical shifts. The resulting plots are unique fingerprints of each protein.

Fig. 4: Biological Magnetic Resonance Bank.
The Biological Magnetic Resonance Bank is a database that contains hydrogen and nitrogen shifts for several thousand proteins.

Fig. 5: Model of an artificial neural network. Artificial neural networks are non-linear statistical data modelling or decision-making tools. They can be used to model complex relationships between inputs and outputs or to find patterns (sequence recognition) in data.

Fig. 6: Workflow of NeuroBayes. NeuroBayes is the neural network used in this project. It consists of two components, NeuroBayes Teacher and NeuroBayes Expert, that are necessary for the assignment of an unknown protein.

Results
To automate the assignment of spectral NMR data, NeuroBayes Teacher was trained with a sample of 5717 proteins collected from the Biological Magnetic Resonance Bank database. NeuroBayes Teacher provided a probability output that was interpreted using four different approaches. Several proteins were left out of the training so that the success rate of the predictive algorithms could be measured. However, the prediction was observed to give a similar success rate for known and unknown proteins.

Table 1: Sample of success rates for known (5003, 5005) and unknown (5000) proteins.

Analysis 3 has so far been the most successful interpretation of the probability output. It uses the following formula: [formula not preserved]

Conclusion
Currently, the most successful assignment algorithms reach around 70% agreement with experimentally assigned proteins. These, however, incorporate data from several combined experiments, which has not been done in this project. Using this neural network, only around 25% of the amino acids were correctly assigned to the peaks in the HSQC experiment.
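To make the notion of a success rate concrete, the following is a minimal Python sketch of how a per-peak probability output could be turned into assignments and scored against a reference assignment. It is an assumption-laden illustration: it uses a plain "highest probability wins" rule, which is not claimed to be Analysis 3 or any of the four approaches used in the project, and the function names, labels, and numbers are all hypothetical toy values rather than NeuroBayes output.

```python
from typing import List, Sequence


def assign_peaks(prob: Sequence[Sequence[float]], labels: Sequence[str]) -> List[str]:
    """For each peak (one row of probabilities), pick the most probable label.

    Assumption: a simple argmax rule, used here only for illustration.
    """
    assignments = []
    for row in prob:
        best = max(range(len(row)), key=lambda j: row[j])
        assignments.append(labels[best])
    return assignments


def success_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of peaks whose predicted label matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)


# Hypothetical toy data: 4 peaks, 3 candidate labels (values are made up).
labels = ["ALA", "GLY", "SER"]
probabilities = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
]
reference = ["ALA", "SER", "GLY", "GLY"]

predicted = assign_peaks(probabilities, labels)
rate = success_rate(predicted, reference)
print(f"predicted: {predicted}")
print(f"success rate: {rate:.0%}")
```

In this toy case three of the four peaks match the reference. The figures of around 25% and 70% quoted in the conclusion are fractions of this kind, computed over real proteins.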
Acknowledgments
I would like to thank Professor Rudolf Roemer and Dr. Michal Kreps for guidance throughout the execution of this project. Gratitude also goes to the physics department as well as the Centre for Scientific Computing at the University of Warwick for providing me with computational resources. Finally, I would like to thank URSS for allowing me to undertake this project.

References
Fig. 1 - http://en.wikipedia.org/wiki/File:AminoAcidball.svg
Fig. 2 - http://en.wikipedia.org/wiki/File:Main_protein_structure_levels_en.svg
Fig. 4 - http://deposit.bmrb.wisc.edu/bmrb-adit/docs/tutorial.html
Fig. 5 - http://en.wikipedia.org/wiki/File:Neural_network_example.svg
Fig. 6 - https://twiki.cern.ch/twiki/pub/Main/NeuroBayes/np_workflow.png
Brain fig. - http://scientopia.org/blogs/scicurious/files/2011/05/neurons51.jpg