Improving a Software System for Protein Active Site Determination

Greg James Dodge1, Cyprian Corwin2, Herbert J. Bernstein3, Paul A. Craig1

1Chemistry, 2Computer Science, Rochester Institute of Technology, Rochester NY; 3Mathematics and Computer Science, Dowling College, Shirley NY

Program No. 978.7, Abstract #A294

ProMOL is a plugin for the PyMOL molecular graphics system that is designed to locate the active site of a query protein. It does this by comparing the three-dimensional distances among the atoms of a protein with the three-dimensional positions of a library of known active site residues. This is a computationally intensive process, and it would take months to run the entire PDB (Protein Data Bank) through the program. Recently, however, advances in GPGPU (general-purpose computing on graphics processing units) and in GPU design have allowed simple computations, such as those used by ProMOL, to be performed rapidly across hundreds of cores. With GPGPU computing, the time needed to run proteins through our program could be drastically reduced.

Because of the rapid turnover of authors on the project, however, ProMOL's codebase had become quite fragmented. Over the summer of 2011, our group worked to streamline ProMOL's source in preparation for implementing GPGPU computing. This work included bug hunting, re-implementing the PDB loading functionality, and cleaning up the user interface. Ideally, with these changes in place, we will be able to adapt the selection algebra of PyMOL and ProMOL to run in parallel across many GPU cores.

Methods

After the initial alignment between 3DS8 and a hydrolase, the 3DS8 plasmid was obtained from the plasmid repository at Arizona State University. The protein was then overexpressed in XL-21B E. coli cells grown in ampicillin-containing media and purified by cobalt affinity column chromatography.
Once purified, the protein was tested for function with a QuantiCleave™ Protease Assay Kit (Thermo Scientific). The assay results suggested some protease activity, but they were not entirely conclusive. The FASTA sequence of the protein was then searched against the entire PDB using NCBI’s protein BLAST tool [1]. This alignment showed putative conserved domains between 3DS8 and the esterase lipase superfamily. In particular, there was 91% sequence coverage between 3DS8 and 3FLE, another protein of unknown function.

Figure 5. Comparison of predicted active sites on 3DS8 and 3FLE, two proteins of unknown function. ProMOL assigned a probable hydrolase or lipase function. Note the highly conserved secondary structure between the two proteins. 3DS8 was isolated from Listeria innocua, while 3FLE was isolated from Staphylococcus epidermidis.

Conclusions

Our goal is to determine the function of proteins listed with unknown function in the PDB using ProMOL, a plugin for PyMOL. Using ProMOL and other bioinformatics tools, we identified two proteins (3DS8 and 3FLE) that may belong to the A/B hydrolase superfamily, and more specifically to the lipase family. Early lab results indicate that 3DS8 has hydrolase function, as predicted. Based on these results, using ProMOL to identify active site motifs appears to be a promising bioinformatics approach.

Introduction

As of April 9, 2012, there are 2,965 protein structures listed with unknown function in the Protein Data Bank (PDB). ProMOL, a plugin for the molecular visualization software PyMOL, is a tool that may be useful in determining the function of these proteins. ProMOL uses PyMOL’s built-in selection algebra to compare a query protein structure against a database of known active site templates, called motifs. If a motif is found in the query protein, ProMOL can align the active residues of the motif and the query (as well as the entire protein structures) so that similarity can be assessed visually.

Figure 2.
BLAST alignment between 3DS8 and the entire PDB. Unexpectedly, this protein was extremely similar to another protein of unknown function, with 91% query coverage against PDB ID 3FLE.

Future Plan

Although the initial goal of implementing parallel computing has not been met, it has been worthwhile to explore the validity of ProMOL’s alignments in vitro. The plasmid for 3FLE is currently on order, as are the reagents for two separate lipase activity assays. Once all of the necessary materials have been gathered, the same expression, purification, and analysis pipeline used in the earlier work on 3DS8 will be followed. Additionally, work is under way on another protein, with a predicted galactosyltransferase function, using the same approach. Biochemical characterization of an enzyme is one true test of its function. In the future, we plan to continue comparing functional assay results with predicted behavior. This project combines classical enzymology with bioinformatics; as such, it may be suitable for incorporation into the undergraduate biochemistry lab curriculum.

Further analysis within ProMOL revealed a very good active site alignment of these two proteins of unknown function with 1TAH, a lipase with a known and well-documented active site [3].

Figure 1. The Motif Maker and Motif Finder tabs in ProMOL. Protein analysis is as simple as entering the PDB ID into the search box under the Motif Finder tab, selecting a set of motifs, and clicking search. Creating a motif is accomplished by entering the PDB ID, the EC number, and the active residue information under the Motif Maker tab.

References

1. Stephen F. Altschul, Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), Nucleic Acids Res. 25:3389-3402.
2. The PyMOL Molecular Graphics System, Version 1.5.0.1, Schrödinger, LLC.
3. Noble, M.E.M., A. Cleasby, L.N. Johnson, M.R. Egmond, and L.G.J. Frenken. FEBS Letters 331, no. 1–2 (September 27, 1993): 123–128.
After the summer of 2011, attempts to implement parallel computing were scaled back in favor of confirming results from ProMOL experimentally. ProMOL’s library of motifs has been expanded to 440 active sites, up from 142 at this time last year. Testing this expanded library against proteins of unknown function produced several very good alignments. In particular, the protein with PDB ID 3DS8 aligned very well with a member of the A/B hydrolase superfamily, PDB ID 3LIP.

Figure 3. Active site alignment of 1TAH (white) with the predicted site on 3DS8 (red). The active residues on 1TAH are Ser 87, His 285, and Asp 263. ProMOL predicts that these correspond to Ser 102, His 222, and Asp 188 on 3DS8.

Figure 4. Active site alignment of 1TAH (white) with the predicted sites on 3FLE (red). The active residues on 1TAH are Ser 87, His 285, and Asp 263. ProMOL predicts that these correspond to Ser 144, His 269, and Asp 235 on both the A and B chains of 3FLE.

Acknowledgments

The authors gratefully acknowledge the assistance of current and former students who have worked at Dowling College and at RIT on the SBEVSL project.

Funding: This work has been supported in part by National Science Foundation Division of Undergraduate Education grant 0402408 and by National Institute of General Medical Sciences grants 2R15GM078077-02 and 3R15GM078077-02S1. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies.
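
Illustrative sketch. The distance comparison behind motif finding, and the Ser-His-Asp matches shown in Figures 3 and 4, can be pictured with a small brute-force example in plain Python. This is not ProMOL's implementation, which relies on PyMOL's selection algebra; the function names, the one-point-per-residue simplification, the tolerance value, the residue labels, and all coordinates below are assumptions invented for illustration.

```python
from itertools import permutations
from math import dist


def pairwise_distances(points):
    """Distances between every unordered pair of points, in a fixed order."""
    return [dist(points[i], points[j])
            for i in range(len(points))
            for j in range(i + 1, len(points))]


def find_motif(motif, query, tolerance=1.0):
    """Return labels of query residues whose geometry matches the motif.

    motif: list of (residue_name, (x, y, z))
    query: list of (label, residue_name, (x, y, z))
    A candidate matches when residue names agree position by position and
    every inter-residue distance is within `tolerance` of the motif's.
    """
    target = pairwise_distances([xyz for _, xyz in motif])
    hits = []
    for candidate in permutations(query, len(motif)):
        if any(c[1] != m[0] for c, m in zip(candidate, motif)):
            continue  # residue types differ from the template
        found = pairwise_distances([c[2] for c in candidate])
        if all(abs(f - t) <= tolerance for f, t in zip(found, target)):
            hits.append(tuple(c[0] for c in candidate))
    return hits


# Hypothetical template and query; the coordinates are made up, not PDB data.
motif = [("SER", (0.0, 0.0, 0.0)),
         ("HIS", (3.0, 0.0, 0.0)),
         ("ASP", (3.0, 4.0, 0.0))]
query = [("SER10", "SER", (10.0, 10.0, 10.0)),
         ("HIS50", "HIS", (13.0, 10.0, 10.0)),
         ("ASP80", "ASP", (13.0, 14.0, 10.5)),
         ("SER99", "SER", (40.0, 40.0, 40.0))]

hits = find_motif(motif, query, tolerance=0.5)
print(hits)
```

Because every candidate combination is checked independently of the others, a search of this shape is the kind of workload that could in principle be spread across many GPU cores, which is the motivation given in the abstract.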