Comparative Protein Modeling
Jason Wiscarson (jwiscarson@gmail.com), Lloyd Spaine (llspaine@gmail.com)
Introduction
Comparative modeling, also called homology modeling, is a computational tool used to predict the three-dimensional (3-D) structure of proteins whose structures are unknown. If the target sequence shares sequence similarity with proteins of known 3-D structure, those proteins may serve as templates to predict the unknown structure. The term “homology” refers to an evolutionary relationship between two or more proteins that share a common ancestor in an evolutionary tree, regardless of their sequence similarity. Proteins from similar families often have similar functions, yet there are many instances in which proteins have similar structures but different functions.

Selecting Templates and Improving Alignments
• Sequence Alignment and Modeling System with Hidden Markov Models (SAM)-T02 provides sequence alignment from the target sequence to all templates in steps:
  • Find sequences similar to the target sequence.
  • Predict the secondary structure.
  • Find probable templates for threading.
  • Align the target with the templates.
  • Construct a fragment library for the target.
  • Build a 3-D model of the target.
• Threading is applied to different proteins that have similar structures. It:
  • Creates pseudo-protein models based on solved proteins.
  • Calculates an energy value for the pseudo-protein models.
  • Ranks the alignments based on that energy value.

Protein Model Refinement
• Side-Chains with Rotamer Library (SCWRL) determines the most likely side-chain conformations by:
  • Reading the initial structure and determining possible low-energy side-chain conformations (rotamers).
  • Defining disulfide bridges and performing a dead-end elimination to discard rotamers.
  • Constructing a residue graph, determining the rotamer clusters, and outputting the final structure.
• Molecular Mechanics (MM) is a method that removes repulsive contacts between side chains by allowing the side chains to relax to low-energy rotamers.
• Molecular Dynamics (MD) simulation involves:
  • Warm-up, equilibration, and cool-down.
  • Sampling the trajectory during a “production” run and analyzing the results.
• Molecular Dynamics with Simulated Annealing (MD-SA) is an optimization method that heats a system, samples many energy states, and then slowly cools the system to ensure that the low-energy structures are found.

The first step is to select the templates and improve the alignment. The sequence of interest (the target) is aligned with other sequences and structures (the templates), and the best templates are then chosen based on evolutionary distance as determined by a phylogenetic tree.

Selecting Templates: the template structure for a protein model is chosen by considering the R-factor (residual index), a value that indicates how well the modeled structure matches the experimental electron density maps.

Improving Sequence Alignment With Primary and Secondary Structure Analysis: this analysis is used to reveal regions rich in proline, glutamic acid, serine, and threonine (PEST regions); locate sequence repeats; predict the percentage of buried versus accessible residues; and provide information about the protein’s isoelectric point.

Pattern and Motif-Based Secondary Structure Prediction: from amino acid (AA) sequence to 3-D structure. Well-known pattern and motif-based secondary structure prediction methods include PSIPRED, GenTHREADER, PREDATOR, PROF, MEMSAT, and PHD.

Sequence Alignment
Find known sequences and 3-D structures related to the target protein.
• Alignment based on evolutionary history is applied to the amino acid residues of the target protein. The types of alignment are:
  • Global alignment, which aligns the sequences over their entire length, including regions that lack similarity.
  • Local alignment, which first finds regions with significant similarity and then extends the alignment around those optimally aligned residues.
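The difference between the two alignment types above can be illustrated with a small dynamic-programming sketch. It is not taken from any of the programs named on this poster. The function name `alignment_score`, the identity scoring (match = 1, mismatch = -1), and the linear gap penalty of -2 are assumptions chosen for illustration. Real tools use similarity matrices and more elaborate gap penalties.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Optimal alignment score of sequences a and b.

    local=False: global alignment (Needleman-Wunsch style), end to end.
    local=True:  local alignment (Smith-Waterman style), best-scoring region only.
    The scoring values are illustrative assumptions, not a published matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # A global alignment must pay for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                # A local alignment may restart anywhere, so scores never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    return best if local else score[-1][-1]


if __name__ == "__main__":
    target, template = "WWACDEWW", "KKACDEKK"
    print("global:", alignment_score(target, template))
    print("local: ", alignment_score(target, template, local=True))
```

With these two toy sequences the global score is dragged down by the dissimilar flanking residues, while the local score reflects only the shared ACDE core.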
• To prepare sequences, the Sequence to Coordinates (S2C) database is used to examine the differences that originate from mutagenesis studies.
• Alignment programs differ in the methods used, but they all score or evaluate the final alignment using gap penalties, similarity matrices, and alignment scores.
• Similarity matrices describe the probability of a specific amino acid residue mutating to a different residue type.
• Common similarity matrices include:
  • Point-Accepted Mutation per 100 amino acid residues (PAM) matrices, which are based on the probability of an amino acid residue mutating to another amino acid residue.
  • BLOcks SUbstitution Matrix (BLOSUM) matrices, which are derived from a more diverse set of sequences.
  • Gonnet similarity matrices, which were derived by indexing and reorganizing the sequence data using a tree on a small cluster of computers.
• The Clustal alignment program aligns large sequences of varying similarity quickly. Sequences are progressively aligned based on the branching order in the phylogenetic tree.
• Tree-Based Consistency Objective Function for Alignment Evaluation (T-Coffee) addresses a weakness of progressive-alignment (heuristic) methods. Those methods suffer from greediness: errors made in the first alignment (the addition or extension of a gap) cannot be corrected as other sequences are added to the alignment.
• The Divide-and-Conquer Alignment (DCA) method aligns sequences simultaneously, following the simultaneous multiple sequence alignment (MSA) methodology.

Figure 1 Flow chart showing the construction of a 3-D model of a protein. Flow chart boxes: Finding related sequences and structures; Align the target and template amino acid residues; Select templates and adjust the alignments; Construct Model; Evaluate Model; Refine Model; Final Model.

Evaluating Protein Models
Several methods exist to check for imperfections in the models. PROCHECK performs statistical checks and indicates regions of a protein structure that might require modification because of nonoptimal stereochemistry.
Verify3D scores 3-D models with a probability table and assesses the probability that each amino acid residue would occupy a specific position in the 3-D structure. ERRAT examines nonbonded distances between C-C, C-N, C-O, N-N, N-O, and O-O atom pairs. Protein Structure Analysis (ProSa) uses the potential of mean force, which is the change in potential energy of a system caused by the variation of a specific coordinate, to locate regions of the protein structure that may contain improper or unsuitable geometries. The Protein Volume Evaluation (PROVE) method uses the computed volume of individual atoms as a means of evaluating the viability of a protein model. Model clustering analysis uses NMRCLUST, NMRCORE, and OLDERADO, which are programs that aid in the superposition and clustering of protein structures.

In comparative protein modeling, several databases are used to find genomic, amino acid, and protein data. The Expert Protein Analysis System (ExPASy) is the starting point for searching for proteins and their related sequences. Swiss-Prot contains data that has been refined by removing unnecessary information, and TrEMBL receives and stores initial genomics data. PROSITE classifies proteins using biologically significant patterns tied to tertiary structure and key amino acid residues. ENZYME retrieves an enzyme’s recommended name, alternative names, catalytic activity, cofactors, human genetic diseases, and cross-references. SWISS-MODEL holds comparative models of proteins that do not have a known 3-D structure. The Basic Local Alignment Search Tool (BLAST) uses a protein sequence as a query to locate similar protein sequences and reports them as sequence alignments. The Protein Data Bank (PDB) is a repository for experimentally determined protein 3-D structures.

Constructing Protein Models
• Satisfaction of Spatial Restraints (SSR) constructs a 3-D protein model using spatial restraints based on distances, bond angles, dihedral angles, dihedral pairs, etc.
• Segment Match Modeling (SMM) constructs a protein model by:
  • Choosing a protein template.
  • Building a list of possible template matches.
  • Sorting the templates by best fit to the target’s structure.
  • Using probabilities to select the “best segment” from a low pseudo-energy subset group.
  • Transferring coordinates from the best segment’s template protein to the target model.
• The Multiple Template Method (MTM) uses solved X-ray structures to build the target sequence’s protein model.
• 3D-JIGSAW creates a homology model in these steps:
  • Select and align templates, based on sequence.
  • Select template segments.
  • Create the backbone (framework, scaffold).
  • Add side chains, then refine and evaluate the target protein model.

References
[1] Esposito, E. X.; Tobi, D.; Madura, J. D. “Comparative Protein Modeling,” Reviews in Computational Chemistry, Volume 22, 2006, Wiley-VCH, John Wiley & Sons, Inc. (to be published).
[2] Ramachandran plot and alanine structure: http://www.cgl.ucsf.edu/home/glasfeld/tutorial/AAA/AAA.html

Figure 2 Peptide bonds create rigid plates which rotate about phi and psi.
Figure 3 A Ramachandran plot for the tripeptide in Figure 2.
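The phi and psi angles plotted in a Ramachandran diagram are dihedral angles defined by four consecutive backbone atoms: phi by C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). As a minimal sketch of how such an angle can be computed from atomic coordinates, the function below uses the standard atan2 formulation. The name `dihedral_degrees` and the toy coordinates are assumptions for illustration and do not come from PROCHECK or any other program mentioned here.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3-D points."""
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    """Dot product of two 3-D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3-D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Dihedral angle (degrees, in [-180, 180]) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates (hypothetical, not from a real structure).
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For phi, the four points would be the coordinates of C(i-1), N(i), CA(i), and C(i) read from a structure file. Collecting the (phi, psi) pair for every residue gives the points of a Ramachandran plot.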