Prediction to Protein Structure

This text explains the use of PSIPRED, a neural network-based method, for predicting the secondary structure of proteins. It also discusses the importance of training and filtering in improving prediction accuracy.

Presentation Transcript


  1. Prediction to Protein Structure • Fall 2005 • CSC 487/687 Computing for Bioinformatics

  2. PSI-BLAST-based secondary structure prediction (PSIPRED) • Three stages: 1) generation of a sequence profile; 2) prediction of an initial secondary structure; 3) filtering of the predicted structure

  3. PSIPRED • Uses multiple aligned sequences for prediction. • Uses a training set of folds with known structure. • Uses a two-stage neural network to predict structure from position-specific scoring matrices generated by PSI-BLAST (Jones, 1999). • The first network converts a window of 15 amino acids into a raw score of h (helix), e (sheet), c (coil) or terminus. • The second network filters the first network's output. For example, an output of hhhhehhhh might be converted to hhhhhhhhh. • Can obtain a Q3 value (the percentage of residues assigned the correct one of the three states) of 70-78% (may be the highest achievable).
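The window and Q3 ideas on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `windows` and `q3_score` and the padding character are assumptions made for the example, not part of PSIPRED.

```python
def windows(sequence, size=15, pad="-"):
    """Yield one window per residue, centered on that residue.

    Positions that fall beyond either terminus are filled with `pad`.
    """
    half = size // 2
    padded = pad * half + sequence + pad * half
    for i in range(len(sequence)):
        yield padded[i:i + size]


def q3_score(predicted, observed):
    """Percentage of residues whose predicted state (h, e or c) matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(predicted)


if __name__ == "__main__":
    # The unfiltered example from the slide, scored against an all-helix segment.
    print(q3_score("hhhhehhhh", "hhhhhhhhh"))
    print(list(windows("ACDEF", size=3)))
```

Each residue gets its own window, so a sequence of N residues produces N windows regardless of the window size.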

  4. Neural networks • Computer neural networks are based on simulation of adaptive learning in networks of real neurons. • Neurons connect to each other via synaptic junctions which are either stimulatory or inhibitory. • Adaptive learning involves the formation or suppression of the right combinations of stimulatory and inhibitory synapses so that a set of inputs produces an appropriate output.

  5. Neural Networks (cont. 1) • The computer version of the neural network involves identification of a set of inputs (the amino acids in the sequence), which transmit through a network of connections. • At each layer, inputs are numerically weighted and the combined result is passed to the next layer. • Ultimately a final output, a decision (helix, sheet or coil), is produced.
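A minimal sketch of the layer-by-layer weighting described on this slide, in plain Python. The logistic squashing function, the network size and every weight value are assumptions chosen for illustration; they are not PSIPRED's actual parameters.

```python
import math

STATES = ("helix", "sheet", "coil")


def layer(inputs, weights, biases):
    """Compute one layer: weighted sum of the inputs per unit, squashed to (0, 1)."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-total)))
    return outputs


def predict(inputs, hidden_w, hidden_b, out_w, out_b):
    """Pass the inputs through one hidden layer and return the winning state and all scores."""
    hidden = layer(inputs, hidden_w, hidden_b)
    scores = layer(hidden, out_w, out_b)
    best = max(range(len(STATES)), key=lambda i: scores[i])
    return STATES[best], scores


if __name__ == "__main__":
    # Toy weights (hypothetical): two inputs, two hidden units, three outputs.
    hidden_w = [[2.0, -2.0], [-2.0, 2.0]]
    hidden_b = [0.0, 0.0]
    out_w = [[3.0, -3.0], [-3.0, 3.0], [0.0, 0.0]]
    out_b = [0.0, 0.0, 0.0]
    print(predict([1.0, 0.0], hidden_w, hidden_b, out_w, out_b))
```

The final decision is simply the output unit with the largest score.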

  6. Neural Networks (cont. 2) • 90% of the training set (proteins of known structure) was used to train the network. • The remaining 10% was used to evaluate the performance of the neural network during the training session.
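The 90%/10% split can be sketched as follows. The function name `split_training_set`, the fixed random seed and the placeholder protein identifiers are assumptions for the example.

```python
import random


def split_training_set(proteins, holdout_fraction=0.1, seed=0):
    """Shuffle the proteins and return (training, holdout) lists.

    The holdout part is only used to monitor performance, never to adjust weights.
    """
    shuffled = list(proteins)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


if __name__ == "__main__":
    ids = [f"protein_{i}" for i in range(100)]  # hypothetical identifiers
    training, holdout = split_training_set(ids)
    print(len(training), len(holdout))
```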

  7. Neural Networks (cont. 3) • During the training phase, selected sets of proteins of known structure are scanned, and if the decisions are incorrect, the input weightings are adjusted by the software to produce the desired result. • Training runs are repeated until the success rate is maximized. • Careful selection of the training set is an important aspect of this technique. The set must contain as wide a range of different fold types as possible without duplications of structural types that may bias the decisions.
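The idea of adjusting weights only when a decision is wrong can be illustrated with the classic single-unit perceptron rule. This is a deliberately simplified stand-in, not the training procedure used for PSIPRED's multi-layer networks; the function names, learning rate and toy data are assumptions.

```python
def decide(weights, bias, inputs):
    """Return 1 if the weighted sum of the inputs is positive, otherwise 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(examples, n_inputs, epochs=20, rate=1.0):
    """Train on (inputs, target) pairs, changing weights only after wrong decisions."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for inputs, target in examples:
            delta = target - decide(weights, bias, inputs)
            if delta != 0:
                errors += 1
                weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
                bias += rate * delta
        if errors == 0:  # every training example is now decided correctly
            break
    return weights, bias


if __name__ == "__main__":
    # Toy task: output 1 only when both inputs are on.
    examples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    weights, bias = train_perceptron(examples, n_inputs=2)
    print([decide(weights, bias, x) for x, _ in examples])
```

Training stops early once a full pass produces no errors, which mirrors repeating training runs until the success rate stops improving.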

  8. Neural Networks (cont. 4) • An additional component of the PSIPRED procedure involves sequence alignment with similar proteins. • The rationale is that some amino acid positions in a sequence contribute more to the final structure than others. (This has been demonstrated by systematic mutation experiments in which each consecutive position in a sequence is substituted by a spectrum of amino acids. Some positions are remarkably tolerant of substitution, while others have unique requirements.) • To predict secondary structure accurately, one should place less weight on the tolerant positions, which clearly contribute little to the structure. • One must also put more weight on the intolerant positions.
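One simple way to see which aligned positions are tolerant or intolerant is to measure how variable each alignment column is. The sketch below uses Shannon entropy per column; this is an illustrative measure chosen for the example, not the PSI-BLAST scoring matrix that PSIPRED actually uses, and the toy alignment is made up.

```python
import math
from collections import Counter


def column_entropy(alignment):
    """Return the Shannon entropy (in bits) of each column of an alignment.

    Low entropy means a conserved (intolerant) position;
    high entropy means a variable (tolerant) position.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    entropies = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment)
        total = sum(counts.values())
        entropy = sum(-(c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(abs(entropy))  # abs() only turns -0.0 into 0.0
    return entropies


if __name__ == "__main__":
    toy_alignment = ["AC", "AD", "AE", "AF"]  # hypothetical aligned sequences
    print(column_entropy(toy_alignment))
```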

  9. Provides information on tolerant or intolerant positions • Each row specifies an amino acid position • 15 groups of 21 units (1 unit for each amino acid plus one specifying the end) • Filtering network: the three outputs are helix, strand or coil
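The 15 groups of 21 units amount to an input vector of 15 × 21 = 315 values per residue. A minimal sketch of how such a vector could be assembled from a profile is given below; the function name `encode_window`, the use of 0.0/1.0 for the end marker and the unscaled profile values are assumptions, not details confirmed by the slides.

```python
WINDOW = 15
AMINO_ACIDS = 20
UNITS_PER_POSITION = AMINO_ACIDS + 1  # 20 profile scores plus one end marker


def encode_window(profile, center):
    """Build the network input for the residue at index `center`.

    `profile` is a list of rows, one per sequence position, each holding 20 scores.
    Window positions beyond either end get zero scores and an end marker of 1.0.
    """
    half = WINDOW // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(profile):
            vector.extend(profile[pos])
            vector.append(0.0)
        else:
            vector.extend([0.0] * AMINO_ACIDS)
            vector.append(1.0)
    return vector


if __name__ == "__main__":
    toy_profile = [[0.5] * AMINO_ACIDS for _ in range(10)]  # made-up scores
    print(len(encode_window(toy_profile, 0)))
```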

  10. Example of Output from PSIPRED

  11. Workshop • http://bioinf.cs.ucl.ac.uk/psipred/psiform.html

  12. 3D structure data • The largest 3D structure database is the Protein Data Bank (PDB). • It contains over 33,000 records. • Each record contains 3D coordinates for macromolecules. • 80% of the records were obtained from X-ray diffraction studies, 15% from NMR, and the rest from other methods and theoretical calculations.

  13. Part of a record from the PDB
     ATOM      1  N   ARG A  14      22.451  98.825  31.990  1.00 88.84           N
     ATOM      2  CA  ARG A  14      21.713 100.102  31.828  1.00 90.39           C
     ATOM      3  C   ARG A  14      22.583 101.018  30.979  1.00 89.86           C
     ATOM      4  O   ARG A  14      22.105 101.989  30.391  1.00 89.82           O
     ATOM      5  CB  ARG A  14      21.424 100.704  33.208  1.00 93.23           C
     ATOM      6  CG  ARG A  14      20.465 101.880  33.215  1.00 95.72           C
     ATOM      7  CD  ARG A  14      20.008 102.147  34.637  1.00 98.10           C
     ATOM      8  NE  ARG A  14      18.999 103.196  34.718  1.00100.30           N
     ATOM      9  CZ  ARG A  14      18.344 103.507  35.833  1.00100.29           C
     ATOM     10  NH1 ARG A  14      18.580 102.835  36.952  1.00 99.51           N
     ATOM     11  NH2 ARG A  14      17.441 104.479  35.827  1.00100.79           N
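In the PDB file format each field of an ATOM record sits at fixed column positions, which is why neighbouring values can run together (as in 1.00100.30, an occupancy of 1.00 followed by a temperature factor of 100.30). A minimal parsing sketch is shown below; the `Atom` class and `parse_atom_line` function are names invented for this example, and only a subset of the fields is read.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float
    element: str


def parse_atom_line(line):
    """Parse one ATOM record by slicing its fixed-width columns."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        residue=line[17:20].strip(),
        chain=line[21],
        residue_number=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
        element=line[76:78].strip(),
    )


# Atom 8 from the record on the slide, laid out in fixed columns.
SAMPLE = "ATOM      8  NE  ARG A  14      18.999 103.196  34.718  1.00100.30           N"

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE))
```

Splitting on whitespace would misread this line, because the occupancy and temperature factor are not separated by a space.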

  14. Steps to tertiary structure prediction • Comparative protein modeling: extrapolates a new structure based on related family members • Steps: 1) identification of modeling templates; 2) alignment; 3) model building

  15. Identification of modeling templates • One chooses an E-value cutoff for the FASTA or BLAST search (10^-5). • Up to ten templates can be used, but the one with the highest sequence similarity to the target sequence (lowest E-value) is the reference template. • Cα atoms of the templates are selected for superimposition. • This generates a structurally corrected multiple sequence alignment.
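The template selection rules on this slide can be sketched as a small filter over search hits. The input format (pairs of template identifier and E-value), the function name `select_templates` and the example identifiers are assumptions; the cutoff is read here as an E-value of 1e-5.

```python
def select_templates(hits, cutoff=1e-5, max_templates=10):
    """Pick modeling templates from (template_id, e_value) search hits.

    Returns (reference, templates): templates are the hits at or below the
    cutoff, best first, and the reference is the one with the lowest E-value.
    """
    passing = sorted((hit for hit in hits if hit[1] <= cutoff), key=lambda hit: hit[1])
    templates = passing[:max_templates]
    reference = templates[0] if templates else None
    return reference, templates


if __name__ == "__main__":
    # Hypothetical search results.
    hits = [("tmplA", 1e-30), ("tmplB", 1e-3), ("tmplC", 1e-8)]
    reference, templates = select_templates(hits)
    print(reference, templates)
```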

  16. Alignment • “Common core” of target sequence is threaded onto the template structure using only alpha carbons

  17. Framework construction

  18. Building the model • Framework construction • Average the position of each atom in the target, based on the corresponding atoms in the templates. • Portions of the target sequence that do not match the template are constructed with a “spare part” algorithm. • Each loop is defined by its length and the Cα atom coordinates of the four amino acids preceding and following the loop.
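The averaging step of framework construction can be sketched as follows, assuming the templates have already been superimposed and aligned to the target. The data layout (a dictionary keyed by residue index and atom name), the function name `average_framework` and the unweighted mean are assumptions; modeling programs may weight templates differently.

```python
def average_framework(templates):
    """Average atom positions across superimposed templates.

    Each template maps (residue_index, atom_name) to an (x, y, z) tuple.
    Only atoms present in every template are averaged.
    """
    if not templates:
        raise ValueError("at least one template is required")
    shared = set(templates[0])
    for template in templates[1:]:
        shared &= set(template)
    framework = {}
    for key in sorted(shared):
        coords = [template[key] for template in templates]
        framework[key] = tuple(
            sum(c[axis] for c in coords) / len(coords) for axis in range(3)
        )
    return framework


if __name__ == "__main__":
    # Two made-up templates; residue 2 is missing from the second one.
    t1 = {(1, "CA"): (1.0, 2.0, 3.0), (2, "CA"): (0.0, 0.0, 0.0)}
    t2 = {(1, "CA"): (3.0, 4.0, 5.0)}
    print(average_framework([t1, t2]))
```

Atoms that are missing from some template are left out of the framework, which corresponds to the unmatched portions that a loop-building step has to fill in.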

  19. Building the model • Completing the backbone: a library of PDB entries is consulted to add carbonyl groups and amino groups. The 3D coordinates come from a separate library of pentapeptide backbone fragments. These backbone fragments are fitted onto the target Cα atoms. The coordinates of the central tripeptide are then averaged for each backbone atom (N, Cα, C(O)). • Side chains are added from a table of the most probable rotamers, which depend on backbone conformation. • Model refinement: energy minimization.
