Protein structure prediction: The holy grail of bioinformatics

Proteins: Four levels of structural organization
• Primary structure
• Secondary structure
• Tertiary structure
• Quaternary structure

Primary structure = the linear amino acid sequence

Secondary structure = spatial arrangement of amino-acid residues that are adjacent in the primary structure

α helix = a helical structure in which the chain coils tightly as a right-handed screw, with all the side chains sticking outward in a helical array. The tight structure of the α helix is stabilized by same-strand hydrogen bonds between -NH groups and -CO groups spaced at four-residue intervals.
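The four-residue spacing can be made concrete by listing the backbone hydrogen-bond pairs of a short ideal helix. This is a minimal sketch: the function name `helix_hbond_pairs` is hypothetical, and it assumes the usual convention that the -CO group of residue i pairs with the -NH group of residue i + 4.

```python
def helix_hbond_pairs(n_residues, spacing=4):
    """Return 1-based (co_residue, nh_residue) pairs for an ideal alpha helix.

    Assumption: the -CO group of residue i bonds to the -NH group of
    residue i + spacing, and every possible pair is formed.
    """
    return [(i, i + spacing) for i in range(1, n_residues - spacing + 1)]


pairs = helix_hbond_pairs(10)
for co, nh in pairs:
    print(f"-CO of residue {co} ... -NH of residue {nh}")
```

A 10-residue helix gives six such pairs; the first four -NH groups and the last four -CO groups have no partner within the helix.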

The β-pleated sheet is made of loosely coiled β strands that are stabilized by hydrogen bonds between -NH and -CO groups from adjacent strands.

An antiparallel β sheet. Adjacent β strands run in opposite directions. Hydrogen bonds between NH and CO groups connect each amino acid to a single amino acid on an adjacent strand, stabilizing the structure.

A parallel β sheet. Adjacent β strands run in the same direction. Hydrogen bonds connect each amino acid on one strand with two different amino acids on the adjacent strand.

Silk fibroin

• α helix
• β sheet (parallel and antiparallel)
• tight turns
• flexible loops
• irregular elements (random coil)

Tertiary structure = three-dimensional structure of protein

The tertiary structure is formed by the folding of secondary structures by covalent and non-covalent forces, such as hydrogen bonds, hydrophobic interactions, salt bridges between positively and negatively charged residues, as well as disulfide bonds between pairs of cysteines.

Quaternary structure = spatial arrangement of subunits and their contacts.

Holoproteins & Apoproteins: Holoprotein = Apoprotein + Prosthetic group

Apohemoglobin = 2α + 2β

Prosthetic group: Heme

Hemoglobin = Apohemoglobin + 4 Heme

Christian B. Anfinsen (1916-1995)
Sela M, White FH, & Anfinsen CB. 1959. The reductive cleavage of disulfide bonds and its application to problems of protein structure. Biochim. Biophys. Acta 31:417-426.

Not all proteins fold independently. Chaperones.

The denaturation and renaturation of proteins

Reducing agents:
• Ammonium thioglycolate (alkaline), pH 9.0-10
• Glyceryl monothioglycolate (acid), pH 6.5-8.2

Oxidant

What do we need to know in order to state that the tertiary structure of a protein has been solved?
• Ideally: We need to determine the position of all atoms and their connectivity.
• Less ideally: We need to determine the position of all Cα atoms (backbone structure).

Protein structure: Limitations and caveats
• Not all proteins or parts of proteins assume a well-defined 3D structure in solution.
• Protein structure is not static; different parts of the structure show different degrees of thermal motion.
• There may be a number of slightly different conformations in solution.
• Some proteins undergo conformational changes when interacting with other molecules.

Experimental Protein Structure Determination
• X-ray crystallography
  - most accurate
  - in vitro
  - needs crystals
  - ~$100-200K per structure
• NMR
  - fairly accurate
  - in solution
  - no need for crystals
  - limited to very small proteins
• Cryo-electron microscopy
  - imaging technology
  - low resolution

Why predict protein structure?
• Structural knowledge = some understanding of function and mechanism of action
• Predicted structures can be used in structure-based drug design
• It can help us understand the effects of mutations on structure and function
• It is a very interesting scientific problem (still unsolved in its most general form after more than 50 years of effort)

Secondary structure prediction

Secondary structure prediction
• Historically, the first structure prediction methods predicted secondary structure
• Can be used to improve alignment accuracy
• Can be used to detect domain boundaries within proteins with remote sequence homology
• Often the first step towards 3D structure prediction
• Informative for mutagenesis studies

Protein Secondary Structures (Simplifications)
• α-HELIX
• β-STRAND
• COIL (everything else)

Assumptions
• The entire information for forming secondary structure is contained in the primary sequence
• side groups of residues will determine structure
• examining windows of 13-17 residues is sufficient to predict secondary structure
• α-helices are 5–40 residues long
• β-strands are 5–10 residues long

Predicting Secondary Structure From Primary Structure
• accuracy 64-75%
• higher accuracy for α-helices than for β-sheets
• accuracy is dependent on protein family
• predictions of engineered (artificial) proteins are less accurate

A surprising result! Chameleon sequences

The “Chameleon” sequence
Original sequence (sequence 1 and sequence 2 are the two regions to be replaced):
TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK
Replace both sequences with an engineered peptide (“chameleon”):
TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK
The first copy of the peptide forms an α-helix, the second a β-strand.
Source: Minor and Kim. 1996. Nature 380:730-734
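The engineered peptide can be located in the second sequence by searching for the longest stretch that occurs twice. This is a minimal sketch: the helper name `longest_repeated_substring` is hypothetical, and the brute-force search is only sensible for short sequences like this one.

```python
def longest_repeated_substring(seq):
    """Return (substring, 0-based start positions) for the longest
    substring that occurs at least twice in seq, or ("", []) if none."""
    n = len(seq)
    # Try the longest candidate lengths first and stop at the first repeat.
    for length in range(n - 1, 0, -1):
        positions = {}
        for start in range(n - length + 1):
            positions.setdefault(seq[start:start + length], []).append(start)
        for sub, starts in positions.items():
            if len(starts) > 1:
                return sub, starts
    return "", []


# Engineered sequence copied from the slide.
ENGINEERED = "TEAVDAWTVEKAFKTFANDNGVDGAWTVEKAFKTFTVTEK"

peptide, starts = longest_repeated_substring(ENGINEERED)
print(peptide, len(peptide), [s + 1 for s in starts])
```

The positions printed are 1-based offsets within the 40-residue fragment shown on the slide, not residue numbers in the full protein. The same 11 residues appear at both sites, so any difference in the structure they adopt must come from their context rather than from the peptide itself.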

Measures of prediction accuracy
• Qindex and Q3
• Correlation coefficient

Qindex
Qindex: (Qhelix, Qstrand, Qcoil, Q3)
• percentage of residues correctly predicted as α-helix, β-strand, coil, or for all 3 conformations.
Drawbacks:
- even a random assignment of structure can achieve a high score (Holley & Karplus 1991)
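The Q scores can be computed directly from two aligned strings of per-residue states. This is a minimal sketch: the function name `q_scores`, the H/E/C state letters, and the two toy strings are illustrative assumptions, not data from any real protein.

```python
def q_scores(observed, predicted):
    """Percentage of residues predicted correctly, per state and overall.

    States: H = helix, E = strand, C = coil (everything else).
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty strings of equal length")
    scores = {}
    for state in "HEC":
        idx = [i for i, s in enumerate(observed) if s == state]
        correct = sum(predicted[i] == state for i in idx)
        # A state absent from the observed structure has no defined score.
        scores["Q" + state] = 100.0 * correct / len(idx) if idx else float("nan")
    matches = sum(o == p for o, p in zip(observed, predicted))
    scores["Q3"] = 100.0 * matches / len(observed)
    return scores


# Toy example (made up for illustration).
observed = "CCHHHHHHCCEEEECC"
predicted = "CCHHHHCCCCEEECCC"
print(q_scores(observed, predicted))

# A trivial "everything is coil" prediction still scores above zero.
baseline = q_scores(observed, "C" * len(observed))
print(baseline["Q3"])
```

The all-coil baseline illustrates the drawback listed above: Q3 rewards agreement with whichever state happens to be common, whether or not the method has learned anything.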

Correlation coefficient: Cα = 1 (= 100%) for a perfect prediction of α-helix
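The slide gives only the value for a perfect prediction. A minimal sketch follows, assuming the coefficient meant here is the Matthews correlation coefficient computed separately for each state; the function name `state_correlation` and the toy strings are illustrative.

```python
import math


def state_correlation(observed, predicted, state):
    """Matthews correlation coefficient for one state (e.g. "H").

    Assumed definition: C = (tp*tn - fp*fn) /
    sqrt((tp+fn) * (tp+fp) * (tn+fn) * (tn+fp)), with values in [-1, 1].
    """
    tp = tn = fp = fn = 0
    for o, p in zip(observed, predicted):
        if o == state and p == state:
            tp += 1
        elif o != state and p != state:
            tn += 1
        elif p == state:
            fp += 1  # overpredicted
        else:
            fn += 1  # underpredicted
    denom = math.sqrt((tp + fn) * (tp + fp) * (tn + fn) * (tn + fp))
    # Convention chosen here: return 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


observed = "CCHHHHHHCCEEEECC"  # toy data, made up for illustration
print(state_correlation(observed, observed, "H"))                # perfect prediction
print(state_correlation(observed, "C" * len(observed), "H"))     # no helix predicted
```

Unlike a simple percentage, this coefficient gives no credit to a prediction that never assigns the state in question.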

Methods of secondary structure prediction

First generation methods: single residue statistics
Chou & Fasman (1974 & 1978): Some residues have particular secondary-structure preferences. Based on empirical frequencies of residues in α-helices, β-sheets, and coils.
Examples: Glu → α-helix; Val → β-strand
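A residue's preference for a state can be expressed as a propensity: how often the residue is found in that state, relative to how often any residue is. This is a minimal sketch of that calculation, not the published Chou-Fasman tables; the function name `propensities` and the tiny labeled data set are made up for illustration.

```python
from collections import Counter


def propensities(sequences, structures, state):
    """Propensity of each residue type for `state`.

    P(aa) = (fraction of aa occurrences found in state)
            / (fraction of all residues found in state)
    Values above 1 mean the residue favors the state.
    """
    total = Counter()
    in_state = Counter()
    for seq, ss in zip(sequences, structures):
        for aa, s in zip(seq, ss):
            total[aa] += 1
            if s == state:
                in_state[aa] += 1
    n_state = sum(in_state.values())
    if n_state == 0:
        raise ValueError("state does not occur in the data")
    overall = n_state / sum(total.values())
    return {aa: (in_state[aa] / total[aa]) / overall for aa in total}


# Toy data, NOT real statistics. In the structure strings H = helix,
# E = strand, C = coil; in the sequences E is the one-letter code for Glu.
seqs = ["EEAEVVKV", "AEEKVIVG"]
sstr = ["HHHHEEEC", "CHHHEEEC"]

helix = propensities(seqs, sstr, "H")
strand = propensities(seqs, sstr, "E")
print(round(helix["E"], 2), round(helix["V"], 2))    # Glu vs Val in helices
print(round(strand["V"], 2), round(strand["E"], 2))  # Val vs Glu in strands
```

The toy data were chosen so that Glu favors helix and Val favors strand, matching the examples above; real propensities come from counting over many solved structures.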

Chou-Fasman method

Chou-Fasman Method • Accuracy: Q3 = 50-60%

Second generation methods: segment statistics
• Similar to single-residue methods, but incorporating additional information (adjacent residues, segmental statistics).
• Problems:
  - Low accuracy: Q3 below 66% (results).
  - Q3 of β-strands (E): 28% - 48%.
  - Predicted structures were too short.

The GOR method
• developed by Garnier, Osguthorpe & Robson
• builds on Chou-Fasman Pij values
• evaluates each residue PLUS the adjacent 8 N-terminal and 8 carboxyl-terminal residues
• sliding window of 17 residues
• underpredicts β-strand regions
• GOR method accuracy: Q3 = ~64%
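The 17-residue sliding window can be sketched as follows. Only the window extraction is shown, because the statistical tables used to score each window are not given here. The function name `residue_windows` and the use of "-" padding at the chain ends are assumptions made for illustration.

```python
def residue_windows(seq, flank=8, pad="-"):
    """Return one window per residue: the residue itself plus `flank`
    neighbors on each side (17 residues for flank=8).

    Assumption: positions beyond either terminus are filled with `pad`.
    """
    padded = pad * flank + seq + pad * flank
    width = 2 * flank + 1
    return [padded[i:i + width] for i in range(len(seq))]


# Sequence copied from the chameleon slide.
SEQ = "TEAVDAATAEKVFKQYANDNGVDGEWTYDDATKTFTVTEK"

windows = residue_windows(SEQ)
print(windows[0])    # window centered on the first residue
print(windows[20])   # window centered on residue 21
```

Each window is centered on the residue being predicted, so a prediction for one position draws on up to 16 neighboring residues.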

Third generation methods
• Third generation methods reached 77% accuracy.
• They consist of two new ideas:
  1. A biological idea: using evolutionary information based on conservation analysis of multiple sequence alignments.
  2. A technological idea: using neural networks.
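The evolutionary information in an alignment can be summarized column by column before it is passed to a predictor. This is a minimal sketch of that first step only, with no neural network; the function names `column_profile` and `conservation` and the four-sequence alignment are made up for illustration.

```python
from collections import Counter


def column_profile(alignment):
    """Per-column residue frequencies for an alignment of equal-length rows.

    Gaps ("-") are ignored when computing frequencies.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residues = [aa for aa in column if aa != "-"]
        counts = Counter(residues)
        n = len(residues)
        profile.append({aa: c / n for aa, c in counts.items()} if n else {})
    return profile


def conservation(profile):
    """Frequency of the most common residue in each column."""
    return [max(col.values(), default=0.0) for col in profile]


# Toy alignment, NOT real homologs.
ALIGNMENT = [
    "TEAVDAWTV",
    "TEAIDAWSV",
    "SEAVDA-TV",
    "TEALDGWTV",
]

profile = column_profile(ALIGNMENT)
print(conservation(profile))
```

Columns with a conservation of 1.0 are invariant across the aligned sequences, while lower values mark positions that tolerate substitution. A profile like this carries more information per position than a single sequence does.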

Artificial Neural Networks
An attempt to imitate the human brain (assuming that this is the way it works).
