From Sequences to Structure Illustrations from: C Branden and J Tooze, Introduction to Protein Structure, 2nd ed. Garland Pub. ISBN 0815302703
Protein Functions • Mechanoenzymes: myosin, actin • Rhodopsin: allows vision • Globins: transport oxygen • Antibodies: immune system • Enzymes: pepsin, renin, carboxypeptidase A • Receptors: transmembrane signaling • Vitellogenin: molecular velcro • And hundreds of thousands more…
Proteins are Chains of Amino Acids • Polymer – a molecule composed of repeating units
The Peptide Bond • Dehydration synthesis • Repeating backbone: N–C–C–N–C–C • Convention – start at amino terminus and proceed to carboxy terminus
Peptidyl polymers • A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids. • Since part of the amino acid is lost during dehydration synthesis, we call the units of a protein amino acid residues. (Figure labels: amide nitrogen, carbonyl carbon)
Side Chain Properties • Recall that the electronegativity of carbon is at about the middle of the scale for light elements • Carbon does not make hydrogen bonds with water easily – hydrophobic • O and N are generally more likely than C to h-bond to water – hydrophilic • We group the amino acids into three general groups: • Hydrophobic • Charged (positive/basic & negative/acidic) • Polar
The Hydrophobic Amino Acids Proline severely limits allowable conformations!
More Polar Amino Acids And then there’s…
Phi and psi • φ = ψ = 180° is extended conformation • φ: Cα to N–H • ψ: C=O to Cα
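As a rough illustration of how phi and psi are measured, the sketch below computes a dihedral (torsion) angle from four atom positions. It assumes the usual atom quadruples (phi from C of the previous residue, N, Cα, C; psi from N, Cα, C, N of the next residue); the function name `dihedral` and the test points are invented for this example and are not taken from the slides.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

# Hypothetical planar zigzag: the outer atoms lie on opposite sides (trans).
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
# Hypothetical quarter turn of the last atom about the central bond.
quarter = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(abs(trans), quarter)
```

A fully extended backbone corresponds to the trans case, where both angles have magnitude 180°.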
The Ramachandran Plot • G. N. Ramachandran – first calculations of sterically allowed regions of phi and psi • Note the structural importance of glycine (Figure panels: Observed (non-glycine), Observed (glycine), Calculated)
Primary and Secondary Structure • Primary structure = the linear sequence of amino acids comprising a protein: AGVGTVPMTAYGNDIQYYGQVT… • Secondary structure • Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the α-helix and the β-sheet • The location and direction of these periodic, repeating structures is known as the secondary structure of the protein
The alpha Helix φ ≈ ψ ≈ −60°
Properties of the alpha helix • φ ≈ ψ ≈ −60° • Hydrogen bonds between C=O of residue n and N–H of residue n+4 • 3.6 residues/turn • 1.5 Å/residue rise • 100°/residue turn
Properties of α-helices • 4 – 40+ residues in length • Often amphipathic or “dual-natured” • Half hydrophobic and half hydrophilic • Mostly when surface-exposed • If we examine many α-helices, we find trends… • Helix formers: Ala, Glu, Leu, Met • Helix breakers: Pro, Gly, Tyr, Ser
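The helix numbers quoted above (3.6 residues/turn, 1.5 Å/residue rise, 100°/residue) are enough to sketch a simple "helical wheel" calculation, which suggests why amphipathic helices tend to place similar residues at positions i, i+3, i+4 and i+7. This is only an illustrative sketch; the function names are made up for the example.

```python
RESIDUES_PER_TURN = 3.6
RISE_PER_RESIDUE = 1.5      # Å, along the helix axis
DEGREES_PER_RESIDUE = 100.0

def wheel_angle(i):
    """Angular position of residue i when looking down the helix axis."""
    return (i * DEGREES_PER_RESIDUE) % 360.0

def angular_offset(i, j):
    """Smallest angle (0 to 180 degrees) between residues i and j on the wheel."""
    d = abs(wheel_angle(i) - wheel_angle(j)) % 360.0
    return min(d, 360.0 - d)

# Rise along the axis per full turn (the helix pitch).
pitch = RESIDUES_PER_TURN * RISE_PER_RESIDUE

# Residues 3, 4 and 7 sit within 60 degrees of residue 0, i.e. on the same face;
# residue 2 ends up on the far side of the helix.
for j in (2, 3, 4, 7):
    print(j, angular_offset(0, j))
print("pitch (Å):", pitch)
```

Under these numbers a hydrophobic residue every three to four positions lines up along one side of the helix, leaving the opposite side free for polar residues.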
The beta Strand (and Sheet) φ ≈ −135°, ψ ≈ +135°
Properties of beta sheets • Formed of stretches of 5-10 residues in extended conformation • Pleated – each Cα a bit above or below the previous • Parallel/antiparallel, contiguous/non-contiguous OCCBIO 2006 – Fundamental Bioinformatics
Parallel and anti-parallel β-sheets • Anti-parallel is slightly energetically favored (Figure panels: Anti-parallel, Parallel)
Turns and Loops • Secondary structure elements are connected by regions of turns and loops • Turns – short regions of non-α, non-β conformation • Loops – larger stretches with no secondary structure. Often disordered. • “Random coil” • Sequences vary much more than secondary structure regions
Levels of Protein Structure • Secondary structure elements combine to form tertiary structure • Quaternary structure occurs in multienzyme complexes • Many proteins are active only as homodimers, homotetramers, etc.
Disulfide Bonds • Two cysteines in close proximity will form a covalent bond • Disulfide bond, disulfide bridge, or dicysteine bond. • Significantly stabilizes tertiary structure.
Determining Protein Structure • There are ~ 100,000 distinct proteins in the human proteome. • 3D structures have been determined for 14,000 proteins, from all organisms • Includes duplicates with different ligands bound, etc. • Coordinates are determined by X-ray crystallography
X-Ray diffraction • Image is averaged over: • Space (many copies) • Time (of the diffraction experiment)
Electron Density Maps • Resolution is dependent on the quality/regularity of the crystal • R-factor is a measure of “leftover” electron density • Solvent fitting • Refinement
The Protein Data Bank • http://www.rcsb.org/pdb/
ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228
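To make the ATOM records above more concrete, here is a minimal sketch that reads a few of them and measures the distance between consecutive Cα atoms. It assumes whitespace-separated fields, which happens to work for these records; real PDB files are fixed-column, so a production parser should slice by column instead. The names `parse_atom` and `ca_distance` are hypothetical.

```python
import math

# A few records copied from the listing above (trailing ID columns dropped).
RECORDS = """\
ATOM 1 N ALA E 1 22.382 47.782 112.975 1.00 24.09
ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40
ATOM 6 N GLY E 2 23.656 45.723 110.336 1.00 19.17
ATOM 7 CA GLY E 2 24.216 44.393 110.087 1.00 17.35
ATOM 10 N VAL E 3 26.213 43.110 110.521 1.00 16.21
ATOM 11 CA VAL E 3 27.594 42.879 110.975 1.00 16.02
"""

def parse_atom(line):
    """Split one simplified ATOM record into a small dict."""
    f = line.split()
    return {
        "name": f[2],
        "residue": f[3],
        "chain": f[4],
        "resseq": int(f[5]),
        "xyz": tuple(float(v) for v in f[6:9]),
    }

atoms = [parse_atom(line) for line in RECORDS.splitlines()]
# Keep only alpha carbons, indexed by residue number.
ca = {a["resseq"]: a["xyz"] for a in atoms if a["name"] == "CA"}

def ca_distance(i, j):
    """Euclidean distance in Å between the CA atoms of residues i and j."""
    return math.dist(ca[i], ca[j])

print(round(ca_distance(1, 2), 2), round(ca_distance(2, 3), 2))
```

Both distances come out close to 3.8 Å, the typical spacing between neighbouring Cα atoms in a peptide chain.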
Views of a Protein Wireframe Ball and stick
Views of a Protein Spacefill Cartoon • CPK colors: • Carbon = green, black • Nitrogen = blue • Oxygen = red • Sulfur = yellow • Hydrogen = white
The Protein Folding Problem • Central question of molecular biology: “Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?” • Input: AAVIKYGCAL… • Output: φ1ψ1, φ2ψ2… = backbone conformation (no side chains yet)
Forces Driving Protein Folding • It is believed that hydrophobic collapse is a key driving force for protein folding • Hydrophobic core • Polar surface interacting with solvent • Minimum volume (no cavities) • Disulfide bond formation stabilizes • Hydrogen bonds • Polar and electrostatic interactions
Folding Help • Proteins are, in fact, only marginally stable • Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form • Many proteins help in folding • Protein disulfide isomerase – catalyzes shuffling of disulfide bonds • Chaperones – break up aggregates and (in theory) unfold misfolded proteins
The Hydrophobic Core • Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen. • The mutation E6V in the β chain places a hydrophobic Val on the surface of hemoglobin • The resulting “sticky patch” causes hemoglobin S to agglutinate (stick together) and form fibers which deform the red blood cell and do not carry oxygen efficiently • Sickle cell anemia was the first identified molecular disease
Sickle Cell Anemia Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.
Computational Problems in Protein Folding • Two key questions: • Evaluation – how can we tell a correctly folded protein from an incorrectly folded protein? • H-bonds, electrostatics, hydrophobic effect, etc. • Derive a function, see how well it does on “real” proteins • Optimization – once we get an evaluation function, can we optimize it? • Simulated annealing/Monte Carlo • Evolutionary computing (EC) • Heuristics
Fold Optimization • Simple lattice models (HP-models) • Two types of residues: hydrophobic and polar • 2-D or 3-D lattice • The only force is hydrophobic collapse • Score = number of HH contacts
Scoring Lattice Models H/P model scoring: count noncovalent hydrophobic interactions. Sometimes: Penalize for buried polar or surface hydrophobic residues
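The basic H/P score can be sketched in a few lines. The version below assumes a 2-D square lattice, a sequence written as a string of `H` and `P`, and a conformation given as a list of (x, y) coordinates; it counts only the HH contacts and leaves out the optional penalty terms. The function name `hh_contacts` is made up for this example.

```python
def hh_contacts(sequence, coords):
    """Count non-covalent H-H contacts on a 2-D square lattice."""
    index_at = {xy: i for i, xy in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("conformation has a bump (two residues on one site)")
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is seen once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            # Chain neighbours (|i - j| == 1) are covalent, so they do not count.
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return contacts

# A four-residue chain folded into a square versus fully extended.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hh_contacts("HPPH", square), hh_contacts("HPPH", extended))
```

In the square fold the two terminal H residues touch, giving one contact; the extended chain has none.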
What can we do with lattice models? • For smaller polypeptides, exhaustive search can be used • Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process • For larger chains, other optimization and search methods must be used • Greedy, branch and bound • Evolutionary computing, simulated annealing • Graph theoretical methods
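For short chains, exhaustive search is straightforward to sketch. The code below enumerates every self-avoiding walk on a 2-D square lattice and keeps the one with the most HH contacts. It assumes the plain contact-count score and fixes the first bond to cut down on symmetric duplicates; `best_fold` and `count_hh` are hypothetical names, and the running time grows exponentially with chain length.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def count_hh(sequence, path):
    """Number of non-covalent H-H contacts for a self-avoiding path."""
    index_at = {xy: i for i, xy in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for neighbour in ((x + 1, y), (x, y + 1)):   # each pair checked once
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                total += 1
    return total

def best_fold(sequence):
    """Return (score, path) of a best fold found by exhaustive search.

    Assumes the sequence has at least two residues.
    """
    best = (-1, None)

    def extend(path, occupied):
        nonlocal best
        if len(path) == len(sequence):
            score = count_hh(sequence, path)
            if score > best[0]:
                best = (score, list(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:          # skip moves that would bump
                path.append(nxt)
                occupied.add(nxt)
                extend(path, occupied)
                path.pop()
                occupied.remove(nxt)

    start = [(0, 0), (1, 0)]                 # first bond fixed by convention
    extend(start, set(start))
    return best

score, path = best_fold("HHPPHH")
print(score, path)
```

For longer chains this enumeration becomes impractical, which is where the other search methods listed above come in.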
Learning from Lattice Models The “hydrophobic zipper” effect: Ken Dill ~ 1997
Representing a lattice model • Absolute directions: UURRDLDRRU • Relative directions: LFRFRRLLFFL • Advantage: relative directions cannot encode an immediate reversal (UD or RL in absolute directions) • Only three directions: L, R, F • What about bumps? LFRRR • Give it a bad score • Or use a better representation
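The two encodings can be decoded into lattice coordinates with a few lines of code. This sketch assumes the chain starts at the origin and, for relative directions, that the initial heading points along +x with each symbol meaning "turn, then step"; those conventions and the function names are assumptions made for illustration.

```python
ABSOLUTE = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def decode_absolute(moves):
    """Coordinates visited by a string of U/D/L/R moves, starting at (0, 0)."""
    path = [(0, 0)]
    for m in moves:
        dx, dy = ABSOLUTE[m]
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path

def decode_relative(moves, heading=(1, 0)):
    """Coordinates visited by a string of L/F/R turns, starting at (0, 0)."""
    path = [(0, 0)]
    dx, dy = heading
    for m in moves:
        if m == "L":
            dx, dy = -dy, dx        # rotate heading 90 degrees counterclockwise
        elif m == "R":
            dx, dy = dy, -dx        # rotate heading 90 degrees clockwise
        x, y = path[-1]
        path.append((x + dx, y + dy))
    return path

def has_bump(path):
    """True if any lattice site is occupied more than once."""
    return len(set(path)) != len(path)

print(has_bump(decode_absolute("UURRDLDRRU")))   # valid walk
print(has_bump(decode_absolute("UD")))           # immediate reversal
print(has_bump(decode_relative("LFRRR")))        # closes a loop onto itself
```

Relative encoding rules out immediate reversals by construction, but as the `LFRRR` case shows, it can still curl back onto an occupied site.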
Preference-order representation • Each position has two “preferences” • If it can’t have either of the two, it will take the “least favorite” path if possible • Example: {LR},{FL},{RL},{FR},{RL},{RL},{FR},{RF} • Can still cause bumps:{LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}
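One possible reading of the preference-order scheme is sketched below: each position tries its first preference, then its second, then the remaining direction, and decoding fails only if all three sites are occupied. The starting point (origin), initial heading (+x) and the name `decode_preferences` are assumptions for this example, not details given in the slides.

```python
def turn(heading, move):
    """New heading after a relative move L, F or R."""
    dx, dy = heading
    if move == "L":
        return (-dy, dx)
    if move == "R":
        return (dy, -dx)
    return heading                   # "F": keep going straight

def decode_preferences(prefs, heading=(1, 0)):
    """Decode a list of two-letter preferences; return None on a dead end."""
    path = [(0, 0)]
    occupied = {(0, 0)}
    for pair in prefs:
        least_favorite = [m for m in "LFR" if m not in pair]
        for move in list(pair) + least_favorite:
            new_heading = turn(heading, move)
            nxt = (path[-1][0] + new_heading[0], path[-1][1] + new_heading[1])
            if nxt not in occupied:
                heading = new_heading
                path.append(nxt)
                occupied.add(nxt)
                break
        else:
            return None              # all three neighbouring sites are taken
    return path

ok = decode_preferences(["LR", "FL", "RL", "FR", "RL", "RL", "FR", "RF"])
stuck = decode_preferences(["LF", "FR", "RL", "FL", "RL", "FL", "RF", "RL", "FL"])
print(ok)
print(stuck)
```

Under these conventions the first example decodes to a valid walk (using the fallback direction once), while the second walks into an enclosed site and cannot continue, matching the remark that bumps are still possible.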
More Realistic Models • Higher resolution lattices (45° lattice, etc.) • Off-lattice models • Local moves • Optimization/search methods and φ/ψ representations • Greedy search • Branch and bound • EC, Monte Carlo, simulated annealing, etc.
The Other Half of the Picture • Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold). • Theoretical force field: ΔG = ΔG(van der Waals) + ΔG(h-bonds) + ΔG(solvent) + ΔG(Coulomb) • Empirical force fields • Start with a database • Look at neighboring residues – similar to known protein folds?
Threading: Fold recognition • Given: • Sequence: IVACIVSTEYDVMKAAR… • A database of molecular coordinates • Map the sequence onto each fold • Evaluate • Objective 1: improve scoring function • Objective 2: folding
Secondary Structure Prediction
AGVGTVPMTAYGNDIQYYGQVT…
A-VGIVPM-AYGQDIQY-GQVT…
AG-GIIP--AYGNELQ--GQVT…
AGVCTVPMTA---ELQYYG--T…
AGVGTVPMTAYGNDIQYYGQVT…
----hhhHHHHHHhhh--eeEE…
Secondary Structure Prediction • Easier than folding • Current algorithms can predict secondary structure with 70-80% accuracy • Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222. • Based on frequencies of occurrence of residues in helices and sheets • Neural network based • Uses a multiple sequence alignment • Rost & Sander, Proteins, 1994, 19, 55-72
Chou-Fasman Algorithm • Identify α-helices • 4 out of 6 contiguous amino acids that have P(a) > 100 • Extend the region until 4 amino acids with P(a) < 100 found • Compute P(a) and P(b); if the region is >5 residues and P(a) > P(b), identify as a helix • Repeat for β-sheets [use P(b)] • If an α and a β region overlap, the overlapping region is predicted according to P(a) and P(b)
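The helix nucleation step (4 of 6 contiguous residues with P(a) > 100) can be sketched as a sliding window. The propensity values in `TOY_P_ALPHA` are illustrative placeholders chosen only to agree with the helix formers and breakers named earlier; they are not the published Chou-Fasman table, which should be substituted for real use. The extension and overlap steps are omitted.

```python
# Placeholder propensities (NOT the published values): formers > 100, breakers < 100.
TOY_P_ALPHA = {
    "A": 140, "E": 150, "L": 120, "M": 145,   # helix formers named in the slides
    "P": 55, "G": 55, "Y": 70, "S": 75,       # helix breakers named in the slides
}

def helix_nuclei(sequence, p_alpha, window=6, min_hits=4, threshold=100):
    """Start indices of windows where at least min_hits residues exceed threshold."""
    starts = []
    for start in range(len(sequence) - window + 1):
        segment = sequence[start:start + window]
        hits = sum(1 for residue in segment if p_alpha[residue] > threshold)
        if hits >= min_hits:
            starts.append(start)
    return starts

print(helix_nuclei("GPAELMGS", TOY_P_ALPHA))   # every window holds A, E, L, M
print(helix_nuclei("GPGSAEGP", TOY_P_ALPHA))   # never more than two formers
```

Each reported start index marks a candidate nucleus that the full algorithm would then extend in both directions.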
Chou-Fasman, cont’d • Identify hairpin turns: • P(t) = f(i) of the residue × f(i+1) of the next residue × f(i+2) of the following residue × f(i+3) of the residue at position (i+3) • Predict a hairpin turn starting at positions where: • P(t) > 0.000075 • The average P(turn) for the four residues > 100 • P(a) < P(turn) > P(b) for the four residues • Accuracy 60-65%
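The turn criteria can be written as a single check over a four-residue window. This sketch assumes P(t) is the product of the four positional frequencies and that the P(a), P(turn) and P(b) comparison uses averages over the window; the function name `is_hairpin_turn` and all numbers in the usage lines are hypothetical inputs, not values from any published table.

```python
import math
from statistics import mean

def is_hairpin_turn(f_values, p_turn, p_alpha, p_beta, cutoff=0.000075):
    """Apply the three turn criteria to one four-residue window.

    f_values: positional turn frequencies f(i), f(i+1), f(i+2), f(i+3)
    p_turn, p_alpha, p_beta: per-residue propensities for the same four residues
    """
    p_t = math.prod(f_values)                 # P(t) as a product of frequencies
    avg_turn = mean(p_turn)
    return (
        p_t > cutoff
        and avg_turn > 100
        and mean(p_alpha) < avg_turn > mean(p_beta)
    )

# Hypothetical windows: the first meets all three criteria, the others fail one.
print(is_hairpin_turn([0.1] * 4, [120] * 4, [90] * 4, [80] * 4))
print(is_hairpin_turn([0.05] * 4, [120] * 4, [90] * 4, [80] * 4))
print(is_hairpin_turn([0.1] * 4, [95] * 4, [90] * 4, [80] * 4))
```

Sliding this check along the sequence would flag each position i where a turn is predicted to start.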