
The Inverse Protein Folding Problem*

Canada-China Industrial Workshop, 2005, Hong Kong Baptist University. Arvind Gupta, Simon Fraser University. May 24, 2005. *Joint work with J. Manuch, C. Mead, L. Stacho, B. Bhattacharyya, X. Huang.


Presentation Transcript


  1. The Inverse Protein Folding Problem* • Canada-China Industrial Workshop, 2005, Hong Kong Baptist University • Arvind Gupta, Simon Fraser University • May 24, 2005 • *Joint work with J. Manuch, C. Mead, L. Stacho, B. Bhattacharyya, X. Huang

  2. Outline • Background • Forces in Protein Folding • Hydrophobic-Polar Model • Protein Databank • Determining Attributes of the Ideal Lattice • Future Steps

  3. DNA • Genetic code • A “string” of nucleotides over A C G T • Code for all proteins • Self-replicating

  4. Proteins • A “string” over 20 amino acids • In solvent will fold into a unique 3D spatial structure with minimal energy

  5. Protein Structure • Structure determines protein function. • Proteins normally are in an aqueous environment • Proteins are globular.

  6. Proteins in the body • Proteins are involved in all processes in the body. • Examples: insulin, hemoglobin

  7. Proteins and diseases M. Thorpe, Protein Folding, HIV and Drug Design, Physics and Technology Forefronts (2003).

  8. Forward Protein Folding Problem • Identify the protein structure for a specific amino acid sequence. MAGWTRLS.. • Central open problem in biology • NP-hard under most models

  9. Inverse Protein Folding Problem • Given a structure (or a functionality) identify an amino acid sequence whose fold will be that structure (exhibit that functionality). • Crucial problem in drug design. • NP-hard under most models.

  10. Forces acting on Proteins • Hydrogen Bonding • Van der Waals interactions • Ion pairing • Disulfide bonds • Intrinsic properties (conformational preference) • Hydrophobicity: the dominant force in protein folding (Dill, 1990) • Hydro (water) • philic (loving) • phobic (fearing)

  11. Hydrophobic Interactions • Each amino acid can be classified as either hydrophobic or hydrophilic (polar). • Hydrophobic [polar] amino acids are in a higher [lower] energy state in an aqueous environment.

  12. Hydrophobic – Polar (HP) Model • Introduced by Dill (1985) and Chan (1985) • “0” for polar; “1” for hydrophobic • Protein sequence embedded on a lattice • Each amino acid in exactly one cell • Interactions across adjacent cells • Empty lattice cells contain water • Given a protein, maximize the hydrophobic interactions (native fold). • I.e.: given a 0-1 string, embed it onto a lattice, maximizing the number of adjacent 1’s.
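
The objective on this slide can be made concrete with a small scoring function. The following is a minimal sketch, assuming the 2-D square lattice and assuming that a hydrophobic interaction is a pair of 1's in adjacent cells that are not consecutive in the chain; the name `hp_contacts` is hypothetical and not from the talk.

```python
def hp_contacts(sequence, coords):
    """Count hydrophobic interactions of a fold on the 2-D square lattice.

    sequence: string over "0" (polar) and "1" (hydrophobic).
    coords:   list of (x, y) cells, one per residue, in chain order.
    Assumption: an interaction is a pair of 1's in adjacent cells
    that are not neighbours along the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("one cell per amino acid is required")
    if len(set(coords)) != len(coords):
        raise ValueError("each cell may hold at most one amino acid")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent cells")

    index_at = {cell: i for i, cell in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "1":
            continue
        # Looking only right and up visits every adjacent pair exactly once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index_at.get(neighbour)
            if j is not None and sequence[j] == "1" and abs(i - j) > 1:
                contacts += 1
    return contacts


u_shape = [(0, 0), (1, 0), (1, 1), (0, 1)]
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]
print(hp_contacts("1001", u_shape), hp_contacts("1001", straight))
```

For the string 1001, the U-shaped fold brings the two 1's together while the straight fold does not, so the U-shape scores higher.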

  13. The 2-D Square Lattice • [Figure legend: symbols for a hydrophobic “1”, a polar “0”, a peptide bond and a hydrophobic interaction] • Example.

  14. Inverse protein folding • Problem: For a given shape find a protein (amino acid string) with a native fold approximating the shape. • Example.

  15. Constructible structures Theorem: For any constructible structure S, there exists a protein p(S) with a native fold exactly filling the structure S. • Proof by induction: • Base case: p(S)=010010010010

  16. Constructible structures Theorem: For any constructible structure S, there exists a protein p(S) with a native fold exactly filling the structure S. • Proof by induction: • Inductive case:

  17. Constructible structures Theorem: For any constructible structure S, there exists a protein p(S) with a native fold exactly filling the structure S. • Proof by induction: • Inductive case:

  18. Constructible structures Theorem: For any constructible structure S, there exists a protein p(S) with a native fold exactly filling the structure S. • Proof: • Folds are saturated: every hydrophobic “1” is involved in two hydrophobic interactions • saturated implies native

  19. Stability of proteins • A protein is stable if it has a unique “native fold” (fold with minimal energy). • Most natural proteins are stable. • The protein in our example is not stable: it has 82 native folds in total!
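
The notion of stability can be checked by brute force for very short strings. The sketch below is an illustration only, assuming the 2-D square lattice, counting folds up to rotation and reflection, and scoring by the number of non-consecutive adjacent 1's. The counting convention behind the figure of 82 on the slide is not stated in the transcript, so this code is not claimed to reproduce it. `native_folds` and `is_stable` are hypothetical names.

```python
MOVES = ((1, 0), (0, 1), (-1, 0), (0, -1))


def native_folds(sequence):
    """Return (max_contacts, folds) for a short HP string.

    Enumerates all self-avoiding walks, so the running time is
    exponential in len(sequence); only suitable for short strings.
    """
    n = len(sequence)
    if n < 2:
        raise ValueError("need at least two residues")
    best = -1
    folds = []
    path = [(0, 0), (1, 0)]              # fixed first bond removes rotations
    occupied = {(0, 0): 0, (1, 0): 1}

    def extend(contacts, turned):
        nonlocal best, folds
        i = len(path)
        if i == n:
            if contacts > best:
                best, folds = contacts, [tuple(path)]
            elif contacts == best:
                folds.append(tuple(path))
            return
        x, y = path[-1]
        for dx, dy in MOVES:
            if not turned and dy < 0:
                continue                  # first turn must go up: removes mirror images
            cell = (x + dx, y + dy)
            if cell in occupied:
                continue
            gained = 0
            if sequence[i] == "1":
                for ex, ey in MOVES:
                    j = occupied.get((cell[0] + ex, cell[1] + ey))
                    if j is not None and j != i - 1 and sequence[j] == "1":
                        gained += 1
            occupied[cell] = i
            path.append(cell)
            extend(contacts + gained, turned or dy != 0)
            path.pop()
            del occupied[cell]

    extend(0, False)
    return best, folds


def is_stable(sequence):
    """True if exactly one fold (up to symmetry) attains the maximum."""
    return len(native_folds(sequence)[1]) == 1


print(native_folds("1001")[0], is_stable("1001"), is_stable("0000"))
```

The string 1001 has a single optimal fold under this convention, whereas an all-polar string has no interactions to maximize and every fold is optimal.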

  20. Stability of proteins Conjecture: For any constructible structure S, the protein p(S) is stable. • Tested for >20,000 constructible structures. • Mathematically proved for two simple infinite classes of constructible structures, L0 and L1. • [Figures: an L0 structure; an L1 structure]

  21. Boundary squares • Diagonal frame: the smallest diagonal rectangle containing all hydrophobic 1's. • Boundary square: a hydrophobic “1” lying on the border of the diagonal frame. • [Figure: example with 5 boundary squares]

  22. Boundary squares • Useful for finding the last tile of a constructible structure. • A saturated fold has at least 4 of them. Lemma. Let p be a protein string of the form 0{0,1}*0 not containing 11, 000 or 10101 as a substring. For every saturated fold of p, each boundary square not adjacent to a terminal is the main square of a corner-closed core.

  23. Proof for L0 structures • Take a saturated fold for p(S), where S is in L0. • It has at least 4 boundary squares, and at least 2 not adjacent to a terminal (the first or the last amino acid). • By the Lemma, each is contained in a corner-closed core, i.e., is a red 1 of a substring 1001001 of the protein string. • In p(S) = 0(10010)^n(01001)^n0, there are only two occurrences of the substring 1001001, and they overlap. • Hence, the cores match each other and form a fully-closed core (closed on 3 sides): the last tile. • Cut the last tile and apply induction.
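
The counting step in this proof is easy to check mechanically. The sketch below assumes that the trailing n's in the slide's formula are exponents, i.e. p(S) = 0(10010)^n(01001)^n0, which for n = 1 gives the base-case string 010010010010 from slide 15. The helper names `l0_protein`, `occurrences` and `satisfies_lemma_hypothesis` are hypothetical.

```python
def l0_protein(n):
    """Build 0(10010)^n(01001)^n0 (assumed reading of the slide's formula)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return "0" + "10010" * n + "01001" * n + "0"


def occurrences(text, pattern):
    """Start indices of all (possibly overlapping) occurrences of pattern."""
    return [i for i in range(len(text) - len(pattern) + 1)
            if text.startswith(pattern, i)]


def satisfies_lemma_hypothesis(p):
    """Check the Lemma's conditions: form 0{0,1}*0, no 11, 000 or 10101."""
    return (len(p) >= 2 and p[0] == "0" and p[-1] == "0"
            and set(p) <= {"0", "1"}
            and not any(bad in p for bad in ("11", "000", "10101")))


for n in range(1, 4):
    p = l0_protein(n)
    print(n, p, occurrences(p, "1001001"), satisfies_lemma_hypothesis(p))
```

In each case tried, the two occurrences start three positions apart, so they share their middle portion, which is what lets the two corner-closed cores combine into one fully-closed core.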

  24. L1 structures are more complex • p(S) = 0(10010)^n 010 (10010)^m (01001)^m 01 (01001)^(n-1) 0 • p(S) contains one occurrence of the substring 10101 (so the Lemma cannot be applied directly) and three occurrences of 1001001 (two corner-closed cores do not imply a fully-closed core).
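
The substring counts quoted for L1 can be checked in the same mechanical way. This is a sketch under an assumed reading of the flattened formula, namely p(S) = 0(10010)^n 010 (10010)^m (01001)^m 01 (01001)^(n-1) 0; the exponents are lost in the transcript, so this reading is an assumption, and `l1_protein` and `count_overlapping` are hypothetical names. Under this reading the final block is empty when n = 1 and the substring 10101 does not appear, so the example uses n ≥ 2.

```python
def l1_protein(n, m):
    """Build the L1 string under the assumed reading of the slide's formula."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be at least 1")
    return ("0" + "10010" * n + "010" + "10010" * m
            + "01001" * m + "01" + "01001" * (n - 1) + "0")


def count_overlapping(text, pattern):
    """Number of occurrences of pattern in text, overlaps included."""
    return sum(text.startswith(pattern, i)
               for i in range(len(text) - len(pattern) + 1))


p = l1_protein(2, 1)
print(p)
print(count_overlapping(p, "10101"), count_overlapping(p, "1001001"))
```

For n = 2, m = 1 the counts agree with the slide: one 10101 and three 1001001.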

  25. Choosing a Lattice • 2D is easier • Fewer options for combinatorial case analysis • More visually intuitive • Torsion angles describe protein mainchain • 3D is more relevant • More biologically relevant • More representative of actual protein structures • Directly applicable to known protein structures

  26. Protein Data Bank (PDB) • Worldwide repository for 3-D biological macromolecular structure data • Contains 30857 known protein structures (May 17, 2005) • Structures derived using different techniques • Nuclear Magnetic Resonance spectroscopy • X-ray crystallography • PDB ‘known structures’ are really models of the structure of a protein

  27. Determining Ideal Lattice Attributes • Should all edges of the lattice be identical in length? • How should distances between non-adjacent lattice points behave? • What angles should the lattice have? • How regular should the lattice be? Use PDB statistics to answer these questions

  28. Assemble a Set of Proteins Create a subset of good-quality protein structures from the PDB: • Protein structures generated using X-ray diffraction • High-resolution structures (<= 1.75 Å) • Model fits the experimental data well Result: 3704 protein structures in the subset

  29. Q1: Uniform Edge Length? Overall distribution of consecutive residue distance: the distance between consecutive residues is consistently about 3.8 Å. Answer to Question 1: All edge lengths should be uniform with length 3.8 Å.

  30. Q2: Non-adjacent Vertex Distances? Overall distribution of non-consecutive residue distance: • minimum distance: 3.06 Å • only 10 distances < 3.5 Å • 1813 distances < 3.8 Å (out of 426 billion pairs). Answer to Question 2: Non-adjacent vertices should be at least 3.8 Å apart.
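
The two distance statistics behind Q1 and Q2 could be computed along the following lines. This is a sketch only: it assumes each residue is represented by one 3-D point (for example its Cα atom) and that pairs are taken within a single chain, neither of which the transcript states explicitly. The function names are hypothetical, and the coordinates below are toy data, not PDB data.

```python
import math
from itertools import combinations


def consecutive_distances(points):
    """Distances between residues i and i+1 along the chain."""
    return [math.dist(a, b) for a, b in zip(points, points[1:])]


def min_nonconsecutive_distance(points):
    """Smallest distance between residues that are not chain neighbours.

    Raises ValueError if the chain has fewer than three residues.
    """
    return min(math.dist(points[i], points[j])
               for i, j in combinations(range(len(points)), 2)
               if j - i > 1)


# Toy chain: four residues on the corners of a 3.8 Å square.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(consecutive_distances(chain))
print(min_nonconsecutive_distance(chain))
```

On real data the same two functions would be applied per chain and the results pooled into the histograms described on the slides.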

  31. Q3: Lattice Angles? • [Figures: one amino acid; amino acid chain]

  32. Q3: Lattice Angles? Overall distribution of Cα angles: • Calculate Cα angles: the angle produced by three consecutive Cα atoms • Group results by middle amino acid residue type • Bimodal distribution: • Sharp peak at 90° • Shallow peak at 120°
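
The Cα angle defined on this slide is the angle at the middle atom between the directions to its two chain neighbours. A minimal sketch, with the hypothetical function name `ca_angle` and made-up coordinates:

```python
import math


def ca_angle(prev_ca, mid_ca, next_ca):
    """Angle in degrees at mid_ca formed by three consecutive C-alpha atoms."""
    u = [p - m for p, m in zip(prev_ca, mid_ca)]
    v = [n - m for n, m in zip(next_ca, mid_ca)]
    cos_theta = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


print(ca_angle((3.8, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 3.8, 0.0)))
```

Grouping by the residue type of the middle atom, as the slide describes, would then amount to collecting these angles in a dictionary keyed by amino acid name.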

  33. Q3: Lattice Angles? Some differences appear for Cα angles around certain amino acids. Shown: proline, phenylalanine, aspartic acid

  34. Q4: Lattice Regularity? • Determine the average coordinate root mean square deviation (c-RMS) between the original PDB structure and the lattice-approximated structures (over the entire 3704-protein PDB subset) • ai = coordinates of the lattice vertex corresponding to bi • bi = coordinates of a residue in the protein X-ray structure
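
The formula itself did not survive in the transcript. A common definition of coordinate RMS deviation, assumed here, is the square root of the mean squared distance between corresponding points ai and bi. The sketch below implements that assumed definition, with no superposition step, under the hypothetical name `c_rms`.

```python
import math


def c_rms(lattice_points, residue_points):
    """Coordinate RMS deviation between paired 3-D points.

    Assumed definition: sqrt((1/n) * sum_i |a_i - b_i|^2), where a_i is
    the lattice vertex assigned to residue b_i.
    """
    if len(lattice_points) != len(residue_points) or not lattice_points:
        raise ValueError("need two non-empty point lists of equal length")
    total = sum(math.dist(a, b) ** 2
                for a, b in zip(lattice_points, residue_points))
    return math.sqrt(total / len(lattice_points))


a = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
b = [(3.0, 4.0, 0.0), (0.0, 0.0, 0.0)]
print(c_rms(a, b))
```

A lower value means the lattice approximation stays closer to the X-ray coordinates.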

  35. Q4: Lattice Regularity? • Periodic Lattices: Cubic and Face-Centered-Cubic (FCC) • Randomized Lattices: Shift each vertex in periodic lattices by a random value from normal (0, 0.0025) distribution, preserve edges • De Novo Random Lattices: Generate random nodes and edges, maintain average degree and edge length of periodic lattices
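
The "randomized lattice" construction can be sketched as follows. Several details are assumptions, since the transcript does not spell them out: the second parameter of the normal distribution is read as a variance (so the standard deviation is 0.05), the shift is applied independently to each coordinate, and "preserve edges" is read as keeping the connectivity of the periodic lattice unchanged. `cubic_lattice` and `randomize` are hypothetical names.

```python
import random
from itertools import product


def cubic_lattice(size, edge=3.8):
    """Vertices and edges of a size x size x size cubic lattice."""
    vertices = {(i, j, k): (i * edge, j * edge, k * edge)
                for i, j, k in product(range(size), repeat=3)}
    edges = []
    for (i, j, k) in vertices:
        for di, dj, dk in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            other = (i + di, j + dj, k + dk)
            if other in vertices:
                edges.append(((i, j, k), other))
    return vertices, edges


def randomize(vertices, sigma=0.05, seed=None):
    """Shift every coordinate by a draw from N(0, sigma^2).

    sigma=0.05 assumes the slide's 0.0025 is a variance. The edge list
    is untouched, so connectivity is preserved.
    """
    rng = random.Random(seed)
    return {key: tuple(c + rng.gauss(0.0, sigma) for c in xyz)
            for key, xyz in vertices.items()}


vertices, edges = cubic_lattice(3)
jittered = randomize(vertices, seed=1)
print(len(vertices), len(edges), jittered[(0, 0, 0)])
```

Keying vertices by their integer grid index lets the same edge list describe both the periodic lattice and its randomized copy.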

  36. Q4: Lattice Regularity? • Average c-RMS values generally increase as the randomization of the lattices increases. Answer to Question 4: Periodic lattices achieve a better approximation of protein structure than random lattices of the same degree.

  37. Results: Ideal Lattice Attributes • Uniform edge lengths of 3.8 Å • Minimum distance between any two vertices of 3.8 Å • Supporting mainly 90° and 120° angles • Periodic in structure

  38. Candidate lattices (space-filling) • cubic • hex. prism • truncated octahedron • truncated tetrahedron • cuboctahedron

  39. Candidate lattices (vector-based) • Face-centered cubic (FCC) • Side+FCC (S+FCC) • Extended FCC (e-FCC)
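
For the plain FCC lattice, the attributes listed on slide 37 can be checked directly from its neighbour vectors. In standard coordinates each FCC vertex has 12 nearest neighbours at the permutations of (±1, ±1, 0); the sketch below scales them to the 3.8 Å edge length and lists the angles two bonds at a vertex can form. The S+FCC and e-FCC variants are not defined in the transcript, so they are not sketched. Function names are hypothetical.

```python
import math
from itertools import combinations, product


def fcc_neighbour_vectors(edge=3.8):
    """The 12 nearest-neighbour vectors of an FCC lattice, scaled to `edge`."""
    scale = edge / math.sqrt(2)           # (1, 1, 0) has length sqrt(2)
    vectors = []
    for zero_axis in range(3):
        for s1, s2 in product((1, -1), repeat=2):
            v = [s1, s2]
            v.insert(zero_axis, 0)
            vectors.append(tuple(scale * c for c in v))
    return vectors


def bond_angles(vectors):
    """Sorted set of angles (whole degrees) between distinct vectors."""
    angles = set()
    for u, v in combinations(vectors, 2):
        cos_theta = (sum(a * b for a, b in zip(u, v))
                     / (math.hypot(*u) * math.hypot(*v)))
        cos_theta = max(-1.0, min(1.0, cos_theta))
        angles.add(round(math.degrees(math.acos(cos_theta))))
    return sorted(angles)


vectors = fcc_neighbour_vectors()
print(len(vectors), bond_angles(vectors))
```

The angle set includes both 90° and 120°, along with 60° and 180°, which is consistent with FCC appearing among the candidates.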

  40. RMS comparison of lattices

  41. Angle comparison of lattices

  42. Future • Investigate candidate lattices to determine an ideal lattice for inverse protein folding • Mathematically prove that the ideal lattice can generate stable sequences for specified protein shapes within the HP model • Attempt to assign specific amino acids to lattice sites

  43. Future • Investigate protein sequences generated by the model for stability and folding properties. • Incorporate other protein folding forces • Hydrogen Bonding • Van der Waals interactions • Intrinsic properties (conformational preference) • Ion pairing • Disulfide bonds

  44. Questions?
