Lessons on Protein Structure from Lattice Model HC Lee 李弘謙 Nanjing University Nanjing, China

Lessons on Protein Structure from Lattice Model HC Lee 李弘謙 Nanjing University Nanjing, China 2002 May 22 – 25

What is a protein? • Large molecule: a chain of amino acids • Several tens to thousands of residues • Folds into a specific shape • Biological machines

DNA & Gene: we now know that, in higher life forms, one gene can yield many proteins

Gene to Protein: Transcription and Translation

What do proteins do? • Link genotype and phenotype • Structural and functional roles • Structural: blood, muscle, bone, etc. • Functional: catalytic (enzyme), metabolic, neural, reproductive • Aberrant gene → malfunctioning protein → disease

Protein Conformation

HIV reverse transcriptase (figure labels: alpha helix, beta sheets)

Understanding protein folding

Driving Force for Protein Folding • Most important is the interaction of residues with water: hydrophobic and hydrophilic

Miyazawa-Jernigan Statistical Interaction

Li-Tang-Wingreen's representation of the MJ matrix: one-body and two-body terms

Theoretical analysis [Wang & Lee, PRL 84 (2000)]

Fit to one-body (a) and two-body (b) terms: MJ matrix vs. theory

Comparison with the MJ matrix: correct to first order; dominated by the one-body term (hydrophobicity)

Lattice Model • A simple way to learn something about a very complex subject

Lattice model • Represent space (or, in field theory, space-time) by a discrete lattice. • Represent a structure by a path on the lattice. • A peptide is a string of residues. • A peptide whose residues occupy a path is in a state, or has a conformation. • Residues may interact with each other according to their relative distance. • Or, in a mean-field model, each residue interacts only with the lattice site it occupies.

Random coil and compact path: putting a binary peptide on a 2D lattice. Binary representation of peptide: 0101011010010 110010110010

Mean-Field HP Model • The most important interaction for protein folding is that of residues with water: residues are hydrophobic or hydrophilic. • In a real protein in its native conformation, hydrophobic residues tend to be buried and hydrophilic residues tend to be exposed to water. • Simplest model: divide residues into hydrophobic and hydrophilic, and structure sites into core and surface. • Both peptide and structure are then binary sequences.

Structure-path on a 2D lattice: pay attention only to whether the path is on a core (1) or a surface (0) site. The structure then has a binary representation: 001100110000110000110011000011111100 (from Li et al. PRL 79 (1997) 765-768)
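The core/surface encoding above can be sketched in a few lines of Python. This is only an illustration: the helper names `structure_string` and `snake_path` are hypothetical, the row-by-row "snake" path is a convenient example rather than the path shown on the slide, and "core" is assumed to mean any site not on the outer boundary of the lattice.

```python
def structure_string(path, rows, cols):
    """Return the binary string of a path: '1' for a core site, '0' for a surface site."""
    def is_core(site):
        r, c = site
        # Assumption: a core site is any site not on the outer boundary.
        return 0 < r < rows - 1 and 0 < c < cols - 1
    return "".join("1" if is_core(site) else "0" for site in path)


def snake_path(rows, cols):
    """A simple compact path: sweep the lattice row by row, reversing direction each row."""
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


s = structure_string(snake_path(6, 6), 6, 6)
print(s)             # 000000011110011110011110011110000000
print(s.count("1"))  # 16 core sites on a 6x6 lattice
```

Like the 36-character string quoted from Li et al., the snake path on a 6x6 lattice contains sixteen 1s, one for each of the 4x4 interior sites.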

Designability of Structures • Very, very few structures are good for proteins

Structure space >> observed structures

Protein Designability

The LTW model: the ground state of peptide p is the structure s closest to it in n-dimensional hyperspace. All peptides in the Voronoi volume of s have s as their ground state.

The Hamiltonian H = ½ (p − s)² is a mapping of the set of peptides P to the set of structures S that partitions P into equivalence classes labeled by s in S. The target of each class is the ground state (conformation) of the class. The designability of a structure is the number of peptides in the class mapped to that structure.

Voronoi volume: in hyperspace, all peptide sequences within the Voronoi volume of a structure are closer to that structure than to any other (from Li et al. PRL (1997)).
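A minimal Python sketch of this counting follows, under several assumptions that are not from the slides: peptides are restricted to binary strings, paths sharing the same core/surface string are merged into one structure, a string and its reverse are treated as distinct, and peptides with a tied (degenerate) nearest structure are not counted. The function names are hypothetical.

```python
from itertools import product


def compact_paths(rows, cols):
    """Yield every directed self-avoiding path that visits all sites of a rows x cols lattice."""
    n = rows * cols

    def extend(path, visited):
        if len(path) == n:
            yield tuple(path)
            return
        r, c = path[-1]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                path.append((nr, nc))
                visited.add((nr, nc))
                yield from extend(path, visited)
                path.pop()
                visited.remove((nr, nc))

    for start in product(range(rows), range(cols)):
        yield from extend([start], {start})


def structures(rows, cols):
    """Distinct core(1)/surface(0) bit patterns of all compact paths, stored as integers."""
    found = set()
    for path in compact_paths(rows, cols):
        bits = 0
        for r, c in path:
            core = 0 < r < rows - 1 and 0 < c < cols - 1  # assumed definition of "core"
            bits = (bits << 1) | int(core)
        found.add(bits)
    return sorted(found)


def designability(rows, cols):
    """For each structure, count the binary peptides whose unique nearest structure it is."""
    n = rows * cols
    structs = structures(rows, cols)
    counts = {s: 0 for s in structs}
    for p in range(2 ** n):
        # For binary vectors (p - s)**2 is the Hamming distance, so H is half of it.
        dists = [bin(p ^ s).count("1") for s in structs]
        best = min(dists)
        if dists.count(best) == 1:  # skip peptides with a degenerate ground state
            counts[structs[dists.index(best)]] += 1
    return counts


print(sorted(designability(3, 3).values()))  # [16, 16, 16, 16, 16]
```

The 3x3 lattice has a single core site, so every structure comes out equally designable; it only exercises the bookkeeping. Any skew in the distribution can only show up on larger lattices, where the brute-force loop over all 2^n peptides quickly becomes the bottleneck.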

Number of structures vs. designability: very few structures have high designability. (Figure: number of structures plotted against designability; Li, Tang and Wingreen, PRL (1997))

Paths with high switchback numbers have high designability [Shih et al. & HCL, PRL 84 (2000)] • The shortest possible Hamming distance between two paths is proportional to the difference in their switchback numbers (n10) • Few paths have high n10 • A path with high n10 has a large Voronoi volume, hence high designability

High switchback number → high designability

Distribution of Hamming distances

Designability vs. n10: (a) 6x6 lattice (b) 21-site triangular lattice. Log of distribution vs. switchback number.
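A small Python check of the relation between Hamming distance and switchback number is sketched below. It assumes that n10 is the number of "10" substrings in the binary structure string (the slides do not spell out the definition), and it only verifies the weaker statement that the Hamming distance between two equal-length strings is never smaller than the difference in their n10.

```python
from itertools import product


def n10(s):
    """Switchback number, taken here as the number of '10' substrings (an assumption)."""
    return s.count("10")


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Brute-force check over all pairs of binary strings of length 8.
strings = ["".join(bits) for bits in product("01", repeat=8)]
bound_holds = all(
    hamming(a, b) >= abs(n10(a) - n10(b)) for a in strings for b in strings
)
print(bound_holds)                                   # True
print(n10("001100110000110000110011000011111100"))  # 6
```

The bound follows because each "10" present in one string but absent at the same position in the other forces at least one mismatch, and distinct "10" occurrences never share a position. A string with an unusually high n10 is therefore far, in Hamming distance, from the many strings with low n10.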

Foldability of Peptides • The vast majority of peptides do not fold

Alpha helices resemble paths with high switchback numbers • Conformational degeneracy disfavors peptides with long strings of identical or similar residues • Hence proteins rarely have long strings of contiguous hydrophobic or hydrophilic residues • Alternating short stretches of hydrophobic and hydrophilic residues yield structurally non-degenerate and robust conformations • The 0011 switchback motif simulates an alpha helix on the surface • Empirically, most alpha helices lie on the surface

Compare with real proteins [Shih et al. & HCL, PRE 65 (2002)] • Compare high-designability model peptides with protein sequences in the PDB binarized by hydrophobicity • Represent a peptide by the frequency of occurrence of all binary words of fixed length l = 2k • There are 2^(2k) such words; put their frequencies on a 2^k x 2^k lattice
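The word-frequency representation can be sketched as follows. The layout rule (first k bits select the row, last k bits the column), the use of overlapping words, and the cosine-style `overlap` measure are all assumptions made for illustration; the slides do not state which conventions were used in the published comparison.

```python
import math


def word_frequency_grid(seq, k):
    """Frequencies of all overlapping binary words of length 2k on a 2^k x 2^k grid."""
    length = 2 * k
    side = 2 ** k
    total = len(seq) - length + 1
    if total <= 0:
        raise ValueError("sequence is shorter than the word length")
    grid = [[0.0] * side for _ in range(side)]
    for i in range(total):
        word = seq[i:i + length]
        # Assumed layout: first k bits choose the row, last k bits the column.
        row = int(word[:k], 2)
        col = int(word[k:], 2)
        grid[row][col] += 1.0 / total
    return grid


def overlap(g1, g2):
    """Cosine similarity between two frequency grids (one possible overlap measure)."""
    a = [x for row in g1 for x in row]
    b = [x for row in g2 for x in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


helix_like = word_frequency_grid("0011001100110011", 2)
uniform = word_frequency_grid("0000000011111111", 2)
print(overlap(helix_like, helix_like))
print(overlap(helix_like, uniform))
```

With k = 2 each sequence becomes a 4x4 grid of 16 word frequencies, so sequences of different lengths can be compared on the same footing.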

Highly foldable peptides in the HP model resemble alpha helices in real proteins [Shih et al. PRL 84 (2000)]. (Figure: overlap of binary sequences vs. oligomer length; curves PDBAlpha-HP, PDBAll-PDBAlpha, PDBAll-HP, HP-LS)

In the HP model, peptides that fold into high-designability conformations correspond to peptides that fold into alpha helices in real proteins

Many models give designability, but not all are correct • Any Hamiltonian (H) is a mapping of peptide space (P) onto conformation space (C) • For coarse-grained C, H partitions P into equivalence classes, each class corresponding to a point in C • Designability results from a highly skewed distribution of the sizes of the classes • Example: the LS (Large-Small) model, in which structure is dominated by steric effects, with small residues inside and large residues outside. It has almost the same mathematics as the HP model and has designability, but the physics is wrong.

Highly foldable peptides in the LS model do not resemble alpha helices in real proteins [Shih et al. PRL 84 (2000)]. (Figure: overlap of binary sequences vs. oligomer length; curves PDBAll-PDBAlpha, PDBAll-LS, HP-LS, PDBAlpha-LS)

Unlike hydrophobicity, the steric effect does not play a dominant role in the determination of native structure

Folding Funnel and Free-energy Barrier • Why is folding so fast yet so slow?

Folding Funnel

Folding funnel (picture) http://www.npaci.edu/envision/v15.4/proteinfolding.html

Free Energy, Entropy and Monte Carlo: free energy and entropy

Free-energy barrier [Guan, Su, Shih & Lee (2000)] • (a) Binding energy increases with compactness • (b) Entropy is lost rapidly as binding energy increases • (c) Free-energy barrier is formed by competition between energy gain and entropy loss • G = (E − TS)/Enative (Figure axes: no. of contacts, |E/Enative|, log(S), G; curves in panel (c) labeled low T, annealing, high T, with the barrier marked)

Getting over the barrier takes all the folding time

Summary of lessons • The average hydrophobic/hydrophilic property of residues can be understood by simple physics. • The lattice model is useful for examining coarse-grained phenomena. • Long folding time is caused by the need to surmount a free-energy barrier formed by the rapid loss of entropy. • Designability of structure is a direct consequence of the hydrophobic/hydrophilic dichotomy of residues.

Summary of lessons (cont'd) • Very few structures are highly designable; those that are have large switchback numbers. • Very few peptides are foldable; many of those that are alternate rapidly between hydrophobic and hydrophilic residues. • Highly foldable peptides folded into high-designability structures form robust proteins. • They fold easily into alpha helices and, to a lesser extent, into beta sheets; hence alpha helices form very early in the folding process, followed by beta sheets.

Molecular Dynamics - atomistic description of protein folding • A one-gigaflop PC would take one million days to fold a medium-small protein

Massively Distributed Computation • Molecular dynamics. • Atomistic-level simulation is needed to understand protein folding and function relevant to biology and drug design • Annealing time is very long • Boltzmann probability: one machine x 1 M days = 1 M machines x one day • Starting a program of massively distributed computation: use a screen-saver program for simulation (the approach of Vijay Pande, Stanford)
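The "one machine x 1 M days = 1 M machines x one day" arithmetic can be illustrated with a small Monte Carlo sketch. It assumes, beyond what the slide states, that crossing the free-energy barrier is a memoryless event with an exponentially distributed waiting time, so that the first success among M independent runs arrives after about 1/M of the single-run time. The numbers (mean time 1000, 100 machines) are scaled down for speed and are purely illustrative.

```python
import random


def mean_first_success(mean_time, machines, trials, rng):
    """Average time until the first of `machines` independent runs crosses the barrier."""
    total = 0.0
    for _ in range(trials):
        # Each run waits an exponentially distributed time with the given mean.
        total += min(rng.expovariate(1.0 / mean_time) for _ in range(machines))
    return total / trials


rng = random.Random(0)
single = mean_first_success(1000.0, 1, 2000, rng)
many = mean_first_success(1000.0, 100, 2000, rng)
print(single, many, single / many)
```

The ratio printed at the end should come out close to the number of machines, which is the sense in which many short independent trajectories can stand in for one very long one under this assumption.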

The End. Thank you, everyone.
