NEW APPROACHES TO PROTEIN STRUCTURE PREDICTION AND DESIGN

NEW APPROACHES TO PROTEIN STRUCTURE PREDICTION AND DESIGN Joe DeBartolo

An overview of my thesis structure prediction • Why do prediction and design matter? • Structure Prediction. Growth of sequences outpaces experimental characterization. Knowing their structure provides insights into their function and interactions • Protein design. Understanding design principles can allow the creation of new proteins with therapeutic and industirial applications amino acid sequence native protein structure protein design

Protein structure prediction and design PART I ItFix: Homology-free structure prediction PART II SPEED: ItFix enhanced with evolution PART III Future directions in prediction PART IV Protein design

Protein structure prediction 1° structure MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR local 2° structure 2° and 3°structure topology diagram 3D model Residue-residue contact map

…LEKVQLN… native structure The Challenge: Distill the folding problem down to the basic principles, code them into an algorithm, and predict pathways and structure without using homology amino acid sequence

Capturing the interrelated forces of protein structure • Ramachandran angles • backbone hydrogen bonds y local structures f • local sterics • solvation • backbone entropy • long range sterics • Van der Waals • electrostatics • hydrophobic effect

The overlapping features of local protein structure turn β-strand α-helix backbone Ramachandran torsion angles 180 y ψ f -180 -180 180 -180 180 -180 180 φ φ φ backbone H-bonds polar polar amphipathic sidechain patterning apolar apolar mostly polar

Capturing the interrelated forces of protein structure • ramachandran angles • backbone hydrogen bonds • solvation y f long range effects • sterics • Van der Waals • electrostatics • long-range hydrogen bonding

3° packing specificity of the chain hydrophobic effect surface residue placement salt bridges and other favorable pairings solvent exposed residues apolar buried residues long-range hydrogen bonding contacts that are highly separated in sequence

The structure prediction challenge: To integrate all of these features into an algorithm y f requirements 180 X ψ  φ 180 -180 -180 a way to sample conformations a way to evaluate conformations

Sample Ramachandran space Rama map of PDB Rama angle pair exclude sidechains beyond Cβ 180 y f ψ Rama angle pairs describe entire conformation... NOsidechainrotamer sampling -180 -180 180 φ

1° and 2° structure information refines the Rama search space 2° structure 1° structure 180 Entire PDB ALL-ALL-ALL ALL-ALL-ALL ψ -180 180 add amino acid identity ALL-ASN-ALL ALL-ALL-ALL ψ -180 180 add neighbor identity ALL-ASN-GLY ALL-ALL-ALL ψ -180 180 add 2° structure identity ALL-ASN-GLY BETA-ALL-ALL ψ φ 180 -180 -180 φ φ φ 180 180 180 -180 -180 -180

The structure prediction challenge: To integrate all of these features into one algorithm y f requirements 180 X ψ  φ 180 -180 -180 a way to sample conformations a way to evaluate conformations

The DOPE statistical potential Discrete Optimized Potential Energy Knowledge-based modeling of the energy of a conformation The DOPE atom pair energy… residue jamino acid j atom type j residue iamino acid i atom type I rij is the distance between atoms i and j EnergyPDB(rij) = -ln( ProbPDB(rij) ) Shen and Sali, Proteins (2007) DOPE DOPE-PW GLU-Cβ- GLU-Cβ GLU-Cβ- GLU-Cβ LEU-Cβ- LEU-Cβ LEU-Cβ- LEU-Cβ PDB DOPE PW energy DOPE energy I have added to DOPE… • orientation dependence • 2° structure dependence • eliminate local biases Distance (Å) Distance (Å)

Ca Capturing sidechain orientation in a sidechain-free model residue 1 ρ1-2 is the angle between two vectors ρ1-2 High ρ (in-line) low ρ residue 1 Cb Cβ Cβ Cβ ρ2-1 ρ2-1 ρ1-2 Ca Cα Ca residue 2 , Cβ Ca residue 2 PW = r= DeBartolo et al. PNAS 2009

DOPE-PW (uniquely) captures the hydrophobic effect Potential orientations of high PW hydrophobic residues pairs have lower energy at smaller distances Cβ Cβ Cβ Cβ Cβ Cα Cα C α C α Cα C α Cb buried in the core GLU-Cβ GLU-Cβ LEU-CβLEU-Cβ DOPE energy Distance (Å) large distance preferred

DOPE-PW captures the amphipathic nature of β-sheets potential orientations of low PW polar and apolar residues prefer opposing sides of the β-sheet C α C β C β Cβ C β C α C α C α same side of β-sheet GLU-Cβ LYS-Cβ GLU-Cβ LEU-Cβ DOPE energy opposite side of β-sheet Distance (Å)

The challenge: To integrate all of these features into one algorithm y f requirements 180 X ψ  φ 180 -180 -180 a way to sample conformations a way to evaluate conformations

ItFix Iterative Fixing to reduce the conformational search sampling library 180° Starting configuration 1° only (no 2o structure restriction) “U” ψ Fold with (f,y) from LibraryInitial “I1” Remove trimers of lowly-populated 2o structure -180° Fold with (f,y) from LibraryRestricted 1 180° “I2” Fold with (f,y) from LibraryRestricted 2 ψ 2° structure option removed search space is restricted Remove trimers Repeat until no further fixing is possible -180° Final Round Fold with (f,y) from LibraryRestricted final 180° Repeat removal “N” ψ helix Not(Strand) -180° φ -180° 180° strand Coil subtypes Not(Helix) DeBartolo et al., PNAS 2009

Homology-free ItFix 2° and 3° structure prediction results Native ---HHHHHHHHHHHHHHH-----GGGHHHHHHHHHHHHHHHT---HHHHHHHHHH-TT-THHHHHHHH- ItFix ---HHHHHHHHHHHHHHHT-----S-HHHHHHHHHHHHHHHT-S--HHHHHHHHHT---HHHHHHHHH- SSPro ---HHHHHHHHHHHHHHHHHHE-TTHHHHHHHHHHHHHHHHT--HHHHHHHHHHT-TTHHHHHHHHHH- PSIPRED ---HHHHHHHHHHHHHHH-----HHHHHHHHHHHHHHHHHH----HHHHHHHHH----HHHHHHHHH-- 1af7 2.5 Å Native -HHHHHHHHHHHTT-SS--HHHHHHHHHHHT--HHHHHHHHHHHHHHHH- ItFix --HHHHHHHHHHHH-----HHHHHHHHHHHH--S-HHHHHHHHHHHHHH- SSPro -HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH-HHHHEEHEHHHHHHH-- PSIPRED -HHHHHHHHHHHHH-----HHHHHHHHHHHHHHHHHHHHHHH-HHHH--- 1b72 1.6 Å Native -EEEEEEEEETTTTEEEEE-TTS--EEEEGGGB-SSSS----TT-EEEEEEEEETTEEEEEEEEE-- ItFix -EEEEEEEE-STTTEEEEEEET-T-EEEEEEE--SSS-----TS--EEEEEEES--S----EEEEE- SSPro --TEEEEEE-TTTTEEEE--TT--EEEEEEEHEETTT--E--TT-EEEEEEEE-TT--E-EE----- PsiPred --EEEEEEEE----EEEEE-----EEEEEEE--------------EEEEEEEE-----EEEEEE--- 1csp 6.0 Å Native --BGGG---SEEEEE-TTS-EEEEEEHHHHHHHHHHTT-EEEEEETTSSS-EEEEE- ItFix -EEE-SSSSEEEEEE-TTS-EEEEEEHHHHHHHHHHHT--EEEE-TTSSS-EEEEE- SSPro --BBTEEE-EEEEEEETTT-EEEEE-HHHHHHHHHHHT--EEEE-TT----EEEE-- PSIPRED ----------EEEEE-----EEEEE-HHHHHHHHHH----EEEE-------EEEE-- 1tif 4.2 Å Native -HHHHHHHHHHHTT--HHHHHHHHTS-HHHHHHHHTTS-SS-TTHHHHHHHTT--HHHHH- ItFix -HHHHHHHHHHHHT--HHHHHHHHT--HHHHHHHHTT--SS----HHHHHHHT--HHHHH- SSPro ---HHHHHHHHHHHHHHHHHHHHHT-HHHHHHHHHTT-------HHHHHHHHHT--HHHH- PsiPred -HHHHHHHHHHH----HHHHHHHH---HHHHHHHH------HHHHHHHHHHH---HHHH-- 1r69 2.4 Å Native -EEEEEETTS-EEEEE--TTSBHHHHHHHHHHHH---GGGEEEEETTEE--TTSBTGGGT--TT-EEEEEE- ItFix -EEEEEETTS-EEEEEE---S-B-HHHHHHHHHSS---SSEEEEETT----TT-B----------EEEEEE- SSPro -EEEEEEETTEEEEEEE---SHHHHHHHHHHHTTT---T--E--ETT-E--TT-EEEEEE--TT-EEEEEE- PSIPRED -EEEEEE----EEEEEE-----HHHHHHHHHHHH---HHHEEEEE--EE------HHH-------EEEEEE- 1ubq 3.1 Å DeBartolo et al., PNAS 2009

Major pathway (from experiment) b1 b2 helix b4 b5 310b3 1 Mimicking folding pathways Unfolded state Round 0 0 1 b1-b2 hairpin Round 1 + b3 +helix 0 1 Round 2 + b4 + b3 0 1 2° Structure frequency Round 3 + b4 +helix 0 1 Round 4 +310 helix 0 1 Round 6 + b5 0 1 Round 9 Native state b1 b2 helix b4 b5 310b3 DeBartolo et al., PNAS 2009 0 73 residue index 1

Part I Conclusions Challenge: Distill the folding problem down to the basic principles, code them into an algorithm, and predict pathways and structure without using homology What novel about how we approached this challenge? • Use basic principles of protein structure and folding. • Search strategies: mimic true folding behavior • Coupled 2° & 3° structure formation • ii) Iterative fixing to reduce the search • iii) Outputs pathway information • Energy functions: orientational and 2° structure dependence

Protein structure prediction and design  PART I ItFix: Homology-free structure prediction PART II SPEED: ItFix enhanced with evolution PART III Future directions in prediction PART IV Protein design ψ φ Cover image of Protein Science, March 2010

SPEED: Structure Prediction Enhanced by Evolutionary Diversity Increase φ, ψ diversity and accuracy sequence database multiple sequence alignment target sequence IEIKIRDIYSKTYKFMA IEITCNDRLGKKVRVKC MRLFIRSHLHDQVVISA MKLSVKSPNGRIEIFNE LQFFVRLLDGKSVTLTF IEITLNDRLGKKIRVKC IEIWVNDHLSHRERIKC MDVFLMIRRQKTTIFDA IIVTVNDRLGTKAQIPA MRISVIKLDSTSFDVAV MNVNFRTILGKTYTITV MLLTVRDRSELTFSLQV MQIFVTTPSENVFGLEV MSLTIKF-GAKSIALSL MKYRIRTISNDEAVIEL … ~1000 sequences MQIFVKTLTGKTITLEV 180° ψ 180° ψ -180° 180° -180° φ homology-free sampling -180° SPEED sampling 180° -180° φ Uses sequence data base 107seq’s, growing fast; PDB only 104 structures growing slowly

ItFix-SPEED overview Homology-free SPEED 1tif position 4 {IND , IGD , VGN,…}MSA 1tif position 4 INE …AGTYEFRKAKIT… Final Rama distribution Round 2 Rama distribution Round 1 Rama distribution Multiple Sequence Alignment homology free 180° SPEED ψ Rama Distribution Fold 500x with Eradial -180° ItFix 180° Analyze 2° Structure Statistics ψ 2° structure converged no yes -180° 180° Final 2° Structure Fold 10000x with Eradial or DOPE-PW (all α) ψ -180° φ φ 180° -180° -180° 180° DeBartolo et al., Protein Sci. 2010

ItFix-SPEED overview Homology-free SPEED 1tif position 4 {IND , IGD , VGN,…}MSA 1tif position 4 INE …AGTYEFRKAKIT… Round 2 Rama distribution Final Rama distribution Round 1 Rama distribution Multiple Sequence Alignment homology free 180° SPEED ψ prediction Rama Distribution Fold 500x with Eradial -180° ItFix 180° Analyze 2° Structure Statistics ψ 2° structure converged no yes -180° 180° Final 2° Structure Fold 10000x with Eradial or DOPE-PW (all α) ψ cluster Refine 100X each with DOPE-PW Reject ∆Eradial> 0 min<Energy> 100 -180° φ φ 180° -180° -180° 180° Largest cluster DeBartolo et al., Protein Sci. 2010

Assaying accuracy Clustering predicts model accuracy and confidence fold ItFix predicted 2° structure cluster identify best cluster Global Accuracy Local Accuracy 1af7 1b72 1r69 (i.e. we know whether we got it right or wrong)

Performance in CASP8 T0482 (4.8 Å) Global Distance Test ItFix Cut-off Distance (Å) free modeling T0405 D1 (6.4 Å ) Cut-off Distance (Å) ItFix T0464 D1 (4.5 Å) loop insertion modeling RAPTOR Cut-off Distance (Å) AashishAdhikari ItFix T0429 D2 (6.8 Å) RAPTOR ItFix template identification using folding ItFix Cut-off Distance (Å) Better template DeBartolo et al., Protein Sci. 2010 Percentage of residues

Part II Conclusions • Adding evolutionary information to ItFix improves the accuracy of the conformational search • Clustering permits global and local prediction of cluster accuracy and uncertainty • SPEED is successful in the CASP8 experiment

Protein structure prediction and design  PART I ItFix: Homology-free structure prediction  PART II SPEED: ItFix enhanced with evolution  PART III Future directions in prediction PART IV Protein design 1° structure MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR

Invert the structure prediction problem local 2° structure 2° and 3°structure topology diagram 3D model 3D contacts 1° structure MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLR

Current designs are very similar to parent sequences Can we design a more unique protein sequence?

Design method 01010111 MKLFVKTP…LTVTIR LIV R E Restrict AA possibilities by burial in native structure for the hydrophobic effect Find best sequences for maximum Rama propensity DOPE-PW GLU-Cβ- GLU-Cβ positional sequence library LEU-Cβ- LEU-Cβ 2 1 DOPE PW energy 3 Monte Carlo search of Statistical Potential Distance (Å)

Hello Jello

Preliminary wetlab analysis • 1ds0 expresses in inclusion bodies • mutations enhance in vitro solubility • further experiments needed insoluble at 3 hrs soluble at 3 hrs insoluble at induce soluble at induce cd design design-sol native wavelength (nm)

Thesis defense Conclusions • Homology-free structure prediction can provide accurate models by mimicking folding pathways • Adding evolutionary information improves the accuracy of the conformational search • Inverting our homology-free prediction method into a design algorithm aims to generate unique amino acid sequences

Acknowledgements Prof. Tobin Sosnick Prof. Karl Freed Prof. JinboXu Glen Hocky Andres Colubri James Fitzgerald AbhishekJha EsmaelHaddadian James Hinshaw AashishAdhikari Jouko Virtanen Chloe Antoniou Josiah Zayner Feng Zhao JianPeng GrzegorzGawlak SrikanthAravamuthan • Funding: NIH, NSF, Joint Theory Institute

Enhancement of Ramachandran propensity Native Rama probability ψ φ AA SecStr position • Enhancement in energy and structure prediction • ∆∆E = -120 (arb. units) • 2X enhancement in native-like models in prediction

1b72 1.6 Å 1af7 2.7 Å 1r69 2.4 Å 1di2 4.6 Å 1 Round 0 Round 0 Round 0 Round 0 0 1 Round 1 Round 1 Round 1 Round 1 0 1 Round 2 Round 2 Round 2 Round 2 Secondary Structure frequency 0 1 Round 4 Round 3 Round 3 Round 3 0 1 Round 6 Round 4 Round 5 Round 4 0 1 Round 6 Round 7 Round 6 Round 8 0 residue index

SPEED increases the native Rama probability native Rama regions 180 SPEED reduces cases where native φ, ψ has a very low probability 2 SPEED improves native φ, ψ probability across sequence % positions with PNative > 0.25 ψ 3 1 1b72 4 -180 -180 180 φ Native basin probability PDB id of target 2° structure by position Amino acid by position

Radial energy terms enforce productive chain collapse (global terms) Rg-phil Rg-Cα: Root-squared distance of Cα from CM. Compactness of model Ru-Cα: Root-mean-squared deviation of Cα from CM. Enforces a spherical model Rg-phob/Rg-phil (burial ratio): best packing of hydrophobic residues Rg-phob CMCα Rg-Cα Cα Cβ

Eliminating the fixing thresholds from ItFix (e.g. pos. 67) 180 MQIFVKT…STLHLVLR 0 round0 Rama distribution -180 180 fold 2000X fold 2000X fold 2000X 0 Rama distribution Rama distribution Rama distribution round1 -180 180 0 round2 -180 180 0 round3 -180 180 -180 0

An evolution-enhanced energy function DOPE-PW-SPEED 10 8 6 4 WT:ILE Homologs: polar 2 DOPE-PW DOPE-PW-SPEED PHE4 0 0.0 15.0 25.0 10.0 20.0 30.0 energy 5.0 WT:Ala Homologs: polar THR14 10 8 6 distance (Å) 4 DOPE-PW DOPE-PW-SPEED 2 0 energy 0.0 6.0 14.0 16.0 10.0 4.0 8.0 12.0 2.0 distance (Å)

NEW APPROACHES TO PROTEIN STRUCTURE PREDICTION AND DESIGN

NEW APPROACHES TO PROTEIN STRUCTURE PREDICTION AND DESIGN

Presentation Transcript

Protein structure prediction

Protein Structure Prediction

Protein structure prediction

Protein Structure Prediction

Protein Structure Prediction

Protein structure prediction

Protein Structure Prediction

Protein structure prediction

Protein structure prediction

Protein Structure and Prediction

Protein Structure Prediction

Protein Structure Prediction

Protein Structure Prediction

Protein Structure Prediction

Protein Folding Protein Structure Prediction Protein Design

A new approach to protein structure prediction

Protein structure prediction

Protein structure prediction

Protein Structure Prediction

Protein Structure Prediction

Protein Structure Prediction

Protein Structure Prediction