OSPREY Tutorial. Ivelin Georgiev, Bruce Donald. Donald Lab, Duke University.
Distribution of Structures
• Maximum likelihood (pick most probable): Global Minimum Energy Conformation, min(E)
• 'Bayesian' (weighted average over all conformations): (1/Z) ∫
• Probability ↔ Energy using Boltzmann distribution
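The two views on this slide can be compared on a toy example. This is a minimal sketch, not OSPREY code: the three energies (in kcal/mol), the temperature of 298 K, and the function name `boltzmann_summary` are illustrative assumptions.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def boltzmann_summary(energies, temperature=298.0):
    """Return (index of the GMEC, Boltzmann probabilities, ensemble-average energy)."""
    rt = R_KCAL * temperature
    e_min = min(energies)
    # Shift by the minimum energy so the exponentials stay well-scaled;
    # the shift cancels when the weights are normalized by Z.
    weights = [math.exp(-(e - e_min) / rt) for e in energies]
    z = sum(weights)
    probs = [w / z for w in weights]
    gmec = energies.index(e_min)  # maximum-likelihood pick
    avg_e = sum(p * e for p, e in zip(probs, energies))  # weighted average
    return gmec, probs, avg_e

# Hypothetical conformation energies, for illustration only.
gmec, probs, avg_e = boltzmann_summary([-10.0, -9.5, -7.0])
print("GMEC index:", gmec)
print("probabilities:", [round(p, 3) for p in probs])
print("ensemble-average energy:", round(avg_e, 2))
```

The GMEC keeps only the single most probable conformation, while the weighted average also counts the low-energy conformations near it.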
K*: provably accurate approximation to the binding constant via conformational ensembles
• Maximum likelihood (GMEC): traditional DEE, MinDEE, BD
• Weighted average, (1/Z) ∫: K*
• Application: enzyme-ligand binding
K* over thousands of sequences (s1, s2, …, si, …, sk)
• MinDEE and BD: pruned conformations
• A*: only a fraction of the conformations is evaluated
• Partition function: ε-approximation, accurate to a factor of (1 − ε)
J. Comp. Chem. (2008)
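The idea of evaluating only a fraction of the conformations can be sketched as follows. This is an illustration under stated assumptions, not the K* implementation: it assumes the energies are already sorted in ascending order (as an A*-style enumeration would supply them), ignores pruned conformations and minimization bounds, and uses a hypothetical function name and made-up energies.

```python
import math

RT = 0.0019872 * 298.0  # kcal/mol at an assumed temperature of 298 K

def partial_partition_function(sorted_energies, epsilon):
    """Sum Boltzmann weights in ascending-energy order and stop once the
    unevaluated tail can no longer matter at the (1 - epsilon) level.

    Returns (partial sum q_star, number of conformations evaluated).
    """
    n = len(sorted_energies)
    q_star = 0.0
    for k, e in enumerate(sorted_energies, start=1):
        w = math.exp(-e / RT)
        q_star += w
        # Each remaining conformation has energy >= e, so weight <= w.
        tail_bound = (n - k) * w
        # q_star >= (1 - eps) * (q_star + tail) holds whenever
        # tail <= eps / (1 - eps) * q_star.
        if tail_bound <= epsilon / (1.0 - epsilon) * q_star:
            return q_star, k
    return q_star, n

# Hypothetical energies (kcal/mol), already sorted.
energies = [-5.0, -4.8, -4.0, -1.0, 0.0, 2.0, 3.0, 5.0]
q_star, evaluated = partial_partition_function(energies, epsilon=0.03)
print(f"evaluated {evaluated} of {len(energies)} conformations")
```

The stopping test only needs the current energy and the count of remaining conformations, which is why a sorted enumeration makes the guarantee cheap to check.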
Example (PNAS, 2009): Cheng-Yu Chen, Ivelin Georgiev, Amy Anderson, Bruce Donald
Nonribosomal Peptide Synthetases (NRPS)
• NRPS enzymes are found in some fungi and bacteria
• NRPS enzymes make peptide-like products with pharmaceutical properties (antifungal, antineoplastic, antibacterial), e.g. vancomycin, penicillin, gramicidin, bacitracin, cyclosporin, bleomycin, …
• NRPS are similar to PKS
[Figure label: FPVOL]
NRPS: GrsA-PheA Redesign (gramicidin S), Phe → Leu
Protein Redesign (NRPS) Three-dimensional structure of GrsA PheA domain [Conti et al., 1997]
Change specificity from Phe to Leu by allowing any 2 (of 9) mutations (r = 9, s = 2)
• Mutations to GAVLIFYWM
• Approx. 3,000 mutation sequences = 680,000,000 conformations (78,200 after pruning)
[Figure: Leu, with its −CO2 and +H3N groups labeled]
Crystal Structure: 1amu (1.9 Å), 563 a.a., 65 kD
[Figure: active site with AMP and labeled residues D235, A236, W239, T278, I299, A301, A322, I330, C331, (K517)]
Three-Step Enzyme Redesign (Ivelin Georgiev, Cheng-Yu Chen)
1. K*: active site mutations (provable)
2. Entropy step: mutatable positions (heuristic)
3. MinDEE: bolstering mutations (provable)
Computational Structure-Based Redesign of Enzyme Activity. PNAS (2009)
T278L/A301G with Leu
• #1
• 3,000 sequences
• 6.8 × 10^8 rotameric conformations
[Figure labels: AMP, K517]
PNAS (2009)
Mutations Outside the Active Site
• Workflow labels: SCMF, rotamer probabilities, AA probabilities, Boltzmann residue entropy, mutatable positions, MinDEE
• Residues labeled in the figure: F45, V187, I207, L210, V238, I277, S447
PNAS (2009)
All top 10
• 3,000 sequences
• 6.8 × 10^8 rotameric conformations
[Plot: A301G/T278L; axis labels: [L-Leu] (mM), normalized kcat/KM; series: Leu, Phe]
PNAS (2009)
T278D/A301G with Arg
• Arg: #1 of 2511 sequences
• Lys: #4 of 2511 sequences
• >9 × 10^8 conformations
[Figure labels: L-Arg, L-Lys, WT, AMP, D235, K517, W239, 301G, 278D; plot axis: [L-Arg] (mM)]
PNAS (2009)
Installation • Setup • Running OSPREY
Installation
• Java, mpiJava, MPICH2
• 32-bit: √
• 64-bit: may require special instructions
Setup: Compute Nodes • Input Structure • Rotamer Library • Energy Function
Compute Nodes
• Select MPI nodes: linux1 linux2 linux3 linux4 linux5
  mpdboot:
    mpdboot -n 5 -f mpd.hosts
• Select job-specific nodes: linux1 linux1 linux1 linux2 linux3 linux3
  mpirun java OSPREY:
    mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg
Input Structure
REMARK 470 MISSING ATOM
REMARK 470 THE FOLLOWING RESIDUES HAVE MISSING ATOMS (M=MODEL NUMBER;
REMARK 470 RES=RESIDUE NAME; C=CHAIN IDENTIFIER; SSEQ=SEQUENCE NUMBER;
REMARK 470 I=INSERTION CODE):
REMARK 470 M RES CSSEQI ATOMS
REMARK 470 GLU A 34 CG CD OE1 OE2
REMARK 470 GLU A 63 CD OE1 OE2
• Missing atoms (KiNG): model (possible over-constraint) or delete (possible under-constraint)
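Residues with missing atoms can be listed before deciding whether to model or delete them. The sketch below is an assumption-laden helper, not part of OSPREY: `missing_atoms` is a hypothetical name, and it splits REMARK 470 lines on whitespace, which is a simplification of the fixed-column PDB format (it can fail when the chain ID and a four-digit residue number run together).

```python
import re

def missing_atoms(pdb_lines):
    """Collect (residue name, chain, residue number, [atom names]) from REMARK 470 lines."""
    found = []
    for line in pdb_lines:
        if not line.startswith("REMARK 470"):
            continue
        fields = line.split()[2:]
        # An optional model number may precede the residue name.
        if fields and fields[0].isdigit():
            fields = fields[1:]
        if len(fields) < 4:
            continue  # header text such as "MISSING ATOM"
        res, chain, seq = fields[0], fields[1], fields[2]
        # Skip the explanatory header lines: they have no one-letter chain
        # followed by a residue number (optionally with an insertion code).
        if len(chain) != 1 or not re.fullmatch(r"-?\d+[A-Za-z]?", seq):
            continue
        found.append((res, chain, seq, fields[3:]))
    return found

# Lines taken from the slide above.
sample = [
    "REMARK 470 MISSING ATOM",
    "REMARK 470 M RES CSSEQI ATOMS",
    "REMARK 470 GLU A 34 CG CD OE1 OE2",
    "REMARK 470 GLU A 63 CD OE1 OE2",
]
for res, chain, seq, atoms in missing_atoms(sample):
    print(res, chain, seq, "missing:", " ".join(atoms))
```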
Input Structure: adding hydrogens
• Proteins: recommended MolProbity
• General compounds: recommended Accelrys DS Visualizer
• Check: protonation states, missing protons
Input Structure: His residues
• HIP (protonated on both ring nitrogens)
• HIE (protonated on Nε)
• HID (protonated on Nδ)
Input Structure steric shell • close to design site • significant speedup
Input Structure: other considerations
• protein, ligand, cofactor
• ligand: natural AA, small molecule
• water molecules
• no chain IDs
• unique residue numbers
• protein-peptide, protein-protein
• connectivity (good input structures)
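Two of the considerations above (no chain IDs, unique residue numbers) are easy to check automatically. The following is a rough sketch, assuming standard fixed-column ATOM/HETATM records; `check_pdb` and `atom_line` are hypothetical helpers written for this example and are not part of OSPREY.

```python
def check_pdb(lines):
    """Return a list of problems: chain IDs present, residue numbers reused."""
    problems = []
    chains = set()
    seen_numbers = set()
    previous = None
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        line = line.ljust(27)
        res_name = line[17:20].strip()  # PDB columns 18-20
        chain = line[21]                # PDB column 22
        res_seq = line[22:26].strip()   # PDB columns 23-26
        if chain != " ":
            chains.add(chain)
        key = (res_name, res_seq)
        if key != previous:  # a new residue starts here
            if res_seq in seen_numbers:
                problems.append(f"residue number {res_seq} is used more than once")
            seen_numbers.add(res_seq)
            previous = key
    if chains:
        problems.append("chain IDs present: " + ", ".join(sorted(chains)))
    return problems

def atom_line(serial, name, res_name, chain, res_seq):
    """Build a truncated ATOM record with the fields in their PDB columns."""
    return f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}"

# Made-up records: chain "A" is set and residue number 1 appears twice.
example = [
    atom_line(1, "N", "ALA", "A", 1),
    atom_line(2, "N", "GLY", "A", 2),
    atom_line(3, "N", "LEU", "A", 1),
]
for problem in check_pdb(example):
    print(problem)
```

A residue number counts as reused when it shows up again after a different residue has started, so consecutive atoms of the same residue are not flagged.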
Input Structure Check and double-check!!!
Rotamer Library
• Proteins: Richardsons' Penultimate rotamers
• General compounds
Entry layout: name, # dihed, # rot; then the atoms of each dihedral; then one rotamer per row
TYR 2 4
N CA CB CG
CA CB CG CD1
62 90
-177 80
-65 -85
-65 -30
Same entry with a fifth rotamer:
TYR 2 5
N CA CB CG
CA CB CG CD1
62 90
-177 80
-65 -85
-65 -30
-65 -45
General compound:
FCL 2 4
N CA CB CG
CA CB CG CD1
62 90
-177 80
-65 -85
-65 -30
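A rotamer entry like the TYR example can be read with a few lines of code. This sketch assumes the layout suggested by the slide (a header with name, number of dihedrals, and number of rotamers; one line of four atom names per dihedral; one line of angles per rotamer). The function name `parse_rotamer_entry` is hypothetical, and the real file may carry additional header lines.

```python
def parse_rotamer_entry(lines):
    """Parse one rotamer-library entry given as a list of text lines."""
    name, n_dihed, n_rot = lines[0].split()
    n_dihed, n_rot = int(n_dihed), int(n_rot)
    # One line per dihedral, each naming the four atoms that define it.
    dihedrals = [tuple(line.split()) for line in lines[1:1 + n_dihed]]
    # One line per rotamer, each giving one angle per dihedral.
    rot_lines = lines[1 + n_dihed:1 + n_dihed + n_rot]
    rotamers = [tuple(float(x) for x in line.split()) for line in rot_lines]
    if len(rotamers) != n_rot or any(len(r) != n_dihed for r in rotamers):
        raise ValueError(f"{name}: rotamer rows do not match the header counts")
    return {"name": name, "dihedrals": dihedrals, "rotamers": rotamers}

entry = parse_rotamer_entry([
    "TYR 2 4",
    "N CA CB CG",
    "CA CB CG CD1",
    "62 90",
    "-177 80",
    "-65 -85",
    "-65 -30",
])
print(entry["name"], len(entry["rotamers"]), "rotamers")
```

Adding a rotamer then means incrementing the rotamer count in the header and appending one row of angles; the length check catches a header that was not updated.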
Energy Function
• parm96a.dat: atom types, dihedral parameters, vdW parameters
  add params for new atom types (antechamber)
• all_amino94X.in: amino acids, partial charges, connectivity
  typically no changes
• all_nuc94_and_gr.in: general compounds, partial charges, connectivity
  add params for new compounds (antechamber); can modify partial charges
• User control: distance-dependent dielectric, dielectric value, vdW radii scaling, solvation energy scaling, dihedral energies switch
Running OSPREY: GMEC-based • Ensemble-based • Residue entropy
GMEC-based: doDEE
mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg doDEE System.cfg DEE.cfg
• Inputs: input structure, rotamer library, energy function, mutation search parameters
• doDEE: energy minimization (MinDEE, BD, BRDEE), DACS
Output:
1 MET GLY ASP ARG FCL 6 0 2 18 3 unMinE: -273.75 minE: -273.75 bestE: -273.75
2 MET GLY ASP MET FCL 6 0 2 6 3 unMinE: -271.96 minE: -271.96 bestE: -273.75
3 MET GLY ASP ARG FCL 6 0 2 18 3 unMinE: -271.78 minE: -271.78 bestE: -273.75
1 MET GLY SER ARG FCL 6 3 2 18 2 unMinE: -276.50 minE: -276.50 bestE: -276.50
2 MET GLY SER ARG FCL 6 3 1 18 2 unMinE: -276.42 minE: -276.42 bestE: -276.50
GMEC-based: genStructDEE
java -Xmx1024M KStar -c KStar.cfg genStructDEE System.cfg GenStruct.cfg
• Inputs: input structure, rotamer library, energy function, struct generation parameters
• genStructDEE: energy minimization (MinDEE, BD, BRDEE)
Unranked conformations:
1 MET GLY ASP ARG FCL 6 0 2 18 3 unMinE: -273.75 minE: -273.75 bestE: -273.75
2 MET GLY ASP MET FCL 6 0 2 6 3 unMinE: -271.96 minE: -271.96 bestE: -273.75
3 MET GLY ASP ARG FCL 6 0 2 18 3 unMinE: -271.78 minE: -271.78 bestE: -273.75
1 MET GLY SER ARG FCL 6 3 2 18 2 unMinE: -276.50 minE: -276.50 bestE: -276.50
2 MET GLY SER ARG FCL 6 3 1 18 2 unMinE: -276.42 minE: -276.42 bestE: -276.50
After ranking:
1 MET GLY SER ARG FCL 6 3 2 18 2 unMinE: -276.50 minE: -276.50 bestE: -276.50
2 MET GLY SER ARG FCL 6 3 1 18 2 unMinE: -276.42 minE: -276.42 bestE: -276.50
3 MET GLY ASP ARG FCL 6 0 2 18 3 unMinE: -273.75 minE: -273.75 bestE: -273.75
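The ranking step on this slide can be mimicked outside OSPREY by sorting the result lines by energy. This is a minimal sketch with assumptions: the helper names are hypothetical, the alphabetic tokens are treated as residue names and the integer tokens are kept as-is without interpreting them, and sorting by `minE` is a choice made for this example.

```python
import re

# Matches the three labeled energies on a result line.
ENERGY = re.compile(r"(unMinE|minE|bestE):\s*(-?\d+(?:\.\d+)?)")

def parse_conf_line(line):
    """Split one result line into residue names, integer fields, and energies."""
    head = line.split("unMinE:")[0].split()
    record = {
        "residues": [t for t in head[1:] if t.isalpha()],
        "numbers": [int(t) for t in head[1:] if t.isdigit()],
    }
    record.update({key: float(value) for key, value in ENERGY.findall(line)})
    return record

def rank_by_min_energy(lines):
    """Return parsed records sorted from lowest to highest minE."""
    return sorted((parse_conf_line(l) for l in lines), key=lambda r: r["minE"])

# Result lines copied from the slide.
results = [
    "1 MET GLY ASP ARG FCL 6 0 2 18 3 unMinE: -273.75 minE: -273.75 bestE: -273.75",
    "2 MET GLY ASP MET FCL 6 0 2 6 3 unMinE: -271.96 minE: -271.96 bestE: -273.75",
    "1 MET GLY SER ARG FCL 6 3 2 18 2 unMinE: -276.50 minE: -276.50 bestE: -276.50",
    "2 MET GLY SER ARG FCL 6 3 1 18 2 unMinE: -276.42 minE: -276.42 bestE: -276.50",
]
for rank, rec in enumerate(rank_by_min_energy(results), start=1):
    print(rank, " ".join(rec["residues"]), rec["minE"])
```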
Ensemble-based: Protein-ligand binding (KSMaster)
mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi KSMaster System.cfg MutSearch.cfg
• Inputs: bound structure, rotamer library, energy function, K* mutation search parameters
• KSMaster: energy minimization (MinDEE, BD, BRDEE)
• doSinglePartFn
Output (rank, score, sequence):
1 4.25E+24 ILE TRP ILE ALA ALA ILE
2 3.12E+24 TRP ASP ILE GLY ALA ILE
3 2.18E+24 ILE THR ILE PHE ALA ILE
4 1.45E+24 VAL THR ILE PHE ALA ILE
5 1.41E+24 ILE THR ILE TYR ALA ILE
Residue entropy: doResEntropy
mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi doResEntropy System.cfg ResEntropy.cfg
• Inputs: input structure, rotamer library, energy function, mutation search parameters
Output columns: res ID | entropy | AA probabilities (19 values) | # prox res
257 | 2.33 | 0.2 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.1 0.0 0.0 0.0 0.2 0.0 0.1 | 18
481 | 2.29 | 0.2 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.2 0.0 0.0 0.0 0.1 0.0 0.1 | 15
32 | 2.29 | 0.3 0.0 0.1 0.1 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.1 | 23
26 | 2.28 | 0.2 0.0 0.1 0.0 0.0 0.1 0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.2 0.0 0.0 0.1 0.1 0.0 | 29
163 | 2.26 | 0.3 0.0 0.1 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.0 0.0 0.0 0.1 0.0 0.1 0.0 0.1 | 22
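How an entropy value follows from a row of amino-acid probabilities can be illustrated with the Shannon form, S = −Σ p ln p. This is a sketch under assumptions: the exact definition and logarithm base used by doResEntropy are not stated on the slide, and the printed probabilities are rounded to one decimal, so values computed from them need not match the entropy column.

```python
import math

def residue_entropy(probabilities):
    """Shannon entropy (natural log) of a distribution over amino-acid types."""
    total = sum(probabilities)
    # Renormalize, since rounded probabilities rarely sum to exactly 1.
    normalized = [p / total for p in probabilities]
    return -sum(p * math.log(p) for p in normalized if p > 0.0)

# A position that tolerates every type equally has the highest entropy...
uniform = [1.0 / 19] * 19
# ...while a position locked to one type has zero entropy.
fixed = [1.0] + [0.0] * 18

print(round(residue_entropy(uniform), 3), residue_entropy(fixed) == 0.0)
```

Positions with high entropy tolerate many amino-acid types, which is the sense in which the entropy step selects mutatable positions.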
Some important parameters: KStar.cfg
mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg doDEE System.cfg DEE.cfg
Energy function:
  hElect true
  hVDW false
  hSteric false
  distDepDielect true
  dielectConst 6.0
  vdwMult 0.95
  doDihedE true
  doSolvationE true
  solvScale 0.8
Steric filter:
  stericThresh 0.4
  softStericThresh 1.5
Rotamer libraries:
  rotFile LovellRotamer.dat
  grotFile GenericRotamers.dat
Volume filter:
  volFile AAVolumes.dat
Some important parameters: System.cfg
mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg doDEE System.cfg DEE.cfg
Input pdb:
  pdbName 1amuFH.pdb
Design site:
  numInAS 4
  residueMap 239 278 299 301
Ligand:
  pdbLigNum 566
  ligAA false
Cofactor:
  numCofRes 1
  cofMap 567
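Since the configuration files are plain "key value …" lines, a quick consistency check before launching a long run is cheap. The sketch below is not part of OSPREY: `parse_cfg` and `check_system_cfg` are hypothetical helpers, and the two rules checked (numInAS equals the number of residueMap entries, numCofRes equals the number of cofMap entries) are inferred from the example values on the slide.

```python
def parse_cfg(text):
    """Read 'key value ...' lines into a dict mapping key -> list of value strings."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, *values = line.split()
        params[key] = values
    return params

def check_system_cfg(params):
    """Return a list of inconsistencies between counts and their residue lists."""
    problems = []
    for count_key, list_key in (("numInAS", "residueMap"), ("numCofRes", "cofMap")):
        if count_key in params and list_key in params:
            declared = int(params[count_key][0])
            listed = len(params[list_key])
            if declared != listed:
                problems.append(f"{count_key} is {declared} but {list_key} lists {listed}")
    return problems

# Values copied from the System.cfg slide.
SYSTEM_CFG = """
pdbName 1amuFH.pdb
numInAS 4
residueMap 239 278 299 301
pdbLigNum 566
ligAA false
numCofRes 1
cofMap 567
"""
cfg = parse_cfg(SYSTEM_CFG)
print(check_system_cfg(cfg) or "System.cfg counts are consistent")
```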
Some important parameters: DEE.cfg (partial)
mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi -c KStar.cfg doDEE System.cfg DEE.cfg
DACS:
  doDACS true
  distrDACS false
  initDepth 2
  subDepth 1
  diffFact 6
Minimization:
  doMinimize false
  minimizeBB false
  doBackrubs false
  backrubFile none
Reference energies:
  useEref true
Ligand in search:
  ligPresent false
  ligType none
Allowed mutations:
  resAllowed0 gly ala val leu ile tyr phe trp met
  …
  resAllowed3 gly ala val leu ile tyr phe trp met
Resuming:
  resumeSearch false
  resumeFilename runInfo.out.partial
Some important parameters: MutSearch.cfg (partial)
mpirun -machinefile ./machines -np 5 java -Xmx1024M KStar mpi KSMaster System.cfg MutSearch.cfg
Volume filter / candidate mutants:
  mutFileName 1amuFCL_2MUT.mut
  numMutations 2
  targetVolume 620.0
  volumeWindow 100000000.0
Minimization:
  doMinimize false
  minimizeBB false
  doBackrubs false
  backrubFile none
(1 − ε) accuracy:
  epsilon 0.03
Inter-mutation:
  gamma 0.01
At most 1 repeat:
  repeatSearch true
Unbound struct:
  useUnboundStruct false
  unboundPdbName none
Allowed mutations:
  resAllowed0 gly ala val leu ile tyr phe trp met
Resuming:
  resumeSearch false
  resumeFilename 1amuFCL_MutSearch.partial
Citing OSPREY
• General citation:
• K* and MinDEE:
• BD:
• BRDEE:
• DACS:
• Original K* publication:
Acknowledgements: Bruce Donald, Ryan Lilien, Faisal Reza, Kyle Roberts, Daniel Keedy, Pablo Gainza, Donald Lab
Funding:
• NIH