Comparative modeling
Ole Lund, Associate Professor, CBS, BioCentrum, DTU
Comparative modeling
• Also known as homology modeling
• Uses a template from a related protein to build the model
• Based on the finding that
  • Protein structures tend to remain approximately the same even when many amino acids have changed during evolution
    • Selection for conservation of structure?
  • Proteins with similar sequences often have similar structures
OL
Why make structural models?
• Fast and cheap alternative to experimental determination of structures (X-ray & NMR)
  • Not as accurate as experimental methods
  • Not all proteins can be modeled with current methods
• Applications
  • Drug discovery (requires an accurate model)
  • Planning new experiments (mutations)
  • Understanding of function
Steps in comparative modeling
• Find template
• Make alignment
• Build loops
• Model side chains
• Refinement
• Evaluate model
Recovery from errors
• An error made at an earlier step normally cannot be recovered at a later step
  • The alignment cannot make up for a wrong choice of template
  • Loop modeling cannot make up for a wrong alignment
• Errors may be discovered at a later step and corrected by going back to the step where they were made
  • e.g. by selecting a new (and better) template
Template identification
• Search with sequence
  • Blast
  • Psi-Blast
  • Fold recognition methods
• Use significance levels (P or E values), not %ID
  • BLAST reports E-values:
    • the number of hits with at least a given score expected to be found by chance
  • Rather than P values:
    • the probability of finding at least one hit with at least a given score by chance
  • P = 1 - exp(-E)
  • E = -ln(1 - P)
  • http://www.ncbi.nlm.nih.gov/BLAST/tutorial/Altschul-1.html
• Use biological information
  • Functional annotation in databases
  • Active site/motifs
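The relation between E-values and P-values can be checked numerically. The following is a minimal sketch, not part of BLAST itself; the function names `e_to_p` and `p_to_e` are made up for illustration. It uses P = 1 - exp(-E) and its inverse E = -ln(1 - P).

```python
import math


def e_to_p(e_value: float) -> float:
    """Convert an E-value to a P-value: P = 1 - exp(-E)."""
    if e_value < 0.0:
        raise ValueError("E-value must be non-negative")
    # expm1 keeps precision for the very small E-values BLAST often reports.
    return -math.expm1(-e_value)


def p_to_e(p_value: float) -> float:
    """Convert a P-value to an E-value: E = -ln(1 - P)."""
    if not 0.0 <= p_value < 1.0:
        raise ValueError("P-value must be in [0, 1)")
    return -math.log1p(-p_value)


if __name__ == "__main__":
    for e in (7e-30, 0.01, 1.0, 10.0):
        print(f"E = {e:g}  ->  P = {e_to_p(e):g}")
```

For small values E and P are nearly identical, so the distinction only matters for weak hits (E around 0.1 or larger).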
Example: Query sequence
>gi|2065035|emb|CAA65601.1| beta-lactamase [Chryseobacterium meningosepticum]
MLKKIKISLILALGLTSLQAFGQENPDVKIEKLKDNLYVYTTYNTFNGTKYAANAVYLVTDKGVVVIDCP
WGEDKFKSFTDEIYKKHGKKVIMNIATHSHDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFD
NNKSFKVGKSEFQVYYPGKGHTADNVVVWFPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSVHNIQQ
KFSGAQYVVAGHDDWKDQRSIQHTLDLINEYQQKQKASN
Since the discovery of penicillin, bacteria have developed defense mechanisms against such drugs. This has become a particular problem in recent decades, as certain pathogenic bacteria have become resistant to antibiotics. The primary defense mechanism is the production of beta-lactamases, enzymes that cleave beta-lactam antibiotics.
http://www.matfys.kvl.dk/~antony/
Blast search vs. pdb
http://www.ncbi.nlm.nih.gov/blast/
>gi|3318914|pdb|1A7T|A Chain A, Metallo-Beta-Lactamase With Mes
 gi|3318915|pdb|1A7T|B Chain B, Metallo-Beta-Lactamase With Mes
 gi|3891997|pdb|1A8T|A Chain A, Metallo-Beta-Lactamase In Complex With L-159,061
 gi|3891998|pdb|1A8T|B Chain B, Metallo-Beta-Lactamase In Complex With L-159,061
 Length = 232
 Score = 126 bits (317), Expect = 7e-30
 Identities = 62/216 (28%), Positives = 111/216 (51%), Gaps = 1/216 (0%)
Query: 27  DVKIEKLKDNLYVYTTYNTFNG-TKYAANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYK 85
           D+ I +L D +Y Y +     G     +N + ++ +    ++D P  + + +   + +
Sbjct: 10  DISITQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTD 69
Query: 86  KHGKKVIMNIATHSHDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSF 145
               KV   I  H H D  GGL Y  + G ++Y+ +MT  +  ++  P  ++ F ++ +
Sbjct: 70  SLHAKVTTFIPNHWHGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTV 129
Query: 146 KVGKSEFQVYYPGKGHTADNVVVWFPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSV 205
            +     Q YY G GH  DN+VVW P E +L GGC++K   +  +G I +A V  W +++
Sbjct: 130 SLDGMPLQCYYLGGGHATDNIVVWLPTENILFGGCMLKDNQTTSIGNISDADVTAWPKTL 189
Query: 206 HNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLINEY 241
             ++ KF  A+YVV GH ++     I+HT  ++N+Y
Sbjct: 190 DKVKAKFPSARYVVPGHGNYGGTELIEHTKQIVNQY 225
Template sequence
1A8TB. Chain B, Metallo-...[gi:3891998]
LOCUS       1A8T_B 232 aa linear BCT 23-MAR-1998
DEFINITION  Chain B, Metallo-Beta-Lactamase In Complex With L-159,061.
ACCESSION   1A8T_B
VERSION     1A8T_B GI:3891998
DBSOURCE    pdb: molecule 1A8T, chain 66, release Mar 23, 1998;
            deposition: Mar 23, 1998;
            class: Hydrolase;
            source: Mol_id: 1; Organism_scientific: Bacteroides Fragilis; Strain: Tal3636; Variant: Clinical Isolate; Gene: Ccra; Expression_system: Escherichia Coli;
            Exp. method: X-Ray Diffraction.
KEYWORDS    .
SOURCE      Bacteroides fragilis
  ORGANISM  Bacteroides fragilis
            Bacteria; Bacteroidetes; Bacteroides (class); Bacteroidales; Bacteroidaceae; Bacteroides.
……………
ORIGIN
        1 aqksvkisdd isitqlsdkv ytyvslaeie gwgmvpsngm ivinnhqaal ldtpindaqt
       61 emlvnwvtds lhakvttfip nhwhgdcigg lgylqrkgvq syanqmtidl akekglpvpe
      121 hgftdsltvs ldgmplqcyy lggghatdni vvwlptenil fggcmlkdnq ttsignisda
      181 dvtawpktld kvkakfpsar yvvpghgnyg gteliehtkq ivnqyiests kp
//
Template recognition
BlaB – Beta lactamase
Template 1A8T Chain A
Alignment of query and template
• Look at the alignment used to find the template
  • Are secondary structure elements, active sites and other motifs aligned?
  • Can gaps be closed?
  • Is there room for the insertions?
• Change the alignment manually or with a different alignment program or different alignment parameters
  • Take care not to change it for the worse
  • On average I only make things slightly worse by manual intervention!
Alignment BlaB – Beta lactamase
BLAB    EKLKDNLYVYTTYNTFNGTKY-AANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYKKHGKKVIMNIATHS
1A8T.A  TQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTDSLHAKVTTFIPNHW

BLAB    HDDRAGGLEYFGKIGAKTYSTKMTDSILAKENKPRAQYTFDNNKSFKVGKSEFQVYYPGKGHTADNVVVW
1A8T.A  HGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTVSLDGMPLQCYYLGGGHATDNIVVW

BLAB    FPKEKVLVGGCIIKSADSKDLGYIGEAYVNDWTQSVHNIQQKFSGAQYVVAGHDDWKDQRSIQHTLDLIN
1A8T.A  LPTENILFGGCMLKDNQTTSIGNISDADVTAWPKTLDKVKAKFPSARYVVPGHGNYGGTELIEHTKQIVN

BLAB    EYQQKQK
1A8T.A  QYIESTS
Sequence identity 27%
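A sequence identity figure like the one quoted for this alignment can be computed from any pair of aligned strings. The sketch below is an illustration only: the function name `percent_identity` and the `count_gaps` switch are assumptions made here, not part of any alignment program. Note that the denominator is a matter of convention. BLAST divides by the full alignment length including gap columns (62/216 in the search above), while other tools count only columns where both sequences have a residue.

```python
def percent_identity(aligned_a: str, aligned_b: str,
                     gap: str = "-", count_gaps: bool = False) -> float:
    """Percent of identical residues between two aligned sequences.

    count_gaps=False: denominator = columns where both sequences have a residue.
    count_gaps=True:  denominator = all alignment columns (BLAST-style).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = list(zip(aligned_a.upper(), aligned_b.upper()))
    # A column is an identity only if both residues match and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    if count_gaps:
        denominator = len(columns)
    else:
        denominator = sum(1 for a, b in columns if a != gap and b != gap)
    if denominator == 0:
        raise ValueError("no columns to compare")
    return 100.0 * matches / denominator


if __name__ == "__main__":
    # First block of the BLAST alignment between the query and 1A8T.
    query = "DVKIEKLKDNLYVYTTYNTFNG-TKYAANAVYLVTDKGVVVIDCPWGEDKFKSFTDEIYK"
    sbjct = "DISITQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTD"
    print(f"{percent_identity(query, sbjct):.1f}% identity in this block")
```

Because the two conventions give different numbers for the same alignment, it helps to state which one was used when quoting %ID.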
Template vs alignment identification
• If the template was hard to find, the correct alignment will be hard to make
• If the template is correct, part of the model will normally be correct
Build loops
• Fragment based methods
  • Many implementations (M Levitt, L Holm, D Baker etc.)
  • Fast
• Energy based methods
  • Avoid stereochemically infeasible solutions
  • Can see what is bad but not what is good!
• A combination of methods is often used
• No method can move the model (very much) towards the native conformation, i.e. reduce the root mean square deviation (RMSD), which measures how many Ångströms you are off
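The RMSD mentioned above can be computed as the square root of the mean squared distance between equivalent atoms. The sketch below assumes the two structures are already superimposed and list the same atoms (for example C-alpha atoms) in the same order; the optimal superposition step itself is not shown, and the function name `rmsd` is chosen here for illustration.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """Root mean square deviation (in the units of the input, e.g. Ångströms)
    between two equally long, already superimposed coordinate lists."""
    if len(model) != len(native) or len(model) == 0:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(f"RMSD = {rmsd(model, native):.3f}")
```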
Loops: The Rosetta method
http://www.bioinfo.rpi.edu/~bystrc/hmmstr/server.php
• Find fragments (10 per amino acid) with the same sequence and secondary structure profile as the query sequence
• Combine them using a Monte Carlo scheme to build the loop
Baker et al.
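The idea of combining fragments with a Monte Carlo scheme can be illustrated with a toy example. This is not the Rosetta implementation: the conformation is reduced to a plain list of numbers (think of one torsion-like value per residue), the fragment library and the energy function are supplied by the caller, and the name `assemble_fragments` is hypothetical. The sketch only shows the general pattern of proposing a fragment insertion and accepting it with the Metropolis criterion.

```python
import math
import random
from typing import Callable, Dict, List, Tuple


def assemble_fragments(
    length: int,
    fragments: Dict[int, List[List[float]]],  # start position -> candidate fragments
    energy: Callable[[List[float]], float],   # lower is better (caller-supplied)
    steps: int = 1000,
    temperature: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Toy Metropolis Monte Carlo fragment insertion; returns (best conformation, best energy)."""
    if temperature <= 0.0:
        raise ValueError("temperature must be positive")
    for start, candidates in fragments.items():
        for frag in candidates:
            if start < 0 or start + len(frag) > length:
                raise ValueError("fragment does not fit inside the chain")

    rng = random.Random(seed)
    conformation = [0.0] * length          # arbitrary starting conformation
    current_e = energy(conformation)
    best, best_e = list(conformation), current_e
    positions = sorted(fragments)

    for _ in range(steps):
        # Propose: overwrite a window of the chain with a random library fragment.
        start = rng.choice(positions)
        frag = rng.choice(fragments[start])
        trial = list(conformation)
        trial[start:start + len(frag)] = frag
        delta = energy(trial) - current_e
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            conformation, current_e = trial, current_e + delta
            if current_e < best_e:
                best, best_e = list(conformation), current_e
    return best, best_e
```

In a real application the energy function would score the three-dimensional structure built from the fragments, which is where most of the difficulty lies.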
Model side chains
• Knowledge based methods
  • SCWRL performed well in CASP4 (http://dunbrack.fccc.edu/SCWRL3.php , http://dunbrack.fccc.edu/scwrl3protsci.pdf )
• Energy calculations
  • Slow
SCWRL (Bower, Cohen & Dunbrack)
• Sidechain placement With a Rotamer Library
• Assumes constant angles and distances of bonds
• Each residue begins in its most favored rotamer
• Rotamer search to remove steric clashes between sidechains and backbone
• Rotamer search to remove steric clashes between sidechains
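The general pattern of starting from the most favored rotamer and then searching for clash-free alternatives can be sketched as follows. This is a deliberately simplified greedy search, not the actual SCWRL algorithm: clashes with the backbone are left out, the clash test is a caller-supplied function, and the names `place_side_chains` and `clashes` are hypothetical.

```python
from typing import Callable, Dict, List


def place_side_chains(
    rotamers: Dict[str, List[str]],                 # residue -> rotamers, most favored first
    clashes: Callable[[str, str, str, str], bool],  # (res_a, rot_a, res_b, rot_b) -> clash?
    max_rounds: int = 10,
) -> Dict[str, str]:
    """Greedy rotamer assignment that tries to remove side chain clashes."""
    # Each residue begins in its most favored rotamer.
    choice = {res: options[0] for res, options in rotamers.items()}

    def n_clashes(res: str, rot: str) -> int:
        return sum(
            1 for other in choice
            if other != res and clashes(res, rot, other, choice[other])
        )

    for _ in range(max_rounds):
        changed = False
        for res, options in rotamers.items():
            current = n_clashes(res, choice[res])
            if current == 0:
                continue
            # min() returns the first minimum, so ties go to the more favored rotamer.
            best = min(options, key=lambda rot: n_clashes(res, rot))
            if n_clashes(res, best) < current:
                choice[res] = best
                changed = True
        if not changed:
            break
    return choice
```

A greedy search like this can get stuck when several residues clash with each other at once, which is one reason real programs use more systematic combinatorial searches.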
Model evaluation
• Is the structure unlikely?
• Distributions of
  • Dihedral angles (fraction in most favored regions)
  • Bond lengths and angles
• Procheck
  • www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
Benchmarking comparative modeling
• CASP
  • Critical Assessment of protein Structure Prediction
  • Sequences from about-to-be-solved structures are given to groups, who submit their predictions before the structure is published
• EVA
  • Newly solved structures are sent to prediction servers
  • Evaluates automatic servers
CASP4: Best overall fold
• Venclovas, C
• Baker, D
• Sternberg, M
• Rychlewski, L (Bioinfo.PL)
• SBI-AT
Tramontano et al., 2001
CASP4: Best details of models
• Venclovas, C
• Sternberg, M
• Honig, B
• Baker, D
• SBI-AT
Tramontano et al., 2001
EVA
http://cubic.bioc.columbia.edu/eva/cm/res/rank.html
Analysis of fold accuracy (% equivalent positions). Ranking of the methods:
1. sdsc1
2. 3djigsaw
3. SwissModel
4. cphmodels
5. esypred
Links to modeling servers
• Database of links
  • http://mmtsb.scripps.edu/cgi-bin/renderrelres?protmodel
• SwissModel
  • www.expasy.ch/swissmod/SM_FIRST.html
• 3D-Jigsaw
  • www.bmm.icnet.uk/servers/3djigsaw/
• SDSC1
  • http://cl.sdsc.edu/hm.html
• ESyPred3D
  • http://www.fundp.ac.be/urbm/bioinfo/esypred/
• CPHmodels
  • www.cbs.dtu.dk/services/CPHmodels-2.0
Practical conclusions
• Several modeling servers are publicly available
• Template and alignment must be correct
• Loops are difficult to model
• More info on comparative modeling
  • http://speedy.embl-heidelberg.de/gtsp/
  • http://www.cmbi.kun.nl/gv/course/index.html
  • http://www.umass.edu/microbio/chime/explorer/homolmod.htm