460 likes | 776 Views
6. Homology Modeling. Prediction of structure from sequence. Flowchart. Comparison of query sequence to nr database. Similar to a sequence of known structure ?. No. Yes. Fold Recognition (Threading). Homology Modeling (Comparative Modeling). Fits a known fold ?. Yes. No.
E N D
Prediction of structure from sequence Flowchart Comparison of query sequence to nr database Similar to a sequence of known structure? No Yes Fold Recognition (Threading) Homology Modeling (Comparative Modeling) Fits a known fold? Yes No Ab initio prediction
Homology modeling 4 steps: • Detect template • Align sequence onto template • Build model (loop modeling) • Refine model (relax)
Wrong side chain conformations Small backbone deviations Wrong loop modeling Wrong alignment Wrong template Errors in comparative modeling (Marti-Renom & Sali, 2000)
Homology modeling 4 steps: • Detect template • Align sequence onto template • Build model (loop modeling) • Refine model (relax)
Sequence-structure identity depends on length of protein No dissimilar pairs above the threshold line Sander & Schneider, 1991
Template matching • Given: sequence • Wanted: • structural template • sequence-structure alignment • Easiest approach: • Blast / Psiblast against sequences with known structure • select template based on sequence identity • >70%: straight forward • ~40-50%: usually clear • lower seqid: alignment is challenge • State of the art protocols include • more sophisticated searches • additional information for improved template selection • Profile-profile comparison (HHSEARCH) • Seq-structure compatibility (Threading: RAPTOR) Alignment step is critical
Sequence-sequence alignment • Information content: • Sequence • Profile (Position specific scoring matrix -PSSM) aa preferences for each position • Hidden Markov Models (HMM) Contains in addition position-specific in/del penalties M match D deletion I insertion
Sequence-sequence alignment • Information content: • Sequence-sequence comparison • e.g. BLAST • Profile-sequence comparison • e.g. PSI-BLAST • Profile-profile comparison • e.g. LAMA, PROF_SIM, COMPASS • HMM-HMM comparison • e.g. HHSEARCH More information – increased sensitivity in detecting template
No new folds & superfamilies lately -> template available for everyone • SCOP Folds • # of folds • # of new folds No new folds in the last years!! # of unique folds ~1400 folds Year • SCOP Superfamilies • # of superfamilies • # of new superfamilies No new superfamilies in the last years!! # of unique superfamilies ~2300 superfamilies Year Many sequences – few folds: How can I detect my fold?
E C C C C A A A A 4 D Eab A C D E ….. A -3 -1 0 0 .. C -1 -4 1 2 .. D 0 1 5 6 .. E 0 2 6 7 .. . . . . . 3 10 5 2 9 6 1 8 7 Additional ways to include structural information: Threading Evaluate compatibility of sequence with fold, based on pairwise residue potentials • Essential components: • structural template • neighbor definition • energy function E = S Eaibj positions i,j ACCECADAAC -3-1-4-4-1-4-3-3=-23
Potential fold Threading (fold recognition): Find best template for given sequence 1) ... 56) ... n) ... ... -10 ... -123 ... 20.5 MAHFPGFGQSLLFGYPVYVFGD...
RAPTOR State of the art threading method of choice • Successful for “low-homology” proteins (few homolog sequences – low entropy in alignment) • State-of-the art threading protocol: uses linear programming to efficiently find best seq-str threading (linear combination of regression trees) • Optimizes use of several templates http://raptorx.uchicago.edu/ Jian Peng and Jinbo Xu. RaptorX: exploiting structure information for protein alignment by statistical inference. PROTEINS, 2011; A multiple-template approach to protein threading. PROTEINS, 2011.
Combine sequence-structure and sequence-sequence comparisons • Example 1: GENTHREADER (ANN) How likely are 2 aas to be neighbors?? How likely is aa to be buried/exposed??
Combine sequence and structure for template selection Example 2: HHSEARCH*: • Based on hidden markov models (HMM) • Sequence-HMM alignment • Here: extended to HMM-HMM alignment * Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951
HHSEARCH: HMM-HMM alignment • Formalization: • more sensitive (for hard cases with <20% seqid) than: • Profile-profile comparison • Profile-sequence comparison • Sequence-sequence comparison * Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951
HHSEARCH includes structural information about template Include secondary structure preference in model: • Score pairs of aligned secondary structure elements with substitution matrix • Query sequence: • Predicted secondary structure • (PSIPRED: H/E/C) with confidence [0..9] DSSP: H = alpha helix E = extended strand B = residue in isolated beta-bridge G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend • Structural template: • Secondary structure • (DSSP: H/E/B/G/I/T/S) 10 x 3 x7substitution values * Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951
Homology modeling 4 steps: • Detect template • Align sequence onto template • Build model (loop modeling) • Refine model (relax)
Build model • Copy aligned regions from template • Rebuild missing pieces: Model loops • Refine model: add side chains (and minimize; relax)
Build model: Loop modeling Input: • 2 anchors • length of missing residues 2 approaches: • Loop libraries: construct loops from fragments of known structures • Loop closure algorithms • model new conformations • good for longer loops
Fold-trees for loop modeling tasks loop modeling N 1 x 1’ 2 x 2’ C Color – flexible bb Gray – fixed bb Flexible “peptide” edge rigid “peptide” edge 1 1’ 1 1’ rigid “jump” flexible “jump” N: N-terminal; C: C-terminal; X: chain break; O: root of the tree;
Rosetta loop modeling • Define regions that are flexible, and perturb these in a fixed background • Same moves as described in ab initio, but more restricted • Use fold tree architecture: connect take off and landing segment by a jump, cut loop (at defined place, or arbitrarily), apply perturbation, reclose loop • Loop closure: using cyclic coordinate descent (CCD) or kinematic loop closure (KC) • Fragments can be used to improve knowledge-based modeling
Cyclic Coordinate Descend (CCD) closure by moving each joint separately.. Canutescu & Dunbrack,. Protein Sci. 12, 963–972 (2003).
Repeat to obtain several conformations…. Refine, and select best!
Loop closure and degrees of freedom • Over-constrained for <6 DOFs • Under-constrained for >6 DOF: infinite number of solutions. • A molecular loop closure problem with 6 DOF has at most 16 solutions. • Kinematic loop closure allows calculation of analytical solution
Kinematic loop closure Coutsias (2004) From robotics: Analytical solution of loop closure for 6 degrees of freedom Challenge: • find analytical formulation to extract • all possible backbone structures of a chain segment, that are • geometrically consistent with preceding and following parts of the given structure. Setup:
Kinematic loop closure, cont. Solutions aligned to each other aligned to constant part
Kinematic closure (KC) • Analytical solution of loop closure for 6 degrees of freedom • Extension: analytical determination of all mechanically accessible conformations for 6 torsions of a peptide chain of any length (e.g. 25 residues) • (1) Randomly perturb non-pivot positions • (2) Apply KC to pivot positions
Perturbation + KC Loop backbone minimization Kinematic closure (KC) in Rosetta • Embedded into MCM protocol (low-res + high-res) • 720 steps • Repeat 1000 times
Kinematic closure (KC) • Improves median modeling quality from 2.0Å to 0.8ÅRMSD (on set of 25 loops) (CCD)
Improve loop modeling by sampling along Principle Components (PC) of natural variation • Collect loops of a set of homolog templates • Perform Principle Component Analysis (PCA): Collection of loops can be described by a few (3) PCs only • Improves model quality: more similar to the final structure than to template. • Depends on a set of known homolog structures 8 protein structures PCA1 PCA3 PCA2 Qian 2004 PNAS
Free-energy optimization along PC of natural variation: example • Red: model (2.36A RMSD) • Blue: native • Green: refined (1.42A RMSD) Qian 2004 PNAS
Homology modeling 4 steps: • Detect template • Align sequence onto template • Build model (loop modeling) • Refine model (relax)
Side chain optimization Side chain optimization+ minimization Backbone optimization Backbone optimization Rosetta: Refine model with relax protocol Same as in last ab initio modeling step*: • Introduce general flexibility • Relax protocol finds near-by minima (within 4-5Å RMSD) vdw repulsive Small backbone moves and MCM * MCM protocol: small & shear moves (120 steps; see lecture 5)
Homology modeling with Rosetta Summary - Basic protocol: • Detect template and align sequence: based on HHSEARCH (alignment of two HMMs) or RAPTOR (Threading) • Define aligned regions and loop regions; copy aligned regions and complete protein structure with loop modeling (with KIC - kinematic loop closure, or CCD) • Refine structure with the “relax” protocol
Improvement over single best target single impressive improvements many targets better than template Reasons multiple templates free modeling refinement CASP - Template-based modeling (TBM) worse better
Blue: Native Green: Baker Model04 Red: Template CASP7: Example for improved TBM with Rosetta (T330) Distance cutoff % of residues aligned
Rosetta in CASP7 & 8: use of several templates improves prediction Templates that produce lower energy structures produce better models
Homology modeling - summary • Homology modeling to high resolution is challenging (~ ab initio modeling) • Today models are already better than the template – GOOD NEWS! • Good alignment and template selection are critical • Sophisticated new approaches have improved homology modeling in recent years • Include additional information during template selection, alignment and refinement