6. Homology Modeling

6. Homology Modeling

Prediction of structure from sequence Flowchart Comparison of query sequence to nr database Similar to a sequence of known structure? No Yes Fold Recognition (Threading) Homology Modeling (Comparative Modeling) Fits a known fold? Yes No Ab initio prediction

Homology modeling 4 steps: • Detect template • Align sequence onto template • Build model (loop modeling) • Refine model (relax)

Wrong side chain conformations Small backbone deviations Wrong loop modeling Wrong alignment Wrong template Errors in comparative modeling (Marti-Renom & Sali, 2000)

Sequence-structure identity depends on length of protein No dissimilar pairs above the threshold line Sander & Schneider, 1991

Template matching • Given: sequence • Wanted: • structural template • sequence-structure alignment • Easiest approach: • Blast / Psiblast against sequences with known structure • select template based on sequence identity • >70%: straight forward • ~40-50%: usually clear • lower seqid: alignment is challenge • State of the art protocols include • more sophisticated searches • additional information for improved template selection • Profile-profile comparison (HHSEARCH) • Seq-structure compatibility (Threading: RAPTOR) Alignment step is critical

Sequence-sequence alignment • Information content: • Sequence • Profile (Position specific scoring matrix -PSSM) aa preferences for each position • Hidden Markov Models (HMM) Contains in addition position-specific in/del penalties M match D deletion I insertion

Sequence-sequence alignment • Information content: • Sequence-sequence comparison • e.g. BLAST • Profile-sequence comparison • e.g. PSI-BLAST • Profile-profile comparison • e.g. LAMA, PROF_SIM, COMPASS • HMM-HMM comparison • e.g. HHSEARCH More information – increased sensitivity in detecting template

No new folds & superfamilies lately -> template available for everyone • SCOP Folds • # of folds • # of new folds No new folds in the last years!! # of unique folds ~1400 folds Year • SCOP Superfamilies • # of superfamilies • # of new superfamilies No new superfamilies in the last years!! # of unique superfamilies ~2300 superfamilies Year Many sequences – few folds: How can I detect my fold?

E C C C C A A A A 4 D Eab A C D E ….. A -3 -1 0 0 .. C -1 -4 1 2 .. D 0 1 5 6 .. E 0 2 6 7 .. . . . . . 3 10 5 2 9 6 1 8 7 Additional ways to include structural information: Threading Evaluate compatibility of sequence with fold, based on pairwise residue potentials • Essential components: • structural template • neighbor definition • energy function E = S Eaibj positions i,j ACCECADAAC -3-1-4-4-1-4-3-3=-23

Potential fold Threading (fold recognition): Find best template for given sequence 1) ... 56) ... n) ... ... -10 ... -123 ... 20.5 MAHFPGFGQSLLFGYPVYVFGD...

RAPTOR State of the art threading method of choice • Successful for “low-homology” proteins (few homolog sequences – low entropy in alignment) • State-of-the art threading protocol: uses linear programming to efficiently find best seq-str threading (linear combination of regression trees) • Optimizes use of several templates http://raptorx.uchicago.edu/ Jian Peng and Jinbo Xu. RaptorX: exploiting structure information for protein alignment by statistical inference. PROTEINS, 2011; A multiple-template approach to protein threading. PROTEINS, 2011.

Combine sequence-structure and sequence-sequence comparisons • Example 1: GENTHREADER (ANN) How likely are 2 aas to be neighbors?? How likely is aa to be buried/exposed??

Combine sequence and structure for template selection Example 2: HHSEARCH*: • Based on hidden markov models (HMM) • Sequence-HMM alignment • Here: extended to HMM-HMM alignment * Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951

HHSEARCH: HMM-HMM alignment • Formalization: • more sensitive (for hard cases with <20% seqid) than: • Profile-profile comparison • Profile-sequence comparison • Sequence-sequence comparison * Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951

HHSEARCH includes structural information about template Include secondary structure preference in model: • Score pairs of aligned secondary structure elements with substitution matrix • Query sequence: • Predicted secondary structure • (PSIPRED: H/E/C) with confidence [0..9] DSSP: H = alpha helix E = extended strand B = residue in isolated beta-bridge G = 3-helix (3/10 helix) I = 5 helix (pi helix) T = hydrogen bonded turn S = bend • Structural template: • Secondary structure • (DSSP: H/E/B/G/I/T/S) 10 x 3 x7substitution values * Söding. Protein homology detection by HMM-HMM comparison. Bioinformatics (2005) 21: 951

Build model • Copy aligned regions from template • Rebuild missing pieces: Model loops • Refine model: add side chains (and minimize; relax)

Build model: Loop modeling Input: • 2 anchors • length of missing residues 2 approaches: • Loop libraries: construct loops from fragments of known structures • Loop closure algorithms • model new conformations • good for longer loops

Fold-trees for loop modeling tasks loop modeling N 1 x 1’ 2 x 2’ C Color – flexible bb Gray – fixed bb Flexible “peptide” edge rigid “peptide” edge 1 1’ 1 1’ rigid “jump” flexible “jump” N: N-terminal; C: C-terminal; X: chain break; O: root of the tree;

Rosetta loop modeling • Define regions that are flexible, and perturb these in a fixed background • Same moves as described in ab initio, but more restricted • Use fold tree architecture: connect take off and landing segment by a jump, cut loop (at defined place, or arbitrarily), apply perturbation, reclose loop • Loop closure: using cyclic coordinate descent (CCD) or kinematic loop closure (KC) • Fragments can be used to improve knowledge-based modeling

Cyclic Coordinate Descend (CCD) closure by moving each joint separately.. Canutescu & Dunbrack,. Protein Sci. 12, 963–972 (2003).

..to maximally approach end

Repeat to obtain several conformations…. Refine, and select best!

Loop closure and degrees of freedom • Over-constrained for <6 DOFs • Under-constrained for >6 DOF: infinite number of solutions. • A molecular loop closure problem with 6 DOF has at most 16 solutions. • Kinematic loop closure allows calculation of analytical solution

Kinematic loop closure Coutsias (2004) From robotics: Analytical solution of loop closure for 6 degrees of freedom Challenge: • ﬁnd analytical formulation to extract • all possible backbone structures of a chain segment, that are • geometrically consistent with preceding and following parts of the given structure. Setup:

Kinematic loop closure, cont. Solutions aligned to each other aligned to constant part

Kinematic closure (KC) • Analytical solution of loop closure for 6 degrees of freedom • Extension: analytical determination of all mechanically accessible conformations for 6 torsions of a peptide chain of any length (e.g. 25 residues) • (1) Randomly perturb non-pivot positions • (2) Apply KC to pivot positions

Perturbation + KC Loop backbone minimization Kinematic closure (KC) in Rosetta • Embedded into MCM protocol (low-res + high-res) • 720 steps • Repeat 1000 times

Kinematic closure (KC) • Improves median modeling quality from 2.0Å to 0.8ÅRMSD (on set of 25 loops) (CCD)

Improve loop modeling by sampling along Principle Components (PC) of natural variation • Collect loops of a set of homolog templates • Perform Principle Component Analysis (PCA): Collection of loops can be described by a few (3) PCs only • Improves model quality: more similar to the final structure than to template. • Depends on a set of known homolog structures 8 protein structures PCA1 PCA3 PCA2 Qian 2004 PNAS

Free-energy optimization along PC of natural variation: example • Red: model (2.36A RMSD) • Blue: native • Green: refined (1.42A RMSD) Qian 2004 PNAS

Side chain optimization Side chain optimization+ minimization Backbone optimization Backbone optimization Rosetta: Refine model with relax protocol Same as in last ab initio modeling step*: • Introduce general flexibility • Relax protocol finds near-by minima (within 4-5Å RMSD) vdw repulsive Small backbone moves and MCM * MCM protocol: small & shear moves (120 steps; see lecture 5)

Homology modeling with Rosetta Summary - Basic protocol: • Detect template and align sequence: based on HHSEARCH (alignment of two HMMs) or RAPTOR (Threading) • Define aligned regions and loop regions; copy aligned regions and complete protein structure with loop modeling (with KIC - kinematic loop closure, or CCD) • Refine structure with the “relax” protocol

Improvement over single best target single impressive improvements many targets better than template Reasons multiple templates free modeling refinement CASP - Template-based modeling (TBM) worse better

Blue: Native Green: Baker Model04 Red: Template CASP7: Example for improved TBM with Rosetta (T330) Distance cutoff % of residues aligned

Rosetta in CASP7 & 8: use of several templates improves prediction Templates that produce lower energy structures produce better models

Homology modeling - summary • Homology modeling to high resolution is challenging (~ ab initio modeling) • Today models are already better than the template – GOOD NEWS! • Good alignment and template selection are critical • Sophisticated new approaches have improved homology modeling in recent years • Include additional information during template selection, alignment and refinement

6. Homology Modeling

6. Homology Modeling

Presentation Transcript

Homology Modeling via Protein Threading

Homology Modeling Workshop

Homology modeling

Homology Modeling

Homology Modeling

Predicting 3D Protein Structure using Homology Modeling

Homology modeling with SWISS-MODEL

Applications of Homology Modeling

Applications of Homology Modeling

Protein structure and homology modeling

Lab 9.3a: Homology Modeling

Homology modeling of G protein-coupled receptors

Homology

Homology Modeling

an introduction for Homology Modeling

Homology modeling in short…

Homology Modeling

Structure prediction: Homology modeling