Automated High-Resolution Protein Structure Determination using Residual Dipolar Couplings
Anna Yershova
Department of Computer Science, Duke University
February 5, 2010, NC State University
Protein Structure Determination is Important
[Figure: amino acid sequences, structures, functions, protein redesign]
• High-resolution structures are needed for:
  • Determining protein functions
  • Protein redesign
What is Protein Structure: Primary Structure
The sequence of amino acids (residues) forms the backbone; side chains are attached to the backbone.
[Figure: amino acid, side chain, dihedral angle]
What is Protein Structure: Secondary Structure Elements
Local folding is maintained by short-range interactions.
What is Protein Structure: 3D Fold
Global 3D folding is maintained by more distant interactions.
[Figure: alpha-helix, beta-strands, loop, side chain]
High-Throughput Structure Determination Is Important
[Figure: the gap between sequences and structures; source: http://www.metabolomics.ca/News/lectures/CPI2008-short.pdf]
Current Approaches for Structure Determination
• X-ray crystallography
  • Difficulty: growing good-quality crystals
• Nuclear Magnetic Resonance (NMR) spectroscopy
  • Difficulty: long (expensive) processing and analysis of experimental data
Both require expressing and purifying proteins.
Bruce Donald's Lab
Michael Zeng, Chittu Tripathy, Lincong Wang, Pei Zhou, Bruce Donald, Cheng-Yu Chen, John MacMaster
Types of NMR Spectroscopy Data
[Figure: example chemical shifts on a residue, an NOE involving Hα, and the static magnetic field B0]
• Chemical shift (CS)
  • Unique resonance frequency, serves as an ID
• Nuclear Overhauser effect (NOE)
  • Local distance restraint between two protons
• Residual dipolar coupling (RDC)
  • Global orientational restraint for bond vectors
Resonance Assignment Problem
Assigning chemical shifts to each atom (Bailey-Kellogg et al., 2000, 2004).
[Figure source: http://www.pnas.org/content/102/52/18890/suppl/DC1]
NOE Assignment Problem
Obtaining local distance restraints between protons: a famous bottleneck (Bailey-Kellogg et al., 2000, 2004).
Structure Determination from NOEs
[Figure: NOESY spectrum, resonance assignments, NOE assignment with assignment ambiguity, and a partial distance matrix over atoms a1, a2, a3, …, an in which some entries are known (e.g. 4 between a1 and a2, 3 between a1 and a3) and others are unknown (?)]
Distance geometry from such incomplete distance data is NP-hard [Saxe '79; Hendrickson '92, '95].
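For contrast with the hard case, here is a minimal sketch (Python with NumPy, a choice made for this illustration) of classical multidimensional scaling: when every pairwise distance is known exactly, coordinates follow, up to rotation, translation and reflection, from an eigendecomposition of the double-centered squared-distance matrix. The function names and the random test points are hypothetical. The NP-hardness result applies when only a sparse, possibly ambiguous subset of distances is available, where no such closed-form step exists.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix for the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, dim=3):
    """Coordinates (up to rigid motion and reflection) from a COMPLETE distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    G = -0.5 * J @ (D ** 2) @ J                  # Gram matrix of the centered coordinates
    w, U = np.linalg.eigh(G)                     # eigenvalues in ascending order
    top = np.argsort(w)[::-1][:dim]              # indices of the `dim` largest eigenvalues
    return U[:, top] * np.sqrt(np.clip(w[top], 0.0, None))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                      # hypothetical "true" atom positions
D = pairwise_distances(X)                        # every distance known: the easy case
Y = classical_mds(D)
print(np.allclose(pairwise_distances(Y), D))     # True
```

Deleting even a few entries of D (the "?" cells in the matrix above) breaks this construction, which is why sparse NOE data call for search or optimization instead.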
Protein Structure Determination is Hard
Traditional structure determination protocol
[Figure: resonance assignments, NOESY spectra, NOE assignments, SA/MD initial fold, XPLOR-NIH structure refinement with NOE assignments and RDCs, 3D structures]
NOE assignment is a famous bottleneck. Problems with this protocol:
• Error propagation
• Local minima
• Manual intervention for the initial fold and for evaluation of NOE assignments
Can we have a polynomial-time algorithm using orientational restraints?
• Yes: Wang and Donald, 2004; Wang et al., 2006
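RDC Equation for a Single Bond
$D = D_{\max}\, v^{T} S v$
where $v$ is the unit bond vector, $D_{\max}$ is the dipolar interaction constant, and $S$ is the Saupe matrix of the alignment medium; in its principal frame, $S$ is diagonal with entries $S_{xx}, S_{yy}, S_{zz}$.
• S is traceless and symmetric
• S therefore contains 5 degrees of freedom (the 6 independent entries of a symmetric 3×3 matrix, minus the trace constraint)
[Figure: bond vector v in an alignment medium, relative to the static magnetic field B0]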
Traditional Structure Determination vs. RDC-PANDA
[Figure: the traditional protocol (resonance assignments, NOESY spectra, SA/MD initial fold, NOE assignments, XPLOR-NIH structure refinement with RDCs, 3D structures), annotated with its problems (error propagation, local minima, manual intervention for the initial fold and for evaluation of NOE assignments), beside the RDC-PANDA protocol (RDCs and a constant number of NOEs, RDC-ANALYTIC, PACKER, global fold, side-chain placement, NOE assignments, XPLOR-NIH, 3D structures)]
Zeng et al. (J. Biomol. NMR, 2009)
Importance of Backbone Structure Determination
• Global orientational restraints from RDCs
• Sparse data (high-throughput, large proteins, membrane proteins)
• Compute the initial fold using exact solutions to the RDC equations
• Avoid the NP-hard problem of structure determination from NOEs
• Resolve NOE assignment ambiguity
• Automated side-chain resonance assignment
Current Limitations of RDC-PANDA
Because it uses only 2 RDCs per residue:
• Only secondary structure elements (SSEs) can be reliably determined; NOEs are needed to determine the structure of loops
• Missing data are difficult to handle
My Current Project
• Improve current protein structure determination techniques from our lab
• Design new algorithms for protein backbone structure determination using orientational restraints from RDCs
Literature Overview
• Distance-geometry-based structure determination: Braun, 1987; Crippen and Havel, 1988; Moré and Wu, 1999
• Heuristic-based structure determination: Brünger, 1992; Nilges et al., 1997; Güntert, 2003; Rieping et al., 2005
• RDC-based structure determination: Tolman et al., 1995; Tjandra and Bax, 1997; Hus et al., 2001; Tian et al., 2001; Prestegard et al., 2004; Wang and Donald (CSB 2004); Wang and Donald (J. Biomol. NMR, 2004); Wang, Mettu and Donald (JCB 2005); Donald and Martin (Progress in NMR Spectroscopy, 2009); Ruan et al., 2008; Zeng et al. (J. Biomol. NMR, 2009)
• Heuristic-based automated NOE assignment: Mumenthaler et al., 1997; Nilges et al., 1997, 2003; Herrmann et al., 2002; Schwieters et al., 2003; Kuszewski et al., 2004; Huang et al., 2006
• Automated NOE assignment starting with an initial fold computed from RDCs: Wang and Donald (CSB 2005); Zeng et al. (CSB 2008); Zeng et al. (J. Biomol. NMR, 2009)
• Automated side-chain resonance assignment: Li and Sanctuary, 1996, 1997; Marin et al., 2004; Masse et al., 2006; Zeng et al. (in submission, 2009)
RDC Equation for a Single Bond
• Linear in S: a fixed v defines a hyperplane in the 5-dimensional space of S
• Quadratic in v: a fixed S defines a hyperboloid
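A minimal NumPy sketch of the two views of the single-bond RDC equation: the same value computed as a quadratic form in v and as a dot product that is linear in the 5 independent entries of S. The helper names, the parameter ordering (Sxx, Syy, Sxy, Sxz, Syz), the dimensionless numeric values and the default D_max = 1 are illustrative assumptions, not values from the talk.

```python
import numpy as np

def saupe_matrix(sxx, syy, sxy, sxz, syz):
    """Symmetric, traceless 3x3 Saupe matrix from its 5 independent entries."""
    szz = -sxx - syy                              # traceless: Sxx + Syy + Szz = 0
    return np.array([[sxx, sxy, sxz],
                     [sxy, syy, syz],
                     [sxz, syz, szz]])

def rdc_quadratic(S, v, d_max=1.0):
    """D = D_max * v^T S v, quadratic in the unit bond vector v."""
    v = v / np.linalg.norm(v)
    return d_max * (v @ S @ v)

def rdc_row(v, d_max=1.0):
    """Row a(v) with D = a(v) . (Sxx, Syy, Sxy, Sxz, Syz), linear in S."""
    x, y, z = v / np.linalg.norm(v)
    return d_max * np.array([x*x - z*z, y*y - z*z, 2*x*y, 2*x*z, 2*y*z])

s = np.array([0.3, -0.8, 0.1, 0.2, -0.15])        # hypothetical, unscaled Saupe parameters
S = saupe_matrix(*s)
v = np.array([0.2, -0.5, 0.84])                   # hypothetical bond direction
print(np.isclose(rdc_quadratic(S, v), rdc_row(v) @ s))   # True
```

The row form comes from substituting Szz = -Sxx - Syy into the quadratic form; it is what lets a set of bond vectors with known orientations be stacked into a linear system for S.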
RDC Equation for a Single Bond
• One RDC equation has 7 variables (5 for S and 2 for the direction of the unit vector v) and defines a collection of hyperplanes in the space of S, one for each v
RDC Equations for a Protein Portion
RDC Equations for a Protein Portion
[Figure: bond vectors v1, v2 and u1 along a protein portion]
Too few equations, too many variables!
[1] L. Wang and B. R. Donald. J. Biomol. NMR, 29(3):223–242, 2004.
[2] J. Zeng, J. Boyles, C. Tripathy, L. Wang, A. Yan, P. Zhou, and B. R. Donald. J. Biomol. NMR, [Epub ahead of print], PMID: 19711185, 2009.
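Forward Kinematics Reduces the Number of Variables
Fix the coordinate system; the bond vectors v1, v2 and u1 further along the chain are then functions of the backbone dihedral angles.
[Figure: bond vectors v1, v2, u1 in the fixed coordinate system]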
RDC Equations for a Protein Portion
[Figure: bond vectors v1, v2, u1]
RDC Equations for a Protein Portion
Recursive representation is possible!
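The recursive representation can be illustrated with a short forward-kinematics sketch. It uses the NeRF (natural extension reference frame) construction, a standard way to place each backbone atom from the three before it; this is a swapped-in technique and may differ from the rotation-matrix formulation in the cited papers. The bond lengths and angles are approximate idealized values assumed for the sketch, and all function names are hypothetical. Once the first N, CA and C atoms fix the coordinate system, every later atom, and hence every backbone bond vector, is a function of the dihedral angles alone.

```python
import numpy as np

def place_atom(a, b, c, bond_length, bond_angle, torsion):
    """NeRF: place atom d with |cd| = bond_length, angle(b, c, d) = bond_angle
    and dihedral(a, b, c, d) = torsion (angles in radians)."""
    bc = (c - b) / np.linalg.norm(c - b)
    n = np.cross(b - a, bc)
    n = n / np.linalg.norm(n)
    m = np.cross(n, bc)
    d_local = bond_length * np.array([-np.cos(bond_angle),
                                      np.sin(bond_angle) * np.cos(torsion),
                                      np.sin(bond_angle) * np.sin(torsion)])
    return c + d_local[0] * bc + d_local[1] * m + d_local[2] * n

def dihedral(a, b, c, d):
    """Signed dihedral angle a-b-c-d in radians."""
    b1 = (c - b) / np.linalg.norm(c - b)
    v = (a - b) - np.dot(a - b, b1) * b1
    w = (d - c) - np.dot(d - c, b1) * b1
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))

def build_backbone(phi, psi, omega=np.pi):
    """N, CA, C coordinates of len(phi) residues; phi[0] and psi[-1] are unused."""
    # Approximate idealized geometry (an assumption of this sketch): angstroms, degrees
    b_n_ca, b_ca_c, b_c_n = 1.458, 1.525, 1.329
    a_n_ca_c, a_ca_c_n, a_c_n_ca = np.radians([111.2, 116.2, 121.7])
    # Fixing the first N, CA, C atoms fixes the coordinate system
    n0 = np.zeros(3)
    ca0 = np.array([b_n_ca, 0.0, 0.0])
    c0 = ca0 + b_ca_c * np.array([-np.cos(a_n_ca_c), np.sin(a_n_ca_c), 0.0])
    atoms = [n0, ca0, c0]
    for i in range(1, len(phi)):
        # each new atom depends only on the three previous atoms and one dihedral
        atoms.append(place_atom(*atoms[-3:], b_c_n, a_ca_c_n, psi[i - 1]))   # N_i
        atoms.append(place_atom(*atoms[-3:], b_n_ca, a_c_n_ca, omega))       # CA_i
        atoms.append(place_atom(*atoms[-3:], b_ca_c, a_n_ca_c, phi[i]))      # C_i
    return np.array(atoms)

phi = np.radians([0.0, -57.0, -57.0, -57.0])     # roughly helical values (illustrative)
psi = np.radians([-47.0, -47.0, -47.0, 0.0])
X = build_backbone(phi, psi)
ca_c_bond = X[5] - X[4]                          # CA_1 -> C_1 bond vector, set by the dihedrals
```

Substituting such dihedral-dependent bond vectors into the RDC equations is what replaces the free bond-vector variables with dihedral angles.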
One Equation Per Dihedral Angle is Not Enough!
• Each equation is linear in S, and quartic in either tan(φ) or tan(ψ)
• To be able to solve this system, additional information is needed. Possible scenarios:
  • Additional RDC measurement(s) for each dihedral angle
  • Additional alignment media
  • Additional NOE data
  • Modeling (Ramachandran regions, steric clashes, energy function)
  • Sampling (for alignment tensors)
The RDC-PANDA Structure Determination Package
• Current requirements
  • 2 RDCs per residue to obtain SSE structures
  • Sparse NOEs to pack the SSEs
• Current bottlenecks
  • Missing data (even in long SSEs)
  • Long loops
  • Sampling for computing alignment tensor(s)
  • Sampling for the orientation of the first peptide plane
[1] L. Wang and B. R. Donald. J. Biomol. NMR, 29(3):223–242, 2004.
[2] J. Zeng, J. Boyles, C. Tripathy, L. Wang, A. Yan, P. Zhou, and B. R. Donald. J. Biomol. NMR, [Epub ahead of print], PMID: 19711185, 2009.
When the Saupe Matrix Is Known, the Solution Can Be Found Exactly!
[Figure: ellipse equations for the CH bond vector]
Wang & Donald, 2004; Donald & Martin, 2009.
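A short derivation shows where an ellipse comes from. It is a sketch assuming the standard form of the RDC equation written in the principal frame of S, not necessarily the exact derivation used in the cited papers.

```latex
% Principal frame of S: S = diag(S_xx, S_yy, S_zz), with S_xx + S_yy + S_zz = 0
% Unit bond vector v = (x, y, z), so x^2 + y^2 + z^2 = 1
\frac{D}{D_{\max}} = S_{xx}\,x^2 + S_{yy}\,y^2 + S_{zz}\,z^2
% Eliminate z^2 = 1 - x^2 - y^2:
(S_{xx} - S_{zz})\,x^2 + (S_{yy} - S_{zz})\,y^2 = \frac{D}{D_{\max}} - S_{zz}
```

If S_zz is the principal value of largest magnitude, it is also the largest or the smallest principal value, so both coefficients on the left have the same sign as the right-hand side. The projection of the solution curve onto the xy-plane is then an ellipse (a single point when D/D_max = S_zz), and z = ±√(1 − x² − y²) gives the two branches on the unit sphere.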
Solution Structure Deposited Using RDC-PANDA
Solution structure of FF domain 2 of human transcription elongation factor CA150 (FF2), determined using RDC-PANDA. PDB ID: 2KIQ.
In collaboration with Dr. Zhou's lab.
Problem Formulation: NH and CH RDCs in Two Media
We require measurements for at least 9 consecutive bond vectors (4.5 residues) in 2 media. The goal is to handle overdetermined systems, with more equations than unknowns, as well as experimental errors in the data.
Relationship to Minimization
Relationship to Minimization and SVD
Solving an overconstrained system of linear equations $A s = b$ in the least-squares sense is equivalent to finding the orthogonal projection of $b$ onto the column space of $A$. This is also equivalent to minimizing the sum of squared residuals, $\min_s \lVert A s - b \rVert^2$, which the SVD of $A$ solves directly.
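A minimal sketch of this least-squares step for RDCs: with bond vectors of known orientation, each RDC contributes one row of A that is linear in the 5 Saupe parameters, and the SVD of A gives the minimizer of the squared residual. Function names, the parameter ordering, the synthetic noise-free data and D_max = 1 are illustrative assumptions. At least 5 bond vectors with linearly independent rows are needed; otherwise a singular value is zero and the division below fails.

```python
import numpy as np

def rdc_row(v):
    """Coefficients of (Sxx, Syy, Sxy, Sxz, Syz) in v^T S v for a traceless symmetric S."""
    x, y, z = v / np.linalg.norm(v)
    return np.array([x*x - z*z, y*y - z*z, 2*x*y, 2*x*z, 2*y*z])

def fit_saupe_svd(vectors, rdcs, d_max=1.0):
    """Least-squares Saupe matrix from bond vectors of known orientation and their RDCs."""
    A = d_max * np.array([rdc_row(v) for v in vectors])    # one row per measured RDC
    b = np.asarray(rdcs, dtype=float)
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    s = Vt.T @ ((U.T @ b) / sigma)      # pseudo-inverse solution, minimizes ||A s - b||^2
    sxx, syy, sxy, sxz, syz = s
    return np.array([[sxx, sxy, sxz],
                     [sxy, syy, syz],
                     [sxz, syz, -sxx - syy]])

rng = np.random.default_rng(0)
M = rng.normal(size=(3, 3))
S_true = (M + M.T) / 2
S_true -= np.trace(S_true) / 3 * np.eye(3)          # hypothetical traceless symmetric S
vectors = rng.normal(size=(12, 3))
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
rdcs = np.array([v @ S_true @ v for v in vectors])  # noise-free synthetic RDCs, D_max = 1
S_fit = fit_saupe_svd(vectors, rdcs)
print(np.allclose(S_fit, S_true))                    # True
```

With noisy RDCs the same code returns the least-squares estimate rather than the exact matrix.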
Relationship to Minimization and SVD
When the bond vectors are unknown, the matrix depends on the dihedral angles: $A(\phi_i, \psi_i)\, s = b$. Solving such a system of nonlinear equations is not trivial: the corresponding minimization problem has multiple local minima.
Advantages
If the minimization problem is solved, then:
• Packed SSEs and loops can be computed without additional NOE data
• Saupe matrices for each alignment medium can be computed without sampling
• Missing values are handled robustly
The Algorithm: Initialization Using a Helix
• Initialize (φi, ψi) to helix values
• Compute an initial approximation for Si using SVD
• Compute (φi, ψi) using tree search and minimization
• Update Si using SVD
The Algorithm: Protein Portion
• Initialize Si to the computed approximations
• Compute (φi, ψi) using tree search and minimization
• Update Si using SVD
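The alternation between SVD updates of the Saupe matrices and per-angle minimization can be sketched on a toy model. This is an illustrative block-coordinate descent written for this rewrite, not the RDC-PANDA code: each bond vector here depends on a single hypothetical angle (a rotation of a fixed base vector about a fixed axis), whereas in a real backbone every bond vector depends on all preceding dihedrals; a dense grid search stands in for the tree search; D_max = 1. Because the Saupe fit is an exact least-squares minimizer and each angle update keeps the current value as a candidate, the recorded error never increases, although it may settle in a local minimum.

```python
import numpy as np

def rotate(v0, k, theta):
    """Rodrigues rotation of v0 about unit axis k; theta may be a scalar or a 1-D array."""
    t = np.asarray(theta, dtype=float)[..., None]
    return v0 * np.cos(t) + np.cross(k, v0) * np.sin(t) + k * np.dot(k, v0) * (1.0 - np.cos(t))

def fit_saupe(vs, r):
    """Least-squares (SVD-based lstsq) traceless symmetric S for RDCs r, with D_max = 1."""
    A = np.array([[x*x - z*z, y*y - z*z, 2*x*y, 2*x*z, 2*y*z] for x, y, z in vs])
    sxx, syy, sxy, sxz, syz = np.linalg.lstsq(A, r, rcond=None)[0]
    return np.array([[sxx, sxy, sxz], [sxy, syy, syz], [sxz, syz, -sxx - syy]])

def total_error(vs, Ss, R):
    """Sum of squared RDC residuals over all media."""
    V = np.asarray(vs)
    return float(sum(np.sum((np.einsum('ni,ij,nj->n', V, S, V) - r) ** 2)
                     for S, r in zip(Ss, R)))

def alternate(bases, axes, R, thetas0, n_iter=5, n_grid=720):
    """Toy block-coordinate descent: refit every S, then re-minimize each angle."""
    thetas = np.array(thetas0, dtype=float)
    grid = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    history = []
    for _ in range(n_iter):
        vs = [rotate(b, k, t) for b, k, t in zip(bases, axes, thetas)]
        Ss = [fit_saupe(vs, r) for r in R]                 # angles fixed: one S per medium
        history.append(total_error(vs, Ss, R))
        for j in range(len(thetas)):                       # S fixed: each angle separately
            cand = np.append(grid, thetas[j])              # keep current value as a candidate
            V = rotate(bases[j], axes[j], cand)            # (n_grid + 1, 3) candidate vectors
            err = sum((np.einsum('gi,ij,gj->g', V, S, V) - r[j]) ** 2 for S, r in zip(Ss, R))
            thetas[j] = cand[np.argmin(err)]
        vs = [rotate(b, k, t) for b, k, t in zip(bases, axes, thetas)]
        history.append(total_error(vs, Ss, R))
    return thetas, Ss, history

# Hypothetical synthetic data: 8 bond vectors, 2 alignment media, noise-free RDCs
rng = np.random.default_rng(1)
unit = lambda u: u / np.linalg.norm(u, axis=-1, keepdims=True)
bases, axes = unit(rng.normal(size=(8, 3))), unit(rng.normal(size=(8, 3)))
true_thetas = rng.uniform(-np.pi, np.pi, 8)
S_true = []
for _ in range(2):
    M = rng.normal(size=(3, 3))
    S_true.append((M + M.T) / 2 - np.trace(M) / 3 * np.eye(3))
v_true = [rotate(b, k, t) for b, k, t in zip(bases, axes, true_thetas)]
R = [np.array([v @ S @ v for v in v_true]) for S in S_true]
thetas, Ss, history = alternate(bases, axes, R, np.zeros(8))
```

Inspecting `history` shows the error after each half-step; restarting from several initial angle sets is one way to explore the multiple local minima mentioned above.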
The Algorithm: Computing Dihedrals
• Minimize each of the RMSD terms as a univariate function of one dihedral (φ1, ψ1, …, φn, ψn)
• Iteratively minimize the RMSD function
• Compute the list of best solutions
Advantages
• The algorithm converges, since every step decreases (or leaves unchanged) the RMSD function
• If the data were "perfect", the solutions to the minimization problem would be the roots of the polynomials in the RMSD terms, and the algorithm would find ALL of them
• The minima of the RMSD terms give a good collection of initial structures for finding local and global minima
• Missing values are handled robustly
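To see why perfect data lead to polynomial roots, consider a single bond vector obtained by rotating a fixed vector v0 about a fixed axis k by an angle θ, with S known. Then $v^T S v$ is a trigonometric polynomial of degree 2 in θ, and the Weierstrass substitution t = tan(θ/2) turns the RDC equation into a polynomial of degree at most 4 in t, whose real roots give every matching angle. The single-rotation model, this particular substitution and the function names are assumptions of this sketch; the cited work states its quartics in tan(φ) or tan(ψ). The angle θ = π (t = ∞) is not covered.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

def rotate(v0, k, theta):
    """Rodrigues rotation of v0 about axis k by angle theta."""
    k = k / np.linalg.norm(k)
    return (v0 * np.cos(theta) + np.cross(k, v0) * np.sin(theta)
            + k * np.dot(k, v0) * (1.0 - np.cos(theta)))

def angles_matching_rdc(S, v0, k, r):
    """All theta in (-pi, pi) with v^T S v = r, where v = rotate(v0, k, theta) and D_max = 1."""
    k = k / np.linalg.norm(k)
    c = k * np.dot(k, v0)        # component along the axis (unchanged by the rotation)
    p = v0 - c                   # component perpendicular to the axis
    w = np.cross(k, v0)          # p turned by 90 degrees about k
    # v(theta) = c + p cos(theta) + w sin(theta), so v^T S v is a degree-2 trigonometric polynomial
    cSc, cSp, cSw = float(c @ S @ c), float(c @ S @ p), float(c @ S @ w)
    pSp, pSw, wSw = float(p @ S @ p), float(p @ S @ w), float(w @ S @ w)
    # t = tan(theta/2): cos = (1 - t^2)/(1 + t^2), sin = 2t/(1 + t^2);
    # multiplying the equation by (1 + t^2)^2 leaves a polynomial of degree <= 4 in t
    one_plus, one_minus, two_t = P([1.0, 0.0, 1.0]), P([1.0, 0.0, -1.0]), P([0.0, 2.0])
    quartic = ((cSc - float(r)) * one_plus**2
               + 2 * cSp * one_minus * one_plus
               + 2 * cSw * two_t * one_plus
               + pSp * one_minus**2
               + 2 * pSw * one_minus * two_t
               + wSw * two_t**2)
    roots = quartic.roots()
    real_t = roots[np.abs(roots.imag) < 1e-9].real
    return np.sort(2.0 * np.arctan(real_t))

S = np.array([[0.3, 0.1, 0.0],
              [0.1, -0.5, 0.2],
              [0.0, 0.2, 0.2]])              # hypothetical traceless symmetric S
v0 = np.array([1.0, 0.0, 0.0])
k = np.array([0.3, 0.4, 1.0])
theta_true = 0.7
v = rotate(v0, k, theta_true)
r = v @ S @ v                                # noise-free RDC for the true angle
solutions = angles_matching_rdc(S, v0, k, r) # up to 4 angles, including theta_true
```

Each returned angle reproduces the measured value exactly; with a single RDC the extra roots are genuine ambiguities, which is one reason additional RDCs or media are needed.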
Preliminary Results: Ubiquitin Helix
Conformation of residues 25-31 of the helix of human ubiquitin, computed using NH and CH RDCs in two media (red), superimposed on the same portion of the high-resolution X-ray structure (PDB ID: 1UBQ) (green). The backbone RMSD is 0.58 Å.
Preliminary Results: Ubiquitin Strand
Conformation of residues 2-7 of the beta-strand of human ubiquitin, computed using NH and CH RDCs in two media, superimposed on the same portion of the high-resolution X-ray structure (PDB ID: 1UBQ). The backbone RMSD is 1.151 Å.
Conclusions
• A complete and exhaustive search over the space of all structures that minimize the RDC fit function seems feasible, given our understanding of the structure of the solution space.
• Possible and exciting extensions to more or different data
Funding: NIH
Thank you!
Comparison: Data Requirements vs. Accuracy (Ubiquitin)
[Figure: comparison of data sparsity and accuracy]