
Tolga Can and Yuan-Fang Wang

CTSS: A Robust and Efficient Method for Protein Structure Alignment Based on Local Geometrical and Biological Features


Presentation Transcript


  1. CTSS: A Robust and Efficient Method for Protein Structure Alignment Based on Local Geometrical and Biological Features Tolga Can and Yuan-Fang Wang

  2. Introduction
     • Importance of discovering structural relationships between proteins
     • Structural alignment: NP-hard
     • Protein structure representation: no standard as in sequence alignment
     • Many algorithms
       • Inter-atomic distances (CE, DALI)
       • SSE vectors (VAST, 3D-Lookup)
     • Different similarity measures
       • RMSD, p-value, etc.

  3. Problem Definition
     • Given a protein structure, find similar protein structures from a database of protein structures.
     [Figure: a query chain compared ("? =") with database chains; chain labels shown: 1l3l:C, 2spc:A, 1jig:A, 1fse:A, 1jek:B, 1alu:_, 1kzu:B, 1k61:D, 1wdc:A, 1nkd:_, 1fmh:A, 1gl2:A, 1et1:A]

  4. Protein Structure? We use Cα coordinates to represent the protein structure.

     PDB File:
     HEADER PHEROMONE 20-DEC-95 2ERL
     ..................................
     SEQRES 1 40 ASP ALA CYS GLU GLN ALA
     ..................................
     ATOM  1 N  ASP 1 -1.115 8.537 7.075
     ATOM  2 CA ASP 1 -1.925 7.470 6.547
     ATOM  3 C  ASP 1 -2.009 6.333 7.522
     ATOM  4 O  ASP 1 -1.467 6.394 8.624
     ATOM  5 CB ASP 1 -1.526 6.993 5.163
     ATOM  6 N  ALA 2 -2.745 5.280 7.165
     ATOM  7 CA ALA 2 -2.945 4.152 7.987
     ATOM  8 C  ALA 2 -1.606 3.448 8.305
     ATOM  9 O  ALA 2 -1.440 3.010 9.454
     ATOM 10 CB ALA 2 -3.966 3.256 7.436
     ATOM 11 N  CYS 3 -0.777 3.267 7.329
     ATOM 12 CA CYS 3  0.570 2.624 7.511
     ATOM 13 C  CYS 3  1.328 3.308 8.626
     ATOM 14 O  CYS 3  1.802 2.679 9.562
     ATOM 15 CB CYS 3  1.351 2.667 6.209
     ATOM 16 SG CYS 3  2.981 1.901 6.318
     ..................................
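
As a rough illustration of the Cα representation, the sketch below pulls the CA records out of the simplified, whitespace-separated ATOM lines shown on the slide. The function name `parse_ca_coordinates` and the eight-field layout are assumptions made for this example; real PDB files use fixed-width columns (and carry extra fields such as chain ID and occupancy), so a production parser should slice by column or use an established library.

```python
def parse_ca_coordinates(lines):
    """Return [(residue_name, residue_number, (x, y, z)), ...] for CA atoms.

    Assumes the simplified record layout from the slide:
    ATOM serial atom_name residue_name residue_number x y z
    """
    ca_atoms = []
    for line in lines:
        fields = line.split()
        if len(fields) == 8 and fields[0] == "ATOM" and fields[2] == "CA":
            x, y, z = (float(v) for v in fields[5:8])
            ca_atoms.append((fields[3], int(fields[4]), (x, y, z)))
    return ca_atoms


# A few records copied from the slide's 2ERL excerpt.
excerpt = """\
ATOM 1 N ASP 1 -1.115 8.537 7.075
ATOM 2 CA ASP 1 -1.925 7.470 6.547
ATOM 3 C ASP 1 -2.009 6.333 7.522
ATOM 6 N ALA 2 -2.745 5.280 7.165
ATOM 7 CA ALA 2 -2.945 4.152 7.987
ATOM 12 CA CYS 3 0.570 2.624 7.511
""".splitlines()

ca_trace = parse_ca_coordinates(excerpt)
for residue_name, residue_number, xyz in ca_trace:
    print(residue_name, residue_number, xyz)
```

The resulting list keeps one 3D point per amino acid, which is the curve that the later slides smooth and describe by curvature and torsion.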

  5. Protein Structure The Cα coordinates of a protein define a curve in 3D space.

  6. Spline Approximation We smooth the Cα curve based on secondary structure information.

  7. Spline Approximation We smooth the Cα curve based on secondary structure information. [Figure labels: Turn, Helix]
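
The slides do not say which spline is used or how the secondary structure information steers the smoothing, so the following is only a minimal sketch of one common choice: a uniform cubic B-spline that treats the Cα positions as control points. Such a curve approximates rather than interpolates the points, which damps coordinate noise. The function name `bspline_smooth` and the parameter `samples_per_segment` are hypothetical, and the sketch ignores secondary structure entirely.

```python
def bspline_smooth(points, samples_per_segment=4):
    """Sample a uniform cubic B-spline that uses `points` as control points."""
    if len(points) < 4:
        raise ValueError("need at least four control points")
    curve = []
    for i in range(len(points) - 3):
        p0, p1, p2, p3 = points[i:i + 4]
        for k in range(samples_per_segment):
            u = k / samples_per_segment
            # Uniform cubic B-spline basis functions (they sum to 1).
            b0 = (1 - u) ** 3 / 6
            b1 = (3 * u**3 - 6 * u**2 + 4) / 6
            b2 = (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6
            b3 = u**3 / 6
            curve.append(tuple(
                b0 * a + b1 * b + b2 * c + b3 * d
                for a, b, c, d in zip(p0, p1, p2, p3)
            ))
    return curve


# A zigzag trace (made-up coordinates): the y values alternate between +1 and -1.
zigzag = [(0.0, 1.0, 0.0), (1.0, -1.0, 0.0), (2.0, 1.0, 0.0),
          (3.0, -1.0, 0.0), (4.0, 1.0, 0.0)]
smoothed = bspline_smooth(zigzag)
print(smoothed[0])  # the y amplitude is reduced compared with the input
```

Because the smoothed curve is defined by polynomials, its derivatives are well behaved, which matters for the curvature and torsion estimates that follow.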

  8. Matching Two Curves Are they similar?

  9. Curvature and Torsion
     • Curvature: measure of how far the curve deviates from being linear
     • Torsion: measure of how far the curve deviates from being planar
     • Fundamental Theorem of Space Curves: If two single-valued continuous functions κ(s) and τ(s) are given for s > 0, then there exists exactly one space curve, determined except for orientation and position in space (i.e., up to a Euclidean motion), where s is the intrinsic arc length, κ is the curvature, and τ is the torsion.
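
For a curve r(t) with any smooth parametrization, the standard formulas are κ = |r' × r''| / |r'|³ and τ = ((r' × r'') · r''') / |r' × r''|². The sketch below estimates both from sampled points with central finite differences. This is an illustrative assumption, not necessarily how the authors compute the values (the slides suggest the derivatives come from the fitted spline). The name `curvature_torsion` is hypothetical, the samples are assumed to be evenly spaced in the parameter and densely sampled, and repeated points would cause a division by zero.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def curvature_torsion(points):
    """Estimate (curvature, torsion) at the interior samples 2 .. n-3."""
    result = []
    for i in range(2, len(points) - 2):
        pm2, pm1, p0, pp1, pp2 = points[i - 2:i + 3]
        # Central differences with unit parameter step; curvature and
        # torsion do not depend on the choice of parametrization.
        d1 = tuple((a - b) / 2 for a, b in zip(pp1, pm1))
        d2 = tuple(a - 2 * b + c for a, b, c in zip(pp1, p0, pm1))
        d3 = tuple((a - 2 * b + 2 * c - d) / 2
                   for a, b, c, d in zip(pp2, pp1, pm1, pm2))
        c = _cross(d1, d2)
        c_norm = math.sqrt(_dot(c, c))
        speed = math.sqrt(_dot(d1, d1))
        kappa = c_norm / speed**3
        # Torsion is undefined where the curve is (numerically) straight.
        tau = _dot(c, d3) / c_norm**2 if kappa > 1e-9 else 0.0
        result.append((kappa, tau))
    return result


# Circular helix (cos t, sin t, 0.5 t): analytically kappa = 0.8, tau = 0.4.
helix = [(math.cos(0.01 * k), math.sin(0.01 * k), 0.5 * 0.01 * k)
         for k in range(200)]
features = curvature_torsion(helix)
print(features[50])
```

On raw Cα traces the spacing between samples is coarse, so estimates of this kind are noisy; that is one reason to differentiate a smoothed curve instead.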

  10. Curvature and Torsion
      • They are invariant to rotation and translation.
      • They are localized.
      [Figure panels: Curvature, Torsion]

  11. Feature Extraction
      • For each amino acid, a (curvature, torsion) tuple is computed, and the secondary structure assignment is gathered from the PDB web site.
      • This yields a sequence of n three-dimensional feature vectors (curvature, torsion, secondary structure), where n is the number of amino acids in the protein.
      [Figure: torsion vs. curvature plot, plus secondary structure information; the 3rd dimension is not shown]

  12. Indexing the Features
      • Why is indexing necessary?
      • Hash table (shown in 2D below; the 3rd dimension is the SSE type)
      [Figure: grid over curvature and torsion axes with one cell marked "A Hash Bin"]
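
One way to picture the hash table is to quantize curvature and torsion into fixed-width bins and use (curvature bin, torsion bin, SSE type) as the key. The sketch below makes that concrete under stated assumptions: the bin width of 0.1, the names `bin_key` and `build_index`, the protein IDs, and all feature values are made up for illustration, since the slides give no bin sizes or data.

```python
import math
from collections import defaultdict


def bin_key(curvature, torsion, sse, bin_width=0.1):
    """Map one residue's features to a hash bin (assumed uniform bins)."""
    return (math.floor(curvature / bin_width),
            math.floor(torsion / bin_width),
            sse)


def build_index(database, bin_width=0.1):
    """database: {protein_id: [(curvature, torsion, sse), ...]}."""
    index = defaultdict(list)
    for protein_id, residue_features in database.items():
        for residue_idx, (kappa, tau, sse) in enumerate(residue_features):
            key = bin_key(kappa, tau, sse, bin_width)
            index[key].append((protein_id, residue_idx))
    return index


# Hypothetical feature values; "H", "E", "T" stand for helix, strand, turn.
database = {
    "protA": [(0.85, 0.45, "H"), (0.86, 0.44, "H"), (0.25, -0.25, "T")],
    "protB": [(0.84, 0.43, "H"), (0.15, 1.25, "E")],
}
index = build_index(database)
print(index[(8, 4, "H")])
```

Residues with similar local geometry and the same secondary structure type land in the same bin, so a query residue can retrieve its candidates with one dictionary lookup instead of a scan over the whole database.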

  13. Query Execution
      • Hierarchical approach:
        • Pruning before detailed pairwise alignment
      • Accumulate vote
        • vote_protein++
      • Normalize vote
        • vote_protein / length_protein
      • Threshold
      [Figure: hash table]
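
The screening steps on this slide (accumulate, normalize, threshold) can be sketched as follows. The function name `screen`, the threshold value, and the toy index are assumptions for illustration. In particular, this version lets a query residue cast one vote for every database residue in its bin, which the slides do not specify.

```python
from collections import Counter


def screen(query_keys, index, lengths, threshold):
    """Return [(protein_id, normalized_vote), ...] above the threshold."""
    votes = Counter()
    for key in query_keys:
        for protein_id, _residue_idx in index.get(key, ()):
            votes[protein_id] += 1                      # vote_protein++
    scores = {p: v / lengths[p] for p, v in votes.items()}  # normalize
    kept = [(p, s) for p, s in scores.items() if s >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy index: hash bin -> [(protein_id, residue_index), ...] (made-up data).
index = {
    (8, 4, "H"): [("protA", 0), ("protA", 1), ("protB", 0)],
    (2, -3, "T"): [("protA", 2)],
    (1, 12, "E"): [("protB", 1)],
}
lengths = {"protA": 3, "protB": 2}
query_keys = [(8, 4, "H"), (2, -3, "T")]

candidates = screen(query_keys, index, lengths, threshold=0.6)
print(candidates)
```

Only the proteins that survive the threshold go on to the more expensive pairwise alignment, which is what makes the approach hierarchical.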

  14. Query Execution
      • Pairwise alignment by the Smith-Waterman (SW) dynamic programming technique is performed after the screening process.
      [Figure: distance matrix with the SW alignment path and a gap, 1l3l:C vs. 1fse:A; length: 63, RMSD: 1.61 Å]
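
A minimal sketch of Smith-Waterman local alignment over per-residue feature tuples is given below. The scoring scheme is an assumption made for this example (a fixed bonus minus the Euclidean distance between feature tuples, with a linear gap penalty); the slides only state that a distance matrix feeds the SW step. The names `smith_waterman`, `match_bonus`, and `gap_penalty` are hypothetical.

```python
import math


def smith_waterman(query, target, match_bonus=1.0, gap_penalty=0.5):
    """Local alignment; returns (best_score, [(query_idx, target_idx), ...])."""
    def score(a, b):
        # Similar features score near match_bonus, distant ones go negative.
        return match_bonus - math.dist(a, b)

    n, m = len(query), len(target)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + score(query[i - 1], target[j - 1]),
                          H[i - 1][j] - gap_penalty,   # gap in target
                          H[i][j - 1] - gap_penalty)   # gap in query
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(query[i - 1], target[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap_penalty:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Made-up (curvature, torsion) features; C has no counterpart in the target.
A, B, C = (0.0, 0.0), (5.0, 0.0), (0.0, 5.0)
best_score, aligned = smith_waterman([A, B, C, A, B], [A, B, A, B])
print(best_score, aligned)
```

In this toy case the alignment matches A, B on both sides of the unmatched C and pays one gap penalty. The aligned residue pairs are what a subsequent superposition step would use to compute an RMSD.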

  15. SW Alignment Result [Figure: 1fse:A aligned with 1l3l:C]

  16. Sample Query Results
      • Query: 1faz:A, database: 1938 protein chains
      • Screening time: 18 seconds
      • Pairwise alignment time: 29 seconds
      [Figures: 1faz:A & 1ytf:D; 1faz:A & 1dj7:A. Captions: length: 38, RMSD: 3.68 Å; length: 42, RMSD: 2.8 Å]

  17. Sample Query Results
      • Query: 1b16:A, database: 1938 protein chains
      • Screening time: 25 seconds
      • Pairwise alignment time: 68 seconds
      [Figures: 1b16:A & 1h05:A; 1b16:A & 1qp8:A. Captions: length: 35, RMSD: 3.26 Å; length: 35, RMSD: 1.58 Å]

  18. Current and Future Work
      • Evaluation of
        • Accuracy
          • Comparison with SCOP classification
        • Efficiency
          • Comparison with other techniques such as CE or DALI
      • Better index structures
        • Faster and more accurate screening of candidates
      • Incorporating biological and chemical properties of amino acids into the structure signatures of proteins

  19. Conclusions
      • A new method for protein structure alignment is presented:
        • Extracted structural features are:
          • Compact: O(n)
          • Localized: computed for each amino acid
          • Robust: error handling by spline approximation
          • Invariant: suitable for indexing
          • Meaningful: biological and chemical properties can be incorporated easily
        • An indexing technique is deployed to avoid an exhaustive scan of the structure database
      • Experimental results show that this method is suitable for finding structural motifs.

  20. Thank you for your attention! For More Information: Tolga Can Department of Computer Science University of California at Santa Barbara Santa Barbara, CA 93106, U.S. Email: tcan@cs.ucsb.edu URL: http://www.cs.ucsb.edu/~tcan/CTSS/
