BioInfoSummer07: ICE-EM Summer Symposium in BioInformatics, Dec 10-14, Australian National University, Canberra, Australia
CLePAPS: Fast Pair Alignment of Protein Structures Based on Conformational Letters
Sheng WANG, Wei-Mou ZHENG*
Institute of Theoretical Physics, CAS
zheng@itp.ac.cn
*To whom correspondence should be addressed
Outline
• [1] Introduction
• [2] The flow chart of the CLePAPS algorithm
  • [2-1] Find SFPs by CLeSUM
  • [2-2] Construct the ‘Star-Tree’
  • [2-3] The ‘Zoom-In’ strategy
• [3] Result & Discussion
Chapter[1]: Introduction
• Structure alignment: a self-consistent problem
  • Correspondence <=> rigid transformation (each determines the other)
  • However, when aligning two protein structures, at the beginning we know neither the transformation nor the correspondence.
• Existing methods: DALI, CE; VAST; STRUCTAL, ProSup
• CLePAPS: Conformational Letters based Pairwise Alignment of Protein Structures
  • Initialization + iteration
  • Similar Fragment Pairs (SFPs)
  • Anchor-based
  • Alignment = as many consistent SFPs as possible
Chapter[1]: Introduction
[Figure: anchor-based superposition. After superposing on the Anchor SFP, each other SFP is either consistent or inconsistent with it.]
Alignment = collect as many consistent SFPs as possible
Chapter[1]: Introduction
Structure alignment => a self-consistent problem
Flow chart for aligning ProteinA and ProteinB:
[1] Initial correspondence (Anchor SFP)
[2] Optimal transformation for the correspondence
[3] Convergence? Yes: End. No: go to [4].
[4] Correspondence update (adding consistent SFPs), then back to [2]
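The loop in this flow chart can be sketched in a few lines. The following is a minimal toy, not the CLePAPS implementation: it works in 2D so that the optimal rigid transformation has a closed form, it updates the correspondence by nearest neighbour within a cutoff instead of by adding consistent SFPs, and every name in it (`fit_rigid_2d`, `apply_rigid_2d`, `align`, `cutoff`) is hypothetical.

```python
import math

def fit_rigid_2d(a_pts, b_pts):
    """Least-squares rotation angle and translation mapping a_pts onto b_pts (2D toy)."""
    n = len(a_pts)
    cax = sum(p[0] for p in a_pts) / n
    cay = sum(p[1] for p in a_pts) / n
    cbx = sum(p[0] for p in b_pts) / n
    cby = sum(p[1] for p in b_pts) / n
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(a_pts, b_pts):
        ax, ay, bx, by = ax - cax, ay - cay, bx - cbx, by - cby
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, (cbx - (c * cax - s * cay), cby - (s * cax + c * cay))

def apply_rigid_2d(p, theta, t):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

def align(a, b, anchor, cutoff=1.0, max_iter=20):
    """Alternate between fitting a transformation and updating the correspondence."""
    pairs = list(anchor)                      # [1] initial correspondence
    theta, t = 0.0, (0.0, 0.0)
    for _ in range(max_iter):
        # [2] optimal transformation for the current correspondence
        theta, t = fit_rigid_2d([a[i] for i, _ in pairs], [b[j] for _, j in pairs])
        # [4] correspondence update (assumption: nearest neighbour within cutoff)
        new_pairs = []
        for i, p in enumerate(a):
            q = apply_rigid_2d(p, theta, t)
            j = min(range(len(b)), key=lambda k: math.dist(q, b[k]))
            if math.dist(q, b[j]) <= cutoff:
                new_pairs.append((i, j))
        if not new_pairs or new_pairs == pairs:   # [3] convergence test
            break
        pairs = new_pairs
    return pairs, theta, t

# Toy data: b is a rotated and shifted copy of a; the anchor covers only three points.
a = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0), (3.0, 1.0), (4.0, 0.0), (5.0, 2.0)]
b = [apply_rigid_2d(p, 0.5, (3.0, -2.0)) for p in a]
pairs, theta, t = align(a, b, anchor=[(0, 0), (1, 1), (2, 2)])
print(len(pairs), round(theta, 3))
```

Starting from a three-point anchor, the loop recovers the full correspondence once the transformation fitted on the anchor is applied. A poor anchor would instead drive the same loop into a wrong fixed point, which is the Local Trap the later slides address.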
Chapter[1]: Introduction
Four Main Problems
[1] How can we find SFPs as fast as possible?
[2] How can we balance Specificity and Sensitivity of the found SFPs?
[3] How can we avoid a Local Trap start?
[4] How can we speed up convergence without falling into a Local Trap?
An example of LOCAL TRAP
Chapter[2]: The flow chart of the CLePAPS algorithm
Part_I: SFP
  • Find SFPs by CLeSUM
Part_II: ‘Star-Tree’ (Specificity)
  • SFP List (width 20)
  • Star-Tree construction: Top K for anchor, Top J for neighbor
  • Initial correspondence: select an Optimal Anchor SFP
Part_III: ‘Zoom-In’ (Sensitivity)
  • SFP List (width 8)
  • d1 blank-filling => First Update
  • d2 blank-filling => Second Update
  • d3 blank-filling => Third Update
  • Correspondence update: adding consistent SFPs without a Local Trap while speeding up convergence
Final Alignment
Chapter[2-1]: Find SFPs by CLeSUM
Part_I: SFP (Find SFPs by CLeSUM)
Hint:
  • SFP: Similar Fragment Pair
  • CLeSUM: Conformational Letter SUbstitution Matrix
Chapter[2-1]: Find SFPs by CLeSUM
The main difference between CLePAPS and other existing algorithms for structure alignment is the use of Conformational Letters.
Conformational letters = discretized states of 3D segmental conformations.
A letter = a cluster of combinations of the three angles formed by the Cα pseudobonds of four contiguous residues (obtained by clustering according to the probability distribution).
Fig.1 Centers of 17 conformational letters
Chapter[2-1]: Find SFPs by CLeSUM
Similarity between conformational letters
CLeSUM: Conformational Letter SUbstitution Matrix
M_ij = 20 * log2( P_ij / (P_i * P_j) )
• Comparable to BLOSUM83, H ~ 1.05
• Constructed using FSSP representatives
• evolutionary + geometric
[Figure: the CLeSUM matrix, with the typical helix and typical sheet letters marked]
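As an illustration of the log-odds formula above, here is a small sketch that turns counts of aligned letter pairs into a substitution matrix. The counts and the two-letter alphabet are made up, rounding to integers is an assumption borrowed from BLOSUM-style matrices, and `log_odds_matrix` is a hypothetical name; the real CLeSUM was built from FSSP representatives.

```python
import math

def log_odds_matrix(pair_counts, scale=20):
    """M_ij = scale * log2(P_ij / (P_i * P_j)) from counts of unordered aligned pairs."""
    total = sum(pair_counts.values())
    joint = {}
    # Symmetrize: each unordered pair (a, b) contributes half its weight to (a, b) and (b, a).
    for (a, b), n in pair_counts.items():
        joint[(a, b)] = joint.get((a, b), 0.0) + n / (2 * total)
        joint[(b, a)] = joint.get((b, a), 0.0) + n / (2 * total)
    # Marginal probability of each letter.
    marginal = {}
    for (a, _), p in joint.items():
        marginal[a] = marginal.get(a, 0.0) + p
    return {
        (a, b): round(scale * math.log2(p / (marginal[a] * marginal[b])))
        for (a, b), p in joint.items()
    }

# Hypothetical counts for a two-letter alphabet (not real CLeSUM data).
toy_counts = {("H", "H"): 40, ("E", "E"): 40, ("H", "E"): 20}
M = log_odds_matrix(toy_counts)
print(M)
```

Pairs that co-occur more often than chance get positive scores and pairs that co-occur less often get negative scores, which is what lets the score rank fragment pairs by how informative the match is.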
Chapter[2-1]: Find SFPs by CLeSUM
• SFP => highly scored string pair
• Fast search for SFPs by string comparison
• CLeSUM similarity score => importance of SFPs
• Guided by CLeSUM scores, only the top few SFPs need to be examined to determine the superposition for alignment, and hence a reliable greedy strategy becomes possible.
[Figure: example of a seed, a similar fragment pair between Protein A and Protein B (the smaller protein)]
An example of finding SFPs: aligning 1cewI and 1molA
To find SFPs, we take the shorter sequence as the template and record every position pair whose score is higher than the threshold; all fragments have the same given length.
>1molA
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
>1cewI
RRCECECAJGBIHHHHHHHHIIHHHIIGPGBLDFFCPLDPLEEFEDPOLCEEEEEEDEFDEAGCAKLAJGKHHIIMNGKLQQQDEEEDEEEEEBPKKOGEEDPLEEER
Similar Fragment Pairs (SFPs), 1molA fragment / 1cewI fragment:
FEDECCGA / OLCEEEEE
FEDPLDEQ / EEDPLEEE
PLDDEEED / PLEEFEDP
CEDEEEEE / EEDEEEEE
HHHHHHHH / AJGKHHII
Score rank labels in the figure: 1 2 3 4 5
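The search described above amounts to comparing fixed-width windows of two letter strings. The sketch below assumes a toy scoring function (+2 for identical letters, -1 otherwise) standing in for CLeSUM, whose values are not given in these slides; `find_sfps`, `toy_score` and the threshold value are hypothetical.

```python
def find_sfps(seq_a, seq_b, width, threshold, score):
    """Return (score, i, j) hits, best first; i indexes the shorter string (the template)."""
    template, other = (seq_a, seq_b) if len(seq_a) <= len(seq_b) else (seq_b, seq_a)
    hits = []
    for i in range(len(template) - width + 1):
        for j in range(len(other) - width + 1):
            s = sum(score(template[i + k], other[j + k]) for k in range(width))
            if s > threshold:          # record pairs scoring higher than the threshold
                hits.append((s, i, j))
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits

def toy_score(x, y):
    # Assumption: a stand-in for a CLeSUM lookup, not the real matrix.
    return 2 if x == y else -1

# Small self-check: the only exactly matching width-4 window is CDEF.
toy_hits = find_sfps("ABCDEFGH", "XXCDEFXX", width=4, threshold=7, score=toy_score)

# The conformational-letter strings from the example above.
mol = "RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR"
cew = "RRCECECAJGBIHHHHHHHHIIHHHIIGPGBLDFFCPLDPLEEFEDPOLCEEEEEEDEFDEAGCAKLAJGKHHIIMNGKLQQQDEEEDEEEEEBPKKOGEEDPLEEER"
short, long_ = sorted((mol, cew), key=len)
for s, i, j in find_sfps(mol, cew, width=8, threshold=9, score=toy_score)[:5]:
    print(s, short[i:i + 8], long_[j:j + 8])
```

Because the toy score only rewards identities, its ranking will differ from the CLeSUM ranking shown in the slide; the structure of the search is the point here.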
Chapter[2-2]: Construct the ‘Star-Tree’
Hint:
  • SFP List (width 20) => We create a list of SFPs of length 20 and sort them by CLeSUM score.
  • Top_K & Top_J (J > K) => We take only the Top_K SFPs of the list as candidate Anchor SFPs and check their consistency against the Top_J SFPs, which serve as neighbors.
Chapter[2-2]: Construct the ‘Star-Tree’
Example: selection of the Optimal Anchor SFP
Top K, K = 2; Top J, J = 5
  • Anchor SFP = Top_1: # of consistent SFPs = 4
  • Anchor SFP = Top_2: # of consistent SFPs = 1
The Top_1 SFP is globally supported by three other SFPs, while the Top_2 SFP is supported only by itself.
[Figure: the five SFPs, labelled by score rank 1 to 5]
An example of ‘Star-Tree’ construction: aligning 1cewI and 1molA
[Figure: ‘Star-Tree’ view. With the Top_1 SFP as anchor, # of consistent SFPs = 4; with the Top_2 SFP as anchor, # of consistent SFPs = 1.]
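The anchor selection in this example can be sketched as a vote count. The geometric consistency test itself (superpose on the anchor, then check whether a neighbor SFP agrees) is not spelled out in these slides, so the sketch takes it as a caller-supplied predicate; `select_anchor`, `is_consistent` and the toy consistency table are hypothetical.

```python
def select_anchor(sfps, top_k, top_j, is_consistent):
    """Pick the Top_K candidate supported by the most Top_J neighbors.

    sfps must already be sorted by descending score.
    Returns (anchor_index, indices_of_consistent_sfps); the anchor counts itself.
    """
    best = None
    for a in range(min(top_k, len(sfps))):
        supporters = [
            n for n in range(min(top_j, len(sfps)))
            if n == a or is_consistent(sfps[a], sfps[n])
        ]
        if best is None or len(supporters) > len(best[1]):
            best = (a, supporters)
    return best

# Toy reproduction of the example: SFPs are identified by score rank 1..5.
# Rank 1 is consistent with ranks 3, 4 and 5; rank 2 is consistent with nothing else.
ranks = [1, 2, 3, 4, 5]
consistent_with_1 = {3, 4, 5}

def toy_consistent(anchor, other):
    return anchor == 1 and other in consistent_with_1

anchor_index, supporters = select_anchor(ranks, top_k=2, top_j=5, is_consistent=toy_consistent)
print(ranks[anchor_index], [ranks[n] for n in supporters])
```

Choosing the candidate with the most supporters, instead of simply the top-scoring SFP, is what guards against starting from a high-scoring but globally unsupported fragment pair.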
Chapter[2-3]: The ‘Zoom-In’ Strategy
Hint:
  • SFP List (width 8) => We create a list of SFPs of length 8 and sort them by CLeSUM score (descending order).
  • blank-filling => We add consistent SFPs one by one from the SFP List (width 8) to update the correspondence.
Chapter[2-3]: The ‘Zoom-In’ Strategy
Example: d1 > d2 > d3, with d1 = 8 Å, d2 = 6 Å, d3 = 5 Å
[1] The first transformation is determined by the Optimal Anchor SFP alone, so we use a large cutoff d1 to avoid a LOCAL TRAP.
[2] The later transformations are determined by a set of globally consistent SFPs, so we use lower cutoffs to add new consistent SFPs.
An example of the ‘Zoom-In’ strategy (d1 = 8 Å > d2 = 6 Å > d3 = 5 Å)
[Figure: First Update (d1), Second Update (d2) and Third Update (d3), showing elongation and shrinking of the aligned fragments, leading to the Final Alignment]
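The cutoff schedule can be sketched independently of the geometry. In the toy below the "transformation" is a single 1D offset, so that the fit and the deviation are one-liners; the real method fits a 3D rigid transformation. Re-testing every candidate at each stage is an assumption, and `zoom_in`, `fit_offset` and `offset_deviation` are hypothetical names.

```python
def zoom_in(anchor, candidates, fit, deviation, cutoffs=(8.0, 6.0, 5.0)):
    """Refit with a shrinking cutoff: d1 > d2 > d3 (defaults follow the slide)."""
    aligned = [anchor]
    transform = fit(aligned)                  # first transformation: anchor only
    for d in cutoffs:
        # blank-filling: keep every candidate that is consistent under the current fit
        aligned = [anchor] + [c for c in candidates if deviation(c, transform) <= d]
        transform = fit(aligned)              # update the transformation
    return aligned, transform

# 1D stand-ins (assumptions): a pair is (position_in_A, position_in_B).
def fit_offset(pairs):
    return sum(b - a for a, b in pairs) / len(pairs)

def offset_deviation(pair, offset):
    a, b = pair
    return abs(a + offset - b)

anchor = (0.0, 10.0)
candidates = [(1.0, 13.0), (2.0, 9.0), (3.0, 20.0), (4.0, 34.0)]
aligned, offset = zoom_in(anchor, candidates, fit_offset, offset_deviation)
print(aligned, round(offset, 3))
```

In this toy run the pair (3.0, 20.0) is accepted under the loose 8 and 6 cutoffs and dropped at 5, while (4.0, 34.0) is never accepted: the loose first cutoff tolerates the rough anchor-only fit, and the later, tighter cutoffs prune what the refined fit no longer supports.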
Chapter[3]: Result & Conclusion
Four Main Problems and CLePAPS's Solutions
[1] How can we find SFPs as fast as possible?
    => Fast search for SFPs by string comparison alone
[2] How can we balance Specificity and Sensitivity of the found SFPs?
    => Width 20 for Specificity and width 8 for Sensitivity, both sorted by CLeSUM score
[3] How can we avoid a Local Trap start?
    => Optimal Anchor SFP selected through the ‘Star-Tree’
[4] How can we speed up convergence without falling into a Local Trap?
    => Fast ‘Zoom-In’ strategy that converges within only three updates
Chapter[3]: Result & Conclusion
• The Fischer benchmark test
• Database search with CLePAPS
• Multi-Solution of alignments: symmetry, domain move, repeats
• Non-topological alignment and domain shuffling
[Figure: pdb:1ihwA and pdb:1ssoA]
Multi-Solution[1]: Symmetry
[Figure: pdb:4fgf and pdb:8i1b, red structure fixed; Solutions [A], [B] and [C]. Fragments shown for pdb:4fgf: OGCCFEFAHOGEED, OGDCEDFAIOGEED, KGFCEDDAJOGCCC]
Multi-Solution[2]: Domain Move
[Figure: pdb:2gbp and pdb:2liv, blue structure fixed; Domain_1 and Domain_2; Solutions [A] and [B]]
Multi-Solution[3]: Repeats
[Figure: pdb:4cpv and pdb:1osa, blue structure fixed; Repeat_1 and Repeat_2; Solutions [A] and [B]]
Chapter[3]: Result & Conclusion
Conclusion
• CLePAPS distinguishes itself from other existing algorithms for pairwise structure alignment in its use of conformational letters.
  • Conformational letters: aptly balance precision with simplicity
  • CLeSUM: a proper measure of similarity between states
• CLeSUM, extracted from the FSSP database, contains information on structure database statistics, which reduces the chance of accidentally matching two irrelevant helices. evolutionary + geometric = specificity gain
  • For example, two frequent helices are geometrically very similar, but their score is relatively low.
• The CLeSUM similarity score can be used to sort SFPs by importance for a greedy algorithm. Only the top few SFPs need to be examined.
Chapter[3]: Result & Conclusion
1, Fast search for SFPs by string comparison alone
2, Width 20 for specificity + width 8 for sensitivity
3, Optimal Anchor SFP selected by checking consistency
4, Avoid a Local Trap by ‘Zoom-In’
The running time for the 68 pairs of the Fischer benchmark is less than 2% of that of the downloaded CE local version.
Next steps
1, BLOMAPS: fast multiple structure alignment; SFPs → Highly Similar Fragment Blocks (HSFBs)
2, Include biochemical information in CLeSUM by amino acid clustering. Entropic clustering: AVCFIWLMY (h) + DEGHKNPQRST (p)
How a structure is converted to conformational letters (N-Terminal to C-Terminal)
Step 1: take four contiguous Cα atoms
Step 2: compute the two bending angles θ and θ’ and the one torsion angle τ
Step 3: select the most similar of the 17 states
Step 4: assign the code
>1molA
RRFEDECCGAIHHHHHHHHHHHHHHHOMICQEECBLDFQNBFEEEEFEQNNGCPLDDEEEDEEENOGCEDEEEEEEPKKOGFEDPLDEQBGCCR
[Figure: four contiguous Cα atoms with the angles θ, τ and θ’ marked]
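Steps 1 to 3 can be sketched as follows. The angle computations are standard geometry; the assignment step is a simplification that picks the nearest center by squared angular distance, whereas the slides describe letters as clusters of a probability distribution. The two centers below are invented for illustration and are not among the 17 real letter centers; all function names are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bend(p, q, r):
    """Bending angle at q between the pseudobonds q-p and q-r, in radians."""
    u, v = _sub(p, q), _sub(r, q)
    cosang = _dot(u, v) / (_norm(u) * _norm(v))
    return math.acos(max(-1.0, min(1.0, cosang)))

def torsion(p0, p1, p2, p3):
    """Torsion (dihedral) angle about the p1-p2 pseudobond, in radians."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    y = _norm(b2) * _dot(b1, _cross(b2, b3))
    x = _dot(_cross(b1, b2), _cross(b2, b3))
    return math.atan2(y, x)

def angles(ca4):
    """Step 2: (theta, tau, theta') for four contiguous C-alpha positions."""
    p0, p1, p2, p3 = ca4
    return bend(p0, p1, p2), torsion(p0, p1, p2, p3), bend(p1, p2, p3)

def assign_letter(ca4, centers):
    """Step 3 (simplified): nearest center, with the torsion difference wrapped to [-pi, pi)."""
    theta, tau, theta2 = angles(ca4)

    def dist2(c):
        dtau = (tau - c[1] + math.pi) % (2 * math.pi) - math.pi
        return (theta - c[0]) ** 2 + dtau ** 2 + (theta2 - c[2]) ** 2

    return min(centers, key=lambda name: dist2(centers[name]))

# Hypothetical centers (theta, tau, theta'); NOT the real conformational-letter centers.
toy_centers = {"x": (math.pi / 2, -math.pi / 2, math.pi / 2), "y": (1.6, 0.9, 1.6)}
ca4 = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 1.0, 1.0)]
theta, tau, theta2 = angles(ca4)
letter = assign_letter(ca4, toy_centers)
print(letter)
```

Sliding this four-residue window along the chain and emitting one letter per window yields a string like the 1molA example above, which is what makes the fast string comparison possible.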