Alignment of Flexible Molecular Structures
Motivation
• Proteins are flexible. One would like to align proteins modulo the flexibility.
• Hinge and shear protein domain motions (Gerstein, Lesk, Chothia).
• Conformational flexibility in drugs.
Flexible protein alignment without prior hinge knowledge
FlexProt algorithm
• automatically detects flexible regions
• exploits amino acid sequence order
Task: find the largest flexible alignment by decomposing the two molecules into a minimal number of rigid fragment pairs having similar 3-D structure.
FlexProt Main Steps
• Detection of Congruent Rigid Fragment Pairs
• Joining Rigid Fragment Pairs
• Rigid Structural Comparison
• Clustering (removing ins/dels)
Congruent Rigid Fragment Pair
[Figure: structural similarity matrix]
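Detection of Congruent Rigid Fragment Pairs
$$\mathrm{Frag}_{kt}(l) = \bigl(v_k \ldots v_i \ldots v_{k+l-1},\;\; w_t \ldots w_j \ldots w_{t+l-1}\bigr), \qquad \mathrm{RMSD}\bigl(\mathrm{Frag}_{kt}(l)\bigr) < \varepsilon$$
[Figure: similarity matrix with a fragment pair running from (k, t) to (k+l-1, t+l-1); neighboring points v_{i-1}, v_i, v_{i+1} and w_{j-1}, w_j, w_{j+1} are marked]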
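The fragment-pair definition above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the FlexProt implementation: it uses distance RMSD (comparing internal distances, so no superposition is needed) as a stand-in for the superposition RMSD on the slide, and the names `drmsd`, `congruent_fragment_pairs`, and `min_len` are hypothetical. Note that distance RMSD cannot tell a fragment from its mirror image.

```python
import math
from itertools import combinations

def drmsd(a, b):
    """Distance RMSD between two equal-length lists of 3-D points (length >= 2).

    Stand-in for superposition RMSD: compares all internal pairwise distances.
    """
    pairs = list(combinations(range(len(a)), 2))
    sq = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
             for i, j in pairs)
    return math.sqrt(sq / len(pairs))

def congruent_fragment_pairs(v, w, eps, min_len=3):
    """Return (k, t, l) triples with drmsd(v[k:k+l], w[t:t+l]) < eps.

    Each seed of min_len points is extended to the right for as long as
    the fragment pair stays congruent within eps.
    """
    found = []
    for k in range(len(v) - min_len + 1):
        for t in range(len(w) - min_len + 1):
            l = min_len
            if drmsd(v[k:k + l], w[t:t + l]) >= eps:
                continue  # seed is not congruent, skip this start pair
            while (k + l < len(v) and t + l < len(w)
                   and drmsd(v[k:k + l + 1], w[t:t + l + 1]) < eps):
                l += 1    # grow the fragment pair by one point on each side
            found.append((k, t, l))
    return found
```

The seed length of 3 mirrors the triplets shown in the figure; any real implementation would also prune fragment pairs that are contained in longer ones.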
Graph Representation
[Figure: graph node and graph edge]
Graph Representation
• The fragments are in ascending order.
• The gaps (ins/dels) are limited.
• Some overlapping is allowed.
Weight W of an edge a → b:
+ size of the rigid fragment pair (node b)
− gap (ins/del) penalty
− overlapping penalty
Graph Representation
• DAG (directed acyclic graph)
[Figure: DAG with edge weights W_k, W_m, W_n, W_t, W_i]
• “Single-source shortest paths”
• O(|E|+|V|)
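The joining step can be sketched as a best-path search over the DAG of fragment pairs. Because the edge weight rewards fragment size, the sketch maximizes the total weight, which is the same as a shortest-path search on negated weights. Everything specific here is an assumption for illustration: the function name `best_chain`, the parameters `max_gap`, `max_overlap`, `gap_pen`, `overlap_pen`, and the exact weight formula are not taken from the slides.

```python
def best_chain(nodes, max_gap=10, max_overlap=2, gap_pen=0.5, overlap_pen=1.0):
    """Highest-scoring chain of fragment pairs (k, t, l) in ascending order.

    Returns (score, chain). Parameters and weight formula are assumptions.
    """
    order = sorted(nodes)  # ascending k is a topological order (edges need ka < kb)
    best = {n: (float(n[2]), [n]) for n in order}  # a chain may start at any node
    for b in order:
        kb, tb, lb = b
        for a in order:
            ka, ta, la = a
            if not (ka < kb and ta < tb):
                continue  # edges only go forward in both molecules
            gap_v = kb - (ka + la)  # negative value means overlap
            gap_w = tb - (ta + la)
            if max(gap_v, gap_w) > max_gap or min(gap_v, gap_w) < -max_overlap:
                continue  # gap too long or overlap too large: no edge
            gaps = max(gap_v, 0) + max(gap_w, 0)
            overlap = max(-gap_v, 0) + max(-gap_w, 0)
            weight = lb - gap_pen * gaps - overlap_pen * overlap
            cand = best[a][0] + weight
            if cand > best[b][0]:
                best[b] = (cand, best[a][1] + [b])
    return max(best.values(), key=lambda sc: sc[0])
```

The edge scan is quadratic in the number of nodes for brevity; with a precomputed edge list, relaxing edges in topological order takes O(|V|+|E|) time, matching the bound on the slide.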
Clustering (removing ins/dels)
If joining two fragment pairs gives a small RMSD (that is, their transformations T1 and T2 are nearly the same, T1 ~ T2), then put them into one cluster.
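A rough sketch of this clustering rule, assuming fragment pairs are given as (k, t, l) triples along a chain and that "small RMSD after joining" is tested greedily against the current cluster. As an assumption of this sketch, distance RMSD replaces superposition RMSD, and the names `drmsd` and `cluster_chain` are hypothetical.

```python
import math
from itertools import combinations

def drmsd(a, b):
    """Distance RMSD between two equal-length lists of 3-D points (length >= 2)."""
    pairs = list(combinations(range(len(a)), 2))
    sq = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2
             for i, j in pairs)
    return math.sqrt(sq / len(pairs))

def cluster_chain(chain, v, w, eps):
    """Greedily merge consecutive fragment pairs that stay rigid when joined."""
    clusters = [[chain[0]]]
    for frag in chain[1:]:
        trial = clusters[-1] + [frag]
        # Collect all matched points of the trial cluster in both molecules.
        pv = [p for k, t, l in trial for p in v[k:k + l]]
        pw = [p for k, t, l in trial for p in w[t:t + l]]
        if drmsd(pv, pw) < eps:
            clusters[-1].append(frag)   # still one rigid body: same cluster
        else:
            clusters.append([frag])     # flexible region between them: new cluster
    return clusters
```

Each resulting cluster corresponds to one rigid part; the boundaries between clusters are the candidate flexible regions.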
Rigid Structural Comparison
Multiple Structural Alignment Schemes
• Linear progressive. Starts with one object and successively compares the other objects to the results.
• Tree progressive. The alignment is created according to a similarity tree. The alignment direction is from the leaves to the tree root.
• Gerstein and Levitt 1998.
• Orengo and Taylor 1994. SSAPm method.
• Sali and Blundell 1990.
• Russell and Barton 1992.
• Ding et al. 1994.
Multiple Structural Alignment Schemes
• Pivot. Uses one object as the pivot and compares it to all other objects. The results are then analyzed to find the common similarities.
• Leibowitz, Fligelman, Nussinov, and Wolfson 1999. Geometric Hashing technique.
• Escalier, Pothier, Soldano, Viari 1998. Exploits all common substructures.
Multiple Structural Alignment Schemes
• Optimization techniques.
• Guda, Scheeff, Bourne, Shindyalov. Monte Carlo optimization.
Previous Work – Multiple Structural Alignment
• Disadvantages:
• Most methods do not detect partial solutions.
• The methods which detect partial solutions are not efficient for a large number of molecules.
Partial Solutions
• Detection of local similarities.
• Detection of a subset of molecules that share some local structural pattern.
[Figure: two arrangements of regions A and B across molecules; one arrangement is harder to detect than the other]
Largest Common Point Set (LCP)
Given two point sets, detect the largest common subset [exact congruence or ε-congruence].
• Multiple-LCP is NP-hard even in one-dimensional space for the case of exact congruence (Akutsu 2000).
• 3-D + ε-congruence: a more complex problem.
Solution Space
• The number of solutions that satisfy the minimal criteria could be exponential.
[Figure: molecules containing helices α-1, α-2, α-3; 3·2·3 combinations in the example, k^M in general]
Partial Multiple-LCP
Detect the t largest alignments between exactly k molecules. We are interested in such solutions for each k, 2 ≤ k ≤ m.
MultiProt
/home/silly6/mol/demos/MultiProt/
• Non-predefined pattern detection.
• Partial solutions.
• Time efficient:
• 5 proteins in 14 seconds
• 20 proteins (~500 a.a.) in 10 minutes
• 50 proteins (~200 a.a.) in 19 minutes
• [Pentium II 500 MHz, 512 MB memory]
Algorithm Features
• Assumption: any multiple alignment of proteins should align at least short contiguous fragments (minimum 3 points) of the input points.
• Reduction of solution space: the aligned contiguous fragments are of maximal length.
• Almost all possible solutions (transformations) are detected ("almost" because of ε-congruence); optimal solutions are ‘hard’ to select.
Multiple Alignment with Pivot
Input:
Pivot molecule: M_p (participates in all solutions)
Set of molecules: S' = S \ {M_p}
Error threshold: ε
• Detect all possibly aligned fragments of maximal length between the input molecules (chance to detect subtle similarities).
• Select solutions that give high-scoring global structural similarity.
• Iterate over all possible pivots, M_p = M_1 … M_m.
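A toy sketch of how a pivot yields partial solutions for every k. It assumes the pairwise step has already produced, for each non-pivot molecule, the set of pivot residue indices it matches; the name `top_partial_solutions` and the `matches` input format are hypothetical. The sketch ignores the requirement that all members share one transformation, and it enumerates subsets by brute force, which is exponential in the number of molecules, so it is for illustration only.

```python
from itertools import combinations

def top_partial_solutions(matches, t=1):
    """For each k = 2..m, return the t largest cores shared by the pivot
    and k-1 other molecules.

    matches: {molecule_name: set of pivot residue indices it matches}
    Returns {k: [(core_size, molecule_names, sorted_core), ...]}.
    """
    names = sorted(matches)
    m = len(names) + 1  # the pivot participates in every solution
    result = {}
    for k in range(2, m + 1):
        scored = []
        for subset in combinations(names, k - 1):
            # Pivot residues matched by every molecule in the subset.
            core = set.intersection(*(matches[n] for n in subset))
            scored.append((len(core), subset, sorted(core)))
        scored.sort(key=lambda s: -s[0])  # largest cores first
        result[k] = scored[:t]
    return result
```

Running this once per choice of pivot and merging the results corresponds to the "iterate over all possible pivots" step.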
Bio-Core Detection
• Geometric + biological constraints
• Classification:
• hydrophobic (Ala, Val, Ile, Leu, Met, Cys)
• polar/charged (Ser, Thr, Pro, Asn, Gln, Lys, Arg, His, Asp, Glu)
• aromatic (Phe, Tyr, Trp)
• glycine (Gly)
• Or any other scoring matrix!
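The four-class scheme above can be written as a lookup table. This is a minimal sketch; the names `RESIDUE_CLASS`, `same_class`, and `bio_core` are hypothetical, and the rule "keep an aligned column only if all its residues fall in one class" is an assumed reading of "geometric + biological constraints".

```python
# Residue classes as listed on the slide (three-letter codes, upper case).
_CLASSES = {
    "hydrophobic": "ALA VAL ILE LEU MET CYS",
    "polar/charged": "SER THR PRO ASN GLN LYS ARG HIS ASP GLU",
    "aromatic": "PHE TYR TRP",
    "glycine": "GLY",
}
RESIDUE_CLASS = {name: cls
                 for cls, members in _CLASSES.items()
                 for name in members.split()}

def same_class(res_a, res_b):
    """True if two residues belong to the same class."""
    return RESIDUE_CLASS[res_a.upper()] == RESIDUE_CLASS[res_b.upper()]

def bio_core(columns):
    """Keep only geometrically aligned columns whose residues share one class.

    columns: list of tuples, one residue name per aligned molecule.
    """
    return [col for col in columns
            if len({RESIDUE_CLASS[r.upper()] for r in col}) == 1]
```

Swapping the table for a substitution matrix with a score threshold would cover the "any other scoring matrix" option.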
Partial Solution Detection
Task: detect A and B.
[Figure: molecules 1adj, 1hc7, 1qf6, 1ati with regions A and B marked]
• Domain A ranked first (142 matched atoms)
• Domain B ranked eighth (85 matched atoms)
Application to G proteins
[Figure: G-protein structures with regions A and B marked]
Substrate assisted catalysis – application to G proteins. Mickey Kosloff and Zvi Selinger, TRENDS in Biochemical Sciences Vol. 26 No. 3, March 2001, p. 161
Aspects of Structural Comparison
• A large number of structures (hundreds): Molecular Dynamics.
• Structural flexibility: proteins are not rigid structures.
• Structure representation:
• C-alpha atoms are suitable for comparisons of folds.
• Detection of similar function requires a different representation. This brings another problem: side chain flexibility.
• Sequence order in structural alignment.
• Detection of active sites might require a different approach. Proteins with different folds might provide the same function.
• Statistical significance.
• Measure of geometrical similarity (RMSD, bottleneck, …), biological scoring function.