Protein structure comparison and contact maps
A protein is a complex molecule with a primary, linear structure (a sequence of amino acids) and a three-dimensional structure (the protein fold). Protein STRUCTURE determines its FUNCTION. For instance, the Drug Design problem calls for constructing peptides with a 3D shape complementary to a protein, so as to dock onto it.
Problem: Align two 3D protein structures
Motivation: Structure Alignment is important for:
• Discovery of Protein Function (shape determines function)
• Search in 3D databases
• Protein Classification and Evolutionary Studies
• Assessment of Fold Prediction quality (e.g. CASP)
• ...
CONTACT MAPS
• Unfolded protein: a linear chain of residues
• Folded protein: residues brought close together in space form contacts
• Contact map = graph (one node per residue, one edge per contact)
OBJECTIVE: align 3D folds of proteins = align contact maps
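The idea of a contact map as a graph can be sketched in a few lines. The slides do not say how a contact is defined, so the distance threshold (5.0) and the minimum sequence separation used here are illustrative assumptions, and `contact_map` is a hypothetical helper name.

```python
from itertools import combinations
from math import dist


def contact_map(coords, threshold=5.0, min_separation=2):
    """Return contacts as a set of edges (i, j) with i < j.

    coords: one 3D point per residue, in sequence order.
    threshold and min_separation are assumed values, not from the slides.
    """
    contacts = set()
    for i, j in combinations(range(len(coords)), 2):
        # Skip neighbours along the chain, then test spatial proximity.
        if j - i >= min_separation and dist(coords[i], coords[j]) <= threshold:
            contacts.add((i, j))
    return contacts


# Toy 4-residue chain folded into a square of side 4.
coords = [(0, 0, 0), (4, 0, 0), (4, 4, 0), (0, 4, 0)]
print(contact_map(coords))
```

With these toy coordinates only residues 0 and 3 are far apart in sequence yet close in space, so the map has a single edge.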
Contact Maps are related to fold: similar folds ⇒ similar contact maps
We studied the problem of determining contact map similarity in the period 2001-2004:
• I.P. formulation via Branch & Cut (RECOMB)
• Use of Compact Optimization instead of separation (AIRO)
• Lagrangian Relaxation (RECOMB)
(Publications: RECOMB proceedings, AIRO proceedings, OR Letters, Journal of Comp. Bio., 4OR)
Non-crossing Alignments
An alignment is a non-crossing map of residues in protein 1 and protein 2.
(Figure: Protein 1 and Protein 2 drawn as two rows of residues joined by non-crossing lines.)
The value of an alignment
• The value is the number of contacts in protein 1 that the alignment maps onto contacts in protein 2 (in the figure's example, Value = 3)
• We want to maximize the value
• The problem is NP-Hard (Goldman, Istrail, Papadimitriou, 1999)
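A minimal sketch of how an alignment could be checked and scored. The data below is a toy example, not the figure from the slides, and the helper names are hypothetical. Contacts are assumed to be stored as pairs (smaller index, larger index).

```python
def is_noncrossing(alignment):
    """alignment: list of (i, j), residue i of protein 1 mapped to residue j of protein 2."""
    pairs = sorted(alignment)
    # After sorting, both coordinates must strictly increase from one line to the next.
    return all(a[0] < b[0] and a[1] < b[1] for a, b in zip(pairs, pairs[1:]))


def alignment_value(alignment, contacts1, contacts2):
    """Count contacts of protein 1 whose two endpoints map onto a contact of protein 2."""
    mapping = dict(alignment)
    value = 0
    for i, p in contacts1:
        if i in mapping and p in mapping:
            image = tuple(sorted((mapping[i], mapping[p])))
            if image in contacts2:
                value += 1
    return value


contacts1 = {(0, 2), (1, 3), (2, 4)}
contacts2 = {(0, 2), (1, 3), (2, 4)}
identity = [(k, k) for k in range(5)]
print(is_noncrossing(identity), alignment_value(identity, contacts1, contacts2))
```

Scoring a given alignment is cheap; the hard part, per the NP-hardness result above, is finding the alignment that maximizes this count.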
Integer Programming Formulation
The use of Integer Linear Programming:
• Model a difficult problem by 0-1 variables, a linear objective function and linear constraints
• Can find an optimal solution by branch and bound
• The bound comes from the LP relaxation (polynomial)
• The bound can be used to assess the quality of any feasible solution
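(i) 0-1 VARIABLES
• CONTACT-CONTACT VARS: $y_{ef}$ for $e$ and $f$ contacts ($e$ in protein 1, $f$ in protein 2)
• RESIDUE-RESIDUE VARS: $x_{ij}$ for $i$ and $j$ residues ($i$ in protein 1, $j$ in protein 2)
(ii) OBJECTIVE
maximize $\sum_e \sum_f y_{ef}$ over all feasible $x$ and $y$
(iii) CONSTRAINTS (FEASIBILITY)
• non-crossing: $x_{ij} + x_{i'j'} \le 1$ for every pair of crossing lines $[i,j]$ and $[i',j']$
• activation: $y_{(ip)(jq)} \le x_{ij}$ and $y_{(ip)(jq)} \le x_{pq}$ for every contact $(i,p)$ in protein 1 and contact $(j,q)$ in protein 2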
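To make the formulation concrete, here is a brute-force sketch that enumerates every feasible x on a tiny instance and sets each y to 1 exactly when both activation constraints allow it. This is only an illustration of the model's meaning, not the branch-and-bound method of the slides. It assumes two lines cross unless both endpoints strictly increase together (so lines sharing a residue also count as crossing), and that contacts are stored as (smaller, larger).

```python
from itertools import combinations


def crossing(a, b):
    """Assumed definition: [i,j] and [u,v] are compatible only if i<u and j<v, or i>u and j>v."""
    (i, j), (u, v) = a, b
    return not ((i < u and j < v) or (i > u and j > v))


def best_alignment(n1, n2, contacts1, contacts2):
    """Exhaustive search; exponential, usable only for very small n1, n2."""
    lines = [(i, j) for i in range(n1) for j in range(n2)]
    best_value, best_x = 0, ()
    for k in range(1, min(n1, n2) + 1):
        for x in combinations(lines, k):
            # Non-crossing constraints: x_ij + x_i'j' <= 1 for crossing pairs.
            if any(crossing(a, b) for a, b in combinations(x, 2)):
                continue
            chosen = set(x)
            # Activation: y_(ip)(jq) can be 1 only if x_ij = x_pq = 1.
            value = sum(1 for (i, p) in contacts1 for (j, q) in contacts2
                        if (i, j) in chosen and (p, q) in chosen)
            if value > best_value:
                best_value, best_x = value, x
    return best_value, best_x


value, x = best_alignment(4, 4, {(0, 2), (1, 3)}, {(0, 2), (1, 3)})
print(value, x)
```

The enumeration grows exponentially with the number of residues, which is why the slides turn to LP bounds and cutting planes instead.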
Clique Constraints
Variables x define a graph Gx:
• A node for each line
• An edge between each pair of crossing lines
• An independent set corresponds to a non-crossing alignment
• Gx has nice properties (it's a perfect graph)
• It's easy (poly) to find large independent sets in Gx
(Figure: pairs of lines [i,j] and [i',j'] that cross.)
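One way to see why large independent sets in Gx are easy to find: an independent set is a set of pairwise non-crossing lines, and a maximum-weight one can be computed by a dynamic program similar to sequence alignment. The slides do not state which algorithm they have in mind, so this is a sketch of a standard technique under the assumption that lines sharing a residue count as crossing; `max_weight_noncrossing` is a hypothetical name.

```python
def max_weight_noncrossing(w):
    """w[i][j]: weight of line [i, j]. Returns the best total weight of a non-crossing set.

    Runs in O(n1 * n2) time.
    """
    n1, n2 = len(w), len(w[0])
    # best[i][j]: optimum using only the first i residues of protein 1
    # and the first j residues of protein 2.
    best = [[0.0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            best[i][j] = max(
                best[i - 1][j],                        # residue i-1 unused
                best[i][j - 1],                        # residue j-1 unused
                best[i - 1][j - 1] + w[i - 1][j - 1],  # take line [i-1, j-1]
            )
    return best[n1][n2]


print(max_weight_noncrossing([[1, 5], [4, 1]]))
```

In the 2 x 2 example the two diagonal lines are compatible but weigh only 2 together, so the single line of weight 5 wins.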
Clique Constraints
Non-crossing constraints can be extended to CLIQUE CONSTRAINTS:
$\sum_{[i,j] \in M} x_{ij} \le 1$
for all sets $M$ of mutually incompatible (i.e. crossing) lines.
Satisfying all clique constraints implies a strong bound!
Structure of Maximal cliques in Gx
1. Pick two subsets of the same size
2. Connect them in a zig-zag fashion
3. Throw in all lines included in a zig or a zag
The result is a maximal clique in Gx
Separation of Clique Inequalities
PROBLEM: There exist exponentially many such cliques ($O(2^{2n})$ inequalities). How do we add them?
SOLUTION: We don't add them in the original LP, but only when needed at run time. Not all of them will be needed, so we are fine as long as…
SEPARATION: …we can generate in polynomial time a clique inequality when needed, i.e., when violated by the current LP solution $x^*$:
$\sum_{[i,j] \in M} x^*_{ij} > 1$
THEOREM: We can find the most violated clique inequality in time $O(n^2)$
Separation of Clique Inequalities
• Create an $n_1 \times n_2$ grid, with node $(i,u)$ for residue $i$ of protein 1 and residue $u$ of protein 2
• Orient all edges and give them weights derived from the LP values $x^*_{iu}$
• There is a violated clique iff the longest path from $A=(1,n_2)$ to $B=(n_1,1)$ has length > 1
(Figure: example grid with oriented, weighted edges between A and B.)
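The longest-path test can be sketched as a dynamic program over the grid, since the oriented grid is acyclic. The slides do not spell out how edge weights are derived from x*, so this sketch takes the weights as given inputs; the two weight tables `step_i` and `step_u`, the 0-based indexing, and the edge orientation (i increasing, u decreasing, matching a path from A=(1,n2) to B=(n1,1)) are assumptions for illustration.

```python
def longest_path(step_i, step_u, n1, n2):
    """Longest path from A = (0, n2-1) to B = (n1-1, 0), 0-based.

    step_i[i][u]: assumed weight of the edge (i, u) -> (i+1, u)
    step_u[i][u]: assumed weight of the edge (i, u) -> (i, u-1)
    Runs in O(n1 * n2) time.
    """
    neg_inf = float("-inf")
    dist = [[neg_inf] * n2 for _ in range(n1)]
    dist[0][n2 - 1] = 0.0
    # Visit nodes so that both predecessors of a node are final before it is expanded.
    for i in range(n1):
        for u in range(n2 - 1, -1, -1):
            d = dist[i][u]
            if i + 1 < n1:
                dist[i + 1][u] = max(dist[i + 1][u], d + step_i[i][u])
            if u - 1 >= 0:
                dist[i][u - 1] = max(dist[i][u - 1], d + step_u[i][u])
    return dist[n1 - 1][0]


# Toy 2 x 2 grid with made-up weights.
step_i = [[0.6, 0.3], [0.0, 0.0]]
step_u = [[0.0, 0.5], [0.0, 0.4]]
length = longest_path(step_i, step_u, 2, 2)
print(length, "violated" if length > 1 else "not violated")
```

Each node is expanded once, which is consistent with the quadratic running time stated in the theorem when both proteins have about n residues.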
• The method which adds violated inequalities by separation is called BRANCH-and-CUT
• The method can get stuck in long runs of cut additions, each of which "cuts very little"
• There is an alternative to this, called COMPACT OPTIMIZATION