101 Optimal PDB Structure Alignments: A Branch-and-Cut Algorithm for the Maximum Contact Map Overlap Problem
Giuseppe Lancia, Robert Carr, Brian Walenz, Sorin Istrail
CONTACT MAPS
Unfolded protein
Folded protein = contacts
Contact map = graph
OBJECTIVE: align 3D folds of proteins = align contact maps
Contact Map of a Self-Avoiding Walk
[Figure: a self-avoiding walk of five residues, numbered 1 to 5, and its 5 × 5 0/1 contact matrix]
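A contact map can be computed directly from residue coordinates. The sketch below uses a hypothetical example that does not come from the slides. It assumes a walk on the 2D square lattice, 0-based residue indices, and the convention that two residues at least two positions apart along the chain are in contact when their lattice points are neighbours.

```python
from itertools import combinations


def contact_map(walk):
    """0/1 contact matrix of a self-avoiding walk on the square lattice.

    Assumed convention: residues i < j with j - i > 1 are in contact when
    their lattice points are at Manhattan distance 1.
    """
    n = len(walk)
    if len(set(walk)) != n:
        raise ValueError("walk visits a lattice point twice")
    cmap = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        (xi, yi), (xj, yj) = walk[i], walk[j]
        if j - i > 1 and abs(xi - xj) + abs(yi - yj) == 1:
            cmap[i][j] = cmap[j][i] = 1  # the matrix is symmetric
    return cmap


def contact_edges(cmap):
    """Edges (i, j), i < j, of the contact-map graph."""
    return [(i, j) for i, j in combinations(range(len(cmap)), 2) if cmap[i][j]]


# Hypothetical 5-residue walk: residues 0 and 3 end up side by side.
walk = [(0, 0), (1, 0), (1, 1), (0, 1), (-1, 1)]
for row in contact_map(walk):
    print(*row)
print(contact_edges(contact_map(walk)))  # [(0, 3)]
```

The edge list is the "contact map = graph" view from the previous slide. Only these edges take part in the alignment problem.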
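Non-crossing Alignments
[Figure: residues of Protein 1 and Protein 2 joined by alignment lines]
An alignment is a non-crossing map between the residues of protein 1 and those of protein 2. Each residue is aligned at most once, and if residue i is aligned with j and i' with j', then i < i' implies j < j'.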
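The non-crossing condition is easy to test. In this sketch, an alignment is represented as a list of lines (i, j), each pairing residue i of protein 1 with residue j of protein 2; that representation is an assumption, not something the slides specify. Sorting by i and comparing neighbours is enough, because strict increase in both coordinates is transitive.

```python
def is_noncrossing(alignment):
    """True if no residue is used twice and i < i' always implies j < j'."""
    lines = sorted(alignment)
    for (i, j), (i2, j2) in zip(lines, lines[1:]):
        # i == i2: a protein-1 residue used twice; j >= j2: a crossing
        # (j > j2) or a protein-2 residue used twice (j == j2).
        if i == i2 or j >= j2:
            return False
    return True


print(is_noncrossing([(1, 2), (3, 3), (4, 5)]))  # True
print(is_noncrossing([(1, 3), (2, 2)]))          # False: the lines cross
print(is_noncrossing([(1, 1), (1, 2)]))          # False: residue 1 used twice
```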
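The value of an alignment
[Figure: an example alignment] Value = 3
The value is the number of contacts shared by the alignment. A contact (i, p) of protein 1 is shared with a contact (j, q) of protein 2 when i is aligned with j and p with q. We want to maximize the value.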
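The value can be computed straight from this definition. The sketch below also finds the best value by brute force over all non-crossing alignments, which is feasible only for toy inputs. It assumes contacts are pairs (i, p) with i < p and that residues are 0-based. The toy instance is hypothetical.

```python
from itertools import combinations


def alignment_value(alignment, contacts1, contacts2):
    """Number of contacts (i, p) of protein 1 whose images (j, q) are contacts of protein 2."""
    a = dict(alignment)  # residue of protein 1 -> residue of protein 2
    e2 = set(contacts2)
    return sum(1 for i, p in contacts1
               if i in a and p in a and (a[i], a[p]) in e2)


def brute_force_cmo(n1, n2, contacts1, contacts2):
    """Best value over all non-crossing alignments (exponential: toy inputs only)."""
    best = 0
    for k in range(min(n1, n2) + 1):
        for left in combinations(range(n1), k):
            for right in combinations(range(n2), k):
                # Pairing two increasing residue lists in order never crosses.
                best = max(best, alignment_value(zip(left, right), contacts1, contacts2))
    return best


c1 = [(0, 3), (1, 4)]
c2 = [(0, 2), (1, 4)]
print(alignment_value([(0, 0), (1, 1), (3, 2), (4, 4)], c1, c2))  # 2
print(brute_force_cmo(5, 5, c1, c2))                              # 2
```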
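Integer Programming Formulation
The use of Integer Linear Programming: * exact solution * heuristic + guarantee (the LP relaxation gives an upper bound)
0-1 VARIABLES: $y_{ef}$ for each contact $e$ of protein 1 and each contact $f$ of protein 2 ($y_{ef} = 1$ when $e$ and $f$ are shared)
CONSTRAINTS: $y_{ef} + y_{e'f'} \le 1$ for every incompatible pair of sharings $(e, f)$, $(e', f')$, i.e. every edge of Gy
OBJECTIVE: $\max \sum_{e,f} y_{ef}$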
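The variables and conflicting pairs of this formulation can be enumerated mechanically. In this sketch, a sharing (e, f) with e = (i, p) and f = (j, q) is read as the two lines [i, j] and [p, q]. Two sharings are taken to be incompatible when their lines cannot all belong to one non-crossing alignment; this spells out our reading of the figure and is an assumption. The toy contact lists are hypothetical.

```python
from itertools import combinations


def compatible(lines):
    """True if all lines (i, j) can belong to a single non-crossing alignment."""
    for (i, j), (k, l) in combinations(set(lines), 2):
        if i == k or j == l or (i < k) != (j < l):
            return False
    return True


def build_gy(contacts1, contacts2):
    """Nodes: sharings (e, f). Edges: pairs needing y_ef + y_e'f' <= 1."""
    nodes = [(e, f) for e in contacts1 for f in contacts2]

    def lines(node):
        (i, p), (j, q) = node
        return [(i, j), (p, q)]

    edges = [(u, v) for u, v in combinations(nodes, 2)
             if not compatible(lines(u) + lines(v))]
    return nodes, edges


nodes, edges = build_gy([(0, 3), (1, 4)], [(0, 2), (1, 4)])
print(len(nodes), len(edges))  # 4 5
```

With 75 contacts in each protein, `build_gy` would create 75 × 75 = 5625 nodes. The number of candidate constraint pairs is quadratic in that.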
Independent Set Problem
It's just a huge maximum independent set problem in Gy:
• a node for each sharing
• an edge for each pair of incompatible sharings
[Figure: sharings (e, f), (e', f'), (e'', f'') and the incompatibilities between them]
Gy has $|E_1| \cdot |E_2|$ nodes, where $E_1$ and $E_2$ are the contact sets (75 × 75 = 5625, i.e. over 5000, for two proteins with 50 residues and 75 contacts each).
The best exact algorithms for independent set can handle graphs of at most a few hundred nodes.
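To see why exact methods stall, here is a plain branching algorithm for maximum independent set. It is a textbook method, not the algorithm the slide refers to, and its worst-case running time is exponential in the number of nodes. It is applied to Gy for a hypothetical toy instance with contacts (0, 3), (1, 4) in protein 1 and (0, 2), (1, 4) in protein 2.

```python
def max_independent_set(adj):
    """Exact maximum independent set by branching on a max-degree node.

    adj maps each node to the set of its neighbours. The worst case is
    exponential in the number of nodes, so this only works on small graphs.
    """
    if not adj:
        return set()
    v = max(adj, key=lambda u: len(adj[u]))
    if not adj[v]:
        return set(adj)  # no edges left: every node is independent
    # Branch 1: leave v out.
    rest = {u: nbrs - {v} for u, nbrs in adj.items() if u != v}
    best = max_independent_set(rest)
    # Branch 2: take v, so its neighbours are excluded.
    gone = adj[v] | {v}
    rest = {u: nbrs - gone for u, nbrs in adj.items() if u not in gone}
    take = {v} | max_independent_set(rest)
    return take if len(take) > len(best) else best


# Gy of the toy instance: nodes are sharings (e, f), edges are incompatible pairs.
s1, s2 = ((0, 3), (0, 2)), ((0, 3), (1, 4))
s3, s4 = ((1, 4), (0, 2)), ((1, 4), (1, 4))
adj = {s: set() for s in (s1, s2, s3, s4)}
for u, v in [(s1, s2), (s1, s3), (s2, s3), (s2, s4), (s3, s4)]:
    adj[u].add(v)
    adj[v].add(u)
print(max_independent_set(adj) == {s1, s4})  # True: both contacts shared, value 2
```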
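Node to Node Variables
New variables x provide an easy check for the non-crossing conditions.
NEW VARIABLES: $x_{ij}$ for each residue $i$ of protein 1 and residue $j$ of protein 2 ($x_{ij} = 1$ when the line $[i, j]$ is in the alignment)
NEW CONSTRAINTS: $x_{ij} + x_{i'j'} \le 1$ for every pair of crossing lines $[i, j]$, $[i', j']$
$y_{(ip)(jq)} \le x_{ij}$ and $y_{(ip)(jq)} \le x_{pq}$ (contact $(i, p)$ can be shared with contact $(j, q)$ only if $i$ is aligned with $j$ and $p$ with $q$)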
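For an integral x, the linking constraints allow $y_{(ip)(jq)} = 1$ exactly when both lines [i, j] and [p, q] are chosen. The objective then equals the alignment value, and the pairwise x constraints alone keep the alignment non-crossing. Below is a minimal sketch of that check with hypothetical data. As an assumption, lines that share exactly one endpoint are treated as conflicting, in addition to lines that cross.

```python
from itertools import combinations


def conflict(a, b):
    """Lines a = (i, j) and b = (k, l) cross or share exactly one endpoint."""
    (i, j), (k, l) = a, b
    return a != b and (i == k or j == l or (i < k) != (j < l))


def x_feasible(x):
    """Check x_ij + x_i'j' <= 1 for every conflicting pair of lines."""
    return all(x[a] + x[b] <= 1 for a, b in combinations(x, 2) if conflict(a, b))


def y_from_x(x, contacts1, contacts2):
    """Largest y allowed by y_(ip)(jq) <= x_ij and y_(ip)(jq) <= x_pq."""
    return {((i, p), (j, q)): min(x.get((i, j), 0), x.get((p, q), 0))
            for i, p in contacts1 for j, q in contacts2}


x = {(0, 0): 1, (1, 1): 1, (3, 2): 1, (4, 4): 1}  # a non-crossing alignment
y = y_from_x(x, [(0, 3), (1, 4)], [(0, 2), (1, 4)])
print(x_feasible(x), sum(y.values()))  # True 2
```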
Clique Constraints
Variables x define a graph Gx:
• a node for each line
• an edge between each pair of crossing lines
[Figure: crossing lines [i, j] and [i', j']]
• Gx is much smaller than Gy
• Gx has nice properties (it is a perfect graph)
• It is easier to find large independent sets in Gx
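The size of Gx depends only on the two sequence lengths. The sketch below builds Gx by brute force and checks the result against a closed-form edge count. That count adds three kinds of pairs: lines sharing a residue of protein 1, lines sharing a residue of protein 2, and crossing lines. As in the other sketches, treating lines with a common endpoint as conflicting is an assumption.

```python
from itertools import combinations
from math import comb


def conflict(a, b):
    """Lines a = (i, j) and b = (k, l) cross or share exactly one endpoint."""
    (i, j), (k, l) = a, b
    return a != b and (i == k or j == l or (i < k) != (j < l))


def build_gx(n1, n2):
    """Nodes: all lines [i, j]. Edges: all conflicting pairs of lines."""
    nodes = [(i, j) for i in range(n1) for j in range(n2)]
    edges = [(a, b) for a, b in combinations(nodes, 2) if conflict(a, b)]
    return nodes, edges


def gx_edge_count(n1, n2):
    """Same residue in protein 1 + same residue in protein 2 + crossing pairs."""
    return n1 * comb(n2, 2) + n2 * comb(n1, 2) + comb(n1, 2) * comb(n2, 2)


for n1, n2 in [(2, 2), (3, 4), (5, 5)]:
    nodes, edges = build_gx(n1, n2)
    assert len(edges) == gx_edge_count(n1, n2)
print(50 * 50, gx_edge_count(50, 50))  # 2500 1623125
```

For two 50-residue proteins, Gx has 2500 nodes, compared with 5625 nodes in Gy for 75 contacts per protein.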
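Clique Constraints
Non-crossing constraints can be extended to CLIQUE CONSTRAINTS
$\sum_{[i,j] \in M} x_{ij} \le 1$
for all sets M of mutually incompatible (i.e. crossing) lines.
If all clique constraints are satisfied, then, because Gx is perfect, they imply a strong bound: for a perfect graph, the clique constraints together with $x \ge 0$ describe exactly the convex hull of its independent sets.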
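A small example shows why clique constraints tighten the LP. Three mutually crossing lines, each at value 0.5, form a hypothetical fractional LP solution. This solution satisfies every pairwise constraint, yet it violates the clique inequality for those three lines. The conflict rule (crossing or sharing one endpoint) is the same assumption as in the other sketches.

```python
from itertools import combinations


def conflict(a, b):
    """Lines a = (i, j) and b = (k, l) cross or share exactly one endpoint."""
    (i, j), (k, l) = a, b
    return a != b and (i == k or j == l or (i < k) != (j < l))


def violated_pairs(x, tol=1e-9):
    """Pairwise constraints x_ij + x_i'j' <= 1 that x violates."""
    return [(a, b) for a, b in combinations(x, 2)
            if conflict(a, b) and x[a] + x[b] > 1 + tol]


def clique_lhs(x, clique):
    """Left-hand side of the clique inequality sum_{[i,j] in M} x_ij <= 1."""
    if not all(conflict(a, b) for a, b in combinations(clique, 2)):
        raise ValueError("M must be a set of mutually incompatible lines")
    return sum(x.get(line, 0.0) for line in clique)


x = {(0, 2): 0.5, (1, 1): 0.5, (2, 0): 0.5}
print(violated_pairs(x))       # []
print(clique_lhs(x, list(x)))  # 1.5, so this clique inequality cuts x off
```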
Structure of Maximal Cliques in Gx
1. Pick two subsets of residues of the same size, one in each protein
2. Connect them in a zig-zag fashion
3. Throw in all lines included in a zig or a zag
The result is a maximal clique in Gx.
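One way to make steps 1 to 3 concrete is sketched below. The sketch rests on our reading of the figures, which the slides do not state. Residues are 0-based, and two lines conflict when they cross or share exactly one endpoint. The zigzag runs u1 → v_k → u2 → v_(k-1) → … → u_k → v_1. The chosen subsets must contain the first and last residue of each protein, because otherwise the clique can still be extended. The helper names are hypothetical.

```python
from itertools import combinations


def conflict(a, b):
    """Lines a = (i, j) and b = (k, l) cross or share exactly one endpoint."""
    (i, j), (k, l) = a, b
    return a != b and (i == k or j == l or (i < k) != (j < l))


def zigzag_clique(us, vs):
    """Clique of Gx from two increasing residue lists of the same size k.

    The zigzag is us[0] - vs[k-1] - us[1] - vs[k-2] - ... - us[k-1] - vs[0];
    every line inside a zig or a zag (a fan around one residue) is added.
    """
    if len(us) != len(vs) or us != sorted(us) or vs != sorted(vs):
        raise ValueError("need two increasing lists of the same size")
    k = len(us)
    clique = set()
    for t in range(k):
        v = vs[k - 1 - t]
        # Fan around residue v of protein 2: lines [i, v], us[t] <= i <= us[t+1].
        last = us[t + 1] if t + 1 < k else us[t]
        clique.update((i, v) for i in range(us[t], last + 1))
        if t + 1 < k:
            # Fan around residue us[t+1] of protein 1: lines [us[t+1], j], vs[k-2-t] <= j <= v.
            clique.update((us[t + 1], j) for j in range(vs[k - 2 - t], v + 1))
    return clique


def is_maximal_clique(clique, n1, n2):
    """Pairwise conflicting, and no other line conflicts with all of its lines."""
    if not all(conflict(a, b) for a, b in combinations(clique, 2)):
        return False
    others = [(i, j) for i in range(n1) for j in range(n2) if (i, j) not in clique]
    return not any(all(conflict(o, c) for c in clique) for o in others)


c = zigzag_clique([0, 1, 3], [0, 2, 4])  # proteins with 4 and 5 residues
print(sorted(c))
print(is_maximal_clique(c, 4, 5))  # True
```

Under these assumptions, the result is a "staircase" of n1 + n2 - 1 lines running from [0, n2 - 1] to [n1 - 1, 0].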
Separation of Clique Inequalities
PROBLEM: There are exponentially many such cliques ($O(2^{2n})$ inequalities). How do we add them?
SOLUTION: We do not put them in the initial LP; we add them only when needed at run time. Not all of them will be needed, so we are fine as long as…
SEPARATION: …we can generate, in polynomial time, a clique inequality when it is needed, i.e. when it is violated by the current LP solution $x^*$: $\sum_{[i,j] \in M} x^*_{ij} > 1$
THEOREM: We can find the most violated clique inequality in time $O(n^2)$.
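The exponential count can be checked on small cases. The sketch assumes, as in the earlier sketches, that lines conflict when they cross or share one endpoint. Under that assumption, a maximal clique of Gx is a staircase of lines running from [1, n2] to [n1, 1], moving one residue at a time. There are therefore C(n1 + n2 - 2, n1 - 1) maximal cliques, which for n1 = n2 = n grows like 4^n = 2^(2n). The sketch compares that formula with a brute-force count.

```python
from itertools import combinations
from math import comb


def conflict(a, b):
    """Lines a = (i, j) and b = (k, l) cross or share exactly one endpoint."""
    (i, j), (k, l) = a, b
    return a != b and (i == k or j == l or (i < k) != (j < l))


def count_maximal_cliques(n1, n2):
    """Brute force over all subsets of lines (tiny n1, n2 only)."""
    lines = [(i, j) for i in range(n1) for j in range(n2)]
    count = 0
    for r in range(1, len(lines) + 1):
        for subset in combinations(lines, r):
            if not all(conflict(a, b) for a, b in combinations(subset, 2)):
                continue
            # Maximal: no outside line conflicts with every member.
            if not any(all(conflict(o, c) for c in subset)
                       for o in lines if o not in subset):
                count += 1
    return count


for n1, n2 in [(2, 2), (2, 3), (3, 3), (3, 4)]:
    print(n1, n2, count_maximal_cliques(n1, n2), comb(n1 + n2 - 2, n1 - 1))
print(comb(2 * 50 - 2, 50 - 1))  # maximal cliques for two 50-residue proteins
```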
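Separation of Clique Inequalities
PROOF (sketch)
1) Clique = zigzag path [Figure: two proteins with residues numbered 1 to 8]
2) Flip one graph (number the residues of one protein in reverse order, 8 7 6 5 4 3 2 1): the zigzag becomes a left-to-right path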
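The proof sketch translates into a short dynamic program. This is our illustration of the stated theorem, not the authors' code. It assumes, as in the earlier sketches, that a maximal clique is a staircase of lines from [0, n2 - 1] to [n1 - 1, 0] (0-based) that moves one residue at a time. Reversing the order of protein 2 turns such a staircase into a monotone path through the n1 × n2 grid of lines. The heaviest path under the LP values x* can then be found in O(n1 · n2) time, i.e. $O(n^2)$. Since x* ≥ 0, the heaviest clique is always attained by a maximal one. The clique inequality is violated when the returned weight exceeds 1.

```python
def most_violated_clique(x, n1, n2):
    """Heaviest maximal clique of Gx for LP values x[(i, j)] >= 0 (missing = 0)."""
    best, prev = {}, {}
    for i in range(n1):
        for j in range(n2 - 1, -1, -1):  # protein 2 in reversed order
            cands = []
            if i > 0:
                cands.append((best[(i - 1, j)], (i - 1, j)))  # step in protein 1
            if j < n2 - 1:
                cands.append((best[(i, j + 1)], (i, j + 1)))  # step in protein 2
            base, prev[(i, j)] = max(cands) if cands else (0.0, None)
            best[(i, j)] = base + x.get((i, j), 0.0)
    # Walk back from the last line to recover the clique's lines.
    path, node = [], (n1 - 1, 0)
    while node is not None:
        path.append(node)
        node = prev[node]
    return best[(n1 - 1, 0)], path[::-1]


x_star = {(0, 0): 0.6, (0, 1): 0.5}  # hypothetical fractional LP values
weight, clique = most_violated_clique(x_star, 2, 2)
print(clique, weight > 1)  # [(0, 1), (0, 0), (1, 0)] True
```

Each grid cell is visited once and looks at two predecessors, which gives the quadratic bound. In a cutting-plane loop, the returned clique would be added as a new constraint whenever its weight exceeds 1.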