Protein Design with DEE/A*. Algorithms for Drug Design, 03/02/2011.
General Protein Redesign Scheme • Inputs: Input Structure, Rotamer Library, Energy Function. [Figure: redesign scheme diagram.]
Benefits of Provable Methods • In-order enumeration of conformations, from low E to high E. • Heuristic (MC, SCMF, GA): enumeration with gaps. • Provable (A* enumeration): gap-free enumeration.
Dead-End Elimination (DEE) [Figure: energy E plotted over conformations for rotamers ir and it.]
Enumeration with A* • After DEE, more than one rotamer may remain at each design position. • The remaining conformations still need to be evaluated. • Goal: find the global minimum energy conformation (GMEC) and an ordered list of conformations. • Use the A* search algorithm. • First used for this purpose by Leach and Lemon, PROTEINS (1998).
Slight Modification to DEE. Leach and Lemon. PROTEINS 33:227-239 (1998).
A* search • Finds the least-cost path from the root node to one or more goal nodes. • Evaluation function – f* • At any node n, f* = g* + h* • g* = cost of reaching node n from the root node. • h* = estimated cost of reaching the goal node from n.
A* search continued • Search maintains a priority queue, with nodes ordered according to the value of f*. • At each stage – node with minimum value of f* is expanded and its successor nodes calculated. • Successor nodes entered in the queue, maintaining the f* order.
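The queue-driven loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the lecture: the function name `a_star`, its callback parameters, and the toy tree are all hypothetical, and the sketch assumes the search space is a tree (no duplicate-state detection).

```python
import heapq
from itertools import count


def a_star(root, successors, h, is_goal):
    """Return (cost, node) for the least-cost goal reachable from root.

    successors(node) yields (child, step_cost) pairs; h(node) is the
    estimated remaining cost (h*) and is assumed never to overestimate.
    """
    tie = count()  # tie-breaker so the heap never compares nodes directly
    queue = [(h(root), next(tie), 0, root)]  # entries: (f*, tie, g*, node)
    while queue:
        f, _, g, node = heapq.heappop(queue)  # node with minimum f*
        if is_goal(node):
            return g, node
        for child, step_cost in successors(node):
            g_child = g + step_cost
            # Push the successor; the heap keeps the queue ordered by f*.
            heapq.heappush(queue, (g_child + h(child), next(tie), g_child, child))
    return None


# Hypothetical toy tree: node names and costs are made up for illustration.
TREE = {
    "root": [("a", 1), ("b", 4)],
    "a": [("goal1", 5)],
    "b": [("goal2", 1)],
}
H = {"root": 0, "a": 4, "b": 1, "goal1": 0, "goal2": 0}

result = a_star(
    "root",
    lambda n: TREE.get(n, []),
    H.get,
    lambda n: n.startswith("goal"),
)
print(result)  # (5, 'goal2')
```

Python's `heapq` module keeps the smallest tuple at the front, so storing f* as the first tuple element gives the "minimum f* first" ordering directly.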
An example • Design of a tripeptide. • After DEE pruning: • Residue A: 3 rotamers. • Residue B: 3 rotamers. • Residue C: 2 rotamers. • Assume values of g* and h* are given at every node. Leach and Lemon. PROTEINS 33:227-239 (1998).
[Figure: A* search tree for the tripeptide, built up over successive slides. Priority queue after each step, with f* in parentheses:]
• Expand the root: A2(21), A1(108), A3(206).
• Expand A2: A2B2(21), A2B1(22), A2B3(23), A1(108), A3(206).
• Expand A2B2: A2B2C2(21), A2B1(22), A2B3(23), A2B2C1(25), A1(108), A3(206).
• A2B2C2 is a goal node at the head of the queue: Rank 1.
• Expand A2B1: A2B1C1(22), A2B3(23), A2B2C1(25), A2B1C2(26), A1(108), A3(206).
• A2B1C1 is a goal node at the head of the queue: Rank 2.
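The tripeptide example can be replayed in code. The sketch below is an illustration under stated assumptions: the per-step costs and h* values are read back from the flattened tree figure (chosen so that they reproduce every f* value in the queues shown), leaf nodes are taken to have h* = 0, and the names `STEP`, `H`, `children`, and `enumerate_conformations` are hypothetical. The figure only gives values for the nodes that were actually expanded, so only the first two ranks can be reproduced.

```python
import heapq

# Assumption: incremental cost of adding each rotamer, read off the figure.
STEP = {
    "A1": 100, "A2": 10, "A3": 200,
    "A2B1": 4, "A2B2": 3, "A2B3": 3,
    "A2B1C1": 8, "A2B1C2": 12,
    "A2B2C1": 12, "A2B2C2": 8,
}
# Assumption: h* at each internal node, read off the figure (leaves: 0).
H = {"A1": 8, "A2": 11, "A3": 6, "A2B1": 8, "A2B2": 8, "A2B3": 10}
ROTAMERS = {"A": 3, "B": 3, "C": 2}
ORDER = "ABC"


def children(node):
    """Children of a partial conformation, e.g. 'A2' -> 'A2B1', 'A2B2', 'A2B3'."""
    depth = len(node) // 2  # each assigned residue adds two characters
    if depth == len(ORDER):
        return []
    residue = ORDER[depth]
    return [f"{node}{residue}{r}" for r in range(1, ROTAMERS[residue] + 1)]


def enumerate_conformations(n_ranks):
    """Return the n_ranks lowest-cost full conformations, in order."""
    queue = []  # entries: (f*, g*, node)
    for c in children(""):
        heapq.heappush(queue, (STEP[c] + H[c], STEP[c], c))
    ranked = []
    while queue and len(ranked) < n_ranks:
        print(", ".join(f"{name}({f})" for f, _, name in sorted(queue)))
        f, g, node = heapq.heappop(queue)
        if len(node) == 2 * len(ORDER):  # fully assigned: a goal node
            ranked.append((node, f))
            continue
        for c in children(node):
            g_c = g + STEP[c]
            heapq.heappush(queue, (g_c + H.get(c, 0), g_c, c))
    return ranked


ranked = enumerate_conformations(2)
print(ranked)  # [('A2B2C2', 21), ('A2B1C1', 22)]
```

Because the search keeps the queue after returning Rank 1, the next conformation comes out by continuing the same loop rather than restarting, which is what makes the enumeration gap-free.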
Provable Guarantees with A* • At any node n, g* is known exactly; h* is only an estimate. • h* should be admissible: if C* is the actual cost of reaching the goal from n, then h* <= C*. • With an admissible h*, A* is guaranteed never to overlook a lower-cost path.
Proof • A* returns a goal node only when it reaches the head of the queue, so its f* is the minimum in the queue. • For a goal node h* = 0, so f* is its actual cost: actual cost of head node <= estimated cost of every other node. • h* is admissible, so for every other node: estimated cost <= actual cost. • Therefore: actual cost of head node <= actual cost of every other node.
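The chain of inequalities in the proof can be written out explicitly. The notation here is introduced for illustration and is not on the slides: G is the goal node at the head of the queue, n is any other node in the queue, G' is any goal node reachable through n (possibly n itself), and C*(n) is the actual cost of the cheapest completion from n.

```latex
\begin{align*}
f^*(G) &= g^*(G)
  && \text{(goal node, so } h^*(G) = 0\text{)} \\
f^*(G) &\le f^*(n) = g^*(n) + h^*(n)
  && \text{(} G \text{ is at the head of the queue)} \\
g^*(n) + h^*(n) &\le g^*(n) + C^*(n) \le g^*(G')
  && \text{(admissibility: } h^*(n) \le C^*(n)\text{)} \\
\Rightarrow\quad g^*(G) &\le g^*(G')
\end{align*}
```

Every goal that has not yet been returned is either in the queue or below some node in the queue, so the last line covers all remaining conformations.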
Protein Design and A* • Each node in the tree is a partially assigned conformation. • g* = energy of the partially assigned conformation. • h* = minimum energy required to complete the conformation, so it never overestimates the actual energy.
[Figure: search tree with residue A (rotamers 1, 2, 3) above residue B (rotamers 1, 2, 3).] g* = E(A2) + E(B1) + E(A2,B1)
[Figure: residue C (rotamers 1, 2) added to the tree.] h* = min over i = 1,2 of [ E(Ci) + E(A2,Ci) + E(B1,Ci) ]
[Figure: tree over residues A, B and C.] g* = ?, h* = ?
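The two formulas can be checked numerically with a small sketch. Everything named here is an assumption made for illustration: the functions `g_star` and `h_star`, the dictionaries `E1` (single-rotamer energies) and `E2` (pairwise energies keyed by unordered pairs), and all energy values. For more than one unassigned residue, `h_star` adds a minimum pairwise term between unassigned residues; that generalization is not on the slides, and with a single unassigned residue it reduces to the h* formula shown.

```python
from itertools import combinations


def g_star(assigned, E1, E2):
    """Energy of the assigned rotamers: self terms plus all pairwise terms."""
    total = sum(E1[r] for r in assigned)
    total += sum(E2[frozenset(pair)] for pair in combinations(assigned, 2))
    return total


def h_star(assigned, unassigned, E1, E2):
    """Lower bound on the energy needed to complete the conformation.

    unassigned is a list of residues, each a list of candidate rotamers.
    """
    total = 0.0
    for j, residue in enumerate(unassigned):
        total += min(
            E1[c]
            + sum(E2[frozenset((a, c))] for a in assigned)
            # Assumed generalization: best case against each later residue.
            + sum(
                min(E2[frozenset((c, u))] for u in later)
                for later in unassigned[j + 1:]
            )
            for c in residue
        )
    return total


# Hypothetical energies for the node A2, B1 with residue C unassigned.
E1 = {"A2": -1.0, "B1": -2.0, "C1": 0.5, "C2": -0.5}
E2 = {
    frozenset(("A2", "B1")): 0.5,
    frozenset(("A2", "C1")): -1.0,
    frozenset(("B1", "C1")): -0.5,
    frozenset(("A2", "C2")): 1.0,
    frozenset(("B1", "C2")): 0.25,
}

g = g_star(["A2", "B1"], E1, E2)                 # E(A2) + E(B1) + E(A2,B1)
h = h_star(["A2", "B1"], [["C1", "C2"]], E1, E2)  # min over C1, C2
print(g, h, g + h)  # -2.5 -1.0 -3.5
```

Keying the pairwise table by `frozenset` makes E(A2,B1) and E(B1,A2) the same entry, which matches the symmetric pairwise energies used in the formulas.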