COMP 482: Design and Analysis of Algorithms Prof. Swarat Chaudhuri Spring 2012 Lecture 17
Q1: Longest palindromic subsequence • Give an algorithm to find the longest subsequence of a given string A that is a palindrome. • “amantwocamelsacrazyplanacanalpanama”
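One possible approach to Q1, offered as a sketch rather than the intended solution: dynamic programming over intervals, where L[i][j] is the length of the longest palindromic subsequence of a[i..j]. The function name and the O(n^2) table layout are assumptions made for illustration.

```python
def longest_palindromic_subsequence(a: str) -> str:
    """Return one longest palindromic subsequence of a (interval DP sketch)."""
    n = len(a)
    if n == 0:
        return ""
    # L[i][j] = length of the longest palindromic subsequence of a[i..j]
    L = [[0] * n for _ in range(n)]
    for i in range(n):
        L[i][i] = 1
    for length in range(2, n + 1):          # shortest intervals first
        for i in range(n - length + 1):
            j = i + length - 1
            if a[i] == a[j]:
                inner = L[i + 1][j - 1] if i + 1 <= j - 1 else 0
                L[i][j] = 2 + inner
            else:
                L[i][j] = max(L[i + 1][j], L[i][j - 1])
    # Walk the table from the full interval inward to rebuild a solution.
    left, right = [], []
    i, j = 0, n - 1
    while i <= j:
        if i == j:
            left.append(a[i])
            break
        if a[i] == a[j]:
            left.append(a[i])
            right.append(a[j])
            i += 1
            j -= 1
        elif L[i + 1][j] >= L[i][j - 1]:
            i += 1
        else:
            j -= 1
    return "".join(left) + "".join(reversed(right))
```

Both the table fill and the reconstruction take O(n^2) time in this sketch.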
Q1-a: Palindromes (contd.) • Every string can be decomposed into a sequence of palindromes. • Give an efficient algorithm to compute the smallest number of palindromes that makes up a given string.
RNA Secondary Structure
• RNA. String B = b_1b_2…b_n over alphabet { A, C, G, U }.
• Secondary structure. RNA is single-stranded, so it tends to loop back and form base pairs with itself. This structure is essential for understanding the behavior of the molecule.
• Ex: GUCGAUUGAGCGAAUGUAACAACGUGGCUACGGCGAGA
[Figure: folded secondary structure of the example string, showing complementary base pairs A-U and C-G.]
RNA Secondary Structure
• Secondary structure. A set of pairs S = { (b_i, b_j) } that satisfy:
  • [Watson-Crick.] S is a matching and each pair in S is a Watson-Crick complement: A-U, U-A, C-G, or G-C.
  • [No sharp turns.] The ends of each pair are separated by at least 4 intervening bases. If (b_i, b_j) ∈ S, then i < j - 4.
  • [Non-crossing.] If (b_i, b_j) and (b_k, b_l) are two pairs in S, then we cannot have i < k < j < l.
• Free energy. The usual hypothesis is that an RNA molecule will form the secondary structure with the optimum total free energy (approximated here by the number of base pairs).
• Goal. Given an RNA molecule B = b_1b_2…b_n, find a secondary structure S that maximizes the number of base pairs.
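The three conditions above can be checked mechanically. The following is a minimal sketch, assuming pairs are given as 1-indexed (i, j) tuples with i < j; the function name and the input representation are illustrative choices, not part of the lecture.

```python
def is_valid_secondary_structure(b: str, pairs) -> bool:
    """Check Watson-Crick, no-sharp-turns and non-crossing for 1-indexed pairs."""
    complement = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    used = set()
    for i, j in pairs:
        if not (1 <= i < j <= len(b)):
            return False
        # Watson-Crick: the two bases must be complementary.
        if (b[i - 1], b[j - 1]) not in complement:
            return False
        # No sharp turns: require i < j - 4.
        if not i < j - 4:
            return False
        # Matching: each base takes part in at most one pair.
        if i in used or j in used:
            return False
        used.update((i, j))
    # Non-crossing: no two pairs with i < k < j < l.
    for i, j in pairs:
        for k, l in pairs:
            if i < k < j < l:
                return False
    return True
```

The quadratic non-crossing check keeps the sketch short; a stack-based scan would be faster for long inputs.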
RNA Secondary Structure: Examples
• Examples.
[Figure: three example structures. The first is a valid secondary structure (ok), the second violates the no-sharp-turns condition (sharp turn), and the third violates the non-crossing condition (crossing).]
RNA Secondary Structure: Subproblems
• First attempt. OPT(j) = maximum number of base pairs in a secondary structure of the substring b_1b_2…b_j.
• Difficulty. Matching b_t with b_n results in two sub-problems.
  • Finding a secondary structure in b_1b_2…b_(t-1): this is OPT(t-1).
  • Finding a secondary structure in b_(t+1)b_(t+2)…b_(n-1): this is not of the form OPT(j), so we need more sub-problems.
Dynamic Programming Over Intervals
• Notation. OPT(i, j) = maximum number of base pairs in a secondary structure of the substring b_i b_(i+1)…b_j.
• Case 1. If i ≥ j - 4.
  • OPT(i, j) = 0 by the no-sharp-turns condition.
• Case 2. Base b_j is not involved in a pair.
  • OPT(i, j) = OPT(i, j-1)
• Case 3. Base b_j pairs with b_t for some i ≤ t < j - 4.
  • The non-crossing constraint decouples the resulting sub-problems.
  • OPT(i, j) = 1 + max_t { OPT(i, t-1) + OPT(t+1, j-1) }, taking the max over t such that i ≤ t < j - 4 and b_t and b_j are Watson-Crick complements.
• Remark. The same core idea appears in the CKY algorithm to parse context-free grammars.
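The three cases translate almost directly into a memoized recursion. This is a top-down sketch, assuming 1-indexed positions as on the slide; the function name is hypothetical, and taking the larger of Case 2 and Case 3 is how the cases are combined here.

```python
from functools import lru_cache


def max_base_pairs_memo(b: str) -> int:
    """Top-down evaluation of OPT(1, n) with memoization."""
    complement = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

    @lru_cache(maxsize=None)
    def opt(i: int, j: int) -> int:
        # Case 1: interval too short for any pair (also covers empty intervals).
        if i >= j - 4:
            return 0
        # Case 2: b_j is not involved in a pair.
        best = opt(i, j - 1)
        # Case 3: b_j pairs with b_t for some i <= t < j - 4.
        for t in range(i, j - 4):
            if (b[t - 1], b[j - 1]) in complement:
                best = max(best, 1 + opt(i, t - 1) + opt(t + 1, j - 1))
        return best

    return opt(1, len(b))
```

Because the recursion depth grows with the string length, very long inputs would be better served by a bottom-up table.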
Bottom Up Dynamic Programming Over Intervals
• Q. What order to solve the sub-problems?
• A. Do shortest intervals first.
• Running time. O(n^3).

RNA(b_1, …, b_n) {
   for k = 5, 6, …, n-1
      for i = 1, 2, …, n-k
         j = i + k
         Compute M[i, j] using recurrence
   return M[1, n]
}

[Figure: the table M[i, j] for i = 1…4 and j = 6…9, filled in order of increasing interval length.]
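A bottom-up version of the loop above might look like the following sketch. It also records which choice achieved each M[i, j] so that one optimal set of pairs can be traced back; the `choice` table and the returned (count, pairs) tuple are additions for illustration, not part of the slide's pseudocode.

```python
def rna_secondary_structure(b: str):
    """Fill M[i][j] by increasing interval length; return (count, pairs)."""
    n = len(b)
    complement = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
    # 1-indexed tables; the extra row and column keep empty intervals at 0.
    M = [[0] * (n + 1) for _ in range(n + 2)]
    choice = [[0] * (n + 1) for _ in range(n + 2)]  # 0 means b_j unpaired
    for k in range(5, n):                 # k = j - i
        for i in range(1, n - k + 1):
            j = i + k
            best = M[i][j - 1]            # b_j not involved in a pair
            for t in range(i, j - 4):     # b_j pairs with b_t
                if (b[t - 1], b[j - 1]) in complement:
                    candidate = 1 + M[i][t - 1] + M[t + 1][j - 1]
                    if candidate > best:
                        best = candidate
                        choice[i][j] = t
            M[i][j] = best
    # Trace back one optimal structure using an explicit stack.
    pairs = []
    stack = [(1, n)]
    while stack:
        i, j = stack.pop()
        if i >= j - 4:
            continue
        t = choice[i][j]
        if t == 0:
            stack.append((i, j - 1))
        else:
            pairs.append((t, j))
            stack.append((i, t - 1))
            stack.append((t + 1, j - 1))
    return M[1][n], sorted(pairs)
```

There are O(n^2) table entries and each takes O(n) work, which matches the O(n^3) bound on the slide.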
Shortest Paths
• Shortest path problem. Given a directed graph G = (V, E), with edge weights c_vw (negative weights allowed), find the shortest path from node s to node t.
• Ex. Nodes represent agents in a financial setting and c_vw is the cost of a transaction in which we buy from agent v and sell immediately to w.
[Figure: example directed graph from s to t with positive and negative edge weights.]
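Shortest Paths: Failed Attempts
• Dijkstra. Can fail if there are negative edge costs.
• Re-weighting. Adding a constant to every edge weight can fail.
[Figure: two small example graphs with negative edge costs, one on which Dijkstra fails and one on which adding a constant to every edge changes the shortest s-t path.]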
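The re-weighting failure is easy to reproduce on a tiny graph. The sketch below uses a hypothetical three-node graph (not the one from the slide's figure) and a brute-force search over simple paths: adding a constant penalizes paths with more edges, so the cheapest path can change.

```python
def cheapest_simple_path(edges, source, target):
    """Brute-force the cheapest simple path; edges maps (v, w) to cost."""
    adjacency = {}
    for (v, w), cost in edges.items():
        adjacency.setdefault(v, []).append((w, cost))
    best_cost, best_path = float("inf"), None
    stack = [(source, 0, [source])]
    while stack:
        v, cost, path = stack.pop()
        if v == target:
            if cost < best_cost:
                best_cost, best_path = cost, path
            continue
        for w, c in adjacency.get(v, []):
            if w not in path:             # keep the path simple
                stack.append((w, cost + c, path + [w]))
    return best_cost, best_path


# Hypothetical example: the two-edge path s-a-t costs -1, the direct edge costs 0.
original = {("s", "a"): 2, ("a", "t"): -3, ("s", "t"): 0}
# Add 3 to every edge so that all costs become non-negative.
shifted = {edge: cost + 3 for edge, cost in original.items()}
```

On the original costs the search picks s-a-t; on the shifted costs it picks the direct edge, because the two-edge path paid the constant twice.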
Shortest Paths: Negative Cost Cycles
• Negative cost cycle. A directed cycle W whose total cost satisfies c(W) < 0.
• Observation. If some path from s to t contains a negative cost cycle, there does not exist a shortest s-t path; otherwise, there exists one that is simple.
[Figure: an s-t path passing through a cycle W with c(W) < 0.]
Shortest Paths: Dynamic Programming
• Def. OPT(i, v) = length of shortest v-t path P using at most i edges.
• Case 1: P uses at most i-1 edges.
  • OPT(i, v) = OPT(i-1, v)
• Case 2: P uses exactly i edges.
  • If (v, w) is the first edge, then OPT uses (v, w), and then selects the best w-t path using at most i-1 edges.
• Recurrence. OPT(0, t) = 0 and OPT(0, v) = ∞ for v ≠ t; for i ≥ 1, OPT(i, v) = min { OPT(i-1, v), min over edges (v, w) ∈ E of OPT(i-1, w) + c_vw }.
• Remark. By the previous observation, if there are no negative cycles, then OPT(n-1, v) = length of shortest v-t path.
Shortest Paths: Implementation
• Analysis. Θ(mn) time, Θ(n^2) space.
• Finding the shortest paths. Maintain a "successor" for each table entry.

Shortest-Path(G, t) {
   foreach node v ∈ V
      M[0, v] ← ∞
   M[0, t] ← 0
   for i = 1 to n-1
      foreach node v ∈ V
         M[i, v] ← M[i-1, v]
      foreach edge (v, w) ∈ E
         M[i, v] ← min { M[i, v], M[i-1, w] + c_vw }
}
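A direct transcription of the table-based pseudocode could look like this sketch. The graph representation (a list of nodes plus a dict mapping (v, w) to c_vw) and the function name are assumptions made for illustration.

```python
import math


def shortest_path_table(nodes, edges, t):
    """Return M[n-1], mapping each node v to the shortest v-t path length.

    edges maps (v, w) to the cost c_vw; the graph is assumed to have
    no negative cycles.
    """
    n = len(nodes)
    # Row 0: only t can reach t using zero edges.
    M = [{v: math.inf for v in nodes}]
    M[0][t] = 0
    for i in range(1, n):
        row = dict(M[i - 1])              # Case 1: use at most i-1 edges
        for (v, w), cost in edges.items():
            # Case 2: take edge (v, w), then the best w-t path with <= i-1 edges.
            row[v] = min(row[v], M[i - 1][w] + cost)
        M.append(row)
    return M[n - 1]
```

Keeping every row makes the space usage quadratic in the number of nodes, which is what the later one-array variant avoids.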
Shortest Paths: Practical Improvements
• Practical improvements.
  • Maintain only one array M[v] = shortest v-t path that we have found so far.
  • No need to check edges of the form (v, w) unless M[w] changed in the previous iteration.
• Theorem. Throughout the algorithm, M[v] is the length of some v-t path, and after i rounds of updates, the value M[v] is no larger than the length of the shortest v-t path using at most i edges.
• Overall impact.
  • Memory: O(m + n).
  • Running time: O(mn) worst case, but substantially faster in practice.
Bellman-Ford: Efficient Implementation

Push-Based-Shortest-Path(G, s, t) {
   foreach node v ∈ V {
      M[v] ← ∞
      successor[v] ← ∅
   }
   M[t] ← 0
   for i = 1 to n-1 {
      foreach node w ∈ V {
         if (M[w] has been updated in previous iteration) {
            foreach node v such that (v, w) ∈ E {
               if (M[v] > M[w] + c_vw) {
                  M[v] ← M[w] + c_vw
                  successor[v] ← w
               }
            }
         }
      }
      If no M[w] value changed in iteration i, stop.
   }
}
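The push-based pseudocode can be sketched in Python as follows. The set of nodes updated in the previous round is tracked explicitly; the graph representation and function name are assumptions, and the graph is assumed to have no negative cycles.

```python
import math


def push_based_shortest_path(nodes, edges, t):
    """One-array Bellman-Ford toward t; returns (M, successor)."""
    # incoming[w] lists (v, c_vw) for every edge (v, w).
    incoming = {v: [] for v in nodes}
    for (v, w), cost in edges.items():
        incoming[w].append((v, cost))
    M = {v: math.inf for v in nodes}
    successor = {v: None for v in nodes}
    M[t] = 0
    updated = {t}                          # nodes whose M changed last round
    for _ in range(len(nodes) - 1):
        changed = set()
        for w in updated:
            for v, cost in incoming[w]:
                if M[v] > M[w] + cost:
                    M[v] = M[w] + cost
                    successor[v] = w
                    changed.add(v)
        if not changed:                    # early termination
            break
        updated = changed
    return M, successor
```

Following `successor` pointers from any node with a finite M value yields a shortest path to t under the no-negative-cycle assumption.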
Dynamic Programming Summary
• Recipe.
  • Characterize structure of problem.
  • Recursively define value of optimal solution.
  • Compute value of optimal solution.
  • Construct optimal solution from computed information.
• Dynamic programming techniques.
  • Binary choice: weighted interval scheduling.
  • Multi-way choice: segmented least squares.
  • Adding a new variable: knapsack.
  • Dynamic programming over intervals: RNA secondary structure.
• Top-down vs. bottom-up: different people have different intuitions.
• Note. The Viterbi algorithm for HMMs also uses DP to optimize a maximum likelihood tradeoff between parsimony and accuracy.
• Note. The CKY parsing algorithm for context-free grammars has a similar structure.
Detecting Negative Cycles
• Lemma. If OPT(n, v) = OPT(n-1, v) for all v, then there are no negative cycles.
  • Pf. Bellman-Ford algorithm.
• Lemma. If OPT(n, v) < OPT(n-1, v) for some node v, then (any) shortest path P from v to t using at most n edges contains a cycle W. Moreover, W has negative cost.
  • Pf. (by contradiction)
  • Since OPT(n, v) < OPT(n-1, v), we know P has exactly n edges.
  • By the pigeonhole principle, P must contain a directed cycle W.
  • Deleting W yields a v-t path with < n edges, so W has negative cost.
[Figure: a v-t path containing a cycle W with c(W) < 0.]
Detecting Negative Cycles
• Theorem. Can detect a negative cost cycle in O(mn) time.
  • Add a new node t and connect all nodes to t with a 0-cost edge.
  • Check if OPT(n, v) = OPT(n-1, v) for all nodes v.
    • If yes, then there are no negative cycles.
    • If no, then extract a cycle from the shortest path from v to t.
[Figure: example graph with an added sink t reached by 0-cost edges from every node, containing a negative cost cycle through v.]
Detecting Negative Cycles: Summary • Bellman-Ford. O(mn) time, O(m + n) space. • Run Bellman-Ford for n iterations (instead of n-1). • Upon termination, Bellman-Ford successor variables trace a negative cycle if one exists. • See p. 288 for improved version and early termination rule.
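The detection procedure can be sketched as below. Starting every M[v] at 0 stands in for the added sink t with 0-cost edges, and the cycle is recovered by following successor pointers from a node that still changed in the final pass; both the function name and this particular extraction strategy are assumptions made for illustration.

```python
def find_negative_cycle(nodes, edges):
    """Return a list of nodes forming a negative cycle, or None.

    edges maps (v, w) to the cost c_vw.
    """
    n = len(nodes)
    # M[v] = 0 models a 0-cost edge from every node to an added sink t.
    M = {v: 0 for v in nodes}
    successor = {v: None for v in nodes}
    last = None
    for _ in range(n):                     # n passes instead of n-1
        last = None
        for (v, w), cost in edges.items():
            if M[w] + cost < M[v]:
                M[v] = M[w] + cost
                successor[v] = w
                last = v
        if last is None:                   # no change in this pass: no negative cycle
            return None
    # A value still changed in pass n, so a negative cycle exists.
    # Walking n successor steps from that node ends up inside the cycle.
    v = last
    for _ in range(n):
        v = successor[v]
    cycle = [v]
    u = successor[v]
    while u != v:
        cycle.append(u)
        u = successor[u]
    return cycle
```

The returned list gives the cycle in traversal order, with an implied edge from the last node back to the first.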
Q2. Arbitrage
• Arbitrage is the use of discrepancies in currency exchange rates to transform one unit of a currency into more than one unit of the same currency. For example, suppose that 1 US dollar buys 0.7 British pounds, 1 British pound buys 9.5 French francs, and 1 French franc buys 0.16 US dollars. Then, by converting currencies, a trader can start with 1 US dollar and buy 0.7 x 9.5 x 0.16 = 1.064 US dollars, thus turning a profit of 6.4 percent.
• Suppose that we are given n currencies c_1, …, c_n and an n x n table R of exchange rates, such that one unit of currency c_i buys R[i, j] units of currency c_j.
• Give an efficient algorithm to determine whether or not there exists a sequence of currencies (c_i1, …, c_ik) such that
  • R[i1, i2] x R[i2, i3] x … x R[i(k-1), ik] x R[ik, i1] > 1.
• Give an efficient algorithm to print out such a sequence if one exists. Analyze the running time of your algorithm.
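One way to approach Q2, sketched here as a possibility rather than the intended answer: give each conversion i to j the cost -log R[i, j], so that a product of rates greater than 1 corresponds to a negative cost cycle, and then run negative cycle detection. The function name, the 0-indexed table, the small tolerance `eps`, and the three rates not stated in the question are all assumptions for illustration.

```python
import math


def find_arbitrage(R, eps=1e-12):
    """Return a list of currency indices forming an arbitrage cycle, or None."""
    n = len(R)
    # Cost of converting i to j; a rate product > 1 becomes a negative sum.
    w = [[-math.log(R[i][j]) for j in range(n)] for i in range(n)]
    d = [0.0] * n                          # 0 everywhere models an added sink
    nxt = [None] * n
    last = None
    for _ in range(n):
        last = None
        for i in range(n):
            for j in range(n):
                if i != j and d[j] + w[i][j] < d[i] - eps:
                    d[i] = d[j] + w[i][j]
                    nxt[i] = j
                    last = i
        if last is None:
            return None
    v = last
    for _ in range(n):                     # step into the cycle
        v = nxt[v]
    cycle = [v]
    u = nxt[v]
    while u != v:
        cycle.append(u)
        u = nxt[u]
    return cycle


# Indices: 0 = US dollar, 1 = British pound, 2 = French franc.
# R[0][1], R[1][2], R[2][0] come from the question; the rest are hypothetical.
rates = [[1.0, 0.7, 6.0],
         [1.4, 1.0, 9.5],
         [0.16, 0.1, 1.0]]
```

With a dense n x n table there are n^2 edges, so this sketch runs in O(n^3) time.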
Q3. Number of shortest paths • Suppose we have a directed graph with costs on the edges. The costs may be positive or negative, but every cycle in the graph has a strictly positive cost. We are also given two nodes v, w. Give an efficient algorithm that computes the number of shortest v-w paths in G.