
Algorithm Design and Analysis



Presentation Transcript


  1. LECTURE 8 • Greedy Algorithms V • Huffman Codes • Algorithm Design and Analysis • CSE 565 • Adam Smith • A. Smith; based on slides by E. Demaine, C. Leiserson, S. Raskhodnikova, K. Wayne

  2. Review Questions Let G be a connected undirected graph with distinct edge weights. Answer true or false: • Let e be the cheapest edge in G. Some MST of G contains e? • Let e be the most expensive edge in G. No MST of G contains e?

  3. Review • Exercise: given an undirected graph G, consider the spanning trees produced by four algorithms • BFS tree • DFS tree • shortest-paths tree (Dijkstra) • MST • Find a graph where • all four trees are the same • all four trees must be different (note: DFS/BFS may depend on exploration order)

  4. Non-distinct edges? • Read in the text

  5. Implementing MST algorithms • Prim: similar to Dijkstra • Kruskal: requires an efficient data structure to keep track of “islands”: the Union-Find data structure • We may revisit this later in the course • You should know how to implement Prim in O(m log_{m/n} n) time

  6. Implementation of Prim(G, w) IDEA: Maintain V − S as a priority queue Q (as in Dijkstra). Key each vertex in Q with the weight of the least-weight edge connecting it to a vertex in S.
      Q ← V
      key[v] ← ∞ for all v ∈ V
      key[s] ← 0 for some arbitrary s ∈ V
      while Q ≠ ∅
          do u ← EXTRACT-MIN(Q)
             for each v ∈ Adjacency-list[u]
                 do if v ∈ Q and w(u, v) < key[v]
                     then key[v] ← w(u, v)    ⊳ DECREASE-KEY
                          π[v] ← u
  At the end, {(v, π[v])} forms the MST. (A runnable version follows below.)
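The pseudocode above maps almost directly onto a heap-based implementation. Below is a minimal runnable sketch in Python (an illustration, not from the slides). Python's heapq has no DECREASE-KEY, so instead of decreasing a key in place it pushes a fresh entry and discards stale ones at extraction time; this is the standard workaround and preserves the O(m log n) bound.

    import heapq

    def prim_mst(adj, s=0):
        # adj[u] = list of (v, weight) pairs; returns the MST edges (p[v], v).
        n = len(adj)
        in_tree = [False] * n
        key = [float('inf')] * n   # weight of the cheapest edge into the tree
        parent = [None] * n        # p[v] from the pseudocode
        key[s] = 0
        pq = [(0, s)]
        edges = []
        while pq:
            k, u = heapq.heappop(pq)            # EXTRACT-MIN
            if in_tree[u] or k > key[u]:
                continue                        # stale entry: skip it
            in_tree[u] = True
            if parent[u] is not None:
                edges.append((parent[u], u))
            for v, w in adj[u]:
                if not in_tree[v] and w < key[v]:
                    key[v] = w                  # the implicit DECREASE-KEY
                    parent[v] = u
                    heapq.heappush(pq, (w, v))
        return edges

    # Example on a 4-vertex graph:
    adj = [[(1, 4), (2, 1)],
           [(0, 4), (2, 2), (3, 5)],
           [(0, 1), (1, 2), (3, 8)],
           [(1, 5), (2, 8)]]
    print(prim_mst(adj))   # [(0, 2), (2, 1), (1, 3)]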

  7. Analysis of Prim
      Q ← V                                       ⟵ Θ(n) total
      key[v] ← ∞ for all v ∈ V
      key[s] ← 0 for some arbitrary s ∈ V
      while Q ≠ ∅                                 ⟵ n times
          do u ← EXTRACT-MIN(Q)
             for each v ∈ Adj[u]                  ⟵ degree(u) times
                 do if v ∈ Q and w(u, v) < key[v]
                     then key[v] ← w(u, v)
                          π[v] ← u
  Handshaking Lemma ⇒ Θ(m) implicit DECREASE-KEYs. Time: as in Dijkstra.

  8. Analysis of Prim The while loop runs n times, and the inner loop runs degree(u) times per iteration. Handshaking Lemma ⇒ Θ(m) implicit DECREASE-KEYs.

      PQ operation  | calls in Prim | Array | Binary heap | d-way heap | Fib heap †
      EXTRACT-MIN   | n             | n     | log n       | HW3        | log n
      DECREASE-KEY  | m             | 1     | log n       | HW3        | 1
      Total         |               | n^2   | m log n     | m log_{m/n} n | m + n log n

  † Individual ops are amortized bounds.

  9. Greedy Algorithms for MST • Kruskal's: Start with T = ∅. Consider edges in ascending order of weights. Insert edge e into T unless doing so would create a cycle. • Reverse-Delete: Start with T = E. Consider edges in descending order of weights. Delete edge e from T unless doing so would disconnect T. • Prim's: Start with some root node s. Grow a tree T from s outward. At each step, add to T the cheapest edge e with exactly one endpoint in T.

  10. Union-Find Data Structures • With modifications, the total amortized time for a sequence of n operations on the tree structure is O(n α(n)), where α(n), the inverse Ackermann function, grows much more slowly than log n. • See KT Chapter 4.6 • (A sketch follows below.)
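As a concrete illustration of the "islands" bookkeeping from slide 5, here is a minimal Union-Find sketch (union by rank plus path halving, two of the standard modifications KT discusses), driven by Kruskal's algorithm. The function names are illustrative, not from the slides.

    def make_union_find(n):
        parent = list(range(n))
        rank = [0] * n

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x

        def union(x, y):
            rx, ry = find(x), find(y)
            if rx == ry:
                return False                    # same island: would create a cycle
            if rank[rx] < rank[ry]:
                rx, ry = ry, rx
            parent[ry] = rx                     # union by rank
            if rank[rx] == rank[ry]:
                rank[rx] += 1
            return True

        return find, union

    def kruskal(n, edges):
        # edges: list of (weight, u, v); returns the MST edge list.
        find, union = make_union_find(n)
        return [(u, v, w) for w, u, v in sorted(edges) if union(u, v)]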

  11. Huffman codes

  12. Prefix-free codes • A binary code maps characters in an alphabet (say {A,…,Z}) to binary strings • Prefix-free code: no codeword is a prefix of any other • ASCII: prefix-free (all symbols have the same length) • Not prefix-free: a → 0, b → 1, c → 00, d → 01, … • Why is prefix-free good? (See the decoding sketch below.)
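To see why prefix-freeness matters: under the non-prefix-free code above, the bit string 00 could mean "aa" or "c", so the receiver cannot tell where one codeword ends. With a prefix-free code, the greedy decoder below is unambiguous, because the first codeword that matches is the only one that can. The three-letter code in the example is made up for illustration.

    def decode(bits, code):
        # code: letter -> codeword, assumed prefix-free
        inv = {cw: letter for letter, cw in code.items()}
        out, cur = [], ""
        for b in bits:
            cur += b
            if cur in inv:             # no codeword extends another,
                out.append(inv[cur])   # so the first match is final
                cur = ""
        return "".join(out)

    code = {"e": "00", "x": "01", "t": "1"}   # hypothetical prefix-free code
    print(decode("00011", code))              # -> "ext"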

  13. A prefix-free code for a few letters • e.g. e → 00, p → 10011. Source: Wikipedia

  14. A prefix-free code • e.g. T → 1001, U → 1000011. Source: Jeff Erickson's notes.

  15. How good is a prefix-free code? • Given a text, let f[i] = number of occurrences of letter i • Total number of bits needed: Σ_i f[i] · depth(i), where depth(i) is the length of the codeword for letter i • How do we pick the best prefix-free code?

  16. Huffman's Algorithm (1952) • Given individual letter frequencies f[1, …, n]: • Find the two least frequent letters i, j • Merge them into a symbol with frequency f[i] + f[j] • Repeat • e.g. a: 6, b: 6, c: 4, d: 3, e: 2 (a runnable sketch follows below) • Theorem: the Huffman algorithm finds an optimal prefix-free code
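A minimal runnable sketch of the algorithm (an illustration, not from the slides): a heap holds (frequency, tiebreak, subtree) entries, each round merges the two least frequent subtrees, and a final walk reads the codewords off the tree.

    import heapq
    from itertools import count

    def huffman(freq):
        # freq: letter -> frequency; returns letter -> codeword.
        # Assumes at least two distinct symbols.
        tiebreak = count()     # keeps heap entries comparable when frequencies tie
        heap = [(f, next(tiebreak), letter) for letter, f in freq.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, a = heapq.heappop(heap)   # the two least frequent subtrees
            f2, _, b = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tiebreak), (a, b)))
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):      # internal node: recurse on children
                walk(node[0], prefix + "0")
                walk(node[1], prefix + "1")
            else:
                codes[node] = prefix
        walk(heap[0][2], "")
        return codes

    freq = {"a": 6, "b": 6, "c": 4, "d": 3, "e": 2}    # the slide's example
    codes = huffman(freq)
    print(codes)
    print(sum(freq[x] * len(codes[x]) for x in freq))  # total bits used: 47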

  17. Warming up • Lemma 0: Every optimal prefix-free code corresponds to a full binary tree. • (Full = every node has 0 or 2 children) • Lemma 1: Let x and y be two least frequent characters. There is an optimal code in which x and y are siblings.

  18. Huffman codes are optimal Proof by induction! • Base case: two symbols; only one full tree. • Induction step: • Suppose f[1], f[2] are the smallest frequencies in f[1, …, n] • Let T be an optimal code for {1, …, n} • Lemma 1 ⇒ we can choose T so that 1 and 2 are siblings. • Introduce a new symbol numbered n+1, with f[n+1] = f[1] + f[2] • T′ = the code obtained from T by merging 1, 2 into n+1

  19. Cost of T in terms of T′: cost(T) = cost(T′) + f[1] + f[2], since symbols 1 and 2 each sit one level deeper in T than the merged symbol n+1 sits in T′. • Let H be the Huffman code for {1, …, n} • Let H′ be the Huffman code for {3, …, n+1} • Induction hypothesis: cost(H′) ≤ cost(T′) • cost(H) = cost(H′) + f[1] + f[2] ≤ cost(T′) + f[1] + f[2] = cost(T). QED

  20. Notes • See Jeff Erickson's lecture notes on greedy algorithms: • http://theory.cs.uiuc.edu/~jeffe/teaching/algorithms/

  21. Data compression for real? • Generally, we don't use letter-by-letter encoding • Instead, find frequently repeated substrings • The Lempel-Ziv algorithm is extremely common • It also has deep connections to entropy • If we have time for string algorithms, we'll cover this…

  22. Huffman codes and entropy • Given a set of frequencies, consider the probabilities p[i] = f[i] / (f[1] + … + f[n]) • Entropy(p) = Σ_i p[i] log(1/p[i]) • The Huffman code has expected depth Entropy(p) ≤ Σ_i p[i] · depth(i) ≤ Entropy(p) + 1 • To prove the upper bound, find some prefix-free code where • depth(i) ≤ log(1/p[i]) + 1 for every symbol i • Exercise! • The bound applies to Huffman too, by optimality • (A numeric check follows below.)
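A quick numeric check of the bound on the slide-16 example (a sketch; it reuses the huffman function from the sketch after slide 16):

    import math

    def entropy(freq):
        # Entropy(p) = sum_i p[i] * log2(1 / p[i]) with p[i] = f[i] / total
        total = sum(freq.values())
        return sum((f / total) * math.log2(total / f) for f in freq.values())

    freq = {"a": 6, "b": 6, "c": 4, "d": 3, "e": 2}
    codes = huffman(freq)                   # from the earlier sketch
    total = sum(freq.values())
    avg_depth = sum(freq[x] * len(codes[x]) for x in freq) / total
    print(entropy(freq), avg_depth)         # about 2.21 <= 2.24 <= 3.21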

  23. Prefix-free encoding of arbitrary-length strings • What if I want to send you a text, but you don't know ahead of time how long it is? • Option 1: put the length at the beginning: n + log(n) bits • requires the sender to know the length in advance • Option 2: after every B bits, put a special bit indicating whether or not we're done: n(1 + 1/B) + B − 1 bits (a sketch follows below) • Can we do better?
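A sketch of option 2, under one assumption the slide leaves open: how the receiver finds the end of the real payload inside the padded final block. Here the payload is padded with a single 1 followed by 0s before blocking, which costs at most B extra bits and keeps decoding unambiguous.

    def frame(bits, B=8):
        # Append '1' then pad with '0's to a multiple of B; emit each block
        # followed by a flag bit: 1 = more blocks follow, 0 = last block.
        padded = bits + "1"
        padded += "0" * (-len(padded) % B)
        blocks = [padded[i:i + B] for i in range(0, len(padded), B)]
        return "".join(b + ("1" if i < len(blocks) - 1 else "0")
                       for i, b in enumerate(blocks))

    def deframe(framed, B=8):
        payload = ""
        for i in range(0, len(framed), B + 1):
            payload += framed[i:i + B]
            if framed[i + B] == "0":          # flag says: last block
                break
        return payload[:payload.rindex("1")]  # strip the '1 0...0' padding

    print(deframe(frame("10110", B=4), B=4))  # -> 10110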
