CPSC 411: Design and Analysis of Algorithms
Set 4: Greedy Algorithms
Prof. Jennifer Welch, Spring 2011
Greedy Algorithm Paradigm
• Characteristics of greedy algorithms:
  • make a sequence of choices
  • each choice is the one that seems best so far, and depends only on what has been done so far
  • each choice produces a smaller problem to be solved
• In order for a greedy heuristic to solve the problem, the optimal solution to the big problem must contain optimal solutions to subproblems
Designing a Greedy Algorithm
• Cast the problem so that we make a greedy (locally optimal) choice and are left with one subproblem
• Prove there is always a (globally) optimal solution to the original problem that makes the greedy choice
• Show that the greedy choice together with an optimal solution to the subproblem gives an optimal solution to the original problem
Some Greedy Algorithms
• fractional knapsack algorithm
• Huffman codes
• Kruskal's MST algorithm
• Prim's MST algorithm
• Dijkstra's SSSP algorithm
• …
Knapsack Problem
• There are n different items in a store
• Item i:
  • weighs wi pounds
  • is worth $vi
• A thief breaks in
• He can carry up to W pounds in his knapsack
• What should he take to maximize the value of his haul?
0-1 vs. Fractional Knapsack
• 0-1 Knapsack Problem:
  • the items cannot be divided
  • the thief must take an entire item or leave it behind
• Fractional Knapsack Problem:
  • the thief can take partial items
  • for instance, the items are liquids or powders
  • solvable with a greedy algorithm…
Greedy Fractional Knapsack Algorithm
• Sort the items in decreasing order of value per pound
• While there is still room in the knapsack (limit of W pounds):
  • consider the next item in the sorted list
  • take as much as possible (all there is, or as much as will fit)
• O(n log n) running time (for the sort); a sketch in code follows
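A minimal Python sketch of this greedy rule (not from the original slides). Items are assumed to be given as (value, weight) pairs, and the function name fractional_knapsack is illustrative:

```python
def fractional_knapsack(items, capacity):
    """Return the maximum value obtainable when fractions of items may be taken."""
    # Sort by value per pound, largest first -- the greedy criterion.
    items = sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True)
    total_value = 0.0
    room = capacity
    for value, weight in items:
        if room <= 0:
            break
        take = min(weight, room)              # all there is, or as much as fits
        total_value += value * (take / weight)
        room -= take
    return total_value

# Example: the three items from the 0-1 slide; the fractional version gets $240
# ($60 + $100 + 20/30 of $120).
print(fractional_knapsack([(60, 10), (100, 20), (120, 30)], 50))
```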
Greedy 0-1 Knapsack Alg?
• 3 items:
  • item 1 weighs 10 lbs, worth $60 ($6/lb)
  • item 2 weighs 20 lbs, worth $100 ($5/lb)
  • item 3 weighs 30 lbs, worth $120 ($4/lb)
• knapsack can hold 50 lbs
• greedy strategy:
  • take item 1
  • take item 2
  • no room for item 3
• suboptimal! greedy gets $160, while taking items 2 and 3 gives $220 (a check in code follows)
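A small check of this counterexample (not from the slides), comparing the greedy-by-density strategy against brute force over all subsets of the 3-item instance:

```python
from itertools import combinations

items = [(60, 10), (100, 20), (120, 30)]   # (value $, weight lb), from the slide
W = 50

# Greedy by value per pound (the strategy the slide shows failing):
greedy_val, room = 0, W
for v, w in sorted(items, key=lambda vw: vw[0] / vw[1], reverse=True):
    if w <= room:
        greedy_val += v
        room -= w

# Brute force over all subsets (fine for 3 items):
best = max(sum(v for v, w in s)
           for r in range(len(items) + 1)
           for s in combinations(items, r)
           if sum(w for v, w in s) <= W)

print(greedy_val, best)   # 160 vs. 220 -- greedy is suboptimal here
```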
0-1 Knapsack Problem
• Taking item 1 is a big mistake globally, although it looks good locally
• Later we'll see a different algorithm design paradigm that does work for this problem
Finding an Optimal Code
• Input:
  • a data file of characters, and
  • the number of occurrences of each character
• Output:
  • a binary encoding of each character so that the data file can be represented as efficiently as possible
  • an "optimal code"
Huffman Code
• Idea: use short codes for more frequent characters and long codes for less frequent ones
How to Decode?
• With a fixed-length code, it's easy:
  • break the bit string up into 3's, for instance
• For a variable-length code, ensure that no character's code is a prefix of another's
  • no ambiguity
  • example: 101111110100 decodes uniquely as b d e a a
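To illustrate why prefix-freeness makes decoding unambiguous, here is a small decoder sketch. The codebook below is an assumption chosen to be consistent with the slide's example; only the codes for a, b, d, and e are actually pinned down by that example.

```python
# Hypothetical prefix-free codebook consistent with "101111110100 -> b d e a a".
CODE = {'a': '0', 'b': '101', 'c': '100', 'd': '111', 'e': '1101', 'f': '1100'}

def decode(bits, code):
    """Decode a bit string one codeword at a time."""
    reverse = {v: k for k, v in code.items()}
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in reverse:          # prefix-freeness makes this match unambiguous
            out.append(reverse[buf])
            buf = ''
    return ''.join(out)

print(decode('101111110100', CODE))   # -> 'bdeaa'
```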
Binary Tree Representation: fixed-length code
[figure: binary tree with 0/1 edge labels and leaves a, b, c, d, e, f, all at the same depth]
The cost of a code is the sum, over all characters c, of the number of occurrences of c times the depth of c in the tree.
Binary Tree Representation: variable-length code
[figure: binary tree with 0/1 edge labels and leaves a, b, c, d, e, f at varying depths]
The cost of a code is the sum, over all characters c, of the number of occurrences of c times the depth of c in the tree.
Algorithm to Construct Tree Representing Huffman Code
• Given a set C of n characters, where character c occurs f[c] times:
  • insert each c into a priority queue Q using f[c] as the key
  • for i := 1 to n-1 do
    • x := extract-min(Q)
    • y := extract-min(Q)
    • make a new node z with left child x (label the edge 0), right child y (label the edge 1), and f[z] = f[x] + f[y]
    • insert z into Q
• a runnable sketch follows
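A runnable sketch of this construction using Python's heapq module as the priority queue. The node layout (nested tuples) and the example frequencies are illustrative, not from the slides:

```python
import heapq
from itertools import count

def huffman_tree(freqs):
    """freqs: dict mapping each character to its number of occurrences."""
    tiebreak = count()                       # keeps heap comparisons well-defined
    Q = [(f, next(tiebreak), c) for c, f in freqs.items()]
    heapq.heapify(Q)
    for _ in range(len(freqs) - 1):          # n-1 merges
        fx, _, x = heapq.heappop(Q)          # x := extract-min(Q)
        fy, _, y = heapq.heappop(Q)          # y := extract-min(Q)
        z = (x, y)                           # left child on edge 0, right on edge 1
        heapq.heappush(Q, (fx + fy, next(tiebreak), z))
    return Q[0][2]

def codes(tree, prefix=''):
    """Read the codewords off the tree: left edge = 0, right edge = 1."""
    if isinstance(tree, str):
        return {tree: prefix or '0'}
    left, right = tree
    return {**codes(left, prefix + '0'), **codes(right, prefix + '1')}

# Example frequencies (illustrative):
print(codes(huffman_tree({'a': 45, 'b': 13, 'c': 12, 'd': 16, 'e': 9, 'f': 5})))
```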
<board work>
Graphs
• Directed graph G = (V,E)
  • V is a finite set of vertices
  • E is the edge set, a binary relation on V (ordered pairs of vertices)
• In an undirected graph, edges are unordered pairs of vertices
• If (u,v) is in E, then v is adjacent to u, and (u,v) is incident from u and incident to v; v is a neighbor of u
  • in an undirected graph, u is also adjacent to v, and (u,v) is incident on u and v
• The degree of a vertex is the number of edges incident from or to (or on) the vertex
More on Graphs
• A path of length k from vertex u to vertex u' is a sequence of k+1 vertices such that there is an edge from each vertex to the next in the sequence
• In this case, u' is reachable from u
• A path is simple if no vertices are repeated
• A path forms a cycle if the last vertex equals the first vertex
• A cycle is simple if no vertices are repeated other than the first and last
Graph Representations
• So far, we have discussed graphs purely as mathematical concepts
• How can we represent a graph in a computer program? We need some data structures.
• The two most common representations are:
  • adjacency list: best for sparse graphs (few edges)
  • adjacency matrix: best for dense graphs (lots of edges)
Adjacency List Representation
• Given graph G = (V,E):
  • an array Adj of |V| linked lists, one for each vertex in V
  • the adjacency list for vertex u, Adj[u], contains all vertices v such that (u,v) is in E
  • each vertex in the list might be just a string (its name), some other data structure representing the vertex, or a pointer to the data structure for that vertex
• Requires Θ(V+E) space (memory)
• Requires Θ(degree(u)) time to check if (u,v) is in E
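A minimal sketch of an adjacency-list representation, assuming vertices are hashable names and Python lists stand in for the linked lists:

```python
from collections import defaultdict

def adjacency_list(vertices, edges):
    """Build Adj[u] = list of vertices v with (u,v) in E (directed graph)."""
    Adj = defaultdict(list)
    for v in vertices:
        Adj[v]                    # ensure isolated vertices get an (empty) list
    for u, v in edges:
        Adj[u].append(v)          # Theta(V + E) space overall
    return Adj

Adj = adjacency_list(['a', 'b', 'c'], [('a', 'b'), ('a', 'c'), ('b', 'c')])
print('c' in Adj['a'])            # membership test costs Theta(degree(a)) time
```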
Adjacency Matrix Representation
• Given graph G = (V,E):
  • number the vertices 1, 2, ..., |V|
  • use a |V| x |V| matrix A with the (i,j)-entry of A being
    • 1 if (i,j) is in E
    • 0 if (i,j) is not in E
• Requires Θ(V²) space (memory)
• Requires Θ(1) time to check if (u,v) is in E
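A corresponding sketch for the adjacency-matrix representation; here vertices are numbered 0..|V|-1 rather than 1..|V|, and the graph is directed and unweighted:

```python
def adjacency_matrix(n, edges):
    """Build the |V| x |V| 0/1 matrix for a directed graph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]   # Theta(V^2) space
    for i, j in edges:
        A[i][j] = 1
    return A

A = adjacency_matrix(3, [(0, 1), (0, 2), (1, 2)])
print(A[0][2] == 1)                   # edge test is Theta(1) time
```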
More on Graph Representations
• Read Ch 22, Sec 1 for
  • more about directed vs. undirected graphs
  • how to represent weighted graphs (some value is associated with each edge)
  • pros and cons of adjacency list vs. adjacency matrix
Minimum Spanning Tree
[figure: weighted undirected graph with edge weights 2 through 18]
Given a connected undirected graph with edge weights, find a subset of the edges that spans all the nodes, creates no cycle, and minimizes the sum of the weights.
Facts About MSTs
• There can be many spanning trees of a graph
• In fact, there can be many minimum spanning trees of a graph
• But if every edge has a unique weight, then there is a unique MST
Uniqueness of MST
• Suppose in contradiction there are 2 MSTs, M1 and M2
• Let e be the minimum-weight edge that is in one MST but not the other (say it is in M1)
• If e is added to M2, a cycle is formed
• Let e' be an edge in the cycle that is not in M1
Uniqueness of MST
[figure: M2 with edge e added, forming a cycle containing e']
• e: in M1 but not M2
• e': in M2 but not M1; wt of e is less than wt of e', by choice of e (e is the minimum-weight edge that is in one tree but not the other)
• Replacing e' with e in M2 creates a new spanning tree M3 whose weight is less than that of M2
• contradiction
Kruskal's MST Algorithm
[figure: the same weighted graph as before]
Consider the edges in increasing order of weight; add an edge iff it does not cause a cycle.
Why is Kruskal's Greedy?
• The algorithm manages a set of edges such that
  • these edges are a subset of some MST
• At each iteration:
  • choose an edge so that the MST-subset property remains true
  • the subproblem left is to do the same with the remaining edges
• Always try to add the cheapest available edge that will not violate the tree property
  • a locally optimal choice
Correctness of Kruskal's Alg.
• Let e1, e2, …, en-1 be the sequence of edges chosen
• Clearly they form a spanning tree
• Suppose it is not of minimum weight
• Let ei be the first edge where the algorithm goes wrong:
  • {e1,…,ei-1} is part of some MST M
  • but {e1,…,ei} is not part of any MST
Correctness of Kruskal's Alg.
[figure: MST M drawn with gray edges; M contains e1 to ei-1 but not ei]
• Adding ei to M forms a cycle in M
• Let e* be the minimum-weight edge in that cycle not in {e1, …, ei-1}
• wt(e*) > wt(ei)
• Replacing e* with ei forms a spanning tree with smaller weight than M, contradiction!
Note on Correctness Proof
• The argument on the previous slide works when every edge has a unique weight
• The algorithm also works when edge weights are not necessarily unique
• Modify the proof on the previous slide: the contradiction is now to the assumption that ei is not part of any MST
Implementing Kruskal's Alg.
• Sort the edges by weight
  • efficient sorting algorithms are known
• How can we test quickly whether adding the next edge would cause a cycle?
  • use a disjoint-set data structure (covered later); a sketch using a simple union-find follows
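A sketch of Kruskal's algorithm with a bare-bones union-find for the cycle test (path compression only, no union by rank). The edge format (weight, u, v) is an assumption for illustration:

```python
def kruskal(vertices, edges):
    """Return the edges of an MST of the given weighted undirected graph."""
    parent = {v: v for v in vertices}

    def find(v):                          # find the set representative
        while parent[v] != v:
            parent[v] = parent[parent[v]] # path compression (halving)
            v = parent[v]
        return v

    mst = []
    for w, u, v in sorted(edges):         # consider edges in increasing weight
        ru, rv = find(u), find(v)
        if ru != rv:                      # adding (u,v) does not create a cycle
            parent[ru] = rv               # union the two trees
            mst.append((u, v, w))
    return mst

print(kruskal(['a', 'b', 'c', 'd'],
              [(1, 'a', 'b'), (4, 'a', 'c'), (3, 'b', 'c'), (2, 'c', 'd')]))
```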
Another Greedy MST Alg.
• Kruskal's algorithm maintains a forest that grows until it forms a spanning tree
• An alternative idea is to keep just one tree and grow it until it spans all the nodes
  • Prim's algorithm
• At each iteration, choose the minimum-weight outgoing edge to add
  • greedy! (a sketch follows)
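A sketch of Prim's algorithm (not from the slides) using a binary heap of candidate edges with lazy deletion of stale entries. The adjacency-list format adj[u] = [(weight, v), ...] is an assumption:

```python
import heapq

def prim(adj, start):
    """Grow a single tree from start, always adding the cheapest outgoing edge."""
    in_tree = {start}
    frontier = [(w, start, v) for w, v in adj[start]]
    heapq.heapify(frontier)
    mst = []
    while frontier and len(in_tree) < len(adj):
        w, u, v = heapq.heappop(frontier)   # minimum-weight edge leaving the tree
        if v in in_tree:
            continue                        # stale entry: v was already added
        in_tree.add(v)
        mst.append((u, v, w))
        for w2, x in adj[v]:
            if x not in in_tree:
                heapq.heappush(frontier, (w2, v, x))
    return mst

adj = {'a': [(1, 'b'), (4, 'c')],
       'b': [(1, 'a'), (3, 'c')],
       'c': [(4, 'a'), (3, 'b'), (2, 'd')],
       'd': [(2, 'c')]}
print(prim(adj, 'a'))   # [('a', 'b', 1), ('b', 'c', 3), ('c', 'd', 2)]
```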