Advanced Algorithms

Advanced Algorithms. Piyush Kumar (Lecture 17: Online Algorithms). Welcome to COT5405. On Bounds. Worst Case. Average Case: Running time over some distribution of input. (Quicksort) Amortized Analysis: Worst case bound on sequence of operations. (Bit Increments, Union-Find)


Presentation Transcript


  1. Advanced Algorithms Piyush Kumar (Lecture 17: Online Algorithms) Welcome to COT5405

  2. On Bounds • Worst Case. • Average Case: Running time over some distribution of input. (Quicksort) • Amortized Analysis: Worst case bound on sequence of operations. (Bit Increments, Union-Find) • Competitive Analysis: Compare the cost of an on-line algorithm with an optimal prescient algorithm on any sequence of requests. (Today.)

  3. Problem 1 • The online dating game. • You get to date a fixed number of partners. • After each date you either choose that partner or try your luck again. • You cannot go back in time. • What strategy would you use to pick?

  4. Problem 2 • You like to ski. • When weather AND mood permit, you go skiing. • If you own the equipment, you take it with you; otherwise you rent. • You can buy the equipment whenever you decide, but not while skiing.

  5. Costs • 1 unit to rent, M units to buy. • If you go skiing I times, what is OPT? OPT = min(I, M). • What algorithm should you use to decide whether you should buy the equipment?

  6. Algorithms • Algorithm 1: Buy equipment on the first day. • Competitive algorithm: an algorithm is called ρ-competitive if there exists some constant b such that for every sequence of inputs σ, CostALG(σ) <= ρ CostOPT(σ) + b. • For Algorithm 1 with I = 1: CostOPT(σ) = min(I, M) = 1 and CostALG(σ) = M, so ρ >= M.

  7. Algorithms • Algorithm 2: Rent for (M-1) days and buy on the Mth day. • I < M: CostALG(σ) = CostOPT(σ). • I >= M: CostALG(σ) = 2M – 1, CostOPT(σ) = M. • Competitive ratio = 2 – 1/M.

  8. Ski Rental • Alg 3: Rent for k days and buy on the (k+1)th day. • Worst case is I = k+1 trips: CostALG(σ) = k+M • CostOPT(σ) = min(M, k+1) • Competitive ratio = 2?
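
The rent-then-buy strategies on the ski rental slides can be compared numerically. The sketch below is only an illustration under the slides' cost model (1 unit to rent, M units to buy); the function names `alg_cost`, `opt_cost` and `worst_ratio` are made up for this example, and the adversary is assumed to choose the number of trips from a finite horizon.

```python
def alg_cost(k, m, trips):
    """Cost of 'rent for k trips, buy on trip k+1' when skiing `trips` times."""
    if trips <= k:
        return trips      # the purchase never happens, only rentals are paid
    return k + m          # k rentals followed by one purchase


def opt_cost(m, trips):
    """Offline optimum: rent every time or buy up front, whichever is cheaper."""
    return min(trips, m)


def worst_ratio(k, m, horizon):
    """Largest ALG/OPT ratio over all trip counts 1..horizon."""
    return max(alg_cost(k, m, i) / opt_cost(m, i) for i in range(1, horizon + 1))


M = 10
for k in (0, M - 1, 2 * M):
    # k = 0 is Algorithm 1 (buy at once); k = M - 1 is Algorithm 2.
    print(k, worst_ratio(k, M, 5 * M))
```

With M = 10 the worst ratio is M for k = 0 and 2 – 1/M for k = M – 1, which matches the two analyses on the slides; a larger k only makes the worst case worse.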

  9. Problem 3 (1D): Monkey looking for hidden food • What is the best competitive algorithm you can come up with? • What is its competitive ratio?

  10. Problem 3 (3D) • Monkey looking for hidden food.

  11. On-Line Algorithms • Work without full knowledge of the future. • Deal with a sequence of events. • Future events are unknown to the algorithm. • The algorithm has to deal with one event at a time. The next event happens only after the algorithm is done dealing with the previous event.

  12. On-Line versus Off-Line • We compare the behavior of the on-line algorithm to an optimal off-line algorithm “OPT” which knows the whole sequence in advance. • The off-line algorithm knows the exact properties of all the events in the sequence.

  13. Absolute competitive ratio (for minimization problems) • We measure the performance of an on-line algorithm by the competitive ratio. • This is the ratio between what the on-line algorithm “pays” and what the optimal off-line algorithm “pays”.

  14. Formally: let CostALG(σ) be the cost of the on-line algorithm on sequence σ. Let CostOPT(σ) be the optimal off-line cost on σ. Then the competitive ratio is: ρ = sup over all σ of CostALG(σ) / CostOPT(σ). • Calculus: the supremum is similar to the maximum but may be achieved only in the limit.

  15. Problem 4: Caching • k-competitive caching. • Two-level memory model. • If a page is not in the cache, a page fault occurs. • A paging algorithm specifies which page to evict on a fault. • Paging algorithms are online algorithms for cache replacement.

  16. Online Paging Algorithms • Assumption: the cache can hold k pages. • The CPU accesses memory through the cache. • Each request specifies a page in the memory system. • We want to minimize the number of page faults.

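  17. A Lower Bound • Theorem: Let A be a deterministic online paging algorithm. If A is ρ-competitive, then ρ >= k. • Pf: Let S = {p_1, p_2, …, p_{k+1}} be a set of k+1 arbitrary memory pages. Assume w.l.o.g. that A and OPT initially have p_1, …, p_k in their cache. In the worst case A has a page fault on every request, because the adversary can always request the one page missing from A's cache. OPT, by evicting the page whose next request is furthest in the future, faults at most once in every k requests, so ρ >= k.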

  18. Online Algorithm and Competitive Analysis • LRU (least recently used): evicts the page whose most recent access was earliest. MIN denotes the optimal off-line algorithm. • Theorem. LRU is k-competitive. • Proof: Let σ' be a subsequence of σ on which LRU faults exactly k times. Let p denote the page requested just before σ'. • Case 1: LRU faults in σ' on p. ⇒ σ' requests at least k+1 different pages ⇒ MIN faults at least once. • Case 2: LRU faults on some page, say q, at least twice in σ'. ⇒ σ' requests at least k+1 different pages ⇒ MIN faults at least once.

  19. Theorem. LRU is k-competitive. • Proof (continued): Let σ' be a subsequence of σ on which LRU faults exactly k times. Let p denote the page requested just before σ'. • Case 3: LRU does not fault on p, nor on any page more than once. ⇒ k different pages are accessed and faulted on, none of which is p. • p is in MIN's cache at the start of σ' ⇒ MIN faults at least once. • [Figure: the request sequence σ divided into consecutive subsequences; in each one LRU faults k times and MIN faults ≥ 1 time.]
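
The gap between LRU and the off-line optimum can be observed on a request sequence that cycles over k+1 pages, as in the lower-bound slide. This is a minimal sketch, not the lecture's own code: `lru_faults` and `min_faults` are names invented here, and the off-line rule used is "evict the page whose next request is furthest in the future".

```python
from collections import OrderedDict


def lru_faults(requests, k, initial):
    """Count page faults of LRU with a cache holding k pages."""
    cache = OrderedDict((p, None) for p in initial)  # least recently used first
    faults = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)        # page becomes most recently used
        else:
            faults += 1
            if len(cache) >= k:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = None
    return faults


def min_faults(requests, k, initial):
    """Count faults of the off-line rule: evict the page needed furthest in the future."""
    cache = set(initial)
    faults = 0
    for t, page in enumerate(requests):
        if page in cache:
            continue
        faults += 1
        if len(cache) >= k:
            rest = requests[t + 1:]

            def next_use(q):
                # Pages that are never requested again count as infinitely far away.
                return rest.index(q) if q in rest else float("inf")

            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults


k = 4
pages = list(range(1, k + 2))                             # p_1 .. p_{k+1}
requests = [pages[(k + i) % (k + 1)] for i in range(40)]  # p_{k+1}, p_1, p_2, ...
initial = pages[:k]                                       # both caches start with p_1 .. p_k
print(lru_faults(requests, k, initial), min_faults(requests, k, initial))
```

On this sequence LRU faults on every one of the 40 requests while the off-line rule faults 10 times, a ratio of exactly k = 4.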

  20. Universal Hashing

  21. Dictionary Data Type • Dictionary. Given a universe U of possible elements, maintain a subset S ⊆ U so that inserting, deleting, and searching in S is efficient. • Dictionary interface. • Create(): Initialize a dictionary with S = ∅. • Insert(u): Add element u ∈ U to S. • Delete(u): Delete u from S, if u is currently in S. • Lookup(u): Determine whether u is in S. • Challenge. Universe U can be extremely large so defining an array of size |U| is infeasible. • Applications. File systems, databases, Google, compilers, checksums, P2P networks, associative arrays, cryptography, web caching, etc.

  22. Hashing • Hash function. h : U → { 0, 1, …, n-1 }. • Hashing. Create an array H of size n. When processing element u, access array element H[h(u)]. • Collision. When h(u) = h(v) but u ≠ v. • A collision is expected after Θ(√n) random insertions. This phenomenon is known as the "birthday paradox." • Separate chaining: H[i] stores linked list of elements u with h(u) = i. • [Figure: array H[1..n] with chains: H[1] → jocularly → seriously; H[2] → null; H[3] → suburban → untravelled → considerating; H[n] → browsing.]
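
Separate chaining, combined with the dictionary interface from the previous slide, might look like the following sketch. It is an illustration only: the class name `ChainedDict` is invented, Python lists stand in for the linked lists on the slide, and the default hash function (Python's built-in `hash` reduced mod n) is an assumption rather than anything the lecture prescribes.

```python
class ChainedDict:
    """Dictionary (Create / Insert / Delete / Lookup) using separate chaining."""

    def __init__(self, n, h=None):
        self.n = n
        # Assumed default: built-in hash reduced to a slot in 0..n-1.
        self.h = h if h is not None else (lambda u: hash(u) % n)
        self.table = [[] for _ in range(n)]  # table[i] is the chain for slot i

    def insert(self, u):
        chain = self.table[self.h(u)]
        if u not in chain:                   # keep S a set: no duplicates
            chain.append(u)

    def delete(self, u):
        chain = self.table[self.h(u)]
        if u in chain:
            chain.remove(u)

    def lookup(self, u):
        # Cost is proportional to the length of one chain.
        return u in self.table[self.h(u)]


d = ChainedDict(4, h=lambda word: len(word) % 4)  # toy hash: word length mod 4
for word in ("jocularly", "seriously", "browsing"):
    d.insert(word)
print(d.table)
```

The toy hash sends both nine-letter words to the same slot, so they end up in one chain, which is the collision case the slide describes.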

  23. Ad Hoc Hash Function • Ad hoc hash function (à la Java string library): int h(String s, int n) { int hash = 0; for (int i = 0; i < s.length(); i++) hash = (31 * hash) + s.charAt(i); return Math.floorMod(hash, n); /* floorMod keeps the slot non-negative after int overflow */ } • Deterministic hashing. If |U| ≥ n², then for any fixed hash function h, there is a subset S ⊆ U of n elements that all hash to same slot. Thus, Θ(n) time per search in worst-case. • Q. But isn't ad hoc hash function good enough in practice?

  24. Algorithmic Complexity Attacks • When can't we live with ad hoc hash function? • Obvious situations: aircraft control, nuclear reactors. • Surprising situations: denial-of-service attacks. A malicious adversary learns your ad hoc hash function (e.g., by reading the Java API) and causes a big pile-up in a single slot that grinds performance to a halt. • Real world exploits. [Crosby-Wallach 2003] • Bro server: send carefully chosen packets to DoS the server, using less bandwidth than a dial-up modem. • Perl 5.8.0: insert carefully chosen strings into associative array. • Linux 2.4.20 kernel: save files with carefully chosen names.
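
A pile-up in a single slot is easy to produce for the multiply-by-31 hash shown earlier. The sketch below is an illustration, not one of the cited exploits: it relies on the arithmetic fact that "Aa" and "BB" have the same hash value under that scheme, so any concatenation of such blocks collides as well. The names `java_style_hash` and `colliding_strings` are invented for this example, and the hash is kept as an unsigned 32-bit value rather than a signed Java int.

```python
from itertools import product


def java_style_hash(s):
    """hash = 31 * hash + char, truncated to 32 bits (treated as unsigned here)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def colliding_strings(blocks):
    """All 2**blocks strings built from 'Aa' / 'BB' blocks; they share one hash value."""
    return ["".join(parts) for parts in product(("Aa", "BB"), repeat=blocks)]


keys = colliding_strings(10)
distinct_hashes = {java_style_hash(s) for s in keys}
print(len(keys), "keys,", len(distinct_hashes), "distinct hash value(s)")
```

Since every key has the same hash value, all of them land in the same slot for any table size n, and each lookup in that chain takes time linear in the number of keys.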

  25. Hashing Performance • Idealistic hash function. Maps m elements uniformly at random to n hash slots. • Running time depends on length of chains. • Average length of chain = α = m / n. • Choose n ≈ m ⇒ on average O(1) per insert, lookup, or delete. • Challenge. Achieve idealized randomized guarantees, but with a hash function where you can easily find items where you put them. • Approach. Use randomization in the choice of h. (The adversary knows the randomized algorithm you're using, but doesn't know the random choices that the algorithm makes.)

  26. Universal Hashing • Universal class of hash functions. [Carter-Wegman 1980s] • For any pair of distinct elements u, v ∈ U: Pr[h(u) = h(v)] ≤ 1/n, where h ∈ H is chosen uniformly at random. • Can select random h efficiently. • Can compute h(u) efficiently. • Ex. U = { a, b, c, d, e, f }, n = 2.

      H = {h1, h2} (not universal)
                 a  b  c  d  e  f
         h1(x)   0  1  0  1  0  1
         h2(x)   0  0  0  1  1  1
      Pr[h(a) = h(b)] = 1/2, Pr[h(a) = h(c)] = 1, Pr[h(a) = h(d)] = 0, ...

      H = {h1, h2, h3, h4} (universal)
                 a  b  c  d  e  f
         h1(x)   0  1  0  1  0  1
         h2(x)   0  0  0  1  1  1
         h3(x)   0  0  1  0  1  1
         h4(x)   1  0  0  1  1  0
      Pr[h(a) = h(b)] = 1/2, Pr[h(a) = h(c)] = 1/2, Pr[h(a) = h(d)] = 1/2, Pr[h(a) = h(e)] = 1/2, Pr[h(a) = h(f)] = 0, ...
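
The two example families on this slide are small enough to check exhaustively. The sketch below encodes each hash function as a row of the slide's table and tests every pair of distinct elements against the 1/n bound; the helper names `table`, `collision_prob` and `is_universal` are invented for the illustration.

```python
from fractions import Fraction
from itertools import combinations

U = "abcdef"
n = 2


def table(*rows):
    """Turn rows such as '010101' into dicts mapping each element of U to its slot."""
    return [dict(zip(U, row)) for row in rows]


H2 = table("010101", "000111")                      # h1, h2
H4 = table("010101", "000111", "001011", "100110")  # h1, h2, h3, h4


def collision_prob(H, u, v):
    """Probability that u and v collide when h is drawn uniformly from H."""
    return Fraction(sum(h[u] == h[v] for h in H), len(H))


def is_universal(H, n):
    """True if every pair of distinct elements collides with probability at most 1/n."""
    return all(collision_prob(H, u, v) <= Fraction(1, n)
               for u, v in combinations(U, 2))


print(is_universal(H2, n), is_universal(H4, n))
```

The first family fails because a and c collide under both of its functions; the second passes for all 15 pairs.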

  27. Universal Hashing • Universal hashing property. Let H be a universal class of hash functions; let h ∈ H be chosen uniformly at random from H; and let u ∈ U. For any subset S ⊆ U of size at most n, the expected number of items in S that collide with u is at most 1. • Pf. For any element s ∈ S, define indicator random variable Xs = 1 if h(s) = h(u) and 0 otherwise. Let X be a random variable counting the total number of collisions with u. Then E[X] = E[Σ_{s ∈ S} Xs] = Σ_{s ∈ S} E[Xs] (linearity of expectation) = Σ_{s ∈ S} Pr[Xs = 1] (Xs is a 0-1 random variable) ≤ Σ_{s ∈ S} 1/n (universal; assumes u ∉ S) = |S| / n ≤ 1. ▪

  28. Designing a Universal Family of Hash Functions • Theorem. [Chebyshev 1850] There exists a prime between n and 2n. • Modulus. Choose a prime number p ≈ n, with n ≤ p ≤ 2n by the theorem above (no need for randomness here). • Integer encoding. Identify each element u ∈ U with a base-p integer of r digits: x = (x1, x2, …, xr). • Hash function. Let A = set of all r-digit, base-p integers. For each a = (a1, a2, …, ar) where 0 ≤ ai < p, define ha(x) = (a1 x1 + a2 x2 + … + ar xr) mod p. • Hash function family. H = { ha : a ∈ A }.

  29. Designing a Universal Class of Hash Functions • Theorem. H = { ha : a ∈ A } is a universal class of hash functions. • Pf. Let x = (x1, x2, …, xr) and y = (y1, y2, …, yr) be two distinct elements of U. We need to show that Pr[ha(x) = ha(y)] ≤ 1/n. • Since x ≠ y, there exists an integer j such that xj ≠ yj. • We have ha(x) = ha(y) iff aj (yj − xj) = Σ_{i ≠ j} ai (xi − yi) mod p. Write z = yj − xj and m = Σ_{i ≠ j} ai (xi − yi). • Can assume a was chosen uniformly at random by first selecting all coordinates ai where i ≠ j, then selecting aj at random. Thus, we can assume ai is fixed for all coordinates i ≠ j. • Since p is prime, aj z = m mod p has at most one solution among p possibilities (see lemma on next slide). • Thus Pr[ha(x) = ha(y)] = 1/p ≤ 1/n. ▪
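
The family from these two slides is short to write down, and for small parameters the 1/p collision probability can be confirmed by enumerating every vector a. This is a sketch under the assumption that ha(x) is the sum of ai·xi taken mod p; the names `make_hash`, `random_hash` and `collision_count` are invented for the example.

```python
import random
from itertools import product


def make_hash(a, p):
    """h_a(x) = (a_1*x_1 + ... + a_r*x_r) mod p for digit vectors x as long as a."""
    def h(x):
        return sum(ai * xi for ai, xi in zip(a, x)) % p
    return h


def random_hash(r, p, rng=random):
    """Draw a uniformly from all r-digit base-p vectors and return h_a."""
    a = tuple(rng.randrange(p) for _ in range(r))
    return make_hash(a, p)


def collision_count(x, y, r, p):
    """Number of vectors a (out of p**r) for which h_a(x) == h_a(y)."""
    count = 0
    for a in product(range(p), repeat=r):
        h = make_hash(a, p)
        if h(x) == h(y):
            count += 1
    return count


p, r = 5, 2
x, y = (1, 2), (1, 4)   # two distinct keys written as base-p digit vectors
print(collision_count(x, y, r, p), "of", p ** r, "choices of a collide")
```

For any two distinct keys exactly p^(r−1) of the p^r vectors collide, which is the probability 1/p used at the end of the proof.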

  30. Number Theory Facts • Fact. Let p be prime, and let z ≠ 0 mod p. Then αz = m mod p has at most one solution 0 ≤ α < p. • Pf. • Suppose α and β are two different solutions. • Then (α − β)z = 0 mod p; hence (α − β)z is divisible by p. • Since z ≠ 0 mod p, we know that z is not divisible by p; it follows that (α − β) is divisible by p. • This implies α = β. ▪ • Bonus fact. Can replace "at most one" with "exactly one" in above fact. • Pf idea. Euclid's algorithm.
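
The bonus fact can be tried out directly: the extended form of Euclid's algorithm yields the inverse of z mod p, and multiplying m by it gives the single solution α. The sketch below compares that with a brute-force search; `extended_gcd`, `solve` and `solutions` are names chosen for this illustration, and p is assumed to be prime.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t


def solve(z, m, p):
    """The solution alpha in [0, p) of alpha*z = m mod p, for prime p and z != 0 mod p."""
    g, s, _ = extended_gcd(z % p, p)   # s * z = 1 mod p because g == 1
    return (m * s) % p


def solutions(z, m, p):
    """Brute force: every alpha in [0, p) with alpha*z = m mod p."""
    return [alpha for alpha in range(p) if (alpha * z - m) % p == 0]


p = 13
print(all(solutions(z, m, p) == [solve(z, m, p)]
          for z in range(1, p) for m in range(p)))
```

For p = 13 the brute-force list contains exactly one value for every z ≠ 0 and every m, and it agrees with the value computed through Euclid's algorithm.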
