15-211 Fundamental Structures of Computer Science

15-211 Fundamental Structures of Computer Science April 29, 2003

Announcements • HW6 due on Thursday at 11:59pm • Final exam review on Sunday, 4:30-6:00pm in WeH 7500 • Othello tournament on May 7, 4:30-6:00pm in WeH 7500 • Final exam on May 8, 8:30-11:30am in UC McConomy • Special problem session at tomorrow’s recitation!

Randomized Algorithms

Quicksort, revisited

Quicksort idea • Choose a pivot. • Rearrange so that pivot is in the “right” spot. • Recurse on each half and conquer!

[Figure: Quicksort algorithm. Quicksort tree for the input 105 47 13 17 30 222 5 19, first split around the pivot 19 into 5 17 13 and 47 30 222 105.]

Performance of quicksort • In the worst case, quicksort runs in O(N^2) time. • With a simple pivot rule (such as always taking the first element), this happens when the input is sorted (or “mostly” sorted). • However, the average-case running time is O(N log N). • Height of the quicksort tree is expected to be O(log N).

Worst-case analysis • “Quicksort has a worst-case running time of O(N^2).” • This means that no input of size N requires more than O(N^2) operations, and at least one input of size N actually requires on the order of N^2 operations.

Average-case analysis • “Quicksort has an average-case running time of O(N log N).” • This means that if we run quicksort on all inputs of size N (every ordering counted equally), then on average O(N log N) operations will be required for each run.

Average case vs. worst case • So, is quicksort a good algorithm or not? • In other words, in the real world, do we get the worst-case or the average-case performance? • Unfortunately, in practice, mostly sorted inputs occur much more often than is statistically expected.

Randomized Algorithms • Read Chapter 9

Randomized quicksort • As a slight variation of quicksort, let’s arrange for the pivot element to be chosen at random. • Then a sorted input will not necessarily exhibit the O(N^2) worst-case behavior. • Indeed, there are no longer any “bad inputs”; there are only “unlucky” choices of pivots.
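A minimal sketch of this variation, assuming an int array and a simple (Lomuto-style) partition step. The class and method names are illustrative, not from the lecture; the only point is that the pivot index comes from a random number generator.

```java
import java.util.Random;

/** Illustrative sketch: quicksort with a uniformly random pivot. */
class RandomizedQuicksort {
    private static final Random RNG = new Random();

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Sorts a[lo..hi] in place.
    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        // Choose the pivot position uniformly at random, then park the pivot at the end.
        int p = lo + RNG.nextInt(hi - lo + 1);
        swap(a, p, hi);
        int pivot = a[hi];
        // Partition: after the loop, a[lo..i-1] holds the elements smaller than the pivot.
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) swap(a, i++, j);
        }
        swap(a, i, hi);        // the pivot is now in its "right" spot
        sort(a, lo, i - 1);    // recurse on each side
        sort(a, i + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Calling RandomizedQuicksort.sort on an already sorted array gives the same result as on any other ordering of the same keys; only the (random) pivot choices affect the running time.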

Randomized algorithms • Algorithms that make use of randomness are called randomized algorithms.

Analysis of randomized qsort • Consider taking a single, specific input of size N. • Maybe even a sorted input. • If we repeatedly run our randomized quicksort on this input, we would expect to get different running times for many of the runs. • But what would be the average running time?

Analysis of randomized qsort • Consider the quicksort tree: [Figure: quicksort tree for the input 105 47 13 17 30 222 5 19.]

Analysis of randomized qsort • The time spent at each level of the tree is O(N). • So, on average, how many levels? • That is, what is the expected height of the tree? • If on average there are O(log N) levels, then randomized quicksort is O(N log N) on average.

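Expected height of qsort tree • Assume that pivot is chosen randomly. • When is a pivot “good”? When is it “bad”? • [Figure: the elements in sorted order, 5 13 17 19 30 47 105 222.] • A pivot is “good” if it falls in the middle half of the sorted order, and “bad” otherwise. • Probability of a good pivot is 0.5. • After a good pivot, each child is at most 3/4 the size of its parent.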

Expected height of qsort tree • So, if we descend k levels in the tree, each time being lucky enough to pick a “good” pivot, the maximum size of the kth child is: • N(3/4)(3/4) … (3/4) (k times) • = N(3/4)^k • But on average, only half of the pivots will be good, so after k levels we expect about k/2 good pivots, and the recursion bottoms out when • N(3/4)^(k/2) = 1 • k = 2 log_{4/3} N = O(log N)
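To get a feel for the size of this bound, the sketch below evaluates k = 2 log_{4/3} N and also simulates the height of a quicksort tree in which every pivot's rank is uniformly random. The class and method names are hypothetical, and the simulation assumes distinct keys so that a pivot of rank r leaves r keys on one side and n - 1 - r on the other.

```java
import java.util.Random;

/** Illustrative sketch: the height bound from the slide versus a simulated tree. */
class QuicksortHeight {
    // The bound derived above: k = 2 * log_{4/3} N.
    static double heightBound(int n) {
        return 2.0 * Math.log(n) / Math.log(4.0 / 3.0);
    }

    // Height of the recursion tree on n distinct keys when each pivot's rank is uniform.
    // A subproblem with 0 or 1 keys contributes no further levels.
    static int simulatedHeight(int n, Random rng) {
        if (n <= 1) return 0;
        int left = rng.nextInt(n);     // number of keys smaller than the pivot
        int right = n - 1 - left;      // number of keys larger than the pivot
        return 1 + Math.max(simulatedHeight(left, rng), simulatedHeight(right, rng));
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println("bound:     " + heightBound(n));
        System.out.println("simulated: " + simulatedHeight(n, new Random()));
    }
}
```

The simulated height varies from run to run, but it typically stays well below the bound, which is deliberately generous: it only credits the good pivots with making progress.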

Randomized quicksort • So, for any particular input, we expect randomized quicksort to run in O(N log N) time. • This is referred to as the expected-case running time.

Expected running time • Worst-case: • The time required for the “pathological” input. • Average-case: • The time required, averaged over all inputs. • Expected-case: • The time required on any one fixed input, averaged over the algorithm’s random choices (that is, over infinitely many runs on that input).

Implementing Randomness

Choosing a pivot at random • Given an array of N elements, we want to pick one of the elements at random. • One way is to flip a coin multiple times. • Another is to generate a random number between 0 and N-1. • How to do these things?

Random sequences • Note that generally we need sequences of random choices. • Thus, approaches such as reading the system clock or other “background noise” from nature aren’t practical. • See http://www.fourmilab.ch/hotbits/

Random numbers in Java • The class java.util.Random provides methods for generating sequences of random bits and numbers. • new java.util.Random() • creates a random number generator • nextBoolean() • returns the next random bit • nextInt(n) • returns the next random integer between 0 and n-1
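A short usage sketch of these calls. The class name RandomDemo and the helper randomIndex are made up for illustration; the java.util.Random constructors and the nextBoolean and nextInt(n) methods are the standard library ones.

```java
import java.util.Random;

/** Illustrative sketch: coin flips and random array indices with java.util.Random. */
class RandomDemo {
    // Returns a random index into an array of length n, i.e. a value in 0 .. n-1.
    static int randomIndex(Random rng, int n) {
        return rng.nextInt(n);
    }

    public static void main(String[] args) {
        Random rng = new Random();           // seed is chosen automatically
        boolean coin = rng.nextBoolean();    // one random bit
        int index = randomIndex(rng, 8);     // e.g. a pivot position in an 8-element array
        System.out.println(coin + " " + index);

        // Two generators created with the same seed produce the same sequence.
        Random a = new Random(42L);
        Random b = new Random(42L);
        System.out.println(a.nextInt(100) == b.nextInt(100));  // prints true
    }
}
```

Passing an explicit seed, as in the last lines, is handy for reproducing a run while debugging.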

Pseudorandom numbers • Actually, java.util.Random does not generate true random numbers, but pseudorandom numbers. • Pseudorandom numbers appear to be random and have many of the properties of random numbers. • But they are computed deterministically from a seed, so the sequences eventually repeat, generally with a (large) period.

Linear congruential method • A well-known algorithm for generating pseudorandom numbers, due to D. Lehmer (1951). • To generate a sequence of pseudorandom numbers x_1, x_2, …: • x_{i+1} = (A * x_i) mod M • x_0 is the seed value, and should be chosen so that 1 ≤ x_0 < M. A is some constant value. • The sequence has a period of at most M - 1; the maximum is reached when M is prime and A is chosen appropriately. • If M = 2^31 - 1 and A = 48,271, then we get a period of 2^31 - 2.
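The recurrence can be sketched directly with the constants from the slide. This version assumes 64-bit long arithmetic, so the product A * x_i (always below 2^47) cannot overflow; the class name Lehmer is illustrative.

```java
/** Illustrative sketch: x_{i+1} = (A * x_i) mod M with A = 48,271 and M = 2^31 - 1. */
class Lehmer {
    static final long A = 48271L;
    static final long M = 2147483647L;   // 2^31 - 1

    private long state;

    Lehmer(long seed) {
        if (seed < 1 || seed >= M) {
            throw new IllegalArgumentException("seed must satisfy 1 <= seed < M");
        }
        state = seed;
    }

    // Advances the sequence by one step and returns the new value, in 1 .. M-1.
    long next() {
        state = (A * state) % M;   // fits easily in a long
        return state;
    }
}
```

Starting from the seed 1, the first two values are 48271 and 182605794 (that is, 48271^2 mod M).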

Overflow • The LCG is not entirely practical as written, because computing A * x_i can result in very large numbers that overflow. • Java ints are represented in 32 bits, with a range of -2^31 … 2^31 - 1. • If a computation results in a value that is out of range, then overflow occurs and the result silently wraps around. • Overflow affects the randomness of the LCG sequence. • Read page 328 of the text to see how to deal with overflow.
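One standard way to stay within 32-bit ints is to rearrange the computation using Q = M / A and R = M mod A, a trick usually credited to Schrage. The sketch below is offered as an illustration of that idea and is only assumed to be in the spirit of the textbook's treatment; the class name LehmerInt is hypothetical.

```java
/** Illustrative sketch: (A * x) mod M computed without leaving the int range. */
class LehmerInt {
    static final int A = 48271;
    static final int M = 2147483647;   // 2^31 - 1
    static final int Q = M / A;        // 44488
    static final int R = M % A;        // 3399

    // Returns (A * x) mod M for 1 <= x < M.
    // Both A * (x % Q) and R * (x / Q) stay below M, so neither product overflows.
    static int next(int x) {
        int t = A * (x % Q) - R * (x / Q);
        return t >= 0 ? t : t + M;
    }
}
```

The rearrangement relies on R being smaller than Q, which holds for these constants. The naive expression A * x, evaluated in int arithmetic, would instead wrap around for most values of x.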

Other Randomized Algorithms

Randomized primality testing • Until 2002, there was no known deterministic polynomial-time algorithm for testing whether a large integer is prime, and the one found then (AKS) is much slower in practice than the randomized tests. • Fermat’s Little Theorem: • If P is prime and 0 < A < P, • then A^(P-1) ≡ 1 (mod P). • The converse is not true: some composite P satisfy the congruence for some A. • However, there are polynomial-time randomized algorithms that work with very high probability. • Such an algorithm might incorrectly say “prime”. • The chance of an incorrect answer can be made smaller than the chance of a hardware failure.
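A minimal sketch of a test built directly on Fermat's Little Theorem: pick random bases A and check whether A^(P-1) ≡ 1 (mod P). The class and method names are illustrative. This is the plain Fermat test, not the stronger tests used in practice: certain composites (the Carmichael numbers, such as 561) pass it for every base that shares no factor with them, which is why production code uses refinements such as Miller-Rabin.

```java
import java.math.BigInteger;
import java.util.Random;

/** Illustrative sketch: the plain Fermat primality test. */
class FermatTest {
    // Returns false if n is certainly composite, true if n passed every trial.
    // A "true" answer may be wrong; a "false" answer never is.
    static boolean isProbablyPrime(long n, int trials, Random rng) {
        if (n < 2) return false;
        if (n < 4) return true;           // 2 and 3 are prime
        if (n % 2 == 0) return false;     // even numbers above 2 are composite
        BigInteger bn = BigInteger.valueOf(n);
        BigInteger exponent = bn.subtract(BigInteger.ONE);
        for (int i = 0; i < trials; i++) {
            // Random base in 2 .. n-2 (the tiny modulo bias is ignored in this sketch).
            long a = 2 + Math.floorMod(rng.nextLong(), n - 3);
            BigInteger r = BigInteger.valueOf(a).modPow(exponent, bn);
            if (!r.equals(BigInteger.ONE)) return false;   // Fermat's condition fails
        }
        return true;
    }
}
```

For a prime n every trial passes, so the method always returns true; for most composites each extra trial makes a wrong “prime” answer less likely.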

Thursday • We will revisit all of the topics we have discussed in the course. • Go to Recitation tomorrow
