410 likes | 751 Views
Integrating Advanced Algorithms into Undergraduate Computer Science Curriculum. Yana Kortsarts Widener University Computer Science Department . Plan. Randomized Algorithms Teaching a Power of Randomization Using a Simple Game
E N D
Integrating Advanced Algorithms into Undergraduate Computer Science Curriculum Yana Kortsarts Widener University Computer Science Department
Plan • Randomized Algorithms • Teaching a Power of Randomization Using a Simple Game • Additional Examples of Advanced Algorithms in Undergraduate Computer Science Curriculum
Algorithm • An algorithm is a sequence of instructions for solving a problem • Deterministic Algorithm runs in the same way on the same input every time. Deterministic algorithm has predicted behavior • Randomized Algorithm is an algorithm that makes random choices during execution
Deterministic Algorithm THE SAME INPUT THE SAME BEHAIVOR OUTPUT
Randomized Algorithm THE SAME INPUT DIFFERENT BEHAIVOR OUTPUT
Why Should We Teach Randomized Algorithms? • Randomization is a general tool that applies in various computer science areas and not just a subject by itself. • Significance: many of the breakthroughs in various algorithmic areas have used randomization. • Example: Prime Number Test • Simple polynomial one-sided error Monte Carlo algorithm – Rabin Algorithm (1980) • A deterministic polynomial time algorithm was given by Agarwal, Kayal, and Saxena (2002).
Advantages of Randomized Algorithms • Performance: for many problems, randomized algorithms run faster than the best known deterministic algorithm • Simplicity: many randomized algorithms are simpler to describe and implement than deterministic algorithms of comparable performance.
Challenges and Solutions • The concept of a randomized algorithm can be difficult to understand. • Usually, there is no separate course on Randomized Algorithms in undergraduate CS curriculum • The idea of a randomized algorithm is clearer for students when presented as a game. • Topic could be integrated into introductory courses
Algorithm as Part of the Game Design of an Algorithm for a Combinatorial Problem GAME Algorithm Player Input Player Designs the Algorithms Goal: Minimize Running Time of the Algorithm Goal: Maximize Running Time of the Algorithm Selects Test Input for Selected Algorithm
Deterministic Algorithms Algorithm Player Input Player Deterministic Strategy (Deterministic Algorithm) Best Strategy Finding the Worst Input for the Algorithm Produced by the Algorithm Player • Reveals an entire strategy • (algorithm) first • Input Player can pick • the worst example for • the suggested algorithm
Deterministic Algorithms The problem facing the algorithm player is that if it uses a deterministic strategy, then since in a sense it “moves first", the second (input) player can indeed pick the worst example for the suggested algorithm
Randomized Algorithms Algorithm Player Input Player Randomized Strategy (Randomized Algorithm) A “bad” input for a randomized algorithm has to be an input which is bad for several algorithms simultaneously • Randomized algorithm can be seen • as a distribution over all possible • deterministic algorithms • Doesn’t reveal his cards fully in advance • Tells the second player the probability by • which it selects any one of the possible • deterministic algorithms • The coins have not fallen yet, and the game • only begins after the input player chooses • its adversarial input.
Game Description • Player 1: Decides on integer x > 0 • Player 2 : Has to find a number ynso that yn x • Rules: • Player 2: y1<y2< …< yn • y1<y2< …< yn-1< x and yn x • On a guess yj,player 1 either says: • smaller than x, please provide a next guess • larger or equal x, and reveals x stopping the game
Optimization Criteria • Let the guesses be y1, y2, ….yn, so that yn x yj< x for all j≤n – 1 • The optimization criteria is the performance ratio:
EXAMPLE Player 1 Player 2 x is chosen, pleaseprovide a guess Smaller than x, next guess 1 Smaller than x, next guess 3 Smaller than x, next guess 10 Smaller than x, next guess 28 STOP! x = 37 76 Performance Ratio 118 / 37 = 3.189189
EXAMPLE Player 1 Player 2 x is chosen, pleaseprovide a guess Smaller than x, next guess 1 Smaller than x, next guess 2 Smaller than x, next guess 3 Smaller than x, next guess 4 STOP! x = 5 23 Performance Ratio (1+2+3+4+23) / 5 = 6.6
Teaching the Game • Discussion of the Optimization Criteria selection: • Why not to choose yi/x, where yi x? • Answer: simple strategy 1, 2, 3, …is optimal • Discussion of possible strategies: • Why not to start with some “large” number? • Why do we not benefit from increasing the next guess only a little compared to the previous guess? • What is the disadvantage of making the next guess, say, 100 times larger than previous guess?
The Powers of 2 Strategy for the Second Player • It turns out that the simple strategy that selects powers of 2: y0 = 1, y1 = 2, y2 = 4, … yi = 2i … is an optimal deterministic strategy for this game • The worst case for the strategy is when the number selected by the first player is x = 2j+ 1 • In this case the game is played until the second player suggests 2j+1
EXAMPLE • x = 13 • guesses: 1, 2, 4, 8, 16 • sum: 1+2+4+8+16 = 31 • performance ratio is 2.384615 • x = 65 • guesses: 1, 2, 4, 8, 16, 32, 64, 128 • sum: 1+2+4+8+16+32+64+128 = 255 • performance ratio is 3.923
The Powers of 2 Strategy Analysis • The strategy y0 = 1, y1 = 2, y2 = 4, … yi = 2i gives a following performance ratio: • Worst Case: x = 2j+ 1, the performance ratio is:
Teaching the Game • Encourage students to find by themselves the worst case for the powers of 2 strategy. • This example serves well in illustrating the strict notion of the worst case input. • The bad instance for the powers of 2 strategy is a very specific and rare number. (1, 2, 4, 8, 16, 17, 32)
Teaching the Game • If x is some random number, the powers of 2 strategy performs much better. • A good place to discuss the difference between random strategy and random inputs. • The input is sometimes not within our control, while the randomized algorithm is within our control as the designers of the algorithms.
A Deterministic Worst CaseLower Bound • Let > 0 be a small as desired constant. • We show that any deterministic strategy has examples with performance ratio at least 4 - • The powers of 2 is the optimal deterministic strategy.
A Randomized Strategy The following simple randomized strategy gives an improved expected value • Let R [0, 1) – randomly and uniformly chosen from interval [0, 1) • Define yj = exp( j + ) • Let i be so that exp( i - 1 + ) < x≤exp( i + ) • The expected performance ratio is:
EXAMPLE Player 1 Player 2 x is chosen, pleaseprovide a guess = 0.419 Smaller than x, next guess exp() = 1 Smaller than x, next guess exp(+1) = 4 Smaller than x, next guess exp(+2) = 11 Smaller than x, next guess exp(+3) =30 STOP! x = 48 exp(+4) = 83 Performance Ratio 129 / 48 = 2.6875
EXAMPLE Player 1 Player 2 x is chosen, pleaseprovide a guess = 0.866 Smaller than x, next guess exp() = 2 Smaller than x, next guess exp(+1) = 6 Smaller than x, next guess exp(+2) = 17 Smaller than x, next guess exp(+3) =47 STOP! x = 63 exp(+4) = 129 Performance Ratio 201 / 63 = 3.190476
EXAMPLE Player 1 Player 2 x is chosen, pleaseprovide a guess = 0.195 Smaller than x, next guess exp() = 1 Smaller than x, next guess exp(+1) = 3 STOP! x = 6 exp(+2) = 8 Performance Ratio 12 / 6 = 2.0000
A Deterministic Worst CaseLower Bound Intuitive Explanation The second player cannot choose yi+1 to be “too large” compared to yi If after may times the second player makes such choices x = yi + 1 << yi+1 (yi+1 +yi)/x is already a large number For some yj y1+y2+…yj-1 >> yj The second player always selects yi+1 to be not much larger than yi The choice x = yjis bad for second player, since (y1+y2+….+yj)/yj is large
Summary • Simple game illustrating the power of randomization. • The full analysis of the game is presented in “Teaching the Power of Randomization Using Simple Game”, SIGCSE 2006 (Y. Kortsarts, J. Rufinus) • Teaching the game: • Introduction to Computer Science I and II • Design and Analysis of Algorithms • Undergraduate Research Projects
Summary • The game is well-motivated from the point of view of modern scheduling research • Even though this specific game seems not to have been studied before, the techniques illustrated here have been used in a series of papers on approximating scheduling problems [11, 7, 8, 4]. These papers study the fast scheduling of conflicting jobs with the goal of minimizing the sum of finish times of these jobs. Hence, the suggested game is at the heart of modern research.
Advanced Algorithms in Introductory CS Curriculum • Las Vegas - always gives the correct solution. • Monte Carlo - may sometimes produce an incorrect solution • How (and why) to Introduce Monte Carlo Randomized Algorithms Into a Basic Algorithms Course?, Y. Kortsarts, J. Rufinus, Journal of Computing Sciences in Colleges, December 2005 • Integrating a real-world scheduling problem into the basic algorithms course, Yana Kortsarts, Journal of Computing Sciences in Colleges, June 2007
Advanced Algorithms in Introductory CS Curriculum • Merkle-Hellman Knapsack Cryptosystem [31] • Elegant and beautiful underlying mathematics • Due to its simple structure, the knapsack cryptosystem is an ideal model for introducing algorithmic techniques and a concept of Public Key cryptosystem to computer science students • Sequence Alignment [32, 33] • Needleman and Wunsch Algorithm (Global Alignment) • Smith-Waterman Algorithm (Local Alignment)
Knapsack Cryptosystem in Computer Science Curriculum Cryptology Design and Analysis of Algorithms Introduction to Computer Science Concept of Public Key Cryptosystem • Knapsack Problem • Subset-Sum Problem • Algorithmic Techniques • Concept of Public Key • Cryptosystem • Computational Problems: • Prime Numbers • GCD, Euclidian Algorithm • Modular Exponentiation • Primitive Roots for Primes Undergraduate Student Research Projects
Sequence Alignment • Global Alignment: compare two sequences in their entirety; the gap penalty is assessed regardless of whether gaps are located internally within a sequence, or at the end of one or both sequences. • The Needleman and Wunsch Algorithm. • Local Alignment: find best matching subsequences within the two search sequences. • The Smith-Waterman Algorithm.
REFERENCES [1] S. Arora, C. Lund, R. Motwani, M. Sudan and M. Szegedy. Proof verication and the hardness of approximation problems. Journal of ACM, 45(3):501-555, 1998. [2] G. J. Brebner and L. G. Valiant, Universal schemes for parallel communication. Proceedings of the thirteenth annual ACM symposium on Theory of computing, Pages: 263 - 277, 1981 [3] T. H. Cormen, C. E. Leiserson, and R. L. Rivest. Introduction to algorithms. The MIT Press, 2nd edition, 2001. [4] S. Chakrabarti, C. A. Phillips, A. S. Schulz, D. B. Shmoys, C. Stein and J. Wein. Improved scheduling algorithms for-minsum criteria. ICALP '96, 875-886. [5] A. Fiat, R. M. Karp, M. Luby, L. A. McGeoch, D. D. Sleator, and N. E. Young, Competitive paging algorithms. Journal of Algorithms archive Volume 12(4): 685 - 699 1991 [6] O. Goldreich, S. Micali, and A. Wigderson. Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems. Journal of the ACM, 38(3):690 - 728, 1991
REFERENCES [7] L. A. Hall, D. B. Shmoys, and J. Wein. Scheduling to minimize average completion time: O-line and on-line algorithms. SODA'96, 142-151. 42-151, Jan 1996. [8] L. A. Hall, A. Schulz, D. B. Shmoys, and J. Wein. Scheduling to minimize average completion time: O-line nd on-line approximation algorithms. Math. Operations Research 22:513-544, 1997. [9] G. Kalai, A subexponential randomized simplex algorithm, Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, 475 - 482, 1992 [10] R. M. Karp, E. Upfal and A. Wigderson. Constructing a perfect matching is in random NC. Combinatorica Volume 6(1):35-48, 1986 [11] M. Queyranne, M. Sviridenko. Approximation algorithms for shop scheduling problems with minsum objective. J. Scheduling 5:287-305, 2002. [12] R. L. Rivest, A. Shamir, L. M. Adleman, A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Commun. ACM 21(2):120-126, 1978
REFERENCES [13] N. Alon and R Yuster and U Zwick. Color-coding Journal of the ACM, 42(4):844 - 856 [14] A. Bjorklund, T. Husfeldt and S. Khanna. Approximating Longest Directed Path. Symposium on Automata, Languages and Programming (ICALP) 2004, to appear. [15] D. Dor, U. Zwick, Selecting the Median, SIAM J. Comput, 28(5): 1722-1758, 1999. [16] D. Dor and U. Zwick, Median Selection Requires (2+epsilon)n Comparisons, SIAM Journal on Discrete Mathematics, 14(3):312-325
REFERENCES [17] R. W. Floyd and R. L. Rivest Expected time bounds for selection Communications of the ACM, 18(3):165 - 172, 1975. [18] T. Feder, R. Motwani, C. Subi. Finding long paths and cycles in sparse Hamiltonian graphs Proceedings of the ACM symposium on Theory of computing, pages 524 - 529, 1999 [19] H. Gabow, Finding paths and cycles of superpolylogarithmic size. Proceedings of the ACM symposium on Theory of computing, pages 407-416, 2004. [20] M. T. Goodrich and R. Tamassia. Using randomization in the teaching of data structures and algorithms, The proceedings of the thirtieth SIGCSE technical symposium on Computer science education, 53 - 57, 1999 [21] D. Karger, R. Motwani, and G.D.S. Ramkumar. On Approximating the Longest Path in a Graph. Algorithmica 18 (1997): 82-98.
REFERENCES [22] R. M. Karp. Reducibility among combinatorial problems, R. E. Miller and J. W. Thatcher, eds., Complexity of Computer Computations, Plenum Press, New York, 1972, pp. 85-103. [23] M. O. Rabin Probabilistic algorithm for testing primality, J. Number Theory, 12, 128-138, 1980. [24] N. Robertson and P. Seymour, Graph minors. II. Algorithmic aspects of tree-width. J. Algorithms 7, 1986. [25] R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge University Press, 1995 [26] R.M. Karp, An Introduction to randomized algorithms, Discrete Applied Mathematics, 34: 165-201, 1991
REFERENCES [27] D.R.Karger, Global min-cuts in RNC, and other ramifications of a simple min-cut algorithm, In Proceedings of the 4th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 21- 30, 1993. [28] M. J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill, 2004 [29] Y. Kortsarts, J. Rufinus, Teaching the Power of Randomization Using Simple Game, SIGCSE 2006 [30] Y. Kortsarts, J. Rufinus, How (and why) to Introduce Monte Carlo Randomized Algorithms Into a Basic Algorithms Course?,Journal of Computing Sciences in Colleges, 2005
REFERENCES [31] R. C. Merkle, M. E. Hellman, Hiding Information and Signatures in Trapdoor Knapsacks, IEEE Transactions on Information Theory, vol. IT-24, 1978, pp. 525-530. [32] An Introduction to Bioinformatics Algorithms, N.C. Jones and P. A. Pevzner, The MIT Press, 2004 [33] Fundamental Concepts of Bioinformatics, D. E. Krane and M . L. Raymer, Publisher: Benjamin Cummings, 2002