CS 150: Analysis of Algorithms

Goals for this Unit • Begin a focus on data structures and algorithms • Understand the nature of the performance of algorithms • Understand how we measure performance • Used to describe performance of various data structures • Begin to see the role of algorithms in the study of Computer Science

Algorithm • An algorithm is a detailed step-by-step method for solving a problem • Computer algorithms • Properties of algorithms • Steps are precisely stated • Determinism: based on inputs, previous steps • Algorithm terminates • Also: correctness, generality, efficiency

This is an important CS term for this unit. Can you guess? C_M_L_X_T_ Choose the set of missing letters below (their order is jumbled). • R O U F O • U B I R A • O U M A J • I P E Y O

What makes an algorithm “better”? • Let’s look again at the Fibonacci example • Why does one version run much much slower than the other? • If recursion is this slow, why ever use it?

Why Not Just Time Algorithms? • We want a measure of work that gives us a direct measure of the efficiency of the algorithm • independent of computer, programming language, programmer, and other implementation details. • Usually depending on the size of the input • Also often dependent on the nature of the input • Best-case, worst-case, average

Analysis of Algorithms • Use mathematics as our tool for analyzing algorithm performance • Measure the algorithm itself, its nature • Not its implementation or its execution • Need to count something • Cost or number of steps is a function of input size n: e.g. for input size n, cost is f(n) • Count all steps in an algorithm? (Hopefully avoid this!)

Counting Operations • Strategy: choose one operation or one section of code such that • The total work is always roughly proportional to how often that’s done • So we’ll just count: • An algorithm’s “basic operation” • Or, an algorithm’s “critical section” • Sometimes the basic operation is some action that’s fundamentally central to how the algorithm works • Example: Searching a list for a target involves comparing each list-item to the target. • The comparison operation is “fundamental.”

Asymptotic Analysis • Given some formula f(n) for the count/cost of some thing based on the input size • We’re going to focus on its “order” • f(n) = 2^n ---> Exponential function • f(n) = 100n^2 + 50n + 7 ---> Quadratic function • f(n) = 30 n lg n – 10 ---> Log-linear function • f(n) = 1000n ---> Linear function • These functions grow at different rates • As inputs get larger, the amount they increase differs • “Asymptotic” – how do things change as the input size n gets larger?
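To make the different growth rates concrete, the four example cost functions can be tabulated for a few input sizes. This is a minimal sketch; the class name GrowthRates, the method names, and the chosen values of n are illustrative assumptions, not part of the course materials.

```java
// Hypothetical helper class: evaluates the four example cost functions from the slide.
class GrowthRates {
    static double exponential(int n) { return Math.pow(2, n); }               // 2^n
    static double quadratic(int n)   { return 100.0 * n * n + 50.0 * n + 7; } // 100n^2 + 50n + 7
    static double logLinear(int n)   { return 30.0 * n * lg(n) - 10; }        // 30 n lg n - 10
    static double linear(int n)      { return 1000.0 * n; }                   // 1000n

    // Base-2 logarithm, written "lg" on the slides.
    static double lg(double x) { return Math.log(x) / Math.log(2); }

    public static void main(String[] args) {
        System.out.printf("%4s %14s %14s %14s %10s%n",
                "n", "2^n", "100n^2+50n+7", "30n lg n-10", "1000n");
        for (int n : new int[] {1, 8, 16, 32, 64}) {
            System.out.printf("%4d %14.3e %14.0f %14.0f %10.0f%n",
                    n, exponential(n), quadratic(n), logLinear(n), linear(n));
        }
    }
}
```

For small n the constant factors dominate (at n = 8 the linear function is the largest of the three polynomial-or-smaller functions), but as n grows the ordering settles into exponential, then quadratic, then the slower-growing functions.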

Comparison of Growth Rates (figures on three slides, not reproduced here)

Order Classes • For a given algorithm, we count something: • f(n) = 100n^2 + 50n + 7 ---> Quadratic function • How different is this from f(n) = 20n^2 + 7n + 2? • For large inputs? • Order class: a “label” for all functions with the same highest-order term • Label form: O(n^2) or Θ(n^2) or a few others • “Big-Oh” used most often

Growth Notations • g ∈ O(f) (“Big-Oh”): g grows no faster than f (upper bound) • g ∈ Θ(f) (“Theta”): g grows as fast as f (tight bound) • g ∈ Ω(f) (“Omega”): g grows no slower than f (lower bound) Which one would we most like to know?

Meaning of O (“big Oh”) g is in O(f) iff: There are positive constants c and n0 such that g(n) ≤ c·f(n) for all n ≥ n0.

O Examples g is in O(f) iff there are positive constants c and n0 such that g(n) ≤ c·f(n) for all n ≥ n0. Is n in O(n^2)? Yes, c = 1 and n0 = 1 works. Is 10n in O(n)? Yes, c = 10 and n0 = 1 works. Is n^2 in O(n)? No: no matter what c we pick, n^2 > c·n for big enough n (n > c).
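The same definition can be applied to the quadratic cost function used as a running example in this unit, f(n) = 100n^2 + 50n + 7. The worked bound below is one possible choice of constants; many others work equally well.

```latex
\begin{align*}
100n^2 + 50n + 7 &\le 100n^2 + 50n^2 + 7n^2 && \text{for all } n \ge 1 \\
                 &= 157\,n^2 .
\end{align*}
```

So c = 157 and n0 = 1 satisfy the definition, and 100n^2 + 50n + 7 is in O(n^2). The lower-order terms and the constant factor only affect which c we pick, not the order class.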

Back to Order Classes • Order classes group “equivalently” efficient algorithms • O(1) – constant time! Input size doesn’t matter • O(lg n) – logarithmic time. Very efficient. E.g. binary search (after sorting) • O(n) – linear time • O(n lg n) – log-linear time. E.g. best sorting algorithms • O(n^2) – quadratic time. E.g. poorer sorting algorithms • O(n^3) – cubic time • …. • O(2^n) – exponential time. Many important problems, often about optimization.

Ω (“Omega”): Lower Bound g is in O(f) iff there are positive constants c and n0 such that g(n) ≤ c·f(n) for all n ≥ n0. g is in Ω(f) iff there are positive constants c and n0 such that g(n) ≥ c·f(n) for all n ≥ n0.

Example: Watch Code Run • Two implementations to calculate Fibonacci numbers: F(0) = F(1) = 1, F(n) = F(n-1) + F(n-2) for n > 1 • Both correct! • Let’s run both for n = 8, 16, 32, 64
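The two implementations are not included in this text, so the following is only a sketch of what such a pair might look like, using the definition F(0) = F(1) = 1. The class name FibDemo, the method names, and the call counter are assumptions added for illustration.

```java
// Hypothetical demo class; not the course's actual code.
class FibDemo {
    static long calls = 0; // number of times fibRecursive has been invoked

    // Direct translation of the recurrence F(n) = F(n-1) + F(n-2), with F(0) = F(1) = 1.
    static long fibRecursive(int n) {
        calls++;
        if (n <= 1) {
            return 1;
        }
        return fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    // Iterative version: one loop pass per step, remembering only the last two values.
    static long fibIterative(int n) {
        long prev = 1; // F(0)
        long curr = 1; // F(1)
        for (int i = 2; i <= n; i++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return curr;
    }

    public static void main(String[] args) {
        // n = 64 is left out for the recursive version: it would need on the
        // order of 10^13 calls, while the loop finishes it in 63 additions.
        for (int n : new int[] {8, 16, 32}) {
            calls = 0;
            long result = fibRecursive(n);
            System.out.println("n=" + n + " F(n)=" + result
                    + " recursive calls=" + calls
                    + " iterative=" + fibIterative(n));
        }
        System.out.println("n=64 iterative=" + fibIterative(64));
    }
}
```

The recursive version recomputes the same subproblems over and over, so its call count grows roughly as fast as F(n) itself, while the loop does a number of additions proportional to n.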

When Does this Matter? • Size of input matters a lot! • For small inputs, we care a lot less • But what’s a big input? • Hard to know. For some algorithms, smaller than you think!

Two Important Problems • Search • Given a list of items and a target value • Find if/where it is in the list • Return special value if not there (“sentinel” value) • Note we’ve specified this at an abstract level • Haven’t said how to implement this • Is this specification complete? Why not? • Sorting • Given a list of items • Re-arrange them in some non-decreasing order • With solutions to these two, we can do many useful things! • What to count for complexity analysis? For both, the basic operation is: comparing two list-elements

Sequential Search Algorithm • Sequential Search • AKA linear search • Look through the list until we reach the end or find the target • Best-case? Worst-case? • Order class: O(n) • Advantages: simple to code, no assumptions about list
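A minimal sketch of sequential search that also counts the basic operation (comparisons against the target). The class name, the static counter, and the use of -1 as the "not found" sentinel are assumptions made for this example.

```java
// Hypothetical demo class for counting comparisons in sequential search.
class SequentialSearchDemo {
    static int comparisons = 0; // comparisons made by the most recent search

    // Returns the index of the first occurrence of target, or -1 (sentinel) if absent.
    static int sequentialSearch(int[] items, int target) {
        comparisons = 0;
        for (int i = 0; i < items.length; i++) {
            comparisons++;               // one comparison of a list-item to the target
            if (items[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] items = {7, 3, 9, 1, 5};
        for (int target : new int[] {7, 5, 4}) {
            int index = sequentialSearch(items, target);
            System.out.println("target=" + target + " index=" + index
                    + " comparisons=" + comparisons);
        }
    }
}
```

The best case is a target in the first position (one comparison); the worst case is a target in the last position or not present at all (n comparisons), which is where the O(n) label comes from.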

Binary Search Algorithm • Binary Search • Input list must be sorted • Strategy: Eliminate about half of the remaining items with one comparison • Look in the middle • If target larger, must be in the 2nd half • If target smaller, must be in the 1st half • Complexity: O(lg n) • Must sort list first, but… • Much more efficient than sequential search • Especially if search is done many times (sort once, search many times) • Note: Java provides static binarySearch() methods in java.util.Arrays and java.util.Collections
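The strategy above can be sketched as follows. This version records every index it compares against the target, which makes it easy to trace by hand. The class name, the probes list, the choice of the lower middle element, and -1 as the "not found" value are all assumptions of this sketch; other implementations may pick the middle differently and so probe different indices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: binary search that records which indices it probes.
class BinarySearchTrace {
    // Returns the index of target in the sorted array, or -1 if it is absent.
    // Each index compared against the target is appended to probes.
    static int binarySearch(int[] sorted, int target, List<Integer> probes) {
        int lo = 0;
        int hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;   // lower middle; avoids overflow of lo + hi
            probes.add(mid);
            if (sorted[mid] == target) {
                return mid;
            } else if (sorted[mid] < target) {
                lo = mid + 1;               // target can only be in the 2nd half
            } else {
                hi = mid - 1;               // target can only be in the 1st half
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] sorted = {2, 3, 5, 7, 11, 13, 17};
        List<Integer> probes = new ArrayList<>();
        int index = binarySearch(sorted, 13, probes);
        System.out.println("index=" + index + " probed=" + probes);
    }
}
```

Note that java.util.Arrays.binarySearch reports a missing target differently: it returns a negative value that encodes the insertion point rather than a fixed -1.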

Class Activity • We’ll give you some binary search inputs • Array of int values, plus a target to find • You tell us • what index is returned, and • the sequence of index values that were compared to the target to get that answer • Work in twos or threes, and we’ll call on groups to explain

Binary Search Example #1 Input: -1 4 5 11 13 and target 4. Index returned is? Vote on indices (positions) compared to target-value: • -1 4 • 0 1 • 5 -1 4 • 2 0 1 • other

Binary Search Example #2 Input: -5 -2 -1 4 5 11 13 14 17 18 and target 3. Index returned is? Vote on indices compared below: • 5 2 3 4 • 4 1 2 3 • 4 2 3 4 • other

Binary Search Example #3 Input: -1 4 5 11 13 and target 13. Index returned is? Vote on indices compared below: • 0 1 2 3 4 • 2 4 • 2 3 4 • other

Note on last examples • Input size doubled from 5 to 10 • How did the worst-case number of comparisons change? • Also double? • Something else? • Note binary search is O(lg n) • What does this imply here?
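One way to explore this question is to count how many times a list of n items can be halved before nothing is left. The sketch below assumes each probe counts as one comparison against the target and that a probe leaves at most floor(n / 2) candidates; the class and method names are hypothetical.

```java
// Hypothetical helper: worst-case number of probes binary search makes on n sorted items.
class BinarySearchWorstCase {
    // Each probe discards the middle item and at least half of the rest,
    // so at most n / 2 (integer division) items remain after it.
    static int worstCaseProbes(int n) {
        int probes = 0;
        while (n > 0) {
            probes++;
            n /= 2;
        }
        return probes;
    }

    public static void main(String[] args) {
        for (int n : new int[] {5, 10, 20, 40, 1_000_000}) {
            System.out.println("n=" + n + " worst-case probes=" + worstCaseProbes(n));
        }
    }
}
```

Doubling the input size adds at most one probe to the worst case, which is what logarithmic growth means in practice.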

How to Sort? • Many sorting algorithms have been found! • Problem is a case-study in algorithm design • Some “straightforward” sorts • Insertion Sort, Selection Sort, Bubble Sort • O(n^2) • More efficient sorts • Quicksort, mergesort, heapsort • O(n lg n) • Note: these are for sorting in memory, not on disk
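As an illustration of one of the "straightforward" sorts, here is a sketch of insertion sort that returns the number of element comparisons it makes. The class name and the comparison counter are assumptions added for this example.

```java
import java.util.Arrays;

// Hypothetical demo class: insertion sort instrumented to count comparisons.
class InsertionSortCount {
    // Sorts a in place into non-decreasing order and returns the number of
    // comparisons made between list elements.
    static long insertionSort(int[] a) {
        long comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];            // next item to insert into the sorted prefix a[0..i-1]
            int j = i - 1;
            while (j >= 0) {
                comparisons++;         // compare key with a[j]
                if (a[j] <= key) {
                    break;             // found the insertion point
                }
                a[j + 1] = a[j];       // shift the larger item one slot right
                j--;
            }
            a[j + 1] = key;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] reversed = {5, 4, 3, 2, 1};
        int[] alreadySorted = {1, 2, 3, 4, 5};
        System.out.println("reversed: comparisons=" + insertionSort(reversed)
                + " result=" + Arrays.toString(reversed));
        System.out.println("sorted:   comparisons=" + insertionSort(alreadySorted)
                + " result=" + Arrays.toString(alreadySorted));
    }
}
```

On reverse-sorted input every item is compared with everything before it, giving n(n-1)/2 comparisons (the quadratic worst case); on already-sorted input only n-1 comparisons are needed.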

Reminder Slide: Order Classes • For a given algorithm, we count something: • f(n) = 100n^2 + 50n + 7 ---> Quadratic function • How different is this from f(n) = 20n^2 + 7n + 2? • For large inputs? • Order class: a “label” for all functions with the same highest-order term • Label form: O(n^2) or Θ(n^2) or a few others • “Big-Oh” used most often

Order Classes Details • What does the label mean? O(n^2) • Set of all functions that grow at the same rate as n^2 or more slowly • I.e. as efficient as any “n^2” or more efficient, but no worse • So this is an upper-bound on how inefficient an algorithm can be • Usage: We might say: Algorithm A is O(n^2) • Means Algorithm A’s efficiency grows like a quadratic algorithm or grows more slowly. (As good or better) • What about that other label, Θ(n^2)? • Set of all functions that grow at exactly the same rate • A more precise bound

Reminders: Input and Performance • Input matters: • First, we said for small size of inputs, often no difference • Also, often focus on the worst-case input • Why? • Sometimes interested in the average-case input • Why? • Also the best-case, but do we care?

Reminder: What to Count • Often count some “basic operation” • Or, we count a “critical section” • Examples: • The block of code most deeply nested in a nested set of loops • An operation like comparison in sorting • An expensive operation like multiplication or database query

Back to Order Classes • Order classes group “equivalently” efficient algorithms • O(1) – constant time! Input size doesn’t matter • O(lg n) – logarithmic time. Very efficient. E.g. binary search (after sorting) • O(n) – linear time • O(n lg n) – log-linear time. E.g. best sorting algorithms • O(n^2) – quadratic time. E.g. poorer sorting algorithms • O(n^3) – cubic time • …. • O(2^n) – exponential time. Many important problems, often about optimization.

Discussion Question • Binary search is faster than sequential search • But extra cost! Must sort the list first! • When do you think it’s worth it?
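One rough way to reason about this trade-off is a simple cost model. The sketch below assumes a worst-case sequential search costs n comparisons, an O(n lg n) sort costs about n lg n comparisons, and each binary search costs about lg n; all constant factors are ignored, so the numbers are only indicative. The class and method names are hypothetical.

```java
// Hypothetical cost model: k sequential searches vs. sorting once and then doing k binary searches.
class SortThenSearchCost {
    // Base-2 logarithm.
    static double lg(double x) { return Math.log(x) / Math.log(2); }

    // Assumed cost of k worst-case sequential searches on n items.
    static double sequentialCost(long n, long k) { return (double) k * n; }

    // Assumed cost of one n lg n sort followed by k binary searches.
    static double sortThenBinaryCost(long n, long k) { return n * lg(n) + k * lg(n); }

    // Smallest number of searches for which sorting first is cheaper under this model (n >= 1).
    static long breakEvenSearches(long n) {
        long k = 1;
        while (sortThenBinaryCost(n, k) >= sequentialCost(n, k)) {
            k++;
        }
        return k;
    }

    public static void main(String[] args) {
        for (long n : new long[] {16, 1024, 1_000_000}) {
            System.out.println("n=" + n + " break-even searches=" + breakEvenSearches(n));
        }
    }
}
```

Under these assumptions a single search never justifies sorting first, but the break-even point grows only about as fast as lg n, so sorting pays off once the list is searched more than a handful of times.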

Bigger Issues in Complexity • Many practical problems seem to require exponential time • Θ(2^n) • This grows much more quickly than any polynomial • Such problems are called intractable problems • A famous subset of these are called NP-complete problems • Most famous theoretical question in CS: • Is it really impossible to find a polynomial-time algorithm for the NP-complete problems? (Does P=NP?)

Example • Weighted-graph problems • Find the cheapest way to connect things? • O(n^2) • Find the shortest path between two points? All pairs of points? • O(n^2) or better • But, find the shortest path that visits all points in the graph? • Best we can do is exponential, Θ(2^n) or worse • Is it impossible to do better? No one knows!

Summary and Major Points • When we measure algorithm complexity: • Base this on size of input • Count some basic operation or how often a critical section is executed • Get a formula f(n) for this • Then we think about it in terms of its “label”, the order class O( f(n) ) • “Big-Oh” means as efficient as “class f(n)” or better • We usually use order-class to compare algorithms • We can measure worst-case, average-case