Algorithm Analysis
An Overview • The purpose of this section is to develop some competencies in analyzing algorithms -- that is, given any two (or more) algorithms, is there a way to determine which would be more efficient? • The value of this capability should be obvious – you could select algorithms based on performance metrics.
The Tool Set • We have seen, over and over, that each level of complexity in our theoretical study has utilized a toolset • Algorithm analysis is no different • Here, we will be using mathematics to help us quantify performance requirements
The Toolset Refresher • As we move forward in our analysis, a bit of a refresher might be helpful • Our goal is to mathematically describe the performance attributes of a given algorithm • We will think of an algorithm as being composed of lines of programming code • Now, from our discussion of the von Neumann model of computing operation, we know that each line of code is translated into many “lines” of machine code instructions • Each instruction, in turn, takes some finite amount of time to be processed
Toolset • And, from our unit on programming, we learned that control structures can be created without knowing in advance how many times they will be executed… • How, then, will it be possible to figure out how long a given algorithm will take to execute?
The Toolset • This will be possible because we will conduct our analysis at a level of abstraction • We will develop a notation set that allows us to symbolically represent various computing operations
The Constant, C • The first piece of notation we will develop will allow us to recognize that processing speed and instruction set translation will differ from machine to machine, and from instruction to instruction • So… we will say something like this: • For a typical line of code, let’s say an assignment statement such as x = 1, the time this line of code takes to execute is some constant, C • This is admittedly fudging… can we get away with it?
Fudging, Justified • As it turns out, yes we can. We can isolate the contribution of processor speed by running identical programs on various processors and recording the time differences • And the instruction set, by design, executes so rapidly that translation differences are typically negligible • We will see that a much more significant driver of how long a given algorithm takes to execute, other things being equal, is the number of times re-processing occurs
How We Can Get Started • So… let’s start by assigning a typical line of code in an algorithm a processing time of some constant, C • We don’t know exactly how long each C will take, but we can safely fudge and say that they will all be of the same order of magnitude
Analysis Example 1 • Consider this algorithm:
    Begin
    x = 1
    y = 2
    z = x + y
    Print z
    End
• About how long does this algorithm take to execute? About 6C • How did we come up with this number? It’s this simple: every line of code in the above algorithm takes some constant time, C, to execute. There are 6 lines of code, and adding up 6 C’s gives us 6C!
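The counting rule above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, assuming (as the slide does) that every pseudocode line costs one unit of C; the names `ALGORITHM` and `cost_in_c` are made up for this sketch.

```python
# Hypothetical model: each pseudocode line costs one constant-time unit, C.
ALGORITHM = ["Begin", "x = 1", "y = 2", "z = x + y", "Print z", "End"]

def cost_in_c(lines):
    """Return the cost of straight-line code in units of C: one C per line."""
    return len(lines)

print(f"{cost_in_c(ALGORITHM)}C")  # prints 6C
```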
What’s the Big Deal? • What if we don’t know exactly how many lines of code get executed… • For example, we learned that while loops can be driven by sentinel variables, so let’s revisit some code that reflects that complexity
Analysis Example 2 • Consider this algorithm:
    Begin
    While x < n, do the following steps
        Y = 1
        Z = 2
        P = Y + Z
        Print P
    End the while
    End the program
Analysis Example 2 • Now, we can see that each line of code would take C amount of time to execute, and since there are 8 lines of code, we could safely say that if each line of code were executed only once, this code would take 8C to execute…. • But… we don’t know how many times the while loop will execute, because we don’t know the value of n (or x)
Toolset Refinement • Can we perform our same analysis trick as before? Let’s presume that our loop executes some number of times, N… Now we can say that the loop execution time is: • 6C * N • How did we get this? 6 lines of code (the while and end-while count, along with the 4 nested lines of code), each of which takes C time to execute, all of which execute N times • Overall algorithm execution time: 2C + 6CN (the 2C covers the Begin and End the program lines, which execute once)
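As a rough check on the 2C + 6CN figure, the sketch below tallies executed lines for Example 2 under the slide's own counting convention (the while and end-while lines are each charged once per pass, and the final failing loop test is ignored). The function name `count_steps` is an assumption made for this illustration.

```python
def count_steps(n_iterations):
    """Count executed pseudocode lines for Example 2, charging one C per line."""
    steps = 1                      # Begin
    for _ in range(n_iterations):
        steps += 1                 # While x < n
        steps += 4                 # Y = 1, Z = 2, P = Y + Z, Print P
        steps += 1                 # End the while
    steps += 1                     # End the program
    return steps

for n in (0, 1, 10, 1000):
    # The tally always matches the closed form 2 + 6N (in units of C).
    print(n, count_steps(n), 2 + 6 * n)
```

Notice that for large N the 2C term hardly matters; the 6CN term carries almost all of the cost.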
More Fun • What if we had an algorithm with nested control structures, such as a while within a while…. • The same toolset trick will apply, but now we would see an N^2 execution impact, because an internal loop processing of N would occur for each external loop processing of N….
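A while within a while can be sketched as follows. This assumes both loops run exactly N times, which is the simplest case; the name `nested_while_runs` is hypothetical.

```python
def nested_while_runs(n):
    """Count how often the innermost line runs for a while within a while."""
    runs = 0
    outer = 0
    while outer < n:          # external loop: N passes
        inner = 0
        while inner < n:      # internal loop: N passes per external pass
            runs += 1         # the innermost line of code
            inner += 1
        outer += 1
    return runs

for n in (1, 10, 100):
    print(n, nested_while_runs(n))
```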
C’s and N’s • Now, let’s look just at our toolset for a moment. • Remember, C is some constant number… while we don’t know what it is, it is constant • N, on the other hand, varies • In fact, we just saw that N can become squared • Consider the relative contributions to overall execution time from a constant, and a square-able number…. • That N can vary means that overall, the N term can have a much greater contribution to overall execution time than can C terms…
N Growth • Other kinds of algorithms and control structures have different impacts on N…. • For example, if you think about it a moment, N can represent the size of a dataset… you need a line of code to touch each element in your set • If you have to pass through your entire data set of length N, then your algorithm would be of complexity N • If you pass through twice, one pass after the other, that is 2N work, which is still of complexity N • If you pass through the entire set once for every element in it (a loop within a loop), then your algorithm would be of complexity N^2
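To make the counting concrete, here is a small sketch that tallies how many times elements get touched. The function names are invented for the illustration. Two passes made one after the other cost 2N touches, which still grows in step with N; it is one full pass per element that produces N times N touches.

```python
def one_pass(data):
    """Touch every element once: N touches."""
    touches = 0
    for _ in data:
        touches += 1
    return touches

def two_passes(data):
    """Two passes, one after the other: 2N touches."""
    return one_pass(data) + one_pass(data)

def pass_per_element(data):
    """One full pass for every element: N * N touches."""
    touches = 0
    for _ in data:
        touches += one_pass(data)
    return touches

data = list(range(1000))
print(one_pass(data), two_passes(data), pass_per_element(data))
# prints 1000 2000 1000000
```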
Complexity Analysis • We have been looking at ways to analyze algorithms for efficiency • We saw that, depending on the architecture of the algorithm, processing requirements vary greatly • These requirements can be mathematically analyzed by assigning a constant value (in our case, C) to individual lines of code, and variable amounts (in our case, N) to indefinite coding execution such as that seen in looping structures
Adding it Up • Once symbolic notation has been applied to the individual lines of code, simple algebraic rules can be utilized to determine overall efficiency values • The efficiency value is typically expressed as an order of complexity value, θ (theta)
Order of Magnitude • Let’s see how this works • As an example, consider a search algorithm called a sequential search, that finds a particular element within an unordered list • Let’s say there are N items in our list • In the worst case, we would have to look at every element in the list (presuming the item we are looking for is at the very end) • This algorithm would have an order of magnitude of θ(n)
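A minimal sketch of a sequential search follows, with a counter added so the worst case is visible. The comparison counter and the sample list are additions for illustration only.

```python
def sequential_search(items, target):
    """Return (index, comparisons); index is -1 if target is absent."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1
        if item == target:
            return index, comparisons
    return -1, comparisons

names = ["delta", "alpha", "echo", "bravo", "charlie"]
print(sequential_search(names, "delta"))    # (0, 1): best case, first element
print(sequential_search(names, "charlie"))  # (4, 5): worst case, last element
print(sequential_search(names, "zulu"))     # (-1, 5): not found, all N examined
```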
Another Search • As another example, consider the case of searching for an element when you can be guaranteed that the list has been ordered • The most efficient algorithm here models the childhood game of “Guess My Secret Number” • The way to win this game was to guess a number at the mid-point each time, thereby discarding half of the dataset with every guess
Binary Search • There is a formal algorithm that does this same thing, called a binary search • The binary search is often employed in text searches, in which letters are given numeric ASCII values, and numeric comparisons are performed at successive midpoints • Any algorithm that reduces the dataset by half with each successive pass has an order of complexity of θ(lg n)
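The midpoint strategy can be sketched as below. This is one common way to write a binary search, not necessarily the formulation used in the course text; the probe counter is added purely to show how few guesses are needed.

```python
def binary_search(sorted_items, target):
    """Return (index, probes); index is -1 if target is absent.

    Assumes sorted_items is in ascending order.
    """
    low, high = 0, len(sorted_items) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2          # guess the midpoint
        probes += 1
        if sorted_items[mid] == target:
            return mid, probes
        if sorted_items[mid] < target:
            low = mid + 1                # discard the lower half
        else:
            high = mid - 1               # discard the upper half
    return -1, probes

numbers = list(range(1024))
worst = max(binary_search(numbers, t)[1] for t in numbers)
print(worst)  # never more than 11 probes for 1024 items
```

With 1024 items a sequential search could need 1024 comparisons, while the binary search never needs more than lg 1024 + 1 = 11.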
Graphical Searches • There is one other form of search that we will consider: a graphical search • Imagine a binary tree structure, in which every point on the tree branches in two paths • A worst case search on such a structure, n levels deep, would be of order complexity 2^n; an exponential algorithm
Other Familiar Algorithms • We now have a growing list of analyzed algorithms • The point here is that, once an algorithm has been analyzed, it can be compared to other possible solutions…. • In this way, a programmer can choose between competing algorithms, based on performance metrics
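To see where the exponential growth comes from, the sketch below counts the points visited when every branch of such a tree must be explored. It assumes n means the number of branching levels below the starting point; that reading of n, and the function name `visit_all`, are assumptions made for this illustration.

```python
def visit_all(levels):
    """Count points visited when exhaustively walking a full binary tree.

    levels is the number of branching levels below the starting point.
    """
    if levels == 0:
        return 1                      # an end point: nothing left to branch into
    # This point, plus everything down the left path and the right path.
    return 1 + visit_all(levels - 1) + visit_all(levels - 1)

for levels in (1, 2, 3, 10):
    print(levels, visit_all(levels), 2 ** (levels + 1) - 1)
```

Each extra level roughly doubles the work, which is the signature of an exponential algorithm.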
Order of Magnitude Comparisons • From a mathematical view, we have seen the following orders of complexity: • lg n – binary search • n – sequential search • n^2 – selection sort • 2^n – exponential search • How do these compare in execution efficiencies?
Comparing Various Orders of Magnitude • Given sufficient size N, these algorithms show rankable growth rates… • In other words, as the data size N grows, 2^n work grows the fastest, then n^2, then n, and least of all, lg n • In fact, the 2^n algorithm growth is so steep that once the dataset surpasses 100 elements, execution time is no longer reasonable!
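A quick way to get a feel for these rankings is to print the four quantities side by side. The sketch below does that, and then estimates how long 2^100 steps would take on a machine that performs one billion steps per second; that machine speed is an assumption chosen only for illustration.

```python
import math

def growth_row(n):
    """Return (lg n, n, n^2, 2^n) for a dataset of size n."""
    return math.log2(n), n, n ** 2, 2 ** n

print(f"{'n':>5} {'lg n':>6} {'n^2':>8} {'2^n':>10}")
for n in (10, 20, 50, 100):
    lg, lin, sq, ex = growth_row(n)
    print(f"{lin:>5} {lg:>6.1f} {sq:>8} {ex:>10.2e}")

# Assumption for illustration only: one billion steps per second.
STEPS_PER_SECOND = 1e9
years = 2 ** 100 / STEPS_PER_SECOND / (365 * 24 * 3600)
print(f"2^100 steps would take about {years:.1e} years")
```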
Comparing Various Orders of Magnitude • Remember we said that these comparisons are true, given sufficient size of N • What this means is that, with very small data sets (around the origin, if we were plotting the curves), there are some crossovers in curve growth • The complexity growth curves are nicely illustrated in your Schneider and Gersting text on pages 105 and 106
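One such crossover can be found by brute force. The sketch below compares n^2 with 2^n for small n; the helper name `small_n_crossovers` is made up for this example.

```python
def small_n_crossovers(limit):
    """Return the sizes n in 1..limit where n^2 is at least as large as 2^n."""
    return [n for n in range(1, limit + 1) if n ** 2 >= 2 ** n]

print(small_n_crossovers(20))  # [2, 3, 4]
```

So for n between 2 and 4 the "slower growing" n^2 curve is level with or above 2^n, and only from n = 5 onward does the exponential curve pull ahead for good.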
Conclusion • By thinking abstractly, and utilizing a notational scheme, we can mathematically compare algorithms • This comparison allows the computer scientist to make design decisions that control for complexity and execution requirements • As more and more algorithms are defined and added to the “classic” solution set, the job of algorithm selection will become easier… • Just think, you will be able to hear a problem and think to yourself – I could solve that with a binary search algorithm, which will execute in lg n time…