CS201: PART 1 Data Structures & Algorithms S. Kondakcı
Analysis of Algorithms (Input → Algorithm → Output) • An algorithm is a step-by-step procedure for solving a problem in a finite amount of time. • Theoretical Analysis of Algorithms: • Uses a high-level description of the algorithm instead of an implementation • Characterizes running time as a function of the input size n • Takes into account all possible inputs • Allows us to evaluate the speed of any design independent of its implementation.
Program Efficiency Program efficiency is a measure of the amount of resources required to produce the desired results. Efficiency Aspects: 1) What are the important resources we should try to optimize? 2) Where are the important efficiency gains to be made? 3) How important is efficiency in the first place?
Efficiency Today • User Efficiency. The amount of time and effort users will spend to learn how to use the program, how to prepare the data, how to configure and customize the program, and how to interpret and use the output. • Maintenance Efficiency. The amount of time and effort the maintenance group will spend reading a program and its technical documentation in order to understand it well enough to make any necessary modifications. • Algorithmic Complexity. The inherent efficiency of the method itself, regardless of which machine we run it on or how we code it. • Coding Efficiency. This is the traditional efficiency measure. Here we are concerned with how much processor time and memory space a computer program requires to produce the desired results. Coding efficiency is the key step towards optimal usage of machine resources.
Programmer’s Duty • Programmers should keep these goals in mind for every program: • Correct, robust, and reliable. • Easy to use for its intended end-user group. • Easy to understand and easy to modify. • Portable. • Consistent in its Input/Output behavior. • Supplied with user documentation.
Optimization • Optimization on CPU-Time: Consider a network security assessment tool as a real-time application. The application works like a security scanner protocol designed to audit, monitor, and correct all aspects of network security. Real-time processing of the intercepted network packets containing inspection information requires fast data processing. In addition, such a process should generate some auditing information. • Optimization on Memory: Developing programs that must fit into the limited memory space available on your systems is often quite demanding. Kernel-level processing of the network packets requires kernel memory optimization and a powerful, failsafe memory management capability. • Providing Run-time Continuity: Extensive machine-level optimization is a major requirement for continuously running programs, such as the security scanner daemons. • Reliability and Correctness: One of the inevitable efficiency requirements is absolute reliability. The second important efficiency factor is correctness. That is, your program should do exactly what it is supposed to do. Choosing and implementing a reliable inspection methodology should be done with precision. • Optimization on Programmer’s Time: How efficiently a programmer works depends on the choice of team policy and development tool selection.
Coding Efficiency: Unstructured Code /Efficient Programming/S. Kondakci-1999
Coding Efficiency: Structured Code /Efficient Programming/S. Kondakci-1999
Protecting Against Run-time Errors • Illegal pointer operations. • Array subscript out of bounds. • Endless loops may cause the stack to grow into the heap area. • Presentational errors, such as network byte order, number conversions, division by zero, undefined results, e.g., tan(90°) = undefined. • Trying to write over the kernel’s text area, or the data area. • Referencing objects declared as prototype but not defined. • Performing operations on a pointer pointing at NULL. • Operating system weaknesses.
Assertions A general pitfall: making assumptions that turn out not to be justified. Most mistakes arise from simply misunderstanding the interaction between various pieces of code. The assertion rule states that you should always declare, boldly and explicitly, that there are things you have not yet covered clearly enough. Any assumptions you make in writing your programs should be documented somewhere in the code itself, particularly if you know or expect the assumption to be false in other environments.
Does the Machine Understand Your Assumptions? Remember that those assumptions are yours: they must be presented to the machine by whatever means your code provides. The machine cannot check your assumptions on its own. This is simply a matter of including explicit checks in your code, even for things that "cannot happen":
    if (p == NULL)
        panic("Driver routine: p is NULL\n");
    if (p->p_flags & BUSY)
        /* Safe to continue */
        … <etcetera>
or, using an assertion macro:
    ASSERT(p != NULL);
    if (p->p_flags & BUSY)
        /* Safe to continue */
        … <etcetera>
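The explicit check on the slide can also be written with the standard C `assert` macro. The following is a minimal sketch, not the driver code the slide refers to: `struct proc`, `BUSY`, and `is_busy` are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define BUSY 0x01u  /* hypothetical flag bit, for illustration only */

struct proc {       /* hypothetical stand-in for the driver's structure */
    unsigned p_flags;
};

/* Returns 1 if the BUSY flag is set in p->p_flags, 0 otherwise.
 * The assumption "p is never NULL" is stated to the machine, not only
 * to the reader; compiling with -DNDEBUG removes the check. */
int is_busy(const struct proc *p)
{
    assert(p != NULL);
    return (p->p_flags & BUSY) != 0;
}
```

Because `assert` disappears when `NDEBUG` is defined, checks that must survive in production builds are better written as ordinary `if` tests, as in the first form on the slide.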
Guidelines for the implementation • Protect input parameters using call-by-value. • Avoid global variables and functions with side effects. • Make all temporary variables local to the functions where they are used. • Never halt or sleep in a function. Spawn a dedicated function if necessary. • Avoid producing output within a function unless the sole purpose of the function is output. • Where appropriate, use return values to return the status of function calls. • Avoid confusing programming tricks. • Always strive for simplicity and clarity. Never sacrifice clarity of expression for cleverness of expression. • Keep any assertions local to your code. • Never sacrifice clarity of expression for minor reductions in execution time.
Debugging and Tracing The preprocessor allows you to incorporate many debugging aids in your module, for instance, the driver module. Later, in the production version, these debugging aids can be removed. Each trace macro tests one bit of the debugging mask, so the test must use the bitwise & operator (with the logical && every trace would fire whenever debugging is non-zero):
    #ifdef DEBUG
    #define TRACE_OPEN  (debugging & 0x01)
    #define TRACE_CLOSE (debugging & 0x02)
    #define TRACE_READ  (debugging & 0x04)
    #define TRACE_WRITE (debugging & 0x08)
    int debugging = -1; /* enable all trace output */
    #else
    #define TRACE_OPEN  0
    #define TRACE_CLOSE 0
    #define TRACE_READ  0
    #define TRACE_WRITE 0
    #endif
    ...
Tracing: Later in the Program Later in the code, the trace information can be output in a manner similar to this:
    if (TRACE_READ)
        printf("Device driver read, Packet number (%d) \n", pack_no);
    … <etcetera> …
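The two tracing slides can be combined into one small compilable sketch. It assumes that each trace macro is meant to test a single bit of the `debugging` mask; `driver_read` is a hypothetical routine, and `DEBUG` is defined in the file only to keep the example self-contained (normally it would come from the compiler command line).

```c
#include <assert.h>
#include <stdio.h>

#define DEBUG 1  /* assumption: usually supplied as -DDEBUG when compiling */

#ifdef DEBUG
static int debugging = 0x04;            /* enable only the READ trace */
#define TRACE_READ  (debugging & 0x04)
#define TRACE_WRITE (debugging & 0x08)
#else
#define TRACE_READ  0
#define TRACE_WRITE 0
#endif

/* Hypothetical read routine: prints a trace line when the READ trace
 * bit is set and reports whether it did so. */
int driver_read(int pack_no)
{
    assert(pack_no >= 0);
    if (TRACE_READ) {
        printf("Device driver read, packet number (%d)\n", pack_no);
        return 1;
    }
    return 0;
}
```

When `DEBUG` is not defined the macros expand to the constant 0, so a compiler can drop the trace branches entirely from the production build.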
Checking Programs With lint (Unix) The lint utility is intended to verify some facets of a C program, such as its potential portability. lint derives from the idea of picking the “fluff” out of a C program. It does this by advising on C constructs (including functions) and usage which might turn out to be ‘bugs’, portability problems, inconsistent declarations, bad function and argument types, or dead code. See the manual section lint(1) for further explanations.
Now, Lint’ing
    $ lint -hxa mytest.c
    (8) warning: loop not entered at top
    (8) warning: constant in conditional context
    variable unused in function
        (3) z in main
    implicitly declared to return int
        (10) printf
    declaration unused in block
        (5) duble
    function returns value, which is always ignored
        printf
Test Coverage Analysis Yet another tool for execution tracing and analysis of programs is tcov; it traces and analyzes source code to report test coverage. tcov does this by analysing the source code step-by-step. The extra code is generated by giving the -xa option to the compiler command, i.e.,
    $ cc -xa -o src src.c
The -xa option invokes a runtime recording mechanism that creates a .d file for every .c file. The .d file accumulates execution data for the corresponding source file. The tcov utility can then be run on the source file to generate statistics about the program. The following example source file, getmygid.c, is analysed as:
    $ cc -xa -o getmygid getmygid.c
    $ tcov -a getmygid.c
    $ ls -l getmy???*
    -rwxr-xr-x 1 staff 25120 Feb 11 12:07 getmygid
    -rw------- 1 staff   519 Sep  9  1994 getmygid.c
    -rw-r--r-- 1 staff     9 Feb 11 12:07 getmygid.d
    -rw-r--r-- 1 staff  1025 Feb 11 12:08 getmygid.tcov
Example: getmygid.c
    $ cat getmygid.c
    #include <stdio.h>
    char *msg = "I am sorry I cannot tell you everything" ;

    int gid,egid;
    int uid,euid, pid ,ppid, i;
    int main()
    {
        gid = getgid();
        if (gid >= 0) printf("1- My GID is: %d\n", gid);
        egid = getegid();
        if (egid >=0 ) printf("2- My EGID is: %d\n", egid);
        uid = getuid();
        if ( uid >=0) printf("3- My uid is: %d\n", uid);
        euid = geteuid();
        if (euid >= 0) printf("4- My Euid is: %d\n", euid);
        pid = getpid();
        if ( pid >=0 ) printf("5- My pid is: %d\n", pid);
        ppid = getppid();
        if ( ppid >= 0) printf("6- My ppid is: %d\n", ppid);
        prt_msg("We came to end!!!");
        return 0;
        prt_msg(msg);
    }
    prt_msg(char *mesg){
        printf("%s \n", mesg);
    }
Tcov’ing getmygid.c
    $ cat getmygid.tcov
    ##### -> #include <stdio.h>
    ##### -> char *msg = "I am sorry I cannot tell you everything" ;
    ##### ->
    ##### -> int gid,egid;
    ##### -> int uid,euid, pid ,ppid, i;
    ##### -> int main()
    ##### -> {
        2 -> gid = getgid();
        2 -> if (gid >= 0) printf("1- My GID is: %d\n", gid);
        2 -> egid = getegid();
        2 -> if (egid >=0 ) printf("2- My EGID is: %d\n", egid);
        2 -> uid = getuid();
        2 -> if ( uid >=0) printf("3- My uid is: %d\n", uid);
        2 -> euid = geteuid();
        2 -> if (euid >= 0) printf("4- My Euid is: %d\n", euid);
        2 -> pid = getpid();
        2 -> if ( pid >=0 ) printf("5- My pid is: %d\n", pid);
        2 -> ppid = getppid();
        2 -> if ( ppid >= 0) printf("6- My ppid is: %d\n", ppid);
        2 -> prt_msg("We came to end!!!");
        2 -> return 0;
        2 -> prt_msg(msg);
        2 -> }
        2 -> prt_msg(mesg)
        2 -> char *mesg;
        2 -> {
        2 -> printf("%s \n", mesg);
        2 -> }
Tcov’ing getmygid.c As shown, tcov(1) generates an annotated listing of the source file (getmygid.tcov), where each line is prefixed with a number indicating how many times the statements on that line were executed. Finally, per-line and per-block statistics are shown.
    Top 10 Blocks
    Line   Count
       9       2
      11       2
      13       2
      15       2
      17       2
      19       2
      21       2
      29       2

         8   Basic blocks in this file
         8   Basic blocks executed
    100.00   Percent of the file executed
        16   Total basic block executions
      2.00   Average executions per basic block
Have a nice break!
Analysis of Algorithms (Input → Algorithm → Output) An algorithm is a step-by-step procedure for solving a problem in a finite amount of time.
Running Time • Most algorithms transform input objects into output objects. • The running time of an algorithm typically grows with the input size. • Average case time is often difficult to determine. • We focus on the worst case running time. • Easier to analyze • Crucial to applications such as games, finance and robotics
Experimental Studies • Write a program implementing the algorithm • Run the program with inputs of varying size and composition • Use a function, like the built-in clock() function, to get an accurate measure of the actual running time • Plot the results
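The measurement step can be sketched with the standard `clock()` function from `<time.h>`. The workload `work` and the wrapper `time_work` are hypothetical names chosen for this example; any algorithm under study could take the place of `work`.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical workload: sums 0..n-1. The volatile qualifier keeps the
 * compiler from optimizing the loop away. */
static unsigned long work(size_t n)
{
    volatile unsigned long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += i;
    return sum;
}

/* Returns the processor time, in seconds, spent in one call of work(n). */
double time_work(size_t n)
{
    clock_t start = clock();
    (void)work(n);
    clock_t end = clock();
    assert(start != (clock_t)-1 && end != (clock_t)-1);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_work` for a range of input sizes and plotting the results gives the experimental curve. Since `clock()` reports processor time at limited resolution, small inputs usually need to be repeated many times to obtain a measurable value.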
Limitations of Experiments • It is necessary to implement the algorithm, which may be difficult • Results may not be indicative of the running time on other inputs not included in the experiment. • In order to compare two algorithms, the same hardware and software environments must be used
Theoretical Analysis • Uses a high-level description of the algorithm instead of an implementation • Characterizes running time as a function of the input size, n. • Takes into account all possible inputs • Allows us to evaluate the speed of an algorithm independent of the hardware/software environment
Pseudocode • High-level description of an algorithm • More structured than English prose • Less detailed than a program • Preferred notation for describing algorithms • Hides program design issues
Example: find max element of an array
    Algorithm arrayMax(A, n)
        Input array A of n integers
        Output maximum element of A
        currentMax ← A[0]
        for i ← 1 to n − 1 do
            if A[i] > currentMax then
                currentMax ← A[i]
        return currentMax
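For comparison, the arrayMax pseudocode translates almost line by line into C. This is one possible rendering; the `assert` precondition (a non-empty array) is an addition that the pseudocode leaves implicit.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the maximum element of A[0..n-1]; requires n >= 1. */
int arrayMax(const int A[], size_t n)
{
    assert(A != NULL && n >= 1);
    int currentMax = A[0];
    for (size_t i = 1; i < n; i++) {
        if (A[i] > currentMax)
            currentMax = A[i];
    }
    return currentMax;
}
```

The C version shows what the pseudocode hides: element types, the array-length parameter type, and the handling of an empty input.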
Pseudocode Details • Control flow: if … then … [else …]; while … do …; repeat … until …; for … do …; indentation replaces braces • Method declaration: Algorithm method(arg [, arg…]) Input … Output … • Method/Function call: method(arg [, arg…]) • Return value: return expression • Expressions: ← Assignment (like = in C++); = Equality testing (like == in C++); n² Superscripts and other mathematical formatting allowed
The Random Access Machine (RAM) Model • A CPU • A potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character • Memory cells are numbered (0, 1, 2, …) and accessing any cell in memory takes unit time.
Primitive Operations • Basic computations performed by an algorithm • Identifiable in pseudocode • Largely independent from the programming language • Exact definition not important • Assumed to take a constant amount of time in the RAM model • Examples: • Evaluating an expression • Assigning a value to a variable • Indexing into an array • Calling a method • Returning from a method
Counting Primitive Operations By inspecting the pseudocode, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size.
    Algorithm arrayMax(A, n)               # operations
        currentMax ← A[0]                  2
        for i ← 1 to n − 1 do              2 + n
            if A[i] > currentMax then      2(n − 1)
                currentMax ← A[i]          2(n − 1)
            { increment counter i }        2(n − 1)
        return currentMax                  1
                                    Total  7n − 1
Estimating Running Time Algorithm arrayMax executes $7n - 1$ primitive operations in the worst case. Define: a = time taken by the fastest primitive operation, b = time taken by the slowest primitive operation. Let $T(n)$ be the worst-case time of arrayMax. Then $a(7n - 1) \le T(n) \le b(7n - 1)$. Hence, the running time $T(n)$ is bounded by two linear functions.
Growth Rate of Running Time • Changing the hardware/software environment • Affects T(n) by a constant factor, but • Does not alter the growth rate of T(n) • The linear growth rate of the running time T(n) is an intrinsic property of algorithm arrayMax
Growth Rates • Growth rates of functions: • Linear: $n$ • Quadratic: $n^2$ • Cubic: $n^3$ • In a log-log chart, the slope of the line corresponds to the growth rate of the function
Constant Factors • The growth rate is not affected by • constant factors or • lower-order terms • Examples • $10^2 n + 10^5$ is a linear function • $10^5 n^2 + 10^8 n$ is a quadratic function
Big-Oh Notation • Given functions $f(n)$ and $g(n)$, we say that $f(n)$ is $O(g(n))$ if there are positive constants $c$ and $n_0$ such that $f(n) \le c\,g(n)$ for $n \ge n_0$ • Example: $2n + 10$ is $O(n)$ • $2n + 10 \le cn$ • $(c - 2)n \ge 10$ • $n \ge 10/(c - 2)$ • Pick $c = 3$ and $n_0 = 10$
Big-Oh Example • Example: the function $n^2$ is not $O(n)$ • $n^2 \le cn$ • $n \le c$ • The above inequality cannot be satisfied for all large $n$, since $c$ must be a constant
More Big-Oh Examples • $7n - 2$ is $O(n)$: need $c > 0$ and $n_0 \ge 1$ such that $7n - 2 \le c\,n$ for $n \ge n_0$; this is true for $c = 7$ and $n_0 = 1$ • $3n^3 + 20n^2 + 5$ is $O(n^3)$: need $c > 0$ and $n_0 \ge 1$ such that $3n^3 + 20n^2 + 5 \le c\,n^3$ for $n \ge n_0$; this is true for $c = 4$ and $n_0 = 21$ • $3 \log n + \log \log n$ is $O(\log n)$: need $c > 0$ and $n_0 \ge 1$ such that $3 \log n + \log \log n \le c \log n$ for $n \ge n_0$; this is true for $c = 4$ and $n_0 = 2$
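The constants in the cubic example can be checked numerically. The sketch below tests the inequality 3n³ + 20n² + 5 ≤ 4n³ for a given n; `bound_holds` is a hypothetical helper name, and a finite check like this only illustrates the choice of n0, it does not replace the algebraic argument.

```c
#include <assert.h>

/* Returns 1 if 3n^3 + 20n^2 + 5 <= 4n^3 holds for this n, else 0. */
int bound_holds(unsigned long long n)
{
    assert(n <= 100000ULL);  /* keeps 4n^3 well inside 64 bits */
    unsigned long long f = 3 * n * n * n + 20 * n * n + 5;
    unsigned long long g = 4 * n * n * n;
    return f <= g;
}
```

With these definitions `bound_holds(20)` is 0 (32005 > 32000) and `bound_holds(21)` is 1 (36608 ≤ 37044), which is why n0 = 21 is the smallest threshold that works with c = 4.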
Big-Oh and Growth Rate • The big-Oh notation gives an upper bound on the growth rate of a function • The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n) • We can use the big-Oh notation to rank functions according to their growth rate
Big-Oh Rules • If $f(n)$ is a polynomial of degree $d$, then $f(n)$ is $O(n^d)$, i.e., • Drop lower-order terms • Drop constant factors • Use the smallest possible class of functions • Say “$2n$ is $O(n)$” instead of “$2n$ is $O(n^2)$” • Use the simplest expression of the class • Say “$3n + 5$ is $O(n)$” instead of “$3n + 5$ is $O(3n)$”
Asymptotic Algorithm Analysis • The asymptotic analysis of an algorithm determines the running time in big-Oh notation • To perform the asymptotic analysis • We find the worst-case number of primitive operations executed as a function of the input size • We express this function with big-Oh notation • Example: • We determine that algorithm arrayMax executes at most $7n - 1$ primitive operations • We say that algorithm arrayMax “runs in $O(n)$ time” • Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations
Computing Prefix Averages • We further illustrate asymptotic analysis with two algorithms for prefix averages • The i-th prefix average of an array X is the average of the first (i + 1) elements of X: $A[i] = (X[0] + X[1] + \dots + X[i])/(i + 1)$
Prefix Averages (Quadratic) • The following algorithm computes prefix averages in quadratic time by applying the definition
    Algorithm prefixAverages1(X, n)
        Input array X of n integers
        Output array A of prefix averages of X     # operations
        A ← new array of n integers                n
        for i ← 0 to n − 1 do                      n
            s ← X[0]                               n
            for j ← 1 to i do                      1 + 2 + … + (n − 1)
                s ← s + X[j]                       1 + 2 + … + (n − 1)
            A[i] ← s / (i + 1)                     n
        return A                                   1
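A direct C rendering of prefixAverages1 might look as follows. Two choices here are assumptions of this sketch rather than part of the slide: the caller supplies the output array instead of the function allocating it, and the averages are stored as `double` so that fractional values are not truncated.

```c
#include <assert.h>
#include <stddef.h>

/* Fills A[i] with the average of X[0..i], recomputing each sum from
 * scratch. The inner loop runs 0 + 1 + ... + (n - 1) times in total,
 * which gives the quadratic running time. */
void prefixAverages1(const int X[], double A[], size_t n)
{
    assert(n == 0 || (X != NULL && A != NULL));
    for (size_t i = 0; i < n; i++) {
        double s = X[0];
        for (size_t j = 1; j <= i; j++)
            s += X[j];
        A[i] = s / (double)(i + 1);
    }
}
```

For X = {2, 4, 6, 8} the function stores 2, 3, 4, 5 in A.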
Arithmetic Progression • The running time of prefixAverages1 is $O(1 + 2 + \dots + n)$ • The sum of the first n integers is $n(n + 1)/2$ • There is a simple visual proof of this fact • Thus, algorithm prefixAverages1 runs in $O(n^2)$ time
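The closed form can also be derived algebraically by writing the sum forwards and backwards and adding the two rows term by term; the symbol $S$ is introduced here only for this derivation.

```latex
\begin{align*}
S  &= 1 + 2 + \dots + (n-1) + n \\
S  &= n + (n-1) + \dots + 2 + 1 \\
2S &= \underbrace{(n+1) + (n+1) + \dots + (n+1)}_{n \text{ terms}} = n(n+1) \\
S  &= \frac{n(n+1)}{2} \le n^2 \quad \text{for } n \ge 1
\end{align*}
```

Since $S \le n^2$ for all $n \ge 1$, the definition of big-Oh is satisfied with $c = 1$ and $n_0 = 1$, so $1 + 2 + \dots + n$ is $O(n^2)$.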
Prefix Averages (Linear) • The following algorithm computes prefix averages in linear time by keeping a running sum
    Algorithm prefixAverages2(X, n)
        Input array X of n integers
        Output array A of prefix averages of X     # operations
        A ← new array of n integers                n
        s ← 0                                      1
        for i ← 0 to n − 1 do                      n
            s ← s + X[i]                           n
            A[i] ← s / (i + 1)                     n
        return A                                   1
• Algorithm prefixAverages2 runs in $O(n)$ time
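The running-sum version can be sketched in C in the same way. As an assumption of this sketch, the output array is supplied by the caller and holds `double` values rather than the integers named on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Fills A[i] with the average of X[0..i] using a running sum, so each
 * element of X is added exactly once: linear time. */
void prefixAverages2(const int X[], double A[], size_t n)
{
    assert(n == 0 || (X != NULL && A != NULL));
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        s += X[i];
        A[i] = s / (double)(i + 1);
    }
}
```

The only change from the quadratic version is that the sum `s` survives from one iteration to the next instead of being rebuilt, which removes the inner loop.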
Computing Spans • We show how to use a stack as an auxiliary data structure in an algorithm • Given an array X, the span S[i] of X[i] is the maximum number of consecutive elements X[j] immediately preceding X[i] and such that $X[j] \le X[i]$ • Spans have applications to financial analysis • E.g., stock at 52-week high
Quadratic Algorithm
    Algorithm spans1(X, n)
        Input array X of n integers
        Output array S of spans of X               # operations
        S ← new array of n integers                n
        for i ← 0 to n − 1 do                      n
            s ← 1                                  n
            while s ≤ i ∧ X[i − s] ≤ X[i]          1 + 2 + … + (n − 1)
                s ← s + 1                          1 + 2 + … + (n − 1)
            S[i] ← s                               n
        return S                                   1
• Algorithm spans1 runs in $O(n^2)$ time
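A C sketch of spans1 is given below. It follows the pseudocode, in which the count starts at 1 and so includes X[i] itself; passing the output array in from the caller is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stores in S[i] the span of X[i]: starting from 1, the count grows
 * while the element s positions back is <= X[i]. Worst case O(n^2),
 * reached for example when X is sorted in non-decreasing order. */
void spans1(const int X[], size_t S[], size_t n)
{
    assert(n == 0 || (X != NULL && S != NULL));
    for (size_t i = 0; i < n; i++) {
        size_t s = 1;
        while (s <= i && X[i - s] <= X[i])
            s++;
        S[i] = s;
    }
}
```

For X = {6, 3, 4, 5, 2} the function stores 1, 1, 2, 3, 1 in S.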
Have a nice break!
Recursion Recursion = a function calls itself, a number of times that is not known in advance. We call this a recursive call. An iterative loop:
    for (i = 1 ; i <= n-1; i++)
        sum = sum +1;
A recursive function:
    int sum(int n)
    {
        if (n <= 1)
            return 1;
        else
            return (n + sum(n-1));
    }
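To see the recursive `sum` at work, the sketch below places it next to a loop that computes the same value. `sum_iter` is a hypothetical name introduced for the comparison, and the upper limit in the `assert` is an arbitrary choice that keeps both the recursion depth and the result small.

```c
#include <assert.h>

/* Recursive sum 1 + 2 + ... + n, as on the slide; returns 1 for n <= 1. */
int sum(int n)
{
    assert(n <= 10000);  /* arbitrary limit on depth and result size */
    if (n <= 1)
        return 1;
    return n + sum(n - 1);
}

/* The same value computed with a loop instead of recursive calls. */
int sum_iter(int n)
{
    int total = 1;
    for (int i = 2; i <= n; i++)
        total += i;
    return total;
}
```

A call such as `sum(4)` unfolds as 4 + sum(3) = 4 + 3 + sum(2) = 4 + 3 + 2 + sum(1) = 10, so the recursion makes n calls in total and runs in O(n) time, like the loop.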
Recursive function
    int f( int x )
    {
        if( x == 0 )
            return 0;
        else
            return 2 * f( x - 1 ) + x * x;
    }
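The function f can be traced by hand: f(0) = 0, f(1) = 2·0 + 1 = 1, f(2) = 2·1 + 4 = 6, f(3) = 2·6 + 9 = 21. The sketch below repeats the function with one addition that is not on the slide: an `assert` on the argument range, because a negative x never reaches the base case and a large x would overflow `int` (the limit of 20 is an arbitrary choice).

```c
#include <assert.h>

/* f(0) = 0, f(x) = 2 * f(x - 1) + x * x for x > 0. */
int f(int x)
{
    assert(x >= 0 && x <= 20);  /* guard added for this sketch */
    if (x == 0)
        return 0;
    return 2 * f(x - 1) + x * x;
}
```

Each call makes exactly one further call with x − 1, so computing f(x) takes x + 1 calls and O(x) time.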