Comparison of Citation Discovery Methods

Comparison of Citation Discovery Methods. Using AMS, AGU and IEEE journal search tools: AMS 2, EASI 7, IEEE 2, Wiley 4, Elsevier 7 (cannot confirm), total 22. Advantages: can specify a month range in the search; can search the full text of a document; can time-order the search results.


Presentation Transcript


  1. CS201: PART 1 Data Structures & Algorithms S. Kondakcı

  2. Analysis of Algorithms [Figure: Input → Algorithm → Output] • An algorithm is a step-by-step procedure for solving a problem in a finite amount of time. • Theoretical analysis of algorithms: • Uses a high-level description of the algorithm instead of an implementation • Characterizes running time as a function of the input size n • Takes into account all possible inputs • Allows us to evaluate the speed of any design independent of its implementation

  3. Program Efficiency • Program efficiency is a measure of the amount of resources required to produce the desired results. • Efficiency aspects: 1) What are the important resources we should try to optimize? 2) Where are the important efficiency gains to be made? 3) How important is efficiency in the first place?

  4. Efficiency Today • User efficiency: The amount of time and effort users spend learning how to use the program, how to prepare the data, how to configure and customize the program, and how to interpret and use the output. • Maintenance efficiency: The amount of time and effort the maintenance group spends reading a program and its technical documentation in order to understand it well enough to make any necessary modifications. • Algorithmic complexity: The inherent efficiency of the method itself, regardless of which machine we run it on or how we code it. • Coding efficiency: This is the traditional efficiency measure. Here we are concerned with how much processor time and memory space a computer program requires to produce the desired results. Coding efficiency is the key step towards optimal usage of machine resources.

  5. Programmer’s Duty • Programmers should keep the following in mind. A program should be: • Correct, robust, and reliable. • Easy to use for its intended end-user group. • Easy to understand and easy to modify. • Portable. • Consistent in its input/output behavior. • Accompanied by user documentation.

  6. Optimization • Optimization on CPU time: Consider a network security assessment tool as a real-time application. The application works like a security scanner protocol designed to audit, monitor, and correct all aspects of network security. Real-time processing of intercepted network packets containing inspection information requires fast data processing. In addition, such a process should generate auditing information. • Optimization on memory: Developing programs that do not fit into the memory space available on your systems is often quite demanding. Kernel-level processing of network packets requires kernel memory optimization and powerful, failsafe memory management. • Providing run-time continuity: Extensive machine-level optimization is a major requirement for continuously running programs, such as security scanner daemons. • Reliability and correctness: One unavoidable efficiency requirement is absolute reliability. The second important factor is correctness: your program should do exactly what it is supposed to do. Choosing and implementing a reliable inspection methodology should be done with precision. • Optimization on programmer’s time: How efficiently a programmer works depends on the team policy and the choice of development tools.

  7. Coding Efficiency: Unstructured Code [Figure from Efficient Programming, S. Kondakci, 1999]

  8. Coding Efficiency: Structured Code [Figure from Efficient Programming, S. Kondakci, 1999]

  9. Protecting Against Run-time Errors • Illegal pointer operations. • Array subscripts out of bounds. • Endless loops, which may cause the stack to grow into the heap area. • Representation errors, such as wrong network byte order, faulty number conversions, division by zero, and undefined results, e.g., tan(90°) is undefined. • Trying to write over the kernel’s text area or data area. • Referencing objects that are declared as prototypes but never defined. • Performing operations on a pointer that points at NULL. • Operating system weaknesses.

  10. Assertions • A general pitfall: making assumptions that turn out not to be justified. Most mistakes arise from simply misunderstanding the interaction between various pieces of code. • The assertion rule states that you should always state boldly and forcefully the things that you have not yet covered clearly enough. • Any assumptions you make in writing your programs should be documented somewhere in the code itself, particularly if you know or expect the assumption to be false in other environments.

  11. Does the Machine Understand Your Assumptions? Remember that those assumptions are yours: they must be presented to the machine by whatever means you provide in your code. The machine will not be able to check your assumptions by itself. This is simply a matter of including explicit checks in your code, even for things that "cannot happen":

       if (p == NULL)
           panic("Driver routine: p is NULL\n");
       if (p->p_flags & BUSY)   /* Safe to continue */
           … <etcetera>

     or, using an assertion macro:

       ASSERT(p != NULL);
       if (p->p_flags & BUSY)   /* Safe to continue */
           … <etcetera>
       …
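The explicit check on this slide can also be expressed with the standard `assert` macro from `<assert.h>`. The following is a minimal sketch, not the driver code from the slides: `struct proc`, the value of `BUSY`, and the function `proc_is_busy` are hypothetical names chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit and structure, chosen only for illustration. */
#define BUSY 0x01u

struct proc {
    unsigned p_flags;
};

/* Returns 1 if the BUSY flag is set, 0 otherwise.
 * The assumption "p is never NULL" is written into the code itself,
 * so a violation aborts with file and line instead of failing later. */
int proc_is_busy(const struct proc *p)
{
    assert(p != NULL);                /* documents and checks the assumption */
    return (p->p_flags & BUSY) != 0;  /* safe to dereference p here */
}
```

Compiling with `-DNDEBUG` removes the `assert` check, so this style suits assumptions that should hold by construction; checks that must survive in production need an explicit `if` as in the first variant on the slide.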

  12. Guidelines for the Implementation • Protect input parameters using call-by-value. • Avoid global variables and functions with side effects. • Make all temporary variables local to the functions where they are used. • Never halt or sleep in a function. Spawn a dedicated function if necessary. • Avoid producing output within a function unless the sole purpose of the function is output. • Where appropriate, use return values to return the status of function calls. • Avoid confusing programming tricks. • Always strive for simplicity and clarity. Never sacrifice clarity of expression for cleverness of expression. • Keep any assertions local to your code. • Never sacrifice clarity of expression for minor reductions in execution time.

  13. Debugging and Tracing Making use of the preprocessor allows you to incorporate many debugging aids in your module, for instance the driver module. Later, in the production version, these debugging aids can be removed.

       #ifdef DEBUG
       #define TRACE_OPEN  (debugging & 0x01)
       #define TRACE_CLOSE (debugging & 0x02)
       #define TRACE_READ  (debugging & 0x04)
       #define TRACE_WRITE (debugging & 0x08)
       int debugging = -1;   /* enable all trace output */
       #else
       #define TRACE_OPEN  0
       #define TRACE_CLOSE 0
       #define TRACE_READ  0
       #define TRACE_WRITE 0
       #endif
       ...

  14. Tracing: Later in the Program Later in the code, the trace information can be printed in a manner similar to this:

       if (TRACE_READ)
           printf("Device driver read, Packet number (%d) \n", pack_no);
       … <etcetera> …
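The idea behind the trace macros is one bit per trace class in a single mask variable. The sketch below illustrates that idea with plain functions; the `*_BIT` names and the functions `trace_enabled` and `trace_read` are hypothetical and not part of the slides. Selecting an individual class requires the bitwise `&`; a logical `&&` would report every class as enabled whenever the mask is nonzero.

```c
#include <stdio.h>

/* Hypothetical bit assignments, one bit per trace class. */
#define TRACE_OPEN_BIT  0x01u
#define TRACE_CLOSE_BIT 0x02u
#define TRACE_READ_BIT  0x04u
#define TRACE_WRITE_BIT 0x08u

/* Returns nonzero only if the given trace class is enabled in mask. */
int trace_enabled(unsigned mask, unsigned bit)
{
    return (mask & bit) != 0;   /* bitwise AND selects a single flag */
}

/* Prints a read trace line when read tracing is on.
 * Returns 1 if a line was printed, 0 otherwise. */
int trace_read(unsigned mask, int pack_no)
{
    if (!trace_enabled(mask, TRACE_READ_BIT))
        return 0;
    printf("Device driver read, Packet number (%d)\n", pack_no);
    return 1;
}
```

With a mask of `TRACE_READ_BIT | TRACE_WRITE_BIT`, read and write traces are printed while open and close traces stay silent.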

  15. Checking Programs With lint (Unix) The lint utility is intended to verify some facets of a C program, such as its potential portability. The name lint derives from the idea of picking the "fluff" out of a C program. It does this by advising on C constructs (including functions) and usage that might turn out to be bugs, portability problems, inconsistent declarations, bad function and argument types, or dead code. See the manual section lint(1) for further explanations.

  16. Now, Lint’ing

       $ lint -hxa mytest.c
       (8) warning: loop not entered at top
       (8) warning: constant in conditional context
       variable unused in function
           (3) z in main
       implicitly declared to return int
           (10) printf
       declaration unused in block
           (5) duble
       function returns value, which is always ignored
           printf

  17. Test Coverage Analysis Another tool for execution tracing and analysis of programs is tcov, which can be used to trace and analyze source code and report test coverage. tcov does this by analysing the source code step by step. The extra code is generated by giving the -xa option to the compiler command, i.e.,

       $ cc -xa -o src src.c

     The -xa option invokes a runtime recording mechanism that creates a .d file for every .c file. The .d file accumulates execution data for the corresponding source file. The tcov utility can then be run on the source file to generate statistics about the program. The following example source file, getmygid.c, is analysed as:

       $ cc -xa -o getmygid getmygid.c
       $ tcov -a getmygid.c
       $ ls -l getmy???*
       -rwxr-xr-x 1 staff 25120 Feb 11 12:07 getmygid
       -rw------- 1 staff   519 Sep  9  1994 getmygid.c
       -rw-r--r-- 1 staff     9 Feb 11 12:07 getmygid.d
       -rw-r--r-- 1 staff  1025 Feb 11 12:08 getmygid.tcov

  18. Example: getmygid.c

       $ cat getmygid.c
       #include <stdio.h>
       char *msg = "I am sorry I cannot tell you everything" ;

       int gid,egid;
       int uid,euid, pid ,ppid, i;
       int main()
       {
         gid = getgid();
         if (gid >= 0) printf("1- My GID is: %d\n", gid);
         egid = getegid();
         if (egid >=0 ) printf("2- My EGID is: %d\n", egid);
         uid = getuid();
         if ( uid >=0) printf("3- My uid is: %d\n", uid);
         euid = geteuid();
         if (euid >= 0) printf("4- My Euid is: %d\n", euid);
         pid = getpid();
         if ( pid >=0 ) printf("5- My pid is: %d\n", pid);
         ppid = getppid();
         if ( ppid >= 0) printf("6- My ppid is: %d\n", ppid);
         prt_msg("We came to end!!!");
         return 0;
         prt_msg(msg);
       }
       prt_msg(char *mesg){
         printf("%s \n", mesg);
       }

  19. Tcov’ing getmygid.c

       $ cat getmygid.tcov
       ##### -> #include <stdio.h>
       ##### -> char *msg = "I am sorry I cannot tell you everything" ;
       ##### ->
       ##### -> int gid,egid;
       ##### -> int uid,euid, pid ,ppid, i;
       ##### -> int main()
       ##### -> {
           2 -> gid = getgid();
           2 -> if (gid >= 0) printf("1- My GID is: %d\n", gid);
           2 -> egid = getegid();
           2 -> if (egid >=0 ) printf("2- My EGID is: %d\n", egid);
           2 -> uid = getuid();
           2 -> if ( uid >=0) printf("3- My uid is: %d\n", uid);
           2 -> euid = geteuid();
           2 -> if (euid >= 0) printf("4- My Euid is: %d\n", euid);
           2 -> pid = getpid();
           2 -> if ( pid >=0 ) printf("5- My pid is: %d\n", pid);
           2 -> ppid = getppid();
           2 -> if ( ppid >= 0) printf("6- My ppid is: %d\n", ppid);
           2 -> prt_msg("We came to end!!!");
           2 -> return 0;
           2 -> prt_msg(msg);
           2 -> }
           2 -> prt_msg(mesg)
           2 -> char *mesg;
           2 -> {
           2 -> printf("%s \n", mesg);
           2 -> }

  20. Tcov’ing getmygid.c As shown, tcov(1) generates an annotated listing of the source file (getmygid.tcov), in which each line is prefixed with the number of times the statements on that line were executed. Finally, per-line and per-block statistics are shown:

            Top 10 Blocks

            Line   Count
               9       2
              11       2
              13       2
              15       2
              17       2
              19       2
              21       2
              29       2

            8   Basic blocks in this file
            8   Basic blocks executed
       100.00   Percent of the file executed
           16   Total basic block executions
         2.00   Average executions per basic block

  21. Have a nice break!

  22. Analysis of Algorithms [Figure: Input → Algorithm → Output] An algorithm is a step-by-step procedure for solving a problem in a finite amount of time.

  23. Running Time • Most algorithms transform input objects into output objects. • The running time of an algorithm typically grows with the input size. • Average-case time is often difficult to determine. • We focus on the worst-case running time, which is: • Easier to analyze • Crucial to applications such as games, finance and robotics

  24. Experimental Studies • Write a program implementing the algorithm • Run the program with inputs of varying size and composition • Use a function such as the standard library’s clock(), declared in <time.h>, to measure the actual running time • Plot the results
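The measuring step of such an experiment can be sketched in C as follows. This is one possible setup under stated assumptions, not a procedure prescribed by the slides: the function names `array_max` and `time_array_max` are hypothetical, the input is pseudo-random, and the measured work is a simple maximum search.

```c
#include <stdlib.h>
#include <time.h>

/* Work being measured: maximum of n integers (assumes n >= 1). */
static int array_max(const int *a, size_t n)
{
    int current_max = a[0];
    for (size_t i = 1; i < n; i++)
        if (a[i] > current_max)
            current_max = a[i];
    return current_max;
}

/* Returns the CPU time in seconds used by one array_max call on n
 * pseudo-random integers, or -1.0 if n is 0 or allocation fails. */
double time_array_max(size_t n)
{
    if (n == 0)
        return -1.0;
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        a[i] = rand();

    clock_t start = clock();
    volatile int sink = array_max(a, n);  /* volatile keeps the call from being optimized away */
    clock_t stop = clock();
    (void)sink;

    free(a);
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_array_max` for a series of growing sizes and plotting the results gives the experimental curve. Because `clock()` reports processor time with limited resolution, small inputs may measure as zero, so in practice the call is usually repeated many times per size.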

  25. Limitations of Experiments • It is necessary to implement the algorithm, which may be difficult • Results may not be indicative of the running time on other inputs not included in the experiment • In order to compare two algorithms, the same hardware and software environments must be used

  26. Theoretical Analysis • Uses a high-level description of the algorithm instead of an implementation • Characterizes running time as a function of the input size, n • Takes into account all possible inputs • Allows us to evaluate the speed of an algorithm independent of the hardware/software environment

  27. Pseudocode • High-level description of an algorithm • More structured than English prose • Less detailed than a program • Preferred notation for describing algorithms • Hides program design issues • Example: find the maximum element of an array

       Algorithm arrayMax(A, n)
         Input: array A of n integers
         Output: maximum element of A
         currentMax ← A[0]
         for i ← 1 to n − 1 do
           if A[i] > currentMax then
             currentMax ← A[i]
         return currentMax
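For comparison with the pseudocode, here is one way arrayMax might be written in C. It is a direct transcription under the assumption that the array holds at least one element; the name `array_max` and the use of `size_t` for the length are choices made for this sketch.

```c
#include <stddef.h>

/* Returns the maximum element of a[0..n-1]. Assumes n >= 1. */
int array_max(const int *a, size_t n)
{
    int current_max = a[0];            /* currentMax <- A[0] */
    for (size_t i = 1; i < n; i++) {   /* for i <- 1 to n - 1 */
        if (a[i] > current_max)
            current_max = a[i];
    }
    return current_max;
}
```

The C version has to settle details the pseudocode leaves out, such as the index type and what happens for an empty array, which is the sense in which pseudocode hides program design issues.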

  28. Pseudocode Details • Control flow: if … then … [else …]; while … do …; repeat … until …; for … do …; indentation replaces braces • Method declaration: Algorithm method(arg [, arg…]) Input … Output … • Method/function call: method(arg [, arg…]) • Return value: return expression • Expressions: ← assignment (like = in C++); = equality testing (like == in C++); n² superscripts and other mathematical formatting allowed

  29. The Random Access Machine (RAM) Model • A CPU • A potentially unbounded bank of memory cells, each of which can hold an arbitrary number or character [Figure: memory cells numbered 0, 1, 2, …] • Memory cells are numbered, and accessing any cell in memory takes unit time.

  30. Primitive Operations • Basic computations performed by an algorithm • Identifiable in pseudocode • Largely independent from the programming language • Exact definition not important • Assumed to take a constant amount of time in the RAM model • Examples: evaluating an expression; assigning a value to a variable; indexing into an array; calling a method; returning from a method

  31. Counting Primitive Operations By inspecting the pseudocode, we can determine the maximum number of primitive operations executed by an algorithm, as a function of the input size.

       Algorithm arrayMax(A, n)              # operations
         currentMax ← A[0]                   2
         for i ← 1 to n − 1 do               2 + n
           if A[i] > currentMax then         2(n − 1)
             currentMax ← A[i]               2(n − 1)
           { increment counter i }           2(n − 1)
         return currentMax                   1
                                     Total   7n − 1

  32. Estimating Running Time • Algorithm arrayMax executes 7n − 1 primitive operations in the worst case. • Define: a = time taken by the fastest primitive operation; b = time taken by the slowest primitive operation. • Let T(n) be the worst-case time of arrayMax. Then a(7n − 1) ≤ T(n) ≤ b(7n − 1). • Hence, the running time T(n) is bounded by two linear functions.

  33. Growth Rate of Running Time • Changing the hardware/software environment: • Affects T(n) by a constant factor, but • Does not alter the growth rate of T(n) • The linear growth rate of the running time T(n) is an intrinsic property of algorithm arrayMax

  34. Growth Rates • Growth rates of functions: • Linear ≈ n • Quadratic ≈ n² • Cubic ≈ n³ • In a log-log chart, the slope of the line corresponds to the growth rate of the function
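The log-log statement follows from taking logarithms of a power function. As a short worked step, assume a function of the form f(n) = c·n^k, where the constants c > 0 and k are symbols introduced here for illustration:

```latex
\log f(n) = \log\left(c\,n^{k}\right) = \log c + k \log n
```

Plotted against log n, this is a straight line with slope k, so a linear function has slope 1, a quadratic function slope 2, and a cubic function slope 3. The constant c only shifts the line vertically.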

  35. Constant Factors • The growth rate is not affected by • constant factors or • lower-order terms • Examples: • 10²n + 10⁵ is a linear function • 10⁵n² + 10⁸n is a quadratic function

  36. Big-Oh Notation • Given functions f(n) and g(n), we say that f(n) is O(g(n)) if there are positive constants c and n₀ such that f(n) ≤ c·g(n) for n ≥ n₀ • Example: 2n + 10 is O(n) • 2n + 10 ≤ cn • (c − 2)n ≥ 10 • n ≥ 10/(c − 2) • Pick c = 3 and n₀ = 10

  37. Big-Oh Example • Example: the function n² is not O(n) • n² ≤ cn • n ≤ c • The above inequality cannot be satisfied for all large n, since c must be a constant

  38. More Big-Oh Examples • 7n − 2 is O(n): need c > 0 and n₀ ≥ 1 such that 7n − 2 ≤ c·n for n ≥ n₀; this is true for c = 7 and n₀ = 1 • 3n³ + 20n² + 5 is O(n³): need c > 0 and n₀ ≥ 1 such that 3n³ + 20n² + 5 ≤ c·n³ for n ≥ n₀; this is true for c = 4 and n₀ = 21 • 3 log n + log log n is O(log n): need c > 0 and n₀ ≥ 1 such that 3 log n + log log n ≤ c·log n for n ≥ n₀; this is true for c = 4 and n₀ = 2

  39. Big-Oh and Growth Rate • The big-Oh notation gives an upper bound on the growth rate of a function • The statement “f(n) is O(g(n))” means that the growth rate of f(n) is no more than the growth rate of g(n) • We can use the big-Oh notation to rank functions according to their growth rate

  40. Big-Oh Rules • If f(n) is a polynomial of degree d, then f(n) is O(nᵈ), i.e., • Drop lower-order terms • Drop constant factors • Use the smallest possible class of functions • Say “2n is O(n)” instead of “2n is O(n²)” • Use the simplest expression of the class • Say “3n + 5 is O(n)” instead of “3n + 5 is O(3n)”

  41. Asymptotic Algorithm Analysis • The asymptotic analysis of an algorithm determines the running time in big-Oh notation • To perform the asymptotic analysis: • We find the worst-case number of primitive operations executed as a function of the input size • We express this function with big-Oh notation • Example: • We determine that algorithm arrayMax executes at most 7n − 1 primitive operations • We say that algorithm arrayMax “runs in O(n) time” • Since constant factors and lower-order terms are eventually dropped anyhow, we can disregard them when counting primitive operations

  42. Computing Prefix Averages • We further illustrate asymptotic analysis with two algorithms for prefix averages • The i-th prefix average of an array X is the average of the first (i + 1) elements of X: A[i] = (X[0] + X[1] + … + X[i]) / (i + 1)

  43. Prefix Averages (Quadratic) • The following algorithm computes prefix averages in quadratic time by applying the definition

       Algorithm prefixAverages1(X, n)
         Input: array X of n integers
         Output: array A of prefix averages of X     # operations
         A ← new array of n integers                 n
         for i ← 0 to n − 1 do                       n
           s ← X[0]                                  n
           for j ← 1 to i do                         1 + 2 + … + (n − 1)
             s ← s + X[j]                            1 + 2 + … + (n − 1)
           A[i] ← s / (i + 1)                        n
         return A                                    1
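A C rendering of prefixAverages1 might look like the sketch below. Two choices here are assumptions of the sketch rather than part of the slide: the caller supplies the output array, and the averages are stored as `double` so that fractional results are not truncated.

```c
#include <stddef.h>

/* Fills a[i] with the average of x[0..i] for every i in 0..n-1.
 * Quadratic time: the inner loop re-adds the whole prefix for each i. */
void prefix_averages1(const int *x, double *a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        double s = x[0];                 /* s <- X[0] */
        for (size_t j = 1; j <= i; j++)  /* for j <- 1 to i */
            s += x[j];
        a[i] = s / (double)(i + 1);      /* A[i] <- s / (i + 1) */
    }
}
```

For the input 2, 4, 6, 8 the prefix sums are 2, 6, 12, 20, so the function stores the averages 2, 3, 4, 5.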

  44. Arithmetic Progression • The running time of prefixAverages1 is O(1 + 2 + … + n) • The sum of the first n integers is n(n + 1)/2 • There is a simple visual proof of this fact • Thus, algorithm prefixAverages1 runs in O(n²) time
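The closed form can also be checked algebraically. One standard argument, given here as a supplement to the visual proof the slide mentions, writes the sum S forwards and backwards and adds the two rows term by term:

```latex
\begin{aligned}
S  &= 1 + 2 + \cdots + n \\
S  &= n + (n-1) + \cdots + 1 \\
2S &= n(n+1) \quad\Rightarrow\quad S = \frac{n(n+1)}{2} = \tfrac{1}{2}n^{2} + \tfrac{1}{2}n
\end{aligned}
```

Dropping the lower-order term and the constant factor leaves n², which is why the running time is O(n²).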

  45. Prefix Averages (Linear) • The following algorithm computes prefix averages in linear time by keeping a running sum

       Algorithm prefixAverages2(X, n)
         Input: array X of n integers
         Output: array A of prefix averages of X     # operations
         A ← new array of n integers                 n
         s ← 0                                       1
         for i ← 0 to n − 1 do                       n
           s ← s + X[i]                              n
           A[i] ← s / (i + 1)                        n
         return A                                    1

     • Algorithm prefixAverages2 runs in O(n) time
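The running-sum version translates to C just as directly. As with any C rendering of this pseudocode, the caller-supplied `double` output array is an assumption of the sketch, not something the slide specifies.

```c
#include <stddef.h>

/* Fills a[i] with the average of x[0..i] for every i in 0..n-1.
 * Linear time: each element is added to the running sum exactly once. */
void prefix_averages2(const int *x, double *a, size_t n)
{
    double s = 0.0;                  /* s <- 0 */
    for (size_t i = 0; i < n; i++) {
        s += x[i];                   /* s <- s + X[i] */
        a[i] = s / (double)(i + 1);  /* A[i] <- s / (i + 1) */
    }
}
```

The only change from the quadratic version is that the sum is carried from one iteration to the next instead of being rebuilt, which removes the inner loop.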

  46. Computing Spans • We show how to use a stack as an auxiliary data structure in an algorithm • Given an array X, the span S[i] of X[i] is the maximum number of consecutive elements X[j] immediately preceding X[i] and such that X[j] ≤ X[i] • Spans have applications to financial analysis, e.g., a stock at its 52-week high [Figure: example array X and its spans S]

  47. Quadratic Algorithm

       Algorithm spans1(X, n)
         Input: array X of n integers
         Output: array S of spans of X               # operations
         S ← new array of n integers                 n
         for i ← 0 to n − 1 do                       n
           s ← 1                                     n
           while s ≤ i ∧ X[i − s] ≤ X[i]             1 + 2 + … + (n − 1)
             s ← s + 1                               1 + 2 + … + (n − 1)
           S[i] ← s                                  n
         return S                                    1

     • Algorithm spans1 runs in O(n²) time
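A C sketch of spans1 follows. It mirrors the pseudocode, including the fact that the counter starts at 1, so each span counts X[i] itself. The caller-supplied output array and the local name `span` are assumptions of this sketch.

```c
#include <stddef.h>

/* Computes s[i] for every i in 0..n-1: starting from 1, the span grows
 * while the element span positions before x[i] exists and is <= x[i]. */
void spans1(const int *x, size_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        size_t span = 1;                                /* s <- 1 */
        while (span <= i && x[i - span] <= x[i])        /* s <= i and X[i - s] <= X[i] */
            span++;
        s[i] = span;                                    /* S[i] <- s */
    }
}
```

As an illustrative input (not taken from the slides), the array 6, 3, 4, 5, 2 yields the spans 1, 1, 2, 3, 1. The test `span <= i` comes first so that `x[i - span]` is never read before the start of the array.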

  48. Have a nice break!

  49. Recursion Recursion means that a function calls itself, a number of times that is not known in advance. We call such a call a recursive call. An iterative loop:

       for (i = 1; i <= n-1; i++)
           sum = sum + 1;

     A recursive function:

       int sum(int n)
       {
           if (n <= 1)
               return 1;
           else
               return (n + sum(n-1));
       }
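To make the relationship between recursion and iteration concrete, the sketch below places a recursive sum next to an iterative loop that computes the same value. The recursive function keeps the structure of `sum` from the slide; the names `sum_recursive` and `sum_iterative` and the iterative version itself are additions for illustration, and both assume n ≥ 1.

```c
/* Recursive sum 1 + 2 + ... + n, structured like sum() on the slide.
 * Assumes n >= 1. */
int sum_recursive(int n)
{
    if (n <= 1)
        return 1;                     /* base case stops the recursion */
    return n + sum_recursive(n - 1);  /* recursive call on a smaller input */
}

/* Iterative equivalent: adds 1, 2, ..., n in a loop. Assumes n >= 1. */
int sum_iterative(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Both functions return 15 for n = 5. The recursive version makes n − 1 nested calls before the base case is reached, while the loop uses a constant amount of stack.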

  50. Recursive function

       int f( int x )
       {
           if( x == 0 )
               return 0;
           else
               return 2 * f( x - 1 ) + x * x;
       }
