Herbert G. Mayer, PSU CS
Status 7/4/2013
CS 410 / 510 Mastery in Programming
Chapter 3: Program and Language Complexity
Syllabus
• Thoughts on Complexity
• Hard to Understand Code?
• Program Complexity
• Complex vs. Hard
• Halstead Program Metrics
• McCabe Cyclomatic Number
• Cyclomatic Number Samples
• References
Thoughts on Complexity
• ‘Complexity’ as used in this class:
  • Refers to the number of different paths of execution through a given program, dictated by its flow of control; synonym: convoluted
  • Or refers to the degree of difficulty of expressing an algorithm as a string of symbols, i.e. the source program; synonym: hard
  • Some functions whose algorithms were hard to find are easy to code and understand, once invented
  • E.g. R. E. Tarjan’s SCC algorithm [11], or Newton’s square-root iteration
• Complexity, as used here, does not mean:
  • “intractable to compute”, such as NP-complete problems, for which all known algorithms need far too much compute time on large inputs to terminate in human time
• Complexity also does not mean:
  • “hard to understand”, as may be the case with obfuscated programming styles or poorly written code
  • A synonym for such a type of “complex” may be: difficult to read
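To illustrate a function that is easy to code once the method is known, here is a minimal C sketch of Newton’s square-root iteration x(k+1) = (x(k) + s / x(k)) / 2. The name newton_sqrt() and the stopping rule (stop as soon as an iterate no longer decreases) are our own assumptions, not taken from the course material.

```c
/* Newton's iteration for the square root of s:
   x_{k+1} = (x_k + s / x_k) / 2.
   Assumes s is finite and non-negative. */
double newton_sqrt(double s) {
    if (s <= 0.0) {
        return 0.0;                      /* sqrt(0) = 0; negative s is out of scope */
    }
    double x = s > 1.0 ? s : 1.0;        /* start at or above the root */
    for (int i = 0; i < 2000; i++) {     /* cap guards against non-finite input */
        double next = 0.5 * (x + s / x);
        /* In exact arithmetic, iterates that start above the root only
           decrease toward it; stop once rounding makes them stall. */
        if (!(next < x)) {
            break;
        }
        x = next;
    }
    return x;
}
```

The whole algorithm is one line of arithmetic inside a loop; the hard part was discovering it.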
Hard to Understand C Code?
#include <stdio.h>
int a[ 1 ]; // just to have an array to index
int p( char arg ) { // p
    printf( "%c", arg );
    return 0; // no array bounds violation!
} //end p
int main( ) { // main
    a[ p( 'a' ) ] = a[ p( 'b' ) ] = a[ p( 'c' ) ] = a[ p( 'd' ) ];
    printf( "\n" );
    return 0;
} //end main
Hard to Understand Code?
• Output using the PSU Unix C compiler is: abcd
• Is this correct? If not, what should the output be?
• Does the C (or C++) implementation used respect the assignment-statement rule of evaluating the right-hand side first?
• Are other outputs feasible, according to the rules of C++, Java, or C#?
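For comparison, a minimal C sketch of one way to make the order deterministic. In C, the four p() calls in the chained assignment run in an unspecified order, so different compilers may print different strings; splitting the chain into separate statements fixes the order. The trace buffer, the logging variant of p() and chained_assign_sequenced() are hypothetical illustrations. We chose right-to-left order, which is what C++17 requires for a chained = (there the original statement prints dcba).

```c
#include <string.h>

static int a[1];          /* one element: every p() call returns index 0 */
static char trace[8];     /* hypothetical: records the order in which p() runs */
static size_t trace_len;

/* Logs its argument instead of printing it, so the order can be checked. */
static int p(char arg) {
    trace[trace_len++] = arg;
    trace[trace_len] = '\0';
    return 0;
}

/* Same effect as  a[p('a')] = a[p('b')] = a[p('c')] = a[p('d')];
   but each call sits in its own statement, so the order is fixed. */
void chained_assign_sequenced(void) {
    trace_len = 0;
    trace[0] = '\0';
    int value = a[p('d')];    /* rightmost operand first */
    a[p('c')] = value;
    a[p('b')] = value;
    a[p('a')] = value;
}
```

After a call, trace holds "dcba"; writing the statements in the opposite order would give "abcd" on every compiler.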
Hard to Understand, Not Complex
#include <stdio.h>
#define MAX 7 // 7 redundant? Discuss!
int a[ MAX ] = { 0, 1, 2, 3, 4, 5, 6 };
void p() { // p
    for( int i = 0; i < MAX; i++ ) {
        printf( " a[%d] = %d\n", i, a[ i ] );
    } //end for
    printf( "\n" );
} //end p
int main() { // main
    int x = 99;
    p();
    a[ x = 3 ] = a[ x = 5 ] = x = 6;
    p();
} //end main
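Hard to Understand, Not Complex
• Output, before and after the assignment:
  a[0] = 0 a[1] = 1 a[2] = 2 a[3] = 3 a[4] = 4 a[5] = 5 a[6] = 6
  a[0] = 0 a[1] = 1 a[2] = 2 a[3] = 6 a[4] = 4 a[5] = 6 a[6] = 6
• x ends up being 6 on [most] C++ run-time systems
• Caution: the statement writes x three times without sequencing between the writes, so in C, and in C++ before C++17, its behavior is undefined; under C++17 rules the right operand of = is evaluated before the left, which leaves x == 3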
Program Complexity
• Some computable problems are hard, NP-hard, complex, or hard to understand!
• Assuming an experienced designer and programmer:
  • Some problems are laborious to solve; they are “complex” due to the amount of work
  • Others are hard, due to the elusiveness of a solution; just try to find a better SCC algorithm!
  • Yet others are not solvable at all, e.g. non-computable functions such as the Halting Problem [10]
• What is program complexity?
  • Is a large program, i.e. one with many lines of code (LOC), complex?
  • Is it more complicated code?
  • Spaghetti code? Labels? Computed labels? Gotos? Poor naming conventions?
  • Recursive functions?
• What unit of measure does complexity have?
  • Time to run?
  • Number of different paths through the control-flow graph?
  • Memory space needed to run?
  • Number of processors needed to solve the computation?
  • Number of iterations needed for a suitable solution? E.g. number of digits of π
  • Degree of “mental hardness” to identify a solution? E.g. in the game of chess?
• McCabe’s V(G) is one attempt at a unit of complexity. But will it be universally accepted?
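The SCC problem shows how short an algorithm can be once someone has invented it. Below is a minimal C sketch of Tarjan’s SCC algorithm [11], assuming a small graph stored as an adjacency matrix; MAX_NODES, add_edge() and tarjan_scc() are hypothetical names chosen for illustration.

```c
#define MAX_NODES 16   /* assumption: small fixed-size graph for illustration */

static int adj[MAX_NODES][MAX_NODES];  /* adj[u][v] != 0 means an edge u -> v */
static int num_nodes;

static int index_of[MAX_NODES];  /* DFS discovery order, -1 = unvisited */
static int lowlink[MAX_NODES];   /* smallest index reachable from v's DFS subtree */
static int on_stack[MAX_NODES];
static int stack[MAX_NODES];
static int stack_top;
static int next_index;

int component[MAX_NODES];        /* output: SCC id of each node */
int num_components;              /* output: number of SCCs */

void add_edge(int u, int v) {
    adj[u][v] = 1;
}

static void strongconnect(int v) {
    index_of[v] = lowlink[v] = next_index++;
    stack[stack_top++] = v;
    on_stack[v] = 1;

    for (int w = 0; w < num_nodes; w++) {
        if (!adj[v][w]) {
            continue;
        }
        if (index_of[w] < 0) {             /* tree edge: recurse into w */
            strongconnect(w);
            if (lowlink[w] < lowlink[v]) lowlink[v] = lowlink[w];
        } else if (on_stack[w]) {          /* edge back into the current candidate SCC */
            if (index_of[w] < lowlink[v]) lowlink[v] = index_of[w];
        }
    }

    if (lowlink[v] == index_of[v]) {       /* v is the root of an SCC: pop it off */
        int w;
        do {
            w = stack[--stack_top];
            on_stack[w] = 0;
            component[w] = num_components;
        } while (w != v);
        num_components++;
    }
}

/* Computes the SCCs of the graph with nodes 0 .. n-1 stored in adj. */
void tarjan_scc(int n) {
    num_nodes = n;
    next_index = 0;
    stack_top = 0;
    num_components = 0;
    for (int v = 0; v < n; v++) {
        index_of[v] = -1;
        on_stack[v] = 0;
    }
    for (int v = 0; v < n; v++) {
        if (index_of[v] < 0) strongconnect(v);
    }
}
```

Usage: call add_edge(u, v) for every edge, then tarjan_scc(n); afterwards component[v] holds the SCC id of node v. The recursion depth equals the DFS depth, so large graphs would need an explicit stack.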
Program Complexity
• Is a programmatic solution for chess hard, complex, or both?
  • Safe to say: a complete and correct chess program is hard to code
  • Yet the rules are simple and relatively few
  • And chess programs have reached grand-master strength
  • Kasparov lost game 1 to “Deep Blue” in their 1996 match but won that match 4-2; Deep Blue won the 1997 rematch 3½-2½ [8]
• The degree of difficulty of finding a solution quantifies complexity!
  • For example, solving Sudoku?
• Some problems do not seem hard, yet the number of special cases renders a solution virtually intractable
  • E.g. the US tax code [9]: about 9,800 different sections, ~75,000 pages
  • It could be simpler and fairer, even equally applicable to all citizens
  • But instead it is highly complex due to “special cases”, requires experts to give definitive answers, and has exceptions for individual taxpayers!
• There have been numerous attempts in CS to formalize complexity, its unit, and computability
  • We cover 2 very briefly: Halstead’s and McCabe’s
Complex vs. Hard
• Complex is to be interpreted as “mathematically difficult to find a correct algorithm!”
  • E.g. find an algorithm that identifies all strongly connected components (SCC) of a graph
• Hard is to be interpreted as “very much work to compute the solution”, while the algorithm itself is not hard
  • E.g. compute the shortest round trip through a Travelling Salesman’s n stopping points
  • May take so long that we are no longer interested in the solution
  • Instead: use a heuristic provably no worse than x times the optimal solution
• An incorrect solution is always easy to compute
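To make “much work, simple algorithm” concrete, here is a minimal brute-force Travelling Salesman sketch in C. It tries every tour that starts at city 0, i.e. (n - 1)! orders, so the code is trivial while the running time explodes with n. The dist matrix, MAX_CITIES and tsp_brute_force() are hypothetical names. For metric instances, approximation algorithms such as Christofides’ algorithm guarantee a tour at most 1.5 times the optimum; that is one example of the kind of heuristic mentioned above.

```c
#include <limits.h>

#define MAX_CITIES 10   /* hypothetical limit: (n - 1)! = 362,880 tours for n = 10 */

int dist[MAX_CITIES][MAX_CITIES];   /* dist[i][j] = travel cost from city i to city j */

/* Extends a partial tour that starts at city 0 in every possible way. */
static void search(int n, int current, int visited_count, int visited[],
                   int length_so_far, int *best) {
    if (visited_count == n) {                 /* every city visited: return home */
        int total = length_so_far + dist[current][0];
        if (total < *best) {
            *best = total;
        }
        return;
    }
    for (int next = 1; next < n; next++) {
        if (visited[next]) {
            continue;
        }
        visited[next] = 1;
        search(n, next, visited_count + 1, visited,
               length_so_far + dist[current][next], best);
        visited[next] = 0;                    /* undo, then try another city here */
    }
}

/* Length of the shortest round trip through cities 0 .. n-1 (1 <= n <= MAX_CITIES). */
int tsp_brute_force(int n) {
    int visited[MAX_CITIES] = {0};
    int best = INT_MAX;
    visited[0] = 1;                           /* the tour starts and ends at city 0 */
    search(n, 0, 1, visited, 0, &best);
    return best;
}
```

Each extra city multiplies the work by roughly n, which is exactly the “no longer interested in the solution” effect.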
Halstead Program Metrics
• Measure a specific program’s complexity
• Metrics developed by the late Maurice Halstead
  • To directly quantify the complexity of any given source program
  • Solely from the operators and operands used in the source
• Halstead introduced these measures in 1977 [1], [2], [3]
  • Among the early formal program-complexity measures
• Not formally derived, but postulated
  • Halstead metrics therefore carry an element of arbitrariness
  • They lack scientific proof; there is no formal derivation of the rules!
Halstead Program Metrics
• Halstead’s metrics count the operators and operands in the source code of the program being analyzed:
  • number of unique (distinct) operators (n1)
  • number of unique (distinct) operands (n2)
  • total number of operators (N1)
  • total number of operands (N2)
• All four counts (n1, n2, N1, N2) are calculated during lexical analysis of the source program
• Other Halstead measures are derived from these 4 counts
  • but without proof or scientific derivation!
  • the developer’s intuition served as the basis for deriving the measures
  • Halstead intended to provide formal proofs, but died before he could do so!
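A minimal C sketch of the four basic counts, assuming the lexer has already split the source into tokens and tagged each one as operator or operand; struct token, struct halstead_counts and count_tokens() are hypothetical names. For the statement x = a + b * a, with =, + and * counted as operators, it yields n1 = 3, n2 = 3, N1 = 3, N2 = 4.

```c
#include <string.h>

/* Hypothetical token record: the lexer has already classified each token. */
struct token {
    const char *text;
    int is_operator;   /* 1 = operator, 0 = operand */
};

struct halstead_counts {
    int n1, n2;   /* distinct operators, distinct operands */
    int N1, N2;   /* total operators, total operands */
};

/* One pass over the tokens; distinctness is checked by comparing with all
   earlier tokens (quadratic, but fine for a sketch). */
struct halstead_counts count_tokens(const struct token *toks, int k) {
    struct halstead_counts c = {0, 0, 0, 0};
    for (int i = 0; i < k; i++) {
        int seen_before = 0;
        for (int j = 0; j < i; j++) {
            if (toks[j].is_operator == toks[i].is_operator &&
                strcmp(toks[j].text, toks[i].text) == 0) {
                seen_before = 1;
                break;
            }
        }
        if (toks[i].is_operator) {
            c.N1++;
            if (!seen_before) c.n1++;
        } else {
            c.N2++;
            if (!seen_before) c.n2++;
        }
    }
    return c;
}
```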
Halstead Program Metrics
Operands
• Literals, AKA constants, e.g. 0, 1000, “hello”
• User-defined identifiers for values, e.g. variables and symbolic constants; MAX is an operand in: #define MAX 5
• Reserved keywords that denote a value, e.g. NIL
• How to count declarations such as #define MAX 5 is less obvious
• Depending on the language, some language-defined type specifiers are treated as operands, e.g. char, int, double in C++
Halstead Program Metrics
Operators
• Common arithmetic symbols, e.g. + - / * ^ %
• Other symbols, e.g. the parentheses ( and )
• Symbols for relational and boolean operations, e.g. > >= < <= != && ||
• Symbols for all kinds of other operations, including cat for concatenation in some languages
• Reserved keywords, e.g. or, or else, and, and then, xor
• Function names, e.g. add in add( a, 8 ), sin in sin( 45 ), sqrt in sqrt( 3 )
• Reserved operations, e.g. try, catch, throw
• Type qualifiers, e.g. const, volatile
• Scope specifiers, e.g. extern, static¹
¹ “static” is an overloaded qualifier in C, specifying both scope and storage
Halstead Program Metrics
Operators that are control constructs:
• if ( ... ) plus then-clause and optional else-clause
• while ( ... ) ...
• do ... while ( ... )
• for ( ; ; ) ...
• catch ( ... )
• return ...
• switch ( ... ) { ... }
Halstead Program Metrics
Program length N, vocabulary size n, program volume V:
• Program length N is the sum of the total numbers of operators and operands in the analyzed program:
  N = N1 + N2
• Vocabulary size n is the sum of the numbers of unique operators and operands:
  n = n1 + n2
• Program volume V is the information content of the program:
  V = N * log2 n
Halstead Program Metrics
Difficulty level D, AKA degree of error-proneness:
• The difficulty D of a program is proportional to the number of unique operators n1
• And proportional to the average number of uses per operand, i.e. the total number of operands N2 divided by the number of unique operands n2
• With a scale factor of 1/2 applied to n1, D is postulated to be:
  D = ( n1 / 2 ) * ( N2 / n2 )
• Interestingly, the total number of operators N1 is not part of the formula for the difficulty level D
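A small worked example, assuming the counting convention in which =, + and * are operators and identifiers are operands: the statement x = a + b * a has the operators =, +, * (n1 = 3) and the operands x, a, b, a (n2 = 3 distinct, N2 = 4 total).

```latex
D = \frac{n_1}{2} \cdot \frac{N_2}{n_2} = \frac{3}{2} \cdot \frac{4}{3} = 2
```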
Halstead Program Metrics
Program level L:
• Program level L is the inverse of error-proneness
  • i.e. a low-level program is more prone to errors than a corresponding high-level program for the same computable function
• L = 1 / D
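The difficulty and level formulas translate directly into C. This is a minimal sketch with hypothetical function names, assuming all counts are positive so that no division by zero can occur.

```c
/* Halstead difficulty D = (n1 / 2) * (N2 / n2).
   Assumes n1 >= 1, n2 >= 1 and N2 >= 1. */
double halstead_difficulty(int n1, int n2, int N2) {
    return ((double) n1 / 2.0) * ((double) N2 / (double) n2);
}

/* Halstead program level L = 1 / D (same assumptions). */
double halstead_level(int n1, int n2, int N2) {
    return 1.0 / halstead_difficulty(n1, n2, N2);
}
```

With n1 = 3, n2 = 3, N2 = 4 it returns D = 2 and L = 0.5.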
Halstead Program Metrics
Other measures, for you to elaborate in your paper:
• Effort to implement
• Time to implement
• Number of bugs delivered
• Etc.
Cyclomatic Number
• Goal of McCabe’s cyclomatic number:
  • To have a measure of source program complexity
  • To manage complexity, rather than deal with an unknown
  • See [4], [6]
• Builds on graph theory
  • E.g. [7] Berge: “Graphs and Hypergraphs”
• Fundamental units:
  • Graph G, not necessarily connected!
  • Number of edges: e
  • Number of nodes: n
  • Number of connected components: p
  • i.e. if ( p > 1 ) then G is not connected
Cyclomatic Number V
• The cyclomatic number V of a graph G is written V(G). If:
  • e = number of edges
  • n = number of nodes, AKA vertices in other literature
  • p = number of connected components
• then:
  • V(G) = e - n + 2 * p
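A minimal C sketch of V(G) = e - n + 2 * p for a graph given as an edge list. Here p is counted with a union-find structure that treats every edge as undirected, because p counts connected components, not strongly connected ones. MAX_NODES, struct edge and cyclomatic_number() are hypothetical names.

```c
#define MAX_NODES 64   /* assumption: small graphs only */

struct edge {
    int from, to;
};

static int parent[MAX_NODES];

/* Union-find lookup with path halving. */
static int find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/* V(G) = e - n + 2p for nodes 0 .. n-1 and e edges. */
int cyclomatic_number(int n, const struct edge *edges, int e) {
    for (int v = 0; v < n; v++) {
        parent[v] = v;
    }
    int p = n;                       /* every node starts as its own component */
    for (int i = 0; i < e; i++) {
        int ra = find(edges[i].from);
        int rb = find(edges[i].to);
        if (ra != rb) {
            parent[ra] = rb;
            p--;                     /* an edge merged two components */
        }
    }
    return e - n + 2 * p;
}
```

For the if-then-else graph (if node 0, then 1, else 2, join 3; edges 0-1, 0-2, 1-3, 2-3) it returns 4 - 4 + 2 = 2.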
Cyclomatic Number Samples
• Sequence of 2 statements: e = 1, n = 2, p = 1
  • V(G) = 1 - 2 + 2 * 1 = 1
• If statement with then- and else-clauses: e = 4, n = 4, p = 1
  • V(G) = 4 - 4 + 2 * 1 = 2
• Sequence of 4 statements: e = 3, n = 4, p = 1
  • V(G) = 3 - 4 + 2 * 1 = 1
Cyclomatic Number of While
While loop: e = 3, n = 3, p = 1
• V(G) = 3 - 3 + 2 * 1 = 2
Cyclomatic Number of Program
Multiple-module program with no cross-module edges, hence p = 3:
• Main program = M
• Module A = A()
• Module B = B()
• V(G) = V( M ∪ A ∪ B ) = V(M) + V(A) + V(B)
  • V(M) = 2 - 3 + 2 = 1
  • V(A) = 4 - 4 + 2 = 2
  • V(B) = 6 - 5 + 2 = 3
  • V(G) = 12 - 12 + 2 * 3 = 6
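Why the sum works, assuming the three modules share no edges so that each forms its own connected component: edges and nodes simply add up across modules, and the 2 * p term contributes 2 per module.

```latex
\begin{aligned}
V(M \cup A \cup B) &= (e_M + e_A + e_B) - (n_M + n_A + n_B) + 2 \cdot 3 \\
                   &= (e_M - n_M + 2) + (e_A - n_A + 2) + (e_B - n_B + 2) \\
                   &= V(M) + V(A) + V(B)
\end{aligned}
```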
References
[1] Halstead metrics: http://www.verifysoft.com/en_halstead_metrics.html
[2] Maurice Halstead, “Elements of Software Science”, Elsevier, 1977, ISBN 0444002057
[3] Detail on Halstead: http://www.horst-zuse.homepage.t-online.de/halstead.html
[4] Wikipedia page on cyclomatic complexity: http://en.wikipedia.org/wiki/Cyclomatic_complexity
[5] Program complexity: http://www.acis.pamplin.vt.edu/faculty/tegarden/wrk-pap/DSS.PDF
[6] Thomas J. McCabe, “A Complexity Measure”, IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, December 1976
[7] C. Berge, “Graphs and Hypergraphs”, North-Holland, Amsterdam, 1973
[8] Deep Blue info: http://www.research.ibm.com/deepblue/
[9] Tax code info: http://www.fourmilab.ch/ustax/ustax.html
[10] Halting Problem: http://www.comp.nus.edu.sg/~cs5234/FAQ/halt.html
[11] Robert E. Tarjan, “Depth-First Search and Linear Graph Algorithms”, SIAM J. Computing, Vol. 1, No. 2, June 1972