1 / 61

Code Tuning

Code Tuning. Chapter 25-26. Textbook & Reference. Steve McConnell. Code Complete: A Practical Handbook of Software Construction . 2nd Edition. Microsoft Press, 2004. Chapters 25 and 26 Jon Bentley, Programming Pearls , Second Edition, Addison-Wesley, Inc., 2000. Outline.

sakura
Download Presentation

Code Tuning

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Code Tuning Chapter 25-26

  2. Textbook & Reference • Steve McConnell. Code Complete: A Practical Handbook of Software Construction. 2nd Edition. Microsoft Press, 2004. Chapters 25 and 26 • Jon Bentley, Programming Pearls, Second Edition, Addison-Wesley, Inc., 2000.

  3. Outline • Options of Performance Improvement • What Is Code Tuning? • Common Sources of Inefficiency • Code Tuning Process • Code Tuning Techniques • Logic • Loops • Data Transformations • Expressions • Object Creation and Flyweight/Factory Patterns

  4. Options for Performance Improvement • Program requirements • Make sure it is a problem that needs to be solved • Boehm’s story (2000): • A system at TRW initially requiring subsecond response time led to a highly complex design and an estimated cost $100M • Further analysis determined that users would be satisfied with 4 second response 90% of the time • Program design • Some problems have to be addressed at design level • Search algorithms

  5. Options for Performance Improvement • OS interactions • Perhaps the OS routines are slow or fat • Working with external files, dynamic memory, or output device probably interact with the OS • Your compiler may generate (or your libraries may invoke) system calls you would never dream of. • Code compilation • Maybe no need to think about optimizing speed any further if the right compiler is chosen • Hardware • Sometimes buying new hardware is the cheapest and best way

  6. What Is Code Tuning? • The practice of modifying correct code in ways that make it run more efficiently • Small-scale changes that affect a single class / routine or a few lines of code • Not the most effective or easiest or cheapest way to improve performance • Notnecessarily ‘better’ code. • No one, but programmers, usually cares how tight the code is

  7. Why Is Code Tuning Appealing? • It is incredibly satisfying to reduce execution time by tweaking a few lines! • Mastering the art of writing efficient code is a rite of passage to becoming a serious programmer • Programming culture - writing micro-efficient code proves you are cool • Like a tennis player picking up a tennis ball

  8. Pareto Principle: 80/20 Rule • ‘get 80% of the result with 20% of the effort’ • Boehm (1987): 20% of a program’s routines consumes 80% of its execution time • Knuth (1971): less than 4% of a program usually accounts for more than 50% of its runtime. • Bentley (1988): a 1 KLOC program spent 80% of its time in a 5-line square root routine • Measure the code to find the hot spots and put your resources into optimizing the few percent • Write most of code in an interpreted language and rewrite the hot spots in a faster language

  9. Which One Is Faster? • a[1] = 1; a[2] = 2; a[3] = 3; a[4] = 4; a[5] = 5; a[6] = 6; a[7] = 7; a[8] = 8; a[9] = 9; a[10] = 10; • for i=1 to 10{ a[i] = i; }

  10. Old Wives’ Tales • Reducing LOC improves the speed or size of the resulting machine code – false • Previous Java example: 12.6 vs 3.23 • Certain operations are probably faster or smaller than others – false • No room for probably when talking about performance • The rule of the game change every time you change languages, compilers, versions of libraries, processor, amount of memory • ‘Fast’ is as important as ‘correct’ - false

  11. Old Wives’ Tales - cont • You should optimize as you go – false • If you strive to write the fastest and smallest possible code for each routine, your program will be fast and small? • Problematic! • It is almost impossible to identify performance bottlenecks before a program is working completely. • Programmers are very bad at guessing which 4% accounts for 50% of the execution time. • Focusing on optimization during initial development detracts from achieving other program objectives.

  12. When to Tune • Don’t optimize until you know you need to • Use a high-quality design • Make the program right • Make it modular and easily modifiable • Check performance when it is complete and correct • Complier optimizations • They might be more powerful than you expect • Compiler-optimized code may be faster than ‘tricky’ code

  13. Complier optimizations • With a good optimizing compiler, your code speed can improve 40+ percent • Many of the techniques described in the next chapter produce gains of only 15–30 percent. • Why not just write clear code and let the compiler do the work? • Here are the results of a few tests to check how much an optimizer speeded up an insertion-sort routine: • The only difference between versions of the routine was that compiler optimizations were turned off for the first compile and turned on for the second.

  14. Compiler Optimizations - cont sum=0; for (row=0; row<rowCount; row++) { for (column=0; column<columnCount; column++) { sum = sum + martix[row][column] } // expensive multiplications for access to 2D array } sum=0; elementPointer = matrix; lastElementPointer = matrix[rowCount-1][columnCount-1]+1; while (elementPoint <lastElementPointer) { sum = sum + elementPointer++; } • No improvement - digging into the assembly code turned out that the complier did the optimization.

  15. Common Sources of Inefficiency • Input/output operations • If you have a choice of working with a file in memory vs on disk, in a database, or across a network, use an in-memory structure unless space is critical. • System calls: expensive due to context switch (Saving program states, recovering the kernel states and the reverse) • Write own services if part of the function is needed • Avoid going to the system • Work with the system vendor to make the call faster • Interpreted languages • Process each instruction before creating and executing machine code

  16. Input/output operations Interpreted languages

  17. Are They Functionally Equivalent? for (column=0; column<MAX_COLUMNS; column++) { for (row=0; row<MAX_ROWS; row++) { table [row][column] = BlankTableElement(); } } for (row=0; row<MAX_ROWS; row++) { for (column=0; column<MAX_COLUMNS; column++) { table [row][column] = BlankTableElement(); } } • Each element of table is about 4k bytes long • Pages are usually at least 4K bytes in size

  18. Common Sources of Inefficiency - cont • Paging • An operation that causes the OS to swap pages of memory • It is much slower than an operation that works on only one page of memory • Errors • Leaving debugging code turned on • Forgetting to de-allocate memory • Polling non-existent devices until timeout

  19. Paging Example Assume: each row is about the same page size. for (column=0; column<MAX_COLUMNS; column++) { for (row=0; row<MAX_ROWS; row++) { table [row][column] = BlankTableElement(); } } // accessing a different row causes a page fault for (row=0; row<MAX_ROWS; row++) { for (column=0; column<MAX_COLUMNS; column++) { table [row][column] = BlankTableElement(); } }// 1000 times faster on a machine with limited memory,!

  20. If table has too many rows, every time the program accesses a different row, the operating system will have to switch memory pages. • The way the loop is structured, every single array access switches rows, which means that every single array access causes paging to disk.

  21. Relative Costs of Common Operations Measurements are sensitive to local machine environment and compiler. Measurements between C++ and Java are not directly comparable.

  22. Measurement • Performance aspects can be counterintuitive. • Experience from old machine or language does not help much. • It is not worth sacrificing clarity for a performance gamble, if it is not worth measuring to know that it is more efficient • Develop software by using well-designed code that is easy to understand and modify. • Measurements need to be precise.

  23. Code Tuning Process • If the performance is poor: • Save a working version • Measure the system to find hot spots • Determine where weak performance is from: • Go back if tuning is not appropriate • Tune the bottleneck identified • Measure each improvement at a time • If no improvement, revert to the code saved

  24. Code Tuning Techniques • Logic • Loops • Data Transformations • Expressions • Object Creation

  25. Logic • Stop testing when you know the answer • If (5<x) and (x<10) then … // Specification • Use short-circuit op: && vs. & • If (5<x) then if (x<10) then … // If the language does not support short-circuit evaluation. • Search whether a negative is present negativeInputFound = false; for (i=0; i<count; i++) { if (input [i]<0) { negativeInputFound = true; } }

  26. A better approach? • Is to stop scanning as soon as you find a negative value. • Any of these approaches would solve the problem: • Add a break statement after the negativeInputFound = true line. • If your language doesn't have break, emulate a break with a goto that goes to the first statement after the loop. • Change the for loop to a while loop, and check for negativeInputFound as well as for incrementing the loop counter past count. • Change the for loop to a while loop, put a sentinel value in the first array element after the last value entry, and simply check for a negative value in the while test. After the loop terminates, see whether the position of the first found value is in the array or one past the end.

  27. Can You Tune This Code? • Keyboard input in a word processor Select inputCharacter Case “+”, “=“ ProcessMathSymbol(inputCharacter) Case “0” to “9“ ProcessDigit(inputCharacter) Case “ ,”, “.”, “:”, “;”, “!”, “?” ProcessPunctuation(inputCharacter) Case “ ” ProcessSpace(inputCharacter) Case “A” to “Z”, “a” to “z” ProcessAlpha(inputCharacter) Case Else ProcessError(inputCharacter) End Select

  28. Logic - cont • Arrange test so that the one that is fastest and most likely to be true is performed first • If you know the likely frequency of input characters, put the most common cases first. Select inputCharacter Case “A” to “Z”, “a” to “z” ProcessAlpha(inputCharacter) Case “ ” ProcessSpace(inputCharacter) Case “ ,”, “.”, “:”, “;”, “!”, “?” ProcessPunctuation(inputCharacter) Case “0” to “9“ ProcessDigit(inputCharacter) Case “+”, “=“ ProcessMathSymbol(inputCharacter) Case Else ProcessError(inputCharacter) End Select

  29. Can You Tune This Code? … if ( ( (‘a’<=inputChar) && (inputChar <=‘z’)) || ( (‘A’<=inputChar) && (inputChar <=‘Z’))) { charType = CharacterType.Letter; } else if ( (inputChar==‘ ‘) ||(inputChar == ‘,’) || (inputChar==‘.‘) || (inputChar==‘!‘) || (inputChar==‘(‘) || (inputChar==‘)‘) || (inputChar==‘:‘) || (inputChar==‘;‘) || (inputChar==‘?‘) || (inputChar==‘-‘)) { charType = CharacterType.Punctuation; } else if ((‘0’<=inputChar) && (inputChar <=‘9’)) { charType = CharacterType.Digit; } …

  30. Logic - cont • Substitute table lookups for complicated expressions (space for time) • Computing only once and storing results in a table. • Example1: Character type: • Store the type of each character in an array that’s accessed by the character type code • charType = charTypeTable[inputChar]; • Example 2: Integer square root of integers 1..100 • Lazy evaluation – avoid doing any work until needed • For a table of 5K entries, generate the whole table at startup time vs the small percentage used

  31. Substitute table Suppose you want to assign a category number to something based on which of three groups— Groups A, B, and C—it falls into:

  32. Can You Tune This Code? for (i=0; i<count; i++) { if (sumType == SUMTYPE_NET) { netSum = netSum+amount[i]; } else { grossSum = grossSum+ amount[i]; } }

  33. Loops -Unswitching if (sumType == SUMTYPE_NET) { for (i=0; i<count; i++) { netSum = netSum+amount[i]; } } else { for (i=0; i<count; i++) { grossSum = grossSum+ amount[i]; } }

  34. Loops -Unswitching • If the decision doesn't change while the loop is executing, • you can unswitch the loop by making the decision outside the loop • putting loops inside the conditional rather than putting the conditional inside the loop

  35. Loops - Minimizing the work inside loops for (i=0; i<rateCount; i++) { netRate[i] = baseRate[i] * rates ->discounts->factors->net; } qualityDiscount = rates ->discounts->factors->net; for (i=0; i<rateCount; i++) { netRate[i] = baseRate[i] * qualityDiscount; }

  36. Can You Tune This Code found = FALSE; i=0; while ( (!found) && (i<count) ) { if (item[i] == testValue) found=TRUE; else i++; } if (found) {…} • Compound test

  37. Loops - Sentinel values • For a loop with a compound test, you can often save time by simplifying the test. • For a search loop, one can use a sentinel value • A value that is put just past the end of the search range and guaranteed to terminate

  38. Loops - Sentinel values: cont // set sentinel value, preserving the original value initialValue = item[count]; item[count] = testValue; i=0; while (item[i] !=testValue) { i++; } // 3 tests to 1 test if (i<count) {…} • Time savings: Java (44%), C#(23%) for a 100-elemenmt of integers

  39. Loops - Putting the busiest loop on the inside for (column=0; column<100; column++) { for (row =0; row<5; row++) { sum = sum + table[row][column] } } • The outer executes much more often than the inner • Each time the loop executes, it has to initialize the loop index, increment it on each pass through the loop, and check it after each pass • Total number of loop executions: 100 + 100*5 • Switching the inner and outer: 5 + 100*5

  40. Loops - Strength Reduction • Replace an expensive operation (e.g. multiplication) with a cheaper operation (e.g. addition) • for (i=0; i<=saleCount-1;i++){ commission(i) = (i+1) * revenue*basedCommission*discount } • incrementalCommission = revenue*bassCommission*discount cumulativeCommission = incrementalCommission for (i=0;i<=saleCount-1;i++){ commission(i) = cumulativeCommission cumulativeCommission = cumulativeCommission + incrementalCommission }

  41. Data Transformations • Use integers rather than floating point numbers • Use the fewest array dimensions possible • One dimensional representation of an array for (entry=0; entry<numRows*numColumns; entries++) { matrix [entry] =0; } • Time savings: C++: 11%; Java: 47%, C#: 9%

  42. Is Tuning Possible? for (discountType =0; discountType <typeCount; discountType++) { for (discountLevel=0; discountLevel<levelCount; discountLevel++){ rate[discountLevel] = rate[discountLevel] * discount[discountType]; } }

  43. Data Transformations - cont • The reference to discount[discountType] doesn't change when discountLevel changes in the inner loop. • Minimize array references for (discountType =0; discountType <typeCount; discountType++) { for (discountLevel=0; discountLevel<levelCount; discountLevel++){ rate[discountLevel] = rate[discountLevel] * discount[discountType]; } } thisDiscount = discount[discountType]; for (discountLevel=0; discountLevel<levelCount; discountLevel++){ rate[discountLevel] = rate[discountLevel] *thisDiscount; }

  44. Expressions • Use strength reduction • Replace multiplication with addition • Replace exponentiation with multiplication • Replace trigonometric routines with their trigonometric identities • Replace longlong integers with longs and ints • Replace floating point with fixed-point numbers • Replace double precision with single precision • Replace integer multiplication-by-two and division-by-two with shift operations

  45. Is Tuning Possible? • Ax2 + Bx + C. • The letters A, B, and C are coefficients, and x is a variable. • Write code to evaluate an nth-order polynomial

  46. Evaluate a polynomial – cont’

  47. Evaluate a polynomial – cont’ • (Ax + x)x + C

  48. Is Tuning Necessary? unsigned int log2(unsigned int x) { return (unsigned int) (log(x)/log(2)); }

More Related