1 / 18

Program Optimization

Program Optimization. CSSE 514 - Programming Methods 4/10/01. Program Optimization . Overview When to optimize Design levels Methodology Tradeoffs Errors and pitfalls Common sources of inefficiency Example: code tuning Initial program Logical optimizations

santa
Download Presentation

Program Optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Program Optimization CSSE 514 - Programming Methods 4/10/01

  2. Program Optimization • Overview • When to optimize • Design levels • Methodology • Tradeoffs • Errors and pitfalls • Common sources of inefficiency • Example: code tuning • Initial program • Logical optimizations • Removing common subexpressions • Strength reduction • Data structure transforms • Code motion • Lazy evaluation • Summary Reference: Jon Bentley, Writing Efficient Programs, Prentice-Hall, 1982

  3. Overview • Goal is to optimize (minimize) • Time • Runtime • Response time • Space • Secondary storage • Main memory • Optimization strategies • Cost measured in effort, time, risk • Tradeoffs often involved • Space vs. time • Increased complexity • Optimization difficult because • Many strategies, tradeoffs • No general algorithm • Focus--must decide • What to optimize--space, time • Where to optimize

  4. When to Optimize • When the code doesn't give adequate time/space performance: • Early days of computing • Early days of desktop computers • Specialized computers (military, space) • Ubiquitous computing (as tiny computers are embedded in credit cards and other objects) • When complexity can be reduced • If performance is not an issue, aim for clarity, simplicity, maintainability • Review your first-cut code to see if it can be transformed into something simpler • Note: Solving the Year 2000 Problem can be viewed as a kind of optimization • Uses same strategy as code-tuning:semantics-preserving source-to-source transformations • Uses same tools as automatic code generation and code optimization (e.g. Refine)

  5. Design Levels • Six Design Levels (Bentley) • Program design (System structure) -- decomposition into modules • Module and routine design (Intramodular structure) -- choice of data structures and algorithms • Code tuning (Writing efficient code) -- source to source transformations • Code compilation (Translation into machine code) -- compiler may outperform human • Operating system interaction (System software) -- changing, tuning, bypassing operating system (OS) or database system (DBMS) • Hardware -- modify or purchase: microcode, faster CPU, DB machine, array processor, floating point hardware, ASICs (Application Specific Integrated Circuits) • Strategy -- optimize where change is possible and payoff is highest • Note: a 10-fold (or greater) gain may be possible at each level for a millionfold improvement

  6. Methodology • Design and implement for correctness and clarity • Include useful documentation • Use modularity for maintenance • Identity performance goals, and factor these down to individual modules • If performance unacceptable then • Monitor program to see where bottlenecks (time, space) are -- instrument code or apply tools • Revise data structures and algorithms in critical modules (level 2) • Consider redesign (level 1) • Consider solution by purchase (optimizing compiler, faster DBMS or OS, faster hardware) • If performance still not OK then • Apply source-to-source transformations (level 3) • retain original code as documentation • If still not OK then • Try lower level designs: • Recode critical modules in assembly language • Modify/tune/bypass OS or DBMS

  7. Tradeoffs • Correctness and clarity go together • Clear  well-structured, simple • Clear  provable, maintainable • Top-level strategies • Minimize interfaces, data flows • Define cohesive modules • Avoid over specification (is suboptimal OK?) • 3-way gains possible • Lower level strategies • Gains in one direction often offset by losses in other directions • Most optimization strategies increase complexity • Use correctness-preserving transforms (risk of errors due to lack of mechanical support for transformations) • Family of programs may be preferred approach (leave the tradeoffs to the user)

  8. Errors and Pitfalls • Programmers often make false assumptions: • Fewer lines means less code space or execution time (false!) • Certain operations are faster or smaller than others • May be dependent on language, compiler, or machine • Code tuning always pays off (false!) • Compiler may do it for you • Worse, your code tuning may defeat the compiler's more efficient optimizations -- you end up losing both clarity and efficiency • A pitfall: optimizing performance as you go • May introduce unneeded complexity • Micro-optimizations may cause global optimizations to be overlooked or made impossible • Detracts from other goals: correctness, readability, programmer productivity • May have little effect in final program: better to wait until program is complete

  9. Common Sources of Inefficiency • I/O: keep small files in memory, avoid using intermediate files • Formatted printing routines: may make code bigger and slower • Floating-point operations: when software used, one statement pulls in a whole library • Paging: minimize page faults by • Avoiding scattered access to memory • Keeping control flow in small regions

  10. Example: Code Tuning • Traveling salesman problem • Input -- n points on a plane • Output -- a minimal-length tour -- visit each point once • Level 2 analysis • Optimal, but infeasible algorithm examines all possible tours -- O(n!) • Feasible (but suboptimal) heuristic -- start anywhere and repeatedly advance to nearest unvisited point -- O(n2) • No feasible and optimal solution is known -- choose O(n2) solution and settle for usually near minimal results

  11. Initial Program • Data • Algorithm begin for i in 1..n [initialize visited] loop visited(i) := False end thisPt := n [start at n] visited(n) := True for i in 2..n loop [set closePt = nearest unvisited point to thisPt] [output thisPt "->" closePt] thisPt := closePt [advance] visited(closePt) := True end end

  12. Logical Optimizations • Code for inner loop (optimize performance [set closePt = nearest unvisited point] closeDist := maxreal for j in 1..n loop if not visited(j) and dist(thisPt, j) < closeDist then closeDist := dist(thisPt, j) closePt := j end end • Transform 1: not visited(j) => unvisited(j) • Effort: change declaration, 3 statements • Time: reduced slightly (potentially significant) • Space, complexity: reduced slightly • Transform 2: if a and b ... => if a and then b ... [Ada] if (a) { if (b) } ... [Java] • Effort: small, local • Time: much reduced when t(a) << t(b) and a is often false • Space, complexity: small increase • Transforms 1 & 2 are independent -- you can do in any order

  13. Removing Common Subexpressions • Result of transforms 1 & 2 on inner loop if unvisited(j) thenif dist(thisPt, j) < closeDist then closeDist := dist(thisPt, j) closePt := j end end • Transform 3: p(e) => v := e p(v) • Rule: if expression e appears more than once and has no side effects and variables not altered between evaluations then compute v := e [v is mnemonic variable] replace e in p by v end • Effort: may be hard to check for side effects • Time: decreases (small in this case) • Space: less code, extra word of data • Complexity: decreases • Good compiler may eliminate some common subexpressions

  14. Strength Reduction • Result of transform 3 with e=dist(thisPt, j) if unvisited(j) then thisDist := dist(thisPt, j) if thisDist < closePt then closeDist := thisDist closePt := j end end • Transform 4: replace dist(i, j) = sqrt(e) by distSqrd(i, j) = e where e = sqr(ptArr(i).X - ptArr(j).X) + sqr(ptArr(i).Y - ptArr(j).Y) • Rule: exploit algebraic identities, in this case, x2 < y2 <=> x < y for x, y > 0 • Effort: minimal • Time: greatly decreased • Space: slight decrease • Complexity: little change

  15. Data Structure Transforms • Transform 5: replace bitmap unvisited with integer array unvisited where unvisited(1..highPt) = unvisited points and unvisited(highPt + 1..n) = visited points in order • Effort: large rewrite of many lines • Time: small decrease (halve inner loop cycles, remove if unvisited(j) • Space: large increase  maxPts words • Complexity: increase but clear loop invariant -- introduces a level of indirection begin for i in 1..n loop unvisited(i):=i unvisited(i):=True end thisPt:=unvisited(n) thisPt:=n highPt:=n-1 unvisited(n):=False while highPt>0 <= for j in 2..n loop closeDist:=maxreal for i in 1..highPt for j in 1..n loop [compare distance] if unvisited(j) end thisPt:=unvisited(closePt) thisPt:=closePt swapUnvisited(closePt, highPt) highPt:=highPt-1 unvisited(closePt):=False end

  16. Code Motion • Transform 6: distSqrd(...) => inline code • Effort: small to large • Time: save code overhead (small) • Space: usually increases (here it decreases) • Complexity: increases (less readable) • Transform 7: move invariant expressions outside loops • Effort: small (but must verify invariance) • Time: decrease • Space: slight increase (not if statements are moved) • Complexity: little change (may reduce locality) thisX:=ptArr(thisPt).X thisY:=ptArr(thisPt).Y closeDist:=maxreal for i in 1..highPt loop thisDist:=sqr(ptArr(unvisited(i)).X - thisX) + sqr(ptArr(unvisited(i)).Y - thisY) if thisDist < closeDist then closePt:=i closeDist:=thisDist end end

  17. Lazy Evaluation • Transform 8: compute xDist first, yDist if needed • Effort: small • Time: large decrease • Space: slight increase • Complexity: increase thisDist:=sqr(ptArr(unvisited(i)).X - thisX) + sqr(ptArr(unvisited(i)).Y - thisY) if thisDist < closeDist then ... thisDist:=sqr(...X - thisX) if thisDist < closeDist * then thisDist:=thisDist + sqr(...Y - thisY) if thisDist < closeDist then ... * if xDist >= closeDist we avoid computing yDist

  18. Summary transform time/n2 savings T1: remove not visited T2: short-circuit and 47.0 T3: remove common sub- expression 45.6 1.4 *T4: remove square root 24.2 21.4 T5: convert boolean array to pointer array 21.2 3.0 *T6: put proc in line (helped by T3 & T4) *T7: move code from loop (T7 requires T6) 14.0 7.2 *T8: delay computing yDist (T8 requires T6) 8.2 5.8 Note: Choice of transform analogous to choice of chess move -- one transform makes other transforms possible * Best transforms

More Related