
Performance*

Presentation Transcript


  1. Performance* • Objective: To learn when and how to optimize the performance of a program. • “The first principle of optimization is don’t.” • Knowing how a program will be used and the environment it runs in, is there any benefit to making it faster? *The examples in these slides come from Brian W. Kernighan and Rob Pike, “The Practice of Programming”, Addison-Wesley, 1999.

  2. Approach • The best strategy is to use the simplest, cleanest algorithms and data structures appropriate for the task. • Then measure performance to see if changes are needed. • Enable compiler options to generate the fastest possible code.

  3. Approach • Assess what changes to the program will have the most effect (profile the code). • Make changes one at a time and re-assess (always retest to verify correctness). • Consider alternative algorithms • Tune the code • Consider a lower-level language (just for time-sensitive components)

  4. Topics • A Bottleneck • Timing and Profiling • time and clock • algorithm analysis • prof and gprof • gcov • Concentrate on hot spots • Strategies for speed • Tuning the code

  5. A Bottleneck • isspam example from the text • Heavily used • Existing implementation not fast enough in current environment • Benchmark • Profile • Tune code • Change algorithm

  6. isspam()

/* isspam: test mesg for occurrence of any pat */
int isspam(char *mesg)
{
    int i;

    for (i = 0; i < npat; i++)
        if (strstr(mesg, pat[i]) != NULL) {
            printf("spam: match for '%s'\n", pat[i]);
            return 1;
        }
    return 0;
}

  7. strstr()

/* simple strstr: use strchr to look for first character */
char *strstr(const char *s1, const char *s2)
{
    int n;

    n = strlen(s2);
    for (;;) {
        s1 = strchr(s1, s2[0]);
        if (s1 == NULL)
            return NULL;
        if (strncmp(s1, s2, n) == 0)
            return (char *) s1;
        s1++;
    }
}

  8. Inefficiencies • strlen() is used to calculate pattern length • But patterns are fixed, so calculate once and save • strncmp() has complex inner loop • Comparing string bytes • Checking for \0 • Counting down • Know string lengths, so don’t check for \0

  9. Inefficiencies • strchr() also checks for \0 • This is unnecessary • Overhead of function calls to strchr(), strlen() and strncmp() adds up • Make no function calls in strstr() • Making these changes gave a 30% speed-up (a sketch follows below) • But still too slow!
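
A minimal sketch of what the tuned search of slides 8 and 9 might look like (strstr_tuned and its length parameter are names we introduce here, not the book's code): the caller passes the pattern length it computed once, the pattern is assumed non-empty, and the first-character scan and byte comparison are written out inline, so the loop makes no calls to strchr, strlen, or strncmp.

/* strstr_tuned: find s2 (of known length n > 0) in s1, no function calls */
char *strstr_tuned(const char *s1, const char *s2, int n)
{
    int i;
    char first = s2[0];

    for (; *s1 != '\0'; s1++) {
        if (*s1 != first)       /* inline replacement for strchr */
            continue;
        for (i = 1; i < n && s1[i] == s2[i]; i++)  /* inline strncmp */
            ;
        if (i == n)             /* all n bytes matched */
            return (char *) s1;
    }
    return NULL;
}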

  10. Further Improvements • Analyze and improve the algorithm • for (i = 0; i < npat; i++) if (strstr(mesg, pat[i]) != NULL) return 1; • Invert the loop • for (j = 0; (c = mesg[j]) != '\0'; j++) if (some pattern matches starting at mesg[j]) return 1; • No need to try every pattern at each position • Patterns stored in a table indexed by first character (see the sketch below)
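
A sketch of the inverted loop, under assumptions spelled out here: pat[] holds npat non-empty patterns (as in slide 6), and buildtable, starting[], nstarting[], patlen[], and isspam2 are names we introduce for illustration. The book chains the per-character pattern lists; a fixed two-dimensional array keeps this sketch short at the cost of about a megabyte of static storage.

#include <limits.h>
#include <string.h>

#define NPAT 1000                        /* assumed pattern capacity */
extern char *pat[NPAT];                  /* patterns, loaded elsewhere */
static int patlen[NPAT];                 /* lengths, computed once */
static int nstarting[UCHAR_MAX+1];       /* patterns per first byte */
static int starting[UCHAR_MAX+1][NPAT];  /* their indices in pat[] */

/* buildtable: index the patterns by first character, once, at startup */
void buildtable(int npat)
{
    int i;
    unsigned char c;

    for (i = 0; i < npat; i++) {
        c = (unsigned char) pat[i][0];
        patlen[i] = strlen(pat[i]);
        starting[c][nstarting[c]++] = i;
    }
}

/* isspam2: walk the message once; at each position try only the
   patterns that begin with the character found there */
int isspam2(const char *mesg)
{
    int i, j, k, p;
    unsigned char c;

    for (j = 0; (c = mesg[j]) != '\0'; j++)
        for (i = 0; i < nstarting[c]; i++) {
            p = starting[c][i];
            for (k = 1; k < patlen[p] && mesg[j+k] == pat[p][k]; k++)
                ;
            if (k == patlen[p])
                return 1;                /* pattern p matches at j */
        }
    return 0;
}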

  11. Timing • In Unix environment • time command • writes the total time elapsed, the time consumed by system overhead, and the compute time used to execute command • Example (time quicksort from chapter 2) • head -10000 < /usr/share/dict/words | shuffle > in.txt • gcc -o sort1 sort1.c quicksort.c • time sort1 < in.txt > /dev/null

  12. Algorithm Analysis • Consider the asymptotic analysis of your program and the algorithms you are using • For quicksort, let T(n) be the runtime as a function of the size of the input array (the time will depend on the particular input array!) • The expected runtime is Θ(n log n) • If each partition roughly splits the array in half, then the computing time satisfies T(n) ≈ 2T(n/2) + cn • The worst case is Θ(n²) • If each partition splits the array into two pieces of very unequal size (in the extreme, 1 and n-1) • T(n) = T(n-1) + cn = Θ(n²)
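
To see why the balanced recurrence gives Θ(n log n), unroll it (a standard expansion, written in LaTeX for clarity; c is the constant from the slide):

\begin{aligned}
T(n) &\approx 2T(n/2) + cn \\
     &\approx 4T(n/4) + 2cn \\
     &\;\;\vdots \\
     &\approx 2^k\,T(n/2^k) + k\,cn \\
     &\approx n\,T(1) + cn\log_2 n = \Theta(n \log n) \qquad (k = \log_2 n)
\end{aligned}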

  13. Worst Case for Quicksort • Modify the code to remove the random selection of the pivot (see the sketch below) • This makes it possible to deterministically construct a worst-case input (this is why randomization was used) • The worst case occurs for sorted or reverse-sorted input • For sorted input, the number of comparisons Q(n) as a function of input size satisfies • Q(n) = Q(n-1) + n-1, Q(1) = 0 • so Q(n) = n(n-1)/2
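
The modification, assuming the chapter-2 quicksort has the shape below (it relies on <stdlib.h> for rand and a swap(v, i, j) helper that exchanges two elements): deleting the marked line always makes v[0] the pivot, so already-sorted input splits into pieces of size 0 and n-1 at every level, giving n-1 comparisons per level and the Q(n) = Q(n-1) + n-1 recurrence above.

/* quicksort: sort v[0]..v[n-1] into increasing order */
void quicksort(int v[], int n)
{
    int i, last;

    if (n <= 1)                    /* nothing to do */
        return;
    swap(v, 0, rand() % n);        /* DELETE this line to de-randomize:
                                      v[0] then becomes the pivot and
                                      sorted input triggers the worst case */
    last = 0;
    for (i = 1; i < n; i++)        /* partition around v[0] */
        if (v[i] < v[0])
            swap(v, ++last, i);
    swap(v, 0, last);              /* restore pivot */
    quicksort(v, last);            /* sort each piece recursively */
    quicksort(v+last+1, n-last-1);
}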

  14. What does Asymptotic Analysis mean for Actual Runtimes • If T(n) = Θ(n²) • Doubling the input size increases the time by a factor of 4 • T(2n)/T(n) = (4cn² + o(n²))/(cn² + o(n²)), which in the limit equals 4 (o(n²) denotes lower-order terms) • If T(n) = Θ(n log n) • Doubling the input size roughly doubles the time [nearly the same as linear] • T(2n)/T(n) = (2cn log(2n) + o(n log n))/(cn log n + o(n log n)) = (2cn log n + o(n log n))/(cn log n + o(n log n)), which in the limit equals 2

  15. Empirical Confirmation • Run and time quicksort (without the random pivot) on sorted inputs of sizes 10,000, 20,000, and 40,000 • Compute the ratios of successive times to see if each is a factor of 4 • What if random inputs are used?

  16. Growth Rates and Limits • Suppose T(n) = Θ(f(n)) [grows at the same rate] • lim n→∞ T(n)/f(n) = c, a constant > 0 • [Strictly speaking this is not quite true: the limsup and liminf may differ, but as a first approximation you can view it as true] • Suppose T(n) = o(f(n)) [grows slower] • lim n→∞ T(n)/f(n) = 0 • Suppose T(n) = ω(f(n)) [grows faster] • lim n→∞ T(n)/f(n) = ∞

  17. Determining Growth Rate Empirically • Time quicksort with a range of input sizes • e.g. 1000, 2000, 3000, …, 10000 • Write a program that times the sort for a range of inputs; use the clock function to time code inside a program • T(1000), T(2000), T(3000), …, T(10000) • Plot the times over the range of inputs to visualize the growth • Compute ratios to compare to known functions • T(1000)/1000², T(2000)/2000², …, T(10000)/10000² • Does the ratio approach a constant, go to 0, or go to ∞? • I.e., is the time growing at the same rate, faster, or slower than the comparison function?

  18. Obtaining Range of Times • sortr 1000 10 1000 • sorts and times sorted arrays of sizes • 1000, 2000, 3000, …, 10000 • (a sketch of such a driver follows below)
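
A sketch of a driver like sortr (the argument meanings -- starting size, number of runs, step -- are our reading of "sortr 1000 10 1000"; the slide does not define them). It uses clock(), as slide 17 suggests, to time each sort from inside the program, and prints one size/seconds pair per line, ready for plotting.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

extern void quicksort(int v[], int n);   /* chapter-2 quicksort */

int main(int argc, char *argv[])
{
    int start, count, step, n, i, k;
    int *a;
    clock_t before;

    if (argc != 4) {
        fprintf(stderr, "usage: sortr start count step\n");
        return 1;
    }
    start = atoi(argv[1]);
    count = atoi(argv[2]);
    step  = atoi(argv[3]);
    for (k = 0; k < count; k++) {
        n = start + k * step;
        a = malloc(n * sizeof(int));
        for (i = 0; i < n; i++)          /* already-sorted input */
            a[i] = i;
        before = clock();
        quicksort(a, n);
        printf("%d\t%.3f\n", n,          /* size, seconds */
               (double)(clock() - before) / CLOCKS_PER_SEC);
        free(a);
    }
    return 0;
}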

  19. Profiling with gprof • Reports on time spent in different functions (also gives number of times functions are called) • Shows the hot spots • gcc -pg sort1.c quicksort.c -o sort1 • sort1 < in.40000 > /dev/null • gprof sort1 gmon.out

  20. Profiling with gcov • Uses instrumentation added by the compiler to count the number of times each statement in the source code is executed • $ gcc -fprofile-arcs -ftest-coverage sorti.c quicksorti.c -o sorti • $ sorti 10 • $ gcov sorti.c • $ gcov quicksorti.c

  21. Strategies for Speed • Concentrate on hot spots • Pay attention to which functions take the most time and how much time they take • Plot performance data • Highlights effects of parameter changes, comparisons of algorithms and data structures • Identifies unexpected behaviors

  22. Strategies for Speed • Use better algorithms and data structures • Be aware of space and time complexity of algorithms and data structures • Enable compiler optimizations • Not during code development, slows compilation • Check that code is still correct • Tune the code • Adjust details of loops and expressions • Check that code is still correct

  23. Strategies for Speed • Make sure each change continues to make program faster • Interaction between changes could slow code • Don’t optimize what doesn’t matter • Be sure you work on sections of code that take the most time • How much effort to make code faster? • The programmer time spent making a program faster should be less than the time the speed-up will recover in the lifetime of the program.

  24. Tuning the Code • Collect common subexpressions • Compute them only once • Replace expensive operations with cheaper ones • x*x*x vs. pow(x,3) • Don't use sqrt during distance calculations if not necessary (see the sketch below) • Unroll or eliminate loops
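
Two of these tunings in one small sketch (the function names are ours): the differences dx and dy are each computed once instead of twice, and because r is non-negative, comparing squared distances avoids sqrt entirely.

#include <math.h>

/* before: dx and dy each computed twice, plus an unnecessary sqrt */
int within_slow(double x1, double y1, double x2, double y2, double r)
{
    return sqrt((x2-x1)*(x2-x1) + (y2-y1)*(y2-y1)) < r;
}

/* after: common subexpressions collected, comparison done on
   squared values (valid because r >= 0) */
int within_fast(double x1, double y1, double x2, double y2, double r)
{
    double dx = x2 - x1, dy = y2 - y1;

    return dx*dx + dy*dy < r*r;
}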

  25. Tuning the Code • Cache frequently used values • Compute them only once • Write a special-purpose memory allocator (a sketch follows below) • Buffer input and output • Precompute results, e.g. strlen(pat) • Use approximate values (e.g. int instead of double) • Rewrite in a lower-level language
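
A minimal sketch of a special-purpose allocator (all names here are ours, not from the slides): freed fixed-size nodes are pushed onto a free list and reused, so after a warm-up most allocations become a pointer pop instead of a call into malloc.

#include <stdlib.h>

typedef struct Node Node;
struct Node {
    Node *next;                  /* links free nodes; payload fields go here */
};

static Node *freelist = NULL;

Node *newnode(void)
{
    Node *p;

    if (freelist != NULL) {      /* fast path: reuse a freed node */
        p = freelist;
        freelist = p->next;
        return p;
    }
    return malloc(sizeof(Node)); /* slow path: ask the system */
}

void freenode(Node *p)           /* O(1): push onto the free list */
{
    p->next = freelist;
    freelist = p;
}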

  26. Summary • Choose the right algorithm • Get code working correctly, then optimize • Measure, i.e. time and profile • Focus on a few places that will make the most difference • Verify correctness • Measure again (Rinse and repeat) • Stop optimizing as soon as possible
