Performance* • Objective: To learn when and how to optimize the performance of a program. • “The first principle of optimization is don’t.” • Knowing how a program will be used and the environment it runs in, is there any benefit to making it faster? *The examples in these slides come from Brian W. Kernighan and Rob Pike, “The Practice of Programming”, Addison-Wesley, 1999.
Approach • The best strategy is to use the simplest, cleanest algorithms and data structures appropriate for the task. • Then measure performance to see if changes are needed. • Enable compiler options to generate the fastest possible code.
Approach • Assess what changes to the program will have the most effect (profile the code). • Make changes one at a time and re-assess (always retest to verify correctness). • Consider alternative algorithms • Tune the code • Consider a lower level language (just for time sensitive components)
Topics • A Bottleneck • Timing and Profiling • time and clock • algorithm analysis • prof and gprof • gcov • Concentrate on hot spots • Strategies for speed • Tuning the code
A Bottleneck • isspam example from the text • Heavily used • Existing implementation not fast enough in current environment • Benchmark • Profile • Tune code • Change algorithm
isspam()
/* isspam: test mesg for occurrence of any pat */
int isspam(char *mesg)
{
    int i;

    for (i = 0; i < npat; i++)
        if (strstr(mesg, pat[i]) != NULL) {
            printf("spam: match for '%s'\n", pat[i]);
            return 1;
        }
    return 0;
}
strstr()
/* simple strstr: use strchr to look for first character */
char *strstr(const char *s1, const char *s2)
{
    int n;

    n = strlen(s2);                     /* recomputed on every call */
    for (;;) {
        s1 = strchr(s1, s2[0]);         /* find next candidate start */
        if (s1 == NULL)
            return NULL;
        if (strncmp(s1, s2, n) == 0)    /* compare the full pattern */
            return (char *) s1;
        s1++;
    }
}
Inefficiencies • strlen() is used to calculate pattern length • But patterns are fixed, so calculate once and save • strncmp() has complex inner loop • Comparing string bytes • Checking for \0 • Counting down • Know string lengths, so don’t check for \0
Inefficiencies • strchr() also checks for \0 • This is unnecessary • Overhead of function calls to strchr(), strlen() and strncmp() adds up • Make no function calls in strstr() • Making these changes gave 30% speed-up • But still too slow!
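The tuning steps listed above can be sketched as follows. This is a minimal illustration, not the code from the text: the function name `find_pat` and its `patlen` parameter are assumptions introduced here. It takes a pattern length computed once by the caller and makes no library calls. It still tests for the terminating `\0` in the outer scan, because the message length is not assumed to be known.

```c
#include <stddef.h>

/* find_pat: hypothetical tuned replacement for strstr(mesg, pat).
 * patlen must equal strlen(pat), computed once by the caller, and be > 0.
 * Returns a pointer to the first match in mesg, or NULL if there is none. */
const char *find_pat(const char *mesg, const char *pat, size_t patlen)
{
    const char *s, *p, *q;
    size_t k;

    for (s = mesg; *s != '\0'; s++) {
        if (*s != pat[0])
            continue;                   /* cheap first-character filter */
        /* Compare the remaining bytes inline. The loop stops at the first
         * mismatch, which also covers reaching the end of mesg, because
         * '\0' never equals one of the first patlen bytes of pat. */
        p = s + 1;
        q = pat + 1;
        for (k = 1; k < patlen && *p == *q; k++, p++, q++)
            ;
        if (k == patlen)
            return s;
    }
    return NULL;
}
```

A caller would store `strlen(pat[i])` for each pattern once at startup and pass the saved value on every call.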
Further Improvements • Analyze and improve algorithm • for (i = 0; i < npat; i++) if (strstr(mesg, pat[i]) != NULL) return 1; • Invert loop • for (j = 0; (c = mesg[j]) != '\0'; j++) if (some pattern matches starting at mesg[j]) return 1; • Don't need to iterate through all patterns • Patterns stored in table
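One way to realize the inverted loop is to group patterns by their first byte, so that each message position is checked only against patterns that could start there. The sketch below is an assumption-laden illustration: the pattern list, the table names (`patlen`, `starting`, `nstarting`) and `build_tables` are introduced here and are not taken from the slides.

```c
#include <string.h>

enum { NPAT = 3, NSTART = 256 };

/* Hypothetical pattern list, for illustration only. */
static const char *pat[NPAT] = { "free money", "act now", "winner" };
static size_t patlen[NPAT];             /* length of each pattern */
static int starting[NSTART][NPAT];      /* pattern indices, by first byte */
static int nstarting[NSTART];           /* number of patterns per first byte */

/* build_tables: precompute pattern lengths and group patterns by first byte */
void build_tables(void)
{
    int i;
    unsigned char c;

    memset(nstarting, 0, sizeof nstarting);
    for (i = 0; i < NPAT; i++) {
        c = (unsigned char) pat[i][0];
        starting[c][nstarting[c]++] = i;
        patlen[i] = strlen(pat[i]);
    }
}

/* isspam: return 1 if any pattern occurs in mesg, 0 otherwise */
int isspam(const char *mesg)
{
    int i, j, k;
    unsigned char c;

    for (j = 0; (c = (unsigned char) mesg[j]) != '\0'; j++) {
        /* try only the patterns whose first byte is c */
        for (i = 0; i < nstarting[c]; i++) {
            k = starting[c][i];
            if (strncmp(mesg + j, pat[k], patlen[k]) == 0)
                return 1;
        }
    }
    return 0;
}
```

`build_tables()` must run once before the first call to `isspam()`. `strncmp` is used here because it stops at the end of the message; a length-aware comparison could replace it if the message length were known.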
Timing • In Unix environment • time command • writes the total time elapsed, the time consumed by system overhead, and the compute time used to execute command • Example (time quicksort from chapter 2) • head -10000 < /usr/share/dict/words | shuffle > in.txt • gcc -o sort1 sort1.c quicksort.c • time sort1 < in.txt > /dev/null
Algorithm Analysis • Consider the asymptotic analysis of your program and the algorithms you are using • For quicksort, let $T(n)$ be the runtime as a function of the size $n$ of the input array (the time will also depend on the particular input array!) • The expected runtime is $\Theta(n \log n)$ • If each partition roughly splits the array in half, then the computing time satisfies $T(n) \approx 2T(n/2) + cn$ • The worst case is $\Theta(n^2)$ • If each partition splits the array into two pieces of unequal size (in the extreme, 1 and $n-1$) • $T(n) = T(n-1) + cn = \Theta(n^2)$
Worst Case for Quicksort • Modify the code to remove the random selection of the pivot • This makes it possible to deterministically construct a worst case input (this is why randomization was used) • The worst case will occur for sorted or reverse sorted input • For sorted input, the number of comparisons Q(n) as a function of input size satisfies • Q(n) = Q(n-1) + n-1, Q(1) = 0 • Q(n) = n(n-1)/2
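The recurrence above can be checked by counting comparisons directly. The sketch below assumes a quicksort that always picks the first element as the pivot; the names `quicksort_first`, `ncompare` and `count_sorted` are hypothetical and introduced only for this illustration.

```c
#include <stddef.h>
#include <stdlib.h>

static unsigned long ncompare;  /* comparisons made since last reset */

/* swap: exchange v[i] and v[j] */
static void swap(int v[], size_t i, size_t j)
{
    int tmp = v[i];
    v[i] = v[j];
    v[j] = tmp;
}

/* quicksort_first: quicksort with v[0] as pivot (no randomization) */
void quicksort_first(int v[], size_t n)
{
    size_t i, last;

    if (n <= 1)
        return;
    last = 0;
    for (i = 1; i < n; i++) {           /* n-1 comparisons per call */
        ncompare++;
        if (v[i] < v[0])
            swap(v, ++last, i);
    }
    swap(v, 0, last);                   /* put pivot in its final place */
    quicksort_first(v, last);
    quicksort_first(v + last + 1, n - last - 1);
}

/* count_sorted: comparisons needed to sort the sorted input 0, 1, ..., n-1;
 * returns 0 if n is 0 or memory cannot be allocated */
unsigned long count_sorted(size_t n)
{
    size_t i;
    int *v;

    if (n == 0 || (v = malloc(n * sizeof *v)) == NULL)
        return 0;
    for (i = 0; i < n; i++)
        v[i] = (int) i;
    ncompare = 0;
    quicksort_first(v, n);
    free(v);
    return ncompare;
}
```

On sorted input every partition is as lopsided as possible, so the count matches n(n-1)/2 exactly, and doubling n roughly quadruples it. The recursion depth also grows linearly with n on such input, so very large sizes may exhaust the stack.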
What Does Asymptotic Analysis Mean for Actual Runtimes? • If $T(n) = \Theta(n^2)$ • Doubling the input size increases the time by a factor of 4 • $T(2n)/T(n) = (4cn^2 + o(n^2))/(cn^2 + o(n^2))$, which in the limit is equal to 4. Here $o(n^2)$ stands for lower-order terms. • If $T(n) = \Theta(n \log n)$ • Doubling the input size roughly doubles the time [same as linear] • $T(2n)/T(n) = (2cn \log(2n) + o(n \log n))/(cn \log n + o(n \log n)) = (2cn \log n + o(n \log n))/(cn \log n + o(n \log n))$, which in the limit is equal to 2
Empirical Confirmation • Run and time quicksort (without random pivot) on sorted inputs of size 10,000, 20,000, and 40,000 • Compute the ratio of successive times to see if it is a factor of 4. • What if random inputs are used?
Growth Rates and Limits • Suppose $T(n) = \Theta(f(n))$ [grows at same rate] • $\lim_{n \to \infty} T(n)/f(n) = c$, a constant > 0. • [Strictly speaking this is not always true: the limit may not exist, and there may be separate limsup and liminf, but as a first approximation you can view it as true.] • Suppose $T(n) = o(f(n))$ [grows slower] • $\lim_{n \to \infty} T(n)/f(n) = 0$ • Suppose $T(n) = \omega(f(n))$ [grows faster] • $\lim_{n \to \infty} T(n)/f(n) = \infty$
Determining Growth Rate Empirically • Time quicksort with a range of input sizes • e.g. 1000, 2000, 3000, …, 10000 • Write a program that times sort for a range of inputs. Use the clock function to time code inside a program. • $T(1000), T(2000), T(3000), \ldots, T(10000)$ • Plot times for the range of inputs to visualize • Compute ratios to compare to known functions • $T(1000)/1000^2, T(2000)/2000^2, \ldots, T(10000)/10000^2$ • Does the ratio approach a constant, go to 0, or go to $\infty$? • I.e., is it growing at the same rate, slower, or faster than the comparison function?
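Timing code from inside a program with `clock` might look like the sketch below. It is an illustration under stated assumptions: the library `qsort` stands in for the quicksort under study, the input is pseudo-random rather than sorted, and the names `time_sort`, `ratio_n2` and `icmp` are hypothetical.

```c
#include <stdlib.h>
#include <time.h>

/* icmp: integer comparison for qsort */
static int icmp(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;

    return (x > y) - (x < y);
}

/* time_sort: CPU seconds to sort n pseudo-random ints with qsort;
 * returns -1.0 if the array cannot be allocated */
double time_sort(size_t n)
{
    int *v;
    size_t i;
    clock_t before, after;

    v = malloc(n * sizeof *v);
    if (v == NULL)
        return -1.0;
    for (i = 0; i < n; i++)
        v[i] = rand();
    before = clock();                   /* time only the sort itself */
    qsort(v, n, sizeof v[0], icmp);
    after = clock();
    free(v);
    return (double) (after - before) / CLOCKS_PER_SEC;
}

/* ratio_n2: scaled time t / n^2, for comparison against f(n) = n^2 */
double ratio_n2(double t, size_t n)
{
    return t / ((double) n * (double) n);
}
```

Calling `time_sort(n)` for n = 1000, 2000, …, 10000 and printing `ratio_n2` for each gives the sequence of ratios to inspect. For small n the measured time may be below the resolution of `clock`, so repeating the sort several times and averaging is often necessary.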
Obtaining Range of Times • sortr 1000 10 1000 • sorts and times sorted arrays of size • 1000, 2000, 3000,…,10000
Profiling with gprof • Reports on time spent in different functions (also gives number of times functions called) • Shows the hotspots • gcc -pg sort1.c quicksort.c -o sort1 • sort1 < in.40000 > /dev/null • gprof sort1 gmon.out
Profiling with gcov • Uses source code analysis provided by the compiler to count the number of times each statement in the source code is executed. • $ gcc -fprofile-arcs -ftest-coverage sorti.c quicksorti.c -o sorti • $ sorti 10 • $ gcov sorti.c • $ gcov quicksorti.c
Strategies for Speed • Concentrate on hot spots • Pay attention to which functions take the most time and how much time they take • Plot performance data • Highlights effects of parameter changes, comparisons of algorithms and data structures • Identifies unexpected behaviors
Strategies for Speed • Use better algorithms and data structures • Be aware of space and time complexity of algorithms and data structures • Enable compiler optimizations • Not during code development, slows compilation • Check that code is still correct • Tune the code • Adjust details of loops and expressions • Check that code is still correct
Strategies for Speed • Make sure each change continues to make program faster • Interaction between changes could slow code • Don’t optimize what doesn’t matter • Be sure you work on sections of code that take the most time • How much effort to make code faster? • The programmer time spent making a program faster should be less than the time the speed-up will recover in the lifetime of the program.
Tuning the Code • Collect common subexpressions • Compute them only once • Replace expensive operations with cheaper ones • x*x*x vs. pow(x,3) • Don’t use sqrt during distance calculations if not necessary • Unroll or eliminate loops
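Two of the tuning ideas above can be sketched in a few lines. The `Point` type and the function names are hypothetical, introduced only for this illustration. Whether either change helps in a given program is something to measure, since compilers and math libraries may already optimize these cases.

```c
/* cube: x*x*x with two multiplications instead of a call to pow(x, 3) */
double cube(double x)
{
    return x * x * x;
}

/* Point: hypothetical 2-D point type for the distance example */
typedef struct {
    double x, y;
} Point;

/* dist2: squared Euclidean distance between a and b (no sqrt) */
double dist2(Point a, Point b)
{
    double dx = a.x - b.x;              /* common subexpressions, */
    double dy = a.y - b.y;              /* each computed only once */

    return dx * dx + dy * dy;
}

/* closer: 1 if a is nearer to p than b is, 0 otherwise.
 * Comparing squared distances gives the same ordering as comparing
 * distances, because sqrt is monotonic on non-negative values. */
int closer(Point p, Point a, Point b)
{
    return dist2(p, a) < dist2(p, b);
}
```

The square root is only needed when the actual distance is reported; comparisons and nearest-neighbor searches can work on squared values throughout.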
Tuning the Code • Cache frequently-used values • Compute them only once • Write a special-purpose memory allocator • Buffer input and output • Precompute results, e.g. strlen(pat) • Use approximate values, double vs. int • Rewrite in a lower-level language
Summary • Choose the right algorithm • Get code working correctly, then optimize • Measure, i.e. time and profile • Focus on a few places that will make the most difference • Verify correctness • Measure again (Rinse and repeat) • Stop optimizing as soon as possible