Memory Aliasing and Program Efficiency Explained | Code Optimization Techniques

Chapter 5 Optimizing Program Performance Guobao Jiang (蒋国宝) 11210240049@fudan.edu.cn loveaborn@foxmail.com

Problem 5.1 (P381) • The following problem illustrates the way memory aliasing can cause unexpected program behavior. Consider the following procedure to swap two values:

    void swap(int *xp, int *yp) {
        *xp = *xp + *yp; /* x+y */
        *yp = *xp - *yp; /* x+y-y = x */
        *xp = *xp - *yp; /* x+y-x = y */
    }

If this procedure is called with xp equal to yp, what effect will it have?
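A minimal sketch for experimenting with the aliased call. The helper name swap_with_self is an assumption introduced here; swap itself is copied from the problem.

```c
/* Arithmetic swap from Problem 5.1, unchanged. */
void swap(int *xp, int *yp) {
    *xp = *xp + *yp; /* x+y */
    *yp = *xp - *yp; /* x+y-y = x */
    *xp = *xp - *yp; /* x+y-x = y */
}

/* Hypothetical helper: passes the same address for both arguments
   and returns whatever value is left in the variable afterwards. */
int swap_with_self(int v) {
    swap(&v, &v);
    return v;
}
```

Calling swap_with_self with a few values and comparing against an ordinary swap(&a, &b) on two distinct variables makes the effect of aliasing visible.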

Problem 5.2 (P384) • Later in this chapter we will take a single function and generate many different variants that preserve the function’s behavior, but with different performance characteristics. For three of these variants, we found that the run times (in clock cycles) can be approximated by the following functions:
• Version 1: 60 + 35n
• Version 2: 136 + 4n
• Version 3: 157 + 1.25n
For what values of n would each version be the fastest of the three? Remember that n will always be an integer.
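One way to check an answer is to evaluate the three cost functions directly. The sketch below does that; the function names are assumptions introduced for illustration, and the coefficients are the ones given in the problem.

```c
/* Predicted run times in clock cycles, as given in Problem 5.2. */
double cycles_v1(int n) { return 60.0 + 35.0 * n; }
double cycles_v2(int n) { return 136.0 + 4.0 * n; }
double cycles_v3(int n) { return 157.0 + 1.25 * n; }

/* Hypothetical helper: returns 1, 2, or 3 for the version with the
   fewest predicted cycles at the given n (lowest number wins a tie). */
int fastest_version(int n) {
    double t1 = cycles_v1(n);
    double t2 = cycles_v2(n);
    double t3 = cycles_v3(n);
    if (t1 <= t2 && t1 <= t3) return 1;
    if (t2 <= t3) return 2;
    return 3;
}
```

Evaluating fastest_version for n = 0, 1, 2, ... shows where the fastest version changes.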

Problem 5.3 (P391) • Consider the following functions:

    int min(int x, int y) { return x < y ? x : y; }
    int max(int x, int y) { return x < y ? y : x; }
    void incr(int *xp, int v) { *xp += v; }
    int square(int x) { return x*x; }

• The following three code fragments call these functions:

Problem 5.3+ (P391)

A.  for (i = min(x, y); i < max(x, y); incr(&i, 1))
        t += square(i);

B.  for (i = max(x, y) - 1; i >= min(x, y); incr(&i, -1))
        t += square(i);

C.  int low = min(x, y);
    int high = max(x, y);
    for (i = low; i < high; incr(&i, 1))
        t += square(i);

Problem 5.3+ (P391) • Assume x = 10 and y = 100. Fill in the table:

    Code   min   max   incr   square
    A        1    91     90       90
    B       91     1     90       90
    C        1     1     90       90
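The call counts can be checked by instrumenting the four functions. In this sketch the global counters, reset_counts, and the run_a/run_b/run_c wrappers are assumptions added for measurement; the loop bodies are the fragments from the problem.

```c
/* Call counters: instrumentation only, not part of the original functions. */
int n_min, n_max, n_incr, n_square;

int min(int x, int y) { n_min++; return x < y ? x : y; }
int max(int x, int y) { n_max++; return x < y ? y : x; }
void incr(int *xp, int v) { n_incr++; *xp += v; }
int square(int x) { n_square++; return x * x; }

void reset_counts(void) { n_min = n_max = n_incr = n_square = 0; }

/* Fragment A: max() is re-evaluated on every loop test. */
int run_a(int x, int y) {
    int i, t = 0;
    reset_counts();
    for (i = min(x, y); i < max(x, y); incr(&i, 1))
        t += square(i);
    return t;
}

/* Fragment B: min() is re-evaluated on every loop test. */
int run_b(int x, int y) {
    int i, t = 0;
    reset_counts();
    for (i = max(x, y) - 1; i >= min(x, y); incr(&i, -1))
        t += square(i);
    return t;
}

/* Fragment C: both bounds are computed once, before the loop. */
int run_c(int x, int y) {
    int i, t = 0;
    int low, high;
    reset_counts();
    low = min(x, y);
    high = max(x, y);
    for (i = low; i < high; incr(&i, 1))
        t += square(i);
    return t;
}
```

After each run, the four counters hold the number of calls made by that fragment. All three fragments compute the same t.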

Problem 5.4 (P415)
• Question:
• Write C code for a procedure combine5px8 that shows how pointers, loop variables, and termination conditions are being computed by this code. Show the general form with arbitrary data and combining operation in the style of Figure 5.19 (P392). Describe how it differs from our handwritten pointer code (Figure 5.22).
• At times, GCC does its own …

    .L6
        addl (%eax), %edx
        addl 4(%eax), %edx
        addl 8(%eax), %edx
        addl 12(%eax), %edx
        addl 16(%eax), %edx
        addl 20(%eax), %edx
        addl 24(%eax), %edx
        addl 28(%eax), %edx
        addl $32, %eax
        addl $8, %ecx
        cmpl %esi, %ecx
        jl .L6
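As a rough illustration of the loop structure the assembly suggests, here is a sketch specialized to integer addition over a plain int array. It is not the book's combine5px8: the name sum_unrolled8, the plain-array interface, and the variable names are assumptions made here.

```c
/* Sketch: integer sum with eight-way unrolling. The data pointer advances
   by 8 elements (32 bytes) per iteration, while a separate integer index i
   is compared against a precomputed limit, mirroring the three updates at
   the bottom of the assembly loop. */
int sum_unrolled8(const int *data, int length) {
    int limit = length - 7;  /* last i for which 8 elements remain */
    int x = 0;
    int i;

    for (i = 0; i < limit; i += 8) {
        x = x + data[0] + data[1] + data[2] + data[3]
              + data[4] + data[5] + data[6] + data[7];
        data += 8;
    }
    /* Finish any remaining elements one at a time. */
    for (; i < length; i++) {
        x += *data;
        data++;
    }
    return x;
}
```

Note that the loop uses fixed offsets from a moving pointer together with an integer counter, rather than comparing the pointer against an end address.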

Problem 5.5 (P421) • The following shows the code generated from a variant of combine6 (P416) that uses eight-way loop unrolling and four-way parallelism.

    .L152
        addl (%eax), %ecx
        addl 4(%eax), %esi
        addl 8(%eax), %edi
        addl 12(%eax), %ebx
        addl 16(%eax), %ecx
        addl 20(%eax), %esi
        addl 24(%eax), %edi
        addl 28(%eax), %ebx
        addl $32, %eax
        addl $8, %edx
        cmpl -8(%ebp), %edx
        jl .L152

• Questions:
• A. What program variable has been spilled onto the stack?
• B. At what location on the stack?
• C. Why is this a good choice of which value to spill?
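For orientation, a C loop with eight-way unrolling and four-way parallelism might look like the sketch below. It is specialized to integer addition over a plain array; the function name and all variable names are assumptions, not the book's combine6 variant.

```c
/* Sketch: eight elements per iteration, spread over four independent
   accumulators so that consecutive additions do not depend on each other. */
int sum_unrolled8x4(const int *data, int length) {
    int limit = length - 7;
    int x0 = 0, x1 = 0, x2 = 0, x3 = 0;
    int i;

    for (i = 0; i < limit; i += 8) {
        x0 += data[0];
        x1 += data[1];
        x2 += data[2];
        x3 += data[3];
        x0 += data[4];
        x1 += data[5];
        x2 += data[6];
        x3 += data[7];
        data += 8;
    }
    /* Remaining elements go into one accumulator. */
    for (; i < length; i++) {
        x0 += *data;
        data++;
    }
    return x0 + x1 + x2 + x3;
}
```

Counting the values that are live inside the loop (four accumulators, the pointer, the index, and the loop bound) against the registers used in the assembly is a useful starting point for the questions.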

Problem 5.6 (P422) • Consider the following function for computing the product of an array of n integers. We have unrolled the loop by a factor of 3.

    int aprod(int a[], int n)
    {
        int i, x, y, z;
        int r = 1;
        for (i = 0; i < n-2; i += 3) {
            x = a[i]; y = a[i+1]; z = a[i+2];
            r = r * x * y * z; /* Product computation */
        }
        for (; i < n; i++)
            r *= a[i];
        return r;
    }

Problem 5.6+ (P422) • For the line labeled Product computation, we can use parentheses to create five different associations of the computation, as follows:

    r = ((r * x) * y) * z; /* A1 */
    r = (r * (x * y)) * z; /* A2 */
    r = r * ((x * y) * z); /* A3 */
    r = r * (x * (y * z)); /* A4 */
    r = (r * x) * (y * z); /* A5 */

• Recall from Figure 5.12 that the integer multiplication operation on this machine has a latency of 4 cycles and an issue time of 1 cycle.
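To compare associations in practice, two of them can be written out as separate functions. The names aprod_a1 and aprod_a4 are assumptions; the sketch also assumes the products stay within the range of int, since signed overflow is undefined in C.

```c
/* Association A1: every multiplication depends on the previous value of r. */
int aprod_a1(const int a[], int n) {
    int i, x, y, z;
    int r = 1;
    for (i = 0; i < n - 2; i += 3) {
        x = a[i]; y = a[i+1]; z = a[i+2];
        r = ((r * x) * y) * z;
    }
    for (; i < n; i++)
        r *= a[i];
    return r;
}

/* Association A4: x * (y * z) does not involve r, so only the final
   multiplication extends the dependency chain through r. */
int aprod_a4(const int a[], int n) {
    int i, x, y, z;
    int r = 1;
    for (i = 0; i < n - 2; i += 3) {
        x = a[i]; y = a[i+1]; z = a[i+2];
        r = r * (x * (y * z));
    }
    for (; i < n; i++)
        r *= a[i];
    return r;
}
```

Both functions return the same product; they differ only in how many multiplications per iteration must wait for the previous value of r. Whether a compiler preserves the written association in the generated code depends on the compiler and its settings.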

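Problem 5.6+ (P422) • The table that follows shows some values of the CPE and other values missing. Fill in the missing entries.

    A1: theoretical CPE 12/3 = 4.00
    A2: theoretical CPE 8/3 = 2.67
    A3: measured CPE 1.67, theoretical CPE 4/3 = 1.33
    A4: theoretical CPE 4/3 = 1.33
    A5: measured CPE 2.67, theoretical CPE 8/3 = 2.67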
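The theoretical values follow from counting the multiplications on the dependency chain through r in each iteration: 3 for A1, 2 for A2 and A5, and 1 for A3 and A4. A small sketch of that arithmetic is below; the function and macro names are assumptions, and the bound ignores every limit other than multiplier latency.

```c
/* Latency of integer multiplication in cycles, from the problem statement. */
#define MUL_LATENCY 4
/* Elements consumed per iteration of the unrolled loop. */
#define UNROLL 3

/* Latency-bound CPE when the given number of multiplications per
   iteration lie on the chain of dependencies through r. */
double theoretical_cpe(int mults_on_critical_path) {
    return (double)(mults_on_critical_path * MUL_LATENCY) / UNROLL;
}
```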

Problem 5.7 (P428) • A friend of yours has written …

    int deref(int *xp) {
        return xp ? *xp : 0;
    }

The compiler generates the following code for the body of the procedure.

    movl 8(%ebp), %edx      Get xp
    movl (%edx), %eax       Get *xp as result
    testl %edx, %edx        Test xp
    cmovzl %edx, %eax       If 0, copy 0 to result

Explain why this code does not provide a valid implementation of deref.
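For contrast, one way to keep the selection branch-free in spirit without ever loading through a null pointer is to select an address first and load afterwards. This is a sketch under assumptions: deref_safe and zero are names introduced here, and nothing guarantees that a given compiler will emit a conditional move for it.

```c
#include <stddef.h>

/* Chooses between the caller's pointer and the address of a known zero,
   then performs a single load through the chosen (always valid) pointer. */
int deref_safe(const int *xp) {
    static const int zero = 0;
    const int *p = xp ? xp : &zero;
    return *p;
}
```

The key difference from the assembly above is the order of operations: the test on xp happens before any memory access that depends on it.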

Problem 5.8 (P436) • As another example of code with potential load-store interactions, consider the following function to copy the contents of one array to another:

    void copy_array(int *src, int *dest, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            dest[i] = src[i];
    }

Suppose a is an array of length 1000 initialized so that each element a[i] equals i.

Problem 5.8+ (P436)
• A. What would be the effect of the call copy_array(a+1, a, 999)?
• B. What would be the effect of the call copy_array(a, a+1, 999)?
• C. Our performance measurements indicate that the call of part A has a CPE of 3.00, while the call of part B has a CPE of 5.00. To what factor do you attribute this performance difference?
• D. What performance would you expect for the call copy_array(a, a, 999)?
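Parts A and B can be tried directly. In the sketch below, copy_array is taken from the problem, while init_identity is a hypothetical helper that sets a[i] = i.

```c
/* Element-by-element copy, as given in Problem 5.8. */
void copy_array(int *src, int *dest, int n) {
    int i;
    for (i = 0; i < n; i++)
        dest[i] = src[i];
}

/* Hypothetical helper: initializes a[i] = i for 0 <= i < n. */
void init_identity(int *a, int n) {
    int i;
    for (i = 0; i < n; i++)
        a[i] = i;
}
```

Calling init_identity(a, 1000) before each of the two overlapping copies and then inspecting a[0], a[1], and a[999] shows how the direction of the overlap changes the result: in one case each load reads a value that has not yet been overwritten, in the other each load reads the value stored by the previous iteration.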

Problem 5.9 (P443) • Suppose you work as a truck driver, and you have been hired to carry a load of potatoes from Boise, Idaho to Minneapolis, Minnesota, a total distance of 2500 kilometers. You estimate you can average 100 km/hr driving within the speed limits, requiring a total of 25 hours for the trip.
• A. You hear on the news that Montana has just abolished its speed limit, which constitutes 1500 km of the trip. Your truck can travel at 150 km/hr. What will be your speedup for the trip?
• B. You can buy a new turbocharger for your truck at www.fasttrucks.com. They stock a variety of models, but the faster you want to go, the more it will cost. How fast must you travel through Montana to get an overall speedup for your trip of 5/3?
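The trip arithmetic can be written as two small functions. The names trip_hours and trip_speedup and their parameter order are assumptions made for this sketch.

```c
/* Total driving time in hours when fast_km of the route is driven at
   fast_kmh and the remainder at base_kmh. */
double trip_hours(double total_km, double base_kmh,
                  double fast_km, double fast_kmh) {
    return (total_km - fast_km) / base_kmh + fast_km / fast_kmh;
}

/* Speedup relative to driving the whole route at base_kmh. */
double trip_speedup(double total_km, double base_kmh,
                    double fast_km, double fast_kmh) {
    double old_hours = total_km / base_kmh;
    return old_hours / trip_hours(total_km, base_kmh, fast_km, fast_kmh);
}
```

For part A, trip_speedup(2500, 100, 1500, 150) gives 25/20 = 1.25. For part B, trying different values of fast_kmh until the result reaches 5/3 is one way to check an algebraic answer.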

Problem 5.10 (P444) • The marketing department at your company has promised your customers that the next software release will show a 2X performance improvement. You have been assigned the task of delivering on that promise. You have determined that only 80% of the system can be improved. How much (i.e., what value of k) would you need to improve this part to meet the overall performance target ?
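This is an application of Amdahl's law: if a fraction α of the system is sped up by a factor k, the overall speedup is S = 1 / ((1 - α) + α/k). A sketch of the formula and its inverse follows; the function names are assumptions, and required_k is only meaningful when 1/S is greater than 1 - α (otherwise the target cannot be reached).

```c
/* Overall speedup when a fraction alpha of the run time is improved
   by a factor k (Amdahl's law). */
double amdahl_speedup(double alpha, double k) {
    return 1.0 / ((1.0 - alpha) + alpha / k);
}

/* Factor k needed on the improvable fraction alpha to reach the target
   overall speedup. Assumes 1/target > 1 - alpha. */
double required_k(double alpha, double target) {
    return alpha / (1.0 / target - (1.0 - alpha));
}
```

With α = 0.8 and S = 2, the inverse reduces to k = 0.8 / (0.5 - 0.2) = 0.8 / 0.3.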

Summary
• 1. Optimization blockers
    A. memory aliasing
    B. function calls
• 2. Performance improvement techniques
    A. High-level design: algorithms and data structures
    B. Basic coding principles: eliminate excessive function calls; eliminate unnecessary memory references
    C. Low-level optimizations: pointer versus array code; reduce loop overhead by unrolling loops; make use of the pipelined functional units by iteration splitting
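As a small illustration of how two of these points interact, the sketch below contrasts a loop that updates memory on every iteration with one that accumulates in a local variable. The function names are assumptions introduced here.

```c
/* Reads and writes *dest on every iteration. Because dest might point
   into data, a compiler generally cannot keep the sum in a register. */
void sum_to_dest(const int *data, int n, int *dest) {
    int i;
    *dest = 0;
    for (i = 0; i < n; i++)
        *dest += data[i];
}

/* Accumulates in a local variable and stores once at the end,
   eliminating the per-iteration memory reference. */
void sum_local(const int *data, int n, int *dest) {
    int i;
    int acc = 0;
    for (i = 0; i < n; i++)
        acc += data[i];
    *dest = acc;
}
```

The two versions agree when dest does not point into data. When it does, they can produce different results, which is why memory aliasing blocks a compiler from making this transformation on its own.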

Assignments
• 5.15 (P448)
• 5.17 (P448)
• 5.19 (P450)
• Notes: Due next Monday (May 28, 2012)
• These slides will be uploaded to ftp:10.141.247.12

Q&A ? Thank you!

website • http://jpkc.fudan.edu.cn/s/258/main.htm • http://10.108.0.74/s/258/main.jspy • ftp: 10.141.247.12 usr:ics2012 pwd:ics2012
