Implementing & Accelerating 3D Geometry Transformations with MMX™ Technology

Intel MMX™ Technology Accelerating 3D Geometry Transformation Implementing & Accelerating 3D Geometry Transformations with MMX™ Technology Pei Qi & Yang Wang Electrical&Computer Engineering University of Wisconsin-Madison May, 2nd, 2006 ECE734 VLSI Array Structure for Digital Signal ProcessingInstructed by: Professor Hu

Intel MMX™ Technology Accelerating 3D Geometry Transformation • Background • Motivation • Implementation&Optimization 1. Identify the critical part of code 2. Reversing (Renaming) Register to find more paired instructions 3. Re-ordering instructions to break the dependency chains • Simulation 1. Correctness 2. Performance improvement (Execution time) • Conclusion

Intel MMX™ Technology Accelerating 3D Geometry Transformation • What is 3D Geometry Transformation? X Translation Rotation Shrink/Expands Y Z

Intel MMX™ Technology Accelerating 3D Geometry Transformation • Representation of 3D object • Matrix multiplication of vectors X,Y,Z : Coordinates W: Perspective Corrective Information Original vertex Transformation matrix Transformed vertex Translation Rotation Scaling

Intel MMX™ Technology Accelerating 3D Geometry Transformation Each matrix-vector multiplication amounts to a series of vector-vector multiplications, each of which is a series of scalar multiplies and adds: - Requires 16 multiplies and 12 adds for each vertex pixel - Lots of inherent parallelism

Intel MMX™ Technology Accelerating 3D Geometry Transformation • PMADDWD instruction – Packed Multiply and Add One PMADDWD instruction per row in the matrix, such that reduce the previous workload to 4 multiplies and 2 adds for each of vertex pixel

Intel MMX™ Technology Accelerating 3D Geometry Transformation • MOVQ instruction – Move 64 bits • PSRLQ – Packed Shift Right Logical • PSRAD – Packed Shift Right Arithmetic

Intel MMX™ Technology Accelerating 3D Geometry Transformation • mov eax,[esp]+ 4 • mov ebx,[esp]+ 8 • 3 mov ecx,[esp]+12 • 4 mov edx,[esp]+16 • 5 movq mm0, 0[eax] • 6 NextVect: • 7 movq mm3,[ebx] • 8 movq mm4,mm3 • 9 pmaddwd mm4,mm0 • 10 movq mm5,mm4 • 11 psrlq mm5,32 • 12 paddd mm4,mm5 • 13 moved [edx],mm4

Intel MMX™ Technology Accelerating 3D Geometry Transformation NextVect: 1 movq mm4, [ebx] 2 movq mm3, mm4 3 pmaddwd mm4, mm0 * 4 movq mm5, mm4 5 psrlq mm4, 32 * 6 paddd mm5, mm4 7 psrad mm5, 13 8 movd [edx], mm5 9 movq mm4, mm3 * 10 pmaddwd mm4, mm1 11 movq mm5, mm4 12 psrlq mm4, 32 * 13 paddd mm5, mm4 14 psrad mm5, 13 15 movd [edx+2], mm5 16 movq mm4, mm3 * 17 pmaddwd mm4, mm2 18 movq mm5, mm4 19 psrlq mm4, 32 * 20 paddd mm5, mm4 21 movd [edx+4], mm5 22 add ebx, 8 23 add edx, 6 * 24 dec ecx 1 25 jnz NextVect * NextVect: 1 movq mm3, [ebx] 2 movq mm4, mm3 3 pmaddwd mm4, mm0 4 movq mm5, mm4 5 psrlq mm5, 32 6 paddd mm4, mm5 7 psrad mm4, 13 8 movd [edx], mm4 9 movq mm4, mm3 * 10 pmaddwd mm4, mm1 11 movq mm5, mm4 12 psrlq mm5, 32 13 paddd mm4, mm5 14 psrad mm4, 13 15 movd [edx+2], mm4 16 movq mm4, mm3 * 17 pmaddwd mm4,mm2 18 movq mm5, mm4 19 psrlq mm5, 32 20 paddd mm4, mm5 21 movd [edx+4], mm4 22 add ebx, 8 23 add edx, 6 * 24 dec ecx 1 25 jnz NextVect * Reversing register to get more paired instructions

Intel MMX™ Technology Accelerating 3D Geometry Transformation NextVect: 1 movq mm4, [ebx] 2 movq mm3, mm4 3 pmaddwd mm4, mm0 1 - 1 4 movq mm5, mm4 5 psrlq mm4, 32 1 - 1 6 paddd mm5, mm4 8 movd [edx], mm5 9 movq mm4, mm3 1 - 1 10 pmaddwd mm4, mm1 11 movq mm5, mm4 12 psrlq mm4, 32 1 - 1 13 paddd mm5, mm4 15 movd [edx+2], mm5 16 movq mm4, mm3 1 - 1 17 pmaddwd mm4, mm2 18 movq mm5, mm4 19 psrlq mm4, 32 1 - 1 20 paddd mm5, mm4 21 movd [edx+4], mm5 22 add ebx, 8 23 add edx, 6 1 - 1 24 dec ecx 1 25 jnz NextVect 1 - 1 NextVect: 1 movq mm3, [ebx] 2 movq mm4, mm3 3 movq mm5, mm4 4 pmaddwd mm3, mm0 1- 1 5 pmaddwd mm4, mm1 6 pmaddwd mm5, mm2 7 movq mm6, mm3 2 - 1 8 psrlq mm3, 32 9 paddd mm3, mm6 10 movq mm6, mm4 1 - 1 11 psrlq mm4, 32 12 paddd mm4, mm6 13 movq mm6, mm5 1 - 1 14 psrlq mm5, 32 15 paddd mm5, mm6 19 movd [edx], mm3 20 movd [edx+2], mm4 21 movd [edx+4], mm5 22 add edx, 6 23 add ebx, 8 1 - 1 24 dec ecx 1 25 jnz NextVect 1 - 1 Re-ordering instructions to break the dependency chains

Intel MMX™ Technology Accelerating 3D Geometry Transformation • Simulation 1. Correctness (first concern) To testify if the optimized code yields the correct result, we compare output of optimized code with the result calculated by directly multiplies two matrix in Matlab. We found that the result of optimized code is coincident with what we expected. This proved that the optimization would not influence the correctness of program.

Intel MMX™ Technology Accelerating 3D Geometry Transformation • Simulation 2. Performance Improvement (Execution time)

Intel MMX™ Technology Accelerating 3D Geometry Transformation • Conclusion The new instructions in MMX™ Technology can effectively accelerate 3D geometry transformation. In addition, we can improve the performance of execution through some optimizations at instruction level, such as reversing register, re-ordering instruction without sacrificing the correctness of code.

Intel MMX™ Technology Accelerating 3D Geometry Transformation Thank you ECE734 VLSI Array Structure for Digital Signal ProcessingInstructed by: Professor Hu

Implementing & Accelerating 3D Geometry Transformations with MMX™ Technology