Vectorised / Semi-Parallel Interval Multiplication

Eoin Malins eoin@infc.ulst.ac.uk

Topics
• The problem with Floating-Point (FP) arithmetic.
• Interval Arithmetic.
• Interval Multiplication.
• Vectorisation.
• Results.
Eoin Malins: Vectorised / Semi-Parallel Interval Multiplication

Floating-Point Rounding Errors
• Generate two random numbers.
• Add them to a variable (Error).
• Subtract the same numbers from that variable.
• The result is not the number you expected.
    float Error = 0.0, A = 0.0, B = 0.0;
    loop {
        A = rand();
        B = rand();
        Error = Error + A;
        Error = Error + B;
        Error = Error - A;
        Error = Error - B;
        Print ( Error );
    }
• The variable “Error” will, in general, not equal zero (0).
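The experiment above can be made reproducible with fixed inputs instead of random ones. The following is a minimal sketch, assuming IEEE 754 double precision; the function names `add_then_subtract` and `accumulated_error` and the inputs 0.1 and 0.2 are illustrative choices, not taken from the slides.

```c
#include <assert.h>

/* Add a and b to an accumulator, then subtract them again in the same order.
   In exact arithmetic the result equals the starting value. */
static double add_then_subtract(double start, double a, double b)
{
    volatile double acc = start;  /* volatile: round every step to double */
    acc = acc + a;
    acc = acc + b;
    acc = acc - a;
    acc = acc - b;
    return acc;
}

/* Repeat the round trip n times with the fixed inputs 0.1 and 0.2,
   feeding the residual of each pass into the next one. */
static double accumulated_error(int n)
{
    assert(n >= 0);
    double error = 0.0;
    for (int i = 0; i < n; ++i)
        error = add_then_subtract(error, 0.1, 0.2);
    return error;
}
```

Calling `add_then_subtract(0.0, 0.1, 0.2)` should leave a tiny non-zero residual, because 0.1 + 0.2 is not exactly representable, whereas inputs such as 1.0 and 2.0 cancel exactly.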

[Figure: IEEE 754 Double Precision Floating-Point Rounding Errors Over 1000 Iterations]


Interval Arithmetic
• An interval is defined as:
• [a, b] = {x ∈ ℝ : a ≤ x ≤ b}
• Many, if not all, calculations suffer from errors introduced by rounding, truncation and results that are not representable on the target architecture.
• Intervals prescribe the upper and lower bounds of these errors.
• The ‘true’ value must lie between these bounds.

Interval Example
Assume a system with one (1) decimal place of accuracy.
The value would be rounded towards the upper and lower bounds of the values the equipment is capable of representing:
Pi* = 3.14159265358979…
Pi = [3.1, 3.2]
[Figure: number line marked 3.0, 3.1, 3.2, 3.3, 3.4]
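The enclosure of Pi in this example can be sketched in code. This is an illustration only: the `Interval` type and the `enclose` helper are hypothetical names, and the decimal bounds are themselves stored as binary doubles, so they are the nearest representable values to 3.1 and 3.2 rather than exact decimals.

```c
#include <assert.h>

typedef struct { double lo, hi; } Interval;

/* Enclose value between the neighbouring multiples of 10^-digits. */
static Interval enclose(double value, int digits)
{
    assert(digits >= 0 && digits <= 9);
    double scale = 1.0;
    for (int i = 0; i < digits; ++i)
        scale *= 10.0;

    double scaled = value * scale;
    assert(scaled > -1e15 && scaled < 1e15);  /* keep the cast in range */

    long long down = (long long)scaled;       /* truncates toward zero */
    if ((double)down > scaled)
        down -= 1;                            /* negative input: round down */
    long long up = ((double)down < scaled) ? down + 1 : down;

    Interval r = { (double)down / scale, (double)up / scale };
    return r;
}
```

With one decimal place, `enclose(3.14159265358979, 1)` gives the bounds 3.1 and 3.2.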

Interval Multiplication
• Brute-Force method
• 9-Case method
• Integer-Based method

Brute-Force Multiplication
[Xl, Xu] x [Yl, Yu]: compute all four products XlYl, XlYu, XuYl and XuYu, then find the minimum and the maximum.
[-1.1, 2.2] x [3.3, 4.4]
    XlYl = -3.63
    XlYu = -4.84
    XuYl = 7.26
    XuYu = 9.68
Find Min : -4.84, rounded down to -4.9
Find Max : 9.68, rounded up to 9.7
Result = [-4.9, 9.7]
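A minimal sketch of the brute-force method follows. The `Interval` type and the function names are assumptions made for illustration, and directed (outward) rounding of the bounds is deliberately left out, so the bounds are whatever the default rounding mode produces.

```c
#include <assert.h>

typedef struct { double lo, hi; } Interval;

static double min2(double a, double b) { return a < b ? a : b; }
static double max2(double a, double b) { return a > b ? a : b; }

/* Brute force: form all four endpoint products, then take min and max. */
static Interval mul_brute_force(Interval x, Interval y)
{
    assert(x.lo <= x.hi && y.lo <= y.hi);
    double p[4] = { x.lo * y.lo, x.lo * y.hi, x.hi * y.lo, x.hi * y.hi };

    Interval z = { p[0], p[0] };
    for (int i = 1; i < 4; ++i) {
        z.lo = min2(z.lo, p[i]);
        z.hi = max2(z.hi, p[i]);
    }
    return z;
}
```

For X = [-1.1, 2.2] and Y = [3.3, 4.4] the lower bound comes from XlYu (about -4.84) and the upper bound from XuYu (about 9.68). The method always costs four multiplications and six comparisons, whatever the signs of the operands.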

9-Case Method of Interval Multiplication
Case  Conditions                    Zl               Zu
1     Xl ≥ 0,       Yl ≥ 0          XlYl             XuYu
2     Xl ≥ 0,       Yu < 0          XuYl             XlYu
3     Xu < 0,       Yl ≥ 0          XlYu             XuYl
4     Xu < 0,       Yu < 0          XuYu             XlYl
5     Xl < 0 ≤ Xu,  Yl ≥ 0          XlYu             XuYu
6     Xl < 0 ≤ Xu,  Yu < 0          XuYl             XlYl
7     Xl ≥ 0,       Yl < 0 ≤ Yu     XuYl             XuYu
8     Xu < 0,       Yl < 0 ≤ Yu     XlYu             XlYl
9     Xl < 0 ≤ Xu,  Yl < 0 ≤ Yu     min(XlYu, XuYl)  max(XlYl, XuYu)
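The case analysis can be sketched as nested sign tests. This is an illustrative version, not the presenter's code: the `Interval` type and the helper names are assumptions, and rounding control is omitted. Only the last case, where both operands contain zero, needs four products.

```c
#include <assert.h>

typedef struct { double lo, hi; } Interval;

static double min2(double a, double b) { return a < b ? a : b; }
static double max2(double a, double b) { return a > b ? a : b; }
static Interval make(double lo, double hi) { Interval r = { lo, hi }; return r; }

/* 9-case multiplication: classify each operand by sign, then compute
   only the products that can form the bounds. */
static Interval mul_nine_case(Interval x, Interval y)
{
    assert(x.lo <= x.hi && y.lo <= y.hi);

    if (x.lo >= 0) {                                   /* X non-negative */
        if (y.lo >= 0) return make(x.lo * y.lo, x.hi * y.hi);
        if (y.hi <  0) return make(x.hi * y.lo, x.lo * y.hi);
        return make(x.hi * y.lo, x.hi * y.hi);         /* Y contains 0 */
    }
    if (x.hi < 0) {                                    /* X negative */
        if (y.lo >= 0) return make(x.lo * y.hi, x.hi * y.lo);
        if (y.hi <  0) return make(x.hi * y.hi, x.lo * y.lo);
        return make(x.lo * y.hi, x.lo * y.lo);         /* Y contains 0 */
    }
    /* X contains 0 */
    if (y.lo >= 0) return make(x.lo * y.hi, x.hi * y.hi);
    if (y.hi <  0) return make(x.hi * y.lo, x.lo * y.lo);
    return make(min2(x.lo * y.hi, x.hi * y.lo),
                max2(x.lo * y.lo, x.hi * y.hi));
}
```

The trade-off against brute force is fewer multiplications in eight of the nine cases, paid for with data-dependent branches.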

Integer-Based Interval Multiplication
X = [-1.1, 3.3]   Y = [2.2, 4.4]
X.Y = [-1.1, 3.3] x [2.2, 4.4]
Integer Magnitude : [1, 3] x [2, 4]
Reapply Signs : [-1, 3] x [2, 4]
Inner Product :
    [Xl]                [XlYl  XlYu]
    [Xu] x [Yl  Yu]  =  [XuYl  XuYu]
    [-1]                [-2   -4]
    [ 3] x [2   4]   =  [ 6   12]

Special Case 1/3
• According to interval specifications, the calculation 0 x ±∞ is defined as 0.
• In FP arithmetic, the multiplication of ∞ by 0 is undefined and produces the value ‘NaN’ (“Not a Number”).
• Subsequent arithmetic on a NaN results in another NaN, and every ordered comparison between a float and a NaN is false.
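One way to keep NaN out of the bounds is to test for a zero factor before multiplying. The sketch below assumes the convention that a zero endpoint times an infinite endpoint is 0; the helper names `is_nan` and `mul_endpoint` are hypothetical.

```c
#include <assert.h>
#include <math.h>   /* INFINITY */

/* NaN is the only value that compares unequal to itself. */
static int is_nan(double v) { return v != v; }

/* Endpoint product using the assumed interval convention
   0 * (+/-)infinity = 0, instead of the IEEE 754 result NaN. */
static double mul_endpoint(double a, double b)
{
    if (a == 0.0 || b == 0.0)
        return 0.0;
    double p = a * b;
    /* With zero factors handled, NaN can only come from a NaN input. */
    assert(!is_nan(p) || is_nan(a) || is_nan(b));
    return p;
}
```

The extra comparison is the cost of the special case: a plain `0.0 * INFINITY` yields NaN, while `mul_endpoint(0.0, INFINITY)` yields 0.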

Special Case 2/3
• When a comparison with a NaN occurs, the interval bounds must normally be extended to [-∞, +∞] in order to ensure coverage.
• The product of X = and Y =
• Would be sorted as: [2, 3] x [-1, 4]

Special Case 3/3
[2, 3] x [-1, 4]

Scalar Minimum Function
    int min(int a, int b)
    {
        if (a < b) {
            return a;
        } else {
            return b;
        }
    }

Code Branching - Introduces Wait States
Instruction Stream                                Prefetch
Instruction 1: X = A + B                          Get Instruction 2
Instruction 2: Y = C + D                          Get Instruction 3
Instruction 3: IF X < Y                           Get? WAIT
Instruction n or m: RETURN X or RETURN Y          Get Instruction n+1 or m+1

Vectorised Minimum / IF Function
• vec_cmplt: Each element of the mask is TRUE if the corresponding element of A is less than that of B, otherwise FALSE.
• vec_sel: Each bit of the result is set to the corresponding bit of the first or second argument if the mask bit is 0 or 1 respectively. For a minimum, A must therefore be the second argument, so that A is selected wherever A is less than B.
Instruction Stream                                Prefetch
Instruction 1: mask = vec_cmplt (A, B)            Get Instruction 2
Instruction 2: result = vec_sel (B, A, mask)      Get Instruction 3
Instruction 3: RETURN RESULT                      Get Instruction 4
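The compare-and-select idea can be tried without AltiVec hardware. The sketch below is a portable emulation in plain C, not the real intrinsics: `cmplt4`, `sel4` and `min4` are hypothetical stand-ins that process four 32-bit lanes in a loop, where a real SIMD unit would process them in one instruction.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4   /* four 32-bit floats, mirroring one 128-bit vector */

/* Emulates vec_cmplt: a mask lane is all ones where a < b, else zero. */
static void cmplt4(const float a[LANES], const float b[LANES],
                   uint32_t mask[LANES])
{
    for (int i = 0; i < LANES; ++i)
        mask[i] = (uint32_t)0 - (uint32_t)(a[i] < b[i]);
}

/* Emulates vec_sel: each result bit comes from a where the mask bit
   is 0 and from b where the mask bit is 1. */
static void sel4(const float a[LANES], const float b[LANES],
                 const uint32_t mask[LANES], float out[LANES])
{
    for (int i = 0; i < LANES; ++i) {
        uint32_t ua, ub, ur;
        memcpy(&ua, &a[i], sizeof ua);
        memcpy(&ub, &b[i], sizeof ub);
        ur = (ua & ~mask[i]) | (ub & mask[i]);
        memcpy(&out[i], &ur, sizeof ur);
    }
}

/* Lane-wise minimum with no branch on the comparison result. */
static void min4(const float a[LANES], const float b[LANES],
                 float out[LANES])
{
    uint32_t mask[LANES];
    assert(sizeof(float) == sizeof(uint32_t));
    cmplt4(a, b, mask);
    sel4(b, a, mask, out);   /* take a where a < b, otherwise b */
}
```

Because the selection is done with bitwise AND and OR, the instruction stream is the same whichever operand is smaller, which is what removes the wait state of the scalar version.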

9-Case Bitmap and Vectorisation
• Each condition is evaluated to one bit (condition is true / condition is false) and the bits together form a bitmap.
• The bitmap is used as an INDEX (1 to 16) into a pointer array.
• Each entry of the pointer array points to one function:
    Function_case_1(Interval A, Interval B)
    Function_case_2(Interval A, Interval B)
    Function_case_3(Interval A, Interval B)
    Function_case_4(Interval A, Interval B)
    . .
    Function_case_15(Interval A, Interval B)
    Function_case_16(Interval A, Interval B)
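A pointer-array dispatch of this kind might look as follows. The bit assignment (one bit per test Xl < 0, Xu < 0, Yl < 0, Yu < 0), the zero-based index and all names are assumptions made for the sketch; the slides do not state which conditions form the bitmap. Seven of the sixteen patterns cannot occur for a valid interval and point to a guard function.

```c
#include <assert.h>

typedef struct { double lo, hi; } Interval;
typedef Interval (*MulCase)(Interval, Interval);

static double min2(double a, double b) { return a < b ? a : b; }
static double max2(double a, double b) { return a > b ? a : b; }
static Interval make(double lo, double hi) { Interval r = { lo, hi }; return r; }

/* Handlers: p = non-negative, n = negative, m = contains zero. */
static Interval case_pp(Interval x, Interval y) { return make(x.lo * y.lo, x.hi * y.hi); }
static Interval case_pn(Interval x, Interval y) { return make(x.hi * y.lo, x.lo * y.hi); }
static Interval case_pm(Interval x, Interval y) { return make(x.hi * y.lo, x.hi * y.hi); }
static Interval case_np(Interval x, Interval y) { return make(x.lo * y.hi, x.hi * y.lo); }
static Interval case_nn(Interval x, Interval y) { return make(x.hi * y.hi, x.lo * y.lo); }
static Interval case_nm(Interval x, Interval y) { return make(x.lo * y.hi, x.lo * y.lo); }
static Interval case_mp(Interval x, Interval y) { return make(x.lo * y.hi, x.hi * y.hi); }
static Interval case_mn(Interval x, Interval y) { return make(x.hi * y.lo, x.lo * y.lo); }
static Interval case_mm(Interval x, Interval y)
{
    return make(min2(x.lo * y.hi, x.hi * y.lo), max2(x.lo * y.lo, x.hi * y.hi));
}

/* Guard for bit patterns that imply lo >= 0 > hi (an invalid interval). */
static Interval case_invalid(Interval x, Interval y)
{
    (void)x; (void)y;
    assert(0 && "interval with lo > hi");
    return make(0.0, 0.0);
}

/* Index bits, most significant first: Xl<0, Xu<0, Yl<0, Yu<0. */
static const MulCase case_table[16] = {
    case_pp,      case_invalid, case_pm,      case_pn,       /* X >= 0     */
    case_invalid, case_invalid, case_invalid, case_invalid,  /* impossible */
    case_mp,      case_invalid, case_mm,      case_mn,       /* 0 in X     */
    case_np,      case_invalid, case_nm,      case_nn        /* X < 0      */
};

static Interval mul_dispatch(Interval x, Interval y)
{
    unsigned index = ((unsigned)(x.lo < 0) << 3) | ((unsigned)(x.hi < 0) << 2)
                   | ((unsigned)(y.lo < 0) << 1) |  (unsigned)(y.hi < 0);
    return case_table[index](x, y);
}
```

The four comparisons are independent of one another, so they can be evaluated together and replaced by a single indirect call in place of a chain of nested branches.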

Vectorised Interval Multiplication
• The 128-bit vectors A and B each hold four lanes: A holds Xl and Xu twice each, B holds Yl and Yu twice each.
• SIMD Multiply: A x B gives the product vector P.
• Rotate P: gives the rotated vectors R1 and R2.
• SIMD MIN: P compared with its rotations gives the min vector M.
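The multiply, rotate and min pipeline can be emulated with four-element arrays. This is a sketch under assumptions: the lane order A = (Xl, Xl, Xu, Xu), B = (Yl, Yu, Yl, Yu), the rotation distances 1 and 2, and every name below are illustrative, and the loops stand in for single SIMD instructions.

```c
#include <assert.h>

#define LANES 4   /* four 32-bit floats per 128-bit vector */

typedef struct { float lo, hi; } IntervalF;

static void mul4(const float a[LANES], const float b[LANES], float p[LANES])
{
    for (int i = 0; i < LANES; ++i) p[i] = a[i] * b[i];
}

/* Rotate lanes left by 'by' positions. */
static void rotate4(const float v[LANES], int by, float r[LANES])
{
    for (int i = 0; i < LANES; ++i) r[i] = v[(i + by) % LANES];
}

static void min4(const float a[LANES], const float b[LANES], float m[LANES])
{
    for (int i = 0; i < LANES; ++i) m[i] = a[i] < b[i] ? a[i] : b[i];
}

static void max4(const float a[LANES], const float b[LANES], float m[LANES])
{
    for (int i = 0; i < LANES; ++i) m[i] = a[i] > b[i] ? a[i] : b[i];
}

static IntervalF mul_vectorised(IntervalF x, IntervalF y)
{
    assert(x.lo <= x.hi && y.lo <= y.hi);
    const float a[LANES] = { x.lo, x.lo, x.hi, x.hi };
    const float b[LANES] = { y.lo, y.hi, y.lo, y.hi };
    float p[LANES], r1[LANES], r2[LANES];
    float lo1[LANES], lo2[LANES], hi1[LANES], hi2[LANES];

    mul4(a, b, p);            /* all four endpoint products at once */

    rotate4(p, 1, r1);
    min4(p, r1, lo1);         /* lo1[i] = min of lanes i and i+1 */
    rotate4(lo1, 2, r2);
    min4(lo1, r2, lo2);       /* every lane now holds the overall min */

    max4(p, r1, hi1);
    rotate4(hi1, 2, r2);
    max4(hi1, r2, hi2);       /* every lane now holds the overall max */

    IntervalF z = { lo2[0], hi2[0] };
    return z;
}
```

Two rotate-and-compare steps reduce four lanes to one value, so the minimum and maximum are found without any data-dependent branch. Packing the operands into `a` and `b` is the set-up overhead that a real implementation has to pay before the SIMD instructions can run.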


Results

[Figure: Program Size]

[Figure: Assembled Instructions]

[Figure: Average Time for 1x10^6 Interval Multiplications]

Summary
• The 9-case method was fastest, followed respectively by the brute-force, vectorised and finally integer implementations.
• The integer method of IA requires half the number of FP operations of its counterparts, has no special cases and is also applicable to interval division.
• The integer method is applicable mainly to systems with a high FPU (Floating-Point Unit) latency.
• The vectorised implementation was in fact slower due to the overheads incurred in setting up the appropriate data structures.

Questions
