
Tan Hongbing, Liu Sheng † , Chen Haiyan School of National University of Defense Technology

Modeling and Evaluation for Gather/Scatter Operations in Vector-SIMD Architectures. Presentation Outline: 1. Introduction 2. Models and Verification 3. Evaluation and Results


Presentation Transcript


  1. Tan Hongbing, Liu Sheng†, Chen Haiyan, School of National University of Defense Technology. Modeling and Evaluation for Gather/Scatter Operations in Vector-SIMD Architectures

  2. Presentation Outline: 1. Introduction 2. Models and Verification 3. Evaluation and Results 4. Conclusion

  3. Gather-Scatter in Vector-SIMD architectures. Gathers: vector of addrs → vector register. Scatters: vector register → vector of addrs. - Reads and writes to different sub-banks are performed in parallel. - Multiple reads or writes to the same sub-bank address are combined into a single access. - Reads are overlapped across different gathers; writes are overlapped across different scatters.
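
The access rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gather_cycles` is hypothetical, `num_banks` stands in for the sub-banks, and the low-order interleaving (`addr % num_banks`) is an assumed address mapping that the slides do not specify.

```python
from collections import defaultdict

def gather_cycles(addrs, num_banks):
    """Cycles needed to serve one gather under the slide's rules.

    Assumption: address `a` lives in bank `a % num_banks` (low-order
    interleaving) and each single-port bank serves one address per cycle.
    """
    per_bank = defaultdict(set)
    for a in addrs:
        # Duplicate addresses collapse in the set: they are combined
        # into a single access.
        per_bank[a % num_banks].add(a)
    # Different banks work in parallel, so the busiest bank sets the time.
    return max((len(s) for s in per_bank.values()), default=0)

print(gather_cycles([0, 1, 2, 3], 4))  # 1: four different banks
print(gather_cycles([0, 4, 8, 3], 4))  # 3: three addresses hit bank 0
print(gather_cycles([5, 5, 5, 5], 4))  # 1: same address, combined
```

The busiest bank determines the cycle count, which is why the later slides focus on the largest per-bank count.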

  4. Gather-Scatter in application

  5. Definition of Gather-Scatter. Gather: Scatter:
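
The formulas on this slide did not survive in the transcript. As a stand-in, the sketch below shows the usual element-wise meaning of the two operations; the function names and the "last lane wins" behavior on duplicate scatter addresses are assumptions, not taken from the slides.

```python
def gather(memory, addrs):
    """Return a vector register: element i is memory[addrs[i]]."""
    return [memory[a] for a in addrs]

def scatter(memory, addrs, values):
    """Write values[i] to memory[addrs[i]] for every lane i."""
    for a, v in zip(addrs, values):
        memory[a] = v  # assumption: on duplicate addresses the last lane wins

memory = [10, 11, 12, 13, 14, 15, 16, 17]
reg = gather(memory, [6, 0, 3, 3])
print(reg)     # [16, 10, 13, 13]
scatter(memory, [1, 7, 2, 5], reg)
print(memory)  # [10, 16, 13, 13, 14, 13, 16, 10]
```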

  6. Gather/scatter operations are stochastic and complicated, and their hardware design lacks theoretical analysis and modeling. • What are the possible distributions of access locations for given PE and memory bank counts? • What is the probability of each distribution? • How can the hardware implementation be optimized in detail? The proposed model answers these questions.

  7. Presentation Outline: 1. Introduction 2. Models and Verification 3. Evaluation and Results 4. Conclusion

  8. Example. Both the SIMD width and the number of (single-port) memory banks are 4. The Distribution of Access Location is called DAL for short. The Maximum Conflicts Per Cycle (MCPC) is equal to the maximum element of the DAL. (1) MCPC=4, {4,0,0,0} (2) MCPC=3, {3,1,0,0} (3) MCPC=2, {2,2,0,0} (4) MCPC=2, {2,1,1,0} (5) MCPC=1, {1,1,1,1}. Annotation: 4 access locations divide into 2 groups and are distributed over two different memory banks.
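
The five DALs in this example can be checked by brute force. The sketch below tries every way 4 PEs can each pick one of 4 banks and collects the distinct sorted per-bank counts. It assumes each PE issues exactly one access to a distinct address (so nothing is combined); the helper names are hypothetical.

```python
from itertools import product
from collections import Counter

def dal_of(banks_hit, num_banks):
    """DAL of one access pattern: per-bank access counts, sorted descending."""
    counts = Counter(banks_hit)
    return tuple(sorted((counts.get(b, 0) for b in range(num_banks)), reverse=True))

def enumerate_dals(num_pes, num_banks):
    """Collect every distinct DAL by trying all bank assignments."""
    patterns = product(range(num_banks), repeat=num_pes)
    return sorted({dal_of(p, num_banks) for p in patterns}, reverse=True)

for dal in enumerate_dals(4, 4):
    print(f"MCPC={max(dal)}  DAL={dal}")
```

Exhaustive enumeration grows as banks to the power of PEs, so it is only practical for small sizes like this one.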

  9. Relation among Distributions of Access Location (DAL). f(a,b) is the set of all DALs, for a PEs, whose maximum element is less than or equal to b. f(4,1): {1,1,1,1}. f(4,2): f(4,1); {2,1,1,0}; {2,2,0,0}. f(4,3): f(4,2); {3,1,0,0}. f(4,4): f(4,3); {4,0,0,0}. f(7,7): f(7,1); {2,f(5,2)}; {3,f(4,3)}; {4,f(3,3)}; {5,f(2,2)}; {6,f(1,1)}; g(7).
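
One way to read the relation on this slide is as a recursion: fix the largest element k, then append every DAL of the remaining a-k accesses whose maximum does not exceed k. The sketch below implements that reading. It is an interpretation, not the authors' formula: zeros are omitted from the tuples, the number of banks is assumed to be at least a, and the case k = a (all accesses in one bank) may correspond to the slide's g(7), which the transcript does not define.

```python
def f(a, b):
    """All DALs (descending tuples, zeros omitted) of `a` accesses
    whose maximum element is at most `b`."""
    if a == 0:
        return [()]
    dals = []
    for k in range(1, min(a, b) + 1):
        # Fix the largest element k; the rest is a DAL of a-k accesses
        # whose maximum cannot exceed k.
        for rest in f(a - k, k):
            dals.append((k,) + rest)
    return dals

print(f(4, 1))        # [(1, 1, 1, 1)]
print(f(4, 2))        # [(1, 1, 1, 1), (2, 1, 1), (2, 2)]
print(len(f(7, 7)))   # 15
```

With fewer banks than PEs, DALs with more non-zero entries than banks would have to be dropped from the result.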

  10. Modeling the DAL. The formula uses the integer portion of the quotient of A divided by B. f(a,b) is the set of all DALs, for a PEs, whose maximum element is less than or equal to b.

  11. Modeling the Probability of Access Conflict (PAC). (1) MCPC=4, {4,0,0,0} (2) MCPC=3, {3,1,0,0} (3) MCPC=2, {2,2,0,0} (4) MCPC=2, {2,1,1,0} (5) MCPC=1, {1,1,1,1}. Formula labels: all possible permutations of the j-th DAL; the probability of the j-th DAL.
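
The slide's formulas are not in the transcript, so the following is a standard counting argument rather than the authors' equation. Assuming every PE picks a bank independently and uniformly at random, the probability of a DAL is (ways to place its counts on the banks) × (ways to split the PEs into groups of those sizes) ÷ banks^PEs. The function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from math import factorial

def dal_probability(dal, num_banks):
    """Probability of a DAL when every PE picks a bank uniformly at random.

    `dal` lists per-bank access counts; missing banks count as zero.
    Assumes len(dal) <= num_banks.
    """
    counts = list(dal) + [0] * (num_banks - len(dal))
    num_pes = sum(counts)
    # Ways to place the counts on distinct banks (permutations of the DAL).
    bank_perms = factorial(num_banks)
    for multiplicity in Counter(counts).values():
        bank_perms //= factorial(multiplicity)
    # Ways to split the PEs into groups of those sizes (multinomial).
    pe_splits = factorial(num_pes)
    for c in counts:
        pe_splits //= factorial(c)
    return Fraction(bank_perms * pe_splits, num_banks ** num_pes)

dals = [(4, 0, 0, 0), (3, 1, 0, 0), (2, 2, 0, 0), (2, 1, 1, 0), (1, 1, 1, 1)]
for dal in dals:
    print(dal, dal_probability(dal, 4))
print(sum(dal_probability(d, 4) for d in dals))  # 1
```

For 4 PEs and 4 banks the five probabilities sum to 1, a quick sanity check that no DAL is missing.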

  12. Modeling the PAC. The data used in this equation come from D: D[i,j] is the i-th element in the j-th row; O[j] is the number of non-zero elements in the j-th row; G(i,j) is the sum of the first i elements in the j-th row; M(j) is an intermediate variable for the calculation; F(m,j) is the number of elements equal to m in the j-th row.
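
The helper quantities defined on this slide are easy to compute once D is available. The sketch below assumes D holds one DAL per row (here the five DALs for 4 PEs and 4 banks) and uses the slide's 1-based indices; that layout is an assumption, and M(j) is left out because the transcript does not say how it is computed.

```python
# Hypothetical D for 4 PEs and 4 banks: one DAL per row.
D = [
    [4, 0, 0, 0],
    [3, 1, 0, 0],
    [2, 2, 0, 0],
    [2, 1, 1, 0],
    [1, 1, 1, 1],
]

def O(j):
    """Number of non-zero elements in row j (1-based)."""
    return sum(1 for x in D[j - 1] if x != 0)

def G(i, j):
    """Sum of the first i elements of row j (1-based)."""
    return sum(D[j - 1][:i])

def F(m, j):
    """Number of elements equal to m in row j (1-based)."""
    return D[j - 1].count(m)

print(O(4), G(2, 4), F(1, 4))  # 3 3 2
```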

  13. Model verification (by MATLAB). Validating the PAC model: the average accuracy of our model on gather/scatter is over 98% (min: 97.3%, max: 100%) when read/write locations are totally random. Validating the DAL model: the measured and estimated results are identical.
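
A check of this kind can be sketched in Python (this is not the authors' MATLAB code). The sketch compares exact DAL probabilities, obtained by exhaustive enumeration, with frequencies measured over uniformly random accesses. All function names, the trial count, and the seed are assumptions chosen for illustration.

```python
import random
from collections import Counter
from itertools import product

def dal_of(banks_hit, num_banks):
    """Per-bank access counts of one pattern, sorted descending."""
    counts = Counter(banks_hit)
    return tuple(sorted((counts.get(b, 0) for b in range(num_banks)), reverse=True))

def exact_distribution(num_pes, num_banks):
    """Exact DAL probabilities by exhaustive enumeration (small sizes only)."""
    tally = Counter(dal_of(p, num_banks)
                    for p in product(range(num_banks), repeat=num_pes))
    total = num_banks ** num_pes
    return {dal: n / total for dal, n in tally.items()}

def measured_distribution(num_pes, num_banks, trials, seed=0):
    """DAL frequencies measured over uniformly random accesses."""
    rng = random.Random(seed)
    tally = Counter(
        dal_of([rng.randrange(num_banks) for _ in range(num_pes)], num_banks)
        for _ in range(trials)
    )
    return {dal: n / trials for dal, n in tally.items()}

exact = exact_distribution(4, 4)
measured = measured_distribution(4, 4, trials=50_000)
for dal, p in sorted(exact.items(), reverse=True):
    print(dal, f"model={p:.4f}", f"measured={measured.get(dal, 0.0):.4f}")
```

With tens of thousands of trials the measured frequencies should sit close to the exact values; how the slide's "accuracy" percentage is defined is not stated in the transcript.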

  14. Presentation Outline: 1. Introduction 2. Models and Verification 3. Evaluation and Results 4. Conclusion

  15. Evaluation and Results. For hardware designers, two common methods can improve gather/scatter performance: (1) organizing each memory bank into separate sub-banks; (2) adding buffers to cache memory requests.

  16. Evaluation and Results. Analysis of MCPC as the PE:Bank ratio varies. Chart annotations (one per plotted configuration): more than 80% of DALs have MCPC<=4; more than 90% of DALs have MCPC<=3; more than 90% of DALs have MCPC<=2. The performance of gather/scatter is closely related to the ratio of PEs to memory banks.

  17. Evaluation and Results. Analysis for selecting the proper number of memory banks. Average Number of Access Conflicts (NAC); φ(k) stands for the DALs whose MCPC is k. Chart annotations: NAC=1.64, NAC=1.32, NAC=2.12. Chart data labels: 2.12, 2.59, 3.05, 3.45, 3.76.
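
The NAC formula itself is missing from the transcript. One plausible reading, taken here as an assumption, is the probability-weighted average of MCPC over the classes φ(k), with banks chosen independently and uniformly at random. The sketch below computes that average exactly; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def prob_mcpc_at_most(k, num_pes, num_banks):
    """P(MCPC <= k) for uniformly random, independent bank choices."""
    # Per-bank series sum_{i<=k} x^i / i!; its num_banks-th power counts
    # the assignments in which no bank receives more than k accesses.
    base = [Fraction(1, factorial(i)) for i in range(min(k, num_pes) + 1)]
    poly = [Fraction(1)]
    for _ in range(num_banks):
        size = min(len(poly) + len(base) - 1, num_pes + 1)
        new = [Fraction(0)] * size
        for i, a in enumerate(poly):
            for j, b in enumerate(base):
                if i + j < size:
                    new[i + j] += a * b
        poly = new
    coeff = poly[num_pes] if num_pes < len(poly) else Fraction(0)
    return coeff * factorial(num_pes) / Fraction(num_banks) ** num_pes

def expected_mcpc(num_pes, num_banks):
    """Sum over k of k * P(MCPC == k)."""
    total = Fraction(0)
    prev = Fraction(0)
    for k in range(1, num_pes + 1):
        cur = prob_mcpc_at_most(k, num_pes, num_banks)
        total += k * (cur - prev)
        prev = cur
    return total

print(float(expected_mcpc(4, 4)))  # 2.125
```

For 4 PEs and 4 banks this reading gives 2.125, which is close to the 2.12 label on the slide, though the transcript does not say which configuration that label belongs to. Calling `expected_mcpc` with more banks for the same number of PEs gives a smaller value, in line with the slide's point about choosing the number of banks.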

  18. Evaluation and Results. Chart data labels: 3.05, 1.98, 1.34. The deeper the buffer array, the shorter the run time. The run time decreases as the ratio of PEs to memory banks decreases.

  19. Evaluation and Results. Performance improvement as the depth of the buffer array varies. Chart annotations: "Dispersive"; "Very close".

  20. Presentation Outline: 1. Introduction 2. Models and Verification 3. Evaluation and Results 4. Conclusion

  21. Conclusion. - This model can give all the possible DALs, the PAC, and related quantities for gather/scatter operations in various situations. - This model can help users select the optimum number of memory banks and guide designers in selecting the proper number of buffers. (For example, if SIMD=16, organizing each bank into 2 sub-banks and setting the buffer depth to 4 is recommended.)

  22. Thank you!
