
Presentation Transcript


  1. Mihir Awatramani, Lakshmi Kiran Tondehal, Xinying Wang, Y. Ravi Chandra. SPARSE MATRIX VECTOR MULTIPLICATION

  2. SPARSE MATRICES
  WHAT ARE THEY? • Simply, matrices with a large number of zero elements
  WHY ARE CONVENTIONAL ALGORITHMS NOT EFFICIENT FOR SPARSE MATRICES? • Processing sparse matrices with conventional dense algorithms requires large processing time • There is a huge overhead due to storing redundant zero elements
  WHERE ARE THEY USED? • Sparse matrices arise when systems are modelled as large sets of differential equations • Typical domains are image processing, industrial process simulations, and data retrieval

  3. BASICS OF SPARSE MATRICES • Storage formats: Compressed Sparse Row / Compressed Sparse Column (CSR / CSC) and the Matrix Market format
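
  To make the CSR idea concrete, here is a minimal serial sketch in C. The array names (row_ptr, col_idx, vals) and the tiny 3x3 example are illustrative only and are not taken from the project's code; the indirect access x[col_idx[j]] is also the source of the irregular memory access pattern mentioned on slide 5.

    #include <stdio.h>

    /* Minimal CSR (Compressed Sparse Row) sketch; the array names are
     * illustrative, not the project's data structures.
     * row_ptr[i] .. row_ptr[i+1] delimit the nonzeros of row i in vals/col_idx. */
    static void spmv_csr(int n_rows,
                         const int *row_ptr, const int *col_idx,
                         const double *vals, const double *x, double *y)
    {
        for (int i = 0; i < n_rows; i++) {
            double sum = 0.0;
            for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
                sum += vals[j] * x[col_idx[j]];   /* irregular, data-dependent access into x */
            y[i] = sum;
        }
    }

    int main(void)
    {
        /* 3x3 matrix with 4 nonzeros:
         * [ 10  0  0 ]
         * [  0 20 30 ]
         * [  0  0 40 ]  */
        int    row_ptr[] = {0, 1, 3, 4};
        int    col_idx[] = {0, 1, 2, 2};
        double vals[]    = {10, 20, 30, 40};
        double x[]       = {1, 2, 3};
        double y[3];

        spmv_csr(3, row_ptr, col_idx, vals, x, y);
        for (int i = 0; i < 3; i++)
            printf("y[%d] = %g\n", i, y[i]);   /* expect 10, 130, 120 */
        return 0;
    }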

  4. FORMAT INDEPENDENCE

  5. MOTIVATION FOR ALTERNATE STRATEGIES
  • Low memory bandwidth
  • Irregular memory access patterns
  • High latency of load/store instructions
  • High ratio of load/store instructions relative to arithmetic operations

  6. CONVEY - A QUICK LOOK INSIDE
  • 4 FPGAs (the Application Engines) that host user-defined Application Personalities
  • 8 memory controllers enable parallel and pipelined access to memory
  • 256 MB coherent cache for memory requests from the coprocessor to host memory
  • The AEH (Application Engine Hub) runs scalar instructions and routes memory requests from the AEs

  7. Details of the C Code
  (Block diagram labels: HOST PROCESSOR, SEQUENTIAL PROCESSOR, AEH, AE1-AE4, A8-A10, memory banks MB1-MB3.)
  • The host processor populates the input matrices
  • The COP_CALL routine passes the base addresses to the coprocessor
  • Memory is allocated for array 1 from mem_base 1, for array 2 from mem_base 2, and for the result from mem_base 3
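
  As a rough illustration of the host-side flow the slide describes, the sketch below allocates three regions and hands their base addresses to the coprocessor. The cop_malloc and cop_call names are hypothetical stand-ins (the slides only mention a "COP_CALL routine"); they are not the Convey runtime's actual API.

    #include <stdio.h>
    #include <stdlib.h>

    /* Hypothetical stand-ins for the Convey runtime: the real allocation and
     * dispatch routines are not named in the slides, so these stubs are
     * illustrative only, not the actual API behind the "COP_CALL routine". */
    static void *cop_malloc(size_t bytes) { return malloc(bytes); }
    static void cop_call(void *mem_base1, void *mem_base2, void *mem_base3)
    {
        printf("dispatch SpMV personality with bases %p %p %p\n",
               mem_base1, mem_base2, mem_base3);
    }

    int main(void)
    {
        size_t nnz = 1024, n = 256;   /* example sizes */

        /* Host populates the input matrices; three regions are allocated,
         * one per base address passed to the coprocessor. */
        double *array1 = cop_malloc(nnz * sizeof *array1); /* mem_base 1 */
        double *array2 = cop_malloc(n   * sizeof *array2); /* mem_base 2 */
        double *result = cop_malloc(n   * sizeof *result); /* mem_base 3 */

        /* ... fill array1 and array2 on the host ... */

        /* Hand the three base addresses to the coprocessor and start SpMV. */
        cop_call((void *)array1, (void *)array2, (void *)result);

        free(array1); free(array2); free(result);
        return 0;
    }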

  8. Details of the Assembly Code
  (Block diagram labels: AEH, AE1, MB1-MB3, AEG 0 through AEG 31, MAIN MEMORY.)
  • Assembly is used to move the base addresses into the Application Engine (AEG) registers
  • Logical operations: AND, OR, XOR
  • Arithmetic operations: multiplication, addition
  • Complex calculations involving vectors could be done without writing VHDL code

  9. MEMORY INTERFACING
  (Block diagram: our module exchanges ADDRESS, DATA, REQ ID, DATA VALID and POP signals with Memory Controller MC 0 through Re-Order Queue ROQ 0, which tracks IDs 0 through 255; main memory returns ID-and-data (I&D) pairs that the ROQ passes back to the module.)
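
  A small behavioral model of the request-ID / re-order idea, written in C rather than the project's VHDL; the structure and function names are assumptions for illustration, not the actual design.

    #include <stdbool.h>
    #include <stdio.h>

    #define ROQ_DEPTH 256   /* the slide shows IDs 0 .. 255 */

    /* Behavioral model only: each outstanding load carries an ID; responses
     * can arrive in any order and are buffered until the ID at the head of
     * the queue has valid data, so the module sees them in request order. */
    typedef struct {
        double data[ROQ_DEPTH];
        bool   valid[ROQ_DEPTH];
        int    head;            /* next ID expected by the consumer  */
        int    tail;            /* next ID handed out with a request */
    } roq_t;

    static int roq_issue(roq_t *q)                 /* tag a new load request */
    {
        int id = q->tail;
        q->tail = (q->tail + 1) % ROQ_DEPTH;
        return id;
    }

    static void roq_fill(roq_t *q, int id, double d)  /* response from an MC */
    {
        q->data[id]  = d;
        q->valid[id] = true;
    }

    static bool roq_pop(roq_t *q, double *out)     /* in-order delivery */
    {
        if (!q->valid[q->head]) return false;      /* head data not back yet */
        *out = q->data[q->head];
        q->valid[q->head] = false;
        q->head = (q->head + 1) % ROQ_DEPTH;
        return true;
    }

    int main(void)
    {
        roq_t q = {0};
        int id0 = roq_issue(&q), id1 = roq_issue(&q), id2 = roq_issue(&q);

        roq_fill(&q, id2, 3.0);   /* responses return out of order */
        roq_fill(&q, id0, 1.0);
        roq_fill(&q, id1, 2.0);

        double v;
        while (roq_pop(&q, &v))
            printf("%g\n", v);    /* prints 1 2 3: original request order */
        return 0;
    }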

  10. IMPLEMENTATION
  (Dataflow diagram: MASTER CONTROL, ADDRESS DECODER, MCs/ROQs and memory, DATA READ ENGINE, INPUT BUFFERS, SMVM engine, OUTPUT BUFFERS, with signals such as START READ, DATA VALID, READ COMPLETE, START SMVM, DONE, START WRITE and addresses 0X454C…..400 / 0X454C…..040.)
  • Master control gives the base address, generates the LD signal and 21 load signals, and waits for the load-complete signal
  • The data read engine performs 21 reads from the data bus into the input buffers; after read complete we have the required inputs for SMVM
  • SMVM starts; after processing it gives a DONE signal
  • On start write, all 11 outputs in the output buffers are written to memory; one cycle of computation is complete
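
  The read / compute / write-back cycle can be summarized with a high-level behavioral model in C. Everything below is a stand-in: the buffer contents, the dummy compute step, and the memory_read / memory_write helpers are hypothetical; only the 21-load and 11-store counts come from the slide.

    #include <stdio.h>

    #define NUM_LOADS  21   /* reads per computation cycle (from the slide)  */
    #define NUM_STORES 11   /* results written back per cycle (from the slide) */

    /* High-level behavioral model of one computation cycle, not the VHDL
     * control logic. memory_read/memory_write stand in for the MC/ROQ path. */
    static double input_buffer[NUM_LOADS];
    static double output_buffer[NUM_STORES];

    static double memory_read(int i)            { return (double)i; }
    static void   memory_write(int i, double v) { (void)i; (void)v; }

    static void one_cycle(void)
    {
        /* Master control issues 21 load requests; the data read engine
         * collects the returned words into the input buffers. */
        for (int i = 0; i < NUM_LOADS; i++)
            input_buffer[i] = memory_read(i);

        /* Required inputs available: start the SMVM engine. The "compute"
         * here is a dummy combination standing in for the real datapath. */
        for (int i = 0; i < NUM_STORES; i++)
            output_buffer[i] = input_buffer[i] + input_buffer[i + NUM_STORES - 1];

        /* SMVM raised DONE: write all 11 outputs back to memory. */
        for (int i = 0; i < NUM_STORES; i++)
            memory_write(i, output_buffer[i]);

        printf("one cycle of computation complete\n");
    }

    int main(void) { one_cycle(); return 0; }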

  11. Simulation Results - Co-Processor Instruction Execution
  • The move instructions are decoded (6 move instructions): base address and size values are moved to internal registers
  • The CAEP instruction is decoded, which starts the custom personality

  12. Simulation Results – Load Request to MC
  • The read procedure starts: the address is checked, and the decoded address selects the respective MC
  • An ID from the ROQ is appended to each load request before it is sent to that MC
  • 21 data load requests are sent; the read process starts after the requests have been sent to the MC
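
  The slide notes that the decoded address selects which MC receives each load request. The sketch below shows one plausible decode in C (a few address bits pick one of the 8 MCs); the interleave granularity and bit positions are assumptions, since the slides do not give them, and the base address is an arbitrary example.

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_MC 8   /* the coprocessor has 8 memory controllers */

    /* Illustrative address decode: a few address bits select a memory
     * controller so that consecutive blocks spread across the MCs. The
     * shift and mask below are assumptions purely for illustration. */
    static int decode_mc(uint64_t addr)
    {
        return (int)((addr >> 6) & (NUM_MC - 1));   /* assumed 64-byte interleave */
    }

    int main(void)
    {
        uint64_t base = 0x1000;                     /* arbitrary example base address */
        for (int i = 0; i < 4; i++) {
            uint64_t addr = base + (uint64_t)i * 64;
            printf("load 0x%llx -> MC%d\n",
                   (unsigned long long)addr, decode_mc(addr));
        }
        return 0;
    }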

  13. Simulation Results – Receive Data from MC through ROQ
  • The load process starts: valid data is available at the MCs, but data is read sequentially from MC0 – MC1 – MC2
  • The load process is done after receiving 21 data inputs
  • The next read starts (if there is nothing to write)

  14. Simulation Results – Write Back Results from SpMV-Engine using MC
  • The write starts if valid data is received from the SpMV engine
  • The address is decoded and the store request is sent to the respective MC
  • The write process is done after 11 store operations, and the next read cycle starts

  15. FUTURE SCOPE
  • Increasing memory bandwidth
  • Partitioning the SMVM calculation across the four Application Engines (a row-partitioning sketch follows below)
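
  A minimal sketch of the proposed partitioning, assuming a simple contiguous row split across the four Application Engines; the slides do not specify the scheme, and in practice the split would more likely be balanced by nonzero count than by row count.

    #include <stdio.h>

    #define NUM_AE 4   /* four Application Engines on the coprocessor */

    /* Illustrative only: split the rows of the sparse matrix into four
     * contiguous blocks, one per Application Engine. In hardware each block
     * would be handled by a separate AE; here we just print the row ranges. */
    static void partition_rows(int n_rows)
    {
        int base  = n_rows / NUM_AE;
        int rem   = n_rows % NUM_AE;
        int start = 0;

        for (int ae = 0; ae < NUM_AE; ae++) {
            int count = base + (ae < rem ? 1 : 0);   /* spread the remainder */
            printf("AE%d: rows %d .. %d\n", ae + 1, start, start + count - 1);
            start += count;
        }
    }

    int main(void)
    {
        partition_rows(1000);   /* e.g. a 1000-row sparse matrix */
        return 0;
    }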
