Memory Arithmetic Unit Interface (MAUI)
Jason M. Meier, Justin S. Teller, Tom J. Keeley
Current Paradigm
[Figure: CPU connected to the DRAM system through the memory controller; timeline rows "CPU: Task 1, Task 2", "MEMORY CTRL:", "MEMORY:", with the CPU marked "Done: Task 1"]
Active Pages Implementation
• Used configurable DRAM (RADRAM)
• Reconfigurable logic implements various memory functions
• An "Active Page" consists of a page of data and a set of associated functions
• Works on individual DRAM chips
• Processor-centric and memory-centric partitioning
* Active Pages: Oskin, Chong, Sherwood, ISCA ’98
MAUI Implementation
[Figure: MAU and MAUI placed in the memory controller between the CPU and the DRAM system; timeline rows "CPU: Task 1, Task 2", "MEMORY CTRL/MAUI: Task 1", "MEMORY:", with the CPU marked "Done: Task 1"]
MAUI Instruction Set
MAUI_LD <m_rd>,offset(<cpu_rs>)
1) The CPU sends an MAU_LOAD register command to the MC across the front-side bus, along with the register number and the address to read.
2) The MC interprets the command and places a Read command in the transaction queue.
3) The DRAM performs the read.
4) The result is stored in the appropriate register of the MAUI register file.
[Figure: LOAD REG command flowing from the CPU through the MC/MAUI to the DRAM (R), with steps 1 to 4 marked]
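To make the four MAUI_LD steps concrete, here is a minimal Python sketch of a memory controller that owns a MAUI register file and a transaction queue. The class and method names (`MauiMemoryController`, `receive_maui_ld`, `service_next`), the flat-list DRAM, and the tuple-based commands are assumptions for illustration, not the authors' design; the point is that the loaded word ends up in a MAUI register and never travels back to the CPU.

```python
from collections import deque


class MauiMemoryController:
    """Toy model of a memory controller with an attached MAUI register file.

    Hypothetical: DRAM is a flat list of words and commands are plain tuples.
    """

    def __init__(self, dram, num_regs=8):
        self.dram = list(dram)        # DRAM contents, one word per address
        self.regs = [0] * num_regs    # MAUI register file (separate from CPU registers)
        self.queue = deque()          # transaction queue of pending DRAM commands

    def receive_maui_ld(self, m_rd, addr):
        # Steps 1-2: the command arrives over the front-side bus with the
        # register number and address; the MC queues a Read of that address.
        self.queue.append(("R", addr, m_rd))

    def service_next(self):
        # Steps 3-4: DRAM performs the oldest Read, and the word is written
        # into MAUI register m_rd instead of being returned to the CPU.
        _op, addr, m_rd = self.queue.popleft()
        self.regs[m_rd] = self.dram[addr]


mc = MauiMemoryController(dram=[7, 11, 13, 17])
mc.receive_maui_ld(m_rd=1, addr=1 + 2)  # MAUI_LD m1, 2(cpu_rs) with cpu_rs = 1
mc.service_next()
print(mc.regs[1])  # 17
```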
MAUI Instruction Set II
MAUI_LDI <rd>,<cpu_rs>
1) The CPU sends an MAU_LOADI register command to the MC across the front-side bus, along with the register number and the integer to store.
2) The MC interprets the command and places the integer in the appropriate register of the MAUI register file; no DRAM access is needed.
[Figure: LOADI REG command flowing from the CPU to the MC/MAUI, with steps 1 and 2 marked]
MAUI Instruction Set III
MAUI_ADD <rd>,<rs1>,<rs2>,<rsz>
1) The CPU invalidates any cached addresses that fall within the range of the destination array. Cached addresses within the range of the source arrays are written back if dirty.
2) The CPU sends an MAUI_ADD command to the MC across the front-side bus, along with the register numbers.
3) The MC interprets the command; the MAUI adds the two operands just read and places a Write command and the next two Read commands in the transaction queue.
4) Step 3 repeats for the length of the array.
[Figure: MAU_ADD command flowing from the CPU to the MC/MAUI, which issues the DRAM command stream W R R W; steps 1 to 4 marked]
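Step 1 of MAUI_ADD is what keeps the CPU cache consistent with DRAM while the MAUI works behind it. The sketch below assumes a hypothetical cache modeled as a dict from line base address to a dirty flag and a 4-word line size; `lines_in_range` and `prepare_cache_for_maui_add` are illustrative names. It follows the slide literally: dirty source lines are written back, then destination lines are dropped.

```python
LINE_WORDS = 4  # words per cache line (assumed for illustration)


def lines_in_range(start, size, line_size=LINE_WORDS):
    """Return the set of cache-line base addresses touched by [start, start + size)."""
    first = start // line_size
    last = (start + size - 1) // line_size
    return {n * line_size for n in range(first, last + 1)}


def prepare_cache_for_maui_add(cache, writebacks, dst, src1, src2, size):
    """Sketch of step 1: flush dirty source lines, then invalidate destination lines.

    cache maps line base address -> dirty flag; writebacks collects flushed lines.
    """
    for line in sorted(lines_in_range(src1, size) | lines_in_range(src2, size)):
        if cache.get(line):            # cached and dirty
            writebacks.append(line)    # write back so the MAUI reads current data
            cache[line] = False        # the cached copy is now clean
    for line in lines_in_range(dst, size):
        cache.pop(line, None)          # drop stale copies of the destination array


cache = {0: True, 4: False, 8: True, 12: True}
writebacks = []
prepare_cache_for_maui_add(cache, writebacks, dst=12, src1=0, src2=8, size=4)
print(writebacks, cache)  # [0, 8] {0: False, 4: False, 8: False}
```

Because destination lines are dropped without a write-back, this is only safe when the destination array covers whole cache lines; a partially covered dirty line would need to be written back first.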
Issues: Address Mapping
Memory that is contiguous in virtual space may not be contiguous in physical space.
• MAUI assumes consecutive addressing (size register)
• MAUI operations that cross page boundaries must be split into separate operations for each page
• The programmer will not know the mapping scheme (held in the TLB)
• Result: all MAUI operations will need to be privileged instructions, accessed by programs through a system call.
[Figure: virtual space mapped through the TLB to physical space]
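Splitting an operation at page boundaries has to consider all three arrays of a MAUI_ADD, since each may cross into a new page at a different element. The sketch below uses a dict as a stand-in for the page table/TLB and counts addresses in words; it shows the kind of work a privileged system-call handler would have to do. `split_maui_add`, `PAGE_WORDS`, and the 1024-word page size are assumptions.

```python
PAGE_WORDS = 1024  # words per page (assumed: 4 KB pages, 4-byte words)


def split_maui_add(dst, src1, src2, size, page_table, page_size=PAGE_WORDS):
    """Split a virtual MAUI_ADD into physically contiguous per-page MAUI_ADDs.

    page_table maps virtual page number -> physical frame number.
    Each result (pdst, psrc1, psrc2, n) covers n elements, and none of the
    three arrays crosses a page boundary inside that chunk.
    """
    def translate(vaddr):
        vpn, offset = divmod(vaddr, page_size)
        return page_table[vpn] * page_size + offset

    ops = []
    done = 0
    while done < size:
        # The chunk ends at the nearest page boundary among the three arrays.
        n = min(size - done,
                *(page_size - (base + done) % page_size for base in (dst, src1, src2)))
        ops.append((translate(dst + done), translate(src1 + done),
                    translate(src2 + done), n))
        done += n
    return ops


# Tiny 8-word pages so the split is visible.
print(split_maui_add(dst=6, src1=0, src2=10, size=4,
                     page_table={0: 5, 1: 2}, page_size=8))
# [(46, 40, 18, 2), (16, 42, 20, 2)]
```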
Issues: Compiler Issues
• The compiler will be responsible for deciding when MAUI instructions should be used.
• This decision will be based on the size of the array, whether it is likely to be in the cache, and whether it is likely to be used by an instruction that isn't implemented in the MAUI.
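The three criteria above can be written as a simple offload rule. This is only a sketch: the `ArrayOpInfo` fields, `should_use_maui`, and especially the `MIN_ELEMENTS` threshold are hypothetical, and a real compiler would have to derive such values from profiling or simulation rather than a fixed constant.

```python
from dataclasses import dataclass

MIN_ELEMENTS = 4096  # hypothetical break-even array length; would need tuning


@dataclass
class ArrayOpInfo:
    length: int                 # number of elements in the arrays
    likely_cached: bool         # compiler's estimate that the array is already in the cache
    needs_unsupported_op: bool  # array is likely used by an instruction the MAUI lacks


def should_use_maui(info: ArrayOpInfo) -> bool:
    """Hypothetical offload rule built from the three criteria on the slide."""
    if info.likely_cached:
        return False  # the CPU can operate on cached data without going to DRAM
    if info.needs_unsupported_op:
        return False  # the data must come back to the CPU anyway
    return info.length >= MIN_ELEMENTS  # short arrays may not amortize the command overhead


print(should_use_maui(ArrayOpInfo(8192, False, False)))  # True
```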
Issues: Task Interrupts
[Figure: MAU and MAUI in the memory controller between the CPU and the DRAM system; timeline rows "CPU: Task 1, Task 2, Task 2", "MEMORY CTRL/MAUI: Task 1, Task 2, Task 1", "MEMORY:"]
Example: maui_add I
BIU: maui_ld r1, 0; memory controller: maui_ld r1, 0
Example: maui_add II
BIU: maui_ld r2, 5; memory controller: maui_ld r2, 5
Example: maui_add III
BIU: maui_ld r3, 10; memory controller: maui_ld r3, 10
Example: maui_add IV
BIU: maui_ld r4, 2; memory controller: maui_ld r4, 2
Example: maui_add V
BIU: maui_add r3, r1, r2; memory controller: maui_add r3, r1, r2; transaction queue: R,0 R,5
Example: maui_add VI
BIU: Read 10; memory controller: maui_add r3, r1, r2*; data: D1[0]
Example: maui_add VII
BIU: Read 10; memory controller: maui_add r3, r1, r2*; data: D2[0]
Example: maui_add VIII
BIU: Read 10; memory controller: maui_add r3, r1, r2*; transaction queue: R,1 R,6 W,10,D1[0]+D2[0]
Example: maui_add IX
BIU: Write 6, D; memory controller: maui_add r3, r1, r2*; data: D1[1]
Example: maui_add X
BIU: Write 6, D; memory controller: maui_add r3, r1, r2*; data: D2[1]
Example: maui_add XI
Memory controller: next instruction; transaction queue: W,10,D1[1]+D2[1]
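The walk-through above can be replayed with a small FIFO model. This Python sketch assumes r1, r2 and r3 hold the base addresses of the two source arrays and the destination, r4 holds the length (2), and element i of the result is written to r3 + i; `run_maui_add` and the tuple commands are illustrative only. As in slides V through XI, the MAUI queues the next element's two Reads ahead of the current element's Write.

```python
from collections import deque


def run_maui_add(regs, dram, rd, rs1, rs2, rsz):
    """Simulate maui_add rd, rs1, rs2 (length in rsz) through a FIFO transaction queue.

    Returns the commands in the order DRAM services them and updates dram in place.
    """
    dst, src1, src2, n = regs[rd], regs[rs1], regs[rs2], regs[rsz]
    queue = deque()
    serviced = []
    if n > 0:
        queue.extend([("R", src1), ("R", src2)])   # first element's two Reads
    i = 0
    operands = []
    while queue:
        cmd = queue.popleft()
        serviced.append(cmd)
        if cmd[0] == "R":
            operands.append(dram[cmd[1]])          # D1[i] or D2[i] returns to the MAUI
            if len(operands) == 2:                 # both operands of element i are back
                total = operands[0] + operands[1]
                operands = []
                if i + 1 < n:                      # next element's Reads go ahead of the Write
                    queue.extend([("R", src1 + i + 1), ("R", src2 + i + 1)])
                queue.append(("W", dst + i, total))
                i += 1
        else:
            dram[cmd[1]] = cmd[2]                  # DRAM performs the Write
    return serviced


regs = {1: 0, 2: 5, 3: 10, 4: 2}                   # after maui_ld r1..r4 (slides I-IV)
dram = [0] * 16
dram[0], dram[1], dram[5], dram[6] = 1, 2, 10, 20
trace = run_maui_add(regs, dram, rd=3, rs1=1, rs2=2, rsz=4)
print(trace)
# [('R', 0), ('R', 5), ('R', 1), ('R', 6), ('W', 10, 11), ('W', 11, 22)]
```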
Advantages & Disadvantages
• Advantages
  - Better performance for DRAM-latency-bound computations
  - Lower latency to DRAM than the CPU has
  - Reduced traffic on the front-side bus
  - Concurrent execution
• Disadvantages
  - MAUI operates at a lower clock frequency
  - Increased compiler complexity
  - Increased fabrication costs (more logic = more $$)
  - Recently used data may not be cached
Alternative Implementation
MAUI occupies its own read & write bus
• GOOD: Eliminates contention with the CPU for DRAM system resources
• GOOD: Creates a circular data flow, resulting in increased performance
• BAD: Needs a specialized triple-ported DRAM system, leading to increased production costs
[Figure: CPU and MAUI memory controller both attached to the DRAM system, with the MAU/MAUI on a separate read & write bus]
Test Setup
• Simulated on SimpleScalar version 4.0
• One set of test benches with dual-array operations, run on both the MAUI and the CPU with four different array sizes. This trial was repeated for both shared and independent memory access buses.
• Found up to a 43% speedup!
Results
[Figure: total CPU cycles]
Future Enhancements I
• MAU multi-tasking
• More MAUs for parallelism
• Larger register file
• Small cache
[Figure: multiple MAUs in the memory controller; timeline rows for CPU, MEMORY CTRL/MAUI, and MEMORY showing Tasks 1 to 3]
Future Enhancements II
• Better pipelining
• Larger register file to hold intermediate results
[Figure: CPU issues MAU_ADD; the MC/MAUI drives the DRAM command stream R R R R R R R R W W W W]
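One way to read the "better pipelining" slide is as a change in command order: with registers for several intermediate sums, the MAUI can issue all Reads for a batch and then all Writes, as in the R R R R R R R R W W W W stream. The sketch below generates that order and counts read/write switches. `batched_schedule`, `turnarounds`, and the idea that fewer switches is the benefit are assumptions; DRAM does pay a penalty when the data bus changes direction, but the slide does not state the reason.

```python
def batched_schedule(dst, src1, src2, n, batch):
    """Emit DRAM commands for an n-element add, buffering `batch` sums at a time.

    Hypothetical schedule: a batch's Reads are issued back to back, then its Writes,
    so the MAUI needs registers for `batch` intermediate results.
    """
    cmds = []
    for start in range(0, n, batch):
        idx = range(start, min(start + batch, n))
        for i in idx:
            cmds += [("R", src1 + i), ("R", src2 + i)]
        cmds += [("W", dst + i) for i in idx]
    return cmds


def turnarounds(cmds):
    """Count read-to-write and write-to-read switches in a command stream."""
    return sum(1 for a, b in zip(cmds, cmds[1:]) if a[0] != b[0])


pipelined = batched_schedule(dst=10, src1=0, src2=5, n=4, batch=4)
per_element = batched_schedule(dst=10, src1=0, src2=5, n=4, batch=1)
print("".join(c[0] for c in pipelined), turnarounds(pipelined))      # RRRRRRRRWWWW 1
print("".join(c[0] for c in per_element), turnarounds(per_element))  # RRWRRWRRWRRW 7
```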