Photonic On-Chip Networks for Performance-Energy Optimized Off-Chip Memory Access
Gilbert Hendry, Johnnie Chan, Daniel Brunina, Luca Carloni, Keren Bergman
Lightwave Research Laboratory, Columbia University, New York, NY
Motivation
The memory gap warrants a paradigm shift in how we move information to and from storage and computing elements. [www.OpenSparc.net] [Exascale Report, 2008]
Main Premise
• Current memory subsystem technology and packaging are not well-suited to future trends:
• Networks on chip
• Growing cache sizes
• Growing bandwidth requirements
• Growing pin counts
SDRAM context
• DIMMs are controlled fully in parallel, sharing access on the data and address busses
• Many wires/pins
• Matched signal paths (for delay)
• DIMMs are made for short, random accesses
[Figure: a memory controller driving several DIMMs over shared busses; lately, the memory controller is on chip [Intel]]
Future SDRAM context
Example: Tilera TILE64
SDRAM DIMM Anatomy
[Figure: a DRAM_DIMM holds ranks of SDRAM devices; each DRAM_Chip contains banks (usually 8); each DRAM_Bank is a DRAM cell array with a row decoder (row addr/en), column decoder (col addr/en), sense amps, and I/O control, wired to shared addr/cntrl and data lines]
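To keep this hierarchy straight in the slides that follow, here is a minimal sketch of it as data structures. The class names and the split of chips across ranks are illustrative assumptions; the setup slides only state 8 chips per DIMM, 8 banks per chip, and 2 ranks.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Bank:
    open_row: Optional[int] = None   # row currently latched in the sense amps, if any

@dataclass
class DRAMChip:
    # "usually 8" banks per chip, per the figure
    banks: List[Bank] = field(default_factory=lambda: [Bank() for _ in range(8)])

@dataclass
class Rank:
    chips: List[DRAMChip]

@dataclass
class DIMM:
    ranks: List[Rank]

def make_dimm(n_ranks: int = 2, chips_per_rank: int = 8) -> DIMM:
    """Build a DIMM. The per-rank chip count is an assumption; the slides
    only say '8 chips per DIMM, 2 ranks'."""
    return DIMM(ranks=[Rank(chips=[DRAMChip() for _ in range(chips_per_rank)])
                       for _ in range(n_ranks)])
```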
Memory Access in an Electronic NoC
• Messages are packetized; packet size is determined by router buffer capacity
• Burst length is dictated by packet size
[Figure: a message crosses NoC routers to the memory controller at the chip boundary]
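A back-of-the-envelope sketch of that constraint, using the electronic-NoC parameters given later in the talk (256 b packets, 128 b channels); the 64-bit per-beat DIMM data width is an assumption, not stated on the slide:

```python
PACKET_BITS = 256        # packet size, bounded by router buffering
CHANNEL_BITS = 128       # NoC channel (flit) width
DEVICE_DATA_BITS = 64    # assumed bits delivered per DIMM data-bus beat

flits_per_packet = PACKET_BITS // CHANNEL_BITS   # 2 flits cross the NoC per packet
burst_beats = PACKET_BITS // DEVICE_DATA_BITS    # a packet can carry only a 4-beat burst
print(flits_per_packet, burst_beats)
```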
Memory Control
• Complex DRAM control
• Scheduling accesses around:
• Open/closed rows
• Precharging
• Refreshing
• Data/control bus usage
[DRAMsim, UMD]
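A minimal sketch of the open-row bookkeeping this scheduling implies; it is illustrative only, and real controllers also enforce timing constraints and interleave refresh:

```python
from typing import List, Optional, Tuple

class BankState:
    """Tracks which row (if any) is open in a bank's sense amps."""
    def __init__(self) -> None:
        self.open_row: Optional[int] = None

def commands_for_read(bank: BankState, row: int) -> List[Tuple]:
    """Return the DRAM command sequence the controller must issue for one read."""
    cmds: List[Tuple] = []
    if bank.open_row is not None and bank.open_row != row:
        cmds.append(("PRE",))        # precharge: close the wrong open row
    if bank.open_row != row:
        cmds.append(("ACT", row))    # activate: open the target row
        bank.open_row = row
    cmds.append(("RD",))             # column read from the now-open row
    return cmds
```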
Experimental Setup – Electronic NoC
System:
• 2 cm × 2 cm chip
• 8×8 electronic mesh
• 28 DRAM access points (MCs)
• 2 DIMMs per DRAM AP
• Routers: 1 kb input buffers (per VC), 4 virtual channels, 256 b packet size, 128 b channels
• 32 nm tech point (ORION), normal Vt, Vdd = 1.0 V, freq = 2.5 GHz
Traffic:
• Random core-DRAM access point pairs
• Random read/write
• Uniform message sizes
• Poisson arrivals at 1 µs
DRAM:
• Modeled cycle-accurately with DRAMsim [Univ. MD]
• DDR3 (10-10-10) @ 1333 MT/s
• 8 chips per DIMM, 8 banks per chip, 2 ranks
[Figure: 5-port electronic router]
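The same setup collected as one simulator configuration; the values come straight from the slide, but the key names and schema are invented for illustration:

```python
ELECTRONIC_NOC = {
    "chip_cm": (2, 2),
    "topology": "mesh",
    "radix": (8, 8),
    "dram_access_points": 28,
    "dimms_per_access_point": 2,
    "router": {"input_buffer_bits_per_vc": 1024, "virtual_channels": 4,
               "packet_bits": 256, "channel_bits": 128},
    "tech": {"node_nm": 32, "vt": "normal", "vdd_v": 1.0, "freq_ghz": 2.5},
    "dram": {"model": "DRAMsim (cycle-accurate)", "type": "DDR3",
             "timing": (10, 10, 10), "mt_per_s": 1333,
             "chips_per_dimm": 8, "banks_per_chip": 8, "ranks": 2},
}
```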
Experiment Results
[Figure: throughput results for the electronic mesh, annotated at 269 Gb/s]
Current
[Figure: today's electrically connected memory subsystem]
Goal: Optically Integrated Memory
[Figure: a DIMM connected by optical fiber through an on-DIMM optical transceiver; the remaining electrical pins are Vdd and Gnd]
Advantages of Photonics
• Decoupled energy-distance relationship
• No long traces to drive and sync with the clock
• DRAM chips can run faster
• Less power
• Fewer pins on the DIMM module and going into the chip
• Eventually required by packaging constraints
• Waveguides can achieve dramatically higher density due to WDM
• DRAM can be arbitrarily distant, since fiber is low loss
Hybrid Circuit-Switched Photonic Network
[Figure: broadband 1×2 and 2×2 switch elements and their transmission characteristics] [Cornell, 2008] [Shacham, NOCS ’07]
Hybrid Circuit-Switched Photonic Network
[Figure: the photonic transmission plane and the electronic control plane overlaid on the compute plane] [Bergman, HPEC ’07]
Photonic DRAM Access
[Figure: the processor/cache with its processor gateway, and DIMMs (photonic + electronic) behind a memory gateway, connected across the chip boundary by fiber / PCB waveguide. In the memory gateway, a photonic switch and a modulator bank sit beside an electronic memory controller and network interface; the memory controller generates the memory control commands, and the modulators are needed to send those commands to the DRAM]
Memory Transaction
1) A read or write request is initiated from a local or remote processor and travels on the electronic network
2) The processor gateway forwards it to the memory gateway
3) The memory gateway receives the request
[Figure: request path from the processor/cache through the processor gateway, across the chip boundary, to the memory gateway and DIMMs]
Memory READ Transaction
4) MC receives the READ command
5) Switch is set up from the modulators to the DIMM, and from the DIMM to the network
6) Path setup travels back to the receiving processor; a path ACK returns when the path is set up
7) Row/column addresses are sent to the DIMM optically
8) Read data is returned optically
9) Path is torn down; the MC knows how long this will take
[Figure: steps 4-9 on the memory gateway's control logic, modulators, and photonic switch]
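The READ sequence as a sketch in code, for readers who find the step list easier to follow that way; every object and method name here is invented for illustration, not an API from the talk:

```python
def photonic_read(switch, modulators, dimm, network, row, col):
    """Steps 4-9 above; interfaces are hypothetical."""
    # 4) the memory controller has received the READ command electronically
    switch.connect(modulators, dimm)   # 5) modulators -> DIMM for addresses...
    switch.connect(dimm, network)      #    ...and DIMM -> network for data
    network.path_setup_and_ack()       # 6) setup reaches the reader; ACK returns
    modulators.send(row, col)          # 7) row/column addresses sent optically
    data = dimm.burst_read()           # 8) read data streams back optically
    switch.teardown()                  # 9) MC tears the path down on schedule
    return data
```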
Memory WRITE Transaction
4) MC receives the WRITE command, which is also a path setup from the processor to the memory gateway
5) Switch is set up from the modulators to the DIMM
6) Row/column addresses are sent to the DIMM
7) Switch is set up from the network to the DIMM
8) Path ACK is sent back to the processor
9) Data is transmitted optically to the DIMM
10) Path is torn down from the processor after the data is transmitted
[Figure: steps 4-9 on the memory gateway's control logic, modulators, and photonic switch]
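The companion sketch for the WRITE sequence, using the same hypothetical interfaces as the READ sketch above:

```python
def photonic_write(switch, modulators, dimm, network, row, col, data):
    """Steps 4-10 above; interfaces are hypothetical."""
    # 4) WRITE command arrives, doubling as the processor->memory path setup
    switch.connect(modulators, dimm)   # 5) modulators -> DIMM
    modulators.send(row, col)          # 6) row/column addresses sent optically
    switch.connect(network, dimm)      # 7) network -> DIMM for the payload
    network.path_ack_to_sender()       # 8) ACK returns to the writing processor
    dimm.burst_write(data)             # 9) data transmitted optically
    switch.teardown()                  # 10) torn down once the data has passed
```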
Optical Circuit Memory (OCM) Anatomy
[Figure: the DRAM_OpticalTransceiver on the DIMM, attached by fiber coupling or waveguide coupling (Nλ in, Nλ out) plus VDD and Gnd. A detector bank receives commands; a modulator bank transmits data. Packet format: cntrl, data, bank ID, burst length, row address, column address, with 25 wavelengths for addr/cntrl and 64 for data. Internally: DLL, latches, mux, and drivers; the timing diagram shows tRCD between row and column addresses and tCL before data]
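The packet-format callouts from the figure, written as a struct; the field names follow the figure, but individual field widths are not given beyond the 25 addr/cntrl and 64 data wavelengths:

```python
from dataclasses import dataclass

@dataclass
class OCMCommand:
    """Address/control fields carried on the 25 addr/cntrl wavelengths."""
    cntrl: int          # command bits (read/write, etc.)
    bank_id: int        # which bank the burst targets
    burst_length: int   # how many beats the stream will carry
    row_addr: int       # row address (column access follows after tRCD)
    col_addr: int       # column address (data follows after tCL)
```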
Advantages of Photonics (revisited)
All of the above, plus:
• Simplified memory control logic: no contending accesses; contention is handled by path setup
• Accesses are optimized for large streams of data
Experimental Setup - Photonic
System:
• 2 cm × 2 cm chip
• 8×8 photonic torus
• 28 DRAM access points (MCs)
• 2 DIMMs per DRAM AP
• Routers: 256 b buffers, 32 b packet size, 32 b channels
• 32 nm tech point (ORION), high Vt, Vdd = 0.8 V, freq = 1 GHz
• Photonics: 13 λ
Traffic:
• Random core-DRAM access point pairs
• Random read/write
• Uniform message sizes
• Poisson arrivals at 1 µs
DRAM:
• Modeled with our event-driven DRAM model
• DDR3 (10-10-10) @ 1600 MT/s
• 8 chips per DIMM, 8 banks per chip
[Figure: photonic torus tile]
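The photonic setup in the same invented schema as the electronic configuration earlier, so the differences stand out:

```python
PHOTONIC_NOC = {
    "chip_cm": (2, 2),
    "topology": "torus",                      # photonic torus vs. electronic mesh
    "radix": (8, 8),
    "dram_access_points": 28,
    "dimms_per_access_point": 2,
    "router": {"buffer_bits": 256,            # far smaller buffers...
               "packet_bits": 32, "channel_bits": 32},  # ...for control-sized packets
    "tech": {"node_nm": 32, "vt": "high", "vdd_v": 0.8, "freq_ghz": 1.0},
    "photonics": {"wavelengths": 13},
    "dram": {"model": "event-driven (in-house)", "type": "DDR3",
             "timing": (10, 10, 10), "mt_per_s": 1600,  # 1600 vs. 1333 MT/s
             "chips_per_dimm": 8, "banks_per_chip": 8},
}
```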
Performance Comparison
[Figure: performance of the electronic mesh vs. the photonic torus]
Experiment #2
Random vs. statically mapped address space
Results
[Figure: results for experiment #2]
Network Energy Comparison
Electronic mesh: power = 13.3 W
Photonic torus: network power = 0.42 W; total power = 2.53 W (including laser power)
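Reading those numbers together, assuming the 0.42 W figure is the photonic network's on-chip power and the 2.53 W total folds in the off-chip laser:

```python
photonic_network_W = 0.42
photonic_total_W = 2.53       # includes off-chip laser power
electronic_W = 13.3

laser_W = photonic_total_W - photonic_network_W   # ≈ 2.11 W for the laser
savings = electronic_W / photonic_total_W         # ≈ 5.3x lower total power
print(f"laser ≈ {laser_W:.2f} W, savings ≈ {savings:.1f}x")
```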
Summary
• Extending a photonic network to include access to DRAM looks good for many reasons:
• Circuit switching allows large burst lengths and simplified memory control, for increased bandwidth
• Energy-efficient end-to-end transmission
• Alleviates pin-count constraints with high-density waveguides
PhotoMAN