Silicon Nanophotonic Network-On-Chip Using TDM Arbitration
Gilbert Hendry – Columbia University
Johnnie Chan, Shoaib Kamil, Lenny Oliker, John Shalf, Luca P. Carloni, Keren Bergman
Why Photonics?
Photonics changes the rules for Bandwidth, Energy, and Distance.
OPTICS:
• Modulate/receive high-bandwidth data stream once per communication event.
• Broadband switch routes entire multi-wavelength stream.
• Off-chip BW = On-chip BW for nearly the same power.
ELECTRONICS:
• Buffer, receive and re-transmit at every router.
• Each bus lane routed independently (P ∝ N_LANES).
• Off-chip BW is pin-limited and power hungry.
Silicon Photonic Integration
[Figures: integrated silicon photonic devices from Cornell (2005), Ghent (2007), Sandia (2008), Columbia (2008), Cornell (2009)]
Photonic Networks-on-Chip
• Corona [U. of Wisconsin, HP]
• Photonic Clos [MIT]
• Photonic Torus [Columbia]
Ring Resonators
• Modulator/filter
• Broadband
Circuit-switched P-NoCs
[Figure: ring resonator switch. Labels: electronic control (p-region, n-region), thermal control (ohmic heater), transmission spectrum with off-resonance and on-resonance profiles, injected wavelengths]
Circuit-switched P-NoCs
Pros:
• Energy-efficient end-to-end transmission
• High bandwidth through WDM
• Electronic network still available for small control messages*
• Network-level support for secure regions
Cons:
• Path setup latency
• Path setup contention (no fairness)
• Longer paths block more
• Head-of-line blocking at gateways
* [G. Hendry et al. Analysis of Photonic Networks for a Chip Multiprocessor Using Scientific Applications. In NOCS, 2009]
Head of Line Blocking
[Figure: network gateway with external concentration*. Labels: cores, network IF, electronic crossbar, serialization, drivers, receivers, deserialization, Tx/Rx, control router (to/from control plane), 5-port photonic switch (to/from data plane), bidirectional electronic channel, bidirectional waveguide]
* [P. Kumar et al. Exploring concentration and channel slicing in on-chip network router. In NOCS, 2009]
TDM Arbitration
[Figure: items labeled t0 through tC-1 distributed across time slot 0, time slot 1, …, time slot T]
Synchronous Gateway/Control
• Time slot ~ 10ns
• TDM sync clock ~ 100MHz
Nonblocking Network Scheduling
[Figure: connections made in time slot 0, time slot 1, time slot 2]
• Required time slots = N-1
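The N-1 count can be checked with a small sketch. The slides do not say which assignment is used in a nonblocking network, so the cyclic-shift pattern below (source i sends to i + shift in each slot) is an assumption chosen for illustration; `round_robin_schedule` is a hypothetical name.

```python
def round_robin_schedule(n):
    """Return N-1 time slots; in the slot with a given shift,
    source i sends to destination (i + shift) % n."""
    return [[(src, (src + shift) % n) for src in range(n)]
            for shift in range(1, n)]

schedule = round_robin_schedule(4)
for t, slot in enumerate(schedule):
    print(f"slot {t}: {slot}")
```

In every slot each node transmits once and receives once, so there is no source or destination contention, and after N-1 slots every source has reached every other node.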
However…
• Nonblocking topology difficult to implement because of Insertion Loss
[M. Petracca et al. IEEE Micro, 2008]
* [J. Chan et al. Architectural Exploration of Chip-Scale Photonic Interconnection Network Designs Using Physical-Layer Analysis. JLT, May 2010]
Scheduling Time Slots
• Problem:
  • Blocking Network
  • Full coverage
  • Minimize Time Slots (most comm. per slot)
• Constraints:
  • Source contention
  • Destination contention
  • Topology contention
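The three constraints can be sketched as a pairwise conflict test. This is a simplified model, not the authors' implementation: topology contention is assumed to mean two X-then-Y routes sharing a directed link on a k x k mesh, and the first-fit packer is only a baseline for comparison. All function names are hypothetical.

```python
from itertools import permutations

def xy_links(src, dst, k):
    """Directed links used by an X-then-Y route between node ids on a k x k mesh."""
    x, y = src % k, src // k
    dx, dy = dst % k, dst // k
    links = []
    while x != dx:                      # travel along X first
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:                      # then along Y
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links

def conflicts(a, b, k):
    """True if communications a and b (each a (src, dst) pair) cannot share a slot."""
    if a[0] == b[0]:                    # source contention
        return True
    if a[1] == b[1]:                    # destination contention
        return True
    # topology contention (assumed: shared directed link)
    return bool(set(xy_links(*a, k)) & set(xy_links(*b, k)))

def first_fit(comms, k):
    """Baseline: put each communication in the first slot where it fits."""
    slots = []
    for c in comms:
        for slot in slots:
            if not any(conflicts(c, other, k) for other in slot):
                slot.append(c)
                break
        else:
            slots.append([c])
    return slots

k = 3
comms = list(permutations(range(k * k), 2))   # full coverage: every ordered pair
slots = first_fit(comms, k)
print(len(comms), "communications packed into", len(slots), "slots")
```

Because each source has N-1 destinations and can send to only one per slot, no schedule can use fewer than N-1 slots; a blocking topology generally needs more.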
Solution: Genetic Search
• Initialization
• Population (size P)
• Selection (down to size ps x P)
• Reproduction (back to P)
• Mutation (still P)
[Figure: population of schedules S. Initial schedule: Slot 0: c0, Slot 1: c1, …, Slot N^2: cN^2. Evolved schedule: Slot 0: c0, c5, c7, c8; Slot 1: c23, c6, c58; …; Slot T: c42, c65, c1]
• Fitness = 1/(number of time slots)
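A minimal sketch of the initialization, fitness, and selection steps follows. The one-communication-per-slot starting point and the fitness formula come from the slide; excluding self-pairs (so N(N-1) rather than N^2 communications), the value `ps=0.5`, and all function names are assumptions for illustration.

```python
from itertools import permutations

def initial_schedule(n):
    """One communication per slot: trivially contention-free, N*(N-1) slots long.
    Assumption: a node never sends to itself, so self-pairs are left out."""
    return [[c] for c in permutations(range(n), 2)]

def fitness(schedule):
    """Fitness = 1 / (number of time slots), as on the slide."""
    return 1.0 / len(schedule)

def select(population, ps):
    """Keep the fittest ps * P schedules (ps is the selection fraction)."""
    keep = max(1, int(ps * len(population)))
    return sorted(population, key=fitness, reverse=True)[:keep]

population = [initial_schedule(4) for _ in range(10)]
survivors = select(population, ps=0.5)   # ps=0.5 is a hypothetical value
print(len(survivors), fitness(survivors[0]))
```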
Reproduction: Birds and Bees
[Figure: parent schedules S0 and S1, each a list of slots such as "c0, c3, c60, c19" and "c12, c2, c1, c60", are combined into a child schedule C assembled from slots of the parents]
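One way reproduction could work is sketched below. The slide residue does not show how slots are chosen or how communications that appear in both parents are handled, so the rule here (take a random half of one parent's slots, then the other parent's slots minus anything already covered) is an assumption, and `reproduce` is a hypothetical name.

```python
import random

def reproduce(s0, s1, rng):
    """Child = random half of s0's slots, then s1's slots with
    already-covered communications removed (empty slots dropped)."""
    child = [list(slot) for slot in rng.sample(s0, len(s0) // 2)]
    covered = {c for slot in child for c in slot}
    for slot in s1:
        rest = [c for c in slot if c not in covered]
        if rest:
            child.append(rest)
            covered.update(rest)
    return child

# Two valid full-coverage schedules for 3 nodes (source/destination contention only).
s0 = [[(0, 1), (1, 2), (2, 0)], [(0, 2), (1, 0), (2, 1)]]
s1 = [[(0, 1), (1, 0)], [(0, 2), (2, 1)], [(1, 2), (2, 0)]]
child = reproduce(s0, s1, random.Random(0))
print(child)
```

A subset of a contention-free slot is still contention-free, and every communication missing from the chosen slots appears somewhere in the second parent, so the child stays valid and keeps full coverage.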
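Mutation: Secret of the Ooze
[Figure: schedule S before and after mutation. The communications of one slot (c100, c71, c9) are moved into other slots: "c0, c3, c60, c19" becomes "c0, c3, c60, c19, c9", "c27, c4" becomes "c27, c4, c100", and "c1, c17, c23" becomes "c1, c17, c23, c71"]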
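The slide appears to show mutation dissolving one slot and moving its communications into other slots. A hedged sketch of that reading: only source and destination contention are checked (topology contention is left out for brevity), the probability is assumed to apply per schedule, and `fits` and `mutate` are hypothetical names.

```python
import random

def fits(comm, slot):
    """True if comm shares no source and no destination with anything in slot."""
    return all(comm[0] != o[0] and comm[1] != o[1] for o in slot)

def mutate(schedule, rng, prob=0.8):
    """With probability prob, dissolve one random slot and move each of its
    communications into another slot where it fits."""
    if len(schedule) < 2 or rng.random() >= prob:
        return [list(s) for s in schedule]
    victim = rng.randrange(len(schedule))
    rest = [list(s) for i, s in enumerate(schedule) if i != victim]
    leftovers = []
    for comm in schedule[victim]:
        homes = [s for s in rest if fits(comm, s)]
        if homes:
            rng.choice(homes).append(comm)
        else:
            leftovers.append(comm)      # nowhere to go: keep these together
    if leftovers:
        rest.append(leftovers)          # came from one slot, so still compatible
    return rest

schedule = [[(0, 1), (1, 2)], [(2, 0)], [(0, 2), (1, 0)], [(2, 1)]]
mutated = mutate(schedule, random.Random(1), prob=1.0)
print(len(schedule), "->", len(mutated), "slots")
```

The result never has more slots than the input, and it has fewer whenever every displaced communication finds a new home.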
Schedule Results
• Pop size = 50
• Mutation prob = 0.8
[Figures: schedule results for 16-node, 36-node, and 64-node networks]
Implementation: Photonic Switch
• 200µm rings
• Total switch size = 1.4mm x 1.4mm
• No S->W, S->E, N->W, N->E paths (X-then-Y routing)
Implementation: Switch Control
• Width of LUT = 12 (number of rings)
• Length of LUT = T (number of time slots)
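The lookup table can be pictured as T words of 12 bits, stepped through once per time slot. The sketch below assumes one bit per ring, that a set bit means the ring is switched on-resonance, and that the table index wraps with the slot counter; these details and the function names are hypothetical, not taken from the slides.

```python
NUM_RINGS = 12   # LUT width, from the slide

def build_lut(ring_sets):
    """Build one 12-bit word per time slot; bit r set = ring r driven on-resonance."""
    lut = []
    for rings in ring_sets:
        word = 0
        for r in rings:
            if not 0 <= r < NUM_RINGS:
                raise ValueError(f"ring index {r} out of range")
            word |= 1 << r
        lut.append(word)
    return lut

def ring_states(lut, slot_count):
    """Ring on/off states for the given slot counter; the LUT repeats every T slots."""
    word = lut[slot_count % len(lut)]
    return [(word >> r) & 1 for r in range(NUM_RINGS)]

# Hypothetical 3-slot schedule: rings 0 and 3 on, then none, then ring 11.
lut = build_lut([[0, 3], [], [11]])
print([format(w, "012b") for w in lut])
print(ring_states(lut, 5))
```

A static table like this needs no per-slot arbitration in the switch: each switch only has to count slots in step with the shared TDM clock.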
Implementation: Network Gateway
1. Send request
2. Grant, set x-bar and transmit to serializer
3. Receive, deserialize
4. Store in temp buffer, request to core
Simulation Setup
• PhoenixSim* – Photonic and Electronic network simulator
• 64 cores
• E-mesh, P-mesh, P-TDM
• Traffic
  • Random – 32B, 1kB, 32kB messages
  • Scientific application traces
* [Chan et al. PhoenixSim: A Simulator for Physical-Layer Analysis of Chip-Scale Photonic Interconnection Networks. In DATE 2010]
Results – Random Traffic
[Figures: results for 32B, 1kB, and 32kB messages]
Conclusion
• TDM implements fairness
• TDM improves network utilization
• Genetic Search useful for finding full-coverage static schedule
• Future Work:
  • Scaling gracefully*
  • Reducing time slots*
  • Dynamic scheduling
• Contact: gilbert@ee.columbia.edu
* [Hendry et al. Time-Division-Multiplexed Arbitration in Silicon Nanophotonic Networks-on-Chip for High Perf. CMPs. In JPDC, Jan 2011]