Handling Global Traffic in Future CMP NoCs
Ran Manevich, Israel Cidon, and Avinoam Kolodny
QNoC Research Group, Electrical Engineering Department
Technion – Israel Institute of Technology, Haifa, Israel
SLIP 2012
Bandwidth Version of Rent’s Rule
B = k·G^R
• B – Cluster external bandwidth.
• k – Average bandwidth per module.
• G – Number of modules in a cluster.
• R – Rent’s exponent, 0<R<1.
[Figure: example cluster with G = 16 modules.]
Greenfield et al., “Implications of Rent’s Rule for NoC Design and Its Fault-Tolerance”, NOCS 2007
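The rule above can be evaluated numerically. The following is a minimal sketch, not part of the original slides: the function name `rent_bandwidth` and the sample values (k = 1, R = 0.7) are assumptions chosen only for illustration. It shows that, for 0<R<1, external bandwidth grows more slowly than the cluster size, so the external bandwidth per module (B/G) shrinks as clusters grow.

```python
def rent_bandwidth(k: float, g: int, r: float) -> float:
    """External bandwidth B = k * G**R of a cluster of G modules."""
    if g < 1:
        raise ValueError("G must be at least 1")
    if not 0.0 < r < 1.0:
        raise ValueError("Rent's exponent must satisfy 0 < R < 1")
    return k * g ** r


if __name__ == "__main__":
    # Illustrative values only (assumed): k = 1 bandwidth unit, R = 0.7.
    for g in (1, 16, 64, 256):
        b = rent_bandwidth(1.0, g, 0.7)
        print(f"G = {g:3d}  B = {b:7.2f}  B/G = {b / g:.3f}")
```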
CMP NoC Traffic Follows Rent’s Rule
[Figure labels: 2D Mesh NoC; ~Average of CMP parallel programs*]
*Heirman et al., “Rent’s Rule and Parallel Programs: Characterizing Network Traffic Behaviour”, SLIP 2008
2D Mesh – Packets Classification by Distance
• For illustration purposes, packets are classified according to the distance (Dist) between source and destination.
• Nearest Neighbor (NN) – Dist = 1
• Local – 1 < Dist < 2+K/8
• Global – Dist ≥ 2+K/8
[Figure: examples for K = 8 and K = 16.]
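The classification above can be sketched in a few lines. This is an illustration under stated assumptions, not code from the presentation: `classify_packet` is a hypothetical name, nodes are addressed as (x, y) tuples, and Dist is taken to be the Manhattan (hop) distance in a K×K mesh.

```python
def classify_packet(src: tuple[int, int], dst: tuple[int, int], k: int) -> str:
    """Classify a packet in a K x K mesh by source-destination distance."""
    # Assumption: Dist is the Manhattan (hop) distance between the nodes.
    dist = abs(src[0] - dst[0]) + abs(src[1] - dst[1])
    if dist < 1:
        raise ValueError("source and destination must differ")
    if dist == 1:
        return "nearest-neighbor"
    # Global packets satisfy Dist >= 2 + K/8; everything in between is local.
    if dist >= 2 + k / 8:
        return "global"
    return "local"


if __name__ == "__main__":
    for k in (8, 16):
        for dst in ((0, 1), (1, 1), (1, 2), (2, 2)):
            print(k, dst, classify_packet((0, 0), dst, k))
```

With K = 8 the global threshold is 3 hops; with K = 16 it is 4 hops, so a 3-hop packet is global in the smaller mesh but local in the larger one.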
Fraction of global packets decreases in large systems
[Figure: Rent’s exponent (R) = 0.7; curve label: Nearest Neighbor.]
Dominance of Global Packets in BW/Router and Light Load Latency
• Nearest Neighbor traffic is dominant in small systems.*
• In large systems:
  • Global packets are a minority.
  • Global packets dominate BW/router and average latency.
*Zarkesh-Ha et al., “Hybrid Network on Chip (HNoC): local buses with global mesh architecture”, SLIP 2010
Problem!!!
• In large systems, global packets (a minority):
  • Consume most of the network’s BW.
  • Significantly increase average light load latency.
Solution - PyraMesh
• Hierarchical 2D mesh.
• Global packets are routed through higher hierarchy levels.
• Overall hop count is reduced.
• Average latency is reduced.
• Average BW per router is reduced.
[Figure: a route from Source to Dest. takes 8 hops instead of 14!]
PyraMesh - Architecture
• K – The size of the base mesh.
• NL – Number of levels.
• NP – Number of pyramids on top of the base mesh.
• αi – Ratio between the sizes of levels i and i+1.
• Ci – Number of routers in level i that are connected to a router in level i+1 along a single dimension.
[Figure: three example configurations.]
• K = 8, NL = 2, NP = 4, αi = 4, Ci = 1
• K = 8, NL = 3, NP = 1, αi = 2, Ci = 1
• K = 8, NL = 2, NP = 1, αi = 4, Ci = 2
PyraMesh – Addressing and Routing
• Addressing – On each level i, node (X,Y) of the base mesh is represented by the nearest router in the North-East quarter: [equation not preserved]
• Routing – XY: [illustration not preserved]
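For readers unfamiliar with XY routing, the following is a minimal sketch of generic dimension-order routing on a single 2D mesh level: the packet first travels along X until the column matches, then along Y. It is not the PyraMesh router itself; the function name `xy_route` and the (x, y) tuple addressing are assumptions, and the transitions between hierarchy levels are not modeled.

```python
def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Return the routers visited from src to dst, X dimension first, then Y."""
    x, y = src
    path = [(x, y)]
    # Phase 1: move along X until the destination column is reached.
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))
    # Phase 2: move along Y until the destination row is reached.
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    print(route, "hops:", len(route) - 1)
```

The hop count is the path length minus one, which equals the Manhattan distance between source and destination.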
PyraMesh – Packets Classification
• Packets are distributed among levels i according to their travel distance (D) in the base mesh.
• DThi – Distance threshold of level i.
• If D > DThi, the packet is directed to level i+1.
• Example: DThi = 6, 12, 20
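The threshold rule above can be sketched as a small lookup. This is an illustration only, with several assumptions: `select_level` is a hypothetical name, levels are numbered from 1 with the base mesh as level 1, thresholds are sorted in ascending order, and a packet climbs one level for every threshold its distance exceeds.

```python
def select_level(distance: int, thresholds: list[int]) -> int:
    """Pick the hierarchy level for a packet with base-mesh distance D.

    thresholds[i - 1] holds DTh_i; the list is assumed sorted ascending.
    """
    level = 1  # assumption: level 1 is the base mesh
    for dth in thresholds:
        if distance > dth:
            level += 1  # D > DTh_i, so the packet is directed to level i+1
        else:
            break
    return level


if __name__ == "__main__":
    example = [6, 12, 20]  # DTh values from the slide's example
    for d in (3, 6, 7, 12, 13, 20, 21):
        print(f"D = {d:2d} -> level {select_level(d, example)}")
```

Because the comparison is strict (D > DThi), a packet whose distance equals a threshold stays on the lower level.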
PyraMesh – Optimization
(Area overhead, Wiring overhead, Maximum bandwidth per router*, Average light-load latency*) = F(K, NL, NP, αi, Ci, DThi*, R*)
[Slide labels: OPTIMIZATION OBJECTIVES, CONSTRAINTS]
Optimization Results
Example of 16x16 System, R = 0.7
• Light load latency optimized PyraMesh – packets distance thresholds: D>8, 5<D≤8, D≤5
• Throughput optimized PyraMesh – packets distance thresholds: D>18, 6<D≤18, D≤6
Light Load Latency Performance
• BMesh – The baseline mesh.
• HNoC – Hybrid Network on Chip (Zarkesh-Ha et al., SLIP 2010).
• Scaled Mesh (SMesh) – Links wider than in BMesh by the PyraMesh area overhead factor.
Our Contributions
• Characterization of Rentian traffic in large NoCs.
• The observation that global packets limit the scalability of large systems.
• PyraMesh – A novel framework for hierarchical NoC design.
Conclusions
• Global packets limit performance in large (future) CMP systems.
• PyraMesh – A novel class of hierarchical 2D mesh topologies.
• PyraMesh handles global traffic in future CMP NoCs.
Related Work
• CMesh – J. D. Balfour and W. J. Dally. “Design tradeoffs for tiled CMP on-chip networks”. International Conference on Supercomputing, 2006.
• GigaNoC – C. Puttmann, J.-C. Niemann, M. Porrmann, and U. Rückert. “GigaNoC – A hierarchical network-on-chip for scalable chip-multiprocessors”. Euromicro DSD 2007.
• Hierarchical 2-Levels 2D Mesh – M. Winter, S. Prusseit, and G. P. Fettweis. “Hierarchical routing architectures in clustered 2D-mesh networks-on-chip”. ISOCC 2010.
• Hierarchical Rings on a Mesh – S. Bourduas and Z. Zilic. “Latency reduction of global traffic in wormhole-routed meshes using hierarchical rings for global routing”. ASAP 2007.
• Long Range Links – U. Y. Ogras and R. Marculescu. “‘It’s a small world after all’: NoC performance optimization via long-range link insertion”. IEEE Trans. on Very Large Scale Integr. (VLSI) Syst. 2006.