Power and Area Modeling of NoC Components
ECE 284 On-Chip Interconnection Networks
Spring 2013

Example Modeling Problem
• Suppose we are building an application-specific NoC
• Some routers may need 3 ports, some 5, etc., depending on network topology
• Some links/ports may need 64-bit-wide buses, some 128-bit, etc., depending on data rate requirements
• Some links/ports/routers may operate at a higher clock frequency, depending on data rate
• Some ports may need a different #VCs and/or #buffers/VC, etc., depending on expected congestion
• How do we accurately estimate how much power and area each configuration requires?

NoC Modeling So Far… (ORION)
[Figure: router block diagram with SRC, SINK, Arbiter, buffers (BUF I, BUF E, BUF W, BUF N, BUF S), XBAR, and links. Labels: ORION 1.0 (2002); ORION 2.0 (2009); leakage power; clock power; logic template 6NOR + 2INV + DFF]

What Is The Problem?
• Microarchitecture mismatch
• RTL code mismatch
• Logic transformation and technology mapping mismatch
[Figure: router block diagram (Arbiter, SRC, SINK, buffers, XBAR, links) with the logic template 6NOR + 2INV + DFF]

How Bad Is It?
• Router RTL generators:
  • Netmaker (Cambridge, UK)
  • Stanford NoC (Stanford)
[Figure: estimation error; values shown: 460%, 89%]
• Why such large errors?
  • Assumed logic template inaccurate
  • Control logic not modeled
  • Implementation details missing

Improved NoC Router Power-Area Models
• Model inputs:
  • Technology parameters: interconnect parameters, device parameters, LEF / capacitance tables, etc.
  • Implementation parameters: target frequency, chip aspect ratio, row utilization
  • Architectural parameters: # of ports, # of buffers, flit-width, # of VCs, voltage, frequency
• Built using router layout data
• Closed-form models suitable for design space exploration
• Provides significant accuracy improvement compared with existing models (e.g., ORION 2.0)

Implementation Flow and Tools
• RTL generation from architecture
• Timing-driven synthesis, place and route flow
• Use a range of architectural and implementation parameters to capture the design space
• Nonparametric regression modeling
[Figure: flow diagram. Inputs: Architectural Parameters, Implementation Parameters. Stages: Router RTL (Netmaker); Synthesis (Design Compiler); Place + Route (SOC Encounter); Power / Area Reports; Model Generation (Multivariate Adaptive Regression Splines); Power and Area Models]

Design of Experiments
• Netmaker (Cambridge): fully synthesizable router RTL code
• Libraries: TSMC (1) 130G, (2) 90GP, and (3) 65GP
• Tool chain: Synopsys Design Compiler (DC), Cadence SOC Encounter (SOCE), Salford MARS 3.0
• Experimental axes:
  • Technology nodes: {130nm, 90nm, 65nm}
  • Implementation parameters:
    • fclk = target clock frequency
    • ar = aspect ratio
    • util = row utilization
  • Architectural parameters:
    • fw = flit-width
    • nvc = number of virtual channels
    • nport = number of input/output ports
    • lbuf = buffer length (#flit buffers / VC)

Modeling Problem
• Accurately predict y given a vector of parameters x
• Difficulties: (1) which variables x to use, and (2) how different variables combine to generate y
• Parametric regression: requires a functional form
• Nonparametric regression: learns the best model from the data itself
  • For our purpose, this allows decoupling the underlying architecture / implementation from the modeling effort
• We use nonparametric regression to model the power and area of an on-chip router

Multivariate Adaptive Regression Splines (MARS)
• MARS is a nonparametric regression technique
• MARS builds models of the form: $\hat{y} = \sum_i c_i B_i(\vec{x})$
• Each basis function $B_i(\vec{x})$ can be:
  • a constant
  • a "hinge" function max(0, c - x) or max(0, x - c)
  • a product of two or more hinge functions
• Two modeling steps:
  • (1) forward pass: obtains a model with a defined maximum number of terms
  • (2) backward pass: improves generality by pruning terms to avoid an overfit model

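The hinge-based model form on this slide can be sketched in a few lines of Python. This is a minimal illustration of evaluating a model that is a weighted sum of hinge products; the function names (`hinge`, `mars_predict`) and every coefficient and knot in `toy_terms` are hypothetical and are not taken from the fitted router models. It does not implement the forward or backward pass.

```python
def hinge(x, knot, direction=+1):
    """Hinge basis: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(terms, intercept, params):
    """Evaluate y_hat = intercept + sum_i c_i * B_i(x).

    Each term is (coefficient, [(param_name, knot, direction), ...]);
    the basis function of a term is the product of its hinge factors.
    """
    y_hat = intercept
    for coeff, factors in terms:
        basis = 1.0
        for name, knot, direction in factors:
            basis *= hinge(params[name], knot, direction)
        y_hat += coeff * basis
    return y_hat


# Hypothetical toy model: coefficients and knots are invented for illustration.
toy_terms = [
    (2.0, [("nport", 5, +1)]),                     # 2.0 * max(0, nport - 5)
    (-1.0, [("nport", 5, -1)]),                    # -1.0 * max(0, 5 - nport)
    (0.5, [("nport", 5, +1), ("fclk", 200, +1)]),  # product of two hinges
]

if __name__ == "__main__":
    print(mars_predict(toy_terms, 10.0, {"nport": 7, "fclk": 300}))
    print(mars_predict(toy_terms, 10.0, {"nport": 3, "fclk": 100}))
```

Representing each term as a list of hinge factors keeps constants (empty list), single hinges, and hinge products under one evaluation loop.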
Power and Area Modeling
• Derive models for both dynamic and leakage power
  • Dynamic power is due to switching capacitance (cswitching)
    • Pdynamic = 0.5×α×cswitching×V²×fclk
  • Leakage power is due to leakage current (ileak) (subthreshold + gate)
    • Pleakage = ileak×V
• Our modeling task:
  • To model the dependence of Pdynamic / (α×V²×fclk) on microarchitectural and implementation parameters
  • To model the dependence of Pleakage / V on microarchitectural and implementation parameters
• Similarly, we model the dependence of extracted area on microarchitectural and implementation parameters
  • Area is the sum of standard cell areas

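As a quick numeric check of the two power equations, the sketch below evaluates them and the normalized quantity Pdynamic / (α×V²×fclk) that the slide names as the modeling target. The function names and the example operating point are assumptions chosen for illustration, not measured router data.

```python
def dynamic_power(c_switching, alpha, vdd, f_clk):
    """P_dynamic = 0.5 * alpha * c_switching * V^2 * f_clk (SI units: F, V, Hz -> W)."""
    return 0.5 * alpha * c_switching * vdd ** 2 * f_clk


def leakage_power(i_leak, vdd):
    """P_leakage = i_leak * V (A, V -> W)."""
    return i_leak * vdd


def normalized_dynamic(p_dynamic, alpha, vdd, f_clk):
    """Quantity modeled by regression: P_dynamic / (alpha * V^2 * f_clk) = 0.5 * c_switching."""
    return p_dynamic / (alpha * vdd ** 2 * f_clk)


if __name__ == "__main__":
    # Hypothetical operating point: 10 pF switched, activity 0.2, 1.0 V, 500 MHz.
    p_dyn = dynamic_power(10e-12, 0.2, 1.0, 500e6)
    print(p_dyn, normalized_dynamic(p_dyn, 0.2, 1.0, 500e6))
    print(leakage_power(1e-3, 1.0))
```

Normalizing out α, V, and fclk leaves a quantity proportional to switching capacitance, which is what depends on the microarchitectural and implementation parameters.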
Example MARS Output Models (1)
Dynamic power model of a router in 65nm technology
  B1 = max(0, nport - 5);
  B2 = max(0, 5 - nport);
  …
  B34 = max(0, fclk - 200)×B1;
  B35 = max(0, 200 - fclk)×B1;
  Pdynamic = 0.5×α×(0.83 + 0.64×B1 - 0.31×B2 + 0.16×B3 … - 0.003×B33 + 0.003×B34 - 0.003×B35)×V²
Leakage power model of a router in 65nm technology
  B1 = max(0, nport - 5);
  B2 = max(0, 5 - nport);
  …
  B34 = max(0, nvc - 3)×B27;
  B35 = max(0, 3 - nvc)×B27;
  Pleakage = (0.13 + 0.04×B1 - 0.04×B2 + 0.01×B3 … - 6.59E-5×B34 - 5.53E-5×B35)×V

Example MARS Output Models (2)
Area model of a router in 65nm technology
  B1 = max(0, nport - 5);
  B2 = max(0, 5 - nport);
  …
  B34 = max(0, 24 - fw)×B14;
  B35 = max(0, fclk - 100)×B15;
  Area = 0.02 + 0.01×B1 - 0.004×B2 + 0.003×B3 … - 4.59E-6×B34 - 1.23E-7×B35
Total wirelength model of a router in 65nm technology (NEW)
  B1 = max(0, nport - 5);
  B2 = max(0, 5 - nport);
  …
  B33 = max(0, 1 - ar)×B26;
  B34 = max(0, util - 0.7)×B8;
  WLtotal = 112269 + 64952.4×B1 - 31881.3×B2 … + 157.639×B33 - 321.06×B34
• Closed-form expressions with respect to architectural and implementation parameters
• Suitable to drive early-stage architecture-level design exploration

Model Comparison (1)
• Comparison against ORION 2.0 w.r.t. microarchitectural parameters: (1) #VC (nvc), (2) flit-width (fw), (3) #port (nport), and (4) buffer length (lbuf)

Model Comparison (2)
• Power estimation error reductions
  • vs. Reg.: avg error reduced by 76.2% (24.4% → 5.8%), max error reduced by 45.2% (108.4% → 59.4%)
  • vs. ORION 2.0: avg error reduced by 82.3% (32.8% → 5.8%), max error reduced by 27.4% (81.8% → 59.4%)
• Area estimation error reductions
  • vs. Reg.: avg error reduced by 79.4% (26.2% → 5.4%), max error reduced by 45.5% (111.3% → 61.8%)
  • vs. ORION 2.0: avg error reduced by 83.8% (33.3% → 5.4%), max error reduced by 28.3% (86.2% → 61.8%)

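The headline percentages on this slide read as relative reductions computed from the parenthesized before/after error pairs. The short sketch below checks that reading against the power-estimation numbers; the helper name `relative_reduction` is an assumption.

```python
def relative_reduction(before_pct, after_pct):
    """Relative error reduction in percent: 100 * (before - after) / before."""
    return 100.0 * (before_pct - after_pct) / before_pct


if __name__ == "__main__":
    # Power estimation pairs taken from the slide: (before, after) errors in percent.
    for label, before, after in [
        ("Reg. avg", 24.4, 5.8),
        ("Reg. max", 108.4, 59.4),
        ("ORION 2.0 avg", 32.8, 5.8),
        ("ORION 2.0 max", 81.8, 59.4),
    ]:
        print(label, round(relative_reduction(before, after), 1))
```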
Metamodeling
• MARS (Multivariate Adaptive Regression Splines) is one metamodeling technique for building nonparametric models from training data.
• Nonparametric modeling (and machine learning in general) is a very active area of research, and many other nonparametric modeling techniques exist.

Brief Background on Metamodeling
• General form of estimation [equation not recoverable from the extracted text], with terms:
  • Predicted response
  • Deterministic response
  • Random noise function
  • Regression coefficients

Metamodel Classification
• Tree-based
  • MARS
• Gaussian process-based
  • RBF (Radial Basis Function)
  • KG (Kriging)

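Regression Function: MARS
• Each basis function: $B_i(\vec{x}) = \prod_{j=1}^{I_i} \max\bigl(0,\; b_{ji}\,(x_{v(j,i)} - t_{ji})\bigr)$
• where:
  • $I_i$: # of interactions in the ith basis function
  • $b_{ji}$: ±1
  • $x_v$: vth parameter
  • $t_{ji}$: knot location
• Knot = value of the parameter where the line segment changes slope
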
Regression Function: RBF
• $\hat{y}(\vec{x}) = \sum_j a_j K(\vec{x};\, \vec{\mu}_j, r_j)$
• where:
  • $a_j$: coefficients of the kernel function
  • $K(\cdot)$: kernel function
  • $\vec{\mu}_j$: centroid
  • $r_j$: scaling factors

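To make the roles of the coefficients, centroids, and scaling factors concrete, here is a minimal sketch of an RBF prediction. It assumes a Gaussian kernel and a plain weighted sum with no constant term; the slide does not state which kernel was used, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, centroid, scale):
    """Assumed kernel: exp(-||x - centroid||^2 / (2 * scale^2))."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, centroid))
    return math.exp(-dist_sq / (2.0 * scale ** 2))


def rbf_predict(x, coeffs, centroids, scales):
    """Weighted sum of kernels: sum_j a_j * K(x; mu_j, r_j)."""
    return sum(
        a * gaussian_kernel(x, mu, r)
        for a, mu, r in zip(coeffs, centroids, scales)
    )


if __name__ == "__main__":
    # Hypothetical two-kernel model over two parameters.
    print(rbf_predict((1.0, 2.0), [2.5, -1.0], [(1.0, 2.0), (4.0, 6.0)], [1.0, 5.0]))
```

Each kernel contributes most near its centroid, and the scaling factor sets how quickly that contribution decays with distance.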
Regression Function: KG
• [Equation not recoverable from the extracted text]
• where:
  • R(.): correlation function (Gaussian, linear, spherical, cubic, …)
  • correlation function parameter (symbol not preserved)

Multicollinearity at High-D
• Occurs if a parameter $x_i$ is a linear combination of one or more other $x_j$'s
  • The (N × D) matrix of parameters $x$ is ill-conditioned
  • Large variance in the estimated regression coefficients
  • The proper relationship between the $x$'s and $y$ is hard to determine
• Impact on estimation results
  • Large errors between $\hat{y}$ and $y$ as D increases
• Diagnostic tests to detect multicollinearity
  • Variance Inflation Factor (VIF)
  • F-test
  • ANOVA

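The Variance Inflation Factor can be illustrated for the simplest case of two parameters, where VIF = 1 / (1 - r²) and r is their Pearson correlation. This is a sketch under that two-parameter assumption; with more parameters, r² is replaced by the R² from regressing one parameter on all the others. The function names and sample data are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF for the two-predictor case: 1 / (1 - r^2); infinite if exactly collinear."""
    r = pearson_r(x1, x2)
    denom = 1.0 - r * r
    if denom <= 1e-12:
        return math.inf
    return 1.0 / denom


if __name__ == "__main__":
    x1 = [1.0, 2.0, 3.0, 4.0]
    print(vif_two_predictors(x1, [1.0, -1.0, -1.0, 1.0]))  # uncorrelated columns
    print(vif_two_predictors(x1, [2.1, 3.9, 6.2, 7.8]))    # nearly collinear columns
```

A VIF near 1 indicates little collinearity; a common rule of thumb treats values above about 10 as a warning sign.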
Hybrid Surrogate Modeling
• "Cure" the adverse effects of multicollinearity as D increases
• Variant of Weighted Surrogate Modeling, but uses least-squares regression to determine the weights:
  $\hat{y}_{HSM} = w_1 \hat{y}_{MARS} + w_2 \hat{y}_{RBF} + w_3 \hat{y}_{KG}$
• where:
  • $w_1$: weight of the predicted response of the surrogate model for MARS
  • $w_2$: weight of the predicted response of the surrogate model for RBF
  • $w_3$: weight of the predicted response of the surrogate model for KG

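The weight-fitting step can be sketched as ordinary least squares over the surrogate predictions. The sketch below assumes a plain weighted sum with no intercept and solves the normal equations with a small Gaussian elimination; the function names and the toy prediction vectors are hypothetical, and the slide does not specify the solver actually used.

```python
def solve_linear(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - tail) / m[r][r]
    return w


def hsm_weights(predictions, golden):
    """Least-squares weights; predictions holds one list of N outputs per surrogate."""
    k = len(predictions)
    gram = [
        [sum(p * q for p, q in zip(predictions[i], predictions[j])) for j in range(k)]
        for i in range(k)
    ]
    rhs = [sum(p * y for p, y in zip(predictions[i], golden)) for i in range(k)]
    return solve_linear(gram, rhs)


def hsm_predict(weights, model_outputs):
    """Hybrid response: weighted sum of the individual surrogate responses."""
    return sum(w * y for w, y in zip(weights, model_outputs))


if __name__ == "__main__":
    # Hypothetical surrogate outputs (MARS, RBF, KG) at four training points.
    preds = [[1.0, 2.0, 3.0, 5.0], [2.0, 1.0, 4.0, 3.0], [1.0, 3.0, 2.0, 4.0]]
    golden = [1.5, 1.8, 3.2, 3.7]
    print(hsm_weights(preds, golden))
```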
Metamodeling Flow
[Figure: flow diagram with the following boxes]
• Generate golden data points
• Generate training samples (LHS, AS)
• Derive model (MARS/RBF/KG/…)
• Surrogate models
• Generate test data points
• Estimate response
• Compute model accuracy

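The final "compute model accuracy" step can be sketched as average and maximum percent error of the estimated responses against golden test data. The metric definition (absolute error relative to the golden value) and the function names are assumptions; the slides report average and maximum errors but do not spell out the formula.

```python
def percent_errors(predicted, golden):
    """Absolute percent error of each prediction relative to its golden value."""
    return [100.0 * abs(p - g) / abs(g) for p, g in zip(predicted, golden)]


def accuracy_summary(predicted, golden):
    """Return (average percent error, maximum percent error) over the test points."""
    errs = percent_errors(predicted, golden)
    return sum(errs) / len(errs), max(errs)


if __name__ == "__main__":
    # Hypothetical estimated vs. golden responses at three test points.
    print(accuracy_summary([110.0, 95.0, 200.0], [100.0, 100.0, 200.0]))
```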
Latin Hypercube Sampling
• Sample uniformly ("exploration") across the parameter space
[Figure: fit obtained with only 5 samples, with the resulting error marked]

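A minimal Latin hypercube sampler can be written with the standard library alone. The sketch splits each parameter range into as many equal strata as there are samples, draws one point per stratum, and shuffles each dimension independently. The function name and the example parameter ranges are hypothetical.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Return n_samples points; each dimension has exactly one point per stratum.

    bounds is a list of (low, high) pairs, one per parameter.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniformly placed point inside each of the n_samples equal strata.
        points = [
            low + (high - low) * (i + rng.random()) / n_samples
            for i in range(n_samples)
        ]
        rng.shuffle(points)  # decouple stratum order across dimensions
        columns.append(points)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


if __name__ == "__main__":
    # Hypothetical ranges: row utilization in [0, 1), clock target in [100, 500) MHz.
    for point in latin_hypercube(5, [(0.0, 1.0), (100.0, 500.0)], seed=0):
        print(point)
```

With 5 samples, every parameter is covered in each fifth of its range, which is what gives the uniform "exploration" of the space.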
Adaptive Sampling
• Sample using "exploration" and "exploitation" across the parameter space
[Figure: fit obtained with only 5 samples, with the resulting error marked]

Maximum Estimation Error: NoC
• With a training sample set size of 36 data points
• RBF and KG (Gaussian process-based) generally have 1.5x less error than MARS (tree-based)
• HSM can have up to 3x less error than MARS

Modeling Other NoC Components and Applications
• The same approach, going from a training set to nonparametric models, can be applied to other NoC components such as the network link.
• This approach has also been successfully applied to other VLSI CAD problems, such as clock-tree synthesis and power network modeling.