This paper discusses the use of PVT (process, voltage, and temperature) variations to mitigate timing emergencies in multi-core processors. It analyzes the complementary effect of these variations in both the timing and frequency domains and proposes an implementation scheme using delay sensors. Experimental results show the effectiveness of this approach.
Leveraging the Core-Level Complementary Effects of PVT Variations to Reduce Timing Emergencies in Multi-Core Processors • Guihai Yan (1), Xiaoyao Liang (2), Yinhe Han (1), and Xiaowei Li (1) • (1) Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences (ICT, CAS) • (2) NVIDIA Corporation • Jun. 23, 2010
Outline • Introduction to PVT variations • Analyzing “complementary effect” • Timing domain • Frequency domain • Implementation challenges & solutions • Experimental results
Introduction to variations • Variation sources • Process variation • Random dopant fluctuation • Sub-wavelength lithography • Voltage variation • Parasitic power delivery networks • Application variability • Inductive noise, IR-drop • Temperature variation • Imbalanced activity • Hotspot • We focus on the primary manifestation • Performance variation
Process variation • Sub-wavelength lithography • “What you get is not what you want” • Systematic • Random dopant fluctuations • Vth variation • Random • [Borkar, DAC’09] [Teodorescu, ISCA’08] Max frequency differs by 20%! • P variation is time-independent: “DC component” • [Aitken, ATS’07]
Temperature variation • Application-specific • Slow-varying • Milliseconds • Typical thermal constant: 2ms [Donald, ISCA’06] • Figure: measured Pentium M processor temperatures • T variation is slow-varying: “low-frequency components”
Voltage variation • Fast-changing • Inductive noise • a.k.a. L(di/dt) problem • IR-drop • Why is it harder to keep a constant voltage level? • Example • Power budget: 100W • Working voltage: 1V • Current: 100A • To keep the voltage fluctuation within ±5%, R_PDN < 0.5 mOhm • Hierarchical PDN • V variation is fast-changing: “high-frequency components”
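The resistance bound in the example follows from Ohm's law. As a quick check, assuming the ±5% tolerance applies to the 1V working voltage and that the full 100A flows through a single lumped PDN resistance (a simplification of the hierarchical PDN):

```latex
I = \frac{P}{V} = \frac{100\,\mathrm{W}}{1\,\mathrm{V}} = 100\,\mathrm{A},
\qquad
\Delta V_{\max} = 0.05 \times 1\,\mathrm{V} = 50\,\mathrm{mV},
\qquad
R_{\mathrm{PDN}} < \frac{\Delta V_{\max}}{I}
  = \frac{50\,\mathrm{mV}}{100\,\mathrm{A}} = 0.5\,\mathrm{m\Omega}
```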
Resultant impact of PVT variations • Fast cores, mild apps., low temp. • Slow cores, violent apps., high temp. • Result: timing (delay) variation
Prior solutions • Strive to compensate for P, V, and T variations individually • Mitigate P variation • ReCycle [ISCA’06], Body Bias [Micro’07], ReVIVal [ISCA’08], etc. • Stabilize V variation • Pipeline damping [ISCA’03], DeCoR [HPCA’08], etc. • Balance T variation • Hotspot [ISCA’03], DVFS + Activity Migration [ISCA’03, HPCA’01, TODAES’07], etc. • Other timing-oriented solutions • Razor [JSSC’06], EVAL [Micro’08], Tribeca [Micro’09], etc.
Our perspective • Focus on the essential timing issue • Process variation, voltage variation, and temp. variation all show up as delay variation • Design goal: minimize delay variation • The variations do not necessarily aggregate; in some cases they can cancel each other out. Hence, “complementary”
Some terms • Timing emergency (TE) • Emergency level (EL) • “Density” of TE • Define: EL = # of TEs per 100 million cycles • Violent vs. mild • Voltage • Large fluctuation = violent • Small fluctuation = mild • Temperature • “Hot” = violent • “Cool” = mild • Process • Slow corner = violent • Fast corner = mild • Figures: a delay trace over time against the threshold, marking a timing emergency; violent vs. mild voltage traces
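The EL definition above can be sketched in a few lines. This is a minimal illustration, assuming a timing emergency is a cycle whose delay exceeds the threshold (as the delay-vs-threshold figure suggests); the function name, the per-cycle trace format, and the sample values are hypothetical.

```python
def emergency_level(delays, threshold, window=100_000_000):
    """Return the number of timing emergencies per `window` cycles.

    `delays` holds one delay sample per cycle (hypothetical format).
    Assumption: a cycle is a timing emergency when its delay exceeds
    `threshold`.
    """
    if not delays:
        raise ValueError("delay trace is empty")
    emergencies = sum(1 for d in delays if d > threshold)
    # Scale the observed density to the 100-million-cycle window.
    return emergencies * window / len(delays)


# Toy trace, normalized so the threshold is 1.0 (illustrative values only).
trace = [0.90, 0.95, 1.02, 0.97, 1.05, 0.93, 0.99, 0.91, 1.01, 0.94]
el = emergency_level(trace, threshold=1.0)
```

Normalizing by the trace length lets cores be compared even when their observation windows differ in length.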
How do PVT variations complement each other? • Observation in the time domain • Core1 (T mild, V mild): excessive headroom below the delay threshold; large margin, low EL • Core2 (T violent, V violent): emergencies above the threshold; little margin, high EL • What if we exchange the threads on Core1 and Core2? • Core1 becomes T mild, V violent; Core2 becomes T violent, V mild • Mild + violent
Frequency domain analysis • Y(f) = FFT(D(t)) • Sample interval: 5ns • Span of analysis: 1ms • DC component: “P” • Low-freq. component: “T” • High-freq. component: “V”
The strength of each component of PVT variations • Components: P, T, PT • Migrating threads = “grafting” the V component
Frequency domain analysis (cont.) • Relative frequency spectrum deviations on a 2GHz quad-core processor • P: 0-100Hz, T: 100Hz-1MHz, V: 1MHz-250MHz • Potential • Core3 and Core4 are mild • Strategy • Exchange threads between Core1 and Core4, and between Core2 and Core3
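The band split can be illustrated on a synthetic delay trace. The sketch below is an assumption-laden toy, not the authors' analysis code: it uses a naive DFT from the standard library instead of an optimized FFT, a 1000-sample trace rather than the full 1ms span, and made-up amplitudes. The band edges are the ones quoted on the slide.

```python
import cmath
import math

# Band edges in Hz as quoted on the slide.
BANDS = {"P": (0.0, 100.0), "T": (100.0, 1e6), "V": (1e6, 250e6)}


def band_strengths(delays, dt):
    """Sum squared spectral amplitudes of a delay trace per P/T/V band."""
    n = len(delays)
    strengths = {name: 0.0 for name in BANDS}
    for k in range(n // 2 + 1):  # one-sided spectrum, up to Nyquist
        freq = k / (n * dt)
        # Naive DFT bin (O(n) per bin); fine for a short demo trace.
        y = sum(d * cmath.exp(-2j * math.pi * k * i / n)
                for i, d in enumerate(delays))
        # DC and Nyquist bins appear once; all others are mirrored.
        single = k == 0 or (n % 2 == 0 and k == n // 2)
        amp = (1.0 if single else 2.0) * abs(y) / n
        for name, (lo, hi) in BANDS.items():
            if lo <= freq < hi:
                strengths[name] += amp * amp
    return strengths


# Synthetic trace: constant offset (P-like), 400kHz ripple (T-like),
# 20MHz ripple (V-like). All amplitudes are illustrative.
dt = 5e-9  # 5ns sample interval, as on the slide
n = 1000
trace = [1.0
         + 0.05 * math.sin(2 * math.pi * 400e3 * i * dt)
         + 0.02 * math.sin(2 * math.pi * 20e6 * i * dt)
         for i in range(n)]
strengths = band_strengths(trace, dt)
```

With a 5ns sample interval the one-sided spectrum stops at 100MHz, so the upper part of the quoted V band contributes no bins in this sketch.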
How to exploit such “complementary effect”? • Straightforward approach • Product test: P component • Voltage sensor: V component • Temp. sensor: T component • Aging sensor, xyz sensor, ... • Pros. • Conceptually simple • Cons. • Slow: V and T sensors are slow • Not comprehensive: e.g., what about aging? • Our approach: delay sensor-based scheme • Delay sensor: (P+T) component and V component • Pros. • Fast • Comprehensive (timing) • Cons. • Needs a little trick
Implementation (cont.) • What we know: delay variation, from delay sensors • What we need to know: the strength of the PT and V components • How to bridge the gap? • Three challenges • Infer PVT components from delay values • On-the-fly thread migration decision-making • On-the-fly variation prediction
Top view of architecture • Timing Emergency Aware + Thread Migration (TEA-TM)
Infer PVT components from delay values • Use the mean delay to infer the PT component (< 1MHz): mean delay → PT component • This simplification greatly facilitates cost-efficient implementation of TEA-TM • Then, how about the “V component”?
On-the-fly TEA-TM decision making • Urgent First Policy (UFP) • Do NOT rely directly on an accurate V component • EL = PT “+” V • Basic idea: migrate the thread running on the highest-EL core to the core with the smallest PT component. Always right, but may not be optimal! • Figure: emergency level and PT component of Core1 and Core2, before and after TM • Refer to our paper for the more sophisticated “DUFP” heuristic
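A minimal sketch of the UFP decision, using the mean delay as a stand-in for each core's PT component. The function names, the data layout, the choice to swap the two cores' threads, and the rule of skipping migration when the most urgent core already has the smallest PT component are assumptions for illustration; the paper's actual mechanism may differ.

```python
def mean_delay(samples):
    """Estimate a core's PT component as the mean of its delay samples."""
    return sum(samples) / len(samples)


def urgent_first_swap(emergency_levels, pt_components):
    """Return (source, destination) core indices for UFP, or None.

    Source: the core with the highest emergency level.
    Destination: the core with the smallest PT component.
    Assumption: no migration when both are the same core.
    """
    cores = range(len(emergency_levels))
    src = max(cores, key=lambda c: emergency_levels[c])
    dst = min(cores, key=lambda c: pt_components[c])
    return None if src == dst else (src, dst)


# Hypothetical quad-core snapshot (delay samples normalized to the threshold).
delay_samples = [
    [0.98, 1.00, 0.99],  # core 0: slow and hot
    [0.90, 0.92, 0.91],  # core 1
    [0.95, 0.96, 0.94],  # core 2
    [0.88, 0.87, 0.89],  # core 3: fast and cool
]
emergency_levels = [1200, 300, 800, 50]
threads = ["A", "B", "C", "D"]  # threads[i] runs on core i

pt = [mean_delay(s) for s in delay_samples]
swap = urgent_first_swap(emergency_levels, pt)
if swap is not None:
    src, dst = swap
    # Exchange the two threads so neither core is left idle.
    threads[src], threads[dst] = threads[dst], threads[src]
```

The decision needs only the measured EL and the mean delay per core, which is why it does not depend on an accurate V-component estimate.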
On-the-fly variation prediction • Objective: reduce the emergency level in the future • Emergency level • PT component • Linear prediction mechanism • Figure: EL prediction result
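The slide names a linear prediction mechanism without giving its form. As one possible reading, the sketch below fits a least-squares line to the most recent per-interval values (EL or PT component) and extrapolates one interval ahead. The function name, the history format, and the clamping of negative predictions to zero are assumptions, not the authors' predictor.

```python
def predict_next(history):
    """Extrapolate the next value from a least-squares line through `history`.

    `history` holds one value per past migration interval, oldest first.
    Negative extrapolations are clamped to 0, since neither an emergency
    level nor a delay can be negative (assumption for this sketch).
    """
    n = len(history)
    if n == 0:
        raise ValueError("history is empty")
    if n == 1:
        return float(history[0])
    mean_x = (n - 1) / 2
    mean_y = sum(history) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(history))
    slope = sxy / sxx
    # Evaluate the fitted line at the next interval index, x = n.
    return max(0.0, mean_y + slope * (n - mean_x))


# Hypothetical EL history of one core over the last four intervals.
el_history = [100, 120, 140, 160]
predicted_el = predict_next(el_history)
```

Feeding the predicted values, rather than the last measured ones, into the migration decision targets the emergency level of the coming interval.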
Experiments • Methodology • Trace-based evaluation • Modeled processor • Quad-core • Superscalar • 2GHz • PDN • Similar to Intel Xeon 5500 quad-core microprocessor • 130W (peak 150W) • Workload
Metrics • Relative throughput loss [formula not preserved in this transcript] • Relative fairness [formula not preserved in this transcript]
Perf. overhead & EL reduction • Impact of TM interval on average EL reduction • No migration overhead accounted • When taking the migration penalty into account • Overall throughput: bounded by a large migration penalty on one side and a large emergency rate on the other • 1ms at 2GHz: migration overhead is negligible • 0.3ms at 2GHz: migration overhead < 15% • Minimal TM interval
Reduction in Relative Throughput Loss • TM Interval: 0.2ms, Accuracy: 90% • Developing more sophisticated heuristics
Fairness Improvement • 80% fairness improvement
Conclusion • Analyzed the complementary effect • in both the time and frequency domains • Presented a delay sensor-based scheme (TEA-TM) to exploit the complementary effect • Simple, cost-efficient • The experimental results show • Improved throughput • Improved fairness