
Microprocessor Design in the Face of Process Variations

Explore the impact of process variations on microprocessor designs and innovative strategies to mitigate performance deviations. Learn about resilient pipeline and adaptive cache concepts overcoming physical and environmental factors in semiconductor fabrication, with case studies and practical outcomes.



Presentation Transcript


  1. Microprocessor Design in the Face of Process Variations Csaba Andras Moritz Electrical & Computer Engineering University of Massachusetts, Amherst

  2. Outline • Introduction • Impact of Process Variations • A Process Variation Resilient Pipeline • A Process Variation Resilient Adaptive Cache Architecture • Results • Conclusion

  3. Introduction • As technology scales, feature sizes shrink, which requires an increasingly sophisticated fabrication process. • Process variations increase as feature sizes shrink because small structures are difficult to fabricate consistently across a die or a wafer. • These variations cause mismatches between nominally identical structures. • For circuits, this translates into deviations of device and interconnect parameters from their mean values. Figure: Device and interconnect variation trends for different technology generations (trends since 2007 are even worse). Source: Sani Nassif et al., “Models of Process Variations in Device and Interconnect”, IEEE Press, 2000.

  4. Introduction • Two main sources of process variation: • Physical factors (intrinsic variation) • Environmental factors (dynamic variation) • The physical factors are permanent and result from limitations in the fabrication process: • Effective channel length (geometric variations): • Imperfections in photolithography (mask, lens, photo system deviations) • Threshold voltage (electrical parameter variation): • Variation in device geometry • Random dopant fluctuations • Changes in oxide thickness • The environmental factors depend on the operation of the circuit and include variations in: • Temperature, power supply, switching activity • The performance and power consumption of integrated circuits can be greatly affected.

  5. Pipeline design • 10-20 gate delays typically • Smaller in very high clock-speed designs • Let us review variation with a NAND chain

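  6. 15 NAND Gates [Figure: a chain of 15 NAND gates driving a load capacitance Cload, with a NAND2 gate schematic showing inputs A, B, C and body-bias terminals VBP and VBN. Inputs: A = “1”, B = “0”→“1”, C = “1”→“0”; the remaining inputs are tied to “1”.]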

  7. Assumptions • The designs we show target a 32-nm technology process, where leakage and process variation started to become significant. • The problem is exacerbated in later nodes. • In the nominal case we assume process variations have no impact on the pipeline stage. • In the worst case we assume, at each transistor, the parameter-variation values that result in the maximum delay or power consumption. • A body bias is a voltage applied between the source or drain of a transistor and its substrate, effectively changing the transistor’s Vth. • Depending on the polarity of the voltage applied, Vth increases or decreases. If it increases, the transistor becomes less leaky and slower (reverse body bias); if it decreases, the transistor becomes leakier and faster (forward body bias). • Table 1 shows parameter values of process variations for different cases. Figure 3 and Table 2 show the delay of the pipeline stage at different body bias voltages. Figure 4 and Table 3 show the average power consumption of the pipeline stage at different body bias voltages.
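The speed/leakage trade-off of body biasing described above can be illustrated with a minimal sketch. It is not the model behind the slides' HSPICE results: it assumes the textbook alpha-power-law delay and exponential subthreshold leakage, and the nominal Vth, the Vth shift, `alpha`, and the slope factor `n` are illustrative assumptions. Only the 0.9 V supply comes from the slides.

```python
import math

VDD = 0.9            # supply voltage stated in the experiment setup slide
VTH_NOMINAL = 0.30   # assumed nominal threshold voltage (V), illustrative only
VTH_SHIFT = 0.05     # assumed Vth change caused by body bias (V), illustrative only


def relative_delay(vdd, vth, alpha=1.3):
    """Gate delay up to a constant factor (alpha-power law; alpha is assumed)."""
    return vdd / (vdd - vth) ** alpha


def relative_leakage(vth, n=1.5, v_thermal=0.026):
    """Subthreshold leakage up to a constant factor (n and v_thermal assumed)."""
    return math.exp(-vth / (n * v_thermal))


cases = {
    "forward body bias (lower Vth)": VTH_NOMINAL - VTH_SHIFT,
    "no body bias": VTH_NOMINAL,
    "reverse body bias (higher Vth)": VTH_NOMINAL + VTH_SHIFT,
}

for name, vth in cases.items():
    delay = relative_delay(VDD, vth) / relative_delay(VDD, VTH_NOMINAL)
    leak = relative_leakage(vth) / relative_leakage(VTH_NOMINAL)
    print(f"{name}: delay x{delay:.2f}, leakage x{leak:.2f}")
```

Under these assumptions, forward body bias gives a delay factor below 1 and a leakage factor above 1, and reverse body bias does the opposite, which matches the qualitative statement on the slide.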

  8. Device parameter variations • Leff, Vdd, and Vth Table 1. Parameter values for different cases

  9. Delay of Pipeline Stage • Table 2. Delay of the pipeline stage. • “A 28nm SOI test chip by STMicroelectronics ARM Cortex-A57 used forward body bias to increase the peak frequency to 3GHz. By operating at a reduced nominal 0.5V supply and FBB, the circuitry ran at a much slower 300MHz but with reduced power consumption.” Source: STMicroelectronics

  10. Delay of Pipeline Stage

  11. Power of Pipe Stage Table 3. Average power of the pipeline stage.

  12. Average Power with BB

  13. Effect of BB on delay and power Table 4. Effect of Body Bias Technique.

  14. Delay Distribution

  15. All parameters summary Table 5. Effect of all parameters on pipeline delay

  16. Nominal Power Distribution

  17. Summary power consumption Table 6. Effect of all parameters on pipeline power consumption.

  18. Razor Latches • Latch concept: sample the output of a stage at two different times • Compare the two samples • If they are not equal, resample the inter-stage latch and delay the pipeline by one cycle • Implications?
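The double-sampling idea can be sketched behaviorally as follows. This is a simplified model, not a circuit description: `razor_sample`, `shadow_skew`, and the rule that a late stage leaves the old value in the main latch are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class RazorSample:
    main: int     # value captured by the regular latch at the clock edge
    shadow: int   # value captured by the shadow latch slightly later

    @property
    def error(self) -> bool:
        """A mismatch means the stage missed the clock edge."""
        return self.main != self.shadow


def razor_sample(new_value, old_value, stage_delay, clock_period, shadow_skew):
    """Sample a stage output twice: at clock_period and at clock_period + shadow_skew.

    Assumption: if the stage has not settled by the first sample, the main latch
    still holds old_value. The shadow latch is assumed to always see new_value,
    so the stage delay must fit inside the shadow window.
    """
    if stage_delay > clock_period + shadow_skew:
        raise ValueError("stage delay exceeds the shadow latch window")
    main = new_value if stage_delay <= clock_period else old_value
    return RazorSample(main=main, shadow=new_value)


on_time = razor_sample(new_value=7, old_value=3, stage_delay=0.9,
                       clock_period=1.0, shadow_skew=0.5)
late = razor_sample(new_value=7, old_value=3, stage_delay=1.2,
                    clock_period=1.0, shadow_skew=0.5)
print(on_time.error, late.error)
```

In the late case the shadow value is the one to use when the inter-stage latch is resampled.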

  19. Recovery Technique 1: Global Clock Gating • If any stage detects a timing problem, stall the entire pipeline for one clock cycle. • Use this additional clock cycle to recompute using the correct shadow-latch values.
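The cycle cost of global clock gating can be sketched with a simple count. The function name and the per-cycle list of error flags are hypothetical, and the sketch assumes every stall costs exactly one cycle, as the slide states.

```python
def cycles_with_global_gating(error_flags_per_cycle):
    """Count total cycles when any stage error stalls the whole pipeline once.

    error_flags_per_cycle: one list of booleans per nominal cycle, with one
    flag per pipeline stage (True = that stage detected a timing error).
    """
    total = 0
    for stage_errors in error_flags_per_cycle:
        total += 1                # the nominal cycle
        if any(stage_errors):
            total += 1            # stall cycle: stages recompute from shadow latches
    return total


trace = [
    [False, False, False, False, False],   # clean cycle
    [False, True, False, False, False],    # one stage errs: one stall
    [True, True, False, False, False],     # two stages err: still one stall
]
print(cycles_with_global_gating(trace))    # 3 nominal cycles + 2 stalls = 5
```

The third entry shows the property that makes this scheme simple: simultaneous errors in several stages share a single stall cycle.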

  20. Recovery Technique 2: Counterflow Pipelining • When a mismatch (between regular and shadow latch contents) is detected: • Assert a bubble signal to specify that the erring pipeline slot is now to be considered a bubble. • In the subsequent cycle, inject the shadow latch value into the next stage, allowing the errant operation to continue with the correct values. • Trigger a flush train, traveling backwards from the errant stage, flushing operations at each stage it visits.

  21. Process Variation Impact on Memory Systems • Process variations are expected to become significant in the smaller-geometry transistors commonly used in memories. • Process variations in caches affect the performance of circuits such as: • Sense amplifiers, which require identical device characteristics • SRAM cells, which must remain stable at near-minimum size in large arrays for embedded, low-power applications • The delay of the address decoders suffers from process variations, which can leave less time for accessing the SRAM cells. • The question is whether the overall delay variation is significant enough to drive a change in memory architecture design.

  22. Motivation • To account for the worst-case scenario we might need to increase the cache access time by 2 to 3 cycles in conventional design. • Application performance could be impacted by as much as 30-40%! • These results suggest that process variations must be taken into consideration • New types of circuits and architectures?

  23. Introduction • There are several ideas that could be exploited in a memory system: • reduce performance by operating at a lower clock frequency (conservative approach) • increase cache access latency assuming worst-case delay (conservative approach) • variable-delay cache architecture (adaptive approach)

  24. Cache Organization Overview • The focus of this presentation is on CAM-based caches. [Figure: 32-bit virtual address with bit boundaries 31, 9, 8, 5, 4, 2, 1, 0 and fields Tag, Bank, Word, Byte; 16 banks; each cache bank has 32 lines of 8 words, with CAM tags and matchlines, a data SRAM, and a data MUX.]
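The address split in the figure can be sketched as bit-field extraction. The field-to-bit assignment used here (Tag = bits 31..9, Bank = 8..5, Word = 4..2, Byte = 1..0) is inferred from the bit boundaries and the 16-bank, 8-word organization on the slide, and the 4-byte word size is an assumption. The function name is hypothetical.

```python
def split_address(addr):
    """Split a 32-bit virtual address into the cache fields (assumed layout)."""
    return {
        "tag": (addr >> 9) & 0x7FFFFF,   # bits 31..9, compared in the CAM
        "bank": (addr >> 5) & 0xF,       # bits 8..5, selects 1 of 16 banks
        "word": (addr >> 2) & 0x7,       # bits 4..2, selects 1 of 8 words
        "byte": addr & 0x3,              # bits 1..0, byte within a word
    }


# Build an address from known field values and decode it again.
example = (5 << 9) | (3 << 5) | (6 << 2) | 1
print(split_address(example))

# Capacity check: 16 banks x 32 lines x 8 words x 4 bytes (assumed word size).
capacity_bytes = 16 * 32 * 8 * 4
print(capacity_bytes)   # 16384 bytes = 16 KB
```

Under these assumptions the capacity works out to 16 KB, which agrees with the cache size quoted later in the slides. No index field is needed because the line within a bank is found by the CAM tag match.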

  25. Critical Path of CAM-tag Cache

  26. Experiment Setup • Cadence tools were used to design the circuits at the layout level, and HSPICE simulation was used to evaluate performance. • All circuits were designed in a 32-nm CMOS technology and simulated with a supply voltage of 0.9V. Table: Configuration of our 16 KB low-power cache

  27. Worst-Case Conditions • Effective Channel Length variation: • Imperfections in photolithography (mask, lens, photo system deviations) • A 40% variation in Leff is expected within a die [Sani Nassif, IEEE press 2000].

  28. Worst-Case Conditions • Effective Channel Length variation: • A small variation in the Leff value can change the leakage power by as much as 60X from the nominal value.

  29. Worst-Case Conditions • Threshold Voltage Variation: • Accurate control of Vth is very important for many performance and power optimizations and for correct execution.

  30. Worst-Case Conditions • Threshold Voltage Variation • The impact on leakage power could be as much as 40X.

  31. Worst-Case Conditions • Power Supply Variation • Supply voltage is one of the most important environmental factors that cause variations in operating conditions. • Voltage variations are due to non-uniform power-supply distribution, switching activity, and IR drop. • A total variation of 15% in Vdd was considered, with a nominal value of 0.9V.

  32. Expected Conditions • To accurately predict the cache critical-path delay distribution at the circuit level, cache delay variability can be studied through Monte-Carlo HSPICE circuit simulations. • Monte-Carlo simulations verify model predictions over a wide range of process and design conditions and provide an estimate of the expected behavior. • We assume parameter variations to be normally distributed, with mean and sigma values derived from PTM and ITRS sources. Table: Parameter values and σ variations

  33. Nominal Expected Conditions • The delay distribution of the cache critical path was determined by Monte-Carlo sampling over different supply voltages, threshold voltages, and transistor lengths. • Under the expected conditions, a large fraction of accesses would still be close to the nominal value.
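The Monte-Carlo procedure can be sketched in a few lines. This is a stand-in for the HSPICE runs, not a reproduction of them: the analytical delay model (alpha-power law scaled by Leff), the nominal Vth, and all sigma values below are illustrative assumptions, not the PTM/ITRS-derived values from the slides. Only the 0.9 V nominal supply comes from the source.

```python
import random

VDD_MEAN, VDD_SIGMA = 0.9, 0.03     # nominal Vdd from the slides; sigma assumed
VTH_MEAN, VTH_SIGMA = 0.30, 0.03    # both assumed
LEFF_MEAN, LEFF_SIGMA = 1.0, 0.05   # normalized channel length; both assumed


def critical_path_delay(vdd, vth, leff, alpha=1.3):
    """Toy delay model (assumed): proportional to Leff * Vdd / (Vdd - Vth)^alpha."""
    overdrive = max(vdd - vth, 1e-3)   # guard against non-physical samples
    return leff * vdd / overdrive ** alpha


def sample_delays(n, seed=0):
    """Draw n normally distributed parameter sets; return delays relative to nominal."""
    rng = random.Random(seed)
    nominal = critical_path_delay(VDD_MEAN, VTH_MEAN, LEFF_MEAN)
    delays = []
    for _ in range(n):
        vdd = rng.gauss(VDD_MEAN, VDD_SIGMA)
        vth = rng.gauss(VTH_MEAN, VTH_SIGMA)
        leff = rng.gauss(LEFF_MEAN, LEFF_SIGMA)
        delays.append(critical_path_delay(vdd, vth, leff) / nominal)
    return delays


delays = sample_delays(10_000)
near_nominal = sum(1 for d in delays if abs(d - 1.0) <= 0.10) / len(delays)
print(f"fraction of samples within 10% of nominal delay: {near_nominal:.2f}")
```

The fraction printed depends entirely on the assumed sigmas, so it should not be read as a result from the presentation.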

  34. Architectural Techniques • How do we design a memory system in the face of process variations and help mitigate the negative impact on performance? • We can select a cache design using worst case assumptions • ALL VARIATIONS and ALL COMPONENTS on the critical path • Alternatively, we need to design circuits and architectures that would work adaptively depending on actual delay • Process variation resilient design • Resilience against delays in different parts of the cache

  35. Proposed Adaptive Cache Architecture • Two phases of operation: classification and execution [Figure: pipeline stages F, D, EX, MEM, WB; the cache (CAM tag and data array, with address and data paths) is extended with an adaptive controller, a test-mode classifier, and delay storage.]

  36. Classification Phase • During the classification phase: • The cache is equipped with a built-in self-test (BIST) technique to detect speed differences due to process variation. • Each cache line is tested using BIST when the test mode signal is on. Each block is classified as medium, slow, or failure. [Figure: data array with row address, column MUX, sense amplifiers, and data out; BIST, driven by test mode and operating conditions, writes speed information to the delay storage.]
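The classification step can be sketched as mapping each line's measured delay to an access latency. This assumes BIST yields a delay per cache line and that latency classes are whole clock cycles up to the 3-cycle worst case mentioned elsewhere in the slides. The function names, the dictionary standing in for the delay storage, and the sample delays are all hypothetical.

```python
import math


def classify_line(delay_ns, clock_period_ns, max_cycles=3):
    """Return the number of cycles a line needs, or None if it fails.

    Assumption: a line whose delay does not fit in max_cycles is a failure.
    """
    cycles = max(1, math.ceil(delay_ns / clock_period_ns))
    return cycles if cycles <= max_cycles else None


def build_delay_storage(measured_delays_ns, clock_period_ns):
    """Map line index -> cycles (or None), standing in for the delay storage."""
    return {
        line: classify_line(delay, clock_period_ns)
        for line, delay in enumerate(measured_delays_ns)
    }


# Hypothetical BIST measurements for four cache lines, 1 ns clock period.
delay_storage = build_delay_storage([0.8, 1.5, 2.9, 3.5], clock_period_ns=1.0)
print(delay_storage)   # {0: 1, 1: 2, 2: 3, 3: None}
```

During execution, a controller could look up the stored value for the accessed line to decide when to fire the sense amplifiers.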

  37. Execution Phase • During the execution phase: • The speed information stored in the delay storage is used to control the sense amplifiers during regular operation of the circuit. [Figure: data array with row address, column address, column MUX, sense amplifiers, and data out; the delay storage feeds a controller.]

  38. Experimental Setup • The adaptive cache architecture is implemented in SimpleScalar. • We have conducted simulations of SPEC2000 benchmarks using the adaptive approach. • The adaptive cache configuration is based on the delay distribution determined by the Monte-Carlo simulation. Table: SimpleScalar parameters for CPU

  39. Performance Speedup • Baseline: 3-cycle D-cache sized for worst-case delay, 16KB total size, 16 banks, each 32-way; out-of-order, 4-way issue. • Adaptive caching scheme: 1% of cache line accesses take 3 cycles, 24% take 2 cycles, and 75% take 1 cycle. • Results below show that performance improves by 9% to 31%.
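The latency mix above implies an average cache access latency that can be computed directly. The sketch uses only the percentages and the 3-cycle baseline from the slide; it says nothing about whole-program speedup, which depends on the benchmarks and the out-of-order core.

```python
# Fraction of cache line accesses at each latency (from the slide).
latency_mix = {1: 0.75, 2: 0.24, 3: 0.01}
BASELINE_CYCLES = 3   # worst-case design: every access takes 3 cycles

average_cycles = sum(cycles * fraction for cycles, fraction in latency_mix.items())
print(f"average access latency: {average_cycles:.2f} cycles")   # 1.26 cycles
print(f"baseline access latency: {BASELINE_CYCLES} cycles")
```

The average access latency falls from 3 cycles to 1.26 cycles. The reported application speedup of 9% to 31% is smaller than this ratio because cache access latency is only one component of execution time.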

  40. Sensitivity to Issue Width • Speedup values are normalized with respect to the worst-case delay of 3 cycles. • As we can see, the 8-way-issue design benefits more from the adaptive cache architecture than the 4-way-issue design.

  41. Hardware Required • Hardware required: • BIST circuit • Delay storage • Control circuitry • We have evaluated the hardware needed for the adaptive cache using the Synopsys Design Compiler tool.

  42. Power Issues

  43. Leakage Power Variation

  44. Leakage (contd.)

  45. Leakage (contd.)
