Low Power FPGA Using Pre-defined Dual-Vdd / Dual-Vt Fabrics Authors: Fei Li, Yan Lin and Lei He EE Department, UCLA Link: http://eda.ee.ucla.edu/pub/c43.pdf Presented by: Ahmed Abdelgawad
Outline • Background and Motivation • Configurable Dual-Vdd/Dual-Vt FPGA • Circuits • Architectures • Design Flow • Experimental Results • Conclusions
Power Limitation of FPGAs • Existing FPGAs are HIGHLY power inefficient • Over 100X power overhead vs. ASIC • Power is likely the largest limitation for FPGAs
Studies to reduce FPGA power There have been several studies to reduce FPGA power: • One study introduced hierarchical interconnects to reduce interconnect power, but it does not consider deep sub-micron effects such as the increasingly large leakage power • Another developed a flexible power evaluation framework, fpgaEva-LP, and performed dynamic and leakage power evaluation for FPGAs with cluster-based logic blocks and island-style routing structures
Solution Multi-Vdd/multi-Vt fabric and layout pattern must be pre-defined in FPGAs
Challenges in applying multi-Vdd/multi-Vt to FPGAs • Leakage power becomes a large portion of total FPGA power in 100nm technology and below, mainly because LUT-based FPGAs use a large number of SRAM cells to provide programmability • Unlike ASICs, FPGAs do not have the freedom of using mask patterns to arrange different Vdd/Vt components flexibly
In this paper • They perform the first study of dual-Vdd and dual-Vt FPGA fabrics. • They design FPGA circuits with dual-Vdd/dual-Vt to effectively reduce dynamic and leakage power. • They propose FPGA fabrics employing dual-Vdd/dual-Vt techniques. • They develop new CAD algorithms, including power-sensitivity-based voltage assignment and simulated-annealing-based placement. • They then discuss the pre-defined dual-Vdd/dual-Vt FPGA fabrics.
LUT-SVST The schematic of a 4-LUT using single Vdd and single Vt (LUT-SVST).
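The schematic itself is not reproduced here. As a functional aid, the sketch below models a 4-LUT the way such LUTs are commonly built: 16 configuration SRAM bits selected by a tree of 2:1 multiplexers driven by the four inputs. The bit ordering and the names `lut4` and `mux_tree` are illustrative assumptions, not taken from the paper.

```python
from itertools import product

def lut4(config: int, a: int, b: int, c: int, d: int) -> int:
    """Return the LUT output for inputs a..d.

    `config` is the 16-bit truth table held in the configuration SRAM cells;
    bit i holds the output for input pattern i, with `a` as the most
    significant select bit (an illustrative convention).
    """
    index = (a << 3) | (b << 2) | (c << 1) | d
    return (config >> index) & 1

def mux_tree(bits: list[int], selects: list[int]) -> int:
    """Same lookup as an explicit 2:1 mux tree: `d` drives the first level
    (closest to the SRAM cells) and `a` drives the final output mux."""
    level = list(bits)
    for s in reversed(selects):
        level = [level[2 * k + s] for k in range(len(level) // 2)]
    return level[0]

# Usage: configure the LUT as a 4-input AND (only pattern 15 outputs 1).
AND4 = 1 << 15
sram_bits = [(AND4 >> i) & 1 for i in range(16)]
for a, b, c, d in product((0, 1), repeat=4):
    assert lut4(AND4, a, b, c, d) == mux_tree(sram_bits, [a, b, c, d]) == (a & b & c & d)
```

Once the LUT is configured the SRAM bits never change; only the mux path toggles at run time, which is the property the low-leakage SRAM design exploits.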
Voltage Scaling for Single Vdd/Vt LUTs • Vdd scaling of LUT-SVST is effective at reducing dynamic power because dynamic power is proportional to the square of the supply voltage. • However, aggressive Vdd scaling can introduce a large delay penalty. • It is important to choose an appropriate Vt for each Vdd level to obtain the best power-delay trade-off.
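A quick numerical illustration of this trade-off, under two assumptions that are not from the paper: dynamic power scales as C·Vdd²·f at fixed capacitance and frequency, and gate delay follows the Sakurai-Newton alpha-power law, t_d ∝ Vdd / (Vdd - Vt)^α, with α = 1.3. The nominal 1.3 V / 0.3 V operating point is hypothetical.

```python
def dynamic_power_ratio(vdd: float, vdd_nom: float) -> float:
    """P_dyn(vdd) / P_dyn(vdd_nom) with capacitance and frequency held fixed."""
    return (vdd / vdd_nom) ** 2

def alpha_power_delay(vdd: float, vt: float, alpha: float = 1.3) -> float:
    """Alpha-power-law gate delay up to a constant factor: Vdd / (Vdd - Vt)^alpha."""
    if vdd <= vt:
        raise ValueError("Vdd must exceed Vt")
    return vdd / (vdd - vt) ** alpha

VDD_NOM, VT_NOM = 1.3, 0.3  # hypothetical nominal operating point
base = alpha_power_delay(VDD_NOM, VT_NOM)

for vdd in (1.3, 1.1, 0.9, 0.7):
    vt_fixed_ratio = VT_NOM * vdd / VDD_NOM  # Vt scaled in proportion to Vdd
    print(f"Vdd={vdd:.1f} V  P_dyn x{dynamic_power_ratio(vdd, VDD_NOM):.2f}  "
          f"delay x{alpha_power_delay(vdd, VT_NOM) / base:.2f} (constant Vt), "
          f"x{alpha_power_delay(vdd, vt_fixed_ratio) / base:.2f} (fixed Vdd/Vt ratio)")
```

Under these assumptions, lowering Vdd with Vt held constant slows the gate much more than lowering Vt in proportion to Vdd, which is why Vt has to be chosen together with Vdd.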
Voltage Scaling for Single Vdd/Vt LUTs Delay versus different Vdd scaling schemes for a 4-LUT.
Voltage Scaling for Single Vdd/Vt LUTs • Although fixed-Vdd/Vt-ratio scaling can alleviate the delay penalty compared to constant-Vt scaling, leakage power increases greatly under this scheme, because leakage current increases exponentially as Vt decreases. Leakage power (at 100°C) versus different Vdd scaling schemes for a 4-LUT.
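The exponential dependence can be made concrete with the standard first-order subthreshold model, I_sub ∝ exp(-Vt / (n·kT/q)). The slope factor n = 1.5, the neglect of DIBL, and the 1.3 V / 0.3 V starting point are assumptions for illustration; the 100°C temperature matches the figure.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # Boltzmann constant / electron charge, V/K

def thermal_voltage(temp_c: float) -> float:
    """kT/q in volts at the given temperature in Celsius."""
    return K_B_OVER_Q * (temp_c + 273.15)

def leakage_ratio(vt_new: float, vt_old: float,
                  temp_c: float = 100.0, n: float = 1.5) -> float:
    """I_sub(vt_new) / I_sub(vt_old) with I_sub ~ exp(-Vt / (n * kT/q)).

    Assumed slope factor n; DIBL and other Vdd dependence are ignored."""
    return math.exp((vt_old - vt_new) / (n * thermal_voltage(temp_c)))

# Fixed-Vdd/Vt-ratio scaling from a hypothetical 1.3 V / 0.3 V point down to 0.7 V.
vt_scaled = 0.3 * 0.7 / 1.3
print(f"Vt {vt_scaled:.3f} V -> subthreshold current x{leakage_ratio(vt_scaled, 0.3):.1f}")
```

With these assumed values, dropping Vt from 0.3 V to about 0.16 V multiplies the subthreshold current by roughly 18X, even though Vdd has been almost halved.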
Voltage Scaling for Single Vdd/Vt LUTs • Since leakage power is already a large portion of total FPGA power in nanometer technologies, FPGA designs cannot afford the increased leakage power caused by the fixed-Vdd/Vt-ratio scaling scheme. • Based on the above two Vdd scaling schemes, a constant-leakage Vdd scaling scheme is proposed. • For each Vdd level, the threshold voltage is adjusted to maintain an almost constant leakage power across all Vdd levels.
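Under a simplified model, the constant-leakage rule has a closed form. Assuming leakage power P = Vdd · I0 · exp(-Vt / (n·kT/q)) (no DIBL, assumed n = 1.5, 100°C), holding P equal to its value at a nominal (Vdd_nom, Vt_nom) gives Vt = Vt_nom + n·(kT/q)·ln(Vdd / Vdd_nom). The paper's actual Vt values come from device-level tuning; this is only a sketch of the idea.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # V/K

def _vtherm(temp_c: float) -> float:
    return K_B_OVER_Q * (temp_c + 273.15)

def leakage_power(vdd: float, vt: float, temp_c: float = 100.0,
                  n: float = 1.5, i0: float = 1.0) -> float:
    """Relative leakage power Vdd * I0 * exp(-Vt / (n * kT/q)); simplified, no DIBL."""
    return vdd * i0 * math.exp(-vt / (n * _vtherm(temp_c)))

def constant_leakage_vt(vdd: float, vdd_nom: float, vt_nom: float,
                        temp_c: float = 100.0, n: float = 1.5) -> float:
    """Vt at which leakage_power(vdd, Vt) equals leakage_power(vdd_nom, vt_nom).

    Solving vdd * exp(-Vt/(n vT)) = vdd_nom * exp(-vt_nom/(n vT)) for Vt gives
    Vt = vt_nom + n * vT * ln(vdd / vdd_nom)."""
    return vt_nom + n * _vtherm(temp_c) * math.log(vdd / vdd_nom)

for vdd in (1.3, 1.1, 0.9, 0.7):
    vt = constant_leakage_vt(vdd, 1.3, 0.3)  # hypothetical nominal point
    ratio = leakage_power(vdd, vt) / leakage_power(1.3, 0.3)
    print(f"Vdd={vdd:.1f} V  Vt={vt:.3f} V  leakage power x{ratio:.3f}")
```

In this model the required Vt reduction is small: at Vdd = 0.7 V the constant-leakage Vt is about 0.27 V, compared with about 0.16 V under fixed-ratio scaling from the same starting point.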
Low-leakage SRAM and Dual-Vt LUTs They design low-leakage LUTs with single Vdd and dual Vt (named LUT-SVDT). The schematic of a 4-LUT using single Vdd and dual Vt (LUT-SVDT).
Low-leakage SRAM and Dual-Vt LUTs • Note that the two regions are DC-disconnected due to the inverters at the outputs of the SRAM cells. • The content of the SRAM cells does not change after the LUT is configured, and the SRAM cells always stay in the read state. • Therefore, the threshold voltage of region I (the SRAM cells) can be increased to reduce leakage power without introducing a runtime delay penalty. • Vdd and Vt in a LUT-SVDT are determined as follows. • For region II, the Vdd/Vt combination is chosen by the constant-leakage Vdd scaling scheme. • For region I, the same Vdd as region II is used, but Vt is increased.
LUT-SVST and LUT-SVDT • LUT-SVDT obtains an average 2.4X LUT leakage reduction compared to LUT-SVST across different Vdd levels. The delay of LUT-SVDT is almost the same as that of LUT-SVST. Delay and power comparison between LUT-SVST and LUT-SVDT in the ITRS 100nm technology
LUT-SVST and LUT-SVDT • The high-Vt low-leakage SRAM cells can be used for the programmability of both interconnects and logic blocks. • Ideally, Vt could be increased as high as possible to achieve maximum leakage reduction without a delay penalty. • However, an extremely high Vt increases the SRAM write access time and slows down FPGA configuration. • They choose to increase the SRAM Vt to obtain a 15X SRAM leakage reduction, which increases configuration time by only 13%.
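As a back-of-the-envelope check, the first-order subthreshold model I_sub ∝ exp(-Vt / (n·kT/q)) gives the Vt increase needed for a given leakage reduction as ΔVt = n·(kT/q)·ln(reduction). The slope factor n = 1.5, the 100°C temperature and the neglect of DIBL are assumptions; this is not the paper's device-level value.

```python
import math

K_B_OVER_Q = 8.617333262e-5  # V/K

def vt_increase_for_reduction(factor: float, temp_c: float = 100.0,
                              n: float = 1.5) -> float:
    """Delta Vt (volts) that divides subthreshold leakage by `factor`,
    from I_sub ~ exp(-Vt / (n * kT/q)): Delta Vt = n * (kT/q) * ln(factor)."""
    vtherm = K_B_OVER_Q * (temp_c + 273.15)
    return n * vtherm * math.log(factor)

dvt = vt_increase_for_reduction(15.0)  # the 15X SRAM leakage target on the slide
print(f"Vt increase for 15X: {dvt * 1000:.0f} mV (n = 1.5, 100 C assumed)")
```

Under these assumptions a 15X reduction corresponds to roughly a 0.13 V increase in SRAM Vt.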
FPGA Fabrics An FPGA with cluster-based logic blocks and island-style routing structures.
FPGA Fabrics The new fabric with dual Vdd and dual Vt is called arch-DVDT. • It uses low-leakage SRAM cells for all LUTs and interconnects, and employs a single Vdd within each logic block. • However, logic blocks across the FPGA chip can have different supply voltages. The physical locations of these logic blocks define a dual-Vdd layout pattern.
FPGA Fabrics Pre-designed dual-Vdd layout patterns for the dual-Vdd logic block fabric (arch-DVDT).
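The figure's exact patterns are not reproduced here. As a hedged sketch, a pre-defined pattern can be represented as a grid of "L" (VddL) and "H" (VddH) logic-block slots. Both generators below are hypothetical: one assigns Vdd per row, the other staggers VddH cells so that every row mixes both supplies. The default `low_per_high = 3` mirrors the 3:1 VddL:VddH ratio given in the experimental results; where the VddH rows and cells fall within each period is an assumption.

```python
def row_pattern(n_rows: int, n_cols: int, low_per_high: int = 3) -> list[list[str]]:
    """Row-based pattern: in every group of (low_per_high + 1) rows, the last is VddH."""
    period = low_per_high + 1
    return [["H" if r % period == period - 1 else "L" for _ in range(n_cols)]
            for r in range(n_rows)]

def cell_pattern(n_rows: int, n_cols: int, low_per_high: int = 3) -> list[list[str]]:
    """Cell-based pattern: VddH cells staggered diagonally so rows and columns mix supplies."""
    period = low_per_high + 1
    return [["H" if (r + c) % period == period - 1 else "L" for c in range(n_cols)]
            for r in range(n_rows)]

def count(pattern: list[list[str]], vdd: str) -> int:
    """Number of slots with the given supply label."""
    return sum(row.count(vdd) for row in pattern)

for name, pat in (("row", row_pattern(8, 8)), ("cell", cell_pattern(8, 8))):
    print(name)
    print("\n".join(" ".join(row) for row in pat))
    print(f"VddL:VddH = {count(pat, 'L')}:{count(pat, 'H')}")
```

A row-based pattern keeps each supply rail uniform along a row, while a cell-based pattern gives the placer VddH slots closer to any given VddL slot; which trade-off the paper's patterns make is shown only in its figure.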
Level Converter Design • For a dual-Vdd FPGA fabric, the interface between a VddL device and a VddH device must be designed carefully to avoid excessive leakage power. • If a VddL device drives a VddH device and the VddL device output is logic ‘1’, both the PMOS and NMOS transistors in the VddH device will be at least partially “on”, dissipating an unacceptable amount of leakage power due to short-circuit current. • A level converter should be inserted to block this short-circuit current.
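The condition can be checked at first order. The VddH inverter's PMOS has its source at VddH, so a VddL-level ‘1’ on its gate leaves |Vgs| = VddH - VddL, while the NMOS sees Vgs = VddL. The sketch below flags the short-circuit case when both exceed their threshold magnitudes. The function name and the voltages in the example are hypothetical, and the model ignores subthreshold conduction, which still raises leakage even when |Vgs| is slightly below |Vtp|.

```python
def needs_level_converter(vdd_low: float, vdd_high: float,
                          vtn: float, vtp_abs: float) -> bool:
    """First-order check for a VddL-driven '1' entering a VddH CMOS inverter."""
    nmos_on = vdd_low > vtn                   # gate at VddL, source at ground
    pmos_on = (vdd_high - vdd_low) > vtp_abs  # gate at VddL, source at VddH
    return nmos_on and pmos_on

print(needs_level_converter(0.9, 1.3, 0.3, 0.3))  # True: both devices conduct
```

The larger the gap between VddH and VddL, the more strongly the PMOS conducts, so the level converter matters most when the low supply is aggressively scaled.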
Level Converter Design When the input signal is logic ‘1’, the threshold voltage drop across NMOS transistor ‘n1’ provides a virtual low supply voltage to the first-stage inverter (p2, n2), so that p2 and n2 are not partially “on”. When the input signal is logic ‘0’, the feedback path from node ‘OUT’ to PMOS transistor ‘p1’ pulls the virtual supply voltage up to VddH, and inverter (p2, n2) passes a full VddH-level signal to the second inverter, so no DC short-circuit current exists.
DESIGN FLOW FOR DUAL-VDD/DUAL-VT FPGAS • CAD algorithms need to be developed to leverage the proposed FPGA fabrics with dual Vdd and dual Vt. • The input is a single-Vdd gate-level netlist, which is optimized by SIS and mapped to LUTs by RASP. • They then start the physical design. • They generate the basic circuit netlist (BC-netlist). • The BC-netlist is annotated with capacitance and resistance, as well as the supply voltage level if dual Vdd is applied. • Power estimation and timing analysis are performed on the BC-netlist to obtain power and performance. • An enhanced version of fpgaEva-LP is developed to handle dual-Vdd/dual-Vt FPGA power estimation.
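fpgaEva-LP's internals are not described in these slides. As a hedged illustration of why the BC-netlist carries capacitance, resistance and Vdd annotations, the sketch below stores them per node and computes first-order switching power, 0.5·a·C·Vdd²·f, and a stage-by-stage R·C delay along a path. The `BCNode` class, the per-node switching activity, the example values and both formulas are assumptions, not the tool's actual model.

```python
from dataclasses import dataclass

@dataclass
class BCNode:
    name: str
    cap_f: float     # load capacitance driven by this node, farads
    res_ohm: float   # effective driver resistance, ohms
    vdd: float       # supply voltage of the driver, volts (VddH or VddL)
    activity: float  # average transitions per clock cycle (assumed input)

def switching_power(nodes: list[BCNode], freq_hz: float) -> float:
    """Sum of 0.5 * a * C * Vdd^2 * f over all nodes, in watts."""
    return sum(0.5 * n.activity * n.cap_f * n.vdd ** 2 * freq_hz for n in nodes)

def path_delay(path: list[BCNode]) -> float:
    """First-order delay of a chain of buffered stages: sum of R * C, in seconds."""
    return sum(n.res_ohm * n.cap_f for n in path)

nodes = [
    BCNode("lut_out", cap_f=10e-15, res_ohm=2e3, vdd=1.3, activity=0.5),
    BCNode("route_seg", cap_f=40e-15, res_ohm=1e3, vdd=0.9, activity=0.5),
]
print(f"{switching_power(nodes, 100e6) * 1e6:.3f} uW")
print(f"{path_delay(nodes) * 1e12:.1f} ps")
```

Because Vdd is stored per node, moving a block to VddL changes its power term quadratically, which is what a dual-Vdd assignment step needs to evaluate.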
DESIGN FLOW FOR DUAL-VDD/DUAL-VT FPGAS Design flow for dual-Vdd/dual-Vt FPGAs.
Dual Vdd Assignment • They select the logic block with the largest power sensitivity, assign low Vdd to it, and update the timing information. • If the new critical path delay exceeds the user-specified delay increase bound, they reverse the low-Vdd assignment. • Otherwise, they keep this assignment and go to the next iteration. • In either case, the logic block selected in this iteration is not revisited in later iterations. • Right after the dual-Vdd assignment, they can estimate the power and delay of the dual-Vdd BC-netlist.
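A minimal sketch of this greedy loop, under simplifying assumptions that are not from the paper: sensitivities are computed once up front rather than re-evaluated after each timing update, timing is an explicit list of paths whose delay is the sum of block delays, and assigning VddL multiplies a block's delay by a fixed hypothetical `slowdown` factor. `bound` plays the role of the user-specified fractional delay increase.

```python
def assign_dual_vdd(blocks: dict[str, tuple[float, float]],
                    paths: list[list[str]],
                    bound: float,
                    slowdown: float = 1.3) -> set[str]:
    """Greedy, sensitivity-ordered low-Vdd assignment.

    blocks:   name -> (power sensitivity, delay at VddH)
    paths:    timing paths, each a list of block names (simplified timing model)
    bound:    allowed fractional increase of the critical-path delay
    slowdown: assumed delay multiplier for a block moved to VddL
    """
    low: set[str] = set()

    def critical_delay() -> float:
        return max(sum(blocks[b][1] * (slowdown if b in low else 1.0) for b in p)
                   for p in paths)

    limit = critical_delay() * (1.0 + bound)
    # Each block is visited exactly once, highest sensitivity first.
    for name in sorted(blocks, key=lambda b: blocks[b][0], reverse=True):
        low.add(name)                 # tentatively assign VddL
        if critical_delay() > limit:  # timing check after the update
            low.discard(name)         # bound exceeded: reverse the assignment
    return low

blocks = {"A": (5.0, 1.0), "B": (4.0, 1.0), "C": (3.0, 1.0), "D": (2.0, 3.0)}
paths = [["A", "B"], ["C"], ["D"]]
print(sorted(assign_dual_vdd(blocks, paths, bound=0.0)))  # ['A', 'B', 'C']
```

With a zero delay bound, only blocks off the critical path (D) can move to VddL; relaxing the bound lets D move as well.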
Dual Vdd Assignment • However, this dual-Vdd BC-netlist does not consider the layout constraint imposed by the pre-designed dual-Vdd pattern. • It assumes the flexibility to assign low Vdd to a logic block at an arbitrary physical location. • This is the ideal case for fabric arch-DVDT. • To obtain the real-case power and delay under the layout pattern constraint, they use this dual-Vdd netlist as input and perform dual-Vdd placement and routing.
Placement and Routing for Dual Vdd FPGA Fabric • The input is the dual-Vdd BC-netlist generated by dual-Vdd assignment. The dual-Vdd placement considers the layout constraint in arch-DVDT. • The dual-Vdd placement is based on the simulated annealing algorithm implemented in VPR. • The VPR placement tool models an FPGA as a set of legal slots, or discrete locations, at which logic blocks or I/O pads can be placed.
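VPR's annealer is far more elaborate (adaptive range limits, timing-driven cost, a tuned schedule). The sketch below keeps only the idea described here: blocks occupy discrete legal slots, and the dual-Vdd constraint is enforced by letting a block move or swap only into slots whose pre-defined Vdd matches its assigned Vdd. The half-perimeter wirelength cost, the geometric cooling schedule, the example netlist and all function and parameter names are assumptions; the full cost is recomputed after every move for clarity rather than speed.

```python
import math
import random

def hpwl(net: list[str], pos: dict[str, tuple[int, int]]) -> int:
    """Half-perimeter bounding-box wirelength of one net."""
    xs = [pos[b][0] for b in net]
    ys = [pos[b][1] for b in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def place_dual_vdd(slots, block_vdd, nets, seed=0, t0=10.0, cooling=0.95,
                   moves_per_t=200, t_min=0.01):
    """Simulated-annealing placement honoring a pre-defined dual-Vdd pattern.

    slots: (x, y) -> 'L' or 'H'; block_vdd: block -> 'L' or 'H'; nets: block lists.
    """
    rng = random.Random(seed)
    legal = {v: [s for s, t in slots.items() if t == v] for v in ("L", "H")}
    pos = {}
    for v in ("L", "H"):
        blocks_v = [b for b in block_vdd if block_vdd[b] == v]
        if len(blocks_v) > len(legal[v]):
            raise ValueError(f"not enough {v} slots")
        for b, s in zip(blocks_v, rng.sample(legal[v], len(blocks_v))):
            pos[b] = s  # random legal initial placement
    occupant = {s: b for b, s in pos.items()}

    def cost() -> int:
        return sum(hpwl(n, pos) for n in nets)

    current, t, blocks = cost(), t0, list(block_vdd)
    while t > t_min:
        for _ in range(moves_per_t):
            b = rng.choice(blocks)
            src = pos[b]
            dst = rng.choice(legal[block_vdd[b]])  # only slots with b's Vdd
            if dst == src:
                continue
            other = occupant.get(dst)  # same Vdd as b, so a swap stays legal
            pos[b] = dst
            if other is not None:
                pos[other] = src
            new = cost()
            delta = new - current
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = new  # accept: update slot occupancy
                occupant[dst] = b
                if other is not None:
                    occupant[src] = other
                else:
                    del occupant[src]
            else:  # reject: undo the move or swap
                pos[b] = src
                if other is not None:
                    pos[other] = dst
        t *= cooling
    return pos, current

# Usage: 4x4 grid whose top row is VddH; block d was assigned VddH.
slots = {(x, y): ("H" if y == 3 else "L") for x in range(4) for y in range(4)}
block_vdd = {"a": "L", "b": "L", "c": "L", "d": "H"}
nets = [["a", "b"], ["b", "c"], ["c", "d"]]
pos, wl = place_dual_vdd(slots, block_vdd, nets)
print(pos, wl)
```

Restricting moves to same-Vdd slots keeps every intermediate placement legal, so no penalty term for Vdd mismatches is needed in the cost.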
Placement and Routing for Dual Vdd FPGA Fabric Placement and routing for dual-Vdd fabric arch-DVDT
EXPERIMENTAL RESULTS The ratio of the number of VddL rows (cells) to the number of VddH rows (cells) in arch-DVDT should be set to 3:1.
EXPERIMENTAL RESULTS For the Vdd range in the experiments, arch-SVDT achieves power savings of 9% to 18%. Power versus delay for alu4.
EXPERIMENTAL RESULTS For the Vdd range in the experiments, arch-SVDT achieves power savings of 12% to 26%. Power versus delay for bigkey.
EXPERIMENTAL RESULTS • Arch-DVDT can obtain further power reduction at higher clock frequencies; however, the dual-Vdd technique has extra overhead. • Level converters inserted between VddL and VddH blocks consume extra power. • As shown in the lower-frequency region, the overhead of the dual-Vdd fabric exceeds its benefit, and arch-DVDT achieves smaller power savings than arch-SVDT. • This implies that not all of the potential power reduction from introducing dual Vdd is achieved by the current fabric and CAD algorithms.
CONCLUSIONS • They design FPGA circuits with dual-Vdd/dual-Vt to effectively reduce both dynamic power and leakage power. • They define dual-Vdd/dual-Vt FPGA fabrics based on the profiling of benchmark circuits. • They further develop CAD algorithms including power-sensitivity based voltage assignment and simulated-annealing based placement to leverage such fabrics.
CONCLUSIONS • Compared to the conventional fabric using uniform Vdd/Vt at the same target clock frequency, the new fabric using dual Vt achieves 9% to 20% power reduction. • However, the pre-defined FPGA fabric using both dual Vdd and dual Vt achieves only 2% extra power reduction on average.
REFERENCES • E. Kusse and J. Rabaey, “Low-energy embedded FPGA structures,” in ISLPED, 1998. • F. Li, Y. Lin, L. He, and J. Cong, “FPGA power reduction using configurable dual-Vdd,” Tech. Rep. UCLA Eng. 03-224, Electrical Engineering Department, UCLA, 2003. • J. T. Kao and A. P. Chandrakasan, “Dual-threshold voltage techniques for low-power digital circuits,” IEEE Journal of Solid-State Circuits, 2000. • F. Li, D. Chen, L. He, and J. Cong, “Architecture evaluation for power-efficient FPGAs,” in ISFPGA, 2003.