Design and Implementation of a Sub-threshold BFSK Transmitter • By: Suganth Paul (Intel Corporation, Austin, TX), Rajesh Garg and Sunil P. Khatri (Department of ECE, Texas A&M University, College Station, TX), Sheila Vaidya (Lawrence Livermore National Lab., Livermore, CA)
Outline • Sub-threshold circuits – the opportunity • Challenges • Process/temperature/voltage variations • Solution – dynamic body bias • Validation via test chip • Design methodology • Silicon results • Conclusions
The Opportunity • Power consumption has become a major issue for recent ICs • There is a large and growing class of applications where power reduction is paramount – not speed. • Such applications are ideal candidates for sub-threshold circuit design • Compared traditional circuit with sub-threshold (obtained by simply setting VDD < VT) • Performed simulations for 2 different processes on a 21 stage ring oscillator. • Impressive power reduction (100X – 500X) • Power-Delay-Product (P-D-P) improves by as much as 20X • P-D-P is an important metric to compare circuit design styles
Sub-threshold Logic • Ids has an exponential dependence on process, voltage and temperature (PVT) • Need to stabilize the circuit performance by compensating for PVT variations • No approach to compensate sub-threshold delay • Existing approaches compensate sub-threshold currents • To compensate delay, need a representative circuit • Not easy to come up with representative circuit for standard cells
Our Solution • We propose a technique that uses self-adjusting body-bias to phase-lock the circuit delay to a beat clock. • Use a network of PLAs to implement circuits. • Several PLAs in a cluster share a common nbulk node. • A representative PLA in each cluster is chosen to phase lock the delay of the PLAs to the beat clock • If the delay is too high, a forward body bias is applied to speed up the representative PLA. • If the delay is low, body bias is brought back down to zero to slow down the representative PLA. • All other PLAs exhibit the same delay as the representative PLA, since they all share a common nbulk terminal
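The lock-in behaviour described on this slide can be sketched as a simple bang-bang control loop. This is an illustrative model only: the exponential delay model, the `sensitivity`, `step` and `max_bias` values, and the function name are assumptions made for the sketch, not details of the fabricated circuit.

```python
import math


def simulate_body_bias_lock(beat_period, nominal_delay, sensitivity=4.0,
                            step=0.005, max_bias=0.45, cycles=400):
    """Bang-bang loop: raise the nbulk bias when the representative PLA is
    slower than the beat clock, lower it toward zero when it is faster.

    Returns a list of (bias_after_update, delay_before_update) per cycle.
    """
    bias = 0.0
    history = []
    for _ in range(cycles):
        # Hypothetical delay model: forward body bias lowers VT, so the
        # sub-threshold delay falls roughly exponentially with the bias.
        delay = nominal_delay * math.exp(-sensitivity * bias)
        if delay > beat_period:
            # Completion lags the beat clock: forward-bias nbulk to speed up.
            bias = min(max_bias, bias + step)
        else:
            # Completion leads the beat clock: relax the bias toward zero.
            bias = max(0.0, bias - step)
        history.append((bias, delay))
    return history


if __name__ == "__main__":
    trace = simulate_body_bias_lock(beat_period=1.0, nominal_delay=1.2)
    print("final bias %.3f, final delay %.4f" % trace[-1])
```

In this toy model the delay settles into a small oscillation around the beat period, which is the qualitative behaviour the slide describes; every PLA sharing the same nbulk node would see the same bias.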
Objective • Validate and verify flow by designing a sub-threshold circuit for the application • Choose a test application • Low power, low speed • Develop a sub-threshold circuit design flow • Implement our delay compensation scheme to negate PVT variations • Implement the same application using a standard cell based flow on the same die • Fabricate and test the chip (TSMC 0.25 um process) • Compare the sub-threshold circuit with the standard cell circuit in terms of power consumption
Test Application - Binary Frequency Shift Keying (BFSK) Transmitter • Block diagram: Binary Input Data → Digital BFSK Modulator (digital block implemented using sub-threshold circuits; produces tone f1 if input is LOW, f2 if input is HIGH) → DAC → Amplifier → Antenna • Specifications • Input bit rate: RB = 32kbps, Broadcast distance: D = 1000m • FSK tones: f1=150kHz, f2=450kHz, Channel bandwidth: B = 300kHz
Sub-threshold Design Approach • Digital part of the circuit implemented as NPLA (Network of Programmable Logic Arrays) • NPLAs have low delay • Critical path delay easy to find • PLAs have common nbulk node • Circuit level PVT compensation • An external Beat Clock (BCLK) signal is phase locked with the critical path delay • Delay controlled by a charge pump that modulates the bulk voltage of transistors in the circuit • Compensates for both inter- and intra-die variations
Dynamic NOR-NOR PLA • We use precharged NOR-NOR PLAs as the structure of choice • Wordlines run horizontally • Inputs / their complements and outputs run vertically • Each PLA has a “completion” signal that switches low after all the outputs switch • Several PLAs in a cluster share a common nbulk node. • [Figure: PLA schematic showing inputs, outputs, the completion signal, and the clk precharge and evaluate phases]
Network of PLAs (NPLA) • [Figure: combinational logic implemented as an NPLA, with PLA levels L1 to L4 between inputs and outputs, and a timing diagram of the L1 to L4 PLAs against clk] • Throughput = Tpchg + n·Teval
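As a small worked example of the relation Tpchg + n·Teval quoted on this slide, the sketch below computes the resulting clock period and the implied maximum clock frequency. The timing values and function names are placeholders chosen for illustration; they are not measurements from the chip.

```python
def npla_clock_period(t_precharge_s, t_evaluate_s, levels):
    """Clock period of an NPLA: one precharge phase plus one evaluate
    delay per PLA level on the critical path (Tpchg + n * Teval)."""
    if levels < 1:
        raise ValueError("an NPLA needs at least one PLA level")
    return t_precharge_s + levels * t_evaluate_s


def npla_max_clock_hz(t_precharge_s, t_evaluate_s, levels):
    """Highest clock frequency that still fits the period above."""
    return 1.0 / npla_clock_period(t_precharge_s, t_evaluate_s, levels)


if __name__ == "__main__":
    # Hypothetical numbers: 100 ns precharge, 150 ns per level, 4 levels.
    period = npla_clock_period(100e-9, 150e-9, levels=4)
    print("period = %.0f ns" % (period * 1e9))
    print("max fclk = %.2f MHz" % (npla_max_clock_hz(100e-9, 150e-9, 4) / 1e6))
```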
The Charge Pump • PLA “completion” signal lags beat clock: nbulk node gets forward biased • PLA “completion” signal leads beat clock: nbulk goes back to zero bias • [Figure: charge pump schematic with pullup and pulldown inputs]
Effectiveness of the Approach • We simulated a single PLA from 0ºC to 100ºC. Also applied VT variations (10%) and VDD variations (10%). • The light region shows the variations on delay over all the corners without delay compensation. • The red region shows the delays with the self-adjusting body-bias circuit.
Design Flow • BFSK Design • HDL Synthesis • Map to NPLA • Design of Analog Components • Logic Verification • Spice Verification: functional, timing, charge pump • RC Extraction • LVS • Full Chip Spice Verification • Layout • Integrated Spice Netlist
BFSK Design • [Figure: the Binary Input drives a Mux that selects the Phase Increment; a 9-bit Phase Accumulator addresses a Sine Lookup Table of depth 2^9 = 512 with 8-bit output, with DFFs clocked by Clk] • fout = (phase increment × fclk) / 512 • fout < fclk/2 (Nyquist criterion) implies phase increment < 256. • Phase increments are chosen based on fclk, or left programmable in real time to get Software Defined Radio (SDR) operation. • We fix the phase increments to avoid the extra input pins required for SDR
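The phase-accumulator scheme on this slide can be sketched in software as follows. The 9-bit accumulator, the 512-entry sine table and the 8-bit output come from the slide; the phase increments 59 and 177 are assumptions, picked only because 59/512 and 177/512 are close to the ratios 0.115 and 0.345 quoted elsewhere in the deck, and the function names are illustrative.

```python
import math

ACC_BITS = 9
LUT_DEPTH = 1 << ACC_BITS  # 2^9 = 512 entries, as on the slide

# 8-bit unsigned sine table covering one full period.
SINE_LUT = [round(127.5 + 127.5 * math.sin(2 * math.pi * i / LUT_DEPTH))
            for i in range(LUT_DEPTH)]


def tone_frequency(f_clk_hz, phase_increment):
    """Output tone of the accumulator: fout = increment * fclk / 512."""
    if not 0 < phase_increment < LUT_DEPTH // 2:
        raise ValueError("Nyquist requires 0 < increment < 256")
    return f_clk_hz * phase_increment / LUT_DEPTH


def bfsk_samples(bits, inc_low, inc_high, clocks_per_bit):
    """Generate 8-bit samples; each input bit selects the phase increment."""
    phase = 0
    samples = []
    for bit in bits:
        increment = inc_high if bit else inc_low
        for _ in range(clocks_per_bit):
            samples.append(SINE_LUT[phase])
            phase = (phase + increment) % LUT_DEPTH  # 9-bit wrap-around
    return samples


if __name__ == "__main__":
    INC_F1, INC_F2 = 59, 177  # assumed increments, not from the source
    print(tone_frequency(1e6, INC_F1), tone_frequency(1e6, INC_F2))
    print(bfsk_samples([0, 1], INC_F1, INC_F2, clocks_per_bit=8))
```

Because the accumulator keeps its phase across bit boundaries, the tone changes without a phase jump, which is one reason this structure is a common choice for digital FSK modulators.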
Basic BFSK Transmitter Block Diagram • Binary Input Data → Digital BFSK Modulator (digital block implemented using NPLA based sub-threshold circuits; produces tone f1 if input is LOW, f2 if input is HIGH) → DAC → Amplifier → Antenna
System Architecture • Digital BFSK using NPLA • DAC input: 4 LSBs binary, 15 MSBs thermometer (avoids glitches in DAC output) • [Figure: Digital BFSK Modulator with Input, DFFs, Phase Accum (9-bit), NCO (8-bit) and Binary to Thermometer Encoder (19 lines) clocked by CLK, followed by DAC, Amplifier and Antenna; a Phase Detector compares the Ref. PLA completion signal with BEAT CLK and drives the Charge Pump, which sets the Common Bulkn node]
Delay Compensated Sub-threshold Design Block Diagram • [Figure: an NPLA of L1 to L4 PLAs between DFFs clocked by Clk; the completion signal of the reference PLA and the Beat Clk feed a Phase Detector and Charge Pump, which modulates the common nbulk node of a cluster of PLAs]
HDL to Schematic of Digital BFSK • Digital BFSK transmitter described using VHDL • VHDL synthesized using an FPGA synthesis tool to get a gate-level netlist • This is imported into SIS in “blif” format • The “blif” file is logically optimized and mapped into NPLA • Technology-independent optimization done on the circuit • Circuit converted to a multi-level network of nodes with 5 or fewer inputs per node • Circuit traversed from inputs to outputs, and nodes are implemented using PLAs of size (8/6/12) • Using the NPLA throughput equation, fclk estimated as 1.2MHz • We choose f1 ≈ 0.115 * fclk and f2 = 0.345 * fclk
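Thermometer Coded 8-BIT DAC • Binary to thermometer code conversion (Binary → Therm): 00 → 000, 01 → 001, 10 → 011, 11 → 111 • Adjacent values differ by 1 bit • [Figure: the Digital BFSK Output feeds the DAC through 15 thermometer-coded lines from the Binary to Thermometer Code Conversion block, plus the 4 LSBs]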
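The conversion table on this slide can be reproduced with a few lines of code. The sketch below also splits an 8-bit word the way the deck describes (4 binary LSBs plus 15 thermometer lines for the upper bits, 19 lines in total); the function names and the string representation of the lines are choices made for this illustration.

```python
def binary_to_thermometer(value, bits):
    """Return a thermometer code with 2^bits - 1 lines, `value` of them set.

    Ones are packed on the right, matching the slide: 2 -> '011'.
    """
    lines = (1 << bits) - 1
    if not 0 <= value <= lines:
        raise ValueError("value does not fit in %d bits" % bits)
    return "0" * (lines - value) + "1" * value


def encode_dac_word(code):
    """Split an 8-bit code into 15 thermometer lines (upper 4 bits)
    and 4 binary lines (lower 4 bits)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("code must be an 8-bit value")
    msbs, lsbs = code >> 4, code & 0xF
    return binary_to_thermometer(msbs, 4), format(lsbs, "04b")


if __name__ == "__main__":
    for v in range(4):
        print(format(v, "02b"), binary_to_thermometer(v, 2))
    print(encode_dac_word(0xA3))
```

Stepping the input by one changes exactly one thermometer line, which is why this encoding avoids the large transient glitches a pure binary-weighted DAC can show at major code transitions.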
8-BIT DAC Schematic • Currents flow through mirror legs based on the input value • Output current/voltage is set by the sum of the weighted currents through Rout • Thermometer codes prevent glitches at the output • DAC supply is 0.7V to handle 0.6V digital signals • Rout, Rcm are off-chip resistances
Amplifier Schematic • Common Source Amplifier • Supply of 0.7V • Rd, Rs are off-chip resistances • M1 biased by DAC Rout resistor • CL: on-chip antenna load, 80pF
Testability Features Added before Integration • [Figure: the system architecture diagram with added chip-level access points: Charge Pump Supply, Bulkn, 8-bit BFSK output or 8-bit DAC input, DAC output, and Amp output]
Layout • Manual PLA layout for every PLA in design • NPLA routed using SEDSM • I/O pad cells, ESD diodes layout done manually • DAC, amplifier layout done manually • Antenna coil layout done manually
PLA Layout • [Figure: PLA layout showing input bit lines, word lines, output lines, and transistors, which are modified based on the logic to be implemented]
I/O PAD CELL Layout • Fully compliant with TSMC design rules • ESD diodes have guard rings to prevent latchup • [Figure: layout showing I/O pad, I/O drivers, primary ESD diodes, and secondary ESD diodes]
Die Photo • Digital BFSK domain, 0.6V • Digital BFSK inputs domain, 0.7V • Std Cell domain, 2.5V • Digital BFSK output domain, 2V
Experimental Results from Silicon • Output of the BFSK transmitter is shown • As the input changes from 0 to 1, the output frequency changes, showing the modulation • Fclk = 1MHz • F1 = 117kHz • F2 = 347kHz • The adjacent peaks are around 10dB below the fundamental peaks • We found from Matlab simulations that signals from the extracted Spice netlist could be demodulated at the receiver side
Results from Silicon: Operating Range • Nbulk kept at 0V, 0.45V • Maximum frequency shows a quadratic dependence on supply voltage
Power Comparison • Sub-threshold power calculated only for the Phase Accumulator and NCO blocks, on the 0.6V power supply • Std Cell implements only this portion of the BFSK circuit • Sub-threshold gives 19.4X lower power
Bulkn Node Modulation • Bulk node modulates when beat clock demands speedup or slow-down • Bulk node modulates as supply voltage is changed, so that circuit delay is maintained constant.
Conclusion • Validated a sub-threshold circuit design methodology based on dynamic body bias (first-of-kind) • Validated design tools and techniques • First-of-kind design automation flow, which will help bring sub-threshold design to the mainstream. • We implemented an ultra low power, low data rate wireless BFSK transmitter • The fabricated chip works as expected, validating our design flow. • We compared the sub-threshold design with a Std Cell based design and showed a 19.4X reduction in power.
Introduction • Power consumption has become a significant hurdle for recent ICs • Higher power consumption leads to • Shorter battery life • Higher on-chip temperatures – reduced operating life of the chip • There is a large and growing class of applications where power reduction is paramount – not speed. • Such applications are ideal candidates for sub-threshold circuit design • For sub-threshold circuits, VDD ≤ VT
TX/RX System Testing • [Photos: TX PCB with sub-threshold IC, TX antennas, RX board, RX setup]
Solving the Problem of Delay Sensitivity to Process, Voltage and Temperature Variations • "A Variation-tolerant Sub-threshold Design Approach", Jayakumar, Khatri. Design Automation Conference (DAC) 2005, Anaheim, CA, June 13-17.
An Example Showing Phase Locking • This figure shows how the body bias (and hence the delay of the PLA) changes with changes in VDD. • The adjustment is very quick (within a few clock cycles). • [Figure: two panels, VDD change 0.22V to 0.18V and VDD change 0.2V to 0.22V]
Energy and Speed • We may be interested in the minimum energy operating point for the design • Minimizing VDD reduces power, but minimum VDD does not mean minimum energy • The optimum VDD value increases with increased logical depth, and with temperature • "Minimum Energy Near-threshold Network of PLA based Design", Jayakumar, Khatri. International Conference on Computer Design (ICCD) 2005, Oct 2-5, San Jose, CA. • Reclaiming the speed penalty • Can be done for datapath circuits, using asynchronous micropipelining • Showed that a speedup of 7X is possible, with an area overhead of 44% • "A PLA based Asynchronous Micropipelining Approach for Subthreshold Circuit Design", Jayakumar, Garg, Gamache, Khatri. IEEE/ACM Design Automation Conference (DAC) 2006, July 24-28, San Francisco, CA.
On-chip Antenna • Antenna size needs to be at least a 10th of the transmit wavelength to radiate effectively • Transmit wavelength around 600m • Due to on-chip space constraints, antenna coil length is only 0.2m • We have the option of using an external antenna • And we had a 60dB safety margin in the link budget analysis. • This could compensate for a lossy antenna
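As a rough check of the orders of magnitude on this slide, the free-space wavelength is the speed of light divided by the carrier frequency. The sketch below applies that to the 450kHz tone from the specification slide, which gives roughly 666m, the same order as the 600m quoted above; the function names and the choice of that tone are assumptions made for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def wavelength_m(freq_hz):
    """Free-space wavelength for a given carrier frequency."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_LIGHT_M_PER_S / freq_hz


def min_effective_antenna_m(freq_hz, fraction=0.1):
    """Rule of thumb from the slide: at least a tenth of the wavelength."""
    return fraction * wavelength_m(freq_hz)


if __name__ == "__main__":
    f2 = 450e3  # upper FSK tone from the specification slide
    coil_length_m = 0.2  # on-chip coil length quoted on this slide
    needed = min_effective_antenna_m(f2)
    print("wavelength: %.0f m" % wavelength_m(f2))
    print("rule-of-thumb length: %.1f m" % needed)
    print("coil is %.1f%% of that length" % (100 * coil_length_m / needed))
```

The coil falls short of the rule-of-thumb length by more than two orders of magnitude, which is consistent with the slide's point that the link-budget margin or an external antenna is needed to compensate.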
Spectrum of Amplifier Tones • Fclk = 1MHz • F1 = 117kHz • F2 = 347kHz • The adjacent peaks are around 10dB below the fundamental peaks • We found from Matlab simulations that signals from the extracted Spice netlist could be demodulated at the receiver side