ASIC Implementation of the PWA Generic Canonical Form
Antonio J. Acosta (acojim@imse-cnm.csic.es)
Dpto. Electrónica y Electromagnetismo, Universidad de Sevilla; Instituto de Microelectrónica de Sevilla-CNM-CSIC
MOBY-DIC Project FP7-IST-248858
Noordwijkerhout, August 23, 2012
Outline of the presentation
• Introduction: Role of ASICs in control & characteristics of ASICs
• Design of MPC ASICs: From high-level specifications to silicon
• PWAG Architecture Selection
• Design, integration and test of a PWAG ASIC
• Test Results
Design flow
[Figure: model-based synthesis flow. Labels: Numerical data, Heuristic knowledge, Identification, Description, Tuning / Verification, Simplification, Synthesis, Non-linear plant, HW (VHDL) or SW (C, C++, Java), Simulation, Experiment.]
Embedded Controller
[Figure: implementation options for an embedded controller. Labels: DSP, Dedicated HW, Embedded SW, Expansion boards, Digital ASIC, FPGA, Test board, External memories. Axes: Performance (− to +), Flexibility (+ to −), Cost (− to +).]
ASIC Design Example
• HDL
• Logical verification
• Timing and power estimation
• Area estimation
Control and Circuit decisions
• Configurable architecture
• Parametrizable design
• Programmability issues
• HW requirements & limitations
• …
• Canonical form (PWAG)
• No. of inputs/outputs
• Precision
• Control surface
• …
MOBY-DIC TOOLBOX FOR SELECTED CASE-STUDIES: HDL code, Parameters
Proposed PWAG Architecture
Modification of the one in [OLIV09]
[Figure: block diagram. Labels: FSM, MEMORY.]
• The binary tree is stored in a memory
• The data in the Tree Memory are the addresses of the Param Memory
• Less rigid, more configurable (different trees, on-line computation)
ASIC major specifications
• Canonical form: PWA Generic
• Maximum number of inputs: 4 (configurable 1-4)
• Word length of inputs/parameters: 12 bits
• Word length of output: 26 bits (although the precision is 12 bits)
• Fixed-point arithmetic
• Maximum number of polytopes plus edges: 4096
• Maximum tree depth: 13 (configurable 1-13)
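The 26-bit output width is consistent with the bit growth of an affine expression over four 12-bit inputs. The sketch below is only a plausibility check: it assumes unsigned 12-bit operands and an expression of the form h1·x1 + h2·x2 + h3·x3 + h4·x4 + k, neither of which is stated explicitly on the slide.

```python
# Hypothetical bit-growth check. The unsigned operand format and the
# exact form of the affine expression are assumptions for illustration.
IN_BITS = 12     # input and parameter word length (from the slide)
N_INPUTS = 4     # maximum number of inputs (from the slide)

max_operand = (1 << IN_BITS) - 1                # largest unsigned 12-bit value
max_product = max_operand * max_operand         # largest single h_i * x_i
max_sum = N_INPUTS * max_product + max_operand  # four products plus offset k

product_bits = max_product.bit_length()  # width needed for one product
sum_bits = max_sum.bit_length()          # width needed for the full sum

print(product_bits, sum_bits)  # prints: 24 26
```

Under these assumptions each product needs 24 bits and the accumulated sum needs 26 bits, which matches the bus widths quoted for the chip.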
Technology and CAD Tools Selection
• Taiwan Semiconductor Manufacturing Company (TSMC) 90 nm, 9 metal layers
• MiniAsic: 1,875 x 1,875 µm² (2011), 100 samples
• Memories on chip: dual-port RAM memories, access and writing times below 5 ns (worst case)
• CAD tools used: DESIGN ANALYZER (SYNOPSYS); SOC ENCOUNTER, DFWII (CADENCE)
Working Modes: Writing TMEMO
TREE MEMORY (TMEMO): 2^14 = 16384 words of 12 bits
• TMEMO stores all the nodes of the binary tree:
• 2^14 − 1 = 2^0 + 2^1 + … + 2^13, 13 being the maximum tree depth
• Word length = 12 enables 2^12 edges plus polytopes
• Two clock cycles for each data write
• 32768 clock cycles to write the whole memory, about 0.66 ms with a 50 MHz clock
• Layout dimensions: 1060.95 µm x 577.15 µm = 612332.6 µm²
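The tree-memory figures can be reproduced with a few lines of arithmetic. This is a minimal sketch that uses only the constants quoted on the slide (maximum depth, cycles per write, clock frequency); the variable names are illustrative.

```python
# Sizing sketch for the tree memory; every constant comes from the slide.
MAX_DEPTH = 13         # maximum tree depth
CYCLES_PER_WRITE = 2   # clock cycles needed to write one word
CLOCK_HZ = 50e6        # 50 MHz clock

# Nodes of a complete binary tree with levels 0..13: 2^0 + 2^1 + ... + 2^13
nodes = sum(2**level for level in range(MAX_DEPTH + 1))
words = 2 ** (MAX_DEPTH + 1)                 # memory depth, 2^14
write_cycles = words * CYCLES_PER_WRITE      # cycles to fill the memory
write_time_s = write_cycles / CLOCK_HZ       # seconds at 50 MHz

print(nodes, words, write_cycles, write_time_s)
```

At 20 ns per cycle, 32768 cycles take roughly 655 µs, so loading a complete tree is a sub-millisecond operation.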
Working Modes: Writing PMEMO
PARAMETER MEMORY (PMEMO): 2^12 = 4096 words of 60 bits
• PMEMO stores all the possible edges and polytopes
• 2^12 = 4096 edges plus polytopes
• Word length = 60 enables five 12-bit parameters
• Six clock cycles for each data write
• 24576 clock cycles to write the whole memory, about 0.49 ms with a 50 MHz clock
• Layout dimensions: 1190.51 µm x 569.12 µm = 677543.1 µm²
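A 60-bit parameter word holds five 12-bit fields. The sketch below shows one way such a word could be packed and unpacked in software when preparing memory contents; the field order (first parameter in the most significant bits) and the function names `pack_word` and `unpack_word` are assumptions, since the slide does not specify the on-chip layout.

```python
PARAM_BITS = 12        # width of one parameter (from the slide)
PARAMS_PER_WORD = 5    # parameters per 60-bit word (from the slide)
MASK = (1 << PARAM_BITS) - 1

def pack_word(params):
    """Pack five 12-bit values into one 60-bit integer.

    Assumed layout: the first value ends up in the most significant bits.
    """
    if len(params) != PARAMS_PER_WORD:
        raise ValueError("expected exactly 5 parameters")
    word = 0
    for p in params:
        if not 0 <= p <= MASK:
            raise ValueError("parameter does not fit in 12 bits")
        word = (word << PARAM_BITS) | p
    return word

def unpack_word(word):
    """Inverse of pack_word: split a 60-bit integer into five 12-bit fields."""
    return [(word >> (PARAM_BITS * i)) & MASK
            for i in reversed(range(PARAMS_PER_WORD))]

example = [0x001, 0x0A2, 0xFFF, 0x000, 0x7B3]
packed = pack_word(example)
print(hex(packed), unpack_word(packed) == example)
```

Five parameters would fit four coefficients plus one offset for a four-input affine expression, which is presumably why the word length is 60 bits.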
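Working Modes: Normal operation
[Figure: arithmetic unit datapath. Labels: inputs x1 to x4 and coefficients h1 to h4 (12 bits each), 24-bit and 26-bit internal buses, offset k, comparison "≤ 0" producing the decision, and a conditioning block (COND.) with a 12-bit output.]
• f(x) = fPWA(x) when a leaf is reached
• Arithmetic unit: combinational, delay < 4 ns (worst case)
• Word-length conditioning circuit for tunable fixed point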
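The normal-operation mode can be pictured as a walk down the binary search tree: at each node an affine expression is evaluated and its sign selects the next node, until a leaf yields f(x). The following behavioural sketch is not the chip's actual implementation. The heap-style node numbering, the explicit leaf flag, the branch direction on "≤ 0" and the name `evaluate_pwa` are all assumptions made for illustration.

```python
def evaluate_pwa(x, tree, params):
    """Walk a binary search tree and return the affine value at the leaf.

    tree:   dict node_index -> (param_addr, is_leaf). Heap-style layout is
            assumed: node i has children 2*i and 2*i + 1, root is node 1.
    params: dict param_addr -> (h, k), h being a list of coefficients.
    """
    node = 1  # start at the root
    while True:
        addr, is_leaf = tree[node]   # tree memory holds a parameter address
        h, k = params[addr]          # parameter memory holds h and k
        value = sum(hi * xi for hi, xi in zip(h, x)) + k
        if is_leaf:
            return value             # f(x) = fPWA(x) at a leaf
        # Assumed convention: "<= 0" goes to the left child.
        node = 2 * node if value <= 0 else 2 * node + 1

# Toy one-input example: f(x) = -x for x <= 0 and f(x) = x otherwise.
tree = {1: (0, False), 2: (1, True), 3: (2, True)}
params = {0: ([1], 0), 1: ([-1], 0), 2: ([1], 0)}

print(evaluate_pwa([-3], tree, params), evaluate_pwa([5], tree, params))
# prints: 3 5
```

The same multiply-accumulate step serves both the edge test and the final function value, which is why a single arithmetic unit suffices in this model.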
Working Modes: Normal operation
Memory timing
• Output ready in only one clock cycle
[Figure: memory timing. Labels: TMEMO and PMEMO, each with PORT A (write mode) and PORT B (read mode); signals TQ, PQ, ADDRESS, output, clk, !clk.]
Input acquisition
• Fully parallel load needs 48 pins
• Parallel load of the 12-bit inputs in 4 clock cycles
• If there are one/two/three inputs, set X2 = X3 = X4 = 0 / X3 = X4 = 0 / X4 = 0
[Figure: input timing. Signals: clk, X1, X2, X3, X4, valid_in.]
Working Modes: Test
[Figure: test datapath. Labels: TREE MEMORY, PARAMETER MEMORY, CONTROL UNIT, Arith Unit, INPUT, OUTPUT, test, clk; bus widths 14, 12 and 60.]
• Parallel load of relevant data (snapshot)
• Serial test output (shifting out the 86-bit register)
• Concurrent with operation mode
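The 86-bit test register matches the sum of the three bus widths shown on the slide (14 + 12 + 60). The sketch below models a snapshot followed by a serial shift-out and shows how a tester could rebuild the fields. The field names, their order in the register and the MSB-first shift direction are assumptions; only the widths come from the slide.

```python
# Assumed field names and order; only the widths (14, 12, 60) are from the slide.
FIELDS = (("tmemo_addr", 14), ("tmemo_data", 12), ("pmemo_data", 60))
TOTAL_BITS = sum(width for _, width in FIELDS)  # 86

def snapshot(values):
    """Concatenate the field values into one integer, first field in the MSBs."""
    reg = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        reg = (reg << width) | v
    return reg

def shift_out(reg):
    """Yield the register contents one bit per clock cycle, MSB first."""
    for i in reversed(range(TOTAL_BITS)):
        yield (reg >> i) & 1

def rebuild(bits):
    """Reassemble the fields from the serial stream, as a tester would."""
    bits = list(bits)
    out, pos = {}, 0
    for name, width in FIELDS:
        chunk = bits[pos:pos + width]
        out[name] = int("".join(map(str, chunk)), 2)
        pos += width
    return out

values = {"tmemo_addr": 0x2ABC, "tmemo_data": 0x5A5, "pmemo_data": 0x123456789ABCDEF}
stream = list(shift_out(snapshot(values)))
print(len(stream), rebuild(stream) == values)
```

Because the snapshot is taken in parallel and shifted out afterwards, observing the internal state this way does not have to interrupt normal operation.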
Layout
• I/O ports: 48 pins
• VDD/GND ports: 12 pins
• Package: JLCC68
• Area: 1860 x 1860 µm²
• Active: 1460 x 1460 µm²
• No. cells: 3135
• Memory: 54 KB
• % Memory: 60%
• Post-layout simulated
[Figure: chip layout showing the PMEMO and TMEMO blocks.]
Structure of ASIC
• No. inputs: configurable from 1 up to 4
• No. outputs: 1
• Input resolution: 12 bits
• Output resolution: 26 bits
• Parameter resolution: 12 bits
• Maximum no. of hyperplanes plus polytopes: 4096
• Depth of binary search tree: configurable from 1 up to 13
• Integration technology: 90 nm, 2.5 V-1.2 V, 9 metal layers, TSMC (Taiwan Semiconductor Manufacturing Company)
• TMEMO: 16384 x 12 bits = 24 KB
• PMEMO: 4096 x 60 bits = 30 KB
• Size: 1860 x 1860 µm²
• Package: JLCC68
• Operation mode: P=00
• Writing TMEMO mode: P=01
• Writing PMEMO mode: P=10
• Test mode: P=11
Test Setup of ASIC
• Oscilloscope: Agilent DSO6104A
• Power supply: HP E3630A
• Logic analyzer: Agilent 16823A
• Experiment controlled with Matlab
Test process of ASIC: Go/no-go test
• Go/no-go test: post-synthesis simulation vs. experimental data
Test process of ASIC: Examples
• Double integrator, ACC and DC-DC in open-loop fashion
• Memories loaded with parameters and trees obtained from the MOBY-DIC toolbox
• The comparison between expected and obtained results was made in Matlab
[Figure: output surface (ASIC).]
Characterization of ASIC
• 20 packaged samples, allowing statistical analysis
[Figure: automatic characterization flow. Labels: To discard bad samples, AUTOMATIC FLOW, For specific conditions; measurements @ DC, @ 50 MHz, @ fmax.]
Summary of results
• 100% effectiveness, with very small variations with process
• The double integrator (2 inputs) reaches a higher frequency and consumes less than ACC and DC-DC (4 inputs)
• Static power is dominated by leakage in the memories
Analysis of costs & performance
• n: No. of dimensions (inputs)
• d: depth of the tree
• Nr: No. of regions (edges + polytopes)
Analysis of costs & performance for the case studies
• Post-layout simulation of the ASIC gives a clock cycle of 4-8 ns
References
• [OLIV09] A. Oliveri, T. Poggi, M. Storace, "Circuit implementation of piecewise-affine functions based on a binary search tree," European Conference on Circuit Theory and Design (ECCTD'09), pp. 145-148, Antalya, Turkey, August 2009.
• [OLIV11] A. Oliveri, G.J.L. Naus, M. Storace, W.P.M.H. Heemels, "Low-complexity approximations of PWA functions: a case study on Adaptive Cruise Control," European Conference on Circuit Theory and Design (ECCTD'11), pp. 694-697, Linköping, Sweden, August 2011.
Conclusions and Remarks
• One ASIC covers different case studies
• VLSI design is not a simple translation from FPGA
• Configuration and programmability are provided -> IP hard block
• Extensive use of the toolbox to get surface parameters and fully functional simulation
• ASIC performance surpasses FPGA in speed (x10) and power (÷10)