University of Rome “Tor Vergata”, Department of Electronic Engineering
VLSI Implementation of Reconfigurable Cells for RFU in Embedded Processors
Authors: G.C. Cardarilli, L. Di Nunzio, R. Fazzolari, C. Lenci, M. Re
Index
• Introduction / motivation
• Reconfigurable Functional Units
• Multicontext Logic Blocks
• Traditional and proposed cell comparison
• Performance evaluation
  • Delay
  • Power consumption
  • Area requirements
• Conclusions
Motivation
• In some applications, operands are usually shorter than the native processor word length
• General-purpose processors are inefficient when processing such short data
[Diagram: Data 1 and Data 2 combined by XOR and AND operations into Result]
Possible Solution
• Execution speed can be increased by using a reconfigurable unit for “custom” instructions
[Diagram: PROCESSOR containing Register File, ALU and Reconfigurable Unit]
Reconfigurable Units: Attached Processing Unit (APU)
• Located outside of the processor core
• “Slow” data transfer between APU and processor
• Original instruction set
[Diagram: PROCESSOR with processor core (Register File, ALU) and external APU]
Reconfigurable Units: Coprocessor
• Located outside of the processor core
• Faster interaction with the processor core than APUs
• Instruction set extension needed
[Diagram: PROCESSOR with processor core (Register File, ALU) and Coprocessor with its own Register File]
Reconfigurable Units: Reconfigurable Functional Units (RFUs)
• Integrated into the processor core
• Fastest interaction with the processor
• Core re-design needed
• Instruction set extension needed
[Diagram: PROCESSOR with Register File, ALU and RFU inside the core]
Reconfigurable Units: Reconfigurable Unit requirements
• Fast data transfer between RU and processor (RFU approach chosen)
• Fast reconfiguration of the RU
• Silicon area as small as possible
• Low power consumption
Multicontext Reconfigurable Cells
• Traditional approach (LUT-based): one look-up table for each context (operation), with a selector driven by the context selection
• Reconfigurable logic block: a single reconfigurable block, complete with a memory containing the contexts
[Diagrams: configurable block with LUT Context 1 ... LUT Context N, selector, input, output and context selection; reconfigurable logic block with context memory, input, output and context selection]
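The traditional LUT-based approach on the slide above can be illustrated with a small behavioral model. This is only a sketch: the class name `MultiContextLUTCell`, the helper `truth_table`, and the two example contexts (3-input XOR and AND) are assumptions made for illustration and do not come from the slides.

```python
from typing import Callable, List, Sequence


def truth_table(fn: Callable[..., int], n_inputs: int = 3) -> List[int]:
    """Build a LUT (list of output bits) for a boolean function of n_inputs bits."""
    return [
        fn(*[(addr >> i) & 1 for i in range(n_inputs)]) & 1
        for addr in range(1 << n_inputs)
    ]


class MultiContextLUTCell:
    """Hypothetical model: one look-up table per context plus a context selector."""

    def __init__(self, luts: Sequence[Sequence[int]], n_inputs: int = 3):
        size = 1 << n_inputs
        for lut in luts:
            if len(lut) != size:
                raise ValueError("each LUT needs 2**n_inputs entries")
        self.luts = [list(lut) for lut in luts]
        self.n_inputs = n_inputs

    def evaluate(self, context: int, inputs: Sequence[int]) -> int:
        # Pack the input bits into a LUT address (inputs[0] is the LSB).
        addr = 0
        for i, bit in enumerate(inputs):
            addr |= (bit & 1) << i
        # The context selection picks which LUT drives the output.
        return self.luts[context][addr]


# Two example contexts (assumed for illustration): 3-input XOR and 3-input AND.
cell = MultiContextLUTCell([
    truth_table(lambda a, b, c: a ^ b ^ c),
    truth_table(lambda a, b, c: a & b & c),
])
```

Switching context only changes the index passed to `evaluate`; every additional context costs one more full LUT of storage, which is the overhead the proposed cell aims to reduce.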
Proposed Logic Block
• Full-adder based
• Additional blocks for its configuration
• 4 configuration bits (2^4 = 16 contexts)
• 3 input bits / 1 output bit
[Diagram labels: data inputs D1, D2, D3; configuration bits S0, S1, S2; MUX; full adder with X, Y, CIN, Sum, COUT; switch; LB Out; COUT to CIN of next LB]
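To see why a full adder is a useful basis for a configurable cell, it helps to note that tying its carry input to a constant already yields several two-input logic functions. The sketch below shows this standard property only; it does not reproduce the authors' configuration encoding, and the function names are assumptions made for illustration.

```python
def full_adder(x: int, y: int, cin: int) -> tuple:
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def derived_functions(x: int, y: int) -> dict:
    """Logic functions obtained by forcing CIN to 0 or 1 (illustrative only)."""
    s0, c0 = full_adder(x, y, 0)  # CIN = 0: sum is XOR, carry is AND
    s1, c1 = full_adder(x, y, 1)  # CIN = 1: sum is XNOR, carry is OR
    return {"XOR": s0, "AND": c0, "XNOR": s1, "OR": c1}
```

A few configuration bits that steer multiplexers in front of the adder inputs can therefore select among such functions without storing a full truth table per context.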
Reconfigurable Cells Comparison
Proposed reconfigurable cell:
• Logic block: a single reconfigurable logic block based on a full adder, complete with a memory containing the context configuration bits
[Diagram labels: 16x3 Context Memory; Context Selection (4); Context Enable; data inputs D1, D2, D3; CIN; SUM; COUT; Out]
Reconfigurable Cells Comparison
Traditional (LUT-based) implementation of the same cell:
[Diagram labels: two 16x8 LUTs, each followed by a MUX; Context Selection (4); Context Enable; data inputs D1, D2, D3; data input MUX; CIN; S0; SUM; COUT; Out]
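A rough count of configuration storage gives some intuition for the comparison. The sketch below assumes that the diagram labels "16x3" and "16x8" mean words x bits per word, and that the LUT-based cell uses two such LUTs; both readings are assumptions, and the result covers storage bits only, not the total transistor count reported later in the slides.

```python
def memory_bits(words: int, width: int, copies: int = 1) -> int:
    """Total storage bits for `copies` memories of `words` x `width` bits."""
    return words * width * copies


# Assumed reading of the diagram labels (not confirmed by the slides).
proposed_bits = memory_bits(16, 3)        # one 16x3 context memory
lut_bits = memory_bits(16, 8, copies=2)   # two 16x8 LUTs

storage_saving = 1 - proposed_bits / lut_bits
print(f"proposed: {proposed_bits} bits, LUT-based: {lut_bits} bits, "
      f"saving: {storage_saving:.1%}")
```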
Performance evaluation
• Simulation software: SPECTRE, Cadence Virtuoso Suite
• Process used: CL018 by TSMC, Taiwan (0.18 μm feature size)
• Process-related simulation data: NCSU Design Kit
Performance evaluation: layout
[Figures: LUT-based cell layout; proposed cell layout]
Area: 0.00903 mm² (proposed cell) vs 0.0212 mm² (LUT-based cell), 57.4% less
Performance evaluation: delay
Maximum delays of the proposed cell:
Maximum delays of the traditional LUT-based cell:
Performance evaluation: power
• Simulation conditions:
  • 100 MHz operating frequency
  • 100% input node activity
Power consumption of the proposed cell: 0.572 mW
Power consumption of the traditional LUT-based cell: 1.097 mW
Average power consumption reduced by 48%
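The percentage figures quoted in the layout and power slides can be checked with a few lines of arithmetic. This is a simple sanity check using the numbers from the slides; it assumes the smaller area value belongs to the proposed cell, and the function name `relative_reduction` is introduced here for illustration.

```python
def relative_reduction(new: float, old: float) -> float:
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100.0


# Area in mm^2 (assumption: 0.00903 is the proposed cell, 0.0212 the LUT-based cell).
area_reduction = relative_reduction(0.00903, 0.0212)

# Power in mW at 100 MHz and 100% input node activity.
power_reduction = relative_reduction(0.572, 1.097)

print(f"area reduction:  {area_reduction:.1f}%")
print(f"power reduction: {power_reduction:.1f}%")
```

Both results agree with the rounded values on the slides (57.4% for area, about 48% for power).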
Performance evaluation: summary
Summary of performance comparison:
Conclusions
• Architecture advantages:
  • Fast reconfiguration
  • Low transistor count (68.8% less) and area requirements
  • Low power consumption
• Main limitations:
  • Reduced flexibility compared to a LUT-based cell
• Future work:
  • Use of the proposed cell in a complete RFU architecture
  • Integration of the RFU in an existing embedded processor