Class Presentation of Advanced VLSI Course. Presented by: Ali Shahabi. Major reference: Architecture and Circuit Techniques for a Reconfigurable Memory Block. By :
Class Presentation of Advanced VLSI Course
Presented by: Ali Shahabi
Major reference: Architecture and Circuit Techniques for a Reconfigurable Memory Block
By: Ken Mai, Ron Ho, Elad Alon, Dean Liu, Younggon Kim, Dinesh Patil, and Mark Horowitz
Presentation date: 2004/12/30
Custom ASICs are expensive
• High design complexity
• High non-recurring engineering costs
• Need high-volume, high-profit market
• Hard to modify or fix
Reconfigurable computing
• Growing interest in reconfigurable solutions
  – FPGAs
  – Structured ASICs
  – Coarse-grain architectures
• Reconfigurable computing characteristics
  – Low non-recurring engineering costs
  – Good performance and efficiency
  – Reconfigurability overheads
FPGA with hardwired blocks
• CLB: Configurable Logic Block [1]
Coarse-grain architecture
• Chip multi-processor
• Compute, memory, interconnect, control
• Reconfigure tile and global network [1]
Current memory systems
• Traditional emphasis on compute side
  – Memory system important
• FPGAs
  – Fine grain with sizable overheads
  – Use CLBs for extra functionality
  – Slow compared to cutting-edge SRAMs
• Coarse-grain architectures
  – Large grain
  – Low flexibility
Design goal
Low overhead, fast, reconfigurable memory system
• Reconfigure along natural SRAM partition boundaries
  – Add hardwired blocks for extra functionality
• Modern SRAM circuit techniques
  – Pulse-mode self-resetting logic
  – Replica timing paths
• Design targets
  – Cache
  – FIFO
Reconfigurable memory system
• Array of homogeneous memory mats
• Each memory has a port into the interconnect
• Mat size chosen based on natural partition boundary
• Small inter-mat control network [1]
Tile floor plan [2]
Sample configuration: caches
• Mats configured as tag or data
• Direct mapped or set-associative caches
• Use inter-mat control network to pass hit/miss [1]
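As a rough illustration of the cache configuration, the sketch below models one tag mat and one data mat acting as a direct-mapped cache, with the hit signal standing in for what the inter-mat control network would carry. The class name, the one-word line size, and the word-addressed index/tag split are assumptions made for the example, not details from the slides.

```python
# Hypothetical behavioral model: one tag mat + one data mat as a direct-mapped cache.
# Assumptions (not from the slides): one 32b word per line, word addressing.
ENTRIES = 512      # logical rows per mat
INDEX_BITS = 9     # log2(512)

class DirectMappedCache:
    def __init__(self):
        self.tags = [0] * ENTRIES        # main data of the tag mat
        self.valid = [False] * ENTRIES   # a meta-data bit of the tag mat
        self.data = [0] * ENTRIES        # main data of the data mat

    def _split(self, addr):
        # Upper bits form the tag, lower bits select the row.
        return addr >> INDEX_BITS, addr & (ENTRIES - 1)

    def lookup(self, addr):
        tag, index = self._split(addr)
        # The tag mat produces hit/miss and passes it to the data mat.
        hit = self.valid[index] and self.tags[index] == tag
        return hit, (self.data[index] if hit else None)

    def fill(self, addr, word):
        tag, index = self._split(addr)
        self.tags[index] = tag
        self.valid[index] = True
        self.data[index] = word
```

A set-associative cache would follow the same pattern with several tag/data mat pairs looked up in parallel.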
Sample configuration: FIFOs
• Data FIFOs, instruction store, and scratchpad
• Completely self-contained FIFOs
• A single FIFO can be smaller or larger than 1 mat [1]
Multi-porting
• Some configurations need >1 access per cycle
  – Cache tag with snooping
  – FIFOs with independent read and write ports
• Multi-porting each cell is expensive
  – Multiple ports not always needed
• Run memory system faster than processor
  – Time-multiplex a single port
  – Memory cycle = 10 fan-out-of-4 (FO4) inverter delays
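The time-multiplexing idea can be sketched with a little arithmetic: a single-ported memory that cycles several times per processor cycle looks like a multi-ported one to the processor. The 10 FO4 memory cycle is from the slide; the processor cycle time passed in below is a hypothetical value, and the function names are illustrative.

```python
MEMORY_CYCLE_FO4 = 10  # from the slide

def virtual_ports(processor_cycle_fo4):
    """Number of single-port accesses that fit in one processor cycle."""
    return processor_cycle_fo4 // MEMORY_CYCLE_FO4

def schedule(requests, processor_cycle_fo4):
    """Assign the requests of one processor cycle to memory sub-cycle slots."""
    slots = virtual_ports(processor_cycle_fo4)
    if len(requests) > slots:
        raise ValueError("more requests than memory sub-cycles available")
    return {slot: req for slot, req in enumerate(requests)}
```

For example, with an assumed 20 FO4 processor cycle, `schedule(["fifo_read", "fifo_write"], 20)` places the read in slot 0 and the write in slot 1.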
Mat latency
• Total mat latency = 2 cycles
  – 20 FO4
  – SRAM access = 1 cycle
  – Peripheral logic = 1 cycle
• Fully pipelined
  – Accepts one access every cycle
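The latency and throughput figures can be checked with a short sketch. The two stages and the 10 FO4 cycle come from the slides; the helper names are illustrative, and the model assumes back-to-back accesses with no stalls.

```python
FO4_PER_CYCLE = 10    # memory cycle, from the slides
PIPELINE_STAGES = 2   # SRAM access + peripheral logic

latency_fo4 = PIPELINE_STAGES * FO4_PER_CYCLE  # total mat latency in FO4 delays

def completion_cycle(issue_cycle):
    """Cycle in which an access issued at issue_cycle completes."""
    return issue_cycle + PIPELINE_STAGES

def total_cycles(n_accesses):
    """Cycles needed for n back-to-back accesses in a full pipeline."""
    return 0 if n_accesses == 0 else n_accesses + PIPELINE_STAGES - 1
```

Because one access is accepted every cycle, each additional access costs one cycle rather than two.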
Added features [1]
Meta-data [1]
Mat details
• 2KB SRAM array
  – 512 x 36b logical, 128 x 144b physical
  – 32b main data, 4b meta-data
  – Scan-tunable replica bitline
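The array dimensions are consistent with each other, as the arithmetic sketch below shows. All numbers are from the slide; reading the 2KB figure as counting main data only is an inference from that arithmetic, not a statement in the source.

```python
LOGICAL_ROWS, LOGICAL_WIDTH = 512, 36
PHYSICAL_ROWS, PHYSICAL_WIDTH = 128, 144
DATA_BITS, META_BITS = 32, 4

total_bits = LOGICAL_ROWS * LOGICAL_WIDTH                    # logical view
physical_bits = PHYSICAL_ROWS * PHYSICAL_WIDTH               # physical view
words_per_physical_row = PHYSICAL_WIDTH // LOGICAL_WIDTH     # logical words per physical row
main_data_bytes = LOGICAL_ROWS * DATA_BITS // 8              # main data capacity
meta_data_bytes = LOGICAL_ROWS * META_BITS // 8              # meta-data capacity
```

Both views hold 18,432 bits, with four logical words per physical row, 2,048 bytes of main data, and 256 bytes of meta-data.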
Meta-data bits
[Figure: word layout, 32b data + 4b meta-data]
• Cache: valid, dirty, LRU, cache coherence state
• FIFO: valid
• Special operations
  – Gang
  – Read modify write
Gang operation
[Figure: meta-data and data columns with mask, clear, and set controls]
• Can gang set or clear columns of meta-data bits
• Single-cycle operation [1]
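A behavioral sketch of the gang operation: every row's meta-data is updated at once, with a column mask selecting which of the 4 meta-data bits are set or cleared. The function name and the list-of-integers representation are assumptions for illustration; the hardware does this in a single cycle rather than by looping.

```python
META_BITS = 4
META_MASK = (1 << META_BITS) - 1

def gang_op(meta, column_mask, set_bits):
    """Set (set_bits=True) or clear the masked meta-data columns in every row."""
    column_mask &= META_MASK
    if set_bits:
        return [m | column_mask for m in meta]
    return [m & ~column_mask & META_MASK for m in meta]
```

A typical use would be invalidating a whole cache by clearing the column that holds the valid bit.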
Read modify write [1]
[Figure sequence over the mdata and data columns, in three steps]
• Read
• Modify
• Write
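The three steps can be sketched as one function operating on a row's meta-data. The slides do not say what computes the new value, so the `modify` callback here is a hypothetical stand-in, as is the bit assignment in the usage example.

```python
META_MASK = 0xF  # 4 meta-data bits per row

def read_modify_write(meta, row, modify):
    """Read a row's meta-data, apply modify(), write the result back."""
    old = meta[row]                   # read
    new = modify(old) & META_MASK     # modify
    meta[row] = new                   # write
    return old

# Usage: mark a line dirty, assuming (hypothetically) that bit 1 is the dirty bit.
meta = [0b0001]
previous = read_modify_write(meta, 0, lambda m: m | 0b0010)
```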
Timing [1]
PLA
• Reconfigurable NOR-NOR PLA
• 1st NOR plane = ternary CAM
• 2nd NOR plane = SRAM [1]
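A logical sketch of such a PLA: each product term in the first plane is a ternary match (value plus care mask, so unmasked bits are "don't care"), and the second plane is a table of output bits that are ORed together for all matching terms. This models the function at the AND-OR level, which a NOR-NOR structure with inverted inputs and outputs implements; the tuple encoding and function name are assumptions for the example.

```python
def pla_eval(terms, inputs):
    """Evaluate a PLA.

    terms: list of (value, care, out_bits) tuples.
      value, care -> ternary match on the inputs (care bit 0 = don't care)
      out_bits    -> output-plane row contributed when the term matches
    """
    out = 0
    for value, care, out_bits in terms:
        if (inputs ^ value) & care == 0:   # all cared-about bits agree
            out |= out_bits
    return out

# Two example terms over 6 inputs and 4 outputs:
terms = [
    (0b000001, 0b000001, 0b0001),  # input bit 0 == 1 -> output bit 0
    (0b110000, 0b110000, 0b0100),  # input bits 5,4 == 11 -> output bit 2
]
```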
Pointer logic [1]
Pointer logic
For FIFO configurations we add pointer logic
• 4 pointer/stride pairs
  – 11b pointer
  – 4b stride
Pointer cells are 2-ported [1]
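One pointer/stride pair can be sketched as a register that advances by its stride after each access. The bit widths are from the slide; the class name and the wrap-around at 2^11 are assumptions, since the slides do not describe how the pointer wraps.

```python
POINTER_BITS, STRIDE_BITS = 11, 4

class PointerStride:
    def __init__(self, pointer=0, stride=1):
        if not 0 <= pointer < (1 << POINTER_BITS):
            raise ValueError("pointer does not fit in 11 bits")
        if not 0 <= stride < (1 << STRIDE_BITS):
            raise ValueError("stride does not fit in 4 bits")
        self.pointer, self.stride = pointer, stride

    def advance(self):
        """Return the current address, then step the pointer by the stride."""
        current = self.pointer
        # Assumed behavior: wrap modulo 2**POINTER_BITS.
        self.pointer = (self.pointer + self.stride) & ((1 << POINTER_BITS) - 1)
        return current
```

A FIFO would use one such pair as its read pointer and another as its write pointer.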
Write buffer [1]
Write buffer
Pipeline writes for single-cycle cache writes
• On a write, the data mat stores incoming data in the write buffer (WB)
• Tag check
  – Cache miss: WB entry is invalidated
  – Cache hit: WB entry is allowed to write
• WB entry is written into the data mat on the next write
• On every write, the WB and mat are both active [3]
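The write-buffer behavior described on this slide can be modeled roughly as follows: each write parks its data in the buffer, the tag check decides whether that entry stays valid, and a surviving entry is committed to the array when the next write arrives. The class and method names are hypothetical, and read handling is left out because the slide does not cover it.

```python
class DataMatWithWriteBuffer:
    def __init__(self, rows=512):
        self.array = [0] * rows
        self.wb = None  # pending (index, data) entry, or None if invalid

    def write(self, index, data):
        # Commit the previous entry (it survived its tag check)
        # while the new data is captured in the write buffer.
        if self.wb is not None:
            old_index, old_data = self.wb
            self.array[old_index] = old_data
        self.wb = (index, data)

    def tag_result(self, hit):
        # On a miss the buffered entry is invalidated and never reaches the array.
        if not hit:
            self.wb = None
```

This is why the write buffer and the mat are both active on every write: one is being filled while the other absorbs the previous entry.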
Comparator [1]
Comparator
• Maskable comparator
  – Can mask out any combination of meta-data bits
  – Can mask out the main data as a chunk
• Example use: cache tag compare
  – Want to check valid state of line (in meta-data)
  – Want to check tag itself (in main data)
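A minimal sketch of the maskable compare: meta-data bits are compared individually under a mask, while the 32b main data is either compared as a whole or ignored. The function signature and the choice of bit 0 as the valid bit in the usage example are assumptions for illustration.

```python
def compare(stored_data, stored_meta, ref_data, ref_meta,
            meta_mask, compare_data=True):
    """Return True on a match.

    meta_mask bit = 1 -> that meta-data bit takes part in the compare.
    compare_data=False masks out the main data as a single chunk.
    """
    meta_match = ((stored_meta ^ ref_meta) & meta_mask) == 0
    data_match = (stored_data == ref_data) if compare_data else True
    return meta_match and data_match

# Cache tag compare: the tag must match and the (assumed) valid bit 0 must be 1;
# the other meta-data bits are masked out.
hit = compare(stored_data=0xABC, stored_meta=0b0101,
              ref_data=0xABC, ref_meta=0b0001, meta_mask=0b0001)
```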
Testchip
• 0.18µm 6M TSMC
• 3mm x 3.3mm die
• 4 memory blocks
• Low swing crossbar
• Test vector storage
• 1.1GHz (10 FO4)
• 1.8V, room temp. [1]
Testchip mat details
• 2KB SRAM array
  – 512 x 36b logical, 128 x 144b physical
  – 32b main data, 4b meta-data
• 16 AND-term PLA
  – 6 inputs, 4 outputs
• 4 pointer/stride pairs
  – 11b pointer
  – 4b stride
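The clock rate and the 10 FO4 cycle together imply an FO4 delay for this process, which the short calculation below works out. The input numbers are from the slide; the derived values are simple arithmetic, and the variable names are illustrative.

```python
freq_hz = 1.1e9        # testchip clock, from the slide
fo4_per_cycle = 10     # cycle time in FO4 delays, from the slide

cycle_ps = 1e12 / freq_hz              # clock period in picoseconds
fo4_ps = cycle_ps / fo4_per_cycle      # implied delay of one FO4 inverter
mat_latency_ns = 2 * cycle_ps / 1000   # 2-cycle mat latency in nanoseconds
```

This gives a period of roughly 909 ps, an FO4 delay of about 91 ps, and a mat latency of about 1.8 ns.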
Mat area breakdown (mm²)
• 32% of mat area is peripheral logic [1]