Class Presentation Of Advance VLSI Course

Class Presentation Of Advance VLSI Course • Presented by: Ali Shahabi • Major reference: Architecture and Circuit Techniques for a Reconfigurable Memory Block, by Ken Mai, Ron Ho, Elad Alon, Dean Liu, Younggon Kim, Dinesh Patil, and Mark Horowitz • Presentation date: 2004/12/30

Custom ASICs are expensive • High design complexity • High non-recurring engineering costs • Need high-volume, high-profit market • Hard to modify or fix

Reconfigurable computing • Growing interest in reconfigurable solutions – FPGAs – Structured ASICs – Coarse-grain architectures • Reconfigurable computing characteristics – Low non-recurring engineering costs – Good performance and efficiency – Reconfigurability overheads

FPGA with hardwired blocks • CLB: Configurable Logic Block [1]

Coarse-grain architecture • Chip multi-processor • Compute, memory, interconnect, control • Reconfigure tile and global network [1]

Current memory systems • Traditional emphasis on compute side – Memory system important • FPGAs – Fine grain with sizable overheads – Use CLBs for extra functionality – Slow compared to cutting-edge SRAMs • Coarse-grain architectures – Large grain – Low flexibility

Design goal • Low overhead, fast, reconfigurable memory system • Reconfigure along natural SRAM partition boundaries – Add hardwired blocks for extra functionality • Modern SRAM circuit techniques – Pulse-mode self-resetting logic – Replica timing paths • Design targets – Cache – FIFO

Reconfigurable memory system • Array of homogeneous memory mats • Each memory has a port into the interconnect • Mat size chosen based on natural partition boundary • Small inter-mat control network [1]

Smart Memories chip [2]

Tile floor plan [2]

Sample configuration: caches • Mats configured as tag or data • Direct mapped or set-associative caches • Use inter-mat control network to pass hit/miss [1]

Mats configured as 2-way set-associative cache [2]
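The cache configuration above can be sketched functionally as follows. This is only an illustrative model, not the chip's implementation: each way is assumed to be one tag mat paired with one data mat, the `(valid, tag)` tuple layout and the name `lookup` are hypothetical, and the loop stands in for the hit/miss signal passed over the inter-mat control network.

```python
def lookup(ways, index, tag):
    """Look up one set in a set-associative cache built from mats.

    ways: list of (tag_mat, data_mat) pairs, one pair per way (assumed layout).
    tag_mat[index] holds a (valid, tag) tuple; data_mat[index] holds the data word.
    Returns (hit, data); data is None on a miss.
    """
    for tag_mat, data_mat in ways:
        valid, stored_tag = tag_mat[index]
        if valid and stored_tag == tag:
            # In hardware the tag mat would signal "hit" to its data mat.
            return True, data_mat[index]
    return False, None


# Usage: a 2-way cache with 8 sets per way.
way0 = ([(False, 0)] * 8, [0] * 8)
way1 = ([(False, 0)] * 8, [0] * 8)
way0[0][3], way0[1][3] = (True, 0x12), 111
way1[0][3], way1[1][3] = (True, 0x34), 222
hit, data = lookup([way0, way1], index=3, tag=0x34)
```

With a single `(tag_mat, data_mat)` pair the same function models a direct-mapped cache.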

Sample configuration: FIFOs • Data FIFOs, instruction store, and scratchpad • Completely self-contained FIFOs • A single FIFO can be smaller or larger than 1 mat [1]

Multi-porting • Some configurations need >1 access per cycle – Cache tag with snooping – FIFOs with independent read and write ports • Multi-porting each cell is expensive – Multiple ports not always needed • Run memory system faster than processor – Time multiplex single-port – Memory cycle = 10 fan-out of 4 inverter delays
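The time-multiplexing idea can be illustrated with a small behavioral model. It is a sketch under assumptions: the slides do not give the ratio between memory and processor clocks, so `mem_cycles_per_cpu_cycle=2` is a placeholder, and the class name and request format are hypothetical.

```python
class TimeMultiplexedPort:
    """Single-ported memory that runs several memory cycles per processor cycle."""

    def __init__(self, words, mem_cycles_per_cpu_cycle=2):
        self.mem = [0] * words
        # Assumption: number of memory cycles that fit in one processor cycle.
        self.slots = mem_cycles_per_cpu_cycle

    def cpu_cycle(self, requests):
        """Serve the requests of one processor cycle, one per memory cycle.

        requests: list of ("read", addr, None) or ("write", addr, data).
        Returns read data (or None for writes) in request order.
        """
        if len(requests) > self.slots:
            raise ValueError("more accesses than memory cycles available")
        results = []
        for op, addr, data in requests:  # each iteration is one memory cycle
            if op == "write":
                self.mem[addr] = data
                results.append(None)
            else:
                results.append(self.mem[addr])
        return results


# Usage: a FIFO-style write and read in the same processor cycle.
port = TimeMultiplexedPort(words=8)
out = port.cpu_cycle([("write", 3, 42), ("read", 3, None)])
```

To the processor this looks like two ports, while the cell array stays single-ported.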

Virtual multi-porting [1]

Mat latency • Total mat latency = 2 cycles – 20 FO4 – SRAM access = 1 cycle – Peripheral logic = 1 cycle • Fully pipelined – Accepts one access every cycle

Added features [1]

Meta-data [1]

Mat details • 2KB SRAM array – 512 x 36b logical, 128 x 144b physical – 32b main data, 4b meta-data – Scan tunable replica bitline
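The logical and physical organizations hold the same number of bits, with four 36b logical words per 144b physical row. A minimal sketch of that arithmetic follows; the address split in `locate` (low bits select the word within a row) is an assumption for illustration, since the slides do not state how words are interleaved.

```python
LOGICAL_WORDS, WORD_BITS = 512, 36   # logical view: 512 x 36b
PHYS_ROWS, ROW_BITS = 128, 144       # physical view: 128 x 144b
DATA_BITS, META_BITS = 32, 4         # per-word split from the slide

WORDS_PER_ROW = ROW_BITS // WORD_BITS            # 4 logical words per row
MAIN_DATA_BYTES = LOGICAL_WORDS * DATA_BITS // 8  # 2048 bytes = 2KB


def locate(addr):
    """Map a logical word address to (physical row, word slot within the row).

    Assumed mapping: consecutive addresses fill one row before the next.
    """
    if not 0 <= addr < LOGICAL_WORDS:
        raise IndexError("logical address out of range")
    return divmod(addr, WORDS_PER_ROW)


# Both views describe the same 18,432 bits.
assert LOGICAL_WORDS * WORD_BITS == PHYS_ROWS * ROW_BITS
```

The 2KB figure counts only the 32b main data; the 4b of meta-data per word are extra.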

Meta-data bits • Each word: 32b data + 4b meta-data • Cache: valid, dirty, LRU, cache coherence state • FIFO: valid • Special operations – Gang – Read modify write

Gang operation (figure labels: meta-data, data, mask, clear, set) • Can gang set or clear columns of meta-data bits • Single cycle operation [1]
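A behavioral sketch of the gang operation: a mask picks meta-data bit columns, and every row has those bits set or cleared at once. The function name and the bit assignment in the usage example (bit 0 as "valid") are assumptions for illustration only.

```python
META_MASK = 0xF  # 4 meta-data bits per word


def gang_op(meta, column_mask, set_bits):
    """Set or clear the selected meta-data columns in every row.

    meta: list of 4-bit meta-data values, one per word.
    column_mask: bit i = 1 selects meta-data column i.
    set_bits: True to gang set, False to gang clear.
    """
    column_mask &= META_MASK
    if set_bits:
        return [m | column_mask for m in meta]
    return [m & ~column_mask & META_MASK for m in meta]


# Usage: clear column 0 in all rows, e.g. to invalidate every line
# if bit 0 is taken to be the valid bit (hypothetical assignment).
cleared = gang_op([0b1111, 0b0101, 0b0000], column_mask=0b0001, set_bits=False)
```

The loop over rows is only a software stand-in; the slide states the hardware does this in a single cycle.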

Meta-data bit cell [1]

Read modify write mdata data [1]

Read modify write: read mdata data [1]

Read modify write: modify mdata data [1]

Read modify write: write mdata data [1]
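The three phases shown on the preceding slides (read, modify, write) can be sketched as one atomic access on the meta-data bits. This is a functional illustration only; the `modify` callback is a hypothetical stand-in for whatever logic computes the new meta-data value, which the slides do not specify here.

```python
def read_modify_write(meta, addr, modify):
    """Perform one read-modify-write access on a word's 4 meta-data bits.

    meta: list of 4-bit meta-data values.
    modify: function mapping the old 4-bit value to the new one (assumed).
    Returns the value that was read.
    """
    old = meta[addr]             # read phase
    new = modify(old) & 0xF      # modify phase, kept to 4 bits
    meta[addr] = new             # write phase
    return old


# Usage: set bit 1 (say, a "dirty" flag; the bit position is an assumption).
meta = [0b0001, 0b0000]
previous = read_modify_write(meta, 0, lambda m: m | 0b0010)
```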

RMW decoder circuits [1]

RMW decoder circuits: read [1]

RMW decoder circuits: modify [1]

RMW decoder circuits: write [1]

Timing [1]

PLA • Reconfigurable NOR-NOR PLA • 1st NOR plane = ternary-CAM • 2nd NOR plane = SRAM [1]
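Functionally, the two planes can be modeled as a ternary match followed by an OR of stored output words. The sketch below ignores signal polarity and timing of the NOR-NOR circuit, and the `(care_mask, value, out_bits)` term encoding is a hypothetical representation chosen for the example.

```python
def pla_eval(inputs, terms):
    """Evaluate a PLA given as a list of product terms.

    Each term is (care_mask, value, out_bits):
      - first plane (ternary CAM): the term matches when the input equals
        `value` on every bit where care_mask is 1; other bits are don't-care.
      - second plane (SRAM): out_bits is the output word stored for the term.
    Outputs of all matching terms are ORed together.
    """
    out = 0
    for care_mask, value, out_bits in terms:
        if (inputs ^ value) & care_mask == 0:
            out |= out_bits
    return out


# Usage: 6-bit input, 4-bit output, two of the available terms programmed.
terms = [
    (0b000011, 0b000001, 0b0001),  # in[1:0] == 01   -> out bit 0
    (0b100000, 0b100000, 0b1000),  # in[5] == 1      -> out bit 3
]
result = pla_eval(0b100001, terms)
```

Reconfiguring the PLA then amounts to rewriting the term list, which matches the idea of storing both planes in memory cells.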

PLA: 1st NOR plane [1]

PLA: normal delay chain [1]

PLA: early reset-off delay chain [1]

PLA: 2nd NOR plane [1]

Pointer logic [1]

Pointer logic • For FIFO configurations we add pointer logic • 4 pointer/stride pairs – 11b pointer – 4b stride • Pointer cells are 2-ported [1]
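A pointer/stride pair can be sketched as a register that advances by its stride and wraps at 11 bits. The FIFO built on top of it is an assumption-laden illustration: the `count` field used for full/empty detection and the power-of-two depth are choices made for the example, not details from the slides.

```python
PTR_BITS, STRIDE_BITS = 11, 4


class Pointer:
    """One pointer/stride pair: 11b pointer, 4b stride."""

    def __init__(self, start=0, stride=1):
        if not 0 <= stride < (1 << STRIDE_BITS):
            raise ValueError("stride must fit in 4 bits")
        self.value = start & ((1 << PTR_BITS) - 1)
        self.stride = stride

    def advance(self):
        """Return the current pointer, then step it by the stride (mod 2^11)."""
        old = self.value
        self.value = (self.value + self.stride) & ((1 << PTR_BITS) - 1)
        return old


class Fifo:
    """FIFO using separate read and write pointers (depth: power of two)."""

    def __init__(self, depth):
        self.depth = depth
        self.store = [None] * depth
        self.head, self.tail = Pointer(), Pointer()
        self.count = 0  # assumption: explicit occupancy counter

    def push(self, item):
        if self.count == self.depth:
            raise OverflowError("FIFO full")
        self.store[self.tail.advance() % self.depth] = item
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("FIFO empty")
        self.count -= 1
        return self.store[self.head.advance() % self.depth]
```

Keeping separate head and tail pointers is what lets reads and writes proceed independently, as the FIFO configuration requires.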

Write buffer [1]

Write buffer • Pipeline writes for single-cycle cache writes • On write, data mat stores incoming data in WB • Tag check – Cache miss → WB entry is invalidated – Cache hit → WB entry is allowed to write • Writes into data mat on next write • On every write, the WB and mat are both active [3]
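The write-buffer sequence can be modeled in a few lines. This is a behavioral sketch assuming a single-entry buffer and a tag result that arrives before the next write; the class and method names are hypothetical.

```python
class DataMatWithWriteBuffer:
    """Data mat whose writes are staged in a one-entry write buffer (WB)."""

    def __init__(self, words):
        self.mem = [0] * words
        self.wb = None  # pending entry: dict with addr, data, valid

    def write(self, addr, data):
        """Commit the previous valid WB entry, then buffer the new write."""
        if self.wb is not None and self.wb["valid"]:
            self.mem[self.wb["addr"]] = self.wb["data"]
        self.wb = {"addr": addr, "data": data, "valid": True}

    def tag_result(self, hit):
        """Apply the tag check outcome to the buffered entry."""
        if self.wb is not None and not hit:
            self.wb["valid"] = False  # miss: the buffered write must not land


# Usage: a hit is committed on the next write, a miss is dropped.
mat = DataMatWithWriteBuffer(words=8)
mat.write(1, 10)
mat.tag_result(hit=True)
mat.write(2, 20)          # commits addr 1
mat.tag_result(hit=False)
mat.write(3, 30)          # addr 2 is discarded
```

Because each `write` both commits the old entry and captures the new one, the WB and the array are both active on every write, as the slide notes.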

Comparator [1]

Comparator • Maskable comparator – Can mask out any combination of meta-data bits – Can mask out the main data as a chunk • Example use: cache tag compare – Want to check valid state of line (in meta-data) – Want to check tag itself (in main data)
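A minimal sketch of the maskable compare: meta-data bits are masked individually, while the 32b main data is either compared as a whole or ignored. The parameter names and the choice of bit 0 as the valid bit in the usage example are assumptions for illustration.

```python
def masked_compare(stored_data, stored_meta, key_data, key_meta,
                   meta_care=0xF, compare_data=True):
    """Compare a stored word against a key with masking.

    meta_care: bit i = 1 means meta-data bit i takes part in the compare.
    compare_data: False masks out the whole 32b main data chunk.
    """
    meta_ok = ((stored_meta ^ key_meta) & meta_care & 0xF) == 0
    data_ok = (not compare_data) or (stored_data == key_data)
    return meta_ok and data_ok


# Usage: cache tag compare that requires valid == 1 (assumed to be bit 0)
# and a matching tag, while ignoring the other meta-data bits.
hit = masked_compare(stored_data=0xCAFE, stored_meta=0b0111,
                     key_data=0xCAFE, key_meta=0b0001,
                     meta_care=0b0001)
```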

Putting it all together [1]

Testchip • 0.18µm 6M TSMC • 3mm x 3.3mm die • 4 memory blocks • Low swing crossbar • Test vector storage • 1.1GHz (10 FO4) • 1.8V, room temp. [1]
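The quoted operating point implies some timing figures that are easy to derive. The short calculation below uses only numbers from the slides (1.1GHz, 10 FO4 per cycle, 2-cycle mat latency); the resulting FO4 delay is an inference from those numbers, not a measured value.

```python
CLOCK_HZ = 1.1e9          # testchip clock from the slide
FO4_PER_CYCLE = 10        # memory cycle = 10 FO4 inverter delays
MAT_LATENCY_CYCLES = 2    # SRAM access + peripheral logic

cycle_ps = 1e12 / CLOCK_HZ                      # about 909 ps per cycle
fo4_ps = cycle_ps / FO4_PER_CYCLE               # about 91 ps per FO4 (implied)
mat_latency_ps = MAT_LATENCY_CYCLES * cycle_ps  # about 1.8 ns for 20 FO4

print(f"cycle = {cycle_ps:.1f} ps, FO4 = {fo4_ps:.1f} ps, "
      f"mat latency = {mat_latency_ps:.1f} ps")
```

Since the mat is fully pipelined, the 2-cycle latency does not limit throughput: one access can still start every cycle.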

Testchip mat details • 2KB SRAM array – 512 x 36b logical, 128 x 144b physical – 32b main data, 4b meta-data • 16 AND-term PLA – 6 inputs, 4 outputs • 4 pointer/stride pairs – 11b pointer – 4b stride

Mat area breakdown (mm²) • 32% of mat area is in peripheral logic [1]
