A Synthesizable Datapath-Oriented Programmable Logic Core

Steven J.E. Wilton, Chun Hok Ho, Philip Leong, Wayne Luk, Brad Quinton
University of British Columbia and Imperial College

Embedded Programmable Logic Cores
• Embed a small amount of programmable logic onto an ASIC
• Postpone some decisions until late in the design cycle
• Fast upgrade path for products
• Embedded debug

Soft Programmable Logic Cores

• Advantages
  - Easy to integrate, which reduces design time
  - Very flexible: you can create exactly the core you need
  - Easy to migrate to smaller technologies
• Disadvantages
  - Inefficient compared to hard cores
• Our view: soft cores make sense if you only need a small core (a few hundred gates)

This talk:
• A new architecture for a synthesizable programmable logic core that supports datapath (bus-based) circuits

Previous Synthesizable PLCs
• Kim Bozman and Noha Kafafi:
  - LUT-Based
  - Unique Directional Routing Fabric

Synthesizable Cores
• Observation 1: To be truly synthesizable, the core must avoid combinational loops in the unprogrammed fabric
• Observation 2: Each tile need not be identical
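Observation 1 can be made concrete: if the unprogrammed fabric is viewed as a directed graph whose edges are the combinational paths between nodes (routing multiplexers, logic blocks), the fabric is free of combinational loops exactly when that graph is acyclic. The sketch below checks this with Kahn's topological sort. The node names and the dictionary encoding of the fabric are hypothetical illustrations, not the authors' tool flow.

```python
from collections import deque


def is_loop_free(edges: dict[str, list[str]]) -> bool:
    """Return True if the directed graph of combinational paths has no cycle.

    edges maps each node (hypothetical names, e.g. a mux or logic block)
    to the nodes its output drives combinationally.
    """
    # Collect every node, including ones that only appear as destinations.
    nodes = set(edges)
    for dests in edges.values():
        nodes.update(dests)

    # Count incoming edges for each node.
    indegree = {n: 0 for n in nodes}
    for dests in edges.values():
        for d in dests:
            indegree[d] += 1

    # Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        n = ready.popleft()
        removed += 1
        for d in edges.get(n, []):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)

    # Any node never removed lies on, or downstream of, a cycle.
    return removed == len(nodes)


# A directional fabric: signals only flow left to right.
directional = {"in": ["mux0"], "mux0": ["lb0"], "lb0": ["mux1"], "mux1": ["lb1"]}
# Letting lb1 drive back into mux0 creates a loop before any programming.
looped = {**directional, "lb1": ["mux0"]}
print(is_loop_free(directional))  # True
print(is_loop_free(looped))       # False
```

A directional routing fabric, like the ones in the prior synthesizable PLCs, satisfies this check by construction, which is one way to see why it is attractive for a synthesizable core.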

Previous Synthesizable PLCs
• Andy Yan:
  - Product-Term-Based Logic Block
  - Unique Directional Routing Fabric
  - Supported Sequential Circuits

Our Architecture
• Use it when the PLC is connected to a bus
• Observation: These connections are permanently tied to the bus signals, and we know this when the ASIC is designed

Logic Architecture

Key point:
- All bitblocks within a wordblock share the same set of configuration bits
- This means all bitblocks implement the same function
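To see why sharing configuration bits saves area, consider a wordblock of N bitblocks, each a small lookup table. The sketch below assumes, hypothetically, that every bitblock is a 3-input LUT applied bit-wise across three N-bit input buses; carry chains and registers are left out. Because all bitblocks read the same 8 configuration bits, the wordblock needs 8 bits regardless of N, instead of 8·N if each bitblock were configured independently.

```python
def wordblock(config: int, a: int, b: int, c: int, width: int) -> int:
    """Evaluate one wordblock of `width` bitblocks on input buses a, b, c.

    Hypothetical model: every bitblock is a 3-input LUT and all bitblocks
    share the same 8-bit `config` (the LUT truth table). Carry chains and
    registers are omitted.
    """
    if not 0 <= config < 256:
        raise ValueError("a 3-input LUT has 8 configuration bits")
    result = 0
    for i in range(width):
        # LUT address for bit i: bit 0 from bus a, bit 1 from b, bit 2 from c.
        addr = ((a >> i) & 1) | (((b >> i) & 1) << 1) | (((c >> i) & 1) << 2)
        # Every bitblock looks up the same shared truth table.
        result |= ((config >> addr) & 1) << i
    return result


def config_bits(width: int, shared: bool) -> int:
    """Configuration bits for one wordblock: shared truth table vs. one per bit."""
    return 8 if shared else 8 * width


# Truth table 0x66 implements a XOR b (input c is ignored).
print(wordblock(0x66, 0b1100, 0b1010, 0, 4))      # 6, i.e. 0b0110
print(config_bits(8, True), config_bits(8, False))  # 8 64
```

The trade-off is flexibility: a wordblock cannot compute different functions on different bits, which is acceptable for bus-based datapath logic where every bit usually undergoes the same operation.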

Routing Architecture
• Key point: Signals are routed as buses

Routing Architecture
• Key point:
  - Linear array of wordblocks
  - The routing channel gets wider as we go to the right

Routing Architecture
• Key point:
  - Linear array of wordblocks
  - The number of buses increases as we go to the right
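The growth of the routing channel can be tallied with a simple count. The sketch below assumes, hypothetically, that the channel entering the leftmost wordblock carries the M input buses and that each wordblock appends its N-bit output bus (a multiplier appends two, MSB and LSB) for use by every wordblock to its right. Feedback paths and constant registers are ignored, so this is an illustration of the trend, not the exact channel structure of the core.

```python
def channel_widths(m_inputs: int, kinds: list[str], n_bits: int) -> list[tuple[int, int]]:
    """Return (buses, wires) in the channel feeding each wordblock, left to right.

    Hypothetical model: the channel starts with `m_inputs` input buses; each
    "logic" wordblock appends one output bus and each "mult" wordblock
    appends two (MSB and LSB). Feedback paths and constants are ignored.
    """
    buses = m_inputs
    widths = []
    for kind in kinds:
        if kind not in ("logic", "mult"):
            raise ValueError(f"unknown wordblock kind: {kind!r}")
        # The channel seen by this wordblock: every bus produced so far.
        widths.append((buses, buses * n_bits))
        buses += 2 if kind == "mult" else 1
    return widths


print(channel_widths(2, ["logic", "mult", "logic", "logic"], 8))
# [(2, 16), (3, 24), (5, 40), (6, 48)]
```

Under this model the channel width, and hence routing area, grows linearly with the number of wordblocks and grows faster when multipliers are added.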

Datapath Architecture

Multipliers
• Two inputs instead of three
• Two output buses (MSB, LSB)
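A multiplier wordblock takes two N-bit input buses and produces a 2N-bit product, which is why it drives two output buses. The sketch below shows the MSB/LSB split. The slide does not say whether the multipliers are signed or unsigned, so unsigned arithmetic here is an assumption.

```python
def multiplier_wordblock(a: int, b: int, n_bits: int) -> tuple[int, int]:
    """Multiply two n-bit input buses; return the (MSB, LSB) output buses.

    Assumes unsigned operands. The full 2n-bit product is split into its
    upper n bits (MSB bus) and lower n bits (LSB bus).
    """
    mask = (1 << n_bits) - 1
    if not (0 <= a <= mask and 0 <= b <= mask):
        raise ValueError("operands must fit in n_bits unsigned bits")
    product = a * b  # always fits in 2 * n_bits bits
    return product >> n_bits, product & mask


msb, lsb = multiplier_wordblock(200, 100, 8)
print(msb, lsb)  # 78 32  (200 * 100 = 20000 = 78 * 256 + 32)
```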

Add a Control Block
• The control block is based on the product-term (P-term) fine-grained synthesizable core
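A product-term block computes each output as an OR of programmable AND terms over its inputs, in the style of a PLA. The sketch below is a generic PLA-style evaluation, not the exact P-term core referred to on the slide; encoding each product term as a pair of (care, value) bit masks is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductTerm:
    """One AND term: inputs selected by `care` must equal the bits in `value`."""
    care: int   # bit i set means input i participates in this term
    value: int  # required value for each participating input

    def matches(self, inputs: int) -> bool:
        # AND plane: every participating input must take its required value.
        return ((inputs ^ self.value) & self.care) == 0


def pterm_block(terms: list[ProductTerm], or_plane: list[list[int]], inputs: int) -> list[int]:
    """Evaluate a PLA-style block on a packed input word.

    or_plane[k] lists the indices of the product terms ORed into output k.
    """
    active = [term.matches(inputs) for term in terms]
    # OR plane: an output is 1 if any of its selected terms is active.
    return [int(any(active[j] for j in selected)) for selected in or_plane]


# Hypothetical configuration: out0 = x0 XOR x1, out1 = x0 AND x1.
terms = [
    ProductTerm(care=0b11, value=0b01),  # x0 & ~x1
    ProductTerm(care=0b11, value=0b10),  # ~x0 & x1
    ProductTerm(care=0b11, value=0b11),  # x0 & x1
]
or_plane = [[0, 1], [2]]
for x in range(4):
    print(f"x1x0={x:02b} -> {pterm_block(terms, or_plane, x)}")
```

In the datapath core, bit-level control (for example, deciding when a counter should increment) fits this kind of block better than the bus-wide wordblocks.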

Example Mapping
• Monitor two buses:
  - Count the number of times each bus matches a mask (masks can include don’t-care bits)
  - Count the number of times both buses match the mask at the same time
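The behaviour of this example mapping can be stated precisely in software. In the sketch below, a bus matches when every bit that is not a don’t-care equals the pattern. The (pattern, care) encoding, the sample-by-sample loop, and the function names are assumptions for illustration; the slide does not say how the circuit is configured on the core.

```python
def matches(bus: int, pattern: int, care: int) -> bool:
    """True if bus equals pattern on every bit where care is 1 (0 = don't care)."""
    return ((bus ^ pattern) & care) == 0


def monitor(samples, mask_a, mask_b):
    """Count matches over a sequence of (bus_a, bus_b) samples.

    mask_a and mask_b are (pattern, care) pairs. Returns three counters:
    matches on bus A, matches on bus B, and samples where both match.
    """
    count_a = count_b = count_both = 0
    for bus_a, bus_b in samples:
        hit_a = matches(bus_a, *mask_a)
        hit_b = matches(bus_b, *mask_b)
        count_a += hit_a
        count_b += hit_b
        count_both += hit_a and hit_b
    return count_a, count_b, count_both


# 8-bit buses: upper nibble must be 1010, lower nibble is don't-care.
mask = (0xA0, 0xF0)
samples = [(0xA3, 0xA0), (0xA7, 0x50), (0x13, 0xAF)]
print(monitor(samples, mask, mask))  # (2, 2, 1)
```

In hardware, the masked comparisons map naturally onto bus-wide wordblocks, while the final "match" decisions and counter enables are single bits of the kind the control block handles.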

Interesting Questions:
1. How do the various architectural parameters affect density?
2. How does this compare to a fine-grained architecture?

Architectural Parameters
• D: Number of wordblocks (including multipliers)
• N: Bit width
• M: Number of input buses
• R: Number of output buses
• F: Number of feedback paths
• C: Number of constant registers
• A: Number of multipliers
• P: Number of product-term blocks

Impact of the Number of Wordblocks and Bit Width
• Key result: Both the bit width and the number of wordblocks have a significant impact on area

Impact of the Number of Multipliers
• Key result: The area increase is due to the additional buses in the routing

Impact of the Size of the Control Block
• Key result: The control block can dominate the total area if it becomes too large

Area comparison (architecture parameters optimized separately for each benchmark)
Benchmark  Datapath (ours)  Fine-Grain (PTerm)  ASIC    Fine-Grain/Datapath  Datapath/ASIC
fbly       68,190           132,339,335         9,300   1940                 7.33
dotv3      34,119           65,534,780          6,575   1921                 5.19
dscg       72,178           116,271,968         9,473   1611                 7.62
fir4       76,213           130,971,120         9,843   1718                 7.74
egcd       1,225,231        22,776,474          10,420  18.6                 117
momul      294,135          11,448,589          7,097   38.9                 41
median     142,172          10,733,962          4,420   75.5                 32
debug1     87,265           1,302,928           3,484   14.9                 25
Key result 1: Significantly better than the fine-grained architecture
Key result 2: Overhead is roughly the same as that of an FPGA relative to an ASIC
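The two ratio columns are simple quotients of the three area columns. The short script below recomputes them from the table's numbers as a consistency check; it uses only values that appear in the table. The tabulated ratios are given to three or four significant figures, so recomputed values may differ slightly in the last digit.

```python
# (datapath area, fine-grain area, ASIC area) per benchmark, from the table.
AREAS = {
    "fbly":   (68_190, 132_339_335, 9_300),
    "dotv3":  (34_119, 65_534_780, 6_575),
    "dscg":   (72_178, 116_271_968, 9_473),
    "fir4":   (76_213, 130_971_120, 9_843),
    "egcd":   (1_225_231, 22_776_474, 10_420),
    "momul":  (294_135, 11_448_589, 7_097),
    "median": (142_172, 10_733_962, 4_420),
    "debug1": (87_265, 1_302_928, 3_484),
}


def ratios(datapath: int, fine_grain: int, asic: int) -> tuple[float, float]:
    """Return (fine-grain / datapath, datapath / ASIC) area ratios."""
    return fine_grain / datapath, datapath / asic


for name, areas in AREAS.items():
    fg_dp, dp_asic = ratios(*areas)
    print(f"{name:8s} {fg_dp:8.1f} {dp_asic:7.2f}")
```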

But these results aren’t fair:
• For each benchmark, we found the optimum set of architectural parameters
• We need an architecture that works for a variety of circuits

Architecture Construction
• Our approach:
  - The number of inputs/outputs is fixed by the SoC
  - The designer has an idea of the size of the programmable logic (number of wordblocks)
• Fix all other parameters as a function of the number of wordblocks
  - e.g., a fixed ratio of multipliers to wordblocks, a fixed ratio of control logic to datapath logic, etc.
• We chose the fixed ratios somewhat arbitrarily, based on our experience
  - A full architecture study is left as future work!
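The construction rule can be sketched as a function that takes the I/O counts fixed by the SoC, plus the designer's chosen number of wordblocks, and derives the remaining parameters. The ratio constants below are placeholders invented for illustration: the slide says the authors picked their ratios from experience but does not give the values. Treating the bit width N as fixed by the SoC bus is also an assumption.

```python
from dataclasses import dataclass

# Hypothetical ratios, NOT the values used in the talk.
MULTS_PER_WORDBLOCK = 0.25      # A / D
FEEDBACKS_PER_WORDBLOCK = 0.5   # F / D
CONSTS_PER_WORDBLOCK = 0.25     # C / D
PTERMS_PER_WORDBLOCK = 2.0      # P / D (control logic vs. datapath logic)


@dataclass(frozen=True)
class Architecture:
    D: int  # number of wordblocks (including multipliers)
    N: int  # bit width
    M: int  # number of input buses
    R: int  # number of output buses
    F: int  # number of feedback paths
    C: int  # number of constant registers
    A: int  # number of multipliers
    P: int  # number of product-term blocks


def build_architecture(d: int, n: int, m: int, r: int) -> Architecture:
    """Derive F, C, A and P from the wordblock count using fixed ratios.

    M and R (and, by assumption, N) come from the SoC; D is the designer's
    size estimate.
    """
    if d < 1:
        raise ValueError("need at least one wordblock")
    return Architecture(
        D=d, N=n, M=m, R=r,
        F=round(d * FEEDBACKS_PER_WORDBLOCK),
        C=round(d * CONSTS_PER_WORDBLOCK),
        A=min(d, round(d * MULTS_PER_WORDBLOCK)),  # multipliers count toward D
        P=max(1, round(d * PTERMS_PER_WORDBLOCK)),
    )


print(build_architecture(8, 16, 2, 1))
# Architecture(D=8, N=16, M=2, R=1, F=4, C=2, A=2, P=16)
```

Keeping the derivation in one function makes it easy to swap in different ratios later, which is what a full architecture study would explore.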

Area comparison (architectures built with fixed parameter ratios)
Benchmark  Datapath (ours)  Fine-Grain (PTerm)  ASIC    Fine-Grain/Datapath  Datapath/ASIC
fbly       332,091          132,339,335         9,300   399                  35.7
dotv3      225,518          65,534,780          6,575   291                  34.3
dscg       325,029          116,271,968         9,473   358                  34.3
fir4       307,154          130,971,120         9,843   426                  31.2
egcd       3,778,611        22,776,474          10,420  6.02                 363
momul      486,654          11,448,589          7,097   23.5                 68.5
median     194,654          10,733,962          4,420   55.1                 44
debug1     119,286          1,302,928           3,484   10.9                 34
Key result 1: Significantly better than the fine-grained architecture
Key result 2: Overhead is roughly the same as that of an FPGA relative to an ASIC

Conclusions
• Our architecture is 6 to 426 times more area-efficient than a fine-grained architecture
• But this holds only for datapath-oriented circuits
• However, this is acceptable:
  - In an SoC, we know when the chip is designed whether the inputs are buses or individual bits
  - If there are buses, use this architecture
  - If there are no buses, use Andy Yan’s product-term (PTerm) architecture
• Final thought: with this architecture, the overhead is similar to that of a normal FPGA, and people already accept that overhead!
