A Synthesizable Datapath-Oriented Programmable Logic Core
Steven J.E. Wilton, Chun Hok Ho, Philip Leong, Wayne Luk, Brad Quinton
University of British Columbia and Imperial College
Embedded Programmable Logic Cores • Embed a small amount of programmable logic onto an ASIC • Postpone some decisions until late in the design cycle • Fast upgrade path for products • Embedded Debug:
Soft Programmable Logic Cores • Advantages • Easy to integrate, reduces design time • Very flexible, can create the exact required core • Easy to migrate to smaller technologies • Disadvantages • Inefficient compared to hard cores • Our thought • A soft core makes sense if you only need a small core (a few hundred gates)
This talk: • A new architecture for a synthesizable programmable logic core that supports datapath (bus-based) circuits
Previous Synthesizable PLCs • Kim Bozman and Noha Kafafi: • LUT-Based • Unique Directional Routing Fabric
Synthesizable Cores • Observation 1: To make the core truly synthesizable, we must avoid combinational loops in the unprogrammed fabric • Observation 2: Each tile need not be identical
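Observation 1 can be checked mechanically by treating every connection the unprogrammed fabric could make as a directed edge and searching for a cycle. The sketch below is a minimal illustration using a depth-first search; the graph encoding and node names are hypothetical, registers are assumed to be left out of the graph, and this is not the authors' tool flow.

```python
from typing import Dict, List


def has_combinational_loop(fanout: Dict[str, List[str]]) -> bool:
    """Return True if the graph of possible combinational connections has a cycle.

    `fanout` maps each node (hypothetical names for logic or routing elements)
    to the nodes it can drive combinationally. Flip-flops are assumed to be
    omitted, since a path through a register is not a combinational loop.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {node: WHITE for node in fanout}
    for targets in fanout.values():
        for target in targets:
            color.setdefault(target, WHITE)

    def visit(node: str) -> bool:
        color[node] = GREY
        for nxt in fanout.get(node, []):
            if color[nxt] == GREY:  # back edge: a cycle exists
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(color))


# A switch that lets A drive B and B drive A forms a loop before programming.
print(has_combinational_loop({"A": ["B"], "B": ["A"]}))
# A directional fabric, where signals only flow forward, has none.
print(has_combinational_loop({"in": ["lb0"], "lb0": ["lb1"], "lb1": ["out"]}))
```

A directional routing fabric, as in the previous synthesizable PLCs, passes this check by construction.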
Previous Synthesizable PLCs • Andy Yan: • Product-term Based Logic Block • Unique Directional Routing Fabric • Supported Sequential Circuits
Our Architecture • Use it when the PLC is connected to a bus • Observation: These connections are permanently tied to the bus signals, and we know this when the ASIC is designed
Logic Architecture • Key point: - All bitblocks within a wordblock share the same set of configuration bits - This means all bitblocks in a wordblock implement the same function
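To make the shared-configuration idea concrete, here is a minimal sketch of one wordblock. It assumes, hypothetically, that each bitblock is a 3-input lookup table fed by bit i of three input buses; the slides do not specify the bitblock's internals. Because every bitblock reads the same 8-bit truth table, the whole N-bit wordblock needs only those 8 configuration bits in this model, instead of 8 × N.

```python
def wordblock(truth_table: int, a: int, b: int, c: int, width: int) -> int:
    """Evaluate a wordblock of `width` bitblocks that all share one truth table.

    Hypothetical model: each bitblock is a 3-input lookup table, and bit k of
    the shared 8-bit `truth_table` is the output for input pattern
    k = 4*a_i + 2*b_i + c_i, where a_i, b_i, c_i are bit i of the input buses.
    """
    result = 0
    for i in range(width):
        k = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((truth_table >> k) & 1) << i
    return result


XOR3 = 0x96    # a ^ b ^ c: patterns 1, 2, 4, 7 produce 1
AND_AB = 0xC0  # a & b (c ignored): patterns 6, 7 produce 1

print(bin(wordblock(XOR3, 0b1100, 0b1010, 0b0000, width=4)))    # 0b110
print(bin(wordblock(AND_AB, 0b1100, 0b1010, 0b1111, width=4)))  # 0b1000
```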
Routing Architecture • Key point: Signals are routed as buses
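Routing whole buses means one multiplexer select field steers all N bits at once. The sketch below counts configuration bits for one routing multiplexer under two assumptions: selects are binary-encoded, and the bit-level alternative chooses each bit independently from the same candidate buses (a real fine-grained fabric may be even more flexible). The function names are hypothetical.

```python
from typing import List


def route_bus(sources: List[int], select: int) -> int:
    """Bus-level routing: one select value picks an entire N-bit bus."""
    return sources[select]


def routing_config_bits(num_sources: int, width: int, bus_based: bool) -> int:
    """Configuration bits for one routing multiplexer of the given width.

    A bus-based mux needs one select field for the whole bus; a bit-level
    fabric needs an independent select field for every bit.
    """
    per_select = (num_sources - 1).bit_length()  # bits to encode 0..num_sources-1
    return per_select if bus_based else per_select * width


print(route_bus([0x1234, 0xABCD, 0x0F0F], select=1) == 0xABCD)       # True
print(routing_config_bits(num_sources=8, width=32, bus_based=True))   # 3
print(routing_config_bits(num_sources=8, width=32, bus_based=False))  # 96
```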
Routing Architecture • Key point: - Linear array of wordblocks • - Buses get wider as we go to the right
Routing Architecture • Key point: - Linear array of wordblocks • - Number of buses goes up as we go to the right
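One way to see why the number of buses grows to the right is to count what each wordblock can choose from. The sketch assumes, hypothetically, that the M primary input buses run along the whole array and that each wordblock contributes one output bus visible only to wordblocks on its right. Because every wordblock reads only from its left in this model, the array cannot form a combinational loop; feedback paths are not modeled here.

```python
from typing import List


def candidate_buses(num_input_buses: int, num_wordblocks: int) -> List[int]:
    """Number of buses available to each wordblock, left to right.

    Hypothetical model: the M primary input buses are visible everywhere, and
    each wordblock adds one output bus that only later wordblocks can use.
    """
    return [num_input_buses + i for i in range(num_wordblocks)]


print(candidate_buses(num_input_buses=2, num_wordblocks=4))  # [2, 3, 4, 5]
```

The rightmost wordblocks therefore need the widest routing multiplexers.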
Multipliers • Two inputs instead of three • Two output buses (MSB, LSB)
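A multiplier needs two output buses because the product of two N-bit values is 2N bits wide. The minimal sketch below assumes unsigned operands; the function name is hypothetical.

```python
from typing import Tuple


def multiplier(a: int, b: int, width: int) -> Tuple[int, int]:
    """Multiply two unsigned width-bit buses and return (msb_bus, lsb_bus).

    The full product is 2*width bits wide, so it is split across two
    width-bit output buses. Unsigned operands are an assumption.
    """
    mask = (1 << width) - 1
    product = (a & mask) * (b & mask)
    return product >> width, product & mask


print(multiplier(200, 100, width=8))  # (78, 32), since 200 * 100 = 0x4E20
```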
Add a Control Block • Control block is based on P-term fine-grained synthesizable core
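A product-term (P-term) block computes sum-of-products logic. The sketch below uses a generic PLA-style model, which is an assumption and not the exact structure of the core: each product term is an AND of literals, the output is the OR of the terms, and any signal missing from a term is a don't-care. The signal names are hypothetical.

```python
from typing import Dict, List

# A product term maps a signal name to its required value; signals that are
# absent from the term are don't-cares for that term.
ProductTerm = Dict[str, bool]


def eval_sum_of_products(terms: List[ProductTerm], inputs: Dict[str, bool]) -> bool:
    """OR of ANDs: True if every literal of at least one product term holds."""
    return any(all(inputs[name] == value for name, value in term.items())
               for term in terms)


# Hypothetical control signal: (match_a AND match_b) OR (NOT enable)
terms = [{"match_a": True, "match_b": True}, {"enable": False}]
print(eval_sum_of_products(terms, {"match_a": True, "match_b": False, "enable": True}))  # False
print(eval_sum_of_products(terms, {"match_a": True, "match_b": True, "enable": True}))   # True
```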
Example Mapping • Monitor two buses: - Count the number of times each bus matches a mask (the mask includes don't-care bits) - Count the number of times both buses match the mask at the same time
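The behavior being mapped can be written in software as follows. This is only a reference model of the monitor's function, not how the circuit is laid out on the wordblocks. It assumes the "mask" is a pattern plus a care mask (bits where the care mask is 0 are don't-cares) and that both buses are compared against the same mask; the function names are hypothetical.

```python
from typing import Iterable, Tuple


def matches(value: int, pattern: int, care_mask: int) -> bool:
    """True if value equals pattern on every bit where care_mask is 1."""
    return ((value ^ pattern) & care_mask) == 0


def monitor(samples: Iterable[Tuple[int, int]], pattern: int,
            care_mask: int) -> Tuple[int, int, int]:
    """Count cycles where bus A matches, bus B matches, and both match."""
    count_a = count_b = count_both = 0
    for a, b in samples:
        hit_a = matches(a, pattern, care_mask)
        hit_b = matches(b, pattern, care_mask)
        count_a += hit_a
        count_b += hit_b
        count_both += hit_a and hit_b
    return count_a, count_b, count_both


# Pattern 1010 with the least significant bit as a don't-care.
samples = [(0b1010, 0b1011), (0b1011, 0b0000), (0b0000, 0b1010)]
print(monitor(samples, pattern=0b1010, care_mask=0b1110))  # (2, 2, 1)
```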
Interesting Questions: 1. How do the various architectural parameters affect density? 2. How does this compare to a fine-grained architecture?
Architectural Parameters • D Number of Wordblocks (incl. multipliers) • N Bit Width • M Number of Input Buses • R Number of Output Buses • F Number of Feedback Paths • C Number of Constant Registers • A Number of Multipliers • P Number of Product-Term Blocks
Impact of the Number of Wordblocks and Bit-Width • Key result: Both bit-width and the number of wordblocks have a significant impact on area.
Impact of the Number of Multipliers • Key result: Adding multipliers increases area because of the additional buses in the routing
Impact of the Size of the Control Block • Key result: The control block can dominate if it becomes too big
Area comparison with the architectural parameters optimized separately for each benchmark:
Benchmark  Datapath (ours)  Fine-Grain (PTerm)  ASIC    Fine-Grain/Datapath  Datapath/ASIC
fbly       68,190           132,339,335         9,300   1940                 7.33
dotv3      34,119           65,534,780          6,575   1921                 5.19
dscg       72,178           116,271,968         9,473   1611                 7.62
fir4       76,213           130,971,120         9,843   1718                 7.74
egcd       1,225,231        22,776,474          10,420  18.6                 117
momul      294,135          11,448,589          7,097   38.9                 41
median     142,172          10,733,962          4,420   75.5                 32
debug1     87,265           1,302,928           3,484   14.9                 25
Key result 1: Significantly more area-efficient than the fine-grained architecture • Key result 2: The overhead relative to an ASIC is roughly the same as that of an FPGA relative to an ASIC
But these results aren’t fair: • For each benchmark, we found the optimum set of architectural parameters. • We need an architecture that works for a variety of circuits.
Architecture Construction • Our thought: - The number of inputs/outputs is fixed by the SoC - The designer has an idea of the size of the programmable logic (number of wordblocks) • Fix all other parameters as a function of the number of wordblocks - e.g., a fixed ratio between the number of multipliers and the number of wordblocks, a fixed ratio between control logic and datapath logic, etc. • We arbitrarily chose the fixed ratios based on our experience - A full architecture study is left as future work!
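The construction rule can be sketched as a function of the architectural parameters listed earlier (D, N, M, R, F, C, A, P). In this sketch M and R come from the SoC, D is the designer's size estimate, and N is assumed to be given by the SoC bus width. The ratio values in the usage line are placeholders, because the slides do not state the ratios the authors chose; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Architecture:
    D: int  # number of wordblocks (including multipliers)
    N: int  # bit width
    M: int  # number of input buses
    R: int  # number of output buses
    F: int  # number of feedback paths
    C: int  # number of constant registers
    A: int  # number of multipliers
    P: int  # number of product-term blocks


def build_architecture(D: int, N: int, M: int, R: int, *,
                       mult_ratio: float, feedback_ratio: float,
                       const_ratio: float, pterm_ratio: float) -> Architecture:
    """Derive F, C, A and P from D using fixed (placeholder) ratios."""
    def scaled(ratio: float) -> int:
        return round(ratio * D)

    return Architecture(D=D, N=N, M=M, R=R,
                        F=scaled(feedback_ratio), C=scaled(const_ratio),
                        A=scaled(mult_ratio), P=scaled(pterm_ratio))


arch = build_architecture(8, 16, 2, 1, mult_ratio=0.25, feedback_ratio=0.25,
                          const_ratio=0.125, pterm_ratio=0.5)
print(arch)  # Architecture(D=8, N=16, M=2, R=1, F=2, C=1, A=2, P=4)
```

Only D needs to be chosen by the designer; everything else follows from the SoC interface and the fixed ratios.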
Area comparison using a single architecture built with fixed parameter ratios:
Benchmark  Datapath (ours)  Fine-Grain (PTerm)  ASIC    Fine-Grain/Datapath  Datapath/ASIC
fbly       332,091          132,339,335         9,300   399                  35.7
dotv3      225,518          65,534,780          6,575   291                  34.3
dscg       325,029          116,271,968         9,473   358                  34.3
fir4       307,154          130,971,120         9,843   426                  31.2
egcd       3,778,611        22,776,474          10,420  6.02                 363
momul      486,654          11,448,589          7,097   23.5                 68.5
median     194,654          10,733,962          4,420   55.1                 44
debug1     119,286          1,302,928           3,484   10.9                 34
Key result 1: Significantly more area-efficient than the fine-grained architecture • Key result 2: The overhead relative to an ASIC is roughly the same as that of an FPGA relative to an ASIC
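The two ratio columns are derived from the three area columns: Fine-Grain/Datapath is the fine-grained area divided by the datapath area, and Datapath/ASIC is the datapath area divided by the ASIC area. The short script below recomputes them from the fixed-architecture numbers and reports the range of the fine-grain advantage; the variable names are illustrative only.

```python
from typing import Dict, Tuple

# Areas from the fixed-architecture table: (datapath core, fine-grained PTerm core, ASIC).
AREAS: Dict[str, Tuple[int, int, int]] = {
    "fbly": (332_091, 132_339_335, 9_300),
    "dotv3": (225_518, 65_534_780, 6_575),
    "dscg": (325_029, 116_271_968, 9_473),
    "fir4": (307_154, 130_971_120, 9_843),
    "egcd": (3_778_611, 22_776_474, 10_420),
    "momul": (486_654, 11_448_589, 7_097),
    "median": (194_654, 10_733_962, 4_420),
    "debug1": (119_286, 1_302_928, 3_484),
}


def ratios(areas: Dict[str, Tuple[int, int, int]]) -> Dict[str, Tuple[float, float]]:
    """Return (fine-grain/datapath, datapath/ASIC) for each benchmark."""
    return {name: (fine / dp, dp / asic) for name, (dp, fine, asic) in areas.items()}


r = ratios(AREAS)
gains = [fine_over_dp for fine_over_dp, _ in r.values()]
print(f"Fine-grain/datapath ranges from {min(gains):.1f}x to {max(gains):.1f}x")
# Fine-grain/datapath ranges from 6.0x to 426.4x
```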
Conclusions • Our architecture is 6x to 426x more area-efficient than a fine-grained architecture • But this holds only for datapath-oriented circuits • However, this is acceptable: - In an SoC, we know when the chip is designed whether the inputs are buses or individual bits - If there are buses, use this architecture - If there are no buses, use Andy Yan’s PTerm architecture • Final thought: using this architecture, the overhead is similar to that of a normal FPGA, and people already accept that overhead!