A Reconfigurable Signal Processing IC with embedded FPGA and Multi-Port Flash Memory
M. Borgatti, L. Calì, G. De Sandre, B. Forêt, D. Iezzi, F. Lertora, G. Muzzi, M. Pasotti, M. Poles, P.L. Rolandi
STMicroelectronics - Central R&D - Italy
Outline of Presentation
• Project motivation and background
• System architecture
  • Reconfigurable core
  • Memory subsystem
• System performance
  • Application example: embedded face recognition system
  • Energy efficiency, measurements
• SoC integration and design flow
  • System-to-RTL and RTL-to-layout
• Summary
Project motivation and background
• Conflicting industry trends
  • Economics of system integration
  • Even more complex SoC
  • More integration
  • Cost effectiveness and performance (per unit)
  • Increasing design complexity and risks
  • Increasing NREs
  • Shorter time-to-market and product life
• Strong need for:
  • Faster project turnaround
  • Lower risk
  • Usage of re-configurable silicon fabrics
Project motivation and background (continued)
• Pragmatic approach proposed:
  • Reconfigurable architecture
  • Joins a statically extensible processor with e-FPGA
  • Tight connection to Flash memory subsystem
  • Open architecture with flexible programmable I/O
  • Programmable platform approach
  • Simple model for programmers
Programmable Platform Approach
[Figure: layered diagram. Labels: System Applications Family; System Application; Application Compilation; Platform Compilation; Config. Proc + e-FPGA; Silicon process + Enabling technologies; Programmable platform.]
System Architecture
[Figure: block diagram. Labels: Extensible MPU with 8KB I$ and 8KB D$; 48 kB SRAM; bus bridge; 64 bit AHB BUS; M/S AHB I/F; DMA & FPGA Prog. I/F; e-FPGA; Instr. Ext. and Inst. Ext I/F; INTs; Flash Mem with FP, CP and DP ports; Buffer I/F; 1kB Buffer; AHB/APB Bridge; 64 bit APB BUS; GP I/O; I/O registers; General Purpose I/O Lines; I2C Master; I2C BUS.]
e-FPGA Purposes
• Processor ISA extensions
  • Simplest programmer’s model
  • Specific interface to the MPU datapath
  • Impact on processor performance
  • Impact on processor energy efficiency
  • Efficiency limited by instruction stream decoding
• Bus-mapped co-processor
  • Maximum benefits in speed/power
• Flexible I/O
e-FPGA – Microprocessor interface
[Figure: block diagram. Labels: e-FPGA Clock; Microprocessor clock; Clock Ctrl; Instruction; Decode; Pipe Control; Register File; Instruction extension; Result; Other FPGA Purposes.]
Flash Memory Architecture
[Figure: block diagram. Labels: four 2Mb modules (#0 to #3), each with a 128-bit connection to the 128-bit Memory Sub-System Crossbar; DFT; PMA; Power Block; 8-bit uP I/F; Data Port (DP, 64-bit); Code Port (CP, 64-bit); FPGA Port (FP, 32-bit).]
Flash Memory Subsystem
• Modular approach
  • Customizable array of N independent 2Mb modules
  • 3 content-specific ports (CP, DP, FP)
  • HW support for filesystem implementation (DP)
    • Defrag
    • Compression
    • Virtual erase
• 2Mb module features:
  • 128b I/O
  • 40ns access time (400MB/s peak throughput)
  • Power management and arbitration
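The 400MB/s peak figure follows directly from the 128-bit module width and the 40ns access time. A minimal sketch of that arithmetic, assuming one full-width access completes every access time and 1 MB = 10^6 bytes (the function name `peak_throughput_mb_s` is hypothetical, introduced only for illustration):

```python
def peak_throughput_mb_s(width_bits, access_time_ns):
    """Peak throughput in MB/s (1 MB = 10**6 bytes).

    Assumes one full-width access completes every access_time_ns.
    """
    bytes_per_access = width_bits / 8
    accesses_per_second = 1e9 / access_time_ns
    return bytes_per_access * accesses_per_second / 1e6


# One 2Mb module: 128-bit I/O, 40 ns access time (figures from the slide).
module_peak = peak_throughput_mb_s(128, 40)
print(module_peak)  # 400.0
```

Sixteen bytes every 40 ns is 25 million accesses per second, which gives the 400MB/s quoted for a single module.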
System Memory Hierarchy
• AHB peak throughput: 800MB/s
• e-FPGA: 400MB/s (50MB/s sustained)
• Total aggregate peak: 1.2GB/s
[Figure: memory hierarchy diagram. Labels: 32-bit uP RegisterFile; AHB Bridge; 64-bit AHB Bus; DMA; 32-bit FPGA PI/F; 64-bit CP I/F; 64-bit DP I/F; 512-B Buffer; 2 x 64-bit + 1 x 32-bit Memory Port I/Fs (64-bit Port CP, 64-bit Port DP, 32-bit Port FP); 6x4 128-bit Crossbar; 4 x Flash Memory Controller Logic; 4 x 16384 x 128-bit Memory Module.]
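One reading that reproduces the quoted peak figures is a 64-bit AHB bus and a 32-bit FPGA port, each moving one word per clock at 100 MHz. The clock value is an assumption taken from the "Processing Time @ 100 MHz" slide, and the one-transfer-per-cycle model and the name `bus_peak_mb_s` are illustrative, not from the source:

```python
def bus_peak_mb_s(width_bits, clock_mhz):
    """Peak MB/s for a bus moving one full-width word per clock cycle.

    Assumption: no wait states or arbitration overhead (peak only).
    """
    return width_bits / 8 * clock_mhz


CLOCK_MHZ = 100  # assumed system clock, not stated on this slide

ahb_peak = bus_peak_mb_s(64, CLOCK_MHZ)   # 64-bit AHB bus
fpga_peak = bus_peak_mb_s(32, CLOCK_MHZ)  # 32-bit FPGA port
total_peak = ahb_peak + fpga_peak

print(ahb_peak, fpga_peak, total_peak)  # 800.0 400.0 1200.0
```

Under these assumptions the sum is 1200 MB/s, matching the 1.2GB/s total aggregate peak. The 50MB/s sustained e-FPGA figure is a measured value and is not derived here.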
Application Ex.: Face Recognition
• Target application:
  • Recognize a face out of twenty
  • Low-resolution images from CMOS cameras
• Potential applications:
  • Low cost smart toys
  • Advanced human-machine interfaces
  • Color CMOS camera processors
• Image preprocessing: Bayer filter
• Face location: based on Hough transform
• Face recognition: Line-Based
  • Recognition rates over 90 %
  • Scale-invariant
  • Tolerant to changes in illumination intensity
Processor Extension (I)
• 8-issue, 8-bit L2 distance
• Complexity:
  • 23 8-bit OPS
  • 6 64-bit OPS
• 1GOPS peak throughput
• Distance computation
• 10k equiv. ASIC gates
• Mapped to e-FPGA
[Figure: datapath diagram. Labels: Processor Load Unit; two 4-segm. blocks; ‘8’ and ’16’ width markers; operators -, x, +; 64-bit register; Result.]
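As a software reference for what an 8-lane, 8-bit distance instruction computes, the sketch below evaluates a squared L2 distance over two 64-bit words that each pack eight unsigned 8-bit values. The lane layout, the unsigned interpretation, the choice to return the squared distance (leaving the root to a separate kernel), and the names `unpack_u8x8` and `squared_l2_u8x8` are all assumptions for illustration, not the authors' datapath:

```python
def unpack_u8x8(word):
    """Split a 64-bit word into eight unsigned 8-bit lanes (lane 0 = LSB)."""
    return [(word >> (8 * i)) & 0xFF for i in range(8)]


def squared_l2_u8x8(a, b):
    """Sum of squared lane differences between two packed 64-bit words."""
    lanes_a = unpack_u8x8(a)
    lanes_b = unpack_u8x8(b)
    # Per lane: subtract, square (multiply), then accumulate.
    return sum((x - y) ** 2 for x, y in zip(lanes_a, lanes_b))


# Lanes 8,7,...,1 against all zeros: 1^2 + 2^2 + ... + 8^2 = 204.
print(squared_l2_u8x8(0x0102030405060708, 0))  # 204
```

In hardware all eight lanes would be processed in parallel in one instruction; the loop here only models the result, not the timing.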
Processor Extension (II)
• Fixed-point square root kernel
• Complexity:
  • 12 32-bit OPS
• 2k equiv. ASIC gates
• Mapped to e-FPGA
[Figure: datapath diagram. Labels: Number; Remaind.; root; operators +1, >>1, <<2, >>30, >>2, +, -, >, +2, <<1; Result.]
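The shift, compare and subtract operators in the figure are typical of a digit-by-digit (restoring) square root. The sketch below is the textbook form of that algorithm for a 32-bit unsigned input, offered as a plausible software model rather than the authors' exact kernel; the name `isqrt32` is hypothetical:

```python
def isqrt32(n):
    """Floor square root of a 32-bit unsigned integer, two bits per step."""
    if not 0 <= n < (1 << 32):
        raise ValueError("n must fit in 32 unsigned bits")
    root = 0
    rem = 0
    for _ in range(16):
        # Bring the next two most significant bits of n into the remainder.
        rem = (rem << 2) | (n >> 30)
        n = (n << 2) & 0xFFFFFFFF
        # Make room for the next root bit and form the trial subtrahend,
        # which equals 4r + 1 for the previous partial root r.
        root <<= 1
        trial = (root << 1) | 1
        if rem >= trial:
            rem -= trial
            root |= 1
    return root


print(isqrt32(9))  # 3
```

For a fixed-point input with an even number F of fractional bits, the integer result can be read as a value with F/2 fractional bits.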
Performance: Processing Time @ 100 MHz
Energy Efficiency vs. Flexibility
[Figure: energy efficiency (MOPS/mW, log axis with ticks 0.1, 1, 10, 100, 1000) plotted against flexibility (coverage). Labeled regions: Dedicated HW; FPGA-mapped CoProcessors; uP + FPGA Instructions; ASIPs, DSPs; Embedded Processors. An "Energy-Flexibility Gap" is marked. From: Zhang et al., ISSCC 2000.]
Performance: Energy Efficiency
[Figure: design flow diagram (system to RTL). Labels: Functional model (untimed); Partitioning / I/F Synthesis / Refinement; uP ISS; Cycle Accurate Simulation; Performance Analysis; Libraries; HW/SW; VHDL (e-FPGA) Inst. Ext.; Verilog HW (RTL): uP, AHB/APB Bus, Peripherals; C: SW Apps; Soft Hardware (eFPGA); eFPGA mapping; eFPGA HARD MACRO; SoC Integration.]
[Figure: design flow diagram (RTL to layout). Labels: CPU core, IPs, Interface RTL code; Flash; RAM; eFPGA core; Inst. Ext., Coproc., I/O I/F; Synthesis; Floorplanning / P&R; Static Timing Analysis, Dynamic Verification; Con. Mapping (P&R); Netlist + Timing Database; FPGA Timing DB; Bit-stream; Static Timing Analysis (SoC + eFPGA); Silicon fab.]
Chip Layout
[Figure: chip layout. Labels: 1MB FLASH Memory; DFT; Flash Ports; Buffers; Embedded FPGA; 32b uP + AHB & APB + 250k GATES; 8+8 KB I$ + D$; TAGS; 48 KB SRAM BUFFER.]
Summary
• e-FPGAs allow architectural tradeoffs for reconfigurable embedded systems:
  • Processor ISA extensions
  • Bus-mapped co-processor
  • Flexible I/O
• Modular, content-specific, multiport e-Flash
• Performance figures:
  • Up to 10x speedup
  • Up to 9x energy reduction
  • Dynamic reconfiguration in 500 us
• Specific design-flow for system and RTL
Acknowledgements: The authors thank all the colleagues of the NVM-DP Dept., A. Maurelli, F. Piazza and L. Fumagalli.