Embedded Memory Wrapper Generation for Multi-processor SoC Design
F. Gharsalli, S. Meftali, F. Rousseau, A.A. Jerraya
TIMA laboratory, 46 avenue Felix Viallet, 38031 Grenoble Cedex, France
Memory for SoC
[Figure: DSP, CPU and IP blocks (N processors) and SRAM and FLASH memories connected by a communication network; the memory-to-network adaptation is first shown as glue logic, then as wrappers]
• SoC: a single chip
• Heterogeneous components (CPU, IP, …)
• Application-specific architecture
• Integration of standard memory IP
• Adaptation of memory protocols to the specific network
Outline
• Introduction
  • Memory IP based design
  • Memory integration issues
• Architectural Models and Basic Concepts
• Memory Wrapper
  • Generic architecture
  • Automatic generation
• Experiments
• Conclusion
Memory IP based design
• Steadily increasing memory capacity
• Memory-reuse-based design to close the gap between capacity and productivity
MEMORY INTERFACE DESIGN IS A DOMINANT PROBLEM
Memory integration issues
• Complex system design
  • Heterogeneous components: several logical ports and specific communication protocols
  • Standard memory components: limited physical ports and standard access protocols
• Large memory design space exploration
  • Different memory characteristics (type, size, power consumption)
• Multi-master SoC
  • Parallel accesses to the global memory
Memory integration issues
• Complex system design: PORT ADAPTATION is needed
• Large memory design space exploration: WRAPPER FLEXIBILITY is required
• Multi-master SoC: SOPHISTICATED SYNCHRONIZATION MECHANISMS are required
Related work
• Port adaptation: CoWare, Polis, Cadence (VCC)
• Wrapper flexibility: Marie Curie, COSY
• Synchronization mechanisms: fixed priority (PalmChip), TDMA and round-robin (Sonics)
None of the existing strategies fully addresses the memory IP integration problems described above.
Our contributions
• Generic memory wrapper architecture
  • Port adaptation
  • Memory flexibility
  • Arbitration between parallel memory accesses
• Automatic generation of the memory wrapper by assembling library components
Architectural models
[Figure: a virtual architecture (modules M1, M2 and MEMORY connected by abstract channels) refined into a micro-architecture (module implementations with an OS and wrappers, connected to MEMORY through the physical communication network)]
• Virtual architecture model
  • Abstract modules (virtual modules)
  • Abstract channels
  • Implicit communication procedures
  • Wrapper specification but no implementation
• Micro-architecture model
  • Module implementations
  • Physical communication network
  • Explicit communication procedures
  • HW wrapper implementation and synthesis
Basic concepts: virtual module
[Figure: a memory IP inside a wrapper; external ports (logical ports) connect to Channel 1 and Channel 2, internal ports (physical memory ports) connect to the memory; together they form a virtual port]
• Separation between behavior and communication interface
  • Memory access must be independent of the memory type
• Hiding the abstraction level of the memory description
  • Memory integration must be independent of these abstraction levels
• Logical and physical accesses
  • To adapt these accesses, we use a wrapper
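The virtual-module idea can be illustrated with a minimal plain C++ sketch (the original work uses SystemC; this is not its code). All names here (`PhysicalMemory`, `SimpleSram`, `LogicalPort`, `copyBlock`) are hypothetical: the behavior only talks to logical ports bound to channels, so the memory model behind them can be swapped without touching the behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Physical side: the interface a memory port adapter might present to the
// wrapper (hypothetical; the slides do not define this API).
class PhysicalMemory {
public:
    virtual ~PhysicalMemory() = default;
    virtual void write(std::uint32_t addr, std::uint32_t data) = 0;
    virtual std::uint32_t read(std::uint32_t addr) = 0;
};

// Trivial word-addressed model standing in for a memory IP.
class SimpleSram : public PhysicalMemory {
public:
    explicit SimpleSram(std::size_t words) : cells_(words, 0) {}
    void write(std::uint32_t addr, std::uint32_t data) override {
        assert(addr < cells_.size());
        cells_[addr] = data;
    }
    std::uint32_t read(std::uint32_t addr) override {
        assert(addr < cells_.size());
        return cells_[addr];
    }

private:
    std::vector<std::uint32_t> cells_;
};

// Logical (external) port bound to one channel: behavior sees only offsets
// within its channel, never the memory type or its access protocol.
class LogicalPort {
public:
    LogicalPort(PhysicalMemory& mem, std::uint32_t base) : mem_(mem), base_(base) {}
    void put(std::uint32_t offset, std::uint32_t value) { mem_.write(base_ + offset, value); }
    std::uint32_t get(std::uint32_t offset) { return mem_.read(base_ + offset); }

private:
    PhysicalMemory& mem_;
    std::uint32_t base_;
};

// Behavior written only against logical ports; it stays unchanged if the
// PhysicalMemory behind the ports is replaced.
inline void copyBlock(LogicalPort& src, LogicalPort& dst, std::uint32_t n) {
    for (std::uint32_t i = 0; i < n; ++i) dst.put(i, src.get(i));
}
```

The design choice being illustrated is plain interface separation: replacing `SimpleSram` with another `PhysicalMemory` model changes nothing in `copyBlock`.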
Memory wrapper architecture
[Figure: channel adapters CA1, CA2 and CA3 connect the communication network channels to an internal bus (IB) controlled by an arbiter; the memory port adapter (MPA) connects the IB to the memory IP over the memory bus]
• Generic wrapper architecture
  • Memory-dependent part
    • Memory port adapter (MPA)
  • Communication-dependent part
    • Channel adapter (CA)
    • Internal bus (IB): address, data and control
    • Arbiter
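The slides do not state which policy the wrapper's arbiter implements. Purely as an illustration, assuming a round-robin policy (one of the schemes listed under related work), an arbiter that grants the internal bus to one channel adapter per cycle could look like this; `RoundRobinArbiter` and its interface are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical round-robin arbiter for the wrapper's internal bus (IB).
// Each channel adapter (CA) raises a request; at most one grant per cycle.
class RoundRobinArbiter {
public:
    explicit RoundRobinArbiter(std::size_t numChannels)
        : n_(numChannels), last_(numChannels - 1) {
        assert(numChannels > 0);
    }

    // requests[i] is true when CA i wants the internal bus this cycle.
    // Returns the granted CA index, or nullopt when no CA is requesting.
    std::optional<std::size_t> grant(const std::vector<bool>& requests) {
        assert(requests.size() == n_);
        // Start searching just after the previous winner so that every
        // requesting CA is eventually served.
        for (std::size_t k = 1; k <= n_; ++k) {
            const std::size_t i = (last_ + k) % n_;
            if (requests[i]) {
                last_ = i;
                return i;
            }
        }
        return std::nullopt;
    }

private:
    std::size_t n_;
    std::size_t last_;  // index of the most recently granted CA
};
```

Round-robin is chosen here only because it is simple and starvation-free; a fixed-priority or TDMA arbiter would fit the same slot in the generic architecture.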
Flexibility of the memory architecture
[Figure: two memory wrappers with the same channel adapters CA1, CA2, CA3, internal bus and arbiter; one uses a single MPA for a single-port memory IP on one memory bus, the other uses MPA1 and MPA2 for a dual-port memory IP on two memory buses]
• We change only the Memory Port Adapter part
• Flexible memory wrapper architecture for a large design space exploration
• Flexibility is ensured by generic and modular models
  • CA: customized with communication-network-specific parameters
  • MPA: customized with memory-specific parameters
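One way to picture "change only the MPA" is a wrapper template whose memory-dependent part is a type parameter. This sketch assumes an MPA only needs to report its number of physical ports and that channel i is served by port i mod (number of ports); `SinglePortMpa`, `DualPortMpa` and `MemoryWrapper` are hypothetical names, and real MPAs would also carry protocol and timing parameters.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical memory port adapters: in this sketch an MPA only reports how
// many physical ports its memory IP offers.
struct SinglePortMpa {
    static constexpr std::size_t kPorts = 1;
};
struct DualPortMpa {
    static constexpr std::size_t kPorts = 2;
};

// Generic wrapper: the channel-adapter side is shared; only the MPA type changes.
template <typename Mpa, std::size_t NumChannels>
struct MemoryWrapper {
    static_assert(NumChannels > 0, "need at least one channel adapter");

    // Physical port serving a logical channel: channel i goes to port i mod kPorts.
    static std::size_t portFor(std::size_t channel) {
        assert(channel < NumChannels);
        return channel % Mpa::kPorts;
    }

    // Arbitration is needed only when channels outnumber physical ports.
    static constexpr bool needsArbiter() { return NumChannels > Mpa::kPorts; }
};
```

Under these assumptions, three channels on a single-port memory need the arbiter, while two channels on a dual-port memory get one port each and need none.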
Memory wrapper generation flow
[Figure: the virtual architecture annotated with parameters, the memory IP library and the CA/MPA library feed wrapper generation, which produces the micro-architecture]
• Wrapper generation
  • Input:
    • Memory IP library
    • Wrapper components library (CA, MPA)
    • Architectural parameters: number of ports, channels, protocols
  • Action:
    • Customizing the generic CA and MPA from the library using the architectural parameters
    • Instantiating the customized CA and MPA
    • Interconnecting them with the rest of the system
  • Output:
    • Micro-architecture
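The flow can be sketched as a function that looks up generic components by key, customizes them with the architectural parameters, and emits a textual netlist standing in for the micro-architecture. Everything here is an assumption for illustration: the parameter fields, the library keys (such as `"fifo"` or `"sdram"`), the component names and the netlist format are hypothetical, and the wrappers reported in the results are SystemC code, not text like this.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical architectural parameters annotated on the virtual architecture.
struct ChannelParams {
    std::string protocol;  // key into the CA library
    unsigned width;        // channel data width in bits
};
struct WrapperParams {
    std::string memoryType;  // key into the MPA library
    unsigned memoryPorts;    // physical ports of the memory IP
    std::vector<ChannelParams> channels;
};

// Hypothetical component library: generic component name per key.
using Library = std::map<std::string, std::string>;

// Customize and instantiate one CA per channel, add an arbiter if channels
// must share ports, attach the MPA, and return a textual netlist that stands
// in for the generated micro-architecture.
inline std::string generateWrapper(const WrapperParams& p, const Library& caLib,
                                   const Library& mpaLib) {
    assert(!p.channels.empty() && p.memoryPorts > 0);
    std::ostringstream out;
    for (std::size_t i = 0; i < p.channels.size(); ++i) {
        const ChannelParams& c = p.channels[i];
        const auto ca = caLib.find(c.protocol);
        if (ca == caLib.end()) throw std::runtime_error("no CA for " + c.protocol);
        out << "CA" << i + 1 << ": " << ca->second << "<" << c.width << "> -> IB\n";
    }
    if (p.channels.size() > p.memoryPorts) out << "arbiter -> IB\n";
    const auto mpa = mpaLib.find(p.memoryType);
    if (mpa == mpaLib.end()) throw std::runtime_error("no MPA for " + p.memoryType);
    out << "IB -> " << mpa->second << "<ports=" << p.memoryPorts << "> -> memory bus\n";
    return out.str();
}
```

Keeping the libraries as lookup tables mirrors the slide's point that generation is assembly of existing components rather than synthesis from scratch.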
Image filtering process
[Figure: input image and output image of the filtering process]
Experiments
• Low-level image processing for a digital camera
• The initial specification is
  • Memory rich (2 Mbytes Flash, 2 Mbytes ROM, 256 Kbytes SRAM)
  • Processor poor (only one 8-bit RISC processor)
• Acceleration by adding another processor
  • We use 2 ARM7 processors
  • 1 global memory
  • Point-to-point communication network
• Two experiments demonstrate the memory flexibility provided by the wrapper
  • Experiment 1: using a dual-port SRAM
  • Experiment 2: using a single-port SDRAM
Experiment 1: Dual-port memory
[Figure: modules M1 and M2 (T1 to T4) communicate over logical channels; each module is implemented on an ARM7 ISS with a CPU wrapper. In the memory wrapper, channel adapters CA1 and CA2 (AFIFO + buffer) drive internal buses IB1 (32) and IB2 (32), each served by its own SRAM MPA connected to one port of the dual-port SRAM through 32-bit memory buses.]
• MPA services
  • Test
  • Address decoding
  • Access mode
    • Burst mode
    • Burst sequence (4 words)
  • Bank control
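Two of the SRAM MPA services, address decoding and the 4-word burst sequence, reduce to simple address arithmetic. This sketch assumes 4-byte words, banks selected by the upper word-address bits, and a linear (incrementing) burst; the slides give neither the actual SRAM address map nor the burst ordering, and `decode`/`burstSeq4` are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical decoded form of a 32-bit byte address for a banked SRAM.
struct Decoded {
    std::uint32_t bank;
    std::uint32_t word;  // word index inside the bank
};

// Assumption: 4-byte words; each bank holds `wordsPerBank` words (a power of two),
// and banks are selected by the word-address bits above the in-bank index.
inline Decoded decode(std::uint32_t byteAddr, std::uint32_t wordsPerBank) {
    assert(wordsPerBank != 0 && (wordsPerBank & (wordsPerBank - 1)) == 0);
    const std::uint32_t wordAddr = byteAddr >> 2;  // drop the byte offset
    return {wordAddr / wordsPerBank, wordAddr % wordsPerBank};
}

// Assumption: a 4-word burst is four consecutive word addresses starting at
// the word-aligned request address (no wrap-around).
inline std::array<std::uint32_t, 4> burstSeq4(std::uint32_t byteAddr) {
    const std::uint32_t first = byteAddr & ~std::uint32_t{3};
    return {first, first + 4, first + 8, first + 12};
}
```

For example, with 1024 words per bank, byte address 0x1008 decodes to bank 1, word 2.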
Experiment 2: Single-port memory
[Figure: the same modules M1 and M2 (T1 to T4), ARM7 ISS implementations, CPU wrappers and channel adapters CA1 and CA2 (AFIFO + buffer); in the memory wrapper an arbiter shares a single 32-bit internal bus (IB) between the two CAs, and an SDRAM MPA connects the IB to the single-port SDRAM through a 16-bit memory bus]
• MPA services
  • Test
  • Address decoding
  • Access mode
    • Classic R/W mode
  • Bank control
  • Initialization
  • Refresh
  • 16-bit <-> 32-bit data conversion
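The 16 <-> 32-bit conversion bridges the 32-bit internal bus and the 16-bit SDRAM memory bus: each 32-bit word becomes two half-word transfers. The half-word order (low half first) and the half-word addressing are assumptions, and `split32`, `join16` and `convertWrite` are hypothetical helpers, not the SDRAM MPA's actual interface.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Split a 32-bit internal-bus word into two 16-bit memory-bus half-words.
// Assumption: the low half-word is transferred first.
inline std::array<std::uint16_t, 2> split32(std::uint32_t word) {
    return {static_cast<std::uint16_t>(word & 0xFFFFu),
            static_cast<std::uint16_t>(word >> 16)};
}

// Rebuild a 32-bit word from two half-words read over the 16-bit bus.
inline std::uint32_t join16(std::uint16_t low, std::uint16_t high) {
    return static_cast<std::uint32_t>(low) | (static_cast<std::uint32_t>(high) << 16);
}

// Hypothetical half-word write request issued on the 16-bit memory bus.
struct HalfWrite {
    std::uint32_t addr;  // half-word address in the SDRAM
    std::uint16_t data;
};

// One 32-bit write becomes two 16-bit writes at consecutive half-word addresses.
inline std::array<HalfWrite, 2> convertWrite(std::uint32_t wordAddr, std::uint32_t data) {
    assert(wordAddr < 0x80000000u);  // keep 2 * wordAddr + 1 within 32 bits
    const std::array<std::uint16_t, 2> h = split32(data);
    return {HalfWrite{2 * wordAddr, h[0]}, HalfWrite{2 * wordAddr + 1, h[1]}};
}
```

Because every 32-bit access costs two memory-bus transfers in this model, the conversion is one plausible contributor to the higher cycle count of the single-port configuration.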
Results
• SystemC code size of the memory wrapper
  • Experiment 1: 1438 lines
  • Experiment 2: 1335 lines
• Latency (excluding memory latency)
  • Write: 3 CPU cycles
  • Read: 7 CPU cycles (send/receive)
• Simulation of a 387 x 222 image
  • Experiment 1: 2.05 million CPU cycles
  • Experiment 2: 2.97 million CPU cycles
• Fast design exploration with different memories thanks to automatic memory wrapper generation
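The simulation figures can be normalized per pixel, assuming that 387 x 222 counts pixels and using the rounded cycle counts given above:

```latex
\begin{aligned}
N &= 387 \times 222 = 85\,914 \text{ pixels} \\
\text{Experiment 1: } \frac{2.05 \times 10^{6}}{85\,914} &\approx 23.9 \text{ cycles/pixel} \\
\text{Experiment 2: } \frac{2.97 \times 10^{6}}{85\,914} &\approx 34.6 \text{ cycles/pixel} \\
\frac{2.97}{2.05} &\approx 1.45
\end{aligned}
```

Under these assumptions, the single-port SDRAM configuration needs roughly 45% more CPU cycles than the dual-port SRAM configuration for the same image.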
Conclusion
• Systematic method to integrate memory IP into multiprocessor SoC architectures at system level
• Generic memory wrapper architecture
  • Port adaptation
  • Flexibility of the memory architecture
  • Arbitration of parallel accesses
• Automatic memory wrapper generation by assembling library components
  • Fast memory design exploration
• Application to low-level image processing
Perspectives
• Generalization of the IP wrapper architecture based on the generic wrapper model
• Use of more sophisticated communication networks, such as the AMBA bus and packet-switched networks
• Configurable memory test bench