SystemC Simulation Based Memory Controller Optimization

SystemC Simulation Based Memory Controller Optimization Primary Author: Ashutosh Pandey Secondary Author(s): Nitin Gupta, Amit Garg Presenter: Ashutosh Pandey Company/Organization: Synopsys

Agenda • Background • Challenges – System Level • Memory Controller Architecture – An Example • Optimization & Configuration • Requirements • Methodology • A case Study • Conclusion

Background • SDRAM controller ‘s are an integral piece of today’s System on Chip (SoC) • SDRAM access performance is one of the primary bottleneck • Memory Controller is responsible for optimizing SDRAM accesses • Across the system • Optimizing JEDEC interface utilization

Challenges – System Level • Early design-space and architecture exploration • System level optimization for targeted use cases • Interconnect configuration • Memory hierarchy (Buffers/caches/ on-chip / off-chip memories) • Memory architecture optimization • Meeting bandwidth/latency requirements for each application / master in the system • System level architecture and design for the targeted use cases and applications • SDRAM hardware architecture optimization

Memory Controller Architecture – An Example AXI Port Scheduler Port Arbiter AXI IF Request Multiplexer SDRAM Memory Access Controller Command Queue AXI Port RdData / WrRspDemux AXI IF AXI IF JEDECIF AXI IF A Sample Memory Controller Programmable interface

Optimization & Configuration - Requirements • System level visibility (end to end latency/throughput) • Memory access co-relation with system traffic • Visualization and analysis of memory interface activity • Root cause analysis for various bottlenecks / limitations • SDRAM architecture exploration

Methodology • Specify system constraints like latency, throughput or utilization • Simulate and analyze constraint violations • Analyze system characteristics to identify bottleneck(s) • Investigate to identify the root cause of the problem • Re-configure the system to address bottleneck(s) • Re-run / re-analyze refined configuration till constraints are satisfied

Memory Controller Optimization – A case study CORE0 SDRAM INTERFACE AXI Bus MEMORY CONTROLLER SDRAM (DDR3) CORE1 AXI PORT INTERFACE (XPI) ARBITER SCHEDULER • Objective: • Optimize memory controller to achieve desired latencies for CORE0 • Optimization on throughput & Utilization is also possible

System Level – Latency Analysis Cumulative average duration de-composed per component Average Duration for Read Transaction

System Level – Latency Analysis CORE0 memory access latency Interconnect latencies • Analysis Result For Round-Robin Arbitration Scheme: • Average Delay for CORE0 transactions in Memory Controller Arbiter is 262ns. But Arbiter alone is causing a delay of 100ns • Average SDRAM access delay for CORE0 is 72ns. Delay in different components of Memory Controller for transactions from CORE0 Delays for memory access for CORE0

Priority Based Arbitration for Memory Controller Arbiter CORE0 memory access latency reduces from 428ns to ~310ns. • Result For Priority based Arbitration Scheme:- • Average Delay for CORE0 transactions reduces from 428ns to 310ns. • Average SDRAM access delay for CORE0 is ~68ns while it was 72ns previously. But it is still 22% of the total Latency. Delay in different components of Memory Controller for transactions from CORE0

Memory Channel Utilization Analysis for CORE0 Detailed Memory Channel Utilization for the entire system COMMAND and DATA PHASE utilization for CORE0 only HIT = 8.4 % MISS = 12.315 % For optimum architecture HIT % >> MISS %

Initial Inferences • A large percentage of accesses are resulting in page misses, causing: • Increased access latency, • In-efficient usage of JEDEC interface and • Higher power consumption due to increased “precharges” and “activates” • Possible reasons for inefficient system could be • Mapping of application addresses to memory addresses • Page policy • Page crossovers • Rank crossovers

Memory Channel Utilization Analysis for CORE0 Maximum MISSES are due to transaction in same Bank but different rows COMMAND and DATA PHASE utilization for Core_0 with CMD_PHASE divided into CMD setup due to different reasons for MISS.

Refined Inferences • The reason for almost all Page MISS in this system is transaction on same Bank but different Page • Possible solution for resolving this • Change in traffic (not always possible) • Use a memory with bigger page sizes

Increasing Page Size from 1KB to 2KB Command Setup phase drastically reduces from 41.262% to 15% Increased page size results in desired performance Zero Page Miss. All memory accesses result in Page Hit.

Effect of increasing Page Size on overall Delay CORE0 memory access latency reduces from 310ns to ~222ns. Delays for memory access for CORE0 reduces from 68ns to 52 ns.

Conclusions • System level performance analysis allows detection of performance problems • Detailed data path visibility allows identification of hot-spots, e.g: • Arbitration scheme & • Hit/Miss ratio of SDRAM • Analyzing the hot-spots allows identification of root causes • Systematic refinement allows creation of optimum architecture for targeted use cases

Q&A

SystemC Simulation Based Memory Controller Optimization

SystemC Simulation Based Memory Controller Optimization

Presentation Transcript

Simulation optimization

MEMORY OPTIMIZATION

HW/SW SystemC Co-Simulation SoC Validation Platform

Optimization, simulation and integration

SystemC

Fuzzy Controller Tuning Using Bioegeography-Based Optimization

Memory/Cache Optimization

Simulation-based Optimization for Supply Chain Design

SIS100 simulation (Geometry optimization)

Flash Memory project | CONTROLLER FOR flash memory

A SystemC-based methodology for the simulation of dynamically reconfigurable embedded systems

Memory Controller V2

The Memory Controller

MODELING, SIMULATION AND OPTIMIZATION

PINTOS : An Execution Phase Based Optimization and Simulation Tool )

SystemC

Cache (Memory) Performance Optimization

MEMORY OPTIMIZATION

SystemC

Memory Controller V2