190 likes | 394 Views
SystemC Simulation Based Memory Controller Optimization . Primary Author: Ashutosh Pandey Secondary Author(s): Nitin Gupta, Amit Garg Presenter: Ashutosh Pandey Company/Organization: Synopsys . Agenda. Background Challenges – System Level Memory Controller Architecture – An Example
E N D
SystemC Simulation Based Memory Controller Optimization Primary Author: Ashutosh Pandey Secondary Author(s): Nitin Gupta, Amit Garg Presenter: Ashutosh Pandey Company/Organization: Synopsys
Agenda • Background • Challenges – System Level • Memory Controller Architecture – An Example • Optimization & Configuration • Requirements • Methodology • A case Study • Conclusion
Background • SDRAM controller ‘s are an integral piece of today’s System on Chip (SoC) • SDRAM access performance is one of the primary bottleneck • Memory Controller is responsible for optimizing SDRAM accesses • Across the system • Optimizing JEDEC interface utilization
Challenges – System Level • Early design-space and architecture exploration • System level optimization for targeted use cases • Interconnect configuration • Memory hierarchy (Buffers/caches/ on-chip / off-chip memories) • Memory architecture optimization • Meeting bandwidth/latency requirements for each application / master in the system • System level architecture and design for the targeted use cases and applications • SDRAM hardware architecture optimization
Memory Controller Architecture – An Example AXI Port Scheduler Port Arbiter AXI IF Request Multiplexer SDRAM Memory Access Controller Command Queue AXI Port RdData / WrRspDemux AXI IF AXI IF JEDECIF AXI IF A Sample Memory Controller Programmable interface
Optimization & Configuration - Requirements • System level visibility (end to end latency/throughput) • Memory access co-relation with system traffic • Visualization and analysis of memory interface activity • Root cause analysis for various bottlenecks / limitations • SDRAM architecture exploration
Methodology • Specify system constraints like latency, throughput or utilization • Simulate and analyze constraint violations • Analyze system characteristics to identify bottleneck(s) • Investigate to identify the root cause of the problem • Re-configure the system to address bottleneck(s) • Re-run / re-analyze refined configuration till constraints are satisfied
Memory Controller Optimization – A case study CORE0 SDRAM INTERFACE AXI Bus MEMORY CONTROLLER SDRAM (DDR3) CORE1 AXI PORT INTERFACE (XPI) ARBITER SCHEDULER • Objective: • Optimize memory controller to achieve desired latencies for CORE0 • Optimization on throughput & Utilization is also possible
System Level – Latency Analysis Cumulative average duration de-composed per component Average Duration for Read Transaction
System Level – Latency Analysis CORE0 memory access latency Interconnect latencies • Analysis Result For Round-Robin Arbitration Scheme: • Average Delay for CORE0 transactions in Memory Controller Arbiter is 262ns. But Arbiter alone is causing a delay of 100ns • Average SDRAM access delay for CORE0 is 72ns. Delay in different components of Memory Controller for transactions from CORE0 Delays for memory access for CORE0
Priority Based Arbitration for Memory Controller Arbiter CORE0 memory access latency reduces from 428ns to ~310ns. • Result For Priority based Arbitration Scheme:- • Average Delay for CORE0 transactions reduces from 428ns to 310ns. • Average SDRAM access delay for CORE0 is ~68ns while it was 72ns previously. But it is still 22% of the total Latency. Delay in different components of Memory Controller for transactions from CORE0
Memory Channel Utilization Analysis for CORE0 Detailed Memory Channel Utilization for the entire system COMMAND and DATA PHASE utilization for CORE0 only HIT = 8.4 % MISS = 12.315 % For optimum architecture HIT % >> MISS %
Initial Inferences • A large percentage of accesses are resulting in page misses, causing: • Increased access latency, • In-efficient usage of JEDEC interface and • Higher power consumption due to increased “precharges” and “activates” • Possible reasons for inefficient system could be • Mapping of application addresses to memory addresses • Page policy • Page crossovers • Rank crossovers
Memory Channel Utilization Analysis for CORE0 Maximum MISSES are due to transaction in same Bank but different rows COMMAND and DATA PHASE utilization for Core_0 with CMD_PHASE divided into CMD setup due to different reasons for MISS.
Refined Inferences • The reason for almost all Page MISS in this system is transaction on same Bank but different Page • Possible solution for resolving this • Change in traffic (not always possible) • Use a memory with bigger page sizes
Increasing Page Size from 1KB to 2KB Command Setup phase drastically reduces from 41.262% to 15% Increased page size results in desired performance Zero Page Miss. All memory accesses result in Page Hit.
Effect of increasing Page Size on overall Delay CORE0 memory access latency reduces from 310ns to ~222ns. Delays for memory access for CORE0 reduces from 68ns to 52 ns.
Conclusions • System level performance analysis allows detection of performance problems • Detailed data path visibility allows identification of hot-spots, e.g: • Arbitration scheme & • Hit/Miss ratio of SDRAM • Analyzing the hot-spots allows identification of root causes • Systematic refinement allows creation of optimum architecture for targeted use cases