Requirements for System-on-Chip Analysis from Simulation to Silicon
Pascal Chauvet, pascal@sonicsinc.com
March 19, 2009
A communication centric approach
[Figure: SoC block diagram. H.264 HiP @ L4.1 decoders, 2D/3D GFX, transport demux, display processing, host CPU, audio DSP, video out and peripherals connect through interconnects to Memory Subsystem #1 and Memory Subsystem #2.]
• The interconnect and memory sub-system are the central components that co-ordinate the activity in a multi-core system and greatly affect the performance of the complete system
• They are the most efficient places to:
  • Collect performance data
  • Capture system transaction traces
  • Centralize system debugging resources
System analysis and debugging
[Figure: SystemC simulation, RTL simulation, on-chip debug and performance monitoring, and static trace feed a common analysis DB. Queries extract and compare performance, functional and debug information. The figure also carries the label "Not considered here".]
Importance of performance analysis
• Many SoC architectural choices impact performance
  • Buffer sizes
  • Topology
  • Clock crossing
  • Width conversion
  • Threads and concurrency
  • Multi-channel
  • Hardware protection mechanisms
  • Flow control (blocking vs. non-blocking)
  • Support for special burst types (2D blocks)
  • Quality of Service
• Need area/performance analysis to control die costs
Tuning an SoC’s performance
• Construct models to analyze the data flow performance of the SoC that consider:
  • Communication characteristics (expected stimulus) for each core
  • Communication requirements (desired results) for each core
  • Grouped into scenarios that represent expected application use cases
• Example characteristics
  • Peak bandwidth, burst size, address pattern, number of outstanding transactions
• Example requirements
  • Sustained throughput, average/worst-case latency, DRAM efficiency
• Things to analyze
  • Contention: How do traffic flows interact?
  • Buffer utilization: Are buffer resources used efficiently?
  • Quality of Service: Do we meet real-time deadlines while minimizing CPU latency?
  • Memory performance: Are we effectively scheduling DRAM traffic?
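The split between per-core requirements (desired results) and the figures a scenario actually produces can be sketched as a simple check. This is only an illustration: the class names, field names and thresholds below are hypothetical and do not come from the slides or from any tool.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """Desired result for one core in one scenario (hypothetical fields)."""
    core: str
    min_throughput_mbps: float
    max_avg_latency_ns: float


@dataclass
class Result:
    """Simulated or measured figures for one core (hypothetical fields)."""
    core: str
    throughput_mbps: float
    avg_latency_ns: float


def check_scenario(requirements, results):
    """Return a list of (core, reason) for every requirement that is not met."""
    by_core = {r.core: r for r in results}
    failures = []
    for req in requirements:
        res = by_core.get(req.core)
        if res is None:
            failures.append((req.core, "no result"))
            continue
        if res.throughput_mbps < req.min_throughput_mbps:
            failures.append((req.core, "throughput"))
        if res.avg_latency_ns > req.max_avg_latency_ns:
            failures.append((req.core, "latency"))
    return failures


# Made-up numbers for one scenario, purely for illustration.
requirements = [
    Requirement("video_decoder", 800.0, 500.0),
    Requirement("host_cpu", 200.0, 150.0),
]
results = [
    Result("video_decoder", 850.0, 420.0),
    Result("host_cpu", 210.0, 180.0),
]
print(check_scenario(requirements, results))
```

Running the same check on every scenario, whether the results come from simulation or from silicon, gives a quick pass/fail view before looking at contention or buffer utilization in detail.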
Common analysis database
• Compare
  • Simulation traces (explore scenarios)
  • Performance figures (simulated or measured on chip)
• Only retrieve the data that you need
• Aggregate related data
  • Organize related data by component
  • Organize related data by traffic flow
  • Window-based analysis
  • Organize data by thread
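A minimal sketch of such a database, assuming a single hypothetical table `txn` held in SQLite; the schema, the column names and the sample rows are all invented for illustration. It shows the two kinds of query listed above: comparing sources per traffic flow, and window-based aggregation.

```python
import sqlite3


def build_db(rows):
    """Create an in-memory database with one hypothetical transaction table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE txn (source TEXT, flow TEXT, thread INTEGER, "
        "start_ns INTEGER, latency_ns INTEGER, bytes INTEGER)"
    )
    conn.executemany("INSERT INTO txn VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def latency_by_flow(conn):
    """Average latency per (flow, source), so simulation and silicon line up."""
    return conn.execute(
        "SELECT flow, source, AVG(latency_ns) FROM txn "
        "GROUP BY flow, source ORDER BY flow, source"
    ).fetchall()


def bytes_per_window(conn, source, flow, window_ns):
    """Bytes moved by one flow in each fixed-size time window."""
    return conn.execute(
        "SELECT start_ns / ? AS win, SUM(bytes) FROM txn "
        "WHERE source = ? AND flow = ? GROUP BY win ORDER BY win",
        (window_ns, source, flow),
    ).fetchall()


# Made-up rows: (source, flow, thread, start_ns, latency_ns, bytes)
rows = [
    ("systemc", "cpu->dram", 0, 0, 100, 64),
    ("systemc", "cpu->dram", 0, 1200, 140, 64),
    ("silicon", "cpu->dram", 0, 0, 110, 64),
    ("silicon", "cpu->dram", 0, 1500, 130, 64),
    ("systemc", "dma->dram", 1, 300, 400, 256),
]
conn = build_db(rows)
print(latency_by_flow(conn))
print(bytes_per_window(conn, "systemc", "cpu->dram", 1000))
```

Because each query names only the columns and rows it needs, only the relevant data is retrieved, and the same query runs unchanged whichever source produced the rows.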
Static trace analysis
• You can “data mine” existing device results (simulations and/or emulations) to help analyze and optimize the next generation
• No simulation is required!
• You can:
  • Reverse engineer IP-core behavior (treat it as a black box)
  • Analyze the system activity for specific applications
  • Identify potential parallelism among cores
  • Extract specific address patterns (2D block transfer, raster scan, …)
  • Find potential address locality (extrapolation for DRAM or channel utilization)
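As a rough illustration of extracting address patterns from an existing trace, the sketch below classifies a sequence of burst addresses by looking at the deltas between consecutive accesses. The function name and the heuristic are assumptions made for this example; a real trace would need filtering per initiator and per thread first.

```python
from collections import Counter


def classify_address_pattern(addresses):
    """Classify a burst-address sequence as 'linear', '2d_block' or 'irregular'.

    Heuristic: one constant stride means a linear (raster-like) scan; two
    distinct strides where the line jump recurs at a fixed interval means a
    2D block transfer. Anything else is reported as irregular.
    """
    if len(addresses) < 3:
        return "irregular"
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    counts = Counter(deltas)
    if len(counts) == 1:
        return "linear"
    if len(counts) == 2:
        # The less frequent delta is taken to be the jump to the next line.
        (_step, _), (jump, _) = counts.most_common(2)
        jump_pos = [i for i, d in enumerate(deltas) if d == jump]
        gaps = {b - a for a, b in zip(jump_pos, jump_pos[1:])}
        if len(gaps) == 1:  # at least two jumps, evenly spaced
            return "2d_block"
    return "irregular"


# Three lines of four 4-byte words, with a 1024-byte line pitch.
block = [row * 1024 + col * 4 for row in range(3) for col in range(4)]
print(classify_address_pattern(block))
print(classify_address_pattern([0, 64, 128, 192]))
```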
Stimulus generation
• A great deal of work has been invested in system modeling
  • ESL is trying to address that aspect
• Use synthetic traffic patterns or legacy traces
• Model cores according to their activity and the role they play in the overall performance of the system
  • Cached processors and DMA engines behave very differently
• Address patterns are crucial for performance analysis
  • They affect the DRAM page miss ratio
• Reactive stimulus is best
  • React to communication delays and change patterns
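The link between address pattern and DRAM page miss ratio can be illustrated with a deliberately crude model: one bank, open-page policy, and a miss whenever an access lands on a different page than the previous one. The page size, burst size and generator functions are assumptions chosen for the example, not parameters from the slides.

```python
import random


def page_miss_ratio(addresses, page_bytes=2048):
    """Fraction of accesses that open a new page (single bank, open-page policy).

    The first access always counts as a miss because no page is open yet.
    """
    misses = 0
    open_page = None
    for addr in addresses:
        page = addr // page_bytes
        if page != open_page:
            misses += 1
            open_page = page
    return misses / len(addresses) if addresses else 0.0


def sequential_pattern(base, count, burst_bytes=64):
    """Synthetic DMA-like stream: consecutive bursts."""
    return [base + i * burst_bytes for i in range(count)]


def random_pattern(span_bytes, count, burst_bytes=64, seed=0):
    """Synthetic scattered stream: burst-aligned addresses drawn at random."""
    rng = random.Random(seed)
    return [rng.randrange(0, span_bytes, burst_bytes) for _ in range(count)]


seq = page_miss_ratio(sequential_pattern(0, 128))
rnd = page_miss_ratio(random_pattern(1 << 20, 128))
print(f"sequential: {seq:.3f}  random: {rnd:.3f}")
```

With these settings the sequential stream touches four pages in 128 accesses, while the scattered stream misses on almost every access, which is why a stimulus with the wrong address pattern can give misleading memory performance results.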
Abstraction for SystemC simulation?
• High-level, cycle-approximate simulation can be attractive for faster simulation, but:
  • Aggregated results (workload execution and message transfer times) are hard to interpret
  • What about latency and resource utilization?
  • Modeling techniques are complex, and characterizing the model against RTL is challenging
• A cycle-accurate model may be slower, but it satisfies all the requirements for system and performance analysis as well as debugging
  • Structural accuracy and cycle accuracy of the model allow an apples-to-apples comparison with RTL and on-chip results
  • Fine-grained instrumentation for a detailed analysis
SystemC simulation
• Full visibility of the system
• Faster simulation speed
• Full instrumentation with runtime data processing (DB compression)
• End-to-end transaction tracing (transaction ID)
• Arbitration and QoS tracing
• Buffer utilization
• Latency (min, max, distribution, …)
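End-to-end tracing with a transaction ID makes latency statistics a matter of pairing each request with its response. The sketch below assumes a hypothetical event format of `(time_ns, kind, txn_id)` tuples; it does not use the SCV recording API or any simulator interface.

```python
from collections import Counter


def latency_stats(events, bin_ns=50):
    """Pair 'req' and 'resp' events by transaction ID and summarize latency.

    events: iterable of (time_ns, kind, txn_id) with kind in {'req', 'resp'}.
    Returns None if no request/response pair was found.
    """
    starts = {}
    latencies = []
    for time_ns, kind, txn_id in sorted(events):
        if kind == "req":
            starts[txn_id] = time_ns
        elif kind == "resp" and txn_id in starts:
            latencies.append(time_ns - starts.pop(txn_id))
    if not latencies:
        return None
    # Histogram keyed by the lower edge of each latency bin.
    histogram = Counter((lat // bin_ns) * bin_ns for lat in latencies)
    return {
        "min": min(latencies),
        "max": max(latencies),
        "mean": sum(latencies) / len(latencies),
        "histogram": dict(sorted(histogram.items())),
        "unmatched": len(starts),  # requests still waiting for a response
    }


# Made-up events; transaction 3 never receives a response.
events = [
    (0, "req", 1),
    (10, "req", 2),
    (120, "resp", 1),
    (95, "resp", 2),
    (200, "req", 3),
]
print(latency_stats(events))
```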
Standards
• Don’t create a proprietary solution
• Using industry standards and methodologies is a must
  • OSCI TLM2.0 (Approximately Timed), OCP-IP TLM
  • SCV transaction recording API
  • Standard query language like SQL to retrieve and organize analysis data
  • Work of the OCP-IP NOC Benchmark working group for system benchmarks and application modeling
  • IP-XACT to capture metadata about instrumentation
On-chip analysis solution
• Silicon real estate dedicated to on-chip debugging and performance monitoring has to be minimized
  • Currently less than 3% but growing…
• Leverage existing interconnect resources and services, for instance:
  • Quiescence detection
  • Correlating requests and responses
  • Buffer states
• … but do not disrupt the normal system behavior
What can be done on-chip?
• System event detection by looking at multiple core interfaces at the same time
  • Triggering/filtering
• Transaction tracing is expensive as it requires:
  • Dedicated on-die memory space
  • Additional hardware to pass the transaction ID for end-to-end tracing
• Performance monitoring
  • Request/response latency (single or average)
  • Throughput measurement
  • Resource utilization (buffers, …)
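One way to see why performance monitoring is cheaper than transaction tracing is a software model of a counter-based monitor: it filters transactions and keeps running totals instead of storing each one. The class below is a hypothetical illustration, not a description of any real on-chip block; the address-range filter and the counter set are assumptions.

```python
class PerfMonitor:
    """Software model of a hypothetical counter-based monitor.

    It keeps three running totals and stores no per-transaction trace.
    """

    def __init__(self, addr_lo, addr_hi):
        self.addr_lo = addr_lo
        self.addr_hi = addr_hi
        self.count = 0
        self.latency_sum = 0
        self.bytes = 0

    def observe(self, addr, nbytes, latency_cycles):
        """Filter: only count transactions whose address is in [addr_lo, addr_hi)."""
        if not (self.addr_lo <= addr < self.addr_hi):
            return
        self.count += 1
        self.latency_sum += latency_cycles
        self.bytes += nbytes

    def average_latency(self):
        """Average request/response latency in cycles over counted transactions."""
        return self.latency_sum / self.count if self.count else 0.0

    def throughput(self, elapsed_cycles):
        """Bytes per cycle over the measurement interval."""
        return self.bytes / elapsed_cycles if elapsed_cycles else 0.0


mon = PerfMonitor(0x1000, 0x2000)
mon.observe(0x1000, 64, 20)
mon.observe(0x1800, 64, 40)
mon.observe(0x3000, 64, 100)  # outside the filter window, ignored
print(mon.count, mon.average_latency(), mon.throughput(1000))
```

The state here is a handful of counters regardless of how long the measurement runs, whereas a trace grows with every transaction and needs dedicated memory.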
Summary
• Checking the performance of a system should be possible at every stage of development
• A consistent analysis environment helps compare all the collected results
• An analysis database that collects and organizes related performance data simplifies analysis
• Critical IP such as the interconnect or the memory subsystem has to be modeled accurately
• On-chip performance monitoring is becoming a must
  • It must share HW resources with test/debugging capabilities and with other system services
Thank you!
http://www.sonicinc.com