This presentation explores SoC architecture design challenges, covering interconnect topology, arbitration, QoS, memory configuration, and performance optimization. It discusses using traffic simulation techniques to shorten analysis time while keeping results close to RTL accuracy. The proposed approach abstracts components with bus transactors that simulate realistic traffic flows, so new architectures can be explored efficiently. It emphasizes the trade-off between accuracy and simulation speed when meeting performance requirements. The verification side includes SystemVerilog masters, slaves, and monitors with protocol assertions, AXI traffic characterization, and performance exploration tools. VPE (formerly AVIP) is used to abstract system components and investigate the parts of the system that remain in RTL.
Fast SoC Architecture Exploration Using Traffic Simulation Techniques
Nadjib Mammeri, ARM
Problems we are trying to solve
• What interconnect topology should I use?
• What arbitration and QoS schemes?
• How should I configure my memory controller? DMC queue length? Memory width?
• How to optimally size my interconnect/memory system and still meet my performance requirements?
SoC Architecture Exploration
Current Techniques
• Spreadsheet: Not accurate, Fast, Cheap
• RTL simulation: 100% Accurate, Slow, Expensive
• RTL emulation: Accurate, Fast, Expensive
• Behavioural SystemC models: Accurate, Fast, Expensive
• Traffic Profiling: ~Accurate, Fast, Cheap
Traffic profiling abstracts away some components or parts of the system and replaces them with bus transactors that can:
• Generate realistic traffic which is statistically equivalent to SoC data flows
• Re-use existing data flows to explore new architectures
• Use constrained random techniques
Our proposed approach
• Iteration time of a spreadsheet with the accuracy approaching RTL simulation
[Figure: exploration techniques ordered from LOW to HIGH cycle time and realistic behaviour]
• Spreadsheet Analysis: mathematical formula, not dynamic (minutes/hours)
• RTL simulation, VPE, User VIP, Industry standards VIP: statistical or recorded traffic profiles (minutes/hours)
• Acceleration/Emulation (VIP, Logic Tiles, SW): adding S/W, external I/F with realistic scenarios (days/weeks)
• Silicon/Applications: observe actual behaviour (months/years)
How is it done?
• When analysing performance, the content or functional intent of the data is not important, but the nature and flow of the traffic is.
• A reduction in simulation time can be achieved by trading off the functional accuracy of the end points.
• Accuracy should be preserved in the DUT and in the interconnect because it is the performance bottleneck.
How simulation speed-up is achieved
• By ‘giving up’ execution of functions within the emulated device in favour of emulating its traffic
• No need to model the end points' cycle-accurate behaviour
• By replacing real data with constrained random data
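As a rough illustration of emulating a device's traffic rather than its function, the sketch below draws burst lengths and inter-transaction gaps from weighted histograms and fills each beat with random data. The histogram values, the function names, and the 8-byte beat width are all assumptions made for this example; they do not come from VPE or from the slides.

```python
import random

# Hypothetical histograms: each maps an observed value to its count.
LEN_HIST = {1: 10, 4: 60, 8: 30}     # burst length in beats
ITT_HIST = {2: 50, 10: 40, 100: 10}  # cycles between transaction starts


def sample(hist, rng):
    """Draw one value with probability proportional to its count."""
    values = list(hist)
    weights = [hist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def generate(n, seed=0, bytes_per_beat=8):
    """Return n synthetic transactions constrained by the histograms."""
    rng = random.Random(seed)
    cycle = 0
    txns = []
    for _ in range(n):
        cycle += sample(ITT_HIST, rng)   # timing follows the profile
        beats = sample(LEN_HIST, rng)    # payload shape follows the profile
        # Data content is irrelevant for performance, so it is random.
        data = [rng.getrandbits(8 * bytes_per_beat) for _ in range(beats)]
        txns.append({"start_cycle": cycle, "beats": beats, "data": data})
    return txns
```

Fixing the seed makes a run repeatable, which is useful when comparing two interconnect configurations against the same synthetic stimulus.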
What is VPE (formerly AVIP)?
Functional Verification
• Complete AXI functional verification solution
• SystemVerilog Master, Slave, Monitor
• RTL Protocol assertions
• RTL Coverage Points
Performance Exploration
• Profile editor toolkit GUI
• RTL Profile extraction
• RTL Profile generation
• AXI Traffic Characterization and Analysis
• AXI Traffic Replay and Adaptation
[Figure: IEEE 1800 SystemVerilog testbench. AXI Master, AXI Slave, and AXI Monitor components, driven by profile data, connect to the AXI Slave and AXI Master interfaces of the DUT (block or sub-system), alongside user/customer VIP and customer IP.]
Abstraction example 1
If I would like to investigate my interconnect topology, I would keep the RTL for my interconnect and abstract away all end points (masters and slaves), replacing them with VPE masters and slaves.
[Figure: Masters 1 to 4 and Slaves 1 and 2 around the AXI Interconnect, shown next to the same system with the end points replaced by VPE masters, slaves, and monitors.]
Abstraction example 2
If I would like to investigate my memory controller configurability, I would use the RTL for my interconnect and DMC and abstract away the other end points, replacing them with VPE masters and slaves.
[Figure: Masters 1 to 4, Slave 1, and the DMC around the AXI Interconnect, shown next to the same system with the interconnect and DMC kept and the other end points replaced by VPE masters, a slave, and monitors.]
Traffic Profiling (1)
• Traffic profiles statistically characterise the traffic (transactions) on an AXI connection
• A traffic flow is an identifiable stream of traffic (AXI transactions) between two points in a system
Examples:
• When profiling at Slave 1, traffic coming from Master 2 can be identified using AxID
• If we know Master 1 always does 4-beat bursts, we can identify its traffic flow based on AxLEN
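The two examples above can be sketched as a simple classifier over transactions observed at a slave port. This is only an illustration: the dictionary layout, the function name, and the ID values are hypothetical. The one protocol fact relied on is that AXI encodes the burst length as AxLEN + 1, so a 4-beat burst carries AxLEN = 3.

```python
def split_flows(transactions, master2_ids, master1_beats=4):
    """Group observed transactions into flows by AxID, then by AxLEN.

    transactions: iterable of dicts with 'axid' and 'axlen' keys
                  (a hypothetical record format for this sketch).
    master2_ids:  set of AxID values assumed to belong to Master 2.
    """
    flows = {"master1": [], "master2": [], "other": []}
    for txn in transactions:
        if txn["axid"] in master2_ids:
            flows["master2"].append(txn)
        elif txn["axlen"] + 1 == master1_beats:  # AXI: beats = AxLEN + 1
            flows["master1"].append(txn)
        else:
            flows["other"].append(txn)
    return flows


observed = [
    {"axid": 0x21, "axlen": 7},
    {"axid": 0x10, "axlen": 3},
    {"axid": 0x22, "axlen": 3},
    {"axid": 0x30, "axlen": 0},
]
flows = split_flows(observed, master2_ids={0x21, 0x22})
```

Checking the ID first means a 4-beat burst from Master 2 stays in Master 2's flow; the length rule is only a fallback when no ID match exists.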
Traffic Profiling (2)
• A profile is associated with a connection and can have multiple flows
• Flows contain histograms that store statistical data for both payload and timing information
Payload histograms
• Histograms describing traffic payload information (control of a transaction, response of a transaction, but no data content)
• ADDRESS, ID, BURST, SIZE, LEN, RESP etc.
Timing histograms
• Histograms describing traffic timing information
• ITT, AWW, WW, WIL, WBL, ARW, RW, RBL etc.
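One way to picture the profile, flow, and histogram hierarchy is the data-structure sketch below. The class and method names are invented for illustration and are not the VPE profile format; only the parameter names (LEN, ITT, and so on) come from the slide.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Flow:
    """One identifiable stream of transactions on a connection."""
    name: str
    payload: dict = field(default_factory=dict)  # e.g. "LEN" -> Counter
    timing: dict = field(default_factory=dict)   # e.g. "ITT" -> Counter

    def record(self, kind, param, value):
        """Add one observation to a payload or timing histogram."""
        table = self.payload if kind == "payload" else self.timing
        table.setdefault(param, Counter())[value] += 1


@dataclass
class Profile:
    """All flows observed on one AXI connection."""
    connection: str
    flows: dict = field(default_factory=dict)

    def flow(self, name):
        """Return the named flow, creating it on first use."""
        if name not in self.flows:
            self.flows[name] = Flow(name)
        return self.flows[name]


profile = Profile("slave1_port")
m2 = profile.flow("master2")
m2.record("payload", "LEN", 3)
m2.record("payload", "LEN", 3)
m2.record("timing", "ITT", 12)
```

Note that no data content is stored anywhere, which matches the slide: only control, response, and timing statistics are kept.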
AXI Timing Histograms
Inter-transaction timings
• ITT: histogram parameter defining the inter-transaction timings in a flow (time between successive transactions)
Intra-transaction timings
• Flow timings: timings that describe the flow of traffic
• Connection timings: timings that are considered properties of the connection
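A minimal sketch of building an ITT histogram follows. It assumes ITT is measured from the start of one transaction to the start of the next, in clock cycles; the slide does not say which reference points are used, and the function name and `bin_width` parameter are hypothetical.

```python
from collections import Counter


def itt_histogram(start_cycles, bin_width=1):
    """Histogram of gaps between successive transaction start cycles.

    Each gap is rounded down to the start of its bin, so with
    bin_width=8 a gap of 12 cycles is counted under key 8.
    """
    ordered = sorted(start_cycles)
    gaps = (b - a for a, b in zip(ordered, ordered[1:]))
    return Counter((gap // bin_width) * bin_width for gap in gaps)


hist = itt_histogram([0, 4, 8, 20])
# gaps are 4, 4 and 12 cycles -> Counter({4: 2, 12: 1})
```

Wider bins give a smaller profile at the cost of timing resolution, which is one place where the accuracy versus speed trade-off shows up.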
AXI Intra-Transaction Timings
• RIL: Time between handshake on the AR channel and the first read transfer on the R channel
• RW: Time between RVALID and RREADY
• WIL: Time between handshake on the AW channel and the first write transfer on the W channel
• WW: Time between WVALID and WREADY
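The definitions above can be made concrete with a small sketch that measures them from a per-cycle signal trace. It assumes a trace holding exactly one transaction, sampled once per clock, and it treats a "transfer" as a cycle where VALID and READY are both high; the trace format and function names are hypothetical.

```python
def first_cycle(trace, predicate):
    """Index of the first sampled cycle for which predicate is true."""
    return next(i for i, s in enumerate(trace) if predicate(s))


def intra_timings(trace, addr, data):
    """Issue latency and wait time for one transaction.

    addr is 'AR' or 'AW'; data is 'R' or 'W'. Returns RIL/RW for a
    read and WIL/WW for a write, in clock cycles.
    """
    addr_hs = first_cycle(
        trace, lambda s: s[addr + "VALID"] and s[addr + "READY"])
    valid = first_cycle(trace, lambda s: s[data + "VALID"])
    xfer = first_cycle(
        trace, lambda s: s[data + "VALID"] and s[data + "READY"])
    return {data + "IL": xfer - addr_hs, data + "W": xfer - valid}


def cycle(arvalid, arready, rvalid, rready):
    """Build one sampled cycle of read-channel handshake signals."""
    return {"ARVALID": arvalid, "ARREADY": arready,
            "RVALID": rvalid, "RREADY": rready}


read_trace = [
    cycle(1, 0, 0, 1),  # address offered, slave not ready
    cycle(1, 1, 0, 1),  # AR handshake (cycle 1)
    cycle(0, 0, 0, 1),
    cycle(0, 0, 1, 0),  # RVALID rises (cycle 3), master not ready
    cycle(0, 0, 1, 0),
    cycle(0, 0, 1, 1),  # first read transfer (cycle 5)
]
timings = intra_timings(read_trace, "AR", "R")  # {'RIL': 4, 'RW': 2}
```

Because AXI allows write data to arrive before its address, WIL computed this way can be negative for some write transactions; a fuller model would need to decide how to bin that case.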
How accurate is it?
• 4 hours to 4 minutes (about a 60x reduction in run time): a VPE Master executing 2M cycles of a traffic profile in place of the real Mali200 RTL running Proxycon/Samurai content
• The original traffic profile, captured from the real RTL, is now used to drive the VPE Master
• The VPE profile executes much faster than the real RTL but generates representative and controllable traffic
[Figure: traffic from the real RTL shown against traffic from the VPE profile.]
More VPE Features
• AXI Protocol checker
• AXI Protocol coverage
• Traffic profile extraction
• Transaction recording/visualisation
[Figure labels: Master, Slave, Monitor]
Conclusion
• System architects require novel techniques with short iteration times to analyze performance and fine-tune their SoCs.
• VPE introduces a new approach that combines high-level modeling with statistical, low-level random generation techniques to explore and verify IP performance.
• Traffic profiles can be used by VPE masters and slaves to generate statistically equivalent traffic, and by VPE monitors when monitoring performance.