1. Fast SoC Architecture Exploration Using Traffic Simulation Techniques Nadjib Mammeri, ARM
2. Problems we are trying to solve What interconnect topology should I use? What arbitration and QoS schemes?
How should I configure my memory controller? DMC queue length? Memory width?
How to optimally size my interconnect/memory system and still meet my performance requirements?
3. SoC Architecture Exploration Current Techniques
Spreadsheet: Not accurate, Fast, Cheap
RTL simulation: 100% Accurate, Slow, Expensive
RTL emulation: Accurate, Fast, Expensive
Behavioural SystemC models: Accurate, Fast, Expensive
Traffic Profiling: ~Accurate, Fast, Cheap
Abstracting away some components or parts of the system and replacing them with bus transactors that can:
Generate realistic traffic which is statistically equivalent to SoC data flows
Re-use existing data flows to explore new architectures
Uses constrained random techniques
4. Our proposed approach VPE provides the accuracy of RTL simulation but drastically reduces cycle time when compared to building a conventional system for analysis
Faster than developing a cycle-accurate SystemC model
“Generating a Mali200 traffic profile took us 3 days to create given the RTL testbench” – Project Technical Lead PD Fabric Verification
More accurate than Excel
“VPE animates benchmarking data to bridge the gap between spreadsheet analysis and slow RTL simulation” – Senior Technical Marketing Manager PD Fabric Marketing
The main advantage of VPE (formerly AVIP) is that it executes much more quickly than RTL while still generating traffic that you can represent and control: it emulates a device's traffic patterns instead of executing functions within the emulated master or slave device. For this reason, traffic profiling is quicker than developing a cycle-accurate SystemC model. It is also more accurate than spreadsheet analysis, so it bridges the gap between spreadsheet analysis and slow RTL simulation.
5. How is it done? When analysing performance, the content or functional intent of the data is not important; the nature and flow of the traffic is.
Reduction in simulation time can be achieved by trading off functional accuracy of end points.
Accuracy should be preserved in the DUT and in the interconnect because it is the performance bottleneck.
How simulation speed-up is achieved
By ‘giving-up’ execution of functions within the emulated device in favour of emulating its traffic
No need to model their cycle-accurate behaviour
By replacing real data with constrained random data
-> So we need bus transactors that can generate meaningful and controllable traffic. This is VPE.
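The idea of replacing real data with constrained random data can be sketched in a few lines. This is only an illustration, not VPE's implementation: the histogram contents, the function names, the 64-byte address step and the 4 KB window are all assumptions made up for the example.

```python
import random

# Hypothetical histograms (value -> observed count); not from a real profile.
LEN_HIST = {1: 10, 4: 60, 8: 30}      # burst length in beats
ITT_HIST = {2: 50, 10: 30, 40: 20}    # inter-transaction time in cycles


def sample(hist, rng):
    """Draw one value with probability proportional to its count."""
    values = list(hist)
    weights = [hist[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def generate_traffic(n, seed=0):
    """Return n synthetic transactions as (issue_cycle, burst_len, address)."""
    rng = random.Random(seed)
    cycle = 0
    transactions = []
    for _ in range(n):
        cycle += sample(ITT_HIST, rng)       # spacing follows the ITT histogram
        burst_len = sample(LEN_HIST, rng)    # payload follows the LEN histogram
        # Constrained random address: 64-byte aligned, inside an assumed 4 KB window.
        address = rng.randrange(0, 0x1000, 64)
        transactions.append((cycle, burst_len, address))
    return transactions
```

No device function is executed here; only the shape of the traffic is reproduced, which is where the simulation speed-up comes from.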
6. Functional Verification
Complete AXI functional Verification solution
SystemVerilog Master, Slave, Monitor
RTL Protocol assertions
RTL Coverage Points
Performance Exploration
Profile editor toolkit GUI
RTL Profile extraction
RTL Profile generation
AXI Traffic Characterization and Analysis
AXI Traffic Replay and Adaptation
What is VPE (formerly AVIP)? VPE provides the benefit of 2 products in one:
1/ The unprecedented facility to capture, analyse and replay AXI bus performance statistics
2/ VPE also provides all of the hygiene factors that conventional VIP from EDA vendors provides:
Functional directed and constrained random testing
Protocol checking
Protocol coverage
VPE is compatible with all main SystemVerilog simulators: Synopsys VCS, Cadence Incisive and Mentor Questa (ModelSim)
7. Abstraction example 1 To investigate my interconnect topology, I would keep the RTL for my interconnect and abstract away all end points (masters and slaves).
Replace them with VPE masters and slaves
8. Abstraction example 2 To investigate my memory controller configuration, I would use the RTL for my interconnect and DMC and abstract away the other end points.
Replace them with VPE masters and slaves
9. Traffic Profiling (1) Traffic profiles statistically characterise the traffic (transactions) on an AXI connection
Traffic flow is an identifiable stream of traffic (AXI transactions) between two points in a system
Examples:
When profiling at slave 1, traffic coming from Master 2 can be identified using AxID
If we know Master 1 always does 4-beat bursts we can identify its traffic flow based on AxLEN
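The two examples above can be sketched as a small classifier. The transaction dictionaries, the ID-to-master mapping and the flow names are hypothetical; the only protocol fact relied on is that AxLEN encodes the number of beats minus one, so a 4-beat burst has AxLEN equal to 3.

```python
from collections import defaultdict

# Hypothetical mapping from AXI ID to originating master at this slave port.
ID_TO_MASTER = {0x2: "master2"}


def split_flows(transactions):
    """Group transactions into named flows using AxID first, then AxLEN."""
    flows = defaultdict(list)
    for txn in transactions:
        if txn["axid"] in ID_TO_MASTER:
            flows[ID_TO_MASTER[txn["axid"]]].append(txn)
        elif txn["axlen"] == 3:
            # AxLEN encodes (beats - 1), so a 4-beat burst has AxLEN == 3.
            flows["master1"].append(txn)
        else:
            flows["unclassified"].append(txn)
    return dict(flows)
```

In a real system the rule set would depend on how the interconnect remaps IDs and on which attributes actually distinguish each master.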
10. Traffic Profiling (2) A profile is associated with a connection and can have multiple flows
Flows contain histograms that store statistical data on both payload and timing.
Payload histograms
Histograms describing traffic payload information (the control and response of a transaction, but not its data content)
ADDRESS, ID, BURST, SIZE, LEN, RESP etc…
Timing histograms
Histograms describing traffic timing information
ITT, AWW, WW, WIL, WBL, ARW, RW, RBL etc…
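The profile, flow and histogram hierarchy described above could be modelled roughly as follows. The class and method names are invented for illustration and say nothing about VPE's actual profile format; the sketch only shows a connection-level profile holding several flows, each with separate payload and timing histograms.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Flow:
    """One traffic flow: named payload and timing histograms."""
    name: str
    payload: dict = field(default_factory=dict)  # e.g. "LEN" -> Counter
    timing: dict = field(default_factory=dict)   # e.g. "ITT" -> Counter

    def record(self, kind, hist_name, value):
        """Add one observation to a payload or timing histogram."""
        table = self.payload if kind == "payload" else self.timing
        table.setdefault(hist_name, Counter())[value] += 1


@dataclass
class Profile:
    """A profile belongs to one connection and can hold multiple flows."""
    connection: str
    flows: dict = field(default_factory=dict)

    def flow(self, name):
        """Return the named flow, creating it on first use."""
        return self.flows.setdefault(name, Flow(name))
```

A monitor would call `record` once per observed transaction; note that no data content is stored, only control, response and timing statistics.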
11. AXI Timing Histograms Inter transaction timings
ITT: Histogram parameter defining the inter-transaction timings in a flow (time between successive transactions).
Intra transaction timings
Flow timings: timings that describe the flow of traffic.
Connection timings: timings that are considered as properties of the connection
- If I am the master, I set the AW payload, assert the Valid signal and then wait for the Ready signal. I do not control the time between my AWVALID and AWREADY.
This time is a property of the connection: it is a connection timing.
- But I do control the time between my AW request and when I send data on the W channel. This is a flow timing.
12. AXI Intra-Transaction Timings RIL: Time between handshake on the AR channel and the first read transfer on the R channel
RW: Time between RVALID and RREADY
WIL: Time between handshake on the AW channel and the first write transfer on the W channel
WW: Time between WVALID and WREADY
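As a rough sketch of how a monitor might derive these values from observed cycle numbers: the function name and input layout are assumptions, and "first transfer" is interpreted here as the first cycle in which both VALID and READY are high for the first data beat.

```python
def channel_timings(addr_handshake, beats):
    """Return (initial_latency, per_beat_waits) for one transaction.

    addr_handshake: cycle of the AR (or AW) handshake.
    beats: list of (valid_cycle, ready_cycle) pairs, one per data beat,
           giving the cycle each signal was first asserted for that beat.
    """
    first_valid, first_ready = beats[0]
    # A transfer happens only once both VALID and READY are high.
    first_transfer = max(first_valid, first_ready)
    initial_latency = first_transfer - addr_handshake      # RIL or WIL
    # READY may already be high when VALID rises; the wait is then zero.
    waits = [max(0, ready - valid) for valid, ready in beats]  # RW or WW
    return initial_latency, waits
```

For a read, the result corresponds to RIL and the per-beat RW values; for a write, to WIL and the per-beat WW values. For example, `channel_timings(10, [(15, 17), (18, 18)])` gives an initial latency of 7 cycles and waits of 2 and 0 cycles.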
13. How accurate is it? The waveform view in the background shows two simulation traces separated by a green clock line in the middle of the waveform display
A/ The top trace is a capture from a Mali200 running in an RTL simulation which executed 2 million cycles in approx 4 hours
B/ The bottom trace is a replay capture from a VPE master emulating the bus traffic of a Mali200. This simulation took 4 minutes to run 2M cycles, roughly a 60x speed-up!
The build shows how:
1/ The traffic profiling toolkit was used to define an ‘empty’ Mali200 profile which was then populated using a VPE monitor. The current graphic is showing a populated profile with bandwidth analysis figures visible.
2/ How the populated traffic profile was then used to drive a VPE master in a simulation where the Mali200 was ‘swapped out’ by the VPE master
3/ The next build points out the waveform generated by the VPE master – note visually how the distribution and payloads statistically match the original
4/ The final build shows that the same monitor was used to re-capture the profile from the VPE master giving the same bandwidth/latency distribution
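The slides compare the captured and replayed profiles visually. One way to put a number on "statistically match" is the total variation distance between two normalised histograms; this metric is a suggestion for illustration, not something the presentation says VPE computes, and the function names are hypothetical.

```python
def normalise(hist):
    """Convert counts to probabilities."""
    total = sum(hist.values())
    return {k: v / total for k, v in hist.items()}


def total_variation(hist_a, hist_b):
    """Distance in [0, 1]: 0 for identical shapes, 1 for no overlap."""
    pa, pb = normalise(hist_a), normalise(hist_b)
    keys = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(k, 0.0) - pb.get(k, 0.0)) for k in keys)
```

Because the histograms are normalised first, a long RTL capture and a shorter replay can be compared directly even though their raw counts differ.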
14. More VPE Features
15. Conclusion System architects require novel techniques with short iteration times to analyse performance and fine-tune their SoCs.
VPE introduces a new approach that combines high-level modelling and statistical low-level random generation techniques to explore and verify IP performance.
Traffic profiling can be used by VPE masters and slaves to generate statistically equivalent traffic and by VPE monitors when monitoring performance.
16. Questions