Presenter: Ching-Hua Huang

A Configurable Bus-Tracer for Error Reproduction in Post-Silicon Validation
Shing-Yu Chen; Ming-Yi Hsiao; Wen-Ben Jone; Tien-Fu Chen
Nat. Chiao-Tung Univ., Hsinchu, Taiwan
2013 International Symposium on VLSI Design, Automation, and Test (VLSI-DAT)
National Sun Yat-sen University, Embedded System Laboratory

Abstract
In modern system-on-chips (SoCs), several intellectual property (IP) blocks are integrated to provide different functionality. However, the more complex the communication on an SoC is, the harder it is for designers to discover all errors during verification, before first silicon. Therefore, we provide a reconfigurable unit that records the transactions between IPs and adopts the logical vector clock [1] as the timestamp of each trace. The programmable trigger unit (PTU) in the debugging node (DN) can be configured by the validator to cache the transaction sequences of interest. Because each transaction trace carries its own timestamp, we can reproduce the errors in faulty transactions between IPs during post-silicon validation and get more information for bypassing or fixing the problems.

Abstract – Cont.
Furthermore, because the many trace entries would shrink the observation window very quickly, we also implement a compressor that compresses traces before they are stored in the trace buffer. Finally, our experiments demonstrate that the proposed debugging architecture is capable of recording the critical transactions, and that the proposed reconfigurable debugging unit can reduce the debugging execution time by more than 80%.

Introduction
• SoCs have several indispensable benefits:
  • Reusable IP blocks reduce design complexity.
  • Flexibility for specific applications.
  • Lower power consumption.
• SoCs still have challenges that need to be overcome: validation and debugging.
• SoC validation requires identifying errors in individual IP blocks, in their interactions, and in the whole system by running test programs.
  • The test program does not stop until a system failure occurs.
• SoC debugging uses debugging software to localize the failure to a small region and find the root cause.
  • The failure is fixed or bypassed in the end.

What’s the problem
• With the rapid progress of process technology, dozens of IPs and communication fabrics are integrated into a single chip.
• More complex communication behavior raises the probability of interaction errors, which are caused by unexpected communications between IPs.
• The main challenges in SoC validation:
  • Limited length of the on-chip trace buffer.
  • Design time increases because of hard-to-detect bugs.
  • Faulty sequences must be regenerated because execution is non-deterministic.
• A methodology called “cyclic debugging” is often adopted in today’s SoC development to remove hardware bugs.

Related Work
• [5] Distributed Hardware Matcher Framework: monitors particular sequences of signals about potential errors for each IP.
• [6] A reconfigurable debugging instrument: dynamically creates new hardware structures in existing silicon for debugging purposes.
• [7] About post-silicon validation: post-silicon validation has four crucial steps: (1) detecting a problem, (2) localizing the problem, (3) identifying the root cause of the problem, (4) fixing or bypassing the problem.
• Other works on the slide:
  • [11,12] Some other works based on transaction debugging
  • [9] SigRace
  • [10] IMITATOR
  • [8] Scalable DFD architecture with distributed ELA
  • [1] Lamport’s logical vector clock
  • [13] AMBA Open Specifications - ARM
• These works feed into [This paper] for the following purposes:
  • To order the transaction sequences and identify the relation between transactions among IPs.
  • To replay the error interactions and fetch crucial information in transactions.
  • To better utilize the available on-chip storage for distributed trace buffers.
  • The method is based on AHB buses.

Proposed Method - Summary
• A hardware debugging solution is proposed to address the above challenges.
• The key idea is to cache transactions of interest together with their timestamps:
  • Via a hardware monitor piggybacked on the IP’s interface.
  • The compressed timestamps are stored in the on-chip trace buffer.
• This work emphasizes debugging communications among IPs:
  • It proposes a debugging architecture, the debugging node, with a programmable unit for monitoring the interactions between IPs.
  • It proposes a timestamp recording mechanism for ordering non-deterministic interactions between IPs, with a compression technique to widen the observation window of the traces.

Proposed Method – Debugging Architecture
• This work is based on trace-based debugging.
• Each master interface and the arbiter is equipped with its own DN.
• A Global Timing Vector (GTV) maintains the latest timing vector.
• To watch erroneous transaction sequences, a DN is attached to the IP’s wrapper interface.
• The master wrapper provides the DN with sufficient information to watch any type of transfer the master requests.
• The DN observes the wrapper signals and compares them with the trigger conditions in the PTU.

Proposed Method – Debugging Node (1)
• First, the trigger condition in the Programmable Trigger Unit (PTU) can be reconfigured by the Control Unit.
• Once a test program is running, each DN records the timing information of each triggered communication in a timestamp unit called the Local Timing Vector (LTV).
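The slides do not list which fields a trigger condition can match, so the following is only a minimal sketch of a reconfigurable trigger unit. The `Transfer` fields, the `TriggerCondition` shape (read/write selector plus an address range), and every class and method name are assumptions made for illustration, not the paper's hardware design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transfer:
    """A bus transfer as seen on the master wrapper (hypothetical fields)."""
    master: int
    write: bool
    address: int


@dataclass(frozen=True)
class TriggerCondition:
    """One configurable condition (hypothetical). `write=None` means don't care."""
    write: Optional[bool] = None
    addr_lo: int = 0x00000000
    addr_hi: int = 0xFFFFFFFF

    def matches(self, transfer: Transfer) -> bool:
        if self.write is not None and transfer.write != self.write:
            return False
        return self.addr_lo <= transfer.address <= self.addr_hi


class ProgrammableTriggerUnit:
    """Holds the current trigger conditions; reconfigurable at any time."""

    def __init__(self):
        self.conditions = []

    def configure(self, conditions):
        # Stands in for the Control Unit rewriting the trigger conditions.
        self.conditions = list(conditions)

    def triggered(self, transfer: Transfer) -> bool:
        return any(c.matches(transfer) for c in self.conditions)


ptu = ProgrammableTriggerUnit()
# Watch only write transfers that target 0x1000..0x1FFF.
ptu.configure([TriggerCondition(write=True, addr_lo=0x1000, addr_hi=0x1FFF)])
print(ptu.triggered(Transfer(master=0, write=True, address=0x1800)))
print(ptu.triggered(Transfer(master=0, write=False, address=0x1800)))
```

Calling `configure` again with a different list models reconfiguring the DN between test runs without changing the monitored design.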

Proposed Method – Debugging Node (2)
• When the PTU is triggered, the LTV requests the latest Global Timing Vector (GTV), increments the number in its field of the vector, and updates the GTV.
• Subsequently, the detected communication event and its new LTV are compressed and stored in the trace buffer.
• When the test program finishes, the traces can be used to localize or reproduce error sequences.
• The analysis not only processes the data to locate the bug but also analyzes the timing information for bug reproduction.
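The update rule described on this slide can be sketched in software as follows. This is an illustrative model, not the authors' implementation: the class names, the assumption that each DN owns exactly one vector field, and the `merge_traces` helper are all hypothetical. The slides do not describe the post-analysis algorithm, so the merge step uses only the standard vector-clock ordering.

```python
NUM_FIELDS = 5  # four masters plus one arbiter, as stated in the slides


class GlobalTimingVector:
    """Shared vector holding the latest timing vector."""

    def __init__(self, num_fields=NUM_FIELDS):
        self.fields = [0] * num_fields


class DebuggingNode:
    """One DN; `index` selects the vector field this node owns (assumed)."""

    def __init__(self, index, gtv):
        self.index = index
        self.gtv = gtv
        self.ltv = [0] * len(gtv.fields)
        self.trace = []  # uncompressed (event, LTV) pairs, for illustration

    def on_trigger(self, event):
        # Fetch the latest GTV, bump this node's own field, write it back.
        self.ltv = list(self.gtv.fields)
        self.ltv[self.index] += 1
        self.gtv.fields = list(self.ltv)
        self.trace.append((event, tuple(self.ltv)))


def happened_before(a, b):
    """Standard vector-clock order: a <= b in every field and a != b."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def merge_traces(nodes):
    """Merge all node traces into one order consistent with happened_before.

    Sorting by the sum of the fields is a valid linear extension, because
    a vector that happened before another has a strictly smaller sum.
    """
    events = [entry for node in nodes for entry in node.trace]
    return sorted(events, key=lambda entry: sum(entry[1]))


gtv = GlobalTimingVector()
dn0, dn1 = DebuggingNode(0, gtv), DebuggingNode(1, gtv)
dn0.on_trigger("M0 write A")
dn1.on_trigger("M1 read A")
dn0.on_trigger("M0 write B")
ordered = [event for event, _ in merge_traces([dn1, dn0])]
print(ordered)
```

Even though the traces are stored separately per DN, the timestamps are enough to rebuild the order in which the triggered transfers occurred across IPs.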

Proposed Method – LTV Compressing and Recovering (1)
• The SoC system in this work has four masters and an arbiter. Each field in a timing vector is 32 bits and each vector has five fields, so each LTV is 160 bits in total.
• An LTV is clearly too large for the trace buffer:
  • LTVs occupy a lot of trace buffer space and shrink the trace buffer observation window.
• To overcome this problem:
  • Use a compressor in front of the trace buffer.
  • Record only the differences between LTVs in the trace buffer.
  • This widens the observation window.
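To make the size concern concrete, the arithmetic can be worked through as below. The 1-Kbyte buffer size is taken from the experiment setup slide; the calculation assumes, for simplicity, that the buffer holds nothing but raw LTVs and ignores the space needed for the event record itself.

```python
FIELDS_PER_VECTOR = 5      # four masters plus one arbiter
BITS_PER_FIELD = 32
TRACE_BUFFER_BYTES = 1024  # 1-Kbyte trace buffer from the experiment setup

ltv_bits = FIELDS_PER_VECTOR * BITS_PER_FIELD
buffer_bits = TRACE_BUFFER_BYTES * 8
max_raw_ltvs = buffer_bits // ltv_bits  # whole uncompressed LTVs that fit

print(ltv_bits)      # 160
print(max_raw_ltvs)  # 51
```

Under these assumptions the observation window is only about 51 uncompressed timestamps, which is why the slides move on to storing differences instead.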

Proposed Method – LTV Compressing and Recovering (2)
• The LTV compression proceeds as follows:
  • (1) When the compressor gets the latest LTV, it computes the difference between the incoming LTV and the previously stored LTV, and writes the difference to the trace buffer.
  • (2) The compressor copies the incoming LTV as the reference for the next difference.
  • (3) The differences are later processed to recover the ordering information.
• Finally, the recovered timing vectors of each master and the arbiter in the system are obtained.
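The three steps above can be sketched as a simple delta encoder and decoder. The slides do not specify how a difference is encoded in bits, so this sketch assumes a plain per-field integer difference against the previously stored LTV, starting from an all-zero vector. The function names are hypothetical.

```python
def compress(ltvs, num_fields=5):
    """Turn a sequence of LTVs into per-field differences (assumed encoding)."""
    previous = [0] * num_fields
    deltas = []
    for ltv in ltvs:
        # Step 1: difference against the previously stored LTV.
        deltas.append([cur - prev for cur, prev in zip(ltv, previous)])
        # Step 2: keep a copy of the incoming LTV as the next reference.
        previous = list(ltv)
    return deltas


def recover(deltas, num_fields=5):
    """Step 3: rebuild the full LTVs by accumulating the differences."""
    current = [0] * num_fields
    ltvs = []
    for delta in deltas:
        current = [c + d for c, d in zip(current, delta)]
        ltvs.append(list(current))
    return ltvs


recorded = [
    [1, 0, 0, 0, 0],
    [2, 1, 0, 0, 0],
    [3, 1, 0, 2, 0],
]
deltas = compress(recorded)
print(deltas)
print(recover(deltas) == recorded)
```

Because consecutive LTVs of one DN usually differ by small amounts, the differences need far fewer bits than five 32-bit fields, which is what widens the observation window.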

Before Experiment
• Limited length of the on-chip trace buffer
  • Reduced design time?
  • Observation window length?
• Regenerating the faulty sequences because of non-deterministic execution
  • Vector timestamp?

Experiment setup
• The debugging system is implemented in C++ and embedded into a 4-processor SoC system [14].
• The SoC has 4 ARM cores, each with an individual 32-Kbyte L1 cache, and a 64-Kbyte shared L2 cache, all connected by a system bus.
• The debugging system has 5 DNs, including a 1-Kbyte trace buffer.
• To capture the WSR and RWDR errors, the DNs are configured to watch the write and read requests of these two masters.
• WSR occurs when one master’s write transfers continuously occupy access to a slave.
  • System performance may slow down severely if WSR occurs frequently.
• RWDR is another error, caused by a data race between two masters.
  • It may result in a system crash due to operations on wrong data.

Experiment result
• The results show the normalized debugging time and the observation window length for different DN configurations.
• The debugging execution time is significantly reduced if the DN is configured precisely to detect the specified errors.
• For communication errors, the experimental results show that the observation window is significantly improved by using appropriate DN configurations.

Experiment result – Cont.
• The results show the normalized log size overhead of the vector timestamp for 20000, 40000, and 80000 processed transactions.
• The proposed compression technique reduces the log size and lengthens the trace buffer observation window.

Conclusion and My comment
• Conclusion
  • SoC validation and debugging solutions are increasingly important for future SoC designs. This work presents a debugging system for recording and reproducing system interaction errors.
  • The execution history is recorded in the trace buffer of each DN.
  • After the test program executes, special post-analysis algorithms are used to reproduce errors between IPs.
  • The experimental results demonstrate that the debugging system:
    • Is capable of dealing with a variety of system-level errors.
    • Improves the debugging execution time by more than 80%.
    • Reduces storage overhead with the compression technique.
• My comments
  • I think this way of reproducing bugs is difficult to implement, because one cannot forecast when an error will happen.
  • The experimental results are simple to comprehend.
