Automated Data Analysis Solutions to Silicon Debug
Yu-Shen Yang, Dept. of ECE, University of Toronto, Toronto, M5S 3G4, yangy@eecg.utronto.ca
Nicola Nicolici, Dept. of ECE, McMaster University, Hamilton, L8S 4K1, nicola@ece.mcmaster.ca
Andreas Veneris, Dept. of ECE & CS, University of Toronto, Toronto, M5S 3G4, veneris@eecg.utronto.ca
Design, Automation & Test in Europe Conference & Exhibition, 2009 (DATE '09)
EICE team. Presenter: Shih-Tung Huang
Research tree
• C/C++ Development Tooling (Eclipse)
• IEEE 1149.7
• GDB
• OpenOCD
• dIP: A Non-Intrusive Debugging IP for Dynamic Data Race Detection in Many-core [paper] [thesis] [user guide]
• FLEXIBLE DEBUGGING FRAMEWORK FOR HETEROGENEOUS MULTICORE PLATFORM
• Multi-core debugging environment
Abstract
• Since pre-silicon functional verification is insufficient to detect all design errors, re-spins are often needed due to malfunctions that escape into the silicon. This paper presents an automated software solution to analyze the data collected during silicon debug. The proposed methodology analyzes the test sequences to detect suspects in both the spatial and the temporal domain. A set of software debug techniques is proposed to analyze the data acquired from hardware testing and to provide suggestions for the setup of the test environment in the next debug session. A comprehensive set of experiments demonstrates its effectiveness in terms of run-time and resolution.
What’s the problem?
• Post-silicon verification relies on on-chip instruments such as scan chains [6][7] and trace buffers [8][9]
• To find the root cause of a bug, engineers must use these tools again and again
• Trace buffer capacity is limited
• Locating the root cause therefore consumes a lot of engineering time
• This paper proposes a method in which automated software helps the engineer find the root cause
• Spatial domain: which module may contain the root cause
• Temporal domain: at what time does the bug occur
Introduction
• Errors/defects can be classified into two kinds
• Deterministic: inputs are controlled synchronously
• Non-deterministic: inputs arrive asynchronously, e.g., interrupts from peripherals or the timing of refresh cycles for dynamic memories
• The bug manifests only when the triggering event occurs
• Contribution: an automated flow (Figure 1) that helps find suspect(s) in a hierarchical manner
Related work
• Hardware instruments used: scan chains [6][7], trace buffers [8][9]
• Similar software approaches: [10][11][12]
• Techniques this paper builds on: hierarchical debug in post-silicon [16], X-simulation [17]
• This paper's method: automated data analysis for post-silicon verification
Proposed methodology: assumptions
• Erroneous silicon behavior is deterministic
• The methodology can only find deterministic suspect(s)
• Internal state values (i.e., register values) can be accessed, through scan chains [6][7] and trace buffers [8][9]
• Functional bugs can escape into silicon; here they are emulated with programmable hardware [14]
• Partial state equivalence: the golden model is a high-level model, so some silicon states cannot be mapped to it
• Test vectors fail because of erroneous silicon signals
Scan chain and trace buffer on silicon
• Each traced group is 16 bits wide; a multiplexer, controlled through JTAG, selects which group is traced
• The trace buffer can be separated into two segments to trace signals from different groups or different time windows
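To make the buffer organization concrete, here is a minimal Python sketch of a mux-selected, fixed-width trace buffer. The names (TraceBuffer, select_group, capture) are hypothetical, invented for illustration; the paper describes hardware, not a software API.

class TraceBuffer:
    """Models a trace buffer whose input is one of several 16-bit
    signal groups, selected by a multiplexer under JTAG control."""

    def __init__(self, depth, width=16, num_groups=4):
        self.depth = depth            # number of samples the buffer holds
        self.width = width            # bits captured per cycle (16 here)
        self.num_groups = num_groups  # groups the multiplexer can choose from
        self.group_id = 0
        self.samples = []

    def select_group(self, group_id):
        # In hardware this would be a JTAG register write.
        assert 0 <= group_id < self.num_groups
        self.group_id = group_id

    def capture(self, cycle_value):
        # Keep only the most recent `depth` samples (circular buffer).
        self.samples.append(cycle_value)
        if len(self.samples) > self.depth:
            self.samples.pop(0)

Splitting the buffer into two segments, as on the slide, would amount to running two smaller instances of this structure, each with its own selected group and capture window.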
Proposed methodology: overview
• The methodology has three goals
• Find suspect modules in a hierarchical manner (spatial domain)
• Find in which time slot (critical interval) the bug happens (temporal domain)
• Find the state elements on the bug propagation path
• Start by reducing the test vectors, as sketched below
• If all patterns before time Tn pass, the patterns before Tn can be eliminated: the error is not excited by them
• Start from Tn in the next debug session
• Store the state elements' values at Tn
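A hedged sketch of this test-vector reduction step, assuming the test environment can checkpoint the state elements at a known-passing cycle Tn (e.g., via the scan chain); read_state is a hypothetical hook, not part of the paper.

def reduce_test_sequence(vectors, last_passing_cycle, read_state):
    """Drop the vectors before the last known-passing cycle T_n and
    checkpoint the state there, so the next debug session can start
    from T_n instead of cycle 0."""
    checkpoint = read_state(last_passing_cycle)  # scan out all state elements
    remaining = vectors[last_passing_cycle:]     # the error is not before T_n
    return checkpoint, remaining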
Proposed methodology: steps
• The algorithm can be separated into three steps (see the outline after this list)
• Step 1: Hierarchical Diagnosis analyzes which modules are suspects
• Step 2: Timeframe Diagnosis finds in which time slot (critical interval) the error happens
• Step 3: X-simulation [17] simulates unknown output port values to obtain the golden result
• A round, defined by n, determines how many hierarchy levels are examined; one debug session consists of several debug runs
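The outline below summarizes the three steps as one debug session. The callables hierarchical_diagnosis, timeframe_diagnosis, and x_simulation are placeholders for the analyses described on the following slides, not functions defined in the paper.

def debug_session(design, traces, n, hierarchical_diagnosis,
                  timeframe_diagnosis, x_simulation):
    """One session of the proposed flow; the three analyses are passed
    in as callables because the paper defines them as separate steps,
    not as a concrete software API."""
    # Step 1 (spatial domain): narrow the search to suspect modules,
    # descending n hierarchy levels in this session.
    suspects = hierarchical_diagnosis(design, traces, levels=n)
    # Step 2 (temporal domain): find the critical interval in which the
    # erroneous behavior first appears.
    interval = timeframe_diagnosis(suspects, traces)
    # Step 3: X-simulation treats unmapped values as unknown (X) to
    # produce the golden reference for the next debug run.
    golden = x_simulation(design, interval)
    return suspects, interval, golden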
Hierarchical Diagnosis
• Goal: identify suspect modules
• Compare the real output values against the expected output values obtained from X-simulation
• The consistency check is formulated as a Boolean satisfiability instance
• How many hierarchy levels are examined in one debug session is defined by n
• The resulting suspect list is passed on to Timeframe Diagnosis
• (Figure: first round vs. second round of hierarchical refinement)
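A minimal sketch of the output comparison used to flag suspects. X-simulated expected values may contain 'X' (unknown); those bits cannot disprove a module, so they are skipped. In the paper the check is encoded as a SAT instance; this simple bitwise filter only illustrates the idea, and the module data is invented.

def is_suspect(observed, expected):
    """Return True if any known expected bit disagrees with silicon."""
    return any(e != 'X' and o != e for o, e in zip(observed, expected))

modules = {"m1": ("1010", "1X10"), "m2": ("0110", "0111")}
suspects = [m for m, (obs, exp) in modules.items() if is_suspect(obs, exp)]
# -> ['m2']: its last bit disagrees on a known (non-X) value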
Timeframe Diagnosis
• Goal: identify in which time slot (critical interval) the bug happens
• How is the time slot determined? By the sampled comparison described on the next slide
Timeframe Diagnosis
• How to identify in which time slot the bug happens
• Compare the real values against the golden result at Tn, Tn+3 and Tn+6
• If Tn is correct but Tn+3 and Tn+6 are erroneous, the bug happens in timeframe module 1 (between Tn and Tn+3), as sketched below
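A hedged sketch of narrowing the critical interval: sample the state at a few cycles (here Tn, Tn+3, Tn+6, as on the slide), compare against the golden values, and keep the first interval whose left endpoint matches but whose right endpoint does not. The function names and data are illustrative only.

def find_critical_interval(checkpoints, matches_golden):
    """checkpoints: sorted cycle numbers at which states were compared.
    matches_golden(t) -> True if the state at cycle t equals golden."""
    for left, right in zip(checkpoints, checkpoints[1:]):
        if matches_golden(left) and not matches_golden(right):
            return (left, right)  # the error first manifests in this window
    return None

# Example: states compared at cycles 100 (T_n), 103, 106; cycle 100
# matches golden but 103 and 106 do not.
golden_ok = {100: True, 103: False, 106: False}
print(find_critical_interval([100, 103, 106], golden_ok.get))  # (100, 103)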
Timeframe Diagnosis
• Problem: partial state equivalence
• The golden model does not have enough state elements to map every real state element
• An error on S3 happens between Tm and Tn, but it can only be detected at Tn+i
• S1 and S2 have corresponding golden state elements, but S3 does not
Timeframe Diagnosis
• Solution (sketched below)
• The algorithm has all state elements' values at the initial module (time Tn)
• When the algorithm detects at Tn+i that an error has happened, it restores all state elements' values and restarts from Tn
• Objective: dump all state elements' values around the time Tn+i at which the error happened
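An illustrative sketch of this restart strategy, under the assumption that the test environment exposes hooks to load a snapshot, single-step, and dump state; load_state, step, and dump_state are hypothetical, as is the windowing around the detection cycle.

def localize_error(snapshot_tn, i_detect, window,
                   load_state, step, dump_state):
    """Rewind to the saved state at T_n, re-run, and dump all state
    elements in a window around the first-detection cycle T_{n+i}."""
    load_state(snapshot_tn)            # restart from the checkpoint at T_n
    dumps = {}
    for i in range(i_detect + window):
        step()                         # advance one cycle from T_n
        if i >= i_detect - window:     # only dump near the error cycle
            dumps[i] = dump_state()    # capture all state elements
    return dumps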
Experiment 1
• More than 90% of the modules are eliminated from suspicion
• A large amount of trace data can therefore be reduced
Experiment 2
• Focus on the value of n: how many hierarchy levels are examined in one debug session
• Fig. 8(a): more suspects are identified as n increases, because there is not enough information to rule them out
• (Figure: timeframe results)
Experiments
• Across four different benchmarks, the worst-case performance slowdown is 12.25%
• Compared with related work [9]
Conclusion
• First, this presentation focused on many-core data race detection, reviewing the solutions proposed to date and their benefits and drawbacks
• Hardware solutions consume a large area as the number of cores increases
• Software solutions suffer from the probe effect
• Second, it presented the idea of this paper
• Related work [3] is used to prevent the probe effect
• Related work [4] is used for data race detection
• An overall framework is proposed
• Finally, experiments show that the maximum performance slowdown across the four testbenches is 12.25%