RTL-Synchronized Transaction Reference Models Dave Whipp Fast-Chip Inc.
Motivation • Needed Cycle Verification • Now, not 6 months later • Why build two models, when one will do • We had a working “functional” model • Don’t Chase RTL • Avoid modeling artifacts of the implementation
Overview • What is Transaction Synchronization • Patterns in Transaction Synchronization • Methodology, Futures, Summary
Part 1 What is Transaction Synchronization?
A Functional Model

int classify_packet (Packet packet_data, Uint32 rule_address)
{
    int result = ITERATE;
    while (result == ITERATE) {
        RuleStruct rule;
        read_rule(&rule, rule_address);
        int field = extract(rule, packet_data);
        interpret(rule, field, &result, &rule_address);
    }
    return result;
}
“Bringup” Flow (diagram): test.script drives both C-sim and RTL-sim; the resulting csim.log and rtl.log are compared.
Transaction Interactions (diagram): Thread A performs Read-Rule and Thread B performs Write-Rule against a shared Rules DB.
Trace Files
• A trace of the sequence of transaction steps
• Each synch point has a name and a thread-ID
• Comments provide context (values from RTL)
• Often hand-edited during debug
Example:
[1536] read_rule  thread_A  # addr=h8a34 data=h1578
[1544] write_rule thread_B  # addr=h8a34 data=h5343
[1632] read_rule  thread_A  # addr=h8a34 data=h5343
[1694] write_rule thread_B  # addr=h8a34 data=hf519
[1694] read_rule  thread_A  # addr=h8a34 data=hf519
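For illustration, here is a minimal C sketch of how a testbench might parse one such trace line; the synch_point_t type, field widths, and helper name are assumptions, not from the original talk.

/* Minimal sketch: parse one trace line such as
 *   [1536] read_rule thread_A # addr=h8a34 data=h1578
 * The synch_point_t type and buffer sizes are illustrative assumptions. */
#include <stdio.h>
#include <string.h>

typedef struct {
    unsigned cycle;        /* RTL timestamp, e.g. 1536            */
    char     name[64];     /* synch point name, e.g. "read_rule"  */
    char     thread[64];   /* thread-ID, e.g. "thread_A"          */
    char     comment[256]; /* free-form context captured from RTL */
} synch_point_t;

static int parse_synch_point(const char *line, synch_point_t *sp)
{
    sp->comment[0] = '\0';
    if (sscanf(line, " [%u] %63s %63s", &sp->cycle, sp->name, sp->thread) != 3)
        return 0;                        /* not a valid trace line */
    const char *hash = strchr(line, '#');
    if (hash)                            /* keep the comment for debug */
        snprintf(sp->comment, sizeof sp->comment, "%s", hash + 1);
    return 1;
}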
“Synchronized” Flow (diagram): test.script drives RTL-sim; the rtl.log trace synchronizes C-sim, whose csim.log is compared against rtl.log.
Simulation Kernel (flowchart): read a synch point from the trace; if it is in the pending synch points (task list), call its synch function; if it is not pending, read more stimulus.
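A hedged sketch of what such a kernel loop could look like in C, reusing the synch_point_t parser sketched above; pending_lookup, call_synch, and read_stimulus are hypothetical helpers standing in for the task list of pending synch points.

#include <stdio.h>

/* Hypothetical task-list API for pending synch points (assumed, not the
 * original implementation). */
typedef struct pending pending_t;
extern pending_t *pending_lookup(const char *name, const char *thread);
extern void       call_synch(pending_t *p, const synch_point_t *sp);
extern void       read_stimulus(const synch_point_t *sp);

void run_kernel(FILE *trace)
{
    char line[512];
    synch_point_t sp;
    while (fgets(line, sizeof line, trace)) {
        if (!parse_synch_point(line, &sp))
            continue;                  /* skip blank or comment-only lines */
        pending_t *p = pending_lookup(sp.name, sp.thread);
        if (p)
            call_synch(p, &sp);        /* [pending]: resume the waiting transaction */
        else
            read_stimulus(&sp);        /* [not pending]: feed more stimulus */
    }
}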
Memory Access with Arbiter (diagram): requests A and B each pass through a delay element to an arbiter (Arb) in front of a single memory (Mem); a monitor observes the arbitrated accesses.
Dual Port Memory Access (diagram): requests A and B each pass through a delay element to separate ports of a dual-port memory, observed by monitors A and B.
A Functional Model

int classify_packet (Packet packet_data, Uint32 rule_address)
{
    int result = ITERATE;
    while (result == ITERATE) {
        RuleStruct rule;
        read_rule(&rule, rule_address);
        int field = extract(rule, packet_data);
        interpret(rule, field, &result, &rule_address);
    }
    return result;
}

int continue_read_rule () {
    /* ... the work after read_rule moves into a continuation like this */
Refactoring • Move local variables into a “context” structure. Create an instance (on the heap, not the stack) at start of transaction – and delete at end. • Replace iterative loops with recursive functions. • For each function that requires synchronization (directly or indirectly), replace the call with a request/callback pair.
“Context” Structure

struct context {
    Packet     packet_data;
    Uint32     rule_address;
    RuleStruct rule;
    int        field;
    int        result;
    void     (*callback)(int);
};
Introduce Context Structure

void classify_packet_request (Packet packet_data, Uint32 rule_address,
                              void (*callback)(int))
{
    struct context *cxt = calloc(1, sizeof(struct context));
    cxt->packet_data  = packet_data;
    cxt->rule_address = rule_address;
    cxt->callback     = callback;
    cxt->result       = ITERATE;
    classify_packet_iterate(cxt);
}

void classify_packet_reply (struct context *cxt)
{
    int result = cxt->result;
    void (*callback)(int) = cxt->callback;
    free(cxt);
    callback(result);
}
Non-Recursive Implementation

void classify_packet_iterate (struct context *cxt)
{
    while (cxt->result == ITERATE) {
        read_rule(&cxt->rule, cxt->rule_address);
        cxt->field = extract(cxt->rule, cxt->packet_data);
        interpret(cxt->rule, cxt->field, &cxt->result, &cxt->rule_address);
    }
    classify_packet_reply(cxt);
}
Recursive Implementation

void classify_packet_iterate (struct context *cxt)
{
    if (cxt->result == ITERATE) {
        read_rule(&cxt->rule, cxt->rule_address);
        cxt->field = extract(cxt->rule, cxt->packet_data);
        interpret(cxt->rule, cxt->field, &cxt->result, &cxt->rule_address);
        classify_packet_iterate(cxt);
    } else {
        classify_packet_reply(cxt);
    }
}
Synchronized Implementation

void classify_packet_iterate (struct context *cxt)
{
    if (cxt->result == ITERATE) {
        read_rule_request(&cxt->rule, cxt->rule_address, &continue_read_rule);
    } else {
        classify_packet_reply(cxt);
    }
}

void continue_read_rule (struct context *cxt)
{
    cxt->field = extract(cxt->rule, cxt->packet_data);
    interpret(cxt->rule, cxt->field, &cxt->result, &cxt->rule_address);
    classify_packet_iterate(cxt);
}
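The talk does not show the request side, so the following is an assumed sketch of how read_rule_request could record a pending synch point for the simulation kernel to resolve; pending_add is a hypothetical helper, and recovering the context from the destination pointer is one possible design, not necessarily the author's.

/* Hedged sketch (assumed): instead of reading immediately, the request
 * records a pending "read_rule" synch point.  When the kernel sees the
 * matching trace line it performs the read and resumes the continuation. */
#include <stddef.h>
#include <stdlib.h>

extern void pending_add(const char *name, void *payload);  /* hypothetical */

struct pending_read_rule {
    struct context *cxt;                     /* suspended transaction         */
    Uint32          addr;                    /* address the RTL should report */
    void          (*cont)(struct context *); /* continuation to resume        */
};

void read_rule_request (RuleStruct *dest, Uint32 addr,
                        void (*cont)(struct context *))
{
    struct pending_read_rule *p = malloc(sizeof *p);
    /* dest is &cxt->rule, so the owning context can be recovered from it */
    p->cxt  = (struct context *)((char *)dest - offsetof(struct context, rule));
    p->addr = addr;
    p->cont = cont;
    pending_add("read_rule", p);             /* queue for the kernel's task list */
}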
Transaction Diagrams (diagram): the Classify Packet transaction loops through Read Rule, Extract, and Interpret ([iterate] / [done]), reading the Rules DB and the Packet Buffer.
Part 2 Patterns in Transaction Synchronization
Adding a Cache • Cache needn’t affect transactions • Data-RAM not modeled • cache is coherent • Can rerun all tests, with no changes to C model • Tag RAM is an Addition, not a Modification • Independent Transactions • Independent Synchronization
Single Port, Cached (diagram): the arbitrated single-port memory from before, with a cache added; Tag RAM lookups split each read/write into hit and miss paths, with ECC checking and error correction.
Cache Transaction (Read) (diagram): the transaction reads the tag and branches on [hit] / [miss]; steps include Read Data, Write Tag, Read Tag, and Check ECC.
FIFOs and Counters • Delay elements need no synchronization • But synchronization can increase locality • Some FIFOs can drop transactions • Synchronize overflow: don’t model actual size • Counters seem to need cycle-based model • We want to avoid this • Correct Synch propagates “forces” to Model
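As a concrete illustration of “synchronize overflow: don’t model actual size”, here is an assumed C sketch of a FIFO model (not the original code): the queue is unbounded, and a push that the RTL trace reports as dropped is simply discarded.

#include <stdlib.h>

typedef struct node { struct node *next; void *payload; } node_t;
typedef struct { node_t *head, *tail; } fifo_t;

/* "push" synch point: rtl_dropped is true when the RTL trace reported a
 * matching "drop" for this push (flow-control overflow in the hardware). */
void fifo_push(fifo_t *f, void *payload, int rtl_dropped)
{
    if (rtl_dropped)
        return;                          /* the entry never enters the model */
    node_t *n = malloc(sizeof *n);
    n->payload = payload;
    n->next = NULL;
    if (f->tail) f->tail->next = n; else f->head = n;
    f->tail = n;
}

/* "pop" synch point: hand the oldest surviving entry to the consumer. */
void *fifo_pop(fifo_t *f)
{
    node_t *n = f->head;
    if (!n) return NULL;
    f->head = n->next;
    if (!f->head) f->tail = NULL;
    void *payload = n->payload;
    free(n);
    return payload;
}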
Synchronizing a FIFO (diagram): a Producer pushes into the FIFO and a Consumer pops; flow control can drop entries, and the drop is forced into the model.
FIFO Transaction Diagram: a push that branches to [push] or [drop], and a pop.
FIFO Synchronization Checker (diagram): the checker observes push, pop, and drop between Producer, FIFO, and Consumer, asserting on queue size.
Counters (diagram): a register with a +1 increment, load and select inputs, and clk; sample_en samples the value, which the testbench forces and updates into the Client’s model.
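A small assumed sketch of how a counter can avoid a cycle-based model, as suggested by the diagram above: the testbench samples the RTL counter at a synch point and forces that value into the C model, which the client then reads. The type and function names are illustrative only.

/* Hedged sketch (assumed): the model never counts cycles itself; the
 * testbench "force" propagates the sampled RTL value into the model. */
typedef struct {
    Uint32 value;          /* last value forced from the RTL sample */
} counter_model_t;

/* "sample" synch point: the trace comment carries the RTL counter value. */
void counter_force(counter_model_t *c, Uint32 rtl_value)
{
    c->value = rtl_value;  /* propagate the force into the model */
}

/* The client of the counter just reads the forced value. */
Uint32 counter_read(const counter_model_t *c)
{
    return c->value;
}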
Scaffolding • Permit verification of incomplete RTL • Encourage end-to-end skeletons • Implement “incorrect, but simple” algorithms • Don’t wait for complete RTL • Postpone modeling the algorithm • Use synch to avoid chasing a moving target • Remove scaffolding once RTL is complete
An Algorithm Cache (diagram): a tree search reads nodes from Node Memory through a cache (Tag RAM, hit and miss read/write paths) to produce a Result.
Algorithm Cache: Transactions (diagram): Read Tag branches on [hit] / [miss]; Read Node iterates ([iterate] / [match] / [no match]); a Backdoor search path is also shown.
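To make the scaffolding idea concrete, here is an assumed sketch of an “incorrect, but simple” stand-in for the tree search: a backdoor linear scan of node memory. The Node type, node_mem, NODE_COUNT, and node_matches are all hypothetical names introduced for illustration.

/* Hedged scaffolding sketch (assumed): until the real tree-search algorithm
 * is modelled, a brute-force scan lets end-to-end transactions run. */
typedef struct Node Node;                       /* hypothetical node type        */
extern Node node_mem[];                         /* backdoor view of Node Memory  */
extern int  NODE_COUNT;
extern int  node_matches(const Node *n, Uint32 key);

int backdoor_search(Uint32 key)
{
    for (int i = 0; i < NODE_COUNT; i++)        /* linear scan instead of tree walk */
        if (node_matches(&node_mem[i], key))
            return i;                           /* index of the matching node */
    return -1;                                  /* no match */
}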
Speculation • When hardware speculates: • Effect precedes cause • Transaction model appears incorrect • Creative accounting can sometimes help • Insert a “virtual” delay • Filter based on future events
Speculation (diagrams): two views of the Read Ctrl and Read Data steps, illustrating a speculative read whose effect precedes its cause.
Speculative Reads (diagram): a four-stage lookup pipe (Stages 1 to 4) that reads and updates the Ctrl RAM and Data RAM; optional two-clock delay and advance elements mark where a virtual delay can realign the speculative reads with their updates.
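One assumed sketch of “filter based on future events”: a speculative read step is passed to the model only if a later trace event commits it. look_ahead_commits and the "read_data" synch point name are hypothetical; the synch_point_t type is reused from the trace-parser sketch.

#include <stdbool.h>
#include <string.h>

/* Hypothetical helper that scans forward in the already-recorded trace
 * for an event that commits this speculative read. */
extern bool look_ahead_commits(const synch_point_t *sp);

/* Return true if this synch point should be passed to the model. */
bool filter_speculative(const synch_point_t *sp)
{
    if (strcmp(sp->name, "read_data") != 0)
        return true;                  /* only speculative reads are filtered */
    return look_ahead_commits(sp);    /* keep it only if a future event commits it */
}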
Part 3 Methodology, Futures, Summary
Verification Flow • RTL Simulation is expensive • Licenses • CPU time • Post-Processing is cheap • Stop simulations when broken • But not if bug is in test/model
Methodology • Cycle-Precise Reference Comparison • Without a cycle-accurate model • Verify the System First • Bringup Flow (Functional Model) • Synchronized Flow (Transaction-Testbench) • Postpone module-level testing • Use scoreboarding to identify unit testbenches • Only build unit-testbenches for stable modules
Comparison with Platform-Based • System-on-Chip Methodology • Verify components first • Verify system as composition of verified units • Complex-ASIC Methodology • Verify transactions first • Verify units in context of verified transactions • An “Agile” Methodology
Future Work • Performance in non-synchronized mode • Use threading to avoid fragmentation • Synchronization as basis of SW architecture • Cycle-model plug-in could provide synch • Can postpone this plug-in until tapeout • But what if we want a cycle-model earlier? • Example: up-front performance validation
Summary • Cycle timing is a “Don’t Care” • Initial verification uses “Functional” model • Refactor into “Transaction” model • RTL provides cycle timing • Caches, like FIFOs, are just delay elements • “Forces” in testbench propagate to model • “Coarse-grain first” methodology
Questions mailto:Dave@Whipp.name http://Dave.Whipp.name/dv