FPGA IP Verification for Use in Severe Environments. 2005 MAPLD International Conference, September 2005, Paper #237. Ian Land, Ian Bryant.
Trends • Smaller geometries allow more functions • Synthesizable HDL makes design reuse practical • Gate-level design is difficult at high density • Resource-intensive • Takes a long time • Increases the likelihood of error • Thus, block-level design is needed • Intellectual property (IP) reduces effort and risk, if done right… • A robust design process is followed, with thorough verification • IP is proven in many applications, including space and severe environments • A MIL-STD-1553 example demonstrates this
Phase-Gate Robust Design Process • Structured design flow should be phase-gated • Proposal • Justification for development and creation of the project plan • Definition and Planning • Preliminary datasheet creation defining the core • Test plan is needed • Development • The core is implemented and deliverables are created • Verification and Validation • Testing against plan and specification (e.g., MIL-STD-1553, PCI) • Release • Release of product for volume sales • Configuration Management, Feedback and Revision
MIL-STD-1553 Example • Actel has developed three products • A full-featured BC, RT, MT • A ‘simple’ bus controller • A ‘simple’ remote terminal • Highlight: the simple remote terminal, Core1553BRT • Originally released in 2002 (first production August 2002) • 12 and 16 MHz versions • Updated for minor changes in 12/2002 • Loopback test, version text in code, etc. • Updated for Verilog translation issue in April 2004 • Updated in 9/2004 and 11/2004 to work with design tool updates • Revised to include 20 and 24 MHz versions in January 2005 • Manchester encoders/decoders tested as part of full-featured BC, RT, MT • ProASIC3/E FPGA Family support added
MIL-STD-1553 RT Development • Proposal • Substantial customer demand for a MIL-STD-1553 bus interface • Review of the specification and competitive products suggested we could improve market offerings with a rad-tolerant 1553 FPGA • Definition • MIL-STD-1553 Specification • Preliminary datasheet highlighting the features in the proposal • Development • Developed the remote terminal • Paid careful attention to the Manchester encoder/decoder blocks that would be re-used across the product family • Built two testbenches • Verification – runs the full set of tests and mimics validation • User – runs fewer tests, for incorporation into a larger system design
RT Development, p.2 • Verification and Validation • Stable, tested code with reviewed test results • Check corner cases and key parameters • Make sure parity errors are injected on every bit • 12 and 16 MHz; 12 is the harder case due to clock extraction • Tested against an existing MIL-STD-1553 COTS tester and • Certified Development Kit at Test Systems, Inc. • Completely for 16 MHz and partially for 12 MHz • Validated Core1553 Evaluation Board • This is important to use with the verification test bench for future updates • Release gives first-rate integration • Core builds complete, board release, release note, user guide, data sheet, certification papers • Solution improves integration • Developed application note, reference design and example designs since 2002
Updates for Speed and Space • Added 20 and 24 MHz in early 2005 (v2.2) • Manchesters validated in full-featured BC, RT, MT core • Moved CLKSPD generic to 2-bit input port • Allows single netlist to support four frequencies • Modified top-level and backend timers • Updated test benches for 20 and 24 MHz and port maps • Fixed erroneous SYNCOUT pulses • Occur with some non-Actel transmitters on the bus • Updating for space in late 2005 (v3.0) • Protect the core from entering illegal states • Hardware test for a babbling transmitter • Re-qualify the core at Test Systems, Inc.
Severe Environment Considerations • Level 3 verification minimum; level 4 validation • MIL-STD-1553 cores have 3rd-party review at Test Systems, Inc. • Requires a validation report review: actions and responses • Have a certification envelope: test VHDL and Verilog versions at different speeds • Have exceptional documentation and support • Tool flow documented with versions for exact design replication • Minimize the possibility of integration engineer problems • High coverage standards and well-explained variances • Code coverage target of 100% for RTL • Consider using error detection and correction for memory • Protect the core from entering illegal states and from memory upsets • The Synplicity default could lock up if upset by an SEU • Adds redundancy and reduces risk • Use EDAC for memory • Avoid the possibility of a babbling transmitter • Can occur if a redundant system fails • Continuously investigate other means to improve quality • Over-sampling • The need for incorporating DO-254
MIL-STD-1553B Tool Issues • Limit tools and document them for validated cores • Version 3.0 core will be qualified in hardware with • Synplicity 8.1 used for synthesis • Designer 6.2 used for layout • ModelSim 6.0c Actel OEM used for simulation • So what happens if a customer uses • Exemplar, or even Synplicity 7.71? • The qualification is not repeatable… • The customer still needs to qualify their system • IP vendors should document which tool versions are used for qualified IP cores intended for severe environments, for • Repeatability • Re-use
Code Coverage • A way to prove that the test benches actually test all the designed-in functions • Lets you verify that all lines of code are covered • Today’s tools allow • Statement coverage • Branch coverage • Condition coverage • Expression coverage • Toggle coverage • BUT • Does not guarantee that the design actually implements the specification • A function may be missing from both the core and the testbench
Core1553BRT Code Coverage • Modular core design allows us to create tests that exercise a particular portion of code • Verification testbench reaches >99% • Non-covered lines are inspected and verified; they are typically conversion functions or branches that are coded purely for safety
Coverage is Actually 100%
• Branch coverage does not report 100%, but it effectively is
• The reason is safe coding: the code checks conditions before acting. These conditions are always true, but the code is better and safer with the checks in place
• Other uncovered branches look like this:

    when INIT =>
      case MUXSEL is
        when "000"  => DSTATE <= WRITE0;   -- RX Mode Code
        when "010"  => DSTATE <= TXSTAT;   -- TX Mode Code
        when "001"  => DSTATE <= WRITE0;   -- RX Data Transfer
        when "011"  => DSTATE <= TXSTAT;   -- TX Data Transfer
        when "100"  => DSTATE <= WRITE0;   -- Bcast RX Mode Code
        when "110"  => DSTATE <= MSGSTAT;  -- Bcast TX Mode Code
                       LATCHSW <= '1';
        when "101"  => DSTATE <= WRITE0;   -- Bcast RX Data Transfer
        when "111"  => DSTATE <= MSGSTAT;  -- Bcast TX Data Transfer
                       LATCHSW <= '1';
        when others =>
      end case;

• The others branch is never taken, because the valid states 0-7 are all listed above, but the VHDL language requires every possible value to be covered, including "ZZZ" in std_logic
• This could be rewritten as follows, which would give 100% coverage but whose meaning is not so obvious:

    when INIT =>
      case MUXSEL is
        when "000"  => DSTATE <= WRITE0;   -- RX Mode Code
        when "010"  => DSTATE <= TXSTAT;   -- TX Mode Code
        when "001"  => DSTATE <= WRITE0;   -- RX Data Transfer
        when "011"  => DSTATE <= TXSTAT;   -- TX Data Transfer
        when "100"  => DSTATE <= WRITE0;   -- Bcast RX Mode Code
        when "110"  => DSTATE <= MSGSTAT;  -- Bcast TX Mode Code
                       LATCHSW <= '1';
        when "101"  => DSTATE <= WRITE0;   -- Bcast RX Data Transfer
        when others => DSTATE <= MSGSTAT;  -- Bcast TX Data Transfer
                       LATCHSW <= '1';
      end case;

• There is a trade-off here between coverage and readability
• In the first example it is clear what the "111" condition does; not so in the second
• They synthesize to the same circuit
Coverage: From 99% to 100% • Getting the last 1% of coverage is time-consuming • Especially in designs that include lots of error detection and recovery logic • Often, in attempting to do this, you will by accident force the design into an unexpected state that highlights an issue • Core1553BRT • In going from 99% to 100% we discovered a gap in checking the looped-back data during transmission: if the last word of a burst (Data or Status) contained all zeros and a Manchester error was introduced by the transceiver, we did not detect the error • We did detect Manchester errors alone • We did detect data errors alone • Additional tests have now been added to the test benches to verify this in all future releases
Safe State Machines • Although space FPGAs incorporate redundancy through triple flip-flops and voting, RTL code also needs to be safe • Commercial FPGA synthesis tools can generate ‘unsafe’ state machines • Optimized for small area or speed • One-hot state machines by default • Some have an option for safe state machines • Make sure all illegal states are covered • BUT HOW DO YOU PROVE IT IS SAFE? • For example, beware of hidden illegal conditions in the code, like counters that count to a value and reset • What happens if the count toggles to a value greater than the reset condition? • In reality: design redundancy in and test it • Fix the state encoding • Synthesis tool independent • Make test benches to force illegal states
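The counter hazard raised above can be illustrated with a minimal sketch. The entity below is hypothetical and is not taken from Core1553BRT: the names safe_counter, TERMINAL and cnt_error, and the modulo-10 range, are assumptions chosen only to show the idea of treating any value above the terminal count as an illegal state rather than testing for equality alone.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: a modulo-10 counter hardened against upsets.
entity safe_counter is
  port (
    clk       : in  std_logic;
    rstn      : in  std_logic;                     -- active-low asynchronous reset
    count     : out std_logic_vector(3 downto 0);
    cnt_error : out std_logic                      -- one-clock pulse on an illegal count
  );
end entity safe_counter;

architecture rtl of safe_counter is
  constant TERMINAL : unsigned(3 downto 0) := to_unsigned(9, 4);
  signal cnt : unsigned(3 downto 0);
begin
  process (clk, rstn)
  begin
    if rstn = '0' then
      cnt       <= (others => '0');
      cnt_error <= '0';
    elsif rising_edge(clk) then
      cnt_error <= '0';
      if cnt > TERMINAL then
        -- Values 10 to 15 are never produced by normal counting,
        -- so reaching one is treated as an upset: recover and flag it.
        cnt       <= (others => '0');
        cnt_error <= '1';
      elsif cnt = TERMINAL then
        cnt <= (others => '0');                    -- normal wrap-around
      else
        cnt <= cnt + 1;
      end if;
    end if;
  end process;

  count <= std_logic_vector(cnt);
end architecture rtl;
```

With an equality-only test, a counter upset to 12 would keep counting through 15 and wrap before reaching the reset value again. As with the state machines, the recovery branch only helps if it survives synthesis, so it would still need to be exercised by forcing the register in simulation.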
Safe State Machines: Design
• Hard-code states using bit_vectors
• Make sure all 2**N values are specified in the case statement
• Do not use an others clause; list all states
• Simulator will warn if you’ve forgotten any states
• Using bit_vector means that you need not worry about the ‘X’ and ‘Z’ branches in the case
• In illegal states
• Clear critical signals, e.g. transmit enable
• Send the FSM back to the IDLE state
• Create an FSM_ERROR output
• One for each state machine
• Synthesis
• Make sure state registers are not duplicated; if they are, you may not detect the illegal state
• Make sure any FSM optimization in the synthesis tool is disabled

    -- RT Data word transfers signals
    -- Hard encoded for safe state machines
    signal DSTATE : bit_vector(3 downto 0);
    constant IDLE    : bit_vector(3 downto 0) := "0000";
    …..
    constant ALLDONE : bit_vector(3 downto 0) := "1100";
    constant UNUSED0 : bit_vector(3 downto 0) := "1101";
    constant UNUSED1 : bit_vector(3 downto 0) := "1110";
    constant UNUSED2 : bit_vector(3 downto 0) := "1111";

    attribute syn_preserve  of DSTATE : signal is true;
    attribute syn_encoding  of DSTATE : signal is "original";
    attribute syn_replicate of DSTATE : signal is false;

    case DSTATE is
      ….
      when UNUSED0 | UNUSED1 | UNUSED2 =>
        FSMD_ERROR <= '1';
        DSTATE     <= IDLE;
        -- clear critical controls
        BENDREQ <= '0';
        ENC_STB <= '0';
        DBUSY   <= '0';
        CMDDONE <= '0';
    end case;
Safe State Machines: Testing
• How do you prove that the resultant netlist includes the safe state machine?
• Identify the state registers in the netlist
• Using the simulator, force the state register to all states
• Reset the core after each test to prevent side effects of forcing states
• Verify that the FSM_ERROR output is asserted

    printf("Testing Main State Machine - 16 states, 13-15 Illegal");
    for state in 0 to 15 loop
      resetcore(RSTNOW,CLK16);
      printf(" Testing State %d : Restart by typing : do forcefsm.do 0 %04b",fmt(state)&fmt(state));
      assert FALSE report "Ignore ERROR, restart simulation ^^^^^^" severity ERROR;
      -- before restarting state machine is forced to the illegal state
      wait for 1 us;  -- allow time for tcl script to force error
      check_state(state, (state>=13), status, ERR);
    end loop;
    resetcore(RSTNOW,CLK16);

    ------------------------------------------------------------
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_3/Q $state_bit3 0
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_2/Q $state_bit2 0
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_1/Q $state_bit1 0
    force -deposit sim:/tbench/u12__0/uut1/DSTATE_0/Q $state_bit0 0
Safe State Machines Results and Memory Protection • Has an effect on gate count and performance compared to normal implementation flows • Causes a 7% increase in gate count • Causes a 1% drop in performance • But still fits in the device and meets performance requirements • Memory Usage • Make sure that EDAC memory is used • Consider scrub rates, etc. • Avoid memory where possible, because it is more easily upset by radiation
What is a ‘Babbling’ Transmitter? • Requirements • All RTs are required to monitor their outputs to detect whether they are babbling and, if so, stop; this is referred to as a Fail-Safe Timer • If babbling is detected by the bus controller, it sends a message to the terminal on the other bus to stop the babbling transmitter • How can an RT babble? • Two errors (failures) have to occur within the terminal: • The logic that controls the enable signal to the transmitter has to fail, and second, • The terminal's fail-safe timer (maximum of 800.0 microseconds) has to fail • Some designs use a digital counter for the fail-safe timer; there, a single failure in a clock line could cause a babbling transmitter
Avoid Babbling Transmitter: Design
Transmit Timeout
• MIL-STD-1553 requires that a separate circuit monitors the transmissions and stops the transmitter if a babbling transmission is detected, i.e. more than 33 words transmitted
• Even though the protocol state machines may never theoretically cause this, it is a requirement to include this logic
• A separate circuit monitors the transmit enables and detects if they are active for more than 680us
• If it triggers, the enable to the external transceiver is disabled and an error condition is generated

    process(CLKSPD)
    begin
      case CLKSPD is
        when "00"   => HWTIMVALUE <= "0100001";  -- 12MHz
        when "01"   => HWTIMVALUE <= "0101011";  -- 16MHz
        when "10"   => HWTIMVALUE <= "0110110";  -- 20MHz
        when others => HWTIMVALUE <= "1000001";  -- 24MHz
      end case;
    end process;

    PTXTTIM:
    process(CLK,RSTn)
      variable TXT_TIMER : std_logic_vector(14 downto 0);
    begin
      if RSTn='0' then
        TXT_TIMER := ( others => '0');
        TXT_ERROR <= '0';
      elsif CLK'event and CLK='1' then
        TXT_ERROR <= '0';
        if TXT_TXBUSY='1' then
          TXT_TIMER := TXT_TIMER + 1;
        else
          TXT_TIMER := ( others => '0');
        end if;
        if TXT_TIMER(14 downto 8) = HWTIMVALUE then
          TXT_ERROR <= '1';
        end if;
      end if;
    end process;
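As a cross-check on the timer constants, the trip time can be worked out by hand. This assumes that the comparison on TXT_TIMER(14 downto 8) first matches once TXT_TXBUSY has been continuously high for HWTIMVALUE × 256 clocks, which is how the code reads but is not stated on the slide. The symbols N (HWTIMVALUE as an unsigned integer) and f (clock frequency) are introduced here only for illustration.

```latex
\[
  t_{\mathrm{trip}} = \frac{256\,N}{f}
\]
\begin{align*}
  f = 12\ \mathrm{MHz},\ N = 0100001_2 = 33 &: \quad t_{\mathrm{trip}} = \frac{8448}{12\ \mathrm{MHz}} = 704.0\ \mu\mathrm{s} \\
  f = 16\ \mathrm{MHz},\ N = 0101011_2 = 43 &: \quad t_{\mathrm{trip}} = \frac{11008}{16\ \mathrm{MHz}} = 688.0\ \mu\mathrm{s} \\
  f = 20\ \mathrm{MHz},\ N = 0110110_2 = 54 &: \quad t_{\mathrm{trip}} = \frac{13824}{20\ \mathrm{MHz}} = 691.2\ \mu\mathrm{s} \\
  f = 24\ \mathrm{MHz},\ N = 1000001_2 = 65 &: \quad t_{\mathrm{trip}} = \frac{16640}{24\ \mathrm{MHz}} \approx 693.3\ \mu\mathrm{s}
\end{align*}
```

Under that reading, all four settings trip above the 680 µs detection figure and below the 800 µs fail-safe maximum quoted for the terminal.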
Babbling Transmitter Testing • How do you test this? • Protocol state machines do not do this in normal operation • Create a test mode input: TESTTXTTOUT • Modifies the protocol state machine • When high, causes >32 data words to be transmitted • Test benches set this and verify that the core detects the babbling transmitter • Allows testing, but does this create an additional failure mechanism? • The input may be pulled inactive by an external resistor; if this were to fail, the core would fail • External input can be disabled • Can remove the logic from the core to prevent this error condition • Synthesis will remove the error injection logic
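One way to picture "external input can be disabled" and "synthesis will remove the error injection logic" is a build-time switch that ties the test request to a constant. The sketch below is an assumption about how that could be done, not the core's actual mechanism: the entity test_gate, the generic TEST_LOGIC and the signal force_long are hypothetical names; only TESTTXTTOUT comes from the slide.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper around the test-mode pin.
entity test_gate is
  generic (
    TEST_LOGIC : boolean := false   -- assumed switch: true only for verification builds
  );
  port (
    TESTTXTTOUT : in  std_logic;    -- external test-mode input named on the slide
    force_long  : out std_logic     -- assumed internal request to over-run the transmitter
  );
end entity test_gate;

architecture rtl of test_gate is
begin
  -- Verification build: the pin reaches the protocol state machine.
  gen_on : if TEST_LOGIC generate
    force_long <= TESTTXTTOUT;
  end generate gen_on;

  -- Flight build: the request is a constant '0', so everything that
  -- depends on it can be pruned by synthesis and the pin has no effect.
  gen_off : if not TEST_LOGIC generate
    force_long <= '0';
  end generate gen_off;
end architecture rtl;
```

The trade-off is that the flight netlist then differs from the one that was tested with error injection, so the hardware timeout itself would still need to be covered in the qualified build by some other means.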
Another Consideration: Over-Sampling • Some systems can be improved by over-sampling input streams • Then filtering or voting • 1553B • Already has a well-protected data stream • Manchester coding • “00” and “11” patterns are error conditions • Parity on data words • Core1553BRT • Samples incoming data at 6X, 8X, 10X or 12X the base 2 MHz rate • Required for clock extraction and the ability to handle 1553B jitter and noise requirements • Additional over-sampling is not implemented at present because • As is, Core1553BRT meets all requirements of the 1553B RT test • Would require higher-speed clocks • Higher power consumption • Larger device • Would require a major redesign • A major redesign adds risk
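The two built-in protections listed above can be sketched as a small package. This is an illustration rather than the Core1553BRT decoder: the package and subprogram names are hypothetical, and the polarity ("10" for logic 1, "01" for logic 0) and the use of odd parity over the 16 data bits plus the parity bit reflect the usual reading of MIL-STD-1553B, which the slides do not spell out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package; names are illustrative only.
package m1553_check_pkg is
  -- Decode one bit time from its two half-bit samples (first half, second half).
  -- valid = '0' flags the illegal "00" and "11" patterns.
  procedure decode_pair (
    halves : in  std_logic_vector(1 downto 0);
    bit_o  : out std_logic;
    valid  : out std_logic
  );

  -- Returns '1' when data plus parity contain an odd number of ones.
  function parity_ok (
    data   : std_logic_vector(15 downto 0);
    parity : std_logic
  ) return std_logic;
end package m1553_check_pkg;

package body m1553_check_pkg is
  procedure decode_pair (
    halves : in  std_logic_vector(1 downto 0);
    bit_o  : out std_logic;
    valid  : out std_logic
  ) is
  begin
    case halves is
      when "10"   => bit_o := '1'; valid := '1';  -- high then low: logic 1 (assumed polarity)
      when "01"   => bit_o := '0'; valid := '1';  -- low then high: logic 0
      when others => bit_o := '0'; valid := '0';  -- "00", "11" or a metavalue: Manchester error
    end case;
  end procedure decode_pair;

  function parity_ok (
    data   : std_logic_vector(15 downto 0);
    parity : std_logic
  ) return std_logic is
    variable acc : std_logic := parity;
  begin
    -- XOR of all 17 bits is '1' exactly when the number of ones is odd.
    for i in data'range loop
      acc := acc xor data(i);
    end loop;
    return acc;
  end function parity_ok;
end package body m1553_check_pkg;
```

Because every bit time is checked for a mid-bit transition and every word for parity, a single corrupted half-bit is normally caught without extra voting, which is consistent with the slide's point that additional over-sampling was not needed to pass the RT test.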
RTCA/DO-254: Design Assurance Guidance for Electronic HW • Advisory Circular 20-152 • Ratified 6/30/05, calls for DO-254 compliance for design assurance levels A, B or C • DO-254 standard originally developed in 2000 • DO-254 is a hardware standard, and IP is hardware • There are many misunderstandings about this standard • So far, there is no precedent for DO-254 certified IP • We are focusing on section 10 and considering providing Hardware Design Life Cycle Data for relevant cores • What does it require? • A DO-254 development flow in addition to the ISO-certified flow • More documentation • It forces the discipline to follow a test plan and document against that plan • PHAC and HAS are important elements • Without this, customers treat our IP as COTS products (section 11)
Lessons Learned • High quality = attention to detail • You cannot do too much verification for IP in severe environments • We found a bug while increasing code coverage from 98% to 100% • Have gate reviews backed with data • Document variations from perfect • For example, if code coverage is 99%, understand why • Experience matters • Design • Products • Customers • There needs to be a way to add objectivity to verification • Against a tester • By a third party • Have another person review the code or perform verification • You can always improve • Core originally tested at multiple speeds, but not in multiple languages • DO-254 adds discipline to the development process
Conclusion • Pre-built and verified IP can reduce risk, if • A structured, robust development process is followed • Phase-gate process, even if simplified • Additional concerns for severe environments are considered • Safe state machines • Redundant check for babbling • Verification and validation are demonstrated • Code coverage near 100% • Certification of demonstration board design • Deliverables and documentation ease use • Helps integration and design re-use • Many customers prove the core in a variety of environments • More than any one company can do on its own
Conclusion: Block-based Design Enables Development • Spacecraft I/O Board Example • [Figure: block diagram of a spacecraft I/O board. Labels: PCI bus to instrument panel; 1553 bus to rest of craft; Shared Memory (on or off-chip); PCI; 1553 RT; ASM51 MCU (8051); Memory Data Bus; Special Function Register Bus; Serial Channel; Prog. I/O; Synchronous Serial Channel (SDLC); Asynchronous Serial Channel (UART); Remote Monitor; Sensor Module; Data Transfer Port; Avionics Control Port]