Review of a Mission-Critical, Digital System for an Air Force Project

Rich Katz¹, Rod Barto¹, and Kevin Hames² (¹ NASA Office of Logic Design; ² NASA Johnson Space Center)

Presentation Transcript


  1. Review of a Mission-Critical, Digital System for an Air Force Project
     Rich Katz¹, Rod Barto¹, and Kevin Hames²
     ¹ NASA Office of Logic Design
     ² NASA Johnson Space Center

  2. Introduction
     • January 2003: Air Force request to the NASA Office of Logic Design to independently assess an electronics design.
     • Rapid, sampled assessment performed.
     • Review subject: safety of critical missile electronics containing an EEPROM-based Programmable Logic Device (PLD).

  3. Simplified Block Diagram
     [Figure: simplified block diagram. Labels: Power; Inputs; Clock; JTAG Control; EEPROM-based PLD; … “Stuff”; Rocket Motor.]
     All lines high = “Bad”. Is it safe?
     Note: The PLD and many other devices are consumer-grade COTS.

  4. Safety Criteria and Assessment
     • No Single Point Failure Requirement
     • Requirement: Probability (Mishap) < 10⁻⁶
     • Air Force Criteria
     • Contractor calculated reliability orders of magnitude better than requirement
     • Contractor could not show work: multiple numbers in different places.
     • Did not account for all single point and common mode failures, as will be shown
     • Safety has issues
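The effect of leaving failure modes out of a safety calculation can be sketched numerically. The snippet below is only an illustration: every probability in it is a hypothetical placeholder (the slides give no contractor figures), and it assumes independent single-point failure modes, any one of which causes a mishap. The only number taken from the slide is the 10⁻⁶ requirement.

```python
from math import prod

REQUIREMENT = 1e-6  # from the slide: Probability(Mishap) < 10^-6


def mishap_probability(failure_probs):
    """P(at least one failure occurs), assuming independent failure modes."""
    return 1.0 - prod(1.0 - p for p in failure_probs)


# Hypothetical values, chosen only to illustrate the arithmetic.
analyzed_modes = [1e-9, 2e-9, 5e-10]  # modes included in the analysis
omitted_modes = [1e-5]                # one overlooked single-point failure

claimed = mishap_probability(analyzed_modes)
actual = mishap_probability(analyzed_modes + omitted_modes)

print(f"claimed: {claimed:.2e}  meets requirement: {claimed < REQUIREMENT}")
print(f"actual:  {actual:.2e}  meets requirement: {actual < REQUIREMENT}")
```

With these placeholder values, the claimed figure sits orders of magnitude inside the requirement, yet a single omitted term dominates the total and pushes it outside. A common mode failure would additionally break the independence assumption used here.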

  5. Some Technical Areas Examined
     • Testability and JTAG
     • Power Supply
     • Timing Analysis
     • Finite State Machines
     • Proper Termination of Device Pins
     • Device Configuration Retention
     • Quality and Manufacturer’s Position
     • Synchronization and Metastable States

  6. IEEE JTAG 1149.1 Interface
     [Figure: JTAG interface to the PLD. Annotations: TRST* line not implemented. Static: not “trying” to drive the TAP Controller into the TEST-LOGIC-RESET state.]
     • Both devices here are consumer-grade COTS
     • 54ABT8996 is available but not used
     • PLD internals not tested by Built-In-Test (BIT)

  7. I/O Pin Structure
     • Structure common to all I/Os of interest
     • Common mode failures
         • “MODE” line
         • Instruction Register
         • TAP Controller
         • External JTAG control
     • TCLK not running with TMS=‘1’ to mitigate
     [Figure: I/O element. Labels: I/O Cell Circuitry; JTAG Circuitry; Update Register; Q; Real Data Blocked.]

  8. IEEE JTAG 1149.1 TAP Controller and Instruction Register
     [Figure: TAP Controller (state machine) and instruction register. Labels: TCK; TRST* (optional); TDI; TDO; Shift CLK; Shift Register; Reset; Parallel Latch; Latch; Chip Control.]
     • Shift Register is undefined in the TEST-LOGIC-RESET state
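The remark that TCLK is not running with TMS=‘1’ can be made concrete with a small model of the standard 16-state TAP controller diagram from IEEE 1149.1. This is a sketch of the published state diagram only, not of the contractor's device; the names `TAP`, `clock`, and `edges_to_reset` are illustrative. It shows that holding TMS high returns the controller to Test-Logic-Reset from any state, but only if clock edges actually arrive.

```python
# state -> (next state if TMS=0, next state if TMS=1), sampled on a rising TCK edge
TAP = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def clock(state, tms):
    """Advance the controller by one rising TCK edge with the given TMS level."""
    return TAP[state][tms]


def edges_to_reset(state):
    """Count TCK edges needed to reach Test-Logic-Reset with TMS held at 1."""
    edges = 0
    while state != "Test-Logic-Reset":
        state = clock(state, 1)
        edges += 1
    return edges


worst_case = max(edges_to_reset(s) for s in TAP)
print("worst-case TCK edges with TMS=1:", worst_case)
```

The synchronous path to Test-Logic-Reset needs up to five clock edges. With the clock static and TRST* not implemented, this model has no transition at all, so a controller that leaves Test-Logic-Reset for any reason stays where it is.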

  9. Power Supply
     • The manufacturer states, in its “Operating Requirements for XXXXX Devices Data Sheet”: “... Slower rise times can cause incorrect device initialization and functional failure.”
     • Power rise time requirements are not known to the Project, nor are they in the data sheet. It cannot be shown that the device will properly initialize and function.

  10. Timing Analysis
      • Logic design described in VHDL and synthesized
      • Examination of the tool’s output file showed:
          ** PROJECT TIMING MESSAGES **
          Warning: Found ripple clock -- warning messages and Report File information on tco, tsu, and fmax may be inaccurate
      • No obvious ripple counters in the design. Contractor engineers had not examined the output file and could not explain either the apparent presence of a ripple counter or the impact, if any, of the warning message above.

  11. Counter: Unused States

          -- SYNCHRONIZATION CONTROLLER *************************************************
          sync_ctrl: PROCESS (g_rst_l_pin, g_clk_pin)
          BEGIN
            IF (g_rst_l_pin = '0') THEN  -- NO GLOBAL PRESET IN PLD
              ...
            ELSIF (g_clk_pin'EVENT AND g_clk_pin = '1') THEN  -- CLOCK RISING EDGE
              -- SYNCHRONIZATION COUNTER
              IF (sync_cnt_rst_l = '0') THEN  -- RESET
                sync_cnt <= 0 ;
              ELSE  -- INCREMENT
                sync_cnt <= (sync_cnt + 1);
              END IF;
              -- SYNCHRONIZATION COUNTER RESET
              IF (sync_cnt = 48) THEN  -- RESET
                sync_cnt_rst_l <= '0';
              ELSE  -- DO NOT RESET
                sync_cnt_rst_l <= '1';
              END IF;

      Unused states not defined
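The unused-state concern can be explored with a cycle-by-cycle model of the two registers in the excerpt above. This is a sketch under stated assumptions: the slide does not show the declaration of `sync_cnt`, so the model assumes a 6-bit register in natural binary that wraps from 63 to 0. The synthesized encoding in the real device may differ, which is part of the point the slide makes.

```python
WIDTH = 6            # assumed register width; not shown on the slide
MOD = 1 << WIDTH     # 64 possible counter states under that assumption


def step(cnt, rst_l):
    """One rising clock edge; both registers update from their old values."""
    next_cnt = 0 if rst_l == 0 else (cnt + 1) % MOD
    next_rst_l = 0 if cnt == 48 else 1
    return next_cnt, next_rst_l


def reachable_counts(edges=200):
    """Counter values visited in normal operation, starting from reset."""
    cnt, rst_l = 0, 1
    seen = {cnt}
    for _ in range(edges):
        cnt, rst_l = step(cnt, rst_l)
        seen.add(cnt)
    return seen


def edges_until_reset_pulse(cnt, rst_l=1):
    """Clock edges until sync_cnt_rst_l next goes low, from a given state."""
    edges = 0
    while True:
        cnt, rst_l = step(cnt, rst_l)
        edges += 1
        if rst_l == 0:
            return edges


used = reachable_counts()
unused = sorted(set(range(MOD)) - used)
print("used states:", min(used), "to", max(used))
print("unused states:", unused)
print("edges to reset pulse from 0: ", edges_until_reset_pulse(0))
print("edges to reset pulse from 55:", edges_until_reset_pulse(55))
```

Under these assumptions the counter normally cycles through 0 to 49 and never visits 50 to 63. If an upset places it in one of those states, it recovers only by counting up and wrapping, so the next reset pulse arrives late relative to the upset. With a different encoding, or logic minimized around the unused states, the behavior could be different again, which is why it needs to be analyzed rather than left undefined.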

  12. Proper Termination of Device Pins

  13. Proper Termination of Device Pins
      From the data sheet: “During in-system programming, each device's VPP pin must be connected to the 5.0-V power supply. During normal device operation, the VPP pin is pulled up internally and can be connected to the 5.0-V supply or left unconnected.”
      The contractor has had a significant number of devices fail to program (15 out of 250, or 6%); the cause is not known. However, the manufacturer states in the data sheet: “XXXX EPLDs are fully functionally tested. Complete testing of each programmable EEPROM bit and all logic functionality ensures 100% programming yield.”
      There is no mechanism in the system, as designed, to verify the quantity of charge stored in the EEPROM cells. There is no provision for testing whether the logic configuration will be correct, which requires the correct state of all EEPROM cells.
      The functionality and safety of the system, after decades of storage, cannot be guaranteed.

  14. Quality and Manufacturer’s Position
      “XXXXXX's products are not authorized for use as critical components in life support devices or systems without the express written approval of the president of XXXXXX Corporation. As used herein:
        1. Life support devices or systems are devices or systems that (a) are intended for surgical implant into the body or (b) support or sustain life and whose failure to perform when properly used in accordance with instructions for use provided in the labeling can be reasonably expected to result in a significant injury to the user.
        2. A critical component is any component of a life support device or system whose failure to perform can be reasonably expected to cause the failure of the life support device or system, or to affect its safety or effectiveness.”

  15. Other Issues

  16. Unterminated CMOS Inputs

  17. Unterminated RS-422 Clock
      [Figure: signal traces. Labels: CLOCK; DATA.]

  18. Transition Time Into PLD
      Reset I/F circuit (other inputs similar)
      [Figure: reset interface circuit. Labels: To PLD; 10 kΩ pull-up.]

  19. Reset Input: Transition Time
      Drilling Down Into Device
      PLD spec is 40 ns. tR >> 40 ns.
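A rough calculation suggests why a 10 kΩ pull-up makes a 40 ns transition limit hard to meet. The sketch below assumes the rising edge is set by the pull-up alone charging the node capacitance (a single-pole RC), and that the 40 ns figure is a 10% to 90% rise time; neither detail is stated on the slides. The 10 pF node capacitance is a hypothetical value chosen for illustration.

```python
import math

R_PULLUP = 10e3   # ohms, from the slide
T_SPEC = 40e-9    # seconds, PLD input transition limit from the slide


def rise_time_10_90(r_ohms, c_farads):
    """10%-90% rise time of a single-pole RC node: ln(9) * R * C."""
    return math.log(9) * r_ohms * c_farads


def max_capacitance(r_ohms, t_limit):
    """Largest node capacitance that still meets the rise-time limit."""
    return t_limit / (math.log(9) * r_ohms)


C_ASSUMED = 10e-12  # hypothetical node capacitance (pin plus trace), not from the slides

t_rise = rise_time_10_90(R_PULLUP, C_ASSUMED)
c_max = max_capacitance(R_PULLUP, T_SPEC)

print(f"rise time with 10 pF: {t_rise * 1e9:.1f} ns")
print(f"max capacitance for 40 ns: {c_max * 1e12:.2f} pF")
```

Under these assumptions, the node would have to stay below roughly 2 pF to meet 40 ns, which is less than typical pin and trace capacitance, so a rise time well above the limit is the expected outcome.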

  20. Technical Lessons [Re-]Learned
      • Violations of part manufacturers’ specifications
      • Did not look at tool reports and unable to explain them.
      • High level design methodology used: abstract models.
      • Designers not familiar with the state machine encodings used, implication of unused states.
      • Consumer grade vs. military grade devices
          • Military grade available in some cases: direct substitution
          • Military grade available with proper part choices
          • “Upscreening” issues
      • Input transition time requirements not met.
      • Signal and clock terminations not properly implemented
      • VCC waveform susceptibility

  21. Factors Contributing to Project Problems
      • Original contractor group sold; moved to another state
      • Few original engineers followed and continuity lost
      • New contractor staff not fully cognizant of design
      • Worst case analysis not performed
          • Not a contractual requirement
          • Contractor processes did not require it internally
      • No independent analysis

  22. Factors Contributing to Successful Review by the Independent Assessment Team
      • Safety engineer had issues and concerns; solidly backed by his management.
      • Project worked to rapidly resolve issues and concerns
          • Chartered Independent Assessment Team (IAT)
          • Established and ensured communications and transfer of data between the design group and IAT
          • Took a neutral technical position and let the IAT perform its independent assessment using its own methods.
      • Contractor, Project, Safety, and IAT all had safety and mission success as primary goal.
      • Consensus rapidly achieved (hours) and efforts turned to improving system.

  23. Conclusions
      • Small, seemingly inconsequential design errors can have a major impact on safety and/or mission success
          • Cannot be determined by test
      • PLD (and other) device mechanisms must be well understood to construct an adequate model for analysis
      • Designers never saw their “design”; shielded by abstracted models
          • Hardware components
          • Logic designed by software that does not know circuit criticality
      • Configuration storage mechanisms in the selected reconfigurable devices needlessly decrease reliability and system safety
          • It is noted that the PLD was not intended to be reconfigured in system.

  24. Conclusions (cont’d)
      • Root cause of all failures must be determined as rapidly as practical.
          • “Acceptable yields” do not mean an acceptable product for critical systems.
      • A number was presented for system safety.
          • Not all single point and common mode failures were considered, invalidating it.
      • Unfamiliar with the implication of JTAG 1149.1 maloperation.
          • Contractor knew how it worked but not how it failed.
