Test Methodology for Characterizing the SEE Sensitivity of a Commercial IEEE 1394 Serial Bus (FireWire)
Christina Seidleck, Raytheon ITSS, Lanham, MD
Stephen Buchner, QSS, Landover, MD
Hak Kim, Jackson & Tull, Washington, DC
P.W. Marshall, Consultant, Brookneal, VA
Kenneth LaBel, NASA GSFC, Greenbelt, MD

Outline: Abstract • Introduction • Typical PC-Based Implementation • The Protocol Layers • What Was Tested? • Modes of Operation • Packet-Based Transactions • Test Part Function and Accessibility • Radiation Characterization • Radiation Test Hardware Setup • Radiation Test Hardware Diagram • Radiation Test Software • Software Flow for Asynchronous Mode • Software Flow for Isochronous Mode • Two Main Types of Errors Observed • Example of a Soft Error • Example of a Hard Error • SEFIs Categorized by Steps Required to Start Communications • Results: LLC Asynchronous Mode • Results: PHY Asynchronous Mode • Results: LLC Irradiated, Hard Errors • Results: LLC Irradiated, Soft Errors • Results: PHY Irradiated, Hard Errors • Conclusions
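Abstract
The Single Event Effect (SEE) responses of two commercial FireWire serial buses based on the IEEE 1394 standard were tested with heavy ions and protons. A test approach is presented that categorizes the observed SEEs as soft errors or single event functional interrupts (SEFIs), with SEFIs further classified by the steps required to restore communications.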
Introduction and Background
• IEEE 1394 is a formal description of the architecture called FireWire, originally developed by Apple Computer. FireWire is an advanced serial bus used to connect many high-performance devices.
• Why FireWire?
  • Less expensive alternative to parallel buses: a variety of devices can connect directly to a single serial bus, with up to ~4.5 meters allowed between devices (cable implementation)
  • Backplane and cable implementations supported: only the cable implementation is presented here
  • Plug and play support: devices are configured automatically without intervention from the host system
  • Scalable performance: transfer rates of 100 Mb/s, 200 Mb/s, and 400 Mb/s
  • Attachment of up to 63 nodes on a single serial bus
  • Two transmission modes: isochronous and asynchronous
  • Peer-to-peer transfers: data can be transferred between individual nodes without intervention from the host system
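The 63-node limit follows from how IEEE 1394 addresses nodes: a 16-bit node ID consists of a 10-bit bus ID and a 6-bit physical ID, and physical ID 63 (all ones) is reserved for broadcast, which leaves physical IDs 0 through 62 for real nodes. The sketch below packs and unpacks node IDs under that layout; the constant and function names are hypothetical, not taken from any driver API.

```python
BUS_ID_BITS = 10
PHY_ID_BITS = 6
BROADCAST_PHY_ID = (1 << PHY_ID_BITS) - 1  # 63: reserved for broadcast
LOCAL_BUS_ID = (1 << BUS_ID_BITS) - 1      # 1023: refers to the local bus
MAX_NODES = BROADCAST_PHY_ID               # physical IDs 0..62 -> 63 nodes per bus


def make_node_id(bus_id: int, phy_id: int) -> int:
    """Pack a 10-bit bus ID and a 6-bit physical ID into a 16-bit node ID."""
    if not 0 <= bus_id <= LOCAL_BUS_ID:
        raise ValueError("bus_id must fit in 10 bits")
    if not 0 <= phy_id <= BROADCAST_PHY_ID:
        raise ValueError("phy_id must fit in 6 bits")
    return (bus_id << PHY_ID_BITS) | phy_id


def split_node_id(node_id: int) -> tuple[int, int]:
    """Return (bus_id, phy_id) for a 16-bit node ID."""
    return node_id >> PHY_ID_BITS, node_id & BROADCAST_PHY_ID


print(hex(make_node_id(LOCAL_BUS_ID, 1)))  # 0xffc1: node 1 on the local bus
```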
Typical PC-Based 1394 Implementation
[Figure: a PC connected over 1394 cables to a digital VCR, a CD-ROM drive, a digital camera, and a laser printer.]
The serial bus allows a variety of high-speed peripheral devices to be attached and supported.
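The Protocol Layers of 1394
[Figure: the 1394 protocol stack. A software driver at the top uses the Bus Management Interface, the Asynchronous Transfer Interface, and the Isochronous Transfer Interface. Below it are the Transaction Layer, the Link Layer (with the Cycle Master), and the Physical Layer, which connects to the serial bus. The Serial Bus Management Layer (Bus Manager, Isochronous Resource Manager, Node Controller) sits alongside these layers.]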
What Was Tested?
• Physical Layer (PHY): 16 internal registers
• Link Layer (LLC): FIFOs, PCI registers, OHCI registers
Commercial 1394 development boards:
• TI: development board TSBKOHCI403; LLC part number TSB12LV26PZT; PHY part number TSB41AB3PFP; lot date codes CA-OAAO45T and OCC4RTT
• NSC: development board CS4210A-DK; LLC part number CS4210VJG; PHY part number CS4103VHG; lot date code VS052ABC4 (LLC and PHY)
Modes of Operation
Asynchronous
• Data transfers target a particular node based on a unique address (one-to-one transfers)
• Data transfers do not require a constant data rate
• Asynchronous transfers are guaranteed a minimum of 20% of the overall bus bandwidth
• Data delivery is verified with acknowledgments, CRC checks, and response codes
• Supports data retransmission
• Used when data integrity is required or critical
Isochronous
• Data transfers target nodes based on the channel number of the transfer (like a broadcast: one-to-many)
• Receiving nodes “listen” to channel numbers to receive data packets
• No acknowledgments or retransmission: packets carry CRCs, but corrupted data is not resent
• Uses constant bandwidth, which is requested from the isochronous resource manager
• Up to 80% of the bus bandwidth can be used for isochronous data transfers
• Used for time-critical, error-tolerant data transfers
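The 80/20 split is enforced within each 125 µs bus cycle. As we understand IEEE 1394, the isochronous resource manager accounts for bandwidth in allocation units (6144 per cycle, about 20.3 ns each) and starts with 4915 units available, roughly 80% of the cycle, so asynchronous traffic keeps the remainder. The class below is a hypothetical, simplified model of that bookkeeping; real nodes update the resource manager's registers with lock (compare-swap) transactions rather than calling methods like these.

```python
CYCLE_NS = 125_000                     # one 125 us isochronous cycle (8 kHz)
UNITS_PER_CYCLE = 6144                 # bandwidth allocation units per cycle
UNIT_NS = CYCLE_NS / UNITS_PER_CYCLE   # ~20.3 ns per allocation unit
INITIAL_AVAILABLE = 4915               # ~80% of the cycle available to isochronous traffic
NUM_CHANNELS = 64                      # 6-bit isochronous channel numbers


class IsochronousResourceManager:
    """Toy model of the IRM's bandwidth and channel bookkeeping (hypothetical)."""

    def __init__(self) -> None:
        self.bandwidth_available = INITIAL_AVAILABLE
        self.free_channels = set(range(NUM_CHANNELS))

    def allocate(self, units: int, channel: int) -> bool:
        """Grant a channel plus constant bandwidth, or refuse with no side effects."""
        if units <= 0 or units > self.bandwidth_available:
            return False
        if channel not in self.free_channels:
            return False
        self.bandwidth_available -= units
        self.free_channels.remove(channel)
        return True

    def release(self, units: int, channel: int) -> None:
        """Return previously granted bandwidth and channel."""
        self.bandwidth_available = min(INITIAL_AVAILABLE, self.bandwidth_available + units)
        self.free_channels.add(channel)


irm = IsochronousResourceManager()
print(irm.allocate(4000, channel=5))  # True
print(irm.allocate(1000, channel=6))  # False: only 915 units remain
print(f"isochronous share: {INITIAL_AVAILABLE / UNITS_PER_CYCLE:.1%}")  # 80.0%
```

Refusing a request outright, instead of granting part of it, mirrors the idea that isochronous streams need a guaranteed constant bandwidth or none at all.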
Packet-Based Transactions
All transactions are transmitted over the bus in packetized form. Different types of packets are defined for asynchronous and isochronous modes.
• Asynchronous packets: reads, writes, locks
  • Sample write request packet fields: destination address, source ID, transaction label, transaction type, data, CRC
  • Sample acknowledge packet fields: acknowledge code, parity
• Isochronous packets: stream data
  • Sample stream packet fields: channel number, transaction type, data, CRC (no acknowledge)
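The header fields named above occupy fixed bit positions. This sketch assumes the IEEE 1394 layout of the first header quadlet of an asynchronous request (destination_ID, transaction label tl, retry code rt, tcode, priority) and of a stream packet (data_length, tag, channel, tcode, sy), with tcode 0x0 for a quadlet write request and 0xA for stream data. The helper names are hypothetical, and the bit positions should be checked against the standard before reuse.

```python
TCODE_WRITE_QUADLET = 0x0  # assumed tcode for a quadlet write request
TCODE_STREAM = 0xA         # assumed tcode for isochronous/stream data


def _check(value: int, bits: int, name: str) -> int:
    """Reject values that do not fit in their header field."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} must fit in {bits} bits")
    return value


def async_quadlet0(dest_id: int, tl: int, rt: int, tcode: int, pri: int = 0) -> int:
    """destination_ID[31:16] | tl[15:10] | rt[9:8] | tcode[7:4] | pri[3:0]"""
    return ((_check(dest_id, 16, "dest_id") << 16)
            | (_check(tl, 6, "tl") << 10)
            | (_check(rt, 2, "rt") << 8)
            | (_check(tcode, 4, "tcode") << 4)
            | _check(pri, 4, "pri"))


def stream_quadlet0(data_length: int, tag: int, channel: int, sy: int = 0) -> int:
    """data_length[31:16] | tag[15:14] | channel[13:8] | tcode[7:4] | sy[3:0]"""
    return ((_check(data_length, 16, "data_length") << 16)
            | (_check(tag, 2, "tag") << 14)
            | (_check(channel, 6, "channel") << 8)
            | (TCODE_STREAM << 4)
            | _check(sy, 4, "sy"))


def field(word: int, hi: int, lo: int) -> int:
    """Extract bits hi..lo (inclusive) of a 32-bit quadlet."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


q = async_quadlet0(0xFFC1, tl=5, rt=1, tcode=TCODE_WRITE_QUADLET)
print(hex(q))             # 0xffc11500
s = stream_quadlet0(data_length=8, tag=1, channel=3)
print(field(s, 13, 8))    # 3: the channel number a listening node filters on
```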
Test Part Function and Accessibility
Link Layer (LLC)
• Functions
  • Forms packets for transmission
  • Provides address decoding for incoming asynchronous packets
  • Provides channel number decoding for incoming isochronous packets
  • Performs CRC error checking
• Registers monitored
  • FIFOs
  • 42 of a possible 102 Open Host Controller Interface (OHCI) registers
  • 21 of a possible 22 PCI registers
Physical Layer (PHY)
• Functions
  • Electrical and mechanical interface for transmission and reception of packets transferred across the bus
  • Arbitration: ensures that only one node at a time transmits on the bus
• Registers monitored
  • Because of the volatility of the 16 PHY registers, none were monitored
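The LLC's CRC check is what turns a bit flip on the wire into a detectable error rather than silently bad data. The sketch below uses a bitwise, MSB-first CRC-32 with generator polynomial 0x04C11DB7 over 32-bit quadlets, the polynomial we understand IEEE 1394 to share with Ethernet; the all-ones initial value and final inversion are assumptions, so the check values it produces are not guaranteed to match real 1394 CRC fields. It only demonstrates that any single flipped bit changes the check value.

```python
POLY = 0x04C11DB7  # CRC-32 generator polynomial (assumed, see lead-in)
MASK = 0xFFFFFFFF


def crc32_quadlets(quadlets: list[int]) -> int:
    """Bitwise, MSB-first CRC-32 over 32-bit quadlets.

    The all-ones initial value and final inversion are assumptions made for
    this illustration, not a statement of the exact 1394 conventions.
    """
    crc = MASK
    for q in quadlets:
        crc ^= q & MASK
        for _ in range(32):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc ^ MASK


def crc_ok(quadlets: list[int], received_crc: int) -> bool:
    """Receiver-side check: recompute the CRC and compare."""
    return crc32_quadlets(quadlets) == received_crc


header = [0xFFC11500, 0xFFC0FFFF, 0xF0000400]  # arbitrary example quadlets
sent_crc = crc32_quadlets(header)
corrupted = header.copy()
corrupted[1] ^= 1 << 26  # single bit flip in transit
print(crc_ok(header, sent_crc), crc_ok(corrupted, sent_crc))  # True False
```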
Radiation Characterization
• Protons (TRIUMF) and heavy ions (BNL and TAMU) were used to test parts from Texas Instruments and National Semiconductor.
• The PHY and LLC chips on the DUT board were irradiated separately.
• The National Semiconductor part underwent destructive latchup when irradiated with ions having an LET of 27 MeV·cm²/mg. Therefore, a full characterization was performed on the TI parts only.
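SEE results are conventionally reduced to an event cross section: the number of events of a given kind divided by the particle fluence, plotted against LET. For a heavy-ion beam tilted by an angle θ from normal incidence, the effective LET is commonly approximated as LET/cos θ. The sketch shows both formulas; the numbers in the usage lines are placeholders, not data from this test.

```python
import math


def cross_section(events: int, fluence_per_cm2: float) -> float:
    """Event cross section in cm^2: events divided by fluence (particles/cm^2)."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence_per_cm2


def effective_let(let: float, tilt_deg: float) -> float:
    """Effective LET under the cosine approximation for a tilted beam."""
    if not 0 <= tilt_deg < 90:
        raise ValueError("tilt must be in [0, 90) degrees")
    return let / math.cos(math.radians(tilt_deg))


# Placeholder values only, not measurements from this test:
print(cross_section(5, 1.0e7))              # 5e-07 (cm^2)
print(round(effective_let(27.0, 60.0), 1))  # 54.0 (MeV*cm^2/mg)
```

The cosine law assumes a thin, slab-like sensitive volume, so it is an approximation rather than an exact conversion.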
Radiation Test Hardware Setup
• Two personal computers (PCs) with PCI slots were used in the test, each with an IEEE 1394 board
• The PC hosting the devices under test (DUTs) was placed in the beam line, while the other was placed in a remote area
• The two PCs were connected through their 1394 interfaces by a 10 ft 1394 cable for data communication
• A PCI bus isolation card was placed between the DUT board and its host PC
  • This card enables readings of the current drawn by the DUT board from the host PC's +5 V supply via the PCI interface
• An HP34401A Digital Multimeter (DMM) was used to read and record this supply current
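Because the results separate errors that came with a supply-current jump (from about 18 mA to more than 44 mA) from those that did not, the DMM log lends itself to automatic screening. The function below is a hypothetical post-processing step over recorded readings in mA; the 30 mA threshold is an illustrative assumption, not a value used in the test.

```python
def find_current_jumps(samples_ma: list[float], threshold_ma: float = 30.0) -> list[int]:
    """Return the indices where the current rises above threshold_ma
    after having been at or below it (rising edges only)."""
    jumps = []
    above = False
    for i, value in enumerate(samples_ma):
        now_above = value > threshold_ma
        if now_above and not above:
            jumps.append(i)
        above = now_above
    return jumps


log = [18.1, 18.0, 18.2, 44.5, 44.7, 18.1, 18.0, 45.0]  # example readings in mA
print(find_current_jumps(log))  # [3, 7]
```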
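Radiation Test Hardware Diagram
[Figure: in the target area, the host PC carries the 1394 DUT board (PHY and LLC) in the beam, mounted through a PCI bus isolation card whose supply current is read by an HP34401A DMM. A 10 ft 1394 cable connects the DUT board to the 1394 board in the remote controller PC (CTRL). Each PC has a monitor, keyboard, and mouse; a laptop is also shown.]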
Radiation Test Software
• Custom device driver software was developed in C++ with Jungo’s WinDriver, targeting a PC running Windows NT 4.0
• The software was an interrupt-driven program that established continuous communications between the DUT and CTRL at 100 Mb/s
• For SEL testing at BNL, no registers were monitored
• For proton testing at TRIUMF, only asynchronous mode was implemented
• For heavy ion testing at TAMU, both asynchronous and isochronous modes were implemented
Software Flow for Asynchronous Mode
Setup (CTRLR and DUT): lock down memory; set node ID; set delay; enable receive buffers; turn on interrupts.
Test loop (both sides are interrupt driven, waiting for an interrupt before acting on a received packet):
• CTRLR: determine the test type (LLC or PHY); form a register data request packet and send it to the DUT's request buffer
• DUT: determine the test type requested; poll the LLC or PHY registers; form a register data response packet and send it to the CTRLR's response buffer
• CTRLR: compare data; log errors; continue the test loop
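The controller's compare-and-log step amounts to XOR-ing each register value returned by the DUT against its expected value and recording every differing bit. A minimal sketch: the register names are hypothetical labels, the expected values are the ones used in the register examples in this document, and handling of bits that change legitimately during operation is left out.

```python
from dataclasses import dataclass


@dataclass
class BitFlip:
    register: str
    bit: int
    expected: int  # value the bit should have had (0 or 1)


def compare_registers(expected: dict[str, int], observed: dict[str, int]) -> list[BitFlip]:
    """Log one BitFlip per differing bit in each 32-bit register."""
    flips = []
    for name, want in expected.items():
        got = observed.get(name)
        if got is None:
            continue  # no reply for this register: a communication problem, not a bit flip
        diff = (want ^ got) & 0xFFFFFFFF
        for bit in range(32):
            if (diff >> bit) & 1:
                flips.append(BitFlip(name, bit, (want >> bit) & 1))
    return flips


golden = {"AsyncRequestFilterLo": 0x00000003, "HCControl": 0x000E0000}
reply = {"AsyncRequestFilterLo": 0x04000003, "HCControl": 0x000E0000}
for flip in compare_registers(golden, reply):
    print(flip)  # BitFlip(register='AsyncRequestFilterLo', bit=26, expected=0)
```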
Software Flow for Isochronous Mode
Setup (CTRLR and DUT): lock down memory; enable ARRS for bus reset packets; turn on the isochronous receive buffer; turn on interrupts.
Test loop:
• CTRLR: form a register data solicit packet and send it to the DUT as a stream packet; wait for an interrupt
• DUT: on receiving the solicit packet in its receive buffer, poll the LLC registers, build a data packet, and return it as a register data stream packet
• CTRLR: on receiving the stream packet in its receive buffer, compare register values; log errors; continue the loop
Two Main Types of Errors Observed
• Soft errors
  • Bit flips logged by the software in registers, FIFOs, or data that did not disrupt communications between the DUT and CTRLR during the test run
• Hard errors, or single event functional interrupts (SEFIs)
  • Errors occurring in registers that halted communications between the DUT and CTRLR during the test run
  • Errors of this type required a series of software and/or operator steps to recover communications
  • SEFIs were further classified by the steps taken to re-establish communications
Example of a Soft Error
Asynchronous Request Filter Low Register on the LLC
[Register diagram: bits 31 through 0]
This register enables reception of asynchronous request packets on a per-node basis (it handles the lower node IDs, 0 through 31). When an asynchronous request packet is received, its source node ID is examined. If the bit corresponding to that node ID is not set in this register, the packet is not acknowledged and the request is not queued.
In this example, the register is set up so that only asynchronous request packets from nodes 0 and 1 are accepted:
Bits 31..0: 0000 0000 0000 0000 0000 0000 0000 0011 (0x00000003)
If an upset flips bit 26 to 1, asynchronous request packets from node 26 would incorrectly be accepted.
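The acceptance rule described above reduces to testing one bit per source node. This sketch assumes, as the example implies, that bit n of the low filter register corresponds to node n for n = 0..31; the function name is hypothetical.

```python
def accepts_request_from(filter_lo: int, node: int) -> bool:
    """True if the low asynchronous request filter admits requests from this node."""
    if not 0 <= node <= 31:
        raise ValueError("the low filter register covers nodes 0..31 only")
    return bool((filter_lo >> node) & 1)


configured = 0x00000003         # only nodes 0 and 1 enabled
upset = configured | (1 << 26)  # a single-event upset sets bit 26

print([n for n in range(32) if accepts_request_from(configured, n)])  # [0, 1]
print([n for n in range(32) if accepts_request_from(upset, n)])       # [0, 1, 26]
```

Communications with nodes 0 and 1 are unaffected, which is why this upset shows up only as a logged bit flip, that is, a soft error.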
Example of a Hard Error (SEFI)
Host Controller Control Register on the LLC
[Register diagram: bits 31 through 0, including reserved fields]
This register provides flags for controlling the TSB12LV26. Register value in this example:
Bits 31..0: 0000 0000 0000 1110 0000 0000 0000 0000 (0x000E0000; bits 19, 18, and 17 set)
Bit 17 is the Link Enable bit. It is set to 1 when the system is ready to begin operation. If an upset cleared it to 0, the TSB12LV26 would be logically and immediately disconnected from the 1394 bus: no packets would be received or transmitted, and communications between the CTRLR and DUT would halt.
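The consequence of the Link Enable upset can be expressed as a check on a register snapshot. The sketch encodes only the rule stated above (bit 17 cleared means the node is logically off the bus); it is an illustration with hypothetical names, whereas in the test itself errors were classified by whether communications actually halted, and other bits could also cause SEFIs.

```python
LINK_ENABLE_BIT = 17


def link_enabled(hc_control: int) -> bool:
    """Read the Link Enable flag (bit 17) from a Host Controller Control value."""
    return bool((hc_control >> LINK_ENABLE_BIT) & 1)


def classify_upset(expected: int, observed: int) -> str:
    """Classify a register change using only the Link Enable rule."""
    if expected == observed:
        return "no error"
    if link_enabled(expected) and not link_enabled(observed):
        return "hard error (SEFI): link disabled, communications halt"
    return "soft error"


normal = 0x000E0000  # bits 19, 18 and 17 set, as in the example
print(classify_upset(normal, normal & ~(1 << LINK_ENABLE_BIT)))
print(classify_upset(normal, normal | 0x1))
```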
SEFIs Categorized by Steps Required to Start Communications
Step 1: The SEU test loop is restarted on the CTRLR, i.e., a packet is sent to the DUT requesting register information.
Step 2: Software bus reset. Force the CTRLR to be root, initiate a bus reset in the PHY, and reset the node on the LLC. Restore registers and flush FIFOs. Set bus Ops, IRMC, CMC, ISC, and configuration ROM; enable transmit and receive. Implies step 1.
Step 3: Reload the software application. This refreshes the locked-down memory region shared by hardware and software. Implies steps 2 and 1.
Step 4: The CTRLR is verified to be sending register data solicit packets to the DUT, and the DUT is verified to receive them and send a data response packet, but the CTRLR cannot see the response packet. Power cycle the CTRLR. Implies steps 3, 2, and 1.
Step 5: Disconnect and reconnect the 1394 cable. This causes a hard bus reset and the tree ID process.
Step 6: Step 5, followed by steps 3, 2, and 1.
Step 7: Step 6, followed by a cold reboot of the DUT, followed by steps 3, 2, and 1.
Step 8: Cold reboot of the DUT, followed by steps 3, 2, and 1.
Step 9: Step 5, followed by step 8.
Step 10: Reboot the CTRLR, followed by steps 3, 2, and 1.
Step 11: Reboot both the CTRLR and DUT PCs, followed by steps 3, 2, and 1.
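Because many steps are defined in terms of others ("implies steps 3, 2, 1"), the table can be expanded mechanically into the full ordered sequence of operator actions. The sketch below encodes the table as written; the action strings are paraphrases and the data structure is hypothetical.

```python
RESTART_LOOP = "restart SEU test loop on CTRLR"
SW_BUS_RESET = "software bus reset"
RELOAD_APP = "reload software application"
POWER_CYCLE_CTRLR = "power cycle CTRLR"
RECONNECT_CABLE = "disconnect/reconnect 1394 cable"
COLD_REBOOT_DUT = "cold reboot DUT"
REBOOT_CTRLR = "reboot CTRLR"
REBOOT_BOTH = "reboot CTRLR and DUT PCs"

# Each step lists its actions in order; an int refers to another step, expanded in place.
STEPS = {
    1: [RESTART_LOOP],
    2: [SW_BUS_RESET, 1],
    3: [RELOAD_APP, 2],
    4: [POWER_CYCLE_CTRLR, 3],
    5: [RECONNECT_CABLE],
    6: [5, 3],
    7: [6, COLD_REBOOT_DUT, 3],
    8: [COLD_REBOOT_DUT, 3],
    9: [5, 8],
    10: [REBOOT_CTRLR, 3],
    11: [REBOOT_BOTH, 3],
}


def expand(step: int) -> list[str]:
    """Flatten a SEFI category into the ordered list of primitive actions."""
    actions: list[str] = []
    for item in STEPS[step]:
        if isinstance(item, int):
            actions.extend(expand(item))
        else:
            actions.append(item)
    return actions


for action in expand(9):
    print(action)
```

Expanding a category this way makes it easy to compare how much operator intervention each SEFI class actually cost.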
Results - LLC Running Asynchronous Mode 3 39.2 51.6 59.6 73 4.2 8.39 11.9 27.7 ERRORS IN LLC RUNNING ASYNCHRONOUS MODE “Soft” Errors x 1 No errors observed current jumped from 18mA >44mA 0 0 0 0 0 0 0 0 x 2 Register error, self corrected and no change in current 1.3E-4 1.0E-5 4.6E-5 2.5E-5 8.8E-5 3.1E-4 2.4E-4 1.3E-4 3 Register error, self corrected, current jumped 18mA >44mA x 0 0 0 0 0 0 0 0 “Hard” Errors 4 0 x Restart communications from CTRLR 0 0 8.3E-7 0 0 6.8E-6 0 5 x Software bus reset current jumped from 18mA to 44mA 0 0 0 0 0 2.6E-5 0 0 x 6 Reset CTRLR and/or DUT software 0 0 0 0 4.3E-6 8.3E-7 2.3E-6 0 7 x Software bus reset and reset software on DUT and CTRLR 0 0 4.3E-6 4.2E-7 0 1.3E-5 0 0 x 8 CTRLR sends packet, does not listen cold reboot CTRLR 0 0 2.3E-6 0 0 0 0 0 9 x Disconnect/reconnect cable (hard bus reset) 0 0 0 0 0 0 0 0 10 x Disconnect/reconnect cable, reload bus DUT software 6.8E-6 0 0 0 0 0 0 0 x 11 Reset cable and cold reboot DUT 0 0 0 0 2.3E-6 5.7E-5 0 0 x 12 Cold reboot DUT after lockup, but no change in current 0 0 8.3E-7 4.5E-6 2.6E-5 1.4E-5 0 0 13 x Cold reboot DUT after lockup, current jump 18mA to 44mA 0 2.2E-6 1.7E-6 4.5E-6 0 0 0 1.4E-5 0 0 0 x 14 Reset cable, reboot DUT and software, delta I=0 0 0 0 0 0 15 0 0 x Reset cable, reboot DUT and software: 18- >44mA 0 0 0 0 0 0 x 0 0 0 0 0 16 Reboot CTRLR, reload software on bus, DUT and CTRLR 0 0 0 x 0 0 4.3E-6 0 0 0 6.8E-6 0 17 Reboot both computers, reset all software
Results - PHY Running Asynchronous Mode 3 39.2 51.6 59.6 73 4.2 8.39 11.9 27.7 ERRORS IN PHY RUNNING ASYNCHRONOUS MODE “Soft” Errors x x x 1 No errors observed current jumped from 18mA >44mA 0 0 0 0 0 0 x x 2 Register error, self corrected and no change in current x 0 0 0 0 0 0 3 Register error, self corrected, current jumped 18mA >44mA x x x 0 0 0 0 0 0 “Hard” Errors x 4 x Restart communications from CTRLR 0 0 x 0 0 1.0E-4 6.4E-5 5 x Software bus reset current jumped from 18mA to 44mA x x 9.1E-6 0 0 0 0 0 x x 6 Reset CTRLR and/or DUT software x 0 0 0 0 0 0 7 x Software bus reset and reset software on DUT and CTRLR x x 0 0 0 0 0 0 x 8 CTRLR sends packet, does not listen cold reboot CTRLR x x 0 0 0 0 0 0 9 x Disconnect/reconnect cable (hard bus reset) x x 9.1E-8 0 8.3E-7 0 0 0 10 x Disconnect/reconnect cable, reload bus DUT software 0 x x 0 3.3E-6 0 0 0 x 11 Reset cable and cold reboot DUT x x 0 0 0 0 0 0 x 12 Cold reboot DUT after lockup, but no change in current x x 0 0 0 2.0E-4 0 0 13 x Cold reboot DUT after lockup, current jump 18mA to 44mA x x 0 0 0 0 0 0 x x x 14 Reset cable, reboot DUT and software, delta I=0 0 0 0 0 0 0 15 0 x Reset cable, reboot DUT and software: 18- >44mA 0 x x 0 0 0 0 x 0 x 16 Reboot CTRLR, reload software on bus, DUT and CTRLR 0 x 0 0 0 0 x x x 17 Reboot both computers, reset all software 0 0 2.5E-6 3.6E-5 2.0E-4 2.6E-4
Conclusions
• The NSC part exhibited destructive latchup at LET = 27 MeV·cm²/mg
• The TI part exhibited both SEUs (soft errors) and SEFIs (hard errors)
  • At low LETs, the errors are mostly soft errors
• SEFIs that required rebooting the system make this part problematic for space use
  • Power cycling may be required
• An improved test would involve:
  • Automatic reboot
  • Another device
Sponsors: NEPP, NRL/NPOES
Special thanks to Kent Larson and Mike Worcester of Boeing.