Results of Single-Event Effects Testing of Advanced Networks
S. Buchner, J. Howard, C. Seidleck, P. Marshall, M. Carts, H. Kim, K. LaBel, R. Stattel, C. Rogers and T. Irwin
• NASA Electronic Parts and Packaging (NEPP) Program’s Electronic Radiation Characterization (ERC) Project
• NASA Remote Experimentation and Exploration Program
• DTRA RHM
MAPLD2002, Laurel MD 11th September, 2002. Presented by S. Buchner
Introduction
• Future space missions will have to process vast amounts of data (at supercomputing levels) in a radiation environment.
• The data will have to be transferred between instruments and computers, or between computers, before downlinking.
• High-performance networks built from COTS parts, which are likely to be sensitive to single-event effects (SEEs), will be needed.
[Figure: an instrument, Computer #1 and Computer #2 linked through a switch]
Introduction
• Using protons and heavy ions at accelerators, we have performed single-event effects testing of:
  • AD8151 crosspoint switch
  • Myrinet crossbar switch
  • FireWire (IEEE 1394) serial bus
Introduction
• Ionizing particles caused:
  • Single-Event Upset (SEU, or bit error)
  • Single-Event Functional Interrupt (SEFI)
  • Single-Event Latchup (SEL)
AD8151 Digital Crosspoint Switch
• Operates at 3.2 Gbps
• Low power
• 33 inputs
• 17 outputs
• Tested vs.:
  • Data rate
  • Number of paths
  • Ion LET
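The sketch below makes “number of paths” concrete. It models the switch as a connection map in which each of the 17 outputs is driven by one of the 33 inputs. The class name `CrosspointMap` is an illustrative assumption. So is the idea of diffing the live map against a reference configuration to spot routing changes. Neither is the AD8151 programming interface or the procedure used in these tests.

```python
from dataclasses import dataclass, field

N_INPUTS = 33   # AD8151 input count (from the slide)
N_OUTPUTS = 17  # AD8151 output count (from the slide)

@dataclass
class CrosspointMap:
    """Hypothetical model: each output is driven by at most one input."""
    routes: dict = field(default_factory=dict)  # output index -> input index

    def connect(self, output: int, inp: int) -> None:
        if not 0 <= output < N_OUTPUTS:
            raise ValueError(f"output {output} out of range")
        if not 0 <= inp < N_INPUTS:
            raise ValueError(f"input {inp} out of range")
        self.routes[output] = inp  # reprogramming an output replaces its old input

    def active_paths(self) -> int:
        """Number of configured input-to-output paths."""
        return len(self.routes)

    def compare(self, expected: "CrosspointMap") -> list:
        """Outputs whose routing differs from an expected configuration."""
        outputs = set(self.routes) | set(expected.routes)
        return sorted(o for o in outputs
                      if self.routes.get(o) != expected.routes.get(o))
```

For example, connecting outputs 0 to 3 gives four active paths. Rerouting output 2 afterwards makes `compare` report `[2]` against a saved copy of the original map.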
AD8151 Digital Crosspoint Switch
• Bipolar switch
Myrinet
• A Myrinet network for a cluster architecture consists of two pieces:
  • network switching (XBar16)
  • network interface hardware (Network Interface Card, NIC)
• A prime control processor talks to all the individual nodes of the cluster via the Myrinet network.
• Messages are transported across Myrinet as packets. A packet consists of:
  • Header
  • Payload
  • Trailer
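Below is a minimal sketch of header/payload/trailer framing with a checksum carried in the trailer. One assumption: a 4-byte CRC-32 from Python's `zlib` stands in for the real Myrinet trailer, whose exact format is not given here. The names `build_packet` and `parse_packet` are hypothetical.

```python
import zlib
from typing import Optional, Tuple

TRAILER_LEN = 4  # assumption: 4-byte CRC-32 trailer, a stand-in for the real format

def build_packet(header: bytes, payload: bytes) -> bytes:
    """Concatenate header and payload, then append a checksum trailer."""
    body = header + payload
    return body + zlib.crc32(body).to_bytes(TRAILER_LEN, "big")

def parse_packet(packet: bytes, header_len: int) -> Optional[Tuple[bytes, bytes]]:
    """Return (header, payload), or None if the trailer checksum does not match."""
    if len(packet) < header_len + TRAILER_LEN:
        return None
    body, trailer = packet[:-TRAILER_LEN], packet[-TRAILER_LEN:]
    if zlib.crc32(body).to_bytes(TRAILER_LEN, "big") != trailer:
        return None  # checksum error: the receiver drops the packet
    return body[:header_len], body[header_len:]
```

A single flipped bit anywhere in the header or payload changes the CRC-32. The packet is therefore rejected rather than delivered with corrupted data.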
Myrinet
IEEE 1394 (FireWire)
• Specifications for backplane and cable
• Cable contains 6 wires, with a maximum length of 4.5 meters.
• Cable minimizes the wire harness, provides power, and reduces crosstalk.
• Multiple nodes share the bus; access is granted through arbitration.
• Inexpensive, available, reliable: COTS.
• Scalable: 100, 200, 400 Mbps (800, 1600 and 3200 Mbps).
• Data transmitted in packets with “Header,” “Data,” and “Checksum” fields.
• Two modes: isochronous and asynchronous.
• 256 terabytes of addressable memory-mapped space per node (48-bit offset per node, 63 nodes per bus segment and 1024 bus segments).
• Plug and play.
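The 256-terabyte figure follows from the 48-bit per-node offset: 2^48 bytes = 2^8 × 2^40 bytes = 256 binary terabytes. The sketch below splits a 64-bit IEEE 1394 address into a 10-bit bus ID, a 6-bit node ID and a 48-bit offset. The helper names `make_address` and `split_address` are hypothetical.

```python
BUS_BITS, NODE_BITS, OFFSET_BITS = 10, 6, 48  # 10 + 6 + 48 = 64-bit address
# 6 node-ID bits give 64 values; one is reserved for broadcast, leaving 63 nodes.

def make_address(bus: int, node: int, offset: int) -> int:
    """Pack bus ID, node ID and offset into one 64-bit address."""
    if not (0 <= bus < 1 << BUS_BITS and 0 <= node < 1 << NODE_BITS
            and 0 <= offset < 1 << OFFSET_BITS):
        raise ValueError("field out of range")
    return (bus << (NODE_BITS + OFFSET_BITS)) | (node << OFFSET_BITS) | offset

def split_address(addr: int) -> tuple:
    """Unpack a 64-bit address into (bus ID, node ID, offset)."""
    if not 0 <= addr < 1 << 64:
        raise ValueError("not a 64-bit address")
    offset = addr & ((1 << OFFSET_BITS) - 1)
    node = (addr >> OFFSET_BITS) & ((1 << NODE_BITS) - 1)
    bus = addr >> (NODE_BITS + OFFSET_BITS)
    return bus, node, offset

per_node_bytes = 1 << OFFSET_BITS
print(per_node_bytes // 2**40)  # 256 (binary terabytes per node)
```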
IEEE 1394 (FireWire)
[Figure: example IEEE 1394 bus topology connecting a PC, DV monitor, digital VCR, two fixed disk drives, DV camcorder and digital camera as nodes with Physical IDs 0 to 6]
IEEE 1394 (FireWire)
• Tested as a function of:
  • Mode
  • Ion LET
  • Layer
[Figure: IEEE 1394 protocol layers below the microprocessor: transaction layer (read, write, lock); link layer (packet transmitter, packet receiver, cycle control); physical layer (arbitration, encode/decode, data resync, connectors, bus initialization, signal levels); serial bus management (bus manager, isochronous resource manager, node controller)]
IEEE 1394 (FireWire)
[Figure: physical-layer and link-layer devices]
General SEE Results
• SEEs are remarkably similar in all three network components:
  • Hard errors
    • Single-event functional interrupts (SEFIs) requiring reprogramming or rebooting to restart communications
  • Soft errors
    • Bit errors (AD8151)
    • SEUs (FireWire)
    • Lost packets (Myrinet)
AD8151 Crosspoint Switch
Results of SEE Testing - AD8151
• Tested with a bit error rate (BER) tester
• Observed:
  • Single bit errors
  • Bursts of errors
  • Loss of synchronization
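A BER tester compares the received bit stream against a known pattern. The sketch below assumes a PRBS7 pattern (polynomial x^7 + x^6 + 1) and groups error positions into the three kinds of events listed above. The thresholds `max_gap` and `sync_loss_errors` are arbitrary illustrative values, not the settings used in these tests.

```python
def prbs7(n: int, seed: int = 0x7F) -> list:
    """Generate n bits of PRBS7 (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be nonzero")
    bits = []
    for _ in range(n):
        new = ((state >> 6) ^ (state >> 5)) & 1  # taps at stages 7 and 6
        state = ((state << 1) | new) & 0x7F
        bits.append(new)
    return bits

def error_events(expected, received, max_gap=8, sync_loss_errors=32):
    """Group bit errors into events: (kind, first_error_position, error_count)."""
    errors = [i for i, (e, r) in enumerate(zip(expected, received)) if e != r]
    groups = []
    for pos in errors:
        # join the current event if this error is close to the previous one
        if groups and pos - groups[-1][-1] <= max_gap:
            groups[-1].append(pos)
        else:
            groups.append([pos])
    events = []
    for g in groups:
        if len(g) == 1:
            kind = "single bit error"
        elif len(g) >= sync_loss_errors:
            kind = "loss of synchronization"  # hypothetical threshold
        else:
            kind = "burst"
        events.append((kind, g[0], len(g)))
    return events
```

Consider a stream with one isolated flipped bit, three errors a few bits apart, and a long run of consecutive errors. The function reports those as a single bit error, a burst, and a loss of synchronization, respectively.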
Myrinet Crossbar Switch
Results of SEE Testing - Myrinet
• Myrinet was tested via NICs and in-house software.
• SEFI events occur when all data packets are lost; recovery requires a power cycle.
• Data packets are dropped whenever the checksum is in error.
• SEU events appear as either a single lost packet or several packets lost in succession, but normal operation resumes without any intervention.
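The SEU/SEFI distinction above can be expressed as a classification of loss runs in a per-packet received/lost record. This is a sketch under an assumption: a loss run is treated as a SEFI when it is still unrecovered at the end of the record and at least `sefi_min_run` packets long. That rule is hypothetical, not the criterion used with the in-house software.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

def classify_losses(received: Sequence[bool],
                    sefi_min_run: int = 100) -> List[Tuple[str, int, int]]:
    """Return (kind, first_lost_index, run_length) for each run of lost packets."""
    events = []
    idx = 0
    for ok, run in groupby(received):
        n = len(list(run))
        if not ok:
            unrecovered = idx + n == len(received)  # loss lasts to end of record
            if unrecovered and n >= sefi_min_run:
                kind = "SEFI"  # all packets lost; a power cycle would be needed
            elif n == 1:
                kind = "SEU: single packet lost"
            else:
                kind = "SEU: consecutive packets lost"
            events.append((kind, idx, n))
        idx += n
    return events
```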
Results of SEE Testing – SEFIs – Myrinet
• NIC results (table) show about the same cross section regardless of which part was hit.
• XBar results (graph) show a difference in cross section between the front-plane and back-plane switches.
Results of SEE Testing – SEUs – Myrinet
• NIC results (table) show a range of cross sections for the different parts.
• XBar results (graph) show a difference in single-packet-loss cross section between the front-plane and back-plane switches.
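The cross sections compared on these slides are, in the usual definition, the number of events divided by the particle fluence. Below is a minimal sketch. The zero-event upper limit uses the Poisson result -ln(1 - CL)/fluence; that helper is an addition here, not something the slides report.

```python
import math

def cross_section(events: int, fluence: float) -> float:
    """SEE cross section (cm^2) = number of events / fluence (particles per cm^2)."""
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return events / fluence

def zero_event_upper_limit(fluence: float, cl: float = 0.95) -> float:
    """Poisson upper limit on the cross section when no events are observed."""
    if fluence <= 0 or not 0 < cl < 1:
        raise ValueError("bad fluence or confidence level")
    return -math.log(1.0 - cl) / fluence
```

To compare front-plane and back-plane switches, compute `cross_section` separately for each at the same ion LET and take the ratio. At 95% confidence, a run with no events bounds the cross section at roughly 3/fluence.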
IEEE 1394 (FireWire)
Results of SEE Testing – SEFIs – IEEE 1394 (FireWire)
• 9 different SEFIs, categorized according to how communications were restarted
• Observed soft errors that did not disrupt communications
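Categorizing SEFIs by how communications were restarted amounts to recording which step of a recovery ladder first worked. The sketch below tries actions from least to most disruptive. The action names ("bus reset", "reinitialize link", "power cycle") and the simulated device are hypothetical placeholders, since the nine categories are not listed here.

```python
from typing import Callable, Optional, Sequence, Tuple

Action = Tuple[str, Callable[[], None]]

def recover(ladder: Sequence[Action], link_ok: Callable[[], bool]) -> Optional[str]:
    """Apply recovery actions in order, least to most disruptive.

    Returns the name of the first action after which link_ok() is True
    (this name serves as the SEFI category), or None if nothing worked.
    """
    for name, action in ladder:
        action()
        if link_ok():
            return name
    return None

# Hypothetical simulated device: communications return only once the
# link has been reinitialized (level 2 or higher).
state = {"level": 0}

def set_level(n: int) -> Callable[[], None]:
    return lambda: state.update(level=n)

ladder = [("bus reset", set_level(1)),
          ("reinitialize link", set_level(2)),
          ("power cycle", set_level(3))]
print(recover(ladder, lambda: state["level"] >= 2))  # reinitialize link
```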
Results of SEE Testing – SEUs – IEEE 1394 (FireWire)
• 9 different SEFIs, categorized according to how communications were restarted
• Observed soft errors that did not disrupt communications
Summary and Implications
• High-performance networks contain COTS parts that are sensitive to SEEs.
• SEEs observed in all three “networks”:
  • SEFIs, after which a reboot or reprogramming was needed
  • SEUs that lead to a short dropout of valid data
• SEL was observed in the NS FireWire device but not in the others.
• SEE mitigation steps will be needed for use in space.