Adaptive Fault Tolerant Computer (AFTC). NASA EPO (Education Public Outreach): Environmentally Adaptive Fault Tolerant Computer (EAFTC) / Dependable Multiprocessor (DM) Animation. Dr. John R. Samson, Jr., Honeywell Aerospace, Defense & Space, Clearwater, FL. john.r.samson@honeywell.com
[Animation: Spacecraft Computer displaying 1 + 1 = 2]
Narration:
- Computers in space can, and do, perform many useful functions.
- Current computing speed and capability in space are limited by the physical environment of space.
- These limitations are due, in part, to the mechanical environment (launch vibration and stage separation shock), the thermal environment (limited onboard cooling capability), and the radiation environment experienced in space.
[Animation: Spacecraft Computer displaying the results 1 + 1 = 3 and 1 + 1 = 2]
Narration:
- Occasionally, a cosmic event, e.g., a solar flare or a pulsating star, generates a burst of radiation which travels through space and interacts with the electronics in the spacecraft computer, causing it to perform an incorrect computation.
- These interactions generally cause limited, temporary effects in electronics. Such an incorrect computation could be a minor nuisance, but it could also create a life-threatening situation for an astronaut onboard the spacecraft.
- Because these errors are temporary and don’t cause any permanent damage to the Spacecraft Computer or any of its components, they are known as “soft” errors.
- In order to minimize the occurrence of such erroneous computations, today’s spacecraft computers are specially designed to be resistant to space radiation.
- This special radiation design process tends to make the space computers large and slow.
[Animation: Spacecraft Computer displaying 1 + 1 = 2]
Narration:
- Space scientists and mission planners would like to migrate the processing power of today’s PCs, i.e., Personal Computers, and supercomputers to space, giving them more capability for science experiments and autonomous missions.
- NASA’s New Millennium Program Space Technology 8 (ST8) project is satisfying that desire with the development of the Adaptive Fault Tolerant Computer (AFTC), which can respond in real time to radiation events to eliminate or minimize the effects of radiation on the space computers.
[Animation: Adaptive Fault Tolerant Computer (AFTC) diagram with Data Processors Proc. #1 through Proc. #8, AFTC Controller, and Radiation Environment Sensor]
Narration:
- The Adaptive Fault Tolerant Computer (AFTC) consists of three parts: a cluster of high performance COTS (Commercial-Off-The-Shelf) data processors with PC-like performance; an AFTC Controller, which is one of those slower radiation hardened space processors and is virtually immune to the effects of space radiation; and a radiation environment sensor and alert generator.
- The high performance data processors work cooperatively to provide the space scientists and mission planners with the highest possible computational capability with the highest possible availability.
- The AFTC Controller hosts the software that controls the cluster of data processors, both in normal operation and in the presence of radiation induced errors in the high performance data processors.
- The radiation environment sensor continually senses the radiation environment and informs the AFTC Controller when action needs to be taken to mitigate the effects of the current radiation environment.
[Animation: AFTC diagram (Data Processors Proc. #1 through Proc. #8, AFTC Controller, Radiation Environment Sensor)]
Narration:
- In one AFTC operational scenario, at some point in time, a cosmic event occurs which emits a burst of radiation. Such a cosmic event may be recent, or it may have occurred some time in the past at a star that is light years away from the earth.
[Animation: AFTC diagram (Data Processors Proc. #1 through Proc. #8, AFTC Controller, Radiation Environment Sensor), with results 1 + 1 = 3 and 1 + 1 = 2 displayed]
Narration:
- The radiation from the cosmic event eventually reaches the AFTC and may cause a computational error in one of the high performance data processing nodes.
- The AFTC Controller senses the problem with the failed high performance data processor and temporarily halts the processing being performed by the cluster.
[Animation: AFTC diagram (Data Processors Proc. #1 through Proc. #8, AFTC Controller, Radiation Environment Sensor)]
Narration:
- Since the problem caused by the radiation event is most likely a temporary “soft” error, i.e., a problem which causes no permanent damage to any of the components in the AFTC, the AFTC Controller restarts the failed data processor along with the application on all of the data processors in the cluster.
- The AFTC returns to full computational capability.
- The time required for the AFTC Controller to detect the problem and restart the application is very small, resulting in high availability to the mission.
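The detect, halt, and restart sequence in this scenario can be sketched in a few lines of Python. This is only an illustrative model, not the flight software: the class names `DataProcessor` and `Controller`, the `faulted` flag, and the log messages are all assumptions made up for the example, and the presentation does not say how the AFTC Controller actually detects a failed processor.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProcessor:
    """Hypothetical stand-in for one COTS data processor in the cluster."""
    proc_id: int
    faulted: bool = False  # assumed flag: set when a soft error is observed
    restarts: int = 0

    def restart(self) -> None:
        # A soft error leaves no permanent damage, so a restart clears it.
        self.faulted = False
        self.restarts += 1


@dataclass
class Controller:
    """Hypothetical stand-in for the radiation hardened AFTC Controller."""
    cluster: List[DataProcessor]
    running: bool = True
    log: List[str] = field(default_factory=list)

    def check_and_recover(self) -> List[int]:
        """Halt the cluster if any processor failed, restart it, then resume."""
        failed = [p for p in self.cluster if p.faulted]
        if not failed:
            return []
        self.running = False
        self.log.append("halt cluster")
        for p in failed:
            p.restart()
            self.log.append(f"restart processor {p.proc_id}")
        self.running = True
        self.log.append("restart application on all processors")
        return [p.proc_id for p in failed]


def demo_soft_error_recovery() -> Controller:
    """Simulate a soft error in Proc. #3 of an eight processor cluster."""
    controller = Controller([DataProcessor(i) for i in range(1, 9)])
    controller.cluster[2].faulted = True
    controller.check_and_recover()
    return controller
```

Calling `demo_soft_error_recovery()` leaves the cluster running with every processor healthy, and the controller log records the halt, the single processor restart, and the application restart in that order.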
[Animation: AFTC diagram (Data Processors Proc. #1 through Proc. #8, AFTC Controller, Radiation Environment Sensor)]
Narration:
- In another scenario, the onboard radiation environment sensor detects a change in the environment that warrants protective action and sends an alert to the AFTC Controller.
[Animation: AFTC diagram (Data Processors Proc. #1 through Proc. #8, AFTC Controller, Radiation Environment Sensor)]
Narration:
- The AFTC Controller temporarily halts the cluster processing and switches the algorithms being performed by the cluster to a set of algorithms which perform the same computation, but are more resilient to the occurrence of soft errors.
- Because the radiation resistant algorithms exhibit more overhead, the computations may not run as fast.
- The time required for the AFTC Controller to respond to the alert level and restart the application is very small, again resulting in high availability of the system while maximizing the computational capability to the mission.
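The presentation does not say which resilient algorithms the AFTC switches to. As one illustration of the trade-off between resilience and overhead, the Python sketch below uses replicated execution with majority voting, a well-known technique chosen here purely as an assumption. The function names `run_resilient`, `select_and_run`, `add_one`, and `add_one_upset` are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence


def run_resilient(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every replica on the same input and return the majority result.

    Costs len(replicas) times the work of a single run, which is the
    overhead paid for tolerating a wrong answer from a minority of replicas.
    """
    results = [replica(x) for replica in replicas]
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(results) // 2:
        raise RuntimeError("no majority among replica results")
    return value


def select_and_run(alert: bool, replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Use the voted computation under a radiation alert, else a single fast run."""
    if alert:
        return run_resilient(replicas, x)
    return replicas[0](x)


def add_one(x: int) -> int:
    """The intended computation: 1 + 1 = 2 when x is 1."""
    return x + 1


def add_one_upset(x: int) -> int:
    """Simulated soft error: the same computation returns a wrong answer."""
    return x + 2
```

With `alert=True` and replicas `[add_one, add_one_upset, add_one]`, the upset replica is outvoted and the call returns 2 for input 1. With `alert=False`, only the first replica runs, so a soft error in that replica would go unmasked, but the computation costs a third as much.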
[Animation: AFTC diagram (Data Processors Proc. #1 through Proc. #8, AFTC Controller, Radiation Environment Sensor)]
Narration:
- When the radiation alert level returns to the original level, the AFTC returns to full computational capability.
- Such an adaptive change in the algorithm, configuration, or mode of operation of the AFTC doesn’t necessarily need to be invoked by a sensed change in the radiation environment. It can also be invoked by a change in the operational mode of the spacecraft.
- For example, the AFTC can adapt to differences in the computational requirements for different modes of spacecraft operation: launch, rendezvous and docking, trans-lunar or trans-Martian coasting, landing, roving, or remote habitat monitoring.
- The AFTC technology advance is a single computer which can adapt to meet the needs of future missions and the needs of a wide variety of operating modes within a single mission.
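A selection policy that reacts to both triggers, the radiation alert and the spacecraft operating mode, might look like the Python sketch below. The mode names come from the narration, but the `CRITICAL_MODES` set, the "resilient" and "fast" labels, and the rule itself are placeholder assumptions for illustration and do not describe the actual AFTC policy.

```python
from enum import Enum


class Mode(Enum):
    """Spacecraft operating modes named in the narration."""
    LAUNCH = "launch"
    DOCKING = "rendezvous and docking"
    COASTING = "trans-lunar or trans-Martian coasting"
    LANDING = "landing"
    ROVING = "roving"
    HABITAT = "remote habitat monitoring"


# Placeholder assumption: which modes favor resilience over raw speed.
CRITICAL_MODES = {Mode.LAUNCH, Mode.DOCKING, Mode.LANDING}


def choose_algorithm_set(mode: Mode, radiation_alert: bool) -> str:
    """Pick an algorithm set from the operating mode and the sensor alert.

    Either trigger alone is enough to select the slower, resilient set.
    """
    if radiation_alert or mode in CRITICAL_MODES:
        return "resilient"
    return "fast"
```

Under this placeholder policy, `choose_algorithm_set(Mode.ROVING, False)` selects the fast set, while a radiation alert during roving, or a landing with no alert at all, selects the resilient set.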
[Animation: Spacecraft Computer displaying 25 x 79 = 1975]
Narration:
- NASA, JPL, and Honeywell Defense and Space Systems in Clearwater, Florida, the prime contractor for the NMP ST8 AFTC experiment, are developing the technologies needed to migrate COTS high performance data processing capability to space, offering one to two orders of magnitude (10 times to 100 times) improvement in processing capability over what can be flown in space today. This will increase both the amount of science that can be done on a given platform and the autonomy needed for future landers, rovers, and remote science missions.