On-Board Device and System Architectures with the Version-Threshold Adaptation to Hardware and Software Faults
V.S. Kharchenko1, V.V. Sklyar2
1 National Aerospace University named after N.E. Zhukovsky "Kharkiv Aviation Institute", Kharkiv, Ukraine 61070
2 Kharkiv Military University, Kharkiv, Ukraine
Abstract
The reliability of on-board real-time devices and systems (BRDS) can be ensured by their ability to perform specified functions in the presence of hardware and software faults. This is possible with the multiversion approach, which is based on the introduction of software (and hardware) redundancy and means of checking, diagnosis and reconfiguration. The problem of analysis and synthesis of multiversion BRDS (MBRDS) is connected with research into architecture adaptation algorithms. A method of two-parameter adaptation of MBRDSs is proposed. Adaptive MBRDS models are developed, and the reliability of architectures with version-threshold adaptation (VTA) is studied in this paper. The VTA method and the technique for assessing and choosing fault-tolerant architectures were used in the development of aerospace processing systems, which made it possible to reduce the probability of system failure. The proposed architectures, adaptation methods and reliability models are the theoretical foundation of the developed tools for simulation, design and testing of fault-tolerant MBRDSs.
I. Introduction
The reliability of real-time computer control systems can be ensured by their ability to perform all specified functions in the presence of hardware and software faults. This is possible with the multiversion approach [1,2], which is based on the introduction of software (and hardware) redundancy and means of checking, diagnosis and reconfiguration [3-5]. The problem of analysis and synthesis of multiversion BRDS is connected, first of all, with research into architecture adaptation algorithms. Therefore, in order to assess the reliability of MBRDSs and to choose the optimal architecture, the following tasks, in our opinion the most important ones, have to be solved.
1. A method of MBRDS adaptation to hardware and software faults should be proposed.
2. Algorithms of adaptation and architectures of MBRDSs should be developed and systematized.
3. The procedure of MBRDS reliability assessment should be improved.
4. The different fault-tolerant MBRDS architectures should be studied and recommendations for choosing among them should be formulated.
The goal of the paper is the development of adaptive MBRDS models and the study of the reliability of architectures with version-threshold adaptation. The subject of study is limited to majority MBRDSs, since they allow the widest range of adaptation methods. Majority multiversion architectures are implemented in computer control systems of nuclear power plants, aerospace complexes and other critical applications [6-9].
II. Methods of Adaptation of Multiversion On-Board Real-Time Devices and Systems
Threshold adaptation (TA), i.e. adaptation by changing the response threshold of the majority element (the majority element passes from the «2 out of 3» scheme to the «1 out of 3» scheme), is used in one-version BRDS (OBRDS). A system with threshold adaptation can be partially adaptive (PA) or fully adaptive (A); the latter is referred to hereinafter simply as «adaptive». In an adaptive system, if one of the channels fails, the system continues to operate in the two-channel mode until one of the two remaining channels fails. After the second channel failure the system passes into the one-channel mode. Under partial adaptation, if one of the channels fails, the system passes immediately into the one-channel mode of operation. The two-channel mode of operation is thus an intermediate one.
A more complex variant of adaptation, version-threshold adaptation, can be used in multiversion systems. The adaptation algorithm becomes more complex because failures of both hardware and software have to be taken into account in every channel. Three types of adaptation are possible: VTA1 (both hardware and software adaptation), VTA2 (software adaptation only) and VTA3 (hardware adaptation only).
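The difference between full and partial threshold adaptation can be illustrated with a minimal Python sketch. The mode names and the functions `next_mode` and `mode_sequence` are hypothetical and introduced only for illustration; the sketch also assumes that a failure of the active channel in the one-channel mode is a system failure.

```python
def next_mode(mode: str, adaptation: str) -> str:
    """Return the operating mode after one more channel failure.

    mode:       "3-channel", "2-channel", "1-channel" or "failed"
    adaptation: "A" (fully adaptive) or "PA" (partially adaptive)
    """
    if mode == "3-channel":
        # Full adaptation keeps the two healthy channels running;
        # partial adaptation drops straight to the one-channel mode.
        return "2-channel" if adaptation == "A" else "1-channel"
    if mode == "2-channel":
        return "1-channel"
    # Assumption: losing the active channel in the one-channel mode
    # (or any further failure) means system failure.
    return "failed"


def mode_sequence(adaptation: str) -> list:
    """List the modes visited as channel failures accumulate."""
    modes = ["3-channel"]
    while modes[-1] != "failed":
        modes.append(next_mode(modes[-1], adaptation))
    return modes
```

Calling `mode_sequence("A")` walks through the two-channel mode, while `mode_sequence("PA")` skips it.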
Method VTA1. VTA1 requires all three software versions to be stored in the primary hardware component of each of the three channels, as well as a unified operating system for all three channels. Three variants of partial VTA1 are possible:
1) fully partial adaptation: partial adaptation both by software versions and by hardware channels; when one channel fails, the system goes into the one-channel mode of operation;
2) hybrid adaptation: partial by one component and full by the other:
2a) partial adaptation by hardware channels and full adaptation by software versions (partial VTA3, full VTA2). When software versions fail, the algorithm of full version adaptation (VTA2) is applied; when hardware channels fail, the algorithm of partial channel adaptation (partial VTA3) is applied;
2b) full adaptation by hardware channels and partial adaptation by software versions (full VTA3, partial VTA2). When one of the software versions or one of the hardware channels fails, the algorithm of partial version adaptation (partial VTA2) is applied and the system passes into the one-version mode of operation; in this mode, full adaptation (full VTA3) is applied when hardware channels fail.
Method VTA2. To implement this type of adaptation, all three software versions should be stored in the primary hardware component of each of the channels, and a unified operating system for all three channels is required. A channel failure occurs in case of either a hardware or a software failure. The system tolerates only a single failure of the hardware channels. When the hardware and software of one of the channels fail, the system with partial VTA2 passes into the one-version mode of operation.
Method VTA3. If a channel fails, its software version is no longer used. Under partial adaptation the system operates either in the three-channel or in the one-channel mode. Thus, the system passes into the one-channel mode after either a hardware or a software failure of any channel.
The proposed classification of OBRDS and MBRDS architectures is given in Table 1.
Table 1. The proposed classification of various types of majority MBRDS architectures
III. Analysis and Assessment of Reliability of Adaptive Multiversion Real-Time Systems
A. The graph-event method of MBRDS reliability assessment
The proposed graph-event method of MBRDS reliability assessment includes the following stages.
1. Analysis of the adaptation algorithm and construction of the graph of transitions between the system statuses. The operation algorithm of the fault-tolerant real-time system is represented as a directed graph. Graph nodes correspond to the various system statuses and graph edges correspond to the possible transitions between these statuses. The graph has one initial node, which corresponds to the initial status of the system when all software versions and hardware channels are in the up state. A failure of one of the software versions or of one of the hardware channels may occur at any moment of system operation. That is why two edges leave each node that corresponds to an up-state status of the system. The system failure occurs after a certain number of failures of software versions and hardware channels. Dead-end vertices of the graph correspond to the down (failed) statuses of the system. In addition, partially adaptive systems have intermediate states, in which the system passes into the one-channel (one-version) mode of operation in accordance with the implemented algorithm.
2. Construction of an event model of transitions between the system statuses. In the standard combinatorial probability model, the multiplicities of all statuses of a majority redundant system are multiples of three. In MBRDS, failures of hardware and software components are asymmetric with respect to each other, i.e. graph nodes split and the same node can include both up and down states. For such nodes, all possible combinations of failures of hardware and software components have to be considered. System statuses and transitions between them correspond to the nodes and edges of the graph. Symmetrical statuses of the system do not require detailing, so they are represented by a one-event diagram. For asymmetric statuses, all possible combinations of failures of software versions and hardware channels have to be enumerated, so they are represented by a three-event diagram.
3. Identification of the probabilities of the system being in the up-state statuses. Each up-state status is assigned the probability of the system being in that status. For symmetrical statuses the multiplier values are equal to three; for the others they are determined on the basis of the event diagrams.
4. Derivation of the PNF (probability of non-failure) formula for the system. The PNF of MBRDS is equal to the sum of the probabilities of the system being in the up state. It is therefore necessary to sum all probabilities obtained at the third stage and to transform the formula into a form convenient for calculation. The formula should then be transformed to take into account the complete component model of MBRDS.
The results of applying the procedure are the graph-event model of the operation of a specific MBRDS and the formula for calculating its PNF.
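Stages 1 and 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: a node is a pair (failed HW channels, failed SW versions), the system is taken to be down once three channels or three versions have failed, and the splitting of asymmetric nodes from stage 2 is not modeled. The function names are hypothetical.

```python
def build_transition_graph(limit: int = 3) -> dict:
    """Build the transition graph as {node: [successor nodes]}.

    A node is (failed HW channels, failed SW versions).
    Assumption: the system is down when either count reaches `limit`.
    """
    graph = {}
    stack = [(0, 0)]  # initial node: everything is in the up state
    while stack:
        node = stack.pop()
        if node in graph:
            continue
        hw, sw = node
        if hw == limit or sw == limit:
            graph[node] = []  # dead-end vertex: system failure
            continue
        # Two edges leave every up-state node:
        # one more HW channel failure or one more SW version failure.
        graph[node] = [(hw + 1, sw), (hw, sw + 1)]
        stack.extend(graph[node])
    return graph


def up_states(graph: dict) -> list:
    """Up-state nodes are the nodes that still have outgoing edges."""
    return sorted(node for node, succ in graph.items() if succ)


def pnf(state_probability: dict, graph: dict) -> float:
    """Stage 4: PNF is the sum of the up-state probabilities."""
    return sum(state_probability[node] for node in up_states(graph))
```

Under these assumptions the graph has nine up-state nodes, from (0, 0) to (2, 2), and six dead-end nodes.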
B. Analysis of MBRDS reliability block diagrams
Let us analyze the reliability block diagrams of multiversion systems in Fig. 1.
Figure 1: The reliability block diagram of majority MBRDS with one-version (a) and multiversion (b) operating system
Each of these diagrams can be divided into three parts:
1) part SW, which includes separate redundancy of the functional software modules (SWFi) (Fig. 1, a and b);
2) part HW, which includes common redundancy of the hardware components: storage of the operating system (HW1S), storage of SWFi (HW1Fi) and processing units (HW2) (Fig. 1, a), or of the hardware components (HW1S, HW1Fi, HW2) together with the operating system versions (SWS) (Fig. 1, b);
3) part ME, which includes the non-redundant majority element consisting of the corresponding components SWM, HW1M, HW2M and the operating system SWS (Fig. 1, a), or of the ME components only (Fig. 1, b).
Assuming that components of the same type in the various MBRDS channels are equally reliable, we obtain the following formulas for the probabilities of non-failure (PNF) of parts SW, HW and ME: (1) for MBRDS with the common operating system; (2) for MBRDS with the multiversion operating system.
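As a point of reference for the majority structure discussed here, the textbook PNF of a non-adaptive «2 out of 3» group of three independent, equally reliable channels is given below. This is the classical result, not formula (1) or (2) of this paper; the symbol p (PNF of one channel) is introduced only for this illustration, and the majority element is assumed ideal.

```latex
% Classical 2-out-of-3 majority redundancy, identical independent channels:
% all three channels up, or exactly two of the three up.
P_{2/3} = p^{3} + 3p^{2}(1 - p) = 3p^{2} - 2p^{3}

% Worked example for p = 0.9:
P_{2/3} = 3(0.9)^{2} - 2(0.9)^{3} = 2.43 - 1.458 = 0.972
```

The adaptive architectures considered in this paper add further up states (two-channel and one-channel modes), so their PNF includes additional terms beyond this baseline.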
C. Reliability assessment of MBRDS
Let us consider MBRDS with VTA1. The graph of transitions between the system statuses is shown in Fig. 2. The graph node labels show the number of failed SW versions and HW channels in the given status.
Figure 2: The graph of transitions between the system statuses of MBRDS with VTA1
The event model for the system statuses is shown in Fig. 3. An SW version in the up state is denoted by the letter V with the version number; a version in the down state is denoted by a struck-out V. An HW channel in the up state is denoted by a white rectangle, and one in the down state by a black rectangle.
Figure 3: The event model for the system statuses of MBRDS with VTA1
The graph-event model is the combination of the graph of transitions and the event model. This model of MBRDS with VTA1 is shown in Fig. 4. The nodes of the event model are identical to the nodes of the transition graph.
There is one exception: the node that corresponds to the 1HW1SW event is split into two asymmetric states, one in which adaptation is required and one in which it is not. Hence, all three variants of the «positional relationship» of the SW version failure and the HW channel failure have to be considered. In the first two cases, the HW and SW failures occur in different channels. Therefore, the adaptation algorithm comes into force and the up-state channel is identified with the probability D = DHW·DSW, where DHW (DSW) is the probability of HW (SW) failure detection. In the third case, the HW and SW failures occur in the same channel; hence, two out of three channels are in the up state and adaptation is not required.
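The split of the 1HW1SW node can be checked by direct enumeration. The Python sketch below assumes that software version k initially runs in hardware channel k; the function name `classify_1hw1sw` is hypothetical, and the detection probabilities in the usage note are the values quoted in the Fig. 7 captions.

```python
from itertools import product

CHANNELS = (1, 2, 3)


def classify_1hw1sw(d_hw: float, d_sw: float):
    """Enumerate the placements of one HW and one SW failure.

    Assumption: software version k initially runs in channel k.
    Returns (same_channel, different_channels, d), where d is the
    probability that both failures are detected, D = DHW * DSW.
    """
    same_channel, different_channels = [], []
    for hw_failed, sw_failed in product(CHANNELS, repeat=2):
        if hw_failed == sw_failed:
            # Both failures hit one channel: two channels remain up,
            # so no adaptation is needed.
            same_channel.append((hw_failed, sw_failed))
        else:
            # Failures hit different channels: adaptation is required.
            different_channels.append((hw_failed, sw_failed))
    return same_channel, different_channels, d_hw * d_sw
```

For example, `classify_1hw1sw(0.9, 0.9)` yields three same-channel placements, six different-channel placements (the two-to-one ratio described above) and D of about 0.81.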
The probabilities of the system being in the up-state statuses are as follows: 0 – ; 1HW – ; 2HW – ; 1SW – ; 2SW – ; 1HW1SW – ; 1HW2SW – ; 2HW1SW – ; 2HW2SW – .
The values in formula (3) are determined by dependencies (1). The final formula for PNF calculation is as follows: (4)
Models of MBRDSs with hybrid VTA1 of the two different types are shown in Fig. 5 and Fig. 6.
Figure 5: The graph-event model of hybrid adaptive MBRDS (VTA1: VTA2 – A, VTA3 – PA)
Figure 6: The graph-event model of hybrid adaptive MBRDS (VTA1: VTA2 – PA, VTA3 – A)
The formula for the PNF of hybrid adaptive MBRDS (VTA1: VTA2 – A, VTA3 – PA) is as follows: (5)
The formula for the PNF of hybrid adaptive MBRDS (VTA1: VTA2 – PA, VTA3 – A) is as follows: (6)
Plots of PNF versus time and of the PNF gain relative to a one-channel system for various types of MBRDSs are shown in Fig. 7.
Figure 7a: The plots of PNF of MBRDS; the parameters of MBRDS: n = 5; DHW = DSW = 0.9; λ(SWS) = 10^-6 1/hour; λ(SWFi) = 10^-5 1/hour; λ(HW1Fi) = 10^-6 1/hour; λ(HW1S) = 10^-6 1/hour; λ(HW2) = 10^-6 1/hour; λ(SWM) = λ(HW1M) = λ(HW2M) = 10^-8 1/hour.
Figure 7b: The plots of PNF gains of MBRDS; the parameters of MBRDS: n = 5; DHW = DSW = 0.9; λ(SWS) = 10^-6 1/hour; λ(SWFi) = 10^-5 1/hour; λ(HW1Fi) = 10^-6 1/hour; λ(HW1S) = 10^-6 1/hour; λ(HW2) = 10^-6 1/hour; λ(SWM) = λ(HW1M) = λ(HW2M) = 10^-8 1/hour.
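To give a feel for the magnitudes behind Fig. 7, the Python sketch below converts the rates listed in the captions into component PNF values. It assumes a constant failure rate, i.e. the exponential law p(t) = exp(-rate·t), which the paper does not state explicitly; the dictionary and function names are hypothetical.

```python
import math

# Rates in 1/hour as listed in the Fig. 7 captions.
RATES = {
    "SWS": 1e-6,
    "SWFi": 1e-5,
    "HW1Fi": 1e-6,
    "HW1S": 1e-6,
    "HW2": 1e-6,
    "SWM": 1e-8,
    "HW1M": 1e-8,
    "HW2M": 1e-8,
}


def component_pnf(rate_per_hour: float, hours: float) -> float:
    """PNF of one component under the assumed exponential law."""
    return math.exp(-rate_per_hour * hours)


def pnf_table(hours: float) -> dict:
    """PNF of every listed component after `hours` of operation."""
    return {name: component_pnf(rate, hours) for name, rate in RATES.items()}
```

Under this assumption, after 10 000 hours a functional software module (SWFi) has a PNF of exp(-0.1), roughly 0.905, which is lower than that of any hardware component in the list.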
IV. Conclusion. Results of Practical Application
Methods of version-threshold adaptation for MBRDS are proposed. Analysis of MBRDS reliability is hampered by the fact that reliability block diagrams have to be specified for each variant of component failures. To overcome this difficulty, a graph-event method of MBRDS reliability assessment has been developed. Calculation of the PNF of MBRDSs with various types of adaptation showed that the systems with VTA1 have the best reliability characteristics. It should be noted that, while the selected hardware failure rates are realistic, achieving a failure rate of about 10^-5 1/hour for software components is on the verge of (and sometimes beyond) the capabilities of modern programming technologies. That is why the creation of software for MBRDS is a complex task and requires maximum technological maturity of the developer. The set of considered MBRDS architectures with version-threshold adaptation should be supplemented with reconfiguration algorithms for software and hardware components, similar to the algorithms for multiversion majority systems [7,8].
The methods of version-threshold adaptation and the technique for assessing and choosing fault-tolerant system architectures were used in the development of aerospace processing systems. This made it possible to reduce the probability of system failure by a factor of 2 to 3. The proposed architectures, adaptation methods, database of software reliability models and technique for choosing among them form a basis for creating a system for automated modeling and design of fault-tolerant real-time control computer systems.
References
[1] Laprie J.-C. Dependability Handbook, Laboratory for Dependability Engineering, LAAS, Report n 98-346, 1998, 365 p.
[2] Lyu M.R. Handbook of Software Reliability Engineering, McGraw-Hill Company, 1996, 805 p.
[3] Aydin H., Melhem R., Mosse D. Tolerating Faults while Maximizing Reward, Proceedings of the 12th Euromicro Conference on Real-Time Systems (EUROMICRO-RTS 2000).
[4] Choi C.Y., Johnson B.W., Profeta III J.A. Analysis of Dependable Architectures, IEEE Transactions on Reliability, vol. 46, n 3, 1997, pp. 316-322.
[5] Moss T.R., Woodhouse J. Criticality Analysis Revisited, Quality and Reliability Engineering International, vol. 15, Issue 2, 1999, pp. 117-121.
[6] Welke S.R., Johnson B.W., Aylor J.H. Reliability Modelling of Hardware/Software Systems, IEEE Transactions on Reliability, vol. 44, n 3, 1995, pp. 413-418.
[7] Kharchenko V.S. Methods of an Estimation of Multiversion Safety Systems, Proceedings of the 17th International System Safety Conference, Orlando, FL, Aug. 16-21, 1999, pp. 347-352.
[8] Kharchenko V.S. Multiversion Systems: Models, Reliability, Design Technologies, Proceedings of the Tenth European Conference of Safety and Reliability, Munich, Germany, 13-17 September, 1999, pp. 73-77.
[9] Kharchenko V.S., Sklyar V.V. Monte-Carlo simulation of the unmanned multiversion systems. Abstract of International Conference of Monte-Carlo Simulation, Monte-Carlo, Monaco, June 18-21, 2000, p. 33.