Distributed Systems Faults and Their Solutions

2017 Distributed Systems: Common Faults and Their Solutions www.Techsparks.co.in [Techsparks] +91-96531-59085

What is a Fault?

A fault occurs when the underlying assumptions of a system are violated. An error is the disrupted internal data state that reflects a fault. A failure is the externally visible deviation from the system's specification.

1. Faults in Distributed Systems

   1. Data corruption
   2. Hanging processes
   3. Misleading return values
   4. Misbehaving machines
   5. Hardware/software/network outages
   6. Overcommitment of resources
   7. Insufficient disk space

2. Silent-Fail-Stutter Model

Reasons for this type of failure:

   1. System memory

Silent-fail-stutter is an appropriate model for system memory because a memory tester program can discover a corrupt memory chip, but running this test incurs a cost.

   2. Processor cache

The behaviour is not fail-stop because a processor with a faulty cache does not retain information about a cache block failure across reboots, even when the failure is permanent.

3. Implications of Silent-Fail-Stutter

Components may fail without signalling the failure to other components. Some components must therefore verify the state of each component, either periodically or on certain events, and report any detected failure to the other components in the distributed system. If a single component performs the checks, designers should make sure that it is more trustworthy than the components it checks. Alternatively, all components can cooperate to perform the checking in a distributed manner.

4. Failure Detection

Failure detection establishes whether a failure has occurred and, if necessary, traces its cause. A sudden failure of a lower-level component can cause the higher-level components that depend on it to fail, so tracing the fault may require moving down the hierarchy.

5. Evaluation

We looked at how components should convey the results of tests and decided to use a database to log each test result together with the timestamp of the test. We also recorded the results of application execution in the database.
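As a rough illustration of the two ideas above (logging timestamped test results to a database, and moving down the hierarchy to trace a fault), here is a minimal Python sketch. It is not the presentation's implementation: the SQLite table layout, the component names, the DEPENDS_ON map, and every function name are assumptions made for this example.

```python
import sqlite3
import time

# Hypothetical dependency hierarchy: each component maps to the
# lower-level components it relies on. All names are illustrative.
DEPENDS_ON = {
    "application": ["scheduler"],
    "scheduler": ["storage", "network"],
    "storage": ["disk"],
    "network": [],
    "disk": [],
}


def open_log(path=":memory:"):
    """Create the results table (an assumed schema) and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_results ("
        "component TEXT NOT NULL, passed INTEGER NOT NULL, tested_at REAL NOT NULL)"
    )
    return conn


def record_test(conn, component, passed, tested_at=None):
    """Log one test result together with the time the test ran."""
    if tested_at is None:
        tested_at = time.time()
    conn.execute(
        "INSERT INTO test_results (component, passed, tested_at) VALUES (?, ?, ?)",
        (component, int(passed), tested_at),
    )
    conn.commit()


def latest_status(conn, component):
    """Return True/False for the most recent test, or None if never tested."""
    row = conn.execute(
        "SELECT passed FROM test_results WHERE component = ? "
        "ORDER BY tested_at DESC, rowid DESC LIMIT 1",
        (component,),
    ).fetchone()
    return None if row is None else bool(row[0])


def trace_root_causes(conn, component):
    """Walk down the hierarchy from a failed component and return the
    lowest-level components whose latest logged test failed."""
    failing_deps = [
        dep for dep in DEPENDS_ON.get(component, [])
        if latest_status(conn, dep) is False
    ]
    if not failing_deps:
        # No failing dependency below this one, so the trace stops here.
        return [component]
    causes = []
    for dep in failing_deps:
        for cause in trace_root_causes(conn, dep):
            if cause not in causes:
                causes.append(cause)
    return causes


def demo():
    """Log one round of made-up test results and trace the application failure."""
    conn = open_log()
    results = [("disk", False), ("storage", False), ("network", True),
               ("scheduler", False), ("application", False)]
    for name, ok in results:
        record_test(conn, name, ok)
    return trace_root_causes(conn, "application")
```

With the made-up results in demo(), the application, scheduler, storage and disk tests all fail while the network test passes, so the trace from "application" ends at "disk". Because every result is stored with its timestamp, the same table can also hold application execution results if an extra column or a separate table is added.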

6. Conclusions

We detected faults in large distributed systems and proposed the silent-fail-stutter fault model to model component behaviour precisely while keeping it tractable.
