Distributed systems is a very good topic for a PhD thesis, but most students face many problems during their thesis project. Here is a list of those problems with their solutions. For more information, visit: www.techsparks.co.in
2017 Distributed Systems: Common Faults and Their Solutions www.Techsparks.co.in [Techsparks] +91-96531-59085
What Is a Fault?
A fault occurs when the underlying assumptions of a system are violated. An error is the disrupted internal data state that reflects a fault. A failure is the externally visible deviation from the specification.

1. Faults in Distributed Systems
   1. Data corruption
   2. Hanging processes
   3. Misleading return values
   4. Misbehaving machines
   5. Hardware, software, and network outages
   6. Overcommitment of resources
   7. Insufficient disk space

2. Silent-Fail-Stutter Model
Reasons for this type of failure:
1. System memory
Silent-fail-stutter is an appropriate model because a memory tester program can discover a corrupt memory chip, but running the test incurs a cost.
2. Processor cache
The behaviour is not fail-stop because a processor with a faulty cache does not retain information about a cache block failure across reboots, even when the failure is permanent.

3. Implications of Silent-Fail-Stutter
Because a component may fail without signalling the failure to other components, some components must verify the state of each component, periodically or on certain events, and report any detected failure to the other components in the distributed system. If a single component performs the checks, designers should make sure that it is more trustworthy than the components it checks. Alternatively, all components can cooperate to perform the checks in a distributed manner.

4. Failure Detection
The goal is to detect whether a failure has occurred and, if needed, to trace its cause. A sudden failure of a lower-level component can cause components higher in the chain to fail, so tracing the fault may require moving down the hierarchy.

5. Evaluation
We looked at how components should convey the results of tests, and we decided to use a database to log each test result together with its timestamp. We also recorded the results of application execution in the database.
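The checking, tracing, and logging steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the setup used in the evaluation: the component names, the test_results table, and the helper functions open_log, run_checks, latest_status, and trace_fault are all hypothetical, and SQLite stands in for whatever database was actually used.

```python
import sqlite3
import time
from typing import Callable, Dict, List, Optional

# Hypothetical dependency chain, highest-level component first.
# Each component is assumed to depend on the one listed after it.
CHAIN: List[str] = ["application", "storage_service", "disk"]


def open_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (assumed) table of test results."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS test_results ("
        " component TEXT NOT NULL,"
        " passed INTEGER NOT NULL,"
        " tested_at REAL NOT NULL)"
    )
    return conn


def run_checks(
    conn: sqlite3.Connection,
    tests: Dict[str, Callable[[], bool]],
    now: Optional[float] = None,
) -> None:
    """Run every component test once and log its result with a timestamp."""
    ts = time.time() if now is None else now
    for name, test in tests.items():
        try:
            passed = bool(test())
        except Exception:
            # A test that raises is logged as a failure instead of
            # stopping the checker.
            passed = False
        conn.execute(
            "INSERT INTO test_results (component, passed, tested_at)"
            " VALUES (?, ?, ?)",
            (name, int(passed), ts),
        )
    conn.commit()


def latest_status(conn: sqlite3.Connection, component: str) -> Optional[bool]:
    """Return the most recent logged result, or None if never tested."""
    row = conn.execute(
        "SELECT passed FROM test_results WHERE component = ?"
        " ORDER BY tested_at DESC, rowid DESC LIMIT 1",
        (component,),
    ).fetchone()
    return None if row is None else bool(row[0])


def trace_fault(conn: sqlite3.Connection, chain: List[str]) -> Optional[str]:
    """Walk down the chain and return the deepest component whose most
    recent logged test failed, or None if no component is failing."""
    culprit = None
    for component in chain:
        if latest_status(conn, component) is False:
            culprit = component
    return culprit


if __name__ == "__main__":
    conn = open_log()
    # Stand-in tests: the disk is broken, so everything above it fails too.
    tests = {name: (lambda: False) for name in CHAIN}
    run_checks(conn, tests)
    print("Likely root cause:", trace_fault(conn, CHAIN))
```

Because silent failures are not announced, the checker treats a test that raises an exception the same as a test that returns False. Keeping every result with its timestamp, rather than only the latest state, leaves a history that can be compared against the logged application runs.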
6. Conclusions
We have successfully detected faults in large distributed systems and proposed the silent-fail-stutter fault model, which models component behaviour precisely while remaining tractable.

7. References
1. Foster, I., Kesselman, C., Tuecke, S.: The anatomy of the grid: Enabling scalable virtual organizations. International Journal of Supercomputing Applications (2001)
2. Patterson, D.A., Gibson, G.A., Katz, R.H.: A case for redundant arrays of inexpensive disks (RAID). In: Boral, H., Larson, P.A. (eds.): Proceedings of the 1988 ACM SIGMOD International Conference on Management of Data, Chicago, Illinois, June 1-3, 1988. ACM Press (1988) 109–116
3. Avizienis, A., Laprie, J.: Dependable computing: From concepts to design diversity. In: Proceedings of the IEEE. Volume 74. (1986) 629–638