Fault Management IACT 418/918 Autumn 2005 Gene Awyzio SITACS University of Wollongong
Overview • Fault Management is the process of locating and correcting network problems or faults • Comprehensive fault management is probably the most important task in Network Management
Benefits of Fault Management Process • Increased network reliability • Provides tools allowing engineers to quickly • Detect problems • Initiate recovery procedures • Need to maintain the illusion of complete and continuous connectivity • Also provides tools to extract information about the network's current state
Accomplishing Fault Management • Can be considered as a three (3) step process • Identify the fault • Isolate the cause of the fault • Correct the fault if possible
Identifying the fault • Gathering Information to identify a problem • To learn that a problem exists we need to gather data about the current state of the network • Two approaches • Log critical network events • Poll network devices
Identifying the fault • Critical network events • Examples • Failure of a link • Lack of response from host • Transmitted by network device when fault conditions occur • Reactive method • If device fails it cannot send an event
Identifying the fault • Occasional Polling • Can help find faults in a timely manner • Tradeoff • Degree of timeliness vs bandwidth consumption • Other factors • Number of devices to poll • Bandwidth of links
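The two approaches complement each other: logged events arrive quickly but a dead device cannot send one, while polling catches silent devices at the cost of bandwidth. A minimal sketch of combining them, assuming a hypothetical `Event` record, made-up device names, and a stand-in `probe` function (none of these come from the slides; a real probe might be a ping or a management query):

```python
from dataclasses import dataclass


@dataclass
class Event:
    device: str  # name of the device that reported the event
    kind: str    # e.g. "link_down" (labels are assumptions for illustration)


def identify_faults(devices, events, probe):
    """Merge logged events with one polling pass.

    Returns a dict mapping each suspect device to the reasons it was flagged.
    """
    faults = {}
    # Reactive path: events the devices sent themselves.
    for ev in events:
        faults.setdefault(ev.device, []).append("event:" + ev.kind)
    # Polling path: catches devices that failed without sending anything.
    for dev in devices:
        if not probe(dev):
            faults.setdefault(dev, []).append("poll:no_response")
    return faults


# Stand-in probe for illustration only: pretend "host-b" is down.
def fake_probe(device):
    return device != "host-b"


log = [Event("router-1", "link_down")]
found = identify_faults(["router-1", "host-a", "host-b"], log, fake_probe)
print(found)
```

Here `host-b` never logged an event, so only the polling pass reveals it.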
Identifying the fault • Example of Occasional Polling • Assume each query and response is 100 bytes long (including data and header information) • For a network of 30 devices • (100 + 100) * 30 = 6,000 bytes/polling interval = 48,000 bits/polling interval • Polling every minute • 48,000 bits / 60 secs = 800 bits/second on average • 48,000 bits/polling interval * 60 polls/hour = 2,880,000 bits = 2.88 Megabits/hour • Polling every 10 minutes • 80 bits/second on average, 0.288 Megabits/hour • May not know about event for 10 minutes
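The polling arithmetic can be checked with a few lines of code. This is a sketch under the slide's assumptions (100-byte query, 100-byte response, 30 devices); the function name `polling_load` is made up for illustration:

```python
def polling_load(devices, query_bytes, response_bytes, interval_seconds):
    """Return (bits per polling cycle, average bits/second, bits/hour)."""
    bits_per_cycle = (query_bytes + response_bytes) * devices * 8
    bits_per_second = bits_per_cycle / interval_seconds
    bits_per_hour = bits_per_second * 3600
    return bits_per_cycle, bits_per_second, bits_per_hour


# Compare polling every minute with polling every 10 minutes.
for interval in (60, 600):
    cycle, per_sec, per_hour = polling_load(30, 100, 100, interval)
    print(f"every {interval} s: {cycle} bits/cycle, "
          f"{per_sec:.0f} bit/s, {per_hour / 1e6:.3f} Mbit/hour")
```

Lengthening the interval tenfold cuts the traffic tenfold, which is the timeliness versus bandwidth tradeoff in numeric form.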
Deciding Which Faults to Manage • Need to decide which faults to manage • Need to prioritise faults • If the number of fault reports is high, the network may not handle the volume • Limiting event traffic can reduce redundant transmissions and storage • Factors to consider • Scope of control over network • Size of network
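One way to picture prioritising faults and limiting redundant reports is a filter that drops repeats and sorts what remains by severity. The severity levels, tuple layout, and device names below are assumptions for illustration, not part of the slides:

```python
# Assumed severity scale: lower number means handle first.
SEVERITY = {"critical": 0, "major": 1, "minor": 2}


def prioritise(reports):
    """Drop repeated (device, fault) reports, then order by severity.

    Each report is a (device, fault, severity) tuple. Reports of equal
    severity keep their arrival order because sorted() is stable.
    """
    seen = set()
    unique = []
    for device, fault, severity in reports:
        if (device, fault) not in seen:
            seen.add((device, fault))
            unique.append((device, fault, severity))
    return sorted(unique, key=lambda r: SEVERITY[r[2]])


incoming = [
    ("host-a", "high_latency", "minor"),
    ("router-1", "link_down", "critical"),
    ("router-1", "link_down", "critical"),  # redundant repeat
    ("switch-2", "fan_failure", "major"),
]
ordered = prioritise(incoming)
for report in ordered:
    print(report)
```

A real system would likely filter closer to the source so that redundant reports never cross the network at all.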
Fault Management on a Network Management System • Simplest system • Reports existence of fault but NOT location • More complex tool • Uses capability of hosts and network devices to • Send critical network events • Facilitate isolation of fault cause • Advanced tool • Correction of fault
Impact of a Fault on the Network • A fault management tool MUST be capable of analysing how a fault can affect other areas of the network • Need to know • What services the fault • STOPS • IMPACTS • Not only that a fault has occurred but also how that fault affects other network communication • Data can come from performance management tools
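The STOPS versus IMPACTS distinction can be sketched with a very simple model: each service lists the alternative links or devices it can use, and a fault stops the service only if every alternative has failed. The model, the function name, and the example topology are all assumptions for illustration:

```python
def classify_impact(service_paths, failed):
    """Classify each service as "stopped", "impacted", or "unaffected".

    service_paths maps a service name to the alternative components
    (links or devices) it can run over; failed is a set of components
    that are currently down.
    """
    result = {}
    for service, paths in service_paths.items():
        down = [p for p in paths if p in failed]
        if not down:
            result[service] = "unaffected"
        elif len(down) == len(paths):
            result[service] = "stopped"    # no working alternative left
        else:
            result[service] = "impacted"   # degraded but still reachable
    return result


# Hypothetical topology: mail has a backup link, web does not.
paths = {
    "mail": ["link-1", "link-2"],
    "web": ["link-1"],
    "print": ["link-3"],
}
impact = classify_impact(paths, failed={"link-1"})
print(impact)
```

A production tool would also weigh performance data, as the slide notes, since a service on its backup path may be reachable but slow.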
Form of Reporting Faults • Common forms of fault reporting • Text • Graphical • Auditory signals • Text • Will work on any type of terminal
Form of Reporting Faults • Graphical • Considered to be very effective • Can use flashing images to gain attention • Colour can be used to indicate device status • Auditory signals • Will quickly call attention to the occurrence of a fault