
Fault Management

Fault Management. IACT 418/918 Autumn 2005 Gene Awyzio SITACS University of Wollongong. Overview. Fault Management is the process of locating and correcting network problems or faults. Comprehensive fault management is probably the most important task in Network Management.


Presentation Transcript


  1. Fault Management IACT 418/918 Autumn 2005 Gene Awyzio SITACS University of Wollongong

  2. Overview • Fault Management is the process of locating and correcting network problems or faults • Comprehensive fault management is probably the most important task in Network Management

  3. Benefits of Fault Management Process • Increased network reliability • Provides tools allowing an engineer to quickly • Detect problems • Initiate recovery procedures • Need to maintain the illusion of complete and continuous connectivity • Also provides tools to extract information about the network's current state

  4. Accomplishing Fault Management • Can be considered as a three (3) step process • Identify the fault • Isolate the cause of the fault • Correct the fault if possible

  5. Identifying the fault • Gathering Information to identify a problem • To learn that a problem exists we need to gather data about the current state of the network • Two approaches • Log critical network events • Poll network devices

  6. Identifying the fault • Critical network events • Examples • Failure of a link • Lack of response from host • Transmitted by network device when fault conditions occur • Reactive method • If device fails it cannot send an event

  7. Identifying the fault • Occasional Polling • Can help find faults in a timely manner • Tradeoff • Degree of timeliness vs bandwidth consumption • Other factors • Number of devices to poll • Bandwidth of links

  8. Identifying the fault • Example of Occasional Polling • Assume each query and each response is 100 bytes long (including data and header information) • For a network of 30 devices • (100 + 100) * 30 = 6,000 bytes/polling interval = 48,000 bits/polling interval • Polling every minute • 48,000 bits / 60 secs = 800 bits/second • 48,000 bits/polling interval * 60 polls/hour = 2,880,000 bits/hour = 2.88 Megabits/hour • Polling every 10 minutes • 80 bits/second • 0.288 Megabits/hour • May not know about event for 10 minutes
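
The polling-load arithmetic on this slide can be sketched as a small function. This is only an illustration under the slide's assumptions (30 devices, a 100-byte query and a 100-byte response per device); the function name `polling_load` and its parameters are hypothetical and not part of the original lecture.

```python
def polling_load(devices, query_bytes, response_bytes, interval_secs):
    """Return (bits per polling cycle, average bits/second, bits/hour).

    Assumes every device is polled exactly once per interval and that
    each poll costs one query plus one response.
    """
    bits_per_cycle = (query_bytes + response_bytes) * devices * 8
    bits_per_second = bits_per_cycle / interval_secs
    bits_per_hour = bits_per_second * 3600
    return bits_per_cycle, bits_per_second, bits_per_hour


# Compare polling every minute with polling every 10 minutes.
for interval in (60, 600):
    cycle, per_sec, per_hour = polling_load(30, 100, 100, interval)
    print(f"every {interval:>3} s: {cycle} bits/cycle, "
          f"{per_sec:.0f} bits/s, {per_hour / 1e6:.3f} Mbit/hour")
```

Lengthening the interval reduces the load proportionally, at the cost of a longer worst-case delay before a fault is noticed.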

  9. Deciding Which Faults to Manage • Need to decide which faults to manage • Need to prioritise faults • If the number of fault reports is high, the network may not handle the volume • Limiting event traffic can reduce redundant transmissions and storage • Factors to consider • Scope of control over network • Size of network
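
One way to picture prioritising faults and limiting redundant event traffic is the sketch below. It is an assumption-laden illustration, not a method from the lecture: the `FaultReport` record, the severity names in `SEVERITY_RANK`, and the 300-second suppression window are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical severity ranking; a lower number means more urgent.
SEVERITY_RANK = {"critical": 0, "major": 1, "minor": 2}


@dataclass(frozen=True)
class FaultReport:
    device: str
    fault: str
    severity: str
    timestamp: float  # seconds since some reference time


def filter_and_prioritise(reports, window_secs=300.0):
    """Drop repeats of the same (device, fault) that arrive within
    window_secs of the last kept report, then sort what remains by
    severity and then by time."""
    last_kept = {}
    kept = []
    for report in sorted(reports, key=lambda r: r.timestamp):
        key = (report.device, report.fault)
        previous = last_kept.get(key)
        if previous is not None and report.timestamp - previous < window_secs:
            continue  # redundant report: do not store or forward it
        last_kept[key] = report.timestamp
        kept.append(report)
    return sorted(kept, key=lambda r: (SEVERITY_RANK[r.severity], r.timestamp))


reports = [
    FaultReport("router-1", "link down", "critical", 10.0),
    FaultReport("host-7", "no response", "major", 5.0),
    FaultReport("router-1", "link down", "critical", 40.0),  # repeat
    FaultReport("switch-2", "high error rate", "minor", 1.0),
]
result = filter_and_prioritise(reports)
for report in result:
    print(report.severity, report.device, report.fault)
```

In practice the suppression window and the severity scale would depend on the size of the network and the scope of control over it.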

  10. Fault Management of a Network Management System • Simplest system • Reports existence of fault but NOT location • More complex tool • Uses capability of hosts and network devices to • Send critical network events • Facilitate isolation of fault cause • Advanced tool • Correction of fault

  11. Impact of a Fault on the Network • A fault management tool MUST be capable of analysing how a fault can affect other areas of the network • Need to know • What services the fault • STOPS • IMPACTS • Not only that a fault has occurred but also how that fault affects other network communication • Data can come from performance management tools
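
Impact analysis of this kind can be sketched as a walk over a dependency map. The topology in `DEPENDENTS` and the function `affected_by` are hypothetical examples, and the sketch only finds what lies downstream of a fault; deciding whether each service is stopped or merely impacted would need further data, for example from performance management tools.

```python
from collections import deque

# Hypothetical topology: each key lists what depends directly on it.
DEPENDENTS = {
    "core-router": ["switch-a", "switch-b"],
    "switch-a": ["mail-server"],
    "switch-b": ["web-server", "file-server"],
    "mail-server": ["email"],
    "web-server": ["intranet"],
    "file-server": [],
}


def affected_by(failed, dependents):
    """Breadth-first walk returning every element downstream of the
    failed one (the failed element itself is not included)."""
    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for child in dependents.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


print(sorted(affected_by("switch-b", DEPENDENTS)))
```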

  12. Form of Reporting Faults • Common forms of fault reporting • Text • Graphical • Auditory signals • Text • Will work on any type of terminal

  13. Form of Reporting Faults • Graphical • Considered to be very effective • Can use flashing images to gain attention • Colour can be used to indicate device status • Auditory signals • Will quickly call attention to the occurrence of a fault
