Network Monitoring

Prof. Choong Seon HONG

Network Monitoring
• Access to monitored information
  • How to define monitoring information, and how to get that information from a resource to a manager
• Design of monitoring mechanisms
  • How best to obtain information from resources
• Application of monitored information
  • How the monitored information is used in the various management functional areas

Network Monitoring Information
• Information types
  • Static: information that changes infrequently, e.g., port IDs and the number of ports
  • Dynamic: state and event information, e.g., the state of a protocol machine or the transmission of a packet
  • Statistical: information derived from dynamic information, e.g., the average number of packets transmitted
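A minimal sketch of how the three information types relate. The record types and field names below are hypothetical. A static record holds configuration, dynamic samples hold a cumulative packet counter read at two moments, and a statistical value (average packets per second) is derived from those samples.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StaticInfo:
    # Hypothetical static information: device configuration that rarely changes.
    port_ids: Tuple[int, ...]

    @property
    def number_of_ports(self) -> int:
        return len(self.port_ids)


@dataclass
class DynamicSample:
    # Hypothetical dynamic information: a cumulative packet counter at one moment.
    timestamp: float   # seconds
    packets_sent: int  # packets transmitted since the counter started


def average_packet_rate(samples: List[DynamicSample]) -> float:
    """Statistical information derived from dynamic samples:
    average packets transmitted per second over the sampled interval."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    first, last = samples[0], samples[-1]
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return (last.packets_sent - first.packets_sent) / elapsed


print(StaticInfo((1, 2, 3, 4)).number_of_ports)                                # 4
print(average_packet_rate([DynamicSample(0.0, 100), DynamicSample(10.0, 600)]))  # 50.0
```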

Network Monitoring Information (cont’d)
• Organization of a management information base by Mazumdar and Lazar (1991)
[Figure: three-level MIB organization. Static DB: a configuration DB (Server, Buffer, Source, Station_info, Switch_Server, Switch_Buffer, Switch_Source). Dynamic DB: a sensor DB (Status_Sensor, Derived_Status_Sensor, Event_Sensor), populated by sensor activation and data collection, plus State_Variable and Event_Variable. Statistical DB: Call_Blocked, Packet_Loss, Time_Delay, Throughput, obtained by abstraction of state and event variables.]

Network monitoring configurations
• Monitoring application: the user-visible monitoring functions, such as performance monitoring, fault monitoring, and accounting monitoring
• Manager function: performs the basic monitoring function
• Agent function: gathers and records management information from one or more network elements and communicates it to the monitor
• Managed objects: management information that represents resources and their activities
• Monitoring agent: generates summaries and statistical analyses of management information

Network monitoring configurations (cont’d)
• Functional architecture for network monitoring
[Figure: (a) Manager-agent model: a monitoring application with its manager function communicates with agent functions, each holding managed objects. (b) A model for summarization: a monitoring agent sits between the manager function and several agent functions with their managed objects.]

Network monitoring configurations (cont’d)
[Figure: monitoring configurations: managed resources in the manager system; resources in an agent system reached over a subnetwork or internet; an external monitor observing LAN traffic; a proxy monitor agent observing LAN traffic on behalf of a remote manager reached over a subnetwork or internet.]

Polling and event reporting
• Information is collected and stored by agents and may be used by multiple managers
• Two techniques for making the information available to managers:
  • Polling: a request-response interaction between a manager and an agent
    • the manager can query any agent and request the values of various information elements
    • the agent responds with information from its MIB
  • Event reporting: the initiative lies with the agent, and the manager takes the role of a listener
    • the agent reports its current status to the manager periodically; the reporting period may be preconfigured or set by the manager
    • the agent generates a report when a significant event occurs (e.g., a change of state) or an unusual event occurs (e.g., a fault)
    • more efficient than polling for monitoring objects whose states or values change relatively infrequently
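The efficiency claim above can be checked with a simple message count. This sketch assumes the manager polls once per time step (one request plus one response), while the agent sends a single report only when the value changes, and the manager already knows the initial value. The function names are hypothetical.

```python
from typing import List


def polling_messages(values: List[int]) -> int:
    """Polling: every cycle costs one request and one response."""
    return 2 * len(values)


def event_report_messages(values: List[int]) -> int:
    """Event reporting: one report each time the value changes."""
    reports = 0
    for previous, current in zip(values, values[1:]):
        if current != previous:
            reports += 1
    return reports


# A slowly changing object (e.g., an interface's operational state per step).
slow = [1, 1, 1, 1, 2, 2, 2, 2, 2, 1]
print(polling_messages(slow), event_report_messages(slow))  # 20 2
```

For an object that changes on nearly every step, the two counts converge. This is why event reporting pays off mainly for infrequently changing values.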

Polling and event reporting (cont’d)
• Telecommunications management systems: very high reliance on event reporting
• The SNMP approach puts very little reliance on event reporting
• SNMP and OSI systems management

Performance monitoring
• Performance indicators
  • An absolute prerequisite for managing a telecommunications network is measuring its performance, i.e., performance monitoring
  • Choosing appropriate indicators is difficult for the following reasons:
    • there are too many indicators in use
    • the meanings of most indicators are not yet clearly understood
    • some indicators are introduced and supported by only some manufacturers
    • most indicators are not suitable for comparison with each other
    • frequently, the indicators are accurately measured but incorrectly interpreted
    • in many cases, calculating the indicators takes too much time, and the final results can hardly be used for controlling the environment
  • Two categories: service-oriented measures (availability, response time, accuracy) and efficiency-oriented measures (throughput, utilization)

Performance monitoring (cont’d)
• Availability: the percentage of time that a network system, component, or application is available for a user
  • $A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}$, where MTBF is the mean time between failures and MTTR is the mean time to repair
• Availability of serial and parallel connections, with each unit having availability $A = 0.98$:
  • Serial: $A^2 = 0.98 \times 0.98 \approx 0.96$
  • Parallel: one unit is unavailable with probability $1 - 0.98 = 0.02$, both are unavailable with probability $0.02 \times 0.02 = 0.0004$, so the combined unit has availability $1 - 0.0004 = 0.9996$, i.e., $1 - (1 - A)^2 = 2A - A^2$
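The serial and parallel figures above follow from treating component failures as independent. That independence is an assumption of this sketch, and the helper names are hypothetical. A serial chain is up only if every component is up. A parallel group is down only if every component is down.

```python
def availability(mtbf: float, mttr: float) -> float:
    """A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def serial(*avail: float) -> float:
    """All components must be up: product of the availabilities."""
    result = 1.0
    for a in avail:
        result *= a
    return result


def parallel(*avail: float) -> float:
    """At least one component must be up: 1 - product of the unavailabilities."""
    all_down = 1.0
    for a in avail:
        all_down *= 1.0 - a
    return 1.0 - all_down


print(round(serial(0.98, 0.98), 4))    # 0.9604
print(round(parallel(0.98, 0.98), 4))  # 0.9996
```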

Performance monitoring (cont’d)
• Response time
  • Several studies show that when a computer and a user interact at a pace where neither has to wait on the other:
    • productivity increases significantly
    • the cost of the work done drops
    • quality tends to improve
  • A response time of up to two seconds was long considered acceptable for most interactive applications
  • User response time: the time span between the moment a user receives a complete reply to one command and the moment the user enters the next command; often referred to as think time
  • System response time: the time span between the moment the user enters a command and the moment a complete response is displayed

Performance monitoring (cont’d)
• Elements related to response time
  • Inbound terminal delay: the delay in getting an inquiry from the terminal to the communications line
  • Inbound queuing time: the time required for processing by the controller or PAD device
  • Inbound service time: the time to transmit over the communications link and nodes (from the controller to the host’s front-end processor)
  • Processor delay: the time spent processing in the front-end processor, the host processor, the disk drives, and so on
  • Outbound queuing time: the time a reply spends at a port in the front-end processor waiting to be dispatched to the network
  • Outbound service time: the time to transmit over the communications facility from the host’s front-end processor to the controller
  • Outbound terminal delay: the delay at the terminal, primarily due to line speed

Performance monitoring (cont’d)
• Elements of response time
  • TI = inbound terminal delay
  • WI = inbound queuing time
  • SI = inbound service time
  • CPU = CPU processing delay
  • WO = outbound queuing time
  • SO = outbound service time
  • TO = outbound terminal delay
[Figure: a PC reaches a server through a network and a network interface (e.g., a bridge); TI, WI, and SI lie on the inbound path, CPU at the server, and WO, SO, and TO on the outbound path.]
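A small sketch of the response-time breakdown. It assumes the seven elements occur one after another, so system response time is their sum. The class and the sample values are hypothetical. Finding the largest element points to where tuning would help most.

```python
from dataclasses import dataclass, astuple, fields


@dataclass
class ResponseTimeElements:
    # All values in seconds; field names follow the slide's symbols.
    ti: float   # inbound terminal delay
    wi: float   # inbound queuing time
    si: float   # inbound service time
    cpu: float  # CPU processing delay
    wo: float   # outbound queuing time
    so: float   # outbound service time
    to: float   # outbound terminal delay

    def total(self) -> float:
        """System response time, assuming the elements are strictly sequential."""
        return sum(astuple(self))

    def largest(self) -> str:
        """Name of the element that contributes most to the response time."""
        return max(fields(self), key=lambda f: getattr(self, f.name)).name


e = ResponseTimeElements(ti=0.01, wi=0.05, si=0.2, cpu=0.3, wo=0.05, so=0.2, to=0.01)
print(round(e.total(), 2), e.largest())  # 0.82 cpu
```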

Performance monitoring (cont’d)
• Accuracy
  • accurate transmission of data between a user and a host or between two hosts
  • provided by error-correction mechanisms in protocols such as the data link and transport protocols, so it is generally not a user concern
  • rejection rate: the percentage of time the network cannot transfer information because of a lack of resources and performance
    • a rejection rate above 2% indicates significant problems
• Throughput
  • an application-oriented measure, for example:
    • the number of transactions of a given type during a certain period of time
    • the number of customer sessions of a given application during a certain period of time
    • the number of calls in a circuit-switched environment
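A sketch of the rejection-rate check and a transaction-based throughput figure. The slide defines rejection rate as a percentage of time. This example approximates it as the percentage of transfer attempts rejected, which is a simplifying assumption. The 2% alarm level comes from the slide, and the function names are hypothetical.

```python
from typing import List

REJECTION_ALARM_PCT = 2.0  # above this, the slide says there are significant problems


def rejection_rate(attempts: List[bool]) -> float:
    """Percentage of transfer attempts rejected (False = rejected)."""
    if not attempts:
        return 0.0
    rejected = sum(1 for ok in attempts if not ok)
    return 100.0 * rejected / len(attempts)


def throughput(completed_transactions: int, period_s: float) -> float:
    """Completed transactions per second over a measurement period."""
    return completed_transactions / period_s


attempts = [True] * 97 + [False] * 3
rate = rejection_rate(attempts)
print(rate, rate > REJECTION_ALARM_PCT)  # 3.0 True
print(throughput(attempts.count(True), 60.0))
```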

Performance monitoring (cont’d)
• Goodput
  • the probability or rate of successful packet reception, i.e., counting only the packets that reach the receiver without being lost
• Utilization
  • a more fine-grained measure than throughput
  • determines the percentage of time that a resource is in use over a given period of time
  • used to search for potential bottlenecks and areas of congestion
  • response time usually increases exponentially as the utilization of a resource increases (see figure 2.10)
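Utilization as defined above is the fraction of a measurement window during which a resource is busy. This sketch assumes that busy periods are available as non-overlapping (start, finish) intervals. How they are collected depends on the device, and the function name is hypothetical. Intervals that cross the window edges are clipped.

```python
from typing import List, Tuple


def utilization(busy: List[Tuple[float, float]], start: float, end: float) -> float:
    """Fraction of [start, end) during which the resource was busy.

    busy: non-overlapping (begin, finish) intervals, in seconds.
    """
    window = end - start
    if window <= 0:
        raise ValueError("empty measurement window")
    in_use = 0.0
    for begin, finish in busy:
        # Clip each busy interval to the measurement window.
        overlap = min(finish, end) - max(begin, start)
        if overlap > 0:
            in_use += overlap
    return in_use / window


print(utilization([(0, 15), (30, 45)], 0, 60))  # 0.5
```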

Performance monitoring (cont’d)
• Collecting utilization data
  • On a bridge or router:
    • packet forwarding rate
    • percentage of dropped frames (on each interface)
    • number of packets in a queue
    • processor load
  • On a file server:
    • processor load
    • disk access rate
    • NIC utilization
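On devices that support SNMP, link utilization is commonly computed from two readings of an interface octet counter (such as IF-MIB `ifInOctets`) and the interface speed (`ifSpeed`). This sketch assumes the readings have already been polled. It shows only the arithmetic, including recovery from a single 32-bit counter wrap. The helper names are hypothetical, and it treats one direction of a full-duplex link.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP counters wrap to 0 after 2^32 - 1


def counter_delta(old: int, new: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Difference between two counter readings, allowing for one wrap-around."""
    return (new - old) % modulus


def link_utilization_pct(octets_old: int, octets_new: int,
                         interval_s: float, if_speed_bps: int) -> float:
    """Percentage of link capacity used in one direction over the interval."""
    bits = counter_delta(octets_old, octets_new) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


def dropped_frames_pct(discards_delta: int, frames_delta: int) -> float:
    """Percentage of frames dropped on an interface over the interval."""
    return 100.0 * discards_delta / frames_delta if frames_delta else 0.0


print(counter_delta(4_294_967_000, 1_000))               # 1296 (counter wrapped)
print(link_utilization_pct(0, 625_000, 1.0, 10_000_000))  # 50.0
```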

Performance monitoring (cont’d)
• Network management system
  • A simple tool
    • provides real-time information about network devices and links
    • preferably in graphical form, such as a line or bar graph
  • A more complex tool
    • allows thresholds to be set that can trigger a subsequent action
[Figure: utilization (%) over time (sec), with the alarm threshold at 60% and the alarm rearmed when utilization falls to 40%.]
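The 60% alarm and 40% rearm levels describe a threshold with hysteresis. After an alarm fires, it cannot fire again until utilization has fallen back to the rearm level. This prevents a flood of alarms while utilization hovers around 60%. The idea resembles the rising and falling thresholds of the RMON alarm group. The sketch below is a hypothetical standalone implementation, not RMON itself.

```python
from typing import Iterable, List, Tuple


def threshold_alarms(samples: Iterable[Tuple[float, float]],
                     alarm_at: float = 60.0,
                     rearm_at: float = 40.0) -> List[float]:
    """Return the times at which an alarm fires.

    samples: (time_s, utilization_pct) pairs in time order.
    """
    armed = True
    fired: List[float] = []
    for t, util in samples:
        if armed and util >= alarm_at:
            fired.append(t)   # trigger the subsequent action here
            armed = False
        elif not armed and util <= rearm_at:
            armed = True      # utilization fell far enough: rearm
    return fired


samples = [(0, 30), (10, 65), (20, 55), (30, 62), (40, 38), (50, 70)]
print(threshold_alarms(samples))  # [10, 50]
```

The sample at 30 s exceeds 60% but raises no alarm, because utilization has not yet dropped to 40% since the alarm at 10 s.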

Performance monitoring (cont’d)
• Thresholds have a priority (low, medium, high)
• Graphing historical data
  • line graph: examining trends in data such as utilization
  • bar graph: comparing values
  • pie graph: showing each value as a percentage of the whole
[Figure: example line graph of utilization (%) over time (seconds); bar graph of memory used (Kbytes) and packets passed (K) for 1/98, 2/98, and 3/98; pie graph of protocol shares: IP 35%, DECnet 29%, AppleTalk 21%, SNA 6%, OSI 5%, unknown 4%.]

Performance monitoring (cont’d)
• An advanced tool
  • examines the historical data
  • recognizes the state of the network and performance problems
  • retrieves information from the database
  • analyzes the state of the network
[Figure: utilization (%) over time (days, 0 to 180), showing the computed actual utilization, a predicted utilization trend, and a threshold value of 60%.]
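One simple way an advanced tool could turn historical data into a predicted-utilization line is an ordinary least-squares straight-line fit, extrapolated to the day the line crosses the threshold. The choice of a linear model is an assumption made for illustration. Real traffic growth may not be linear, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def fit_line(days: List[float], util: List[float]) -> Tuple[float, float]:
    """Least-squares fit util = intercept + slope * day; returns (intercept, slope)."""
    n = len(days)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(days) / n
    mean_y = sum(util) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("days must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, util))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def day_threshold_reached(days: List[float], util: List[float],
                          threshold: float = 60.0) -> Optional[float]:
    """Day on which the fitted trend reaches the threshold, or None if not rising."""
    intercept, slope = fit_line(days, util)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


print(round(day_threshold_reached([0, 30, 60, 90], [30, 33, 36, 39])))  # 300
```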

Performance Management
• Simulating the network
  • analyze future performance and determine which configuration can produce the greatest performance
  • build the network model
    • how the simulation should calculate each component
    • how each component should react during the simulation
  • Queuing analysis
[Figure: single-server queuing model: items wait in a waiting line (queue), are selected by a dispatching discipline, are served by a server, and depart; the quantities shown are service time, utilization, waiting time in the queue, and waiting time in the queuing system.]
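The quantities in the single-server queuing model can be computed in closed form under the classic M/M/1 assumptions: Poisson arrivals, exponential service times, one server, and first-come first-served dispatching. These assumptions are made for this sketch only, and the function name is hypothetical. With arrival rate λ and mean service time Ts, utilization is ρ = λ·Ts, time in the system is Ts/(1 − ρ), and waiting time in the queue is that minus Ts.

```python
from typing import Dict


def mm1(arrival_rate: float, service_time: float) -> Dict[str, float]:
    """Steady-state M/M/1 quantities (rates per second, times in seconds)."""
    rho = arrival_rate * service_time            # server utilization
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    time_in_system = service_time / (1 - rho)    # waiting time in the queuing system
    wait_in_queue = time_in_system - service_time  # waiting time in the queue
    return {
        "utilization": rho,
        "wait_in_queue": wait_in_queue,
        "time_in_system": time_in_system,
        "mean_queue_length": arrival_rate * wait_in_queue,  # Little's law
    }


print(mm1(arrival_rate=8.0, service_time=0.1))  # utilization 0.8, time in system about 0.5 s
```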

Performance Management
• Predicting response time, rejection rate, and availability
  • requires sufficient input
  • simulates traffic
[Figure: response time (4, 8, 12 sec) versus system load (utilization 0.2 to 0.8): the actual response time is measured up to the limit of experience and is extended beyond it by a projected response time curve.]
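Projecting response time beyond the limit of experience requires a model of how it grows with load. This sketch assumes M/M/1-like behavior, R = S / (1 − ρ). Under that assumption, a single measurement (R₀ at load ρ₀) gives the service time S = R₀(1 − ρ₀), from which response time at any higher load can be projected. The model choice and the function name are assumptions. A real projection would be checked against simulation or further measurement.

```python
def project_response_time(measured_rt: float, measured_load: float,
                          target_load: float) -> float:
    """Project response time at target_load from one (response time, load) measurement."""
    if not (0 <= measured_load < 1 and 0 <= target_load < 1):
        raise ValueError("loads must be in [0, 1)")
    service_time = measured_rt * (1 - measured_load)  # inferred S
    return service_time / (1 - target_load)


for load in (0.5, 0.8, 0.9):
    print(load, round(project_response_time(2.0, 0.5, load), 2))
# 0.5 2.0 / 0.8 5.0 / 0.9 10.0 (seconds)
```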

Performance Management
• Reporting performance information
  • text reports are the most common way
  • reports typically cover utilization and error rates
  • for network devices and links
  • data can also be presented in a graphical format or on a bitmapped display

Fault Management
• Problems of fault monitoring
  • Fault observation
    • unobservable faults (e.g., the existence of a deadlock)
    • partially observable faults (e.g., failure of a low-level protocol in an attached device)
    • uncertainty in observation (e.g., lack of response)
  • Fault diagnosis
    • multiple potential causes
    • too many related observations
    • interference between diagnosis and local recovery
    • absence of automated testing tools

Fault Management
• Propagation of failure to higher layers
[Figure: two clients connected through routers and multiplexers (Mux); a transmission break on the link propagates upward as a data link failure, a transport failure, and finally an application failure.]

Fault Management
• Examples of tests that a fault monitor should support:
  • connectivity test
  • data integrity test
  • protocol integrity test
  • data saturation test
  • connection saturation test
  • response time test
  • loopback test
  • function test
  • diagnosis test
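A minimal sketch combining two of the listed tests: a connectivity test (can a TCP connection be opened to a host and port?) and a crude response time test (how long did opening it take?). It uses only the standard `socket.create_connection` call. The choice of a TCP connect as the probe, and the function name, are assumptions. A fault monitor might equally use ICMP echo or an application-level request.

```python
import socket
import time
from typing import Tuple


def connectivity_test(host: str, port: int, timeout: float = 2.0) -> Tuple[bool, float]:
    """Try to open a TCP connection; return (reachable, elapsed seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established; close it immediately
    except OSError:
        return False, time.monotonic() - start
    return True, time.monotonic() - start


# Example (requires network access):
# ok, elapsed = connectivity_test("example.com", 80)
```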

Accounting Management
• Accounting monitoring
  • keeps track of users’ usage of network resources
• Resources
  • communication facilities: LANs, WANs, leased lines, dial-up lines, and PBX systems
  • computer hardware: workstations and servers
  • software and systems: applications and utility software in servers, a data center, and end-user sites
  • services: all commercial communications and information services available to network users
• Accounting data
  • user identification
  • receiver
  • number of packets
  • resources used
  • time stamps and priority level

Accounting Management
• Determining network resource usage
  • total number of transactions
  • number of logins
  • total number of packets
  • total number of bytes (reflecting bandwidth)
  • billing on output
  • bytes received
  • email
  • acknowledgments
  • security level
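A sketch of turning raw accounting records into per-user usage totals: transactions, packets, and bytes. The record layout loosely mirrors the accounting data listed earlier (user identification, receiver, number of packets, time stamp, priority). The field names and the one-record-per-transaction convention are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class AccountingRecord:
    # Hypothetical record: one record per transaction.
    user_id: str
    receiver: str
    packets: int
    octets: int
    timestamp: float
    priority: int = 0


def usage_by_user(records: Iterable[AccountingRecord]) -> Dict[str, Dict[str, int]]:
    """Total transactions, packets, and bytes for each user."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"transactions": 0, "packets": 0, "bytes": 0})
    for r in records:
        t = totals[r.user_id]
        t["transactions"] += 1
        t["packets"] += r.packets
        t["bytes"] += r.octets
    return dict(totals)


records = [
    AccountingRecord("alice", "srv1", 10, 1500, 0.0),
    AccountingRecord("alice", "srv2", 5, 700, 1.0),
    AccountingRecord("bob", "srv1", 1, 64, 2.0),
]
print(usage_by_user(records)["alice"])  # {'transactions': 2, 'packets': 15, 'bytes': 2200}
```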

Accounting Management
• Accomplishing accounting management
  • gathering data about the utilization of network resources
  • using metrics to help set usage quotas
  • billing users for their network use
• Metrics and quotas
  • SNMP
  • RFC 1272, “Internet Accounting: Background”
    • defines the services to be metered and usage reporting
    • defines the types of information necessary at various layers
  • metrics work together with quotas
• Billing
  • one-time installation fee
  • monthly fee
  • fee based on the amount of network resources consumed
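The three billing components above can be combined into a single bill. In this sketch, the rates, the per-gigabyte usage unit, and the rounding of partial gigabytes up are all hypothetical choices. Amounts are kept in integer cents to avoid floating-point rounding in money.

```python
def bill_cents(bytes_used: int, *, monthly_fee_cents: int, cents_per_gb: int,
               installation_fee_cents: int = 0) -> int:
    """Installation fee (first bill only) + monthly fee + usage-based fee."""
    gb = 10 ** 9
    usage_gb = -(-bytes_used // gb)  # ceiling division: partial GB billed as a full GB
    return installation_fee_cents + monthly_fee_cents + usage_gb * cents_per_gb


# First month: 2.5 GB used, billed as 3 GB.
print(bill_cents(2_500_000_000, monthly_fee_cents=2000, cents_per_gb=100,
                 installation_fee_cents=5000))  # 7300
```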

Accounting Management
• Network management system
  • A simple tool
    • monitors for metrics that exceed quotas
    • reports that data
  • A more complex tool
    • performs network billing
    • determines where to poll for billing information
  • An advanced tool
    • forecasts the need for network resources
    • establishes reasonable metrics and quotas
    • predicts billing costs
• Reporting
  • real-time display: the current value of a metric
  • text reports: historical accounting and billing information
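The simple tool’s job of monitoring for metrics that exceed quotas and reporting them can be sketched as a filter over per-user usage. This version assumes one numeric metric per user, per-user quotas with a default for users without one, and a report sorted by the size of the overage. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def quota_violations(usage: Dict[str, int], quotas: Dict[str, int],
                     default_quota: int) -> List[Tuple[str, int, int]]:
    """Return (user, used, quota) for users over quota, largest overage first."""
    report = []
    for user, used in usage.items():
        quota = quotas.get(user, default_quota)
        if used > quota:
            report.append((user, used, quota))
    report.sort(key=lambda row: row[1] - row[2], reverse=True)
    return report


usage = {"alice": 150, "bob": 80, "carol": 400}
print(quota_violations(usage, {"carol": 250}, default_quota=100))
# [('carol', 400, 250), ('alice', 150, 100)]
```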
