GridNM Network Monitoring Architecture (and a bit about my PhD) Yee-Ting Li, <ytl@hep.ucl.ac.uk> 1st Year Report @ UCL, 17th June 2002
What the GRID is • A distributed system • Interconnected by networks • Balances processor, storage and network utilisation • Like the SETI project on steroids • Networking is important to make the GRID work GridNM - Yee-Ting Li
Networking is Important! • The only way two grid nodes can communicate with each other • Need ways of determining how ‘efficiently’ they talk • Focus on: • Characterising how they talk • The language they use to talk
Part 1 • Network Metrics and Measurement • GridNM • Case studies
Network Metrics / Characteristics • Metric (RFC 2330): ‘several quantities related to the performance and reliability of the Internet that we'd like to know the value of. When such a quantity is carefully specified, we term the quantity a metric.’ • Can be empirical or derived • Singleton, sample and statistical metrics
Example Metrics • Connectivity • One-way delay • Two-way delay • Throughput / goodput • Network path • Loss • Jitter
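Two of these metrics, loss and jitter, can be derived from a run of probe packets. The following is an illustrative sketch, not the method used by GridNM's tools. It assumes each probe records a one-way delay in milliseconds, or `None` if the probe was lost. The jitter estimator is the smoothed form from RFC 3550 (RTP), which is only one of several definitions in use.

```python
from typing import List, Optional


def loss_rate(delays_ms: List[Optional[float]]) -> float:
    """Fraction of probes lost; a lost probe is recorded as None."""
    if not delays_ms:
        return 0.0
    lost = sum(1 for d in delays_ms if d is None)
    return lost / len(delays_ms)


def interarrival_jitter(delays_ms: List[float]) -> float:
    """Smoothed jitter in the style of RFC 3550: J += (|D| - J) / 16,
    where D is the change in delay between consecutive received packets."""
    jitter = 0.0
    for prev, cur in zip(delays_ms, delays_ms[1:]):
        jitter += (abs(cur - prev) - jitter) / 16
    return jitter


if __name__ == "__main__":
    probes = [10.0, 12.0, None, 11.0]
    print(loss_rate(probes))  # 0.25
    received = [d for d in probes if d is not None]
    print(interarrival_jitter(received))
```

Lost probes are skipped before the jitter calculation, which simplifies things. A real tool would also track sequence numbers to detect reordering.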
Metrics Example • Video Conferencing • Needs predictable bit rate • Doesn’t usually matter if bit rate changes too much • Needs constant jitter • Low one-way delay preferable • FTP • Needs reliable transport • Throughput depends on urgency of data • Jitter and delay don’t matter
Measurement Methodology • How to get the metrics • Must be repeatable – need to define methodology carefully • Direct measurement of a performance metric using injected test traffic. • Projection of a metric from lower-level measurements. • Estimation of a constituent metric from a set of aggregated measurements. • Estimation of a given metric at one time from a set of related metrics at other times.
Measurement Example • ‘ping’ measures RTT – a direct measurement • Sending a single ‘ping’ gives a singleton – empirical • Sending 10 pings (a sample) and taking the average gives a statistical metric – derived • Using a set of measurements over time, we can derive an estimate of the RTT • Projection: if we had the one-way delay (OWD) from each router to the next, we could add them all up to get the path OWD
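The four kinds of measurement above can be expressed as small functions. This is a sketch that assumes RTT and OWD values in milliseconds. The "estimate over time" uses an exponentially weighted moving average with a gain of 1/8, borrowed from TCP's smoothed RTT. That is an assumption; the slide does not say which estimator is meant.

```python
import statistics
from typing import List


def singleton(rtt_ms: float) -> float:
    """One ping reply: an empirical singleton."""
    return rtt_ms


def sample_mean(rtts_ms: List[float]) -> float:
    """Average over a sample, e.g. 10 pings: a derived statistical metric."""
    return statistics.mean(rtts_ms)


def smoothed_estimate(rtts_ms: List[float], alpha: float = 0.125) -> float:
    """Estimate from measurements over time: an exponentially weighted
    moving average (alpha = 1/8 mirrors TCP's SRTT gain; an assumption)."""
    estimate = rtts_ms[0]
    for rtt in rtts_ms[1:]:
        estimate = (1 - alpha) * estimate + alpha * rtt
    return estimate


def projected_path_owd(hop_owds_ms: List[float]) -> float:
    """Projection: sum the per-hop one-way delays to get the path OWD."""
    return sum(hop_owds_ms)


if __name__ == "__main__":
    pings = [10.0, 12.0, 11.0, 13.0, 10.0, 11.0, 12.0, 10.0, 11.0, 10.0]
    print(sample_mean(pings))                   # 11.0
    print(projected_path_owd([1.5, 2.0, 3.5]))  # 7.0
```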
Network Monitoring Uses • Monitoring is measuring over long periods of time • Gives an indication of network performance over time – a baseline • Allows comparison of different tools for analysis • Allows analysis of how different protocols behave in different conditions – in real life • Allows ‘tuning’ of existing protocols to make the most of the network
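One simple way to use a long-term baseline is to compare each new measurement against the median of the history. The sketch below takes that approach. The median baseline and the 1.5× threshold are hypothetical choices made for illustration. They are not taken from GridNM.

```python
import statistics
from typing import List


def baseline(history_ms: List[float]) -> float:
    """Baseline: the median of long-term measurements (robust to outliers)."""
    return statistics.median(history_ms)


def is_anomalous(sample_ms: float, history_ms: List[float], factor: float = 1.5) -> bool:
    """Flag a new measurement that exceeds `factor` times the baseline.
    The default threshold is an arbitrary, hypothetical choice."""
    return sample_ms > factor * baseline(history_ms)
```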
GridNM • Architecture for monitoring the network • Backend – collects data for presentation • Logs metrics in ASCII log files on a single host • Allows mesh measurements – every node performs measurements to all other nodes • Uses standard UNIX infrastructure – ssh • Should be easily adaptable to using Globus certificates once interactive processing is introduced in EDG
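A mesh of measurements is every ordered pair of distinct hosts. The sketch below enumerates that mesh and writes one ASCII line per result. It assumes directional tests, because one-way delay and throughput can differ between A→B and B→A. The host names and the log field layout are hypothetical; the slides do not specify GridNM's actual log format.

```python
import itertools
from typing import Iterable, List, Tuple


def mesh_pairs(hosts: List[str]) -> List[Tuple[str, str]]:
    """Every node measures to every other node (ordered pairs, no self-tests)."""
    return list(itertools.permutations(hosts, 2))


def log_record(timestamp: int, src: str, dst: str, tool: str, value: float) -> str:
    """One whitespace-separated ASCII line; this field layout is hypothetical."""
    return f"{timestamp} {src} {dst} {tool} {value}"


def append_records(path: str, records: Iterable[str]) -> None:
    """Append records to a plain-text log on the single collecting host."""
    with open(path, "a", encoding="ascii") as log:
        for record in records:
            log.write(record + "\n")
```

With N hosts the mesh has N × (N − 1) tests, so 6 sites give 30 directed pairs per tool.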
GridNM (cont…) • Uses existing (and future) tools to collect metrics • Modular – uses XML to describe available resources • Hosts • Tools • Locks hosts under measurement – prevents other tests affecting metrics • Currently monitoring 6 sites around Europe using 5 tools
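The XML description and host locking could look roughly like the sketch below. The element and attribute names (`resources`, `host`, `tool`, `needs_server`) are hypothetical and are not GridNM's real schema. Locking here is in-process only: a test runs only if both of its endpoints can be claimed, so two tests never overlap on the same host.

```python
import threading
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List

# Hypothetical resource description: hosts and tools, as listed on the slide.
CONFIG = """
<resources>
  <host name="hostA" address="a.example.org"/>
  <host name="hostB" address="b.example.org"/>
  <tool name="ping" command="ping -c 10"/>
  <tool name="iperf" command="iperf -c" needs_server="yes"/>
</resources>
"""


def parse_resources(xml_text: str) -> Dict[str, List[str]]:
    """Return the host and tool names declared in the XML."""
    root = ET.fromstring(xml_text.strip())
    return {
        "hosts": [h.get("name") for h in root.findall("host")],
        "tools": [t.get("name") for t in root.findall("tool")],
    }


class HostLocks:
    """One lock per host; a test may start only if both endpoints are free."""

    def __init__(self, hosts: Iterable[str]) -> None:
        self._locks = {h: threading.Lock() for h in hosts}

    def try_acquire(self, src: str, dst: str) -> bool:
        # Non-blocking: if either host is busy, skip this test for now.
        if not self._locks[src].acquire(blocking=False):
            return False
        if not self._locks[dst].acquire(blocking=False):
            self._locks[src].release()
            return False
        return True

    def release(self, src: str, dst: str) -> None:
        self._locks[src].release()
        self._locks[dst].release()
```

A scheduler built this way would skip busy pairs and retry them later, rather than run a test that competes with another for the same host.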
GridNM ‘plot’
Security • As secure as SSH • But requires automatic logon • Denial of service attacks • Certain tools (e.g. iperf) require servers to be running • GridNM starts the server on the remote host before each test (unless told not to)
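Starting a tool's server remotely before a test comes down to building the right ssh commands. The sketch below only constructs the command lists, and a caller would pass each one to `subprocess.run`. `BatchMode=yes` makes ssh fail instead of prompting for a password, which suits unattended logon with keys. Using classic `iperf -s` / `iperf -c` and stopping the server with `pkill -x iperf` are assumptions about the remote setup.

```python
import shlex
from typing import List


def ssh_cmd(host: str, remote: str, background: bool = False) -> List[str]:
    """ssh invocation that never prompts for a password (BatchMode)."""
    cmd = ["ssh", "-o", "BatchMode=yes"]
    if background:
        cmd.append("-f")  # ssh goes to the background before running the command
    cmd += [host, remote]
    return cmd


def iperf_test_cmds(server: str, client: str, seconds: int = 10) -> List[List[str]]:
    """Start the iperf server, run the client against it, then stop the server
    so it is not left listening (limits exposure to denial-of-service)."""
    start_server = ssh_cmd(server, "iperf -s", background=True)
    run_client = ssh_cmd(client, f"iperf -c {shlex.quote(server)} -t {seconds}")
    stop_server = ssh_cmd(server, "pkill -x iperf")
    return [start_server, run_client, stop_server]
```

Running the three commands in order with `subprocess.run(cmd, check=True)` would perform one test. Stopping the server afterwards is a design choice here, because a permanently listening iperf server is the denial-of-service risk the slide mentions.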
Tool Examples
UDP versus TCP
RTT – good network
RTT – periodicity
RTT – bad network
RTT – bad network, loss
TCP / Iperf Throughput
TCP Performance
TCP Performance
What does TCP do? • Socket buffer size = tank size • Tap is independent of tank size • Tank filled by application • Valve opening (data rate) determined by feedback from network • Small tanks mean a small data rate • Large tanks mean a larger data rate • Even larger tanks mean a smaller data rate?! (Diagram: socket buffer → TCP protocol → network)
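The "small tank, small rate" behaviour follows from a standard bound: a single TCP flow can have at most one buffer (window) of data in flight per round trip. Its rate is therefore at most buffer / RTT, capped by the link capacity. The sketch below applies that bound to a hypothetical 20 ms RTT and 1000 Mbit/s link. It simplifies by ignoring the congestion and receiver windows. The model predicts that the rate rises and then flattens once the buffer exceeds the bandwidth-delay product. It does **not** reproduce the drop seen with even larger buffers, which is the open question on this slide.

```python
def window_limited_rate_mbps(buffer_bytes: float, rtt_ms: float, link_mbps: float) -> float:
    """Upper bound on one TCP flow: one buffer of data per round trip,
    never more than the link capacity."""
    rate_mbps = buffer_bytes * 8 / (rtt_ms / 1000) / 1e6
    return min(rate_mbps, link_mbps)


def bdp_bytes(link_mbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: the buffer needed to keep the path full."""
    return link_mbps * 1e6 * (rtt_ms / 1000) / 8


if __name__ == "__main__":
    # Hypothetical path: 20 ms RTT, 1000 Mbit/s bottleneck.
    for buf in (16_384, 65_536, 1_048_576, 8_388_608):
        print(buf, round(window_limited_rate_mbps(buf, 20, 1000), 1))
    print("BDP bytes:", bdp_bytes(1000, 20))
```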
Investigation • Possible explanation: • Rate of tank filling < rate of water flowing out • i.e. application not fast enough to fill socket buffer past threshold • BUT – needs further investigation • Back-to-back lab tests with PCs and routers • Comparison with other TCP-based tools
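The proposed explanation can be illustrated with a toy model of the tank. On each tick the application adds data, then TCP sends up to a fixed amount. This is a hypothetical sketch: real TCP drains at a rate that changes with feedback, and the slide does not give the threshold value. It only shows that when filling is slower than draining, the buffer never builds up, however large it is.

```python
from typing import List


def simulate_buffer(fill_per_tick: float, drain_per_tick: float,
                    capacity: float, ticks: int) -> List[float]:
    """Toy socket-buffer ('tank') model. Each tick the application adds data
    (tap), then TCP sends up to drain_per_tick (valve). Returns the buffer
    level after each tick."""
    level = 0.0
    history = []
    for _ in range(ticks):
        level = min(capacity, level + fill_per_tick)  # tap fills the tank
        level -= min(level, drain_per_tick)           # valve lets data out
        history.append(level)
    return history


def reaches(history: List[float], threshold: float) -> bool:
    """Did the buffer ever fill past a (hypothetical) threshold?"""
    return any(level >= threshold for level in history)
```

For example, `simulate_buffer(5, 10, 100, 10)` stays empty throughout, whereas `simulate_buffer(10, 5, 100, 30)` climbs until the tank is full.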
Part 2 • Network Communication Languages • Known as transport protocols – determine how applications put traffic into the network • Sit on top of IP – the common language of the Internet
Transport Level Protocols • TCP (HTTP, FTP, GridFTP) used for file transfer • Guarantees delivery • All data is copied precisely • Performance can be poor • Respects other Internet users (congestion control) • UDP (Real, H.323) used for video conferencing • Gives no guarantees on delivery • Data may be incomplete • Performance good • Doesn’t respect other Internet users (no congestion control)
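The different delivery guarantees are visible in the standard sockets API. A TCP stream socket (`SOCK_STREAM`) is connection-oriented and retransmits lost data. A UDP datagram socket (`SOCK_DGRAM`) sends each packet once and never resends it. The sketch below runs on loopback, which rarely drops anything, so it shows the semantics rather than real loss or performance. Congestion control is not modelled.

```python
import socket
import threading


def tcp_transfer(payload: bytes) -> bytes:
    """Send bytes over a loopback TCP connection and return what the receiver
    read. TCP acknowledges and retransmits, so the stream arrives complete
    and in order."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = bytearray()

    def receive() -> None:
        conn, _ = server.accept()
        with conn:
            while chunk := conn.recv(4096):  # b"" means the sender closed
                received.extend(chunk)

    receiver = threading.Thread(target=receive)
    receiver.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)
    receiver.join()
    server.close()
    return bytes(received)


def udp_transfer(count: int, size: int = 512) -> int:
    """Fire `count` UDP datagrams at a local receiver and return how many
    arrived. There are no acknowledgements, so a dropped datagram is gone."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(0.2)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for _ in range(count):
        sender.sendto(bytes(size), receiver.getsockname())  # fire and forget
    arrived = 0
    try:
        while arrived < count:
            receiver.recv(size)
            arrived += 1
    except socket.timeout:
        pass  # nothing more is coming; UDP will not resend
    sender.close()
    receiver.close()
    return arrived
```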
UDP versus TCP performance at high speeds
Measuring Performance of Transport Level Protocols • Need to identify what we want to measure – the metrics • Dependent on how the transport protocol is used; need to analyse application-level usage • For Grid: • Movement of ‘transient’ data • File transfer and replication • Processing jobs or ‘sandboxes’ • Movement of real-time data • Video conferencing – Access Grid • Real-time applications
Transport Protocols ‘NG’
Tools to Measure Grid Traffic • E.g. TCP • Can use web100 – allows analysis of TCP traffic via the fundamental variables of TCP/IP • GridFTP allows logging of transfer information • UDP (UDP Blast, Tsunami) • Need either transport-level recording (like web100) or application monitoring • PGM / CC • Need the application to be built to use the transport protocol • General solution • Gather SNMP data from nodes along the network
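With SNMP, throughput usually comes from polling an interface octet counter twice and dividing the difference by the interval. Two examples are IF-MIB's 32-bit `ifInOctets` and 64-bit `ifHCInOctets`. The sketch below does only that arithmetic, allowing for one counter wrap between polls. It leaves out the actual SNMP polling because no specific SNMP library is assumed here.

```python
def counter_rate_bps(prev_octets: int, cur_octets: int, interval_s: float,
                     counter_bits: int = 32) -> float:
    """Average bit rate between two polls of an interface octet counter.
    The modulo handles a single wrap of a 32- or 64-bit counter."""
    delta = (cur_octets - prev_octets) % (2 ** counter_bits)
    return delta * 8 / interval_s
```

A 32-bit counter wraps after about 34 seconds at 1 Gbit/s, so on fast links the 64-bit counters (`counter_bits=64`) or short polling intervals are needed.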
Future Directions (the PhD bit) • Provisional title: • Providing Advanced Transport Protocols for Grid Applications • Aim: use the GridNM infrastructure to analyse the performance of different transport protocols • Implement findings in Grid infrastructure, e.g. GridFTP, to improve Grid processes (processing jobs, file transfer, file replication, Access Grid…)
Conclusion • Created a flexible infrastructure to monitor and analyse Internet traffic • Shown metrics for different scenarios • Given a performance overview of current transport protocols • Identified future areas of research into transport protocols for the Grid