350 likes | 449 Views
Networking for the Grid. Yee-Ting Li eScience Summer School @ Edinburgh. What the GRID is. Worldwide Distributed System Interconnected with ‘networks’ Balancing processors, storage and network utilization Networking is important to make GRID work. Networking Important!.
E N D
Networking for the Grid Yee-Ting Li eScience Summer School @ Edinburgh
What the GRID is • Worldwide Distributed System • Interconnected with ‘networks’ • Balancing processors, storage and network utilization • Networking is important to make GRID work
Networking Important! • Only way two grid nodes can communicate with each other • Need ways of determining how ‘efficiently’ they talk • Focus on: • The characterising how they talk • The language they use to talk
Part 1 • Networking • Networking Monitoring • Networks are also transient • Network performance also varies as you’re sharing with n million other users • Sometimes you can notice periodic patterns – sometimes you can’t • Difficult to analyse and create trends/predictions • Show steps towards…
Networking 101 • Networking straight forward • Just connect to the network and it works! • HA!
Networking • Complex? Get’s more complex! • Each node has it’s own scheduling priorities • Routers must serve trillions of data units per second!
Networking • Complex stack from which data has to flow to get onto network • Each node on the network also has their own stacks • Routers have IPR on stacks – no one knows what Cisco stuff looks like!
Example Metrics • Connectivity • Delay • One-way delay • Two-way delay • Throughput / goodput • Network path • Loss • Jitter
Metrics Example • Video Conferencing • Needs predictable bit rate • Doesn’t usually matter if bit rate changes too much • Needs constant jitter • Low one-way delay preferable • FTP • Needs reliable transport • Throughput depends on urgency of data • Jitter and delay don’t matter
Network Monitoring Uses • Monitoring is measuring over long periods of time • Gives an indication of network performance over time – a baseline • Allows comparison of different tools for analysis • Allows analysis of how different protocols behave in different conditions – in real life • Allows ‘tuning’ of existing protocols to make most out of network
Possible Users of a NM Web Service • Network Managers • See how much bandwidth is being used • Network Analysts • Make things faster and better! • Resource Brokers • Broker to determine where to send jobs – Network Cost • Bandwidth Brokers • Allocate bandwidth depending on current network state • Replication Managers • Distribute data only when network is not busy • QoS Brokers (aka Managed bandwidth Services) • Universal language for intercommunication..? • Next Generation FTP • First look up historical throughputs before sending to determine best path
GridNM • Architecture for monitoring the network • Backend – collects data for presentation • Logs metrics in ASCII log files on a single host • Allows mesh measurements – all nodes performs measurements to al other nodes • Uses standard UNIX infrastructure – ssh • Should be easily adaptable to using Globus certifications once interactive processing is introduced in EDG.
GridNM (cont…) • Uses existing (and future tools) to collect metrics • Modular - uses XML to describe available resources • Hosts • Tools • Locks hosts if under measurement – prevents other tests affecting metrics • Currently monitoring 6 sites around Europe using 5 tools
Web Service Network Monitoring • GridNM just one Network Monitoring Program • Many different programs out there! • Unify data exchange between different monitoring infrastructures
piPEs • Internet2 e2ePI Architecture for network monitoring • Defines information flow to diagnose networks and hosts performance – white paper • Incorporates a ‘finger pointing’ mechanism to identify poor performers • Ideal starting point! • BUT… found out about it too late… • Currently investigating implementation with SLAC software + web service as possible implementation of piPEs software
GGF NMWG • Defines characteristics that are just the values that we are interested in • Defines classes of metrics, e.g. bandwidth, delay etc. that these characteristics report • Defines singleton and derived characteristics • Defines samples of data and their inherent sampling patterns • Timestamps • Still in draft form…
GGF NMWG cont. / Schema Design • As it’s all in XML, designing a XML schema to describe ‘objects’ to be passed around • XML Schema Document (XSD) • Focusing actually implementing what the NMWG document says… and doesn’t say… • Note: We are also tackling this from a pure OO design too – however, due to technical differences between objects in C++, Java and SOAP/XML then there may be issues to overcome…
Part 2 • Network Communication Languages • Known as transport protocols - determines how applications put traffic into the network • Sits on top of IP – common language of the internet
Transport Level Protocols • TCP (HTTP, FTP, GridFTP) used for file transfer • Gives guarantee on delivery • All data is copied precisely • Performance can be poor • Respects other internet users • UDP (Real, H323) used for video conferencing • Gives no guarantees on delivery • Data may be incomplete • Performance good • Doesn’t respect other internet users
UDP vs TCP • Udp: min=274, max=565, ave=493, stdev=43 • Tcp: min=37, max=292, ave=195, stdev=40 • Summary: tcp is rubbish! – why?
Memory and Disk transfers Fast Ethernet Over 60Mbits/s iperf >> file copy OC3 Disk limited File copy disk-to-disk Iperf TCP Mbits/s Les Cottrell, SLAC
What does TCP do? Socket buffer size • TCP retransmits lost data • Even retransmits data it ‘thinks’ has been lost! • Needs and uses a ‘windowing’ system • Uses ACKnowledgements from reciever • Grows a Congestion Window ‘cwnd’ to determine the size of window • Model: • Tap is independent of Tank size • Tank filled by application • Valve opening (data rate) determined by feedback from network • Small tanks mean small data rate • Large tanks mean larger data rate TCP Protocol Network
TCP socket buffer sizes • Iperf observations: 490 • Standard socket buffer graph • Shows linear(ish) region followed by plateau • Optimal socket buffer size just over 2mB
Retransmitted Data • Graph shows the amount of retransmitted data against the throughput • Retransmitted data is due to loss on the network • General case ACK’s have to timeout before resending • We get more retransmitted data for low throughputs with large windows
Measuring Performance of Transport Level Protocols • Need to identify what we want to measure – the metrics. • Dependant on the use of the transport protocol. Need to analyse application level usage • For Grid: • Movement of ‘transient’ data • File Transfer and Replication • process jobs or ‘sandboxes’ • Movement of Real-Time Data • Video Conferencing – Access Grid • Real-Time applications
Web 100 & TCP • OSI states that we should not know anything about the separate layers • How do we know something is going wrong? – your throughput decreases! • Prevents congestion collapse! • Need Web100! Allows in depth tcp stack analysis per flow • Kernel patch – 2.4.16, alpha1.2 • New version – 2.4.19 alpha2.0pre1 • Using program to grab web100 results - logvars
Reliability of Web100 results… • Still alpha… but reliable • Graph against iperf throughputs correlate very well • At least as reliable as the result offered by iperf!
Congestion Window • Looking at the max_cwnd achieved for each measurement… • Appears to be two regions • with high correlation of throughput and max cwnd • A linear region where we get the a range of throughputs for same max_cwnd • Cwnd never grows beyond 1500kbytes!
Bandwidth Delay Product • Window = bandwidth * delay • We want • Bandwidth = 1,000,000,000 bit/sec • We have • Delay = 19ms • Window needs to be an average of… • =1e+9 * 19e-3 / 8 bytes • =2.25mbytes! • We only achieve ~1.5mbytes max! • Need to implement some monitoring of the degree of the average and variation of cwnd for each tcp connection…
TCP Optimisation • It’s actually TCP that is limiting our transfer rates! • All applications use it! • Understandable as TCP hasn’t changed much for the last 15-20 years! • When standard link was about 56kbit/sec! • Solution: Need new TCP implementations!
What is High Speed TCP? • Changes the way TCP behaves at high speed (ie large cwnd) • Standard TCP has two modes • Slow start (not very slow…) • Congestion Avoidance • Focuses on Congestion Avoidance Region – ie when TCP knows (thinks it knows…) how well the network behaves… • BUT only when we are at high speeds, else do what normal Standard TCP does… • Readily deployable 1st step towards Equation Based Congestion Control
What does it do? • Standard TCP uses two parameters • Increase parameter, a • Decrease parameter, b • i.e. AIMD( a,b ) • Standard TCP uses • a=1 • b=0.5 • High Speed TCP introduces • a->a(cwnd) • b->b(cwnd) • i.e. The value of a and b depends on the current congestion window size • If we increase a more with larger cwnd we can get back up to our ‘optimal’ cwnd size for the network path • If we decrease b less we don’t lose as much bandwidth due to a small congestion window
What exactly does it do? • Based on the TCP response function • Relates loss and throughput • Uses the TCP response function to investigate certain parameters • High_Window, High_Loss; largest cwnd needed for x throughput and the required loss for that throughput • Low_Window, Low_Loss; smallest cwnd when we actually switch from Standard TCP and the required loss rate for that cwnd size • High_B; the smallest decrease in b when we are at a large cwnd • Equations to transform this information into a table for a(cwnd) and b(cwnd)