The importance of the network [Figure: Physical Network View vs. Overlay View] • From a distributed systems standpoint, the physical network provides the backbone for overlays. • Distributed systems developers take for granted that a node can talk (reliably, if need be) to any other node in the same physically connected component via some identifier (say, an IP address) • Tools, such as network coordinates, can help developers
CS525: What about the network? End-to-End Arguments in System Design. J.H. Saltzer, D.P. Reed, D.D. Clark
Overview • The end-to-end argument • Examples/applications of the argument • Discussion
Function placement • Should functions that end users/applications perform be implemented at lower or higher levels? • If we want to transfer a file reliably, what should be the job of each computer subsystem? • How about encryption? Delivery Acknowledgement? Duplicate message suppression?
End-to-end argument: The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible. (Sometimes an incomplete version of the function provided by the communication system may be useful as a performance enhancement.)
Careful file transfer [Figure: file F moving from Computer A to Computer B] • The file transfer program on A asks the file system to read F from disk • The file transfer program on A asks the communication system to send the file • The communication system transmits packets • The communication system gives F to the file transfer program on B • The file transfer program on B asks the file system to write F to disk
What can go wrong? [Figure: Computer A and Computer B with failure points labeled A, B, C] • (A) Reading from and writing to the file system • (B) Breaking up / reassembling the file • (C) Transmitting the file over the communication system
Possible solution #1 • Make each step reliable through some form of error checking: duplicate copies, redundancy, timeout and retry, etc. • Packet error checking at each hop • Send every packet three times • Acknowledge packet reception at each hop
Problems with this solution • Not complete: application-level checking is still required • May not be economical [Figure: Computer A and Computer B, failure points A and B]
Possible solution #2 • “End-to-end check and retry” • The application commits or retries based on the checksum value. • If errors along the way are rare, this will most likely finish on the first try.
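A minimal Python sketch of end-to-end check and retry, under stated assumptions: the hypothetical `unreliable_send` stands in for every step between the two applications (disk reads, packetization, transmission), silently corrupting one byte with probability `error_rate`; SHA-256 plays the role of the checksum; and the sender's checksum is assumed to reach the receiver intact. The receiving application commits only when its own checksum matches, otherwise it retries the whole transfer.

```python
import hashlib
import random


class TransferFailed(Exception):
    """Raised when no attempt passes the end-to-end check."""


def checksum(data: bytes) -> str:
    # SHA-256 stands in for "checksum"; any strong digest would do.
    return hashlib.sha256(data).hexdigest()


def unreliable_send(data: bytes, error_rate: float, rng: random.Random) -> bytes:
    # Hypothetical model of the whole path from A's disk to B's disk:
    # with probability error_rate, one byte is silently corrupted.
    if data and rng.random() < error_rate:
        corrupted = bytearray(data)
        corrupted[rng.randrange(len(data))] ^= 0xFF
        return bytes(corrupted)
    return data


def careful_transfer(data: bytes, error_rate: float,
                     max_tries: int = 10, seed: int = 0) -> tuple[bytes, int]:
    rng = random.Random(seed)
    expected = checksum(data)  # computed by the sending application
    for attempt in range(1, max_tries + 1):
        received = unreliable_send(data, error_rate, rng)
        if checksum(received) == expected:  # end-to-end check at the receiver
            return received, attempt        # commit
        # mismatch: retry the entire transfer
    raise TransferFailed(f"no intact copy after {max_tries} attempts")
```

The check is correct no matter how unreliable the path is; making lower levels more reliable would only change how often the retry loop runs.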
Performance • Reliability at lower levels can serve as a performance booster • e.g., when transferring large files • Regardless of how reliable the communication system is, the end-to-end check must still be done • The tradeoff is based on performance, not correctness • Is the effort put into lower-level reliability worth the performance gain?
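A back-of-the-envelope sketch of why lower-level reliability pays off for large files, assuming each of n packets is independently corrupted with probability p and that any bad packet forces the end-to-end check to retry the whole file. One attempt then succeeds with probability (1 - p)^n, and the expected number of attempts is the mean of a geometric distribution, 1 / (1 - p)^n.

```python
def attempt_success_probability(p: float, n_packets: int) -> float:
    """Probability that all n packets of one attempt arrive intact (independent errors assumed)."""
    return (1.0 - p) ** n_packets


def expected_attempts(p: float, n_packets: int) -> float:
    """Mean of a geometric distribution: expected whole-file attempts until one passes the check."""
    return 1.0 / attempt_success_probability(p, n_packets)


if __name__ == "__main__":
    for n in (1, 100, 1000):
        print(n, round(expected_attempts(0.01, n), 1))
```

With a hypothetical p = 0.01, a 1,000-packet file needs roughly 23,000 attempts on average, whereas per-hop retransmission that repairs individual packets would let the end-to-end check pass almost always on the first try. The check itself stays; only its cost changes.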
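Delivery guarantee [Figure: Computer A sends a message to Computer B and the network replies with an RFNM; in the second case, the application on Computer B replies "got it"] • ARPANET returns an RFNM (Request For Next Message) to acknowledge successful message delivery • Is this really useful to the end application?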
Data encryption in the communication system • The communication system needs the keys • Data is in cleartext inside the host before it reaches the application • An authenticity check must still be performed
Data encryption at the end points • Keys are maintained by the end applications • Data stays in ciphertext until it reaches the application • Authenticity by default (assuming both keys are private)
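The authenticity side of end-point key management can be sketched with Python's standard `hmac` module, assuming a hypothetical secret key shared only by the two end applications. This covers authenticity only; confidentiality would additionally need a cipher, which the standard library does not provide, so encryption itself is omitted. Because the key never leaves the applications, tampering anywhere in between, including inside the communication system, is detected at the end point.

```python
import hashlib
import hmac

# Hypothetical key, held only by the applications on A and B.
SHARED_KEY = b"example-key-held-by-end-applications"


def seal(message: bytes, key: bytes) -> tuple[bytes, bytes]:
    # Sender's application attaches a MAC computed with its own key.
    tag = hmac.new(key, message, hashlib.sha256).digest()
    return message, tag


def verify(message: bytes, tag: bytes, key: bytes) -> bytes:
    # Receiver's application recomputes the MAC; compare_digest avoids timing leaks.
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authenticity check failed")
    return message
```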
Identifying the ends • Low-level error checking and retransmission is bad for real-time voice transfer because it adds delay: high-level error handling is better. • However, low-level reliability measures may be fine if the voice is being stored.
Discussion: Layering model [Figure: protocol stacks of Computer A, a router, and Computer B; the end hosts run Application, Transport, Network, Data Link, and Physical layers, while the router runs only Network, Data Link, and Physical] • TCP (usually) runs only at end hosts • Does TCP violate the end-to-end argument by sitting below the application? • Is giving the application the choice of TCP or UDP the way to go?
Discussion: TCP splitting • Performance is much better in the wired section • The intermediate node acts as an end host • What else can we do?
Discussion: Spam • The end user for email is generally considered to be a human. • By the end-to-end argument, the network should deliver all mail to the user. • Are spam control mechanisms therefore in violation of the end-to-end argument? • If so, is it an appropriate violation?
Discussion: End-to-end today • Is the end-to-end argument still valid today? • Is hardware good enough that we don’t have to worry about end checks? • Applications are becoming more and more complex. • Do P2P systems, such as Chord, violate end-to-end? • Does in-network aggregation, such as in sensor networks, violate end-to-end?
Stable and Accurate Network Coordinates. J. Ledlie et al. (Harvard University). In International Conference on Distributed Computing Systems (ICDCS’06). Some slides taken from the author’s presentation
Outline • Background • Two Practical Problems • Latencies are not static • Changing coordinates is expensive • Proposed Solutions • Latency Filter • Update Filter • Conclusion
Motivation of Network Coordinates [Figure: players and candidate game servers A to I placed at 2-D network coordinates, with RTT_AB marked] • Direct measurement is not scalable! • Predict latency by coordinates • Pick the server with the lowest mean latency for all players: use the centroid of the network coordinates! Result: Server A
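A minimal sketch of this server-selection heuristic, using hypothetical 2-D coordinates (the figure's exact node placement did not survive) and treating predicted RTT as the Euclidean distance between coordinates. The centroid minimizes the mean squared distance to the players rather than the mean distance, so picking the server nearest the centroid is a heuristic; `best_by_mean` computes the mean predicted latency directly for comparison. Neither requires any probing beyond what the coordinates already encode.

```python
import math

Point = tuple[float, float]


def centroid(points: list[Point]) -> Point:
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def predicted_rtt(a: Point, b: Point) -> float:
    # Network-coordinate prediction: RTT ~ Euclidean distance.
    return math.dist(a, b)


def best_by_centroid(players: list[Point], servers: dict[str, Point]) -> str:
    c = centroid(players)
    return min(servers, key=lambda s: predicted_rtt(servers[s], c))


def best_by_mean(players: list[Point], servers: dict[str, Point]) -> str:
    return min(servers,
               key=lambda s: sum(predicted_rtt(servers[s], p) for p in players) / len(players))


players = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]  # hypothetical
servers = {"S1": (5.0, 6.0), "S2": (20.0, 20.0)}                 # hypothetical
print(best_by_centroid(players, servers), best_by_mean(players, servers))  # prints: S1 S1
```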
Benefits of NCs • Estimate/Predict RTT without direct probing • Scalability • Make well-understood geometric algorithms applicable to distributed systems problems • Powerful abstraction
How Network Coordinates Work • A starts a measurement to B. • B replies with its coordinate; A deduces the RTT. • A computes the estimate and the error. • A moves toward its ideal coordinate, relative to B. • Repeat with C, D, E. • Predict the RTT to X. [Figure: A at (100,80) measures RTT = 60 ms to B at (70,40) and moves to (103,84); other nodes C, D, E; X at (140,20)] • Estimate = |(100,80) - (70,40)| = sqrt(30² + 40²) = 50 ms • Error = RTT - Estimate = 60 - 50 = 10 ms • Goal: minimize global prediction error
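The update in this walkthrough can be written as one spring-relaxation step in the style of Vivaldi: A moves along the unit vector from B toward A by δ × (RTT - estimate). This is a simplified sketch: it uses a constant step δ = 0.5, chosen here only because it reproduces the move from (100,80) to (103,84); Vivaldi itself adapts the step size using per-node error estimates. The fallback direction for coincident nodes is an assumption.

```python
import math

Point = tuple[float, float]


def vivaldi_step(xi: Point, xj: Point, rtt: float,
                 delta: float = 0.5) -> tuple[Point, float, float]:
    """One constant-step update of node i's coordinate after measuring rtt to node j.

    Returns (new_coordinate, estimate, error)."""
    dx, dy = xi[0] - xj[0], xi[1] - xj[1]
    estimate = math.hypot(dx, dy)          # predicted RTT = distance in coordinate space
    error = rtt - estimate                 # > 0: too close, move away; < 0: move closer
    if estimate > 0:
        ux, uy = dx / estimate, dy / estimate  # unit vector pointing from j to i
    else:
        ux, uy = 1.0, 0.0                  # coincident nodes: arbitrary direction (assumption)
    step = delta * error
    return (xi[0] + step * ux, xi[1] + step * uy), estimate, error


new_a, est, err = vivaldi_step((100.0, 80.0), (70.0, 40.0), rtt=60.0)
print(new_a, est, err)
```

Since the measured 60 ms exceeds the 50 ms estimate, the error is positive and A is pushed away from B; repeating this against C, D, and E pulls or pushes A toward a position that lowers the overall prediction error.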
Vivaldi Network Coordinates • Simple • Adaptive • Periodic RTT measurements with neighbors • Refine coordinates (pulled or pushed by each neighbor) • Decentralized • Works well… in simulation
Problem #1: Latencies are not Static • Raw latency data have errors and change • RTT_AC = 5, 5, 6, 40, 41, 40 ms (a change) • RTT_AB = 60, 60, 59, 1000, 70, 60 ms (an error)
Problem #1(a): Errors are Unpredictable [Figure: three hours of measurements from berkeley to uvic.ca] • 82% of measurements are within 1 ms of the median
Problem #1(b): Latencies can Change [Figure: three days of measurements from ntu.edu.tw to 6planetlab.edu.cn] • Need to remove noise, but remain adaptive
Solution #1: Latency Filter (Problem: Latencies are not Static) • Filtering with histories • Minimum of previous four samples works best. [Figure: a history of RTT samples ordered from newest to oldest over times t0, t1, t2; at t0 a 1000 ms RTT is received] • How do they find out?
Solution #1: Latency Filter • General Moving Percentile (MP) filter • h: size of the history window • p: percentile returned as the prediction • e.g., “Minimum of previous four samples”: h=4, p=25% • Run experiments on the 3-day trace, varying h and p • Evaluation metric: Relative Error = |RTT - Estimate| / RTT • “h=4, p=25%” achieves the lowest error
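A minimal sketch of the Moving Percentile filter described above. It assumes the nearest-rank percentile definition (rank = ceil(p × n)), one choice under which h=4, p=25% returns exactly the minimum of the last four samples; other percentile definitions would interpolate differently. Fed the RTT_AB samples from Problem #1, it absorbs the 1000 ms outlier; fed the RTT_AC samples, it would follow the shift to about 40 ms once the older 5 to 6 ms samples age out of the window.

```python
import math
from collections import deque


class MovingPercentileFilter:
    """Moving Percentile (MP) latency filter over the last h RTT samples."""

    def __init__(self, h: int = 4, p: float = 0.25):
        self.history = deque(maxlen=h)  # oldest samples drop out automatically
        self.p = p

    def update(self, rtt_ms: float) -> float:
        """Add a raw RTT sample and return the filtered prediction."""
        self.history.append(rtt_ms)
        ordered = sorted(self.history)
        rank = max(math.ceil(self.p * len(ordered)), 1)  # nearest-rank, 1-based (assumption)
        return ordered[rank - 1]


def relative_error(rtt: float, estimate: float) -> float:
    """Evaluation metric: |RTT - Estimate| / RTT."""
    return abs(rtt - estimate) / rtt


mp = MovingPercentileFilter(h=4, p=0.25)
print([mp.update(x) for x in [60, 60, 59, 1000, 70, 60]])
```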
Latency Filter in the Big Picture [Figure: filter design space spanning Simple Thresholds and Sliding Windows]
Latency Filter in Practice [Video: 226 PlanetLab nodes with coordinates in 3-D space; Raw Coordinates vs. Latency Filter (h=4, p=25%)] • Latency Filters eliminate outliers that distort many coordinates all at once (e.g., minute 38 of the video)
Problem #2: Changing Coordinates is Expensive • Coordinates change frequently, even with the Latency Filter • Application-specific cost • e.g., cascading heavyweight process migration in streaming DBs • Most apps would prefer to be notified only when a significant change occurs • Is it possible to tell apps less frequently and retain high accuracy?
Solution #2: Update Filter (Problem: Changing Coordinates is Expensive) • Distinguish system-level coordinates Cs from application-level coordinates Ca [Figure: filter design space spanning Simple Thresholds and Sliding Windows]
Solution #2: Window-based Update Filter • Keep a history of recent coordinates • Divide the history into two windows (sets): current (newest) and start (oldest) • When current and start diverge (by some metric), update the application with a new coordinate • Two metrics • Local Relative Distance • Energy
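Update Filters: Local Relative Distance • Remember the nearest known neighbor • Add coordinates to the start and current windows • Compare the centroids of the two windows [Figure: node A, its nearest neighbor B (distance dmin, later d), and successive coordinates C0 to C5 split between the start window Ws and the current window Wc] • If |Centroid(Ws) - Centroid(Wc)| > d × ε, update the application-level coordinate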
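A minimal sketch of a window-based update filter using the local-relative-distance test. Several details are assumptions rather than the authors' exact design: the start window holds the first `window` system coordinates Cs seen since the last application update and stays fixed, the current window slides over the newest `window` coordinates, d is the distance from the current application coordinate Ca to the nearest known neighbor, and after an update the current window becomes the new start window. The parameter values in the example are hypothetical.

```python
import math
from collections import deque


def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


class RelativeDistanceUpdateFilter:
    """Decides when a new system-level coordinate Cs should replace the app-level coordinate Ca."""

    def __init__(self, window: int = 4, epsilon: float = 0.3):
        self.window = window
        self.epsilon = epsilon               # relative threshold (hypothetical default)
        self.start = []                      # Ws: oldest coordinates since the last update (fixed)
        self.current = deque(maxlen=window)  # Wc: newest coordinates (sliding)
        self.ca = None                       # application-level coordinate

    def update(self, cs, neighbors):
        """Feed one system-level coordinate; return the (possibly unchanged) Ca."""
        if self.ca is None:
            self.ca = cs                     # first coordinate is published immediately
        if len(self.start) < self.window:
            self.start.append(cs)            # still filling the start window
            return self.ca
        self.current.append(cs)
        if len(self.current) < self.window:
            return self.ca
        d = min(math.dist(self.ca, n) for n in neighbors)  # distance to nearest known neighbor
        drift = math.dist(centroid(self.start), centroid(self.current))
        if drift > d * self.epsilon:
            self.ca = centroid(self.current)  # notify the application
            self.start = list(self.current)   # newest window becomes the new baseline
            self.current.clear()
        return self.ca


f = RelativeDistanceUpdateFilter(window=2, epsilon=0.5)
nbrs = [(10.0, 0.0)]
print([f.update(c, nbrs) for c in [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0),
                                   (0.1, 0.0), (8.0, 0.0), (8.0, 0.0)]])
```

In this example the jitter to (0.1, 0) and the first step toward (8, 0) are suppressed; the application hears about the move only once the whole current window has drifted by more than d × ε = 10 × 0.5 = 5 from the start window.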