What about the Network? CS 525 Spring 2009 Advanced Distributed Systems
End-to-End Arguments in System Design J.H. Saltzer, D.P. Reed and D.D. Clark, M.I.T. Laboratory for Computer Science. Presented by: Abdullah Al-Nayeem
Where to Place Functionalities? • Example: Reliable file transfer • Should reliability be implemented per-hop by the communication subsystem? • Or, end-to-end by host applications? Department of Computer Science, UIUC
Where to Place Functionalities? • Possible failures in file transfer: • Disk access failure (hardware) • Packet drop or duplicated packet (communication) • File system error (software) • The communication subsystem cannot guarantee reliability by itself. • Per-hop reliability also increases network complexity • It adds overhead for applications that do not require reliability. • The application layer can provide full reliability, even without any support from the lower layers of the network. • End-to-end checksum and retry
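The "end-to-end checksum and retry" idea can be illustrated with a minimal Python sketch. The lossy channel `unreliable_send`, the function `transfer`, and the parameters `corrupt_prob` and `max_tries` are hypothetical names invented for this example; they are not from the paper or the slides. The point is only that the check happens at the two ends, regardless of what the channel does.

```python
import hashlib
import random


def unreliable_send(data: bytes, corrupt_prob: float, rng: random.Random) -> bytes:
    """Hypothetical channel that sometimes flips one byte in transit."""
    if data and rng.random() < corrupt_prob:
        i = rng.randrange(len(data))
        return data[:i] + bytes([data[i] ^ 0xFF]) + data[i + 1:]
    return data


def transfer(data: bytes, corrupt_prob: float = 0.5, max_tries: int = 20, seed: int = 1):
    """Resend the data until the receiver's checksum matches the sender's."""
    rng = random.Random(seed)
    expected = hashlib.sha256(data).hexdigest()  # computed at the sending end
    for attempt in range(1, max_tries + 1):
        received = unreliable_send(data, corrupt_prob, rng)
        # Verified at the receiving end, independent of any per-hop checks.
        if hashlib.sha256(received).hexdigest() == expected:
            return received, attempt
    raise RuntimeError("transfer failed after %d tries" % max_tries)
```

A real transfer would also have to cover the disk read and write at each end, which is why the check belongs to the application rather than to the network.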
End-to-End Argument (E2EA) • The lower layers of the network are not the right place to implement application-specific functions • Move functions “up and out” • “The function in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible.”
Typical Examples • Bit error recovery • Security using encryption • Duplicate message suppression • Recovery from system crashes • Delivery acknowledgement
Benefits of E2EA • Core network can be simpler and faster • Fewer assumptions required about the networks • More flexibility in developing new network technologies and applications • Helped the proliferation of the Internet • Dumb networks, intelligent hosts
Extension of E2EA • Lower layers may implement partial application-specific functions, but only for performance improvements. • Reducing retries in data transmissions • Should the level of reliability at the network be higher than the expected application reliability? • What are the possible tradeoffs? • Short-term performance vs. long-term flexibility • Performance vs. cost
Identifying the Ends • VoIP: The human user is the end-point • File Transfer: The application is the end-point • Only the end-points know how to guarantee the required reliability [Figure: Voice over IP (voice) and File Transfer (files)]
Moving Away from E2EA • Hosts are not always trustworthy • Security attacks, e.g. denial of service • E2EA does not guarantee congestion control • Unfriendly host • Communications are not always between two end-points • Multicast, broadcast • How does the network handle these circumstances?
Other Issues • ISP control, filtering, network monitoring • Government interventions • More subtle end points • Anonymous users using third-party services • Cloud computing entities (SaaS user, SaaS provider, Cloud provider) • Do these factors imply the end of E2EA?
Summary • The end-to-end argument is not an absolute rule, but a design tool • The end-to-end argument can help in organizing “layered” communication systems.
Consensus Routing: The Internet as a Distributed System. John P. John (1), Ethan Katz-Bassett (1), Arvind Krishnamurthy (1), Thomas Anderson (1), Arun Venkataramani (2). (1) Dept. of Computer Science, Univ. of Washington, Seattle; (2) University of Massachusetts Amherst. 5th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2008. Presented by: Ahmed Khurshid
Motivation • Internet routing protocols (both intra- and inter-domain) usually favor responsiveness over consistency • A new route is incorporated into the forwarding table before it is propagated to neighbors • This results in routing loops and blackholes • Usually there is no extra effort to ensure consensus • Solutions have been proposed for intra-domain routing
Motivation – Routing loop [Figure: Link failure causing BGP loops at 2 and 3. Figure: Policy change causing BGP loops at 2 and 3 when 4 withdraws a prefix from 2 and 3 but not 6. Annotations: 2 prefers the path through 3; 2 and 3 each prefer the other over 6; routes to 5 labeled 1-5, 4-5, 3-4-5 and 2-4-5; Minimum Route Advertisement Interval (MRAI) timer.]
Motivation – Blackhole [Figure: iBGP link recovery causing blackholes. Annotations: AP is preferred over CD; CD recovered.]
Consensus Routing • A consistency-first approach that cleanly separates safety and liveness of routing • Safety: All the routers use a consistent route towards a destination (i.e. no loops) • Liveness: Quick reaction to failures and policy changes • Uses two simple ideas to ensure both consistent behavior and quick reaction • Runs a distributed coordination algorithm to ensure a globally consistent view of routing state • Forwards packets using one of two logically distinct modes
Stable Mode • Unlike BGP, consensus routing does not immediately incorporate a newly learned route into the forwarding table • Periodically, all routers engage in a distributed coordination algorithm that determines the most recent set of complete updates • The coordination is based on classical distributed snapshot and consensus algorithms • Chandy-Lamport snapshot algorithm • Paxos • The output of the coordination is used to compute a set of stable forwarding tables (SFTs) that are guaranteed to be consistent • SFTs replace traditional FIBs (Forwarding Information Bases)
Stable Mode – Update Log [Figure: three-tier AS hierarchy with Tier-1 ASes A, B, C; Tier-2 ASes D, E, F, G; Tier-3 (stub) ASes H, I, J, K serving users. Arrows: route advertisement/withdrawal.] Store updates into the update log without modifying the SFT
Stable Mode – Distributed Snapshot [Figure: three-tier AS hierarchy with Tier-1 ASes A, B, C; Tier-2 ASes D, E, F, G; Tier-3 (stub) ASes H, I, J, K serving users. Arrows: marker message.] Updates in the snapshot may be complete or incomplete
Stable Mode – Aggregation [Figure: three-tier AS hierarchy with Tier-1 ASes A, B, C (labeled as consolidators); Tier-2 ASes D, E, F, G; Tier-3 (stub) ASes H, I, J, K serving users. Arrows: snapshots.] Tier-1 ASes are good candidates for being consolidators. Why? • Better reachability • Longevity • Full mesh topology among the ASes
Stable Mode – Consensus [Figure: three-tier AS hierarchy with Tier-1 ASes A, B, C; Tier-2 ASes D, E, F, G; Tier-3 (stub) ASes H, I, J, K serving users. Arrows: Paxos message.] Consolidators run Paxos to agree upon a global view by extracting incomplete updates from the reported snapshots
Stable Mode – Flood [Figure: three-tier AS hierarchy with Tier-1 ASes A, B, C; Tier-2 ASes D, E, F, G; Tier-3 (stub) ASes H, I, J, K serving users. Arrows: flooding message.] The message contains the set of incomplete updates (I) and the set of ASes (S) that successfully responded to the snapshot
Stable Mode • SFT Computation • The SFT is computed using the global set of incomplete updates (I) and local logs • Routes involving ASes not present in S are not placed in the SFT • What happens to those ASes? • How does this strategy achieve consensus in an asynchronous system?
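The SFT computation can be sketched in Python as a filter over a router's local log. This is only an illustration under stated assumptions: the `Route` record, the `compute_sft` function, and the rule "the newest route that passes both filters wins" are hypothetical and may differ from the actual consensus routing implementation. Triggers are represented as (AS number, trigger number) pairs.

```python
from typing import NamedTuple


class Route(NamedTuple):
    prefix: str      # destination prefix
    as_path: tuple   # AS numbers along the path
    trigger: tuple   # (AS number, trigger number) carried by the update


def compute_sft(local_log, incomplete, responded):
    """Build a prefix -> AS path table from stable routes only.

    local_log  : Route entries in chronological order (oldest first)
    incomplete : global set I of incomplete triggers
    responded  : set S of ASes that answered the snapshot
    """
    sft = {}
    for route in local_log:
        if route.trigger in incomplete:
            continue  # update has not finished propagating
        if not set(route.as_path) <= responded:
            continue  # path involves an AS that is not in S
        sft[route.prefix] = route.as_path  # later stable routes overwrite earlier ones
    return sft
```

For example, a route whose trigger appears in I is skipped, so the router keeps forwarding along the older stable route for that prefix.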
Router State • Routing Information Base (RIB) • Stores for each prefix the most recent • Route update received from each neighbor • Locally selected best route • Route advertised to each neighbor • History • Stores for each prefix a chronological list of received and selected routes in the RIB • Stable Forwarding Table (SFT) • Stores next hop interfaces corresponding to stable routes
Triggers • Each update carries a trigger • A trigger is a globally unique identifier for a set of causally related events propagating through the network • It is a two-tuple: (AS number, trigger number) • Triggers ease the tracking of updates and reduce control overhead in consensus routing • A router ‘A’ stores all the received triggers in its local History • Triggers under processing are temporarily stored in a local set IA
Distributed Coordination • During a snapshot, router ‘A’ saves the sequence of triggers in its local History as HA • It prepares a set of incomplete triggers (IA) that contains • All the triggers currently present in IA • Triggers waiting in the outgoing queues • Logged triggers received over incoming channels (after the start of the current snapshot round) • HA and IA are sent to the consolidators
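The trigger tuple and the per-router snapshot report described on the last two slides can be sketched as follows. All names (`Trigger`, `TriggerSource`, `snapshot_report`) are hypothetical, and generating trigger numbers from a per-AS counter is an assumption about how global uniqueness might be achieved, not a detail taken from the slides.

```python
from typing import NamedTuple


class Trigger(NamedTuple):
    as_number: int
    trigger_number: int


class TriggerSource:
    """Hands out triggers for one AS; a per-AS counter keeps them unique (assumed)."""

    def __init__(self, as_number: int):
        self.as_number = as_number
        self.counter = 0

    def next_trigger(self) -> Trigger:
        self.counter += 1
        return Trigger(self.as_number, self.counter)


def snapshot_report(history, in_progress, outgoing_queues, logged_incoming):
    """Return (H_A, I_A) as a router would send them to the consolidators.

    history         : chronological list of triggers in the local History
    in_progress     : triggers currently under processing
    outgoing_queues : neighbor -> list of triggers still waiting to be sent
    logged_incoming : triggers logged on incoming channels after the snapshot began
    """
    incomplete = set(in_progress)
    for queued in outgoing_queues.values():
        incomplete.update(queued)
    incomplete.update(logged_incoming)
    return list(history), incomplete
```

The incomplete set is simply the union of the three sources listed on the slide; the consolidators then merge these per-router sets into the global set I.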
View Change [Figure: source X sends a packet to destination Y across routers A, B, C, D and E. Annotations: "Use (k+1)th SFT"; "Hasn’t finished computing (k+1)th SFT yet"; "Use kth SFT".]
Transient Mode • Consensus routing switches to this mode when • The next-hop router along a stable route is unreachable • A stable route may not be available • Uses several known schemes • Routing deflection • Detour Routing • Backup route
Route Deflection • After encountering a failed link, deflect the packet to a neighboring AS after consulting the RIB • If no neighbor can be chosen, deflect the packet back to the sending AS (backtracking) • However, backtracking alone is not sufficient to guarantee reachability (see figure) [Figure: Limitations of backtracking. Route labels toward destination D include 1-5-D, 2-5-D, 3-5-D and 5-D.]
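The deflection rule above can be written as a small Python sketch. The function `deflect` and its arguments are hypothetical, and the way a candidate is chosen here (any neighbor with a RIB entry for the destination, lowest AS number first) is an assumption made for illustration; the slides do not specify the selection policy.

```python
def deflect(rib, failed_neighbor, previous_hop):
    """Pick the AS to hand a packet to after its next hop has failed.

    rib             : neighbor AS -> route to the destination learned from it
    failed_neighbor : neighbor AS reached through the failed link
    previous_hop    : AS that sent us the packet
    """
    for neighbor in sorted(rib):
        if neighbor not in (failed_neighbor, previous_hop):
            return neighbor  # deflect to another neighbor that has a route
    return previous_hop      # nothing else to try: backtrack to the sender
```

As the slide notes, falling back to the sender is not enough to guarantee reachability by itself, since the sender may have no alternative either.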
Other Transient Schemes • Detour Routing • After encountering a failed link, select a neighboring AS (arbitrarily) and tunnel transient packets to it • Tier-1 ASes are good choices in this selection • Backup Routes • Use pre-computed backup routes to forward packets during failure
Evaluation • Simulation Methodology • CAIDA AS-level graphs gathered from RouteViews BGP tables • Includes 23,390 ASes and 46,095 links annotated with inferred business relationships of the linked ASes • Using XORP prototype to measure implementation overhead • Using PlanetLab nodes to measure the cost of consensus
Link Failure • One of the links of a multi-homed stub AS is taken down in each experiment • Consensus routing provides significantly higher levels of connectivity than BGP
Effect of Traffic Engineering • Withdraw a subprefix from all but one of the providers (3 or more) of a multi-homed AS • Consensus routing does not affect routing in case of policy changes
Overhead [Figure: Control traffic required by consensus routing] [Figure: Delay incurred by consensus routing] • In terms of bandwidth and time, consensus routing incurs little overhead
Discussion Points • Selection of consolidators • Will Tier-1 ASes (or other ASes) agree to perform this additional duty? • Slow ASes may face periods of disconnection • How to handle this situation? • What can we say about the completeness and accuracy of this strategy? • Will ASes readily cooperate to handle transient packets?
CAIDA Tools Presented by: Abdullah Al-Nayeem
CAIDA • The Cooperative Association for Internet Data Analysis (CAIDA) • San Diego Supercomputer Center (SDSC), UCSD • CAIDA provides data, tools and analyses on Internet traffic for better understanding of current and future network topology, routing, security, performance and economic issues.
CAIDA Tools • Measurement • Tools for active or passive measurement of Internet traffic and flow patterns • Utilities • Utilities to aid analysis of Internet traffic and flow patterns • Visualization • Tools to visualize Internet data
Internet Measurement Infrastructure • Archipelago (Ark): CAIDA’s next-generation active measurement infrastructure • An evolution of the skitter infrastructure • 33 active monitors in different countries
Scamper • Measurement tool used at Ark monitors • Teams of Scamper probers probe all routed /24s in a short period of time: • A random address in each /24 prefix is probed approximately every 48 hours (one probing cycle) • Supports ICMP-Paris, TCP and UDP traceroute • Features: • Measures forward IP paths • Measures round-trip time • Discovers maximum transmission unit (MTU) length
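The target selection described above (one random address per routed /24) can be illustrated with Python's standard `ipaddress` module. This is not Scamper's code or interface; `pick_targets` is a hypothetical helper, and it assumes the routed prefix is an IPv4 prefix of length /24 or shorter.

```python
import ipaddress
import random


def pick_targets(routed_prefix: str, rng: random.Random):
    """Return one randomly chosen address from every /24 inside routed_prefix."""
    network = ipaddress.ip_network(routed_prefix)
    targets = []
    for subnet in network.subnets(new_prefix=24):
        offset = rng.randrange(subnet.num_addresses)  # 0..255 within the /24
        targets.append(subnet[offset])
    return targets
```

Calling `pick_targets("10.1.0.0/22", random.Random())` yields four addresses, one in each of 10.1.0.0/24 through 10.1.3.0/24; a probing cycle would repeat this over every routed prefix.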
Scamper Datasets • IPv4 Routed /24 Topology Dataset • Useful for understanding the topology of the Internet • IPv4 Routed /24 AS Links Dataset • Contains Autonomous System (AS) links derived from the IP paths of the Topology Dataset • RouteViews BGP data is used to map each IP address to its AS
Visualization of IPv4 Internet Topology • 1-17 Jan, 2008 • 4,853,991 IPv4 addresses • 5,682,419 IP links • 17,791 ASes • Outdegree of an AS is the number of next-hop ASes that were observed accepting traffic from this AS
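The outdegree definition above translates directly into a short Python sketch. The input format, a list of directed (AS, next-hop AS) pairs, is an assumption made for illustration and is not the actual layout of the AS Links dataset.

```python
from collections import defaultdict


def as_outdegree(links):
    """Count the distinct next-hop ASes observed for each AS.

    links: iterable of (source AS, next-hop AS) pairs; duplicates are allowed.
    """
    next_hops = defaultdict(set)
    for src, dst in links:
        next_hops[src].add(dst)
    return {asn: len(hops) for asn, hops in next_hops.items()}
```

Because next hops are kept in a set, a link observed many times still counts once toward the outdegree.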
RRDTool • Round Robin Database tool • A system to store and display time-series data • Network bandwidth, machine-room temperature, server load average, etc. • Features: • Archives of fixed size for unlimited data • Overwrites the oldest slots when full • Limitations: • Can’t add data for past events • Can’t add data twice at the same timestamp
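The round-robin storage idea and the two limitations listed above can be illustrated with a small Python class. This is a conceptual sketch only, not RRDTool's implementation or API: the class `RoundRobinArchive` is hypothetical, and the consolidation of samples that real round-robin archives perform is left out.

```python
class RoundRobinArchive:
    """Fixed-size archive that overwrites its oldest slot once it is full."""

    def __init__(self, slots: int):
        self.slots = [None] * slots
        self.next_index = 0
        self.last_timestamp = None

    def update(self, timestamp: int, value: float) -> None:
        # Past or repeated timestamps are rejected, mirroring the limitations above.
        if self.last_timestamp is not None and timestamp <= self.last_timestamp:
            raise ValueError("timestamp must be newer than the last update")
        self.slots[self.next_index] = (timestamp, value)
        self.next_index = (self.next_index + 1) % len(self.slots)
        self.last_timestamp = timestamp

    def values(self):
        """Return the stored samples, oldest first."""
        ordered = self.slots[self.next_index:] + self.slots[:self.next_index]
        return [sample for sample in ordered if sample is not None]
```

The storage never grows: with three slots, the fourth update replaces the first sample, which is what lets a fixed-size file hold an unbounded stream of measurements.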
RRDTool (2) • Example: Statistics for network interfaces
Beluga • Provides a real-time graph of RTTs and packet loss to an end host [Figure: Stanford to m-root-server (Tokyo)]
Walrus • Directed-graph visualization tool in 3D space • A meaningful spanning tree is required for better visualization.
Thanks. Questions and Comments?