Rethinking the Internet Architecture Process, Architecture, and Troubleshooting

Scott Shenker (joint work with many people, including Katerina Argyraki, Hari Balakrishnan, David Cheriton, Petros Maniatis, Ion Stoica, Mike Walfish)

Process: Why are we doing this, anyway?

Why the Clean Slate Mania? • Internet in crisis? • lack of functionality not a crucial problem • lack of reliability is most important problem • Research community in crisis? • little practical impact on architecture • narrowed focus, stopped asking the big questions • NSF’s response: FIND and GENI • but not enough by itself....

You Can Lead an Academic to Architecture, but.... • Normal academic behavior won’t produce architecture • Publication requires differentiation and/or indifference • Architecture comes from critique and synthesis • work on ideas other than your own..... • Can’t just design, simulate and abandon • must also experiment and deploy..... • .....then discuss and synthesize • Process change harder than technical issues • adoption is much harder than both!

Some Thoughts on Architecture • material covered in several papers (apologies to those who have heard all this before) • not a comprehensive architecture, many issues ignored

What’s Wrong with the Internet? • Internet is everywhere, used for (almost) everything • Main limiting factor seems to be lack of reliability • can’t do telesurgery, air traffic control, etc. • Hard to improve reliability of packet delivery within current architecture • Vulnerable to attacks, misconfigurations and failures

Packet Delivery Problems • Access link failures • multihome • Routing failures • security, policy, configuration, convergence, multipath,... • Congestion control failures • FQ, XCP, RCP, .... • DoS • default-off, capabilities, filters,...

Packet Delivery Problems • Technical solutions are largely at hand • not perfect, but huge improvement over status quo • No overarching synthetic architecture has emerged • symptom of process failure, or just too early? • But packet delivery won’t be the focus of this talk.... • because only experts see it as the major problem

Normal User’s Perspective Other forms of failure dominate: • out-of-date email addresses • broken links • misleading urls and/or inauthentic data • applications blocked by NATs, etc. • email unusable or unreliable due to spam • ......

Why? Three Important Changes... • Host-to-host → accessing data and services • End-to-end → middleboxes • Appropriate communication → spam

Not just host-oriented apps.... • Of course, packets always flow from host to host • modulo middleboxes.... • But which host are the packets sent to? • This is controlled by what hostname is used • So adjusting to data-oriented apps involves re-evaluating the Internet naming system • data, service specified by host/path pair

Problems with host/path names • Data movement causes broken links • names should be persistent • Replication unnecessarily difficult • Akamai expensive, and can’t replicate at object granularity • Google, P2P, etc. do this now.... • DNS names lead to legal/political battles • increasingly important, witness ICANN debacle • Names don’t facilitate authentication • can’t easily verify that data originated with intended source

Fix #1: Name Data/Services Directly • Network locations: IP addresses • Hosts: endpoint identifiers (EIDs) • Data/Services: service identifiers (SIDs) • direct naming supports fine-grained migration/replication • User-level descriptors: • search terms • canonical names (AOL keywords) • .......

Fix #2: Use Names in Appropriate Layer • User-level descriptors (e.g., search) • Application: app-specific search/lookup returns SID • App session: resolves SID to EID, opens transport conns • Transport: resolves EID to IP • IP • [Figure: layer stack (Application, App session, Transport, IP) annotated with SIDs and "Bind to EID (HIP)", next to a packet layout: IP hdr, EID, TCP, SID, …]

Fix #3: Names Should be Flat! • e.g., 0xf436f0ab527bac9e8b100afeff394300 • A name can be persistent if and only if it doesn’t embed any mutable information about its referent • Flat names embed no information, so they can be used to persistently name anything • Enables inter-domain migration, etc. • Once you have a large flat namespace, you never need other global handles • no distinction between EIDs, SIDs, etc.

Disadvantages of Flat Names • Hard to resolve • No local control • No locality • Not human friendly • All can be handled, but flat names do require new resolution infrastructure

Fix #4: Make Names Self-certifying • Name = Hash(pubkey, salt) • Value = <pubkey, salt, data, signature> • can verify name related to pubkey and pubkey signed data • Can receive data from caches or other 3rd parties without worry • much more opportunistic data transfer
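
The two checks on this slide (the name is bound to the pubkey, and the pubkey signed the data) can be sketched as follows. This is only an illustration: SHA-256, the length-prefixed encoding, and the `Value`, `make_name` and `verify` names are assumptions, not taken from the talk, and the signature check is left as a caller-supplied function because the slides do not name a signature scheme.

```python
import hashlib
from typing import Callable, NamedTuple


class Value(NamedTuple):
    """Value = <pubkey, salt, data, signature>, as on the slide."""
    pubkey: bytes
    salt: bytes
    data: bytes
    signature: bytes


def make_name(pubkey: bytes, salt: bytes) -> str:
    """Name = Hash(pubkey, salt). SHA-256 and the encoding are assumptions."""
    h = hashlib.sha256()
    # Length prefix keeps (pubkey, salt) pairs from colliding by concatenation.
    h.update(len(pubkey).to_bytes(4, "big"))
    h.update(pubkey)
    h.update(salt)
    return h.hexdigest()


def verify(name: str, value: Value,
           check_signature: Callable[[bytes, bytes, bytes], bool]) -> bool:
    """Accept a value from any source (cache, third party) only if both checks pass.

    check_signature(pubkey, data, signature) is supplied by the caller and
    stands in for whatever public-key signature scheme is deployed.
    """
    if make_name(value.pubkey, value.salt) != name:
        return False  # name is not bound to this pubkey
    return check_signature(value.pubkey, value.data, value.signature)
```

Because `verify` depends only on the name and the value, it does not matter which host delivered the value, which is what allows the opportunistic transfer mentioned above.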

Proposed Naming System • Flat, self-certifying identifiers for all entities • Used in “layered” fashion so that each protocol binds to the correct level of abstraction • Names are persistent, verifiable, and support easy replication and migration • Requirement: industrial-strength flat name resolver • names, key revocation (later, another use)

Not just end-to-end.... • Middleboxes provide important functionality • NATs, firewalls, proxies, caches, app accelerators, etc. • But processing between endpoints violates pure end-to-end religion, and causes many practical problems • e.g., NATs interfere with many applications • How can architecture support middleboxes better? • eliminate problems and make them architecturally sound

Delegation via Resolution • Names usually resolve to “location” of entity • Delegation principle: A network entity should be able to direct resolutions of its name not only to its own location, but also to chosen delegates • Semantics: • “where am I” → “where should packets be sent to reach me” • This allows packets to be directed towards middleboxes in a clean and coherent manner
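
A minimal sketch of the delegation principle, assuming a toy in-memory resolver. The `Resolver` class, its method names, and the dictionary packet layout are hypothetical, and the sketch does not show how an entity proves it is allowed to change its own mapping.

```python
from typing import Dict, Optional


class Resolver:
    """Toy flat-name resolver: answers "where should packets be sent to reach me"."""

    def __init__(self) -> None:
        self._location: Dict[str, str] = {}  # EID -> the entity's own IP address
        self._delegate: Dict[str, str] = {}  # EID -> IP of a delegate it has chosen

    def register(self, eid: str, own_ip: str) -> None:
        self._location[eid] = own_ip

    def delegate(self, eid: str, delegate_ip: str) -> None:
        """Direct future resolutions of eid to a delegate (e.g., a firewall)."""
        if eid not in self._location:
            raise KeyError(f"unknown EID: {eid}")
        self._delegate[eid] = delegate_ip

    def resolve(self, eid: str) -> Optional[str]:
        # A chosen delegate takes precedence over the entity's own location.
        return self._delegate.get(eid, self._location.get(eid))


def build_packet(resolver: Resolver, dest_eid: str, payload: bytes) -> dict:
    """Address the packet to the resolved IP but keep the destination EID in it."""
    ip = resolver.resolve(dest_eid)
    if ip is None:
        raise LookupError(f"cannot resolve {dest_eid}")
    return {"ip_dst": ip, "eid_dst": dest_eid, "payload": payload}
```

Keeping the EID in the packet lets the delegate see which endpoint the traffic is really meant for, so it can process the packet and forward it on.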

Middleboxes Example • [Figure: two panels, “Current (Bad) Middleboxes” and “Architecturally-Sound Middleboxes”, each showing the destination EID mapping and the packet structure for a destination d (address ipd) behind a firewall (address ipf); packet fields shown include IP ipf or ipd, EID d, EID s, and TCP hdr] • Delegate can be anywhere, not necessarily on path • Can apply to app-layer middleboxes • Including SID, EID in packet is crucial

Possible Impacts • More general services: more complex services (like Riverbed, transcoding, etc.) can fit within framework • Remote services, not boxes: since middleboxes need not be on-path, services like firewalls, virus-scanners, etc. can be provided as remote services • Rethinking transport: with intermediaries between endpoints, basic notion of the transport layer should be rethought, combining ideas from DTN, DOT, etc.

Restraining Usage • Can’t be at packet level, must be app-dependent • But don’t want separate mechanism for each app • Email, IM, wiki, etc. • Proposal: quota system • quotas allocated in application-dependent manner • quotas enforced through single mechanism • stamp for each usage, canceled through mechanism • see NSDI 06 paper for details.... • Uses flat name resolution
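
The split between application-dependent allocation and a single enforcement mechanism might look like the sketch below. It is not the scheme from the NSDI 06 paper: the class names are hypothetical, an HMAC stands in for the allocator's signature, and an in-memory set stands in for the flat name resolution infrastructure that would record canceled stamps.

```python
import hashlib
import hmac
import secrets
from typing import List, Set


class QuotaAllocator:
    """Issues stamps; how large each quota is remains an application decision."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)  # stand-in for a signing key

    def _tag(self, sender: str, nonce: bytes) -> bytes:
        return hmac.new(self._key, sender.encode() + nonce, hashlib.sha256).digest()

    def issue(self, sender: str, quota: int) -> List[bytes]:
        stamps = []
        for _ in range(quota):
            nonce = secrets.token_bytes(16)
            stamps.append(nonce + self._tag(sender, nonce))
        return stamps

    def is_valid(self, sender: str, stamp: bytes) -> bool:
        nonce, tag = stamp[:16], stamp[16:]
        return hmac.compare_digest(tag, self._tag(sender, nonce))


class Enforcer:
    """App-independent mechanism: each valid stamp is accepted at most once."""

    def __init__(self, allocator: QuotaAllocator) -> None:
        self._allocator = allocator
        self._canceled: Set[bytes] = set()  # stand-in for a shared, flat-named store

    def accept(self, sender: str, stamp: bytes) -> bool:
        if not self._allocator.is_valid(sender, stamp):
            return False
        fingerprint = hashlib.sha256(stamp).digest()
        if fingerprint in self._canceled:
            return False  # stamp was already used
        self._canceled.add(fingerprint)  # cancel the stamp
        return True
```

An email, IM, or wiki service would each call `accept` once per message or edit; only the call to `issue` needs to know anything about the application.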

Summary: Other Forms of Failure..... • broken links and pointers: persistent names • inauthentic data: self-certifying names • applications blocked by NATs, etc.: delegation • spam and other clutter: quota enforcement • No change to IP or routers!

Troubleshooting and Debugging: because things inevitably fail.....

User’s Perspective • Want to know who to yell at • identify responsible entity (at appropriate granularity) • Want their complaints to be taken seriously • provide credible and actionable report • Want the problem fixed, now • detailed diagnostic tools • this is traditional focus of troubleshooting

Vision • Incorporate coherent set of monitoring tools into architecture that: • record necessary information • process information to answer relevant questions • Key points: • not just statistics (e.g., Netflow), but answers • focus broader than just detailed diagnostics • Three examples

Ex. #1: Monitoring ISPs • Monitor boxes on peering links record packet digests • no internal information revealed • Boxes exchange information to determine where packets are dropped and/or delayed • Information ends up at source ISP or end user • Overhead: ~2-4% of packet bandwidth • Can be applied within enterprises, etc.
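
A rough sketch of how two monitor boxes on either side of an ISP could compare packet digests to find drops and delays. The function names, the 8-byte truncated SHA-256 digest, and the delay threshold are assumptions; a real monitor would hash only the fields that do not change in transit.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def digest(packet: bytes) -> bytes:
    """Short digest of a packet; reveals nothing about the network's internals."""
    return hashlib.sha256(packet).digest()[:8]


def record(observations: Iterable[Tuple[bytes, float]]) -> Dict[bytes, float]:
    """What one monitor box stores: digest -> time the packet was seen."""
    return {digest(packet): seen_at for packet, seen_at in observations}


def compare(ingress: Dict[bytes, float], egress: Dict[bytes, float],
            delay_threshold: float) -> Tuple[List[bytes], List[bytes]]:
    """Return digests of packets dropped, and of packets delayed, between two boxes."""
    dropped = [d for d in ingress if d not in egress]
    delayed = [d for d in ingress
               if d in egress and egress[d] - ingress[d] > delay_threshold]
    return dropped, delayed
```

Running `compare` for each pair of adjacent peering links along a path localizes a loss to the network between the two boxes, using only the exchanged digests.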

Ex. #2: Multilayer Tracing • Traceroute is useful, but limited to IP • XTrace (just started) is a generalized version: • operates at multiple layers • follows recursive packet generation (DNS queries, etc.) • can implement policies about when to respond • Requirements: • layer must be able to handle and propagate metadata • module on box to intercept and report on packets

Ex. #3: Distributed Debugging • When bugs occur in operation, they can be extremely difficult to locate and reproduce • We are developing liblog, a log-and-replay debugging tool (early) that is always turned on • There are lots of log-and-replay debuggers; ours meets a special set of requirements....(not described here)
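
The general log-and-replay idea can be illustrated with the toy below. It is not liblog's design or interface: the `Recorder`, `Replayer` and `app` names are hypothetical, and the sketch only captures the principle that recording the results of nondeterministic calls lets an execution be reproduced later.

```python
import random
import time
from typing import Any, Callable, List


class Recorder:
    """Runs nondeterministic calls for real and appends each result to a log."""

    def __init__(self) -> None:
        self.log: List[Any] = []

    def call(self, fn: Callable[[], Any]) -> Any:
        result = fn()
        self.log.append(result)
        return result


class Replayer:
    """Returns the logged results in order instead of calling the real function."""

    def __init__(self, log: List[Any]) -> None:
        self._log = list(log)
        self._next = 0

    def call(self, fn: Callable[[], Any]) -> Any:
        result = self._log[self._next]
        self._next += 1
        return result


def app(env) -> str:
    """Example application whose behavior depends on nondeterministic inputs."""
    started = env.call(time.time)
    roll = env.call(lambda: random.randint(1, 6))
    return f"{started:.3f}:{roll}"
```

Running `app(Recorder())` in production and then `app(Replayer(log))` under a debugger takes the same path through the code, because every nondeterministic input is served from the log.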

Logging and Replay • Each process logs its execution to a local file • Logs are collected at central location and replayed • [Figure: Nodes 1, 2 and 3 each run an app on top of liblog and write Logs 1, 2 and 3; the replay node holds the three logs and replays each app/liblog instance under GDB, controlled from a GDB console]

Extensions • liblog generates too much data • hard to sift through for large systems • Next step: setting global watchpoints and breakpoints • Can specify in terms of general expressions (Python) • routing loops, state inconsistencies, etc. • No operational experience yet
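
As an illustration of what a global watchpoint written as a Python expression might look like, the sketch below checks replayed snapshots of per-node next hops for a routing loop. The snapshot format and the function names are assumptions made for this example, not part of the tool.

```python
from typing import Callable, Dict, List, Optional

# One global snapshot: node -> next hop toward a single destination
# (None means the node delivers locally).
NextHops = Dict[str, Optional[str]]


def has_routing_loop(next_hops: NextHops) -> bool:
    """True if following next hops from any node revisits a node."""
    for start in next_hops:
        seen = set()
        node: Optional[str] = start
        while node is not None and node in next_hops:
            if node in seen:
                return True
            seen.add(node)
            node = next_hops[node]
    return False


def first_violation(snapshots: List[NextHops],
                    watch: Callable[[NextHops], bool]) -> Optional[int]:
    """Index of the first replayed snapshot where the watchpoint fires, if any."""
    for index, snapshot in enumerate(snapshots):
        if watch(snapshot):
            return index
    return None
```

A state-inconsistency check would be another predicate passed as `watch`; the replay loop itself would not change.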

Troubleshooting and Debugging • Automated end-user reporting tools would be useful to both users and ISPs • lots of low-hanging fruit • Not clear ISPs will take the lead on troubleshooting • ISPs may not be eager to admit fault • but they should be eager to reduce phonebank expenses • Experience needed with distributed debugger in networking context

Summary • Biggest challenge is to get community talking to each other rather than past each other • Reliability more pressing than functionality • have tools to provide better packet delivery • then considered wider set of failure modes • can handle without IP/router involvement • Troubleshooting should be part of “architecture” • nowhere near coherent yet • looking for basic building blocks
