Midterm Review EE122 Fall 2011 Scott Shenker http://inst.eecs.berkeley.edu/~ee122/ Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson and other colleagues at Princeton and UC Berkeley
Announcements • Available after class • I hate these review lectures….
Agenda • Finish Web caching • Midterm review
Improving HTTP Performance: Caching • Many clients transfer same information • Generates redundant server and network load • Clients experience unnecessary latency [Diagram labels: Server, Backbone ISP, ISP-1, ISP-2, Clients]
Improving HTTP Performance: Caching: How • Response header: • Expires – how long it’s safe to cache the resource • No-cache – ignore all caches; always get resource directly from server • If entry has not expired, cache returns it • Otherwise, it issues an if-modified-since • Modifier to GET requests: • If-modified-since – returns “not modified” if resource not modified since specified time
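The cache decision described on this slide can be sketched in a few lines of Python. This is only an illustration: `CacheEntry`, `cache_lookup`, and `origin_response` are hypothetical names, times are plain numbers rather than HTTP dates, and real caches handle many more headers.

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    body: str
    expires_at: float      # time after which the copy is no longer safe to reuse
    last_modified: float   # modification time the server reported for this copy

def cache_lookup(entry, now, no_cache=False):
    """Return the action a cache takes for one request (hypothetical helper)."""
    if entry is None or no_cache:
        return "fetch-from-server"      # nothing cached, or caching forbidden
    if now < entry.expires_at:
        return "serve-from-cache"       # entry has not expired
    return "conditional-get"            # GET with If-Modified-Since

def origin_response(resource_modified_at, if_modified_since):
    """Server side of the conditional GET."""
    if resource_modified_at <= if_modified_since:
        return "304 Not Modified"       # cache may keep using its copy
    return "200 OK"                     # server sends a fresh copy
```

An expired entry therefore costs one round trip, but not a full transfer, as long as the resource has not changed.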
Improving HTTP Performance: Caching: Why • Motive for placing content closer to client: • User gets better response time • Content providers get happier users • Time is money, really! • Network gets reduced load • How well does caching work? • Very well, up to a limit • Large overlap in content • But many unique requests • sound familiar?
Improving HTTP Performance: Caching with Reverse Proxies • Cache documents close to server to decrease server load • Typically done by content providers • Only works for static content [Diagram labels: Server, Reverse proxies, Backbone ISP, ISP-1, ISP-2, Clients]
Improving HTTP Performance: Caching with Forward Proxies • Cache documents close to clients to reduce network traffic and decrease latency • Typically done by ISPs or corporate LANs [Diagram labels: Server, Reverse proxies, Backbone ISP, ISP-1, ISP-2, Forward proxies, Clients]
Improving HTTP Performance: Caching w/ Content Distribution Networks • Integrate forward and reverse caching functionality • One overlay network (usually) administered by one entity • e.g., Akamai • Provide document caching • Pull: Direct result of clients’ requests • Push: Expectation of high access rate • Also do some processing • Handle dynamic web pages • Transcoding
Improving HTTP Performance: Caching with CDNs (cont.) [Diagram labels: Server, CDN, Backbone ISP, ISP-1, ISP-2, Forward proxies, Clients]
Improving HTTP Performance: CDN Example – Akamai • Akamai creates new domain names for each client content provider. • e.g., a128.g.akamai.net • The CDN’s DNS servers are authoritative for the new domains • The client content provider modifies its content so that embedded URLs reference the new domains. • “Akamaize” content • e.g.: http://www.cnn.com/image-of-the-day.gif becomes http://a128.g.akamai.net/image-of-the-day.gif • Requests now sent to CDN’s infrastructure…
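The URL rewrite in the example above amounts to swapping the hostname of an embedded URL. A minimal sketch, where `akamaize` is a hypothetical helper name and the rewriting a real CDN performs may well be more involved:

```python
from urllib.parse import urlsplit, urlunsplit

def akamaize(url, cdn_host):
    """Replace the hostname of an embedded URL with the CDN-assigned one."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, cdn_host, parts.path,
                       parts.query, parts.fragment))

# The slide's own example:
print(akamaize("http://www.cnn.com/image-of-the-day.gif", "a128.g.akamai.net"))
```

Because the CDN's DNS servers are authoritative for the new name, the lookup for the rewritten URL is what steers the client to the CDN.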
Hosting: Multiple Sites Per Machine • Multiple Web sites on a single machine • Hosting company runs the Web server on behalf of multiple sites (e.g., www.foo.com and www.bar.com) • Problem: GET /index.html • www.foo.com/index.html or www.bar.com/index.html? • Solutions: • Multiple server processes on the same machine • Have a separate IP address (or port) for each server • Include site name in HTTP request • Single Web server process with a single IP address • Client includes “Host” header (e.g., Host: www.foo.com) • Required header with HTTP/1.1
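To make the "Host" header solution concrete, here is a rough sketch of a single server process choosing a site from the request text. The `SITES` table and the function names are made up for illustration, and the parser ignores details such as a port number in the Host value.

```python
# Hypothetical content for the two sites sharing one server process.
SITES = {
    "www.foo.com": {"/index.html": "foo home page"},
    "www.bar.com": {"/index.html": "bar home page"},
}

def parse_host(request_text):
    """Return the value of the Host header, or None if it is missing."""
    lines = request_text.split("\r\n")
    for line in lines[1:]:              # skip the request line
        if line == "":                  # blank line ends the headers
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip().lower()
    return None

def handle(request_text):
    """Pick the site by Host header, then look up the path within that site."""
    _method, path, _version = request_text.split("\r\n")[0].split(" ")
    site = SITES.get(parse_host(request_text))
    if site is None:
        return "400 Bad Request"        # missing or unknown Host
    return site.get(path, "404 Not Found")
```

The same `GET /index.html` line now yields different pages depending only on the Host header.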
Hosting: Multiple Machines Per Site • Replicate popular Web site across many machines • Helps to handle the load • Places content closer to clients • Helps when content isn’t cacheable • Problem: Want to direct client to particular replica • Balance load across server replicas • Pair clients with nearby servers
Multi-Hosting at Single Location • Single IP address, multiple machines • Run multiple machines behind a single IP address • Ensure all packets from a single TCP connection go to the same replica [Diagram labels: Load Balancer, 64.236.16.20]
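One common way to keep a TCP connection pinned to one replica is to hash the connection's address/port 4-tuple. The slide does not say which mechanism is used, so treat this as an assumed technique; the replica addresses and `pick_replica` are hypothetical.

```python
import hashlib

REPLICAS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical back-end machines

def pick_replica(src_ip, src_port, dst_ip, dst_port, replicas=REPLICAS):
    """Map a connection's 4-tuple to a replica deterministically."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]
```

Every packet of a connection carries the same 4-tuple, so it hashes to the same replica without the load balancer storing per-connection state. The trade-off is that changing the replica list remaps existing connections.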
Multi-Hosting at Several Locations • Multiple addresses, multiple machines • Same name but different addresses for all of the replicas • Configure DNS server to return different addresses [Diagram labels: 12.1.1.1, 64.236.16.20, 173.72.54.131, Internet]
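A DNS server that hands out a different replica address on each query can be sketched as simple round robin. The class name and `www.example.com` are hypothetical, and the three addresses are taken from the slide's diagram on the assumption that they are the replicas. Round robin only spreads load; pairing clients with nearby servers would need the server to look at who is asking.

```python
import itertools

class RoundRobinZone:
    """Toy name server: one name, several addresses, handed out in turn."""

    def __init__(self, records):
        # One endless iterator per name, cycling through its addresses.
        self._cycles = {name: itertools.cycle(addrs)
                        for name, addrs in records.items()}

    def resolve(self, name):
        return next(self._cycles[name])

zone = RoundRobinZone({
    "www.example.com": ["12.1.1.1", "64.236.16.20", "173.72.54.131"],
})
```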
My General Philosophy on Tests • I am not a sadist • I am not a masochist • For those of you who only read the slides at home: • If you don’t attend lectures, then it is your own damn fault if you missed something…. • I believe in testing your understanding of the basics, not tripping you up on tiny details or making you calculate pi to 15 decimal places
General Guidelines • Know the basics well, rather than focus on details • Study lecture notes and problem sets • Remember: you can use a crib sheet…..10pt font • Read text only for general context and to learn certain details • Just because I didn’t cover it in review doesn’t mean you don’t need to know it! • Get plenty of sleep
Things You Don’t Need to Know • The details of how to fragment packets • The details of any protocol header • Know semantics, but not syntax • Any details of DNS, HTTP (thank Ganesh) • Just know that when you access a web page, you do a DNS request and then an HTTP request • DNS request, DNS reply, SYN, SYNACK, ACK, HTTP Request, HTTP Reply, FIN, FINACK, ACK
First half of course: Basics • General background (3 lectures) • Basic design principles • Idealized view of network (4 lectures) • Routing • Reliability • Making this vision real (5 lectures) • IP, TCP, DNS, Web • Emphasize concepts, but deal with unpleasant realities
Overview of the Internet • The Internet is a large complicated system that must meet an unprecedented variety of challenges • Scale, dynamic range, diversity, ad hoc, failures, asynchrony, malice, and greed • An amazing feat of engineering • Went against the conventional wisdom • Created a new networking paradigm • In hindsight, some aspects of design are terrible • Will revisit when we do the clean slate design • But enormity of genius far outweighs the oversights
Internet’s Five Basic Design Decisions • Packet-switching • Best-effort service model • A single internetworking layer • Layering • The end-to-end principle (and fate-sharing)
Packet-Switching vs. Circuit-Switching • Reliability advantage: since routers don’t know about individual conversations, when a router or link fails, it is easy to fail over to a different path • Efficiency advantage of packet-switching over circuit-switching: Exploitation of statistical multiplexing • Deployability advantage: easier for different parties to link their networks together because they’re not promising to reserve resources for one another • Disadvantage: packet-switching must handle congestion • More complex routers (more buffering, sophisticated dropping) • Harder to provide good network services (e.g., delay and bandwidth guarantees)
What service should Internet support? • Strict delay bounds? • Some applications require them • Guaranteed delivery? • Some applications are sensitive to packet drops • No applications mind getting good service • Why not require the Internet to support these guarantees?
Important life lessons • People (applications) don’t always need what they think they need • People (applications) don’t always need what we think they need • Flexibility often more important than performance • But typically only in hindsight! • Example: cell phones vs landlines • Architect for flexibility, engineer for performance
Applying lessons to Internet • Requiring performance guarantees would limit variety of networks that could attach to Internet • Many applications don’t need these guarantees • And those that do? • Well, they don’t either (usually) • Tremendous ability to mask drops, delays • And ISPs can work hard to deliver good service without changing the architecture
Kahn’s Rules for Interconnection • Each network is independent and must not be required to change (why?) • Best-effort communication (why?) • Boxes (routers) connect networks • No global control at operations level (why?)
Tasks in Networking (bottom up) • Electrons on wire • Bits on wire • Packets on wire • Deliver packets across local network • Local addresses • Deliver packets across country • Global addresses • Ensure that packets get there • Do something with the data
Resulting Layers • Electrons on wire (contained in next layer) • Bits on wire (Physical) • Packets on wire (contained in next layer) • Deliver packets across local network (Link) • Local addresses • Deliver packets across country (Internetwork) • Global addresses • Ensure that packets get there (Transport) • Do something with the data (Application)
Decisions and Their Principles • How to break system into modules • Dictated by Layering • Where modules are implemented • Dictated by End-to-End Principle • Where state is stored • Dictated by Fate-Sharing
Who Does What? • Five layers • Lower three layers implemented everywhere • Top two layers implemented only at hosts • What is top layer of router doing? • What about switches? [Diagram: Host A and Host B each run Application, Transport, Network, Datalink, Physical; the Router between them runs Network, Datalink, Physical]
Layer Encapsulation • Appl: Get index.html • Trans: Connection ID • Net: Source/Dest • Link: Src/Dest • Common case: 20 bytes TCP header + 20 bytes IP header + 14 bytes Ethernet header = 54 bytes overhead [Diagram labels: User A, User B]
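The 54-byte figure is just the sum of the three common-case headers. The short sketch below redoes that arithmetic and shows what fraction of a packet is payload; the function names are illustrative, and header options and Ethernet trailers are ignored.

```python
TCP_HEADER = 20        # bytes, common case without options
IP_HEADER = 20         # bytes, common case without options
ETHERNET_HEADER = 14   # bytes

def header_overhead():
    """Total header bytes wrapped around each application payload."""
    return TCP_HEADER + IP_HEADER + ETHERNET_HEADER

def efficiency(payload_bytes):
    """Fraction of the bytes on the wire that are application data."""
    return payload_bytes / (payload_bytes + header_overhead())

print(header_overhead())   # 54
print(efficiency(54))      # 0.5
```

With a 54-byte payload, half of what goes on the wire is headers, so the overhead matters most for the short flows and small packets discussed later.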
General Rules of System Design • System not scalable? • Add hierarchy • DNS, IP addressing • System not flexible? • Add layer of indirection • DNS names (rather than using IP addresses as names) • System not performing well? • Add caches • Web and DNS caching
The Paradox of Internet Traffic • The majority of flows are short • A few packets • The majority of bytes are in long flows • MB or more • And this trend is accelerating…
A Common Pattern….. • Distributions of various metrics (file lengths, access patterns, etc.) often have two properties: • Large fraction of total metric in the top 10% • Sizable fraction (~10%) of total metric in low values • Not an exponential distribution, in which: • Large fraction is in top 10% • But low values have very little of overall total • Lesson: have to pay attention to both ends of the distribution
“Valid” Routing State • Global routing state is “valid” if it produces forwarding decisions that always deliver packets to their destinations • Valid is my terminology, not standard • Goal of routing protocols: compute valid state • But how can you tell if routing state is valid?
Necessary and Sufficient Condition • Global routing state is valid if and only if: • There are no dead ends (other than destination) • There are no loops
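The condition can be checked mechanically for one destination at a time: follow next hops from every node and fail on a dead end or a repeated node. A minimal sketch, where `next_hop` is an assumed representation of the forwarding state toward a single destination:

```python
def is_valid(next_hop, dest):
    """next_hop maps each node to its next hop toward dest (None = no entry).

    Returns True iff every node's packets reach dest: no dead ends, no loops.
    """
    for start in next_hop:
        seen = set()
        node = start
        while node != dest:
            if node in seen:
                return False            # revisited a node: loop
            seen.add(node)
            node = next_hop.get(node)
            if node is None:
                return False            # nowhere to forward: dead end
    return True

print(is_valid({"A": "B", "B": "C", "C": None}, "C"))   # True
print(is_valid({"A": "B", "B": "A", "C": None}, "C"))   # False (loop)
print(is_valid({"A": "B", "B": None, "C": None}, "C"))  # False (dead end)
```

Global state is valid when this check passes for every destination.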
How Can You Avoid Loops? • Restrict topology to spanning tree • If the topology has no loops, packets can’t loop! • Computation over entire graph • Can make sure no loops • Link-State • Minimizing metric in distributed computation • Loops are never the solution to a minimization problem • Distance vector • Won’t review LS/DV, but will review learning switch
Easiest Way to Avoid Loops • Use a topology where loops are impossible! • Take arbitrary topology • Build spanning tree (algorithm covered later) • Ignore all other links (as before) • Only one path to destinations on spanning trees • Use “learning switches” to discover these paths • No need to compute routes, just observe them
Flooding on a Spanning Tree • If you want to send a packet that will reach all nodes, then switches can use the following rule: • Ignoring all ports not on spanning tree! • Originating switch sends “flood” packet out all ports • When a “flood” packet arrives on one incoming port, send it out all other ports • This works because the lack of loops prevents the flooding from cycling back on itself • Eventually all nodes will be covered, exactly once
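The flooding rule can be simulated to confirm the "exactly once" claim. In this sketch `tree_links` is an assumed adjacency map containing only spanning-tree ports, and the four-switch topology is made up.

```python
from collections import deque

def flood(tree_links, origin):
    """Simulate one flood; return how many copies each switch receives.

    tree_links maps each switch to the set of neighbours reachable over
    spanning-tree ports. The input must be loop-free: with a cycle, this
    simulation (like real flooding) would never stop.
    """
    received = {node: 0 for node in tree_links}
    # Originating switch sends the packet out all of its tree ports.
    queue = deque((nbr, origin) for nbr in tree_links[origin])
    while queue:
        node, came_from = queue.popleft()
        received[node] += 1
        # Send out every port except the one the packet arrived on.
        for nbr in tree_links[node]:
            if nbr != came_from:
                queue.append((nbr, node))
    return received

tree = {"A": {"B"}, "B": {"A", "C", "D"}, "C": {"B"}, "D": {"B"}}
print(flood(tree, "A"))   # {'A': 0, 'B': 1, 'C': 1, 'D': 1}
```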
This Enables Learning! • There is only one path from source to destination • Each switch can learn how to reach another node by remembering where its flooding packets came from! • If flood packet from Node A entered switch from port 4, then to reach Node A, switch sends packets out port 4
Learning from Flood Packets • Once a node has sent a flood message, all other switches know how to reach it…. [Diagram: Node A floods; each switch is annotated “Node A can be reached through this port”]
Self-Learning Switch • When a packet arrives • Inspect source ID, associate with incoming port • Store mapping in the switch table • Use time-to-live field to eventually forget mapping • Packet tells switch how to reach A. [Diagram labels: A, B, C, D]
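The table update above can be sketched as a small class. The slide covers only the learning step, so the forwarding behaviour here is the usual learning-switch assumption rather than something taken from the slide: flood when the destination is unknown or its entry has aged out, and drop when the destination sits on the arrival port. Names and the `ttl` default are hypothetical.

```python
class LearningSwitch:
    def __init__(self, ports, ttl=60.0):
        self.ports = list(ports)
        self.ttl = ttl          # seconds a mapping stays usable
        self.table = {}         # source ID -> (port, time learned)

    def receive(self, src, dst, in_port, now):
        """Learn where src lives, then return the ports to send the packet out."""
        # Inspect source ID and associate it with the incoming port.
        self.table[src] = (in_port, now)

        entry = self.table.get(dst)
        if entry is not None and now - entry[1] <= self.ttl:
            out_port = entry[0]
            # Assumed behaviour: nothing to do if dst is on the arrival port.
            return [] if out_port == in_port else [out_port]

        # Unknown or stale destination: forget it and flood (assumed behaviour).
        self.table.pop(dst, None)
        return [p for p in self.ports if p != in_port]

sw = LearningSwitch([1, 2, 3, 4], ttl=10)
print(sw.receive("A", "B", in_port=4, now=0))   # [1, 2, 3]  B unknown, flood
print(sw.receive("B", "A", in_port=2, now=1))   # [4]        A was learned on port 4
```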