Learn about the benefits of caching in HTTP, including reverse proxies, forward proxies, and Content Distribution Networks (CDNs) like Akamai. Explore various methods of hosting multiple sites per machine and load balancing across server replicas.
Midterm Review EE122 Fall 2011 Scott Shenker http://inst.eecs.berkeley.edu/~ee122/ Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxson, and other colleagues at Princeton and UC Berkeley
Announcements • Available after class • I hate these review lectures….
Agenda • Finish Web caching • Midterm review
Improving HTTP Performance: Caching • Many clients transfer same information • Generates redundant server and network load • Clients experience unnecessary latency (Diagram: Server, Backbone ISP, ISP-1 and ISP-2, Clients)
Improving HTTP Performance: Caching: How • Response header: • Expires – how long it’s safe to cache the resource • No-cache – ignore all caches; always get resource directly from server • If entry has not expired, cache returns it • Otherwise, it issues an if-modified-since • Modifier to GET requests: • If-modified-since – returns “not modified” if resource not modified since specified time
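The cache behaviour on this slide can be sketched in a few lines of Python. This is a minimal illustration, not real HTTP: the `Origin`, `Cache`, and `Entry` classes, the numeric timestamps, and the fixed `ttl` standing in for the Expires header are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    body: str
    expires: float        # time until which the cached copy counts as fresh
    last_modified: float  # origin's modification time when we fetched it


class Origin:
    """Hypothetical origin server holding a single resource."""

    def __init__(self, body, modified_at):
        self.body = body
        self.modified_at = modified_at
        self.full_responses = 0  # how many times the full body was sent

    def get(self, if_modified_since=None):
        # Conditional GET: reply "not modified" if nothing changed since then.
        if if_modified_since is not None and self.modified_at <= if_modified_since:
            return 304, None
        self.full_responses += 1
        return 200, self.body


class Cache:
    def __init__(self, origin, ttl):
        self.origin = origin
        self.ttl = ttl      # stands in for the Expires header (assumption)
        self.entry = None

    def fetch(self, now):
        entry = self.entry
        if entry is not None and now < entry.expires:
            return entry.body                   # not expired: serve from cache
        since = entry.last_modified if entry else None
        status, body = self.origin.get(since)   # expired or missing: ask origin
        if status == 304:
            entry.expires = now + self.ttl      # unchanged: keep the copy
            return entry.body
        self.entry = Entry(body, now + self.ttl, self.origin.modified_at)
        return body
```

With a `ttl` of 10, a fetch at time 1 contacts the origin, a fetch at time 5 is served from the cache, and a fetch at time 20 triggers an if-modified-since exchange that transfers no body when the resource is unchanged.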
Improving HTTP Performance: Caching: Why • Motive for placing content closer to client: • User gets better response time • Content providers get happier users • Time is money, really! • Network gets reduced load • How well does caching work? • Very well, up to a limit • Large overlap in content • But many unique requests • sound familiar?
Improving HTTP Performance: Caching with Reverse Proxies • Cache documents close to server to decrease server load • Typically done by content providers • Only works for static content (Diagram: Server, Reverse proxies, Backbone ISP, ISP-1 and ISP-2, Clients)
Improving HTTP Performance: Caching with Forward Proxies • Cache documents close to clients to reduce network traffic and decrease latency • Typically done by ISPs or corporate LANs (Diagram: Server, Reverse proxies, Backbone ISP, ISP-1 and ISP-2, Forward proxies, Clients)
Improving HTTP Performance: Caching w/ Content Distribution Networks • Integrate forward and reverse caching functionality • One overlay network (usually) administered by one entity • e.g., Akamai • Provide document caching • Pull: Direct result of clients’ requests • Push: Expectation of high access rate • Also do some processing • Handle dynamic web pages • Transcoding
Improving HTTP Performance: Caching with CDNs (cont.) (Diagram: Server, CDN, Backbone ISP, ISP-1 and ISP-2, Forward proxies, Clients)
Improving HTTP Performance: CDN Example – Akamai • Akamai creates new domain names for each client content provider. • e.g., a128.g.akamai.net • The CDN’s DNS servers are authoritative for the new domains • The client content provider modifies its content so that embedded URLs reference the new domains. • “Akamaize” content • e.g.: http://www.cnn.com/image-of-the-day.gif becomes http://a128.g.akamai.net/image-of-the-day.gif • Requests now sent to CDN’s infrastructure…
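The URL rewriting step can be sketched as below. This only mirrors the simplified example on the slide (swap the host name, keep the path); the function name `akamaize` and the default `cdn_host` argument are hypothetical, and real CDN URL schemes carry more information than this.

```python
from urllib.parse import urlsplit, urlunsplit


def akamaize(url, cdn_host="a128.g.akamai.net"):
    """Point an embedded URL at the CDN's domain, keeping the path as-is."""
    parts = urlsplit(url)
    # Only the host changes, so DNS resolution now goes to the CDN's servers.
    return urlunsplit(parts._replace(netloc=cdn_host))


print(akamaize("http://www.cnn.com/image-of-the-day.gif"))
```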
Hosting: Multiple Sites Per Machine • Multiple Web sites on a single machine • Hosting company runs the Web server on behalf of multiple sites (e.g., www.foo.com and www.bar.com) • Problem: GET /index.html • www.foo.com/index.html or www.bar.com/index.html? • Solutions: • Multiple server processes on the same machine • Have a separate IP address (or port) for each server • Include site name in HTTP request • Single Web server process with a single IP address • Client includes “Host” header (e.g., Host: www.foo.com) • Required header with HTTP/1.1
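A rough sketch of the single-process solution: one server picks the site by looking at the Host header. The `SITES` table, its contents, and the `handle_request` function are made up for illustration, and the parsing is deliberately minimal (it ignores ports in the Host value, for instance).

```python
# Hypothetical content for two sites served by one process on one IP address.
SITES = {
    "www.foo.com": {"/index.html": "foo home page"},
    "www.bar.com": {"/index.html": "bar home page"},
}


def handle_request(raw):
    """Return the body for a raw HTTP request, dispatching on the Host header."""
    lines = raw.split("\r\n")
    method, path, version = lines[0].split(" ")
    headers = {}
    for line in lines[1:]:
        if not line:            # blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    host = headers.get("host")
    if host is None:
        return "400 Bad Request"    # HTTP/1.1 requires a Host header
    site = SITES.get(host)
    if site is None or path not in site:
        return "404 Not Found"
    return site[path]


request = "GET /index.html HTTP/1.1\r\nHost: www.bar.com\r\n\r\n"
print(handle_request(request))
```

The same `GET /index.html` line yields different pages depending only on the Host header, which is what resolves the ambiguity on the slide.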
Hosting: Multiple Machines Per Site • Replicate popular Web site across many machines • Helps to handle the load • Places content closer to clients • Helps when content isn’t cacheable • Problem: Want to direct client to particular replica • Balance load across server replicas • Pair clients with nearby servers
Multi-Hosting at Single Location • Single IP address, multiple machines • Run multiple machines behind a single IP address • Ensure all packets from a single TCP connection go to the same replica (Diagram: Load Balancer at 64.236.16.20)
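One common way to keep a TCP connection pinned to one replica is to hash the connection's 4-tuple; the slide does not say which mechanism is used, so treat this as one possible sketch. The replica addresses and the `pick_replica` function are hypothetical.

```python
import hashlib

# Hypothetical private addresses of the machines behind the load balancer.
REPLICAS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def pick_replica(src_ip, src_port, dst_ip, dst_port, replicas=REPLICAS):
    """Map a TCP connection (identified by its 4-tuple) to one replica."""
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    # Same 4-tuple -> same digest -> same replica for every packet.
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


first = pick_replica("198.51.100.7", 51000, "64.236.16.20", 80)
again = pick_replica("198.51.100.7", 51000, "64.236.16.20", 80)
print(first == again)
```

Because the choice depends only on the 4-tuple, the balancer needs no per-connection state; the trade-off is that changing the replica list remaps existing connections.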
Multi-Hosting at Several Locations • Multiple addresses, multiple machines • Same name but different addresses for all of the replicas • Configure DNS server to return different addresses (Diagram: replicas at 12.1.1.1, 64.236.16.20, and 173.72.54.131 reached across the Internet)
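A toy version of a DNS server that returns different addresses for the same name, here by simple rotation. The class `RoundRobinDNS` and the name www.example.com are assumptions; the three addresses are the ones shown on the slide. A real deployment might instead choose by client location or server load.

```python
from itertools import cycle


class RoundRobinDNS:
    """Toy authoritative server that rotates through a name's addresses."""

    def __init__(self, records):
        # One endless iterator per name, cycling through its replica addresses.
        self._cycles = {name: cycle(addrs) for name, addrs in records.items()}

    def resolve(self, name):
        return next(self._cycles[name])


dns = RoundRobinDNS({
    "www.example.com": ["12.1.1.1", "64.236.16.20", "173.72.54.131"],
})
for _ in range(4):
    print(dns.resolve("www.example.com"))
```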
My General Philosophy on Tests • I am not a sadist • I am not a masochist • For those of you who only read the slides at home: • If you don’t attend lectures, then it is your own damn fault if you missed something…. • I believe in testing your understanding of the basics, not tripping you up on tiny details or making you calculate pi to 15 decimal places
General Guidelines • Know the basics well, rather than focus on details • Study lecture notes and problem sets • Remember: you can use a crib sheet…..10pt font • Read text only for general context and to learn certain details • Just because I didn’t cover it in review doesn’t mean you don’t need to know it! • Get plenty of sleep
Things You Don’t Need to Know • The details of how to fragment packets • The details of any protocol header • Know semantics, but not syntax • Any details of DNS, HTTP (thank Ganesh) • Just know that when you access a web page, you do a DNS request and then an HTTP request • DNS request, DNS reply, SYN, SYNACK, ACK, HTTP Request, HTTP Reply, FIN, FINACK, ACK
First half of course: Basics • General background (3 lectures) • Basic design principles • Idealized view of network (4 lectures) • Routing • Reliability • Making this vision real (5 lectures) • IP, TCP, DNS, Web • Emphasize concepts, but deal with unpleasant realities
Overview of the Internet • The Internet is a large complicated system that must meet an unprecedented variety of challenges • Scale, dynamic range, diversity, ad hoc, failures, asynchrony, malice, and greed • An amazing feat of engineering • Went against the conventional wisdom • Created a new networking paradigm • In hindsight, some aspects of design are terrible • Will revisit when we do the clean slate design • But enormity of genius far outweighs the oversights
Internet’s Five Basic Design Decisions • Packet-switching • Best-effort service model • A single internetworking layer • Layering • The end-to-end principle (and fate-sharing)
Packet-Switching vs. Circuit-Switching • Reliability advantage: since routers don’t know about individual conversations, when a router or link fails, it is easy to fail over to a different path • Efficiency advantage of packet-switching over circuit-switching: Exploitation of statistical multiplexing • Deployability advantage: easier for different parties to link their networks together because they’re not promising to reserve resources for one another • Disadvantage: packet-switching must handle congestion • More complex routers (more buffering, sophisticated dropping) • Harder to provide good network services (e.g., delay and bandwidth guarantees)
What service should Internet support? • Strict delay bounds? • Some applications require them • Guaranteed delivery? • Some applications are sensitive to packet drops • No applications mind getting good service • Why not require Internet support these guarantees?
Important life lessons • People (applications) don’t always need what they think they need • People (applications) don’t always need what we think they need • Flexibility often more important than performance • But typically only in hindsight! • Example: cell phones vs landlines • Architect for flexibility, engineer for performance
Applying lessons to Internet • Requiring performance guarantees would limit variety of networks that could attach to Internet • Many applications don’t need these guarantees • And those that do? • Well, they don’t either (usually) • Tremendous ability to mask drops, delays • And ISPs can work hard to deliver good service without changing the architecture
Kahn’s Rules for Interconnection • Each network is independent and must not be required to change (why?) • Best-effort communication (why?) • Boxes (routers) connect networks • No global control at operations level (why?)
Tasks in Networking (bottom up) • Electrons on wire • Bits on wire • Packets on wire • Deliver packets across local network • Local addresses • Deliver packets across country • Global addresses • Ensure that packets get there • Do something with the data
Resulting Layers • Electrons on wire (contained in next layer) • Bits on wire (Physical) • Packets on wire (contained in next layer) • Deliver packets across local network (Link) • Local addresses • Deliver packets across country (Internetwork) • Global addresses • Ensure that packets get there (Transport) • Do something with the data (Application)
Decisions and Their Principles • How to break system into modules • Dictated by Layering • Where modules are implemented • Dictated by End-to-End Principle • Where state is stored • Dictated by Fate-Sharing
Who Does What? • Five layers • Lower three layers implemented everywhere • Top two layers implemented only at hosts • What is top layer of router doing? • What about switches? (Diagram: Host A and Host B each run Application, Transport, Network, Datalink, Physical; the Router between them runs Network, Datalink, Physical)
Layer Encapsulation • Appl: Get index.html • Trans: Connection ID • Net: Source/Dest • Link: Src/Dest • Common case: 20 bytes TCP header + 20 bytes IP header + 14 bytes Ethernet header = 54 bytes overhead (Diagram: User A sending to User B)
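The header arithmetic can be checked with a short sketch. The header sizes are the common-case values from the slide; the zero-filled placeholder headers and the function names `encapsulate` and `overhead_fraction` are assumptions for illustration.

```python
# Common-case header sizes in bytes, as given on the slide.
HEADERS = {"TCP": 20, "IP": 20, "Ethernet": 14}


def encapsulate(payload: bytes) -> bytes:
    """Wrap application data in placeholder headers, innermost layer first."""
    frame = payload
    for layer in ("TCP", "IP", "Ethernet"):
        frame = bytes(HEADERS[layer]) + frame  # zero-filled stand-in header
    return frame


def overhead_fraction(payload_len: int) -> float:
    """Fraction of the bytes on the wire that are headers."""
    total = sum(HEADERS.values())
    return total / (total + payload_len)


print(sum(HEADERS.values()))              # total header bytes per packet
print(len(encapsulate(b"x" * 100)))       # 100-byte payload plus headers
print(overhead_fraction(54))
```

The fixed 54 bytes matter most for small packets: a 54-byte payload spends half of its bytes on headers.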
General Rules of System Design • System not scalable? • Add hierarchy • DNS, IP addressing • System not flexible? • Add layer of indirection • DNS names (rather than using IP addresses as names) • System not performing well? • Add caches • Web and DNS caching
The Paradox of Internet Traffic • The majority of flows are short • A few packets • The majority of bytes are in long flows • MB or more • And this trend is accelerating…
A Common Pattern….. • Distributions of various metrics (file lengths, access patterns, etc.) often have two properties: • Large fraction of total metric in the top 10% • Sizable fraction (~10%) of total metric in low values • Not an exponential distribution • Large fraction is in top 10% • But low values have very little of overall total • Lesson: have to pay attention to both ends of dist.
“Valid” Routing State • Global routing state is “valid” if it produces forwarding decisions that always deliver packets to their destinations • Valid is my terminology, not standard • Goal of routing protocols: compute valid state • But how can you tell if routing state is valid?
Necessary and Sufficient Condition • Global routing state is valid if and only if: • There are no dead ends (other than destination) • There are no loops
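The condition can be checked mechanically for one destination at a time. In this sketch the routing state is a hypothetical `next_hop` dictionary mapping each node to the neighbour it forwards to (or `None` if it has no entry); the representation and the function name `is_valid` are assumptions.

```python
def is_valid(next_hop, dest):
    """True if every node's forwarding path reaches dest: no dead ends, no loops."""
    for start in next_hop:
        node, seen = start, set()
        while node != dest:
            if node in seen:
                return False          # revisited a node: forwarding loop
            seen.add(node)
            node = next_hop.get(node)
            if node is None:
                return False          # nowhere to forward: dead end
    return True


print(is_valid({"A": "B", "B": "D", "C": "D"}, "D"))   # every path ends at D
print(is_valid({"A": "B", "B": "A", "C": "D"}, "D"))   # A and B loop
print(is_valid({"A": "B", "B": None, "C": "D"}, "D"))  # B is a dead end
```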
How Can You Avoid Loops? • Restrict topology to spanning tree • If the topology has no loops, packets can’t loop! • Computation over entire graph • Can make sure no loops • Link-State • Minimizing metric in distributed computation • Loops are never the solution to a minimization problem • Distance vector • Won’t review LS/DV, but will review learning switch
Easiest Way to Avoid Loops • Use a topology where loops are impossible! • Take arbitrary topology • Build spanning tree (algorithm covered later) • Ignore all other links (as before) • Only one path to destinations on spanning trees • Use “learning switches” to discover these paths • No need to compute routes, just observe them
Flooding on a Spanning Tree • If you want to send a packet that will reach all nodes, then switches can use the following rule: • Ignoring all ports not on spanning tree! • Originating switch sends “flood” packet out all ports • When a “flood” packet arrives on one incoming port, send it out all other ports • This works because the lack of loops prevents the flooding from cycling back on itself • Eventually all nodes will be covered, exactly once
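The flooding rule can be simulated directly. Here the spanning tree is a hypothetical adjacency dictionary (`tree`) that already excludes non-tree ports, and `flood` counts how many copies each switch receives; both names are made up for the example.

```python
from collections import deque


def flood(tree, origin):
    """Flood from origin over spanning-tree links; count copies received per node."""
    received = {node: 0 for node in tree}
    # Originating switch sends the flood packet out all of its tree ports.
    queue = deque((origin, nbr) for nbr in tree[origin])
    while queue:
        sender, node = queue.popleft()
        received[node] += 1
        for nbr in tree[node]:
            if nbr != sender:         # send out all ports except the incoming one
                queue.append((node, nbr))
    return received


tree = {
    "S1": ["S2", "S3"],
    "S2": ["S1", "S4"],
    "S3": ["S1"],
    "S4": ["S2"],
}
print(flood(tree, "S1"))
```

Every switch other than the originator ends up with exactly one copy. If the input contained a cycle, this loop would never finish, which is the behaviour the spanning tree is there to prevent.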
This Enables Learning! • There is only one path from source to destination • Each switch can learn how to reach another node by remembering where its flooding packets came from! • If flood packet from Node A entered switch from port 4, then to reach Node A, switch sends packets out port 4
Learning from Flood Packets • Once a node has sent a flood message, all other switches know how to reach it…. (Diagram: Node A floods; each switch is annotated “Node A can be reached through this port”)
Self-Learning Switch • When a packet arrives: • Inspect source ID, associate with incoming port • Store mapping in the switch table • Use time-to-live field to eventually forget mapping • Packet tells switch how to reach A. (Diagram: switch connected to nodes A, B, C, D)
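A minimal sketch of the switch table logic follows. The class `LearningSwitch`, its `receive` method, the explicit `now` timestamps, and the 60-second default lifetime are all assumptions; the slide only says that mappings are learned from the source and eventually forgotten.

```python
class LearningSwitch:
    """Toy self-learning switch: learn source ports, flood unknown destinations."""

    def __init__(self, ports, ttl=60.0):
        self.ports = ports
        self.ttl = ttl       # how long a learned mapping is kept (assumed value)
        self.table = {}      # node id -> (port, expiry time)

    def receive(self, src, dst, in_port, now):
        """Process one packet and return the list of ports to send it out on."""
        # Learn (or refresh): src is reachable through the incoming port.
        self.table[src] = (in_port, now + self.ttl)
        entry = self.table.get(dst)
        if entry is not None and entry[1] > now:
            return [entry[0]]                      # known destination: one port
        self.table.pop(dst, None)                  # forget an expired mapping
        return [p for p in self.ports if p != in_port]   # unknown: flood


sw = LearningSwitch(ports=[1, 2, 3, 4], ttl=10)
print(sw.receive("A", "B", in_port=4, now=0))   # B unknown, so flood
print(sw.receive("B", "A", in_port=2, now=1))   # A was learned on port 4
```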