Advanced Internet and Web Systems C. Edward Chow
Outline of the Talk
• Syllabus
• Introduction to WWW Systems
• Survey of Web Cluster Systems
• Survey of Caching Techniques
• Server Selection and Load Balancing
chow
Introduction to WWW Systems
[Diagram: a web authoring system (with scanner, video capture, and sound card inputs) creates web pages and publishes them to a web server; the web server hosts the pages; a web client (browser) retrieves them over the Internet using the HTTP protocol.]
• Web page: a document written in HTML
What is Unique in WWW?
• Hyperlink: uses Hypertext Markup Language (HTML) to describe the document in ASCII text (extended to ISO-8859-1)
• Naming scheme: names objects on the web with a Uniform Resource Locator (URL), with syntax: protocol://domain_name/<uri or path name>
• HTTP: HyperText Transfer Protocol, a simple request-response protocol for transferring HTML documents
  • ASCII text based (not binary, therefore easy to debug)
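The URL syntax above can be illustrated with a minimal Python sketch that splits a URL into its protocol, domain name, and path. The helper name `split_url` is an assumption made for illustration; `urlsplit` is the standard-library parser.

```python
from urllib.parse import urlsplit

def split_url(url):
    """Split a URL into the parts named on the slide: protocol, domain name, path."""
    parts = urlsplit(url)
    # An empty path means the root document, so fall back to "/".
    return parts.scheme, parts.hostname, parts.path or "/"

protocol, domain_name, path = split_url("http://www.netcraft.co.uk/survey/")
print(protocol, domain_name, path)
```

The same helper is reused conceptually in the web access steps: the domain name goes to DNS, the path goes into the HTTP request line.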
Web Authoring System
• Text editor: type in HTML <tag> and content
• HTML editor: works like a normal word processor; the user does not have to know much HTML syntax, e.g.,
  • Netscape Page Composer, MS FrontPage
  • FrontPage goes a step further by providing templates and hyperlink management functions
  • Dreamweaver allows site management (upload/download); its editor understands PHP, XSLT, XML, CSS, and JavaScript syntax
• Most desktop publishing software and word processors have built-in converters from their internal format to HTML, for example:
  • FrameMaker, Office 2007
Web Delivery Systems
• Deliver web documents efficiently and reliably to web clients.
• Content distribution and content delivery
• Performance is determined by
  • web server performance
  • network path performance
  • client browser performance
• Use multiple physical servers (server farm), and multiple server farms across the wide area.
• A new generation of proxy servers/content switches is emerging.
Content Delivery Network (CDN)
[Diagram: clients on several ISP networks (Sprint, UUnet, Gloobix, QWest, @Home, PSINet, MindSpring) send requests to a single host server. Labels: huge requests, slow response, server crash.]
Content Delivery Problems http://www.akamai.com
Use Client Cache/Client Side Cache Server
[Diagram: the same ISP networks (Sprint, UUnet, Gloobix, QWest, @Home, PSINet, MindSpring), with client caches and a client-side cache server placed near the clients. Labels: fewer requests to the host server, fast response.]
Use Mirror Sites
[Diagram: mirror sites placed among the ISP networks (Sprint, UUnet, Gloobix, QWest, @Home, PSINet, MindSpring) alongside the host server. Labels: fewer requests, fast response.]
• Needs improvement: guide the selection of mirror servers with server load/network bandwidth measurements
Edge Network Cache Servers
[Diagram: four edge network cache servers, a mirror site, client caches, and a client-side cache server placed among the ISP networks (Sprint, UUnet, Gloobix, QWest, @Home, PSINet, MindSpring) between the clients and the host server. Labels: fewer requests, fast response.]
Architecture solutions for scalable Web-server systems (Fig. 1)
Fig. 2. Model architecture for a locally distributed Web system
Content Distribution
• Secure, automated content/application distribution to a single server, multiple servers, or wide-area Internet sites.
• Provide replication, synchronization, staged rollout, and rollback.
• With revision control, transmit only updates.
• User-defined file distribution profiles/rules
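The "transmit only updates" idea can be sketched as follows. This is a minimal illustration that compares content digests, not the method of any particular product; the function names and the sample file names are assumptions.

```python
import hashlib

def digest(data):
    """Return a content fingerprint for one file's bytes."""
    return hashlib.sha256(data).hexdigest()

def changed_files(source, target_digests):
    """List files whose content differs from (or is missing on) the target.

    source: dict of file name -> bytes on the publishing side.
    target_digests: dict of file name -> digest last sent to the target server.
    """
    return sorted(
        name for name, data in source.items()
        if target_digests.get(name) != digest(data)
    )

# Hypothetical site content: only index.html changed since the last rollout.
source = {"index.html": b"<h1>v2</h1>", "logo.gif": b"GIF89a"}
target = {"index.html": digest(b"<h1>v1</h1>"), "logo.gif": digest(b"GIF89a")}
print(changed_files(source, target))
```

Keeping the previous digests per target also gives a simple basis for rollback: the prior revision is the set of files matching the stored digests.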
Content Delivery Problem
• Cache location problem: where to put cache servers?
• How many are needed?
• When/where/how to push/deliver the content?
• How about dynamic content?
Akamai Edge Delivery Service
• Peering bottleneck problem: access traffic is evenly spread over 7400+ networks (no one network carries over 5%; most << 1%), so edge servers need to be placed in many networks.
• Akamai delivers 10-20% of Internet traffic, 10B interactions/day.
• 1 hop to 85% of the world’s Internet users.
• http://www.akamai.com/html/technology/nocc.html
• http://www.akamai.com/html/technology/medium_res.asx
F5 Web System Product
[Diagram labels: Site I (newyork.domain.com), Site II (losangeles.domain.com), Site III (tokyo.domain.com), Internet, router, 3-DNS, BIG-IP, server array, GLOBAL-SITE, webmaster, local DNS, user (london.domain.com)]
BIG/ip - Delivers High Availability
• E-commerce - ensures sites are not only up and running, but taking orders
• Fault tolerance - eliminates single points of failure
• Content availability - verifies servers are responding with the correct content
• Directory & authentication - load balances multiple directory and/or authentication services (LDAP, RADIUS, and NDS)
• Portals/search engines - using EAV, administrators perform keyword searches
• Legacy systems - load balances services to multiple interactive services
• Gateways - load balances gateways (SAA, SNA, etc.)
• E-mail (POP, IMAP, SendMail) - balances traffic across a large number of mail servers
3DNS Intelligent Load Balancing
• QoS load balancing
  • Quality of Service load balancing is the ability to select and apply different load balancing methods for different users or request types
• Modes of load balancing (two columns on the slide)
  • Round Robin, Ratio
  • Least Connections, Random
  • User-defined Quality-of-Service, Round Trip Time
  • Completion Rate (Packet Loss), BIG/ip Packet Rate
  • Global Availability, HOPS
  • Topology Distribution, Access Control
  • LDNS Round Robin, Dynamic Ratio
  • E-Commerce
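Three of the listed modes (Round Robin, Ratio, Least Connections) can be sketched in a few lines of Python. This is a generic illustration under simple assumptions, not F5's implementation; the server names and connection counts are hypothetical.

```python
import itertools

def round_robin(servers):
    """Cycle through the servers in order, one request each."""
    return itertools.cycle(servers)

def ratio(weights):
    """Weighted round robin: a server with weight w gets w turns per cycle."""
    expanded = [server for server, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(active):
    """Pick the server that currently has the fewest open connections."""
    return min(active, key=active.get)

rr = round_robin(["s1", "s2", "s3"])
print([next(rr) for _ in range(4)])
print(least_connections({"s1": 5, "s2": 2, "s3": 7}))
```

Round Robin and Ratio are static (they ignore server state), while Least Connections is dynamic and needs the balancer to track open connections per server.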
GLOBAL-SITE Replicate Multiple Servers and Sites
• File archiving engine and scheduler for automated site and server replication
• BIG-IP controls server availability during replication and synchronization
  • graceful shutdown for update
  • update in a group/scheduled manner
• FTP transfers files from GLOBAL-SITE to target servers (agent-free, scalable)
• RCE for source control
• No client-side software
• Complete, turnkey system (appliance)
(adapted from F5 presentation)
Intel NetStructure
• Routing based on XML tags (e.g., giving preferred treatment to buyers, large volume)
• http://www.intel.com/network/solutions/xml.htm
Simple Web Access Example: Step1
• Someone requests a document using a browser (web client) on a computer connected to the Internet
• In a browser window, type in a URL, e.g., http://news.netcraft.com/archives/web_server_survey.html
• Equivalent of
    % telnet www.netcraft.co.uk 80 > out
    GET /survey/ HTTP/1.0<cr><cr>
• Here <cr> is a "carriage return" entered by pressing the "Enter" key
• The browser parses the URL,
  • obtains the domain name from the URL, www.netcraft.co.uk
  • asks a Domain Name Server (DNS) to translate the domain name to an IP address
  • with the IP address, the client computer sets up an HTTP connection to the server
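The browser-side steps above (parse the URL, resolve the name, connect, send the request) can be sketched in Python. This is a minimal sketch, not how any real browser is implemented; `build_request` and `fetch` are names chosen for illustration, and `fetch` needs network access, so it is defined but not called here.

```python
import socket
from urllib.parse import urlsplit

def build_request(url):
    """Parse the URL and build the same request typed in the telnet example."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80          # HTTP uses port 80 by default
    path = parts.path or "/"
    request = f"GET {path} HTTP/1.0\r\n\r\n"
    return host, port, request.encode("ascii")

def fetch(url, timeout=10.0):
    """Resolve the host name, connect, send the request, and read the reply."""
    host, port, request = build_request(url)
    # create_connection resolves the domain name (DNS) and opens the TCP connection.
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:             # HTTP/1.0 server closes when done
                break
            chunks.append(data)
    return b"".join(chunks)

print(build_request("http://www.netcraft.co.uk/survey/"))
```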
Computer Network
• Local Area Network (LAN): a privately owned network within a single building or campus of up to a few kilometers in size (Tanenbaum).
• Wide Area Network (WAN): a network that spans a large geographical area, often a country or continent, and connects LANs or MANs. It consists of transmission lines (called circuits, channels, or trunks) and switching elements (called switching nodes, data switching exchanges, or routers).
[Diagram labels: web client, web server, DNS, DNS]
Protocol and Protocol Layer
• A set of rules for achieving a global objective exercised by geographically distributed nodes. (Robert Gallager, Prof. EE MIT)
Simple Web Access Example: Step2
• Browser sends the following character string to the server:
    GET /survey/ HTTP/1.0
    User-agent: Mosaic for X windows/2.4
    Accept: text/plain
    Accept: text/html
    Accept: image/*
• The httpd server
  • parses the request according to HTTP protocol 1.0
  • interprets the rest of the meta-information for browser capabilities
  • maps /survey/ to c:/InetPub/wwwroot/survey/default.htm, a file path in its file system, according to the server configuration
  • retrieves c:/InetPub/wwwroot/survey/default.htm or index.html
  • sends the information back using HTTP/1.0 format
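The server-side mapping from a request URI to a file path can be sketched as below. The document root and default document names come from the slide; the function names and the normalization strategy are assumptions, and a real server applies many more configuration rules.

```python
import posixpath

DOCUMENT_ROOT = "c:/InetPub/wwwroot"
DEFAULT_DOCUMENTS = ("default.htm", "index.html")

def parse_request_line(line):
    """Split 'GET /survey/ HTTP/1.0' into method, URI, and protocol version."""
    method, uri, version = line.split()
    return method, uri, version

def candidate_paths(uri, root=DOCUMENT_ROOT, defaults=DEFAULT_DOCUMENTS):
    """Map a request URI to the file paths the server would try, in order."""
    wants_directory = uri.endswith("/")
    # Normalize so that '..' segments cannot climb above the document root.
    clean = posixpath.normpath("/" + uri.lstrip("/"))
    base = root + clean.rstrip("/")
    if wants_directory:
        return [base + "/" + name for name in defaults]
    return [base]

method, uri, version = parse_request_line("GET /survey/ HTTP/1.0")
print(candidate_paths(uri))
```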
Simple Web Access Example: Step3
• Server replies using HTTP/1.0 format:
    HTTP/1.0 200 Document follows
    Date: Tue, 19 Jan 1999 18:10:20 GMT
    Server: NCSA/1.5
    Content-type: text/html

    <html> <head><title>Netcraft Web Server Survey</title></head>
• Server closes the file, sets a timeout, and waits for subsequent requests, such as images/MIDI files referenced in the web page (called a keep-alive connection). When the time expires, it closes the connection.
Simple Web Access Example: Step3a
• Browser sends
    GET /sample.htm HTTP/1.0
• Server replies
    HTTP/1.0 404 Object Not Found
    Content-Type: text/html

    <body><h1>HTTP/1.0 404 Object Not Found </h1></body>
• Server closes the file and the network connection, and waits for the next request
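The reply format in Step3 and Step3a (status line, headers, blank line, body) can be taken apart with a short Python sketch. The function name `parse_response` is an assumption, and the sketch expects CRLF line endings and a complete response already held in memory.

```python
def parse_response(raw):
    """Split an HTTP/1.0 response into status code, reason, headers, and body."""
    head, _, body = raw.partition("\r\n\r\n")   # blank line ends the headers
    lines = head.split("\r\n")
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

reply = (
    "HTTP/1.0 404 Object Not Found\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<body><h1>HTTP/1.0 404 Object Not Found </h1></body>"
)
print(parse_response(reply))
```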
Simple Web Access Example: Step4
• Browser receives the HTTP response, a web document with HTML tags, from the server.
• Browser parses/processes the HTML document and displays the document content according to the tags.
• When other images/audio/video data are referenced by <img> <object> <applet> tags, the browser initiates the retrieval of those data.
• Some of them will be HTTP requests to the same web server. That is why a keep-alive connection improves web server throughput.
• A URL request may trigger many HTTP requests to several web servers.
HTTP
• HTTP 1.0/1.1: http://www.w3.org/Protocols/rfc2068/rfc2068
• An HTTP request consists of
  • method: GET, HEAD, POST, PUT, DELETE
  • Uniform Resource Identifier (URI)
  • protocol version
  • other information to modify or supplement the request
    • If-Modified-Since: (only return the object if it is newer than the given date)
    • Authorization: (user password or other authentication as required)
    • Accept: application/postscript
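The If-Modified-Since rule can be sketched as a server-side decision between 200 and 304. This is a simplified illustration; the function name is an assumption, and unparsable dates are treated as if the header were absent.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_status(last_modified, if_modified_since):
    """Return 304 if the object is not newer than the client's date, else 200.

    last_modified: timezone-aware datetime of the object on the server.
    if_modified_since: raw header value from the request, or None.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200                      # ignore a malformed date
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200

modified = datetime(1999, 1, 19, 18, 10, 20, tzinfo=timezone.utc)
print(conditional_status(modified, "Tue, 19 Jan 1999 18:10:20 GMT"))
```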
HTTP Response
• consists of
  • status line (success or failure), e.g., HTTP/1.1 400 Bad Request
    • status codes: 200 (Document Follows), 301 (Moved Permanently), 302 (Moved Temporarily), 304 (Not Modified), 401 (Unauthorized), 402 (Payment Required), 403 (Forbidden), 404 (Not Found), 500 (Server Error)
  • description of the information (meta-header)
    • Server, Date, Content-Length, Content-Type, Content-Encoding, Last-Modified
  • actual information requested
Content-Type: MIME Type
MIME Type                    File Extension
text/plain                   txt, default (most servers)
text/html                    htm, html
application/postscript       ps
application/ms-powerpoint    ppt
application/x-javascript     js
image/gif                    gif
image/jpeg                   jpg
audio/midi                   mid
video/mpeg                   mpg
x-world/x-vrml               wrl
Configure MIME Types
• To support new MIME types, both the web server and the web client may need to be reconfigured.
• For the web server,
  • include the new MIME type definition in the mime.types file in the configuration directory of the web server
  • by default, most servers deliver unknown types as text/plain; the browser may then display them as "gibberish"
  • restart the web server
• For the web client,
  • specify the external viewer associated with the MIME type
  • or install the plug-in associated with the MIME type
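The extension-to-MIME-type lookup with a text/plain default can be sketched as a small table lookup. The table below copies the entries from the MIME type slide; the function name is an assumption, and real servers read this mapping from their mime.types file.

```python
import os

# Extension -> MIME type, as listed on the MIME type slide.
MIME_TYPES = {
    "txt": "text/plain",
    "htm": "text/html",
    "html": "text/html",
    "ps": "application/postscript",
    "ppt": "application/ms-powerpoint",
    "js": "application/x-javascript",
    "gif": "image/gif",
    "jpg": "image/jpeg",
    "mid": "audio/midi",
    "mpg": "video/mpeg",
    "wrl": "x-world/x-vrml",
}

def content_type(filename, table=MIME_TYPES):
    """Look up the Content-Type for a file, defaulting to text/plain."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return table.get(extension, "text/plain")

print(content_type("index.html"), content_type("scene.wrl"), content_type("data.xyz"))
```

Supporting a new type on the server side then amounts to adding one entry to the table, which mirrors editing mime.types and restarting the server.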
Brief Survey of Web Servers
• http://www.w3c.org/hypertext/WWW/Servers.html
• Jigsaw, http://www.w3c.org/Jigsaw/
• http://java.sun.com/products/java-servers/
• http://www.yahoo.com/computers_and_Internet/Internet/World_Wide_Web/HTTP/Servers
• http://www.netcraft.co.uk/Survey/
• "Web Server Technologies" by Nancy J. Yeager and Robert E. McGrath, Morgan Kaufmann 1996.
CGI Script Example
• Client types http://owl.uccs.edu/cgi-bin/chow/uptime.pl
• or clicks on <A HREF="http://owl.uccs.edu/cgi-bin/chow/uptime.pl">Show the load on owl</A> in a web page.
• uptime.pl
    #!/usr/bin/perl
    $UPTIME = '/usr/ucb/uptime';
    select(STDOUT); $| = 1;    # make output unbuffered
    # uptime prints plain text, so announce text/plain
    print "Content-type: text/plain\n\n";
    if (-x $UPTIME) {
        # replace this process with the uptime command
        exec($UPTIME);
    } else {
        print "cannot find uptime command on this system.\n";
        exit(1);
    }
CGI Script Example (Step 2)
• Web browser sends "GET /cgi-bin/chow/uptime.pl HTTP/1.0" to owl.uccs.edu
• The httpd server at owl parses the request and discovers that a Perl script needs to be executed.
• It locates the script in the file system.
• It creates the execution environment by
  • starting a process with the appropriate shell environment variables set
  • with STDIN from the httpd program
  • with STDOUT to httpd
CGI Script Example (Step 3)
• uptime.pl generates
    Content-type: text/plain

    15:55 up 18 days, 7:15, 5 users, load average: 0.89, 0.81, 0.79
• It is sent over STDOUT back to httpd
• httpd adds
    HTTP/1.0 200 OK
    Server: Netscape-Communications/1.1
    Date: Tuesday, 27-Jan-98 23:12:45 GMT
• httpd relays the text string back to the web browser
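The step where httpd wraps the script output can be sketched as plain string assembly. This is a minimal illustration of the idea, not the code of any real server; the function name and its parameters are assumptions, and the server name and date are the ones shown on the slide.

```python
from email.utils import formatdate

def add_status_line(cgi_output, server="Netscape-Communications/1.1", date=None):
    """Prepend the status line and server headers to what the CGI script printed.

    cgi_output already starts with the script's own Content-type header,
    followed by a blank line and the body.
    """
    if date is None:
        date = formatdate(usegmt=True)   # current time as an HTTP date
    prefix = f"HTTP/1.0 200 OK\r\nServer: {server}\r\nDate: {date}\r\n"
    return prefix + cgi_output

script_output = (
    "Content-type: text/plain\r\n\r\n"
    "15:55 up 18 days, 7:15, 5 users, load average: 0.89, 0.81, 0.79\n"
)
print(add_status_line(script_output, date="Tuesday, 27-Jan-98 23:12:45 GMT"))
```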
What problems can occur?
• How to detect a script running in an infinite loop?
• How to detect a hung script?
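One common answer to both questions is a time limit: the server cannot tell a looping script from a hung one, but it can stop waiting for either. The sketch below assumes that approach; the function name, the status codes chosen (504 for a timeout, 500 for a failed script), and the example commands are illustrative assumptions.

```python
import subprocess
import sys

def run_cgi(argv, timeout_seconds):
    """Run a CGI program with a time limit and return (status code, output)."""
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,    # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return 504, b""                 # script looped or hung
    if done.returncode != 0:
        return 500, done.stdout         # script ran but reported failure
    return 200, done.stdout

# A well-behaved script finishes in time; a sleeping one is cut off.
print(run_cgi([sys.executable, "-c", "print('ok')"], 10)[0])
print(run_cgi([sys.executable, "-c", "import time; time.sleep(30)"], 0.5)[0])
```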
Handle Multiple Requests
• Can’t afford sequential processing, since some requested documents are big.
• Three basic approaches:
  1. Fork a new child process: cloning a copy of httpd
  2. Use multithreading (if the OS or language supports it), e.g., IIS, Java Web Server, Jigsaw
  3. Spread the load among several helper programs, e.g., Apache
• Apache allows the starting, minimum, and maximum number of child web server processes to be specified in a configuration file. It can dynamically adjust to the load.
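The multithreaded approach can be sketched with a fixed pool of worker threads, so a big document does not block the requests queued behind it. This is a simulation under stated assumptions: `handle_request` stands in for real file serving, and the delays and paths are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(path, delay):
    """Simulate serving one document; the delay stands in for a large transfer."""
    time.sleep(delay)
    return f"200 {path}"

def serve_all(requests, max_workers=4):
    """Serve requests concurrently and return the replies in request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(handle_request, path, delay) for path, delay in requests]
        return [future.result() for future in futures]

# The small page is served while the large download is still in progress.
print(serve_all([("/big.iso", 0.2), ("/small.html", 0.0)]))
```

The `max_workers` limit plays a role similar to the maximum number of child processes in the Apache configuration: it bounds how much work the server takes on at once.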
More than One Web Service on the Same Server Platform
• Run different/same httpd programs on different ports
    http://www.server.org/intro.html (port 80 by default)
    http://www.server.org:8080/intro.html (port 8080)
    http://www.server.org:8081/intro.html (port 8081)
• They may have different document trees, content, and access control, and serve different user groups (customer, sales, authorized)
• Note that running a program on any port < 1024 requires root privilege.
Virtual Hosting
• Allows one server to serve requests for multiple IP addresses.
• It is a low-cost option for clients that want their own identity and cannot afford a separate machine/connection.
• Hosting other domain names on the same machine.
  • http://www.a.com/home.html
  • http://www.b.com/home.html
• Requires an OS with virtual host support.
• Assign multiple IP numbers to the same interface using the ifconfig command in UNIX or ipconfig in NT.
Assign Multiple IP Addresses to the Same Interface
• On FreeBSD, execute
    ifconfig ep0 192.168.123.2
    ifconfig ep0 192.168.123.3 alias netmask 0xFFFFFFFF
    ifconfig ep0 192.168.124.1 alias
  (the netmask option is used to suppress an error message)
• On Linux, execute
    ifconfig eth0:0 192.168.123.3 192.168.124.1
  you may add
    # route add -host 192.168.123.3 dev eth0:0
    # route add -host 192.168.124.1 dev eth0:0
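Once the interface answers on several addresses, the server can pick a document tree based on the address the connection arrived on. The sketch below illustrates that lookup; the document root paths and the pairing of addresses with www.a.com and www.b.com are hypothetical. In a socket-based server, the local address of an accepted connection is available from `getsockname()`.

```python
# Hypothetical mapping from local IP address to a customer's document tree.
VIRTUAL_HOSTS = {
    "192.168.123.2": "/www/a.com",
    "192.168.123.3": "/www/b.com",
}

def document_root(local_ip, hosts=VIRTUAL_HOSTS, default="/www/default"):
    """Choose the document tree for the IP address the request arrived on."""
    return hosts.get(local_ip, default)

# In a real server: local_ip = connection.getsockname()[0]
print(document_root("192.168.123.3"))
```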
New Hosting Technique
• Set up virtual machines for each customer
• Related software packages:
  • User-mode Linux
  • VMware ESX and Virtual Center/Infrastructure
  • MS VS 2005
• Utility computing (on-demand computing)
Improving WWW Delivery Systems
• Currently the network is the bottleneck.
• The retrieval of web pages can be improved by
  • increasing network bandwidth, e.g., an ADSL link
  • reducing round trips, e.g., using client-side programming to check data with Java/JavaScript
  • caching (both at the client and at a proxy cache server)
  • increasing the number and processing power of web servers
  • load balancing by partitioning client-server requests
Large Web Sites
[Diagram labels: to Internet, RRDNS, DMZ, firewall, router/firewall, Web Server1, Web Server9, internal proxy server, two HA NFS servers, web pages, router/firewall to intranet]
• Mapping the request, e.g., ftp.netscape.com, evenly across a set of servers, e.g., ftp[1-28].netscape.com
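The round-robin DNS (RRDNS) mapping can be sketched as a resolver that hands out the next server name on each query. This is a simplified model that returns one name per lookup, whereas a real DNS server typically rotates the order of the whole record set; the class name is an assumption.

```python
from collections import deque

class RoundRobinResolver:
    """Answer each lookup with the next server in rotation."""

    def __init__(self, targets):
        self._targets = deque(targets)

    def resolve(self):
        answer = self._targets[0]
        self._targets.rotate(-1)     # move the answered server to the back
        return answer

# ftp.netscape.com spread across ftp1 ... ftp28, as on the slide.
servers = [f"ftp{i}.netscape.com" for i in range(1, 29)]
resolver = RoundRobinResolver(servers)
print([resolver.resolve() for _ in range(3)])
```

Each server receives the same share of lookups, but the scheme ignores actual server load and client-side DNS caching, which is one motivation for the load-aware selection methods discussed earlier.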