Web Servers
Pre-lecture Survey: What is the #1 web server: • Apache • Google • MS IIS HTTP server • nginx • Sun • Other
Generic Overview • Source: http://en.wikipedia.org/wiki/Web_servers
Web Servers • A web server can be either: • A computer program • Responsible for accepting HTTP requests from clients (web browsers) • Returns HTTP responses with optional data content • Usually web pages • HTML documents • Linked objects (images, etc.) • A computer • Running a program that provides the above functionality
Common Features • HTTP • Accepts HTTP requests from a client • Provides HTTP responses to the client • A typical returned “HTML” document can be: • A file containing HTML statements • A raw text file • An image • Some other type of document • Identified by its MIME type • If the client request contains an error, or an error occurs while servicing it: • The web server sends an error response • May include custom HTML • May include text messages • Explains the problem better to the end user
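The response side of the slide above can be sketched in a few lines. This is a minimal illustration, not how any particular server builds its responses: it assumes the caller has already looked up the file (`body=None` stands in for "not found"), uses Python's `mimetypes` module to pick the `Content-Type`, and returns a small custom HTML page for the error case.

```python
import mimetypes
from typing import Optional


def build_response(path: str, body: Optional[bytes]) -> bytes:
    """Build a raw HTTP/1.1 response for `path`; body=None means the resource was not found."""
    if body is None:
        # Error response with a small custom HTML page explaining the problem.
        status = "404 Not Found"
        content_type = "text/html"
        body = (b"<html><body><h1>Not Found</h1>"
                b"<p>The requested page does not exist on this server.</p></body></html>")
    else:
        status = "200 OK"
        # The MIME type tells the client what kind of document it is receiving.
        guessed, _ = mimetypes.guess_type(path)
        content_type = guessed or "application/octet-stream"
    head = (f"HTTP/1.1 {status}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n")
    return head.encode("ascii") + body
```

For example, `build_response("/logo.png", data)` labels the body `image/png`, so the browser renders it as an image rather than as text.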
Common Features • Logging • Web servers log detailed information to files • Client requests • Server responses • Allows the webmaster to collect statistics • By running log analyzers
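As an illustration of what such a log entry and a very small log analyzer might look like, the sketch below writes entries in the NCSA Common Log Format (`host ident authuser [date] "request" status bytes`) and counts responses by status code. The client addresses in the usage are documentation-range placeholders; real servers make the format configurable and usually log more fields.

```python
from collections import Counter
from datetime import datetime, timezone


def clf_line(host, request_line, status, size, when):
    """Format one access-log entry in the Common Log Format."""
    # ident and authuser are unknown here, so they are logged as "-".
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S %z")
    size_field = str(size) if size else "-"  # "-" means no body was sent
    return f'{host} - - [{stamp}] "{request_line}" {status} {size_field}'


def status_counts(lines):
    """A tiny log analyzer: count logged responses by HTTP status code."""
    counts = Counter()
    for line in lines:
        # The status code is the second-to-last space-separated field.
        counts[line.rsplit(" ", 2)[1]] += 1
    return counts


when = datetime(2010, 7, 1, 12, 0, 0, tzinfo=timezone.utc)
print(clf_line("192.0.2.1", "GET /path/file.html HTTP/1.1", 200, 2326, when))
```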
Additional Available Features • Authentication • Optional authorization request before allowing access to some or all resources • Requires a user name and password • Handles: • Static content • Dynamic content • Supports one or more related interfaces • SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET, server APIs such as NSAPI, ISAPI, etc.
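The slide does not say which authentication scheme is used; the sketch below assumes HTTP Basic authentication, where the browser sends `Authorization: Basic <base64 of user:password>`. The `USERS` table is hypothetical and stores plain-text passwords only to keep the example short. When the check fails, a server would normally reply `401 Unauthorized` with a `WWW-Authenticate` header.

```python
import base64
import binascii
import hmac

# Hypothetical user store; a real server would keep salted password hashes, not plain text.
USERS = {"alice": "s3cret"}


def check_basic_auth(authorization_header):
    """Return the user name if the header carries valid Basic credentials, else None."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(authorization_header[6:], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None  # malformed credentials are treated as "not authenticated"
    user, sep, password = decoded.partition(":")
    if not sep or user not in USERS:
        return None
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(password.encode("utf-8"), USERS[user].encode("utf-8")):
        return user
    return None
```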
Additional Available Features • HTTPS support • Via SSL or TLS • Allows secure (encrypted) connections • Uses port 443 by default instead of port 80 • Content compression • E.g. by gzip encoding • Reduces the size of the responses • Lower bandwidth usage, etc.
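Content compression is negotiated per request: the client lists the encodings it accepts in `Accept-Encoding`, and the server labels a compressed body with `Content-Encoding`. A minimal sketch using Python's `gzip` module follows; its header parsing is deliberately simplified (it only looks for a `gzip` token and its `q` value), so treat it as an illustration rather than a complete implementation of the header grammar.

```python
import gzip


def accepts_gzip(accept_encoding: str) -> bool:
    """Return True if Accept-Encoding lists gzip with a non-zero q-value."""
    for token in accept_encoding.split(","):
        name, _, params = token.partition(";")
        if name.strip().lower() != "gzip":
            continue
        q = 1.0  # default weight when no q-value is given
        params = params.strip().replace(" ", "")
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        return q > 0  # "gzip;q=0" means the client refuses gzip
    return False


def maybe_compress(body: bytes, accept_encoding: str):
    """Return (body, extra_headers), gzip-compressing the body when the client allows it."""
    if accepts_gzip(accept_encoding):
        # Vary tells caches that the response depends on Accept-Encoding.
        return gzip.compress(body), {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    return body, {}
```

Repetitive text such as HTML usually shrinks a lot, which is where the bandwidth saving comes from; already-compressed images gain little.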
Additional Available Features • Virtual hosting • Serve many web sites using one IP address • Large file support • Serve files larger than 2 GB • A typical restriction on 32-bit OSes • Bandwidth throttling • Limit the speed of responses • Do not saturate the network • Able to serve more clients
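Name-based virtual hosting works because every HTTP/1.1 request carries a `Host` header, so one server on one IP address can choose a different site per request. In the sketch below the host names and document roots in `VIRTUAL_HOSTS` are hypothetical configuration values.

```python
# Hypothetical mapping from host names to document roots; all sites share one IP address.
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "www.example.org": "/var/www/example-org",
}
DEFAULT_ROOT = "/var/www/html"  # used when the Host header matches no configured site


def document_root_for(host_header: str) -> str:
    """Pick the document root for a request from its Host header."""
    # Drop an optional ":port" suffix and normalize case (host names are case-insensitive).
    # This simple split does not handle bracketed IPv6 literals such as "[::1]:80".
    host = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
```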
Where does the requested material come from? Origin of the returned content
Content Origin • Origin of the returned content may be: • Static • Pre-existing data file • Content changes only if manually edited • Contents loaded on request • Dynamic • Content generated by another program • Script (programming language) • Creates/retrieves the requested information • Static content is usually delivered much faster than dynamic content • Typically 2 to 100 times faster • Especially if the dynamic content involves data pulled from a database
How does it find it? Path translation
Path translation • Web servers map the path component of a Uniform Resource Locator (URL) into: • A local file system resource • Static requests • An internal or external program name • Dynamic requests • For a static request, the URL path specified by the client is relative to the Web server's root directory • This is not the same as the computer's root directory
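A sketch of this static/dynamic split, under assumed configuration: requests under a hypothetical `/cgi-bin/` prefix name a program in `SCRIPT_DIR`, and everything else is a file under `DOC_ROOT`. The three constants are illustrative values, not settings taken from the slides. `posixpath.normpath` collapses `..` segments in an absolute path, so a static request cannot climb above the web root.

```python
import posixpath

# Hypothetical configuration values.
SCRIPT_PREFIX = "/cgi-bin/"
SCRIPT_DIR = "/usr/lib/cgi-bin"
DOC_ROOT = "/var/www/html"


def map_url_path(url_path: str):
    """Return ("static", file), ("dynamic", program) or ("invalid", None) for a URL path."""
    if url_path.startswith(SCRIPT_PREFIX):
        # Dynamic request: the first segment after the prefix names the program to run.
        program = url_path[len(SCRIPT_PREFIX):].split("/", 1)[0]
        if program in ("", ".", ".."):
            return "invalid", None
        return "dynamic", posixpath.join(SCRIPT_DIR, program)
    # Static request: the path is relative to the web root, not to the OS root "/".
    # normpath drops ".." segments that would climb above "/" in an absolute path.
    return "static", DOC_ROOT + posixpath.normpath(url_path)
```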
Path translation • Consider the following URL requested by a client Web browser: • http://www.example.com/path/file.html • The client's Web browser translates it as follows: • http:// • Use the HTTP protocol • www.example.com • The Web server to connect to • Translated to an IP address by DNS • The request is sent to 93.184.216.119:80 • Port 80 is implicit for http:// URLs • /path/file.html • The resource to access • The browser then sends the following HTTP/1.1 request to that IP address:
GET /path/file.html HTTP/1.1
Host: www.example.com
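The browser's side of this translation can be sketched with Python's `urllib.parse.urlsplit`. The function name `build_get_request` is hypothetical, and the sketch stops before the network: a real client would next resolve the host name through DNS (in Python, e.g. `socket.getaddrinfo(host, port)`) and open a TCP connection to the resulting address.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Return (host, port, request_bytes) for an absolute http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("this sketch only handles absolute http:// URLs")
    port = parts.port or 80  # port 80 is implicit for http://
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The Host header carries the name (plus any non-default port); HTTP/1.1 requires it.
    host_header = parts.hostname if port == 80 else f"{parts.hostname}:{port}"
    # Each line ends in CRLF; an empty line ends the header section.
    request = f"GET {target} HTTP/1.1\r\nHost: {host_header}\r\n\r\n"
    return parts.hostname, port, request.encode("ascii")
```

For `http://www.example.com/path/file.html` this yields host `www.example.com`, port 80 and exactly the two-line request shown on the slide.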
Path translation (cont.) • Web server host (www.example.com) • Sees the request is for port 80 • Passes the request to the Web server software • The Web server appends the given path/file to the path of its Web root directory • Typical Apache roots on Linux: • /var/www/htdocs • /var/www • /var/www/html • The resulting local file system resource would then be: • /var/www/htdocs/path/file.html • /var/www/path/file.html • /var/www/html/path/file.html • Web server: • Retrieves the file, if it exists • Processes it according to the Web server's rules • Sends a response to the client's web browser • Response: • Describes the content of the returned data/file • Contains the requested data or an error response
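The server-side steps above (append the path to the web root, retrieve the file if it exists, otherwise answer with an error) can be sketched with `pathlib`. This is an illustrative version under an added assumption the slide does not state: paths that resolve outside the web root, whether through `..` or a symlink, are refused with 403 rather than served.

```python
from pathlib import Path


def translate_and_read(web_root: str, url_path: str):
    """Map a URL path onto web_root and return (status_code, body)."""
    root = Path(web_root).resolve()
    # Append the URL path to the web root; stripping the leading "/" keeps it relative.
    candidate = (root / url_path.lstrip("/")).resolve()
    # Refuse anything that ends up outside the web root (e.g. "/../etc/passwd").
    if candidate != root and root not in candidate.parents:
        return 403, b"Forbidden"
    if not candidate.is_file():
        return 404, b"Not Found"
    return 200, candidate.read_bytes()
```

With `web_root="/var/www/html"`, a request for `/path/file.html` reads `/var/www/html/path/file.html`, matching the third example on the slide.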
Performance • Web servers must: • Serve requests quickly! • From more than one TCP/IP connection at a time • The main key performance parameters are: • Number of requests per second • Depends on the type of request, etc. • Latency: response time in milliseconds • For each new connection or request • Throughput in bytes per second • Depends on • File size • Whether the content is cached • Available network bandwidth • etc. • Concurrency level • How the server responds to multiple simultaneous client requests
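To make the parameters concrete, here is a sketch that turns per-request measurements from a benchmark window into requests per second, mean latency and throughput. The `Sample` record is hypothetical. The concurrency figure uses Little's law (average requests in progress = request rate × mean latency), which is only a rough estimate and assumes the server is in a steady state.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    latency_ms: float  # time from receiving the request to finishing the response
    bytes_sent: int    # size of the response


def summarize(samples, window_s):
    """Summarize requests completed during a window of window_s seconds (samples must be non-empty)."""
    rps = len(samples) / window_s
    latency = mean(s.latency_ms for s in samples)
    return {
        "requests_per_s": rps,
        "mean_latency_ms": latency,
        "throughput_bytes_per_s": sum(s.bytes_sent for s in samples) / window_s,
        # Little's law: average number of requests being served at once.
        "mean_concurrency": rps * latency / 1000,
    }
```

For instance, two requests of 1,000 and 3,000 bytes with latencies of 10 ms and 30 ms, measured over 2 s, give 1 request/s, 20 ms mean latency and 2,000 bytes/s.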
Performance • Measured under: • Varying load of clients • Varying requests per client • Performance parameters may vary noticeably depending on the number of active connections • The specific server model used to implement a web server program can limit the performance and scalability it reaches under heavy load or on high-end hardware • Many CPUs, disks, etc.
Load limits • Web servers have load limits • Can be set in a configuration file • Can handle only a limited number of concurrent client connections per IP address (and TCP port) • Usually between 2 and 60,000 • Default between 500 and 1,000 • Can serve only a certain maximum number of requests per second, depending on: • Settings • HTTP request type • Content origin • Static • Dynamic • Whether the served content is cached • Hardware and software limits of the native OS • A web server near or over its limits • Becomes overloaded • Becomes unresponsive
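A connection limit can be pictured as a fixed pool of slots: a new connection either takes a free slot immediately or is refused. The sketch below models that with a non-blocking `threading.BoundedSemaphore` and answers refused connections with 503 Service Unavailable. `ConnectionLimiter` and the `serve` callback are hypothetical names; real servers read the limit from their configuration file (for example Apache's `MaxRequestWorkers` or nginx's `worker_connections`) and may queue excess connections instead of refusing them.

```python
import threading


class ConnectionLimiter:
    """Cap the number of connections served at the same time."""

    def __init__(self, max_connections: int):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_acquire(self) -> bool:
        # Non-blocking: either a slot is free right now, or the caller is refused.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        self._slots.release()


def handle(limiter: ConnectionLimiter, serve) -> int:
    """Serve one connection via the hypothetical callback `serve`, or refuse it with 503."""
    if not limiter.try_acquire():
        return 503  # Service Unavailable: the configured limit is reached
    try:
        return serve()
    finally:
        limiter.release()  # free the slot even if serve() raises
```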
Overload causes • [Figure: a sample daily graph of a web server's load, indicating a spike in the load early in the day]
Overload causes • Web servers may be overloaded because of: • Too much legitimate web traffic • Thousands or even millions of clients hitting the web site in a short interval of time • DDoS • Distributed Denial of Service attacks • Coordinated • Computer worms • Abnormal traffic because of millions of infected computers • Not coordinated • XSS viruses • Millions of infected browsers and/or web servers • Internet web robots • Traffic not filtered/limited on large web sites with very few resources (bandwidth, etc.) • Internet (network) slowdowns • Client requests are served more slowly and the number of connections increases so much that server limits are reached • Partial unavailability of web servers (computers) • Required/urgent maintenance or upgrade • HW or SW failures • Back-end (e.g. DB) failures, etc. • The remaining web servers get too much traffic and become overloaded
Overload symptoms • Symptoms of an overloaded web server include: • Requests are served with (possibly long) delays • From 1 second to a few hundred seconds • 500, 502, 503 or 504 HTTP errors are returned to clients • Sometimes unrelated 404 or even 408 errors may also be returned • TCP connections are refused or reset (interrupted) before any content is sent to clients • In very rare cases, only partial contents are sent • This behavior may well be considered a bug • Even if it stems from unavailable system resources
Anti-overload techniques • To partially overcome load limits and to prevent overload, use techniques like: • Managing network traffic by using: • Firewalls • Block unwanted traffic • Bad IP sources • Bad patterns • HTTP traffic managers • Drop, redirect or rewrite requests having bad HTTP patterns • Bandwidth management and traffic shaping • Smooth out peaks in network usage • Deploying web cache techniques • Using different domains to serve different content (static and dynamic) from separate Web servers, e.g.: • http://images.example.com • Serves static images • http://www.example.com • Serves dynamic data requests
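The slide does not name a shaping algorithm; the token bucket is one common choice and is sketched here as an example of how peaks get smoothed. Tokens refill at a steady `rate` up to `capacity`, so short bursts pass while sustained excess traffic is refused (or, in a real shaper, delayed). The `cost` can be 1 per request or a byte count for bandwidth throttling. The injectable `clock` parameter is an addition that makes the behavior testable without sleeping.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, with a sustained rate of `rate` units per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the rate: drop or delay this request
```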
Anti-overload techniques • Techniques continued: • Use different domain names and/or computers to separate big files from small/medium files • Be able to fully cache small and medium-sized files • Efficiently serve big or huge (over 10 - 1000 MB) files by using different settings • Use many Web servers (programs) per computer • Each bound to its own network card and IP address • Use many Web servers grouped together • They act as, and are seen as, one big Web server • See: load balancer
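Grouping several servers behind one name needs something that spreads requests across them. Round robin is one of the simplest scheduling strategies and is sketched below as an illustration, not as how any specific load balancer works; the back-end addresses are hypothetical private IPs, and "down" servers are simply skipped.

```python
import itertools


class RoundRobinBalancer:
    """Hand each new request to the next back-end server in turn, skipping ones marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()  # back ends currently failing health checks
        self._cycle = itertools.cycle(self.backends)

    def pick(self) -> str:
        # Try each back end at most once per call.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy back-end servers")


lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
```

Round robin assumes the servers are roughly equal; weighted or least-connections strategies are common alternatives when they are not.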
Anti-overload techniques • Techniques continued: • Add more hardware resources • RAM, disks, NICs, etc. • Tune OS parameters • Hardware capabilities • Usage • Use more efficient computer programs for web servers, etc. • nginx • Use workarounds • Especially if dynamic content is involved
Historical notes • World's first web server • 1989 - Tim Berners-Lee proposed to CERN a new project • Ease the exchange of information between scientists • Using a hypertext system • 1990 - Berners-Lee wrote two programs: • Browser • WorldWideWeb • Web server • Ran on NeXTSTEP
Historical notes • First web server in USA • Installed December 12, 1991 • Bebo White at SLAC • After returning from a sabbatical at CERN • Between 1991 and 1994: • Simplicity and effectiveness of early technologies used to surf and exchange data through the World Wide Web helped to: • Port them to many different operating systems • Spread their use among lots of different social groups of people • First in scientific organizations • Then in universities • Finally in industry
Historical notes • 1994: Tim Berners-Lee founded the World Wide Web Consortium (W3C) • To regulate the further development of the many related technologies through a standardization process: • HTTP • HTML • etc. • The following years saw exponential growth in the number of web sites and servers
Software • There are thousands of different web server programs available • Many specialized for very specific purposes • About 50 mainstream • A web server that is not very popular does not necessarily have: • A lot of bugs • Poor performance • See Category:Web server software (Wikipedia) for a longer list of HTTP server programs.
Statistics • The most popular web servers used for public web sites are tracked by • Netcraft.com • Details given by • Netcraft Web Server Reports • According to this site: • Apache has been the most popular web server on the Internet since April 1996 • July 2010 Netcraft Web Server Survey: • 54.90% of web sites on the Internet use Apache • 25.87% of web sites use IIS
Post-survey: What is the #1 web server: • Apache • Google • MS IIS HTTP server • nginx • Sun
Summary • Concentrated on HTTP servers • Apache and IIS are the main web serving tools • nginx is rising fast • Apache/Microsoft battling • Apache currently declining • IIS currently up • Usage tracked • Netcraft Web Server Survey