Web Servers
Pre-lecture Survey: What is the #1 web server: • Apache • Google • MS IIS HTTP server • nginx • Sun • Other
http://en.wikipedia.org/wiki/Web_servers Generic Overview
Web Servers • A web server can be a: • Computer Program • Responsible for accepting HTTP requests from clients (web browsers) • Returns HTTP responses with optional data contents • Usually web pages • HTML documents • Linked objects (images, etc.). • Computer • Running a computer program which provides the above functionality
Common Features • HTTP • Accepts HTTP requests from a client • Provides HTTP responses to the client • Typically the response is a document, which can be: • File containing HTML statements • Raw text file • Image • Some other type of document • Defined by MIME types • If an error is found in a client request or occurs while servicing it: • Web server sends an error response • May include custom HTML • May have text messages • Better explains the problem to the end user
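As a minimal sketch of how a server might pick the MIME type for a response, the Python standard library's `mimetypes` module can guess it from the file extension. The helper name `content_type_for` and the fallback to `application/octet-stream` are assumptions made for illustration, not something the slides prescribe.

```python
import mimetypes


def content_type_for(path: str) -> str:
    """Guess a Content-Type value from a file name (hypothetical helper)."""
    mime, _encoding = mimetypes.guess_type(path)
    # Assumption: unknown extensions are served as generic binary data.
    return mime or "application/octet-stream"


if __name__ == "__main__":
    for name in ("index.html", "notes.txt", "logo.png", "data.zzzunknown"):
        print(name, "->", content_type_for(name))
```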
Common Features • Logging • Web servers keep detailed information to log files • Client requests • Server responses • Allows the Webmaster to collect data • Running log analyzers
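A log analyzer can be as simple as a script that tallies status codes. The sketch below assumes the log uses the Common Log Format; the slides do not say which format is used, and the sample lines and the name `count_statuses` are made up for illustration.

```python
import re
from collections import Counter

# Assumption: Common Log Format, e.g.
# host ident user [time] "METHOD path PROTOCOL" status size
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical sample entries (documentation IP addresses).
SAMPLE = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:55:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '192.0.2.1 - - [10/Oct/2000:13:56:01 -0700] "GET /logo.png HTTP/1.0" 200 5120',
]


def count_statuses(lines):
    """Count responses per HTTP status code, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group("status")] += 1
    return counts


if __name__ == "__main__":
    print(count_statuses(SAMPLE))
```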
Additional Features • Authentication • Optional authorization before allowing access to some or all resources • Requires a user name and password • Handles: • Static content • Dynamic content • Supports one or more related interfaces • SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET, Server APIs such as NSAPI, ISAPI, etc.
Additional Features • HTTPS support • Via SSL or TLS • Allows secure (encrypted) connections • Uses port 443 instead of port 80 • Content compression • E.g. by gzip encoding • Reduces the size of the responses • Lower bandwidth usage, etc.
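Content compression can be sketched with Python's `gzip` module. The function name `maybe_compress` is hypothetical, and the `Accept-Encoding` parsing is deliberately simplified: it ignores quality values such as `gzip;q=0`, which a real server would need to honor.

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str):
    """Return (body, headers); gzip the body if the client advertises gzip.

    Simplification: quality values (e.g. "gzip;q=0") are ignored.
    """
    offered = [e.split(";")[0].strip().lower() for e in accept_encoding.split(",")]
    if "gzip" in offered:
        compressed = gzip.compress(body)
        return compressed, {
            "Content-Encoding": "gzip",
            "Content-Length": str(len(compressed)),
        }
    return body, {"Content-Length": str(len(body))}


if __name__ == "__main__":
    page = b"<p>hello</p>" * 200
    out, headers = maybe_compress(page, "gzip, deflate")
    print(len(page), "bytes before,", len(out), "bytes after", headers)
```

Repetitive text such as HTML compresses well, which is where the bandwidth saving comes from.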
Additional Features • Virtual hosting • Serve many web sites using one IP address • Large file support • Serve files greater than 2 GB • Typical 32 bit OS restriction • Bandwidth throttling • Limit the speed of responses • Do not saturate the network • Able to serve more clients
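Name-based virtual hosting usually comes down to choosing a document root from the `Host` request header. The sketch below shows that lookup only; the site table, the directory paths, and the function name are hypothetical, and IPv6 literal hosts are not handled.

```python
# Hypothetical mapping from host name to document root.
SITES = {
    "www.example.com": "/var/www/example",
    "images.example.com": "/var/www/images",
}
DEFAULT_ROOT = "/var/www/default"  # assumption: fallback for unknown hosts


def document_root_for(host_header: str) -> str:
    """Pick a document root from a Host header such as 'www.example.com:80'."""
    # Simplification: splitting on ':' would break IPv6 literals like [::1]:80.
    hostname = host_header.split(":")[0].strip().lower()
    return SITES.get(hostname, DEFAULT_ROOT)


if __name__ == "__main__":
    for host in ("www.example.com", "IMAGES.example.com:80", "other.example.org"):
        print(host, "->", document_root_for(host))
```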
Where does the requested material come from? Origin of the returned content
Content Origin • Origin of the returned content may be: • Static • Pre-existing data file • Contents loaded on request • Dynamic • Content generated by another program • Script (programming language) • Creates/retrieves the requested information • Static content is usually delivered much faster than dynamic content • 2 to 100 times • Especially if the latter involves data pulled from a database
How does it find it? Path translation
Path translation • Web servers map the path component of a Uniform Resource Locator (URL) into: • Local file system resource • Static requests • Internal or external program name • Dynamic requests • For a static request the URL path specified by the client is relative to the Web server's root directory • This is not the same as the computer's root directory
Path translation • Consider the following URL requested by a client Web browser: • http://www.example.com/path/file.html • Client's Web browser breaks it down as follows: • http:// • Use the HTTP protocol • www.example.com • The Web server to connect to • This is translated to an IP address by DNS • Sent to 93.184.216.119:80 • /path/file.html • The resource to access • Generates the following HTTP 1.1 request sent to the IP address: • GET /path/file.html HTTP/1.1 • Host: www.example.com
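The browser-side step above can be sketched with `urllib.parse`: split the URL, then assemble the request line and the `Host` header. The function name `build_get_request` is hypothetical, and the `Connection: close` header is an addition for the sketch that the slide does not show.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {parts.netloc}",
        "Connection: close",  # assumption: not part of the slide's example
        "",
        "",  # the blank line terminates the header section
    ]
    return "\r\n".join(lines).encode("ascii")


if __name__ == "__main__":
    print(build_get_request("http://www.example.com/path/file.html").decode())
```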
Path translation (cont.) • Web server host (www.example.com) • Sees the request is for port 80 • Sends request to the Web server software • Appends the given path to the path of the server's Web root directory • On Unix machines typically /var/www/htdocs or /var/www • Result would then be the local file system resource: • /var/www/htdocs/path/file.html • Web server: • Retrieves the file, if it exists • Processes it by the Web server's rules • Sends a response to the client's web browser • Response: • Describes the content of the file • Contains the requested file or an error message
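The server-side mapping can be sketched as below, assuming a Unix-style web root of /var/www/htdocs as on the slide. The function name `translate_path` is hypothetical. The normalization step is an addition: it keeps requests containing `..` from resolving outside the web root, which the slides do not discuss but any real server has to handle.

```python
import posixpath
from urllib.parse import unquote

WEB_ROOT = "/var/www/htdocs"  # web root taken from the slide's example


def translate_path(url_path: str, web_root: str = WEB_ROOT) -> str:
    """Map a URL path to a file system path under the web root."""
    # Drop any query string or fragment before mapping.
    url_path = url_path.split("?", 1)[0].split("#", 1)[0]
    # Decode %xx escapes, then collapse "." and ".." segments. Anchoring at
    # "/" means ".." can never climb above the web root.
    normalized = posixpath.normpath("/" + unquote(url_path).lstrip("/"))
    return posixpath.join(web_root, normalized.lstrip("/"))


if __name__ == "__main__":
    print(translate_path("/path/file.html"))
    print(translate_path("/../../etc/passwd"))
```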
Performance • Web servers must: • Serve requests quickly! • From more than one TCP/IP connection at a time • The key performance parameters are: • Number of requests per second • Depends on the type of request, etc. • Latency (response time) in milliseconds • For each new connection or request • Throughput in bytes per second • Depends on • File size • Content cached or not • Available network bandwidth • etc. • Measured under: • Varying load of clients • Varying requests per client
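The three parameters can be computed from per-request measurements. This sketch assumes each sample is a `(latency_ms, bytes_sent)` pair collected over a known time window; the function name and the sample numbers are made up for illustration.

```python
from statistics import mean


def summarize(samples, window_seconds):
    """Summarize (latency_ms, bytes_sent) samples taken over window_seconds."""
    latencies = [latency for latency, _ in samples]
    total_bytes = sum(size for _, size in samples)
    return {
        "requests_per_second": len(samples) / window_seconds,
        "mean_latency_ms": mean(latencies),
        "throughput_bytes_per_second": total_bytes / window_seconds,
    }


if __name__ == "__main__":
    # Hypothetical measurements: four requests completed in a 2-second window.
    print(summarize([(10, 1000), (20, 3000), (30, 2000), (40, 2000)], 2))
```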
Performance • The three performance parameters may vary noticeably depending on the number of active connections • A fourth parameter is therefore the concurrency level • The number of simultaneous connections a web server supports under a specific configuration • The server model used to implement a web server program can limit the performance and scalability that can be reached under heavy load or when using high-end hardware • Many CPUs, disks, etc.
Load limits • Web server (program) has defined load limits • Can handle only a limited number of concurrent client connections per IP address (and IP ports) • Usually between 2 and 60,000 • Default between 500 and 1,000 • Can serve only a certain maximum number of requests per second depending on: • Settings • HTTP request type • Content origin • Static • Dynamic • Served content cached or not • Hardware and software limits of the native OS • A web server near or over its limits • Becomes overloaded • Unresponsive
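A connection limit can be sketched as a guarded counter: accept while below the cap, refuse otherwise. The class name and the cap used in the demo are hypothetical; real servers expose this as a configuration setting.

```python
import threading


class ConnectionLimiter:
    """Track active connections and refuse new ones above a fixed cap."""

    def __init__(self, max_connections: int):
        self.max_connections = max_connections
        self.active = 0
        self._lock = threading.Lock()

    def try_accept(self) -> bool:
        """Return True and count the connection if capacity remains."""
        with self._lock:
            if self.active >= self.max_connections:
                return False  # a server might refuse or answer 503 here
            self.active += 1
            return True

    def release(self) -> None:
        """Mark one connection as finished."""
        with self._lock:
            if self.active > 0:
                self.active -= 1


if __name__ == "__main__":
    limiter = ConnectionLimiter(max_connections=2)
    print([limiter.try_accept() for _ in range(3)])
```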
Overload causes • A sample daily graph of a web server's load, indicating a spike in the load early in the day.
Overload causes • At any time web servers can be overloaded because of: • Too much legitimate web traffic • Thousands or even millions of clients hitting the web site in a short interval of time • DDoS • Distributed Denial of Service attacks • Coordinated • Computer worms • Abnormal traffic because of millions of infected computers • Not coordinated • XSS viruses • Millions of infected browsers and/or web servers • Internet web robots • Traffic not filtered / limited on large web sites with very few resources (bandwidth, etc.) • Internet (network) slowdowns • Client requests are served more slowly and the number of connections increases so much that server limits are reached • Partial unavailability of web servers (computers) • Required / urgent maintenance or upgrade • HW or SW failures • Back-end (e.g. DB) failures, etc. • Remaining web servers get too much traffic and become overloaded
Overload symptoms • Symptoms of an overloaded web server include: • Requests are served with (possibly long) delays • From 1 second to a few hundred seconds • 500, 502, 503, 504 HTTP errors returned to clients • Unrelated 404 or even 408 errors may sometimes be returned as well • TCP connections are refused or reset (interrupted) before any content is sent to clients • In very rare cases, only partial contents are sent • This behavior may well be considered a bug • Even if it stems from unavailable system resources
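For reference, the status codes named above can be looked up with Python's `http.HTTPStatus`. The helper names and the idea of flagging 5xx codes as overload hints are assumptions made for this sketch; a 500 can have many other causes.

```python
from http import HTTPStatus

# Codes the slide associates with an overloaded server.
OVERLOAD_CODES = (500, 502, 503, 504)


def describe(code: int) -> str:
    """Return the code with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def looks_like_overload(code: int) -> bool:
    """Heuristic only: these codes may also have unrelated causes."""
    return code in OVERLOAD_CODES


if __name__ == "__main__":
    for code in (500, 502, 503, 504, 404, 408):
        print(describe(code), "| overload hint:", looks_like_overload(code))
```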
Anti-overload techniques • To partially overcome load limits and to prevent overload, use techniques like: • Managing network traffic by using: • Firewalls • Block unwanted traffic • Bad IP sources • Bad patterns • HTTP traffic managers • Drop, redirect or rewrite requests having bad HTTP patterns • Bandwidth management and traffic shaping • Smooth down peaks in network usage • Deploying web cache techniques • Use different domains to serve different content (static and dynamic) by separate Web servers, e.g.: • http://images.example.com • Serves static images • http://www.example.com • Serves dynamic data requests
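One common way to smooth traffic peaks is a token bucket. This is a swapped-in illustration; the slides do not name a specific algorithm. The class name, rate, and capacity are hypothetical, and the clock is injectable so the behavior can be checked without waiting.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if a request costing `cost` tokens may proceed now."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, capacity=10)
    allowed = sum(bucket.allow() for _ in range(20))
    print("allowed", allowed, "of 20 back-to-back requests")
```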
Anti-overload techniques • Techniques continued: • Use different domain names and/or computers to separate big files from small/medium files • Be able to fully cache small and medium sized files • Efficiently serve big or huge (over 10 - 1000 MB) files by using different settings • Using many Web servers (programs) per computer • Each bound to its own network card and IP address • Use many Web servers that are grouped together • Act or are seen as one big Web server • See Load balancer
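Grouping several servers behind one address needs a policy for picking a backend. The sketch below uses round-robin, the simplest such policy; it is an illustration only, since the slides just point to "Load balancer". The class name and backend addresses are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def next_backend(self) -> str:
        """Return the backend that should receive the next request."""
        return next(self._rotation)


if __name__ == "__main__":
    # Hypothetical backend addresses (documentation IP range).
    balancer = RoundRobinBalancer(["192.0.2.10:80", "192.0.2.11:80", "192.0.2.12:80"])
    print([balancer.next_backend() for _ in range(5)])
```

Round-robin ignores how busy each backend is; real load balancers often weigh in active connections or health checks.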
Anti-overload techniques • Techniques continued: • Add more hardware resources • RAM, disks, NICs, etc. • Tune OS parameters • Hardware capabilities • Usage • Use more efficient computer programs for web servers, etc. • nginx • Use workarounds • Especially if dynamic content is involved
Historical notes • World's first web server • 1989 - Tim Berners-Lee proposed to CERN a new project • Ease the exchange of information between scientists • Using a hypertext system • 1990 - Berners-Lee wrote two programs: • Browser • WorldWideWeb • Web server • Ran on NeXTSTEP
Historical notes • First web server in USA • Installed December 12, 1991 • Bebo White at SLAC • After returning from a sabbatical at CERN • Between 1991 and 1994: • Simplicity and effectiveness of early technologies used to surf and exchange data through the World Wide Web helped to: • Port them to many different operating systems • Spread their use among lots of different social groups of people • First in scientific organizations • Then in universities • Finally in industry
Historical notes • 1994: Tim Berners-Lee founded the World Wide Web Consortium (W3C) • Regulates the further development of the many technologies involved through a standardization process: • HTTP • HTML • etc. • The following years saw exponential growth in the number of web sites and servers
Software • There are thousands of different web server programs available • Many specialized for very specific purposes • About 50 mainstream • The fact that a web server is not very popular does not necessarily mean it has: • Lots of bugs • Poor performance • See Category:Web server software for a longer list of HTTP server programs.
Statistics • Most popular web servers, used for public web sites, are tracked by • Netcraft.com • Details given by • Netcraft Web Server Reports • According to this site: • Apache has been the most popular web server on the Internet since April of 1996 • July 2010 Netcraft Web Server Survey: • 54.90% web sites on the Internet use Apache • 25.87% web sites use IIS
Who’s running the show? What are they? The big two: Popular Web Servers
Apache http://en.wikipedia.org/wiki/Apache_web_server We’re number one!
Apache • Apache HTTP Server, referred to simply as Apache: • A web server • Notable for playing a key role in the initial growth of the World Wide Web • Apache • First viable alternative to Netscape Communications Corporation web server • Currently known as Sun Java System Web Server • Evolved to rival other Unix-based web servers • Functionality and performance • Since April 1996 Apache has been the most popular HTTP server on the World Wide Web • September 2007: Apache served 50% of all websites
Apache • Project's name was chosen for two reasons: • Respect for the Native American Indian Apache tribe • Well-known for their endurance and their skills in warfare • Project's root is a set of patches to the codebase of NCSA HTTPd 1.3 • Making it "a patchy" server • Apache is developed and maintained by • Open community of developers • Under the auspices of the Apache Software Foundation • Available for a wide variety of OSs • Microsoft Windows • Novell NetWare • Unix-like operating systems • e.g. Linux and Mac OS X • z/OS (IBM mainframe) • and more… • Released under the Apache License • Apache is free software / open source software.
History • First version of the Apache web server created by Robert McCool • Heavily involved with the National Center for Supercomputing Applications web server • Known simply as NCSA HTTPd • When Rob left NCSA in mid-1994 • Development of httpd stalled • Left a variety of patches for improvements circulating through e-mails • Rob McCool was not alone in his efforts • Several other developers helped form the original "Apache Group": • Brian Behlendorf, Roy T. Fielding, Rob Hartill, David Robinson, Cliff Skolnick, Randy Terbush, Robert S. Thau, Andrew Wilson, Eric Hagberg, Frank Peters, and Nicolas Pioch
History • Version 2 of the Apache server was a substantial re-write of much of the Apache 1.x code • Strong focus on further modularization and the development of a portability layer, the Apache Portable Runtime • Apache 2.x core - several major enhancements over Apache 1.x: • UNIX threading • Better support for non-Unix platforms • New Apache API • IPv6 support • First alpha release of Apache March 2, 2000 • First general availability release on April 6, 2002 • Version 2.2 introduced a new authorization API that allows for more flexibility • Also features improved cache modules and proxy modules
Features • Apache supports a variety of features • Many implemented as compiled modules • Extend the core functionality • Range from server-side programming language support to authentication schemes: • Common language interfaces support • mod_perl, mod_python, Tcl, and PHP • Popular authentication modules include • mod_access, mod_auth, and mod_digest
Features • Other features include: • SSL and TLS support • mod_ssl • A proxy module • A useful URL rewriter • AKA a rewrite engine, implemented under mod_rewrite • Custom log files • mod_log_config • Filtering support • mod_include • mod_ext_filter • Apache logs can be analyzed via web browsers with free scripts • AWStats/W3Perl • Visitors