Web Servers


Pre-lecture Survey: What is the #1 web server: • Apache • Google • MS IIS HTTP server • nginx • Sun • Other

Generic Overview (source: http://en.wikipedia.org/wiki/Web_servers)

Web Servers • A web server can be a: • Computer Program • Responsible for accepting HTTP requests from clients (web browsers) • Returns HTTP responses with optional data contents • Usually web pages • HTML documents • Linked objects (images, etc.). • Computer • Running a computer program which provides the above functionality

Common Features

Common Features • HTTP • Accepts HTTP requests from a client • Provides HTTP responses to the client • A typical “HTML” document can be: • A file containing HTML statements • A raw text file • An image • Some other type of document • The type is defined by its MIME type • If the client request contains an error, or an error occurs while servicing the request: • The Web server sends an error response • It may include custom HTML • It may include text messages • These better explain the problem to the end user
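As a rough sketch of the two HTTP duties above, the Python snippet below picks a MIME type for a file from its extension and builds an error response with a small custom HTML body. It uses only the standard library (`mimetypes`, `html`); the function names `content_type_for` and `error_response` are hypothetical, and a real server sends many more headers.

```python
import html
import mimetypes


def content_type_for(path: str) -> str:
    """Guess the MIME type from the file extension; fall back to generic binary data."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"


def error_response(status: int, reason: str, message: str) -> bytes:
    """Build a minimal HTTP/1.1 error response with a custom HTML explanation."""
    body = (
        f"<html><body><h1>{status} {reason}</h1>"
        f"<p>{html.escape(message)}</p></body></html>"
    ).encode("utf-8")
    head = (
        f"HTTP/1.1 {status} {reason}\r\n"
        "Content-Type: text/html; charset=utf-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body


print(content_type_for("index.html"))  # text/html
print(content_type_for("logo.png"))    # image/png
print(error_response(404, "Not Found", "No such page.").decode("utf-8"))
```

The message is HTML-escaped so that text echoed back to the user cannot inject markup into the error page.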

Common Features • Logging • Web servers record detailed information in log files • Client requests • Server responses • This allows the Webmaster to collect statistics • By running log analyzers
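A log analyzer can be as simple as a pattern match over each line. The sketch below assumes the widely used Common Log Format (`host ident user [time] "request" status bytes`); the three sample lines are made up for illustration, and the function name `summarize_log` is hypothetical.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request line" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize_log(lines):
    """Count responses per status code and total bytes sent ('-' means no body)."""
    statuses = Counter()
    total_bytes = 0
    for line in lines:
        match = CLF_PATTERN.match(line)
        if match is None:
            continue  # skip malformed lines instead of failing
        statuses[match["status"]] += 1
        if match["size"] != "-":
            total_bytes += int(match["size"])
    return statuses, total_bytes


# Made-up sample lines
sample = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '127.0.0.1 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 209',
    '10.0.0.7 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 304 -',
]
statuses, total = summarize_log(sample)
print(dict(statuses), total)  # {'200': 1, '404': 1, '304': 1} 2535
```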

Additional Available Features • Authentication • Optional authorization request before allowing access to some or all resources • Requires a user name and password • Handles: • Static content • Dynamic content • Supports one or more related interfaces • SSI, CGI, SCGI, FastCGI, JSP, PHP, ASP, ASP.NET, server APIs such as NSAPI, ISAPI, etc.
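One common way to implement the user name and password check is HTTP Basic authentication, where the browser sends `Authorization: Basic <base64(user:password)>`. The sketch below (function name and credentials hypothetical) validates such a header; a real server would answer a failed check with `401 Unauthorized` and a `WWW-Authenticate` header. Because Base64 is an encoding, not encryption, this scheme is normally used only over HTTPS.

```python
import base64
import binascii
import hmac


def check_basic_auth(header_value: str, expected_user: str, expected_password: str) -> bool:
    """Return True if an Authorization header carries matching Basic credentials."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False  # not valid Base64 or not valid UTF-8
    user, sep, password = decoded.partition(":")
    if not sep:
        return False  # credentials must have the form user:password
    # compare_digest reduces timing side channels when comparing secrets
    user_ok = hmac.compare_digest(user.encode("utf-8"), expected_user.encode("utf-8"))
    password_ok = hmac.compare_digest(password.encode("utf-8"), expected_password.encode("utf-8"))
    return user_ok and password_ok


header = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(check_basic_auth(header, "alice", "s3cret"))  # True
print(check_basic_auth(header, "alice", "wrong"))   # False
```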

Additional Available Features • HTTPS support • Via SSL or TLS • Allows secure (encrypted) connections • Uses port 443 by default instead of port 80 • Content compression • E.g., by gzip encoding • Reduces the size of the responses • Lower bandwidth usage, etc.
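A sketch of content compression with Python's standard `gzip` module: the body is compressed only when the client's `Accept-Encoding` header lists `gzip`, and the response then carries `Content-Encoding: gzip`. The helper name `maybe_compress` is hypothetical, and the header parsing is deliberately simplified (it ignores quality values such as `gzip;q=0`).

```python
import gzip


def maybe_compress(body: bytes, accept_encoding: str):
    """Return (body, extra_headers); gzip the body only if the client accepts gzip."""
    # Simplified parsing: strip parameters like ';q=0.8' and ignore their values
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in offered:
        return gzip.compress(body), {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    # Vary tells caches that the response depends on Accept-Encoding either way
    return body, {"Vary": "Accept-Encoding"}


page = b"<html><body>" + b"<p>Hello from the web server!</p>" * 200 + b"</body></html>"
compressed, headers = maybe_compress(page, "gzip, deflate")
print(len(page), "->", len(compressed), headers)
```

Repetitive HTML like this compresses very well; already-compressed formats such as JPEG or PNG gain little, which is why servers usually restrict compression to text types.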

Additional Available Features • Virtual hosting • Serve many web sites using one IP address • Large file support • Serve files larger than 2 GB • Overcomes a typical 32-bit OS restriction • Bandwidth throttling • Limit the speed of responses • Avoid saturating the network • Able to serve more clients
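Name-based virtual hosting relies on the `Host` header that HTTP/1.1 clients send: the server looks up which site's document root to use for that name. In the sketch below the host names and directories are hypothetical, and IPv6 literal hosts are not handled.

```python
# Hypothetical mapping from host name to document root
VIRTUAL_HOSTS = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/html"  # used when no site matches the Host header


def document_root_for(host_header: str) -> str:
    """Pick the document root for a request based on its Host header."""
    host = host_header.split(":", 1)[0].strip().lower()  # drop an optional ":port"
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)


print(document_root_for("www.example.com"))        # /var/www/example
print(document_root_for("Blog.Example.com:8080"))  # /var/www/blog
print(document_root_for("unknown.test"))           # /var/www/html
```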

Where does the requested material come from? Origin of the returned content

Content Origin • Origin of the returned content may be: • Static • Pre-existing data file • Content changes only if manually edited • Contents loaded on request • Dynamic • Content generated by another program • Script (programming language) • Creates/retrieves the requested information • Static content is usually delivered much faster than dynamic content • 2 to 100 times • Especially if the latter involves data pulled from a database
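To make the distinction concrete, the sketch below serves a path either from pre-existing content (static) or by calling a function that builds the response at request time (dynamic). The in-memory dictionary stands in for files on disk, and the route names and function names are hypothetical.

```python
from datetime import datetime, timezone

# Static content: pre-existing data, returned unchanged (stands in for files on disk)
STATIC_CONTENT = {
    "/index.html": b"<html><body><h1>Welcome</h1></body></html>",
}


def current_time_page() -> bytes:
    """Dynamic content: generated anew on every request."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return f"<html><body><p>Server time: {now}</p></body></html>".encode("utf-8")


DYNAMIC_ROUTES = {
    "/time": current_time_page,
}


def serve(path: str):
    """Return (status, body) for a request path."""
    if path in STATIC_CONTENT:
        return 200, STATIC_CONTENT[path]
    if path in DYNAMIC_ROUTES:
        return 200, DYNAMIC_ROUTES[path]()
    return 404, b"<html><body><p>Not Found</p></body></html>"


print(serve("/index.html")[0], serve("/time")[0], serve("/missing")[0])  # 200 200 404
```

Here the dynamic branch is cheap; when it has to query a database instead, the speed gap between static and dynamic delivery becomes much larger.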

How does it find it? Path translation

Path translation • Web servers map the path component of a Uniform Resource Locator (URL) into: • Local file system resource • Static requests • Internal or external program name • Dynamic requests • For a static request, the URL path specified by the client is relative to the Web server's root directory • This is not the same as the computer's root directory

Path translation • Consider the following URL requested by a client Web browser: • http://www.example.com/path/file.html • The client's Web browser interprets it as follows: • http:// • Use the HTTP protocol • www.example.com • The Web server to connect to • This is translated to an IP address by DNS • The request is sent to 93.184.216.119:80 • Note that port 80 is usually implicit • /path/file.html • The resource to access • The browser then sends the following HTTP/1.1 request to that IP address:
GET /path/file.html HTTP/1.1
Host: www.example.com
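The browser's side of this step can be sketched with `urllib.parse.urlsplit`: split the URL, fall back to port 80 when none is given, and format the request line plus `Host` header. The DNS lookup (e.g., via `socket.getaddrinfo`) is left out so the sketch runs offline; the function name `build_get_request` is hypothetical, and only plain `http://` URLs with ordinary host names are handled.

```python
from urllib.parse import urlsplit


def build_get_request(url: str):
    """Return (host, port, request_text) for a plain http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http:// URLs")
    host = parts.hostname
    port = parts.port or 80  # port 80 is implicit for http
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # The Host header names the port only when the URL spelled it out
    host_header = host if parts.port is None else f"{host}:{parts.port}"
    request = f"GET {target} HTTP/1.1\r\nHost: {host_header}\r\n\r\n"
    return host, port, request


host, port, request = build_get_request("http://www.example.com/path/file.html")
print(host, port)  # www.example.com 80
print(request)
```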

Path translation (cont.) • Web server host (www.example.com) • Sees the request is for port 80 • Passes the request to the Web server software • Which appends the given path/file to the path of the server's Web root directory • Typical Apache Web roots on Linux: • /var/www/htdocs • /var/www • /var/www/html • The resulting local file system resource would then be: • /var/www/htdocs/path/file.html • /var/www/path/file.html • /var/www/html/path/file.html • Web server: • Retrieves the file, if it exists • Processes it according to the Web server's rules • Sends a response to the client's Web browser • Response: • Describes the content of the returned data/file • Contains the requested data or an error response
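The mapping from URL path to local file can be sketched with `posixpath`. Normalizing the decoded path as if it were rooted at `/` before appending it to the Web root means `..` segments cannot climb out of the Web root, a common security concern. The function name `translate_path` is hypothetical; real servers also handle index files, symbolic links, and access rules.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def translate_path(web_root: str, request_target: str) -> str:
    """Map a request target like '/path/file.html' to a file path under web_root."""
    path = unquote(urlsplit(request_target).path)  # drop any query, decode %XX escapes
    # Normalizing relative to "/" collapses "." and ".." without ever going above "/"
    safe = posixpath.normpath("/" + path.lstrip("/"))
    return posixpath.join(web_root, safe.lstrip("/"))


for root in ("/var/www/htdocs", "/var/www", "/var/www/html"):
    print(translate_path(root, "/path/file.html"))
# /var/www/htdocs/path/file.html
# /var/www/path/file.html
# /var/www/html/path/file.html
```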

Performance

Performance • Web servers must: • Serve requests quickly! • Serve more than one TCP/IP connection at a time • Key performance parameters include: • Number of requests per second • Depends on the type of request, etc. • Latency: response time in milliseconds • For each new connection or request • Throughput in bytes per second • Depends on • File size • Whether content is cached • Available network bandwidth • etc. • Concurrency level • How well a server handles many simultaneous client requests
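Given per-request timings from a load test, the first three parameters can be computed directly, and the average concurrency over the measurement window follows from Little's law (average number in the system = arrival rate × average time in the system). The `RequestSample` record, the function name, and the three samples below are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RequestSample:
    start: float  # seconds since the test began
    end: float    # seconds since the test began
    nbytes: int   # size of the response body


def performance_summary(samples):
    """Return requests/s, mean latency (ms), throughput (bytes/s), mean concurrency."""
    window = max(s.end for s in samples) - min(s.start for s in samples)
    requests_per_second = len(samples) / window
    mean_latency_ms = mean((s.end - s.start) * 1000 for s in samples)
    throughput = sum(s.nbytes for s in samples) / window
    # Little's law: mean concurrency = request rate x mean time spent per request
    concurrency = requests_per_second * mean_latency_ms / 1000
    return requests_per_second, mean_latency_ms, throughput, concurrency


samples = [RequestSample(0.0, 0.1, 1000), RequestSample(0.0, 0.2, 2000), RequestSample(0.5, 1.0, 1000)]
print(performance_summary(samples))
```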

Performance • Measured under: • Varying numbers of clients • Varying requests per client • Performance parameters may vary noticeably depending on the number of active connections • The specific server model used to implement a web server program can affect the performance and scalability reachable under heavy load or on high-end hardware • Many CPUs, disks, etc.

Load limits

Load limits • Web servers have load limits • Can be set in a configuration file • Can handle only a limited number of concurrent client connections per IP address (and TCP port) • Usually between 2 and 60,000 • Default between 500 and 1,000 • Can serve only a certain maximum number of requests per second depending on: • Settings • HTTP request type • Content origin • Static • Dynamic • Whether served content is cached • Hardware and software limits of the native OS • A web server near or over its limits • Becomes overloaded • May become unresponsive
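One way to enforce a concurrent-connection limit is a counting semaphore: each request takes a slot, and when none is free the server refuses immediately with `503 Service Unavailable`. This is a sketch only; the class name `ConnectionLimiter` and the limit value are hypothetical, and many servers queue excess connections (for example in the listen backlog) rather than rejecting them outright.

```python
import threading


class ConnectionLimiter:
    """Admit at most max_connections concurrent requests; refuse the rest with 503."""

    def __init__(self, max_connections: int):
        self._slots = threading.BoundedSemaphore(max_connections)

    def handle(self, handler):
        """Run handler() if a slot is free, otherwise return a 503 response tuple."""
        if not self._slots.acquire(blocking=False):
            return 503, "Service Unavailable"
        try:
            return handler()
        finally:
            self._slots.release()  # free the slot even if the handler raises


limiter = ConnectionLimiter(max_connections=1)
print(limiter.handle(lambda: (200, "OK")))  # (200, 'OK')
# A request arriving while the only slot is busy is refused:
print(limiter.handle(lambda: limiter.handle(lambda: (200, "OK"))))  # (503, 'Service Unavailable')
```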

Overload causes

Overload causes • A sample daily graph of a web server's load, indicating a spike in the load early in the day.

Overload causes • Web servers may be overloaded because of: • Too much legitimate web traffic • Thousands or even millions of clients hitting the web site in a short interval of time • DDoS • Distributed Denial of Service attacks • Coordinated • Computer worms • Abnormal traffic because of millions of infected computers • Not coordinated • XSS worms • Millions of infected browsers and/or web servers • Internet web robots • Traffic not filtered or limited on large web sites with very few resources (bandwidth, etc.) • Internet (network) slowdowns • Client requests are served more slowly and the number of connections increases so much that server limits are reached • Partial unavailability of web servers (computers) • Required or urgent maintenance or upgrade • HW or SW failures • Back-end (e.g., DB) failures, etc. • The remaining web servers get too much traffic and become overloaded

Overload symptoms

Overload symptoms • Symptoms of an overloaded web server include: • Requests are served with (possibly long) delays • From 1 second to a few hundred seconds • 500, 502, 503, or 504 HTTP errors are returned to clients • Sometimes unrelated 404 or even 408 errors may be returned • TCP connections are refused or reset (interrupted) before any content is sent to clients • In very rare cases, only partial contents are sent • This behavior may well be considered a bug • Even if it stems from unavailable system resources

Anti-overload techniques

Anti-overload techniques • To partially overcome load limits and to prevent overload, use techniques like: • Managing network traffic by using: • Firewalls • Block unwanted traffic • Bad IP sources • Bad patterns • HTTP traffic managers • Drop, redirect or rewrite requests having bad HTTP patterns • Bandwidth management and traffic shaping • Smooth down peaks in network usage • Deploying web cache techniques • Using different domains to serve different content (static and dynamic) by separate Web servers, e.g.: • http://images.example.com • Serves static images • http://www.example.com • Serves dynamic data requests
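Traffic shaping and per-client rate limiting are often implemented with a token bucket, a technique the slide does not name but which fits the goal of smoothing down peaks: tokens refill at a steady `rate` up to a burst `capacity`, and each request spends one. The class below is a hypothetical sketch; the injectable `clock` parameter exists only so the behavior can be checked deterministically.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill according to elapsed time, never beyond the burst capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


fake_time = [0.0]
bucket = TokenBucket(rate=2, capacity=3, clock=lambda: fake_time[0])
print([bucket.allow() for _ in range(4)])  # [True, True, True, False]
fake_time[0] = 0.5                          # half a second later: one token refilled
print([bucket.allow() for _ in range(2)])  # [True, False]
```

A server would typically keep one bucket per client IP address and answer with `429 Too Many Requests` or `503 Service Unavailable` when `allow()` returns False.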

Anti-overload techniques • Techniques continued: • Use different domain names and/or computers to separate big files from small and medium files • Be able to fully cache small and medium sized files • Efficiently serve big or huge (over 10 to 1,000 MB) files by using different settings • Use many Web servers (programs) per computer • Each bound to its own network card and IP address • Use many Web servers that are grouped together • They act, or are seen, as one big Web server • See Load balancer
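A load balancer in front of a group of Web servers needs a policy for choosing a back end; round robin that skips servers marked as down is one of the simplest. The back-end addresses and the class name `RoundRobinBalancer` below are hypothetical, and production load balancers add health checks, weights, and session affinity.

```python
class RoundRobinBalancer:
    """Hand out back-end servers in turn, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()  # back ends currently considered unavailable
        self._next = 0

    def pick(self):
        for _ in range(len(self.backends)):
            backend = self.backends[self._next]
            self._next = (self._next + 1) % len(self.backends)
            if backend not in self.down:
                return backend
        raise RuntimeError("no healthy back-end servers")


balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
print([balancer.pick() for _ in range(4)])
# ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.3:8080', '10.0.0.1:8080']
balancer.down.add("10.0.0.2:8080")
print([balancer.pick() for _ in range(3)])
# ['10.0.0.3:8080', '10.0.0.1:8080', '10.0.0.3:8080']
```

When a server is marked down, its share of the traffic shifts to the remaining servers, which is the overload risk listed under partial unavailability.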

Anti-overload techniques • Techniques continued: • Add more hardware resources • RAM, disks, NICs, etc. • Tune OS parameters • Hardware capabilities • Usage • Use more efficient web server programs, etc. • nginx • Use workarounds • Especially if dynamic content is involved

Historical notes

Historical notes • World's first web server • 1989 - Tim Berners-Lee proposed to CERN a new project • Ease the exchange of information between scientists • Using a hypertext system • 1990 - Berners-Lee wrote two programs: • Browser • WorldWideWeb • Web server • Ran on NeXTSTEP

Historical notes • First web server in USA • Installed December 12, 1991 • Bebo White at SLAC • After returning from a sabbatical at CERN • Between 1991 and 1994: • Simplicity and effectiveness of early technologies used to surf and exchange data through the World Wide Web helped to: • Port them to many different operating systems • Spread their use among lots of different social groups of people • First in scientific organizations • Then in universities • Finally in industry

Historical notes • 1994: Tim Berners-Lee founded the World Wide Web Consortium (W3C) • To regulate the further development of the many technologies in a standardization process: • HTTP • HTML • etc. • The following years saw exponential growth in the number of web sites and servers

Resume 2/27

Software

Software • There are thousands of different web server programs available • Many are specialized for very specific purposes • About 50 are mainstream • A web server that is not very popular does not necessarily have: • Lots of bugs • Poor performance • See Category:Web server software for a longer list of HTTP server programs.

Statistics

Statistics • The most popular web servers used for public web sites are tracked by • Netcraft.com • Details given by • Netcraft Web Server Reports • According to this site: • Apache has been the most popular web server on the Internet since April 1996 • July 2010 Netcraft Web Server Survey: • 54.90% of web sites on the Internet use Apache • 25.87% of web sites use IIS


Post-survey: What is the #1 web server: • Apache • Google • MS IIS HTTP server • nginx • Sun

Summary

Summary • Concentrated on HTTP servers • Apache and IIS are the main web serving tools • nginx is rising fast • Apache/Microsoft battling • Apache currently declining • IIS currently up • Usage tracked • Netcraft Web Server Survey
