Internet / Intranet CIS-536 • Class 4 • Web Server Technology • HTTP Protocol • Log Files
Class 4 Agenda • Discuss Homework • Overview of Web Servers and Server Technology • HTTP • The Protocol For Communication Between Web Browser and Server • Log Files
Web Servers • A Basic Web Server is Just a File Server • Client Requests a File via HTTP Protocol • Server Delivers the File via HTTP Protocol • Server Maps URL to a Subdirectory • Web Server Needs Appropriate Permissions to Access Files/Directories • Supports Non-HTTP Protocols • FTP, Gopher, etc. • A Web Server is Not HTML Specific • Typically Identifies a Filetype by Extension • Or Directory Where File Exists
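The URL-to-subdirectory mapping and the extension-based file typing described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular server implements it; `DOC_ROOT`, `map_url_to_file`, and `guess_type` are hypothetical names chosen for the example.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit

DOC_ROOT = "/var/www/html"  # hypothetical document root, an assumption for this sketch


def map_url_to_file(url_path, doc_root=DOC_ROOT):
    """Translate the path part of a URL into a file path under doc_root."""
    path = unquote(urlsplit(url_path).path)
    # Normalize the path and drop ".." segments so the result stays under doc_root.
    parts = [p for p in posixpath.normpath(path).split("/") if p and p != ".."]
    return posixpath.join(doc_root, *parts)


def guess_type(file_path):
    """Identify the file type from its extension, falling back to a generic type."""
    content_type, _encoding = mimetypes.guess_type(file_path)
    return content_type or "application/octet-stream"
```

For example, `map_url_to_file("/docs/index.html")` yields a path below the document root, and `guess_type` on that path reports `text/html` purely from the `.html` extension.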
Additional Common Web Server Features • Additional Security Beyond That Provided by O/S • Scripting • Ability to Dynamically Create a Web Page • Run a Program Instead of Returning a File (CGI) • Return the Program Output as the Requested File • Administration • Log Files • Performance Monitoring
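The CGI idea above (run a program, return its output as the requested file) can be sketched as follows. This is a simplified illustration: real CGI passes many more environment variables, and `run_cgi` and `DEMO_SCRIPT` are hypothetical names. The "CGI program" here is a one-line Python script standing in for a real one.

```python
import os
import subprocess
import sys


def run_cgi(argv, query_string=""):
    """Run a program and split its output into (headers, body) at the blank line."""
    # CGI passes request details to the program through environment variables.
    env = dict(os.environ, REQUEST_METHOD="GET", QUERY_STRING=query_string)
    result = subprocess.run(argv, env=env, capture_output=True, check=True)
    output = result.stdout.decode("utf-8").replace("\r\n", "\n")
    headers, _, body = output.partition("\n\n")
    return headers, body


# Hypothetical CGI program: prints a header, a blank line, then the body.
DEMO_SCRIPT = (
    "import os; "
    "print('Content-Type: text/plain'); print(); "
    "print('query was: ' + os.environ['QUERY_STRING'])"
)

headers, body = run_cgi([sys.executable, "-c", DEMO_SCRIPT], "name=ann")
```

The server would forward `headers` and `body` to the client exactly as if they had come from a static file.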
Advanced Web Server Features • Virtual Hosting • Allow Multiple URLs to Map to Same Computer • Performance Optimization • Caching • Reliability • Scalability • Proxy Servers (For Security and Performance) • Fetch Documents That are on Other Computers • Cache Them Locally • Allows for Easy Scalability • Multiple Proxy Servers Can Cache Documents From One Source Computer • Embedded Scripting • Server Side Includes • Custom Scripting Languages • Server API
Web Servers – Added Functionality • Database Connectivity • SQL, MySQL • Directory Listings • Icons, etc. • Built-In Search Engines • Built-In ImageMap Handling • Multimedia Support • Session Emulation • Streaming Multimedia • Advanced Security • Encrypted HTTP • S-HTTP (Secure HTTP) – CommerceNet • SSL (Secure Sockets Layer) - Netscape • Web Server “Add-Ons” • CGI Substitutes / CGI Optimizations • Cold Fusion
Web Server History • All Web Servers Have a Common Root • httpd (NCSA) • UNIX Orientation • Many Features are Essentially UNIX Features • Apache • Website (O’Reilly) • Netscape Enterprise Server • Microsoft Internet Information Server • A Slew of Others
Apache • UNIX Origins – Now Ported to NT • Evolved From httpd • Freeware • Typical UNIX Application • Public Source Code • Many Defaults, Conventions • BUT: All is Configurable • No GUI Interface • Configured via Scripts, Shell Commands, Config Files • Various “Flavors” • Many Optional Features • API • ApacheSSL
IIS / Netscape • Microsoft IIS • Not Strictly Derived From httpd/Apache • Windows NT • However: Functionally Very Similar to Apache • Emulates Many UNIX Conventions • E.g. Forward Slashes • Configuration via GUI • Personal Web Server • Peer Web Server • Netscape • Multi-Platform • UNIX is Preferred Platform • Less “Open” Than Apache • More Secure?
UNIX File Structure • Forward Slashes (/) to Separate Filenames, Directories • Case Sensitive File Names • Windows is Not • No Limit on Filename Size / Extensions • Extensions are by Convention • Root is “/” • User Home Directory is: “~/” • Symbolic Links / Aliases • Directories Can Be Spread Over Multiple Drives • Can Create Non-Hierarchical Structure • File Permissions • Read, Write, Execute • Separate Permissions for Owner, Group, All • Directories are Special Cases of Files • Execute Permissions = Able to Browse Directory
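The owner/group/all permission triples above can be inspected with Python's standard `stat` module. This is a small sketch; `describe_mode` and `others_can_enter` are hypothetical helper names, and the mode values are made-up examples rather than values read from a real file system.

```python
import stat


def describe_mode(mode):
    """Render a numeric mode as the familiar 'drwxr-xr-x' style string."""
    return stat.filemode(mode)


def others_can_enter(mode):
    """True if 'all others' hold the execute bit, which on a directory permits entering it."""
    return bool(mode & stat.S_IXOTH)


# Example modes: a directory with 755 and a regular file with 640.
public_dir = stat.S_IFDIR | 0o755
private_file = stat.S_IFREG | 0o640
```

A web server process running as an unprivileged user falls into the "all" (other) category for most files, so these are the bits that usually decide whether it can serve a document.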
Web Server Configuration • Directory Structure • Virtual Document Tree • Access to User Directories • UNIX: ~user • Symbolic Links • Be Careful: May Link You Out of Directory Structure • Case Sensitivity • Ownership Access • Server is a Process Started by a User. • Has the Permissions of the User Who Started It. • Default Documents • Allow Directory Browsing • Scripting • Who is Allowed to Run Scripts? • How are Scripts Identified?
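The warning above about symbolic links leading out of the directory structure can be checked by resolving links before comparing paths. A minimal sketch, assuming a POSIX-style file system; `is_inside_root` is a hypothetical helper, not a feature of any specific server.

```python
import os


def is_inside_root(doc_root, candidate):
    """Return True if candidate, after resolving symbolic links, lies under doc_root."""
    root = os.path.realpath(doc_root)
    target = os.path.realpath(candidate)
    # commonpath compares whole path components, so "/srv/www2" is not "inside" "/srv/www".
    return os.path.commonpath([root, target]) == root
```

A server that follows links would call a check like this on every mapped path and refuse requests for which it returns False.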
Web Server File Access Control / Security • Directory • O/S Level Security • IP, Domain Level Security • Spoofing • Directory Access • .htaccess • Microsoft Front-Page Extensions • Encryption • S-HTTP • Web Protocols Only • SSL • TCP/IP Level • V1.0 – V2.X : Security Holes Found, Fixed • V3.0 Is Current • Uses Port 443 • Microsoft PCT • Response to Holes in SSL 2.0 • Now Use SSL
Server Administration • Need Sysadmin and O/S Expertise • Lots of “Holes” and Gotchas Whenever Scripts are Allowed • FTP • Who is Allowed to Change Documents? • Who is Allowed to Change Server Configuration? • How do They Get Access? • Direct Access • Remote Access (e.g. FTP) • Log Files • Accessibility • Directory Structure • Management
HTTP • The Protocol For Requesting and Delivering Web Pages • Not Restricted to Returning HTML Files • Client Server Model • Request / Response • TCP/IP Protocol Using Port 80 • Supports Other Ports, Can Be Run Over Other Protocols • “Replaced” FTP as the Primary Method For Internet File Transfer • Stateless • Uses MIME Format to Encapsulate Data • Message Structure Similar to SMTP Mail Messages • Message Header (metadata) • Message Body (data) • Separated From Header by a Blank Line • Browser Only Displays Body, Not Header • No Restrictions on Message Size / Format (as with SMTP)
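The header/body structure above (metadata, a blank line, then data) can be illustrated by splitting a raw message by hand. This is a sketch for teaching purposes; `split_message` is a hypothetical helper and the sample response is invented.

```python
def split_message(raw):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the header from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Invented sample response for illustration.
SAMPLE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>Hello</p>\n"
)
start_line, headers, body = split_message(SAMPLE)
```

The browser displays only `body`; `start_line` and `headers` drive its behavior but are never shown to the user.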
HTTP Versions • HTTP 1.0 - Commonly Used Version • HTTP 1.1 • Formalizes Many Extensions to Version 1.0 • Supports Persistent Connections • Supports Compression/Decompression • Supports Virtual Hosting • Single IP Address Can Serve Multiple Host Names (via the Host Header) • Supports Multiple Languages • Supports Byte Range Transfers • Useful For Re-Sending Interrupted Data Transfers • Similar to Process Used By XMODEM, etc.
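Byte range transfers work by having the client name the bytes it still needs in a `Range` header. The sketch below interprets a single-range header value against a file of known size; `parse_byte_range` is a hypothetical helper, and multi-range requests are deliberately left out.

```python
def parse_byte_range(header_value, total_size):
    """Return the inclusive (start, end) byte offsets requested by a Range header value."""
    unit, _, spec = header_value.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is handled in this sketch")
    first, _, last = spec.strip().partition("-")
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the file.
        start = max(total_size - int(last), 0)
        end = total_size - 1
    else:
        start = int(first)
        end = min(int(last), total_size - 1) if last else total_size - 1
    if start > end:
        raise ValueError("range not satisfiable")
    return start, end
```

A client that received the first 500 bytes of a 1000-byte file before the connection dropped would send `Range: bytes=500-` and receive only the remaining half.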
HTTP Overview • [Diagram: Client (Browser) Sends an HTTP Request to the Web Server; the Web Server Returns an HTTP Response Containing HTML, Taken Either From the File System or From a CGI Server Application]
HTTP Commands • Simple Structure • Main Methods • GET <URI> HTTP/1.0 • Request the File Specified By the URL • URI is URL Without Protocol/Port • HEAD • Request the HTTP Header Information Only • Don’t Return the File Itself • POST • Sends Data to The Server • Typically Data From a Form • Defined, But Not Widely Implemented • PUT • DELETE • LINK • UNLINK
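The request line above can be built from a full URL by stripping the protocol, host, and port. A minimal sketch; `build_request` is a hypothetical helper and `www.example.com` is a placeholder host. The `Host` header is included for completeness even though HTTP/1.0 does not require it.

```python
from urllib.parse import urlsplit


def build_request(method, url, version="HTTP/1.0"):
    """Build the text of a simple, body-less HTTP request for the given URL."""
    parts = urlsplit(url)
    # The request line carries only the path (and query), not the protocol, host, or port.
    uri = parts.path or "/"
    if parts.query:
        uri += "?" + parts.query
    lines = [f"{method} {uri} {version}", f"Host: {parts.netloc}", "", ""]
    return "\r\n".join(lines)


request_text = build_request("GET", "http://www.example.com:8080/index.htm")
```

Swapping `"GET"` for `"HEAD"` produces a request that asks for the same header information without the file itself.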
Common HTTP Header Fields • Additional “Parameters” to the HTTP Commands • Used in HTTP Requests: • Accept • Lists the MIME Types That Client Can Accept • E.g. Accept: text/plain, text/html or Accept: */* • Accept-Charset • Lists Character Sets That Client Can Accept • ASCII, ISO-8859-1 Are Assumed • Accept-Encoding • Accept-Language • Authorization • Basic – UserName:Password (Base64 Encoding) • Cookie • From • E-mail Address of Requesting User • Not Typically Used For Privacy Reasons • Primarily Used By Automated Clients (e.g. Bots)
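The Basic form of the Authorization header listed above is just `UserName:Password` run through Base64. The sketch below shows the encoding and its reversal; the helper names are hypothetical and the credentials are the well-known sample pair used in HTTP documentation.

```python
import base64


def basic_auth_header(user, password):
    """Return the value of an Authorization header using the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def decode_basic_auth(header_value):
    """Recover (user, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header_value = basic_auth_header("Aladdin", "open sesame")
```

Because decoding is this easy, Base64 here is an encoding and not encryption; anyone who can read the header can read the password.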
Common HTTP Header Fields (2) • Host • Virtual Host – One Server Handles Multiple Sites • If-Modified-Since • Only Return Data if it Has Been Modified Since This Date • Pragma • General Purpose For “Additional” Headers Not in Standard • Referer (Spelled This Way in the HTTP Specification) • The URL That Referred the Client to This URL • User-Agent • Name/Version of the HTTP Client • Used in HTTP Responses: • Allow • Lists the Available Commands Supported by Server • Content-Encoding • Allows for Passing Data in Compressed Formats • Content-Language • Describes the Natural Language of the Intended Audience
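The If-Modified-Since behavior above amounts to a date comparison on the server. This sketch shows one plausible way to make that decision, using the standard library to parse the HTTP date; `conditional_status` is a hypothetical helper, and it returns only the status code (200 to send the file, 304 to tell the client its copy is still good).

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 200 if the file should be sent, or 304 if the client's copy is current."""
    if if_modified_since is None:
        return 200  # Unconditional request: always send the file.
    since = parsedate_to_datetime(if_modified_since)
    # HTTP dates have one-second resolution, so ignore microseconds.
    return 200 if last_modified.replace(microsecond=0) > since else 304


# Invented modification time for illustration.
file_time = datetime(1998, 6, 10, 14, 23, 34, tzinfo=timezone.utc)
```

A browser revalidating a cached page sends the date it stored from an earlier response; a 304 answer carries no body, which saves the transfer.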
Common HTTP Header Fields (3) • Content-Length • Size of the Message Body • Content-Type • The MIME Type For the Data • Date • Expires • HTTP Clients Should Not Cache Data After This Date • Last-Modified • Location • Used For Redirection • MIME-Version • Pragma • E.g. no-cache • Retry-After • When Server is Unavailable. Info On When to Try Back • Server • Name/Version of the HTTP Server
Common HTTP Header Fields (4) • Title • Descriptive Title of the File • WWW-Authenticate • When Authorization Denied, Tells Client Which Methods of Authentication are Supported • HTTP Status Codes • Returned By the Server In First Line of Response • Informational (100-199) • Successful (200-299) • Redirection (300-399) • Location in HTTP Header Specifies Redirection • Client Error (400-499) • Server Error (500-599)
Common Status Values • 200 - OK • 201 - Created (POST Request Was Fulfilled) • 204 - No Content (OK. Nothing For Client to Display) • 300 - Multiple Choices • Requested Resource Available From Multiple Locations. • List of Locations Returned in the Response. • 301 - Moved Permanently • 302 - Moved Temporarily • 304 - Not Modified • Document Hasn’t Been Modified Since If-Modified-Since Date • 400 - Bad Request • 401 - Unauthorized • 403 - Forbidden • 404 - Not Found • 500 - Internal Server Error • 501 - Not Implemented (Server Does Not Support This Request) • 502 - Bad Gateway (Invalid Response From Server) • 503 - Service Unavailable
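The five status ranges can be derived from the first digit of the code, and Python's standard library already carries the reason phrases. A short sketch; `status_class` is a hypothetical helper whose category names follow the slide wording.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Map a numeric status code to its range category via the hundreds digit."""
    return STATUS_CLASSES.get(code // 100, "Unknown")


def reason_phrase(code):
    """Look up the standard reason phrase for a status code."""
    return HTTPStatus(code).phrase
```

Log analysis tools often group requests this way, for example counting everything in the 400 range to spot broken links.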
Cookies • Cookies Are Name Value Pairs • Stored by the Client • Passed in the HTTP Header • Cookies Have Associated Expiration • Session (Default) • Date / Time • Associated With a URL Path, Not a Page! • Allows Passing Parameters Between Web Pages • Thus Cookies are Used to Provide State Information to a Stateless Protocol
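The name/value pairs and the path association above can be seen with the standard `http.cookies` module. This is a sketch of both directions: the server building a Set-Cookie value and later reading the Cookie header the client sends back. The helper names and cookie values are invented for illustration.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, path="/"):
    """Build the value of a Set-Cookie response header tied to a URL path."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path
    # No expiration is set, so this is a session cookie (the default).
    return cookie[name].OutputString()


def parse_cookie_header(header_value):
    """Turn a Cookie request header value into a dict of name/value pairs."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}
```

Every later request under the cookie's path carries the pair back to the server, which is how state survives across otherwise independent requests.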
Web Server HTTP Functionality • Content Negotiation • Choose From Several Different Formats Based on Request • Language Negotiation • Choose From Versions of Same Document Based on Request • Support for HTTP-Put, HTTP-Delete • Keep-Alive • As-Is • Server Doesn’t Add HTTP Headers • Allows You to Create Specific Behavior • Redirect to Another Site • Never Saved in Browser’s Cache
Some Definitions • Hits • Each HTTP Request is a Hit • Accessing a Web Page May Result in Multiple Hits • E.g. Each Graphic is a Hit • Page Views • Accessing a Single Web Page is a Page View • E.g. Typing in a URL or Clicking on a Link • Visits • A Single Client’s Visit to Your Entire Site (Session) • May Include Multiple Page Views • What Constitutes a Second Visit From the Same Client? • Why is This Important? • Terms are Sometimes Used Interchangeably and Improperly • Compare Apples to Apples • Important for Commercial Web Sites • Advertising is Based on Site Access • Typically Sold on Page View Basis
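The difference between hits and page views can be made concrete by counting both from the same list of requested paths. The rule used here (paths with no extension, `.htm`, or `.html` count as pages) is an assumption for the sketch; real analysis tools use their own, more elaborate rules.

```python
import posixpath

# Assumption for this sketch: these extensions identify a page rather than an embedded file.
PAGE_EXTENSIONS = {"", ".htm", ".html"}


def count_hits_and_page_views(paths):
    """Return (hits, page_views) for a list of requested URL paths."""
    hits = len(paths)  # Every HTTP request is a hit.
    page_views = sum(
        1 for path in paths
        if posixpath.splitext(path)[1].lower() in PAGE_EXTENSIONS
    )
    return hits, page_views


# One page with two graphics, then a second page: four hits, two page views.
requests = ["/index.htm", "/logo.gif", "/photo.jpg", "/about.html"]
```

The same traffic therefore looks twice as large when reported in hits, which is why the two terms should not be used interchangeably.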
Server Log Files • Many Variations to Web Server Log File Formats • Four Log Files • Access (Transfer) Log • Each Hit is Recorded • User, Date/Time, HTTP Request, etc. • Error Log • Date/Time, Error • Referrer Log • Referring Page, Destination Page • Agent (User) Log • Client’s Browser • Clearly a Need for Standardization • Linking the Four Log Files Together
Common Log Format • Host • IP Address (or Hostname) of Client • Some Servers Perform Lookup of IP Address • RFC931 • Remote Log Name of the User, From an identd (RFC 931) Lookup on the Client • Seldom Used. • Authuser • HTTP Request: Authorization • UserName if Username Authorization is Required • Time Stamp • HTTP Response: Date • E.g. [10/Jun/1998:14:23:34 -0700] • Request • The Actual HTTP Request • E.g. GET /index.htm HTTP/1.1
Common Log Format (2) • Status • The HTTP Response Status Code • Transfer Volume • HTTP Response: Content-Length
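The seven Common Log Format fields can be pulled apart with a regular expression. The sketch below is one way to do it, not a reference parser; the sample line is invented (reusing the time stamp and request from the slides), and `parse_clf` is a hypothetical helper. Month names are parsed assuming an English locale.

```python
import re
from datetime import datetime

CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Parse one Common Log Format line into a dict of its seven fields."""
    match = CLF_PATTERN.match(line)
    if match is None:
        raise ValueError("not a Common Log Format line")
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # A "-" in the size field means no body was transferred.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


SAMPLE_LINE = '192.0.2.10 - - [10/Jun/1998:14:23:34 -0700] "GET /index.htm HTTP/1.1" 200 2326'
entry = parse_clf(SAMPLE_LINE)
```

The two `-` placeholders in the sample are the RFC931 and Authuser fields, which are empty for most anonymous requests.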
Extended Log File Format • Seven Common Log Format Fields Plus • Referrer • HTTP Request: Referrer • User Agent • HTTP Request: User-Agent • Identifies Browser • Other Common Fields • Cookies • Can Help Identify Users
Issues • Client vs. User • Typically Don’t Have User Level Information • Only Record IP Address of Computer Used For Access • If Fixed IP Address For a Single User’s Machine • This Can Identify the User • Dynamically Assigned IP Addresses • Identifies the Overall Domain (e.g. AOL.com) • Proxy Servers • All Clients Appear to Have the IP Address of the Proxy Server • Multiple “Sessions” at Same Time • Impossible to Have Truly Accurate Information • Log File Analysis Software Has Algorithms to Identify Page Views, Visits • Client Level Caching Affects Logs • “ISP” Level Caching Affects Logs • E.g. AOL Maintains a Cache • No Requirement for Clients, ISPs to Follow Expiration Info
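One common heuristic for identifying visits is an inactivity timeout: requests from the same client address belong to one visit until the client is silent for longer than some cutoff. The sketch below assumes a 30-minute cutoff and treats the IP address as the client, which inherits every limitation listed above (proxies, dynamic addresses, caching). `count_visits` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def count_visits(requests, timeout=timedelta(minutes=30)):
    """Count visits in an iterable of (client_ip, timestamp) pairs using an idle timeout."""
    last_seen = {}
    visits = 0
    for client, when in sorted(requests, key=lambda item: item[1]):
        previous = last_seen.get(client)
        # A first request, or one after a long silence, starts a new visit.
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[client] = when
    return visits


# Invented sample: client A returns after a 50-minute gap, so A counts twice.
sample = [
    ("192.0.2.10", datetime(1998, 6, 10, 10, 0)),
    ("192.0.2.10", datetime(1998, 6, 10, 10, 10)),
    ("192.0.2.20", datetime(1998, 6, 10, 10, 5)),
    ("192.0.2.10", datetime(1998, 6, 10, 11, 0)),
]
```

Changing the timeout changes the visit count for the same log, which is one reason figures from different analysis tools are hard to compare.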
Log File Maintenance on Server • Log Files Grow Rapidly • Log Files Compress Very Nicely • Server Configurable • Generate Daily/Weekly/Monthly Logs • Maintenance Scripts to Clean Up Log Files • Compress • Archive • Cycle • E.g. Maintain Current Month’s Files
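A maintenance script for the compress step can be very small. The sketch below gzips a finished log file and removes the original; `compress_log` is a hypothetical helper, and a real script would also handle archiving and cycling on whatever schedule the site uses.

```python
import gzip
import os
import shutil


def compress_log(path):
    """Compress a closed log file to path + '.gz' and delete the original."""
    gz_path = path + ".gz"
    with open(path, "rb") as source, gzip.open(gz_path, "wb") as target:
        shutil.copyfileobj(source, target)
    os.remove(path)
    return gz_path
```

Log lines repeat the same hosts, paths, and status codes over and over, which is why they compress so well.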
Log File Analysis • Big Business • Bread and Butter of Sites Driven By Advertising Revenue • Evaluation Factors • Log File Formats Supported • Ability to Link Multiple Logs • How Log Files are Accessed (e.g. via FTP) • Display Methodology • E.g. Available Via Web Pages • Lookup Capabilities • E.g. Map User-Agent to Browser • E.g. Resolve IP Addresses to Domains, Regions • Level of Analysis • E.g. Calculating Visits, Return Visitors • Configurability • Drill-Down Capabilities • Enterprise Capabilities • Ability to Manage Multiple Sites
Log File Analysis Options • Important to Understand the Core Log Files • Log File Analysis Programs Make Some Assumptions • Freeware • Commercial • Service Bureaus
Resources • HTTP • Server Comparison • http://webcompare.internet.com/chart.htm • Apache Server • www.apache.org • Website Server • http://website.ora.com • Microsoft IIS • http://www.microsoft.com/NTWorkstation/downloads/Recommended/ServicePacks/NT4OptPk/Default.asp