The Elbert HTTP Server Attempting to implement RFC 2616 to the letter (or as close as I can) By: Shawn M. Jones
Language Used • Python was chosen because it is a relatively easy language to use for parsing strings • Python also has an extensive standard library, meaning most of this project could be completed without third-party libraries • Python also has an extensive set of third-party libraries, meaning that much of the functionality I could need exists out there… somewhere.
Overall Architecture • Layers below can call layers above • Items at the same layer can call each other • Utility classes not shown
Method Handler Inheritance Additional method handlers will be added in future assignments
Main Configuration File • XML was chosen because of the need to define multiple “stanzas” of the same type (e.g. for multiple virtual URIs) that contain multiple data elements (e.g. mapping virtual path to real path) • As the configuration gets more complex, this should pay off over simple definition strings • INI files would have been a good competitor, but often have to number elements in order to produce the “stanza effect”, making them error prone to hand editing
Demo • Yes, Virginia, there is actually a web server… http://mln-web.cs.odu.edu:7205/a1-test/2/index.html http://mln-web.cs.odu.edu:7205/.well-known/access.log http://mln-web.cs.odu.edu:7205/doesntexist
Testing Framework • Two types of tests have been developed for Elbert: • Unit tests – testing a single class or namespace of functions for its subset of functionality • Integration tests – spin up the web server, sending the socket the input data, verifying the output, and checking the access log to ensure that the correct data was written to it in Common Log Format
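The access-log check in an integration test could look something like the sketch below. It is not Elbert's actual test code: the names `CLF_PATTERN` and `parse_clf_line` and the sample log line are assumptions made up for illustration, and only the Common Log Format field order (host, ident, user, date, request, status, size) is taken as given.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf_line(line):
    """Return a dict of CLF fields, or None if the line is not valid CLF."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # A "-" in the size field means no entity bytes were returned.
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Hypothetical log line, as an integration test might read it back.
sample = ('127.0.0.1 - - [10/Oct/2012:13:55:36 -0400] '
          '"GET /a1-test/2/index.html HTTP/1.1" 200 2326')
entry = parse_clf_line(sample)
```

An integration test could then assert on `entry["status"]` and `entry["size"]` after sending a request to the running server.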
Work still to be done before the due date • URI parsing using the third-party pyparsing library; it should return a 400 Bad Request if the URI is not in an RFC 2396-compliant format • Ensure all rules for the Host: header vs. URI authority are implemented from RFC 2616. • Rigorously review RFC 2616 again and fill in missing integration tests, then fix broken or unimplemented functionality • Garbage from the client (thank you Chrome!) should be 400 Bad Request, but is 500 right now. (need to modify state diagram) • Remove entity bodies for non-200 status codes • this was inserted after reading the RFC, but before understanding that the assignment didn’t call for it
Questions for Dr. Nelson Since we are sticking to only what is in assignment 1: • At what level in the “document home” directory do we untar the test data? Should the file a1-test/2/0.jpeg map to URI “/a1-test/2/0.jpeg” or “/2/0.jpeg”? • Should Elbert return a 404 whenever someone submits a GET on a directory path? Without directory listings, it’s not like there’s a resource on the other side. • Should Elbert just return the headers for status codes 400, 403, 404, 500, 501, and 505? • Should their Content-Length be 0? • What should Elbert return if the Host: header does not match (or resolve to) the IP address of the web server machine? • Does the order of the headers matter? I noticed Apache lists Connection: close in the middle. • Since we are not implementing chunking yet: for TRACE, should Elbert just include the Content-Length in the header, containing the size of the content that was sent? • Should we just ignore headers we don’t understand or issue a 400 Bad Request?
GET /.well-known/access.log race condition? • There is a race condition in retrieving a current representation of the access log using GET. • Common Log Format has a size field containing the number of bytes from the returned entity. • In this case, the access log is the returned entity. • If the access log is retrieved before the entry of its GET operation is written, the entity returned is incomplete because it does not contain the final entry. • The access log cannot be retrieved after that entry is logged, because the entity's size needs to be measured to make the log entry, which is itself part of the entity we are retrieving! • Does it matter that the access log is missing the entry of its own GET operation?
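The ordering problem can be made concrete with a small sketch. It takes the "retrieve first, log afterwards" option, so the returned entity never contains its own entry. The function `serve_access_log`, the in-memory list standing in for the log file, and the fixed client and timestamp values are all assumptions for illustration, not Elbert's code.

```python
def serve_access_log(log_lines, client="127.0.0.1",
                     timestamp="10/Oct/2012:13:55:36 -0400"):
    """Return the access log as an entity, then log the GET that fetched it.

    log_lines is a list of already-written log entries (a stand-in for
    the real log file).
    """
    # 1. Snapshot the log; this is the entity sent to the client.
    entity = "".join(log_lines).encode("utf-8")

    # 2. Only now is the size known, so only now can the entry be built.
    entry = '%s - - [%s] "GET /.well-known/access.log HTTP/1.1" 200 %d\n' % (
        client, timestamp, len(entity))

    # 3. Append the entry; the entity returned above cannot contain it.
    log_lines.append(entry)
    return entity
```

Under this ordering the size field is accurate for the bytes actually sent, at the cost of the entity missing its own entry, which is exactly the question the slide raises.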
HTTP Socket Handler • Receives requests on the socket • Sends responses back to the socket • Catches HTTPError exceptions for status codes 400, 403, 404, 501, 505, etc. and sends those in a response • Keeps track of Connection: identifiers (only close for now) • All other exceptions are caught and sent back as a status code 500 response • Logs responses to access log
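The two-level exception handling described here might be sketched as follows. This is a minimal illustration under stated assumptions: the slides name an `HTTPError` exception but do not show its definition, so the class below, along with `build_response` and `handle_request`, is hypothetical.

```python
class HTTPError(Exception):
    """Hypothetical stand-in for Elbert's HTTPError; carries a status code."""

    def __init__(self, status_code, reason):
        super().__init__(reason)
        self.status_code = status_code
        self.reason = reason


def build_response(status_code, reason):
    """Build a bare status-line response; only 'close' is supported for now."""
    head = "HTTP/1.1 %d %s\r\nConnection: close\r\n\r\n" % (status_code, reason)
    return head.encode("ascii")


def handle_request(raw_request, process):
    """Run process(raw_request) and map failures to HTTP responses."""
    try:
        return process(raw_request)
    except HTTPError as err:
        # Expected protocol-level failures: 400, 403, 404, 501, 505, ...
        return build_response(err.status_code, err.reason)
    except Exception:
        # Anything unexpected becomes a 500.
        return build_response(500, "Internal Server Error")
```

Keeping the catch-all in one place means the lower layers can simply raise and never need to know how a response is serialized.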
HTTP Method Factory • This factory controls which methods are supported by the server. • An object subclassed from HTTPMethodHandler is returned to the caller. This object MUST implement the execute() method for HTTPMethodHandler. • If an unsupported method is encountered, this factory raises an HTTPError exception with a status code of 501.
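A factory of this shape could be sketched as below. Only `HTTPMethodHandler`, `execute()`, `HTTPError`, and the 501 status come from the slides; the subclasses `GetHandler` and `HeadHandler`, the `SUPPORTED_METHODS` table, and the function name `create_method_handler` are assumptions chosen for illustration.

```python
class HTTPError(Exception):
    """Hypothetical stand-in for Elbert's HTTPError."""

    def __init__(self, status_code, reason):
        super().__init__(reason)
        self.status_code = status_code
        self.reason = reason


class HTTPMethodHandler:
    """Base class; every concrete handler must override execute()."""

    def execute(self, request):
        raise NotImplementedError("subclasses must implement execute()")


class GetHandler(HTTPMethodHandler):
    def execute(self, request):
        return "GET handled: " + request


class HeadHandler(HTTPMethodHandler):
    def execute(self, request):
        return "HEAD handled: " + request


# The table of supported methods is the single place to enable a new one.
SUPPORTED_METHODS = {"GET": GetHandler, "HEAD": HeadHandler}


def create_method_handler(method):
    """Return a handler for the method, or raise HTTPError 501."""
    # HTTP method names are case-sensitive, so no normalization is done.
    handler_class = SUPPORTED_METHODS.get(method)
    if handler_class is None:
        raise HTTPError(501, "Not Implemented")
    return handler_class()
```

With this layout, adding a method handler in a future assignment would mean writing one subclass and adding one table entry.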
HTTP Method Handler • If the HTTP version is not 1.1, then this class raises an HTTPError exception with a status code of 505. • If the request headers lack a “Host:” entry, or if the URI doesn’t validate based on RFC 2396, this class raises an HTTPError exception with a status code of 400. • The findRepresentation method exists for those HTTP methods that return a representation. • If findRepresentation cannot find a representation, it raises an HTTPError exception with a status code of 404. • If findRepresentation finds a representation, but the filesystem does not permit access, it raises an HTTPError exception with a status code of 403.
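The checks listed above could be sketched roughly like this. The function names `check_request` and `find_representation`, their parameters, and the `HTTPError` definition are assumptions for illustration; RFC 2396 URI validation and protection against `..` path traversal are deliberately left out to keep the sketch short.

```python
import os


class HTTPError(Exception):
    """Hypothetical stand-in for Elbert's HTTPError."""

    def __init__(self, status_code, reason):
        super().__init__(reason)
        self.status_code = status_code
        self.reason = reason


def check_request(version, headers):
    """Raise 505 for a non-1.1 version and 400 for a missing Host header."""
    if version != "HTTP/1.1":
        raise HTTPError(505, "HTTP Version Not Supported")
    # Header field names are case-insensitive.
    if "host" not in {name.lower() for name in headers}:
        raise HTTPError(400, "Bad Request")


def find_representation(document_home, uri_path):
    """Map a URI path to a readable file, or raise 404 / 403."""
    real_path = os.path.join(document_home, uri_path.lstrip("/"))
    if not os.path.isfile(real_path):
        raise HTTPError(404, "Not Found")
    if not os.access(real_path, os.R_OK):
        raise HTTPError(403, "Forbidden")
    return real_path
```

Note that in this sketch a directory path yields 404, which is one possible answer to the directory question raised earlier, not a settled decision.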
Example Main Configuration File
<config>
  <!-- software versioning information -->
  <softwareName>Elbert HTTP Server</softwareName>
  <softwareVersion>Assignment 1</softwareVersion>
  <!-- port to listen on -->
  <port>7205</port>
  <!-- number of processes to spin up -->
  <numWorkers>10</numWorkers>
  <!-- logs -->
  <standardLog>../log/access.log</standardLog>
  <debugLog>../log/debug.log</debugLog>
  <!-- additional configuration files -->
  <statusFile>../conf/statuses</statusFile>
  <mimeTypes>../conf/mime.types</mimeTypes>
  <!-- directory listing -->
  <directoryListingTemplate>../conf/pages/directoryListing.html</directoryListingTemplate>
  <directoryIconURI>/icons/folder.png</directoryIconURI>
  <fileIconURI>/icons/file.png</fileIconURI>
  <!-- document home directory -->
  <documentHome>../htdocs</documentHome>
  <!-- virtual URIs -->
  <virtualURI>
    <virtualPath>/.well-known/</virtualPath>
    <realPath>../log/</realPath>
  </virtualURI>
</config>
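Reading a configuration file like this needs nothing beyond the standard library. The sketch below uses `xml.etree.ElementTree` on a trimmed copy of the example; the function name `load_config` and the shape of the returned dictionary are assumptions, and Elbert's real loader may differ.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the example configuration, for illustration only.
SAMPLE_CONFIG = """<config>
  <port>7205</port>
  <numWorkers>10</numWorkers>
  <documentHome>../htdocs</documentHome>
  <virtualURI>
    <virtualPath>/.well-known/</virtualPath>
    <realPath>../log/</realPath>
  </virtualURI>
</config>"""


def load_config(xml_text):
    """Parse the main configuration into a plain dictionary."""
    root = ET.fromstring(xml_text)
    config = {
        "port": int(root.findtext("port")),
        "numWorkers": int(root.findtext("numWorkers")),
        "documentHome": root.findtext("documentHome"),
        "virtualURIs": {},
    }
    # Each <virtualURI> stanza maps one virtual path to one real path;
    # repeated stanzas need no numbering, unlike an INI file.
    for stanza in root.findall("virtualURI"):
        virtual_path = stanza.findtext("virtualPath")
        config["virtualURIs"][virtual_path] = stanza.findtext("realPath")
    return config


config = load_config(SAMPLE_CONFIG)
```

Adding a second virtual URI would only require another `<virtualURI>` stanza in the file; the loop picks it up without code changes.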
Status Configuration File • Statuses could have been handled in the XML file, but instead it seemed easier to give them their own simpler configuration of the type: <code>:<message>:[<header template>]:[<status entity>] • This could be integrated back into the XML file in the future if more fields are needed
Example Status Configuration File
200:OK::
400:Bad Request:../conf/headers/400.hdr:../conf/pages/400.html
403:Forbidden:../conf/headers/403.hdr:../conf/pages/403.html
404:Not Found:../conf/headers/404.hdr:../conf/pages/404.html
500:Internal Server Error:../conf/headers/500.hdr:../conf/pages/500.html
501:Not Implemented:../conf/headers/501.hdr:../conf/pages/501.html
505:HTTP Version Not Supported:../conf/headers/505.hdr:../conf/pages/505.html
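The colon-separated status format is simple enough to parse with `str.split`. The sketch below is an assumption about how such a parser might look, not Elbert's own; the names `parse_status_line` and `parse_statuses` are hypothetical, and it assumes (as in the example) that paths contain no colons.

```python
def parse_status_line(line):
    """Parse '<code>:<message>:[<header template>]:[<status entity>]'."""
    code, message, header_template, entity = line.strip().split(":", 3)
    return int(code), {
        "message": message,
        # Empty optional fields become None.
        "header_template": header_template or None,
        "entity": entity or None,
    }


def parse_statuses(text):
    """Parse a whole status file into a dict keyed by status code."""
    statuses = {}
    for line in text.splitlines():
        if line.strip():
            code, details = parse_status_line(line)
            statuses[code] = details
    return statuses


statuses = parse_statuses(
    "200:OK::\n"
    "404:Not Found:../conf/headers/404.hdr:../conf/pages/404.html\n"
)
```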
Mime-type Configuration File • The mime.types configuration file was modeled after the same file included with most UNIX/Linux systems • It has the format: <mime-type id>\t[<extension>]* • The intention was to be able to refer to the /etc/mime.types file by the end of the class
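A parser for this format might look like the following sketch. The function name `parse_mime_types` is hypothetical; the handling of `#` comments and of types listed with no extensions is an assumption based on how /etc/mime.types files are commonly written, not something the slides specify.

```python
def parse_mime_types(text):
    """Build an extension -> MIME type mapping from mime.types content."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # First token is the MIME type; the rest (possibly none) are extensions.
        mime_type, *extensions = line.split()
        for extension in extensions:
            mapping[extension.lower()] = mime_type
    return mapping


mime_types = parse_mime_types(
    "# sample entries\n"
    "text/html\thtml htm\n"
    "image/jpeg\tjpeg jpg jpe\n"
    "application/octet-stream\n"
)
```

Splitting on any whitespace rather than on a single tab keeps the sketch tolerant of files that mix tabs and spaces.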
Testing Framework • One intention is that the testing framework will grow, allowing me to see when changes to one section of the code cause others to stop working • The plan is to write the tests first, then code, (aka Test-Driven-Development) but that doesn’t always pan out