A Director of distributed array of web servers
TECHNION, Department of Computer Science
The Computer Communication Lab (236340), Summer 2002
Submitted by: David Schwartz, Idan Zak, Yoav Helfman
Director 1.0
1. Introduction
1.1. General
• Our goal was to develop a Layer-5 director that switches itself into a Layer-4 director after making the "request routing" decision, which is based on the URL. It should then assign a new connection to the requesting client using NAPT (Network Address & Port Translation).
• The director runs on the Linux operating system.
2. General Layout
[Figure: network layout. A Client (CIP) connects over the Internet (WAN/LAN) to the Load Balancer/Director, a Linux box that holds the VIP; behind it are Real Server1 (RIP1), Real Server2 (RIP2) and Real Server3 (RIP3).]
CIP: Client IP Address & Port
VIP: Virtual IP Address & Port
RIP: Real Server IP Address & Port
• The Director is connected to two networks: the web server farm network and the network representing the outside world (the Internet).
General Layout cont.
• The Director reads HTTP requests (on port 80) from the global network adapter, processes them (using a hash function to find the server that holds the URL) and sends them to the selected web server through the local network adapter. At this stage the Director and NAPT are working together.
• From this moment on, NAPT "takes the initiative" and performs the translation between the actual physical server and the client. The Layer 5 level has finished its part at this stage.
3. Modules
The project consists of four main modules:
• Layer 5 URL Director
• Layer 3/4 NAPT
• NAPT timeout manager
• Debug
Modules cont.
• Layer 5 URL Director
  • Accept: Examines each "GET" request and makes a new routing decision based on a hashing function of the URL.
  • Connect: Initiates a new connection to the selected server.
• Layer 3/4 NAPT
  • Listener: Receives new packets and classifies them.
  • Connection Establisher: Manages the NAPT table entries.
  • NAPT: Redirects the packet flow to the real web server and back to the client.
Modules cont.
• NAPT timeout manager
  • Timeout: Terminates inactive client-server connections and removes the NAPT entries of finished connections.
• Debug
  • Print: Enables a real-time view of the Director tables.
Modules cont.
[Figure: packet flow through the Director. Labels: Incoming Packets, Packets Buffer, Header Content Extraction, NAPT Entries, Layer 5 URL Director, Packet Routing (Load Balancing), Forward Packet.]
Raw Sockets
• We used raw sockets to intercept the raw data directly from Layer 3.
• Raw sockets let a user-level program receive packets directly, without the packets passing through all the network layers on the way.
• The raw socket delivers a copy of each packet to us, while the real packet continues on its way to the TCP stack.
• Raw sockets intercept packets before TCP/IP processes them, so we can receive and send data even if TCP/IP is blocked.
• In terms of the data received, using raw sockets is identical to intercepting the packets at the kernel level.
Algorithms used
• Layer 5 Director - Accept
  • Initializes the Layer 4 threads and tables.
  • Calls accept() and waits for new connections.
  • When a new connection arrives, creates a new thread (Connect) that handles the client.
  • Loops back to accept().
Algorithms cont.
• Layer 5 Director - Connect
  • Reads the request from the client.
  • Calculates the length of the URL and uses it to decide which server to connect to.
  • Calls connect() with the address of the server containing the requested page.
  • Builds a semi-complete NAPT entry and inserts it into the semi-complete table.
  • The thread finishes and exits.
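The routing decision in the Connect step can be sketched as follows. This is a minimal illustration, not the project's code: the slides only say that the URL length is used, so taking the length modulo the number of servers is an assumption, and the names parse_request_url, choose_server and SERVERS (with its addresses) are hypothetical.

```python
def parse_request_url(request: bytes) -> str:
    """Return the URL from the request line of an HTTP GET request."""
    request_line = request.split(b"\r\n", 1)[0].decode("latin-1")
    parts = request_line.split()
    if len(parts) < 2 or parts[0] != "GET":
        raise ValueError("not a GET request")
    return parts[1]


def choose_server(url: str, servers: list) -> tuple:
    """Pick a real server from the URL length.

    Assumption: the index is len(url) modulo the number of servers.
    The slides do not state how the length is mapped to a server.
    """
    return servers[len(url) % len(servers)]


# Hypothetical real-server addresses (RIP1..RIP3).
SERVERS = [("10.0.0.1", 80), ("10.0.0.2", 80), ("10.0.0.3", 80)]

url = parse_request_url(b"GET /index.html HTTP/1.0\r\nHost: example\r\n\r\n")
server = choose_server(url, SERVERS)  # address to pass to connect()
```

Because the mapping depends only on the URL, repeated requests for the same page always reach the same server, which is what lets each server hold a fixed subset of the pages.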
Algorithms cont.
• Layer 3/4 Director - Listener
  • Creates a raw socket and calls recv() on it.
  • Categorizes each intercepted packet. Only TCP packets are inspected; the protocol field in the IP header tells us which packets are TCP.
    • SYN packet: discarded.
    • SYN-ACK packet: inserted into the SYN-ACK queue.
    • All other packets: inserted into the Data queue.
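The Listener's classification can be sketched as below, assuming each packet is a byte string that starts at the IPv4 header (link-layer header already removed). The function names classify, dispatch and make_packet are hypothetical, and make_packet exists only to build sample packets without needing a privileged raw socket.

```python
import socket
import struct
from collections import deque

IPPROTO_TCP = 6                # value of the IPv4 protocol field for TCP
TCP_SYN, TCP_ACK = 0x02, 0x10  # bits in the TCP flags byte


def classify(packet: bytes) -> str:
    """Return 'syn', 'syn-ack', 'data', or 'ignore' (not a usable TCP packet)."""
    if len(packet) < 20 or packet[9] != IPPROTO_TCP:
        return "ignore"
    ihl = (packet[0] & 0x0F) * 4   # IP header length in bytes
    if len(packet) < ihl + 20:
        return "ignore"
    flags = packet[ihl + 13]       # flags byte of the TCP header
    if flags & TCP_SYN:
        return "syn-ack" if flags & TCP_ACK else "syn"
    return "data"


def dispatch(packet: bytes, syn_ack_queue: deque, data_queue: deque) -> str:
    """Queue the packet as the slides describe; plain SYNs are discarded."""
    kind = classify(packet)
    if kind == "syn-ack":
        syn_ack_queue.appendleft(packet)
    elif kind == "data":
        data_queue.appendleft(packet)
    return kind


def make_packet(flags: int, proto: int = IPPROTO_TCP) -> bytes:
    """Build a minimal IPv4 + TCP header pair for demonstration."""
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, proto, 0,
                     socket.inet_aton("192.0.2.7"), socket.inet_aton("192.0.2.1"))
    tcp = struct.pack("!HHIIBBHHH", 40000, 80, 1, 0, 5 << 4, flags, 8192, 0, 0)
    return ip + tcp


syn_ack_queue, data_queue = deque(), deque()
for sample_flags in (TCP_SYN, TCP_SYN | TCP_ACK, TCP_ACK):
    dispatch(make_packet(sample_flags), syn_ack_queue, data_queue)
```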
Algorithms cont.
• Layer 3/4 Director - Connection Establisher
  • To extract the sequence numbers, we examine the SYN-ACK packets stored in the SYN-ACK queue.
  • Removes a packet from the queue and searches for the semi-complete entry that has the same port and IP.
  • Updates the sequence numbers according to the direction of the packet (client-server or server-client).
  • Inserts the sequence number into the ACK-3 queue (explained later).
  • If both directions are updated, the entry is removed from the semi-complete table and entered into the NAPT table.
  • Loops back to remove the next SYN-ACK packet.
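One pass of the Connection Establisher loop might look like the sketch below. The data model is an assumption made for illustration: each queued SYN-ACK is reduced to a (destination address, sequence number) pair, and the Entry fields client, local, director_seq and server_seq are hypothetical names. How the original handles a SYN-ACK with no matching entry is not stated; this sketch simply drops it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple

Addr = Tuple[str, int]  # (IP, port)


@dataclass
class Entry:
    client: Addr                        # client IP and port (CIP)
    local: Addr                         # director's own IP/port toward the server (hypothetical)
    director_seq: Optional[int] = None  # from the SYN-ACK sent to the client
    server_seq: Optional[int] = None    # from the SYN-ACK sent by the server


def establish_one(syn_ack_queue, semi_complete, napt_table, ack3_queue) -> bool:
    """Process one queued SYN-ACK; return False when the queue is empty."""
    if not syn_ack_queue:
        return False
    dst, seq = syn_ack_queue.pop()  # dequeue from the end
    entry = next((e for e in semi_complete if dst in (e.client, e.local)), None)
    if entry is None:
        return True                 # no matching entry: dropped in this sketch
    if dst == entry.client:         # direction: director -> client
        entry.director_seq = seq
    else:                           # direction: server -> director
        entry.server_seq = seq
    ack3_queue.appendleft(seq)
    if entry.director_seq is not None and entry.server_seq is not None:
        semi_complete.remove(entry)  # both directions known: promote the entry
        napt_table.append(entry)
    return True


semi_complete = [Entry(client=("192.0.2.7", 40000), local=("10.0.0.254", 50000))]
napt_table, ack3_queue, syn_ack_queue = [], deque(), deque()
syn_ack_queue.appendleft((("192.0.2.7", 40000), 1000))   # SYN-ACK to the client
syn_ack_queue.appendleft((("10.0.0.254", 50000), 2000))  # SYN-ACK from the server
while establish_one(syn_ack_queue, semi_complete, napt_table, ack3_queue):
    pass
```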
Algorithms cont.
• Layer 3/4 Director - NAPT
  • Removes a packet from the Data queue.
  • Checks whether the packet is the ACK packet from one of the handshakes (by comparing its sequence number to the sequence numbers stored in the ACK-3 queue).
  • Searches for an entry in the NAPT table that has the same port and IP.
  • If no entry is found, the packet is discarded.
  • If an entry is found, we fix the source and destination port and IP, the sequence numbers and the checksums.
  • We update the time field in the NAPT entry.
  • The packet is sent onwards (to the server or to the client).
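The "fix the sequence numbers and the checksums" step can be illustrated with the sketch below. It assumes that sequence numbers are shifted by a constant per-direction offset (the difference between the initial sequence numbers of the two connections), which the slides imply but do not spell out. The checksum is the standard Internet checksum computed over the TCP pseudo-header and segment. The function names are hypothetical.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                    # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                     # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum of a TCP segment whose checksum field is currently zero."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))  # zero, protocol, TCP length
    return internet_checksum(pseudo + segment)


def shift_seq(seq: int, delta: int) -> int:
    """Translate a sequence or ack number by delta, wrapping at 2**32."""
    return (seq + delta) % (1 << 32)
```

After the addresses, ports and sequence numbers of a packet are rewritten, the checksum field is zeroed, tcp_checksum is recomputed with the new addresses, and the result is written back into bytes 16 and 17 of the TCP header.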
Algorithms cont.
• Layer 3/4 Director - NAPT (continued)
  • If an entry has received an RST (from either direction), the entry is removed.
• NAPT timeout manager - Timeout
  • Every 10 seconds the thread wakes up and goes over all the entries in the NAPT and semi-complete tables.
  • If an entry has not been used for over 24 hours, it is removed from the tables.
  • If an entry has received both FINs (one from each direction) and at least 60 seconds have passed, the entry is removed.
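The timeout rules above can be sketched as a pure function plus a periodic loop. The Entry fields and function names are hypothetical, and measuring the 60-second FIN delay from the entry's time stamp is an assumption, since the slides do not say from which moment it is counted.

```python
import threading
import time
from dataclasses import dataclass

IDLE_LIMIT = 24 * 60 * 60  # seconds without use before an entry is dropped
FIN_LINGER = 60            # seconds to keep an entry after both FINs
SWEEP_INTERVAL = 10        # seconds between sweeps


@dataclass
class Entry:
    last_used: float               # time stamp, updated whenever a packet is forwarded
    fin_from_client: bool = False
    fin_from_server: bool = False


def expired(entry: Entry, now: float) -> bool:
    idle = now - entry.last_used
    if idle > IDLE_LIMIT:
        return True
    both_fins = entry.fin_from_client and entry.fin_from_server
    return both_fins and idle >= FIN_LINGER


def sweep(table: list, now: float) -> int:
    """Remove expired entries in place; return how many were removed."""
    kept = [e for e in table if not expired(e, now)]
    removed = len(table) - len(kept)
    table[:] = kept
    return removed


def run_timeout_manager(tables: list, stop: threading.Event) -> None:
    """Thread body: sweep every table each SWEEP_INTERVAL seconds until stopped."""
    while not stop.wait(SWEEP_INTERVAL):
        for table in tables:
            sweep(table, time.time())
```

Keeping the expiry test separate from the sleeping loop makes the rules easy to check with a fixed value of now. A real implementation would also need a lock around the tables, because the NAPT thread updates them concurrently.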
Algorithms cont.
• Debug - Print
  • At any time we can examine all the tables and queues by typing a number and pressing Enter. A dedicated thread waits for this input and prints the contents of the tables.
Tables and Queues
• NAPT and Semi-Complete tables
  • Each entry consists of:
    • Source and destination IP
    • Source and destination port
    • Client-director sequence and ack numbers
    • Director-server sequence and ack numbers
    • Time stamp
    • Socket file descriptors (client-director and director-server)
    • Flags indicating whether we've received both FINs
Tables and Queues cont.
• Functions: the table is implemented as a queue.
  • Enqueue: adds a new entry at the head of the queue.
  • Dequeue: removes an entry from the end of the queue.
  • Find: finds an entry by the given source and destination port and IP.
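A table with these three operations can be sketched with a double-ended queue. The class name EntryQueue and the dictionary keys are hypothetical; entries are shown as plain dictionaries that hold only the lookup fields.

```python
from collections import deque


class EntryQueue:
    """Table implemented as a queue: enqueue at the head, dequeue from the end."""

    def __init__(self):
        self._items = deque()

    def enqueue(self, entry: dict) -> None:
        self._items.appendleft(entry)

    def dequeue(self):
        """Remove and return the oldest entry, or None if the queue is empty."""
        return self._items.pop() if self._items else None

    def find(self, src_ip, src_port, dst_ip, dst_port):
        """Linear search by source and destination IP and port."""
        key = (src_ip, src_port, dst_ip, dst_port)
        for entry in self._items:
            if (entry["src_ip"], entry["src_port"],
                    entry["dst_ip"], entry["dst_port"]) == key:
                return entry
        return None


table = EntryQueue()
table.enqueue({"src_ip": "192.0.2.7", "src_port": 40000,
               "dst_ip": "10.0.0.1", "dst_port": 80})
table.enqueue({"src_ip": "192.0.2.8", "src_port": 40001,
               "dst_ip": "10.0.0.2", "dst_port": 80})
hit = table.find("192.0.2.8", 40001, "10.0.0.2", 80)
```

Find is a linear scan, so lookup cost grows with the number of open connections; a hash table keyed on the same four fields would be the usual alternative.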
Tables and Queues cont.
• Data and SYN-ACK queues
  • These queues hold each packet as received from the raw socket, with the link-layer headers removed; only the IP and TCP layer headers are saved.
  • Functions:
    • Enqueue: adds a new entry at the head of the queue.
    • Dequeue: removes an entry from the end of the queue.
Tables and Queues cont.
• ACK-3 queue
  • This queue holds the sequence number as received in the SYN-ACK packet. It is used to identify the third packet of the handshake, so that it won't be passed on to the server.
  • Functions:
    • Enqueue: adds a new item at the head of the queue.
    • Remove: finds and removes an item from the queue, if it exists.
Tables and Queues cont.
• Address table
  • This table stores the addresses of the servers and the clients for use with the raw socket.
  • Each entry consists of the IP and port of the address, and a struct sockaddr_ll.
  • Functions:
    • Enqueue: adds a new address to the table.
    • Remove: finds and removes an address from the table.
    • Find: finds an address by the given port and IP.
Notes
• To prevent the kernel from automatically sending an ACK for every TCP packet received, we used the built-in Linux firewall:
  • After sending the SYN-ACK packet to the client, we insert a rule into the firewall that blocks all TCP traffic to this client (port and IP).
  • After calling connect(), we add a rule that blocks all TCP traffic to the server we just connected to (port and IP).
  • When the entry is removed from the NAPT table, the rule is removed from the firewall too.
• Although we are blocking all output traffic to some servers, we can still send raw data to those servers using the raw sockets.