Introduction to Content-aware Switch Presented by Li Zhao
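
Content-aware Switch (CS)
• Front-end of a web cluster
• Routes packets based on layer 5/7 (content) information
[Figure: a client request for www.yahoo.com crosses the Internet to the switch, which forwards it to an image server, an application server, or an HTML server. Packet layers shown: IP, TCP, APP. DATA. Example request: GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com…]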

Why use CS
• Servers can be specialized for certain types of request
• Content segregation
• Exploit locality
• Affinity-based routing
• Increases performance through an improved hit rate
• Partial replication of the server file set
• Partition the server’s file set over different nodes
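
A minimal sketch of how content segregation and affinity-based routing could fit together. The pool names, the classification rules, and the use of a CRC hash are assumptions made for illustration; the slides do not specify any of them.

```python
import zlib

# Hypothetical back-end pools, one per content type (content segregation).
POOLS = {
    "cgi": ["app-1", "app-2"],
    "image": ["img-1", "img-2"],
    "html": ["html-1", "html-2", "html-3"],
}

def content_class(path: str) -> str:
    """Classify a request path; these rules are illustrative only."""
    if path.startswith("/cgi-bin/"):
        return "cgi"
    if path.lower().endswith((".gif", ".jpg", ".jpeg", ".png")):
        return "image"
    return "html"

def pick_server(path: str) -> str:
    """Send the same path to the same node so repeated requests hit its cache."""
    pool = POOLS[content_class(path)]
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return pool[zlib.crc32(path.encode()) % len(pool)]
```

Because the choice depends only on the path, each node ends up serving a fixed slice of the file set, which is also the idea behind partitioning the file set over different nodes.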

Content-aware Switch Architecture
• Two-way architecture
  • Server returns the response to the switch
• One-way architecture
  • Server returns the response to the client
[Figure: client, switch, server]

Layer-7 Two-way Mechanisms
• TCP gateway
  • An application-level proxy running on the web switch mediates the communication between the client and the server
• TCP splicing
  • Reduces the overhead of the TCP gateway. Packet forwarding occurs at the network level, between the network interface driver and the TCP/IP stack, and is carried out directly by the OS
[Figure: user and kernel layers for each mechanism]
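
TCP Splicing
Parties: client, content switch, server
step1  client -> switch: SYN(CSEQ)
step2  switch -> client: SYN(DSEQ), ACK(CSEQ+1)
step3  client -> switch: DATA(CSEQ+1), ACK(DSEQ+1)
step4  switch -> server: SYN(CSEQ)
step5  server -> switch: SYN(SSEQ), ACK(CSEQ+1)
step6  switch -> server: DATA(CSEQ+1), ACK(SSEQ+1)
step7  server -> switch: DATA(SSEQ+1), ACK(CSEQ+lenR+1); switch -> client: DATA(DSEQ+1), ACK(CSEQ+lenR+1)
step8  client -> switch: ACK(DSEQ+lenD+1); switch -> server: ACK(SSEQ+lenD+1)
lenR: size of the HTTP request. lenD: size of the returned document.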
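
The sequence-number conversion implied by the splicing diagram can be sketched as follows. The switch reuses CSEQ toward the server, so only the server-side numbers (SSEQ) have to be mapped into the numbering the client saw from the switch (DSEQ). The class and method names are hypothetical, and a real splice would also have to patch checksums, which is not shown.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around

class Splice:
    """Sequence translation for one spliced connection (illustrative only)."""

    def __init__(self, dseq: int, sseq: int):
        # Distance between the switch's ISN (seen by the client)
        # and the server's ISN (seen by the switch).
        self.offset = (dseq - sseq) % MOD

    def server_to_client_seq(self, seq: int) -> int:
        """Rewrite the sequence number of a server packet for the client."""
        return (seq + self.offset) % MOD

    def client_to_server_ack(self, ack: int) -> int:
        """Rewrite the ACK number of a client packet for the server."""
        return (ack - self.offset) % MOD
```

With DSEQ = 5000 and SSEQ = 9000, the server's DATA(SSEQ+1) = 9001 leaves the switch as DATA(DSEQ+1) = 5001, and the client's final ACK(DSEQ+lenD+1) is mapped back to ACK(SSEQ+lenD+1).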
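
TCP Splicing w/ Pre-forked Connections
Parties: client, switch, server
step1  switch -> server: SYN(PSEQ)
step2  server -> switch: SYN(SSEQ), ACK(PSEQ+1)
step3  switch -> server: ACK(SSEQ+1)
step4  client -> switch: SYN(CSEQ)
step5  switch -> client: SYN(DSEQ), ACK(CSEQ+1)
step6  client -> switch: DATA(CSEQ+1), ACK(DSEQ+1)
step7  switch -> server: DATA(PSEQ+1), ACK(SSEQ+1)
step8  server -> switch: DATA(SSEQ+1), ACK(PSEQ+lenR+1); switch -> client: DATA(DSEQ+1), ACK(CSEQ+lenR+1)
step9  client -> switch: ACK(DSEQ+lenD+1); switch -> server: ACK(SSEQ+lenD+1)
lenR: size of the HTTP request. lenD: size of the returned document.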
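
With pre-forked connections the server-side handshake (steps 1 to 3) is done ahead of time, so the switch no longer controls which sequence number the server sees; both directions then need an offset. The sketch below is one possible way to keep that state. `PreForked`, `Binding`, and `Pool` are hypothetical names, not taken from the slides.

```python
from collections import deque
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits

@dataclass
class PreForked:
    server: str
    pseq: int  # switch's ISN on the pre-forked connection
    sseq: int  # server's ISN on the pre-forked connection

@dataclass
class Binding:
    server: str
    c2s: int  # added to client sequence numbers (CSEQ space -> PSEQ space)
    s2c: int  # added to server sequence numbers (SSEQ space -> DSEQ space)

class Pool:
    """Idle pre-forked connections, grouped by back-end server."""

    def __init__(self):
        self.idle = {}

    def add(self, conn: PreForked) -> None:
        self.idle.setdefault(conn.server, deque()).append(conn)

    def bind(self, server: str, cseq: int, dseq: int) -> Binding:
        # Raises KeyError or IndexError when no idle connection exists.
        conn = self.idle[server].popleft()
        return Binding(server, (conn.pseq - cseq) % MOD, (dseq - conn.sseq) % MOD)

def translate(seq: int, offset: int) -> int:
    return (seq + offset) % MOD
```

ACK fields travel in the opposite numbering space, so they would use the inverse offset: a client ACK is shifted by `-s2c` and a server ACK by `-c2s`.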

Pre-Allocate Server Scheme
Parties: client, content switch, pre-allocated server
step1  client -> switch -> server: SYN(CSEQ)
step2  server -> switch -> client: SYN(SSEQ), ACK(CSEQ+1)
step3  client -> switch -> server: DATA(CSEQ+1), ACK(SSEQ+1)
step4  server -> switch -> client: DATA(SSEQ+1), ACK(CSEQ+lenR+1)
step5  client -> switch -> server: ACK(SSEQ+lenD+1)
• Use a guessed routing decision based on IP/Port#/History
• Advantages:
  • Faster than TCP splicing
  • Reduces session processing overhead: no need to convert the server sequence #
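
Degenerated to TCP Splicing If Guess Wrong
Parties: client, content switch, pre-allocated server, right server
step1  client -> switch -> pre-allocated server: SYN(CSEQ)
step2  pre-allocated server -> switch -> client: SYN(SSEQ), ACK(CSEQ+1)
step3  client -> switch: DATA(CSEQ+1), ACK(SSEQ+1); switch -> pre-allocated server: FIN(CSEQ+1)
step4  switch -> right server: SYN(CSEQ)
step5  right server -> switch: SYN(RSEQ), ACK(CSEQ+1)
step6  switch -> right server: DATA(CSEQ+1), ACK(RSEQ+1)
step7  right server -> switch: DATA(RSEQ+1), ACK(CSEQ+lenR+1); switch -> client: DATA(SSEQ+1), ACK(CSEQ+lenR+1)
step8  client -> switch: ACK(SSEQ+lenD+1); switch -> right server: ACK(RSEQ+lenD+1)
• Sequence # conversion needed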
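
The guess-then-fall-back behaviour of the pre-allocate scheme might be sketched like this. The slides say the guess uses IP, port number, and history; this sketch assumes, for brevity, a history keyed by client IP only, and `GuessRouter` is a hypothetical name.

```python
MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits

class GuessRouter:
    """Guess a server before the request is seen, then check the guess."""

    def __init__(self, default_server: str):
        self.default = default_server
        self.history = {}  # client IP -> server that handled its last request

    def guess(self, client_ip: str) -> str:
        return self.history.get(client_ip, self.default)

    def resolve(self, client_ip: str, guessed: str, right: str,
                sseq: int, rseq: int):
        """Return (server, offset) once the content-based decision is known.

        offset is added to the right server's sequence numbers so the client
        keeps seeing the SSEQ numbering it already acknowledged.
        """
        self.history[client_ip] = right
        if guessed == right:
            return right, 0  # guess was correct: no conversion needed
        return right, (sseq - rseq) % MOD  # wrong guess: splice with conversion
```

A correct guess returns a zero offset, which is the "no need to convert" advantage; a wrong guess yields the SSEQ minus RSEQ offset that the degenerate case requires.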

Case Study
• Linux-based content-aware switch [Yang99]
• IBM Layer 5 [Pradhan00]

Results
• Overhead of the switch
• 89 usec, reduced with pre-forked connections
• CS vs. Layer 4 switch
• Affinity-based routing vs. WRR (weighted round-robin)
• Content segregation vs. WRR
• CGI: 27%
• Static: 36%

IBM Switch Architecture
• Switch core
• Port controller:
  • Identifies layer-5 packets and sends them to the CPU
  • Processes all other packets
• CPU: PowerPC 603e
  • Parses the HTTP request
  • URL-based routing
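
As a rough illustration of the split between port controller and CPU, the sketch below first checks whether a TCP payload looks like the start of an HTTP request, then parses it and picks a server by URL prefix. The method list, the route table, and the server names are assumptions; the slides do not describe how the IBM switch implements these steps.

```python
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ")

def is_layer5(payload: bytes) -> bool:
    """Classifier check: does this TCP payload start an HTTP request?"""
    return payload.startswith(HTTP_METHODS)

def parse_request(payload: bytes):
    """Return (method, url, host) from the request line and Host header."""
    head = payload.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    lines = head.split("\r\n")
    method, url, _version = lines[0].split(" ", 2)
    host = None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip()
    return method, url, host

# Hypothetical URL-prefix table; the longest matching prefix wins.
ROUTES = {
    "/cgi-bin/": "application-server",
    "/images/": "image-server",
    "/": "html-server",
}

def route(url: str) -> str:
    prefix = max((p for p in ROUTES if url.startswith(p)), key=len, default="/")
    return ROUTES[prefix]
```

For the example request from the first slide, `parse_request` yields the URL `/cgi-bin/form` and `route` sends it to the application server in this table.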

Flow Diagram on Layer 5 System
• Client ports vs. server ports
• Classifier: identifies packets

Results
• CS vs. Layer 4 switch
• Entire set of files is replicated
• Some servers share files via NFS
• Partitioned file set

Layer-7 One-way Mechanisms
• TCP handoff
  • The switch hands off the TCP connection endpoint to the server
• TCP connection hop
  • Software-based proprietary solution
  • Encapsulates the IP packet in an RPX packet and sends it to the server

TCP Handoff
Parties: client, content switch, server
step1  client -> switch: SYN(CSEQ)
step2  switch -> client: SYN(DSEQ), ACK(CSEQ+1)
step3  client -> switch: DATA(CSEQ+1), ACK(DSEQ+1)
step4  switch -> server: Migrate(Data, CSEQ, DSEQ)
step5  server -> client: DATA(DSEQ+1), ACK(CSEQ+lenR+1)
step6  client -> switch: ACK(DSEQ+lenD+1); switch -> server: ACK(DSEQ+lenD+1)
• Migrate the established TCP connection from the switch to the back-end server
• Create a TCP connection at the back end without going through the TCP three-way handshake
• Retrieve the state of an established connection and destroy the connection without going through the normal message handshake required to close a TCP connection
• Once the connection is handed off to the back-end server, the switch must forward packets from the client to the appropriate back-end server
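
The state carried by Migrate(Data, CSEQ, DSEQ) is enough for the back end to rebuild an established endpoint, which is why no sequence conversion appears in the handoff diagram. The following is only a sketch of that idea: `Migrate`, `Endpoint`, and `accept_handoff` are hypothetical names, and a real handoff lives inside the kernel TCP stack and carries more state (window sizes, options, and so on).

```python
from dataclasses import dataclass

MOD = 2 ** 32  # TCP sequence numbers wrap at 32 bits

@dataclass
class Migrate:
    """Hypothetical hand-off message mirroring Migrate(Data, CSEQ, DSEQ)."""
    data: bytes  # HTTP request already received by the switch
    cseq: int    # client's initial sequence number
    dseq: int    # switch's initial sequence number toward the client

@dataclass
class Endpoint:
    """Minimal TCP endpoint state rebuilt at the back end."""
    snd_nxt: int    # next sequence number the server will send
    rcv_nxt: int    # next sequence number expected from the client
    pending: bytes  # request data to hand to the web server

def accept_handoff(msg: Migrate) -> Endpoint:
    # The connection starts out established: no three-way handshake is run.
    return Endpoint(
        snd_nxt=(msg.dseq + 1) % MOD,
        rcv_nxt=(msg.cseq + 1 + len(msg.data)) % MOD,
        pending=msg.data,
    )
```

The resulting values line up with step5: the server's first segment is DATA(DSEQ+1) with ACK(CSEQ+lenR+1), where lenR is the length of the migrated request.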

References
• [Pradhan00] G. Apostolopoulos et al., Design, Implementation and Performance of a Content-Based Switch, Proceedings of IEEE INFOCOM 2000
• [Pai98] V. S. Pai et al., Locality-Aware Request Distribution in Cluster-based Network Servers, Proceedings of the 8th Conference on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, Oct. 1998
• [Aron00] Mohit Aron et al., Scalable Content-aware Request Distribution in Cluster-based Network Servers, Proceedings of the 2000 Annual Usenix Technical Conference, June 2000
• [Edward] C. Edward Chow, Introduction to content switch
• [Valeria01] Valeria Cardellini et al., The State of the Art in Locally Distributed Web-server Systems, IBM research report
• [Yang99] Chu-Sing Yang et al., Efficient support for content-based routing in web server clusters, Proceedings of USITS ’99