Protocol Implementation : Modularity and Costs
Outline Of Talk • Importance of Modularity • The cost of modularity • The x-kernel architecture • Integrated Layer Processing • Further Improvements • Conclusions
Modularity • Implies layering, a structured approach • Easy to implement • Easy to modify • Simple design • Easy to debug • Simple to reuse
Overhead • Costly data manipulation • Header processing of successive layers • Interfacing successive layers
Good Protocol Implementation • Needs to be modular and to facilitate the addition and modification of other protocols. • Must not compromise performance for modularity, since performance remains a primary goal.
Techniques • The x-kernel • [Peterson et al.] • Integrated Layer Processing • [Clark et al. 1990] • [Peterson et al. 1993] • [Braun et al. 1995] • Further Improvements • [Van Renesse 1996]
The x-kernel : Motivation • To facilitate implementation of efficient communication protocols • Previous abstractions for protocols : • Berkeley Unix “Sockets” : 3 interfaces • System V Unix “Streams” : multiplexors required • Adding new protocols : • V-Kernel : a priori knowledge • Mach : Protocols as applications
The x-kernel • Provides OS services, but in a manner that is network oriented • Provides a uniform set of abstractions for encapsulating protocols • Makes it easy to predict protocol performance • Aids in writing efficient protocols
The x-kernel Architecture • Protocol : A specification of a communication abstraction through which a collection of participants exchange a set of messages. • 3 communication objects : • Abstraction => Protocol Object • Participant => Session Object • Message => Message Object
X-kernel Configuration [Figure: two protocol stacks, each showing TCP, UDP and RPC with IP and ETH; legend: Message Object, Session Object, Protocol Object]
X-kernel : Implementation • Support routines : • Buffer Manager : Manipulates messages • Map Manager : Maps addresses, etc. to capabilities • Event Manager : Alarm clock facility, timeouts • Object-oriented design • Heap-allocated data structures • Store the state of the object • Array of pointers to the functions that implement the methods
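The object layout described above (heap-allocated state plus an array of pointers to method functions) can be sketched in C roughly as follows. This is only an illustration: the names `xobj`, `OP_PUSH`, `OP_POP`, `xobj_new` and `xobj_call` are hypothetical and are not taken from the x-kernel sources.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical method slots; the real x-kernel method set may differ. */
enum { OP_PUSH, OP_POP, OP_COUNT };

struct xobj;
typedef int (*xmethod)(struct xobj *self, int msg);

/* A heap-allocated object: private state plus a table of method pointers. */
struct xobj {
    int     state;          /* per-object state (here: a call counter)   */
    xmethod ops[OP_COUNT];  /* array of pointers to the method functions */
};

/* Toy methods: count the call and return a modified message value. */
static int count_push(struct xobj *self, int msg) { self->state++; return msg + 1; }
static int count_pop(struct xobj *self, int msg)  { self->state++; return msg - 1; }

/* Allocate an object on the heap and fill in its method table. */
static struct xobj *xobj_new(void)
{
    struct xobj *o = malloc(sizeof *o);
    assert(o != NULL);
    o->state = 0;
    o->ops[OP_PUSH] = count_push;
    o->ops[OP_POP]  = count_pop;
    return o;
}

/* Invoke a method indirectly through the table. */
static int xobj_call(struct xobj *o, int op, int msg)
{
    assert(op >= 0 && op < OP_COUNT);
    return o->ops[op](o, msg);
}
```

Dispatching through the table is what lets every protocol expose the same interface while supplying its own implementation.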
X-kernel : Protocol Objects • Creates session objects • Demultiplexes messages to session objects • Operations • “open” : Starts a session at a lower layer. • “open_enable” : Gives permission to a lower layer to start a session on its behalf. • “open_done” : Initializes a session if it has permission from the upper layer. • “demux” : Demultiplexes a message to the appropriate session.
X-kernel : Session Objects • Session is an instance of a protocol (OOD) • End point of a network connection • Initialized during connection and destroyed when connection is closed • Its state has the capabilities of other sessions and protocols on which it depends • 2 operations • “push” : send a message to a lower session • “pop” : pass a message to an upper session. Invoked by a protocol demux
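A minimal sketch of how a protocol's “demux” might hand a message to a session's “pop”, assuming a small linear key table in place of the Map Manager. The types and the functions `proto_open`, `session_pop` and `proto_demux` are hypothetical simplifications, not the x-kernel API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SESSIONS 8

/* Hypothetical message: a demultiplexing key plus a payload. */
struct message { int key; int payload; };

/* A session is an end point; it records what has been popped up to it. */
struct session { int key; int last_payload; int delivered; };

/* A protocol object owns its sessions and a small key-to-session table. */
struct protocol {
    struct session sessions[MAX_SESSIONS];
    size_t         nsessions;
};

/* Simplified session creation for a key; returns NULL if the table is full. */
static struct session *proto_open(struct protocol *p, int key)
{
    assert(p != NULL);
    if (p->nsessions == MAX_SESSIONS)
        return NULL;
    struct session *s = &p->sessions[p->nsessions++];
    s->key = key;
    s->last_payload = 0;
    s->delivered = 0;
    return s;
}

/* "pop": pass a message up into a session. */
static void session_pop(struct session *s, const struct message *m)
{
    s->last_payload = m->payload;
    s->delivered++;
}

/* "demux": find the session whose key matches and pop the message into it.
 * Returns 0 on success, -1 if no session matches. */
static int proto_demux(struct protocol *p, const struct message *m)
{
    for (size_t i = 0; i < p->nsessions; i++) {
        if (p->sessions[i].key == m->key) {
            session_pop(&p->sessions[i], m);
            return 0;
        }
    }
    return -1;
}
```

With sessions opened for keys 6 and 17, a message carrying key 17 reaches only the second session, and an unknown key is rejected.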
X-kernel : Message Objects • When going down from the user process : • A system call turns the user process into a kernel process • This process shepherds the message down through a series of sessions • When a message arrives at the network/kernel boundary : • A kernel process is dispatched to shepherd it up through a series of protocol and session objects • When it reaches the user boundary, the shepherd does an upcall and continues as a user process • This is the process-per-message approach.
X-Kernel : Example [Figure: protocol objects P_IP, P_TCP and P_UDP with session objects S_tcp_Ip, S_udp_Ip, S_1_tcp, S_2_tcp and S_3_udp; arrows labelled push, pop, demux and Open_done; demux keys shown: “6”, “17”, “6,h1”, “17,h2”, “c1,s1”, “c2,s2”, “c3,s3”]
X-kernel : Overview • It is significantly faster than Unix at the coarse-grained level • Cost of individual protocols is comparable to that in Unix • Addition of new protocols could drastically improve the performance, as in RPC
X-kernel : Overview (Contd.) • A kernel could be configured with only those protocols needed by the application, e.g. the Emerald kernel • Under ideal conditions, message delivery could take place without a context switch • Disadvantage : There is no protection between protocols
Integrated Layer Processing • The current protocol structure is not ideal • Loads and stores are expensive, which calls for caches, but caches are : • not well suited to layered message processing • expensive
ILP History : • Clark and Tennenhouse, 1990 • Considered isolated data manipulations • Generalization of Loop Fusion :
(a) Separate loops :
    for (I = 0; I < 1000; I++)
        msgData[I]++;
    for (I = 0; I < 1000; I++)
        msgData[I] = ~msgData[I];
(b) Fused loop :
    for (I = 0; I < 1000; I++) {
        temp = msgData[I];
        temp++;
        temp = ~temp;
        msgData[I] = temp;
    }
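The loop fusion idea can be written as two small C functions for comparison. This is a sketch under assumptions: the element type `unsigned char` and the function names `separate_passes` and `fused_pass` are chosen here for illustration, since the slide does not state them.

```c
#include <assert.h>
#include <stddef.h>

/* (a) Two separate passes: each loop loads and stores every element. */
static void separate_passes(unsigned char *msgData, size_t n)
{
    assert(msgData != NULL);
    for (size_t i = 0; i < n; i++)
        msgData[i]++;
    for (size_t i = 0; i < n; i++)
        msgData[i] = (unsigned char)~msgData[i];
}

/* (b) Fused pass: one load, both operations on a local, one store. */
static void fused_pass(unsigned char *msgData, size_t n)
{
    assert(msgData != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned char temp = msgData[i];
        temp++;
        temp = (unsigned char)~temp;
        msgData[i] = temp;
    }
}
```

Both functions leave the buffer in the same final state; the fused version touches each element in memory once instead of twice.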
Original ILP Shortcomings : • Only considered data manipulations • Made no optimizations based on data cached between manipulations in successive layers • They only measured unrolled loops
ILP : Peterson et al. 1993 • An integrated protocol implementation : the processing of a message in successive protocols is overlapped to achieve good locality of data reference
ILP • Possible Problems : • Accommodating awkward data manipulations : different-sized data, change of data quantity • Reconciling different views of data : Different layers don’t share the same view of data • Satisfying ordering constraints : TCP can’t complete its header until the checksum has been computed • Preserving modularity : Easy to modify, debug, etc.
Integrating Data Manipulation • Solution : Word Filters • Process one machine word each time a filter is invoked • Outputting a word invokes the word filter for the next data manipulation • A filter may output any number of words per invocation • Uses control constructs and state variables • A flush function handles the case where no input data remains • Analogous to a pipelined architecture
ILP : Data Manipulation • A fixed unit of data between layers : • Avoids runtime interpretation • Simple interface • More efficient if the requirement of the next layer is known [Braun et al. 1995] • That is, the length of the exchanged processing units is the LCM of the respective lengths of the 2 filters and the width of the memory bus.
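The LCM rule above is simple arithmetic; a small sketch follows. The function name `exchange_unit` and the example sizes (an 8-byte cipher block, a 2-byte checksum unit, a 4-byte bus) are assumptions made for illustration.

```c
#include <assert.h>

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Least common multiple of two positive lengths. */
static unsigned lcm(unsigned a, unsigned b)
{
    assert(a > 0 && b > 0);
    return a / gcd(a, b) * b;
}

/* Length in bytes of the unit exchanged between two adjacent filters:
 * the LCM of both filter lengths and the memory bus width. */
static unsigned exchange_unit(unsigned filter_a, unsigned filter_b,
                              unsigned bus_width)
{
    return lcm(lcm(filter_a, filter_b), bus_width);
}
```

For the assumed sizes, `exchange_unit(8, 2, 4)` gives 8 bytes, so the 2-byte filter would run four times per exchanged unit.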
Word Filters : Implementation • Function call overhead : calls for in-lined functions • Filter state variables implemented in memory : much better if they can be forced into registers • A word filter is therefore a macro, expanded inside the preceding filter at the point where it is invoked • As a result, state variables become local variables that can be held in registers
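A rough sketch of macro-based word filters, assuming two toy manipulations (an XOR "cipher" and a word sum standing in for a checksum). The macros `XOR_FILTER` and `SUM_FILTER` and both functions are hypothetical; the sum filter deliberately refers to the caller's locals `sum` and `dst`, which is how filter state ends up in local variables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy filter 1: XOR the word with a key, then hand the result to NEXT.
 * NEXT is the name of the following filter macro, expanded in place. */
#define XOR_FILTER(word, key, NEXT) \
    do { uint32_t out_ = (word) ^ (key); NEXT(out_); } while (0)

/* Toy filter 2: add the word to the local `sum` and store it via `dst`. */
#define SUM_FILTER(word) \
    do { sum += (word); *dst++ = (word); } while (0)

/* Integrated version: one loop, each word is loaded once and stored once. */
static uint32_t integrated(const uint32_t *src, uint32_t *dst,
                           size_t nwords, uint32_t key)
{
    assert(src != NULL && dst != NULL);
    uint32_t sum = 0;  /* filter state is a local, so it may live in a register */
    for (size_t i = 0; i < nwords; i++)
        XOR_FILTER(src[i], key, SUM_FILTER);
    return sum;
}

/* Layered version for comparison: one full pass over the data per layer. */
static uint32_t layered(const uint32_t *src, uint32_t *dst,
                        size_t nwords, uint32_t key)
{
    assert(src != NULL && dst != NULL);
    uint32_t sum = 0;
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i] ^ key;
    for (size_t i = 0; i < nwords; i++)
        sum += dst[i];
    return sum;
}
```

Both versions produce the same output buffer and sum; the integrated one avoids re-reading the intermediate result from memory.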
Word Filters : Performance • Significant throughput improvement : • Eliminates load and store instructions • Reduces loop overhead • Eliminates some buffer allocation overhead • Scalability : • Limiting factor : register availability • Still performs better than the non-integrated case
Word Filters : Scalability • The knee occurs when the number of local variables exceeds the allotted number of registers. • Even past the knee, ILP still performs much better.
ILP • Possible Problems : • Accommodating awkward data manipulations : different-sized data, change of data quantity • Reconciling different views of data : Different layers don’t share the same view of data • Satisfying ordering constraints : TCP can’t complete its header until the checksum has been computed • Preserving modularity : Easy to modify, debug, etc.
Segregated Messages • Hierarchical Encapsulation : header is opaque to lower layers • Implies modularity • Complicates integration • Add trailers instead of headers • Increases overhead on receiving side • Solution : Segregated Messages
Segregated Messages [Figure: Encapsulated Messages versus Segregated Messages at LEVEL N, LEVEL N-1 and LEVEL N-2, showing HDR and DATA at each level for the encapsulated form, and HDRS kept apart from DATA for the segregated form]
Segregated Messages • Hence only the application data manipulation is integrated. • Problem : the application data may start in the middle of a unit that has to be manipulated all at once. • Braun et al. proposed a modification…
Modified Segregated Messages [Figure: a packet with HDR, DATA and ALIGN regions divided into parts A, B and C; size labels of 4 bytes, 4 bytes, 16 bytes and 8 bytes] • Encryption takes place 8 bytes at a time. • Only applicable to non-ordering-constrained functions
Modified Segregated Messages • Method : • Divide a packet into several parts • Align the last part • Process the part containing the header last • Process the other parts in order • Only applicable if : • The functions are non-ordering-constrained • Header size is known before the ILP loop is entered.
ILP • Possible Problems : • Accommodating awkward data manipulations : different-sized data, change of data quantity • Reconciling different views of data : Different layers don’t share the same view of data • Satisfying ordering constraints : TCP can’t complete its header until the checksum has been computed • Preserving modularity : Easy to modify, debug, etc.
Satisfying Ordering Constraints • Execute send and deliver in 3 phases : • Initial • Integrated data manipulation • Final [Figure: the INITIAL and FINAL stages of each layer arranged around a single INTEGRATED DATA MANIPULATION stage]
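A minimal sketch of a three-phase send, assuming a toy header whose checksum field depends on the data pass. The `struct header` layout, the byte-sum checksum and all function names are hypothetical and only illustrate the ordering of the phases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header with one field that depends on the data. */
struct header { uint16_t length; uint16_t checksum; };

/* Initial stage: header work that does not depend on the data pass. */
static void initial_stage(struct header *h, size_t len)
{
    assert(len <= UINT16_MAX);
    h->length = (uint16_t)len;
    h->checksum = 0;
}

/* Integrated data manipulation: a single loop copies the data and
 * accumulates a toy checksum (byte sum modulo 2^16). */
static uint16_t data_stage(const uint8_t *src, uint8_t *dst, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        dst[i] = src[i];
        sum = (uint16_t)(sum + src[i]);
    }
    return sum;
}

/* Final stage: commit the fields that needed the result of the data pass. */
static void final_stage(struct header *h, uint16_t sum)
{
    h->checksum = sum;
}

/* Send path: the three phases run strictly in this order. */
static struct header send_message(const uint8_t *src, uint8_t *dst, size_t len)
{
    struct header h;
    initial_stage(&h, len);
    uint16_t sum = data_stage(src, dst, len);
    final_stage(&h, sum);
    return h;
}
```

Deferring the checksum write to the final stage is what lets the data loop stay integrated while the header is still completed correctly.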
Satisfying Ordering Constraints Final stage is more like a commit phase…
Barriers to Integration • Control Transfer • Message Reassembly • Random Access • Runtime Protocol Path
Further Improvements • Protocol Accelerator (PA) [Van Renesse 1996] • It introduces the following modifications : • Header fields that don’t change are sent only once • The rest of the header information is carefully packed, ignoring layer boundaries • Overhead in the “send” and “deliver” critical paths is reduced
Conclusions • Where exactly should we use the x-kernel? • Can we automate the process of generating integrated, optimized protocols? • Most of the non-data-manipulation modifications are application specific. • The data manipulation ILP techniques will remain valid as long as memory access is a greater overhead than computation.