CSTS WG Prototyping for Forward CSTS Performance
Boulder, November 2011
Martin Karch
Prototyping for Fwd CSTS Performance
Structure of the Presentation:
• Background and Objective
• Measurement Set-Up
• Results
• Summary
Background and Objective
Reports from NASA:
• Prototyping of an experimental Synchronous Forward Frame Service (CLTU Orange Book)
• The CLTU service approach seriously limits throughput
• The target rate (25 Mbps) could only be reached with
  • 7 frames of 220 bytes length
  • in the data parameter of one single CLTU
  • no radiation reports for individual CLTUs (only for the complete one)
No investigation available yet on:
• Why can the throughput not be reached when each frame is transferred in a single CLTU?
• What is the cost of acknowledging every frame?
Background and Objective
Based on the reports:
• Suggest blocking of several frames into one data parameter for
  • a potential future Forward Frame CSTS
  • the Process Data operation / Data Processing procedure (FW)
Objective of the prototyping:
• Verify that blocking of data items significantly increases the throughput
• Investigate whether the bottleneck lies in service provisioning (the actual protocol between user and provider)
• Results shall support the selection of the most appropriate approach for the CSTS FW Forward Specification
• Measurements address protocol performance
Measurement Set-Up
• 2 machines, each equipped with a Xeon 4C X3460 (2.8 GHz / 1333 MHz / 8 MB)
• 4 GB memory
• Linux SLES 11, 64 bit
• Isolated LAN, 1 Gbit cable connection (no switch)
• SGM (SLE Ground Models)
• NIS (Network Interface System)
Measurement Set-Up
Provider:
• SGM (SLE Ground Models) simulation environment
• SGM changed such that
  • the receiving thread puts CLTUs on a queue for radiation
  • a 'radiation thread' removes CLTUs and discards them
  • no further simulation of the radiation process (radiation duration)
  (a minimal sketch of this queue is shown below)
User:
• NIS (Network Interface System) simulation environment
• NIS is modified to
  • create CLTU operation objects as fast as possible
  • immediately pass them to the SLE API for transmission
• No interface to a Mission Control System (MCS)
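The following is a minimal C++ sketch of the provider-side modification described above: a receiving thread enqueues CLTUs and a separate 'radiation thread' removes and discards them, with no simulation of the radiation duration. The class and member names (Cltu, CltuQueue) are illustrative only and are not the actual SGM code.

#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

struct Cltu { std::vector<std::uint8_t> data; };

// Simple thread-safe queue standing in for the 'queue for radiation'.
class CltuQueue {
public:
    void push(Cltu c) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(c));
        }
        cv_.notify_one();
    }
    Cltu pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty(); });
        Cltu c = std::move(q_.front());
        q_.pop();
        return c;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Cltu> q_;
};

int main() {
    CltuQueue queue;
    constexpr int kCltus = 100000;
    // Receiving thread: in the prototype this side is fed by TRANSFER DATA
    // invocations; here dummy 220-byte CLTUs are generated as fast as possible.
    std::thread receiver([&] {
        for (int i = 0; i < kCltus; ++i)
            queue.push(Cltu{std::vector<std::uint8_t>(220)});
    });
    // 'Radiation thread': removes CLTUs and discards them immediately,
    // i.e. the radiation process itself is not simulated.
    std::thread radiator([&] {
        for (int i = 0; i < kCltus; ++i)
            queue.pop();
    });
    receiver.join();
    radiator.join();
    return 0;
}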
Measurement Set-Up
Basis for all steps:
• SGM based provider
• NIS based user
Step 1 measurements:
• Variation of the CLTU length
• Simulates sending many small CLTUs in one TRANSFER DATA invocation (first approximation)
Step 2 measurements:
• SLE API modified to
  • aggregate a configurable number of CLTUs (SEQUENCE OF Cltu)
  • with minimal annotation (CLTU Id, sequence count); see the sketch below
  • the return is sent when the last data unit is acknowledged
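A rough illustration of the Step 2 aggregation, assuming a simple in-memory representation of the SEQUENCE OF Cltu data parameter; the types, field names and the blockCltus helper are hypothetical and not part of the modified SLE API.

#include <cstdint>
#include <vector>

// One CLTU carrying only the minimal annotation mentioned above.
struct AnnotatedCltu {
    std::uint32_t cltuId;          // CLTU identifier
    std::uint32_t sequenceCount;   // sequence count within the block
    std::vector<std::uint8_t> data;
};

// The aggregated data parameter, conceptually a SEQUENCE OF Cltu,
// transferred in a single TRANSFER DATA invocation.
using CltuBlock = std::vector<AnnotatedCltu>;

CltuBlock blockCltus(const std::vector<std::vector<std::uint8_t>>& cltus,
                     std::uint32_t firstId) {
    CltuBlock block;
    block.reserve(cltus.size());
    std::uint32_t seq = 0;
    for (const auto& c : cltus) {
        block.push_back(AnnotatedCltu{firstId + seq, seq, c});
        ++seq;
    }
    return block;
}

int main() {
    // Example: block 7 frames of 220 bytes (the case from the NASA report)
    // into one data parameter.
    std::vector<std::vector<std::uint8_t>> cltus(7, std::vector<std::uint8_t>(220));
    CltuBlock block = blockCltus(cltus, 1000);
    return block.size() == 7 ? 0 : 1;
}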
Step 1 / Measurement 1
• Linear curve, proportional to the CLTU size
• Constant processing time, independent of the CLTU size
• SGM + NIS model optimised: yes
• SLE API optimised: no
• Nagle + delayed ACK: on
• RTT: 0.1 ms
Step 1 / Measurement 2
• SGM + NIS model optimised: yes
• SLE API optimised: yes
• Nagle + delayed ACK: on
• RTT: 0.1 ms
Step 1 / Measurement 3
• SGM + NIS model optimised: yes
• SLE API optimised: yes
• Nagle + delayed ACK: off (see the socket-level sketch below)
• RTT: 0.1 ms
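For reference, this generic Linux socket sketch shows how the Nagle algorithm and delayed acknowledgements can be switched off; it illustrates the underlying TCP options varied in these measurements and is not the SLE API configuration interface.

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    int on = 1;
    // Disable Nagle: small PDUs are sent immediately instead of being
    // coalesced while waiting for outstanding ACKs.
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) < 0)
        perror("TCP_NODELAY");
    // Disable delayed ACK (Linux-specific; the option is not permanent and
    // is typically re-asserted after receive operations).
    if (setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on)) < 0)
        perror("TCP_QUICKACK");

    close(fd);
    return 0;
}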
Step 1 / Measurement 4
• Processing time still constant
• Transfer time increased
• SGM + NIS model optimised: yes
• SLE API optimised: yes
• Nagle + delayed ACK: off
• RTT: 400 ms
Step 1 / Measurement 5.1
• Measurement 5.1: reference measurement for the measurements with varying RTT, taken with iperf
• Measurement 5.2: measurements using SGM + NIS
Step 1 / Measurement 5.2
• Shows the influence of the transmission time only
• The delay is the dominating factor, as expected (throughput proportional to 1/RTT)
• Ratio measurement/iperf = 0.165 (1544)
• Ratio measurement/iperf = 0.153 (1000)
• SGM + NIS model optimised: yes
• SLE API optimised: yes
• Nagle + delayed ACK: off
• RTT: variable
Step 1 / Measurement 5 (2)
• Operates with maximum send and receive buffer sizes
• Question: how big must the window size be to achieve throughput values similar to the above (for the example of 40 Mbit/s)?
• Maximum data rate = buffer size / RTT (worked example below)
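A small worked example of the relation above: the required TCP window (buffer) size is the target data rate times the RTT. For the 40 Mbit/s example at 400 ms RTT this gives 40e6 bit/s x 0.4 s = 16e6 bit, i.e. about 2 MB of buffer. The snippet below only illustrates the arithmetic.

#include <cstdio>

int main() {
    const double targetRateBps = 40e6;               // 40 Mbit/s target rate
    const double rttSeconds[] = {0.0001, 0.1, 0.4};  // 0.1 ms, 100 ms, 400 ms
    for (double rtt : rttSeconds) {
        // Required window in bytes = rate [bit/s] * RTT [s] / 8
        double windowBytes = targetRateBps * rtt / 8.0;
        std::printf("RTT %.4f s -> required window %.0f bytes\n",
                    rtt, windowBytes);
    }
    return 0;
}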
Step 1 Measurements Summary
• Linear increase of the data rate with the CLTU length
  • when sending as fast as possible with no network delay
• Constant processing time
• Best results with
  • optimised code (5 to 10 % performance increase from the optimised SLE API alone)
  • Nagle and delayed ACK switched off (factor 2.5 lower when the Nagle algorithm and delayed ACK are both on)
  • no network delay
• Network delay of 200 ms (400 ms RTT)
  • performance decrease by a factor of 400 compared to Measurement 2 (the best one)
  • maximum data rate = buffer size / RTT
• We have to pay attention to the size of the CLTU
What is the Cost of Confirmed Operations?
Data unit size = 8000 bytes:
• CLTU: 207.57 Mbps
• RAF: 318.32 Mbps (frame size 8000 bytes, 1 frame/buffer)
• Increase by 53%
Data unit size = 2000 bytes:
• CLTU: 53.36 Mbps
• RAF: 85.64 Mbps (frame size 2000 bytes, 1 frame/buffer)
• Increase by 60%
(A quick arithmetic check of the percentages is given below.)
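A quick check of the quoted increases from the figures above (RAF, unconfirmed, vs. CLTU, confirmed); nothing here is new data, it only recomputes the ratios.

#include <cstdio>

int main() {
    // 8000 byte units: 318.32 / 207.57 - 1 = 0.534 -> ~53% increase
    std::printf("8000 byte units: %.0f%%\n", (318.32 / 207.57 - 1.0) * 100.0);
    // 2000 byte units: 85.64 / 53.36 - 1 = 0.605 -> ~60% increase
    std::printf("2000 byte units: %.0f%%\n", (85.64 / 53.36 - 1.0) * 100.0);
    return 0;
}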
Effects of Buffering (RAF)
• Frame size = 2000 bytes, 1 frame/buffer: 85.64 Mbps
• Concatenation of 80 frames of 100 bytes back-to-back into a buffer, which is then passed to the API as one frame: 322.43 Mbps (see the sketch below)
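An illustrative sketch of the buffering variant above, assuming the frames are simply copied back-to-back into one contiguous buffer; the function name concatenateFrames is hypothetical and not an SLE API call.

#include <cstdint>
#include <vector>

// Concatenate the given frames back-to-back into one buffer that is then
// handed to the transfer API as a single data unit.
std::vector<std::uint8_t> concatenateFrames(
        const std::vector<std::vector<std::uint8_t>>& frames) {
    std::vector<std::uint8_t> buffer;
    for (const auto& f : frames)
        buffer.insert(buffer.end(), f.begin(), f.end());
    return buffer;
}

int main() {
    // 80 frames of 100 bytes -> one 8000 byte unit, as in the measurement above.
    std::vector<std::vector<std::uint8_t>> frames(
        80, std::vector<std::uint8_t>(100));
    auto buffer = concatenateFrames(frames);
    return buffer.size() == 8000 ? 0 : 1;
}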
RAF Measurement Configuration
(Diagram: a Frame Generator feeds frames to the SLE Service Provider Application; the provider-side SLE API places them in a transfer buffer and sends them via TCP (locally) to the Communication Server, which forwards them over TCP to the user-side SLE API and its transfer buffer, from where the frames are delivered to the SLE Service User Application.)
Cost of ASN.1 Encoding (RAF)
• Result of profiling for RAF, frame size 100 bytes, 80 frames per buffer:
  • Encoding of the transfer buffer including all contained frames: 6.42%
  • Encoding of the TRANSFER-BUFFER invocation alone: 2.31%
• Effects might be caused by increased interactions / interrupts, etc.
Summary of Observations
• The size of the transferred data unit has a significant impact
  • almost constant end-to-end processing time, independent of the buffer size
  • linear increase of the net bit rate with the data unit size
• Large impact of network delay due to TCP (expected)
• Significant additional cost of using confirmed operations
• Buffering of frames vs. transfer of individual frames
  • 4 frames of 2K per buffer vs. single 2K frames: factor 1.9
  • BUT: the throughput for a single large data unit is much higher than for a buffer of the same size containing multiple small units
• ASN.1 encoding for the worst-case test accounts for 6.4% of the overall local processing time