APVx600 Series Performance Introduction 2011-01-16
Topics • APVx600 Series Platform Introduction • Performance Numbers of APVx600 • Real-world Numbers vs. Lab Numbers • FAQ
Topics - Details • APVx600 Series Platform Introduction • Performance Numbers of APVx600 • Performance Benchmarks • Performance Difference between AMP builds • 1G Multi-Q Improvement in AMP 8.3 • Real-world Numbers vs. Lab Numbers • Summary • SLB – BesTV Scenario • LLB – JSBC Scenario • FAQ
APV 1600/1600 Turbo/2600 Hardware • [Figures: APV 1600, APV 2600]
APV 4600 Hardware • [Figure: APV 4600]
APV 5600/6600 Hardware • [Figures: APV 5600, APV 6600]
APV 8600/9600 Hardware • [Figures: APV 8600, APV 9600]
Topics • APVx600 Series Platform Introduction • Performance Numbers of APVx600 • Performance Benchmarks • Performance Difference between AMP builds • 1G Multi-Q Improvement in AMP 8.3 • Real-world Numbers vs. Lab Numbers • FAQ
Performance Numbers of APVx600 • Performance Numbers of APVx600 • Performance Benchmarks • Performance Difference between AMP builds • 1G Multi-Q Improvement in AMP 8.3
Performance Difference between AMP builds • Most maximum performance numbers of the AMP 8.1.x, 8.2.x and 8.3 builds are similar. • AMP 8.3 Multi-Q improves unit performance when only a small number of ports is used.
1G Multi-Q Improvement in AMP 8.3 • In AMP 8.2 and earlier builds, only 10G ports support multi-Q processing. • In AMP 8.3.0, 1G ports (IGB driver; Intel 82576 and 82580 NICs *) also support multi-Q processing: • In AMP 8.2 and earlier builds, packets received from one 1G port can only be processed by one CPU core; • In AMP 8.3.0, packets received from one 1G port can be processed by multiple CPU cores.
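The single-queue vs. multi-queue difference described above can be illustrated with a toy model. This is a sketch under stated assumptions, not AMP's actual implementation: each flow is keyed by a hypothetical 5-tuple, a single-queue port hands every packet to one fixed core, and a multi-queue port picks a core by hashing the flow key. CRC32 (Python's `zlib.crc32`) stands in for the NIC's real receive-side hash, and the helper names `core_for_flow` and `flows_per_core` are hypothetical.

```python
import zlib
from collections import Counter

def core_for_flow(flow, port, n_cores, multi_q):
    """Return the CPU core that would handle a flow arriving on `port` (toy model)."""
    if not multi_q:
        # Single queue: every packet from this port goes to one fixed core.
        return port % n_cores
    # Multi-queue: hash the flow key so different flows spread over the cores,
    # while all packets of one flow stay on the same core.
    key = ",".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % n_cores

def flows_per_core(flows, port, n_cores, multi_q):
    """Count how many flows each core receives."""
    return Counter(core_for_flow(f, port, n_cores, multi_q) for f in flows)

if __name__ == "__main__":
    # 1200 hypothetical client flows toward one server, all arriving on port 0.
    flows = [(f"10.0.{i // 250}.{i % 250}", 10000 + i, "192.168.1.10", 80, "tcp")
             for i in range(1200)]
    single = flows_per_core(flows, port=0, n_cores=12, multi_q=False)
    multi = flows_per_core(flows, port=0, n_cores=12, multi_q=True)
    print("single-queue: busy cores =", len(single))
    print("multi-queue:  busy cores =", len(multi),
          "| max flows on one core =", max(multi.values()))
```

In this model the hash is computed per flow rather than per packet, so packets of one TCP connection stay on one core and are not reordered; a port therefore spreads its load over several cores only when it carries many flows.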
1G Multi-Q Improvement in AMP 8.3 • Advantages: • The CPU usage of a single core is no longer the bottleneck *1; • After upgrading from AMP 8.1/8.2 to AMP 8.3, CPU usage should be lower *2; • Performance of a single 1G port or a small number of 1G ports improves, especially when there are fewer ports than CPU cores. • Note: • Multi-Q support does not improve the performance of the whole unit, because the total CPU resource does not increase.
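The advantages and the note above reduce to simple arithmetic. The sketch below assumes, hypothetically, that every core has the same capacity, that traffic is spread evenly over the receive queues, and that each multi-Q 1G port offers `queues_per_port` queues; the real queue count per port in AMP 8.3 is not stated in these slides, and the function names are illustrative.

```python
def cores_in_use(active_ports: int, n_cores: int, multi_q: bool,
                 queues_per_port: int = 1) -> int:
    """Number of CPU cores that can receive traffic from the active 1G ports."""
    # Without multi-Q each port is served by exactly one core.
    queues = active_ports * (queues_per_port if multi_q else 1)
    # A unit can never use more cores than it has.
    return min(queues, n_cores)

def usable_capacity_fraction(active_ports: int, n_cores: int, multi_q: bool,
                             queues_per_port: int = 1) -> float:
    """Share of the unit's total CPU capacity reachable by the traffic."""
    return cores_in_use(active_ports, n_cores, multi_q, queues_per_port) / n_cores

if __name__ == "__main__":
    # 3 ports on a 12-core unit; 4 queues per port is an assumption.
    print(usable_capacity_fraction(3, 12, multi_q=False))                     # 0.25
    print(usable_capacity_fraction(3, 12, multi_q=True, queues_per_port=4))   # 1.0
    # With 12 ports all cores are already used, so multi-Q adds no headroom.
    print(usable_capacity_fraction(12, 12, multi_q=False))                    # 1.0
```

The last line reflects the note: once every core is already reachable, multi-Q cannot raise the unit's ceiling, because the total CPU resource is unchanged.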
1G Multi-Q Improvement in AMP 8.3 • Performance Numbers – 3-Port CPS & Throughput: The number of ports actually used (3) is much smaller than the number of available CPU cores (12). 1:1: one request per TCP connection. AVG CPU: CPU usage displayed by “show statistics system”. MAX CPU: the maximum CPU usage displayed by “top -S”; in most cases all CPU cores in use are at about the same level.
1G Multi-Q Improvement in AMP 8.3 • Performance Numbers – CPU usage at the same throughput level: 1:1: one request per TCP connection. MAX CPU: the maximum CPU usage displayed by “top -S”; in most cases all CPU cores in use are at about the same level.
Topics • APVx600 Series Platform Introduction • Performance Numbers of APVx600 • Real-world Numbers vs. Lab Numbers • Summary • SLB – BesTV Scenario • LLB – JSBC Scenario • FAQ
Real-world Numbers vs. Lab Numbers • Real-world Numbers vs. Lab Numbers • Summary • SLB – BesTV Scenario • LLB – JSBC Scenario
Summary For lab numbers, we try to measure the maximum value. The basic model is: • Use all CPU cores for all numbers; • For throughput, use most of the 1G or 10G ports, a large response (object) size (512KB or 1MB), multiple HTTP requests/responses per TCP connection, etc. In the real world, performance numbers are often lower than lab values for reasons such as: • Fewer ports deployed; • Fewer CPU cores used; • Smaller packet size; • Fewer requests/responses per TCP connection, or even a single transaction per connection; • Etc.
Real-world Numbers vs. Lab Numbers • Real-world Numbers vs. Lab Numbers • Summary • SLB – BesTV Scenario • LLB – JSBC Scenario We use two real-world scenarios as examples to illustrate some technical tips and suggestions for customer deployments. They also show where real-world numbers agree with lab values and where they differ. All results below are based on AMP 8.2 builds.
SLB – BesTV Scenario • Background: BesTV (百视通) is one of the biggest IPTV service providers in China. L4 SLB is deployed on an APV2600 using 3 ports in a 1-in-2-out topology. When one-way throughput is 290 Mbps, CPS is 3593, PPS is 60,000 and CPU utilization is 42%.
SLB – BesTV Scenario • Online Status: When one-way throughput is 290 Mbps, CPS is 3593, PPS is 60,000 and CPU utilization is 42%. • Lab Numbers: Tested with a 1K object and 13 requests/responses per TCP connection. • 1-in-2-out topology: at 270 Mbps throughput, PPS is 68,000 and CPU utilization reaches 40%, similar to the actual online status. • 2-in-2-out topology (port1/port2 in, port3/port4 out): at 270 Mbps throughput, CPU utilization is only 24%. BIG IMPROVEMENT!
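A few ratios can be derived from the BesTV numbers above. This is a rough sanity check that assumes the quoted PPS and CPS describe the same traffic as the one-way throughput figure, which the slide does not state explicitly; the helper names are illustrative.

```python
def avg_bytes_per_packet(mbps: float, pps: float) -> float:
    """Average packet size implied by a throughput and a packet rate."""
    return mbps * 1e6 / 8 / pps

def avg_bytes_per_connection(mbps: float, cps: float) -> float:
    """Average bytes carried per new connection."""
    return mbps * 1e6 / 8 / cps

if __name__ == "__main__":
    # Online: 290 Mbps, 60,000 PPS, 3593 CPS, 42% CPU (1-in-2-out).
    print(f"online bytes/packet:     {avg_bytes_per_packet(290, 60_000):.0f}")
    print(f"online bytes/connection: {avg_bytes_per_connection(290, 3593):.0f}")
    # Lab: 270 Mbps, 68,000 PPS (1-in-2-out).
    print(f"lab bytes/packet:        {avg_bytes_per_packet(270, 68_000):.0f}")
    # Same 270 Mbps load: 40% CPU (1-in-2-out) vs 24% CPU (2-in-2-out).
    saving = 1 - 24 / 40
    print(f"CPU saved by 2-in-2-out: {saving:.0%}")
```

Under that assumption the online traffic averages roughly 600 bytes per packet and roughly 10 KB per connection, the same order of magnitude as the lab profile of 13 × 1K objects per connection, and the 2-in-2-out topology needs about 40% less CPU at the same throughput.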
SLB – BesTV Scenario Tested with a 1K object and 13 requests/responses per TCP connection. 1:1: one request per TCP connection; INF:INF: infinite requests per TCP connection. Performance at 1:13 should be higher than at 1:1 and lower than at INF:INF. CPU: the maximum CPU usage displayed by “top -S”. NIC: All tests are based on the 4 on-board ports of the APV2600.
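Why 1:13 should land between 1:1 and INF:INF can be shown with a simple cost model. It is an illustration, not a measured profile of the APV: each connection is assumed to cost a fixed setup/teardown overhead plus a per-request cost, both in arbitrary CPU units, and the default values of `conn_cost` and `req_cost` are hypothetical.

```python
import math

def requests_per_cpu_unit(reqs_per_conn: float,
                          conn_cost: float = 5.0,
                          req_cost: float = 1.0) -> float:
    """Requests served per unit of CPU for a given requests-per-connection ratio.

    conn_cost: hypothetical fixed cost per TCP connection (handshake, setup, teardown).
    req_cost:  hypothetical cost per HTTP request/response.
    """
    if math.isinf(reqs_per_conn):
        # INF:INF: connection overhead is amortized away entirely.
        return 1.0 / req_cost
    return reqs_per_conn / (conn_cost + reqs_per_conn * req_cost)

if __name__ == "__main__":
    for label, k in [("1:1", 1), ("1:13", 13), ("INF:INF", math.inf)]:
        print(f"{label:8s} {requests_per_cpu_unit(k):.3f}")
```

For any positive costs, k / (conn_cost + k × req_cost) grows with k and approaches 1 / req_cost, so the ordering 1:1 < 1:13 < INF:INF holds regardless of the exact values chosen.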
SLB – BesTV Scenario • APV 2600 Deployment Suggestion: We observed inconsistent performance depending on which port groups were used when deploying the APV 2600. • Port1, Port2: Avalanche client; Port3, Port4: Avalanche server. L7 throughput: 0.7~0.9 Gbps • Port1, Port2: Avalanche server; Port3, Port4: Avalanche client. L7 throughput: 1.62 Gbps • Resolution: • 4-in-4-out, single arm, bonding; • 2-in-2-out, two arm, port3/port4 connected to the client side and port1/port2 connected to the server side *; • Use an add-on NIC card instead of the on-board NICs.
SLB – BesTV Scenario • 1K packet size throughput of APV2600/4600/8600: Having collected statistics from many customers, we found that 1K is the most typical packet size on the Internet. Refer to this table for real-world deployments. The “1 HTTP Request per Connection” value is the most conservative one. CPU: the maximum usage among all CPU cores.
LLB – JSBC Scenario • Background: JSBC (江苏广电) is an ISP providing Internet access service over the provincial broadcasting and television network. An APV9600 is deployed for LLB in a 3-in-3-out topology, routing mode. One-way throughput reaches 2.7 Gbps; the average CPU usage displayed by "show statistics system" is 28%, and the maximum usage of a single CPU core is 85% (top -S).
LLB – JSBC Scenario • Online Status: One-way throughput reaches 2.7 Gbps; the average CPU usage displayed by "show statistics system" is 28%, and the maximum usage of a single CPU core is 85% (top -S). [Figures: Inbound Statistics, Outbound Statistics]
LLB – JSBC Scenario • Lab Numbers: 1:1: one request per TCP connection; INF:INF: infinite requests per TCP connection. AVG CPU: CPU usage displayed by “show statistics system”. MAX CPU: the maximum CPU usage displayed by “top -S”; in most cases all CPU cores in use are at about the same level.
LLB – JSBC Scenario • Analysis: • In the lab, when one-way throughput reaches 2.6 Gbps, the maximum CPU usage is 90%, which is similar to the actual online status (2.7 Gbps, max CPU 85%). • To optimize performance, the traffic load should be well balanced across all 12 CPU cores. There are two methods: • 2*10G ports, 1-in-1-out (packets received from a 10G port are processed by all CPU cores); • 12*1G ports, 6-in-6-out, bonding (the APV9600 has 12 CPU cores, and each core processes the packets received from one 1G port).
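The two methods above can be compared using the AMP 8.2 behaviour described in this section: a 10G port is processed by all CPU cores, a 1G port by one core. The sketch counts how many of the APV9600's 12 cores each layout can keep busy. It assumes, as the slide does not state, that the current 3-in-3-out deployment uses six 1G ports and that no two ports share a core; the `Layout` type and `cores_reachable` helper are hypothetical.

```python
from dataclasses import dataclass

N_CORES = 12  # APV9600 CPU cores, per the slide

@dataclass
class Layout:
    name: str
    ports_1g: int
    ports_10g: int

def cores_reachable(layout: Layout, n_cores: int = N_CORES) -> int:
    """Cores that receive traffic under the AMP 8.2 model (no 1G multi-Q)."""
    if layout.ports_10g > 0:
        # Packets from a 10G port are processed by all CPU cores.
        return n_cores
    # Each 1G port is processed by exactly one core.
    return min(layout.ports_1g, n_cores)

if __name__ == "__main__":
    layouts = [
        Layout("current: 3-in-3-out, 1G (assumed)", ports_1g=6, ports_10g=0),
        Layout("method 1: 2*10G, 1-in-1-out", ports_1g=0, ports_10g=2),
        Layout("method 2: 12*1G, 6-in-6-out, bonding", ports_1g=12, ports_10g=0),
    ]
    for lay in layouts:
        used = cores_reachable(lay)
        print(f"{lay.name:40s} {used:2d}/{N_CORES} cores")
```

Under these assumptions both methods reach all 12 cores, while the assumed current layout reaches only 6.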
LLB – JSBC Scenario • Optimizing LLB performance in real-world deployments • The comparison above shows that the more CPU cores are used, the better the unit performance. • Principles: • When fewer ports are used than there are CPU cores, use as many ports as possible; • Avoid conflicts in CPU core usage *. • Next, we take JSBC as an example to illustrate how to optimize performance in a real-world deployment by applying these principles.
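The second principle can be checked mechanically once the port-to-core assignment is known. The real AMP mapping is not given on this slide (see the * note), so the mapping used below (port i handled by core i mod 12) is purely hypothetical, as is the `core_usage` helper; it simply reports cores that would serve more than one active port and cores left idle.

```python
from collections import defaultdict

def core_usage(active_ports, port_to_core, n_cores):
    """Return (conflicts, idle_cores) for a set of active ports.

    conflicts:  {core: [ports]} for cores serving more than one active port.
    idle_cores: cores that no active port maps to.
    """
    by_core = defaultdict(list)
    for port in active_ports:
        by_core[port_to_core[port]].append(port)
    conflicts = {core: ports for core, ports in by_core.items() if len(ports) > 1}
    idle_cores = [core for core in range(n_cores) if core not in by_core]
    return conflicts, idle_cores

if __name__ == "__main__":
    n_cores = 12
    # Hypothetical mapping for 24 1G ports: port i -> core i % 12.
    mapping = {port: port % n_cores for port in range(24)}
    # Bad choice: ports 0-5 and 12-17 collide on cores 0-5, leaving cores 6-11 idle.
    print(core_usage(list(range(6)) + list(range(12, 18)), mapping, n_cores))
    # Good choice: ports 0-11 use every core exactly once.
    print(core_usage(range(12), mapping, n_cores))
```

Both port selections use 12 ports, but only the second one spreads them over all 12 cores, which is the point of the "avoid conflicts" principle.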
LLB – JSBC Scenario • JSBC’s HW configuration:
LLB – JSBC Scenario • 4x 4-port 1G copper (82580), 2x 4-port 1G fiber (82580), 1x 2-port 10G fiber
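Tallying the card list above shows whether JSBC's hardware can support both optimization methods from the analysis slide. The card counts come from the slide; the requirements (12 × 1G or 2 × 10G) come from the two methods listed in the analysis, and the tuple layout and `ports_by_speed` helper are illustrative choices.

```python
# (number of cards, ports per card, port speed in Gbps), from JSBC's HW list
NICS = [
    (4, 4, 1),   # 4x 4-port 1G copper (82580)
    (2, 4, 1),   # 2x 4-port 1G fiber (82580)
    (1, 2, 10),  # 1x 2-port 10G fiber
]

def ports_by_speed(nics):
    """Total number of ports for each port speed."""
    totals = {}
    for cards, ports_per_card, speed in nics:
        totals[speed] = totals.get(speed, 0) + cards * ports_per_card
    return totals

if __name__ == "__main__":
    totals = ports_by_speed(NICS)
    print("ports by speed (Gbps -> count):", totals)
    # Requirements of the two methods: 12*1G (6-in-6-out) or 2*10G (1-in-1-out).
    print("12*1G bonding feasible:", totals.get(1, 0) >= 12)
    print("2*10G feasible:        ", totals.get(10, 0) >= 2)
```

The unit has 24 1G ports and 2 10G ports, so either method is possible on the existing hardware.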