This presentation describes a methodology for evaluating runtime support in network processors: programmable multi-core systems-on-chip that combine control processors, packet processors, co-processors, a memory hierarchy, and an interconnect to sustain high packet processing rates. The methodology combines a traffic representation with an analytical system model. Results are given for three example runtime support systems: ideal allocation, full processor allocation, and partitioned application allocation.
A Methodology for Evaluating Runtime Support in Network Processors University of Massachusetts, Amherst Xin Huang and Tilman Wolf {xhuang,wolf}@ecs.umass.edu
Runtime Support in Network Processors • Network processor (NP) • Multi-core system-on-chip • Programmability & high packet processing rate • Heterogeneous resources • Control processors • Multiple packet processors • Co-processors • Memory hierarchy • Interconnection • Runtime support • Dynamic task allocation [Figure: IXP 2800]
General Operation of Runtime Support in NP • Input • Hardware resources • Workload • Mapping method • Output • Task allocation • Dynamic adaptation • Different runtime support systems • Difficult to compare [Figure: task allocation with labels AP1, AP2, AP3]
Contributions • Evaluation methodology • Traffic representation • Analytical system model based on queuing networks • Results • Specific: 3 example runtime support systems • Ideal Allocation • Full Processor Allocation • R. Kokku, T. Riche, A. Kunze, J. Mudigonda, J. Jason, and H. Vin. A case for run-time adaptation in packet processing systems. In Proc. of the 2nd Workshop on Hot Topics in Networks (HOTNETS-II), Cambridge, MA, Nov. 2003 • Partitioned Application Allocation • T. Wolf, N. Weng, and C.-H. Tai. Design considerations for network processor operating systems. In Proc. of ACM/IEEE Symposium on Architectures for Networking and Communication Systems (ANCS), pages 71-80, Princeton, NJ, Oct. 2005
Outline • Introduction • Evaluation Methodology • Dynamic Workload Model • Runtime System Model • Result • Summary
Workload • NP workload is characterized by applications and traffic • How to represent workload?
Dynamic Workload Model • Workload graph: • Application/Task: T • Traffic: • Processing requirement: • Example: • Processing requirement: • R. Ramaswamy and T. Wolf. PacketBench: A tool for workload characterization of network processing. In Proc. of IEEE 6th Annual Workshop on Workload Characterization (WWC-6), pages 42-50, Austin, TX, Oct. 2003
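The symbols on this slide did not survive extraction, so the following is only a minimal sketch of how a workload graph could be turned into per-task processing requirements. It assumes an acyclic graph whose edges carry the fraction of packets forwarded from one task to the next, and a per-task instruction count per packet (the kind of figure a tool such as PacketBench reports). The function names, the edge representation, and the example numbers are all hypothetical and not taken from the slides.

```python
from collections import defaultdict


def task_arrival_rates(tasks, entry_rates, edges):
    """Propagate packet rates through an acyclic workload graph.

    tasks       -- list of task names (hypothetical)
    entry_rates -- {task: packets/s entering the system at that task}
    edges       -- {(src, dst): fraction of src's packets sent to dst}
    """
    indegree = {t: 0 for t in tasks}
    successors = defaultdict(list)
    for (src, dst), fraction in edges.items():
        successors[src].append((dst, fraction))
        indegree[dst] += 1

    rates = {t: float(entry_rates.get(t, 0.0)) for t in tasks}
    ready = [t for t in tasks if indegree[t] == 0]
    while ready:  # visit tasks in topological order
        src = ready.pop()
        for dst, fraction in successors[src]:
            rates[dst] += rates[src] * fraction
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return rates


def processing_requirement_mips(rates, instructions_per_packet):
    """Processing requirement per task in MIPS (rate x instructions / 1e6)."""
    return {t: rates[t] * instructions_per_packet[t] / 1e6 for t in rates}


if __name__ == "__main__":
    # Illustrative numbers only.
    tasks = ["classify", "forward", "encrypt"]
    rates = task_arrival_rates(
        tasks,
        entry_rates={"classify": 100_000},
        edges={("classify", "forward"): 0.75, ("classify", "encrypt"): 0.25},
    )
    print(processing_requirement_mips(
        rates, {"classify": 500, "forward": 1000, "encrypt": 4000}))
```

Recomputing these requirements whenever the measured traffic changes gives the time-varying input that a runtime system has to map onto processors.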
Runtime System Model • Unified approach for all runtime systems • Queuing networks • Specific solution for each runtime system • Runtime mapping: • Graph: • Packet arrival rate: • Service time: • Metrics for all runtime systems • Processor utilization: • Average number of packets in the system:
Three Example Runtime Support Systems • System I: Ideal Allocation • System II: Full Processor Allocation • System III: Partitioned Application Allocation
Example Evaluation Model – System I • Ideal Allocation • All processors can process all packets completely • Unrealistic, but provides a baseline • Model: M/G/m FCFS single station
M/G/m Single Station Queuing System • Cosmetatos approximation • Evaluation metrics • G. Cosmetatos. Some Approximate Equilibrium Results for the Multiserver Queue (M/G/r). Operational Research Quarterly, pages 615-620, 1976 • G. Bolch, S. Greiner, H. de Meer, and K. S. Trivedi. Queueing Networks and Markov Chains: Modeling and Performance Evaluation with Computer Science Applications. John Wiley & Sons, Inc., New York, NY, August 1998
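The formulas on this slide were lost in extraction. As a hedged sketch, the Cosmetatos approximation is usually stated in queueing textbooks as an interpolation between the M/M/m and M/D/m mean waiting times, weighted by the squared coefficient of variation of the service time. The code below follows that textbook form; the function and parameter names are hypothetical, and the exact notation used by the authors should be checked against the cited references.

```python
import math


def erlang_c(m, rho):
    """Probability that an arriving packet has to wait in an M/M/m queue."""
    a = m * rho                        # offered load
    term = 1.0                         # a^k / k!, starting at k = 0
    partial = 1.0                      # sum of a^k / k! for k = 0 .. m-1
    for k in range(1, m):
        term *= a / k
        partial += term
    tail = term * a / m / (1.0 - rho)  # (a^m / m!) / (1 - rho)
    return tail / (partial + tail)


def mgm_cosmetatos(lam, mean_service, scv, m):
    """Approximate M/G/m metrics (sketch of the Cosmetatos approximation).

    lam          -- packet arrival rate (packets/s)
    mean_service -- mean service time per packet (s)
    scv          -- squared coefficient of variation of the service time
    m            -- number of processors
    Returns (processor utilization, mean number of packets in the system).
    """
    if lam <= 0 or m < 1:
        raise ValueError("need lam > 0 and m >= 1")
    rho = lam * mean_service / m
    if rho >= 1.0:
        raise ValueError("system is unstable (utilization >= 1)")

    # Exact M/M/m mean waiting time.
    w_mmm = erlang_c(m, rho) * mean_service / (m * (1.0 - rho))
    # Cosmetatos' approximation of the M/D/m mean waiting time.
    correction = 1.0 + ((1.0 - rho) * (m - 1)
                        * (math.sqrt(4.0 + 5.0 * m) - 2.0) / (16.0 * rho * m))
    w_mdm = 0.5 * correction * w_mmm
    # Interpolate between the two using the service-time variability.
    w_mgm = scv * w_mmm + (1.0 - scv) * w_mdm

    packets = lam * (w_mgm + mean_service)  # Little's law
    return rho, packets


if __name__ == "__main__":
    # Illustrative numbers only: 16 processors, 10 us mean service time.
    print(mgm_cosmetatos(lam=1_200_000, mean_service=10e-6, scv=0.5, m=16))
```

For m = 1 the correction factor equals 1 and the expression reduces to the Pollaczek-Khinchine result, which is a convenient sanity check.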
Example Evaluation Model – System II • Full Processor Allocation • Allocate entire tasks to subsets of processors • Allocate as few processors as possible to save power • Each processor runs one type of task • Reallocation is triggered by queue length • Model: BCMP M/M/1-FCFS network (Jackson network)
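A minimal sketch of the "as few processors as possible" rule, assuming each task's processing requirement is known in MIPS. The 100 MIPS per engine and the 16 engines come from the Setup slide; the utilization headroom `max_utilization` and all function names are assumptions made for illustration, and the queue-length trigger for reallocation is not modeled here.

```python
import math


def min_processors(required_mips, processor_mips=100.0, max_utilization=0.9):
    """Smallest number of whole processors that covers one task's demand.

    max_utilization is an assumed headroom so that queues stay finite.
    """
    usable = processor_mips * max_utilization
    return max(1, math.ceil(required_mips / usable))


def full_processor_allocation(required_by_task, total_processors=16,
                              processor_mips=100.0, max_utilization=0.9):
    """Give every task its own set of processors, using as few as possible."""
    allocation = {
        task: min_processors(mips, processor_mips, max_utilization)
        for task, mips in required_by_task.items()
    }
    if sum(allocation.values()) > total_processors:
        raise ValueError("workload does not fit on the available processors")
    return allocation


if __name__ == "__main__":
    # Illustrative demands in MIPS.
    print(full_processor_allocation(
        {"classify": 50.0, "forward": 75.0, "encrypt": 100.0}))
```

Because allocation happens in units of whole processors, a small increase in traffic can push a task over a processor boundary, which is one way to read the step-like behavior reported in the results.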
BCMP Network • BCMP: Baskett, Chandy, Muntz, and Palacios • Characteristics: Open, closed, and mixed queuing networks; Several job classes; Four types of nodes: M/M/m–FCFS (class-independent service time), M/G/1–PS, M/G/∞–IS, and M/G/1–LCFS PR • Product-form steady-state solution: • Open M/M/1-FCFS BCMP Queuing Network: • Evaluation metrics: • F. Baskett, K. Chandy, R. Muntz, and F. Palacios. Open, Closed, and Mixed Networks of Queues with Different Classes of Customers. Journal of the ACM, 22(2): 248-260, April 1975
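The equations on this slide were lost in extraction. For the special case named here, an open network of single-class M/M/1-FCFS nodes (a Jackson network), the standard results are: solve the traffic equations for each node's arrival rate, compute the utilization as arrival rate over service rate, and obtain the mean number of packets per node as utilization / (1 - utilization); the joint state probability is the product of the per-node geometric terms. The sketch below implements that textbook case with hypothetical names; it is not necessarily the exact formulation used by the authors.

```python
def solve_traffic_equations(external, routing, tol=1e-12, max_iter=10_000):
    """Solve lambda_i = external_i + sum_j lambda_j * routing[j][i].

    Fixed-point iteration; it converges for open networks in which every
    packet eventually leaves the system.
    """
    n = len(external)
    lam = [float(x) for x in external]
    for _ in range(max_iter):
        new = [external[i] + sum(lam[j] * routing[j][i] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, lam)) < tol:
            return new
        lam = new
    raise RuntimeError("traffic equations did not converge")


def jackson_metrics(external, routing, service_rates):
    """Per-node utilization and mean packet count, plus the network total."""
    lam = solve_traffic_equations(external, routing)
    utilization = [l / mu for l, mu in zip(lam, service_rates)]
    if any(rho >= 1.0 for rho in utilization):
        raise ValueError("at least one node is saturated")
    packets = [rho / (1.0 - rho) for rho in utilization]
    return utilization, packets, sum(packets)


def state_probability(utilization, state):
    """Product-form probability of seeing state[i] packets at node i."""
    prob = 1.0
    for rho, k in zip(utilization, state):
        prob *= (1.0 - rho) * rho ** k
    return prob


if __name__ == "__main__":
    # Two processors in tandem; illustrative rates in packets per time unit.
    rho, per_node, total = jackson_metrics(
        external=[2.0, 0.0],
        routing=[[0.0, 1.0], [0.0, 0.0]],
        service_rates=[4.0, 5.0],
    )
    print(rho, per_node, total, state_probability(rho, (0, 0)))
```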
Example Evaluation Model – System III • Partitioned Application Allocation • Tasks are partitioned across multiple processors • Synchronized pipelines • Allocate tasks equally across all processors to maximize throughput • Reallocate at fixed time intervals • Equations for evaluation metrics are the same as for System II • Model: BCMP M/M/1-FCFS network (Jackson network)
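The slide does not say how subtasks are spread across processors. As an illustration only, the sketch below uses a greedy longest-processing-time heuristic (a swapped-in technique, not the authors' mapping algorithm) to balance subtask loads, then reports the most heavily loaded processor, which bounds pipeline throughput. Names and numbers are hypothetical.

```python
import heapq


def balance_subtasks(subtask_mips, num_processors=16):
    """Greedy balancing: place the largest subtask on the least-loaded processor.

    subtask_mips -- {subtask name: processing requirement in MIPS}
    Returns (assignment, loads): subtasks per processor and MIPS per processor.
    """
    heap = [(0.0, p) for p in range(num_processors)]  # (current load, id)
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_processors)}
    ordered = sorted(subtask_mips.items(), key=lambda kv: kv[1], reverse=True)
    for name, load in ordered:
        total, p = heapq.heappop(heap)
        assignment[p].append(name)
        heapq.heappush(heap, (total + load, p))
    loads = {p: total for total, p in heap}
    return assignment, loads


def bottleneck_utilization(loads, processor_mips=100.0):
    """Utilization of the busiest processor, which limits the pipeline."""
    return max(loads.values()) / processor_mips


if __name__ == "__main__":
    # Illustrative subtask demands in MIPS on two processors.
    assignment, loads = balance_subtasks(
        {"a": 30.0, "b": 20.0, "c": 20.0, "d": 10.0}, num_processors=2)
    print(assignment, loads, bottleneck_utilization(loads))
```

Any imbalance left by the partitioning shows up as a bottleneck processor, which is consistent with the later observation that this system does not match the ideal case.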
Setup • System • 16 processing engines at 100 MIPS each • Queue lengths are infinite • Workload • Other assumptions • Applications are partitioned into 7-15 subtasks
Processor Allocation Over Time • Ideal: • 16 processors • Full Processor: • Changes with traffic • Partitioned Application: • 16 processors [Figure: full processor allocation system]
Processor Utilization Over Time • Ideal: • Lowest processor utilization • Full Processor: • Highest processor utilization because it uses fewer processors • Partitioned Application: • Low processor utilization • Not equal to the ideal case due to unbalanced task allocation and pipeline overhead
Packets in System Over Time • Ideal: • Fewest packets • Full Processor: • Packets queue up due to high processor utilization • Partitioned Application: • Most packets due to unbalanced task allocation and pipeline overhead • More stable performance because of finer processor allocation granularity
Performance for Different Data Rates • Ideal: • Smooth increase • Full Processor: • Periodic peaks • Partitioned Application: • Smooth increase • Maximum data rate supported by each system • Ideal: 100% • Full Processor: 79.6% • Partitioned Application: 75.1%
Implications of the Results • Ideal Allocation • Provides a baseline • Full Processor Allocation • Allocates as few processors as possible to save power • Uses an entire processor as the allocation granularity • Good: High processor utilization • Bad: High performance variance • Partitioned Application Allocation • Distributes tasks equally across all processors • Finer processor allocation granularity • Good: Stable performance • Bad: Difficult to find an optimized partitioning, which leads to pipeline synchronization overhead
Summary • Analytical methodology for evaluating different NP runtime support systems • Dynamic workload model and runtime system model • Results: 3 example runtime support systems • Quantitative metrics • Tradeoffs