Commercial Network Processor Architectures Agere PayloadPlus Vahid Tabatabaee Fall 2007
References • Title: Network Processors: Architectures, Protocols, and Platforms; Author: Panos C. Lekkas; Publisher: McGraw-Hill • Agere PayloadPlus Family White Papers • "Payload+: Fast Pattern Matching & Routing for OC-48", David Kramer, Roger Bailey, David Brown, Sean Mcgee, Jim Greene, Robert Corley, David Sonnier (Agere Systems), in Hot Chips: A Symposium on High Performance Chips, Aug. 19-21, 2001 • Agere Product Brief documents for FPP, RSP, ASI and FPL • Agere white paper: "The Case for a Classification Language", Feb. 2003
General Information • Agere PayloadPlus is a comprehensive network processor solution for OC-48. • It was later expanded to support OC-192 through the NP10/TM10 (renamed APP750NP and APP750TM). • This product has since been discontinued. • PayloadPlus was originally a 3-chip solution and was later integrated into a single chip. • We review the original solution and the APP550 (single chip), for which information is available on the Agere website.
The Big Picture The network processor family has a pipeline architecture and includes (in the original 3-chip solution): • Fast Pattern Processor (FPP): takes data from the PHY chip; protocol recognition; classification based on layers 2 to 7; table lookup with millions of entries and variable lengths; reassembly • Routing Switch Processor (RSP): queueing; packet modification; traffic shaping; QoS processes; segmentation • Agere System Interface (ASI): management; tracks state information; support for RMON (Remote Monitoring)
The 3 Chip Solution • POS-PHY: Packet Over SONET PHYsical • UTOPIA: Universal Test & Operations PHY Interface for ATM • FBI: Functional Bus Interface Source: http://nps.agere.com/support/non-nda/docs/FPP_Product_Brief.pdf
Main Responsibilities and Interfaces • The FPP receives data from the PHY over a standard interface, either POS-PHY Level 3 (POS-PL3) or UTOPIA 2 or 3. • The FPP classifies traffic based on the information contained in layers 2 to 7. • The FPP sends packets over POS-PL3 to the RSP. • The RSP is responsible for queueing, packet modification, shaping, QoS tagging, and segmentation. • The ASI chip handles exceptions, maintains state information, interfaces to the host processor, and configures the FPP and RSP over the CBI interface. • The Management-Path Interface (MPI) enables the FPP to receive management frames from the local host. • The Functional Bus Interface (FBI) connects the FPP to the ASI so that function calls can be processed externally.
Memory • 64-bit standard PC-133 synchronous dynamic random access memory (SDRAM) • 133 MHz pipelined zero bus turnaround (ZBT) synchronous static random access memory (SSRAM) • PayloadPlus can use standard off-the-shelf DRAM for table lookups and does not need expensive, power-hungry Content Addressable Memory (CAM). • The typical power limit for a line card is 150 W.
FPP Features • Programmable classification from layer 2 to 7 • Pipelined, multi-threaded processing of PDUs • High-level Functional Programming Language (FPL) that implicitly takes care of multiple threads • ATM reassembly at OC-48 rates (eliminates the external SAR) • Table lookup with millions of entries • Eliminates the need for external CAMs • Deterministic performance regardless of the table size • Configurable UTOPIA/POS interfaces
FPP Protocol Data Unit (PDU) • The FPP is a pipelined multithreaded processor that can simultaneously analyze and classify up to 64 protocol data units (PDUs). • Each incoming PDU is assigned its own processing thread, called a context. • Each PDU consists of one or more 64-byte blocks. • The context is a processing path that keeps track of: all blocks of the PDU; the input port number of the PDU; the data offset for the PDU; the last-block information; program variables associated with the PDU; classification information for the PDU.
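The per-PDU context described on this slide can be pictured with a small Python sketch. It is only an illustration: the `Context` class, its field names, and the `open_context` helper are hypothetical and are not taken from Agere documentation. Only the 64-byte block size and the list of tracked items come from the slide.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 64  # bytes per block, as stated on the slide


@dataclass
class Context:
    """Hypothetical per-PDU state; field names are illustrative only."""
    input_port: int
    data_offset: int = 0
    blocks: list = field(default_factory=list)          # all blocks of the PDU
    last_block_len: int = 0                             # last-block information
    variables: dict = field(default_factory=dict)       # program variables
    classification: dict = field(default_factory=dict)  # classification results


def split_into_blocks(pdu: bytes) -> list:
    """Cut a PDU into 64-byte blocks; the final block may be shorter."""
    return [pdu[i:i + BLOCK_SIZE] for i in range(0, len(pdu), BLOCK_SIZE)]


def open_context(pdu: bytes, input_port: int) -> Context:
    """Create the context that follows one PDU through processing."""
    blocks = split_into_blocks(pdu)
    return Context(
        input_port=input_port,
        blocks=blocks,
        last_block_len=len(blocks[-1]) if blocks else 0,
    )


ctx = open_context(bytes(150), input_port=3)
print(len(ctx.blocks), ctx.last_block_len)  # 3 blocks, the last holding 22 bytes
```

A 150-byte PDU occupies two full blocks plus a 22-byte tail, which is why the context has to remember the last-block information separately.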
FPP Block Diagram Source: http://www.hotchips.org/archives/hc13/3_Tue/13agere.pdf Source: http://nps.agere.com/support/non-nda/docs/FPP_Product_Brief.pdf
FPP Functional Description • The input framer frames incoming data into 64-byte blocks. • It writes blocks into the data buffer (SDRAM) and into the block buffers and context memory. • The block buffer stores the data being processed, together with the associated context data needed to execute FPP operations on the incoming data. • The output interface sends PDUs and their classification information to the RSP. • The Pattern Process Engine (PPE) performs pattern matching to determine how incoming PDUs are classified. • The Queue Engine manages FPP replay contexts, provides addresses for block buffers, and maintains information on blocks, PDUs and connection queues.
FPP Functional Description (two pass) • The FPP processes bit streams in two passes. • In the first pass, the PDU blocks are read into the queue engine memory: the data is divided into separate 64-byte blocks; the data offset of each block is determined; links between the individual blocks that compose a PDU are established; the PDU type is identified. • In the second pass (replay phase), the PDU is replayed from the queue engine: the PDU is processed as a whole entity; pattern matching is executed; at the same time, the PDU is transmitted toward the output unit.
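The two passes can be sketched in Python as follows. This is a toy model under stated assumptions: the function names `first_pass` and `replay`, the dictionary layout, and the "first nibble equals 4 means IPv4" type check are all invented for illustration and do not describe the actual FPP hardware. The example places a pattern across a block boundary to show why matching on the replayed, whole PDU is useful.

```python
BLOCK_SIZE = 64  # block size stated on the slides


def first_pass(pdu: bytes) -> dict:
    """Pass 1: cut into blocks, record offsets and links, identify the type."""
    offsets = list(range(0, len(pdu), BLOCK_SIZE))
    blocks = [pdu[o:o + BLOCK_SIZE] for o in offsets]
    # links[i] is the index of the block that follows block i (None = last).
    links = [i + 1 if i + 1 < len(blocks) else None for i in range(len(blocks))]
    # Toy type detection (assumption): a first nibble of 4 is treated as IPv4.
    pdu_type = "ipv4" if pdu and (pdu[0] >> 4) == 4 else "unknown"
    return {"blocks": blocks, "offsets": offsets, "links": links, "type": pdu_type}


def replay(queued: dict, patterns: dict) -> tuple:
    """Pass 2: follow the links to rebuild the PDU, then match on the whole."""
    data = bytearray()
    index = 0 if queued["blocks"] else None
    while index is not None:
        data += queued["blocks"][index]
        index = queued["links"][index]
    whole = bytes(data)
    matches = [name for name, needle in patterns.items() if needle in whole]
    return whole, matches


# "GET /" starts at offset 62, so it straddles the first block boundary.
pdu = bytes([0x45]) + bytes(61) + b"GET /" + bytes(10)
queued = first_pass(pdu)
whole, matches = replay(queued, {"http_get": b"GET /"})
print(queued["type"], len(queued["blocks"]), matches)
```

Neither block contains the full pattern on its own; it is only visible once the linked blocks are replayed as one PDU.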
FPP Top Level Flow Source: http://nps.agere.com/support/non-nda/docs/FPL_Product_Brief.pdf
RSP (Traffic Manager) Features • 64K queues • Programmable shaping (such as VBR, UBR, CBR) • Programmable discard policies (RED, WRED, EPD) • Programmable QoS (CBR, VBR, UBR) • Programmable CoS (Fixed Priority, Round Robin, WRR, WFQ, GFR) • Programmable packet modification • Support for multicast • Generates required checksums/CRC
RSP overview Source: http://www.hotchips.org/archives/hc13/3_Tue/13agere.pdf http://nps.agere.com/support/non-nda/docs/RSP_Product_Brief.pdf
RSP Functional Description • The RSP handles the FPP's classification and analysis results for incoming PDUs. • It supports up to 64 logical input ports. • For each PDU, a command from the FPP instructs the RSP how to handle the PDU. • The PDU is added to a queue and stored in the PDU SDRAM. • The RSP supports up to 64K programmable queues. • Processed data is output on a configurable 32-bit interface. • There is also an 8-bit POS-PHY Level 3 management interface. • The RSP uses custom logic and three Very Long Instruction Word (VLIW) compute engines to process PDUs.
VLIW Compute Engines • The compute engines operate in a pipelined fashion. • Each compute engine is dedicated to a processing function: the Traffic Management Engine enforces discard policies and keeps queue statistics; the Traffic Shaper Engine ensures QoS and CoS for each queue; the Stream Editor Engine performs the necessary PDU modifications. • Each queue definition in the RSP includes the destination, scheduling information, and pointers to programs for each of the three VLIW compute engines. • Therefore, the RSP can run multiple protocols at the same time. • The external CPU can also add queue definitions, for example to set up ATM virtual circuits.
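Because each queue definition carries its own pointers to the three engine programs, two queues can apply entirely different handling. The Python sketch below models that idea only loosely: `QueueDefinition`, its field names, the callable signatures, and the example policies are hypothetical and are not the RSP's real programming interface.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QueueDefinition:
    """Hypothetical queue definition; field names are illustrative."""
    destination: int
    traffic_manager: Callable[[bytes], bool]  # True = accept, False = discard
    traffic_shaper: Callable[[bytes], float]  # returns a transmit delay
    stream_editor: Callable[[bytes], bytes]   # returns the modified PDU


def process(pdu: bytes, qdef: QueueDefinition) -> Optional[tuple]:
    """Run one PDU through the three per-queue programs in pipeline order."""
    if not qdef.traffic_manager(pdu):
        return None  # discarded by the queue's policy
    delay = qdef.traffic_shaper(pdu)
    return qdef.destination, delay, qdef.stream_editor(pdu)


# Two queues with different programs, standing in for two protocols.
tagged = QueueDefinition(
    destination=1,
    traffic_manager=lambda p: len(p) <= 1500,  # drop oversized PDUs
    traffic_shaper=lambda p: 0.0,              # send immediately
    stream_editor=lambda p: b"\x81\x00" + p,   # prepend two example bytes
)
plain = QueueDefinition(
    destination=2,
    traffic_manager=lambda p: True,
    traffic_shaper=lambda p: 0.001,
    stream_editor=lambda p: p,
)

print(process(b"abc", tagged))
print(process(b"abc", plain))
print(process(bytes(2000), tagged))
```

Adding a new protocol in this model means adding a queue definition rather than changing the pipeline, which mirrors the slide's point about the external CPU adding queue definitions.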
RSP Data Flow The RSP has 3 major processing stages: • Stage 1 prepares and queues the PDU for scheduling: assembles the blocks into a PDU in SDRAM; determines the destination queue; determines whether the PDU should be queued and, if so, adds it to the appropriate queue for scheduling. • Stage 2 selects the next PDU block to be scheduled: selects the physical port; selects the logical port; selects the scheduler; selects the QoS queue; selects the CoS queue. • Stage 3 modifies and transmits the PDU on the appropriate output ports: adjusts the QoS transmit intervals and CoS priority; performs PDU modifications; performs the AAL5 CRC if necessary. Source: http://nps.agere.com/support/non-nda/docs/RSP_Product_Brief.pdf
Hierarchical Scheduling (Internal Scheduling Logic) • Channels: The output interface provides a 32-bit data path that supports 1 to 4 POS-PHY or UTOPIA channels. It also has an 8-bit management output. • Physical ports: Physical output ports are assigned to channels. There are up to 32 physical ports, since there are 32 back-pressure signals. • Logical ports: The RSP supports up to 256 logical output ports. • Schedulers: A set of schedulers is defined for each logical port. The RSP supports CBR, VBR and UBR schedulers. • QoS queues: Each QoS queue is assigned to a single scheduler. • CoS queues: Up to 16 CoS queues feed a single QoS queue. Source: http://nps.agere.com/support/non-nda/docs/RSP_Product_Brief.pdf
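The lower two levels of this hierarchy (CoS queues feeding a QoS queue, and QoS queues attached to a scheduler) can be sketched in Python. The sketch makes simplifying assumptions that are not from the product brief: CoS selection is fixed priority, and the scheduler serves its QoS queues round-robin as a stand-in for the CBR/VBR/UBR logic. The class names are hypothetical; only the limit of 16 CoS queues per QoS queue comes from the slide.

```python
from collections import deque

MAX_COS_PER_QOS = 16  # limit stated on the slide


class QosQueue:
    """One QoS queue fed by up to 16 CoS queues (fixed priority, 0 = highest)."""

    def __init__(self, n_cos: int):
        if not 1 <= n_cos <= MAX_COS_PER_QOS:
            raise ValueError("a QoS queue is fed by 1 to 16 CoS queues")
        self.cos = [deque() for _ in range(n_cos)]

    def enqueue(self, cos_level: int, pdu: str) -> None:
        self.cos[cos_level].append(pdu)

    def dequeue(self):
        for q in self.cos:  # fixed-priority CoS selection
            if q:
                return q.popleft()
        return None


class Scheduler:
    """Serves its QoS queues round-robin (stand-in for CBR/VBR/UBR logic)."""

    def __init__(self, qos_queues):
        self.qos_queues = list(qos_queues)
        self._next = 0

    def dequeue(self):
        for _ in range(len(self.qos_queues)):
            q = self.qos_queues[self._next]
            self._next = (self._next + 1) % len(self.qos_queues)
            pdu = q.dequeue()
            if pdu is not None:
                return pdu
        return None


voice, data = QosQueue(2), QosQueue(2)
voice.enqueue(1, "v-low")
voice.enqueue(0, "v-high")
data.enqueue(0, "d-high")
sched = Scheduler([voice, data])
print([sched.dequeue() for _ in range(4)])
```

Within a QoS queue the higher-priority CoS queue is drained first, while the scheduler alternates between its QoS queues.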
ASI • The ASI integrates the FPP and RSP with the host processor. • It lets the designer: centrally initialize and configure the NP system and its physical interfaces; send routing and VPI/VCI updates to the system; implement various routing and management protocols; handle exceptions. • The ASI enables high-speed, flow-oriented state maintenance: gathering Remote Network Monitoring (RMON) statistics; time-stamping packets; checking packet sequence; policing ATM and frame relay at up to OC-48 rates. • An 8-bit POS-PHY interface lets the ASI send packets to the FPP and receive them from the RSP.
How Does ASI Work? • It has a PCI interface for communication with the host processor. • A 32-bit high-speed interface (FBI) receives function calls from the FPP. • Two ALUs process FPP external function requests for: maintaining state and statistics; policing (leaky bucket). • Two SSRAM interfaces allow memory access for different tasks without contention. Source: http://nps.agere.com/support/non-nda/docs/ASIProductBrief.pdf
ASI Configuration Capabilities • The ASI enables the host processor to configure up to 8 devices. • The configuration bus is compatible with both Intel and Motorola bus formats. • It is used to: initialize and configure the FPP and RSP; load the program code for the FPP and RSP; load dynamic updates to the FPP tables and RSP queues; configure third-party external framers and physical interfaces.
Policy and Conformance Checking • The ASI performs conformance checking (policing) for up to 64K connections at OC-48 rates. • It only marks traffic; it does not schedule or shape. • Several variations of the GCRA (leaky-bucket) algorithm can be used. • In the dual leaky bucket case, the ASI indicates whether cells or frames are compliant and, if not, which bucket the nonconformance came from.
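A dual leaky bucket of the kind mentioned above can be sketched with the virtual-scheduling form of GCRA. This is a generic textbook formulation, not the ASI's implementation: the `Gcra` class, the `police` function, the bucket names "peak" and "sustained", and the choice to leave both buckets unchanged when a cell fails either check are assumptions made for the example. Like the ASI, the sketch only reports a verdict and does not delay or reorder traffic.

```python
class Gcra:
    """GCRA(increment, limit) in virtual-scheduling form."""

    def __init__(self, increment: float, limit: float):
        self.increment = increment  # expected spacing between cells
        self.limit = limit          # how early a cell may arrive
        self.tat = 0.0              # theoretical arrival time of the next cell

    def conforms(self, now: float) -> bool:
        return now >= self.tat - self.limit

    def update(self, now: float) -> None:
        self.tat = max(now, self.tat) + self.increment


def police(now: float, peak: Gcra, sustained: Gcra) -> tuple:
    """Return (compliant, failing_bucket) for a cell arriving at `now`."""
    if not peak.conforms(now):
        return False, "peak"
    if not sustained.conforms(now):
        return False, "sustained"
    # Assumption: state advances only when the cell passes both buckets.
    peak.update(now)
    sustained.update(now)
    return True, None


peak = Gcra(increment=1.0, limit=0.0)
sustained = Gcra(increment=4.0, limit=6.0)
for t in (0.0, 0.5, 1.0, 2.0, 3.0):
    print(t, police(t, peak, sustained))
```

In this run the cell at 0.5 violates the peak-rate bucket, and the cell at 3.0 passes the peak check but exhausts the burst tolerance of the sustained-rate bucket, so the verdict names a different bucket in each case.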
FPL • FPL is a functional language for classification. • In a functional language, the programmer tells the computing resources what to do rather than how to do it. • In FPL you describe the protocols and the actions used to process them. • In C you have to say how to process the protocols. • FPL code tends to be much shorter and easier to debug and modify.
FPL Main Features • Fast pattern matching and classification of the data stream. • Defining functions for the FPP to execute based on the recognized patterns • Easy to read semantics • Dynamic updating of the code in the FPP • Software development tool set
Two Pass Processing • Recall the two-pass processing in the FPP. • The first pass performs preliminary processing, such as identifying the PDU type. • In the second pass (replay), the FPP can simply transmit the PDU and its conclusions, or it can process a higher-level protocol. • The queue engine allows you to process PDUs embedded in higher-layer protocols in the replay phase.
Sample FPL Program Flow Source: http://nps.agere.com/support/non-nda/docs/FPL_Product_Brief.pdf
FPL code example Source: http://www.hotchips.org/archives/hc13/3_Tue/13agere.pdf
Dynamic FPL Program Changes • You can add and delete certain types of FPL statements from the FPL code image dynamically. • FPL supports two types of pattern statement structures. • Single-rule patterns (called flows) have a single pattern to match, with one or two functions to perform; they cannot be added or removed dynamically. • Multiple-rule pattern statements (called trees) let you define tables, for example IP routing tables; you can add or delete statements in existing trees, but you cannot add a tree dynamically.
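The behaviour of a tree (a fixed table whose individual rules can be added and deleted at run time) can be modelled in Python with a small longest-prefix-match routing table. This is not FPL syntax and not how the FPP stores its tables: the `RouteTree` class, its methods, and the action strings are hypothetical, and the sketch handles IPv4 only.

```python
import ipaddress


class RouteTree:
    """Stand-in for an FPL 'tree': a table whose rules can change at run time."""

    def __init__(self):
        self._rules = {}  # (network_int, prefix_len) -> action

    def add(self, prefix: str, action: str) -> None:
        net = ipaddress.ip_network(prefix)
        self._rules[(int(net.network_address), net.prefixlen)] = action

    def delete(self, prefix: str) -> None:
        net = ipaddress.ip_network(prefix)
        self._rules.pop((int(net.network_address), net.prefixlen), None)

    def lookup(self, address: str):
        """Longest-prefix match: at most 33 probes, whatever the table size."""
        addr = int(ipaddress.ip_address(address))
        for length in range(32, -1, -1):
            mask = (0xFFFFFFFF << (32 - length)) & 0xFFFFFFFF
            action = self._rules.get((addr & mask, length))
            if action is not None:
                return action
        return None


tree = RouteTree()                 # the tree itself exists up front
tree.add("10.0.0.0/8", "port1")    # rules are added dynamically
tree.add("10.1.0.0/16", "port2")
print(tree.lookup("10.1.2.3"))     # the more specific /16 rule wins
tree.delete("10.1.0.0/16")         # and deleted dynamically
print(tree.lookup("10.1.2.3"))     # falls back to the /8 rule
```

The number of probes per lookup depends on the address width rather than on how many rules are loaded, which is in the spirit of the deterministic lookup performance claimed for the FPP.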
Performance of the Network Processor • Drop in the performance due to the N+1 problem
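The slide does not define the N+1 problem. Assuming it refers to the usual block-boundary effect (a PDU one byte longer than a multiple of the 64-byte block size needs a whole extra block), the cost can be worked out with a few lines of Python. The function names are illustrative.

```python
import math

BLOCK_SIZE = 64  # FPP block size from the slides


def blocks_needed(pdu_len: int) -> int:
    """Number of 64-byte blocks a PDU of pdu_len bytes occupies."""
    return math.ceil(pdu_len / BLOCK_SIZE)


def block_efficiency(pdu_len: int) -> float:
    """Fraction of the allocated block bytes that carry PDU data."""
    return pdu_len / (blocks_needed(pdu_len) * BLOCK_SIZE)


for size in (64, 65, 128, 129):
    print(size, blocks_needed(size), round(block_efficiency(size), 3))
```

Under that assumption, a 64-byte PDU fills its single block completely, while a 65-byte PDU needs two blocks and uses only about 51% of them, so the per-byte processing cost nearly doubles at the boundary.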
Network Processor Performance • Performance evaluation for a mixture of packet sizes • Performance drops when the number of computations per packet increases