Control
Fred Kuhns (fredk@arl.wustl.edu)
Applied Research Laboratory, Department of Computer Science and Engineering, Washington University in St. Louis
Virtual Networking – Basic Concepts
• Substrate Links interconnect adjacent Substrate Routers; substrate links may be tunneled within existing networks (IP, MPLS, etc.)
• A Substrate Router hosts one or more Meta Router instances
• Meta Links interconnect adjacent Meta Routers and are defined within the context of a substrate link
Adding a Node
• Install the new substrate router
• Define meta-links between meta nodes (routers or hosts)
• Create substrate links between peers
• Instantiate meta router(s)
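A minimal workflow sketch, assuming a hypothetical node-manager/substrate API (every function name below is an illustrative placeholder, not part of the actual substrate interface; the substrate links are created before the meta-links that are defined within them):

```python
# Hypothetical workflow for bringing a new substrate node into a meta-network.
# All object and method names are illustrative placeholders.

def add_node(substrate, peers, meta_net_id):
    node = substrate.install_router()                        # install new substrate router
    for peer in peers:
        slink = substrate.create_substrate_link(node, peer)  # substrate link to each peer
        substrate.define_meta_link(meta_net_id, slink)       # meta-link within that context
    return substrate.instantiate_meta_router(node, meta_net_id)
```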
System Components
• General purpose processing engines (PE/GP)
  • Shared: PlanetLab VM environment
    • Local PlanetLab node manager to configure and manage VMs
    • vserver, vnet may change to support substrate functions
    • Implement substrate functions in the kernel: rate control, mux/demux, substrate header processing
  • Dedicated: no local substrate functions
    • May choose to implement substrate header processing and rate control
  • Substrate uses VLANs to ensure isolation (VLAN == MRid); a demux sketch follows this list
    • Can use 802.1Q priorities to isolate traffic further
• NP blades (PE/NP)
  • Shared: the user supplies parse and header-formatting code
  • Dedicated: the user has full access to and control over the hardware device
• General Meta-Processing Engine (MPE) notes:
  • Use the loopback to enforce rate limits between dedicated MPEs
  • A legacy node is modeled as a dedicated MPE; use the loopback blade to remove/add substrate headers
• Substrate links: interconnect substrate nodes
  • Meta-links are defined within their context
  • Assume an external entity configures end-to-end meta-nets and meta-links
  • Substrate links are configured outside of the node manager's context
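A minimal sketch of the VLAN-based demux idea, assuming an illustrative 4-byte substrate header whose field names and widths are not the actual header layout:

```python
# How a shared PE might demux traffic by VLAN, treating the VLAN ID as the
# meta-router ID (MRid). Header format and handler registry are hypothetical.
import struct
from dataclasses import dataclass

@dataclass
class SubstrateHeader:
    mr_id: int   # meta-router ID == VLAN ID on the fabric switch
    mi_id: int   # meta-interface the packet arrived on

def demux(frame: bytes, handlers: dict) -> None:
    """Strip the hypothetical 4-byte substrate header and dispatch by MRid."""
    mr_id, mi_id = struct.unpack_from("!HH", frame, 0)
    hdr = SubstrateHeader(mr_id, mi_id)
    payload = frame[4:]
    handler = handlers.get(mr_id)
    if handler is not None:
        handler(hdr, payload)   # deliver to the meta-router's MPE
    # else: no meta-router bound to this VLAN; drop or report to the node manager

# usage
handlers = {17: lambda h, p: print(f"MR {h.mr_id}, MI {h.mi_id}: {len(p)} bytes")}
demux(struct.pack("!HH", 17, 2) + b"payload", handlers)
```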
Switch
• Switch Blade Specs: Promentum™ ATCA-2210
  • http://www.radisys.com/products/ds-page.cfm?productdatasheetsid=1191
  • 20-port 10GE fabric switch
    • 14 10GE links to user slots
    • 4 10GE links for external connections (up/cross links) on the front panel
  • 24-port 1GE Base switch
    • 14 1GE links to user slots
    • 1GE link to the redundant switch blade
    • 1 10GE and 4 1GE links for external connections (up/cross links) on the front panel
  • Wire-speed L2 and L3 switching
  • 4K IEEE 802.1Q VLANs
  • Etc.
• Traversing the Switch:
  • Switching is based on the Ethernet destination address
  • Isolation is based on VLAN
  • One VLAN is assigned to each MetaNet present on a Substrate Router
  • All switch traffic for a MetaNet is required to use its assigned VLAN
  • Frames from a MetaNet are only transmitted to a port that is allowed to receive the specified VLAN (see the sketch below)
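A minimal sketch, assuming a simple per-port VLAN membership table with made-up contents, of the forwarding decision described above:

```python
# Illustrative model of the fabric switch's isolation rule: a frame tagged with
# a MetaNet's VLAN is delivered only to ports that are members of that VLAN.

# Hypothetical VLAN membership: VLAN ID -> set of ports allowed to carry it.
vlan_members = {
    100: {1, 2, 5},    # MetaNet A: line card port 1, PE ports 2 and 5
    200: {1, 3},       # MetaNet B
}

# Hypothetical learned MAC table: destination MAC -> egress port.
mac_table = {"00:11:22:33:44:55": 5}

def forward(dst_mac: str, vlan_id: int):
    """Return the egress port, or None if the frame must be dropped."""
    port = mac_table.get(dst_mac)
    if port is None:
        return None                            # unknown destination (flooding omitted)
    if port not in vlan_members.get(vlan_id, set()):
        return None                            # egress port not in the MetaNet's VLAN
    return port

print(forward("00:11:22:33:44:55", 100))       # -> 5
print(forward("00:11:22:33:44:55", 200))       # -> None (port 5 not in VLAN 200)
```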
Packet Processing
• Key features (IXP2800 network processor):
  • 16 32-bit, 1.4 GHz microengines
    • peak instruction rate > 20 GIPS
    • 8 hardware contexts per processor
    • supports > 50 instructions/byte (input & output)
    • pipeline connections for streaming
  • four QDR SRAM interfaces and three RDRAM interfaces
    • high I/O bandwidth (up to 20 Gb/s)
  • XScale control processor
  • encryption/decryption engine
System Architecture
• General purpose blades
  • shared blades run the PlanetLab OS
    • no change to current apps
  • also support dedicated blades
  • use a separate blade server to preserve ATCA slots for NPs
• NP blades
  • support dedicated PEs
    • control from a vserver on the PE/GP
  • shared PE options
    • shared NP for fast path
    • shared NP with plugins
• 10 GE fabric switch
  • VLANs used to isolate meta-routers
  • uplinks for connecting multiple chassis
• Good ratio of PEs to LCs: 3:1
[Figure: chassis layout showing PE/GP and PE/NP blades (Radisys 7010, with RTM providing up to 10 1GE interfaces), a compute blade with disk, line cards, and the switch blade: 10 Gb/s switching for data, 1 GE for control.]
Block Diagram of a Meta-Router
• Control/management uses the Base channel (Control Net: IPv4)
• Meta Interfaces (MI): each MI is connected to a meta-link
• MPEs are interconnected in the data plane by a meta-switch; each packet includes a Meta-Router and Meta-PE identifier
• Some substrate-detected errors or events are reported to the Meta-Router's "control" MPE
• Meta-Processing Engines (MPE):
  • virtual machine, COTS PC, NPU, FPGA
  • PEs differ in ease of "programming" and performance
  • an MR may use one or more PEs, possibly of different types
• Naming: the first MPE assigned to Meta-Network MNetk is called MPEk1
[Figure: a Meta-Router with meta-interfaces MI0 to MI5 (rate limits such as 1G, 2G, .5G), MPEk1 and MPEk2 on the data path and MPEk3 for control, all connected through the Meta Switch.]
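A minimal data-model sketch, assuming illustrative field names (the MPid, MIid, rates, and control flag below are placeholders), of how a meta-router and its MPEs and meta-interfaces might be represented:

```python
# Illustrative representation of a meta-router (MR) built from meta-processing
# engines (MPEs) and meta-interfaces (MIs), as described in the block diagram.
from dataclasses import dataclass, field

@dataclass
class MetaInterface:
    mi_id: int
    rate_gbps: float          # rate limit enforced by the substrate
    meta_link: int            # meta-link this MI attaches to

@dataclass
class MPE:
    mp_id: int
    kind: str                 # e.g. "vm", "cots-pc", "npu", "fpga"
    is_control: bool = False  # receives substrate-detected errors/events

@dataclass
class MetaRouter:
    mr_id: int                # also used as the VLAN ID on the fabric switch
    mpes: list = field(default_factory=list)
    mis: list = field(default_factory=list)

    def control_mpe(self):
        return next((m for m in self.mpes if m.is_control), None)

# Example: MNetk's router with two data-path MPEs and one control MPE.
mrk = MetaRouter(mr_id=7,
                 mpes=[MPE(1, "npu"), MPE(2, "npu"), MPE(3, "vm", is_control=True)],
                 mis=[MetaInterface(0, 1.0, 40), MetaInterface(1, 2.0, 41)])
print(mrk.control_mpe())
```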
System Block Diagram
[Figure: node-level block diagram. PE/NP blades (dual NPUs with XScale control processors and TCAM), PE/GP blades, and line cards (RTM with 10 x 1GbE) attach to two switch planes: the Fabric Ethernet Switch (10 Gb/s, data path, with VLAN mapping between ports, e.g. VLANX to VLANY) and the Base Ethernet Switch (1 Gb/s, control). A loopback blade supports substrate-header add/remove. The Node Server hosts the Node Manager and user login accounts; the shelf manager is reached over I2C (IPMI).]
Top-Level View (exported) of the Node
• The Node Server (substrate control, node manager, user login accounts) exports a Node Resource List describing processing engines and substrate links, e.g.:
  • PE/NP: (control, IPaddr), (platform, IXP2800), (type, IXP_SHARED or IXP_DEDICATED), …
  • PE/GP: (control, IPaddr), (platform, x86), (type, linux_vserver or dedicated), …
  • S-Link: (type, p2p), (peer, _Desc_), (BW, X Gbps), …
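A minimal sketch, with made-up attribute values, of what the exported node resource list might look like if serialized as a simple dictionary (the attribute names follow the slide; the encoding itself is an assumption):

```python
# Hypothetical serialization of the exported node resource list.
import json

exported_node = {
    "processing_engines": [
        {"control": "10.0.0.11", "platform": "IXP2800", "type": "IXP_SHARED"},
        {"control": "10.0.0.12", "platform": "IXP2800", "type": "IXP_DEDICATED"},
        {"control": "10.0.0.21", "platform": "x86", "type": "linux_vserver"},
        {"control": "10.0.0.22", "platform": "x86", "type": "dedicated"},
    ],
    "substrate_links": [
        {"type": "p2p", "peer": "node-B.example.net", "bw_gbps": 10},
    ],
}

print(json.dumps(exported_node, indent=2))
```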
Substrate: Enabling an MR (Meta-Router MR1 for MNetk)
• Allocate the control-plane MPE (required)
• Allocate data-plane MPEs
• Enable VLANk on the applicable fabric switch ports
• Enable control over the Base switch (IP-based)
• Update shared MPEs for MI and inter-MPE traffic
• Define the Meta-Interface mappings
• Use the loopback to define interfaces internal to the system node (e.g. a host located within the node)
• Update the host with the local net gateway
[Figure: the MNetk data plane (MPEk1, MPEk2, MPEk3 on VLANk over the 10GbE fabric, meta-interfaces MI0 to MI4, line cards, loopback) and the MNetk control and management plane over the Base switch; the steps are numbered on the figure.]
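A hedged sequencing sketch of the "enable an MR" steps above; the substrate object and its method names are hypothetical, and only the order of operations follows the slide:

```python
# Hypothetical enable-MR sequence: allocate MPEs, enable the MR's VLAN on the
# fabric switch, open the Base-switch control path, then bind meta-interfaces.

def enable_meta_router(substrate, mnet_id, vlan_id, mr_spec):
    control = substrate.allocate_mpe(mnet_id, role="control")        # required
    data = [substrate.allocate_mpe(mnet_id, role="data")
            for _ in mr_spec["data_mpes"]]
    substrate.enable_vlan(vlan_id, ports=[m.fabric_port for m in [control, *data]])
    substrate.enable_base_control(mnet_id, control.ip_addr)          # IP-based control path
    for mi in mr_spec["meta_interfaces"]:
        substrate.bind_meta_interface(mnet_id, mi["mi_id"], mi["substrate_link"])
    return control, data
```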
Block Diagram: Packet Mapping and Rate Control
• Line cards map each received packet to an MR and MI via a lookup table; on output, a lookup table maps packets to a (port, meta-link) pair
• Each MR:MI pair is assigned its own rate-controlled queue; meta-interfaces are rate controlled
• Shared PE/NP and PE/GP blades host multiple meta-routers (e.g. MR1 … MR5); a dedicated PE hosts a single meta-router
• Meta-net control and management functions (configure, stats, routing, etc.) run on the Node Server and communicate with the MR over the separate Base switch (control)
[Figure: line cards and fabric switch, shared and dedicated PEs with their lookup tables (e.g. MR5:MI1), Node Server with "VM" manager (VMM), per-meta-net control VMs/slices, the Base switch for control, and connections to the Internet and app-level services.]
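An illustrative sketch of the line-card lookup described above; identifiers and rates are made up, and the rate limiting itself is only indicated by a comment:

```python
# Map an incoming (port, meta-link) to the owning meta-router and
# meta-interface, and enqueue on that MR:MI's rate-controlled queue.
from collections import deque

# (ingress port, meta-link id) -> (MRid, MIid)
rx_lookup = {
    (1, 40): (5, 1),   # e.g. MR5:MI1 in the figure
    (2, 41): (3, 0),
}

# One queue per MR:MI pair; the rate limit would be enforced by a scheduler.
queues = {(5, 1): {"rate_gbps": 1.0, "q": deque()},
          (3, 0): {"rate_gbps": 0.5, "q": deque()}}

def receive(port: int, meta_link: int, packet: bytes) -> None:
    key = rx_lookup.get((port, meta_link))
    if key is None:
        return                       # no meta-router bound: drop (or report upstream)
    queues[key]["q"].append(packet)  # drained at the MI's configured rate

receive(1, 40, b"example packet")
print(len(queues[(5, 1)]["q"]))      # -> 1
```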
Partitioning the Control Plane
• Substrate manager
  • Initialization: discover system HW components and capabilities (blades, links, etc.)
  • Hides low-level implementation details
  • Interacts with the shelf manager to reset boards or detect failures
• Node manager
  • Initialization: request the system resource list
  • Operational: allocate resources to meta-networks (slice authorities?)
  • Request the substrate to reset MPEs
• Substrate assumptions:
  • All MNets (slices) with a locally defined meta-router/service (sliver) have a control process to which the substrate can send exception packets and event notifications
  • Communication: out-of-band uses the Base interface and internal IP addresses; in-band uses the data plane and MPE id
  • Notifications: ARP errors, improperly formatted frames, interface down/up, etc.
  • If a meta-link is a pass-through link, the node manager is responsible for handling meta-net-level errors/event notifications (for example, a link going down)
Initialization: Substrate Resource Discovery
• Creates a list of devices and their Ethernet addresses
  • Network Processor (NP) blades: Type: network-processor, Arch: ixp2800, Memory: 768MB (DRAM), Disk: 0, Rate: 5Gbps
  • General Processor (GP) blades: Type: linux-vserver, Arch: X, Memory: X, Disk: X, Rate: X
  • Line Card blades:
    • not exposed to the node manager; used to implement meta-interfaces
    • another entity creates substrate links to interconnect peer substrate nodes
    • create a table mapping line card blades, physical links, and Ethernet addresses
• Internal representation:
  • Substrate device ID: <ID, SDid>
  • If the device has a local control daemon: <Control, IP Address>
  • Type = Processing Engine (NP/GP): <Platform, (Dual IXP2800|Xeon|???)>, <Memory, #>, <Storage, #>, <Clock, (1.4GHz|???)>, <Fabric, 10GbE>, <Base, 1GbE>, ???
  • Type = Line Card: <Platform, Dual IXP2800>, <Ports, {<Media, Ethernet>, <Rate, 1Gbps>}>, ???
  • Substrate Links: <Type, p2p>, <Peer, Ethernet Address>, <Rate Limit>, …
  • Meta-Link list: <MLid, MLI>, <MR, MRid>, …
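A minimal sketch, using the attribute pairs listed above (the concrete values and the dictionary encoding are assumptions), of how the discovery step might record devices internally:

```python
# Hypothetical in-memory form of the discovery results: one record per device,
# keyed by substrate device ID (SDid), using the attribute names from the slide.
devices = {
    "SD1": {"Type": "ProcessingEngine", "Control": "10.1.0.11",
            "Platform": "Dual IXP2800", "Memory": "768MB", "Storage": 0,
            "Clock": "1.4GHz", "Fabric": "10GbE", "Base": "1GbE"},
    "SD2": {"Type": "LineCard", "Platform": "Dual IXP2800",
            "Ports": [{"Media": "Ethernet", "Rate": "1Gbps"}] * 10,
            "Ethernet": "00:aa:bb:cc:dd:01"},
}

substrate_links = [
    {"Type": "p2p", "Peer": "00:aa:bb:cc:dd:02", "RateLimit": "5Gbps"},
]

# e.g. list all processing engines that expose a local control daemon
pes = [sid for sid, d in devices.items()
       if d["Type"] == "ProcessingEngine" and "Control" in d]
print(pes)   # -> ['SD1']
```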
Initialization: Exported Resource Model
• List of available elements
  • Attributes of interest? Platform: IXP2800, PowerPC, ARM, x86; Memory: DRAM/SRAM; Disk: X GB; Bandwidth: 5Gbps; VM_Type: linux-vserver, IXP_Shared, IXP_Dedicated, G__Dedicated; Special: TCAM
  • network-processor: NP-Shared, NP-Dedicated
  • General purpose: GP-Shared (linux-vserver), GP-Dedicated
  • Each element is assigned an IP address for control (internal control LAN)
• List of available substrate links:
  • Access networks (expect an Ethernet LAN interface): the substrate link is multi-access
    • Attributes: Access: multi-access; Available Bandwidth; Legacy protocol(s) (i.e. IP); Link protocol (i.e. Ethernet); Substrate ARP implementation
  • Core interface: assume point-to-point, bandwidth controlled
    • Attributes: Access: Substrate; Bandwidth; Legacy protocol?
Instantiate a Router: Register MNet
• Substrate assumptions:
  • All MNets (slices) with a locally defined meta-router/service (sliver) will have defined a control process to which the substrate can send exception packets and event notifications
  • Communication: out-of-band uses the Base interface and internal IP addresses; in-band uses the data plane. ???
  • Notifications: ARP errors, improperly formatted frames, interface down/up, etc.
  • If a meta-link is a pass-through link, the node manager is responsible for handling errors/event notifications
• Node manager actions:
  • Request binding of MNidk to the allocated device (use the SDid from initialization)
  • Substrate enables VLANk on the applicable ports of the fabric switch
  • Allocate hardware resources (see the following discussion of the different scenarios)
  • If the control module is already instantiated, notify it of the MR location (IP address of the control interface)
  • If creating the control entity, register it with any line cards that have meta-router interfaces (for exception traffic). ???
Instantiate a Router: Register Meta-Router (MR)
• Define MR-specific Meta-Processing Engines (MPE):
  • Register MR ID MRidk with the substrate
    • the substrate allocates VLANk and binds it to MRidk
  • Request Meta-Processing Engines
    • shared or dedicated, NP or GP; if shared, give the relative allocation (rspec)
    • shared: implies the internal implementation has support for substrate functions
    • dedicated w/substrate: the user implements substrate functions
    • dedicated no/substrate: implies the substrate will remove any substrate headers from data packets before delivering them to the MPE (for legacy systems)
    • indicate if this MPE is to receive control events from the substrate (Control_MPE)
  • The substrate returns an MPE id (MPid) and control IP address (MPip) for each allocated MPE
    • the substrate internally records the Ethernet address of the MPE and enables the VLAN on the applicable port
    • the substrate assumes that any MPE may send data traffic to any other MPE
    • an MPE specifies the target MPE rather than an MI when sending such packets (see the allocation sketch below)
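A hedged sketch of the MPE-allocation exchange described above; the request and reply field names (MPid, MPip) follow the slide, while everything else, including the Substrate class, is an illustrative stand-in:

```python
# Register an MR ID (binding it to a VLAN), then request MPEs and receive an
# MPid and control IP (MPip) for each one.
import itertools
from dataclasses import dataclass

@dataclass
class MPEAllocation:
    mp_id: int        # MPid returned by the substrate
    mp_ip: str        # MPip: control address on the internal (Base) network

class Substrate:
    def __init__(self):
        self._next_vlan = itertools.count(100)
        self._next_mpid = itertools.count(1)
        self.vlans = {}                      # MRid -> VLAN

    def register_mr(self, mr_id: int) -> int:
        """Allocate a VLAN and bind it to the meta-router ID."""
        self.vlans[mr_id] = next(self._next_vlan)
        return self.vlans[mr_id]

    def request_mpe(self, mr_id: int, kind: str, shared: bool,
                    control_mpe: bool = False) -> MPEAllocation:
        """Allocate an MPE (shared/dedicated, NP/GP) and return its MPid/MPip."""
        mp_id = next(self._next_mpid)
        # the real substrate would also record the MPE's Ethernet address and
        # enable the MR's VLAN on the corresponding switch port here
        return MPEAllocation(mp_id, f"10.2.{mr_id}.{mp_id}")

sub = Substrate()
vlan = sub.register_mr(mr_id=7)
ctl = sub.request_mpe(7, kind="GP", shared=True, control_mpe=True)
print(vlan, ctl)
```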
Instantiate a Router: Register Meta-Router (MR), continued
• Create meta-interfaces (with BW constraints)
  • create meta-interfaces associated with external substrate links
    • request that a meta-interface id (MIid) be bound to substrate link x (SLx)
    • we need to work out the details of how an SL is specified
    • we need to work out the details of who assigns inbound versus outbound meta-link identifiers (when they are used); if the downstream node assigns them, then some entity (node manager?) reports the outgoing label and this node assigns the inbound label
    • multi-access substrate/meta link: the node manager or meta-router control entity must configure the meta-interface for ARP; set the local meta-address and send the destination address with each output data packet
    • the substrate updates tables to bind the MI to the "receiving" MPE (i.e. where the substrate sends received packets); see the binding sketch below
  • create meta-interfaces for delivery to internal devices (for example, legacy PlanetLab nodes)
    • create a meta-interface associated with an MPE (i.e. the end system)
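An illustrative sketch of binding meta-interfaces to substrate links, under the stated assumption that the substrate keeps a table mapping each MI to the MPE that receives its packets; all names and numbers are placeholders:

```python
# Bind an MIid to a substrate link, record its bandwidth constraint, the
# receiving MPE, and the (optional) inbound/outbound meta-link labels.

mi_table = {}   # (MRid, MIid) -> binding record

def create_meta_interface(mr_id: int, mi_id: int, substrate_link: str,
                          bw_gbps: float, receiving_mpe: int,
                          inbound_label=None, outbound_label=None) -> None:
    """Bind MIid to a substrate link and record where received packets go."""
    mi_table[(mr_id, mi_id)] = {
        "substrate_link": substrate_link,   # e.g. "SL3"; how an SL is named is still open
        "bw_gbps": bw_gbps,                 # bandwidth constraint on this MI
        "receiving_mpe": receiving_mpe,     # MPE the substrate delivers to
        "inbound_label": inbound_label,     # assigned by this node (when labels are used)
        "outbound_label": outbound_label,   # reported for the downstream node
    }

create_meta_interface(mr_id=7, mi_id=0, substrate_link="SL3",
                      bw_gbps=1.0, receiving_mpe=1, inbound_label=40)
print(mi_table[(7, 0)])
```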
Line Cards: Assumptions
• Initially use a simplified model
  • Core interfaces have point-to-point substrate links that correspond (physically or logically) to physical links
  • LAN interfaces only support legacy IP traffic
Scenarios
• Shared PE/NP: send the request to the device controller on the XScale
  • Allocate memory for the MR control block
  • Allocate microengines and load MR code for the Parser and Header Formatter
  • Allocate meta-interfaces (output queues) and assign bandwidth constraints
• Dedicated PE/NP
  • Notify the device control daemon that it will be a dedicated device; this may require loading/booting a different image?
• Shared GP
  • use the existing/new PlanetLab framework
• Dedicated GP
  • legacy PlanetLab node
  • other
IPv4
• Create the default IPv4 Meta-Router, initially in the non-forwarding state
  • Register the MetaNet: output Meta-Net ID = MNid
  • Instantiate the IPv4 router: output Meta-Router ID = MRid
• Add interfaces for legacy IPv4 traffic:
  • The substrate supports defining a default protocol handler (Meta-Router) for non-substrate traffic
  • For protocol=IPv4, send to the IPv4 meta-router (specify the corresponding MPE); see the sketch below
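A small sketch of the "default protocol handler" idea: legacy (non-substrate) traffic is classified by protocol and handed to a designated meta-router/MPE. The handler registry and constants are illustrative:

```python
# Register a default handler so that legacy IPv4 frames reach the IPv4
# meta-router's MPE; MRid/MPid values are made up.

ETHERTYPE_IPV4 = 0x0800
default_handlers = {}      # ethertype -> (MRid, MPid) that receives legacy traffic

def set_default_handler(ethertype: int, mr_id: int, mp_id: int) -> None:
    default_handlers[ethertype] = (mr_id, mp_id)

def classify_legacy(ethertype: int):
    """Return the (MRid, MPid) for a legacy frame, or None to drop it."""
    return default_handlers.get(ethertype)

set_default_handler(ETHERTYPE_IPV4, mr_id=1, mp_id=1)
print(classify_legacy(ETHERTYPE_IPV4))   # -> (1, 1)
```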
General Control/Management
• Meta-routers use the Base channel to send requests to the control entity on the associated MPE devices
• The node manager sends requests to a central substrate manager (XML-RPC?; see the sketch after this list)
  • requests to configure, start/stop, and tear down meta-routers (MPEs and MIs)
• The substrate enforces isolation and policies, and monitors meta-router sending rates
  • Rate-exceeded error: if an MPE violates its rate limits, its interface is disabled and the control MPE is notified (over the Base channel)
• Shared NP
  • XScale daemon
  • requests: start/stop forwarding; allocate shared memory for tables; get/set statistics counters; set/alter the MR control lock; add/remove lookup table entries
  • Lookup entries can be added to send data packets to the control MPE; the packet header may contain a tag indicating why the packet was sent
  • mechanism for allocating space for MR-specific code segments
• Dedicated NP
  • The MPE controls the XScale; when the XScale boots, a control daemon is told to load a specific image containing user code
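A hedged sketch of the node manager calling a central substrate manager over XML-RPC, as the slide tentatively suggests ("xml-rpc?"); the endpoint URL and method names are hypothetical, not a defined substrate interface:

```python
# Configure, then start, a meta-router through a hypothetical XML-RPC API.
import xmlrpc.client

def configure_meta_router(url: str, mr_id: int, mpe_specs: list, mi_specs: list):
    proxy = xmlrpc.client.ServerProxy(url)
    # Hypothetical methods: the real substrate manager would define its own API.
    proxy.register_mr(mr_id)
    for spec in mpe_specs:
        proxy.allocate_mpe(mr_id, spec)
    for spec in mi_specs:
        proxy.create_meta_interface(mr_id, spec)
    proxy.start_mr(mr_id)

# Example invocation (only works if such a server is actually running):
# configure_meta_router("http://substrate-manager.local:8000/RPC2", 7,
#                       [{"kind": "NP", "shared": True}],
#                       [{"mi_id": 0, "substrate_link": "SL3", "bw_gbps": 1.0}])
```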
ARP for Access Networks
• The substrate offers an ARP service to meta-routers
• Meta-router responsibilities:
  • before enabling an interface, it must register the meta-network address associated with the meta-interface
  • send the destination (next-hop) meta-net address with packets (part of the substrate internal header); the substrate will ARP on this value
  • if the meta-router wants to use a multicast or broadcast address, it must also supply the link-layer destination address, so the substrate must also export the link-layer type
• Substrate responsibilities:
  • all substrate nodes on an access network must agree on meta-net identifiers (MLIs)
  • issue ARP requests/responses using the supplied meta-net addresses and meta-net id (MLI)
  • maintain the ARP table and time out entries according to the relevant RFCs
  • ARP-failed error: if ARP fails for a supplied address, the substrate must send the packet (or packet context) to the control MPE of the meta-router (see the sketch below)
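An illustrative sketch of the substrate's ARP table for an access network: entries map an (MLI, meta-net address) pair to a link-layer address and expire after a timeout. The timeout value, notification hook, and address formats are assumptions:

```python
# Resolve a next-hop meta-net address to a link-layer address; on failure,
# hand the event to the meta-router's control MPE.
import time

ARP_TIMEOUT_S = 300          # placeholder; real timeouts would follow the RFCs
arp_table = {}               # (mli, meta_addr) -> (link_addr, expiry)

def arp_learn(mli: int, meta_addr: str, link_addr: str) -> None:
    arp_table[(mli, meta_addr)] = (link_addr, time.time() + ARP_TIMEOUT_S)

def arp_resolve(mli: int, meta_addr: str, notify_control_mpe):
    """Return a link-layer address, or report failure to the MR's control MPE."""
    entry = arp_table.get((mli, meta_addr))
    if entry and entry[1] > time.time():
        return entry[0]
    # In the real system an ARP request would be issued here; on failure the
    # packet (or its context) goes to the meta-router's control MPE.
    notify_control_mpe(("ARP_FAILED", mli, meta_addr))
    return None

arp_learn(9, "10.9.0.2", "00:aa:bb:cc:dd:ee")
print(arp_resolve(9, "10.9.0.2", print))      # -> 00:aa:bb:cc:dd:ee
print(arp_resolve(9, "10.9.0.9", print))      # prints the failure tuple, returns None
```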