
Efficient Router Protocol Implementation: Balancing Speed and Stability

The trade-offs between responsiveness and stability in router control planes: how to converge faster without destabilizing the network, how a protocol implementation is structured (inputs, state, computation, outputs), how protocol timers are configured, how long-running tasks are scheduled, and whether to handle parallelism with events or threads. Quagga serves as the case study.

kchute

Presentation Transcript


  1. Lecture 3 • Responsiveness vs. stability • Brief refresher on router architectures • Protocol implementation • Quagga

  2. To read • To present • Threads vs. events (belegrakis) • Micro-loop (u-loop) prevention • Sub-millisecond convergence (lekakis) • References • Netlink • Quagga manual • Router architectures

  3. How to be faster • Faster SPF • Better algorithms • Incremental SPF • Faster detection • Faster HELLOs • BFD (Bidirectional Forwarding Detection) • Runs in the line card instead of the control plane • Many protocols can share it • Faster FIB download • Download “important” prefixes first • Do things faster • Trigger SPF immediately • Trigger LSA origination immediately
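The slide says to download “important” prefixes first but does not define importance. The sketch below assumes a hypothetical policy (host routes, such as loopbacks used as BGP next hops, count as important) and simply reorders the download queue; the `Route` type and `is_important` predicate are illustrative names, not Quagga code.

```python
from dataclasses import dataclass
from ipaddress import IPv4Network, ip_network


@dataclass
class Route:
    prefix: IPv4Network
    next_hop: str


def is_important(route):
    # Hypothetical policy: host routes (/32) go first. A real router
    # would make this configurable.
    return route.prefix.prefixlen == 32


def download_order(routes, important=is_important):
    """Important prefixes first. sorted() is stable, so the original
    order is preserved inside each group."""
    return sorted(routes, key=lambda r: not important(r))
```

Because only the order changes, the total download time stays the same; what improves is how quickly the prefixes that matter most become reachable again.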

  4. How to be stable • SPF may be expensive • Cannot run SPF every time something minor changes; it may be better to run one SPF for a batch of changes • Avoid extra FIB downloads • Do not overload the CPU • Do not want to send too many updates at once • The receiver may get overloaded • Do not want to trigger updates too quickly • The link may be flapping • When the CPU/links are loaded, ensure that I do not miss important things • Do not miss HELLOs; that will make things worse

  5. Configuration: Timers • Hello timer, dead timer • LSA update delay • LSA pacing • LSA retransmission pacing • SPF delay • Wait for this time before you do SPF • SPF hold-time • Do not do another SPF before this time passes • Can have dynamic timers • Be fast when CPU is idle • Be slow when CPU is loaded
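The SPF delay and SPF hold-time from slides 4 and 5 can be combined with a dynamic hold. The sketch below is a minimal model under stated assumptions: times are plain integers in milliseconds, the parameter names `spf_delay`, `hold_min`, and `hold_max` are hypothetical, and the rate of topology changes stands in for load (the hold doubles while changes keep arriving and resets after a quiet period). It is not any particular vendor's algorithm.

```python
class SpfThrottle:
    """SPF delay + hold-time with a dynamic hold (times in milliseconds).

    Hypothetical parameters:
      spf_delay: wait this long after a change before running SPF
      hold_min:  minimum gap between two SPF runs
      hold_max:  ceiling for the hold; also used here as the "quiet" period
    """

    def __init__(self, spf_delay, hold_min, hold_max):
        self.spf_delay = spf_delay
        self.hold_min = hold_min
        self.hold_max = hold_max
        self.hold = hold_min
        self.last_spf = None    # when the previous SPF ran
        self.pending_at = None  # when an already-scheduled SPF will run

    def on_change(self, now):
        """A topology change arrived at `now`; return when SPF will run."""
        if self.pending_at is not None:
            return self.pending_at  # batch it into the SPF already scheduled
        if self.last_spf is None or now - self.last_spf >= self.hold_max:
            self.hold = self.hold_min  # network has been quiet: be fast again
            run_at = now + self.spf_delay
        else:
            # Changes keep coming: respect the hold-time, then back off further.
            run_at = max(now + self.spf_delay, self.last_spf + self.hold)
            self.hold = min(self.hold * 2, self.hold_max)
        self.pending_at = run_at
        return run_at

    def spf_ran(self, now):
        """Call when the scheduled SPF has actually run."""
        self.last_spf = now
        self.pending_at = None
```

With `spf_delay=50`, `hold_min=200`, `hold_max=1600`, an isolated change is handled 50 ms later, while a burst of changes spaces successive SPF runs 200, 400, 800 ms apart, and several changes arriving before a pending SPF are folded into that one run.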

  6. It is difficult • Speed and stability are conflicting goals • Alternatively: decouple convergence from the data plane • Avoid micro-loops (u-loops) • See overview • Have alternate next-hops pre-computed and switch to them in case of failures • We will see this later
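Switching to pre-computed alternate next-hops means the data plane can react to a failure before any SPF runs. The sketch below shows only that switch-over; computing the alternate and checking that it is loop-free is assumed to be the control plane's job and is not shown. `FibEntry` and `Fib` are hypothetical names, and next hops are simplified to interface names.

```python
class FibEntry:
    def __init__(self, prefix, primary, backup=None):
        self.prefix = prefix
        self.primary = primary  # outgoing interface of the primary path
        self.backup = backup    # pre-computed alternate, assumed loop-free
        self.active = primary   # what the data plane currently uses


class Fib:
    def __init__(self):
        self.entries = {}

    def install(self, prefix, primary, backup=None):
        self.entries[prefix] = FibEntry(prefix, primary, backup)

    def link_down(self, ifname):
        """Flip every affected entry to its backup without waiting for SPF.

        Entries with no usable backup keep pointing at the dead link until
        the control plane recomputes them."""
        switched = []
        for entry in self.entries.values():
            if entry.active == ifname and entry.backup not in (None, ifname):
                entry.active = entry.backup
                switched.append(entry.prefix)
        return switched

    def lookup(self, prefix):
        return self.entries[prefix].active
```

The later SPF still runs and may choose a better path, but traffic for protected prefixes keeps flowing in the meantime.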

  7. Anatomy of a protocol (routing or not) • Inputs • Static configuration • Other protocol instances in the network • Other components in my platform • Dynamic events on my platform; link states etc… • State • Transient (packets, queues) • Protocol • Computation • Triggered (process incoming packet) • Periodic (timers) (refresh state, LSA) • Outputs • To other protocol instances on the network • To other components in the platform • To FIB

  8. Examples of protocol tasks • Receive, send protocol packet(s) • Schedule/process timers • Perform computations (e.g. SPF) • Communicate with other components • Download routes to the FIB • Process changes in the environment • Link state changes • Adjacency changes • Process configuration and configuration changes

  9. Tasks • Can be complex and long running • Download 1,000 routes to FIB • Originate 500 LSAs • Perform an SPF in a large network • Usually protocol runs on one CPU • Have to multiplex tasks
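One way to multiplex long tasks on a single CPU is to cut each task into slices and interleave them. The sketch below uses Python generators as cooperative tasks: a 1,000-route FIB download yields after every batch of 100 routes, so LSA processing gets a turn in between. The task functions and the batch size are hypothetical, and a list stands in for the real FIB.

```python
from collections import deque


def fib_download(routes, fib, trace, batch=100):
    """Long task: install routes `batch` at a time, yielding between batches."""
    for start in range(0, len(routes), batch):
        fib.extend(routes[start:start + batch])  # stand-in for a real FIB write
        trace.append(("fib", start))
        yield  # give the CPU back to the scheduler


def process_lsas(lsas, trace):
    """Another task: handle one received LSA per time slice."""
    for lsa in lsas:
        trace.append(("lsa", lsa))
        yield


def run_round_robin(tasks):
    """Multiplex generator tasks on one CPU; returns the number of slices run."""
    ready = deque(tasks)
    slices = 0
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it
        slices += 1
        ready.append(task)
    return slices
```

Running both tasks together interleaves them: FIB batch 0, LSA "a", FIB batch 100, LSA "b", and then the remaining FIB batches.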

  10. Scheduling • Scheduling of tasks is what makes or breaks an implementation • Liveness • Even when I download 100,000 routes to the FIB, I can still receive and process LSAs • Stability • Prioritize tasks • Send the hellos first, even under load • Never skip important tasks when overloaded • Shed excess load so that I do not collapse • Queue incoming packets and start dropping if the queue becomes too long • Slow down the SPFs…
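Two of the stability techniques above, prioritizing tasks and shedding load with a bounded input queue, can be sketched in a few lines. The priority classes and their order are assumptions chosen to match the slide's "send the hellos first"; `Scheduler` and `BoundedRxQueue` are hypothetical names.

```python
import heapq
import itertools
from collections import deque

# Hypothetical priority classes: lower number runs first.
PRIO_HELLO, PRIO_LSA, PRIO_SPF, PRIO_FIB = 0, 1, 2, 3


class Scheduler:
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # FIFO tie-break within a priority

    def post(self, prio, name, fn):
        heapq.heappush(self._heap, (prio, next(self._seq), name, fn))

    def run_one(self):
        """Run the most important pending task; return its name, or None if idle."""
        if not self._heap:
            return None
        _, _, name, fn = heapq.heappop(self._heap)
        fn()
        return name


class BoundedRxQueue:
    """Incoming-packet queue that sheds load by tail-dropping once full."""

    def __init__(self, limit):
        self.q = deque()
        self.limit = limit
        self.dropped = 0

    def enqueue(self, pkt):
        if len(self.q) >= self.limit:
            self.dropped += 1
            return False
        self.q.append(pkt)
        return True
```

A single tail-drop queue can also drop HELLOs, which the slides warn against; one common refinement is a separate, higher-priority queue for HELLOs so that shedding only hits less critical packets.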

  11. The big question • How to implement/handle parallelism • Events vs. threads • Events • Trigger event handlers that are essentially function calls • Run to completion, i.e. until the function returns • Threads • Flows of execution with their own local state/stack • Can be suspended and resumed • With pre-emptive threads, the system may switch to another thread along the way • With non-preemptive threads, I have to yield

  12. How does my protocol look with events • Assign events and event handlers • Packet receive, packet send, SPF, etc. • Event loop (a.k.a. the big select() loop) • Loop waiting for events • Incoming packet, timer, signal, other event • Pick the next event to handle • According to my own scheduling • Call its event handler • When I want to initiate an action, I post an event • Put the packet in a queue • Schedule a Packet_send event
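The event-driven structure can be sketched with Python's `selectors` module standing in for the big select() loop. Handlers run to completion; timers live in a heap; and `post()` implements "do this now" as a timer with zero expiration, the same trick slide 26 describes for Quagga. The `EventLoop` class and its method names are hypothetical, and signals are left out for brevity.

```python
import heapq
import itertools
import selectors
import time


class EventLoop:
    """A minimal 'big select() loop': socket readiness plus timers."""

    def __init__(self):
        self.sel = selectors.DefaultSelector()
        self.timers = []  # heap of (deadline, seq, handler)
        self.seq = itertools.count()

    def add_reader(self, sock, handler):
        self.sel.register(sock, selectors.EVENT_READ, handler)

    def add_timer(self, delay, handler):
        deadline = time.monotonic() + delay
        heapq.heappush(self.timers, (deadline, next(self.seq), handler))

    def post(self, handler):
        # "Do it now" is just a timer that has already expired.
        self.add_timer(0, handler)

    def run_once(self):
        # Sleep until a socket is readable or the earliest timer is due.
        timeout = None
        if self.timers:
            timeout = max(0.0, self.timers[0][0] - time.monotonic())
        for key, _ in self.sel.select(timeout):
            key.data(key.fileobj)  # run-to-completion socket handler
        now = time.monotonic()
        while self.timers and self.timers[0][0] <= now:
            _, _, handler = heapq.heappop(self.timers)
            handler()  # run-to-completion timer handler
```

For example, a packet-receive handler registered with `add_reader()` can read an LSA and then `post()` an SPF event instead of running SPF inline, which leaves the loop free to choose when that work actually happens.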

  13. How does my protocol look with threads • Assign tasks to threads • Packet_rx thread, packet_tx thread, FIB_download thread • A thread blocks when there is no work to do • Packet_rx on the socket, FIB_download on a condition variable • It is unblocked when there is work • The system handles the scheduling of the threads • I may have no control over it
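The FIB_download thread from the slide can be sketched with `threading.Condition` as the condition variable: the thread sleeps in `wait()` when the queue is empty and is woken by `notify()` when another thread submits routes. Note the lock around the shared queue, which is the price of threads mentioned on the next slides. `FibDownloader` is a hypothetical name and a dict stands in for the kernel FIB.

```python
import threading
from collections import deque


class FibDownloader:
    """FIB_download thread: blocks on a condition variable until routes arrive."""

    def __init__(self):
        self.cond = threading.Condition()
        self.pending = deque()
        self.fib = {}  # stand-in for the real FIB
        self.stopping = False
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def submit(self, prefix, next_hop):
        # Called from other threads (e.g. an SPF thread), hence the lock.
        with self.cond:
            self.pending.append((prefix, next_hop))
            self.cond.notify()

    def stop(self):
        with self.cond:
            self.stopping = True
            self.cond.notify()
        self.thread.join()

    def _run(self):
        while True:
            with self.cond:
                while not self.pending and not self.stopping:
                    self.cond.wait()  # blocked: no work to do
                if not self.pending:
                    return  # stopping and fully drained
                batch = list(self.pending)
                self.pending.clear()
            for prefix, next_hop in batch:  # do the slow work outside the lock
                self.fib[prefix] = next_hop
```

Scheduling is left to the OS here: the downloader runs whenever the system decides, which is exactly the loss of control the slide points out.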

  14. Events vs. threads • Events • Manage my own state • Manage my own scheduling • I explicitly handle parallelism by controlling when an event handler terminates • If I want to suspend an event handler, I must take care of its state • Threads • Can arbitrarily suspend/resume a thread • State is automatically managed in the thread stack • The thread scheduler has control • With pre-emptive threads, the system handles parallelism • But I have to LOCK

  15. Events • Pros • I have total control of everything and can do what is best • Handle parallelism explicitly; no need for locking, etc. • May be more efficient • No context switches or thread-state saving • Cons • I have total responsibility for everything; the system does not help me • If I want to yield to another handler, I need to take care of the state myself, e.g. to stop a long SPF in the middle
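Stopping a long SPF in the middle under an event model means keeping the SPF's state somewhere other than the call stack. The sketch below keeps a Dijkstra SPF's distances, settled set, and priority queue in an object, so an event handler can settle a bounded number of nodes, return to the loop, and resume later. `ResumableSpf` and the `budget` parameter are hypothetical; the graph is a plain dict of link costs.

```python
import heapq


class ResumableSpf:
    """Dijkstra SPF whose state lives in an object, not on the stack."""

    def __init__(self, graph, root):
        self.graph = graph  # {node: {neighbor: cost}}
        self.dist = {root: 0}
        self.done = set()
        self.heap = [(0, root)]

    def step(self, budget):
        """Settle at most `budget` nodes. Returns True when SPF is complete."""
        while budget > 0 and self.heap:
            d, node = heapq.heappop(self.heap)
            if node in self.done:
                continue  # stale heap entry, superseded by a shorter path
            self.done.add(node)
            budget -= 1
            for nbr, cost in self.graph.get(node, {}).items():
                nd = d + cost
                if nd < self.dist.get(nbr, float("inf")):
                    self.dist[nbr] = nd
                    heapq.heappush(self.heap, (nd, nbr))
        return not self.heap
```

An SPF event handler would call `step()` once per dispatch and re-post itself until it returns True; with threads, the same loop could simply run to the end and let the scheduler pre-empt it.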

  16. Threads • Pros • Parallelism is handled in a cleaner, more natural way • The system helps a lot with scheduling and state copying • Cons • Real parallel programming is hard • Locking, etc. • State copying can be expensive • The thread scheduler may make the wrong scheduling decisions • It is not application specific

  17. An example: Quagga • First, some router architecture • Forwarding and control plane • The forwarding plane has to be fast • NPs (network processors), FPGAs, ASICs; a little inflexible • The control plane is usually implemented on a commodity processor • Commodity OS, environment and tools

  18. Big and Small Routers • How does a large router look? • EXAMPLE control vs. forwarding plane • Line cards, switch, FIB per line card, control processor • How does a PC router look? • EXAMPLE • Kernel for the forwarding • User space for the control plane

  19. Distributed control planes • I want resiliency and minimal fate sharing • Break the control plane into components that are independent • Processes • One process per protocol • It was a novelty 6 years ago; now everybody has it • May need to share some state • Need to prioritize between multiple routes • Redistribution: later

  20. Quagga: a distributed control plane for a PC router • Multiple processes • One per-protocol • Zebra • manage all the routes from all protocols • send routes to the FIB (kernel) • Centralize the management of local interfaces etc…

  21. Communication • EXAMPLE of system • Zebra and the protocol processes talk to each other through a private control protocol • Over a TCP socket • Protocols send their packets directly to the interfaces • But send their routes to Zebra • Over a TCP socket • Zebra talks to the kernel through netlink

  22. Paths • Interface down • Kernel to zebra through netlink • Zebra to protocols through private proto • Route download • Protocol to zebra through private proto • Zebra to kernel through netlink • OSPF Hellos • Directly from OSPF to interfaces and back • Data packets • Never leave the kernel

  23. Zebra protocol • Interface • Add, delete, addr-add, addr-delete, up, down • Route • IPv4-add, IPv4-del, IPv6-add, IPv6-del • Redistribute • Add, del

  24. Netlink • Uses a special socket type (AF_NETLINK) • Very powerful • Read and change interface state • Read and change interface configuration • Read and change routing tables • Also MPLS, scheduling… • And it is efficient • Some notifications are multicast
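To make "read and change routing tables" concrete, the sketch below builds (but does not send) an RTM_NEWROUTE request the way a daemon like Zebra talks to the kernel: a `struct nlmsghdr`, a `struct rtmsg`, and a list of 4-byte-aligned `rtattr` attributes. The constant values are those from `<linux/netlink.h>` and `<linux/rtnetlink.h>`; tagging the route as `RTPROT_STATIC` and the function names are choices made only for this illustration, not how Quagga's `zebra/rt_netlink.c` is written.

```python
import socket
import struct

# Values from <linux/netlink.h> and <linux/rtnetlink.h>.
RTM_NEWROUTE = 24
NLM_F_REQUEST, NLM_F_ACK, NLM_F_EXCL, NLM_F_CREATE = 0x1, 0x4, 0x200, 0x400
RTA_DST, RTA_OIF, RTA_GATEWAY = 1, 4, 5
RT_TABLE_MAIN, RTPROT_STATIC, RT_SCOPE_UNIVERSE, RTN_UNICAST = 254, 4, 0, 1


def _align4(n):
    return (n + 3) & ~3


def rtattr(rta_type, payload):
    """struct rtattr {u16 rta_len; u16 rta_type;} + payload, padded to 4 bytes."""
    length = 4 + len(payload)
    pad = b"\0" * (_align4(length) - length)
    return struct.pack("=HH", length, rta_type) + payload + pad


def build_newroute(dst, prefixlen, gateway, oif, seq=1):
    """Return the bytes of an IPv4 RTM_NEWROUTE request (native byte order)."""
    # struct rtmsg: family, dst_len, src_len, tos, table, protocol, scope, type, flags
    rtm = struct.pack("=BBBBBBBBI", socket.AF_INET, prefixlen, 0, 0,
                      RT_TABLE_MAIN, RTPROT_STATIC, RT_SCOPE_UNIVERSE,
                      RTN_UNICAST, 0)
    attrs = (rtattr(RTA_DST, socket.inet_aton(dst))
             + rtattr(RTA_GATEWAY, socket.inet_aton(gateway))
             + rtattr(RTA_OIF, struct.pack("=I", oif)))
    body = rtm + attrs
    # struct nlmsghdr: len, type, flags, seq, pid (0 = addressed to the kernel)
    hdr = struct.pack("=IHHII", 16 + len(body), RTM_NEWROUTE,
                      NLM_F_REQUEST | NLM_F_ACK | NLM_F_CREATE | NLM_F_EXCL,
                      seq, 0)
    return hdr + body
```

On Linux, the message would be sent over `socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, socket.NETLINK_ROUTE)`, which needs CAP_NET_ADMIN to change routes; `oif` is an interface index such as `socket.if_nametoindex("eth0")`. Because `NLM_F_ACK` is set, the kernel replies with an NLMSG_ERROR message whose error code is 0 on success.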

  25. Configuration and management • Prompt based configuration and management • telnet localhost 2601 for zebra • telnet localhost 2604 for ospf

  26. Implementation • Directories • Zebra, ospf, lib for common functions • Event based (but confusingly called threads) • Main loop in lib/thread.c thread_fetch() • Considers: sockets, timers, signals • Timers are used as a general event mechanism • If I want to do something now, I schedule a timer with 0 expiration • Netlink interface in zebra/rt_netlink.c
