CS244 Spring 2014 Network Verification “Header Space Analysis: Static Checking for Networks” [Kazemian, Varghese, McKeown 2012] George Varghese
Context • Peyman & Nick: geometric model for slicing for SDNs to avoid interference • I visit Stanford for sabbatical • We realize it's good to generalize • Realize we had to add a formal model and could go beyond slices to reachability • First algorithms were very slow. After SIGCOMM reject, optimized like crazy
More context: Networks Today… • Mess of protocols: OSPF, IPv4, UDP, Spanning tree, MPLS, ICMP, IPsec, NAT, RSVP, VLAN, ARP, GRE, IPv6, BGP, IGMP, TCP • Crude tools: Ping, Traceroute, tcpdump, SNMP, NetFlow • Switches tuned with arcane config files • Kept working by “masters of complexity”
SNIPPET FROM CONFIG FILE . . LIKE WRITING MACHINE CODE MICROSOFT CORE - 2000 LINES. HUMAN ERRORS. HUGE OPEX COSTS
Simple questions are hard to answer with tools today • What are all the packet headers from A that can reach B? • What will happen if I remove an entry from a firewall? • Is Group X provably isolated from Group Y? • Are there any loops in the network? • Why is my network slow?
Q: Why is reachability hard today? Was it easier when the Internet started? Will it become easier with SDNs so all of this will go away? (many students). Can you imagine a world where SDNs could make reachability questions harder?
MOTIVATION: OPEX NOT CAPEX. ANECDOTAL: HOW MIGHT YOU DO A MEASUREMENT STUDY TO REALLY VALIDATE THAT SUCH BUGS ARE WORTH FINDING? DATA SET? • Internal: > 1 hr customer-visible outage/quarter (P. Patel) • Azure: 30,000 cores down 3 hrs, Sept 2012, L2/L3 config • Bing: Entire data center, 8 hours, early 2012, L2/L3 config • GNS: Hotmail, Skydrive down 8 hours, 2011, new config • External (2012 Operator Survey): • 35% had at least 25 tickets per month; > 1 hour to resolve • Cost of downtime on average: $500K/hour + reputation cost
Q: A cloudy future? Does the emergence of cloud services affect the CAPEX versus OPEX argument?
Network Verification Vision [Figure: Juniper and Cisco devices, each configured through its own tables: input ACL, IP table, output ACL, MAC table, spanning tree, VLAN table, filtering rules, ARP table, MPLS mappings]
Insight: Treat Network as a Program • HACKER’S VIEW: packet forwarding box with ports 1, 2, 3 and a match/action table. Match: 0xx1..x1, 11xx..0x. Action: send to port 2 and rewrite with 1x01xx..x1; send to port 3 and rewrite with 1xx011..x1 • VERIFIER’S VIEW: router abstracted as a set of guarded commands; the network becomes a program, so we can use PL tools • Model header as a point in high-dimensional space and all networking boxes as transformers of header space
Header Space Framework [Figure: packet = Header (L bits, e.g. 01110011…1) + Data; wildcard example 0xxxx0101xxx] Step 1 - Model a packet, based on its header bits, as a point in {0,1}^L space – The Header Space
Header Space Framework [Figure: packet forwarding box with ports 1, 2, 3 acting as a transfer function; example headers 1101..00 and 1110..00] Match: 0xx1..x1, 11xx..0x. Action: send to port 3 and rewrite with 1xx011..x1; send to port 2 and rewrite with 1x01xx..x1. Step 2 – Model all networking boxes as transformers of header space
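Transfer Function Example • IPv4 Router – Forwarding Behavior (ports 1, 2, 3) • 172.24.74.x → Port 1 • 172.24.128.x → Port 2 • 171.67.x.x → Port 3
$$T(h, p) = \begin{cases} (h, 1) & \text{if } \mathrm{dst\_ip}(h) = 172.24.74.x \\ (h, 2) & \text{if } \mathrm{dst\_ip}(h) = 172.24.128.x \\ (h, 3) & \text{if } \mathrm{dst\_ip}(h) = 171.67.x.x \end{cases}$$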
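Transfer Function Example • IPv4 Router – forwarding + TTL + MAC rewrite (ports 1, 2, 3) • 172.24.74.x → Port 1 • 172.24.128.x → Port 2 • 171.67.x.x → Port 3
$$T(h, p) = \begin{cases} (\mathrm{rw\_mac}(\mathrm{dec\_ttl}(h), \mathrm{next\_mac}), 1) & \text{if } \mathrm{dst\_ip}(h) = 172.24.74.x \\ (\mathrm{rw\_mac}(\mathrm{dec\_ttl}(h), \mathrm{next\_mac}), 2) & \text{if } \mathrm{dst\_ip}(h) = 172.24.128.x \\ (\mathrm{rw\_mac}(\mathrm{dec\_ttl}(h), \mathrm{next\_mac}), 3) & \text{if } \mathrm{dst\_ip}(h) = 171.67.x.x \end{cases}$$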
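The router example can be sketched in Python. This is an illustration only, not Hassel code: the header is modeled as a dict with hypothetical keys `dst_ip`, `ttl` and `dst_mac`, the prefixes 172.24.74.x and 171.67.x.x are read as a /24 and a /16, and the default `next_mac` value is made up.

```python
import ipaddress

# Forwarding table from the slide: (prefix, output port).
# Reading "172.24.74.x" as a /24 and "171.67.x.x" as a /16 is an assumption.
FORWARDING_TABLE = [
    (ipaddress.ip_network("172.24.74.0/24"), 1),
    (ipaddress.ip_network("172.24.128.0/24"), 2),
    (ipaddress.ip_network("171.67.0.0/16"), 3),
]

def transfer(header, in_port, next_mac="00:00:00:00:00:01"):
    """Return a list of (header, out_port) pairs; an empty list means drop.

    in_port is unused because this toy router forwards on dst_ip only.
    """
    addr = ipaddress.ip_address(header["dst_ip"])
    for prefix, out_port in FORWARDING_TABLE:
        if addr in prefix:
            if header["ttl"] == 0:
                return []  # expired TTL: drop
            # dec_ttl followed by rw_mac, as in the slide's T(h, p)
            out = dict(header, ttl=header["ttl"] - 1, dst_mac=next_mac)
            return [(out, out_port)]
    return []  # no matching prefix: drop
```

Returning a list rather than a single pair leaves room for boxes that emit several copies of a packet.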
Example Actions: • Rewrite: rewrite bits 0-2 with value 101: (h & 000111…) | 101000… • Encapsulation: encap packet in a 1010 header: (h >> 4) | 1010… • Decapsulation: decap 1010xxx… packets: (h << 4) | 000…xxxx • TTL Decrement: if ttl(h) == 0: drop; if ttl(h) > 0: h – 0…000000010…0 • Load Balancing: LB(h,p) = {(h,P1), …, (h,Pn)}
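The mask expressions above can be tried on concrete headers. The sketch below assumes a toy 8-bit header held in a Python int, with bit 0 being the leftmost bit as on the slide. Because a concrete int has no wildcard bits, the bits that decapsulation frees are set to 0 here, whereas the slide marks them as x.

```python
L = 8                  # toy header length in bits (assumption)
MASK = (1 << L) - 1    # keeps results inside L bits

def rewrite_top3(h):
    """Rewrite bits 0-2 (the three leftmost bits) with 101."""
    return (h & 0b00011111) | 0b10100000

def encap(h):
    """Push a 4-bit 1010 header in front; the last 4 bits fall off the toy header."""
    return (h >> 4) | 0b10100000

def decap(h):
    """Strip the leading 4 bits; the freed low bits are 0 here (x on the slide)."""
    return (h << 4) & MASK
```

For example, `rewrite_top3(0b01011010)` keeps the low five bits `11010` and forces the top three to `101`.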
Q: Timeout: Bit vectors? Really? What was the past approach? What do we gain by collapsing headers? What do we lose? What can’t we model using functions? Tung Paper says it's header independent but all models use fields. Cheating?
Composing Transfer Functions [Figure: R1 → R2 → R3, starting from T1(h, p)] We can determine end-to-end behavior by composing transfer functions.
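A minimal sketch of composition, assuming each box is a Python function from (header, port) to a list of (header, port) pairs and that a dict maps each output port to the input port it is wired to. The three toy routers, the port names and the `links` dict are all hypothetical.

```python
def compose(boxes, links, start):
    """Push one (header, port) pair through the boxes in order, following links."""
    frontier = [start]
    for box in boxes:
        nxt = []
        for h, p in frontier:
            for h2, out in box(h, p):
                # Follow the wire from this output port, if one exists.
                nxt.append((h2, links.get(out, out)))
        frontier = nxt
    return frontier

def r1(h, p):
    return [(h, "r1.out")]              # forward unchanged

def r2(h, p):
    return [(h | 0b0001, "r2.out")]     # set the lowest bit

def r3(h, p):
    # Drop headers whose top bit (of 4) is set, forward the rest.
    return [] if h & 0b1000 else [(h, "r3.out")]

LINKS = {"r1.out": "r2.in", "r2.out": "r3.in"}
```

Calling `compose([r1, r2, r3], LINKS, (0b0100, "r1.in"))` gives the end-to-end result of the three boxes without simulating each hop by hand.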
Inverting Transfer Functions [Figure: T(h,p) maps the input header space to the output header space; T^{-1}(h,p) maps back] Tells us all possible input packets that can generate an output packet.
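One way to picture the inverse is on a single match/rewrite rule over wildcard strings. In the sketch below (an assumption, not the paper's data structure) a rewrite pattern uses `x` for "keep this bit" and 0/1 for "set this bit". The inverse returns a set of headers, written as a wildcard expression, rather than a single header, because rewritten bits could have held any value on input.

```python
def intersect(a, b):
    """Intersect two equal-length wildcard strings; None means empty."""
    out = []
    for p, q in zip(a, b):
        if p == "x":
            out.append(q)
        elif q == "x" or p == q:
            out.append(p)
        else:
            return None
    return "".join(out)

def forward(match, rewrite, h):
    """Apply one rule to the header set h; None if nothing matches."""
    hit = intersect(h, match)
    if hit is None:
        return None
    return "".join(b if r == "x" else r for b, r in zip(hit, rewrite))

def inverse(match, rewrite, o):
    """All inputs that this rule can turn into some header in o."""
    pre = []
    for b, r in zip(o, rewrite):
        if r == "x":
            pre.append(b)        # bit passed through untouched
        elif b in ("x", r):
            pre.append("x")      # bit was overwritten: any input value works
        else:
            return None          # the rule can never produce this bit
    return intersect("".join(pre), match)
```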
Q: Who cares? Why is composability important? Why is invertibility important? How is invertibility possible when these are not functions but relations?
Header Space Framework • Step 3 - Header Space Set Algebra. • Intersection • Complementation • Difference • Check subset and equality conditions. • Every region of Header Space can be described by a union of wildcard expressions (example: 10xx U 011x). • Goal: do set operations on wildcard expressions.
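Complement and difference can be sketched on wildcard strings as follows. The complement of one expression is taken as the union of expressions that each flip a single fixed bit and leave every other bit as x, and the difference A − B is A intersected with the complement of B. Representing a union as a Python list is a choice made for this sketch; the pieces may overlap.

```python
def intersect(a, b):
    """Intersect two equal-length wildcard strings; None means empty."""
    out = []
    for p, q in zip(a, b):
        if p == "x":
            out.append(q)
        elif q == "x" or p == q:
            out.append(p)
        else:
            return None
    return "".join(out)

def complement(w):
    """Union (as a list) of wildcard expressions covering everything outside w."""
    n = len(w)
    out = []
    for i, b in enumerate(w):
        if b != "x":
            flipped = "1" if b == "0" else "0"
            out.append("x" * i + flipped + "x" * (n - i - 1))
    return out

def difference(a, b):
    """a minus b, as a list of wildcard expressions."""
    pieces = (intersect(a, c) for c in complement(b))
    return [p for p in pieces if p is not None]
```

For instance, `complement("10xx")` is `["0xxx", "x1xx"]`, and an all-x expression has an empty complement.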
HS Set Algebra - Intersection • Bit-by-bit intersect using the intersection table [table and example images not recovered] • If the result has any ‘z’, then the intersection is empty
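The bit-by-bit rule can be written out as a lookup table. The table below follows the usual reading of wildcard intersection (x matches anything, equal bits keep their value, 0 against 1 gives the empty bit z); the slide's own table did not survive extraction, so treat this as a reconstruction.

```python
# Per-bit intersection; "z" marks an empty bit.
BIT_TABLE = {
    ("0", "0"): "0", ("1", "1"): "1", ("x", "x"): "x",
    ("0", "x"): "0", ("x", "0"): "0",
    ("1", "x"): "1", ("x", "1"): "1",
    ("0", "1"): "z", ("1", "0"): "z",
}

def intersect_bits(a, b):
    """Raw bit-by-bit result, possibly containing 'z'."""
    return "".join(BIT_TABLE[p, q] for p, q in zip(a, b))

def intersect(a, b):
    """Wildcard intersection; None when any bit is 'z' (empty set)."""
    r = intersect_bits(a, b)
    return None if "z" in r else r
```

With these definitions `intersect_bits("10xx", "1xx0")` is `"10x0"`, while `intersect_bits("10xx", "011x")` is `"zz1x"` and therefore empty.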
Q: Who cares about algebras? How are wildcard packets represented? Why is the efficiency of these operations crucial? What about soundness and completeness (Zhao)
Header Space Framework • Simple abstraction that gives us: • Common model for all packets: Header Space. • Common model for forwarding functionality of all networking boxes: Transfer Function. • Mathematical foundation to check end-to-end properties about networks: T(h,p) and T^{-1}(h,p); set operations on Header Space.
Finding Reachability [Figure: A → Box 1 → {Box 2, Box 4} → Box 3 → B] • Labels: all packets that A can possibly send; all packets that A can possibly send to Box 2 through Box 1; all packets that A can possibly send to Box 4 through Box 1; all packets that A can use to communicate with B • Forward header spaces: T1(X,A); T2(T1(X,A)); T4(T1(X,A)); T3(T2(T1(X,A))) ∪ T3(T4(T1(X,A))) • Inverse functions applied on the way back: T1^{-1}, T2^{-1}, T3^{-1}, T4^{-1}
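The forward half of this computation can be sketched on a toy network with the same shape as the figure. Everything concrete here is assumed for illustration: 4-bit headers, match-only rules with no rewrites, and box-level rather than port-level next hops.

```python
def intersect(a, b):
    """Intersect two equal-length wildcard strings; None means empty."""
    out = []
    for p, q in zip(a, b):
        if p == "x":
            out.append(q)
        elif q == "x" or p == q:
            out.append(p)
        else:
            return None
    return "".join(out)

# Each box: list of (match, next_hop) rules. Hypothetical rule set.
NETWORK = {
    "box1": [("1xxx", "box2"), ("0xxx", "box4")],
    "box2": [("x1xx", "box3")],
    "box4": [("xx1x", "box3")],
    "box3": [("xxx1", "B")],
}

def reach(node, hs, dst, visited=()):
    """Return the header spaces (one per path) that arrive at dst."""
    if node == dst:
        return [hs]
    if node in visited:
        return []          # do not follow a loop
    found = []
    for match, nxt in NETWORK.get(node, []):
        hit = intersect(hs, match)
        if hit is not None:
            found += reach(nxt, hit, dst, visited + (node,))
    return found
```

Starting from the all-x header space at Box 1, `reach("box1", "xxxx", "B")` returns one wildcard expression per path, which mirrors the union of the two composed terms on the slide. Whole sets of headers move together, so the cost depends on how the header space fragments, not on the number of individual headers.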
Q: This seems too simple? Why is this not brute force simulation? (Quote) Similar to taint analysis. Compare. Relation to SAT solvers? Model checkers? Sets of solutions versus 1: AllSAT. Why does it work well? Example where it works badly? Linear fragmentation? Scalability?
Predicates on Paths: Policies • So far only reachability, but can generalize to check path predicates such as: • Blackhole freedom (A→B and notice unexpected drop) • Communication via middle box (A→B packets must pass through C) • Maximum hop count (length of path from A→B never exceeds L) • Isolation of paths (http and https traffic from A→B don’t share the same path)
Finding Loops [Figure: Box 1 → Box 2 → Box 3 → Box 4; header spaces T1(X,P), T2(T1(X,P)), T3(T2(T1(X,P))), T4(T3(T2(T1(X,P)))); inverses T1^{-1}, T2^{-1}, T3^{-1}, T4^{-1}; original HS vs. returned HS] • Is there a loop in the network? • Inject an all-x test packet from every switch port • Follow the packet until it comes back to the injection port
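The injection procedure can be sketched on a toy network. The rule set below is hypothetical, rules only match (no rewrites), and the sketch works at box granularity where the slide works per switch port.

```python
def intersect(a, b):
    """Intersect two equal-length wildcard strings; None means empty."""
    out = []
    for p, q in zip(a, b):
        if p == "x":
            out.append(q)
        elif q == "x" or p == q:
            out.append(p)
        else:
            return None
    return "".join(out)

# Each box: list of (match, next_hop) rules. Hypothetical rule set.
NETWORK = {
    "box1": [("1xxx", "box2")],
    "box2": [("x0xx", "box3"), ("x1xx", "box4")],
    "box3": [("xx1x", "box1")],
    "box4": [],
}

def find_loops(start, width=4):
    """Inject an all-x header at start; report (path, returned header space)."""
    loops = []

    def walk(node, hs, path):
        for match, nxt in NETWORK[node]:
            hit = intersect(hs, match)
            if hit is None:
                continue
            if nxt == start:
                loops.append((path + [nxt], hit))   # came back to injection point
            elif nxt not in path:
                walk(nxt, hit, path + [nxt])

    walk(start, "x" * width, [start])
    return loops
```

The returned header space is what the slide calls the returned HS; comparing it with the original HS is the starting point for asking whether the loop is infinite.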
Finding Loops Finite Loop Infinite Loop ? Is the loop infinite?
Q: Loop detection worries . . . Why are loops even interesting given we have hop counts? Are we missing cases? Tortuous loops. Why does it work well? Example where it works badly? Linear fragmentation? Scalability?
Network Slices • By slicing the network we can share network resources (e.g. Bank of America and Citi share the same infrastructure in a financial center). • As with VMs, we need to ensure no interaction between slices (security, independence of slices). We need to check isolation of slices.
Definition of Slice in HSA VLAN = A • Network slice is a piece of network resources defined by • A topology consisting of switches and ports. • A set of predicates on packet headers.
Checking Isolation of Slices • How to check if two slices are isolated? • Slice definitions don’t intersect. • Packets don’t leak after forwarding.
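Both checks can be sketched with wildcard strings. In this illustration a slice is just a list of wildcard expressions over headers, and a rule is a (match, rewrite) pair in which `x` in the rewrite means "keep this bit". These representations and the function names are assumptions of the sketch.

```python
def intersect(a, b):
    """Intersect two equal-length wildcard strings; None means empty."""
    out = []
    for p, q in zip(a, b):
        if p == "x":
            out.append(q)
        elif q == "x" or p == q:
            out.append(p)
        else:
            return None
    return "".join(out)

def slices_overlap(slice_a, slice_b):
    """Check 1: header spaces shared by the two slice definitions."""
    shared = []
    for a in slice_a:
        for b in slice_b:
            common = intersect(a, b)
            if common is not None:
                shared.append(common)
    return shared

def leaks(slice_a, rules_a, slice_b):
    """Check 2: rules in slice A whose output lands in slice B's header space."""
    bad = []
    for h in slice_a:
        for match, rewrite in rules_a:
            hit = intersect(h, match)
            if hit is None:
                continue
            out = "".join(b if r == "x" else r for b, r in zip(hit, rewrite))
            if any(intersect(out, b) is not None for b in slice_b):
                bad.append((match, rewrite, out))
    return bad
```

Two slices can have disjoint definitions and still fail the second check, which is why both are needed.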
Q: Are slices really useful . . . Given VLANs, is general slicing really useful today? What are good use cases?
Header Space Library (Hassel) • Two versions – Python and C. • Foundation Layer • Implements Header Space and Transfer Function objects. • Application Layer • Reachability, Loop Detection and Slice Isolation checks. • < 100 LoC for these checks. • Parser (only available in Python) • CLI Parsing tool for Cisco IOS, Juniper Junos and OpenFlow table dump. • Example: for Cisco IOS, reads IP table, ARP table, MAC table, Spanning tree output and Config file. • Keeps mapping from TF Rule to CLI line number. • Lazy subtraction: key optimization
Q: Hmm . . . Is it really automated? Can you really go all the way? Yendluri, Gasparyan are sceptical. Diff tools. Why are the optimizations a big deal? Why 1000x performance lift?
Stanford backbone network [Figure: VLAN RED spanning tree, VLAN BLUE spanning tree] • Owns 6 x /16 IP domains • ~750K IP forwarding rules • ~1.5K ACL rules • ~100 VLANs • VLAN forwarding • Loop detection test
What we did next: NetPlumber: Online header space checking (NSDI 2013)
Overview of NetPlumber • Vision: SDN controllers can add new rules 100s of times per second. • Problem: can we verify changes in real time and prevent problems before they occur? • Hassel is too slow for this, especially for large networks. • Opportunity: addition of a new rule often makes a “small” change to the transfer function.
Experiment On Google WAN Default/Aggregate Rules. Run time with Hassel > 100s Not much more benefit!
What’s next: Dynamic Testing by Automatic Test Packet Generation (CONEXT 2013)
Finding Errors In Data Plane • NetPlumber and Hassel are both designed to find problems in the control plane. • What if: • Switch/link is down. • Link is congested. • ASIC is broken. • The only way to detect these sorts of data plane problems is at run time: • Active testing. • Passive monitoring.
ATPG Goals • Monitor the data plane by sending test packets. Find reachability set • Maximum rule coverage. • Minimum number of packets required to cover every rule (set cover of reachability set). • Constraints on terminal ports and headers of test packets. • Once error is detected, localize it.
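The "minimum number of packets to cover every rule" goal is a set cover problem. The sketch below uses the standard greedy heuristic for set cover, which is not claimed to be the exact selection procedure in ATPG; the packet names and rule ids are made up.

```python
def greedy_cover(candidates):
    """Pick test packets until every rule is exercised at least once.

    candidates: dict mapping a packet name to the set of rule ids it exercises.
    Returns the chosen packet names in the order they were picked.
    """
    uncovered = set().union(*candidates.values())
    chosen = []
    while uncovered:
        # Take the packet covering the most still-uncovered rules;
        # sorting the names makes ties deterministic.
        best = max(sorted(candidates), key=lambda p: len(candidates[p] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen

# Hypothetical reachability results: which rules each candidate packet hits.
CANDIDATES = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r3", "r4"},
    "p3": {"r4", "r5"},
    "p4": {"r5"},
}
```

Here `greedy_cover(CANDIDATES)` picks `p1` first (three new rules) and then `p3` (two new rules), so two packets exercise all five rules.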
Other Work • Header Space Analysis may be the first solution that is protocol-independent, fast and practical, and that can be used as a basis for building many other tools. • Pieces of HSA existed in some forms: • Geometric Packet Classification (SIGCOMM 1998) • Axiomatic basis for communication (SIGCOMM 2007) • Predicate Routing (HotNets 2002) • Static Reachability of IP Networks (INFOCOM 2005) • Anteater (SIGCOMM 2011) • Veriflow (HotSDN 2012)
TOOLS LOOKING BEYOND: NETWORK VERIFICATION: WHEN HOARE MEETS CERF
Q: Are testing and verification essential parts of other engineering disciplines? Q: What can we learn from them?
Example: Digital Hardware Design • $10B tool business supports a $250B chip industry • 100s of books, >10,000 papers, 10s of classes • Flow: Specification → Functional Description (RTL) → Testbench & Vectors → Functional Verification → Logical Synthesis → Static Timing → Place & Route → Design Rule Checking (DRC) → Layout vs Schematic (LVS) → Layout Parasitic Extraction (LPE) → Manufacture & Validate
Communication Engineering [Figure: S, Modulation, Cos(wt), Amplifier, Antenna, Antenna, Band Pass Filter, Cos(wt), De-Modulation, D; axis: Frequency] OTHER EXAMPLES: TRANSISTORS GATES, INSTRUCTIONS SSA
Q: You are a grad student. You want to go beyond HSA. What should you do?