Networking Update
Terry Gray
Director, Networks & Distributed Computing
University of Washington
UW Medicine IT Steering Committee
16 January 2004
20 February 2004
Outline
• In our last episode…
  • Context
  • Expanded Partnership
  • Recent Problems
• Today
  • Systemic Problems and Progress
  • Network Security Chronology
  • Design Issues
Context: A Perfect Storm
• Increased dependency on network apps
• Decreased tolerance for outages
• Decades of deferred maintenance...
• Inadequate infrastructure investment
• Some old/unfortunate design decisions
• Some extraordinarily fragile applications
• Fragmented host management
• Increasingly hostile security environment
• Increasing legal/regulatory liability
• Importance of research/clinical leverage
Key Elements of the Partnership
• Changed: C&C now responsible for...
  • In-building network implementation and operational support for med ctrs, clinics
  • Med center network design “for real”
• Not Changed: C&C still responsible for...
  • Network backbone, routers
  • Regional and Internet connectivity
  • SoM and Health Sciences networking
Why the Partnership Makes Sense
• Consistency, interoperability, manageability
• Leverage C&C networking expertise
• Clinical/research hi-performance network needs
• 24x7 Network Operations Center (NOC)
• Advanced network management tools
• Avoid design/build organizational conflicts
• Beyond the network... hope to share distributed system architecture and network computing expertise
Recent Problems
• Oct 29: Partial router failure reveals escalation procedure problems
• Oct 30: Security breach triggers connectivity and server problems
• Nov 12: 13 minute power outage triggers extended server outage
• Dec 12: Router upgrade uncovers wiring error, which triggers multicast storm
(None of these were related to the network transition, save perhaps timing of #4)
System Elements
• Environmentals (Power, A/C, Physical Security)
• Network
• Client Workstations
• Servers
• Applications
• Personnel, Procedures, Policy, and Architecture
Failures at one level can trigger problems at another level; need Total System perspective
Reasonable Questions
• What’s up with C&C’s alarm system vendor?
• If power was out for only 14 minutes, why was service out for multiple hours?
• What can we say about an app so fragile that a net interruption of a few seconds requires a server reboot?
• What can we say about thin clients built on top of thick (WinXP) operating systems?
• What can we say about a network where one wiring fault can disable most of the net?
Systemic Network Problems (NB: these pre-date Tom et al)
• Old infrastructure (e.g. cat 3 wire)
• Non-supportable technologies (e.g. FDDI)
• Non-supportable (non-geographic) topology
• Expensive shortcuts (e.g. cat5 mis-terminated)
• Security based on individual IP addresses
• Subnets with clients and critical servers
• Documentation deficiency
  • Contact database
  • Device location database
  • Critical device registry
Systemic General Problems
• Ever-increasing system complexity, dependencies
• Departmental autonomy
• Un-controlled hosts
• Un-reliable power and A/C in equipment rooms
• No net-oriented application procurement standards
• Are HA and DRBR expectations realistic?
• Are backup plans workable?
Network Device Growth (chart not reproduced)
Note: Most dips reflect lower summer use; last one is a measurement anomaly
Near-term Progress and Plans
• Agreement on standard maintenance window
• Created “Top 10” list -- creeping to Top 20 :)
• Static addressing work-around (success!)
• FDDI, VLAN elimination
• Subnet splits/upgrades (1500 computers)
• Equipment upgrades
• Router consolidation, dedicated subnets, separate med center backbone
• Equipment, outlet location database updates
• Initial wireless deployment
Design Review and Cost Estimates
• Biggest cost: physical infrastructure & wireplant upgrades
• NetVersant engaged for cost estimation project
• Cisco engaged for network architecture review
• We recommend similar reliability/design assessment for servers, apps & procedures
Design Tradeoffs
• Networks = Connectivity; Security = Isolation
• Fault Zone size vs. Economy/Simplicity
• Reliability vs. Complexity
• Prevention vs. (Fast) Remediation
• Security vs. Supportability vs. Functionality
Differences in NetSec approaches relate to:
• Balancing priorities (security vs. ops vs. function)
• Local technical and institutional feasibility
Tradeoff Examples
• Defense-in-depth conjecture (for N layers)
  • Security: MTTE (exploit) ∝ N**2
  • Functionality: MTTI (innovation) ∝ N**2
  • Supportability: MTTR (repair) ∝ N**2
• Perimeter Protection Paradox (for D devices)
  • Firewall value ∝ D
  • Firewall effectiveness ∝ 1 / D
• Border blocking criteria
  • Threat can’t reasonably be addressed at edge
  • Won’t harm network (performance, stateless block)
  • Widespread consensus to do it
• Security by IP address
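One way to see why a perimeter firewall protects less as it encloses more devices is a simple probability sketch. This is an illustrative model and not the formula behind the slide: it assumes each of D devices inside the perimeter is independently compromised, by some path that bypasses the firewall, with the same small probability. The function name and the 0.1% figure are hypothetical.

```python
def p_any_inside_compromised(p_device: float, devices: int) -> float:
    """Probability that at least one device behind the perimeter is compromised.

    Assumes (hypothetically) that each device is compromised independently
    with probability p_device via a path the firewall does not see.
    """
    if not 0.0 <= p_device <= 1.0:
        raise ValueError("p_device must be between 0 and 1")
    if devices < 0:
        raise ValueError("devices must be non-negative")
    return 1.0 - (1.0 - p_device) ** devices


if __name__ == "__main__":
    # Hypothetical 0.1% per-device chance; watch the risk grow with D.
    for d in (10, 100, 1000, 10000):
        print(f"D={d:>5}: {p_any_inside_compromised(0.001, d):.3f}")
```

Under these assumptions the chance of at least one compromised host inside rises steadily with D, which is consistent with the direction of the paradox even though the slide states it as a proportionality.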
Network Security Credo
• Focus first on the edge (Perimeter Protection Paradox)
• Add defense-in-depth as needed
• Keep it simple (e.g. Network Utility Model)
• But not too simple (e.g. offer some policy choice)
• Avoid
  • one-size-fits-all policies
  • cost-shifting from “guilty” to “innocent”
  • confusing users and techs (“broken by design”)
Preserving the Net Utility Model
• What is it? Why important?
• Incompatible with perimeter security? Too late to save?
• NUM-preserving perimeter defense
  • Logical Firewalls
  • Project 172
• Foiled by static IP addressing…
  • Requires all hosts be reconfigured
Lines of Defense
• Network isolation for critical services.
• Host integrity. (Make the OS net-safe.)
• Host perimeter. (Add host firewalling.)
• Server sanctuary perimeter.
• Network perimeter defense.
• Real-time attack detection and containment.
Network Security Chronology
• 1990: Five anti-interoperable networks
• 1994: Nebula shows network utility model viable
• 1998: Defined border blocking policy
• 2000: Published Network Security Credo
• 2000: Added source address spoof filters
• 2000: Proposed med ctr network zone
• 2000: Proposed server sanctuaries
• 2001: Ban clear-text passwords on C&C systems
• 2001: Proposed pervasive host firewalls
• 2001: Developed logical firewall solution
• 2002: Developed Project-172 solution
• 2003: Slammer, Blaster… death of the Internet
• 2003: Developed flex-net architecture
Next-Gen Network Architecture
• Parallel networks; more redundancy
• Supportable (geographic) topology
• Med center subnets = separate backbone zone
• Perimeter, sanctuary, and end-point defense
• Higher performance
• High-availability strategies
  • Workstations spread across independent nets
  • Redundant routers
  • Dual-homed servers
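The payoff from redundant routers and dual-homed servers can be estimated with the usual parallel-availability formula. The sketch below is a minimal illustration, assuming the redundant paths fail independently and that any one working path is enough; the function name and the 99% figure are hypothetical, not values from this deck.

```python
def parallel_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one suffices.

    Assumes independent failures, which shared power, shared wiring, or a
    shared fault zone would violate.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("single must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    # Hypothetical: each path is up 99% of the time.
    for n in (1, 2, 3):
        print(f"{n} path(s): {parallel_availability(0.99, n):.6f}")
```

The independence assumption is the weak point: a common-mode fault, such as one wiring error that takes down most of the net, removes most of the benefit, which is why the design also calls for smaller, separate fault zones.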
Success Metrics
• Tom’s
  • Nobody gets hurt
  • Nobody goes to jail
• Terry’s
  • “Works fine, lasts a long time”
  • Low ROI (Risk Of Interruption)
• Steve’s
  • Four Nines or bust!
Success Metrics II
We all want:
• High MTTF, Performance and Function
• Low MTTR and support cost
The art is to balance those conflicting goals:
• we are jugglers and technology actuaries
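The pull between high MTTF and low MTTR can be made concrete with the standard steady-state availability ratio, MTTF / (MTTF + MTTR). The following is a minimal sketch; the function name and the hour figures are illustrative assumptions, not measurements from the UW network.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the service is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mttf_hours must be positive, mttr_hours non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical service that fails about every 1000 hours.
    slow_repair = availability(1000.0, 4.0)   # 4-hour repairs
    fast_repair = availability(1000.0, 1.0)   # 1-hour repairs
    print(f"4 h MTTR: {slow_repair:.5f}")
    print(f"1 h MTTR: {fast_repair:.5f}")
```

In this form, cutting repair time improves availability about as much as an equivalent proportional increase in time between failures, so anything that lengthens MTTR (for example, harder troubleshooting) carries a real availability cost.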
Success Metrics III
• How many nines?
• Problem one: what to measure?
  • How do you reduce behavior of a complex net to a single number?
  • Difficult for either uptime or utilization metrics
• Problem two: data networks are not like phone or power services…
  • Imagine if phones could assume anyone’s number
  • Or place a million calls per second!
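With the caveat raised above, that a single number hides a great deal, the arithmetic behind "how many nines" is simple. The sketch below converts a count of nines into an annual downtime budget; the function and constant names are illustrative, and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(nines: int) -> float:
    """Annual downtime budget for an availability of N nines (e.g. 4 -> 99.99%)."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.1f} minutes/year")
```

Four nines leaves roughly 53 minutes of downtime per year and five nines roughly 5, so a single multi-hour outage uses up more than a year of four-nines budget.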
Concerns, Future Challenges
• Mitigating impact of closed networking:
  • Needs of the many vs. needs of the few
  • Pressure to make network topology match administrative boundaries
  • Complex access lists
  • False sense of security
  • Increased MTTR
• Next-generation threats: firewalls won’t help
• Security vs. High-Performance Wireless
• Balancing innovation, operations, & security
Lessons
• Five 9s is hard (unless we only attach phones?)
• Even host firewalls don’t guarantee safety
• Perimeter firewalls may increase user confusion, MTTR
• Nebula existence proof: security in an open network
• Even so… defense-in-depth is a Good Thing
• It only takes one compromise inside to defeat a firewall
• Controlling net devices is hard -- hublets, wireless
• The cost of static IP configuration is very high
• Net reliability & host security are inextricably linked
• Never underestimate non-technical barriers to progress