
When Technology Falters: The CareGroup Network Outage

Presentation Transcript


    1. When Technology Falters: The CareGroup Network Outage. John D. Halamka, MD; CIO, CareGroup; CIO, Harvard Medical School

    2. Agenda: In-depth overview of the network outage; Key lessons; The Sequel – SQL Slammer; Questions and answers

    3. CareGroup Network as Built

    4. Timeline
       November 13, 2002: 1:45pm, Napster-like internal attack; change begins, redundant links cut; Callisma and Cisco on site
       November 14, 2002: spanning tree issues; WAN issues; CAP declared at 4:00pm

    5. Core Switch Utilization

    6. Timeline
       November 15, 2002: PACS rebuild; Research/Cardiology rebuild; reboot of core and distribution layers
       November 16, 2002: VLAN mismatch; redundant core built as contingency

    7. Core Switch Utilization

    8. Root Cause Analysis: The CareGroup network grew organically, through mergers and acquisitions, into a massive bridged, switched network that was outside Spanning Tree specifications; equipment was not lifecycle managed; router/switch configuration did not follow best practices (e.g., multicast dense mode).

    9. Spanning Tree Problems: When Cisco's Technical Assistance Center (TAC) was first able to access and assess the network, we found the Layer 2 structure of the network to be unstable and out of specification with the IEEE 802.1D standard. In some locations, the management VLAN (VLAN 1) was 10 Layer 2 hops from the root bridge. The conservative default values for the Spanning Tree Protocol (STP) impose a maximum network diameter of seven; that is, no two bridges in the network should be more than seven hops apart.
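
The seven-hop limit can be checked mechanically. The following is a minimal Python sketch, assuming the active Layer 2 topology is available as an adjacency map of bridge names; the bridge names and both topologies are hypothetical. It uses breadth-first search to measure hops from the root bridge and the bridge diameter, the longest hop count between any two bridges.

```python
from collections import deque

# The conservative default 802.1D timers assume a bridge diameter of at most 7.
MAX_STP_DIAMETER = 7


def chain(names):
    """Build an undirected adjacency map for bridges cabled in a line."""
    links = {name: [] for name in names}
    for a, b in zip(names, names[1:]):
        links[a].append(b)
        links[b].append(a)
    return links


def hops_from(source, links):
    """Breadth-first search: hop count from `source` to every reachable bridge."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def bridge_diameter(links):
    """Longest hop count between any two bridges (maximum BFS eccentricity)."""
    return max(max(hops_from(bridge, links).values()) for bridge in links)


# Hypothetical topology 1: ten bridges daisy-chained off the root.
deep = chain(["root"] + [f"access{i}" for i in range(1, 11)])
print(max(hops_from("root", deep).values()), bridge_diameter(deep))  # 10 10

# Hypothetical topology 2: root in the middle of two five-bridge arms.
wide = chain([f"west{i}" for i in range(5, 0, -1)] + ["root"]
             + [f"east{i}" for i in range(1, 6)])
print(max(hops_from("root", wide).values()), bridge_diameter(wide))  # 5 10

for name, topo in (("deep", deep), ("wide", wide)):
    if bridge_diameter(topo) > MAX_STP_DIAMETER:
        print(f"{name}: exceeds the {MAX_STP_DIAMETER}-hop STP diameter")
```

The second topology shows why the limit is stated as a diameter rather than as distance from the root: every bridge is within five hops of the root, yet the two edge bridges are ten hops apart.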

    10. Key Lessons: Partner with your network vendor; encourage external audits of your network; engage advanced engineering services; avoid senior management blind spots.

    11. Key Lessons: Avoid flat, bridged, switched network topologies.

    13. Key Lessons: Re-evaluate the enterprise architecture of your network (routed core; switched distribution and access layers; robust firewall).
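
One reason a routed core helps is that broadcasts and spanning tree stop at routed (Layer 3) boundaries, so a fault in one Layer 2 domain cannot flood the whole campus. The minimal Python sketch below assumes a hypothetical three-tier campus (all switch names are invented) and, as a simplification, models the routed core as a single Layer 3 node. It counts the Layer 2 domains that result when the core is bridged versus routed.

```python
def undirected(edges):
    """Build an adjacency map from a list of (a, b) links."""
    links = {}
    for a, b in edges:
        links.setdefault(a, []).append(b)
        links.setdefault(b, []).append(a)
    return links


def layer2_domains(links, routers):
    """Group switches into Layer 2 domains: connected components that never
    cross a routed (Layer 3) node, which is where broadcasts and STP stop."""
    seen, domains = set(), []
    for start in links:
        if start in routers or start in seen:
            continue
        domain, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in domain:
                continue
            domain.add(node)
            stack.extend(n for n in links[node] if n not in routers)
        seen |= domain
        domains.append(sorted(domain))
    return domains


# Hypothetical campus: core, two distribution switches, four access switches.
campus = undirected([
    ("core", "distA"), ("core", "distB"),
    ("distA", "accA1"), ("distA", "accA2"),
    ("distB", "accB1"), ("distB", "accB2"),
])

flat = layer2_domains(campus, routers=set())       # bridged core
routed = layer2_domains(campus, routers={"core"})  # routed core
print(len(flat), [len(d) for d in flat])      # 1 [7]
print(len(routed), [len(d) for d in routed])  # 2 [3, 3]
```

With a bridged core, all seven switches share one spanning tree and broadcast domain; with a routed core, each distribution block is its own, smaller domain.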

    14. Key Lessons: Lifecycle manage your network; eliminate legacy protocols; recognize the value of new feature sets; hardware must keep up with the demands of a changing organization (video over IP, IP telephony, bioinformatics, image management).
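
Lifecycle management starts with an inventory that can be checked against policy. The Python sketch below is a minimal illustration, assuming a hypothetical inventory that records each device's vendor end-of-support date and enabled protocols; the device names, dates, and the set of "legacy" protocols are all invented for the example.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Device:
    name: str
    end_of_support: date                         # hypothetical vendor end-of-support date
    protocols: set = field(default_factory=set)  # protocols enabled on the device


# Hypothetical policy: protocols the organization has decided to retire.
LEGACY_PROTOCOLS = {"ipx", "appletalk"}


def lifecycle_report(devices, today):
    """Return (device name, reason) pairs for devices needing attention."""
    findings = []
    for d in devices:
        if d.end_of_support <= today:
            findings.append((d.name, "past end of support"))
        legacy = sorted(d.protocols & LEGACY_PROTOCOLS)
        if legacy:
            findings.append((d.name, "legacy protocols: " + ", ".join(legacy)))
    return findings


inventory = [
    Device("dist-sw-1", date(2001, 6, 30), {"ip", "ipx"}),
    Device("core-rtr-1", date(2007, 12, 31), {"ip"}),
]
for name, reason in lifecycle_report(inventory, today=date(2002, 11, 13)):
    print(name, "->", reason)
```

Run on a schedule, a report like this turns "life cycle manage your network" into a recurring list of concrete replacement and cleanup tasks.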

    15. Key Lessons: Implement appropriate monitoring and diagnostic tools to maintain the health and hygiene of your network (Concord, NATKit, CiscoWorks, OpenView).
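
Graphs like the core switch utilization charts are commonly derived from SNMP interface counters: utilization is the change in the octet counter times 8, divided by the polling interval times the interface speed. The Python sketch below is a minimal illustration of that arithmetic, assuming 32-bit octet counters (such as IF-MIB `ifInOctets`) and `ifSpeed` in bits per second; the sample values and alert threshold are hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32  # 32-bit SNMP octet counters wrap at this value


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, allowing for one wrap."""
    return (current - previous) % modulus


def utilization_percent(prev_octets, curr_octets, interval_s, if_speed_bps):
    """Average link utilization over the polling interval, in percent."""
    bits = counter_delta(prev_octets, curr_octets) * 8
    return 100.0 * bits / (interval_s * if_speed_bps)


# Hypothetical samples from a 100 Mb/s uplink polled 300 seconds apart.
pct = utilization_percent(1_000_000_000, 4_000_000_000, 300, 100_000_000)
print(f"{pct:.1f}%")  # 80.0%

ALERT_THRESHOLD = 70.0  # hypothetical alerting threshold, percent
if pct > ALERT_THRESHOLD:
    print("alert: utilization above threshold")
```

On fast links a 32-bit counter can wrap more than once between polls, which this sketch cannot detect; the 64-bit `ifHCInOctets` counters avoid that problem.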

    16. Key Lessons: Have a robust downtime plan (out-of-band diagnostics; dial-up modems and computers in key clinical areas); overview of the CareGroup Disaster Recovery plan.

    17. Service Objectives

    18. Protection Features

    19. Protection Features

    20. Protection Techniques Cost versus Benefit

    21. Protection Techniques by Vulnerability

    22. Key Lessons: Implement Strict Change Control, covering standards, configurations, devices, protocols, links, processes, procedures, and services; prior review and approval of all network infrastructure changes; multi-discipline membership; changes classed as substantial, moderate, or minimal impact.

    23. Key Lessons: Implement Strict Change Control (cont.): substantial changes require Cisco AES review; changes scheduled 2am–5am on weekends; changes require baseline, testing, and recovery plans; as-built documentation to include overall, physical, and logical diagrams; NCCB recommends expense allocation.
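
The change control rules on these two slides are concrete enough to check automatically before a change is scheduled. The Python sketch below is a minimal, hypothetical encoding of them (impact classes, prior approval, vendor review for substantial changes, the 2am–5am weekend window, and baseline, testing, and recovery plans); the field names and example requests are invented and do not describe CareGroup's actual tooling.

```python
from dataclasses import dataclass
from datetime import datetime

IMPACT_CLASSES = {"substantial", "moderate", "minimal"}


@dataclass
class ChangeRequest:
    description: str
    impact: str            # one of IMPACT_CLASSES
    start: datetime
    end: datetime
    approved: bool         # prior review and approval by the change board
    vendor_reviewed: bool  # vendor engineering (e.g. AES) review
    has_baseline: bool
    has_test_plan: bool
    has_recovery_plan: bool


def in_weekend_window(start, end):
    """True if [start, end] lies within 2am-5am on a Saturday or Sunday."""
    opens = start.replace(hour=2, minute=0, second=0, microsecond=0)
    closes = start.replace(hour=5, minute=0, second=0, microsecond=0)
    return start.weekday() >= 5 and opens <= start < end <= closes


def policy_violations(cr):
    """List every rule a change request breaks; an empty list means compliant."""
    problems = []
    if cr.impact not in IMPACT_CLASSES:
        problems.append("unknown impact class")
    if not cr.approved:
        problems.append("no prior review and approval")
    if cr.impact == "substantial" and not cr.vendor_reviewed:
        problems.append("substantial change lacks vendor review")
    if not in_weekend_window(cr.start, cr.end):
        problems.append("outside the 2am-5am weekend window")
    for present, plan in ((cr.has_baseline, "baseline"),
                          (cr.has_test_plan, "test plan"),
                          (cr.has_recovery_plan, "recovery plan")):
        if not present:
            problems.append(f"missing {plan}")
    return problems


# Hypothetical requests: a Saturday-night change and a Wednesday-afternoon one.
ok = ChangeRequest("Replace distribution switch", "substantial",
                   datetime(2002, 11, 23, 2, 0), datetime(2002, 11, 23, 4, 30),
                   approved=True, vendor_reviewed=True, has_baseline=True,
                   has_test_plan=True, has_recovery_plan=True)
bad = ChangeRequest("Re-cable core uplinks", "substantial",
                    datetime(2002, 11, 20, 14, 0), datetime(2002, 11, 20, 15, 0),
                    approved=True, vendor_reviewed=False, has_baseline=True,
                    has_test_plan=True, has_recovery_plan=False)
print(policy_violations(ok))  # []
print(policy_violations(bad))
```

Returning every violation, rather than stopping at the first, gives the change board a complete list to send back to the requester.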

    24. The Sequel – SQL Slammer: Released at 12:30am on January 25, 2003, it had infected the East Coast by 12:40am. Microsoft SQL Server 2000 was patched; however, Microsoft did not issue any patches or security warnings for Microsoft Data Engine 2000 (MSDE), which is included with numerous desktop products.

    25. Spread of the Worm
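
The ten minutes between release and East Coast infection is consistent with the early exponential growth of a random-scanning worm. As an illustration only, the Python sketch below uses the standard logistic (susceptible-infected) epidemic model with hypothetical parameters: one initially infected host among 100,000 vulnerable ones, with infections doubling every 10 seconds. It is not fitted to Slammer or CareGroup data.

```python
import math


def infected_fraction(t, i0, doubling_time):
    """Logistic (SI model) fraction of vulnerable hosts infected after t seconds."""
    r = math.log(2) / doubling_time  # early exponential growth rate
    growth = i0 * math.exp(r * t)
    return growth / (1 - i0 + growth)


def time_to_fraction(target, i0, doubling_time):
    """Seconds until the infected fraction reaches `target` (0 < i0 < target < 1)."""
    r = math.log(2) / doubling_time
    odds_ratio = (target / (1 - target)) * ((1 - i0) / i0)
    return math.log(odds_ratio) / r


# Hypothetical parameters, chosen only to show the shape of the curve.
I0, DOUBLING = 1 / 100_000, 10.0
for minutes in (1, 2, 3, 5):
    print(minutes, f"{infected_fraction(60 * minutes, I0, DOUBLING):.3f}")
print(f"90% infected after ~{time_to_fraction(0.9, I0, DOUBLING) / 60:.1f} minutes")
```

The point of the model is the shape: almost nothing is visible for the first couple of minutes, then most vulnerable hosts are infected within the next one or two, far faster than a manual patching or blocking response.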

    27. Exact Effect on CareGroup: MSDE and non-IS-maintained databases were infected; the network was saturated by worm activity; we shut off links to research areas and blocked all traffic from the public internet; network traffic levels returned to normal.

    28. Cleanup: Restarted servers and desktops that were disrupted by the outage. Once all research areas had cleaned desktops, we restored port 1433 connectivity.
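
The containment and cleanup steps amount to a staged filtering policy. The Python sketch below is a minimal, hypothetical model of it: a first-match rule list that blocks traffic from the public internet during the incident and keeps port 1433 closed to and from research areas until every research area has been cleaned. The zone names are invented, and real enforcement would live in router or firewall access lists, not in Python.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Rule:
    action: str            # "permit" or "deny"
    src_zone: str          # "*" matches any zone
    dst_zone: str          # "*" matches any zone
    port: Union[int, str]  # destination port, or "*" for any port


def matches(rule, src, dst, port):
    """True if the rule's zones and port cover this flow."""
    return (rule.src_zone in ("*", src) and rule.dst_zone in ("*", dst)
            and rule.port in ("*", port))


def decide(rules, src, dst, port):
    """First matching rule wins; traffic matching no rule is denied."""
    for rule in rules:
        if matches(rule, src, dst, port):
            return rule.action
    return "deny"


def outbreak_rules(research_areas, cleaned):
    """Hypothetical incident policy: block the public internet, and keep
    port 1433 closed to and from research areas until all are cleaned."""
    rules = [Rule("deny", "internet", "*", "*")]
    if not set(research_areas) <= set(cleaned):
        for area in research_areas:
            rules.append(Rule("deny", area, "*", 1433))
            rules.append(Rule("deny", "*", area, 1433))
    rules.append(Rule("permit", "*", "*", "*"))
    return rules


areas = ["research-a", "research-b"]
during = outbreak_rules(areas, cleaned={"research-a"})
after = outbreak_rules(areas, cleaned={"research-a", "research-b"})
print(decide(during, "clinical", "research-a", 1433))  # deny
print(decide(after, "clinical", "research-a", 1433))   # permit
print(decide(after, "internet", "clinical", 80))       # deny
```

Gating the restore on every research area, rather than reopening areas one at a time, mirrors the slide's "once all research areas had cleaned desktops" and avoids reconnecting a clean segment to a still-infected one.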

    29. Further Lessons Learned: VPN as a security risk; implement a scanning program to analyze research desktop and server vulnerabilities; ensure you have modern network equipment that affords you the tools to control intra-VLAN traffic.
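
A scanning program can start as simply as finding which research hosts accept connections on TCP port 1433, the default SQL Server port mentioned in the cleanup. The Python sketch below uses only the standard `socket` module; the host list is hypothetical, and a fuller program would likely use a dedicated vulnerability scanner and run only against systems it is authorized to test.

```python
import socket


def tcp_port_open(host, port, timeout=1.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, host unreachable, or timed out
        return False


def scan(hosts, ports=(1433,), timeout=1.0):
    """Map each host to the list of ports that accepted a TCP connection."""
    return {host: [p for p in ports if tcp_port_open(host, p, timeout)]
            for host in hosts}


if __name__ == "__main__":
    # Hypothetical inventory; scan only hosts you are authorized to test.
    research_hosts = ["127.0.0.1"]
    for host, open_ports in scan(research_hosts).items():
        status = f"listening on {open_ports}" if open_ports else "no SQL listener found"
        print(host, status)
```

A connect check only finds listeners; deciding whether a listener is a vulnerable, unpatched MSDE instance still requires version checks or a proper scanner.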

    30. Conclusions: Lifecycle manage your network just as you would your desktops; ensure senior management understands the value of the network as a strategic asset; build great downtime procedures, including out-of-band connectivity, just in case the technology falters.
