This research focuses on the development of a secure cluster architecture called Self Cleansing Intrusion Tolerance (SCIT) that enhances security and availability by reducing exposure to attacks and increasing server redundancy. The approach does not rely on signatures or intrusion detection and has potential applications in critical and infrastructure systems such as DNS servers, web servers, and directory servers. The research is supported by TATRC (US Army), NIST, and SUN Microsystems.
Self Cleansing Intrusion Tolerance: An Approach for Increasing Security and Availability Arun Sood George Mason University/Computer Science Task Technologies Ltd asood@gmu.edu, 703.347.4494 SCIT Collaborators: Dr Yih Huang, Mr. David Arsenault, Mr. Ravi Bhaskar and Mr. Jeffrey Zeiberg http://cs.gmu.edu/~asood/scit Research supported by TATRC (US Army), NIST funded Critical Infrastructure Protection Program, SUN Microsystems
Background and Overview • Self Cleansing Intrusion Tolerance (SCIT) • Limits the exposure of servers to attacks. • Uses redundancy to reduce exposure. • Does *not* rely on signatures, prior knowledge or intrusion detection. • Development process • Prototypes built for Firewall, Web Server and DNS Server. Working on LDAP. • Three patent applications are pending. One disclosure filed. • SCIT research funded by US Army (TATRC), NIST funded Critical Infrastructure Protection Project, and SUN Microsystems.
Review of Intrusion Management • Intrusion Prevention. • Block intrusions. • Common security practices, such as blocking unused ports, restricting Server privileges, choosing strong passwords, … • Intrusion Detection and Recovery. • Timely detection of intrusion to stop losses and repair damage. • Processing overhead grows with traffic volume. • A large site will have to use a significant portion of the processing power for intrusion detection. • False alarm management.
Review - 2 • Intrusion Tolerance. • Minimizes losses and facilitates automatic recovery. • Addresses undetected intrusion (the worst-case scenarios). • SCIT: Self Cleansing Intrusion Tolerance. • Security through server cleansing and rotations. • Effectiveness increases with server redundancy. • Reduces exposure window. • Fends off attacks or at least limits losses. • Makes it difficult to exploit vulnerabilities. • Overhead independent of traffic volume.
Project Summary • Our objective is to develop a secure cluster architecture that • Applies to several critical and/or infrastructure applications. • Manages undetected attacks. • Improves security through increasing server redundancy. • Improvements in security can be “bought”. • Hardware redundancy enhances security and availability. • Uses virtualization technologies.
Application Domains: Examples • Our work is expected to suit server clusters that • Process transactions with no inter-transaction dependencies. • Can handle session info. State info is more difficult. • Within reasonable time limits. • Examples: DNS servers, Web servers, Directory servers, File servers, DHCP servers, Authentication servers, Back office servers, Transaction-oriented database servers, …
Application Domains: Restrictions • Our current approach does not address the following: • Media streaming servers. • Telnet/ssh servers. • FTP servers, or any server that supports long data downloads/uploads. • Essentially, long sessions without time constraints. • The solutions to the above are part of our longer term research goals.
Overview • Research objectives • Self-Cleansing Intrusion Tolerance (SCIT) concept • Hardware Enhanced Security (SCIT – HES) • DNS – SCIT server • Cluster security vs availability trade-off • Virtualization
Redundancy + SCIT = Availability + Security To Achieve SCIT = Self Cleansing Intrusion Tolerance
SCIT Server Rotations • Example: 5 online and 3 offline servers. • [Figure: server rotation between online servers (potentially compromised) and offline servers (in self-cleansing).]
SCIT Server Rotations • Example: 5 online and 3 offline servers. • No service interruption during rotation. • For DNS, 2-second exposure time using a SUN server.
Exposure Window • [Figure: axes labeled Cost and Exposure Window, with the threshold T marked.] • SCIT increases security by reducing the exposure window. • The exposure window is the time a server is online. • SCIT target: keep the exposure window below T, a client-defined requirement.
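The link between redundancy and the exposure window can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the slides: it assumes that cleansing takes a fixed time, that offline servers cleanse in parallel, and that rotation is round-robin. The function name, its parameters, and the 60-second cleansing time are hypothetical.

```python
def exposure_window(online: int, offline: int, cleanse_seconds: float) -> float:
    """Estimate how long each server stays online under round-robin rotation.

    Assumed model (not from the slides): every cleansing takes a fixed
    `cleanse_seconds`, the `offline` servers cleanse in parallel, and a swap
    happens each time a cleansed server becomes available.
    """
    if online < 1 or offline < 1:
        raise ValueError("need at least one online and one offline server")
    swap_interval = cleanse_seconds / offline  # time between consecutive swaps
    return online * swap_interval              # each online server waits its turn


# 5 online servers as in the rotation example; 60 s cleansing is a made-up figure.
for spare in (1, 3, 6):
    print(spare, "offline ->", exposure_window(5, spare, 60.0), "s online")
```

Under these assumptions the exposure window shrinks in proportion to the number of servers in cleansing, which is one way to read the claim that security improvements can be bought with hardware.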
SCIT Primitives: Incorruptibility Requirements • SCIT server rotations should not be disrupted. • Online servers connected to the clients (public Internet), but not the central controller. • Offline/cleansing servers connected to controller, but not to the clients. • The controller and cleansing servers always isolated from external influences.
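The isolation rules above amount to an invariant: no server is ever linked to both the clients and the controller at the same time. A minimal sketch of such a check follows; the `ServerLinks` type and `violations` function are hypothetical names introduced for illustration, not part of the SCIT prototypes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerLinks:
    name: str
    to_clients: bool     # link to the public Internet is enabled
    to_controller: bool  # link to the central controller is enabled


def violations(servers):
    """Return the names of servers linked to clients and controller at once."""
    return [s.name for s in servers if s.to_clients and s.to_controller]


cluster = [
    ServerLinks("online-1", to_clients=True, to_controller=False),
    ServerLinks("cleansing-1", to_clients=False, to_controller=True),
]
print(violations(cluster))  # an empty list means the invariant holds
```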
Reconfiguration of Network Paths using HES • [Figure labels: toggle, reset, Online, Offline, SCIT Central Controller, Clients.]
SCIT with Hardware Enhanced Security (HES) • [Figure labels: SCIT Control, Clients.]
Implementation Considerations • IPMI (Intelligent Platform Management Interface) supports power management (power up/down) and remote reset/reboot. • Many managed Ethernet switches can enable/disable ports individually and dynamically. • Or customized hardware.
Secure DNS • Objective: enhancements that enable query clients to authenticate DNS replies. • The zone private key is used to sign the DNS record. • Exposure of the zone private key will compromise the integrity of the DNS responses. • Query clients verify the signature using the zone public key. • If the DNS server is successfully attacked, the consequences are dire. • [Figure label: Resource Record Set.]
Dynamic Updates: Key exposed • For secure operations, private key should not be reachable from the public internet. • The Challenge: How can we do dynamic updates without exposing the private key? • The Solution: Do Dynamic Update computations off-line. • Trade-off: Turnaround time for Dynamic Updates increases.
DNS-HES Cluster • Advertises two public IP addresses • A primary DNS IP. • A secondary DNS IP. • Uses four or more servers • A primary DNS server. • A secondary DNS server. • A backend processing server. • One or more servers in cleansing.
Example: DNS-HES Cluster • [Figure: master storage (master file, keys); online storages Online 0 and Online 1; central controller; servers B, P, and S; one or more servers in cleansing (mode C); clients. Legend: network link; electrical/optical signal line.]
The Primary Server Connects to One Online Storage and Clients • [Figure labels: Master Storage, local master file, SCIT Controller, P, Clients, Online 0, Online 1.]
Overview • Research objectives • Self-Cleansing Intrusion Tolerance (SCIT) concept • Hardware Enhanced Security (SCIT – HES) • DNS – SCIT server • Cluster security vs availability trade-off (example: 2DNS-3WEB cluster) • Virtualization
Session Persistence: Sticky Sessions • [Figure: users 1 to N connect through a load balancer (traffic distribution, session persistence, SSL termination) to servers 1 to N.]
Server Clusters • [Figure: users 1 to N connect through a load balancer (traffic distribution, session persistence, SSL termination) to servers 1 to N.] • Persistent db. • Disk, memory resident. • Multicasting. • Shared memory.
Virtual Server Implementation: Session Replication • [Figure: users 1 to N connect through a load balancer (traffic distribution, session persistence, SSL termination) to servers 1 to N, with virtual servers 1 to 3.] • Virtual servers are short lived. • Persistent db is easy. • Multicasting. • Additional network traffic. • Reduce traffic through smaller clusters. • Shared memory is not recommended.
Unified SCIT Control for Security and Availability • Ensure SCIT operations. • Guarantee minimum service availability. • Servers can be added to or removed from the cluster at any time. • Adding servers improves both: • Availability. • Security, by reducing server exposure times.
Generalization: N servers and M roles • The DNS-HES design can be generalized to a SCIT cluster using N servers to support M roles. • Without loss of generality, each role is assumed by only one server. • Example: • A 2DNS-3WEB cluster supports M=5 roles: P, W, W, W, and S. • The primary DNS server (P) and one web server (W) provide minimum (but still useful) services.
Role Swap • A Role-R Swap (or simply R Swap) rotates the present online server in Role R offline for cleansing and finds a newly cleansed server to resume Role R. • The 2DNS-3WEB cluster supports P, W, W, W, and S swaps. • Example of a P swap. • [Figure labels: C, P.]
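A role swap as defined above can be sketched in a few lines. This is an illustrative model only, not the controller's actual code: the server names, the `W1`/`W2`/`W3` labels used to tell the three web roles apart, and the `role_swap` function are all assumptions.

```python
from collections import deque


def role_swap(assignment, cleansed, cleansing, role):
    """Rotate the server holding `role` offline and give the role to a cleansed server.

    assignment: dict mapping role -> online server currently in that role
    cleansed:   deque of servers that have finished cleansing
    cleansing:  list of servers currently being cleansed
    """
    if role not in assignment:
        raise KeyError(f"unknown role {role!r}")
    if not cleansed:
        raise RuntimeError("no cleansed server available for the swap")
    retired = assignment[role]
    assignment[role] = cleansed.popleft()  # fresh server takes over the role
    cleansing.append(retired)              # old server goes offline for cleansing
    return retired


assignment = {"P": "srv0", "W1": "srv1", "W2": "srv2", "W3": "srv3", "S": "srv4"}
cleansed = deque(["srv5"])
cleansing = ["srv6", "srv7"]
role_swap(assignment, cleansed, cleansing, "P")  # a P swap
print(assignment["P"], cleansing)
```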
Rotation Pattern • A rotation pattern P is a sequence of role swaps that covers all M roles. • Example 1: round-robin pattern PWWWS. • Example 2: skewed pattern PWWPWS. • Example 3: with randomness, PXWXPXWXWXS, where X ∈ {P, W, W, W, S}.
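The three pattern styles can be generated and checked mechanically. The sketch below assumes the three web roles are labeled `W1`, `W2`, `W3` (the slides write all of them as W) and that the random swap X is drawn uniformly from the roles; the helper names are hypothetical.

```python
import random

ROLES = ["P", "W1", "W2", "W3", "S"]


def covers_all(pattern, roles):
    """A valid rotation pattern must swap every role at least once."""
    return set(roles) <= set(pattern)


def with_randomness(base, roles, rng):
    """Insert one randomly chosen swap X between consecutive swaps of `base`."""
    pattern = []
    for i, swap in enumerate(base):
        if i > 0:
            pattern.append(rng.choice(roles))  # the X of Example 3
        pattern.append(swap)
    return pattern


round_robin = list(ROLES)                     # Example 1: P W W W S
skewed = ["P", "W1", "W2", "P", "W3", "S"]    # Example 2: P swapped twice per cycle
randomized = with_randomness(["P", "W1", "P", "W2", "W3", "S"], ROLES, random.Random())
print(covers_all(round_robin, ROLES), covers_all(skewed, ROLES), randomized)
```

Because the scheduled swaps are kept and only extra swaps are added, the randomized pattern still covers every role while making the rotation order harder to predict.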
Relative Importance of Roles • Each service role R has an index, index(R), that reflects its relative importance. (We use integers.) • Without loss of generality, the indices are assumed to be in descending order. • 2DNS-3WEB: index(P) = index(W) = 10, and index(W) = index(W) = index(S) = 0.
Cluster Index • The SCIT cluster index at time t, denoted index(t), is the sum of the indices of those roles that are active at time t. • The cluster index must at all times be greater than or equal to a predetermined minimum value, index_min. • 2DNS-3WEB: index_min = 20.
Minimum Number of Servers • Let Nmin be the minimum number of servers required to achieve index_min. • Nmin = 2 in 2DNS-3WEB. • When N < Nmin, behavior is undefined. • When N = Nmin: • N servers service the most important N roles. • No server rotations/swaps.
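The cluster index and Nmin follow directly from the role indices. The sketch below reproduces the 2DNS-3WEB numbers (two roles with index 10, three with index 0, minimum cluster index 20); the function names and the `W1`/`W2`/`W3` labels are assumptions made for illustration.

```python
ROLE_INDEX = {"P": 10, "W1": 10, "W2": 0, "W3": 0, "S": 0}
INDEX_MIN = 20


def cluster_index(active_roles, role_index):
    """Sum of the indices of the roles that are currently active."""
    return sum(role_index[r] for r in active_roles)


def min_servers(role_index, index_min):
    """Smallest number of servers whose most important roles reach index_min."""
    total = 0
    for n, value in enumerate(sorted(role_index.values(), reverse=True), start=1):
        total += value
        if total >= index_min:
            return n
    raise ValueError("index_min cannot be reached even with every role active")


print(cluster_index(["P", "W1"], ROLE_INDEX))  # index with only P and one W active
print(min_servers(ROLE_INDEX, INDEX_MIN))      # Nmin for 2DNS-3WEB
```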
Servers in Cleansing • When M ≥ N > Nmin (at least as many roles as servers): • N−1 most important roles active. • At least one server in cleansing to trigger rotations. • Virtualization? • When N > M (more servers than roles): • M servers servicing M roles. • N−M servers in cleansing and for rotations. • The more servers in cleansing, the faster the rotations, and the shorter the server exposure times.
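The case analysis for N servers and M roles can be written as a single allocation function. The sketch below encodes one reading of the cases (N below Nmin undefined, N equal to Nmin with no rotation, N up to M with one server held in cleansing, N above M with N minus M servers in cleansing); the function name and role labels are hypothetical.

```python
def allocate(n_servers, roles_by_importance, n_min):
    """Return (active_roles, servers_in_cleansing) for a cluster of n_servers.

    roles_by_importance: roles sorted from most to least important.
    """
    m = len(roles_by_importance)
    if n_servers < n_min:
        raise ValueError("behavior is undefined when N < Nmin")
    if n_servers == n_min:
        # Every server is needed online; no rotations are possible.
        return roles_by_importance[:n_servers], 0
    if n_servers <= m:
        # Hold one server back so that rotations can still be triggered.
        return roles_by_importance[: n_servers - 1], 1
    # More servers than roles: all roles active, the rest cleanse and rotate.
    return list(roles_by_importance), n_servers - m


ROLES_SORTED = ["P", "W1", "W2", "W3", "S"]
for n in (2, 4, 8):
    print(n, allocate(n, ROLES_SORTED, n_min=2))
```

With eight servers this yields five active roles and three servers in cleansing, matching the 5 online and 3 offline rotation example.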
Provable Properties • The proposed SCIT control algorithm satisfies the following properties: • Server Rotation Guarantee. With arbitrary server failures, server rotations continue if there are N ≥ Nmin + 1 functioning servers in the cluster. • Minimum Service Guarantee. With arbitrary server failures, index_min is maintained at all times (after an adjustment period) if there are N ≥ Nmin functioning servers in the cluster.
Virtualization in Support of Security • Virtualization was motivated by server consolidation. • Intensive ongoing research to use virtualization to enhance security. • Features that would help (a researcher’s wish list). • Fast creation of new Virtual Server (VS) on the fly. • VS forking like in process forking. • Checkpoints and fast reverts. • Snapshot only the memory in use, not the entire VS memory (as in the case of VMware snapshots). • Efficient sharing of resources. • Memory and disk: copy on write among identically configured VS. • Ultimate goal is single use VS.
Current Research • SCIT/VS: SCIT with Virtual Servers. • Takes advantage of virtualization features for virtual server rotations and cleansing. • Operates at much faster speed than hardware rotations, resulting in much smaller exposure windows. • A SCIT/VS+Apache testbed is being built. • Initial results. • 10% of SCIT operational overhead. • VS cleansing every 30 seconds. • A promising combination: Solaris + FireEngine + Xen. • Should reduce the overall overhead significantly.
Conclusions • Incorruptible intrusion tolerance through physical isolation. • Connecting security and service availability. • Many critical infrastructure services use redundancy for availability / dependability. • Many hardware components are included in modern high availability systems.
SCIT Publications + Contact Info • Current research focuses on combining SCIT with virtualization technology to drastically reduce server exposure times. • SCIT papers are available at http://cs.gmu.edu/~asood/scit Arun Sood asood@gmu.edu 703.993.1524