The DPASA Survivable JBI- A High-water Mark in Intrusion Tolerant Systems. Partha Pal On Behalf of the Entire DPASA * Team BBN Technologies, Adventium Labs, SRI, U Illinois and U Maryland.
The DPASA Survivable JBI: A High-water Mark in Intrusion Tolerant Systems
Partha Pal, on behalf of the entire DPASA* team: BBN Technologies, Adventium Labs, SRI, U Illinois and U Maryland
* The DPASA project was sponsored by DARPA under an AFRL contract during 2002-2005. The BBN-led DPASA team designed a survivable architecture, used it to defense-enable a DoD-relevant information system, and subjected the resulting system to multiple Red Team evaluations.
Outline • Intrusion Tolerance • The DPASA Approach • Survivability Architecture • Design Principles • Baseline (undefended) and the Survivable System • Evaluation Results • Conclusion and Future Direction
Generations of Security Research
• 1st Generation: Protection. Prevent intrusions (access control and physical security, cryptography, trusted computing base). But intrusions will occur.
• 2nd Generation: Detection. Detect intrusions, limit damage (firewalls, boundary controllers, intrusion detection systems, virtual private networks, PKI). But some attacks will succeed.
• 3rd Generation: Tolerance. Tolerate attacks (redundancy, diversity, deception, wrappers, proof-carrying code, proactive secret sharing). Diagram labels: hardened operating system, intrusion tolerance, graceful degradation, big board view of attacks, real-time situation awareness and response.
No system is perfectly secure, only adequately secured with respect to the perceived threat.
3rd (moving towards 4th) Generation
[Figure: level of service over time, from the start of a focused attack. Curves: without attack, undefended, 3rd generation (survivable), next generation (?).]
• 3rd Generation: Tolerance and Survivability
  • Assumes that attacks and other bad things cannot be totally prevented: some attacks will succeed, and may not even be detected in time
  • Focuses on the desired qualities or attributes that need to be preserved, retained or continued, even if in a degraded manner:
    • availability (of information and service)
    • integrity (of information and service)
    • confidentiality (of information)
• Next Generation of Survivability
  • Regain, recoup, regroup and even improve…
Drivers and Contributing Factors
• COTS: bugs and unknown vulnerabilities
• Open/interoperable: discovery and use of new exploits
• Distributed: more places to attack
• Interconnected: attack initiation and propagation
• Interdependent: cascade effect
Attack effects (diagram labels):
• Unavailability (is detectable). Can't reach server: wait or give up.
• Corruption. Wrong answer: can get pretty bad.
• Exfiltration. Stolen data:
Attacker may try to introduce corruption or steal rather than disrupt!
Contention Between Defense and the Adversary
[Diagram: applications and the attacker contending for memory and CPU on the same hosts.]
• Continued operation:
  • Preserve C, I and A
  • Degrade
• Attacker and application compete for the same resources
  • corrupt
  • consume
• Application security
• Adaptive response
The game is inherently biased against the defense: the adversary needs to find only one way to win, whereas the defense needs to cover as many possibilities as it can. Therefore, in the short term, successfully denying or delaying the adversary is a win for the defense.
Defense Mechanisms
[Diagram: applications and the attacker contending for memory and CPU on the same hosts.]
• Defense mechanisms: mechanisms that do not contribute to the principal functionality of the system, but are included in the system to preserve or bolster C, I and A
• Tools, protocols, subsystems…
• Network, host (OS) and application layer mechanisms
Survivability Architecture • Survivability architecture: • survivability goals + undefended system + design principles • organization of components, both functional components from the undefended system and the added defense mechanisms, their interconnections, and protocols that govern them.. • Entities, interconnections, protocols..
Designing for Survivability
• Key motto: combine protection, detection and adaptive response
  • High barrier to entry from outside, as well as when going from one part of the system to another
  • Improve the chance of spotting attacker activity
  • Adapt to changes caused by the attacker
[Diagram label: Key Assets]
Dynamic Defense in Depth
• Multiple layers of defense
  • Unlikely that all layers have the same hole
• Dynamically changing the defenses
  • Analogous to changing your passwords
  • Reduces the likelihood that dictionary attacks succeed
  • Unpredictable to the attacker
• Disclose as little as possible to the attacker; confuse and obfuscate his view
Choice and organization of defenses: requirements + design principles
Design Principles
• SPOF protection
• Controlled use of diversity
• Physical barriers before key assets
• Robust basis of defense in depth
• Containment layers
• Modularity
• Range of adaptive responses
• Human override
• Minimalism
• Configuration generation from specs
Many of these are surprisingly simple and intuitive, but it is also surprising how many of them are routinely ignored in current system design.
SPOF Protection
• It may be impossible to protect all "single points of failure" in a system
  • Depending on the level of abstraction/granularity there may be way too many
• Do not go overboard in choosing the "unit"
  • A host, a process, an instance representing a physical object…
  • Not the DMA controller, bus, or the CPU in a host
• Units that perform key or essential functions and are exposed to the outside must not be left as SPOFs
  • The web server that runs your electronic storefront, or facilitates collaboration
  • The database or application server that your sales force or analysts constantly need
  • Do not ignore how you access the network!
• Typically mitigated by redundancy
  • Spatial redundancy may not always be possible
  • Redundancy in the time domain (restart)
• Managing redundancy
  • Transparent (middleware)
  • Applications are aware of the redundancy
Diversity and Physical Barriers
[Diagram progression: applications accessing a "key asset" over the network. A single key asset (SPOF?); introduce redundancy (run the same attack 4 times?); introduce diversity, replicas a, b, c, d (the 4 replicas are still accessible); introduce physical barriers using a DMZ.]
• Notion of "zones": Crumple zone (access points), Operations zone (main operational functionality), Executive zone (management and decision-making functions), with controlled communication between zones
• Enablers
  • Application level proxies
  • Additional features
    • Rate limiting
    • Size limiting
    • Learning usage pattern
    • Tunnel termination
    • Insertion of protocol diversity
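The rate-limiting and size-limiting features listed for the application level proxies can be sketched as follows. This is only an illustration, assuming a token-bucket policy per client; the class name `ProxyLimiter`, its parameters and the numeric limits are hypothetical and are not taken from the DPASA implementation. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class ProxyLimiter:
    """Token-bucket rate limit plus a message-size cap for one client (assumed policy)."""
    rate_per_sec: float    # tokens added per second
    burst: float           # bucket capacity
    max_bytes: int         # largest message the proxy forwards
    tokens: float = 0.0
    last_time: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.burst  # start with a full bucket

    def admit(self, now: float, size_bytes: int) -> bool:
        """Return True if a message of size_bytes arriving at time now may pass."""
        if size_bytes > self.max_bytes:
            return False  # size limiting: oversized messages are dropped outright
        elapsed = max(0.0, now - self.last_time)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_sec)
        self.last_time = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # rate limiting: the bucket is empty


# Hypothetical limits: 1 message per second, bursts of 2, at most 100 bytes each.
limiter = ProxyLimiter(rate_per_sec=1.0, burst=2.0, max_bytes=100)
decisions = [limiter.admit(0.0, 10), limiter.admit(0.0, 10), limiter.admit(0.0, 10)]
```

Placing such checks in the proxy, rather than in the protected server, keeps the cost of a flood in the crumple zone.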
Controlled Use of Diversity
[Diagram: quads on the network with hosts labeled a, b, c, d; OS labels SELinux, Windows, Solaris. Logos are registered trademarks of their respective owners.]
• Source of artificial diversity
  • Hardware architecture
  • OS
  • Programming language
  • Application
  • COTS
  • n-version programming?
  • Automated diversity generation?
• Diversity is expensive
  • Initial investment, continued maintenance and management
• Controlled use of diversity
  • In a given situation, more diversity is not necessarily better
  • Given the organization in the diagram, using 4 different OSes is not better than using 3
  • There are situations where a small additional investment provides a big payoff: identify and take advantage of these!
Robust Basis for Defense in Depth
• It is likely that a majority of the defense mechanisms are "software"
  • Depends on hardware, OS and network services
  • May depend on other software mechanisms as well!
• How to avoid a "house of cards" in building defense in depth?
• Forming a robust basis: useful things to consider while trying to satisfy a need
  • Hardware based mechanisms
  • Cryptographic strengths
  • Assumptions about operating environment
• Redundancy: hardware based vs. software based
• Interconnecting hosts in a network or inter-network: use of managed switches is better than programming it in
• Storing and using private keys: smart cards or separate co-processors are better than using the main disk/memory/CPU
• Fine-grained packet filtering and encryption: a NIC-based solution is better than software tools (iptables etc.)
Containment Layers
[Diagram, containment in the spatial dimension: network segment, host, process. Diagram, adding the functional dimension, across quad1 to quad4: executive zone (system management function), operations zone (operations zone proxy of the system management function; main functionality: PSQ, the publish-subscribe and query server), crumple zone (application level proxies).]
• Containment layers: architectural construct that helps limit the spread of attacks and attack effects
• Two main dimensions to consider
  • Spatial and functional
Modularity
[Diagram, across quad1 to quad4: executive zone (system management function), operations zone (operations zone proxy of the system management function; main functionality: PSQ, the publish-subscribe and query server), crumple zone (application level proxies). One component is marked X.]
• Survivable system must adapt to changes caused by attacks
• Is containment + redundancy enough to support adaptive response?
  • Will the system still work if you kill the affected application?
  • What if we have to go up in the spatial containment hierarchy: shut down the host, quarantine the host or the network containing the host?
Modularity is the design property that facilitates such responses.
Enablers:
• Actuator mechanisms: to effect the response
• Post-action coordination (implemented in code): healing/recovery, masking/degradation
Range of Adaptive Response
• Survivable system must adapt to changes caused by attacks
• It is important to have a range of adaptive responses
  • Some symptoms are more critical than others, e.g., a port scan vs. all heartbeats going down
  • In some cases response delayed is response failed, e.g., an observed attack signature
  • Some responses are more severe than others, e.g., restoring a file vs. isolating a network
Rapid response: local scope, fully automated, local decision making based on local observation.
• Spurious file [process]: delete [kill]
• Lost file: recover
Coordinated response: system-wide scope, mostly automated, coordinated decision making (multiple rounds of message exchanges) based on corroborated information from multiple parts of the system.
• Restart a function, reboot a host, isolate a network
Human assisted response:
• Clean a host and restart
• Examine the log (forensics) to identify a signature and patch
Enablers:
• Advanced middleware, sensors and correlators, logical decision tools/expert systems
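The three response tiers can be sketched as a small dispatcher. This is only an illustration: the symptom labels, the `quorum` parameter and the fallback to a human operator are assumptions made for the example, while the actions themselves are the ones named on the slide.

```python
from enum import Enum
from typing import Tuple


class Tier(Enum):
    RAPID = "rapid"              # local scope, fully automated
    COORDINATED = "coordinated"  # system-wide scope, mostly automated
    HUMAN = "human-assisted"


# Symptom names are hypothetical labels; the actions come from the slide.
RAPID_ACTIONS = {
    "spurious_file": "delete file",
    "spurious_process": "kill process",
    "lost_file": "recover file",
}
COORDINATED_ACTIONS = {
    "function_down": "restart function",
    "host_unresponsive": "reboot host",
    "segment_compromised": "isolate network",
}


def choose_response(symptom: str, corroborating_reports: int,
                    quorum: int = 2) -> Tuple[Tier, str]:
    """Pick a tier and action; quorum is an assumed corroboration threshold."""
    if symptom in RAPID_ACTIONS:
        # Local observation is enough: act immediately.
        return Tier.RAPID, RAPID_ACTIONS[symptom]
    if symptom in COORDINATED_ACTIONS and corroborating_reports >= quorum:
        # Severe responses require reports from multiple parts of the system.
        return Tier.COORDINATED, COORDINATED_ACTIONS[symptom]
    # Unknown or uncorroborated symptoms are left to a human operator.
    return Tier.HUMAN, "escalate to operator"
```

Requiring corroboration before the more severe actions reflects the point that a delayed response can be a failed response for local symptoms, while a mistaken network isolation is itself a loss of availability.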
Baseline (Undefended System)
[Diagram: a core LAN hosting the PSQ server, the JBOSS app server, and the information object, metadata and security repositories, on Solaris* and Windows* hosts (* various versions). An emulated public IP network connects the core LAN to Client LANs 1 to 4 through hubs. Clients: Client 1 WxHaz, Client 2 ChemHaz, Client 3 EDC, Client 4 JEES, Client 5 TAP, Client 6 AODB, Client 7 TARGET, Client 8 CAF, Client 9 CombatOps, Client 10 MAF. Servers: DB SVR 1 AODBSVR, BE SVR 1 SWDIST, DB SVR 2 TAPDB.]
Defense-Enabled System
[Diagram: the same clients, servers and Client LANs 1 to 4 as in the baseline, with added NIDS, QIS, VPN routers, hubs, VLANs, ADF NICs and bump-in-the-wire devices with ADF, plus a separate experiment control/logging network. Host OS labels: SELinux, WinXP Pro, Solaris 8, Win2000. The IP network is emulated using VLANs in a single Cisco 3750.]
Key Aspects of the Survivability Architecture
• Defense mechanisms
  • Policy enforcement
  • Encryption
  • Authentication
  • Detection and correlation
  • Redundancy/redundancy management
  • Adaptive response (recover, degrade)
• Design principles and enablers
  • Multiple layers: policy, encryption, authentication…
  • SPOF, diversity, hardware grounding, modularity, containment, range of adaptive response
• Architectural elements
  • Zones, Quadrants, Survivable Middleware, Protection domains
  • System Managers (SM), Access Proxies (AP), Local Controllers (LC)
• Protocols
  • Corruption Tolerant PSQ: embedded in the Survivable Middleware
  • Heartbeats:
  • Alerts: embedded sensors
  • Command: among SMs, SM-LC…
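The slide lists heartbeats as a protocol without further detail. One way a manager could track them is sketched below, assuming each component reports periodically and is flagged once its last report is older than a timeout. The class `HeartbeatMonitor`, the component names and the timeout value are hypothetical, not the DPASA protocol.

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Tracks the last heartbeat time per component (illustrative only)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def beat(self, component: str, now: float) -> None:
        """Record a heartbeat from component at time now."""
        self.last_seen[component] = now

    def missing(self, now: float) -> List[str]:
        """Return components whose last heartbeat is older than the timeout."""
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout)


# Hypothetical component names and a 5-second timeout.
monitor = HeartbeatMonitor(timeout=5.0)
monitor.beat("quad1-psq", now=0.0)
monitor.beat("quad2-psq", now=4.0)
overdue = monitor.missing(now=6.0)  # only quad1-psq has been silent for more than 5 s
```

A missing heartbeat is only a symptom; whether it triggers a restart, a reboot or an isolation is a separate decision.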
Some Annotations
• Policy Enforcement (permissions, capabilities)
  • JVM security policy
  • SELinux/CSA policies
  • Process Protection Domain
  • System Protection Domain
  • ADF policies
  • Network Protection Domain
• Encryption
  • Outer VPN
  • ADF VPGs
  • Application level encryption
• Authentication
  • VPN level (router-router, router-hosts)
  • ADF level (host to host)
  • Application level
• Adaptive responses
  • Restore files
  • Kill processes
  • Isolate host
  • Reboot host
  • Retry PSQ operations
  • Adjust redundancy/level of tolerance (degrade)
  • Restart application
  • Quarantine network segments
• Detection and Correlation
  • Embedded sensors (applications, proxies, heartbeats)
  • Policy engines
  • NIDS
  • EMERALD
  • Advisor
Survivable Middleware
[Diagram, undefended: a client (application, PSQ middleware with common interface, PUB/SUB connector, common API implementation, transport substrate) talks to the PSQ platform (transport substrate, PSQ function, JBOSS, Rep); data arrives from outside. Diagram, survivable: the client zone (application, PSQ middleware with common interface, core survivability delegate, PUB/SUB connector, common API implementation, protocol handler, specialized stub), the crumple zone (PSQ proxy), the operations zone (PSQ function, JBOSS, Rep) and the executive zone. Other elements, e.g., management, are not shown.]
• Undefended Pub/Sub Middleware: password login, no redundancy at the core
• Survivable Middleware adds a stronger level of authentication, access control and reliability: cryptography-based login, redundant core, transparent protocol based on weak assumptions, multiple transports
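How a redundant core can mask a corrupted reply can be sketched with simple voting. This is not the DPASA protocol, whose details the slides do not give: the function `vote`, the threshold and the sample replies are assumptions. With 4 replicas and a threshold of 3 (a strict majority), at most one value can win.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def vote(replies: Sequence[Hashable], threshold: int) -> Optional[Hashable]:
    """Return the value reported by at least threshold replicas, else None.

    threshold should exceed half the number of replicas so the winner is unique.
    """
    if not replies:
        return None
    value, count = Counter(replies).most_common(1)[0]
    return value if count >= threshold else None


# Hypothetical replies from 4 replicas, one of them corrupted.
accepted = vote(["track-42", "track-42", "track-42", "garbage"], threshold=3)
# No value reaches the threshold here, so the client would retry or degrade.
rejected = vote(["track-42", "track-42", "garbage", "junk"], threshold=3)
```

When no value reaches the threshold, the caller has to choose between retrying the operation and lowering the level of tolerance, both of which appear among the adaptive responses listed earlier.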
Red Team Evaluation (Adversarial)
• Run 1
  • Defended system ran for 14 hours with no visible impact
  • The policies were so tight that the red team had no visibility of their actions or their impact
• Run 2 (policy modified to give the red team the visibility they requested)
  • 12 hour scenario completed, but the red team was able to cause significant hiccups during the scenario
  • With the added visibility they were able to DoS specific clients when those clients needed to publish information
• Run 3 (different red team)
  • Within an hour they took out the PIX VPNs!
  • Residual flaw in the Cisco router configuration (recall that red teams have complete knowledge of everything): in addition to the agreed-upon span port, the configuration also gave them trunk port access!
  • Rerunning the same attack without the trunk port did not succeed, but the red team was divided on whether the attack could be customized to work without trunk port access
Although access to trunk ports in multiple routers in a backbone is a considerable amount of privilege, run 3 exposed and exploited the tradeoff we made in the design!
No loss of published information or corruption!
Red Team Evaluation (Cooperative)
• Compressed 3 hr scenario
• "Traitor" blue team member(s) worked with the red team
• Red team started inside the defended system with attack code pre-positioned
• High-level access and higher privilege implied that some of the sensors were blind
• New flaws and defense opportunities exposed: Bad refs + Spread, Java serialization, MSQL Injection, ADF Policy Server exploit
This extraordinary success of the defense required considerable human help
Current Conclusion
• A high-water mark in survivable system design
  • Proof that information systems can be made highly survivable
  • Survivability architecture: individual mechanisms abound; this was a great first example of integrating them coherently with a tight and consistent policy
• There is no such thing as "improbable risk" against a highly motivated adversary
  • Exploiting the SPOF PIX VPN routers was assessed to be an improbable risk
• Created a daunting level of difficulty to breach confidentiality and integrity, but availability is not there yet
  • That is despite all the redundancy, diversity and adaptive response
  • Loss is easily detected
• Human intelligence is required in interpreting observed information and controlling the architecture
Future Direction • What to do with availability • Beyond degradation? • Regenerate? Learn while you regenerate? • Artificial diversity? • Minimizing the need for human intelligence? • Motivation • Cost issue • Response time • Human factors • Can there be an expert system/expert assistant?
Reference Material
• For more information
  • Papers about this project:
    • http://www.dist-systems.bbn.com/papers/2005/ACSAC/index2.shtml
    • http://www.dist-systems.bbn.com/papers/2005/ACSAC/index.shtml
    • http://www.dist-systems.bbn.com/papers/2006/NCA/index.shtml
    • http://www.dist-systems.bbn.com/papers/2005/NCA/index.shtml
  • Other BBN papers
    • Michael Atighetchi, Partha Pal, Franklin Webber, Richard Schantz, Christopher Jones, Joseph Loyall. Adaptive Cyberdefense for Survival and Intrusion Tolerance. IEEE Internet Computing, Vol. 8, No. 6, November/December 2004, pp. 25-33.
    • http://www.dist-systems.bbn.com/papers/2006/SPE/index.shtml
  • COCA
    • http://www.cs.cornell.edu/home/ldzhou/coca.htm
  • MAFTIA paper
    • http://www.maftia.org/
  • OASIS book:
    • http://csdl2.computer.org/persagen/DLAbsToc.jsp?resourcePath=/dl/proceedings/&toc=comp/proceedings/oasis/2003/2057/00/2057toc.xml
• Useful technologies
  • ADF (3COM, Secure Computing, Adventium Labs)
    • http://doi.ieeecomputersociety.org/10.1109/DSN.2006.17
    • http://doi.ieeecomputersociety.org/10.1109/DISCEX.2001.932222
  • SELinux, CSA
    • http://www.nsa.gov/selinux/
    • http://www.cisco.com/en/US/products/sw/secursw/ps5057/index.html
  • EMERALD (SRI)
    • http://www.csl.sri.com/projects/emerald/
  • Routers, managed switches (various vendors: Cisco, HP etc.)
    • http://www.cisco.com/warp/public/707/21.html
    • http://www.hp.com/rnd/index.htm
  • Tripwire (Tripwire Inc), Veracity (Rocksoft)
    • http://www.tripwire.com/index.cfm
  • Spread (JHU, Spread Concepts)
    • http://www.spreadconcepts.com/
    • http://www.dsn.jhu.edu/research/group/secure_spread/
  • BFT protocols
    • L. Lamport, R. Shostak, and M. Pease. The Byzantine generals problem. ACM Trans. Program. Lang. Syst., 4(3):382-401, 1982.
    • http://www.cs.cornell.edu/fbs/publications/2004-1924.pdf
    • Malkhi, Reiter, Castro, Liskov etc.
  • Advanced middleware like QuO
    • http://quo.bbn.com