Honeywell Laboratories. CORTEX: Mission-Aware Closed-Loop Cyber Assessment and Response. 1/27/05 PI Meeting. David Musliner, Christopher Geib, Mike Pelican. Outline. Project overview. Thin-slice initial demo. Proactive response planning. Planner evaluation tools. Quadchart.
Honeywell Laboratories CORTEX: Mission-Aware Closed-Loop Cyber Assessment and Response 1/27/05 PI Meeting David Musliner Christopher Geib Mike Pelican
Outline • Project overview. • Thin-slice initial demo. • Proactive response planning. • Planner evaluation tools. • Quadchart.
Project Overview • Technical Objectives – Automated defense systems that: • Model and understand their changing mission needs. • Automatically develop defensive plans to recognize and stop attacks. • Automatically regenerate and rebuild system infrastructure. • Learn to prevent attacks. • Resulting in a highly reliable self-regenerative system. • Existing Practice – Very limited condition-action rules within some intrusion detection systems (IDSs). • Not mission-aware, not self-aware. • No lookahead, no proactive resource testing. • No dynamic replanning or performance tradeoffs.
Project Overview • Technical Approach – Integrate, extend & improve: • Scyllarus’ state of the art intrusion detection/correlation technology. • CIRCADIA’s automated planning and controller synthesis. • Learning methods to: • Refine models of attacks. • Improve recognition of new attacks. • Truly New – • Mission-aware, context-sensitive response and self-regeneration. • Planned preemptive self-testing to detect faults in mission-critical assets before they are required. • Focused learning to improve the system’s performance on its specific mission.
The CORTEX Vision (architecture diagram; recoverable labels only) • Components: System Reference Model (mission, behaviors, faults, threats); Mission-Aware Meta Planner; Controller Synthesis Module; Active Security Controller Executive; Dynamic Evidence Aggregator; Learning. • Flow labels: Mission/phase specific planning problem; Custom reactive plan (proactive protection, reactive defense, and healing); Unexpected states, unhandled contingencies; Sensor inputs; Likely Security Situation; System, security, and mission application actions.
Overview (cont’d) • Major Risks and Mitigations – • Planning domain complexity: • System demonstrations on limited-scope domain. • Scalable synthetic evaluation domains for planning. • Alternative planning approaches. • Learning: • Focused learning techniques for knowledge-rich parts of the problem (e.g., learning size limits on buffer overflow vulnerability). • Aggressive schedule: • Thin-slice first demonstration emphasizing infrastructure. • Cyclic development plan focusing on incremental improvement in each sub-area.
Overview (cont’d) • Quantitative Metrics – • Measures of attack learning and detection rates. • Respond to 100% of detected attacks. • Expected Major Achievements – • High confidence intrusion assessment and diagnosis. • Pre-planned responses to contain/recover from faults and attacks. • Automatic tradeoffs of security vs. service level & accessibility. • Learning to recognize and defeat novel attacks.
Overview (cont’d) • Task Schedule: • Develop thin-slice demonstration (first version complete). • Extend scenario (in progress). • Develop learning capability & experiments (in progress). • Model mission phases (in progress). • Proactive response planning (in progress). • Milestone demos: Mission Aware Demo, Thin slice demo, Learning Demo (timeline marks on the chart: JUL 04, DEC 04, APR 05, DEC 05).
Thin Slice Demo: Self-Regenerative MySQL
Demo Objectives • Implement “taste-tester” architecture to form a redundant, high-reliability MySQL server system. • Illustrate detection and self-regenerative response to successful attack. • Illustrate (simple) learning to improve immunity. • Provide basis for future demonstrations of multi-phase mission-awareness and learning.
Demo Scenario • N (8) MySQL servers are available as redundant, replicable assets. • Queries arrive and are processed by the designated “Lead Taster”. • If the Lead Taster has no problem with the query, it is replicated to each of the servers. • If the Lead Taster fails: • Bad query is not sent to other servers. • A backup server becomes Lead Taster. • Bad query is sent to learning module for generalization. • Dead server is restarted. • Future occurrences of the same or similar exploits are ineffective.
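The scenario above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions, not the demo's implementation: `Server`, `TasteTester`, `ServerCrash`, and the "EXPLOIT" marker are hypothetical stand-ins for real MySQL servers and real exploits, and the "learning" step here blocks only exact repeats rather than generalizing to similar exploits.

```python
class ServerCrash(Exception):
    """Raised when a (simulated) query takes a server down."""


class Server:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.applied = []  # queries this replica has executed

    def execute(self, query):
        # Hypothetical stand-in: any query containing "EXPLOIT" kills the server.
        if "EXPLOIT" in query:
            self.alive = False
            raise ServerCrash(self.name)
        self.applied.append(query)


class TasteTester:
    def __init__(self, n=8):
        self.servers = [Server(f"db{i}") for i in range(n)]
        self.lead = 0          # index of the current Lead Taster
        self.blocked = set()   # stand-in for learned rules (exact match only)

    def submit(self, query):
        if query in self.blocked:
            return "blocked"
        lead = self.servers[self.lead]
        try:
            lead.execute(query)  # the Lead Taster tries the query first
        except ServerCrash:
            self.blocked.add(query)                           # send bad query to "learning"
            self.lead = (self.lead + 1) % len(self.servers)   # a backup becomes Lead Taster
            lead.alive = True                                 # dead server is restarted
            return "rejected"
        for i, server in enumerate(self.servers):
            if i != self.lead:
                server.execute(query)  # replicate the good query to the others
        return "ok"
```

Because the bad query never reaches the other servers, all replicas stay consistent after the Lead Taster is swapped out and restarted.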
Demo Development Process • Design architecture for integrating sensor data aggregation, reaction planning, plan execution, and learning. • Design reduced-scope architecture for Demo 1. • Survey MySQL vulnerabilities to identify suitable host versions and exploits. • Build infrastructure and simple visualization machinery. • Execute demonstration with hand-generated plan. • Build planning input model of domain. • Evaluate planner performance on domain model.
Demo 1 Architecture (diagram; recoverable labels only) • Components: Snort, Snort Rules, Verter, RTS, Replicator, Distributor, Lead Taster, Tasters, Learning. • Other labels: SQL Query; Alert; Tail alerts; Tail xml; High Events; Push Cache; HB_sync; HB_sync, good/bad, Query; Good | Bad Query; Good Query; Result. • Annotations: If(alert) Q=Qb Else Q=Qg. • If(hb_sync_good) { Replicate to all tasters }. • If(hb_sync_bad) { switch to next taster }. • If(hb_sync_bad) { Send bad query to learning }. • Are we dead after this “good” query? • Write new snort rules via CIRCADIA proto. • Append new rule; after rule update, Kill -HUP.
Assumptions • Attacks take the Lead Taster off line. • We are now beginning to look at other forms of attacks. • The query just processed is responsible for failures. • Queries must be transactional in effect. • Required adding synchronous commits for non-transactional administrative commands that did, in fact, contain a vulnerability. • For “binary poisons,” we assume that preventing the final step of the attack is sufficient.
Before the Attack (diagram labels: Bad Guy, Good Guy, Snort, Verter, RTS (Executive), Replicator, Tasters)
Before the Attack • Bad Guy enters exploit…
After the Attack • Lead Taster died. • RTS detects failure and switches Lead, sends bad query to learning. • New Lead Taster takes over.
Before the 2nd Attack • Dead Taster is restarted. • Learner builds new tailored Snort rule.
After the 2nd Attack • Bad Guy enters exploit again… • To no avail; system has learned to block bad query.
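The deck does not show the rule text the learner emits, so the following is only a guess at one simple form it could take: an exact-match Snort rule whose `content` option encodes the offending query as hex bytes. The function name, the message string, the `any any -> any 3306` header (3306 being the default MySQL port), and the SID value are all assumptions made for illustration; the rule options used (`msg`, `content`, `sid`, `rev`) are standard Snort rule options.

```python
def snort_rule_for_query(query: bytes, sid: int) -> str:
    """Build an exact-match Snort alert rule for a query seen to kill the Lead Taster.

    The payload is hex-encoded inside |...| so that quotes, semicolons, and
    backslashes in the query need no escaping in the rule text.
    """
    hex_bytes = " ".join(f"{b:02X}" for b in query)
    return (
        f'alert tcp any any -> any 3306 (msg:"learned bad query"; '
        f'content:"|{hex_bytes}|"; sid:{sid}; rev:1;)'
    )
```

A rule like this matches only the identical byte sequence; blocking "similar" exploits, as the demo scenario describes, would need the generalization step performed by the learning module, which is not modeled here.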
Simple Planner Model for Demo 1

(def-temporal query-arrives
  :preconds ((query F))
  :postconds ((query T))
  :delay-distribution (uniform-distribution 10 20)
  :min-delay 10
  )

(def-temporal query-stale
  :preconds ((query T))
  :postconds ((failure T))
  :delay-distribution (uniform-distribution 20 50)
  :min-delay 20
  )

Planner Model (cont’d)

(def-reliable process
  :preconds ((taster T) (query T))
  :postconds ( (.5 (taster F) (query F) (hb-sync F))
               (.5 (current F) (query F) (hb-sync T)))
  :delay-distribution (uniform-distribution 1 1)
  :cost 1
  )

(def-action replicate-to-tasters
  :preconds ( (current F) (taster T) (backup T) )
  :postconds ( (current T) )
  :wcet 1
  :cost 1
  )
Planner Model • Goal: maximize Expected Utility (EU). • Rewards: maintain “(current T)” for 10 utils/tick. • Arbitrary duration: 200 ticks. • Maximum possible EU < 2000 (200 ticks × 10 utils/tick). • Strictly less than 2000 because some queries will arrive, and handling them incurs cost. • Planner uses goal-driven heuristic to derive plan. • Evaluates safety and EU performance of plan using simulation (sampling). • Backtracks/jumps to create new plans, directed by failures. • Search is not yet well-directed after a non-failure plan is found.
First Safe Plan Found • Blue states satisfy goal. • Two non-goal states. • EU = 1880. • Elapsed planning time: 800 milliseconds. • If query kills taster, wait until next query arrives to switch tasters and rebuild the dead one.
12th Safe Plan Found • Only one non-goal state. • EU = 1940. • Elapsed planning time: 30 minutes. • Key: Switch tasters and restart backup server immediately, even though you are in the goal state. • This pre-positions the system for the eventuality of being pushed out of the goal state, pre-arranging resources to speed restoration of the goal state.
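The difference between the two plans can be illustrated with a rough Monte Carlo sketch of the Demo 1 model. This is not the CIRCADIA planner or its simulator, and its numbers should not be expected to match the EU values reported on the slides. Assumptions: time is discrete, delays are integer draws, the `process` transition fires as soon as a query and a live taster coexist, only one transition runs at a time, and `query-stale` is omitted because under both policies every query is handled well inside its 20-tick minimum. The names `simulate` and `estimate_eu` and the `eager` flag are hypothetical.

```python
import random

HORIZON = 200  # ticks, as on the slide
REWARD = 10    # utils per tick while (current T) holds


def simulate(eager: bool, rng: random.Random) -> float:
    """One sampled run. eager=True repairs immediately; False waits for the next query."""
    taster = backup = current = True
    t, utility = 0, 0.0
    next_query = rng.randint(10, 20)

    def advance(ticks: int, cost: int = 0) -> None:
        nonlocal t, utility
        ticks = min(ticks, HORIZON - t)
        if current:                      # reward accrues only in the goal state
            utility += REWARD * ticks
        utility -= cost
        t += ticks

    while t < HORIZON:
        query = t >= next_query
        if query and taster:             # process: 1 tick, cost 1
            advance(1, cost=1)
            if rng.random() < 0.5:
                taster = False           # bad query kills the Lead Taster
            else:
                current = False          # good query: backups are now stale
            next_query = t + rng.randint(10, 20)
        elif not current and taster and backup:
            advance(1, cost=1)           # replicate-to-tasters
            current = True
        elif not taster and backup and (eager or query):
            advance(1, cost=1)           # switch tasters
            taster, backup = True, False
        elif not backup and (eager or query or not current):
            advance(5, cost=1)           # rebuild-taster (wcet 5)
            backup = True
        else:
            advance(1)                   # nothing to do this tick
    return utility


def estimate_eu(eager: bool, samples: int = 2000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(simulate(eager, rng) for _ in range(samples)) / samples
```

Under these assumptions the eager policy scores higher because the five-tick rebuild happens while the system is still in the goal state; the lazy policy ends up rebuilding while the backups are stale, when every tick forfeits reward.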
Improving the Planner • Local search (plan patching) based on heuristic guidance. • E.g.: If the current plan includes a multi-step chain to re-establish a maintenance goal, try to move one or more of the steps earlier, before the goal is violated. • Random restarts probably required to escape local maxima. • Investigate alternative solution method: map to MDPs. • Younes (CMU): Tempastic-DTP planner maps GSMDP problems to MDPs using phase-type distributions. • Exponential state space growth, but solution method is non-iterative.
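The random-restart idea can be shown in generic form. The sketch below is plain hill climbing over an abstract search space, not the plan-patching operator described on the slide; `score`, `neighbors`, and `random_start` are hypothetical callbacks that a planner would have to supply (for example, plan EU and single-step plan edits).

```python
import random


def hill_climb(score, neighbors, start):
    """Greedy local search: move to the best neighbor until none improves."""
    current = start
    while True:
        best = max(neighbors(current), key=score, default=current)
        if score(best) <= score(current):
            return current               # local maximum reached
        current = best


def random_restart_search(score, neighbors, random_start, restarts, rng):
    """Repeat hill climbing from random starts and keep the best local maximum."""
    best = None
    for _ in range(restarts):
        candidate = hill_climb(score, neighbors, random_start(rng))
        if best is None or score(candidate) > score(best):
            best = candidate
    return best


# Toy landscape: index 1 is a local maximum, index 5 is the global one.
LANDSCAPE = [1, 3, 2, 0, 4, 9, 5]


def toy_score(i):
    return LANDSCAPE[i]


def toy_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]
```

On the toy landscape, a single climb from index 0 stalls at index 1, while restarts from other positions can reach index 5.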
Scalable Planner Evaluation Domains • In addition to demo-specific domains, we have built scalable test domain generators to provide rigorous evaluation metrics. • Expands test coverage to domains where utilities and probabilities determine success. • Includes abstractions for important SRS domain characteristics. • Goal: help drive CORTEX planner development by identifying relevant weaknesses.
Basic Abstractions • Each test consists of "games", revolving around a single "goal". • Dwell goals: per-tick reward for maintaining a feature in face of clobbering threats, e.g., providing a network service, while under attack. • Achievement goals: one-time reward for completing multi-step process, e.g., configuring a network. • Goals and threats can be combined to test scalability or the ability to make trade-offs.
Example Scalability Baseline • Domain: single dwell goal subject to N threats. • Threat delay: uniform distribution from 1 to 100. • Time-to-failure: 20 ticks. • Response time: 1 tick.
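A generator for this baseline might look roughly like the sketch below, which emits planner-model text in the same def-temporal / def-action style as the Demo 1 model. The actual generator and its output format are not shown in the deck, so everything here is an assumption: the function name, the feature names (`threat-i`, `service`), the transition names, and the `:cost 1` and `:reward 10` values are made up for illustration. Only the delay range (1 to 100), the 20-tick time-to-failure, and the 1-tick response time come from the slide.

```python
def scalability_domain(n_threats: int) -> str:
    """Emit a single-dwell-goal domain with n_threats independent threats."""
    forms = []
    for i in range(1, n_threats + 1):
        forms.append(                      # threat appears after a uniform 1..100 delay
            f"(def-temporal threat-{i}-arrives\n"
            f"  :preconds ((threat-{i} F))\n"
            f"  :postconds ((threat-{i} T))\n"
            f"  :delay-distribution (uniform-distribution 1 100)\n"
            f"  :min-delay 1)"
        )
        forms.append(                      # unanswered threat clobbers the goal after 20 ticks
            f"(def-temporal threat-{i}-clobbers\n"
            f"  :preconds ((threat-{i} T))\n"
            f"  :postconds ((service F))\n"
            f"  :delay-distribution (uniform-distribution 20 20)\n"
            f"  :min-delay 20)"
        )
        forms.append(                      # response removes the threat in 1 tick
            f"(def-action respond-{i}\n"
            f"  :preconds ((threat-{i} T))\n"
            f"  :postconds ((threat-{i} F))\n"
            f"  :wcet 1\n"
            f"  :cost 1)"                  # cost value is an assumption
        )
    forms.append(                          # dwell goal; reward value is an assumption
        "(def-maintenance-goal keep-service\n"
        "  :features ((service T))\n"
        "  :reward 10)"
    )
    return "\n\n".join(forms)
```

Calling `scalability_domain(N)` for increasing N gives a family of structurally identical problems whose size grows linearly with the number of threats, which is the kind of scaling knob the baseline describes.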
CORTEX: Mission-Aware Closed-Loop Cyber Assessment and Response (quadchart) NEW IDEAS • System Reference Model including mission models drives intrusion assessment, diagnosis, and response. • Automatically search for response policies that optimize tradeoff of security against mission ops. • “Taste-tester” server redundancy supports robustness and learning from new attacks. IMPACT • High confidence intrusion assessment and diagnosis. • Pre-planned automatic responses to contain and recover from faults and attacks. • Automatic tradeoffs of security vs. service level & accessibility. • Learns to recognize and defeat novel attacks. SCHEDULE • Demos: Mission Aware Demo, Thin slice demo, Learning Demo (timeline marks on the chart: JUL 04, DEC 04, APR 05, DEC 05). DIAGRAM LABELS • Attacks, intrusions; Computing services; Networks, Computers; Security Tradeoff Planner; Controller Synthesis Module; Scyllarus Intrusion Assessment; Active Security Controller Executive; CIRCADIA.
How Scyllarus Intrusion Detection Works (diagram; recoverable labels only) • Intrusion Reference Model: Network Model, Security Model, Attack Models. • Audit Reports: audit report of communication attempt; audit report of unauthorized user; audit report of network probe. • Dynamic Evidence Aggregator. • Hypotheses (possible situations): H1, H2; accidentally mis-configured application; intrusion in progress. • Likely Security Situation: intrusions, attacks.
Daily Traffic Example: Sifting Key Events from Raw Reports (diagram; recoverable labels only) • Sources: IDS-1, IDS-2, IDS-3. • Stages: Clustering Reports into Events; Evidence Analysis. • Event categories: Interesting events; Believable interesting events; Uninteresting events. • Counts shown: 16,000 raw reports; 4000; 1000; 10.
CIRCADIA Controller Synthesis Module (diagram labels: Security Tradeoff Planner; Controller Synthesis Module; Verifier; Scheduler; Projection/Synthesis Algorithm; Threat Model; Dynamics Model; Action Model; Active Security Controller Executive) • The Controller Synthesis Module reasons about models of goals, threats, cyberspace dynamics and actions to derive new sets of control rules online. • Timed automata models capture temporal constraints, probabilities. • Game theoretic view plus time: search for controller automaton while projecting adversary’s moves. • Temporal reasoning derives requirements on sensing/monitoring. • Formal methods verify controller behavior against policy requirements.
Controlled State Space Graph • Considers different orders of attacker actions, consistent with preconditions. • Factored, transition-based attacker model allows CIRCADIA to generalize beyond single-path characterization of a given attack script. • Includes sequences of CIRCADIA actions to prevent further damage and recover from current (non-goal) situations.
Motivation • Current computational mission (resources, tasks) affects: • Detection of attacks and failures. • Appropriate responses. • Existing intrusion detection and response does not incorporate knowledge of mission. • Thesis: mission awareness will enable Self-Regenerative System behavior.
Scyllarus A management and analysis system for network security monitoring: • Correlates reports from many disparate intrusion detectors to provide information useful to operating personnel or administrators. • Weighs evidence for/against intrusions to reduce false alarms. • Assesses intrusion events for plausibility and severity. • Discounts attacks against non-susceptible targets. • Consolidates and retains all report data for forensic investigation. • Maintains detector and system configuration information.
Scyllarus Capability Summary • Process reports from a variety of intrusion detection sensors: network, host, and hybrid; commercial, open-source, research. • Process substantial report volume: thousands of reports/hour. • Provide significant reductions in report volume: thousands -> tens. • Monitor sizeable networks: up to 1000 nodes with one system. • Cluster and correlate reports from multiple sensors: more effective detection of stealthy attacks; vast reduction in false alarms and noise. • Categorize events for efficient review: plausibility, severity, utility of events. • Discount attacks on unsusceptible targets. • Retain events and reports in database for forensic analysis.
CIRCADIA Cooperative Intelligent Real-time Control Architecture for Dynamic Information Assurance • Autonomic defense for computing resources. • Adaptive monitoring. • Real-time reactive control responses. • Uses control-theoretic methods to automatically synthesize its control strategies, rather than relying on hand-built rules or other knowledge.
Automatically Synthesizing Security Control Systems (CIRCADIA quadchart) NEW IDEAS • Use control theory to derive appropriate response actions automatically. • Automatically tailor monitoring and responses according to mission, available resources, varying threats, and policies. • Reason explicitly about response time requirements to provide performance guarantees. IMPACT • Automatic responses guaranteed to defeat intruders in real-time. • System derives appropriate responses for novel attack combinations. • Automatic tradeoffs of security and monitoring vs. service and accessibility. • Easier to deploy & maintain than manual rule bases. DIAGRAM LABELS • Computational mission services; Security Tradeoff Planner; Networks, computers; Controller Synthesis Module; Intrusion Assessment; Active Security Controller Executive; CIRCADIA.
CORTEX Advances (Beyond Scyllarus) • Add mission modeling capability to form System Reference Model. • Incorporate propagation models to represent information flow and filtering components. • Enhance state assessment for mission awareness: • Mission affects expected sensor behavior. • Mission affects criticality of failures and attacks. • Bring state assessment fully online for soft real-time performance. • Stretch Goal: Retrospective revision of alerts based on new information.
CORTEX Advances (Beyond CIRCADIA) • Automatically map System Reference Model elements to planning problem for controller synthesis. • Develop new controller synthesis algorithms for qualitative probabilistic models, based on local search. • Develop meta-level control to focus and adjust response planning algorithms based on mission phasing and urgency of self-reconfiguration. • Interface to state assessment for real-time response.
CORTEX Advances (Learning) • Adapt existing concept drift algorithms to update surprise levels (qualitative probabilities) within the threat models. • Adapt performance profiles within the Mission models and Self (meta-level) models. • Develop strategies for preemptively testing resource capacities based on mission, self, and threat models. • Predict and test for failures and adapt before they are critical.
;;; cortex-taster.lisp

#|
(defun t1 () (load "domains/taster/cortex-taster"))

(set-verifier-mode :meu)
(set-search-mode :forward)
(setf *sim-maxtime* 200)
(setf *max-utility* 2000)
(setf *debug-list* NIL)
(pushnew :top *debug-list*)
(pushnew :csm *debug-list*)
(pushnew :meu *debug-list*)
(setf *max-number-of-intermediate-plans-considered* 10000)
(setf *TEMPSWITCH-FIX-MC-SIM-CULPRIT-NO-OP-BUG* T)
(setf *store-all-improved-plans* T)
;;(setf *check-all-plans-diff* T)
;;(setf *backjump-if-inferior* T)
;;(setf *cautious-culprit-match* T)
(reset-randoms)

;; testing results stuff....
(setf *omit-no-ops* nil)
; a= first plan produced...
(setf a (first (last *stored-plan-list*)))
(setf b (first *stored-plan-list*))
(diff a b)
(mapcar #'eu *stored-plan-list*)
(mapcar #'elapsed-time *stored-plan-list*)
(restore-stored-plan a)
(davinci-draw-sim-reachable-states)
(restore-stored-plan b)
(davinci-draw-sim-reachable-states)
|#

(def-state scenario1-initial-state
  :features ((failure F)
             (query F)
             (current T)   ; backups are current
             (taster T)    ; taster is up
             (hb-sync T)   ; last query was good
             (backup T)    ; backup is up
             )
  )

(def-temporal query-arrives
  :preconds ((query F))
  :postconds ((query T))
  :delay-distribution (uniform-distribution 10 20)
  :min-delay 10
  )

(def-temporal query-stale
  :preconds ((query T))
  :postconds ((failure T))
  :delay-distribution (uniform-distribution 20 50)
  :min-delay 20
  )

(def-reliable process
  :preconds ((taster T) (query T))
  :postconds ( (.5 (taster F) (query F) (hb-sync F))
               (.5 (query F) (hb-sync T) (current F)))
  :delay-distribution (uniform-distribution 1 1)
  :delay (make-range 1 1)
  :cost 1
  )

;;; ************ manage tasters **************

(def-action send-to-learning-switch-tasterdb
  :preconds ( (taster F) (backup T) )
  :postconds ( (taster T ) (backup F) )
  :wcet 1
  :cost 1
  )

(def-action replicate-to-tasters
  :preconds ( (current F) (taster T) (backup T))
  :postconds ( (current T) )
  :wcet 1
  :cost 1
  )

(def-action rebuild-taster
  :preconds ( (backup F) )
  :postconds ( (backup T) )
  :wcet 5
  :cost 1
  )

;;; ************ problem def ***********

(def-machine system-ops
  (query-arrives
   query-stale
   process
   )
  )

(def-machine manage-system
  (send-to-learning-switch-tasterdb
   replicate-to-tasters
   rebuild-taster
   )
  )

(def-maintenance-goal dbcurrent
  ;;:features ((current T)(taster T)(backup T))
  :features ((current T))
  :reward 10
  )

(def-problem cortex-taster
  :version "$Revision: 1.2 $"
  :machines (system-ops
             manage-system
             )
  :initial-states (scenario1-initial-state)
  :transitions ()
  :goals (dbcurrent)
  )

(solve-problem cortex-taster)