Logic-based, data-driven enterprise network security analysis. Xinming (Simon) Ou, Assistant Professor, CIS Department, Kansas State University. COS 598D: Formal Methods in Networking, Princeton University, March 08, 2010.
Self Introduction • Brief Bio • PhD, Princeton University, 2005 • Post-doc, Purdue CERIAS, Idaho National Laboratory, 2006 • Assistant Professor, Kansas State University, 2006-now • Research Interests • Computer and network security, especially on formal and quantitative analysis • Programming languages, formal methods • Research Group • Argus: http://people.cis.ksu.edu/~xou/argus/
Overview of the two lectures • Lecture One • Datalog model for network attacks • SLG resolution for Datalog evaluation • Exhaustive proof generation for Datalog • Lecture Two • Formulating security hardening problem as a SAT solving problem • Applying MinCostSAT to achieve optimal security configuration • Open research problems
Cyber Defender's Life: Automated Situation Awareness [Figure: a reasoning system takes users and data assets, IDS alerts, network configuration, vulnerability reports, and security advisories (e.g. "Apache 1.3.4 bug!") as input.]
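Multi-step Attacks [Figure: an attacker on the Internet gets past Firewall 1 and exploits a buffer overrun on webServer in the demilitarized zone (DMZ); through Firewall 2, webServer uses an NFS shell to reach fileServer in the corporation network, which holds webPages and sharedBinary; a Trojan horse planted in sharedBinary then compromises workStation.]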
Two Questions • Are there potential attack paths in the system? • How can they happen? • How can they be addressed in an optimal way? • Are there attacks that are going on or have succeeded in the system? • How do you know? • How can the attack be countered? • What we are going to focus on: the first question (potential attack paths).
MulVAL (Ou, Govindavajhala, and Appel, USENIX Security 2005) [Figure: the MulVAL analyzer combines Datalog rules from security experts, vulnerability information (e.g. NIST NVD), user information, the output of vulnerability scanners (driven by vulnerability definitions, e.g. OVAL or the Nessus Scripting Language), and network reachability information from a network analyzer, to answer queries such as "Could root be compromised on any of the machines?"]
Network config (firewall analyzer): host access-control lists become reachability facts such as reachable(internet, webServer, tcp, 80). reachable(webServer, fileServer, nfs, -). . . .
Host config scanner: file permissions become facts such as fileOwner(webServer, '/bin/apache', root). fileAttr(webServer, '/bin/apache', r,w,x,r,0,0,r,0,0).
Host-based vulnerability scanner: installed software … … yields facts such as vulExists(dbServer, 'CVE-2009-2446', mySQL). vulExists(webServer, 'CVE-2006-3747', httpd).
Security advisories (US-CERT, NVD), e.g. "Apache 1.3.4 bug!": … … vulProperty('CVE-2009-2446', remote, privEscalation). vulProperty('CVE-2006-3747', remote, privEscalation).
Datalog Rules: a security expert encodes Linux security behavior, Windows security behavior, and common attack techniques, e.g. execCode(Host, PrivilegeLevel) :- vulExists(Host, Program, remote, privilegeEscalation), serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel), networkAccess(Host, Protocol, Port). The rules are completely independent of any site-specific settings.
Rule for NFS: accessFile(Server, Access, Path) :- nfsExport(Server, Path, Access, Client), reachable(Client, Server, nfs, -), execCode(Client, _Perm). [Figure: webServer in the DMZ uses an NFS shell to access sharedBinary and webPages on fileServer in the corp network.]
Rule for Trojan Horse: execCode(H, User) :- accessFile(H, write, Path), fileOwner(H, Path, User). [Figure: in the corp network, a Trojan horse written into sharedBinary on fileServer compromises workStation; the figure also shows webPages and projectPlan.]
Deducing new facts: the rule execCode(Host, PrivilegeLevel) :- vulExists(Host, Program, remote, privilegeEscalation), serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel), networkAccess(Host, Protocol, Port). is applied to • networkAccess(webServer, tcp, 80). (derived: the internet reaches webServer in the DMZ through Firewall 1) • serviceRunning(webServer, httpd, tcp, 80, apache). (from the vulnerability scanner) • vulExists(webServer, httpd, remote, privilegeEscalation). (from the vulnerability scanner and NVD) • Oops! execCode(attacker, webServer, apache). The attacker can run code on webServer with apache privilege.
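As a minimal sketch of the deduction above, assuming the facts are held as Python tuples rather than in a Datalog engine, the remote-exploit rule amounts to a join of its three body relations on the shared variables Host, Program, Protocol and Port (the function name `remote_exploit` is hypothetical):

```python
# Facts from the slide, held as Python tuples (an assumption: MulVAL reads
# Datalog facts produced by scanners, not Python sets).
vul_exists = {("webServer", "httpd", "remote", "privilegeEscalation")}
service_running = {("webServer", "httpd", "tcp", 80, "apache")}
network_access = {("webServer", "tcp", 80)}


def remote_exploit(vul_exists, service_running, network_access):
    """Apply execCode(Host, PrivilegeLevel) :-
         vulExists(Host, Program, remote, privilegeEscalation),
         serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
         networkAccess(Host, Protocol, Port).
    as a nested-loop join; returns the derived execCode tuples."""
    derived = set()
    for host, program, vul_range, effect in vul_exists:
        if (vul_range, effect) != ("remote", "privilegeEscalation"):
            continue
        for s_host, s_program, protocol, port, privilege in service_running:
            # Shared variables Host and Program must agree across subgoals.
            if (s_host, s_program) != (host, program):
                continue
            # Shared variables Host, Protocol, Port must match networkAccess.
            if (host, protocol, port) in network_access:
                derived.add((host, privilege))
    return derived


print(remote_exploit(vul_exists, service_running, network_access))
# {('webServer', 'apache')}
```

If the firewall blocked port 80 (an empty networkAccess relation), the same join would derive nothing.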
Advantages of using Prolog • Prolog’s goal-oriented evaluation is potentially more efficient. • Prolog provides more programming flexibility. Can we evaluate Datalog programs in Prolog?
However… • Prolog as a programming language cannot be directly used to evaluate Datalog. Take the facts parent(bill,mary). parent(mary,john). and the query ?- ancestor(X,Y). with three logically equivalent orderings of the rules: • (1) ancestor(X,Y) :- parent(X,Y). ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y). • (2) ancestor(X,Y) :- parent(X,Y). ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z). • (3) ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z). ancestor(X,Y) :- parent(X,Y). • Under Prolog's evaluation, (1) terminates; (2) returns all three answers and then loops until it runs out of stack; (3) loops before returning any answer.
Problem of SLD resolution: ancestor(X,Y) :- parent(X,Y). ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y). parent(bill,mary). parent(mary,john). SLD tree for ?- ancestor(X, Y). • Via parent(X,Y): X=bill, Y=mary: success; X=mary, Y=john: success. • Via parent(X,Z), ancestor(Z,Y): X=bill, Z=mary leads to ancestor(mary,Y), which via parent(mary,Y) gives Y=john: success, and via parent(mary,Z2), ancestor(Z2,Y) gives Z2=john and ancestor(john,Y): … failure. X=mary, Z=john leads to ancestor(john,Y): … failure.
Problem of SLD resolution: ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z). ancestor(X,Y) :- parent(X,Y). parent(bill,mary). parent(mary,john). The leftmost subgoal is always a fresh ancestor call, so the goal list grows without end: • ancestor(X, Y). • ancestor(Z, Y), parent(X, Z). • ancestor(Z1, Y), parent(Z, Z1), parent(X, Z). • ancestor(Z2, Y), parent(Z1, Z2), parent(Z, Z1), parent(X, Z). …
Problem of SLD resolution • Termination of cyclic Datalog programs depends not only on the logical semantics but also on the order of the clauses and subgoals. • This is a problem because such cyclic rules are commonplace in network security analysis, • e.g. after compromising one machine, the attacker can use it as a stepping stone to compromise another. • Datalog is a declarative language, so order should not matter. • A pure Datalog program should always terminate, since the number of tuples it can derive is bounded (finitely many constants, no function symbols).
Bottom-up Evaluation: ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z). ancestor(X,Y) :- parent(X,Y). parent(bill,mary). parent(mary,john). Semi-naïve evaluation: • Step 1 (base case): ancestor(bill,mary), ancestor(mary,john) • Step 2, iteration 1: ancestor(bill,john) • Step 2, iteration 2: no new tuples ("fixpoint")
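A minimal Python sketch of the semi-naïve steps above, specialized to this ancestor program (the function name `semi_naive_ancestor` is hypothetical): each iteration joins only the tuples that were new in the previous iteration (the delta) with parent, and evaluation stops when an iteration adds nothing. Clause and subgoal order play no role.

```python
def semi_naive_ancestor(parent):
    """Semi-naive bottom-up evaluation of
         ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
         ancestor(X,Y) :- parent(X,Y).
    Returns all ancestor tuples and the new tuples found per iteration."""
    ancestor = set(parent)   # Step 1 (base case): copy parent
    delta = set(parent)      # tuples that are new and not yet joined
    iterations = []
    while delta:             # Step 2: iterate until nothing new (fixpoint)
        new = set()
        for z, y in delta:               # ancestor(Z,Y) taken from the delta only
            for x, z2 in parent:         # parent(X,Z)
                if z2 == z and (x, y) not in ancestor:
                    new.add((x, y))
        iterations.append(new)
        ancestor |= new
        delta = new
    return ancestor, iterations


parent = {("bill", "mary"), ("mary", "john")}
ancestor, iterations = semi_naive_ancestor(parent)
print(sorted(ancestor))   # [('bill', 'john'), ('bill', 'mary'), ('mary', 'john')]
print(iterations)         # [{('bill', 'john')}, set()]
```

The same function also terminates on cyclic data such as parent = {("a", "b"), ("b", "a")}, because the set of possible tuples is finite.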
SLG Resolution • Goal-oriented evaluation • Predicates can be "tabled" • A table stores the evaluation results of a goal. • The results can be reused later, i.e. dynamic programming. • Entering an active table indicates a cycle. • A fixpoint operation is performed at such tables. • The XSB system implements SLG resolution • Developed at Stony Brook University (http://xsb.sourceforge.net/). • Provides full ISO Prolog compatibility.
SLG resolution example: ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z). ancestor(X,Y) :- parent(X,Y). parent(bill,mary). parent(mary,john). • Generator node: the call ancestor(X, Y) creates a new table for ancestor(X,Y). • Active node: in ancestor(Z, Y), parent(X, Z), the subgoal ancestor(Z,Y) is resolved against the results in the table for ancestor(X,Y). • Branch parent(X,Y): X=bill, Y=mary: success; X=mary, Y=john: success. • Consuming table results: Z=bill, Y=mary gives parent(X, bill): failure; Z=mary, Y=john gives parent(X, mary), X=bill: success, i.e. ancestor(bill,john); Z=bill, Y=john gives parent(X, bill): failure. • No new results remain, so the table is complete.
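The tabling idea can be sketched in Python for this one program. This is a much-simplified stand-in, not XSB's SLG resolution: a call that re-enters the active table consumes the answers collected so far, and the generator simply re-runs its clauses until the table stops growing, whereas SLG suspends and resumes consumer nodes. The class `AncestorTable` and its single call pattern ancestor(_,_) are assumptions for illustration.

```python
class AncestorTable:
    """Simplified tabled evaluation of the left-recursive program
         ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
         ancestor(X,Y) :- parent(X,Y).
    for the single call pattern ancestor(_,_). Not XSB's SLG resolution:
    consumers read a snapshot and the generator re-runs until fixpoint."""

    def __init__(self, parent):
        self.parent = parent
        self.table = None       # answers for ancestor(_,_); None = no table yet
        self.active = False     # True while the generator is still evaluating

    def ancestor(self):
        if self.active:
            # Re-entering an active table signals a cycle: consume the answers
            # collected so far instead of calling the clauses again.
            return set(self.table)
        if self.table is not None:
            return self.table   # completed table: reuse the stored answers
        self.table, self.active = set(), True   # generator node: new table
        while True:
            before = len(self.table)
            # Clause 1: resolve ancestor(Z,Y) against the table, then parent(X,Z).
            for z, y in self.ancestor():
                for x, z2 in self.parent:
                    if z2 == z:
                        self.table.add((x, y))
            # Clause 2: ancestor(X,Y) :- parent(X,Y).
            self.table |= self.parent
            if len(self.table) == before:   # no new answers: fixpoint
                break
        self.active = False                 # table is complete
        return self.table


answers = AncestorTable({("bill", "mary"), ("mary", "john")}).ancestor()
print(sorted(answers))   # [('bill', 'john'), ('bill', 'mary'), ('mary', 'john')]
```

Unlike Prolog on this clause order, the sketch terminates, and it still does so on a cyclic relation such as {("a", "b"), ("b", "a")}, returning four answers.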
SLG in MulVAL: netAccess(H2, Protocol, Port) :- execCode(H1, User), reachable(H1, H2, Protocol, Port). [Figure: the table for the goal netAccess(…) holds its possible instantiations; it is computed from the table for the first subgoal execCode(…), which holds that subgoal's possible instantiations, joined with reachable tuples from the input.]
SLG complexity for Datalog • Total time is dominated by the rule that has the maximum number of instantiations. • Time for computing one table = computing its subgoals + retrieving information from input tuples + matching results in the rules' bodies • Time for computing all tables = retrieving information from input tuples + matching results in the rules' bodies • See "On the Complexity of Tabled Datalog Programs" http://www.cs.sunysb.edu/~warren/xsbbook/node21.html
MulVAL complexity in SLG: execCode(Attacker, Host, User) :- vulExists(Host, _, Program, remote, privilegeEscalation), networkService(Host, Program, Protocol, Port, User), netAccess(Attacker, Host, Protocol, Port). This rule has O(N) different instantiations, which scale with network size.
MulVAL complexity in SLG: netAccess(Attacker, H2, Protocol, Port) :- execCode(Attacker, H1, _), reachable(H1, H2, Protocol, Port). This rule has O(N²) different instantiations (H1 and H2 each range over the hosts), which scale with network size; hence the complexity of MulVAL is O(N²).
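To see the O(N²) growth concretely, here is a naive-evaluation sketch of the two rules on a hypothetical, fully connected network of N hosts, each running a remotely exploitable service. The network, the host names, the attacker's starting fact, and the collapsed `vulnerable_service` relation (standing in for vulExists plus networkService) are all assumptions. At the fixpoint it counts the body instantiations of the netAccess rule: N from the internet plus N(N-1) between hosts, i.e. N² in total.

```python
def multi_hop_instantiations(n):
    """Naive fixpoint of two simplified MulVAL rules on a hypothetical network
    where the internet reaches all n hosts, every host reaches every other
    host on tcp/80, and every host runs an exploitable service there.
    Returns (number of execCode facts, body instantiations of the netAccess rule)."""
    hosts = [f"h{i}" for i in range(n)]
    reachable = {("internet", h, "tcp", 80) for h in hosts}
    reachable |= {(a, b, "tcp", 80) for a in hosts for b in hosts if a != b}
    # Hypothetical stand-in for vulExists + networkService: (host, proto, port, user).
    vulnerable_service = {(h, "tcp", 80, "apache") for h in hosts}
    exec_code = {("attacker", "internet", "user")}   # attacker's starting point
    while True:
        # netAccess(A, H2, Proto, Port) :- execCode(A, H1, _), reachable(H1, H2, Proto, Port).
        net_access = {(a, h2, proto, port)
                      for (a, h1, _user) in exec_code
                      for (src, h2, proto, port) in reachable if src == h1}
        # execCode(A, H, User) :- exploitable service on H, netAccess(A, H, Proto, Port).
        new_exec = {(a, h, user)
                    for (a, h, proto, port) in net_access
                    for (s_host, s_proto, s_port, user) in vulnerable_service
                    if (s_host, s_proto, s_port) == (h, proto, port)}
        if new_exec <= exec_code:    # nothing new: fixpoint reached
            break
        exec_code |= new_exec
    # One instantiation per (execCode fact, reachable fact) pair sharing H1.
    instantiations = sum(1 for (_a, h1, _u) in exec_code
                         for r in reachable if r[0] == h1)
    return len(exec_code), instantiations


for n in (2, 4, 8):
    print(n, multi_hop_instantiations(n))   # 2 (3, 4) / 4 (5, 16) / 8 (9, 64)
```

The execCode count grows linearly (N + 1 facts here), matching the O(N) bound on the previous slide, while the multi-hop rule's instantiations grow quadratically.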
Datalog proof generation • In security analysis, we want to know not only what attacks could happen but also how they can happen. • Thus, we need more than a yes/no answer for queries. • We need the proofs for the true queries, which in security analysis are attack paths. • We also want to know all possible attack paths; thus we need exhaustive proof generation.
An obvious approach: add a proof argument to every predicate, so that execCode(Host, PrivilegeLevel) :- vulExists(Host, Program, remote, privilegeEscalation), serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel), networkAccess(Host, Protocol, Port). becomes execCode(Host, PrivilegeLevel, Pf) :- vulExists(Host, Program, remote, privilegeEscalation, Pf1), serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel, Pf2), networkAccess(Host, Protocol, Port, Pf3), Pf=(execCode(Host, PrivilegeLevel), [Pf1, Pf2, Pf3]). Because proof terms nest subproofs and can grow without bound, this breaks the bounded-term property and results in non-termination for cyclic Datalog programs.
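A small Python sketch of why such proof terms break the bounded-term property, using a hypothetical two-node cycle a→b→a and the rules path(X,Y) :- edge(X,Y). path(X,Y) :- edge(X,Z), path(Z,Y). Only four path facts exist over two nodes, yet the number of distinct proof terms for path(a,b) keeps growing as the allowed proof height grows, so evaluation that carries full proofs as arguments never reaches a fixpoint. The names `EDGES`, `NODES` and `proofs` are illustrative.

```python
from functools import lru_cache

EDGES = frozenset({("a", "b"), ("b", "a")})   # hypothetical cyclic network
NODES = ("a", "b")


@lru_cache(maxsize=None)
def proofs(x, y, height):
    """All proof terms for path(x, y) of height at most `height`, built like
    Pf = (Goal, [SubProofs]) on the slide, for the rules
      path(X,Y) :- edge(X,Y).
      path(X,Y) :- edge(X,Z), path(Z,Y)."""
    if height == 0:
        return ()
    found = []
    if (x, y) in EDGES:
        found.append((("path", x, y), [("edge", x, y)]))
    for z in NODES:
        if (x, z) in EDGES:
            for sub in proofs(z, y, height - 1):
                found.append((("path", x, y), [("edge", x, z), sub]))
    return tuple(found)


# Only 4 path facts exist over 2 nodes, but proofs keep multiplying.
print([len(proofs("a", "b", h)) for h in range(1, 7)])   # [1, 1, 2, 2, 3, 3]
```

Each extra trip around the cycle yields a new, longer proof of the same fact, which is exactly what the recorded-proof-step approach on the next slides avoids.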
MulVAL Attack-Graph Toolkit (Ou, Boyer, and McQueen, ACM CCS 2006; joint work with Idaho National Laboratory) [Figure: security advisories, network configuration, and machine configuration are converted into a Datalog representation; the Datalog rules are translated into rules that record proof steps; the XSB reasoning engine outputs Datalog proof steps, from which the Graph Builder produces a Datalog proof graph.]
Stage 1: Record Proof Steps: netAccess(H2, Protocol, Port, ProofStep) :- execCode(H1, User), reachable(H1, H2, Protocol, Port), ProofStep = because('multi-hop network access', netAccess(H2, Protocol, Port), [execCode(H1, User), reachable(H1, H2, Protocol, Port)]). Each proof step mentions only the rule's head and body facts, not their subproofs, so terms stay bounded.
Stage 2: Build the Exhaustive Proof. Each recorded proof step, e.g. because('multi-hop network access', netAccess(fileServer, rpc, 100003), [execCode(webServer, apache), reachable(webServer, fileServer, rpc, 100003)]), becomes a graph fragment [Figure: the fact nodes execCode(webServer, apache) and reachable(webServer, fileServer, rpc, 100003) point into a derivation node labeled "multi-hop network access", which points to netAccess(fileServer, rpc, 100003).]
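Stage 2 can be sketched in Python as follows, assuming each recorded step arrives as a (rule description, head, body) triple mirroring because(...) and that facts are plain strings (both assumptions; `build_proof_graph` is a hypothetical name). Fact nodes are shared through a dictionary, so a fact derived by several steps, or used by several steps, becomes one node, and finding it is the constant-time table lookup the complexity argument relies on.

```python
def build_proof_graph(proof_steps):
    """Build an AND/OR proof graph from recorded proof steps.

    proof_steps: iterable of (rule_description, head, body) triples that
    mirror because(RuleDesc, Head, Body). Returns (labels, edges), where
    labels[i] is node i's label and edges are (from, to) pairs."""
    node_id = {}     # key -> node number; dict lookups are O(1) on average
    labels = []
    edges = set()

    def node(key, label):
        if key not in node_id:           # create each node only once
            node_id[key] = len(labels)
            labels.append(label)
        return node_id[key]

    for rule, head, body in proof_steps:
        step = node(("step", rule, head, tuple(body)), rule)   # AND node
        edges.add((step, node(("fact", head), head)))          # step -> head
        for fact in body:
            edges.add((node(("fact", fact), fact), step))      # body -> step
    return labels, edges


step = ("multi-hop network access",
        "netAccess(fileServer, rpc, 100003)",
        ["execCode(webServer, apache)",
         "reachable(webServer, fileServer, rpc, 100003)"])
labels, edges = build_proof_graph([step, step])   # duplicate step is merged
print(labels)
print(sorted(edges))   # [(0, 1), (2, 0), (3, 0)]
```

Node 0 is the derivation node, node 1 the derived netAccess fact, and nodes 2 and 3 its premises; feeding the same step twice does not duplicate anything.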
Complexity of Proof Building • O(N²) to complete Datalog evaluation, with proof steps generated • O(N²) to build a proof graph from the proof steps • Need to build O(N²) graph components • Building one component: • Find the predecessor: table lookup • Find the successors: table lookup • Total time: O(N²), if table lookup takes constant time
Logical Attack Graphs [Figure: an attack graph with OR nodes (derived facts), AND nodes (derivation steps), and ground facts. netAccess(attacker,webServer,tcp,80), together with the ground facts vulExists(webServer,CAN-2002-0392,httpd,remoteExploit,privEscalation) and networkService(webServer,httpd,tcp,80,apache), feeds the "Remote exploit" step, which derives execCode(attacker,webServer,apache); "NFS shell" then derives accessFile(attacker,fileServer,write,/export); "NFS semantics" derives accessFile(attacker,workStation,write,/usr/local/share); and "Trojan horse installation" derives execCode(attacker,workStation,root).]
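The AND/OR semantics of the graph can be sketched as a fixpoint over its derivation nodes: a derivation (AND) node fires when all of its body facts hold, and a fact (OR) node holds if it is given or any derivation of it fires. The steps below transcribe only the edges visible in the figure and treat netAccess(attacker,webServer,tcp,80) as a given input; this is a simplification, since the real MulVAL rules for NFS semantics and Trojan horse installation have further premises. `STEPS`, `GROUND` and `derivable` are illustrative names.

```python
# Derivation (AND) nodes transcribed from the figure as (label, head, body).
STEPS = [
    ("Remote exploit", "execCode(attacker,webServer,apache)",
     ["netAccess(attacker,webServer,tcp,80)",
      "vulExists(webServer,CAN-2002-0392,httpd,remoteExploit,privEscalation)",
      "networkService(webServer,httpd,tcp,80,apache)"]),
    ("NFS shell", "accessFile(attacker,fileServer,write,/export)",
     ["execCode(attacker,webServer,apache)"]),
    ("NFS semantics", "accessFile(attacker,workStation,write,/usr/local/share)",
     ["accessFile(attacker,fileServer,write,/export)"]),
    ("Trojan horse installation", "execCode(attacker,workStation,root)",
     ["accessFile(attacker,workStation,write,/usr/local/share)"]),
]
GROUND = {
    "netAccess(attacker,webServer,tcp,80)",   # treated as input in this sketch
    "vulExists(webServer,CAN-2002-0392,httpd,remoteExploit,privEscalation)",
    "networkService(webServer,httpd,tcp,80,apache)",
}
GOAL = "execCode(attacker,workStation,root)"


def derivable(steps, ground_facts):
    """Fixpoint over the AND/OR graph: an AND node fires when all body facts
    hold; an OR (fact) node holds if it is given or any derivation fires."""
    holds = set(ground_facts)
    changed = True
    while changed:
        changed = False
        for _label, head, body in steps:
            if head not in holds and all(fact in holds for fact in body):
                holds.add(head)
                changed = True
    return holds


print(GOAL in derivable(STEPS, GROUND))   # True
patched = GROUND - {"vulExists(webServer,CAN-2002-0392,httpd,remoteExploit,privEscalation)"}
print(GOAL in derivable(STEPS, patched))   # False
```

Removing a single ground fact (here, patching the Apache vulnerability) cuts the only path to root on workStation; choosing such removals optimally is the subject of the next lecture.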
Related Work • Sheyner’s attack graph tool (CMU) • Based on model-checking • Cauldron attack graph tool (GMU) • Based on graph-search algorithms • NetSPA attack graph tool (MIT LL) • Graph-search based on a simple attack model
Advantages of the Logic-programming Approach • Publishing and incorporation of knowledge/information through well-understood logical semantics • Efficient and sound analysis by leveraging the reasoning power of well-developed logic-deduction systems
Next Lecture • How to make use of the proof graph • Optimizing mitigation measures through SAT solving • Open problems • Uncertainty in reasoning