ASAP-SG: Summary • Project Description: • Utility-driven, public-private collaborative project to develop system-level security requirements for smart grid technology • Needs Addressed: • Utilities: specification in RFP • Vendors: reference in build process • Government: assurance of infrastructure security • Commissions: protection of public interests • Approach: • Architectural team produces material • Usability Analysis team assesses effectiveness • NIST, UtiliSec review and approve • Deliverables: • Strategy & Guiding Principles white paper • Security Profile Blueprint • 6 Security Profiles • Usability Analysis • Schedule: June 2009 – June 2012 • Budget: $3M/year ($1.5M Utilities + $1.5M DOE) • Performers: Utilities, EnerNex, InGuardians, SEI, ORNL • Partners: DOE, EPRI • Release Path: NIST, UCAIug • Contacts: Bobby Brown (bobby@enernex.com), Darren Highfill (darren@utilisec.com)
ASAP-SG Security Profiles • Advanced Security Acceleration Project for the Smart Grid • Prescriptive, actionable guidance • How to build in and implement security • Tailored to a set of specific smart grid functions, such as • Advanced Metering Infrastructure (COMPLETE) • Third Party Data Access (COMPLETE) • Distribution Management (COMPLETE) • WAMPAC (Synchrophasors) (COMPLETE) • Substation Automation (IN PROGRESS) • Home Area Networks (PROPOSED)
Methods from reliability engineering and their application to cyber-security • James Nutaro • Oak Ridge National Laboratory • nutarojj@ornl.gov
Outline • Failure identification • State transition systems • Applications • Failure likelihood • Markov models • Applications • Consequence assessment • Dynamic models • Applications
Failure identification • We will use state transition models to • Enumerate failures of a system • Prioritize failures • Determine failure modes for high-priority failures • Devise security controls to negate failure modes
What is a state transition system? • A state transition system has • A set of state variables • A range for each state variable • A state is an assignment of values to the state variables • A transition is a change of state • A trajectory of the system is a sequence of states (or, equivalently, a sequence of transitions)
An example of a simple data processing system, part 1 • Two state variables: • data • activity • The data state variable • Describes whether data is presently available for the system to process • Range is none and present • The activity state variable • Describes what the system is doing • Range is idle and active
An example of a simple data processing system, part 2 • This system has four states • It has sixteen possible transitions • Acceptable transitions are shown in the figure • The system is designed for executions that involve only these transitions • Figure: the four states (data=none, activity=idle), (data=present, activity=idle), (data=none, activity=active), (data=present, activity=active)
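The state and transition counts for this example can be checked with a short Python sketch. The variable names and ranges come from the slides; the helper names `enumerate_states` and `enumerate_transitions` are illustrative only, and self-transitions are counted because the slide's total of sixteen implies them.

```python
from itertools import product

# State variables and their ranges, as given on the slides.
RANGES = {
    "data": ("none", "present"),
    "activity": ("idle", "active"),
}

def enumerate_states(ranges):
    """Return every assignment of values to the state variables."""
    names = list(ranges)
    value_sets = [ranges[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_sets)]

def enumerate_transitions(states):
    """Return every ordered pair of states, self-transitions included."""
    return [(src, dst) for src in states for dst in states]

states = enumerate_states(RANGES)
transitions = enumerate_transitions(states)
print(len(states), len(transitions))  # 4 16
```

A trajectory is then just a list of these state dictionaries in the order they are visited.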
An example of a simple data processing system, part 3 • Unacceptable transitions are shown in this figure • Any execution that includes one of these transitions is a failure: something went wrong • Figure: the four states (data=none, activity=idle), (data=present, activity=idle), (data=none, activity=active), (data=present, activity=active)
Enumerating failure transitions • The simplest failure is a trajectory that consists of a single unacceptable transition • Call this simplest failure a failure transition • We can enumerate these transitions • Given N states, there are N*N possible transitions • M of these occur by design • The remaining N*N – M are failure transitions
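The enumeration above can be sketched in Python as follows. The figure showing which transitions are acceptable is not reproduced in this text, so the `ACCEPTABLE` set below is an assumption made purely for illustration (a simple cycle: data arrives, processing starts, data is consumed, the system returns to idle). The function name `failure_transitions` is likewise hypothetical.

```python
from itertools import product

# Each state is a (data, activity) pair.
STATES = list(product(("none", "present"), ("idle", "active")))

# ASSUMPTION: illustrative acceptable transitions only; the original
# figure that defines them is not available in this text.
ACCEPTABLE = {
    (("none", "idle"), ("present", "idle")),       # data arrives
    (("present", "idle"), ("present", "active")),  # processing starts
    (("present", "active"), ("none", "active")),   # data is consumed
    (("none", "active"), ("none", "idle")),        # processing ends
}

def failure_transitions(states, acceptable):
    """Return all N*N ordered state pairs that are not acceptable by design."""
    return [(s, d) for s in states for d in states if (s, d) not in acceptable]

failures = failure_transitions(STATES, ACCEPTABLE)
n, m = len(STATES), len(ACCEPTABLE)
print(len(failures) == n * n - m)  # True
```

Under this assumed set, N = 4 and M = 4, so twelve of the sixteen transitions are failure transitions.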
An example of a simple data processing system, part 4 • Figure: the four states (data=none, activity=idle), (data=present, activity=idle), (data=none, activity=active), (data=present, activity=active), with each arrow marked as either a failure transition or an acceptable transition
Failure modes • Each failure transition has, in general, several causes • These causes are the failure modes for that failure transition • Figure: a failure transition between (data=none, activity=idle) and (data=none, activity=active) • A failure mode: the driver for the network card incorrectly signals the arrival of a data packet
Security controls • Security controls are designed to mitigate, negate, or otherwise render implausible one or more failure modes • Figure: a failure transition between (data=none, activity=idle) and (data=none, activity=active) • A failure mode: the driver for the network card incorrectly signals the arrival of a data packet • A security control: require all drivers to be signed and then verified upon loading by the OS kernel
Which failures to address? • Most useful models will be much larger than our example • The number of failure transitions grows as the square of the number of states • Thousands upon thousands of failure transitions • It is infeasible to address all of them • One solution • Create a rule for prioritizing failures • Generate a prioritized list based on the rule and the model • Start at the top • Stop when out of time or money, or when a coverage criterion has been met (e.g., the top 10% of failures have been addressed)
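The prioritize-then-stop procedure might look like the Python sketch below. The slides do not specify a prioritization rule, so the `score` and `cost` callables, the `budget` parameter, and both function names are hypothetical placeholders that an analyst would supply.

```python
def prioritize(failures, score):
    """Order failures from highest to lowest priority under the given rule."""
    return sorted(failures, key=score, reverse=True)

def select_to_address(ranked, cost, budget, coverage=0.10):
    """Walk the ranked list from the top.

    Stop when the coverage target (a fraction of the list) is reached
    or when the next item would exceed the remaining budget.
    """
    target = max(1, round(coverage * len(ranked)))
    chosen, spent = [], 0.0
    for failure in ranked:
        if len(chosen) >= target or spent + cost(failure) > budget:
            break
        chosen.append(failure)
        spent += cost(failure)
    return chosen

# Toy usage: 20 failures identified by number, scored by that number,
# each costing one unit to address.
ranked = prioritize(list(range(20)), score=lambda f: f)
addressed = select_to_address(ranked, cost=lambda f: 1.0, budget=5.0)
print(addressed)  # [19, 18]
```

With a 10% coverage target and twenty failures, the walk stops after the top two even though budget remains.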
Failure likelihood • We will extend state transition models to • Estimate the probability of a failure • Use this as a tool for • Prioritization • Estimating the benefit of a security control • Markov chains will be our primary tool
Markov chain • State transition model plus a probability for each transition • Sum of probabilities on the transitions away from a state must equal 1 • Figure: an example with two states and transition probabilities 0.5, 0.5, 0.9, and 0.1
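A Markov chain of this kind can be held in a nested dictionary and checked against the sum-to-one rule, as in the Python sketch below. The state names `A` and `B`, and the assignment of the four probabilities to particular arrows, are assumptions; the figure that fixes them is not available in this text.

```python
import math

# ASSUMPTION: which arrow carries which probability is a guess.
chain = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.9, "B": 0.1},
}

def is_valid_chain(chain, tol=1e-9):
    """Check that every row is a probability distribution over known states."""
    for outgoing in chain.values():
        if any(dst not in chain for dst in outgoing):
            return False
        if any(p < 0.0 for p in outgoing.values()):
            return False
        if not math.isclose(sum(outgoing.values()), 1.0, abs_tol=tol):
            return False
    return True

print(is_valid_chain(chain))  # True
```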
Basic likelihood assessment • The probability of a particular failure transition occurring during an arbitrary execution is calculated by simulation • Start in the initial state for the model • Select a transition at random based on the probabilities of the outgoing transitions • Repeat until satisfied (e.g., the confidence interval is sufficiently small) • The probability of a particular transition is the number of times it was taken divided by the total number of transitions taken
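The simulation procedure can be sketched in Python as below. It uses a fixed number of steps in place of a confidence-interval stopping rule, which is a simplification. The two-state chain, its state names, and the function name `estimate_transition_probability` are assumptions for illustration.

```python
import random

# ASSUMPTION: illustrative two-state chain; not taken from the slides.
chain = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.9, "B": 0.1},
}

def estimate_transition_probability(chain, start, target, steps, seed=0):
    """Estimate the fraction of all transitions taken that equal `target`."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    state, hits = start, 0
    for _ in range(steps):
        destinations = list(chain[state])
        weights = [chain[state][d] for d in destinations]
        nxt = rng.choices(destinations, weights=weights, k=1)[0]
        if (state, nxt) == target:
            hits += 1
        state = nxt
    return hits / steps

estimate = estimate_transition_probability(chain, "A", ("A", "B"), steps=200_000)
print(round(estimate, 2))
```

For this assumed chain the long-run fraction of A-to-B transitions is 0.5 × 0.9 / 1.4 ≈ 0.32, which gives a quick sanity check on the estimate.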
Other types of assessments • Ranking of first failures • What are my most likely problems? • For each failure transition, calculate the likelihood that it will be encountered first during an execution of the system • Mean transitions to fail • How long until I encounter a problem? • Determine the average number of acceptable transitions prior to the first failure transition
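Both assessments can be estimated from repeated simulated executions, each stopped at its first failure transition. The Python sketch below is one way to do this. The chain, the choice of self-loops as the failure transitions, and the function name `first_failure_stats` are all assumptions for illustration.

```python
import random
from collections import Counter

# ASSUMPTION: illustrative chain in which the self-loops are the failures.
chain = {
    "A": {"A": 0.1, "B": 0.9},
    "B": {"A": 0.8, "B": 0.2},
}
failures = {("A", "A"), ("B", "B")}

def first_failure_stats(chain, failures, start, runs, seed=0, max_steps=10_000):
    """Return (ranking of first failures, mean acceptable transitions before failing)."""
    rng = random.Random(seed)
    first = Counter()
    acceptable_total = 0
    for _ in range(runs):
        state = start
        for taken in range(max_steps):
            dests = list(chain[state])
            nxt = rng.choices(dests, weights=[chain[state][d] for d in dests])[0]
            if (state, nxt) in failures:
                first[(state, nxt)] += 1
                acceptable_total += taken  # acceptable transitions before this one
                break
            state = nxt
    completed = sum(first.values())
    if completed == 0:
        raise ValueError("no run reached a failure transition")
    ranking = [(t, count / completed) for t, count in first.most_common()]
    return ranking, acceptable_total / completed

ranking, mean_to_fail = first_failure_stats(chain, failures, "A", runs=20_000)
print(ranking[0][0], round(mean_to_fail, 1))
```

For this assumed chain the exact values are 9/14 ≈ 0.64 for (B, B) being the first failure and 1.62/0.28 ≈ 5.8 acceptable transitions before the first failure, so the estimates should land close to those.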
Security controls • A security control reduces the likelihood of the failure transition that it addresses • Figure: a failure transition between (data=none, activity=idle) and (data=none, activity=active) • A failure mode: the driver for the network card incorrectly signals the arrival of a data packet • A security control: require all drivers to be signed and then verified upon loading by the OS kernel
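One way to represent the effect of a control in a Markov chain is to scale down the probability of the failure transition it addresses and move the freed probability onto an acceptable transition from the same state. The Python sketch below does this. The `reduction` factor, the `fallback` destination, the example chain, and the function name `apply_control` are modeling assumptions, not something the slides prescribe.

```python
import copy

# ASSUMPTION: illustrative chain in which ("A", "A") is a failure transition.
chain = {
    "A": {"A": 0.1, "B": 0.9},
    "B": {"A": 0.8, "B": 0.2},
}

def apply_control(chain, failure, reduction, fallback):
    """Return a new chain with the failure transition made less likely.

    `reduction` is the fraction of the failure probability removed (0 to 1);
    the removed probability is reassigned to the `fallback` destination so
    the outgoing probabilities still sum to 1.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    src, dst = failure
    updated = copy.deepcopy(chain)  # leave the original model untouched
    removed = updated[src][dst] * reduction
    updated[src][dst] -= removed
    updated[src][fallback] = updated[src].get(fallback, 0.0) + removed
    return updated

controlled = apply_control(chain, ("A", "A"), reduction=0.5, fallback="B")
print(controlled["A"])
```

Running the same likelihood assessments on `chain` and `controlled` then gives a before-and-after comparison for the control.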
Challenges • Probabilities are difficult to come by in practice • But there may be sufficient data to make a good guess • e.g., how likely is it that without authentication you will be subject to an unauthorized user? • e.g., how likely is this if you use a particular password policy? • Lots of real-world experience to build statistics from here; possibly sufficient data in other cases • Analysis can be quite involved (i.e., expensive in terms of time and dollars)
Rewards • A tool for guiding investment in cyber-security • To what extent does a security control reduce my likelihood of a system failure? • Is the reduction worth the cost? • How much is enough? Are my expected failure rates acceptable?
Consequence assessment • We will extend state transition models to • Include time and dynamics • Use this as a tool for • Estimating the likelihood of unwanted physical effects • Determining performance requirements for security solutions • Assessing risk • Discrete event models will be our primary tool
Discrete event model • All the elements of state transition and Markov models plus • Interactions with the outside world (e.g., the system being controlled) • Evolution through time • Figure: a block with input, output, and state
Method for consequence assessment • Figure: a discrete event model of the computer system combined with a dynamic model of the system under control
Uses of a combined model • Links failure analysis to physical consequences • Questions that might be answerable: • Which failures pose the biggest risk in terms of physical outcomes? • How is my risk related to the speed with which I can find and remove an intruder? • How does a particular security solution affect these risks?
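The question about the speed of removing an intruder can be illustrated with a toy combined model in Python. Everything here is an assumption made for the sketch: the two timed events, the first-order plant dynamics, the parameter values, and the function name `peak_deviation`. While the controller is compromised the controlled quantity drifts at a constant rate; otherwise feedback pulls it back toward zero.

```python
import heapq

def peak_deviation(intrusion_time, dwell_time, horizon=20.0, dt=0.01,
                   gain=2.0, drift=1.0):
    """Largest excursion of the controlled quantity from its setpoint (zero)."""
    # Discrete event part: timed mode changes of the computer system.
    events = [(intrusion_time, "compromised"),
              (intrusion_time + dwell_time, "restored")]
    heapq.heapify(events)
    mode = "healthy"
    x, t, peak = 0.0, 0.0, 0.0
    while t < horizon:
        # Apply every event that is due at or before the current time.
        while events and events[0][0] <= t:
            _, mode = heapq.heappop(events)
        # Dynamic part: forward Euler step of the system under control.
        dx = drift if mode == "compromised" else -gain * x
        x += dx * dt
        t += dt
        peak = max(peak, abs(x))
    return peak

for dwell in (1.0, 5.0):
    print(dwell, round(peak_deviation(2.0, dwell), 2))
```

In this toy model the peak excursion grows roughly in proportion to how long the intruder remains, which is the kind of relationship between response speed and physical risk that the combined model is meant to expose.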
Challenges • Performance characteristics for some security solutions may be difficult to obtain • For example, how quickly does an intrusion detection system find an intruder? • How quickly can I remove that intruder? • Analysis can be very involved (i.e., very expensive in terms of time and dollars)
Rewards • A tool for both understanding risk and guiding investment in cyber-security • To what extent does a security control reduce my risk? • Is the reduction in risk worth the cost? • How much is enough? Are my expected risks acceptable?