Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness. Swapna Gokhale ssg@engr.uconn.edu Asst. Professor of CSE, University of Connecticut, Storrs, CT.
Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Swapna Gokhale (ssg@engr.uconn.edu), Asst. Professor of CSE, University of Connecticut, Storrs, CT
Aniruddha Gokhale (a.gokhale@vanderbilt.edu), Asst. Professor of EECS, Vanderbilt University, Nashville, TN
Presented at NSWC Dahlgren, VA, April 13, 2005
Focus: Distributed Performance-Sensitive Software (DPSS) Systems • Military/civilian distributed performance-sensitive software systems • Network-centric & larger-scale “systems of systems” • Stringent simultaneous QoS demands, e.g., dependability, security, scalability, throughput • Dynamic context
Context: Trends in DPSS Development • Historically developed using low-level APIs • Increasing use of middleware technologies • Standards-based COTS middleware helps to: • Control end-to-end resources & QoS • Leverage hardware & software technology advances • Evolve to new environments & requirements • Middleware helps capture & codify commonalities across applications in different domains by providing reusable & configurable patterns-based building blocks • Key observations: • DPSS systems are composed of patterns-based building blocks • Observed quality of service depends on the right composition • Middleware examples: CORBA, .NET, J2EE, ICE, MQSeries • Pattern sources: Gang of Four, POSA 1, 2 & 3
Talk Outline • Motivation • Use of Performance Analysis Methods for System design • Use of Performance Analysis Methods for Improving Cyber Trust • Planned Future Work • Concluding Remarks
Problem 1: Variability in Middleware • Although middleware provides reusable building blocks that capture commonalities, these blocks and their compositions incur variabilities that significantly affect performance. • Per-building-block variability • Arises from variations in the implementations & configurations of a patterns-based building block • E.g., the single-threaded versus thread-pool-based Reactor implementation dimension crosscuts the event demultiplexing strategy (e.g., select, poll, WaitForMultipleObjects) • Compositional variability • Arises from variations in how these building blocks are composed • Need to address compatibility among the compositions and the individual configurations • Dictated by the needs of the domain • E.g., Leader/Followers makes no sense with a single-threaded Reactor
Solution Approach: Applying Performance Analytical Models for DPSS Design • Apply design-time performance analysis techniques to estimate the impact of variability in middleware-based DPSS systems • Build and validate performance models for the invariant parts of middleware building blocks • Weave the variability concerns manifested in a building block into the performance models • Compose and validate performance models of building blocks, mirroring the anticipated software design of DPSS systems • Estimate the end-to-end performance of the composed system • Iterate until the design meets performance requirements
[Figure: an invariant model of a pattern is woven with variability concerns to yield refined pattern models, which are composed into a system model under a given workload]
Problem 2: Benign/Intentional Disruptions • Terrorist threats and/or malicious users can bring down a cyber infrastructure • Normal failures (hardware and software) could also disrupt the cyber infrastructure • Existing disruption detection techniques use low-level trace data that is agnostic about the application • Application-specific disruption detection is expensive • Need a reusable middleware-based solution
Solution Approach: Design-Time Performance Analysis for Disruption Detection • Identify the service profile, which consists of the modes of operation of the service that uses the building block. • Estimate the likelihood (occurrence probability) of each mode of operation. • Estimate the values of the input parameters for each mode of operation. • Obtain the values of the performance metrics for each mode of operation by solving the SRN model. • Compute the expected value of each performance metric as the weighted sum of its per-mode values, with weights given by the occurrence probabilities of the modes.
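The final step above is a probability-weighted average over modes. A minimal Python sketch, assuming each mode is supplied as an (occurrence probability, metric dictionary) pair; the function name `expected_metrics` and the metric keys are hypothetical, and in the approach above each per-mode value would come from solving the SRN for that mode. The sample profile reuses the event #1 normal/inclement values from the VR case study later in the talk:

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: one (occurrence probability, {metric: value}) pair per mode.
Mode = Tuple[float, Dict[str, float]]


def expected_metrics(modes: List[Mode]) -> Dict[str, float]:
    """Weight each mode's metric values by the mode's occurrence probability and sum."""
    if not modes:
        raise ValueError("the service profile must contain at least one mode")
    total = sum(p for p, _ in modes)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"occurrence probabilities sum to {total}, not 1")
    names = modes[0][1].keys()
    return {name: sum(p * metrics[name] for p, metrics in modes) for name in names}


# Event #1 estimates per mode (normal vs. inclement weather) from the VR case study.
profile = [
    (0.9, {"throughput": 0.40, "queue_length": 0.12}),
    (0.1, {"throughput": 0.90, "queue_length": 1.86}),
]
expected = expected_metrics(profile)
```

Rejecting probabilities that do not sum to one catches an incomplete service profile early, before the expected values are used as reference points for detection.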
Algorithm: Design-Time Performance Analysis for Disruption Detection • Compute the performance metrics for each observation window. • Summarize the performance metrics over several past observation windows using an exponential moving average; the smoothing constant determines the approximate weight of each window. • Compute an anomaly score for each performance metric using the chi-square test. • Use a Bayesian network to correlate the anomaly scores computed from the individual performance metrics, yielding an overall anomaly score for the building block as a whole. • Correlate the anomaly scores of different building blocks, possibly residing at different layers, to obtain the anomaly score of the service. • Correlate anomaly scores hierarchically to reduce false positives.
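The smoothing and per-metric scoring steps can be sketched as follows. This is a minimal illustration under stated assumptions: the design-time expected value of each metric serves as the reference, "chi-square test" is read as the single-term statistic (O − E)²/E between the smoothed observation and that reference, and the names and the default smoothing constant of 0.3 are hypothetical. The Bayesian-network correlation step is not shown.

```python
from typing import Dict, List


def ema(values: List[float], alpha: float) -> float:
    """Exponential moving average over windows (oldest first); alpha weights the newest."""
    if not values:
        raise ValueError("need at least one observation window")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("smoothing constant must lie in (0, 1]")
    smoothed = values[0]
    for v in values[1:]:
        smoothed = alpha * v + (1.0 - alpha) * smoothed
    return smoothed


def chi_square_term(observed: float, expected: float) -> float:
    """Chi-square term (O - E)^2 / E comparing an observed value to its expected value."""
    if expected <= 0.0:
        raise ValueError("expected value must be positive")
    return (observed - expected) ** 2 / expected


def anomaly_scores(windows: List[Dict[str, float]],
                   expected: Dict[str, float],
                   alpha: float = 0.3) -> Dict[str, float]:
    """Per-metric score: smooth the windowed observations, then compare to expectation."""
    return {
        name: chi_square_term(ema([w[name] for w in windows], alpha), e)
        for name, e in expected.items()
    }
```

A larger smoothing constant reacts faster to a sudden change (fewer missed disruptions) at the cost of more sensitivity to noise in a single window.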
Algorithm: Runtime Requirements • Context: • Low-level data logging is application-agnostic • Application-level logging is expensive to implement and cannot access low-level logs • Needs: • Operational data collection at multiple layers of the DPSS system • Data logging provided as a reusable middleware feature • QoS-driven, configurable and selective data logging, e.g., based on throughput, queue delays, event losses • Collected data corresponding to the QoS policy, e.g., number of events, their priorities, lost events, missed deadlines
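One way to picture QoS-driven, selective logging is a policy object that decides which events to record and which fields to keep. The sketch below is hypothetical throughout: the class names, the field names, and the 0.5 s queue-delay threshold are illustrative assumptions, not a middleware API from the talk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class QoSLoggingPolicy:
    """Hypothetical policy: which fields to keep and which conditions trigger logging."""
    keep_fields: Tuple[str, ...] = ("event_type", "priority", "lost", "missed_deadline")
    max_queue_delay: float = 0.5  # seconds; log events that waited longer than this
    log_lost_events: bool = True
    log_missed_deadlines: bool = True


@dataclass
class SelectiveLogger:
    policy: QoSLoggingPolicy
    records: List[Dict[str, Any]] = field(default_factory=list)

    def should_log(self, event: Dict[str, Any]) -> bool:
        """Log only events that are relevant to the configured QoS policy."""
        return (event.get("queue_delay", 0.0) > self.policy.max_queue_delay
                or (self.policy.log_lost_events and event.get("lost", False))
                or (self.policy.log_missed_deadlines and event.get("missed_deadline", False)))

    def observe(self, event: Dict[str, Any]) -> None:
        """Record the configured subset of fields for events that pass the policy."""
        if self.should_log(event):
            self.records.append({k: event[k] for k in self.policy.keep_fields if k in event})
```

Keeping the policy separate from the logger is what would let the same logging feature be reused across applications, with only the policy reconfigured.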
Case Study: The Reactor Pattern
[Figure: Reactor class diagram. The Reactor (handle_events(), register_handler(), remove_handler()) dispatches Event Handlers (handle_event(), get_handle()), each of which owns a Handle; the Reactor uses a Synchronous Event Demuxer (select()) over its handle set, which notifies it of ready handles; Concrete Event Handlers A and B implement handle_event() and get_handle().]
The Reactor architectural pattern allows event-driven applications to demultiplex & dispatch service requests that are delivered to an application from one or more clients. • Many networked applications are developed as event-driven programs • Common sources of events in these applications include activity on an IPC stream for I/O operations, POSIX signals, Windows handle signaling, & timer expirations • The Reactor pattern decouples the detection, demultiplexing, & dispatching of events from the handling of events • Participants include the Reactor, event handles, the event demultiplexer, and abstract and concrete event handlers
Reactor Dynamics
[Figure: sequence diagram among the Main Program, a Concrete Event Handler, the Reactor, and the Synchronous Event Demultiplexer: register_handler(), get_handle(), handle_events(), select(), handle_event(), service()]
• Registration phase • Event handlers register themselves with the Reactor for an event type (e.g., input, output, timeout, exception event types) • The Reactor obtains each handler's handle via get_handle() and maintains it, using it to associate an event type with the registered handler • Snapshot phase • The main program delegates its thread of control to the Reactor, which in turn takes a snapshot of the system to determine which events are enabled in that snapshot • For each enabled event, the Reactor invokes the corresponding event handler, which services the event • When all events in a snapshot have been handled, the Reactor proceeds to the next snapshot
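The registration and snapshot phases can be sketched in Python with the standard `selectors` module standing in for select() as the synchronous event demultiplexer. This is a minimal single-threaded illustration, not the implementation studied in the talk; the method names follow the class diagram, while `ReadHandler` and `demo` are hypothetical. Unlike the performance model that follows, this sketch dispatches ready handles in whatever order the selector reports them, with no priority between event types.

```python
import selectors
import socket
from typing import List, Optional


class EventHandler:
    """Abstract event handler: owns a handle and services events that occur on it."""

    def get_handle(self):
        raise NotImplementedError

    def handle_event(self, mask: int) -> None:
        raise NotImplementedError


class Reactor:
    """Single-threaded Reactor using the selectors module as the event demultiplexer."""

    def __init__(self) -> None:
        self._demuxer = selectors.DefaultSelector()

    def register_handler(self, handler: EventHandler, events: int = selectors.EVENT_READ) -> None:
        # Registration phase: associate the handler's handle and event type with the handler.
        self._demuxer.register(handler.get_handle(), events, data=handler)

    def remove_handler(self, handler: EventHandler) -> None:
        self._demuxer.unregister(handler.get_handle())

    def handle_events(self, timeout: Optional[float] = None) -> int:
        # Snapshot phase: find every handle that is ready now, then dispatch each one.
        snapshot = self._demuxer.select(timeout)
        for key, mask in snapshot:
            key.data.handle_event(mask)
        return len(snapshot)


class ReadHandler(EventHandler):
    """Hypothetical concrete handler that records whatever arrives on a socket."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock
        self.received: List[bytes] = []

    def get_handle(self) -> socket.socket:
        return self.sock

    def handle_event(self, mask: int) -> None:
        self.received.append(self.sock.recv(1024))


def demo() -> List[bytes]:
    """Register one handler, make its handle readable, and run a single snapshot."""
    a, b = socket.socketpair()
    try:
        reactor = Reactor()
        handler = ReadHandler(b)
        reactor.register_handler(handler)
        a.sendall(b"ping")
        reactor.handle_events(timeout=1.0)
        reactor.remove_handler(handler)
        return handler.received
    finally:
        a.close()
        b.close()
```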
Case Study: Virtual Router • A virtual router (VR) is used to scale virtual private networks (VPNs) • Differentiated services for different VPNs; security is a key requirement along with scalability and dependability • Illustrates the demultiplexing and dispatching semantics of the Reactor pattern
Characteristics of the Reactor Performance Model • Single-threaded, select-based Reactor implementation • The Reactor accepts two types of input events, with one event handler registered with the Reactor for each event type • Each event type has a separate queue to hold incoming events; the buffer capacity is N1 for type-one events and N2 for type-two events. • Event arrivals are Poisson, with rates λ1 and λ2 for type-one and type-two events. • Event service times are exponential, with rates μ1 and μ2 for type-one and type-two events. • Within a snapshot, type-one events are serviced with higher priority than type-two events: • - If the event handles for both event types are enabled, the type-one event handle is serviced before the type-two event handle.
Performance Metrics for the Reactor • Throughput: • - Number of events that can be processed per unit time • - Relevant to applications such as telecommunications call processing • Queue length: • - Number of events queued at the event handler queues • - Guides the choice of scheduling policies for applications with real-time requirements • Total number of events: • - Total number of events in the system • - Informs scheduling decisions • - Informs the resource provisioning required to sustain system demands • Probability of event loss: • - Probability that events are discarded due to lack of buffer space • - Critical for safety-critical systems • - Determines the required levels of resource provisioning • Response time: • - Time taken to service an incoming event • - Needed to bound response time in real-time systems
Using Stochastic Reward Nets (SRNs) for Performance Analysis • Stochastic Reward Nets (SRNs) are an extension of Generalized Stochastic Petri Nets (GSPNs), which are themselves an extension of Petri nets. • SRNs extend the modeling power of GSPNs by allowing: • Guard functions • Marking-dependent arc multiplicities • General transition probabilities • Reward rates at the net level • They allow models to be specified at a level closer to intuition. • SRN models can be solved using tools such as SPNP (Stochastic Petri Net Package).
Modeling the Reactor using SRN
[Figure: Reactor SRN with subnets (a) and (b); places B1, B2, S1, S2, StSnpSht, SnpShtInProg; transitions A1, A2, Sn1, Sn2, T_SrvSnpSht, T_EndSnpSht, Sr1, Sr2; inhibitor-arc multiplicities N1, N2]
• Part A: • Models the arrival, queuing, and prioritized service of events. • Transitions A1 and A2: event arrivals. • Places B1 and B2: buffers/queues. • Places S1 and S2: service of the events. • Transitions Sr1 and Sr2: service completions. • Inhibitor arc from place B1 to transition A1 with multiplicity N1 (similarly B2, A2, N2): • - Prevents transition A1 from firing when place B1 holds N1 tokens. • Inhibitor arc from place S1 to transition Sr2: • - Gives type-one events priority over type-two events in service. • - Prevents transition Sr2 from firing when there is a token in place S1.
Reactor SRN: Taking a Snapshot • Part B: • Models the process of taking successive snapshots • Sn1 enabled: token in StSnpSht, at least one token in B1, and no token in S1. • Sn2 enabled: token in StSnpSht, at least one token in B2, and no token in S2. • T_SrvSnpSht enabled: token in S1 and/or S2. • T_EndSnpSht enabled: no token in either S1 or S2. • Sn1 and Sn2 have the same priority • T_SrvSnpSht has lower priority than Sn1 and Sn2
Reactor SRN: Initial Marking • Initial marking: • StSnpSht = 1, B1 = 2, B2 = 2, S1 = 0, S2 = 0 • Transitions enabled: Sn1 and Sn2 • Sn1 fires.
Reactor SRN: Firing a Transition (1/6) • Upon firing of Sn1: • StSnpSht = 1, B1 = 1, B2 = 2, S1 = 1, S2 = 0 • Transitions enabled: Sr1, Sn2, T_SrvSnpSht • Sn2 and T_SrvSnpSht are immediate transitions, so they must fire before the timed transition Sr1. • T_SrvSnpSht has lower priority than Sn2. • Sn2 fires.
Reactor SRN: Firing a Transition (2/6) • Upon firing of Sn2: • StSnpSht = 1, B1 = 1, B2 = 1, S1 = 1, S2 = 1 • Transitions enabled: Sr1, T_SrvSnpSht • T_SrvSnpSht is an immediate transition, so it must fire before Sr1. • T_SrvSnpSht fires.
Reactor SRN: Firing a Transition (3/6) • Upon firing of T_SrvSnpSht: • SnpShtInProg = 1, B1 = 1, B2 = 1, S1 = 1, S2 = 1 • Transitions enabled: Sr1 • Snapshot in progress. • Sr1 fires.
Reactor SRN: Firing a Transition (4/6) • Upon firing of Sr1: • SnpShtInProg = 1, B1 = 1, B2 = 1, S1 = 0, S2 = 1 • Transitions enabled: Sr2 • Snapshot in progress. • Sr2 fires.
Reactor SRN: Firing a Transition (5/6) • Upon firing of Sr2: • SnpShtInProg = 1, B1 = 1, B2 = 1, S1 = 0, S2 = 0 • Transitions enabled: T_EndSnpSht • End of snapshot. • T_EndSnpSht fires.
Reactor SRN: Firing a Transition (6/6) • Upon firing of T_EndSnpSht: • StSnpSht = 1, B1 = 1, B2 = 1, S1 = 0, S2 = 0 • Transitions enabled: Sn1 and Sn2 • Back in the snapshot-start state, ready to take the next snapshot.
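The token game walked through above can be replayed mechanically. This Python sketch encodes the enabling rules and priorities from the Part A and Part B slides, assuming that Sn1 and Sn2 test (rather than consume) the StSnpSht token, as the markings on the slides imply, and assuming N1 = N2 = 5. It checks enabling and firing only; it does not generate or solve the underlying Markov chain, which the talk does with SPNP. The names `Marking`, `FIRE`, and `fire` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

N1, N2 = 5, 5  # buffer capacities (inhibitor-arc multiplicities); illustrative values


@dataclass(frozen=True)
class Marking:
    st_snp_sht: int  # token in StSnpSht: ready to take a snapshot
    in_prog: int     # token in SnpShtInProg: snapshot being serviced
    b1: int          # queued type-one events
    b2: int          # queued type-two events
    s1: int          # type-one event selected for service in this snapshot
    s2: int          # type-two event selected for service in this snapshot


def enabled_immediate(m: Marking) -> List[str]:
    """Enabled immediate transitions in the highest-priority group only."""
    snap = [t for t, ok in (("Sn1", m.b1 > 0 and m.s1 == 0),
                            ("Sn2", m.b2 > 0 and m.s2 == 0)) if m.st_snp_sht and ok]
    if snap:
        return snap  # Sn1 and Sn2 share the highest priority
    if m.st_snp_sht and (m.s1 or m.s2):
        return ["T_SrvSnpSht"]  # lower priority than Sn1 and Sn2
    if m.in_prog and not (m.s1 or m.s2):
        return ["T_EndSnpSht"]
    return []


def enabled_timed(m: Marking) -> List[str]:
    """Timed (exponential) transitions; they fire only when no immediate one is enabled."""
    timed = []
    if m.b1 < N1:
        timed.append("A1")  # inhibitor arc B1 -> A1 with multiplicity N1
    if m.b2 < N2:
        timed.append("A2")  # inhibitor arc B2 -> A2 with multiplicity N2
    if m.s1:
        timed.append("Sr1")
    if m.s2 and not m.s1:
        timed.append("Sr2")  # inhibitor arc S1 -> Sr2 gives type one priority
    return timed


FIRE: Dict[str, Callable[[Marking], Marking]] = {
    "A1": lambda m: replace(m, b1=m.b1 + 1),
    "A2": lambda m: replace(m, b2=m.b2 + 1),
    "Sn1": lambda m: replace(m, b1=m.b1 - 1, s1=1),
    "Sn2": lambda m: replace(m, b2=m.b2 - 1, s2=1),
    "T_SrvSnpSht": lambda m: replace(m, st_snp_sht=0, in_prog=1),
    "Sr1": lambda m: replace(m, s1=0),
    "Sr2": lambda m: replace(m, s2=0),
    "T_EndSnpSht": lambda m: replace(m, st_snp_sht=1, in_prog=0),
}


def fire(m: Marking, t: str) -> Marking:
    """Fire t after checking it is enabled, honoring immediate-before-timed priority."""
    imm = enabled_immediate(m)
    allowed = imm if imm else enabled_timed(m)
    if t not in allowed:
        raise ValueError(f"{t} is not enabled in {m}")
    return FIRE[t](m)
```

Starting from `Marking(1, 0, 2, 2, 0, 0)` and firing Sn1, Sn2, T_SrvSnpSht, Sr1, Sr2, T_EndSnpSht in turn reproduces the markings on the six slides.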
Reactor SRN: Performance Measures • Reward rate assignments to compute performance measures: • Throughput: rate of firing of transition Sr1 (Sr2). • Queue length: number of tokens in place B1 (B2). • Total number of events: sum of the tokens in places B1 & S1 (B2 & S2). • Probability of event loss: probability that place B1 holds N1 tokens (B2 holds N2). • Response time: can be obtained using the tagged customer approach. • The SRN model is solved using the Stochastic Petri Net Package (SPNP) to obtain estimates of the performance metrics.
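Once a steady-state distribution over tangible markings is available (for example, from SPNP), each measure above is an expected reward. A minimal sketch, assuming the distribution is given as a dictionary keyed by a compact (B1, S1, B2, S2) state; that representation and the function name are hypothetical. For response time it substitutes Little's law (mean number in system divided by throughput) for the tagged customer approach named above.

```python
import math
from typing import Dict, Tuple

# Hypothetical compact state: (b1, s1, b2, s2) token counts of a tangible marking.
State = Tuple[int, int, int, int]


def reward_measures(pi: Dict[State, float], mu1: float, mu2: float,
                    n1: int, n2: int) -> Dict[str, float]:
    """Expected values of the reward rates above under a steady-state distribution pi."""
    if not math.isclose(sum(pi.values()), 1.0, abs_tol=1e-9):
        raise ValueError("pi must be a probability distribution")
    items = list(pi.items())
    m = {
        # Throughput: service rate times the probability that the service transition is enabled.
        "throughput1": sum(p * mu1 for (b1, s1, b2, s2), p in items if s1),
        "throughput2": sum(p * mu2 for (b1, s1, b2, s2), p in items if s2 and not s1),
        "queue1": sum(p * b1 for (b1, s1, b2, s2), p in items),
        "queue2": sum(p * b2 for (b1, s1, b2, s2), p in items),
        "total1": sum(p * (b1 + s1) for (b1, s1, b2, s2), p in items),
        "total2": sum(p * (b2 + s2) for (b1, s1, b2, s2), p in items),
        "loss1": sum(p for (b1, s1, b2, s2), p in items if b1 == n1),
        "loss2": sum(p for (b1, s1, b2, s2), p in items if b2 == n2),
    }
    # Mean response time via Little's law (a substitute for the tagged-customer approach).
    for k in ("1", "2"):
        thr = m["throughput" + k]
        m["response" + k] = m["total" + k] / thr if thr > 0 else math.nan
    return m
```

The Sr2 throughput is counted only where S1 is empty because the inhibitor arc from S1 disables Sr2 otherwise.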
Designing DPSS Systems using SRNs • Initial step • Obtain performance measures for individual patterns-based building blocks • Iterative algorithm • Compose systems vertically and horizontally to form a DPSS system • Determine performance measures for specified workloads and service times • Alter the configurations until DPSS performance meets specifications.
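The "alter the configurations until performance meets specifications" loop can be sketched for a single configuration knob, the buffer capacity. To keep the sketch runnable without a Petri net solver, it swaps in the closed-form blocking probability of an M/M/1/K queue as the performance model; this is a stand-in chosen for illustration, not the Reactor SRN, and the function names are hypothetical. In the methodology above, each evaluation would instead be an SRN solution of the composed model.

```python
import math


def mm1k_blocking(lam: float, mu: float, k: int) -> float:
    """Blocking probability of an M/M/1/K queue (K = capacity including the one in service)."""
    rho = lam / mu
    if math.isclose(rho, 1.0):
        return 1.0 / (k + 1)
    return (1.0 - rho) * rho ** k / (1.0 - rho ** (k + 1))


def smallest_buffer(lam: float, mu: float, max_loss: float, k_max: int = 1000) -> int:
    """Try successively larger capacities until the loss requirement is met."""
    for k in range(1, k_max + 1):
        if mm1k_blocking(lam, mu, k) <= max_loss:
            return k
    raise ValueError(f"no capacity up to {k_max} meets loss <= {max_loss}")
```

For example, with λ = 0.5/sec and μ = 2.0/sec, `smallest_buffer(0.5, 2.0, 1e-3)` searches upward from K = 1 and stops at the first capacity whose blocking probability is at most 0.001.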
VR SRN: Performance Model • The VR provides VPN service to two organizations. • Each organization has a customer edge router (CE) connected to the VR • Employees of each organization issue connection set-up and tear-down requests: • - Employees are classified into two categories: technical & administrative • Differentiated level of service: • - Technical employees receive prioritized service over administrative employees • The Reactor pattern could be used to (de)multiplex these events: • - Requests from technical employees constitute event #1 (λ1, μ1, N1) • - Requests from administrative employees constitute event #2 (λ2, μ2, N2) • The SRN model of the Reactor could be used to obtain estimates of the performance metrics.
VR SRN: Performance Estimates
Perf. metric | N1 = N2 = 1, event #1 | N1 = N2 = 1, event #2 | N1 = N2 = 5, event #1 | N1 = N2 = 5, event #2
Throughput | 0.37/s | 0.37/s | 0.40/s | 0.40/s
Queue length | 0.065 | 0.065 | 0.12 | 0.12
Total events | 0.25 | 0.27 | 0.32 | 0.35
Loss probability | 0.065 | 0.065 | 0.00026 | 0.00026
• The SRN model is solved using the Stochastic Petri Net Package (SPNP) to obtain estimates of the performance metrics. • Parameter values: λ1 = 0.5/sec, λ2 = 0.5/sec, μ1 = 2.0/sec, μ2 = 2.0/sec. • Two cases: N1 = N2 = 1, and N1 = N2 = 5. • Observations: • The probability of event loss is higher when the buffer capacity is 1. • The total number of type-two events is higher than that of type-one events. • Type-two events stay in the system longer than type-one events. • This may degrade the response time of requests from administrative employees compared to requests from technical employees.
VR SRN: Sensitivity Analysis • Analyze the sensitivity of the performance metrics to variations in the input parameter values. • Vary λ1 from 0.5/sec to 2.0/sec. • Values of the other parameters: λ2 = 0.5/sec, μ1 = 2.0/sec, μ2 = 2.0/sec, N1 = N2 = 5. • Compute the performance measures for each input value. • Observations: • The throughput of requests from technical employees increases, but its rate of increase declines. • The throughput of requests from administrative employees remains unchanged.
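A one-parameter sweep of this kind is easy to automate: hold every parameter fixed except λ1, evaluate the model at each point, and inspect the successive changes. The sketch below again uses a single-class M/M/1/K queue (carried throughput λ(1 − P_block)) as a runnable stand-in; it is not the two-class priority Reactor SRN, so it cannot reproduce the slide's results, and K = 6 and the function names are assumptions. It does exhibit the same qualitative pattern of throughput rising at a declining rate as λ1 approaches the service rate.

```python
import math
from typing import List, Tuple


def mm1k_throughput(lam: float, mu: float, k: int) -> float:
    """Carried throughput lam * (1 - P_block) of an M/M/1/K queue."""
    rho = lam / mu
    if math.isclose(rho, 1.0):
        p_block = 1.0 / (k + 1)
    else:
        p_block = (1.0 - rho) * rho ** k / (1.0 - rho ** (k + 1))
    return lam * (1.0 - p_block)


def sweep(lams: List[float], mu: float, k: int) -> List[Tuple[float, float]]:
    """Evaluate the model at each arrival rate, holding the other parameters fixed."""
    return [(lam, mm1k_throughput(lam, mu, k)) for lam in lams]


def increments(points: List[Tuple[float, float]]) -> List[float]:
    """Change in throughput between successive sweep points."""
    return [y2 - y1 for (_, y1), (_, y2) in zip(points, points[1:])]


points = sweep([0.5, 1.0, 1.5, 2.0], mu=2.0, k=6)
```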
VR SRN: Expected Behavior • The VPN service has two modes of operation: normal & inclement. • Normal mode: • - On a daily basis, some employees who have negotiated telecommuting plans use the VPN for remote access. • Inclement mode: • - Hazardous driving conditions due to bad weather may keep people at home. • - Large number of telecommuters • - Increase in connection set-up and tear-down requests. • Modes of operation can be defined at a finer granularity, such as a few hours rather than a day.
VR SRN: Expected Behavior
Perf. metric (event #1) | Normal | Inclement | Average
Throughput | 0.40/s | 0.90/s | 0.4510/s
Queue length | 0.12 | 1.86 | 0.2940
Loss probability | 0.09 | 0.21 | 0.0291
• Normal mode: • - λ1 = 0.5/sec, λ2 = 0.5/sec, μ1 = 2.0/sec, μ2 = 2.0/sec, N1 = N2 = 5 • - Probability: 0.9 • Inclement mode: • - λ1 = 1.0/sec, λ2 = 1.0/sec, μ1 = 2.0/sec, μ2 = 2.0/sec, N1 = N2 = 5 • - Probability: 0.1
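The Average column follows the weighting rule from the design-time solution approach: each metric's expected value is the sum of its per-mode values weighted by the mode probabilities p_normal = 0.9 and p_inclement = 0.1. For event #1 throughput and queue length:

```latex
\begin{aligned}
\bar{X} &= \sum_{m \in \{\text{normal},\,\text{inclement}\}} p_m X_m,\\
\bar{X}_{\text{throughput}} &= 0.9\,(0.40) + 0.1\,(0.90) = 0.36 + 0.09 = 0.45~\text{events/s},\\
\bar{X}_{\text{queue}} &= 0.9\,(0.12) + 0.1\,(1.86) = 0.108 + 0.186 = 0.294.
\end{aligned}
```

The throughput result agrees with the tabulated 0.4510/s to within the rounding of the per-mode entries, and the queue-length result matches the tabulated 0.2940.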
VR SRN: Disruption Detection • Obtain an anomaly score for the Reactor based on each performance metric for each event type. • Correlate the anomaly scores across event types to obtain an overall anomaly score for the Reactor. • - The Reactor at each CE demultiplexes events from the two employee groups within a single organization, and receives its own anomaly score. • - The Reactor in the VR demultiplexes events from the two organizations, and also receives an anomaly score. • Correlate the anomaly score of the Reactor in the VR with the score of the Reactor in CE #1 to detect service disruptions for organization #1. • Correlate the anomaly score of the Reactor in the VR with the score of the Reactor in CE #2 to detect service disruptions for organization #2. • The source of a disruption may be identified by correlating the scores at the various layers.
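The two correlation levels can be illustrated with simple probabilistic combinators. The talk uses a Bayesian network; this sketch swaps in two much simpler rules, chosen here only for illustration: a noisy-OR to combine per-metric scores within one Reactor (the Reactor is suspicious if any metric is), and a product across layers (the service is flagged only when both the VR Reactor and the CE Reactor look anomalous, which is one way hierarchical correlation can suppress false positives). It assumes each per-metric score has already been mapped to a probability in [0, 1], a step not shown; the function names are hypothetical.

```python
from typing import Iterable


def noisy_or(probs: Iterable[float]) -> float:
    """Combine per-metric anomaly probabilities: anomalous if any metric is (noisy-OR)."""
    q = 1.0
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        q *= 1.0 - p
    return 1.0 - q


def service_disruption_score(vr_scores: Iterable[float], ce_scores: Iterable[float]) -> float:
    """Cross-layer correlation: flag the service only if both Reactors look anomalous."""
    return noisy_or(vr_scores) * noisy_or(ce_scores)
```

For organization #1 one would pass the VR Reactor's per-metric scores together with those of the Reactor in CE #1; for organization #2, the same VR scores with those of CE #2. A high VR score paired with a low score at only one CE would then point to that organization's edge as the likely source.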
Future Collaborative Research • Performance analysis methodology (UConn – S. Gokhale) • Develop and validate performance models for invariant characteristics of building blocks. • Compose and validate performance models for common building block compositions. • Develop model decomposition and solution strategies to alleviate the state-space explosion problem. • Model-driven generative methodology (Vanderbilt – A. Gokhale) • Manually developing performance models of each block with its variations is cumbersome • Building blocks cannot be composed in an ad hoc, arbitrary manner • Model-driven generative tools use visual modeling languages and model interpreters to automate tedious tasks and provide “correct-by-construction” development • Aspect-oriented methodology (Univ. of Alabama, Birmingham – J. Gray) • Variability in building blocks and compositions is a primary candidate for separation as an aspect • Aspect weaving technology can be used to refine and enhance the performance models by weaving these concerns into them
Concluding Remarks • DPSS systems are becoming increasingly complex • Increasing use of middleware technologies • Middleware resolves many challenges of DPSS development but, owing to its flexibility, also introduces many variability challenges • Need to estimate performance early in the development lifecycle • Goal is to use model-based performance analysis, model-driven generative techniques and aspect weaving to build middleware stacks whose performance can be estimated at design time • Performance analysis can also be used to improve cyber trustworthiness • www.cse.uconn.edu/~ssg (Swapna Gokhale) • www.dre.vanderbilt.edu/~gokhale (Aniruddha Gokhale) • www.gray-area.org (Jeff Gray)
Dual Use of Performance Analysis Techniques for System Design and Improving Cyber Trustworthiness
Swapna Gokhale (ssg@engr.uconn.edu), Asst. Professor of CSE, University of Connecticut, Storrs, CT
Aniruddha Gokhale (a.gokhale@vanderbilt.edu), Asst. Professor of EECS, Vanderbilt University, Nashville, TN
Jeffrey Gray (gray@cis.uab.edu), Asst. Professor of CIS, University of Alabama, Birmingham, AL
Presented at NSWC Dahlgren, VA, April 13, 2005
Solution: A New Approach to DPSS Design • Submissions to Sigmetrics 2005, HPDC 2005, Globecom 2005, IAWS 2005 • Planned submissions to SRDS 2005, ISSRE 2005