Fault and Intrusion Tolerant (FIT) Event Broker & BFT-SMaRt. A. Casimiro, D. Kreutz, A. Bessani, J. Sousa, I. Antunes, P. Veríssimo. University of Lisboa, Portugal. Meeting PT, November 27, 2012
Cloud Infrastructures [Figure: switching and routing, the storage farm and the processing farm exchange events and control messages with monitoring tools and control engines] Alert! Cloud infrastructures are one of the new hot targets of attacks!
Example scenario: Portugal Telecom Cloud Computing Infrastructure • SmartCloud product • First and main problem: • Centralized monitoring approach • Diversity of monitoring tools • ArcSight, Pulse, SCOM • Problems: faults and attacks; diversity is hard to achieve in practice [Figure: agent-based sources (an agent with ArcSight or another tool) and agentless sources (a monitoring probe) all send events to a single ArcSight engine]
The TRONE approach: 1. Fault and Intrusion Tolerant (FIT) Event Broker 2. Automated Failure Diagnosis 3. Multi-homing for fast reconfiguration
FIT Event Broker: Goals and challenges • Overarching goals: • To provide support for trustworthy and resilient monitoring of cloud/datacenter infrastructures • To achieve improved Quality of Protection (QoP) without neglecting Quality of Service (QoS, i.e. performance) needs • Some specific challenges: • Deal with large flows of information (events) • Support different kinds of events (e.g. different criticality levels) • Low intrusiveness and easy integration
FIT Event Broker: Assumptions • System entities: • Probes, event collectors/brokers, consoles • Some event processing may be done by collectors • Fully connected network • E.g., all the entities lie in the same monitoring VLAN • Partially synchronous system • Clocks may be used to timestamp events • Faults • Some FIT brokers may crash or fail in a Byzantine way • We do not require or enforce that clients (probes/consoles) be correct • If faulty clients are a problem for monitoring, that problem must be solved as well
FIT Event Broker: Baseline design options • Topic-based publish-subscribe paradigm • Good fit for the considered scenarios • State Machine Replication (SMR) • Active replication is better suited to Byzantine fault tolerance • Up to f out of n replicas of a FIT Broker may fail in a Byzantine way • Public-key cryptography • Client authentication, to prevent attacks from malicious probes • Event channels with support for QoP and QoS • Differentiated fault-tolerance support (e.g. crash-only or BFT)
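Differentiated fault-tolerance support means that the CLASS of a channel also fixes its replication cost. The sketch below is illustrative only: it assumes two hypothetical classes, `CRASH_ONLY` and `BFT` (the names are not from the FIT Broker), and uses the standard lower bounds of n ≥ 2f+1 replicas for crash faults and n ≥ 3f+1 for Byzantine faults.

```python
from enum import Enum


class ChannelClass(Enum):
    # Hypothetical channel classes; the slides only mention "crash only or BFT".
    CRASH_ONLY = "crash-only"
    BFT = "bft"


def min_replicas(channel_class: ChannelClass, f: int) -> int:
    """Minimum number of replicas needed to mask f faulty replicas."""
    if f < 0:
        raise ValueError("f must be non-negative")
    if channel_class is ChannelClass.CRASH_ONLY:
        return 2 * f + 1  # crash faults: correct replicas must form a majority
    return 3 * f + 1      # Byzantine faults: the n >= 3f+1 bound of BFT SMR


for cls in ChannelClass:
    print(cls.value, [min_replicas(cls, f) for f in range(4)])
# crash-only [1, 3, 5, 7]
# bft [1, 4, 7, 10]
```

Under these assumptions, a crash-only channel that tolerates one fault needs three replicas, while a BFT channel needs four. This is one reason to offer both classes instead of paying the BFT price for every event.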
FIT Event Broker: Interface • Create event channel (In: TAG and CLASS) • Destroy event channel (In: TAG) • Register to channel (In: TAG) • Publish event (In: EVENT) • Subscribe to channel (In: TAG) • Receive event (Out: EVENT)
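The six operations above can be modelled as a small, non-replicated, in-memory topic-based broker. This is a sketch, not the FIT Broker's API. The class and method names are hypothetical. `publish` takes an extra `publisher` argument (an assumption) so that it can check the "Register to channel" step. Each event carries the TAG of its channel.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    tag: str      # TAG of the channel the event is published on
    payload: str


class FitBrokerSketch:
    """Hypothetical, single-process model of the FIT Broker interface."""

    def __init__(self):
        self.channels = {}                   # TAG -> CLASS
        self.publishers = defaultdict(set)   # TAG -> registered publishers
        self.subscribers = defaultdict(set)  # TAG -> subscribers
        self.queues = defaultdict(deque)     # subscriber -> events ready for delivery

    def create_channel(self, tag, cls):
        if tag in self.channels:
            raise ValueError(f"channel {tag!r} already exists")
        self.channels[tag] = cls

    def destroy_channel(self, tag):
        self.channels.pop(tag, None)
        self.publishers.pop(tag, None)
        self.subscribers.pop(tag, None)

    def register(self, tag, publisher):
        self._require(tag)
        self.publishers[tag].add(publisher)

    def subscribe(self, tag, subscriber):
        self._require(tag)
        self.subscribers[tag].add(subscriber)

    def publish(self, publisher, event):
        self._require(event.tag)
        if publisher not in self.publishers[event.tag]:
            raise PermissionError(f"{publisher!r} is not registered on {event.tag!r}")
        # Iterate in sorted order: set order can vary between processes, and
        # replicas of a state machine must behave identically.
        for sub in sorted(self.subscribers[event.tag]):
            self.queues[sub].append(event)

    def receive(self, subscriber):
        queue = self.queues[subscriber]
        return queue.popleft() if queue else None

    def _require(self, tag):
        if tag not in self.channels:
            raise KeyError(f"unknown channel {tag!r}")


broker = FitBrokerSketch()
broker.create_channel("alerts", "BFT")
broker.register("alerts", "probe-1")
broker.subscribe("alerts", "console-A")
broker.publish("probe-1", Event("alerts", "cpu > 90%"))
print(broker.receive("console-A"))  # Event(tag='alerts', payload='cpu > 90%')
```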
FIT Event Broker: Internal state • From the SMR perspective, it is important to identify the relevant state that must be kept consistent across replicas • Data related to the broker configuration • Existing channels and their CLASS • Registered publishers and subscribers • Data related to events • Events that are ready to be delivered • All client input that affects the FIT broker state (e.g. channel and subscription data, some events) must be handled as a state machine command
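Handling all state-changing input as commands matters because SMR only guarantees consistency if every replica applies the same commands, in the same order, deterministically. The minimal sketch below makes that concrete. The tuple command format and the `apply` function are hypothetical. A hand-written list stands in for the order that BFT-SMaRt agrees on. Four replicas (f = 1) replay the same log and end up with identical state digests.

```python
import hashlib
import json


def new_state():
    # Channels with their CLASS, subscribers per channel, events ready for delivery.
    return {"channels": {}, "subscribers": {}, "queues": {}}


def apply(state, command):
    """Deterministically apply one (hypothetical) command to the broker state."""
    op, *args = command
    if op == "create":
        tag, cls = args
        state["channels"].setdefault(tag, cls)
    elif op == "subscribe":
        tag, client = args
        if tag in state["channels"]:
            subs = state["subscribers"].setdefault(tag, [])
            if client not in subs:
                subs.append(client)
    elif op == "publish":
        tag, payload = args
        if tag in state["channels"]:
            for client in state["subscribers"].get(tag, []):
                state["queues"].setdefault(client, []).append(payload)
    else:
        raise ValueError(f"unknown command {op!r}")
    return state


def digest(state):
    # Canonical JSON, so equal states hash to the same value on every replica.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


# Stand-in for the totally ordered command log produced by the agreement layer.
log = [("create", "alerts", "BFT"),
       ("subscribe", "alerts", "console-A"),
       ("publish", "alerts", "disk full")]

replicas = [new_state() for _ in range(4)]  # n = 3f+1 with f = 1
for replica in replicas:
    for cmd in log:
        apply(replica, cmd)

print(len({digest(r) for r in replicas}))  # 1: all replicas reached the same state
```

If the same commands were applied in a different order (for example, publishing before subscribing), the event would not be queued. This is why the order itself must be agreed upon.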
BFT-SMaRt: Overview • Java-based platform for BFT SMR, available at http://code.google.com/p/bft-smart/ • Actively being developed and improved in our group • BFT SMR “common” features • State machine programming model • n ≥ 3f+1 replicas required • A small step away from being a commercial product • Advanced features • Replica recovery (state transfer) • Reconfigurations • Extensible API: e.g. custom voter
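The n ≥ 3f+1 requirement can be read in two ways. It gives the largest number of faults a given replica group can tolerate, and it lets any two quorums overlap in at least one correct replica. The sketch below uses the generic Byzantine quorum size ⌈(n+f+1)/2⌉, which equals 2f+1 when n = 3f+1. It is a textbook formula, not necessarily the exact one BFT-SMaRt uses internally. The function names are hypothetical.

```python
def max_faults(n: int) -> int:
    """Largest f such that n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def byzantine_quorum(n: int) -> int:
    """Smallest q = ceil((n + f + 1) / 2); any two such quorums share >= f+1 replicas,
    so they always share at least one correct replica."""
    f = max_faults(n)
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)


for n in (4, 7, 10):
    print(n, max_faults(n), byzantine_quorum(n))
# 4 1 3
# 7 2 5
# 10 3 7
```

Adding a replica does not always increase f. For example, 5 and 6 replicas still tolerate only one Byzantine fault, so groups are usually sized at exactly 3f+1.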
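BFT-SMaRt: Service invocation [Figure: a PROBE sends its request to the FIT Broker replicas, each holding the FIT Broker state; agreement on the order of requests is performed by BFT-SMaRt]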
BFT-SMaRt: Execution and response • Commands are delivered to the FIT broker, which updates the state/queues and replies • Voting on the client side
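Client-side voting lets a probe or console trust a result even though up to f replicas may lie. The sketch below assumes the classic PBFT-style rule: accept a value once f+1 distinct replicas have returned it. Since at most f replicas are faulty, at least one of those f+1 replicas is correct. BFT-SMaRt's voter is extensible (custom voter), so its actual threshold may differ. The `vote` function is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Optional


def vote(replies: Dict[int, Hashable], f: int) -> Optional[Hashable]:
    """Accept a result once f+1 distinct replicas have returned it.

    `replies` maps replica id -> reply, so each replica gets at most one vote.
    A wrong value can collect at most f votes, so it never reaches f+1.
    """
    if not replies:
        return None
    value, votes = Counter(replies.values()).most_common(1)[0]
    return value if votes >= f + 1 else None


f = 1  # n = 4 replicas
print(vote({0: "ack", 2: "ack", 3: "forged"}, f))  # ack
print(vote({0: "ack", 3: "forged"}, f))            # None: not enough matching replies yet
```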
BFT-SMaRt: Implementation & Evaluation • The FIT Broker is currently being implemented… • …and integrated with BFT-SMaRt • Evaluation: • Throughput • The aim is to handle 40K events/sec • Resilience • Measure performance under attack • Verify recovery and reconfiguration capabilities • A simple demo is available
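As a rough worked figure (simple arithmetic, not a measured result), the 40K events/sec target gives the replicated broker an average budget of 25 µs per event. At a sustained rate, that is roughly 3.5 billion events per day:

```latex
\frac{1}{40\,000\ \text{events/s}} = 2.5 \times 10^{-5}\ \text{s} = 25\ \mu\text{s per event},
\qquad
40\,000\ \tfrac{\text{events}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}} \approx 3.46 \times 10^{9}\ \tfrac{\text{events}}{\text{day}}
```

Every state-changing event goes through agreement on order. Ordering several events per agreement instance (batching) is therefore a common way to reach rates like this in BFT SMR. The slides do not say whether the FIT Broker relies on it.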
BFT-SMaRt: Implementation & Evaluation • Preliminary results available [DAIS 2012] [Figure: throughput for up to 100 channels]
Summary • FIT Event Broker – Event dissemination support • For easier deployment of multiple monitoring tools • Manages which events are propagated, to which consoles, with which QoS • BFT-SMaRt – Byzantine fault-tolerant replication • First usable implementation of BFT replication • Leading edge worldwide • Resilience against malicious attacks with small overhead • Portugal Telecom’s cloud infrastructure is being used as a real use case for applying and evaluating this work