Split Brain Detection Version 00

Split Brain DetectionVersion 00 Nigel Bragg September 4th , 2012

Introductionfrom :- new-haddock-RNNI-split-brain-avoidance-1210-v1.pdf A “split-brain” situation arises when : • In normal operation, two (or more) devices depend upon a control path to coordinate their operation such that they function as a single virtual entity with a single identity; and • Upon failure of the common control path, the two (or more) devices operate independently but • Each assumes the full functionality of the single virtual entity; and/or • Each continues to use the identity of the single virtual entity. • Split-brain issues are avoided if the solution is designed so that conditions 2a and 2b do not occur. • There are two general approaches to achieving this.

Approach A: Easy Split-Brain Avoidance • Prevent condition 2b by: • Assuring that all devices, or all but one pre-determined device, always switch to a unique identity (different from the identity of the single virtual device) upon failure of the control path. • Prevent condition 2a by either: • Assuring one and only one device assumes the full functionality of the single virtual device upon failure of the control path; or • Assuring that each device deterministically assumes a subset of the functionality that does not overlap or conflict with the subset assumed by another device. • Link Aggregation, using the standard protocol without any changes running across the NNI, achieves this. • Characterized as “easy” because this approach does not require distinguishing whether a node failure or a link failure resulted in the loss of the control path.

Approach B: Hard Split-Brain Avoidance • Prevent condition 2b by: • Assuring that one and only one device continues to operate with the identity of the single virtual device upon failure of the control path. • Note that with hard split-brain avoidance there is always one device continuing to operate with the identity of the single virtual device, whereas with easy split-brain avoidance there may or may not be a device that continues to operate with the identity of the single virtual device. • Prevention of condition 2a: • The options for prevention of condition 2a are the same for both easy and hard split-brain avoidance. This is because once the identity issue is resolved, there are many possible ways to resolve the division of functionality. • Characterized as “hard” because this approach requires distinguishing whether a node failure or a link failure resulted in the loss of the control path.

The reference model :- Two Systems with Distributed Aggregation System A System B Port Port Port Port Port Port Port Port (possible) Network Link Intra-Portal Link (could be virtual) Network Link Network Link Gateway Link (virtual) Gateway Link (virtual) Emulated System C Port Port Port Port Port Port Each Network Port on System A advertises: Actor_System = A Actor_Key = Ax A Port ID for each port unique within A Each Network Port on System B advertises: Actor_System = B Actor_Key = Bx A Port ID for each port unique within B Each (non Gateway) Port on System C advertises: Actor_System = C Actor_Key = Cn A Port ID for each port unique within C Where Cn is the same value on all of the ports,

Split Brain Detection (1) ? 2 It is desirable to solve the “hard” split brain problem toensure that a portal continues to operate as a single virtual device whichever node within it might fail, • which in turn requires that we have a robust way of determining that a node has failed, and not just been partially disconnected. Assertion • it is necessary to check for node reachability by allpossible paths before being entitled to regard it as dead So • normal “keep-alive” can be limited to run on the inter-DAS link (1), but if that fails (e.g. from the PoV of A seeking to establish the reachability of B above) • we need to probe for network connectivity between A and B (2), and • we need to ascertain reachability of B via the DRNI (3) If B is unreachable by all routes, it doesn’t matter if it has failed or not. A B X X 1 C DRNI 3 DRNI W X X Y Z

Split Brain Detection (2) ? 2 If inter-DAS link (1), fails (e.g. from the PoV of Aseeking to establish the reachability of B above) • We need to probe for network connectivity between A and B (2) – should be straightforward : • LBM from MEP(Sys ID A)  MEP(Sys ID B) ? • We need to ascertain the reachability of B via the DRNI (3) : • it is not clear now to probe B directly from A(and be sure to use all the links (3) of the DRNI), • W may believe all links are a distributed LAG – poisoned reverse, so propose : • A could harvest from W the full list of Port IDs being offered by C : • and need to request that that this information is “fresh”, • but the mechanism must also handle a dual-homed legacy real node W : • is there a mechanism to allow this ? What then ? A B X 1 X C DRNI 3 DRNI W X X Y Z

Split Brain Detection (3) ? 2 What then ? • “Assure that one and only one device continues to operate with the identity of the single virtual device on failure of the control path”. • If a Node sees zero connectivity to its “mate” Node, it picks up the DRNI identity C; • If a Node has lost the inter-DAS link (1) andconnectivity via its own network (2), • but some physical connectivity to its “mate” is advertised by W over the DRNI (3), • or that information is not available : – and so we must assume that connectivity exists, then the network behind A and B is severed : • Node A reverts to its “real” LAG parameters as A, • or would it be less disruptive to run its part of Cusing “last agreed parameters” ? • Else use own network (2) to negotiate roles, or exchange DRCP messages A B X 1 X C DRNI 3 DRNI W X X Y Z 123 o 000 a) 001 b) 010 c) 011 c)

Split Brain Detection Version 00