200 likes | 215 Views
Transactions in NSO. NSO 4.4, April 2017. Overview of the NSO Transaction Processing. Operator Logs In. Operator logs in and starts a session. The session is connected to a new (empty) transaction with the Running data store as the backend.
E N D
Transactions in NSO NSO 4.4, April 2017
Overview of the NSO Transaction Processing Operator Logs In Operator logs in and starts a session. The session is connected to a new (empty) transaction with the Running data store as the backend. A transaction is a change set relative something else, the backend. In the case here, the Running data store is the backend. Transaction <empty> Running x=3y=falsez=1.2.3.4
First Change Operator Logs In Operator change: x=15 Operator makes a change. The transaction is updated with the changed value. Within the transaction, reading x gives "15". Reading y gives "false". Reading z gives "1.2.3.4". Other applications, reading from the Running data store, will see that x is "3". Transaction x=15 Running x=3y=falsez=1.2.3.4
Changing the Same Element Again Operator Logs In Operator change: x=17 Operator changes the same element again. The transaction is updated with the changed value, x=15 is overwritten with x=17. Elements in a particular transaction can only have a single value. There is no concept of time inside a transaction, no before and after. A transaction is a set of changes, not a sequence. Transaction x=17 Running x=3y=falsez=1.2.3.4
More Changes Operator Logs In Operator change: x=17 Operator change: y=truez=1.2.3.4 Operator changes some more elements. The transaction is updated with the additional changes. This is still the same, single transaction. Setting an element in the transaction to the same value as in the backend has no effect. That is no change. Transaction x=17y=true Running x=3y=falsez=1.2.3.4
Commit: This is the topic of this presentation Operator Logs In Operator change: x=17 Operator change: y=truez=1.2.3.4 Operator commit The transaction manager kicks off the commit sequence when the operator requests the transaction to be committed. What NSO does during the commit sequence is the main topic of this presentation. Transaction x=17y=true Running x=3y=falsez=1.2.3.4
Running x=17y=truez=1.2.3.4 Possible Outcomes: OK or Fail Operator sees"OK" Operator Logs In Operator change: x=17 Operator change: y=truez=1.2.3.4 Operator commit Operator sees"Fail" Before we dive into the commit sequence, we should understand there are two possible outcomes of a transaction: OK: The transaction was processed without error and all of it has been acted upon. Fail: The transaction failed due to an error, and no part of it has been acted upon. In principle it should not be possible to observe a failed transaction for an outside observer. Running x=3y=falsez=1.2.3.4
Running x=3y=falsez=1.2.3.4 A closer look at theCommit Sequence Operator: x=17 y=truez=1.2.3.4 Transaction x=17y=true Lock TransHooks Validate Prepare Commit Notify Unlock The main stages of a transaction (according to literature) is Prepare-Commit (for simple two-phase transactions)or Prepare-Commit-Confirm (for three-phase transactions). NSO extends these phases with a few more steps to lay the groundwork and deal with changes afterwards.
Running x=3y=falsez=1.2.3.4 Lock Running Transaction x=17y=true Lock TransHooks Validate Prepare Commit Notify Unlock Locking, so that no other changes are under way while we are processing this transaction, is key to a simple programming model. On the other hand, this constraint limits the maximum throughput (transactions per minute) of the system.
Running x=3y=falsez=1.2.3.4 Service Create Transaction x=17y=true Lock TransHooks Validate Prepare Commit Notify Unlock The transaction manager computes the Undo information for each service instance, and injects it into the service data. Cust. Hook/ Transform RFSPre-validation RFSHook Undo #1 delete m Service #2Create … Service #1Create Undo #2 delete p The RFS Hook captures the changes for each modified service instance in a separate transaction-in-transaction, with the operator's transaction as back end. Service #1 m=3 Service #2 p="auto"
Running x=3y=falsez=1.2.3.4 Service #2 p="auto" <undo #2> Validate Transaction x=17y=true Service #1 m=3 <undo #1> Lock TransHooks Validate Prepare Commit Notify Unlock Validation has to happen after all transaction hooks have run. Transaction hooks can (and typically do) update the contents of the transaction, and we need to validate the final contents of the transaction, after all changes are done. YANGValidation CustomValidation
Running x=3y=falsez=1.2.3.4 Service #2 p="auto" <undo #2> Prepare Phase Transaction x=17y=true Service #1 m=3 <undo #1> Lock TransHooks Validate Prepare Commit Notify Unlock If HA is enabled, the same journal records that are written to disk are sent to standby nodes. CDB writes down all its changes to the disk journal, except the final transaction complete mark. CDB NEDManager CustomDatabase Send all NETCONF commands to the device's candidate store and validatethe new configuration.Did it work? Send all CLI commands to the device. This will validate and activatethe new configuration. Did it work? TransactionNED … No-transNED NETCONFDevice CLIDevice
Running x=3y=falsez=1.2.3.4 Point of No Return Commit or Abort? Transaction x=17y=true Lock TransHooks Validate Prepare Commit Notify Unlock Abort Unlock The moment of truth: If any transaction participant returns failure, take the abort path. Otherwise proceed with Commit. This is the point of no return. This is when we decide if the transaction went through or not. We will not (cannot) change our minds after this. This is defined by standard transaction theory.
The transaction manager updates running and creates rollback file Running x=17, y=true, z=1.2.3.4 p="auto", m=3 <undo#1>, <undo#2> Commit Phase Rollback #4711 x=3y=false Lock TransHooks Validate Prepare Commit Notify Unlock If HA is enabled, the transaction complete mark is also sent to standby nodes. CDB writes the final transaction complete mark to the disk journal. CDB NEDManager CustomDatabase All CLI commands already sent to the device and activated. So nothing to do. Send commit command to the device. This will activatethe new configuration. TransactionNED … No-transNED NETCONFDevice CLIDevice
Running x=17, y=true, z=1.2.3.4 p="auto", m=3 <undo#1>, <undo#2> Notify Lock TransHooks Validate Prepare Commit Notify Unlock Kickers and subscribers are informed about the changes (they care about) in the transaction. Kickers and subscribers cannot reverse the transaction (we are past the point of no return). KickerManager CustomSubscribers Subscriber seq=75 … Subscriber seq=50 Kicker client #2 … Kicker client #1
Running x=17, y=true, z=1.2.3.4 p="auto", m=3 <undo#1>, <undo#2> Unlock Running Lock TransHooks Validate Prepare Commit Notify Unlock After unlock, the next transaction in line can start processing. At any one time, only one transaction can be within the lock region (to keep the programming model sane). It is therefore very important that the time spent there is as short as possible. The lion's share of the time spent is almost always in waiting for devices to accept the configuration change and report back. For some devices with complicated CLIs, the time to compute the right sequence of CLIs to send may also be significant. If those things could be done outside the lock, the throughput could increase manifold. This is indeed possible. How commit queues accomplish this is explained later in this presentation.
Running x=3y=falsez=1.2.3.4 Service #2 p="auto" <undo #2> Commit Queue:Prepare Transaction x=17y=true Service #1 m=3 <undo #1> Lock TransHooks Validate Prepare Commit Notify Unlock Many NEDs need to read more than the changed data in order to generate the appropriate commands on the device. When commit queues are enabled, the NED Manager prepares a queue item for each involved device. Each queue item consists of the device specific change set and a snapshot handle to enable reading from Running as it looks at the time of queueing. NEDManager Q Item #4711 : 2 Q Item #4711 : 1 Dev. Changes p="auto" Snapshot This version of Running The snapshot database tracks what's on the device now. Running reflects the all the committed data, including later queue items. NETCONFDevice CLIDevice
Running x=17, y=true, z=1.2.3.4 p="auto", m=3 <undo#1>, <undo#2> Commit Queue:Commit Lock TransHooks Validate Prepare Commit Notify Unlock The NED manager places the queue items on the device queues. Once queue items are placed on the device queues, they are being sent to devices regardless of new transactions being committed, aborted, etc. Each queue item is handed to the respective NED for delivery to the device. NEDManager Dev. Changes p="auto" Snapshot This version of Running #4711 #4711 NETCONFDevice CLIDevice
Commit Queue:Execute Queue items are being processed asynchronously to other activities in the system. Commit queues will not give transactional integrity, but change sets coming from the same transaction are sent out at roughly the same time to all participating devices. Execute #4724 #4724 If a device fails to accept the change, that means the device will be marked as out of sync, and no further processing of that device queue will take place until situation cleared. #4716 Dev. Changes p="auto" Snapshot This version of Running #4711 #4711 NED NED NETCONFDevice CLIDevice
Running x=3y=falsez=1.2.3.4 Service #2 p="auto" <undo #2> Rainy day scenario:Abort Transaction x=17y=true Service #1 m=3 <undo #1> Lock TransHooks Validate Prepare Abort Notify Unlock This is why we like transactional devices and protocols so much. Uh-oh, we've already activated the new configuration, so we need to revertfrom it. Service disruption may already have occurred. The NED canask NSO foran inverse transaction, or a set of inverse CLI commands. NEDManager CustomDatabase Send abort command to the device. This will discardthe new configuration. It was never live, so zero disruption to services. TransactionNED … No-transNED Inverse Transaction x=3y=false delete m, p NETCONFDevice CLIDevice