iSCSI Extensions for RDMA (iSER)
draft-ko-iwarp-iser-02
Mike Ko, IBM
August 2, 2004

Agenda
• What is iSER?
• iSER connection setup
  • Open issues
• iSER flow control
  • Open issues
M. Ko

iSCSI Datamover with RDMA Extensions
• The Datamover Architecture defines an abstract model in which the movement of data between iSCSI end nodes is logically separated from the rest of the iSCSI protocol
  • Allows a datamover protocol layer to offload the tasks of data movement and placement from the iSCSI layer
• The iSCSI Extensions for RDMA (iSER) protocol is one such datamover protocol
  • Applies the Datamover Architecture in extending the data transfer capabilities of iSCSI to include RDMA (Remote Direct Memory Access) as defined in the iWARP protocol suite
  • Allows iSCSI implementations to have data transfers which achieve true zero copy behavior using generic RDMA network interface controllers (RNICs)
[Figure: protocol stack, top to bottom: SCSI, iSCSI, Datamover Interface, iSER, Verbs, RDMAP, DDP, MPA, TCP; RDMAP, DDP, and MPA are labeled iWARP]

Connection Setup for iSER-assisted Mode at the Initiator
• Negotiated key values may be passed by the iSCSI layer to the iSER layer by invoking the Notice_Key_Values Operational Primitive
• Before sending the final Login Request, the iSCSI layer invokes the Allocate_Connection_Resources Operational Primitive to request the iSER layer to allocate the iWARP resources for the connection
• After the target returns the final Login Response, the iSCSI layer at the initiator invokes the Enable_Datamover Operational Primitive to request the iSER layer to transition into iSER-assisted mode
• The first message sent by the iSER layer at the initiator to the target is the iSER Hello Message

Connection Setup for iSER-assisted Mode at the Target
• Negotiated key values may be passed by the iSCSI layer to the iSER layer by invoking the Notice_Key_Values Operational Primitive
• Before sending the final Login Response, the iSCSI layer invokes the Allocate_Connection_Resources Operational Primitive to request the iSER layer to allocate the iWARP resources for the connection
• The iSCSI layer invokes the Enable_Datamover Operational Primitive, qualified with the final Login Response PDU, to enable iSER-assisted mode
• The iSER layer sends the final Login Response PDU in byte stream mode and then transitions into iSER-assisted mode
• After receiving the iSER Hello Message from the initiator, the iSER layer at the target responds by sending the iSER HelloReply Message

Example of Successful iSER Connection Setup
A. iSCSI Login Request PDU with RDMAExtensions=Yes
B. iSCSI Login Response PDU with RDMAExtensions=Yes
C. Optional Notice_Key_Values to pass values of negotiated keys
D. Allocate_Connection_Resources to set up iWARP resources
E. iSCSI Login Request PDU with T=1 and NSG=FullFeaturePhase
F. Enable_Datamover to go into iSER mode (* = send last iSCSI PDU in byte stream mode)
G. iSCSI Login Response PDU in byte stream mode with T=1 and NSG=FullFeaturePhase
H. iWARP Send Message containing iSER Hello
J. iWARP Send Message containing iSER HelloReply
[Figure: message sequence chart with columns for the initiator (iSCSI Layer, iSER Layer) and the target (iSER Layer, iSCSI Layer). Order shown: A, B, ...; then C, D, E at the initiator; C, D, F*, G at the target; F at the initiator; H; J]

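The initiator-side steps above can be sketched as a small model. This is only an illustration, not the draft's definition of the primitives: the class `InitiatorIserLayer`, the function `initiator_setup`, and the rule that `enable_datamover` fails unless resources were allocated first are all assumptions made for this sketch. The letters in the trace refer to the legend above.

```python
from enum import Enum, auto


class Mode(Enum):
    BYTE_STREAM = auto()    # ordinary iSCSI over TCP
    ISER_ASSISTED = auto()  # iSER-assisted mode


class InitiatorIserLayer:
    """Hypothetical model of the iSER layer at the initiator."""

    def __init__(self):
        self.mode = Mode.BYTE_STREAM
        self.resources_allocated = False
        self.noted_keys = {}
        self.sent = []  # messages sent by the iSER layer, in order

    def notice_key_values(self, keys):
        # Step C (optional): take note of negotiated key values.
        self.noted_keys.update(keys)

    def allocate_connection_resources(self):
        # Step D: set up the iWARP resources for the connection.
        self.resources_allocated = True

    def enable_datamover(self):
        # Step F: switch to iSER-assisted mode.
        # Assumption of this sketch: allocation must have happened first.
        if not self.resources_allocated:
            raise RuntimeError("Allocate_Connection_Resources must come first")
        self.mode = Mode.ISER_ASSISTED
        # Step H: the first message sent to the target is the iSER Hello.
        self.sent.append("iSER Hello")


def initiator_setup(layer, negotiated_keys=None):
    """Run the initiator-side steps in order and return the step letters."""
    trace = []
    if negotiated_keys:
        layer.notice_key_values(negotiated_keys)
        trace.append("C")
    layer.allocate_connection_resources()
    trace.append("D")
    trace.append("E")  # iSCSI layer sends the final Login Request
    trace.append("G")  # final Login Response arrives from the target
    layer.enable_datamover()
    trace.extend(["F", "H"])
    return trace
```

Steps E and G are only recorded in the trace here, since they are carried out by the iSCSI layer and the target rather than by the modeled iSER layer.
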
Negotiation of RDMAExtensions in Leading Connection Only
• From section 2.3 of iSER draft: “iSER-assisted mode is negotiated during the iSCSI Login for each connection, but an entire iSCSI session MUST operate in one mode ...”
• Question: Since RDMAExtensions is leading-only, this statement is incorrect
• Proposed change:
  • Replace the sentence with “iSER-assisted mode is negotiated during the iSCSI Login for each session, and an entire iSCSI session MUST operate in one mode ...”

CRC32C Protection in the Layer Below iSER
• From section 5.1 of iSER draft: “when the RDMAExtensions key is negotiated to "Yes", the HeaderDigest and the DataDigest keys MUST be negotiated to "None" ... because ... the iWARP protocol suite provides a CRC32c-based error detection for all iWARP Messages”
• Recent updates to the MPA draft render the use of CRC optional
  • “Disabling of CRCs should only be done when it is clear that the connection through the network has data integrity at least as good as a CRC”
• RDDP WG’s position is that all ULPs can assume CRC level or equivalent data protection
• Proposed change: Add the explicit requirement that end-to-end CRC32C based error detection or equivalent be provided in a layer below iSER

Order of RDMAExtensions Key Negotiation and Allocate_Connection_Resources
• From section 5.1.1 (and similarly for section 5.1.2): “If the outcome of the iSCSI negotiation is to enable iSER-assisted mode, then on the initiator side, ... the iSCSI Layer MUST invoke the Allocate_Connection_Resources Operational Primitive”
• Question: The alternative approach of invoking Allocate_Connection_Resources before negotiating for iSER-assisted mode should be allowed
  • Current approach results in the connection being torn down if the required resources cannot be allocated
  • Alternative approach avoids this problem
  • Resources must be deallocated if login fails
  • Resources may have to be deallocated if the negotiated values are less than the allocated value
• Proposed change: Update the draft to allow the alternative approach with the proviso that it is the responsibility of the implementation to deallocate the resources if the login fails or if the negotiated values are less than the allocated value

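A minimal sketch of the alternative approach and its cleanup obligations follows. Everything named here is hypothetical: `ResourcePool` stands in for whatever iWARP resources an implementation tracks, resources are modeled as a plain integer count, and falling back to an ordinary iSCSI login when allocation fails is one assumed way of avoiding the connection teardown.

```python
class ResourcePool:
    """Hypothetical counter of iWARP resource units available to a node."""

    def __init__(self, capacity):
        self.free = capacity

    def allocate(self, units):
        # Refuse the request rather than over-commit.
        if units > self.free:
            return False
        self.free -= units
        return True

    def release(self, units):
        self.free += units


def login_with_early_allocation(pool, wanted, negotiate):
    """Allocate before negotiating, then clean up as the outcome requires.

    `negotiate` is a hypothetical callable that takes the allocated amount
    and returns the negotiated amount, or None if the login fails.
    """
    if not pool.allocate(wanted):
        # Resources are known to be missing before iSER-assisted mode is
        # offered, so the connection does not have to be torn down later.
        return "plain-iscsi"
    negotiated = negotiate(wanted)
    if negotiated is None:
        # Login failed: the implementation must deallocate everything.
        pool.release(wanted)
        return "login-failed"
    if negotiated < wanted:
        # Negotiated value is lower than what was allocated: release the excess.
        pool.release(wanted - negotiated)
    return "iser-assisted"
```

For example, with a pool of 10 units, asking for 8 and negotiating down to 5 leaves 5 units free again once the excess is released.
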
Clarification on the Usage of the Notice_Key_Values Primitive
• From section 5.1.1: “Optionally, the iSCSI Layer MAY invoke the Notice_Key_Values Operational Primitive before invoking the Allocate_Connection_Resources Operational Primitive”
• Question: The word “optionally” is ambiguous
  • Could mean the iSCSI layer may choose to invoke the primitive
  • Or the iSCSI layer may choose to use that primitive, or some other defined or undefined primitive
• Proposed change: Remove the word “optionally”

Requiring the Use of the Notice_Key_Values Primitive
• From section 5.1.1: “The iSCSI Layer MAY invoke the Notice_Key_Values Operational Primitive” “to request the iSER Layer to take note of the negotiated values of the iSCSI keys for the Connection”
• Question: The word “MAY” should be replaced with “MUST” to enforce the invocation of the primitive
• Proposed change: None
  • If the default values are accepted for all the negotiated keys, then there is no new information to be passed from the iSCSI layer to the iSER layer
  • Requiring a “MUST” instead of a “MAY” would require this primitive to be invoked even when it is not necessary
  • Also, it is not architecturally required for the iSCSI layer to issue the Notice_Key_Values primitive

HeaderDigest, DataDigest, OFMarker, & IFMarker in iSER-assisted Mode
• From sections 6.1 and 6.6: These 4 keys must be negotiated to “None” or “No” if the RDMAExtensions key is negotiated to “Yes”
• Question: Draft seems to imply that these 4 keys must be negotiated even for the defaults
• Suggestion: Negotiations resulting in RDMAExtensions=Yes for a session imply HeaderDigest=None, DataDigest=None, OFMarker=No, and IFMarker=No on all connections in that session
  • Override both the default and explicit settings
• Proposed change: Update the draft to reflect the suggested change

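The suggested rule can be sketched as a small function over a connection's key settings. The function name `effective_keys` and the dictionary representation are assumptions for illustration. The sketch treats a missing RDMAExtensions key as "No", which is its default.

```python
# Values implied on every connection of a session with RDMAExtensions=Yes.
IMPLIED_BY_ISER = {
    "HeaderDigest": "None",
    "DataDigest": "None",
    "OFMarker": "No",
    "IFMarker": "No",
}


def effective_keys(settings):
    """Return the key values in effect, given default or explicit settings.

    `settings` is a hypothetical dict of key=value pairs for a connection.
    """
    result = dict(settings)  # leave the caller's dict untouched
    if result.get("RDMAExtensions", "No") == "Yes":
        # Implied values override both defaults and explicit settings.
        result.update(IMPLIED_BY_ISER)
    return result
```

With this reading, a connection that explicitly set HeaderDigest=CRC32C still ends up with HeaderDigest=None once the session has RDMAExtensions=Yes, and no separate negotiation of the four keys is needed.
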
Scope of RDMAExtensions Key
• From section 6.3: RDMAExtensions key has session-wide scope
• Question: Should iSER support mixed mode sessions?
• Argument for:
  • Open an iSCSI connection when there are insufficient resources to support an iSER-assisted connection in allegiance reassignment and the session is in iSER-assisted mode
  • Flexibility on general principles
• Argument against:
  • RFC 3720 assumes homogeneous connections in a session
  • Introducing mixed mode sessions would require that the RFC 3720 semantics be carefully thought through to ensure correctness
  • The task states maintained by an iSCSI connection may be different from those for an iSER-assisted connection
  • iSER-assisted connection may require different LO key values for optimization compared with iSCSI connection
  • Test and debug effort will increase 2x to 3x for mixed mode support
• Proposed change: None

Clarification on the Order of RDMAExtensions Key Negotiation
• From section 6.3: “If the RDMAExtensions key is to be negotiated, it must be offered only on the initial Login Request PDU or Login Response PDU of the leading connection, and if offered, the response must be sent in the immediately following Login Response or Login Request PDU respectively.”
• Question: Clarify when the negotiation response is to be returned if the key is offered in a PDU where the C-bit is set
• Question: Clarify that the negotiation takes place in the LoginOperationalNegotiation stage of the leading connection
• Question: Section 5.2.2 of RFC 3720 states that a response is optional if the Boolean function is "AND" and the value "No" is received
  • iSER draft always requires a response to be returned
  • However, since the default for RDMAExtensions is “No”, it is unlikely that the key-value pair of RDMAExtensions=No will be offered

Clarification on the Order of RDMAExtensions Key Negotiation (cont.)
• Proposed change: Replace sentence with “However, if the RDMAExtensions key is to be negotiated, an initiator MUST offer the key on the first Login Request PDU in the LoginOperationalNegotiation stage of the leading connection, and a target MUST offer the key on the first Login Response PDU with which it is allowed to do so (i.e., the first Login Response issued after the first Login Request with the C bit set to 0) in the LoginOperationalNegotiation stage of the leading connection. In response to the offered key=value pair of RDMAExtensions=yes, an initiator MUST respond on the next Login Request PDU with which it is allowed to do so, and a target MUST respond on the next Login Response PDU with which it is allowed to do so.”

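The target-side part of the proposed wording turns on the C bit of the incoming Login Requests. A minimal sketch, assuming the Login Requests of the LoginOperationalNegotiation stage on the leading connection are represented simply as a list of their C-bit values (the function name and this representation are hypothetical):

```python
def target_offer_index(request_c_bits):
    """Find the Login Request after which the target may first offer the key.

    `request_c_bits` lists the C bit (0 or 1) of each Login Request in
    arrival order. The target's offer goes in the Login Response issued
    after the first request whose C bit is 0. Returns that request's
    index, or None if every request so far has C=1.
    """
    for index, c_bit in enumerate(request_c_bits):
        if c_bit == 0:
            return index
    return None
```

For instance, if the first two Login Requests carry C=1 and the third carries C=0, the function returns 2: the target offers the key on the Login Response that follows the third request.
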
Order of RDMAExtensions Key Negotiation Response
• From section 6.3: RDMAExtensions key must be offered for negotiation in the first PDU that a node is allowed to do so and the response must be returned in the immediately following PDU in which a node is allowed to respond
• Question: Why must the RDMAExtensions key be negotiated first?
  • Negotiating the RDMAExtensions key first allows a node to optimally negotiate the value of other keys
  • Certain iSCSI keys such as MaxBurstLength, MaxOutstandingR2T, ErrorRecoveryLevel, InitialR2T, ImmediateData, etc., may have different optimization points depending on whether iSER-assisted mode is to be enabled in the iSCSI session
• Proposed change: Update the draft to include the rationale for the order requirement

Key Ordering Within a PDU
• From section 6.3: “The [RDMAExtensions] key must precede any other login keys which may be affected by the outcome of the negotiation of the RDMAExtensions key”
• Question: This can be interpreted as requiring key ordering within a PDU, which is contrary to RFC 3720
• Proposed change: Remove the sentence from the draft

iSER Flow Control
• For RDMA Send Type Messages
  • The iSER protocol does not provide additional flow control beyond that provided by the iSCSI layer on control-type PDUs
  • An implementation should be able to take advantage of iWARP Verbs mechanisms such as the Shared Receive Queue mechanism to effectively address the Send Message flow control question
• For RDMA Read Resources
  • In the iSER Hello Message, the iSER layer at the initiator declares to the target the maximum number of RDMA Read Requests that the initiator can receive on the particular RDMAP Stream (iSER-IRD)
    • This allows the iSER layer at the target to adjust its resources if it can issue more RDMA Read Requests than the initiator can handle
  • In the iSER HelloReply Message, the iSER layer at the target declares to the initiator the maximum number of RDMA Read Requests that the target can issue on a particular RDMAP Stream (iSER-ORD)
    • This allows the iSER layer at the initiator to adjust its resources if it can handle more RDMA Read Requests than the target can issue
  • The iSER layer at the target will flow control the RDMA Read Request Messages so as not to exceed iSER-ORD

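The RDMA Read part of this slide can be sketched as a simple throttle at the target. This is an illustration under stated assumptions, not the draft's mechanism: the class `TargetReadThrottle` is hypothetical, and choosing iSER-ORD as the smaller of the target's own limit and the initiator's declared iSER-IRD is one assumed reading of "adjust its resources".

```python
from collections import deque


class TargetReadThrottle:
    """Hypothetical limiter for RDMA Read Requests issued by the target."""

    def __init__(self, target_max_ord, initiator_ird):
        # Assumption: the target never advertises more outstanding reads
        # than the initiator said it can receive in the iSER Hello.
        self.iser_ord = min(target_max_ord, initiator_ird)
        self.outstanding = 0
        self.pending = deque()  # reads waiting for a free slot
        self.issued = []        # reads put on the wire, in order

    def submit(self, read_id):
        """Queue an RDMA Read Request and issue it if a slot is free."""
        self.pending.append(read_id)
        self._drain()

    def complete(self):
        """Record that one outstanding read finished, freeing a slot."""
        if self.outstanding == 0:
            raise RuntimeError("no outstanding RDMA Read Request")
        self.outstanding -= 1
        self._drain()

    def _drain(self):
        # Issue queued reads while staying within iSER-ORD.
        while self.pending and self.outstanding < self.iser_ord:
            self.issued.append(self.pending.popleft())
            self.outstanding += 1
```

With a target limit of 8 and an initiator iSER-IRD of 2, a third submitted read stays queued until one of the first two completes.
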
Flow Control for Control-Type PDU
• From section 8.1: “The iSER Layer SHOULD provision enough Untagged buffers for handling incoming RDMAP Send Message Types to prevent a buffer underrun condition”
• Question: Should some form of send side flow control be established for iSCSI control-type PDUs?
  • Latest DDP draft, draft-ietf-rddp-ddp-02, no longer mandates that a DDP stream be disabled for a buffer underrun condition
• Proposed change: Further discussion is needed