Developing Highly Available Multipath Solutions and Device-Specific Modules. Jaivir Aithal Senior Software Development Engineer Device & Storage Technologies jaivira@microsoft.com. AGENDA. Microsoft Multipath IO (MPIO) Deployment and Configuration
Developing Highly Available Multipath Solutions and Device-Specific Modules • Jaivir Aithal • Senior Software Development Engineer • Device & Storage Technologies • jaivira@microsoft.com
AGENDA • Microsoft Multipath IO (MPIO) Deployment and Configuration • Key Enhancements for Windows Server 2008 R2 • Configuration in the absence of storage • Performance optimizations • Health monitoring • Best Practices for MPIO tuning • Registry settings • Tips & Tricks for Device-Specific Module (DSM) writers • How to get a DSM to best work with MPIO’s management UI • Lessons learned through the Microsoft DSM (MSDSM) • Common pitfalls for DSM writers and tips for how to address them.
MPIO Deployment and Configuration • MPIO Optional Component (OC) • Using dism.exe • dism /online /quiet /enable-feature:MultipathIo • Claiming DSM Support • Using MSDSM vs. Vendor DSM • SPC-3 compliance • Migration requirements • Registry restrictions • HKLM\System\CurrentControlSet\Services\<DSM>\Parameters • x86 vs. x64 • System class vs. SCSI Adapter class • Driver signing
DSM Installation • [Figure: DSM installation flowchart] • Decision nodes: Determine OS; Server 2008 or upwards? (Y/N); MPIO installed? (Y/N); Windows Server 2008? (Y/N) • Action nodes: Use HardwareID Root\MPIO; Use HardwareID Detected\MPIO; Enable Optional Component using DISM; Enable Optional Component using PKGMGR; Install MPIO, DSM & MPDEV; Restart storage stack; Install only the DSM
Enabling Pre-Configuration • Problem Definition • Ability to configure multipath settings without requirement for external storage to be physically attached • Scenarios • Datacenter automation (preconfigure servers, connect storage later) • Configuration utility that sets tunables • Management utility that sets operation settings • Architecture changes • WMI registration by MPIO Control object (FDO) • WMI registration piggy-backing on pseudo-LUN (PDO) • Supported only on Windows Server 2008 R2 and upwards
DSM Changes Required • Implementation Details • MOF changes • Distinguish DSM-centric classes from Device-centric ones • Split WMI classes into two files to avoid common mistakes • Generate the binary data during compile time • Remember to specify the resource name of the new binary MOF • Registration details • Update DsmType to DsmType5 • Pass the structure size as the size of the updated DSM_INIT_DATA • Specify DSM-centric WMI GUIDs using DsmWmiGlobalInfo • Continue specifying Device-centric GUIDs using DsmWmiInfo
DSM-centric MOF example – msdsmdsm.mof • // This is information that should be available even if no storage is physically present. • // • // Example: Supported devices list class. • [WMI, • Dynamic, • Provider("WmiProv"), • Description("Retrieve MSDSM's supported devices list.") : amended, • Locale("MS\\0x409"), • guid("{c362d67c-371e-44d8-8bba-044619e4f245}")] • class MSDSM_SUPPORTED_DEVICES_LIST • { • [key, read] string InstanceName; • [read] boolean Active; • [WmiDataId(1), read, Description("Number of supported devices.") : amended] uint32 NumberDevices; • [WmiDataId(2)] uint32 Reserved; • [WmiDataId(3), • read, • MaxLen(31), • Description("Array of device hardware identifiers.") : amended, • WmiSizeIs("NumberDevices") • ] string DeviceId[]; • };
Device-centric MOF example – msdsm.mof • // This is information that pertains to a specific instance of the device. Here’s an example: • // • // Embedded basic-statistics class. • [WMI, guid("{a34d03ec-6b0b-46a1-9178-82525f41133f}")] • class MSDSM_DEVICEPATH_PERF • { • [WmiDataId(1)] uint64 PathId; • [WmiDataId(2)] uint32 NumberReads; • [WmiDataId(3)] uint32 NumberWrites; • [WmiDataId(4)] uint64 BytesRead; • [WmiDataId(5)] uint64 BytesWritten; • }; • // Statistics provider class • [WMI, Dynamic, Provider("WmiProv"), Description("Retrieve MSDSM Performance Information.") : amended, Locale("MS\\0x409"), guid("{875b8871-4889-4114-93f6-cd064c001cea}")] • class MSDSM_DEVICE_PERF • { • [key, read] string InstanceName; • [read] boolean Active; • [WmiDataId(1), read, Description("Number of paths.") : amended] uint32 NumberPaths; • [WmiDataId(2), read, Description("Array of Performance Information per path for the device.") : amended, • WmiSizeIs("NumberPaths")] MSDSM_DEVICEPATH_PERF PerfInfo[]; • };
DSM WMI Registration • typedef struct _DSM_INIT_DATA { • // Size, in bytes. • ULONG InitDataSize; • // DSM entry points. • DSM_INQUIRE_DRIVER DsmInquireDriver; • . . . • DSM_BROADCAST_SRB DsmBroadcastSrb; • // Wmi entry point and guid information. • DSM_WMILIB_CONTEXT DsmWmiInfo; • // Version 2 starts here... • DSM_TYPE DsmType; • . . . • // Version 5 starts here... • // Wmi entry point and guid information for DSM-centric classes. • DSM_WMILIB_CONTEXT DsmWmiGlobalInfo; • } DSM_INIT_DATA, *PDSM_INIT_DATA;
DSM-centric WMI Registration • // DsmTypeUnknown == mustn't be used. • // DsmType1 == first version • // DsmType2 == indicates that DSM uses InterpretErrorEx() and handles WMI calls with • // DSM_IDS passed in as extra parameter • // DsmType3 == indicates that DSM handles cases where completion routine can be called with NULL DsmId • // DsmType4 == indicates that DSM provides version info • // DsmType5 == indicates that DSM provides additional DSM-centric (global) WMI classes • // DsmType6 == not used • typedef enum _DSM_TYPE { • DsmTypeUnknown = 0, • DsmType1, • DsmType2, • DsmType3, • DsmType4, • DsmType5, • DsmType6 • } DSM_TYPE, *PDSM_TYPE; • #define DSM_INIT_DATA_TYPE_1_SIZE (RTL_SIZEOF_THROUGH_FIELD(DSM_INIT_DATA, Reserved)) • #define DSM_INIT_DATA_TYPE_2_SIZE (RTL_SIZEOF_THROUGH_FIELD(DSM_INIT_DATA, DsmType)) • #define DSM_INIT_DATA_TYPE_3_SIZE DSM_INIT_DATA_TYPE_2_SIZE • #define DSM_INIT_DATA_TYPE_4_SIZE (RTL_SIZEOF_THROUGH_FIELD(DSM_INIT_DATA, DsmVersion)) • #define DSM_INIT_DATA_TYPE_5_SIZE (sizeof(DSM_INIT_DATA))
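The size macros above imply that the receiver of a registration can tell from InitDataSize which trailing fields a DSM actually filled in. The following is a minimal, portable sketch of that idea, not the mpio.sys implementation: SKETCH_INIT_DATA is a hypothetical, heavily simplified stand-in for DSM_INIT_DATA, and SIZEOF_THROUGH_FIELD is a plain-C stand-in for RTL_SIZEOF_THROUGH_FIELD.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for DSM_INIT_DATA (not the real layout). */
typedef struct SKETCH_INIT_DATA {
    unsigned long InitDataSize;  /* caller sets this to the size it filled in */
    void *Reserved;              /* last field of version 1 */
    int   DsmType;               /* last field of versions 2 and 3 */
    int   DsmVersion;            /* last field of version 4 */
    void *DsmWmiGlobalInfo;      /* added in version 5 */
} SKETCH_INIT_DATA;

/* Portable stand-in for RTL_SIZEOF_THROUGH_FIELD. */
#define SIZEOF_THROUGH_FIELD(type, field) \
    (offsetof(type, field) + sizeof(((type *)0)->field))

#define SKETCH_TYPE_1_SIZE SIZEOF_THROUGH_FIELD(SKETCH_INIT_DATA, Reserved)
#define SKETCH_TYPE_2_SIZE SIZEOF_THROUGH_FIELD(SKETCH_INIT_DATA, DsmType)
#define SKETCH_TYPE_4_SIZE SIZEOF_THROUGH_FIELD(SKETCH_INIT_DATA, DsmVersion)
#define SKETCH_TYPE_5_SIZE sizeof(SKETCH_INIT_DATA)

/* Returns 1 if the caller passed enough bytes to include DsmWmiGlobalInfo. */
int sketch_has_global_wmi(const SKETCH_INIT_DATA *data)
{
    return data->InitDataSize >= SKETCH_TYPE_5_SIZE;
}
```

A DSM that still passes the version 4 size would be treated as having no DSM-centric classes, which is why the slide asks for both DsmType5 and the updated structure size.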
Performance Enhancements • Improvements in Core MPIO stack • Elimination of unnecessary use of spinlocks • Conversion of a spinlock into a reader-writer lock • Minimizing unnecessary memory write operations • Rearranging the layout of data structure members to minimize CPU reads • MSDSM enhancements • Make gathering statistics optional • Eliminate unnecessary use of processor-intensive operations • New load balance policy: Least Blocks • Performance Gains in the MPIO stack (i.e. mpio.sys and msdsm.sys) • Preliminary results indicate up to 15% improvement on certain configurations under certain loads. (However, pre-Beta and Beta builds might not indicate what is expected for RTM performance numbers.)
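The Least Blocks policy named above routes each request to the path with the fewest blocks currently in flight. Below is a minimal sketch of that selection rule, not MSDSM's code: SketchPath and its fields are hypothetical, and a real DSM would also need to synchronize updates to the counters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-path bookkeeping for this sketch. */
typedef struct SketchPath {
    int      usable;              /* non-zero if the path may carry I/O */
    uint64_t outstanding_blocks;  /* blocks currently in flight on this path */
} SketchPath;

/* Returns the index of the usable path with the fewest outstanding blocks,
 * or -1 if no path is usable. Ties go to the lowest index. */
int sketch_least_blocks(const SketchPath *paths, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!paths[i].usable) {
            continue;
        }
        if (best < 0 ||
            paths[i].outstanding_blocks < paths[best].outstanding_blocks) {
            best = (int)i;
        }
    }
    return best;
}
```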
MPIO Health Monitoring • Common Interface for basic statistical data • Querying interface is WMI • Granularity at three levels • LUN • Path • Device Instance (i.e. LUN-Path pairing) • Health packets maintained even after monitored entity has gone offline • Potential advantages • Improve diagnosability • Reduce DSM’s overhead for maintaining these counts • Consumers can implement custom triggers • Consistent interface for management applications, regardless of underlying DSM
MPIO Health Monitoring • [Figure: MPIO maintains the counters Reads, Writes, Retries, IO Errors and Path Failures, and delivers WMI events to Consumer A and Consumer B]
Health Monitoring WMI Class For LUN • // Embedded Disk Health Class • [WMI, guid("{6453c476-0499-42ab-9825-5133282b0b56}")] • class MPIO_DISK_HEALTH_CLASS • { • [WmiDataId(1), read, Description("Number of read requests sent to this device.") : amended] uint64 NumberReads; • [WmiDataId(2), read, Description("Number of write requests sent to this device.") : amended] uint64 NumberWrites; • [WmiDataId(3), read, Description("Cumulative number of bytes read by requests sent to this device.") : amended] uint64 NumberCharsRead; • [WmiDataId(4), read, Description("Cumulative number of bytes written by requests sent to this device.") : amended] uint64 NumberCharsWritten; • [WmiDataId(5), read, Description("Number of requests sent to this device that were retried.") : amended] uint64 NumberRetries; • [WmiDataId(6), read, Description("Number of requests sent to this device that failed.") : amended] uint64 NumberIoErrors; • [WmiDataId(7), read, Description("System time at which this health packet was created for this device.") : amended] uint64 CreateTime; • [WmiDataId(8), read, Description("Number of path failures experienced by this device.") : amended] uint64 PathFailures; • [WmiDataId(9), read, Description("System time at which this device went offline/failed.") : amended] uint64 FailTime; • [WmiDataId(10), read, Description("Flag that indicates if the device is offline/failed.") : amended] boolean DeviceDisabled; • [WmiDataId(11), read, Description("Count of the number of times that the NumberReads field wrapped.") : amended] uint8 NumberReadsWrap;
Health Monitoring WMI Class For LUN – Contd. • [WmiDataId(12), read, Description("Count of the number of times that the NumberWrites field wrapped.") : amended] uint8 NumberWritesWrap; • [WmiDataId(13), read, Description("Count of the number of times that the NumberCharsRead field wrapped.") : amended] uint8 NumberCharsReadWrap; • [WmiDataId(14), read, Description("Count of the number of times that the NumberCharsWritten field wrapped.") : amended] uint8 NumberCharsWrittenWrap; • [WmiDataId(15), read] uint8 Pad1[3]; • }; • // Provider Health Information Class • [WMI, Dynamic, Provider("WmiProv"), Description("MPIO Psuedo-LUN Health Information.") : amended, • Locale("MS\\0x409"), guid("{ef04568a-782b-443c-a3db-966ab43775f9}")] • class MPIO_DISK_HEALTH_INFO • { • [key, read] string InstanceName; • [read] boolean Active; • [WmiDataId(1), read, Description("Number of Psuedo-LUN Health Packets.") : amended] uint32 NumberPlPackets; • [WmiDataId(2), read, Description("Reserved for future use.") : amended] uint32 Reserved; • [WmiDataId(3), read, Description("MPIO Pseudo-LUN Health Info Array.") : amended, • WmiSizeIs("NumberPlPackets")] MPIO_DISK_HEALTH_CLASS PlHealthPackets[]; • };
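Each 64-bit counter in MPIO_DISK_HEALTH_CLASS is paired with an 8-bit "Wrap" field that records how many times the counter overflowed. The sketch below shows one way such a pair could be maintained. The function name is hypothetical and this is not taken from mpio.sys.

```c
#include <stdint.h>

/* Adds delta to a 64-bit counter and increments the companion 8-bit wrap
 * count when the addition overflows. */
void sketch_add_with_wrap(uint64_t *counter, uint8_t *wraps, uint64_t delta)
{
    uint64_t before = *counter;

    *counter += delta;            /* unsigned arithmetic wraps modulo 2^64 */
    if (*counter < before) {      /* a smaller result means an overflow */
        (*wraps)++;
    }
}
```

A consumer that reads both fields can distinguish a freshly reset counter from one that has wrapped around.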
Health Monitoring WMI Classes – Path & Device Instance • Path Health Information • Embedded class: MPIO_PATH_HEALTH_CLASS • Provider class: MPIO_PATH_HEALTH_INFO • Device Instance Health Information • Embedded class: MPIO_DEVINSTANCE_HEALTH_CLASS • Provider class: MPIO_DEVINSTANCE_HEALTH_INFO • Health Packet Cleanup • Registry value: FlushHealthInterval • Default: 24 hours • Turning OFF Health Monitoring • Registry value: GatherHealthStats • Default: TRUE (i.e. ON)
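Health packets outlive the entity they describe until the flush interval expires. A minimal sketch of such a cleanup pass follows, assuming hypothetical names (SketchHealthPacket, sketch_flush_health) and timestamps expressed in seconds. The real units and bookkeeping of FlushHealthInterval are not stated in the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced health packet: only the fields the flush needs. */
typedef struct SketchHealthPacket {
    int      disabled;   /* non-zero once the monitored entity went offline */
    uint64_t fail_time;  /* time at which it went offline (seconds, assumed) */
} SketchHealthPacket;

/* Compacts the array in place, dropping packets whose entity has been
 * offline for at least `interval`. Returns the number of packets kept. */
size_t sketch_flush_health(SketchHealthPacket *packets, size_t count,
                           uint64_t now, uint64_t interval)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        int expired = packets[i].disabled &&
                      now >= packets[i].fail_time &&
                      now - packets[i].fail_time >= interval;
        if (!expired) {
            packets[kept++] = packets[i];
        }
    }
    return kept;
}
```

With the 24-hour default, an interval of 86400 would be passed under the seconds assumption used here.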
MPIO Health Reporting – Example 1 Path Health, Disk (pseudo-LUN) Health and DeviceInstance Health Statistics
MPIO Health Reporting – Example 2 Health Statistics output after the user-specified “Health Flush” period has expired and the “orphan” Health packets (associated with failed path 000000077030001) have been discarded.
MPIO Configuration Snapshot • Uses existing WMI classes • Exports the existing MPIO configuration to a text file • Can be used by administrators for troubleshooting • Can be used by DSM writers during development and testing phases • Information written to a file in reverse chronological order (i.e. history maintained) • Default output file used: HKLM\System\CurrentControlSet\Services\mpio\Parameters, DefaultConfigOutputFile
MPIO Tunables
Timer                    MIN     MAX        DEFAULT
PathVerifyEnabled        FALSE   TRUE       FALSE
PathVerificationPeriod   0       MAXULONG   30s
RetryCount               0       500        3
RetryInterval            0       MAXULONG   1s
PDORemovePeriod          0       MAXULONG   20s
[Figure: I/O stack from Application through NTFS, DISK, MPIO (pseudo-LUN) and DSM to the per-path LUNs: path A via HBA 0 / Adapter 0 with DsmID(0), path B via HBA 1 / Adapter 1 with DsmID(1), both on PCI under PNP. Labels: "Number of times retried <= RetryCount"; "InterpretError returns Retry = TRUE"; "When PDORemovePeriod expires"; "LUN continues residing in memory, waiting for a path to come back online"; "When PathVerificationPeriod expires"; "PathVerifyEnabled"; "PathVerify".]
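The RetryCount tunable bounds how often a failed request is resubmitted when the DSM reports it as retry-able. The following sketch models that loop in plain C. It assumes RetryCount counts retries after the initial attempt, which the slide does not state explicitly, and all names (SketchSendFn, sketch_send_with_retries, sketch_fail_n_times) are hypothetical. The wait of RetryInterval between attempts is omitted.

```c
/* Hypothetical send callback: returns 1 on success, 0 on a retry-able error. */
typedef int (*SketchSendFn)(void *ctx);

/* Sends once, then retries up to retry_count more times.
 * Stores the total number of attempts in *attempts; returns 1 on success. */
int sketch_send_with_retries(SketchSendFn send, void *ctx,
                             unsigned retry_count, unsigned *attempts)
{
    unsigned retries = 0;

    for (;;) {
        if (send(ctx)) {
            *attempts = retries + 1;
            return 1;
        }
        if (retries >= retry_count) {   /* retry budget exhausted */
            *attempts = retries + 1;
            return 0;
        }
        retries++;                      /* a real stack would wait RetryInterval here */
    }
}

/* Demo sender: fails until the counter that ctx points to reaches zero. */
int sketch_fail_n_times(void *ctx)
{
    unsigned *remaining = ctx;

    if (*remaining == 0) {
        return 1;
    }
    (*remaining)--;
    return 0;
}
```

With the default RetryCount of 3, a request that keeps failing is attempted four times in this model before the error is returned.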
Getting a DSM To Work With MPIO UI
    // List of supported GUIDs
    GUID DSM_QuerySupportedLBPoliciesV2GUID = DSM_QuerySupportedLBPolicies_V2Guid;
    ...
    #define DSM_QuerySupportedLBPoliciesV2GUID_Index 0
    ...
    WMIGUIDREGINFO DsmGuidList[] = {
        {&DSM_QuerySupportedLBPoliciesV2GUID, 1, 0},
        ...
    };

    NTSTATUS DriverEntry(. . .)
    {
        DSM_INIT_DATA dsmInitData;

        // Get DSM's version information
        DsmpGetVersion(&dsmInitData.DsmVersion);

        // Set up the init data
        dsmInitData.InitDataSize = DSM_INIT_DATA_TYPE_4_SIZE;
        dsmInitData.DsmType = DsmType4;
        ...
        // Send the IOCTL to mpio.sys to register.
        DsmSendDeviceIoControlSynchronous(IOCTL_MPDSM_REGISTER, ..., &dsmInitData);
        ...
    }
Getting MPIO UI To Restrict Allowable Path States
    // List of supported GUIDs
    GUID DSM_QueryLBPolicyV2GUID = DSM_QueryLBPolicy_V2Guid;
    ...
    #define DSM_QueryLBPolicyV2GUID_Index 0
    ...
    WMIGUIDREGINFO DsmGuidList[] = {
        {&DSM_QueryLBPolicyV2GUID, 1, 0},
        ...
    };

    NTSTATUS DsmQueryData(...)
    {
        ...
        if (GuidIndex == DSM_QueryLBPolicyV2GUID_Index) {
            PDSM_Load_Balance_Policy_V2 LBPolicy = &(((PDSM_QueryLBPolicy_V2)Buffer)->LoadBalancePolicy);

            for (ULONG inx = 0; inx < DsmIds->Count; inx++) {
                LBPolicy->DSM_Paths[inx]->Reserved = DSM_STATE_ACTIVE_OPTIMIZED_SUPPORTED;

                // Depending on supported states, OR them in
                if (activeUnoptimizedSupported) {
                    LBPolicy->DSM_Paths[inx]->Reserved |= DSM_STATE_ACTIVE_UNOPTIMIZED_SUPPORTED;
                }
            }
            ...
        }
    }
Ensuring DSM Works With Virtual Disk Service
    // List of supported GUIDs
    GUID DSM_QuerySupportedLBPoliciesV2GUID = DSM_QuerySupportedLBPolicies_V2Guid;
    GUID DSM_QueryDsmUniqueIdGUID = DSM_QueryUniqueIdGuid;
    ...
    #define DSM_QuerySupportedLBPoliciesV2GUID_Index 0
    #define DSM_QueryDsmUniqueIdGUID_Index 1
    ...
    WMIGUIDREGINFO DsmGuidList[] = {
        {&DSM_QuerySupportedLBPoliciesV2GUID, 1, 0},
        {&DSM_QueryDsmUniqueIdGUID, 1, 0},
        ...
    };

    NTSTATUS DsmQueryData(...)
    {
        ...
        if (GuidIndex == DSM_QueryDsmUniqueIdGUID_Index) {
            PDSM_QueryUniqueId dsmQueryUniqueId = Buffer;

            // Ensure that the 64-bit returned value will be unique
            dsmQueryUniqueId->DsmUniqueId = (ULONGLONG)((ULONG_PTR)DsmContext);
        }
        ...
    }
Avoiding Immediate LUN Tear-down Post-Initialization
    NTSTATUS DsmSetDeviceInfo(
        __in IN PVOID DsmContext,
        __in IN PDEVICE_OBJECT TargetObject,
        __in IN PVOID DsmId,
        __inout IN OUT PVOID *PathId
        )
    {
        PDSM_DEVICE_INFO deviceInfo = DsmId;
        PSCSI_ADDRESS scsiAddress = deviceInfo->ScsiAddress;
        ULONG_PTR pathId;

        // It is possible that Port, Bus and Target are all zero.
        // Ensure that the returned PathId is never zero (since MPIO
        // will treat that as NULL)
        pathId = DSM_PATHID_PREFIX;
        pathId <<= 8;
        pathId |= scsiAddress->PortNumber;  // Port
        pathId <<= 8;
        pathId |= scsiAddress->PathId;      // Bus
        pathId <<= 8;
        pathId |= scsiAddress->TargetId;    // Target

        *PathId = ((PVOID)((ULONG_PTR)(pathId)));
        ...
        return status;
    }
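The path identifier above packs a non-zero prefix byte followed by port, bus and target, so the result can never be mistaken for NULL. Here is a standalone sketch of the same packing. The prefix value 0x77 is an assumption, inferred only from the failed path identifier shown in Health Reporting Example 2 (which ends in 77030001), and sketch_make_path_id is a hypothetical name.

```c
#include <stdint.h>

/* Assumed prefix value; the slides do not give DSM_PATHID_PREFIX. */
#define SKETCH_PATHID_PREFIX 0x77u

/* Packs prefix, port, bus and target into one identifier, one byte each,
 * most significant first. The prefix keeps the result non-zero. */
uint64_t sketch_make_path_id(uint8_t port, uint8_t bus, uint8_t target)
{
    uint64_t id = SKETCH_PATHID_PREFIX;

    id = (id << 8) | port;
    id = (id << 8) | bus;
    id = (id << 8) | target;
    return id;
}
```

Under that assumption, port 3, bus 0, target 1 yields 0x77030001, and an all-zero SCSI address still yields 0x77000000 rather than zero.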
Avoiding Bogus Path Flagging On Path Recovery
    BOOLEAN DsmIsPathActive(...)
    {
        ...
        // Set a flag that IsPathActive was successfully called
        deviceInfo->Usable = TRUE;
        return TRUE;
    }

    PVOID DsmLBGetPath(...)
    {
        for (inx = 0; inx < DsmList->Count; inx++) {
            deviceInfo = DsmList->IdList[inx];

            // Don't consider paths that aren't yet usable
            if (deviceInfo->Usable == FALSE)
                continue;

            // Find the best candidate to return, even if not in A/O
            // Prefer: Active/Unoptimized > StandBy > Unavailable
            pathId = DsmpCheckIfIsBetterCandidatePath(deviceInfo, ...);
        }
        return pathId;
    }

    ULONG DsmInterpretErrorEx(..., PBOOLEAN Retry, PLONG RetryInterval)
    {
        // If SenseData indicates non-A/O path was chosen, retry IO
        if (addSenseQ == 0xA || addSenseQ == 0xB || addSenseQ == 0xC) {
            *Retry = TRUE;
            *RetryInterval = ALUA_STATE_CHANGE_TIME_TAKEN;
        }
        ...
    }
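The selection rule described in the DsmLBGetPath comments (skip paths not yet marked usable, then prefer the best remaining access state) can be sketched as a standalone function. The types and names below are hypothetical and stand in for the DSM's own structures. The enum order encodes the preference Active/Optimized, then Active/Unoptimized, then Standby, then Unavailable.

```c
#include <stddef.h>

/* Access states in order of preference; a lower value is better. */
typedef enum SketchState {
    SKETCH_ACTIVE_OPTIMIZED = 0,
    SKETCH_ACTIVE_UNOPTIMIZED,
    SKETCH_STANDBY,
    SKETCH_UNAVAILABLE
} SketchState;

/* Hypothetical per-path record for this sketch. */
typedef struct SketchDeviceInfo {
    int         usable;  /* set once the path has been reported active */
    SketchState state;   /* current ALUA access state */
} SketchDeviceInfo;

/* Returns the index of the best usable path, or -1 if none is usable. */
int sketch_pick_path(const SketchDeviceInfo *list, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (!list[i].usable) {
            continue;                       /* not yet usable: never pick it */
        }
        if (best < 0 || list[i].state < list[best].state) {
            best = (int)i;
        }
    }
    return best;
}
```

Skipping unusable entries is what keeps a recovering path from being chosen, and then flagged as failed, before it is ready to take I/O.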
Handling IO In The Absence of Active/Optimized Path
    PVOID DsmLBGetPath(...)
    {
        ...
        // Find the best candidate to return, even if not in A/O
        return DsmpFindBestCandidatePath(...);
    }

    ULONG DsmInterpretErrorEx(..., PBOOLEAN Retry, PLONG RetryInterval, ...)
    {
        // If SenseData indicates access state changed, or implicit
        // transition failed, or TPG in non-active state, retry IO
        if ((sKey == 0x6 && addSn == 0x2A && (aSQ == 0x6 || aSQ == 0x7)) ||
            (sKey == 2 && addSn == 4 && (aSQ == 0xA || aSQ == 0xB || aSQ == 0xC))) {
            sendTPG = TRUE;
            *Retry = TRUE;
            *RetryInterval = ALUA_STATE_CHANGE_TIME_TAKEN;
            errorMask = DSM_RETRY_DONT_DECREMENT;
        }

        // Send an RTPG asynchronously to get updated TPG states.
        // If explicit-only transitions supported, this routine will
        // send an STPG first to make one of the TPGs Active/Optimized
        if (sendTPG)
            DsmpSetPathForIoRetryALUA(...);
        ...
        return errorMask;
    }
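The sense-data test in DsmInterpretErrorEx can be isolated into a small predicate. The sketch below reproduces only the condition from the slide. The function name is hypothetical, and the comments give the SPC-3 meanings of the codes.

```c
#include <stdint.h>

/* Returns 1 if the sense key / ASC / ASCQ triple matches one of the ALUA
 * conditions the slide treats as retry-able, 0 otherwise. */
int sketch_is_alua_retry(uint8_t sense_key, uint8_t asc, uint8_t ascq)
{
    /* UNIT ATTENTION, ASC 0x2A:
     *   ASCQ 0x06 asymmetric access state changed
     *   ASCQ 0x07 implicit asymmetric access state transition failed */
    if (sense_key == 0x6 && asc == 0x2A && (ascq == 0x06 || ascq == 0x07)) {
        return 1;
    }
    /* NOT READY, ASC 0x04 (logical unit not accessible):
     *   ASCQ 0x0A asymmetric access state transition
     *   ASCQ 0x0B target port in standby state
     *   ASCQ 0x0C target port in unavailable state */
    if (sense_key == 0x2 && asc == 0x04 && ascq >= 0x0A && ascq <= 0x0C) {
        return 1;
    }
    return 0;
}
```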
Reducing ALUA Storage Device Initialization Time
    NTSTATUS DsmInquire(...)
    {
        PDSM_DEVICE_INFO deviceInfo;  // Represents this DeviceInstance
        ...
        // For ALUA storage, get the Target Port Groups (TPG) info
        status = DsmpReportTargetPortGroups(TargetDevice, ...);
        if (NT_SUCCESS(status))
            deviceInfo->IgnorePathVerify = TRUE;
        ...
        return status;
    }

    NTSTATUS DsmPathVerify(...)
    {
        ...
        // If storage is ALUA, and this is the first time PathVerify
        // is being called, we may be able to skip doing it
        if (deviceInfo->ALUASupport != DSM_DEVINFO_ALUA_NOT_SUPPORTED) {
            if (deviceInfo->IgnorePathVerify == TRUE) {
                status = STATUS_SUCCESS;
                // From now on, we should send PathVerify if asked to
                deviceInfo->IgnorePathVerify = FALSE;
            }
        }
        ...
        return status;
    }
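The IgnorePathVerify flag works as a one-shot: a successful RTPG during inquiry already proved the path, so only the first path verification is skipped. A minimal model of that behavior follows, with hypothetical names (SketchDevice, sketch_path_verify_needed).

```c
/* Hypothetical, reduced device record for this sketch. */
typedef struct SketchDevice {
    int alua_supported;      /* non-zero for ALUA storage */
    int ignore_path_verify;  /* set when RTPG succeeded during inquiry */
} SketchDevice;

/* Returns 1 if a real path verification must be sent, 0 if it can be
 * skipped. Skipping consumes the one-shot flag. */
int sketch_path_verify_needed(SketchDevice *dev)
{
    if (dev->alua_supported && dev->ignore_path_verify) {
        dev->ignore_path_verify = 0;   /* later calls verify normally */
        return 0;
    }
    return 1;
}
```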
Avoid Preventing Cluster Disk Resource from Coming Online
    ULONG DsmCategorizeRequest(...)
    {
        if (DsmpReservationCommand(Irp, Srb))
            return DSM_WILL_HANDLE;
        ...
    }

    NTSTATUS DsmSrbDeviceControl(...)
    {
        if (opCode == SCSIOP_PERSISTENT_RESERVE_OUT) {
            status = DsmpPersistentReserveOut(...);
        }
        ...
    }

    NTSTATUS DsmpPersistentReserveOut(...)
    {
        if (serviceAction == RESERVATION_ACTION_RESERVE) {
    __RetryRequest:
            status = DsmSendRequest(...);
            if (!NT_SUCCESS(status)) {
                if (Srb->SrbStatus & SRB_STATUS_AUTOSENSE_VALID &&
                    Srb->SrbStatus & SRB_STATUS_ERROR &&
                    Srb->ScsiStatus == SCSISTAT_CHECK_CONDITION) {
                    // Check if the error is retry-able
                    if (DsmpShouldRetryPRcommand(senseData)) {
                        goto __RetryRequest;
                    }
                }
            }
        }
        ...
    }
Ensuring DSM Can Be Uninstalled Using MPIOCPL
    ...
    [Contoso_Install.Services]
    AddService=contosodsm,%SPSVCINST_ASSOCSERVICE%,Contosodsm_Service

    [Contosodsm_Service]
    ...
    AddReg = Contosodsm_Addreg

    [Contosodsm_Addreg]
    HKR, Parameters, DsmSupportedDeviceList, %REG_MULTI_SZ%,\
        "Vendor 8Product 16"
    ; The following cannot be grouped (as above)
    HKLM, SYSTEM\CurrentControlSet\Control\MPDEV,\
        MPIOSupportedDeviceList, %REG_MULTI_SZ_APPEND%, "Vendor 8Product 16"

    ; Uninstall Section
    [DefaultUninstall]
    DelReg = Contosodsm_Delreg

    [DefaultUninstall.Services]
    DelService = contosodsm

    [Contosodsm_Delreg]
    HKLM, SYSTEM\CurrentControlSet\Control\MPDEV, MPIOSupportedDeviceList, %REG_MULTI_SZ_DELETE%, "Vendor 8Product 16"
Ensuring DSM Is Presented a Device before MSDSM
    NTSTATUS DriverEntry(...)
    {
        DSM_INIT_DATA dsmInitData;
        ...
        // Ensure this DSM is presented the device before MSDSM
        dsmInitData.Reserved = 0;
        ...
        // Send dsmInitData to mpio.sys via the IOCTL to register.
        DsmSendDeviceIoControlSynchronous(IOCTL_MPDSM_REGISTER, ...);
        ...
    }

<File: CONTOSODSM.INF>
    ...
    [Contoso_Install.Services]
    AddService=contosodsm,%SPSVCINST_ASSOCSERVICE%,Contosodsm_Service

    [Contosodsm_Service]
    ...
    AddReg = Contosodsm_Addreg

    [Contosodsm_Addreg]
    HKR, Parameters, DsmSupportedDeviceList, %REG_MULTI_SZ%,\
        "Vendor 8Product 16"
    ; The following cannot be grouped (as above)
    HKLM, SYSTEM\CurrentControlSet\Control\MPDEV,\
        MPIOSupportedDeviceList, %REG_MULTI_SZ_APPEND%, "Vendor 8Product 16"
    ...
Call To Action • Revisit existing DSM WMI classes to determine whether preconfiguration feature needs to be implemented • Assess whether any of the performance-related changes can be implemented in your DSM • Consider modifying management applications to implement new health WMI classes • Implement triggers • Implement Version 2 of the classes defined in mpioLBPo.mof • Test your storage with inbox MSDSM • Encourage adoption of SPC-3 ALUA for your storage
RESOURCES • Web Resources • Microsoft Storage Technologies - Multipath I/O: http://www.microsoft.com/MPIO • SCSI Specifications (SPC-3), ratified version: http://t10.org/ftp/t10/drafts/spc3/spc3r23.pdf • Microsoft Windows Server Failover Clustering (WSFC): http://www.microsoft.com/downloads/details.aspx?familyid=75566F16-627D-4DD3-97CB-83909D3C722B&displaylang=en • Windows Management Interface on MSDN: http://msdn.microsoft.com/en-us/library/aa394572.aspx • Contact Information (for feedback, future feature asks) • mpiopm@microsoft.com