VNX Parts Replacements Boris Sobolev Lennox Robin
Parts replacement
Each procedure consists of:
1. Before you go onsite - ETAs / Warnings Primus can be found here – http://csexplorer.isus.emc.com/eservice/iviewcs/ui/eserver.asp
2. Collect environment configuration information: run SP collect using the available tools and review the SP collects
3. Handling FRUs
4. Part replacement specific information
Parts replacement – USM • Unisphere Service Manager (USM) can be downloaded from –
Parts replacement – parts which can be replaced by user on Block
Parts replacement – Other Hardware https://support.emc.com/ https://mydocs.emc.com/VNX/
Parts replacement – Common Tasks
• Disable ConnectHome and email notifications.
• Enable ConnectHome and email notifications.
• Disable write cache.
• Enable write cache.
• Restore trespassed LUNs.
• Check system state.
Parts replacement – Common Tasks
1. Disable ConnectHome and email notifications.
Use a HyperTerminal session to disable ConnectHome (you must be logged in as root):
a. From the root directory, disable ConnectHome:
   # /nas/sbin/nas_connecthome -service stop
b. Disable the email notification service:
   # /nas/bin/nas_emailuser -modify -enabled no
c. Verify that the email notification service has stopped (is not enabled):
   # /nas/bin/nas_emailuser -info
ConnectHome and email notifications are now disabled.
Parts replacement – Common Tasks
2. Enable ConnectHome and email notifications.
a. From the root directory, clear any existing ConnectHome files and enable ConnectHome:
   # /nas/sbin/nas_connecthome -service start -clear
b. From the ConnectHome configuration, determine the connections that are enabled:
   # /nas/sbin/nas_connecthome -i
c. Verify that ConnectHome works with the /nas/sbin/nas_connecthome -test connection_name command for each enabled connection. For example:
   # /nas/sbin/nas_connecthome -t -email_1
   or
   # /nas/sbin/nas_connecthome -t -email_2
   or
   # /nas/sbin/nas_connecthome -t -https
   or
   # /nas/sbin/nas_connecthome -t -modem_1
Parts replacement – Common Tasks
2. Enable ConnectHome and email notifications (continued).
d. Verify that email notifications are configured:
   # /nas/bin/nas_emailuser -info
   If the Recipient Address(es) field is empty, email notifications have not been configured and do not need to be enabled. To configure email notifications, use the /nas/bin/nas_emailuser command or Unisphere.
   If the Recipient Address(es) field is populated, email notifications were enabled before and must be re-enabled.
e. Enable email notifications:
   # /nas/bin/nas_emailuser -modify -enabled yes
f. Verify that email notification works:
   # /nas/bin/nas_emailuser -info
g. Test the configuration and verify that the configured Recipient Address(es) received the test email:
   # /nas/bin/nas_emailuser -test
Parts replacement – Common Tasks
3. Disable write cache.
Display and record the current write cache settings:
   # /nas/sbin/naviseccli -h <IP_address> -user <name> -password <password> -scope 0 getcache | grep "Cache Size"
Disable and zero out the system write cache:
   # /nas/sbin/naviseccli -h <IP_address> -user <name> -password <password> -scope 0 setcache -wsz 0 -wc 0
Parts replacement – Common Tasks
4. Enable write cache.
Using the open HyperTerminal session, set the write cache size to match the previous setting. The SP IP addresses are listed in /etc/hosts:
   [root@VNX5700-CS0 nasadmin]# grep SP /etc/hosts
   10.5.22.160 A_APM00112800336 SPA # CLARiiON SP
   10.5.22.161 B_APM00112800336 SPB # CLARiiON SP
   # /nas/sbin/naviseccli -h <IP_address> -user <name> -password <password> -scope 0 setcache -wsz <write_cache_size>
Enable the write cache:
   # /nas/sbin/naviseccli -h <IP_address> -user <name> -password <password> -scope 0 setcache -wc 1
Parts replacement – Common Tasks
5. Restore trespassed LUNs.
Using the CLI, do the following:
a. Log in to the primary Control Station as nasadmin and change to the root user:
   $ su root
b. Determine the storage-system serial number (storage-system ID):
   # nas_storage -list
   [nasadmin@VNX5700-CS0 ~]$ nas_storage -list
   id acl name serial_number
   0 APM00112800336 APM00112800336
c. Restore the LUNs to the correct SP:
   # nas_storage -failback storage-system-name
   For example:
   # nas_storage -failback APM00070300923
   id = 1
   serial_number = APM00070300923
   name = APM00070300923
   acl = 0
   done
Parts replacement – Common Tasks
5. Restore trespassed LUNs (continued).
To restore all LUNs that an SP owns by default, use the trespass mine command. The command must be issued to the SP that the LUNs will trespass to.
   C:\>naviseccli -h 10.5.22.177 trespass mine
   C:\>naviseccli -h 10.5.22.177 getlun -default -owner
   LOGICAL UNIT NUMBER 29
   Default Owner: SP B
   Current owner: SP B
   LOGICAL UNIT NUMBER 18
   Default Owner: SP A
   Current owner: SP A
   LOGICAL UNIT NUMBER 17
   Default Owner: SP B
   Current owner: SP B
   LOGICAL UNIT NUMBER 20
   Default Owner: SP A
   Current owner: SP A
Parts replacement – Common Tasks
5. Restore trespassed LUNs (continued).
For one LUN:
   C:\>naviseccli -h 10.5.22.193 getlun 2 -default -owner
   Default Owner: SP B
   Current owner: SP A
   C:\>naviseccli -h 10.5.22.193 trespass lun 2
   Error: trespass command failed
   This command must be issued from the SP that the LUN will trespass to
   C:\>naviseccli -h 10.5.22.194 trespass lun 2
   C:\>navicli -h 10.5.22.194 getlun 2 -default -owner
   Default Owner: SP B
   Current owner: SP B
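Checking many LUNs by eye is error-prone, so the comparison of default and current owner can be scripted. The following is a minimal sketch, assuming the getlun -default -owner output has the three-field layout shown in the examples above. The function name find_trespassed and the sample text are hypothetical; in the sample, LUN 18 has been altered to appear trespassed for illustration.

```python
import re

# Assumes each LUN is reported as "LOGICAL UNIT NUMBER n", then
# "Default Owner: SP x", then "Current owner: SP y", as in the slide output.
LUN_RE = re.compile(
    r"LOGICAL UNIT NUMBER\s+(\d+)\s+"
    r"Default Owner:\s+(SP [AB])\s+"
    r"Current owner:\s+(SP [AB])"
)


def find_trespassed(getlun_output):
    """Return (lun, default_owner, current_owner) for LUNs not on their default SP."""
    return [
        (int(lun), default, current)
        for lun, default, current in LUN_RE.findall(getlun_output)
        if default != current
    ]


# Hypothetical sample: LUN 18 is currently owned by SP B instead of SP A.
sample = """\
LOGICAL UNIT NUMBER 29
Default Owner: SP B
Current owner: SP B
LOGICAL UNIT NUMBER 18
Default Owner: SP A
Current owner: SP B
"""

print(find_trespassed(sample))
```

Each tuple returned names a LUN that still needs a trespass issued to its default owner SP.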
Parts replacement – Common Tasks
6. Check system state.
To view the system state, enter the following command:
   # /nas/bin/nas_checkup
Example:
   # /nas/bin/nas_checkup
   Check Version: <NAS_version>
   Check Command: /nas/bin/nas_checkup
   Check Log : /nas/log/checkup-run.100128-181007.log
   -------------------------------------Checks-------------------------------------
   Control Station: Checking if file system usage is under limit.............. Pass
   Control Station: Checking if NAS Storage API is installed correctly........ Pass
If the output of the nas_checkup command indicates any problems, correct the problems and rerun the command before continuing.
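A long nas_checkup report can be filtered so that only the checks needing attention remain. This is a minimal sketch, assuming every check line has the "component: description....... status" shape shown in the example above. The function name non_passing is hypothetical, and the "Fail" line in the sample is invented for illustration; any status other than Pass is simply reported.

```python
import re

# Assumes "<component>: <description><run of dots> <status>" per check line.
CHECK_RE = re.compile(
    r"^(?P<component>[^:]+):\s*(?P<check>.+?)\.{2,}\s*(?P<status>\S+)\s*$"
)


def non_passing(checkup_output):
    """Return (component, check, status) for every check whose status is not Pass."""
    problems = []
    for line in checkup_output.splitlines():
        match = CHECK_RE.match(line.strip())
        if match and match.group("status") != "Pass":
            problems.append(
                (match.group("component"), match.group("check"), match.group("status"))
            )
    return problems


# Hypothetical sample: the second check is shown failing for illustration only.
sample = """\
Check Command: /nas/bin/nas_checkup
Check Log : /nas/log/checkup-run.100128-181007.log
Control Station: Checking if file system usage is under limit.............. Pass
Control Station: Checking if NAS Storage API is installed correctly........ Fail
"""

for component, check, status in non_passing(sample):
    print(f"{status}: {component}: {check}")
```

An empty result means every parsed check passed and the procedure can continue.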
Parts replacement - Drive
Summary
1. Diagnose and identify the CRU to replace
2. Download and install the USM
3. Verify that you do not have a multiple-failure situation
4. Run the Disk Replacement wizard
5. Replace the drive
Parts replacement - Drive
Diagnose and identify the CRU to replace
• The amber fault indicator is lit on the disk module
• The Unisphere “Fault report” indicates no problems other than a single disk failure
• A “CRU removed” (920c, xx0d) message appears in the event log
• The event log shows no “error” or “critical error” events for other disks or other components
Parts replacement - Drive How do I know when a single disk module is faulted and should be replaced? When is it not OK to remove or replace a disk module? How should I proceed when more than a single drive indicates a fault?
Parts replacement - Drive
When is it not OK to remove or replace a disk module?
• An NDU is in progress
• The drive is protected by RAID type 0
• An array component in addition to a disk is indicating a fault
• More than one disk is indicating a problem
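The four conditions above can be expressed as a simple pre-check. The sketch below is illustrative only: the ArrayState fields and the replacement_blockers function are hypothetical names, and the values would have to be gathered by hand from Unisphere or the event log rather than from any real API.

```python
from dataclasses import dataclass


@dataclass
class ArrayState:
    """Hypothetical snapshot of the facts needed for the pre-check."""
    ndu_in_progress: bool        # a non-disruptive upgrade (NDU) is running
    raid_type: str               # RAID type protecting the drive, e.g. "0", "1/0", "5"
    other_component_faults: int  # faulted components other than disks
    faulted_disks: int           # number of disks indicating a problem


def replacement_blockers(state):
    """Return the reasons, if any, why the disk should not be pulled yet."""
    reasons = []
    if state.ndu_in_progress:
        reasons.append("An NDU is in progress")
    if state.raid_type == "0":
        reasons.append("The drive is protected by RAID type 0")
    if state.other_component_faults > 0:
        reasons.append("An array component in addition to a disk is indicating a fault")
    if state.faulted_disks > 1:
        reasons.append("More than one disk is indicating a problem")
    return reasons


state = ArrayState(ndu_in_progress=False, raid_type="5",
                   other_component_faults=0, faulted_disks=2)
print(replacement_blockers(state))
```

An empty list means none of the listed conditions applies; any entry is a reason to stop and investigate before removing the drive.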
Parts replacement - Drive
When is it not OK to remove or replace a disk module? (continued)
[Figure: disk enclosure with slots 0 to 14, LCC A and LCC B; the drives are grouped into RAID groups RG 1, RG 2, and RG 3, with parity and data drives marked. RAID types shown: RAID 1, RAID 1/0, RAID 3, RAID 5, RAID 6.]
• Another (previously replaced) disk is equalizing or rebuilding in the same RAID group
Remember that removing more than one drive from a RAID group will certainly cause loss of data availability and may cause loss of data.
Parts replacement - Drive
What is going on when the drive failed?
[Figure: RAID group with data drives D1, D2, D3, parity drive P123, and hot spare H; one drive is marked failed (X).]
1. A drive in the RAID group fails.
2. The hot spare is invoked.
3. The hot spare is rebuilt using XOR calculation.
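The XOR calculation can be shown on a toy stripe. This is a minimal sketch of XOR parity in general, not the array's actual implementation: the two-byte block contents and the helper name xor_blocks are made up, while D1, D2, D3, and P123 follow the labels in the diagram.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equally sized byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy two-byte blocks standing in for one stripe on each data drive.
d1 = b"\x0f\xa0"
d2 = b"\x33\x0c"
d3 = b"\x55\xff"

# Parity is the XOR of all data blocks in the stripe.
p123 = xor_blocks([d1, d2, d3])

# If the drive holding D3 fails, XOR of the survivors and the parity
# reproduces its contents, which is what gets written to the hot spare.
rebuilt_d3 = xor_blocks([d1, d2, p123])
print(rebuilt_d3 == d3)
```

Because XOR is its own inverse, the same helper computes the parity and recovers any single missing block. Two missing blocks in one stripe cannot be recovered this way, which is why a second drive must not be pulled during a rebuild.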
Parts replacement - Drive
What is going on when the drive failed?
The hot spare assumes the personality of the failed drive.
[Figure: D1, D2, failed D3 (X), P123, and the hot spare now acting as D3.]
Use the event logs to verify that the rebuild has finished. Look for:
   A 08/05/06 03:13:22 Bus2 Enc0 Dsk8 67d All rebuilds for a FRU have completed
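Searching an exported event log for that message can be automated. The sketch below assumes each entry follows the layout of the sample line above (SP letter, date, time, Bus/Enc/Dsk location, hexadecimal event code, message). The function name rebuild_completed and the first sample entry are hypothetical.

```python
import re

# Assumes entries look like:
# "A 08/05/06 03:13:22 Bus2 Enc0 Dsk8 67d All rebuilds for a FRU have completed"
EVENT_RE = re.compile(
    r"^(?P<sp>[AB])\s+(?P<date>\d{2}/\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"Bus(?P<bus>\d+)\s+Enc(?P<enc>\d+)\s+Dsk(?P<disk>\d+)\s+"
    r"(?P<code>[0-9a-fA-F]+)\s+(?P<message>.*)$"
)

REBUILD_DONE = "All rebuilds for a FRU have completed"


def rebuild_completed(log_text, bus, enc, disk):
    """Return True if the log reports a completed rebuild for the given disk."""
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if not match:
            continue
        location = (int(match["bus"]), int(match["enc"]), int(match["disk"]))
        if location == (bus, enc, disk) and REBUILD_DONE in match["message"]:
            return True
    return False


# The first entry is hypothetical; the second is the sample from the slide.
log = """\
A 08/05/06 01:02:03 Bus2 Enc0 Dsk8 123 Some other event for this disk
A 08/05/06 03:13:22 Bus2 Enc0 Dsk8 67d All rebuilds for a FRU have completed
"""

print(rebuild_completed(log, bus=2, enc=0, disk=8))
```

Matching on the Bus/Enc/Dsk location avoids mistaking a completed rebuild on another drive for the one being serviced.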
Parts replacement - Drive
What is going on when a bad drive is replaced?
When the bad drive is replaced, the hot spare starts equalizing to the new drive.
[Figure: D1, D2, new drive in the failed slot, P123, and the hot spare still acting as D3.]
Parts replacement - Drive
What is going on when a bad drive is replaced?
When the hot spare has finished equalizing to the new drive, the hot spare becomes available to other drives in the CLARiiON.
[Figure: D1, D2, D3, P123, and hot spare H available again.]
Parts replacement - Drive How to find out which disks belong to the raid group?
Parts replacement - Drive
What should I do when more than a single drive is faulted in the same RG?
• Identify which drive reported a failure.
• Run the SP collect script on both of the storage processors.
• Escalate to the call center.
Parts replacement - Drive Disk Replacement Wizard