Update for Citi on ControlCenter 6.1 Response to the 7 issues
Citi Top 7 Issues • Lack of adherence to the December 2009 Get Well Plan, specifically regarding Agent deployment and non-standard host names • ECC Console Bug Case, open for 4 months, that is causing an Audit Issue and Operational Problem. Citi is being told that the fix won’t be available for 6 months. “Whoever heard of a company that doesn’t immediately patch code problems?!” • AIX Native Packaging that requires a prerequisite shell script is not Native Packaging according to Citi. • ECC’s inability to report on Celerra Storage accurately. • Concern that ECC won’t be able to support SubLUN reporting on V-MAX in Q4. • ECC does not support SubLUN reporting on CX, which was promised to Citi in 2H10. • Last Discovery Time does not work and the data associated with it is not valid.
Timeline (Jun 2009 to Mar 2010; participants: Citi Ops, Eng, SA; EMC PS; EMC Eng) • Ionix CSE/Engineering assessment of issues • Agent Remediation Project: goal to improve reporting, increased from 60% to 83.8% • Oct 5th: Ionix CSE/Engineering Findings & Recommendations • Several multi-day group meetings were held to understand the findings & recommendations • EMC design planning phase for 7 ControlCenter instances: gathering existing site details, Citi growth estimates, planning for added functionality
Timeline (monthly axis, Jun 2009 to Jun 2010)
#1 – Get Well Plan • Ionix investigation started June 2009 • Primary Issue • Citi North America was only capturing Host Capacity data on about 60% of its business units’ Hosts. This affects Citi’s chargeback reporting. • Problems found • Thousands of Host Agents are not working • ControlCenter instability • Solutions Enabler instability
#1 – Get Well Plan - continued • Prioritized a recovery plan for Citi • Improved Capacity Reporting – Agent Remediation • Stabilize ControlCenter – Upgrade ControlCenter • Agent Remediation Project – July to Sept 2009 • Address inactive agents in ControlCenter • Seven ControlCenter Instances (Seven Data Centers) • Estimated 8,000 hosts • Determine cause of Hosts not reporting • Estimated 2,200 hosts • Created special tools to handle the inactive agents • Remediated ~1,100 Agents • Increased Host reporting to 83.8%
#1 – Get Well Plan – Moving Forward Plan • Findings and Recommendations presented to Citi • October 5, 2009 • Included Citi Use Cases and Gap Analysis • How Citi uses ControlCenter versus how ControlCenter was designed to be used • Cause analysis on why ControlCenter is unstable in Citi’s environment • Cause analysis on why there are so many Agent problems • Recommendations • Need changes to ControlCenter Infrastructure (hw & sw) • Need new Agent deployment and Agent upgrade procedures • Citi-requested improvements are in ControlCenter 6.1 - Upgrade to latest version is needed
#1 – Stabilizing ControlCenter • In-depth discussion on findings – October 2009 • Agent deployment issues • Host discoveries • Database discoveries • Server OS configurations • EMC best practices • Upgrade Planning – November to December 2009 • Gather growth information for each Datacenter • Analyze the existing installed ControlCenter instances (7) • Optimization and design changes needed • Add functionality and keep the ControlCenter footprint small
#1 – ControlCenter 6.1 Upgrade Readiness • ControlCenter Upgrade redesign, reviews and acceptance by Citi • January 2010 to February 17, 2010 • Summary of Citi North America environment • Seven Datacenters, seven different ControlCenter instances • 8,526 Hosts (1,188 Host increase) • 450 ESX, with 5,400 VMs (Net new) • 586 Databases (16 DB increase) • 30,927 SAN Ports (Net new) • 413 Storage Arrays (132 Array increase) • Add Performance data collection • Establish ControlCenter 6.1 design expectations per Datacenter • Establish a multi-phase ControlCenter upgrade plan • Establish a project plan
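The sizing figures on this slide can be turned into baseline and growth numbers with simple arithmetic. The sketch below is illustrative only: it assumes each "increase" is a delta over the previously recorded inventory (the slide does not state the baseline date), and the dictionary keys and function name are made up for this example.

```python
# Figures copied from the slide: (current count, stated increase).
# Assumption: "increase" means growth over the previously recorded inventory.
inventory = {
    "hosts": (8526, 1188),
    "databases": (586, 16),
    "storage_arrays": (413, 132),
}


def baseline_and_growth(current, increase):
    """Return (previous count, growth in percent rounded to one decimal)."""
    baseline = current - increase
    growth_pct = round(100 * increase / baseline, 1)
    return baseline, growth_pct


summary = {
    name: baseline_and_growth(current, increase)
    for name, (current, increase) in inventory.items()
}

# Average VM density across the net-new VMware environment.
vms_per_esx = 5400 / 450

for name, (baseline, growth_pct) in summary.items():
    print(f"{name}: baseline {baseline}, growth {growth_pct}%")
print(f"VMs per ESX host: {vms_per_esx}")
```

Under that assumption, storage arrays show by far the largest relative growth, which is consistent with the need for per-Datacenter design expectations before the upgrade.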
#1 – Citi Upgrade Readiness • February to March 2010 • Citi certification of ControlCenter 6.1 and readiness • Citi upgrade documentation • Step-by-Step re-write of EMC documentation • Dry-run testing of the upgrade • Resolving server issues identified in Findings & Recommendations • Phase 1 ControlCenter 6.1 upgrade • Upgrade ControlCenter & core agents • Add SAN, Celerra and VMware discovery
#1 – ControlCenter Upgrade (Phase 1) April 2010 to June 18, 2010
#1 – ControlCenter Agent Readiness • Citi Engineering Certification • March 2010 to June 2010 • Testing EMC Native Install Package • Development and testing of the Citi Wrapper deployment code • EMC held a readiness meeting June 29, 2010 • Citi testing of the deployment code was not adequate • Citi Lab testing did not include all platforms • The Citi lab and servers used were not configured the same as in production • Additional testing completed July 13 • Uncovered problems with the discovery of the Host FQDN • Agent upgrade project put on hold • Diagram: Citi Wrapper & EMC Packages (Citi Pre-Checks: Sizing, Dependencies; EMC UB NI Package; EMC SE Package)
#1 – Current Status Since July 2010 • Discussion with Citi to explore options • Four options proposed to Citi • All four options rejected • Ionix Engineering continued to investigate other possible solutions • EMC Ionix Engineering decided to enhance the Agent code to force a DNS FQDN lookup • Code enhancement made to the Agent SDK • ForceDNS option added to Agent Native Install package • Additional tools needed to clean up ControlCenter after Hosts are rediscovered with “new” FQDN • Work is ongoing to deliver this enhancement and tools with the release of UB9 and the UB9 Agent Native Install package
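To illustrate the idea behind forcing a DNS FQDN lookup and then cleaning up after rediscovery, here is a minimal Python sketch. It is not the Agent SDK code and does not show how the ForceDNS option is implemented; `resolve_fqdn` and `stale_short_names` are hypothetical helper names, and the only real API used is the standard library's `socket.getaddrinfo` with the `AI_CANONNAME` flag.

```python
import socket


def resolve_fqdn(host_name, resolver=socket.getaddrinfo):
    """Ask the resolver for the canonical name instead of trusting the
    locally configured host name. Returns None if no FQDN is found.

    `resolver` is injectable so the logic can be exercised without a network.
    """
    try:
        infos = resolver(host_name, None, flags=socket.AI_CANONNAME)
    except socket.gaierror:
        return None
    for _family, _type, _proto, canonname, _addr in infos:
        if "." in canonname:  # treat only dotted names as fully qualified
            return canonname.lower()
    return None


def stale_short_names(discovered_names):
    """Return short host names that also appear as an FQDN, i.e. likely
    leftovers once a host has been rediscovered under its "new" FQDN."""
    names = {n.lower() for n in discovered_names}
    fqdn_shorts = {n.split(".", 1)[0] for n in names if "." in n}
    return sorted(n for n in names if "." not in n and n in fqdn_shorts)
```

A cleanup tool along these lines would still need a human review step, since two different hosts can share a short name across DNS domains.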
#1 – Updated moving forward plan • Meeting held Sept 14, 2010 • Citi certification of UB9 • Fast tracked Citi certification by Citi Engineering • UB9 certified by 12/31/2010 • UB9 NI & SE certified by 1/24/2011 • ControlCenter 6.1 Infrastructure upgrade to UB9 • Start on 1/10/2011 (target date) • Upgrade should take day hours per Datacenter instance • Target is six weeks to upgrade all 8 sites • Agent & SE upgrade to start 2/28/2011 (target date) • Time to complete agent upgrade of 8k hosts - TBD
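The target dates above can be sanity-checked with a few lines of date arithmetic. This is only a sketch: it assumes the slide's dates are in US month/day/year format and that the six-week infrastructure window runs from the stated start date; the variable names are illustrative.

```python
from datetime import date, timedelta

# Target dates from the slide (assumed US month/day/year format).
infra_start = date(2011, 1, 10)   # ControlCenter 6.1 infrastructure upgrade to UB9
infra_window = timedelta(weeks=6)  # target duration to upgrade all sites
agent_start = date(2011, 2, 28)   # Agent & SE upgrade target start

infra_target_end = infra_start + infra_window
slack_days = (agent_start - infra_target_end).days

print(f"Infrastructure target end: {infra_target_end.isoformat()}")
print(f"Days between infrastructure end and agent start: {slack_days}")
```

Under these assumptions the plan leaves little room for slippage between the infrastructure upgrade and the start of the agent upgrade.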
#2 Console User Management • SR 35861334 opened 7/22/2010 • Opened as slow console performance • Investigation by CS took over 5 days • CQ 528868 opened by CS 7/29/2010 • Engineering marked this CQ as a duplicate of CQ 493000, which was already fixed in UB8 (Slow Console Performance)
#2 Console User Management - Continued • CS & Ionix CSE investigation continued • CQ 528868 was updated with more details • Problem was better defined as: • A spinning “+” box shows for a long time, then the User and Group list disappears from the console tree. Found NullPointerException in Console log • 8/20/2010 - Engineering gave restarting the console as a workaround • It was confirmed that this workaround was not successful at Citi • Engineering continued to work CQ 528868 • Fixing this NullPointerException problem uncovered two other problems 1. User --> Authorization --> new rule (Edit window never popped up). 2. Moving users to user groups appears not to work (no indication on the console that users appear under the user group). • All three were fixed 9/17/2010 (HF 4855) – Still in QE
#2 Console User Management - Conclusion • Validation that this Citi problem is completely resolved is still needed. • The complete fix Citi needs is in UB9 • HF4855 to UB8 solves • Console performance • NullPointerException • Users in multiple groups • Code changes made to UB9 • Copying users to groups • Adding/deleting administration rule • SR 35861334, opened 7/22/2010, was opened 30 days after UB8 went GA • Citi has no plans to implement UB8 • Given this is an audit issue for Citi, EMC is investigating a SQL script to provide this data as a workaround • Still investigating this workaround
#3 AIX Native Install package for 6.1 • EMC provided the UB7 Native Install Package • February 24, 2010 • Citi Engineering Certification • March 2010 to June 2010 • August 2010 – Eliot Wilson rejects the AIX Native Install Package, claiming • Citi requires a .bff file to support NIM (Network Install Manager) • This is an AIX-only requirement • The .bff format was supported in EMC’s CC 6.0 Native Install
#3 AIX Native Install package for 6.1 • ControlCenter 6.1 NI (Native Install) package enhancements • Automatically determines if it is an agent Install, Upgrade or Patch • Runs silently • Agents are handled as they would be from a Console push • UB5 NI was the first EMC GA release • Core requirements for this NI came from Citi • Citi has certified and accepted the UB7 NI • June 2010 • The .bff file deployment uses an agent cloning methodology • Agent is installed on a platform and then cloned into a .bff file • Replication of cloned agents causes problems with CC 6.1 secure communications features
#4 Celerra (NAS) Reporting • Added to ControlCenter after the upgrade to UB7 • Citi has stated a lack of NAS reporting ability in ControlCenter • EMC has asked Citi to document the gaps in NAS reporting so EMC can identify possible solutions to fill them. • Citi has not provided this gap analysis as of Sept 29, 2010 • Citi has stated the lack of full discovery of the Celerra storage subsystem is a problem for the ControlCenter users • Storage Subsystems are not exposed to the IP network • EMC has significant NAS improvements in SRM7
#5 SubLUN reporting on V-Max • ControlCenter 6.1 UB7 provides V-Max discovery & Reporting • Installed at Citi today • ControlCenter 6.1 UB8 provides V-Max FAST VP at the device level (FAST v1) • Citi skipped UB8 because they had no plans to support FAST v1 • ControlCenter 6.1 UB9 provides V-Max FAST VP at the block level (FAST v2) • Citi has always planned for UB9 – planned GA 11/24
#6 SubLUN for CX • ControlCenter 6.1 does not support CLARiiON VP • CLARiiON VP Threshold % Alert was added in UB8 • CLARiiON VP was on the roadmap for a future UB • Target was UB9, but has been deferred to Q2/2011 • Target only, not confirmed • CLARiiON VP is also planned for SRM7
#7 Symmetrix LDT bug • Citi found the LDT (Last Discovered Time) field was being updated for Symmetrix arrays that had lost Fibre Channel connectivity • The FC connectivity was removed from the Agent Server • SR # and CQ # were opened Sept 10, 2010 • Engineering is investigating the problem • The Symmetrix has many Data Collection Policies. Some collect data from the array and some collect data from the SE SYMAPI_DB • The LDT is being updated because Device Group information is being collected successfully, and this triggers an updated LDT
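The behavior described above can be modeled in a few lines. The sketch below is a simplified illustration, not ControlCenter code and not EMC's fix: the `PolicyResult` type, the policy names, and the `"array"` / `"symapi_db"` source labels are all hypothetical. It contrasts an LDT that advances on any successful policy with one that advances only when data actually came from the array.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PolicyResult:
    name: str             # hypothetical policy name
    source: str           # "array" or "symapi_db" (hypothetical labels)
    succeeded: bool
    finished_at: datetime


def ldt_any_success(results):
    """Reported behavior: any successful policy advances the LDT."""
    times = [r.finished_at for r in results if r.succeeded]
    return max(times) if times else None


def ldt_array_only(results):
    """One possible alternative: only array-sourced successes count."""
    times = [r.finished_at for r in results
             if r.succeeded and r.source == "array"]
    return max(times) if times else None


# Array unreachable over FC, but Device Group data is still read from the
# local SYMAPI database, so a policy still "succeeds".
results = [
    PolicyResult("array_config", "array", False, datetime(2010, 9, 10, 8, 0)),
    PolicyResult("device_groups", "symapi_db", True, datetime(2010, 9, 10, 8, 5)),
]
```

With the sample data, the first function still returns a fresh timestamp while the second returns `None`, which is the gap between what the field shows and what Citi expected it to mean.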