
EMC Smarts ControlCenter and SIA Integration #246

EMC Smarts ControlCenter and SIA Integration #246. Lynda Em April 4, 2007. Agenda. SMARTS Technology What is SIA? SIA Architecture Cross-communication between CC and SIA Installation process Troubleshooting techniques Possible demo???. EMC Smarts Technology. Value Proposition.


Presentation Transcript


  1. EMC Smarts ControlCenter and SIA Integration #246 Lynda Em April 4, 2007

  2. Agenda • SMARTS Technology • What is SIA? • SIA Architecture • Cross-communication between CC and SIA • Installation process • Troubleshooting techniques • Possible demo???

  3. EMC Smarts Technology Value Proposition • Automated, actionable intelligence • Pinpoint service-affecting problems in real time • Quantify impact to prioritize action • Update automatically to adapt to infrastructure changes • Cross-domain correlation • Correlate information, applications, infrastructure, and business services across management silos • Business-centric • Understand exactly how IT problems affect services and customers

  4. What is SIA? Overview SIA terms

  5. Storage Insight for Availability 1.0.0.1 • Release Date 8/21/06. • Smarts Storage Insight for Availability is the first offering in the new Smarts Storage Insight family. • Storage Insight for Availability automates root cause and impact analysis of availability problems across the EMC tiered storage infrastructure, resulting in a dramatic reduction in downtime and mean time to repair for existing ControlCenter V5.2 customers. • Based on patented Smarts technology, only Storage Insight for Availability automates fault management for the Fibre Channel SAN infrastructure.

  6. What Will SI for Availability Do for You? • Automated problem diagnosis – symptoms vs. root causes • Root cause problems identified: • Symmetrix units, front-end directors, port links, devices • CLARiiON units, disks, Storage Processors and port links • Fibre Channel switch units and port links • Host Bus Adapter cards and port links • Celerra Data Movers • Impact Analysis • Impacted elements along the data path: • Host systems, host file systems, PowerPath devices, host physical devices, logical volumes • Celerra Data Movers and client shares • Cross-domain root cause and impact analysis for Celerra • Celerra gateway models • CLI/XHMP polling for client mapping to Celerra Data Movers • Cross-domain analysis in conjunction with IP AM

  7. Storage Insight for Availability Deployment Architecture • EMC ControlCenter 5.2 • Monitors SAN Infrastructure • Storage Insight for Availability • Automated SAN fault management • Service Assurance Manager • Integration point for Smarts products • Global Console • Focal point for monitoring and analysis • Business Impact Manager • Identify business impact of problems • IP Availability Manager • Correlation between FC SAN and IP network, through Celerra Gateway • [Diagram labels: SMARTS Global Console, Business Impact Manager, SMARTS Service Assurance Manager, IP Availability Manager, Storage Insight for Availability, EMC ControlCenter 5.2 SP5, IP Network Infrastructure, SAN Infrastructure]

  8. EMC Smarts Technology: Automating Service Management, Start to Finish • 1 ICIM Library • 2 Discovery • 3 ICIM Repository • 4 Codebook Correlation • 5 Polling/Pinging • 6 Root Cause • Business Impact • [Diagram groupings: Context, Collection, Analysis]
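Codebook correlation (item 4 on this slide) can be illustrated with a minimal sketch. It assumes each candidate problem has a known signature of symptoms and that the observed symptoms are matched to the closest signature. The problem names, symptom names, and the distance measure are all hypothetical and are not taken from the Smarts codebook.

```python
# Hypothetical codebook: each root-cause problem maps to the set of
# symptoms it is expected to produce. All names are illustrative only.
CODEBOOK = {
    "SwitchPortDown": {"PortLinkDown", "DataPathDown"},
    "HBADown": {"PortLinkDown", "DataPathDown", "HostDeviceUnreachable"},
    "ArrayDiskFailed": {"VolumeDegraded"},
}


def distance(signature, observed):
    """Count the symptoms that differ between a signature and the observation."""
    return len(signature ^ observed)


def correlate(observed):
    """Return the problem whose signature is closest to the observed symptoms."""
    return min(CODEBOOK, key=lambda problem: distance(CODEBOOK[problem], observed))
```

Matching on the closest signature rather than an exact one lets the sketch tolerate a lost or spurious symptom, which is the usual motivation for this style of correlation.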

  9. Storage Insight Terms • PortLink • A physical connection between two ports: • A port on the FC switch and a port on the HostSystem, OR • A port on the FC switch and a port on the StorageSystem, OR • Two ports on peer FC switches • SCSITargetInitiatorPath • A physical connection in a SAN • Between a port on the HostSystem and a port on the StorageSystem • SCSITargetInitiatorPaths are layered over PortLinks • DataPath • A logical connection in a SAN • Between a HostPhysicalDevice on the HostSystem and an ArrayStorageVolume on the StorageSystem

  10. Storage Insight Terms • DataPathRedundancyGroup • Composed of two or more DataPaths for redundancy • Supports a logical connection between a HostFileSystem on the HostSystem and an ArrayStorageVolume on the StorageSystem. • Powerpath_Datapath • Layered over a SCSITargetInitiatorPath element • Is associated with one and only one PowerPathDevice on the HostSystem • Is always part of a PowerPathRedundancyGroup • PowerPathRedundancyGroup • Composed of two or more Powerpath_DataPaths • Supports a logical connection between a HostFileSystem on the HostSystem and an ArrayStorageVolume on the StorageSystem • Is used to model I/O paths managed by PowerPath
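The layering of these terms can be sketched as a small object model. This is an illustration only: the class names follow the slides, but the field names, the link from a DataPath to a SCSITargetInitiatorPath, and the `impacted_by` / `is_down` helpers are assumptions made for the example, not the ICIM model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortLink:
    """Physical connection between two ports (switch-host, switch-array, or switch-switch)."""
    end_a: str
    end_b: str


@dataclass
class SCSITargetInitiatorPath:
    """Host port to storage port connection, layered over PortLinks."""
    host_port: str
    storage_port: str
    port_links: list


@dataclass
class DataPath:
    """Logical connection from a HostPhysicalDevice to an ArrayStorageVolume."""
    host_device: str
    array_volume: str
    path: SCSITargetInitiatorPath  # assumed layering, for illustration


@dataclass
class DataPathRedundancyGroup:
    """Two or more DataPaths between a HostFileSystem and an ArrayStorageVolume."""
    file_system: str
    data_paths: list

    def impacted_by(self, failed_link):
        """Return the DataPaths in this group that traverse the failed PortLink."""
        return [dp for dp in self.data_paths if failed_link in dp.path.port_links]

    def is_down(self, failed_link):
        """Assume the group loses connectivity only when every DataPath is impacted."""
        return len(self.impacted_by(failed_link)) == len(self.data_paths)
```

With two DataPaths over disjoint PortLinks, a single failed link impacts one DataPath while the redundancy group stays up, which is the distinction between a root cause and its impact along the data path.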

  11. Global Console Views • Notification Log • Can be filtered to create custom logs • Summary View • Map Views • Physical Maps, SAN Maps, NAS Maps, IP Maps • Topology View • Browse the detail of specific devices and relationships

  12. SIA Architecture Overview ControlCenter communication

  13. ControlCenter Mediation Layer • ControlCenter Agents used • Storage Agent for Symmetrix • Storage Agent for CLARiiON • Storage Agent for NAS • Fibre Channel Connectivity Agent • Host Agents for Windows, Solaris, AIX, HP-UX and Linux

  14. Alerts and DCP Schedules • SIA listens for a small subset of ControlCenter alerts • It is imperative that alerting is working properly • Some SIA-subscribed alerts are on by default and some must be turned on • PowerPath alerts must be created from the Alert Template folder • Most alerts are agent-controlled • CLARiiON alert schedules can be user-controlled • PowerPath alerts must be changed on each host • Switch alerts are DCP-controlled • The FCC Agent DCP is 1 hour by default • Set SNMP traps from the switch to the FCC agent for immediate events

  15. Architectural Overview • Two Domain Managers • SIA Topology Server (STS) • SIA Analysis Server (SAS) • Both use ECC API NG 2.0 (GA3) – specifically Build Identifier “09JUN2006.1345.242” • Both establish JDBC connections to the ECC repository • Probe support for these • Symmetrix • CLARiiON • Celerra • Host • Switch/Fabric • AM + NAS needed only for Celerra RCA & cross domain • Additional Celerra probes talk directly to Celerra • NFS clients • CIFS clients

  16. SIA Probes • Hybrid approach using ECC API NG 2.0 as well as direct DB queries • ControlCenter probes use ECC API NG 2.0 to get all Symmetrix, CLARiiON, Celerra, Host, and Switch instances and subscribe for alerts • Individual Probes launched for each instance • (STS) Access DB for detailed topology • (SAS) Process alerts • Additional Celerra probes launched to get NFS and CIFS clients of each Celerra • Use XHMP (CIFS) or SSH (NFS) to get information

  17. New 1.0.0.1 Architecture (User’s Guide diagram)

  18. SIA component dependencies • SAM 6.5.1 (RP 38) server needed for monitoring and maps • IP-AM with NAS extensions needed only for Celerra • ControlCenter 5.2 SP4 + SIA-specific ECC hotfix 3655 • ControlCenter API NG 2.0 Server – GA3 • ControlCenter Host Agents • 1 SIA server set per ControlCenter Server • Dedicated machine for Smarts servers • OS Support • Windows 2000 Advanced Server 2004 • Windows 2003 Enterprise Edition SP1 • Windows Server 2003 R2 Enterprise Edition

  19. How SIA works Auto-discovery Root-cause analysis

  20. Single server

  21. Split server

  22. Server Interaction & Sequencing • Topology & Analysis servers are independent • Operations need to be coordinated • Analysis Server needs to know when to import new topology • Alert processing must be suspended during import • Analysis server needs to connect to Topology Server • etc • Sequence defined for following scenarios: • Server cold-start • Discovery • Rediscovery • Server restart
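The rule that alert processing must be suspended during a topology import can be sketched as a toy model. It assumes alerts arriving during the import are queued and handled in arrival order once the import ends. The class and method names are hypothetical and do not reflect the SIA Analysis Server's actual interface.

```python
from collections import deque


class AnalysisServer:
    """Toy model: alerts are queued while a topology import is running."""

    def __init__(self):
        self.importing = False
        self.pending = deque()   # alerts received during an import
        self.processed = []      # alerts handled so far, in order

    def begin_import(self):
        """Suspend alert processing for the duration of the import."""
        self.importing = True

    def end_import(self):
        """Resume processing and drain queued alerts, oldest first."""
        self.importing = False
        while self.pending:
            self.processed.append(self.pending.popleft())

    def on_alert(self, alert):
        """Process the alert now, or queue it if an import is in progress."""
        if self.importing:
            self.pending.append(alert)
        else:
            self.processed.append(alert)
```

Queuing rather than dropping keeps alerts from being evaluated against a half-imported topology without losing them.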

  23. Startup Sequence

  24. Discovery Sequence

  25. Rediscovery Sequence

  26. Restart Sequence

  27. Storage Insight for Availability Installation Process

  28. Installation • ControlCenter 5.2 SP 4 with SIA specific Hot Fix 3655 • ControlCenter API 2.0 Server • Smarts Broker 6.5.1 • Smarts SAM 6.5.1 • Smarts IP AM + NAS 6.5.1 (optional) • Uninstall any previous SIA product • Smarts SIA 1.0.0.1

  29. Post Install Steps • Start ECC & API NG 2.0 Server • Start Smarts Broker • Start Smarts SAM • Start IP-AM + NAS (if necessary) • Start SIA services (or servers) • [See Install & Config Guide for command-line options, if desired] • Launch Global Console (sm_gui) and attach to SIA-Topology • Launch Domain Manager Admin Console • Launch Polling and Thresholds • Configure ECC related credentials and Celerra related credentials (if necessary) • Back in DMAC, add Source for ControlCenter and IP-AM (if necessary)

  30. Start Services

  31. ECC credentials

  32. Celerra credentials

  33. To Start a Discovery - Add Source

  34. Storage Insight for Availability P & S Guidelines Support issues

  35. Scalability issues in 1.0 • Original single server ran out of memory for large topologies • Solution was to split into 2 servers • Topology server (STS) • Analysis server (SAS) • Servers run on same host • Server processes can be changed to access 3GB RAM on Windows for large environments • This change has to be made to the Windows OS before SIA installation • Set the /3GB and /PAE switches in the Boot.ini file on the system

  36. Hardware specifications (from P&S Guide) • 3 GB extension may be needed. • Minimum 2 CPUs & 4GB RAM needed

  37. Discovery timings • Initial discovery of the ControlCenter Repository can take a long time, depending on the size of the topology. • Alert processing is suspended during discovery • Queued alerts are processed later • Rediscoveries are quicker • P&S Guidelines have processing downtime examples • Need to balance how often re-discoveries are done

  38. SI-A – Cisco support • Issue found in 1.0 • SI-A currently does not support Cisco VSANs properly • The switches are discovered, but there are inaccuracies in the topology and root cause analysis • Resolution for 1.0.0.1 • Cisco switches will not be imported into the SIA topology in 1.0.0.1 • Rolling Patch will be provided for full Cisco switch support • Expected approximately 10 weeks after 1.0.0.1 GA (late October) • What do I do if customer has Cisco switches? • Recognize sales and implementation cycles relative to 1.0.0.1 patch when targeting customers with Cisco switches

  39. Alert processing downtime

  40. SIA Troubleshooting

  41. Troubleshooting SIA Log Files • SIA Servers have separate logs: • in Incharge6/SI/smarts/local/logs/<server-name>.log.<ver> • e.g.: SIA-TOPOLOGY.log (default) • e.g.: SIA-ANALYSIS.log (default) • Probe framework creates additional logs: • also in local/logs in the format <server-name>-<ClassName>_0.log* • each one has an associated .lck as well • e.g.: SIA-TOPOLOGY-ClariionProbe_0.log • e.g.: SIA-ANALYSIS-EccAlertDispatcherMgr_0.log • ECC client API log files • in smarts/ECC/client/log/server/ *Note: on shutdown, logs stay open while the rps is saved. If the server is restarted early, new probe logs appear with “.1” appended to the log filename.
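When sorting through local/logs, the probe log naming format above can be split into its parts with a short helper. This is a minimal sketch: the regular expression and the `parse_probe_log` function are illustrative assumptions, and the expression also assumes class names contain no hyphen or underscore.

```python
import re

# <server-name>-<ClassName>_<n>.log, optionally followed by a suffix
# such as ".1" (restart) or ".lck" (lock file).
PROBE_LOG = re.compile(
    r"^(?P<server>.+)-(?P<cls>[^-_]+)_(?P<index>\d+)\.log(?P<suffix>.*)$"
)


def parse_probe_log(filename):
    """Return (server, class name, suffix) for a probe log, else None."""
    match = PROBE_LOG.match(filename)
    if match is None:
        return None
    return match.group("server"), match.group("cls"), match.group("suffix")
```

Plain server logs such as SIA-TOPOLOGY.log do not match the pattern, so the helper returns None for them and they can be handled separately.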

  42. Other log files • SAM Log files • in SAM/local/logs/<server-name>.log.<ver> • \InCharge6\SAM\smarts\local\logs> • InCharge6\SI\smarts\local\logs> • InCharge6\IP\smarts\local\logs> • Repository files • *.rps files for SIA, SAM and IP AM • \InCharge6\SAM\smarts\local\repos> • \InCharge6\SI\smarts\local\repos> • \InCharge6\IP\smarts\local\repos> • ECCAPI NG log files • ECC\ECCAPING\server\log\server>

  43. SIA <-> ECC Connection Problems • SIA server connections can fail in these ways • ECC API connection is lost (ECCConnectionDown) • ControlCenter Connection Failed notification • DB connection is lost (DBConnectionDown) • Database Connection Failed notification • SIA Analysis Server detects these for itself and the SIA Topology Server • Two EMSAgent instances are monitored • ControlCenter // represents SIA Analysis Server • ControlCenter-Topology // represents SIA Topology Server • Notify an event on the appropriate EMSAgent instance • Events will clear when connection is re-established • Refer to the User Guide on corrective actions
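The notify-and-clear behavior described on this slide can be sketched as a small state tracker. The EMSAgent instance names and event names come from the slide. The server keys (taken from the default log names), the `ConnectionMonitor` class, and its `report` method are hypothetical.

```python
# EMSAgent instance that represents each SIA server (names from the slide).
AGENT_FOR_SERVER = {
    "SIA-ANALYSIS": "ControlCenter",
    "SIA-TOPOLOGY": "ControlCenter-Topology",
}

# Event raised for each kind of lost connection (names from the slide).
EVENT_FOR_CONNECTION = {
    "api": "ECCConnectionDown",
    "db": "DBConnectionDown",
}


class ConnectionMonitor:
    """Tracks which connection events are currently active."""

    def __init__(self):
        self.active = set()

    def report(self, server, connection, connected):
        """Raise the event when a connection is lost; clear it when re-established."""
        key = (AGENT_FOR_SERVER[server], EVENT_FOR_CONNECTION[connection])
        if connected:
            self.active.discard(key)
        else:
            self.active.add(key)
        return key
```

Keying each event by EMSAgent instance makes it visible which of the two servers lost which connection.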

  44. Discovery Errors • Discovery errors in SIA should be rare, as most of the topology is pulled from ControlCenter • Objects may be deleted between the time the main probe in SIA gets the list of instances and the time an individual probe goes to ECC to get the instance data • Will get a discovery error with the database ID as the instance name • Celerra clients may not be discovered if probe parameters are wrong or the box is unreachable from SIA • AM may have discovery errors if it cannot reach a control station or data mover of a Celerra • Fabric may disappear from SI after initial discovery following a fabric split. Will reappear after subsequent discovery.

  45. Naming Issues • SIA naming is designed to be consistent with AM with respect to NAS entities and Hosts, but may not be consistent for other devices exposed to AM • This will result in multiple representations for the same instance in SAM, e.g. FiberChannelSwitch and Switch

  46. Timing issues • Celerra • AM may detect a Data Mover as unreachable before a failover happens • Will get notification of Data Mover down in SIA which will clear when AM detects the standby • Switch Alerts • Switch alerts require SIA to access the database for switch or port status • Database may not yet reflect status that caused alert • SIA will re-access database after 10 minutes (configurable) • PowerPath Alerts • PowerPath alerts arrive based on a 30-minute DCP • May come after root cause has already been identified
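The switch-alert timing issue can be sketched as a read followed by one delayed re-read. The 10-minute default comes from the slide. The function name, its parameters, and the rule of re-reading only when the status looks unchanged are assumptions for illustration, not SIA's actual logic.

```python
import time

RECHECK_DELAY_SECONDS = 10 * 60  # slide: 10 minutes, configurable


def status_for_alert(read_status, previous_status,
                     delay=RECHECK_DELAY_SECONDS, sleep=time.sleep):
    """Read switch or port status for an alert, re-reading once if it looks stale.

    read_status is a caller-supplied function that queries the database
    (hypothetical). sleep is injectable so the delay can be skipped in tests.
    """
    status = read_status()
    if status == previous_status:
        # The database may not yet reflect the change that raised the alert.
        sleep(delay)
        status = read_status()
    return status
```

Passing `sleep` as a parameter keeps the sketch testable without actually waiting ten minutes.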
