
IntelliMagic Vision for SAN DFW Lunch-n-Learn Series

IntelliMagic Vision for SAN DFW Lunch-n-Learn Series. Brett Allison, Director of Technical Services Brian Howard, VP of Sales. Agenda. Improving the Storage Infrastructure: Problem Process Proactive Process Planning Process. Improving the Problem Process.


Presentation Transcript


  1. IntelliMagic Vision for SAN DFW Lunch-n-Learn Series Brett Allison, Director of Technical Services Brian Howard, VP of Sales

  2. Agenda • Improving the Storage Infrastructure: • Problem Process • Proactive Process • Planning Process

  3. Improving the Problem Process

  4. Recent Customer Example - Background • Frequent performance problems • IT had been outsourced • Client owns the infrastructure • An outsourcer manages the infrastructure • Most of the resources reside off-shore; the off-shore team lacks deep performance skills and the on-shore resources are over-subscribed • Lack of visibility with current tools, resulting in weeks of fire-fighting • Client / outsourcer relationship had a certain amount of friction

  5. Customer Problem - Summary • The specific problem instance involves a server named topaxdb2p051 • Summary of Findings: • IBM Spectrum Virtualize Global Mirror is used for Remote Replication and Data Recovery • Restoration of the database associated with topaxdb2p051 led to significant write increases and write performance degradation for topaxdb2p051 and likely many other systems within the shared ecosystem

  6. IntelliMagic Visibility Approach • Install a lightweight collector to gather the data needed to monitor replication, the fabric, and key storage systems • Send data to the IntelliMagic SaaS environment • Analyze the data and provide context and technology-centered results using White Box Artificial Intelligence

  7. Compatible Machine Assistance Approaches. Black-box Analysis (typical for most statistical approaches; reactive): • Platform-agnostic • Quick, relative correlations only • Focused on problem symptom metrics, not truly predictive • Has the workload changed? White-box Analysis, aka Availability Intelligence (pro-active): • Platform-specific interpretation • Algorithms access expert knowledge • Focused on root causes for predictive and prescriptive insights • Can each subcomponent handle work?

  8. Front-End Write Response [rating: 0.28] For Serial 'TOP_SVC_03_GM' by Storage Pool Rating based on DSS Storage Pool data using DSS Thresholds Primary Site Write Latency to all storage pools with replicated volumes increased significantly between 9:15 AM and 5:00 PM on 11/27/2018

  9. Response Time for Replication Writes [rating: 2.92] For Serial 'TOP_SVC_03_GM' Rating based on DSS Storage Pool data using DSS Thresholds Secondary Site Write Latency to all storage pools with replicated volumes increased significantly between 9:15 AM and 5:00 PM on 11/27/2018

  10. Replication Writes for Spectrum Virtualize For Serial 'TOP_SVC_03_GM' The number of Replicated Write tracks (64KB) increased significantly during the problem period.

  11. Replication Send [rating: 0.00] For Serial 'TOP_SVC_03_GM' Rating based on DSS Links data using DSS Thresholds The send MB/sec increased from an average of around 180 MB/sec to 360 MB/sec during this peak period.
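
As a rough cross-check between slides 10 and 11, the replicated write track rate can be converted to throughput and back. This is only a sketch: the 64 KB track size comes from slide 10, while the 1 MB = 1024 KB convention and the function names are assumptions, not something the presentation specifies.

```python
TRACK_SIZE_KB = 64  # track size stated on slide 10


def tracks_per_sec_to_mb_per_sec(tracks_per_sec: float, kb_per_mb: int = 1024) -> float:
    """Convert a replicated write track rate to MB/sec (assumes 1 MB = 1024 KB)."""
    return tracks_per_sec * TRACK_SIZE_KB / kb_per_mb


def mb_per_sec_to_tracks_per_sec(mb_per_sec: float, kb_per_mb: int = 1024) -> float:
    """Convert MB/sec back to a 64 KB track rate (assumes 1 MB = 1024 KB)."""
    return mb_per_sec * kb_per_mb / TRACK_SIZE_KB


# Send rates quoted on slide 11, expressed as approximate track rates.
baseline_tracks = mb_per_sec_to_tracks_per_sec(180)  # 2880.0 tracks/sec
peak_tracks = mb_per_sec_to_tracks_per_sec(360)      # 5760.0 tracks/sec
growth_factor = peak_tracks / baseline_tracks        # 2.0, i.e. the send rate doubled
```

Under these assumptions, the doubling of send MB/sec corresponds to roughly 2,900 additional 64 KB tracks per second crossing the replication link.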

  12. Top 10 Replication Writes Tracks For Serial 'TOP_SVC_03_GM' by Volume Label The majority of the increase was related to write tracks to SO_purescalecluster_appp_00* LUNs

  13. Top 10 Replication Write Response Time For Serial 'TOP_SVC_03_GM' by Volume Label The secondary write latency increased from < 100 ms to > 200 ms for SO_purescalecluster_appp_00* LUNs

  14. Port to Remote Node Response Time [rating: 0.17] by Serial Rating based on Host Adapters data using DSS Thresholds The increase in average latency from the primary site (TOP_SVC_03_GM) to the secondary site is easy to spot in this chart of the last 7 days. This is a good way to monitor whether the condition is occurring.

  15. Zero Buffer to Buffer Credits [rating: 2.86] For Switch WWN 'topfddcxp003' Rating based on Switch Ports data using Switch and Port Thresholds When Global Mirror is forcing synchronous writes over a congested, high-latency link, the writes on the primary site consume valuable buffer credits. This can impact users that have no replication at all, because buffer credits are a limited, shared resource on the switches. This chart shows the resulting increase in buffer credit shortages.

  16. Summary of Findings • Host write response time is very poor during peak periods • Restores are happening to systems with replicated volumes, resulting in unnecessary replication traffic • All symptoms point to a bandwidth-constrained replication environment

  17. IntelliMagic Recommendations • Two ways to resolve this issue within current technology: • Add bandwidth • Add storage capacity at the primary site and configure Global Mirror with Change Volumes • Best Practice: • Coordinate large restorations to ensure volumes are not replicated during restoration processes

  18. IntelliMagic Recommendations - Continued • Implement processes and tools with the goal of improving the predictability of the performance of the environment: • Implement IntelliMagic Vision to provide deep visibility into all facets of the SAN infrastructure • Provide training for staff and customized dashboards to provide quick root cause analysis and understanding of what to do • Provide alerting when bandwidth requirements exceed available bandwidth • Provide investigation processes to quickly identify hosts/applications causing issues and remediation steps. • Provide ongoing performance analysis services
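
The alerting recommendation on slide 18 (alert when bandwidth requirements exceed available bandwidth) could be prototyped along the following lines. This is a minimal sketch, not IntelliMagic Vision's actual alerting logic: the function name, the 80% utilization threshold, the three-sample persistence rule, and the 400 MB/sec link capacity are all hypothetical values chosen for illustration.

```python
from typing import Iterable, List, Tuple


def find_bandwidth_alerts(
    samples: Iterable[Tuple[str, float]],
    capacity_mb_per_sec: float,
    threshold: float = 0.8,    # hypothetical: alert at 80% of link capacity
    min_consecutive: int = 3,  # hypothetical: require 3 samples in a row
) -> List[Tuple[str, str]]:
    """Return (first_label, last_label) for each sustained high-utilization run."""
    limit = threshold * capacity_mb_per_sec
    alerts: List[Tuple[str, str]] = []
    run: List[str] = []
    for label, send_mb_per_sec in samples:
        if send_mb_per_sec >= limit:
            run.append(label)
        else:
            # The run ended; keep it only if it lasted long enough.
            if len(run) >= min_consecutive:
                alerts.append((run[0], run[-1]))
            run = []
    # Handle a run that is still open when the samples end.
    if len(run) >= min_consecutive:
        alerts.append((run[0], run[-1]))
    return alerts


# Illustrative samples only; the 180 and 360 MB/sec levels echo slide 11.
samples = [("09:00", 180.0), ("09:15", 360.0), ("09:30", 355.0),
           ("09:45", 350.0), ("10:00", 190.0)]
alerts = find_bandwidth_alerts(samples, capacity_mb_per_sec=400.0)
# alerts == [("09:15", "09:45")]
```

Requiring several consecutive samples above the limit is one way to avoid alerting on short bursts; the right persistence window depends on the collection interval in use.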

  19. Improving the Proactive Process

  20. Proactive Best Practice #1 Daily Review of Vendor Specific Storage Array Key Performance Indicators

  21. Proactive Best Practice #2 Daily Review of SAN Fabric Health

  22. Proactive Best Practice #3 Daily Review of Host I/O Workload

  23. Proactive Best Practice #4 Daily Review of Key Capacity Indicators

  24. Proactive Best Practice #5 Configuration: Audit SAN Zoning Health

  25. Improving the Planning Process

  26. Plan Your Capacity Growth Quarterly: Plan for storage capacity

  27. Capacity Forecast Over Time Quarterly: Plan for storage capacity
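
A quarterly capacity forecast like the one named on slide 27 can be approximated with a straight-line trend. The sketch below assumes linear growth and uses invented sample figures; the presentation does not say which forecasting method the product uses, and both function names are hypothetical.

```python
from typing import List, Optional, Tuple


def linear_fit(values: List[float]) -> Tuple[float, float]:
    """Least-squares line through (0, values[0]), (1, values[1]), ...

    Returns (slope, intercept). Requires at least two data points.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quarters_until_full(used_tb: List[float], capacity_tb: float) -> Optional[float]:
    """Quarters after the latest sample until the trend line reaches capacity.

    Returns None when usage is flat or shrinking, since no date can be projected.
    """
    slope, intercept = linear_fit(used_tb)
    if slope <= 0:
        return None
    quarter_full = (capacity_tb - intercept) / slope
    return max(0.0, quarter_full - (len(used_tb) - 1))


# Invented example: used capacity in TB for four consecutive quarters.
history = [100.0, 110.0, 120.0, 130.0]
remaining = quarters_until_full(history, capacity_tb=160.0)  # 3.0 quarters
```

A straight line is the simplest possible model; environments with seasonal or step-change growth would need something richer.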

  28. IntelliMagic Storage Infrastructure Visibility Improves Your: Problem Process: Proactive Process: Planning Process:

  29. Thank You for Coming! • We will now do the drawing • We’re local, ready to help! • Amy Quick will call you to see about setting up discovery meetings www.intellimagic.com
