Join the online session to learn about improving storage infrastructure processes, with a real customer example and the IntelliMagic Visibility approach. Get actionable recommendations for better performance predictability.
IntelliMagic Vision for SAN DFW Lunch-n-Learn Series Brett Allison, Director of Technical Services Brian Howard, VP of Sales
Agenda • Improving the Storage Infrastructure: • Problem Process • Proactive Process • Planning Process
Recent Customer Example - Background • Frequent performance problems • IT had been outsourced • Client owns the infrastructure • An outsourcer manages the infrastructure • Most of the resources reside off-shore; off-shore staff lack deep performance skills, and on-shore resources are over-subscribed • Lack of visibility with current tools, resulting in weeks of fire-fighting • Client / outsourcer relationship had a certain amount of friction
Customer Problem - Summary • Specific instance of a problem is with a server named topaxdb2p051 • Summary of Findings: • IBM Spectrum Virtualize Global Mirror used for Remote Replication and Data Recovery • Restoration of database associated with topaxdb2p051 led to significant write increases and write performance degradation for topaxdb2p051 and likely many other systems within the shared ecosystem
IntelliMagic Visibility Approach • Install a lightweight collector to gather the data needed to monitor replication, the fabric, and key storage systems • Send data to the IntelliMagic SaaS environment • Analyze the data and provide context and technology-centered results using White Box Artificial Intelligence
Compatible Machine Assistance Approaches
Black-box Analysis (typical for most statistical approaches): Reactive • Platform-agnostic • Quick, relative correlations only • Focused on problem symptom metrics, not truly predictive • Has the workload changed?
White-box Analysis, aka Availability Intelligence: Pro-active • Platform-specific interpretation • Algorithms access expert knowledge • Focused on root causes for predictive and prescriptive insights • Can each subcomponent handle work?
Front-End Write Response [rating: 0.28] For Serial 'TOP_SVC_03_GM' by Storage Pool Rating based on DSS Storage Pool data using DSS Thresholds Primary Site Write Latency to all storage pools with replicated volumes increased significantly between 9:15 AM and 5:00 PM on 11/27/2018
Response Time for Replication Writes [rating: 2.92] For Serial 'TOP_SVC_03_GM' Rating based on DSS Storage Pool data using DSS Thresholds Secondary Site Write Latency to all storage pools with replicated volumes increased significantly between 9:15 AM and 5:00 PM on 11/27/2018
Replication Writes for Spectrum Virtualize For Serial 'TOP_SVC_03_GM' The number of Replicated Write tracks (64KB) increased significantly during the problem period.
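A replicated write track rate can be turned into an approximate throughput figure for comparison with link bandwidth. The sketch below assumes the 64KB track size stated on the slide and reports MiB/sec (1 MiB = 1024 KiB); the function name and the sample track rate are hypothetical and are not taken from the customer data.

```python
TRACK_SIZE_KB = 64  # track size stated on the slide


def tracks_to_mib_per_sec(tracks_per_sec: float) -> float:
    """Convert a replicated-write track rate to MiB/sec."""
    return tracks_per_sec * TRACK_SIZE_KB / 1024


# Hypothetical track rate, chosen only for illustration.
print(tracks_to_mib_per_sec(2880))  # 180.0
```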
Replication Send [rating: 0.00] For Serial 'TOP_SVC_03_GM' Rating based on DSS Links data using DSS Thresholds The send MB/sec increased from an average of around 180 MB/sec to 360 MB/sec during this peak period.
Top 10 Replication Write Tracks For Serial 'TOP_SVC_03_GM' by Volume Label The majority of the increase was related to write tracks to SO_purescalecluster_appp_00* LUNs
Top 10 Replication Write Response Time For Serial 'TOP_SVC_03_GM' by Volume Label The secondary write latency increased from < 100 ms to > 200 ms for SO_purescalecluster_appp_00* LUNs
Port to Remote Node Response Time [rating: 0.17] by Serial Rating based on Host Adapters data using DSS Thresholds The increase in average latency from primary site TOP_SVC_03_GM to the secondary site is easy to spot in this chart over the last 7 days. This is a good way to monitor whether the condition is occurring.
Zero Buffer to Buffer Credits [rating: 2.86] For Switch WWN 'topfddcxp003' Rating based on Switch Ports data using Switch and Port Thresholds When Global Mirror is forcing synchronous writes over a congested link with high latency, the writes on the primary site consume valuable buffer credits. This can impact users without any replication, as buffer credits are a limited, shared resource on the switches. This chart shows the increase in buffer credit shortages.
Summary of Findings • Host write response time is very poor during peak periods • Restores are happening to systems with replicated volumes, resulting in unnecessary traffic • All symptoms point to a bandwidth-constrained replication environment
IntelliMagic Recommendations • Two ways to resolve this issue within current technology: • Add bandwidth • Add storage capacity at the primary site and configure Global Mirror with Change Volumes • Best Practice: • Coordinate large restorations to ensure volumes are not replicated during restoration processes
IntelliMagic Recommendations - Continued • Implement processes and tools with the goal of improving the predictability of the performance of the environment: • Implement IntelliMagic Vision to provide deep visibility into all facets of the SAN infrastructure • Provide training for staff and customized dashboards to provide quick root cause analysis and understanding of what to do • Provide alerting when bandwidth requirements exceed available bandwidth • Provide investigation processes to quickly identify hosts/applications causing issues and remediation steps. • Provide ongoing performance analysis services
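The bandwidth alerting recommendation can be pictured as a simple threshold check on replication send throughput. This is only a minimal sketch and not how IntelliMagic Vision implements alerting: the `Sample` type, the `find_bandwidth_alerts` function, the 80% warning fraction, and the 400 MB/sec link capacity are all hypothetical. Only the 180 and 360 MB/sec send rates echo the figures reported in the slides.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: str           # label for the measurement interval
    send_mb_per_sec: float   # replication send throughput in that interval


def find_bandwidth_alerts(samples, link_capacity_mb_per_sec, warn_fraction=0.8):
    """Return the samples whose send rate reaches the warning threshold.

    The threshold is a fraction of the link capacity; both values are
    assumptions that would have to come from the actual environment.
    """
    threshold = link_capacity_mb_per_sec * warn_fraction
    return [s for s in samples if s.send_mb_per_sec >= threshold]


# Hypothetical link capacity of 400 MB/sec; timestamps are illustrative.
samples = [Sample("09:00", 180.0), Sample("09:15", 360.0)]
for alert in find_bandwidth_alerts(samples, link_capacity_mb_per_sec=400.0):
    print(f"{alert.timestamp}: {alert.send_mb_per_sec} MB/sec nears link capacity")
```

Warning at a fraction of capacity rather than at capacity itself leaves time to react before host write latency degrades.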
Proactive Best Practice #1 Daily Review of Vendor Specific Storage Array Key Performance Indicators
Proactive Best Practice #2 Daily Review of SAN Fabric Health
Proactive Best Practice #3 Daily Review of Host I/O Workload
Proactive Best Practice #4 Daily Review of Key Capacity Indicators
Proactive Best Practice #5 Configuration: Audit SAN Zoning Health
Plan Your Capacity Growth Quarterly: Plan for storage capacity
Capacity Forecast Over Time Quarterly: Plan for storage capacity
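A quarterly capacity forecast can be approximated by fitting a straight line to past usage and extrapolating to the installed capacity. The sketch below assumes linear growth, which is a simplification and not necessarily the model IntelliMagic Vision uses; the function names and the usage figures are hypothetical.

```python
def linear_fit(ys):
    """Least-squares slope and intercept for ys observed at x = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two observations")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_full(used_tb, capacity_tb):
    """Periods after the last observation until the trend reaches capacity.

    Returns None when usage is flat or shrinking.
    """
    slope, intercept = linear_fit(used_tb)
    if slope <= 0:
        return None
    x_full = (capacity_tb - intercept) / slope
    return max(0.0, x_full - (len(used_tb) - 1))


# Hypothetical quarterly usage in TB against a 160 TB pool.
print(periods_until_full([100, 110, 120, 130], capacity_tb=160))  # 3.0
```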
IntelliMagic Storage Infrastructure Visibility Improves Your: Problem Process: Proactive Process: Planning Process:
Thank You for Coming! • We will now do the drawing • We’re local, ready to help! • Amy Quick will call you about setting up discovery meetings www.intellimagic.com