
Advanced Lustre® Infrastructure Monitoring (Resolving the Storage I/O Bottleneck and managing the beast)






Presentation Transcript


  1. Advanced Lustre® Infrastructure Monitoring (Resolving the Storage I/O Bottleneck and managing the beast) Torben Kling Petersen, PhD, Principal Architect, High Performance Computing

  2. The Challenge

  3. The REAL challenge
  • File system: up/down, slow, fragmented, capacity planning, HA (fail-overs etc.)
  • Hardware: nodes crashing, components breaking, FRUs, disk rebuilds, cables ??
  • Software: upgrades / patches ??, bugs
  • Clients: quotas, workload optimization
  • Other: documentation, scalability, power consumption, maintenance windows, back-ups

  4. The Answer ??
  • Tightly integrated solutions: hardware, software, support
  • Extensive testing
  • Clear roadmaps
  • In-depth training
  • Even more extensive testing …

  5. ClusterStor Software Stack Overview
  • ClusterStor 6000 Embedded Application Server
  • Intel Sandy Bridge CPU, up to 4 DIMM slots
  • FDR & 40GbE front-end, SAS-2 (6G) back-end
  • SBB v2 form factor, PCIe Gen-3
  • Embedded RAID & Lustre support
  Software stack (top to bottom) on the CS 6000 SSU embedded server modules: ClusterStor Manager, Lustre File System (2.x), Data Protection Layer (RAID 6 / PD-RAID), Linux OS, Unified System Management (GEM-USM)

  6. ClusterStor dashboard (screenshot): problems found

  7. Hardware inventory ….

  8. Hardware inventory ….

  9. Finding problems ???

  10. But things break … especially disk drives … what then ???

  11. Let’s do some math …
  • Large systems use many HDDs to deliver both performance and capacity
  • NCSA Blue Waters uses 17,000+ HDDs for the main scratch FS
  • At 3% AFR this means 531 HDDs fail annually
  • That’s ~1.5 drives per day !!!!
  • RAID 6 rebuild time under load is 24 – 36 hours
  • Bottom line: the scratch system would NEVER be fully operational, and there would constantly be a risk of losing additional drives, leading to data loss !!
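To make the arithmetic concrete, here is a minimal sketch in Python; the exact count of 17,700 drives is an assumption chosen to match the "17,000+" figure and reproduce the slide's 531 failures at 3% AFR:

```python
# A quick check of the failure math above. The 17,700-drive count is an
# assumption matching the "17,000+" figure; it yields 531 failures at 3% AFR.
n_drives = 17_700
afr = 0.03                      # 3% annual failure rate

annual_failures = n_drives * afr
print(f"Expected failures per year: {annual_failures:.0f}")        # ~531
print(f"Failures per day:           {annual_failures / 365:.2f}")  # ~1.45

# With RAID 6 rebuilds taking 24-36 hours under load, the expected number of
# rebuilds in flight at any moment is (failures per day) x (rebuild days):
for rebuild_hours in (24, 36):
    in_flight = (annual_failures / 365) * (rebuild_hours / 24)
    print(f"Rebuilds in flight at {rebuild_hours} h each: ~{in_flight:.1f}")
```

With roughly 1.5 to 2 rebuilds in flight at any given moment, some array is essentially always degraded, which is the slide's point.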

  12. Drive Technology/Reliability
  • Xyratex pre-tests all drives used in ClusterStor™ solutions
  • Each drive is subjected to 24 – 28 hours of intense I/O (see the sketch below)
  • Reads and writes are performed to all sectors
  • Ambient temperature cycles between 5 °C and 40 °C
  • Any drive that survives goes on to additional testing
  • As a result, Xyratex disk drives deliver proven reliability with less than a 0.3% annual failure rate
  • Real-life impact:
  • On a large system such as NCSA Blue Waters with 17,000+ disk drives, this means a predicted failure of 50 drives per year
  • "Other vendors" publicly state a failure rate of 3%*, which (given an equivalent number of disk drives) means 500+ drive failures per year
  • With a fairly even distribution, the file system will ALWAYS be in a state of rebuild
  • In addition, since a file system with wide stripes performs according to its slowest OST, the entire system will always run in degraded mode …
  *DDN, Keith Miller, LUG 2012
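As a rough illustration of the read-and-write-all-sectors step, here is a toy sketch (not Xyratex's actual test harness; the device path, chunk size, and fill pattern are all hypothetical):

```python
import os

def exercise_all_sectors(path, chunk_size=1 << 20, pattern=b"\xa5"):
    """Write a known pattern across the whole device/file, then read it
    back and verify. A toy stand-in for one pass of a drive burn-in test."""
    size = os.path.getsize(path)  # for a block device, query BLKGETSIZE64 instead
    buf = pattern * chunk_size
    with open(path, "r+b") as f:
        # Write pass: cover every byte, and therefore every sector.
        for off in range(0, size, chunk_size):
            f.seek(off)
            f.write(buf[: min(chunk_size, size - off)])
        f.flush()
        os.fsync(f.fileno())
        # Read/verify pass: any mismatch marks the drive as failed.
        for off in range(0, size, chunk_size):
            f.seek(off)
            data = f.read(min(chunk_size, size - off))
            if data != buf[: len(data)]:
                return False
    return True

# Hypothetical usage: run repeatedly for 24-28 h while cycling ambient
# temperature between 5 and 40 degrees C.
# ok = exercise_all_sectors("/dev/sdX")   # destructive -- test drives only!
```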

  13. Annual Failure Rate of Xyratex Disks
  • Actual AFR data (2012/13) experienced by Xyratex-sourced SAS drives
  • The Xyratex drive failure rate is less than half the industry standard !
  • At 0.3%, the annual failure count would be 53 HDDs

  14. Evolution of HDD Technology: Impacts on System Rebuild Time
  • As areal density growth slows (<25% per generation), disk drive manufacturers have to increase the number of heads/platters per drive to keep increasing maximum capacity per drive year over year
  • 2TB drives today typically include just 5 heads and 3 platters
  • 6TB drives in 2014 will include a minimum of 12 heads and 6 platters
  • More components will inevitably result in an increase in disk drive failures in the field
  • Therefore, systems using 6TB drives must be able to handle the increase in the number of array rebuild events (see the rebuild-time estimate below)
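To see why rebuild events matter more as drives grow, here is a back-of-the-envelope rebuild-time estimate; the 50 MB/s sustained rebuild rate under production load is an assumed figure, not one from the slides:

```python
def rebuild_hours(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours to rewrite an entire replacement drive at a sustained rate."""
    return capacity_tb * 1e12 / (rate_mb_s * 1e6) / 3600

for cap_tb in (2, 4, 6):
    print(f"{cap_tb} TB drive @ 50 MB/s: {rebuild_hours(cap_tb, 50):.0f} h")
# 2 TB -> ~11 h, 4 TB -> ~22 h, 6 TB -> ~33 h. At lower sustained rates
# under heavy production I/O, a 6 TB rebuild stretches into days.
```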

  15. Why Does HDD Reliability Matter?
  • The three key factors to consider are drive reliability, drive size, and the rebuild rate of your system
  • The scary fact is: bigger, new-generation HDDs will fail more often
  • Drive failures have an even greater impact on file system performance, and carry a greater risk of data loss, with bigger drives such as 6TB or larger !!
  • The rebuild window is longer and the risk of data loss is greater
  • Traditional RAID technology can take days to rebuild a single failed 6TB drive
  • Therefore, parity-declustered RAID rebuild technology is essential for any HPC system

  16. Parity Declustered RAID: Geometry
  • PD-RAID geometry for an array is defined as P drives (N+K+A), e.g. 41 (8+2+2)
  • P is the total number of disks in the array
  • N is the number of data blocks per stripe
  • K is the number of parity blocks per stripe
  • A is the number of distributed spare disk drives
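A minimal sketch of what the 41 (8+2+2) geometry implies for rebuild fan-out, under the standard parity-declustering assumption that every surviving drive shares the rebuild work:

```python
# PD-RAID geometry from the slide: P drives (N+K+A) = 41 (8+2+2).
P, N, K, A = 41, 8, 2, 2

# Traditional RAID 6: a rebuild reads from the N+K-1 surviving members of one
# array and funnels every write to a single dedicated spare (the bottleneck).
traditional_sources = N + K - 1          # 9 drives feeding one spare

# Declustered RAID: stripes are spread across all P drives, so after one
# failure the remaining P-1 drives all contribute reads, and the rebuilt
# blocks land on spare space distributed across the array (A spares' worth).
declustered_sources = P - 1              # 40 drives sharing the work

print(f"Ideal rebuild fan-out gain: ~{declustered_sources / traditional_sources:.1f}x")
# -> ~4.4x, consistent with the "more than 3.5x" measured on the next slide
```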

  17. Grid RAID Advantage
  • Rebuild speed increased by more than 3.5x
  • No SSDs, no NV-RAM, no accelerators …
  • PD-RAID as it was meant to be …

  18. Thank you …. tkp@xyratex.com
