IDC HPC User Group, April 16-17, 2012
Lustre Development Update
Dan Ferber, Whamcloud, Inc.
dferber@whamcloud.com
Lustre Community - Stability
"The Spider parallel file systems at Oak Ridge National Laboratory, three of the largest parallel file systems in the world, powered by Data Direct Networks, Dell, Mellanox, and Lustre supported by Whamcloud, have provided unprecedented reliability in supporting the I/O needs of all major compute systems at our Leadership Computing Facility. On our largest file system we have achieved 100% availability in 8 of the 12 months of 2011, with scheduled availability of 99.26% over the entire year. The stability of this system is unprecedented given its scale."
-- Galen M. Shipman, Technology Integration Group Leader, National Center for Computational Sciences (NCCS) at ORNL
Lustre Community Work
[Charts: Lustre 2.2 landings by company; Lustre landings by contract for Q1 2012]
Areas of Development and Interest
• Performance, Back-End Options, and HSM
  • Parallel directory operations, more granular locking
  • Add ZFS as a back-end Lustre option
  • HSM code has begun to land
• Networking
  • LNET dynamic configuration
  • Channel bonding
  • Health networks
  • IPv6
• Storage Management
  • Tiered storage: policy-driven object storage placement
  • Migration: OST rebalancing, async mirroring
  • Small file performance: unified targets
• Other
  • Administrative shutdown
  • Test frameworks
  • JobStats (see the sketch after this list)
  • Hadoop integration
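As a rough illustration of the JobStats item above: Lustre's per-job I/O statistics (as shipped in later releases, 2.3 onward) are exposed through `lctl get_param` in a YAML-like layout. The parameter path and field names below are assumptions based on that later release, not something stated in these slides; this is a minimal sketch, not a supported tool.

```python
# Hedged sketch: aggregate per-job write volume from Lustre job_stats output.
# Assumes the YAML-like format of later Lustre releases (2.3+); the parameter
# path "obdfilter.*.job_stats" and the "job_id"/"write_bytes" field names are
# assumptions, not taken from this slide deck.
import re
import subprocess
from collections import defaultdict


def collect_job_write_bytes():
    """Sum write_bytes across all OSTs for each job ID."""
    out = subprocess.run(
        ["lctl", "get_param", "-n", "obdfilter.*.job_stats"],
        capture_output=True, text=True, check=True,
    ).stdout

    totals = defaultdict(int)
    current_job = None
    for raw in out.splitlines():
        line = raw.strip()
        m = re.match(r"- job_id:\s+(\S+)", line)
        if m:
            current_job = m.group(1)
            continue
        m = re.match(r"write_bytes:\s+{.*sum:\s+(\d+)", line)
        if m and current_job is not None:
            totals[current_job] += int(m.group(1))
    return dict(totals)


if __name__ == "__main__":
    for job, nbytes in sorted(collect_job_write_bytes().items()):
        print(f"{job}: {nbytes} bytes written")
```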
Exascale Challenges
[Diagram: filesystem namespace (/user, /project1, /project2) holding Model 1/2/3 containers of application data + metadata (OODB metadata, app metadata, and many leaf data objects)]
• Explosive growth
  • Large, sophisticated models
  • Uncertainty Quantification
  • Billions to trillions of "leaf" data objects
  • Complex analysis
• Filesystem namespace pollution
  • Keep the filesystem namespace for storage management / administration
  • Separate namespace for application data + metadata
  • Distributed Application Object Storage (DAOS) containers (see the sketch below)
• Preserve model integrity in the face of all possible failures
  • Very large atomic, durable transactions
  • Integrity APIs at all levels of the I/O stack
• Search / query / analysis
  • Non-resident index maintenance & traversal / non-sequential data traversal
  • Move query processing to global storage
  • Same programming model as apps?
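To make the DAOS container idea above concrete, here is a purely illustrative Python model of an application-private object namespace whose updates become visible only when an atomic transaction commits. Every name in it (Container, Transaction, commit, the epoch counter) is hypothetical and invented for this sketch; it is not the real DAOS API.

```python
# Illustrative model of the DAOS container concept from the slide:
# application objects live in their own namespace, and updates become
# visible only when a whole transaction commits atomically.
# All class and method names are hypothetical, not the real DAOS API.
class Transaction:
    def __init__(self, container):
        self._container = container
        self._pending = {}        # staged object writes, keyed by object ID

    def write(self, oid, data):
        """Stage a write; nothing is visible to readers yet."""
        self._pending[oid] = data

    def commit(self):
        """Apply every staged write atomically, then advance the epoch."""
        self._container._objects.update(self._pending)
        self._container._epoch += 1
        self._pending.clear()

    def abort(self):
        """Discard all staged writes; the container is unchanged."""
        self._pending.clear()


class Container:
    """An application object namespace, separate from the POSIX namespace.

    Only the container itself appears in the filesystem (for storage
    management); the billions of leaf objects inside it do not.
    """
    def __init__(self, path):
        self.path = path          # e.g. "/project1/model.daos"
        self._objects = {}        # oid -> data, the visible committed state
        self._epoch = 0           # last committed, globally consistent version

    def begin(self):
        return Transaction(self)

    def read(self, oid):
        return self._objects.get(oid)


# Usage: a model update either lands completely or not at all.
c = Container("/project1/model.daos")
tx = c.begin()
tx.write("mesh/cell/0001", b"...")
tx.write("mesh/cell/0002", b"...")
tx.commit()                       # both cells become visible together, or neither
```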
Chroma Manager v1.0 Features
• Format and manage Lustre filesystems
  • Add/remove devices
  • Historical filesystem stats
• Monitor performance and status
  • High-level dashboard view of all monitored objects
  • Drill down to filesystems and individual components
  • Ingest syslogs from Lustre servers
  • Defined conditions detected in parameters and logs
  • Raise alerts
• Vendor-integrated appliances
  • Monitor/manage backend hardware
  • Plug-in interface to integrate with the vendor API (see the sketch below)
  • Customizable GUI to reflect the vendor's branding
• Built with specific Lustre knowledge
  • Not a tool-integration project with off-the-shelf gadgetry
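To illustrate the vendor plug-in idea above: a storage vendor would implement a small adapter that the management layer can poll for backend hardware status and surface as alerts. The interface and method names below are invented for this sketch and are not Chroma's actual plug-in API.

```python
# Hypothetical sketch of a vendor monitoring plug-in, in the spirit of the
# plug-in interface described above. Names are invented for illustration;
# they are not the real Chroma API.
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Adapter between the manager and a vendor's array-management API."""

    @abstractmethod
    def discover(self):
        """Return a list of backend devices (enclosures, RAID sets, ...)."""

    @abstractmethod
    def poll(self, device_id):
        """Return current health/performance readings for one device."""


class ExampleArrayPlugin(StoragePlugin):
    """Toy implementation standing in for a real vendor API client."""

    def discover(self):
        return [{"id": "enclosure-0", "type": "enclosure"}]

    def poll(self, device_id):
        # A real plug-in would call the vendor's REST/CLI interface here.
        return {"id": device_id, "state": "OK", "temperature_c": 34}


def raise_alerts(plugin, threshold_c=45):
    """Manager-side loop: poll every device and flag unhealthy ones."""
    alerts = []
    for dev in plugin.discover():
        reading = plugin.poll(dev["id"])
        if reading["state"] != "OK" or reading["temperature_c"] > threshold_c:
            alerts.append(reading)
    return alerts


print(raise_alerts(ExampleArrayPlugin()))
```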