NWfs: A ubiquitous, scalable content management system with grid-enabled cross-site data replication and active storage. R. Scott Studham
Science Drivers: Three different domains with different requirements
• High Performance Computing – Chemistry
  • Low storage volumes (10 TB)
  • High performance storage (>500 MB/s per client, GB/s aggregate)
  • POSIX access
• High Throughput Proteomics – Biology
  • Large storage volumes (PBs) and exploding
  • Write once, read rarely if used as an archive
  • Modest latency okay (<10 s to data)
  • If analysis could be done in place, it would require faster storage
• Atmospheric Radiation Measurement – Climate
  • Modest-sized storage requirements (100s of TB)
  • Shared with the community and replicated to ORNL
Overview: The proteomics-driven storage explosion is causing us to:
• Develop filesystems that enable lower-cost hardware
  • Continued writes on fileserver failure (route around it)
  • Mirrored fileservers so we can use direct-attached disk
• Improve filesystem technology to meet the scalability and performance metrics needed by the science
  • 10,000+ clients accessing a POSIX 10+ PB filesystem
  • >500 MB/s single-client rate
• Add advanced technologies to the filesystem to increase performance and make it "smarter"
  • Scalable content management
  • Move the computation into the storage
• It must work in production (not a research project)
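A minimal sketch of the "route around failure" idea with mirrored fileservers. The FileServer class and its put() method are hypothetical stand-ins for illustration only, not NWfs or Lustre interfaces.

```python
# Illustrative only: a toy write path that continues when one mirrored
# fileserver fails. FileServer and put() are invented for this sketch.

class FileServer:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.objects = {}

    def put(self, object_id, data):
        if not self.healthy:
            raise IOError(f"{self.name} is unavailable")
        self.objects[object_id] = data


def replicated_write(object_id, data, mirrors):
    """Write to every reachable mirror; succeed if at least one accepts it."""
    successes = []
    for server in mirrors:
        try:
            server.put(object_id, data)
            successes.append(server.name)
        except IOError:
            continue  # route around the failed fileserver
    if not successes:
        raise IOError("all mirrors failed")
    return successes


if __name__ == "__main__":
    primary = FileServer("ost-a")
    mirror = FileServer("ost-b")
    primary.healthy = False  # simulate a fileserver failure
    print(replicated_write("chunk-0001", b"payload", [primary, mirror]))
    # -> ['ost-b']: the write continues despite the failed server
```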
EMSL's Current Storage Strategy: EMSL's storage strategy has focused on capacity
[Chart: Estimated $/TB as a function of time and technology; annotations contrast "Our storage sales reps want us here" with "We want to be here"]
We use tools like Lustre to help us bridge this gap.
EMSL's Current Storage Strategy: Developing filesystems that enable lower-cost hardware
Our experience has shown that expensive disks fail about as often as cheap disks. We have a large sample of disks:
• 1,000 FC-SAN drives making up a 53 TB filesystem
  • 20% duty cycle – the drives don't fail much (1–3 disks per month)
  • The entire filesystem (all 1,000 drives) goes down about once every two months, mostly due to vendor-required firmware updates to SAN switches or hardware failures
• 7,500 SCSI drives providing ½ PB of scratch space
  • 100% duty cycle – averaging ~3 disk failures per day (should be 0.5 per day); experiencing bugs in the Seagate disks
• 1,000 ATA/SAN drives providing a 200 TB archive
  • 10% duty cycle – averaging 1–3 disk failures per month
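A quick back-of-the-envelope check of the "0.5 per day" expectation for the scratch pool. The MTBF figures below are illustrative assumptions, not vendor data from the slide.

```python
# Expected drive failures per day for a fleet running around the clock.
# The MTBF values are assumptions chosen to show the relationship.

def failures_per_day(n_drives, mtbf_hours):
    """Expected drive failures per day for a 24h/day duty cycle."""
    return n_drives * 24.0 / mtbf_hours

n = 7500
# An MTBF of roughly 360,000 hours reproduces the ~0.5 failures/day the
# slide says the pool *should* see.
print(f"expected: {failures_per_day(n, 360_000):.2f} failures/day")
# The observed ~3 failures/day implies an effective MTBF closer to 60,000 hours.
print(f"observed rate implies MTBF ~ {n * 24 / 3:,.0f} hours")
```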
EMSL's Current Storage Strategy: NWfs Hardware – Low-Cost, High-Performance Storage
We have replaced all our tapes with low-cost ATA storage.
NWfs project:
• Includes Lustre, cluster management tools, minor metadata-capturing tools, and a custom client-side GUI to support GridFTP and striped, parallel data transfers
Linux-based OSTs containing:
• 2 CPUs and RAM
• Multiple 3ware ATA RAID adapters
• 16 SATA disk drives
• "Hot-swap" RAID5 with multiple hot spares per node
• ~$3.5K/TB after RAID5
• InfiniBand 4X backbone
• New SATA drives include rotational vibration safeguards
400 TB ≈ $1.5M
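A rough cost model for one OST node, assuming 16 SATA drives with one drive of RAID5 parity and one hot spare set aside. The per-drive capacity and node cost below are assumptions for illustration (circa-2004-class hardware), not figures from the slide; they merely show how the quoted ~$3.5K/TB and 400 TB ≈ $1.5M hang together.

```python
# Illustrative cost model for one NWfs OST node; drive size and node price
# are assumed, not taken from the slide.

def usable_tb(n_drives, drive_tb, hot_spares=1, parity_drives=1):
    """Usable capacity after setting aside hot spares and RAID5 parity."""
    data_drives = n_drives - hot_spares - parity_drives
    return data_drives * drive_tb

drive_tb = 0.25          # assumed 250 GB SATA drives
node_cost = 12_500.0     # assumed cost per node (chassis, CPUs, RAID cards, disks)

capacity = usable_tb(16, drive_tb)
print(f"usable capacity per node: {capacity:.2f} TB")
print(f"cost per usable TB: ${node_cost / capacity:,.0f}")
# With these assumptions a node lands in the few-$K/TB range, consistent with
# the ~$3.5K/TB after RAID5 and 400 TB for roughly $1.5M quoted above.
```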
Increasing filesystem technology to meet the scalability and performance metrics needed by the science
• Lustre has been in full production since last August and is used for aggressive I/O from our supercomputer
  • Highly stable
  • Still hard to manage
• We are expanding our use of Lustre to act as the filesystem for our archival storage
  • Deploying a ~400 TB filesystem
• 660 MB/s from a single client with a simple "dd" is faster than any local or global filesystem we have tested
We are finally in the era where global filesystems provide faster access
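A rough stand-in for the single-client "dd" measurement: stream a large sequential write to the mounted filesystem and report MB/s. The mount point is a hypothetical path, and this simple loop will understate what a tuned dd or parallel I/O can achieve.

```python
# Simple single-client sequential-write throughput probe.
# '/mnt/lustre/throughput.tmp' is a placeholder path, not a real mount.

import os
import time

def sequential_write_mbps(path, total_mb=1024, block_mb=1):
    block = b"\0" * (block_mb * 1024 * 1024)
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(total_mb // block_mb):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())   # include the time to push data to the servers
    return total_mb / (time.time() - start)

if __name__ == "__main__":
    print(f"{sequential_write_mbps('/mnt/lustre/throughput.tmp'):.0f} MB/s")
```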
EMSL's Current Storage Strategy: Scalable Content Management
[Diagram: a client talks to a metadata server cluster (Index2, Index3, and a remote index) and to Storage Pool1 and a remote Storage Pool2]
EMSL's Current Storage Strategy: Looks a lot like Lustre
[Diagram: a client talks to an MDS (with Index2 and Index3) and to an OST]
EMSL's Current Storage Strategy: Add replication to support DAS and collaboration
[Diagram: a client talks to an MDS (Index2, Index3, and a remote index); file data is replicated across two OSTs]
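A toy model of the replicated index pictured above: the metadata service maps a file to the storage pools holding replicas, and a client reads from the nearest one. The class-free table, pool names, and latencies are invented for illustration; this is not the Lustre MDS protocol or the NWfs implementation.

```python
# Toy replica index: file id -> pools holding a copy.
REPLICA_INDEX = {
    "spectra/run42": ["pool1", "remote_pool2"],
    "climate/arm01": ["remote_pool2"],
}

# Assumed per-pool access latencies (local site vs. remote site), in ms.
POOL_LATENCY_MS = {"pool1": 2, "remote_pool2": 45}

def locate(file_id):
    """Return replica locations ordered by latency, nearest first."""
    pools = REPLICA_INDEX.get(file_id, [])
    return sorted(pools, key=POOL_LATENCY_MS.__getitem__)

print(locate("spectra/run42"))   # ['pool1', 'remote_pool2'] – read locally
print(locate("climate/arm01"))   # ['remote_pool2'] – only the remote site has it
```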
EMSL's Current Storage Strategy: Active Storage
Moving the computation into the storage rather than moving the data to the compute power.
Classic parallel file systems stripe at the block level. This requires the distributed data to be reassembled in order to post-process it.
[Diagram – classical storage: data stream → parallel file system → reassemble & post-process]
PNNL is developing code that will allow post-processing to be performed on objects inside the file system, making use of the computational power on the file servers.
[Diagram – active storage: data stream → post-process in the object-based parallel file system]
Demonstrated a 1.3 GB/s FT stream.
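A toy illustration of the active-storage idea: each storage node applies the post-processing step to the object chunks it already holds, so only small partial results cross the network instead of the full data stream. The node/chunk layout and the reduction chosen here are invented for illustration; this is not PNNL's active-storage API.

```python
# Compute a mean over a file striped across three storage nodes by reducing
# chunks where they live, then combining the tiny per-node partials.

import statistics

# Chunks of one logical file, as they sit striped across three storage nodes.
NODE_CHUNKS = {
    "ost0": [[1.0, 2.0, 3.0], [4.0, 5.0]],
    "ost1": [[6.0, 7.0], [8.0, 9.0, 10.0]],
    "ost2": [[11.0, 12.0]],
}

def node_partial(chunks):
    """Runs *on the storage node*: reduce local chunks to (sum, count)."""
    values = [v for chunk in chunks for v in chunk]
    return sum(values), len(values)

# Classical path: ship every chunk to a client and reassemble it there.
# Active path: ship only the per-node partials.
partials = [node_partial(chunks) for chunks in NODE_CHUNKS.values()]
total, count = map(sum, zip(*partials))
print(f"mean over {count} samples = {total / count:.2f}")

# Sanity check against computing the same thing centrally.
assert abs(total / count - statistics.mean(
    v for c in NODE_CHUNKS.values() for chunk in c for v in chunk)) < 1e-9
```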
EMSL's Current Storage Strategy: NWfs V3.0 – Lustre with replication, content management, and an active storage API
[Diagram: a client talks to an MDS (Index2, Index3, and a remote index); data is replicated across two OSTs]