Wide Area Network Access to CMS Data Using the Lustre Filesystem

J. L. Rodriguez†, P. Avery*, T. Brody†, D. Bourilkov*, Y. Fu*, B. Kim*, C. Prescott*, Y. Wu*
†Florida International University (FIU), *University of Florida (UF)

[Diagram of the CMS tiered computing model. Labels recovered from the diagram: CMS Experiment; Online System; CERN Computer Center (Tier 0); Tier 1 (Korea, Russia, UK, FermiLab); Tier 2 (U Florida, Caltech, UCSD; UF Tier2, UF HPC, OSG); Tier 3 (FIU, Flatech, FSU; FIU Tier3, FlaTech Tier3); desktop or laptop PCs, Macs… Link rates shown: 200 - 1500 MB/s, 10-40 Gb/s, 10 Gb/s, 1.0 Gb/s; FLR: 10 Gbps. Annotations: 3000 physicists, 60 countries; 10s of Petabytes/yr by 2010; CERN / Outside = 10-20%.]

Introduction

We explore the use of the Lustre cluster filesystem over the WAN to access CMS (Compact Muon Solenoid) data stored on a storage system located hundreds of miles away. The Florida State Wide Lustre Testbed consists of two client sites located at CMS Tier3s, one in Miami, FL and one in Melbourne, FL, and a Lustre storage system located in Gainesville at the University of Florida’s HPC Center. In this paper we report on I/O rates between sites, using both the CMS application suite CMSSW and the I/O benchmark tool IOzone. We describe our configuration, outline the procedures implemented, and conclude with suggestions on the feasibility of implementing a distributed Lustre storage system to facilitate CMS data access for users at remote Tier3 sites.

Lustre is a POSIX-compliant, network-aware, highly scalable, robust and reliable cluster filesystem developed by Sun Microsystems Inc. The system can run over several different types of networking infrastructure, including Ethernet, InfiniBand, Myrinet and others. It can be configured with redundant components to eliminate single points of failure. It has been tested with tens of thousands of nodes, providing petabytes of storage, and can move data at hundreds of GB/s. The system employs state-of-the-art security features and plans to introduce GSS and Kerberos-based security in future releases.
The system is available as open source under the GNU General Public License. Lustre is deployed on a broad array of computing facilities, both large and small. Commercial and public organizations, including some of the largest supercomputing centers in the world, currently use Lustre as their distributed filesystem.

Figure: Computing facilities in the distributed computing model for the CMS experiment at CERN.

In the US, Tier2 sites are medium-size facilities with approximately 106 kSI2K of computing power and 200 TB of disk storage. The facilities are centrally managed, each with dedicated computing resources and manpower. Tier3 sites, on the other hand, range in size from a single interactive analysis computer or small cluster to large facilities that rival the Tier2s in resources. Tier3s are usually found at universities in close proximity to CMS researchers.

The Florida State Wide Lustre Testbed

Lustre version 1.6.7

Network: Florida Lambda Rail (FLR)
• FIU: Servers were connected to the FLR via a dedicated Campus Research Network (CRN) at 1 Gbps; however, local hardware issues limit FIU’s actual bandwidth to ~600 Mbps
• UF: Servers connected to the FLR via their own dedicated CRN at 2x10 Gbps
• Flatech: Servers connected to the FLR at 1 Gbps
• Server TCP buffers set to a maximum of 16 MB

Lustre Fileserver at UF-HPC/Tier2 Center: Gainesville, FL
• Storage subsystem: six RAID INC Falcon III shelves with redundant dual-port 4 Gbit FC RAID controllers, each shelf with 24x750 GB HDs, for raw storage of 104 TB
• Attached to: two dual quad-core Barcelona Opteron 2350 servers with 16 GB RAM, three FC cards and 1x10GigE Chelsio NIC
• Storage system clocked at greater than 1 GBps via TCP/IP large block I/O

FIU Lustre Clients: Miami, FL
• CMS analysis server: medianoche.hep.fiu.edu, dual 4-core Intel X5355 with 16 GB RAM, dual 1GigE
• FIU fileserver: fs1.local, dual 2-core Intel Xeon with 16 GB RAM, 3ware 9000 series RAID controller, NFS ver. 3.x, RAID 5 (7+1) with 16 TB raw disk
• OSG gatekeeper: dgt.hep.fiu.edu, dual 2-core Xeon with 2 GB RAM, single GigE. Used in Lustre tests; experimented with NAT (it works, but not tested)
• System configuration: Lustre-patched kernel-2.6.9-55.EL_lustre1.6.4.2; both systems mounted UF-HPC’s Lustre filesystem on a local mount point

Flatech Lustre Client: Melbourne, FL
• CMS server: flatech-grid3.fit.edu, dual 4-core Intel E5410 with 8 GB RAM, GigE
• System configuration: unpatched SL4 kernel; Lustre enabled via runtime kernel modules

Site Configuration and Security
• All sites share common UID/GID domains
• Mount access restricted to specific IPs via firewall
• ACLs and root_squash security features are not currently implemented in the testbed

IO Performance with CMSSW: FIU to UF

Using the CMSSW application, we tested the IO performance of the testbed between the FIU Tier3 and the UF-HPC Lustre storage. An IO-bound CMSSW application was used for the tests. Its main function was to skim objects from data collected during the Cosmic Runs at Four Tesla (CRAFT) in the fall of 2008. The application is the same as that used by the Florida Cosmic Analysis group. The output data file was redirected to /dev/null.

Report on aggregate and average read I/O rate
• Aggregate IO rate is the total IO rate per node vs. number of jobs concurrently running on a single node
• Average IO rate is per process per job vs. number of jobs concurrently running on a single node

Compare IO rates between Lustre, NFS and local disk
• NFS: fileserver 16 TB 3ware 9000 over NFS ver. 3
• Local: single 750 GB SATA II hard drive

Observations
• For the NFS and Lustre filesystems the IO rates scale linearly with the number of jobs; this is not so with local disk
• Average IO rates remain relatively constant as a function of jobs per node for the distributed filesystems
• The Lustre IO rates are significantly lower than those seen with IOzone and lower than those obtained with NFS

We are now investigating the cause of the discrepancy between the Lustre CMSSW IO rates and the rates observed with IOzone.

Lustre clients are easy to deploy; mounts are easy to establish, reliable and robust.

Security is established by restricting IPs and sharing UID/GID domains between all sites.

Summary and Conclusion

Summary:
• Lustre is very easy to deploy, particularly as a client installation
• Direct I/O operations show that the Lustre filesystem mounted over the WAN works reliably and with a high degree of performance. We have demonstrated that we can easily saturate a 1 Gbps link with I/O-bound applications
• CMSSW remote data access was observed to be slower than expected when compared to IO rates using IO benchmarks and when compared to other distributed filesystems
• We have demonstrated that the CMSSW application can access data located hundreds of miles away with the Lustre filesystem. Data can be accessed this way seamlessly, reliably, and with a reasonable degree of performance, even with all components “out of the box”

IO Performance with the IOzone benchmark tool: FIU to UF

• The IOzone benchmark tool was used to establish the maximum possible I/O performance of Lustre over the WAN between FIU and UF and between Flatech and UF. Here we report on results between FIU and UF only.
• The Lustre filesystem at UF-HPC was mounted on a local mount point on medianoche.hep.fiu.edu, located in Miami
• File sizes set to 2x RAM to avoid caching effects
• Measurements made as a function of record length
• Checked in multi-processor mode: 1 through 8 concurrent processes
• Checked with dd read/write rates
• All tests consistent with the IOzone results shown

Figure: IO performance of the testbed between FIU and UF. The plot shows sequential and random read/write performance, in MB/s, measured with IOzone as a function of record length.

Conclusion: The Florida State Wide Lustre Testbed demonstrates an alternative method for accessing data stored at dedicated CMS computing facilities. This method has the potential to greatly simplify access to data sets, large, medium or small, for remote experimenters with limited local computing resources. With large block IO, we can saturate the network link between UF and FIU using the standard IO benchmark tool IOzone.
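
The IOzone procedure described above (file size set to twice the client RAM, a sweep over record lengths, 1 through 8 concurrent processes) and the link-saturation claim can be sketched as follows. This is a minimal illustration, not the command lines actually used for the poster, which are not given. The mount point, the scratch file names, the record lengths in the usage note, and the choice to split the 2x RAM total evenly across processes are all assumptions. The IOzone options used (-i, -s, -r, -f, -t, -F) are the standard ones as we understand them from the IOzone documentation.

```python
RAM_GB = 16                  # RAM of medianoche.hep.fiu.edu, from the poster
MOUNT_POINT = "/mnt/lustre"  # hypothetical local mount point


def link_ceiling_mb_per_s(link_mbps: float) -> float:
    """Upper bound on payload rate in MB/s for a link given in Mbps.

    Ignores protocol overhead, so real rates will be somewhat lower.
    """
    return link_mbps / 8.0


def iozone_command(record_kb: int, processes: int = 1,
                   ram_gb: int = RAM_GB,
                   mount_point: str = MOUNT_POINT) -> list[str]:
    """Build an IOzone argument list for one point of the sweep.

    Total data written is 2 x RAM so the client page cache cannot hold it.
    Assumption: in multi-process runs the total is split across processes.
    """
    total_mb = 2 * ram_gb * 1024
    per_process_mb = total_mb // processes
    cmd = [
        "iozone",
        "-i", "0",   # sequential write/rewrite
        "-i", "1",   # sequential read/reread
        "-i", "2",   # random read/write
        "-s", f"{per_process_mb}m",
        "-r", f"{record_kb}k",
    ]
    if processes == 1:
        cmd += ["-f", f"{mount_point}/iozone.tmp"]
    else:
        # Throughput mode: one scratch file per process.
        files = [f"{mount_point}/iozone.{n}.tmp" for n in range(processes)]
        cmd += ["-t", str(processes), "-F", *files]
    return cmd


if __name__ == "__main__":
    print("1 Gbps ceiling  :", link_ceiling_mb_per_s(1000), "MB/s")
    print("600 Mbps ceiling:", link_ceiling_mb_per_s(600), "MB/s")
    for record_kb in (64, 1024, 16384):  # hypothetical record lengths
        print(" ".join(iozone_command(record_kb)))
    print(" ".join(iozone_command(1024, processes=8)))
```

Comparing a measured rate against `link_ceiling_mb_per_s` gives a quick check on saturation: a nominal 1 Gbps link cannot carry more than 125 MB/s of payload, and the ~600 Mbps effective FIU bandwidth quoted above corresponds to about 75 MB/s.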