Gfarm Grid File System for Distributed and Parallel Data Computing
Osamu Tatebe (o.tatebe@aist.go.jp), Grid Technology Research Center, AIST
APAN Workshop on Exploring eScience, Aug 26, 2005, Taipei, Taiwan
[Background] Petascale Data-Intensive Computing
• High Energy Physics
  • CERN LHC, KEK-B Belle
  • ~MB/collision, 100 collisions/sec
  • ~PB/year
  • 2,000 physicists, 35 countries
  (Figures: detector for the LHCb experiment, detector for the ALICE experiment)
• Astronomical Data Analysis
  • Data analysis over the whole observation archive
  • TB~PB/year/telescope
  • Subaru telescope: 10 GB/night, 3 TB/year
Petascale Data-Intensive Computing: Requirements
• Peta/exabyte-scale files, millions of millions of files
• Scalable computational power
  • > 1 TFLOPS, hopefully > 10 TFLOPS
• Scalable parallel I/O throughput
  • > 100 GB/s, hopefully > 1 TB/s within a system and between systems
• Efficient global sharing with group-oriented authentication and access control
• Fault tolerance / dynamic reconfiguration
• Resource management and scheduling
• System monitoring and administration
• Global computing environment
Goal and Features of Grid Datafarm
• Goal
  • Dependable data sharing among multiple organizations
  • High-speed data access, high-performance data computing
• Grid Datafarm
  • Gfarm Grid File System – a global, dependable virtual file system
    • Federates scratch disks in PCs
  • Parallel & distributed data computing
    • Associates the Computational Grid with the Data Grid
• Features
  • Security based on the Grid Security Infrastructure (GSI)
  • Scales with data size and usage scenarios
  • Location-transparent data access
  • Automatic and transparent replica selection for fault tolerance
  • High-performance data access and computing by accessing multiple dispersed storage nodes in parallel (file-affinity scheduling)
Gfarm File System (1)
• Virtual file system that federates the local disks of cluster nodes or Grid nodes
• Enables transparent access to dispersed file data in a Grid through a global namespace
• Supports fault tolerance and avoids access concentration by automatic and transparent replica selection
• Can be shared among all cluster nodes and clients
(Figure: global namespace mapping from the /gfarm directory tree to file replicas created on the underlying Gfarm file system nodes)
Gfarm File System (2)
• A file can be shared among all nodes and clients
• Physically, it may be replicated and stored on any file system node
• Applications can access it regardless of its location
• In a cluster environment, a shared secret key is used for authentication
(Figure: client PCs and a notebook PC mount /gfarm; the file system metadata maps files A, B, and C to replicas stored on multiple file system nodes)
Grid-wide Configuration
• Grid-wide file system built by integrating local disks across several sites
• GSI authentication
• Can be shared among all cluster nodes and clients
• GridFTP and Samba servers at each site
(Figure: a single Gfarm Grid file system mounted as /gfarm at sites in Japan, Singapore, and the US)
Features of the Gfarm File System
• A file can be stored on any file system (compute) node (distributed file system)
• A file can be replicated and stored on different nodes (tolerant of faults and of access concentration)
• When a file replica exists on a compute node, it can be accessed locally without network overhead (high-performance, scalable I/O)
More Scalable I/O Performance
• User's view vs. physical execution view in Gfarm (file-affinity scheduling)
  • User A submits Job A, which accesses File A; it is executed on a node that holds File A
  • User B submits Job B, which accesses File B; it is executed on a node that holds File B
• File system nodes = compute nodes
  • Do not separate storage and CPU (no SAN necessary)
• Move and execute the program instead of moving large-scale data
• Scalable file I/O by exploiting local I/O
(Figure: the user sees a shared network file system; inside the cluster/Grid, each job runs on the compute node whose local disk holds its input file.) A conceptual sketch of this scheduling policy follows below.
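The following is a minimal sketch of the file-affinity scheduling idea, not Gfarm's actual scheduler code; the data structures and names (file_entry, pick_affinity_node, the node names and path) are hypothetical illustrations of "move the program to the data".

```c
/* Illustrative sketch of file-affinity scheduling (not Gfarm's actual
 * scheduler).  A job that reads a given file is dispatched to a compute
 * node that already stores a replica of that file, so the read becomes
 * local disk I/O instead of a network transfer. */
#include <stdio.h>
#include <string.h>

#define MAX_REPLICAS 8

struct file_entry {                           /* hypothetical metadata record      */
    const char *path;                         /* path under the global namespace   */
    const char *replica_nodes[MAX_REPLICAS];  /* nodes holding a replica           */
    int nreplicas;
};

/* Pick an idle node that holds a replica of the job's input file;
 * fall back to any idle node if no replica-holding node is free. */
static const char *
pick_affinity_node(const struct file_entry *f,
                   const char **idle_nodes, int nidle)
{
    for (int i = 0; i < f->nreplicas; i++)
        for (int j = 0; j < nidle; j++)
            if (strcmp(f->replica_nodes[i], idle_nodes[j]) == 0)
                return idle_nodes[j];         /* local access is possible */
    return nidle > 0 ? idle_nodes[0] : NULL;  /* remote access fallback   */
}

int main(void)
{
    struct file_entry file_a = {
        "/gfarm/jp/aist/file1", { "node03", "node07" }, 2
    };
    const char *idle[] = { "node01", "node07", "node12" };

    const char *target = pick_affinity_node(&file_a, idle, 3);
    printf("run job on %s (input: %s)\n",
           target ? target : "none", file_a.path);
    return 0;
}
```

In this sketch the job lands on node07, which already stores a replica of the input file, so its file I/O is served from the local disk.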
Gfarm™ Data Grid Middleware
• Open source development
  • Gfarm™ version 1.1.1 released on May 17, 2005 (http://datafarm.apgrid.org/)
  • Read-write mode support, more support for existing binary applications, metadata cache server
• A shared file system in a cluster or a Grid
  • Accessible from legacy applications without any modification
  • Standard protocol support via scp, GridFTP server, Samba server, . . .
  • Existing applications can access the Gfarm file system without any modification by using LD_PRELOAD of the syscall hooking library or GfarmFS-FUSE
(Figure: an application links against the Gfarm client library; gfmd and slapd form the metadata server; gfsd runs on each of the compute and file system nodes)
Gfarm™ Data Grid Middleware (2)
• libgfarm – Gfarm client library
  • Gfarm API
• gfmd, slapd – metadata server
  • Namespace, replica catalog, host information, process information
• gfsd – I/O server
  • Remote file access
(Figure: the application calls the Gfarm client library, which obtains file and host information from gfmd/slapd and performs remote file access against the gfsd servers on the compute and file system nodes)
Access from Legacy Applications
• libgfs_hook.so – system call hooking library
  • Emulates mounting the Gfarm file system at /gfarm by hooking open(2), read(2), write(2), …
  • Accesses under /gfarm call the appropriate Gfarm API
  • All other accesses fall through to the ordinary system call
  • No re-linking necessary: the library is loaded via LD_PRELOAD
  • Linux, FreeBSD, NetBSD, …
  • Higher portability than developing a kernel module
• Mounting the Gfarm file system
  • GfarmFS-FUSE mounts the Gfarm file system using the FUSE mechanism on Linux
    • Released on Jul 12, 2005
  • Other OSs need a kernel module to be developed
    • Volunteers needed
A minimal example of this transparent access is sketched below.
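As a minimal sketch of what this transparent access looks like, the program below uses only standard POSIX calls on a path under /gfarm (the path itself is hypothetical). Run as-is, it reads an ordinary local path; run with the syscall hooking library preloaded, or under a GfarmFS-FUSE mount, the same unmodified binary reads from the Gfarm file system.

```c
/* Ordinary POSIX file access; no Gfarm-specific code.  With
 * libgfs_hook.so preloaded (or a GfarmFS-FUSE mount at /gfarm),
 * the open/read calls below are redirected to the Gfarm file
 * system.  The file path is hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    ssize_t n;

    int fd = open("/gfarm/jp/aist/file1", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        fwrite(buf, 1, (size_t)n, stdout);   /* copy file contents to stdout */
    close(fd);
    return 0;
}
```

A typical invocation would be something like `LD_PRELOAD=libgfs_hook.so ./readfile`; the exact library file name and path depend on the Gfarm installation.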
Gfarm – Applications and Performance Results (http://datafarm.apgrid.org/)
Scientific Applications (1)
• ATLAS Data Production
  • Distribution kit (binary)
  • Atlfast – fast simulation
    • Input data stored in the Gfarm file system, not NFS
  • G4sim – full simulation
  (Collaboration with ICEPP, KEK)
• Belle Monte-Carlo/Data Production
  • Online data processing
  • Distributed data processing
  • Real-time histogram display
  • 10 M events generated in a few days using a 50-node PC cluster
  (Collaboration with KEK, U-Tokyo)
Scientific Applications (2)
• Astronomical Object Survey
  • Data analysis over the whole archive
  • 652 GBytes of data observed by the SUBARU telescope
• Large configuration data from Lattice QCD
  • Three sets of hundreds of gluon field configurations on a 24^3 x 48 4-D space-time lattice (3 sets x 364.5 MB x 800 = 854.3 GB)
  • Generated by the CP-PACS parallel computer at the Center for Computational Physics, Univ. of Tsukuba (300 Gflops x years of CPU time)
Performance Result of Parallel grep
• 25 GBytes text file
• Xeon 2.8 GHz / 512 KB, 2 GB memory per node
• NFS: 340 sec (sequential grep)
• Gfarm: 15 sec (16 file system nodes, 16 parallel processes)
• 22.6x superlinear speed-up
(Figure: compute nodes accessing either a single NFS server or the Gfarm file system; the Gfarm file system consists of the local disks of the compute nodes.) A sketch of the parallel, locally executed grep pattern follows below.
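The following is a conceptual sketch of the parallel grep pattern behind this result, not the tool actually used in the measurement: one grep process per file fragment, with the assumption that file-affinity scheduling places each process on the node that stores its fragment so that I/O stays on the local disk. The program name and fragment paths are hypothetical.

```c
/* Conceptual sketch of parallel grep over file fragments (not the
 * actual Gfarm tool).  One grep process is launched per fragment;
 * under file-affinity scheduling each process would run on the node
 * holding its fragment, so reads are local. */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s pattern fragment...\n", argv[0]);
        return 2;
    }
    const char *pattern = argv[1];

    /* Fork one grep per fragment path given on the command line. */
    for (int i = 2; i < argc; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            execlp("grep", "grep", pattern, argv[i], (char *)NULL);
            perror("execlp");   /* only reached if exec fails */
            _exit(127);
        }
    }
    while (wait(NULL) > 0)      /* wait for all grep processes */
        ;
    return 0;
}
```

Usage would look like `./pgrep pattern /gfarm/text/frag00 /gfarm/text/frag01 ...` (hypothetical fragment paths).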
GridFTP Data Transfer Performance
• Local disk vs. Gfarm (1–2 nodes)
• Two GridFTP servers can provide almost peak performance (1 Gbps)
(Figure: multiple clients transferring data through GridFTP servers (ftpd) backed by a local disk or by the Gfarm file system)
Gaussian 03 in Gfarm
• Ab initio quantum chemistry package
• Install once and run everywhere
• No modification required to access Gfarm
• Test415 (I/O-intensive test input)
  • 1 h 54 min 33 s (NFS)
  • 1 h 0 min 51 s (Gfarm)
• Parallel analysis of all 666 test inputs using 47 nodes
  • Write error! (NFS), due to heavy I/O load
  • 17 h 31 min 02 s (Gfarm)
  • Quite good scalability of I/O performance
  • Elapsed time can be reduced further by re-ordering the test inputs
(Figure: compute nodes accessing NFS vs. Gfarm; the Gfarm file system consists of the local disks of the compute nodes)
Bioinformatics in Gfarm
• iGAP (Integrative Genome Annotation Pipeline)
  • A suite of bioinformatics software for protein structural and functional annotation
  • More than 140 complete or partial proteomes analyzed
• iGAP on Gfarm
  • Install once and run everywhere using Gfarm's high-performance file replication and transfer
  • No modifications required to use distributed compute and storage resources
• Burkholderia mallei (bacterium)
  • Gfarm makes it possible to use iGAP to analyze the complete proteome (available 9/28/04) of the bacterium Burkholderia mallei, a known biothreat agent, on distributed resources. This is a collaboration under PRAGMA, and the data is available through http://eol.sdsc.edu.
• Participating sites: SDSC/UCSD (US), BII (Singapore), Osaka Univ., AIST (Japan), Konkuk Univ., Kookmin Univ., KISTI (Korea)
iGAP pipeline (inputs: protein sequences; sequence info from NR, PFAM; structure info from SCOP, PDB)
• Step 1: Prediction of signal peptides (SignalP, PSORT), transmembrane regions (TMHMM, PSORT), coiled coils (COILS), and low-complexity regions (SEG)
• Step 2: Structural assignment of domains by WU-BLAST
• Step 3: Structural assignment of domains by PSI-BLAST profiles on FOLDLIB
• Step 4: Structural assignment of domains by 123D on FOLDLIB
• Step 5: Functional assignment by PFAM and NR assignments
• Step 6: Domain location prediction by sequence
• Results are collected into a data warehouse
• Building FOLDLIB: PDB chains, SCOP domains, PDP domains, CE matches of PDB vs. SCOP; 90% sequence non-identical, minimum size 25 aa, coverage (90%, gaps <30, ends <30)
Preliminary Performance Result
• Multiple-cluster data analysis
  • NFS: 4-node cluster A – 30.07 min
  • Gfarm: 4-node cluster A + 4-node cluster B – 17.39 min
Development Status and Future Plans
• Gfarm – Grid file system
  • Global virtual file system
  • A dependable network shared file system in a cluster or a Grid
  • High-performance data computing support
  • Associates the Computational Grid with the Data Grid
• Gfarm Grid software
  • Version 1.1.1 released on May 17, 2005 (http://datafarm.apgrid.org/)
  • Version 1.2 available real soon now
  • Existing programs can access the Gfarm file system using the syscall hooking library or GfarmFS-FUSE
• Distributed analysis shows scalable I/O performance
  • iGAP/Gfarm – bioinformatics package
  • Gaussian 03 – ab initio quantum chemistry package
• Standardization effort with the GGF Grid File System WG (GFS-WG)
https://datafarm.apgrid.org/