Connecting arbitrary data sources to the grid
Shunde Zhang
Australian Research Collaboration Service (ARCS), eResearch SA
School of Computer Science, University of Adelaide
Background
• Australian Research Collaboration Service
  • A successor of APAC
• Services
  • HPC
  • Data
  • Collaboration tools: AccessGrid, EVO, Plone, Drupal, Sakai
ARCS Data Fabric (cont.)
• A national service
• Provided to all Australian researchers
• Based on iRODS
The Problem
• Interoperability with “The Grid”
  • “The Grid”: Globus, gLite, Condor, etc.
• Data sources
  • GridFTP-compatible: dCache
  • Not GridFTP-compatible: iRODS, SRB
• Possible solutions
  • “Manual” copy (or do it in a PBS script)
  • Copy queue
The Problem (cont.)
• Movement of massive data
  • Both ends use the same software (speak the same protocol)
  • The two ends use different systems (speak different protocols)
• Efficiency
• Possible solutions
  • Transfer via an intermediate point
A solution - old-fashioned
• AWS Import/Export for Amazon S3
  • Ship the hard disks by courier
Our Solution - GridFTP
• De facto standard
• Compatible with the Grid, and many grid clients
• Efficiency
  • Parallel transfer
  • Data channel reuse
  • Large file transfer - in small blocks
• Compatible with many file transfer services
  • Monitoring
  • Scheduling
An overview of the GridFTP protocol
• Based on FTP with extensions
• Third-party transfer
  • Intermediate point not needed
• Security - GSI
• Extended block mode
  • Parallel transfer
  • Striped transfer
  • Partial transfer
• Reliable and restartable
• TCP and UDP
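To make the extended block mode bullet concrete: each block sent on a data channel is preceded by a 17-byte header (an 8-bit descriptor, a 64-bit byte count and a 64-bit offset). Because every block carries its own offset, blocks can arrive out of order over parallel or striped channels. The Java sketch below encodes and decodes such a header. The class name `EblockHeader` is hypothetical and not taken from Griffin, and the two flag values are assumptions that should be checked against the GridFTP specification before use.

```java
import java.nio.ByteBuffer;

/** Sketch of the 17-byte extended block mode (MODE E) header. Hypothetical class, not Griffin code. */
final class EblockHeader {
    static final int LENGTH = 17;      // 1 descriptor byte + 8 count bytes + 8 offset bytes
    // Assumed descriptor flag values; verify against the GridFTP specification.
    static final int FLAG_EOD = 0x08;  // end of data on this data channel
    static final int FLAG_EOF = 0x40;  // end of file marker

    final int descriptor;  // flag bits, 0..255
    final long count;      // number of payload bytes that follow the header
    final long offset;     // position of the payload within the file

    EblockHeader(int descriptor, long count, long offset) {
        this.descriptor = descriptor & 0xFF;
        this.count = count;
        this.offset = offset;
    }

    /** Serialises the header in network byte order (ByteBuffer is big-endian by default). */
    byte[] encode() {
        ByteBuffer buf = ByteBuffer.allocate(LENGTH);
        buf.put((byte) descriptor).putLong(count).putLong(offset);
        return buf.array();
    }

    /** Parses the first 17 bytes of raw into a header. */
    static EblockHeader decode(byte[] raw) {
        if (raw.length < LENGTH) {
            throw new IllegalArgumentException("header needs " + LENGTH + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(raw);
        int descriptor = buf.get() & 0xFF;
        long count = buf.getLong();
        long offset = buf.getLong();
        return new EblockHeader(descriptor, count, offset);
    }
}
```

A receiver would read 17 bytes, decode them, then read `count` payload bytes and write them at `offset` in the target file.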
The Architecture
[Diagram: layers as labelled on the slide]
• GridFTP interface
• Generic File System Framework
• Data Source Plugin
• Data Source
Generic File System Framework
• FileSystem creates FileSystemConnection
• FileSystemConnection creates FileObject
• FileObject creates RandomAccessFileObject
FileSystem interface
public String getSeparator();
public void init() throws IOException;
public FileSystemConnection createFileSystemConnection(GSSCredential credential) throws FtpConfigException, IOException;
public void exit();
FileSystemConnection interface
public FileObject getFileObject(String path);
public String getHomeDir();
public String getUser();
public void close() throws IOException;
public boolean isConnected();
public long getFreeSpace(String path);
FileObject interface
public String getName();
public String getPath();
public boolean exists();
public boolean isFile();
public boolean isDirectory();
public int getPermission();
public String getCanonicalPath() throws IOException;
public FileObject[] listFiles();
public long length();
public long lastModified();
public RandomAccessFileObject getRandomAccessFileObject(String type) throws IOException;
public boolean delete();
public FileObject getParent();
public boolean mkdir();
public boolean renameTo(FileObject file);
public boolean setLastModified(long t);
RandomAccessFileObject interface
public void seek(long offset) throws IOException;
public int read() throws IOException;
public int read(byte[] b) throws IOException;
public int read(byte[] b, int off, int len) throws IOException;
public void close() throws IOException;
public String readLine() throws IOException;
public void write(int b) throws IOException;
public void write(byte[] b) throws IOException;
public void write(byte[] b, int off, int len) throws IOException;
public long length() throws IOException;
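As an illustration of what a data source plugin has to supply at the lowest level, here is a minimal sketch of a RandomAccessFileObject backed by an in-memory byte array. The interface is copied from the slide so the example compiles on its own. The class `InMemoryRandomAccessFile` is hypothetical and not part of Griffin; a real adaptor would forward these calls to iRODS, a local file, or another store.

```java
import java.io.IOException;
import java.util.Arrays;

// Interface as listed on the slide, repeated here so the sketch is self-contained.
interface RandomAccessFileObject {
    void seek(long offset) throws IOException;
    int read() throws IOException;
    int read(byte[] b) throws IOException;
    int read(byte[] b, int off, int len) throws IOException;
    void close() throws IOException;
    String readLine() throws IOException;
    void write(int b) throws IOException;
    void write(byte[] b) throws IOException;
    void write(byte[] b, int off, int len) throws IOException;
    long length() throws IOException;
}

/** Hypothetical adaptor that keeps the "file" in a growable byte array. */
final class InMemoryRandomAccessFile implements RandomAccessFileObject {
    private byte[] data = new byte[0];
    private int pos = 0;

    public void seek(long offset) {
        if (offset < 0 || offset > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("unsupported offset " + offset);
        }
        pos = (int) offset;  // seeking past the end is allowed; a later write fills the gap
    }

    public int read() {
        return pos < data.length ? data[pos++] & 0xFF : -1;
    }

    public int read(byte[] b) {
        return read(b, 0, b.length);
    }

    public int read(byte[] b, int off, int len) {
        if (len == 0) return 0;
        if (pos >= data.length) return -1;  // end of file
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, b, off, n);
        pos += n;
        return n;
    }

    /** Reads up to the next '\n' (not included); returns null at end of file. */
    public String readLine() {
        if (pos >= data.length) return null;
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = read()) != -1 && c != '\n') {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public void write(int b) {
        write(new byte[] {(byte) b}, 0, 1);
    }

    public void write(byte[] b) {
        write(b, 0, b.length);
    }

    public void write(byte[] b, int off, int len) {
        if (pos + len > data.length) {
            data = Arrays.copyOf(data, pos + len);  // grows the array, zero-filling any gap
        }
        System.arraycopy(b, off, data, pos, len);
        pos += len;
    }

    public long length() {
        return data.length;
    }

    public void close() {
        // nothing to release for an in-memory buffer
    }
}
```

The seek-then-write pattern matters for GridFTP: with parallel data channels, blocks may arrive out of order, so the server writes each block at its own offset rather than appending.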
The Implementation - Griffin
[Diagram: component stack]
• Clients: GridFTP client, Grid job submission system, Data transfer service
• Griffin: GridFTP interface, Generic file system framework
• Adaptors: Adaptor for iRODS, Adaptor for local file system, Other adaptors
• Data sources: iRODS, Local File System, Other data source
Features
• GridFTP protocol version 1
• Java-based
  • Spring framework
  • OS-independent
• Lightweight, stand-alone, self-contained
  • No need to install Globus Toolkit
• Two plugins included
  • iRODS plugin
  • Local file system plugin
• Open source (Apache 2 & GPL)
Parallel transfer with Griffin
[Diagram: Client, Griffin, Data Source; links labelled WAN and LAN/localhost]
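One way to picture parallel transfer is as a plan that cuts a file into fixed-size blocks and hands them out to several data channels. The Java sketch below does this round-robin. It illustrates the idea only and is not Griffin's actual scheduling; the class name `ParallelBlockPlan` and the round-robin policy are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assigns file blocks to parallel streams round-robin. */
final class ParallelBlockPlan {

    /** One block of the file: where it starts and how many bytes it holds. */
    static final class Block {
        final long offset;
        final int length;

        Block(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Returns, for each stream, the ordered list of blocks that stream should send. */
    static List<List<Block>> plan(long fileLength, int blockSize, int streams) {
        if (fileLength < 0 || blockSize <= 0 || streams <= 0) {
            throw new IllegalArgumentException("invalid plan parameters");
        }
        List<List<Block>> perStream = new ArrayList<>();
        for (int i = 0; i < streams; i++) {
            perStream.add(new ArrayList<>());
        }
        long offset = 0;
        int next = 0;
        while (offset < fileLength) {
            // The last block may be shorter than blockSize.
            int len = (int) Math.min(blockSize, fileLength - offset);
            perStream.get(next).add(new Block(offset, len));
            offset += len;
            next = (next + 1) % streams;
        }
        return perStream;
    }
}
```

For a 10-byte file, 4-byte blocks and 2 streams, the first stream gets the blocks at offsets 0 and 8 and the second gets the block at offset 4. Because each block carries its own offset, the receiver can reassemble the file regardless of arrival order.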
Authentication
• GSI
• iRODS plugin
  • User mapping
• Local file system plugin
  • XML file
  • Maps GSI authentication (certificate DN) to internal user management system
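The mapping from a certificate DN to an internal user can be pictured as a simple lookup table. The Java sketch below shows that idea only. The slides do not show the format of Griffin's XML mapping file, so no parsing is attempted; the class `DnUserMapper` and the example DN are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical DN-to-user table; in Griffin the entries would come from configuration. */
final class DnUserMapper {
    private final Map<String, String> dnToUser = new LinkedHashMap<>();

    /** Registers a mapping from a certificate DN to an internal user name. */
    void add(String dn, String localUser) {
        dnToUser.put(normalize(dn), localUser);
    }

    /** Returns the internal user for the DN, or empty if the DN is not mapped. */
    Optional<String> lookup(String dn) {
        return Optional.ofNullable(dnToUser.get(normalize(dn)));
    }

    // Only trims surrounding whitespace; real DN comparison rules may be stricter.
    private static String normalize(String dn) {
        return dn.trim();
    }
}
```

A server would call `lookup` with the DN from the client's GSI credential and refuse the session when the result is empty.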
Use case
• Integration of the Grid and Data Fabric
  • iRODS plugin for Data Fabric
  • Third-party transfer to cluster (Globus GridFTP)
• Tested with
  • Globus.org
  • globus-url-copy (5.0 and 4.x)
  • Globus GridFTP GUI
Performance Evaluation
• Server: two quad-core Xeon 3.16GHz CPUs, 16GB memory
• Client: IBM xSeries 346 with two hyper-threaded Intel Xeon 3.20GHz CPUs, 4GB memory
• Network: 1Gbps LAN
• WAN: two 10Gbps links
• Transfer: 256MB, 512MB, 1GB, 2GB, 4GB, 8GB, 16GB
  • iCommands
  • globus-url-copy
Evaluation Setup - Griffin vs iCommands
[Diagram; components shown: Client, globus-url-copy, iCommands, Griffin, Jargon Adaptor, iRODS, Local File System]
Evaluation Setup - Griffin vs Globus GridFTP
[Diagram; components shown: Client, globus-url-copy, Griffin, Globus GridFTP server, Local FS Adaptor, Local File System]
Related work
• Client library
  • SAGA/jSAGA
  • Commons-vfs
• Data transfer service
  • Stork
  • PAFTP
• Globus
  • XIO
  • DSI
Conclusion
• A generic solution to connect arbitrary data sources to the grid
  • Data in/out of the grid
  • Data transfer between different data sources
• Java-based implementation
  • Standalone, lightweight
  • Pluggable
  • Does not depend on Globus
Future work
• Currently working on a plugin for MongoDB
• Java NIO
• UDP
• Striped transfer
MongoDB plugin
• MongoDB
  • NoSQL database
  • Stores JSON-style documents
• GridFS component
  • Stores files
• Plugin for Griffin
  • Read/write files via GridFS
Acknowledgements
• Funded by ARCS
Current Status
• ARCS production service
  • Used to transfer data in/out of ARCS Data Fabric
• Website
  • https://projects.arcs.org.au/trac/griffin
Thank you! Questions/Comments?