FAST: Flexible Automated Synchronization Transfer • Rosa Filgueira, University of Edinburgh • Iraklis Klampanos, University of Edinburgh • Yusuke Tanimura, AIST, Tsukuba • Malcolm Atkinson, University of Edinburgh
Index • Introduction • Problem description • Hypothesis • Rock Physics laboratory experiments • Objective • Proposal • Related developments • Data transfer protocols • Data transport systems • FAST • Selecting the best data transfer protocol • Data transfer experiments • Implementation and evaluation • Future work and Questions
Problem description • Large number of rock physics (RP) laboratories • Run many experiments (experimentalists) • Large number of rock physicists • Develop computational codes (code builders) • Sharing experimental data among this community is still in its early days • No facilities to transfer experimental data automatically in real time together with their associated description (metadata)
Problem description • Several tools provide reliable, high-performance data transfer capabilities • Dropbox or Globus Online • Not optimized for the RP requirements
Hypothesis • The RP community will benefit from a tool that • Transfers data and metadata in near-real time • Provides a repository and DB accessible from a website • For experimentalists • Collection and comparison of experiments from many labs • For code builders • Find test data for running their models
Laboratory experiment features I • Laboratory rock property measurements • Properties of the rock sample are studied under different conditions • High-pressure vessels apply pore pressures and stresses to a cylindrical rock sample • Until the sample fails, different features (e.g. stress, porosity, temperature) are recorded at several time intervals • In each interval, data are transferred to a local computer through a channel (one channel per rock sample)
RP laboratory experiment • [Photos: pressure vessel and rock samples, UCL RP laboratory]
Complex laboratory experiment: Creep-2 • Initial target: 30 months • Deployment under the sea (Mediterranean) • 8 rock samples with different features • Different time intervals and data sizes
Laboratory experiment features II • Each experiment can record data differently • Events can be written to a new file or appended to an existing one • Files can be stored in the same directory or not • Intervals for writing data can be short or long • Number of rock samples can be one or several • Duration of an experiment can be short or long • Transferring the data is a data-intensive problem
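The variety of recording styles above means a transfer tool first has to work out what changed between two observations of an experiment directory. The following is a minimal sketch of one way to do that, not FAST's actual implementation: the function names `snapshot` and `classify_changes` are hypothetical, and treating every file that grew as "appended" is an assumption.

```python
import os
from typing import Dict


def snapshot(directory: str) -> Dict[str, int]:
    """Map each file under `directory` (recursively) to its size in bytes."""
    sizes = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            sizes[os.path.relpath(path, directory)] = os.path.getsize(path)
    return sizes


def classify_changes(old: Dict[str, int], new: Dict[str, int]) -> Dict[str, str]:
    """Label each changed path as 'new', 'appended' or 'rewritten'.

    Assumption: a file that grew was appended to; a file that shrank
    was rewritten. Unchanged files are omitted from the result.
    """
    changes = {}
    for path, size in new.items():
        if path not in old:
            changes[path] = "new"
        elif size > old[path]:
            changes[path] = "appended"
        elif size < old[path]:
            changes[path] = "rewritten"
    return changes
```

Calling `snapshot` at each recording interval and passing consecutive results to `classify_changes` would tell the tool whether to send a whole new file or only the tail of an existing one.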
Objective • To transfer RP experimental data from one location to another • Automated data transfer until the end of the experiment • Transfer experimental data • Near real time and non-real time • Synchronization • Incremental (file) and directory • Tolerate possible interruptions and failures • Record and transfer the metadata
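Incremental synchronization that survives interruptions can be illustrated with a persisted byte offset. This is a sketch under stated assumptions rather than the method used by FAST: a local file copy stands in for the network transfer, and the names `load_offset`, `sync_appended` and the JSON state file are hypothetical.

```python
import json
import os


def load_offset(state_path: str) -> int:
    """Return the byte offset saved by a previous run, or 0 if none exists."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path) as f:
        return json.load(f)["offset"]


def sync_appended(source: str, dest: str, state_path: str) -> int:
    """Copy the bytes appended to `source` since the last call.

    The offset is stored on disk, so a restarted process resumes
    where the previous one stopped. Returns the number of bytes copied.
    """
    offset = load_offset(state_path)
    with open(source, "rb") as src:
        src.seek(offset)
        chunk = src.read()
    if chunk:
        # Stand-in for the real transfer: append to a local destination file.
        with open(dest, "ab") as dst:
            dst.write(chunk)
        with open(state_path, "w") as f:
            json.dump({"offset": offset + len(chunk)}, f)
    return len(chunk)
```

A real tool would also have to handle a failure between writing the data and saving the offset, for example by comparing the saved offset with the size of the destination file on restart.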
Proposal • FAST: Flexible automated synchronization transfer • Data and metadata in real time and non-real time • Incremental (file) and directory sync • Selection of the data-transfer protocol • Compatible with all operating systems • Simple to set up and manage • Monitors the transmission, detects errors and recovers from them • Data collected in a repository, metadata in a DB, and a website for accessing them • Proposal is triggered by our work • EFFORT project • Using data provided by the Creep-2 project
Data transfer protocols - TCP • File Transfer Protocol (FTP) • Control and data are unencrypted • Easy to use, lacks security • FTP security extension (FTPS) • Control encrypted (TLS or SSL), but data might not be • Secure Copy (SCP) • SSH for transferring data and authentication (more secure than the previous ones) • File transfer only • Ideal for quick transfer of single files • SSH File Transfer Protocol (SFTP) • Based on SSH-2: best for secure access (packet confirmation) • File transfer, creation and deletion of remote directories and files • Directory synchronization • Rsync • Incremental file transfer (delta algorithm) • File and directory synchronization • Can provide encrypted transfer by using SSH • On-the-fly compression option • Ideal for backups
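The idea behind delta-based incremental transfer can be shown with a deliberately simplified sketch. It is not the Rsync algorithm itself: Rsync uses a rolling checksum so that matches are found at any byte offset, whereas this version only compares fixed, aligned blocks. All names (`BLOCK`, `block_hashes`, `make_delta`, `apply_delta`) are hypothetical.

```python
import hashlib
from typing import List, Tuple

BLOCK = 4  # tiny block size for illustration; real tools use far larger blocks


def block_hashes(data: bytes) -> List[str]:
    """Receiver side: hash every aligned block of the copy it already holds."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]


def make_delta(old_hashes: List[str], new: bytes) -> List[Tuple[str, object]]:
    """Sender side: emit ('copy', block_index) for blocks the receiver
    already has and ('data', raw_bytes) for everything else."""
    index = {h: i for i, h in enumerate(old_hashes)}
    delta = []
    for i in range(0, len(new), BLOCK):
        block = new[i:i + BLOCK]
        digest = hashlib.sha256(block).hexdigest()
        if digest in index:
            delta.append(("copy", index[digest]))
        else:
            delta.append(("data", block))
    return delta


def apply_delta(old: bytes, delta: List[Tuple[str, object]]) -> bytes:
    """Receiver side: rebuild the new file from the old copy plus the delta."""
    out = bytearray()
    for kind, value in delta:
        if kind == "copy":
            out += old[value * BLOCK:(value + 1) * BLOCK]
        else:
            out += value
    return bytes(out)
```

For a file that only grows by appending, which is one of the recording styles in these experiments, every complete earlier block becomes a `copy` instruction and only the new tail travels over the network.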
Data transfer protocols - UDP • UDP-based Data Transfer (UDT) • UDP-based protocol for data-intensive applications • UDT can transfer data at a higher speed than TCP-based protocols • UDT Enabled Rsync (UDR) • Uses Rsync as the transport mechanism (delta) • Sends data over the UDT protocol • Ideal for large data over long distances
Data transport systems • GridFTP • High-performance, secure, reliable data transfer over high-bandwidth networks • Many-to-many • Difficult to use • Globus Online • Uses the GridFTP protocol • Automates the management of files: monitoring performance, retrying files, recovering from failures • Does not support file synchronization • Dropbox • Centralized cloud storage, file and directory synchronization • Rsync-delta protocol • Data stored on Amazon S3 (third party) • One-to-one file transfer • BTSync • Decentralized cloud storage, P2P file synchronization (no third party) • Devices communicate over UDP • Many-to-many file transfers • WinSCP • SFTP and FTP client for Windows
Data transport systems • Email from Globus Online Support: "We recently noticed that you are creating many CLI sessions to cli.globusonline.org, each with a single blocking transfer. This is a suboptimal way to use Globus Online and in fact is causing us some resource usage issues."
Data transport systems • Previous tools • Use different data-transfer protocols • Some automate data synchronization • None of them • Selects the best protocol depending on requirements • Provides methods for tracking metadata and transferring it • Our work automatically • Selects a protocol among FTPS, SFTP, Rsync, and UDR • Injects a minimum of metadata • GridFTP and P2P discarded: our communications are one-to-one • FTPS used instead of FTP: minimum security level • SFTP derives from SCP
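The slides do not list which metadata fields are injected, so the following is only a sketch of what a minimal per-file record could look like. Every field name, the `build_metadata` function and its `experiment_id` and `sample_id` parameters are assumptions made for illustration.

```python
import hashlib
import os
from datetime import datetime, timezone


def build_metadata(path: str, experiment_id: str, sample_id: str) -> dict:
    """Build a small, JSON-serializable description of one data file.

    All field names here are hypothetical; they are not taken from FAST.
    """
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    stat = os.stat(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "experiment_id": experiment_id,
        "sample_id": sample_id,
        "file_name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "sha256": digest,  # lets the receiver verify the transferred file
        "modified_utc": modified.isoformat(),
    }
```

A record of this kind could travel alongside each file and be inserted into the metadata database on arrival, with the checksum doubling as an integrity check.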
Selecting the best protocol • FTPS, SFTP, Rsync and UDR
Data transfer experiments - Same local network • Two machines located in Edinburgh • VLAN network, 100MB/s • Synthetic program to generate events • Data sizes written to files: 50KB, 500KB, 1MB, 10MB, 100MB, 500MB, 1GB and 10GB • Measures: transfer rate and elapsed time • Repetitions: 10
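The measurement procedure (elapsed time and transfer rate, averaged over 10 repetitions) can be sketched as a small harness. This is an illustration, not the authors' benchmark code: `benchmark` and its `transfer` callable are hypothetical, and the rate is reported in decimal megabytes per second as an assumption.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(transfer: Callable[[], None], size_bytes: int,
              repetitions: int = 10) -> Dict[str, float]:
    """Run `transfer` several times and report mean elapsed time and rate.

    `transfer` is any zero-argument callable that performs one complete
    transfer of `size_bytes` bytes (for example, a wrapper around a client).
    """
    elapsed = []
    for _ in range(repetitions):
        start = time.perf_counter()
        transfer()
        elapsed.append(time.perf_counter() - start)
    mean_s = statistics.mean(elapsed)
    # Guard against a zero reading on clocks with coarse resolution.
    rate = (size_bytes / 1e6) / mean_s if mean_s > 0 else float("inf")
    return {"mean_elapsed_s": mean_s, "mean_rate_MBps": rate}
```

Running the same harness once per protocol and per file size would produce the kind of comparison table that a protocol-selection rule can be derived from.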
Data transfer experiments - Same local network • SFTP is fastest for files < 500MB • Rsync is fastest for files >= 500MB • Results measured without compression
Data transfer experiments - Different networks • UDR has been specially designed for • Large data transfers over long distances • UDR vs Rsync using two machines • Located in different local networks • University of Edinburgh, 1GbE • AIST-Tsukuba, 10GbE • Generated files: 1MB, 500MB, 1GB, 10GB and 30GB
Data transfer experiments - Different networks • UDR is fastest • Results measured without compression
Implementation and evaluation • Front-end: GUI using Java Swing • Back-end: Decision tree • Data and metadata • Data stored in a remote repository (NAS) • Metadata collected in a remote database (MySQL) • Science gateway (web tool) connected with the repository and database • Searching • Visualizing • Analyzing • Downloading
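The slides do not show the decision tree itself, so the sketch below encodes only the two experimental findings reported earlier (SFTP or Rsync on the same local network depending on file size, UDR across different networks, all without compression). The function name `select_protocol`, its parameters and the decimal definition of a megabyte are assumptions. FTPS is also a candidate in FAST, but the slides do not state when it is chosen, so it is not returned here.

```python
MB = 1000 ** 2  # assumption: the slides' "MB" is read as decimal megabytes


def select_protocol(file_size_bytes: int, same_network: bool) -> str:
    """Pick a transfer protocol from the reported measurements.

    Illustrative only: the real FAST decision tree may use more inputs
    (security level, synchronization type, compression, and so on).
    """
    if not same_network:
        # Different networks: UDR was reported as fastest.
        return "UDR"
    if file_size_bytes < 500 * MB:
        # Same local network, files below 500MB: SFTP was fastest.
        return "SFTP"
    # Same local network, files of 500MB or more: Rsync was fastest.
    return "Rsync"
```

For example, `select_protocol(50_000, same_network=True)` returns `"SFTP"`, while a 1GB file between Edinburgh and Tsukuba would be sent with `"UDR"`.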
Implementation and evaluation • FAST has been evaluated: • By using synthetic programs for generating data • Real time and non-real time • For each type of synchronization • Different data sizes and different types of network locations • Short- and long-term experiments • Stop and restart • By transferring data from a real rock physics experiment • Laboratories: UCL (London) and Edinburgh • Duration: 45 days • Interval: every minute • Rock samples: 1
Future work • Use FAST in the Creep-2 experiment • Implement FAST policies • Data available in the repository for specific users during a reasonable period • Sharing data from many-to-many locations • Decision tree • Automate its generation and maintenance • Keep it up to date by measuring transfers • Use FAST in more rock physics laboratories • Use FAST in other disciplines
Thanks & Questions • email: rosa.filgueira@ed.ac.uk