Objective • What is RFT ? • How does it work • Architecture of RFT • RFT and OGSA • Issues • Demo • Questions
What is Reliability • The ability of a system or component to perform its required functions under stated conditions for a specified period of time. (IEEE) • What is reliability in the context of file transfer? (What is the scope of the problem?) • How much of it do we want to address? • Reliability can mean different things to different people • Hash out a definition that is general and acceptable to a wide range of applications.
Our Goal • To design and implement a service that allows byte streams to be transferred in a reliable manner • Reliability, in our context, means that problems of less than a certain, user-defined magnitude are dealt with automatically. • Build prototypes using different technologies. • Java • Web Services • Etc.
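To make "problems of less than a certain, user-defined magnitude are dealt with automatically" concrete, here is a minimal sketch that measures magnitude as a count of failed attempts. That measure is an assumption; the slides do not say how magnitude is defined. The class `RetryingTransfer` and its parameter `maxFailures` are hypothetical names, not part of RFT.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: absorbs failures up to a user-defined limit, then gives up. */
final class RetryingTransfer {
    // Assumption: "magnitude" is modelled as the number of failed attempts tolerated.
    private final int maxFailures;

    RetryingTransfer(int maxFailures) {
        if (maxFailures < 0) {
            throw new IllegalArgumentException("maxFailures must be >= 0");
        }
        this.maxFailures = maxFailures;
    }

    /** Runs the attempt until it succeeds; rethrows once failures exceed maxFailures. */
    <T> T run(Callable<T> attempt) throws Exception {
        int failures = 0;
        while (true) {
            try {
                return attempt.call();
            } catch (Exception e) {
                failures++;
                if (failures > maxFailures) {
                    throw e; // problem is larger than the user agreed to tolerate
                }
                // otherwise: handled automatically by trying again
            }
        }
    }
}
```

A caller would wrap one transfer attempt in a `Callable` and pass it to `run`; a real service would likely also wait between attempts and distinguish retryable from fatal errors.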
Our Goal (cont..) • A non-user-based service • GridFTP already provides restart markers for recovery, but the client needs to stay active. • Loss of the client requires a manual restart from scratch • Store transfer state persistently • Recover from a set of failure conditions reliably
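The idea of storing transfer state persistently so that a lost client does not force a restart from scratch can be sketched as follows. This is an illustration only: it copies a local file rather than speaking GridFTP, and it keeps the restart marker in a plain file, whereas the slides later mention a database. `CheckpointedCopy` and the `maxChunks` parameter (used here to simulate an interruption) are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: chunked copy that records a restart marker after each chunk. */
final class CheckpointedCopy {
    private final Path source;
    private final Path target;
    private final Path marker; // file holding the last committed byte offset
    private final int chunkSize;

    CheckpointedCopy(Path source, Path target, Path marker, int chunkSize) {
        this.source = source;
        this.target = target;
        this.marker = marker;
        this.chunkSize = chunkSize;
    }

    /** Returns the offset saved by a previous run, or 0 if none exists. */
    long lastCommittedOffset() throws IOException {
        if (!Files.exists(marker)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(marker), StandardCharsets.UTF_8);
        return Long.parseLong(text.trim());
    }

    /** Copies up to maxChunks chunks from the saved offset; true when the file is complete. */
    boolean copy(int maxChunks) throws IOException {
        long offset = lastCommittedOffset();
        try (RandomAccessFile in = new RandomAccessFile(source.toFile(), "r");
             RandomAccessFile out = new RandomAccessFile(target.toFile(), "rw")) {
            byte[] buf = new byte[chunkSize];
            in.seek(offset);
            out.seek(offset);
            for (int i = 0; i < maxChunks; i++) {
                int n = in.read(buf);
                if (n < 0) {
                    return true; // end of source reached
                }
                out.write(buf, 0, n);
                out.getFD().sync(); // flush data before advancing the marker
                offset += n;
                Files.write(marker, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
            }
            return offset >= in.length();
        }
    }
}
```

After a crash, a new `CheckpointedCopy` over the same three paths picks up at the recorded offset. Writing the marker only after the data is flushed means a stale marker can cause a chunk to be copied twice but never skipped.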
Failure Conditions • List of Failure conditions we want to address • Network Failures like dropped connections • Machine crashes • Temporary Network outages • Failure of File Systems • Etc…
Interface • submitTransfer() • Set of URLs • File size for partial file transfers • getStatus() • cancelTransfer() • resumeTransfer()
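One possible Java rendering of the four operations listed above is sketched below. The slide gives only the operation names and the two inputs to `submitTransfer()`, so every signature, the `TransferState` values, and the rule that `resumeTransfer()` applies only to a cancelled transfer are assumptions. `StubTransfer` is a toy that tracks state and moves no bytes.

```java
import java.net.URI;
import java.util.List;

/** Assumed states; the slides do not enumerate them. */
enum TransferState { ACTIVE, CANCELLED }

/** The four operations from the slide; parameter and return types are assumptions. */
interface ReliableFileTransfer {
    void submitTransfer(List<URI> urls, long partialFileSize);
    TransferState getStatus();
    void cancelTransfer();
    void resumeTransfer();
}

/** Toy in-memory implementation used only to show the call sequence. */
final class StubTransfer implements ReliableFileTransfer {
    private TransferState state; // null until something is submitted

    @Override
    public void submitTransfer(List<URI> urls, long partialFileSize) {
        if (urls == null || urls.isEmpty()) {
            throw new IllegalArgumentException("at least one URL is required");
        }
        state = TransferState.ACTIVE;
    }

    @Override
    public TransferState getStatus() {
        if (state == null) {
            throw new IllegalStateException("nothing submitted");
        }
        return state;
    }

    @Override
    public void cancelTransfer() {
        require(TransferState.ACTIVE);
        state = TransferState.CANCELLED;
    }

    @Override
    public void resumeTransfer() {
        require(TransferState.CANCELLED); // assumption: resume follows a cancel
        state = TransferState.ACTIVE;
    }

    private void require(TransferState expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }
}
```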
OGSA and RFT • How does RFT fit in OGSA? • Things that are different from SC Demo • RFT is a Web Service • Single Transfer Reliability • Service Definition in WSDL • Talks XML over SOAP just like any standard Web Service
RFT Web Service Interface • submitTransferJob() • Input message: fromURL and toURL (strings) • Output message: transferJobID (integer) • commitTransferJob() • Input message: transferJobID • Output message: transferJobID • getStatus() • Input message: transferJobID • Output message: status (integer) • getStatistics() • Input message: transferJobID • Output message: Statistics (complex type)
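The message shapes above map naturally onto plain Java methods. The following in-memory sketch mirrors the four operations and their input and output types; it involves no SOAP or WSDL. The status code values, the fields of `Statistics`, and the reading of `commitTransferJob()` as "confirm a previously submitted job" are assumptions, since the slide specifies none of them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the RFT web service operations. */
final class InMemoryRftService {
    // Assumed integer status codes; the slide only says the status is an integer.
    static final int STATUS_SUBMITTED = 0;
    static final int STATUS_COMMITTED = 1;

    /** Assumed contents of the "Statistics" complex type. */
    static final class Statistics {
        final String fromURL;
        final String toURL;
        final int status;

        Statistics(String fromURL, String toURL, int status) {
            this.fromURL = fromURL;
            this.toURL = toURL;
            this.status = status;
        }
    }

    private static final class Job {
        final String fromURL;
        final String toURL;
        int status = STATUS_SUBMITTED;

        Job(String fromURL, String toURL) {
            this.fromURL = fromURL;
            this.toURL = toURL;
        }
    }

    private final Map<Integer, Job> jobs = new HashMap<>();
    private int nextId = 1;

    /** Input: fromURL and toURL. Output: a new transferJobID. */
    synchronized int submitTransferJob(String fromURL, String toURL) {
        int id = nextId++;
        jobs.put(id, new Job(fromURL, toURL));
        return id;
    }

    /** Input: transferJobID. Output: the same transferJobID. */
    synchronized int commitTransferJob(int transferJobID) {
        lookup(transferJobID).status = STATUS_COMMITTED;
        return transferJobID;
    }

    /** Input: transferJobID. Output: integer status. */
    synchronized int getStatus(int transferJobID) {
        return lookup(transferJobID).status;
    }

    /** Input: transferJobID. Output: a snapshot of the job as a Statistics value. */
    synchronized Statistics getStatistics(int transferJobID) {
        Job job = lookup(transferJobID);
        return new Statistics(job.fromURL, job.toURL, job.status);
    }

    private Job lookup(int transferJobID) {
        Job job = jobs.get(transferJobID);
        if (job == null) {
            throw new IllegalArgumentException("unknown transferJobID: " + transferJobID);
        }
        return job;
    }
}
```

Returning an ID from `submitTransferJob()` and using it in every later call is what lets the client disconnect and reconnect: the job's identity lives with the service, not with the client session.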
Our Experience • SC2001 Demo • List of Tests • Longest – 3 days transferring 0.3 Terabytes of data from ANL to NERSC • Failures recovered from • NFS Failures • Network outages • Server crashes
Issues • Language • Prototypes are in Java • Language should not matter, since it is a service whose interface is a socket • Persistence Mechanism • We used PostgreSQL as the database to store the transfer state • Can we use a file-based persistence mechanism? • Scalability • Multiple instances of RFT which may appear as a single logical entity • Request redirection?
Issues (cont..) • XRM Functionality • Reservations for disk and bandwidth • Higher Level Services • Interaction between RFT and Higher level Services like Reliable Replication Service • Scheduler ?? • Services like NWS that can give performance estimates • ComputeJob Submission • File transfer as a Job ?
Issues (cont..) • Security • CAS • Proxy renewal • Error Propagation
DEMO • SC2001 Demo • OGSA and RFT
More Info • http://www.mcs.anl.gov/~madduri/RFT.html