
A Managed Object Placement Service (MOPS) using NEST and GridFTP

Dr. Dan Fraser, John Bresnahan, Nick LeRoy, Mike Link, Miron Livny, Raj Kettimuthu. SciDAC Center for Enabling Distributed Petascale Science (CEDPS), www.cedps.org.


Presentation Transcript


  1. A Managed Object Placement Service (MOPS) using NEST and GridFTP. Dr. Dan Fraser, John Bresnahan, Nick LeRoy, Mike Link, Miron Livny, Raj Kettimuthu. SciDAC Center for Enabling Distributed Petascale Science (CEDPS), www.cedps.org

  2. Overview • Brief CEDPS overview • Focus on data movement • Managed Object Placement Service (MOPS) • Internal resource management (awareness) • GFork capability • External awareness & interaction • NEST (Network Storage Technology)

  3. Petascale Data Challenge • DOE facilities generate many petabytes of data (2 petabytes = all U.S. academic research libraries!) • Remote users (at labs, universities, industry) need data! • Rapid, reliable access is key to maximizing the value of $B facilities [Figure: massive data at DOE facilities, accessed by remote distributed users]

  4. Bridging the Divide (1): Move Data to Users When & Where Needed • “Deliver this 100 Terabytes to locations A, B, C by 9am tomorrow” • Fast: >10,000x faster than usual Internet • Reliable: recover from many failures • Predictable: data arrives when scheduled • Secure: protect expensive resources & data • Scalable: deal with many users & much data
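To make “Fast” concrete, here is a back-of-the-envelope calculation of the sustained rate the example request implies. It assumes decimal units (1 TB = 10^12 bytes) and a 15-hour window (say 6pm to 9am); both are illustrative choices, not figures from the talk, and `required_gbps` is a hypothetical helper.

```python
# Sustained rate needed to meet a delivery deadline.
# Assumptions (not from the slides): decimal units, 1 TB = 10**12 bytes,
# and a 15-hour window such as 6pm today to 9am tomorrow.


def required_gbps(total_bytes: float, window_seconds: float) -> float:
    """Return the sustained rate in Gbit/s needed to move total_bytes in window_seconds."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return total_bytes * 8 / window_seconds / 1e9


if __name__ == "__main__":
    rate = required_gbps(100 * 10**12, 15 * 3600)
    print(f"{rate:.1f} Gbit/s per destination")  # prints "14.8 Gbit/s per destination"
```

Under these assumptions each of A, B, and C needs roughly 14.8 Gbit/s sustained; a single source sending three independent copies would need about three times that.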

  5. Bridging the Divide (2): Allow Users to Move Computation Near Data • “Perform my computation F on datasets X, Y, Z” • Science services: provide analysis functions near data source • Flexible: easy integration of functions • Secure: protect expensive resources & data • Scalable: deal with many users & much data

  6. Bridging the Divide (3): Troubleshoot End-to-End Problems • “Why did my data transfer (or remote operation) fail?” • Identify & diagnose failures & performance problems • Instrument: include monitoring points in all system components • Monitor: collect data in response to problems • Diagnose: identify the source of problems

  7. What is GridFTP? • Widely used, open source, production-quality data mover • Separate control and data channels • Parallel streams (~3-5x faster than a single TCP stream) • Parallel stripes (multiple servers) • Partial file transfer • Multiple security options (GSI, SSH) • Third-party control • Extensible for both file systems & protocols
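A simplified sketch of how parallel streams and partial file transfer fit together: a byte range (the whole file, or only part of it) is cut into offset-tagged blocks, and the blocks are dealt out over several connections. The block size, the round-robin assignment, and the function names are assumptions for illustration; this is not GridFTP's wire protocol, although its extended block mode likewise tags each block with an offset so blocks may arrive out of order.

```python
# Illustrative only: plan which (offset, length) blocks go over which stream.
from typing import List, Tuple

Block = Tuple[int, int]  # (offset, length) in bytes


def split_into_blocks(start: int, length: int, block_size: int) -> List[Block]:
    """Cut [start, start + length) into blocks; a nonzero start models a partial transfer."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks: List[Block] = []
    offset, end = start, start + length
    while offset < end:
        n = min(block_size, end - offset)
        blocks.append((offset, n))
        offset += n
    return blocks


def assign_round_robin(blocks: List[Block], streams: int) -> List[List[Block]]:
    """Distribute blocks across `streams` parallel connections, one after another."""
    if streams <= 0:
        raise ValueError("streams must be positive")
    per_stream: List[List[Block]] = [[] for _ in range(streams)]
    for i, block in enumerate(blocks):
        per_stream[i % streams].append(block)
    return per_stream


if __name__ == "__main__":
    # Partial transfer of bytes 1000..10999 over 4 streams with 2500-byte blocks.
    for s, blocks in enumerate(assign_round_robin(split_into_blocks(1000, 10_000, 2500), 4)):
        print(s, blocks)
```

Because every block carries its own offset, the receiver can write it in place regardless of which stream delivered it or in what order.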

  8. GridFTP Modularity [Diagram layers] • Clients (client interfaces): globus-url-copy, C library, RFT (3rd party) • GridFTP server: separate control and data channels, striping • File systems (Data Storage Interfaces, DSI): POSIX, SRB, HPSS, NEST • I/O (XIO drivers): TCP, UDT (UDP), parallel streams, GSI, SSH
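The DSI idea, one server core with interchangeable storage back ends, can be sketched as an abstract interface. `StorageInterface`, `InMemoryStorage`, and `store` are hypothetical names for illustration; the real DSI is a C API inside the Globus GridFTP server.

```python
# Sketch of a pluggable storage back end in the spirit of the DSI layer.
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Tuple


class StorageInterface(ABC):
    """What the server core needs from any back end (POSIX, HPSS, NEST, ...)."""

    @abstractmethod
    def write_block(self, path: str, offset: int, data: bytes) -> None: ...

    @abstractmethod
    def read_block(self, path: str, offset: int, length: int) -> bytes: ...


class InMemoryStorage(StorageInterface):
    """Toy back end that keeps each file as a bytearray."""

    def __init__(self) -> None:
        self.files: Dict[str, bytearray] = {}

    def write_block(self, path: str, offset: int, data: bytes) -> None:
        buf = self.files.setdefault(path, bytearray())
        end = offset + len(data)
        if len(buf) < end:
            buf.extend(b"\0" * (end - len(buf)))  # grow, zero-filling any gap
        buf[offset:end] = data

    def read_block(self, path: str, offset: int, length: int) -> bytes:
        return bytes(self.files[path][offset:offset + length])


def store(backend: StorageInterface, path: str, blocks: Iterable[Tuple[int, bytes]]) -> None:
    """Server-core code: writes (offset, data) blocks without knowing the back end."""
    for offset, data in blocks:
        backend.write_block(path, offset, data)


if __name__ == "__main__":
    disk = InMemoryStorage()
    store(disk, "/demo", [(5, b"world"), (0, b"hello")])  # blocks arrive out of order
    print(disk.read_block("/demo", 0, 10))  # b'helloworld'
```

Swapping `InMemoryStorage` for another subclass changes where bytes land without touching `store`, which is the point of keeping the storage layer behind an interface.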

  9. GridFTP Advanced Configurations • GFork (Internal awareness) • Robust Unix fork/setuid model • Allows server state to be maintained across connections • Dynamic backends • Stability in the event of backend failure • Growing resource pools for peak demands • Storage/Access Allocation (External awareness) • NEST (Network Storage Technology)

  10. Why is awareness important? • Currently, GridFTP does everything it is asked • If asked, GridFTP in a worst-case scenario could: • Use all available memory & buffers on the server • Write until the file system is full • Slow down all transfers when overloaded (worst-case scenarios do not happen very often) • Many tools are designed to work around these limitations: SRM, dCache, … • Services should be able to protect both themselves and their environments
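One way a server could protect itself and its environment is to check each request against current resources before accepting it. This is a minimal sketch under assumed policy knobs (`max_active_transfers`, `min_free_fraction`); they and the `admit` function are hypothetical and are not GridFTP configuration options. `shutil.disk_usage` is the standard-library call used to read free space.

```python
# Sketch of admission control: refuse work that would exhaust resources.
import shutil
from dataclasses import dataclass


@dataclass
class Policy:
    max_active_transfers: int = 16   # hypothetical concurrency limit
    min_free_fraction: float = 0.05  # always leave 5% of the file system free


def admit(request_bytes: int, free_bytes: int, total_bytes: int,
          active_transfers: int, policy: Policy) -> bool:
    """Return True if the write can be accepted without exhausting resources."""
    if active_transfers >= policy.max_active_transfers:
        return False  # avoid slowing every transfer down when overloaded
    reserve = int(total_bytes * policy.min_free_fraction)
    return request_bytes <= free_bytes - reserve  # never write the disk full


if __name__ == "__main__":
    usage = shutil.disk_usage("/")  # named tuple: total, used, free
    ok = admit(10 * 2**30, usage.free, usage.total, active_transfers=3, policy=Policy())
    print("accept 10 GiB upload:", ok)
```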

  11. GFork (Internal Awareness) [Diagram: server host running the GFork server with a GridFTP plugin; client control channel connections are forked into separate GridFTP server instances, linked back to GFork by inherited links and a state sharing link.]
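The fork-per-connection pattern behind GFork can be sketched on Unix with `os.fork` and `os.pipe`: a long-lived parent forks one worker per connection and keeps a pipe that every worker inherits, standing in for the state sharing link. Because the parent outlives individual connections, it can accumulate state across them (here, total bytes reported). Connections are simulated; sockets, setuid, and the GridFTP plugin are omitted, and `serve` is a hypothetical name.

```python
# Unix-only sketch of fork-per-connection with an inherited state link.
import os
from typing import List


def serve(connections: List[int]) -> int:
    """Fork one worker per simulated connection; return total bytes the workers report."""
    read_fd, write_fd = os.pipe()  # state link, inherited by every child
    children = []
    for conn_bytes in connections:
        pid = os.fork()
        if pid == 0:  # child: one server instance for one connection
            os.close(read_fd)
            # Pretend the transfer happened, then report to the parent.
            # Writes this small are atomic on a pipe, so lines do not interleave.
            os.write(write_fd, f"{conn_bytes}\n".encode())
            os._exit(0)  # leave without running the parent's code
        children.append(pid)
    os.close(write_fd)  # parent keeps only the read end
    for pid in children:
        os.waitpid(pid, 0)  # a failed child is a separate process; the parent carries on
    with os.fdopen(read_fd) as link:
        return sum(int(line) for line in link)


if __name__ == "__main__":
    print(serve([100, 250, 50]))  # 400
```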

  12. External Awareness: Why storage allocations? • Users need both temporary storage and long-term guaranteed storage. • Administrators need a storage solution with configurable limits and policy. • Administrators will benefit from NeST’s autonomous reclamation of expired storage allocations.

  13. External Awareness: GridFTP + NeST [Diagram: a NeST client performs lot operations, etc. with the NeST server (negotiator); globus-url-copy performs file transfers over GSI-FTP with the GridFTP server, which reaches NeST through a NeST callout; disk storage.]

  14. Overview of NeST • NeST: Network Storage Technology • Lightweight: Configuration and installation can be performed in minutes. • Multi-protocol: Supports Chirp, GridFTP, NFS, HTTP • Chirp is NeST’s internal protocol • Secure: GSI authentication • Allocation: NeST negotiates “mini storage contracts” between users and server.

  15. Storage allocations in NeST • Lot – abstraction for storage allocation with an associated handle • Handle is used for all subsequent operations on this lot • Client requests a lot of a specified size and duration; the server accepts or rejects the request.
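The lot abstraction can be sketched as a small allocation table. `LotManager`, the capacity check, and the expiry rule are illustrative assumptions, not NeST's implementation; the explicit `now` argument keeps the example deterministic. Expired lots are reclaimed before each new request, echoing the autonomous reclamation of expired allocations from slide 12.

```python
# Hypothetical lot table: request size + duration, get a handle or a rejection.
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lot:
    owner: str
    size: int          # bytes
    expires_at: float  # absolute time, seconds


class LotManager:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lots: Dict[int, Lot] = {}
        self._handles = itertools.count(1)

    def _reclaim_expired(self, now: float) -> None:
        """Drop lots whose duration has run out."""
        self.lots = {h: lot for h, lot in self.lots.items() if lot.expires_at > now}

    def request(self, owner: str, size: int, duration: float, now: float) -> Optional[int]:
        """Return a handle if the lot is granted, or None if it is rejected."""
        self._reclaim_expired(now)
        allocated = sum(lot.size for lot in self.lots.values())
        if size <= 0 or duration <= 0 or allocated + size > self.capacity:
            return None
        handle = next(self._handles)
        self.lots[handle] = Lot(owner, size, now + duration)
        return handle
```

For example, with `LotManager(100)`, a 60-byte, 10-second request at `now=0` returns handle 1; a further 50-byte request at `now=0` is rejected (`None`); the same request at `now=11` succeeds with handle 2 because the first lot has expired.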

  16. External Awareness Architecture [Diagram: a client connects to the GridFTP server; the server's main codebase calls an ACL plugin, which consults NEST, and a DSI plugin.]

  17. ACL Plugin • Authorize/Init • Grant access yes/no • Plugin establishes context (initializes state for future requests) • Create/Modify/Read a file • Given pathname and size • Creates a transaction • Update Transaction • Plugin may time out waiting • Progressively commit bytes as ‘complete’ • Finished flag
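The plugin calls listed above can be sketched as a Python interface with a toy quota-enforcing implementation. The class and method names (`AclPlugin`, `QuotaPlugin`, `authorize`, `create`, `update`) and the quota policy are hypothetical; the slide only specifies what each call is for.

```python
# Hypothetical ACL plugin interface plus a per-session quota plugin.
from abc import ABC, abstractmethod
from typing import Dict, Optional


class AclPlugin(ABC):
    @abstractmethod
    def authorize(self, identity: str) -> bool:
        """Grant access yes/no and set up per-session context."""

    @abstractmethod
    def create(self, path: str, size: int) -> bool:
        """Open a transaction for creating/modifying/reading `path`."""

    @abstractmethod
    def update(self, path: str, committed: int, finished: bool) -> None:
        """Progressively commit bytes; `finished` ends the transaction."""


class QuotaPlugin(AclPlugin):
    def __init__(self, quotas: Dict[str, int]) -> None:
        self.quotas = quotas                # bytes each identity may use
        self.user: Optional[str] = None     # context set by authorize()
        self.open: Dict[str, int] = {}      # path -> bytes requested
        self.progress: Dict[str, int] = {}  # path -> bytes committed so far
        self.stored: Dict[str, int] = {}    # path -> final size of finished files

    def authorize(self, identity: str) -> bool:
        if identity not in self.quotas:
            return False
        self.user = identity
        return True

    def create(self, path: str, size: int) -> bool:
        in_use = sum(self.open.values()) + sum(self.stored.values())
        if self.user is None or in_use + size > self.quotas[self.user]:
            return False
        self.open[path] = size
        return True

    def update(self, path: str, committed: int, finished: bool) -> None:
        self.progress[path] = committed    # bytes known to be safely written
        if finished:
            self.open.pop(path, None)      # release the reservation...
            self.stored[path] = committed  # ...and charge what was actually written
```

For instance, with a 100-byte quota, creating an 80-byte file succeeds, a second 30-byte create is refused while the first is open, and it succeeds once the first finishes at 70 bytes.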

  18. Granting Access [Sequence diagram: client, GridFTP server (main codebase, ACL plugin, DSI plugin), NEST] • Client connects • GSI handshake • Now-known ID sent to the auth (ACL) plugin • Plugin does whatever is needed to determine whether access is allowed (asks NEST “Allow?”; answer Y) • Client notified of access (reply 230)
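The access decision reduces to mapping the plugin's yes/no answer onto an FTP reply. 230 is the success reply shown on the slide; using 530 for a refusal is an assumption (it is the standard FTP "not logged in" reply). `login` and the `allow` callback are hypothetical names.

```python
# Sketch: turn the authorization plugin's verdict into the reply sent to the client.
from typing import Callable


def login(identity: str, allow: Callable[[str], bool]) -> str:
    """Return the FTP reply line for an identity established by the GSI handshake."""
    # `allow` stands in for the ACL plugin, which may consult NEST or anything else.
    if allow(identity):
        return "230 User logged in, proceed."
    return "530 Not logged in."


if __name__ == "__main__":
    allowed = {"/O=Grid/CN=alice"}
    print(login("/O=Grid/CN=alice", lambda who: who in allowed))
```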

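  19. Receiving a File [Sequence diagram: client, GridFTP server (main codebase, ACL plugin, DSI plugin), NEST] • Client sends RECV file • Main codebase passes path/size to the ACL plugin • ACL plugin asks NEST “Allow?”; NEST reserves space and answers Y • Server replies 150 Begin and starts the transfer • DSI plugin receives bytes • ACL plugin updates the transaction • Transaction complete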
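The receive sequence can be sketched from the server core's point of view. `ToyPlugin` and `receive_file` are hypothetical stand-ins for the NEST-backed ACL plugin and the server logic, and the 226 and 452 replies are standard FTP codes added here for completeness; only the 150 reply appears on the slide.

```python
# Sketch of the receive flow: reserve, 150, stream bytes with updates, complete.
from typing import Iterable, List


class ToyPlugin:
    """Hypothetical plugin that tracks free space and logs its calls."""

    def __init__(self, free: int) -> None:
        self.free = free
        self.log: List[str] = []

    def reserve(self, path: str, size: int) -> bool:
        ok = size <= self.free
        if ok:
            self.free -= size
        self.log.append(f"reserve {path} {size} -> {ok}")
        return ok

    def update(self, path: str, committed: int, finished: bool) -> None:
        self.log.append(f"update {path} {committed} finished={finished}")


def receive_file(path: str, size: int, chunks: Iterable[bytes],
                 plugin: ToyPlugin, sink: bytearray) -> List[str]:
    """Return the FTP replies sent to the client."""
    if not plugin.reserve(path, size):
        return ["452 Insufficient storage space."]
    replies = ["150 Begin transfer."]
    committed = 0
    for chunk in chunks:
        sink.extend(chunk)  # DSI plugin: store the received bytes
        committed += len(chunk)
        plugin.update(path, committed, finished=False)
    plugin.update(path, committed, finished=True)  # transaction complete
    replies.append("226 Transfer complete.")
    return replies
```

Per the notes on the next slide, sending a file would follow the same shape minus the `reserve` step.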

  20. Notes • Sending a file • Same interactions as receiving, only simpler (no space reservation) • ACLs can be chained together • Chaining semantics still being worked out

  21. Using NeST • Init • NeST can use the client username/GSI subject to initialize. • Create/modify • Reserve space with a given timeout • Pathname is the key to the transaction • If the reservation expires, it is dropped and uncommitted data is lost • Update • Commit bytes, reset timeout • Complete • Clean up state
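The NeST-side bookkeeping can be sketched as a transaction table keyed by pathname. The timeout value, the names `Reservation` and `NestTransactions`, and the explicit `now` argument are illustrative assumptions. Following the slide, an expired reservation is dropped while bytes already committed are kept.

```python
# Hypothetical reservation table: timeout on create, reset on each update.
from dataclasses import dataclass
from typing import Dict


@dataclass
class Reservation:
    owner: str       # e.g. the client's username or GSI subject
    size: int        # bytes reserved
    committed: int   # bytes committed so far
    deadline: float  # reservation expires after this time


class NestTransactions:
    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.active: Dict[str, Reservation] = {}  # keyed by pathname
        self.kept: Dict[str, int] = {}  # committed bytes left by expired reservations

    def create(self, owner: str, path: str, size: int, now: float) -> None:
        """Reserve space for `path` with a timeout."""
        self.active[path] = Reservation(owner, size, 0, now + self.timeout)

    def update(self, path: str, committed: int, now: float) -> bool:
        """Commit bytes and reset the timeout; return False if the reservation is gone."""
        r = self.active.get(path)
        if r is None:
            return False
        if now > r.deadline:
            # Expired: drop the reservation; only already-committed bytes survive.
            del self.active[path]
            self.kept[path] = r.committed
            return False
        r.committed = committed
        r.deadline = now + self.timeout
        return True

    def complete(self, path: str) -> int:
        """Clean up state for `path` and return the committed byte count."""
        return self.active.pop(path).committed
```

Each successful `update` pushes the deadline forward, so a steadily progressing transfer never expires, while a stalled one loses its reservation after one timeout period.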

  22. Conclusion • Services must be able to protect themselves • Awareness of environment (internal & external) is key • Managed Object Placement Service • Straightforward technology advancements • Capability greater than sum of parts • Invitation to work together…
