WP2: Data Management Gavin McCance RAL Middleware Workshop 24 February 2003
Outline • WP2 Tasks • Review of TB1 Components • Changes and review of current components • Plans for final year
WP2 Tasks • Replication Services • Keep track of all the files and their copies • Copy them about (to order and automatically) • Optimization of replication • Give me the ‘best replica’ for my job • Simulate the grid to tune the algorithms needed for this • Meta-data • Where will the replication stuff keep its meta-data? • Where will the applications keep their meta-data? • Security • Authenticate with grid certificates • Authorize users appropriately (better than just a grid-mapfile)
TB1 Replication: Replica Catalogue • edg-replica-catalogue • A repackaging of the much-loved Globus replica catalogue • Based on LDAP • LFN -> PFN (1:many) • One logical file name maps to many physical instances of the file • With appropriate utility functions, applications might never need to know the PFN: use the LFN, and the middleware does the mapping for you in the background.
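To make the LFN -> PFN (1:many) idea concrete, here is a minimal in-memory sketch. It is not the LDAP-based edg-replica-catalogue: the class and method names (`ReplicaCatalogue`, `add_replica`, `list_replicas`) and the example PFN URLs are hypothetical, chosen only to show one logical name resolving to several physical copies.

```python
from collections import defaultdict


class ReplicaCatalogue:
    """Hypothetical toy LFN -> PFN (1:many) catalogue; not the LDAP-based EDG service."""

    def __init__(self) -> None:
        # Each logical file name maps to the set of its physical replicas.
        self._replicas = defaultdict(set)

    def add_replica(self, lfn: str, pfn: str) -> None:
        """Register one more physical copy of a logical file."""
        self._replicas[lfn].add(pfn)

    def list_replicas(self, lfn: str) -> list[str]:
        """All known PFNs for an LFN (empty if unknown), sorted for stable output."""
        return sorted(self._replicas.get(lfn, set()))


cat = ReplicaCatalogue()
cat.add_replica("higgs/run1.root", "gsiftp://se.cern.example/data/run1.root")
cat.add_replica("higgs/run1.root", "gsiftp://se.ral.example/data/run1.root")
print(cat.list_replicas("higgs/run1.root"))
```

An application that only ever passes `higgs/run1.root` around never has to care which of the two PFNs it eventually reads.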
TB1 Replication: Copying to order • edg-replica-manager • Initially a repackaging of the Globus replica manager • Rewritten for TB1+ with better client interfaces • Both command-line and C++ • copyAndRegister: ‘brings your new file to the grid’ • replicateFile: makes a new replica of a file
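The difference between copyAndRegister and replicateFile can be sketched with a toy in-memory model. This is not the edg-replica-manager API: `ToyReplicaManager`, its snake_case methods and the `se:path` PFN format are all hypothetical, and a dictionary copy stands in for the real file transfer.

```python
class ToyReplicaManager:
    """Hypothetical in-memory model of copyAndRegister / replicateFile semantics."""

    def __init__(self) -> None:
        self.storage: dict[str, dict[str, bytes]] = {}  # SE name -> {path: contents}
        self.catalogue: dict[str, set[str]] = {}        # LFN -> set of PFNs

    def _store(self, se: str, path: str, data: bytes) -> str:
        self.storage.setdefault(se, {})[path] = data
        return f"{se}:{path}"  # toy PFN format, assumed for this sketch

    def copy_and_register(self, data: bytes, lfn: str, se: str, path: str) -> str:
        """'Bring a new file to the grid': store it on an SE, register its first replica."""
        if lfn in self.catalogue:
            raise ValueError(f"LFN already registered: {lfn}")
        pfn = self._store(se, path, data)
        self.catalogue[lfn] = {pfn}
        return pfn

    def replicate_file(self, lfn: str, dest_se: str, path: str) -> str:
        """Copy an existing replica of `lfn` to `dest_se` and register the new PFN."""
        replicas = self.catalogue.get(lfn)
        if not replicas:
            raise KeyError(f"unknown LFN: {lfn}")
        # Any existing copy will do as the source; pick one deterministically.
        src_se, src_path = sorted(replicas)[0].split(":", 1)
        pfn = self._store(dest_se, path, self.storage[src_se][src_path])
        replicas.add(pfn)
        return pfn


rm = ToyReplicaManager()
rm.copy_and_register(b"event data", "lfn:run1", "se.cern", "/data/run1")
rm.replicate_file("lfn:run1", "se.ral", "/data/run1")
print(sorted(rm.catalogue["lfn:run1"]))
```

The key point is that copyAndRegister creates the catalogue entry, while replicateFile only ever adds PFNs to an entry that already exists.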
TB1 Replication: Copying ~automatically • GDMP: Grid Data Mirroring Package • Born in CMS • Implements subscription-based replication • [Figure: Site B subscribes to Site A (“Subscribe me!”); Site A, doing furious Monte Carlo generation, produces lots of new files and notifies Site B (“I’ve got some new files!”); Site B replies “Send me them”; the new files are copied by GridFTP, giving new replicas at Site B; a replica catalogue is also shown]
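A toy model of the subscription flow in the figure, assuming sites are plain Python objects and that a dictionary copy stands in for the GridFTP transfer. `Site`, `subscribe`, `publish` and `on_notify` are hypothetical names, not the GDMP interface.

```python
class Site:
    """Hypothetical grid site in a GDMP-style publish/subscribe replication model."""

    def __init__(self, name: str, catalogue: dict[str, set[str]]) -> None:
        self.name = name
        self.files: dict[str, bytes] = {}
        self.subscribers: list["Site"] = []
        self.catalogue = catalogue  # shared toy replica catalogue: LFN -> site names

    def subscribe(self, producer: "Site") -> None:
        """'Subscribe me!': ask a producer to notify this site about new files."""
        producer.subscribers.append(self)

    def publish(self, new_files: dict[str, bytes]) -> None:
        """Store freshly produced files, register them, then notify every subscriber."""
        for lfn, data in new_files.items():
            self.files[lfn] = data
            self.catalogue.setdefault(lfn, set()).add(self.name)
        for sub in self.subscribers:
            sub.on_notify(self, list(new_files))

    def on_notify(self, producer: "Site", lfns: list[str]) -> None:
        """'Send me them': pull files not yet held (the dict copy stands in for GridFTP)."""
        for lfn in lfns:
            if lfn not in self.files:
                self.files[lfn] = producer.files[lfn]
                self.catalogue.setdefault(lfn, set()).add(self.name)


catalogue: dict[str, set[str]] = {}
site_a = Site("A", catalogue)
site_b = Site("B", catalogue)
site_b.subscribe(site_a)
site_a.publish({"mc/evt_0001": b"evt1", "mc/evt_0002": b"evt2"})
print(sorted(catalogue["mc/evt_0001"]))
```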
Replication Optimization • Most research-oriented task • In early TB1, getBestFile was absent • The Resource Broker (RB) matches LFNs against local storage elements • Jobs only go where their data already is • No clever data movement • OptorSim developed to test replica optimization ideas • Data-centric grid simulation • Simulates job times as a function of the replication mechanism and the jobs’ data access patterns • UK JANET and EU GÉANT networks modelled
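The early TB1 matching rule above (jobs only go where their data already is) amounts to a subset test. A minimal sketch, assuming a hypothetical mapping from each site to the LFNs on its close SE; `eligible_sites` is not an RB function.

```python
def eligible_sites(job_lfns: set[str], site_holdings: dict[str, set[str]]) -> list[str]:
    """Sites whose close SE already holds every LFN the job reads (no data is moved)."""
    return sorted(site for site, held in site_holdings.items() if job_lfns <= held)


holdings = {  # hypothetical: site -> LFNs on its close storage element
    "CERN": {"lfn:a", "lfn:b"},
    "RAL": {"lfn:a"},
    "Lyon": {"lfn:a", "lfn:b", "lfn:c"},
}
print(eligible_sites({"lfn:a", "lfn:b"}, holdings))
```

Under this rule a job needing a file that no single site holds alongside its other inputs matches nowhere, which is exactly the gap that smarter replica optimization is meant to address.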
Meta-data storage • Spitfire meta-data storage • Two faces • Spitfire browser • Spitfire client API • The Spitfire browser lets a client use a web browser to view the results of canned queries on a database, or to make canned inserts into it. • The client authenticates to the service (and is then authorized) using the grid certificate embedded in their web browser. • [Figure: the user fills in a web-page form in a Netscape web browser; the Spitfire browser service talks to the DB, and the result comes back to the client]
Meta-data storage 2 • Spitfire client API • Imagine where you would use ODBC / JDBC in an application • To do something with a database from inside your application • That’s where you use this API, except… • Accesses DB over WAN • Grid security (both authentication and authz) • You shouldn’t have to know what the DB backend is • NB. The API is not the same as ODBC!
Security • WP2 task feeding into the EDG security group • Server side: • Mostly Java • Proper certificate trust manager for Java server applications (special plug-in for Tomcat) • Flexible authorization manager to define whatever authz policies you like on the server • Client side: • Proper Java trust manager for certificate checking • GSI-enabled web services for Java and C++
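A sketch of what ‘better than just a grid-mapfile’ can mean: instead of only mapping a certificate DN to a local account, the server maps DNs to roles and roles to allowed operations. `AuthorizationManager`, the role names and the glob-style operation patterns are hypothetical, and certificate verification (authentication) is assumed to have already happened.

```python
from fnmatch import fnmatchcase


class AuthorizationManager:
    """Hypothetical server-side authz: certificate DN -> roles -> allowed operations."""

    def __init__(self) -> None:
        self.roles_by_dn: dict[str, set[str]] = {}
        self.rules: list[tuple[str, str]] = []  # (role, glob pattern over operation names)

    def grant_role(self, dn: str, role: str) -> None:
        self.roles_by_dn.setdefault(dn, set()).add(role)

    def allow(self, role: str, operation_pattern: str) -> None:
        self.rules.append((role, operation_pattern))

    def is_authorized(self, dn: str, operation: str) -> bool:
        """True if any role held by this (already authenticated) DN permits the operation."""
        roles = self.roles_by_dn.get(dn, set())
        return any(role in roles and fnmatchcase(operation, pattern)
                   for role, pattern in self.rules)


authz = AuthorizationManager()
authz.grant_role("/O=Grid/CN=Alice", "calib-reader")
authz.allow("calib-reader", "select*")
print(authz.is_authorized("/O=Grid/CN=Alice", "selectCalibration"))
print(authz.is_authorized("/O=Grid/CN=Alice", "insertCalibration"))
```

Keeping the policy as data (roles and patterns) rather than code is what lets the same manager enforce "whatever authz policies you like".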
Changes: Web services • Most software has been redesigned to use web services • Much of the server-side code is now written in Java • Security is retained: GSI-enabled web services • Services have been modified to expose an API in WSDL • For client programming, the client API libraries are auto-generated from the WSDL • The command-line tools are still there, but now talk to the server using web services • What the application user sees should not have changed as a result of adopting web services!
Changes: Replica Catalogue to RLS • edg-replica-catalogue is being phased out • Replaced by the Replica Location Service (RLS) • A collaboration with Globus • Local Replica Catalogs (LRCs) on the SEs hold the actual GUID -> PFN mappings [the GUID is what used to be the LFN] • Replica Location Indices (RLIs) redirect inquiries to the LRCs that actually have the file • LRCs are configured to send index updates to any number of RLIs • Much more scalable architecture • The lookup time for an entry is independent of the number of catalogs; tested for up to 10^8 entries • The catalog sustains over 1000 simultaneous queries or inserts per second
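The LRC/RLI split can be sketched with two small classes. This is a toy, assuming each LRC pushes one index update per insert; whatever update scheme and protocol the real RLS uses is not modelled, and `LocalReplicaCatalog`, `ReplicaLocationIndex` and `locate` are hypothetical names.

```python
class ReplicaLocationIndex:
    """Hypothetical toy RLI: records which LRCs hold a GUID, never the PFNs themselves."""

    def __init__(self) -> None:
        self.index: dict[str, set[str]] = {}

    def update(self, lrc_name: str, guid: str) -> None:
        self.index.setdefault(guid, set()).add(lrc_name)

    def lrcs_for(self, guid: str) -> set[str]:
        return set(self.index.get(guid, set()))


class LocalReplicaCatalog:
    """Hypothetical toy LRC: holds the actual GUID -> PFN mappings for one site."""

    def __init__(self, name: str, rlis: list[ReplicaLocationIndex]) -> None:
        self.name = name
        self.rlis = rlis                    # any number of RLIs may be configured
        self.mappings: dict[str, set[str]] = {}

    def add(self, guid: str, pfn: str) -> None:
        self.mappings.setdefault(guid, set()).add(pfn)
        for rli in self.rlis:               # push an index update to every configured RLI
            rli.update(self.name, guid)


def locate(guid: str, rli: ReplicaLocationIndex,
           lrcs: dict[str, LocalReplicaCatalog]) -> set[str]:
    """Ask the RLI which LRCs to query, then gather PFNs from only those LRCs."""
    pfns: set[str] = set()
    for name in rli.lrcs_for(guid):
        pfns |= lrcs[name].mappings.get(guid, set())
    return pfns


rli = ReplicaLocationIndex()
lrcs = {name: LocalReplicaCatalog(name, [rli]) for name in ("CERN", "Lyon", "RAL")}
lrcs["CERN"].add("guid-0001", "gsiftp://se.cern.example/data/f1")
lrcs["Lyon"].add("guid-0001", "gsiftp://se.lyon.example/data/f1")
print(sorted(rli.lrcs_for("guid-0001")))   # RAL is never contacted for this GUID
print(sorted(locate("guid-0001", rli, lrcs)))
```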
Changes: unified interface for replication • Many services, some the same, some new, with a bewildering array of acronyms… • All these services have their own APIs and are individually accessible on the grid • From the applications’ point of view, it is more appropriate to have a single client-facing interface (both programming and command line) that talks to all these services • Simpler… you only need to read one document ;-) • Allows this single client to take care of transactional issues • This is the new EDG Replica Manager (ERM) for TB2
TB2 Replica Manager Components and name changes… • ERM: EDG Replica Manager client interface and API • Entry point for all clients • ROS: Replication Optimization Service • Replica selection based on network metrics (WP7) • RSH: Replication Storage Handler (what was GDMP) • Subscription-based replication • RLS: Replica Location Service (replacing replica catalogue) • Local Replica Catalog services (LRC): Logical to Physical file mappings • Replica Location Index services (RLI): index on Logical names • RMC: Replication Metadata Catalogue (NEW!) • Similar to Spitfire with RDBMS backend and specialized schema
TB2 Components • [Figure: TB2 Replica Management Services (“Reptor”): Client (ERM), Optimization (ROS), File Transfer (GridFTP), Replica Metadata (RMC), Replica Location (RLS), Subscription (RSH)]
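LFNs, PFNs, GUIDs • Due to an application requirement from LCG, a couple of changes: • [Figure: several LFNs (LFN1, LFN2, LFN3) map through the Replica Meta-data Catalogue to a single GUID (e.g. 1223423-ASSDF4-11223-35465464); the Replica Location Service maps that GUID to several PFNs (PFN1 at Glasgow, PFN2 at CERN, PFN3 at Lyon)]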
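The two-step lookup in the figure (LFN -> GUID in the metadata catalogue, then GUID -> PFNs in the RLS) can be sketched with two dictionaries standing in for the two services. The values are the ones shown in the figure; `LFN_TO_GUID`, `GUID_TO_PFNS` and `resolve` are hypothetical names.

```python
GUID = "1223423-ASSDF4-11223-35465464"

# Stand-in for the Replica Meta-data Catalogue: many LFNs (aliases) -> one GUID.
LFN_TO_GUID = {"LFN1": GUID, "LFN2": GUID, "LFN3": GUID}

# Stand-in for the Replica Location Service: one GUID -> many PFNs at different sites.
GUID_TO_PFNS = {GUID: ["PFN1 (Glasgow)", "PFN2 (CERN)", "PFN3 (Lyon)"]}


def resolve(lfn: str) -> list[str]:
    """Resolve a user-facing LFN to its physical replicas via the GUID."""
    guid = LFN_TO_GUID[lfn]              # metadata catalogue lookup
    return GUID_TO_PFNS.get(guid, [])    # replica location lookup


print(resolve("LFN2"))
```

Because every alias resolves to the same GUID, renaming or adding an LFN touches only the first table; the replica locations stay untouched.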
TB2: RMC and Spitfire • Replication meta-data is considered sufficiently ‘specialized’ and vital to the replica management service that it has been split off from Spitfire • Now called the Replica Metadata Catalogue (RMC) • Resolves LFNs to GUIDs • Underlying technology is identical to Spitfire • Exposed API is different • More tailored to the specific things you’d like to do with replication meta-data • Application-specific section for application meta-data keyed on LFNs or GUIDs • Spitfire is still available for other meta-data • e.g. storing calibration constants, etc.
Replica Optimization Service: ROS • Provides a getAccessCosts( LFN[], CE[], … ) method to the RB • Allows the RB to take into account the distribution of a job’s files when deciding where to run it • Provides listBestFile( LFN, toSE ) [in the ERM interface] • Uses network bandwidth and storage cost measurements (WP7 and WP5) to determine the best replica to get • Provides getBestFile( LFN, toSE, … ) [in the ERM interface] • The same, except that it actually performs the replication, if needed • For TB2, simple replication algorithms will be deployed initially • More adventurous ones can be added without impacting the interface, since the replication algorithm is internal to the RMS
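One way such calls could rank replicas, assuming a deliberately simple cost model: transfer time over the (source SE, destination SE) link plus a per-SE storage access cost. The real ROS takes its figures from WP7 network and WP5 storage measurements; `Replica`, `replica_cost`, `list_best_file`, `get_access_costs` and the bandwidth table are hypothetical and only mirror the shape of listBestFile and getAccessCosts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    se: str           # storage element (SE) holding this copy
    latency_s: float  # assumed storage access cost at that SE, in seconds


Bandwidth = dict[tuple[str, str], float]  # hypothetical: (source SE, dest SE) -> Mbit/s


def replica_cost(r: Replica, size_mb: float, to_se: str, bw: Bandwidth) -> float:
    """Estimated seconds to make replica `r` available at `to_se`."""
    if r.se == to_se:
        return r.latency_s  # already local: only the storage access cost
    return size_mb * 8 / bw[(r.se, to_se)] + r.latency_s  # MB -> Mbit over the link


def list_best_file(replicas: list[Replica], size_mb: float,
                   to_se: str, bw: Bandwidth) -> Replica:
    """Cheapest replica to fetch to `to_se`, in the spirit of listBestFile(LFN, toSE)."""
    return min(replicas, key=lambda r: replica_cost(r, size_mb, to_se, bw))


def get_access_costs(files: dict[str, tuple[float, list[Replica]]],
                     close_se_by_ce: dict[str, str], bw: Bandwidth) -> dict[str, float]:
    """Per-CE total cost of fetching each file's best replica to that CE's close SE."""
    costs = {}
    for ce, se in close_se_by_ce.items():
        costs[ce] = sum(
            replica_cost(list_best_file(reps, size, se, bw), size, se, bw)
            for size, reps in files.values()
        )
    return costs


bw = {("CERN", "RAL"): 100.0, ("Lyon", "RAL"): 800.0, ("CERN", "Lyon"): 400.0}
replicas = [Replica("CERN", 1.0), Replica("Lyon", 2.0)]
print(list_best_file(replicas, 1000.0, "RAL", bw).se)
print(get_access_costs({"lfn:run1": (1000.0, replicas)},
                       {"ce.ral": "RAL", "ce.lyon": "Lyon"}, bw))
```

Because the cost model lives entirely inside `replica_cost`, a more adventurous algorithm could replace it without changing the two entry points, which mirrors the argument for keeping the replication algorithm internal.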
Current OptorSim status • OptorSim is used to simulate possible algorithms for the ROS • The simulation now includes sampled background network traffic (UCL) • Live network simulator GUI • Or run offline on a compute farm to get useful results!
Current OptorSim results • Initial results from simulation show that including network background increases job times by ~10%. • Further study underway… - 6 experiments, 22 sites - predicted available CPUs & storage - realistic file sizes (1GB) and dataset sizes (1TB) - realistic number of jobs (~60 users) - inclusion of background network traffic • Study of different replication algorithms and access patterns • Data access pattern has a large effect • Further study here • Economic models do well for sequential data access
Plans for final year: Meta-data • RMC is now ~fixed in functionality • Spitfire will evolve a bit more • To allow authorized users to hot-deploy their own interfaces onto the service to do something useful • e.g. you, as an analysis-group hardware person, can ‘invent’ a method call (an interface) to extract some data from an obscure calibration constant table • Spitfire (which sits in front of the DB containing these tables) will then expose your newly invented interface so that people can use it by standard web-services remote procedure call • And web services will write the client stub for you automatically… • Keep working on OGSA (and the GGF DAIS standard)
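The hot-deploy idea can be sketched with the standard-library `sqlite3` module standing in for the database behind Spitfire. `ToySpitfire`, `deploy`, `call` and the `calib` table are hypothetical; a real deployment would expose the registered method over web services rather than as a Python call.

```python
import sqlite3


class ToySpitfire:
    """Hypothetical front end: authorized users register named, parameterized queries."""

    def __init__(self, conn: sqlite3.Connection, deployers: set[str]) -> None:
        self.conn = conn
        self.deployers = deployers          # certificate DNs allowed to hot-deploy
        self.methods: dict[str, str] = {}   # method name -> parameterized SQL

    def deploy(self, dn: str, name: str, sql: str) -> None:
        """Hot-deploy a new 'interface' (a named query) if the caller is authorized."""
        if dn not in self.deployers:
            raise PermissionError(f"{dn} may not deploy interfaces")
        self.methods[name] = sql

    def call(self, name: str, *params: object) -> list[tuple]:
        """Invoke a deployed method; the driver binds params, never spliced into the SQL."""
        return self.conn.execute(self.methods[name], params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calib (run INTEGER, channel INTEGER, gain REAL)")
conn.executemany("INSERT INTO calib VALUES (?, ?, ?)",
                 [(1, 7, 0.98), (1, 8, 1.02), (2, 7, 0.97)])

spitfire = ToySpitfire(conn, deployers={"/O=Grid/CN=Alice"})
spitfire.deploy("/O=Grid/CN=Alice", "gainFor",
                "SELECT gain FROM calib WHERE run = ? AND channel = ?")
print(spitfire.call("gainFor", 1, 8))
```

Other users then only need the method name and its parameters, not the schema of the obscure table behind it.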
Plans for final year: RMS • RMS architecture is now defined • Consolidate and concentrate on quality • Few new features • Support LCG • Software was developed alongside LCG requirements • Work will continue on improving the algorithms used internally by the ROS (replica optimization) • Work towards EGEE…