Replica Management Services in the European DataGrid Project • Work Package 2, European DataGrid
Outline • The need for the European DataGrid and replica management • Overview of replica management services • Performance evaluation of services • Future work: replica management in EGEE • Conclusion
Why do we need a Grid? • Experiments output data at 100s of MB/s, giving several PB of data per year • Equivalent to 2 million CDs of data per year, needing 20,000 PCs per experiment to analyse • This motivates distributed Grid computing
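The figures on this slide can be sanity-checked with a short calculation. The sketch below is only a back-of-envelope check: the 700 MB CD capacity, the decimal units and the assumption of continuous data taking are mine, not values from the slides.

```python
# Back-of-envelope check of the data volumes quoted on the slide.
# Assumptions (not from the slides): decimal units, a 700 MB CD,
# and continuous data taking unless a duty cycle is given.
SECONDS_PER_YEAR = 365 * 24 * 3600
MB = 10**6
PB = 10**15


def petabytes_per_year(rate_mb_per_s, duty_cycle=1.0):
    """Yearly data volume in PB for a sustained output rate in MB/s."""
    return rate_mb_per_s * MB * SECONDS_PER_YEAR * duty_cycle / PB


def cds_to_petabytes(n_cds, cd_capacity_mb=700):
    """Total capacity in PB of n_cds CDs of the given size."""
    return n_cds * cd_capacity_mb * MB / PB


if __name__ == "__main__":
    print(f"100 MB/s all year: {petabytes_per_year(100):.2f} PB")
    print(f"2 million CDs:     {cds_to_petabytes(2_000_000):.2f} PB")
```

Under these assumptions a sustained 100 MB/s gives roughly 3.15 PB per year and 2 million CDs hold 1.4 PB, so both figures sit in the "several PB per year" range quoted on the slide.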
The European DataGrid • Ran from January 2001 to March 2004 • Aim: to develop a Grid infrastructure for data-intensive scientific applications • High energy physics, biology and Earth observation producing several PB of data per year • Developed Grid middleware for job, data and fabric management, information and monitoring
Grid Architecture • [Figure: Grid architecture diagram marking the scope of the EDG middleware and the scope of EDG-WP2]
Data Management • Requirements: • Enable secure access to massive amounts of data in a global name space • Move and replicate data at high speed from one geographical site to another • 1st generation: GDMP + edg-replica-manager • Used Globus for secure file transfer • C++ based – gave basic replication functionality and cataloging
Data Management • 2nd generation – uses web services • Easy and standardised way to connect distributed services via XML • Services include • Replica Manager Client • main user interface • Replica Location Service • stores physical locations of replicas • Replica Metadata Catalog • stores logical file name mappings and metadata attributes • Replica Optimization Service • provides optimised access to replicas • Security • HTTPS + Globus’ GSI
Replica Location Service • Implementation of RLS framework co-developed with Globus • Maps unique identifier (GUID) to multiple replicas (SURLs) • Local catalog (LRC) with distributed index (RLI) • [Figure: LRCs hold GUID:SURL mappings and send soft-state updates to the RLIs, which hold GUID:LRC mappings]
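The two-level lookup described on this slide can be sketched in a few lines. This is a minimal in-memory illustration, not the EDG implementation: the class and method names, the TTL-based expiry of soft state and the example SURLs are all assumptions made for the sketch.

```python
# Minimal in-memory sketch of an LRC/RLI pair.
# All names here are hypothetical; this is not the EDG RLS API.

class LocalReplicaCatalog:
    """Holds GUID -> SURL mappings for one site."""

    def __init__(self, name):
        self.name = name
        self._replicas = {}  # guid -> set of SURLs

    def add_mapping(self, guid, surl):
        self._replicas.setdefault(guid, set()).add(surl)

    def list_replicas(self, guid):
        return sorted(self._replicas.get(guid, ()))

    def guids(self):
        return set(self._replicas)


class ReplicaLocationIndex:
    """Holds GUID -> LRC hints that expire unless refreshed (soft state)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._state = {}  # LRC name -> (lrc, guids snapshot, time received)

    def soft_state_update(self, lrc, now):
        # Each update replaces the previous snapshot from that LRC.
        self._state[lrc.name] = (lrc, lrc.guids(), now)

    def lookup(self, guid, now):
        # Only snapshots younger than the TTL are trusted.
        return [lrc for lrc, guids, received in self._state.values()
                if now - received <= self.ttl and guid in guids]


def find_replicas(rli, guid, now):
    """Ask the index which LRCs know the GUID, then query each LRC."""
    surls = []
    for lrc in rli.lookup(guid, now):
        surls.extend(lrc.list_replicas(guid))
    return sorted(surls)


guid = "guid:131f9940-f501-11d8-9669-0800200c9a66"
lrc_a = LocalReplicaCatalog("site-a")
lrc_b = LocalReplicaCatalog("site-b")
lrc_a.add_mapping(guid, "srm://se1.example.org/data/005a.dat")
lrc_b.add_mapping(guid, "srm://se2.example.org/data/005a.dat")

rli = ReplicaLocationIndex(ttl_seconds=60)
rli.soft_state_update(lrc_a, now=0)
rli.soft_state_update(lrc_b, now=0)
all_surls = find_replicas(rli, guid, now=30)
```

Passing `now` explicitly keeps the sketch deterministic; a lookup made after the TTL has passed without a fresh update returns nothing, which is the intended effect of soft state.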
Replica Metadata Catalog • GUIDs are unfriendly and non-intuitive • guid:131f9940-f501-11d8-9669-0800200c9a66 • Use user-definable Logical File Names • lfn:cal-test-data-2004-09-01-005a • RMC stores LFN:GUID mappings (n:1) • Can also store ~10 metadata attributes • e.g. file owner, file size • Together with RLS gives complete LFN:GUID:SURL view • [Figure: the RMC maps several LFNs to one GUID; the RLS maps that GUID to several SURLs]
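The n:1 LFN:GUID mapping and the combined LFN:GUID:SURL view can be illustrated as follows. This is a sketch under stated assumptions: the class, its method names and the plain dictionary standing in for the RLS are hypothetical, and the SURL is an invented example.

```python
# Sketch of an LFN -> GUID catalog (many LFNs may alias one GUID).
# Names are hypothetical; a plain dict stands in for the RLS.

class ReplicaMetadataCatalog:
    def __init__(self):
        self._lfn_to_guid = {}   # each LFN points at exactly one GUID
        self._attributes = {}    # guid -> {attribute name: value}

    def add_alias(self, guid, lfn):
        existing = self._lfn_to_guid.get(lfn)
        if existing is not None and existing != guid:
            raise ValueError(f"{lfn} already maps to {existing}")
        self._lfn_to_guid[lfn] = guid

    def get_guid(self, lfn):
        return self._lfn_to_guid[lfn]

    def aliases(self, guid):
        return sorted(l for l, g in self._lfn_to_guid.items() if g == guid)

    def set_attribute(self, guid, name, value):
        self._attributes.setdefault(guid, {})[name] = value

    def get_attributes(self, guid):
        return dict(self._attributes.get(guid, {}))


def resolve(lfn, rmc, replica_locations):
    """Follow LFN -> GUID -> SURLs using the catalog and an RLS stand-in."""
    guid = rmc.get_guid(lfn)
    return guid, sorted(replica_locations.get(guid, ()))


guid = "guid:131f9940-f501-11d8-9669-0800200c9a66"
rmc = ReplicaMetadataCatalog()
rmc.add_alias(guid, "lfn:cal-test-data-2004-09-01-005a")
rmc.add_alias(guid, "lfn:my-favourite-calibration-run")  # second alias, same GUID
rmc.set_attribute(guid, "owner", "alice")

replica_locations = {guid: {"srm://se1.example.org/data/005a.dat"}}
resolved_guid, surls = resolve("lfn:my-favourite-calibration-run",
                               rmc, replica_locations)
```

Rejecting an LFN that already points at a different GUID is one way to keep the mapping n:1; the slides do not say how the real catalog handles that case.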
Replica Optimization Service • Gives optimised access to replicas by choosing replicas with quickest access (based on network measurements) • Automatically replicates files to sites on which they are needed • Simulation research (OptorSim) continues to investigate more complex replica management strategies
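Choosing the replica with the quickest access can be sketched as a minimum over estimated transfer times. The cost model below (file size divided by measured bandwidth) is a simplifying assumption for illustration, not the strategy used by the EDG service or OptorSim; the function name mirrors the `listBestFile` call from the slides but its signature, the host names and the bandwidth figures are invented.

```python
# Sketch of replica selection from network measurements.
# Assumed cost model: transfer time = file size / measured bandwidth.

def list_best_file(replicas, dest_se, bandwidth, file_size_mb):
    """Return the SURL expected to reach dest_se fastest.

    replicas:  dict mapping source storage element -> SURL
    bandwidth: dict mapping (source SE, destination SE) -> MB/s
    """
    best_surl, best_time = None, float("inf")
    for source_se, surl in replicas.items():
        if source_se == dest_se:
            return surl  # a local copy needs no network transfer
        rate = bandwidth.get((source_se, dest_se))
        if not rate:
            continue  # no usable measurement for this link
        transfer_time = file_size_mb / rate
        if transfer_time < best_time:
            best_surl, best_time = surl, transfer_time
    if best_surl is None:
        raise LookupError("no replica with a known network path")
    return best_surl


replicas = {
    "se1.example.org": "srm://se1.example.org/data/005a.dat",
    "se3.example.org": "srm://se3.example.org/data/005a.dat",
}
bandwidth = {
    ("se1.example.org", "se2.example.org"): 10.0,  # MB/s, illustrative
    ("se3.example.org", "se2.example.org"): 40.0,
}
best = list_best_file(replicas, "se2.example.org", bandwidth, file_size_mb=2000)
```

With these illustrative measurements the replica on `se3.example.org` is selected, because its link to the destination is the faster of the two.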
Replica Manager • Client-side tool acts as user interface to services (although services can also be accessed directly) • Coordinates service interactions • Interfaces with external services • information service (MDS, R-GMA) • storage services (SRM, EDG-SE) • file transfer services (GridFTP)
Implementation • Servers written in Java, clients auto-generated (Java, C++, etc.) from WSDL • Web services run on Apache Axis inside Java servlet engine (Tomcat/Oracle AS) • Use MySQL/Oracle as back-end DB to store persistent information • RLS used already in production for LCG (Oracle AS/DB) • CMS Data Challenge 04 – 2 million entries stored
Service Interactions • Example request: “Make a replica of the file specified by LFN to SE2” • Components shown: User Interface, Replica Manager, Replica Metadata Catalog, Replica Location Service, Replica Optimization Service, Storage Element 1, Storage Element 2 • Call sequence: 1. replicateFile(LFN, SE2) 2. getGuid(LFN) 3. listReplicas(GUID) 4. listBestFile(SURLs, SE2) 5. copyFile(SE1, SE2) 6. registerFile(GUID, SURL)
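The six-step sequence on this slide can be sketched as a single coordinating function. This is an illustration only: dictionaries stand in for the catalogs, the selection and copy steps are passed in as stub functions, and the assignment of each call to a service (for example, that `registerFile` updates the Replica Location Service) is my reading of the service descriptions rather than something the slide states.

```python
# Sketch of the replicateFile workflow using in-memory stand-ins.
# rmc: dict LFN -> GUID; rls: dict GUID -> set of SURLs.
# list_best_file and copy_file are injected stubs, not real services.

def replicate_file(lfn, dest_se, rmc, rls, list_best_file, copy_file):
    """Step 1: the user asks for a replica of `lfn` at `dest_se`."""
    guid = rmc[lfn]                               # 2. getGuid(LFN)
    surls = sorted(rls[guid])                     # 3. listReplicas(GUID)
    source_surl = list_best_file(surls, dest_se)  # 4. listBestFile(SURLs, SE2)
    new_surl = copy_file(source_surl, dest_se)    # 5. copyFile(SE1, SE2)
    rls[guid].add(new_surl)                       # 6. registerFile(GUID, SURL)
    return new_surl


def first_replica(surls, dest_se):
    """Stub optimiser: just take the first known replica."""
    return surls[0]


def fake_copy(source_surl, dest_se):
    """Stub transfer: build the destination SURL without moving any data."""
    filename = source_surl.rsplit("/", 1)[-1]
    return f"srm://{dest_se}/data/{filename}"


lfn = "lfn:cal-test-data-2004-09-01-005a"
guid = "guid:131f9940-f501-11d8-9669-0800200c9a66"
rmc = {lfn: guid}
rls = {guid: {"srm://se1.example.org/data/005a.dat"}}

new_surl = replicate_file(lfn, "se2.example.org", rmc, rls,
                          first_replica, fake_copy)
```

Registering the new replica only after the copy step returns means a failed transfer leaves the catalog unchanged, which seems the natural ordering given the numbering on the slide.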
RLS performance • In production use, only single LRC used so far • Test performance using Java and C++ API to insert and query GUID:SURL mappings • [Figures: C++ query performance; Java vs C++ insert performance] • Excellent query performance, C++ more stable than Java
RLS performance • Using Java API and multiple concurrent threads • [Figures: inserting 500,000 mappings; 5 insert and 5 query threads] • Throughput peaks at ~20 threads, again stable query performance
Security • Security adds significant overheads! • Problem caused by new connection for each transaction • Could be reduced by using bulk operations
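Why bulk operations help can be seen from a simple cost model in which every new secure connection pays a fixed handshake cost. The model and every number used below are illustrative assumptions, not measurements from the EDG tests.

```python
# Toy cost model: one secure connection per batch of operations.
# All timings are hypothetical and chosen only to show the shape
# of the effect, not taken from the EDG performance results.
import math


def total_seconds(n_ops, handshake_s, per_op_s, batch_size=1):
    """Total time when each batch of operations opens one new connection."""
    connections = math.ceil(n_ops / batch_size)
    return connections * handshake_s + n_ops * per_op_s


one_by_one = total_seconds(1000, handshake_s=0.5, per_op_s=0.01)
in_bulk = total_seconds(1000, handshake_s=0.5, per_op_s=0.01, batch_size=100)
```

In this model the per-operation work is identical in both cases; only the number of handshakes changes, so the saving grows with the batch size until the handshake term becomes negligible.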
RMC performance • Test multiple LFNs per GUID and multiple metadata attributes • [Figures: C++ query performance; Java insert performance] • Scales well with no. of LFNs per GUID and no. of attributes
RMC Performance • Command Line Interface: edg-rmc addAlias • Very slow compared to API calls (two orders of magnitude slower) • Recommended for testing an installation only
The Future of EDG Services • EGEE: building production-quality Grids • Lessons learned from EDG: • Less is more: stability and usability most important • User interface and documentation difficult to get right first time • Need easy integration of different providers • gLite: middleware (re)engineering and integration • using many concepts/experience from EDG • but geared towards service-oriented architecture
EGEE Data Management Services • Replica Manager -> Data Scheduler + Transfer Fetcher + File Placement Service + File Transfer Service • Source: EGEE Middleware Architecture and Planning (Release 1.0), DJRA1.1
EGEE Data Management Services • RLS + RMC -> Combined Catalog Interface to: File Catalog + Replica Catalog (+ Metadata Catalog) • Source: EGEE Middleware Architecture and Planning (Release 1.0), DJRA1.1
Conclusion • EDG WP2 has developed a set of integrated replica management services • Can cope with demanding Grid conditions • already used in production environment • A lot of concepts now being taken forward into EGEE project