Don Quijote
Data Management for the ATLAS Automatic Production System
CHEP 2004
Miguel Branco – CERN PH-ATC
miguel.branco@cern.ch
Overview
• Introduction
• Architecture
• End-user tools and APIs
• Future Plans, Conclusion and Additional Information
Don Quijote - CHEP 2004
ATLAS Data Challenges
• ATLAS decided to undertake a series of Data Challenges in order to validate its Computing Model, its software, its data model
• Started summer 2004:
  • ATLAS DC-2
• Introduced the new ATLAS Automatic Production System:
  • Unsupervised production across many sites spread over three different Grids (US Grid3, NorduGrid, LCG-2)
  • 3 major components:
    • Windmill – ATLAS Production Supervisor
    • Job Executors – one executor per “grid-flavor”
    • Common Data Management system
• The decision was taken to implement a single data management system capable of accessing all ATLAS Data Challenges data
Don Quijote
• Don Quijote (DQ) is a high-level interface for grid data management for the ATLAS Automatic Production System
• Allow transparent registration and movement of replicas between all grid “flavors” used by ATLAS
  • US Grid3, NorduGrid and LCG-2
• Avoid creating yet another replica and metadata catalog
  • Use existing catalogs and data management tools
  • Find common features between tools and catalogs
  • “Bridge” them and provide a unified interface
• Accessible as a Service
  • lightweight clients
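The "bridge" idea on this slide can be illustrated with a minimal Python sketch. It is not the DQ API: the class names `ReplicaCatalog`, `InMemoryCatalog` and `UnifiedCatalog` are hypothetical, and the in-memory dictionary only stands in for a real grid replica catalog. The sketch shows one common interface, one adapter per grid, and a thin layer that queries all of them.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List


class ReplicaCatalog(ABC):
    """Hypothetical common interface that every per-grid adapter implements."""

    grid: str

    @abstractmethod
    def lookup(self, lfn: str) -> List[str]:
        """Return the physical file names registered for a logical file name."""

    @abstractmethod
    def register(self, lfn: str, pfn: str) -> None:
        """Record a new physical replica for a logical file name."""


class InMemoryCatalog(ReplicaCatalog):
    """Stand-in for an existing catalog; a real adapter would call its client."""

    def __init__(self, grid: str) -> None:
        self.grid = grid
        self._replicas: Dict[str, List[str]] = {}

    def lookup(self, lfn: str) -> List[str]:
        return list(self._replicas.get(lfn, []))

    def register(self, lfn: str, pfn: str) -> None:
        self._replicas.setdefault(lfn, []).append(pfn)


class UnifiedCatalog:
    """Presents several per-grid catalogs behind a single interface."""

    def __init__(self, catalogs: Iterable[ReplicaCatalog]) -> None:
        self._catalogs = {catalog.grid: catalog for catalog in catalogs}

    def lookup(self, lfn: str) -> Dict[str, List[str]]:
        # Ask every grid and keep only the grids that hold a replica.
        found = {}
        for grid, catalog in self._catalogs.items():
            pfns = catalog.lookup(lfn)
            if pfns:
                found[grid] = pfns
        return found

    def register(self, grid: str, lfn: str, pfn: str) -> None:
        self._catalogs[grid].register(lfn, pfn)
```

No new catalog is created in this design: the unified layer holds no replica state of its own and only forwards requests to the catalogs that already exist.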
Architecture
[Diagram: a Client connected to three servers, one per grid, each with its replica catalog: LCG-2 (LCG RLS), NorduGrid (Globus RLS 2.x), US Grid3 (Globus RLS 2.x)]
Servers
• One per “Grid”
• GSI-enabled version and insecure version (with service certificate)
• Multiple configuration settings
Client
• C++ client API
• User interface tools in Python
• Configuration file indicating endpoint of each server
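A client-side configuration file listing one endpoint per server could be read along the following lines. The file format, section names, `secure` option and example.org URLs are all assumptions made for illustration; the slides do not show the actual DQ configuration syntax.

```python
import configparser

# Hypothetical configuration: one section per grid server.
EXAMPLE_CONFIG = """
[lcg]
endpoint = https://dq-lcg.example.org:8443/dq
secure = yes

[grid3]
endpoint = http://dq-grid3.example.org:8000/dq
secure = no

[nordugrid]
endpoint = https://dq-ng.example.org:8443/dq
secure = yes
"""


def load_endpoints(text):
    """Return {server name: (endpoint URL, uses the secure variant)}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        name: (parser[name]["endpoint"], parser[name].getboolean("secure"))
        for name in parser.sections()
    }


if __name__ == "__main__":
    for name, (url, secure) in load_endpoints(EXAMPLE_CONFIG).items():
        print(name, url, "GSI" if secure else "insecure")
```

Keeping the endpoints in a file rather than in the client code is what lets the same lightweight client talk to any number of per-grid servers.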
Architecture
[Diagram: a DQ Client and the DQ-LCG, DQ-Grid3 and DQ-NG servers cooperating on a cross-grid replication; the source storage is in NG and the destination is castorgrid.ific.uv.es]
Messages shown in the diagram:
• “Replicate this LFN to castorgrid.ific.uv.es”
• “Who has replicas of the LFN?” (asked twice)
• “These are my replicas”
• “Ok. Stage this one and return me a GridFTP Transport URL”
• “Here is the TURL”
• “Whoever owns castorgrid.ific.uv.es please copy a file from this Transport URL and register the replica in the replica catalog maintaining these metadata attributes.”
• “Ok. Taking care of it. Will let you know when it’s done.”
• 3rd party-transfer from the NG source storage to castorgrid.ific.uv.es
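The exchange in this diagram can be approximated by the toy simulation below. It is a sketch under stated assumptions, not DQ code: `GridServer`, `replicate` and their methods are hypothetical, no network or GridFTP call is made, and which server owns which storage host is supplied by the caller.

```python
class GridServer:
    """Hypothetical in-memory stand-in for one per-grid DQ server."""

    def __init__(self, name, storages):
        self.name = name
        self.storages = set(storages)  # storage hosts this server is responsible for
        self.catalog = {}              # lfn -> list of (host, metadata) replicas

    def replicas(self, lfn):
        # "Who has replicas of the LFN?"
        return list(self.catalog.get(lfn, []))

    def stage(self, lfn, host):
        # "Stage this one and return me a GridFTP Transport URL" (simulated)
        return f"gsiftp://{host}/staged/{lfn}"

    def copy_and_register(self, turl, lfn, host, metadata):
        # Simulated third-party copy from the TURL, then catalog registration
        # that carries the source metadata attributes over unchanged.
        self.catalog.setdefault(lfn, []).append((host, dict(metadata)))


def replicate(servers, lfn, destination):
    """Replicate lfn to the destination host and return the TURL that was used."""
    # Step 1: ask each server in turn until one reports a replica.
    for source in servers:
        found = source.replicas(lfn)
        if found:
            src_host, metadata = found[0]
            break
    else:
        raise LookupError(f"no replica of {lfn} on any grid")

    # Step 2: have the source server stage the file and hand back a TURL.
    turl = source.stage(lfn, src_host)

    # Step 3: hand the copy to whichever server owns the destination storage.
    owner = next((s for s in servers if destination in s.storages), None)
    if owner is None:
        raise LookupError(f"no server owns {destination}")
    owner.copy_and_register(turl, lfn, destination, metadata)
    return turl
```

The client issues a single request; locating the source, staging it, and registering the new replica are all delegated to the servers.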
DQ modules
• Current structure (labels from the module diagram):
  • DqCore
  • C++ Client Module
  • DqPoolRls, DqGlobusRls
  • dq.py: Python Module
  • C++/Python wrapper
  • DqLcgReplicaAccess, DqClassicReplicaAccess
  • DqLcgInfoService, DqVdtInfoService, DqNgInfoService
  • dms.py: Production User Interface
  • DqFactory
  • dms2.py: End-user Client tool
  • DqConfigFile, DqInterface, DqMonitor, DqUI
  • DqServerLcg, DqServerNg, DqServerVdt
Functionalities provided by API
• What can be done using client API or command-line tools?
  • Search for replicas of logical files as well as metadata attributes
  • List storage locations
  • Replicate files between storage locations
  • Get a locally accessible physical file from a grid-storage
  • Put a file into a grid storage
  • Validate a file – md5 checksum, file size
• Subject to security:
  • Renaming logical files
  • Removing logical files and physical replicas
• All actions above can be executed within or across different grids
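The "validate a file" operation amounts to comparing a local copy against the size and md5 checksum recorded for it. A minimal sketch using only the Python standard library follows; the function name `validate_file` and its parameters are illustrative and are not taken from the DQ API.

```python
import hashlib
import os


def validate_file(path, expected_md5, expected_size, chunk_size=1024 * 1024):
    """Return True if the file matches the expected size and md5 checksum."""
    # The size check is cheap, so do it first and skip hashing on a mismatch.
    if os.path.getsize(path) != expected_size:
        return False

    # Hash in chunks so that large data files are never loaded whole in memory.
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5.lower()
```

A caller would take the expected values from the catalog metadata and run the check after a get or a replication, before treating the copy as usable.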
End-user tools
• Provide a single tool for end-users to manage data files
• Integrates all tools that users would have to know about into a single one:
  • POOL, EDG, Globus, Castor, …
• Act as a Replica Manager
• Although “POOL-aware”, nothing in it is ATLAS- or HEP-specific
• Eases security requirements for end-users
  • Temporarily and for some requests only!
Future plans
• Decouple DQ modules into full Service Oriented Architecture
  • Outsource module implementations
• Monitoring of Server requests
  • Most commonly accessed files/partitions/datasets, …
• Reliable File Transfer service (Tier0 exercise)
• Working on Documentation
  • Twiki-based
• Interface to EGEE/gLite from ARDA project
  • Prototype being developed by Frederik Orellana
• Future? No plans for major rewrite, only refactoring
  • Most important is to maintain the same interface for end-users and for the production system
Conclusion
• Don Quijote is becoming the default grid data file access layer for ATLAS
  • “New catalogs are coming from grid projects; we should stick with our present DQ insulation layer” (ATLAS Database and Data Management project)
• Accomplished the goal of exposing the middleware of different grids through a unified interface
• Client tools for end-users as well as for production managers
• DQ usage:
  • Can access ~32 TB of data and ~140K files produced so far by the ATLAS DC since early June
  • Total requests: over 600 000, mostly to the replica catalogs without file movement
  • File transfers: only around 3 TB so far; will increase to around 35 TB with the Tier 0 exercise in the coming weeks
• Overall, still a bit to go to provide a unified system to access ATLAS production data
  • DQ aims to help build that unified system
Additional information
• DQ web page:
  • http://cern.ch/mbranco/cern/donquijote/
• DQ docs (twiki):
  • https://uimon.cern.ch/twiki/bin/view/Atlas/DonQuijote
• Feel free to contact me:
  • miguel.branco@cern.ch