ATLAS DQ2 Deletion Service Danila Oleynik (JINR), Artem Petrosyan (JINR), Dmitry Kekelidze (JINR), Vincent Garonne (CERN).
One of the most essential operations in any distributed data management system is deleting data in a performant, reliable and scalable way. The Grid deletion service has been built to serve deletion requests on the Grid. This service is in production and used by the ATLAS DDM system. It is a distributed service which interacts with third-party Grid middleware and the DDM catalogs.

System architecture
The Deletion Service is divided into the following components:
Server: collects deletion requests and stores detailed information about datasets, their contents (files) and the ongoing state of the deletion requests.
Client: provides communication between the server and the other components of the service.
Deletion Agent: carries out the deletion process on the storage systems.
Monitoring: displays the current state of the deletion process and its errors.

Client/Server
The Deletion Client/Server has been implemented with web-service technologies: Apache, mod_python and GridSite. It also encapsulates the interaction with the Oracle DBMS. Using web-service technology increases the scalability of the service and makes integration with other DQ2 services easy.

Deletion Agent
The Deletion Agent provides the logic of the deletion process: resolving the list of files to delete at a site, deleting them from the LFC catalog and deleting them from the mass storage systems. The agent runs as three independent (parallel) processes: resolver, catalog cleaner and storage cleaner (a minimal sketch of this pipeline is given after this section).
Resolver: resolves the content of a dataset (its list of files) for deletion and stores this content in the database for further processing.
Catalog cleaner: deletes files from the file catalog (LFC). Depending on the result of the operation, the catalog cleaner marks files as deleted, as deleted with error, or as scheduled for another deletion attempt. Once all files of a dataset have been deleted from the LFC, the dataset is marked as deleted from the LFC.
Storage cleaner: deletes files from the mass storage systems. Depending on the result of the deletion, files are set to the state deleted or marked for another attempt. To balance the load on the storage systems, delays are inserted between deletion operations. Once all files of a dataset have been deleted from storage, the dataset is marked as deleted from storage. A dataset marked as deleted both from storage and from the LFC is marked as successfully removed.
Many parameters, such as the chunk size for bulk operations and the delays between operations, are configurable per site. This makes it possible to define a different deletion strategy for each site.

'Grace period'
For safety reasons, and as insurance against human mistakes, a grace period has been implemented in such a way that deletion requests still within their grace period can easily be cancelled.

'Storage plug-ins'
Interaction with different mass storage systems can be done via different protocols, and using a storage-specific protocol can significantly increase the performance of the operations. The idea of a 'storage plug-in' is to be able to implement the same deletion methods for any protocol type (an illustrative plug-in interface is sketched below).

Integration with other services
Blacklisting service – provides information about blocked endpoints;
Consistency service – verifies and recovers broken datasets;
Victor – automatic cleanup of sites.
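The poster itself contains no code. As a rough illustration of the agent logic described above, the following minimal Python sketch models the three steps (resolver, catalog cleaner, storage cleaner) together with the per-site chunk-size and delay parameters; all class and function names here are hypothetical and do not come from the DQ2 code base.

```python
# Minimal sketch (not DQ2 code): three cooperating steps of a deletion agent.
# File states and per-site tuning parameters follow the description above;
# every identifier is an illustrative assumption.
import time
from dataclasses import dataclass, field
from enum import Enum


class FileState(Enum):
    WAITING = "waiting"            # resolved, not yet processed
    CATALOG_DELETED = "catalog"    # removed from the LFC-like catalog
    STORAGE_DELETED = "storage"    # removed from the storage system
    ERROR = "error"                # failed, may be retried later


@dataclass
class SiteConfig:
    chunk_size: int = 100   # files per bulk operation
    delay: float = 0.5      # pause between bulk operations (seconds)


@dataclass
class DatasetRequest:
    name: str
    files: dict = field(default_factory=dict)   # filename -> FileState


def resolver(dataset_name, catalog):
    """Resolve dataset content (list of files) and store it for processing."""
    request = DatasetRequest(dataset_name)
    for fname in catalog.get(dataset_name, []):
        request.files[fname] = FileState.WAITING
    return request


def catalog_cleaner(request, cfg, delete_from_catalog):
    """Delete files from the catalog in chunks; mark the result per file."""
    waiting = [f for f, s in request.files.items() if s is FileState.WAITING]
    for i in range(0, len(waiting), cfg.chunk_size):
        for fname in waiting[i:i + cfg.chunk_size]:
            ok = delete_from_catalog(fname)
            request.files[fname] = (FileState.CATALOG_DELETED if ok
                                    else FileState.ERROR)
        time.sleep(cfg.delay)   # throttle bulk operations


def storage_cleaner(request, cfg, delete_from_storage):
    """Delete catalog-cleaned files from storage, pausing to balance load."""
    for fname, state in request.files.items():
        if state is FileState.CATALOG_DELETED:
            if delete_from_storage(fname):
                request.files[fname] = FileState.STORAGE_DELETED
            time.sleep(cfg.delay)


if __name__ == "__main__":
    fake_catalog = {"dataset.A": ["f1", "f2", "f3"]}
    cfg = SiteConfig(chunk_size=2, delay=0.0)
    req = resolver("dataset.A", fake_catalog)
    catalog_cleaner(req, cfg, delete_from_catalog=lambda f: True)
    storage_cleaner(req, cfg, delete_from_storage=lambda f: True)
    # A dataset counts as "successfully removed" once every file has left
    # both the catalog and the storage system.
    print(all(s is FileState.STORAGE_DELETED for s in req.files.values()))
```

In the real service the three steps run as independent parallel processes against the Oracle database rather than in-memory structures; the sketch only shows how the per-file states and per-site parameters drive the pipeline.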
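Similarly, the 'storage plug-in' idea can be illustrated with a small Python sketch of one common deletion interface implemented per protocol; the class names and the protocol selection below are assumptions, not the actual DQ2 plug-in API.

```python
# Minimal sketch (not the actual DQ2 interface): a common deletion method
# implemented by protocol-specific plug-ins. All names are illustrative.
from abc import ABC, abstractmethod


class StoragePlugin(ABC):
    """Common interface exposed by every storage plug-in."""

    @abstractmethod
    def delete(self, surl: str) -> bool:
        """Delete one replica identified by its storage URL; True on success."""


class SRMPlugin(StoragePlugin):
    def delete(self, surl: str) -> bool:
        # A real plug-in would call the SRM client here.
        print("SRM delete: %s" % surl)
        return True


class XRootDPlugin(StoragePlugin):
    def delete(self, surl: str) -> bool:
        # A real plug-in would use the storage-specific protocol for speed.
        print("xrootd delete: %s" % surl)
        return True


def plugin_for(endpoint_protocol: str) -> StoragePlugin:
    """Pick the plug-in for a given endpoint protocol; fall back to SRM."""
    plugins = {"srm": SRMPlugin, "root": XRootDPlugin}
    return plugins.get(endpoint_protocol, SRMPlugin)()


if __name__ == "__main__":
    plugin_for("root").delete("root://example.host//path/to/file")
```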
Deletion Monitoring
Deletion monitoring is a web application based on the Django framework which provides live graphical and statistical reports about the deletion process at the ATLAS sites involved. The information is available at the cloud, site and endpoint levels, and statistics can be selected for different periods: the last hour, the last 4 hours, the last 24 hours or the last week. There is also a future-plan report, which gives the number of datasets to be deleted in the next 24 hours. In addition to the graphical reports, the monitoring generates an expandable table with the number of waiting/resolved/queued/deleted datasets, the number of files deleted, the gigabytes deleted and the number of errors. Dataset and error browsers are also provided. The information is retrieved via jQuery AJAX calls and uses the BBQ plug-in to maintain history and bookmarks (a minimal view sketch is given at the end of this page).

Statistics
The Deletion Service serves around 200 sites. It deletes up to 20,000,000 files per week, which corresponds to up to 2 PB.

References
DQ2: https://twiki.cern.ch/twiki/bin/view/Atlas/DistributedDataManagement
SRM V2: https://sdm.lbl.gov/srm-wg/
dCache: http://www.dcache.org/
CASTOR: http://castor.web.cern.ch/castor/
Apache: http://www.apache.org
Django: http://www.djangoproject.com
Matplotlib: http://matplotlib.sourceforge.net/

Reported at the International Conference on Computing in High Energy and Nuclear Physics (CHEP), Taipei, 18-22 October 2010.
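As a final illustration, the kind of endpoint polled by the monitoring page could look roughly like the Django view sketched below; the URL parameter, the counters and the commented-out model query are assumptions and not the production monitoring code.

```python
# Minimal sketch (not the production monitoring code): a Django view returning
# deletion statistics for one site as JSON, suitable for polling by the jQuery
# AJAX calls mentioned above. Model and field names are assumptions only.
import json
from datetime import datetime, timedelta

from django.http import HttpResponse


def site_stats(request, site_name):
    """Aggregate deletion numbers for one site over a selectable period."""
    hours = int(request.GET.get("hours", 24))      # 1 / 4 / 24 / 168
    since = datetime.utcnow() - timedelta(hours=hours)
    # In the real service these counters would be read from the deletion
    # database, e.g. something like:
    #   qs = DeletionRecord.objects.filter(site=site_name, finished__gte=since)
    stats = {
        "site": site_name,
        "since": since.isoformat(),
        "datasets_waiting": 0,     # qs.filter(state="waiting").count()
        "files_deleted": 0,        # qs.filter(state="deleted").count()
        "gigabytes_deleted": 0.0,  # sum of deleted file sizes
        "errors": 0,               # qs.filter(state="error").count()
    }
    return HttpResponse(json.dumps(stats), content_type="application/json")
```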