
Data Management Highlights in TSA3.3 Services for HEP

This presentation highlights data management strategies and challenges in the High Energy Physics field, including accounting, consistency between storage elements and file catalogues, distributed data management, and site cleaning.

Presentation Transcript


  1. Data Management Highlights in TSA3.3 Services for HEP Fernando Barreiro Megino, Domenico Giordano, Maria Girone, Elisa Lanciotti, Daniele Spiga on behalf of CERN-IT-ES-VOS and SA3 • EGI Technical Forum – Data management highlights, 22.9.2011

  2. Outline • Introduction: WLCG today • LHCb Accounting • Storage Element and File Catalogue consistency • ATLAS Distributed Data Management: Breaking cloud boundaries • CMS Popularity and Automatic Site Cleaning • Conclusions

  3. Outline • Introduction: WLCG today • LHCb Accounting • Storage Element and File Catalogue consistency • ATLAS Distributed Data Management: Breaking cloud boundaries • CMS Popularity and Automatic Site Cleaning • Conclusions

  4. WLCG today • 4 experiments • ALICE • ATLAS • CMS • LHCb • Over 140 sites • ~150k CPU cores • >50 PB disk • A few thousand users • O(1M) file transfers/day • O(1M) jobs/day

  5. Outline • Introduction: WLCG today • LHCb Accounting • Storage Element and File Catalogue consistency • ATLAS Distributed Data Management: Breaking cloud boundaries • CMS Popularity and Automatic Site Cleaning • Conclusions

  6. LHCb Accounting • An agent generates a daily accounting report from the information available in the bookkeeping system • Metadata breakdown: location, data type, event type and file type • The information is displayed on a dynamic web page • These reports are currently the main input for clean-up campaigns
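The agent itself is not shown on the slide; the following is a minimal sketch of how such a daily aggregation could look, assuming bookkeeping entries have already been retrieved as dictionaries. The field names and the `build_accounting_report` helper are illustrative, not the actual LHCb bookkeeping or DIRAC API.

```python
from collections import defaultdict

def build_accounting_report(bookkeeping_entries):
    """Aggregate storage usage per (location, data type, event type, file type).

    Each entry is assumed to carry the metadata dimensions used in the report
    plus the file size in bytes (hypothetical layout).
    """
    report = defaultdict(lambda: {"files": 0, "bytes": 0})
    for entry in bookkeeping_entries:
        key = (entry["location"], entry["data_type"],
               entry["event_type"], entry["file_type"])
        report[key]["files"] += 1
        report[key]["bytes"] += entry["size"]
    return dict(report)

if __name__ == "__main__":
    # Toy entries, for illustration only
    sample = [
        {"location": "CNAF", "data_type": "DST", "event_type": "90000000",
         "file_type": "ROOT", "size": 2_400_000_000},
        {"location": "CNAF", "data_type": "DST", "event_type": "90000000",
         "file_type": "ROOT", "size": 1_100_000_000},
    ]
    for key, totals in build_accounting_report(sample).items():
        print(key, totals)
```

A real agent would additionally publish these totals to the dynamic web page mentioned above; that part is omitted here.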

  7. Outline • Introduction: WLCG today • LHCb Accounting • Storage Element and File Catalogue consistency • ATLAS Distributed Data Management: Breaking cloud boundaries • CMS Popularity and Automatic Site Cleaning • Conclusions

  8. Storage element and file catalogue consistency • Grid Storage Elements (SEs) are decoupled from the File Catalogue (FC), so inconsistencies can arise • Dark data: data in the SEs but not in the FC (wasted disk space) • Lost/corrupted files: data in the FC but not in the SEs (operational problems, e.g. failing jobs) • Dark data is identified through consistency checks against full storage dumps • One common format and procedure is needed, covering the various SEs (DPM, dCache, StoRM and CASTOR) and the three experiments (ATLAS, CMS and LHCb) • Decision: a text format and an XML format; the required information is space token, LFN (or PFN), file size, creation time and checksum • Storage dumps should be provided weekly/monthly or on demand
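A minimal sketch of the dark-data / lost-file comparison, assuming the storage dump and the catalogue listing for one space token have already been parsed into sets of LFNs; the function and the toy LFNs are illustrative, not the experiments' actual consistency tools.

```python
def check_consistency(storage_dump_lfns, catalogue_lfns):
    """Compare a storage dump against the file catalogue for one space token.

    Dark data  = on the storage element but not in the catalogue (wasted disk).
    Lost files = in the catalogue but not on the storage element (failing jobs).
    """
    dark_data = storage_dump_lfns - catalogue_lfns
    lost_files = catalogue_lfns - storage_dump_lfns
    return dark_data, lost_files

# Toy example: in practice the dump would also carry file size, creation time
# and checksum, which allow corruption checks on files present in both lists.
se_dump = {"/lhcb/data/run1/file_a.dst", "/lhcb/data/run1/file_b.dst"}
catalogue = {"/lhcb/data/run1/file_a.dst", "/lhcb/data/run1/file_c.dst"}

dark, lost = check_consistency(se_dump, catalogue)
print("Dark data:", sorted(dark))
print("Lost/corrupted candidates:", sorted(lost))
```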

  9. Example of good synchronization: LHCb storage usage at CNAF • CNAF provides storage dumps daily • Checks are done centrally with LHCb Data Management tools • Good SE-LFC agreement • Preliminary results: small discrepancies (O(1 TB)) are not a real problem; they can be due to the delay between upload to the SE and registration in the LFC, and to the delay in refreshing the information in the LHCb database

  10. Outline • Introduction: WLCG today • LHCb Accounting • Storage Element and File Catalogue consistency • ATLAS Distributed Data Management: Breaking cloud boundaries • CMS Popularity and Automatic Site Cleaning • Conclusions

  11. Original data distribution model • Hierarchical tier organization based on the MONARC network topology, developed over a decade ago • Sites are grouped into clouds for organizational reasons • Allowed communications: T0-T1 and T1-T1 over the Optical Private Network, and intra-cloud T1-T2 over national networks • Restricted communications (general public network): inter-cloud T1-T2 and inter-cloud T2-T2 • But network capabilities are not the same anymore! • Many use cases require breaking these boundaries!
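To make the restriction concrete, here is a hypothetical check encoding the cloud-based policy described above; the site names, tier numbers and cloud assignments are illustrative, and the real systems do not necessarily implement the rule this way.

```python
def transfer_allowed(src, dst):
    """Original MONARC-style policy sketch: a transfer is allowed if it stays
    on the optical private network (T0/T1 traffic) or within a single cloud
    (national networks). Inter-cloud T1-T2 and T2-T2 traffic is restricted."""
    tiers = {src["tier"], dst["tier"]}
    if tiers <= {0, 1}:               # T0-T1 or T1-T1 over the OPN
        return True
    if src["cloud"] == dst["cloud"]:  # intra-cloud traffic over national networks
        return True
    return False

# Illustrative sites (cloud/tier assignments are examples only)
cern    = {"name": "CERN",    "tier": 0, "cloud": "T0"}
ral     = {"name": "RAL",     "tier": 1, "cloud": "UK"}
glasgow = {"name": "Glasgow", "tier": 2, "cloud": "UK"}
desy    = {"name": "DESY",    "tier": 2, "cloud": "DE"}

print(transfer_allowed(cern, ral))      # True:  T0-T1
print(transfer_allowed(ral, glasgow))   # True:  intra-cloud T1-T2
print(transfer_allowed(glasgow, desy))  # False: inter-cloud T2-T2
```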

  12. Machinery in place • Purpose: generate full-mesh transfer statistics for monitoring, for site commissioning and to feed back into the system

  13. Consequences • Link commissioning: sites optimizing their network connections (e.g. the UK experience, http://tinyurl.com/3p23m2p) • Revealed various network issues, e.g. asymmetric network throughput at several sites (also affecting other experiments) • Definition of T2Ds ("directly connected T2s"): commissioned sites with good network connectivity, which benefit from less restrictive transfer policies • Gradual flattening of the ATLAS Computing Model to reduce limitations on dynamic data placement and on output collection for multi-cloud analysis • Ongoing development of a generic, detailed FTS monitor: FTS servers publish file-level information (CERN-IT-GT), exposed through a generic web interface and API (CERN-IT-ES)
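A hedged sketch of how file-level transfer records could be rolled up into per-link statistics of the kind used for link commissioning and for spotting asymmetric throughput; the record fields and site names are assumptions, not the actual FTS message schema.

```python
from collections import defaultdict

def link_statistics(transfer_records):
    """Aggregate file-level transfer records into per-(source, destination)
    link statistics: success rate and average throughput in MB/s."""
    links = defaultdict(lambda: {"ok": 0, "failed": 0, "mb": 0.0, "seconds": 0.0})
    for rec in transfer_records:
        link = links[(rec["source"], rec["destination"])]
        if rec["state"] == "DONE":
            link["ok"] += 1
            link["mb"] += rec["size_bytes"] / 1e6
            link["seconds"] += rec["duration_s"]
        else:
            link["failed"] += 1

    stats = {}
    for (src, dst), v in links.items():
        total = v["ok"] + v["failed"]
        stats[(src, dst)] = {
            "success_rate": v["ok"] / total,
            "throughput_mb_s": v["mb"] / v["seconds"] if v["seconds"] else 0.0,
        }
    return stats

# Toy records: comparing the (A -> B) and (B -> A) entries for the same site
# pair is one way asymmetric throughput would show up.
records = [
    {"source": "UKI-SCOTGRID-GLASGOW", "destination": "IN2P3-CC",
     "state": "DONE", "size_bytes": 3_000_000_000, "duration_s": 120.0},
    {"source": "IN2P3-CC", "destination": "UKI-SCOTGRID-GLASGOW",
     "state": "FAILED", "size_bytes": 3_000_000_000, "duration_s": 0.0},
]
for link, s in link_statistics(records).items():
    print(link, s)
```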

  14. Outline • Introduction: WLCG today • LHCb Accounting • Storage Element and File Catalogue consistency • ATLAS Distributed Data Management: Breaking cloud boundaries • CMS Popularity and Automatic Site Cleaning • Conclusions

  15. CMS Popularity • To manage storage more efficiently, it is important to know which data (i.e. which files) is accessed most and what the access patterns are • The CMS Popularity service now tracks the utilization of 30 PB of files across more than 50 sites • [Architecture diagram: CRAB (the CMS distributed analysis framework) reports jobs to the Dashboard DB; jobs are pulled and translated into file-level entities (input files, input blocks, lumi ranges) stored in the Popularity DB; the popularity information is exposed through the Popularity web frontend to external systems, e.g. the cleaning agent]
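A minimal sketch of the kind of roll-up such a popularity service performs, assuming job-level access records have already been pulled from the dashboard; the field names and record layout are illustrative, not the real Dashboard or Popularity DB schema.

```python
from collections import defaultdict
from datetime import date

def popularity_by_dataset(access_records):
    """Roll up file-level access records into per-dataset popularity metrics:
    number of accesses, distinct users, and last access date."""
    pop = defaultdict(lambda: {"accesses": 0, "users": set(), "last_access": None})
    for rec in access_records:
        entry = pop[rec["dataset"]]
        entry["accesses"] += rec["num_accesses"]
        entry["users"].add(rec["user"])
        if entry["last_access"] is None or rec["day"] > entry["last_access"]:
            entry["last_access"] = rec["day"]
    return {ds: {"accesses": v["accesses"],
                 "distinct_users": len(v["users"]),
                 "last_access": v["last_access"]}
            for ds, v in pop.items()}

# Toy records, for illustration only
records = [
    {"dataset": "/ExampleDataset/Run2011A/AOD", "user": "alice",
     "num_accesses": 40, "day": date(2011, 9, 1)},
    {"dataset": "/ExampleDataset/Run2011A/AOD", "user": "bob",
     "num_accesses": 15, "day": date(2011, 9, 10)},
]
print(popularity_by_dataset(records))
```

Metrics like these (accesses, distinct users, last access) are exactly what a cleaning agent needs to decide which replicas are unpopular, which is the subject of the next slide.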

  16. CMS Popularity Monitoring

  17. Automatic site cleaning • It is equally important to know which data is not accessed! • Automatic procedures for site clean-up are built around Victor, an agent running daily on a dedicated machine • Inputs: replica popularity (from the Popularity service and PhEDEx) and used/pledged space (from group pledges and PhEDEx) • Workflow: 1. selection of groups filling their pledge on T2s, 2. selection of unpopular replicas, 3. publication of decisions (replicas to delete are sent to PhEDEx; deleted replicas and group-site association information are published to the Popularity web) • Project initially developed for ATLAS, now extended to CMS • Plug-in architecture: a common core plus experiment-specific plug-ins wrapping their Data Management API calls
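A hedged sketch of the plug-in idea: a common core that selects unpopular replicas at sites where a group has filled its pledge, with experiment-specific plug-ins wrapping the data management calls. The class and method names, the units and the `target_free_fraction` parameter are illustrative, not Victor's actual interface.

```python
class ExperimentPlugin:
    """Experiment-specific plug-in wrapping the data management API calls."""
    def used_and_pledged_space(self, site, group):
        raise NotImplementedError
    def replica_popularity(self, site, group):
        """Return a list of (replica, recent_accesses, size_tb) tuples."""
        raise NotImplementedError
    def delete_replicas(self, replicas):
        raise NotImplementedError

def clean_site(plugin, site, group, target_free_fraction=0.1):
    """Common core: if a group has filled its pledge at a site, select the
    least popular replicas until the target free space is recovered."""
    used, pledged = plugin.used_and_pledged_space(site, group)
    if used < pledged:                       # group not over its pledge, nothing to do
        return []
    to_free = used - pledged * (1.0 - target_free_fraction)
    candidates = sorted(plugin.replica_popularity(site, group),
                        key=lambda r: r[1])  # least accessed first
    selected, freed = [], 0.0
    for replica, _accesses, size_tb in candidates:
        if freed >= to_free:
            break
        selected.append(replica)
        freed += size_tb
    plugin.delete_replicas(selected)         # issue deletions via the experiment API
    return selected

class DemoPlugin(ExperimentPlugin):
    """Toy in-memory plug-in, for illustration only."""
    def used_and_pledged_space(self, site, group):
        return 105.0, 100.0                  # TB used, TB pledged
    def replica_popularity(self, site, group):
        return [("replica_A", 0, 8.0), ("replica_B", 120, 5.0), ("replica_C", 2, 7.0)]
    def delete_replicas(self, replicas):
        print("Would delete:", replicas)

print(clean_site(DemoPlugin(), site="T2_XY_Example", group="physics-group"))
```

The split mirrors the slide: the selection logic lives once in the core, while everything that differs between ATLAS and CMS (space accounting, popularity source, deletion calls) sits behind the plug-in interface.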

  18. Outline • Introduction: WLCG today • LHCb Accounting • Storage Element and File Catalogue consistency • ATLAS Distributed Data Management: Breaking cloud boundaries • CMS Popularity and Automatic Site Cleaning • Conclusions

  19. Conclusions • The first two years of LHC data taking were successful • Data volumes and user activity keep increasing • We are learning how to operate the infrastructure efficiently • Common challenges for all experiments: automating daily operations, optimizing the usage of storage and network resources, evolving the computing models, and improving data placement strategies
