Consolidation of Grid operations

Costin Grigoras, ALICE Offline

Presentation Transcript


  1. Consolidation of Grid operations
     Costin Grigoras, ALICE Offline

  2. Preamble
     • During steady LHC operation, Grid usage is constant and high and, as foreseen, the Grid is used for massive RAW and MC production and also (quite successfully) for end-user analysis
     • To help Grid users and administrators, many applications were developed in the early years of the Grid. ALICE has made an effort to consolidate all of them into a coherent set of monitoring and control tools
     • The following presentation is a quick overview of some of them

  3. Central production management - LPM
     • Speed is of the essence: RAW reconstruction promptly follows data taking, allowing for immediate QA and physics analysis
     • LPM (Lightweight Production Manager)
     • Several triggers ensure the integrity of RAW and conditions data
     • Fully automatic
     • Also replicates RAW data to the T1s
     • Manages not only Pass1 but all central RAW and MC productions, as well as the organized analysis trains
     • To date, 360 production cycles have been handled by LPM

  4. Dependent tasks - LPM chains
     • Data processing jobs that must be launched only when a previous process has completed successfully
     • For example, the QA tasks are ‘cascaded’ after Pass1 RAW reconstruction is completed
     • The same applies to AOD production and data merging
     • The depth of cascading is unlimited
     • This considerably speeds up data production!
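Cascading tasks in this way amounts to launching each task only after all of its predecessors are done. The sketch below illustrates the idea with a generic topological ordering (Kahn's algorithm); it is not LPM's actual code, and the input format (a mapping from each task to the tasks it waits for) and the task names are hypothetical.

```python
from collections import defaultdict, deque


def cascade_order(deps):
    """Return an order in which every task comes after all tasks it waits for.

    deps: dict mapping a task name to the list of tasks that must complete
    first (hypothetical input format). Any depth of cascading is supported.
    Raises ValueError if the dependencies contain a cycle.
    """
    tasks = set(deps)
    for parents in deps.values():
        tasks.update(parents)
    children = defaultdict(list)
    pending = {t: 0 for t in tasks}  # number of unfinished predecessors
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
            pending[task] += 1
    ready = deque(sorted(t for t in tasks if pending[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)  # "launch" task: all its predecessors are done
        for child in sorted(children[task]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order
```

For example, with the illustrative chain `{"QA": ["Pass1"], "QA_merge": ["QA"], "AOD": ["Pass1"], "AOD_merge": ["AOD"]}` the function returns `['Pass1', 'AOD', 'QA', 'AOD_merge', 'QA_merge']`: both branches wait for Pass1, and each merge waits for its own branch.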

  5. LPM chains logic
     [Diagram: Reco. (1 job/chunk); when complete, start in parallel:
        QA (1 job/chunk) → QA merging → delete partial output
        Merge ROOT tags
        AOD (1 job/chunk) → AOD merging → delete partial output
      Error jobs are resubmitted.]
     • The same mechanism is also used for Monte Carlo productions and for analysis trains on MC and RAW data
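The chain logic on this slide can be sketched as follows, under loudly stated assumptions: `run_job` is a hypothetical callback standing in for a Grid job, the retry limit is an arbitrary illustrative value, "merging" is reduced to a string, and the ROOT tag merging step is left out. Only the control flow is meant to mirror the diagram: one job per chunk, resubmission of error jobs, and parallel QA and AOD branches once reconstruction is complete.

```python
from concurrent.futures import ThreadPoolExecutor


def run_per_chunk(stage, chunks, run_job, max_attempts=3):
    """Run one job per chunk, resubmitting failed jobs.

    run_job(stage, chunk, attempt) is a hypothetical callback that returns the
    job's output, or None if the job failed. Raises RuntimeError if a chunk
    still fails after max_attempts submissions.
    """
    outputs = []
    for chunk in chunks:
        for attempt in range(1, max_attempts + 1):
            out = run_job(stage, chunk, attempt)
            if out is not None:
                outputs.append(out)
                break
        else:  # no break: every attempt failed
            raise RuntimeError(f"{stage}: chunk {chunk} failed {max_attempts} times")
    return outputs


def run_branch(stage, chunks, run_job):
    """Per-chunk jobs followed by a merge; partial outputs are then dropped."""
    partial = run_per_chunk(stage, chunks, run_job)
    merged = f"{stage}_merged[{len(partial)}]"
    partial.clear()  # placeholder for "delete partial output"
    return merged


def run_chain(chunks, run_job):
    """Reco first; once it is complete, QA and AOD branches run in parallel."""
    run_per_chunk("reco", chunks, run_job)
    with ThreadPoolExecutor(max_workers=2) as pool:
        qa = pool.submit(run_branch, "qa", chunks, run_job)
        aod = pool.submit(run_branch, "aod", chunks, run_job)
        return qa.result(), aod.result()
```

Because the QA and AOD branches are only submitted after `run_per_chunk("reco", ...)` returns, a reconstruction chunk that keeps failing stops the whole chain before any dependent work starts.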

  6. LPM chains logic – example
     • Parallel productions are possible
     • With different weights / priorities
     • Branches can be temporarily disabled
     • Tasks can be simple JDLs or more complex code that prepares the execution (creating collections, checking conditions)
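The slide does not say how weights are turned into job submissions. One plausible reading, shown here only as an assumption, is to split the free job slots among enabled productions in proportion to their weights, with temporarily disabled branches getting nothing. The function and production names are hypothetical.

```python
def share_slots(slots, productions):
    """Split free job slots among enabled productions in proportion to weight.

    productions: dict name -> (weight, enabled). Disabled branches and
    zero weights get nothing. Largest-remainder rounding makes the shares
    add up exactly to `slots`.
    """
    active = {n: w for n, (w, on) in productions.items() if on and w > 0}
    total = sum(active.values())
    if total == 0:
        return {}
    exact = {n: slots * w / total for n, w in active.items()}
    share = {n: int(x) for n, x in exact.items()}
    leftover = slots - sum(share.values())
    # give the remaining slots to the largest fractional parts
    by_remainder = sorted(active, key=lambda n: exact[n] - share[n], reverse=True)
    for n in by_remainder[:leftover]:
        share[n] += 1
    return share
```

For instance, `share_slots(100, {"RAW_pass2": (3, True), "MC_prod": (1, True), "train": (2, False)})` gives 75 slots to `RAW_pass2`, 25 to `MC_prod` and none to the disabled `train` branch.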

  7. Integration of Grid status monitoring
     • Monitoring data (MonALISA) is used to trigger LPM activity
     • New jobs are submitted when the number of waiting tasks falls below a threshold
     • Pre-staging of data from tape is triggered before the reconstruction jobs are submitted
     • Running jobs are tracked individually for resource usage
     • Automatic alerts are raised for unreasonable disk/memory/CPU consumption; jobs can be terminated…
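A minimal sketch of one threshold-driven monitoring cycle follows. It assumes that the waiting-job count comes from monitoring and that `prestage` and `submit` are hypothetical callbacks standing in for the tape-recall request and the actual job submission; the batch size is an illustrative parameter, not an LPM setting.

```python
def top_up(waiting, threshold, batch, pending, prestage, submit):
    """One monitoring cycle of threshold-driven submission.

    waiting: number of waiting jobs reported by monitoring.
    pending: list of inputs not yet submitted (consumed from the front).
    prestage / submit: hypothetical callbacks for tape recall and job submission.
    Returns the number of jobs submitted in this cycle.
    """
    if waiting >= threshold:
        return 0  # enough work queued already
    todo = pending[:batch]
    del pending[:batch]
    if todo:
        prestage(todo)  # request tape recall before the jobs are submitted
    for item in todo:
        submit(item)
    return len(todo)
```

Calling this once per monitoring update keeps the queue topped up without flooding it: nothing is submitted while the waiting count is at or above the threshold.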

  8. Resource usage alerts
     • Currently triggered at 2 GB RSS
     • Mail is sent to both the admins and the user
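The alert rule can be sketched as a simple filter over per-job resource records. The record layout (`id`, `user`, `rss` in bytes) and the admin list are assumptions made for illustration, and the "2GB" limit is interpreted here as 2 GiB.

```python
GIB = 1024 ** 3
RSS_LIMIT = 2 * GIB  # "2GB" from the slide, interpreted here as 2 GiB (assumption)


def rss_alerts(jobs, admins, limit=RSS_LIMIT):
    """Return (job_id, recipients) for every job whose RSS exceeds `limit`.

    jobs: iterable of dicts with hypothetical keys "id", "user", "rss" (bytes).
    Recipients are the admins plus the job owner.
    """
    alerts = []
    for job in jobs:
        if job["rss"] > limit:
            recipients = sorted(set(admins) | {job["user"]})
            alerts.append((job["id"], recipients))
    return alerts
```

Each alert lists both the admins and the job's owner, matching the two recipients named on the slide; the actual mail delivery is left out.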

  9. Opportunistic storage discovery
     • A client-to-storage metric allows the automatic discovery of the closest (working) storage elements from every job
     • Based on the network topology information collected by MonALISA
     • Continuous functional tests of storages
     • SE occupancy status
     • Users specify the number of output replicas and the type of storage (disk, custodial), but not the SEs
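The slide does not define the client-to-storage metric itself. The sketch below simply assumes a precomputed distance per SE (smaller means closer, as might be derived from the MonALISA topology data) and combines it with the functional-test result and occupancy to pick the requested number of replicas of the requested storage type. All field names, SE names and the free-space cut are hypothetical.

```python
def pick_storage(ses, distance, replicas, kind, min_free=0.05):
    """Pick the closest working SEs of the requested type for a job's output.

    ses: list of dicts with hypothetical keys "name", "kind" ("disk" or
    "custodial"), "ok" (passes functional tests) and "free" (free fraction).
    distance: dict SE name -> client-to-storage distance (smaller is closer).
    Returns at most `replicas` SE names, closest first.
    """
    usable = [se for se in ses
              if se["kind"] == kind and se["ok"] and se["free"] > min_free]
    # closest first; among equally close SEs prefer the emptier one
    usable.sort(key=lambda se: (distance[se["name"]], -se["free"]))
    return [se["name"] for se in usable[:replicas]]
```

Since the user only states how many replicas and which storage type are wanted, a failing or nearly full SE is skipped automatically and the next-closest working one is chosen instead.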

  10. [Figure panels: France, Nordic Countries, Italy, Russia, USA]

  11. User catalogue and job management
     • Web-based access to the AliEn catalogue (with certificate authentication)
     • Insert your favorite plugin (ROOT) here

  12. Catalogue browser – view and edit
     • Viewer with syntax highlighting and catalogue links
     • SE discovery syntax is highlighted

  13. Jobs management
     • Full job tracking, with submission and resubmission capabilities

  14. Jobs management
     • Detailed view of a particular masterjob
     • All trace logs can be accessed online

  15. Summary
     • The Grid has been in full production mode for almost one year
     • Its operation is very successful, providing millions of CPU days and PBs of storage
     • To use these resources efficiently, consolidated tools are needed

  16. Thanks a lot for your attention! Questions please? http://alimonitor.cern.ch/