
Grid Accounting Status at INFN

Learn about CPU and storage accounting in a distributed computing environment, using DGAS to track resource usage for users, VO managers, site managers, and ROC managers, ensuring security and privacy.


Presentation Transcript


  1. Grid Accounting Status at INFN Riccardo Brunetti INFN-TORINO

  2. Why Accounting ? • CPU Accounting and Storage Accounting • DGAS Features • DGAS Components • Security and Privacy • Storage Accounting • DGAS Deployment in the Italian Grid • Work in Progress Summary

  3. In a distributed computing environment people want to know who used the resources and how many resources have been used. • Users : How many resources am I using ? • VO managers : How many resources is my VO using ? • Site Managers : Who is using my resources ? • ROC Managers : How many resources have been used in my federation and who used them ? A good accounting system should be able to answer these questions while taking care of all the security and privacy issues. Why Accounting ?

  4. Is the task of collecting usage metering records in terms of computing resources used and/or some other derived quantity (e.g. SpecInt(SpecFloat)/t) • Primary source of information : CE gatekeeper log + batch system logs • Already deployed in InfnGrid (though still evolving) by means of the DGAS software package CPU Accounting

  5. Is the task of collecting usage metering records in terms of storage space used and/or some other derived quantity • Primary source of information: transfer services logs + storage system logs • Work in progress (an activity started recently) in InfnGrid to define the specifications. Storage Accounting

  6. DGAS is a distributed accounting system able to perform resource usage metering and Economic Accounting (possibly as a basis for billing) in the Grid environment. It is based on a client/server infrastructure relying on a network of independent accounting servers. Developed within the EDG/WP1 and EGEE/JRA1 projects by INFN-TORINO people (A. Guarise, R. Piro, G. Patania) What is DGAS ?

  7. Sensors on CEs • Build usage records from LRMS accounting files • Resource (site) HLRs (Multilevel structure) • Collect usage records from one or more sites • User (VO) HLRs • Collect usage records for a whole VO • Query clients and visualization tools • Allow data to be retrieved from HLRs DGAS Components

  8. Granularity • Resource accounting at single job level or in aggregate form per user, per VO, per resource (site) or per infrastructure (collection of sites). • Scalability • Arbitrary number of Resource/VO HLRs can be deployed. • Hierarchical Design • HLRs can be interconnected, in order to have multiple levels of aggregation. DGAS Features
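The granularity levels above (single job, per user, per VO, per site) can be illustrated with a small aggregation sketch. This is not DGAS code: the record fields (`user`, `vo`, `site`, `cpu_s`, `wall_s`) and the sample values are hypothetical and chosen only to show how job-level records roll up into aggregates.

```python
from collections import defaultdict

# Hypothetical job-level usage records. Field names and values are
# illustrative only and do not reflect the actual DGAS schema.
RECORDS = [
    {"user": "alice", "vo": "atlas", "site": "Torino", "cpu_s": 3600, "wall_s": 4000},
    {"user": "bob", "vo": "atlas", "site": "Milano", "cpu_s": 1800, "wall_s": 3600},
    {"user": "alice", "vo": "atlas", "site": "Torino", "cpu_s": 600, "wall_s": 900},
    {"user": "carol", "vo": "cms", "site": "Torino", "cpu_s": 7200, "wall_s": 7500},
]


def aggregate(records, key):
    """Sum job count, CPU time and wall time, grouped by the given field."""
    totals = defaultdict(lambda: {"jobs": 0, "cpu_s": 0, "wall_s": 0})
    for record in records:
        bucket = totals[record[key]]
        bucket["jobs"] += 1
        bucket["cpu_s"] += record["cpu_s"]
        bucket["wall_s"] += record["wall_s"]
    return dict(totals)


if __name__ == "__main__":
    for level in ("user", "vo", "site"):
        print(level, aggregate(RECORDS, level))
```

The same function serves every aggregation level because only the grouping key changes; an aggregate per infrastructure would simply sum over a collection of sites.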

  9. Completeness • Capability to collect information both for grid and local jobs. • Accounting works with all the EGEE (gLite and LCG) RBs • Customization • Possibility to choose which resources and/or types of jobs must be accounted. • Designed to perform pricing (not used) • Possibility to assign “computing credits” (something similar to quotas) that can be used as a basis for billing the resource consumption. DGAS Features II

  10. DGAS Workflow [diagram; labels: job, CE, WN, Usage Record, Site HLR, VO HLR, L2 HLR; numbered steps 1, 2, 3]

  11. Information is retrieved by parsing the LRMS accounting log files and the CE gatekeeper log file • LRMS log files : resources used (cpu-time,wall-time,lrmsid,user,group) • gatekeeper log files : grid info (user DN, grid-jobid,VO) • Some other config files are used in order to manage local jobs, for which the VO cannot be determined from the proxy certificate. Information retrieval
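The merge of the two log sources described above can be sketched as a join. This is a minimal illustration, not the DGAS sensor code: it assumes that the log files have already been parsed into dictionaries, that the gatekeeper entries carry the `lrmsid` needed to match them to batch-system entries, and that the extra configuration is a simple mapping from local group to VO. All of these names are hypothetical.

```python
def build_usage_records(lrms_entries, gatekeeper_entries, local_group_to_vo):
    """Join batch-system usage with grid identity information.

    lrms_entries: dicts with cpu_time, wall_time, lrmsid, user, group.
    gatekeeper_entries: dicts with lrmsid, user_dn, grid_jobid, vo
        (the presence of lrmsid here is an assumption of this sketch).
    local_group_to_vo: hypothetical config mapping a local group to a VO,
        used for jobs that never went through the gatekeeper.
    """
    grid_info = {entry["lrmsid"]: entry for entry in gatekeeper_entries}
    records = []
    for job in lrms_entries:
        record = dict(job)
        info = grid_info.get(job["lrmsid"])
        if info is not None:
            # Grid job: identity comes from the gatekeeper log.
            record.update(user_dn=info["user_dn"], grid_jobid=info["grid_jobid"],
                          vo=info["vo"], job_type="grid")
        else:
            # Local job: no proxy certificate, so fall back to the config mapping.
            record.update(user_dn=None, grid_jobid=None,
                          vo=local_group_to_vo.get(job["group"]), job_type="local")
        records.append(record)
    return records


if __name__ == "__main__":
    lrms = [
        {"lrmsid": "101", "user": "atl001", "group": "atlas", "cpu_time": 3500, "wall_time": 3600},
        {"lrmsid": "102", "user": "rossi", "group": "theory", "cpu_time": 100, "wall_time": 120},
    ]
    gatekeeper = [
        {"lrmsid": "101", "user_dn": "/C=IT/O=INFN/CN=Example User",
         "grid_jobid": "https://rb.example.org:9000/abc", "vo": "atlas"},
    ]
    for rec in build_usage_records(lrms, gatekeeper, {"theory": "local-theory"}):
        print(rec)
```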

  12. Information confidentiality is guaranteed by the use of different authorization levels to access the Usage Records. • Users (can access their own detailed records and aggregates) • Site Managers (can access their own site's detailed records and aggregates) • VO Managers (can access detailed records and aggregates of all VO members) • Full VOMS integration in query authorization is available (now on L2 HLR, on every HLR in future releases) • (e.g. /atlas/Role=vomanager/Group=NULL) • Security and integrity of the data flow are guaranteed by the use of GSI and data encryption. • No sensitive information is sent in clear text Security and Privacy
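The three authorization levels listed above amount to a per-record access check. The sketch below is only a model of that rule: the role strings, the `requester` structure and the record fields are hypothetical, and the real system derives roles from VOMS attributes and GSI credentials rather than from a plain dictionary.

```python
def can_read(requester, record):
    """Return True if the requester may see this detailed usage record.

    requester: hypothetical dict with a "role" plus the attribute that the
        role is scoped by ("dn", "sites" or "vo").
    record: hypothetical usage record with user_dn, site and vo fields.
    """
    role = requester["role"]
    if role == "user":
        # Users see only their own records.
        return record["user_dn"] == requester["dn"]
    if role == "site_manager":
        # Site managers see records produced at the sites they manage.
        return record["site"] in requester["sites"]
    if role == "vo_manager":
        # VO managers see records of every member of their VO.
        return record["vo"] == requester["vo"]
    # Unknown roles get no access.
    return False


def visible_records(requester, records):
    """Filter a list of records down to those the requester may read."""
    return [r for r in records if can_read(requester, r)]
```

Denying access by default for any unrecognised role keeps the check conservative, which matches the confidentiality goal stated on the slide.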

  13. DGAS Usage Records tables can be converted into APEL LCGRecords table structure. • This instance of the table can then be sent to the GOC accounting database through R-GMA. This can be performed using the already existing APEL publisher. • The user credentials are sent in encrypted form • The translation tool (Dgas2Apel) is already tested and working. • Possibility to choose which records must be converted and then sent (resource, grid/local jobs) • Still some stability problems with R-GMA, but already running at T2 and T1 sites. Interface to APEL

  14. Different types of Storage Elements: • classic SE (gridftp + rfio) • dCache SE (gridftp + gsidcap) • dpm SE (gridftp + srfio) • castor (gridftp + (s)rfio or gsidcap) • storm (gridftp + posix-like access) • Need to cross check different log files in order to collect both storage and user/VO information • Some types of SE do not log all types of operations • At the moment no standard specifications are available for “usage record” • Used space ? • Time of permanence on SE ? • Stored files only or transferred files too ? • ….. Storage Accounting

  15. It is a software architecture to monitor the storage space used (usage metering). • It works on Disk Pool Manager (DPM) based SE • No modifications to DPM required • Generates Usage Records which refer to disk usage • Usage Records are built by looking at GridFTP-DPM and RFIO log files • DPM internal DB maintains history of operations, certificates, TURLs, etc. • It is foreseen to forward storage Usage Records to DGAS HLRs as well. Catania:SAGE (Storage Accounting in a Grid Environment)(F.Scibilia, C. Cherubino, D. Russo)

  16. Table DPM_REQ: information on the user requesting the operation • Tables containing specific information on the type of operation (PUT, GET or COPY) SAGE data collection

  17. Castor: • Monitoring of data transfers from gridftp log • User info from “messages” log • Info stored in a local db • Available information: • Type of operation • Transferred files • Hosts involved • User Info • Timing information Bari:Castor & dCache monitoring/accounting(G. Cuscela, G. Donvito)

  18. dCache: • Information mainly from “billing” files • Local db used • Available information: • File “storage-class” • Type of operation (put, get, local access) • Hosts involved (pool node & remote host) • User, Group, VO • User DN • Bytes transferred and protocol • File deletion Bari:Castor & dCache monitoring/accounting(G. Cuscela, G. Donvito)

  19. DGAS Deployment in the Italian Grid • DGAS deployed in 43 sites (RPMs+YAIM) • L1 HLR in 1 T1 site (CNAF-T1) • L1 HLR in 9 T2 Sites + Padova and Catania (2 of them registering data for small T3 sites)

  20. L2 HLR in 1 Site (Torino) collecting data for T2 sites (Torino,Milano,Catania,Frascati,Pisa,Bari,Napoli) • Roma1,Legnaro will be added soon DGAS Deployment Italian Grid (L2 HLR) . INFN-ROMA 1-2-3

  21. Data collected by site L1 HLRs have been verified by means of a cross check with LRMS accounting log files • Once a site has been validated, it is allowed to send data to the II level HLR. • So far, validation has been successful for 7 T2 sites. • 3 are pending (including T1) because of : • a known problem of DGAS with some particular configurations of the LSF batch system. • a newly discovered problem with LSF log rotation (a patched version was just certified and is going to be deployed) • Validation of the small sites (those without their own HLR) is ongoing. • Validated sites have also started to send data to GOC. DGAS data validation
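The slide does not say how the cross check is carried out. One plausible form, sketched below under that caveat, compares the set of jobs and the CPU time per job stored in the HLR against what the LRMS accounting logs report for the same period. The field names `lrmsid` and `cpu_s` are hypothetical.

```python
def cross_check(hlr_records, lrms_records):
    """Compare HLR contents with LRMS accounting data, job by job.

    Both inputs are lists of dicts with hypothetical fields "lrmsid" and
    "cpu_s". The result lists jobs missing from the HLR, jobs the HLR has
    but the LRMS log does not, and jobs whose CPU time differs.
    """
    hlr = {r["lrmsid"]: r["cpu_s"] for r in hlr_records}
    lrms = {r["lrmsid"]: r["cpu_s"] for r in lrms_records}
    missing = sorted(set(lrms) - set(hlr))
    unexpected = sorted(set(hlr) - set(lrms))
    mismatched = sorted(j for j in set(hlr) & set(lrms) if hlr[j] != lrms[j])
    return {
        "missing": missing,
        "unexpected": unexpected,
        "mismatched": mismatched,
        "valid": not (missing or unexpected or mismatched),
    }
```

A failure mode such as the LSF log rotation problem mentioned above would, in this model, show up as entries in `missing`, since jobs present in the batch logs would never reach the HLR.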

  22. Site level Information from L1 HLR Aggregate per VO Examples Aggregate per User

  23. HLR Query Client: Aggregate (job number, hours of CPU(WALL) time, ”efficiency”) for the VO running jobs in Torino. Examples
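The aggregate shown by the query client (job number, hours of CPU and wall time, "efficiency") can be reproduced with a few lines. The slide does not define "efficiency"; the sketch below assumes the common ratio of CPU time to wall-clock time, and the record fields are hypothetical.

```python
def vo_summary(records, vo, site):
    """Summarise the jobs of one VO at one site.

    records: dicts with hypothetical fields vo, site, cpu_s and wall_s.
    Efficiency is taken to be CPU time divided by wall time, which is an
    assumption of this sketch rather than a definition from the slides.
    """
    selected = [r for r in records if r["vo"] == vo and r["site"] == site]
    cpu_s = sum(r["cpu_s"] for r in selected)
    wall_s = sum(r["wall_s"] for r in selected)
    return {
        "jobs": len(selected),
        "cpu_hours": cpu_s / 3600,
        "wall_hours": wall_s / 3600,
        # Guard against division by zero when no wall time was recorded.
        "efficiency": cpu_s / wall_s if wall_s else None,
    }


if __name__ == "__main__":
    sample = [
        {"vo": "atlas", "site": "Torino", "cpu_s": 7200, "wall_s": 9000},
        {"vo": "atlas", "site": "Torino", "cpu_s": 3600, "wall_s": 5400},
        {"vo": "cms", "site": "Torino", "cpu_s": 100, "wall_s": 200},
    ]
    print(vo_summary(sample, "atlas", "Torino"))
```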

  24. Multi Site Information from L2 HLR Jobs per site (T1 included) Examples Jobs per VO (T1 included)

  25. Multi Site Information from L2 HLR Examples Jobs per day (T1 included)

  26. Web Interface to II Level HLR(Work in progress) F. Pescarmona S. Dalpra F. Rosso G. Misurelli E. Fattibene G. Patania

  27. Shows accounting data in aggregate form • A set of predefined aggregates is built using data on the II level HLR • The user is identified by means of their certificate and is allowed to plot charts according to their own VO role. • Capability to completely customize the queries is foreseen (authorizations need to be handled carefully) Web Interface to II Level HLR

  28. General information about DGAS can be found at: • DGAS website: http://www.to.infn.it/grid/accounting/ • DGAS User Guides: https://edms.cern.ch/cedar/plsql/doc.info?cookie=3881073&document_id=571271&version=1 DGAS References
