October 11th, CNAF GDB Meeting
DGAS: Theoretical aspects & Software description
Andrea Guarise, Rosario M. Piro
www.eu-egee.org
EGEE is a project funded by the European Union under contract INFSO-RI-508833
Introduction
A generic Grid accounting process involves several successive phases:
• Metering: collection of usage metrics on computational resources.
• Usage Accounting: storage of these metrics for further analysis.
• Usage Analysis: production of reports from the available records.
In the case of Economic Accounting / Account Balancing, two more phases apply:
• Pricing: assigning and managing prices for computational resources.
• Billing: assigning a cost to user operations on the Grid and charging the users for them.
In this presentation we briefly describe these steps and give an overview of DGAS, a Distributed Grid Accounting System. (DataGrid is now almost forgotten… time to change the name…)
Metering
The metering phase is probably the most important part of the Grid accounting process, since it is its foundation. During metering, the user payload on a resource needs to be correctly measured and unambiguously assigned to the Grid User who directly or indirectly requested it from the Grid.
This requires the system to collect information from the operating system (or the LRMS) and from the grid middleware. This information forms the Usage Record for the user process. The Usage Record must include at least:
• A unique identifier for the Grid User
• A unique identifier for the Grid Computing Resource
• A unique Grid Job ID
• Metrics of the resource consumption.
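As an illustration of the minimum Usage Record described above, here is a minimal Python sketch. The class name, field names and example values are hypothetical and are not taken from DGAS; they only mirror the three identifiers and the metrics listed on the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class UsageRecord:
    """Minimal usage record: three unique identifiers plus metrics.

    All field names are illustrative assumptions, not the DGAS schema.
    """
    grid_user: str        # unique Grid User identifier
    grid_resource: str    # unique Grid Computing Resource identifier
    grid_job_id: str      # unique Grid Job ID
    metrics: dict = field(default_factory=dict)  # resource consumption


# Example values are made up for the sketch.
record = UsageRecord(
    grid_user="/C=IT/O=Example/CN=Jane Doe",
    grid_resource="ce01.example.org",
    grid_job_id="https://rb.example.org:9000/abc123",
    metrics={"cpu_time_s": 3600, "wall_time_s": 4000},
)
print(record.grid_job_id, record.metrics["cpu_time_s"])
```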
Usage Accounting
The collected usage records need to be properly archived in databases for further analysis. The usage accounting service has a minimum set of requirements:
• Information should be available to:
  • the Users responsible for the payload,
  • the Site Managers of the Grid Resources,
  • the VO administrator of the user.
• It must not be available to anyone else. In other words, the information must be confidential.
• Usage records must be sent encrypted and signed to the accounting services. Signing the Usage Records with the user credential makes them irrefutable when users have to be charged.
• A distributed architecture is essential, as are reliable and fault-tolerant communication mechanisms.
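The confidentiality requirement above can be sketched as a simple access check. This is only an illustration under assumed data structures (a record as a dict, and two hypothetical lookup tables for site managers and VO administrators); it does not show how DGAS actually authorises requests.

```python
def may_read(record, requester, site_managers, vo_admins):
    """Return True if the requester is allowed to see this usage record.

    Allowed: the user responsible for the payload, a manager of the site
    that ran it, or an administrator of the user's VO. Everyone else is
    refused. All structures here are assumptions made for the sketch.
    """
    if requester == record["user"]:
        return True
    if requester in site_managers.get(record["site"], set()):
        return True
    return requester in vo_admins.get(record["vo"], set())


record = {"user": "alice", "site": "site1", "vo": "vo1"}
site_managers = {"site1": {"sam"}}
vo_admins = {"vo1": {"vera"}}

for who in ("alice", "sam", "vera", "mallory"):
    print(who, may_read(record, who, site_managers, vo_admins))
```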
Accounting &/vs. Monitoring
FAQ: Is accounting nothing but monitoring?
• Accounting is about tracing single jobs, file requests, etc. It is about single transactions (and their logical aggregations) associated with accounts.
• Monitoring can use accounting information, but not only that.
Accounting:
• Get/archive the Usage Record for a particular job.
• How many jobs has a given user submitted?
• On which CEs did a given user submit his jobs?
• Jobs submitted/executed per VO, per resource, per site.
• Does a VO use more resources than it provides?
Monitoring:
• What is the actual CPU load on a CE?
• What is the average QWT on a queue?
• What is the storage occupancy on an SE?
• Are services on a CE up and running?
Economic Accounting:
• Does the user have enough credits to submit a job?
• Get account balance.
• Credits spent/earned per VO, per resource.
• Does a VO spend more than it earns?
Usage Analysis
The information archived in the accounting databases is rather complex, and not all 'customers' are interested in all of it. A system is therefore needed that analyses it and produces reports.
Different types of customers are interested in different views of the usage records, for example:
• A user will simply want to know how he used the grid resources.
• A site manager needs to know who used his resources and how, for example the percentage of usage per VO. He also needs to be able to track a single job, for example because it caused some problems.
• A VO manager needs to trace what the VO users are doing on the Grid.
Pricing & billing
Resource owners may want to charge the users, so it may be necessary to establish a cost for the service provided to the user.
A cost is usually computed from the price (Pi) assigned to the unit of usage of a computing resource i (CPU, memory, storage …) and from the usage (Ui) measured for the same resource.
A service is therefore needed that manages the resource prices and communicates them to all the entities involved. The way prices are set contributes to the creation of an economic market that deeply influences the behaviour of the Grid as a dynamic system.
Once the resource consumption is known and a price is assigned to the computational resources, it is possible to define a cost that can be charged to the Grid User. The final cost applied to the user is also influenced by policy issues such as discounts or offers.
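A worked sketch of the cost computation: the slide only says that the cost depends on the prices Pi and the usages Ui, so the simple sum of Pi × Ui used here, and the multiplicative discount, are assumptions made for illustration. The resource names and numbers are made up.

```python
def job_cost(prices, usage, discount=0.0):
    """Cost of a job as the sum over resources i of P_i * U_i.

    prices:   price per unit of usage for each resource (P_i)
    usage:    measured usage for each resource (U_i)
    discount: fraction taken off the total by policy (assumed model)
    """
    base = sum(prices[name] * usage.get(name, 0.0) for name in prices)
    return base * (1.0 - discount)


prices = {"cpu_hours": 2.0, "memory_gb_hours": 0.5}
usage = {"cpu_hours": 3.0, "memory_gb_hours": 4.0}

print(job_cost(prices, usage))                 # 2.0*3.0 + 0.5*4.0
print(job_cost(prices, usage, discount=0.25))  # same, with 25% off
```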
What about Quota Management?
• Quota Management is not the duty of an accounting system, which collects accounting information and provides detailed or aggregate information on user/resource accounts.
• Quota Enforcement, however, like Pricing & Billing, has to rely on irrefutable accounting information that is unequivocally mapped to grid users, resources and jobs.
• Quota Enforcement may be combined with Pricing & Billing (e.g. by assigning a number of virtual "Grid Credits" to each user).
• The integration of DGAS with the gPBox policy management system is currently being studied, in order to handle quota management together with policy management.
DGAS
The Distributed Grid Accounting System was originally developed within the EU DataGrid Project and is now being maintained and re-engineered within the EU EGEE Project.
The purpose of DGAS is to implement Resource Usage Metering, Usage Accounting and Account Balancing (through resource pricing) in a fully distributed Grid environment. It is conceived to be distributed, secure and extensible.
The system is designed so that Usage Metering, Accounting and Account Balancing (through resource pricing) are independent layers.
[Diagram: layers, from bottom to top: Usage Metering → (usage records) → Usage accounting / Usage Analysis → (accounting data) → Account balancing, resource pricing, (billing)]
DGAS: Metering
Usage Metering on Computing Elements is done by lightweight sensors installed on the Computing Elements. These sensors parse PBS/LSF/Torque event logs to build Usage Records that can be passed to the accounting layer.
For reliable accounting of resource usage (essential for billing), it is important that the collected data is unequivocally associated with the unique grid ID of the user (certificate subject/DN), of the resource (CE ID) and of the job (global job ID).
A process that is completely transparent to the Grid User collects the information needed by the accounting layer. This information and the corresponding metrics are sent via an encrypted channel to the Accounting System, signed with the user credentials.
Depending on the site manager's choices, it is also possible to send Usage Records to the accounting layer signed with the resource host credentials.
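To give a feel for what such a sensor does, here is a minimal Python sketch that turns one job-end line of a PBS/Torque-style accounting log into a dictionary of metrics. It is not the DGAS sensor code. The line layout (`date;type;job id;key=value ...`) and the key names follow the PBS/Torque accounting log format as an assumption, the sample line is made up, and the output field names are hypothetical.

```python
def hms_to_seconds(value):
    """Convert an HH:MM:SS duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def parse_job_end_record(line):
    """Return a metrics dict for a job-end ("E") line, else None."""
    timestamp, record_type, lrms_job_id, message = line.strip().split(";", 3)
    if record_type != "E":
        return None
    # The message part is assumed to be space-separated key=value pairs.
    fields = dict(item.split("=", 1) for item in message.split() if "=" in item)
    return {
        "lrms_job_id": lrms_job_id,
        "local_user": fields["user"],
        "local_group": fields["group"],
        "start": int(fields["start"]),
        "end": int(fields["end"]),
        "exit_status": int(fields["Exit_status"]),
        "cpu_time_s": hms_to_seconds(fields["resources_used.cput"]),
        "wall_time_s": hms_to_seconds(fields["resources_used.walltime"]),
    }


# Made-up sample line in the assumed format.
SAMPLE = (
    "10/11/2005 12:00:00;E;1234.ce01.example.org;user=atlas001 group=atlas "
    "queue=short ctime=1129024700 qtime=1129024700 etime=1129024700 "
    "start=1129024800 end=1129024899 Exit_status=0 "
    "resources_used.cput=00:00:10 resources_used.walltime=00:01:39"
)
record = parse_job_end_record(SAMPLE)
print(record["lrms_job_id"], record["cpu_time_s"], record["wall_time_s"])
```

The local LRMS job ID alone is not enough for grid accounting: a real sensor still has to join this data with the grid-level identifiers (user DN, CE ID, global job ID) before the Usage Record is complete.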
DGAS Usage Records
Contents of a Usage Record with the DGAS sensors of gLite 1.4:
• user's certificate subject
• user's FQAN (VOMS certificates)
• user's VO
• local user & group ID
• job ID (both grid job ID and local LRMS ID)
• CE's grid ID
• SpecInt2000 & SpecFloat2000
• number of processors
• CPU time & wall clock time
• physical & virtual memory usage
• accounting timestamp
• execution start/end timestamps
• ctime, qtime, etime
• job's exit status
DGAS: Accounting
The usage of Grid Resources by Grid Users is registered in dedicated servers, called Home Location Registers (HLRs), that manage both user and resource accounts.
In order to achieve scalability, accounting records can be stored on an arbitrary number of independent HLRs. At least one HLR per VO is foreseen, although a finer granularity is possible.
Each HLR keeps the records of all grid jobs submitted or executed by each of its registered users or resources, and can thus provide usage information at several levels of granularity: per user or resource, per group of users or resources, per VO.
Accounting requires usage metering, but not necessarily resource pricing and billing.
If all the relevant information is available, an HLR can also accept Usage Records from a third-party metering layer.
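The granularity levels mentioned above amount to grouping the same records by different keys. The following Python sketch shows the idea on in-memory dictionaries; the record layout and key names are assumptions for illustration, and the real HLR stores its records in a database.

```python
from collections import defaultdict


def aggregate(records, key, metric="cpu_time_s"):
    """Sum one metric over all records, grouped by the given key."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key]] += rec[metric]
    return dict(totals)


# Made-up records, as an HLR might hold them.
records = [
    {"user": "alice", "group": "higgs", "vo": "vo1", "cpu_time_s": 100},
    {"user": "alice", "group": "higgs", "vo": "vo1", "cpu_time_s": 50},
    {"user": "bob", "group": "top", "vo": "vo1", "cpu_time_s": 30},
    {"user": "carol", "group": "bio", "vo": "vo2", "cpu_time_s": 20},
]

print(aggregate(records, "user"))   # per user
print(aggregate(records, "group"))  # per group of users
print(aggregate(records, "vo"))     # per VO
```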
Balancing and Resource Pricing
Resource pricing is done by dedicated Price Authorities (PAs) that may use different pricing algorithms, such as manual setting of fixed prices or dynamic determination of prices according to the state of a resource.
In order to achieve scalability, prices can be established by an arbitrary number of independent PAs. At least one PA per site is foreseen (sites will want to retain control over the pricing of their resources).
Pricing algorithms are dynamically linked by the PA server and can be re-implemented according to the resource owners' needs.
The job cost is determined (by the HLR service) from resource prices and usage records.
Account balancing is done by exchanging virtual credits between the User HLR and the Resource HLR.
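The interplay of pricing algorithms and account balancing can be sketched as follows. Everything here is a hypothetical model: the `HLR` class, the two pricing functions and the rule "price grows with load" are assumptions chosen to illustrate pluggable pricing and the credit exchange, not the DGAS implementation.

```python
class HLR:
    """Toy Home Location Register holding credit balances per account."""

    def __init__(self, name):
        self.name = name
        self.balances = {}

    def adjust(self, account, amount):
        self.balances[account] = self.balances.get(account, 0.0) + amount


def fixed_price(resource_state):
    """Manually set, fixed price per unit of usage."""
    return 1.0


def load_based_price(resource_state):
    """Dynamic price depending on the resource state (assumed rule)."""
    return 1.0 + resource_state["load"]


def settle_job(user_hlr, resource_hlr, user, resource, usage_units, price):
    """Move virtual credits from the user account to the resource account."""
    cost = usage_units * price
    user_hlr.adjust(user, -cost)
    resource_hlr.adjust(resource, cost)
    return cost


user_hlr, resource_hlr = HLR("vo1-hlr"), HLR("site1-hlr")
price = load_based_price({"load": 0.5})  # the pricing algorithm is swappable
cost = settle_job(user_hlr, resource_hlr, "alice", "ce01", 4.0, price)
print(cost, user_hlr.balances, resource_hlr.balances)
```

Passing the pricing function in, rather than hard-coding it, mirrors the point that each site can plug in its own algorithm.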
What about billing/charging?
The Account Balancing provided by DGAS is intentionally generic. It may be used for different use cases, such as:
> Monitoring of overall resource consumption by users and resource contribution by owners.
> Redistribution of credits earned by a VO's resources to the VO's users (for balanced resource sharing between VOs).
> Billing/charging of users after resource usage.
> Credit/quota acquisition by users before resource usage.
The purpose of DGAS is not to define (and hence limit) the economic interactions between users and resource owners, but to provide the necessary means to enable them.
Metering Infrastructure: API
• DGAS provides APIs that each site can use to develop its own metering infrastructure, so that a site can freely decide how to measure its own Usage Records.
• Two APIs are available: both provide a C++ library and a CLI tool. These push the information to the User HLR once the Usage Record for a job is available.
• The only difference is the schema adopted to describe the Usage Records: the first implements a simple, proprietary schema; the second implements the GGF URWG schema, which is more powerful but rather complex to manage.
• A minimum set of information needs to be available in the usage record to make grid accounting possible:
  • The User HLR location (comes with the job JDL)
  • A valid user proxy or host certificate (needed to secure the communication)
  • The CE ID (needed to unambiguously assign the usage)
  • The Grid Job ID (for grid accounting the LRMS job ID is not enough)
  • The user's certificate subject/DN (for charging, but also for monitoring correct usage)
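A site writing its own metering layer could check the minimum set of information before handing a record to the DGAS client. The sketch below is in Python for brevity and does not use the real DGAS C++ library or CLI; the field names are hypothetical labels for the five items listed above.

```python
# Hypothetical names for the five mandatory items on the slide.
REQUIRED_FIELDS = (
    "user_hlr",     # User HLR location (from the job JDL)
    "credential",   # path to a valid user proxy or host certificate
    "ce_id",        # CE ID
    "grid_job_id",  # Grid Job ID (the LRMS job ID is not enough)
    "user_dn",      # user's certificate subject/DN
)


def missing_fields(record):
    """Return the mandatory fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Made-up record with only the local LRMS job ID: not enough.
incomplete = {
    "user_hlr": "hlr.example.org",
    "credential": "/tmp/x509up_u500",
    "ce_id": "ce01.example.org",
    "lrms_job_id": "1234.ce01",
    "user_dn": "/C=IT/O=Example/CN=Jane Doe",
}
print(missing_fields(incomplete))
```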
Metering Infrastructure: GIANDUIA
• GIANDUIA workflow (Gianduia Is A Nice Distributed Usage-metering Infrastructure for Accounting)
[Diagram labels: WN, jobWrapper, Grid level information, LRMS LOG, metrics, CE, GIANDUIA, gianduia conf, gianduiottiBox (async), User or Resource HLR (async)]
DGAS deployment
The DGAS architecture allows flexible deployment schemas. The deployment schema that we suggest starting with is:
• A User HLR per VO (or more, for VOs with a large number of users).
  • It stores the information and Usage Records for the users of the VO. The user specifies the address of his HLR in the job JDL.
• One or more Resource HLRs. There can be one per site, but this is not strictly required: a single HLR can store information for many sites.
  • These store the information and Usage Records for the registered resources. The site manager specifies the address of the Resource HLR in a configuration file on the CE node.
• The Gianduia metering system installed and configured on the Computing Elements, or at least the DGAS client APIs; in the latter case it is the site's responsibility to collect the Usage Records.
• One or more Price Authorities (only if Economic Accounting is desired). There should be one PA for every Resource HLR.
DGAS deployment and flow (1)
[Diagram labels: HLR 1 to HLR 5; six CEs at site1, site2 and site3; VO1, VO2, VO3; Job1 and Job2; Usage Records UR1 (shown twice) and UR2 (shown twice)]
DGAS deployment and flow (2)
[Diagram labels: HLR 1 to HLR 5; six CEs at site1, site2 and site3; VO1, VO2, VO3; Job1 and Job2; Usage Records UR1 (shown twice) and UR2]
Examples of usage statistics
Here we present two graphs. The upper one reports the number of jobs submitted by all the users on a per-day basis. The lower one reports the number of jobs executed on all the resources with the same sampling period. As expected, the two graphs are identical.
DGAS on the prototype testbed
[Figure panels: A User daily CPU graph; Resources daily CPU graph; A User weekly CPU (per hour); Resources daily JOBS]
DGAS vs. APEL (?)
• DGAS and APEL have different aims:
  • DGAS:
    • Focused on storing detailed accounting information and controlling authorised access to it.
    • Provides resource- and user- (VO-) level accounting.
    • Can serve as a basis for economic accounting and quota management.
    • Provides security and authorisation for access to information.
  • APEL:
    • Focused on publishing accounting data and providing an easy graphical view of aggregate information.
    • Provides accounting suited to an upper-level (VO) management view.
    • Focuses on after-the-fact, resource-oriented accounting.
• DGAS & APEL!
  • We believe that these two software packages are not competitors, although they have some (necessary) overlap. Used together, they can provide what is actually needed for grid accounting, and both can benefit from cooperation.
DGAS 2 APEL ...
• DGAS provides a tool that converts its accounting records into the format used by APEL (LcgRecords table). The tool:
  • can be executed periodically on the Resource HLR;
  • pushes the accounting records to an APEL database (either central or local);
  • keeps track of previously converted accounting records (only new records are processed unless re-processing is forced);
  • retries the conversion of accounting records that had error conditions in previous executions;
  • for reasons of privacy, does not provide the user's certificate subject (there is no authorization mechanism for access to information published via R-GMA).
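The bookkeeping behaviour of such a converter (skip what was already converted, retry earlier failures, drop the user's certificate subject) can be sketched as below. This is a hypothetical Python model, not the actual DGAS tool: the record layout, the `to_apel_like` function and the use of in-memory sets are assumptions, and the real LcgRecords columns are not reproduced.

```python
def to_apel_like(rec):
    """Convert one record, dropping the user's certificate subject."""
    if rec.get("cpu_time_s") is None:
        raise ValueError("incomplete record")
    return {k: v for k, v in rec.items() if k != "user_dn"}


def convert_new_records(records, converted_ids, failed_ids, force=False):
    """Convert records not yet converted; earlier failures are retried."""
    pushed = []
    for rec in records:
        rec_id = rec["id"]
        if rec_id in converted_ids and not force:
            continue  # already processed in a previous run
        try:
            out = to_apel_like(rec)
        except ValueError:
            failed_ids.add(rec_id)  # will be retried on the next run
            continue
        failed_ids.discard(rec_id)
        converted_ids.add(rec_id)
        pushed.append(out)  # a real tool would push this to APEL
    return pushed


records = [
    {"id": 1, "user_dn": "/CN=a", "cpu_time_s": 10},
    {"id": 2, "user_dn": "/CN=b", "cpu_time_s": None},  # broken record
]
converted, failed = set(), set()
first_run = convert_new_records(records, converted, failed)
print([r["id"] for r in first_run], sorted(failed))
```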
... and beyond?
• The conversion of DGAS accounting records to the APEL format is one step towards interoperability, BUT:
  • it is a very specific solution;
  • it should be considered a temporary "workaround".
• A common standard for all accounting systems would be preferable.
  • GGF Usage Record (UR) format (based on XML) for the exchange of accounting information? Problem: it does not contain all required fields, but it is extensible (which, however, might lead to different versions for different accounting systems).
  • GGF Resource Usage Service (RUS) interface for communication between accounting systems? Problem: 9 draft versions since July ...
FAQ
• Why is DGAS so complex?
  • Grid accounting is a complex business. However, the DGAS architecture is not really that complex: it is very flexible, and its building blocks can be arranged in different ways to cope with many needs. That makes it look rather complex, but the truth is that it is flexible.
• Is DGAS a centralised system?
  • Absolutely not. The accounting system is implemented by a network of servers. Some servers handle user accounts, others handle resource accounts. The Usage Records for a job are available on both types of server, so that a user can access his job information via his HLR, just as site managers can via the HLR where their resources are registered. There is no need for a centralised HLR.
• Is it possible to account for local jobs, or for jobs submitted without an RB?
  • Yes, but it is important to note that system functionality is limited in such cases. The extent of the limitation depends on what information can be collected on the resources (is the user proxy available? Is it possible to assign a unique ID to the job? Etc…)
Issues & Future plans
• Local and non-WMS jobs: accounting procedures for local submission to LSF and PBS are currently being implemented (almost done). Accounting for jobs submitted to the CREAM CE: ready for first tests.
• DGAS2Apel: implemented, ready to deploy in production.
• DGAS2GridIce: to be used for graphical access to aggregate information.
• DGAS&gPBox: quota enforcement.
• Condor and SUN GridEngine sensors: studies ongoing, in cooperation with the GridIce team.
• Standardisation: further studies are needed.
References • Further information and documentation about DGAS can be found at: http://www.to.infn.it/grid/accounting