WP4 Security and AA(A) issues • For WP4: David Groep • hep-proj-grid-fabric@cern.ch
WP4 self-organization (1) • Configuration management • What a system should look like and what is installed on it • Systems Installation • Bootstrapping and installing software packages on 10,000 nodes • Resource Management • Queuing system, task scheduling, quotas and budgets
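Configuration management, as described here, means knowing what a system should look like and what is actually installed. One common way to act on that is to compare the desired state against the state a node reports. The sketch below assumes both are simple package-to-version maps; `NodeState`, `plan_actions` and the action names are hypothetical and not part of any WP4 tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeState:
    """Hypothetical snapshot of one node: package name -> version string."""
    packages: dict[str, str] = field(default_factory=dict)


def plan_actions(desired: NodeState, actual: NodeState) -> list[tuple[str, str, str]]:
    """Return (action, package, version) steps that move `actual` towards `desired`.

    Assumption: the configuration database stores the desired package set per
    node, and the node reports its installed set in the same form.
    """
    actions = []
    for name, version in sorted(desired.packages.items()):
        installed = actual.packages.get(name)
        if installed is None:
            actions.append(("install", name, version))
        elif installed != version:
            actions.append(("change_version", name, version))
    # Anything installed but absent from the desired state is scheduled for removal.
    for name in sorted(actual.packages.keys() - desired.packages.keys()):
        actions.append(("remove", name, actual.packages[name]))
    return actions


if __name__ == "__main__":
    desired = NodeState({"pkg-a": "1.2", "pkg-b": "2.0"})
    actual = NodeState({"pkg-a": "1.1", "pkg-c": "0.9"})
    for step in plan_actions(desired, actual):
        print(step)
```

Treating the desired state as the single source of truth keeps the plan idempotent: running it against a node that already matches yields no actions.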
WP4 self-organization (2) • Monitoring • Performance and functional monitoring • Fault Tolerance & Exception Recovery • Detect exceptions using monitoring information and schedule recovery actions, making nodes self-healing • Gridification • Job authorization, credential mapping, information abstraction and network accessibility
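Fault tolerance here means turning monitoring information into scheduled recovery actions. A minimal sketch, assuming monitoring delivers a flat dictionary of metrics per node and that recovery is expressed as named actions: `RecoveryRule`, `schedule_recovery`, the metric names, thresholds and action names are all invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RecoveryRule:
    """Hypothetical rule: when `fires` is true for a metrics sample, run `action`."""
    name: str
    fires: Callable[[dict], bool]
    action: str


def schedule_recovery(node: str, metrics: dict, rules: list[RecoveryRule]) -> list[tuple[str, str]]:
    """Return (node, action) pairs for every rule that fires, in rule order, without duplicates."""
    scheduled: list[tuple[str, str]] = []
    for rule in rules:
        if rule.fires(metrics) and (node, rule.action) not in scheduled:
            scheduled.append((node, rule.action))
    return scheduled


# Example rules; metric names, thresholds and actions are illustrative only.
RULES = [
    RecoveryRule("disk_nearly_full", lambda m: m.get("disk_used_pct", 0) >= 95, "clean_scratch"),
    RecoveryRule("sshd_down", lambda m: not m.get("sshd_running", True), "restart_sshd"),
    RecoveryRule("load_stuck", lambda m: m.get("load_1min", 0.0) > 50.0, "drain_node"),
]

if __name__ == "__main__":
    sample = {"disk_used_pct": 97, "sshd_running": False, "load_1min": 1.5}
    print(schedule_recovery("node042", sample, RULES))
```

Keeping detection (`fires`) separate from the action name lets a scheduler decide when and how to execute recoveries, for example draining a node before restarting services on it.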
Internal and external AAA • External AAA: • interaction of a compute centre with the “global” grid → through WP1 (ComputeElement) and WP2 (StorageElement) • Internal AAA: • recognizing trusted components and operators • authorization for jobs and files • access to information services • protecting jobs and files whilst in the fabric (uid issues)
A use case for job submission • Accept a job from the ComputeElement (the Grid) • Check authorization against additional local policies • Assign the necessary local credentials • Have the job run on the local fabric
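The four steps of this use case form a simple chain. The sketch below passes the policy check, the credential mapping and the submission to the local batch system in as plain callables, so each stage can be replaced independently. `GridJob`, `LocalJob`, `mediate`, the DN and the uid/gid table are hypothetical; this is an illustration of the flow, not the interface of any WP4 component.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GridJob:
    """A job as handed over by the ComputeElement (fields are illustrative)."""
    subject_dn: str   # grid identity of the submitter
    vo: str           # virtual organisation the user belongs to
    wallclock_s: int  # requested wall-clock time in seconds


@dataclass
class LocalJob:
    """The same job after local credentials have been assigned."""
    job: GridJob
    uid: int
    gid: int


def mediate(job: GridJob,
            authorize: Callable[[GridJob], bool],
            map_credentials: Callable[[GridJob], Optional[tuple[int, int]]],
            submit: Callable[[LocalJob], str]) -> str:
    """Accept -> check local policy -> assign local credentials -> run."""
    if not authorize(job):
        return "rejected: local policy"
    creds = map_credentials(job)
    if creds is None:
        return "rejected: no local account"
    uid, gid = creds
    return submit(LocalJob(job, uid, gid))


if __name__ == "__main__":
    accounts = {"/O=Grid/CN=Alice": (5001, 500)}  # hypothetical DN -> (uid, gid) table
    print(mediate(GridJob("/O=Grid/CN=Alice", "examplevo", 3600),
                  authorize=lambda j: j.vo == "examplevo",
                  map_credentials=lambda j: accounts.get(j.subject_dn),
                  submit=lambda lj: f"queued as uid {lj.uid}"))
```

Authorization runs before credential mapping, so a rejected job never causes a local account to be allocated.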
Gridification of a Compute Centre (diagram) • Externally visible: Grid Info Serv (WP3), GridGATE protocol gateway, ComputeElmt, GriFIS • Local to the fabric: Job Rep., Grid Job Mediating Serv, Fabric-local ID service, Local Credential Mapping Serv, LCAS, AuthZ plugins (Policy list, User Rep., QuotaCheck), LRMS, Farms
Job life cycle in a fabric • GjMS – Grid-job Mediating Service • Accept jobs from the ComputeElement and shuffle them through the AAA chain • LCAS – Local Community Authorization Service • Authorize a job or storage request to run on this fabric • On top of the community-wide CAS (VOs), add extra constraints such as budgets, ban lists and wall-clock limits • LCMAPS – Local Credential Mapping Service • Obtain the ‘usual’ credentials for running (uid/gid) • Issues: additional credentials for AFS, K5, …
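LCAS adds local constraints (budgets, ban lists, wall-clock limits) on top of the VO-wide decision, and the gridification diagram shows these as AuthZ plugins. A minimal sketch of that plugin idea, assuming each plugin either allows a request or returns a denial reason and that every plugin must allow: `JobRequest`, `Plugin`, `ban_list`, `wallclock_limit`, `vo_budget` and `lcas_authorize` are hypothetical names, not the interface of any actual LCAS implementation. Budgets are assumed to be counted per VO in seconds of wall-clock time.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobRequest:
    subject_dn: str
    vo: str
    wallclock_s: int


# Hypothetical plugin interface: return None to allow, or a reason string to deny.
Plugin = Callable[[JobRequest], Optional[str]]


def ban_list(banned: set[str]) -> Plugin:
    def check(req: JobRequest) -> Optional[str]:
        return "user is banned" if req.subject_dn in banned else None
    return check


def wallclock_limit(max_s: int) -> Plugin:
    def check(req: JobRequest) -> Optional[str]:
        if req.wallclock_s > max_s:
            return f"wall-clock {req.wallclock_s}s exceeds limit {max_s}s"
        return None
    return check


def vo_budget(remaining_s: dict[str, int]) -> Plugin:
    """Assumption: budgets are kept per VO, in seconds of wall-clock time."""
    def check(req: JobRequest) -> Optional[str]:
        left = remaining_s.get(req.vo, 0)
        if req.wallclock_s > left:
            return f"VO budget too small ({left}s left)"
        return None
    return check


def lcas_authorize(req: JobRequest, plugins: list[Plugin]) -> tuple[bool, list[str]]:
    """Allow only if every plugin allows; collect all denial reasons for logging."""
    reasons = []
    for plugin in plugins:
        reason = plugin(req)
        if reason is not None:
            reasons.append(reason)
    return (not reasons, reasons)


if __name__ == "__main__":
    plugins = [ban_list({"/O=Grid/CN=Mallory"}), wallclock_limit(86400), vo_budget({"examplevo": 10000})]
    print(lcas_authorize(JobRequest("/O=Grid/CN=Alice", "examplevo", 3600), plugins))
```

Collecting every reason rather than stopping at the first denial costs little and gives operators a complete picture when a user asks why a job was refused.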
FLIDS (Fabric-local ID service) • Within a fabric, only a local certifying entity will be sufficiently trusted • Signing authority for LCAS-accepted (job) requests • Identify trusted operators for installation of new systems • Identify and certify hosts within a fabric • FLIDS is (a tree of) certification authorities • Some of those CAs are “automated” • They sign certificates when the request is signed by a trusted operator
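An automated FLIDS CA signs only when a trusted operator has signed the request; the installation walkthrough that follows adds that the request DN namespace is limited. The sketch below illustrates that policy using HMAC-SHA256 from the standard library as a stand-in for public-key signatures, which a real CA would use (verifying X.509 certificate requests instead). The keys, operator name, `HOST_NAMESPACE`, `sign` and `flids_sign` are all hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
from typing import Optional

# Stand-in for public-key signatures: HMAC-SHA256 with per-party secret keys.
# A real FLIDS CA would verify X.509 signatures. All keys and names are hypothetical.
OPERATOR_KEYS = {"operator-1": b"operator-1-secret"}
CA_KEY = b"lca-signing-secret"
HOST_NAMESPACE = "/O=ExampleFabric/OU=hosts"


def sign(key: bytes, payload: str) -> str:
    return hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()


def flids_sign(request_dn: str, operator: str, operator_sig: str) -> Optional[str]:
    """Issue a 'certificate' (here: a CA MAC over the DN) only if a trusted
    operator signed the request and the DN lies inside the host namespace."""
    key = OPERATOR_KEYS.get(operator)
    if key is None:
        return None  # not a trusted operator
    expected = sign(key, request_dn)
    if not hmac.compare_digest(expected.encode(), operator_sig.encode()):
        return None  # request was not signed by this operator
    if not request_dn.startswith(HOST_NAMESPACE + "/CN="):
        return None  # DN outside the permitted namespace
    return sign(CA_KEY, request_dn)


if __name__ == "__main__":
    dn = HOST_NAMESPACE + "/CN=node042"
    op_sig = sign(OPERATOR_KEYS["operator-1"], dn)
    print(flids_sign(dn, "operator-1", op_sig) is not None)
```

The namespace check matters even for a valid operator signature: it stops a signed request from minting a certificate that looks like an operator or another service.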
Information and Configuration • A configuration database exists containing the desired state of the local fabric • Contains sensitive information • Prevent unauthorized read access • Prevent snooping on information sent to other hosts • PM9 (and possibly beyond?): web server serving XML over HTTPS • Write access limited to a special operator interface only
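The access rules on this slide (no unauthorized reads, writes only through the operator interface) can be expressed as per-entry ACLs that deny by default. A minimal sketch, assuming the HTTPS layer has already authenticated the client and passes its certificate subject along; transport encryption is not shown. `Acl`, `ConfigDB`, the paths and subject names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    readers: frozenset  # authenticated subjects allowed to read the entry
    writers: frozenset  # should contain only the operator interface


class ConfigDB:
    """Hypothetical configuration store with per-entry ACLs; deny by default."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}
        self._acls: dict[str, Acl] = {}

    def load(self, path: str, value: str, acl: Acl) -> None:
        """Initial population, e.g. from the operator's master copy."""
        self._data[path] = value
        self._acls[path] = acl

    def read(self, subject: str, path: str) -> str:
        acl = self._acls.get(path)
        if acl is None or subject not in acl.readers:
            raise PermissionError(f"{subject} may not read {path}")
        return self._data[path]

    def write(self, subject: str, path: str, value: str) -> None:
        acl = self._acls.get(path)
        if acl is None or subject not in acl.writers:
            raise PermissionError(f"{subject} may not write {path}")
        self._data[path] = value


if __name__ == "__main__":
    operator = "CN=cfg-operator-interface"
    db = ConfigDB()
    db.load("/hosts/node042/ssh_host_key", "<key material>",
            Acl(frozenset({"CN=node042", operator}), frozenset({operator})))
    print(db.read("CN=node042", "/hosts/node042/ssh_host_key"))
```

Denying unknown paths as well as unlisted subjects means a mistyped or missing ACL fails closed rather than exposing sensitive entries.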
Another FLIDS application • Adding a new host to a fabric • Possibly in a ‘hostile’ environment • We have a trusted operator with an install disk • Need to get initial configuration information • Which includes, e.g., an SSH host key • Next slide is for your reference only (don’t be baffled by it)
Installing a new host with FLIDS (diagram, reconstructed as numbered steps)
• Components: Configuration Database (CFG data, ACLs, LCA root cert) behind a secured HTTP server • FLIDS engine holding the LCA cert and private key: an automated CA that will sign when a request is approved by an ‘operator’ • New host to be installed
• Operator install disk: kernel and init • CFG HTTPS agent • Signed cert of operator • Protected private key of operator • LCA root certificate
1: Operator boots the system.
2: Agent makes an HTTPS request using the operator credentials.
3: HTTPS server checks the CFG data ACLs (the operator has all rights) and can verify the operator's ID using the LCA root cert.
4: Sensitive config data is encrypted using the session key.
5: Host generates a key pair (but without a passphrase protecting the private part).
6: Request is sent to the FLIDS engine, signed by the operator key (in cleartext); the FLIDS hostname is known from the CFG data.
7: FLIDS checks the operator's signature and signs the request with the LCA key. The request DN namespace is limited.
8: Signed host cert is sent back to the host (in the clear).
9: Host checks the signature on the cert using the LCA root cert on the boot disk.
10: HTTPS requests to CFG are authenticated with the newly signed host certificate.
11: CFG web server can check the hostname in the cert against the requesting IP address and check the ACLs.
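Step 11 combines two checks: the hostname in the client certificate must match the requesting IP address, and that host must appear on the ACL for the requested entry. A minimal sketch, assuming the hostname has already been extracted from the verified TLS client certificate and that a forward DNS lookup is an acceptable match; `dns_resolve` and `cfg_request_allowed` are hypothetical names, and the resolver is injectable so the logic can be exercised without DNS.

```python
from __future__ import annotations

import socket
from typing import Callable


def dns_resolve(hostname: str) -> list[str]:
    """Forward-resolve a hostname to all of its addresses (IPv4 and IPv6)."""
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, None)})


def cfg_request_allowed(cert_hostname: str, peer_ip: str, path: str,
                        acls: dict[str, set[str]],
                        resolve: Callable[[str], list[str]] = dns_resolve) -> bool:
    """Step 11: the certificate hostname must resolve to the requesting IP,
    and that host must be listed on the ACL for `path`."""
    try:
        addresses = resolve(cert_hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    if peer_ip not in addresses:
        return False
    return cert_hostname in acls.get(path, set())


if __name__ == "__main__":
    table = {"node042.example.org": ["192.0.2.42"]}  # fake DNS for the demo

    def fake_resolve(host: str) -> list[str]:
        if host not in table:
            raise OSError("unknown host")
        return table[host]

    acls = {"/hosts/node042": {"node042.example.org"}}
    print(cfg_request_allowed("node042.example.org", "192.0.2.42", "/hosts/node042", acls, fake_resolve))
```

Binding the certificate to the source address limits the damage if a host key leaks, since the stolen certificate only works from the address the name resolves to; it is only as strong as the DNS it relies on.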
Issues not (yet) addressed • Information services • Use whatever security framework WP3 chooses • Will likely not publish the list of authorized users • Networking issues • WP4 does not envision using network-layer security • IPv6 is being studied, but only for address-space issues • GridGATE is not a VPN router and does not do IPsec