Grid Fabric Self-Organization and Security Issues

Explore configuration management, resource management, monitoring, and AAA issues in the context of a grid computing fabric. Learn about job submission, credential management, fault tolerance, and more.

Presentation Transcript


  1. WP4 Security and AA(A) issues For WP4: David Groep hep-proj-grid-fabric@cern.ch

  2. WP4 self-organization (1) • Configuration management • What should a system look like, what is installed • Systems Installation • Bootstrapping and installing software packages on 10,000 nodes • Resource Management • Queuing system, task scheduling, quotas and budgets

  3. WP4 self-organization (2) • Monitoring • Performance and functional monitoring • Fault Tolerance & Exception Recovery • Detect exceptions using monitoring information and schedule recovery actions, make self-healing nodes • Gridification • Job authorization, credential mapping, information abstraction and network accessibility

  4. Internal and external AAA • External AAA: • interaction of a compute centre with “global” grid → through WP1 (ComputeElement) and WP2 (StorageElement) • Internal AAA: • recognizing trusted components and operators • authorization for jobs and files • access to information services • Protecting jobs and files whilst in the fabric (uid issues)

  5. A use case for job submission • Accept a job from ComputeElement (the Grid) • Check authorization w.r.t. extra local policies • Assign necessary local credentials • Have the job run on the local fabric

  6. Gridification of a Compute Centre (diagram) • Regions: externally visible; local to the fabric • Components: Grid Info Serv (WP3), GridGATE protocol gateway, ComputeElement, GriFIS, LRMS, Farms, Fabric-local ID service, Local Credential Mapping Service, LCAS, Job Rep., Grid-job Mediating Service • AuthZ plugins: Policy list, User Rep., QuotaCheck

  7. Job life cycle in a fabric • GjMS – Grid-job Mediating Service • Accept jobs from ComputeElement and shuffle them through the AAA chain • LCAS – Local Community Authorization Service • Authorize a job or store request to run on this fabric • Based on the community-wide CAS (VOs), add extra constraints such as budgets, ban lists, and wallclock limits • LCMAPS – Local Credential Mapping Service • Obtain the “usual” credentials for running (uid/gid) • Issues: additional credentials for AFS, K5, …
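
The chain on this slide (GjMS hands a job to LCAS, then to LCMAPS) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the WP4 implementation: the `Job` fields, the plugin names, the ban list, the budget and wallclock numbers, and the VO-to-uid/gid table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    user_dn: str            # grid identity (certificate subject) of the submitter
    vo: str                 # virtual organisation of the submitter
    wallclock_hours: float  # requested wallclock time


# An authorization plugin returns None to accept, or a reason string to refuse.
Plugin = Callable[[Job], Optional[str]]

# All values below are hypothetical local policy, invented for this sketch.
BANNED = {"/O=Grid/CN=Mallory"}
MAX_WALLCLOCK_HOURS = 48.0
BUDGET_HOURS = {"atlas": 100.0}
LOCAL_ACCOUNTS = {"atlas": (2001, 2000)}  # VO -> (uid, gid)


def ban_list(job: Job) -> Optional[str]:
    return "user is banned" if job.user_dn in BANNED else None


def wallclock_limit(job: Job) -> Optional[str]:
    if job.wallclock_hours > MAX_WALLCLOCK_HOURS:
        return "wallclock limit exceeded"
    return None


def budget(job: Job) -> Optional[str]:
    if job.wallclock_hours > BUDGET_HOURS.get(job.vo, 0.0):
        return "budget exceeded"
    return None


PLUGINS: List[Plugin] = [ban_list, wallclock_limit, budget]


def lcas_authorize(job: Job, plugins: List[Plugin]) -> Optional[str]:
    """LCAS-like step: the first plugin that refuses decides the outcome."""
    for plugin in plugins:
        reason = plugin(job)
        if reason is not None:
            return reason
    return None


def lcmaps_map(job: Job) -> Optional[Tuple[int, int]]:
    """LCMAPS-like step: look up local uid/gid for the job's VO."""
    return LOCAL_ACCOUNTS.get(job.vo)


def mediate(job: Job) -> str:
    """GjMS-like step: push the job through authorization, then mapping."""
    reason = lcas_authorize(job, PLUGINS)
    if reason is not None:
        return f"rejected: {reason}"
    account = lcmaps_map(job)
    if account is None:
        return "rejected: no local account for VO"
    uid, gid = account
    return f"accepted: run as uid={uid} gid={gid}"
```

Keeping each extra local constraint in its own plugin mirrors the "AuthZ plugins" box in the diagram: a site could add or drop a check without touching the mediating step.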

  8. Gridification of a Compute Centre (the diagram from slide 6, shown again)

  9. FLIDS (Fabric-local ID service) • Within a fabric, only a local certifying entity will be sufficiently trusted • Signing authority for LCAS-accepted (job) requests • Identify trusted operators for installation of new systems • Identify and certify hosts within a fabric • FLIDS is (a tree of) certification authorities • Some of those are “automated” CAs • Sign certificates when the request is signed by a trusted operator
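
The decision rule of such an "automated" CA (sign only when a trusted operator has signed the request) might look like the sketch below. It is a toy model, not FLIDS itself: HMAC with shared secrets stands in for real public-key signatures so the example runs on the standard library alone, and the operator table, key values, and DN prefix are hypothetical. The namespace check reflects the note on slide 12 that the request DN namespace is limited.

```python
import hashlib
import hmac
from typing import Optional

# ASSUMPTION: HMAC keys stand in for operator and CA key pairs.
# A real CA would verify a public-key signature on a certificate request.
OPERATOR_KEYS = {"operator-1": b"operator-secret"}  # hypothetical trusted operators
LCA_KEY = b"local-ca-secret"                        # hypothetical local CA key
ALLOWED_DN_PREFIX = "/O=Fabric/OU=Hosts/CN="        # hypothetical host DN namespace


def sign(key: bytes, message: str) -> str:
    """Stand-in signature: hex HMAC-SHA256 of the message."""
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


def flids_sign_request(operator: str, subject_dn: str,
                       operator_sig: str) -> Optional[str]:
    """Return the CA's signature over subject_dn, or None if refused."""
    key = OPERATOR_KEYS.get(operator)
    if key is None:
        return None  # requester is not a trusted operator
    if not hmac.compare_digest(sign(key, subject_dn), operator_sig):
        return None  # operator signature does not verify
    if not subject_dn.startswith(ALLOWED_DN_PREFIX):
        return None  # subject lies outside the permitted namespace
    return sign(LCA_KEY, subject_dn)
```

With real certificates the same three checks would apply, only with signature verification against the operator's certificate instead of a shared secret.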

  10. Information and Configuration • A configuration database exists, containing the desired state of the local fabric • Contains sensitive information • Prevent unauthorized read access • Prevent snooping on information sent to other hosts • PM9 (and possibly beyond?): web server, XML over HTTPS • Write access limited to a special operator interface only

  11. Another FLIDS application • Adding a new host to a fabric • Possibly in a “hostile” environment • We have a trusted operator with an install disk • Need to get initial configuration information • Which includes, e.g., an SSH host key • (Next slide is for your reference only; don’t be baffled by it)

  12. (Diagram: adding a new host to a fabric)
      Components:
      • CFG Configuration Database: CFG data, ACLs, LCA root cert
      • Secured http server
      • FLIDS engine: LCA cert and privkey; automated CA, will sign when request approved by “operator”
      • New host to be installed
      • Operator install disk: kernel and init; CFG https agent; signed cert of operator; protected private key of operator; LCA root certificate
      Steps:
      1: Operator boots system
      2: agent makes https request using operator credentials
      3: https server checks CFG data ACL (operator has all rights), can verify ID of operator using LCA root cert
      4: sens config data encrypted using session key
      5: host generates key pair (but without a passphrase protecting the private part)
      6: request sent to FLIDS engine, signed by operator key (in cleartext) (FLIDS hostname known from CFG data)
      7: FLIDS checks signature of operator, and signs request with LCA key. Request DN namespace limited.
      8: signed host cert back to host (in clear)
      9: host checks signature on cert using the LCA root cert on the boot disk
      10: https requests to CFG authenticated with new signed host certificate
      11: CFG web server can check hostname in cert against requesting IP address and check ACLs
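
Step 11 of the diagram (the CFG web server compares the hostname in the certificate with the requesting IP address, then consults the ACLs) could be approximated as below. This is a minimal sketch under assumptions: the `DNS` table replaces a real name lookup, and the hostnames, address, and ACL entries are made up for illustration.

```python
from typing import Dict, Set

# ASSUMPTION: a static table stands in for a DNS lookup of the host name.
DNS: Dict[str, str] = {"node001.example.org": "192.0.2.17"}

# ASSUMPTION: ACLs map a certified host name to the config paths it may read.
ACLS: Dict[str, Set[str]] = {
    "node001.example.org": {"profiles/node001.xml"},
}


def may_read(cert_hostname: str, peer_ip: str, path: str) -> bool:
    """Allow a read only if the certificate's host name matches the
    connecting address and the ACL grants that host the requested path."""
    if DNS.get(cert_hostname) != peer_ip:
        return False  # certificate presented from an unexpected address
    return path in ACLS.get(cert_hostname, set())
```

For example, `may_read("node001.example.org", "192.0.2.17", "profiles/node001.xml")` is granted, while the same certificate presented from any other address is refused before the ACL is even consulted.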

  13. Issues not (yet) addressed • Information services • Use whatever security framework WP3 chooses • Will likely not publish list of authorized users • Networking issues • WP4 does not envision using network-layer security • IPv6 is being studied, but only for address space issues • GridGATE is not a VPN router and is not doing IPsec
