OSG PKI Contingency and Recovery Plans
Mine Altunay, Von Welch
maltunay@fnal.gov, vwelch@indiana.edu
October 16, 2012
Background
• The Open Science Grid (OSG) relies on a public key infrastructure (PKI) built around an OSG Certificate Authority (CA) to support its operations.
• The OSG PKI is operated by two parties:
  • OSG itself operates a network of trusted agents (registration authorities and grid admins) who vet certificate requests, and a web front end, the OSG Information Management (OIM) system, that provides user interfaces for PKI functions.
  • DigiCert, a private company, operates the CA that, at the direction of OSG and within the bounds of policy, issues certificates.
Goals and Scope
• Create a Recovery Plans document that presents a recovery plan for each PKI failure scenario.
• Not a risk analysis: the document does not attempt to analyze whether a PKI failure is something that the OSG should prepare for.
• Analyzes the options for a recovery plan and recommends a broad course of action.
• Describes all the steps necessary to bring the OSG PKI back to its normal functional state.
• Focuses on the new OSG PKI, not the DOEGrids CA, although most of the discussion is valid for the DOEGrids CA as well.
OSG PKI Failure Cases
• Two failure types: compromise and loss of service
• Back-End CA Compromise
• OSG Information Management (OIM) Front-End Compromise
• Back-End CA Loss of Availability
• OSG OIM Front-End Loss of Availability
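The four cases above are the two failure types applied to the two PKI components. As a minimal sketch of that cross product (the class and function names are hypothetical, chosen only for illustration, and are not taken from the Recovery Plans document):

```python
from enum import Enum
from itertools import product


class Component(Enum):
    """The two operated parts of the OSG PKI named on the slide."""
    BACK_END_CA = "Back-End CA"
    OIM_FRONT_END = "OSG OIM Front-End"


class FailureType(Enum):
    """The two failure types named on the slide."""
    COMPROMISE = "Compromise"
    LOSS_OF_AVAILABILITY = "Loss of Availability"


def failure_cases():
    """Return every (component, failure type) pairing as a label.

    Iterating failure types in the outer position groups the
    compromise cases first, matching the order on the slide.
    """
    return [
        f"{component.value} {failure.value}"
        for failure, component in product(FailureType, Component)
    ]


if __name__ == "__main__":
    for case in failure_cases():
        print(case)
```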
Recovery Plans
• A recovery plan for each failure type is presented in the document available at http://osg-docdb.opensciencegrid.org/cgi-bin/ShowDocument?docid=1121. Each plan:
  • Is a workflow of specific steps to take in the aftermath of a failure to restore the PKI to normal, e.g., forming the incident response team, revoking compromised certs, issuing replacement certs, and communicating with the community.
  • Considers variations within a failure type according to severity (e.g., all RA agents compromised vs. only some) and incorporates conditional branches into the workflow.
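One way to picture a workflow with conditional branches is as an ordered list of steps, each with an optional condition on the incident. The sketch below is illustrative only: the step names echo the examples on this slide, but the `Step` and `build_plan` names, the owners, the hour estimates, and the `ra_agents_compromised` field are all hypothetical placeholders and do not reproduce the actual plans in the linked document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Incident = Dict[str, str]


@dataclass
class Step:
    name: str
    owner: str               # hypothetical role label, not from the real plan
    estimate_hours: float    # hypothetical placeholder value
    condition: Optional[Callable[[Incident], bool]] = None

    def applies(self, incident: Incident) -> bool:
        # A step with no condition is part of every branch of the workflow.
        return self.condition is None or self.condition(incident)


# Illustrative workflow; the branch on severity mirrors the slide's example
# of all RA agents compromised vs. only some.
STEPS: List[Step] = [
    Step("Form the incident response team", "security lead", 1),
    Step("Revoke compromised certs (all RA agents)", "CA operator", 8,
         lambda i: i.get("ra_agents_compromised") == "all"),
    Step("Revoke compromised certs (affected RA agents only)", "CA operator", 4,
         lambda i: i.get("ra_agents_compromised") == "some"),
    Step("Issue replacement certs", "RA agents", 24),
    Step("Communicate with the community", "communications lead", 2),
]


def build_plan(incident: Incident) -> List[Step]:
    """Select, in order, the steps whose condition matches the incident."""
    return [step for step in STEPS if step.applies(incident)]


def total_estimate(plan: List[Step]) -> float:
    """Sum the per-step estimates, assuming steps run one after another."""
    return sum(step.estimate_hours for step in plan)


if __name__ == "__main__":
    plan = build_plan({"ra_agents_compromised": "some"})
    for step in plan:
        print(f"{step.name} [{step.owner}, ~{step.estimate_hours}h]")
    print("Total estimate (hours):", total_estimate(plan))
```

Keeping the condition on the step itself, rather than in separate plan variants, means a single workflow definition can cover every severity level of one failure type.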
Recovery Plans
• Each step is accompanied by a specific timeline, estimating how long execution of the plan would take.
• Each step has a clear owner responsible for performing the activities in the event of a failure.
• Due to time limitations and the complexity of each plan, I will not present them here.
• Please contact me or Von Welch with any questions or feedback.