UK e-Science CA Resilience
Jens Jensen, STFC RAL
GridPP 22, UCL, 1-2 April 2009
Disaster Planning
• As Graeme said (in a different context): unfortunately, not all theoretical
• Updated planning in connection with the R89 move (Nov-Dec 2008)
• Service goals:
  • availability (1-3)
  • integrity (1-3)
  • confidentiality (1-3)
  • ∏ (2-18)
Services
• Some ~18 services constitute the CA (of which 4 are people)
• Some are very specialised
More testing needed
• E.g. HSM resilience
• Setting up NGS-CA-TAG for review
• Set up to review the R89 move plan prior to the move
• Secure comms took some time to set up
• Simplify
Previous DP (disaster planning)
• HBI service review
  • Looks at infrastructure
  • But not CAologically
• Probably too many staff involved at various levels?
• Not so much a problem with responsibility
• More that no one knows everything (enough) about all layers
Coping with incidents
• IGTF-RAT: International Grid Trust Federation (IGTF) Risk Assessment Team
• 2-4 members from each PMA: 24-hour cover (ish)
• Using GPG/PGP keys, secured email at NCSA
• Assess incidents
  • E.g. MD5 “incident”, 30-31 Dec. 2008
  • E.g. DSA and ECDSA signature verification flaw in OpenSSL (Jan. 2009)
Thoughts on resilience
• LOCKSS (Lots Of Copies Keep Stuff Safe)
• Redundancy not always possible
  • Expensive OR complicated OR security risk…
• Software needs attention
  • Make machines do tedious tasks
  • Machines implement redundancy
• People are important (cf. Raja’s talk)
• Good input for setting OGF-CAOPS best practice
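LOCKSS and "machines implement redundancy" can be illustrated with a small audit that compares several copies of the same artefact (for example a published CRL) by hash and flags any copy that disagrees with the majority. This is a minimal Python sketch of the idea, not the LOCKSS polling protocol itself; the function name `audit_replicas` and the site names are hypothetical, and it assumes every copy fits in memory. When there is no strict majority it deliberately stops and hands the decision to a person.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare copies of one artefact; return (majority digest, names of bad copies)."""
    if not replicas:
        raise ValueError("no replicas to compare")
    # Hash every copy so that only digests need to be compared.
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority the machine cannot tell which copy is good.
    if votes * 2 <= len(replicas):
        raise RuntimeError("no strict majority among replicas; needs human review")
    damaged = sorted(name for name, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


# Hypothetical example: three sites hold copies of the same CRL, one is stale.
copies = {"site-a": b"crl v42", "site-b": b"crl v42", "site-c": b"crl v41"}
digest, damaged = audit_replicas(copies)
print(damaged)  # ['site-c']
```

The machine does the tedious comparison; a human only gets involved when the copies cannot outvote each other.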
Thoughts on resilience
• Hard to predict all the risks
• Some are difficult to mitigate
• Understanding services better will help
• Trust machines and audit humans
• Trust humans and make machines resilient
• Complex software
  • “Who are General Status and Major Failure, and what are they doing on my system?”
Thoughts on infrastructure operations
• Online database
  • Backed up hourly
  • (Risk of backing up bad data?)
• Timeliness of revocations
  • IGTF: currently being discussed for the MICS profile
  • IGTF: how well are the Classic CAs doing this?
  • UK: aim to bring signing online
  • UK: direct revocation link for security officers
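The hourly backup of the online database, and the worry about backing up bad data, can be sketched as follows. This assumes an SQLite database (the slides do not say which database is used) and Python's standard `sqlite3` module; `backup_with_check`, the `ca-NNNNNN.db` naming and the `keep=24` default (one day of hourly copies) are hypothetical. Note that `PRAGMA integrity_check` only catches structural corruption, not logically wrong data such as a mistaken revocation; keeping several generations is what allows rolling back past bad data once it is noticed.

```python
import sqlite3
import tempfile
from pathlib import Path


def backup_with_check(src: sqlite3.Connection, backup_dir: Path, keep: int = 24) -> Path:
    """Copy the live database to a new numbered generation, refusing corrupt data."""
    # Refuse to back up a database that fails SQLite's own consistency check,
    # so a structurally corrupt live copy never becomes the newest backup.
    (result,) = src.execute("PRAGMA integrity_check").fetchone()
    if result != "ok":
        raise RuntimeError(f"integrity_check failed: {result}")
    backup_dir.mkdir(parents=True, exist_ok=True)
    existing = sorted(backup_dir.glob("ca-*.db"))
    next_no = int(existing[-1].stem.split("-")[1]) + 1 if existing else 0
    target = backup_dir / f"ca-{next_no:06d}.db"
    dst = sqlite3.connect(target)
    try:
        src.backup(dst)  # consistent online copy (Python 3.7+)
    finally:
        dst.close()
    # Keep only the newest `keep` generations; older good copies survive
    # even if the most recent data later turns out to be wrong.
    for old in sorted(backup_dir.glob("ca-*.db"))[:-keep]:
        old.unlink()
    return target


# Hypothetical usage: a tiny in-memory "CA database" backed up three times.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE certs (serial INTEGER PRIMARY KEY, revoked INTEGER)")
live.execute("INSERT INTO certs VALUES (1, 0)")
live.commit()
tmp = Path(tempfile.mkdtemp())
for _ in range(3):
    backup_with_check(live, tmp, keep=2)
print(sorted(p.name for p in tmp.glob("ca-*.db")))  # ['ca-000001.db', 'ca-000002.db']
```

In practice a scheduler such as cron would call this once an hour.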