200 likes | 342 Views
LCG/EGEE Security Operations HEPiX, Fall 2004 BNL, 22 October 2004. David Kelsey CCLRC/RAL, UK d.p.kelsey@rl.ac.uk. Outline. Security Operations today (new) Site Registration procedures EGEE Operational Security Coordination Team Show (some) slides by Ian Neilson (CERN)
E N D
LCG/EGEE Security OperationsHEPiX, Fall 2004BNL, 22 October 2004 David KelseyCCLRC/RAL, UKd.p.kelsey@rl.ac.uk
Outline • Security Operations today • (new) Site Registration procedures • EGEE Operational Security Coordination Team • Show (some) slides by Ian Neilson (CERN) • Talk given at EGEE ROC Managers mtg (5 Oct 04) n.b. will not cover Authentication, Authorization, User Registration, VO management, Firewalls etc. David Kelsey, LCG/EGEE Security Ops, HEPiX
LCG Policy GOC Guides picture from Ian Neilson Incident Response Certification Authorities Audit Requirements Usage Rules Security & Availability Policy Application Development & Network Admin Guide User Registration & VO Management http://cern.ch/proj-lcg-security/documents.html David Kelsey, LCG/EGEE Security Ops, HEPiX
Security Operations today • Not part of LCG GOC activities • But the RAL GOC did write three Guides • Part of the Security Policy • Now starting integration with EGEE ROCs • See later • Ian Neilson (CERN) is the LCG Security Officer • Maria Dimou (CERN) is the LCG Registrar • Both also playing key roles in EGEE • When sites join LCG/EGEE • Security Contact details must be provided • Name, mail and phone numbers (out of band contact) • Local CSIRT mail list (for emergency use) • Mail list of these contacts used for discussion David Kelsey, LCG/EGEE Security Ops, HEPiX
Security Operations today (2) • Patching responsibilities • Operating System patches • Responsibility of the Site managers • Grid middleware patches • Pushed out by CERN Deployment Team • Help on security policy and procedures • mailto:project-lcg-security-support@cern.ch • All sites required to read, digest and follow the policy documents • There have been very few questions! David Kelsey, LCG/EGEE Security Ops, HEPiX
Security Operations today (3) • LCG Audit Requirements See https://edms.cern.ch/document/428037/ • Every site must keep (for at least 90 days) • Jobmanager and/or gatekeeper logfiles • Data transfer logs • Batch system and process activity records • Need to be preserved over system re-installs • Logs also needed for accounting David Kelsey, LCG/EGEE Security Ops, HEPiX
Security Operations today (4) • Agreement on Incident Response See https://edms.cern.ch/document/428035/ • What is an incident? • Investigation -> break in service • Misuse of remote Grid resources • Long-lived (>3 days) credentials stolen • Sites must • Take local action to prevent disruption • Report to local security officers • Report to others via Grid CSIRT mail list David Kelsey, LCG/EGEE Security Ops, HEPiX
Site Registration procedures Joint Security Policy Group • Working on more formal procedures • When Sites join LCG/EGEE • Need to collect all Contact details • System Managers and Security Contacts • Sites must confirm (sign?) acceptance of policy and procedures • EGEE sites need to be approved by local ROC • LCG sites approved by Deployment team or GDB David Kelsey, LCG/EGEE Security Ops, HEPiX
(OSCT: Operational Security Coordination Team) LCG/EGEE Security Coordination Ian Neilson Grid Deployment Group CERN
Security Coordination Objectives • Ownership of … • Security incidents • From notification to resolution • Liaise with national/institute CERTs • Middleware security problems • Liaise with development & deployment groups • Co-ordination of security monitoring • Post-mortem analysis • Access to team of experts • Security Service Challenges - LCG EGEE ARM-2 – 5 Oct 2004 - 10
OSG - Security Incident Handling and Response Guide (draft) • To guide the development and maintenance of a common capability for handling and response to cyber security incidents on Grids. • The capability will be established through • (1) common policies and processes, • (2) common organizational structures, • (3) cross-organizational relationships, • (4) common communications methods, and • (5) a modicum of centrally-provided services and processes. DPK comment: LCG/EGEE intends to base new procedures on the OSG document EGEE ARM-2 – 5 Oct 2004 - 11
GOC Guides The Joint Security Policy Group Incident Response Certification Authorities Audit Requirements Usage Rules Security & Availability Policy (1) Common policies and processes Application Development & Network Admin Guide User Registration http://cern.ch/proj-lcg-security/documents.html EGEE ARM-2 – 5 Oct 2004 - 12
Security Coordination - Groups • Parties from OSG IR • Security Operations Centre(s) (=?GOCs/CICs) • Organize, coordinate, track, report • Security contacts • Defined for every grid participant: users and resources • Incident Response & Technical Experts • Managed list of available expertise • Ad hoc Incident Response teams • Formed on demand • Security Operations Advisory group • Advise development and practice of SOC (=JSPG+?) • X-SOC coordination • SOCs participation/communication across grid boundaries (2) common organizational structures EGEE ARM-2 – 5 Oct 2004 - 13
CSIRT Media/Press “PR” CIC/GOC “External” GRID OSCT RC ROC Security Coordination - Channels EGEE operational channels still being established. Responsibilities and processes being defined. (3) cross-organizational relationships, EGEE ARM-2 – 5 Oct 2004 - 14
Security Coordination – Comms. • Incident Reporting List • INCIDENT-SEC-L@xxx.yyy • Security Contacts Discussion List • INCIDENT-DISCUSS@xxx.yyy • External contact • Reporting • Other grids • MUST be Encrypted • How is this achieved and managed? • Tracking system • MUST be secure • Press and Public Relations (4) common communications methods EGEE ARM-2 – 5 Oct 2004 - 15
Operational Security - Services • List Management • Alert/Discuss – ref: previous slide • Multiple ad-hoc IR Teams • Experts • Ticket Tracking System • Where do problems enter? – local contact • Can this be part of support lists? • Must be secure • Public Relations • Guidelines, practice statements • Policy interface to JSPG • Evidence gathering/preservation – use local law enforcement • OSCT must (help) define process behind all these services (5) a modicum of centrally-provided services and processes EGEE ARM-2 – 5 Oct 2004 - 16
Security Coordination - Issues • “Security Operations Centre”: what is it for EGEE/LCG? • Don’t think we can have “Central” control • So formulate activity as “coordination team” • Security contacts lists need management • Dead boxes, moderated boxes, etc etc • Do we have appropriate contact: site security or local admin? • Need to coordinate through Regional Operations Centres (ROC) • Need to utilise services from Core Infrastructure Centres (CIC) • Wherever possible - don’t duplicate channels • What is the relationships with LCG GOCs and EGEE CICs? • Are they the same? • Are we communicating with local site security team or grid ‘admin’ responsibles EGEE ARM-2 – 5 Oct 2004 - 17
Operational Security – where to start? • “Start small and keep it simple.” • Define basic structures • Where/how lists hosted • Where/how problems tracked • Who/where/how ‘experts’ organised • JSPG review and update policy documents • ROCs to take over management of contacts lists • Must integrate with site registration process • Establish what level of support is behind site security entries • Relationships with local/national CERT • Validate/test entries • Exercise channels and raise awareness by Security Challenges – next slide. EGEE ARM-2 – 5 Oct 2004 - 18
2004 Security Service Challenges • Objectives • Evaluate the effectiveness of current procedures by simulating a small and well defined set of security incidents. • Use the experiences of a) in an iterative fashion (during the challenges) to update procedures. • Formalise the understanding gained in a) & b) in updated incident response procedures. • Provide feedback to middleware development and testing activities to inform the process of building security test components. • Exercise response procedures in controlled manner • Non-intrusive • Compute resource usage trace to owner • Run a job to send an email • Storage resource trace to owner • Run a job to store a file • Disruptive • Disrupt a service and map the effects on the service and grid EGEE ARM-2 – 5 Oct 2004 - 19
Summary • There is much work ahead of us! • We need to work together to define and maintain better operational policies and procedures • Wherever possible should work towards common (or at least interoperable) procedures between Grid projects • Our applications are global • Must add to existing CSIRT procedures • JSPG and OSCT will be looking for input from site managers and security contacts • Please help! David Kelsey, LCG/EGEE Security Ops, HEPiX