Learn about incident response for LCG/EGEE Grids, including background information, grid projects, operational aspects, security coordination, and incident handling guide.
LCG/EGEE Grid Incident Response
Ian Neilson, Grid Deployment Group, CERN
TERENA NRENS-Grids Workshop, 12th May 2005, Amsterdam
TOC
• Background
• Grids
• Grid Projects
• Grid Environment
• Incident Handling Guide
• Requirements
• Requests
• Operational Aspects
• Project Environment
• Security Coordination Team
• Planning
• Use-case Testing
• Service Challenges
TERENA NRENS-Grids Workshop, Amsterdam
Grids
“[Grids] enable the sharing, exchange, discovery, and aggregation of resources distributed across multiple administrative domains ...” - Sun Microsystems
• Projects: LCG – LHC Computing Grid; EGEE – Enabling Grids for e-Science in Europe; OSG – Open Science Grid; GridPP – Grid Particle Physics; PPDG – Particle Physics Data Grid
• Middleware: Globus Toolkit; Virtual Data Toolkit; gLite
• Components: Job Managers; Computing Elements; Resource Brokers; Proxy Servers; Storage Resource Manager
• Virtual Organisations
EGEE in one slide
• 70 institutions in 28 countries, federated in regional clusters
• 32 MEUR for the first 2 years (plans for another 2 years)
• Deployment and reengineering project
• 50% operations & support, 25% training & application support, 25% reengineering
Computing Resources: April 2005
[Figure legend: countries providing resources; countries anticipating joining]
• In LCG-2:
  • 131 sites, 30 countries
  • >12,000 CPUs
  • ~5 PB storage
• Includes non-EGEE sites:
  • 9 countries
  • 20 sites
LCG/EGEE Security environment
• The players:
  • Users: personal data, roles, usage patterns, …
  • VOs: experiment data, access patterns, membership, …
  • Grid sites: resources, availability, accountability, …
The Risks
• Top risks from the Security Risk Analysis (http://proj-lcg-security.web.cern.ch/proj-lcg-security/RiskAnalysis/risk.html):
  • Launch attacks on other sites
    • Large distributed farms of machines
  • Illegal or inappropriate distribution or sharing of data
    • Massive distributed storage capacity
  • Disruption by exploit of security holes
    • Complex, heterogeneous and dynamic environment
  • Damage caused by viruses, worms etc.
    • Highly connected and novel infrastructure
Joint Security Policy Group documents (http://cern.ch/proj-lcg-security/documents.html)
• Security & Availability Policy
• Usage Rules
• Certification Authorities
• Audit Requirements
• VO Security Policy (Draft)
• Site Registration
• User Registration
• Application Development & Network Admin Guide
• Incident Response
Incident Response
• Overview
  • LCG Security Group Agreement on Incident Response
    • June 2003, LCG-1: https://edms.cern.ch/document/428035/1
  • Updated as The OSG Incident Handling and Response Guide
    • Developed with JSPG: https://edms.cern.ch/file/428035/2/OSG_incident_handling_v1.0.pdf
    • “To guide the development and maintenance of a common capability for handling and response to cyber security incidents on Grids.”
  • Aims to establish:
    • common policies and processes, organizational structures,
    • cross-organizational relationships,
    • common communications methods, and
    • a modicum of centrally-provided services and processes.
• Grid incident definition: “..event that poses a .. threat [to] the integrity of services, resources, infrastructure, or identities.”
Incident Response
• The OSG Incident Handling and Response Guide
  • What it mandates (MUST do’s):
    • REPORT
    • RESPOND
    • PROTECT information gathered
    • ANALYSE
  • What it recommends (SHOULD do’s):
    • Provide monitored contact mailing lists at sites
    • Public disclosure (summary) through site Public Relations
    • Use signed mails
• See also Andrew Cormack’s draft comparison, “CSIRTs and Grids”.
Incident Response
• Reporting (MUST)
  • Provide contact information
    • Individual contacts
    • Monitored list (optional but HIGHLY desirable)
    • Management through GOCDB (?soon)
  • Report to LOCAL site security
    • i.e. sites should have a local plan
    • Does not replace or interfere with local plans
  • Report to INCIDENT-REPORT-L@project.org
    • Initial incident notification only, no chat
    • Closed list
    • Filtered abuse@.. & security@..
    • Currently we use project-lcg-security-csirts@cern.ch
      • -egee- alias
      • Open list, hence no moderated lists
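The two reporting steps (local site security, then the closed grid list) can be sketched as a small message builder. This is a minimal illustration assuming Python's standard `email` package; the local address `security@site.example`, the helper `build_initial_report` and the message layout are hypothetical, and INCIDENT-REPORT-L@project.org is the guide's own placeholder rather than a live list. Mail signing, a SHOULD in the guide, is not shown.

```python
from email.message import EmailMessage

# Placeholder list address from the guide; the local contact is hypothetical.
GRID_REPORT_LIST = "INCIDENT-REPORT-L@project.org"
LOCAL_SECURITY = "security@site.example"


def build_initial_report(site: str, summary: str, contact: str) -> EmailMessage:
    """Build a short initial notification: facts only, no discussion."""
    msg = EmailMessage()
    msg["From"] = contact
    # Local site security is the primary recipient; the closed grid
    # list receives the same initial notification as a copy.
    msg["To"] = LOCAL_SECURITY
    msg["Cc"] = GRID_REPORT_LIST
    msg["Subject"] = f"[INCIDENT] {site}: {summary}"
    msg.set_content(
        f"Site: {site}\n"
        f"Summary: {summary}\n"
        f"Contact: {contact}\n"
        "Follow-up discussion belongs on INCIDENT-DISCUSS-L@project.org.\n"
    )
    return msg


msg = build_initial_report("SITE-A", "root compromise on worker node",
                           "admin@site-a.example")
print(msg["Subject"])
```

Whether a site sends one message with a copy, as here, or two separate messages is a local choice; the point is that the report list only ever sees the initial notification.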
Incident Response
• Responding (MUST)
  • Initial classification
    • Low, Medium, High
  • Containment
    • Assumes a local containment process is in place
    • Attacks through the grid
      • Default action: block grid access initially
      • Authorization control MUST be provided for services
    • Attacks on the grid
      • Little or no central control is possible
      • Notify the attacking site (NREN CSIRTs)
      • Coordination of blocking and restoration of service
  • Notification
    • INCIDENT-DISCUSS-L@project.org
    • User and VO if an identity is compromised
    • Management
  • Post-Incident Analysis
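The containment and notification defaults can be sketched as a single decision function, assuming a per-site ban list keyed on certificate DN. `Incident`, `GridAuthz` and `respond` are hypothetical names, not part of any grid middleware. The sketch encodes only the grid-level defaults listed above: block grid access for attacks through the grid, tell the attacking site's CSIRT about attacks on the grid, and inform the user and VO only when an identity is compromised. Site management and post-incident analysis are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

DISCUSS_LIST = "INCIDENT-DISCUSS-L@project.org"


@dataclass
class Incident:
    classification: str                  # "LOW", "MEDIUM" or "HIGH" (recorded only)
    through_grid: bool                   # True: attack came in via grid services
    user_dn: Optional[str] = None        # certificate DN of the implicated identity
    vo: Optional[str] = None             # that identity's Virtual Organisation
    identity_compromised: bool = False
    source_csirt: Optional[str] = None   # CSIRT of the attacking site, if known


@dataclass
class GridAuthz:
    """Hypothetical site-level ban list that grid services consult."""
    banned_dns: Set[str] = field(default_factory=set)

    def ban(self, dn: str) -> None:
        self.banned_dns.add(dn)

    def is_authorized(self, dn: str) -> bool:
        return dn not in self.banned_dns


def respond(incident: Incident, authz: GridAuthz) -> List[str]:
    """Apply default containment, then return grid-level notification targets."""
    recipients = [DISCUSS_LIST]
    if incident.through_grid:
        # Attack through the grid: block the identity's grid access first.
        if incident.user_dn:
            authz.ban(incident.user_dn)
    elif incident.source_csirt:
        # Attack on the grid: no central control, so tell the attacking site.
        recipients.append(incident.source_csirt)
    if incident.identity_compromised:
        # User and VO are informed only when an identity is compromised.
        if incident.user_dn:
            recipients.append("user:" + incident.user_dn)
        if incident.vo:
            recipients.append("vo:" + incident.vo)
    return recipients
```

The ban list is deliberately consulted by the services themselves, which is why the guide insists that services provide authorization control: without it there is nothing for the default block to act on.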
Operational Security Coordination
[Diagram: OSCT linked to ROC, RC, CIC/GOC, CSIRT, Media/Press (“PR”), “External” and the GRID]
• Operational Security Coordination Team (OSCT)
  • Incident response planning
  • Best practice information
  • Security monitoring
  • Security Service Challenges
• EGEE operational channels are still being established.
• No central authority over sites
Operational issues
• Recognising and reporting
  • What is a local CSIRT?
    • Scale of coverage
    • 24x7 site/campus network operations team
    • Department Security Officer
    • LCG system administrator
  • Who is a security contact?
    • As above
    • Contact management
• Intersection with local CSIRT procedures
  • Local quarantine and analysis
• Keeping emergency channels clear
  • Discussions, cross-postings
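Keeping the report list clear of discussion and cross-postings could be enforced by a list-side filter. The sketch below is a hypothetical policy written against Python's standard `email` package, not the configuration of any real list manager: it accepts a message only if the sender is a member of the closed list, the message is not a reply, and it is not also addressed to the discussion list.

```python
from email.message import EmailMessage
from email.utils import getaddresses
from typing import Set, Tuple

DISCUSS_LIST = "incident-discuss-l@project.org"


def screen_report(msg: EmailMessage, members: Set[str]) -> Tuple[bool, str]:
    """Decide whether a message may be posted to the report list."""
    senders = [addr.lower() for _, addr in getaddresses(msg.get_all("From", []))]
    if not senders or senders[0] not in members:
        return False, "sender is not a list member (closed list)"
    # Replies are discussion, which belongs on the discussion list.
    subject = str(msg.get("Subject", "")).lower()
    if msg.get("In-Reply-To") or msg.get("References") or subject.startswith("re:"):
        return False, "replies belong on the discussion list"
    # Addressing both lists at once would drag discussion onto this one.
    recipients = {addr.lower() for _, addr in
                  getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))}
    if DISCUSS_LIST in recipients:
        return False, "cross-posted to the discussion list"
    return True, "accepted"
```

Rejecting with a reason, rather than silently dropping, lets the sender repost to the right list quickly.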
Incident Response Planning
• Response planning objectives
  • Provide a framework to use when something happens
    • But it must be usable flexibly
  • Can be tested
• Classification-based ‘use cases’
  • LOW
    • e.g. a single local non-privileged identity compromised; local denial of service.
  • MEDIUM
    • e.g. a local privileged identity compromised; an attack on a grid service not threatening grid stability.
  • HIGH
    • e.g. exploitation of the trust fabric; an attack leading to grid instability or denial of service against all service replicas.
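The classification use cases can be read as an ordered rule set, checked from HIGH downwards so that the most severe matching description wins. The observation flags below are hypothetical; the guide gives only examples, so this is a sketch of how a first classification might be mechanised, not a replacement for a responder's judgement.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Hypothetical flags a responder might record; not defined in the guide.
    identity_compromised: bool = False
    privileged_identity: bool = False       # e.g. root or a service account
    grid_service_attacked: bool = False
    grid_stability_threatened: bool = False
    all_replicas_denied: bool = False       # DoS against every service replica
    trust_fabric_exploited: bool = False    # e.g. abuse of CA or host credentials


def classify(obs: Observation) -> str:
    """Map an observation to LOW / MEDIUM / HIGH, most severe rule first."""
    if (obs.trust_fabric_exploited or obs.grid_stability_threatened
            or obs.all_replicas_denied):
        return "HIGH"
    if (obs.identity_compromised and obs.privileged_identity) or obs.grid_service_attacked:
        # Reached only when grid stability is not threatened.
        return "MEDIUM"
    # Single non-privileged identity or purely local denial of service.
    return "LOW"


print(classify(Observation(identity_compromised=True, privileged_identity=True)))
```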
Security Service Challenges
• Objectives
  • Simulate small, well-defined security incidents.
  • Learn and iterate to update procedures.
  • Formalise the results in updated incident response procedures.
  • Feed back to development and testing activities.
  • Exercise response procedures in a controlled manner.
• Non-intrusive
  • Compute resource usage traced to its owner
    • Run a job: can we trace it back to its submission?
    • SSC1 in testing phase now.
  • Future (?SSC2)
    • Storage resource usage traced to its owner
    • Run a job to store a file
• Disruptive
  • Disrupt a service and map the effects on the service and the grid
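The SSC1 question (“can we trace it back to its submission?”) amounts to joining records across several logs. The sketch assumes three hypothetical lookup tables, one each from the batch system, the computing element and the resource broker; real log formats differ per middleware and site, and extracting these tables is the hard part in practice. The function reports where the trail breaks, which is exactly what a challenge of this kind is meant to expose.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Submission:
    user_dn: str   # certificate subject of the submitting user
    vo: str        # Virtual Organisation the job ran under
    ui_host: str   # host the job was submitted from


def trace_to_owner(
    batch_job_id: str,
    batch_log: Dict[str, str],          # local batch job id -> grid job id
    ce_log: Dict[str, str],             # grid job id -> user DN seen by the CE
    broker_log: Dict[str, Submission],  # grid job id -> submission record
) -> Tuple[Optional[Submission], str]:
    """Follow a local job back to its submitter; report where the trail breaks."""
    grid_job_id = batch_log.get(batch_job_id)
    if grid_job_id is None:
        return None, "batch system"
    ce_dn = ce_log.get(grid_job_id)
    if ce_dn is None:
        return None, "computing element"
    submission = broker_log.get(grid_job_id)
    if submission is None:
        return None, "resource broker"
    if submission.user_dn != ce_dn:
        # The CE and broker records disagree about who owns the job.
        return None, "DN mismatch"
    return submission, "complete"
```

The same join pattern would extend to the storage case foreseen for SSC2 by adding a table from stored file to grid job id.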
Summary
• Diverse and complex Grid environments
• We have
  • Basic incident response proposals in place
  • Basic organisational structures in place
• We need to implement these through
  • Testing and awareness through Service Challenges
  • Improving the planning process in the OSCT
Thank You
Thanks to UK PPARC for my funding in LCG