TeraGrid is a distributed, scalable, and secure cyberinfrastructure platform that provides access to high-performance computing, data analysis, collaboration, and education resources for the science and engineering community.
TeraGrid: National Cyberinfrastructure for Terascale Science
Dane Skow, Deputy Director, TeraGrid
www.teragrid.org
The University of Chicago and Argonne National Laboratory
February 2007
Slides courtesy of Charlie Catlett (UC/ANL), Tony Rimovsky (NCSA), and Reagan Moore (SDSC)
TeraGrid is supported by the National Science Foundation Office of Cyberinfrastructure
“NSF Cyberinfrastructure Vision for 21st Century Discovery”
1. Distributed, scalable HPC, up to petaFLOPS (includes networking, middleware, and systems software)
2. Data, data analysis, and visualization (includes data to and from instruments)
3. Collaboratories, observatories, and virtual organizations
4. Education and workforce
• Provide sustainable and evolving CI that is secure, efficient, reliable, accessible, usable, and interoperable
• Provide access to world-class tools and services, including “sophisticated” science application software
Draft 7.1 of the CI Plan is available at www.nsf.gov/oci/
Adapted from: Dan Atkins, NSF Office of Cyberinfrastructure
TeraGrid Mission • TeraGrid provides integrated, persistent, and pioneering computational resources that will significantly improve our nation’s ability and capacity to gain new insights into our most challenging research questions and societal problems. • Our vision requires an integrated approach to the scientific workflow including obtaining access, application development and execution, data analysis, collaboration and data management. • These capabilities must be accessible broadly to the science, engineering, and education community.
TeraGrid Facility Partners
[Map of partner sites: UW, NIU, PSC, PU, UC/ANL, NCAR, NCSA, IU, UNC/RENCI, Caltech, USC/ISI, ORNL, SDSC, TACC]
Legend: Grid Infrastructure Group (GIG), Resource Provider (RP), Software Integration Partner
Networking
[Network diagram: TeraGrid backbone hubs in LA, DEN, and CHI, interconnected over Abilene. NCSA, UC/ANL, and PSC connect at 3x10G each; Cornell and IPGrid at 2x10G; NCAR, ORNL, SDSC, TACC, PU, and IU at 1x10G each.]
TeraGrid Usage Growth
[Chart: normalized units (millions) delivered over time, split between specific allocations and roaming allocations]
TeraGrid currently delivers to users an average of 400,000 CPU-hours per day, roughly the equivalent of ~20,000 CPUs running continuously.
TeraGrid User Community Growth
[Chart: active users per quarter]
• Begin TeraGrid production services (October 2004)
• Incorporate NCSA and SDSC core (PACI) systems and users (April 2006)
• Decommissioning of systems typically causes slight reductions in active users; e.g., the December 2006 dip reflects the decommissioning of LeMieux (PSC).
(*FY06 new users/quarter excludes Mar/Apr 2006)
TeraGrid Projects by Institution
[Map legend: Blue = 10 or more PIs; Red = 5-9 PIs; Yellow = 2-4 PIs; Green = 1 PI]
TeraGrid allocations are available to researchers at any US educational institution by peer review. Exploratory allocations can be obtained through a biweekly review process. See www.teragrid.org.
1,000 projects, 3,200 users
FY06 Quarterly Usage by Discipline
[Chart: percent of total usage by discipline, per quarter]
TeraGrid Science Gateways Initiative: Service-Oriented Approach
[Diagram: discipline gateways reach TeraGrid, Grid-X, and Grid-Y through web services]
The science and engineering community has been building discipline-specific cyberinfrastructure in the form of portals, applications, and grids. Our objective is to enable these to use TeraGrid resources transparently as “back ends” to their infrastructure. The TeraGrid Science Gateways program has developed, in partnership with 20+ communities and multiple major grid projects, an initial set of processes, policies, and services that enable these gateways to access TeraGrid (or other facilities’) resources via web services.
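To make the “back end” pattern concrete, here is a minimal Python sketch of a gateway request handler. The endpoint URL, the submit operation, and the token-based credential handling are all hypothetical illustrations; gateways of this era typically used SOAP/WSRF services such as GRAM rather than this simplified HTTP interface.

    import json
    import urllib.request

    # Hypothetical gateway endpoint; a real deployment would act on behalf
    # of portal users via a community credential (e.g., an X.509 proxy).
    TERAGRID_SERVICE = "https://gateway.example.org/teragrid/submit"

    def submit_job(app, args, community_token):
        """Forward a science-portal request to a TeraGrid resource.

        The portal user never sees the grid: the gateway maps the web
        request onto a community account and a web-service call.
        """
        payload = json.dumps({
            "application": app,      # pre-installed community code
            "arguments": args,       # user-supplied parameters
            "account": "community",  # shared community allocation
        }).encode()
        req = urllib.request.Request(
            TERAGRID_SERVICE,
            data=payload,
            headers={"Content-Type": "application/json",
                     "Authorization": f"Bearer {community_token}"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)   # e.g., {"job_id": "..."}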
TeraGrid User Community in 2006
[Diagram: spectrum of the user community, from gateway users to “grid-y” users]
Data Storage Resources
• Local cluster file systems
• Global file system: GPFS-WAN (250 TB)
• Data collections
• Archive storage
Graphic courtesy of SDSC DataCentral
Local Cluster Storage
• Shared NFS-like file system within a single site: GPFS, Lustre, NFS, PVFS, QFS, CXFS, …
• Normal site user/group permissions apply
  • TeraGrid users typically have individual accounts connected with their project team via the usual uid/gid groups
  • Therefore normal containment/forensic tools work inside the system
• GridFTP transfer from one resource to another (a sketch follows this slide)
  • Dedicated GridFTP mover nodes for parallel systems
  • Dynamic GridFTP mover “fleets” direct from applications
• Central TeraGrid listener to gather aggregate system data
  • A modification to the standard setup that lifts the “veil of privacy” within TeraGrid
  • Used for system metrics and diagnostics
  • Feeds a forensics analysis database
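As a sketch of the GridFTP bullet above, the snippet below drives a transfer by shelling out to globus-url-copy, the standard Globus Toolkit client; the -p flag requests parallel TCP streams, which is what the dedicated mover nodes exploit. The host names and paths are hypothetical.

    import subprocess

    def gridftp_copy(src_url, dst_url, streams=4):
        """Move data between resources with the Globus Toolkit client.

        globus-url-copy authenticates with the caller's X.509 proxy
        credential; -p opens multiple parallel TCP streams.
        """
        subprocess.run(
            ["globus-url-copy", "-p", str(streams), src_url, dst_url],
            check=True,
        )

    # Hypothetical endpoints: move a dataset between two RP sites.
    gridftp_copy("gsiftp://tg-gridftp.ncsa.example.org/scratch/run42.dat",
                 "gsiftp://tg-gridftp.sdsc.example.org/archive/run42.dat")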
“Global” File System
• TeraGrid has a central GPFS-WAN server at SDSC, mounted by several clusters across the grid.
• Pros
  • Common namespace
  • POSIX syntax for remote file access
  • Single identity space (X.509) across the WAN
  • High-speed parallel file systems available
• Cons
  • GPFS-WAN: IBM licensing and availability
  • Lustre-WAN: lack of a WAN security model
  • No support for group authorization constructs
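The “POSIX syntax” advantage means applications need no grid-specific I/O code: once GPFS-WAN is mounted, ordinary file operations reach remote data. A trivial sketch, assuming a hypothetical mount point /gpfs-wan:

    # Ordinary POSIX I/O against the wide-area file system; no explicit
    # staging step is needed. The path below is hypothetical -- actual
    # mount points vary by site.
    with open("/gpfs-wan/projects/tg-demo/input.dat", "rb") as f:
        header = f.read(4096)   # the read is served over the WAN by GPFS
    print(len(header))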
Archived Storage
• We are just now beginning to deal with archived storage as an allocated resource.
• Issues:
  • Retention policy/guarantees
  • Media migration
  • Privacy/security/availability of abandoned files
  • Economic model (NCAR has a “Euro” approach with a common currency)
Using an SRB Data Grid - Details
[Diagram: client, two Storage Resource Broker servers, and the metadata catalog database]
1. User asks for data
2. The data request goes to an SRB server
3. The server looks up the file in the metadata catalog
4. The catalog says which SRB server holds the data
5. The first server asks the second for the data
6. The data is found and returned to the user
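The flow above can be captured in a few lines of Python. This is an illustrative model only, not the SRB client API: the metadata catalog is reduced to a dictionary mapping logical names to the server that holds the bytes, and all names are made up.

    # Toy model of the SRB request flow numbered above.
    MCAT = {"/zone/projA/results.dat": "srb-server-2"}   # logical name -> owner

    STORES = {
        "srb-server-2": {"/zone/projA/results.dat": b"...data..."},
    }

    def srb_get(contacted_server, logical_name):
        owner = MCAT[logical_name]             # steps 3-4: catalog lookup
        if owner != contacted_server:          # step 5: peer-to-peer fetch
            print(f"{contacted_server} forwards the request to {owner}")
        return STORES[owner][logical_name]     # step 6: data returned

    # Steps 1-2: the user contacts any SRB server in the zone.
    print(srb_get("srb-server-1", "/zone/projA/results.dat"))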
Lessons Learned
• The lesson from Stakkato was not (just) the scale of the attack, but the importance of being able to restore control.
• In a connected world with agents, this means:
  • Virtual borders -- ALL > collaborators > pair-wise trusts
  • Centralized logging for forensics/IDS (sketch below)
  • USE THE SAME SYSTEM FOR DAILY OPERATIONS/METRICS!
  • We must be able to (perhaps painfully) outpace attackers in cleaning systems
• Ease of use and ubiquity are essential to adoption
  • AFS’s change from file permissions to directory permissions carried a huge adoption-barrier cost
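A minimal sketch of the centralized-logging point: every site pushes events into one collector, and the same stream feeds both routine metrics and forensic queries. The record format and in-memory store are assumptions for illustration; a production deployment would use syslog forwarding or similar.

    import time
    from collections import Counter

    EVENTS = []   # stand-in for a central log store shared by all sites

    def log_event(site, kind, detail):
        """Sites push structured events to one collector."""
        EVENTS.append({"t": time.time(), "site": site,
                       "kind": kind, "detail": detail})

    # Daily operations: the stream drives routine metrics...
    def login_counts():
        return Counter(e["site"] for e in EVENTS if e["kind"] == "login")

    # ...and forensics: after an incident, replay it with a different query.
    def logins_for_user(user):
        return [e for e in EVENTS
                if e["kind"] == "login" and e["detail"].get("user") == user]

    log_event("NCSA", "login", {"user": "alice", "from": "198.51.100.7"})
    log_event("SDSC", "login", {"user": "alice", "from": "198.51.100.7"})
    print(login_counts(), logins_for_user("alice"))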
Lessons Learned
• Work is needed on distributed group authorization/management tooling
  • Group membership and roles are best maintained by the leaders of the group
  • Policy rules are best kept and enforced by the data store
• Security triad:
  • Who you are
  • Where you can go
  • What you can do
• Some actions are so dangerous that they deserve enforcement of the two-person rule (e.g., archive tape erasure); see the sketch below.
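A minimal sketch of two-person enforcement for a dangerous operation, assuming approvals are collected out of band; the approver names and the erase_tapes action are hypothetical.

    def two_person_rule(action, approvals, quorum=2):
        """Run a dangerous action only with two distinct approvers."""
        if len(set(approvals)) < quorum:
            raise PermissionError(
                f"{action.__name__} requires {quorum} distinct approvers")
        return action()

    def erase_tapes():                      # hypothetical dangerous action
        return "archive tapes erased"

    # Succeeds only because two different operators signed off.
    print(two_person_rule(erase_tapes, approvals={"op-alice", "op-bob"}))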
Lessons Learned
• Security is never “done.”
• The coordination team built up during the Stakkato incident was THE most valuable result.
Storage Resource Broker: Security in Distributed Data Management Systems
Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar
{moore, schroede, mwan, sekar}@sdsc.edu
http://www.sdsc.edu/srb
http://irods.sdsc.edu/
Logical Name Spaces
• Logical user name
  • Unique identifier for each person accessing the system: {user-name, project-name}
• User groups: aggregations of users
  • Membership in multiple groups is allowed
• Data grids (zones)
  • Across zones, users are identified as {user-name, project-name, zone-name}
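The logical identity can be modeled as a small value type. A sketch, with field names taken from the slide (the zone field qualifies the name across data grids; the default value is an assumption):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class LogicalUser:
        """SRB-style identity: {user-name, project-name, zone-name}."""
        user: str
        project: str
        zone: str = "local"    # distinguishes users across data grids

    alice = LogicalUser("alice", "projA", "sdsc-zone")
    groups = {"projA-writers": {alice}}   # groups aggregate logical users
    print(alice in groups["projA-writers"])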
Authorization - SRB
• Assign access controls on each name space:
  • Files
  • Metadata
  • Storage
• Assign roles that represent sets of allowed operations
  • Roles: administrator, curator, read, write, annotate
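A sketch of per-name-space access controls with role sets, following the slide’s role names; the individual operation names are assumptions for illustration, not SRB’s actual operation list.

    # Roles as named sets of allowed operations (operation names assumed).
    ROLES = {
        "read":          {"get"},
        "write":         {"get", "put"},
        "annotate":      {"get", "add_metadata"},
        "curator":       {"get", "put", "add_metadata", "replicate"},
        "administrator": {"get", "put", "add_metadata", "replicate", "chmod"},
    }

    # One ACL table per name space: files, metadata, storage.
    ACLS = {"files": {}, "metadata": {}, "storage": {}}
    ACLS["files"][("alice", "/zone/projA/results.dat")] = "curator"

    def allowed(user, namespace, obj, op):
        role = ACLS[namespace].get((user, obj))
        return role is not None and op in ROLES[role]

    print(allowed("alice", "files", "/zone/projA/results.dat", "replicate"))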
Rule-based Data Management
• iRODS (integrated Rule-Oriented Data System)
• Maps from management policies to rules controlling the execution of remote micro-services
• Manages persistent state information for the results of micro-service execution
• Supports three additional logical name spaces:
  • Rules
  • Micro-services
  • Persistent state information
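The policy-to-micro-service mapping can be sketched as a tiny rule engine: a rule names an event, a condition, and a chain of micro-services to run, and each micro-service records its result as persistent state. This is an illustrative Python model, not the iRODS rule language; all rule and micro-service names are hypothetical.

    STATE = []   # stand-in for the persistent state information name space

    def ms_checksum(obj):   STATE.append(("checksum", obj)); return True
    def ms_replicate(obj):  STATE.append(("replica", obj));  return True

    # Each rule: (event, condition, chain of micro-services).
    RULES = [
        ("on_put", lambda obj: obj.endswith(".dat"), [ms_checksum, ms_replicate]),
    ]

    def fire(event, obj):
        for ev, cond, chain in RULES:
            if ev == event and cond(obj):
                for micro_service in chain:   # executed remotely in real iRODS
                    if not micro_service(obj):
                        break                 # stop the chain on failure

    fire("on_put", "/zone/projA/run42.dat")
    print(STATE)   # [('checksum', ...), ('replica', ...)]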
Controlling Remote Operations
[Diagram: iRODS, the integrated Rule-Oriented Data System, mediating remote operations]
Rule-based Access
• Associate security policies with each digital entity
  • Redaction; access controls on structures within a file
  • Time-dependent access controls (e.g., how long to hold data proprietary)
• Associate access controls with each rule
  • Restrict the ability to modify or apply rules
• Associate access controls with each micro-service
  • Explicit control of operation execution within a given collection
• Much finer control than the Unix read/write/execute bits provide
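A sketch of the time-dependent control, assuming an embargo date stored per entity; the field names and dates are illustrative.

    from datetime import date

    # Per-entity policy: data stays proprietary until its embargo lifts.
    POLICIES = {"/zone/projA/results.dat": {"owner": "alice",
                                            "embargo_until": date(2008, 1, 1)}}

    def can_read(user, obj, today=None):
        p = POLICIES[obj]
        today = today or date.today()
        if user == p["owner"]:
            return True                      # owners can always read
        return today >= p["embargo_until"]   # others wait out the embargo

    print(can_read("bob", "/zone/projA/results.dat", date(2007, 2, 1)))  # False
    print(can_read("bob", "/zone/projA/results.dat", date(2008, 6, 1)))  # True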
For More Information
Reagan W. Moore
San Diego Supercomputer Center
moore@sdsc.edu
http://www.sdsc.edu/srb/
http://irods.sdsc.edu/
Call for Participation
• Papers, tutorials, posters, BOFs, and demonstrations are being accepted through February in three tracks: Science; Technology; and Education, Outreach, and Training.
• Submissions are being accepted through April for three competitions open to high school, undergraduate, and graduate students:
  • Impact of cyberinfrastructure
  • Research posters
  • On-site advancing of scientific discovery