Development & Implementation of an Inter-institutional Multi-purpose Grid
SURAgrid, 11/22/05
UNC-Charlotte: Grid Computing, ITSC 4010-001
Mary Fran Yafchak, SURA
Jim Jokl, University of Virginia
Art Vandenberg, Georgia State University
Presentation agenda
• About SURAgrid - Mary Fran Yafchak
• SURAgrid build/portal - MF Yafchak
• SURAgrid authN/authZ - Jim Jokl
• SURAgrid applications - Art Vandenberg
• Q&A - All
This is a living, breathing project. Exchange of ideas encouraged!
About SURAgrid
• A "beyond regional" initiative in support of SURA regional strategy
• "Mini-About" SURA:
  • SURA region: 16 states & DC; Delaware to Texas
  • SURA membership: 62 SE research universities
  • SURA mission: foster excellence in scientific research, strengthen capabilities, provide training opportunities
• Evolved from the NMI Testbed Grid project, part of the NMI Integration Testbed Program
  • http://www1.sura.org/3000/NMI-Testbed.html
SURAgrid Goals
• SURAgrid: organizations collaborating to bring grids to the level of seamless, shared infrastructure
• Goals:
  • To develop grid infrastructure that is scalable and that leverages local identity and authorization while managing access to shared resources
  • To promote use of this infrastructure for the broad research and education community
  • To provide a forum for participants to share experience with grid technology and participate in collaborative project development
SURAgrid Participants
• University of Alabama at Birmingham*
• University of Alabama in Huntsville*
• University of Arkansas*
• University of Florida*
• George Mason University*
• Georgia State University*
• Great Plains Network
• University of Kentucky*
• University of Louisiana at Lafayette*
• Louisiana State University*
• University of Michigan
• Mississippi Center for Supercomputing Research*
• University of North Carolina, Charlotte
• North Carolina State University*
• Old Dominion University*
• University of South Carolina*
• University of Southern California
• Southeastern Universities Research Association (SURA)**
• Texas A&M University*
• Texas Advanced Computing Center (TACC)*
• Texas Tech University
• Tulane University*
• Vanderbilt University*
• University of Virginia*
*SURA member  **Project planning  (the original slide also highlighted which participants have resources on the grid)
Focus Areas
• Authentication & Authorization
  • Themes: maintain local autonomy, leverage enterprise infrastructure
• Grid-Building
  • Themes: heterogeneity, flexibility, interoperability, scalability
• Application Development
  • Themes: immediate benefit to applications, applications drive development
• Project Planning
  • Themes: cooperative, representative, sustainable
In the Coming Months…
• Continue evolving key areas
  • Grow and solidify grid infrastructure
  • Continue expanding and exploring authN/authZ
  • Identify & "grid-enable" new applications
• "Formal" work on organizational definition
  • Charter, membership, policies, governance
• Develop funding & collaboration opportunities
  • Some areas of interest: scalable mechanisms for shared, dynamic access; interoperability in grid products; grid-enabling applications; grids for education; broadening participation; support and management of large-scale grid operations
Building SURAgrid & the SURAgrid Portal (Ashok Adiga, Texas Advanced Computing Ctr.)
SURAgrid Software Requirements
• SURAgrid supports dedicated & non-dedicated compute nodes
• Non-dedicated nodes are typically shared across multiple grids
  • Could have constraints on the software that can be installed
  • Must allow resource owner to set usage policies
• Dedicated nodes run only SURAgrid jobs
  • Common software stack being defined for dedicated nodes
  • Will consider using packaged grid solutions:
    • Virtual Data Toolkit (VDT)
    • NSF Middleware Initiative (NMI) grids
Configuring Non-dedicated Nodes
• Non-dedicated nodes support basic grid services
• Document simple process to add resources to the grid (a verification sketch follows below):
  • Job & data management: install Globus (pre-web-services GRAM & GridFTP)
  • Authentication: cross-sign CA certificates with the Bridge CA; work with individual resource owners to get authorized
  • Resource monitoring: install GPIR Perl provider scripts on the resource and add the resource description to the User Portal
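A minimal sketch of how a newly added resource might be verified after such an install, assuming the standard Globus command-line clients are on the path (the hostname and paths are illustrative placeholders, not actual SURAgrid values):

    # Create a proxy from the local campus credential
    grid-proxy-init

    # Authentication-only test against the gatekeeper (no job is run)
    globusrun -a -r gatekeeper.example.edu

    # Run a trivial job to confirm GRAM job management works
    globus-job-run gatekeeper.example.edu /bin/hostname

    # Confirm GridFTP by copying a small test file to the resource
    globus-url-copy file:///tmp/test.dat gsiftp://gatekeeper.example.edu/tmp/test.dat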
SURAgrid Resource Status
• Number of compute clusters: 14
• Total number of CPUs: 611
• Peak gigaflops: 1,367
• Memory (GB): 621
• Storage (GB): 5,645
Motivation for User Portals
• Make joining SURAgrid easier for users
• Single place for users to find information and get support
• Certain information can be displayed better in a web page than in a command shell
• Allow novice users to start using grid resources securely through a web interface
• Increase productivity of SURAgrid researchers: do more science!
What Is a Grid User Portal?
• In general: a gateway to a set of distributed services accessible from a web browser
• Provides:
  • Aggregation of different services as a set of web pages
  • Single URL
  • Single sign-on
  • Personalization
  • Customization
Characteristics of a User Portal
A user portal can include the following services:
• Documentation services
• Notification services
• User support services
  • Allocations
  • Accounts
  • Training
  • Consulting
User Portal Characteristics (cont'd)
• Collaborative services
  • Calendar
  • Chat
  • Resource sharing
• Information services
  • Resource
  • Grid-wide
• Interactive services
  • Manage jobs & data
  • Doesn't replace the command shell but provides a simpler, alternative interface
Service Aggregation
[Architecture diagram: a client browser connects to the User Portal over HTTP/SSL; the portal aggregates backend services over HTTP/SSL/SOAP and GSI. Services shown: User Support (consulting), Notification (user news), Collaborative (calendar, chat), Documentation (user guides), Information (resource, grid), Interactive (job submission, file transfer).]
Portal Built Using GridPort 4
• Developed at TACC & San Diego State
• Interface to grid technologies
  • GRAM, GridFTP, MyProxy, WSRF, science applications
• Includes:
  • Portal-framework-independent "portlets"
    • Expose backend services as customizable web interfaces
    • Small changes allow portlets to run in any JSR-168-compliant portal framework (e.g., uPortal, WebSphere, Jetspeed; installs into GridSphere by default)
  • Portal services
    • Run in the same web container as the portlets
    • Provide portlet cohesion and portal-framework-level support
• Single sign-on to access all grid resources
• Documentation tab has details on:
  • Adding resources to the grid
  • Setting up user IDs and uploading proxy certificates
Information Services
• Resource-level view
  • State information about individual resources
  • Queue, status, load, OS version, uptime, software, etc.
• Grid-level view
  • Grid-wide network performance
  • Aggregated capability
• GPIR information web service
  • Collects and provides the information above
Resource Monitoring
http://gridportal.sura.org
Interactive Services
• Security
  • Hidden from the user as much as possible
• File management
  • Upload
  • Download
  • Transfer between resources
• Job submission to a single resource
• Job submission to a grid meta-scheduler (future)
• Composite job sequencing (future)
Proxy Management
• Upload proxy certificates to a MyProxy server
• Portal provides support for selecting a proxy certificate to be used in a user session (example commands below)
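A sketch of the MyProxy commands behind this flow; the server name and username are hypothetical, and the portal performs the retrieval step on the user's behalf:

    # Delegate a proxy credential to the MyProxy server (run where the
    # campus credential lives); prompts for a MyProxy passphrase
    myproxy-init -s myproxy.example.org -l jdoe

    # Retrieve a short-lived proxy later, as the portal does for a session
    myproxy-logon -s myproxy.example.org -l jdoe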
File Management
• List directories, move files between grid resources, upload/download files from the local machine
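For reference, the equivalent GridFTP command-line operations look roughly like this (hostnames and paths are illustrative):

    # Upload from the local machine to a grid resource
    globus-url-copy file:///home/jdoe/input.dat \
        gsiftp://clusterA.example.edu/scratch/jdoe/input.dat

    # Third-party transfer directly between two grid resources
    globus-url-copy gsiftp://clusterA.example.edu/scratch/jdoe/input.dat \
        gsiftp://clusterB.example.edu/scratch/jdoe/input.dat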
Job Management
• Submit jobs for execution on remote grid resources
• Check status of, cancel, and delete submitted jobs
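Underneath, this maps onto the standard pre-web-services Globus job tools; a rough sketch, with an illustrative gatekeeper and jobmanager name:

    # Submit a batch job; prints a job contact string used for later queries
    CONTACT=$(globus-job-submit gatekeeper.example.edu/jobmanager-pbs /bin/date)

    # Check status, fetch output, then clean up the job
    globus-job-status "$CONTACT"
    globus-job-get-output "$CONTACT"
    globus-job-clean "$CONTACT"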
Future Directions
• User Portal currently offers basic user, informational, and interactive services
• Build on other services such as user support
• Need to expand services as the grid grows:
  • Resource broker to automatically select a resource for job execution
  • Workflow support for automation and better utilization of grid resources
  • Reliable file transfer services
  • Build customized application portlets
SURAgrid Authentication
• Goal
  • Develop a scalable inter-campus solution
• Preferred mechanisms
  • Leverage campus middleware activities
    • Researchers should not need to operate their own authentication systems
  • Use local campus credentials inter-institutionally
  • Rely on existing higher-education inter-institutional authentication efforts
Inter-campus Globus Authentication
• Globus uses PKI credentials for authentication
  • Leverage native campus PKI credentials on SURAgrid
  • Users do all of their work using local campus PKI credentials
• How do we create the inter-campus trust fabric?
  • Standard inter-campus PKI trust mechanisms include operating a single grid CA, trusting other campus CAs, or cross-certification and bridge PKIs
• How well does Globus operate in a bridged PKI?
  • The OpenSSL PKI in Globus is not bridge-aware
  • Known to work from the NMI Testbed project
• Decision: inter-campus trust based on a PKI bridge
  • Leverage the EDUCAUSE Higher Education Bridge CA (HEBCA) when ready
Background: Cross-certification
[Diagram: the top section shows traditional hierarchical validation, where each user cert (e.g., Issuer: UVA, Subject: User-1) chains to its own campus root (Issuer: UVA, Subject: UVA). The bottom section adds a cross-certificate pair between the two roots (Issuer: UAB, Subject: UVA and Issuer: UVA, Subject: UAB).]
• Top section: traditional hierarchical validation example
• Bottom section: validation using cross-certification example
  • UVA signed a certificate request from the UAB CA
  • UAB signed a certificate request from the UVA CA
• This pair of cross-certificates enables each school to trust certs from the other using only its own root as a trust anchor
• An n² problem: pairwise cross-certification requires a cross-certificate pair for every pair of campuses
Background: Bridged PKI
[Diagram: a Bridge CA linked by cross-certificate pairs to multiple campus hierarchies (Campus A, Campus B, … Campus n), each with its own intermediate CAs (Mid-A, Mid-B) and end users (User A1, User A2, User B1).]
• Used to enable trust between multiple hierarchical CAs
• Generally more infrastructure than just the cross-certificate pairs
  • Typically involves strong policy & practices
• Solves the n² problem
• For SURAgrid we preload cross-certs
SURAgrid Authentication Schematic
[Diagram: the SURAgrid Bridge CA at the hub, joined by cross-certificate pairs to the PKIs of the Campus A through Campus F grids.]
SURAgrid Authentication Status
• SURAgrid Bridge CA
  • Off-line system
  • Used Linux and OpenSSL to build the bridge (a cross-signing sketch follows below)
• Cross-certifications with the bridge complete or in progress for 8 SURAgrid sites
  • Several more planned in the near future
• SURAgrid Bridge web site
• Interesting PKI issues discussed in paper
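A rough sketch of one cross-certification step using plain OpenSSL, in the spirit of the Linux/OpenSSL bridge build described above; the file names, config file, and subject DNs are hypothetical:

    # On the campus side: generate a certificate request for the campus CA key
    openssl req -new -key campus-ca.key -out campus-ca.csr \
        -subj "/O=SURAgrid/OU=Campus A/CN=Campus A CA"

    # On the off-line bridge: sign the request, producing one half of the
    # cross-certificate pair
    openssl ca -config bridge-openssl.cnf -extensions v3_ca \
        -in campus-ca.csr -out campus-a-crosscert.pem

    # Repeat in the opposite direction: the campus CA signs a request
    # generated from the bridge CA's key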
Higher Education Bridge Certification Authority (HEBCA)
• A project of EDUCAUSE
• Implement a bridge for higher education based on the Federal PKI bridge model
• Support both campus PKIs and sector hierarchical PKIs
• Cross-certify with the Federal bridge (and others as appropriate)
• Should form an excellent permanent trust fabric for a bridge-based grid
Model SURAgrid Authentication
[Diagram: the same topology as the earlier schematic, with HEBCA in place of the SURAgrid Bridge CA as the hub of the cross-certificate pairs to the campus grid PKIs.]
Bridge to Bridge Context
[Diagram: FBCA, HEBCA, SAFE, a commercial bridge, and others cross-certified bridge-to-bridge.]
• A federal view on how the inter-bridge environment is likely to develop
  • FBCA: Federal Bridge
  • SAFE: pharmaceutical industry
  • HEBCA: higher education
  • Commercial: aerospace and defense
• Grid extensible across PKI bridges?
SURAgrid AuthN/AuthZ Status
• Bridge CA and cross-certification process
  • Forms the basic authN infrastructure
  • Builds a trust fabric that enables each site to trust the certificates issued by the other sites
• The grid-mapfile
  • Controls the basic (binary) authZ process
  • Sites add certificate subject DNs from remote sites to their grid-mapfile based on email from SURAgrid sites (format illustrated below)
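For illustration, a grid-mapfile maps one certificate subject DN per line to a local account, and Globus ships small helpers for maintaining it; the DN and login name below are made up:

    # Entry format inside the grid-mapfile itself:
    #   "/O=SURAgrid/OU=UVA/CN=Jane Doe" uva_jdoe

    grid-mapfile-add-entry    -dn "/O=SURAgrid/OU=UVA/CN=Jane Doe" -ln uva_jdoe
    grid-mapfile-delete-entry -dn "/O=SURAgrid/OU=UVA/CN=Jane Doe"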
SURAgrid AuthZ Development
• Grid-mapfile automation
  • Sites that use a recent version of Globus will use an LDAP callout that replaces the grid-mapfile
  • For other sites there will be some software that provides and updates a grid-mapfile for their gatekeeper
SURAgrid AuthZ Development
• LDAP authZ directory
  • Web interface for site administrators to add and remove their SURAgrid users
• Directory holds and coordinates:
  • Certificate subject DN
  • Unix login name (prefixed by school initials)
  • Allocated Unix UID (high numbers)
  • Some Unix GIDs? (high numbers)
  • Perhaps SSH public key; perhaps gsissh only
  • Other (TBD)
• Reliability
  • Replication to sites that want local copies
SURAgrid AuthZ Development
• Sites contributing non-dedicated resources to SURAgrid greatly complicate the equation
• We will provide a code template for editing grid-mapfiles to manage SURAgrid users
• Publish our LDAP schema
  • Sites may query LDAP to implement their own SURAgrid authZ/authN interface (see the query sketch below)
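A hypothetical query sketch, assuming an OpenLDAP client; the host, base DN, object class, and attribute names are placeholders, since the actual SURAgrid schema is still to be published:

    # Pull the certificate-DN -> login/UID mappings for all SURAgrid users
    ldapsearch -x -H ldap://directory.suragrid.example.org \
        -b "ou=people,dc=suragrid,dc=org" \
        "(objectClass=suragridUser)" \
        suragridCertDN suragridLogin uidNumber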
Likely SURAgrid AuthZ Directions and Research
• User directory or directory access
  • Group management
  • Person attributes
  • VO names
  • Store per-person, per-group allocations
    • Integrate with accounting
  • Local and remote stop-lists
• Resource directory
  • Hold resource usage policies
    • Time of day, classifications, etc.
  • Mapping users to resources within resource policy constraints
• We'll learn a lot more about what is actually required as we work with the early user groups
Applications on SURAgrid (Art Vandenberg, Georgia State University)
SURAgrid Applications
• Need applications to inform and drive development
• Want to be of immediate service to real applications
• Believe in grids as infrastructure
  • but not "if you build it they will come"…
• Identifying & fostering applications
Proposed Application Process
• Continuing survey of applications
  • Catalog of Grid Applications; similar agency and partner databases; survey of SURA membership
• Identify target applications
  • Regional significance, multi-institutional scope, intersection with other e-Science
  • Illustrating grid benefits
• Test it
  • Globus, authN-Z/Bridge CA, compilers, portal… and more
• Implementation options:
  1) Immediate deployment
  2) Demonstration deployment opportunities
  3) Combined with proposal development
Catalog of Grid Applications
• http://art11.gsu.edu:8080/grid_cat/index5.jsp
• Catalogs grid researchers and potential grid applications
• Initial intent was just to see who's doing what
  • Potentially a larger resource (collaboration, regional perspective, overall trends)
• 20 sites, 475+ researchers
• Current focus:
  • Automated maintenance
  • Improved search & browse
Identify an Applications Base
• Build from application activities already underway in SURAgrid
• Integrate with regional strategy (SURA HPC-Grid Initiatives Planning Group)
• Apply additional resources
  • Seeking additional collaboration, external funding
• Achieve critical mass
• Seek FUNDING
SURAgrid Applications
• SCOOP/ADCIRC (UNC, RENCI, MCNC, SCOOP partners, SURAgrid partners)
• Multiple Genome Alignment (GSU, UAB, UVA)
• ENDYNE (TTU)
• Task Farming (LSU)
• Data Mining on the Grid (UAH)
• BLAST (UAB)
• … and more …
SCOOP/ADCIRC (UNC, RENCI, MCNC, SCOOP Partners, SURAgrid Participants)
• SURA program to create infrastructure for a distributed Integrated Ocean Observing System (IOOS) in the southeast
  • Shared means for acquisition of observational data
  • Enables modeling, analysis, and delivery of real-time data
  • SCOOP will serve as a model for a national effort
  • http://www1.sura.org/3000/3300_Coastal.html
• SCOOP/ADCIRC: forecast storm surge (workflow sketched below)
  • Resource selection (query MDS)
  • Build package (application & data)
  • Send package to resource (GridFTP)
  • Run ADCIRC in MPI mode (Globus RSL & qsub)
  • Retrieve results from resource (GridFTP)
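A condensed sketch of that workflow in terms of the GT2-era command-line tools; hostnames, paths, and the CPU count are illustrative, and grid-info-search options may vary by installation:

    # 1. Resource selection: query the resource's MDS
    grid-info-search -h gatekeeper.example.edu -b "mds-vo-name=local,o=grid"

    # 2-3. Build the package (application & data) and stage it with GridFTP
    tar czf adcirc_bundle.tar.gz adcprep.x padcirc.x fort.14 fort.15 fort.22
    globus-url-copy file://$PWD/adcirc_bundle.tar.gz \
        gsiftp://gatekeeper.example.edu/scratch/scoop/adcirc_bundle.tar.gz

    # 4. Run ADCIRC in MPI mode via RSL submitted through the PBS jobmanager
    globusrun -r gatekeeper.example.edu/jobmanager-pbs \
        '&(executable=/scratch/scoop/padcirc.x)(jobtype=mpi)(count=16)'

    # 5. Retrieve results with GridFTP
    globus-url-copy gsiftp://gatekeeper.example.edu/scratch/scoop/fort.63 \
        file://$PWD/fort.63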
SCOOP/ADCIRC…
[Figure, left: ADCIRC max water level for a 72-hr forecast starting 29 Aug 2005, driven by the "usual, always-available" ETA winds. Right: ADCIRC max water level over ALL of the UFL ensemble wind fields for a 72-hr forecast starting 29 Aug 2005.]
Images credit: Brian O. Blanton, Dept. of Marine Sciences, UNC Chapel Hill
SCOOP/ADCIRC Results
SURAgrid U. Kentucky (CCS-UKY, 48 CPU / 230 Gflops / 48G RAM, 500G disk)

    -rwx------ 1 howard howard   1458444 Sep 14 13:39 adcirc.x
    -rwx------ 1 howard howard        12 Sep 14 13:39 adcpost.inp
    -rwx------ 1 howard howard    843813 Sep 14 13:39 adcpost.x
    -rw------- 1 howard howard        29 Sep 14 13:39 adcprep.inp
    -rwx------ 1 howard howard   1150926 Sep 14 13:39 adcprep.x
    -rwx------ 1 howard howard       915 Sep 14 13:39 execute_parallel_bundle.sh
    -rwx------ 1 howard howard   3042520 Sep 14 13:39 fort.14
    -rw------- 1 howard howard     64545 Sep 14 13:39 fort.15
    -rw------- 1 howard howard  19804050 Sep 14 13:39 fort.22
    -rw-rw-r-- 1 howard howard   1444457 Sep 14 16:17 fort.61
    -rw-rw-r-- 1 howard howard    202457 Sep 14 16:17 fort.62
    -rw-rw-r-- 1 howard howard 105626297 Sep 14 16:18 fort.63
    -rw-rw-r-- 1 howard howard 169753697 Sep 14 16:19 fort.64
    -rw------- 1 howard howard   1257568 Sep 14 13:39 fort.68
    -rw-rw-r-- 1 howard howard   1326004 Sep 14 13:40 fort.80
    -rw------- 1 howard howard   3940266 Sep 14 13:40 metis_graph.txt
    -rwx------ 1 howard howard   1802370 Sep 14 13:39 padcirc.x
    -rw-rw-r-- 1 howard howard       403 Sep 14 13:39 pbs_sub-howard
    -rw-r--r-- 1 howard howard      1028 Sep 14 13:39 pbs_sub-howard.e125698
    -rw-r--r-- 1 howard howard        91 Sep 14 13:39 pbs_sub-howard.o125698
    drwxrwxr-x 2 howard howard      4096 Sep 14 13:41 PE0000
    drwxrwxr-x 2 howard howard      4096 Sep 14 13:41 PE0001
    drwxrwxr-x 2 howard howard      4096 Sep 14 13:41 PE0002
    drwxrwxr-x 2 howard howard      4096 Sep 14 13:41 PE0003

Results are stored in fort.61 - fort.64; the PE00nn directories were created by the job.
SCOOP/ADCIRC - Challenges
• Resource selection (query MDS)
  • Expected MDS to be hosted on the resource being queried; CCS-UKY actually pointed to NCSA for its MDS, so MDS had to be implemented on CCS-UKY as well (essentially making CCS-UKY part of multiple MDS hierarchies)
• Build package (application & data)
  • Must address incompatibility between GT3- and GT2-style proxies; must use the "-old" option to GT3's grid-proxy-init to get a GT2-style proxy, which ADCIRC currently expects (see below)
• Send package to resource (GridFTP)
  • Staff availability…
• Run ADCIRC in MPI mode (Globus RSL & qsub)
• Retrieve results from resource (GridFTP)
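The proxy workaround mentioned above is a one-line fix at proxy-creation time:

    # Force the legacy GT2 proxy format that the ADCIRC tooling expects
    grid-proxy-init -old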