1 / 13

Computing and Data Infrastructure for Large-Scale Science Deploying Production Grids:

Computing and Data Infrastructure for Large-Scale Science Deploying Production Grids: NASA’s IPG and DOE’s Science Grid. William E. Johnston (wejohnston@lbl.gov) Lawrence Berkeley National Lab and NASA Ames. doesciencegrid.org. www.ipg.nasa.gov. Motivation for Science Grids.

sherry
Download Presentation

Computing and Data Infrastructure for Large-Scale Science Deploying Production Grids:

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computing and Data Infrastructure for Large-Scale Science Deploying Production Grids: NASA’s IPG and DOE’s Science Grid William E. Johnston (wejohnston@lbl.gov) Lawrence Berkeley National Lab and NASA Ames doesciencegrid.org www.ipg.nasa.gov

  2. Motivation for Science Grids • Large-scale science and research engineering are done through the interaction of geographically and organizationally dispersed • people • heterogeneous computing systems • data management systems • instruments • The overall motivation for “Grids” is to facilitate the routine interactions of these resources in order to support collaborative science and engineering that is • widely distributed • multi-institutional • data intensive • collaborative

  3. Lessons Learned for Building Large-Scale Grids • Four main points: • deploying operational infrastructure • cross site trust • dealing with Grid technology scaling issues • listening to the users

  4. Basic Grid Services • There is a set of basic functions that all Grids must have in order to be called a Grid (the “neck of the hourglass” of Grids) • the Grid Security Infrastructure (“GSI” - the tools and libraries that provide Grid security) • the Grid Information Service (“GIS” - the basic resource discovery mechanism) • Grid job initiator mechanism • A basic data management mechanism such as GridFTP • Grid event mechanism (however, this is still under development) • Most of the deployment issues relate to these

  5. Lessons Learned for Building Large-Scale Grids • Like networks, successful Grids involve almost as much sociology as technology, and therefore establishing good working relationships among all of the people involved is essential.

  6. Deploying Operational Infrastructure • Establish an Engineering Working Group that involves the Grid deployment teams at each site • schedule weekly meetings / telecons • involve Globus experts in these meetings • establish an EngWG archived email list • Set up liaisons with the systems administrators for all systems that will be involved (computation and storage) • this is especially important if the resources that you expect to incorporate if your Grid are • not in your organization • not in your part of your organization

  7. Deploying Operational Infrastructure • Identify the computing and storage resources to be incorporated into the Grid • be sensitive to the fact that opening up systems to Grid users may turn lightly or moderately loaded systems into heavily loaded systems • batch schedulers may have to be installed on systems that previously did not use them in order to manage the increased load • carefully consider the issue of co-scheduling! • many potential Grid applications need this • only a few available schedulers provide it (e.g. PBSPro) • this is an important issue for building distributed systems

  8. Grid Information System • Plan for a GIS/GIIS sever at each distinct site with significant resources • this is important in order to avoid single points of failure • Structure of the GIIS is one of the basic scaling issues for Grids

  9. Cross Site Trust • Set up or identify a CA to issue Grid user identity certificates – the basis of the GSI • the basic trust management mechanism • The Certificate Policy Statement codifies how you will run your CA and to whom you will issue certificates • cross site trust is based on this • Don’t try and invent your own CPS! • Look at ESnet CP (envisage.es.net) and Grid Forum CP

  10. Defining / Understanding the Extent of “Your” Grid • The “boundaries” of a Grid are primarily determined by two factors: • what CAs you trust • this is explicitly configured in each Globus environment • however there is no guarantee that every resourcein what you think is “your” Grid trusts the same set of CAs – i.e. each resource potentially has a different space of users • in fact, this will be the norm if the resources are involved in multiple virtual organizations as they frequently are in the high energy physics experiment communities • how you scope the searching of the GIS/GIISs • this depends on the model that you choose for structuring your directory services

  11. Maintaining Local Control • Establish the conventions for the Globus mapfile • maps user Grid identities to system UIDs • this is the basic local control / authorization mechanism for each individual compute and storage platform

  12. Take Good Care of the Users as Early as Possible • Establish a Grid/Globus application specialist group • they should be running sample jobs as soon as the prototype-production system is operational • they should serve as the interface between users and the Globus system administrators to solve Globus related application problems • Identify early users and have the Grid/Globus application specialists assist them in getting jobs running on the Grid

  13. For the Full Talk grid.lbl.gov/~wej/Grids

More Related