“Grid-enabling” applications ITCS 4146/5146 Grid Computing, 2007, UNC-Charlotte, B. Wilkinson. March 27, 2007
“Grid-enabling” • A poorly defined and understood term! • One simple definition: • Being able to execute an application on a grid platform, using the distributed resources available on that platform.
Another definition from the literature: “Turning an existing application, installed on a Grid resource, into a service and generating the application-specific user interfaces to use that application through a web portal.” (1) This definition assumes a portal interface and the use of services. (1) From: "A Service-Oriented, Scalable Approach to Grid-Enabling of Legacy Scientific Applications" by Sanjeepan, Vivekananthan; Matsunaga, Andrea; Zhu, Liping; Lam, Herman; Fortes, Jose A.B. Proc. of 2005 Int. Conf. on Web Services (ICWS-2005), Orlando, Florida, p. 553-560, 11-15 July, 2005.
How does one do “Grid-enabling”? • Still an open question and in the research domain, without a standard approach. Here we will describe various approaches.
Simple “grid-enabling” First step • Simply running an application on a grid resource. • Might just mean making sure the executable and input files are available to the application. • Not exactly making the most of the grid platform!
Best types of applications for grid-enabling • One homogeneous application that needs to be executed multiple times with different arguments (“parameter sweep”): perfect • Computationally intensive • A high 'compute time' vs. 'communication time' ratio • An MPI type parallel application with minimal message-passing between grid sites
Parameter Sweep Examples • Molecular biologist (drug designer) looking for compounds in large chemical data sets that best dock with a particular protein • Geologist looking at change in density and depth of ore-body and overlying rock’s density to optimise cost and production • Aerospace engineer understanding role of geometry parameters in aerodynamic design and optimization process • High energy physicist investigating origin of mass by analyzing petabytes of data generated by high-energy accelerators such as the LHC (Large Hadron Collider) • Neuroscientist performing brain activity analysis by conducting pair-wise cross-correlation analysis of MEG (Magneto-EncephaloGraphy) sensor data Source: Alchemi project.
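The parameter-sweep pattern can be sketched in a few lines. This is only an illustration under stated assumptions: `simulate` is a hypothetical stand-in for one run of the application, the parameter names are made up, and a local thread pool stands in for grid resources (a real deployment would submit each combination as a separate grid job).

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def simulate(density, depth):
    """Hypothetical stand-in for one independent run of the application."""
    return density * depth


def build_sweep(param_grid):
    """Return one argument dict per combination of parameter values."""
    names = sorted(param_grid)
    combos = itertools.product(*(param_grid[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def run_sweep(param_grid, workers=4):
    """Run every combination independently.

    Assumption: a local thread pool plays the role of the grid; the runs
    share no state, which is what makes a sweep easy to distribute.
    """
    jobs = build_sweep(param_grid)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda kwargs: simulate(**kwargs), jobs))
    return list(zip(jobs, results))


if __name__ == "__main__":
    grid = {"density": [2.5, 3.0], "depth": [100, 200, 300]}
    for job, result in run_sweep(grid):
        print(job, result)
```

Because no run depends on another, the only communication is sending arguments out and collecting results back.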
Grid-enabling MPI programs • Globus version of MPI available to run MPI jobs across a grid (MPICH-G2). Message passing can cross sites: http://www.globus.org/grid_software/computation/mpich-g2.php
MPICH-G2 programs • Ideally one can simply run the MPI job unmodified across the grid. • However, it is not that simple.
Problems: • Firewalls: Need to accommodate firewalls by opening up ports • Job Schedulers: Each site will have a separate independent local job scheduler, which means one cannot guarantee that all the MPI processes at the different sites will be running at the same time so that they can communicate. (This issue does not seem to be mentioned in MPICH-G2 documentation.) • Latency: The delays in messages in transit are much larger and more variable between sites (Internet)
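The effect of latency on the compute-to-communication ratio can be estimated with a simple model. The sketch below assumes the common linear cost model (time per message = latency + size / bandwidth); the function names and all numbers in the usage part are illustrative assumptions, not measurements.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Time to deliver one message under a linear cost model (an assumption)."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def compute_to_comm_ratio(compute_s, n_messages, size_bytes,
                          latency_s, bandwidth_bytes_per_s):
    """Ratio of compute time to total communication time for one process."""
    comm_s = n_messages * message_time(size_bytes, latency_s,
                                       bandwidth_bytes_per_s)
    return compute_s / comm_s


if __name__ == "__main__":
    # Illustrative values only: the same workload within one cluster
    # and between two sites connected over the Internet.
    workload = dict(compute_s=60.0, n_messages=10_000, size_bytes=1_000)
    local = compute_to_comm_ratio(latency_s=50e-6,
                                  bandwidth_bytes_per_s=100e6, **workload)
    wide_area = compute_to_comm_ratio(latency_s=50e-3,
                                      bandwidth_bytes_per_s=10e6, **workload)
    print(f"within a cluster: {local:.1f}")
    print(f"between sites:    {wide_area:.3f}")
```

With many small messages the latency term dominates, which is why applications with minimal message-passing between sites suit the grid best.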
More advanced “grid-enabling” Some strategies: • Using Globus and Grid service APIs • Using Grid wrappers to form services • Higher-level toolkits
1. Using Globus APIs Globus provides a suite of services that have APIs1 (C and Java interfaces) that could be called from the application. 1 API: An application programming interface is a source code interface that a computer system or program library provides in order to support requests for services to be made of it by a computer program. http://en.wikipedia.org/wiki/API
Examples • GridFTP for high performance file transfers. • MDS (Monitoring and Discovery Service) for resource monitoring and discovery. Provides information about available grid resources and their status. • RLS (Replica Location Service): maintains and provides access to mapping information from logical names for data items to target names; in effect, a database that maps logical file names or file aliases to physical locations. • GASS (Global Access to Secondary Storage): provides mechanisms for transferring data to and from a remote HTTP, FTP, or GASS server. Condor-G uses GASS to transfer the executable, stdin, stdout, and stderr to/from the remote resource.
Globus Services [Figure: GT 4 component diagram. Components are divided into Web services components and non-WS components, marked by toolkit version (GT2, GT3, GT4), and grouped into five areas.] • Security: Pre-WS Authentication Authorization, WS Authentication Authorization, Community Authorization Service, Credential Management, Delegation Service • Data Management: GridFTP, Reliable File Transfer, Replica Location Service, OGSA-DAI [Tech Preview] • Execution Management: Grid Resource Allocation Mgmt (Pre-WS GRAM), Grid Resource Allocation Mgmt (WS GRAM), Community Scheduler Framework [contribution] • Information Services: Monitoring & Discovery System (MDS2), Monitoring & Discovery System (MDS4) • Common Runtime: C Common Libraries, XIO, C WS Core, Java WS Core, Python WS Core [contribution]
GridFTP • Built on FTP using separation of data and control channels • Provides features for • Large data transfers • Secure transfers • Fast transfers • Reliable transfers • Third party transfers • Not a web service • The RFT (Reliable File Transfer) service provides a WS-level interface
Third party transfers [Figure: the client's protocol interpreters (PI) hold a control channel to the PI of each of two servers; the data channel runs directly between the two servers' DTPs.] PI = FTP Protocol Interpreter, DTP = FTP Data Transfer Process
Performing a third-party transfer 1. Client establishes a control channel with each server. 2. Using the control channels, client sets up transfer parameters and requests data channel creation. 3. Data channel is established between the servers. 4. Client sends transfer commands over the control channels. 5. Data transfer starts through the data channel. Either server can be the sender.
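The steps above can be sketched as the sequence of control-channel commands the client issues. This is a minimal sketch using the plain FTP commands of RFC 959 (PASV, PORT, STOR, RETR) and assuming the destination is the listening side; GridFTP adds its own extensions, which are not shown. The function names are hypothetical, and in a real session the listening address comes from the destination's reply to PASV rather than from a function argument.

```python
def port_argument(host, port):
    """Encode an IPv4 address and port as the h1,h2,h3,h4,p1,p2 form used by PORT."""
    return ",".join(host.split(".") + [str(port // 256), str(port % 256)])


def third_party_commands(listen_host, listen_port, source_path, dest_path):
    """Return (server, command) pairs the client sends on its two control channels.

    Assumption: the destination server is put in passive mode and the
    source server connects to it; the roles could equally be reversed.
    """
    address = port_argument(listen_host, listen_port)
    return [
        ("destination", "PASV"),               # step 2: ask destination to listen
        ("source", f"PORT {address}"),         # step 2: tell source where to connect
        ("destination", f"STOR {dest_path}"),  # step 4: destination will store
        ("source", f"RETR {source_path}"),     # step 4: source will send
    ]


if __name__ == "__main__":
    for server, command in third_party_commands(
            "192.0.2.10", 5001, "/data/input.dat", "/scratch/input.dat"):
        print(f"client -> {server}: {command}")
```

The data itself never passes through the client; only these short commands do.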
Parallel transfers and striping • Using multiple (virtual) connections for transfer • Same external network • Speed improvement possible, but limited by network card • Striping • a version of parallel transfers that can use separate hardware interfaces • Implemented in GT 4.
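The idea behind parallel transfers can be illustrated by dividing a file into contiguous byte ranges, one per connection. This is only a sketch of the partitioning idea; the slides do not say how GridFTP servers actually divide the data, and `split_ranges` is a hypothetical helper.

```python
def split_ranges(total_bytes, n_streams):
    """Split [0, total_bytes) into up to n_streams contiguous (start, end) ranges.

    Ranges are half-open and differ in length by at most one byte.
    """
    if n_streams < 1:
        raise ValueError("n_streams must be at least 1")
    base, extra = divmod(total_bytes, n_streams)
    ranges = []
    start = 0
    for i in range(n_streams):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more streams than bytes: leave the surplus streams idle
        ranges.append((start, start + length))
        start += length
    return ranges


if __name__ == "__main__":
    for start, end in split_ranges(1_000_000, 4):
        print(f"stream carries bytes {start}..{end - 1}")
```

With striping, each range could be handled by a different network interface or host instead of a different connection on the same interface.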
GridFTP and RFT [Figure: a WS client calls the RFT service (Java), which uses the client API (Java) to open a control channel to each of two GridFTP servers (XIO based, C); the data channel runs directly between the two GridFTP servers. From Gridwise.]
GT 4 Replica Location Service • Identify location of files via logical to physical name map • Distributed indexing of names, fault tolerant update protocols [Figure: two index nodes. Source: I. Foster]
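The logical-to-physical name map can be pictured as a small in-memory catalog. This is a toy sketch, not the RLS API: the class, its methods, and the example names are all hypothetical, and a real service distributes the index across sites.

```python
class ReplicaCatalog:
    """Toy map from a logical file name to the physical locations of its replicas."""

    def __init__(self):
        self._replicas = {}

    def register(self, logical_name, physical_location):
        """Record that a replica of logical_name exists at physical_location."""
        self._replicas.setdefault(logical_name, set()).add(physical_location)

    def unregister(self, logical_name, physical_location):
        """Forget one replica; unknown names or locations are ignored."""
        locations = self._replicas.get(logical_name)
        if locations is not None:
            locations.discard(physical_location)
            if not locations:
                del self._replicas[logical_name]

    def lookup(self, logical_name):
        """Return the known physical locations, sorted for stable output."""
        return sorted(self._replicas.get(logical_name, ()))


if __name__ == "__main__":
    catalog = ReplicaCatalog()
    catalog.register("run42/events.dat", "gsiftp://site-a.example.org/data/events.dat")
    catalog.register("run42/events.dat", "gsiftp://site-b.example.org/store/events.dat")
    print(catalog.lookup("run42/events.dat"))
```

An application asks for the logical name and then picks whichever replica is closest or least loaded.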
Monitoring and Discovery • WSRF provides common mechanisms for monitoring and discovering a service. • Every GT 4 service is discoverable
2. Grid service wrapper approach • Providing a wrapper to make it possible to access an application as a grid service [Figure: Request → Grid service → Application] • One of our guest speakers (Joel Hollingsworth) will discuss this in more detail
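The wrapper idea can be sketched as a function that turns a request into a run of an unmodified command-line program. Everything here is an assumption for illustration: the request format, the function names, and the "legacy application" (a one-line Python script that sums numbers read from standard input). A real grid service would sit behind a service container rather than a plain function call.

```python
import subprocess
import sys

# Hypothetical legacy application: reads numbers on stdin, prints their sum.
LEGACY_APP = [
    sys.executable, "-c",
    "import sys; print(sum(float(x) for x in sys.stdin.read().split()))",
]


def run_application(command, stdin_text):
    """Run the unmodified application and capture its exit code and output."""
    completed = subprocess.run(command, input=stdin_text,
                               capture_output=True, text=True, check=False)
    return completed.returncode, completed.stdout, completed.stderr


def handle_request(request):
    """Wrapper: translate a request dict into application input and back."""
    stdin_text = " ".join(str(n) for n in request["numbers"])
    code, out, err = run_application(LEGACY_APP, stdin_text)
    if code != 0:
        return {"status": "error", "detail": err.strip()}
    return {"status": "ok", "result": out.strip()}


if __name__ == "__main__":
    print(handle_request({"numbers": [1, 2, 3.5]}))
```

The application itself is untouched; only the wrapper knows about requests and responses.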
3. Higher-level toolkits • Objective is to provide a suite of APIs that are system independent and that hide the underlying grid structure, even the fact that it is using Globus or any other lower-level grid middleware. • Examples: Grid Application Toolkit (GAT)
Grid Application Toolkit (GAT) • APIs for developing and executing portable grid applications that are independent of the underlying grid infrastructure and available services • GAT APIs used by application to access grid services • Essentially wrapper code that hides the Globus API.
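The way such a toolkit hides the middleware can be sketched with a small adaptor pattern. This is not the GAT API: every class and method name below is hypothetical, and the policy of trying adaptors in order until one succeeds is an assumption made for the example.

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod


class FileCopyAdaptor(ABC):
    """One way of copying a file; each middleware would get its own adaptor."""

    @abstractmethod
    def copy(self, source, destination):
        ...


class UnavailableAdaptor(FileCopyAdaptor):
    """Stands in for middleware that is not installed on this system."""

    def copy(self, source, destination):
        raise RuntimeError("middleware not available")


class LocalCopyAdaptor(FileCopyAdaptor):
    """Fallback that copies within the local file system."""

    def copy(self, source, destination):
        shutil.copyfile(source, destination)


class Toolkit:
    """The application calls copy() and never sees which adaptor did the work."""

    def __init__(self, adaptors):
        self._adaptors = list(adaptors)

    def copy(self, source, destination):
        errors = []
        for adaptor in self._adaptors:
            try:
                adaptor.copy(source, destination)
                return type(adaptor).__name__
            except (RuntimeError, OSError) as exc:
                errors.append(f"{type(adaptor).__name__}: {exc}")
        raise RuntimeError("no adaptor succeeded: " + "; ".join(errors))


if __name__ == "__main__":
    toolkit = Toolkit([UnavailableAdaptor(), LocalCopyAdaptor()])
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.dat")
        with open(src, "w") as handle:
            handle.write("payload")
        used = toolkit.copy(src, os.path.join(tmp, "output.dat"))
        print("copied using", used)
```

The application code stays the same whether the copy is carried out by grid middleware or by a local fallback.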
Deploying legacy code • For the most part, people want to re-use their existing high performance code. • Several projects to make this easier. Example GriddLeS: Grid Enabling Legacy Software http://www.csse.monash.edu.au/~davida/griddles/
Data Grids Data integration • Data integration is the capability to link different datasets together, thereby enabling users to interact with them as if they were a single, unified and homogeneous resource.
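A unified view over differently structured datasets can be sketched as follows. This is a toy illustration: the two datasets, their field names, and the `unified_view` helper are all made up, and real data-grid middleware works against databases rather than in-memory lists.

```python
def unified_view(sources):
    """Yield records from several sources renamed to one common schema.

    sources is a list of (records, field_map) pairs, where field_map maps
    each common field name to the field name used by that source.
    """
    for records, field_map in sources:
        for record in records:
            yield {common: record[local] for common, local in field_map.items()}


if __name__ == "__main__":
    # Two hypothetical datasets describing the same kind of thing differently.
    site_a = [{"sample": "s1", "temp_c": 21.5}]
    site_b = [{"id": "s2", "temperature": 19.0}]
    sources = [
        (site_a, {"sample_id": "sample", "temperature_c": "temp_c"}),
        (site_b, {"sample_id": "id", "temperature_c": "temperature"}),
    ]
    for row in unified_view(sources):
        print(row)
```

The user queries one schema and does not need to know which source a record came from.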
OGSA-DAI Project • Open Grid Services Architecture Data Access and Integration • Aim of the OGSA-DAI project is to develop middleware to assist with access and integration of data from separate sources via the grid. http://www.ogsadai.org.uk/
Grid-enabling a data resource using OGSA • “ … Placing it behind wrapper middleware for the Grid, e.g., OGSA-DAI. … • Once a data resource is Grid-enabled, its availability can be easily advertised in registries where advanced Grid middleware will know to find them and learn of their specific usage conditions for both access and update, as the case may be. ” http://www.ncess.ac.uk/learning/tutorials/datagrids/grid_en/why_grid_en_important/what_grid_en_involves/
OGSA-DAI Architecture
What Next • Mini-project: Will be discussed Thursday March 29th, 2007. PLEASE BE SURE TO ATTEND THIS CLASS • Actually, the mini-project will not start until April, after the MPI assignment, but next week we have a guest presentation.