10 likes | 74 Views
Making relational data available on the Grid: A survey of methods from CGI to OGSA-DAI Michael Grobe Indiana University 1 Introduction
E N D
Making relational data available on the Grid: A survey of methods from CGI to OGSA-DAI Michael Grobe Indiana University 1 Introduction This poster presents several alternative approaches to making (read-only) relational data sources available over a computational Grid as an aid for information providers who may wish to make their data available. The various alternatives help define what it means to “put data on the Grid.” 2 JDBC Java DataBase Connectivity (JDBC) is used within each approach presented here, but there exist several alternative ways in which to use it. The major question are: From where should the JDBC calls be made, and what user interfaces can be used? The target database for demonstration examples in this poster is the Centralized Life Sciences Data (CLSD) service at Indiana University [5]. CLSD presents a collection of life science data converted to relational form and/or federated into a single relational database managed by an IBM DB2 database management system (DBMS). Figure 0 shows shows a Java code fragment that will interact with CLSD via JDBC. 3 CGI and Java Servlet Engines There exists several platforms that can be used to pass dynamic queries to database systems, two of the most important being the Common Gateway Interface (CGI) and Java Servlet “engines”, such as Apache Tomcat. Both of these interfaces rely upon HTTP POST. Here is an example POST request to an Apache HTTP server running on a computer housing a file named “/cgi-bin/getdata.pl” in its filesystem: POST /cgi-bin/getdata.pl HTTP/1.0 Host: my.host.edu:8080 Accept:text/html,text/plain,image/* Accept-Encoding:gzip,compress Accept-Language:en Pragma:no-cache Cache-Control:no-cache User-Agent:Lynx/2.8.5dev.7 Content-type:application/x-www-form-urlencoded Content-length:83 ..blank line.. Query=select+tabschema,tabname+from+syscat.tables &account=testaccount&password=test This request defines three variables to be passed to the script within /cgi-bin/getdata.pl: an SQL select command, an account name, and an account password. When the server receives this request, it will start the script, and arrange for script output to be passed back to the browser. 4 Web Services, JAX-RPC and SOAP There is a separate set of technologies for performing so-called "remote procedure calls," which allow a program running on one computer to call functions running on another computer. There have been several approaches to remote procedure calls in the Web world. In general they allow the definition of remote procedure calls as "Web Services". (This is NOT the same as accessing an arbitrary Web page or using a CGI or Servlet engine script to build pages on-the-fly.) The most important such approach for purposes of this discussion is the Java API over XML for Remote Procedure Calls (JAX-RPC). Here, the Extensible Markup Langage (XML) is used to exchange data between a program and its (remote) functions. Figure 1 shows an outline for a Web Service called “CLSDservice” that “exposes” a method named “queryCLSD”. queryCLSD sends an SQL command to CLSD via DB2 and returns all or part of the resulting table. queryCLSD requires 6 parameters: the SQL command, the number of the first row of the resulting table to return, the total number of rows to return, an account name, that account’s password, and the format in which to return the result. Here is a Perl command-line client program that accesses this CLSDservice: #!perl -w use SOAP::Lite; # Set up the call to CLSD using SOAP. $host = “discover.uits.indiana.edu”; $service = SOAP::Lite -> service( “http://$host:8421/axis/CLSDservice.jws?wsdl” ); # Make the call to CLSD. $result = $service->queryCLSD( “select tabschema,tabname from syscat.tables”, 1, 5, "DB2account", "password“,“CSV“ ); print $result; This program does NOT need an interface to the database, since it doesn’t interact with the database directly, but rather through the “queryCLSD” method provided by the CLSDservice. This program DOES, however, require the SOAP::Lite Perl module to orchestrate interaction with CLSDservice, which then uses JDBC to interact with DB2. SOAP::Lite will embed the function call in an HTTP request within a SOAP “envelope”, transmit the request, receive and decode the response, which will also be embedded in a SOAP envelope, and return the result to the calling routine. Here is an example JAX-RPC POST generated by the above call to queryCLSD (highly edited for readability): POST http://localhost:8421/axis/CLSDservice.jws Accept: text/xml Accept: multipart/* Content-Length: 1009 Content-Type: text/xml; charset=utf-8 SOAPAction: "" ..blank line.. <?xml version="1.0" encoding="UTF-8"?> <SOAP-ENV:Envelope SOAP-ENV:encodingStyle= "http://schemas.xmlsoap.org/soap/encoding/"> <SOAP-ENV:Body> <namesp1:queryCLSD"> <query xsi:type="ns:string"> select tabschema,tabname from syscat.tables </query> <startingRowToPrint xsi:type="ns:string"> 1 </startingRowToPrint> <maxRows xsi:type="ns:string"> 5 </maxRows> <account xsi:type="ns:string"> DB2account </account> <password xsi:type="ns:string"> password </password> <format xsi:type=”ns:string”> CSV </format> </namesp1:queryCLSD> </SOAP-ENV:Body> </SOAP-ENV:Envelope> On the server side, the Apache Axis Web Services container [1] will arrange for parameters to be delivered to the queryCLSD method within the CLSDservice. This approach allows the creation of lightweight desktop, Servlet, Portlet, and AJAX applications that can access CLSD or other relational data resources from anywhere on the Internet. 5 WSRF The Web Services Resource Framework (WSRF) extends Web Services by defining a “Web Services Resource” [7] that provides: - access to a Web Service, typically one that can interact with a real-world computational resource, plus - a persistent property list containing information relevant to that service, and - a standardized list of methods that clients can use to manipulate those properties. For example, a WS-Resource may provide the capability to submit a batch job to a supercomputer queue, along with a property list that describes that queue, such as queue length, mean expected wait time, max number of processors available, all within the Grid Security Interface (GSI) environment, etc. WSRF is actually a collection of specifications that describe different aspects of managing Resource services and properties: Fig. 0. Using Java to access CLSD (without try/catch). // Declare JDBC drivers and connect to DB2. Class.forName( "com.ibm.db2.jcc.DB2Driver" ); Connection con = DriverManager.getConnection( "jdbc:db2://hostname.uits.iu.edu:50000/clsd2", account, password ); // Prepare a JDBC statement containing an SQL query, // submit it to DB2, and capture the returned result set. Statement statement = con.createStatement(); ResultSet resultSet = statement.executeQuery( query ); // Query result set metadata for column names and types // to return as the first row, and... ResultSetMetaData rsmd = resultSet.getMetaData(); int numcols = rsmd.getColumnCount(); String resultLine =“”; for( int colCount = 1; colCount <= numcols; colCount++ ) { String result = rsmd.getColumnLabel( colCount ); resultLine += “ “ + result; } System.out.println( resultLine ); // ...then collect and print the contents of each data row. while ( resultSet.next() ) { resultLine = “”; for( int colCount = 1; colCount <= numcols; colCount++ ) { String returned = resultSet.getString( colCount ); resultLine += “ “ + returned; } System.out.println( resultLine ); } Fig. 1. Outline of queryCLSD method in CLSDservice class. public class CLSDservice { public String queryCLSD( String query, String startingRowToPrint,String maxRows, String account, String password, String format ) { // Process input parameters. // Access CLSD as shown in Fig. 0. // Return results as single String. } // end queryCLSD } // end Class CLSDservice Fig. 2. Example client to access an OGSA-DAI resource. public class queryCLSD { public static void main(String[] args) throws Exception { // Create an instance of the data service. String handle = http://localhost/wsrf/ + “services/ogsadai/DataService"; String id = "DB2Resource"; DataService service = GenericServiceFetcher.getInstance().\ getDataService(handle, id); // Define a request composed of one activity. SQLQuery query = new SQLQuery( "select tabschema,tabname from syscat.tables"); WebRowSet rowset = new WebRowSet( query.getOutput() ); ActivityRequest request = new ActivityRequest(); request.add( query ); request.add( rowset ); // Submit the request and retrieve results. Response response = service.perform( request ); ResultSet result = rowset.getResultSet(); ResultSetMetaData rsmd = result.getMetaData(); int numCols = rsmd.getColumnCount(); // Display each column from each row. while( result.next() ) { for( int colCount = 1; colCount <= numCols; colCount++ ) { out.print( “ “ + result.getString(colCount) ); } out.println(); } } // end main } // end Class queryCLSD WS-Resource WS-ServiceGroup (WS-SG) WS-ResourceProperties (WS-RP) WS-BaseFaults (WS-BF) WS-ResourceLifetime (WS-RL) WSRF-RP, in particular, defines operations (also known as “PortTypes”) that can be provided to manipulate properties, such as: GetResourceProperty GetMultipleResourceProperties SetResourceProperties QueryResourceProperties The simplest client code for accessing CLSD will be very similar to the WebServices code. However, WSRF opens up other options, such as the ability to build data pipelines that pass EPRs identifying semi-permanent storage among applications, much as a file name is passed to a program subroutine in common programming practice. For example, one WSRF version of the CLSD service accepts an EPR as well as an SQL command, stores query results in the resource identified by the EPR, and returns control to the client. 6 OGSA and OGSA-DAI The Open Grid Services Architecture (OGSA) “…defines a set of core capabilities and behaviors that address key concerns in Grid systems.” [2] It does not, however, implement or define how to implement such core capabilities. In general, “Grid systems aim to integrate, virtualize, and manage resources and services within distributed, heterogenous, dynamic virtual organizations.” [2] Virtualization brings multiple technologies to bear to insulate users and developers from the physical locations, network addresses, administrative details, and underlying software infrastructures that enable access to higher-level services and data. OGSA Data Access and Integration (OGSA-DAI) [6] is a very flexible and powerful data access framework that can be used within an OGSA grid environment. It provides various data movement, virtualization, and manipulation services that transform the use of data into a higher-level workflow. OGSA-DAI has been implemented over WSRF as a higher-level OGSA service. It supports the construction of configurable WSRF-based services that take client requests for both access to, and transformation of, data which can be returned to clients in a variety of ways. An OGSA-DAI service can replace the CLSD Web Service and the WSRF-based web services just described; it can accept SQL queries, relay them to DB2, and return DB2 responses to the user. Users do not have to write their own Web Services to access their data sources, but can simply install the OGSA-DAI service over a WSRF container and create clients for that OGSA-DAI service. The OGSA-DAI Client Toolkit [6] provides an interface to many OGSA-DAI services. The Java client shown in Figure 2 uses the Client Toolkit to send a hard-coded query to CLSD (here known as the “DB2Resource”) and retrieve an OGSA-DAI WebRowSet. It then creates a JDBC ResultSet object from the WebRowSet, and returns data to the user via JDBC calls to retrieve individual fields within each returned row. This script displays only a small part of the functionality provided by OGSA-DAI. In addition, an OGSA-DAI service can be configured to: - operate on XML or text data sources, as well as relational data sources, - perform a series of operations (also known as “activities”) as part of a single request, - deliver results to a third party (via FTP, GridFTP, SMTP, etc.) or to another data service, - deliver results asynchronously, which can be very useful for long-running requests, and - utilize authentication methods supported by WSRF to provide grid-based security. Also, exposing a database via OGSA-DAI makes it available for OGSA Distributed Query Processing (OGSA-DQP), so that its use may be further virtualized within the DQP model. (In some cases, however, OGSA-DAI and DQP may introduce performance penalties.) Acknowledgments Thanks to Lisa Childers, et al. for a great tutorial on WSRF at TeraGrid 2006. Thanks to Andy Arenson and Scott McCaulay for providing the opportunity to prepare this poster. References [1] Apache Foundation, “Axis User’s Guide”. [2] Foster, Ian, et al. “The Open Grid Systems Architecture, Version 1.5”, 2006. [3] Globus Alliance, “How to Build a Service Using GT4”. [4] Globus Alliance, The Globus Grid Toolkit [5] Indiana University, The Centralized Life Sciences Data (CLSD) Service. [6] Open Middleware Infrastructure Institute, “The OGSA-DAI Project” [7] Sotomayer, Boria and Lisa Childers, Globus Toolkit 4: Programming Java Services, Morgan-Kaufmann Publishers, San Francisco, 2006.