Follow-on Proposal for Data Handles
• I would like to make a proposal regarding data handles in response to the material presented by Laurent Philippe at GGF10. While the main concept is the same, I propose here some semantic clarifications that I think will be simpler, and may also map onto WSRF in a more direct manner. The main idea is to replace the “persistence mode” concept with a “bind operation”.
• A data handle is essentially a reference to data that may reside anywhere. If we allow handles and data to be created separately, then the act of binding a handle to data is one of the basic operations that can be performed on a handle. While we may typically think of binding a handle to data, it is implicit that we are also binding the handle to a specific machine that has the responsibility of maintaining the allocated storage for that data. This enables the virtualization of data, since it can be read or written without knowing or caring where it is coming from or going to. The bind operation, however, allows the “location” of the data to be managed, or at least the host or IP address responsible for the data. A data handle could even be bound to an SRB or MCS client that reads and writes data as those systems do.
• In the examples that follow, a service that is given an unbound data handle for output has the default semantics of binding the handle to itself, i.e., accepting the responsibility to maintain the data and its storage.
• All of the typical storage management issues recur. Data handles could be “dangling pointers” that point to nothing, and data items could be uncollected garbage pointed to by nobody.
• Additional semantics are possible. Do we want to allow something like single-assignment semantics, where a consumer can tell not only if a handle is bound but also if the associated data is actually present? Currently, service completion can be tested through the function handle, which implies that the output data is available. Do we want to allow a service to self-schedule when its input data becomes available? While such capabilities may be appealing, we might just want to Keep It Simple and avoid any complications beyond the core semantics that are necessary.
• Craig Lee, lee@aero.org, March 24, 2004
Operations on Data Handles (general, operational semantics without using exact function signatures)
• create(data_handle_t *dh)
  • Create a new, unbound data handle.
• bind()
  • Bind a DH to a specific data item or a machine. This allows the possibility of binding a DH to a third-party machine. (see next slide)
• data_t read(data_handle_t dh)
  • Read (copy) the data referenced by the DH from whatever machine is maintaining the data. Reading on an unbound DH is an error.
• write(data_t data, data_handle_t dh)
  • Write data to the machine, referenced by the DH, that is maintaining storage for it. Writing on an unbound DH could have the default semantics of binding to the local host. This storage does not necessarily have to be pre-allocated, nor does the length have to be known in advance.
• data_desc_t inspect(data_handle_t dh)
  • Allow the user to determine if the DH is bound, what machine is referenced, the length of the data, and possibly its structure. Could be returned as XML.
• Bool free_data(data_handle_t dh)
  • Free the data (storage) referenced by the DH.
• Bool free_handle(data_handle_t dh)
  • Free just the DH.
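The operations above can be sketched as a toy in-memory model. This is only an illustration of the proposed semantics, not GridRPC code: the `Host`, `DataHandle`, and `HandleError` names, the `local_host` argument to `write`, and the dictionary returned by `inspect` are all assumptions made for the example.

```python
# Toy in-memory model of the data-handle operations.
# Every name here is an illustrative assumption, not part of any GridRPC API.

class HandleError(Exception):
    """Raised when an operation is invalid for the handle's current state."""


class Host:
    """A machine that maintains storage for data items."""
    def __init__(self, name):
        self.name = name
        self.store = {}              # handle id -> bytes


class DataHandle:
    _next_id = 0

    def __init__(self):              # create(): a new, unbound handle
        self.id = DataHandle._next_id
        DataHandle._next_id += 1
        self.host = None             # None means "unbound"

    def bind(self, host, data=None):
        """Bind to a host (possibly a third party), optionally with data."""
        self.host = host
        if data is not None:
            host.store[self.id] = data

    def read(self):
        if self.host is None:
            raise HandleError("read on an unbound handle")
        if self.id not in self.host.store:
            raise HandleError("handle is bound but no data is present")
        return self.host.store[self.id]   # bytes are immutable, so this acts as a copy

    def write(self, data, local_host):
        if self.host is None:        # default semantics: bind to the local host
            self.host = local_host
        self.host.store[self.id] = data

    def inspect(self):
        bound = self.host is not None
        data = self.host.store.get(self.id) if bound else None
        return {
            "bound": bound,
            "host": self.host.name if bound else None,
            "length": len(data) if data is not None else None,
        }

    def free_data(self):
        """Free the storage; the handle itself stays bound (now dangling)."""
        if self.host is None:
            return False
        return self.host.store.pop(self.id, None) is not None

    def free_handle(self):
        """Free just the handle; any data left behind becomes garbage."""
        self.host = None
        return True


client, server = Host("client"), Host("svc_a")
dh = DataHandle()
dh.bind(server, b"abc")              # binding to a third-party machine
info = dh.inspect()

local = DataHandle()
local.write(b"xy", client)           # unbound write binds to the local host
```

Keeping `free_data` and `free_handle` separate makes the two storage-management hazards visible: freeing the data leaves a dangling handle, and freeing the handle leaves uncollected data.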
Follow-on Proposal of Craig Lee
• The bind operation seems to be an interesting approach to replace the “persistency” mode. It avoids managing a “persistency mode” on data for applications that do not need it. Moreover, data management using the bind operation is more implicit and thus more transparent, even if we still need to manage the data.
• However, the proposal should be completed with the possibility of binding input data to the target server or repository, as explained in the discussion on the GridRPC mailing list.
• Some of the following slides have been modified to present this idea and two solutions. The first solution adds a third parameter to the bind operation. The second solution adds a new method to set the transfer mode (by value or by reference). Note that in the first solution the destination server for the data is set explicitly, whereas in the second the server is implicit.
• PS: we take into account the recommendation made in the document titled “A gridRPC model and API for end-users applications” by H. Nakada, S. Matsuoka, K. Seymour, J. Dongarra, C. Lee, H. Casanova, Global Grid Forum, Dec. 2003.
Operations on Data Handles: Bind
• To manage IN persistent data, 2 solutions:
• Solution 1: add information to the bind operation:
  bind(data_handle_t dh, data_loc_t loc, data_loc_t site)
  • data_loc_t loc: data location (local machine or storage server)
  • data_loc_t site: location of the machine where data will be stored
  • if (site == NULL) data will be stored on the last computational server (client transparent)
  • if (site == loc) data copied to site (client or storage server)
  • if (site != loc) data moved from loc to site
• Solution 2: add a new method to set the call mode of a data item:
  bind(data_handle_t dh, data_loc_t site)
  set_call_mode(data_handle_t dh, mode_t mode)
  • if (mode == reference) data not stored (=> volatile data)
  • if (mode == value) data stored on site (=> persistent data)
  • Default mode for data is reference, to avoid storing useless data in the platform
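The three cases of the extended bind in solution 1 can be written as a small decision function. This is a minimal sketch: the function name `resolve_bind`, the action labels, and the `last_server` parameter are hypothetical, and locations are modeled as plain strings with `None` standing in for NULL.

```python
def resolve_bind(loc, site, last_server=None):
    """Return (action, destination) for bind(dh, loc, site), solution 1.

    loc         -- where the data currently is
    site        -- where the data should be stored, or None (NULL)
    last_server -- hypothetical: the last computational server, if known
    """
    if site is None:
        # Storage site chosen transparently for the client.
        return ("store_on_last_server", last_server)
    if site == loc:
        return ("copy", site)
    return ("move", site)


transparent = resolve_bind("client", None, last_server="svc_a")
same_site = resolve_bind("client", "client")
other_site = resolve_bind("client", "storage_1")
```

The order of the checks matters: the NULL case has to be tested first, otherwise a NULL site would be treated as "different from loc" and fall through to the move case.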
Extend bind operation, solution 1
* = server if it is known, NULL if not
set_mode operation, solution 2 (mode by parameter direction: IN, IN-OUT, OUT)
PERSISTENT, client side
• IN: create input data; create input_DH; bind(input_dh, input_data); set_call_mode(input_dh, value)
• IN-OUT: create io data; create io_DH; bind(io_dh, io_data); set_call_mode(io_dh, value); read(io_dh) if we need the data
• OUT: create output_DH; set_call_mode(output_dh, value); read(output_dh) if we need the data
PERSISTENT, server side
• create output data; bind(output_dh, server)
VOLATILE, client side
• IN: create input data; create input_DH; bind(input_dh, input_data) (default mode is reference)
• IN-OUT: create io data; create io_DH; bind(io_dh, io_data) (default mode is reference)
• OUT: create output_DH (default mode is reference)
VOLATILE, server side
• (no operations)
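The effect of the call mode in solution 2 can be illustrated with a short sketch. The `CallMode` enum, the `DataHandle` class, and the `retained_after_call` helper are assumptions made for the example; only `set_call_mode` and the two modes (reference, value) come from the slides.

```python
from enum import Enum


class CallMode(Enum):
    REFERENCE = "reference"   # volatile: data not stored after the call
    VALUE = "value"           # persistent: data stored on the site


class DataHandle:
    def __init__(self, name, data=None):
        self.name = name
        self.data = data
        self.mode = CallMode.REFERENCE    # default mode is reference


def set_call_mode(dh, mode):
    dh.mode = mode


def retained_after_call(handles):
    """Names of the handles whose data stays stored once the call returns."""
    return [dh.name for dh in handles if dh.mode is CallMode.VALUE]


input_dh = DataHandle("input", b"matrix")
io_dh = DataHandle("io", b"vector")
set_call_mode(input_dh, CallMode.VALUE)   # only the input is made persistent
kept = retained_after_call([input_dh, io_dh])
```

Because reference is the default, a client that never calls `set_call_mode` leaves nothing behind on the platform, which matches the stated goal of not storing useless data.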
Simple RPC – without persistency (participants: Client, Svc A, Svc B)
• Client: create input data; create input_DH; bind input_DH to input data
• Client: create output_DH; bind output_DH to client (not in solution 2)
• Client: call( input_DH, output_DH )
• Svc A: read input_DH (data sent); execute service; write output data on output_DH
• Note: output and input data are not subsequently available on this server
• Client: delete input data; delete input_DH; delete output data; delete output_DH
Simple RPC – IN data persistent, Sol1 (participants: Client, Svc A, Svc B)
• Client: create input data; create input_DH; bind input_DH to input data, SvcA
• Client: create output_DH; bind output_DH to client
• Client: call( input_DH, output_DH )
• Svc A: read input_DH (data sent); execute service; write output data on output_DH
• Note: output data are not subsequently available on this server; input data is available
• Client: delete input data; delete input_DH; delete output data; delete output_DH
Simple RPC – IN data persistent, Sol2 (participants: Client, Svc A, Svc B)
• Client: create input data; create input_DH; bind input_DH to input data; set_call_mode(dh, value)
• Client: create output_DH; bind output_DH to client
• Client: call( input_DH, output_DH )
• Svc A: read input_DH (data sent); execute service; write output data on output_DH
• Note: output data are not subsequently available on this server; input data is available
• Client: delete input data; delete input_DH; delete output data; delete output_DH
Simple RPC w/ Unbound Data Handle (participants: Client, Svc A, Svc B)
• Client: create input data; create input_DH; bind input_DH to input data; create output_DH
• Client: call( input_DH, output_DH ). Note: output_DH is unbound
• Svc A: read input_DH (data sent); execute service; create output data; bind output data to output_DH, server; return bound output_DH
• Note: output data still available on this server
• Client: read output_DH (data sent); free data referenced by output_DH
• Note: output data no longer available on this server
• Client: free output_DH
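The unbound-handle exchange can be traced with a small simulation. It is a sketch under stated assumptions: hosts are plain strings, storage is a dictionary per host, and `service_a` is a stand-in service that upper-cases its input; none of these names come from GridRPC.

```python
# Hypothetical trace of an RPC whose output handle arrives unbound.

class Handle:
    def __init__(self, name):
        self.name = name
        self.host = None          # unbound until someone binds it


stores = {"client": {}, "svc_a": {}}   # per-host storage: handle name -> data


def bind(dh, host, data=None):
    dh.host = host
    if data is not None:
        stores[host][dh.name] = data


def read(dh):
    if dh.host is None:
        raise RuntimeError("read on an unbound handle")
    return stores[dh.host][dh.name]


def free_data(dh):
    if dh.host is None:
        return False
    return stores[dh.host].pop(dh.name, None) is not None


def service_a(input_dh, output_dh):
    """Stand-in service running on svc_a: upper-cases its input."""
    result = read(input_dh).upper()        # read input_DH, data sent
    if output_dh.host is None:
        bind(output_dh, "svc_a", result)   # default: bind output to this server
    else:
        stores[output_dh.host][output_dh.name] = result
    return output_dh


input_dh, output_dh = Handle("in"), Handle("out")
bind(input_dh, "client", "payload")
returned = service_a(input_dh, output_dh)  # output_dh is unbound at call time
where_bound = returned.host                # the server took responsibility
value = read(returned)                     # client pulls the output
freed = free_data(returned)                # output no longer on the server
```

After `free_data`, the handle is still bound to `svc_a` but points at nothing, so the final "free output_DH" step on the client is what removes the dangling reference.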
Two Successive RPCs on the Same Server (participants: Client, Svc A, Svc B)
• Client: create input data; create input_DH; bind input_DH to input data; create output_DH
• Client: call( input_DH, output_DH ). Note: output_DH is unbound
• Svc A: read input_DH (data sent); execute service; create output data; bind output data to output_DH; return bound output_DH
• Note: output data still available on this server
• Client: create output2_DH; bind output2_DH to client
• Client: call( output_DH, output2_DH )
• Svc A: execute service; write data on output2_DH
Two Successive RPCs on Different Servers (participants: Client, Svc A, Svc B)
• Client: create input data; create input_DH; bind input_DH to input data; create output_DH
• Client: call( input_DH, output_DH ). Note: output_DH is unbound
• Svc A: read input_DH (data sent); execute service; create output data; bind output data to output_DH; return bound output_DH
• Note: output data still available on this server
• Client: create output2_DH; bind output2_DH to client
• Client: call( output_DH, output2_DH )
• Svc B: read output_DH (data sent); execute service; write data on output2_DH
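The data movement in the two-server case can be made explicit by logging every transfer that crosses hosts. This is a minimal sketch with assumed names (`Handle`, `read`, `write`, `call`, and the `transfers` log); the two services are stand-ins that double a list and sum it.

```python
transfers = []   # (source host, destination host, handle name)


class Handle:
    def __init__(self, name, host=None, data=None):
        self.name, self.host, self.data = name, host, data


def read(dh, reader):
    """Copy the data to `reader`, logging a transfer if it crosses hosts."""
    if dh.host is None:
        raise RuntimeError("read on an unbound handle")
    if dh.host != reader:
        transfers.append((dh.host, reader, dh.name))
    return dh.data


def write(dh, data, writer):
    if dh.host is None:
        dh.host = writer          # unbound output: bind to the writing server
    if dh.host != writer:
        transfers.append((writer, dh.host, dh.name))
    dh.data = data


def call(server, func, in_dh, out_dh):
    """Run `func` on `server`, reading the input and writing the output."""
    write(out_dh, func(read(in_dh, server)), server)


input_dh = Handle("input", host="client", data=[1, 2, 3])
output_dh = Handle("output")                     # unbound
output2_dh = Handle("output2", host="client")    # bound to the client

call("svc_a", lambda xs: [x * 2 for x in xs], input_dh, output_dh)
call("svc_b", sum, output_dh, output2_dh)
```

In this model the intermediate result travels directly from `svc_a` to `svc_b` and never passes through the client. Running both calls on `svc_a` instead would remove that middle transfer entirely, which is the same-server case.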