The HDF Group Q5 Demo 5.6 HDF5 Transaction API 5.7 Full HDF5 Dynamic Data Structure
Q5 Highlights (I) • New features added since Q4 (other than 5.6 & 5.7): • Support for variable length datatypes. • Some deferred items from Q4 (mostly metadata get routines). • Data integrity for metadata, although this is not helpful at the moment without a real storage backend. • Data integrity checks can be enabled or disabled through a property. • Can be done for metadata and raw data. • Still working with skeletal IOD on a local Linux box: • EMC has their first code demo this quarter. • Many tests (especially reads) are "faked" for now. • This is the main limitation for adding an automated regression testing framework and benchmarks. • Accounts for the new Cray system at LANL are pending.
Q5 Highlights (II) • Testing: • More tests are added with each new feature. • Tests verify correctness of asynchronous execution, AXE dependencies, function shipping, and HDF5 to IOD translation (to a simple extent). • Try to cover all routines added. • Automated with a simple script to run tests on a local machine. • Still working on a larger automated regression testing framework (should be available in Q6). • This demo will focus on the 2 main new features added: • Transactions and Read Contexts • Dynamic Data Structure support (Map objects)
Outline • 5.6 HDF5 Transaction API: • Transactions and Versions recap from earlier technical presentation. • Diving more into semantics and usage. • Go through some pseudo code & actual code. • Run Demo. • 5.7 Full HDF5 Dynamic Data Structure: • Changes from Q4 • New Map objects and routines • Code example and Run Demo.
FastForward Transactions (I) • A transaction consists of a set of updates to a container. • container ≈ file • Updates are added to a transaction, not made directly to a container. • Updates include additions, deletions, and modifications. • Any number and size of updates may be included in a single transaction. • Tiny transactions may have high overhead. • Large transactions will amortize the overhead. • Multiple processes can add updates to a single transaction.
FastForward Transactions (II) • A transaction is finished when no more updates will be added to the transaction. • Transactions can finish in any order. • The updates for a finished transaction are not visible in the container. • A finished transaction must be committed in order for its updates to become visible in the container. • Transactions are committed in strict numerical order. • When a transaction is committed, all updates in the transaction are applied atomically to the container and become visible. • If all updates cannot be applied, none are applied, and the transaction is discarded.
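The ordering rule above (finish in any order, commit in strict numerical order) can be sketched with a small stand-alone model. This is a hedged illustration only: `toy_container`, `toy_init`, and `toy_finish` are hypothetical names, not part of the HDF5 FastForward API or IOD. The sketch assumes that a finished transaction is committed as soon as every lower-numbered transaction has been committed, and that transaction 0 is already committed when the container is created. Aborted transactions are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model, not the FastForward/IOD implementation. */
#define MAX_TRANS 64

typedef struct {
    bool     finished[MAX_TRANS]; /* finished[n]: transaction n has finished */
    uint64_t last_committed;      /* highest committed transaction number    */
} toy_container;

/* Assumption: transaction 0 is consumed and committed at container creation. */
void toy_init(toy_container *c)
{
    for (size_t i = 0; i < MAX_TRANS; i++)
        c->finished[i] = false;
    c->finished[0] = true;
    c->last_committed = 0;
}

/* Mark transaction n as finished, then commit every finished transaction
 * that directly follows the last committed one. Returns the resulting
 * container version (the number of the last committed transaction). */
uint64_t toy_finish(toy_container *c, uint64_t n)
{
    if (n < MAX_TRANS)
        c->finished[n] = true;
    while (c->last_committed + 1 < MAX_TRANS &&
           c->finished[c->last_committed + 1])
        c->last_committed++;
    return c->last_committed;
}
```

In this model, finishing transaction 3 first leaves the container at version 0. Finishing transaction 1 moves it to version 1, and finishing transaction 2 then lets both 2 and 3 commit, so the container reaches version 3.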
FastForward Container Versions • A version identifies a container at a given state. • The version number equals the number of the committed transaction that transitioned the container into the state. • A read context can be created for a container version. • The read context allows access to the contents of the container version. • A given container version is guaranteed to remain readable until all associated read contexts are closed.
Acquiring a Container Version • Acquiring a version is important: it keeps IOD from flattening that version, so it remains available to read from. • Only one process is required to acquire, but any number can. IOD will reference count the acquired context. • If one process acquires a version, it can communicate the acquired version to other processes so that they do not need to acquire it themselves. • The same number of release calls must be issued as acquire calls. • [The releasing processes may differ from the ones that acquired.]
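The reference counting described above can be pictured with a minimal stand-alone sketch. The names `toy_version`, `toy_acquire`, `toy_release`, and `toy_can_flatten` are hypothetical and only model the rule that a version stays protected until every acquire has been matched by a release. How IOD actually tracks this is not shown in the slides.

```c
#include <stdint.h>

/* Hypothetical toy model of a reference-counted container version. */
typedef struct {
    uint64_t version; /* container version being tracked          */
    unsigned refs;    /* acquire calls not yet matched by release */
} toy_version;

/* Any process may acquire; each call adds one reference. */
void toy_acquire(toy_version *v)
{
    v->refs++;
}

/* Returns 0 on success, -1 if there is no outstanding acquire to match. */
int toy_release(toy_version *v)
{
    if (v->refs == 0)
        return -1;
    v->refs--;
    return 0;
}

/* In this model a version may be flattened only when nothing holds it. */
int toy_can_flatten(const toy_version *v)
{
    return v->refs == 0;
}
```

With two acquires on version 15, the version becomes eligible for flattening only after the second release. A third release is reported as an error because it has no matching acquire.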
Transactions and Read Contexts in Q5 • Two new sets of routines added to the HDF5 FastForward API: • H5TR for transactions • H5RC for Read Contexts • All new APIs are asynchronous. • All HDF5 read/get routines take Read Context IDs. • All HDF5 write/update routines take Transaction IDs. • Note that transactions specify a read context, so writes/updates happen within one. • H5Fcreate_ff() will always use up transaction 0, i.e. the application starts using transaction 1. • H5Fopen_ff() will return an acquired read context with the latest readable version of the container.
Container Version Acquire/Release semantics

/* Any Leader Process */
version = 15;
/* Acquire container version 15. */
rid = H5RCacquire(file_id, &version, H5P_DEFAULT, event_q);
/* Wait for the acquire to complete. This is not necessary, but the user must live with the consequences: an acquire on a flattened version will fail, and so will all subsequent reads and the release call on rid. */
/* If the leader has delegates that it wants to tell that it has acquired the container version, it has to wait for the acquire to complete before informing them. */
H5EQwait(event_q, &num_requests, &status);
MPI_Ibcast()/MPI_Isend() …
/* Read from container */
…
/* If other processes were informed to use this version, must wait to hear from them before releasing. */
MPI_Barrier()/MPI_Bcast()/MPI_Recv() …
/* Release container version 15. This is asynchronous. */
H5RCrelease(rid, event_q);
/* Close RC ID. This is a local operation that just frees resources. */
H5RCclose(rid);
…
/* Wait on all events; everything was asynchronous thus far. */
H5EQwait(event_q, &num_requests, &status);

/* Any Delegate Process */
MPI_Ibcast()/MPI_Irecv() …
MPI_Wait();
/* The client received a version (x = 15); create a read context ID. This is a local immediate operation. */
rid = H5RCcreate(file_id, x);
/* Read from container */
…
/* Wait for all reads to complete. */
H5EQwait(event_q, &num_requests, &status);
/* Tell the leader I am done with my reads. */
MPI_Barrier()/MPI_Bcast()/MPI_Send() …
/* Close RC ID. This is a local operation that just frees resources. */
H5RCclose(rid);
…
Hints to H5RCacquire • The user may not need one exact version, i.e. may not want the acquire to fail if the specified version is not available. • Through the Read Context Acquire property list, the user can specify the following hints to acquire: • H5RC_EXACT: Fail if the specified version is not available (default). • H5RC_PREV: If the specified version is not available, acquire the highest version smaller than it. • H5RC_NEXT: If the specified version is not available, acquire the lowest version greater than it. • H5RC_LAST: Acquire the last readable version; this ignores the version specified in acquire.
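To make the four hints concrete, the following stand-alone sketch resolves a requested version against a sorted list of readable versions. It is an illustration under stated assumptions, not the H5RCacquire implementation: `toy_hint`, the `TOY_*` constants, and `toy_resolve` are hypothetical stand-ins for the H5RC_* hints, and the list of readable versions is assumed to be known and in ascending order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for H5RC_EXACT, H5RC_PREV, H5RC_NEXT, H5RC_LAST. */
typedef enum { TOY_EXACT, TOY_PREV, TOY_NEXT, TOY_LAST } toy_hint;

/* readable: readable container versions in ascending order (assumption).
 * Returns 0 and stores the acquired version in *out, or -1 on failure. */
int toy_resolve(const uint64_t *readable, size_t n, uint64_t want,
                toy_hint hint, uint64_t *out)
{
    int have_prev = 0, have_next = 0;
    uint64_t prev = 0, next = 0;

    if (n == 0)
        return -1;
    if (hint == TOY_LAST) {            /* ignores the requested version */
        *out = readable[n - 1];
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (readable[i] == want) {     /* exact match satisfies any hint */
            *out = want;
            return 0;
        }
        if (readable[i] < want) {      /* keeps the highest smaller version */
            prev = readable[i];
            have_prev = 1;
        } else if (!have_next) {       /* keeps the lowest greater version */
            next = readable[i];
            have_next = 1;
        }
    }
    if (hint == TOY_PREV && have_prev) { *out = prev; return 0; }
    if (hint == TOY_NEXT && have_next) { *out = next; return 0; }
    return -1;                         /* TOY_EXACT, or no neighbor exists */
}
```

With readable versions 10, 12, 15, and 20, a request for version 13 fails under the exact hint, resolves to 12 under the previous hint, to 15 under the next hint, and to 20 under the last hint.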
Other ways to Acquire a Version • When opening a container, the user can optionally ask to also acquire the last readable version of the container:
file_id = H5Fopen_ff(file_name, H5F_ACC_RDONLY, fapl_id, &rid, H5_EVENT_QUEUE_NULL);
• When finishing a transaction, the user can also optionally ask to acquire it into a read context:
H5TRfinish(tid, H5P_DEFAULT, &rid, H5_EVENT_QUEUE_NULL);
Creating a Transaction • All operations that write/update the container must be part of a started transaction. • All transactions must be created within a read context, because some updates require reading from the container. • Transactions may be started by some number x of leader processes and communicated to other delegate processes, or started by all processes. • Those are two different models of operation. • In the former case, the process that started the transaction is not required to be the one that finishes or aborts it.
Operating Models • Leaders (Red) • Will start the transaction. • Will tell the delegates (blue) that the transaction has started and is available for updates. • Will hear back from delegates that they are done updating. • Will finish the transaction, or designate a delegate to do it. • Same number of start and finish calls. • Delegates (Blue) • Will hear from leaders that transactions can be used. • Can write to transactions. • Have to inform leaders when done with updates to transactions.
Leaders/Delegates Model

/* Create a transaction object with an already acquired read context (version 15). This is a local immediate operation. */
tid = H5TRcreate(file_id, rid, (uint64_t)20);
/* Start transaction 20 with the default model, i.e. the Leader Model. */
if (I am a leader process) {
    /* This is asynchronous, but here we make it synchronous so we can tell the delegates that it has been started. */
    ret = H5TRstart(tid, H5P_DEFAULT, H5_EVENT_QUEUE_NULL);
    trans_num = 20;
}
/* Tell other procs that transaction 20 is started. */
MPI_Ibcast(&trans_num, …);
/* Leader processes can continue writing to transaction 20, while others wait for the ibcast to complete. */
if (I am a delegate process)
    MPI_Wait(&mpi_req, MPI_STATUS_IGNORE);
/* Write to container */
…
/* Delegate processes have to complete operations before notifying the leader. */
if (I am a delegate process)
    H5EQwait(event_q, &num_requests, &status);
/* Each leader synchronizes with its delegates that they are done writing. */
MPI_Barrier()/Bcast()/…
if (I am a leader process)
    /* Finish the started transaction. */
    ret = H5TRfinish(tid, H5P_DEFAULT, NULL, event_q);
/* Note that the leader does not have to wait for its updates to complete before issuing the finish. We track this internally. */
/* Wait on all events. */
H5EQwait(event_q, &num_requests, &status);
Multiple Leaders/No Delegates Model

/* All processes are considered equal participants in the transaction semantics; no inter-process communication is required. More communication with the IONs will be done, though. */
/* Create & start transaction 20 with num_peers = n. */
tid2 = H5TRcreate(file_id, rid2, (uint64_t)20);
trspl_id = H5Pcreate(H5P_TR_START);
H5Pset_trspl_num_peers(trspl_id, n);
H5TRstart(tid2, trspl_id, event_q);
H5Pclose(trspl_id);
/* Update/write to the container */
…
/* Finish transaction 20. */
H5TRfinish(tid2, H5P_DEFAULT, NULL, event_q);
/* Wait on all events; everything was asynchronous thus far. */
H5EQwait(event_q, &num_requests, &status);
Operating Within Transactions • HDF5 has metadata and raw data operations. • Operations that occur inside a transaction typically update the contents of the container, i.e. create a group, create a dataset within a group, write to the dataset, etc. • If I create a dataset in a transaction, must that transaction be committed before I can write to the dataset? • No! But there are limitations/rules that must be followed.
Operating Inside a Transaction • If I start transaction 1: • Create a group G1:
gid1 = H5Gcreate_ff(file_id, "G1", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT, tid1, event_q);
• Create a group G2 in G1:
gid2 = H5Gcreate_ff(gid1, "G2", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT, tid1, event_q);
• This is possible because a read is not required to write to G1. • If we happen to do something like this:
gid2 = H5Gcreate_ff(file_id, "G1/G2", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT, tid1, event_q);
the operation will fail, because it requires a read to get to G1 from the root group, and G1 is not readable in the transaction it was created in. • Using a path in an access operation requires all objects in the path to be readable from the read context that is used by the transaction.
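The path rule above can be illustrated with a small stand-alone check. This is a hedged sketch, not library behavior: `toy_path_resolvable` is a hypothetical helper, and the read context is modeled simply as a list of object paths (relative to the root group) that are readable in the container version the transaction was created on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: `readable` lists the object paths visible in the read
 * context used by the transaction. Returns 1 if every intermediate object in
 * `path` (everything before the final component) is readable, 0 otherwise. */
int toy_path_resolvable(const char *path, const char *const *readable,
                        size_t n)
{
    const char *search = path;
    const char *slash;

    while ((slash = strchr(search, '/')) != NULL) {
        size_t len = (size_t)(slash - path); /* length of the prefix path */
        int found = 0;

        for (size_t i = 0; i < n; i++) {
            if (strlen(readable[i]) == len &&
                strncmp(readable[i], path, len) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            return 0; /* traversal needs a read the context cannot satisfy */
        search = slash + 1;
    }
    return 1;
}
```

In this model, if the read context contains only an existing group "G0", then creating "G1" or "G0/G2" by path is resolvable, while "G1/G2" is not, because G1 exists only inside the uncommitted transaction. Creating G2 through the already open identifier of G1 avoids the traversal altogether.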
Access to Objects created by Leader • But what if one process created the object, and other processes need to write to the object in the same transaction? • The leader process will need to retrieve a token representing the object after creating it. This is a local operation: • H5Oget_token(object_id, &token); • Send the token to the other processes. • The other processes will open the object using the token they received: • obj_id = H5Oopen_by_token(token); • This is commonly referred to as a local-to-global/global-to-local operation. • This is not currently supported. Deferred to Q6.
Quick Summary • Transactions provide a mechanism for making atomic updates to a container. • Committed transactions result in container versions. • Writes are done to the future. • To uncommitted transactions • Reads are made from the past. • From container versions
Demo • Look at some example code. • Run Tests.
Dynamic Data Structures • The main purpose is to support ACG's need to access the FastForward stack. • We added a new HDF5 object called a Map object with a new set of routines that should fully support the Dynamic Data Structure use case. • H5DO routines to conveniently append to datasets and fast-append mechanism: • Realized that implementation with IOD will not be possible without an atomic append feature. • Will drop from the current project, as it was not as high a priority as other features. • Support for variable length data is done with some limitations: • No nested VL types, or VL types as fields of compound types. • Neither is commonly seen.
Map objects • A Map object is a direct mapping to a KV store. • We wanted to expose that type of access to the application. • New routines added:

hid_t H5Mcreate_ff(hid_t loc_id, const char *name, hid_t keytype, hid_t valtype, hid_t lcpl_id, hid_t mcpl_id, hid_t mapl_id, hid_t trans_id, hid_t eq_id);
hid_t H5Mopen_ff(hid_t loc_id, const char *name, hid_t mapl_id, hid_t rcxt_id, hid_t eq_id);
herr_t H5Mset_ff(hid_t map_id, hid_t key_mem_type_id, const void *key, hid_t val_mem_type_id, const void *value, hid_t dxpl_id, hid_t trans_id, hid_t eq_id);
herr_t H5Mget_ff(hid_t map_id, hid_t key_mem_type_id, const void *key, hid_t val_mem_type_id, void *value, hid_t dxpl_id, hid_t rcxt_id, hid_t eq_id);
herr_t H5Mget_types_ff(hid_t map_id, hid_t *key_type_id, hid_t *val_type_id, hid_t rcxt_id, hid_t eq_id);
herr_t H5Mget_count_ff(hid_t map_id, hsize_t *count, hid_t rcxt_id, hid_t eq_id);
herr_t H5Mexists_ff(hid_t map_id, hid_t key_mem_type_id, const void *key, hbool_t *exists, hid_t rcxt_id, hid_t eq_id);
herr_t H5Miterate_ff(hid_t map_id, hid_t key_mem_type_id, hid_t value_mem_type_id, H5M_iterate_func_t callback_func, void *context, hid_t rcxt_id);
herr_t H5Mdelete_ff(hid_t map_id, hid_t key_mem_type_id, const void *key, hid_t trans_id, hid_t eq_id);
herr_t H5Mclose_ff(hid_t map_id, hid_t eq_id);
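The key-value behavior that the Map routines expose can be pictured with a tiny in-memory model. This is only a sketch of the semantics (set, get, exists, count, delete), assuming integer keys and values and a fixed capacity. The `toy_map` type and `toy_map_*` functions are hypothetical and involve no HDF5, transactions, read contexts, or event queues.

```c
#include <stddef.h>

#define TOY_MAP_CAP 16

/* Hypothetical in-memory stand-in for a Map object (int keys and values). */
typedef struct {
    int    keys[TOY_MAP_CAP];
    int    vals[TOY_MAP_CAP];
    size_t count;
} toy_map;

/* Returns the slot holding key, or TOY_MAP_CAP if the key is absent. */
static size_t toy_map_find(const toy_map *m, int key)
{
    for (size_t i = 0; i < m->count; i++)
        if (m->keys[i] == key)
            return i;
    return TOY_MAP_CAP;
}

/* Insert or overwrite a pair. Returns 0 on success, -1 if the map is full. */
int toy_map_set(toy_map *m, int key, int val)
{
    size_t i = toy_map_find(m, key);
    if (i == TOY_MAP_CAP) {
        if (m->count == TOY_MAP_CAP)
            return -1;
        i = m->count++;
        m->keys[i] = key;
    }
    m->vals[i] = val;
    return 0;
}

/* Look up a key. Returns 0 and stores the value, or -1 if absent. */
int toy_map_get(const toy_map *m, int key, int *val)
{
    size_t i = toy_map_find(m, key);
    if (i == TOY_MAP_CAP)
        return -1;
    *val = m->vals[i];
    return 0;
}

int toy_map_exists(const toy_map *m, int key)
{
    return toy_map_find(m, key) != TOY_MAP_CAP;
}

size_t toy_map_count(const toy_map *m)
{
    return m->count;
}

/* Remove a key by moving the last pair into its slot. Returns 0 or -1. */
int toy_map_delete(toy_map *m, int key)
{
    size_t i = toy_map_find(m, key);
    if (i == TOY_MAP_CAP)
        return -1;
    m->count--;
    m->keys[i] = m->keys[m->count];
    m->vals[i] = m->vals[m->count];
    return 0;
}
```

In the routines listed on the slide, the updating calls (set, delete) take a transaction ID and the reading calls (get, exists, get_count) take a read context ID, which matches the general rule for writes and reads in this API. The toy model leaves that distinction out.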
Demo • Look at some example code. • Run Tests.