Implementation and Evaluation of a Protocol for Recording Process Documentation in the Presence of Failures Zheng Chen and Luc Moreau zc05r@ecs.soton.ac.uk L.Moreau@ecs.soton.ac.uk University of Southampton
Outline • Motivation • Protocol Overview • Implementation • Experimental Setup • Experimental Results & Analysis • Conclusions & Future Work
Motivation • The provenance of a data product refers to the process that led to that data product • Process documentation is a computer-based representation of a past process for determining provenance • Process documentation consists of a set of p-assertions • Process documentation is stored in provenance stores • Provenance is obtained by querying provenance stores
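These terms can be made concrete with a minimal sketch (the class and method names below, such as PAssertion and ProvenanceStore, are illustrative, not the actual PReP schema): process documentation is a set of p-assertions, each tied to the interaction it documents, and a provenance store accumulates them so that provenance can later be obtained by querying.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; the real p-assertion model (PReP) is richer.
class PAssertion {
    final String interactionId;  // which interaction this assertion documents
    final String asserter;       // the actor making the assertion
    final String content;        // e.g. an invocation or result message

    PAssertion(String interactionId, String asserter, String content) {
        this.interactionId = interactionId;
        this.asserter = asserter;
        this.content = content;
    }
}

// A provenance store accumulates p-assertions and answers provenance queries.
class ProvenanceStore {
    private final Map<String, List<PAssertion>> byInteraction =
            new HashMap<String, List<PAssertion>>();

    public synchronized void record(PAssertion pa) {
        List<PAssertion> list = byInteraction.get(pa.interactionId);
        if (list == null) {
            list = new ArrayList<PAssertion>();
            byInteraction.put(pa.interactionId, list);
        }
        list.add(pa);
    }

    // Provenance is obtained by querying the store for a documented interaction.
    public synchronized List<PAssertion> query(String interactionId) {
        List<PAssertion> list = byInteraction.get(interactionId);
        return list == null ? new ArrayList<PAssertion>() : list;
    }
}
```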
PReP (Groth 04-08) • A protocol to record process documentation • Multiple provenance stores are interlinked to enable retrievability of distributed process documentation
[Diagram: Actor1-Actor4 exchange invocations and results; each actor records invocation and result p-assertions in its provenance store (PS1-PS4), and links between the stores form a pointer chain]
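The pointer chain can be pictured as a minimal link record (a sketch with illustrative names; real PReP link p-assertions carry more detail): alongside the p-assertions for an interaction, a store holds a link naming the store that documents the neighbouring interaction, so a querier can follow the chain across PS1-PS4.

```java
// Illustrative: a link recorded in one provenance store, pointing to the store
// that documents the neighbouring interaction in the same process.
class Link {
    final String interactionId;  // the interaction documented locally
    final String nextStoreUrl;   // where documentation of the next interaction lives

    Link(String interactionId, String nextStoreUrl) {
        this.interactionId = interactionId;
        this.nextStoreUrl = nextStoreUrl;
    }
}

// Retrieval of distributed documentation follows these links store by store;
// if a link points at a store that never received its p-assertions, the
// pointer chain is broken and part of the documentation becomes irretrievable.
```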
Failures • Provenance store crash, communication failures • We do not consider application failures, e.g., actor crash • Consequence: poor-quality process documentation, i.e., incomplete and disconnected
[Diagram: the same actor/provenance-store chain, but a failed recording leaves a broken pointer chain between the stores]
Requirements • Guaranteed Recording: after a process completes, the entire documentation of the process must eventually be recorded in provenance stores • Link Accuracy: all links recorded during a process must eventually be accurate, to enable retrievability of distributed documentation • Efficient Recording: the protocol should be efficient and introduce minimal overhead
F-PReP • A protocol for recording process documentation in the presence of failures • Derives from PReP to inherit its generic nature • Introduces an Update Coordinator to facilitate updating links (we assume the coordinator does not crash) • Actor's side (sketched below): uses timeout and retransmission to record p-assertions; chooses alternative provenance stores in case of failures; requests the coordinator to update links • Provenance store: replies with an acknowledgement only after it has successfully recorded the p-assertions in its persistent storage
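The actor-side behaviour can be illustrated with a hedged sketch (the StoreClient and CoordinatorClient interfaces and all method names are hypothetical; the real client library batches p-assertions and buffers them in Berkeley DB): record with a timeout, retransmit on failure, fall back to an alternative store, and ask the coordinator to repair the now-inaccurate links.

```java
import java.util.List;

// Illustrative sketch of F-PReP's actor-side remedial actions; interfaces are hypothetical.
interface StoreClient {
    // Sends a batch of serialized p-assertions and waits up to timeoutMs for an ack.
    // The store acknowledges only after the batch is in its persistent storage.
    boolean recordBatch(List<String> batch, long timeoutMs);
    String url();
}

interface CoordinatorClient {
    // Asks the update coordinator to repair links that pointed at the failed store.
    void requestLinkUpdate(String interactionId, String oldStoreUrl, String newStoreUrl);
}

class FPrepRecorder {
    private final StoreClient primary;
    private final StoreClient alternative;
    private final CoordinatorClient coordinator;
    private final int maxRetries;
    private final long timeoutMs;

    FPrepRecorder(StoreClient primary, StoreClient alternative,
                  CoordinatorClient coordinator, int maxRetries, long timeoutMs) {
        this.primary = primary;
        this.alternative = alternative;
        this.coordinator = coordinator;
        this.maxRetries = maxRetries;
        this.timeoutMs = timeoutMs;
    }

    void record(String interactionId, List<String> batch) {
        // Timeout and retransmission against the originally chosen store.
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (primary.recordBatch(batch, timeoutMs)) {
                return; // acknowledged: p-assertions are persistent
            }
        }
        // Remedial action: switch to an alternative provenance store ...
        if (!alternative.recordBatch(batch, timeoutMs)) {
            throw new IllegalStateException("no provenance store reachable");
        }
        // ... and request the coordinator to update links so the distributed
        // documentation remains retrievable despite the changed location.
        coordinator.requestLinkUpdate(interactionId, primary.url(), alternative.url());
    }
}
```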
F-PReP
[Diagram: the same actor chain, but one actor's p-assertions are recorded in an alternative store PS2'; a repair request is sent to the Update Coordinator, which issues updates so the pointer chain across the stores remains intact]
Implementation • Provenance Store: implemented as a Java Servlet; backend store (Berkeley DB); disk cache, i.e., flushing OS buffers to disk before providing an acknowledgement to the actor (sketched below); Update Plug-In • Client-Side Library: remedial actions that cope with failures; multithreading for the creation and recording of p-assertions; a local file store (Berkeley DB) for temporarily maintaining p-assertions • Update Coordinator: implemented as a Java Servlet; Berkeley DB is also employed to maintain request information
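The "disk cache" behaviour, flushing buffered writes to stable storage before acknowledging, can be sketched with plain java.io (the actual store persists into Berkeley DB behind a servlet; the append-to-file layout here is purely illustrative):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative: persist a batch of serialized p-assertions and force it to disk
// before the provenance store returns an acknowledgement to the actor.
class DurableRecorder {
    private final File storageFile;

    DurableRecorder(File storageFile) {
        this.storageFile = storageFile;
    }

    // Returns only after the data has reached the physical device, so a
    // subsequent ack guarantees the p-assertions survive an OS crash.
    void recordDurably(byte[] serializedBatch) throws IOException {
        FileOutputStream out = new FileOutputStream(storageFile, true); // append
        try {
            out.write(serializedBatch);
            out.flush();        // flush JVM-level buffers
            out.getFD().sync(); // flush OS buffers to disk (fsync)
        } finally {
            out.close();
        }
        // Only now does the store send its acknowledgement to the actor.
    }
}
```

This forced flush is the behaviour whose cost the provenance-store throughput experiment below examines.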
Performance Study • Throughput of provenance store and coordinator • Scalability of update coordinator • Failure-free recording performance • Overhead of taking remedial actions • Performance impact on application
Experimental Setup • Iridis cluster (over 1,000 processor cores) • Gigabit Ethernet • Tomcat 5.0 container • Berkeley DB Java Edition database • Java 1.5 • A generator is used on an actor's side to inject random failure events (a sketch follows below): failure to submit a batch of p-assertions to a provenance store; failure to receive an acknowledgement from a provenance store before a timeout • Failure events are generated according to a failure rate, i.e., the number of failure events relative to the total number of recordings
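The failure generator can be pictured as a small random injector (a sketch assuming the failure rate is a fraction of recordings; the class and event names are illustrative):

```java
import java.util.Random;

// Illustrative failure injector used on the actor's side during the experiments.
class FailureGenerator {
    enum Event { NONE, SUBMIT_FAILURE, ACK_TIMEOUT }

    private final double failureRate; // fraction of recordings that fail, e.g. 0.5
    private final Random random = new Random();

    FailureGenerator(double failureRate) {
        this.failureRate = failureRate;
    }

    // Called once per recording attempt; with probability failureRate it injects
    // either a failed batch submission or a missing ack before the timeout.
    Event next() {
        if (random.nextDouble() >= failureRate) {
            return Event.NONE;
        }
        return random.nextBoolean() ? Event.SUBMIT_FAILURE : Event.ACK_TIMEOUT;
    }
}
```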
1. Provenance Store (PS) Throughput • Setup: up to 512 clients sending 10k p-assertions to 1 PS in 10 min • Hypothesis: the disk cache may reduce a provenance store's throughput. • Result: 20% decrease in throughput
2. Coordinator Throughput • Setup: up to 512 clients sending 100 requests to 1 coordinator in 10 min • Hypothesis: the coordinator's throughput is high. • Result: 30,000*100 repair requests accepted in 10 min
3. Throughput Experiment with Failures (1 client) • Setup: 1 client sending 10k p-assertions to 1 PS; 1 alternative PS and 1 coordinator used in the case of failures • Hypothesis: (a) resending to the same PS is preferred over an alternative PS for transient failures; (b) the update coordinator is not a bottleneck. • Observation: a client sends at most 200*100 repair requests (the maximum is seen at a 50% failure rate), against a coordinator throughput of 30,000*100 requests/10 min; this implies the coordinator can support a large number of clients (50 - 100?) without becoming a bottleneck.
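A rough bound can be read off these numbers (assuming every client issues its maximum of 200*100 repair requests within the same 10-minute window as the coordinator's measured throughput):

\[ \frac{30{,}000 \times 100}{200 \times 100} = 150 \ \text{clients per coordinator per 10 min,} \]

which is consistent with, and somewhat above, the conservative 50 - 100 estimate on the slide.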
4. Throughput Experiment with Failures (128 clients) • Setup: 128 clients sending 10k p-assertions to 1 PS; 1 alternative PS and 1 coordinator used in the case of failures • Hypothesis: (a) resending to an alternative PS is preferred over the same PS; (b) the coordinator is not a bottleneck. • Observation: the 128 clients send at most 750*100 repair requests (the maximum is seen at a 50% failure rate), against a coordinator throughput of 30,000*100 requests/10 min; this implies the coordinator can support a large number of clients without becoming a bottleneck.
5. Failure-free Recording Performance • Setup: 1 client recording 10,000 10k p-assertions to 1 PS; 100 p-assertions shipped in a single batch • Hypothesis: the disk cache causes overhead. • Results: (a) with PReP, 900 10k p-assertions may be lost if the PS's OS crashes • (b) 13.8% overhead compared to PReP
6. Overhead of Taking Remedial Actions • Setup: 1 client recording 100 p-assertions to 1 PS; 1 alternative PS and 1 coordinator used in the case of failures • Hypothesis: remedial actions have acceptable overhead. • Result: <10% overhead compared to the failure-free recording time
7. Performance Impact on Application • Amino Acid Compressibility Experiment (ACE) • High-performance and fine-grained, thus representative • One run of ACE: 20 parallel jobs; 54,000 interactions/job • Extremely detailed process documentation: 1.08 GB of p-assertions/job in 25 minutes
Recording Performance in ACE • Setup: 5 PS and 1 coordinator; multithreading for the creation and recording of p-assertions • Hypothesis: F-PReP has acceptable recording overhead. • Results: (a) similar overhead (12%) to PReP on application performance when no failure occurs • (b) timeout and queue management affect performance.
Impact of Queue Management on Performance • Hypothesis: flow control on the queue affects performance. • Conclusions: (a) the result supports our hypothesis • (b) we can monitor the queue and take action, e.g., by employing the local file store (see the sketch below).
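That flow-control idea can be illustrated with a hedged sketch (class and method names are illustrative, not the actual client library API): a bounded in-memory queue of p-assertion batches is monitored, and when it fills, e.g. because a store is slow or unreachable, batches are diverted to the local file store instead of blocking the application.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative flow control: the application thread never blocks on a full queue;
// overflow batches are spilled to the client library's local file store.
class RecordingQueue {

    interface LocalFileStore {     // stands in for the Berkeley DB file store
        void save(byte[] batch);
    }

    private final BlockingQueue<byte[]> queue;
    private final LocalFileStore spill;

    RecordingQueue(int capacity, LocalFileStore spill) {
        this.queue = new ArrayBlockingQueue<byte[]>(capacity);
        this.spill = spill;
    }

    // Called by the application when a batch of p-assertions is ready.
    void submit(byte[] batch) {
        if (!queue.offer(batch)) { // queue full: take remedial action
            spill.save(batch);     // keep the batch safe locally for later shipping
        }
    }

    // Called by the recording threads that ship batches to provenance stores.
    byte[] take() throws InterruptedException {
        return queue.take();
    }
}
```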
8. Quality of Recorded Process Documentation • Setup: using F-PReP and PReP to record p-assertions; querying the PS to verify the recorded documentation • Results: (a) PReP: incomplete; F-PReP: complete • (b) PReP: irretrievable; F-PReP: retrievable
Conclusions & Future Work • The coordinator does not affect an actor's recording performance. • In an application, F-PReP incurs recording overhead similar to PReP's when there is no failure. • Although F-PReP introduces overhead in the presence of failures, we believe the overhead is acceptable, given that it records high-quality (i.e., complete and retrievable) process documentation. • We are currently investigating how to create process documentation when an application has its own fault-tolerance schemes to tolerate application-level failures. • In future work, we plan to use the process documentation recorded in the presence of failures to diagnose failures.
Questions? Thank you!