Learn about the standardization of batch processing for the Java platform, which draws on existing frameworks such as Spring Batch and WebSphere Compute Grid. Explore features like chunked processing, checkpointing, partitioning, and exception handling.
Batch Applications for the Java Platform Arun Gupta, Java EE & GlassFish Guy blogs.oracle.com/arungupta @arungupta
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Batch Applications for the Java Platform • Standardizes batch processing for Java • Non-interactive, bulk-oriented, long-running • Data or computationally intensive • Sequentially or in parallel • Ad-hoc, scheduled or on-demand execution • Specification led by IBM • Draws on Spring Batch, WebSphere Compute Grid (WCG), z/OS Batch • Part of Java EE 7, can be used in Java SE
Chunked Processing: Reader, Processor, Writer

public interface ItemReader<T> {
    public void open(Externalizable checkpoint);
    public T readItem();
    public Externalizable checkpointInfo();
    public void close();
}

public interface ItemProcessor<T, R> {
    public R processItem(T item);
}

public interface ItemWriter<T> {
    public void open(Externalizable checkpoint);
    public void writeItems(List<T> items);
    public Externalizable checkpointInfo();
    public void close();
}
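The reader/processor/writer contract implies a loop that the batch runtime drives on the application's behalf. A minimal sketch of that loop follows, assuming simplified local stand-ins for the interfaces (`SimpleReader`, `SimpleProcessor`, `SimpleWriter`, all hypothetical, with the open/close/checkpoint methods left out) and a hypothetical `ChunkLoop.run` helper. A real runtime would also wrap each chunk in a transaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified local stand-ins for the slide's interfaces (not the javax.batch types).
interface SimpleReader<T> { T readItem(); }               // returns null when input is exhausted
interface SimpleProcessor<T, R> { R processItem(T item); }
interface SimpleWriter<R> { void writeItems(List<R> items); }

class ChunkLoop {
    // Reads up to itemCount items, processes each one, then writes them as a single chunk.
    // Repeats until the reader is exhausted and returns the number of chunks written.
    static <T, R> int run(SimpleReader<T> reader, SimpleProcessor<T, R> processor,
                          SimpleWriter<R> writer, int itemCount) {
        int chunks = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<R> buffer = new ArrayList<>();
            while (buffer.size() < itemCount) {
                T item = reader.readItem();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                buffer.add(processor.processItem(item));
            }
            if (!buffer.isEmpty()) {
                writer.writeItems(buffer);
                chunks++;
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        Iterator<Integer> source = Arrays.asList(1, 2, 3, 4, 5).iterator();
        List<String> sink = new ArrayList<>();
        SimpleReader<Integer> reader = () -> source.hasNext() ? source.next() : null;
        SimpleProcessor<Integer, String> processor = n -> "item-" + n;
        SimpleWriter<String> writer = sink::addAll;
        int chunks = run(reader, processor, writer, 2);
        System.out.println(chunks + " chunks, wrote " + sink);
    }
}
```

With an item count of 2 and five input items, the writer is called three times: twice with a full chunk and once with the remaining item.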
Step Example using Job Specification Language (JSL)

@Named("accountReader")
...implements ItemReader... {
    public Account readItem() {
        // read account using JPA

// For a job
<step id="sendStatements">
    <chunk reader="accountReader" processor="accountProcessor" writer="emailWriter" item-count="10" />
</step>

@Named("accountProcessor")
...implements ItemProcessor... {
    public Statement processItem(Account account) {
        // read Account, return Statement

@Named("emailWriter")
...implements ItemWriter... {
    public void writeItems(List<Statement> statements) {
        // use JavaMail to send email
Checkpointing • Data-intensive tasks can run for long periods of time • Checkpoint/restart is a common design requirement • Basically saves the reader and writer positions • Naturally fits into chunk-oriented steps • reader.checkpointInfo() and writer.checkpointInfo() are called • The resulting Externalizable data is persisted • When the chunk restarts, the reader and writer are initialized with the respective Externalizable data
Chunked Processing: Checkpoint

public interface ItemReader<T> {
    public void open(Externalizable checkpoint);
    public T readItem();
    public Externalizable checkpointInfo();
    public void close();
}

public interface ItemProcessor<T, R> {
    public R processItem(T item);
}

public interface ItemWriter<T> {
    public void open(Externalizable checkpoint);
    public void writeItems(List<T> items);
    public Externalizable checkpointInfo();
    public void close();
}
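The open/checkpointInfo pairing can be illustrated with a reader whose checkpoint is simply its position in the input. This is a sketch under stated assumptions: `ListReader` and `CheckpointDemo` are hypothetical names, the checkpoint is a plain `Serializable` Integer rather than the `Externalizable` shown on the slide, and "persisting" is just holding the value in a variable.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader over an in-memory list; its checkpoint is the index of the next unread item.
class ListReader {
    private final List<String> data;
    private int next;

    ListReader(List<String> data) { this.data = data; }

    // A null checkpoint means a fresh start; otherwise resume at the saved index.
    void open(Serializable checkpoint) {
        next = (checkpoint == null) ? 0 : (Integer) checkpoint;
    }

    // Returns the next item, or null when the input is exhausted.
    String readItem() { return next < data.size() ? data.get(next++) : null; }

    // Snapshot of the current position; the runtime would persist this after each chunk.
    Serializable checkpointInfo() { return next; }
}

class CheckpointDemo {
    public static void main(String[] args) {
        List<String> accounts = Arrays.asList("a1", "a2", "a3", "a4", "a5");

        // First execution: read one chunk of two items, take a checkpoint, then "fail".
        ListReader first = new ListReader(accounts);
        first.open(null);
        first.readItem();
        first.readItem();
        Serializable saved = first.checkpointInfo();

        // Restart: a new reader instance resumes from the saved position.
        ListReader restarted = new ListReader(accounts);
        restarted.open(saved);
        System.out.println(restarted.readItem());
    }
}
```

The restarted reader continues with the third item, so the first two are not processed twice.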
Handling Exceptions

<job id=...>
    ...
    <chunk skip-limit="5" retry-limit="5">
        <skippable-exception-classes>
            <include class="java.lang.Exception"/>
            <exclude class="java.io.FileNotFoundException"/>
        </skippable-exception-classes>
        <retryable-exception-classes>
            ...
        </retryable-exception-classes>
        <no-rollback-exception-classes>
            ...
        </no-rollback-exception-classes>
    </chunk>
    ...
</job>
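One way to read the include/exclude and skip-limit settings is as a small decision rule applied to each exception. The sketch below is an approximation, not the runtime's actual matching algorithm: `SkipPolicy` and `trySkip` are hypothetical names, and matching is done with a plain `isInstance` check.

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper mirroring the include/exclude lists and the skip-limit attribute.
class SkipPolicy {
    private final List<Class<?>> includes;
    private final List<Class<?>> excludes;
    private final int skipLimit;
    private int skipped;

    SkipPolicy(List<Class<?>> includes, List<Class<?>> excludes, int skipLimit) {
        this.includes = includes;
        this.excludes = excludes;
        this.skipLimit = skipLimit;
    }

    // Returns true (and counts the skip) if the exception matches an include,
    // matches no exclude, and the skip limit has not been used up yet.
    boolean trySkip(Throwable t) {
        boolean included = includes.stream().anyMatch(c -> c.isInstance(t));
        boolean excluded = excludes.stream().anyMatch(c -> c.isInstance(t));
        if (included && !excluded && skipped < skipLimit) {
            skipped++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        SkipPolicy policy = new SkipPolicy(
            Arrays.<Class<?>>asList(Exception.class),
            Arrays.<Class<?>>asList(FileNotFoundException.class),
            5);
        System.out.println(policy.trySkip(new IllegalStateException("bad record")));
        System.out.println(policy.trySkip(new FileNotFoundException("input missing")));
    }
}
```

Under this reading, a bad record is skipped while a missing input file still fails the step, because the exclude narrows the broad `java.lang.Exception` include.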
Partitioned Step • A batch step may run as a partitioned step • A partitioned step runs as multiple instances of the same step definition across multiple threads, one partition per thread

<step id="step1">
    <chunk .../>
    <partition>
        <plan partitions="10" threads="2"/>
        <reducer .../>
    </partition>
</step>
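The effect of a plan with 10 partitions and 2 threads can be approximated in plain Java with a fixed-size thread pool. This is only a sketch: `PartitionDemo.runPartitions` is a hypothetical helper, and the per-partition work (summing a slice of 100 numbers) is a placeholder for a real step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionDemo {
    // Runs `partitions` instances of the same work on a pool of `threads` threads
    // and returns each partition's result in partition order.
    static List<Integer> runPartitions(int partitions, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int partition = p;
                // Placeholder work: each partition sums its own slice of 100 numbers.
                futures.add(pool.submit(() -> {
                    int sum = 0;
                    for (int i = partition * 100; i < (partition + 1) * 100; i++) {
                        sum += i;
                    }
                    return sum;
                }));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // waits for that partition to finish
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runPartitions(10, 2));
    }
}
```

All ten partitions run the same code on different data, but at most two execute at any moment because the pool has two threads.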
Partitioning – Advanced Scenarios • Partition Mapper • Dynamically decide number of partitions (partition plan) • Partition Reducer • Demarcate logical unit of work around partition processing • Partition Collector • Sends interim results from individual partition to step's partition analyzer • Partition Analyzer • Collection point of interim results, single point of control and analysis
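Deciding the number of partitions dynamically usually comes down to dividing the input into ranges at run time. A minimal sketch of that calculation follows, assuming a hypothetical `RangeMapper.plan` helper in plain Java. A real partition mapper would hand the resulting plan to the batch runtime rather than return a list.

```java
import java.util.ArrayList;
import java.util.List;

class RangeMapper {
    // Splits record indexes [0, total) into contiguous ranges of at most maxPerPartition items.
    // Each int[] holds {startInclusive, endExclusive} for one partition.
    static List<int[]> plan(int total, int maxPerPartition) {
        List<int[]> ranges = new ArrayList<>();
        for (int start = 0; start < total; start += maxPerPartition) {
            ranges.add(new int[] { start, Math.min(start + maxPerPartition, total) });
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (int[] range : plan(25, 10)) {
            System.out.println(range[0] + ".." + range[1]);
        }
    }
}
```

The number of partitions then follows from the data volume instead of being fixed in the job XML.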
Flow and Split
• Flow defines a set of steps to be executed as a unit

<flow id="flow-1" next="{flow, step, decision}-id">
    <step id="flow_1_step_1"> </step>
    <step id="flow_1_step_2"> </step>
</flow>

• Split defines a set of flows to be executed in parallel

<split …>
    <flow …/> <!-- each flow runs on a separate thread -->
    <flow …/>
</split>
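The flow and split semantics can be mimicked with plain threads: steps inside a flow run in order, while the flows of a split run concurrently. In this sketch `SplitDemo`, `flow` and `split` are hypothetical names and each step is just a `Runnable`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // A flow is modeled as an ordered list of steps executed as a unit.
    static Runnable flow(List<Runnable> steps) {
        return () -> steps.forEach(Runnable::run);
    }

    // A split starts every flow on its own thread and waits for all of them to finish.
    static void split(List<Runnable> flows) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (Runnable f : flows) {
            Thread t = new Thread(f);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable flow1 = flow(Arrays.<Runnable>asList(
            () -> System.out.println("flow 1, step 1"),
            () -> System.out.println("flow 1, step 2")));
        Runnable flow2 = flow(Arrays.<Runnable>asList(
            () -> System.out.println("flow 2, step 1")));
        split(Arrays.asList(flow1, flow2));
    }
}
```

The order of steps within one flow is fixed. How the output of different flows interleaves is not.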
Batchlet • Task oriented unit of work • Not item-oriented • Doesn't support resume • Simple Batchlet interface – process(), stop() • In Job XML: <batchlet ref="{name}"/>
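A batchlet's two methods cooperate: process() does the work and stop() asks it to finish early. A minimal sketch of that cooperation follows, assuming a local stand-in interface `SimpleBatchlet` (not the javax.batch type) and a hypothetical `CleanupBatchlet` whose passes are placeholders for real work.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Local stand-in for the Batchlet contract named on the slide.
interface SimpleBatchlet {
    String process() throws Exception; // returns an exit status
    void stop();
}

// Hypothetical task that works in passes and checks for a stop request between passes.
class CleanupBatchlet implements SimpleBatchlet {
    private final AtomicBoolean stopRequested = new AtomicBoolean(false);
    private final int passes;
    int completedPasses;

    CleanupBatchlet(int passes) { this.passes = passes; }

    @Override
    public String process() {
        for (int i = 0; i < passes; i++) {
            if (stopRequested.get()) {
                return "STOPPED";
            }
            completedPasses++; // placeholder for one pass of real work
        }
        return "COMPLETED";
    }

    // Called from another thread to request an early, orderly exit.
    @Override
    public void stop() { stopRequested.set(true); }

    public static void main(String[] args) {
        System.out.println(new CleanupBatchlet(3).process());
    }
}
```

Because there are no items and no checkpoints, a stopped batchlet simply starts over when the job is run again.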
Programming Model – Advanced Scenarios • CheckpointAlgorithm • Decider • Listeners – Job, Step, Chunk Listeners … • @BatchProperty String fname = "/tmp/default.txt"; • @BatchContext JobContext jctxt;
Batch Runtime Specification • JobOperator interface – for programmatic lifecycle control of jobs • Step level Metrics through the StepExecution object • Batch Artifact loading through CDI • Job XML loading through META-INF/batch-jobs/myJob.xml • Packaging – jar, war, ejb-jar
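Example: Programmatic Invocation of Jobs

In a Servlet/EJB:

import java.util.Properties;
import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;
...
JobOperator jo = BatchRuntime.getJobOperator();
// "myJob" refers to META-INF/batch-jobs/myJob.xml; start() returns the execution id
long executionId = jo.start("myJob", new Properties());
...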
Where to Find Out More • Read the final spec and API docs at http://java.net/projects/jbatch/ • Join public@jbatch.java.net and send questions and comments • RI integrated in GlassFish 4.0 • http://glassfish.java.net