Introducing JSR-352 - Batch Applications for the Java Platform

Introduction

I still find the inclusion of JSR-352: Batch Applications for the Java Platform in the Java EE 7 specification a slightly surprising choice. Many systems built with Java EE are online only and have no batch requirements. Those that do need batch processing (non-interactive, typically record-based work) have probably rolled their own technical and operational solutions, e.g. putting messages onto a queue and letting JMS listeners handle the work asynchronously, or writing a shell script that fires up a JVM to run some long-running process.

There are other solutions such as Spring Batch and WebSphere Compute Grid, so, in fairness to the spec, there are many disparate implementations doing nearly the same thing. Rather than reinventing the wheel each time, I guess it would be nice for these batch jobs to be portable and maintainable by the Java EE community. All long-running batch jobs also share some common characteristics, e.g. handling and recovering from failure. Historically Java EE standardises what has already proven to be a good idea elsewhere, and JSR-352 seems no different in this regard: as far as I can see, it combines elements of Spring Batch and Compute Grid.

Maybe making batch a standard part of Java EE 7 will give large financial services companies the confidence to migrate their COBOL-based batch processing to Java. Batch is critical to these businesses, as a recent incident at RBS highlighted. I don't think including batch in Java EE 7 means such companies will immediately plan to migrate their batch processes to Java, but it will probably move the discussion forward.

The most interesting part of JSR-352 is undoubtedly chunk processing. This is where a batch step processes a large volume of data, e.g. millions of records from a database cursor. Some logic is applied to each record, and the transformed output is written out in chunks (to a file, a database, etc.). Basically, chunk processing is where the real work happens!
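To make the read-process-write cycle concrete, here is a plain Java sketch that does not use the JSR-352 API at all. The class and method names are hypothetical; the spec's own building blocks are the ItemReader, ItemProcessor and ItemWriter interfaces, and the batch runtime drives the loop (committing a transaction and recording a checkpoint at each chunk boundary) rather than your code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical, runtime-free illustration of chunk processing:
// read one item, process it, buffer the result, and write the buffer
// out each time chunkSize items have been read.
final class ChunkLoopSketch {

    // Returns the number of chunks handed to the writer.
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<O> buffer = new ArrayList<>(chunkSize);
        int chunks = 0;
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == chunkSize) {
                // In a real chunk step this is where the transaction commits
                writer.accept(new ArrayList<>(buffer));
                buffer.clear();
                chunks++;
            }
        }
        if (!buffer.isEmpty()) {
            // Flush the final, partial chunk
            writer.accept(new ArrayList<>(buffer));
            chunks++;
        }
        return chunks;
    }
}
```

With seven input records and a chunk size of 3, the writer is called three times: twice with three items and once with the single leftover item.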

However, in this first post I am going to concentrate on Batchlets. A Batchlet is a batch step that is called once: it either succeeds or fails, and if it fails it can be restarted to run again. I have chosen to start with the Batchlet as it's the simplest example for getting up and running with Java Batch (it's the Hello World of JSR-352), but that's not to say Batchlets aren't useful. In the real world they may be used:

  • at the start of a job to copy a file into a working area or to split a file into separate files
  • at the end of a job to do some verification, produce and e-mail a report, or delete some working files
  • to fire a mass update query against a database
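As an illustration of the first use case, the sketch below copies an input file into a working area. It is plain Java so it runs without a batch runtime; the class name and the "COPIED" status are hypothetical. In a real Batchlet this logic would live in process(), whose returned string is used as the step's exit status, and an exception thrown from process() would fail the step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical "stage the input file" logic, written as it might appear
// inside a Batchlet's process() method.
final class StageInputFileSketch {

    // Copies source into workDir (creating the directory if needed) and
    // returns an exit-status-style string. IOExceptions propagate.
    static String process(Path source, Path workDir) throws IOException {
        Files.createDirectories(workDir);
        Path target = workDir.resolve(source.getFileName());
        // REPLACE_EXISTING keeps the step safe to restart: a rerun
        // overwrites whatever a previous, failed execution left behind.
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        return "COPIED";
    }
}
```

Making the step idempotent matters for Batchlets in particular, because a failed Batchlet is simply run again from the start on restart.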

Glassfish 4 will have JSR-352 support when it's released, and Maven Central will host the Java EE API jars. Before its official release, I am using nightly builds and snapshot repositories. The table below shows where to find these artefacts. It also references the spec, which is clearly written and easy to follow:

My first batch job

I created an instance of the Java EE 7 WAR archetype using the following Maven command:

mvn -DarchetypeGroupId=org.codehaus.mojo.archetypes \
    -DarchetypeArtifactId=webapp-javaee7 \
    -DarchetypeVersion=0.4-SNAPSHOT \
    -DarchetypeRepository=https://nexus.codehaus.org/content/repositories/snapshots/ \
    -DgroupId=co.uk.planetjones \
    -DartifactId=planetjones-java-batch-example \
    -Dversion=1.0-SNAPSHOT \
    -Dpackage=co.uk.planetjones \
    -Darchetype.interactive=false --batch-mode --update-snapshots archetype:generate

I am using IntelliJ to develop on an iMac, but any IDE or OS should be fine. I'm also using a version 7 JDK.

Next you need to add the Java Batch dependency to pom.xml:

<dependency>
      <groupId>javax.batch</groupId>
      <artifactId>javax.batch-api</artifactId>
      <version>1.0</version>
      <scope>provided</scope>
</dependency>

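A batch job is made up of steps, and each step is either a chunk step or a Batchlet step. Jobs are defined in XML, and the job XML files must be located inside:

META-INF/batch-jobs

In a WAR this means WEB-INF/classes/META-INF/batch-jobs; in a Maven project, placing the files under src/main/resources/META-INF/batch-jobs puts them there.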

I created my Batchlet by implementing the javax.batch.api.Batchlet interface. I can inject the JobContext and StepContext instances using CDI - these instances give me information about the whole job and the current step.

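package co.uk.planetjones;

import javax.batch.api.Batchlet;
import javax.batch.runtime.context.JobContext;
import javax.batch.runtime.context.StepContext;
import javax.inject.Inject;

public class HelloWorldBatchlet implements Batchlet {

    // Injected by the batch runtime via CDI
    @Inject JobContext jobContext;
    @Inject StepContext stepContext;

    @Override
    public String process() throws Exception {

        // here I could copy a file as a precursor to processing it

        long executionId = jobContext.getExecutionId();

        // Deliberately fail on even execution ids to demonstrate restarts
        if (isEven(executionId)) {
            throw new NullPointerException("I don't like even numbers :)");
        }
        // The returned string is used as the step's exit status
        return "SUCCESS";
    }

    private boolean isEven(long num) {
        return (num % 2 == 0);
    }

    @Override
    public void stop() throws Exception {
        // Nothing to interrupt: process() finishes almost immediately
    }
}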

To follow the code sample above you need to know what an executionId is. There are lots of ids in Java Batch which can be summarised as:

  • when a new job is started it is assigned an instanceId
  • when the new job is executed it gets assigned an executionId. A job is executed when it first starts or when it is restarted - if a job fails and is restarted it gets assigned a new executionId
  • within a job execution there are many steps. Each step is assigned a new stepExecutionId - this is only assigned when the step starts to execute. If a step is executed more than once because of restarts each is assigned a new stepExecutionId

On Glassfish these ids simply increment from 0. I have added logic which throws an unchecked exception when the executionId is even, so that execution fails. If the job is restarted, it is assigned another executionId, which should be an odd number (as I am the only operator using my Glassfish server), so it should then succeed.
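The id rules above, combined with the even/odd check in HelloWorldBatchlet, can be modelled in a few lines of plain Java. This is a toy sketch, not the Glassfish implementation: it assumes execution ids are handed out by incrementing a single counter, which only holds while nobody else is running jobs on the server. The class name ExecutionIdModel is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: one job instance whose instanceId never changes, and a fresh
// executionId for the initial start and for every restart. Even ids fail,
// mirroring the rule in HelloWorldBatchlet.
final class ExecutionIdModel {

    private final long instanceId;
    private long nextExecutionId;
    private final List<Long> executionIds = new ArrayList<>();

    ExecutionIdModel(long instanceId, long firstExecutionId) {
        this.instanceId = instanceId;
        this.nextExecutionId = firstExecutionId;
    }

    // Starting and restarting both create a new execution with its own id.
    String execute() {
        long executionId = nextExecutionId++;
        executionIds.add(executionId);
        return executionId % 2 == 0 ? "FAILED" : "COMPLETED";
    }

    long instanceId() {
        return instanceId;
    }

    List<Long> executionIds() {
        return executionIds;
    }
}
```

Starting an instance whose first execution id is even gives a FAILED execution; calling execute() again, as a restart would, allocates the next (odd) id and completes, while instanceId() returns the same value throughout.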

I created an XML file to describe my job at:

META-INF/batch-jobs/planetjones-test-batch-job.xml

Its contents:

<?xml version="1.0" encoding="UTF-8"?>
<job id="simple-batchlet-job" xmlns="http://xmlns.jcp.org/xml/ns/javaee">
    <step id="batchlet-step">
        <batchlet ref="co.uk.planetjones.HelloWorldBatchlet"/>
    </step>
</job>
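The job is later started with JobOperator.start("planetjones-test-batch-job", ...), i.e. the file name without the .xml extension, even though the id attribute inside the file is "simple-batchlet-job". The hypothetical helper below only spells out that naming convention as classpath resource paths; the batch runtime performs its own lookup.

```java
// Hypothetical helper illustrating how a job XML name relates to the
// classpath resource under META-INF/batch-jobs. Not part of any batch API.
final class JobXmlNames {

    private static final String DIR = "META-INF/batch-jobs/";
    private static final String SUFFIX = ".xml";

    // "planetjones-test-batch-job" -> "META-INF/batch-jobs/planetjones-test-batch-job.xml"
    static String resourcePath(String jobXmlName) {
        return DIR + jobXmlName + SUFFIX;
    }

    // Inverse mapping, e.g. when listing the job files packaged in an application.
    static String jobXmlName(String resourcePath) {
        if (!resourcePath.startsWith(DIR) || !resourcePath.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("Not a job XML path: " + resourcePath);
        }
        return resourcePath.substring(DIR.length(), resourcePath.length() - SUFFIX.length());
    }
}
```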

To start and restart jobs you must interact programmatically with the JobOperator, which is available via the BatchRuntime class. I added simple methods to start, restart and get information on an execution to a Stateless Session Bean:

package co.uk.planetjones;

import javax.batch.operations.JobOperator;
import javax.batch.runtime.BatchRuntime;
import javax.batch.runtime.JobExecution;
import javax.ejb.Stateless;
import java.util.Properties;

@Stateless
public class BatchExecutionBean {

    // Starts a new job instance from planetjones-test-batch-job.xml and
    // returns the id of its first execution
    public long submitJob() {
        JobOperator jobOperator = BatchRuntime.getJobOperator();
        Properties jobProperties = new Properties();
        long executionId = 
         jobOperator.start("planetjones-test-batch-job", jobProperties);
        return executionId;
    }

    // Looks up the status and details of a single execution
    public JobExecution getJobExecutionDetails(long executionId) {
        JobOperator jobOperator = BatchRuntime.getJobOperator();
        JobExecution jobExecution = jobOperator.getJobExecution(executionId);
        return jobExecution;
    }

    // Restarts the job instance that owns executionId; the restart is
    // assigned a new executionId, which is returned
    public long restartJob(long executionId) {
        Properties jobProperties = new Properties();
        long newExecutionId = 
          BatchRuntime.getJobOperator().restart(executionId, jobProperties);
        return newExecutionId;
    }

}

Note that you can pass Properties to a job if you want to provide dynamic parameters. Next I created a simple servlet to start my job, restart failed jobs and view job details. This code (admittedly) won't win any awards, but it shows the possibilities of dynamically scheduling batch jobs via the JobOperator instance:

package co.uk.planetjones;

import javax.batch.runtime.JobExecution;
import javax.ejb.EJB;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;
import java.io.PrintWriter;

@WebServlet(name = "BatchJobStartServlet",
            urlPatterns = BatchJobStartServlet.SERVLET_MAPPING)
public class BatchJobStartServlet extends HttpServlet {

    enum OperatorAction {
        START, RESTART, VIEW;
    }

    static final String SERVLET_MAPPING = "/run_batch_job";

    @EJB
    private BatchExecutionBean batchExecutor;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse res)
            throws ServletException, IOException {

        // Set the content type once, before the writer is first obtained
        res.setContentType("text/html");

        // Throws if the "action" parameter is missing or not a known action
        OperatorAction action = OperatorAction.valueOf(req.getParameter("action"));

        long executionId = -1;

        if (req.getParameter("executionId") != null) {
            executionId = Long.parseLong(req.getParameter("executionId"));
        }

        switch (action) {
            case START:
                executionId = batchExecutor.submitJob();
                this.write(String.format("Batch execution %d is running",
                        executionId), res);
                break;
            case RESTART:
                executionId = batchExecutor.restartJob(executionId);
                this.write(
                        String.format("Batch execution %d is the result of a restart",
                                executionId), res);
                break;
            case VIEW:
                JobExecution execution = 
                        batchExecutor.getJobExecutionDetails(executionId);
                this.write("Execution Id \n " + execution, res);
                break;
        }

        this.write("<h2>Options</h2>", res);

        this.writeLink(OperatorAction.VIEW,
                "View details for Execution Id " 
                        + executionId, executionId, req, res);

        this.writeLink(OperatorAction.RESTART,
                "Restart Execution Id " + executionId, executionId, req, res);

        this.writeLink(OperatorAction.START,
                "Start a new job", null, req, res);
    }

    private void write(String message, HttpServletResponse res) 
            throws IOException {
        PrintWriter out = res.getWriter();
        out.println(message);
    }

    // Writes an HTML link back to this servlet for the given action
    private void writeLink(OperatorAction action, String text, Long executionId,
                           HttpServletRequest req, HttpServletResponse res)
            throws IOException {

        PrintWriter out = res.getWriter();
        StringBuilder sb = new StringBuilder();

        sb.append("<a href=\"")
          .append(req.getContextPath())
          .append(SERVLET_MAPPING)
          .append("?action=")
          .append(action.name());

        if (executionId != null) {
            sb.append("&executionId=").append(executionId);
        }

        sb.append("\">").append(text).append("</a>");

        out.println(sb.toString());
        out.println("<hr/>");
    }
}
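The servlet passes request parameters straight to OperatorAction.valueOf and Long.parseLong, so a missing or misspelt parameter ends in a NullPointerException, IllegalArgumentException or NumberFormatException. Below is a sketch of more defensive parsing, which also shows one way to turn request parameters into the job Properties mentioned earlier. It works on the Map<String, String[]> shape returned by ServletRequest.getParameterMap() so it can be exercised without a container; the class name and the "job." prefix convention are hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

// Hypothetical request parsing for the batch servlet.
final class BatchRequestParser {

    // Mirrors the servlet's OperatorAction enum
    enum OperatorAction { START, RESTART, VIEW }

    // Assumed convention: parameters named "job.<key>" become job Properties
    static final String JOB_PARAM_PREFIX = "job.";

    // Empty if "action" is missing or not one of the known actions
    static Optional<OperatorAction> action(Map<String, String[]> params) {
        String raw = first(params, "action");
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(OperatorAction.valueOf(raw.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException unknownAction) {
            return Optional.empty();
        }
    }

    // Empty if "executionId" is missing or not a number
    static Optional<Long> executionId(Map<String, String[]> params) {
        String raw = first(params, "executionId");
        if (raw == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(Long.parseLong(raw.trim()));
        } catch (NumberFormatException notANumber) {
            return Optional.empty();
        }
    }

    // Copies "job.*" parameters (prefix stripped) into the Properties that
    // could be passed to JobOperator.start or restart
    static Properties jobProperties(Map<String, String[]> params) {
        Properties props = new Properties();
        for (Map.Entry<String, String[]> e : params.entrySet()) {
            String key = e.getKey();
            String[] values = e.getValue();
            if (key.startsWith(JOB_PARAM_PREFIX) && values != null && values.length > 0) {
                props.setProperty(key.substring(JOB_PARAM_PREFIX.length()), values[0]);
            }
        }
        return props;
    }

    private static String first(Map<String, String[]> params, String name) {
        String[] values = params.get(name);
        return (values == null || values.length == 0) ? null : values[0];
    }
}
```

In doGet, an empty action could then be answered with res.sendError(HttpServletResponse.SC_BAD_REQUEST) instead of an exception page. Inside the job XML, a parameter value can be referenced with an expression such as #{jobParameters['inputFile']}, where inputFile is just an example name.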

The Glassfish admin console does have a section for viewing batch jobs. However, it is a read-only view; presumably other Java EE application servers will provide their own interfaces for submitting, restarting and monitoring jobs. Many big companies will use external schedulers to start jobs, so there's a lot of flexibility on the operational side. Here's a screenshot of the Batch Monitoring section of the Glassfish console:

Glassfish 4 batch admin console

Anyhow, deploy the WAR file to Glassfish (IntelliJ can do this as an exploded WAR for you), then interact with it using these URLs, replacing XX with an executionId:

  • Start: http://localhost:8080/planetjones-java-batch-example-1.0-SNAPSHOT/run_batch_job?action=START
  • Restart: http://localhost:8080/planetjones-java-batch-example-1.0-SNAPSHOT/run_batch_job?action=RESTART&executionId=XX
  • View: http://localhost:8080/planetjones-java-batch-example-1.0-SNAPSHOT/run_batch_job?action=VIEW&executionId=XX

My "wonderful" Servlet UI makes this simple :)

Let's walk through a simple case where I start a job which gets assigned the fated even-numbered executionId:

Start new jsr 352 batch job

So a new job instance was created and it was given an executionId of 74. Now I click on the View details link:

View failed jsr 352 batch job

Because my code throws the NullPointerException, the execution has failed. Now I can restart the job by clicking the Restart link:

View java batch job after restart

The job is executed again, so it gets a new executionId of 75. This is an odd number, so I expect my Batchlet to succeed now. Let's view the details again:

View successful execution after restart

Now the job instance has COMPLETED successfully. Notice the instanceId of 58: it stays the same across restarts because it identifies the job instance, which is only created when JobOperator.start is called.

The exitStatus is set to the same value as the jobStatus because I didn't specify an exit status by calling:

jobContext.setExitStatus("any string");
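The defaulting rule described above can be summarised as a tiny model: an explicitly set exit status wins, otherwise the batch status name is reported. This is a sketch, not runtime code; the BatchStatus enum here is a local copy of the seven values defined by javax.batch.runtime.BatchStatus so that the example compiles on its own, and the class name is hypothetical.

```java
// Toy model of how a job's reported exit status is derived.
final class ExitStatusModel {

    // Local copy of the values in javax.batch.runtime.BatchStatus
    enum BatchStatus { STARTING, STARTED, STOPPING, STOPPED, FAILED, COMPLETED, ABANDONED }

    // explicitExitStatus is null when setExitStatus was never called
    static String effectiveExitStatus(BatchStatus batchStatus, String explicitExitStatus) {
        return explicitExitStatus != null ? explicitExitStatus : batchStatus.name();
    }
}
```

Without a call to setExitStatus, a completed job reports "COMPLETED" as its exit status; after jobContext.setExitStatus("any string") it reports "any string" instead.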

Summary

I hope that after reading this you have a general idea of JSR-352. Over the next few months I'll be experimenting a lot more with chunk steps, checkpointing, transactions and partitioning. The source code for this example is on GitHub. I'd love to hear your feedback or suggestions.