Introduction
This chapter outlines the concepts and components of Jakarta Batch.
1. Basic Concepts of Jakarta Batch
Batch processing is a processing mode that handles a large workload in the background without requiring real-time user involvement. It is used for jobs that run for long hours or require heavy computation, and it suits both parallel and sequential processing. Batch process types include ad-hoc, scheduled, and on-demand processes. Batch processing also comes with features such as logging, checkpoint algorithms, and parallel processing, and it supports monitoring of batch workloads so that administrators can publish, stop, or resume jobs.
JEUS supports the referential integrity checking included in the specifications and the option to specify a thread pool for a certain job.
For details about Jakarta Batch, refer to the specifications.
2. Jakarta Batch Components
The following illustrates the structure of the core Jakarta Batch components.
Client applications have access to JobOperator, through which they can execute and stop jobs that contain specific tasks to perform. A job contains one or more steps, each of which can have three processing modules: ItemReader, ItemProcessor, and ItemWriter. All of this information is stored as metadata in the JobRepository of the Jakarta Batch implementation.
2.1. JobOperator
JobOperator is the interface through which client applications execute batch jobs, and it provides control over every stage of job processing. With JobOperator, you can start, restart, and stop a job, and retrieve its StepExecutions.
2.2. Job
A job is an object that encapsulates the entire process of a batch job. It contains one or more steps and allows properties to be configured globally for those steps. A job's configuration includes its name, the order of its steps, and whether the job can be restarted.
-
JobInstance
A JobInstance represents a logical run of a job.
Suppose that a batch job should run once at a certain hour every day. A JobInstance is created for each day: there is a January 1st run and a January 2nd run, which are discrete JobInstances of the same job. If the January 1st run fails and is run again the next day, it is still the January 1st run, and the January 2nd run is run on its own. A JobInstance may therefore be run more than once, and each technical attempt to run it is defined as a JobExecution. Using a new JobInstance means that the batch job runs from scratch, and using an existing JobInstance means that it resumes from the last checkpoint.
Conceptually, a JobInstance has no bearing on the data to be processed. How data is loaded, item by item, is up to the ItemReader implementation.
-
JobParameters
A JobInstance is distinguished from another by the JobParameters it has. JobParameters are a set of java.util.Properties used to start a batch job. The same JobParameters may be used for two distinct JobInstances.
For example, the January 1st JobInstance above has "schedule.date = 2016/01/01" as JobParameters, and the January 2nd JobInstance has "schedule.date = 2016/01/02" as JobParameters. In short, a job contains one or more JobInstances, and each JobInstance is defined by its JobParameters and thereby distinguished from the others. A JobInstance can have one or more job attempts, each called a JobExecution.
Relationship among Job, JobInstance, JobParameters, and JobExecution
-
JobExecution
A JobExecution is a technical attempt at a job. An execution can end in either success or failure, but the JobInstance it belongs to is not considered complete until an execution completes successfully. Consider the previous example of a job that runs at a certain hour every day. The job itself defines what to do and how to do it, but it is a JobExecution that is responsible for what happens during a job run. For example, if a January 1st JobExecution fails, the January 1st JobInstance creates another JobExecution the next day to complete the job.
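The relationship described above can be sketched in plain Java. This is a toy model, not the Jakarta Batch API: the class `JobRunModel` and its methods `start`, `restart`, and `instanceFor` are hypothetical names chosen for illustration. It assumes that starting a job creates a new JobInstance with its first JobExecution, while restarting adds a further JobExecution to the existing JobInstance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/** Toy model of JobInstance and JobExecution (hypothetical, not the Jakarta Batch API). */
class JobRunModel {
    /** One logical run of a job, together with the parameters it was started with. */
    static final class Instance {
        final long id;
        final Properties parameters;
        final List<Long> executionIds = new ArrayList<>();

        Instance(long id, Properties parameters) {
            this.id = id;
            this.parameters = parameters;
        }
    }

    private final List<Instance> instances = new ArrayList<>();
    private final Map<Long, Instance> instanceByExecution = new HashMap<>();
    private long nextExecutionId = 1;

    /** Starting a job creates a new instance and its first execution. */
    long start(Properties parameters) {
        Instance instance = new Instance(instances.size() + 1, parameters);
        instances.add(instance);
        return addExecution(instance);
    }

    /** Restarting adds another execution to the instance of an earlier execution. */
    long restart(long previousExecutionId) {
        Instance instance = instanceByExecution.get(previousExecutionId);
        if (instance == null) {
            throw new IllegalArgumentException("unknown execution: " + previousExecutionId);
        }
        return addExecution(instance);
    }

    /** Looks up the instance that an execution belongs to. */
    Instance instanceFor(long executionId) {
        return instanceByExecution.get(executionId);
    }

    int instanceCount() {
        return instances.size();
    }

    private long addExecution(Instance instance) {
        long id = nextExecutionId++;
        instance.executionIds.add(id);
        instanceByExecution.put(id, instance);
        return id;
    }
}
```

In this model, a failed January 1st run that is restarted the next day yields two executions under a single instance, while the January 2nd run gets an instance of its own.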
2.3. Step
A step is an object that encapsulates an independent process that is part of a series of ordered processes of a batch job. A job consists of one or more steps, and a step contains all information necessary for defining and controlling batch processes. A simple example of a step is a task of reading files into a database.
Just like a job has JobExecutions, a step has one or more StepExecutions.
The following figure illustrates the one-to-many relationship between a job and a step. A job can have multiple steps, and each step can create one or more StepExecutions. That is, a JobExecution also has a one-to-many relationship with StepExecutions, meaning that a job may create many StepExecutions before it completes.
-
StepExecution
A StepExecution is a technical attempt at a step: each time a step runs, a StepExecution is created. Step types include chunk steps and batchlet steps, which are mutually exclusive, so a single step cannot be both.
-
Chunk
Jakarta Batch uses a chunk step to perform item-oriented processing. A chunk step reads, processes, and writes items in chunks, where a chunk is defined as the items processed within the scope of one transaction. During a chunk step, checkpoints are taken regularly, before each new transaction starts.
The following figure shows the reader-processor-writer modules of a chunk step.
Reader-Processor-Writer Modules of a Step in Jakarta Batch
When a step starts, one item is read from an ItemReader, handed to an ItemProcessor, and processed. This is repeated until the whole chunk of items has been processed, at which point the chunk is handed to an ItemWriter.
-
Batchlet
Jakarta Batch uses a batchlet step to perform task-oriented processing that item-oriented processing may not be able to handle. For example, a batchlet step can be used to send files or to execute commands.
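The read-process-write cycle of a chunk step described above can be sketched in plain Java. This is a simplified illustration, not the Jakarta Batch runtime: `ChunkLoopSketch`, `run`, and the `itemCount` parameter are hypothetical names, standard Java functional interfaces stand in for ItemReader, ItemProcessor, and ItemWriter, and transactions and checkpoints are only marked by a comment.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Simplified chunk loop (hypothetical, not the Jakarta Batch runtime). */
class ChunkLoopSketch {
    /**
     * Reads items one at a time, processes each, and hands every full chunk
     * (plus a final partial chunk, if any) to the writer.
     * Returns the number of chunks written.
     */
    static <I, O> int run(Iterator<I> reader, Function<I, O> processor,
                          Consumer<List<O>> writer, int itemCount) {
        if (itemCount < 1) {
            throw new IllegalArgumentException("itemCount must be at least 1");
        }
        int chunksWritten = 0;
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            // Read one item and process it right away.
            chunk.add(processor.apply(reader.next()));
            if (chunk.size() == itemCount) {
                // Hand the whole chunk to the writer in one call.
                writer.accept(new ArrayList<>(chunk));
                // A real runtime would commit the transaction and checkpoint here.
                chunk.clear();
                chunksWritten++;
            }
        }
        if (!chunk.isEmpty()) {
            // Write the items left over when the input ends mid-chunk.
            writer.accept(new ArrayList<>(chunk));
            chunksWritten++;
        }
        return chunksWritten;
    }
}
```

With five input items and an `itemCount` of 2, the writer in this sketch is called three times: twice with two items and once with the remaining item.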
2.4. ItemReader
An ItemReader reads an item that is in the scope of a step and hands it to an ItemProcessor. Let’s suppose that a file contains lines of data where each line serves as an independent record. In this case, a line serves as an item, which is read by an ItemReader.
Checkpoints are applicable to an ItemReader. If an error occurs or a job is restarted, checkpointing allows the ItemReader to resume reading from the last checkpoint, that is, from the point up to which items were read successfully. In the case of the file whose lines are independent records, the number of the line that was read last is checkpointed.
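The line-number checkpoint described above can be sketched as follows. This is a minimal plain-Java illustration, not an implementation of the Jakarta Batch ItemReader interface: `LineReaderSketch` is a hypothetical class, an in-memory list stands in for the file, and the checkpoint is assumed to be the count of lines read so far.

```java
import java.util.List;

/** Toy line reader with a line-number checkpoint (hypothetical, not the Jakarta Batch interface). */
class LineReaderSketch {
    private final List<String> lines;
    private int next;

    /** Opens at the first line, or at a saved checkpoint when restarting. */
    LineReaderSketch(List<String> lines, Integer checkpoint) {
        this.lines = lines;
        this.next = (checkpoint == null) ? 0 : checkpoint;
    }

    /** Returns the next line, or null when the input is exhausted. */
    String readItem() {
        return next < lines.size() ? lines.get(next++) : null;
    }

    /** The position to resume from: the number of lines read so far. */
    int checkpointInfo() {
        return next;
    }
}
```

A reader that has read two lines reports 2 as its checkpoint, and a new reader opened with that value continues from the third line instead of starting over.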
2.5. ItemWriter
An ItemWriter writes out the processing result of a chunk, a list of items handed from an ItemProcessor. Only the information about the items that have been handed can be written out, not the information about the other items that have not been handed yet.
2.6. ItemProcessor
An ItemProcessor is responsible for processing an item handed from an ItemReader and handing the result to an ItemWriter. The processing logic is implemented in the ItemProcessor.
2.7. Checkpoints
As batch processing applications take long hours to process massive amounts of data, checkpointing and restartability are required.
Checkpointing bookmarks the progress of a StepExecution so that a restarted job can begin from the last checkpoint. Because checkpointing entails a lock on the checkpointed item, frequent checkpoints may have an adverse effect on system performance. It is therefore recommended that checkpoints be spaced appropriately.