Job Management
This chapter describes the job states, job types, the steps during execution of a job, and the result retrieval process after the job has finished running. In OpenFrame, both tjesmgr and OpenFrame Manager can be used for job management. However, this user guide covers job management using tjesmgr only.
For more information about how to manage jobs by using OpenFrame Manager, refer to OpenFrame Manager User Guide. |
1. Job Status
TJES places each job in one of eight states: START, HOLD, WORKING, SUSPEND, DONE, ERROR, STOP, and FLUSH. The states and the relationships between them are shown in the following figure.
The READY state indicates that SUBMIT is being processed. PURGE means that the job is being removed from TJES. |
The following describes each of the TJES states.
State | Description |
---|---|
START |
The START state applies to jobs that have been submitted through JCL but have not yet been allocated to a Runner slot. The scheduler examines all jobs in the START state to generate the schedule. In this state, a job’s class can be modified, a HOLD command can be issued, or a job can be removed from the JOBQ with the CANCEL command. |
HOLD |
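The HOLD state also applies to jobs that have been submitted through JCL but have not yet been executed. Unlike jobs in the START state, jobs in the HOLD state are not included in a schedule by the scheduler. Jobs can be put into the HOLD state in three different ways: by specifying 'TYPRUN=HOLD' in the JCL JOB statement, by submitting the job with a job class for which HOLD is set, or by issuing a HOLD command for a job in the START state.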
Jobs in the HOLD state can be put in the START state with START command or removed from the JOBQ with the CANCEL command. |
WORKING |
A job is in the WORKING state when tjclrun is actively processing the job. Depending on the context of the step, it can be run through multiple UNIX processes. Jobs in this state can be paused with the SUSPEND command or stopped with the STOP command. |
SUSPEND |
A job can be put in the SUSPEND state by issuing a SUSPEND command. All processes currently involved with the job are paused. From this state, a job can be put back into the WORKING state with a RESUME command, or job processing can be terminated (i.e. enter STOP state) with the STOP command. If a third-party utility is used and a new session has been set within the utility itself, the corresponding process and its sub-processes cannot be suspended. |
DONE |
When a job has successfully finished processing, it enters the DONE state. To verify that the job has been processed as the user intended, the spool data (PODD) and return codes (PSJOB) must be checked. From the DONE state, the only state change command that can be used is the REMOVE command, which will remove the current job from the JOBQ. The DONE state does not necessarily indicate that the procedure or invoked program processed successfully; it merely indicates that all return codes returned by the COND statement or the rc subject are acceptable. |
ERROR |
A job enters an ERROR state when the return code is out of the acceptable code range specified in COND or the rc subject. From the ERROR state, the only state change command that can be used is the REMOVE command, which can remove the current job from the JOBQ. |
STOP |
A job enters the STOP state when a user sends a STOP command or initiates the Disaster Recovery Plan. All resources used by the job are freed and the job is terminated. From the STOP state, the only state change command that can be used is the REMOVE command, which can remove the current job from the JOBQ. |
FLUSH |
A job enters the FLUSH state when tjclrun encounters an error and the job cannot be processed. The usual causes of a job entering FLUSH state are JCL run-time parsing errors or DD allocation errors. From the FLUSH state, the only state change command that can be used is the REMOVE command, which can remove the current job from the JOBQ. |
1.1. Using Commands to Change the Job State
Users can change a job’s state by using a number of commands.
The following table describes the tjesmgr commands required to change a job’s state. The state in the first column changes to the state listed in the third column when the command from the second column is used.
Current State | Command | Desired Next State |
---|---|---|
- | RUN | START, HOLD |
START | CANCEL | Job is removed from JOBQ prior to job execution |
START | HOLD | HOLD |
HOLD | CANCEL | Job is removed from JOBQ prior to job execution |
HOLD | START | START |
WORKING | SUSPEND | SUSPEND |
WORKING | STOP | STOP |
SUSPEND | STOP | STOP |
SUSPEND | RESUME | WORKING |
DONE | REMOVE | Job is removed from JOBQ after job execution |
ERROR | REMOVE | Job is removed from JOBQ after job execution |
STOP | REMOVE | Job is removed from JOBQ after job execution |
FLUSH | REMOVE | Job is removed from JOBQ after job execution |
The number of jobs that can be added to JOBQ is limited by the amount set in the JOBDEF section of the tjes subject, in OpenFrame Configuration. Therefore, any unnecessary jobs must be removed from the JOBQ.
The CANCEL and REMOVE commands remove a job from the JOBQ. The difference between the two commands is the timing of the job deletion. CANCEL deletes the job before job processing starts, while REMOVE deletes it after processing starts. If a job is in the HOLD state, it is not scheduled to run. The START command can be used to change the job state and put it in the job schedule.
|
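The state changes listed in the command table can be captured as a small lookup table. The following Python sketch is illustrative only and is not part of tjesmgr; the names `TRANSITIONS`, `next_state`, and `allowed_commands` are hypothetical, and `None` is used here to mean that the job leaves the JOBQ.

```python
# Transition table transcribed from the tjesmgr command table.
# A next state of None means the job is removed from the JOBQ.
TRANSITIONS = {
    ("START", "HOLD"): "HOLD",
    ("START", "CANCEL"): None,
    ("HOLD", "START"): "START",
    ("HOLD", "CANCEL"): None,
    ("WORKING", "SUSPEND"): "SUSPEND",
    ("WORKING", "STOP"): "STOP",
    ("SUSPEND", "RESUME"): "WORKING",
    ("SUSPEND", "STOP"): "STOP",
    ("DONE", "REMOVE"): None,
    ("ERROR", "REMOVE"): None,
    ("STOP", "REMOVE"): None,
    ("FLUSH", "REMOVE"): None,
}


def next_state(current, command):
    """Return the state that follows `command`, or None if the job is removed."""
    key = (current, command)
    if key not in TRANSITIONS:
        raise ValueError(f"{command} is not valid in the {current} state")
    return TRANSITIONS[key]


def allowed_commands(current):
    """List the state change commands that can be issued in `current`."""
    return sorted(cmd for (state, cmd) in TRANSITIONS if state == current)
```

For example, `allowed_commands("WORKING")` lists STOP and SUSPEND, while `next_state("DONE", "STOP")` raises an error because REMOVE is the only state change command available after a job has finished.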
2. JCL Management
OpenFrame can process all mainframe-compatible JCL files. While it is possible to manage JCL files by using directory structures, it is a better idea to register the JCL files as members of the data set specified in the JCLLIB key of the PROCLIB section from the tjes subject under OpenFrame Configuration. If there are many JCL files, create separate PDS data sets and manage them as PDS data set members.
Since PDS data sets are represented as actual directories in OpenFrame, JCL files that are copied to the directory corresponding to a PDS data set are automatically treated as members of that PDS.
After a JCL file has been registered as a data set member, the full file path of the JCL file no longer needs to be specified when the JCL file is submitted; instead, the corresponding JCL file can be located by specifying the name of the data set and the relevant member name. If a JCL file is registered as a member of the data set specified in the JCLLIB key in the PROCLIB section from the tjes subject under OpenFrame Configuration, the JCL file can be located by simply specifying the member name alone. The data set’s members are searched according to the PDS order specified in JCLLIB.
For more information about the PROCLIB section of the tjes subject, refer to OpenFrame Configuration Guide. |
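Because a PDS data set is represented as a directory, the member search described above behaves like an ordered directory lookup. The following Python sketch only illustrates that idea; the function name `find_jcl_member` is hypothetical, and the caller is assumed to already know which directories correspond to the data sets listed in JCLLIB (how a data set name maps to a directory path is not covered here).

```python
from pathlib import Path
from typing import Optional, Sequence


def find_jcl_member(member: str, jcllib_dirs: Sequence[Path]) -> Optional[Path]:
    """Return the first file named `member`, searching the directories in order.

    `jcllib_dirs` stands in for the PDS directories in the order given in
    JCLLIB (an assumption made for this sketch).
    """
    for pds_dir in jcllib_dirs:
        candidate = pds_dir / member
        if candidate.is_file():
            return candidate
    return None
```

If the same member name exists in more than one PDS, the directory listed first wins, which mirrors the search order rule stated above.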
3. JOB Processing Steps
All jobs that are processed via TJES go through the following steps.
-
Submit: JCL files are accepted by TJES.
-
Schedule: Qualified jobs are allocated to a Runner slot.
-
Execution: The Runner processes the job.
-
Output: Output from finished jobs is processed.
-
Remove: The job is removed from TJES.
3.1. Submit Step
In the submit step, JCL files are analyzed and the jobs are entered into TJES.
Once a job is submitted and TJES finishes analyzing the JCL file, job entries are formed and the resources needed for job management, such as the JOBID and spool, are allocated and saved in the JOBQ. Information about the new job is reported to the scheduler, which allocates the job to a Runner slot.
TJES allocates the following resources to a job.
Resource | Description |
---|---|
JOBID |
To simplify job management, each job is assigned a unique identifier. This is called the JOBID and is in 'JOBnnnnn' or 'J0nnnnnn' format. The format is determined by the ENDNUM key in the JOBDEF section of the tjes subject, in OpenFrame Configuration. According to the format, JOBIDs range from 00001 to 99999, or from 000001 to 999999. The JOBID is released when the job is deleted as the result of a CANCEL or REMOVE command. TJES assigns released JOBIDs again. In addition, there are other JOBIDs such as 'STCnnnnn' and 'TSOnnnnn' created by CICS and TSO respectively, but TJES does not manage those JOBID types. |
SPOOL |
Each job has its own spool space. It is assigned to the spool volume using the JOBID as a directory name. The spool space stores a number of files, such as INPJCL, which preserves the state of JCL at the time when the job submission was received; CATPROC, which saves catalogs; SYSOUT, which saves job processing results; and other files that TJES uses internally to report the progress of a job. The spool can be initialized with tjesinit. |
JOBQ |
To make job search and modification easier, TJES uses a KSDS to save basic information about each job. The job information saved to JOBQ can be initialized by using the tjesinit tool. |
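The two JOBID formats described in the table can be checked or produced with a few lines of code. This Python sketch is a hypothetical helper, not a TJES API; it assumes that the character after 'J' in the 'J0nnnnnn' format is a literal zero followed by six digits, as the stated range 000001 to 999999 suggests.

```python
import re

# 'JOBnnnnn' (five digits) or 'J0nnnnnn' (literal zero plus six digits).
JOBID_PATTERN = re.compile(r"(?:JOB\d{5}|J0\d{6})")


def is_tjes_jobid(jobid: str) -> bool:
    """Return True if `jobid` has one of the two TJES-managed formats."""
    return JOBID_PATTERN.fullmatch(jobid) is not None


def format_jobid(number: int, six_digits: bool = False) -> str:
    """Build a JOBID string from a job number in the documented range."""
    upper = 999999 if six_digits else 99999
    if not 1 <= number <= upper:
        raise ValueError(f"job number must be between 1 and {upper}")
    return f"J0{number:06d}" if six_digits else f"JOB{number:05d}"
```

Identifiers such as 'STCnnnnn' and 'TSOnnnnn' are rejected by `is_tjes_jobid`, since TJES does not manage them.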
TJES receives JCL submission according to the following steps.
-
The JCL is checked for syntax errors. (Procedures are not verified in this step, so syntax errors in them are not detected. tjclrun verifies them when it parses the JCL.)
-
JOBID is assigned.
-
The spool is generated.
-
The submitted JCL file is copied to INPJCL in the spool.
-
The job information is saved in JOBQ.
-
If 'TYPRUN=HOLD' is not specified in the JCL JOB statement and HOLD is not specified for the CLASS key value in the JOBCLASS section of the tjes subject, in OpenFrame Configuration, the job is reported to obmjschd for scheduling.
If an error occurs during the submit step, the error event is recorded to the log file with a naming format of 'submit_YYYYMMDD.log', under the 'job' subdirectory in the path specified to the LOG_DIR key in the DIRECTORY section of the ofsys subject, in OpenFrame Configuration. A successfully submitted job remains either in the START or HOLD state.
Once the JCL file is successfully submitted through the RUN command of tjesmgr, all jobs generated by submission are registered in JOBQ.
|
OpenFrame allows JCL files to be submitted in the following ways.
-
Using the RUN command in tjesmgr (See RUN in TJESMGR Commands)
-
Using the textrun tool (See "4.4. textrun" in OpenFrame Tool Reference Guide.)
-
Using the [Batch] menu of OpenFrame Manager (See OpenFrame Manager User Guide)
3.2. Schedule Step
In the schedule step, a job in the START state is assigned to a Runner slot that meets all of the following requirements.
-
The Runner slot must be idle and able to accept a new job.
-
The class of the Runner slot must match the job class.
-
If the job is configured for execution on a specific node, the Runner slot must be active on that node.
The following algorithm demonstrates scheduling logic.
Priority aging
For i = 1 to 8 /* to iterate RUNNER CLASSes */
    For each idle runner {
        Get the list of Job which matches the ith class of the runner
        For each Job in descending priority order {
            Check node affinity. If fails, try next Job
            Check Jobname duplication. If fails, try next Job
            Assign the Job to the runner and break
        }
    }
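The scheduling logic above can be sketched in Python as follows. This is a simplified illustration, not the actual scheduler: the `Job` and `Runner` types and the `schedule` function are hypothetical, priority aging is omitted, a larger priority number is assumed to mean higher priority, and the Jobname duplication check is assumed to mean that a job is skipped while another job with the same name is running.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Job:
    jobid: str
    name: str
    job_class: str
    priority: int
    node: Optional[str] = None  # None: the job may run on any node


@dataclass
class Runner:
    index: int
    node: str
    classes: str                # Runner class list, for example "ABC"
    job: Optional[Job] = None   # None: the Runner slot is idle


def schedule(jobs: List[Job], runners: List[Runner],
             running_names: Set[str]) -> Dict[int, str]:
    """Assign waiting jobs to idle Runner slots; return {runner index: JOBID}."""
    assigned: Dict[int, str] = {}
    waiting = list(jobs)
    busy_names = set(running_names)
    for i in range(8):                       # iterate over Runner class positions
        for runner in runners:
            if runner.job is not None or i >= len(runner.classes):
                continue
            wanted = runner.classes[i]
            candidates = sorted(
                (j for j in waiting if j.job_class == wanted),
                key=lambda j: j.priority, reverse=True)
            for job in candidates:
                if job.node is not None and job.node != runner.node:
                    continue                 # node affinity check failed
                if job.name in busy_names:
                    continue                 # Jobname duplication check failed
                runner.job = job
                waiting.remove(job)
                busy_names.add(job.name)
                assigned[runner.index] = job.jobid
                break
    return assigned
```

With one Runner slot of class 'ABC' and another of class 'BCA', a single waiting job of class 'C' is picked up by the second slot, because 'C' appears earlier in that slot's class list.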
Job Class
The job class is one of the properties of a job. It is a single character (A to Z or 0 to 9) specified in the CLASS parameter of the JCL JOB statement.
If no class is specified in the JCL file, the default job class specified in the CLASS key in the JOB section of the tjclrun subject, in OpenFrame Configuration, is used. The job class is used for job scheduling to limit assignment of the job to certain Runner slots. A job with a class 'A' can be assigned only to a Runner slot with a Runner class containing the letter A, such as CBA or ABC.
TJES also behaves differently depending on the HOLD sub-property of the job class. Jobs that are submitted with a HOLD job class enter the HOLD state, even if 'TYPRUN=HOLD' is not specified.
For more information about how to configure the tjclrun subject, refer to OpenFrame Configuration Guide. |
Runner Class
The Runner class is specified in the INITDEF section of the tjes subject, in OpenFrame Configuration.
The format of a Runner class is similar to that of a job class and can only contain characters between A-Z and 0-9. One Runner slot can contain up to eight classes.
If multiple classes are specified for a Runner slot, jobs are executed based on the order in which their class appears in the Runner class list. This prioritization scheme means that in order for a job of class 'i' to be executed by a Runner, there must be no jobs with a class listed prior to 'i' in the Runner class list. Take an example where Runner slot #1 has a Runner class of 'ABC' and Runner slot #2 has a Runner class of 'BCA'. A job of class 'C' can be run as the second job in Runner slot #2, as long as there are no jobs waiting with a class of 'B'. For job 'C' to be run as the third job in Runner slot #1, there must be no jobs waiting for Runner slot #1 with a class of 'A' or 'B', and Runner slot #2 must either be already executing a job or in an inactive state and thus unable to execute the job.
For more information about how to configure the tjes subject, refer to OpenFrame Configuration Guide. |
3.3. Execution Step
The Runner executes the job as described in the JCL file.
In the execution step, the Runner slot parses through the JCL file in the spool and executes the job described in the file. There are multiple steps defined for each job, which are executed by the Runner in the order that they are described in the JCL file. The data set allocation for job execution also occurs during the execution step.
3.4. Output Step
The SYSOUT generated during the job execution is processed. SYSOUT is processed at the output level. Depending on the output class, the request is either sent to the printer for processing or to a specified post-processing destination.
Output is either printed through a printer or submitted to TJES via an internal reader. For more information, refer to Output Processing.
3.5. Remove Step
After a job has finished the output step, the user can remove the completed job from the system. All resources associated with the job, including the JOBID, spool, and JOBQ entries, are removed and released back to TJES. To ensure that the results of a deleted job can still be recovered, use the backup command. After the job is successfully backed up, it is automatically removed from the spool.
4. Job Retrieval
Depending on the parameters specified in the JCL file, a job is submitted to the JOBQ in either a START or HOLD state. The state of each job in the JOBQ can be viewed by using the PS command of tjesmgr.
A job in the START state enters the WORKING state when it is run, and terminates in an ERROR, DONE, or FLUSH state.
Using the following tjesmgr commands, detailed job information can be retrieved:
Command | Description |
---|---|
PSJOB |
Shows detailed job information. |
PSIO |
Shows I/O information and the data sets used by the job. |
POSPOOL |
Shows information about the spools generated by the job. |
PODD |
Shows the contents of the spools generated by the job. |
POJOB |
Shows detailed output information related to the job. |
For more information, refer to Job/Output Commands. |
5. Log
TJES logs all job events during operation so that any errors encountered can be traced back and analyzed.
In OpenFrame/Batch, TJES records the following two logs:
-
The submit log, which stores information about all JCL submissions.
-
The job log, which stores information about all job state changes.
Due to the vast number of log messages that can accumulate in a short time, log files can become very large if not purged regularly. To make it easier to purge only old, unnecessary data, the logging function can be configured to create separate log files for each day.
5.1. Submit Log
The submit log file is saved, with the naming format of 'submit_YYYYMMDD.log', under the subdirectory named 'job' in the path specified to the LOG_DIR key in the DIRECTORY section of the ofsys subject under OpenFrame Configuration. All attempts to submit JCL files recognized by obmfmsvr, regardless of whether they succeed or fail, are recorded in the submit log.
The default format of the submit log is as follows:
>> timestamp jclpath submit_result
Element | Description |
---|---|
timestamp |
14-digit number in the format yyyymmddHHMMSS. |
jclpath |
Absolute Unix path of the submitted JCL file. |
submit_result |
Result of the submission. |
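A submit log line in the default format can be split into its three elements as shown in the following Python sketch. The helper names are hypothetical, and several details are assumptions rather than documented behavior: each line is assumed to begin with the literal '>>' marker, the JCL path is assumed to contain no whitespace, and everything after the path is treated as the submission result.

```python
from datetime import datetime
from typing import NamedTuple


class SubmitRecord(NamedTuple):
    timestamp: datetime
    jclpath: str
    submit_result: str


def parse_submit_line(line: str) -> SubmitRecord:
    """Split one submit log line into timestamp, jclpath, and submit_result."""
    text = line.strip()
    if text.startswith(">>"):          # assumed leading marker
        text = text[2:].lstrip()
    stamp, jclpath, result = text.split(None, 2)
    return SubmitRecord(datetime.strptime(stamp, "%Y%m%d%H%M%S"), jclpath, result)


def submit_log_name(day: datetime) -> str:
    """Build the daily file name in the 'submit_YYYYMMDD.log' format."""
    return f"submit_{day:%Y%m%d}.log"
```

The file name helper reflects the daily naming format, which makes it straightforward to select only the log files older than a given date when purging.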
5.2. Job Log
The job log is saved with the naming format of 'job_YYYYMMDD.log', under the subdirectory named 'job' in the path specified to the LOG_DIR key in the DIRECTORY section of the ofsys subject, in OpenFrame Configuration.
The following is the default format of a job log entry.
[YYYY-MM-DDTHH:MI:SS.FFFFFF][MODULE(PID)][MSG_LEVEL][MSG_CODE] CMD=operation,NODE=nodename,USER=userid,JOBID=jobid,JOBNAME=jobname,additional info
The following is a list of common items.
Item | Description |
---|---|
CMD |
One of the following operation types: SUBMIT, EXECUTE, CHANGE, FINISH, REMOVE, SUSPEND, RESUME, NICE, STEP, or CANCEL.
|
NODE |
Name of a node in which a job is executed. If 'execution_node' for job execution is not specified in JCL or the VALUE of the DEFAULT_RUNNING_NODE key is not set to MY, in the DEFAULT_OPTION section of the tjesmgr subject under OpenFrame Configuration, an asterisk (*) is set so that the job can be executed in all nodes. |
USER |
Name of the user who executes the job. If CMD=FINISH, it is TJES. |
JOBID |
Submitted job ID. |
JOBNAME |
Job name. |
additional info |
Varied depending on the value specified in CMD=operation. |
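A job log entry in the default format can be parsed with a regular expression for the bracketed header and a simple split for the key=value items. The Python sketch below is illustrative; `parse_job_log_entry` is a hypothetical helper, it assumes that field values contain no commas, and the status letters are the ones listed for the individual operations in this section.

```python
import re
from datetime import datetime

# Header: [timestamp][MODULE(PID)][MSG_LEVEL][MSG_CODE] followed by the fields.
ENTRY_PATTERN = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]"
    r"\[(?P<module>[^\]()]+)\((?P<pid>\d+)\)\]"
    r"\[(?P<level>[^\]]*)\]"
    r"\[(?P<code>[^\]]*)\]\s*"
    r"(?P<fields>.*)$"
)

STATUS_LETTERS = {
    "S": "START", "H": "HOLD", "W": "WORKING", "P": "SUSPEND",
    "D": "DONE", "E": "ERROR", "T": "STOP", "F": "FLUSH",
}


def parse_job_log_entry(line: str) -> dict:
    """Parse one job log entry into a dictionary of header values and fields."""
    match = ENTRY_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("line is not in the default job log format")
    entry = match.groupdict()
    entry["timestamp"] = datetime.strptime(entry["timestamp"], "%Y-%m-%dT%H:%M:%S.%f")
    entry["pid"] = int(entry["pid"])
    fields = {}
    for item in entry.pop("fields").split(","):
        key, sep, value = item.partition("=")
        if sep:                                   # skip items without '='
            fields[key.strip()] = value.strip()
    if "STATUS" in fields:
        fields["STATUS_NAME"] = STATUS_LETTERS.get(fields["STATUS"], fields["STATUS"])
    entry["fields"] = fields
    return entry
```

Keeping the common items and the operation-specific items in the same `fields` dictionary lets a caller filter entries by `CMD` first and then read only the keys that apply to that operation.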
SUBMIT
The job is submitted.
-
Usage
CLASS=class,STATUS=status,JCL=path
Element | Description |
---|---|
CLASS | Class of the submitted job. |
STATUS | Status of the job at the time of the submission. S: START, H: HOLD, D: DONE |
JCL | Path of the original JCL file, not the copied INPJCL path. |
-
EXECUTE
The job is allocated to Runner.
-
Usage
INDEX=index,PID=pid,CLASS=class
Element | Description |
---|---|
INDEX | Index of the Runner slot in which the job is running. Valid only for the currently running node. |
PID | Process ID of the Runner of the job. Valid only for the currently running node. |
CLASS | Class used for running the current job among many other classes allocated to the Runner slot. |
CHANGE
Job properties are changed before execution.
-
Usage
CLASS=class,PRIORITY=priority,STATUS=status
Element | Description |
---|---|
CLASS | Class of the job after the change. |
PRIORITY | Priority of the job after the change. |
STATUS | Status of the job after the change. S: START, H: HOLD |
-
FINISH
The job execution is finished.
-
Usage
INDEX=index,PID=pid,STATUS=status,RCODE=exitcode
Element | Description |
---|---|
INDEX | Index of the Runner slot in which the job is running. Valid only for the currently running node. |
PID | Process ID of the Runner of the job. Valid only for the currently running node. |
STATUS | State of the finished job. D: DONE, E: ERROR, T: STOP, F: FLUSH |
RCODE | Exit code of the job. For more information, refer to Job Exit Codes. |
-
REMOVE
The job is removed from TJES.
-
Usage
STATUS=status
Element | Description |
---|---|
STATUS | State of the removed job. D: DONE, E: ERROR, T: STOP, F: FLUSH |
-
SUSPEND
The job is suspended.
-
Usage
INDEX=index,PID=pid,STATUS=status
Element | Description |
---|---|
INDEX | Index of the Runner slot in which the job is running. Valid only for the currently running node. |
PID | Process ID of the Runner of the job. Valid only for the currently running node. |
STATUS | State of the suspended job. P: SUSPEND |
-
RESUME
The job is resumed.
-
Usage
INDEX=index,PID=pid,STATUS=status
Element | Description |
---|---|
INDEX | Index of the Runner slot in which the job is running. Valid only for the currently running node. |
PID | Process ID of the Runner of the job. Valid only for the currently running node. |
STATUS | State of the resumed job. W: WORKING |
-
NICE
The CPU priority of the job is specified.
-
Usage
INDEX=index,PID=pid,STATUS=status,NICE=nice
Element | Description |
---|---|
INDEX | Index of the Runner slot in which the job is running. Valid only for the currently running node. |
PID | Process ID of the Runner of the job. Valid only for the currently running node. |
STATUS | State of the niced job. P: SUSPEND, W: WORKING |
NICE | Number ranging from 0 to 20. Smaller number represents higher priority. |
-
STEP
Displays the information of a specific step of the job.
-
Usage
STEPNAME=stepname,SKIP=skip,STEPSEQ=seqnum,RCODE=exitcode
Element | Description |
---|---|
STEPNAME | Name of the job step. |
SKIP | Option to skip execution of the job step. Y: Does not execute the job step. N: Executes the job step. |
STEPSEQ | Sequence number of the job step. Starts from 1. |
RCODE | Exit code of the job. For more information, refer to Job Exit Codes. |
-
CANCEL
The job execution is canceled.
-
Usage
STATUS=status
Element | Description |
---|---|
STATUS | State of the canceled job. S: START, H: HOLD |
6. Job Backup
If too many jobs are submitted, the JOBQ may become full and no longer able to accept any new jobs. To prevent this condition, any old or unnecessary jobs must be removed from the JOBQ. To preserve information about the jobs that are to be removed, back up the spool.
The following table describes the spool backup commands.
Command | Description |
---|---|
SPOOLBACKUP |
Backs up the job to the spool and deletes the job from the JOBQ. |
SPOOLBACKUPLIST |
Displays the date of existing spool backups. |
SPOOLPS |
Displays the jobs that were backed up on the specified date. |
SPOOLRESTORE |
Copies the backed-up job into a temporary directory. |
SPOOLPSJOB |
Displays detailed information about recovered jobs. (displays the same information as PSJOB) |
SPOOLPSIO |
Displays I/O information about recovered jobs. (displays the same information as PSIO) |
SPOOLPOSPOOL |
Displays the spool list of recovered jobs. (displays the same information as POSPOOL) |
SPOOLPODD |
Displays the spool content of recovered jobs. (displays the same information as PODD) |
SPOOLCLEAR |
Deletes recovered jobs from the temporary directory. |
For more information, refer to Spool Backup Commands. |
7. Job Exit Codes
The following section describes the possible exit codes for jobs and job steps that are executed by TJES.
7.1. Step Exit Codes
Each step returns a return code in the following format.
XNNNN
-
X
Can contain one of six different EXIT states: R, S, U, A, F or M. They are defined as follows:
Item | Description |
---|---|
R | Normal condition. The application program is terminated successfully. |
S | System ABEND. The application program is terminated abnormally with an error signal such as SIGPIPE, SIGSEGV, SIGABRT, and SIGBUS. |
U | User ABEND. The application program reports an ABEND and is terminated through WAASABND. |
A | Application ABEND. The application program is terminated successfully, but the program that received the RC (such as IKJEFT01) determines that the RC indicates an abnormal event; therefore, the job has to be terminated. |
F | Flush. The application cannot run normally because of Runner’s internal error. |
M | Return code issued as a result of a job execution with TYPRUN=JEM described in the JOB statement or through the tjesmgr JEM command. |
-
NNNN
A number ranging from 0 to 4096 that expresses the return code of the application program (return codes greater than or equal to 255 are reported through the external RC module). For 'F', however, NNNN is the return value of the Runner.
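The XNNNN format can be split into its state character and numeric part as in the following Python sketch. The function name `parse_step_exit_code` and the short labels in `STEP_EXIT_STATES` are hypothetical; the sketch only checks the shape of the code and does not enforce the 0 to 4096 range.

```python
import re

# Short labels for the six EXIT states described above.
STEP_EXIT_STATES = {
    "R": "Normal",
    "S": "System ABEND",
    "U": "User ABEND",
    "A": "Application ABEND",
    "F": "Flush",
    "M": "JEM",
}

STEP_CODE = re.compile(r"([RSUAFM])(\d{4})")


def parse_step_exit_code(code: str):
    """Return (state label, numeric return code) for a step exit code."""
    match = STEP_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a step exit code: {code!r}")
    return STEP_EXIT_STATES[match.group(1)], int(match.group(2))
```

For a code such as 'R0000' the function returns the label for a normal condition together with the return code 0.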
7.2. Job Exit Codes
The job exit code contains additional information about the job exit status and is provided in the format of XNNNNN, where X is either one of six characters (R, S, U, A, F or M) or one of two symbols (+, -), and each N is a digit between 0 and 9.
The meaning of each exit code is dependent on the following job status.
-
DONE
XNNNNN
The exit code of the last step (the state specified in 'X' is the same as the state specified in the step exit code).
-
ERROR
XNNNNN
The exit code of the step that caused an error (the state specified in 'X' is the same as the state specified in the step exit code).
-
STOP
The user issued a STOP command to terminate the process; therefore, the exit code is meaningless.
-
FLUSH
[+-]NNNNN
Sign | Description |
---|---|
+ | Indicates that the Runner detected an error and terminated execution; NNNNN is the return code from the Runner. If the error is a predefined error code in OpenFrame, the error code is returned. Otherwise, 1 is returned. |
- | Indicates that the Runner was terminated abnormally after receiving a signal; NNNNN is the Unix signal number. |
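The job exit code can be interpreted in a similar way. The Python sketch below is a hypothetical helper, not a TJES tool; for a '-' code it uses the standard `signal` module to look up a signal name on the machine where the script runs, which may differ from the node where the job actually ran.

```python
import re
import signal

JOB_CODE = re.compile(r"([RSUAFM+-])(\d{5})")


def describe_job_exit_code(code: str) -> str:
    """Return a one-line description of a job exit code in XNNNNN format."""
    match = JOB_CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a job exit code: {code!r}")
    kind, number = match.group(1), int(match.group(2))
    if kind == "+":
        return f"Runner detected an error, return code {number}"
    if kind == "-":
        try:
            name = signal.Signals(number).name   # local signal table
        except ValueError:
            name = "unknown signal"
        return f"Runner terminated by signal {number} ({name})"
    return f"step exit state {kind}, return code {number}"
```

Codes that start with a letter are passed through with the same state character as the step exit code, since for DONE and ERROR jobs the job exit code is taken from a step.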