Data Set Overview
This chapter provides a general overview of the data set including descriptions of the access methods, names, SMS classes, catalog registrations, management, and data set processing.
1. Overview
A data set is a file of logically linked data records. A record is the basic unit of information used in application programs. The fundamental difference between a data set and a UNIX file is that a data set provides access methods for record-level I/O.
Data sets are stored on secondary storage devices, such as hard disks or magnetic tapes, and the basic unit of such storage is a volume. While a hard disk can store all types of data sets, a magnetic tape can only store sequential data sets (SDS).
OpenFrame does not support real magnetic tape devices, but provides a set region of the hard disk mapped as a magnetic tape volume. Depending on the options, data sets stored on magnetic tape volumes can be kept in a compressed form.
2. Access Methods
An access method provides the functionality to read and store data in units of records. Each access method stores data in its own data set structure and provides interfaces and utilities to process those data sets. The access method used to process a given data set can therefore be identified from the structure of the data set. For example, sequential data sets are processed with BSAM or QSAM.
OpenFrame does not support the physical I/O operations executed through the mainframe’s Execute Channel Program (EXCP), because rehosted application programs rarely use the macros or functions of the mainframe’s low-level access methods. OpenFrame is not intended merely to provide compatibility with mainframe macros and functions, but to provide a cost-efficient solution that avoids significant, and therefore costly, changes to logical or external functionality and concepts.
The following are different types of data sets and their access methods:
-
BDAM(Basic Direct Access Method)
BDAM stores data records as the application instructs and accesses each record directly by its target address, which is counted from 0. The BDAM data set recognizes a block of data as one record.
BDAM is the most basic type of access method, but it is rarely used in today’s business applications because it has many constraints.
-
BSAM(Basic Sequential Access Method)
The sequential data set stores data records in the order of input. Utilities organize and store multiple records in a block. BSAM methods can read or write data at the block-level.
To process sequential data sets, OpenFrame provides QSAM, which itself manages a data block buffer, instead of BSAM, which depends on the user to manage the buffer.
-
QSAM(Queued Sequential Access Method)
As with BSAM, QSAM stores records in the order of input. QSAM uses a buffering scheme to improve I/O performance, and manages the buffer for the user. The GET interface reads the next sequential record from the input buffer, and the PUT interface writes the next sequential record to the output buffer. Tasks such as reading data from disk to the input buffer or storing data from the output buffer to disk are handled by the access method.
-
ISAM(Indexed Sequential Access Method)
ISAM writes records in the order of the key field, and accesses data records through the key.
OpenFrame supports ISAM data sets by internally mapping them to VSAM’s KSDS data sets. However, it is recommended, for unified management of data sets, to convert any indexed sequential data sets to VSAM’s KSDS from the start.
-
BPAM(Basic Partitioned Access Method)
BPAM stores data as a member of a partitioned data set (PDS). Each member of the partitioned data set can be viewed as a single sequential data set.
A partitioned data set contains directory information that maps the name of each member to the address of its data in the data set. OpenFrame organizes a partitioned data set as a UNIX directory.
Hitachi mainframe supports an access method called LIME, which is not supported in OpenFrame. Similar to a non-VSAM PDS, a LIME data set has multiple members. It implements PDS functionality based on VSAM and member-level locking.
-
VSAM(Virtual Storage Access Method)
VSAM is used for direct processing or sequential processing of fixed or variable records stored on disk. Data organized by VSAM must be registered in the catalog for quick access.
The VSAM data set is categorized into the following four types:
Type Description
Entry-sequenced Data Set (ESDS)
Writes each record in the order it was input. New records are added at the end of the data set. Sequential processing of records is possible and direct processing using relative byte address (RBA) is also possible.
Key-sequenced Data Set (KSDS)
Records are written in the ascending order of the key fields, and can be directly processed using the key fields. OpenFrame TSAM cannot perform direct processing on KSDS data sets with target addresses.
Relative Record Data Set (RRDS)
Records are written in the order of their relative record number (RRN), and direct processing of the record is possible only with their number. RRDS can either be of fixed length or of variable length.
Linear Data Set (LDS)
LDS writes data in units of blocks. LDS, unlike other VSAM data sets, does not contain any control information. OpenFrame TSAM does not support LDS.
When choosing an access method for a new data set, the first thing to consider is whether sequential processing or direct processing is required. For example, RRDS is ideally suited for application programs performing only direct processing or sequential processing. BSAM (sequential data sets) or VSAM ESDS data sets are suited for application programs performing sequential processing.
3. Data Set Names
A new data set must be created with a unique name (in JCL, the data set name is used as the DSNAME).
The data set name is composed of a single name or a link of multiple name segments, with each name segment representing a qualifier level. For example, a data set with the name TPLAB.IGKANG.TEST3 is composed of three name segments. The first name segment, TPLAB, is called the high level qualifier, and the last name segment, TEST3, is called the low level qualifier.
Data sets are created with the following naming rules:
-
Each name segment can have up to eight characters.
-
A name segment must start with either a letter of the alphabet (A-Z) or a special character (@, #, $).
-
The rest of the segment can be composed of letters of the alphabet (A-Z), numbers (0-9), special characters (@, #, $), or a hyphen (-).
-
A period (.) must be inserted between name segments as a separator.
-
The entire data set name is limited to 44 characters including the periods used to separate the name segments.
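The naming rules above can be checked mechanically. The following is a minimal Python sketch, not part of OpenFrame; the names `is_valid_dsname` and `qualifiers` are hypothetical, and the sketch assumes that lowercase letters are not accepted because the rules list only A-Z.

```python
import re

# One name segment: the first character is A-Z, @, # or $, followed by up to
# seven more characters from A-Z, 0-9, @, #, $ or hyphen (eight in total).
SEGMENT = re.compile(r"[A-Z@#$][A-Z0-9@#$-]{0,7}")
MAX_NAME_LENGTH = 44  # includes the periods that separate the segments


def is_valid_dsname(name):
    """Return True if name follows the naming rules listed above."""
    if not name or len(name) > MAX_NAME_LENGTH:
        return False
    # An empty segment (for example from "A..B") fails the segment pattern.
    return all(SEGMENT.fullmatch(seg) for seg in name.split("."))


def qualifiers(name):
    """Return the (high level, low level) qualifiers of a data set name."""
    segments = name.split(".")
    return segments[0], segments[-1]
```

For example, `is_valid_dsname("TPLAB.IGKANG.TEST3")` accepts the name used earlier in this section, and `qualifiers` splits it into TPLAB and TEST3.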
4. Managing Storage
Specifying the unit or volume serial in JCL every time you create a data set would be inefficient. In addition, managing storage yourself while running jobs would require systematic knowledge of storage management, which takes time to acquire.
OpenFrame provides SMS classes as a solution to these problems. This section describes how to use SMS classes (storage, management, data) for efficient storage management in the OpenFrame system.
4.1. Storage Class
The storage manager classifies all available storage by usage, according to the performance and capacity of each storage device. Each classified group of storage devices is then registered as a storage class in SMS during configuration.
After a storage class is registered in the system, it can be specified directly in JCL DD statements without the UNIT or VOL parameters. A data set that has a defined storage class is allocated space on one of the volume serials that belong to that storage class.
Additionally, if the attribute of the data set can be inferred from the data set name, it is possible to automatically define a storage class without having to do so in the JCL DD statement.
When the storage manager registers an automatic class selection (ACS) rule, the data set that matches the respective ACS rule is defined as the storage class set by the storage manager. Space for the data set is allocated to one of the volume serials categorized as a storage class.
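The idea of automatic class selection can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the rule table, the wildcard pattern syntax, the class names, and the function `select_storage_class` are all hypothetical and do not represent OpenFrame’s actual ACS rule format.

```python
from fnmatch import fnmatchcase

# Hypothetical ACS rules: (data set name pattern, storage class).
# Pattern syntax and class names are illustrative only.
ACS_RULES = [
    ("PAYROLL.*", "SECURE"),
    ("*.TEMP", "SCRATCH"),
]


def select_storage_class(dsname):
    """Return the storage class of the first rule that matches dsname."""
    for pattern, storage_class in ACS_RULES:
        if fnmatchcase(dsname, pattern):
            return storage_class
    return None  # no rule matched, so no class is assigned automatically
```

In this sketch the first matching rule wins, which is a design choice made for simplicity rather than documented behavior.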
4.2. Management Class
You can delete expired data sets with SMS classes.
The storage manager categorizes data sets according to their storage period and registers a management class for each category. As with the storage class, a registered management class can be specified explicitly in a JCL DD statement, or an ACS rule can assign it to the data set automatically.
Data sets defined with a management class are automatically deleted by the OpenFrame data set management server, ofrdmsvr, when the expiration date has passed.
4.3. Data Class
A data class is registered for the data set attributes that are used most frequently. Using data classes makes JCL easier to write.
Unlike storage classes and management classes, the attributes that a data class defines for a data set can be changed with JCL parameters.
The following example shows how a data set is created by explicitly setting the SMS class in the JCL without using the ACS rules (parameters VOL and UNIT are ignored in the example).
//MYJOB    JOB
//STEP1    EXEC PGM=IEFBR14
//MKDS1    DD DSNAME=IGKANG.TEST.DS0001,DISP=(NEW,CATLG),
//            DATACLAS=TEST,MGMTCLAS=NEVER,
//            STORCLAS=SECURE,VOL=SER=XXXXXX,UNIT=SYSDA
5. Registering Catalogs
The OpenFrame system uses volume table of contents (VTOC) and catalogs to manage the physical storage locations of the data sets. VTOC manages data set lists, data set information, physical disk address, and data set sizes in a single volume. A catalog contains data set attributes and information on the data set’s volume.
Data sets are registered in the catalog using the following:
-
JCL DISP parameter
-
Access method service ALLOCATE or DEFINE command
-
TSO ALLOCATE command
Already existing data sets can be registered in the catalog using the IDCAMS DEFINE RECATALOG command.
6. Processing Data Sets
Application programs process data sets in the following order in OpenFrame.
-
Allocate a data set and logically link it to the program.
The data in the allocated data set can be accessed using the interface provided by each access method.
Data set allocation involves the following two tasks:
-
Allocate disk space for the new data set.
A data set can be allocated using the following methods:
Method Description
Access Method Services (AMS)
Uses IDCAMS (JSCVSUT/KQCAMS) ALLOCATE command.
Job Control Language (JCL)
Defines data sets in JCL.
-
Logically link the JOB STEP with the data set.
-
Use the data set’s DDNAME to open the data set.
-
Use the access method to read data from or write data to the data set.
-
Close the data set.
-
Unallocate the data set.
Non-VSAM data sets are also processed in the same steps.
6.1. Allocate/Unallocate
When you attempt to access an existing data set, the system uses the specified data set information in JCL and the catalog to find the data set address on a physical storage device.
When you try to create a new data set, the system uses the specified data set information to secure storage space for the data set on disk. The process of finding the storage device address for an existing data set or securing storage space for a new data set is called allocation.
-
Allocation for a new data set requires information about the volume and device where the data set is to be created and the data set size.
-
Allocation for an existing data set requires the volume and device information. If the data set is already in the catalog, only the data set name is required.
When allocating a data set to process it, the system prepares a structure called the data control block (DCB). The DCB is composed of buffers used to contain all the information required for data set processing and real-time I/O reporting.
Just as a data set must be allocated before processing, it must be unallocated after processing. The data set disposition method described in the JCL specifies how the system is to process the data set when unallocating it. For example, the disposition method for a temporary data set decides whether to retain the data set until the task is finished or to delete it right away. The disposition method for a permanent data set decides whether to register it to the master catalog or to the user catalog.
6.2. Open/Close
Before accessing a specific data set, the application must use the OPEN interface to open a data set. The OPEN interface logically links applications, access methods, and data sets together using a DCB.
The OPEN interface creates the DCB structure, initializes the data set, and prepares the system buffer for data set processing.
The DCB structure is composed of the following three components:
-
Catalog information
-
JCL DD statements
-
Application information
The CLOSE interface disconnects the link between the application and the data set.
When the data set is closed, the system performs the following tasks:
-
Recording data that has not been physically written
-
Updating catalog information (such as data set size)
-
Releasing the DCB structure used for data set processing
6.3. OPEN Mode
When opening a data set, you must specify an OPEN mode. The OPEN mode used depends on whether the application is performing a read, write, or read/write operation.
Data sets can be accessed through one of the following four OPEN modes:
-
INPUT Mode
When the INPUT mode is specified, the data set can only be read. An error occurs when you attempt to modify the data set by writing, updating, or deleting a record. If an attempt is made to open a non-existent data set and the OPTIONAL parameter is not set, an error message is displayed.
-
OUTPUT Mode
When the OUTPUT mode is specified, the application creates a new data set to write records in. If an attempt to read a record from the data set is made, an error occurs. When an existing data set is opened for writing in OUTPUT mode, all previous records in the data set are deleted.
-
INPUT/OUTPUT Mode
When the INPUT/OUTPUT mode is specified, it is possible to read from and write to the data set. If an attempt is made to open a non-existent data set and the OPTIONAL parameter is not set, an error message is displayed.
-
EXTEND Mode
Unlike the OUTPUT mode, the EXTEND mode is used for writing to pre-existing data sets. In EXTEND mode, the application writes new records at the end of the file. If an attempt is made to open a non-existent data set and the OPTIONAL parameter is not set, an error message is displayed.
If a non-existent data set is opened and the OPTIONAL parameter is set, an empty data set will be created.
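The behavior of the four OPEN modes can be summarized in a small Python model. This is a sketch only, not an OpenFrame interface: the data set store is an ordinary dictionary, the mode name "I-O" stands for the INPUT/OUTPUT mode, writing in that mode is simplified to appending, and `open_dataset`, `Handle`, and `DataSetError` are hypothetical names.

```python
class DataSetError(Exception):
    """Raised when an OPEN request or record operation is not allowed."""


class Handle:
    """An open data set together with the operations its OPEN mode permits."""

    def __init__(self, records, can_read, can_write):
        self.records = records
        self.can_read = can_read
        self.can_write = can_write

    def read(self, index):
        if not self.can_read:
            raise DataSetError("data set is not open for reading")
        return self.records[index]

    def write(self, record):
        if not self.can_write:
            raise DataSetError("data set is not open for writing")
        self.records.append(record)  # new records go to the end


def open_dataset(store, name, mode, optional=False):
    """Open name in store (a dict of name -> record list) in the given mode."""
    if mode not in ("INPUT", "OUTPUT", "I-O", "EXTEND"):
        raise ValueError("unknown OPEN mode: " + mode)
    if mode == "OUTPUT":
        store[name] = []  # creates the data set or discards existing records
    elif name not in store:
        if not optional:
            raise DataSetError(name + " does not exist")
        store[name] = []  # OPTIONAL is set: an empty data set is created
    return Handle(
        store[name],
        can_read=mode in ("INPUT", "I-O"),
        can_write=mode in ("OUTPUT", "I-O", "EXTEND"),
    )
```

Opening an existing data set in EXTEND mode and calling `write` adds a record after the existing ones, while opening the same data set in OUTPUT mode empties it first.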
6.4. Access Mode
The data set access mode is the order in which records in a data set are processed. It is not an attribute of the data set itself; it describes how the application is going to access the data set.
The following are data set access modes:
-
Sequential Access
Sequential access mode reads or writes records in order from the first record to the last. Sequential data sets can only be accessed sequentially, while ESDS, RRDS, and KSDS also support sequential access. For an RRDS, the order is determined by the RRN; for a KSDS, it is determined by a single alphanumeric key, either the primary key or one of the secondary keys.
-
Random Access
Random access mode processes records in the order the user wants. All ESDS, RRDS and KSDS support random access. To access data set records in this mode, the user must specify a record identifier, such as an RBA, RRN, or key value, depending on the data set structure type.
-
Dynamic Access
Dynamic access mode can use both sequential access and random access to the data set during application runtime. All ESDS, RRDS, and KSDS support dynamic access.
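The three access modes can be illustrated with a toy key-ordered record store in Python. This is a sketch under stated assumptions, not OpenFrame’s KSDS implementation: the class `KeyedRecords` and its methods are hypothetical, and the sketch assumes that a random read leaves the sequential position untouched, which a real access method may not guarantee.

```python
from bisect import bisect_left


class KeyedRecords:
    """Toy key-ordered record store used to contrast the access modes."""

    def __init__(self, records):
        self._keys = sorted(records)  # keys in ascending order
        self._data = dict(records)
        self._pos = 0  # position of the next sequential read

    def read_random(self, key):
        """Random access: fetch one record directly by its key."""
        return self._data[key]

    def start(self, key):
        """Position sequential reading at the first key >= the given key."""
        self._pos = bisect_left(self._keys, key)

    def read_next(self):
        """Sequential access: return the next record in key order, or None."""
        if self._pos >= len(self._keys):
            return None
        key = self._keys[self._pos]
        self._pos += 1
        return self._data[key]
```

An application using dynamic access would mix the calls, for example reading one record with `read_random` and then calling `start` followed by repeated `read_next` calls to scan from a chosen key.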
6.5. Record Access
After a data set is opened, records within the data set can be processed using various interfaces provided by the access method. The READ interface, for example, can be used to read a record from the data set and the WRITE interface to add records to the data set.
The following are the four record access interfaces:
-
WRITE (Insert/Add Records)
A new record is added to the end of a sequential data set.
In a VSAM ESDS, you cannot insert a new record in the middle of a data set; all records are added at the end of the data set. VSAM KSDS and RRDS, however, allow new records to be added anywhere in the data set, using the record key for a KSDS or the RRN for an RRDS. Attempting to add a new record with a duplicate key value triggers the DUPLICATE RECORD error message.
-
READ (Read Records)
-
Sequential Access
When the application attempts sequential access to a data set for the first time, it always starts reading from the first record. However, if a random access has already been performed on the data set, the internal record pointer may have changed. In such cases, it is recommended to use the START interface to initialize the record pointer before using the sequential access interface.
-
Random Access
VSAM ESDS, KSDS and RRDS support random access through record identifiers. Random access mode is not affected by the internal record pointer. To find the desired record, one of the following is required for each VSAM data set type.
Data Set    Information
KSDS        KEY
RRDS        RRN
ESDS        RBA
Although an ESDS can be randomly accessed by using the RBA value, it is not suitable for most business applications.
-
REWRITE (Update Records)
The REWRITE interface is used to update a record. However, the primary key of a KSDS record cannot be updated.
-
DELETE (Delete Records)
The DELETE interface is used to delete a record. After deleting an RRDS record, the space can be reused. However, an ESDS record cannot be deleted.
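The per-type rules for WRITE, REWRITE, and DELETE described above can be modeled in a few lines of Python. This is an illustrative sketch, not OpenFrame code: the classes `Esds`, `Ksds`, `Rrds`, and `RecordError` are hypothetical, RRNs are assumed to start at 1, and reporting DUPLICATE RECORD for an occupied RRDS slot is an assumption.

```python
class RecordError(Exception):
    """Raised when a record operation breaks a data set rule."""


class Esds:
    def __init__(self):
        self.records = []

    def write(self, record):
        self.records.append(record)  # records are always added at the end

    def delete(self, index):
        raise RecordError("ESDS records cannot be deleted")


class Ksds:
    def __init__(self):
        self.records = {}

    def write(self, key, record):
        if key in self.records:
            raise RecordError("DUPLICATE RECORD")
        self.records[key] = record

    def rewrite(self, key, record):
        if key not in self.records:
            raise RecordError("record not found")
        self.records[key] = record  # only the data changes; the key is fixed

    def delete(self, key):
        del self.records[key]


class Rrds:
    def __init__(self, slots):
        self.slots = [None] * slots

    def _index(self, rrn):
        if not 1 <= rrn <= len(self.slots):
            raise RecordError("RRN out of range")
        return rrn - 1  # assumption: RRNs are numbered from 1

    def write(self, rrn, record):
        i = self._index(rrn)
        if self.slots[i] is not None:
            raise RecordError("DUPLICATE RECORD")  # assumed for occupied slots
        self.slots[i] = record

    def delete(self, rrn):
        self.slots[self._index(rrn)] = None  # the slot can be reused
```

Because `Ksds.rewrite` takes the key only as a lookup argument, the sketch has no way to change a primary key, which mirrors the restriction on REWRITE.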