EBCDIC → ASCII Conversion Issues

To migrate resources from a legacy mainframe system to an open system environment using OpenFrame, you must convert the resources from EBCDIC to ASCII.

There are a number of issues that may arise during the conversion process, especially in relation to COBOL application source code. The three main issues are:

  • Hexadecimal value processing

  • Character sort order processing

  • Double byte space processing

This chapter describes the causes of these issues and provides some solutions to them.

1. Hexadecimal Processing

The following sample COBOL application source file includes logic that uses hex values. This source has been converted to ASCII by using the dsmigin tool.

01  WORK06-AREA.
           05  W06-KOUZA-NO.
               10  FILLER                  PIC X(01)  VALUE  X'F1'.
               10  W06-NO-1                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'F2'.
               10  W06-NO-2                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'F3'.
               10  W06-NO-3                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'F4'.
               10  W06-NO-4                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'F5'.
               10  W06-NO-5                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'F6'.
               10  W06-NO-6                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'F7'.
               10  W06-NO-7                PIC X(01).

Although the previous example seems to be converted successfully from EBCDIC to ASCII, it may contain the following problems.

Let’s look at the following line from the sample source.

10  FILLER                  PIC X(01)  VALUE  X'F1'.

During the conversion process, the application logic has been unintentionally modified.

Assume that X'F1' in the original code was intended to represent the EBCDIC character '1' rather than the raw byte value F1. A typical character set conversion converts only character data and leaves the digits of a hex literal unchanged, so X'F1' no longer denotes '1' after conversion. Therefore, after converting the source code to ASCII, you must manually change the value to X'31', which is the ASCII code for '1', as shown in the following.

10  FILLER                  PIC X(01)  VALUE  X'31'.

The following is the result of converting the original source file to ASCII by using dsmigin and then replacing each EBCDIC hex value with its ASCII hexadecimal equivalent.

01  WORK06-AREA.
           05  W06-KOUZA-NO.
               10  FILLER                  PIC X(01)  VALUE  X'31'.
               10  W06-NO-1                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'32'.
               10  W06-NO-2                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'33'.
               10  W06-NO-3                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'34'.
               10  W06-NO-4                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'35'.
               10  W06-NO-5                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'36'.
               10  W06-NO-6                PIC X(01).
               10  FILLER                  PIC X(01)  VALUE  X'37'.
               10  W06-NO-7                PIC X(01).
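
The manual replacement above can be sketched in Python. This is only an illustration and is not part of dsmigin or OpenFrame: the helper names are hypothetical, and cp037 is assumed as a stand-in for the mainframe's EBCDIC code page. It should be applied only to literals that are already known to represent characters.

```python
import re

# Assumption: cp037 stands in for the mainframe's EBCDIC code page.
EBCDIC_CODEC = "cp037"

# Matches COBOL hex literals such as X'F1' or X'F1F2'.
HEX_LITERAL = re.compile(r"X'((?:[0-9A-Fa-f]{2})+)'")


def ebcdic_hex_to_ascii_hex(hex_digits: str) -> str:
    """Reinterpret EBCDIC character bytes as the equivalent ASCII bytes.

    Raises UnicodeEncodeError when a byte has no ASCII counterpart, which
    suggests the literal holds binary data and should be left alone.
    """
    text = bytes.fromhex(hex_digits).decode(EBCDIC_CODEC)
    return text.encode("ascii").hex().upper()


def rewrite_hex_literals(line: str) -> str:
    """Rewrite every hex literal on one source line (hypothetical helper)."""
    return HEX_LITERAL.sub(
        lambda m: "X'" + ebcdic_hex_to_ascii_hex(m.group(1)) + "'", line
    )


print(rewrite_hex_literals("10  FILLER  PIC X(01)  VALUE  X'F1'."))
```

A script like this cannot decide whether a given literal is character data or a binary value, so its output still has to be reviewed line by line.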

The hexadecimal processing problem is common in source code that explicitly specifies hex values. However, the opposite situation may also occur: a character literal may actually stand for a hex value, as in the following example.

01  YEAR-TABLE.
           05  FILLER                  PIC  X(12)  VALUE '{{{{{{JJJJJJ'.
           05  FILLER                  PIC  X(12)  VALUE '{{{{{{{JJJJJ'.
           05  FILLER                  PIC  X(12)  VALUE '{{{{{{{{JJJJ'.
           05  FILLER                  PIC  X(12)  VALUE '{{{{{{{{{JJJ'.
           05  FILLER                  PIC  X(12)  VALUE '{{{{{{{{{{JJ'.
           05  FILLER                  PIC  X(12)  VALUE '{{{{{{{{{{{J'.
           05  FILLER                  PIC  X(12)  VALUE '{{{{{{{{{{{{'.
           05  FILLER                  PIC  X(12)  VALUE 'A{{{{{{{{{{{'.
           05  FILLER                  PIC  X(12)  VALUE 'AA{{{{{{{{{{'.
           05  FILLER                  PIC  X(12)  VALUE 'AAA{{{{{{{{{'.
           05  FILLER                  PIC  X(12)  VALUE 'AAAA{{{{{{{{'.
           05  FILLER                  PIC  X(12)  VALUE 'AAAAA{{{{{{{'.

In the previous example, a similar problem occurs with the character '{'. Here '{' is not used as a character. It stands for the zoned decimal (ZD) value X'C0', which is the EBCDIC code for '{'.

In this case, the ASCII ZD value X'30' corresponds to the mainframe ZD value X'C0', so each '{' must be changed to '0', the ASCII character for X'30', to preserve the application logic. The characters 'A' and 'J' in the example are changed to '1' and 'q' in the same way. It is recommended that you make these changes manually after the source code has been converted from EBCDIC to ASCII, as shown in the following.

01  YEAR-TABLE.
           05  FILLER                  PIC  X(12)  VALUE '000000qqqqqq'.
           05  FILLER                  PIC  X(12)  VALUE '0000000qqqqq'.
           05  FILLER                  PIC  X(12)  VALUE '00000000qqqq'.
           05  FILLER                  PIC  X(12)  VALUE '000000000qqq'.
           05  FILLER                  PIC  X(12)  VALUE '0000000000qq'.
           05  FILLER                  PIC  X(12)  VALUE '00000000000q'.
           05  FILLER                  PIC  X(12)  VALUE '000000000000'.
           05  FILLER                  PIC  X(12)  VALUE '100000000000'.
           05  FILLER                  PIC  X(12)  VALUE '110000000000'.
           05  FILLER                  PIC  X(12)  VALUE '111000000000'.
           05  FILLER                  PIC  X(12)  VALUE '111100000000'.
           05  FILLER                  PIC  X(12)  VALUE '111110000000'.
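
The character replacement in the table above can be expressed as a small translation table. The sketch below is an assumption-laden illustration rather than an OpenFrame tool: only '{' to '0', 'A' to '1', and 'J' to 'q' come from the example, and the remaining pairs assume the target compiler follows the same pattern (X'30' to X'39' for positive digits, X'70' to X'79' for negative digits). The function name is hypothetical.

```python
# EBCDIC zoned-decimal sign characters:
#   '{' and A-I are X'C0'-X'C9' (+0 to +9)
#   '}' and J-R are X'D0'-X'D9' (-0 to -9)
EBCDIC_POSITIVE = "{ABCDEFGHI"
EBCDIC_NEGATIVE = "}JKLMNOPQR"

# Assumption: the open-system compiler uses '0'-'9' (X'30'-X'39') for
# positive digits and 'p'-'y' (X'70'-X'79') for negative digits.
ASCII_POSITIVE = "0123456789"
ASCII_NEGATIVE = "pqrstuvwxy"

ZD_TABLE = str.maketrans(
    EBCDIC_POSITIVE + EBCDIC_NEGATIVE,
    ASCII_POSITIVE + ASCII_NEGATIVE,
)


def convert_zoned_literal(literal: str) -> str:
    """Translate a literal in which every character is a zoned-decimal digit."""
    return literal.translate(ZD_TABLE)


print(convert_zoned_literal("{{{{{{JJJJJJ"))
print(convert_zoned_literal("AAAAA{{{{{{{"))
```

The translation is only valid for literals in which every character is a zoned decimal digit. Applying it to ordinary text would corrupt any letters from A to R.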

As these examples show, the difficulty in converting hex values is that the source text alone does not reveal whether a hex literal represents a raw binary value or an EBCDIC character, or whether a character literal represents a character or a binary value.

To interpret hex values correctly, you must analyze COBOL program source files according to COBOL syntax and BMS map files according to BMS macro syntax.

To evaluate hex values in a source file, an in-depth analysis must be performed by an experienced TmaxSoft consultant before the source code is converted from EBCDIC to ASCII. An automatic analysis tool is planned for a future version of OpenFrame.

2. Character Sort Order Processing

After source code is converted from EBCDIC to ASCII, you may not detect any visible errors at first. However, because of the unique characteristics of each character set, problems may become evident when you compile and run the program.

This section describes the character sort order problem caused by character set conversion and provides a solution to it.

The following example is from source code that has been converted from EBCDIC to ASCII.

IF      W01-XX     <=       '99'     THEN
        MOVE    'Y'    TO      W01-CC
ELSE
        MOVE    'N'    TO      W01-CC
END-IF.

Although the previous example seems to be converted successfully from EBCDIC to ASCII, the conversion process may have unintentionally modified the application logic.

The following shows the sort orders for EBCDIC and ASCII characters.

  • EBCDIC: a < z < A < Z < 0 < 9

  • ASCII: 0 < 9 < A < Z < a < z

In the previous example, assume that the value of W01-XX is 'AA'. If the program is run on a mainframe, letters sort before digits, so 'AA' <= '99' is true and W01-CC is set to 'Y'. If the same example is converted to ASCII and run on UNIX, digits sort before letters, so the comparison is false and W01-CC is set to 'N'.

To ensure that application logic is preserved, you must manually modify the previous example as follows:

IF      W01-XX     <=      'zz'       THEN
        MOVE    'Y'    TO      W01-CC
ELSE
        MOVE    'N'    TO      W01-CC
END-IF.
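
The difference in collating sequence can be reproduced by comparing the encoded bytes of the same strings. The following Python sketch assumes cp037 as the EBCDIC code page, and the function names are illustrative only.

```python
# Assumption: cp037 stands in for the mainframe's EBCDIC code page.
EBCDIC_CODEC = "cp037"


def ebcdic_key(value: str) -> bytes:
    """Sort key that follows the EBCDIC collating sequence."""
    return value.encode(EBCDIC_CODEC)


def ascii_key(value: str) -> bytes:
    """Sort key that follows the ASCII collating sequence."""
    return value.encode("ascii")


def less_or_equal(left: str, right: str, key) -> bool:
    """Compare two strings byte by byte, as an alphanumeric compare does."""
    return key(left) <= key(right)


samples = ["a", "z", "A", "Z", "0", "9"]
ebcdic_order = sorted(samples, key=ebcdic_key)
ascii_order = sorted(samples, key=ascii_key)

print(ebcdic_order)
print(ascii_order)
print(less_or_equal("AA", "99", ebcdic_key))  # mainframe behavior
print(less_or_equal("AA", "99", ascii_key))   # open system behavior
```

Running a comparison through both keys is a quick way to check whether a particular literal in an IF statement changes meaning after conversion.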

The character sort order issue described above can be addressed to some extent by modifying the user program, although doing so is not simple. However, there is another character sort order problem that may seriously affect application end users.

Application developers generally understand that they need to account for sort order differences between EBCDIC and ASCII ('ZZ' < '99' in EBCDIC and '99' < 'ZZ' in ASCII). However, application end users are generally not aware of this difference.

The following example illustrates the sort order issue that is presented to an end-user.

[User Address List]
--------------------------------------------------------------------------

    ID : AAAAAAAA


  ID              NAME                 ADDRESS
 -------------------------------------------------------------------------
  AAAAAAAA        KIM                  SEOUL
  BBBBBBBB        LEE                  PUSAN
  CCCCCCCC        PARK                 SEOUL
  HHHHHHHH        AHN                  DAEGU
  LLLLLLLL        CHO                  GWANGJU
  MMMMMMMM        CHOI                 INCHEON
  NNNNNNNN        KWAK                 BUPYOUNG
  XXXXXXXX        IM                   SUNGNAM
  ZZZZZZZZ        SEO                  GURI
--------------------------------------------------------------------------
 <F1> Menu    <F2> Prev     <F3> Next                       <Enter> Search

If this application is run on a mainframe, "AAAAAAAA" (the smallest ID value) can be used to query the entire ID list. However, if the same application is converted to ASCII and run on UNIX, querying "AAAAAAAA" does not return the entire ID list, because IDs that begin with a digit now sort before "AAAAAAAA".

As another example, assume that the ID "11111111" exists. If the application is run on a mainframe, pressing <F3> displays "11111111" on the next screen. But if the application is converted to ASCII and run on UNIX, no IDs beyond "ZZZZZZZZ" are displayed. End users accustomed to the mainframe environment might not realize that the ID "11111111" exists in the system.

The following example shows how to query the entire ID list in an open system environment by using "00000000" instead of "AAAAAAAA".

[User Address List]
--------------------------------------------------------------------------

    ID : 00000000


  ID              NAME                 ADDRESS
 -------------------------------------------------------------------------
  11111111        NOH                  SEOUL
  88888888        KANG                 DAEJEON
  AAAAAAAA        KIM                  SEOUL
  BBBBBBBB        LEE                  PUSAN
  CCCCCCCC        PARK                 SEOUL
  HHHHHHHH        AHN                  DAEGU
  LLLLLLLL        CHO                  GWANGJU
  MMMMMMMM        CHOI                 INCHEON
  NNNNNNNN        KWAK                 BUPYOUNG
--------------------------------------------------------------------------
 <F1> Menu    <F2> Prev     <F3> Next                       <Enter> Search
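
The two screens can be modeled with a simplified keyed browse that returns every ID greater than or equal to the search key. This Python sketch is not the application's actual file access logic: the function name is hypothetical, the ID list is taken from the screens above, and cp037 is assumed as the EBCDIC code page.

```python
IDS = [
    "AAAAAAAA", "BBBBBBBB", "CCCCCCCC", "HHHHHHHH", "LLLLLLLL", "MMMMMMMM",
    "NNNNNNNN", "XXXXXXXX", "ZZZZZZZZ", "11111111", "88888888",
]


def browse_from(start_id: str, ids: list, codec: str) -> list:
    """Return the IDs at or after start_id in the collating order of codec."""
    def key(value: str) -> bytes:
        return value.encode(codec)

    return sorted((i for i in ids if key(i) >= key(start_id)), key=key)


# Mainframe (EBCDIC, assumed cp037): "AAAAAAAA" reaches every ID.
mainframe_list = browse_from("AAAAAAAA", IDS, "cp037")

# Open system (ASCII): the same key skips the numeric IDs.
open_list = browse_from("AAAAAAAA", IDS, "ascii")

# Open system (ASCII): "00000000" reaches every ID.
open_full_list = browse_from("00000000", IDS, "ascii")

print(len(mainframe_list), len(open_list), len(open_full_list))
```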

The character sort order issue is most easily identified through a professional analysis of each user application. This must be performed by an experienced TmaxSoft consultant. Tools will be provided in future versions of OpenFrame to automatically perform this analysis.

3. 2-byte Space Processing

In a mainframe environment, a 2-byte space is X'4040', which is identical to two consecutive 1-byte spaces (X'40'). The following COBOL example depends on this equivalence.

               10  W-K-00.
                   20  W-K-00-1                    PIC  X(01).
                   20  W-K-00-2                    PIC  X(01).
               10  W-K-01        REDEFINES       W-K-00.
                   20  W-K-01-1                    PIC  G(01).

     * . . .

         MOVE SPACE TO W-K-01-1.

         IF W-K-00-1 = SPACE THEN
              DISPLAY 'DOUBLE BYTE SPACE = SINGLE BYTE SPACE * 2'
         END-IF.

If this COBOL program is executed on a mainframe, the message "DOUBLE BYTE SPACE = SINGLE BYTE SPACE * 2" is displayed. If, however, the same application is converted to ASCII and then executed on UNIX, no message is displayed. This demonstrates that converting an application that uses 2-byte spaces for an open system environment can unintentionally change the application logic.

Double-byte Korean characters are converted to EUC-KR for use in an open environment. However, the EUC-KR character set uses X'A1A1' for a 2-byte space and X'20' for a 1-byte space. This means that the equation "double byte space = single byte space + single byte space" does not hold for the EUC-KR character set.
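
The broken equivalence can be checked directly. The Python sketch below takes the mainframe value X'4040' from the text, assumes cp037 for the 1-byte EBCDIC space, and assumes that the 2-byte space corresponds to the full-width space U+3000 as encoded by Python's euc_kr codec.

```python
# Mainframe side: the 2-byte space value is taken from the text.
MAINFRAME_DBCS_SPACE = bytes.fromhex("4040")
mainframe_sbcs_space = " ".encode("cp037")  # assumption: cp037 code page

# Open system side: assumption that the 2-byte space is U+3000.
open_dbcs_space = "\u3000".encode("euc_kr")
open_sbcs_space = " ".encode("euc_kr")

# Does "2-byte space = 1-byte space + 1-byte space" hold?
mainframe_holds = MAINFRAME_DBCS_SPACE == mainframe_sbcs_space * 2
open_holds = open_dbcs_space == open_sbcs_space * 2

print("mainframe:", mainframe_holds)
print("open system:", open_holds)
```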

This problem may or may not be solvable depending on the functions provided by the compiler.

OpenFrame attempts to resolve this problem by avoiding 2-byte spaces wherever possible. If necessary, OpenFrame uses two 1-byte spaces to replace 2-byte spaces. Sometimes 2-byte spaces can be ignored, easily resolving the problem.
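
The idea of replacing each 2-byte space with two 1-byte spaces can be sketched as follows. This is not how OpenFrame implements it. The helper name is hypothetical, and U+3000 is assumed to be the 2-byte space.

```python
WIDE_SPACE = "\u3000"  # assumption: the 2-byte space is U+3000


def replace_wide_spaces(text: str) -> str:
    """Replace each 2-byte space with two 1-byte spaces (hypothetical helper)."""
    return text.replace(WIDE_SPACE, "  ")


# Two Korean syllables separated by a 2-byte space.
field = "\uac00\u3000\ub098"
normalized = replace_wide_spaces(field)

# The encoded field keeps its byte length, so record layouts are unaffected.
print(len(field.encode("euc_kr")), len(normalized.encode("euc_kr")))
```

Because one 2-byte space becomes exactly two bytes, a fixed-length field keeps its size, and a comparison against SPACE on either byte succeeds again.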

You can use the OpenFrame CPM utility in situations where you must use 2-byte spaces, such as when OpenFrame data must be transferred to a mainframe environment or to a TN3270 terminal emulator where 1-byte and 2-byte characters cannot be intermixed.