JEUS MQ Failover
This chapter describes how a JMS client can recover from a JEUS MQ server or network failure and re-establish the connection. It also explains the server configuration and server failure recovery that are required for JMS client recovery.
1. Overview
In JEUS MQ, when a failure occurs, the failover functionality lets a client automatically reconnect to the server and restore the connection to its state before the failure.
Reasons for failure can be classified into the following two categories.
-
Network Failure
If a network failure occurs, a JEUS MQ server can no longer communicate with a client. This may be because the network is temporarily down or completely unavailable, or because the server is down.
If a network failure occurs, a JEUS MQ client attempts to reconnect to the failed server or to its backup server. If the attempt succeeds, the client state is recovered, and services become available again.
-
Server Failure
Server failure includes all types of failures except network failures. In general, server failures are caused by disk or database failures or by insufficient memory. When a server fails, a standby backup server automatically restores its data and continues to provide the service.
To handle such failures, you must configure the network between JEUS MQ servers and the required JEUS MQ client settings. Failover properties can also be configured for each client by calling the client API provided by JEUS MQ.
2. Server Failover
This section describes the network configuration and other required JEUS MQ failover settings.
2.1. Network Configuration
To use JEUS MQ failover, one or more active servers must be clustered. Standby servers are optional. They provide backup and additional capacity during a failure. For more information about JEUS clustering configuration, refer to JEUS Clustering in JEUS Domain Guide.
-
Active Server
The main server that processes client requests during normal operation.
-
Standby Server
The backup server that provides the services of the active server when it fails.
JEUS MQ clustering and JEUS MQ failover functions are integrated. Therefore, configuring a JEUS MQ cluster also enables JEUS MQ failover. Unlike previous versions, active and standby servers are not configured as pairs, which allows servers to be configured more flexibly. A typical configuration consists of several active servers and a few standby servers, or of active servers only.
To enable failover, the network between MQ servers is configured as in the following figure.
When an active server fails, an available standby server takes over its operations. If another active or standby server then fails, another available standby server takes over that server's operations. If no standby server is available, an available active server takes over, thereby providing the services that two servers normally provide. This continues until only one server is available. When the last available server fails, the JEUS MQ failover service no longer works.
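The takeover order described above can be modeled as a small selection routine. This is a hypothetical sketch, not JEUS MQ code. `TakeoverPlanner`, `Server`, and `Role` are illustrative names. The model assumes that a standby server counts as available only until it has taken over one failed server; after that, it acts like an active server.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the takeover order; not a JEUS MQ API.
enum Role { ACTIVE, STANDBY }

final class Server {
    final String name;
    final Role role;
    boolean alive = true;
    boolean coveringFailedServer = false; // true once this standby has taken over a failed server

    Server(String name, Role role) {
        this.name = name;
        this.role = role;
    }
}

final class TakeoverPlanner {
    // Marks 'failed' as down and returns the server that takes over its work:
    // an idle standby first, then an active server, then any surviving server.
    // Returns empty when no server is left, i.e. failover can no longer work.
    static Optional<Server> chooseSuccessor(List<Server> cluster, Server failed) {
        failed.alive = false;
        for (Server s : cluster) {
            if (s.alive && s.role == Role.STANDBY && !s.coveringFailedServer) {
                s.coveringFailedServer = true;
                return Optional.of(s);
            }
        }
        for (Server s : cluster) {
            if (s.alive && s.role == Role.ACTIVE) {
                return Optional.of(s); // an active server now serves two workloads
            }
        }
        for (Server s : cluster) {
            if (s.alive) {
                return Optional.of(s); // a busy standby is the only survivor
            }
        }
        return Optional.empty();
    }
}
```

In this model, a cluster of active servers only simply passes each failed server's work to the next surviving active server.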
Active and Standby Server Configuration
The following is an example of setting up failover between the Active server and Standby server.
-
Active Server Configuration
Active Server Failover
domain.xml
<domain>
    ...
    <jms-engine>
        <engine-roll>Active</engine-roll>
        <failover-check-timeout>5</failover-check-timeout>
        <failover-check-count>0</failover-check-count>
    </jms-engine>
</domain>
The following describes each configuration tag.
Tag | Description |
---|---|
<engine-roll> | Specifies the role of the JMS Engine: Active (handles service during normal operation) or Standby (takes over service if the Active engine fails). (Default: Active) |
<failover-check-timeout> | Specifies the duration (in seconds) to wait before rechecking the availability of the target JMS Engine after a failure is detected, prior to initiating failover. This value represents the time taken for a single attempt. (Default: 5) |
<failover-check-count> | Specifies the maximum number of times to recheck the availability of the target JMS Engine after a failure is detected, prior to initiating failover. If the specified number of attempts to check the engine’s availability fails, the engine is considered unavailable, and failover is initiated. If the value is set to 0, failover occurs immediately after a failure is detected. (Default: 0) |
-
Standby Server Configuration
To configure the Standby server, specify 'Standby' within the <engine-roll> tag.
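The way <failover-check-timeout> and <failover-check-count> combine can be pictured as a simple retry loop. The following is a hypothetical model, not JEUS MQ code. The class `FailoverCheck` and the `engineAvailable` probe are illustrative names. The sketch assumes that each check first waits the full timeout and then probes the engine once.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical model of <failover-check-timeout> and <failover-check-count>; not JEUS code.
final class FailoverCheck {
    // Called after a failure of the target engine has been detected.
    // Returns true if failover should be initiated, false if the engine recovered in time.
    static boolean shouldFailOver(BooleanSupplier engineAvailable,
                                  long checkTimeoutSeconds, int checkCount) {
        for (int check = 1; check <= checkCount; check++) {
            try {
                TimeUnit.SECONDS.sleep(checkTimeoutSeconds); // one check takes this long
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return true; // stop checking and treat the engine as unavailable
            }
            if (engineAvailable.getAsBoolean()) {
                return false; // the engine is reachable again, so no failover
            }
        }
        // All rechecks failed, or checkCount was 0: fail over now.
        return true;
    }
}
```

With the sample values above (timeout 5, count 0), the loop body never runs, and failover starts as soon as the failure is detected.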
2.2. Configuring Connection Factories
Failover enables connection factories to redirect connection requests when a JEUS MQ server is unavailable.
The following is a sample connection factory configuration, defined within the <connection-factory> tag in the domain.xml file.
<domain>
    ...
    <jms-engine>
        <connection-factory>
            <type>queue</type>
            <name>qcf</name>
            <service>default</service>
            <reconnect-enabled>true</reconnect-enabled>
            <reconnect-period>0</reconnect-period>
            <reconnect-interval>5000</reconnect-interval>
        </connection-factory>
    </jms-engine>
</domain>
The following describes each configuration tag.
Tag | Description |
---|---|
<reconnect-enabled> | Specifies whether to reconnect when a failure occurs. To enable client recovery through reconnection in the event of a failure, set the value to true. When enabled, the JEUS MQ client continuously attempts to reconnect to both the Active and Standby servers. (Default: false) |
<reconnect-period> | Specifies the time period during which reconnection is attempted. If set to 0 (the default), reconnection attempts continue indefinitely. (Default: 0) |
<reconnect-interval> | Specifies the wait time between reconnection attempts. (Default: 5) |
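The interaction of the three reconnection settings can be sketched as a loop that alternates between the Active and Standby servers. This is an illustrative model, not the JEUS MQ client implementation. `ReconnectLoop` and the `tryConnect` callback are hypothetical. The sketch assumes both the interval and the period are expressed in milliseconds, because the table above does not state the unit.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of reconnect-interval / reconnect-period; not a JEUS MQ API.
final class ReconnectLoop {
    // Tries the servers in turn until one accepts the connection.
    // intervalMillis: wait between attempts; periodMillis: total retry window, 0 = retry forever.
    // Returns the server that was reconnected to, or null if the period expired.
    static String reconnect(List<String> servers, Predicate<String> tryConnect,
                            long intervalMillis, long periodMillis) {
        long start = System.nanoTime();
        int index = 0;
        while (true) {
            String target = servers.get(index);
            if (tryConnect.test(target)) {
                return target;
            }
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000L;
            if (periodMillis > 0 && elapsedMillis >= periodMillis) {
                return null; // reconnect period exhausted; give up
            }
            index = (index + 1) % servers.size(); // alternate between the servers
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```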
2.3. Configuring Persistence Stores
When the DeliveryMode is set to PERSISTENT, messages are saved in a persistence store.
When a server fails, another active or standby server can retrieve the failed server's messages from the persistence store and continue to provide the service seamlessly. The persistence store is therefore a key resource of the JEUS MQ failover function.
When configuring a persistence store for JEUS MQ failover, make sure it is located in a path that both the active and standby servers can access.
-
Journal Log Persistence Store
To use the journal log as the persistence store, the base journal log directory (Base Dir in the journal log configuration) must be located under a directory that both the active and standby servers can access. This requires shared-disk hardware such as a SAN, on which the journal log base directory is created.
-
JDBC Persistence Store
To use JDBC as the persistence store, configure a data source under the <jms-engine><persistence-store><jdbc> tags in domain.xml. However, to ensure service continuity in the event of a database failure, you must also set up failover mechanisms using clustering technologies such as Tibero TAC or Oracle RAC.
If a server cannot access the persistence store, failover will be attempted with another server that can access the persistence store.
2.4. Automatic Failback
When an active server fails over to another active or standby server, the server administrator must quickly identify possible reasons for failure and restore the failed server.
When the active server restarts, the data is migrated from the backup server to the active server, and the connected clients are reconnected to the restarted server. This process is called failback. Failback is always performed automatically.
3. Client Failover
If a JEUS MQ client is disconnected from the server due to a server or network failure, the client attempts to reconnect, alternating between an active server and a standby server. Once reconnected, the client restores its connection to the state it was in before the disconnection. This client failover process is performed automatically through JEUS MQ configuration, without any changes to the client application source code.
This section describes the details and restrictions of client failover process and explains how to handle a failure without message loss.
3.1. Reconnection
The "Reconnect Enabled" option determines whether to try to reconnect if the connection between a client and server is lost. This applies to all connections that are established through the connection factory. For more information, refer to Connection Factory Configuration.
To modify the reconnection configuration of a particular connection, use the "jeus.jms.client.facility.connection.JeusConnection" class, which is the JEUS MQ client API.
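. . .
import javax.naming.Context;
import javax.naming.InitialContext;
import jakarta.jms.ConnectionFactory;
import jeus.jms.client.facility.connection.JeusConnection;
. . .
Context ctx = new InitialContext();
// lookup() returns Object, so the result must be cast to ConnectionFactory
ConnectionFactory factory = (ConnectionFactory) ctx.lookup("connection-factory");
JeusConnection connection = (JeusConnection) factory.createConnection("jeus", "jeus");
// Override the connection factory's reconnection settings for this connection only
connection.setReconnectEnabled(true);
connection.setReconnectInterval(1000);  // wait 1 second between attempts
connection.setReconnectPeriod(3600000); // keep retrying for up to 1 hour
. . .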
When Reconnect Enabled is set to true, the entire reconnection process is performed automatically within the client application, without modifying the client source code.
3.2. Reusing the Connection Factory
In JEUS MQ, active and standby servers use the same connection factory. Once a connection factory is obtained through a JNDI lookup, it can be reused without having to look it up again when a server or network failure occurs.
3.3. Reusing Destinations
As with connection factories, active and standby servers share the same destination names. Once a destination is obtained through a JNDI lookup, it can be reused without looking it up again when a server or network failure occurs.
When a server fails, all the messages stored at the destination are restored, and the client can continue to process the messages by using the destination.
3.4. Request Blocking Time
All requests sent from JEUS MQ clients wait for a response from the server for a specific amount of time. (Default Value: 200000, Unit: ms). This wait time is configured in the <request-blocking-time> tag under the Connection Factory section in domain.xml.
To configure settings for each connection, you can use the JEUS MQ Client API "jeus.jms.client.facility.connection.JeusConnection" class.
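. . .
import javax.naming.Context;
import javax.naming.InitialContext;
import jakarta.jms.ConnectionFactory;
import jeus.jms.client.facility.connection.JeusConnection;
. . .
Context ctx = new InitialContext();
ConnectionFactory factory = (ConnectionFactory) ctx.lookup("connection-factory");
JeusConnection connection = (JeusConnection) factory.createConnection("jeus", "jeus");
// Wait up to 5 minutes for each server response on this connection
connection.setRequestBlockingTime(300000); // 300,000 ms = 5 minutes
. . .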
RequestBlockingTime is also used as the default transaction timeout for session transactions and CA transactions.
3.5. Connection Recovery
When connection recovery is not configured, JEUS MQ connections share a single physical connection (a socket) by default. However, if the <reconnect-enabled> element of the connection factory configuration in domain.xml is set to true, each connection is given its own socket (a one-to-one mapping) to support failover.
When physical and logical connections have a one-to-one relationship, a new physical connection must be created whenever a new connection is created. Since this can degrade performance, the client application must be implemented to reuse connections instead of creating a new one each time.
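A common way to follow this advice is to create the connection once and share it. The class below is a hypothetical, library-independent sketch. `ConnectionHolder` lazily creates a single instance from any `Supplier` in a thread-safe way. With JEUS MQ, the supplier would wrap a call such as `factory.createConnection("jeus", "jeus")`.

```java
import java.util.function.Supplier;

// Hypothetical helper that creates one connection on first use and reuses it afterwards.
final class ConnectionHolder<C> {
    private final Supplier<C> factory;
    private volatile C connection;

    ConnectionHolder(Supplier<C> factory) {
        this.factory = factory;
    }

    // Double-checked locking: only the first caller creates the connection.
    C get() {
        C c = connection;
        if (c == null) {
            synchronized (this) {
                c = connection;
                if (c == null) {
                    c = factory.get();
                    connection = c;
                }
            }
        }
        return c;
    }
}
```

The JMS specification allows a Connection to be used concurrently, but not a Session. The shared object should therefore be the connection, while each thread creates its own sessions from it.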
On a connection recovery, the connection state is also recovered.
-
Start state
If Connection.start() has been called to receive messages, then it continues to receive messages after the connection recovers.
-
Stop state
If Connection.stop() has been called to stop receiving messages, then it does not receive messages after the connection recovers.
Other objects created from the connection, including sessions and connection consumers, are also restored.
-
Session Recovery
Sessions that were created through a connection are restored when the connection recovers, unless Session.close() was called before the failure. For more information, refer to Session Recovery.
-
Connection Consumer Recovery
Connection consumers that were created through a connection are restored when the connection recovers, unless ConnectionConsumer.close() was called before the failure. If the connection was in the start state before the failure, the consumer starts receiving messages again after the recovery. Since the messages that were received before the failure are all returned to the server and retrieved again, the Message.getJMSRedelivered() method may return "true" for these messages.
After the failure is recovered, methods that create sessions or connection consumers resend their requests and wait for a response. If Connection.close() is invoked, recovery is not performed, regardless of whether a response has been received.
3.6. Session Recovery
Sessions are automatically restored during the connection recovery process unless Session.close() has been called. Objects created from a session, such as MessageConsumers and MessageProducers, are also restored.
A session implements methods for creating various objects. The following shows how each method is used after recovery.
-
Message Creation Method
Message creation methods complete immediately, regardless of the failure.
createBytesMessage() createMapMessage() createMessage() createObjectMessage() createObjectMessage(Serializable object) createStreamMessage() createTextMessage() createTextMessage(String text)
-
Queue Browser Creation Method
Queue browser creation methods complete the request after the failure is recovered. If the failure is not recovered within the RequestBlockingTime, a JMSException is raised.
createBrowser(Queue queue) createBrowser(Queue queue,String messageSelector)
-
Destination Creation Method
Destination creation methods complete the request after the failure is recovered. If the failure is not recovered within the RequestBlockingTime, a JMSException is raised.
createQueue(String queueName) createTopic(String topicName)
-
Temporary Destination Creation Method
Temporary destination creation methods complete the request regardless of failure.
createTemporaryQueue() createTemporaryTopic()
-
Message Consumer Creation Method
Message consumer creation methods complete the request after the failure is recovered. If the failure is not recovered within the RequestBlockingTime, a JMSException is raised.
createConsumer(Destination destination) createConsumer(Destination destination, java.lang.String messageSelector) createConsumer(Destination destination, java.lang.String messageSelector, boolean noLocal)
-
Durable Message Subscriber Creation Method
Durable message subscriber creation methods complete the request after the failure is recovered. If the failure is not recovered within the RequestBlockingTime, a JMSException is raised.
createDurableSubscriber(Topic topic, String name) createDurableSubscriber(Topic topic,String name, String messageSelector,boolean noLocal)
-
Message Producer Creation Method
Message producer creation methods complete the request after the failure is recovered. If the failure is not recovered within the RequestBlockingTime, a JMSException is raised.
createProducer(Destination destination)
If an error occurs in the session, the session transaction is affected in the following cases.
Operation | Description |
---|---|
commit() | If an error occurs while messages are sent or received through a message producer or consumer created by the transacted session, a "jakarta.jms.TransactionRolledBackException" is thrown at the first commit after the error, and the transaction is rolled back. If there are no messages to commit at that point, the exception is not thrown. Even after a commit fails, subsequent commits on the session are executed normally. If a failure that occurs during the commit operation is not recovered within the RequestBlockingTime, a JMSException is raised. In this case, the result of the commit must be checked by using the administration tool. |
rollback() | rollback() completes the rollback request after the failure is recovered. If a failure that occurs during the rollback operation is not recovered within the RequestBlockingTime, a JMSException is raised. Even after such a JMSException, the rollback is performed normally. |
Session.recover() completes its request after the failure has been recovered. If a failure occurs after Session.recover() is called and is not recovered within the RequestBlockingTime, a JMSException is raised.
When the acknowledge mode of the session is Session.CLIENT_ACKNOWLEDGE, calling Message.acknowledge() acknowledges all unacknowledged messages in the session. If an error occurs during the acknowledgement, the ExceptionListener receives a jeus.jms.common.message.MessageAcknowledgeException for each affected message. The exception indicates that an error occurred during message acknowledgement and that the message may be redelivered.
The MessageID of the failed message can be obtained by calling MessageAcknowledgeException.getErrorCode().
3.7. Transmission Error Message Recovery
This section describes how to handle the errors that occur while sending messages through message producers.
The send() method of the message producer is blocked until the message is sent to the server and a response is returned. The following describes possible error scenarios for this process.
-
The send() method is called, but the message has not yet been sent.
After recovery, the message is sent to the server and processed successfully. If the failure is not recovered before the RequestBlockingTime expires, a JMSException is raised.
-
The send() method is called, and the message was processed on the server. However, a network error occurs.
Once the client reconnects after recovery, the response message is received successfully. If the failure is not recovered before the RequestBlockingTime expires, a JMSException is raised.
-
The send() method is called, and the message was processed on the server. However, a server error occurs, and then the server recovers.
Even after the client reconnects, it cannot be determined whether the message was successfully transmitted. Therefore, a "jeus.jms.common.message.MessageSendException" is issued through the ExceptionListener after the RequestBlockingTime expires.
-
The send() method is called, but before the message is processed on the server, a network or server error occurs.
Even after the client reconnects, it cannot be determined whether the message was successfully transmitted. Therefore, a "jeus.jms.common.message.MessageSendException" is issued through the ExceptionListener after the RequestBlockingTime expires.
The MessageID of the failed message can be obtained by calling MessageSendException.getErrorCode().
3.8. Reception Error Message Recovery
JEUS supports synchronous and asynchronous message reception methods, which perform recovery in different ways. Synchronous message reception methods are described first, followed by asynchronous message reception methods.
Recovery of Synchronously Received Messages
A message consumer provides three methods for receiving messages synchronously: MessageConsumer.receive(), MessageConsumer.receive(long timeout), and MessageConsumer.receiveNoWait().
The following describes what happens when an error occurs during each method call.
Operation | Description |
---|---|
receive() | This method blocks until a message arrives. During a failure, however, a message may not arrive for a long time, so the wait is limited to the RequestBlockingTime to avoid an indefinite wait. If the failure is recovered before this time expires, the receive request is sent again. Otherwise, a JMSException is raised. If Session.AUTO_ACKNOWLEDGE is set, an acknowledgement is sent to the server before the message is passed to the client. If an error occurs during this acknowledgement, the ExceptionListener receives a jeus.jms.common.message.MessageAcknowledgeException for each unacknowledged message. The exception indicates that an error occurred during message acknowledgement and that the message may be redelivered. |
receive(long timeout) | This method blocks until a message arrives or the timeout expires. During a failure, a timeout greater than the RequestBlockingTime is reduced to the RequestBlockingTime to avoid a long wait. If the failure is recovered before the timeout expires, the receive request is sent again. Otherwise, a JMSException is raised. If Session.AUTO_ACKNOWLEDGE is set, an acknowledgement is sent to the server before the message is passed to the client. If an error occurs during this acknowledgement, the ExceptionListener receives a jeus.jms.common.message.MessageAcknowledgeException for each unacknowledged message. The exception indicates that an error occurred during message acknowledgement and that the message may be redelivered. |
receiveNoWait() | This method does not block. It immediately returns the next message if one has arrived, or null if none is available. |
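The timeout adjustment for receive() and receive(long timeout) amounts to capping the wait at the RequestBlockingTime while a failure is being recovered. The helper below is a hypothetical illustration, not a JEUS MQ API. It assumes that a timeout of 0 means "wait indefinitely", as in the JMS API.

```java
// Hypothetical helper illustrating the receive timeout cap; not a JEUS MQ API.
final class ReceiveWait {
    // Returns the wait (ms) applied to a synchronous receive while a failure is being recovered.
    // timeoutMillis <= 0 stands for receive() or receive(0), which would otherwise block indefinitely.
    static long effectiveWait(long timeoutMillis, long requestBlockingTimeMillis) {
        if (timeoutMillis <= 0) {
            return requestBlockingTimeMillis;
        }
        return Math.min(timeoutMillis, requestBlockingTimeMillis);
    }
}
```

With the default RequestBlockingTime of 200000 ms, a receive(1000) call keeps its 1-second timeout. A receive() call or a receive(500000) call waits at most 200000 ms.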
Recovery of Asynchronously Received Messages
Asynchronously received messages fall into three categories: messages being processed by MessageListener.onMessage(), messages being acknowledged after MessageListener.onMessage() has processed them, and prefetched messages waiting in the client queue.
Each category goes through a different failover process.
-
If a failure occurs while onMessage() is being processed, the failure is recovered first, and then the acknowledgement is sent. After this, the message is processed normally.
-
If a failure occurs while an acknowledgement is being delivered, the ExceptionListener will issue a jeus.jms.common.message.MessageAcknowledgeException for the message that has not been acknowledged. The exception indicates that a failure occurred during message acknowledgement, and the message may be redelivered.
-
If a failure occurs while prefetched messages are waiting in the client queue, the messages are returned to the server once the failure is recovered and are then delivered to the client again. The Message.getJMSRedelivered() method may return "true" for these messages.
The MessageID of the failed message can be obtained by calling MessageAcknowledgeException.getErrorCode().
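Because messages can be delivered again after recovery, consumers that must not process a message twice often remember the MessageIDs they have already handled. `RedeliveryFilter` below is a hypothetical sketch of that idea, not part of JEUS MQ. It keeps a bounded, in-memory, insertion-ordered set of IDs, and the capacity is an arbitrary choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical duplicate filter keyed by JMS MessageID; not part of JEUS MQ.
final class RedeliveryFilter {
    private final Map<String, Boolean> processed;

    RedeliveryFilter(int capacity) {
        // Insertion-ordered LinkedHashMap that evicts the oldest ID once 'capacity' is exceeded.
        this.processed = new LinkedHashMap<String, Boolean>() {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    // True if a message with this ID was already processed successfully.
    synchronized boolean alreadyProcessed(String messageId) {
        return processed.containsKey(messageId);
    }

    // Record the ID only after processing succeeded, so failed messages are not filtered on redelivery.
    synchronized void markProcessed(String messageId) {
        processed.put(messageId, Boolean.TRUE);
    }
}
```

In a MessageListener, one might skip a message when `alreadyProcessed(message.getJMSMessageID())` returns true, and call `markProcessed` after processing succeeds. Because the set lives only in memory, it does not survive a client restart. Transactions, described in the next section, are the more complete safeguard.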
3.9. Message Loss Prevention and Transactions
JEUS MQ failover is processed automatically and transparently in a client application. However, messages that are lost during transmission must be handled separately through the ExceptionListener.
Message loss in an enterprise messaging application can be critical. The only way to perfectly recover from a failure while preventing message loss is to use transactions.
It is strongly recommended that applications be implemented as follows.
-
In the Jakarta EE environment, messages have to be sent and received within a transaction.
-
For servlets, look up the UserTransaction through JNDI, and send and receive messages within that UserTransaction.
-
For EJBs, set the TransactionAttribute of the EJB method to "Required" or "RequiresNew" so that messages are sent and received within a transaction.
General Java clients create a session by calling Connection.createSession(true, Session.SESSION_TRANSACTED). Such a session sends and receives messages within a transaction, which is completed by calling commit() or rollback().
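The transacted pattern recommended above has the same shape in every environment: do the work, commit, and roll back on failure so that no message is lost. The sketch below is hypothetical and library-independent. `TxSession` is a stand-in for the commit() and rollback() methods of a transacted jakarta.jms.Session. The bounded retry count is an assumption, not a JEUS MQ requirement.

```java
// Hypothetical stand-in for the commit()/rollback() part of a transacted jakarta.jms.Session.
interface TxSession {
    void commit() throws Exception;
    void rollback() throws Exception;
}

final class TransactedWork {
    // Runs one unit of messaging work in the session's transaction and commits it.
    // On any failure the transaction is rolled back and the work is retried,
    // up to maxAttempts times. Returns true once a commit succeeds.
    static boolean runInTransaction(TxSession session, Runnable work, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                work.run();       // send/receive messages through the transacted session
                session.commit(); // may fail, e.g. with TransactionRolledBackException after a failover
                return true;
            } catch (Exception failure) {
                try {
                    session.rollback(); // discard partial work so received messages are redelivered
                } catch (Exception ignored) {
                    // This sketch simply retries; a real client would log the rollback failure.
                }
            }
        }
        return false;
    }
}
```

With a real session, `work` would receive and send messages through that session. A TransactionRolledBackException thrown by commit() after a failover would then lead to another attempt instead of a lost message.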