WebSphere MQ for iSeries Best Practice Guide
Version 1.1: Initial Release

Table of Contents

1 Introduction
2 WebSphere MQ maintenance and software rollout
  2.1 Software maintenance
  2.2 Managing rollouts
3 General housekeeping
  3.1 Creating and changing objects
  3.2 Journal housekeeping
  3.3 Shared memory cleanup
  3.4 Queue manager shutdown
  3.5 12-Step / Cold Start procedure
  3.6 Backups
    3.6.1 Data
    3.6.2 Object definitions
  3.7 Multiple queue managers
  3.8 Daylight saving time
    3.8.1 Spring time change
    3.8.2 Autumn or fall time change
4 Performance
  4.1 Journal receiver location
  4.2 Journal receiver switching
  4.3 Restart time
  4.4 Channel process pooling
  4.5 IFS Type 2
5 Availability
  5.1 High Availability (HA) clustering
    5.1.1 Remote mirroring
  5.2 High Availability features
    5.2.1 WebSphere MQ clustering
    5.2.2 Client Channel table
  5.3 Network topology
6 Conclusion

1 Introduction

This document is intended for people who manage WebSphere® MQ 5.2 or 5.3 software on iSeries machines. It offers advice on maintaining, managing, and improving the availability of your WebSphere MQ on iSeries installation.
This guide has been written for both novice and expert users of WebSphere MQ and encapsulates general best practice information collated by the IBM® development and service teams from customer installations. It is divided into four main sections: WebSphere MQ maintenance and software rollout, General housekeeping, Performance, and Availability.

This guide should be used in conjunction with, and not as a replacement for, the WebSphere MQ publications. Full details of the WebSphere MQ administration and programming interfaces are available in those publications; the latest editions of all books can be downloaded from the WebSphere MQ Web site.

WebSphere MQ for iSeries – Best Practice Guide Page 2 of 15

2 WebSphere MQ maintenance and software rollout

2.1 Software maintenance

You should ensure that you keep current with maintenance. Preventative maintenance for WebSphere MQ is delivered by distributing Corrective Service Diskettes (CSDs) to customers (as Final PTFs), and corrective maintenance is carried out via iSeries Test Fix delivery. A CSD contains a number of cumulative fixes; these PTFs are made available every 3-4 months. The entire set of CSD PTFs can be ordered using a single "marker" PTF.

A Test Fix is an OS/400-specific method of delivery. It lets a customer use the Load PTF (LODPTF) and Apply PTF (APYPTF) CL commands to apply the fix, and the Remove PTF (RMVPTF) CL command to remove it. A Test Fix (unlike a PTF) is not the final delivery vehicle for the fix; the final fix is delivered in the CSD. Test Fixes cannot be superseded by another PTF (all Test Fixes need to be applied on top of the last CSD), so a Test Fix must be removed before applying a new CSD or another Test Fix that fixes the same object.
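As an illustration, a Test Fix supplied as a PTF could be applied and later removed with the standard PTF commands. This is a sketch only: the licensed program ID and PTF number shown are placeholders, not real fix identifiers, and the options to use should be confirmed with the instructions shipped with the fix.

    LODPTF LICPGM(5733A38) DEV(*SERVICE) SELECT(SI12345)
    APYPTF LICPGM(5733A38) SELECT(SI12345) APY(*TEMP)
    /* Before applying a new CSD, remove the Test Fix */
    RMVPTF LICPGM(5733A38) SELECT(SI12345) RMV(*TEMP)

Applying the fix temporarily (APY(*TEMP)) makes it straightforward to remove again with RMVPTF when the superseding CSD arrives.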
Producing Test Fixes that cannot supersede other Test Fixes allows WebSphere MQ Service to target individual fixes to specific customers, because there are no dependencies that force a Test Fix to drag in fixes devised for other customers. This process is consistent with the other WebSphere MQ platforms on the common code base (though the fixes are packaged differently). It allows emergency fixes to be delivered quickly, in a format that lets customers use standard PTF commands to apply, remove, and track these fixes.

You should not need to apply every CSD/PTF on the day it is issued, but you should be aware of what has been released. It is a good idea to check the list of APARs included in each PTF; expect that at least one of the APARs could show up on your system unless you upgrade. Plan to install PTFs on a test machine first, and then migrate them to production systems.

If a high-impact pervasive problem arises between the release of WebSphere MQ CSDs, then, as with other iSeries products, the service team will release a set of HIPER (High Impact PERvasive) PTFs so that a fix can be provided to all customers before the next CSD is released. You should ensure that you are current with both WebSphere MQ and other iSeries HIPER PTFs. Regularly review the Preventive Service Planning (PSP) information that is available online to iSeries customers; it lists all new important PTFs released for each supported version of the operating system, and also lists any PTFs that are known to be defective and should not be applied to your system.

New full versions of WebSphere MQ have historically been released every 18-24 months, with an overlap of service of approximately 12-15 months. When a new release arrives, make early plans to upgrade to it. It is advisable to schedule your upgrade before service has been withdrawn for the older version.
It takes time to test applications and roll the product into production systems, and last-minute upgrades can run into problems if this testing is not done. We know that some sites take several months to implement a replacement version as they go through the tiers of testing. Problems need to be found early, so that you have time to fix them before the final end-of-service deadline.

The latest version of WebSphere MQ for iSeries at the time of writing (October 2003) is V5.3. This release contains many functional enhancements over V5.2. It also contains fixes for all the relevant, known defects that were included in the V5.2 PTFs available before V5.3 shipped. Future fixes made to V5.2 will also be included in V5.3 PTFs where appropriate. See the WebSphere MQ Support Service summary for OS/400 for a summary of the problems fixed in each PTF; this page also shows the end-of-service date for each version of WebSphere MQ.

2.2 Managing rollouts

All application and system software should be fully tested, using production workloads and configurations, on separate machines or partitions. Many customers use a multi-tier approach covering development, system testing, production testing, and real production systems. Programs and configurations are moved between these environments using locally defined practices. A fundamental rule is that no changes should be made without change control and audit trail mechanisms.

Do not go into production without properly testing your systems. This normally means having a spare machine of equivalent processing power that can run workloads similar to those of the production system. The machines should also, of course, use identical levels and versions of all software.
While we cannot make specific recommendations, we know that any successful enterprise will have implemented a staged rollout process, testing at each stage to ensure that WebSphere MQ and applications meet the availability and quality requirements. Testing needs to cover performance, capacity, and disaster recovery in addition to basic application function. A self-contained, repeatable regression test suite that can be enhanced as necessary contributes greatly to a successful rollout.

Logical partitions can be used for some tests, as these are equivalent to having separate machines. Any recommendations in this document that refer to machines can equally be applied to separate partitions. The only difference is that the CPU cycles of a machine are shared between the partitions, so you cannot test the peak performance of a dedicated machine.

3 General housekeeping

3.1 Creating and changing objects

It is good practice, when creating queue manager objects, to put the object creation commands into an MQSC script or CL command program. If you always create WebSphere MQ objects programmatically, you have a record of the queue manager definitions that allows you to rebuild an entire queue manager very quickly. When changing WebSphere MQ objects, change the definition of the object in the program and rerun the program, rather than changing the object directly.

3.2 Journal housekeeping

On iSeries, WebSphere MQ periodically issues messages AMQ7460 and AMQ7462 (AMQ7467 and AMQ7468 on other platforms) to indicate which journal receivers are no longer needed for crash or media recovery. Older journal receivers should be deleted or archived once they are no longer needed by WebSphere MQ. This should be done by an automated task that runs regularly, perhaps once per day, depending on the number of journal receivers used each day.
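As a minimal illustration of the deletion step (the library and receiver names are placeholders, and a production task would first examine the AMQ7460/AMQ7462 messages to confirm that the receiver is genuinely no longer needed for recovery), a detached receiver can be archived and then removed with standard CL commands:

    /* Optionally archive the old receiver before deleting it */
    SAVOBJ OBJ(AMQA012345) LIB(QMGRLIB) DEV(TAP01) OBJTYPE(*JRNRCV)
    /* Delete the receiver once WebSphere MQ no longer needs it */
    DLTJRNRCV JRNRCV(QMGRLIB/AMQA012345)

Only receivers older than those named in the WebSphere MQ messages should ever be removed in this way.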
For a discussion and a sample program showing how to automate the housekeeping of journal receivers, see the article "Automating Journal Management in WebSphere MQ for iSeries".

3.3 Shared memory cleanup

Customers have occasionally experienced problems starting a queue manager that have led them to clear out the shared memory and semaphore files found in the IFS file system. Manual deletion of these files is not recommended. These files are not the shared memory segments and semaphores used by WebSphere MQ; they are simply used by the UNIX ftok() function to provide a unique key to the actual shared memory and semaphores. The existence of these files provides a placeholder that ensures WebSphere MQ does not attempt to create new shared resources with the same key as existing shared resources. Manually deleting these files will orphan shared memory and semaphores, which may result in unexpected errors.

In normal circumstances it should not be necessary to manually clear WebSphere MQ shared memory. The correct way to clean out shared memory in WebSphere MQ V5.2 is to end all queue managers by issuing the command:

    ENDMQM MQMNAME(*ALL) ENDCCTJOB(*YES)

In V5.3, shared memory can be purged for a single queue manager by specifying a queue manager name on the command:

    ENDMQM MQMNAME(queuemanager) ENDCCTJOB(*YES)

3.4 Queue manager shutdown

Customers have found that they occasionally have trouble shutting down their queue managers on busy systems within a reasonable period of time. Often this is related to the time required to end large numbers of channel jobs connected to the queue manager. The recommended way to quiesce a queue manager is to use the End Message Queue Manager (ENDMQM) command with ENDCCTJOB(*YES) to end connected jobs, listeners, and so on. The ENDCCTJOB(*YES) option forces the queue manager to record media images (with an implicit RCDMQMIMG) before it is shut down.
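For example, the recommended quiesce for a single queue manager looks like this (MYQM is an illustrative queue manager name):

    /* End the queue manager, its connected jobs, and its listeners;
       media images are recorded (implicit RCDMQMIMG) before shutdown */
    ENDMQM MQMNAME(MYQM) OPTION(*IMMED) ENDCCTJOB(*YES)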
The implicit RCDMQMIMG can take some time. To speed up ENDMQM OPTION(*IMMED) ENDCCTJOB(*YES), you can first issue an ENDMQM *CTRLD. This command returns immediately and flags the queue manager as ending; issuing this initial controlled end prevents the implicit RCDMQMIMG. The subsequent ENDCCTJOB(*YES) will still run, ending the queue manager immediately and shutting down any jobs connected to it. If you bypass the RCDMQMIMG on ENDMQM in this way, you should consider issuing a manual RCDMQMIMG command when the queue manager restarts, to ensure that you have up-to-date media images of the queues in the journal.

Splitting queue managers may also help to alleviate this problem by distributing the channel jobs between the queue managers (see section 3.7). Migrating to WebSphere MQ V5.3 offers some other options for solving this problem:

• WebSphere MQ V5.3 offers a new feature called channel process pooling, which allows channels to run as threads within a pool of processes. Each job can run 60 channel threads, which significantly reduces the number of channel jobs on the system (see section 4.4).
• WebSphere MQ V5.3 has improvements to prevent channel jobs being started by the channel listener when a queue manager is quiescing. This lowers the overhead on the system when a queue manager shuts down.

3.5 12-Step / Cold Start procedure

In the past, some customers have experienced problems starting their WebSphere MQ queue managers on iSeries. These problems have manifested themselves as failures in the Start Message Queue Manager (STRMQM) command, or as apparent "hangs" while running STRMQM (in fact this is more likely to be STRMQM running extremely slowly; a problem in this area was fixed with CSD06 in V5.2). To recover from these problems, the queue manager is often restarted using a procedure referred to as the "12-step" or "cold start" procedure.
The cold start procedure involves deleting the existing AMQAJRN journal and journal receivers for the queue manager, and creating new empty journals and journal receivers.

Performing a cold start is not best practice. When WebSphere MQ journals and journal receivers are deleted, it must be understood that there is some danger that WebSphere MQ messages could be lost or duplicated. An explanation of why follows.

Normal start-up processing involves reconciling the data in the AMQAJRN journal with the data in the IFS queue files. When a queue manager is restarted using the cold start procedure, there is no journal data to replay, so WebSphere MQ cannot perform its normal start-up processing. The start-up reconciliation is necessary because WebSphere MQ uses a technique called "write-ahead" logging. This means that whenever WebSphere MQ puts or gets persistent messages, the put or get is recorded in two ways:

1. With a forced disk write to the AMQAJRN journal, meaning that WebSphere MQ waits for OS/400 to confirm that the disk has been physically updated.
2. With a lazy write to the queue: WebSphere MQ caches some data in memory, and writes some data to the IFS queue file with an OS/400 unforced write. The unforced write is stored in OS/400 file system buffers and written to disk at the operating system's convenience.

The journal data on disk is therefore the master copy of WebSphere MQ data and is always more up to date than the queue file data. In normal circumstances, when WebSphere MQ is shut down and restarted, STRMQM processing ensures that the data in the queue files is brought up to date with the data in the journals. If a queue manager or system terminates abnormally, information about messages may exist in the journal but not in the queue files. Deleting the journal as part of a cold start obviously deletes the only copy of these messages.
Thus the cold start, by deleting the journals and bypassing any STRMQM reconciliation, could lead to loss, duplication, or corruption of data. Because the data in the journal receivers is so important to the recovery of MQ data, we strongly recommend that the journal receivers are stored on RAID-protected disk, and that the cold start is avoided wherever possible.

IBM is committed to improving the performance of the WebSphere MQ start-up procedure, which will make the cold start procedure unnecessary. However, IBM cannot investigate some start-up problems because customers use the cold start procedure to restart the queue manager without reporting the problem to IBM support. These start-up problems will continue to occur unless IBM can get good diagnostic information about them. IBM therefore requests that, in these circumstances, the queue manager data (queue manager library and IFS) is saved, and a STRMQM command is run with WebSphere MQ trace (TRCMQM) turned on, before the 12-step cold start procedure is taken. The resulting data will give IBM the best possible chance of finding and fixing STRMQM problems.

3.6 Backups

Two methods to consider when planning a backup and recovery strategy for WebSphere MQ are data backup and object definition backup. These methods are complementary, and most enterprises successfully implement a combination of the two techniques.

3.6.1 Data

It is necessary to quiesce a queue manager before fully backing up its IFS and journal data. However, you can take a backup of just the journal data while the queue manager is running. If the backed-up journal data is restored over an older backup of the entire queue manager, it is possible to fully recover a queue manager and its data. The WebSphere MQ for iSeries V5.3 System Administration Guide, Chapter 7, has instructions on how to do this.
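As a sketch of such a full backup (the queue manager name, library name, and save device are assumptions for illustration; the library naming convention and the complete procedure should be checked against the System Administration Guide):

    /* Quiesce the queue manager before a full backup */
    ENDMQM MQMNAME(MYQM) OPTION(*IMMED) ENDCCTJOB(*YES)
    /* Save the queue manager library */
    SAVLIB LIB(QMMYQM) DEV(TAP01)
    /* Save the queue manager's IFS data */
    SAV DEV('/QSYS.LIB/TAP01.DEVD') OBJ(('/QIBM/UserData/mqm/qmgrs/MYQM'))
    /* Restart the queue manager */
    STRMQM MQMNAME(MYQM)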
It is a good idea to perform a full backup of the WebSphere MQ libraries and IFS directories occasionally (for example, when the queue manager is quiesced to apply maintenance). Journal backups can be taken more frequently, for example daily or weekly.

3.6.2 Object definitions

Data backups let you recover object definitions and messages up to the last backup, but if you are using WebSphere MQ for non-persistent messaging, it is simpler just to back up your WebSphere MQ queue manager definitions. This lets you quickly recreate a copy of a queue manager. SupportPac MS03 saves the queue manager definitions as MQSC commands that can be replayed to recreate the objects. The SupportPac can be downloaded from:

http://www.ibm.com/software/integration/support/supportpacs/individual/ms03.html

MS03 does not save or restore the queue manager's authority records, but a utility program (AMQOAMD) ships with WebSphere MQ for this purpose. Calling AMQOAMD with the "-s" flag dumps the authority records in the form of GRTMQMAUT commands that can be replayed to recreate them. To dump the authority commands to a file, the following approach can be used:

    OVRDBF FILE(STDOUT) TOFILE(&OUTLIB/&OUTFILE) MBR(&OUTMEMB)
    CALL AMQOAMD PARM('-m' &MQMNAME '-s')
    DLTOVR FILE(STDOUT)

MS03 and AMQOAMD provide a quick and lightweight way of backing up and restoring a queue manager's definitions from one machine to another, or even between queue managers on a single machine. The alternative to using MS03 is one of the third-party system management tools that hold queue manager configurations in a central repository.

3.7 Multiple queue managers

Prior to V5.1, WebSphere MQ on iSeries supported only a single queue manager per machine, and so many disparate applications were forced to share a single queue manager.
Despite the introduction of multiple queue manager support in V5.1, many customers are still using one queue manager for all their applications. Sharing queue managers in this way can lead to problems that can be alleviated by allocating individual queue managers to applications. Changing a WebSphere MQ topology from a shared queue manager to multiple queue managers can involve changes to applications, and so is not a trivial undertaking. The following advantages and disadvantages should be reviewed when considering how your applications are distributed over queue managers.

Advantages of a single queue manager:

• Low overhead - There is a fixed minimum overhead associated with a queue manager. Each queue manager starts at least five jobs before any applications connect. The fewer queue managers, the fewer system resources used.
• Management - An increase in the number of queue managers may increase the management tasks that need to be performed. Running each queue manager in a separate subsystem is a good strategy to simplify queue manager maintenance if multiple queue managers are used.

Advantages of multiple queue managers:

• Failure impact - A single queue manager is a single point of failure for all applications. Multiple queue managers can reduce the impact of a failure to only the application(s) or location(s) served by that queue manager.
• Scalability - On a single queue manager system, one high-volume application can adversely affect the performance of all other applications using the queue manager. Multiple queue managers give scope for greater throughput on a queue-manager-by-queue-manager basis. For example, if a queue manager's performance is suffering because of a bottleneck when writing to the journals, it is possible to move the journals for that queue manager to a dedicated disk. If the workload demands it, it becomes easier to move the queue manager and its applications to a dedicated machine.
• Availability planning - Multiple queue managers can simplify planning for high availability. Each queue manager and its applications can be considered as a single failover unit that can be failed over to another machine without affecting services for other applications.

3.8 Daylight saving time

OS/400 does not have automatic provision to adjust the clock for daylight saving time; iSeries users must adjust the system clock (and UTC offset) when making adjustments for daylight saving time. This causes problems for WebSphere MQ on iSeries, because WebSphere MQ uses timestamps based on the system clock to access data in the queue manager's journal. If the system clock changes while WebSphere MQ is running, WebSphere MQ can fail to access journal data correctly. It is therefore necessary to quiesce WebSphere MQ before changing the system clock.

3.8.1 Spring time change

When the clocks go forward one hour in the spring, WebSphere MQ can simply be shut down for the time it takes to adjust the clock. The queue manager can be restarted immediately after changing the clock and UTC offset.

3.8.2 Autumn or fall time change

When the clocks go backward in the autumn or fall, you cannot restart the queue manager immediately after changing the clock backwards. If you do so, there is a risk that WebSphere MQ will write duplicate timestamps to the journal. You should ensure that WebSphere MQ is stopped for an hour either before or after the time change and UTC offset update, to avoid the problems associated with setting the system clock backward by an hour.

In environments where downtime must be minimized, an enforced outage of one hour may not be acceptable. IBM is looking into providing a better solution, but until this is available, the only alternative to quiescing the queue manager for an hour is to perform a controlled cold start of the system (see section 3.5 for a discussion of the cold start).
A controlled cold start is one where all queues are emptied of any persistent messages and the queue manager is cleanly shut down. The queue manager journal data can then be deleted per the cold start procedure. This eliminates the risk of losing messages, but it still deletes all media recovery information. You will not be able to recover damaged objects without media recovery information, so you should ensure that you have backed up your object definitions before attempting this (see section 3.6.2). Your IBM service representative will be able to provide details of the cold start procedure.

4 Performance

4.1 Journal receiver location

The critical path for the performance of persistent messages is usually the update of the WebSphere MQ log files (journals on iSeries). These updates should ideally be isolated using operating system facilities so that they are written using dedicated disk heads. This is achieved on OS/400 by putting the queue manager's AMQAJRN journal and its receivers into a separate Auxiliary Storage Pool.

4.2 Journal receiver switching

Some customers have encountered performance problems when WebSphere MQ journal receivers switch. This is because, when a receiver switch happens, WebSphere MQ uses a high-level lock to protect data while it builds an in-memory image of the journal receiver chain. In extreme cases WebSphere MQ can appear to hang while the AMQALMPX job builds this image.

Important: In WebSphere MQ V5.3, a significant amount of work has been done to minimize the impact of switching journal receivers by caching information about the journal receiver chains and optimizing the API calls used to retrieve journal information. The journal receiver switch should be significantly faster in WebSphere MQ V5.3.
The following discussion concerns WebSphere MQ V5.2 and earlier versions. There is a direct relationship between the amount of journal data stored on the system and the time WebSphere MQ takes to perform a journal switch, so it pays to remove redundant receivers as described in section 3.2.

You can reduce the number of times you switch log files by using a small number of large journal receivers rather than a large number of small journal receivers. The optimum size for journal receivers depends on the workload and the amount of persistent data passing through the queue manager. You can avoid journal receiver switches during busy periods by making the journal receivers large enough to contain a full day's data. At close of day, journal receivers should be switched manually with CHGJRN *GEN to ensure that a new receiver is used the next day.

You change the size of WebSphere MQ's journal receivers by creating a new receiver with the desired size and attaching it to the journal; all subsequent receivers will be created with the new size. Use the following commands to do this:

    CRTJRNRCV JRNRCV(QMGRLIB/AMQAnnnnnn) THRESHOLD(NEW_SIZE) +
              TEXT('MQM local journal receiver') AUT(*EXCLUDE)
    CHGOBJOWN OBJ(QMGRLIB/AMQAnnnnnn) OBJTYPE(*JRNRCV) NEWOWN(QMQM)
    CHGJRN JRN(QMGRLIB/AMQAJRN) JRNRCV(QMGRLIB/AMQAnnnnnn)

…where QMGRLIB is the name of your queue manager library, AMQAnnnnnn is the name of the next journal receiver in sequence, and NEW_SIZE is the new receiver size.

4.3 Restart time

If a queue manager is not ended with the normal ENDMQM command, STRMQM will take longer than if the queue manager was shut down normally. The queue manager restart time after an abnormal shutdown is heavily dependent on the amount of work needed to replay and recover transactions that were in-flight when the queue manager shut down.
If queue data files are found to be corrupt when replaying transactions, the queue files must be recovered. Queues are recovered from the last recorded media image, and all operations (puts and gets) are replayed from the journals into the queue. To reduce restart time:

• Make sure that applications check for "fail if quiescing" when using MQGET, and roll back their units of work.
• Where possible, write your applications so that units of work are short-lived.
• Use the RCDMQMIMG command regularly to update the media recovery images for queues. This reduces the amount of data replayed if the queue files become corrupt.

4.4 Channel process pooling

As already mentioned, the channel process pooling feature introduced in WebSphere MQ V5.3 lets channels run as threads within a pool of processes. Each job can run 60 channel threads, which significantly reduces the number of channel jobs on the system. This improves channel start-up performance and reduces overall system overhead. Channel process pooling is the default behaviour for new queue managers in WebSphere MQ V5.3. You can turn on channel process pooling for migrated queue managers by adding the "ThreadedListener=YES" value to the Channels stanza in the qm.ini file. For example:

    Channels:
       ThreadedListener=YES

Important: If you use channel process pooling, you must ensure that your channel exits are thread-safe.

4.5 IFS Type 2

As WebSphere MQ uses the IFS extensively, you may want to consider converting to IFS Type 2 directories. Please see a discussion of IFS Type 2.

5 Availability

5.1 High Availability (HA) clustering

High Availability (HA) clustering is a general term for systems where a service is automatically restarted, perhaps on a different box, when a failure is discovered.
WebSphere MQ provides technology for integrating with HA frameworks where the WebSphere MQ data and log files are stored on a disk that can be accessed by more than one machine (though not necessarily simultaneously). When a machine fails, the disk is switched to the other machine and the queue manager is restarted. There will be a short delay while the takeover occurs; queue managers outside the HA cluster will automatically reconnect as channel retry kicks in, and no persistent messages are lost.

OS/400 V5R2 introduced the ability to place libraries onto disks that can be switched between two machines (Independent Auxiliary Storage Pools, or IASPs). This makes it possible to develop an HA clustering solution for WebSphere MQ. IBM will provide a SupportPac towards the end of 2003 that shows how to configure OS/400 HA clusters with WebSphere MQ. This will be available for download from the WebSphere MQ SupportPac Web page.

5.1.1 Remote mirroring

Without a shared disk, and therefore without a "standard" HA integration facility, an alternative approach to restarting a queue manager on a second machine is to mirror the contents of the disks holding WebSphere MQ data and logs. Provided the mirror is precisely synchronized with the original data, this has exactly the same availability characteristics as an HA cluster. If, however, the mirror is an asynchronous process, there is a possibility that log records written by the queue manager might not have been copied before the failure, and therefore that the rebuilt queue manager image may have missed updates. This could result in lost or duplicated messages.

There are several vendor products that work using a combination of mirroring disk files and extracting data out of WebSphere MQ journals. Any true mirror is likely to introduce a performance impact, as any forced update (flush) to the disk has to be written to the mirrored system before the queue manager can continue.
This performance hit may or may not be acceptable, depending on customer requirements; we recommend running a production-level workload to confirm that performance is adequate and that data replication is complete.

5.2 High Availability features

WebSphere MQ includes a number of features that let applications become more resilient to queue manager or hardware failures. We consider two in more detail here: WebSphere MQ queue manager clusters, and the client channel table. Both the queue manager clustering and client channel table reconnection techniques assume that any one of the queue managers in the cluster or table is capable of delivering the service that the client needs. By duplicating services on all queue managers, you eliminate single points of failure.

5.2.1 WebSphere MQ clustering

WebSphere MQ clustering provides two main benefits: reduced administration and workload distribution (WLM). The WLM algorithm routes each inbound message to one of the available queue managers that hosts the target queue. If a queue manager is not currently running, messages are sent elsewhere; if no running queue manager hosts the named queue, messages remain on the originating system until a server queue manager is restarted. This approach allows new work to be injected into the cluster even when some of the server machines are not running. However, messages that have already been sent to a queue manager that subsequently fails cannot be processed until that queue manager restarts. These messages are often called 'marooned' messages.

A WebSphere MQ cluster is especially effective if there are no affinities between individual messages, so that they can be processed in any sequence. Affinities can be maintained if the application is written to use options in the MQI, but this has to be a conscious decision based on the business requirements.
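The effect of those MQI options can be illustrated with a simplified model (Python, not the MQI itself, and not the real WLM algorithm): with no affinity, messages are spread across the available queue managers, while a binding fixed at open time (analogous to the MQI's MQOO_BIND_ON_OPEN open option) keeps a sequence of related messages together on one queue manager.

```python
import itertools

# Simplified model of cluster workload routing (not the real WLM algorithm).
# With no affinity, messages round-robin across the available queue managers;
# with a binding fixed at open time (analogous to MQOO_BIND_ON_OPEN), every
# message in the sequence goes to the queue manager chosen when the queue
# was opened, preserving the affinity between related messages.

def route(messages, qmgrs, bind_on_open):
    chooser = itertools.cycle(qmgrs)
    if bind_on_open:
        target = next(chooser)             # destination fixed at open time
        return [(msg, target) for msg in messages]
    return [(msg, next(chooser)) for msg in messages]   # spread the workload

msgs = ["part1", "part2", "part3"]
print(route(msgs, ["QMA", "QMB"], bind_on_open=False))
# -> parts alternate between QMA and QMB (no message affinity)
print(route(msgs, ["QMA", "QMB"], bind_on_open=True))
# -> all three parts go to QMA (affinity preserved)
```

The trade-off is visible in the model: fixing the binding preserves ordering across related messages, but gives up the workload-spreading benefit for that sequence.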
Many customers are using WebSphere MQ clustering successfully in large configurations; it is our normal recommendation for managing a set of queue managers that need to share workload. More information about clustering can be found in the online WebSphere MQ manuals.

5.2.2 Client Channel table

Previously available on other platforms, the Client Channel table was introduced in WebSphere MQ V5.3 for iSeries as a way of building more resilient client applications without clustering. The Client Channel table is created by the iSeries server queue manager and contains a list of the Client Connection (*CLTCN) channels that have been created on the server. The table allows client applications to select, at run time, the queue manager they connect to. If the client uses a wildcard in the queue manager name on the MQCONN call, the client code picks the first available queue manager from the preconfigured Client Channel table. If the connection to that queue manager subsequently fails for any reason, the client can be programmed to detect the failure and attempt the connection again; the client code then picks the next available queue manager from the table. In this way, the client application can recover from server failures.

5.3 Network topology

There are times when it is appropriate to share workload between multiple systems, even when a single machine could theoretically handle the entire capacity. The decision to partition the workload is often made for geographic or organizational reasons; for example, it might be appropriate to have work executed in a regional center instead of sending it all to a central location. There are no fixed rules about "business-driven" partitioning. However, you need to consider the availability (including bandwidth) of the networks between processing centers, and the amount of inter-server messaging.
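Returning to the Client Channel table of section 5.2.2, the reconnection logic can be sketched as follows. This is a simplified model, not real MQI code: `try_connect` is a hypothetical stand-in for an MQCONN attempt against one channel entry, and the table entries are invented names (the real table is generated by the server queue manager, not hand-written).

```python
# Sketch of a client walking a preconfigured client channel table.
# try_connect is a hypothetical stand-in for an MQCONN attempt; the
# queue manager names below are invented for illustration.

def connect_via_table(channel_table, try_connect):
    """Return the first queue manager that accepts a connection,
    or None if every entry in the table fails."""
    for qmgr in channel_table:
        if try_connect(qmgr):
            return qmgr        # connected; stop searching the table
    return None                # all entries failed

table = ["QM_LONDON", "QM_PARIS", "QM_BERLIN"]   # hypothetical entries
available = {"QM_PARIS", "QM_BERLIN"}            # QM_LONDON is down

print(connect_via_table(table, lambda qm: qm in available))  # QM_PARIS
```

An application built this way reruns the same loop whenever it detects a broken connection, which is how it rides over the failure of any single server.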
Using WebSphere MQ clustering, perhaps with a custom-written workload exit, would be a good way to direct messages to the nearest available server.

Designing an efficient queue manager and network topology requires an analysis of the expected message traffic patterns. This activity needs to take the other material in this document into account, to ensure good availability and performance of all the queue managers in the network while avoiding single points of failure and bottlenecks.

5.4 Hardware assistance

A number of hardware technologies can improve the availability of a WebSphere MQ system, including uninterruptible power supplies and RAID disk technologies. Use of these facilities is transparent to WebSphere MQ, but they should be considered as part of the planning process when ordering machine configurations.

6 Conclusion

This article discussed some of the best practices that will help you get the most out of WebSphere MQ on iSeries. These practices will help you keep your system up to date, safely backed up, available, and performing well. WebSphere MQ is a versatile product that is used in many environments, which makes it difficult to describe procedures that cover every eventuality. If you have a "best practice" that is not covered here, the authors would be interested to hear about it.

IBM and WebSphere are trademarks or registered trademarks of IBM Corporation in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others.

IBM copyright and trademark information