WebSphere MQ for iSeries Best Practice Guide
Version 1.1: Initial Release

Table of Contents

1 Introduction
2 WebSphere MQ maintenance and software rollout
  2.1 Software maintenance
  2.2 Managing rollouts
3 General housekeeping
  3.1 Creating and changing objects
  3.2 Journal housekeeping
  3.3 Shared memory cleanup
  3.4 Queue manager shutdown
  3.5 12-Step / Cold Start procedure
  3.6 Backups
    3.6.1 Data
    3.6.2 Object definitions
  3.7 Multiple queue managers
  3.8 Daylight saving time
    3.8.1 Spring time change
    3.8.2 Autumn or fall time change
4 Performance
  4.1 Journal receiver location
  4.2 Journal receiver switching
  4.3 Restart time
  4.4 Channel process pooling
  4.5 IFS Type 2
5 Availability
  5.1 High Availability (HA) clustering
    5.1.1 Remote mirroring
  5.2 High Availability features
    5.2.1 WebSphere MQ clustering
    5.2.2 Client Channel table
  5.3 Network topology
6 Conclusion

1 Introduction

This document is intended for people who manage WebSphere® MQ 5.2 or 5.3 software on iSeries machines. It offers advice on maintaining, managing, and improving the availability of your WebSphere MQ on iSeries installation.
This guide has been written for both novice and expert users of WebSphere MQ and encapsulates general best practice information collated by the IBM® development and service teams from customer installations. It is divided into four main sections: WebSphere MQ maintenance and software rollout, General housekeeping, Performance, and Availability.

This guide should be used in conjunction with, and not as a replacement for, the WebSphere MQ publications. Full details of the WebSphere MQ administration and programming interfaces are available in those publications; the latest editions of all books can be downloaded from the WebSphere MQ Web site.

WebSphere MQ for iSeries – Best Practice Guide Page 2 of 15

2 WebSphere MQ maintenance and software rollout

2.1 Software maintenance

You should ensure that you keep current with maintenance. Preventative maintenance for WebSphere MQ is delivered by distributing Corrective Service Diskettes (CSDs) to customers (as Final PTFs), and corrective maintenance is carried out via iSeries Test Fix delivery. A CSD contains a number of cumulative fixes; these PTFs are made available every 3-4 months. The entire set of CSD PTFs can be ordered using a single "marker" PTF.

A Test Fix is an OS/400-specific method of delivery. It lets a customer use the Load PTF (LODPTF) and Apply PTF (APYPTF) CL commands to apply the fix, and the Remove PTF (RMVPTF) CL command to remove it. A Test Fix (unlike a PTF) is not the final delivery vehicle for the fix; the final fix is delivered in the CSD. Test Fixes cannot be superseded by another PTF (all Test Fixes need to be applied on top of the last CSD), so a Test Fix must be removed before applying a new CSD or another Test Fix that fixes the same object.
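As an illustration, a Test Fix supplied as a PTF could be applied and later removed with the standard PTF commands. This is a sketch only: the licensed program ID and PTF number shown are placeholders, not real fix identifiers, and the options to use should be confirmed with the instructions shipped with the fix.

    LODPTF LICPGM(5733A38) DEV(*SERVICE) SELECT(SI12345)
    APYPTF LICPGM(5733A38) SELECT(SI12345) APY(*TEMP)
    /* Before applying a new CSD, remove the Test Fix */
    RMVPTF LICPGM(5733A38) SELECT(SI12345) RMV(*TEMP)

Applying the fix temporarily (APY(*TEMP)) makes it straightforward to remove again with RMVPTF when the superseding CSD arrives.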
Producing Test Fixes that cannot supersede other Test Fixes allows WebSphere MQ Service to target individual fixes to specific customers, because there are no dependencies that force a Test Fix to drag in fixes devised for other customers. This process is consistent with the other WebSphere MQ platforms on the common code base (though the fixes are packaged differently). It allows emergency fixes to be delivered quickly, in a format that lets customers use standard PTF commands to apply, remove, and track these fixes.

You should not need to apply every CSD/PTF on the day it is issued, but you should be aware of what has been released. It is a good idea to check the list of APARs included in each PTF; expect that at least one of the APARs could show up on your system unless you upgrade. Plan to install PTFs on a test machine first, and then migrate them to production systems.

If a high-impact pervasive problem arises between the release of WebSphere MQ CSDs, then, as with other iSeries products, the service team will release a set of HIPER (High Impact PERvasive) PTFs so that a fix can be provided to all customers before the next CSD is released. You should ensure that you are current with both WebSphere MQ and other iSeries HIPER PTFs. Regularly review the Preventive Service Planning (PSP) information that is available online to iSeries customers; it lists all new important PTFs released for each supported version of the operating system, and also lists any PTFs that are known to be defective and should not be applied to your system.

New full versions of WebSphere MQ have historically been released every 18-24 months, with an overlap of service of approximately 12-15 months. When a new release arrives, make early plans to upgrade to it. It is advisable to schedule your upgrade before service has been withdrawn for the older version.
It takes time to test applications and roll the product into production systems, and last-minute upgrades can run into problems if this testing is not done. We know that some sites take several months to implement a replacement version as they go through the tiers of testing. Problems need to be found early, so that you have time to fix them before the final end-of-service deadline.

The latest version of WebSphere MQ for iSeries at the time of writing (October 2003) is V5.3. This release contains many functional enhancements over V5.2. It also contains fixes for all the relevant, known defects that were included in the V5.2 PTFs available before V5.3 shipped. Future fixes made to V5.2 will also be included in V5.3 PTFs where appropriate. See the WebSphere MQ Support Service summary for OS/400 for a summary of the problems fixed in each PTF; this page also shows the end-of-service date for each version of WebSphere MQ.

2.2 Managing rollouts

All application and system software should be fully tested, using production workloads and configurations, on separate machines or partitions. Many customers use a multi-tier approach covering development, system testing, production testing, and real production systems. Programs and configurations are moved between these environments using locally defined practices. A fundamental rule is that no changes should be made without change control and audit trail mechanisms.

Do not go into production without properly testing your systems. This normally means having a spare machine of equivalent processing power that can run workloads similar to those of the production system. The machines should also, of course, use identical levels and versions of all software.
While we cannot make specific recommendations, we know that any successful enterprise will have implemented a staged rollout process, testing at each stage to ensure that WebSphere MQ and applications meet the availability and quality requirements. Testing needs to cover performance, capacity, and disaster recovery in addition to basic application function. A self-contained, repeatable regression test suite that can be enhanced as necessary contributes greatly to a successful rollout.

Logical partitions can be used for some tests, as these are equivalent to having separate machines. Any recommendations in this document that refer to machines can equally be applied to separate partitions. The only difference is that the CPU cycles of a machine are shared between the partitions, so you cannot test the peak performance of a dedicated machine.

3 General housekeeping

3.1 Creating and changing objects

It is good practice, when creating queue manager objects, to put the object creation commands into an MQSC script or CL command program. If you always create WebSphere MQ objects programmatically, you have a record of the queue manager definitions that allows you to rebuild an entire queue manager very quickly. When changing WebSphere MQ objects, change the definition of the object in the program and rerun the program, rather than changing the object directly.

3.2 Journal housekeeping

On iSeries, WebSphere MQ periodically issues messages AMQ7460 and AMQ7462 (AMQ7467 and AMQ7468 on other platforms) to indicate which journal receivers are no longer needed for crash or media recovery. Older journal receivers should be deleted or archived once they are no longer needed by WebSphere MQ. This should be done by an automated task that runs regularly, perhaps once per day, depending on the number of journal receivers used each day.
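As a minimal illustration of the deletion step (the library and receiver names are placeholders, and a production task would first examine the AMQ7460/AMQ7462 messages to confirm that the receiver is genuinely no longer needed for recovery), a detached receiver can be archived and then removed with standard CL commands:

    /* Optionally archive the old receiver before deleting it */
    SAVOBJ OBJ(AMQA012345) LIB(QMGRLIB) DEV(TAP01) OBJTYPE(*JRNRCV)
    /* Delete the receiver once WebSphere MQ no longer needs it */
    DLTJRNRCV JRNRCV(QMGRLIB/AMQA012345)

Only receivers older than those named in the WebSphere MQ messages should ever be removed in this way.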
For a discussion and a sample program showing how to automate the housekeeping of journal receivers, see the article "Automating Journal Management in WebSphere MQ for iSeries".

3.3 Shared memory cleanup

Customers have occasionally experienced problems starting a queue manager that have led them to clear out the shared memory and semaphore files found in the IFS file system. Manual deletion of these files is not recommended. These files are not the shared memory segments and semaphores used by WebSphere MQ; they are simply used by the UNIX ftok() function to provide a unique key to the actual shared memory and semaphores. The existence of these files provides a placeholder that ensures WebSphere MQ does not attempt to create new shared resources with the same key as existing shared resources. Manually deleting these files will orphan shared memory and semaphores, which may result in unexpected errors.

In normal circumstances it should not be necessary to manually clear WebSphere MQ shared memory. The correct way to clean out shared memory in WebSphere MQ V5.2 is to end all queue managers by issuing the command:

    ENDMQM MQMNAME(*ALL) ENDCCTJOB(*YES)

In V5.3, shared memory can be purged for a single queue manager by specifying a queue manager name on the command:

    ENDMQM MQMNAME(queuemanager) ENDCCTJOB(*YES)

3.4 Queue manager shutdown

Customers have found that they occasionally have trouble shutting down their queue managers on busy systems within a reasonable period of time. Often this is related to the time required to end large numbers of channel jobs connected to the queue manager. The recommended way to quiesce a queue manager is to use the End Message Queue Manager (ENDMQM) command with ENDCCTJOB(*YES) to end connected jobs, listeners, and so on. The ENDCCTJOB(*YES) option forces the queue manager to record media images (with an implicit RCDMQMIMG) before it is shut down.
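For example, the recommended quiesce for a single queue manager looks like this (MYQM is an illustrative queue manager name):

    /* End the queue manager, its connected jobs, and its listeners;
       media images are recorded (implicit RCDMQMIMG) before shutdown */
    ENDMQM MQMNAME(MYQM) OPTION(*IMMED) ENDCCTJOB(*YES)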
The implicit RCDMQMIMG can take some time. To speed up ENDMQM OPTION(*IMMED) ENDCCTJOB(*YES), you can first issue an ENDMQM *CTRLD. This command returns immediately and flags the queue manager as ending; issuing this initial controlled end prevents the implicit RCDMQMIMG. The subsequent ENDCCTJOB(*YES) will still run, ending the queue manager immediately and shutting down any jobs connected to it. If you bypass the RCDMQMIMG on ENDMQM in this way, you should consider issuing a manual RCDMQMIMG command when the queue manager restarts, to ensure that you have up-to-date media images of the queues in the journal.

Splitting queue managers may also help to alleviate this problem by distributing the channel jobs between the queue managers (see section 3.7). Migrating to WebSphere MQ V5.3 offers some other options for solving this problem:

• WebSphere MQ V5.3 offers a new feature called channel process pooling, which allows channels to run as threads within a pool of processes. Each job can run 60 channel threads, which significantly reduces the number of channel jobs on the system (see section 4.4).
• WebSphere MQ V5.3 has improvements to prevent channel jobs being started by the channel listener when a queue manager is quiescing. This lowers the overhead on the system when a queue manager shuts down.

3.5 12-Step / Cold Start procedure

In the past, some customers have experienced problems starting their WebSphere MQ queue managers on iSeries. These problems have manifested themselves as failures in the Start Message Queue Manager (STRMQM) command, or as apparent "hangs" while running STRMQM (in fact this is more likely to be STRMQM running extremely slowly; a problem in this area was fixed with CSD06 in V5.2). To recover from these problems, the queue manager is often restarted using a procedure referred to as the "12-step" or "cold start" procedure.
The cold start procedure involves deleting the existing AMQAJRN journal and journal receivers for the queue manager, and creating new empty journals and journal receivers.

Performing a cold start is not best practice. When WebSphere MQ journals and journal receivers are deleted, it must be understood that there is some danger that WebSphere MQ messages could be lost or duplicated. An explanation of why follows.

Normal start-up processing involves reconciling the data in the AMQAJRN journal with the data in the IFS queue files. When a queue manager is restarted using the cold start procedure, there is no journal data to replay, so WebSphere MQ cannot perform its normal start-up processing. The start-up reconciliation is necessary because WebSphere MQ uses a technique called "write-ahead" logging. This means that whenever WebSphere MQ puts or gets persistent messages, the put or get is recorded in two ways:

1. With a forced disk write to the AMQAJRN journal, meaning that WebSphere MQ waits for OS/400 to confirm that the disk has been physically updated.
2. With a lazy write to the queue: WebSphere MQ caches some data in memory, and writes some data to the IFS queue file with an OS/400 unforced write. The unforced write is stored in OS/400 file system buffers and written to disk at the operating system's convenience.

The journal data on disk is therefore the master copy of WebSphere MQ data and is always more up to date than the queue file data. In normal circumstances, when WebSphere MQ is shut down and restarted, STRMQM processing ensures that the data in the queue files is brought up to date with the data in the journals. If a queue manager or system terminates abnormally, information about messages may exist in the journal but not in the queue files. Deleting the journal as part of a cold start obviously deletes the only copy of these messages.
Thus the cold start, by deleting the journals and bypassing any STRMQM reconciliation, could lead to loss, duplication, or corruption of data. Because the data in the journal receivers is so important to the recovery of MQ data, we strongly recommend that the journal receivers are stored on RAID-protected disk, and that the cold start is avoided wherever possible.

IBM is committed to improving the performance of the WebSphere MQ start-up procedure, which will make the cold start procedure unnecessary. However, IBM cannot investigate some start-up problems because customers use the cold start procedure to restart the queue manager without reporting the problem to IBM support. These start-up problems will continue to occur unless IBM can get good diagnostic information about them. IBM therefore requests that, in these circumstances, the queue manager data (queue manager library and IFS) is saved, and a STRMQM command is run with WebSphere MQ trace (TRCMQM) turned on, before the 12-step cold start procedure is taken. The resulting data will give IBM the best possible chance of finding and fixing STRMQM problems.

3.6 Backups

Two methods to consider when planning a backup and recovery strategy for WebSphere MQ are data backup and object definition backup. These methods are complementary, and most enterprises successfully implement a combination of the two techniques.

3.6.1 Data

It is necessary to quiesce a queue manager before fully backing up its IFS and journal data. However, you can take a backup of just the journal data while the queue manager is running. If the backed-up journal data is restored over an older backup of the entire queue manager, it is possible to fully recover a queue manager and its data. The WebSphere MQ for iSeries V5.3 System Administration Guide, Chapter 7, has instructions on how to do this.
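As a sketch of such a full backup (the queue manager name, library name, and save device are assumptions for illustration; the library naming convention and the complete procedure should be checked against the System Administration Guide):

    /* Quiesce the queue manager before a full backup */
    ENDMQM MQMNAME(MYQM) OPTION(*IMMED) ENDCCTJOB(*YES)
    /* Save the queue manager library */
    SAVLIB LIB(QMMYQM) DEV(TAP01)
    /* Save the queue manager's IFS data */
    SAV DEV('/QSYS.LIB/TAP01.DEVD') OBJ(('/QIBM/UserData/mqm/qmgrs/MYQM'))
    /* Restart the queue manager */
    STRMQM MQMNAME(MYQM)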
It is a good idea to perform a full backup of the WebSphere MQ libraries and IFS directories occasionally (for example, when the queue manager is quiesced to apply maintenance). Journal backups can be taken more frequently, for example daily or weekly.

3.6.2 Object definitions

Data backups let you recover object definitions and messages up to the last backup, but if you are using WebSphere MQ for non-persistent messaging, it is simpler just to back up your WebSphere MQ queue manager definitions. This lets you quickly recreate a copy of a queue manager. SupportPac MS03 saves the queue manager definitions as MQSC commands that can be replayed to recreate the objects. The SupportPac can be downloaded from:

http://www.ibm.com/software/integration/support/supportpacs/individual/ms03.html

MS03 does not save or restore the queue manager's authority records, but a utility program (AMQOAMD) ships with WebSphere MQ for this purpose. Calling AMQOAMD with the "-s" flag dumps the authority records in the form of GRTMQMAUT commands that can be replayed to recreate them. To dump the authority commands to a file, the following approach can be used:

    OVRDBF FILE(STDOUT) TOFILE(&OUTLIB/&OUTFILE) MBR(&OUTMEMB)
    CALL AMQOAMD PARM('-m' &MQMNAME '-s')
    DLTOVR FILE(STDOUT)

MS03 and AMQOAMD provide a quick and lightweight way of backing up and restoring a queue manager's definitions from one machine to another, or even between queue managers on a single machine. The alternative to using MS03 is one of the third-party system management tools that hold queue manager configurations in a central repository.

3.7 Multiple queue managers

Prior to V5.1, WebSphere MQ on iSeries supported only a single queue manager per machine, and so many disparate applications were forced to share a single queue manager.
Despite the introduction of multiple queue manager support in V5.1, many customers are still using one queue manager for all their applications. Sharing queue managers in this way can lead to problems that can be alleviated by allocating individual queue managers to applications. Changing a WebSphere MQ topology from a shared queue manager to multiple queue managers can involve changes to applications, and so is not a trivial undertaking. The following advantages and disadvantages should be reviewed when considering how your applications are distributed over queue managers.

Advantages of a single queue manager:

• Low overhead - There is a fixed minimum overhead associated with a queue manager. Each queue manager starts at least five jobs before any applications connect. The fewer queue managers, the fewer system resources used.
• Management - An increase in the number of queue managers may increase the management tasks that need to be performed. Running each queue manager in a separate subsystem is a good strategy to simplify queue manager maintenance if multiple queue managers are used.

Advantages of multiple queue managers:

• Failure impact - A single queue manager is a single point of failure for all applications. Multiple queue managers can reduce the impact of a failure to only the application(s) or location(s) served by that queue manager.
• Scalability - On a single queue manager system, one high-volume application can adversely affect the performance of all other applications using the queue manager. Multiple queue managers give scope for greater throughput on a queue-manager-by-queue-manager basis. For example, if a queue manager's performance is suffering because of a bottleneck when writing to the journals, it is possible to move the journals for that queue manager to a dedicated disk. If the workload demands it, it becomes easier to move the queue manager and its applications to a dedicated machine.
• Availability planning - Multiple queue managers can simplify planning for high availability. Each queue manager and its applications can be considered as a single failover unit that can be failed over to another machine without affecting services for other applications.

3.8 Daylight saving time

OS/400 does not have automatic provision to adjust the clock for daylight saving time; iSeries users must adjust the system clock (and UTC offset) when making adjustments for daylight saving time. This causes problems for WebSphere MQ on iSeries, because WebSphere MQ uses timestamps based on the system clock to access data in the queue manager's journal. If the system clock changes while WebSphere MQ is running, WebSphere MQ can fail to access journal data correctly. It is therefore necessary to quiesce WebSphere MQ before changing the system clock.

3.8.1 Spring time change

When the clocks go forward one hour in the spring, WebSphere MQ can simply be shut down for the time it takes to adjust the clock. The queue manager can be restarted immediately after changing the clock and UTC offset.

3.8.2 Autumn or fall time change

When the clocks go backward in the autumn or fall, you cannot restart the queue manager immediately after changing the clock backwards. If you do so, there is a risk that WebSphere MQ will write duplicate timestamps to the journal. You should ensure that WebSphere MQ is stopped for an hour either before or after the time change and UTC offset update, to avoid the problems associated with setting the system clock backward by an hour.

In environments where downtime must be minimized, an enforced outage of one hour may not be acceptable. IBM is looking into providing a better solution, but until this is available, the only alternative to quiescing the queue manager for an hour is to perform a controlled cold start of the system (see section 3.5 for a discussion of the cold start).
A controlled cold start is one where all queues are emptied of any persistent messages and the queue manager is cleanly shut down. The queue manager journal data can then be deleted per the cold start procedure. This eliminates the risk of losing messages, but it still deletes all media recovery information. You will not be able to recover damaged objects without media recovery information, so you should ensure that you have backed up your object definitions before attempting this (see section 3.6.2). Your IBM service representative will be able to provide details of the cold start procedure.

4 Performance

4.1 Journal receiver location

The critical path for the performance of persistent messages is usually the update of the WebSphere MQ log files (journals on iSeries). These updates should ideally be isolated using operating system facilities so that they are written using dedicated disk heads. This is achieved on OS/400 by putting the queue manager's AMQAJRN journal and its receivers into a separate Auxiliary Storage Pool.

4.2 Journal receiver switching

Some customers have encountered performance problems when WebSphere MQ journal receivers switch. This is because, when a receiver switch happens, WebSphere MQ uses a high-level lock to protect data while it builds an in-memory image of the journal receiver chain. In extreme cases WebSphere MQ can appear to hang while the AMQALMPX job builds this image.

Important: In WebSphere MQ V5.3, a significant amount of work has been done to minimize the impact of switching journal receivers by caching information about the journal receiver chains and optimizing the API calls used to retrieve journal information. The journal receiver switch should be significantly faster in WebSphere MQ V5.3.
The following discussion concerns WebSphere MQ V5.2 and earlier versions. There is a direct relationship between the amount of journal data stored on the system and the time WebSphere MQ takes to perform a journal switch, so it pays to remove redundant receivers as described in section 3.2.

You can reduce the number of times you switch log files by using a small number of large journal receivers rather than a large number of small journal receivers. The optimum size for journal receivers depends on the workload and the amount of persistent data passing through the queue manager. You can avoid journal receiver switches during busy periods by making the journal receivers large enough to contain a full day's data. At close of day, journal receivers should be switched manually with CHGJRN *GEN to ensure that a new receiver is used the next day.

You change the size of WebSphere MQ's journal receivers by creating a new receiver with the desired size and attaching it to the journal; all subsequent receivers will be created with the new size. Use the following commands to do this:

    CRTJRNRCV JRNRCV(QMGRLIB/AMQAnnnnnn) THRESHOLD(NEW_SIZE) +
              TEXT('MQM local journal receiver') AUT(*EXCLUDE)
    CHGOBJOWN OBJ(QMGRLIB/AMQAnnnnnn) OBJTYPE(*JRNRCV) NEWOWN(QMQM)
    CHGJRN JRN(QMGRLIB/AMQAJRN) JRNRCV(QMGRLIB/AMQAnnnnnn)

…where QMGRLIB is the name of your queue manager library, AMQAnnnnnn is the name of the next journal receiver in sequence, and NEW_SIZE is the new receiver size.

4.3 Restart time

If a queue manager is not ended with the normal ENDMQM command, STRMQM will take longer than if the queue manager was shut down normally. The queue manager restart time after an abnormal shutdown is heavily dependent on the amount of work needed to replay and recover transactions that were in-flight when the queue manager shut down.
If queue data files are found to be corrupt when replaying transactions, the queue files must be recovered. Queues are recovered from the last recorded media image, and all operations (puts and gets) are replayed from the journals into the queue. To reduce restart time:

• Make sure that applications check for "fail if quiescing" when using MQGET, and roll back their units of work.
• Where possible, write your applications so that units of work are short-lived.
• Use the RCDMQMIMG command regularly to update the media recovery images for queues. This reduces the amount of data replayed if the queue files become corrupt.

4.4 Channel process pooling

As already mentioned, the channel process pooling feature introduced in WebSphere MQ V5.3 lets channels run as threads within a pool of processes. Each job can run 60 channel threads, which significantly reduces the number of channel jobs on the system. This improves channel start-up performance and reduces overall system overhead. Channel process pooling is the default behaviour for new queue managers in WebSphere MQ V5.3. You can turn on channel process pooling for migrated queue managers by adding the "ThreadedListener=YES" value to the Channels stanza in the qm.ini file. For example:

    Channels:
       ThreadedListener=YES

Important: If you use channel process pooling, you must ensure that your channel exits are thread-safe.

4.5 IFS Type 2

As WebSphere MQ uses the IFS extensively, you may want to consider converting to IFS Type 2 directories. Please see a discussion of IFS Type 2.

5 Availability

5.1 High Availability (HA) clustering

High Availability (HA) clustering is a general term for systems where a service is automatically restarted, perhaps on a different box, when a failure is discovered.
WebSphere MQ provides technology for integrating with HA frameworks where the WebSphere MQ data and log files are stored on a disk that can be accessed by more than one machine (though not necessarily simultaneously). When a machine fails, the disk is switched to the other machine and the queue manager is restarted. There will be a short delay while the takeover occurs; queue managers outside the HA cluster will automatically reconnect as channel retry kicks in, and no persistent messages are lost.

OS/400 V5R2 introduced the ability to place libraries onto disks that can be switched between two machines (Independent Auxiliary Storage Pools, or IASPs). This makes it possible to develop an HA clustering solution for WebSphere MQ. IBM will provide a SupportPac towards the end of 2003 that shows how to configure OS/400 HA clusters with WebSphere MQ. This will be available for download from the WebSphere MQ SupportPac Web page.

5.1.1 Remote mirroring

Without a shared disk, and therefore without a "standard" HA integration facility, an alternative approach to restarting a queue manager on a second machine is to mirror the contents of the disks holding WebSphere MQ data and logs. Provided the mirror is precisely synchronized with the original data, this has exactly the same availability characteristics as an HA cluster. If, however, the mirror is an asynchronous process, there is a possibility that log records written by the queue manager might not have been copied before the failure, and therefore that the rebuilt queue manager image may have missed updates. This could result in lost or duplicated messages.

There are several vendor products that work using a combination of mirroring disk files and extracting data out of WebSphere MQ journals. Any true mirror is likely to introduce a performance impact, as any forced update (flush) to the disk has to be written to the mirrored system before the queue manager can continue.
This performance hit may or may not be acceptable, depending on customer requirements; we recommend running a production-level workload to confirm that performance is adequate and that data replication is complete.

5.2 High Availability features

WebSphere MQ includes a number of features that let applications become more resilient to queue manager or hardware failures. We consider two in more detail here: WebSphere MQ queue manager clusters, and the client channel table. Both the queue manager clustering and client channel table reconnection techniques assume that any one of the queue managers in the cluster or table is capable of delivering the service that the client needs. By duplicating services on all queue managers, you eliminate single points of failure.

5.2.1 WebSphere MQ clustering

WebSphere MQ clustering provides two main benefits: reduced administration and workload distribution (WLM). The WLM algorithm routes each inbound message to one of the available queue managers that hosts the target queue. If a queue manager is not currently running, messages are sent elsewhere; if no running queue manager hosts the named queue, messages remain on the originating system until a server queue manager is restarted. This approach allows new work to be injected into the cluster even when some of the server machines are not running. However, messages that have already been sent to a queue manager that subsequently fails cannot be processed until that queue manager restarts. These messages are often called 'marooned' messages.

A WebSphere MQ cluster is especially effective if there are no affinities between individual messages, so that they can be processed in any sequence. Affinities can be maintained if the application is written to use options in the MQI, but this has to be a conscious decision based on the business requirements.
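The effect of those MQI options can be illustrated with a simplified model (Python, not the MQI itself, and not the real WLM algorithm): with no affinity, messages are spread across the available queue managers, while a binding fixed at open time (analogous to the MQI's MQOO_BIND_ON_OPEN open option) keeps a sequence of related messages together on one queue manager.

```python
import itertools

# Simplified model of cluster workload routing (not the real WLM algorithm).
# With no affinity, messages round-robin across the available queue managers;
# with a binding fixed at open time (analogous to MQOO_BIND_ON_OPEN), every
# message in the sequence goes to the queue manager chosen when the queue
# was opened, preserving the affinity between related messages.

def route(messages, qmgrs, bind_on_open):
    chooser = itertools.cycle(qmgrs)
    if bind_on_open:
        target = next(chooser)             # destination fixed at open time
        return [(msg, target) for msg in messages]
    return [(msg, next(chooser)) for msg in messages]   # spread the workload

msgs = ["part1", "part2", "part3"]
print(route(msgs, ["QMA", "QMB"], bind_on_open=False))
# -> parts alternate between QMA and QMB (no message affinity)
print(route(msgs, ["QMA", "QMB"], bind_on_open=True))
# -> all three parts go to QMA (affinity preserved)
```

The trade-off is visible in the model: fixing the binding preserves ordering across related messages, but gives up the workload-spreading benefit for that sequence.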
Many customers are using WebSphere MQ clustering successfully in large configurations; it is our normal recommendation for managing a set of queue managers that need to share workload. More information about clustering can be found in the online WebSphere MQ manuals.

5.2.2 Client Channel table

Previously available on other platforms, the Client Channel table was introduced in WebSphere MQ V5.3 for iSeries as a way of building more resilient client applications without clustering. The Client Channel table is created by the iSeries server queue manager and contains a list of the Client Connection (*CLTCN) channels that have been created on the server. The table allows client applications to select, at run time, the queue manager they connect to. If the client uses a wildcard in the queue manager name on the MQCONN call, the client code picks the first available queue manager from the preconfigured Client Channel table. If the connection to that queue manager subsequently fails for any reason, the client can be programmed to detect the failure and attempt the connection again; the client code then picks the next available queue manager from the table. In this way, the client application can recover from server failures.

5.3 Network topology

There are times when it is appropriate to share workload between multiple systems, even when a single machine could theoretically handle the entire capacity. The decision to partition the workload is often made for geographic or organizational reasons; for example, it might be appropriate to have work executed in a regional center instead of sending it all to a central location. There are no fixed rules about "business-driven" partitioning. However, you need to consider the availability (including bandwidth) of the networks between processing centers, and the amount of inter-server messaging.
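Returning to the Client Channel table of section 5.2.2, the reconnection logic can be sketched as follows. This is a simplified model, not real MQI code: `try_connect` is a hypothetical stand-in for an MQCONN attempt against one channel entry, and the table entries are invented names (the real table is generated by the server queue manager, not hand-written).

```python
# Sketch of a client walking a preconfigured client channel table.
# try_connect is a hypothetical stand-in for an MQCONN attempt; the
# queue manager names below are invented for illustration.

def connect_via_table(channel_table, try_connect):
    """Return the first queue manager that accepts a connection,
    or None if every entry in the table fails."""
    for qmgr in channel_table:
        if try_connect(qmgr):
            return qmgr        # connected; stop searching the table
    return None                # all entries failed

table = ["QM_LONDON", "QM_PARIS", "QM_BERLIN"]   # hypothetical entries
available = {"QM_PARIS", "QM_BERLIN"}            # QM_LONDON is down

print(connect_via_table(table, lambda qm: qm in available))  # QM_PARIS
```

An application built this way reruns the same loop whenever it detects a broken connection, which is how it rides over the failure of any single server.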
Using WebSphere MQ clustering, perhaps with a custom-written workload exit, would be a good way to direct messages to the nearest available server.

Designing an efficient queue manager and network topology requires an analysis of the expected message traffic patterns. This activity needs to take the other material in this document into account, to ensure good availability and performance of all the queue managers in the network while avoiding single points of failure and bottlenecks.

5.4 Hardware assistance

A number of hardware technologies can improve the availability of a WebSphere MQ system, including uninterruptible power supplies and RAID disk technologies. Use of these facilities is transparent to WebSphere MQ, but they should be considered as part of the planning process when ordering machine configurations.

6 Conclusion

This article discussed some of the best practices that will help you get the most out of WebSphere MQ on iSeries. These practices will help you keep your system up to date, safely backed up, available, and performing well. WebSphere MQ is a versatile product that is used in many environments, which makes it difficult to describe procedures that cover every eventuality. If you have a "best practice" that is not covered here, the authors would be interested to hear about it.

IBM and WebSphere are trademarks or registered trademarks of IBM Corporation in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of others.

IBM copyright and trademark information