IBM® InfoSphere Information Server
Best practices

Workload Management Server Overview and Best Practices

Yong Li, Development Manager, InfoSphere High Performance Engine
Xiaoyan Pu, Software Development Engineer
Hanson Lieu, Software Development Engineer
Ron Liu, InfoSphere Information Server Performance
Len Greenwood, Software Architect
Ashley Holland, DataStage Software Developer

Issued: January 2013

Contents

Introduction
WLM Server Architecture
Installation and Configuration
    Enable DataStage Operations Console and WLM Server
        Unix Platform
        Windows Platform
    Restart DataStage Server
        Unix Platform
        Windows Platform
    Start DataStage AppWatcher Process
        Unix Platform
        Windows Platform
WLM User Interface
    System Policies
        Job Count
        CPU Usage
        Memory Usage
        Job Start
    Queued Jobs Tab
    Queue Management Tab
    DataStage Administrator Client Integration
    DataStage Designer Client Integration
    DataStage Director Client Integration
    dsjob Integration
    DataStage Operations Console Integration
Workload Priority Rules
    Priority Weight
    Job Run Ratio
    Elapsed Time
WLM Server Security
Advanced Configurations
    WLM Configuration File
    Advanced Error Handling Configuration
    uvconfig Tuning
Troubleshooting
Best Practice
    Defining queues based on different criteria
        Job Characteristics
        Projects
        Parallel Configurations
        Development/Testing/Production Environments
    Dynamic Resource Update
    Dynamic Priority Rule Update
    Use Cases
        You want no more than 20 jobs to run in the whole system, regardless of where they come from
        You have 4 application groups who submit jobs at random. You want to ensure that at any time that a group submits a job, they can get a slot to run it in - another group can't hog the system.
Conclusion
Further reading
    Contributors
Notices
    Trademarks
    Contacting IBM
Introduction

Many systems have a limit on one of the following resources:

• IO
• Memory
• CPU
• Network bandwidth

When the system is underutilized, its capacity is wasted. When the system is overloaded, jobs run into a variety of problems. Common symptoms include network timeouts, slow job startup, job hangs, and even job crashes caused by running out of memory. Even when jobs do run on an overloaded system, they tend to run much more slowly than they would if they were properly scheduled, because of excessive system swapping (context switching).

IBM InfoSphere Information Server Workload Management (WLM) solves this problem by regulating the workload execution environment to maximize system throughput and maintain a stable and much more predictable runtime environment.

WLM is a new server component available in the IBM Information Server 9.1 release. This server component is installed on the Engine tier. Its main role is to monitor machine resource usage such as CPU and memory, keep track of the workload (job) count, and either dispatch a job for execution immediately or place it in a queue and release it for execution when resources become available.

This article describes the WLM Server system architecture, the installation and configuration process, the WLM user interface, workload dispatching rules, and best practices for leveraging the power of WLM Server.

WLM Server Architecture

The following diagram shows the high-level WLM Server architecture:

[Figure: high-level WLM Server architecture]

The WLM user interface is embedded as a tab in the IBM InfoSphere Information Server DataStage Operations Console user interface and can be accessed using the following URL:

    http://host:port/ibm/iis/ds/console

The WLM user interface communicates with DataStage services running inside IBM WebSphere Application Server through HTTP/REST APIs. The DataStage services in turn communicate with the WLM Server process through the ASBAgent proxy.
Installation and Configuration

WLM Server is installed as part of the DataStage server installation. By default, DataStage Operations Console and WLM Server are disabled. Use the following procedures to turn on DataStage Operations Console and WLM Server after installation. Note that the actual WLM binary is located in /opt/IBM/InformationServer/Server/DSWLM.

Enable DataStage Operations Console and WLM Server

Unix Platform

Log in as dsadm and edit the DataStage Operations Console configuration file:

    /opt/IBM/InformationServer/Server/DSODB/DSODBConfig.cfg

Change:

    DSODBON=0

to:

    DSODBON=1

This enables the DataStage Operations Console feature.

Change:

    WLMON=0

to:

    WLMON=1

This enables the WLM Server feature.

There are many other parameters that you can configure to fine-tune the behavior of DataStage Operations Console and WLM Server. To get started, all you need to do is turn on DSODBON and WLMON.

Windows Platform

If your installation is on a Windows server machine, edit C:\IBM\InformationServer\Server\DSODB\DSODBConfig.cfg and set the following values to turn on DataStage Operations Console and WLM Server:

    DSODBON=1
    WLMON=1

Restart DataStage Server

Unix Platform

First source the dsenv file:

    cd /opt/IBM/InformationServer/Server/DSEngine
    . ./dsenv

Then stop and start the DSEngine:

    bin/uv -admin -stop
    bin/uv -admin -start

Windows Platform

The first option is to bring up the DataStage control panel applet:

[Figure: DataStage control panel applet]

Click Stop All Services, and then click Start All Services to restart the DataStage Engine process.

You can also run the following commands to stop and start the DataStage Engine:

    C:\work>net stop dstelnet
    The DataStage Telnet Service is stopping.
    The DataStage Telnet Service was stopped successfully.

    C:\work>net stop dsengine
    The InfoSphere Engine Resource Service is stopping............
    The InfoSphere Engine Resource Service was stopped successfully.

    C:\work>net stop dsrpc
    The DSRPC Service is stopping.
    The DSRPC Service was stopped successfully.

    C:\work>net start dsrpc
    The DSRPC Service is starting.
    The DSRPC Service was started successfully.

    C:\work>net start dsengine
    The InfoSphere Engine Resource Service is starting.
    The InfoSphere Engine Resource Service was started successfully.

    C:\work>net start dstelnet
    The DataStage Telnet Service is starting.
    The DataStage Telnet Service was started successfully.

Once WLM Server is enabled in the DSODBConfig.cfg file, WLM starts whenever the DataStage server is started and stops whenever the DataStage server is stopped. You do not need to start or stop WLM separately.

Start DataStage AppWatcher Process

Unix Platform

Run the following command to check the correctness of the configuration file:

    bash-3.2$ pwd
    /opt/IBM/InformationServer/Server/DSODB
    bash-3.2$ bin/DSAppWatcher.sh -test
    DSODB is turned ON in the DSODBConfig.cfg file.
    Link Monitoring is OFF.
    Job Run Usage is ON.
    Resource Monitoring is ON.
    Checking Database Connection:
    Driver: com.ibm.db2.jcc.DB2Driver
    Connection URL: jdbc:db2://host:50000/xmeta
    Successfully loaded the database driver.
    Successfully connected to the database.
    Schema: DSODB
    DB Schema version number: 2
    Test Successful.
    bash-3.2$

Then run the following command to start it:

    bash-3.2$ bin/DSAppWatcher.sh -start
    AppWatcher:STARTED
    EngMonApp:STARTING
    ODBQueryApp:STARTING
    ResMonApp:STARTING
    bash-3.2$

Windows Platform

Run the following command to check the correctness of the configuration file:

    C:\IBM\InformationServer\Server\DSODB>bin\DSAppWatcher.sh -test
    DSODB is turned ON in the DSODBConfig.cfg file.
    Link Monitoring is OFF.
    Job Run Usage is ON.
    Resource Monitoring is ON.
    Checking Database Connection:
    Driver: com.ibm.db2.jcc.DB2Driver
    Connection URL: jdbc:db2://host:50000/xmeta
    Successfully loaded the database driver.
    Successfully connected to the database.
    Schema: DSODB
    DB Schema version number: 2
    Test Successful.

Note that on Windows, the DataStage Application Watcher process runs as a Windows service. To start it properly, run the following command or use the Windows Services applet:

    C:\IBM\InformationServer\Server\DSODB>net start IBMAPWSrv
    The DataStage AppWatcher Service is starting.........
    The DataStage AppWatcher Service was started successfully.
    C:\IBM\InformationServer\Server\DSODB>

WLM User Interface

Once you have turned on DataStage Operations Console and WLM Server, you can log in to DataStage Operations Console to review all available features in the Operations Console and WLM.

When you first log in to the DataStage Operations Console application, you will see the following:

[Figure: DataStage Operations Console after login, including the Engine Status panel]

In the Engine Status panel, "wlmserver: OK" indicates that the WLM Server process is running properly.
When you switch to the Workload Management tab, you see a screen similar to the following:

[Figure: Workload Management tab]

This interface has two main sections:

• System Policies setting
• Queuing System (includes Queued Jobs and Queue Management)

System Policies

System policies define system-wide settings for the WLM Server instance. When determining whether to dispatch a job for immediate execution or to put the job run request into the queuing system, WLM checks the System Policies first and the queue-level policies second.

Job Count

This setting controls how many jobs are allowed to run concurrently on this server. You can adjust Job Count depending on the capacity of the system and the characteristics of the jobs. The default value for this setting is 20. Increase it if the server has many CPU cores and a large amount of memory; otherwise, decrease it. Click Currently Running to open the Job Activity graph.

CPU Usage

When WLM detects that system CPU usage exceeds this configured threshold, it places incoming job run requests into a queue. The default value for this setting is 80%. Click Current CPU (%) to open the CPU Usage graph.

Memory Usage

When WLM detects that memory usage exceeds this configured threshold, it places incoming job run requests into a queue. The default value for this setting is 80%. Click Current Memory (%) to open the Memory Usage graph.

Job Start

This setting defines a sliding time window within which the number of job start requests cannot exceed the configured threshold. In the example above, WLM ensures that no more than 100 jobs are started within any 10-second sliding window.
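The four system policies can be pictured as a single admission check. The following Python sketch is illustrative only: the class name, field names, and method are hypothetical and are not part of any WLM API, and the order in which the real WLM Server evaluates the checks is an assumption. It uses the default values quoted above (20 jobs, 80% CPU, 80% memory) and the 100-jobs-in-10-seconds Job Start example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SystemPolicies:
    """Hypothetical model of the four system policies (not the WLM API)."""
    max_job_count: int = 20         # Job Count default from the text
    max_cpu_percent: float = 80.0   # CPU Usage default from the text
    max_mem_percent: float = 80.0   # Memory Usage default from the text
    max_starts: int = 100           # Job Start: starts allowed per window
    window_seconds: float = 10.0    # Job Start: sliding window length
    _starts: deque = field(default_factory=deque)  # times of recent starts

    def can_start(self, now, running_jobs, cpu_percent, mem_percent):
        """Return True to dispatch the job now, False to queue it."""
        # Forget start times that have slid out of the window.
        while self._starts and now - self._starts[0] >= self.window_seconds:
            self._starts.popleft()
        if running_jobs >= self.max_job_count:
            return False
        # "Exceeds" is modeled as strictly greater than the threshold,
        # so a threshold of 100 can never be exceeded.
        if cpu_percent > self.max_cpu_percent:
            return False
        if mem_percent > self.max_mem_percent:
            return False
        if len(self._starts) >= self.max_starts:
            return False
        self._starts.append(now)
        return True


# Allow only 2 starts per 10 seconds to keep the demonstration short.
policies = SystemPolicies(max_starts=2, window_seconds=10.0)
for now in (0.0, 1.0, 2.0, 11.0):
    ok = policies.can_start(now, running_jobs=0, cpu_percent=35.0, mem_percent=40.0)
    print(now, ok)  # 0.0 True, 1.0 True, 2.0 False, 11.0 True
```

The third request is held back because two jobs have already started inside the window; by 11 seconds both earlier starts have aged out, so the fourth request is admitted.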
Depending on your specific requirements and use cases, you can use one, two, three, or all four settings concurrently to throttle the system. For instance, if you set CPU Usage to 100%, you effectively disable the CPU check. If you set CPU Usage to 0%, you effectively put all incoming jobs into queues.

You can change system policies dynamically and click Apply to force WLM Server to apply the new settings. There is no need to restart WLM Server; the new configuration is applied and enforced immediately.

Queued Jobs Tab

The Queued Jobs tab lists all of the available queues in the WLM Server instance, together with the queuing information for those queues.

[Figure: Queued Jobs tab]

In the example above, there are a number of queues on the system:

• DataWarehouseQueue
• MyQueue
• IA (Reserved queue)
• ISD (Reserved queue)
• DataClick (Reserved queue)
• HighPriorityJobs
• LowPriorityJobs
• MediumPriorityJobs

Reserved queues (IA, ISD, and DataClick) are special queues for IBM InfoSphere Information Analyzer, IBM InfoSphere Information Services Director, and IBM InfoSphere DataClick. They cannot be deleted or modified.

For each non-empty queue, the tab also lists the number of jobs and, for each job, details such as the job name, project name, and process ID. A user with the DataStage Administrator role sees all jobs in the queue. A non-administrative user sees only jobs that belong to projects that the user has access to.

There is also a special link for active running jobs. Click this link to bring up a new page that shows a list of currently running jobs.
If you are the IBM Information Server administrator, you can also perform the following operations in this user interface:

• Move jobs into different queues
• Move jobs to the top of the queue
• Remove jobs from a queue

If you are not an IBM Information Server administrator, you can remove a job from a queue if the job belongs to a project that you have access to.

Queue Management Tab

The Queue Management tab provides queue administration functions. Here you can select a priority rule, create new queues, modify existing queue properties, or delete queues that are no longer required. Note that a queue must be empty before it can be deleted. If there are jobs in it, the queue cannot be deleted.

The queue-level policy for Job Count limits the number of jobs submitted to that queue that can be running concurrently. When determining whether a job can run, WLM Server checks the queue-level policy only after the system-level policies have been satisfied.

To create a new queue, click New Queue. In the Queue Management - New Queue dialog box, enter a queue name, specify whether this queue is to be treated as the default queue, set the queue priority and the maximum number of running jobs on this queue, and give a short description. Click Save to create the queue.

You can also modify the following properties of existing queues by double-clicking cells in the queue management table:

• Default
• Priority setting
• Max running jobs
• Queue description

After you have modified the settings, select Save to save the changes.

DataStage Administrator Client Integration

When WLM is turned on, you can define the default queue for each project at the DataStage project level.
When users run DataStage jobs without specifying a queue, the jobs are routed to the project default queue.

DataStage Designer Client Integration

When you run a job from DataStage Designer, you will see an option to select which queue to run the job from:

[Figure: queue selection when running a job from DataStage Designer]

DataStage Director Client Integration

When you run a job from DataStage Director, you will see an option to select which queue to run the job from:

[Figure: queue selection when running a job from DataStage Director]

dsjob Integration

You can pass the -queue option when invoking the dsjob command to run a job. For example:

    C:\IBM\InformationServer\Clients\Classic\dsjob.exe -domain NONE -user <user> -password <password> -server <Engine> -run -queue HighPriorityJobs -warn 0 -wait -jobstatus dstage1 MyTestJob

In addition, you can use the following command to query information about the available queues:

    C:\work>C:\IBM\InformationServer\Clients\Classic\dsjob.exe -domain NONE -user <user> -password <password> -server <Engine> -lqueues
    LowPriorityJobs
    MediumPriorityJobs
    HighPriorityJobs
    WarehouseIntegrationQueue
    Status code = 0

DataStage Operations Console Integration

If you run jobs directly from the DataStage Operations Console, you will see an option to select which queue to run the job from:

[Figure: queue selection when running a job from DataStage Operations Console]

Workload Priority Rules

There are three built-in queue priority rules:

• Priority Weight
• Job Run Ratio
• Elapsed Time

The following sections discuss the semantics of each rule and its specific use case.

Priority Weight

The priority of a job is derived from the priority of the queue it was submitted to and from the elapsed time since the job was submitted to the queue. This rule is the default rule. The priority weight offset is roughly 15 minutes.
If three jobs are submitted to a high, a medium, and a low priority queue at the same time, then, assuming enough resources are available, the medium priority job starts 15 minutes later than the high priority job, and the low priority job starts 15 minutes later than the medium priority job. A high priority job submitted within this 15-minute window runs before the medium priority job. You should select this rule if priority is important.

In terms of resource allocation, a high priority queue should not take more resources than it actually needs. The goal is to ensure that high priority jobs run as soon as possible. Having fewer concurrent high priority jobs helps achieve this goal, because each job can get more physical resources such as CPU and memory. This approach also ensures that when there are no high priority jobs, more resources can be used by medium or low priority jobs.

When this rule is applied to queues that all have the same priority, it falls back to the ElapsedTime rule.

Job Run Ratio

If you select Job Run Ratio, you also need to specify the High to Medium and Medium to Low ratios. The priority of a job is derived from the priority of the queue it was submitted to. The ratios determine how many jobs are started from a high priority queue before a job is started from a medium priority queue, and how many jobs are started from a medium priority queue before a job is started from a low priority queue.

Although JobRunRatio is designed to support different priorities, you can set this rule and assign the same priority to all queues. For example, the job run ratio 0:20:0 means that 20 jobs from medium priority queues can run concurrently, assuming there are no high and low priority queues. This configuration is appropriate if neither queued time nor priority is a concern.

More generally, consider this rule if you want to maintain priorities and also balance jobs across queues. Queued time is not a factor in this rule.
If there are multiple queues with the same priority, the job that calls back to WLM first gets the chance to run. For example, the job run ratio 5:2:1 means 5 high priority jobs, 2 medium priority jobs, and 1 low priority job. If there are 2 high priority queues, 3 medium priority queues, and 1 low priority queue, the 5 high priority jobs can come from one or both high priority queues, and the 2 medium priority jobs can come from one or two medium priority queues, all depending on which job calls back first.

Note that the job run ratio is defined relative to medium priority queues. If all the medium priority queues are empty, jobs from high and low priority queues run as soon as they call back, until one of the resource constraints is met. If the low priority queues are empty, the High to Medium ratio still applies. Similarly, if the high priority queues are empty, the Medium to Low ratio still applies.

Elapsed Time

The priority of a job is derived from the elapsed time since the job was submitted to the queue. This rule gives the highest priority to the job that was submitted first, irrespective of the queue it was submitted to.

You can select ElapsedTime as the priority rule and assign the same default priority to all the queues in the system. A simple way to allocate resources is to apply the same JobCount to all queues. You can determine JobCount per queue first and then multiply it by the number of queues to determine JobCount for the entire system. This approach makes it easy to achieve fairness, but resources claimed by empty queues cannot be re-allocated even if other queues have pending jobs.

Another way to allocate resources is to give some queues a larger JobCount than others, depending on the specific needs of each queue. This approach allows you to take the importance of a queue into consideration without having to change ElapsedTime to PriorityWeight.
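To make the difference between the PriorityWeight and ElapsedTime rules concrete, here is a minimal Python sketch. It is a simplified model, not the WLM implementation: the names (`QueuedJob`, `order_by_priority_weight`, and so on) are hypothetical, and the assumption that each step down in queue priority costs exactly 15 minutes is taken from the "roughly 15 minutes" offset described under Priority Weight.

```python
from dataclasses import dataclass

# Assumed penalty in minutes per queue priority, modeled on the
# "roughly 15 minutes" priority weight offset described in the text.
PRIORITY_OFFSET_MIN = {"high": 0, "medium": 15, "low": 30}


@dataclass
class QueuedJob:
    name: str
    priority: str         # priority of the queue the job was submitted to
    submitted_min: float  # submission time in minutes from an arbitrary origin


def order_by_elapsed_time(jobs):
    """ElapsedTime: earliest submission first; queue priority is ignored."""
    return sorted(jobs, key=lambda j: j.submitted_min)


def order_by_priority_weight(jobs):
    """PriorityWeight (assumed model): submission time plus priority offset."""
    return sorted(
        jobs, key=lambda j: j.submitted_min + PRIORITY_OFFSET_MIN[j.priority]
    )


jobs = [
    QueuedJob("A", "low", 0),
    QueuedJob("B", "medium", 5),
    QueuedJob("C", "high", 10),   # within 15 minutes of B, so it overtakes B
    QueuedJob("D", "high", 25),   # 20 minutes after B, so B goes first
]
print([j.name for j in order_by_elapsed_time(jobs)])     # ['A', 'B', 'C', 'D']
print([j.name for j in order_by_priority_weight(jobs)])  # ['C', 'B', 'D', 'A']
```

Under this model a high priority job only overtakes a medium priority job if it arrives within the 15-minute window, which matches the behavior described for Priority Weight, while ElapsedTime preserves pure submission order.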
The ElapsedTime rule is based on queued time only, regardless of the priority of a queue. If you want to take priorities into consideration, you need to select PriorityWeight.

WLM Server Security

To access the WLM user interface, a user must have the DataStage Administrator role or a DataStage project role (DataStage Operations Console viewer role or above).

With the DataStage Administrator role, the user can perform the following operations:

• See all queues in WLM Server
• See all jobs in all queues
• Change system policies
• Create, edit, or delete any queue except the system reserved queues for IA, ISD, and DataClick
• Move jobs between different queues

With a DataStage project role, the user can perform the following operations:

• See all queues
• See the jobs in a queue if the jobs are in projects that the user has access to; for other jobs, the user sees only a number indicating the total number of jobs in the queue, not the specific job details
• See system policies, but not update them

A user with a DataStage project role cannot move jobs to different queues or promote jobs to the top of the queue.

Advanced Configurations

WLM Configuration File

The WLM configuration file is located at /opt/IBM/InformationServer/Server/DSWLM/dist/lib/wlm.config.xml. This file is updated by the WLM user interface, and it should be backed up whenever you back up InfoSphere Information Server.

Advanced Error Handling Configuration

If the default WLM behavior does not completely fit your needs, there are three additional variables you can tune to adjust the interactions between DataStage and WLM.

    C:\IBM\InformationServer\Server\DSODB\DSODBConfig.cfg

    # The following allows a job to run outside of WLM if communication between the DataStage runtime and WLM failed.
    # A setting of 0 will stop the job if communication with the WLM failed.
    # A setting of 1 will not send the job to the WLM. It will run immediately.
    WLM_CONTINUE_ON_COMMS_ERROR=0

    # The following sends a job to the default queue if the queue specified is no longer valid.
    # A setting of 0 will stop the job if the queue specified is invalid.
    # A setting of 1 will send the job to the default WLM queue.
    WLM_CONTINUE_ON_QUEUE_ERROR=0

    # The following specifies the time a job will wait on the pending queue.
    # If this time has been exceeded, the job will be stopped and removed from the queue.
    # A value of 0 means do not time out.
    WLM_QUEUE_WAIT_TIMEOUT=0

uvconfig Tuning

When jobs are in a running state or a queuing state, they consume DSEngine lock resources. Specifically, the DSD_RUN process acquires the following locks:

• RT_CONFIG lock, to prevent others from compiling or deleting the current job
• UV.ACCOUNT lock, to prevent others from deleting the current project

The default configuration in uvconfig can support up to 20 concurrently running jobs and 150 queued jobs. If you need to run or queue more jobs, you may need to adjust the uvconfig parameters. The following table lists some of the configurations tested in an internal performance study:

    T30FILE   RLTABSZ   Concurrent Run   Max Queue Size
    512       300       20               150
    1024      375       20               300
    4096      480       20               1200

Troubleshooting

WLM generates a trace file in the /opt/IBM/InformationServer/Server/DSWLM/logs folder. A new log file is started daily, and when the size of the log file exceeds 100 MB, WLM switches to the next log file automatically.

If you need to trace how the DataStage runtime sends jobs to WLM, you can turn on tracing from the DataStage Administrator client. Select the project name, click Properties, switch to the Tracing tab, and check the Enabled option.

Best Practice

WLM provides flexibility for you to add and update queues in terms of priority and resource constraints in addition to system-level resource configuration capability.
However, flexibility can lead to complexity if it is not clear how priority rules and resource policies work together to better manage workloads. This section describes some use cases that help demonstrate WLM queue management functionality. It starts with defining queues with the same priority but different criteria, then moves on to cover various scenarios for different priority rules.

Defining queues based on different criteria

Queues can be created based on different criteria, as shown below.

Job Characteristics

In this scenario, a job characteristic is used to define a queue. It can be either a design characteristic or a performance characteristic.

Queues

• ISD job queue
• IA job queue
• QS job queue
• Data Bridge job queue
• Data Connectivity job queue
• Parallel job queue
• Server job queue
• Sequence job queue
• CPU intensive job queue
• IO intensive job queue
• Large job queue with over 50 stages per job
• Medium job queue with between 15 and 50 stages per job
• Small job queue with fewer than 15 stages per job

Projects

You can map queues to ongoing projects, one project per queue. There could be one or multiple projects, depending on the business application of a group or multiple groups.

Parallel Configurations

You can balance system resources by defining queues based on parallel configuration.

• Large parallel job queue, where jobs run on more than 4 nodes
• Medium parallel job queue, where jobs run on 2 nodes
• Small parallel job queue, where jobs run on 1 node

JobCount is not evenly distributed; instead, more resources are allocated to 1-node jobs than to 4-node jobs, because a 1-node job creates fewer processes. An example JobCount allocation could be: total JobCount is 10, with 5 for 1-node jobs, 3 for 2-node jobs, and 2 for 4-node jobs.
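The arithmetic behind this allocation can be checked with a few lines of Python. This is only a sketch: the queue names and helper functions are hypothetical, and "node slots" (nodes per job multiplied by the queue's JobCount) is an assumed rough proxy for process load, not a quantity that WLM itself reports. The reversed allocation is invented purely for comparison.

```python
# Queue name -> (nodes per job, queue-level JobCount), from the example above.
allocation = {
    "small_1_node": (1, 5),
    "medium_2_node": (2, 3),
    "large_4_node": (4, 2),
}

# Hypothetical alternative that favors large jobs, for comparison only.
reversed_allocation = {
    "small_1_node": (1, 2),
    "medium_2_node": (2, 3),
    "large_4_node": (4, 5),
}


def total_job_count(alloc):
    """Sum of queue-level JobCount values; should match the system JobCount."""
    return sum(count for _, count in alloc.values())


def peak_node_slots(alloc):
    """Node slots in use if every queue runs at its JobCount limit."""
    return sum(nodes * count for nodes, count in alloc.values())


print(total_job_count(allocation), peak_node_slots(allocation))  # 10 19
print(total_job_count(reversed_allocation), peak_node_slots(reversed_allocation))  # 10 28
```

Both allocations use the same total JobCount of 10, but giving the larger share to 1-node jobs keeps the worst-case node usage noticeably lower, which is the reasoning behind the uneven split.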
Development/Testing/Production Environments

If a system is shared by development, testing, and production, you may want to create queues based on this sharing characteristic.

• Development queue with jobs under development
• Testing queue with jobs being tested
• Production queue with jobs in production

As in the Parallel Configurations scenario, JobCount is divided among the queues: the total JobCount is 20, with 10 for production, 5 for testing, and 5 for development.

Dynamic Resource Update

Resources can be adjusted dynamically among queues for specific application needs. In the Development/Testing/Production example (9.1.4), you can adjust the queue-level JobCount to accommodate different schedules. Production jobs usually run overnight, so JobCount for the production queue can be changed to 20 overnight and back to 0 during the day. The testing and development queues share the same system during the day. If there are 5 developers, you can set JobCount for the development queue to 5 and leave 15 for the testing queue. Development jobs tend to have frequent and quick runs, so the development queue should have medium priority. If there is a given time window in the afternoon when intensive testing may occur, you can change the testing queue to high priority and change it back to low priority after the scheduled testing is done. Without workload management, when the testing workload gets heavy, developers may have to log out of the system to save resources for testing.

JobCount is only a rough estimate of system resource utilization, so the system can become overloaded if, for example, 10 CPU intensive jobs are running rather than 10 I/O intensive jobs. You can decrease the value of JobCount to let fewer CPU intensive jobs run. However, you need to increase it again when non-CPU intensive jobs are running. If it is not easy to tune the system-level JobCount, you can keep JobCount as it is and apply a CPU cap, a memory cap, or a StartJob policy to prevent the system from being overloaded.
CPU and memory caps are also designed for systems where InfoSphere Information Server shares physical resources with other application software. You can limit how much CPU (or memory) Information Server can use, so that other applications also get enough physical resources on the same system.

Dynamic Priority Rule Update

Unlike resources, priority rules don’t need to be changed with queue updates. You set the priority rule up front, before creating queues. Select PriorityWeight if both priority and queued time are important. Select ElapsedTime if queued time is important but priority is not. Select JobRunRatio if priority is important but queued time is not.

However, you can dynamically update the priority rule if the relative importance of priority and queued time changes. For example, if you have selected PriorityWeight but too many medium or low priority jobs are being queued, then instead of moving those jobs to high priority queues you can switch from PriorityWeight to ElapsedTime. As another example, if you have selected JobRunRatio but it blocks due to low priority jobs, then instead of changing all medium priority queues to high priority you can switch from JobRunRatio to PriorityWeight. As a third example, if you have selected ElapsedTime but it slows down high priority jobs, you can switch to JobRunRatio so that jobs of every priority get a chance to run.

Use Cases

This section describes some common ways you might want to restrict job runs on a system, and how you could achieve that with the WLM controls currently defined. The examples go from simple to more complex. Scenario 9.4.3 covers why you would set up more than one queue at the same priority. Scenario 9.4.3b shows when to use JobRunRatio, which does not take queued time into consideration and does not emphasize the order of job submission.
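The rule-selection guidance under Dynamic Priority Rule Update can be restated as a small decision function. This is only an illustrative sketch: `choose_priority_rule` is a hypothetical helper, and the fallback when neither factor matters is an assumption.

```python
def choose_priority_rule(priority_matters, queued_time_matters):
    """Map the two selection criteria to a priority rule name."""
    if priority_matters and queued_time_matters:
        return "PriorityWeight"
    if queued_time_matters:
        return "ElapsedTime"
    if priority_matters:
        return "JobRunRatio"
    # Assumption: when neither matters, keep PriorityWeight as the fallback.
    return "PriorityWeight"


print(choose_priority_rule(True, True))
print(choose_priority_rule(False, True))
print(choose_priority_rule(True, False))
```

Re-evaluating the two inputs when workload patterns change gives the same switches described in the examples above, such as moving from PriorityWeight to ElapsedTime when priority stops being the deciding factor.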
9.4.1. You want no more than 20 jobs to run in the whole system, regardless of where they come from.
- Set up a single queue, with job count max = 20.
- Ensure System job count also = 20.
- Queue priority rule: irrelevant, so leave as default.

9.4.1b. You don't want to overstress the system because you know that when it runs at near capacity things start to fail.
Add:
- Set a system CPU limit and/or a memory limit and/or a "no more than N jobs in X seconds" policy.
Note: This applies to all scenarios.

9.4.2. You only want 20 jobs to run in the whole system, but some are higher priority than others and need to go first when there is a spare slot.
- Set up 2 queues: High and Medium.
- Each queue has a job count max = 20.
- Ensure System job count also = 20.
- Queue priority rule: leave the default, i.e. Priority Weight, which means that anything that has to wait on the High queue will get to run before anything on the Medium queue.

9.4.2b. You want to ensure that at least 2 high priority jobs can run at all times.
As above, but:
- High queue has a job count max = 20
- Medium queue has a job count max = 18
This means that even if there are lots of medium jobs, they will leave at least 2 system slots for high jobs if they appear. On the other hand, up to 20 high jobs could run, if there were that many, and medium jobs would have to wait.

9.4.2c. You don't want the Medium queue to be completely locked out when a lot of high jobs get submitted.
Add:
- Queue priority rule: use a ratio of High:Medium = 5:1, which means a Medium job will get a turn after every 5 High jobs, rather than being blocked.

9.4.3. You have 4 application groups that submit jobs at random. You want to ensure that whenever a group submits a job, it can get a slot to run it in; another group can't hog the system.
- Set up 4 Medium queues, e.g. Q1, Q2, Q3, and Q4.
- Set job count max for each queue = 5.
- Ensure System job count = 20.
That means each group can occupy up to 5 slots. Whenever a group submits a job to its queue, it can always get up to 5 slots from the system. If 5 jobs are running from Q1, you can still submit jobs to Q2 and 5 of them will run, regardless of what is queued. The same applies to Q3 and Q4.

9.4.3b. If there is spare capacity on the system, a group can use it; but if another group submits a job, that group will get a slot as soon as possible and will not be locked out just because the first group got in early. Also, each group should have equal access to available slots.
- Set Priority Rule to JobRunRatio with value of 0:20:0
- Set up 4 Medium queues, e.g. Q1, Q2, Q3, and Q4.
- Set job count max for each queue = 20
- Ensure System job count = 20.
That means that up to 20 jobs can be running from any queue. However, when something turns up on another queue, that queue can get in rather than having to wait for all older jobs to drain first. If you submit 400 jobs to Q1 at the start of the day, you will run 20 jobs at a time; if you submit a job to Q2 later on, you don’t have to wait until 381 jobs from Q1 have run before Q2 gets a turn.

9.4.3c. In conjunction with the above, you need a High priority system so that some jobs get preference over anything else that may be running.
- Set Priority Rule to JobRunRatio with value of 20:0:20
- Set up 1 High queue, e.g. Q1
- Set up 3 Low queues, e.g. Q2, Q3, and Q4.
- Set job count max for each queue = 20
- Ensure System job count = 20.
Having a higher priority queue means it will take precedence when there is a free slot at system level. If the system limit has been reached, the High job will have to wait; but as soon as a job finishes, the queued High job will start next.
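The interaction between a queue-level job count max and the System job count in these scenarios can be modeled in a few lines. This is a simplified sketch, not the WLM implementation: `can_start` and `reserved_slots` are hypothetical helpers, and the model ignores priority rules, CPU and memory caps, and the job start policy.

```python
def can_start(queue, running, queue_max, system_max):
    """True if one more job from `queue` fits under both limits."""
    total_running = sum(running.values())
    under_queue_cap = running.get(queue, 0) < queue_max[queue]
    return under_queue_cap and total_running < system_max


def reserved_slots(system_max, other_queue_max):
    """Slots that the other queue can never occupy (as in scenario 9.4.2b)."""
    return max(system_max - other_queue_max, 0)


# Four queues capped at 5 each, System job count of 20.
queue_max = {"Q1": 5, "Q2": 5, "Q3": 5, "Q4": 5}
running = {"Q1": 5, "Q2": 0, "Q3": 0, "Q4": 0}
print(can_start("Q1", running, queue_max, 20))  # Q1 is already at its cap
print(can_start("Q2", running, queue_max, 20))  # Q2 still has room
print(reserved_slots(20, 18))  # slots left for High when Medium max is 18
```

Raising every queue cap to 20, as in scenario 9.4.3b, removes the per-queue guarantee in this model, which is why that scenario relies on the JobRunRatio priority rule to share slots between queues.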
Conclusion

DataStage WLM allows proactive management of system resources, especially when multiple teams share a common hardware infrastructure. It provides intuitive methods to restrict the number of concurrent job runs based on system policies and queue-level policies, and it protects the critical server system from overloading based on real-time CPU usage, memory usage, and concurrent job run statistics.

Further reading
• Information Management best practices: http://www.ibm.com/developerworks/data/bestpractices/
• IBM Information Server best practices: https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityUuid=23355691-5fbb-4d69-bcd9-1dd5358daa45

Contributors
Yong Li, Development Manager – InfoSphere High Performance Engine
Xiaoyan Pu, Software Development Engineer
Hanson Lieu, Software Development Engineer
Ron Liu, InfoSphere Information Server Performance
Len Greenwood, Software Architect
Ashley Holland, DataStage Software Developer

Notices

This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

Without limiting the above disclaimers, IBM provides no representations or warranties regarding the accuracy, reliability or serviceability of any information or recommendations provided in this publication, or with respect to any results that may be obtained by the use of the information or observance of any recommendations provided herein. The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any recommendations or techniques herein is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Anyone attempting to adapt these techniques to their own environment does so at their own risk.

This document and the information contained herein may be used solely in connection with the IBM products discussed in this document.
This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: © Copyright IBM Corporation 2012, 2013. All Rights Reserved.

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml

Windows is a trademark of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.

Contacting IBM

To contact IBM in your country or region, check the IBM Directory of Worldwide Contacts at http://www.ibm.com/planetwide

To learn more about IBM Information Management products, go to http://www.ibm.com/software/data/