IBM® InfoSphere® Information Server
Best practices
Workload Management Server Overview and Best Practices
Yong Li
Development Manager – InfoSphere High Performance Engine
Xiaoyan Pu
Software Development Engineer
Hanson Lieu
Software Development Engineer
Ron Liu
InfoSphere Information Server Performance
Len Greenwood
Software Architect
Ashley Holland
DataStage Software Developer
Issued: January 2013
Workload Management Server Overview and Best Practices
Introduction
WLM Server Architecture
Installation and Configuration
  Enable DataStage Operations Console and WLM Server
    Unix Platform
    Windows Platform
  Restart DataStage Server
    Unix Platform
    Windows Platform
  Start DataStage AppWatcher Process
    Unix Platform
    Windows Platform
WLM User Interface
  System Policies
    Job Count
    CPU Usage
    Memory Usage
    Job Start
  Queued Jobs Tab
  Queue Management Tab
  DataStage Administrator Client Integration
  DataStage Designer Client Integration
  DataStage Director Client Integration
  dsjobs Integration
  DataStage Operational Console Integration
Workload Priority Rules
  Priority Weight
  Job Run Ratio
  Elapsed Time
WLM Server Security
Advanced Configurations
  WLM Configuration File
  Advanced Error Handling Configuration
  uvconfig Tuning
Troubleshooting
Best Practice
  Defining queues based on different criteria
    Job Characteristics
    Projects
    Parallel Configurations
    Development/Testing/Production Environments
  Dynamic Resource Update
  Dynamic Priority Rule Update
  Use Cases
    You want no more than 20 jobs to run in the whole system, regardless of where they come from.
    You have 4 application groups who submit jobs at random. You want to ensure that at any time that a group submits a job, they can get a slot to run it in - another group can't hog the system.
Conclusion
Further reading
  Contributors
Notices
  Trademarks
  Contacting IBM
Introduction
Many systems are limited by one or more of the following resources:
• I/O
• Memory
• CPU
• Network bandwidth
When the system is underutilized, resources are wasted. When the system is overloaded,
jobs run into a range of problems. Common symptoms include network timeouts,
slow job startup, job hangs, and even job crashes caused by out-of-memory conditions.
Even jobs that do run tend to run much slower than they would if properly scheduled,
because of excessive swapping and context switching. IBM InfoSphere Information Server
Workload Management (WLM) solves this problem by regulating the workload execution
environment to maximize system throughput and maintain a stable and much more
predictable runtime environment.
WLM is a new server component introduced in the IBM Information Server 9.1 release.
It is installed on the Engine tier. Its main role is to monitor machine resource usage
such as CPU and memory, keep track of the workload (job) count, and either dispatch
jobs for execution immediately or place them in a queue and release them for
execution when resources become available.
This article describes the WLM Server system architecture, installation and configuration
process, WLM user interface, workload dispatching rules and best practices to leverage
the power of WLM Server.
WLM Server Architecture
The following diagram shows the high level WLM Server architecture:
The WLM user interface is embedded as a tab in IBM InfoSphere Information Server
DataStage Operations Console user interface and can be accessed using the following
URL:
http://host:port/ibm/iis/ds/console
The WLM user interface communicates with DataStage services running inside IBM
WebSphere Application Server via HTTP/REST APIs. The DataStage services in turn
communicate with the WLM Server process via the ASBAgent proxy.
Installation and Configuration
WLM Server is installed as part of the DataStage server installation.
By default, DataStage Operations Console and WLM Server are disabled. Use the
following procedures to turn them on after installation.
Note that the actual WLM binary is located in
/opt/IBM/InformationServer/Server/DSWLM.
Enable DataStage Operations Console and WLM Server
UNIX Platform
Log in as dsadm and edit the DataStage Operations Console configuration file:
/opt/IBM/InformationServer/Server/DSODB/DSODBConfig.cfg
Change:
DSODBON=0
to:
DSODBON=1
This will enable the DataStage Operations Console feature.
Change:
WLMON=0
to:
WLMON=1
This will enable the WLM Server feature.
There are many other parameters that you can configure to further fine tune the behavior
of DataStage Operations Console and WLM Server. To get started, all you need is to turn
on DSODBON and WLMON.
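The edit described above can also be scripted. The following is a minimal Python sketch, assuming the file consists of plain KEY=VALUE lines with # comments; the helper name enable_flags and the sample text (including the OTHER key) are hypothetical and only illustrate the idea.

```python
import re

def enable_flags(config_text, flags=("DSODBON", "WLMON")):
    """Return config_text with each named flag set to 1.

    Assumes KEY=VALUE lines; comment lines and all other
    settings are passed through unchanged.
    """
    out = []
    for line in config_text.splitlines():
        for flag in flags:
            # Only match an active setting, not a commented-out one.
            if re.match(rf"^\s*{re.escape(flag)}\s*=", line):
                line = f"{flag}=1"
                break
        out.append(line)
    return "\n".join(out) + "\n"

# Hypothetical sample content, not the real file.
sample = "# sample settings\nDSODBON=0\nWLMON=0\nOTHER=5\n"
print(enable_flags(sample))
```

In practice you would read DSODBConfig.cfg, keep a backup copy, write the result back, and then restart the DataStage server as described in the next section.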
Windows Platform
If your installation is on a Windows server, edit
C:\IBM\InformationServer\Server\DSODB\DSODBConfig.cfg and set the following
to turn on DataStage Operations Console and WLM Server:
DSODBON=1
WLMON=1
Restart DataStage Server
Unix Platform
First source the dsenv file
cd /opt/IBM/InformationServer/Server/DSEngine
. ./dsenv
Then stop and restart the DSEngine:
bin/uv -admin -stop
bin/uv -admin -start
Windows Platform
The first option is to open the DataStage control panel applet, click Stop All Services,
and then click Start All Services to restart the DataStage Engine process.
You can also run the following script to stop and start DataStage Engine:
C:\work>net stop dstelnet
The DataStage Telnet Service is stopping.
The DataStage Telnet Service was stopped successfully.
C:\work>net stop dsengine
The InfoSphere Engine Resource Service is stopping............
The InfoSphere Engine Resource Service was stopped successfully.
C:\work>net stop dsrpc
The DSRPC Service is stopping.
The DSRPC Service was stopped successfully.
C:\work>net start dsrpc
The DSRPC Service is starting.
The DSRPC Service was started successfully.
C:\work>net start dsengine
The InfoSphere Engine Resource Service is starting.
The InfoSphere Engine Resource Service was started successfully.
C:\work>net start dstelnet
The DataStage Telnet Service is starting.
The DataStage Telnet Service was started successfully.
Once WLM Server is enabled in the DSODBConfig.cfg file, WLM starts whenever the
DataStage server starts and stops whenever the DataStage server stops. You do not
need to start or stop WLM separately.
Start DataStage AppWatcher Process
Unix Platform
Run the following command to check the correctness of the configuration file:
bash-3.2$ pwd
/opt/IBM/InformationServer/Server/DSODB
bash-3.2$ bin/DSAppWatcher.sh -test
DSODB is turned ON in the DSODBConfig.cfg file.
Link Monitoring is OFF.
Job Run Usage is ON.
Resource Monitoring is ON.
Checking Database Connection:
Driver: com.ibm.db2.jcc.DB2Driver
Connection URL: jdbc:db2://host:50000/xmeta
Successfully loaded the database driver.
Successfully connected to the database.
Schema: DSODB
DB Schema version number: 2
Test Successful.
bash-3.2$
Then run the following command to start it:
bash-3.2$ bin/DSAppWatcher.sh -start
AppWatcher:STARTED
EngMonApp:STARTING
ODBQueryApp:STARTING
ResMonApp:STARTING
bash-3.2$
Windows Platform
Run the following command to check correctness of configuration file:
C:\IBM\InformationServer\Server\DSODB>bin\DSAppWatcher.sh -test
DSODB is turned ON in the DSODBConfig.cfg file.
Link Monitoring is OFF.
Job Run Usage is ON.
Resource Monitoring is ON.
Checking Database Connection:
Driver: com.ibm.db2.jcc.DB2Driver
Connection URL: jdbc:db2://host:50000/xmeta
Successfully loaded the database driver.
Successfully connected to the database.
Schema: DSODB
DB Schema version number: 2
Test Successful.
Note that on Windows, the DataStage Application Watcher process runs as a Windows
service. To start it, run the following command or use the Windows Services
applet.
C:\IBM\InformationServer\Server\DSODB>net start IBMAPWSrv
The DataStage AppWatcher Service is starting.........
The DataStage AppWatcher Service was started successfully.
C:\IBM\InformationServer\Server\DSODB>
WLM User Interface
Once you have turned on DataStage Operations Console and WLM Server, you can log in
to DataStage Operations Console to review all available features in the Operations
Console and WLM.
When you first log in to the DataStage Operations Console application, the Engine
Status panel is displayed. In that panel, wlmserver: OK indicates that the WLM Server
process is running properly.
When you switch to the Workload Management tab, you see an interface with two main
sections:
• System Policies setting
• Queuing System (includes Queued Jobs and Queue Management)
System Policies
System policies define system wide settings for the WLM Server instance. When
determining whether to dispatch jobs for immediate execution, or put a job run request
into the queuing system, WLM first checks System Policies before it checks queue level
policies.
Job Count
This setting controls how many jobs are allowed to run concurrently on this server. You
can adjust Job Count depending on the capacity of the system and the characteristics of
the jobs. The default value for this setting is 20. You can increase it if the server has
many CPU cores and a large amount of memory, or decrease it otherwise.
Click Currently Running to open the Job Activity graph.
CPU Usage
When WLM detects that system CPU usage exceeds this configured threshold, it
places incoming job run requests into a queue. The default value for this setting is 80%.
Click Current CPU (%) to open the CPU Usage graph.
Memory Usage
When WLM detects that memory usage exceeds this configured threshold, it places
incoming job run requests into a queue. The default value for this setting is 80%.
Click Current Memory (%) to open the Memory Usage graph.
Job Start
This setting defines a sliding time window within which the number of job start requests
cannot exceed the configured threshold. For example, with a threshold of 100 jobs and a
10-second window, WLM ensures that no more than 100 jobs are started within any
10-second period.
Depending on your specific requirements and use cases, you can use one, two, three, or
all four settings concurrently to throttle the system. For instance, if you set CPU usage to
100%, you effectively disable the CPU check. If you set CPU usage to 0%, you effectively
put all incoming jobs into queues.
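The way the four system policies combine can be sketched in Python as follows. This is an illustrative model only, not the WLM implementation: the class name SystemPolicies, its fields, and the order of the checks are assumptions. The defaults of 20 jobs and 80% come from the text above, and 100 starts per 10 seconds is the example from the Job Start description.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class SystemPolicies:
    max_jobs: int = 20          # Job Count
    cpu_limit: float = 80.0     # CPU Usage threshold (%)
    mem_limit: float = 80.0     # Memory Usage threshold (%)
    start_limit: int = 100      # Job Start: maximum starts ...
    start_window: float = 10.0  # ... per sliding window (seconds)
    recent_starts: deque = field(default_factory=deque)

    def can_start(self, now, running_jobs, cpu_pct, mem_pct):
        """Return True to dispatch now, False to queue the request."""
        # Forget start timestamps that have slid out of the window.
        while self.recent_starts and now - self.recent_starts[0] >= self.start_window:
            self.recent_starts.popleft()
        if running_jobs >= self.max_jobs:
            return False
        # A limit of 100 can never be exceeded, which disables the check.
        if cpu_pct > self.cpu_limit or mem_pct > self.mem_limit:
            return False
        if len(self.recent_starts) >= self.start_limit:
            return False
        self.recent_starts.append(now)
        return True

policies = SystemPolicies()
print(policies.can_start(now=0.0, running_jobs=3, cpu_pct=45.0, mem_pct=60.0))
print(policies.can_start(now=1.0, running_jobs=3, cpu_pct=92.0, mem_pct=60.0))
```

The first call passes every check; the second is rejected because the sampled CPU usage is above the 80% threshold.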
You can change system policies dynamically and click Apply to make WLM Server
apply the new settings. There is no need to restart WLM Server; the new configuration
is applied and enforced immediately.
Queued Jobs Tab
The Queued Jobs tab lists all of the available queues in the WLM Server instance, and the
queuing information for those queues.
In the example, there are a number of queues on the system:
• DataWarehouseQueue
• MyQueue
• IA (Reserved queue)
• ISD (Reserved queue)
• DataClick (Reserved queue)
• HighPriorityJobs
• LowPriorityJobs
• MediumPriorityJobs
Reserved queues (IA, ISD, and DataClick) are special queues for IBM InfoSphere
Information Analyzer, IBM InfoSphere Information Service Director, and IBM InfoSphere
DataClick. They cannot be deleted or modified.
For each non-empty queue, the tab also lists the number of jobs and, for each job, the
job name, project name, process ID, and so on. A user with the DataStage Administrator
role sees all jobs in the queue. A non-administrative user sees only jobs that belong to
projects that they have access to.
There is also a special link for active running jobs. Click this link to open a new page
that shows a list of currently running jobs.
If you are the IBM Information Server administrator, you can also perform the following
operations in this user interface:
• Move jobs into different queues
• Move jobs to the top of the queue
• Remove jobs from a queue
If you are not an IBM Information Server administrator, you can remove a job from a
queue if the job belongs to a project that you have access to.
Queue Management Tab
The Queue Management tab provides queue administration functions. Here you can
select a priority rule, create new queues, modify existing queue properties, or delete
queues that are no longer required. Note that a queue must be empty before it can be
deleted. If there are jobs in it, the queue cannot be deleted.
The queue-level policy for Job Count limits how many jobs submitted to that queue can
run concurrently. If a job passes the system-level policy checks, WLM Server then checks
the queue-level policy before deciding whether to run the job.
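The two-level check can be pictured with a small Python sketch. It is a simplification that looks only at Job Count (the CPU, memory, and Job Start policies are left out), and the function name and return strings are hypothetical.

```python
def dispatch_decision(system_running, system_job_count,
                      queue_running, queue_job_count):
    """System-level Job Count is checked first, then the queue-level one."""
    if system_running >= system_job_count:
        return "queued: system Job Count reached"
    if queue_running >= queue_job_count:
        return "queued: queue Job Count reached"
    return "dispatch"

# 12 of 20 system slots in use, but the queue already runs its 5 allowed jobs.
print(dispatch_decision(12, 20, 5, 5))
```

A job is dispatched only when both levels have a free slot, so a queue-level limit can hold a job back even when the system as a whole has capacity.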
To create a new queue, click New Queue. In the Queue Management - New Queue
dialog box, enter a queue name, specify whether this queue is the default queue, set the
queue priority and the maximum number of running jobs for the queue, and give a short
description. Click Save to create the queue.
You can also modify the following queue properties of existing queues by double
clicking cells in the queue management table:
• Default
• Priority setting
• Max running jobs
• Queue description
After you have modified the settings, select Save to save the changes.
DataStage Administrator Client Integration
When WLM is turned on, you can define a default queue for each project at the
DataStage project level. When a user runs a DataStage job without specifying a queue,
the job is routed to the project default queue.
DataStage Designer Client Integration
When you run a job from DataStage Designer, you see an option to select which queue
to run the job from.
DataStage Director Client Integration
When you run a job from DataStage Director, you see an option to select which queue
to run the job from.
dsjobs Integration
You can pass the -queue option when invoking the dsjob command to run a job. For
example:
C:\IBM\InformationServer\Clients\Classic\dsjob.exe -domain NONE -user <user> -password <password> -server <Engine> -run -queue HighPriorityJobs -warn 0 -wait -jobstatus dstage1 MyTestJob
You can also use the following command to list the available queues:
C:\work>C:\IBM\InformationServer\Clients\Classic\dsjob.exe -domain NONE -user <user> -password <password> -server <Engine> -lqueues
LowPriorityJobs
MediumPriorityJobs
HighPriorityJobs
WarehouseIntegrationQueue
Status code = 0
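If you submit jobs from a script, the run command can be assembled programmatically. The Python sketch below only builds the argument list, using the options from the example in this section (written with a leading dash on each option); the helper name build_dsjob_run is hypothetical and the placeholder values must be replaced with real ones.

```python
def build_dsjob_run(dsjob, user, password, engine, queue, project, job):
    """Build the argument list for the dsjob -run example above."""
    return [
        dsjob,
        "-domain", "NONE",
        "-user", user,
        "-password", password,
        "-server", engine,
        "-run",
        "-queue", queue,      # WLM queue to submit the job to
        "-warn", "0",
        "-wait",
        "-jobstatus",
        project, job,
    ]

cmd = build_dsjob_run(
    r"C:\IBM\InformationServer\Clients\Classic\dsjob.exe",
    "<user>", "<password>", "<Engine>",
    "HighPriorityJobs", "dstage1", "MyTestJob",
)
print(" ".join(cmd))
# To execute it, pass the list to subprocess.run(cmd) from the standard library.
```

Passing the arguments as a list avoids shell quoting problems with passwords or job names that contain spaces.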
DataStage Operational Console Integration
If you run jobs directly from the DataStage Operations Console, you see an option to
select which queue to run the job from.
Workload Priority Rules
There are three built-in queue priority rules:
• Priority Weight
• Job Run Ratio
• Elapsed Time
The following sections discuss the semantics of each rule and the specific use case.
Priority Weight
The priority of a job is derived from the priority of the queue it was submitted to and
from the elapsed time since the job was submitted to the queue. This is the default
rule. The priority weight offset is roughly 15 minutes. If three jobs are submitted at the
same time to a high, a medium, and a low priority queue respectively, then, assuming
enough resources are available, the medium priority job starts 15 minutes later than the
high priority job, and the low priority job starts 15 minutes later than the medium
priority job. A high priority job submitted within this 15-minute window runs before
the medium priority job.
You should select this rule if priority is important. In terms of resource allocation, a high
priority queue should not take more resources than it actually needs. The rule here is to
ensure high priority jobs are run as soon as possible. Having fewer high priority
concurrent jobs helps achieve this goal as each job can get more physical resources such
as CPU and memory. This approach also makes sure that when there are no high priority
jobs, more resources can be utilized by medium priority jobs, or low priority jobs.
This rule, if applied to the queues with the same priority, falls back to the ElapsedTime
rule.
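One way to reason about the Priority Weight rule is to treat each priority step as a 15-minute handicap added to the submission time. The Python sketch below uses that model to rank queued jobs. It is an interpretation of the description above, not the actual WLM algorithm; the names OFFSET_MIN, RANK, effective_time, and start_order are hypothetical.

```python
OFFSET_MIN = 15  # approximate priority weight offset from the text
RANK = {"high": 0, "medium": 1, "low": 2}

def effective_time(submit_min, queue_priority):
    """Submission time in minutes, pushed back 15 minutes per priority step."""
    return submit_min + RANK[queue_priority] * OFFSET_MIN

def start_order(jobs):
    """jobs: list of (name, submit_min, queue_priority) tuples.

    Earliest effective time first; ties fall back to submission time,
    which mirrors the fallback to ElapsedTime for equal priorities.
    """
    ranked = sorted(jobs, key=lambda j: (effective_time(j[1], j[2]), j[1]))
    return [name for name, _, _ in ranked]

jobs = [
    ("low1", 0, "low"),
    ("med1", 0, "medium"),
    ("high1", 0, "high"),
    ("high2", 10, "high"),   # submitted 10 minutes later
]
print(start_order(jobs))  # ['high1', 'high2', 'med1', 'low1']
```

In this model high2 is ranked ahead of med1 because it was submitted inside the 15-minute window, while a high priority job submitted 20 minutes later would rank behind med1.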
Job Run Ratio
If you select Job Run Ratio, you also need to specify the High to Medium and Medium to
Low ratios. The priority of a job is derived from the priority of the queue it was
submitted to. The ratios determine how many jobs are started from a high priority queue
before a job is started from a medium priority queue, and how many jobs are started
from a medium priority queue before a job is started from a low priority queue.
Although JobRunRatio is designed to support different priorities, you can set this rule
and assign the same priority to all queues. For example, the job run ratio 0:20:0 means
that 20 jobs from medium priority queues can run concurrently, assuming there are no
high and low priority queues. Use the rule this way if neither queued time nor priority
is a concern.
This rule should be considered if you want to maintain priorities and also balance jobs
across queues. Queued time is not a factor in this rule. If there are multiple
queues with the same priority, the job that calls back to WLM first gets the chance to run.
For example, the job run ratio 5:2:1 means 5 high priority jobs, 2 medium priority jobs,
and 1 low priority job. If there are 2 high priority queues, 3 medium priority queues, and
1 low priority queue, the 5 high priority jobs can come from one or both high priority
queues and the 2 medium priority jobs can come from one or two medium priority queues,
depending on which jobs call back first. Note that the job run ratio is defined
relative to medium priority queues. If all the medium priority queues are empty, jobs from
high and low priority queues run as soon as they call back, until one of the resource
constraints is met. If the low priority queues are empty, the High to Medium ratio still
applies. Similarly, if the high priority queues are empty, the Medium to Low ratio still
applies.
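The 5:2:1 example can be illustrated with a simplified round-based model in Python. This sketch assumes that jobs within a priority level are taken in the order given and that each round starts up to 5 high, 2 medium, and 1 low priority job. It does not model call-back timing, resource constraints, or the special handling when medium priority queues are empty, and the function name run_ratio_order is hypothetical.

```python
from collections import deque

def run_ratio_order(high, medium, low, ratio=(5, 2, 1)):
    """Return job names in the order they would be started under the model."""
    pools = [deque(high), deque(medium), deque(low)]
    order = []
    while True:
        started = 0
        for pool, share in zip(pools, ratio):
            # Start up to `share` jobs from this priority level per round.
            for _ in range(share):
                if pool:
                    order.append(pool.popleft())
                    started += 1
        if started == 0:   # nothing left that the ratio allows to start
            break
    return order

high = ["H1", "H2", "H3", "H4", "H5", "H6"]
medium = ["M1", "M2", "M3"]
low = ["L1", "L2"]
print(run_ratio_order(high, medium, low))
```

With these inputs the first round starts H1 to H5, M1, M2, and L1, and the second round starts the remaining H6, M3, and L2.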
Elapsed Time
The priority of a job is derived from the elapsed time since the job was submitted to the
queue. This rule gives the highest priority to the job that was submitted first, irrespective
of the queue it was submitted to.
You can select ElapsedTime as the priority rule and assign the same default priority to all
the queues in the system. A simple way to allocate resources is to apply the same JobCount
to every queue: determine the JobCount per queue first, then multiply it by the number
of queues to get the JobCount for the entire system. This approach makes it easy to
achieve fairness, but resources claimed by empty queues cannot be reallocated, even if
other queues have pending jobs.
Another way to allocate resources is to distribute more JobCount resources to some
queues than others depending on the specific needs of a queue. This approach allows
you to take into consideration the importance of a queue without having to change
ElapsedTime to PriorityWeight.
The ElapsedTime rule is based on the queued time only regardless of the priority of a
queue. If you want to take priorities into consideration, you need to select
PriorityWeight.
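The ElapsedTime ordering and the even JobCount allocation described in this section can be sketched in Python as follows. The function names, queue names, and the numbers in the example (4 queues with 5 jobs each) are hypothetical illustrations.

```python
def system_job_count(per_queue_job_count, num_queues):
    """Even allocation: every queue gets the same JobCount."""
    return per_queue_job_count * num_queues

def elapsed_time_order(jobs):
    """jobs: list of (name, queue, submit_time) tuples.

    The job submitted first is started first, whatever queue it is in.
    """
    return [name for name, _, _ in sorted(jobs, key=lambda j: j[2])]

print(system_job_count(5, 4))  # 20
print(elapsed_time_order([
    ("jobA", "Queue1", 30),
    ("jobB", "Queue2", 10),
    ("jobC", "Queue1", 20),
]))  # ['jobB', 'jobC', 'jobA']
```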
WLM Server Security
To access the WLM user interface, a user must have the DataStage Administrator role or a
DataStage project role (DataStage Operations Console viewer role or above).
With the DataStage Administrator role, the user can perform the following operations:
• See all queues in WLM Server
• See all jobs in all queues
• Change system policies
• Create, edit, or delete any queues except the system reserved queues for IA, ISD, and DataClick
• Move jobs between different queues
With a DataStage project role, the user can perform the following operations:
• See all queues
• See jobs in a queue if the jobs are in projects that the user has access to; otherwise, the user sees only the total number of jobs in the queue, not specific job details
• See system policies, but not update them
A user with only a DataStage project role cannot move jobs to different queues or
promote jobs to the top of the queue.
Advanced Configurations
WLM Configuration File
The WLM configuration file is located at
/opt/IBM/InformationServer/Server/DSWLM/dist/lib/wlm.config.xml. This file is updated
by the WLM user interface, and it should be backed up whenever you back up InfoSphere
Information Server.
Advanced Error Handling Configuration
If the default WLM behavior does not completely fit your needs, there are three
additional variables you can tune to adjust the interaction between
DataStage and WLM. They are set in the DSODBConfig.cfg file, for example:
C:\IBM\InformationServer\Server\DSODB\DSODBConfig.cfg
# The following allows a job to run outside of WLM if communication between the DataStage runtime and WLM failed.
# A setting of 0 will stop the job if communication with the WLM failed.
# A setting of 1 will not send the job to the WLM. It will run immediately.
WLM_CONTINUE_ON_COMMS_ERROR=0
# The following sends a job to the default queue if the queue specified is no longer valid.
# A setting of 0 will stop the job if the queue specified is invalid.
# A setting of 1 will send the job to the default WLM queue.
WLM_CONTINUE_ON_QUEUE_ERROR=0
# The following specifies the time a job will wait on the pending queue.
# If this time has been exceeded, the job will be stopped and removed from the queue.
# A value of 0 means do not time out.
WLM_QUEUE_WAIT_TIMEOUT=0
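The effect of the three settings can be summarized as a small decision function. The Python sketch below restates the comments in the configuration file; it is not taken from the product. The function name, parameters, and return strings are hypothetical, and because the unit of WLM_QUEUE_WAIT_TIMEOUT is not stated here, waited is simply assumed to be in the same unit as the timeout.

```python
def resolve_submission(cfg, comms_ok, queue_valid, waited):
    """Describe what happens to a job under the three settings."""
    if not comms_ok:
        # 1 = bypass WLM and run immediately, 0 = stop the job.
        if cfg["WLM_CONTINUE_ON_COMMS_ERROR"] == 1:
            return "run outside WLM"
        return "stop"
    if queue_valid:
        target = "requested queue"
    elif cfg["WLM_CONTINUE_ON_QUEUE_ERROR"] == 1:
        target = "default queue"
    else:
        return "stop"
    timeout = cfg["WLM_QUEUE_WAIT_TIMEOUT"]
    # 0 means never time out.
    if timeout != 0 and waited > timeout:
        return "stop (timed out)"
    return f"wait in {target}"

defaults = {
    "WLM_CONTINUE_ON_COMMS_ERROR": 0,
    "WLM_CONTINUE_ON_QUEUE_ERROR": 0,
    "WLM_QUEUE_WAIT_TIMEOUT": 0,
}
print(resolve_submission(defaults, comms_ok=True, queue_valid=False, waited=0))
```

With the default values shown in the file, a job that names an invalid queue is stopped rather than rerouted.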
uvconfig Tuning
When jobs are in a running state or a queued state, they consume DSEngine lock resources.
Specifically, the DSD_RUN process acquires the following locks:
• RT_CONFIG lock, to prevent others from compiling or deleting the current job
• UV.ACCOUNT lock, to prevent others from deleting the current project
The default configuration in uvconfig supports up to 20 concurrently running jobs and 150
queued jobs. If you need to run or queue more jobs, you may need to adjust the
uvconfig parameters.
The following table lists some of the tested configurations from an internal performance
study:
T30FILE   RLTABSZ   Concurrent Run   Max Queue Size
512       300       20               150
1024      375       20               300
4096      480       20               1200
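As a convenience, the tested configurations can be encoded as a lookup that returns the smallest tested setting covering a desired queue size. The Python sketch below is hypothetical helper code; it only reproduces the three table rows and says nothing about values in between or beyond them.

```python
# (T30FILE, RLTABSZ, concurrent_run, max_queue_size) rows from the table above
TESTED = [
    (512, 300, 20, 150),
    (1024, 375, 20, 300),
    (4096, 480, 20, 1200),
]

def smallest_tested_config(queue_size):
    """Return the first tested row whose Max Queue Size covers queue_size."""
    for t30file, rltabsz, _concurrent, max_queue in TESTED:
        if queue_size <= max_queue:
            return {"T30FILE": t30file, "RLTABSZ": rltabsz}
    return None  # larger than anything in the tested set

print(smallest_tested_config(200))  # {'T30FILE': 1024, 'RLTABSZ': 375}
```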
Troubleshooting
WLM generates a trace file in the /opt/IBM/InformationServer/Server/DSWLM/logs
folder. A new log file is created daily, and when a log file exceeds 100 MB, WLM
switches to the next log file automatically.
If you need to trace how the DataStage runtime sends jobs to WLM, you can turn on tracing
from the DataStage Administrator client: select the project name, click Properties, switch
to the Tracing tab, and select the Enabled option.
Best Practice
WLM provides flexibility for you to add and update queues in terms of priority and
resource constraints in addition to system-level resource configuration capability.
However, flexibility can lead to complexity if it is not clear how priority rules and
resource policies work together to better manage workloads.
This section describes some use cases that can help demonstrate WLM queue
management functionality. It starts with defining queues with the same priority but
different criteria, then moves on to cover various scenarios for different priority rules.
Defining queues based on different criteria
Queues can be created based on different criteria, as shown below.
Job Characteristics
In this scenario, a job characteristic is used to define a queue. It can be either a design
characteristic or a performance characteristic.
Queues
• ISD job queue
• IA job queue
• QS job queue
• Data Bridge job queue
• Data Connectivity job queue
• Parallel job queue
• Server job queue
• Sequence job queue
• CPU intensive job queue
• IO intensive job queue
• Large job queue with over 50 stages per job
• Medium job queue with between 15 and 50 stages per job
• Small job queue with fewer than 15 stages per job
Projects
You can map queues to on-going projects, one project per queue. There could be one or
multiple projects depending on business application of a group or multiple groups.
Parallel Configurations
You can balance system resources by defining queues based on parallel configuration.
• Large parallel job queue, where jobs run on 4 or more nodes
• Medium parallel job queue, where jobs run on 2 nodes
• Small parallel job queue, where jobs run on 1 node
JobCount is not distributed evenly; instead, more is allocated to 1-node jobs than to
4-node jobs, because a 1-node job creates fewer processes. An example JobCount
allocation: a total JobCount of 10, with 5 for 1-node jobs, 3 for 2-node jobs,
and 2 for 4-node jobs.
Development/Testing/Production Environments
If a system is shared by development, testing, and production, you may want to create
queues based on this sharing characteristic.
• Development queue for jobs under development
• Testing queue for jobs being tested
• Production queue for jobs in production
As in the preceding scenario, JobCount can be divided unevenly. For example, with a total
JobCount of 20, you could allocate 10 to production, 5 to testing, and 5 to development.
Dynamic Resource Update
Resources can be adjusted dynamically among queues to meet specific application needs. In
example 9.1.4, you can adjust the queue-level JobCount to accommodate different schedules.
Production jobs usually run overnight, so the JobCount for the production queue can be
changed to 20 overnight and back to 0 during the day. The testing and development queues
share the same system during the day. If there are 5 developers, you can set the JobCount
for the development queue to 5 and leave 15 for the testing queue. Development jobs tend
to have frequent, quick runs, so the development queue should have medium priority. If
there is a time window in the afternoon when intensive testing may occur, you can change
the testing queue to high priority and change it back to low priority after the scheduled
testing is done. Without workload management, developers may have to log out of the system
when the testing workload gets heavy, to save resources for testing.
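The day and night schedule described above can be sketched as a small lookup. This is a
minimal illustration, not a WLM interface: the function name, the day boundaries (06:00 and
20:00), and the overnight values for the development and testing queues are assumptions.
How the values are applied to the WLM server is outside the scope of the sketch.

```python
from datetime import time

# Hypothetical boundaries; the text only says "overnight" and "during the day".
DAY_START = time(6, 0)
DAY_END = time(20, 0)


def queue_job_counts(now):
    """Return the queue-level JobCount values for a given time of day.

    The daytime values (production 0, development 5, testing 15) and the
    overnight production value (20) come from the text. Setting development
    and testing to 0 overnight is an assumption.
    """
    if DAY_START <= now < DAY_END:
        return {"production": 0, "development": 5, "testing": 15}
    return {"production": 20, "development": 0, "testing": 0}


print(queue_job_counts(time(10, 30)))
print(queue_job_counts(time(23, 0)))
```

In both periods the queue-level values add up to the system-level JobCount of 20.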
JobCount is only a rough estimate of system resource utilization, so the system can become
overloaded when, for example, 10 CPU intensive jobs are running rather than 10 I/O
intensive jobs. You can decrease the value of JobCount so that fewer CPU intensive jobs
run, but you then need to increase it again when non-CPU intensive jobs are running. If
system-level JobCount is hard to tune, you can keep JobCount as it is and apply a CPU cap,
a memory cap, or a StartJob policy to prevent the system from being overloaded.
CPU and memory caps are also designed for systems where InfoSphere Information Server
shares physical resources with other application software. You can limit how much CPU (or
memory) Information Server can use, so that other software applications on the same system
also get enough physical resources to run.
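The way these limits combine can be sketched as a single admission check. The class, the
function, and the 80% cap values below are hypothetical and chosen only for illustration.
The sketch assumes that a job may start only while every limit still has headroom; the
exact comparison that WLM performs is not specified here.

```python
from dataclasses import dataclass


@dataclass
class SystemPolicy:
    job_count: int      # maximum number of concurrently running jobs
    cpu_cap_pct: float  # CPU usage (percent) at which new jobs are held
    mem_cap_pct: float  # memory usage (percent) at which new jobs are held


def may_start_job(policy, running_jobs, cpu_pct, mem_pct):
    """Return True only if every system-level limit still has headroom."""
    return (
        running_jobs < policy.job_count
        and cpu_pct < policy.cpu_cap_pct
        and mem_pct < policy.mem_cap_pct
    )


policy = SystemPolicy(job_count=20, cpu_cap_pct=80.0, mem_cap_pct=80.0)
# Ten CPU intensive jobs can saturate the CPU long before JobCount is reached.
print(may_start_job(policy, running_jobs=10, cpu_pct=95.0, mem_pct=40.0))  # False
# Ten I/O intensive jobs leave CPU headroom, so another job may start.
print(may_start_job(policy, running_jobs=10, cpu_pct=35.0, mem_pct=40.0))  # True
```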
Dynamic Priority Rule Update
Unlike resources, priority rules do not need to change whenever queues are updated. You
set the priority rule up front, before creating queues. Select PriorityWeight if both
priority and queued time are important. Select ElapsedTime if queued time is important but
priority is not. Select JobRunRatio if priority is important but queued time is not.
However, you can dynamically update the priority rule if the relative importance of
priority and queued time changes. For example, if you have selected PriorityWeight but too
many medium or low priority jobs are being queued, you can switch from PriorityWeight to
ElapsedTime instead of moving those jobs to high priority queues. As another example, if
you have selected JobRunRatio but jobs are being blocked because of low priority jobs, you
can switch from JobRunRatio to PriorityWeight instead of changing all medium priority
queues to high priority. As a third example, if you have selected ElapsedTime but it slows
down high priority jobs, you can switch to JobRunRatio so that jobs of every priority get
a chance to run.
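The selection guidance above reduces to a two-question decision. The helper below is a
hypothetical sketch of that decision, not part of WLM. The text does not cover the case
where neither criterion matters, so returning the default rule in that case is an
assumption.

```python
def choose_priority_rule(priority_matters, queued_time_matters):
    """Map the two criteria from the text to a priority rule name."""
    if priority_matters and queued_time_matters:
        return "PriorityWeight"
    if queued_time_matters:
        return "ElapsedTime"
    if priority_matters:
        return "JobRunRatio"
    # Not covered by the text: fall back to the default rule (an assumption).
    return "PriorityWeight"


print(choose_priority_rule(priority_matters=True, queued_time_matters=False))
```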
Use Cases
This section describes some common ways you might want to restrict job runs on a
system, and how you could achieve that with the WLM controls currently defined. The
examples go from simple to more complex.
Scenario 9.4.3 shows why you would set up more than one queue at the same priority.
Scenario 9.4.3b shows when to use JobRunRatio, which does not take queued time into
consideration and places no emphasis on the order of job submission.
9.4.1. You want no more than 20 jobs to run in the whole system, regardless of where
they come from
o Set up a single queue, with job count max = 20.
o Ensure System job count also = 20.
o Queue priority rule: irrelevant so leave as default
9.4.1b. You don't want to overstress the system because you know that when it runs at
near capacity things start to fail.
Add:
- Set a system CPU limit, a memory limit, a "no more than N jobs in X seconds" policy, or
any combination of these.
Note: This applies to all scenarios.
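A "no more than N jobs in X seconds" policy can be pictured as a rate limit on job starts.
The class below is a minimal sketch that assumes a sliding time window. The class name and
its parameters are hypothetical, and the way WLM measures the interval may differ.

```python
from collections import deque


class StartJobPolicy:
    """Allow at most max_starts job starts within any window_seconds window."""

    def __init__(self, max_starts, window_seconds):
        self.max_starts = max_starts
        self.window_seconds = window_seconds
        self._starts = deque()  # timestamps (seconds) of recent job starts

    def try_start(self, now):
        """Record a start at time `now` and return True if the policy allows it."""
        # Forget starts that have fallen out of the window.
        while self._starts and now - self._starts[0] >= self.window_seconds:
            self._starts.popleft()
        if len(self._starts) >= self.max_starts:
            return False
        self._starts.append(now)
        return True


# At most 2 job starts in any 10-second window.
policy = StartJobPolicy(max_starts=2, window_seconds=10)
print([policy.try_start(t) for t in (0, 1, 2, 10, 11, 12)])
# [True, True, False, True, True, False]
```

A job that is refused is not lost; in this picture it simply stays queued and is retried
once earlier starts have aged out of the window.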
9.4.2. You only want 20 jobs to run in the whole system, but some are higher priority
than others, and need to go first when there is a spare slot.
o Set up 2 queues: High and Medium.
o Each queue has a job count max = 20
o Ensure System job count also = 20.
o Queue priority rule: leave the default, i.e. Priority Weight, which means that
anything that has to wait on the High queue will get to run before anything on the
Medium queue.
9.4.2b. You want to ensure that at least 2 high priority jobs can run at all times.
As above, but:
- High queue has a job count max = 20
- Medium queue has a job count max = 18
This means that even if there are lots of medium jobs, they will leave at least 2 system
slots for high jobs if they appear. On the other hand, up to 20 high jobs could run, if there
were that many, and medium jobs would have to wait.
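The effect of the two queue-level limits can be checked with a short sketch. The function
and data layout are hypothetical. The sketch assumes that a job may start only when both
its queue-level job count and the system-level job count have room.

```python
SYSTEM_JOB_COUNT = 20
QUEUE_JOB_COUNT = {"High": 20, "Medium": 18}


def can_run(queue, running):
    """Check the queue-level and system-level job count limits.

    `running` maps each queue name to the number of jobs currently running.
    """
    under_queue_limit = running.get(queue, 0) < QUEUE_JOB_COUNT[queue]
    under_system_limit = sum(running.values()) < SYSTEM_JOB_COUNT
    return under_queue_limit and under_system_limit


# Medium is saturated at 18, so the two remaining system slots stay free for High.
running = {"High": 0, "Medium": 18}
print(can_run("Medium", running))  # False
print(can_run("High", running))    # True
```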
9.4.2c. You don't want the Medium queue to be completely locked out when a lot of
high jobs get submitted
Add:
- Queue priority rule: use a ratio of High:Medium = 5:1, which means a Medium job gets a
turn after every 5 High jobs, rather than being blocked.
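The interleaving produced by a 5:1 ratio can be sketched as follows. This is an
illustration of the idea only. The function is hypothetical, it ignores running time and
free slots, and the actual order in which WLM releases jobs may differ in detail.

```python
from collections import deque


def dispatch_order(high_jobs, medium_jobs, ratio=(5, 1)):
    """Interleave two backlogs: up to ratio[0] High jobs, then up to ratio[1] Medium."""
    if min(ratio) < 1:
        raise ValueError("each share in the ratio must be at least 1")
    high, medium = deque(high_jobs), deque(medium_jobs)
    order = []
    while high or medium:
        for queue, share in ((high, ratio[0]), (medium, ratio[1])):
            for _ in range(share):
                if queue:
                    order.append(queue.popleft())
    return order


order = dispatch_order([f"H{i}" for i in range(1, 11)], ["M1", "M2"])
print(order)
# ['H1', 'H2', 'H3', 'H4', 'H5', 'M1', 'H6', 'H7', 'H8', 'H9', 'H10', 'M2']
```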
9.4.3. You have 4 application groups that submit jobs at random. You want to ensure that
whenever a group submits a job, it can get a slot to run it in; another group can't hog
the system.
- Set up 4 Medium queues, e.g. Q1, Q2, Q3, and Q4.
- Set job count max for each queue = 5
- Ensure System job count = 20.
That means each group can occupy up to 5 slots. Whenever a group submits a job to its
queue, it can always get up to 5 slots from the system. If 5 jobs are running from Q1,
you can still submit jobs to Q2 and 5 of them will run, regardless of what is queued. The
same applies to Q3 and Q4.
9.4.3b. If there is spare capacity on the system, a group can use it; but if another group
submits a job, that group will get a slot as soon as possible and will not be locked out
just because the first group got in early. Each group should also have equal access to
available slots.
- Set Priority Rule to JobRunRatio with value of 0:20:0
- Set up 4 Medium queues, e.g. Q1, Q2, Q3, and Q4.
- Set job count max for each queue = 20
- Ensure System job count = 20.
That means that up to 20 jobs can be running from any queue. However when something
turns up on another queue, that other queue can get in rather than having to wait for all
older jobs to drain first. If you submit 400 jobs to Q1 at the start of the day, you will get to
run 20 jobs at a time; if you submit a job to Q2 later on, you don’t have to wait until 381
jobs off Q1 have run before Q2 gets a look in.
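The figure of 381 follows from simple arithmetic, sketched below. The function is
hypothetical and models only the alternative being avoided: jobs starting strictly in
submission order on a fixed number of slots.

```python
def completions_before_start(position, slots):
    """Jobs that must finish before the job at `position` (1-based) can start,
    when jobs start strictly in submission order on `slots` concurrent slots."""
    return max(0, position - slots)


# 400 jobs are submitted to Q1 first, so the Q2 job is the 401st submission.
print(completions_before_start(401, slots=20))  # 381
```

With a rule that ignores submission order across queues, the Q2 job only has to wait for
a slot to become free rather than for the Q1 backlog to drain.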
9.4.3c. In conjunction with the above, you need a High priority system so that some
jobs get preference over anything else that may be running.
- Set Priority Rule to JobRunRatio with value of 20:0:20
- Set up 1 High queue, e.g. Q1
- Set up 3 Low queues, e.g. Q2, Q3, and Q4.
- Set job count max for each queue = 20
- Ensure System job count = 20.
Having a higher priority queue around means it will take precedence when there is a free
slot at system level. If the system limit has been reached, the High job will have to
wait; but as soon as a job finishes, the queued High job will start next.
Conclusion
DataStage WLM allows proactive management of system resources, especially when multiple
teams share a common hardware infrastructure. It provides intuitive methods to restrict
the number of concurrent job runs based on system policies and queue-level policies, and
it protects the critical server system from overloading based on real-time CPU usage,
memory usage, and concurrent job run statistics.
Further reading
• Information Management best practices:
http://www.ibm.com/developerworks/data/bestpractices/
• IBM Information Server best practices:
https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityUuid=23355691-5fbb-4d69-bcd9-1dd5358daa45
Contributors
Yong Li
Development Manager – InfoSphere High Performance Engine
Xiaoyan Pu
Software Development Engineer
Hanson Lieu
Software Development Engineer
Ron Liu
InfoSphere Information Server Performance
Len Greenwood
Software Architect
Ashley Holland
DataStage Software Developer
Notices
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other
countries. Consult your local IBM representative for information on the products and services
currently available in your area. Any reference to an IBM product, program, or service is not
intended to state or imply that only that IBM product, program, or service may be used. Any
functionally equivalent product, program, or service that does not infringe any IBM
intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in
this document. The furnishing of this document does not grant you any license to these
patents. You can send license inquiries, in writing, to:
IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where
such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES
CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER
EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do
not allow disclaimer of express or implied warranties in certain transactions, therefore, this
statement may not apply to you.
Without limiting the above disclaimers, IBM provides no representations or warranties
regarding the accuracy, reliability or serviceability of any information or recommendations
provided in this publication, or with respect to any results that may be obtained by the use of
the information or observance of any recommendations provided herein. The information
contained in this document has not been submitted to any formal IBM test and is distributed
AS IS. The use of this information or the implementation of any recommendations or
techniques herein is a customer responsibility and depends on the customer’s ability to
evaluate and integrate them into the customer’s operational environment. While each item
may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee
that the same or similar results will be obtained elsewhere. Anyone attempting to adapt
these techniques to their own environment does so at their own risk.
This document and the information contained herein may be used solely in connection with
the IBM products discussed in this document.
This information could include technical inaccuracies or typographical errors. Changes are
periodically made to the information herein; these changes will be incorporated in new
editions of the publication. IBM may make improvements and/or changes in the product(s)
and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM websites are provided for convenience only
and do not in any manner serve as an endorsement of those websites. The materials at those
websites are not part of the materials for this IBM product and use of those websites is at your
own risk.
IBM may use or distribute any of the information you supply in any way it believes
appropriate without incurring any obligation to you.
Any performance data contained herein was determined in a controlled environment.
Therefore, the results obtained in other operating environments may vary significantly. Some
measurements may have been made on development-level systems and there is no
guarantee that these measurements will be the same on generally available systems.
Furthermore, some measurements may have been estimated through extrapolation. Actual
results may vary. Users of this document should verify the applicable data for their specific
environment.
Information concerning non-IBM products was obtained from the suppliers of those products,
their published announcements or other publicly available sources. IBM has not tested those
products and cannot confirm the accuracy of performance, compatibility or any other
claims related to non-IBM products. Questions on the capabilities of non-IBM products should
be addressed to the suppliers of those products.
All statements regarding IBM's future direction or intent are subject to change or withdrawal
without notice, and represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To
illustrate them as completely as possible, the examples include the names of individuals,
companies, brands, and products. All of these names are fictitious and any similarity to the
names and addresses used by an actual business enterprise is entirely coincidental.
COPYRIGHT LICENSE: © Copyright IBM Corporation 2012, 2013. All Rights Reserved.
This information contains sample application programs in source language, which illustrate
programming techniques on various operating platforms. You may copy, modify, and
distribute these sample programs in any form without payment to IBM, for the purposes of
developing, using, marketing or distributing application programs conforming to the
application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions.
IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these
programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. If these and
other IBM trademarked terms are marked on their first occurrence in this information with a
trademark symbol (® or ™), these symbols indicate U.S. registered or common law
trademarks owned by IBM at the time this information was published. Such trademarks may
also be registered or common law trademarks in other countries. A current list of IBM
trademarks is available on the Web at “Copyright and trademark information” at
www.ibm.com/legal/copytrade.shtml
Windows is a trademark of Microsoft Corporation in the United States, other countries, or
both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
Contacting IBM
To contact IBM in your country or region, check the IBM Directory of Worldwide
Contacts at http://www.ibm.com/planetwide
To learn more about IBM Information Management products, go to
http://www.ibm.com/software/data/