...

Introduction to troubleshooting InfoSphere MDM Collaboration Server (MDMCS) performance issues Stefan Reil

by user

on
Category: Documents
15

views

Report

Comments

Transcript

Introduction to troubleshooting InfoSphere MDM Collaboration Server (MDMCS) performance issues Stefan Reil
Stefan Reil
24.11.2015
Introduction to troubleshooting
InfoSphere MDM Collaboration Server
(MDMCS) performance issues
© 2009 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Agenda
 Architectural overview
 Tools for performance troubleshooting and MDMCS SQL logging mechanism
 Locate the performance bottleneck in component stack
 Client-side checks
 Server-side checks
 Database Server-side checks
2
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Architectural overview
A short architectural overview is provided, as it will help to
understand approaches to narrow down performance bottlenecks.
MDMCS includes a web-based application with a three-tier
architecture that consists of:
 A web-based user interface for rendering PIM content in a
browser, including static, dynamic or cached data in the client
side. This tier runs on a web browser (Microsoft Internet Explorer)
 A middle tier with the functional modules and services that
process user requests and data on the central MDMCS server
 A database management system (DBMS) that stores the data
that is required by the middle tier. This tier runs on a database
server (DB2 / Oracle).
(Usually Application and Database Server are separate physical
servers, but can be run on one physical server as well)
3
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Architectural overview (cont.)
The MDMCS application (middle tier) consists of 6 services (implemented as separate JVM
processes), where each one is responsible for specific tasks
 Appsvr Service for UI event processing
 Workflow Engine Service processes workflow events
that are posted to the database
 Scheduler Service runs all scheduled jobs in the
background (imports, exports, reports)
 Admin Service starts and stops the services
 Eventprocessor Service is the event dispatcher
between all the product modules
 Queuemanager Service to send documents outside of
MDMCS
Each service logs debugging or error messages in its own directory under $TOP/logs in a
directory with name “servicename_hostname” e.g. appsvr?_<hostname>
4
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Architectural overview (cont.)
5
relevant for scheduled jobs,
workflowprocessing
relevant for UI interactions
Runtime
Accumulation
Layered view of main performance impacting components
User Experience
Browser (IE)
OS / Hardware
IE version & settings, JavaScript logic
HW resources (CPU, memory)
Network
Network speed and latency
WAS
MDMCS Services /
Custom Code
OS / Hardware
Configuration settings, complexity of
data model, custom code logic,
concurrent users, concurrent jobs
HW resources (CPU, memory,disk I/O)
Network
Network speed and latency
DB2 / Oracle
OS / Hardware
Statistics, bufferpools, indexes,
fragmentation, old version data
HW resources (CPU, memory, disk I/O)
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Architectural overview
Exemplified information flow for simple UI interaction use case: “Open item in UI”
An http request is send via network from
client to “MDMCS server” to display item
Appsvr_hostname service retrieves all
required base information (e.g. spec
definitions, item details, container scripts
…) from database (if not cached already)
using SQL statements. Number and
complexity of triggered SQLs depends on
data model and custom code complexity.
The database server processes the SQL
requests. SQL runtime highly affected by
up to dateness of statistics, memory
(bufferpools), indexes, data
fragmentation, old version data ...
6
Web browser renders the received data.
Depending on complexity of result page
and java script code this might be
resource intensive and highly affected by
IE version and hardware resources
Appsvr_hostname service processes the
received data, applies custom code logic
(e.g. entry build scripts, value rules ...),
generates the response and sends it
back to client
SQL results are send back to MDMCS
server
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Some common tools to diagnose performance issues
User Experience
Browser (IE)
OS / Hardware
Stop watch
Windows Task Manager, HTTP tracing tools
e.g. Fiddler, IE Developer Tool (F12)
Windows resource monitor
Network
Netstat, traceroute
OS / Hardware
Sumall; PIM Cache Snapshot; Custom code
debug statements; Deactivate container scripts;
JVM verbose gc; IBM Monitoring and Diagnostic
Tools for Java, ...
Nmon, vmstat ...
Network
netstat, traceroute
DB2 / Oracle
OS / Hardware
Db2 clp, db2pd, snapshots / sqlplus, AWR ...
WAS
MDMCS Services /
Custom Code
7
Nmon, vmstat ...
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Background information: MDMCS SQL logging mechanism
MDMCS services enabled to log triggered SQL queries along with its runtime in db.log files
Configuration settings influence logging behavior:
 Logging level as defined in $TOP/etc/default/log.xml
<appender name="DB" class="org.apache.log4j.RollingFileAppender">
<param name="File"
value="/opt/envs/..../logs/${svc_name}/db.log" />
<param name="Append" value="true" />
<param name="maxFileSize" value="10MB" />
<param name="maxBackupIndex" value="2" />
<param name="encoding" value="UTF-8" />
<layout class="org.apache.log4j.PatternLayout">
<param name="ConversionPattern" value="%d [%t] %-5p %c %x- %m%n"/>
</layout>
</appender>
<category name="com.ibm.ccd.common.db" additivity="false">
<priority value="debug" />
<appender-ref ref="DB" />
</category>
DB
DBappender
appenderisisused
usedfor
for
SQL
tracking
SQL tracking
Defines
Definessize
sizeand
andnumber
number
of
oflog
logfiles
files
Defines
Defineslevel
levelof
oflogging:
logging:
--“debug”
mode
“debug” modeto
totrack
trackall
allSQLs
SQLs
--“info”
“info”to
totrack
trackdelayed
delayedqueries
queriesonly
only
 Delayed SQL threshold value as defined in $TOP/etc/default/common.properties
profiling_query_delay_threshold= <milliseconds>
It defines a runtime threshold, at which queries are marked as "delayed queries".
When using logging level info, only such “delayed queries” will be tracked.
8
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Background information: MDMCS SQL logging mechanism
Note:
 Using debug mode might impact performance, but should be less than 5% (unless IO is very
slow)
 profiling_query_delay_threshold setting is ambiguous.
Always use debug mode when investigating performance issues to track all SQLs
 Run times tracked in the db.log files do not represent the SQL runtime on DB server side.
It reflects the time-span between when the SQL was triggered by MDMCS service and when
the service was able to pick up the result.
Logged runtimes might be higher than actual runtimes on DB server due to:
- slow network
- constrained CPU and thread resources on MDMCS server side.
When in doubt, test runtime of selected SQLs directly on DB server side to check extra delay
9
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Background information: MDMCS SQL logging mechanism
Sample db.log content
2015-03-24 08:43:06,360 [jsp_48: classicornew.wpc] DEBUG com.ibm.ccd.common.db.Query - Time: (2453 ms)
Dynamic Query [select itd_item_id, itd_blob from itd, ( select itm_id from ( select * from (
SELECT itm_id
FROM tctg_itm_item
WHERE itm_company_id = 30001
Runtime
Runtimeof
ofquery
queryin
in
Name
Nameof
ofthe
thethread
threadwhich
which
AND itm_container_id = 25206
milliseconds
milliseconds
triggered
triggeredthe
thequery
query
AND itm_version_id <= 999999999
AND itm_next_version_id >= 999999999
) AS OM98 fetch first 857 rows only ) AS DUMMY_ALIAS_01 ) AS DUMMY_ALIAS_02 where itd_item_id = itm_id and
itd_next_version_id = 999999999 order by itd_item_id]
2015-03-24 08:43:06,360 [jsp_48: classicornew.wpc] WARN com.ibm.ccd.common.db.Query - CWPCM0018W:The query
is delayed: ( 2,453 ms)\n Dynamic Query [select itd_item_id, itd_blob from itd, ( select itm_id from ( select
* from (
SELECT itm_id
The
Thequery
querywill
willbe
betracked
trackedaasecond
secondtime
timewith
withflag
flag“The
“Thequery
queryisisdelayed”
delayed”
FROM tctg_itm_item
ififthe
runtime
is
above
the
profiling_query_delay_threshold
value
the runtime is above the profiling_query_delay_threshold value
WHERE itm_company_id = 30001
...
2015-03-24 08:43:06,927 [jsp_48: classicornew.wpc] DEBUG com.ibm.ccd.common.db.Query - Time: (0 ms)
Static Query [name: GET_LKP_TIMESTAMP]
[id: 1780]
select rtrim(caa_value) from tctg_caa_catalog_attributes where caa_name = 'LAST_MODIFIED' and caa_catalog_id
= 25206 and caa_next_version_id = 999999999
Timestamp
Timestampwhen
whenthread
threadreceived
receivedquery
queryresult
result
10
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Sumall: Generate SQL workload/performance statistics
Utility Sumall can be used to parse db.log files and generate a statistical summary of logged
SQL runtimes.
The tool is already bundled with MDMCS starting with release V10.0.0, but depending on
release and fixpack, might not provide latest features. It is invoked with:
$JAVA_RT com.ibm.ccd.common.wpcsupport.util.Sumall <db.log files> [options]
A standalone version of the tool containing latest enhancements is published in technote
http://www-01.ibm.com/support/docview.wss?uid=swg24041140 (How to create SQL workload
and performance statistics from db.log files)
and can be used to parse as well db.log files from MDMCS V9.1.0:
java -jar <fullpath>/Sumall.jar <db.log files> [options]
To get list of options run: invoke with “h” appended
java -jar <fullpath>/Sumall.jar h
11
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Sumall/Sumall.jar: Generate SQL workload/performance statistics
Usage:
$JAVA_RT com.ibm.ccd.common.wpcsupport.util.Sumall <db.log files>
[leng=n characters] [delayed] [printsql] [cvon] [cleanoff]
[fromtime="yyyy-MM-dd HH:mm:ss[,SSS]"] [totime="yyyy-MM-dd HH:mm:ss[,SSS]"]
[threadname="<threadname>"] [threadlist] [version] [h|help]
Options
h|help:
12
print usage info
leng:
Default Value = 70. It defines how many characters from the beginning of the
SQL are used for statistical aggregation
printsql:
Extract SQLs and its runtime in sequential order, print one line per SQL
delayed:
Only SQLs marked as delayed will be analyzed
cvon:
Use to turn on calculation of coefficient of variation (cv)
It provides a normalized metric how much single runtimes vary compared to average runtime
cleanoff:
When set, parameter values will not be replaced in Dynamic SQL strings. Depending on leng
parameter setting, this might cause Dynamic queries of the same type to be reported on
seperate lines, in case individual parameter values are contained in the first n (leng)
Characters. This might be desired if you want to investigate Dynamic queries with their
actual parameter values
fromtime:
Analyze only SQLs with a timestamp > fromtime, required input format: yyyy-MM-dd HH:mm:ss
totime:
Analyze only SQLs with a timestamp < totime, required input format: yyyy-MM-dd HH:mm:ss
threadname:
SQLs will be analyzed only for the provided thread. Use a threadname as listed by
threadlist option
threadlist:
Print sqlstatistics per thread ordered by total SQLruntime
version:
Print current version
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Sumall: Sample output
“threadname”
“threadname”option
optioncan
canbe
beused
usedto
toextract
extractSQL
SQL
workload
triggered
by
one
specific
thread
workload triggered by one specific threadonly
only
SQL
SQLstatistics
statisticsare
aregenerated
generatedbased
basedon
onfirst
first70
70characters
characters
(default),
but
can
be
varied
using
“leng”
option
(default), but can be varied using “leng” option
java -jar Sumall.jar db.log* threadname="jsp_57678: moveEntriesToNextStep.wpc"
…
Output
Outputisissorted
sortedbased
basedon
ontotal
total
runtime
in
descending
order.
runtime in descending order.SQLs
SQLs
with
withhighest
highestimpact
impactare
areshown
shownatat
the
thebottom
bottom
For
Foreach
eachunique
uniqueSQL
SQLstatement,
statement,
33statistical
values
statistical valuesare
areprinted:
printed:
--total
totalruntime
runtime(milliseconds)
(milliseconds)
--number
numberof
ofexecutions
executions
--average
runtime
average runtime(milliseconds)
(milliseconds)
130
130
1,0
GEN_CTG_CSA_CAT_SYS_ATTR_GETBYCATEGORYID] SELECT * FROM tctg_csa_cat_s
134
130
1,0
GEN_CTG_CHI_CATEGORY_HIERARCHY_GETBYCHILDID] SELECT * FROM tctg_chi_ca
134
130
1,0
GEN_CTG_CFP_CAT_FULL_PATHS_GETBYCATEGORYID] SELECT * FROM tctg_cfp_cat
158
1
158,0
GET_EVENT_ENTRY_MATCHING_SIGNATURE] SELECT wee_entry_id FROM twfl_wfe_
219
235
0,9
GEN_SEC_SCU_USER_GETBYID] SELECT * FROM tsec_scu_user WHERE scu_user_i
223
220
1,0
GEN_SEC_CMP_COMPANY_GETBYID] SELECT * FROM tsec_cmp_company WHERE cmp_
267
278
1,0
GEN_CTG_ICM_ITEM_CATEGORY_MAP_GETBYITEMVERSION] SELECT * FROM tctg_icm
276
250
1,1
GEN_CTG_ITD_ITEM_DETAIL_GETBYIDVERSION] SELECT * FROM tctg_itd_item_de
431
428
1,0
GEN_CTG_ITM_ITEM_GETBYPRIMARYKEYVERSION] SELECT * FROM tctg_itm_item W
----------------------------------------------sum(ms)
cnt
avg(ms) QUERY
Monitored timeperiod (MT): 2015-07-24 10:37:54,467 - 2015-07-24 10:37:57,563 : 0,00 h | 0,00 m | 3,10 s
Statistical values for all queries
Total
Count
Average
Total
(ms)
(cnt)
(avg ms)
(minutes)
2862
2733
1
0,0
Monitored
Monitoredtime
timeperiod
period(MT)
(MT)shows
showsthe
the
duration
of
the
thread
based
on
duration of the thread based onfirst
firstand
and
last
lasttriggered
triggeredSQL
SQL
13
Total Runtime
(hours)
0,0
% of MT
(%)
92,4
Aggregated
Aggregatedstatistics
statistics
%
%of
ofMT:
MT:ititreflects
reflectsthe
theSQL
SQLfootprint
footprintin
inrelation
relationto
to
Monitored
time
period.
Indicative
especially
when
Monitored time period. Indicative especially whenstatistics
statistics
are
arecalculated
calculatedfor
foraasingle
singlethread
thread
Rule
Ruleof
ofthumb:
thumb:
-->=80%:
>=80%:thread
threadperformance
performanceisislimited
limitedby
bySQL
SQLperformance
performance
--<=80%:
data
processing
becomes
CPU
intensive
<=80%: data processing becomes CPU intensive(complex
(complex
data,
data, extensive
extensivecustom
customcode
codelogic
logic...)
...)
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Basic health checks to ensure optimal performance
Basic healthchecks should be run before starting any performance troubleshooting
 Is the database configured according to installation requirements?
 Does the current table and index layout match the default schema?
 Are table and index statistics up to date or is table and index data very fragmented?
 Are old versions being deleted regularly?
 Are MDMCS caches configured sufficiently high?
 Are there any CPU or memory (also DB bufferpool) related bottlenecks on the application or
database server side ?
Details to be found at:
http://www-01.ibm.com/support/docview.wss?uid=swg21641292
14
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Understand and define the performance problem
 What exactly is slow
General slowness may point to some basic configuration/maintenance deficiencies or
resource bottlenecks
Only a specific use case
==> precisely define and document the reproduction steps and involved objects
Use always the same data set for tests, if possible run tests on isolated environment
Clarify what custom logic is involved (container scripts, specs, views)
 What is the current and expected runtime
 When did the problem manifest
Performance degraded over time (indication for general maintenance tasks)
Abrupt change in performance? What has changed ?
(New custom code or data model deployment, manipulation of high volume of data ...)
 Who is affected
All users working with the user interface(which one used: classic or new simplified UI), or
specific users only (specific permissions, location in a different network, different client
workstation hardware and Internet Explorer version …)
15
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Locate performance bottleneck in component stack
If unclear, do an initial diagnostic data collection session to narrow down the performance
bottleneck in the stack before starting specific investigations.
16
Network
relevant for scheduled jobs,
workflow processing
relevant for UI interactions
User Experience
Browser (IE)
OS / Hardware
WAS
MDMCS Services /
Custom Code
OS / Hardware
Network
Overall execution time: stop watch
Time iexplore.exe peaks in CPU usage:
Windows Task Manager / stop watch
Time to receive response from MDMCS server:
Http tracing tool like Fiddler, IE Developer
tools(F12)
Execution time on MDMCS server along with
SQL runtime footprint: db.log analysis (captured
in debug mode) using Sumall utility
General monitoring of CPU or memory usage
peaks at MDMCS or DB server: Nmon, vmstat ...
DB2 / Oracle
OS / Hardware
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Locate performance bottleneck in component stack (example)
Reported Problem:
Rendering all items in a specific step within worklist summary page takes some 35 seconds.
To collect:
As the use case involves UI interaction, client side checks need to be included:
 Overall execution time till page is rendered
Use stop watch to measure. Track start and stop time for the use case by running the “date”
command on MDMCS shell console so log files can be matched more easily
 Time till when response is received from MDMCS server:
Use http tracing tool like Fiddler
 When does iexplore.exe start to peak in CPU usage and for how long
Monitor iexplore.exe in process tab of Windows Task Manager
 Execution time on MDMCS server and SQL runtime footprint:
use pimSupport.sh -t start / stop to capture all log files in debug mode automatically (only
use if it is a "short" reproduction usecase, otherwise data volume might be too big)
or
manually enable debug mode (Technote 1383862) and make sure to archive db.log* files
immediately when the reproduction use case is finished to avoid log file wrapping
 Monitor CPU and memory usage on MDMCS server:
Use nmon, vmstat, top ...
17
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Locate performance bottleneck in component stack (example)
Observations:
 After clicking “verify attributes” link, iexplore.exe CPU started to peak after some 10 seconds
and stayed high for another 25 seconds. Use case completed in total after some 35 seconds
 As shown by Fiddler, response from MDMCS server was received after some 10 seconds
18
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Locate performance bottleneck in component stack (example)
 Runtime footprint on MDMCS server side
First run Sumall with fromtime and threadlist option to identify thread that was
serving the client request
java -jar Sumall.jar db.log* threadlist fromtime="2014-02-27 14:28:00"
…
992
1828
0.5
GEN_SEC_CMP_COMPANY_GETBYID] SELECT * FROM tsec_cmp_company WHERE cmp_
1566
2304
0.7
OBJECT_ACCESS_FOR_USER_ROLE_ALL_GROUPS] SELECT orm_access, orm_access_
2920
1600
1.8
GEN_WFL_CEH_COLL_ENTRY_HISTORY_GETBYCAENTRY] SELECT * FROM twfl_ceh_co
----------------------------------------------sum(ms)
cnt
avg(ms) QUERY
Monitored timeperiod (MT): 2014-02-27 14:28:35,999
Statistical values for all queries
Total
Count
Average
Total
(ms)
(cnt)
(avg ms)
(minutes)
14282
15754
0
0.2
PRINTING THREADSUMMARY
Total
Count
Average
(ms)
(cnt)
(avg ms)
1
1
1.0
30
14
2.1
69
97
0.7
124
19
6.5
5661
9600
0.6
8397
6023
1.4
Total
Count
Average
(ms)
(cnt)
(avg ms)
-
2014-02-27 16:04:49,337 :
Total Runtime
(hours)
0.0
1.60 h | 96.00 m | 5773.34 s
% of MT
(%)
0.2
first timestamp
last timestamp
Thread
2014-02-27 14:28:35
2014-02-27 15:02:30
2014-02-27 14:28:48
2014-02-27 14:32:41
2014-02-27 14:28:59
2014-02-27 14:28:36
first timestamp
2014-02-27 15:02:30
2014-02-27 16:04:49
2014-02-27 16:02:41
2014-02-27 16:04:06
2014-02-27 14:28:46
last timestamp
unregistered (formely: jsp_274: QuickLinks.wpc)
logoutListener:277
heartbeat_appsvr
jvm_maintenance_appsvr
Thread-45
jsp_276: ItemListQueryStore.wpc
Thread
Thread
Thread“jsp_276:
“jsp_276:ItemListQueryStore.wpc”
ItemListQueryStore.wpc”has
hasserved
servedthe
the
client
clientrequest
request(matches
(matchesbased
basedon
ontimestamp
timestampand
andname)
name)
19
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Locate performance bottleneck in component stack (example)
 Calculate the SQL runtime statistics for only the thread jsp_276
java -jar Sumall.jar db.log* threadname="jsp_276: ItemListQueryStore.wpc"
sum(ms)
cnt
avg(ms) QUERY
----------------------------------------------...
8
2
4.0
GEN_SEC_SCU_USER_GETBYID] SELECT * FROM tsec_scu_user WHERE scu_user_i
47
7
6.7
GEN_WFL_CAE_COLL_AREA_ENTRY_GETBYCASTEPPATHID] SELECT * FROM twfl_cae_
239
400
0.6
Dynamic: [ select distinct csm_spec_id from tctg_csm_category_spec_ma
326
400
0.8
POPULATE_CSM_MAP] SELECT coa_src_container_id FROM twfl_coa_collaborat
349
400
0.9
GEN_CTG_ICM_ITEM_CATEGORY_MAP_GETBYITEMVERSION] SELECT * FROM tctg_icm
366
400
0.9
GEN_CTG_ITM_ITEM_GETBYPRIMARYKEYVERSION] SELECT * FROM tctg_itm_item W
369
400
0.9
GEN_CTG_ITD_ITEM_DETAIL_GETBYIDVERSION] SELECT * FROM tctg_itd_item_de
407
400
1.0
GET_COL_AREA_ENTRY_LOCKED_COUNT_IN_STEP] select count(*) from cae, cel
498
400
1.2
GET_CSM_INFO_FOR_ITEMSET] SELECT distinct csm_category_id, csm_spec_id
529
400
1.3
GET_ITM_INFO_FOR_ITM_LIST] SELECT itm_id, itm_primary_key, itm_version
594
400
1.5
GET_ICM_INFO_FOR_ITEMSET] SELECT * FROM tctg_icm_item_category_map WHE
784
400
2.0
GET_CA_HISTORY_BY_DATES_FOR_PK] SELECT * FROM twfl_ceh_coll_entry_hist
946
400
2.4
GET_ITD_FOR_ITEMSET] SELECT * FROM tctg_itd_item_detail WHERE itd_item
2920
1600
1.8
GEN_WFL_CEH_COLL_ENTRY_HISTORY_GETBYCAENTRY] SELECT * FROM twfl_ceh_co
----------------------------------------------sum(ms)
cnt
avg(ms) QUERY
Monitored timeperiod (MT): 2014-02-27 14:28:36,138
Statistical values for all queries
Total
Count
Average
Total
(ms)
(cnt)
(avg ms)
(minutes)
8397
6023
1
0.1
2014-02-27 14:28:46,358 :
Total Runtime
(hours)
0.0
6023
6023queries
querieswere
weretriggered
triggeredwith
withaatotal
total
runtime
runtimeof
of8.397
8.397seconds
seconds
20
-
% of MT
(%)
82.2
0.00 h | 0.00 m | 10.22 s
Thread
Threadwas
wasactive
activeon
onMDMCS
MDMCS
server
for
10.22
seconds
server for 10.22 seconds
Total
TotalSQL
SQLruntime
runtimecontributed
contributed
to
82%
of
monitored
to 82% of monitoredtimeperiod
timeperiod
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Locate performance bottleneck in component stack (example)
Overview:
User Experience
Browser (IE)
OS / Hardware
Network
WAS
MDMCS Services /
Custom Code
Iexplore.exe showing peak CPU usage for some 25 s
Time to receive response from MDMCS: ~10 s
Time to receive response matches Execution time on
MDMCS server ==> network in between client and
server does not add any reasonable delay
OS / Hardware
Execution time on MDMCS server: ~10s
SQL runtime footprint to receive data from DB: 8 s
Network
CPU on MDMCS and DB server not exhausted
DB2 / Oracle
OS / Hardware
21
Overal execution time: ~35 s (seconds)
Main performance bottleneck on client side (25s rendering time)
but
SQL workload is high as well and could be even more relevant
on less performing environments
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
If performance bottleneck is located on Client-side
 Clean your browser cache before using the user interface the first time after
a fix pack, interim fix, or test fix has been applied.
 Test with using a zoom level of 100% in Microsoft Internet Explorer (IE)
 Check whether IE browser version is compatible with MDMCS system requirements.
In general: latest approved (check system requirements) IE version performs best
 When working with multiple MDMCS client sessions in parallel, ensure that each IE window
uses its own session
 Run test with recent client hardware (especially new CPU processor)
If above checks and tests do not improve rendering performance on client, engage with
MDMCS support for more in depth investigations
22
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
If performance bottleneck is seen on MDMCS server side
Decide on next steps based on generated SQL footprint statistics
 What is the SQL runtime footprint in comparison to monitored timeperiod.
Is slowness rather due to processing results or retrieving data from DB server ?
 Large SQL runtime footprint (% of MT > 80) due to slow performing SQLs:
Do a SQL runtime differentiation check on both MDMCS server and DB server to
evaluate whether it is a DB server side issue, network issue or some JVM issue
picking up the results fast enough
If it is a DB server side issue, recheck recommended data maintenance, DB configuration,
bufferpool hitratios, OS resources …
 Large SQL runtime footprint due to massive number of SQLs:
Check MDMCS cache hitratio (if low, spec information is retrieved over and over again)
Check custom code logic whether same data is retrieved repeatedly
 CPU and memory utilization bottlenecks on MDMCS server
 Issues with JVM memory usage – rarely seen nowadays (use verbose gc to track memory
usage and garbage collection)
23
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Slow processing due to low MDMCS cache hit ratios
Spec
Specrelated
relatedqueries
queries(nom,
(nom,spa,
spa,spc,
spc,nod,
nod,
noa,
noh,
agn)
dominate
SQL
runtime
noa, noh, agn) dominate SQL runtime
java -jar Sumall.jar db.log*
…
18328
42138
0,4
Dynamic: [SELECT nom_master_spec_id, nom_master_spec_node_id, nom_sub
18553
42162
0,4
Dynamic: [SELECT DISTINCT spa_version_id FROM tctg_spa_spec_attribute
18667
42139
0,4
Dynamic: [SELECT noh_spec_id, noh_parent_node_id, noh_child_node_id,
18742
42149
0,4
GET_MAX_VERSION] SELECT max ( ver_id ) FROM tctg_ver_version
19012
42138
0,5
Dynamic: [SELECT * FROM tctg_spc_specification WHERE (spc_id in (? ))
19907
42138
0,5
Dynamic: [SELECT * FROM tctg_spa_spec_attributes WHERE (spa_spec_id i
20245
414
48,9
GET_CA_HISTORY_BY_DATES_FOR_PK] SELECT * FROM twfl_ceh_coll_entry_hist
24226
42138
0,6
Dynamic: [SELECT distinct nod_id, noh_spec_id, nod_spec_id, nod_name,
28697
42138
0,7
Dynamic: [SELECT noa_node_id, noa_name, noa_value FROM tctg_noa_node_
96318
54
1783,7
Dynamic: [ SELECT agn_spec_id b_spec_id, agn_node_id b_node_id, cvd_v
----------------------------------------------sum(ms)
cnt
avg(ms) QUERY
Monitored timeperiod (MT): 2015-04-09 15:45:27,126
Statistical values for all queries
Total
Count
Average
Total
(ms)
(cnt)
(avg ms)
(minutes)
364204 437814
0
6,1
-
2015-04-09 16:04:26,913 :
Total Runtime
(hours)
0,1
0,32 h | 18,00 m | 1139,79 s
% of MT
(%)
32,0
Action:
Check Cache hit ratio using … SupportUtil –cmd=getRunTimeCacheDetail
24
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Slow processing due low MDMCS cache hit ratios
Check Cache hit utilization after MDMCS services have been busy with regular
workload:
$JAVA_RT com.ibm.ccd.common.wpcsupport.util.SupportUtil –cmd=getRunTimeCacheDetail
Cache utilization is shown for each service. Cache utilization might be different in
between services
For example
PIM CACHE SNAPSHOT:
Title
Current Cache Size
--------------appsvr_HOAPMDM001FST---------------Spec Cache
600
Spec Key To Current Start Version Cache 600
Spec Key Version to Start Version Cache 600
Lookup Table Cache
100
Role Cache
2000
Access Cache
5000
Spec Name Cache
5000
Attribute Collections Cache
100
WSDL Cache
0
Workflow Definition Cache
250
Script Cache
2000
Catalog Cache
100
Catalog Definition Cache
100
Node Id Cache
5000
View Cache
500
...
Max Cache Size
Cache Hits
Cache Requests
Hit Percentage
599
600
600
10
24
153
9
19
0
14
147
0
30
989
500
24878675
25305758
0
84691
90825
106736
16364
755635
0
45369
68387
0
203432
1016559
1880926
30335844
35777562
0
84822
91128
109308
16381
759681
0
45415
68663
0
203496
1018396
2286372
82%
71%
0%
100%
100%
98%
100%
99%
0%
100%
100%
0%
100%
100%
82%
Increase cache settings whenever “Max Cache Size” has reached “Current Cache Size” and
“Cache Requests” are higher than “Cache Hits”
25
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Inefficient custom code affecting item display performance
Observation:
According
Accordingto
tohttp
httptrace,
trace,itittakes
takes>>12
12seconds
secondsto
to
receive
data
on
client
receive data on client
SQL
SQLfootprint
footprintisisonly
onlysome
some33
33%
%of
ofthread
threadruntime,
runtime,
but
buthigh
highnumber
numberof
ofexecutions
executionsof
of22SQL
SQLtypes
types
Performance
Performancelimited
limitedby
bycustom
customor
orMDMCS
MDMCSserver
server
code
code
high
highCPU
CPUusage
usageseen
seenfor
forappsvr*
appsvr*service
servicewhile
while
request
requestisisprocessed
processedon
onMDMCS
MDMCSserver
serverside
side
java -jar Sumall.jar db.log threadname="jsp_347: classicornew.wpc"
...
64
122
0.5
GEN_CTG_ICM_ITEM_CATEGORY_MAP_GETBYITEMVERSION] SELECT * FROM tctg_icm
1259
2597
0.5
GEN_CTG_AGL_ATTR_GROUP_LOCALE_GETBYATTRGROUPVERSION] SELECT * FROM tct
2538
5180
0.5
GEN_CTG_ATG_ATTR_GROUP_GETBYID] SELECT * FROM tctg_atg_attr_group WHER
----------------------------------------------sum(ms)
cnt
avg(ms) QUERY
Monitored timeperiod (MT): 2015-03-23 17:30:34,301
Statistical values for all queries
Total
Count
Average
Total
(ms)
(cnt)
(avg ms)
(minutes)
4116
8260
0
0.1
26
-
2015-03-23 17:30:46,637 :
Total Runtime
(hours)
0.0
0.00 h | 0.00 m | 12.34 s
% of MT
(%)
33.4
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Inefficient custom code affecting item display performance
Using
Usingadditional
additionalSumall
Sumalloptions
options“cleanoff”
“cleanoff”and
and“leng”
“leng”reveals,
reveals,that
thatactually
actually
the
same
data
is
retrieved
over
and
over
again
and
thus
increases
the same data is retrieved over and over again and thus increasesSQL
SQL
footprint
footprintsignificantly
significantly
What
is
triggering
What is triggeringthis
this??
java -jar Sumall.jar db.log threadname="jsp_347: classicornew.wpc" leng=5000 cleanoff
...
25
1
25.0
RELEASE_MASTER_LOCKS_FOR_THREAD_JVM] UPDATE tutl_lck_lock SET lck_thread_id = '-...
1147
2590
0.4
GEN_CTG_ATG_ATTR_GROUP_GETBYID] SELECT * FROM tctg_atg_attr_group WHERE atg_id = -1
1257
2590
0.5
GEN_CTG_AGL_ATTR_GROUP_LOCALE_GETBYATTRGROUPVERSION] SELECT * FROM tctg_agl_attr...
1391
2590
0.5
GEN_CTG_ATG_ATTR_GROUP_GETBYID] SELECT * FROM tctg_atg_attr_group WHERE atg_id = 6421
----------------------------------------------sum(ms)
cnt
avg(ms) QUERY
Monitored timeperiod (MT): 2015-03-23 17:30:34,301
Statistical values for all queries
Total
Count
Average
Total
(ms)
(cnt)
(avg ms)
(minutes)
4116
8260
0
0.1
27
-
2015-03-23 17:30:46,637 :
Total Runtime
(hours)
0.0
0.00 h | 0.00 m | 12.34 s
% of MT
(%)
33.4
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Inefficient custom code affecting item display performance
Possible diagnostic actions to expose custom code issues:
 Test runtime with selectively disabled container scripts or suspected value rules
 Especially review custom code which is run in loops
 In complex custom code, add logging statements at major codeblocks,
containerscripts to see where time is spend
2015-03-24 16:33:00,028 [jsp_93: /ccd_rendering/renderer.jsp] DEBUG com.ibm.ccd.common.interpreter.engine.Script - @@@ Script
[ [ContainerA.PostSave]] started at: Tue Mar 24 16:33:00 CET 2015
...
2015-03-24 16:33:08,343 [jsp_93: /ccd_rendering/renderer.jsp] DEBUG com.ibm.ccd.common.interpreter.engine.Script - @@@ Script
[ [ContainerA.PostSave]] ended at: Tue Mar 24 16:33:08 CET 2015
 If too many SQLs are triggered by custom code, but you do not know which function calls
trigger these, combine timestamp information from custom code debug statements with
selectively parsing db.log files with Sumall utility, making use of options fromtime,
totime, threadname
 Eventually profile the respective JVM using some java profiler.
E.g. IBM Healthcenter: http://www.ibm.com/developerworks/java/jdk/tools/healthcenter/
28
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Inefficient custom code affecting item display performance
Solution in documented usecase:
Move call setAttributeGroupsToProcess(hmAttGroups) out of for loop reduced
the number of executions of queries
GEN_CTG_ATG_ATTR_GROUP_GETBYID
GEN_CTG_AGL_ATTR_GROUP_LOCALE_GETBYATTRGROUPVERSION
GEN_CTG_ATG_ATTR_GROUP_GETBYID
tremendously and thus reduced total SQL footprint and processing time
Original custom code Library
29
Fixed custom code Library
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
When to consider average SQL runtime slow?
java -jar Sumall.jar db.log
…
8405
7850
1,1
GEN_CTG_CSA_CAT_SYS_ATTR_GETBYCATEGORYID] SELECT * FROM tctg_csa_cat_s
8412
7850
1,1
GEN_CTG_CFP_CAT_FULL_PATHS_GETBYCATEGORYID] SELECT * FROM tctg_cfp_cat
10556
50
211,1
Dynamic: [select count(distinct ita? .ita_item_id) AS DUMMY_ALIAS_? f
12600
12146
1,0
GEN_SEC_CMP_COMPANY_GETBYID] SELECT * FROM tsec_cmp_company WHERE cmp_
14707
92
159,9
GET_EVENT_ENTRY_MATCHING_SIGNATURE] SELECT wee_entry_id FROM twfl_wfe_
27968
47450
0,6
Dynamic: [ select distinct csm_spec_id from tctg_csm_category_spec_ma
56704
52561
1,1
GEN_CTG_ICM_ITEM_CATEGORY_MAP_GETBYITEMVERSION] SELECT * FROM tctg_icm
57138
51090
1,1
GEN_CTG_ITM_ITEM_GETBYPRIMARYKEYVERSION] SELECT * FROM tctg_itm_item W
58288
49922
1,2
GEN_CTG_ITD_ITEM_DETAIL_GETBYIDVERSION] SELECT * FROM tctg_itd_item_de
----------------------------------------------sum(ms)
cnt
avg(ms) QUERY
?
?
?
No definite answer, depends on hardware and network and relevance (how often is it
triggered)
- Static queries are often simple and show average runtimes in the range of 0.4 – 20ms
- Dynamic queries are more complex and runtimes of several seconds might be expected
- overall use case performance might be affected by even slight changes in average SQL
runtime if it involves large volume of executions
Recommendation
Generate SQL statistics at intervals and different environments:
- get familiar with “expected” runtimes
- discover changes in frequency of specific SQLs which allows proactive investigations
- changes in avg runtimes of frequent SQLs are indicators of changed workload, need for
maintenance tasks, resource issues ...
30
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Variation of average runtimes in between environments
What is the reason for this difference?
Server hardware, network, DB configuration and maintenance, workload ...
Check influence of network e.g. by doing SQL runtime differentiation test (next slides)
31
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
SQL runtime differentiation test
Extract a set of commonly seen “select” SQLs from db.log files and measure
execution time when run against DB server from different locations using native DB
clients like sqlplus (Oracle) or db2 clp (DB2)
How to generate SQLfile for runtime differentiation test (extract only select SQLs)
java
java -jar
-jar Sumall.jar
Sumall.jar db.log
db.log printsql
printsql leng=500
leng=500 cleanoff
cleanoff || grep
grep -vi
-vi update
update
|| grep
-vi
insert
|
grep
-vi
delete
|grep
-vi
merge
|head
-500
grep -vi insert | grep -vi delete |grep -vi merge |head -500 >> sql500
sql500
awk
awk -f
-f gentestsqlfile.awk
gentestsqlfile.awk sql500
sql500 >> test500sql.tvf
test500sql.tvf
Content
Content of
of gentestsqlfile.awk
gentestsqlfile.awk
BEGIN{}
BEGIN{}
{{
sqlstring1
sqlstring1 == substr($0,52);
substr($0,52);
pos
=
index($0,"Dynamic:");
pos = index($0,"Dynamic:");
if(pos
if(pos >> 0){
0){
split(sqlstring1,v,"[");
split(sqlstring1,v,"[");
split(v[2],z,"]");
split(v[2],z,"]");
print
print z[1]
z[1] ";"
";" ;;
}}
else
else {{
split(sqlstring1,v,"]");
split(sqlstring1,v,"]");
print
print v[2]
v[2] ";"
";" ;;
}}
}}
END{}
END{}
32
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
SQL runtime differentiation test
Run the SQLs multiple times (allow population of bufferpools) against the DB server
from both, DB and MDMCS server
date; time db2 -tvf test500sql.tvf >foo
Executions on MDMCS server
Tue Mar 24 13:00:37 CET 2015
real
0m3.59s
user
0m0.16s
sys
0m0.29s
Executions directly on DB server
Tue Mar 24 13:03:06 CET 2015
real
0m0.93s
user
0m0.07s
sys
0m0.13s
Tue Mar
real
user
sys
24 13:00:41 CET 2015
0m3.18s
0m0.13s
0m0.22s
Tue Mar
real
user
sys
24 13:03:07 CET 2015
0m0.83s
0m0.07s
0m0.14s
Tue Mar
real
user
sys
24 13:05:44 CET 2015
0m2.39s
0m0.11s
0m0.20s
Tue Mar
real
user
sys
24 13:03:08 CET 2015
0m0.93s
0m0.08s
0m0.14s
To execute the same set of SQLs on MDMCS server against DB takes more than 3 times
longer than when running the SQLs directly on DB server.
==> network adds reasonable delay, so it should be checked
(one exemption: if CPU capability of DB server is much higher than on MDMCS server, then
difference might be less due to network than CPU difference)
33
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
One simple Network check: Traceroute
Latency and bandwidth between the application server and database
should be < 0.25 ms
Path to destination should be short
Check with traceroute, e.g.
traceroute 192.168.97.104
traceroute to 192.168.97.104 (192.168.97.104) from 10.252.41.244
(10.252.41.244), 30 hops max
outgoing MTU = 1500
1 10.252.40.1 (10.252.40.1) 10 ms 1 ms 4 ms
2 10.211.247.250 (10.211.247.250) 7 ms 5 ms 3 ms
3 df0000001670026.comp.com (192.168.97.104) 4 ms 3 ms 6 ms
==> rather long hop times and 2 routers involved
34
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Database Server-side checks
One time task after migration / changing setup
 Ensure DB configuration matches MDMCS DB setup recommendations and table
and index layout conforms with installed version
http://www-01.ibm.com/support/docview.wss?uid=swg21641292
Regular tasks
 Table and index statistics are generated in detail mode at regular intervals
 Tables and indexes are reorganized when needed (e.g. reorgchk)
Monitor
 Bufferpools configured sufficiently high to ensure high bufferpool hitratio AND low # of
physical reads
 Resource intensive SQLs (e.g. dbsnapshots, AWR reports) and evaluate impact with
MDMCS Administrator
 Free space in tablespaces respectively filesystem to allow growth of data
 CPU and disk I/O not at its limit
35
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Database Server-side checks
If there are SQLs that perform slowly, investigate data access plan.
 What is the expected resultset ?
 Does it utilize indexes ?
 Would an additional index accelerate data access ?
You may add at any time new “non unique” indexes for testing purposes.
But when found beneficial, recheck with support so helpful indexes can be
included in the default schema
Use db2bach & db2exfmt to investigate single SQL performance issues on DB2
- Collect runtime info along with database snapshots, which show e.g. bufferpool reads
db2batch -d <dbname> -a <schemaname>/<passwd> -f <file containing SQL> -i complete
ICM_INFO_dyn.batch -o f -1 p 5 o 5 e yes
Note:
-r
option "-o ... e yes" sets the explain mode, so data accessplan gets collected
- Extract the data accessplan
db2exfmt -d <dbname> -# 0 -w -1 -g TFIC -n % -s % -o <accessplanoutput> -e <schemaname>
Attention: SQLs tracked as “Static Query” in db.log are run as prepared statements using
parameter markers. In rare situations it might be observed that running same SQL using
literals instead of parameter markers might be faster.
http://www.ibm.com/developerworks/data/library/techarticle/dm-1104db2statementconcentratorreoptimization/
36
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Importance of high bufferpool hitratios on DB server
Need to read data from disk severely affects SQL
runtime.
Tests done in isolated DB server environment
with a complex SQL (as reported in some PMR)
by using db2batch show this influence.
Only parameter changed was bufferpool size
which influences number of physical reads (see
next slide)
37
SELECT DISTINCT ita1.ita_item_id AS c1
FROM cat cat1
,icm icm1
,ita ita1
,ita ita2
,ita ita3
WHERE ita1.ita_catalog_id = 7825
AND ita1.ita_version_id <= 999999999
AND ita1.ita_next_version_id >= 999999999
AND cat1.cat_id = icm1.icm_category_id
AND ita1.ita_node_id = 11192
AND ita2.ita_node_id = 11198
AND ita2.ita_item_id = ita1.ita_item_id
AND ita3.ita_node_id = 15467
AND ita3.ita_item_id = ita1.ita_item_id
AND icm1.icm_item_id = ita1.ita_item_id
AND cat1.cat_primary_key = '00208'
AND cat1.cat_version_id <= 999999999
AND cat1.cat_next_version_id >= 999999999
AND ita1.ita_value_numeric <= 1390950000000
AND (
(
ita2.ita_value_numeric > 1390950000000
AND ita2.ita_version_id <= 999999999
AND ita2.ita_next_version_id >= 999999999
)
OR (
ita2.ita_value_numeric IS NULL
AND ita2.ita_version_id <= 999999999
AND ita2.ita_next_version_id >= 999999999
)
)
AND ita3.ita_value_string = 'true'
AND ita3.ita_version_id <= 999999999
AND ita3.ita_next_version_id >= 999999999
AND icm1.icm_version_id <= 999999999
AND icm1.icm_next_version_id >= 999999999
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
Importance of high bufferpool hitratios on DB server
 By lowering bufferpool size from 50 to 30K, physical reads started to show up (6700)
and caused SQL runtime to double (13,8 versus 28,2 seconds)
 Each physical read accounted for an additional delay of 2.2 ms
38
© 2014 IBM Corporation
Introduction to troubleshooting MDMCS performance issues
39
© 2014 IBM Corporation
Fly UP