IBM Cloud Orchestrator Version 2.4: Capacity Planning, Performance, and Management Guide
IBM® Cloud and Smarter Infrastructure Software
IBM Cloud Orchestrator Version 2.4: Capacity Planning, Performance, and Management Guide
Document version 2.4.0
IBM Cloud Orchestrator Performance Team

© Copyright International Business Machines Corporation 2015. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

CONTENTS
List of Figures
Author List
Revision History
1 Introduction
2 IBM Cloud Orchestrator 2.4 Overview
  2.1 Functional Overview
  2.2 Architectural Overview
3 Performance Overview
  3.1 Sample Benchmark Environment
  3.2 Key Performance Indicators
    3.2.1 Concurrent User Performance
    3.2.2 Provisioning Performance
4 Performance Benchmark Approaches
  4.1 Monitoring and Analysis Tools
  4.2 Infrastructure Benchmark Tools
  4.3 Cloud Benchmarks
5 Capacity Planning Recommendations
  5.1 Cloud Capacity Planning Spreadsheet
  5.2 IBM Cloud Orchestrator Management Server Capacity Planning
  5.3 Provisioned Virtual Machines Capacity Planning
6 Cloud Configuration Recommendations
  6.1 OpenStack Keystone Worker Support
  6.2 Disabling the IWD Service
  6.3 IBM Workload Deployer Configuration
  6.4 Virtual Machine IO Scheduler Configuration
  6.5 Advanced Configuration and Power Interface Management
  6.6 Java Virtual Machine Heap Configuration
  6.7 Database Configuration
  6.8 Database Management
    6.8.1 DBMS Versions
    6.8.2 Automatic Maintenance
    6.8.3 Operating System Configuration (Linux)
  6.9 Database Hygiene Overview
    6.9.1 Database Backup Management
    6.9.2 Database Statistics Management
    6.9.3 Database Reorganization
7 Summary Cookbook
  7.1 Base Installation Recommendations
  7.2 Post Installation Recommendations
  7.3 High Scale Recommendations
Appendix A: IBM Cloud Orchestrator Monitoring Options
  A.1 OpenStack Monitoring
  A.2 IBM Cloud Orchestrator Monitoring
  A.3 Infrastructure Monitoring
Appendix B: OpenStack Keystone Monitoring
  B.1 PvRequestFilter
  B.2 Enabling PvRequestFilter
References

LIST OF FIGURES
Figure 1: Revision History
Figure 2: IBM Cloud Orchestrator Benefits Estimator
Figure 3: ICO Cloud Marketplace View
Figure 4: ICO Architecture Reference Topology
Figure 5: ICO Sample Benchmark Environment
Figure 6: Benchmark Data Model Population
Figure 7: Load Driving (User) Scenarios
Figure 8: Provisioning Performance in a Closed System
Figure 9: Monitoring and Analysis Tools
Figure 10: Infrastructure Benchmark Tools
Figure 11: ICO Management Server Capacity Planning
Figure 12: Capacity Planning Tool: Inquiry Form
Figure 13: Capacity Planning Tool: User Demographic Information
Figure 14: Capacity Planning Tool: Systems and Storage
Figure 15: Capacity Planning Tool: System and Workload Options
Figure 16: Capacity Planning Tool: Virtual Machine Requirements
Figure 17: Planning Tool: Confirmation Screen
Figure 18: Planning Tool: System Summary
Figure 19: Keystone Worker Configuration
Figure 20: IWD Configuration
Figure 21: Modifying the IO Scheduler
Figure 22: Java Virtual Machine Heap Change Sets
Figure 23: Database Configuration Change Sets
Figure 24: DBMS Versions
Figure 25: Database Automatic Maintenance Configuration
Figure 26: Database Backup with Compression Command
Figure 27: Database Offline Backup Restore
Figure 28: Database Online Backup Schedule
Figure 29: Database Incremental Backup Enablement
Figure 30: Database Online Backup Manual Restore
Figure 31: Database Online Backup Automatic Restore
Figure 32: Database Log Archiving to Disk
Figure 33: Database Log Archiving to TSM
Figure 34: Database Roll Forward Recovery: Sample A
Figure 35: Database Roll Forward Recovery: Sample B
Figure 36: Database Backup Cleanup Command
Figure 37: Database Backup Automatic Cleanup Configuration
Figure 38: Database Statistics Collection Command
Figure 39: Database Statistics Collection Table Iterator
Figure 40: Database Reorganization Commands
Figure 41: Database Reorganization Table Iterator
Figure 42: Base Installation Recommendations
Figure 43: Post Installation Recommendations
Figure 44: High Scale Recommendations
Figure 45: OpenStack Ceilometer Metrics
Figure 46: OpenStack Ceilometer Core Metrics
Figure 47: Infrastructure Core Metrics
Figure 48: Keystone Monitoring PvRequestFilter Format
Figure 49: Keystone Monitoring PvRequestFilter Sample Output
Figure 50: Keystone Monitoring Log Messages Example
Figure 51: Keystone Monitoring Statistics Example

AUTHOR LIST
This paper is the team effort of a number of cloud performance specialists comprising the IBM Cloud Orchestrator performance team. Additional recognition goes out to the entire IBM Cloud Orchestrator and OpenStack development teams.
Mark Leitch (primary contact for this paper) - IBM Toronto Laboratory
Amadeus Podvratnik - IBM Boeblingen Laboratory
Marc Schunk - IBM Boeblingen Laboratory
Nate Rockwell - IBM USA
Tiarnán Ó Corráin - IBM Ireland
Alessandro Chiantera - IBM Rome Laboratory
Andrea Tortosa - IBM Rome Laboratory
Massimo Marra - IBM Rome Laboratory
Michele Licursi - IBM Rome Laboratory
Paolo Cavazza - IBM Rome Laboratory
Sandro Piccinini - IBM Rome Laboratory

REVISION HISTORY
Date              Version  Revised By  Comments
March 12th, 2015  Draft    MDL         Initial version for review.
April 1st, 2015   2.4.0    MDL         Incorporated review comments.
Figure 1: Revision History

1 Introduction
Capacity planning involves the specification of the various components of an installation to meet customer requirements, often with growth or timeline considerations. A key aspect of capacity planning for cloud, or virtualized, environments is the specification of sufficient physical resources to provide the illusion of infinite resources in an environment that may be characterized by highly variable demand.
This document will provide an overview of capacity planning for IBM Cloud Orchestrator (ICO) Version 2.4. In addition, it will offer management best practices to achieve a well performing installation that demonstrates service stability.
ICO Version 2.4 offers end to end management of service offerings across a number of cloud technology offerings including VMware, Kernel-based Virtual Machine (KVM), IBM PowerVM, and IBM System z. A key implementation aspect is integration with OpenStack, the de facto leading open virtualization technology.
OpenStack offers the ability to control compute, storage, and network resources through an open, community based architecture.
In this document we will provide an ICO 2.4 overview, including functionality, architecture, and performance. We will then offer capacity planning recommendations, including considerations for hardware configuration, software configuration, and cloud maintenance best practices. A summary "cookbook" is provided to manage installation and configuration for specific instances of ICO.
Note: This document is considered a work in progress. Capacity planning recommendations will be refined and updated as new ICO releases are available. While the paper in general is considered suitable for all ICO Version 2.4 releases, it is best oriented towards ICO Version 2.4.0.1. In addition, a number of references are provided in the References section. These papers are highly recommended for readers who want detailed knowledge of ICO server configuration, architecture, and capacity planning.
Note: Some artifacts are distributed with this paper. The distributions are in zip format; however, Adobe Reader blocks attachments with a "zip" suffix, so each distribution is shipped with a "zap" suffix instead. To use these artifacts, simply rename the distribution to "zip" and process as usual.

2 IBM Cloud Orchestrator 2.4 Overview
An overview of ICO Version 2.4 will be provided from the following perspectives:
1. Functional
2. Architectural

2.1 Functional Overview
The basic functional capability of ICO involves the management of cloud computing resources for dynamic data centers. In a nutshell, ICO offers infrastructure, platform, and orchestration services that make it possible to lower the cost of service delivery (both in terms of time and skill) while delivering higher degrees of standardization and automation.
In order to determine the benefits of deploying ICO in business terms, the IBM Cloud Orchestrator Benefits Estimator (URL) is available. A screenshot of the estimator is provided below.
Figure 2: IBM Cloud Orchestrator Benefits Estimator
A more detailed cloud marketplace view of the ICO solution follows.
Figure 3: ICO Cloud Marketplace View
The core functional capabilities of ICO include the following.
Workflow Orchestration. The Business Process Manager (BPM) component offers a standard library as well as a graphical editor for workflow orchestration. Overall, this provides a powerful mechanism for complex and custom business processes in the cloud context.
Pattern Management. The IBM Workload Deployer (IWD) offers sophisticated pattern support for deploying multi node applications that may consist of complex middleware. Once again, graphical editor support for pattern management is provided.
Service Management. Service management options are available in the ICO Enterprise edition, which provides a set of management utilities to further facilitate business process management.
Not shown in the diagram is a Scalable Web Infrastructure to facilitate cloud self service offerings. For more information please consult the ICO knowledge center (URL). In addition, the ICO resource center is available (URL).

2.2 Architectural Overview
The following diagram shows the reference deployment topology for ICO. A description of the reference topology follows.
Figure 4: ICO Architecture Reference Topology
The reference topology is based on a core set of virtual machines:
- Deployment Server. The installation or deployment service for the ICO instance(s).
- Central Server 1. This server hosts the DB2 Database Management System (DBMS). The performance of the DBMS is critical to the overall solution and is dealt with extensively in Section 6.
- Central Server 2. This is essentially the "super node" for the ICO instance. This server hosts OpenStack Keystone, providing identity, token, catalog, and policy services. It also hosts Business Process Manager (BPM), the primary mechanism for driving business process workflows. The most critical aspect of this server is managing Keystone and BPM, as described in Section 6.
- Central Server 3. This server hosts the IBM Workload Deployer pattern engine. Performance configuration of this component is described in Section 6.
Associated with these core server virtual machines are a number of region servers. Region servers may represent a specific cluster or geographic zone of cloud compute nodes. Sample compute nodes are shown for VMware, KVM, and PowerVM, with associated communication paths. For example, for VMware the VMware community driver is used to drive the operation of the VMware cluster. For KVM, the OpenStack control node is used to coordinate the KVM instance.
Given this is a virtual implementation, some considerations should be kept in mind:
- In general, it is more difficult to manage performance in a virtual environment due to the additional hypervisor management overhead and system configuration.
- Device parallelism via dedicated storage arrays/LUNs is preferred. Sample approaches, from most impactful to least impactful, are provided below.
  o Separate data stores for "managed from" and "managed to" environments.
  o Spread data stores across several physical disks to maximize storage capability.
  o Separate data stores for image templates and provisioned images.
  o Employ the "deadline" or "noop" scheduler algorithm for management server and provisioned VMs (see Section 6.4).
  o Optimize base storage capability (i.e. SSD with "VMDirectPath" enablement for VMware). Servers where this may be critical, due to their dependency on disk IO capabilities, are Central Server 1 and the VMware vCenter instances.
- Network optimization, for example 10GbE adoption. In addition, segment customer networks to an acceptable level to reduce address lookup impact.

3 Performance Overview
There are two distinct aspects of cloud performance:
1. Performance of the ICO management server itself. This is the primary focus of this section.
2. Performance of the provisioned server instances. This is more of a capacity planning statement, and is covered in Section 5.3.
We will provide a general overview of the Key Performance Indicators (KPIs) for the ICO management server. The following sections will describe the general benchmark environment, and the associated KPIs.

3.1 Sample Benchmark Environment
The following figure shows a sample configuration that has been used for ICO benchmarks.
Figure 5: ICO Sample Benchmark Environment
The environment is characterized by the following features, broken down in terms of the ICO management server (aka "managed from") and the associated cloud (aka "managed to").
Managed from:
- Server configuration:
  o 4 to 5 HS22V blades with 2 x 4 core Intel Xeon x5570 2.93 GHz processors; 8 physical cores per blade, 16 logical cores when hyper-threading is enabled.
  o 72 GB RAM per blade.
  o 2 x redundant 10G Ethernet networking (Janice HSSM).
  o 2 x redundant 8G FC network (Qlogic FC SM).
- Storage configuration:
  o 1 x DS3400 with 4 expansions of 12 x 600 GB SAS 10K disks each (48 x 600 GB = 28.8 TB raw).
Managed to:
- Server configuration:
  o Tens of HS22V blades with 2 x 6 core Intel Xeon x5670 2.93 GHz processors; 12 physical cores per blade, 24 logical cores when hyper-threading is enabled.
  o 72 GB RAM per blade.
  o 2 x redundant 10G Ethernet networking (Janice HSSM).
  o 2 x redundant 8G FC network (Qlogic FC SM).
- Storage configuration:
  o 1 x Storwize V7000 with 3 expansions of 12 x 2 TB NL-SAS 7.2K disks each (36 x 2 TB = 72 TB raw).
- Storage access has been configured to use the multi-path access granted by Storwize. In particular, VMware ESXi servers have been configured to use all of the 8 active paths to access LUNs using a round robin policy.

3.2 Key Performance Indicators
The following Key Performance Indicators are managed for ICO through a set of comprehensive benchmarks.
1. Concurrent user performance, comprising:
   a. Average response time for ICO pages related to administrative tasks.
   b. Average response time for ICO pages related to end user tasks.
2. Provisioning throughput, comprising:
   a. Provisioning throughput for a vSys with a single part.
   b. Average service time for provisioned VMs.
3. LAMP (Linux, Apache, MySQL, Python) stack performance, comprising:
   a. vApp deployment time.
   b. vApp stop time.
   c. vApp deletion time.
4. Bulk Windows stack performance, comprising vSys with multiple parts (15 VMs) provisioning time.
A key aspect of the benchmarks is they are run with associated background workloads and for a long duration (e.g. weeks or months). The rationale behind this is very simple: to run benchmarks that closely emulate the customer experience and will drive "real world" results (versus overly optimistic lab based results). We will describe the concurrent user and provisioning throughput KPIs in more detail.

3.2.1 Concurrent User Performance
ICO User Interface performance is established through concurrent user benchmark tests. In order to understand the applicability of such a benchmark, it is important to understand what is meant by a concurrent user. Consider:
- P = total population for an instance of ICO (including cloud administrators, end users, etc.).
- C = the concurrent user population for an instance of ICO.
Concurrent users are considered to be the set of users within the overall population P that are actively managing the cloud environment at a point in time (e.g. administrator operations in the User Interface, provisioning operations, etc.). In general, P is a much larger value than C (i.e. P >> C). For example, it is not unrealistic that a total population of 200 users may have a concurrent user population of 40 users (i.e. 20%).
For the concurrent user workload driven for ICO, there are three sets of criteria that drive the benchmark:
1. Load driving parameters.
2. Data population.
3. Load driving (user) scenarios.
Load Driving Parameters
The following load driving parameters apply.
1. User transaction rate control. The frequency that simulated users drive actions against the back end is managed via loop control functions. Closed loop simulation approaches are used where a new user will enter the system only when a previous user completes.
Through the closed loop system, steady state operations under load may be driven.
2. Think times. Think times are the "pause" between user operations, meant to simulate the behavior of a human user. The think time interval used is [100%, 300%], meaning the think times replayed by the load driver vary between one and three times the values captured when the scenario was recorded.
3. Bandwidth throttling. In order to simulate low speed or high latency lines, bandwidth throttling is employed for some client workloads. The throttle is set to a value that represents a moderate speed ADSL connection (cable/DSL simulation setting of 1.5 Mbps download, 384 Kbps upload).
Data Population Parameters
The benchmark is run against a data model that represents a large scale customer environment. The following table shows a sample configuration where the system is populated with data to represent a large number of users, active Virtual System instances, and active Virtual Machines existing prior to ICO installation. Through this approach, the workload for managing the solution is representative of some customer environments.

Benchmark Parameter           Value
Cloud Administrators          1
Cloud Domains                 11
Tenants                       200
Users                         1000
Hypervisor Types              2 (KVM, VMware)
Cloud Groups                  1
Environment Profile           1
Image Templates               40 (20 Linux, 20 Windows)
vSys Patterns                 20 + 1 (20 Linux vSys patterns, 1 bulk Windows pattern)
vApp Patterns                 1 (LAMP vApp for VMware domain)
Flavors                       5 (1 flavor for RHEL, 3 flavors for Windows, 1 flavor for vApp)
Active vSys instances         20 (1 per Linux vSys pattern)
Standalone (Unmanaged) VMs    400 (10 per image template: 200 Linux, 200 Windows)
Figure 6: Benchmark Data Model Population

Load Driving (User) Scenarios
The concurrent user population (i.e. C) is broken down into the following user profile distribution and scenarios.

User Profile 1
Number of Users: 20 (50% overall)
User Type: End User
Task Type: VM Provisioning
Activity: vSys with single part (Linux) provisioning through a Self-Service Catalog (SSC) offering on VMware.
Scenario per User: 1. Login. 2. Provision vSys single part using SSC offering. 3. Wait until available. 4. Go to the vSys instance details page. 5. Delete vSys using SSC offering. 6. Wait until deletion complete. 7. Logout. 8. Enter next cycle according to arrival rate.

User Profile 2
Number of Users: 16 (40% overall)
User Type: End User
Task Type: User Management
Activity: End user operations through a Self-Service Catalog (SSC) offering.
Scenario per User: 1. Login. 2. Submit SSC offering "Create User in VM", selecting one of the VMs belonging to one of the pre-populated vSys. 3. Wait until done. 4. Submit SSC offering "Delete User in VM", selecting the same VM. 5. Wait until done. 6. Logout. 7. Enter next cycle according to arrival rate.

User Profile 3
Number of Users: 2 (5% overall)
User Type: Administrator
Task Type: Monitoring
Activity: Administrative operations through the IBM Workload Deployer user interface.
Scenario per User: 1. Login. 2. List hypervisors. 3. Select a hypervisor. 4. List VMs in hypervisor. 5. Show all instances. 6. Go to "My Requests". 7. Sort the requests by status. 8. View the trace log. 9. Logout.

User Profile 4
Number of Users: 1 (2.5% overall)
User Type: End User
Task Type: Provisioning
Activity: vApp (LAMP) provisioning through the IBM Workload Deployer user interface on VMware.
Scenario per User: 1. Login. 2. Provision vApp using the IWD UI. 3. Wait until available. 4. Stop vApp using the IWD UI. 5. Wait until done. 6. Delete vApp using the IWD UI. 7. Wait until deletion complete. 8. Logout. 9. Enter next cycle according to arrival rate.

User Profile 5
Number of Users: 1 (2.5% overall)
User Type: End User
Task Type: Provisioning
Activity: vSys with multiple parts (bulk Windows) provisioning through a Self-Service Catalog (SSC) offering on VMware.
Scenario per User: 1. Login. 2. Provision vSys bulk Windows using SSC offering. 3. Wait until available. 4. Go to vSys instance details page. 5. Delete vSys bulk Windows using SSC offering. 6. Wait until deleted. 7. Logout. 8. Enter next cycle according to arrival rate.

Figure 7: Load Driving (User) Scenarios

In overall terms, 55% of the load driving activities are driving Virtual Machine provisioning scenarios. The remaining 45% of scenarios are general administration and management tasks. For the active workload, the user operations meet the following response time thresholds.
- Administrative page response times: 90% of pages < 10s, 100% of pages < 15s.
- End user operations: 90% of pages < 2s, 100% of pages < 5s.

3.2.2 Provisioning Performance
Cloud provisioning is enormously complex in performance terms. Hardware configuration, user workloads, image properties, and a multitude of other factors combine to determine overall capability. ICO provisioning performance is typically measured via a closed system, defined as an isolated system where we can demonstrate a constant sustained provisioning workload. In order to achieve this, as requests complete within the system, new requests are initiated.
Figure 8: Provisioning Performance in a Closed System
The performance systems running ICO workloads literally run for months. These systems are treated like customer systems, with 24x7 operations and field ready maintenance approaches in place (as described in the maintenance recommendations later in this paper). In terms of provisioning performance, the following are sample statistics from a long run scenario driven for a number of weeks, once a period of operational stability has been reached based on the recommendations provided in this paper.
- Number of systems provisioned: > 1,000,000 VMs.
- Provisioning rate (average): > 400 VMs/hour.
- Service times (average): 2 minutes 38 seconds (VMware non linked clones).
- Workflow capability: on the order of 300 workflows per hour (generally short running workflows under a minute in duration).
- Success rate: > 99.99%.
Given this is a sustained, continuous workload, higher peak workloads are, of course, possible. The success rate is considered especially noteworthy.

4 Performance Benchmark Approaches
As part of cloud management and capacity planning, it is valuable to manage cloud benchmarks. Value propositions include:
- Understanding the capability of the cloud infrastructure (and potentially poorly configured or underperforming components of the infrastructure).
- Understanding the base capability of the ICO implementation and associated customization.
- Understanding the long term performance stability of the system.
We will describe basic system monitoring approaches, infrastructure benchmarks, and cloud benchmarks.

4.1 Monitoring and Analysis Tools
The following table shows the core recommended monitoring and analysis tools.
pdcollect
  ICO log collection tool.
  Documentation and recommended invocation: ICO Product Knowledge Center.
esxtop
  VMware performance collection tool.
  Documentation: URL
  Recommended invocation: esxtop -b -a -d 60 -n <number_of_samples> > <output file>
nmon
  A comprehensive system monitoring tool for the UNIX platform. It is highly useful for understanding system behavior.
  Documentation: URL
  Sample invocation: nmon -T -s <samplerate> -c <iterations> -F <output file>
  Note: On Windows systems, Windows perfmon may be used.
db2support
  Database support collection tool.
  Documentation: URL
  Recommended invocation: db2support <result directory> -d <database> -c -f -s -l
DBMS Snapshots
  DBMS snapshot monitoring can offer insight into SQL workload, and in particular expensive SQL statements.
  Documentation: URL
WAIT
  Java WAIT monitoring can provide a non-invasive view of JVM performance through accumulated Java cores and analytic tools.
  Documentation and recommended invocation: URL
Figure 9: Monitoring and Analysis Tools

4.2 Infrastructure Benchmark Tools
The following table shows some recommended infrastructure benchmark tools.
iometer
  I/O subsystem measurement and characterization tool for single and clustered systems.
  Documentation: URL
  Recommended invocation: dynamo /m <client host name or ip>
iperf
  TCP and UDP measurement and characterization tool that reports bandwidth, delay, jitter, and datagram loss.
  Documentation: URL
  Recommended server invocation: iperf -s
  Recommended client invocation #1: iperf -c <server host name or ip>
  Recommended client invocation #2: iperf -c <server host name or ip> -R
UnixBench
  UNIX measurement and characterization tool, with reference benchmarks and evaluation scores.
  Documentation: URL
  Recommended invocation: ./Run
Figure 10: Infrastructure Benchmark Tools

4.3 Cloud Benchmarks
Cloud benchmarks should be based on enterprise utilization. Sample benchmarks that are easy to manage include the following.
1. Single VM deployment times.
2. Small scale concurrent VM deployment times (e.g. 10 requests in parallel).
3. REST API response times.
It is recommended to establish a small load driver, record a baseline, and then use these small benchmarks as a standard to assess ongoing cloud health. More complex benchmarks, including client request monitoring approaches, may of course be established. For OpenStack specific benchmarks, OpenStack Rally may be leveraged (see the References section for further detail). In addition, the Open Systems Group is involved in cloud computing benchmark standards. A report, including the IBM CloudBench tool, is available in the References section.
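As an illustration of such a load driver, the following sketch (not part of the original benchmark suite) times a single VM deployment and a representative REST API call from a shell prompt. It assumes the OpenStack nova and keystone command line clients are available on the driver host; the image, flavor, endpoint, and tenant values in angle brackets are placeholders to be replaced with site specific values.

# Baseline 1: time a single VM deployment (the --poll option blocks until the instance is active).
time nova boot --image <image_name> --flavor <flavor_name> --poll icobench01
nova delete icobench01

# Baseline 3: time a representative REST API call against the Nova endpoint.
TOKEN=$(keystone token-get | awk '/ id / {print $4}')
curl -s -o /dev/null -w "GET /servers: %{time_total}s\n" \
  -H "X-Auth-Token: $TOKEN" http://<region_server>:8774/v2/<tenant_id>/servers

Recording these values on a healthy system, and re-running the same commands on a regular schedule, provides the baseline comparison described above.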
5 Capacity Planning Recommendations
We will provide capacity planning recommendations through three approaches.
1. Static planning via a spreadsheet approach.
2. Capacity planning for the ICO management server (aka the "managed from" infrastructure).
3. Capacity planning for the provisioned Virtual Machines (aka the "managed to" infrastructure).

5.1 Cloud Capacity Planning Spreadsheet
In order to provide a desired hardware and software configuration for an ICO implementation, a wide range of parameters must be understood. The following questions are usually relevant.
1. What operations are expected to be performed with ICO?
2. What are the average and peak concurrent user workloads?
3. What is the enterprise network topology?
4. What is the expected workload for provisioned virtual servers, and how do they map to the physical configuration?
5. For the provisioned servers:
   a. What is the distribution size?
   b. What are the application service level requirements?
A capacity planning spreadsheet is attached to this paper ("ICO Capacity Planning Profile v2.4.0.xlsx"). The spreadsheet may be used to provide a cloud profile for further sizing activities (e.g. a capacity planning activity in association with the document authors).

5.2 IBM Cloud Orchestrator Management Server Capacity Planning
The ICO management server requirements are documented in the ICO Knowledge Center (URL). The summary table is repeated here for discussion purposes.

Server & Configuration                          Processor (vCPUs)     Memory (GB)   Free Storage (GB)
Deployment Server               Minimum         1                     4             117
                                Recommended     2                     8             117
Central Server 1                Minimum         2                     6             100
                                Recommended     4                     12            200
Central Server 2                Minimum         2                     8             50
                                Recommended     6                     12            200
Central Server 3                Minimum         2                     6             146
                                Recommended     4                     8             160
SA Application Manager          Minimum         n/a                   n/a           n/a
                                Recommended     4                     8             20
Neutron Network Server          Minimum         2                     4             32
                                Recommended     4                     8             32
Region Server: VMware           Minimum         2                     8             77
                                Recommended     8                     8             160
Region Server: KVM, Power, z/VM Minimum         2                     4             77
                                Recommended     8                     8             160
KVM Compute Node                Minimum         4                     32            160
                                Recommended     Application specific
Figure 11: ICO Management Server Capacity Planning

While further qualifiers are available in the Knowledge Center, some comments apply.
- In general, the recommended vCPU and memory allocations should be met.
- To determine the ratio of virtual to physical CPUs, monitoring of the production system is required. For performance verification, a 1:1 mapping is used.
- For the physical mapping, it is important to distinguish between "real" cores and hyper threaded (HT) cores. External benchmarks suggest an HT core may yield 30% of the capability of a "real" core. For example, a blade with 8 physical cores and hyper-threading enabled (16 logical cores) is better modeled as roughly 8 x 1.3 = 10.4 "real" core equivalents rather than 16.
- The recommended storage amounts are highly subjective. For example, the minimum recommendations are sufficient for performance verification systems driven for months (with some minor exceptions).

5.3 Provisioned Virtual Machines Capacity Planning
Managing cloud workloads is typically driven as a categorization exercise where workload "sizes" are used to determine the overall capacity requirements. A capacity planning tool is available for managing the cloud workload sizes (URL). We will provide an overview of using this tool.
The first step is to provide any relevant business value. In the absence of a defined opportunity, simple "not applicable" entries may be given (per the sample below). Once submitted, you must accept the usage agreement which will bring up the demographic page.
Figure 12: Capacity Planning Tool: Inquiry Form
The demographic page simply asks for generic information about the submitter.
Figure 13: Capacity Planning Tool: User Demographic Information
When "Continue" is selected, the systems and storage page is provided.
Figure 14: Capacity Planning Tool: Systems and Storage
Then the target system and associated utilization and Virtual Machine requirements are selected. Note for the utilization we select 20% headroom to support peak cloud workloads.
Figure 15: Capacity Planning Tool: System and Workload Options
At this point, the virtual machine requirements may be selected. Note a number of entries may be added.
Figure 16: Capacity Planning Tool: Virtual Machine Requirements
A confirmation screen is then provided to finalize the capacity planning request.
Figure 17: Planning Tool: Confirmation Screen
The summary capacity planning recommendation is then provided. The summary details the compute node, CPU, memory, and storage requirements based on the selected configuration and associated workloads.
Figure 18: Planning Tool: System Summary

6 Cloud Configuration Recommendations
The ICO 2.4 offering provides suitable configuration as part of the default installation. However, there are some specific configuration aspects that may improve the capability. The configuration points follow.
1. OpenStack Keystone worker support.
2. Disabling the IWD service.
3. IBM Workload Deployer configuration.
4. Virtual Machine IO scheduler.
5. Advanced Configuration and Power Interface (ACPI) management.
6. Java Virtual Machine heap.
7. Database configuration.
8. Database management.

6.1 OpenStack Keystone Worker Support
The initial IBM SmartCloud Orchestrator (SCO) 2.3 offering contained a Keystone implementation that is characterized by a single execution thread instance. For ICO 2.4, improvements have been made to exploit multiple concurrent Keystone workers. This change offers advantages when Keystone exhibits high request latency, or is seen to consume a significant amount of a virtual CPU (e.g. > 80%). In order to exploit this support, it is necessary to revise the configuration to use multiple workers. Further detail on this is provided below.
With the Keystone worker improvement in place, the following configuration change will allow a pool of four public workers and four administrative workers. This will permit increased concurrency, at the expense of virtual CPU consumption. As a result, the virtual CPU allocation should be increased based on monitoring data. In the "4+4" worker example below, it is expected to increase the virtual CPU allocation on the order of two to four virtual CPUs.
Location: (Central Server 2) /etc/keystone/keystone.conf
# The number of worker processes to serve the public WSGI application
# (integer value).
public_workers=4
# The number of worker processes to serve the admin WSGI application
# (integer value).
admin_workers=4
Figure 19: Keystone Worker Configuration

6.2 Disabling the IWD Service
The IWD service consumes significant resources across the ICO management stack. In the event the service is not required, it should be disabled.

6.3 IBM Workload Deployer Configuration
The IWD component offers a number of configuration options. One specific option provides the ability to control a polling interval to refresh cloud information. Based on the size of the cloud, this configuration option should be changed.
Location: (Central Server 3) /opt/ibm/rainmaker/purescale.app/private/expanded/ibm/rainmaker.vmsupport4.0.0.1/config/vmpublish.properties
  Original:    RuntimeInterval=12000
  Recommended: RuntimeInterval=30000
Figure 20: IWD Configuration

6.4 Virtual Machine IO Scheduler Configuration
Each Linux instance has an IO scheduler. The intent of the IO scheduler is to optimize IO performance, potentially by clustering or sequencing requests to reduce the physical impact of IO. In a virtual world, however, the operating system is typically disassociated from the physical world through the hypervisor. As a result, it is recommended to alter the IO scheduler algorithm so that it is more efficient in a virtual deployment, with scheduling delegated to the hypervisor.
The default scheduling algorithm is typically "cfq" (completely fair queuing). Alternative and recommended algorithms are "noop" and "deadline". The "noop" algorithm, as expected, does as little as possible with a first in, first out queue. The "deadline" algorithm is more advanced, with priority queues and age as a scheduling consideration. System specific benchmarks should be used to determine which algorithm is superior for a given workload. In the absence of available benchmarks, we would recommend the "deadline" scheduler be used.
The following console output shows how to display and modify the IO scheduler algorithm for a set of block devices. In the example, the "noop" scheduler algorithm is set. Note to ensure the scheduler configuration persists, it should be enforced via the operating system configuration (e.g. /etc/rc.local).
Figure 21: Modifying the IO Scheduler
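The console capture for Figure 21 is not reproduced in this transcript. A minimal equivalent sequence, assuming a RHEL 6 style guest whose data disk is /dev/sdb (device names are illustrative), is sketched below.

# Display the available schedulers; the active algorithm is shown in brackets.
cat /sys/block/sdb/queue/scheduler
noop deadline [cfq]
# Switch the device to the "noop" scheduler.
echo noop > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler
[noop] deadline cfq
# Re-apply the setting at boot time, for example from /etc/rc.local.
echo 'echo noop > /sys/block/sdb/queue/scheduler' >> /etc/rc.local

The same effect may also be achieved globally with the "elevator=noop" (or "elevator=deadline") kernel boot parameter in /etc/grub.conf, similar to the ACPI change shown in Section 6.5.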
6.5 Advanced Configuration and Power Interface Management
The Advanced Configuration and Power Interface (ACPI) operating system support may exhibit high virtual CPU utilization and offers limited value in virtual environments. It is recommended to disable ACPI on the ICO "managed from" nodes through the following steps.
1. Disabling "kacpid".
To switch off the kernel ACPI daemon, edit "/etc/grub.conf" and append "acpi=off" to the kernel boot command line. For example:
title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35
    initrd /boot/initramfs-2.6.32-431.el6.x86_64.img
becomes:
title Red Hat Enterprise Linux (2.6.32-431.el6.x86_64)
    root (hd0,0)
    kernel /boot/vmlinuz-2.6.32-431.el6.x86_64 ro root=UUID=e1131bc1-bdbc-4b2e-9ae7-d540b32b1f35 acpi=off
    initrd /boot/initramfs-2.6.32-431.el6.x86_64.img
2. Disabling the user-space acpi daemon.
To disable user space ACPI on managed-from nodes:
chkconfig acpid off
3. Reboot the nodes.

6.6 Java Virtual Machine Heap Configuration
The default Java Virtual Machine (JVM) heap sizes are intended to be economical. However, in the presence of sufficient available memory, it is recommended to increase the heap allocation. The three change sets below are recommended for application. They apply to Central Server 3 and, in particular, the IBM Workload Deployer instance. The IWD instance should be restarted once the changes are complete.
Location: /opt/ibm/rainmaker/purescale.app/config/overrides.config
  Original:    /config/zso/jvmargs = ["-Xms1024M","-Xmx1024M"]
  Recommended: /config/zso/jvmargs = ["-Xms1536M","-Xmx1536M"]
Location: /etc/rc.d/init.d/iwd-utils
  Original:    sed -i -e 's/3072M/1024M/g' $ZERO_DIR/config/overrides.config
  Recommended: sed -i -e 's/3072M/1536M/g' $ZERO_DIR/config/overrides.config
Location: /opt/ibm/rainmaker/purescale.app/config/zero.config
  Original:    "-Xms1024M","-Xmx1024M"
  Recommended: "-Xms1536M","-Xmx1536M"
Figure 22: Java Virtual Machine Heap Change Sets

6.7 Database Configuration
ICO is deployed with a DB2 database. The performance of the database is critical to the overall capability of the solution. The following database configuration changes are recommended for a base ICO 2.4 installation.
Configuration. For each relevant database set:
  STMT_CONC = LITERALS
  LOCKTIMEOUT = 60
  NUM_IOCLEANERS = AUTOMATIC
  NUM_IOSERVERS = AUTOMATIC
  AUTO_REORG = ON
For example: db2 UPDATE DB CFG FOR OPENSTAC USING LOCKTIMEOUT 60
Foreign Key Modification. An OpenStack foreign key should be modified to enable cascading deletes. Please apply the "ICO_MODIFY_FKEY.sh" script provided with this paper.
Figure 23: Database Configuration Change Sets
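For convenience, the configuration settings above can be applied from the DB2 command line as sketched below. The OPENSTAC database is used as in the example above; the same commands would be repeated for each relevant ICO database, and depending on the DB2 level some settings only take effect the next time the database is activated.

# Apply the recommended database configuration settings (repeat per database).
db2 update db cfg for OPENSTAC using STMT_CONC LITERALS
db2 update db cfg for OPENSTAC using LOCKTIMEOUT 60
db2 update db cfg for OPENSTAC using NUM_IOCLEANERS AUTOMATIC
db2 update db cfg for OPENSTAC using NUM_IOSERVERS AUTOMATIC
db2 update db cfg for OPENSTAC using AUTO_REORG ON
# Confirm the resulting values.
db2 get db cfg for OPENSTAC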
6.8 Database Management
Generally speaking, the "out of the box" database configuration will achieve good results for both large and small installations. The following recommendations are primarily in the area of database maintenance.

6.8.1 DBMS Versions
The following DBMS versions are recommended. All versions should be 64 bit.
Version: DB2 10.5 fp5 or later.
Notes: The minimum recommended fixpack level is 10.5 fp3a.
Figure 24: DBMS Versions

6.8.2 Automatic Maintenance
DB2 offers a number of automatic maintenance options. Automatic statistics collection (aka runstats) is considered a basic and necessary configuration setting, and is enabled for the product by default. Two other recommended configuration settings follow. It is expected these configuration settings will be enabled by default in future versions of the products.
1. Real time statistics. The default runstats configuration generally collects statistics at two hour intervals. The real time statistics option provides far more granular statistics collection, essentially generating statistics as required at statement compilation time.
2. Automatic reorganization. Many customers ignore database reorganization and system performance starts to decline. This can be especially critical in the cloud space. The recommendation is to enable automatic reorganization support so it is self managed by the DBMS. Further discussion of database reorganization is covered in Section 6.9.3.
The following commands may be used to enable these automatic maintenance options. At the time of this writing, they are conditionally recommended. Each of these options has runtime impact and should be monitored to ensure there is no unnecessary system impact. In order to facilitate this, they should only be enabled once the system has been established and monitored. In addition, automatic reorganization is dependent on the definition of a maintenance window (see the DB2 Knowledge Center for more detail).
update db cfg for OPENSTAC using AUTO_STMT_STATS ON
update db cfg for OPENSTAC using AUTO_REORG ON
Figure 25: Database Automatic Maintenance Configuration

6.8.3 Operating System Configuration (Linux)
The product installation guides have comprehensive instructions for Operating System prerequisites and configuration. However, on Linux systems improper configuration is common, so we will highlight specific issues.
The first configuration point to check is the file system ulimit for the maximum number of open files allowed for a process (i.e. nofiles). The value for this kernel limit should be either "unlimited" or "65536". The DB2 reference for this configuration setting is available here.
In addition, the kernel semaphore and message queue specifications should be correct. These configuration settings are a function of the physical memory available on the machine. The DB2 reference for these configuration settings is available here.

6.9 Database Hygiene Overview
The following steps will be described for the database hygiene overview:
1. Database backup management.
2. Database statistics management.
3. Database reorganization.
4. Database archive management.
5. Database maintenance automation.
Steps make reference to recommended scheduling frequencies. The general purpose "cron" scheduling utility may be used to achieve this (a sample schedule is sketched below). However, other scheduling utilities may also be used. The key aspect of a cron'ed activity is it is scheduled at regular intervals (e.g. nightly, weekly) and typically does not require operator intervention. Designated maintenance windows may be used for these activities.
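As an illustration only, a crontab for the DB2 instance owner that drives the nightly backups and a weekly statistics/reorganization pass might look like the following. The script names are placeholders for site specific wrappers around the commands shown in Sections 6.9.1 through 6.9.3, and the times should be aligned with the local maintenance window.

# minute hour day-of-month month day-of-week command
# Nightly online backup at 01:00 (full on Sunday, incremental otherwise; see Section 6.9.1).
0 1 * * 0   /home/db2inst1/scripts/ico_backup_full.sh    >> /var/log/ico_db_maint.log 2>&1
0 1 * * 1-6 /home/db2inst1/scripts/ico_backup_incr.sh    >> /var/log/ico_db_maint.log 2>&1
# Weekly statistics and reorganization pass early Saturday (see Sections 6.9.2 and 6.9.3).
0 3 * * 6   /home/db2inst1/scripts/ico_runstats_reorg.sh >> /var/log/ico_db_maint.log 2>&1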
6.9.1 Database Backup Management
It is recommended that nightly database backups be taken. The following figures offer a sample database offline backup (utilizing compression), along with a sample restore.
backup db <dbname> user <user> using <password> to <backup directory> compress
Figure 26: Database Backup with Compression Command
restore db <dbname> from <backup directory> taken at <timestamp> without prompting
Figure 27: Database Offline Backup Restore
Online backups may be utilized as well. The following figure provides commands that comprise a sample weekly schedule. With the given schedule, the best case scenario is a restore requiring one image (a Monday failure using the Sunday night backup). The worst case scenario would require four images (Sunday + Wednesday + Thursday + Friday). An alternate approach would be to utilize a full incremental backup each night to make the worst case scenario two images. The tradeoffs for the backup approaches are the time to take the backup, the amount of disk space consumed, and the restore dependencies. A best practice can be to start with nightly full online backups, and introduce incremental backups if time becomes an issue.
(Sun) backup db <dbname> online use tsm include logs
(Mon) backup db <dbname> online incremental delta use tsm
(Tue) backup db <dbname> online incremental delta use tsm
(Wed) backup db <dbname> online incremental use tsm
(Thu) backup db <dbname> online incremental delta use tsm
(Fri) backup db <dbname> online incremental delta use tsm
(Sat) backup db <dbname> online incremental use tsm
Figure 28: Database Online Backup Schedule
Note to enable incremental backups, the database configuration must be updated to track page modifications, and a full backup taken in order to establish a baseline.
update db cfg for OPENSTAC using TRACKMOD YES
Figure 29: Database Incremental Backup Enablement
To restore the online backups, either a manual or automatic approach may be used. For the manual approach, you must start with the target image, and then revert to the oldest relevant backup and move forward to finish with the target image. A far simpler approach is to use the automatic option and let DB2 manage the images. A sample of each approach is provided below, showing the restore based on the Thursday backup.
restore db <dbname> incremental use tsm taken at <Sunday full timestamp>
restore db <dbname> incremental use tsm taken at <Wednesday incremental timestamp>
restore db <dbname> incremental use tsm taken at <Thursday incremental delta timestamp>
Figure 30: Database Online Backup Manual Restore
restore db <dbname> incremental auto use tsm taken at <Thursday incremental delta timestamp>
Figure 31: Database Online Backup Automatic Restore
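Before relying on an incremental chain, it can be useful to confirm which images a restore would actually need. The recovery history file records every backup, and for TSM managed backups the db2adutl utility can confirm what is held on the TSM server. For example (output formats vary by DB2 level):

# List the backup images recorded in the recovery history file.
db2 list history backup all for <dbname>
# Query the backup images stored in TSM for the database.
db2adutl query db <dbname>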
In order to support online backups, archive logging must be enabled. The next subsection provides information on archive logging, including the capability to restore to a specific point in time using a combination of database backups and archive logs.

Database Log Archiving
A basic approach we will advocate is archive logging with the capability to support online backups. The online backups themselves may be full, incremental (based on the last full backup), or incremental delta (based on the last incremental backup).
In order to enable log archiving to a location on disk, the following command may be used.
update db cfg for <dbname> using logarchmeth1 DISK:/path/logarchive
Figure 32: Database Log Archiving to Disk
Alternatively, in order to enable log archiving to TSM, the following command may be used.
update db cfg for <dbname> using logarchmeth1 TSM
Figure 33: Database Log Archiving to TSM
Note that a "logarchmeth2" configuration parameter also exists. If both of the log archive method parameters are set, each log file is archived twice (once per log archive method configuration setting). This will result in two copies of archived log files in two distinct locations (a useful feature based on the resiliency and availability of each archive location). The log archive methods (logarchmeth1, logarchmeth2) also have associated configuration options (logarchopt1, logarchopt2) for further customization.
Once the online backups and log archive(s) are in effect, the recovery of the database may be performed via a database restore followed by a roll forward through the logs. Several restore options have been previously described. Once the restore has been completed, roll forward recovery must be performed. The following are sample roll forward operations.
rollforward db <dbname> to end of logs
Figure 34: Database Roll Forward Recovery: Sample A
rollforward db <dbname> to 2012-02-23-14.21.56 and stop
Figure 35: Database Roll Forward Recovery: Sample B
It is worth noting the second example recovers to a specific point in time. For a comprehensive description of the DB2 log archiving options, the DB2 Knowledge Center should be consulted (URL). A service window (i.e. stop the application) is typically required to enable log archiving.

Database Backup Cleanup
Unless specifically pruned, database backups may accumulate and cause issues with disk utilization or, potentially, a stream of failed backups. If unmonitored backups begin to fail, it may make disaster recovery near impossible in the event of a hardware or disk failure. A simple manual method to prune backups follows.
find /backup/DB2 -mtime +7 | xargs rm
Figure 36: Database Backup Cleanup Command
A superior approach is to let DB2 automatically prune the backup history and delete your old backup images and log files. A sample configuration is provided below.
update db cfg for OPENSTAC using AUTO_DEL_REC_OBJ ON
update db cfg for OPENSTAC using NUM_DB_BACKUPS 21
update db cfg for OPENSTAC using REC_HIS_RETENTN 180
Figure 37: Database Backup Automatic Cleanup Configuration
It is also generally recommended to have the backup storage independent from the database itself. This provides a level of isolation in the event volume issues arise (e.g. it ensures that a backup operation will not fill the volume hosting the tablespace containers, which could possibly lead to application failures).

6.9.2 Database Statistics Management
As discussed in the previous "Automatic Maintenance" section, database statistics ensure that the DBMS optimizer makes wise choices for database access plans. The DBMS is typically configured for automatic statistics management. However, it may often be wise to force statistics as part of a nightly or weekly database maintenance operation. A simple command to update statistics for all tables in a database is the "reorgchk" command.
reorgchk update statistics on table all
Figure 38: Database Statistics Collection Command
One issue with the reorgchk command is it does not enable full control over statistics capturing options. For this reason, it may be beneficial to perform statistics updates on a table by table level. However, this can be a daunting task for a database with hundreds of tables. As a result, the following SQL statement may be used to generate administration commands on a table by table basis.
select 'runstats on table ' || STRIP(tabschema) || '.' || tabname || ' with distribution and detailed indexes all;'
from SYSCAT.TABLES
where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
Figure 39: Database Statistics Collection Table Iterator
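One way to execute the generated statements, assuming the iterator above has been saved to a file named gen_runstats.sql (a placeholder name), is to route the output back through the DB2 command line processor:

db2 connect to OPENSTAC
# Generate the runstats statements (-x suppresses column headings, -t uses ";" as the terminator).
db2 -x -tf gen_runstats.sql > runstats_all.clp
# Execute the generated script (-v echoes each command as it runs).
db2 -tvf runstats_all.clp
db2 terminate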
6.9.3 Database Reorganization
Over time, the space associated with database tables and indexes may become fragmented. Reorganizing the table and indexes may reclaim space and lead to more efficient space utilization and query performance. In order to achieve this, the table reorganization command may be used. Note, as discussed in the previous "Automatic Maintenance" section, automatic database reorganization may be enabled to reduce the requirement for manual maintenance.
The following commands are examples of running a "reorg" on a specific table and its associated indexes. Note the "reorgchk" command previously demonstrated will actually have a per table indicator of what tables require a reorg. Using the result of "reorgchk", per table reorganization may be achieved for optimal database space management and usage.
reorg table <table name> allow no access
reorg indexes all for table <table name> allow no access
Figure 40: Database Reorganization Commands
It is important to note there are many options and philosophies for doing database reorganization. Every enterprise must establish its own policies based on usage, space considerations, performance, etc. The above example is an offline reorg. However, it is possible to also do an online reorg via the "allow read access" or "allow write access" options. The "notruncate" option may also be specified (indicating the table will not be truncated in order to free space). The "notruncate" option permits more relaxed locking and greater concurrency (which may be desirable if the space usage is small or will soon be reclaimed). If full online access during a reorg is required, the "allow write access" and "notruncate" options are both recommended.
Note it is also possible to use our table iteration approach to do massive reorgs across hundreds of tables as shown in the following figure. The DB2 provided snapshot routines and views (e.g. SNAPDB, SNAP_GET_TAB_REORG) may be used to monitor the status of reorg operations.
select 'reorg table ' || STRIP(tabschema) || '.' || tabname || ' allow no access;'
from SYSCAT.TABLES
where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
select 'reorg indexes all for table ' || STRIP(tabschema) || '.' || tabname || ' allow no access;'
from SYSCAT.TABLES
where tabschema in ('CIRnnnnn','GLEnnnnn','NOAnnnnn','SCEnnnnn','KSDB');
Figure 41: Database Reorganization Table Iterator
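As a sample of the snapshot based monitoring mentioned above, the following query (run while reorganizations are in progress, and shown here as an illustrative sketch) reports per table reorganization status using the SNAP_GET_TAB_REORG table function.

-- Report reorganization progress for the currently connected database.
select substr(tabschema, 1, 20) as tabschema,
       substr(tabname, 1, 30) as tabname,
       reorg_phase,
       reorg_status,
       reorg_completion
from table(snap_get_tab_reorg('', -1)) as t;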
7 Summary Cookbook

The following tables provide a cookbook for the solution implementation. The cookbook approach implies a set of steps the reader may "check off" as completed, providing a stepwise implementation of the ICO solution. The recommendations are provided in three basic groups:

1. Base installation recommendations.
2. Post installation recommendations.
3. High scale recommendations.

All recommendations are provided in tabular format. The preferred order of implementation is from the first row of each table through to the last.

7.1 Base Installation Recommendations

The base installation recommendations are considered essential to a properly functioning ICO instance. All steps should be implemented. (The Status column is left blank so completion can be tracked.)

Identifier  Description  Status
B1  Perform the base ICO installation, ensuring the recommended configuration described in Section 5.2 is achieved. A central DB2 server should be used (i.e. the region servers should not manage a local DBMS unless there are compelling geographic considerations). Where possible, it is recommended to install the DBMS on bare metal, or in a DBA-managed pool, to facilitate performance management.
B2  Enable the OpenStack Keystone worker support (Section 6.1).
B3  Disable the IWD service, if possible (Section 6.2).
B4  If the IWD service is required, optimize the IWD component (Section 6.3).
B5  Configure the Linux IO scheduler (Section 6.4).
B6  Disable the ACPI management (Section 6.5).
B7  Ensure the Java heaps are optimized (Section 6.6).
B8  Configure the central database (Section 6.7).

Figure 42: Base Installation Recommendations

7.2 Post Installation Recommendations

The post installation recommendations provide additional throughput and superior functionality. All steps should be implemented.

Identifier  Description  Status
P1  Perform a set of infrastructure and ICO benchmarks to determine the viability of the installation (see Sections 4.2 and 4.3).
P2  Implement the database statistics maintenance activity per Section 6.9.2.
P3  Implement the database reorg maintenance activity per Section 6.9.3.
P4  Implement a suitable backup and disaster recovery plan comprising regular backups of all critical server components (including the database and relevant file system objects). Guidelines are provided in the ICO Knowledge Center (URL).

Figure 43: Post Installation Recommendations

7.3 High Scale Recommendations

The high scale recommendations should be incorporated once the production installation needs to support the high water mark for scalability. All steps may be implemented optionally over time, based upon workload.

Identifier  Description  Status
S1  Apply the latest ICO fixpack.
S2  Monitor the performance of the installation (Section 4.1) and adjust the management server to the recommended installation values (Section 5.2) as appropriate.
S3  Optimize Central Server 1 (DBMS) performance. A basic way to achieve this is to have dedicated, high performance storage allocated to the database containers and logs.

Figure 44: High Scale Recommendations

APPENDIX A: IBM CLOUD ORCHESTRATOR MONITORING OPTIONS

Monitoring is important to understand and ensure the health of any cloud solution. A number of monitoring approaches are available for ICO. The solutions are described in the following summary sections, broken down into three categories:

1. OpenStack monitoring via Ceilometer.
2. ICO monitoring via IBM BPM.
3. Infrastructure monitoring via IBM Tivoli Monitoring (ITM) and third party solutions.

A separate appendix is provided that is specific to OpenStack Keystone monitoring.

A.1 OpenStack Monitoring
OpenStack monitoring is provided via the Ceilometer component. Ceilometer offers a comprehensive and customizable infrastructure, including support for event and threshold management. Note that while Ceilometer is not enabled as part of the base ICO 2.4 distribution, it is a constituent of the OpenStack Grizzly base, with continued enhancement in subsequent OpenStack releases.

Ceilometer provides three distinct types of metrics:

1. Cumulative: counters that accumulate or increase over time.
2. Gauge: counters that offer discrete, point-in-time values.
3. Delta: differential counters showing change rates.

A vast array of metrics is provided by Ceilometer. An easy way to interactively derive the set of available metrics is to query Ceilometer directly (see the sample below). In addition, the Ceilometer documentation provides the default set, with associated attributes (URL).

ceilometer meter-list -s openstack

Figure 45: OpenStack Ceilometer Metrics

The following table provides a core set of recommended monitoring points for OpenStack. A broader set may of course be used.

Component  Meters
Nova (Compute Node Management)  cpu_util, disk.read.requests.rate, disk.write.requests.rate, disk.read.bytes.rate, disk.write.bytes.rate, network.incoming.bytes.rate, network.outgoing.bytes.rate, network.incoming.packets.rate, network.outgoing.packets.rate. The following counters require enablement: compute.node.cpu.kernel.percent, compute.node.cpu.idle.percent, compute.node.cpu.user.percent, compute.node.cpu.iowait.percent.
Neutron (Network Management)  network.create, network.update, subnet.create, subnet.update
Glance (Image Management)  image.update, image.upload, image.delete
Cinder (Volume Management)  volume.size
Swift (Object Storage Management)  storage.objects, storage.objects.size, storage.objects.containers, storage.objects.incoming.bytes, storage.objects.outgoing.bytes
Heat (Orchestration)  stack.create, stack.update, stack.delete, stack.suspend, stack.resume

Figure 46: OpenStack Ceilometer Core Metrics

In addition, Ceilometer provides a REST API that allows cloud administrators to record their own KPIs. For instance, infrastructure metrics could be placed in Ceilometer with an HTTP POST request. As Ceilometer includes a data store, as well as some basic statistical functionality, it is a candidate integration point for cloud monitoring data.
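As a hedged sketch of recording a custom KPI, the following request publishes a single gauge sample to the Ceilometer v2 API. The endpoint host and port (ceilometer-host:8777), the meter name (ico.custom.kpi), the resource identifier, and the token retrieval are all illustrative assumptions; the exact payload fields and endpoint should be verified against the Ceilometer API reference for the installed OpenStack release.

# Publish one custom gauge sample to Ceilometer (illustrative names and values throughout).
TOKEN=$(keystone token-get 2>/dev/null | awk '$2 == "id" {print $4}')   # assumes the keystone CLI is available
curl -s -X POST "http://ceilometer-host:8777/v2/meters/ico.custom.kpi" \
  -H "X-Auth-Token: ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '[{"counter_name": "ico.custom.kpi",
        "counter_type": "gauge",
        "counter_unit": "ms",
        "counter_volume": 276.3,
        "resource_id": "bpm-pdw-sample"}]'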
A.2 IBM Cloud Orchestrator Monitoring

ICO monitoring should be employed to address the solution layer "above" OpenStack. The primary mechanism for ICO monitoring is enablement of the BPM performance data warehouse (relevant information is available in the References section). It is worth noting that BPM is built on IBM WebSphere and, as a result, WebSphere monitoring capabilities also apply. The performance data warehouse may be enabled via "autotracking", which enables both custom KPIs and the default total time KPIs. The core KPIs for understanding BPM capability are:

- BPM processes executed per second.
- Average service time per BPM process.

It is important to note that, because Ceilometer provides a general plugin and distribution infrastructure, it may be combined with the ICO monitoring solution. A sample approach for managing these monitoring points follows.

1. Derive a BPM plugin to retrieve raw times from the BPM performance data warehouse (PDWDB) database. The preferred method is the provided REST interface (versus direct database access).
2. Perform calculations based on the raw data. For example, convert a series of milestones into performance KPIs, or calculate statistical quantities (e.g. standard deviation, harmonic mean).
3. Push the results to Ceilometer as the meter distribution mechanism.
4. Read the results via the Ceilometer REST API and display them in the visualization tool of your choice.

A.3 Infrastructure Monitoring

Infrastructure monitoring addresses the operating system and hypervisor health of the cloud. Available tools include IBM Tivoli Monitoring (ITM) or the open source offering Nagios. For example, ITM v6.2 provides the following infrastructure monitoring agents (for reference, see URL):

1. IBM Tivoli Monitoring Endpoint.
2. Linux OS.
3. UNIX Logs.
4. UNIX OS.
5. Windows OS.
6. i5/OS®.
7. IBM Tivoli Universal Agent.
8. Warehouse Proxy.
9. Summarization and Pruning.
10. IBM Tivoli Performance Analyzer.

Critical KPIs to monitor at the infrastructure level are summarized in the following table (VMware is provided as a representative hypervisor sample).

Component  Meters
Operating System  CPU utilization including kernel, user, IO wait, and idle times. Disk utilization including read/write request and byte rates. Network utilization including incoming and outgoing packet and byte rates. Volume free space across the central and region servers; special attention should be paid to the Virtual Image Library on Central Server 2 to ensure the "/home/library" space is well managed.
DBMS: ITM for DB2 (URL)  Application IO activity, Application lock activity, Application overview, Buffer Pool, Connection, Database, Database Lock Activity, Historical Summarized Capacity Weekly, Historical Summarized Performance Weekly, Locking Conflict, and Tablespace workspaces.
Application Server: ITCAM Agent for WebSphere Applications (URL)  WebSphere Agent Summary and Application Server Summary workspaces.
J2EE: ITCAM Agent for J2EE (URL)  Application Health Summary workspace.
HTTP: ITCAM Agent for HTTP Servers (URL)  Web Server Agent and Server workspaces.
Hypervisor: ITM for Virtual Environments (URL)  CPU, Disk, Memory, Network, Resource Pools, and Virtual Machines workspaces.
Hypervisor: VMware esxtop sample  CPU: Run (%RUN), Wait (%WAIT), Ready (%RDY), Co-Stop (%CSTP). Network: Dropped packets (%DRPTX, %DRPRX). IO: Latency (DAVG, KAVG), Queue length (QUED). Memory: Memory reclaim (MCTLSZ), Swap (SWCUR, SWR/s, SWW/s).

Figure 47: Infrastructure Core Metrics
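Where the esxtop counters above need to be captured over a benchmark run rather than watched interactively, batch mode may be used. The following is a minimal sketch, assuming shell access to the ESXi host (or resxtop from the vSphere CLI against a remote host); the 10-second interval, one-hour duration, and output path are arbitrary example values.

# Capture esxtop data in batch mode: -b batch, -d sample interval in seconds, -n number of samples.
# 360 samples at 10-second intervals covers roughly one hour; the resulting CSV can be analyzed
# offline (for example in a spreadsheet or with Windows perfmon).
esxtop -b -d 10 -n 360 > /tmp/esxtop-capture.csv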
APPENDIX B: OPENSTACK KEYSTONE MONITORING

The Keystone component is critical to the overall performance of IBM Cloud Orchestrator. For example, if one component saturates Keystone, the overall throughput of the system is impacted. This is magnified by the fact that Keystone has only a single execution thread instance. To understand Keystone performance, the best method is to look at the requests and responses via a proxy such as the IaaS Gateway. This provides the ability to see requests that are dropped before being processed by Keystone. The following sections describe an approach for monitoring Keystone via the PvRequestFilter.

B.1 PvRequestFilter

The PvRequestFilter was designed to output request and response data into the Keystone log. When enabled, it prints the data as warning messages, so it is not necessary to raise the default debug level to generate the log messages. The format of the messages is as follows. All fields except "<duration>" are printed for both requests and responses; the duration of the request is printed only for the response.

WARNING [REQUEST|RESPONSE] <millisecond timestamp to identify request> <REMOTE_ADDR>:<REMOTE_PORT> <REQUEST_METHOD> <RAW_PATH_INFO> [<duration>]

Figure 48: Keystone Monitoring PvRequestFilter Format

Sample output follows.

2014-07-21 17:16:56.509 22811 WARNING keystone.contrib.pvt_filter.request [-] REQUEST 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users
2014-07-21 17:16:56.785 22811 WARNING keystone.contrib.pvt_filter.request [-] RESPONSE 2014-07-21_17:16:56.509 172.18.152.103:1278 GET /v3/users 0.276294
2014-07-21 17:16:56.807 22811 WARNING keystone.contrib.pvt_filter.request [-] REQUEST 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains
2014-07-21 17:16:56.824 22811 WARNING keystone.contrib.pvt_filter.request [-] RESPONSE 2014-07-21_17:16:56.807 172.18.152.103:1278 GET /v3/domains 0.017691
2014-07-21 17:16:56.839 22811 WARNING keystone.contrib.pvt_filter.request [-] REQUEST 2014-07-21_17:16:56.839 172.18.152.103:1279 GET /v3/users/e92b94d7068843ef98d664521bd9c983/projects
2014-07-21 17:16:56.868 22811 WARNING keystone.contrib.pvt_filter.request [-] RESPONSE 2014-07-21_17:16:56.839 172.18.152.103:1279 GET /v3/users/e92b94d7068843ef98d664521bd9c983/projects 0.028558

Figure 49: Keystone Monitoring PvRequestFilter Sample Output

B.2 Enabling PvRequestFilter

The process to enable the PvRequestFilter follows.

1. Log onto Central Server 2.
2. Extract the distribution provided with this paper (keystoneStats.zap).
3. Install the filter and back up the existing configuration: ./deployKeystoneFilter.sh
4. Make the following changes to the "/etc/keystone/keystone.conf" file. Note: Reversing step 2 will disable the filter.
   a. Add the following lines just above the line starting with "[filter:debug]":

      [filter:pvt]
      paste.filter_factory = keystone.contrib.pvt_filter.request:PvtRequestFilter.factory

   b. Add "pvt" to three of the pipeline statements:

      [pipeline:public_api]
      pipeline = access_log sizelimit url_normalize token_auth admin_token_auth xml_body json_body simpletoken ec2_extension user_crud_extension pvt public_service

      [pipeline:admin_api]
      pipeline = access_log sizelimit url_normalize token_auth admin_token_auth xml_body json_body simpletoken ec2_extension s3_extension crud_extension pvt admin_service

      [pipeline:api_v3]
      pipeline = access_log sizelimit url_normalize token_auth admin_token_auth xml_body json_body simpletoken ec2_extension s3_extension pvt service_v3

   c. Restart the Keystone service: service openstack-keystone restart
   d. Validate that "/var/log/keystone/keystone.log" is producing the appropriate log messages (sample below).
   e. Update the "hosts.table" file to reflect your environment.
   f. Run the workload or scenario for analysis.
   g. Generate the statistics for the request and response data in the "keystone.log" file (sample below):

      ./keystoneStats.sh /var/log/keystone/keystone.log > results

Figure 50: Keystone Monitoring Log Messages Example

Figure 51: Keystone Monitoring Statistics Example
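If the packaged keystoneStats.sh script is not at hand, a rough per-endpoint summary can be derived directly from the RESPONSE lines. The following awk sketch assumes the log line layout shown in Figure 49 (the request method, path, and duration appear as the last three fields of a RESPONSE record); it is an illustrative one-liner, not part of the shipped tooling.

# Request count and average response time per method/path from PvRequestFilter RESPONSE lines.
awk '$7 == "RESPONSE" { key = $(NF-2) " " $(NF-1); sum[key] += $NF; cnt[key]++ }
     END { for (k in cnt) printf "%-60s %8d %12.6f\n", k, cnt[k], sum[k]/cnt[k] }' \
    /var/log/keystone/keystone.log | sort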
REFERENCES

IBM Cloud Orchestrator and Related Component References

IBM Cloud Orchestration Knowledge Center
  ICO 2.4 Knowledge Center

IBM Cloud Orchestrator Resource Center
  ICO Resource Center

IBM Cloud Orchestrator Version 2.4: Security Hardening Guide
  http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO85

SmartCloud Orchestrator Version 2.3: Capacity Planning, Performance, and Management Guide
  http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO7P

SmartCloud Orchestrator Version 2.3: Security Hardening Guide
  http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO7W

IBM Cloud Orchestrator Version 2.3: Database Movement Cookbook
  http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO8T

IBM SmartCloud Orchestrator: Offline-backup approach using Tivoli Storage Manager for Virtual Environments
  http://www.ibm.com/software/ismlibrary?NavCode=1TW10SO7Q

IBM Business Process Manager V8.0 Performance Tuning and Best Practices
  http://www.redbooks.ibm.com/redpapers/pdfs/redp4935.pdf

IBM Business Process Manager Performance Data Warehouse
  http://pic.dhe.ibm.com/infocenter/dmndhelp/v8r5m0/topic/com.ibm.wbpm.admin.doc/topics/managing_performance_servers.html

IBM Tivoli Monitoring Information Center
  http://publib.boulder.ibm.com/infocenter/tivihelp/v15r1/topic/com.ibm.itm.doc_6.2.3fp1/welcome.htm

IBM DB2 10.5 Knowledge Center
  DB2 10.5 Knowledge Center

OpenStack References

OpenStack Performance Presentation (Folsom, Havana, Grizzly)
  http://www.openstack.org/assets/presentation-media/openstackperformance-v4.pdf

OpenStack Ceilometer
  http://docs.openstack.org/developer/ceilometer

OpenStack Rally
  https://wiki.openstack.org/wiki/Rally

Hypervisor References

Performance Best Practices for VMware vSphere™ 5.0
  http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.0.pdf

Performance Best Practices for VMware vSphere™ 5.1
  http://www.vmware.com/pdf/Perf_Best_Practices_vSphere5.1.pdf

VMware: Troubleshooting virtual machine performance issues
  VMware Knowledge Base

VMware: Performance Blog
  http://blogs.vmware.com/vsphere/performance

Linux on System x: Tuning KVM for Performance
  KVM Performance Tuning

Kernel Virtual Machine (KVM): Tuning KVM for performance
  http://pic.dhe.ibm.com/infocenter/lnxinfo/v3r0m0/topic/liaat/liaattuning_pdf.pdf

PowerVM Virtualization Performance Advisor
  Developer Works PowerVM Performance

IBM PowerVM Best Practices
  http://www.redbooks.ibm.com/redbooks/pdfs/sg248062.pdf

Benchmark References

Report on Cloud Computing to the OSG Steering Committee, SPEC Open Systems Group
  https://www.spec.org/osgcloud/docs/osgcloudwgreport20120410.pdf

© Copyright IBM Corporation 2015
IBM United States of America
Produced in the United States of America

US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area.
Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
North Castle Drive
Armonk, NY 10504-1785
U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PAPER "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes may be made periodically to the information herein; these changes may be incorporated in subsequent versions of the paper. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this paper at any time without notice.

Any references in this document to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing
IBM Corporation
4205 South Miami Boulevard
Research Triangle Park, NC 27709
U.S.A.

All statements regarding IBM's future direction or intent are subject to change or withdrawal without notice, and represent goals and objectives only. This information is for planning purposes only. The information herein is subject to change before the products described become available.

If you are viewing this information softcopy, the photographs and color illustrations may not appear.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at http://www.ibm.com/legal/copytrade.shtml.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

Other company, product, or service names may be trademarks or service marks of others.